Transferring 256 bytes over Tube

kieranhj
Posts: 529
Joined: Sat Sep 19, 2015 10:11 pm
Location: Farnham, Surrey, UK

Postby kieranhj » Sun Jan 15, 2017 4:18 pm

Hey all,

Given all the RasPi copros that seemed to arrive as Christmas presents, I thought I'd spend some time messing about with 6502 second processor code to see what might be possible. Thanks to Tom Seddon for his great write-up of how the Tube works and JGH for his meticulous documentation.

I investigated the usual methods of data transfer (mostly hooking into OSWRCH), but they were too slow for my particular application, so I moved on to the fast 256-byte Tube transfer protocol. However, I've come to an impasse and cannot get it to work. Can anyone help me figure out what I'm doing wrong, please?

Code:

.host_param_block
{
    EQUW parasite_draw_buffer
    EQUW 0
}

.host_read256
{
    IF _DEBUG_RASTERS
    LDX #&00 + PAL_green
    STX &FE21
    ENDIF

    \\ Claim Tube
    .claim
    LDA #&C0 + &10
    JSR &406
    BCC claim

    \\ 256 byte read
    {
        LDA #LO(parasite_draw_buffer)
        STA host_param_block+0
        LDA #HI(parasite_draw_buffer)
        STA host_param_block+1

        LDX #LO(host_param_block)
        LDY #HI(host_param_block)
        LDA #&6
        JSR &406

        \\ Initial delay
        LDX #6
        .initial_delay
        DEX
        BNE initial_delay

        LDY #0

        LDX #0
        .loop
        LDA &FEE5           ; 4c
        STA &71             ; 3c
        NOP                 ; 2c
        NOP                 ; 2c
        NOP                 ; 2c
        NOP                 ; 2c
        NOP                 ; 2c
        INX                 ; 2c
        BNE loop            ; 3c
    }

    \\ Release Tube
    .release
    LDA #&80 + &10
    JSR &406         ;    <---- BLOCKED HERE


    IF _DEBUG_RASTERS
    LDX #&00 + PAL_black
    STX &FE21
    ENDIF

    RTS
}

For now this code just throws the transferred data into a sink (&71) for test purposes; eventually it is supposed to be an array of addresses and byte masks for a particle system. However, I am getting a hang during the Tube release call. Specifically, the Tube code is spinning, waiting for R4 to be free, whilst attempting to send the Tube ID over to the parasite:

Code:

\ Release Tube
\ ------------
ORA #&40              :\ Ensure release ID same as claim ID
CMP &15:BNE L0434     :\ Not same as the claim ID, exit
.L0414
PHP:SEI               :\ Disable IRQs
LDA #05:JSR L069E     :\ Send &05 via R4 to CoPro
JSR L069C             :\ Send Tube ID to notify a Tube release

...

\ Send Tube ID via R4
\ ===================
.L069C
LDA &15               :\ Get Tube ID
:
\ Send byte in A via R4
\ =====================
.L069E
BIT TUBES4:BVC L069E  :\ Loop until R4 free   <---- BLOCKED HERE
STA TUBER4:RTS        :\ Send byte

Any Tube experts have any thoughts? It doesn't matter if I call the read function in an osbyte callback or the vsync event handler on the host, I still get the same result. This is in B-Em BTW, I haven't been able to test on real hardware yet (my Master Turbo or PiZero copro). It doesn't hang on BeebEm but the debugger isn't much help (unresponsive) so I can't tell what's going on.

I have a whole separate thread to start on why there aren't (m)any 2nd processor apps or games - it's certainly not designed for parallelism, it's designed to allow a different processor to utilise the I/O of the host Beeb, without wanting to state the obvious..!
Bitshifters Collective | Retro Code & Demos for BBC Micro & Acorn computers | https://bitshifters.github.io/

jgharston
Posts: 2762
Joined: Thu Sep 24, 2009 11:22 am
Location: Whitby/Sheffield

Postby jgharston » Sun Jan 15, 2017 5:21 pm

Looks right, I'll have a look in more detail after my tea.

Code:

$ bbcbasic
PDP11 BBC BASIC IV Version 0.25
(C) Copyright J.G.Harston 1989,2005-2015
>_

Postby jgharston » Sun Jan 15, 2017 6:37 pm

It looks like you're pushing the timing requirements slightly over the edge. You've got:

Code:

  \\ Initial delay
   LDX #6
  .initial_delay
   DEX               \\ 6*(2+3)+2-1cy = 31cy = 15us
   BNE initial_delay
   LDY #0            \\ 16us
   LDX #0            \\ 17us

The initial delay needs to be 19us, increasing X to 7 gives 7*(2+3)+2-1cy = 36cy = 18us, then another 2us giving 20us. Increasing X to 8 gives 8*(2+3)+2-1cy = 41cy = 20us and can continue straight to the first access without any intervening instructions.
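
The delay-loop arithmetic above can be checked with a short script. This is only a sketch: it assumes a 2 MHz host 6502 (so 2 cycles = 1us) and the standard cycle counts for LDX #imm, DEX and BNE. The function names are made up for illustration.

```python
HOST_CLOCK_MHZ = 2  # assumption: host 6502 runs at 2 MHz, so 2 cycles = 1 us


def delay_loop_cycles(count):
    """Cycles taken by: LDX #count / .loop DEX / BNE loop."""
    ldx_immediate = 2
    dex = 2
    bne_taken = 3
    # The final BNE falls through, costing 2 cycles instead of 3.
    return ldx_immediate + count * (dex + bne_taken) - 1


def cycles_to_us(cycles):
    return cycles / HOST_CLOCK_MHZ


for count in (6, 7, 8):
    cycles = delay_loop_cycles(count)
    print(f"LDX #{count}: {cycles} cycles = {cycles_to_us(cycles)} us")
```

The figures quoted in the post are these values rounded down to whole microseconds.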

Code:

  .loop
   LDA &FEE5           ; 4c
   STA &71             ; 3c
   NOP                 ; 2c
   NOP                 ; 2c
   NOP                 ; 2c
   NOP                 ; 2c
   NOP                 ; 2c
   INX                 ; 2c
   BNE loop            ; 3c = 22cy = 11us (but actually 9.5us-ish)

The inter-byte delay needs to be 10us. However, you've been caught out by counting the time of the LDA &FEE5 in the loop: the delay is between accesses, so it excludes the access itself, and you can't count all 4 cycles of the LDA &FEE5. The safest is to count it as zero, which gives you 18 cycles = 9us, but it is safe to count it as one, giving 19 cycles = 9.5us. That's very marginal, and is inadvertently caused by you using a dummy fast STA &71 instead of doing something with the data. Doing something useful such as STA (dest),Y takes 6 cycles, which gets the loop to a safe 21 cycles = 10.5us, not counting the LDA &FEE5 itself.
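
To make the three ways of counting concrete, here is a rough sketch that totals the original loop while charging the Tube access as 4, 1 or 0 cycles. It assumes a 2 MHz host and standard 6502 cycle counts. The table and function names are illustrative only.

```python
HOST_CLOCK_MHZ = 2  # assumption: 2 MHz host 6502, so 2 cycles = 1 us

# Standard 6502 cycle counts for the instructions in the original loop.
CYCLES = {
    "LDA &FEE5": 4,  # absolute-mode read of the Tube data register
    "STA &71": 3,    # zero-page store
    "NOP": 2,
    "INX": 2,
    "BNE loop": 3,   # branch taken
}

LOOP = ["LDA &FEE5", "STA &71"] + ["NOP"] * 5 + ["INX", "BNE loop"]


def loop_cycles(access_cycles):
    """Cycles per iteration, charging the Tube access as access_cycles."""
    rest = sum(CYCLES[op] for op in LOOP if op != "LDA &FEE5")
    return access_cycles + rest


for label, access in (("full LDA", 4), ("LDA as 1", 1), ("LDA as 0", 0)):
    cycles = loop_cycles(access)
    print(f"{label}: {cycles} cycles = {cycles / HOST_CLOCK_MHZ} us")
```

Changing the "STA &71" entry to a longer store is a quick way to see how much margin a real data-handling instruction adds.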

The following example code works, I've just tested it with a Z80, 6502 and ARM. It just dumps the data to the screen.

Code:

   10 REM > Tube256/s
   20 :
   30 load%=&FFFF0900
   40 ParasiteBuffer=&F800
   50 DIM mcode% &100
   60 :
   70 FOR P=0 TO 1
   80   P%=load%:O%=mcode%
   90   [OPT P*3+4
  100   .exec%
  110   :
  120   .TubeClaim
  130   LDA #&C0 + &10
  140   JSR &406     \\ Claim Tube
  150   BCC TubeClaim
  160   :
  170   LDA #ParasiteBuffer AND 255
  180   STA TubeAddr+0
  190   LDA #ParasiteBuffer DIV 256
  200   STA TubeAddr+1
  210   LDX #TubeAddr AND 255
  220   LDY #TubeAddr DIV 256
  230   LDA #6
  240   JSR &406     \\ Initiate 256-byte read
  250   :
  260   LDX #7
  270   .TubeDelay
  280   DEX          \\ 7*(2+3)+2-1cy = 36cy = 18us
  290   BNE TubeDelay
  300   :
  310   LDY #0       \\ ...19us
  320   LDX #0       \\ ...20us
  330   .TubeLoop
  340   LDA &FEE5    \\ 4cy, count as 1cy
  350   STA &7F00,X  \\ 5cy
  360   NOP          \\ 2cy
  370   NOP          \\ 2cy
  380   NOP          \\ 2cy
  390   NOP          \\ 2cy
  400   NOP          \\ 2cy
  410   INX          \\ 2cy
  420   BNE TubeLoop \\ 3cy = 21cy/loop = 10.5us/loop
  430   :
  440   .TubeRelease
  450   LDA #&80 + &10
  460   JSR &406     \\ Release Tube
  470   :
  480   RTS
  490   :
  500   .TubeAddr
  510   EQUD 0
  520   ]NEXT
  530 PRINT"*SAVE RD256 ";~mcode%;" ";~O%;" ";~exec%OR&FFFF0000;" ";~load%
>*SPOOL

hoglet
Posts: 6626
Joined: Sat Oct 13, 2012 6:21 pm
Location: Bristol

Postby hoglet » Sun Jan 15, 2017 6:45 pm

jgharston wrote:The inter-byte delay needs to be 10us. However, you've been caught out by counting the time of the LDA &FEE5 in the loop, the delay is between accesses, so excluding the access itself. So, you can't count all 4 cycles of the LDA &FEE5.

This is very interesting. Are you sure you can't include the LDA &FEE5?

That's counter to the example in the Tube App Note 004:
http://mdfs.net/Info/Comp/Acorn/AppNotes/004.pdf#page=9

Code:

LOOP LDA &FEE5 ; Get it from the port ( 2 uS = 2)
STAIY &80 ; Put the data byte (+3 uS = 5)
NOP ; (+1 uS = 6)
NOP ; (+1 uS = 7)
NOP ; (+1 uS = 8)
INY ; (+1 uS = 9)
BNE LOOP ; Next data byte (+1.5 uS = 10.5 uS/byte)

Dave
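
For reference, the running microsecond tally in the AppNote listing can be reproduced from standard 6502 cycle counts. This is a sketch under the assumption of a 2 MHz host. STAIY &80 is written here as STA (&80),Y, and the names are illustrative.

```python
from itertools import accumulate

HOST_CLOCK_MHZ = 2  # assumption: 2 MHz host 6502, so 2 cycles = 1 us

# (instruction, cycles) for the AppNote loop, using standard 6502 timings.
APPNOTE_LOOP = [
    ("LDA &FEE5", 4),
    ("STA (&80),Y", 6),
    ("NOP", 2),
    ("NOP", 2),
    ("NOP", 2),
    ("INY", 2),
    ("BNE LOOP", 3),  # branch taken
]


def running_us(loop):
    """Cumulative time in microseconds after each instruction."""
    totals = accumulate(cycles for _, cycles in loop)
    return [total / HOST_CLOCK_MHZ for total in totals]


tally = running_us(APPNOTE_LOOP)
for (op, _), us in zip(APPNOTE_LOOP, tally):
    print(f"{op:<12} {us} us")
```

The final figure matches the 10.5 uS/byte in the listing, which counts the whole LDA &FEE5.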

Postby jgharston » Sun Jan 15, 2017 7:00 pm

hoglet wrote:This is very interesting. Are you sure you can't include the LDA &FEE5?
That's counter to the example in the Tube App Note 004:

Code:

LOOP LDA &FEE5 ; Get it from the port ( 2 uS = 2)
(snip)
BNE LOOP ; Next data byte (+1.5 uS = 10.5 uS/byte)

Hmm. I've found that if you push it that tight you start having problems. All the working code I've seen and written is never that tight, as it uses a longer instruction than STA zp, so the loop is never as tight as 9.5us or 10.5us, depending on how you measure it. I wonder if the AppNote used that example just to find the tightest example loop they could manage.

Edit: mis-read the MASM mnemonics, STAIY is STA (zp),Y so it is doing something useful - just very tight. The initial delay in kieranhj's original example was definitely too short though.

Edit edit: and, of course, that could be kieranhj's problem, as his dummy code uses STA &71, which takes 3 cycles, as opposed to the STA (&80),Y in the AppNote, which takes 6 cycles, so the store in kieranhj's loop is shorter than the one in the AppNote example code.

Postby kieranhj » Sun Jan 15, 2017 8:22 pm

jgharston wrote:It looks like you're pushing the timing requirements slightly over the edge. You've got:
<snip>
The following example code works, I've just tested it with a Z80, 6502 and ARM. It just dumps the data to the screen.

Thanks Jonathan! I will give this a go tomorrow. If the timing is off then I guess there would still be data in the FIFO, hence the spin? Like hoglet, I too was following assumptions from the AUG and AppNote, but it does seem like they're theoretical / sample code rather than thoroughly tested on real hardware.

I'll let you know how I get on and post my particle system sample + 2nd processor thoughts when it's working!

Postby kieranhj » Mon Jan 16, 2017 8:57 pm

jgharston wrote:The following example code works, I've just tested it with a Z80, 6502 and ARM. It just dumps the data to the screen.

Is that on real hardware or an emulator? If I reduce my host code so it just calls that function and exits, I still get the hang on B-Em even with the increased waits as per your code. If I run on BeebEm then I get what I expect (a flash of background colour change from the raster debug) with any second processor selected. Testing on real hardware is a PITA - I haven't got time tonight and IIRC neither the default hacked DFS 0.9 supplied with TurboMMC in my Model B, nor DataCentre RAMFS in my Master, work correctly with Tube enabled. Last time I had to resort to copying files to a real floppy disc to check something on my Turbo.

Postby jgharston » Mon Jan 16, 2017 10:21 pm

kieranhj wrote:
jgharston wrote:The following example code works, I've just tested it with a Z80, 6502 and ARM. It just dumps the data to the screen.
Is that on real hardware or an emulator?
Real hardware. I have sometimes noticed the timing requirements on B-Em are even tighter than real hardware. I found that some code where I miscounted cycles talking to the Z80 worked perfectly well on real hardware but hung on B-Em.

kieranhj wrote:If I reduce my host code so it just calls that function and exits, I still get the hang on B-Em even with the increased waits as per your code.
You can't initiate a 256-byte transfer then quit, you *must* do the 256-byte transfer as the client's ISR is still sitting there waiting for the 256 bytes within the ISR itself. You can initiate a 1- or 2-byte transfer then quit without doing anything as each byte/word transfer is an individual interrupt.
(Edit: re-reading that I'm not sure if you mean JSR TubeClaim/JSR TubeInit/JSR TubeRelease or if you mean JSR read_256_byte_example_code/RTS)

kieranhj wrote:neither the default hacked DFS 0.9 supplied with TurboMMC in my Model B, nor DataCentre RAMFS in my Master, work correctly with Tube enabled.
The latest version of RAMFS is v1.04 which incorporates the Tube bugfixes that fix the problems you are seeing.

Postby kieranhj » Wed Jan 18, 2017 9:50 pm

jgharston wrote:Real hardware. I have sometimes noticed the timing requirements on B-Em are even tighter than real hardware. I found that some code where I miscounted cycles talking to the Z80 worked perfectly well on real hardware but hung on B-Em.

You can't initiate a 256-byte transfer then quit, you *must* do the 256-byte transfer as the client's ISR is still sitting there waiting for the 256 bytes within the ISR itself. You can initiate a 1- or 2-byte transfer then quit without doing anything as each byte/word transfer is an individual interrupt.
(Edit: re-reading that I'm not sure if you mean JSR TubeClaim/JSR TubeInit/JSR TubeRelease or if you mean JSR read_256_byte_example_code/RTS)

The latest version of RAMFS is v1.04 which incorporates the Tube bugfixes that fix the problems you are seeing.

Thanks again JG. I was running just the whole 256 byte transfer function with ample timing but not able to complete the Tube release under emulation. Thanks to the updated version of RAMFS I just ran this on my Master Turbo and got what I expected - a flash of green raster whilst the transfer was active then some crap left on the screen copied over from the parasite RAM. This shows I can't use B-Em or jsbeeb for second processor development right now. :(

I will file a bug with Matt for jsbeeb and poke around in the B-Em internals since I got it building on Windows recently. I would use BeebEm but I just don't get any joy with the debugger - not sure if it's just me?

tricky
Posts: 1918
Joined: Tue Jun 21, 2011 8:25 am

Postby tricky » Thu Jan 19, 2017 10:56 am

I generally use BeebEm for its debugger. I keep meaning to explore importing labels and scripts, but haven't done so yet. There are times when B-Em or jsbeeb are more appropriate.

Postby kieranhj » Wed Jan 25, 2017 9:05 pm

tricky wrote:I generally use BeebEm for its debugger. I keep meaning to explore importing labels and scripts, but haven't done so yet. There are times when B-Em or jsbeeb are more appropriate.

I must be experiencing something weird because it takes ~30 seconds for the BeebEm debugger to break into the code for me and then just won't step to next instruction (unless it's taking 30 seconds per step.) I like B-Em command line for speed once you know it - plus now I have the code building it's pretty easy to add new functionality - I added registers, memory inspection and disassembly of 6502 second processor already.

Not looking forward to stepping through two lots of 6502 code in parallel whilst simultaneously stepping through the emulator code itself in C to debug the Tube timing problem I'm seeing. :?