Hacker needed ... for Zarch ;-)

chat about arc/risc pc gaming & RISC OS software here (NOT the core OS!)

Related forum: adventures


Rich Talbot-Watkins
Posts: 1121
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Hacker needed ... for Zarch ;-)

Postby Rich Talbot-Watkins » Fri Nov 06, 2015 10:28 pm

When calculating the start/end points of a line of a filled poly, there are two options: Bresenham, or calculating the gradient in advance and advancing x by it for each increment to y. Bresenham's method works well for 'steep' lines, but with 'shallow' lines there is unavoidable iteration, as you increment/decrement x repeatedly until it results in an increment to y. The gradient method is quick regardless of line gradient, but has the setup cost of a division (or reciprocal mult if you store reciprocal tables). However it's easier to perform accurate clipping against the viewport if you have the gradient.

What have people done in the past for their filled poly routines? I think I've tried both approaches in the past, and Bresenham generally came out faster, although clipping is more awkward. I'm guessing for small polys like Zarch's, it'd be the better option. Although maybe you could just create an enormous table of f(x,y) = (x<<16)/y for looking up gradients where x and y were both below a certain limit.
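
For what it's worth, here's roughly what I mean by the table idea - a minimal BBC BASIC sketch, where the 16.16 fixed-point format and the grad% name are just my own placeholder choices, not anything from Zarch:

Code:

REM Sketch only: gradient table f(dx,dy) = (dx<<16)/dy for small spans
size%=64
DIM grad%(size%-1,size%-1)
FOR dy%=1 TO size%-1
  FOR dx%=0 TO size%-1
    grad%(dx%,dy%)=(dx%<<16) DIV dy%
  NEXT
NEXT

REM Walk one polygon edge from (x0,y0) to (x1,y1), deltas below size%
x0%=10:y0%=5:x1%=40:y1%=50
xfix%=x0%<<16
step%=grad%(x1%-x0%,y1%-y0%)
FOR y%=y0% TO y1%-1
  PRINT y%, xfix%>>16:REM edge x for this scan line
  xfix%+=step%
NEXT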

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Sat Nov 07, 2015 7:08 am

Rich Talbot-Watkins wrote:The gradient method is quick regardless of line gradient, but has the setup cost of a division (or reciprocal mult if you store reciprocal tables). However it's easier to perform accurate clipping against the viewport if you have the gradient.

What have people done in the past for their filled poly routines?

In that case I've only ever used the gradient method, with a reciprocal table and MUL. In the case of Zarch, where the point distances are below 64, I've added a pre-calculated table of the reciprocal already multiplied. So the setup cost of a gradient is 7S+1N+1I best case, or +16I worst case if it falls back to the reciprocal with MUL. All three gradients take 21S+3N+3I (or ~4650ns using the table above), which is negligible when you consider the number of screen memory writes going on.
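
Just to pin down what I mean by the reciprocal table and MUL, here's a quick BBC BASIC sketch of that fallback - recip% and the 16.16 format are just illustrative, not the actual Zarch setup:

Code:

REM Sketch only: 1D reciprocal table, gradient = dx * ((1<<16)/dy) via MUL
DIM recip% 256*4
FOR dy%=1 TO 255
  recip%!(dy%*4)=(1<<16) DIV dy%
NEXT

dx%=23:dy%=150
step%=dx%*(recip%!(dy%*4)):REM 16.16 gradient with a MUL instead of a divide
PRINT step%, (dx%<<16) DIV dy%:REM compare against the exact division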

I suspect we might get away with ignoring distances over 64x64 (I need to analyse a longer recording of tri's/quad's to be certain), in which case we can reduce that to 4S+1N+1I per gradient and drop the reciprocal table.

The 64x64 lookup table is currently 32kB - it's actually 128x64 (ie +-63X by 63Y, to avoid sign conversion). Increasing the distance to +-256x128 would use 128kB, which fits easily within the 640kB limit we have.
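
For anyone following along, this is the rough shape of that +-63 table - a sketch only, with the names and the 16.16 format chosen for illustration rather than lifted from the patch:

Code:

REM Sketch only: 128x64 gradient table, signed X index so no sign test is needed
REM 128*64 entries * 4 bytes = the 32kB mentioned above
DIM grad% 128*64*4
FOR dy%=1 TO 63
  FOR dx%=-63 TO 63
    grad%!((dy%*128+dx%+64)*4)=(dx%<<16) DIV dy%
  NEXT
NEXT

REM Lookup for an edge with signed dx% and dy% in 1..63 - no divide, no ABS
dx%=-20:dy%=35
step%=grad%!((dy%*128+dx%+64)*4)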

Once it's coded, I'll post it on here for folk to pick at/improve, although I don't think we'll get many gains out of it. Where Zarch is losing all its time is on STR's to screen memory - we need to reduce these to the bare minimum, with as few wasted writes and as few STRB's as possible.

The screen clearing code is where big gains can be had, as it's generating ~30% of the writes. If we can combine this into the landscape plotting code by using the line fill instead, we could remove up to 90% of those writes. This can fairly easily be added for the left/right landscape edges, with a slight overlap from the top 1/3 of the screen to ensure the landscape height is also cleared above the top of the top/left Quad.

Before we can make any drastic changes to the code, however, I need to change all the absolute memory references to relative - and there are thousands of them, in link pointers and link tables. It's going to take weeks to sort that out. And as we can't release Zarch, I'll also need to write a patch Module that works on the original protected floppy before anyone can get their hands on all our hard labour to try it for themselves.

I'll have to figure out how to record it at some point, so I can post YouTube videos of progress.

trixster
Posts: 527
Joined: Wed May 06, 2015 11:45 am
Location: York

Re: Hacker needed ... for Zarch ;-)

Postby trixster » Sat Nov 07, 2015 8:08 am

What you guys are doing sounds like complete wizardry to me, but carry on! Optimisation of Zarch is a great endeavour but I can't say the frame rate has ever been the issue for me, more the limited draw distance. Could the speed gains you're seeing be instead used to increase the draw distance whilst maintaining the original frame rate?
A3020 | A3000 | BBC B + 128K RAM/ROM + 20K Shadow + Pi0 + VideoNuLA
BBC Master Turbo + DC | Atom | A1200 060 | A500 | Jaguar | A420/1
A4000/040 060 | Atari Falcon 060 | Saturn | PS1 | SNES | CPC6128 | C64 | 3DO | MD

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Sat Nov 07, 2015 10:08 am

trixster wrote:I can't say the frame rate has ever been the issue for me, more the limited draw distance. Could the speed gains you're seeing be instead used to increase the draw distance whilst maintaining the original frame rate?

That's the idea.

I've now performed speed tests of a 64x64 distance lookup table vs the reciprocal table and MUL; results (in ms) are:

[Attachment: STRB_INSIDE_TRIANGLE_ROUTINE_128x128.png]

The demo loop runs in under 60 seconds using the 64x64 table, which is 27.5 seconds quicker than Zarch's current code. Note this is using my original triangle routine that I wrote for Zarch back in 1988; I expect to see further gains once I add in the Quadrilateral routine for the landscape.
Last edited by sirbod on Sat Nov 07, 2015 9:23 pm, edited 1 time in total.

Zarchos
Posts: 2355
Joined: Sun May 19, 2013 8:19 am
Location: FRANCE

Re: Hacker needed ... for Zarch ;-)

Postby Zarchos » Sat Nov 07, 2015 10:36 am

Fantastic!

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Sat Nov 07, 2015 12:10 pm

Zarchos wrote:Fantastic!

And with your customised line fill routines, we should see further gains. Remember we only need 7-32 pixels covered by it; I'll use the routine above for larger lengths, as it won't change the performance.

Zarchos
Posts: 2355
Joined: Sun May 19, 2013 8:19 am
Location: FRANCE

Re: Hacker needed ... for Zarch ;-)

Postby Zarchos » Sat Nov 07, 2015 12:21 pm

sirbod wrote:
Zarchos wrote:Fantastic!

And with your customised line fill routines, we should see further gains. Remember we only need 7-32 pixels covered by it; I'll use the routine above for larger lengths, as it won't change the performance.


I hope what I did is OK, although it's true that a game like Zarch isn't where you can see how fast they are: the triangles are too small.
Source sent to Steve, although not complete. I'm having 4 days off, but the laptop PC and source are travelling with me so I can keep on coding.

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Sat Nov 07, 2015 9:37 pm

Zarchos wrote:I hope what I did is OK, although it's true that a game like Zarch isn't where you can see how fast they are: the triangles are too small.

I'll test once I get it - I'm presuming Steve is going to pass it on?

I've now analysed the distances of the vertical vectors and, off the back of that, have increased the gradient lookup table to 128x128 and updated the graph above. All gradients are now coming from the lookup table, with no use of reciprocal/MUL's; it's exactly 50% quicker than Zarch's triangle routine when filling lines up to 6 pixels inside the triangle routine :D

Quadrilateral routine and testing qUE's mask suggestion next.

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Thu Nov 12, 2015 8:15 pm

I've produced line fill statistics to show the effect of using a quadrilateral routine on line fill lengths, over the two-minute recording I'm using to benchmark.

Using only triangles, line fill lengths are:
[Image: line fill lengths using only triangles]

Adding in the quadrilateral routine shifts the line fill length bias more towards the average width of the landscape tiles:
[Attachment: ZARCH_LINE_FILL_STATS_QUAD_AND_TRI.png]

The blue line on the two graphs can be directly compared, showing line fill lengths in Zarch as is in the first graph and the impact of adding a quadrilateral routine in the second. Note the radical reduction in short line fills from 261,000 to 27,000, which should improve the performance considerably. The majority of short line fills are now coming from the triangles on trees and the ships (red line). 75% of all line fills are now in the region of 13 to 27 pixels, which is where optimized STM routines can improve the fill rate :D

I've coded up the replacement Quadrilateral and Triangle routines in BASIC, so I can play around with them prior to recoding into ARM. There are a few minor issues to resolve before I post the code up, but I'm hoping to have those resolved over the next week. Essentially, I've analysed the order of points coming from Zarch and organised the code so that it doesn't need to pre-sort the Quad points in Y order, which is quite costly. I've also ordered the code so that the most frequently hit code executes first, to allow for early exit.

I've left the pre-sort in the Triangle routine for the time being, as it's now rarely called. Over the recording duration, Zarch currently plots 253712 triangles; with the split Quad/Tri routines, that becomes 14054 triangles and 116177 quadrilaterals.
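
For anyone wondering what the pre-sort actually involves, it's just a chain of compare-and-swaps on the Y coordinates - the BASIC below is only an illustration of the idea, not the routine itself:

Code:

REM Sketch: ordering three triangle points by Y (names are placeholders)
x1%=30:y1%=90:x2%=10:y2%=20:x3%=60:y3%=50
IF y1%>y2% THEN SWAP x1%,x2%:SWAP y1%,y2%
IF y1%>y3% THEN SWAP x1%,x3%:SWAP y1%,y3%
IF y2%>y3% THEN SWAP x2%,x3%:SWAP y2%,y3%
PRINT y1%,y2%,y3%:REM now top to bottom
REM A quadrilateral needs five such compare/swaps, which is the cost the new
REM Quad routine avoids by relying on the point order Zarch already supplies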

steve3000
Posts: 1711
Joined: Sun Nov 25, 2012 12:43 am

Re: Hacker needed ... for Zarch ;-)

Postby steve3000 » Thu Nov 12, 2015 9:19 pm

Really impressive Jon!

Apart from an awesome exercise in optimisation ;) what's the ultimate goal here? Is it to increase the draw distance, frame rate or resolution? (or all!)

...and what's the next step - and what help do you need?

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Thu Nov 12, 2015 11:30 pm

steve3000 wrote:what's the ultimate goal here? Is it to increase the draw distance, frame rate or resolution? (or all!)

All of the above. Provided we can increase the fill rate enough:

ARM2: Increase the average frame rate to 25fps and optionally increase draw distance, using unrolled code
ARM3: Increase the frame rate to 25fps min and optionally increase draw distance, using looped code
ARM610/710: Increase the frame rate to 50fps and optionally increase draw distance, using looped code
StrongARM+: Increase the frame rate to 50fps and increase draw distance, using looped code
Iyonix+: Increase the frame rate to 50fps, increase resolution to 640x512 min, increase draw distance, using looped code, and make it ARMv7 compatible

I've not looked at the frame rate yet. Zarch doesn't regulate itself, so we need to figure out how to move the camera, objects etc based on the time since the last frame - otherwise it will just get quicker and not smoother. The movement is currently at a fixed rate per frame.
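
The usual fix is to scale each movement by the time elapsed since the previous frame rather than by a fixed amount per frame - a rough BASIC sketch of the idea, with all the names made up rather than taken from Zarch's source:

Code:

REM Sketch: frame-rate independent movement using TIME (centiseconds)
shipx%=0:speed%=3:REM hypothetical movement per centisecond
lasttime%=TIME
FOR frame%=1 TO 100
  now%=TIME
  elapsed%=now%-lasttime%
  lasttime%=now%
  shipx%+=speed%*elapsed%:REM scale by elapsed time, not a fixed step per frame
  REM ...update the camera and objects the same way, then plot the frame...
NEXT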

Resolution is another area that needs looking at: we'll need to locate all the hardcoded *320's outside of the triangle routine (they'll be using LSL #6 + LSL #8) and switch them to LDR/MUL. The CLS routines will also need altering, as will the graphics for the score bar, ship height and fuel.
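
As a concrete example of the *320 change, here are the two forms side by side - the register numbers and the rowvar label are placeholders for illustration, not Zarch's actual allocation:

Code:

REM Sketch: y*320 row offset - fixed shifts vs a row length held in memory
DIM gap% 64
y=0:offset=1:rowbytes=2:REM register numbers for the assembler
FOR pass%=0 TO 2 STEP 2
P%=gap%
[OPT pass%
 MOV   offset, y, LSL #8           ;offset = y*256
 ADD   offset, offset, y, LSL #6   ;offset = y*256 + y*64 = y*320 (hardcoded form)
 LDR   rowbytes, rowvar            ;variable-width form: load the row length in bytes...
 MUL   offset, y, rowbytes         ;...and multiply, so the width isn't baked in
 MOV   PC, R14
.rowvar
 EQUD  320
]
NEXT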

Draw distance: we need to find the code that sets the start point, alter the number of tiles across and out of the screen that it draws, and alter the ship plotting routines etc. The latter may not be necessary - watching the recording in slow motion, it looks like the triangles are pre-sorted by depth and height, so the ships get plotted as it plots the landscape.

ARMv7: we need to look at all the functions and figure out which need to preserve flags, altering them accordingly.
steve3000 wrote:...and what's the next step - and what help do you need?

I could do with help on the above points.

The next step for me is to get the replacement triangle/quadrilateral routines tested in BASIC and recoded into ARM, then look at the optimized line fill routines. For the fill routines, having thought about it over the week, I think I'm going to write a function to compile the most optimal code for a given start/end alignment and line length, by having it try different methods and calculating the CPU cycles. Having done some random sample tests earlier, the cutover point at which LDR/mask becomes quicker than STRB's is very erratic and not something we'd want to hand code.

Zarchos
Posts: 2355
Joined: Sun May 19, 2013 8:19 am
Location: FRANCE

Re: Hacker needed ... for Zarch ;-)

Postby Zarchos » Thu Nov 12, 2015 11:37 pm

sirbod wrote:The next step for me is to get the replacement triangle/quadrilateral routines tested in BASIC and recoded into ARM, then look at the optimized line fill routines. For the fill routines, having thought about it over the week, I think I'm going to write a function to compile the most optimal code for a given start/end alignment and line length, by having it try different methods and calculating the CPU cycles. Having done some random sample tests earlier, the cutover point at which LDR/mask becomes quicker than STRB's is very erratic and not something we'd want to hand code.


Or read what I've just sent you (for ARM2 and ARM250 Archies).
I hope you won't laugh.
:? :D

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Fri Nov 13, 2015 7:34 am

Zarchos wrote:Or read what I've just sent you (for ARM2 and ARM250 Archies).

Lord above, you didn't hand code all that did you? :shock:

What I quickly coded up yesterday is very similar, although I built the individual routines programmatically with a function at compile time, instead of actually creating code. The Quad-word aligned fill is an interesting idea - what does it get us back? Is it 1S for every quad-aligned write, and have you worked out the cost of pre-aligning over simply using STM with additional registers? I suspect in some cases (based on my test yesterday around LDR/mask) that it may be quicker not to use quad-aligned fills and simply add an additional register or two to the final STM.

As I alluded to in my post yesterday, I'm going to programmatically select the fastest fill for any given line length and misalignment, based on the code working out the exact N/S/I cost of each fill routine. This has several advantages: first, the source code will be tiny and easily distributable as a patch; secondly, it will be as fast as it could possibly be for the specific requirements. I'm keeping the max STM length to 4 registers to avoid the overhead of STM/LDM around the fill, so I could take advantage of quad-aligned fills where appropriate.

I've now analysed fills that fall into quad-word aligned writes, based on their initial misalignment. Out of 687106 total line fills, 327281 (47%) span at least one quad-aligned block and could benefit from quad-aligned STM's, so there could well be gains to be had here:
[Attachment: ZARCH_QUAD_ALIGNED_FILLS_BY_INITIAL_MISALIGNMENT.png - line fills that fill over 16 pixels after alignment to a quad-word boundary, by initial misalignment]

I'm curious now as to the gains to be had, so will finish coding it up and post here. I won't be able to test it in Zarch yet, as I need to get the quadrilateral routine in ARM first, but at least it will then be done and dusted.

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Fri Nov 13, 2015 10:09 am

EDIT (15-11-15 @ 06:15): Having now double-checked the code, I spotted a typo which was skewing the bias towards the non-Quad fill code. Revised source and table of preferred methods below. Next, I'll add in code to compare STRB and LDR/mask on the initial and ending word misalignments. I'm not sure I'll be able to use it in the final code, as I don't think I can spare the extra register it requires, but someone may find it useful in the future.

EDIT2 (16-11-15 @ 04:20): Code now updated to try LDR/mask and STRB's at the start and end of the line fill

EDIT3 (16-11-15 @ 17:45): Added rotated LDR's for +2 offsets and fixed a miscalculation of the 1 cycle non-Quad alignment penalty

If someone could double-check and confirm I'm working out the timings correctly, that would be very helpful. I did this in a bit of a rush and without any reference material, so I suspect I may have made the odd mistake :oops:

Code:

DIM code 128*1024
DEBUG%      =TRUE:REM Displays preferred method if TRUE
ROTATED_LDR%=TRUE:REM Allow the use of rotated LDR's

REM registers
col1=8:col2=9:col3=10:col4=11:REM must be consecutive registers
addr1=5:count=12

FOR A%=0 TO 2 STEP 2
P%=code
  FOR fill_length%=1 TO 63
    FOR initial_alignment%=0 TO 15
      [OPT A%:FNoptimized_line_fill(fill_length%, initial_alignment%):]
    NEXT
  NEXT
NEXT
END



DEF FNoptimized_line_fill(oN%, oA%)
 REM Code must fit in 64 bytes
 alignP%=P% + 64
 type%=0

 IF DEBUG% THEN PRINT ';oN%;",";oA%;": ";

 REM Work out fastest fill method
 OP%=P%
 cost%=1000000
 IF oN%>2 THEN
   FOR quad_align%=0 TO 1
     FOR start_align%=0 TO 1
       FOR end_align%=0 TO 1
         P%=OP%
         IF quad_align%=0 THEN
           new_cost%=FNnon_quad_aligned(oN%,oA%,start_align%,end_align%)
         ELSE
           new_cost%=FNquad_aligned(oN%,oA%,start_align%,end_align%)
         ENDIF
         IF DEBUG% THEN
           IF start_align%=0 THEN PRINT ;"STRB, "; ELSE PRINT "LDR/mask, ";
           IF quad_align%=0  THEN PRINT ;"non-Quad, "; ELSE PRINT ;"Quad, ";
           IF end_align%=0   THEN PRINT ;"STRB"; ELSE PRINT "LDR/mask";
           PRINT ;" (";new_cost%;")"'"    ";
         ENDIF
         IF new_cost%<cost% THEN cost%=new_cost%:type%=(start_align%<<2)+(quad_align%<<1)+end_align%
       NEXT
     NEXT
   NEXT
 ENDIF

 P%=OP%
 IF (type% AND %010)=0 THEN
   cost%=FNnon_quad_aligned(oN%,oA%,type% AND %100,type% AND %001)
 ELSE
   cost%=FNquad_aligned(oN%,oA%,type% AND %100,type% AND %001)
 ENDIF

 IF DEBUG% THEN
   PRINT '"    Quickest: ";
   IF (type% AND %100)=0 THEN PRINT ;"STRB, "; ELSE PRINT "LDR/mask, ";
   IF (type% AND %010)=0 THEN PRINT ;"non-Quad, "; ELSE PRINT ;"Quad, ";
   IF (type% AND %001)=0 THEN PRINT ;"STRB"; ELSE PRINT "LDR/mask";
   PRINT ;" (";cost%;")"
 ENDIF

 [OPT A%:MOV PC,R14:]
 IF P%>alignP% THEN PRINT ;"  CODE OVERRUN"

 REM Pad to the next fill block
 WHILE P%<alignP%:[OPT A%:DCD 0:]:ENDWHILE

 REM IF (oN%-oA%)>15 REPEATUNTILINKEY(-99):REPEATUNTILNOTINKEY(-99)
=A%




DEF FNintro_STRB(Nll%, All%)
 Ncyclel%=0

 REM Word align start via STRB
 WHILE (All% AND %11)>0 AND Nll%>0
   [OPT A%:STRB col1, [addr1], #1:]:All%+=1:Nll%-=1:Ncyclel%+=2
 ENDWHILE
=Ncyclel%*250




DEF FNoutro_STRB(Nll%)
 Ncyclel%=0

 REM Last few pixel via STRB
 WHILE Nll%>0
   [OPT A%:STRB col1, [addr1], #1:]:Nll%-=1:Ncyclel%+=2
 ENDWHILE
=Ncyclel%*250




DEF FNintro_LDR_mask(Nl%,Al%)
 Ncyclel%=0
 Scyclel%=0
 Icyclel%=0

 CASE Al% OF
   WHEN 3
     [OPT A%
      LDR  col1, [addr1, #-3]!              ;1N + 1S + 1I
      BIC  col1, col1, #&FF << 24           ;     1S
      ORR  col1, col1, col2, LSL #24        ;     1S
     ]:Ncyclel%=1:Scyclel%=3:Icyclel%=1
   WHEN 2
     IF ROTATED_LDR% THEN
       [OPT A%
        LDR  col1, [addr1], #-2             ;1N + 1S + 1I   (using rotated load)
        MOV  col1, col1, LSR #16            ;     1S
        ORR  col1, col1, temp_col, LSL #16  ;     1S
       ]:Ncyclel%=1:Scyclel%=3:Icyclel%=1
     ELSE
       [OPT A%
        LDR  col1, [addr1, #-2]!            ;1N + 1S + 1I
        MOV  col1, col1, LSL #16            ;     1S
        ORR  col1, col1, col2, LSR #16      ;     1S
        MOV  col1, col1, ROR #16            ;     1S
       ]:Ncyclel%=1:Scyclel%=4:Icyclel%=1
     ENDIF
   WHEN 1
     [OPT A%
      LDRB col1, [addr1, #-1]!              ;1N + 1S + 1I
      ORR  col1, col1, col2, LSL #8         ;     1S
     ]:Ncyclel%=1:Scyclel%=2:Icyclel%=1
 ENDCASE
=(Ncyclel%*250)+(Scyclel%*125)+(Icyclel%*125)




DEF FNoutro_LDR_mask(Nl%)
 Ncyclel%=0
 Scyclel%=0
 Icyclel%=0

 REM Which register should we use for the last few pixels?
 S%=(Nl%>>2)+col1
 IF R% AND S%=col1 THEN S%+=1
 temp_col=S%+1:IF temp_col>col4 THEN temp_col=col3

 CASE (Nl% AND %11) OF
   WHEN 3
     [OPT A%
      LDRB S%, [addr1, #Nl%]           ;1N + 1S + 1I
      MOV  S%, S%, LSL #24             ;     1S
      ORR  S%, S%, temp_col, LSR #8    ;     1S
     ]:Ncyclel%=1:Scyclel%=3:Icyclel%=1
   WHEN 2
     IF ROTATED_LDR% THEN
       [OPT A%
        LDR  S%, [addr1, #Nl%]          ;1N + 1S + 1I   (using rotated load)
        MOV  S%, S%, LSL #16            ;     1S
        ORR  S%, S%, temp_col, LSR #16  ;     1S
       ]:Ncyclel%=1:Scyclel%=3:Icyclel%=1
     ELSE
       [OPT A%
        LDR  col1, [addr1, #-2]!        ;1N + 1S + 1I
        MOV  col1, col1, LSL #16        ;     1S
        ORR  col1, col1, col2, LSR #16  ;     1S
        MOV  col1, col1, ROR #16        ;     1S
       ]:Ncyclel%=1:Scyclel%=4:Icyclel%=1
     ENDIF
   WHEN 1
     [OPT A%
      LDR  S%, [addr1, #Nl%-3]          ;1N + 1S + 1I
      BIC  S%, S%, #&FF                 ;     1S
      ORR  S%, S%, temp_col, LSR #24    ;     1S
     ]:Ncyclel%=1:Scyclel%=3:Icyclel%=1
 ENDCASE

 REM If length is below 4 pixels write immediately, otherwise it's combined into the STM's later
 IF Nl%>0 AND Nl%<4 THEN
   [OPT A%:STR S%, [addr1, #0]:]:Ncyclel%+=2
 ENDIF

=(Ncyclel%*250)+(Scyclel%*125)+(Icyclel%*125)




DEF FNnon_quad_aligned(Nl%, Al%, start_alignl%, end_alignl%)
 Scycle%=0
 Ncycle%=0
 cost1%=0
 cost2%=0
 S%=0:REM register used for last few pixels

 REM Work out fastest way to word align the start
 R%=FALSE:REM TRUE if a colour register is used for first few pixels
 IF (Al% AND %11)>0 THEN
   IF start_alignl%=0 THEN
     cost1%=FNintro_STRB(Nl%,Al% AND %11)
     IF cost1%>0 THEN Nl%-=4-(Al% AND %11):Al%=(Al% + 3) AND %1100
   ELSE
     IF Nl%>2 THEN
       cost1%=FNintro_LDR_mask(Nl%,Al% AND %11)
       IF cost1%>0 THEN Nl%+=Al% AND %11:Al%=Al% AND %1100:R%=TRUE
     ELSE
       cost1%=10000
     ENDIF
   ENDIF
 ENDIF

 REM Can we combine the end pixels early?
 IF cost2%=0 AND end_alignl% AND Nl%>2 AND Nl%<16 THEN
   cost2%=FNoutro_LDR_mask(Nl%)

   REM Do we need to write an extra word?
   IF cost2%>0 THEN Nl%+=4
 ENDIF

 REM Word aligned STM's
 WHILE Nl%>=16
   [OPT A%:STMIA (addr1)!,{col1,col2,col3,col4}:]:Nl%-=16:Ncycle%+=2:Scycle%+=4-1        :IF (Al% AND %1111)>0 THEN Scycle%+=1
   IF R% AND S%=col2  THEN [OPT A%:MOV col1, col3:]:R%=FALSE:Scycle%+=1
   IF R% AND S%<>col2 THEN [OPT A%:MOV col1, col2:]:R%=FALSE:Scycle%+=1
   REM Can we combine the end pixels early?
   IF cost2%=0 AND end_alignl% AND Nl%>2 AND Nl%<16 THEN
     cost2%=FNoutro_LDR_mask(Nl%)

     REM Do we need to write an extra word?
     IF cost2%>0 THEN Nl%+=4
   ENDIF
 ENDWHILE

 REM Can we combine the end pixels early?
 IF cost2%=0 AND end_alignl% AND Nl%>2 AND Nl%<16 THEN
   cost2%=FNoutro_LDR_mask(Nl%)

   REM Do we need to write an extra word?
   IF cost2%>0 THEN Nl%+=4
 ENDIF
 IF Nl%>=12 THEN
   [OPT A%:STMIA (addr1)!,{col1,col2,col3}:]     :Nl%-=12:Ncycle%+=2:Scycle%+=3-1:Al%+=12:IF (Al% AND %1111)>0 THEN Scycle%+=1
   IF R% AND S%=col2  THEN [OPT A%:MOV col1, col3:]:R%=FALSE:Scycle%+=1
   IF R% AND S%<>col2 THEN [OPT A%:MOV col1, col2:]:R%=FALSE:Scycle%+=1
 ENDIF

 REM Can we combine the end pixels early?
 IF cost2%=0 AND end_alignl% AND Nl%>2 AND Nl%<16 THEN
   cost2%=FNoutro_LDR_mask(Nl%)

   REM Do we need to write an extra word?
   IF cost2%>0 THEN Nl%+=4
 ENDIF
 IF Nl%>=8 THEN
   [OPT A%:STMIA (addr1)!,{col1,col2}:]          :Nl%-=8 :Ncycle%+=2:Scycle%+=2-1:Al%+=8 :IF (Al%  AND %1111)>0 THEN Scycle%+=1
   IF R% AND S%=col2  THEN [OPT A%:MOV col1, col3:]:R%=FALSE:Scycle%+=1
   IF R% AND S%<>col2 THEN [OPT A%:MOV col1, col2:]:R%=FALSE:Scycle%+=1
 ENDIF

 IF Nl%>=4 THEN
   [OPT A%:STR col1, [addr1], #4:]               :Nl%-=4 :Ncycle%+=2:            :Al%+=4
   IF R% AND S%=col2  THEN [OPT A%:MOV col1, col3:]:R%=FALSE:Scycle%+=1
   IF R% AND S%<>col2 THEN [OPT A%:MOV col1, col2:]:R%=FALSE:Scycle%+=1
 ENDIF

 REM Have we already covered the final few pixels?
 IF S%>0 THEN Nl%=0

 REM Work out fastest way to fill final pixels
 IF Nl%>0 THEN
   IF end_alignl%=0 THEN cost2%=FNoutro_STRB(Nl%) ELSE cost2%=FNoutro_LDR_mask(Nl%)
 ENDIF

 REM Reset colour register used for final pixels
 IF S%>0 THEN
   IF S%=col1 THEN [OPT A%:MOV col1, col2:] ELSE [OPT A%:MOV S%, col1:]
   Scycle%+=1
 ENDIF

=(Ncycle%*250)+(Scycle%*125)+cost1%+cost2%





DEF FNquad_aligned(Nl%, Al%, start_alignl%, end_alignl%)
 Scycle%=0
 Ncycle%=0
 cost1%=0
 cost2%=0
 S%=0:REM register used for last few pixels

 REM Work out fastest way to word align the start
 R%=FALSE:REM TRUE if a colour register is used for first few pixels
 IF (Al% AND %11)>0 THEN
   IF start_alignl%=0 THEN
     cost1%=FNintro_STRB(Nl%,Al% AND %11)
     IF cost1%>0 THEN Nl%-=4-(Al% AND %11):Al%=(Al% + 3) AND %1100
   ELSE
     IF Nl%>2 THEN
       cost1%=FNintro_LDR_mask(Nl%,Al% AND %11)
       IF cost1%>0 THEN Nl%+=Al% AND %11:Al%=Al% AND %1100:R%=TRUE
     ELSE
       cost1%=10000
     ENDIF
   ENDIF
 ENDIF

 REM Can we combine the end pixels early?
 IF end_alignl% AND Nl%>2 AND Nl%<16 THEN
   lP%=P%
   cost2%=FNoutro_LDR_mask(Nl%)

   REM Do we need to write an extra word?  And does it push us over the Quad alignment?
   IF cost2%>0 THEN
     CASE (Al% AND %1111) OF
       WHEN 0
         Nl%+=4
       WHEN 4
         IF S%<=col3 THEN Nl%+=4 ELSE cost2%=0:P%=lP%:S%=0
       WHEN 8
         IF S%<=col2 THEN Nl%+=4 ELSE cost2%=0:P%=lP%:S%=0
       WHEN 12
         cost2%=0:P%=lP%:S%=0
     ENDCASE
   ENDIF
 ENDIF

 REM Quad-align start
 IF Nl%>16 AND (Al% AND %1111)>0 THEN
   CASE (Al% AND %1111) OF
     WHEN 4
       [OPT A%:STMIA (addr1)!,{col1,col2,col3}:]     :Nl%-=12:Ncycle%+=2:Scycle%+=3-1:Al%+=12:IF (Al% AND %1111)>0 THEN Scycle%+=1
     WHEN 8
       [OPT A%:STMIA (addr1)!,{col1,col2}:]          :Nl%-=8 :Ncycle%+=2:Scycle%+=2-1:Al%+=8 :IF (Al% AND %1111)>0 THEN Scycle%+=1
     WHEN 12
       [OPT A%:STR col1, [addr1], #4:]               :Nl%-=4 :Ncycle%+=2             :Al%+=4
   ENDCASE
   IF R% AND S%=col2  THEN [OPT A%:MOV col1, col3:]:R%=FALSE:Scycle%+=1
   IF R% AND S%<>col2 THEN [OPT A%:MOV col1, col2:]:R%=FALSE:Scycle%+=1
 ENDIF

 REM Can we combine the end pixels early?
 IF cost2%=0 AND end_alignl% AND Nl%>2 AND Nl%<16 THEN
   cost2%=FNoutro_LDR_mask(Nl%)

   REM Do we need to write an extra word?
   IF cost2%>0 THEN Nl%+=4
 ENDIF

 REM Quad-aligned STM's
 WHILE Nl%>=16
   [OPT A%:STMIA (addr1)!,{col1,col2,col3,col4}:]:Nl%-=16:Ncycle%+=2:Scycle%+=4-1:IF (Al% AND %1111)>0 THEN Scycle%+=1
   IF R% AND S%=col2  THEN [OPT A%:MOV col1, col3:]:R%=FALSE:Scycle%+=1
   IF R% AND S%<>col2 THEN [OPT A%:MOV col1, col2:]:R%=FALSE:Scycle%+=1
   REM Can we combine the end pixels early?
   IF cost2%=0 AND end_alignl% AND Nl%>2 AND Nl%<16 THEN
     cost2%=FNoutro_LDR_mask(Nl%)

     REM Do we need to write an extra word?
     IF cost2%>0 THEN Nl%+=4
   ENDIF
 ENDWHILE

 REM Can we combine the end pixels early?
 IF cost2%=0 AND end_alignl% AND Nl%>2 AND Nl%<16 THEN
   cost2%=FNoutro_LDR_mask(Nl%)

   REM Do we need to write an extra word?
   IF cost2%>0 THEN Nl%+=4
 ENDIF
 IF Nl%>=12 THEN
   [OPT A%:STMIA (addr1)!,{col1,col2,col3}:]     :Nl%-=12:Ncycle%+=2:Scycle%+=3-1:Al%+=12:IF (Al% AND %1111)>0 THEN Scycle%+=1
   IF R% AND S%=col2  THEN [OPT A%:MOV col1, col3:]:R%=FALSE:Scycle%+=1
   IF R% AND S%<>col2 THEN [OPT A%:MOV col1, col2:]:R%=FALSE:Scycle%+=1
 ENDIF

 REM Can we combine the end pixels early?
 IF cost2%=0 AND end_alignl% AND Nl%>2 AND Nl%<16 THEN
   cost2%=FNoutro_LDR_mask(Nl%)

   REM Do we need to write an extra word?
   IF cost2%>0 THEN Nl%+=4
 ENDIF
 IF Nl%>=8 THEN
   [OPT A%:STMIA (addr1)!,{col1,col2}:]          :Nl%-=8 :Ncycle%+=2:Scycle%+=2-1:Al%+=8 :IF (Al%  AND %1111)>0 THEN Scycle%+=1
   IF R% AND S%=col2  THEN [OPT A%:MOV col1, col3:]:R%=FALSE:Scycle%+=1
   IF R% AND S%<>col2 THEN [OPT A%:MOV col1, col2:]:R%=FALSE:Scycle%+=1
 ENDIF

 IF Nl%>=4 THEN
   [OPT A%:STR col1, [addr1], #4:]               :Nl%-=4 :Ncycle%+=2:            :Al%+=4
   IF R% AND S%=col2  THEN [OPT A%:MOV col1, col3:]:R%=FALSE:Scycle%+=1
   IF R% AND S%<>col2 THEN [OPT A%:MOV col1, col2:]:R%=FALSE:Scycle%+=1
 ENDIF

 REM Have we already covered the final few pixels?
 IF S%>0 THEN Nl%=0

 REM Work out fastest way to fill final pixels
 IF Nl%>0 THEN
   IF end_alignl%=0 THEN cost2%=FNoutro_STRB(Nl%) ELSE cost2%=FNoutro_LDR_mask(Nl%)
 ENDIF

 REM Reset colour register used for final pixels
 IF S%>0 THEN
   IF S%=col1 THEN [OPT A%:MOV col1, col2:] ELSE [OPT A%:MOV S%, col1:]
   Scycle%+=1
 ENDIF

=(Ncycle%*250)+(Scycle%*125)+cost1%+cost2%


Here's the abridged output from the code, showing the preferred method for each line length and offset within the Quad word boundary. If you run the code it will display the speeds of every variation it tries.

Values are "<line fill length>,<initial misalignment>: <start align method>, <main fill method>, <trailing pixel method> (time in nS)":

Code:

1,0: STRB, non-Quad, STRB (500)
1,1: STRB, non-Quad, STRB (500)
1,2: STRB, non-Quad, STRB (500)
1,3: STRB, non-Quad, STRB (500)
1,4: STRB, non-Quad, STRB (500)
1,5: STRB, non-Quad, STRB (500)
1,6: STRB, non-Quad, STRB (500)
1,7: STRB, non-Quad, STRB (500)
1,8: STRB, non-Quad, STRB (500)
1,9: STRB, non-Quad, STRB (500)
1,10: STRB, non-Quad, STRB (500)
1,11: STRB, non-Quad, STRB (500)
1,12: STRB, non-Quad, STRB (500)
1,13: STRB, non-Quad, STRB (500)
1,14: STRB, non-Quad, STRB (500)
1,15: STRB, non-Quad, STRB (500)
2,0: STRB, non-Quad, STRB (1000)
2,1: STRB, non-Quad, STRB (1000)
2,2: STRB, non-Quad, STRB (1000)
2,3: STRB, non-Quad, STRB (1000)
2,4: STRB, non-Quad, STRB (1000)
2,5: STRB, non-Quad, STRB (1000)
2,6: STRB, non-Quad, STRB (1000)
2,7: STRB, non-Quad, STRB (1000)
2,8: STRB, non-Quad, STRB (1000)
2,9: STRB, non-Quad, STRB (1000)
2,10: STRB, non-Quad, STRB (1000)
2,11: STRB, non-Quad, STRB (1000)
2,12: STRB, non-Quad, STRB (1000)
2,13: STRB, non-Quad, STRB (1000)
2,14: STRB, non-Quad, STRB (1000)
2,15: STRB, non-Quad, STRB (1000)
3,0: STRB, non-Quad, STRB (1500)
3,1: LDR/mask, non-Quad, STRB (1250)
3,2: STRB, non-Quad, STRB (1500)
3,3: STRB, non-Quad, STRB (1500)
3,4: STRB, non-Quad, STRB (1500)
3,5: LDR/mask, non-Quad, STRB (1250)
3,6: STRB, non-Quad, STRB (1500)
3,7: STRB, non-Quad, STRB (1500)
3,8: STRB, non-Quad, STRB (1500)
3,9: LDR/mask, non-Quad, STRB (1250)
3,10: STRB, non-Quad, STRB (1500)
3,11: STRB, non-Quad, STRB (1500)
3,12: STRB, non-Quad, STRB (1500)
3,13: LDR/mask, non-Quad, STRB (1250)
3,14: STRB, non-Quad, STRB (1500)
3,15: STRB, non-Quad, STRB (1500)
4,0: STRB, non-Quad, STRB (500)
4,1: LDR/mask, non-Quad, STRB (1750)
4,2: STRB, non-Quad, STRB (2000)
4,3: STRB, non-Quad, STRB (2000)
4,4: STRB, non-Quad, STRB (500)
4,5: LDR/mask, non-Quad, STRB (1750)
4,6: STRB, non-Quad, STRB (2000)
4,7: STRB, non-Quad, STRB (2000)
4,8: STRB, non-Quad, STRB (500)
4,9: LDR/mask, non-Quad, STRB (1750)
4,10: STRB, non-Quad, STRB (2000)
4,11: STRB, non-Quad, STRB (2000)
4,12: STRB, non-Quad, STRB (500)
4,13: LDR/mask, non-Quad, STRB (1750)
4,14: STRB, non-Quad, STRB (2000)
4,15: STRB, non-Quad, STRB (2000)
5,0: STRB, non-Quad, STRB (1000)
5,1: LDR/mask, non-Quad, STRB (2250)
5,2: STRB, non-Quad, STRB (2500)
5,3: STRB, non-Quad, STRB (1000)
5,4: STRB, non-Quad, STRB (1000)
5,5: LDR/mask, non-Quad, STRB (2250)
5,6: STRB, non-Quad, STRB (2500)
5,7: STRB, non-Quad, STRB (1000)
5,8: STRB, non-Quad, STRB (1000)
5,9: LDR/mask, non-Quad, STRB (2250)
5,10: STRB, non-Quad, STRB (2500)
5,11: STRB, non-Quad, STRB (1000)
5,12: STRB, non-Quad, STRB (1000)
5,13: LDR/mask, non-Quad, STRB (2250)
5,14: STRB, non-Quad, STRB (2500)
5,15: STRB, non-Quad, STRB (1000)
6,0: STRB, non-Quad, STRB (1500)
6,1: LDR/mask, non-Quad, LDR/mask (2250)
6,2: STRB, non-Quad, STRB (1500)
6,3: STRB, non-Quad, STRB (1500)
6,4: STRB, non-Quad, STRB (1500)
6,5: LDR/mask, non-Quad, LDR/mask (2375)
6,6: STRB, non-Quad, STRB (1500)
6,7: STRB, non-Quad, STRB (1500)
6,8: STRB, non-Quad, STRB (1500)
6,9: LDR/mask, non-Quad, LDR/mask (2375)
6,10: STRB, non-Quad, STRB (1500)
6,11: STRB, non-Quad, STRB (1500)
6,12: STRB, non-Quad, STRB (1500)
6,13: LDR/mask, non-Quad, LDR/mask (2375)
6,14: STRB, non-Quad, STRB (1500)
6,15: STRB, non-Quad, STRB (1500)
7,0: STRB, non-Quad, LDR/mask (1500)
7,1: LDR/mask, non-Quad, STRB (1375)
7,2: STRB, non-Quad, STRB (2000)
7,3: STRB, non-Quad, STRB (2000)
7,4: STRB, non-Quad, LDR/mask (1625)
7,5: LDR/mask, non-Quad, STRB (1500)
7,6: STRB, non-Quad, STRB (2000)
7,7: STRB, non-Quad, STRB (2000)
7,8: STRB, non-Quad, LDR/mask (1625)
7,9: LDR/mask, non-Quad, STRB (1500)
7,10: STRB, non-Quad, STRB (2000)
7,11: STRB, non-Quad, STRB (2000)
7,12: STRB, non-Quad, LDR/mask (1625)
7,13: LDR/mask, non-Quad, STRB (1500)
7,14: STRB, non-Quad, STRB (2000)
7,15: STRB, non-Quad, STRB (2000)
8,0: STRB, non-Quad, STRB (625)
8,1: LDR/mask, non-Quad, STRB (1875)
8,2: STRB, non-Quad, STRB (2500)
8,3: STRB, non-Quad, LDR/mask (2125)
8,4: STRB, non-Quad, STRB (750)
8,5: LDR/mask, non-Quad, STRB (2000)
8,6: STRB, non-Quad, STRB (2500)
8,7: STRB, non-Quad, LDR/mask (2125)
8,8: STRB, non-Quad, STRB (750)
8,9: LDR/mask, non-Quad, STRB (2000)
8,10: STRB, non-Quad, STRB (2500)
8,11: STRB, non-Quad, LDR/mask (2125)
8,12: STRB, non-Quad, STRB (750)
8,13: LDR/mask, non-Quad, STRB (2000)
8,14: STRB, non-Quad, STRB (2500)
8,15: STRB, non-Quad, LDR/mask (2000)
9,0: STRB, non-Quad, STRB (1125)
9,1: LDR/mask, non-Quad, STRB (2375)
9,2: STRB, non-Quad, LDR/mask (2625)
9,3: STRB, non-Quad, STRB (1250)
9,4: STRB, non-Quad, STRB (1250)
9,5: LDR/mask, non-Quad, STRB (2500)
9,6: STRB, non-Quad, LDR/mask (2625)
9,7: STRB, non-Quad, STRB (1250)
9,8: STRB, non-Quad, STRB (1250)
9,9: LDR/mask, non-Quad, STRB (2500)
9,10: STRB, non-Quad, LDR/mask (2625)
9,11: STRB, non-Quad, STRB (1250)
9,12: STRB, non-Quad, STRB (1250)
9,13: LDR/mask, non-Quad, STRB (2500)
9,14: STRB, non-Quad, LDR/mask (2500)
9,15: STRB, non-Quad, STRB (1125)
10,0: STRB, non-Quad, STRB (1625)
10,1: LDR/mask, non-Quad, LDR/mask (2375)
10,2: STRB, non-Quad, STRB (1750)
10,3: STRB, non-Quad, STRB (1750)
10,4: STRB, non-Quad, STRB (1750)
10,5: LDR/mask, non-Quad, LDR/mask (2500)
10,6: STRB, non-Quad, STRB (1750)
10,7: STRB, non-Quad, STRB (1750)
10,8: STRB, non-Quad, STRB (1750)
10,9: LDR/mask, non-Quad, LDR/mask (2500)
10,10: STRB, non-Quad, STRB (1750)
10,11: STRB, non-Quad, STRB (1750)
10,12: STRB, non-Quad, STRB (1750)
10,13: LDR/mask, non-Quad, LDR/mask (2500)
10,14: STRB, non-Quad, STRB (1625)
10,15: STRB, non-Quad, STRB (1625)
11,0: STRB, non-Quad, LDR/mask (1625)
11,1: LDR/mask, non-Quad, STRB (1500)
11,2: STRB, non-Quad, STRB (2250)
11,3: STRB, non-Quad, STRB (2250)
11,4: STRB, non-Quad, LDR/mask (1750)
11,5: LDR/mask, non-Quad, STRB (1625)
11,6: STRB, non-Quad, STRB (2250)
11,7: STRB, non-Quad, STRB (2250)
11,8: STRB, non-Quad, LDR/mask (1750)
11,9: LDR/mask, non-Quad, STRB (1625)
11,10: STRB, non-Quad, STRB (2250)
11,11: STRB, non-Quad, STRB (2250)
11,12: STRB, non-Quad, LDR/mask (1750)
11,13: LDR/mask, non-Quad, STRB (1625)
11,14: STRB, non-Quad, STRB (2125)
11,15: STRB, non-Quad, STRB (2125)
12,0: STRB, non-Quad, STRB (750)
12,1: LDR/mask, non-Quad, STRB (2000)
12,2: STRB, non-Quad, STRB (2750)
12,3: STRB, non-Quad, LDR/mask (2250)
12,4: STRB, non-Quad, STRB (875)
12,5: LDR/mask, non-Quad, STRB (2125)
12,6: STRB, non-Quad, STRB (2750)
12,7: STRB, non-Quad, LDR/mask (2250)
12,8: STRB, non-Quad, STRB (875)
12,9: LDR/mask, non-Quad, STRB (2125)
12,10: STRB, non-Quad, STRB (2750)
12,11: STRB, non-Quad, LDR/mask (2250)
12,12: STRB, non-Quad, STRB (875)
12,13: LDR/mask, non-Quad, STRB (2125)
12,14: STRB, non-Quad, STRB (2625)
12,15: STRB, non-Quad, LDR/mask (2125)
13,0: STRB, non-Quad, STRB (1250)
13,1: LDR/mask, non-Quad, STRB (2500)
13,2: STRB, non-Quad, LDR/mask (2750)
13,3: STRB, non-Quad, STRB (1375)
13,4: STRB, non-Quad, STRB (1375)
13,5: LDR/mask, non-Quad, STRB (2625)
13,6: STRB, non-Quad, LDR/mask (2750)
13,7: STRB, non-Quad, STRB (1375)
13,8: STRB, non-Quad, STRB (1375)
13,9: LDR/mask, non-Quad, STRB (2625)
13,10: STRB, non-Quad, LDR/mask (2750)
13,11: STRB, non-Quad, STRB (1375)
13,12: STRB, non-Quad, STRB (1375)
13,13: LDR/mask, non-Quad, STRB (2625)
13,14: STRB, non-Quad, LDR/mask (2625)
13,15: STRB, non-Quad, STRB (1250)
14,0: STRB, non-Quad, STRB (1750)
14,1: LDR/mask, Quad, LDR/mask (2500)
14,2: STRB, non-Quad, STRB (1875)
14,3: STRB, non-Quad, STRB (1875)
14,4: STRB, non-Quad, STRB (1875)
14,5: LDR/mask, Quad, LDR/mask (2625)
14,6: STRB, non-Quad, STRB (1875)
14,7: STRB, non-Quad, STRB (1875)
14,8: STRB, non-Quad, STRB (1875)
14,9: LDR/mask, Quad, LDR/mask (2625)
14,10: STRB, non-Quad, STRB (1875)
14,11: STRB, non-Quad, STRB (1875)
14,12: STRB, non-Quad, STRB (1875)
14,13: LDR/mask, Quad, LDR/mask (2625)
14,14: STRB, non-Quad, STRB (1750)
14,15: STRB, non-Quad, STRB (1750)
15,0: STRB, Quad, LDR/mask (1750)
15,1: LDR/mask, non-Quad, STRB (1625)
15,2: STRB, non-Quad, STRB (2375)
15,3: STRB, non-Quad, STRB (2375)
15,4: STRB, Quad, LDR/mask (1875)
15,5: LDR/mask, non-Quad, STRB (1750)
15,6: STRB, non-Quad, STRB (2375)
15,7: STRB, non-Quad, STRB (2375)
15,8: STRB, Quad, LDR/mask (1875)
15,9: LDR/mask, non-Quad, STRB (1750)
15,10: STRB, non-Quad, STRB (2375)
15,11: STRB, non-Quad, STRB (2375)
15,12: STRB, Quad, LDR/mask (1875)
15,13: LDR/mask, non-Quad, STRB (1750)
15,14: STRB, non-Quad, STRB (2250)
15,15: STRB, non-Quad, STRB (2250)
16,0: STRB, non-Quad, STRB (875)
16,1: LDR/mask, non-Quad, STRB (2125)
16,2: STRB, non-Quad, STRB (2875)
16,3: STRB, Quad, LDR/mask (2375)
16,4: STRB, non-Quad, STRB (1000)
16,5: LDR/mask, non-Quad, STRB (2250)
16,6: STRB, non-Quad, STRB (2875)
16,7: STRB, Quad, LDR/mask (2375)
16,8: STRB, non-Quad, STRB (1000)
16,9: LDR/mask, non-Quad, STRB (2250)
16,10: STRB, non-Quad, STRB (2875)
16,11: STRB, Quad, LDR/mask (2375)
16,12: STRB, non-Quad, STRB (1000)
16,13: LDR/mask, non-Quad, STRB (2250)
16,14: STRB, non-Quad, STRB (2750)
16,15: STRB, Quad, LDR/mask (2250)
17,0: STRB, non-Quad, STRB (1375)
17,1: LDR/mask, non-Quad, STRB (2625)
17,2: STRB, Quad, LDR/mask (2875)
17,3: STRB, non-Quad, STRB (1500)
17,4: STRB, non-Quad, STRB (1500)
17,5: LDR/mask, non-Quad, STRB (2750)
17,6: STRB, Quad, LDR/mask (2875)
17,7: STRB, non-Quad, STRB (1500)
17,8: STRB, non-Quad, STRB (1500)
17,9: LDR/mask, non-Quad, STRB (2750)
17,10: STRB, Quad, LDR/mask (2875)
17,11: STRB, non-Quad, STRB (1500)
17,12: STRB, non-Quad, STRB (1500)
17,13: LDR/mask, non-Quad, STRB (2750)
17,14: STRB, Quad, LDR/mask (2750)
17,15: STRB, non-Quad, STRB (1375)
18,0: STRB, non-Quad, STRB (1875)
18,1: LDR/mask, non-Quad, STRB (3125)
18,2: STRB, non-Quad, STRB (2000)
18,3: STRB, non-Quad, STRB (2000)
18,4: STRB, non-Quad, STRB (2000)
18,5: LDR/mask, Quad, LDR/mask (3125)
18,6: STRB, non-Quad, STRB (2000)
18,7: STRB, non-Quad, STRB (2000)
18,8: STRB, non-Quad, STRB (2000)
18,9: LDR/mask, Quad, LDR/mask (3125)
18,10: STRB, non-Quad, STRB (2000)
18,11: STRB, non-Quad, STRB (2000)
18,12: STRB, non-Quad, STRB (2000)
18,13: LDR/mask, Quad, LDR/mask (3000)
18,14: STRB, non-Quad, STRB (1875)
18,15: STRB, non-Quad, STRB (1875)
19,0: STRB, non-Quad, STRB (2375)
19,1: LDR/mask, non-Quad, STRB (2125)
19,2: STRB, non-Quad, STRB (2500)
19,3: STRB, non-Quad, STRB (2500)
19,4: STRB, Quad, LDR/mask (2375)
19,5: LDR/mask, non-Quad, STRB (2250)
19,6: STRB, non-Quad, STRB (2500)
19,7: STRB, non-Quad, STRB (2500)
19,8: STRB, Quad, LDR/mask (2375)
19,9: LDR/mask, non-Quad, STRB (2250)
19,10: STRB, non-Quad, STRB (2500)
19,11: STRB, non-Quad, STRB (2500)
19,12: STRB, Quad, LDR/mask (2250)
19,13: LDR/mask, Quad, STRB (2125)
19,14: STRB, non-Quad, STRB (2375)
19,15: STRB, non-Quad, STRB (2375)
20,0: STRB, non-Quad, STRB (1375)
20,1: LDR/mask, non-Quad, STRB (2625)
20,2: STRB, non-Quad, STRB (3000)
20,3: STRB, Quad, LDR/mask (2875)
20,4: STRB, non-Quad, STRB (1500)
20,5: LDR/mask, non-Quad, STRB (2750)
20,6: STRB, non-Quad, STRB (3000)
20,7: STRB, Quad, LDR/mask (2875)
20,8: STRB, non-Quad, STRB (1500)
20,9: LDR/mask, non-Quad, STRB (2750)
20,10: STRB, non-Quad, STRB (3000)
20,11: STRB, Quad, LDR/mask (2750)
20,12: STRB, Quad, STRB (1375)
20,13: LDR/mask, Quad, STRB (2625)
20,14: STRB, non-Quad, STRB (2875)
20,15: STRB, non-Quad, STRB (2875)
21,0: STRB, non-Quad, STRB (1875)
21,1: LDR/mask, non-Quad, STRB (3125)
21,2: LDR/mask, non-Quad, LDR/mask (3375)
21,3: STRB, non-Quad, STRB (2000)
21,4: STRB, non-Quad, STRB (2000)
21,5: LDR/mask, non-Quad, STRB (3250)
21,6: STRB, Quad, LDR/mask (3375)
21,7: STRB, non-Quad, STRB (2000)
21,8: STRB, non-Quad, STRB (2000)
21,9: LDR/mask, non-Quad, STRB (3250)
21,10: STRB, Quad, LDR/mask (3250)
21,11: STRB, Quad, STRB (1875)
21,12: STRB, Quad, STRB (1875)
21,13: LDR/mask, Quad, STRB (3125)
21,14: STRB, non-Quad, STRB (3375)
21,15: STRB, non-Quad, STRB (1875)
22,0: STRB, non-Quad, STRB (2375)
22,1: LDR/mask, non-Quad, LDR/mask (3125)
22,2: STRB, non-Quad, STRB (2500)
22,3: STRB, non-Quad, STRB (2500)
22,4: STRB, non-Quad, STRB (2500)
22,5: LDR/mask, Quad, LDR/mask (3250)
22,6: STRB, non-Quad, STRB (2500)
22,7: STRB, non-Quad, STRB (2500)
22,8: STRB, non-Quad, STRB (2500)
22,9: LDR/mask, Quad, LDR/mask (3250)
22,10: STRB, Quad, STRB (2375)
22,11: STRB, Quad, STRB (2375)
22,12: STRB, Quad, STRB (2375)
22,13: LDR/mask, non-Quad, LDR/mask (3375)
22,14: STRB, non-Quad, STRB (2375)
22,15: STRB, non-Quad, STRB (2375)
23,0: STRB, non-Quad, LDR/mask (2375)
23,1: LDR/mask, non-Quad, STRB (2250)
23,2: STRB, non-Quad, STRB (3000)
23,3: STRB, non-Quad, STRB (3000)
23,4: STRB, Quad, LDR/mask (2500)
23,5: LDR/mask, Quad, STRB (2375)
23,6: STRB, non-Quad, STRB (3000)
23,7: STRB, non-Quad, STRB (3000)
23,8: STRB, Quad, LDR/mask (2500)
23,9: LDR/mask, Quad, STRB (2375)
23,10: STRB, Quad, STRB (2875)
23,11: STRB, Quad, STRB (2875)
23,12: STRB, non-Quad, LDR/mask (2625)
23,13: LDR/mask, non-Quad, STRB (2500)
23,14: STRB, non-Quad, STRB (2875)
23,15: STRB, non-Quad, STRB (2875)
24,0: STRB, non-Quad, STRB (1500)
24,1: LDR/mask, non-Quad, STRB (2750)
24,2: STRB, non-Quad, STRB (3500)
24,3: STRB, Quad, LDR/mask (3000)
24,4: STRB, Quad, STRB (1625)
24,5: LDR/mask, Quad, STRB (2875)
24,6: STRB, non-Quad, STRB (3500)
24,7: STRB, Quad, LDR/mask (3000)
24,8: STRB, Quad, STRB (1625)
24,9: LDR/mask, Quad, STRB (2875)
24,10: STRB, Quad, STRB (3375)
24,11: STRB, non-Quad, LDR/mask (3125)
24,12: STRB, non-Quad, STRB (1750)
24,13: LDR/mask, non-Quad, STRB (3000)
24,14: STRB, non-Quad, STRB (3375)
24,15: STRB, non-Quad, LDR/mask (2875)
25,0: STRB, non-Quad, STRB (2000)
25,1: LDR/mask, non-Quad, STRB (3250)
25,2: LDR/mask, non-Quad, LDR/mask (3500)
25,3: STRB, Quad, STRB (2125)
25,4: STRB, Quad, STRB (2125)
25,5: LDR/mask, Quad, STRB (3375)
25,6: STRB, Quad, LDR/mask (3500)
25,7: STRB, Quad, STRB (2125)
25,8: STRB, Quad, STRB (2125)
25,9: LDR/mask, Quad, STRB (3375)
25,10: STRB, non-Quad, LDR/mask (3625)
25,11: STRB, non-Quad, STRB (2250)
25,12: STRB, non-Quad, STRB (2250)
25,13: LDR/mask, non-Quad, STRB (3500)
25,14: STRB, non-Quad, LDR/mask (3375)
25,15: STRB, non-Quad, STRB (2000)
26,0: STRB, non-Quad, STRB (2500)
26,1: LDR/mask, non-Quad, LDR/mask (3250)
26,2: LDR/mask, non-Quad, STRB (2625)
26,3: STRB, Quad, STRB (2625)
26,4: STRB, Quad, STRB (2625)
26,5: LDR/mask, Quad, LDR/mask (3375)
26,6: STRB, Quad, STRB (2625)
26,7: STRB, Quad, STRB (2625)
26,8: STRB, Quad, STRB (2625)
26,9: LDR/mask, non-Quad, LDR/mask (3500)
26,10: STRB, non-Quad, STRB (2750)
26,11: STRB, non-Quad, STRB (2750)
26,12: STRB, non-Quad, STRB (2750)
26,13: LDR/mask, non-Quad, LDR/mask (3500)
26,14: STRB, non-Quad, STRB (2500)
26,15: STRB, non-Quad, STRB (2500)
27,0: STRB, non-Quad, LDR/mask (2500)
27,1: LDR/mask, non-Quad, STRB (2375)
27,2: LDR/mask, non-Quad, STRB (3125)
27,3: STRB, Quad, STRB (3125)
27,4: STRB, Quad, LDR/mask (2625)
27,5: LDR/mask, Quad, STRB (2500)
27,6: STRB, Quad, STRB (3125)
27,7: STRB, Quad, STRB (3125)
27,8: STRB, non-Quad, LDR/mask (2750)
27,9: LDR/mask, non-Quad, STRB (2625)
27,10: STRB, non-Quad, STRB (3250)
27,11: STRB, non-Quad, STRB (3250)
27,12: STRB, non-Quad, LDR/mask (2750)
27,13: LDR/mask, non-Quad, STRB (2625)
27,14: STRB, non-Quad, STRB (3000)
27,15: STRB, non-Quad, STRB (3000)
28,0: STRB, non-Quad, STRB (1625)
28,1: LDR/mask, non-Quad, STRB (2875)
28,2: LDR/mask, non-Quad, STRB (3625)
28,3: STRB, Quad, LDR/mask (3125)
28,4: STRB, Quad, STRB (1750)
28,5: LDR/mask, Quad, STRB (3000)
28,6: STRB, Quad, STRB (3625)
28,7: STRB, non-Quad, LDR/mask (3250)
28,8: STRB, non-Quad, STRB (1875)
28,9: LDR/mask, non-Quad, STRB (3125)
28,10: STRB, non-Quad, STRB (3750)
28,11: STRB, non-Quad, LDR/mask (3250)
28,12: STRB, non-Quad, STRB (1875)
28,13: LDR/mask, non-Quad, STRB (3125)
28,14: STRB, non-Quad, STRB (3500)
28,15: STRB, non-Quad, LDR/mask (3000)
29,0: STRB, non-Quad, STRB (2125)
29,1: LDR/mask, non-Quad, STRB (3375)
29,2: LDR/mask, non-Quad, LDR/mask (3625)
29,3: STRB, Quad, STRB (2250)
29,4: STRB, Quad, STRB (2250)
29,5: LDR/mask, Quad, STRB (3500)
29,6: STRB, non-Quad, LDR/mask (3750)
29,7: STRB, non-Quad, STRB (2375)
29,8: STRB, non-Quad, STRB (2375)
29,9: LDR/mask, non-Quad, STRB (3625)
29,10: STRB, non-Quad, LDR/mask (3750)
29,11: STRB, non-Quad, STRB (2375)
29,12: STRB, non-Quad, STRB (2375)
29,13: LDR/mask, non-Quad, STRB (3625)
29,14: STRB, non-Quad, LDR/mask (3500)
29,15: STRB, non-Quad, STRB (2125)
30,0: STRB, non-Quad, STRB (2625)
30,1: LDR/mask, non-Quad, LDR/mask (3375)
30,2: LDR/mask, non-Quad, STRB (2750)
30,3: STRB, Quad, STRB (2750)
30,4: STRB, Quad, STRB (2750)
30,5: LDR/mask, non-Quad, LDR/mask (3625)
30,6: STRB, non-Quad, STRB (2875)
30,7: STRB, non-Quad, STRB (2875)
30,8: STRB, non-Quad, STRB (2875)
30,9: LDR/mask, non-Quad, LDR/mask (3625)
30,10: STRB, non-Quad, STRB (2875)
30,11: STRB, non-Quad, STRB (2875)
30,12: STRB, non-Quad, STRB (2875)
30,13: LDR/mask, non-Quad, LDR/mask (3625)
30,14: STRB, non-Quad, STRB (2625)
30,15: STRB, non-Quad, STRB (2625)
31,0: STRB, non-Quad, LDR/mask (2625)
31,1: LDR/mask, non-Quad, STRB (2500)
31,2: LDR/mask, non-Quad, STRB (3250)
31,3: STRB, Quad, STRB (3250)
31,4: STRB, non-Quad, LDR/mask (2875)
31,5: LDR/mask, non-Quad, STRB (2750)
31,6: STRB, non-Quad, STRB (3375)
31,7: STRB, non-Quad, STRB (3375)
31,8: STRB, non-Quad, LDR/mask (2875)
31,9: LDR/mask, non-Quad, STRB (2750)
31,10: STRB, non-Quad, STRB (3375)
31,11: STRB, non-Quad, STRB (3375)
31,12: STRB, non-Quad, LDR/mask (2875)
31,13: LDR/mask, non-Quad, STRB (2750)
31,14: STRB, non-Quad, STRB (3125)
31,15: STRB, non-Quad, STRB (3125)
32,0: STRB, non-Quad, STRB (1750)
32,1: LDR/mask, non-Quad, STRB (3000)
32,2: LDR/mask, non-Quad, STRB (3750)
32,3: STRB, non-Quad, LDR/mask (3375)
32,4: STRB, non-Quad, STRB (2000)
32,5: LDR/mask, non-Quad, STRB (3250)
32,6: STRB, non-Quad, STRB (3875)
32,7: STRB, non-Quad, LDR/mask (3375)
32,8: STRB, non-Quad, STRB (2000)
32,9: LDR/mask, non-Quad, STRB (3250)
32,10: STRB, non-Quad, STRB (3875)
32,11: STRB, non-Quad, LDR/mask (3375)
32,12: STRB, non-Quad, STRB (2000)
32,13: LDR/mask, non-Quad, STRB (3250)
32,14: STRB, non-Quad, STRB (3625)
32,15: STRB, non-Quad, LDR/mask (3125)
33,0: STRB, non-Quad, STRB (2250)
33,1: LDR/mask, non-Quad, STRB (3500)
33,2: STRB, non-Quad, LDR/mask (3875)
33,3: STRB, non-Quad, STRB (2500)
33,4: STRB, non-Quad, STRB (2500)
33,5: LDR/mask, non-Quad, STRB (3750)
33,6: STRB, non-Quad, LDR/mask (3875)
33,7: STRB, non-Quad, STRB (2500)
33,8: STRB, non-Quad, STRB (2500)
33,9: LDR/mask, non-Quad, STRB (3750)
33,10: STRB, non-Quad, LDR/mask (3875)
33,11: STRB, non-Quad, STRB (2500)
33,12: STRB, non-Quad, STRB (2500)
33,13: LDR/mask, non-Quad, STRB (3750)
33,14: STRB, non-Quad, LDR/mask (3625)
33,15: STRB, non-Quad, STRB (2250)
34,0: STRB, non-Quad, STRB (2750)
34,1: LDR/mask, non-Quad, STRB (4000)
34,2: STRB, non-Quad, STRB (3000)
34,3: STRB, non-Quad, STRB (3000)
34,4: STRB, non-Quad, STRB (3000)
34,5: LDR/mask, Quad, LDR/mask (4000)
34,6: STRB, non-Quad, STRB (3000)
34,7: STRB, non-Quad, STRB (3000)
34,8: STRB, non-Quad, STRB (3000)
34,9: LDR/mask, Quad, LDR/mask (4000)
34,10: STRB, non-Quad, STRB (3000)
34,11: STRB, non-Quad, STRB (3000)
34,12: STRB, non-Quad, STRB (3000)
34,13: LDR/mask, Quad, LDR/mask (3875)
34,14: STRB, non-Quad, STRB (2750)
34,15: STRB, non-Quad, STRB (2750)
35,0: STRB, non-Quad, STRB (3250)
35,1: LDR/mask, non-Quad, STRB (3000)
35,2: STRB, non-Quad, STRB (3500)
35,3: STRB, non-Quad, STRB (3500)
35,4: STRB, Quad, LDR/mask (3250)
35,5: LDR/mask, Quad, STRB (3125)
35,6: STRB, non-Quad, STRB (3500)
35,7: STRB, non-Quad, STRB (3500)
35,8: STRB, Quad, LDR/mask (3250)
35,9: LDR/mask, Quad, STRB (3125)
35,10: STRB, non-Quad, STRB (3500)
35,11: STRB, non-Quad, STRB (3500)
35,12: STRB, Quad, LDR/mask (3125)
35,13: LDR/mask, Quad, STRB (3000)
35,14: STRB, non-Quad, STRB (3250)
35,15: STRB, non-Quad, STRB (3250)
36,0: STRB, non-Quad, STRB (2250)
36,1: LDR/mask, non-Quad, STRB (3500)
36,2: STRB, non-Quad, STRB (4000)
36,3: STRB, Quad, LDR/mask (3750)
36,4: STRB, Quad, STRB (2375)
36,5: LDR/mask, Quad, STRB (3625)
36,6: STRB, non-Quad, STRB (4000)
36,7: STRB, Quad, LDR/mask (3750)
36,8: STRB, Quad, STRB (2375)
36,9: LDR/mask, Quad, STRB (3625)
36,10: STRB, non-Quad, STRB (4000)
36,11: STRB, Quad, LDR/mask (3625)
36,12: STRB, Quad, STRB (2250)
36,13: LDR/mask, Quad, STRB (3500)
36,14: STRB, non-Quad, STRB (3750)
36,15: STRB, non-Quad, STRB (3750)
37,0: STRB, non-Quad, STRB (2750)
37,1: LDR/mask, non-Quad, STRB (4000)
37,2: LDR/mask, non-Quad, LDR/mask (4250)
37,3: STRB, Quad, STRB (2875)
37,4: STRB, Quad, STRB (2875)
37,5: LDR/mask, Quad, STRB (4125)
37,6: STRB, Quad, LDR/mask (4250)
37,7: STRB, Quad, STRB (2875)
37,8: STRB, Quad, STRB (2875)
37,9: LDR/mask, Quad, STRB (4125)
37,10: STRB, Quad, LDR/mask (4125)
37,11: STRB, Quad, STRB (2750)
37,12: STRB, Quad, STRB (2750)
37,13: LDR/mask, Quad, STRB (4000)
37,14: STRB, non-Quad, STRB (4250)
37,15: STRB, non-Quad, STRB (2750)
38,0: STRB, non-Quad, STRB (3250)
38,1: LDR/mask, non-Quad, LDR/mask (4000)
38,2: LDR/mask, non-Quad, STRB (3375)
38,3: STRB, Quad, STRB (3375)
38,4: STRB, Quad, STRB (3375)
38,5: LDR/mask, Quad, LDR/mask (4125)
38,6: STRB, Quad, STRB (3375)
38,7: STRB, Quad, STRB (3375)
38,8: STRB, Quad, STRB (3375)
38,9: LDR/mask, Quad, LDR/mask (4125)
38,10: STRB, Quad, STRB (3250)
38,11: STRB, Quad, STRB (3250)
38,12: STRB, Quad, STRB (3250)
38,13: LDR/mask, non-Quad, LDR/mask (4375)
38,14: STRB, non-Quad, STRB (3250)
38,15: STRB, non-Quad, STRB (3250)
39,0: STRB, non-Quad, LDR/mask (3250)
39,1: LDR/mask, non-Quad, STRB (3125)
39,2: LDR/mask, non-Quad, STRB (3875)
39,3: STRB, Quad, STRB (3875)
39,4: STRB, Quad, LDR/mask (3375)
39,5: LDR/mask, Quad, STRB (3250)
39,6: STRB, Quad, STRB (3875)
39,7: STRB, Quad, STRB (3875)
39,8: STRB, Quad, LDR/mask (3375)
39,9: LDR/mask, Quad, STRB (3250)
39,10: STRB, Quad, STRB (3750)
39,11: STRB, Quad, STRB (3750)
39,12: STRB, non-Quad, LDR/mask (3625)
39,13: LDR/mask, non-Quad, STRB (3500)
39,14: STRB, non-Quad, STRB (3750)
39,15: STRB, non-Quad, STRB (3750)
40,0: STRB, non-Quad, STRB (2375)
40,1: LDR/mask, non-Quad, STRB (3625)
40,2: LDR/mask, non-Quad, STRB (4375)
40,3: STRB, Quad, LDR/mask (3875)
40,4: STRB, Quad, STRB (2500)
40,5: LDR/mask, Quad, STRB (3750)
40,6: STRB, Quad, STRB (4375)
40,7: STRB, Quad, LDR/mask (3875)
40,8: STRB, Quad, STRB (2500)
40,9: LDR/mask, Quad, STRB (3750)
40,10: STRB, Quad, STRB (4250)
40,11: STRB, non-Quad, LDR/mask (4125)
40,12: STRB, non-Quad, STRB (2750)
40,13: LDR/mask, non-Quad, STRB (4000)
40,14: STRB, non-Quad, STRB (4250)
40,15: STRB, non-Quad, LDR/mask (3750)
41,0: STRB, non-Quad, STRB (2875)
41,1: LDR/mask, non-Quad, STRB (4125)
41,2: LDR/mask, non-Quad, LDR/mask (4375)
41,3: STRB, Quad, STRB (3000)
41,4: STRB, Quad, STRB (3000)
41,5: LDR/mask, Quad, STRB (4250)
41,6: STRB, Quad, LDR/mask (4375)
41,7: STRB, Quad, STRB (3000)
41,8: STRB, Quad, STRB (3000)
41,9: LDR/mask, Quad, STRB (4250)
41,10: STRB, non-Quad, LDR/mask (4625)
41,11: STRB, non-Quad, STRB (3250)
41,12: STRB, non-Quad, STRB (3250)
41,13: LDR/mask, non-Quad, STRB (4500)
41,14: STRB, non-Quad, LDR/mask (4250)
41,15: STRB, non-Quad, STRB (2875)
42,0: STRB, non-Quad, STRB (3375)
42,1: LDR/mask, non-Quad, LDR/mask (4125)
42,2: LDR/mask, non-Quad, STRB (3500)
42,3: STRB, Quad, STRB (3500)
42,4: STRB, Quad, STRB (3500)
42,5: LDR/mask, Quad, LDR/mask (4250)
42,6: STRB, Quad, STRB (3500)
42,7: STRB, Quad, STRB (3500)
42,8: STRB, Quad, STRB (3500)
42,9: LDR/mask, non-Quad, LDR/mask (4500)
42,10: STRB, non-Quad, STRB (3750)
42,11: STRB, non-Quad, STRB (3750)
42,12: STRB, non-Quad, STRB (3750)
42,13: LDR/mask, non-Quad, LDR/mask (4500)
42,14: STRB, non-Quad, STRB (3375)
42,15: STRB, non-Quad, STRB (3375)
43,0: STRB, non-Quad, LDR/mask (3375)
43,1: LDR/mask, non-Quad, STRB (3250)
43,2: LDR/mask, non-Quad, STRB (4000)
43,3: STRB, Quad, STRB (4000)
43,4: STRB, Quad, LDR/mask (3500)
43,5: LDR/mask, Quad, STRB (3375)
43,6: STRB, Quad, STRB (4000)
43,7: STRB, Quad, STRB (4000)
43,8: STRB, non-Quad, LDR/mask (3750)
43,9: LDR/mask, non-Quad, STRB (3625)
43,10: STRB, non-Quad, STRB (4250)
43,11: STRB, non-Quad, STRB (4250)
43,12: STRB, non-Quad, LDR/mask (3750)
43,13: LDR/mask, non-Quad, STRB (3625)
43,14: STRB, non-Quad, STRB (3875)
43,15: STRB, non-Quad, STRB (3875)
44,0: STRB, non-Quad, STRB (2500)
44,1: LDR/mask, non-Quad, STRB (3750)
44,2: LDR/mask, non-Quad, STRB (4500)
44,3: STRB, Quad, LDR/mask (4000)
44,4: STRB, Quad, STRB (2625)
44,5: LDR/mask, Quad, STRB (3875)
44,6: STRB, Quad, STRB (4500)
44,7: STRB, non-Quad, LDR/mask (4250)
44,8: STRB, non-Quad, STRB (2875)
44,9: LDR/mask, non-Quad, STRB (4125)
44,10: STRB, non-Quad, STRB (4750)
44,11: STRB, non-Quad, LDR/mask (4250)
44,12: STRB, non-Quad, STRB (2875)
44,13: LDR/mask, non-Quad, STRB (4125)
44,14: STRB, non-Quad, STRB (4375)
44,15: STRB, non-Quad, LDR/mask (3875)
45,0: STRB, non-Quad, STRB (3000)
45,1: LDR/mask, non-Quad, STRB (4250)
45,2: LDR/mask, non-Quad, LDR/mask (4500)
45,3: STRB, Quad, STRB (3125)
45,4: STRB, Quad, STRB (3125)
45,5: LDR/mask, Quad, STRB (4375)
45,6: STRB, non-Quad, LDR/mask (4750)
45,7: STRB, non-Quad, STRB (3375)
45,8: STRB, non-Quad, STRB (3375)
45,9: LDR/mask, non-Quad, STRB (4625)
45,10: STRB, non-Quad, LDR/mask (4750)
45,11: STRB, non-Quad, STRB (3375)
45,12: STRB, non-Quad, STRB (3375)
45,13: LDR/mask, non-Quad, STRB (4625)
45,14: STRB, non-Quad, LDR/mask (4375)
45,15: STRB, non-Quad, STRB (3000)
46,0: STRB, non-Quad, STRB (3500)
46,1: LDR/mask, non-Quad, LDR/mask (4250)
46,2: LDR/mask, non-Quad, STRB (3625)
46,3: STRB, Quad, STRB (3625)
46,4: STRB, Quad, STRB (3625)
46,5: LDR/mask, non-Quad, LDR/mask (4625)
46,6: STRB, non-Quad, STRB (3875)
46,7: STRB, non-Quad, STRB (3875)
46,8: STRB, non-Quad, STRB (3875)
46,9: LDR/mask, non-Quad, LDR/mask (4625)
46,10: STRB, non-Quad, STRB (3875)
46,11: STRB, non-Quad, STRB (3875)
46,12: STRB, non-Quad, STRB (3875)
46,13: LDR/mask, non-Quad, LDR/mask (4625)
46,14: STRB, non-Quad, STRB (3500)
46,15: STRB, non-Quad, STRB (3500)
47,0: STRB, non-Quad, LDR/mask (3500)
47,1: LDR/mask, non-Quad, STRB (3375)
47,2: LDR/mask, non-Quad, STRB (4125)
47,3: STRB, Quad, STRB (4125)
47,4: STRB, non-Quad, LDR/mask (3875)
47,5: LDR/mask, non-Quad, STRB (3750)
47,6: STRB, non-Quad, STRB (4375)
47,7: STRB, non-Quad, STRB (4375)
47,8: STRB, non-Quad, LDR/mask (3875)
47,9: LDR/mask, non-Quad, STRB (3750)
47,10: STRB, non-Quad, STRB (4375)
47,11: STRB, non-Quad, STRB (4375)
47,12: STRB, non-Quad, LDR/mask (3875)
47,13: LDR/mask, non-Quad, STRB (3750)
47,14: STRB, non-Quad, STRB (4000)
47,15: STRB, non-Quad, STRB (4000)
48,0: STRB, non-Quad, STRB (2625)
48,1: LDR/mask, non-Quad, STRB (3875)
48,2: LDR/mask, non-Quad, STRB (4625)
48,3: STRB, non-Quad, LDR/mask (4375)
48,4: STRB, non-Quad, STRB (3000)
48,5: LDR/mask, non-Quad, STRB (4250)
48,6: STRB, non-Quad, STRB (4875)
48,7: STRB, non-Quad, LDR/mask (4375)
48,8: STRB, non-Quad, STRB (3000)
48,9: LDR/mask, non-Quad, STRB (4250)
48,10: STRB, non-Quad, STRB (4875)
48,11: STRB, non-Quad, LDR/mask (4375)
48,12: STRB, non-Quad, STRB (3000)
48,13: LDR/mask, non-Quad, STRB (4250)
48,14: STRB, non-Quad, STRB (4500)
48,15: STRB, non-Quad, LDR/mask (4000)
49,0: STRB, non-Quad, STRB (3125)
49,1: LDR/mask, non-Quad, STRB (4375)
49,2: STRB, non-Quad, LDR/mask (4875)
49,3: STRB, non-Quad, STRB (3500)
49,4: STRB, non-Quad, STRB (3500)
49,5: LDR/mask, non-Quad, STRB (4750)
49,6: STRB, non-Quad, LDR/mask (4875)
49,7: STRB, non-Quad, STRB (3500)
49,8: STRB, non-Quad, STRB (3500)
49,9: LDR/mask, non-Quad, STRB (4750)
49,10: STRB, non-Quad, LDR/mask (4875)
49,11: STRB, non-Quad, STRB (3500)
49,12: STRB, non-Quad, STRB (3500)
49,13: LDR/mask, non-Quad, STRB (4750)
49,14: STRB, non-Quad, LDR/mask (4500)
49,15: STRB, non-Quad, STRB (3125)
50,0: STRB, non-Quad, STRB (3625)
50,1: LDR/mask, non-Quad, STRB (4875)
50,2: STRB, non-Quad, STRB (4000)
50,3: STRB, non-Quad, STRB (4000)
50,4: STRB, non-Quad, STRB (4000)
50,5: LDR/mask, Quad, LDR/mask (4875)
50,6: STRB, non-Quad, STRB (4000)
50,7: STRB, non-Quad, STRB (4000)
50,8: STRB, non-Quad, STRB (4000)
50,9: LDR/mask, Quad, LDR/mask (4875)
50,10: STRB, non-Quad, STRB (4000)
50,11: STRB, non-Quad, STRB (4000)
50,12: STRB, non-Quad, STRB (4000)
50,13: LDR/mask, Quad, LDR/mask (4750)
50,14: STRB, non-Quad, STRB (3625)
50,15: STRB, non-Quad, STRB (3625)
51,0: STRB, non-Quad, STRB (4125)
51,1: LDR/mask, non-Quad, STRB (3875)
51,2: STRB, non-Quad, STRB (4500)
51,3: STRB, non-Quad, STRB (4500)
51,4: STRB, Quad, LDR/mask (4125)
51,5: LDR/mask, Quad, STRB (4000)
51,6: STRB, non-Quad, STRB (4500)
51,7: STRB, non-Quad, STRB (4500)
51,8: STRB, Quad, LDR/mask (4125)
51,9: LDR/mask, Quad, STRB (4000)
51,10: STRB, non-Quad, STRB (4500)
51,11: STRB, non-Quad, STRB (4500)
51,12: STRB, Quad, LDR/mask (4000)
51,13: LDR/mask, Quad, STRB (3875)
51,14: STRB, non-Quad, STRB (4125)
51,15: STRB, non-Quad, STRB (4125)
52,0: STRB, non-Quad, STRB (3125)
52,1: LDR/mask, non-Quad, STRB (4375)
52,2: STRB, non-Quad, STRB (5000)
52,3: STRB, Quad, LDR/mask (4625)
52,4: STRB, Quad, STRB (3250)
52,5: LDR/mask, Quad, STRB (4500)
52,6: STRB, non-Quad, STRB (5000)
52,7: STRB, Quad, LDR/mask (4625)
52,8: STRB, Quad, STRB (3250)
52,9: LDR/mask, Quad, STRB (4500)
52,10: STRB, non-Quad, STRB (5000)
52,11: STRB, Quad, LDR/mask (4500)
52,12: STRB, Quad, STRB (3125)
52,13: LDR/mask, Quad, STRB (4375)
52,14: STRB, non-Quad, STRB (4625)
52,15: STRB, non-Quad, STRB (4625)
53,0: STRB, non-Quad, STRB (3625)
53,1: LDR/mask, non-Quad, STRB (4875)
53,2: LDR/mask, non-Quad, LDR/mask (5125)
53,3: STRB, Quad, STRB (3750)
53,4: STRB, Quad, STRB (3750)
53,5: LDR/mask, Quad, STRB (5000)
53,6: STRB, Quad, LDR/mask (5125)
53,7: STRB, Quad, STRB (3750)
53,8: STRB, Quad, STRB (3750)
53,9: LDR/mask, Quad, STRB (5000)
53,10: STRB, Quad, LDR/mask (5000)
53,11: STRB, Quad, STRB (3625)
53,12: STRB, Quad, STRB (3625)
53,13: LDR/mask, Quad, STRB (4875)
53,14: STRB, non-Quad, STRB (5125)
53,15: STRB, non-Quad, STRB (3625)
54,0: STRB, non-Quad, STRB (4125)
54,1: LDR/mask, non-Quad, LDR/mask (4875)
54,2: LDR/mask, non-Quad, STRB (4250)
54,3: STRB, Quad, STRB (4250)
54,4: STRB, Quad, STRB (4250)
54,5: LDR/mask, Quad, LDR/mask (5000)
54,6: STRB, Quad, STRB (4250)
54,7: STRB, Quad, STRB (4250)
54,8: STRB, Quad, STRB (4250)
54,9: LDR/mask, Quad, LDR/mask (5000)
54,10: STRB, Quad, STRB (4125)
54,11: STRB, Quad, STRB (4125)
54,12: STRB, Quad, STRB (4125)
54,13: LDR/mask, non-Quad, LDR/mask (5375)
54,14: STRB, non-Quad, STRB (4125)
54,15: STRB, non-Quad, STRB (4125)
55,0: STRB, non-Quad, LDR/mask (4125)
55,1: LDR/mask, non-Quad, STRB (4000)
55,2: LDR/mask, non-Quad, STRB (4750)
55,3: STRB, Quad, STRB (4750)
55,4: STRB, Quad, LDR/mask (4250)
55,5: LDR/mask, Quad, STRB (4125)
55,6: STRB, Quad, STRB (4750)
55,7: STRB, Quad, STRB (4750)
55,8: STRB, Quad, LDR/mask (4250)
55,9: LDR/mask, Quad, STRB (4125)
55,10: STRB, Quad, STRB (4625)
55,11: STRB, Quad, STRB (4625)
55,12: STRB, non-Quad, LDR/mask (4625)
55,13: LDR/mask, Quad, STRB (4375)
55,14: STRB, non-Quad, STRB (4625)
55,15: STRB, non-Quad, STRB (4625)
56,0: STRB, non-Quad, STRB (3250)
56,1: LDR/mask, non-Quad, STRB (4500)
56,2: LDR/mask, non-Quad, STRB (5250)
56,3: STRB, Quad, LDR/mask (4750)
56,4: STRB, Quad, STRB (3375)
56,5: LDR/mask, Quad, STRB (4625)
56,6: STRB, Quad, STRB (5250)
56,7: STRB, Quad, LDR/mask (4750)
56,8: STRB, Quad, STRB (3375)
56,9: LDR/mask, Quad, STRB (4625)
56,10: STRB, Quad, STRB (5125)
56,11: STRB, non-Quad, LDR/mask (5125)
56,12: STRB, Quad, STRB (3625)
56,13: LDR/mask, Quad, STRB (4875)
56,14: STRB, non-Quad, STRB (5125)
56,15: STRB, non-Quad, LDR/mask (4625)
57,0: STRB, non-Quad, STRB (3750)
57,1: LDR/mask, non-Quad, STRB (5000)
57,2: LDR/mask, non-Quad, LDR/mask (5250)
57,3: STRB, Quad, STRB (3875)
57,4: STRB, Quad, STRB (3875)
57,5: LDR/mask, Quad, STRB (5125)
57,6: STRB, Quad, LDR/mask (5250)
57,7: STRB, Quad, STRB (3875)
57,8: STRB, Quad, STRB (3875)
57,9: LDR/mask, Quad, STRB (5125)
57,10: STRB, non-Quad, LDR/mask (5625)
57,11: STRB, Quad, STRB (4125)
57,12: STRB, Quad, STRB (4125)
57,13: LDR/mask, Quad, STRB (5375)
57,14: STRB, non-Quad, LDR/mask (5125)
57,15: STRB, non-Quad, STRB (3750)
58,0: STRB, non-Quad, STRB (4250)
58,1: LDR/mask, non-Quad, LDR/mask (5000)
58,2: LDR/mask, non-Quad, STRB (4375)
58,3: STRB, Quad, STRB (4375)
58,4: STRB, Quad, STRB (4375)
58,5: LDR/mask, Quad, LDR/mask (5125)
58,6: STRB, Quad, STRB (4375)
58,7: STRB, Quad, STRB (4375)
58,8: STRB, Quad, STRB (4375)
58,9: LDR/mask, non-Quad, LDR/mask (5500)
58,10: STRB, Quad, STRB (4625)
58,11: STRB, Quad, STRB (4625)
58,12: STRB, Quad, STRB (4625)
58,13: LDR/mask, Quad, LDR/mask (5375)
58,14: STRB, non-Quad, STRB (4250)
58,15: STRB, non-Quad, STRB (4250)
59,0: STRB, non-Quad, LDR/mask (4250)
59,1: LDR/mask, non-Quad, STRB (4125)
59,2: LDR/mask, non-Quad, STRB (4875)
59,3: STRB, Quad, STRB (4875)
59,4: STRB, Quad, LDR/mask (4375)
59,5: LDR/mask, Quad, STRB (4250)
59,6: STRB, Quad, STRB (4875)
59,7: STRB, Quad, STRB (4875)
59,8: STRB, non-Quad, LDR/mask (4750)
59,9: LDR/mask, non-Quad, STRB (4625)
59,10: STRB, Quad, STRB (5125)
59,11: STRB, Quad, STRB (5125)
59,12: STRB, Quad, LDR/mask (4625)
59,13: LDR/mask, Quad, STRB (4500)
59,14: STRB, non-Quad, STRB (4750)
59,15: STRB, non-Quad, STRB (4750)
60,0: STRB, non-Quad, STRB (3375)
60,1: LDR/mask, non-Quad, STRB (4625)
60,2: LDR/mask, non-Quad, STRB (5375)
60,3: STRB, Quad, LDR/mask (4875)
60,4: STRB, Quad, STRB (3500)
60,5: LDR/mask, Quad, STRB (4750)
60,6: STRB, Quad, STRB (5375)
60,7: STRB, non-Quad, LDR/mask (5250)
60,8: STRB, non-Quad, STRB (3875)
60,9: LDR/mask, non-Quad, STRB (5125)
60,10: STRB, Quad, STRB (5625)
60,11: STRB, Quad, LDR/mask (5125)
60,12: STRB, Quad, STRB (3750)
60,13: LDR/mask, Quad, STRB (5000)
60,14: STRB, non-Quad, STRB (5250)
60,15: STRB, non-Quad, LDR/mask (4750)
61,0: STRB, non-Quad, STRB (3875)
61,1: LDR/mask, non-Quad, STRB (5125)
61,2: LDR/mask, non-Quad, LDR/mask (5375)
61,3: STRB, Quad, STRB (4000)
61,4: STRB, Quad, STRB (4000)
61,5: LDR/mask, Quad, STRB (5250)
61,6: STRB, non-Quad, LDR/mask (5750)
61,7: STRB, non-Quad, STRB (4375)
61,8: STRB, non-Quad, STRB (4375)
61,9: LDR/mask, non-Quad, STRB (5625)
61,10: STRB, Quad, LDR/mask (5625)
61,11: STRB, Quad, STRB (4250)
61,12: STRB, Quad, STRB (4250)
61,13: LDR/mask, Quad, STRB (5500)
61,14: STRB, non-Quad, LDR/mask (5250)
61,15: STRB, non-Quad, STRB (3875)
62,0: STRB, non-Quad, STRB (4375)
62,1: LDR/mask, non-Quad, LDR/mask (5125)
62,2: LDR/mask, non-Quad, STRB (4500)
62,3: STRB, Quad, STRB (4500)
62,4: STRB, Quad, STRB (4500)
62,5: LDR/mask, non-Quad, LDR/mask (5625)
62,6: STRB, non-Quad, STRB (4875)
62,7: STRB, non-Quad, STRB (4875)
62,8: STRB, non-Quad, STRB (4875)
62,9: LDR/mask, non-Quad, LDR/mask (5625)
62,10: STRB, Quad, STRB (4750)
62,11: STRB, Quad, STRB (4750)
62,12: STRB, Quad, STRB (4750)
62,13: LDR/mask, Quad, LDR/mask (5500)
62,14: STRB, non-Quad, STRB (4375)
62,15: STRB, non-Quad, STRB (4375)
63,0: STRB, non-Quad, LDR/mask (4375)
63,1: LDR/mask, non-Quad, STRB (4250)
63,2: LDR/mask, non-Quad, STRB (5000)
63,3: STRB, Quad, STRB (5000)
63,4: STRB, non-Quad, LDR/mask (4875)
63,5: LDR/mask, non-Quad, STRB (4750)
63,6: STRB, non-Quad, STRB (5375)
63,7: STRB, non-Quad, STRB (5375)
63,8: STRB, non-Quad, LDR/mask (4875)
63,9: LDR/mask, non-Quad, STRB (4750)
63,10: STRB, Quad, STRB (5250)
63,11: STRB, Quad, STRB (5250)
63,12: STRB, Quad, LDR/mask (4750)
63,13: LDR/mask, Quad, STRB (4625)
63,14: STRB, non-Quad, STRB (4875)
63,15: STRB, non-Quad, STRB (4875)
Last edited by sirbod on Tue Aug 30, 2016 4:04 pm, edited 10 times in total.

User avatar
helpful
Posts: 406
Joined: Tue Sep 22, 2009 12:18 pm
Location: London
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby helpful » Fri Nov 13, 2015 2:53 pm

This is all fascinating stuff, amazing work!

I wonder if it is worth trying to draw David Braben's attention to this thread? Might pique his interest enough to persuade him to allow an updated version to be released :-)

Bryan.
Last edited by helpful on Tue Nov 17, 2015 3:30 am, edited 1 time in total.
RISC OS User Group Of London - http://www.rougol.jellybaby.net/
RISC OS London Show - http://www.riscoslondonshow.co.uk/

User avatar
qUE
Posts: 67
Joined: Tue Dec 16, 2014 11:39 pm
Location: Bristol
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby qUE » Sat Nov 14, 2015 12:09 am

Bryan, no I doubt it's worth it, because the game is going to be called Zorch.

Or maybe Zerch.

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Sat Nov 14, 2015 5:58 am

helpful wrote:I wonder if it is worth trying to draw David Braben's attention to this thread?

I've been trying to speak to him for a few years and have tried various methods of contact, to no avail; please feel free to try.
Last edited by sirbod on Tue Nov 17, 2015 9:02 am, edited 1 time in total.

nudelooney
Posts: 117
Joined: Tue Sep 23, 2003 8:41 pm
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby nudelooney » Sun Nov 15, 2015 1:04 pm

I'm a late-comer to this thread ... but it has made very interesting reading!

As part of all this work, has anyone created a decent disassembly of Zarch? I've been tinkering with a C++ project that emulates an ARM processor, and implements just enough RISC OS SWIs to make emulating games possible.

So far I've been using Pacmania and Terramex for tests, but it would be interesting to try Zarch too ... and it would be much simpler to check over if there's any annotated disassembly.

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Sun Nov 15, 2015 1:54 pm

nudelooney wrote:As part of all this work, has anyone created a decent disassembly of Zarch?

I've disassembled it, given the key functions meaningful labels, and am slowly working my way through changing the fixed address tables and links to relative addresses.

I obviously can't release it as JASPP doesn't have permission, but plan on using it to create a patch file for the changes discussed in this thread.

I'm making progress on the line fill code; the number of permutations is getting silly - it's testing around a dozen for any given line length and byte offset within each Quad word boundary. It tests the speed of STRB and LDR/mask both at the start and end, and also tries combining them into STM's that both reach and cross the Quad word boundary.

LDR/mask is working out quicker in some cases where it replaces three STRB's, even with register preservation around it, so it was definitely worth adding in. If I can free a working register when I convert the Tri/Quad code to ARM, it will get back a further 125ns for each end that uses it.
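To make the LDR/mask end concrete, here's a minimal sketch of one case - the register numbers are illustrative rather than Zarch's, and it assumes the last three pixels of the span sit in the bottom three bytes of the word at R5, the fill colour is already replicated across all four bytes of R8, and R0 is free as scratch:

Code: Select all

LDR R0,[R5]            ;1N + 1S + 1I - fetch the word straddling the span end
AND R0,R0,#&FF000000   ;1S           - keep only the byte beyond the span
ORR R0,R0,R8,LSR #8    ;1S           - merge in three bytes of fill colour
STR R0,[R5]            ;2N           - one word store replaces three STRB's

That's 3N+3S+1I against 6N for three STRB's, before any cost of preserving the scratch register - which is why freeing up a working register makes the LDR/mask ends cheaper still.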

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Mon Nov 16, 2015 4:24 am

Code updated above and now includes masking methods for the start/end of the line fill.

I'm glad I didn't attempt to work out the fastest methods by hand - it's almost random as to which combination of code for start alignment, fill method and trailing pixels is the fastest :shock:
Last edited by sirbod on Tue Aug 30, 2016 4:07 pm, edited 1 time in total.

Zarchos
Posts: 2355
Joined: Sun May 19, 2013 8:19 am
Location: FRANCE

Re: Hacker needed ... for Zarch ;-)

Postby Zarchos » Mon Nov 16, 2015 7:33 am

I'm not at all at ease with BASIC; anyway, here are my 2 cents (they come from the ideas used in the library of routines I sent you).
Hats off to you for writing all the logic to automate the mnemonic generation, knowing the time I spent doing it by hand, even if it's 'just' copy/paste with minor modifications each time.
I'm impressed - you're obviously very talented.
I wouldn't even have tried; to me it's far too complex to write, and to check the output code (though it's true you could also generate an ASCII source with comments, too!).

Back to what you wrote: if I've understood correctly, the following is for the beginning of segments. Here are my remarks on doing it slightly faster, depending on the offset (+0 to +3, or +3 to +0 in your code) from a word boundary (again, if I've managed to read your code correctly - I'm ill at ease with BASIC, so apologies in advance if I've completely missed the point :( ):

WHEN 3
[OPT A%
LDR col1, [addr1, #-3]! ;1N + 1S + 1I
BIC col1, col1, #&FF << 24 ; 1S
ORR col1, col1, col2, LSL #24 ; 1S
]:Ncyclel%=1:Scyclel%=3:Icyclel%=1

------
Why not do this, saving 2S?
STRB col2,[addr1],#1

OK, in fact it's even 3S saved, because in your solution you'll store this col1 register with STMIA along with as many col2, col3 registers as needed, whereas in my solution, after the STRB (which saves 2 cycles), I STMIA one register fewer than you needed (thus saving 1 more cycle).

Please note that by using STRB col2,[addr1],#1 you'll change the quadword alignment for the STMIA executed afterwards; where you were initially at addr1 = quadword + n (n = 0 to 15), after the STRB you'll be at quadword + n + 1 :wink:
-------




WHEN 2
[OPT A%
LDR col1, [addr1, #-2]! ;1N + 1S + 1I
MOV col1, col1, LSL #16 ; 1S
ORR col1, col1, col2, LSR #16 ; 1S
MOV col1, col1, ROR #16 ; 1S
]:Ncyclel%=1:Scyclel%=4:Icyclel%=1

-----
Why not do this, saving 1S?
LDR col1,[addr1],#-2 (unaligned address, so the load will rotate the bits) - I understand you don't want this because of compatibility issues with later ARMs
MOV col1,col1,LSR#2*8
ADD/ORR col1,col1,col2,LSL#2*8
-----



WHEN 1
[OPT A%
LDRB col1, [addr1, #-1]! ;1N + 1S + 1I
ORR col1, col1, col2, LSL #8 ; 1S
]:Ncyclel%=1:Scyclel%=2:Icyclel%=1

I've re-read two thirds of what I wrote and corrected the parts that didn't use quadword alignment when in fact it was possible.
Once it's fully OK I'll post it with some examples demonstrating how to use the code.

User avatar
qUE
Posts: 67
Joined: Tue Dec 16, 2014 11:39 pm
Location: Bristol
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby qUE » Mon Nov 16, 2015 4:03 pm

Yeah, the LDRB/STRB thing I've literally seen go slower on ARM2 no matter what I tried in the past. There's probably a way to store the ending in a separate register and adjust the STMIA so that the end is included in one multiple store. i.e. TST length,#16:STMNEIA r!,{r0-r3,r7}:MOVNE length,#0

To be honest I get a bit obsessive trying to squeeze processing out of old processors, there is a point you have to say "that's fast enough" :)

Zarchos
Posts: 2355
Joined: Sun May 19, 2013 8:19 am
Location: FRANCE

Re: Hacker needed ... for Zarch ;-)

Postby Zarchos » Mon Nov 16, 2015 4:13 pm

qUE wrote:Yeah, the LDRB/STRB thing I've literally seen go slower on ARM2 no matter what I tried in the past. There's probably a way to store the ending in a separate register and adjust the STMIA so that the end is included in one multiple store. i.e. TST length,#16:STMNEIA r!,{r0-r3,r7}:MOVNE length,#0

To be honest I get a bit obsessive trying to squeeze processing out of old processors, there is a point you have to say "that's fast enough" :)


Yes, the same ideas can be used for both ends of the segment.

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Mon Nov 16, 2015 5:51 pm

Zarchos wrote:if I've understood correctly, the following is for the beginning of segments. Here are my remarks on doing it slightly faster, depending on the offset (+0 to +3, or +3 to +0 in your code) from a word boundary (again, if I've managed to read your code correctly - I'm ill at ease with BASIC, so apologies in advance if I've completely missed the point :( ):
WHEN 3
[OPT A%
LDR col1, [addr1, #-3]! ;1N + 1S + 1I
BIC col1, col1, #&FF << 24 ; 1S
ORR col1, col1, col2, LSL #24 ; 1S
]:Ncyclel%=1:Scyclel%=3:Icyclel%=1

------
Why not do this, saving 2S?
STRB col2,[addr1],#1

It's not immediately obvious from the source what's going on, but it tries STRB's at both the start and end, as well as LDR/mask at the start and end. Taking your example, the full output for a line fill of 30 pixels at +3 in the Quad word is:

Code: Select all

30,3: STRB, non-Quad, STRB (2750)
      STRB, non-Quad, LDR/mask (3375)
      LDR/mask, non-Quad, STRB (3125)
      LDR/mask, non-Quad, LDR/mask (4000)
      STRB, Quad, STRB (2625)
      STRB, Quad, LDR/mask (3500)
      LDR/mask, Quad, STRB (3125)
      LDR/mask, Quad, LDR/mask (4000)

    Quickest: STRB, Quad, STRB (2625)

The code it's using is:

Code: Select all

STRB R8,[R5],#1     ;2N
STMIA R5!,{R8-R10}  ;2N + 2S
STMIA R5!,{R8-R11}  ;2N + 3S
STRB R8,[R5],#1     ;2N

The first STM will Quad align the final STM, so neither STM incurs a 1 cycle penalty. 8N + 5S = 2625 nS
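(Using the usual 8MHz ARM2 timings of N=250nS and S=125nS, that's (8 x 250) + (5 x 125) = 2625nS.)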

Zarchos wrote:
WHEN 2
[OPT A%
LDR col1, [addr1, #-2]! ;1N + 1S + 1I
MOV col1, col1, LSL #16 ; 1S
ORR col1, col1, col2, LSR #16 ; 1S
MOV col1, col1, ROR #16 ; 1S
]:Ncyclel%=1:Scyclel%=4:Icyclel%=1

Why not do this, saving 1S?
LDR col1,[addr1],#-2 (unaligned address, so the load will rotate the bits) - I understand you don't want this because of compatibility issues with later ARMs
MOV col1,col1,LSR#2*8
ADD/ORR col1,col1,col2,LSL#2*8

A very valid point - I'm so used to avoiding rotated loads that it completely slipped my mind. I've amended the code above and updated the output to match.

User avatar
helpful
Posts: 406
Joined: Tue Sep 22, 2009 12:18 pm
Location: London
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby helpful » Tue Nov 17, 2015 3:33 am

sirbod wrote:
helpful wrote:I wonder if it is worth trying to draw David Braben's attention to this thread?

I've been trying to speak to him for a few years and have tried various methods of contact, to no avail; please feel free to try.

Will do.

Bryan.
Last edited by helpful on Wed Nov 18, 2015 4:06 am, edited 1 time in total.
RISC OS User Group Of London - http://www.rougol.jellybaby.net/
RISC OS London Show - http://www.riscoslondonshow.co.uk/

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Tue Nov 17, 2015 9:18 am

Spent a few more hours on the Quadrilateral routine this morning and spotted one particular type of Quad that is plotting incorrectly, so I need to add some code to deal with it.

I also started adding code to clip the bottom of the screen, which is a bit of a minefield unless I move it into the line fill code - I'm trying to avoid that as it would add 3S to every line fill, so instead it needs to calculate the intersection at the screen edge.

Statistically (over the 2min recording) 40% of Quads hit the lower screen boundary, so the code is going to be hit fairly heavily. Adding another pre-computed table to avoid the use of MUL may benefit here. I'll need to work out timings to be sure, as MUL can be quite quick on small values (one value is always below 32) provided the registers are ordered correctly.
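To illustrate the register ordering (register numbers here are purely illustrative): the ARM2 multiplier early-terminates based on the value in Rs, the last operand, so the value that's always below 32 wants to go there.

Code: Select all

MUL R0,R1,R2   ; R2 holds the small value - the multiply finishes after a few I cycles
MUL R0,R2,R1   ; same result, but a large value in Rs can cost up to the full 1S + 16I

(Rd also has to differ from Rm on the ARM2, which both forms respect.)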
Last edited by sirbod on Tue Aug 30, 2016 4:10 pm, edited 2 times in total.

User avatar
qUE
Posts: 67
Joined: Tue Dec 16, 2014 11:39 pm
Location: Bristol
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby qUE » Tue Nov 17, 2015 10:46 pm

Jon, you can cheat on the side clipping by making a video buffer larger than the video screen, although I haven't worked out what the penalty is for copying the whole buffer to the screen; there might be a DMA trick possible here, I'm not sure. Top and bottom clipping shouldn't be an issue, assuming nothing lies before or after the video memory.

MUL is expensive imo, I try to use bitshift as much as possible.

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Wed Nov 18, 2015 7:47 am

qUE wrote:Jon, you can cheat on the side clipping by making a video buffer larger than the video screen.

Considering the small size of the Quad's/Tri's, provided the screen buffer is oversized enough that no overrun will wrap, clipping could be avoided on the left, right and bottom by allowing the plot to overrun. It would need at least two 448x320 screens (280KB) with borders to reduce back down to 320x256.

The drawback, however, is that the top still needs clipping, and it would be bespoke to VIDC/VIDC20, so clipping code would still be required for newer machines. I could reverse the plots so they start at the lowest point instead of the highest and stop at the screen top for free, but it would need a complete rewrite of the Tri/Quad routines as I coded them all the other way around.

It would give a minor improvement in speed, as the line fill currently performs three clipping checks, so we'd get back 375 nS per line fill (0.257sec over the 2min recording); however, we'd probably lose that time, and possibly more, on the line fill overruns, so it might turn out to be counterproductive. I don't think I can face the recoding required to find out for sure :shock:
qUE wrote:MUL is expensive imo, I try to use bitshift as much as possible.

Bitshift will be fine on ARM2 and is the method Zarch currently uses; we'll have to switch to MUL or lookup tables on later machines though, so we can change the resolution.
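As a generic illustration of the trade-off (hypothetical registers, not Zarch's actual layout): with a fixed 320-byte row the offset is just two shifted adds with the constants baked in at assembly time, whereas a row length chosen at run time needs MUL or a lookup instead.

Code: Select all

; fixed 320-byte rows: y*320 = y*256 + y*64
ADD R0,R12,R1,LSL #8   ; R12 = screen base, R1 = y
ADD R0,R0,R1,LSL #6    ; R0 -> start of row y
; arbitrary row length for other resolutions: MUL (or a row-offset table) instead
MUL R2,R3,R1           ; R3 = bytes per row, R1 = y
ADD R0,R12,R2          ; R0 -> start of row y

A 256-entry row-offset table is only 1kB of words and a single LDR, so it's another option if the MUL turns out too slow.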

User avatar
qUE
Posts: 67
Joined: Tue Dec 16, 2014 11:39 pm
Location: Bristol
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby qUE » Wed Nov 18, 2015 4:20 pm

Maybe have multiple binaries hardcoded to different resolutions? It's what the demo crews do.

sirbod
Posts: 742
Joined: Mon Apr 09, 2012 8:44 am
Location: Essex
Contact:

Re: Hacker needed ... for Zarch ;-)

Postby sirbod » Thu Nov 19, 2015 6:38 am

qUE wrote:Maybe have multiple binaries hardcoded to different resolutions? It's what the demo crews do.

We'll probably need four binaries anyhow: ARM2 with all the code optimizations discussed in the thread; ARM3/610/710/StrongARM 26bit with some of the code optimizations and no unrolled code; and 32bit with none of the optimizations.

I'm concentrating on ARM2 at the minute; the other builds are just a case of dropping code and a few tweaks at compile time for the resolution change.

I doubt I'll have anything to show this year beyond graphs and stats; there's a lot to code, and I do need to get back to finishing the next release of ADFFS at some point.

