BeebMaster wrote: ↑Wed Jul 22, 2020 2:58 pm
Concatenating the VDU commands into one doesn't help:
...
Or even doing it all from assembler (apart from the 3cs assembly time). This surprised me, I thought machine code would be quicker:
...
I wonder if it's the "maths" involved - using VDU means using human coordinates, but those will have to be converted into 8-bit values etc. mumblemumblemightbetalking!@$# So whether you use VDU in BASIC or assembler, the conversion time is the same (and the significant part). But at least I know the answer to my idle wondering about MOVE/DRAW versus VDU!
Drawing lines  BASIC vs machine code
 richardtoohey
 Posts: 4009
 Joined: Thu Dec 29, 2011 5:13 am
 Location: Tauranga, New Zealand
 Contact:
Re: Drawing lines  BASIC vs machine code

 Posts: 196
 Joined: Tue May 26, 2020 2:32 pm
 Contact:
Re: Drawing lines  BASIC vs machine code
I've spent hours trying to speed-optimise this and I've learned a lot!
It's also been very worthwhile - it can now handle 4 poles faster than it could 3! And not by nasty stuff like making the poles jump greater gaps at each cycle - nothing has changed there.
I think even 5 poles runs at a tolerable speed but I'd say that's the limit. It's opened up some great options for progressive gameplay. I'm thinking:
1) Start with 3 poles, with a delay to slow it down.
2) Once the absorber is very wide, the collection of a special pole will cause you to enter a warp level.
3) The warp level will always be 3 poles, not slowed down, no braking allowed
4) Once past a warp level your absorber goes back to the initial size, and the number of poles increases to 4 and then possibly to 5 after next warp.
Three poles now plays so fast that if I saw it on a Spectrum I'd believe it was written in machine code!
Code: Select all
2MODE7:VDU23,1,0;0;0;0:PROCINST
5MODE2:VDU23,1,0;0;0;0
10DIMYB%(20):DIMYT%(20):DIMD%(20)
20FORI=1TO20:YB%(I)=340-I^1.23:YT%(I)=340+I^1.693:D%(20-I)=8+I*2:NEXT
22REMFORI=1TO10:YB%(I)=340-I^1.6:YT%(I)=340+I^2.2:NEXT
30DIMR%(3):R%(0)=1:R%(1)=2:R%(2)=4:R%(3)=7
32DIMX%(10):DIMC%(10):DIMS%(10)
40R%=0:E%=100:A%=1:L%=-20:M%=20:SC%=0
42GCOL0,7
45PRINT
47FORI%=0TO10:X%(I%)=RND(440)-220:C%(I%)=RND(7):S%(I%)=RND(2):NEXT
50PRINTTAB(1,7);"ENERGY"
60PRINTTAB(2,9);"SCORE 0"
110VDU29,0;0;
120MOVE340,500:DRAW940,500:DRAW940,292:DRAW340,292:DRAW340,500
135FORI%=1TO10:PRINTTAB(7+I%,7);"*":SOUND1,10,50+I%*2,1:FORJ%=1TO300STEP1:NEXTJ%:NEXTI%
155VDU29,640;0;
157GCOL0,R%(0):MOVEL%,296:DRAWM%,296
158SOUND&11,2,2,255:SOUND&12,2,3,255
160REPEAT:FORI%=0TO2
165S%=S%(I%):X%=X%(I%):N%=INKEY(-1)
170IFN%SOUND3,6,0,1
190GCOL0,0:MOVEX%,YB%(S%):DRAWX%,YT%(S%)
212D%=0:IFINKEY(-98)D%=D%(S%)ELSEIFINKEY(-67)D%=-D%(S%)
215X%=X%+X%DIV(8*(1-N%))+D%:S%(I%)=S%+2+N%:S%=S%(I%)
217IFS%>20ANDX%>=L%ANDX%<=M%A%=FNC(I%)
220IFS%>20ORABS(X%)>292S%=RND(2):S%(I%)=S%:X%=RND(440)-220:C%(I%)=RND(7)
240GCOL3,C%(I%):MOVEX%,YB%(S%):DRAWX%,YT%(S%)
250IFR%=3ANDINKEY(-74)Z%=FNL
255X%(I%)=X%
260NEXT:UNTILA%=0
261SOUND&11,0,0,0:SOUND&12,0,0,0:SOUND&13,0,0,0:ENVELOPE3,128,0,0,0,0,0,0,0,0,0,1,126,0:SOUND4,3,6,1
262PRINTTAB(5,30);"Press SPACE"
265G$=GET$:IFG$<>" "THENGOTO265
270CLS:GOTO40
300DEFFNC(I%)
305IFR%(R%)<>C%(I%)ORR%=3THENSOUND3,15,0,2:E%=E%-10:R%=0:GCOL0,R%(R%):MOVEL%,296:DRAWM%,296:PRINTTAB(8+E%/10,7);" ":=E%
310SOUND3,7,100,1:R%=R%+1:IFR%=4THENR%=0
311SC%=SC%+50:PRINTTAB(8,9);SC%
312IFR%=3THENL%=L%-8:M%=M%+8
313GCOL0,R%(R%):MOVEL%,296:DRAWM%,296
315=1
400DEFFNL
403ENVELOPE1,1,4,4,8,1,1,1,0,0,0,60,60,60:SOUND1,1,120,10:SOUND2,1,124,10:SOUND3,1,128,10
404GCOL3,7:MOVEL%,308:MOVEM%,308:PLOT85,M%,340:MOVEL%,340:PLOT85,L%,308
405R%=0:GCOL0,R%(R%):MOVEL%,296:DRAWM%,296
410FORJ%=0TO3
420IFC%(J%)=7ANDX%(J%)>=L%ANDX%(J%)<=M%THENGCOL3,C%(J%):MOVEX%(J%),YB%(S%(J%)):DRAWX%(J%),YT%(S%(J%)):S%(J%)=RND(2):X%(J%)=RND(440)+440:C%(J%)=RND(7):SC%=SC%+200:PRINTTAB(8,9);SC%:ENVELOPE3,128,0,0,0,0,0,0,0,0,0,2,100,0:SOUND4,3,5,1
425NEXT
427GCOL3,7:MOVEL%,308:MOVEM%,308:PLOT85,M%,340:MOVEL%,340:PLOT85,L%,308
430=1
500REPEAT: FOR I=1TO127STEP1:IFINKEY(-I)THENPRINTI
510NEXT:UNTIL0
1000DEFPROCINST
1005PRINT
1010PRINT"POLE RUNNER"
1020PRINT
1030PRINT"You find yourself hurtling along the"
1035PRINT"barren surface of Planet RGB in your"
1040PRINT"Pole Runner Mk3 hover vehicle."
1045PRINT
1050PRINT"Your mission: to eliminate as many of"
1055PRINT"the evil WHITE POLES as you can. But"
1060PRINT"before you can fire your LASER beam"
1065PRINT"you must first charge it, by absorbing"
1070PRINT"a RED, GREEN and BLUE pole, in that"
1075PRINT"order. Absorb poles by bumping into"
1076PRINT"them with your front-mounted ABSORBER."
1080PRINT
1082PRINT"If you hit a pole of a colour that your"
1084PRINT"ABSORBER is not expecting, you will"
1086PRINT"take damage. When your LASER is charged"
1087PRINT"your ABSORBER will widen. This is good,":PRINT"and bad..."
1088PRINT
1090PRINT"Z-Left X-Right SHIFT-Slow RETURN-Fire"
1100PRINT
1110PRINT"PRESS A KEY TO GET STARTED!";
1900G$=GET$
2000ENDPROC

 Posts: 196
 Joined: Tue May 26, 2020 2:32 pm
 Contact:
Re: Drawing lines  BASIC vs machine code
Some insight into the optimisations I tried, according to my notes. All numbers are percentage reductions in time taken for an average cycle of the main loop. I set it up to have 4 poles always in the same starting position, hitting the bottom 10 times.
Changing a "*.1" to a "DIV8": 4% (thanks jms2!)
Getting rid of a "DIV8": 3% (but I wasn't able to do that in the end)
Removing all THEN statements: 0.0006% !
Removing two subtractions by shifting the origin: 2% (thanks jms2!)
Minimising references to S%(I%) by putting value into a temporary variable: 3%
Minimising references to X%(I%) by putting value into a temporary variable: 9% !
Replacing two SOUND statements with zero, or one only if braking: 13% !
Reading INKEY(-1) once and putting into temporary variable: 1%
I'm a bit disappointed I've had to change the sound a bit, I really liked the way it lowered in pitch when you were braking. I'm working on what else may sound nice instead.
I tried lots of other stuff, too.
One of the more interesting things I tried was reducing the line heights and making the periscope view less tall. Any time savings there should be purely from the built-in line-drawing machine code. It saved 2%. While that's significant, I didn't like how it looked, so put things back as they were.
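The two biggest array wins in that list (S%(I%) and X%(I%)) come down to the same pattern: read the element once into a temporary, work on the temporary, write it back once. Here is a rough model of that pattern, in Python rather than BBC BASIC so it runs on anything. `update_poles` and `steer` are made-up names, the arithmetic is only loosely based on line 215 of the listing, and the "recycle" step just resets to 0 where the game uses RND.

```python
def update_poles(xs, steer):
    """Advance every pole, reading and writing each list element once."""
    for i in range(len(xs)):
        x = xs[i]                 # single read into a temporary (like X%=X%(I%))
        x += int(x / 8) + steer   # int(x / 8) truncates toward zero, like BASIC's DIV
        if abs(x) > 292:          # off the edge of the view: recycle the pole
            x = 0                 # hypothetical reset; the game picks a random X
        xs[i] = x                 # single write back (like X%(I%)=X%)
    return xs

print(update_poles([80, -80, 290], 4))
```

In Python the saving from this is negligible; on the Beeb it matters because every X%(I%) makes the interpreter look up the array and evaluate the subscript again.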
 BeebMaster
 Posts: 3866
 Joined: Sun Aug 02, 2009 5:59 pm
 Location: Lost in the BeebVault!
 Contact:
Re: Drawing lines  BASIC vs machine code
I just looked at the first few lines that appear in the code box and one way to increase speed and save variable space is to use arrays differently.
Instead of making an array with DIM, reserve some space instead.
So, instead of DIM X%(10) do DIM X% 10.
Then to refer to X%(1) or X%(10) you would use X%?1 or X%?10 to read the value, or X%(2)= becomes X%?2= etc.
If any value of X% in the array is going to take up more than one byte, DIM more space and search for the correct value as an offset:
e.g. an integer variable uses 4 bytes, so DIM X% 40 can be seen as 10 blocks of 4 bytes.
X%(n) is then !(X%+(n*4)).
One limitation of that method is that you can't store floating point numbers, and negatives don't read back easily with ? (a byte always comes back as 0-255) - though integer ("%") variables can't hold floating point numbers anyway.
Also remember with arrays you get X%(0) "for free" (except in terms of space) so DIM X%(n) actually creates elements 0 to n rather than 1 to n.
Various integer variables - A%, O%, P%, X%, and Y% - have special meanings in BASIC for referring to registers in the 6502 (Accumulator, Offset program counter, Program counter, X Register, Y Register) so although there's no harm in using them just as variables, if you put some assembler or OS calls in later on they could inadvertently be duplicated or corrupted.
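For anyone who finds the ? and ! operators hard to picture, here is a small model of the idea, written in Python so it can be run anywhere. The `bytearray` stands in for the block that DIM reserves, `X` is a made-up base address, and the four helper names are my own. The one thing taken from BASIC is the layout: ! reads and writes four bytes, low byte first.

```python
import struct

mem = bytearray(64)      # stands in for the memory reserved by DIM X% 40
X = 8                    # hypothetical base address, as held in X%

def poke_byte(addr, value):       # like  ?addr = value
    mem[addr] = value & 0xFF

def peek_byte(addr):              # like  ?addr
    return mem[addr]

def poke_word(addr, value):       # like  !addr = value
    mem[addr:addr + 4] = struct.pack("<i", value)

def peek_word(addr):              # like  !addr
    return struct.unpack("<i", mem[addr:addr + 4])[0]

# X%(n) becomes !(X%+(n*4)): element n lives n*4 bytes past the base
for n in range(10):
    poke_word(X + n * 4, n * 1000)

print(peek_word(X + 3 * 4))       # element 3
poke_word(X + 9 * 4, -5)          # four-byte words keep their sign
print(peek_word(X + 9 * 4))
poke_byte(X, 200)                 # single bytes only hold 0..255
print(peek_byte(X))
```

The point of the model is that there is no array at all, just a base address plus an offset you work out yourself, which is why BASIC has less to do at each access.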
Re: Drawing lines  BASIC vs machine code
Beebmaster beat me to it. I was going to suggest exactly that. I have been doing a test to see how much faster it is, which was to simply assign a value to variables in an array 1000 times. Compared to using a BASIC array, the saving with indirection operators was 22%.
For byte values, you'd replace
Code: Select all
X%(I%)
with
Code: Select all
X%?I%
The problem is that whilst you can store negative numbers (it uses two's complement, and can store the range -128 to +127, which would be OK for your project), when you read back the value BASIC always interprets the data as a positive number.
So I think you can only use this method for positive numbers (0-255), unless someone else can suggest a workaround.
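One possible workaround, offered as a suggestion rather than something tested in the game: treat anything above 127 as negative by subtracting 256 after reading. In BASIC that would be something along the lines of V%=X%?I%:IFV%>127 V%=V%-256. The same idea as a Python sketch (both function names are mine):

```python
def to_unsigned_byte(v):
    """What ends up in memory when a value in -128..127 is stored in one byte."""
    return v & 0xFF

def to_signed_byte(v):
    """Reinterpret a byte read back as 0..255 as two's complement -128..127."""
    return v - 256 if v > 127 else v

stored = to_unsigned_byte(-20)
print(stored, to_signed_byte(stored))
```

The extra comparison costs time on every read, so it may well eat into the saving from dropping the array. Shifting the values so they are never negative avoids the issue altogether.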

 Posts: 196
 Joined: Tue May 26, 2020 2:32 pm
 Contact:
Re: Drawing lines  BASIC vs machine code
BeebMaster wrote: ↑Thu Jul 23, 2020 8:02 pm
Instead of making an array with DIM, reserve some space instead.
So, instead of DIM X%(10) do DIM X% 10.
Then to refer to X%(1) or X%(10) you would use X%?1 or X%?10 to read the value, or X%(2)= becomes X%?2= etc.
If any value of X% in the array is going to take up more than one byte, DIM more space and search for the correct value as an offset:
jms2 wrote: ↑Thu Jul 23, 2020 8:59 pm
Beebmaster beat me to it. I was going to suggest exactly that. I have been doing a test to see how much faster it is, which was to simply assign a value to variables in an array 1000 times. Compared to using a BASIC array, the saving with indirection operators was 22%.
...
So I think you can only use this method for positive numbers (0-255), unless someone else can suggest a workaround.
I was just coming up with a plan to test this idea.
For starters, I can shift the Y origin, then my array of line tops and bottoms will contain values 0-200 as opposed to 300-500, and will fit in a byte, and no need for an addition calculation.
Regarding the optimisation I've already done by sticking X%(I%) in a variable, i.e. I only read and write to the array once in the loop now, I'm wondering how much of a time saving this new approach will be, but I'm going to try it!
First I'll try the line Y coord array though, that's likely to get the most benefit from this I think, and report back.
Thanks again:)

 Posts: 513
 Joined: Sat Dec 23, 2000 5:56 pm
 Contact:
Re: Drawing lines  BASIC vs machine code
As an aside, supposing it ever becomes a bottleneck, I once checked out the MOS routines versus completely hand-crafted line drawing on an Electron. On that machine the processor accesses ROM more quickly than RAM so if you wanted to reproduce the full swathe of MOS functionality then you'd be at quite a disadvantage.
In practice the hand-crafted line drawing was faster, and I think that was for the fairly obvious reason that the MOS routines have to be able to handle arbitrary screen start addresses, a whole bunch of ways to stipple, check whether they should be including or excluding start and end points, and more. My custom drawer had a fixed start address and geometry, and drew only solid lines.
Re: Drawing lines  BASIC vs machine code
ThomasHarte wrote: ↑Thu Jul 23, 2020 9:35 pm
In practice the hand-crafted line drawing was faster, and I think that was for the fairly obvious reason that the MOS routines have to be able to handle arbitrary screen start addresses, a whole bunch of ways to stipple, check whether they should be including or excluding start and end points, and more. My custom drawer had a fixed start address and geometry, and drew only solid lines.
That's where I would expect the gains to come from - better specialisation, rather than fiddling with exactly how BASIC feeds bytes to OSWRCH.
Re: Drawing lines  BASIC vs machine code
Adam James wrote: ↑Tue Jul 21, 2020 1:36 pm
If people are writing machine code and handling floats, do they write their own routines for e.g. multiplication from scratch, or do they call built-in routines that are already as fast as can be?
I'd be interested to hear of anyone who is using floats from machine code. I thought it was more usual to use only integer arithmetic and if you need a fractional part you effectively use fixed point arithmetic, i.e. integer arithmetic with an implied point at some particular place.
One classic example of that, though nothing to do with graphics, is doing financial calculations in pence and then printing a decimal point two digits from the right when displaying the result. You can have the same implied point in binary.
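To make the implied-point idea concrete, here is a small sketch in Python (chosen only because it is easy to run; none of these names come from any Acorn routine). The first half is the pence example, and the second half puts the point in binary instead, with 8 fractional bits picked arbitrarily for illustration.

```python
def format_pence(pence):
    """Show an integer number of pence with the point two digits from the right."""
    sign = "-" if pence < 0 else ""
    pounds, rest = divmod(abs(pence), 100)
    return f"{sign}{pounds}.{rest:02d}"

# Binary version: 8 fractional bits, so 1.0 is stored as 256
FRAC_BITS = 8
ONE = 1 << FRAC_BITS

def to_fixed(x):
    return int(round(x * ONE))

def fixed_mul(a, b):
    # The raw product carries 16 fractional bits; shift 8 of them away
    return (a * b) >> FRAC_BITS

total = 1999 + 250                      # 19.99 + 2.50, held as whole pence
print(format_pence(total))
print(fixed_mul(to_fixed(1.5), to_fixed(2.25)) / ONE)
```

Addition and subtraction need no adjustment at all; only multiplication and division have to correct for where the point is, which is why this is so cheap on a 6502.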
As for the implementation, Sophie's floating point routines seem to have been optimised for speed but they're also not part of the MOS interface and don't have a published, fixed, calling address.

 Posts: 196
 Joined: Tue May 26, 2020 2:32 pm
 Contact:
Re: Drawing lines  BASIC vs machine code
jms2 wrote: ↑Thu Jul 23, 2020 8:59 pm
Beebmaster beat me to it. I was going to suggest exactly that. I have been doing a test to see how much faster it is, which was to simply assign a value to variables in an array 1000 times. Compared to using a BASIC array, the saving with indirection operators was 22%.
Update: I used this idea for the array of Y tops and bottoms after shifting the origin, and it shaved off 2% time. I then applied it to the S% array and a few other arrays and it's now shaved off nearly 5.5% of time.
I can get more out of it, too: I've yet to apply it to the array of 'deltas' for steering as those values are all less than 255 as well.
Thank you so much you two for your ideas, I'm amazed at what more can be squeezed out of BASIC. I'm going to make a good effort of getting a polished game out of this that people will actually want to play:)
 richardtoohey
 Posts: 4009
 Joined: Thu Dec 29, 2011 5:13 am
 Location: Tauranga, New Zealand
 Contact:

 Posts: 513
 Joined: Sat Dec 23, 2000 5:56 pm
 Contact:
Re: Drawing lines  BASIC vs machine code
Coeus wrote: ↑Thu Jul 23, 2020 10:08 pm
Adam James wrote: ↑Tue Jul 21, 2020 1:36 pm
If people are writing machine code and handling floats, do they write their own routines for e.g. multiplication from scratch, or do they call built-in routines that are already as fast as can be?
I'd be interested to hear of anyone who is using floats from machine code. I thought it was more usual to use only integer arithmetic and if you need a fractional part you effectively use fixed point arithmetic, i.e. integer arithmetic with an implied point at some particular place.
As an awkward segue point, classic Bresenham uses fixed point with an arbitrary base. If it's supposed to draw a line 163 pixels wide and 20 pixels tall then it'll step along the 163 pixels and at each step add 20/163 to a counter, moving up one line every time that amounts to a number greater than 1. Except there's no benefit to actually working out what 20/163 is, so it just adds 20 at every step and moves up one line every time that amounts to a number greater than 163. Which is exactly fixed point with a base of 163.
(Caveat: subject to some doubling to allow an initial bias of 0.5 in order to make things symmetrical, but you get the point)
There's also other-way-around Bresenham which calculates how many pixels until you go up a level and draws horizontal or vertical segments, but you actually have to do a divide to set that up - in the above case you need to work out that 163/20 = 8 remainder 3 so that you know that each run will be 8 pixels, occasionally plus 1, and that 3/20 is the amount to add to your error counter at each step to determine when that extra 1 will occur.
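A sketch of the classic version for the 163 by 20 case, in Python so it can be run on anything. It only handles shallow lines going up and to the right, the function names are mine, and the doubling is the 0.5 bias mentioned in the caveat. It is an illustration of the counter trick, not a claim about how any particular ROM does it.

```python
def bresenham_shallow(dx, dy):
    """Points from (0, 0) to (dx, dy) for 0 <= dy <= dx, using integers only."""
    points = []
    err = dx                # the 0.5 starting bias, in units of 1/(2*dx)
    y = 0
    for x in range(dx + 1):
        points.append((x, y))
        err += 2 * dy       # "add dy/dx", scaled up by 2*dx
        if err >= 2 * dx:   # counter has reached 1.0: move up a line
            err -= 2 * dx
            y += 1
    return points

def run_lengths(points):
    """Number of pixels plotted on each line, bottom to top."""
    runs = {}
    for _, y in points:
        runs[y] = runs.get(y, 0) + 1
    return [runs[y] for y in sorted(runs)]

pts = bresenham_shallow(163, 20)
print(pts[-1])              # should land exactly on the end point
print(run_lengths(pts))     # pixels per line
print(divmod(163, 20))      # the set-up divide the run-based variant needs
```

Apart from the shorter first and last runs (a side effect of the 0.5 bias), every line gets 8 or 9 pixels, which is exactly the "8, occasionally plus 1" that the run-based variant works out in advance with its divide.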

 Posts: 196
 Joined: Tue May 26, 2020 2:32 pm
 Contact:
Re: Drawing lines  BASIC vs machine code
I'm stunned at just how many optimisations I've been able to perform on the main loop of this code. I suppose that says a lot about how poor it was in the first place. The latest optimisation was actually obvious, and more to do with poor thought in the first place. I was doing things like checking INKEY status and doing some calcs in the loop where each pole is updated. They can come before/after the loop. So another 9% time shaved off.
Optimisation is also becoming very addictive! I've now spent 2 days on it, instead of just trying to finish the game today. I think I may be turning into a typical Stardot member and getting more excited by the programming than the games. Heaven help me!
RE machine code and line drawing techniques, I'm very confident I won't need them now. And to be honest I'm relieved, not only have I ducked something very time-consuming and difficult to learn, but a huge part of the fun of this project has been to come up with novel gameplay which takes advantage of things that already exist in the OS, or require very little processing, but can result in an immersive and worthwhile 3D game.
Re: Drawing lines  BASIC vs machine code
ThomasHarte wrote: ↑Thu Jul 23, 2020 11:04 pm
Coeus wrote: ↑Thu Jul 23, 2020 10:08 pm
Adam James wrote: ↑Tue Jul 21, 2020 1:36 pm
If people are writing machine code and handling floats, do they write their own routines for e.g. multiplication from scratch, or do they call built-in routines that are already as fast as can be?
I'd be interested to hear of anyone who is using floats from machine code. I thought it was more usual to use only integer arithmetic and if you need a fractional part you effectively use fixed point arithmetic, i.e. integer arithmetic with an implied point at some particular place.
As an awkward segue point, classic Bresenham uses fixed point with an arbitrary base. If it's supposed to draw a line 163 pixels wide and 20 pixels tall then it'll step along the 163 pixels and at each step add 20/163 to a counter, moving up one line every time that amounts to a number greater than 1. Except there's no benefit to actually working out what 20/163 is, so it just adds 20 at every step and moves up one line every time that amounts to a number greater than 163. Which is exactly fixed point with a base of 163.
(Caveat: subject to some doubling to allow an initial bias of 0.5 in order to make things symmetrical, but you get the point)
There's also other-way-around Bresenham which calculates how many pixels until you go up a level and draws horizontal or vertical segments, but you actually have to do a divide to set that up - in the above case you need to work out that 163/20 = 8 remainder 3 so that you know that each run will be 8 pixels, occasionally plus 1, and that 3/20 is the amount to add to your error counter at each step to determine when that extra 1 will occur.
Is that how the MOS draws?
Re: Drawing lines  BASIC vs machine code
I believe so, but you would have to consult the MOS disassembly to be absolutely certain.
Check out also chapter 18 of ZX81 BASIC PROGRAMMING - on page 121 is an implementation in BASIC of a straight-line drawing algorithm that never made it into the ZX81 ROM, but probably was used for the Spectrum.
If you reprogram the 6845 CRTC for 32 character (MODE 4) lines, then each character row will be exactly 256 bytes long, which will make the maths much easier (and therefore faster).
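To show why the 256-byte row helps, here is the address sum for both layouts as a Python sketch. It assumes the normal MODE 4 arrangement (8 pixels per byte, character cells of 8 bytes stacked vertically, screen starting at &5800, y counted in pixel rows from the top). Keeping &5800 as the base for the narrow screen is my assumption, and the function names are made up.

```python
SCREEN_BASE = 0x5800   # standard MODE 4 screen start; assumed unchanged for the narrow mode

def addr_320(x, y):
    """Byte holding pixel (x, y) in normal MODE 4: 40 cells of 8 bytes per row."""
    return SCREEN_BASE + (y >> 3) * 320 + (x >> 3) * 8 + (y & 7)

def addr_256(x, y):
    """Same for a 32-cell row (x in 0..255): no multiply, just two bytes to fill in."""
    high = (SCREEN_BASE >> 8) + (y >> 3)   # character row goes straight into the high byte
    low = (x & 0xF8) | (y & 7)             # cell offset and line within the cell
    return (high << 8) | low

print(hex(addr_320(8, 8)), hex(addr_256(8, 8)))
```

With 320-byte rows the 6502 has to multiply the row number by 320 (a few shifts and adds, or a table lookup). With 256-byte rows the row number is simply added to the high byte and the low byte is built with an AND and an OR.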
Re: Drawing lines  BASIC vs machine code
Good find. Should be some fun to convert into assembler but there are no divides.
Re: Drawing lines  BASIC vs machine code
Here you go: Some 6502 assembly code to perform division! This is going to be introduced into BCP to replace an earlier effort, which did about the same job only it took about three times as long to do it (listing below). Note that although it can handle up to a 32-bit dividend, the quotient must fit into 16 bits! It works by attempting to subtract the divisor from the dividend in each place value, shifting ones into the quotient and updating the dividend wherever the subtraction leaves a positive difference.
You need six bytes of zero-page workspace; four adjacent bytes for the dividend and the quotient, and two adjacent bytes for the divisor. I've got divr at &70 and divd at &74. If you want the corresponding multiply routine to go with it, just shout!
Code: Select all
\ 32 BIT BY 16 BIT DIVIDE
\
\ DEDICATED TO THE PUBLIC DOMAIN 2020
\ JULIE KIRSTY LOUISE MONTOYA
\ USE * ABUSE * ENJOY * DESTROY * STUDY * SHARE * ADAPT
\
\ We start with an extra left shift on just the bottom bits of the
\ dividend, to get the bit we need to shift into the top bits before we
\ attempt the subtraction. If the top bits of the dividend are equal to
\ or greater than the divisor, C will be 1; meaning we need to update
\ the dividend top half with the difference, and the next bit of the
\ quotient is a 1. Otherwise the next bit of the quotient is a 0.
\ We split the left-shifting so the top half of the dividend is
\ shifted at the beginning of the loop. The final left shift will thus
\ operate on just the low bits, bringing in the units bit of the quotient
\ from the last attempted subtraction. We are basically one bit behind
\ ourselves the whole time.
\ After 16 shifts and subtractions, the high bits of the dividend will
\ contain the remainder and the low bits will contain the quotient.
\ Dividend and divisor must both be positive.
.divide32
LDY#17 \ one more than we need
BNE div32_2 \ do an extra left shift on just bottom bits
.div32_1
ROL divd+2
ROL divd+3
.div32_2
SEC
LDA divd+2
SBC divr
TAX \ stash low byte in X in case we need it
LDA divd+3
SBC divr+1
BCC div32_3
\ update dividend if we had room to subtract
STX divd+2
STA divd+3
.div32_3
ROL divd \ C shifts into divd
ROL divd+1
DEY
BNE div32_1
\ divd, divd+1 now contain quotient
\ divd+2, divd+3 contain remainder
RTS
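For anyone who wants to follow the logic without single-stepping a 6502, here is the same shift-and-subtract idea as a Python model. It is not a line-by-line translation of the listing above (it does not reproduce the "one bit behind" trick), and the names are mine. It just shows that the quotient bits arrive in the low half while the remainder is left in the high half.

```python
def divide32(dividend, divisor):
    """Shift-and-subtract division: 32-bit dividend, 16-bit divisor and quotient.

    Returns (quotient, remainder). Like the assembler routine, it is only
    valid when the quotient fits in 16 bits, i.e. the top half of the
    dividend starts out smaller than the divisor.
    """
    lo = dividend & 0xFFFF          # becomes the quotient
    hi = dividend >> 16             # becomes the remainder
    if divisor <= 0 or hi >= divisor:
        raise ValueError("quotient would not fit in 16 bits")
    for _ in range(16):
        carry = lo >> 15            # top bit of the low half moves up...
        lo = (lo << 1) & 0xFFFF
        hi = (hi << 1) | carry      # ...into the bottom of the high half
        if hi >= divisor:           # room to subtract: next quotient bit is 1
            hi -= divisor
            lo |= 1
    return lo, hi

print(divide32(163, 20))
print(divide32(1000000, 1000))
```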
Re: Drawing lines  BASIC vs machine code
Thanks, but I meant I thought no divides were required in that ZX81 code.
Re: Drawing lines  BASIC vs machine code
Oh, I see what you mean. I didn't actually spot that there are no divisions in that code! Apart from one divide by two, that can be accomplished using LSR on the high byte (catching the bit that falls off the end in the carry) and then ROR on the low byte (which imports the carry into the first bit vacated). And everything else after that is just additions! It makes sense, when you are counting through every pixel position anyway .....
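The two-instruction halving described above, modelled in Python for anyone who wants to check the carry handling. `half16` is a made-up name and the value is passed as separate high and low bytes, as it would sit in memory.

```python
def half16(hi, lo):
    """Halve a 16-bit value held as two bytes, the way LSR hi then ROR lo would."""
    carry = hi & 1                  # LSR: bit 0 of the high byte falls into carry
    hi >>= 1
    carry_out = lo & 1              # ROR: bit 0 of the low byte falls out in turn
    lo = (lo >> 1) | (carry << 7)   # the old carry enters at bit 7
    return hi, lo, carry_out

hi, lo, c = half16(0x03, 0x01)      # &0301 = 769
print(hex(hi), hex(lo), c)          # 769 DIV 2 = 384 = &0180, with 1 left in carry
```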
For absolutely blistering performance, you should look at creating a 256 pixel wide version of MODE 4, or a 128 pixel wide version of MODE 5, by reprogramming the CRTC. Each character row being exactly 256 bytes will shave some time off screen address calculation.