Border tracing Mandelbrot generator for the Beeb

discussion of beeb/electron applications, languages, utils and educational s/w
litwr
Posts: 208
Joined: Sun Jun 12, 2016 8:44 am
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by litwr » Thu Nov 10, 2016 5:31 am

I've just made a Mandelbrot video for the BBC Micro program. The program had worked during about 5 minutes but the video is accelerated. The speed and GUI are very good. :) IMHO it is the best Mandelbrot in the 8-bit world. The version for the 2nd processor may be better than the best Mandelbrot for Amiga or Atari ST. :D
Image

User avatar
hoglet
Posts: 7591
Joined: Sat Oct 13, 2012 6:21 pm
Location: Bristol
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by hoglet » Thu Nov 10, 2016 4:16 pm

litwr wrote:I've just made a Mandelbrot video for the BBC Micro program. The program had worked during about 5 minutes but the video is accelerated. The speed and GUI are very good. :) IMHO it is the best Mandelbrot in the 8-bit world. The version for the 2nd processor may be better than the best Mandelbrot for Amiga or Atari ST. :D
Is this program available for us to try out?

litwr
Posts: 208
Joined: Sun Jun 12, 2016 8:44 am
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by litwr » Thu Nov 10, 2016 6:36 pm

hoglet wrote:Is this program available for us to try out?
It is the main program of this thread! :)

User avatar
hoglet
Posts: 7591
Joined: Sat Oct 13, 2012 6:21 pm
Location: Bristol
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by hoglet » Thu Nov 10, 2016 7:02 pm

litwr wrote:
hoglet wrote:Is this program available for us to try out?
It is the main program of this thread! :)
Ah yes, sorry I was confused for a moment that you had a new program. Just ignore me :D

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Wed Nov 23, 2016 2:44 pm

litwr wrote:I've just made a Mandelbrot video for the BBC Micro program. The program had worked during about 5 minutes but the video is accelerated. The speed and GUI are very good. :) IMHO it is the best Mandelbrot in the 8-bit world. The version for the 2nd processor may be better than the best Mandelbrot for Amiga or Atari ST. :D
Thanks! That's nice to hear. My biggest problem with it now is the precision problem after zooming for the fifth time (which is why zoom gets disabled after that) - this would be fixed by either using more fixed-point bits (currently 29 bits), or by going for some kind of floating-point representation. Neither are great as they will slow things down more. The former would be simpler but would still only increase the number of zoom levels by 4 per extra byte used. The latter would probably allow zooming to a much greater depth, but will add a much greater processing cost. One day I'll have a play with it and see if I can improve the accuracy at high zooms.

By the way, any 2nd processor version would most likely only double the speed (entirely due to the double clock speed of the parasite) - if I could find a way for both CPUs to perform calculations in parallel, this would improve things a bit more, although I expect the synchronisation would probably hit performance.

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Wed Nov 23, 2016 2:50 pm

I think floating point is no win at all for Mandelbrot - the orbits of the points go all over the map. You just need more bits of precision, and you can stick with fixed point.

Edit:
Rich Talbot-Watkins wrote:By the way, any 2nd processor version would most likely only double the speed.
I don't suppose it would help that the modern co-pro is running at 64MHz, or 256MHz?!?

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Wed Nov 23, 2016 2:55 pm

Really? I figured that since the algorithm is basically z2+c iterated until it exceeds 2, the important thing is the number of significant bits you have, not the number of total bits. In cases on the edge of the Mandelbrot set at high zooms, you are potentially iterating many times, and the values themselves might be very small, but what's more important is the number of significant bits of the result. I imagined going to floating point would absolutely be a win, but perhaps I need to think more carefully about that (or just try coding it in C++ and comparing fixed vs floating point results).

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Wed Nov 23, 2016 2:56 pm

BigEd wrote:I don't suppose it would help that the modern co-pro is running at 64MHz, or 256MHz?!?
Well, yeah, there is that. But I consider that to be cheating! :lol:

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Wed Nov 23, 2016 3:06 pm

Rich Talbot-Watkins wrote:Really? I figured that since the algorithm is basically z2+c iterated until it exceeds 2, the important thing is the number of significant bits you have, not the number of total bits. In cases on the edge of the Mandelbrot set at high zooms, you are potentially iterating many times, and the values themselves might be very small, but what's more important is the number of significant bits of the result. I imagined going to floating point would absolutely be a win, but perhaps I need to think more carefully about that (or just try coding it in C++ and comparing fixed vs floating point results).
Hmm, on reflection I'm not sure any more... this calls for experiment, or a bit of Googling, or perhaps some deep thought.

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Wed Nov 23, 2016 3:08 pm

Isn't it true that we're not likely to be zooming in very close to (0,0) but close to somewhere on the boundary, so the magnitude of our numbers is, say, around a half? And yet, when zoomed right in, we need to distinguish the evolution of numbers which differ by very small amounts. So, we do need lots of accuracy, and our magnitudes don't vary so much. Floating point would seem like it's just extra work - what matters is how many bits of precision we have.

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Wed Nov 23, 2016 3:13 pm

Yeah I also just reflected and think you're probably right actually. The problem is that the ranges for c in "interesting" areas go from (-1.5...0.5, +/-1) - so adding c each iteration can immediately increase the number of significant bits needed to accurately hold the result. Floating point will sometimes help, but other times will just truncate precision, in which case we still have the same problem. Going to a 6 or 8 byte fixed-point representation would probably be the first step in getting better results. Also it needs to increase the maximum iteration count, the higher the zoom! Another thing that'll slow it down! #-o

Edit: yes, basically what you said there.

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Wed Nov 23, 2016 3:29 pm

With a normal 3 or 4MHz copro, there would be some advantage in getting the 2MHz host to do a bit of work. We know how to upload a bit of code to the host using OSWORD and to give it control using USERV. We need to remember that the host can't do any I/O while it's busy running our code - including passing on an ESCAPE event - so we need fairly small work packets. How about this:
  • - collect three multiplications which need doing
    - pass one to the host
    - perform the other two
    - collect result from the host
There's a bit of a gotcha there, in that a much faster copro will now be rate-limited by the host's speed. But that's OK if you provide a way to disable this speedup, or can detect this case.

I think computing z² demands two squarings and one multiplication, so that's three roughly-equal bits of work, which is handy.

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Wed Nov 23, 2016 3:59 pm

I decided to do a few rough calculations to see whether the multiplication is still likely the bottleneck. The 32 bit * 32 bit multiply code (see .mul_fixed in the source code) takes a bit less than 800 cycles (compared to the naive way which would be anything from 2200-3200 cycles, depending on how many bits are set in the multiplicand), so this is a lot better, but still the slowest thing in the code!

Squaring is actually a little faster as some of the partial products are identical and so only need to be calculated once.

Your idea for splitting up the three multiplications between the two CPUs is great, in the case that the parasite is about a factor of two faster. How lucky the algorithm works like that!

User avatar
lurkio
Posts: 1779
Joined: Tue Apr 09, 2013 11:30 pm
Location: Doomawangara
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by lurkio » Wed Nov 23, 2016 9:22 pm

Rich Talbot-Watkins wrote:New version! Now it looks like all the outlines are grown in parallel from the outside inwards - not sure if this is more or less visually interesting, but it has lower space requirements, so it's probably here to stay. Changed to a big overscan MODE 2 (184x272)!
Just catching up with this. Very nice! =D>
screenshot.png
Screenshot
:idea:

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Wed Nov 23, 2016 9:34 pm

Rich Talbot-Watkins wrote:Squaring is actually a little faster as some of the partial products are identical and so only need to be calculated once.
I like the idea of squaring being a bit cheaper!

Can you say in a couple of sentences what your approach to multiplication is? I think I can see you're using a number of page-sized lookup tables. (I like the trick of Karatsuba multiplication but I'm not sure I've yet seen it used in real life.)

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Wed Nov 23, 2016 10:24 pm

Yeah! So there's the reasonably well known "difference of squares method". This basically says that:

ab = f(a+b) - f(a-b)

where f(x) = x2/4

You can show this by expanding (a + b)2 and (a - b)2, like this:

(a + b)2 = a2 + b2 + 2ab -> (I)
(a - b)2 = a2 + b2 - 2ab -> (II)

(I) - (II) gives:

4ab = (a + b)2 - (a - b)2

So if we were to build a table of f(x) for x = 0...510, that would allow us to quickly look up f(a + b) for any positive a and b between 0 and 255:

Code: Select all

LDA mul1
STA mulindex
LDY mul2
.mulindex LDA (mulindex),Y  ; (mulindex) points to page aligned multable
How about f(a - b)? There's a lovely little hack which involves creating a new table, this time of g(x) = ((x-255)2)/4.

From this we can say: f(x) = g(x + 255).

So now we have f(a - b) = g( (a + 255) - b).

(255 - b) is just (b EOR &FF), so we can write f(a - b) = g(a + (b EOR &FF) ), and now we have the entire code snippet:

Code: Select all

LDA mul1
STA mulindex1
STA mulindex3
EOR #&FF
STA mulindex2
STA mulindex4

LDY mul2
SEC
LDA (mulindex1),Y ; points to page aligned multable_lo
SBC (mulindex2),Y ; points to page aligned multable255_lo
STA result_lo
LDA (mulindex3),Y ; points to page aligned multable_hi
SBC (mulindex4),Y ; points to page aligned multable255_hi
STA result_hi
The first 6 lines of that are a constant preamble for constant mul1, so we get a further optimization if we want to multiply several values by mul1! That works out an 8 bit * 8 bit = 16 bit multiplication in very few cycles.

How about bigger numbers?

Well this is just about adding together partial products:

(a0 + ma1)(b0 + mb1)
= a0b0 + m(a0b1 + a1b0) + m2(a1b1)
(where m=256)

So, to do 16 bit * 16 bit = 32 bit, it's 4 multiplies, and then add the partial products together.

You can extend that idea to as far as you need, but the number of terms becomes a bit hairy after a while. In the source code, you can see how all these partial products need to get added together. Because each one is 16 bit, they have to be done individually so you don't lose any carries - there's no shortcut, just horrible repetitive code, but it still ends up being quicker than the classic 6502 multiply algorithm.

Hmm, that was longer than a couple of sentences huh! :o

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Wed Nov 23, 2016 10:30 pm

Thanks! That's great... can you see a way to use Karatsuba's observation that only three multiplies are needed - or is it clear that it won't help?

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Wed Nov 23, 2016 10:40 pm

Interesting approach! Unfortunately the problem here is that (a0 + a1) and (b0 + b1) are both 9 bit values, and so won't work as multiplicands with the tables (nor with 6502 registers!). I don't think it's really important though because, in this case, the multiplication operation itself is so quick - particularly if one of the operands has already been set up.

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Wed Nov 23, 2016 10:48 pm

I suppose we'd need at least a BCS and a second byte-indexed table. But an 18-bit result is bit of a squeeze... does it start to make more sense as you move up to 6 byte and to 8 byte operands?

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Wed Nov 23, 2016 10:55 pm

Ooh, this looks even better:
"A three-way Toom–Cook can do a size-3N multiplication for the cost of five size-N multiplications"
So that's 6x6 for the price of five 2x2s.

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Wed Nov 23, 2016 11:00 pm

No, I don't think it wouldn't be worth the extra hassle unfortunately. In the code, I actually calculate and cache all the partial products first as it's quicker to calculate, for example, all the terms which contain a3 one after the other. When I add them all up, I do this in a different order (LSBs first), so I have to have them precalculated so I can access them as I need.

Code: Select all

	; do multiplication with a3
	LDA mulA+3:STA multab1loptr:STA multab1hiptr
	EOR #&FF:STA multab2loptr:STA multab2hiptr

	; a3.b0
	LDY mulB+0	
	LDA (multab1loptr),Y:SBC (multab2loptr),Y:STA a3b0lo
	LDA (multab1hiptr),Y:SBC (multab2hiptr),Y:STA a3b0hi

	; a3.b1	
	LDY mulB+1
	LDA (multab1loptr),Y:SBC (multab2loptr),Y:STA a3b1lo
	LDA (multab1hiptr),Y:SBC (multab2hiptr),Y:STA a3b1hi
	
	; a3.b2	
	LDY mulB+2
	LDA (multab1loptr),Y:SBC (multab2loptr),Y:STA a3b2lo
	LDA (multab1hiptr),Y:SBC (multab2hiptr),Y:STA a3b2hi

	; a3.b3	
	LDY mulB+3
	LDA (multab1loptr),Y:SBC (multab2loptr),Y:STA a3b3lo
	LDA (multab1hiptr),Y:SBC (multab2hiptr),Y:STA a3b3hi
Once it's set up to multiply with a3, each term takes only 29-33 cycles to calculate. Just building the extra terms needed for the Karatsuba method and handling the extra carry issues would take longer than that!

(I just had a look at the Toom-Cook page but it hurt my head before I got halfway down the page. I suspect it might suffer the same problem though if it's a generalisation of Karatsuba.)

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Wed Nov 23, 2016 11:02 pm

It feels like the square law will get you eventually - but eventually might be a lot more than 8 bytes of operand!

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Wed Nov 23, 2016 11:02 pm

Just to add, we're already doing a 9-bit index operation here, by addressing with (table+A),Y! Essentially the index lookup is doing the addition with carry for us!

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Wed Nov 23, 2016 11:20 pm

For a 64 bit * 64 bit multiplication, there'd be 64 terms in total, of which we could discard 21 (which contribute only to underflow terms of the result). That's still 43 multiplications and I frankly hate to think how many 16 bit additions, so it's not going to be quick. For comparison, 32*32 multiplication involves 13 terms and 12 additions, so the quadratic complexity is definitely going to kill.

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Thu Nov 24, 2016 5:51 am

It's nice that you only need an approximate result, so you can avoid doing a fair fraction of the work - about a third for 8x8 bytes, from what you say.

According to the table in this presentation it might be around the 8x8 byte point that Karatsuba starts being the right choice - I'm assuming 190 bits is to be considered about 6 words of 32 bits. Fortunately, perhaps, Toom-Cook isn't worth using until the 16x16 byte point.
Rich Talbot-Watkins wrote:Just to add, we're already doing a 9-bit index operation here, by addressing with (table+A),Y! Essentially the index lookup is doing the addition with carry for us!
That's neat!

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Mon Feb 20, 2017 7:04 pm

Rich Talbot-Watkins wrote:By the way, any 2nd processor version would most likely only double the speed (entirely due to the double clock speed of the parasite) - if I could find a way for both CPUs to perform calculations in parallel, this would improve things a bit more, although I expect the synchronisation would probably hit performance.
Just wondering if you're still thinking about giving this a go? There are more (and faster) second processors in play than ever before, and we need something to show them off!

(The PiTubeDirect has a game-friendly and historically appropriate setting for 3MHz operation, which lines up with a cheesewedge, and the Matchbox has a 4MHz operation which lines up with the Master Turbo. The PiTubeDirect next release will most likely have a selectable speed from 1MHz to 24MHz or full-on.)

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Tue Feb 21, 2017 1:33 pm

BigEd wrote:Just wondering if you're still thinking about giving this a go? There are more (and faster) second processors in play than ever before, and we need something to show them off!
One day I'd love to. But not any time real soon - I'm going through a bit of a "busy period" in work right now...

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Tue Feb 21, 2017 2:11 pm

Fair enough!

User avatar
BigEd
Posts: 2139
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Thu May 24, 2018 12:06 pm

Dave [hoglet] and I just had a bit of a chat about Mandelbrot speedups as a consequence of the 12-second Mandelbrot thread... which now contains a very interesting idea using 13 bit arithmetic and a big table of squares. (I wonder if perhaps the next step on those lines is 24 bit numbers, treated as two 12-bit digits?)

We were talking about ways of using a coprocessor. (It seems Rich is mostly interested in historical machinery, meaning the 3MHz or 4MHz 6502 second processors, which is fair enough, but it might be interesting to make a program which can also get advantage from a much faster Pi-based copro.)

There are two interesting questions, I think: one is how to get code started which cooperates over two CPUs, and the other is how to divide the labour between the two.

Second question first: we can send individual multiplications (or squarings) to the other side, as discussed above. We can then load-balance the multiplications between the two sides according to how fast they make progress. Or we can send a coordinate and an iteration count, like BBC BASIC's MANDEL command, and have the other side do a lot of work in one go - we get less parallelism, but also spend less time moving requests and results around.

As to how to run code, there are probably a few different ways. When a Beeb has a second processor, that's the default place that code will run, for example with *RUN. In the case of Elite and Tube Life, the code on the second processor then uploads some user interface code byte by byte to the host and set its running (in the case of Elite?) or hooks it into the VDU driver (in the case of Tube Life.)

But we had another idea: if the program finds itself running on the second processor, it can *RUN a helper program which has been saved with host-side load and run addresses, and that program will then run on the host. We think it likely that the parasite-side code can also continue running. (If the main program finds itself running on the host, then there is no enabled second processor and it must do all the work itself.)

This second approach might be easier to write and easier to get working. The main program could act as the computational slave to the helper program, which runs the user interface and decides which coordinates need to be iterated on.

User avatar
Rich Talbot-Watkins
Posts: 1365
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Thu May 24, 2018 1:13 pm

Not at all, I'd totally love to see a 200MHz (or whatever) Pi-based copro blitz through a Mandelbrot render on a Beeb! This being the case, you wouldn't even want to try to share the computation work - just have the co-pro doing the calculations, and host doing the plotting (via a custom WRCH implementation). I suspect the host might even still be the bottleneck in this case!

In the case of 3MHz or 4MHz co-pros, the parasite would still be the bottleneck, and would probably yield a ~1.5-2x speedup (essentially just from the higher clock). Alternatively, your parallel approach would work well, in which the parasite calculates zre' (with two multiplies) while the host calculates zim' (with 1 multiply), for maybe a ~2-3x speedup.

Post Reply