Border tracing Mandelbrot generator for the Beeb

discussion of beeb/electron applications, languages, utils and educational s/w
litwr
Posts: 184
Joined: Sun Jun 12, 2016 8:44 am

Re: Border tracing Mandelbrot generator for the Beeb

Postby litwr » Thu Nov 10, 2016 5:31 am

I've just made a Mandelbrot video for the BBC Micro program. The program had worked during about 5 minutes but the video is accelerated. The speed and GUI are very good. :) IMHO it is the best Mandelbrot in the 8-bit world. The version for the 2nd processor may be better than the best Mandelbrot for Amiga or Atari ST. :D
Image

User avatar
hoglet
Posts: 6357
Joined: Sat Oct 13, 2012 6:21 pm
Location: Bristol

Re: Border tracing Mandelbrot generator for the Beeb

Postby hoglet » Thu Nov 10, 2016 4:16 pm

litwr wrote:I've just made a Mandelbrot video for the BBC Micro program. The program had worked during about 5 minutes but the video is accelerated. The speed and GUI are very good. :) IMHO it is the best Mandelbrot in the 8-bit world. The version for the 2nd processor may be better than the best Mandelbrot for Amiga or Atari ST. :D

Is this program available for us to try out?

litwr
Posts: 184
Joined: Sun Jun 12, 2016 8:44 am

Re: Border tracing Mandelbrot generator for the Beeb

Postby litwr » Thu Nov 10, 2016 6:36 pm

hoglet wrote:Is this program available for us to try out?

It is the main program of this thread! :)

User avatar
hoglet
Posts: 6357
Joined: Sat Oct 13, 2012 6:21 pm
Location: Bristol

Re: Border tracing Mandelbrot generator for the Beeb

Postby hoglet » Thu Nov 10, 2016 7:02 pm

litwr wrote:
hoglet wrote:Is this program available for us to try out?

It is the main program of this thread! :)

Ah yes, sorry I was confused for a moment that you had a new program. Just ignore me :D

User avatar
Rich Talbot-Watkins
Posts: 1088
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Border tracing Mandelbrot generator for the Beeb

Postby Rich Talbot-Watkins » Wed Nov 23, 2016 2:44 pm

litwr wrote:I've just made a Mandelbrot video for the BBC Micro program. The program had worked during about 5 minutes but the video is accelerated. The speed and GUI are very good. :) IMHO it is the best Mandelbrot in the 8-bit world. The version for the 2nd processor may be better than the best Mandelbrot for Amiga or Atari ST. :D

Thanks! That's nice to hear. My biggest problem with it now is the precision problem after zooming for the fifth time (which is why zoom gets disabled after that) - this would be fixed by either using more fixed-point bits (currently 29 bits), or by going for some kind of floating-point representation. Neither are great as they will slow things down more. The former would be simpler but would still only increase the number of zoom levels by 4 per extra byte used. The latter would probably allow zooming to a much greater depth, but will add a much greater processing cost. One day I'll have a play with it and see if I can improve the accuracy at high zooms.

By the way, any 2nd processor version would most likely only double the speed (entirely due to the double clock speed of the parasite) - if I could find a way for both CPUs to perform calculations in parallel, this would improve things a bit more, although I expect the synchronisation would probably hit performance.

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Wed Nov 23, 2016 2:50 pm

I think floating point is no win at all for Mandelbrot - the orbits of the points go all over the map. You just need more bits of precision, and you can stick with fixed point.

Edit:
Rich Talbot-Watkins wrote:By the way, any 2nd processor version would most likely only double the speed.

I don't suppose it would help that the modern co-pro is running at 64MHz, or 256MHz?!?

User avatar
Rich Talbot-Watkins
Posts: 1088
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Border tracing Mandelbrot generator for the Beeb

Postby Rich Talbot-Watkins » Wed Nov 23, 2016 2:55 pm

Really? I figured that since the algorithm is basically z2+c iterated until it exceeds 2, the important thing is the number of significant bits you have, not the number of total bits. In cases on the edge of the Mandelbrot set at high zooms, you are potentially iterating many times, and the values themselves might be very small, but what's more important is the number of significant bits of the result. I imagined going to floating point would absolutely be a win, but perhaps I need to think more carefully about that (or just try coding it in C++ and comparing fixed vs floating point results).

User avatar
Rich Talbot-Watkins
Posts: 1088
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Border tracing Mandelbrot generator for the Beeb

Postby Rich Talbot-Watkins » Wed Nov 23, 2016 2:56 pm

BigEd wrote:I don't suppose it would help that the modern co-pro is running at 64MHz, or 256MHz?!?

Well, yeah, there is that. But I consider that to be cheating! :lol:

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Wed Nov 23, 2016 3:06 pm

Rich Talbot-Watkins wrote:Really? I figured that since the algorithm is basically z2+c iterated until it exceeds 2, the important thing is the number of significant bits you have, not the number of total bits. In cases on the edge of the Mandelbrot set at high zooms, you are potentially iterating many times, and the values themselves might be very small, but what's more important is the number of significant bits of the result. I imagined going to floating point would absolutely be a win, but perhaps I need to think more carefully about that (or just try coding it in C++ and comparing fixed vs floating point results).

Hmm, on reflection I'm not sure any more... this calls for experiment, or a bit of Googling, or perhaps some deep thought.

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Wed Nov 23, 2016 3:08 pm

Isn't it true that we're not likely to be zooming in very close to (0,0) but close to somewhere on the boundary, so the magnitude of our numbers is, say, around a half? And yet, when zoomed right in, we need to distinguish the evolution of numbers which differ by very small amounts. So, we do need lots of accuracy, and our magnitudes don't vary so much. Floating point would seem like it's just extra work - what matters is how many bits of precision we have.

User avatar
Rich Talbot-Watkins
Posts: 1088
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Border tracing Mandelbrot generator for the Beeb

Postby Rich Talbot-Watkins » Wed Nov 23, 2016 3:13 pm

Yeah I also just reflected and think you're probably right actually. The problem is that the ranges for c in "interesting" areas go from (-1.5...0.5, +/-1) - so adding c each iteration can immediately increase the number of significant bits needed to accurately hold the result. Floating point will sometimes help, but other times will just truncate precision, in which case we still have the same problem. Going to a 6 or 8 byte fixed-point representation would probably be the first step in getting better results. Also it needs to increase the maximum iteration count, the higher the zoom! Another thing that'll slow it down! #-o

Edit: yes, basically what you said there.

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Wed Nov 23, 2016 3:29 pm

With a normal 3 or 4MHz copro, there would be some advantage in getting the 2MHz host to do a bit of work. We know how to upload a bit of code to the host using OSWORD and to give it control using USERV. We need to remember that the host can't do any I/O while it's busy running our code - including passing on an ESCAPE event - so we need fairly small work packets. How about this:
    - collect three multiplications which need doing
    - pass one to the host
    - perform the other two
    - collect result from the host

There's a bit of a gotcha there, in that a much faster copro will now be rate-limited by the host's speed. But that's OK if you provide a way to disable this speedup, or can detect this case.

I think computing z² demands two squarings and one multiplication, so that's three roughly-equal bits of work, which is handy.

User avatar
Rich Talbot-Watkins
Posts: 1088
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Border tracing Mandelbrot generator for the Beeb

Postby Rich Talbot-Watkins » Wed Nov 23, 2016 3:59 pm

I decided to do a few rough calculations to see whether the multiplication is still likely the bottleneck. The 32 bit * 32 bit multiply code (see .mul_fixed in the source code) takes a bit less than 800 cycles (compared to the naive way which would be anything from 2200-3200 cycles, depending on how many bits are set in the multiplicand), so this is a lot better, but still the slowest thing in the code!

Squaring is actually a little faster as some of the partial products are identical and so only need to be calculated once.

Your idea for splitting up the three multiplications between the two CPUs is great, in the case that the parasite is about a factor of two faster. How lucky the algorithm works like that!

User avatar
lurkio
Posts: 1148
Joined: Tue Apr 09, 2013 11:30 pm
Location: Doomawangara
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby lurkio » Wed Nov 23, 2016 9:22 pm

Rich Talbot-Watkins wrote:New version! Now it looks like all the outlines are grown in parallel from the outside inwards - not sure if this is more or less visually interesting, but it has lower space requirements, so it's probably here to stay. Changed to a big overscan MODE 2 (184x272)!

Just catching up with this. Very nice! =D>

screenshot.png
Screenshot

:idea:

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Wed Nov 23, 2016 9:34 pm

Rich Talbot-Watkins wrote:Squaring is actually a little faster as some of the partial products are identical and so only need to be calculated once.

I like the idea of squaring being a bit cheaper!

Can you say in a couple of sentences what your approach to multiplication is? I think I can see you're using a number of page-sized lookup tables. (I like the trick of Karatsuba multiplication but I'm not sure I've yet seen it used in real life.)

User avatar
Rich Talbot-Watkins
Posts: 1088
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Border tracing Mandelbrot generator for the Beeb

Postby Rich Talbot-Watkins » Wed Nov 23, 2016 10:24 pm

Yeah! So there's the reasonably well known "difference of squares method". This basically says that:

ab = f(a+b) - f(a-b)

where f(x) = x2/4

You can show this by expanding (a + b)2 and (a - b)2, like this:

(a + b)2 = a2 + b2 + 2ab -> (I)
(a - b)2 = a2 + b2 - 2ab -> (II)

(I) - (II) gives:

4ab = (a + b)2 - (a - b)2

So if we were to build a table of f(x) for x = 0...510, that would allow us to quickly look up f(a + b) for any positive a and b between 0 and 255:

Code: Select all

LDA mul1
STA mulindex
LDY mul2
.mulindex LDA (mulindex),Y  ; (mulindex) points to page aligned multable


How about f(a - b)? There's a lovely little hack which involves creating a new table, this time of g(x) = ((x-255)2)/4.

Now we have f(a - b) = g(a + 255 - b).

(255 - b) is just (b EOR &FF), so now we have the entire code snippet:

Code: Select all

LDA mul1
STA mulindex1
STA mulindex3
EOR #&FF
STA mulindex2
STA mulindex4

LDY mul2
SEC
LDA (mulindex1),Y ; points to page aligned multable_lo
SBC (mulindex2),Y ; points to page aligned multable255_lo
STA result_lo
LDA (mulindex3),Y ; points to page aligned multable_hi
SBC (mulindex4),Y ; points to page aligned multable255_hi
STA result_hi

The first 6 lines of that are a constant preamble for constant mul1, so we get a further optimization if we want to multiply several values by mul1! That works out an 8 bit * 8 bit = 16 bit multiplication in very few cycles.

How about bigger numbers?

Well this is just about adding together partial products:

(a0 + ma1)(b0 + mb1)
= a0b0 + m(a0b1 + a1b0) + m2(a1b1)
(where m=256)

So, to do 16 bit * 16 bit = 32 bit, it's 4 multiplies, and then add the partial products together.

You can extend that idea to as far as you need, but the number of terms becomes a bit hairy after a while. In the source code, you can see how all these partial products need to get added together. Because each one is 16 bit, they have to be done individually so you don't lose any carries - there's no shortcut, just horrible repetitive code, but it still ends up being quicker than the classic 6502 multiply algorithm.

Hmm, that was longer than a couple of sentences huh! :o

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Wed Nov 23, 2016 10:30 pm

Thanks! That's great... can you see a way to use Karatsuba's observation that only three multiplies are needed - or is it clear that it won't help?

User avatar
Rich Talbot-Watkins
Posts: 1088
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Border tracing Mandelbrot generator for the Beeb

Postby Rich Talbot-Watkins » Wed Nov 23, 2016 10:40 pm

Interesting approach! Unfortunately the problem here is that (a0 + a1) and (b0 + b1) are both 9 bit values, and so won't work as multiplicands with the tables (nor with 6502 registers!). I don't think it's really important though because, in this case, the multiplication operation itself is so quick - particularly if one of the operands has already been set up.

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Wed Nov 23, 2016 10:48 pm

I suppose we'd need at least a BCS and a second byte-indexed table. But an 18-bit result is bit of a squeeze... does it start to make more sense as you move up to 6 byte and to 8 byte operands?

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Wed Nov 23, 2016 10:55 pm

Ooh, this looks even better:
"A three-way Toom–Cook can do a size-3N multiplication for the cost of five size-N multiplications"
So that's 6x6 for the price of five 2x2s.

User avatar
Rich Talbot-Watkins
Posts: 1088
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Border tracing Mandelbrot generator for the Beeb

Postby Rich Talbot-Watkins » Wed Nov 23, 2016 11:00 pm

No, I don't think it wouldn't be worth the extra hassle unfortunately. In the code, I actually calculate and cache all the partial products first as it's quicker to calculate, for example, all the terms which contain a3 one after the other. When I add them all up, I do this in a different order (LSBs first), so I have to have them precalculated so I can access them as I need.

Code: Select all

   ; do multiplication with a3
   LDA mulA+3:STA multab1loptr:STA multab1hiptr
   EOR #&FF:STA multab2loptr:STA multab2hiptr

   ; a3.b0
   LDY mulB+0   
   LDA (multab1loptr),Y:SBC (multab2loptr),Y:STA a3b0lo
   LDA (multab1hiptr),Y:SBC (multab2hiptr),Y:STA a3b0hi

   ; a3.b1   
   LDY mulB+1
   LDA (multab1loptr),Y:SBC (multab2loptr),Y:STA a3b1lo
   LDA (multab1hiptr),Y:SBC (multab2hiptr),Y:STA a3b1hi
   
   ; a3.b2   
   LDY mulB+2
   LDA (multab1loptr),Y:SBC (multab2loptr),Y:STA a3b2lo
   LDA (multab1hiptr),Y:SBC (multab2hiptr),Y:STA a3b2hi

   ; a3.b3   
   LDY mulB+3
   LDA (multab1loptr),Y:SBC (multab2loptr),Y:STA a3b3lo
   LDA (multab1hiptr),Y:SBC (multab2hiptr),Y:STA a3b3hi

Once it's set up to multiply with a3, each term takes only 29-33 cycles to calculate. Just building the extra terms needed for the Karatsuba method and handling the extra carry issues would take longer than that!

(I just had a look at the Toom-Cook page but it hurt my head before I got halfway down the page. I suspect it might suffer the same problem though if it's a generalisation of Karatsuba.)

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Wed Nov 23, 2016 11:02 pm

It feels like the square law will get you eventually - but eventually might be a lot more than 8 bytes of operand!

User avatar
Rich Talbot-Watkins
Posts: 1088
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Border tracing Mandelbrot generator for the Beeb

Postby Rich Talbot-Watkins » Wed Nov 23, 2016 11:02 pm

Just to add, we're already doing a 9-bit index operation here, by addressing with (table+A),Y! Essentially the index lookup is doing the addition with carry for us!

User avatar
Rich Talbot-Watkins
Posts: 1088
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Border tracing Mandelbrot generator for the Beeb

Postby Rich Talbot-Watkins » Wed Nov 23, 2016 11:20 pm

For a 64 bit * 64 bit multiplication, there'd be 64 terms in total, of which we could discard 21 (which contribute only to underflow terms of the result). That's still 43 multiplications and I frankly hate to think how many 16 bit additions, so it's not going to be quick. For comparison, 32*32 multiplication involves 13 terms and 12 additions, so the quadratic complexity is definitely going to kill.

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Thu Nov 24, 2016 5:51 am

It's nice that you only need an approximate result, so you can avoid doing a fair fraction of the work - about a third for 8x8 bytes, from what you say.

According to the table in this presentation it might be around the 8x8 byte point that Karatsuba starts being the right choice - I'm assuming 190 bits is to be considered about 6 words of 32 bits. Fortunately, perhaps, Toom-Cook isn't worth using until the 16x16 byte point.

Rich Talbot-Watkins wrote:Just to add, we're already doing a 9-bit index operation here, by addressing with (table+A),Y! Essentially the index lookup is doing the addition with carry for us!

That's neat!

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Mon Feb 20, 2017 7:04 pm

Rich Talbot-Watkins wrote:By the way, any 2nd processor version would most likely only double the speed (entirely due to the double clock speed of the parasite) - if I could find a way for both CPUs to perform calculations in parallel, this would improve things a bit more, although I expect the synchronisation would probably hit performance.

Just wondering if you're still thinking about giving this a go? There are more (and faster) second processors in play than ever before, and we need something to show them off!

(The PiTubeDirect has a game-friendly and historically appropriate setting for 3MHz operation, which lines up with a cheesewedge, and the Matchbox has a 4MHz operation which lines up with the Master Turbo. The PiTubeDirect next release will most likely have a selectable speed from 1MHz to 24MHz or full-on.)

User avatar
Rich Talbot-Watkins
Posts: 1088
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Border tracing Mandelbrot generator for the Beeb

Postby Rich Talbot-Watkins » Tue Feb 21, 2017 1:33 pm

BigEd wrote:Just wondering if you're still thinking about giving this a go? There are more (and faster) second processors in play than ever before, and we need something to show them off!

One day I'd love to. But not any time real soon - I'm going through a bit of a "busy period" in work right now...

User avatar
BigEd
Posts: 1392
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Postby BigEd » Tue Feb 21, 2017 2:11 pm

Fair enough!


Return to “software: other”

Who is online

Users browsing this forum: No registered users and 8 guests