12-second Mandelbrot rendering on the BBC Master!

Discuss all aspects of programming here. From 8-bit through to modern architectures.
User avatar
hjalfi
Posts: 117
Joined: Sat May 13, 2017 10:17 pm
Location: Zürich, Switzelrand
Contact:

12-second Mandelbrot rendering on the BBC Master!

Post by hjalfi » Mon May 21, 2018 6:22 pm

Yes, really! Well, no, not really, but, yes really!

Okay, I was distracted, and I decided to have a play with stupidly limited fixed-point arithmetic, and it led to this.

The JSBeeb link is here: https://bbc.godbolt.org/?model=Master&d ... d&autoboot

And the writeup is here (including a link to the code): http://cowlark.com/2018-05-21-bbc-micro-mandelbrot/

User avatar
BigEd
Posts: 1825
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by BigEd » Mon May 21, 2018 7:28 pm

Excellent - thanks for the jsbeeb link! (And thanks for the detailed writeup too. It's interesting to see different tradeoffs in arithmetic.)

User avatar
lcww1
Posts: 230
Joined: Wed Mar 15, 2017 11:16 pm
Location: East of the Sun and West of the Moon
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by lcww1 » Mon May 21, 2018 7:43 pm

Beautiful!
Made me smile :D

User avatar
hjalfi
Posts: 117
Joined: Sat May 13, 2017 10:17 pm
Location: Zürich, Switzelrand
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by hjalfi » Tue May 22, 2018 7:29 pm

I got it down to 4.5 seconds! With some slight problems...
grab.jpg
grab.jpg (9.53 KiB) Viewed 1090 times
That's using reenigne's table-of-squares algorithm, but 2.6 fixed point just doesn't have enough precision to make it work and it's overflowing nearly everywhere. It's still getting the main set more-or-less right, though.
David Given
http://cowlark.com

User avatar
hjalfi
Posts: 117
Joined: Sat May 13, 2017 10:17 pm
Location: Zürich, Switzelrand
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by hjalfi » Wed May 23, 2018 9:26 pm

Sigh. I was feeling all smug there for a moment...

So reenigne came up with an incredibly cunning and impossibly fast Mandelbrot kernel, using 14-point fixed point arithmetic and a 16kB lookup table. After patching it into my renderer, this happened...
grab.png
zoomout.png
Jsbeeb link: https://bbc.godbolt.org/?model=Master&d ... d&autoboot

So, yeah. Now it really is a real Mandelbrot renderer. (That's 32 iterations, too.)
David Given
http://cowlark.com

User avatar
BigEd
Posts: 1825
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by BigEd » Wed May 23, 2018 9:30 pm

Very good indeed - what's the secret sauce?

User avatar
hjalfi
Posts: 117
Joined: Sat May 13, 2017 10:17 pm
Location: Zürich, Switzelrand
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by hjalfi » Wed May 23, 2018 9:57 pm

I'm going to do a proper writeup, because it's really neat, but essentially: there are 16384 possible 14 bit numbers; if we enforce that the LSB is always 0, then there are 8192 possible 14 bit numbers, each of which are 2 apart; the top two bits are unused, so let's mask in $8000; now every possible number is a valid pointer to a unique entry in a 16kB lookup table at 0x8000.

So, every number is a pointer to its own square.

This then allows a relatively standard table-of-squares approach to multiplication. Arithmetic operations are adds, subtractions and squares, all of which preserve the 0 in the LSB; operations which result in a number which might need to be squared require the top two bits to be fixed up; but that's basically all there is to it.

(Although reenigne's original kernel is also supposed to live in zero page and use self-modifying code. The version in my renderer doesn't, because I haven't figured out how to make the assembler do it. Sigh. I miss the BBC Basic assembler.)
David Given
http://cowlark.com

User avatar
BigEd
Posts: 1825
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by BigEd » Thu May 24, 2018 8:31 am

Very nice! (I think I'd call it 13 bits, but still.)

User avatar
Rich Talbot-Watkins
Posts: 1280
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by Rich Talbot-Watkins » Thu May 24, 2018 9:58 am

That's a cool technique!

So, we have:

Zre' = Zre2 - Zim2 + Cre

Zim' = 2 Zre Zim + Cim
       = ( (Zre + Zim)2 - (Zre - Zim)2 ) / 2 + Cim

That way, all the squares come directly from the table.

Did you ever see my take on the Mandelbrot set, here? It uses 32-bit precision, and performs a boundary tracing technique rather than the recursive subdivision that you went for. I wonder if mine would perform better using that trick instead? I used to think mine was fast, but yours just blows it out of the water! (Looking at my source code in retrospect, I can see some obvious low hanging fruit for improving it marginally, but I doubt it'd make a huge difference.)

User avatar
hjalfi
Posts: 117
Joined: Sat May 13, 2017 10:17 pm
Location: Zürich, Switzelrand
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by hjalfi » Thu May 24, 2018 11:05 am

Sadly, I must be absolutely clear that this is no longer my trick! (Although I can still claim responsibility for the recursive subdivision, which is a big help, at least once I fixed the big bug that meant it didn't work properly in my original version.)

How does your boundary tracing version track edges? i.e. data structure management? I'm using video memory as a cache for rendered pixels; mode 2 colour 8, mapped to black so it's invisible, is used to mark unrendered pixels. I'd like to switch to mode 1 for higher resolution but because I need that colour I'd end up with a three-colour image. Might still be worth it, though.

(Also, I had forgotten how nasty pick/plot pixels on the BBC is... that framebuffer format, ugh. Anyone have any pointers to good (i.e. fast) machine code routines?)

(This algorithm would work *really* well on a Z80, with its 16-bit adds/subtracts.)
David Given
http://cowlark.com

User avatar
Rich Talbot-Watkins
Posts: 1280
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by Rich Talbot-Watkins » Thu May 24, 2018 11:43 am

The border tracing is quite straightforward. Instead of using recursion, it maintains a queue of coordinates to process and carries on until it's empty. First thing it does is pushes all pixels at the screen edge to the queue. Then, for each item in the queue:
  • Evaluate the pixel at that position and plot it. Note that the pixel may already be plotted, in which case use the screen as a cache to save reevaluating the pixel.
  • Evaluate all 8 neighbours and plot them (again remembering that their result may already be known from the screen itself). If any neighbour differs from the central pixel, we have found a border, so push this to the queue so we trace along the border.
Once the queue is empty, we have all the borders rendered on screen (the border is actually two-sided, containing both colours). Now we just do a full screen pass to fill in the colour: for any pixel not yet drawn, we just duplicate the pixel directly above, and iterate in columns, top to bottom, left to right.

I use a similar trick to you with special on-screen colour values (given the 4th colour bit in MODE 2 is redundant). Each element is actually a pair of pixels (because I use dithering to get more colours), so the resolution ends up being 184x136 (overscan); then the two top bits have a special meaning - one to denote that the pixel hasn't yet been written, and one to denote that a pixel has already been added to the queue (otherwise we could quickly fill the queue with duplicates). These are distinct because we don't always write new pixels to the queue, if they happen to be the same colour as the central pixel being tested against.

I wrote a little about the multiplication technique that it uses here. It uses fixed point 3:29 values (between -4 and +4), and does an early-out when Zre or Zim are >= 2, before they're squared, as clearly that would already yield Z' >= 4. A cute consequence of this multiplication technique is that it can square a number slightly more quickly than a regular multiplication, because many of the partial products are the same.

Yeah, screen address calculation is a pain. I think I use a table of the start address for each character row, and then add in (y AND 7) + (x AND &FE)*4. Source code is in that link in case you want to see how it all works.

User avatar
BigEd
Posts: 1825
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by BigEd » Thu May 24, 2018 1:17 pm

Rich Talbot-Watkins wrote:That's a cool technique!

So, we have:

Zre' = Zre2 - Zim2 + Cre

Zim' = 2 Zre Zim + Cim
       = ( (Zre + Zim)2 - (Zre - Zim)2 ) / 2 + Cim

That way, all the squares come directly from the table.
Four squarings... it turns out, I think, that three squarings are enough:
The one multiply was being used to calculate zr * zi. If we instead calculate (zr + zi)^2 then we get zr^2 + 2*zr*zi + zi^2. Since we’ve already calculated zr^2 and zi^2 we can just subtract them off

User avatar
Rich Talbot-Watkins
Posts: 1280
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by Rich Talbot-Watkins » Thu May 24, 2018 1:20 pm

Oooh, great trick! Kicking myself for not spotting that when I was writing it up! I can actually use that to improve the performance of my code, substantially I suspect!

User avatar
BigEd
Posts: 1825
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by BigEd » Thu May 24, 2018 1:31 pm

There's another idea which I'm failing to find in my searches- a little like the boundary following, expect it draws a small circle around each computed point. In areas of less detail, it turns out the circle is not so small, of course. The net effect, mathematically justified, is needing to compute fewer points than you paint.

User avatar
BigEd
Posts: 1825
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by BigEd » Thu May 24, 2018 1:34 pm

BTW, for the cost of a couple of base-2 logs, which is almost nothing, surely, in binary, and with the addition of a VideoNULA, it's possible to apply smooth shading to the display. See here
http://linas.org/art-gallery/escape/smooth.html

User avatar
Rich Talbot-Watkins
Posts: 1280
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by Rich Talbot-Watkins » Thu May 24, 2018 2:02 pm

BigEd wrote: Four squarings... it turns out, I think, that three squarings are enough:
The one multiply was being used to calculate zr * zi. If we instead calculate (zr + zi)^2 then we get zr^2 + 2*zr*zi + zi^2. Since we’ve already calculated zr^2 and zi^2 we can just subtract them off
Out of curiosity, I just tried making this change in my original program to see what kind of difference it makes. It now renders the initial screen in 1:45 instead of 1:59 - not exactly the improvement I was expecting, but it's something I guess. And still clearly not 12 seconds :D

Not sure what sneaky tricks could be employed to speed up the multiply while still keeping 32 bit precision.
Attachments
mandel.zip
(9.59 KiB) Downloaded 14 times

User avatar
Rich Talbot-Watkins
Posts: 1280
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by Rich Talbot-Watkins » Thu May 24, 2018 2:09 pm

BigEd wrote:There's another idea which I'm failing to find in my searches- a little like the boundary following, expect it draws a small circle around each computed point. In areas of less detail, it turns out the circle is not so small, of course. The net effect, mathematically justified, is needing to compute fewer points than you paint.
Sounds interesting. Essentially the greatest optimization at this stage is to avoid calculating as many points as possible (particularly those requiring large numbers of iterations, e.g. those in the Mandelbrot set).

I wonder which strategy hits fewer points out of recursive subdivision or boundary tracing? I figured that boundary tracing will only evaluate the border points of the Mandelbrot set, while subdivision will evaluate any which lie on the edge of the block being considered, but maybe there's an overall advantage in the latter approach. Maybe boundary tracing wastes time evaluating neighbours which are more likely to be the same colour as the central element.

User avatar
hjalfi
Posts: 117
Joined: Sat May 13, 2017 10:17 pm
Location: Zürich, Switzelrand
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by hjalfi » Thu May 24, 2018 3:54 pm

Reenigne's kernel (it's at the bottom of https://github.com/davidgiven/bogomande ... mandel.asm, annotations mostly mine) is computing 2xy from (x+y)^2 - x^2 - y^2:

Code: Select all

zr2_plus_zi2 = zr^2 + zi^2
if zr2_plus_zi2 > 4 then bailout
zr_plus_zi = zr + zi
zr2_minus_zi2 = zr^2 - zi^2
zr = zr2_minus_zi2 + cr
zi = zr2_plus_zi2 - zr_plus_zi + ci
I need to port the code from xa65 to the BBC assembler, which is easier to use and much more powerful, and get the kernel into zero page where it belongs (it uses self-modifying code to store one-byte values in the literal arguments of instructions).

Also it got pointed out to me that there's a really simple optimisation which will save tonnes of cycles per pixel for converting pixel coordinates to complex space. I may be able to use a similar trick for calculating screen addresses.

Reenigne said that they hadn't written any 6502 code before, BTW.
David Given
http://cowlark.com

User avatar
hjalfi
Posts: 117
Joined: Sat May 13, 2017 10:17 pm
Location: Zürich, Switzelrand
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by hjalfi » Sat May 26, 2018 10:47 pm

WITNESS THE POWER OF MY FULLY OPERATIONAL... PROGRAM THING.

Image

Now with pan-and-zoom support, Julia sets, actual help text, and three (3) levels of zoom!

The writeup with all the maths in it is here: http://cowlark.com/2018-05-26-bogomandel/ (and a link to the code).

The jsbeeb link is here: https://bbc.godbolt.org/?model=Master&d ... d&autoboot
David Given
http://cowlark.com

User avatar
tricky
Posts: 2457
Joined: Tue Jun 21, 2011 8:25 am
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by tricky » Sun May 27, 2018 4:43 am

Wow, I would never have thought I would have seen that on a beeb bitd.

dp11
Posts: 797
Joined: Sun Aug 12, 2012 8:47 pm
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by dp11 » Sun May 27, 2018 4:55 am

I don't know if you're interested in another micro optimization but

Code: Select all

   
    bcs bailout
    sta zr2_p_zi2_hi

    ; Calculate zr + zi. 

    clc
    lda zr+0            ; A = low(zr) 
   
Because of the bcs we know the carry is clear so no need for the clc.

Inlining mandel as it is only used once saves quite a few cycles.


This is all amazing work.
Last edited by dp11 on Sun May 27, 2018 8:18 am, edited 1 time in total.

User avatar
hjalfi
Posts: 117
Joined: Sat May 13, 2017 10:17 pm
Location: Zürich, Switzelrand
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by hjalfi » Sun May 27, 2018 8:02 am

@dp11: ooh, nice! Every second counts --- that shaves a tenth of a second off the render speed for the default image (1805 px/s instead of 1795).

...you know, I think I can scrape another bit of precision out of this (which would add another zoom layer). I'd need a 32kB lookup table, which would extend from 0x4000 to 0xc000; the framebuffer would go into shadow RAM. The top two bits of the number representation would would be either %01 or %10, so the top bit would be the inverse of the sign bit. That means my fixup would go from:

lda number+1
and #&3f
ora #&80
sta number+1

...to:

lda number+1
asl A
lsr A ; propagate sign bit to top bit
eor #&80 ; flip sign bit
sta number+1

That's two extra cycles. There are three fixups in the kernel, so I'd expect 1785 px/s --- which should be negligable.

Dammit, I thought I'd finished this.
David Given
http://cowlark.com

dp11
Posts: 797
Joined: Sun Aug 12, 2012 8:47 pm
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by dp11 » Sun May 27, 2018 8:19 am

Just edited above post Inlining mandel will save quite a bit of time. After inling try and make .iterator_loop all sit within one page .

Also align the data to a page boundary so no extra cycles are added when accessing the data :

Code: Select all

.pixels_to_zr_lo    skip &100
.pixels_to_zr_hi    skip &100
.pixels_to_zi_lo    skip &100
.pixels_to_zi_hi    skip &100
.col_table_lo       skip &80
.col_table_hi       skip &80
.row_table_lo       skip &100
.row_table_hi       skip &100

User avatar
hjalfi
Posts: 117
Joined: Sat May 13, 2017 10:17 pm
Location: Zürich, Switzelrand
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by hjalfi » Sun May 27, 2018 10:23 am

Yup, works fine! And I chopped off a bit from the integer part and added it to the fractional part, which gives me two extra bits of precision --- you can now zoom all the way down to areas 0.25 across!

Sadly the performance has suffered. The number fixup is more complex than I expected; I ended up with:

Code: Select all

    (lda number+1)
    asl A ; bit 6 -> bit 7 -> N; 2
    clc ; 2
    bmi dont_set_c ; 2.5
    sec ; 2
.dont_set_c
    ror A ; C -> bit 7  ; 2
    (sta number+1)
= 10.5 cycles, up from 4.

So it's now getting 1590 px/s for the top-level image instead of 1800px/s, which is a shame. All I want to do is to set bit7 = !bit6; are there any cunninger ways?

(The jsbeeb link is unchanged --- the new version's uploaded.)

Re alignment and inline etc: the program is really top-heavy; it's massively dominated by the inner loop inside the mandel routine. Optimising things outside that routine has very little effect; I tried aligning the lookup tables and the result was lost in the noise.

reenigne's original version had the entire kernel running in zero page and even more self-modifying code, just to shave a few extra cycles off. I actually don't do that, because I'd have to stomp all over Basic workspace to make room and the shell program with the UI in it is in Basic (for simplicity).
David Given
http://cowlark.com

User avatar
BigEd
Posts: 1825
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by BigEd » Sun May 27, 2018 10:29 am

hjalfi wrote:
Sun May 27, 2018 10:23 am
All I want to do is to set bit7 = !bit6; are there any cunninger ways?
Is bit7 already a known value, like zero for example? Or is it initially the same as bit6?

User avatar
danielj
Posts: 6171
Joined: Thu Oct 02, 2008 4:51 pm
Location: Manchester
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by danielj » Sun May 27, 2018 11:10 am

Love this - I haven't looked at the code, but extending the colour depth to the full 16 in MODE2 will mean that a bit of NULA magic can also be worked!

d.

User avatar
hjalfi
Posts: 117
Joined: Sat May 13, 2017 10:17 pm
Location: Zürich, Switzelrand
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by hjalfi » Sun May 27, 2018 11:17 am

Bit7 isn't known, because it'll get corrupted by the previous computation. The fixup adjusts the number so that it's a pointer to its square in the lookup table: bit6 is the sign bit, so positive numbers go in 0x8000 to 0xc000 (the top two bits are %10) and negative numbers go in 0x4000 to 0x8000 (the top two bits are %01). Adding two positive numbers will therefore reset bit7, but adding a positive number and a negative number will set it, etc. The fixup is necessary to correct this.

I'm on a BBC Master, so I get to use the extra 65C02 instructions, but I'm not sure they'll help. (We don't get the Rockwell and WDC extra instructions, do we?)
David Given
http://cowlark.com

User avatar
tricky
Posts: 2457
Joined: Tue Jun 21, 2011 8:25 am
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by tricky » Sun May 27, 2018 11:30 am

Wasn't there a code option.

Code: Select all

(LDA)
ASL
CMP #128
ROR
EOR #128
(STA)
LDA + 8 cycles if it is correct.
Last edited by tricky on Sun May 27, 2018 11:38 am, edited 1 time in total.

User avatar
danielj
Posts: 6171
Joined: Thu Oct 02, 2008 4:51 pm
Location: Manchester
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by danielj » Sun May 27, 2018 11:32 am

The master is the 65SC12 - so I think it doesn't get the bit instructions :(

d.

User avatar
BigEd
Posts: 1825
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: 12-second Mandelbrot rendering on the BBC Master!

Post by BigEd » Sun May 27, 2018 11:36 am

tricky wrote:
Sun May 27, 2018 11:30 am
Wasn't there a code option.
Still there, now with a picture and tooltip instead of a word: </>

Post Reply