Border tracing Mandelbrot generator for the Beeb

discussion of beeb/electron applications, languages, utils and educational s/w
User avatar
Rich Talbot-Watkins
Posts: 1299
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Thu May 24, 2018 1:18 pm

BigEd wrote:like BBC BASIC's MANDEL command
:shock:

Did I miss a trick here?

Code: Select all

>10 MANDEL -2,1,1,-1
>RUN
Image

User avatar
BigEd
Posts: 1967
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Thu May 24, 2018 1:29 pm

Oh hang on, that does rather more work than I thought!

RobC
Posts: 2224
Joined: Sat Sep 01, 2007 9:41 pm
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by RobC » Thu May 24, 2018 3:02 pm

Rich Talbot-Watkins wrote:This being the case, you wouldn't even want to try to share the computation work - just have the co-pro doing the calculations, and host doing the plotting (via a custom WRCH implementation). I suspect the host might even still be the bottleneck in this case!
Definitely - just let the native ARM core do all the work and fill up the VDU FIFO.

User avatar
hoglet
Posts: 7353
Joined: Sat Oct 13, 2012 6:21 pm
Location: Bristol
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by hoglet » Thu May 24, 2018 4:00 pm

BigEd wrote:Oh hang on, that does rather more work than I thought!
For reference, this is the usage of the Mandel command that I'm familiar with:

Code: Select all

10 MODE 2
20 D% = 16
30 FOR X% = 0 TO 1023 STEP 4
40 FOR Y% = 0 TO 1023 STEP 8
50 MANDEL X% / 1024 * 4 - 2, Y% / 1024 * 4 - 2
60 GCOL C% AND 7
70 POINT Y%, X%
80 NEXT
90 NEXT
As far as I know, it's only present in early versions of ARM Basic, like version 1.00 that ships with the ARM Eval system.

On BeebEm it takes 71.14s.
Last edited by hoglet on Thu May 24, 2018 4:07 pm, edited 1 time in total.

User avatar
Rich Talbot-Watkins
Posts: 1299
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Thu May 24, 2018 4:05 pm

I was being facetious! I honestly had no idea that such a thing existed (and assumed it was some kind of typo or something I didn't understand...).

Now I guess I have to ask - why was that ever included in any version of BBC BASIC? And how was it tokenised?!

User avatar
BigEd
Posts: 1967
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Thu May 24, 2018 4:12 pm

(Note the implicit use of D% and C%)

User avatar
hoglet
Posts: 7353
Joined: Sat Oct 13, 2012 6:21 pm
Location: Bristol
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by hoglet » Thu May 24, 2018 4:12 pm

Rich Talbot-Watkins wrote: Now I guess I have to ask - why was that ever included in any version of BBC BASIC? And how was it tokenised?!
No idea why it was included. Because there was the space?

Looks like the token for MANDEL was &C8 &96:
MANDEL.png
I should also say, this short program originated from Jon Welch.

Dave

User avatar
Rich Talbot-Watkins
Posts: 1299
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Thu May 24, 2018 4:20 pm

Just tried to run B-Em in "ARM evaluation system" mode to see if it worked, but when I try *BASIC, I get:

Code: Select all

Undefined instruction at 80008058
Register dump (stored at &E40) is:
R0  = 00000F00 R1  = 00000001 R2  = 03001570 R3  = 0300156C
R4  = 00000F01 R5  = 00000041 R6  = 83000F04 R7  = 00008000
R8  = 00000F06 R9  = 03001A94 R10 = 00000F06 R11 = 000000C0
R12 = 01000000 R13 = 01000000 R14 = 83000F08 R15 = 80008058
Mode USR flags set: Nzcvif

Finished after 0.14 sec.
:(

Edit: *MEMORYI around that area gives me what looks like a load of junk.

Code: Select all

80008050  SWINE WriteI+":"
80008054  2-STLS C0,[R4],#-624
80008058  SWINV WriteI+3
8000805C  BEQ &02890E7C
80008060  BGT &001082A0
Edit 2: This looks a bit like 6502 code to me. Plenty of A2 (LDX), A9 (LDA), 8D (STA) etc in the hex dump!

User avatar
hoglet
Posts: 7353
Joined: Sat Oct 13, 2012 6:21 pm
Location: Bristol
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by hoglet » Thu May 24, 2018 4:25 pm

Rich Talbot-Watkins wrote:Just tried to run B-Em in "ARM evaluation system" mode to see if it worked, but when I try *BASIC, I get:

Code: Select all

Undefined instruction at 80008058
Register dump (stored at &E40) is:
R0  = 00000F00 R1  = 00000001 R2  = 03001570 R3  = 0300156C
R4  = 00000F01 R5  = 00000041 R6  = 83000F04 R7  = 00008000
R8  = 00000F06 R9  = 03001A94 R10 = 00000F06 R11 = 000000C0
R12 = 01000000 R13 = 01000000 R14 = 83000F08 R15 = 80008058
Mode USR flags set: Nzcvif

Finished after 0.14 sec.
:(
You need to use the disc that includes Arm BASIC (*AB).

I think it's on System Disc 3 here:
https://www.g7jjf.com/acornArm_disc_images.htm

This needs to be mounted in ADFS.

Dave

User avatar
Rich Talbot-Watkins
Posts: 1299
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Rich Talbot-Watkins » Thu May 24, 2018 4:35 pm

It is indeed! Thanks!

User avatar
hoglet
Posts: 7353
Joined: Sat Oct 13, 2012 6:21 pm
Location: Bristol
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by hoglet » Thu May 24, 2018 4:49 pm

Rich Talbot-Watkins wrote:It is indeed! Thanks!
MANDEL was included in ARM BASIC 1.00 (assembled 14th April 1986)

It was not included in the next version I can find: ARM BASIC 1.02 (25th September 1987).

As far as I'm aware, we don't have a disassembly for version 1.00.

Dave

steve3000
Posts: 1874
Joined: Sun Nov 25, 2012 12:43 am
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by steve3000 » Sun May 27, 2018 12:02 am

hoglet wrote:
Thu May 24, 2018 4:49 pm
MANDEL was included in ARM BASIC 1.00 (assembled 14th April 1986)

It was not included in the next version I can find: ARM BASIC 1.02 (25th September 1987).
Just out of interest, I powered up my Arthur 0.3 A310 today and had a look for the MANDEL keyword in it's ARM BASIC v1.00, but it's not present.

That's BBC BASIC V 1.00 (05 June 1987), "ARM BBC BASIC V assembled on 5th June 1987".

So probably only included in the ARM co-pro version of BASIC then?

Coeus
Posts: 900
Joined: Mon Jul 25, 2016 11:05 am
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Coeus » Wed May 30, 2018 6:05 pm

litwr wrote:
Sat Oct 29, 2016 3:15 pm
The converted program

Code: Select all

   10 x=0:y=0:x2=0:y2=0:xy=0:r=0:j=0
   11 DIM d(2,319),r(319),j(199):a=0:b=1:c=2
   12 r0=-1.9:r1=.5:dr=(r1-r0)/320:FOR s%=0 TO 319:r(s%)=r0+dr*s%:NEXT
   13 j0=-0.10:j1=.95:dj=(j1-j0)/200:FOR t%=0 TO 199:j(199-t%)=j0+dj*t%:NEXT
   14 n=30
   15 MODE 4:TIME=0
   16 FOR t%=0 TO 199:j=j(t%):FOR s%=0 TO 319:r=r(s%):x=r:y=j
   17 FOR i=1 TO n:x2=x*x:y2=y*y:IF x2+y2<4 THEN xy=x*y:x=x2-y2+r:y=2*xy+j:NEXT
   18 d(c,s%)=i:IF i<=n THEN:PLOT 69,4*s%,800-4*t%
   19 IF s%<2 OR t%<2 THEN 28
   20 IF POINT(4*s%-4,804-4*t%)=0 THEN 28
   21 m=d(b,s%-1)
   22 IF d(b,s%-2)<m AND d(b,s%)<m THEN 27
   23 IF d(a,s%-2)<m AND d(c,s%)<m THEN 27
   24 IF d(c,s%-2)<m AND d(a,s%)<m THEN 27
   25 IF d(a,s%-1)<m AND d(c,s%-1)<m THEN 27
   26 GOTO 28
   27 PLOT 70,4*s%-4,804-4*t%
   28 NEXT s%:z=a:a=b:b=c:c=z:NEXT:clk=TIME/100:hours=INT(clk/3600):mins=INT(clk/60-hours*60)
   29 REM a$=GET$:IF a$="" THEN 29
   30 PRINT hours;":";mins;"-";INT(clk-hours*3600-mins*60)
Interestingly I tried that in Brandy BASIC and it complains that the NEXT s% on line 28 isn't incrementing the current loop control variable.
litwr wrote:
Sat Oct 29, 2016 3:15 pm
So it should be less than a minute for the fast 2nd processor.
I am not sure about under a minute but that is probably somewhere close. As this is using the proper VDU calls I thought I'd run it in B-Em in Master Turbo mode but with the 2nd proc running much faster and capture the VDU codes being output to a file with *SPOOL. Then I I timed doing a *PRINT of the resulting file. In order to avoid the filing system activity having much effect on the output I used VDFS in B-Em which traps to the host very quickly into the filing system call and makes no attempt to slow things down to 8bit speeds - it executes the I/O operation at host speed. The resulted in a time of 2m28s.

Then it occurred to me that the master has a layer that sits on top of the filing system so the *PRINT may be quicker on a plain BBC B with OS 1.20 so I tried that and got 1m11s.

Here's what's being executed that is not in the VDU driver:

Firstly there's a loop in the implementation of the *PRINT command:

Code: Select all

.chrlp  JSR     OSWRCH
        JSR     OSBGET
        BCS     eof
.found  BIT     &FF
        BPL     chrlp
.gotesc LDA     #&7E
        JSR     OSWRCH
.eof    LDA     #&00
        JMP     OSFIND
Then there's the code behind OSBGET starting with the code to jump to an extended vector:

Code: Select all

FF51: 48          PHA
FF52: 48          PHA
FF53: 48          PHA
FF54: 48          PHA
FF55: 48          PHA
FF56: 08          PHP
FF57: 48          PHA
FF58: 8A          TXA
FF59: 48          PHA
FF5A: 98          TYA
FF5B: 48          PHA
FF5C: BA          TSX
FF5D: A9 FF       LDA #FF
FF5F: 9D 08 01    STA 0108,X
FF62: A9 88       LDA #88
FF64: 9D 07 01    STA 0107,X
FF67: BC 0A 01    LDY 010A,X
FF6A: B9 9D 0D    LDA 0D9D,Y
FF6D: 9D 05 01    STA 0105,X
FF70: B9 9E 0D    LDA 0D9E,Y
FF73: 9D 06 01    STA 0106,X
FF76: A5 F4       LDA F4
FF78: 9D 09 01    STA 0109,X
FF7B: B9 9F 0D    LDA 0D9F,Y
FF7E: 85 F4       STA F4
FF80: 8D 30 FE    STA FE30
FF83: 68          PLA
FF84: A8          TAY
FF85: 68          PLA
FF86: AA          TAX
FF87: 68          PLA
FF88: 40          RTI
The VDFS implementation of BGET:

Code: Select all

8C07: 8D 5F FC    STA FC5F
8C0A: A9 04       LDA #04
8C0C: 8D 5E FC    STA FC5E
8C0F: 60          RTS
and the code to return from an extended vector:

Code: Select all

FF89: 08          PHP
FF8A: 48          PHA
FF8B: 8A          TXA
FF8C: 48          PHA
FF8D: BA          TSX
FF8E: BD 02 01    LDA 0102,X
FF91: 9D 05 01    STA 0105,X
FF94: BD 03 01    LDA 0103,X
FF97: 9D 06 01    STA 0106,X
FF9A: 68          PLA
FF9B: AA          TAX
FF9C: 68          PLA
FF9D: 68          PLA
FF9E: 68          PLA
FF9F: 85 F4       STA F4
FFA1: 8D 30 FE    STA FE30
FFA4: 68          PLA
FFA5: 28          PLP
FFA6: 60          RTS
8A96: B0 09       BCS 8AA1

Coeus
Posts: 900
Joined: Mon Jul 25, 2016 11:05 am
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Coeus » Wed May 30, 2018 6:34 pm

Further to my last message, if I call the VDFS BGET directly thus cutting out the extended vector code the timing drops to 45s.

User avatar
BigEd
Posts: 1967
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Wed May 30, 2018 7:05 pm

Very interesting findings!

litwr
Posts: 194
Joined: Sun Jun 12, 2016 8:44 am
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by litwr » Tue Jun 12, 2018 10:35 am

I am curious what is the frequency of the 2nd CPU that gets us only 45 sec?
BTW Raspberry Pi @700 MHz under RiscOS does BBC Basic Mandelbrot program in less than 5 sec.

litwr
Posts: 194
Joined: Sun Jun 12, 2016 8:44 am
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by litwr » Wed Jun 13, 2018 2:27 am

Coeus wrote:
Wed May 30, 2018 6:05 pm
Interestingly I tried that in Brandy BASIC and it complains that the NEXT s% on line 28 isn't incrementing the current loop control variable.
Basic interpreters allow to have several NEXT for one FOR. This feature is too difficult to support in the modern world of fast compilers...

User avatar
BigEd
Posts: 1967
Joined: Sun Jan 24, 2010 10:24 am
Location: West
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by BigEd » Wed Jun 13, 2018 8:59 am

litwr wrote:
Tue Jun 12, 2018 10:35 am
I am curious what is the frequency of the 2nd CPU that gets us only 45 sec?
I think the answer is here:
B-Em in Master Turbo mode but with the 2nd proc running much faster
(Which is to say, it's in emulation, presumably on a very fast x86.)

Coeus
Posts: 900
Joined: Mon Jul 25, 2016 11:05 am
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Coeus » Wed Jun 13, 2018 4:24 pm

litwr wrote:
Tue Jun 12, 2018 10:35 am
I am curious what is the frequency of the 2nd CPU that gets us only 45 sec?
The 45s was not using a 2nd processor to do the calculations "live" at all.

The idea of this test came from the fact there had been some discussion about parallel execution, i.e. having the 2nd processor do the calculations and have the I/O processor do the drawing and people were talking about a custom protocol over the tube interface with application-specific code running in both processors. I had the idea to explore at what point custom drawing code in the I/O processor would be necessary to improve the performance further compared to using the standard VDU drivers and this gives the answer of 45s. Once your second processor can do the calculations in less than 45s you need custom code in the I/O processor too to gain any further improvement.

So there were too parts of the test:

1. Capture the stream of bytes being fed by the VDU driver with *SPOOL. For that I did use a 2nd processor running very fast, just because I didn't want to wait around for a long time, not because I was interested in the performance of the 2nd processor pe se.

2. Replay that stream of bytes into the VDU driver with as little overhead as possible so the VDU driver completely dominates the result. The detail of how I did that is in my original post.

User avatar
mez
Posts: 39
Joined: Mon Feb 14, 2011 1:58 pm
Location: Yorkshire, UK
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by mez » Tue Jun 26, 2018 6:11 pm

Interesting - it is good to see so many people still pushing the 'boundaries' on the classic Mandelbrot task. I've always had a Windows project on the backburner over the past 25 years (it currently has borked OpenCL support due to Windows 10 Updates, but I'm too busy to fix it!).

Would loop unrolling help anyone that is doing a full block of iterations for any pixel? I used to temporarily store r and i, calculate a chunk of 8 iterations then either bail if max iterations was reached or rewind if it overshot. This saved 8 out of 9 costly condition checks at the expense of the additional code. You don't need to store r squared each time either. Made it about 15% faster I think, because the calculation pipeline is not being broken as much, but on the 6502 you'd probably only be saving the branch cycles.

Something like this (apologies for the C# but you get the drift):

Code: Select all

      static uint MAX = 256;
 
      static uint Calculate(double x, double y)
      {
         uint n = 0;
         double r = x;
         double i = y;
         double sqi, sqr;

         // calculate in chunks
         const uint CHUNK = 9;
         double tr = r, ti = i;
         do
         {
            if ((n += CHUNK) >= MAX)
               return MAX;

            sqi = i * i; i = 2d * i * r + y; r = r * r - sqi + x;
            sqi = i * i; i = 2d * i * r + y; r = r * r - sqi + x;
            sqi = i * i; i = 2d * i * r + y; r = r * r - sqi + x;
            sqi = i * i; i = 2d * i * r + y; r = r * r - sqi + x;
            sqi = i * i; i = 2d * i * r + y; r = r * r - sqi + x;
            sqi = i * i; i = 2d * i * r + y; r = r * r - sqi + x;
            sqi = i * i; i = 2d * i * r + y; r = r * r - sqi + x;
            sqi = i * i; i = 2d * i * r + y; r = r * r - sqi + x;

            sqr = r * r; sqi = i * i;
            if (sqi + sqr >= 4d)
               break;
            i = 2d * i * r + y;
            r = sqr - sqi + x;

            tr = r; ti = i;
         }
         while (true);

         // overshot, so rewind to start of chunk & reach limit properly
         n -= CHUNK;
         r = tr; i = ti;
         do
         {
            sqr = r * r; sqi = i * i;
            if (sqi + sqr >= 4d)
               return n;
            i = 2d * i * r + y;
            r = sqr - sqi + x;
            ++n;
         }
         while (true);
      }
Also, I think I found that the precise ordering of the intensive calculations made a difference too, but again that probably won't mean much on the 6502.

YMMV.
Last edited by mez on Tue Jun 26, 2018 6:31 pm, edited 7 times in total.

reenigne
Posts: 8
Joined: Thu May 24, 2018 11:31 am
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by reenigne » Wed Jun 27, 2018 9:14 pm

mez wrote:
Tue Jun 26, 2018 6:11 pm
Would loop unrolling help anyone that is doing a full block of iterations for any pixel?
I unrolled the iteration loop for my fast 8088 Mandelbrot program, but only for the purpose of getting rid of the final loop branch - I'm still doing the bailout check each iteration (and hence decrementing the iteration counter each time). I haven't measured, but I suspect that the cost of redoing iterations would exceed the cost of the bailout test (especially given the clever optimizations I'm using to reuse work from the bailout test in the actual iterations).

There's also the problem that this code uses fixed-point maths, which will overflow when the numbers get too large (possibly undoing the bailout), so we'd lose about 1 bit of precision for each iteration we unrolled (and we don't have much to spare to begin with).

Soruk
Posts: 98
Joined: Mon Jul 09, 2018 10:31 am
Location: Basingstoke, Hampshire
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Soruk » Mon Jul 09, 2018 5:56 pm

litwr wrote:
Wed Jun 13, 2018 2:27 am
Coeus wrote:
Wed May 30, 2018 6:05 pm
Interestingly I tried that in Brandy BASIC and it complains that the NEXT s% on line 28 isn't incrementing the current loop control variable.
Basic interpreters allow to have several NEXT for one FOR. This feature is too difficult to support in the modern world of fast compilers...
I've got a different mandelbrot that is using nested FOR loops working fine in Brandy. I think the problem stems from line 17, that the NEXT for the FOR i will only be executed if the IF is true - I separated that out to its own line, so it no longer failed with that error, only problem then is it doean't actually do anything, I am not sure where that problem lies.

Coeus
Posts: 900
Joined: Mon Jul 25, 2016 11:05 am
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Coeus » Mon Jul 09, 2018 7:31 pm

Soruk wrote:
Mon Jul 09, 2018 5:56 pm
I've got a different mandelbrot that is using nested FOR loops working fine in Brandy.
It would be interesting to see that to compare. Brandy can obviously do nested for loops but its setting an extra constraint compared to Acorn BASICs that the next needs to match the most recent FOR.
Soruk wrote:
Mon Jul 09, 2018 5:56 pm
I think the problem stems from line 17, that the NEXT for the FOR i will only be executed if the IF is true - I separated that out to its own line, so it no longer failed with that error, only problem then is it doean't actually do anything, I am not sure where that problem lies.
The problem is that doing that breaks the algorithm. That bit of basic is equivalent to this in C:

Code: Select all

for (i = 1; i < n; i++) {
    x2 = x * x;
    y2 = y * y;
    if ((x2 + y2) >= 4)
        break;
    xy = x * y;
    x = x2 - y2 + r;
    y = 2 * xy + j;
 }
note the condition in the break statement is the opposite of the one in BASIC.

It is not trival to re-work that to make the loop complete without breaking something else. The trick suggested in the BBC Micro User Guide of forcing the loop control variable to a value above the upper limit of the loop doesn't work because the following line:

Code: Select all

d(c,s%)=i:IF i<=n THEN:PLOT 69,4*s%,800-4*t%
expects i to have the value it had when ((x2 + y2) >= 4) became true.

Soruk
Posts: 98
Joined: Mon Jul 09, 2018 10:31 am
Location: Basingstoke, Hampshire
Contact:

Re: Border tracing Mandelbrot generator for the Beeb

Post by Soruk » Mon Jul 09, 2018 10:17 pm

Your patch from viewtopic.php?f=54&t=15352 allows the program to run probably as intended. I should compare the output with RPCEmu (the easiest thing I have to hand that's likely to produce near-guaranteed correct output and, with HostFS, easier than trying to faff with making SSD files in Linux or typing the whole thing in by hand), but in the meantime, it now runs in Brandy.

Post Reply