CLOCKSP on a BBC BASIC "emulator" under x64

zolbatar
Posts: 48
Joined: Sat Sep 22, 2018 1:12 pm
Location: Nottingham, UK

Post by zolbatar » Tue Sep 08, 2020 9:23 am

I've written my own BBC BASIC implementation using my own VM and bytecodes (for fun). However, it runs so fast that I'm sure that CLOCKSP is overflowing variables, although it's entirely possible there is an error in my code somewhere. If I add a 1 microsecond pause at every bytecode it gives a result, but if I let it run full speed it locks up and gets stuck in a loop.

Has anybody run CLOCKSP on a machine fast enough to produce really large results? I'd like to rule out that it simply can't handle high MHz figures.

Cheers!
Daryl.
Master 128 with DataCentre and RPi co-pro.
RPi B+ & 3B+ both running RISC OS.
Poorly A4000 (battery damage, partially repaired).

markdryan
Posts: 148
Joined: Sun Aug 20, 2017 11:37 pm

Post by markdryan » Tue Sep 08, 2020 9:39 am

You might be getting a division by zero error. ClockSp was crashing when I ran it through ABC, as ABC actually optimized away some of the FOR loop tests completely; as a consequence they took 0 centiseconds, which led to a division by zero error, IIRC.
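The failure mode described above can be sketched in Python (used here only for illustration; ClockSp itself is BBC BASIC). The helper name `scaled_speed`, its `scale` parameter and the choice to return `None` are all assumptions made for this sketch, not something taken from ClockSp.

```python
def scaled_speed(scale, elapsed_cs):
    """Return scale / elapsed_cs, or None when nothing measurable elapsed.

    Hypothetical helper: it only illustrates guarding the division that
    fails when a timed section takes 0 centiseconds.
    """
    if elapsed_cs <= 0:
        # A loop that was optimised away (or finished within one clock
        # tick) leaves no measurable time, so no speed can be derived.
        return None
    return scale / elapsed_cs


print(scaled_speed(356.0, 4))  # a measurable run
print(scaled_speed(356.0, 0))  # too fast to time: None instead of an error
```

A benchmark could equally repeat the timed section until a non-zero time is recorded; returning `None` is just the simplest way to show the guard.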

Post by zolbatar » Tue Sep 08, 2020 10:03 am

I got it to work; it was overflow. I changed from float to double, which allowed it to run.

Results are:

Code:

BBC BASIC CPU Timing Program
Really real REPEAT loop   114206 MHz
Integer REPEAT loop        66760 MHz
Really real FOR loop       34316 MHz
Integer FOR loop            6532 MHz
Trig/Log test            1250909 MHz
String manipulation        59688 MHz
Procedure call             43608 MHz
GOSUB call                 48993 MHz
Combined Average          224824 MHz

Compared with a 2.00MHz BBC B

BBCSDL does the following on the same machine:

Code:

BBC BASIC CPU Timing Program
Really real REPEAT loop    10098 MHz
Integer REPEAT loop         6476 MHz
Really real FOR loop       36312 MHz
Integer FOR loop           17979 MHz
Trig/Log test             181052 MHz
String manipulation        23242 MHz
Procedure call             28647 MHz
GOSUB call                 18000 MHz
Combined Average           41290 MHz

Compared with a 2.00MHz BBC B

BigEd
Posts: 3340
Joined: Sun Jan 24, 2010 10:24 am
Location: West

Post by BigEd » Tue Sep 08, 2020 10:31 am

Well that seems pretty impressive! I hope you'll share the fruits of your labour...

IanJeffray
Posts: 181
Joined: Sat Jun 06, 2020 3:50 pm

Post by IanJeffray » Tue Sep 08, 2020 11:45 am

Combined average is 224 GHz? That's ... "impressive" :lol:

Richard Russell
Posts: 1658
Joined: Sun Feb 27, 2011 10:35 am
Location: Downham Market, Norfolk

Post by Richard Russell » Tue Sep 08, 2020 12:24 pm

zolbatar wrote:
Tue Sep 08, 2020 10:03 am
> I got it to work, it was overflow. I changed from float to double which allowed it to run.
> Results are:

Impressive, but the average (showing it running about five times faster than BBCSDL) is in contrast with the FOR loops, which are actually executing slower than in BBCSDL!

The trouble with CLOCKSP is that it doesn't weight the different tests according to how likely they are to be important in a real program (I think it was written to benchmark 6502 emulators, with the assumption that the ratio between the different tests would remain roughly constant).
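As a rough illustration of how one test can dominate an unweighted summary, the Python sketch below takes the eight per-test figures from the first results table in this thread and compares a plain arithmetic mean with a geometric mean. Both are stand-ins chosen for this sketch: the thread does not show how CLOCKSP derives its "Combined Average", and the geometric mean is only offered as a contrast, not as something CLOCKSP does.

```python
import math

# Per-test figures (MHz) copied from the first results table in this thread.
results_mhz = {
    "Really real REPEAT loop": 114206,
    "Integer REPEAT loop": 66760,
    "Really real FOR loop": 34316,
    "Integer FOR loop": 6532,
    "Trig/Log test": 1250909,
    "String manipulation": 59688,
    "Procedure call": 43608,
    "GOSUB call": 48993,
}

values = list(results_mhz.values())

# Unweighted arithmetic mean: every test counts equally, so the largest
# figure contributes most of the total.
arithmetic = sum(values) / len(values)

# Geometric mean: averages the logarithms, so one outlier has less pull.
geometric = math.exp(sum(math.log(v) for v in values) / len(values))

# Fraction of the summed figures that comes from the Trig/Log test alone.
trig_share = results_mhz["Trig/Log test"] / sum(values)

print(f"arithmetic mean: {arithmetic:.1f} MHz")
print(f"geometric mean:  {geometric:.1f} MHz")
print(f"Trig/Log share of the sum: {trig_share:.0%}")
```

With these inputs the Trig/Log figure accounts for roughly three quarters of the sum, which is why an unweighted average says little about the FOR loop results.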

Post by zolbatar » Tue Sep 08, 2020 1:57 pm

IanJeffray wrote:
Tue Sep 08, 2020 11:45 am
> Combined average is 224 GHz? That's ... "impressive" :lol:

I may be a decimal place off somewhere: I needed more digits, so I roughly altered it!

The relative ratios should be correct though.

Post by zolbatar » Tue Sep 08, 2020 2:08 pm

Actually, I think the numbers are correct, not off by 100 as I thought. I've double-checked. Are these numbers really that insane?

Post by Richard Russell » Tue Sep 08, 2020 2:31 pm

zolbatar wrote:
Tue Sep 08, 2020 2:08 pm
> are these numbers really that insane?

It's probably more instructive to look at the individual ratios:

Code:

Really real REPEAT loop   11.3 x BBCSDL
Integer REPEAT loop       10.3 x BBCSDL 
Really real FOR loop       0.95 x BBCSDL
Integer FOR loop           0.36 x BBCSDL
Trig/Log test              6.9 x BBCSDL
String manipulation        2.6 x BBCSDL
Procedure call             1.5 x BBCSDL
GOSUB call                 2.7 x BBCSDL

The REPEAT loops seem to be extremely fast and the FOR loops surprisingly slow. Is this what you would expect?
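For anyone wanting to reproduce the per-test comparison, here is a small Python sketch that divides each figure from the first results table by the corresponding BBCSDL figure, both copied from earlier in this thread. The dictionary names and the output layout are choices made for this sketch.

```python
# Figures (MHz) copied from the two results tables earlier in this thread.
zolbatar_vm = {
    "Really real REPEAT loop": 114206,
    "Integer REPEAT loop": 66760,
    "Really real FOR loop": 34316,
    "Integer FOR loop": 6532,
    "Trig/Log test": 1250909,
    "String manipulation": 59688,
    "Procedure call": 43608,
    "GOSUB call": 48993,
}
bbcsdl = {
    "Really real REPEAT loop": 10098,
    "Integer REPEAT loop": 6476,
    "Really real FOR loop": 36312,
    "Integer FOR loop": 17979,
    "Trig/Log test": 181052,
    "String manipulation": 23242,
    "Procedure call": 28647,
    "GOSUB call": 18000,
}

# Ratio above 1 means the new VM reported a higher figure than BBCSDL.
ratios = {name: zolbatar_vm[name] / bbcsdl[name] for name in zolbatar_vm}

for name, ratio in ratios.items():
    print(f"{name:<26}{ratio:6.2f} x BBCSDL")
```

Dividing the largest ratio by the smallest gives the relative REPEAT-versus-FOR gap between the two implementations.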

Post by zolbatar » Tue Sep 08, 2020 2:49 pm

Richard Russell wrote:
Tue Sep 08, 2020 2:31 pm
> The REPEAT loops seem to be extremely fast and the FOR loops surprisingly slow. Is this what you would expect?

This is all relatively new code. I'm converting to a bytecode of my own, which of course means I need to know the types of things at compile time; so even though it's interpreted, supporting something like EVAL would be difficult.

I would expect numeric stuff to be a lot faster, and some things like FOR loops not to be, because I evaluate the TO and STEP expressions on each loop. For reference, here is the bytecode I produce; I think most of it is obvious, as it's just stack-based VM stuff.

The FOR loop could be optimised by creating specialised bytecodes for it, I suppose.

Code:

Line: 100 Statement: 0
-> 100 PRINT"Integer FOR loop        ";:T%=TIME:FOR A%=Z% TO D% STEP B%:NEXT:T%=TIME-T%:PROCp(F*178.00/T%)
[ 0]         Token 0x71                PRINT
[ 1]         Expression Start          
[ 2]         String operand            'Integer FOR loop        '
[ 3]         Expression End            
[ 4]         Semi-colon                ;
[0x000000A7] [0x04]/[0x002D]: PUSH.S   'Integer FOR loop        '
[0x000000A8] [0x56]/[0x0000]: PRINT.S

Line: 100 Statement: 1
[ 0]         Token 0x69                LET
[ 1]         Integer variable          T% (0x5576e546c170)
[ 2]         Expression Start          
[ 3]         Token 0x11                TIME
[ 4]         Expression End            
[0x000000A9] [0x58]/[0x0000]: SYS      TIME
[0x000000AA] [0x09]/[0x0003]: STORE.I  [T%]

Line: 100 Statement: 2
[ 0]         Token 0x63                FOR
[ 1]         Integer variable          A% (0x5576e546bd40)
[ 2]         Expression Start          
[ 3]         Integer variable          Z% (0x5576e546dad0)
[ 4]         Expression End            
[ 5]         Token 0x38                TO
[ 6]         Expression Start          
[ 7]         Integer variable          D% (0x5576e546fa20)
[ 8]         Expression End            
[ 9]         Token 0x08                STEP
[10]         Expression Start          
[11]         Integer variable          B% (0x5576e546e330)
[12]         Expression End            
[0x000000AB] [0x06]/[0x0008]: LOAD.I   [Z%]
[0x000000AC] [0x09]/[0x0001]: STORE.I  [A%]

Line: 100 Statement: 3
[ 0]         Token 0x6D                NEXT
[0x000000AD] [0x06]/[0x0001]: LOAD.I   [A%]
[0x000000AE] [0x06]/[0x000C]: LOAD.I   [B%]
[0x000000AF] [0x3E]/[0x0000]: ADD.I
[0x000000B0] [0x3A]/[0x0000]: DUP.I
[0x000000B1] [0x09]/[0x0001]: STORE.I  [A%]
[0x000000B2] [0x06]/[0x0017]: LOAD.I   [D%]
[0x000000B3] [0x2B]/[0x00AD]: JMP.LE.I [173/0xAD]

Line: 100 Statement: 4
[ 0]         Token 0x69                LET
[ 1]         Integer variable          T% (0x5576e546c170)
[ 2]         Expression Start          
[ 3]         Token 0x11                TIME
[ 4]         Integer variable          T% (0x5576e546c170)
[ 5]         Operator                  -
[ 6]         Expression End            
[0x000000B4] [0x58]/[0x0000]: SYS      TIME
[0x000000B5] [0x06]/[0x0003]: LOAD.I   [T%]
[0x000000B6] [0x45]/[0x0000]: SUB.I
[0x000000B7] [0x09]/[0x0003]: STORE.I  [T%]

Line: 100 Statement: 5
[ 0]         Token 0x72                PROC
[ 1]         Proc or FN Call           p
[ 2]         Expression Start          
[ 3]         Float variable            F (0x5576e546e760)
[ 4]         Real operand              178.000000
[ 5]         Operator                  *
[ 6]         Integer variable          T% (0x5576e546c170)
[ 7]         Operator                  /
[ 8]         Expression End            
[0x000000B8] [0x05]/[0x000E]: LOAD.F   [F]
[0x000000B9] [0x03]/[0x002E]: PUSH.F   178.000000
[0x000000BA] [0x42]/[0x0000]: MUL.F
[0x000000BB] [0x06]/[0x0003]: LOAD.I   [T%]
[0x000000BC] [0x19]/[0x0000]: I.TO.F
[0x000000BD] [0x40]/[0x0000]: DIV.F
[0x000000BE] [0x1A]/[0x0000]: F.TO.I
[0x000000BF] [0x20]/[0x0000]: PAR.I
[0x000000C0] [0x24]/[0x01B6]: CALL     [438/0x1B6] 
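To make the "specialised bytecodes" idea concrete, here is a minimal Python sketch of a stack VM (Python is used only for illustration; the real VM is not shown in this thread). The opcode names LOAD.I, STORE.I, ADD.I, DUP.I and JMP.LE.I are copied from the listing above, but their exact stack behaviour is assumed. FOR.I and NEXT.I are hypothetical fused opcodes invented for this sketch, and a positive STEP is assumed throughout.

```python
def run(program, variables):
    """Execute a list of (opcode, argument) pairs; return the dispatch count."""
    stack, loops = [], []
    pc = executed = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        executed += 1
        if op == "LOAD.I":
            stack.append(variables[arg])
        elif op == "STORE.I":
            variables[arg] = stack.pop()
        elif op == "ADD.I":
            rhs = stack.pop()
            stack.append(stack.pop() + rhs)
        elif op == "DUP.I":
            stack.append(stack[-1])
        elif op == "JMP.LE.I":          # assumed: pop limit, pop value, compare
            limit = stack.pop()
            if stack.pop() <= limit:
                pc = arg
        elif op == "FOR.I":             # hypothetical: capture limit and step once
            var, start, limit, step = arg
            variables[var] = variables[start]
            loops.append((var, variables[limit], variables[step]))
        elif op == "NEXT.I":            # hypothetical: add, test and branch in one go
            var, limit, step = loops[-1]
            variables[var] += step
            if variables[var] <= limit:
                pc = arg
            else:
                loops.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    return executed


def fresh_vars():
    return {"Z%": 0, "D%": 10, "B%": 1, "A%": 0}


# FOR A%=Z% TO D% STEP B%:NEXT using the generic opcodes from the listing.
GENERIC = [
    ("LOAD.I", "Z%"), ("STORE.I", "A%"),
    ("LOAD.I", "A%"), ("LOAD.I", "B%"), ("ADD.I", None), ("DUP.I", None),
    ("STORE.I", "A%"), ("LOAD.I", "D%"), ("JMP.LE.I", 2),
]
# The same loop using the two hypothetical fused opcodes.
FUSED = [("FOR.I", ("A%", "Z%", "D%", "B%")), ("NEXT.I", 1)]

print(run(GENERIC, fresh_vars()), run(FUSED, fresh_vars()))
```

The fused form dispatches one opcode per iteration instead of seven. Whether that translates into a proportional speed-up in a real VM is not something this sketch can show.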

Post by zolbatar » Tue Sep 08, 2020 2:52 pm

For context, I'm not doing this as a "new" BBC BASIC. I did some 3D stuff on RISC OS before, and I wanted to add some stuff to my own BASIC to make writing games more fun; think BlitzBasic or similar.

My 3D stuff is mentioned in the show report https://www.riscository.com/2019/show-r ... ndon-2019/.

I think it would be cool to be able to write something like:

Code:

10 LET S=CREATESHAPE
20 S.ADDVERTEX(x,y,z)
30 RENDER(S)

Although I haven't given it too much thought yet, except that I wanted it to be bytecode-based, so there's no need to distribute source, just the runtime.

Post by Richard Russell » Tue Sep 08, 2020 3:38 pm

zolbatar wrote:
Tue Sep 08, 2020 2:49 pm
> I would expect numeric stuff to be a lot faster, and some things like FOR loops to not be because I evaluate the TO and STEP expressions on each loop.

It's the contrast between FOR and REPEAT that surprises me. In my BASICs, which are all straightforward interpreters (not even the shortcuts that Brandy uses), the NEXT statement is interpreted every time around a FOR loop and the UNTIL statement is interpreted every time around a REPEAT loop. UNTIL has to evaluate an expression which NEXT doesn't, so if anything I would expect FOR...NEXT to be faster than REPEAT...UNTIL, which is what I find in my BASICs. But in yours the REPEAT loop is up to 30-times faster than the FOR loop which 'feels' wrong.

> I think it would be cool to be able to write something like...

As anybody who has listened to me rant on about the subject incessantly over the years will testify, I take the view that extensions to the language can only be justified if the equivalent functionality cannot satisfactorily be obtained from a library. One of BBC BASIC's merits is that it is a 'lean and mean' language, very much like C, with only the most fundamental operations implemented in the core interpreter.

I consider that 3D graphics can perfectly satisfactorily be implemented in a library rather than in the core language. If you are familiar with BBC BASIC for Windows and BBC BASIC for SDL 2.0 you will know that both come with 3D graphics libraries (highly compatible with each other).

But obviously you are free to do anything you wish (short of calling it BBC BASIC; that would require permission from the BBC, which I have).

Post by zolbatar » Tue Sep 08, 2020 4:16 pm

Don’t worry, this is just a pet project and won’t be released. I’ll just keep it for my own personal RISC OS projects...it’s more a proof it can be done than anything.

The FOR and REPEAT thing is definitely a conundrum. I’m surprised how slow the FOR loop is considering how little it’s doing.

I was considering doing an ARM interpreter for the byte code, then I’ll know exactly how little or much it’s doing.

Coeus
Posts: 1759
Joined: Mon Jul 25, 2016 12:05 pm

Post by Coeus » Tue Sep 08, 2020 9:35 pm

zolbatar wrote:
Tue Sep 08, 2020 2:49 pm
> I would expect numeric stuff to be a lot faster, and some things like FOR loops to not be because I evaluate the TO and STEP expressions on each loop. For reference, here is the bytecode I produce, I think most of it is obvious as it's just stack based VM stuff.

That isn't just sub-optimal, it is wrong. The expressions should be evaluated once and those values used throughout. See this example:
[Attachment: for.png]
This is on BASIC 4.
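Since the attachment is not reproduced here, the following Python sketch illustrates the point being made; it is not a transcription of for.png. It contrasts a loop whose limit and step are captured once on entry with one that re-reads them on every pass. Both function names are hypothetical, a positive step is assumed, and the body runs at least once, as in BBC BASIC.

```python
def for_limits_fixed(start, limit, step, body):
    """FOR ... NEXT with limit and step captured once, on entry."""
    var = start
    iterations = 0
    while True:
        body(var)              # the body always runs at least once
        iterations += 1
        var += step
        if var > limit:        # compared against the captured value
            return iterations


def for_limits_reevaluated(start, limit_fn, step_fn, body):
    """The variant described in the quote: TO and STEP re-read on each pass."""
    var = start
    iterations = 0
    while True:
        body(var)
        iterations += 1
        var += step_fn()
        if var > limit_fn():   # sees any change made inside the body
            return iterations


state = {"D%": 5}


def grow_limit(_):
    # Loop body that changes the variable used as the TO limit.
    state["D%"] = 100


fixed = for_limits_fixed(1, state["D%"], 1, grow_limit)

state["D%"] = 5
reevaluated = for_limits_reevaluated(
    1, lambda: state["D%"], lambda: 1, grow_limit
)

print(fixed, reevaluated)
```

The two variants disagree as soon as the body modifies a variable used in the TO or STEP expression, which is why re-evaluation is a correctness issue and not only a speed issue.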

scarybeasts
Posts: 531
Joined: Tue Feb 06, 2018 7:44 am

Post by scarybeasts » Tue Sep 08, 2020 10:29 pm

zolbatar wrote:
Tue Sep 08, 2020 10:03 am
> Trig/Log test 1250909 MHz

Good grief... have you broken into the _tera_hertz range??

I suppose it makes sense.... the 6502 code to do trig / log is a huge number of instructions whereas the modern CPU has a single fast instruction to do it.


Cheers
Chris

Post by zolbatar » Fri Sep 11, 2020 3:10 pm

I've made the FOR loops behave as expected. Trig performance is still slow, probably because this is using FP emulation; and for strings I'm using strcpy etc., which likely aren't as efficient as the hand-coded ARM in BBC BASIC.

Assuming (once again) that the code is running correctly, it now runs the CLOCK3 benchmark faster than ARM BASIC V on RPCEmu.

Image

Post by markdryan » Fri Sep 11, 2020 3:26 pm

IIRC the string test does a / in several places. If floating point is slow in your VM it might be worth replacing the / with a DIV and re-running to see if it makes any difference.

Post by Richard Russell » Fri Sep 11, 2020 4:06 pm

zolbatar wrote:
Fri Sep 11, 2020 3:10 pm
> for strings I'm using strcpy etc.

BBC BASIC's strings can (of course) contain arbitrary binary data, as for example when used to hold graphics sprites, so NUL characters may appear anywhere. Therefore I'm not sure how you can apply strcpy etc.; perhaps you meant memcpy:

Code:

      s$ = STRING$(123, CHR$(0))
      PRINT LEN(s$)
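The difference between NUL-terminated and length-counted handling can be seen from Python with the standard `ctypes` module, which exposes both views of a C buffer. This is only an illustration of the general C behaviour; it says nothing about how the VM discussed here actually stores its strings, and the variable names are choices made for this sketch.

```python
import ctypes

# 123 NUL bytes, mirroring STRING$(123, CHR$(0)) in the BASIC example.
payload = bytes(123)
buf = ctypes.create_string_buffer(payload)

# .value reads up to the first NUL, which is what strlen/strcpy would see.
nul_terminated_view = buf.value

# .raw exposes the whole buffer; slicing by a stored length is what a
# memcpy-style, length-counted string would do.
length_counted_view = buf.raw[:len(payload)]

print(len(nul_terminated_view), len(length_counted_view))

# A NUL in the middle truncates the NUL-terminated view in the same way.
mixed = ctypes.create_string_buffer(b"AB\x00CD")
print(mixed.value, mixed.raw[:5])
```

Keeping an explicit length alongside the bytes is the usual way to hold strings that may contain NUL characters.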

Post by zolbatar » Fri Sep 11, 2020 4:19 pm

I'm currently building a test suite so I can make sure I honour the syntax and behaviour. I've come so far that I nearly forgot the purpose of all this, which was to create an updated BAS135 with my 3D stuff and VFP for the PiTubeDirect. The cross-compile still works though, so I'd better get it tested.

Soruk
Posts: 793
Joined: Mon Jul 09, 2018 11:31 am
Location: Basingstoke, Hampshire

Post by Soruk » Wed Sep 16, 2020 11:53 pm

Just for a bit of fun, jumping on the benchmark bandwagon, I got this in Matrix Brandy BASIC 1.22.8, with a text-mode build on a VirtualBox VM running on a 2014-era Xeon:

Code:

[soruk@CentOSvm8 ~]$ sbrandy ClockSp
BBC BASIC CPU Timing Program
Real REPEAT loop     93181.81MHz
Integer REPEAT loop  48380.56MHz
Real FOR loop       110107.52MHz
Integer FOR loop     40000.00MHz
Trig/Log test       860000.00MHz
String manipulation  84889.14MHz
Procedure call       51536.49MHz
GOSUB call           57729.31MHz
Combined Average    173634.61MHz

Compared to a 2.00MHz BBC B
[soruk@CentOSvm8 ~]$ _
Matrix Brandy BASIC VI (work in progress)
