Benchmarks

SarahWalker
Posts: 1313
Joined: Fri Jan 14, 2005 3:56 pm

Benchmarks

Post by SarahWalker » Sun Aug 30, 2020 3:47 pm

I seem to have ended up going through a good variety of Arcs during lockdown...

So I thought I'd benchmark them!

I am aware I'm probably the only person who finds this interesting.

Anyway, here are some results. See below for machine & benchmark details. If anyone can think of any other interesting benchmarks, please let me know! I had been thinking of POVRay, but that would probably just show the superiority of the FPA.

                        A3000    A3010    A3020    A410/1   R260     A5000a   RPC600   RPC600   RPC700   RPCSA    Mico
                                          24.86MHz                            no VRAM
Dhrystone/sec           5972     -        -        18367    22425    26491    35423    36200    50095    423711   70732
Whetstone/sec           76       -        -        228      2788     310      189      195      309      2623     7649
Main memory read MB/s   15.71    -        -        13.52    21.99    23.32    26.12    28.77    28.02    40.48    47.01
Main memory write MB/s  15.71    -        -        16.79    26.31    25.98    48.64    54.15    53.03    37.37    93.57
CLOCKSP                 49       -        -        204.73   238.18   284.49   331.46   315.15   469.26   3818.01  568.08
CLOCKSP, RMFaster       61.46    -        -        215.28   259.11   313.61   335.06   336.58   473.18   3815.27  613.46
CLOCKSP, BASIC64        -        -        -        -        279.64   -        -        -        -        -        643.00
Mandelbrot              -        -        -        -        438.44   -        -        -        -        -        238.41
Mandelbrot, RMFaster    -        -        -        -        426.20   -        -        -        -        -        198.41
Mandelbrot, BASIC64     -        -        -        -        334.30   -        -        -        -        -        138.58
Doom, low res (fps)     -        -        9.93     6.32     8.65     9.27     12.54    13.09    18.36    40.47    -
Doom (fps)              -        3.24     7.56     5.51     7.25     7.85     9.67     9.97     13.84    45.34    13.67
Reach - Galaxy (fps)    9.53     15.33    -        23.43    29.05    33.67    37.68    38.93    50.46    124.93   72.55
Reach - Tunnel (fps)    4.17     6.45     -        10.73    12.99    15.19    16.84    17.01    25.28    116.36   32.35
(- = no result)

Benchmarks:

Dhrystone/Whetstone/Main memory - synthetic tests from !SICK V1.28. Run in MODE 12 on non-VRAM machines.
CLOCKSP - jgh's CLOCKSP benchmark under BBC BASIC, combined result reported. I ran it with BASIC both in ROM and RMFaster'd into RAM. On the R260 and Mico I also ran BASIC64 to test the FPA. Run in MODE 12 on non-VRAM machines.
Mandelbrot - from https://github.com/markdryan/basic-benc ... /mandelBAS
Doom - Ultimate Doom under Doom+ running with "-timedemo demo1" at 320x240 with 8-bit colour.
Reach - Galaxy - a large number of multiplies in the geometry stage; the render stage performs read-modify-write on video memory.
Reach - Tunnel - full-screen texture mapping; should be evenly balanced between memory and ALU.

Machines:

A3000 - 8 MHz ARM2, 4 MB RAM, Watford A3000 IDE interface, RISC OS 2.00. * result taken from Chris's Acorns

A3010 - Doom and Reach results - 12 MHz ARM250, 4 MB RAM, IanS-made ZIDEFS interface, RISC OS 3.11
Other results - 12 MHz ARM250, 4 MB RAM, Simtec IDEFS interface, RISC OS 3.11 (IanS)

A3020 - ARM250 overclocked to 24.86 MHz (trixster)

A410/1 - 25 MHz ARM3, 8 MB RAM @ 8 MHz, ICS ideA interface, RISC OS 3.11

A420/1 - 36 MHz ARM3, 8 MB RAM @ 8 MHz, Watford IDE podule, RISC OS 3.10 (IanS)

R260 - 26 MHz ARM3 + FPA10, 8 MB RAM @ 12 MHz, Acorn AKA31 SCSI interface, RISC OS 3.11

A5000a - 33 MHz ARM3, 8 MB RAM @ 12 MHz, RISC OS 3.11

RiscPC - 30 MHz ARM610, 40 MHz ARM710, 233 MHz StrongARM, 32 MB RAM, 2 MB VRAM, RISC OS 3.7

Mico - 56 MHz ARM7500FE, 32 MB RAM, RISC OS 4.03
Last edited by SarahWalker on Sat Sep 05, 2020 2:37 pm, edited 3 times in total.

markdryan
Posts: 159
Joined: Sun Aug 20, 2017 11:37 pm

Re: Benchmarks

Post by markdryan » Sun Aug 30, 2020 5:14 pm

SarahWalker wrote:
Sun Aug 30, 2020 3:47 pm
I am aware I'm probably the only person who finds this interesting.
I'm interested, particularly in the FPA performance. Perhaps you could try something like Mandelbrot in normal BASIC and then again in BASIC64 on the machine with the FPA? I'm guessing we're not getting a true picture of the floating point boost provided by the FPA from the ClockSp benchmark, as its final score also includes the results of the integer, string and function call tests. Here's the Mandelbrot benchmark I've been using.

https://github.com/markdryan/basic-benc ... /mandelBAS
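To put numbers on that suspicion, here is a small Python sketch using the R260 and Mico figures that were later added to the results table in the first post. It assumes the Mandelbrot figures are run times (lower is better), which fits every faster configuration posting a smaller number; the thread does not state their units. The names `results` and `basic64_speedups` are illustrative only.

```python
# Figures copied from the results table in the first post.
# Assumption: Mandelbrot figures are run times (lower is better);
# CLOCKSP is a score (higher is better).
results = {
    "R260": {"clocksp": 238.18, "clocksp_basic64": 279.64,
             "mandel": 438.44, "mandel_basic64": 334.30},
    "Mico": {"clocksp": 568.08, "clocksp_basic64": 643.00,
             "mandel": 238.41, "mandel_basic64": 138.58},
}

def basic64_speedups(r):
    """Return (CLOCKSP speedup, Mandelbrot speedup) of BASIC64 over ROM BASIC."""
    clocksp = r["clocksp_basic64"] / r["clocksp"]  # score: new / old
    mandel = r["mandel"] / r["mandel_basic64"]     # time: old / new
    return clocksp, mandel

for machine, r in results.items():
    c, m = basic64_speedups(r)
    print(f"{machine}: CLOCKSP x{c:.2f}, Mandelbrot x{m:.2f}")
```

On these figures Mandelbrot shows a clearly larger BASIC64 gain than CLOCKSP on both machines (about 1.31x vs 1.17x on the R260, 1.72x vs 1.13x on the Mico), which is consistent with the composite score diluting the FP speedup.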

SarahWalker
Posts: 1313
Joined: Fri Jan 14, 2005 3:56 pm

Re: Benchmarks

Post by SarahWalker » Sun Aug 30, 2020 5:38 pm

I'll give that a try, thanks.

In the meantime I've added numbers from a couple of scenes ripped from Reach. In the process of removing VRAM, my current RiscPC got its first taste of blood. I hate that case...

danielj
Posts: 8444
Joined: Thu Oct 02, 2008 5:51 pm
Location: Manchester

Re: Benchmarks

Post by danielj » Sun Aug 30, 2020 6:42 pm

It's quite interesting to see what a bump the 710 was from the 610.

Edit: Although, looking at it, that's probably just the cranked-up clock - I forgot it was 40 MHz vs 30 MHz.

sirbod
Posts: 1133
Joined: Mon Apr 09, 2012 9:44 am
Location: Essex

Re: Benchmarks

Post by sirbod » Sun Aug 30, 2020 9:29 pm

You should post the benchmarks.

I have an A310 with MEMC1, an A4000, an A7000+ and a Kinetic. It would be interesting to see what difference a MEMC1a makes, if someone has a stock A305/A310 with a MEMC1a upgrade.

SarahWalker
Posts: 1313
Joined: Fri Jan 14, 2005 3:56 pm

Re: Benchmarks

Post by SarahWalker » Sun Aug 30, 2020 9:43 pm

!SICK and CLOCKSP are readily available online, Doom+ can be got from your local dealer...

I've attached the Reach benchmarks. Both will run then print an execution time in centiseconds. Galaxy framerate is 436 / (centisecond execution time / 100), tunnel is 896 / (centisecond execution time / 100).
Attachment: reachbench.zip (80.92 KiB)
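The centisecond-to-framerate conversion for the Reach scenes can be written as a small Python function. The constants 436 and 896 come straight from the formulas in the post (presumably the number of frames each scene renders); `reach_fps` and `REACH_FRAMES` are illustrative names.

```python
# Constants from the post: Galaxy uses 436, Tunnel uses 896
# (presumably the number of frames each scene renders).
REACH_FRAMES = {"galaxy": 436, "tunnel": 896}

def reach_fps(scene, centiseconds):
    """Convert a Reach run's centisecond execution time into frames per second."""
    seconds = centiseconds / 100
    return REACH_FRAMES[scene] / seconds

print(reach_fps("galaxy", 1000))  # 43.6
print(reach_fps("tunnel", 1000))  # 89.6
```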

IanJeffray
Posts: 302
Joined: Sat Jun 06, 2020 3:50 pm

Re: Benchmarks

Post by IanJeffray » Sun Aug 30, 2020 10:59 pm

Faster-clocked ARM3 machine:
A420/1 - 36 MHz ARM3, 8 MB RAM @ 8 MHz, Watford IDE podule, RISC OS 3.10
CLOCKSP: 280.94
CLOCKSP, RMFaster: 283.41
Dhrystone/sec: 19768
Whetstone/sec: 280

A3010 results not originally included:
A3010 - 12 MHz ARM250, 4 MB RAM, Simtec IDEFS interface, RISC OS 3.11
CLOCKSP: 51.83
CLOCKSP, RMFaster: 95.86
Dhrystone/sec: 8871
Whetstone/sec: 54

But... how can my A3010 Whetstone be worse than an ARM2 A3000? This is !SICK 1.28 I'm using.

IanJeffray
Posts: 302
Joined: Sat Jun 06, 2020 3:50 pm

Re: Benchmarks

Post by IanJeffray » Sun Aug 30, 2020 11:06 pm

IanJeffray wrote:
Sun Aug 30, 2020 10:59 pm
But... how can my A3010 Whetstone be worse than an ARM2 A3000? This is !SICK 1.28 I'm using.
Ahh, hmmm... with FPEmulator *RMFaster'd, Whetstone now comes in at 104 on the A3010.

SarahWalker
Posts: 1313
Joined: Fri Jan 14, 2005 3:56 pm

Re: Benchmarks

Post by SarahWalker » Mon Aug 31, 2020 7:34 am

My A3000 is running RISC OS 2 with an older FPEmulator, so the FP results there probably aren't directly comparable with RISC OS 3 machines.

Cheers for the A420 results btw, interesting to see the relative impacts of CPU/memory speed. Just to check, this was in MODE 12?

SarahWalker
Posts: 1313
Joined: Fri Jan 14, 2005 3:56 pm

Re: Benchmarks

Post by SarahWalker » Mon Aug 31, 2020 8:01 am

Updated first post. I also tracked down a RISC OS 3 ARM2 w/MEMC1a Whetstone result to replace the blatantly rigged old-FPEmulator-running-from-RAM A3000 result. Given that FPEmulator would have been running from ROM, it's not entirely surprising that ARM2 and ARM250 give basically the same result on this test, confirming that the best route for optimising an A3010/3020/4000 is to *RMFaster everything in sight.

IanJeffray
Posts: 302
Joined: Sat Jun 06, 2020 3:50 pm

Re: Benchmarks

Post by IanJeffray » Tue Sep 01, 2020 12:43 am

SarahWalker wrote:
Mon Aug 31, 2020 7:34 am
Cheers for the A420 results btw, interesting to see the relative impacts of CPU/memory speed. Just to check, this was in MODE 12?
They weren't. Oops. That was MODE 27. But on re-running in MODE 12, I get identical numbers anyway. Except: I also re-ran them without running UniBoot, and the CLOCKSP (non-RMFaster) value is now down to 263. Something in UniBoot must be affecting that. Other timings are unaffected.

I notice you've attached Reach - I'll try that sometime too.

helpful
Posts: 634
Joined: Tue Sep 22, 2009 1:18 pm
Location: London

Re: Benchmarks

Post by helpful » Tue Sep 01, 2020 7:13 pm

Just for fun, a 2GHz Pi4 running BASIC V gives a ClockSp result of 72345MHz :-)

Full results here - https://www.riscosopen.org/forum/forums ... 71?page=19

Bryan.
RISC OS User Group Of London - http://www.rougol.jellybaby.net/
RISC OS London Show - http://www.riscoslondonshow.co.uk/

trixster
Posts: 1065
Joined: Wed May 06, 2015 12:45 pm
Location: York

Re: Benchmarks

Post by trixster » Tue Sep 01, 2020 7:36 pm

How is the Doom+ benchmark actually run, Sarah? I'd like to try it on my overclocked A3020. How do you input the command line options? Is it under Miscellaneous Options?

Ah, disregard, I've sussed it. Needed an edit to the Obey file.

ARM250 clocked at 24.86 MHz (15.06 MIPS according to !Si)
1710 gametics in 7913 realtics, which I think is 7.563 fps (that's with no border, but with the HUD showing, so I think the same as Sarah)

--------------------------------------------

As a comparison, here are some figures for a few Amigas running Ultimate Doom demo3 with two levels of border (3863 gametics), in fps:
Amiga 3000, 68030 25 MHz - 5.278
Amiga CD32 with 68030 50 MHz - 11.86
Amiga 4000 with 68060 96 MHz - 36.37
A3020, ARM250 24.86 MHz - 10.76

SarahWalker
Posts: 1313
Joined: Fri Jan 14, 2005 3:56 pm

Re: Benchmarks

Post by SarahWalker » Tue Sep 01, 2020 9:37 pm

Yep, that's the correct border size. What's the performance in low detail mode? (press F5)

You shouldn't need to modify any obey files, add '-timedemo demo1' in the 'Others' field under miscellaneous options.

For the curious, Doom timedemo performance is (gametics / realtics) x 35.
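The same formula as a Python sketch; 35 is Doom's fixed game tic rate (tics per second). `timedemo_fps` is an illustrative name. Plugging in trixster's A3020 runs from this thread reproduces the reported figures.

```python
TICS_PER_SECOND = 35  # Doom's game logic runs at a fixed 35 tics per second

def timedemo_fps(gametics, realtics):
    """Average frame rate of a -timedemo run: (gametics / realtics) x 35."""
    return gametics / realtics * TICS_PER_SECOND

print(f"{timedemo_fps(1710, 7913):.2f}")  # 7.56  (A3020, normal detail)
print(f"{timedemo_fps(1710, 6023):.2f}")  # 9.94  (A3020, low detail)
```

The low-detail run rounds to 9.94; the 9.93 reported in the thread is the same value truncated rather than rounded.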

For the sake of further comparison, PC Ultimate Doom timedemo demo1 in high res mode, with the HUD showing but no border, gets 10.1 fps on a 486SX/25 with no L2 cache and a reasonable ISA graphics card (Trident TVGA8900D), and 26.5 fps on a 486DX2/66 with 256 KB L2 cache and a very fast VLB graphics card (Tseng ET4000/w32p).

trixster
Posts: 1065
Joined: Wed May 06, 2015 12:45 pm
Location: York

Re: Benchmarks

Post by trixster » Wed Sep 02, 2020 11:52 am

Low detail is 1710 gametics in 6023 realtics = 9.93 fps

SarahWalker
Posts: 1313
Joined: Fri Jan 14, 2005 3:56 pm

Re: Benchmarks

Post by SarahWalker » Sat Sep 05, 2020 2:38 pm

Added in the O/C A3020 results, fleshed out the Mico a bit, also added !SICK main memory results (though IIRC !SICK applies a fudge factor so these shouldn't be taken as gospel).

SarahWalker
Posts: 1313
Joined: Fri Jan 14, 2005 3:56 pm

Re: Benchmarks

Post by SarahWalker » Sat Sep 05, 2020 6:39 pm

!SICK's memory results were annoying me enough for me to do some digging. It's using 12-word LDM/STMs for memory benchmarking, which provoke some suboptimal behaviour in ARM710 and (to a lesser degree) StrongARM, which have 8-word cache lines. For all the other CPUs (which have 4-word cache lines) it should be fine.
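A simplified model of the mismatch, not a simulation of ARM710 or StrongARM timing: it only counts how many cache lines each transfer covers partially when back-to-back block transfers walk through a buffer that starts on a line boundary. `partial_lines` and `survey` are hypothetical names.

```python
def partial_lines(start_word, n_words, line_words):
    """Return how many cache lines (0, 1 or 2) a transfer of n_words words,
    starting at word index start_word, covers only partly."""
    end_word = start_word + n_words              # one past the last word
    first_line = start_word // line_words
    last_line = (end_word - 1) // line_words
    head_partial = start_word % line_words != 0  # starts mid-line
    tail_partial = end_word % line_words != 0    # ends mid-line
    if first_line == last_line:
        # Whole transfer sits inside one line: partial unless it fills the line.
        return int(head_partial or tail_partial)
    return int(head_partial) + int(tail_partial)

def survey(n_words, line_words, transfers=4):
    """Partial-line counts for consecutive transfers, as a benchmark loop issues them."""
    return [partial_lines(i * n_words, n_words, line_words) for i in range(transfers)]

print("12-word LDM, 4-word lines:", survey(12, 4))  # [0, 0, 0, 0]
print("12-word LDM, 8-word lines:", survey(12, 8))  # [1, 1, 1, 1]
print(" 8-word LDM, 8-word lines:", survey(8, 8))   # [0, 0, 0, 0]
```

Twelve-word transfers divide evenly into 4-word lines, but with 8-word lines every one of them starts or ends halfway through a line; 8-word transfers over a line-aligned buffer avoid that.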

IanJeffray
Posts: 302
Joined: Sat Jun 06, 2020 3:50 pm
Contact:

Re: Benchmarks

Post by IanJeffray » Sat Sep 05, 2020 6:57 pm

SarahWalker wrote:
Sat Sep 05, 2020 6:39 pm
ARM710 and (to a lesser degree) StrongARM, which have 8-word cache lines
That's interesting. Does that mean that the 710 and SA also need 8-word rather than quad-word alignment of blocks (for the ultimate performance)?

SarahWalker
Posts: 1313
Joined: Fri Jan 14, 2005 3:56 pm

Re: Benchmarks

Post by SarahWalker » Sat Sep 05, 2020 7:05 pm

Yes, if you're willing to burn the space in cache.


Edit: To be more specific:

If you're not going to use the data again anytime soon, then yes, align to 8 words and try to read 8 words at a time if possible (as this will prevent the CPU being halted during the cache line fetch).
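In address terms, aligning to 8 words means rounding up to a 32-byte boundary. A minimal Python sketch of that arithmetic, assuming the 8-word line size given above; `align_up` and `line_chunks` are illustrative helpers, and the head and tail bytes that fall outside whole lines are simply skipped here rather than handled.

```python
WORD_BYTES = 4
LINE_WORDS = 8                          # ARM710 / StrongARM line size, per the post
LINE_BYTES = LINE_WORDS * WORD_BYTES    # 32 bytes

def align_up(address, alignment=LINE_BYTES):
    """Round address up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (address + alignment - 1) & ~(alignment - 1)

def line_chunks(start, length):
    """Yield the addresses of whole, line-aligned 8-word chunks inside
    [start, start + length). Partial head/tail chunks are skipped (sketch only)."""
    addr = align_up(start)
    while addr + LINE_BYTES <= start + length:
        yield addr
        addr += LINE_BYTES

print(hex(align_up(0x8004)))       # 0x8020 (8-word alignment)
print(hex(align_up(0x8004, 16)))   # 0x8010 (quad-word alignment)
print([hex(a) for a in line_chunks(0x8004, 100)])  # ['0x8020', '0x8040']
```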

If you _are_ likely to want the data again (and there's a decent chance it will still _be_ in the cache by then...) then pack the data as tightly as possible to maximise cache utilisation.

And if you're _storing_, then ARM610/710/7500 will buffer up to 8 words, which are written in a single burst (unlike on MEMC machines, where they are broken into 4-word bursts), and the alignment doesn't matter unless you cross a 1 kB boundary. SA can buffer up to 32 words, but a quick scan of the datasheet suggests that it probably won't do a burst longer than 4 words from the write buffer. Unless the destination address is in the cache, of course...
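The 1 kB condition is plain address arithmetic. This Python sketch only checks whether a run of stores straddles a 1 kB boundary; it does not model the write buffer itself, and `crosses_1k_boundary` is a hypothetical name.

```python
KB = 1024

def crosses_1k_boundary(address, n_words, word_bytes=4):
    """True if n_words stored from address span a 1 kB boundary."""
    last_byte = address + n_words * word_bytes - 1
    return address // KB != last_byte // KB

print(crosses_1k_boundary(0x83E0, 8))  # False: 0x83E0..0x83FF stays inside one 1 kB block
print(crosses_1k_boundary(0x83F0, 8))  # True: 0x83F0..0x840F straddles 0x8400
```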

SarahWalker
Posts: 1313
Joined: Fri Jan 14, 2005 3:56 pm

Re: Benchmarks

Post by SarahWalker » Sun Sep 27, 2020 10:31 am

Ran some POVRay tests yesterday. Results are not at all surprising. If anyone is bored enough to reproduce, I used POVRay 2.2 as that's what I had to hand, with the command line "povray +Iscenes.level3.car +Oout/tga +W160 +H120".

R260, ARM3 at 26 MHz + FPA10 - 11m48s (ylib version was 9m26s)
A5000a, ARM3 at 33 MHz       - 110m16s
RiscPC, ARM610 at 30 MHz     - 105m18s
RiscPC, ARM710 at 40 MHz     - 68m9s
RiscPC, StrongARM at 233 MHz - 8m49s
Mico, ARM7500FE at 56 MHz    - 5m9s (ylib version was 3m30s)
486DX/33                     - 4m57s
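To make the POVRay times easier to compare, this Python sketch converts the (non-ylib) times above into seconds and expresses each as a speedup over the A5000a. The figures are copied from the post; the choice of baseline is arbitrary, and `to_seconds` is an illustrative helper.

```python
import re

# Render times from the post (POVRay 2.2, level3 car scene at 160x120, non-ylib builds).
TIMES = {
    "R260 (ARM3 26 MHz + FPA10)": "11m48s",
    "A5000a (ARM3 33 MHz)": "110m16s",
    "RiscPC (ARM610 30 MHz)": "105m18s",
    "RiscPC (ARM710 40 MHz)": "68m9s",
    "RiscPC (StrongARM 233 MHz)": "8m49s",
    "Mico (ARM7500FE 56 MHz)": "5m9s",
    "486DX/33": "4m57s",
}

def to_seconds(t):
    """Parse a time written like '11m48s' into seconds."""
    m = re.fullmatch(r"(\d+)m(\d+)s", t)
    if m is None:
        raise ValueError(f"unrecognised time: {t!r}")
    return int(m.group(1)) * 60 + int(m.group(2))

baseline = to_seconds(TIMES["A5000a (ARM3 33 MHz)"])
for machine, t in TIMES.items():
    secs = to_seconds(t)
    print(f"{machine:28} {secs:5d} s  x{baseline / secs:5.1f} vs A5000a")
```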
