emulator cycle vs wallclock vs TIME accuracy

dominicbeesley
Posts: 625
Joined: Tue Apr 30, 2013 11:16 am

Post by dominicbeesley » Mon Jun 25, 2018 12:06 pm

Hello,

I was playing with JGH's clocksp benchmark and noticed that the string manipulation test was spending a lot of time doing floating-point divides, so I tweaked it to pre-compute the LEN/2 and LEN/4 constants. I then wanted to re-calibrate the display to a 2.0MHz Beeb, so I gave it a quick run in BeebEm and B-Em [I've no real machine to test on!]

Both seem to give low numbers for all speeds (ignore the string test, that's what I've been tweaking). Does anyone know where this discrepancy comes from? Is it the cycle accuracy of the 6502 emulation, or is the 6522's 1MHz signal being tweaked to make it reflect wall-clock time, or something else?

D

Rich Talbot-Watkins
Posts: 1335
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Post by Rich Talbot-Watkins » Mon Jun 25, 2018 12:29 pm

From memory, I think B-Em scrimps on ADC emulation and doesn't generate and handle as many IRQs as real hardware. Have you compared with jsbeeb to see if you get more reasonable results? Also try turning off the ADC (*FX16) to see if that gives closer results on B-Em. Can't speak at all for BeebEm as I haven't looked at it in years!

Coeus
Posts: 956
Joined: Mon Jul 25, 2016 11:05 am

Post by Coeus » Mon Jun 25, 2018 6:41 pm

I am sure I have seen something to say that the 2MHz reference for this program is with interrupts disabled. If you have them enabled you would expect the result to be slightly under 2MHz.

What I would not rely on in an emulator, without understanding how it works, is the effect of speeding up or slowing down the emulator. In the case of B-Em there is a main loop that does a certain number of CPU cycles and a certain number of simulated clock cycles on various other emulated hardware, and then waits for a timer on the host (the old version "sleeps" for that time; the new one is event-driven and waits for the timer event). To speed up or slow down, the amount of time waited in that loop is adjusted. To relate that back to real hardware, this would be like changing the 16MHz master clock, not the 2MHz CPU clock: the whole machine speeds up or slows down together, so the timer used to give the results in a program like clocksp also speeds up or slows down, i.e. whatever speed the machine runs at, it will always claim to be about 2MHz.
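To make that concrete, here is a rough Python model of such a loop. It is only a sketch of the idea, not B-Em's actual code: the function name, the batch size and the 100Hz timer are all assumptions picked for illustration. The point is that the emulated timer is driven by emulated cycles, so the reported MHz cannot change when only the host-side wait changes.

```python
def run_benchmark(cpu_cycles_needed, speed_factor):
    """Toy emulator main loop: run a batch of cycles, then wait on the host.

    speed_factor scales only the host-side wait (1.0 = 100%, 2.0 = 200%).
    All constants are illustrative assumptions, not values taken from B-Em.
    """
    CPU_HZ = 2_000_000       # emulated CPU clock
    TIMER_HZ = 100           # emulated centisecond timer (what TIME reads)
    BATCH_CYCLES = 40_000    # emulated CPU cycles per pass of the main loop

    # Host time waited per pass; this is the only thing the speed setting changes.
    host_wait = (BATCH_CYCLES / CPU_HZ) / speed_factor

    emulated_cycles = 0
    host_seconds = 0.0
    while emulated_cycles < cpu_cycles_needed:
        emulated_cycles += BATCH_CYCLES
        host_seconds += host_wait

    # The emulated timer advances with emulated cycles, not with host time.
    emulated_ticks = emulated_cycles * TIMER_HZ // CPU_HZ
    reported_mhz = emulated_cycles / (emulated_ticks / TIMER_HZ) / 1e6
    return reported_mhz, host_seconds


for factor in (1.0, 2.0):
    mhz, wall = run_benchmark(96_000_000, factor)
    print(f"speed x{factor}: reports {mhz:.2f}MHz, host time {wall:.0f}s")
```

In this model the reported figure is identical at both settings while the host time halves, which is the same pattern as the 49s vs 25s runs described in this post.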

Just doing some tests here, JsBeeb gives this:
Screenshot from 2018-06-25 19-52-17.png
and took 48s (as timed on my phone). B-Em gave this:
Screenshot from 2018-06-25 20-02-56.png
and took 49s. If I tell B-Em to go at 200% it gives exactly the same result but completes in 25s.

So, when working on the string functions to make the measure more specific to string handling and less dependent on floating-point performance, I would not worry about trying to hit the 2MHz figure, but would make the figure for that test match those from the other parts of the program on the same machine/emulator.

BTW, BASIC 4 gives quite different results:
Screenshot from 2018-06-25 20-11-05.png
Last edited by Coeus on Mon Jun 25, 2018 7:11 pm, edited 3 times in total.

Post by dominicbeesley » Mon Jun 25, 2018 11:02 pm

Thanks both,

Coeus, I'm pretty sure it's not with interrupts totally disabled, i.e. the keyboard wouldn't work to type RUN! However, you seem to be on the right track: selecting "basic hardware only" in BeebEm has brought about something more realistic, so one of the emulated hardware devices must be interrupting enough to eat a couple of % of processor time!
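As a back-of-envelope check on how little interrupt traffic it takes to lose a couple of percent, here is a small Python sketch. The interrupt rate and the cycles spent per interrupt are made-up numbers, not measurements from BeebEm or a real Beeb, and `apparent_mhz` is just a name chosen for this example.

```python
def apparent_mhz(cpu_hz, irqs_per_second, cycles_per_irq):
    """Clock speed a benchmark appears to see when interrupt handling steals cycles."""
    stolen = irqs_per_second * cycles_per_irq
    if stolen >= cpu_hz:
        raise ValueError("interrupt load exceeds the available CPU cycles")
    return (cpu_hz - stolen) / 1e6


# Hypothetical load: 100 interrupts per second at 400 cycles each,
# i.e. 40,000 of 2,000,000 cycles per second, or 2% of the CPU.
print(apparent_mhz(2_000_000, 100, 400))
```

With those assumed figures the benchmark would see about 1.96MHz rather than 2.00MHz, so one extra device raising IRQs is enough to explain a shortfall of that size.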

I've got enough to go on now!

D
Attachments
beebok.png

jgharston
Posts: 3178
Joined: Thu Sep 24, 2009 11:22 am
Location: Whitby/Sheffield

Post by jgharston » Tue Jun 26, 2018 4:46 pm

dominicbeesley wrote:
Mon Jun 25, 2018 12:06 pm
I was playing with JGH's clocksp benchmark and noticed that the string manipulation test was spending a lot of time doing floating point divs so I tweaked it to pre-compute the LEN/2 and LEN/4 constants...
Errr... it's doing that to specifically do lots of string operations, *including* timing the LEN operation. So you've now removed the timing of one of the string operations from the timing test that tests the speed of string operations.

Code: Select all

$ bbcbasic
PDP11 BBC BASIC IV Version 0.25
(C) Copyright J.G.Harston 1989,2005-2015
>_

Post by jgharston » Tue Jun 26, 2018 4:49 pm

Coeus wrote:
Mon Jun 25, 2018 6:41 pm
I am sure I have seen something to say that the 2MHz reference for this program is with interrupts disabled. If you have them enabled you would expect the result to be slightly under 2MHz.
When changing what speed tests ClockSp tests did you actually LIST the program?

Code: Select all

  340 REM This is calibrated against a
  350 REM BBC model B with no second
  360 REM processor, running BASIC II
  370 REM and with almost all interupts
  380 REM turned off using:
  390 REM ?&FE4E=&3F
  400 REM This gives 2.00MHz.
Also, the main utility of ClockSp (and ClockSp4, which is calibrated against BASIC IV on a Master) is to compare the efficiency of different bits of the BASIC interpreter against a base model, for instance seeing that BASIC IV's trig functions are more efficient than BASIC II's, and that the spread of speeds in Z80 BASIC is different to the spread of speeds in ARM BASIC, and to help in pinpointing what bits of code in BASIC or a CPU emulator can be targeted for optimisation. E.g., PDP11 BASIC wasted a CPU instruction in every BASIC instruction by preloading a register which was only needed for about 3% of BASIC instructions, and changing the floating arithmetic gave a speedup similar to the 6502 BASIC II->IV speedup.
Last edited by jgharston on Tue Jun 26, 2018 4:55 pm, edited 1 time in total.

Post by dominicbeesley » Tue Jun 26, 2018 11:08 pm

jgharston wrote:
Tue Jun 26, 2018 4:49 pm
When changing what speed tests ClockSp tests did you actually LIST the program?
Yes, of course I did....I might not have read it _all_ every time though - I've listed it so many times that I've become blind to that part and forgot! :oops: [It doesn't make that much difference on my old test beeb that had little interrupt generating hardware and I've fallen out of the habit!]
jgharston wrote:
Tue Jun 26, 2018 4:46 pm
dominicbeesley wrote:
Mon Jun 25, 2018 12:06 pm
I was playing with JGH's clocksp benchmark and noticed that the string manipulation test was spending a lot of time doing floating point divs so I tweaked it to pre-compute the LEN/2 and LEN/4 constants...
Errr... it's doing that to specifically do lots of string operations, *including* timing the LEN operation. So you've now removed the timing of one of the string operations from the timing test that tests the speed of string operations.
Yes, I get that. I wasn't impugning the very useful CLOCKSP, merely tweaking it temporarily to test my changes to MID$ and compare to a real Beeb. The divides really _do_ swamp any contribution of LEN itself (see below: even integer divides swamp LEN considerably). Plus, for my purposes there's not much in LEN that can be optimized...
jgharston wrote:
Tue Jun 26, 2018 4:49 pm
Also, the main utility of ClockSp (and ClockSp4, which is calibrated against BASIC IV on a Master) is to compare the efficiency of different bits of the BASIC interpreter against a base model, for instance seeing that BASIC IV's trig functions are more efficient than BASIC II's, and that the spread of speeds in Z80 BASIC is different to the spread of speeds in ARM BASIC, and to help in pinpointing what bits of code in BASIC or a CPU emulator can be targeted for optimisation. E.g., PDP11 BASIC wasted a CPU instruction in every BASIC instruction by preloading a register which was only needed for about 3% of BASIC instructions, and changing the floating arithmetic gave a speedup similar to the 6502 BASIC II->IV speedup.
I get that too (it's exactly what I've been using it for). However, the other day when I was playing with the FP routines I noticed that changing the FP divide routine had a considerable knock-on effect on the string figure, whereas changing the string routines didn't have much! [Disappointingly so, after a lot of what I'd thought was clever optimisation.] Now that I've started digging deeper into optimisation I've been setting up more specific benchmarks, then going back to CLOCKSP for a final decision on whether I'm wasting my time. It's also a good quick check that my efforts at size reduction, sometimes at the cost of speed, aren't adverse.

Here is a quick, dirty and not-so-scientific comparison of a simple FOR loop with a simple assignment vs LEN, LEN/4 and LEN DIV 4. It shows that the division swamps the contribution of LEN by approx 1:8 ish; DIV isn't that much better...
divisslowerthanlen.png
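For anyone wanting to repeat this sort of comparison, the arithmetic is just "time each variant, subtract the variant without the extra operation, divide by the loop count". A small Python sketch of that bookkeeping follows; the centisecond timings in it are invented to mirror the rough 1:8 ratio and are not the numbers from the screenshot.

```python
def per_iteration_cost(total_cs, baseline_cs, iterations):
    """Centiseconds per iteration attributable to the operation under test."""
    return (total_cs - baseline_cs) / iterations


# Invented timings in centiseconds for 1000 iterations (not from the screenshot).
ITERATIONS = 1000
baseline = 50        # FOR loop with a plain assignment
with_len = 60        # same loop, assigning LEN(A$)
with_len_div = 140   # same loop, assigning LEN(A$)/4

len_cost = per_iteration_cost(with_len, baseline, ITERATIONS)
div_cost = per_iteration_cost(with_len_div, with_len, ITERATIONS)
ratio = div_cost / len_cost

print(f"LEN: {len_cost:.3f}cs, divide: {div_cost:.3f}cs, ratio about {ratio:.0f}:1")
```

Subtracting the previous variant rather than the empty loop is what isolates the divide from the LEN call it is applied to.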
D

PS: On a positive note, this has led me to discover that there's some kind of problem in the 6809 MOS whereby the sound semaphore is getting stuck on. Fixing this should give another 5% improvement! yay!

Post by Coeus » Mon Jul 09, 2018 7:54 pm

I was trying to make sense of these figures from Brandy BASIC:
Screenshot from 2018-07-09 19-33-46.png
The overall theme seems to be that Brandy on my PC is a bit over 36,000x as fast as a BBC, unless floating point is involved, in which case having hardware floating point makes an even bigger difference. Previously I would have wondered why string processing had sped up so much, but knowing from this thread that the string processing test uses lots of FP explains it perfectly.

Post by dominicbeesley » Tue Jul 10, 2018 10:13 am

That's a lot morer fasterer 8) than I'd have expected. I wonder if that is genuinely that much faster or whether that's pushing the accuracy of the timer or overflowing somewhere?
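On the timer accuracy worry, a rough Python sketch of the quantisation problem: TIME counts centiseconds, so a run that finishes within a handful of ticks can only be timed very coarsely. The durations below are hypothetical, the function names are made up for this example, and it assumes the result is derived from a whole number of ticks, which may not match exactly what clocksp or Brandy does.

```python
import math

TICK_US = 10_000  # one centisecond tick of TIME, in microseconds


def ticks_seen(true_duration_us):
    """Whole timer ticks observed for a run, assuming it starts on a tick boundary."""
    return true_duration_us // TICK_US


def worst_case_relative_error(true_duration_us):
    """Rough upper bound on the relative timing error from one tick of quantisation."""
    ticks = ticks_seen(true_duration_us)
    if ticks == 0:
        return math.inf  # the run finished inside a single tick
    return 1 / ticks


# Hypothetical: a test that takes 10s on a BBC, run 36,000x faster.
fast_run_us = 10_000_000 // 36_000
print(ticks_seen(fast_run_us), worst_case_relative_error(fast_run_us))

# The same test at BBC speed is timed to a small fraction of a percent.
print(ticks_seen(10_000_000), worst_case_relative_error(10_000_000))
```

Under these assumptions the fast run sees no ticks at all, so any figure computed from it would depend on how the benchmark handles a zero or near-zero elapsed time.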
