Electron Memory Contention

Post by ThomasHarte » Tue Mar 15, 2016 12:12 am

paulb wrote:If I understand you correctly here, what you're suggesting may be related to something someone else suggested, although that (generating colour mixes) may have been considered separately from attempting to put different colour information on alternate "625" lines (merely animating different screens quickly enough, regardless of where they end up precisely on the 625-line display) particularly as I think it was one of the US machines that was the subject of the technique (and so may have been using a different kind of display altogether).
Yep, two suggestions:
  • take advantage of the effective lowpass filter on chrominance if using UHF or modified-machine colour composite to invent extra colours.
  • as you're able to detect whether the machine is outputting the odd or even field, produce a genuine interlaced display.
It has subsequently struck me that the field that starts slightly higher must be the one that runs slightly longer because the cathode ray has to sweep the full height of the display and a little bit more to get to the next field, whereas from the field that starts slightly lower to the next requires not quite a full sweep.
paulb wrote:No, I don't think so. One of davidb's objectives was to eliminate the kind of judder that the GIF exhibits.
I meant before it was eliminated, the ROM-with-a-picture-in-it being the most recent thing I can run on the version of my code that interlaces. Though, caveat: I've attempted to do better than every so often repeating a frame twice to map from 50Hz to your display's output rate and, right now, am using a terrible camera model. Which amplifies judder. But the line at the top vanishing and appearing and ditto for the one at the bottom are hopefully appropriate?

(EDIT: I can see the extra green bar at the bottom on your "... on TV" picture, which I'm assuming to be real hardware rather than an emulator?)

Update on Joe Blade: I just had an error in my fast loading code and hadn't bothered trying it with real tape emulation for a while. With real tape emulation it works perfectly. It relies on the converse case to Northern Star and Southern Belle: that tape input will also signal the transmit data empty interrupt, but only in a very trivial way: it relies upon the transmit data empty bit being set in the interrupt status register after it has completed loading, to get past an initial check. An anti-piracy measure, at a guess, to prevent loading from disk. Subsequently it's probably just using the 50Hz and display end interrupts normally, though it appears to use a polling loop rather than actually enabling interrupts.

Post by davidb » Tue Mar 15, 2016 1:36 am

ThomasHarte wrote:(EDIT: I can see the extra green bar at the bottom on your "... on TV" picture, which I'm assuming to be real hardware rather than an emulator?)
That's correct. Initially, it was hit and miss whether the palette entries would be synchronised with the top of the display. Later versions aim to do better but may not always succeed.

The code to handle the two fields can be found here, using this subroutine to perform measurement of half-scanlines. It disables interrupts and assumes that measurements like these are enough to distinguish between the two fields.

Post by hoglet » Tue Mar 15, 2016 8:02 am

paulb wrote: Meanwhile, hoglet's Electron FPGA needs to get this kind of thing right, which I think it mostly does.
I would be very interested in knowing the correct timing of the interrupts wrt. the sync signal.

At the moment I'm generating:
- the RTC interrupt on line 100 out of 0...311/312
- the Display interrupt on line 201 out of 0..311/312

Both of these change a few pixels after the leading edge of the hsync pulse at the end of that line.

Line 0 is the first line of the active display.

I don't think I ever measured an actual electron, which I might now do.

Dave

Post by ThomasHarte » Tue Mar 15, 2016 1:18 pm

hoglet wrote:
paulb wrote: Meanwhile, hoglet's Electron FPGA needs to get this kind of thing right, which I think it mostly does.
I would be very interested in knowing the correct timing of the interrupts wrt. the sync signal.

At the moment I'm generating:
- the RTC interrupt on line 100 out of 0...311/312
- the Display interrupt on line 201 out of 0..311/312

Both of these change a few pixels after the leading edge of the hsync pulse at the end of that line.

Line 0 is the first line of the active display.

I don't think I ever measured an actual electron, which I might now do.

Dave
Line 201, not 256? And wouldn't both be relative to the 0..256 range?

Am I right to think you're taking hsync to start 15 2MHz cycles after the end of pixels? The whole line being (i) 33 cycles, including hsync, space for the back porch and then the left border; (ii) 80 cycles of pixels; (iii) 15 cycles of the right border, which doubles as the front porch?

Post by hoglet » Fri Mar 18, 2016 11:47 am

Hi Thomas,
ThomasHarte wrote:
Line 201, not 256? And wouldn't both be relative to the 0..256 range?
Sorry, that was a typo: the display interrupt is at the start of line 256.

The range is 0..311 (odd field) and 0..312 (even field), which gives a 625 line frame in total.
ThomasHarte wrote: Am I right to think you're taking hsync to start 15 2MHz cycles after the end of pixels? The whole line being (i) 33 cycles, including hsync, space for the back porch and then the left border; (ii) 80 cycles of pixels; (iii) 15 cycles of the right border, which doubles as the front porch?
In Electron FPGA in non-VGA modes, all video timings are based off a 16MHz clock:
- The horizontal line is 1024 cycles, which is 64us.
- The active part is cycles 0..639, which is 40us.
- The front porch is cycles 640..761, which is 7.625us.
- The sync pulse is cycles 762..835, which is 4.625us.
- The back porch is cycles 836..1023, which is 11.75us.
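
As a sanity check on the figures above, here is a minimal C sketch that turns each cycle range into a duration at 16MHz. Only the cycle ranges come from the post; the struct and function names are made up for illustration and are not taken from the Electron FPGA source.

```c
#include <stdio.h>

/* Hypothetical names for illustration; only the cycle ranges come from the post. */
struct segment {
    const char *name;
    int first;  /* first 16MHz cycle of the segment, inclusive */
    int last;   /* last 16MHz cycle of the segment, inclusive */
};

static const struct segment line_segments[] = {
    { "active",      0,   639  },
    { "front porch", 640, 761  },
    { "sync pulse",  762, 835  },
    { "back porch",  836, 1023 },
};

#define SEGMENT_COUNT ((int)(sizeof line_segments / sizeof line_segments[0]))

/* Number of 16MHz cycles covered by a segment. */
static int segment_cycles(const struct segment *s)
{
    return s->last - s->first + 1;
}

/* Duration in microseconds: 16 cycles per microsecond. */
static double segment_us(const struct segment *s)
{
    return segment_cycles(s) / 16.0;
}

/* Sum of all segments, which should come to one full line. */
static int line_total_cycles(void)
{
    int total = 0;
    for (int i = 0; i < SEGMENT_COUNT; i++)
        total += segment_cycles(&line_segments[i]);
    return total;
}

/* Print one row per segment plus the total for the line. */
static void print_line_timing(void)
{
    for (int i = 0; i < SEGMENT_COUNT; i++) {
        const struct segment *s = &line_segments[i];
        printf("%-12s %4d cycles  %7.3f us\n",
               s->name, segment_cycles(s), segment_us(s));
    }
    printf("%-12s %4d cycles  %7.3f us\n",
           "total", line_total_cycles(), line_total_cycles() / 16.0);
}
```

Calling print_line_timing() from a main() lists the four segments; the ranges are contiguous and sum to the 1024-cycle line.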

I think I arrived at these values by measuring a real Electron, but I wouldn't swear to that. I'm pretty sure they are accurate to within a microsecond though.

I've just made some measurements of the interrupt timing on a real Electron.

In modes 0, 1, 2, 4 and 5 the display interrupt is coincident with the falling edge of the HS pulse following the last active line of display. This is the case for both the odd and even fields:
IMG_0354.JPG
In modes 3 and 6 it is 2 lines later (which is expected, as there are two blank lines underneath each row).

The RTC interrupt is not aligned to the HS pulse, and its timing depends on the field:

For the odd field (first), it occurs 31us after the start of the active part of line 99:
IMG_0352.JPG
For the even field (second), it occurs 1us before the start of the active part of line 99:
IMG_0351.JPG
More specifically, it occurs exactly 8192us (i.e. 128 lines) after the end of the 160us long VS pulse. This is not accurately replicated in Electron FPGA.
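
The relationship in that last measurement can be written down directly. This is a small sketch, assuming times are counted in microseconds from the start of the VS pulse; the constant and function names are illustrative only.

```c
/* Figures from the measurements above; the names are hypothetical. */
enum {
    LINE_US         = 64,   /* one scanline */
    VS_PULSE_US     = 160,  /* length of the vertical sync pulse */
    RTC_AFTER_VS_US = 8192  /* delay from the end of VS to the RTC interrupt */
};

/* Time of the RTC interrupt, in microseconds after the start of VS. */
static int rtc_time_from_vs_start(void)
{
    return VS_PULSE_US + RTC_AFTER_VS_US;
}

/* The same delay expressed in whole scanlines. */
static int rtc_delay_in_lines(void)
{
    return RTC_AFTER_VS_US / LINE_US;
}
```

The two measured positions within line 99 (31us after and 1us before the start of the active part) differ by 32us, which is half a line. That is consistent with a delay fixed relative to VS rather than HS, given that the two fields sit half a line apart.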

In these scope pictures, the screen is in mode 4, and the background is white (VDU 19,0,7;0;) so you can see the active part of the line in the bottom trace. You can also make out the front porch is approx 8us, the sync pulse is approx 4us and back porch is approx 12us.

Dave
Last edited by hoglet on Fri Mar 18, 2016 7:39 pm, edited 1 time in total.

Post by ThomasHarte » Fri Mar 18, 2016 7:35 pm

I can't commend you highly enough for having obtained this information. I'll update my emulation code as soon as I get a chance and I assume it's also going to benefit ElectronFPGA. I'm already on all the other timing information that you've so far observed so it shouldn't be a big change.

EDIT: so, pulling it all together, a complete frame, timed on the 1MHz bus to keep the numbers small, and assuming a constant video mode:

Cycle 0: start vertical sync
Cycle 160: end vertical sync
Cycle 1984: horizontal sync begins prior to first line with pixels
Cycle 8352: real-time clock interrupt starts
(if Mode 3 or 6) Cycle 17984: end-of-display sync begins
(if any other mode) Cycle 18368: end-of-display sync begins
Cycle 20000: start vertical sync
Cycle 20160: end vertical sync
Cycle 21952: horizontal sync begins prior to first line with pixels
Cycle 28352: real-time clock interrupt starts
(if Mode 3 or 6) Cycle 37984: end-of-display sync begins
(if any other mode) Cycle 38368: end-of-display sync begins
Cycle 40000: end of frame; repeat

... with each line in isolation being as you described.
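
For emulator purposes, the list above could be held as a small table and searched for the next event. This is a sketch only: the cycle numbers are copied from the list, while the types, function names and the early_display_end flag (which selects the earlier of the two end-of-display times) are hypothetical.

```c
#include <stddef.h>

/* Event times in 1MHz cycles, copied from the list above.
   The type and function names are hypothetical. */
enum event_kind { VSYNC_START, VSYNC_END, FIRST_HSYNC, RTC_IRQ, DISPLAY_END };

struct frame_event {
    int cycle;
    enum event_kind kind;
};

#define FRAME_CYCLES 40000

/* Fill `out` (at least 10 entries) with one frame's events in time order.
   `early_display_end` selects the earlier of the two end-of-display times. */
static size_t build_frame(struct frame_event *out, int early_display_end)
{
    const int display_end = early_display_end ? 17984 : 18368;
    const struct frame_event events[] = {
        { 0,                   VSYNC_START },
        { 160,                 VSYNC_END   },
        { 1984,                FIRST_HSYNC },
        { 8352,                RTC_IRQ     },
        { display_end,         DISPLAY_END },
        { 20000,               VSYNC_START },
        { 20160,               VSYNC_END   },
        { 21952,               FIRST_HSYNC },
        { 28352,               RTC_IRQ     },
        { 20000 + display_end, DISPLAY_END },
    };
    const size_t n = sizeof events / sizeof events[0];
    for (size_t i = 0; i < n; i++)
        out[i] = events[i];
    return n;
}

/* Return the first event at or after `cycle` (assumed non-negative),
   wrapping to the first event of the next frame if none remains. */
static struct frame_event next_event(const struct frame_event *ev, size_t n, int cycle)
{
    cycle %= FRAME_CYCLES;
    for (size_t i = 0; i < n; i++)
        if (ev[i].cycle >= cycle)
            return ev[i];
    struct frame_event wrapped = ev[0];
    wrapped.cycle += FRAME_CYCLES;
    return wrapped;
}
```

A caller would build the table once per mode change and then ask next_event() how far it can run the CPU before anything interesting happens.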

Post by paulb » Wed Apr 06, 2016 2:25 pm

paulb wrote:If there were any single major enhancement to the ULA that would have boosted the Electron's performance out of the box, it would have been the ability to have the CPU switch to 2MHz while the ULA is resting. The amount of CPU bandwidth in modes 0 to 2 would have matched the standard Electron's CPU bandwidth in modes 4 and 5! Even though programs in modes 0 to 2 would still need to push round twice as much data for the same visual effects as in modes 4 and 5, it would have made a lot of (non-memory-limited) games much more viable in the lower modes.
So, I decided to see if I could make Elkulator emulate this improved ULA performance. As far as I can tell, I need to change the memory access handling as follows...

In src/ula.c, we remove "cycles++" statements after waitforramsync invocations and migrate them into that function in src/mem.c, since it appears that every time that function gets called the cycles are incremented immediately afterwards, at least in the source code I have. Then, we adjust the logic for ULA contention as follows:

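void waitforramsync()
{
        /* During ULA screen update region. */
        if (ula.dispon && ula.x<640)
        {
                /* Lower screen modes where the ULA is continuously active:
                   charge the remainder of the displayed part of the line. */
                if (!(ula.mode&4))
                {
                        cycles+=((640-ula.x)/8);
                }

                /* Round up to an even cycle count. */
                if (cycles&1) cycles++;

                /* The increment migrated from the call sites in src/ula.c. */
                cycles++;
        }
}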
I'm guessing that we're allowed to skip incrementing the cycles if the ULA is not claiming those cycles since these are effectively additional cycles rather than those needed to normally fulfil the execution of the instructions at 2MHz. (Here, I am considering the various "fast" memory access paths as a guide.)
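
To see what the modified function charges in different situations, it can be exercised on its own. The sketch below assumes stand-in definitions for the emulator's globals (the real ones live in Elkulator and may be laid out differently), and ram_access_cost is a hypothetical helper, not part of the emulator.

```c
/* Stand-ins for Elkulator's globals so the function can be exercised alone.
   The real definitions live in the emulator; this layout is an assumption. */
static struct {
    int dispon;  /* non-zero while the ULA is in the displayed part of the frame */
    int x;       /* horizontal position within the line */
    int mode;    /* current screen mode */
} ula;

static int cycles;

/* The modified function, as posted above. */
static void waitforramsync(void)
{
    if (ula.dispon && ula.x < 640) {
        if (!(ula.mode & 4))
            cycles += (640 - ula.x) / 8;
        if (cycles & 1)
            cycles++;
        cycles++;
    }
}

/* Hypothetical helper: cycles charged for one RAM access from a given state. */
static int ram_access_cost(int dispon, int x, int mode, int start_cycles)
{
    ula.dispon = dispon;
    ula.x = x;
    ula.mode = mode;
    cycles = start_cycles;
    waitforramsync();
    return cycles - start_cycles;
}
```

With these stand-ins, an access at the very start of a displayed line costs 81 cycles in mode 2 but only 1 or 2 in mode 4 (depending on whether the cycle count was already even), and nothing is added outside the display region.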

The effect of doing this is quite modest for things like empty FOR loops in mode 2, and for games like Skirmish there might be a marginal improvement in speed, but for games like Kourtyard (and for various scrolling experiments I've been doing) the difference in performance is quite substantial.

Post by paulb » Sat Sep 03, 2016 11:02 pm

A belated follow-up. As I looked into doing the ULA in Verilog, I looked a bit more closely at the RAM timings, and the double round-trip to the RAM just takes too long.

It just isn't possible to get the RAM to serve up the bytes quickly enough. The ULA could work the RAM more quickly, and if the RAM were like the ROM and yielded its cargo in a single transfer, there'd be enough left of the 2MHz cycle to pass the data on to the CPU, but it seems that there is hardly any time left after two 4-bit transfers for the CPU to get the data before the next cycle begins. (It was this interaction with the CPU that I forgot to take into account.)

So, I guess that this enhanced RAM mode is just a fantasy with the current architecture and the RAM operating at the given speed. :(

Post by ThomasHarte » Tue Sep 06, 2016 2:44 pm

paulb wrote:A belated follow-up. As I looked into doing the ULA in Verilog, I looked a bit more closely at the RAM timings, and the double round-trip to the RAM just takes too long.
To make sure I have correctly answered my own knee-jerk question: the reason a turbo board works (proving that the RAM chips in isolation could serve the CPU at 2MHz) is that it routes around the ULA, and the ULA introduces too much latency to permit a through-path for 2MHz RAM accesses?

I guess that would answer why Electrons as shipped don't run at 2MHz when not drawing and don't, as several other machines do, have a screen-off mode which pumps up the CPU power.

Post by 1024MAK » Tue Sep 06, 2016 3:14 pm

Um, the turbo boards use eight bit wide static RAM (SRAM), with their data bus lines connected to the CPU data bus. Compare this to the dynamic RAM (DRAM) chips. There are only four DRAM chips, each storing one bit per address, so the DRAM is only four bits wide (a nybble). When the CPU wants to read from the on-board DRAM, it puts the address on its address bus. The ULA then (assuming it is not sending data to the screen for the CRT electron beam) has to fetch two lots of data (two nybble reads) to make up an eight-bit byte. So two DRAM accesses. This data is, I assume, stored in a register / latch in the ULA. Then, before the CPU completes its read cycle, the ULA places the data from the register / latch on the CPU data bus.

One other thing to note: the address bus on DRAM chips is multiplexed in order to keep the pin count low. The ULA has to send the wanted address to the DRAM chips as a low word (controlled by /RAS) and then, a short time later, a high word (controlled by /CAS) before the DRAM has a valid address. As with any memory, there is a short delay before the data becomes available at the data output pins.

The read from the DRAM therefore takes a bit of time. Whereas, the read from the SRAM can be done far quicker. The address lines on SRAM are not multiplexed, and you get a whole 8 bit byte in one go straight to the CPU.
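
The two-nybble read described above can be modelled in a few lines. This is a sketch under stated assumptions: the array size assumes 32K bytes of DRAM (so 64K four-bit cells), and the order of the nybbles and the mapping from byte address to cell are guesses made for illustration. All names are hypothetical.

```c
#include <stdint.h>

/* Assumes 32K bytes of DRAM, i.e. 64K four-bit cells across the four chips. */
#define DRAM_NIBBLES 65536

/* Model of the four DRAM chips: each entry holds one 4-bit value. */
static uint8_t dram[DRAM_NIBBLES];

/* Counts individual 4-bit transfers. */
static int dram_accesses;

static uint8_t dram_read_nibble(uint16_t cell)
{
    dram_accesses++;
    return dram[cell] & 0x0F;
}

/* Read one byte as two nybble transfers, as the ULA is described as doing.
   Which nybble is fetched first, and how byte addresses map to cells,
   are assumptions made for this sketch. */
static uint8_t ula_read_byte(uint16_t addr)
{
    uint16_t cell = (uint16_t)((addr & 0x7FFF) << 1);
    uint8_t low  = dram_read_nibble(cell);
    uint8_t high = dram_read_nibble((uint16_t)(cell + 1));
    return (uint8_t)((high << 4) | low);
}

/* Store one byte as two 4-bit cells, using the same assumed layout. */
static void ula_write_byte(uint16_t addr, uint8_t value)
{
    uint16_t cell = (uint16_t)((addr & 0x7FFF) << 1);
    dram[cell]     = value & 0x0F;
    dram[cell + 1] = (uint8_t)(value >> 4);
}
```

The point of the model is just the access count: every byte costs two DRAM transfers, whereas the SRAM on a turbo board delivers all eight bits in one.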

If a fast enough SRAM chip is used, it can work at 2MHz.

The downside of SRAM is that it is more expensive.

Mark

Post by ThomasHarte » Tue Sep 06, 2016 3:46 pm

1024MAK wrote:Um, the turbo boards use eight bit wide static RAM (SRAM), with their data bus lines connected to the CPU data bus.
This is as far as I needed to read to understand my mistake. Apologies all round. Like the person on the other thread, I've never managed to find a schematic and had assumed they cut over the ULA to reuse the existing RAM chips. Like some sort of idiot.

(EDIT: but this is nothing compared to my five minute slight puzzlement last week at the high number of SRAM chips on the Electron schematic. Which, it turned out, was a mislabelled Atom schematic. But as I'd zoomed right in on the RAM chips to crib part numbers and then started following the lines around in surprise, I'd failed miserably to notice the huge 6522 on the left edge of the board or the 6847 on the centre right)

Post by paulb » Tue Sep 06, 2016 4:59 pm

ThomasHarte wrote:
1024MAK wrote:Um, the turbo boards use eight bit wide static RAM (SRAM), with their data bus lines connected to the CPU data bus.
This is as far as I needed to read to understand my mistake.
Yes, I didn't think that as author of ElectrEm you needed too much more information about how the ULA accesses the on-board RAM. ;)
ThomasHarte wrote:Apologies all round. Like the person on the other thread, I've never managed to find a schematic and had assumed they cut over the ULA to reuse the existing RAM chips. Like some sort of idiot.
Not at all! It takes some digging to find out what people do with these board modifications sometimes.

I thought it would be interesting to put some logic on a carrier board so that zero page (and maybe other regions) could effectively be moved around in memory without the CPU knowing. Then you could have some primitive memory management!
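
A minimal sketch of what that translation logic might do, assuming a single register on the carrier board that holds the page to substitute for page zero. The register, its name and the function are all hypothetical; nothing here describes an existing board.

```c
#include <stdint.h>

/* Hypothetical register on the carrier board: the page (high address byte)
   that CPU accesses to page zero are redirected to. Zero means no remapping. */
static uint8_t zero_page_target = 0x00;

/* Translate a CPU address into the address presented to memory. */
static uint16_t translate_address(uint16_t cpu_addr)
{
    if ((cpu_addr >> 8) == 0x00)
        return (uint16_t)((zero_page_target << 8) | (cpu_addr & 0xFF));
    return cpu_addr;
}
```

Setting zero_page_target to 0x30, for example, would send a CPU access to address 0x0070 on to 0x3070 while leaving every other page untouched.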