Twisted Brain Demo

Got a programming project in mind? Tell everyone about it!
User avatar
kieranhj
Posts: 724
Joined: Sat Sep 19, 2015 10:11 pm
Location: Farnham, Surrey, UK
Contact:

Twisted Brain Demo

Post by kieranhj » Wed Jun 27, 2018 9:38 am

How Twisted Brain was created

If you're not familiar with the demo, check it out here.

Part #1: FX Framework & Main Loop

In this series of posts I will attempt to explain the inner workings of the Twisted Brain demo. Because some of the FX are more complex than others, these posts will not be in the order in which they appear in the demo. There are a number of concepts that I will try to introduce along the way, with opportunity between posts for questions and clarifications!

The single main principle behind the demo is the ability to execute code when the raster is at a given point on the screen. I will not go into explaining the fundamentals of raster scanning here but see this Wikipedia article if you would like to know more.

The single most important diagram you will need to refer to is the CRTC screen format diagram on page 187 of the NAUG:
CRTC screen format.PNG
CRTC screen format p187 NAUG
We'll go into much more detail about 6845 CRTC registers at a later point. Note that I'm going to use the term "raster line" to refer to the line on the actual screen that the raster is currently scanning across horizontally. The term "scanline" may be overloaded when referring to certain CRTC register behaviours (more on this later.)

The FX framework is designed so that code is executed at these raster times:
fx framework.png
FX framework
Assuming everything is behaving correctly then the following is true for every FX module:
  1. The FX draw function is called at the very beginning* of raster line 0
  2. The FX draw function may exit at any point but typically runs for 256 raster lines
  3. The music player is polled immediately after draw and must be polled every 20ms (* more on this later)
  4. The scripting system is updated for ~ 3 raster lines
  5. The FX update function is called during the vertical blank period but must return before raster line 0 is reached (maximum ~18 raster lines)
Some useful numbers to remember:
  • One raster line is 64us = 128 cycles @ 2 MHz
  • There are 312 raster lines in a non-interlaced PAL signal so 312 * 64 = 19968 us = 50.08Hz
  • Finally we have 312 * 128 = 39936 cycles per frame
This sounds like a lot but they disappear quickly! We will be counting cycles later on...
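As a quick sanity check of these numbers, here is a minimal Python sketch. Python is only used here as a calculator and is not part of the demo source; the constant names are my own.

```python
# Frame timing figures for a non-interlaced PAL signal on a 2 MHz 6502.
CPU_HZ = 2_000_000        # CPU clock
LINE_US = 64              # one raster line in microseconds
LINES_PER_FRAME = 312     # raster lines in a non-interlaced PAL frame

cycles_per_line = CPU_HZ * LINE_US // 1_000_000
frame_us = LINES_PER_FRAME * LINE_US
cycles_per_frame = LINES_PER_FRAME * cycles_per_line
frame_hz = 1_000_000 / frame_us

print(cycles_per_line, frame_us, cycles_per_frame, round(frame_hz, 2))
```

This should print 128, 19968, 39936 and 50.08, matching the figures in the list.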

How does the FX draw function always get called at the same time?

First it is important to note that the entire demo (after boot) runs with interrupts disabled (SEI), although this does not mean that you cannot check whether interrupts have occurred by testing the Interrupt Flag Register (R13) of the System VIA at &FE4D in SHEILA. (See page 401 of the NAUG.)

The vertical sync pulse is the only method we have to synchronise to the entire TV signal. To find the exact cycle of vsync I used the following code taken from the RetroSoftware forum:

Code:

lda #2
.vsync1
bit &FE4D
beq vsync1 \ wait for vsync

\now we're within 10 cycles of vsync having hit

\delay just less than one frame
.syncloop
sta &FE4D \ 4(stretched), ack vsync

\{ this takes (5*ycount+2+4)*xcount cycles
\x=142,y=55 -> 39902 cycles. one frame=39936
ldx #142 \2
.deloop
ldy #55 \2
.innerloop
dey \2
bne innerloop \3
\ =152
dex \ 2
bne deloop \3
\}

nop:nop:nop:nop:nop:nop:nop:nop:nop \ +16
bit &FE4D \4(stretched)
bne syncloop \ +3
\ 4+39902+16+4+3+3 = 39932
\ ne means vsync has hit
\ loop until it hasn't hit

\now we're synced to vsync
My notes have it attributed to Tom Seddon and Tricky but unfortunately I can no longer find the post! Perhaps it got lost when the forum had to be restored after it was taken down? I know there were many conversations on this topic including RTW and hexwab so my apologies if this has been mis-attributed (I asked Tom and he couldn't remember writing it either!)
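The delay-loop arithmetic quoted in the comments of the vsync code can be checked with a small Python sketch. It takes the formula from the comments at face value, (5*ycount+2+4)*xcount, and does not try to model cycle stretching or the exact cost of the final untaken branches; the function and variable names are mine.

```python
CYCLES_PER_FRAME = 312 * 128   # 39936 cycles in one PAL frame at 2 MHz

def delay_cycles(inner_count, outer_count):
    """Cycle count of the nested delay loop, per the formula in the code comments."""
    return (5 * inner_count + 2 + 4) * outer_count

loop_cycles = delay_cycles(inner_count=55, outer_count=142)
shortfall = CYCLES_PER_FRAME - loop_cycles

print(loop_cycles, shortfall)
```

The delay comes out at 39902 cycles, 34 cycles short of a full frame, which leaves room for the remaining instructions in syncloop so that the whole loop is just under one frame long.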

Next we set up the 1MHz Timer 1 to interrupt at the exact point we require on every frame:

Code:

; Exact time for a 50Hz frame less latch load time
FramePeriod = 312*64-2

; Calculate here the timer value to interrupt at the desired line
TimerValue = 32*64 - 2*64 - 22 - 2

\\ 32 lines for vsync (vertical position = 35 / 39)
\\ interrupt arrives 2 lines after vsync pulse
\\ 22 us for code that executes after timer interrupt fires
\\ 2 us for latch

; Write T1 low now (the timer will not be written until you write the high byte)
LDA #LO(TimerValue):STA &FE44
; Get high byte ready so we can write it as quickly as possible at the right moment
LDX #HI(TimerValue):STX &FE45             		; start T1 counting		; 4c +1/2c 

; Latch T1 to interrupt exactly every 50Hz frame
LDA #LO(FramePeriod):STA &FE46
LDA #HI(FramePeriod):STA &FE47
We know vsync has just taken place and we want the timer to reach zero on the first visible raster line of the screen. Given the vertical sync position is at raster line 280 = 35 * 8 we have to wait another 312 - 280 = 32 raster lines before we have completed our full 312 raster line signal. We also discover that the vertical sync interrupt arrives 2 raster lines after the vsync actually took place (so the raster is actually further ahead than we thought) so we need to adjust for that. Finally, if we want the FX draw function to be called at the start of raster line 0 then we need to compensate for any framework code that runs before the FX draw function, in this case 22us (found by measurement.)

So our initial Timer1 value is 32*64 - 2*64 - 22 - 2 = 1896us

(The extra -2us is due to the time it takes to latch the register as discovered by RTW.)

Timer 1 is put into free-run mode and latched to the value of 312*64 - 2 so that it continues to count down for the exact duration of a 312 raster line frame, thus reaching zero at the same point on each subsequent 50.08Hz frame.
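To make the derivation of the two timer values explicit, here is a minimal Python sketch of the same arithmetic. The constant names are mine; the numbers are the ones given above (vsync at character row 35, interrupt 2 lines late, 22us of framework code, 2us latch time).

```python
LINE_US = 64                  # the 1 MHz timer ticks once per microsecond
LINES_PER_FRAME = 312
VSYNC_LINE = 35 * 8           # raster line of the vsync pulse (280)
IRQ_DELAY_LINES = 2           # vsync interrupt arrives 2 lines after the pulse
FRAMEWORK_US = 22             # measured framework overhead before FX draw
LATCH_US = 2                  # latch load time

lines_to_frame_start = LINES_PER_FRAME - VSYNC_LINE
timer_value = (lines_to_frame_start * LINE_US
               - IRQ_DELAY_LINES * LINE_US
               - FRAMEWORK_US
               - LATCH_US)
frame_period = LINES_PER_FRAME * LINE_US - LATCH_US

def lo(value):
    return value & 0xFF       # low byte, as written to &FE44 / &FE46

def hi(value):
    return value >> 8         # high byte, as written to &FE45 / &FE47

print(timer_value, hex(lo(timer_value)), hex(hi(timer_value)))
print(frame_period, hex(lo(frame_period)), hex(hi(frame_period)))
```

So the initial count is 1896 (&0768) and the free-run latch value is 19966 (&4DFE).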

So at the top of the main loop we simply block waiting for Timer 1 to reach zero before then calling the current FX draw function.

Code:

\\ Wait for first raster line
{
	LDA #&40
	.waitTimer1
	BIT &FE4D				; 4c + 1/2c
	BEQ waitTimer1         	; poll timer1 flag
	STA &FE4D             	; clear timer1 flag ; 4c +1/2c
}
Note that testing Timer 1 in this way involves cycle stretching, which I'm not going to get into here. The net result for our purposes is that there are up to 8 cycles of jitter in when the wait loop will terminate. It is possible to get a truly stable raster (as demonstrated by hexwab on the RetroSoftware forum) but it requires even more careful coding and was deemed not worth the extra effort for this demo. (It may return at a future date ;) )

You may observe that this framework requires the code to always generate a 312 raster line signal from the CRTC otherwise the Timer 1 will reach zero at a different position relative to the raster. This will become apparent later on when we discuss the differences between real CRTC behaviour and the emulated behaviour.

Because we need to keep the music playing throughout the demo, it is not possible for us to re-align to vsync using the code above because the syncloop for narrowing down the vsync edge has to be an exact number of cycles. The music player takes a different number of cycles each time it is polled depending on how many bytes have to be decompressed and sent to the SN76489 chip. If the music is not updated every 20ms then there are pauses / slowdowns that are very noticeable and detract from the quality of the production.

I think that's enough for now. Hopefully this is a reasonably clear start. Please do ask any questions, correct anything I've got wrong or suggest improvements for next time! I will try and get one post done per train commute.

You can reference the code on GitHub as we go along: https://github.com/bitshifters/twisted-brain
Last edited by kieranhj on Fri Jun 29, 2018 1:43 pm, edited 1 time in total.
Bitshifters Collective | Retro Code & Demos for BBC Micro & Acorn computers | https://bitshifters.github.io/

Re: Twisted Brain Demo

Post by kieranhj » Wed Jun 27, 2018 8:58 pm

Part #2: Da Brain Picture

Let's start with the simplest effect - the Brain picture reveal. I sent an early work-in-progress version of the demo to Dethmunk and asked if he was inspired enough to make some MODE 2 artwork. He sent back this awesome picture and suggested that the title of the demo might be "Twisted Brain"...

BrainDrain.png
Brain Drain picture by Dethmunk

The effect itself is very simple - each frame a few lines of pixels are copied from the SHADOW screen buffer across to the visible screen buffer in main RAM. We can use this to explain how the various FX module functions operate.

FX Init function
Every FX module has an init function that is called before any frames are drawn. This is used to set up screen buffers to any predetermined state and / or change MODE.

The FX framework requires that all modules return the system to a "known" state so that certain assumptions can be made safely. These are:
  • Standard MODE 2 CRTC registers - i.e. 32 visible character rows each of 8 scanlines
  • ULA Control Register set to MODE 2 value (&F4)
  • ULA Palette set to MODE 2 defaults (but without flashing colours)
  • Main RAM paged in for read/write with ACCCON
  • Main RAM being displayed by the CRTC with ACCCON
Note that the state of the screen buffers is undefined as all modules are expected to either clear or set up the buffer(s) during init.

Finally, the FX framework makes sure that the FX init function is called at the start of raster line 0 and the screen display is turned OFF until after the first FX draw function has been called (to hide any initialisation garbage.)

The init function for the Brain picture simply initialises some local ZP variables and uses the PuCrunch library to decompress two images from SWRAM, one to main screen and one to SHADOW screen:

DaBrain.png
Just the brain in colours 8-15 loaded to main screen buffer

DaBrainAll.png
Final image with brain in colours 8-15 loaded into SHADOW

(Because the decompress can take a long time, and the Brain reveal is early on in the demo, the SHADOW picture is actually decompressed at boot time before the music starts to avoid a large wait later on.)

FX Update function
The FX framework calls the update function for the current FX module after both the music player and scripting system have been polled. The only guarantee given is that this will be during the vertical blank period and the only requirement is that the function returns before raster line 0 (otherwise everything breaks :) )

The update function is intended to update any logic for the effect. Because we don't really know how long we have before raster line 0, it is not advised to perform much in the way of heavy lifting (although this is stretched for a couple of FX.)

Because the update takes place during vblank, it is safe to write to the visible screen buffer without introducing any flicker or tearing. We just don't have time to write too much.

For the Brain reveal this is simply:
  • Copy the current line from SHADOW buffer to main screen buffer
  • Update the line y value in a pleasing way
  • Repeat as required (actually copies 3 lines per frame)
  • If animated, update palette mapping
The palette is updated using the regular method of writing to the SHEILA Palette Register at &FE21 (see page 207 of the NAUG.)

You will have noticed that only the brain palette animates. This is because it uses colours 8-15 whilst the rest of the image uses colours 0-7. Once Dethmunk provided me with the brain as a separate image I used a short BASIC program to mask in the top bit to the colour values for these pixels.

FX Draw function
We'll talk more about this next time. For the Brain reveal this is literally "do nothing". (There is a function in the code do_nothing that is just RTS.)

FX Kill function
We'll talk more about this next time as well, including the timings and expectations. For the Brain reveal it should have at least set the ULA Palette back to default state (there is a helper function to do this) but it looks like I forgot to call it. Clearly to no ill effect. :)
Last edited by kieranhj on Fri Jun 29, 2018 1:40 pm, edited 1 time in total.

Re: Twisted Brain Demo

Post by kieranhj » Thu Jun 28, 2018 10:14 am

Part #3: Text Screens

Now we can get into the first effect that runs code on a specific raster line in the FX draw function! But first let's cover off the Init & Update:

FX Init function
As ever, we initialise a few ZP variables including pointers to blocks of text and the pattern used to type the font glyphs, set ULA Control Register to MODE 1 (&D8) then clear the screen to a stipple pattern made up of colours 0 & 2.

FX Update function
This just updates the colour scroll offset value and then plots a single font glyph to the screen (if there is one left to plot) at the next position.

Each block of text is 18 x 14 = 252 characters (conveniently < 256) and each "pattern" is just a list of 252 values specifying which position on screen (x + y * 18) to use for the next character.
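The position encoding is just row-major indexing, which the following Python sketch illustrates. The helper names and the example pattern (a simple reverse-order reveal) are hypothetical and are not taken from the demo.

```python
COLS, ROWS = 18, 14                 # text block size in characters

def xy_to_position(x, y):
    """Pack a character cell into a single pattern value (x + y * 18)."""
    return x + y * COLS

def position_to_xy(position):
    """Unpack a pattern value back into (x, y)."""
    return position % COLS, position // COLS

# Hypothetical pattern: type the glyphs from bottom right to top left.
pattern = list(range(COLS * ROWS))[::-1]

print(len(pattern), pattern[0], position_to_xy(pattern[0]))
```

Because the largest position is 251, every entry fits in one byte, so a pattern can be indexed directly with an 8-bit register.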

I won't cover the font plot routine here, other than to say it takes 1bpp glyph data and writes this to the screen as MODE 1 pixel bytes using colours 1 & 3 stippled as a mask. The font itself came from an Amiga/ST font collection pack I found somewhere and each glyph is 16x15 pixels.

Charset_1Bitplan.PNG
Nice 1-bit Amiga & ST fonts

text without colour animation.png
Text without colour animation showing stipple

FX Draw function
Finally we can get to our first raster timed draw routine! For the text colour effect we are changing the colour values of 2x entries in the palette on every raster line.

Since we know the draw function is always called at (roughly) the start of raster line 0, this becomes "easy" with cycle counting. The main draw loop looks like this (tidied up a bit from GitHub):

Code:

	LDX #0                    ; raster line counter
	LDY palette_lookup_index  ; index into palette lookup tables

	.loop
  \\ Wait 69 cycles 
	FOR n,1,33,1
	NOP
	NEXT
 	BIT 0                     		; 3c

  \\ Set foreground colour = 26c
	LDA foreground_colour, Y		; 4c
	STA &FE21				; 4c
	EOR #&10		              	; 2c
	STA &FE21				; 4c
	EOR #&40		              	; 2c
	STA &FE21				; 4c
	EOR #&10		              	; 2c
	STA &FE21				; 4c
  
  \\ Set background colour = 26c
	LDA background_colour, Y		; 4c
	STA &FE21				; 4c
	EOR #&10		              	; 2c
	STA &FE21				; 4c
	EOR #&40		              	; 2c
	STA &FE21				; 4c
	EOR #&10		              	; 2c
	STA &FE21				; 4c

  \\ Increment palette lookup
	INY                       		; 2c

  \\ Increment raster line counter
	INX				        ; 2c
	BNE loop		              	; 3c
For each raster line, we wait 69 cycles so that our palette change takes place at the end of the line. Then we set the MODE 1 palette values by looking up from predefined tables and writing the values to the SHEILA Palette Register (&FE21). Note that we must write 4x values to the palette register to change 1x colour in MODE 1. If the palette is only partially programmed whilst the raster is visible then this becomes very noticeable (pixel colours will change depending on their position in a byte), which is why we attempt to do this inside the horizontal sync portion as much as possible. (See the CRTC screen format diagram from Part #1.)
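To see which four values actually reach the palette register for one colour change, here is a small Python sketch that replays the EOR sequence from the loop (EOR #&10, EOR #&40, EOR #&10). The function name and the example table value &07 are made up for illustration.

```python
def mode1_palette_writes(table_value):
    """Return the four bytes written to &FE21 for one MODE 1 colour change."""
    a = table_value
    writes = [a]
    for mask in (0x10, 0x40, 0x10):   # same order as the EORs in the loop
        a ^= mask
        writes.append(a)
    return writes

writes = mode1_palette_writes(0x07)    # hypothetical table entry
print([f"&{value:02X}" for value in writes])
```

Only bits 4 and 6 of the top nibble are toggled between writes, so the low nibble from the lookup table is sent unchanged with each of the four writes.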

As long as all of the code within the loop totals 128 cycles then it will all add up and execution will stay in sync with the raster. Here we have 69 + 26 + 26 + 2 + 2 + 3 = 128 cycles per loop. Do this 256 times and we have filled our screen.
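Counting by hand is error prone, so here is a minimal Python sketch that totals the loop from a table of instruction timings. The cycle counts are the ones in the code comments; the sketch assumes the indexed loads do not cross a page boundary and that the branch target is in the same page, either of which would add a cycle.

```python
# Cycle cost per instruction form, as annotated in the draw loop.
CYCLES = {
    "NOP": 2, "BIT zp": 3, "LDA abs,Y": 4, "STA abs": 4,
    "EOR #imm": 2, "INY": 2, "INX": 2, "BNE taken": 3,
}

# One palette colour change: 1 load, 4 stores, 3 EORs.
palette_write = [("LDA abs,Y", 1), ("STA abs", 4), ("EOR #imm", 3)]

loop_body = (
    [("NOP", 33), ("BIT zp", 1)]      # wait 69 cycles
    + palette_write                   # foreground colour
    + palette_write                   # background colour
    + [("INY", 1), ("INX", 1), ("BNE taken", 1)]
)

def total_cycles(body):
    return sum(CYCLES[op] * count for op, count in body)

loop_cycles = total_cycles(loop_body)
print(loop_cycles, total_cycles(palette_write))
```

The total is 128 cycles, exactly one raster line, with 26 cycles for each palette change.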

Note how much time is spent in NOPs here - we're spending a lot of time doing nothing. And note the BIT 0 trick which is an easy way to wait 3 cycles with limited consequence to status flags.

Also note that the loop has to be constant time, so in this case it is simpler to do "redundant" work and set the palette to the same value rather than test and branch (because both code branches will need to take the same amount of time.)

Confession time: when I looked at this code in GitHub I found there was a bug and the loop only contained 127 cycles. It obviously didn't have a noticeable effect on the demo! (Probably because the palette is only changed every 20 lines or so in the end, even though it is set every raster line.)

As you can see, this is not a precise art if counting by hand or using the emulator debugger to help with timings, which is what I did most of the time. I think there are definitely tools that could be built (perhaps directives for BeebAsm or cmorley's code scheduler from Bad Apple) to help remove the manual work of cycle counting and avoid errors.

Palette Tables
The palette tables themselves are all based on the "copper" colours, i.e. RGB arranged by hue so red -> magenta -> blue -> cyan -> green -> yellow -> red.

One table is arranged so the copper colours blend into each other, as you can see if we swap to changing the background:

text with copper background.png
Text with "copper" colours as background

The second table is arranged by blending the copper colours with black and white stipple to tone down the standard garish 3-bit BBC RGB palette:

text with pastel background.png
Text with "pastel" colours as background

FX Kill function
For the text FX module we're in MODE 1 so the kill function needs to reset to MODE 2 as per the rules of the FX system. From a CRTC perspective there is no difference between MODEs 0,1,2 so we only really need to set the ULA Control Register to &F4.

By not messing with CRTC registers we are not risking the possibility of creating a malformed frame (i.e. not 312 total lines or vsync not at line 280) that might end up resyncing the TV or throwing out our Timer 1 synchronisation.

Re: Twisted Brain Demo

Post by kieranhj » Fri Jun 29, 2018 9:21 am

Part #4: A brief introduction to CRTC registers

Before explaining any more about the effects in the demo, it is worth briefly covering the CRTC registers. Reference this table on page 190 of the NAUG:

CRTC registers.PNG
CRTC Registers

As noted before, there is no difference between MODES 0,1,2 as far as the CRTC is concerned - they all have 80 byte columns across the screen. How screen byte values are interpreted as colour pixels is all down to the ULA.

There are many useful references to the CRTC registers, particularly from the AMSTRAD CPC community. I'll only give an overview here but check out these links if you want to learn more:
The best picture I've found explaining the CRTC comes from an Amstrad page:

CRTC screen reference.png
AMSTRAD CRTC Reference

Most important things to note about how the CRTC works:
  • The smallest unit is a CRTC character which is one byte wide - note this may be 2, 4 or 8 pixels on the screen depending on the ULA Control Register
  • The display is made up of a number of character rows, typically 39
  • Not all of those character rows are displayed, typically 32 are visible
  • Each character row has a number of scanlines, typically 8
  • The values of the vertical registers must total 312 raster lines for a good (non-interlaced) PAL signal
    • (Scanlines per character x (Vertical total+1)) + Vertical adjust = 312
      E.g. MODE 0,1,2: (8 * (38 + 1)) + 0 = 312
      E.g. MODE 3,6: (10 * (30 + 1)) + 2 = 312
  • Screen addresses are specified in characters i.e. multiples of 8 bytes
  • For each scanline in a character the screen address is offset by 1 byte
  • For each character in a row the memory address is effectively incremented by 8 bytes (one CRTC character)
  • The CRTC has internal counters for the current character column, character row and scanline etc.
  • The CRTC compares the counters for equality against the register values for its internal logic
  • Most of the register values are read when the counter comparison takes place, but some registers are latched at the start of the display
So you can see it is difficult to change one register without changing the others!
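The 312-line rule from the list above is easy to check for any combination of register values. This is a minimal Python sketch of that formula; the function name is mine, and the last example (4 scanlines per row) is just an illustration of how the other registers have to move to compensate.

```python
PAL_LINES = 312

def frame_raster_lines(scanlines_per_row, vertical_total, vertical_adjust=0):
    """(Scanlines per character x (Vertical total + 1)) + Vertical adjust."""
    return scanlines_per_row * (vertical_total + 1) + vertical_adjust

print(frame_raster_lines(8, 38))        # MODE 0,1,2
print(frame_raster_lines(10, 30, 2))    # MODE 3,6
print(frame_raster_lines(4, 38))        # halving the row height alone breaks the frame
print(frame_raster_lines(4, 77))        # 78 rows of 4 scanlines restores it
```

Only the third line of output differs from 312, showing that changing scanlines per row without touching Vertical Total produces a malformed frame.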

Screen Start Address (R12,R13)
This allows us to specify which memory address is taken as the top left character of the display ( = memory address / 8 ).

In standard BBC MODE 0,1,2 screen configuration we have &5000 bytes available = &5000/8 = &A00 (2560) CRTC characters. A screen 80 CRTC characters wide is therefore 32 rows deep: 80 x 32 = 2560. Any memory address above &8000 is wrapped around to lower memory. I won't go into that here but you can find out more on page 386 of the NAUG.
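As a worked example of the address conversion, here is a short Python sketch that turns a memory address into the values for R12 (high byte) and R13 (low byte). The function name is mine, and it ignores the &8000 wrap-around handling mentioned above.

```python
def screen_start_registers(address):
    """Return (R12, R13) for a screen start address that is a multiple of 8."""
    if address % 8:
        raise ValueError("screen start must be a multiple of 8 bytes")
    characters = address // 8
    return characters >> 8, characters & 0xFF

print(screen_start_registers(0x3000))   # default MODE 0,1,2 screen start
print(0x5000 // 8, 80 * 32)             # CRTC characters in a &5000 byte screen
```

For &3000 this gives R12 = 6 and R13 = 0, and the second line confirms that &5000 bytes is 2560 characters, i.e. 80 columns by 32 rows.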

Note that the R12 & R13 registers are latched (remembered) at the start of the display cycle generated by the CRTC. This means changing the R12 & R13 register values has no immediate effect. Normally there will be one display (CRTC) cycle per frame but we will break this in the next part with "vertical rupture".

You may well be familiar with the hardware scrolling technique used in many games, which is achieved by changing the Screen Start Address to move the display left, right, up or down one character at a time with careful consideration as to what happens at the memory wrap around at &8000 (which will quickly end up in the middle of the screen!)

Vertical Total R4
The Vertical Total Register holds the total number of character rows in the display, minus one. Once the character row counter in the CRTC reaches the value in the register then... all the counters are zeroed and it just starts again!

Vertical Displayed R6
The Vertical Displayed Register specifies how many character rows are displayed on the screen before the display is "turned off" (i.e. no further bytes are sent to the screen.)

Vertical Sync Position R7
When the character row counter in the CRTC reaches the value in the Vertical Sync Position register then it issues a vertical sync pulse to the TV. This tells the raster beam to return from its current position to the top of the TV screen.

If this value is increased then there will be fewer character rows left before the end of the display cycle, so the display will appear to be higher up on the TV screen.

If this value is decreased then there will be more character rows before the end of the display cycle, so the display will appear to be lower down on the TV screen.

You can try this easily by going into MODE 0 and typing:

Code:

VDU 23,0,7,36,0,0,0,0,0,0
VDU 23,0,7,33,0,0,0,0,0,0
Scanlines per character R9
In default configuration there are 8 scanlines per character row.

Because of the way the memory offset works, increasing this number beyond 8 results in no bytes being available after scanline 8, so the screen is black for those raster lines. The Vertical total must be reduced by an appropriate amount to achieve 312 raster lines for our PAL signal. See register values for MODE 3,6.

Decreasing this number below 8 results in "shorter" character rows but requires the total number of rows to be increased correspondingly somehow, otherwise we'll end up with a malformed frame with fewer than 312 raster lines.

Note that using this arrangement means we "lose" RAM because CRTC addressing is in characters (multiples of 8 bytes) plus an offset of the scanline counter. If the scanline counter is always less than 8 then those remaining bytes will be unreachable by the CRTC!

The game Fortress uses this reduced scanline technique to achieve diagonal 4x4 pixel scrolling in MODE 1 but as this still covers all 32 character rows of screen RAM we get a letterbox sized screen! (32 x 4 = 128 pixels high, or thereabouts - I haven't checked the exact resolution.)
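To put numbers on the "lost" RAM and the letterbox height, here is a small Python sketch. It assumes the standard &5000 byte screen of 32 character rows described earlier; the function names are mine and the figures are not measured from Fortress itself.

```python
BYTES_PER_CHARACTER = 8     # CRTC addresses step in units of 8 bytes

def reachable_bytes(screen_bytes, scanlines_per_row):
    """Bytes of screen RAM the CRTC can fetch with the given row height."""
    used = min(scanlines_per_row, BYTES_PER_CHARACTER)
    return screen_bytes * used // BYTES_PER_CHARACTER

def visible_lines(character_rows, scanlines_per_row):
    return character_rows * scanlines_per_row

print(hex(reachable_bytes(0x5000, 4)), visible_lines(32, 4))
```

With 4 scanlines per row only &2800 of the &5000 bytes are ever displayed, and 32 rows cover just 128 raster lines.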

I think that is enough to be getting on with for now. We will cover Vertical Total Adjust (R5) and some of the Horizontal registers in a later part.

Re: Twisted Brain Demo

Post by kieranhj » Fri Jun 29, 2018 1:19 pm

Part #5: Vertical Rupture

Vertical Rupture is the term coined by the CPC community for the technique of programming the CRTC so that it goes through more than one display cycle per PAL frame. This is a very powerful technique that allows us to effectively map the screen buffer to the screen display in a completely non-linear way but does require some careful timing depending on the complexity of the effect desired.

RTW created an incredibly useful document on vertical rupture that I referred to many times during the creation of Twisted Brain: http://www.retrosoftware.co.uk/wiki/ind ... _scrolling. Thanks also to Tricky for his various previous explanations as vertical rupture is used in many of his excellent arcade conversions.

I liked the Amstrad CRTC diagram so much I decided to make my own in Excel to try and help illustrate the concept further. This is what a regular standard MODE 0,1,2 CRTC display cycle looks like:

Regular display cycle.PNG
Regular CRTC display cycle

So we have 39 total character rows, of which 32 are visible, and vsync around row 35. The memory address from R12 & R13 is loaded at the start of the display in the top left and incremented by 1 character (8 bytes) for each cell, moving from left-to-right, top-to-bottom.

We know that there must only be 1x vsync pulse per PAL frame otherwise bad things may happen to your TV but what happens if we don't have a vsync pulse? Say we set the Vertical Sync Position register to a value greater than the Vertical Total, e.g. &FF? The answer is that the CRTC just starts a new display cycle! This means all counters are reset to zero and, crucially, it will reload the R12 & R13 register values for the Screen Start Address...

We've got to have a vsync pulse at some point, or we'll never get a picture, so there is some timing required. Many existing examples use IRQV1 callbacks for Timer 1 and vsync interrupts, which is perfectly valid and useful, but for this demo we have our FX framework to allow quite carefully timed code against the raster.

Here's a simple example I hacked up to display the Brain picture on screen in 4x non-contiguous sections; the top 8 character rows are at the bottom of the screen, the next 8 above that and so on:

ruptured brain.png
Ruptured Brain picture!

The Brain picture is still loaded into contiguous RAM at &3000 as normal but by using vertical rupture we can reprogram the CRTC to create 4x display cycles each pointing to a different memory address. Here is an illustration of what's going on:

Vertical ruptured display cyces.PNG
4x CRTC display cycles with vertical rupture

The FX draw function for this is:
  • Set Vertical Total R4 = 7 (8 character rows)
  • Set Vertical Sync Position R7 = &FF (never)
  • Set Vertical Displayed R6 = 8 (8 character rows)
  • Set Screen Start Address R12 & R13 = &5800/8 (screen start address for display cycle #2)
  • Wait 64 raster lines (8 character rows) until display cycle #2 starts
  • Set Screen Start Address R12 & R13 = &4400/8 (screen start address for display cycle #3)
  • Wait 64 raster lines (8 character rows) until display cycle #3 starts
  • Set Screen Start Address R12 & R13 = &3000/8 (screen start address for display cycle #4)
  • Wait 64 raster lines (8 character rows) until display cycle #4 starts
  • Set Vertical Total R4 = 14 (15 character rows)
  • Set Vertical Sync Position R7 = 11 (VSync at raster line 280 = 35*8)
  • Set Screen Start Address R12 & R13 = &6C00/8 (screen start address for display cycle #1 of the next frame)
So here we have a screen made up of 8 + 8 + 8 + 15 = 39 total character rows (as before), with 8 + 8 + 8 + 8 = 32 character rows visible (as before) and vsync at 8 + 8 + 8 + 11 = 35 (as before.) So from a PAL TV signal POV it looks exactly the same as our regular screen setup but is actually made up of 4x separate display cycles from the CRTC's perspective.
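The bookkeeping for the four display cycles can be checked with a short Python sketch. The tuple layout and variable names are mine; the row counts and start addresses are the ones from the list above.

```python
SCANLINES_PER_ROW = 8
BYTES_PER_ROW = 80 * 8      # 80 CRTC characters of 8 bytes each

# One entry per display cycle:
# (total rows, visible rows, vsync row within the cycle or None, start address)
display_cycles = [
    (8, 8, None, 0x6C00),   # cycle #1 shows the bottom quarter of the picture
    (8, 8, None, 0x5800),   # cycle #2
    (8, 8, None, 0x4400),   # cycle #3
    (15, 8, 11, 0x3000),    # cycle #4 shows the top quarter and carries the vsync
]

total_rows = sum(rows for rows, _, _, _ in display_cycles)
visible_rows = sum(visible for _, visible, _, _ in display_cycles)

# Convert each per-cycle vsync position into a row count from the top of the frame.
rows_so_far = 0
vsync_rows = []
for rows, _, vsync, _ in display_cycles:
    if vsync is not None:
        vsync_rows.append(rows_so_far + vsync)
    rows_so_far += rows

section_bytes = 8 * BYTES_PER_ROW
print(total_rows, visible_rows, vsync_rows, total_rows * SCANLINES_PER_ROW)
print(hex(section_bytes))
```

This gives 39 total rows, 32 visible rows, a single vsync at row 35 and 312 raster lines, with each 8-row section of the picture occupying &1400 bytes.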

Some things to note:

Remember that the FX draw function is called on raster line 0 so we're already in display cycle #1. This can get confusing and hard to debug sometimes! The CRTC register values will contain whatever they were set to in the previous cycle (#4).

The R12 & R13 register values are latched at the start of a CRTC display cycle, which is why we need to set them before the next cycle starts. They have no effect on the current display.

Only the last display cycle (#4) must contain a vsync so it's important to reset that register at the start of the new frame in display cycle #1.

Finally, also remember that the internal counters of the CRTC test for equality against the register values so if you set a register to a new value that is less than the current counter then it won't equate until the counter has wrapped around through zero.


Hopefully this helps to demystify vertical rupture a little bit. Now we have the basics of CRTC registers and vertical rupture in place we can start to move on to some of the more advanced effects in the demo.

Simple maths tells us that we cannot use the 6502 CPU @ 2MHz to fill a 20KB screen buffer in 20ms (50Hz.) However, if we can precalculate interesting patterns for the screen buffer, or create smaller screen buffer effects using the limited CPU time we do have, then vertical rupture gives us a way to display memory on the entire screen in a non-linear manner at 50Hz for effectively zero cost...

Re: Twisted Brain Demo

Post by kieranhj » Fri Jun 29, 2018 1:29 pm

Part #6: Copper Colours

Hopefully now we have enough knowledge of the CRTC registers and vertical rupture technique to be able to explain how the copper colour effect was achieved.

But first we need a small diversion on dithering. We know the BBC colour palette is limited to 3-bit RGB so eight intense colours: black, red, green, yellow, blue, magenta, cyan and white. Fortunately the challenge of representing images from a limited colour palette is a well researched topic. I thoroughly recommend reading the Wikipedia article on the subject of dithering.

There are many different approaches to dithering but for this demo we need something that is simple, can be precomputed and, most importantly, works well with movements and animation. Ordered dithering fits these requirements as the dithering patterns used are fixed and predictable so we do not get scintillating pixels under motion.

I won't go into masses of detail about ordered dithering but again recommend the Wikipedia article on the topic. For our purposes it is sufficient to know that we're using a 4x4 ordered dithering matrix that generates 17 fixed patterns to represent the gradient between two colours. The gradient looks like this:

Ordered_4x4_Bayer_matrix_dithering.png
4x4 Ordered dithering

Obviously the higher the pixel resolution, the more effective the dithering effect is, so the Copper Colour effect is actually in MODE 0!
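To show where the 17 patterns of a 4x4 ordered dither come from, here is a minimal Python sketch using the standard 4x4 Bayer threshold matrix. This is an assumption for illustration: the demo's precomputed data may use a different matrix or pattern order.

```python
# Standard 4x4 Bayer threshold matrix (values 0..15).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def dither_pattern(level):
    """4x4 pattern with exactly `level` pixels set, for level 0..16."""
    return [[1 if BAYER_4X4[y][x] < level else 0 for x in range(4)]
            for y in range(4)]

patterns = [dither_pattern(level) for level in range(17)]

# Show the half-way pattern: a checkerboard of the two colours.
for row in patterns[8]:
    print("".join("#" if pixel else "." for pixel in row))
```

Each step up in level switches on exactly one more pixel, so levels 0 to 16 give 17 fixed patterns running from one pure colour to the other.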

Here is the precomputed 4x4 ordered dither in MODE 0 with one pattern per character row (17 in total including pure white and pure black.) This is loaded into the screen buffer RAM by the FX init function:

copper ordered dither mode 0.png
Precomputed dithered MODE 0 screen buffer

With vertical rupture we can display any of these 17 dithering patterns on any character row of the screen. Even better, we can manipulate the CRTC registers so that a character row is just 4 scanlines high to match the size of our 4x4 dithering pattern.

Ignoring motion and colour for the moment, we can generate a screen that blends from white to black to white to black etc. by setting up a screen configuration made of 64x CRTC display cycles, each with 1x character row of just 4x scanlines. 64 x 1 x 4 = 256 visible lines. Remember, we start already inside display cycle 1 and we must generate a vsync in the final (64th) display cycle.

The FX draw function is then:
  • Set Scanlines per Row R9 = 3 (4 scanlines)
  • Set Vertical Total R4 = 0 (1 character row)
  • Set Vertical Sync Position R7 = &FF (never)
  • Set Vertical Displayed R6 = 1 (1 character row)
  • Set Screen Start Address R12 & R13 = from lookup table of addresses white -> black -> white etc.
  • Wait until we've covered exactly 4x raster lines = 512 cycles - however long the above code takes
  • Loop 62 times (62 more display cycles):
    • Calculate offset for next character row
    • Set Screen Start Address for next display cycle (character row)
    • Change ULA palette (see below)
    • Wait until we've covered exactly 4x raster lines = 512 cycles - however long the above code takes
  • Scanlines per Row R9 unchanged = 3 (4 scanlines)
  • Set Vertical Total R4 = 14 (15 character rows)
  • Vertical Displayed R6 unchanged = 1 (1 character row)
  • Set Vertical Sync Position R7 = 7 (in 7 character rows time)
The final (64th) display cycle must include a vsync. Up until this point we've had 63 cycles each of 4 scanlines, so covered 63 x 4 = 252 raster lines. We need 312 raster lines in total with vsync happening at line 280 ( = 35 * 8 ) so the register values are:
  • Vertical Total = (312 - 252) / 4 = 60 / 4 = 15 (character rows), so R4 = 15 - 1 = 14
  • Vertical Sync Position = (280 - 252) / 4 = 28 / 4 = 7 (character rows), so R7 = 7
This gives us our nice black and white bars:

copper no colours.png
Copper bars without colour
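
The register arithmetic for the final display cycle can be written down as a small Python helper. This is only a sketch of the sums shown above, and the function name final_cycle_registers is hypothetical. It assumes a 312 line PAL frame with vsync at raster line 280, as stated in the text.

```python
LINES_PER_FRAME = 312   # PAL raster lines per frame
VSYNC_LINE = 280        # 35 character rows * 8 scanlines


def final_cycle_registers(lines_used, scanlines_per_row):
    """Return (R9, R4, R7) for the last display cycle of a frame.

    lines_used is the number of raster lines already covered by the
    earlier display cycles.
    """
    remaining = LINES_PER_FRAME - lines_used
    until_vsync = VSYNC_LINE - lines_used
    if remaining % scanlines_per_row or until_vsync % scanlines_per_row:
        raise ValueError("lines do not divide into whole character rows")
    r9 = scanlines_per_row - 1                  # Scanlines per Row
    r4 = remaining // scanlines_per_row - 1     # Vertical Total
    r7 = until_vsync // scanlines_per_row       # Vertical Sync Position
    return r9, r4, r7


if __name__ == "__main__":
    # Copper: 63 cycles of 4 scanlines already displayed
    print(final_cycle_registers(63 * 4, 4))
```

The same helper applies to any configuration where earlier cycles use a fixed number of scanlines per row: only the two arguments change.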

Adding Colour
To add colour we simply change the ULA palette at each point we're displaying either solid black or solid white in our copper hue: red -> magenta -> blue -> cyan -> green -> yellow -> red.

Remember in MODE 0 we must program 8x ULA palette entries to modify one colour on the screen. This takes a reasonable amount of cycles so to avoid any pixels appearing with a partially programmed palette we only change the palette register when solid colour is on screen. I.e. when the pattern is all colour 0 then we can safely reprogram the palette for colour 1, and vice versa. We always display a minimum of 4x raster lines of solid colour so this is plenty of time.

copper colour no motion.png
Static Copper bars with colour
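
As a rough illustration of why changing one MODE 0 colour costs eight writes, this Python sketch lists the bytes that would be sent to the Video ULA palette register. It assumes the usual palette byte layout (palette entry in the top four bits, physical colour EOR 7 in the bottom four) and that logical colour 0 occupies entries 0-7 and logical colour 1 entries 8-15. The function name is made up for this example.

```python
ULA_PALETTE = 0xFE21  # Video ULA palette register in SHEILA (write only)


def mode0_palette_writes(logical, physical):
    """Eight bytes that map MODE 0 logical colour 0 or 1 to a physical colour 0..7.

    Assumed byte layout: bits 7-4 = palette entry, bits 3-0 = physical EOR 7.
    """
    if logical not in (0, 1) or not 0 <= physical <= 7:
        raise ValueError("logical must be 0 or 1, physical must be 0..7")
    first_entry = 8 * logical
    return [((first_entry + i) << 4) | (physical ^ 7) for i in range(8)]


if __name__ == "__main__":
    # Reprogram logical colour 1 to red (physical colour 1)
    print([hex(b) for b in mode0_palette_writes(1, 1)])
```

Eight stores to the palette register take long enough to be visible if they straddle displayed pixels of that colour, hence doing them only while the other colour fills the row.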

Adding Animation
Firstly, the index into the screen address lookup table is scrolled (incremented) each frame, so the bars appear to move up the screen constantly.

Next a simple accumulator is used to increment the index into the screen address lookup table for each new character row. This has the effect of stretching the bars by a constant amount. E.g. if the value added to the accumulator each row is large then the index will step through the table quickly (everything will be squashed together.) If the value added to the accumulator is small then the index will step through the table more slowly (everything will be stretched.)

You can see the effect of accumulating by 32 effectively stretches the bars by a factor of 8 ( = 256 / 32 ). I.e. it takes 8 character rows before we increment the index into the lookup table.

copper no colour stretched.png
Stretched Copper bars without colour
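
Here is a small Python model of the accumulator idea, offered as a sketch rather than the demo's actual code. It assumes an 8-bit fractional accumulator (which is what the 256 / 32 = 8 sum above implies) and a hypothetical lookup table length of 32 entries.

```python
def row_table_indices(start_index, step, rows=64, table_length=32):
    """Return the lookup table index used for each character row.

    step is added to an 8-bit accumulator once per row; every time the
    accumulator overflows past 255 the table index moves on by one.
    table_length is a placeholder, not a value taken from the demo.
    """
    index, accumulator = start_index, 0
    indices = []
    for _ in range(rows):
        indices.append(index % table_length)
        accumulator += step
        index += accumulator >> 8      # carry out of the 8-bit accumulator
        accumulator &= 0xFF
    return indices


if __name__ == "__main__":
    print(row_table_indices(0, 32, rows=16))   # stretched by a factor of 8
    print(row_table_indices(0, 256, rows=16))  # one table entry per row
```

A step of 32 needs 8 rows to overflow, so each table entry is repeated 8 times. Larger steps overflow sooner and squash the bars together.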

Finally, the "stretch factor" is animated on a sine curve each frame (there are a *lot* of sine curves in this demo!) so the end result zooms in & out of the bars as they scroll.

copper colour stretched.png
Stretched Copper bars with colour
Last edited by kieranhj on Sat Jun 30, 2018 3:46 pm, edited 1 time in total.

Re: Twisted Brain Demo

Post by kieranhj » Fri Jun 29, 2018 1:29 pm

Part #7: Plasma

Like the Copper Colours, the Plasma effect is also in MODE 0 and uses a prerendered screen buffer consisting of various 4x4 ordered dither patterns:

plasma prerendered screen.png
Prerendered MODE 0 screen buffer

Although this looks a bit random, every two character rows consist of a gradient that goes from white -> black -> white an increasing number of times. So the top two character rows have 1 gradient, the next two rows 2x gradients and most clearly the bottom two character rows have 8x gradients (they start to look like 8x vertical bars.)

Now we're starting to get a bit more comfortable with the idea of vertical rupture, we can think of taking any one of those prerendered character rows from the screen buffer RAM and displaying it on every row of the TV screen:

plasma large bars.png
Large dithered bars

This is exactly the same 64 x 1 x 4 CRTC cycle configuration that we had in the Copper Colours effect but our starting point is to display the same bit of RAM on every character row of the screen. (This idea of repeating the same area of memory is also very powerful as we'll find out in some of the other effects.)

plasma smaller bars.png
Smaller dithered bars

Adding Animation
If we offset the Screen Start Address for each character row, we can animate the bars in a number of ways (which all basically boil down to predefined sine tables :) ):
  • Scroll horizontally by offsetting all rows by the same amount
  • Apply a sine curve of given frequency and amplitude to "bend" the bars
  • Add another sine curve of different frequency and amplitude over the top
  • Update lookups into the tables by differing amounts per frame / character row
Some examples:

plasma some bend.png
Adding some bend to the bars
plasma more bend.png
Yet more bend to the bars

Because there is a fixed amount of time to calculate the screen address for the next display cycle, I wanted to avoid any multiplication, so instead everything is made up of adding sine curves together. To be honest it is a bit of a black art creating sine tables that give pleasing visual results, and there was a lot of trial and error here fiddling with parameters. Even trying to give parameters sensible names and understand the units of measurement can be tough!

plasma double sine curves.png
Two sine curves added together
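
The "add two sine tables, no multiplication" idea can be sketched in Python as below. Everything here is an assumption for illustration: the table amplitudes, the per-row and per-frame step sizes, the 80 character row width and all of the names. The real demo uses its own hand-tuned tables.

```python
import math


def make_sine_table(amplitude, length=256):
    """Precompute one full sine period as signed integer character offsets."""
    return [round(amplitude * math.sin(2 * math.pi * i / length))
            for i in range(length)]


def plasma_offsets(frame, table_a, table_b, rows=64, chars_per_row=80,
                   row_step_a=3, row_step_b=7, frame_step_a=1, frame_step_b=2):
    """Return one Screen Start Address offset (in characters) per row.

    Inside the loop there are only table lookups and additions, which is
    the property the per-row time budget needs.
    """
    index_a = (frame * frame_step_a) & 0xFF
    index_b = (frame * frame_step_b) & 0xFF
    offsets = []
    for _ in range(rows):
        offsets.append((table_a[index_a] + table_b[index_b]) % chars_per_row)
        index_a = (index_a + row_step_a) & 0xFF
        index_b = (index_b + row_step_b) & 0xFF
    return offsets


if __name__ == "__main__":
    wide = make_sine_table(24)
    narrow = make_sine_table(9)
    print(plasma_offsets(0, wide, narrow)[:16])
```

Changing the row steps alters how many bends fit on screen, while the frame steps control how quickly each curve drifts.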

Adding Colour
There are only two colours on screen for the plasma, so no fancy palette tricks are required. However the colour selections were deliberately chosen to be "close together" so that the dithered blending is more effective to the eye (particularly in MODE 0 high resolution.) Colours that are neighbours in hue (e.g. red & magenta) or of similar brightness (e.g. white & yellow) look nice together.

plasma yellow white.png
Close colours improve appearance of dither
Last edited by kieranhj on Sat Jun 30, 2018 3:53 pm, edited 1 time in total.

Re: Twisted Brain Demo

Post by kieranhj » Fri Jun 29, 2018 1:30 pm

Part #8: Parallax Bars

The entirety of Twisted Brain was pretty much based on my desire to recreate the Parallax Bars and other effects from one of my all-time favourite demos of the Amiga era: Total Triple Trouble by Rebels.

It follows the same 64 x 1 x 4 CRTC cycle configuration as both the Copper & Plasma effects but utilises 40K of prerendered MODE 1 screen buffers stored in main and SHADOW RAM:

parallax 1.png
Prerendered screen buffer in main RAM
parallax 2.png
Prerendered screen buffer in SHADOW RAM

The bars were created with a BASIC program that draws 7 layers of bars from back-to-front at a uniformly decreasing distance to the "camera". The numbers are arranged so that the top set of bars (closest to the "camera") are 32 pixels wide and 32 pixels apart, giving 5x bars across a 320 pixel MODE 1 screen.

We move the "camera" one pixel to the right and draw all of the bars again in a new character row. After this has been repeated 64 times the bars are all back in the same position as when we started (64 pixels between the left edge of each bar on the top layer.)

The bars themselves are plotted using ordered dithering again to create a smooth gradient but using the pixel coordinates within the bar in the dither equation. This means that the pixel pattern remains constant inside the bar on each frame, avoiding scintillating pixels.

Because a 20K MODE 1 screen only holds 32 character rows of 80 CRTC characters (2560 CRTC characters in total, 640 bytes per row), a second 20K screen is created to be placed in SHADOW RAM, giving 64 character rows in total.

Using vertical rupture we can display the same character row all the way down the screen, giving full screen vertical bars for "free". As we step through the 64 available character rows, the bars will move sideways by 1 pixel at a time in a perfect loop, giving the appearance of parallax scrolling.

From here it's a matter of adding yet another animated sine wave offset for each character row and then fiddling with the parameters to control frequency & speed of animation etc. (AKA the black art of sine wave wibbling.)

parallax.png
BBC Parallax bars!
vlcsnap-2018-07-02-10h08m33s824.png
Amiga Parallax bars!

SHADOW RAM
I've glossed over one aspect of the above - because we have 64x character rows we need to tell the CRTC whether to display from main or SHADOW RAM. The first 32x prerendered rows are in main RAM and the second 32x rows in SHADOW.

This is done easily enough using the Access Control Register (ACCCON) located at address &FE34 in SHEILA. See page 161 - 163 in the NAUG for full details.

Note that there is an annoying mistake in the diagram in the NAUG on page 162, although the text is all correct. Here is a corrected version of the diagram:

ACCCON corrected.PNG
ACCCON diagram corrected

One gotcha is that changing the ACCCON register takes immediate effect (whereas our CRTC Screen Start Address register is latched at the next display cycle.) This means if we are currently showing main RAM and want our next display cycle to show from SHADOW we have to update ACCCON immediately before it is needed.

Thankfully the FX framework plus 6502 instruction cycle counting means we can put this code inside the horizontal blank period at the end of the 4th scanline of each display cycle.

Code: Select all

	LDA #62
	STA parallax_crtc_row

	.loop
	\\ Update our sine tables for next character row / cycle

	TXA					; 2c
	CLC					; 2c
	ADC parallax_wavey			; 3c
	TAX 					; 2c
	LDA parallax_sine_table, X		; 4c
	CLC					; 2c
	ADC parallax_x				; 3c
	AND #&3F				; 2c
	TAY					; 2c

	\\ Wait 49 cycles so we're towards horizontal sync

	FOR n,1,23,1
	NOP
	NEXT
	BIT 0

	\\ Wait two more raster lines

	JSR cycles_wait_128
	JSR cycles_wait_128

	\\ Update the Screen Start Address for next cycle

	LDA #12: STA &FE00			; 2c + 4c
	LDA parallax_vram_table_HI, Y		; 4c
	STA &FE01				; 4c

	LDA #13: STA &FE00			; 2c + 4c
	LDA parallax_vram_table_LO, Y		; 4c
	STA &FE01				; 4c

	\\ Wait another raster line so we're at the very end of 4th scanline

	JSR cycles_wait_128

	\\ Set correct video page

	LDA &FE34				; 4c++
	AND #&FE				; 2c
	ORA parallax_vram_table_page, Y		; 4c
	STA &FE34				; 4c++

	\\ Next character row / cycle

	DEC parallax_crtc_row			; 5c
	BNE loop				; 3c
Now it becomes clear why this effect is one of the most timing sensitive. If Timer 1 reaches zero at a point other than the beginning of raster line 0 then the switch between main and SHADOW RAM will take place at the wrong time. We'll get on to the timing differences between the emulators and real hardware later on, but this is why we get this result:

parallax timing bug.png
Parallax Bars w/ 64us timing bug

We're a raster line out when switching between main and SHADOW RAM so end up with single "glitch" lines at those boundaries.
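
For reference, the three lookup tables indexed by Y in the loop above could be generated along these lines. This Python sketch is a guess at their contents, not the demo's build script. It assumes both 20K screens start at &3000, that rows 0-31 live in main RAM and rows 32-63 in SHADOW, and that the page table holds the value for ACCCON bit 0 (the D bit, 1 = display SHADOW RAM).

```python
SCREEN_BASE = 0x3000   # assumed MODE 1 screen start in main and SHADOW RAM
BYTES_PER_ROW = 640    # 80 CRTC characters * 8 bytes each
ROWS_PER_BANK = 32     # character rows in one 20K screen


def build_vram_tables(rows=64):
    """Return (table_hi, table_lo, table_page) for the prerendered rows.

    The CRTC Screen Start Address is in characters, so the byte address
    is divided by 8 before being split across R12 (high) and R13 (low).
    """
    table_hi, table_lo, table_page = [], [], []
    for row in range(rows):
        bank, bank_row = divmod(row, ROWS_PER_BANK)
        crtc_addr = (SCREEN_BASE + bank_row * BYTES_PER_ROW) // 8
        table_hi.append(crtc_addr >> 8)     # value for R12
        table_lo.append(crtc_addr & 0xFF)   # value for R13
        table_page.append(bank)             # ORed into ACCCON bit 0
    return table_hi, table_lo, table_page


if __name__ == "__main__":
    hi, lo, page = build_vram_tables()
    for row in (0, 31, 32, 63):
        print(row, hex(hi[row]), hex(lo[row]), page[row])
```

Rows 0 and 32 share the same CRTC address and differ only in the page bit, which is why a late or early ACCCON write shows the right row from the wrong bank for one raster line.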
Last edited by kieranhj on Tue Jul 03, 2018 9:38 am, edited 2 times in total.

Re: Twisted Brain Demo

Post by kieranhj » Fri Jun 29, 2018 1:30 pm

Part #9: Vertical Blinds

The "Vertical Blinds" effect was one of the earliest that I prototyped whilst experimenting with vertical rupture to repeat a single character row over the entire screen for "free". The original code used IRQ1V callbacks before the FX framework existed. I wasn't going to include the effect in the demo but both simonm and sbadger quite liked it. :)

The final implementation has a CRTC configuration of 2x scanlines per row, 1x character row per cycle and 128x display cycles per frame. This means our "frame buffer" is just 80 x 2 = 160 bytes in size - small enough to update completely every frame.

However the effect is complicated enough that the mini frame buffer cannot be cleared and redrawn in the FX update function. Instead a double-buffering approach is used and the work to draw the mini frame buffer is moved to the FX draw function. Double buffering is cheap when your frame buffer is so small...

FX Update function
  • Which buffer?
    • Set write ptr to character row 1 and CRTC Screen Start Address to character row 0
      or
    • Set write ptr to character row 0 and CRTC Screen Start Address to character row 1
  • Then swap buffers.
FX Draw function
  • Set Scanlines per Row R9 = 1 (2 scanlines)
  • Set Vertical Total R4 = 0 (1 character row)
  • Set VSync Position R7 = &FF (never)
  • Set Vertical Displayed R6 = 1 (1 character row)
  • Loop to copy colour values from a linear line buffer into MODE 2 screen buffer pixels (~83 raster lines)
  • For 14x vertical blind "bars" (~78 raster lines):
    • Update horizontal position (from sine table)
    • Update width (from another sine table)
    • Draw bar into linear line buffer
  • Wait ~92 raster lines (until we reach display cycle #128)
  • Scanlines per Row R9 unchanged = 1 (2 scanlines)
  • Vertical Total R4 = ((312 - 254) / 2) - 1 = (58/2) - 1 = 29 - 1 = 28 (29 character rows)
  • Vertical Sync Position R7 = (280 - 254) / 2 = 26 / 2 = 13 (13 character rows time)
  • Vertical Displayed R6 unchanged = 1 (1 character row)
The linear line buffer is just an array of 256 bytes that represents the pixels along the top line of the screen. To keep things simple 1x byte represents 1x pixel with 15x values that are mapped to MODE 2 pixel pairs in the copy loop to give a simple stipple effect for the appearance of more colours.

Keeping everything as a linear line buffer has a number of advantages:
  • It is simple to write into the line buffer -> we only need to worry about BBC screen byte arrangement once during the copy loop
  • Clipping at screen edges becomes trivial -> we just copy the middle 160 pixels from the line buffer to the MODE 2 screen
  • A single colour value in our line buffer can be turned into multiple screen bytes and the pixel values remapped if required -> stipple
  • Copying from the line buffer to the screen buffer is a constant time operation -> suited to our FX draw functions
The only slight complication is making sure that writing into the line buffer is also a constant time operation for our FX draw function to remain predictable. This is done by having two loops of the same cycle length that always total the same number of iterations. The first loop writes the required number of colour values into the line buffer and the second loop writes the rest of the colour values into a sink.
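
The two-loop trick can be modelled in a few lines of Python. This is a sketch of the idea only: the maximum bar width of 32, the wrap at 256 bytes and the function name are assumptions, and in Python the iteration count stands in for what would be a fixed cycle count on the 6502.

```python
def draw_bar_constant_time(line_buffer, start, width, colour, max_width=32):
    """Write `width` colour values into the line buffer in constant "time".

    The first loop writes the visible part of the bar. The second loop does
    the same amount of work per iteration but writes into a throwaway sink,
    so the total number of iterations is always max_width.
    """
    if not 0 <= width <= max_width:
        raise ValueError("width must be between 0 and max_width")
    sink = [0] * max_width
    iterations = 0
    for i in range(width):
        line_buffer[(start + i) & 0xFF] = colour   # 256 byte line buffer
        iterations += 1
    for i in range(max_width - width):
        sink[i] = colour                           # wasted on purpose
        iterations += 1
    return iterations


if __name__ == "__main__":
    buffer = [0] * 256
    print(draw_bar_constant_time(buffer, 100, 10, 5))
    print(draw_bar_constant_time(buffer, 20, 32, 3))
```

Whatever the width of the bar, the work done is identical, so the raster position at the end of the draw loop does not depend on the animation.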

Here is a very early prototype of this effect dating from 2016:
vertical blinds prototype 2016.png
Vertical bars prototype circa 2016
Last edited by kieranhj on Mon Jul 02, 2018 9:13 pm, edited 1 time in total.

Re: Twisted Brain Demo

Post by kieranhj » Fri Jun 29, 2018 1:31 pm

Part #10: Kefrens aka Alcatraz bars

I have been on a quest to produce true single scanline Kefrens bars on the Beeb for quite a while. Here's a very early prototype of the effect from 2016 which is only achieving one bar every 8 scanlines for a massive total of 28x bars!

kefrens bars circa 2016.png
Kefrens bars prototype circa 2016

The crux of this effect is to display the same scanline of memory on every raster line of the screen but update the scanline memory just before the raster so that the pixels accumulate over every line.

For those familiar with the Atari 2600 (VCS) this is a similar concept - that machine has no frame buffer so the video chip must be programmed just as the raster passes the correct part of the screen. (Truly mind boggling that any games were ever made, but I digress.)

The FX Update function simply clears the scanline buffer (80 bytes) and updates our sine table indices. The FX Draw function is a bit more complicated as we're now down to the smallest possible CRTC cycle configuration: 1x scanline per character row, 1x character row per display cycle repeated 256 times!
  • Screen Start Address R12 & R13 = &3000 (constant)
  • Set Scanlines per Row R9 = 0 (1 scanline)
  • Set Vertical Total R4 = 0 (1 character row)
  • Set Vertical Sync Position R7 = &FF (never)
  • Set Vertical Displayed R6 = 1 (1 character row)
  • Loop 254x times:
    • Update lookup into sine tables to get next X position for bar
    • Lookup write address for X position
    • Check if X is odd (right pixel aligned) or even (left pixel aligned)
    • Write 4x bytes for 7x pixels, masking in 8th (left or right) pixel from screen accordingly
  • Scanlines per Row R9 unchanged = 0 (1 scanline)
  • Vertical Total R4 = ((312 - 255) / 1) - 1 = 56 (57 character rows)
  • Vertical Sync Position R7 = (280 - 255) / 1 = 25 (25 character rows time)
  • Vertical Displayed R6 = 1 (1 character row)
The pixel writing code looks like this:

Code: Select all

	.write_pixels
	LDA kefrens_addr_table_LO, Y		; 4c
	STA writeptr				; 3c
	LDA kefrens_addr_table_HI, Y		; 4c
	STA writeptr+1				; 3c

	TYA:LSR A
	BCS right

	;2c
	\\ Left aligned
	LDA # PIXEL_LEFT_7 OR PIXEL_RIGHT_3	; white/yellow
	LDY #0:STA (writeptr), Y		; 8c
	LDA # PIXEL_LEFT_6 OR PIXEL_RIGHT_2	; cyan/green
	LDY #8:STA (writeptr), Y
	LDA # PIXEL_LEFT_5 OR PIXEL_RIGHT_1	; magenta/red
	LDY #16:STA (writeptr), Y
	LDY #24:

	\\ Mask in right most pixel from screen
	LDA (writeptr),Y			; 6c
	AND #&55				; 2c
	ORA #PIXEL_LEFT_4			; 2c	; blue/screen
	STA (writeptr), Y

	BRA continue ;3c

	.right				;3c
	\\ Mask in first left pixel from screen
	LDY #0
	LDA (writeptr),Y			; 6c
	AND #&AA				; 2c
	ORA #PIXEL_RIGHT_7			; 2c	; screen/white
	STA (writeptr), Y

	LDA # PIXEL_LEFT_3 OR PIXEL_RIGHT_6	; yellow/cyan
	LDY #8:STA (writeptr), Y
	LDA # PIXEL_LEFT_2 OR PIXEL_RIGHT_5	; green/magenta
	LDY #16:STA (writeptr), Y
	LDA # PIXEL_LEFT_1 OR PIXEL_RIGHT_4	; red/blue
	LDY #24:STA (writeptr), Y
	NOP
	
	.continue
Both paths of the branch must take the same number of cycles, hence the additional NOP at the end of the right hand branch.
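
If the &55 and &AA masks look mysterious, this Python sketch shows where they come from. It assumes the PIXEL_LEFT_n and PIXEL_RIGHT_n constants follow the standard MODE 2 byte layout (left pixel in bits 7, 5, 3, 1 and right pixel in bits 6, 4, 2, 0). The function names are made up for this example.

```python
def pixel_left(colour):
    """Spread a 4-bit colour into the left pixel bits (7, 5, 3, 1)."""
    return (((colour & 8) << 4) | ((colour & 4) << 3) |
            ((colour & 2) << 2) | ((colour & 1) << 1))


def pixel_right(colour):
    """Spread a 4-bit colour into the right pixel bits (6, 4, 2, 0)."""
    return pixel_left(colour) >> 1


def merge_keep_right(screen_byte, colour):
    """New left pixel, existing right pixel (the AND #&55 path)."""
    return (screen_byte & 0x55) | pixel_left(colour)


def merge_keep_left(screen_byte, colour):
    """Existing left pixel, new right pixel (the AND #&AA path)."""
    return (screen_byte & 0xAA) | pixel_right(colour)


if __name__ == "__main__":
    print(hex(pixel_left(7) | pixel_right(3)))   # white / yellow pair
    print(hex(merge_keep_right(0xFF, 4)))
    print(hex(merge_keep_left(0xFF, 7)))
```

So &AA selects the left pixel of a byte and &55 the right pixel, which is all the masking code needs to preserve the one neighbouring pixel that is not part of the 7 pixel bar.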

Differences vs Real Hardware
Whilst the effect does work on the BBC Master machines that I've had access to (all sporting the apparently common Hitachi HD6845SP CRTC chip) there is definitely some not-quite-fully-understood behaviour when it comes to setting certain registers on the final scanline of a CRTC display cycle. Given that with this particular arrangement we have 256 "final scanlines" it's not clear that this should work at all..!

Much (confusing) discussion can be found on this thread: viewtopic.php?f=4&t=14971

Based on the behaviour I've observed on real hardware, my unproven suspicion is that setting the Vertical Total R4 to 56 for the final display cycle with vsync isn't acknowledged until after the current scanline / display cycle completes, so we end up with a frame 313 raster lines long instead of 312. This causes Timer 1 to reach zero during the 313th raster line rather than at the start of raster line 0, so presumably the first single scanline display cycle set up for the subsequent frame is just ignored. The knock-on effect is that all the following frames are 312 raster lines long, but more by accident than design, and everything "works".

As Timer 1 is now out by 64us, this manifests bugs in subsequent effects (see Parallax bug) unless a single frame of 311 raster lines is used to realign Timer 1 reaching zero with the start of raster line 0. This is achieved during deinit (FX Kill function) by resetting all of the CRTC registers back to their MODE 2 default values then hacking a single character row to be 7 scanlines rather than 8. This is what happens when selecting "Real Hardware" from the BASIC loader.

kefrens.png
Kefrens bars 2018!
Last edited by kieranhj on Tue Jul 03, 2018 9:39 am, edited 3 times in total.

Re: Twisted Brain Demo

Post by kieranhj » Fri Jun 29, 2018 1:31 pm

Part #11: Checkerboard Zoom

The zooming checkerboard turned out to require more iterations than originally anticipated (for reasons that I will get onto shortly) and actually does the most work in the update function (during vblank) of all the effects.

As with the Kefrens bars, the CRTC configuration is 1x scanline per row x 1 character row per display cycle x 256 cycles and we're displaying the same scanline of RAM on every raster line.

The trick here is to use the flashing bit in the ULA Video Control Register (at SHEILA &FE20) to invert the colours 8-15 of our scanline pixels at close to zero cost. You can read about the Video Control Register on page 204 of the NAUG:

ULA Video Control Register.PNG
ULA Video Control Register

The FX Draw function is quite simple:
  • Set up our single scanline CRTC cycle (as before):
    • Screen Start Address R12 & R13 = &3000 (constant)
    • Set Scanlines per Row R9 = 0 (1 scanline)
    • Set Vertical Total R4 = 0 (1 character row)
    • Set Vertical Sync Position R7 = &FF (never)
    • Set Vertical Displayed R6 = 1 (1 character row)
  • Loop 254 times:
    • Wait 94 cycles until we're in hblank
    • Mask parity bit of checkerboard into ULA Video Control Register flash colour select (bit 0)
    • Increment Y coordinate of checkerboard
    • Test whether Y coordinate > size of check (N) and if true invert parity bit
  • Set final CRTC cycle to form a complete PAL signal (as before):
    • Scanlines per Row R9 unchanged = 0 (1 scanline)
    • Vertical Total R4 = ((312 - 255) / 1) - 1 = 56 (57 character rows)
    • Vertical Sync Position R7 = (280 - 255) / 1 = 25 (25 character rows time)
    • Vertical Displayed R6 = 1 (1 character row)
Even though we only have a single scanline of pixels to draw, this must be completed during our update function before raster line 0 occurs. When the music player is at peak load this leaves a maximum of around 18 raster lines (18 * 128 = 2304 cycles.)

Again, it sounds like plenty but if we want to have single pixel movement horizontally and scale the squares by single pixel increments, suddenly there is a heap of pixel masking to think about.

Our checkerboard has x & y offset coordinates in pixels plus a check size of N pixels. If we choose the top left of our checkerboard to be black, then by moving N pixels horizontally the top left of the screen will become white, ditto if we move N pixels vertically. If we move N pixels both horizontally & vertically then it will remain black. We can think about the parity of the check, which is probably easier to explain in this diagram:

checkerboard parity.png
Drawing the checkerboard

So our single scanline frame buffer needs to start with an offset of (x MOD N) black pixels and continue drawing pixels until N pixels are drawn then invert the colour, repeat until we reach the end of the line. When we're in the FX draw loop we start assuming (y MOD N) lines of the board are off screen then invert the colour every time we reach N lines being "drawn".
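
The split between "one scanline of pixels" and "one parity bit per raster line" can be checked with a short Python model. This is a sketch under assumptions: the direction of the x and y offsets and all of the names are chosen for illustration and may not match the demo's conventions.

```python
def check_colour(px, py, x, y, n):
    """Reference checkerboard: 0 or 1 for screen pixel (px, py)."""
    return (((px + x) // n) + ((py + y) // n)) & 1


def scanline_pattern(x, n, width=320):
    """The single scanline buffer: depends only on the x offset and N."""
    return [((px + x) // n) & 1 for px in range(width)]


def line_parities(y, n, lines=256):
    """One parity bit per raster line, flipped every N lines.

    Mirrors the draw loop: start part way through a check, count lines,
    and invert the parity whenever N lines have gone by.
    """
    parity = (y // n) & 1
    count = y % n
    parities = []
    for _ in range(lines):
        parities.append(parity)
        count += 1
        if count == n:
            count = 0
            parity ^= 1
    return parities


if __name__ == "__main__":
    row = scanline_pattern(x=5, n=12)
    flips = line_parities(y=7, n=12)
    # Any pixel of the full board is the scanline pixel XOR the line parity.
    print(row[20] ^ flips[30], check_colour(20, 30, 5, 7, 12))
```

The XOR is what the flash bit provides in hardware: the scanline memory stays the same and only the colour mapping is inverted per line.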

Low Frequency Clock
The original implementations were in MODE 4 which is only 40 CRTC characters wide and therefore relatively easy to write. One gotcha though is that MODE 4 is a low frequency 6845 clock and therefore not considered the same as MODE 0,1,2 by the CRTC. Take a look back at Part #4 and the default values of the CRTC registers: everything is roughly half for MODE 4,5,6.

Hmmm, we know the CRTC counters test for equality against the register values and if we reduce the register values below the counter values then overflow will occur (we have to wait for the counters to wrap around through 0.) Is it possible to change between a high frequency and low frequency clock rate MODE without causing the TV to resync and timing to be thrown out?

The answer, I think, is yes when it comes to avoiding resync, but I'm not sure when it comes to timing. Once the ULA has set the CRTC to low frequency clock rate we're then in a race against the horizontal counters to set each of the CRTC registers to their new lower values without overflow occurring. It does start to matter which registers are set in which order - if you don't set the Horizontal Total in time then your raster line is too long, if you don't set your Horizontal Sync Position in time then the hblank is in the wrong place etc.

I just about got this working but it seemed brittle. Also I had no clue how this would affect the raster timing relative to Timer 1 on real hardware. Say we're at Horizontal Character = 30 in high frequency clock then suddenly we switch to low frequency clock and reduce the Horizontal Total from 127 to 63. Our Horizontal Counter now says we've only got 34 more characters to go but this feels like we've "lost" some characters. 34 low frequency characters = 68 high frequency ones so we'll only get 30 + 68 = 98 high frequency total characters this raster line, rather than 128. This certainly bent my brain and I decided that the emulators almost certainly weren't going to be accurate in that respect, so probably best stick to the high frequency clock throughout.
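
The sum in that thought experiment, written as a tiny Python function. It models only the reasoning in the paragraph above (a counter that keeps its value when the clock halves), not verified hardware behaviour, and the function name is hypothetical.

```python
def line_length_after_switch(switch_at, low_total=64):
    """Raster line length, in high frequency characters, if the clock is
    halved when the horizontal counter reads `switch_at`.

    low_total is the number of characters per line in the low frequency
    MODEs (Horizontal Total R0 = 63 means 64 characters).
    """
    remaining_low = low_total - switch_at      # characters still to count
    return switch_at + 2 * remaining_low       # each one now lasts twice as long


if __name__ == "__main__":
    print(line_length_after_switch(30))
```

Only a switch at counter value 0 would give the full 128 under this model, which is one way to see why the order and timing of the register writes matters so much.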

Drawing 320 pixels in ~18 raster lines?
How hard can it be? Switching to MODE 1 turned out to be surprisingly challenging to squeeze the pixel draw into the time limits of the FX update function. I'd be delighted if someone points out a better way to do this!

Rounding up to a generous 2400 cycles / 80 bytes in the scanline = 30 cycles / byte, should be easy? Except the pixel colour can be inverted at any X value. 2400 cycles / 320 pixels = 7.5 cycles / pixel, suddenly doesn't seem that generous.

In the end I unrolled the following loop, where X contains the number of pixels left to draw before the colour next inverts and A contains the current byte to write to the screen, with Y as a temporary register store and some lookup tables for masking and subtraction.

Code: Select all

\\ How many pixels to start with?
SEC
LDA checkzoom_N
SBC checkzoom_XmodN
TAX

\\ Always start with black
LDA #0

\\ Unroll the loop
FOR c,0,79,1
{
	CPX #4                      ; 2c
	BCS write_byte              ; 3c
	\\ Flip our bits
	EOR #&FF                    ; 2c
	TAY                         ; 2c
	\\ Write partial byte
	EOR checker_left_mask, X    ; 4c
	STA &3000 + c * 8    	    ; 4c
	; carry clear
	LDA checkzoom_N             ; 3c
	SBC checker_lazy_table, X   ; 4c
	TAX                         ; 2c
	.partial_byte
	TYA                         ; 2c
	BRA done                    ; 3c
	.write_byte
	STA &3000 + c * 8    	    ; 4c
	.next_column
	DEX:DEX:DEX:DEX             ; 8c
	.done
}
NEXT
\\ Long path = 30c Short path = 17c -> worst case = 80x30 = 2400c = ~18.75 raster lines (at 128c per line)

.checker_left_mask
EQUB %00000000
EQUB %10001000
EQUB %11001100
EQUB %11101110

.checker_lazy_table
EQUB 3,2,1,0
Given that the vast majority of the FX draw function is spent in NOPs (94 cycles / raster line) I guess I could have used the double buffer technique from the Vertical Blinds effect and moved the work here. The challenge then becomes how to interleave work for the next frame whilst still inverting colour parity at the right time for the current size of check (N).
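
For anyone wanting to experiment with alternatives, here is a Python model of the unrolled loop above that can be checked against a brute force reference. It is a sketch, assuming 4 <= N <= 255 and that only MODE 1 colours 0 and 3 are used (bytes &00 and &FF). The cycle counts are simply the 30c and 17c figures quoted in the listing.

```python
LEFT_MASK = [0b00000000, 0b10001000, 0b11001100, 0b11101110]  # checker_left_mask
LAZY = [3, 2, 1, 0]                                           # checker_lazy_table
LONG_PATH, SHORT_PATH = 30, 17                                # cycles per column


def draw_scanline(n, x_mod_n, columns=80):
    """Model the unrolled loop. Returns (bytes written, cycles spent)."""
    x = n - x_mod_n          # X register: pixels left in the current run
    a = 0x00                 # A register: always start with black
    out, cycles = [], 0
    for _ in range(columns):
        if x >= 4:                        # CPX #4 : BCS write_byte
            out.append(a)
            x -= 4                        # DEX:DEX:DEX:DEX
            cycles += SHORT_PATH
        else:
            a ^= 0xFF                     # flip colour (saved in Y)
            out.append(a ^ LEFT_MASK[x])  # first x pixels keep the old colour
            x = n - LAZY[x] - 1           # SBC with carry clear takes 1 extra
            cycles += LONG_PATH
    return out, cycles


def decode(byte_values):
    """Unpack MODE 1 bytes into one 0/1 value per pixel, leftmost first."""
    return [(b >> (7 - i)) & 1 for b in byte_values for i in range(4)]


def reference_pixels(n, x_mod_n, width=320):
    """Brute force version of the same scanline."""
    return [((p + x_mod_n) // n) & 1 for p in range(width)]


if __name__ == "__main__":
    data, cycles = draw_scanline(n=5, x_mod_n=0)
    print(decode(data)[:24], cycles)
```

The model agrees with the reference for every N from 4 upwards that I would expect the effect to use, and the worst case (a colour flip in every byte) lands on the 2400 cycle budget.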
Last edited by kieranhj on Wed Jul 04, 2018 12:22 pm, edited 1 time in total.

Re: Twisted Brain Demo

Post by kieranhj » Fri Jun 29, 2018 1:32 pm

Part #12: Bitshifters "MODE 7" logo

You've probably guessed by now that the wibbling Bitshifters logo at the start of the demo isn't MODE 7 at all but MODE 1. :) I took a screen grab of Steve Horsley's original Acornsoft x Bitshifters Teletext logo that he made for Prince of Persia and passed it through Image2BBC in MODE 1.

bitshifters logo mode 1.png
MODE 7 Bitshifters screen converted to MODE 1

Fortunately, because the logo is 5x Teletext characters high, there are 15x MODE 7 "sixel" rows that make up the image. Each "sixel" will be 2 or 3 scanlines high, depending on whether it is in the middle of the character or not, but we know we can use vertical rupture to display duplicate scanlines "for free" so we only need to store each sixel row as one scanline in our screen buffer.

As MODE 1 has 4x pixels per byte, for horizontal movement we'll need to preprocess pixel offsets, as we can't afford to do this at runtime. Our standard MODE 1 screen has 32x character rows so we can comfortably store 2x sets of 16x scanlines making up the logo image:

bitshifters logo preprocessed.png
Preprocessed Bitshifters logo including 2x pixel shift

This means the scanlines can only be moved horizontally in 2x pixel increments. It would be nice to use SHADOW RAM for the other 2x sets to give all 4x pixel offsets for smooth single pixel horizontal movement but I'll save that for another time.

The FX draw function sets up a 1x scanline per row x 1 character row per display cycle x 256 display cycles CRTC arrangement again. The 16x preprocessed scanlines are displayed as 64x raster lines in the draw function from a table, including blank scanlines in the right places to achieve the Teletext separated graphics look.

The ULA palette is changed every 64x raster lines so that the logo appears to be the classic red, green, yellow & blue combination.

The animation is generated from, you guessed it, a couple of sine wave tables used to calculate a character offset for the Screen Start Address of every row.

Ultimately the effect is quite simple but has taken us a while to get to the concept of single scanline CRTC cycles and prerendered screen buffers to be able to explain it!

I had many ideas for things I wanted to do on this screen, including giving each of the 4x logos a different animation, but just ran out of time. Here are a couple of shots from a prototype that rotates the logo towards the viewer with a sort of orthogonal camera:

logo rotate 1.png
Unused prototype of rotating logo 1
logo rotate 2.png
Unused prototype of rotating logo 2
Last edited by kieranhj on Wed Jul 04, 2018 1:11 pm, edited 2 times in total.

Re: Twisted Brain Demo

Post by kieranhj » Fri Jun 29, 2018 1:32 pm

Part #13: Twister

I've saved the Twister to (near) the end as it's probably the most technically complex effect, using 40K of prerendered single scanline screen buffers. But, given everything we've learnt about the CRTC by now, it should be relatively easy to explain. The Twister is an iconic demoscene effect, seen on just about every platform. Like the Kefrens bars, I've been on a quest to achieve a single scanline Twister at "high" (MODE 1) resolution for a long time now.

The effect itself is quite simple, as you can see from this BASIC program:

10 MODE 1
20 FOR A%=0 TO 255
40 angle=360 * A% / 256
50 x1=40+38*SIN(RAD(angle))
60 x2=40+38*SIN(RAD(angle + 90))
70 x3=40+38*SIN(RAD(angle + 180))
80 x4=40+38*SIN(RAD(angle + 270))
90 IF x1 < x2 THEN PROCline(A%,120+x1,120+x2,0,1)
100 IF x2 < x3 THEN PROCline(A%,120+x2,120+x3,0,2)
110 IF x3 < x4 THEN PROCline(A%,120+x3,120+x4,0,3)
120 IF x4 < x1 THEN PROCline(A%,120+x4,120+x1,32,1)
140 NEXT
150 END
160
170 DEF PROCline(y,xstart,xend,plot,colour)
180 GCOL plot, colour
190 MOVE xstart * 4, 1023 - y * 4
200 DRAW xend * 4, 1023 - y * 4
210 ENDPROC
The challenge is how to do this in real time, of course. The answer, as ever, is to precalculate our screen buffer and use vertical rupture to display the scanline corresponding to the desired rotation of the Twister at that point.
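
For anyone who would rather read the maths than the BASIC, here is a loose Python rendering of what the listing computes for a single row. The function name and return format are my own. The four corners of a square are projected onto the x axis, and a face is only drawn when its left corner comes before its right corner, i.e. when it is facing the viewer (normally two faces per row).

```python
import math


def twister_faces(angle_deg, centre=40, radius=38):
    """Return (x_start, x_end, face_index) for each face visible at one rotation.

    Mirrors the BASIC listing: corner k sits at angle + 90*k degrees and
    face k runs from corner k to corner k+1 (wrapping round to corner 0).
    """
    corners = [centre + radius * math.sin(math.radians(angle_deg + 90 * k))
               for k in range(4)]
    faces = []
    for k in range(4):
        left, right = corners[k], corners[(k + 1) % 4]
        if left < right:  # same test as the IF x1 < x2 lines
            faces.append((left, right, k))
    return faces
```

Prerendering then amounts to evaluating this once for each rotation value and drawing the resulting spans into one scanline of the screen buffer.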

Since we can only have 32x character rows in a standard MODE 1 screen, having just 32x rotation values wouldn't look that great. Instead we draw 128x rotation values and store them 4x to a scanline:

twister prerendered.png
Prerendered screen buffer with 128x rotations

Instead of having 32x rows each of 80x CRTC characters, we can think of this screen buffer as 128x rows each of 20x characters (128 x 20 = 2560 characters, as before.)
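
The arithmetic behind that view of the buffer can be sketched in Python as follows. It assumes the rotations are stored in order, left to right and then top to bottom, and that the buffer sits at the standard MODE 1 screen address of &3000. Neither detail is confirmed above.

```python
SCREEN_BASE = 0x3000          # assumed: standard MODE 1 screen address
CHARS_PER_STORED_ROW = 80     # full-width MODE 1 character row
CHARS_PER_ROTATION = 20       # one prerendered twister scanline
BYTES_PER_CHAR = 8            # one byte per scanline of a character cell


def rotation_location(rotation):
    """Return (character row, starting character column) of a rotation in the buffer."""
    per_row = CHARS_PER_STORED_ROW // CHARS_PER_ROTATION   # 4x rotations per row
    row, slot = divmod(rotation, per_row)
    return row, slot * CHARS_PER_ROTATION


def rotation_char_address(rotation):
    """CRTC character address (byte address / 8) where a rotation starts."""
    row, column = rotation_location(rotation)
    return SCREEN_BASE // BYTES_PER_CHAR + row * CHARS_PER_STORED_ROW + column
```

Because CRTC addressing is linear, this collapses to base + 20 x rotation, which is why the buffer can be treated as 128x rows of 20x characters.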

By setting the Characters per Line register (R1) to 20, the CRTC will only display 20 characters on a horizontal row, regardless of the Horizontal Total. So our raster lines will still be 128 characters wide, as we require for 64us horizontal timing, but we'll only see 20 of them. This is most commonly used to save RAM in games to create a square screen made of either 64 (MODE 0,1,2) or 32 (MODE 4,5) horizontal characters.
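
The numbers here are easy to check with a few lines of Python. The helper names are just for illustration. The CRTC character clock is 2MHz in MODE 0-3 and 1MHz in MODE 4-5, and each displayed character costs 8 bytes of screen RAM per character row.

```python
def line_time_us(total_chars, clock_mhz):
    """Duration of one raster line in microseconds."""
    return total_chars / clock_mhz


def screen_bytes(displayed_chars, char_rows=32, bytes_per_char=8):
    """Screen RAM needed for a bitmap screen of the given width in characters."""
    return displayed_chars * char_rows * bytes_per_char


# MODE 1: 128 characters at 2MHz, whatever R1 is set to.
mode1_line = line_time_us(128, 2)

# RAM saved by narrowing the displayed width to a square screen.
mode1_saving = screen_bytes(80) - screen_bytes(64)   # MODE 0,1,2
mode4_saving = screen_bytes(40) - screen_bytes(32)   # MODE 4,5
```

So the line time stays at 64us, a 64 character wide MODE 1 screen takes 16K instead of 20K, and a 32 character wide MODE 4 screen takes 8K instead of 10K.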

We can then use the Horizontal Sync Position register (R2) to m