So far it only does a bit of 8-bit arithmetic, the 16-bit stuff probably needs throwing away and rewriting, and you have to build it with the bootstrap compiler and run it on a PC. But it will turn (some) programs into code, and they run...
(Yes, this is why I've been asking questions about doing arithmetic on the 6502.)
It's woefully, woefully unfinished, but it'll turn this:
    sub print_char(char: uint8)
        @bytes 0xAD;        # LDA abs
        @words &char;
        @bytes 0x4C;        # JMP abs
        @words 0xFFE3;      # OSASCII
    end sub;

    sub print(ptr: [int8])
        var index: uint8 := 0;
        loop
            var c: int8 := ptr[index];
            if c == 0 then
                return;
            end if;
            print_char(c);
            index := index + 1;
        end loop;
    end sub;

    print("Hello, world!\r");
into this:
    L0E00:
        lda $0E47
        jmp LFFE3
        rts
    L0E07:
        lda #$00
        sta $0E48
    L0E0C:
        ldy $0E48      -- values can't be retained in registers across labels
        lda ($00),y    -- the pointer index is a uint8, so we get to use an indexed op here
        sta $0E49      -- write barrier before the conditional, could be improved
        cmp #$00       -- it can't remember flag state between instructions, so we check again
        bne L0E19
        rts
    L0E19:
        lda $0E49
        sta $0E47
        jsr L0E00
        ldx $0E48      -- haven't taught it how to increment values in memory yet
        inx
        stx $0E48
        jmp L0E0C
        lda #$38
        sta $00        -- it noticed that ptr is used as a pointer and put it in zero page
        lda #$0E
        sta $01
        jsr L0E07
        rts
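For anyone wondering what the inline `@bytes` and `@words` lines in print_char boil down to, here is a rough Python model of them. It is only a sketch: `emit_bytes` and `emit_words` are made-up helper names, not anything in the compiler, and the address of `char` ($0E47) is simply read off the generated listing. The assumption is that `@words` writes each value as 16 bits, low byte first.

```python
# Hypothetical model of the @bytes / @words directives; not the compiler's own code.

def emit_bytes(out: bytearray, *values: int) -> None:
    """Append each value as a single byte."""
    for v in values:
        out.append(v & 0xFF)

def emit_words(out: bytearray, *values: int) -> None:
    """Append each value as a 16-bit little-endian word (low byte first)."""
    for v in values:
        out += (v & 0xFFFF).to_bytes(2, "little")

CHAR_ADDR = 0x0E47   # address of `char`, read off the generated listing
OSASCII = 0xFFE3     # OS entry point named in the source comment

code = bytearray()
emit_bytes(code, 0xAD)         # LDA abs
emit_words(code, CHAR_ADDR)
emit_bytes(code, 0x4C)         # JMP abs
emit_words(code, OSASCII)

print(code.hex(" "))           # ad 47 0e 4c e3 ff
```

Those six bytes are the `lda $0E47` / `jmp LFFE3` pair at L0E00; the trailing `rts` in the listing comes from the end of the subroutine rather than from the inline bytes.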
The compiler itself is a seven-stage behemoth. The stages are:
- tokeniser --- takes text and produces a token stream and a string table
- parser --- takes the token stream and produces a stack-based front-end bytecode stream
- typechecker --- enforces the Cowgol type rules on the FE bytecode and emits a memory-based backend bytecode stream
- classifier --- scans the call graph and assigns variables to locations in memory
- codegen --- turns the BE bytecode stream into actual 6502 opcodes
- placer --- calculates the size of subroutines, resolves labels, and places them in memory
- emitter --- throws the result into an output file
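To give a feel for what the placer stage has to do, here is a minimal Python sketch that lays blocks out one after another from a base address. This is not the real placer: the `place` function, the block names (`main` in particular) and the data layout are all hypothetical, and the sizes were counted by hand from the Hello World listing. It also assumes the string is stored with a terminating zero, which is what the loop in `print` tests for.

```python
# Sketch of the "placer" idea only; names and data layout are hypothetical.

BASE = 0x0E00  # load address, taken from the first label in the listing

# (name, size in bytes), sizes counted by hand from the generated code
blocks = [
    ("print_char", 7),   # lda abs (3) + jmp abs (3) + rts (1)
    ("print", 37),       # L0E07 up to and including jmp L0E0C
    ("main", 12),        # the top-level code that sets up ptr and calls print
    ("string", 15),      # "Hello, world!\r" plus an assumed terminating zero
]

def place(blocks, base):
    """Assign each block the next free address after the previous one."""
    addresses = {}
    cursor = base
    for name, size in blocks:
        addresses[name] = cursor
        cursor += size
    return addresses, cursor

addresses, end = place(blocks, BASE)
for name, addr in addresses.items():
    print(f"{name:<10} ${addr:04X}")
print(f"first free ${end:04X}")
```

The results line up with the listing: `print` lands at $0E07, the string at $0E38 (the value loaded into $00/$01), and the first free byte is $0E47, which is where the `char` variable lives.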
Eventually I want to get it self-hosting, but I don't know yet how big the binaries are going to be. Building the parser with itself, it looks like it needs 20kB of in-memory storage, plus the binary. That's quite a lot on a machine the size of a BBC Micro. The bytecode stream, of course, doesn't use up memory.
It should be portable to other architectures. Right now it assumes a little-endian target with no alignment requirements and a 16-bit address space. If I ever make this work for the 6502, I'll try the Z80 next.
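As a rough illustration of what those three assumptions mean in practice, here is a small Python model of such a memory. It is a sketch under my own assumptions, not compiler code: `write_word` and `read_word` are hypothetical helpers, and wrapping addresses at 64kB is just one way to model a 16-bit address space.

```python
MEM_SIZE = 1 << 16           # 16-bit address space: 65536 bytes
mem = bytearray(MEM_SIZE)

def write_word(addr: int, value: int) -> None:
    """Store a 16-bit value low byte first; any address is allowed."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF   # wrap-around is an assumption

def read_word(addr: int) -> int:
    """Load a 16-bit little-endian value from any address."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)

write_word(0x0000, 0x0E38)   # ptr in zero page, as in the Hello World listing
write_word(0x0E49, 0x1234)   # odd address: no alignment requirement

print(f"${mem[0]:02X} ${mem[1]:02X}")   # $38 $0E
print(f"${read_word(0x0E49):04X}")      # $1234
```

The first write reproduces the `lda #$38 / sta $00 / lda #$0E / sta $01` sequence from the listing: low byte at the lower address, high byte after it.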