Cowgol: actually a thing

hjalfi
Posts: 119
Joined: Sat May 13, 2017 10:17 pm
Location: Zürich, Switzerland

Post by hjalfi » Wed Oct 11, 2017 11:07 pm

(This is a new topic because the previous one was misleadingly titled and had degenerated into me complaining about how terrible the 6502 was. Plus, this is a major milestone. Is that okay?)

I am very pleased, gleeful even, to announce the first proper release of Cowgol, my almost self-hosted, fully compiled, properly strongly typed, Ada-inspired programming language for the 6502 and Z80! Which is written in Cowgol itself!

http://cowlark.com/cowgol

The documentation is either Markdown files in the GitHub repo, or here: http://cowlark.com/cowgol

...and here is an ADF to prove it, which you can run on a BBC with 6502 Tube:

That floppy, which is an ADFS-M image (Cowgol is too awesome for a DFS disk; more accurately, it's too awesome for the 200kB maximum disk size), contains the full, massive, eight-stage compiler and will let you compile small programs into proper executables.

Now for the bad news. Apart from being ridiculously buggy, undocumented, and unfinished in many ways, it's also really, really slow. Like, compiling "Hello world!" takes seven minutes on b-em. I don't know how fast b-em's emulated floppy disk is, so if anyone wants to give this a try on real hardware and time it, and maybe even video it, that would be hilarious^H^H^H^H^H^H^H^H^Hhelpful.

The code generated isn't that bad, even if it is running on a machine with, um, slightly more memory than my wristwatch (curse you, Pebble).

I'll do a proper writeup and maybe even some documentation tomorrow.

(I'll update this if more releases ever happen.)

2018-02-22 update: version 0.2 has been released. Lots of bugs fixed.

2018-02-26 update: version 0.3 has been released. A few bug fixes, but it's now literally twice as fast (mostly through the magic of disk buffering).

2018-03-12 update: version 0.4 has been released. Massively better 6502 code generation, so now the compiler is 88% of the size it used to be, and substantially faster.

2018-04-05 update: version 0.5 has been released. Some scary 6502 code generation bugfixes, plus a whole new Z80 code generator supporting CP/M.

2018-04-09 update: version 0.6.1 has been released. Much better Z80 code generation; Fuzix native compilation support; and it turned out that 6502 codegen had been horribly broken for both 0.5 and 0.6 but nobody noticed...
Attachments
fuzixdist.tar.gz (67.71 KiB)
cpmzdist.zip (83.29 KiB)
bbcdist.zip (90.83 KiB)
Last edited by hjalfi on Tue Apr 10, 2018 7:08 pm, edited 9 times in total.
David Given
http://cowlark.com

Post by hjalfi » Wed Oct 11, 2017 11:19 pm

...and I just found the first bug! Looks like there's something wrong with the things table --- there are duplicate entries when the offset is bigger than 0x8000. Sigh.

Incidentally, if you ever want to know what a BBC Micro sounds like when it's swapping, Cowgol is for you.

BigEd
Posts: 1967
Joined: Sun Jan 24, 2010 10:24 am
Location: West

Post by BigEd » Thu Oct 12, 2017 7:06 am

Congratulations! I'm following this with great interest.

Post by hjalfi » Thu Oct 12, 2017 10:42 pm

Post by BigEd » Fri Oct 13, 2017 4:12 am

Have you a way to see how the performance would be using a machine with any kind of solid state storage? That's the modern norm - no head movement, no rotational delay.

Post by hjalfi » Fri Oct 13, 2017 1:40 pm

I don't actually have any real hardware, so all this is running on an emulator. I mostly use b-em. I've noticed that b-em's floppy disk noises aren't accurate, and it just classifies seeks into single step / short / medium / long, with timings to match, so I believe it's unfairly slow. Without the floppy disk noises it's a lot faster but that might be unfairly fast.

I should try creating an emulated hard drive image and seeing how that behaves. Anyone have real hardware who wants to give it a try?

I think, however, that the main problem is a combination of Cowgol being slow and the MOS file system APIs being slow. If every byte access needs an RPC across the Tube and then multiple bank switches on the I/O processor end before it hits the buffer, it's going to be painful. One thing I want to try is my own disk buffering inside the Cowgol binaries, but I'll need to free up some RAM for that first.
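Here is a sketch of the application-level buffering idea. It is written in Python rather than Cowgol, and it is not Cowgol's actual code. `BufferedReader` and the `read_block` callback are hypothetical stand-ins: `read_block` represents one expensive OS call (for example an OSGBPB across the Tube) that returns up to 256 bytes. Byte-at-a-time reads are served from a local buffer, so the expensive call happens once per block rather than once per byte.

```python
class BufferedReader:
    """Serve single bytes from a local buffer, refilling it in whole blocks.

    read_block(n) is a hypothetical callback standing in for one expensive
    OS call; it returns up to n bytes, and an empty result means end of file.
    """

    BLOCK_SIZE = 256

    def __init__(self, read_block):
        self.read_block = read_block
        self.buffer = b""
        self.pos = 0
        self.calls = 0  # how many times the expensive call was made

    def read_byte(self):
        """Return the next byte as an int, or None at end of file."""
        if self.pos >= len(self.buffer):
            self.buffer = self.read_block(self.BLOCK_SIZE)
            self.calls += 1
            self.pos = 0
            if not self.buffer:
                return None
        value = self.buffer[self.pos]
        self.pos += 1
        return value


def make_source(data):
    """Return a read_block callback that serves `data` in chunks."""
    offset = 0

    def read_block(n):
        nonlocal offset
        chunk = data[offset:offset + n]
        offset += len(chunk)
        return chunk

    return read_block


# Reading 1000 bytes one at a time costs 5 underlying calls
# (four refills with data, plus one that hits end of file), not 1000.
reader = BufferedReader(make_source(bytes(1000)))
count = 0
while reader.read_byte() is not None:
    count += 1
print(count, reader.calls)  # 1000 5
```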

I'm also going to try and get MAME set up, which I believe has more accurate floppy disk emulation, and see what the performance is like there.

Post by BigEd » Fri Oct 13, 2017 5:36 pm

I believe Alan Cox felt the MOS performance was too slow to support a comfortable Fuzix experience. You are, by the sound of it, pushing the file system hard too. As you say, some buffering (or other intelligence) at the application level could make a big difference.

Many - perhaps most - modern users of co-processors have modern hardware which can run a lot faster than the historical 3MHz or 4MHz. That might help some aspects of Cowgol, but not those aspects which are I/O limited.

I could in principle help with benchmarking, as I have a setup with solid state storage and a fast modern copro. All I'd need is to get my act together.

dp11
Posts: 813
Joined: Sun Aug 12, 2012 8:47 pm

Post by dp11 » Fri Oct 13, 2017 5:39 pm

If you went down the PiTubeDirect route then the 6502 can address much more than 64K RAM by page swapping.

Post by BigEd » Fri Oct 13, 2017 5:43 pm

A very good point - applies also to the Matchbox, for those running an up-to-date release of the design. Same I/O interface for the banking.

Post by hjalfi » Sun Oct 15, 2017 1:13 pm

I got it running in MAME, with hopefully more accurate timing and floppy disk noises:

https://www.youtube.com/watch?v=1wLATW7sVXs

The bad news is that the 'Hello, world!' compilation now takes 1010 seconds, or nearly seventeen minutes.

It sounds like nearly all the time is spent waiting for disk seeks and drive spinups. I'll need to have a go using an ADL file instead of an ADF one; because they're double-sided, you get twice as much data per track, which means less seeking and faster transfers. But I don't expect it to make much difference.
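This sketch shows roughly why the double-sided format should mean less head movement. It assumes the standard Acorn geometries:

- ADFS M has 80 tracks of 16 × 256-byte sectors on one side.
- ADFS L uses both sides, interleaved track by track, so sequential data fills a whole cylinder before the head has to step.

The Python below only does that arithmetic. It also assumes the data is laid out contiguously from logical sector 0, which a real, fragmented disc need not be. It says nothing about the long directory-to-data seeks. The DFS line reproduces the 200kB limit mentioned earlier.

```python
SECTOR_BYTES = 256

# (tracks per side, sides, sectors per track) for each disc format,
# assuming the standard Acorn geometries.
FORMATS = {
    "DFS 80-track": (80, 1, 10),
    "ADFS M": (80, 1, 16),
    "ADFS L": (80, 2, 16),
}


def capacity(fmt):
    """Total formatted capacity in bytes."""
    tracks, sides, sectors = FORMATS[fmt]
    return tracks * sides * sectors * SECTOR_BYTES


def cylinder(fmt, logical_sector):
    """Head position for a logical sector, assuming both sides of a
    cylinder are used before stepping (interleaved layout)."""
    _, sides, sectors = FORMATS[fmt]
    return logical_sector // (sectors * sides)


def head_steps(fmt, nbytes):
    """Cylinders stepped across while reading nbytes sequentially from sector 0."""
    last_sector = (nbytes - 1) // SECTOR_BYTES
    return cylinder(fmt, last_sector)


for fmt in FORMATS:
    print(fmt, capacity(fmt), head_steps(fmt, 64 * 1024))
# Reading 64 KiB: ADFS M steps 15 times, ADFS L only 7.
```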

This is all yak shaving, anyway; the only thing which would really help is to simply do less work.

Post by BigEd » Sun Oct 15, 2017 1:50 pm

Presumably it's not helping that you're also compiling the support library from source. Is the setup such that it could be compiled once, and linked in at a late stage?

Post by hjalfi » Sun Oct 15, 2017 5:57 pm

Not really --- the internal representation doesn't use symbols, so there's no way for two object files to tell each other when they're referring to the same object. Also, the classifier phase needs access to the entire program in order to build the graph to do variable placement (which is the magic bit which makes the whole thing feasible).

(It's tokeniser -> parser -> typechecker -> blockifier -> classifier -> codegen -> placer -> emitter.)
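Whole-program variable placement can be illustrated with a general technique used for languages without recursion. If the call graph is acyclic, a subroutine's locals can start just above the end of every caller's storage. Subroutines that can never be active at the same time then share the same bytes.

The Python below is a minimal sketch of that idea under those assumptions. It is not Cowgol's actual classifier. The `calls` and `sizes` inputs and the subroutine names are hypothetical.

```python
from functools import lru_cache


def place_variables(calls, sizes):
    """Assign each subroutine a start offset for its local variables.

    calls: dict mapping subroutine -> list of subroutines it calls
    sizes: dict mapping subroutine -> bytes of local storage it needs
    Assumes the call graph is acyclic (no recursion).
    """
    callers = {name: [] for name in sizes}
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].append(caller)

    @lru_cache(maxsize=None)
    def offset(name):
        # A subroutine may be active while any caller is, so its storage
        # must start above the end of every caller's storage.
        return max((offset(c) + sizes[c] for c in callers[name]), default=0)

    return {name: offset(name) for name in sizes}


# tokenise and parse are never active together, so they overlap at offset 2.
calls = {"main": ["tokenise", "parse"], "tokenise": ["read_byte"], "parse": ["read_byte"]}
sizes = {"main": 2, "tokenise": 8, "parse": 6, "read_byte": 1}
layout = place_variables(calls, sizes)
print(layout)  # {'main': 0, 'tokenise': 2, 'parse': 2, 'read_byte': 10}
```

In this example the program needs 11 bytes in total, instead of the 17 it would need if every subroutine had its own storage.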

That said, it would be pretty easy to allow incremental tokenisation and probably parsing, which are two of the really slow bits. That would allow the standard libraries to be parsed once and then reused for every user program. You'd still need to typecheck the entire program, though. Once the classifier runs, unused subroutines will be ignored so speed then is based on how much the program actually does.
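Ignoring unused subroutines amounts, in general terms, to a reachability walk over the call graph from the program's entry point. The Python below is a plain depth-first traversal showing that idea. It is not necessarily how Cowgol does it, and the `calls` dict and the `"main"` entry-point name are hypothetical.

```python
def reachable(calls, root="main"):
    """Return the set of subroutines reachable from root via the call graph."""
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(calls.get(name, []))
    return seen


# A program that only prints: the unused library routines drop out.
calls = {
    "main": ["print"],
    "print": ["print_char"],
    "print_hex": ["print_char"],
    "file_open": [],
}
used = reachable(calls)
unused = set(calls) - used
print(sorted(used), sorted(unused))
```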

...pause while I perform measurements...

Yeah, tokenising and parsing consume nearly half the time. Right now, it's:

Tokeniser: 383 seconds
Parser: 160 seconds
Typecheck: 159 seconds
Blockifier: 187 seconds
Classifier: 61 seconds
Codegen: 108 seconds
Placer: 59 seconds
Emitter: 100 seconds
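The "nearly half" figure can be checked by adding up the phase timings listed above. The phases sum to 1217 seconds, of which tokenising and parsing account for 543.

```python
# Phase timings in seconds, as listed above.
timings = {
    "tokeniser": 383,
    "parser": 160,
    "typecheck": 159,
    "blockifier": 187,
    "classifier": 61,
    "codegen": 108,
    "placer": 59,
    "emitter": 100,
}

total = sum(timings.values())
front_end = timings["tokeniser"] + timings["parser"]
print(total, front_end, f"{front_end / total:.1%}")  # 1217 543 44.6%
```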

Post by BigEd » Sun Oct 15, 2017 6:09 pm

Interesting to know - thanks for the measurements. I'm going to suppose you're in the first part of "make it work first, then make it work fast!" Having something which works is no mean feat - well done!

Post by hjalfi » Sun Nov 05, 2017 7:25 pm

Belated, due to other projects and Windows barfing on my Linux partition... thankfully all my actual *data* was elsewhere, but it did take a while to figure out how to build MAME again.

I hacked in support for precompiling the standard library, allowing you to skip about 8 minutes of boilerplate for every compilation. (All it does is tokenisation and parsing.)
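Here is a minimal Python sketch of the general idea behind precompiling the library. The expensive front end runs once per library source, and its output is cached under a hash of the source text. Later compilations reuse that output for as long as the source is unchanged.

All names here are hypothetical, and `tokenise_and_parse` is a trivial stand-in for the real tokeniser and parser. The cache lives in memory, whereas a real version would write it to disk. This is not how Cowgol stores its precompiled output.

```python
import hashlib


class FrontEndCache:
    """Cache front-end output for sources that rarely change."""

    def __init__(self, front_end):
        self.front_end = front_end
        self.entries = {}
        self.misses = 0  # how many times the expensive front end actually ran

    def get(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self.entries:
            self.misses += 1
            self.entries[key] = self.front_end(source)
        return self.entries[key]


def tokenise_and_parse(source):
    # Hypothetical stand-in for the slow tokeniser and parser stages.
    return source.split()


def compile_program(cache, stdlib_source, program_source):
    # Only the library goes through the cache; the user program is always
    # tokenised and parsed, and the later stages still see everything.
    return cache.get(stdlib_source) + tokenise_and_parse(program_source)


cache = FrontEndCache(tokenise_and_parse)
stdlib = "sub print_char sub print"
first = compile_program(cache, stdlib, "print hello")
second = compile_program(cache, stdlib, "print goodbye")
print(cache.misses)  # 1: the library was only processed once
```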

Time to precompile the standard library: 590 seconds.
Time to compile 'Hello, world!': 585 seconds.

Doing both in a single combined compilation, as before, took 1010 seconds... this is on MAME with realistic floppy disk timings.

It looks like a lot of the time is spent seeking from track 0 where the directories are to about track 70 where the data is; this takes a good second every single time. I experimented with using a second disk so that I can read from one and write to the other. It helps no end, but the configuration's a bit fragile. Also, the time spent doing every chunk of work is just enough time for the floppy drive to spin down, so there's more time wasted waiting for it to spin up again. Isn't this configurable somewhere? It'd be interesting to bump it up to a few seconds and see what happens.

Realistically, of course, this needs a hard drive.

Pernod
Posts: 1226
Joined: Fri Jun 08, 2012 10:01 pm
Location: Croydon, UK

Post by Pernod » Sun Nov 05, 2017 8:13 pm

hjalfi wrote:Also, the time spent doing every chunk of work is just enough time for the floppy drive to spin down, so there's more time waste waiting for it to spin up again. Isn't this configurable somewhere?
Are you referring to the disc timings set by the keyboard dipswitch? If yes then these can be set from the Dip Switches menu.

Also be aware that the 1770 implementation in MAME is much more accurate than the 8271.
You can select either at startup:
mame bbcb -fdc acorn8271
mame bbcb -fdc acorn1770
- Nigel

BBC Model B, ATPL Sidewise, Acorn Speech, 2xWatford Floppy Drives, AMX Mouse, Viglen case, BeebZIF, etc.

Post by hjalfi » Thu Feb 22, 2018 10:37 pm

Cowgol's not dead! I've just released version 0.2.

http://cowlark.com/cowgol/

Lots of bug fixes --- I can't believe the previous version actually worked --- and a few new language features (array initialisers). And now there's basic Commodore 64 support. It also supports standard library precompilation, which makes actually using it much less painful (although no sane person would actually use this for real work).

It's even faster! The standard benchmark (compiling a one-line 'hello world') now takes 16m7s, where the previous version ran it in 16m50s.

Post by hjalfi » Mon Feb 26, 2018 10:13 pm

0.3!

Now the standard benchmark takes 7m50s, which means it's now twice as fast.
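This is a quick check of the speed-up claims, using the benchmark times quoted in this thread. The earlier version took 16m50s, 0.2 took 16m7s, and 0.3 takes 7m50s.

```python
def seconds(minutes, secs):
    return minutes * 60 + secs


previous = seconds(16, 50)  # version before 0.2: 1010 s
v0_2 = seconds(16, 7)       # 967 s
v0_3 = seconds(7, 50)       # 470 s

print(f"0.2: {previous / v0_2:.2f}x faster, 0.3: {v0_2 / v0_3:.2f}x faster")
# 0.2: 1.04x faster, 0.3: 2.06x faster
```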

Reading and writing small objects via MOS system calls turns out to be really, really slow, especially over the second processor.

Post by BigEd » Tue Feb 27, 2018 6:50 am

A major step forward!

SteveF
Posts: 510
Joined: Fri Aug 28, 2015 8:34 pm

Post by SteveF » Tue Feb 27, 2018 8:35 pm

hjalfi wrote:Now the standard benchmark takes 7m50s, which means it's now twice as fast.
Nice one, congratulations!
hjalfi wrote:Reading and writing small objects via MOS system calls turns out to be really, really slow, especially over the second processor.
Do you have any idea how small "small" is here, as a guideline for getting good I/O performance? Is reading/writing 256 bytes at a time with OSGBPB "small"? (I'm not asking for anything precise here, unless you happen to have the data handy, just a rough idea!)

Cheers.

Steve

Post by hjalfi » Tue Feb 27, 2018 10:20 pm

I was reading and writing blocks of five to ten bytes, depending on exactly what was happening. The tokeniser, of course, was reading text a byte at a time.

My new buffered version reads and writes blocks of 256 bytes (aligned with sector boundaries; I'm hoping that ADFS is smart enough to optimise for this). The performance seems fine there. I haven't tried using bigger buffers because, frankly, 16-bit arithmetic is hard...
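A little arithmetic shows why 256-byte, sector-aligned blocks are convenient on an 8-bit machine. The offset within the buffer is just the low byte of the file position, and the block number is the remaining high bytes. No multiply or divide is needed, and the in-buffer index never exceeds 8 bits.

The Python below is a sketch of that arithmetic only. The function names are hypothetical.

```python
BLOCK = 256  # one sector; an offset within it fits in a single byte


def split_position(pos):
    """Split a file position into (block number, offset within block)."""
    return pos >> 8, pos & 0xFF


def join_position(block, offset):
    """Inverse of split_position."""
    return block * BLOCK + offset


def blocks_touched(start, length):
    """Block numbers covered by `length` bytes starting at `start`."""
    first, _ = split_position(start)
    last, _ = split_position(start + length - 1)
    return list(range(first, last + 1))


print(split_position(0x1234))   # (18, 52), i.e. block 0x12, offset 0x34
print(blocks_touched(250, 10))  # [0, 1]: a short read straddling a boundary
```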

As my compiler is way too big to fit in I/O processor memory, this is all via the Tube. I know that a Tube RPC can be pretty expensive, so that may be the limiting factor here.

Post by SteveF » Tue Feb 27, 2018 11:18 pm

Thanks, that makes sense - I'm intermittently working on file I/O for my PLASMA port, so this information will come in handy. I'm glad 256 bytes is fine, I really didn't want to have to do 16-bit arithmetic either. :-)

Post by hjalfi » Mon Mar 12, 2018 10:58 pm

0.4!

The 6502 code generator is looking pretty darn adequate, if I say so myself. It's still dumb as rocks, but the set of heuristics it's using is producing code which isn't too awful.

Here's the print routine, which is really naive:

```
sub print(ptr: [int8])
    loop
        var c: int8 := ptr[0];
        if c == 0 then
            return;
        end if;
        print_char(c);
        ptr := ptr + 1;
    end loop;
end sub;
```

(It doesn't use indexing because Cowgol doesn't know how big the objects that pointers point to are, so indexing would have to use 16-bit arithmetic. The above code is cheaper. I'm going to redefine the language to make this better at some point.)

Here's what it compiles into, annotated by me.

```
L098F:  ldy     #$00
        lda     ($40),y     ; ptr
        sta     $E93A       ; c
        cmp     #$00
        bne     L099B
        rts
L099B:  lda     $E93A       ; c
        sta     $E93F       ; print_char's parameter
        jsr     L097D       ; print_char
        inc     $40         ; ptr lo
        bne     L09AA
        inc     $41         ; ptr hi
L09AA:  jmp     L098F
        rts
```

That's not too bad. Admittedly it's a carefully chosen example.
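The `inc $40` / `bne` / `inc $41` sequence is the usual 6502 idiom for incrementing a 16-bit pointer held as two bytes. It bumps the low byte, and only if that wraps round to zero does it bump the high byte.

An indexed `ptr[i]` into an object of unknown size can't rely on an 8-bit index. It needs a full 16-bit add of base and index instead. The Python below models both pieces of arithmetic on plain integers. It illustrates the arithmetic only, not what Cowgol's code generator does internally.

```python
def inc16(lo, hi):
    """Increment a 16-bit pointer stored as two bytes, 6502-style.

    Mirrors: inc lo / bne skip / inc hi
    """
    lo = (lo + 1) & 0xFF
    if lo == 0:  # low byte wrapped round, so carry into the high byte
        hi = (hi + 1) & 0xFF
    return lo, hi


def indexed_address(base, index):
    """Address of ptr[index] for one-byte elements: a full 16-bit add."""
    return (base + index) & 0xFFFF


# Walking a pointer across a page boundary: $40FE -> $40FF -> $4100 -> $4101
lo, hi = 0xFE, 0x40
for _ in range(3):
    lo, hi = inc16(lo, hi)
print(hex(hi), hex(lo))  # 0x41 0x1
```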

The generated code is now good enough that it's almost capable of self-hosting --- only one tool is too big to fit into memory, and that's the parser which needs rewriting anyway.

Post by SteveF » Tue Mar 13, 2018 9:17 pm

That's pretty impressive! And it looks like you could get some further improvement with a bit of relatively straightforward peephole optimisation too. (I'm thinking you could remove the "cmp #$00" and the redundant 'rts' at the end - not much, but on code this small to start with, it's actually quite a significant saving.)
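This is a minimal sketch of the two rewrites suggested above, written as a Python pass over a list of `(label, opcode, operand)` tuples. That representation is hypothetical and is not Cowgol's internal one.

- **Redundant `cmp #$00`.** It is dropped only under three conditions. The last flag-setting instruction before it must be an `lda`, with only stores in between (stores leave the flags alone). None of those stores can carry a label. And the next instruction must be `beq` or `bne`. The last condition matters because `cmp` also sets the carry, which a later `bcc`/`bcs` might rely on.
- **Unreachable code.** Anything after an unconditional `jmp` or `rts` is dropped until the next label.

```python
# Stores leave the N and Z flags untouched.
FLAG_NEUTRAL = {"sta", "stx", "sty"}


def z_flag_already_set(code, i):
    """True if the 'cmp #$00' at code[i] is redundant for the branch after it."""
    if i + 1 >= len(code) or code[i + 1][1] not in ("beq", "bne"):
        return False  # cmp also sets C, so only trust it for Z-flag branches
    j = i - 1
    while j >= 0:
        label, op, _ = code[j]
        if op == "lda":
            return True  # lda already set Z from the value loaded into A
        if op not in FLAG_NEUTRAL or label is not None:
            return False  # flags changed, or a jump could enter here
        j -= 1
    return False


def peephole(code):
    """Drop redundant 'cmp #$00' and unreachable code after jmp/rts."""
    out = []
    dead = False
    for i, (label, op, arg) in enumerate(code):
        if label is not None:
            dead = False  # a labelled instruction may be a jump target
        if dead:
            continue
        if op == "cmp" and arg == "#$00" and label is None and z_flag_already_set(code, i):
            continue
        out.append((label, op, arg))
        if op in ("jmp", "rts"):
            dead = True
    return out


# The print routine from the post above.
PRINT = [
    ("L098F", "ldy", "#$00"),
    (None, "lda", "($40),y"),
    (None, "sta", "$E93A"),
    (None, "cmp", "#$00"),
    (None, "bne", "L099B"),
    (None, "rts", ""),
    ("L099B", "lda", "$E93A"),
    (None, "sta", "$E93F"),
    (None, "jsr", "L097D"),
    (None, "inc", "$40"),
    (None, "bne", "L09AA"),
    (None, "inc", "$41"),
    ("L09AA", "jmp", "L098F"),
    (None, "rts", ""),
]

optimised = peephole(PRINT)
print(len(PRINT), "->", len(optimised))  # 14 -> 12
```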

Post by hjalfi » Tue Mar 13, 2018 10:43 pm

Until about five minutes ago I thought a peephole optimiser would be rather hard because of a mistake in the internal architecture, but now I actually think it would be perfectly possible. There are a lot of things which can easily be improved by a peephole optimiser which would be hard to do in the code generator.

I think, however, I'm going to take a break and go visit Z80-land for a bit. I want to get something running on Fuzix. I'll need a new register allocator, which may be applicable to the 6502 as well.

It'll be interesting comparing the two architectures.

Post by hjalfi » Wed Apr 04, 2018 10:59 pm

Just released Cowgol 0.5.

http://cowlark.com/2018-04-05-cowgol-0.5/

This version has some 6502 fixes, but the big new feature is a Z80 code generator and CP/M support. I've been testing with a Unix CP/M emulator, but I don't see any reason why it wouldn't work on a BBC Z80 second processor running CP/M. (If there's any interest it would be easy to generate BBC Z80 MOS binaries too.)

Z80 code quality is bad but there are plenty of obvious fixes which will drastically improve it.

Post by SteveF » Thu Apr 05, 2018 10:12 pm

Great to see progress on this. I hadn't realised you were writing a blog about it; by sheer chance I clicked on the link and noticed there's loads of fascinatingly geeky detail there. So thanks for that!

Post by hjalfi » Tue Apr 10, 2018 7:09 pm

0.6, and then 0.6.1 after I realised that 6502 codegen had been broken since 0.5 but nobody had noticed.

I think I'm actually going to take a break from this now. The next big thing to do is to rewrite the parser, which is horrible; but I want to spend some time porting Fuzix to my new laptop first (an Amstrad NC200)...