BASIC Tokenizer

Post by colonel32 » Thu Feb 07, 2019 3:25 am

Is there an easy way to tokenize plain text BASIC files?

Something like a windows or linux command line tool.

Post by richardtoohey » Thu Feb 07, 2019 5:55 am

Do you mean using an 8-bit machine or an Acorn machine?

Or definitely something on Windows/Linux?

You could run a BBC emulator (on Windows or Linux), slurp in the code and save it tokenised. I don't think that counts as easy, though, and it would still be using 8-bit software.

EDIT: e.g. this in Beebem ...

*BUILD TEST
0001 10 PRINT "THIS IS UNTOKENISED"
0002 20 PRINT 2+2
0003 30 REM SOMETHING
0004 40 END
0005
Escape
*EXEC TEST
...
L.
   10 PRINT "THIS IS UNTOKENISED"
   20 PRINT 2+2
   30 REM SOMETHING
   40 END
SA."TESTBAS"
*INFO *
$.TESTBAS     FF1900 FF8023 00003E 003
$.TEST        000000 000000 000044 002
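
I've just used *BUILD to make a non-tokenised file as an example. The key step is *EXEC, which feeds the text file to BASIC as if it were typed at the keyboard, so each line is tokenised on entry; SAVE then writes out the tokenised program.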

I don't think this is what you meant, but it's posted under 8-bit software, so it's an 8-bit software solution! :D

Get the untokenised source file(s) onto an SSD, run them through an emulator and job done.
Last edited by richardtoohey on Thu Feb 07, 2019 6:29 am, edited 1 time in total.

Post by cmorley » Thu Feb 07, 2019 7:46 am

BeebAsm will do this with its PUTBASIC command.

Post by Richard Russell » Thu Feb 07, 2019 9:16 am

colonel32 wrote:
Thu Feb 07, 2019 3:25 am
Is there an easy way to tokenize plain text BASIC files?

You can call the built-in tokeniser (in all versions of BBC BASIC except Brandy) using a cheat involving EVAL. Jonathan has a BASIC library which incorporates this code and will run on 6502, Z80, 32000, RISC OS, MS-DOS or Windows BBC BASIC. If you want a Windows executable you could very easily combine it with a file reading/writing wrapper and create an EXE using BB4W or BBCSDL.

Post by sweh » Thu Feb 07, 2019 2:07 pm

colonel32 wrote:
Thu Feb 07, 2019 3:25 am
Is there an easy way to tokenize plain text BASIC files?

Something like a windows or linux command line tool.

Maybe http://www.retrosoftware.co.uk/wiki/ind ... niser_in_C would help?

BASIC Tokeniser/Detokeniser
Some C++ (but trivial to convert into C) routines that convert to/from a memory dump of any Acorn machine running the BASIC II ROM and an ASCII file of the BASIC program. Originally taken from my ElectrEm emulator.

Routines of interest are bool ImportBASIC(char *Filename, Uint8 *Mem) and bool ExportBASIC(char *Filename, Uint8 *Memory). SetupBASICTables() must be called once before either routine is used.

I was able to use that as a library and put a simple wrapper around it.

$ cat xyz
10 PRINT "Hello THERE"
15 PRINT TIME
20 END
$ ./tokenzier xyz > foo
$ hdump foo
00000000  0D 00 0A 14 20 F1 20 22 48 65 6C 6C 6F 20 54 48   .... . "Hello TH
00000010  45 52 45 22 0D 00 0F 08 20 F1 20 91 0D 00 14 06   ERE".... . .....
00000020  20 E0 0D FF                                        ...
$ beeb list foo
   10 PRINT "Hello THERE"
   15 PRINT TIME
   20 END

FWIW, the changes I made to use it:

< #include "BASIC.h"
---
> typedef unsigned char Uint8;
> typedef unsigned short Uint16;
> #include <string.h>
> #include <stdlib.h>
> 
683a688
>               printf("Bad PAGE %x\n",Addr);
694a700
>               printf("Bad Top %x\n",TOPAddr-2);

And the wrapper routine (which is a complete hack'n'slash):

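#include <stdio.h>

typedef unsigned char Uint8;
typedef unsigned short Uint16;

// Both routines come from the (patched) tokeniser source on the wiki page.
void SetupBASICTables();
bool ImportBASIC(char *, Uint8 *);

int main(int argc,char *argv[])
{
  char fallback[]="xyz";                 // default input file if none is given
  char *name=(argc>1)?argv[1]:fallback;
  SetupBASICTables();                    // must be called once before ImportBASIC
  static Uint8 foo[32768];               // fake Beeb memory, zero-initialised
  foo[0x18]=14;                          // PAGE = &E00 (high byte only)
  foo[0x12]=2; foo[0x13]=14;             // TOP = &E02
  foo[0xe00]=13; foo[0xe01]=255;         // empty program: 0D FF
  ImportBASIC(name,foo);                 // tokenise the text file into foo
  int t=foo[0x13]*256+foo[0x12];         // TOP after tokenising
  int i;
  for(i=0xe00;i<t;i++) { putchar(foo[i]); }  // dump PAGE..TOP to stdout
  return(0);
}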

Basically, it sets PAGE to &E00 and TOP to &E02, puts an empty program (0D FF) in, calls the "tokenize" function, and then outputs from &E00->TOP to stdout. Hacky :-)
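
For anyone who wants to check a tokenised file by eye, here is a minimal Python sketch that walks the byte layout visible in the hex dump above: each line starts with 0D, then the line number (high byte, low byte), then a length byte that counts those four header bytes as well, and the program ends with 0D FF. The names `walk_lines`, `show` and `TOKENS` are made up for illustration, and the token table holds only the three tokens that appear in this dump, so it is nowhere near a full detokeniser.

```python
# Bytes copied from the hex dump of "foo" shown earlier in this post.
DUMP = bytes.fromhex(
    "0D 00 0A 14 20 F1 20 22 48 65 6C 6C 6F 20 54 48 "
    "45 52 45 22 0D 00 0F 08 20 F1 20 91 0D 00 14 06 "
    "20 E0 0D FF"
)

# Only the tokens seen in this particular dump (illustrative, not complete).
TOKENS = {0xF1: "PRINT", 0x91: "TIME", 0xE0: "END"}


def walk_lines(program: bytes):
    """Yield (line_number, body_bytes) for each line of the program."""
    pos = 0
    while True:
        if program[pos] != 0x0D:
            raise ValueError(f"expected 0x0D at offset {pos}")
        if program[pos + 1] == 0xFF:
            return  # 0D FF marks the end of the program
        number = program[pos + 1] * 256 + program[pos + 2]
        length = program[pos + 3]  # includes the 4 header bytes
        yield number, program[pos + 4 : pos + length]
        pos += length


def show(body: bytes) -> str:
    """Expand known tokens outside string literals; pass other bytes through."""
    out = []
    in_string = False
    for b in body:
        if b == 0x22:  # a double quote toggles string mode
            in_string = not in_string
        if b in TOKENS and not in_string:
            out.append(TOKENS[b])
        else:
            out.append(chr(b))
    return "".join(out)


for number, body in walk_lines(DUMP):
    print(f"{number:5d}{show(body)}")
```

Walking the length bytes like this is also a cheap sanity check on a tokeniser's output: if a length is wrong, the walk lands on something other than 0D and raises.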
Last edited by sweh on Thu Feb 07, 2019 2:26 pm, edited 1 time in total.
Rgds
Stephen

Post by ThomasHarte » Thu Feb 07, 2019 4:26 pm

Oh, I'm the original author of that tokeniser; it originates from ElectrEm. Apologies for the kooky types: they're the SDL names for what would eventually be standardised in stdint.h, but the code predates C99 support in the compilers I had access to way back when.

Apologies being made, I should probably modernise it and chuck it on GitHub. I'll try to take a look at that. Off the top of my head: lots of things should probably be const, the 'private' functions should be static, I'm unclear why I didn't just use qsort for my sorting step in building QuickTable, and it'd be smart to eliminate all the globals.

Post by Richard Russell » Thu Feb 07, 2019 5:43 pm

Here's a BASIC program to do the job. I have uploaded a Windows executable here; it has not been extensively tested but seems to work.

      REM!Exefile tokenise.exe,signed,encrypt,console
      ON ERROR PRINT REPORT$ : QUIT

      REM Standard console program rubric:
      SYS "GetStdHandle", -10 TO @hfile%(1)
      SYS "GetStdHandle", -11 TO @hfile%(2)
      SYS "SetConsoleMode", @hfile%(1), 0
      *INPUT 13
      *OUTPUT 14

      REM Get and parse command line:
      Cmd$ = @cmd$
      P% = INSTR(Cmd$, """")
      IF P% THEN
        Q% = INSTR(Cmd$, """", P%+1)
        IF Q% = 0 ERROR 100, "Command syntax: tokenise inputfile outputfile"
        srcfile$ = EVAL(Cmd$)
        dstfile$ = EVAL(MID$(Cmd$,Q%+1))
      ELSE
        P% = INSTR(Cmd$, " ")
        IF P% = 0 ERROR 100, "Command syntax: tokenise inputfile outputfile"
        srcfile$ = LEFT$(Cmd$, P%-1)
        dstfile$ = MID$(Cmd$, P%+1)
      ENDIF

      REM Open / create files:
      S% = OPENIN(srcfile$)
      IF S% = 0 ERROR 100, "Couldn't open input file: " + srcfile$
      D% = OPENOUT(dstfile$)
      IF D% = 0 ERROR 100, "Couldn't create output file: " + dstfile$

      REM Tokenise:
      BPUT#D%, &D
      WHILE NOT EOF#S%
        a$ = GET$#S%
        IF a$="" IF PTR#S%>1 THEN PTR#S%=PTR#S%-2 : IF BGET#S%<>BGET#S% a$ = GET$#S%
        BPUT#D%, FNtokenise(a$);
      ENDWHILE
      BPUT#D%, &FF
      CLOSE #S%
      CLOSE #D%
      PRINT "Created Acorn-tokenised file """ + dstfile$ + """"
      QUIT

      DEF FNtokenise(a$)
      LOCAL n
      n = VAL(a$)
      WHILE ASCa$=&20 OR ASCa$>=&30 AND ASCa$<=&39 a$ = MID$(a$,2) : ENDWHILE
      WHILE RIGHT$(a$) = " " a$ = LEFT$(a$) : ENDWHILE
      IF EVAL("1RECTANGLE:"+a$) a$ = $(!332+3)
      IF LENa$ > 251 THEN
        PRINT "Line too long, truncated"
        a$ = LEFT$(a$,251)
      ENDIF
      = CHR$(n DIV256)+CHR$(n MOD256)+CHR$(LENa$+4)+a$+CHR$&D
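
As a rough illustration of just the framing that the last line of FNtokenise and the surrounding BPUT calls produce, here is a small Python sketch. It assumes the line bodies are already tokenised (it does no keyword tokenising at all), and the names `frame_line` and `frame_program` are hypothetical. The token bytes in the example (F1 for PRINT, 91 for TIME, E0 for END) are taken from the hex dump earlier in the thread.

```python
def frame_line(number: int, body: bytes) -> bytes:
    """Frame one already-tokenised line: number high, number low, length, body, 0D."""
    if not 0 <= number <= 0xFFFF:
        raise ValueError("line number must fit in two bytes")
    # Same 251-byte limit as the BASIC program above; this sketch truncates
    # silently, whereas the original prints a warning.
    body = body[:251]
    return bytes([number // 256, number % 256, len(body) + 4]) + body + b"\r"


def frame_program(lines) -> bytes:
    """Leading 0D, then each framed line, then the FF terminator."""
    return b"\r" + b"".join(frame_line(n, b) for n, b in lines) + b"\xff"


# Example bodies without a leading space, since FNtokenise strips the
# spaces and digits at the start of each line.
program = frame_program([
    (10, b'\xf1 "Hello THERE"'),
    (15, b"\xf1 \x91"),
    (20, b"\xe0"),
])
print(" ".join(f"{b:02X}" for b in program))
```

The length byte is LEN+4 because it counts the 0D that precedes the line as well as the two number bytes and the length byte itself, which is why the 0D is written after each line and once more at the very start.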
Last edited by Richard Russell on Thu Feb 07, 2019 6:30 pm, edited 1 time in total.

Post by colonel32 » Mon Feb 11, 2019 1:11 am

Richard Russell wrote:
Thu Feb 07, 2019 5:43 pm
Here's a BASIC program to do the job.

Wow, thank you everyone. Especially Richard!
