DOS vs UNIX Line Endings
- BeebMaster
- Posts: 3970
- Joined: Sun Aug 02, 2009 5:59 pm
- Location: Lost in the BeebVault!
- Contact:
DOS vs UNIX Line Endings
I think this is a very old problem, but one that has only just started to bug me.
I've been on Linux for about 12 years now, so all my text files have had UNIX line endings since then (ASCII 10 line end character). I've never even really noticed it until I adapted my (ARMBASIC) HTML generator for my website to run on 6502 BASIC as well. I thought it would be nice if I could edit the caption files that are fed into the HTML generator on a Beeb as well as doing the generation itself.
So I loaded up my template in Edit, and there are no line breaks at all, it's just all the text littered with CTRL-J characters so it's not readable and not editable. In Wordwise Plus, View and Inter-Word it's even worse, because the control characters just show as spaces, so I have no idea where the line breaks should be.
*TYPEing it displays it correctly.
I made a version of the template with DOS line endings (ASCII 13 followed by ASCII 10), which is a bit better. In Edit it shows the CTRL-J characters at the beginning of each line, and in the others these are just shown as a space as before, at the beginning of the lines.
I don't really want to move everything to DOS line endings, not least because it will break my HTML generator which can't cope correctly with two consecutive end of line characters which are only supposed to be a single line break. The generator itself outputs ASCII 13 line breaks, so maybe I've unwittingly created my own standard! That seems sensible and normal to me, and displays correctly when loaded in a Beeb editor or word-processor, or *TYPEd, or loaded in a Linux text editor.
UNIX line endings can be converted on RISC OS in !Edit using the CR<>LF facility, which is handy to aid readability, but it modifies the file to change from UNIX to DOS endings, so that's not all that satisfactory either.
Is there any way on a Beeb I can edit a text file which will correctly display UNIX line endings as a new-line?
Or is there a way on a Linux text editor (Kate, Gedit etc) for me to specify my own (per-file preferably) line endings so I can just use the "BM Standard" of ASCII 13?
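Before converting anything it can help to confirm which endings a file really contains, rather than inferring it from how an editor displays it. The following is a minimal Python sketch, not something from the thread; the function name `count_line_endings` is made up for illustration, and it assumes the whole file fits in memory as bytes.

```python
from collections import Counter


def count_line_endings(data: bytes) -> Counter:
    """Count CRLF, LFCR, lone CR and lone LF sequences in data."""
    counts = Counter()
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair == b"\r\n":        # DOS style: ASCII 13 then ASCII 10
            counts["CRLF"] += 1
            i += 2
        elif pair == b"\n\r":      # ASCII 10 then ASCII 13
            counts["LFCR"] += 1
            i += 2
        elif data[i] == 13:        # lone CR
            counts["CR"] += 1
            i += 1
        elif data[i] == 10:        # lone LF (UNIX style)
            counts["LF"] += 1
            i += 1
        else:                      # ordinary character
            i += 1
    return counts
```

For a real file, something like `count_line_endings(open("template.txt", "rb").read())` would report the mix, where `template.txt` is a placeholder name.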
- BeebMaster
- Posts: 3970
- Joined: Sun Aug 02, 2009 5:59 pm
- Location: Lost in the BeebVault!
- Contact:
Re: DOS vs UNIX Line Endings
A solution has presented itself, which often happens when writing down the problem.
I did a regex replace \n with \r and whilst it jumbles everything up, after re-loading in Kate, the file displays correctly, and will display correctly on the Beeb. My HTML generator always checks for either CHR$10 or CHR$13 so the new "BeebMaster Standard" line endings don't affect it.
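The same replacement can be done outside the editor. This is a rough Python sketch of the idea, working on raw bytes so nothing gets re-translated on load; the names `lf_to_cr` and `cr_to_lf` are hypothetical. As an extra assumption beyond the regex described above, `lf_to_cr` also collapses an existing CR LF pair to a single CR so that no line break ends up as two end-of-line characters.

```python
def lf_to_cr(data: bytes) -> bytes:
    """Turn UNIX (LF) endings into CR-only endings."""
    # Collapse CR LF first so a DOS file does not end up with CR CR.
    return data.replace(b"\r\n", b"\r").replace(b"\n", b"\r")


def cr_to_lf(data: bytes) -> bytes:
    """Turn CR-only endings back into UNIX (LF) endings."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Reading and writing the file in binary mode (`"rb"` and `"wb"`) matters here, because Python's text mode would otherwise apply its own newline translation.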
Re: DOS vs UNIX Line Endings
BeebMaster wrote (Sat Nov 14, 2020 11:22 am):
Or is there a way on a Linux text editor (Kate, Gedit etc) for me to specify my own (per-file preferably) line endings so I can just use the "BM Standard" of ASCII 13?

Kate claims to have line ending auto detection, but I've never used it. Gedit (3.36.2, at least) silently keeps whatever line endings it was fed, offering to change under "Save As". Emacs does too, but you probably don't want to go there.
The traditional way of doing this is using the unix2(dos|mac) / (mac|dos)2unix tools. On this Ubuntu system, they're in the dos2unix package. Capabilities vary from system to system, and the dos2unix I used on Solaris systems in 1997 was very different from the one I've got here. Common to all of them is that they silently overwrite files:
Code:
unix2mac file.txt
unless you use -n to write the result to a new file:
Code:
unix2mac -n unixfile.txt macfile.txt
None of these utilities add the expected Ctrl-Z at end of file that CP/M needs, but you probably don't need that.
Re: DOS vs UNIX Line Endings
BeebMaster wrote (Sat Nov 14, 2020 11:22 am):
Or is there a way on a Linux text editor (Kate, Gedit etc) for me to specify my own (per-file preferably) line endings so I can just use the "BM Standard" of ASCII 13?

It's more of a per-project than per-file thing, but specifying what an editor should use for line endings in different types of file is one of the things EditorConfig does: https://editorconfig.org/
Various teletext things including a web based teletext editor which can export as mode 7 screens.
Join the Teletext Discord for teletext chat.
Re: DOS vs UNIX Line Endings
See https://github.com/SteveFosdick/Utils The utilities txt2bbc, txt2cpm, and txt2dos all write files with the corresponding line ending and, in the case of txt2cpm, with the ^Z at the end. They also don't need you to specify what the line endings already are, and use a common, simple, finite state machine to handle any of CR only, LF only and CR/LF.
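To illustrate the kind of state machine described, here is a small Python sketch. It is not the code from the Utils repository; the function name `convert_endings` and its parameters are invented for this example. The only state it keeps is whether the previous byte was a CR, which is enough to treat CR, LF and CR/LF each as a single line break.

```python
def convert_endings(data: bytes, newline: bytes) -> bytes:
    """Rewrite CR, LF or CR/LF line breaks in data as newline."""
    out = bytearray()
    after_cr = False              # state: was the previous byte a CR?
    for byte in data:
        if byte == 13:            # CR always ends a line
            out += newline
            after_cr = True
        elif byte == 10:          # LF ends a line unless it completes CR/LF
            if not after_cr:
                out += newline
            after_cr = False
        else:                     # ordinary character: copy it through
            out.append(byte)
            after_cr = False
    return bytes(out)
```

Calling it with `b"\r"` would give BBC-style output, `b"\r\n"` DOS-style and `b"\n"` UNIX-style, whatever mixture the input contains.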
Also, we tend to think of CR-only as the standard text file format of the BBC (like the classic Mac), but there wasn't really a standard at the start as far as I can tell. The BBC Micro lacked a supplied text editor until the Master. BASIC does not make it at all easy to use text files and has its own format for PRINT# and INPUT#. *EXEC uses CR only, but only because that's the ASCII code the Return key generates and thus the one OSWORD 0 uses to terminate an input line. *BUILD uses CR because it is usually used to create files for *EXEC, whereas *SPOOL copies everything sent to OSWRCH and so ends up with CR/LF as the line endings. The BCPL editor uses that format too.
Last edited by Coeus on Sat Nov 14, 2020 7:18 pm, edited 1 time in total.
Re: DOS vs UNIX Line Endings
scruss wrote (Sat Nov 14, 2020 5:53 pm):
Kate claims to have line ending auto detection, but I've never used it. Gedit (3.36.2, at least) silently keeps whatever line endings it was fed, offering to change under "Save As". Emacs does too, but you probably don't want to go there.

Geany (GTK-based, light IDE) also deals with different line endings as does Notepad++ on Windows. It's a feature I'd expect most modern text editors to have (though I suspect the original notepad still doesn't), even to the point that if an editor you like is open source and it doesn't have this feature I'd raise a bug and then maybe consider submitting a patch/pull request.
- Richard Russell
- Posts: 2071
- Joined: Sun Feb 27, 2011 10:35 am
- Location: Downham Market, Norfolk
- Contact:
Re: DOS vs UNIX Line Endings
Doesn't OSNEWL at &FFE7 (which as far as I know has been in the MOS from the start) effectively set the standard as LFCR? It's quite unusual (LF, CRLF and CR probably all being more common).
I am suffering from 'cognitive decline' and depression. If you have a comment about the style or tone of this message please report it to the moderators by clicking the exclamation mark icon, rather than complaining on the public forum.
- 1024MAK
- Posts: 10544
- Joined: Mon Apr 18, 2011 5:46 pm
- Location: Looking forward to summer in Somerset, UK...
- Contact:
Re: DOS vs UNIX Line Endings
There was definitely no standard line ending control character in the past. That’s why dot matrix printers often had a DIP switch to select LF or CR or both (also auto advance or ignore).
Really, both should be used, as LF and CR are different things... But of course, that used up more valuable memory, so most of the time, instead only one control code was used.
And every computer system/manufacturer did their own thing (Sinclair’s ZX80 and ZX81 used 0x76, which is the Z80 HALT instruction code).
Mark
For a "Complete BBC Games Archive" visit www.bbcmicro.co.uk NOW!
BeebWiki - for answers to many questions...
Fault finding index • Acorn BBC Model B minimal configuration • Logic Levels for 5V TTL Systems
Re: DOS vs UNIX Line Endings
Coeus wrote (Sat Nov 14, 2020 7:01 pm):
See https://github.com/SteveFosdick/Utils The utilities txt2bbc, txt2cpm, and txt2dos all write files with the corresponding line ending …

Thanks. I've been making do with an awk one-liner and requiring LF eols because who wouldn't?
Code:
awk '{printf("%s\r\n", $0);} END {printf("%c", 26);}'
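For anyone without awk to hand, the one-liner above can be approximated in Python. This is only a sketch under the same assumption the one-liner makes, namely that the input uses LF endings; the name `lf_to_cpm` is made up here. It writes each line with CR LF and finishes with a single Ctrl-Z (byte 26).

```python
def lf_to_cpm(data: bytes) -> bytes:
    """Convert LF-terminated text to CR LF lines followed by Ctrl-Z."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        # A trailing LF terminates the last line; it does not start a new one.
        lines.pop()
    return b"".join(line + b"\r\n" for line in lines) + b"\x1a"
```

Like the awk version, it emits the Ctrl-Z even for empty input and terminates a final line that lacked its LF.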
The ZX80/81 weren't even close to ASCII in any way, something I only found out recently. And I really don't want to know what my PDP-8 clone uses internally. In OS-8 BASIC, it does have a sort-of ASCII code function, but A-Z is way down where I'd expect control characters to be.
Re: DOS vs UNIX Line Endings
I use the `tr` command:
Code:
tr '\012' '\015' < unixfile > beebfile
tr '\015' '\012' < beebfile > unixfile
Rgds
Stephen
Re: DOS vs UNIX Line Endings
From BBC BASIC you can read lines of text agnostic to line endings with the FNrd() function in StringIO, adapted from Richard's original:
Code:
90 REM rd(in%) - Read a <cr>, <lf>, <cr><lf>, <lf><cr> or <eof> terminated string from in%
100 REM -----------------------------------------------------------------------------------
110 DEFFNrd(i%):LOCALA%,B%,A$:REPEAT:A%=BGET#i%:IFA%<>10ANDA%<>13:A$=A$+CHR$A%
120 UNTILA%=10ORA%=13OREOF#i%:IFNOTEOF#i%:B%=BGET#i%:IFA%=B%OR(B%<>13ANDB%<>10):PTR#i%=PTR#i%-1
130 =A$
Code:
$ bbcbasic
PDP11 BBC BASIC IV Version 0.32
(C) Copyright J.G.Harston 1989,2005-2020
>_
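The logic of FNrd() carries over to other languages fairly directly. Below is a hedged Python sketch of the same idea, not a translation endorsed by its authors: the name `read_line` is hypothetical, and it assumes a seekable binary stream (an ordinary file opened with `"rb"`, or `io.BytesIO`). It reads up to the first CR or LF, then peeks one byte ahead and swallows it only if it is the other half of a CR LF or LF CR pair.

```python
import io


def read_line(stream) -> bytes:
    """Read one line ended by CR, LF, CR LF, LF CR or end of file."""
    line = bytearray()
    while True:
        ch = stream.read(1)
        if ch == b"" or ch in (b"\r", b"\n"):
            break                      # end of file or a terminator
        line += ch
    if ch != b"":
        nxt = stream.read(1)
        # Keep the look-ahead byte consumed only when it is the *other*
        # terminator; a repeat of the same one is a blank line, and
        # anything else is the start of the next line, so step back.
        if nxt != b"" and (nxt == ch or nxt not in (b"\r", b"\n")):
            stream.seek(-1, io.SEEK_CUR)
    return bytes(line)
```

As with the BASIC original, the caller has to detect end of file separately; the function simply returns an empty result once the stream is exhausted.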
Re: DOS vs UNIX Line Endings
I run into this all the time (Apple II uses CR (13) as the line ending for example.) My solution is to use the Linux command line util tr.
tr '\r' '\n' < infile > outfile will convert CR to LF.
tr '\n' '\r' < infile > outfile will do the reverse.
MS-DOS CR+LF endings are more of a pain in the ass to deal with. There are lots of editors that can load one and save in a different format.
EDIT: Should have read all the thread before responding. @sweh beat me to it with tr. Seconded!
Re: DOS vs UNIX Line Endings
On another subject I think the PDP-8 usually packs two six-bit chars in a 12 bit word. Nothing like ASCII.
PDP-10 has 36 bit words and does six chars packed to a word.
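As a rough illustration of six characters in a 36-bit word, here is a Python sketch. It assumes the DEC SIXBIT convention in which each code is the ASCII value minus 32 (so only space through underscore are representable) and the first character occupies the most significant six bits; the function names are invented for the example, and the PDP-8 scheme mentioned above is a different packing that this does not model.

```python
def pack_sixbit(text: str) -> int:
    """Pack up to six characters into one 36-bit word, space padded."""
    if len(text) > 6:
        raise ValueError("at most six characters fit in a 36-bit word")
    word = 0
    for ch in text.ljust(6):
        code = ord(ch) - 32            # assumed SIXBIT mapping: ASCII minus 32
        if not 0 <= code < 64:
            raise ValueError(f"{ch!r} has no six-bit code")
        word = (word << 6) | code      # earlier characters end up in higher bits
    return word


def unpack_sixbit(word: int) -> str:
    """Recover the six characters held in a 36-bit word."""
    codes = [(word >> shift) & 0o77 for shift in range(30, -1, -6)]
    return "".join(chr(code + 32) for code in codes)
```

Lower-case letters fall outside the 64-code range, which is why `pack_sixbit` rejects them rather than guessing.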
- BeebMaster
- Posts: 3970
- Joined: Sun Aug 02, 2009 5:59 pm
- Location: Lost in the BeebVault!
- Contact:
Re: DOS vs UNIX Line Endings
Thanks for all the replies. Like I said at the beginning, I haven't really ever given this much thought before. I suppose I always assumed line-endings were &D because that's what the Beeb seems to do, and it isn't all that often you have to look at a hex dump of a text file generated elsewhere. Sounds like some company called Apple have beaten me to it with &D endings, but whatever became of them, eh, so I'm still claiming to have invented "BeebMaster line endings"!
For about 12 years I always used Gedit on Linux, and as has been noted, at time of save it gives you the choice between DOS or UNIX line endings (and also encoding). However the death-knell was sounded when they took away the File menu, and then I was finding it struggling to load large text files (dmesg dumps or BeebSCSI logs etc) so I looked for something else and settled on Kate. Line-endings and encoding have to be set in the preferences, so it's not as easy to switch them per-file. It actually has a 6502 assembler display mode! For now doing a regex \n to \r seems to work, and it survives a save, so I can use that on a Beeb and have it display nicely. (Actually, I think Kate auto-converts \r back to \n on load, because if you repeat the regex, it will do it again, but it doesn't seem to spoil the saved file).
On the Master, I've been using Edit, but it doesn't have on-screen word wrap. View doesn't automatically wrap to the margins even using READ to load the file, and after using FORMAT it seems to muck things up a bit. Even Inter-Word has let me down, as it skips the control character I use as a delimiter in the text file (|) when spooling the output, so I can't use that either. Probably that was a bad choice of delimiter, but it's too late now. It's a real shame, as it was looking good in 106-characters-per-line mode.
Might end up having to write my own Beeb text editor. When I've got a couple of years to spare. I did start writing my own word-processor once upon a time, but got stuck with how to manage text being inserted in the middle of existing text.
Re: DOS vs UNIX Line Endings
I wrote a text editor for the Apple II during the summer. It is not a trivial task. (If you're curious the source code is here ... https://github.com/bobbimanners/emaille ... pps/edit.c)
For Linux, you may want to give Sublime Text a try. You can download it here: https://www.sublimetext.com/ People keep recommending it to me, but I am a vi diehard.
Re: DOS vs UNIX Line Endings
So that's an output-centric view of a text file, i.e. something that can be copied byte for byte to an output device such as a screen or printer and have it display correctly. Looking from an input perspective, users expect to hit a single key to signal end of line, and the usual key, Return, generates CR, so to have a single convention there has to be some translation going on somewhere. CR could be expanded to CR/LF on input, upon writing the file to disc, or upon output. The BBC Micro definitely leans towards the "translate on output" option with the provision of OSASCI, though as I said earlier there are exceptions, so it doesn't seem to me to be a strong convention in the way that CRLF is for CP/M, DOS and Windows. Mark has already mentioned printers which accept a variety of line endings.
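The "translate on output" idea can be modelled in a few lines. The Python sketch below is only a model of the behaviour, not MOS code: it assumes, as discussed earlier in the thread, that OSASCI passes every byte through except CR, for which it sends what OSNEWL sends, i.e. LF followed by CR. The function name `osasci_stream` is hypothetical.

```python
def osasci_stream(text: bytes) -> bytes:
    """Model the byte stream the output driver sees for CR-terminated text."""
    out = bytearray()
    for byte in text:
        if byte == 13:
            out += b"\n\r"     # CR in the file becomes LF then CR on output
        else:
            out.append(byte)   # everything else is sent unchanged
    return bytes(out)
```

This is why a file stored with CR only can still display correctly: the stored text and the bytes sent to the screen are deliberately not the same thing.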
Unix's choice of LF (which it calls newline) seems strange at first as this requires translation both on input and on output (which is done by the terminal driver). My guess as to why is that CR on its own is useful for overstrike, i.e. to print a line in bold one can send the print head of an impact printer back to the left and print the line again, or even print parts of it again for selective bold. Characters can also be combined this way, for example to get accents or underline. Advancing to the next line without returning the print head to the left is less useful.
It is also worth remembering that this "stream of bytes" view of text files, or even all files, is far from universal. The APIs of mainframe operating systems often presented a file as a sequence of records, which may be fixed or variable length, where for text files one record would be a line.
- 1024MAK
- Posts: 10544
- Joined: Mon Apr 18, 2011 5:46 pm
- Location: Looking forward to summer in Somerset, UK...
- Contact:
Re: DOS vs UNIX Line Endings
Yeah, agree. But as with a lot of things, it’s a hangover from ASCII control codes and teletypewriters/terminals and line printers...
Really there should have been an open international agreement/standard... (because if there is not the ‘right’ standard, just add your own)
Mark
- Richard Russell
- Posts: 2071
- Joined: Sun Feb 27, 2011 10:35 am
- Location: Downham Market, Norfolk
- Contact:
Re: DOS vs UNIX Line Endings
I'm not sure that it "generates" CR, at least not on the modern platforms I mostly deal with. Typically pressing the Enter key results in a 'key down' event with the key being identified by a symbolic constant (e.g. VK_ENTER) that could in principle be anything. When I receive that event I do write CR into the input buffer, since that's what BBC BASIC programs expect, but that's a choice of the application program rather than something determined by the OS.
- 1024MAK
- Posts: 10544
- Joined: Mon Apr 18, 2011 5:46 pm
- Location: Looking forward to summer in Somerset, UK...
- Contact:
Re: DOS vs UNIX Line Endings
But does it do this on all systems? I’m not sure. It’s entirely possible that on some systems the LF code is generated instead. And that assumes that a control code is actually generated and placed in the text stream/buffer/file in the first place.
Heck on some keyboards the ‘Return’ key is called “Enter” and Sinclair being different (as usual) called it “New Line” on some of their computers.
[BTW I’ve not actually researched this, I’m just asking the questions, because never assume anything, other than, it’s likely there is a different way of doing something...]
Edit: Richard got in while I was editing.
Mark
Re: DOS vs UNIX Line Endings
Richard Russell wrote (Mon Nov 16, 2020 11:30 am):
I'm not sure that it "generates" CR, at least not on the modern platforms I mostly deal with. Typically pressing the Enter key results in a 'key down' event with the key being identified by a symbolic constant (e.g. VK_ENTER) that could in principle be anything. When I receive that event I do write CR into the input buffer, since that's what BBC BASIC programs expect, but that's a choice of the application program rather than something determined by the OS.

Again, it is not universal. GUI environments tend to present keystrokes as events and indeed the keycode may be unrelated to ASCII and may already have been through various translations, but an ASCII (or Unicode) translation may also be part of that event or available separately - something that is very useful for the letter keys.
CR for the return key probably comes from the asynchronous serial terminals, possibly dating back to the teletype, and it was certainly the case in the VT100 era. CP/M and Unix both seem to be written around this serial terminal idea, even when no physically separate terminal is present, and the BBC Micro continues this idea in having all screen drawing be the result of a stream of bytes sent to the VDU driver rather than a series of procedure calls made by the application program. OSRDCH returns CR for the return key, i.e. even when reading a character at a time. In most GUI environments I would expect the translation from the keycode in a keydown event for the Return key to a character to result in CR, though other options are possible.
- Richard Russell
- Posts: 2071
- Joined: Sun Feb 27, 2011 10:35 am
- Location: Downham Market, Norfolk
- Contact:
Re: DOS vs UNIX Line Endings
True. In the systems I'm most familiar with, keys corresponding to 'printing' characters generate events reporting the ASCII code (or more likely Unicode these days) but non-printing keys like Enter, Backspace, Delete etc. are identified only by symbolic constants. Again it's not universal, but commonly pressing Shift modifies the code received from printing keys, but not the ID of non-printing keys (if you want Shift and/or Ctrl to modify the code you have to do that yourself).
- BeebMaster
- Posts: 3970
- Joined: Sun Aug 02, 2009 5:59 pm
- Location: Lost in the BeebVault!
- Contact:
Re: DOS vs UNIX Line Endings
BeebMaster wrote (Sun Nov 15, 2020 10:57 am):
Even Inter-Word has let me down, as it skips the control character I use as a delimiter in the text file (|) when spooling the output, so I can't use that either. Probably that was a bad choice of delimiter, but it's too late now. It's a real shame, as it was looking good in 106-characters-per-line mode.

I fixed that, it's necessary to change the "pad" character which I think nowadays would more likely be called a hard space, which defaults to the | character. But now I've found something really really annoying. I have to spool the file when I've finished editing it, so that it's a pure text file, rather than an Inter-Word format file, and when doing so it inserts a space before the carriage return at the end of each line! I can't stop it doing that!