This is all a confusing mess.
I'll start by making a few general observations, then get more specific.
Firstly, and most importantly, as a general principle an archiving tool's main job is to preserve as much information as possible. If, during the path from source filing system, through archiving, transport and restoration, to destination filing system, there must be any translation or discarding of information, that should occur in the dearchiving step, not the archiving one. That way, you're not stuck if you find a bug, change your mind, or want to use the data in a different environment.
Secondly, hosting files for one OS under another OS's filesystem presents an additional complexity: there are two sets of metadata, host and guest. When archiving such files:
- (To the extent that they differ) both sets of metadata need to be preserved.
- Only one subtree might contain guest-OS files, with the rest for the host OS.
- There might be multiple guest subtrees for different guest-OSes.
- There might be guest-of-a-guest nesting for three sets of metadata. Or worse.
- It might not be apparent which files are intended for a guest OS. It might even be impossible to determine accurately.
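One way to honour the first few points is for each archive entry to carry an ordered stack of metadata layers rather than a single flat record. Then nothing is discarded at archive time, and the unpacker decides which layer(s) to apply. Below is a minimal Python sketch. The `MetadataLayer`/`ArchiveEntry` names and the namespace strings are hypothetical and don't belong to any existing format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataLayer:
    """Metadata as seen by one OS; namespace names and keys are illustrative."""
    namespace: str                                # hypothetical tag, e.g. "posix", "riscos"
    attrs: dict = field(default_factory=dict)


@dataclass
class ArchiveEntry:
    path: str
    # Outermost (host) layer first, then guest, guest-of-a-guest, and so on.
    layers: List[MetadataLayer] = field(default_factory=list)

    def layer(self, namespace: str) -> Optional[MetadataLayer]:
        """Return the first layer with the given namespace, or None if absent."""
        for candidate in self.layers:
            if candidate.namespace == namespace:
                return candidate
        return None

    @property
    def has_guest_metadata(self) -> bool:
        # If the archiver can't tell whether a file belongs to a guest OS,
        # it records only the host layer rather than guessing.
        return len(self.layers) > 1
```

Nesting depth isn't fixed, so the guest-of-a-guest case is just a longer list.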
Thirdly, several aspects of file metadata, including hierarchy, naming, date and time representations, file typing, access permissions, forks/ADSes/user attributes, DSORG/sparseness, text encoding, order of files, volume, compression, etc., are each individually a monstrous can of worms in terms of which precise data items exist, what range of values is possible and how they are represented.
Fourthly, there are many other kinds of filing system objects besides files: directories, image files, symbolic links, hard links, devices, pipes, sockets, mount points... which may need to be represented. The difficulties may be the same as for files, or they may be differently complicated.
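As a concrete illustration, on a POSIX host a single `lstat()` already distinguishes most of these kinds, and an archiver has to decide how to record each one. Here's a minimal Python sketch using the standard `stat` module. It deliberately doesn't follow symlinks. Hard links are only detected via a link count greater than one. Image files and mount points can't be detected this way at all. The function name is hypothetical.

```python
import os
import stat


def object_kind(path: str) -> str:
    """Classify a filesystem object without following symlinks."""
    st = os.lstat(path)
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISREG(mode):
        # A link count above 1 means other directory entries share this
        # inode; finding them needs a separate (st_dev, st_ino) -> path table.
        return "file (hard-linked)" if st.st_nlink > 1 else "file"
    return "unknown"
```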
Fifthly, not all filesystem metadata is associated with individual filesystem objects.
Sixthly, although things like SparkFS prove it can be done at a pinch, most archival formats are not suitable for use as a live filesystem. Significant problems include:
- The lack of indexing structures for better than O(n) access to anything
- The difficulty and inefficiency of extending existing files
- The difficulty of modifying data when it's been compressed without regard to that need
- How to be robust in the face of failures during modifications
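The O(n) problem is easy to see with tar. The format has no directory, so finding one member means reading headers from the start. The usual workaround is a single linear pass that builds an in-memory index. After that, each lookup is a dictionary hit plus one seek. Here's a sketch using Python's `tarfile`. It assumes an uncompressed archive on a seekable file, since `TarInfo.offset_data` is only useful if you can seek to it directly. Sparse members are skipped because their data isn't contiguous. The helper names are hypothetical.

```python
import tarfile


def build_index(fileobj):
    """One O(n) pass over the headers: name -> (data offset, size)."""
    index = {}
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            # Sparse members' data isn't stored contiguously, so skip them.
            if member.isfile() and not member.issparse():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(fileobj, index, name):
    """O(1) lookup followed by a single seek and read."""
    offset, size = index[name]
    fileobj.seek(offset)
    return fileobj.read(size)
```

This only helps reading. Extending or rewriting a member in place still runs into the other problems listed.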
Seventhly, the world has moved on in the past quarter of a century. On the kinds of data volumes we're discussing, seeking has become relatively quicker in comparison with reading and writing. Fragmentation is cheap; moving data is expensive.
I'm realistic enough to recognise that I'm not going to set the world to rights. And I'm aware of the problem that Myelin mentioned, that creating a shiny new format means ending up with N+1 competing formats where before there were only N. But I'm also uncomfortable that things feel somewhat messed-up at the moment.
jgharston wrote: No, 'cos ZIP tools for RISC OS already use the pre-existing 'Acorn' metadata field, and have done for almost 30 years, so you'd be breaking existing systems. Leaving the .inf files out of a ZIP file is involving one fewer standard, and the standard is already that Acorn metadata is stored in the Acorn metadata ZIP field.
Suppose I have some files on an Archimedes. I wish to archive them, transport that archive to a Windows/MacOS/Linux box and unpack that archive. I want those unpacked files to be usable natively, and from a remote/virtualised/emulated RISC OS box. Then I want to archive them up again, transport them back to an Archimedes, unpack them and get back something equivalent to what I started with.
So far as I can tell, nothing approaching that is possible. The Archimedes ZIP tools work with the Acorn extra-data format, but I'm not aware of any ZIP tool for any other platform that honours them. I see your ZipToInf tool, but the documentation is sparse, there's no indication how robust it might be in various corner cases, the Windows version isn't open-source, the BBC BASIC version is impenetrable, there's no Linux/MacOS/portable version and there's no corresponding InfToZip tool.
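For reference, the Acorn extra data isn't hard to pick apart, which makes the lack of support elsewhere all the more frustrating. Below is a minimal Python sketch. It assumes the SparkFS block layout given in Info-ZIP's extra-field notes:

- header ID 0x4341 ("AC");
- the signature "ARC0";
- little-endian 32-bit load address, exec address and attributes;
- a reserved zero word.

Please check that layout against the spec before relying on it. The function name is hypothetical, and the only validation is the signature check.

```python
import struct
from typing import Optional, Tuple

ACORN_HEADER_ID = 0x4341  # "AC"; layout assumed from Info-ZIP's extra-field notes


def acorn_extra(extra: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (load, exec, attr) from a ZIP entry's extra field, or None."""
    pos = 0
    # The extra field is a sequence of records: id (u16), size (u16), data.
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        # Require "ARC0" plus three words; tolerate a missing reserved word.
        if header_id == ACORN_HEADER_ID and len(data) >= 16 and data[:4] == b"ARC0":
            load, exec_addr, attr = struct.unpack_from("<III", data, 4)
            return load, exec_addr, attr
    return None
```

With Python's `zipfile`, each `ZipInfo.extra` from `infolist()` can be passed straight in. A portable ZipToInf or InfToZip would be little more than this plus formatting.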
For BBC Micros the situation is even worse. There's an UnZip tool, but no Zip tool.
Meanwhile, if you've managed to get a filesystem image from an 8-bit Acorn or Archimedes machine onto some other platform, I don't believe any of the tools for picking them apart support the Acorn extensions to ZIP.
the ZIP standard is very strong on saying Thou Shalt Put Thy Metadata In The Metadata Field And Not In Additional File Entries
For one specific OS, if generating files on that OS either for restoring on that OS or for specialist manipulation on other platforms, that's fair enough. But it's completely impractical to expect WinZip, say, to do something meaningful with Acorn extra data.
you use [...] various tools for extracting Unix user settings on non-Unix systems, etc.
A complicating factor is that ZIP is show-stoppingly deficient as a way of storing Unix files. Someone would have fixed it by now, except that everyone uses tar instead so they've never bothered.
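To illustrate how little room there is: the only Unix metadata the core ZIP headers can carry is the mode word. Info-ZIP convention puts it in the top 16 bits of the central directory's external attributes, with "version made by" set to Unix (3). Anything else, such as ownership, has to go in an extra field, and only if the tools at both ends agree on one. Here's a short Python `zipfile` sketch of the mode round trip. The helper names are hypothetical.

```python
import zipfile

UNIX_CREATOR = 3  # "version made by" host system value for Unix


def add_with_mode(zf: zipfile.ZipFile, name: str, data: bytes, mode: int) -> None:
    """Store data with Unix mode bits in the high 16 bits of external_attr."""
    info = zipfile.ZipInfo(name)
    info.create_system = UNIX_CREATOR
    info.external_attr = (mode & 0xFFFF) << 16
    zf.writestr(info, data)


def unix_mode(info: zipfile.ZipInfo) -> int:
    """Recover the mode word; 0 means the creator wasn't Unix."""
    if info.create_system != UNIX_CREATOR:
        return 0
    return (info.external_attr >> 16) & 0xFFFF
```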
The Acorn datestamp is stored in the pre-existing ZIP datestamp field.
Acorn datestamp is stored in which pre-existing ZIP datestamp field? There's the RISC OS 40-bit value in type+timestamp files, and there's sometimes the Econet datestamp (anyone know what happened when that wrapped in 1997? How about 2013?). What if one or the other is absent; what if they're inconsistent? On the ZIP side, there's the basic DOS mdate and mtime, but also the 0x5455 extended timestamp ctime/mtime/atime. Note that neither is of sufficient resolution to hold an Acorn centisecond-accurate timestamp. Also, Acorn timestamps are local time where POSIX ones are UTC; what mapping do you use? If given load+exec attributes for an Acorn file, how do you distinguish a type+timestamp from a legacy 8-bit host processor address when deciding whether or not to impart the Acorn timestamp to the ZIP file? What timestamp do you use in the ZIP file if there is no timestamp at all on the Acorn side, and how do you annotate that this is what has happened?
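Several of those questions come down to simple arithmetic. In the usual RISC OS convention:

- a load address whose top 12 bits are all set marks a typed, timestamped file;
- bits 8 to 19 of the load address hold the file type;
- the low byte of the load address, plus the whole exec address, form a 40-bit count of centiseconds since 00:00:00 on 1 January 1900.

Below is a minimal Python sketch. It treats the Acorn value as if it were UTC, which is exactly the assumption that needs settling. It then shows how much survives in the 1-second 0x5455 field and the 2-second DOS field. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Assumption: the Acorn clock is treated as UTC here (see the text).
ACORN_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_load_exec(load: int, exec_addr: int) -> Optional[Tuple[int, datetime]]:
    """Return (filetype, timestamp) for a typed file, else None.

    Caveat: a legacy 8-bit address such as 0xFFFF1900 also has its top
    12 bits set, so this test cannot tell it apart from a typed file.
    """
    if (load & 0xFFF00000) != 0xFFF00000:
        return None  # untyped: load/exec are plain addresses, no timestamp
    filetype = (load >> 8) & 0xFFF
    centiseconds = ((load & 0xFF) << 32) | (exec_addr & 0xFFFFFFFF)
    return filetype, ACORN_EPOCH + timedelta(milliseconds=10 * centiseconds)


def zip_representations(ts: datetime) -> Tuple[int, Optional[Tuple[int, ...]]]:
    """What survives in the 0x5455 field (whole seconds) and the DOS fields."""
    unix_seconds = (ts - UNIX_EPOCH) // timedelta(seconds=1)  # floors; drops centiseconds
    if not 1980 <= ts.year <= 2107:
        return unix_seconds, None  # outside the DOS date range entirely
    # DOS time has 2-second resolution, so odd seconds are rounded down.
    dos = (ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second - ts.second % 2)
    return unix_seconds, dos
```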
And then there's the question of what to do with timestamps when unpacking the ZIP file. On various platforms...