Hi all,
Does anyone have an algorithm for working out whether a given block of data is likely to be picture data?
I won't be writing this on a BBC or RISC OS machine (it will run on Windows, actually), but I'm looking for BBC screen/sprite data and want to tell it apart from code or other data. The data won't necessarily be, say, 20K long for a MODE 2 screen, as it could be only part of the screen. Sounds easy... I just can't get my head around how to do it programmatically.
Cheers,
Gerald.
Detecting picture data
- geraldholdsworth
- Posts: 437
- Joined: Tue Nov 04, 2014 9:42 pm
- Location: Inverness, Scotland
Re: Detecting picture data
You could look for spans of multiple bytes with the same value. 6502 code isn't likely to contain a lot of those. Possibly narrow this down to bytes that represent runs of the same pixel colour.
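As a rough sketch of the run idea (Python here purely for illustration; the names `run_fraction`, `min_run` and `is_solid_mode2` are made up, and any threshold would need tuning on real files), you could measure what fraction of a block sits inside runs of identical bytes. The optional filter assumes the MODE 2 layout as I understand it, with the two pixels of a byte interleaved on the odd and even bits, so a "solid" byte has both pixels the same colour.

```python
def is_solid_mode2(b: int) -> bool:
    """True if both pixels in a MODE 2 byte share a colour.

    Assumption: left pixel uses bits 7,5,3,1 and right pixel bits 6,4,2,0.
    """
    return ((b & 0xAA) >> 1) == (b & 0x55)


def run_fraction(data: bytes, min_run: int = 4, solid_only: bool = False) -> float:
    """Fraction of bytes lying inside runs of at least min_run identical bytes.

    With solid_only=True, only runs of "solid colour" MODE 2 bytes count.
    """
    if not data:
        return 0.0
    covered = 0
    run = 1
    for i in range(1, len(data) + 1):
        if i < len(data) and data[i] == data[i - 1]:
            run += 1
            continue
        # A run has just ended at index i - 1; decide whether it counts.
        value = data[i - 1]
        if run >= min_run and (not solid_only or is_solid_mode2(value)):
            covered += run
        run = 1
    return covered / len(data)
```

A block scoring well above what you see for known 6502 code would then be a picture candidate; where to draw that line is something to find by trying it on known samples.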
- Rich Talbot-Watkins
- Posts: 1412
- Joined: Thu Jan 13, 2005 5:20 pm
- Location: Palma, Mallorca
Re: Detecting picture data
A variant of that might be to count the percentage of pairs of adjacent bytes which differ in 3 or fewer bit positions (or some other threshold arrived at by experimentation). I'd expect image data to have a far higher percentage of such pairs than machine code, although other types of data might also come out as false positives.
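A minimal sketch of that count, again in Python for illustration only (`close_pair_fraction` and `max_bits` are hypothetical names). It XORs each byte with its neighbour and counts the set bits, i.e. the Hamming distance between the two.

```python
def close_pair_fraction(data: bytes, max_bits: int = 3) -> float:
    """Fraction of adjacent byte pairs that differ in at most max_bits bit positions."""
    if len(data) < 2:
        return 0.0
    close = sum(
        1
        for a, b in zip(data, data[1:])
        if bin(a ^ b).count("1") <= max_bits  # popcount of the XOR
    )
    return close / (len(data) - 1)
```

As a baseline, uniformly random bytes would pass about 36% of the time, since 93 of the 256 possible XOR patterns have three or fewer bits set. Anything useful as a picture threshold would need to sit comfortably above that, and above whatever known code scores.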
Re: Detecting picture data
What I can imagine the statistics people doing is to analyse 6502 instructions and build up some kind of probability model (a Markov chain, maybe). Then, running through a sequence of bytes, things that don't match the predicted successor would probably suggest data rather than instructions.
This is all hand waving, of course, and you'd need to be careful with the operands for instructions, so perhaps any probable instruction would cause following bytes to be considered operands (according to that instruction's requirements), and the next item in the sequence would be obtained from the next instruction location rather than the next byte.
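To make the hand waving slightly more concrete, here is a sketch of a first-order Markov model in Python. It is only an illustration under assumptions: the function names are invented, the smoothing is plain add-one, and `insn_len` is a hypothetical callback that returns an instruction's length from its opcode so that operands are skipped. By default it treats every byte as a one-byte instruction, which is cruder than what is described above; a real version would need a full 6502 opcode length table.

```python
import math
from collections import Counter, defaultdict


def train_transitions(samples, insn_len=lambda opcode: 1):
    """Count opcode -> next-opcode transitions over samples of known code."""
    counts = defaultdict(Counter)
    for data in samples:
        i = 0
        while i < len(data):
            nxt = i + max(1, insn_len(data[i]))  # step over any operand bytes
            if nxt < len(data):
                counts[data[i]][data[nxt]] += 1
            i = nxt
    return counts


def mean_log_likelihood(data, counts, insn_len=lambda opcode: 1):
    """Average log-probability per transition under the trained model.

    Uses add-one smoothing over the 256 possible successor values.
    Lower (more negative) scores mean the block looks less like the training code.
    """
    total, n, i = 0.0, 0, 0
    while i < len(data):
        nxt = i + max(1, insn_len(data[i]))
        if nxt < len(data):
            row = counts.get(data[i], Counter())
            p = (row[data[nxt]] + 1) / (sum(row.values()) + 256)
            total += math.log(p)
            n += 1
        i = nxt
    return total / n if n else float("-inf")
```

A tiny partial length table shows the operand-skipping idea: `lengths = {0xA9: 2, 0x8D: 3, 0x60: 1}` covers LDA immediate, STA absolute and RTS, and could be passed as `insn_len=lambda op: lengths.get(op, 1)`. Blocks that score much lower than known code are then candidates for data, though this alone says nothing about whether that data is a picture.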
- geraldholdsworth
- Posts: 437
- Joined: Tue Nov 04, 2014 9:42 pm
- Location: Inverness, Scotland
Re: Detecting picture data
Thank you - there is certainly some deep thought required to achieve this. Might take me some time, but some excellent pointers here.
Re: Detecting picture data
Somewhat related to this is part-of-speech tagging which is used in natural language processing to classify each word in a natural language text. I'm not claiming that such taggers are applicable here, but you can get a feel for the kind of thing I was suggesting by reading up a bit on that topic.