## Detecting picture data

geraldholdsworth
Posts: 474
Joined: Tue Nov 04, 2014 9:42 pm
Location: Inverness, Scotland

### Detecting picture data

Hi all,

Does anyone have an algorithm for working out whether a given block of data is likely to be picture data?

I'm not planning to write this on a BBC or under RISC OS (it will run on Windows, actually), but I am looking for BBC screen/sprite data and need to tell it apart from code or other data. The data won't necessarily be a full screen (say, 20K for a MODE 2 screen), as it could be only part of one. Sounds easy... I just can't get my head around how to do it programmatically.

Cheers,

Gerald.
Gerald Holdsworth
Repton Resource Page
www.reptonresourcepage.co.uk

davidb
Posts: 2384
Joined: Sun Nov 11, 2007 10:11 pm

### Re: Detecting picture data

You could look for spans of multiple bytes with the same value. 6502 code isn't likely to contain a lot of those. Possibly narrow this down to bytes that represent runs of the same pixel colour.
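
As a rough illustration of the run-based idea, here is a minimal Python sketch. The function names, the `min_run` default of 4 and any cut-off you apply to the result are assumptions made for illustration, not values from this thread. The second helper assumes the MODE 2 layout of two pixels per byte with interleaved bits (left pixel in bits 7, 5, 3, 1; right pixel in bits 6, 4, 2, 0), so treat it as a starting point to check against real data.

```python
def run_fraction(data: bytes, min_run: int = 4) -> float:
    """Fraction of bytes that sit inside a run of >= min_run identical bytes."""
    if not data:
        return 0.0
    in_runs = 0
    run_start = 0
    for i in range(1, len(data) + 1):
        # A run ends at the end of the data or when the byte value changes.
        if i == len(data) or data[i] != data[run_start]:
            run_len = i - run_start
            if run_len >= min_run:
                in_runs += run_len
            run_start = i
    return in_runs / len(data)


def same_colour_pair_mode2(b: int) -> bool:
    """True if a MODE 2 byte holds two pixels of the same colour.

    Assumes the interleaved layout: left pixel in bits 7,5,3,1 and
    right pixel in bits 6,4,2,0.
    """
    return ((b & 0xAA) >> 1) == (b & 0x55)
```

A block whose `run_fraction` is high, especially where the repeated bytes also pass `same_colour_pair_mode2`, would be a candidate for picture data; where to draw the line would need experimenting with known screens and known code.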

Rich Talbot-Watkins
Posts: 1446
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

### Re: Detecting picture data

A variant of that might be to count the percentage of pairs of adjacent bytes which differ by 3 bits or fewer (or some other threshold arrived at by experimentation). I'd expect image data to have a far higher percentage of such pairs than machine code, although other types of data might also come out as false positives.
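
A minimal Python sketch of that measure might look like the following. The function name and the `max_bits` parameter are made up for illustration, and the default of 3 is just the starting threshold mentioned above.

```python
def close_pair_fraction(data: bytes, max_bits: int = 3) -> float:
    """Fraction of adjacent byte pairs that differ in at most max_bits bits."""
    if len(data) < 2:
        return 0.0
    close = 0
    for a, b in zip(data, data[1:]):
        # XOR leaves a 1 in every bit position where the two bytes differ.
        if bin(a ^ b).count("1") <= max_bits:
            close += 1
    return close / (len(data) - 1)
```

Running this over blocks already known to be screens, and over blocks known to be code, should show whether the two populations separate well enough for a simple cut-off to work.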

paulb
Posts: 811
Joined: Mon Jan 20, 2014 9:02 pm

### Re: Detecting picture data

What I can imagine the statistics people doing is to analyse 6502 instructions and build up some kind of probability model (a Markov chain, maybe). Then, running through a sequence of bytes, things that don't match the predicted successor would probably suggest data rather than instructions.

This is all hand waving, of course, and you'd need to be careful with the operands for instructions, so perhaps any probable instruction would cause following bytes to be considered operands (according to that instruction's requirements), and the next item in the sequence would be obtained from the next instruction location rather than the next byte.
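
To make the hand waving slightly more concrete, here is a small Python sketch of a first-order model over opcodes that skips operands. Everything in it is an assumption for illustration: the function names are invented, the opcode length table is deliberately partial (a real one would cover the whole 6502 instruction set), unknown opcodes are treated as one byte long, and the add-one smoothing is just the simplest choice available.

```python
import math
from collections import Counter, defaultdict

# Deliberately partial table: 6502 opcode -> instruction length in bytes.
OPCODE_LENGTH = {
    0xA9: 2, 0xA2: 2, 0xA0: 2, 0xC9: 2, 0x69: 2,  # LDA/LDX/LDY/CMP/ADC #imm
    0xA5: 2, 0x85: 2,                              # LDA/STA zero page
    0xAD: 3, 0x8D: 3, 0x4C: 3, 0x20: 3,            # LDA/STA/JMP absolute, JSR
    0xD0: 2, 0xF0: 2,                              # BNE, BEQ
    0x60: 1, 0xE8: 1, 0xCA: 1, 0xC8: 1, 0x88: 1,   # RTS, INX, DEX, INY, DEY
    0x18: 1, 0x38: 1, 0x48: 1, 0x68: 1, 0xEA: 1,   # CLC, SEC, PHA, PLA, NOP
}


def opcode_stream(data: bytes) -> list[int]:
    """Walk the bytes as if they were code, keeping opcodes and skipping operands."""
    ops = []
    i = 0
    while i < len(data):
        op = data[i]
        ops.append(op)
        i += OPCODE_LENGTH.get(op, 1)  # assumption: unknown opcode is 1 byte
    return ops


def train(code_samples):
    """Count opcode -> next-opcode transitions in blocks known to be code."""
    counts = defaultdict(Counter)
    for sample in code_samples:
        ops = opcode_stream(sample)
        for prev, nxt in zip(ops, ops[1:]):
            counts[prev][nxt] += 1
    return counts


def avg_log_prob(counts, data: bytes) -> float:
    """Average log-probability per transition; lower means less code-like."""
    ops = opcode_stream(data)
    if len(ops) < 2:
        return 0.0
    total = 0.0
    for prev, nxt in zip(ops, ops[1:]):
        row = counts.get(prev, Counter())
        # Add-one smoothing over the 256 possible opcode values.
        total += math.log((row[nxt] + 1) / (sum(row.values()) + 256))
    return total / (len(ops) - 1)
```

The model would be trained on blocks already known to be 6502 code; a candidate block whose `avg_log_prob` falls well below that of typical code is then more likely to be data. Note that this only says "not code", so it would need combining with one of the pixel-based measures to say "picture".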

geraldholdsworth
Posts: 474
Joined: Tue Nov 04, 2014 9:42 pm
Location: Inverness, Scotland

### Re: Detecting picture data

Thank you - there is certainly some deep thought required to achieve this. Might take me some time, but some excellent pointers here.

paulb
Posts: 811
Joined: Mon Jan 20, 2014 9:02 pm

### Re: Detecting picture data

Somewhat related to this is part-of-speech tagging, which is used in natural language processing to classify each word in a natural language text. I'm not claiming that such taggers are applicable here, but you can get a feel for the kind of thing I was suggesting by reading up a bit on that topic.