Scanning

Coeus
Posts: 1314
Joined: Mon Jul 25, 2016 11:05 am

Scanning

Post by Coeus » Mon Dec 10, 2018 7:14 pm

I am sure I have read comments on here about people struggling to put up with the boredom of scanning large manuals etc.

I had a go at the Epson FX manual yesterday and did pretty much the same as with the BCPL manual, such that you could, perhaps, call it a process or formula. Firstly, I choose a time when I haven't the mental energy to tackle something taxing. Then I put on some music I enjoy, and after that it is just a case of making sure each page is straight on the scanner bed and clicking the "Scan" button on the PC. Essentially it is spending an hour listening to music, during which I happen to scan a manual, rather than ever expecting the scanning to hold my interest.

Of course, it helps to have enough automation that you are not having to do things like save each page to an individual file. I'm on Linux, so for simple jobs I use simple-scan, which scans multiple pages straight to PDF. You can also crop the first page and it will crop all subsequent pages the same way until you change it, so it is literally one button press per page and, as I have a trackball, the mouse pointer does not even move off the button. For more control over the process there is xsane, which can scan to multiple, sequentially numbered files which can then be put into a single PDF with the convert program from ImageMagick. I am sure similar features must be available in Windows scanning s/w.
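
A rough sketch of that last step, for anyone who wants to script it. The file naming (page-1.png, page-2.png, ...) and the function names are made up for illustration, and it assumes ImageMagick's convert is installed and permitted to write PDF. The numeric sort is there because a plain glob would put page-10 before page-2.

```shell
#!/bin/sh
# Sketch only: the page-N.png naming and the function names are assumptions.

# Print the page file names in a directory in numeric order,
# so that page-2.png comes before page-10.png.
list_pages() {
    for f in "$1"/page-*.png; do
        [ -e "$f" ] || continue        # skip the unexpanded pattern if nothing matches
        printf '%s\n' "${f##*/}"       # strip the directory part
    done | sort -t '-' -k 2,2n         # sort on the number after "page-"
}

# Combine the pages into one PDF with ImageMagick's convert.
# -compress needs a type; jpeg is one choice, and it is lossy.
combine_pages() {
    dir=$1
    out=$2
    set --                             # reuse the positional parameters as a file list
    for name in $(list_pages "$dir"); do
        set -- "$@" "$dir/$name"
    done
    [ "$#" -gt 0 ] || { echo "no pages found in $dir" >&2; return 1; }
    convert "$@" -compress jpeg "$out"
}
```

Something like `combine_pages ./scans manual.pdf` would then build the PDF from whatever numbered pages are in ./scans.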
Last edited by Coeus on Mon Dec 10, 2018 7:14 pm, edited 1 time in total.

SimonSideburns
Posts: 378
Joined: Mon Aug 26, 2013 8:09 pm
Location: Purbrook, Hampshire

Re: Scanning

Post by SimonSideburns » Mon Dec 10, 2018 9:57 pm

That's an interesting approach to the task.

In my case, I'm on Windows 10 with a Brother A3 multi-function printer/scanner/fax/copier thingy that I was given for free by someone who was having all sorts of nightmarish issues with it. Turns out after much delicate ink-soaked paper remnant removal from deep within the mechanisms, it works a treat, as a printer at least.

Now this thing had a huge software install to my PC with an assortment of drivers, apps and who knows what else, but after reading the manual it would appear that it can be fed an entire stack of sheets of paper and will scan them to the PC or wherever you wish them to go, started either from the printer itself or from a client.

Now, unless I'm doing something wrong (quite possible), I fed in a magazine from 1984 that had had its spine very carefully removed, told the software to get started, and then sat back expectantly, but after the first side was scanned the machine seemed to sit for a while and then threw up an error on the laptop.

I've not tried for a while, but as you say, something that shouldn't be that difficult to do would be an even simpler task if it had worked fine.

Must try again one day.
Last edited by SimonSideburns on Mon Dec 10, 2018 9:58 pm, edited 1 time in total.
I'm writing a game where you can change your character from a Wizard to a monkey to a cat.

Well, Imogen that!

flaxcottage
Posts: 3674
Joined: Thu Dec 13, 2012 8:46 pm
Location: Derbyshire

Re: Scanning

Post by flaxcottage » Mon Dec 10, 2018 10:36 pm

In addition to the music I find a good Malt makes the scanning fly by. Mind you the last few pages need a bit of editing to get them square. :lol:
- John


Coeus
Posts: 1314
Joined: Mon Jul 25, 2016 11:05 am

Re: Scanning

Post by Coeus » Mon Dec 10, 2018 11:28 pm

SimonSideburns wrote:
Mon Dec 10, 2018 9:57 pm
Now, unless I'm doing something wrong (quite possible), I fed in a magazine from 1984 that had had its spine very carefully removed, told the software to get started, and then sat back expectantly, but after the first side was scanned the machine seemed to sit for a while and then threw up an error on the laptop.
I have had an automatic document feeder work as intended in the past, in that it feeds one page after another, but I do find that they are generally poor at getting the pages square.

A lot depends on what you're prepared to put up with. If you're going to work on each page afterwards anyway, then straightening it in software is not so bad. For the printer manual I was trying to get it straight enough in the first place that I would not need to do any editing.

Then, of course, you can't feed a bound book from the document feeder.

Coeus
Posts: 1314
Joined: Mon Jul 25, 2016 11:05 am

Re: Scanning

Post by Coeus » Mon Dec 10, 2018 11:29 pm

flaxcottage wrote:
Mon Dec 10, 2018 10:36 pm
In addition to the music I find a good Malt makes the scanning fly by. Mind you the last few pages need a bit of editing to get them square. :lol:
I can see that would help.

scruss
Posts: 144
Joined: Sun Jul 01, 2018 3:12 pm
Location: Toronto

Re: Scanning

Post by scruss » Tue Dec 11, 2018 3:25 am

Coeus wrote:
Mon Dec 10, 2018 7:14 pm
I had a go at the Epson FX manual yesterday …
Step 0: make sure no-one else has done it. They may already be online.
Of course, it helps to have enough automation that you are not having to do things like save each page to an individual file. I'm on Linux so for simple jobs I use simple-scan which scans multiple pages straight to PDF.
Scan Tailor does a whole bunch of hard tasks for you: page splitting, margin correction, straightening, de-speckling and compilation as PDF.
which can then be put into a single PDF with the convert program from Image Magick.
convert does stuff to images I don't necessarily agree with. For one, it tends to make PDFs that are bigger than they need to be. I find img2pdf gives more useful results, but then I'm maybe needlessly picky about these things. If you're uploading manuals to archive.org, they'll reformat the file and add useful metadata.

danielj
Posts: 7399
Joined: Thu Oct 02, 2008 4:51 pm
Location: Manchester

Re: Scanning

Post by danielj » Tue Dec 11, 2018 8:03 am

I've got a plan to build a non-destructive scanner (Basically a mounted digital camera, and good lighting), but as ever time is the thing... In theory archive.org have a network of these things around, but I can't locate any near me. There's certainly a service in London?

https://archive.org/scanning

d.

Coeus
Posts: 1314
Joined: Mon Jul 25, 2016 11:05 am

Re: Scanning

Post by Coeus » Tue Dec 11, 2018 3:15 pm

scruss wrote:
Tue Dec 11, 2018 3:25 am
Scan Tailor does a whole bunch of hard tasks for you: page splitting, margin correction, straightening, de-speckling and compilation as PDF.
Browsing the user guide, that looks very useful. I will try it out.
scruss wrote:
Tue Dec 11, 2018 3:25 am
convert does stuff to images I don't necessarily agree with. For one, it tends to make PDFs that are bigger than they need to be...
Could you explain more. I have noticed the output PDF is large if you don't specifically tell it to compress the images but the -compress option fixes that.
Last edited by Coeus on Tue Dec 11, 2018 3:16 pm, edited 1 time in total.

scruss
Posts: 144
Joined: Sun Jul 01, 2018 3:12 pm
Location: Toronto

Re: Scanning

Post by scruss » Wed Dec 12, 2018 4:30 pm

Coeus wrote:
Tue Dec 11, 2018 3:15 pm
Could you explain more. I have noticed the output PDF is large if you don't specifically tell it to compress the images but the -compress option fixes that.
Yeah, sorry for that cryptic answer.

ImageMagick — when it deigns to allow you to create PDF — tends to store all images fed to it in a lowest-common-denominator format. While that ensures you get a PDF that everyone can read, it can often result in huge files. When you use -compress, you need to specify a compression format. This is fine if they're all the same type of file, such as B&W TIFFs. If they're a mix of file types, though, they all get converted through ImageMagick's black-box/not always lossless conversion scheme. It's particularly bad at dealing with JPEG files. Since PDF can embed and use JPEG streams, you'd expect ImageMagick to just copy them as they are. But no, it makes tiny lossy changes to the files.

img2pdf stores files losslessly in the PDF stream. It even retains metadata in JPEG files, so you can use it to make a viewable archive of photos that's just as lossless as a ZIP file:

Code: Select all

img2pdf -o out.pdf file1.jpg file2.jpg file3.jpg
pdfimages -j out.pdf out
(creates out-001.jpg, out-002.jpg, out-003.jpg which are bitwise identical to file1.jpg, file2.jpg and file3.jpg)
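
If you want to check the round trip rather than take it on trust, a sketch along these lines should do it. The function names are invented for illustration, and it assumes the inputs are JPEGs with one image per page, that img2pdf and pdfimages (from poppler or xpdf) are installed, and that pdfimages writes its files as out-NNN.jpg in page order.

```shell
#!/bin/sh
# Sketch only: function names are made up; assumes JPEG inputs, one per page.

# Succeed only if the two files are byte-for-byte identical.
same_bytes() {
    cmp -s "$1" "$2"
}

# Pack the given JPEGs into a PDF, extract them again, and compare
# each extracted file with its original, in order.
verify_roundtrip() {
    work=$(mktemp -d) || return 1
    img2pdf -o "$work/out.pdf" "$@" || return 1
    pdfimages -j "$work/out.pdf" "$work/out" || return 1
    for extracted in "$work"/out-*.jpg; do
        [ "$#" -gt 0 ] || break        # more extracted files than inputs: stop comparing
        same_bytes "$1" "$extracted" || { echo "differs: $1" >&2; return 1; }
        shift
    done
    # Any inputs left over had no extracted counterpart.
    [ "$#" -eq 0 ] || { echo "missing extracted image for: $1" >&2; return 1; }
    rm -rf "$work"
}
```

Calling `verify_roundtrip file1.jpg file2.jpg file3.jpg` returns success only if every extracted image matches its source exactly.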

Should you care about this? It's probably healthier to be a little less obsessive about it than I am, frankly. Sometimes data and file formats can surprise you. For at least the last 20 years, I thought the smallest generally useful (so: not JBIG) mono image format would always be a G4 TIFF. But under certain circumstances you can make B&W PNG images smaller than the equivalent G4 TIFFs.

tingo
Posts: 4
Joined: Fri Jul 06, 2018 12:56 pm

Re: Scanning

Post by tingo » Thu Dec 13, 2018 7:59 pm

FWIW, I like gscan2pdf (but I'm not using Windows, I moved on many years ago).

Also, it is nice to add a post-processing step (if your scanning software doesn't do it for you): please OCR the PDF.
As a friend explained to me: it doesn't matter that the OCR only gets between 90 and 97% of the text correct; each word it does get correct will end up searchable when you put that PDF online.
Last edited by tingo on Thu Dec 13, 2018 8:00 pm, edited 1 time in total.
--
Torfinn

scruss
Posts: 144
Joined: Sun Jul 01, 2018 3:12 pm
Location: Toronto

Re: Scanning

Post by scruss » Sat Dec 15, 2018 4:34 am

If you upload to archive.org (and you really should drop copies there for safe-keeping) they automatically carry out OCR. Failing that, tesseract can join images into a multi-page PDF as it adds OCR data.
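
A minimal sketch of the tesseract route, assuming tesseract is installed with its PDF output support. It relies on tesseract accepting a text file that lists one image per line in place of a single image name. The function and file names here are made up for illustration.

```shell
#!/bin/sh
# Sketch only: function and file names are made up for illustration.

# Write the given image paths to a list file, one per line.
write_page_list() {
    list=$1
    shift
    printf '%s\n' "$@" > "$list"
}

# Run tesseract over every image named in the list file and write
# a single searchable PDF. tesseract appends ".pdf" to the output base.
ocr_to_pdf() {
    tesseract "$1" "$2" pdf
}
```

For example, `write_page_list pages.txt page-1.png page-2.png` followed by `ocr_to_pdf pages.txt manual` should leave the result in manual.pdf.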

BigEd
Posts: 2567
Joined: Sun Jan 24, 2010 10:24 am
Location: West

Re: Scanning

Post by BigEd » Sat Dec 15, 2018 9:21 am

danielj wrote:
Tue Dec 11, 2018 8:03 am
I've got a plan to build a non-destructive scanner (Basically a mounted digital camera, and good lighting), but as ever time is the thing... In theory archive.org have a network of these things around, but I can't locate any near me. There's certainly a service in London?

https://archive.org/scanning

d.
Just came across http://diybookscanner.org/ (mentioned in this talk) which is all about machines you can make or buy. (Not quite the answer you want, which is access to a machine near you.)

My present flow is to prop the book open, not fully open but a bit more than a right angle, flatten the left page with a sheet of glass from a photo frame, snap with my phone, then flip the glass and snap the other page. Sometimes I get a bit of reflection from the glass, and sometimes the autofocus doesn't quite snap in, but it's relatively quick - I can do 5 pages a minute, or more, for maybe an hour and then take a rest. And then all the photos need to be squared and rotated. But I don't have to wrestle a book onto a flatbed.

Coeus
Posts: 1314
Joined: Mon Jul 25, 2016 11:05 am

Re: Scanning

Post by Coeus » Sat Dec 15, 2018 11:02 pm

scruss wrote:
Wed Dec 12, 2018 4:30 pm
...It's particularly bad at dealing with JPEG files. Since PDF can embed and use JPEG streams, you'd expect ImageMagick to just copy them as they are. But no, it makes tiny lossy changes to the files...
But isn't this fairly standard in general purpose image processing tools, i.e. there is effectively a pipeline:

"read and uncompress source" | "apply processing effect" | "encode and write output"

or perhaps, more fully:

read | demultiplex | uncompress | process | compress | multiplex | write.

but it becomes sub-optimal when the processing effect is a no-op and you are just using the ability to insert into or extract from archive (multi-image) formats. The same is true of audio - I bet if you load an MP3 into Audacity, then save it again to MP3 format, the new MP3 would not be the same as the one you started with.

So, perhaps, there are two things that come from this:

1. Beware of using a sledgehammer to crack a nut. A smaller, more focused tool may do a better job.
2. When processing it is best to avoid a lossy format for all but the final destination file.
Last edited by Coeus on Sat Dec 15, 2018 11:09 pm, edited 2 times in total.

Coeus
Posts: 1314
Joined: Mon Jul 25, 2016 11:05 am

Re: Scanning

Post by Coeus » Sat Dec 15, 2018 11:25 pm

tingo wrote:
Thu Dec 13, 2018 7:59 pm
FWIW, I like gscan2pdf (but I'm not using Windows, I moved on many years ago).
That's another interesting tool. I have had a quick play but not really had anything to try it on for real. I think much will depend on the unpaper component it uses.
Last edited by Coeus on Sun Dec 16, 2018 12:09 am, edited 1 time in total.

Coeus
Posts: 1314
Joined: Mon Jul 25, 2016 11:05 am

Re: Scanning

Post by Coeus » Sun Dec 16, 2018 12:18 am

BigEd wrote:
Sat Dec 15, 2018 9:21 am
Just came across http://diybookscanner.org/ (mentioned in this talk) which is all about machines you can make or buy. (Not quite the answer you want, which is access to a machine near you.)
That was certainly interesting to read. In one of the pictures, seeing someone wearing a mask suggests a professional archivist and maybe books that are old and where there is a distinct possibility that the book being scanned is the only copy in existence. Fortunately I don't think anything I have falls into that category.
BigEd wrote:
Sat Dec 15, 2018 9:21 am
My present flow is to prop the book open, not fully open but a bit more than a right angle, flatten the left page with a sheet of glass from a photo frame, snap with my phone, then flip the glass and snap the other page. Sometimes I get a bit of reflection from the glass, and sometimes the autofocus doesn't quite snap in, but it's relatively quick - I can do 5 pages a minute, or more, for maybe an hour and then take a rest. And then all the photos need to be squared and rotated. But I don't have to wrestle a book onto a flatbed.
What do you use for the light source? I did read, with interest, a post on another forum that dealt with photographing a painting without moving it. The standard approach seemed to be to pop a camera on a tripod exactly parallel to the painting and exactly centred, and then light the painting with a couple of softboxes pointed in at 45 degree angles.

The book scanner also had the light, though only one in this case, at an angle to the page or perhaps more particularly to the platen. With bigger cameras, depth of field can be an issue, i.e. if the camera is not completely parallel to the page it is impossible for the whole page to be in focus at the same time. That could be an advantage of a mobile phone as they tend to have good depth of field.
Last edited by Coeus on Sun Dec 16, 2018 12:18 am, edited 1 time in total.

BigEd
Posts: 2567
Joined: Sun Jan 24, 2010 10:24 am
Location: West

Re: Scanning

Post by BigEd » Sun Dec 16, 2018 7:31 am

I'm using natural light - I'm set up near a big north-facing window. It's not ideal, and it's not white light. I'm sure a pair of proper lights at 45 degrees to the page in question would be better, as indeed it would be better to use a tripod and a proper camera. (But I'm doing what I can, on the grounds that it's better than not doing it at all. If I did it sufficiently poorly it would be a waste of time, but I don't think it's quite as bad as that...)

Before I used the glass, I got wobbly results like this:
viewtopic.php?f=2&t=14919&p=206867#p206867

With the glass, I get results like this:
https://photos.app.goo.gl/w2QREnkPFsqLXwxT6

BeebMaster
Posts: 2783
Joined: Sun Aug 02, 2009 4:59 pm
Location: Lost in the BeebVault!

Re: Scanning

Post by BeebMaster » Tue Dec 18, 2018 8:55 pm

Doing scanning as a "distraction" whilst engaged in some more pleasurable activity is indeed a great way of thinking of it. That's how I do my ironing nowadays, I don't think of it as ironing any more, it's a session in front of the radio listening to some of my favourites (Round the Horne, Hancock, Dr. Who etc) where it just happens that some ironing also occurs.