We are going to see how to digitize books at home with a camera. It is fast and easy. You only need a digital camera or a smartphone.
Digitizing a book has two parts. The first is capturing images of the book's pages, and the second is processing them with OCR (Optical Character Recognition) software.
Traditionally, books were scanned page by page. This is a very slow process, complicated by the spine of the book, which curls the pages so that the OCR does not recognize the words well. So much so that many people unbound their books to make scanning easier.
So instead of scanning the pages, we will take pictures. I work with a 10-megapixel compact camera, and the last few times with a smartphone.
As you can see, it is a very homemade and cheap system, but even so, in less than an hour and without hurrying, I have had a 120-page book in digital format (without layout).
And don’t think this is only useful for pirating books. You can also use it to digitize your class notes and study with an e-reader, iPad, or laptop.
Steps to digitize a book
You will need:
- A camera or a smartphone
- A tripod
- A sheet of glass
The first thing to do is build a platform or lectern to hold the book, and we will make it out of cardboard. It is very simple.
From the back you can see all the lectern parts fixed together with tape.
The detail of the book spine is important. Depending on the thickness of the book, we should adapt the lectern so that the spine fits and the book rests on it without problems.
If you want everything clearer, here are the measurements of the one I built. They are in cm, and x2, x4 indicate the number of pieces you need of each.
Book Digitizer Assembly
We will use the glass to flatten the page to be photographed. We must be careful with reflections on the glass, so it is best to work with natural light coming from the side.
Position the camera so that it captures the entire page, framed as tightly as possible (using the zoom) and as centered as possible.
With the glass we flatten the page we are going to photograph. With one hand we hold back the opposite page so that it does not appear in the frame and the camera captures all the text.
This photograph is badly taken because it does not capture the full text.
All the margins must be visible and no words may be cut off, so it is very important to position the camera properly.
How I take the images
Several methods and programs can help here. For me the fastest and most comfortable way is to photograph all the odd pages first, and then all the even ones.
We then rename the files with their page numbers so that the two sets can be merged in order. Many free programs can do this.
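If you prefer a script to a renaming program, here is a minimal Python sketch of the idea. It is not the tool I used. It assumes the odd-page photos are in one folder and the even-page photos in another, that sorting the camera's file names gives the order in which they were shot, and that the first odd page is page 1. The folder names and the `page_0001.jpg` naming scheme are made up for the example.

```python
import shutil
from pathlib import Path


def page_names(odd_files, even_files, first_odd_page=1):
    """Pair each photo with a page-numbered file name.

    odd_files and even_files must already be in shooting order.
    Returns (source, new_name) tuples sorted by page number.
    """
    plan = []
    for i, src in enumerate(odd_files):
        page = first_odd_page + 2 * i          # 1, 3, 5, ...
        plan.append((src, f"page_{page:04d}{Path(src).suffix.lower()}"))
    for i, src in enumerate(even_files):
        page = first_odd_page + 1 + 2 * i      # 2, 4, 6, ...
        plan.append((src, f"page_{page:04d}{Path(src).suffix.lower()}"))
    # Zero-padded names sort in page order.
    return sorted(plan, key=lambda item: item[1])


def copy_pages(odd_dir, even_dir, out_dir):
    """Copy the photos into out_dir under their page-numbered names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    odd = sorted(p for p in Path(odd_dir).iterdir() if p.is_file())
    even = sorted(p for p in Path(even_dir).iterdir() if p.is_file())
    for src, name in page_names(odd, even):
        # Copy instead of move, so the original photos stay untouched.
        shutil.copy2(src, out_dir / name)


if __name__ == "__main__":
    # Hypothetical folder names; change them to match your own.
    copy_pages("odd", "even", "book")
```

Copying rather than renaming in place means you can run it again if a photo turns out to be missing or repeated.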
How to rotate the images with GIMP
We will use GIMP, the free image editor, and a plugin called BIMP that edits images in batch. Here is a video of how it is done.
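For those who would rather script the rotation, this is a rough Python alternative to GIMP and BIMP, not what the video shows. It assumes the third-party Pillow library is installed (`pip install Pillow`) and that every photo needs the same rotation. The folder names and the angle are only examples.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(folder):
    """Return the image files in folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )


def rotate_folder(src_dir, dst_dir, degrees_ccw):
    """Rotate every image in src_dir and save the result in dst_dir."""
    # Pillow is imported here so the rest of the file works without it.
    from PIL import Image

    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for path in find_images(src_dir):
        with Image.open(path) as img:
            # rotate() turns counter-clockwise; expand=True keeps the
            # whole page instead of cropping it to the old canvas size.
            img.rotate(degrees_ccw, expand=True).save(dst_dir / path.name)


if __name__ == "__main__":
    # Hypothetical values: turn every photo 90 degrees counter-clockwise.
    rotate_folder("book", "book_rotated", 90)
```

The rotated copies go to a separate folder, so the original photos are kept in case the angle was wrong.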
What OCR is
We are at the last step: passing the images through the OCR. OCR (Optical Character Recognition) software recognizes the text in an image and converts it into editable text that you can save as a document in .doc, .odt, or other formats.
The best one I know is ABBYY FineReader, a real wonder, but it is paid software.
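If a paid program is not an option, one free possibility is the open-source Tesseract engine. The sketch below is only an illustration and I make no claim about how its results compare. It assumes the `tesseract` command-line program is installed and on your PATH, that the pages are JPEG files named like `page_0001.jpg`, and that the book is in English. The folder names are made up.

```python
import subprocess
from pathlib import Path


def tesseract_command(image, out_base, lang="eng"):
    """Build the command line for one page.

    Tesseract writes its result to out_base + ".txt".
    """
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_folder(image_dir, text_dir, lang="eng"):
    """Run Tesseract on every page image and save one .txt per page."""
    text_dir = Path(text_dir)
    text_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(image_dir).glob("page_*.jpg")):
        # check=True stops at the first page Tesseract cannot process.
        subprocess.run(
            tesseract_command(image, text_dir / image.stem, lang),
            check=True,
        )


if __name__ == "__main__":
    # Hypothetical folder names; use "spa", "fra", etc. for other languages.
    ocr_folder("book_rotated", "book_text", lang="eng")
```

This gives plain text files, one per page, which you can then join and lay out in the editor of your choice.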
Once everything is digitized, all that remains is the layout, but we are not going to talk about that for now.
Finally, since someone is surely trying to see which books were in the pile, here is a close-up 😉
The fastest book digitizer in the world
Since I know you like curiosities, here is a video of the fastest book digitizer in the world in operation. It is the BFS-Auto, and it is capable of scanning 250 pages per minute.
You can find more information at http://www.k2.t.u-tokyo.ac.jp/vision/BFS-Auto/