[ale] Document Imaging under Linux

James P. Kinney III jkinney at localnetsolutions.com
Sun Sep 14 22:11:50 EDT 2003


xsane + ADF scanner + script-foo + database + webserver + time

Save the images as PDF unless you plan on trying OCR; in that case, scan
in black and white and save as TIFF.

I started a project like this a few years ago for a law firm that had
20+ years of paper documents that were beyond impossible to search
through. The script-foo part is the hardest part, as it needs to:
a) show the document image
b) get user input for classification of the document
c) make the appropriate entries into the database
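
As a rough illustration of steps a) through c), here is a minimal Python
sketch. Everything in it is an assumption made for the example, not what
the law firm project actually used: SQLite as the database, the
"documents" table and its columns, and the show/ask callables, which
stand in for whatever image viewer and input form you end up with.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " image_path TEXT UNIQUE NOT NULL,"
        " doc_type TEXT NOT NULL,"
        " notes TEXT)"
    )
    return conn


def classify_batch(conn, image_paths, show, ask):
    """Walk a batch of scanned images and record one row per image.

    show(path) displays the image (step a).
    ask(path) returns a (doc_type, notes) tuple from the user (step b).
    The INSERT records the classification (step c).
    """
    count = 0
    for path in image_paths:
        show(path)
        doc_type, notes = ask(path)
        conn.execute(
            "INSERT INTO documents (image_path, doc_type, notes)"
            " VALUES (?, ?, ?)",
            (str(path), doc_type, notes),
        )
        count += 1
    conn.commit()
    return count
```

In practice show() might launch an external image viewer and ask() might
be a web form; keeping both as plain callables lets the loop be tested
without a scanner or a user in front of it.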

a and c are easy; b is the truly hard part. And, as always, designing
the database system behind it is the really hard part. For my law office
project, each set of documents was the same type of data. Each set was
composed of essentially the same stuff as far as classification went.
The details changed, but the major portion could have been drop-down
boxes (doc-type A is always found with doc-type L, doc-type B is always
used with types C, D, E, and M, etc).
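
The drop-down idea could be driven by a small lookup table. The sketch
below is only one way to do it and assumes SQLite again; the
"related_types" table and both function names are made up for the
example, and the type letters are the placeholder ones from above.

```python
import sqlite3


def build_type_map(conn, pairs):
    """Store which doc types are normally found together (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS related_types ("
        " doc_type TEXT NOT NULL,"
        " related_type TEXT NOT NULL,"
        " PRIMARY KEY (doc_type, related_type))"
    )
    # INSERT OR IGNORE makes reloading the same pairs harmless.
    conn.executemany(
        "INSERT OR IGNORE INTO related_types (doc_type, related_type)"
        " VALUES (?, ?)",
        pairs,
    )
    conn.commit()


def dropdown_choices(conn, doc_type):
    """Return the types to offer in the drop-down once doc_type is picked."""
    rows = conn.execute(
        "SELECT related_type FROM related_types"
        " WHERE doc_type = ? ORDER BY related_type",
        (doc_type,),
    )
    return [row[0] for row in rows]
```

Loading it with the pairs (A, L), (B, C), (B, D), (B, E), (B, M) means
that picking type B offers C, D, E, and M, and a type with no known
companions gets an empty list.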

Too bad the decision maker was a Luddite and a cheapskate. Had they gone
ahead, they might be quite a bit more profitable than they are now.

On Sun, 2003-09-14 at 20:27, John Wells wrote:
> Guys,
> 
> I have a family member who'd like me to help design/develop or integrate a
> document imaging system under Linux.  He has a large amount of documents
> he'd like to scan in, store, and be able to retrieve easily for his
> company.
> 
> I'm very new to document imaging, so I'm not convinced I have a handle on
> everything that goes into it, but my layman's understanding is that it is
> simply converting paper docs into storable electronic docs.
> 
> So, first of all, is there anything out there already?  I'd hate to
> reinvent the wheel, and I'm sure this has been done before many times.
> 
> If there isn't anything out there, then what formats are available?  We're
> talking potentially hundreds of thousands of documents here.
> 
> My first thought was to store them as JPEGs in the filesystem and then
> store their ids in a database, allowing the filesystem to handle the
> loading and storing of the files.  Course, JPEG is just the smallest
> fairly good quality format I'm aware of, and I'm sure I'm overlooking some
> better ones.
> 
> If you were approaching this project, what would you do? :-)
> 
> Thanks very much!
> 
> John
-- 
James P. Kinney III          \Changing the mobile computing world/
CEO & Director of Engineering \          one Linux user         /
Local Net Solutions,LLC        \           at a time.          /
770-493-8244                    \.___________________________./
http://www.localnetsolutions.com

GPG ID: 829C6CA7 James P. Kinney III (M.S. Physics)
<jkinney at localnetsolutions.com>
Fingerprint = 3C9E 6366 54FC A3FE BA4D 0659 6190 ADC3 829C 6CA7