[ale] Mining PDFs

Kevin O'Neill Stoll kevinostoll at yahoo.com
Thu Dec 5 13:41:19 EST 2002


Hey all,

I need to implement a search function that can
mine a URL directory structure containing PDFs. I was
hoping someone knew of an open-source project that
has already done some of the grunt work; otherwise, I'm open
to ideas on how to accomplish this task.
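One way to handle the "mine a URL directory structure" part: if the web server produces auto-generated index pages (as Apache does for directories without an index file), you can walk those pages and collect every link ending in .pdf. A minimal sketch, assuming such listing pages exist; the names LinkCollector, fetch_url, split_listing and crawl are hypothetical, and only Python's standard library is used:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects every href found in <a> tags of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url):
    """Download a page and decode it as text (undecodable bytes are replaced)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def split_listing(base_url, html):
    """Split one directory-listing page into (pdf_urls, subdirectory_urls)."""
    parser = LinkCollector()
    parser.feed(html)
    pdfs, subdirs = [], []
    for href in parser.links:
        absolute = urljoin(base_url, href)
        if href.lower().endswith(".pdf"):
            pdfs.append(absolute)
        elif href.endswith("/"):
            subdirs.append(absolute)
        # Anything else (e.g. column-sorting links like "?C=N;O=D") is ignored.
    return pdfs, subdirs


def crawl(root_url, fetch=fetch_url):
    """Walk every listing under root_url and return the PDF URLs found."""
    seen, pending, found = set(), [root_url], []
    while pending:
        url = pending.pop()
        if url in seen:
            continue
        seen.add(url)
        pdfs, subdirs = split_listing(url, fetch(url))
        found.extend(pdfs)
        # Only descend into directories below the root; this drops the
        # "Parent Directory" link and any links that leave the site.
        pending.extend(d for d in subdirs if d.startswith(root_url))
    return found
```

The fetch parameter is injectable so the crawler can be tried against canned pages before pointing it at a real server; the restriction to URLs under root_url keeps it from wandering off the directory tree.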

When mining the PDFs, the search function needs to grab
a title, the file size, a summary, and a relevance score based
on a text search (e.g., if I search for 'dog', every PDF
containing the word 'dog' should be returned). I'm just not
sure how to get the text out of a PDF.
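For getting the text out, one common route is the pdftotext command-line tool from Xpdf (also shipped with Poppler), which writes the extracted text to stdout when the output file is given as "-". Once you have plain text, the search itself is ordinary string work. A minimal sketch, assuming pdftotext is installed and on the PATH; the title here is just the file name (pdfinfo can report a document's Title field if you prefer that), and the relevance score is a simple term frequency divided by word count, not a full ranking scheme like TF-IDF. The names extract_text, load_pdf, Hit and search are hypothetical:

```python
import os
import re
import subprocess
from dataclasses import dataclass


def extract_text(pdf_path):
    """Run pdftotext (assumed installed); '-' sends the text to stdout."""
    result = subprocess.run(["pdftotext", pdf_path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def load_pdf(pdf_path):
    """Return (title, size_bytes, text); the title is simply the file name."""
    return os.path.basename(pdf_path), os.path.getsize(pdf_path), extract_text(pdf_path)


@dataclass
class Hit:
    title: str
    size_bytes: int
    summary: str
    relevance: float


def search(docs, term, context=40):
    """docs: iterable of (title, size_bytes, text). Returns hits, best first."""
    # Whole-word, case-insensitive match: 'dog' matches "Dog" but not "dogs".
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for title, size, text in docs:
        matches = list(pattern.finditer(text))
        if not matches:
            continue
        words = max(len(text.split()), 1)
        relevance = len(matches) / words  # term frequency normalized by length
        # Summary: the text around the first match, whitespace collapsed.
        first = matches[0]
        start = max(first.start() - context, 0)
        end = min(first.end() + context, len(text))
        summary = " ".join(text[start:end].split())
        hits.append(Hit(title, size, summary, relevance))
    hits.sort(key=lambda h: h.relevance, reverse=True)
    return hits
```

Usage would be something like search([load_pdf(p) for p in paths], "dog"). For a large collection you would extract the text once and store it (or feed it to an indexer) rather than re-running pdftotext on every query.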

Anywho, thanks in advance.



=====
Kevin Stoll
http://kevinstoll.org

OpenSource Software...FREE!
Angering Bill Gates...priceless.
============================================================

_______________________________________________
Ale mailing list
Ale at ale.org
http://www.ale.org/mailman/listinfo/ale