[ale] Grabbing a dynamic website automatically?

Geoffrey esoteric at 3times25.net
Fri Aug 23 07:18:36 EDT 2002




johncole at mindspring.com wrote:
> Howdy!
> 
> Yes, but the problem is that the website changes every day, as I have to log
> into a HTTPS site. Then I have to go through a couple of licks/menus in

Man I hate licks/menus, messes up my monitor screen.  Serious 
suggestions below...

> order to get the page I need.
> Otherwise, this would work.
> 
> I did look over what someone else did for doing Cookie-based wgets/curl
> with HTTPS, but I don't see anywhere where it says anything about
> time-access and logging in and going through a few pages before I get
> to the content I need.

Here's what I've done in the past.  When you get to the page that is 
just before the one you want, check the URL that calls that page.  It 
may be that's all you need to call the page directly.  Try saving that 
full URL somewhere, exit your browser, and then attempt to open the 
page in a fresh browser session.  If it fails because you're missing a 
cookie, then the issue is more complex.
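
If the direct call does work, something along these lines will pull the 
page from a script or cron job.  This is only a rough sketch; it assumes 
LWP::UserAgent plus an SSL back end (Crypt::SSLeay or similar) are 
installed, and the URL is a stand-in, not your actual page:

  #!/usr/bin/perl
  # Rough sketch: try fetching the target page directly over HTTPS.
  use strict;
  use LWP::UserAgent;

  my $url = 'https://www.example.com/report.pl';   # stand-in URL
  my $ua  = LWP::UserAgent->new;

  my $res = $ua->get($url);
  if ($res->is_success) {
      print $res->content;    # came back without a cookie -- easy case
  } else {
      warn "Direct fetch failed: ", $res->status_line, "\n";
  }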

You can manipulate cookies with both Perl and JavaScript.  The next 
thing to try would be to keep the cookie the site places in your cookie 
file, update its date/time stamp, and put it back in the cookie file 
before attempting to load the page as described in the previous 
paragraph.
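
For the cookie case, you can also let LWP carry the cookie for you 
rather than editing the file by hand.  Another rough sketch; the cookie 
file path and URL here are guesses for your setup, and it assumes 
HTTP::Cookies::Netscape (from the libwww-perl suite) can read your 
browser's Netscape-format cookie file:

  #!/usr/bin/perl
  # Rough sketch: reuse the cookie the site dropped in the browser's
  # Netscape-format cookie file when fetching the page.
  use strict;
  use LWP::UserAgent;
  use HTTP::Cookies::Netscape;

  my $jar = HTTP::Cookies::Netscape->new(
      file     => "$ENV{HOME}/.netscape/cookies",   # guess; point at your cookie file
      autosave => 0,
  );

  my $ua = LWP::UserAgent->new;
  $ua->cookie_jar($jar);      # send the saved cookies with the request

  my $res = $ua->get('https://www.example.com/report.pl');   # stand-in URL
  print $res->is_success ? $res->content
                         : "Fetch failed: " . $res->status_line . "\n";

If the site wants a fresh login every day, the same script can POST the 
login form first (the cookie jar picks up whatever cookies the login 
sets) and then request the page; after that it's just a cron entry, as 
in the links suggestion quoted below.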

> 
> Thanks for the ideas though everyone!
> 
> Thanks,
> John
> 
> 
> 
>>At 08:50 AM 08/22/2002 -0400, you wrote:
>>
>>>Run a cronjob with Links outputting the page to a text file?
>>>
>>>Something like: "links -dump https://www.foo.bar/page.pl > ~/daily" done
>>>at 0200, perhaps?
>>>
>>>-- 
>>>Christopher R. Curzio     |  Quantum materiae materietur marmota monax
>>>http://www.accipiter.org  |  si marmota monax materiam possit materiari?
>>>:wq!
>>>
>>>Thus Spake <johncole at mindspring.com>:
>>>Thu, 22 Aug 2002 08:31:36 -0400
>>>
>>>
>>>
>>>>Howdy all!
>>>>
>>>>What would be the best way to grab the data off of a website that is
>>>>dynamic, HTTPS, and has cookies enabled?  I'm trying to capture a
>>>>single page every day from a particular website automatically.
>>>>
>>>>(in particular I'm using Redhat 7.2)
>>>>
>>>>I need the page back in text format preferably (or I can convert it to
>>>>text later as needed for insertion into a database.)
>>>>
>>>>Thanks,
>>>>John
>>>
> Paypal membership: free 
> 
> Donation to Freenet: $20 
> 
> Never having to answer the question "Daddy, where were you when they took
> freedom of the press away from the Internet?": Priceless. 
> 
> http://www.freenetproject.org/index.php?page=donations
> 


-- 
Until later: Geoffrey		esoteric at 3times25.net

I didn't have to buy my radio from a specific company to listen
to FM, why doesn't that apply to the Internet (anymore...)?


---
This message has been sent through the ALE general discussion list.
See http://www.ale.org/mailing-lists.shtml for more info. Problems should be 
sent to listmaster at ale dot org.





