[ale] Dealing with really big log files....

scott mcbrien smcbrien at gmail.com
Sun Mar 22 12:35:25 EDT 2009


You could write a Perl script to break it apart for you.  The pseudocode
would look something like:
open original log file

while there is input left in the file
  read a line
  pattern match for the thing that looks like a date
  open a different file (probably with the date as part of the name)

  while the line still belongs to that date (no newer date stamp seen)
    write out the line
    read the next line

  close the file

close the original log file
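
Fleshed out, a minimal Perl sketch of that idea might look like the
following (assuming the general log's old "YYMMDD HH:MM:SS" stamps at
the start of some, but not all, lines; unstamped lines get written to
the file for the last date seen, and the output file names are made up):

#!/usr/bin/perl
use strict;
use warnings;

my $logfile = shift or die "usage: $0 logfile\n";
open my $in, '<', $logfile or die "can't open $logfile: $!\n";

my ($current_date, $out);
while (my $line = <$in>) {
    # Lines opening with something like "090322 12:35:25" carry a stamp.
    if ($line =~ /^(\d{6})\s+\d{1,2}:\d{2}:\d{2}/) {
        my $date = $1;
        if (!defined $current_date || $date ne $current_date) {
            close $out if $out;
            open $out, '>>', "mysql-$date.log"
                or die "can't open mysql-$date.log: $!\n";
            $current_date = $date;
        }
    }
    # Anything before the first stamp is dropped; everything else goes
    # to the current day's file.
    print {$out} $line if $out;
}
close $out if $out;
close $in;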

Variations would include adding some directory structure around where to
place the logs once they're broken apart, or separating by month or year
instead of by day.

-Scott

On Sun, Mar 22, 2009 at 10:54 AM, Kenneth Ratliff <lists at noctum.net> wrote:

> On Mar 22, 2009, at 10:15 AM, Greg Freemyer wrote:
>
> > If you have the disk space and a few hours to let it run, I would just
> > "split" that file into big chunks.  Maybe a million lines each.
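> >
> > (For example, split -l 1000000 my_log_file part_ would write
> > million-line pieces named part_aa, part_ab, and so on.)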
>
> Well, I could just sed the range of lines I want out in the same time
> frame, and keep the result in one log file as well, which is my
> preference. I've got about 400 gigs of space left on the disk, so I've
> got some room. I mean, I don't really care about the data that comes
> before; that should have been vaporized to the ether long ago. I
> just need to isolate the section of the log I do want so I can parse
> it and give an answer to a customer.
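>
> (Something like sed -n '5000000,6000000p;6000001q' mysql.log > /tmp/slice
> would do that -- the line numbers are made up, and the trailing q tells
> sed to quit instead of reading the rest of the file.)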
>
> > I'd recommend the source and destination of your split command be on
> > different physical drives if you can manage it.  Even if that means
> > connecting up an external USB drive to hold the split files.
>
> Not a machine I have physical access to, sadly. I'd love to have a
> local copy to play with and leave the original intact on the server,
> but pulling 114 gigs across a transatlantic link is not really an
> option at the moment.
>
> > If you don't have the disk space, you could try something like:
> >
> > head -2000000 my_log_file | tail -50000 > /tmp/my_chunk_of_interest
> >
> > Or grep has an option to grab lines before and after a line that has
> > the pattern in it.
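> >
> > (Those would be grep's -B/-A/-C context flags, e.g.
> > grep -B 20 -A 200 'pattern of interest' my_log_file -- the pattern
> > and counts here are placeholders.)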
> >
> > Hopefully one of those 3 will work for you.
>
> MySQL's log file is very annoying in that it doesn't lend itself to
> easy grepping by line count. It doesn't time stamp every entry; it's
> more of a heartbeat thing (like once a second or every couple of
> seconds, it injects the date and time in front of the process it's
> currently running). There's no set number of lines between heartbeats,
> so one heartbeat might have a 3-line SELECT query, while the next
> heartbeat might be processing 20 different queries, including a
> 20-line UPDATE.
>
> I do have a script that will step through the log file and parse out
> what updates were made to what database and what table at what time,
> but it craps out when run against the entire log file, so I'm mostly
> just trying to pare the log file down to a size where it'll work with
> my other tools :)
>
> > FYI: I work with large binary data sets all the time, and we use split
> > to keep each chunk to 2 GB.  Not specifically needed anymore, but if
> > you have a read error etc., it is just the one 2 GB chunk you have to
> > retrieve from backup.  It also affords you the ability to copy the
> > data to a FAT32 filesystem for portability.
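> >
> > (e.g. split -b 2G -d my_data my_data. -- hypothetical names; -b sets
> > the chunk size in bytes and -d asks for numeric suffixes.)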
>
> Normally, we rotate logs nightly and keep about a week's worth, so
> space and individual file sizes are usually not an issue. In this
> case, logrotate busted for MySQL sometime back in November and the
> beast just kept eating.