[ale] Dealing with really big log files....

Kenneth Ratliff lists at noctum.net
Sun Mar 22 10:54:39 EDT 2009


On Mar 22, 2009, at 10:15 AM, Greg Freemyer wrote:

> If you have the disk space and a few hours to let it run, I would
> just "split" that file into big chunks.  Maybe a million lines each.

Well, I could just sed the range of lines I want out in the same time
frame, and keep the result in one log file as well, which is my
preference. I've got about 400 gigs of space left on the disk, so I've
got some room. I mean, I don't really care about the data that came
before; that should have been vaporized into the ether long ago. I
just need to isolate the section of the log I do want so I can parse
it and give an answer to a customer.
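(Assuming the boundary line numbers are known, the sed version would
be something along the lines of

sed -n '150000000,160000000p;160000000q' mysql.log > /var/tmp/slice.log

with -n suppressing everything outside the range and the trailing q
telling sed to quit after the last wanted line instead of reading the
rest of the 114 gigs. The line numbers are placeholders, obviously.)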

> I'd recommend the source and destination of your split command be on
> different physical drives if you can manage it.  Even if that means
> connecting up an external USB drive to hold the split files.

Not a machine I have physical access to, sadly. I'd love to have a  
local copy to play with and leave the original intact on the server,  
but pulling 114 gigs across a transatlantic link is not really an  
option at the moment.

> If you don't have the disk space, you could try something like:
>
> head -2000000 my_log_file | tail -50000 > /tmp/my_chunk_of_interest
>
> Or grep has an option to grab lines before and after a line that has
> the pattern in it.
>
> Hopefully one of those 3 will work for you.
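(The grep options in question are -B and -A, context lines before and
after a match, so something like

grep -B 5 -A 500 'some query fragment' mysql.log > /tmp/my_chunk_of_interest

with the pattern being whatever query fragment you're after, though a
plain grep still has to churn through the whole 114 gigs to get there.)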

mysql's log file is very annoying in that it doesn't lend itself to
easy grepping by line count. It doesn't time stamp every entry; it's
more of a heartbeat thing (once a second or every couple of seconds,
it injects the date and time in front of the process it's currently
running). There's no set number of lines between heartbeats, so one
heartbeat might cover a 3-line select query, while the next might be
processing 20 different queries including a 20-line update.
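(Since the heartbeat lines do carry a date, one way around the
line-count problem might be awk's range patterns: print everything
from the first heartbeat of the start date through the first heartbeat
of the day after the window. Assuming the general log's usual YYMMDD
prefix on those lines, that would be roughly

awk '/^090301 /,/^090323 /' mysql.log > /var/tmp/slice.log

with the two dates standing in for whatever window the customer
actually cares about.)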

I do have a script that will step through the log file and parse out  
what updates were made to what database and what table at what time,  
but it craps out when run against the entire log file, so I'm mostly  
just trying to pare the log file down to a size where it'll work with  
my other tools :)
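(If the other tools really want a line range rather than a date range,
grep -n can at least locate where the interesting dates start, e.g.

grep -n -m 1 '^090301 ' mysql.log
grep -n -m 1 '^090323 ' mysql.log

where -m 1 stops at the first match and -n prefixes it with its line
number, which can then go straight into the sed range above. Same
guess about the date format as before.)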

> FYI: I work with large binary data sets all the time, and we use split
> to keep each chunk to 2 GB.  Not specifically needed anymore, but if
> you have a read error etc. it is just the one 2 GB chunk you have to
> retrieve from backup.  It also affords you the ability to copy the
> data to a FAT32 filesystem for portability.

Normally, we rotate logs nightly and keep about a week's worth, so
space or individual file size is usually not an issue. In this case,
logrotate busted for mysql sometime back in November and the beast
just kept eating.


