[ale] Dealing with really big log files....

Greg Freemyer greg.freemyer at gmail.com
Sun Mar 22 10:15:36 EDT 2009


If you have the disk space and a few hours to let it run, I would just
"split" that file into big chunks.  Maybe a million lines each.

I'd recommend the source and destination of your split command be on
different physical drives if you can manage it, even if that means
connecting up an external USB drive to hold the split files.
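A rough sketch of that split invocation, demoed on a tiny stand-in file
(the real paths and the million-line chunk size are up to you; the
/mnt/usb path below is hypothetical):

```shell
# For the real 114 GB log you'd run something like:
#   split -l 1000000 mysql.log /mnt/usb/mysql_chunk_
# (hypothetical paths -- point the output prefix at the second drive).
# Demo on a small file so it runs anywhere:
seq 1 10 > /tmp/demo.log                    # stand-in for the big log
split -l 4 /tmp/demo.log /tmp/demo_chunk_   # 4 lines per chunk for the demo
ls /tmp/demo_chunk_*                        # chunks named _aa, _ab, _ac, ...
```

Concatenating the chunks back together in name order reproduces the
original file exactly, so nothing is lost by splitting.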

If you don't have the disk space, you could try something like:

head -n 2000000 my_log_file | tail -n 50000 > /tmp/my_chunk_of_interest

Or grep has an option to grab lines before and after a line that
matches the pattern.
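Those are grep's -B (before) and -A (after) flags, or -C for both
sides at once.  A minimal sketch on a throwaway file (for the real log
it would be something like grep -C 5 'pattern' mysql.log):

```shell
# Build a small demo file, then pull one line of context on each side
# of the match.
printf 'one\ntwo\nERROR here\nthree\nfour\n' > /tmp/ctx_demo.log
grep -B 1 -A 1 'ERROR' /tmp/ctx_demo.log   # prints the match plus its neighbors
```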

Hopefully one of those three approaches will work for you.

FYI: I work with large binary data sets all the time, and we use split
to keep each chunk to 2 GB.  Not strictly needed anymore, but if you
have a read error, etc., it is just the one 2 GB chunk you have to
retrieve from backup.  It also affords you the ability to copy the
data to a FAT32 filesystem for portability.
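The byte-based variant of split we use for that looks roughly like the
following (sizes shrunk so the demo runs instantly; in practice it's
split -b 2G, and the filenames here are made up):

```shell
# Real usage would be: split -b 2G image.dd image_chunk_
# Demo with a tiny file standing in for the binary data set:
head -c 10 /dev/urandom > /tmp/image.bin
split -b 4 /tmp/image.bin /tmp/image_chunk_      # 4-byte chunks -> 3 files
cat /tmp/image_chunk_* > /tmp/image_rejoined.bin # reassemble in name order
cmp /tmp/image.bin /tmp/image_rejoined.bin       # no output: byte-identical
```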

Greg

On Sun, Mar 22, 2009 at 9:41 AM, Kenneth Ratliff <lists at noctum.net> wrote:
>
> I need to extract data from a mysql log in a 12 hour window. The
> particular problem here is that the log file is 114 gigs, and goes
> back to november 2008 (yes, someone screwed the pooch with the log
> rotation on this one, already fixed *that* particular problem, but
> still have the resulting big log file!)
>
> Now, my normal methods of parsing through a log file take a really
> really long time due to its size.
>
> I know about what line number the data I want begins on. Is there an
> easy way to just chop off all the lines before that and leave
> everything else intact? Obviously, due to the size of the file, I
> can't load it in vi to do my usual voodoo for this crap.
>
> I'm thinking of running sed -e '1,<really big number>d' mysql.log
>
> against it, but does anyone know of a better method to just chunk out
> a big section of a text file (and by better, I mean faster; it takes
> upwards of 3 hours to process this damn thing)?
>
>



-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

