[ale] Any MySQL Experts around?

Ronald Chmara ron at Opus1.COM
Tue Feb 17 05:42:37 EST 2004


On Feb 15, 2004, at 12:12 PM, J.M. Taylor wrote:
> I would strongly urge you *not* to store your log files in any 
> relational
> database.  There's just no reason to do it, especially for the number 
> of
> rows you're talking about.

I disagree, because I've worked on a project where 2.4 million dollars 
(per year) hinged on doing this...

>  I've worked on writing reporting solutions for
> huge sets of log data, and I've been a systems admin and DBA (MySQL and
> Informix).  There may be a solution I've not seen but in my experience
> you're not going to get good performance doing what you plan to do.

*nod*

Doing this right required either a lot of CPU or a really strong design. 
We chose the former; the latter cost too much for queries that were run 
once a month (a 4-hour return time vs. 1.2 seconds, but the cost 
estimates for implementing the 1.2-second solution were fairly large).

> Storing all your log files on a central server is an excellent idea.  I
> would store them as text files if you can.  What I would suggest for
> better reporting is to decide what needs reporting and write a perl 
> script
> (most efficient afaik for text processing) to parse your files and 
> store only
> *aggregate* data in a database.
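That aggregate approach can be sketched roughly like so (Python here rather than perl, and the log fields are simplified/made up for illustration):

```python
from collections import Counter

def aggregate_hits(log_lines):
    """Collapse raw access-log lines into (day, path) hit counts,
    which is all the monthly reports may actually need."""
    counts = Counter()
    for line in log_lines:
        # Simplified log fields: host, day, method, path (invented layout).
        host, day, method, path = line.split()
        counts[(day, path)] += 1
    return counts
```

Only the aggregate rows go into the database; the raw lines stay as flat text files.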

Note that pre-processing via perl (or whatever) can *significantly* 
shrink a data set.  Example: if you have 20 directories off the 
parent root of a data site, tracking (and searching) 20 integers is 
much smaller (and faster) than tracking and searching 20 text fields. 
Same with GET/PUT/POST, same with user agents, daemons, users, etc... 
even form fields and query strings can be managed the same way.
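A minimal sketch of that kind of tokenizing (Python instead of perl; the fixed method table and the one-host-per-line format are invented for the example):

```python
# Fixed tokens for a small, known set of values (HTTP methods).
METHODS = {"GET": 0, "POST": 1, "PUT": 2}

def tokenize(line, dir_tokens):
    """Replace the method and top-level directory of one (simplified)
    log line with small integers; the host stays as text here."""
    host, method, path = line.split()
    top_dir = path.split("/")[1]  # first path component
    # The first time a directory is seen, give it the next free integer.
    dir_id = dir_tokens.setdefault(top_dir, len(dir_tokens))
    return (METHODS[method], dir_id, host)
```

The `dir_tokens` dict would be persisted as a lookup table so the reports can map the integers back to names.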

Aggregation is not required, but the same rules for limiting a data set 
might help. Don't repeat data. Don't repeat things like <host> [apache], 
POST, <homedir>, <childdir> etc.... If it occurs more than 65,000 times 
in a db, turn it into an integer token. Heck, I'd tokenize it after 600 
occurrences. I prefer a result after 1.2 seconds. :-)
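Picking what to tokenize can be as simple as counting occurrences first (a Python sketch; the 600 cutoff is the arbitrary figure from above, not a magic number):

```python
from collections import Counter

def build_token_map(values, threshold=600):
    """Assign an integer token to each value seen at least `threshold`
    times; rarer values stay as plain text in the table."""
    counts = Counter(values)
    frequent = sorted(v for v, c in counts.items() if c >= threshold)
    return {v: i for i, v in enumerate(frequent)}
```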

Of course, db's with bad joining (cough) can perform poorly in such 
situations.

-Bop
