[ale] Any MySQL Experts around?

Jason Etheridge phasefx at magusaptus.com
Sat Feb 14 22:12:12 EST 2004


> I'm thinking of having all my server logs (including NT/2K/2K+3) record
> into a MySQL db for better reporting.  Does anyone know if MySQL can
> handle 100,000+ inserts per day?  Also, if it's handling this many
> inserts, how good is the SELECT performance for the automated
> reporting?  Any thoughts, pointers?

Disclaimer:  I'm not a real DBA and the tuple calculus I learned in 
school actually tripped me up in the real world :D

The largest table I work with in mysql has roughly 8 million records 
using full-text indexing.  Each row contains a little over 260 bytes of 
data.  Every morning I turn indexing off and do a LOAD DATA REPLACE 
touching every record, changing a good bit of them, and adding a 
thousand or so new records.  It can take around 3 hours to do that and 
reindex on an old Sun E10K (I forget how many processors; maybe a dozen 
for that domain).  Queries are fast after that, but I use READ LOCKs 
and there are no updates during the day.  I'm not reporting on live 
data.
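
For the curious, the nightly job boils down to something like this (the 
table name and file path here are made up, and note that DISABLE KEYS 
only skips the non-unique indexes):

    ALTER TABLE big_table DISABLE KEYS;

    LOAD DATA INFILE '/data/nightly_dump.txt'
        REPLACE INTO TABLE big_table
        FIELDS TERMINATED BY '\t';

    ALTER TABLE big_table ENABLE KEYS;   -- most of the reindex time is spent here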

If you don't use full-text indexing, and instead use LIKE statements on 
large fields, searching is going to be pretty slow with that many 
records, especially if you can't lock the tables because you're doing 
updates.  If you do use full-text indexing, I'd worry about 100,000+ 
inserts a day and concurrent searching.  Full-text updates work best 
with indexing disabled, and even then, when you go to re-enable indexing, 
it'll do it via REPAIR BY SORTING, which could take a long time if you 
don't have enough RAM to do it without thrashing the disks.
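
To make the difference concrete, the two kinds of search I'm comparing 
look like this (the table and column names are just examples):

    -- full-text search; uses the FULLTEXT index
    SELECT id FROM log_entries
     WHERE MATCH (message) AGAINST ('connection refused');

    -- leading-wildcard LIKE; can't use a normal index, so it scans every row
    SELECT id FROM log_entries
     WHERE message LIKE '%connection refused%';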

What sort of reporting do you want to do, and what sort of logs are 
they?  If you just want to store key statistical information rather 
than entire log entries, that would be better.  But for storing actual 
logs, I'm not sure relational databases are the best tool.  For 
log-based reporting, I currently use a kludge: the system parses the 
specific data we need out of the log entries for just the months the 
current report covers and loads it into temporary tables.  This only 
works well for one or two months in our case because of the arbitrary 
timeout value I set.  For anything really large, like a year's worth 
of transactions, I just run the logs through a perl script and forget 
mysql.
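
Concretely, the kludge looks something like this, assuming a perl 
script has already boiled the raw logs down to tab-separated fields 
(the table layout and file name are invented for illustration):

    CREATE TEMPORARY TABLE report_tmp (
        logged_at  DATETIME NOT NULL,
        host       VARCHAR(64),
        bytes      INT UNSIGNED,
        KEY (logged_at)
    );

    LOAD DATA INFILE '/tmp/parsed_jan_feb.txt' INTO TABLE report_tmp
        FIELDS TERMINATED BY '\t';

    -- the actual report queries then run against the temporary table
    SELECT host, COUNT(*) AS hits, SUM(bytes) AS total_bytes
      FROM report_tmp
     GROUP BY host;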

I have another table without full-text indexing that gets searched and 
updated throughout the day, with about a million records.  But nothing 
is added except in the mornings, just updated.  I'll try to find some 
stats for that if you want.

Best thing to do is get a test machine and play around with some sample 
loads, and make sure to allocate a lot of filehandles for mysql's 
process space.
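
By filehandles I mean raising the OS limit for the mysqld process 
(ulimit -n in the startup script, or however your platform does it) 
and then the server settings; something along these lines, where the 
numbers are only starting points to experiment with:

    # my.cnf -- guesses, tune for your box and MySQL version
    [mysqld]
    open_files_limit = 8192
    table_cache      = 512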

-- Jason


