[ale] Samba: file corruption on write to share followed by hang

Jim Kinney jim.kinney at gmail.com
Thu Dec 10 20:40:21 EST 2009


Bad ECC RAM is still bad RAM. ECC can typically only correct a single bit
flip per memory word. With 2 bit flips in the same word the error is at
best detected, not fixed, and it's all toast.
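
To make the single-bit vs. double-bit point concrete, here is a toy sketch
of a SECDED code (single error correct, double error detect). It assumes a
Hamming(7,4) code plus one overall parity bit over 4 data bits. Real ECC
DIMMs use a wider code in hardware, so the names encode and decode and the
8-bit word layout are purely illustrative.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED word (toy layout, not real DIMM ECC)."""
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 holds the overall parity bit
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4,5,6,7
    for pos in range(1, 8):
        code[0] ^= code[pos]            # makes parity of the whole word even
    return code


def decode(code):
    """Return (nibble, status); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of set positions is 0 for a valid word
    overall = 0
    for bit in code:
        overall ^= bit                  # 1 means an odd number of bits flipped

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1             # single flip: syndrome names the bad position
        status = "corrected"
    else:
        return None, "uncorrectable"    # even number of flips: detected, not fixable

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

Flipping any one element of the list returned by encode() and passing it to
decode() gives the original value back with status "corrected". Flipping
any two gives (None, "uncorrectable"), which is the "all toast" case.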

It does sound like Samba managed to totally corrupt itself, and the later
hang may have been related to the system thrashing RAM around. The
filesystem definitions are in kernel space, so Samba has to access that to
function. That just restarting Samba fixed things is a pretty good
indication that the problem was in memory associated with the Samba
process. The kernel's aggressive caching will amplify a bad memory
situation. Restarting Samba also restarts Samba's caching, and that may
have overwritten the bad data, which was related to the filesystem
management area.
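
If the corruption comes back, the drag-down/drag-back md5sum check can be
scripted so it is easy to repeat. This is only a rough sketch: it assumes
the share is mounted on the machine running it, and share_dir, md5_of and
round_trip_ok are names made up for the example. Reading the copy straight
back may be served from the client's cache, so a mismatch is meaningful
but a match is not proof that the server is healthy.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def round_trip_ok(local_file, share_dir):
    """Copy local_file into share_dir (hypothetical mount point) and compare digests."""
    local_file = Path(local_file)
    remote_copy = Path(share_dir) / local_file.name
    before = md5_of(local_file)
    shutil.copyfile(local_file, remote_copy)
    after = md5_of(remote_copy)
    return before == after
```

Calling round_trip_ok() on a handful of files of different sizes every so
often would show whether the written copy stops matching the original.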

On Thu, Dec 10, 2009 at 3:41 PM, Jeff Hubbs <jhubbslist at att.net> wrote:

> How does it factor in that it's ECC RAM?  There are four 2GiB DIMMs, one
> in every fourth of sixteen slots.
>
> Jim Kinney wrote:
> > time to run memtest
> >
> > On Thu, Dec 10, 2009 at 2:34 PM, Jeff Hubbs <jhubbslist at att.net> wrote:
> >
> >     Troubling behavior under Samba 3.0.33:  when a certain win2K user
> >     opened an excel file and saved it, it showed to be corrupted on the
> >     next open.  If same user simply dragged a file down from the Samba
> >     share and dragged it back, the file had changed (shown by md5sum)
> >     even if the size were the same.  I was troubleshooting this when a
> >     few hours later, all authentication and share access hung up.  I
> >     shelled into the server and a ps aux would only get so far and hang
> >     w/o completing.  Top would not start - no output, just hang.
> >     Restarted samba; everything started working again and the file
> >     corruption on share went away (by "went away", I mean that the files
> >     weren't being changed anymore when written back to the server).
> >     What the hey??
> >
> >     FWIW, the server had been up for 91 days in a 150-200 user office.  I
> >     don't think Samba had been restarted in that time.
> >



-- 
James P. Kinney III
Actively in pursuit of Life, Liberty and Happiness