[ale] interesting problem

Jim Kinney jim.kinney at gmail.com
Thu Jan 11 15:57:56 EST 2018


James and Ed,
One thing I've found is a HUGE number of bad symlinks - most simply
point to a non-existent source file (likely user deletion), while a
second batch points into a source tree that appears to have been moved
after the links were made. Additionally, links were made (attempted)
from <old dir>/foo* to ../<new dir>/foo* (with a literal * in the
names!) in what looks like an attempt to mass-link a collection of
similarly named files across folders. Yeah. _that_ worked :-) NOT!
<sigh>
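For hunting down the dead links, GNU find's -xtype l test matches exactly
the symlinks whose target doesn't resolve. A minimal sketch - the scratch
directory and file names here are invented purely for illustration:

```shell
# Scratch tree for illustration: one good link, one dangling link.
tmp=$(mktemp -d)
touch "$tmp/real.txt"
ln -s real.txt "$tmp/good.lnk"
ln -s no-such-file "$tmp/dangling.lnk"

# Under find's default -P behavior, -xtype l is true for a symlink whose
# target does not exist - i.e. exactly the broken links.
find "$tmp" -xtype l

# Once the list has been reviewed, the same expression can delete them:
# find "$tmp" -xtype l -delete
```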
find is being quite helpful. I hadn't thought about the old tar | tar
process. Tar may be a better way to move the actual data and the valid
links, as it's blissfully ignorant and works at a low enough level.
Thanks!
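The tar | tar pipe really is blissfully ignorant about links: tar archives
a symlink as a symlink (it never dereferences by default), so relative and
even dangling links arrive byte-for-byte intact. A minimal sketch with
made-up scratch paths standing in for the real source and destination:

```shell
src=$(mktemp -d); dst=$(mktemp -d)    # stand-ins for the real trees
mkdir -p "$src/sub"
echo data > "$src/sub/file.txt"
ln -s sub/file.txt "$src/rel.lnk"     # relative link
ln -s /nonexistent "$src/dead.lnk"    # dangling link travels as-is

# c: create archive to stdout, x: extract from stdin, p: keep permissions.
(cd "$src" && tar cf - .) | (cd "$dst" && tar xpf -)

readlink "$dst/rel.lnk"               # prints the stored target: sub/file.txt
```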
Some further background: the destination is a gluster storage cluster
mounted on a machine that also has four 4 TB drives attached (a RAID5
backup) as the source. The glusterfs reports a zillion or so issues,
and all seem to involve symlinks. The rsync from the backup RAID to the
new storage space reported a zillion issues with symlinks and IO
errors. Of course sysadmin panic sets in with IO errors > 0, and
especially > 7000!
As soon as I get the error count to 0, I can reconfigure to support a
third machine with the checksum bricks and add more storage overall.
This is the last of the repairs from the RAID6 three-drive crash that
trashed the one node (all 100+ TB) last August.
On Thu, 2018-01-11 at 20:23 +0000, Putnam, James M. wrote:
>    Tar (with some combination of switches) may be able to do all this
>    for you. A quick test would tell.
> 
>    Upping the block size to some multiple of the native file system
>    block size may let the OS DMA directly to/from user space (at
>    least it did in SunOS/Solaris/BSD*, not sure if Linux does that
>    these days) which would kill some of the tar overhead.
> 
> --
> James  M. Putnam
> Visiting Professor of Computer Science
> 
> The air was soft, the stars so fine,
> the promise of every cobbled alley so great,
> that I thought I was in a dream.
> ________________________________________
> From: Ale [ale-bounces at ale.org] on behalf of Jim Kinney via Ale [ale@
> ale.org]
> Sent: Thursday, January 11, 2018 3:04 PM
> To: Atlanta User Group (E-mail)
> Subject: [ale] interesting problem
> 
> Imagine a giant collection of files, several TB, with unknown
> directory names and unknown directory depths at any point. From the
> top of that tree, you need to cd into EVERY directory, find the
> symlinks in each directory, and remake them in a parallel tree on the
> same system but with a different starting point. Rsync is not happy
> with the relative links, so that fails: each link appears to be
> resolved relative to the location of the process running rsync.
> 
> It is possible, given the source of this data tree, that recursive,
> looping symlinks exist. Those must be recreated in the new location.
> 
> It looks like the best approach is a find to list all symlinks in the
> entire tree, then a cd to each final location to recreate them. That
> can be sped up by splitting the link list into sections and running
> multiple processes.
> 
> Better ideas?
> 
> --
> 
> James P. Kinney III
> 
> Every time you stop a school, you will have to build a jail. What you
> gain at one end you lose at the other. It's like feeding a dog on his
> own tail. It won't fatten the dog.
> - Speech 11/23/1900 Mark Twain
> 
> http://heretothereideas.blogspot.com/
> 
-- 
James P. Kinney III

Every time you stop a school, you will have to build a jail. What you
gain at one end you lose at the other. It's like feeding a dog on his
own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain

http://heretothereideas.blogspot.com/