[ale] best way to copy 3Tb of data

Jim Kinney jkinney at jimkinney.us
Tue Oct 27 11:59:39 EDT 2015


And by running his projects outside of condor, he wasn't subject to
being "scheduled" and condor couldn't properly adjust the scheduling
for everyone else.
Yep. And he's probably a senior boss type person. He's also a lazy
researcher who doesn't want to learn how to use the new tools.
Probably wrote a bunch of code in Fortran "back in the day", and the
central portion can't be parallelized because there are too many nested
loops for auto-parallelizing tools, and rewriting the mess into another
format is "inconceivable". That said, by hogging all the machines and
making the other users' projects have to wait, it justifies getting more
cluster nodes every year or so :-)
Take one of the older clusters, modernize the software and make it all
his. Show how running across all 40 nodes is faster than using 5 or 6
new ones.
On Tue, 2015-10-27 at 10:47 -0500, Todor Fassl wrote:
> Man, I wish I had your "hand" (Seinfeld reference). I'd get fired if
> I tried that.
> 
> We had another guy who kept running his code on like 5 or 6 different
> machines at a time.  I kept trying to steer him toward condor.  He 
> insisted condor wouldn't work for him. How can it not?
> 
> On 10/27/2015 10:38 AM, Jim Kinney wrote:
> > I implemented a cron job to delete scratch data created over 30 days
> > ago. That didn't go well with the people who were eating up all the
> > space and not paying for hard drives. So I gave them a way to extend
> > particular areas up to 90 days. Day 91 it was deleted. So they wrote
> > a script to copy their internet archive around every 2 weeks to keep
> > the creation date below the 30-day cutoff. So I shrunk the partition
> > of /scratch to about 10G larger than was currently in use. He
> > couldn't do his runs to graduate in time without cleaning up his
> > mess. It also pissed off other people, and they yelled at him when I
> > gave my report of who the storage hog was.
> > 
> > On October 27, 2015 11:24:48 AM EDT, Todor Fassl
> > <fassl.tod at gmail.com> wrote:
> > 
> >     I dunno. First of all, I don't have any details on what's going
> >     on on the HPC cluster. All I know is the researcher says he
> >     needs to back up his 3T of scratch data because they are telling
> >     him it will be erased when they upgrade something or other.
> >     Also, I don't know how you can have 3T of scratch data or why,
> >     if it's scratch data, it can't just be deleted. I come across
> >     this all the time though. Researchers pretty regularly generate
> >     1T+ of what they insist is scratch data.
> > 
> >     In fact, I've had this discussion with this very same
> >     researcher. He's not the only one who does this, but he happens
> >     to be the guy who I last questioned about it. You know this
> >     "scratch" space isn't backed up or anything. If the NAS burns up
> >     or if you type in the wrong rm command, it's gone. No problem,
> >     it's just scratch data. Well, then how come I can't just delete
> >     it when I want to re-do the network storage device?
> > 
> >     They get mad if you push them too hard.
> >
> >     On 10/27/2015 09:45 AM, Jim Kinney wrote:
> > 
> >         Dumb question: Why is data _stored_ on an HPC cluster? The
> >         storage for an HPC should be a separate entity entirely.
> >         It's a High Performance cluster, not a Large Storage
> >         cluster. Ideally, a complete teardown and rebuild of an HPC
> >         should have exactly zero impact on the HPC users' data. Any
> >         data kept on the local space of an HPC is purely
> >         scratch/temp data and is disposable, with the possible
> >         exception of checkpoint data, and that should be written
> >         back to the main storage and deleted once the full run is
> >         completed.
> > 
> >         On Tue, 2015-10-27 at 08:33 -0500, Todor Fassl wrote:
> > 
> >             One of the researchers I support wants to back up 3T of
> >             data to his space on our NAS. The data is on an HPC
> >             cluster on another network. It's not an on-going backup.
> >             He just needs to save it to our NAS while the HPC
> >             cluster is rebuilt. Then he'll need to copy it right
> >             back.
> > 
> >             There is a very stable 1G connection between the 2
> >             networks. We have plenty of space on our NAS. What is
> >             the best way to do the copy? Ideally, it seems we'd want
> >             to have both the ability to restart the copy if it fails
> >             part way through and to end up with a compressed archive
> >             like a tarball. Googling around tends to suggest that
> >             it's either rsync or tar. But with rsync, you wouldn't
> >             end up with a tarball. And with tar, you can't restart
> >             it in the middle. Any other ideas? Since the network
> >             connection is very stable, I am thinking of suggesting
> >             tar.
> > 
> >             tar zcvf - /datadirectory | ssh user@backup.server "cat > backupfile.tgz"
> > 
> >             If the researcher would prefer his data to be copied to
> >             our NAS as regular files, just use rsync with
> >             compression. We don't have an rsync server that is
> >             accessible to the outside world. He could use ssh with
> >             rsync, but I could set up rsync if it would be
> >             worthwhile.
> > 
> >             Ideas? Suggestions?
> > 
> > 
> >             He is going to need to copy the data back in a few
> >             weeks. It might even be worthwhile to send it via tar
> >             without uncompressing/unarchiving it on the receiving
> >             end.
> > 
> > 
> >         --
> >         James P. Kinney III
> > 
> >         Every time you stop a school, you will have to build a
> >         jail. What you gain at one end you lose at the other. It's
> >         like feeding a dog on his own tail. It won't fatten the dog.
> >         - Speech 11/23/1900 Mark Twain
> > 
> >         http://heretothereideas.blogspot.com/
> > 
> > 
> > --
> > Sent from my Android device with K-9 Mail. Please excuse my brevity.
> 
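
For the transfer itself, the tar-over-ssh pipeline quoted above does
work; its weakness is that the stream has no restart point, so a
dropped connection means starting the whole 3T over. A minimal sketch,
using the /datadirectory path and the placeholder host backup.server
from the quoted message:

    # one-shot compressed stream to the NAS; no resume if the link dies
    tar zcvf - /datadirectory | ssh user@backup.server "cat > backupfile.tgz"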
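
If he would rather land on the NAS as regular files, rsync over ssh
compresses in transit and, unlike the tar pipe, can pick up where it
left off after a failure. A sketch with the same placeholder host; the
destination path is made up, and --partial keeps partially transferred
files so a re-run resumes instead of restarting:

    # resumable, compressed-in-transit copy as regular files
    rsync -avz --partial --progress /datadirectory/ user@backup.server:/nas/scratch-backup/

Re-running the identical command after an interruption only sends what
is missing or changed.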
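
For the copy back in a few weeks, a tarball left on the NAS never
needs to be unpacked there at all; it can be streamed straight back
and extracted on the rebuilt cluster. Another sketch under the same
naming assumptions (GNU tar strips the leading "/" when creating the
archive, so extracting under / restores /datadirectory):

    # run on the HPC side: pull the stored tarball back and unpack it
    ssh user@backup.server "cat backupfile.tgz" | tar zxvf - -C /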
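
And, tangentially, the 30-day scratch sweeper described above is
usually nothing more than a find job in cron. A minimal sketch, not
necessarily what was actually deployed, assuming /scratch is the
scratch mount; it keys off modification time, which is exactly why the
copy-it-around-every-2-weeks trick quoted above defeated it:

    # daily cron job (sketch): purge scratch files untouched for 30+ days
    find /scratch -xdev -type f -mtime +30 -delete
    # clean up directories the sweep leaves empty
    find /scratch -xdev -mindepth 1 -type d -empty -delete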