[ale] Archiving directories/files with "compressed" mirror version

Michael B. Trausch mike at trausch.us
Thu Aug 14 01:02:52 EDT 2008


On Wed, 2008-08-13 at 14:48 -0400, Ed L. Cashin wrote:
> Do you have lots of free space and resources?
> 
> You could do that with,
> 
>   cp -a foo foo.archive
>   find foo.archive -type f -exec bzip2 '{}' ';'

This approach is nice and simple.  It will, of course, only work if you
have approximately 200% of the data size available.  (File names with
spaces, tabs, etc. are not a problem here, because find hands each name
to bzip2 as a single argument.)
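
Whether the free space is there can be checked before copying.  A
minimal sketch, assuming bash, awk, and the POSIX -k/-P options of du
and df; the function name has_room_for_copy is made up for this
example, and it only compares the current size of the source against
the free space on the destination filesystem:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the filesystem holding DEST_PARENT has at
# least as many free KiB as SRC currently occupies.
# Returns 0 if there is room, 1 if not, 2 if the sizes could not be read.
has_room_for_copy() {
    local src="$1" dest_parent="${2:-.}"
    local need_kb free_kb
    # Total size of the source tree, in KiB.
    need_kb=$(du -sk -- "$src" | awk '{ print $1 }')
    # "Available" column of the POSIX-format df output, in KiB.
    free_kb=$(df -Pk -- "$dest_parent" | awk 'NR == 2 { print $4 }')
    # Bail out if either command failed to produce a number.
    [[ $need_kb =~ ^[0-9]+$ && $free_kb =~ ^[0-9]+$ ]] || return 2
    [ "$free_kb" -ge "$need_kb" ]
}
```

It could be used as: has_room_for_copy foo . && cp -a foo foo.archive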

There is a slightly more complex way to do it such that you don't have
the requirement for the 200% data storage, though.  Assuming that ./foo/
is the directory that needs to be copied and compressed with the tree
preserved:

$ find foo -type d | sed 's/^foo/foo.new/' | xargs mkdir -p
$ for FILE in `find foo -type f`; do \
    NEWFILE=$(echo $FILE | sed 's/^foo/foo.new/'); \
    cat $FILE | bzip2 > $NEWFILE; done

However, this is still not friendly to filenames that contain spaces:
the shell splits the output of the backquoted 'find' at whitespace
before 'for' ever sees it, and xargs splits its input the same way.
Reading one line at a time with a 'while' loop and the 'read' builtin
avoids that, but it gets a little large to enter on the command line:

$ find foo -type d | sed 's/^foo/foo.new/' | \
    while IFS= read -r NEWDIR; do mkdir -p "$NEWDIR"; done
$ find foo -type f | while IFS= read -r SOURCE_FILE; do \
    DEST_FILE=$(echo "${SOURCE_FILE}.bz2" | sed 's/^foo/foo.new/'); \
    cat "$SOURCE_FILE" | bzip2 > "$DEST_FILE"; done
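
The difference between the two loop styles is easy to see in a scratch
directory.  A small sketch, assuming bash; count_for and count_while are
hypothetical helper names that just count how many times each kind of
loop body runs for the files under a directory:

```shell
#!/bin/bash
# Count iterations of the backtick 'for' loop: the unquoted command
# substitution is split on spaces, tabs and newlines.
count_for() {
    local n=0 f
    for f in `find "$1" -type f`; do n=$((n + 1)); done
    echo "$n"
}

# Count iterations of 'while read': one iteration per line of find output.
count_while() {
    local n=0 f
    while IFS= read -r f; do n=$((n + 1)); done < <(find "$1" -type f)
    echo "$n"
}
```

For a directory holding a single file named "a b.txt", count_for sees
two words while count_while sees one line.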

To make it easier to read, here it is in the form of a shell script,
generalized to work for any user-specified directory.  Note that it will
not handle a filename that contains a newline character, but this is a
rare occurrence; if you have filenames with newlines in them, you should
probably rename them anyway:

---------------BEGIN
#!/bin/bash
#
# Duplicate the specified directory as 'directory.compressed', but with
# all the files in the tree compressed via bzip2.
#
# by Michael Trausch, 2008.  Public domain.
#
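[ -d "$1" ] || { echo "usage: $0 directory" >&2; exit 1; }

# Strip a trailing slash so 'foo/' maps to 'foo.compressed'.
SRCDIR="${1%/}"
DESTDIR="${SRCDIR}.compressed"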

function ErrorExit {
    # Report bzip2's exit status and stop with that same status.
    printf " Failed.\nbzip2 returned an error (%d)\n" "$1"
    exit "$1"
}

# mirror the directory tree, first.
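# (Prefix removal is used rather than sed, so that regex characters in
# the directory name cannot break the substitution.)
find "$SRCDIR" -type d | while IFS= read -r DIR; do
    mkdir -p "${DESTDIR}${DIR#"$SRCDIR"}"
done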

# now, for each of the files, compress them and put them in the new
# tree.
find "$SRCDIR" -type f | while IFS= read -r SOURCE_FILE; do
    # Swap the source prefix for the destination and append .bz2.
    DEST_FILE="${DESTDIR}${SOURCE_FILE#"$SRCDIR"}.bz2"
    printf "Compressing %s to %s..." "$SOURCE_FILE" "$DEST_FILE"
    bzip2 -c < "$SOURCE_FILE" > "$DEST_FILE" || ErrorExit $?
    printf " done!\n"
done
---------------END
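
The newline limitation mentioned above can be avoided by passing names
between find and the loop separated by NUL bytes instead of newlines.
A rough sketch of that variation, not a drop-in replacement for the
script: it assumes bash (for read -d '' and process substitution) and a
find that supports -print0, such as GNU or BSD find.  The function name
mirror_bz2 and its two-argument form are invented for the example.

```shell
#!/bin/bash
# Sketch: mirror SRC into DEST, compressing every regular file with bzip2.
# NUL-delimited paths make spaces, tabs and newlines in names harmless.
mirror_bz2() {
    local src="${1%/}" dest="${2%/}" path rel
    # Recreate the directory tree first.
    while IFS= read -r -d '' path; do
        mkdir -p -- "$dest${path#"$src"}" || return 1
    done < <(find "$src" -type d -print0)
    # Then compress each file into the matching place in the new tree.
    while IFS= read -r -d '' path; do
        rel="${path#"$src"}"
        bzip2 -c < "$path" > "$dest$rel.bz2" || return 1
    done < <(find "$src" -type f -print0)
}
```

Usage would look like: mirror_bz2 test test.compressed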

Here are the results of running this on a (slightly redacted) version of
my ${HOME}:

Thursday, 2008-Aug-14 at 00:57:13 - mbt at zest - Linux v2.6.24
Ubuntu Hardy:[1-120/566-0]:~/tst> tree
.
|-- test
|   |-- 100-pushups.ods
|   |-- Doctorow, C. - Little Brother.pdf
|   |-- FertigoProRegular
|   |   |-- Ferigo_Pro.pdf
|   |   |-- Fertigo_PRO.otf
|   |   `-- license_agreement.txt
|   |-- GalaxiumContactList1.png
|   |-- Hegadekatte_2006_PhD-Thesis.pdf
|   |-- Router configuration.conf
|   |-- UCAM-CL-TR-577.pdf
|   |-- WIU Grad App.pdf
|   |-- bubbltre.zip
|   |-- fcgi-lib-description.odt
|   |-- letter to parc.odt
|   `-- ll.odt
`-- test.compressed
    |-- 100-pushups.ods.bz2
    |-- Doctorow, C. - Little Brother.pdf.bz2
    |-- FertigoProRegular
    |   |-- Ferigo_Pro.pdf.bz2
    |   |-- Fertigo_PRO.otf.bz2
    |   `-- license_agreement.txt.bz2
    |-- GalaxiumContactList1.png.bz2
    |-- Hegadekatte_2006_PhD-Thesis.pdf.bz2
    |-- Router configuration.conf.bz2
    |-- UCAM-CL-TR-577.pdf.bz2
    |-- WIU Grad App.pdf.bz2
    |-- bubbltre.zip.bz2
    |-- fcgi-lib-description.odt.bz2
    |-- letter to parc.odt.bz2
    `-- ll.odt.bz2

4 directories, 28 files
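
Beyond eyeballing the tree, the mirror can be checked by decompressing
each .bz2 file and comparing it against its source.  A sketch under the
same assumptions as before (bash, a find with -print0, bzip2 and cmp on
the PATH); verify_bz2_mirror is a made-up name, not part of the attached
script:

```shell
#!/bin/bash
# Hypothetical checker: succeed only if every regular file under SRC has a
# .bz2 counterpart under DEST that decompresses to identical bytes.
# The body runs in a subshell so that pipefail does not leak to the caller.
verify_bz2_mirror() (
    set -o pipefail
    src="${1%/}" dest="${2%/}" status=0
    while IFS= read -r -d '' path; do
        bz="$dest${path#"$src"}.bz2"
        # Fail on a missing file, a bzip2 error, or differing contents.
        if ! { [ -f "$bz" ] && bzip2 -dc < "$bz" 2>/dev/null | cmp -s - "$path"; }; then
            printf 'MISMATCH: %s\n' "$path" >&2
            status=1
        fi
    done < <(find "$src" -type f -print0)
    exit "$status"
)
```

For the tree above that would be:
verify_bz2_mirror test test.compressed && echo "mirror OK"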

For convenience, the shell script is attached.  Consider it public
domain.

	--- Mike

-- 
My sigfile ran away and is on hiatus.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dupdir-bz2
Type: application/x-shellscript
Size: 804 bytes
Desc: not available
Url : http://mail.ale.org/pipermail/ale/attachments/20080814/02415344/attachment-0002.bin 


More information about the Ale mailing list