[ale] mass file modification

Mike Harrison meuon at geeklabs.com
Sun Mar 30 18:19:22 EDT 2008


Jim
> I need to update about 43k files and sed just won't cut it for this
> task.   What I need to do is replace 2 lines with 4 new ones, and the
> lines contain URLs (backslashes, brackets, etc.).  What I would like
> to do is put the new text in a file and pass it and the search text to
> some program that will modify all the files.   Any ideas on what's
> available to do that?

I've not done as much of this as I used to, back when I was fixing mail
queues and such at an ISP, but I always ended up doing it in Perl.
Often with a switch for doing just 10 files and writing the changed files
to /tmp, so I could manually verify them before bulk changing hundreds of
thousands (or more) files. I'm not as good with find/sed/awk, but one of
the reasons I did things like this in Perl is that it worked well
when there were lots of files in a single directory, where shell scripting
couldn't handle the lists of files well.
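
A minimal sketch of that "do 10 files into /tmp first" switch, in case it helps. The name dry_run and its arguments are made up for illustration, and it assumes the files are small enough to read into memory whole. It reads the directory with opendir/readdir, so a huge directory never has to fit on a shell command line:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: apply $transform (a code ref taking and returning
# the file text) to at most $limit .html files from $src_dir, writing the
# results into $out_dir. The originals are never modified.
sub dry_run {
    my ($src_dir, $out_dir, $limit, $transform) = @_;

    opendir(my $dh, $src_dir) or die "Cannot open $src_dir: $!";
    my @names = sort grep { /\.html$/ && !/^\./ } readdir($dh);
    closedir $dh;

    my $done = 0;
    for my $name (@names) {
        last if $done >= $limit;

        my $in_path = File::Spec->catfile($src_dir, $name);
        open(my $in, '<', $in_path) or die "Cannot read $in_path: $!";
        my $text = do { local $/; <$in> };    # slurp the whole file
        close $in;

        my $out_path = File::Spec->catfile($out_dir, $name);
        open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";
        print {$out} $transform->($text);
        close $out or die "Cannot close $out_path: $!";

        $done++;
    }
    return $done;    # number of files written to $out_dir
}
```

Something like dry_run($dir, '/tmp/check', 10, sub { ... }) would then leave ten changed copies in /tmp/check to diff against the originals before running the real thing.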

I also often found it easier to write and debug complex regexes in Perl
as several steps. Regexes are incredibly powerful, and it is really easy
to make them do things you didn't intend when the input has exceptions.
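
To show what I mean by "several steps", here is a rough sketch for search text full of URL punctuation. The sub name replace_literal_line is hypothetical, and it assumes the search text is a single line. Each step can be printed and checked on its own before the next one is added:

```perl
use strict;
use warnings;

# Hypothetical helper: replace every line that consists of exactly
# $needle (ignoring leading/trailing blanks) with $replacement.
sub replace_literal_line {
    my ($text, $needle, $replacement) = @_;

    # Step 1: escape the search text so '.', '?', '[' and friends match
    # only themselves instead of acting as regex operators.
    my $literal = quotemeta($needle);

    # Step 2: anchor it to a whole line, allowing stray spaces or tabs.
    my $line_re = qr/^[ \t]*${literal}[ \t]*$/m;

    # Step 3: apply it to a copy so the caller's text is left untouched.
    (my $copy = $text) =~ s/$line_re/$replacement/g;
    return $copy;
}
```

Without step 1, a pattern such as a.b would also match aXb, which is exactly the kind of surprise that is hard to spot across thousands of files.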

I don't have my old Perl scripts from those days,

but they all had something like what is below (which cleans up bad MS-HTML).
(Note: the character encoding in the regexes didn't cut and paste well into e-mail.)
-------------------------------------------------------------------------------------------
opendir(INC, $dd) or die "Cannot open $dd: $!" ;
print "Opening: $dd\n" ;
@incfiles = readdir(INC) ;
closedir INC ;
foreach (sort @incfiles) {
   next if /^\./ ;             # skip dotfiles, including . and ..
   if (/\.html$/) {            # only names that end in .html
       $file = $_ ;
       fixheader($file) ;
       #sleep 1 ;  # let the server breathe. Optional.
   }
}

sub fixheader {
  my ($file) = @_ ;
  $page = '' ;
  $body = 'F' ;
  open(IN, "<", "$dd/$file") or die "Cannot read $dd/$file: $!" ;
   while(<IN>) {
     if(/<body/) { $body = "T" ; }   # don't process headers: keep only <body onward
     if($body eq "T") {
       $page .= $_ ;
     }
   }  # end while IN
   close IN ;
   $page =~ s/\r//g ;           # deletes CRs (the literal ^M was lost in e-mail)
   $page =~ s/&#13;/[[P]]/g ;   # turns encoded CRs into <P> markers, written as [[P]]
   $page =~ s/\x95/[[li]]/g ;   # NOTE: assumes "Magic Char 95" is byte 0x95 (the Windows-1252 bullet). Turns into bullets/listed items
   $page =~ s/\n//g ;           # deletes LFs
   #lots more of these..
   open(OUT, ">", "$dd/$file.new") or die "Cannot write $dd/$file.new: $!" ;
   print OUT $page ;
   close OUT ;
}
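
For Jim's actual case (two old lines out, four new lines in, with both kept in files), the same pattern might look roughly like the sketch below. The sub names slurp, replace_block and rewrite_file are hypothetical. It assumes the old lines appear byte-for-byte identically in every file, line endings included, and it writes a .new file beside each original rather than overwriting it:

```perl
use strict;
use warnings;

# Read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot read $path: $!";
    local $/;                       # undef record separator: read it all
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Replace every literal occurrence of $old with $new. \Q...\E switches off
# regex metacharacters, so URLs, backslashes and brackets need no escaping.
sub replace_block {
    my ($text, $old, $new) = @_;
    my $count = ($text =~ s/\Q$old\E/$new/g) || 0;
    return ($text, $count);
}

# Write "$path.new" only when something changed; return the match count.
sub rewrite_file {
    my ($path, $old, $new) = @_;
    my ($text, $count) = replace_block(slurp($path), $old, $new);
    return 0 unless $count;
    open(my $out, '>', "$path.new") or die "Cannot write $path.new: $!";
    print {$out} $text;
    close $out or die "Cannot close $path.new: $!";
    return $count;
}
```

The old and new text would come from slurp('old.txt') and slurp('new.txt') (file names made up here), and rewrite_file would be called from a readdir loop like the one above. A file where it returns 0 did not contain the old lines exactly and is worth a look by hand.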



