[ale] Another Large File/PERL/Awk/Sed question...

Björn Gustafsson bg-ale at bjorng.net
Tue Dec 1 17:36:37 EST 2009


With a really gigantic file, in the special case where the replacement
is exactly as long as the text it replaces (i.e. the file is the same
size before and after the edit), you can use seek() to overwrite the
bytes in place, which is a lot quicker.  On a test with a 500 MB file,
this takes 0.003 sec elapsed time versus 14.2 sec for the sed -i
approach. Of course, this *only* works if the length stays the same: a
longer first line overwrites the start of the second line, and a
shorter one leaves stray bytes from the old line behind. Either way
your file will be irrevocably corrupted.

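#!/usr/bin/perl -w

use strict;

# "+<" opens for read/write without truncating the file.
open(my $fh, "+<", "test.file") or die "open: $!";

my $firstLine = <$fh>;

# If you change the number of bytes, data is corrupted.
$firstLine =~ s/column4;column4/column4;column5/;

# Rewind to the start and overwrite the first line in place.
seek($fh, 0, 0) or die "seek: $!";

print($fh $firstLine);

close($fh) or die "close: $!";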
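
If you want a guard against the corruption case, here is a slightly
more defensive sketch of the same seek() idea. It refuses to run unless
the old and new strings are the same length, and it overwrites only the
matching bytes rather than the whole first line. The subroutine name
replace_in_first_line and its arguments are made up for illustration;
they are not from any module, and I'm assuming plain single-byte text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: overwrite the first occurrence of $old on the first
# line of $path with $new, in place. Dies unless both strings have the
# same length. Returns 1 if it wrote, 0 if $old was not on the first line.
sub replace_in_first_line {
    my ($path, $old, $new) = @_;

    die "replacement must be the same length as the original\n"
        unless length($old) == length($new);

    # "+<" opens for reading and writing without truncating the file;
    # ":raw" keeps offsets in bytes.
    open(my $fh, '+<:raw', $path) or die "cannot open $path: $!";

    my $line = <$fh>;
    die "$path is empty\n" unless defined $line;

    # Literal search, so regex metacharacters in $old need no escaping.
    my $pos = index($line, $old);
    if ($pos < 0) {
        close($fh);
        return 0;
    }

    # Seek to the match and overwrite only those bytes; the rest of the
    # file is never read or rewritten.
    seek($fh, $pos, 0) or die "cannot seek in $path: $!";
    print {$fh} $new or die "cannot write to $path: $!";
    close($fh) or die "cannot close $path: $!";
    return 1;
}
```

For the file in question the call would look something like
replace_in_first_line("test.file", "column4;column4", "column4;column5").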


On Tue, Dec 1, 2009 at 4:59 PM, JK <jknapka at kneuro.net> wrote:
> Richard Bronosky wrote:
>> in sed, use '[range/line number]{commands}' to limit where the edits
>> are made. Example:
>> mount |sed '1{s/type/TEST/}'
>>
>> what you want is:
>> sed -i.bak '1{s/column4;column4/column4;column5/}' filename
>
> Interesting...
>
> For single commands, address-space-command also works, so:
>
>   sed -i -e '1 s/column4;column4/column4;column5/' filename
>
> would work as well as the {} version.  My previous attempt
> was wrong though, in that 0 is not a valid address in at
> least some versions of sed. In fact, the man page for 4.2.1
> says 0 IS valid, and means "really for sure start matching
> at the very first line" (there are some circumstances where
> 1 will NOT match, to wit, if line 1 matches the regexp
> specified as the ending address); but 4.2.1 does not actually
> accept that syntax.
>
> -- JK
>
>> A backup filename.bak will be created with that command. drop the
>> -i.bak if you don't want it.
>>
>>
>> On Tue, Dec 1, 2009 at 4:06 PM, Bob Kruger <bkruger at mindspring.com> wrote:
>>> All;
>>>
>>> Thanks to all who assisted me with my earlier question on deleting the semicolon from the end of a line.  I have another one that may be a bit stickier.
>>>
>>> Again I have a large data file in text format; this one is 3.2GB.  Same as before, the fields are semicolon delimited.  The first line of the file contains the column names.  However, I have two columns that were inadvertently given the same column name.
>>>
>>> Example:
>>>
>>> column1;column2;column3;column4;column4;column6;column7....
>>>
>>> I would like to change the second instance of column4 to column5 on the first line of the file.  I thought it would be simple to fire up vi and just do a simple text edit.  The edit part was simple, but the saving of the file is taking hours.
>>>
>>> Any thoughts or ideas using PERL, Awk, or Sed?
>>>
>>> Thanks in advance for any assistance.
>>>
>>> V/r
>>>
>>> Bob

-- 
Björn Gustafsson


