[ale] Another Large File/PERL/Awk/Sed question...

Richard Bronosky Richard at Bronosky.com
Tue Dec 1 17:44:52 EST 2009


Yeah, I was discussing the merits of hexedit and ed with a coworker
for this specific edge case.
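
For the same edge case, here is a minimal shell sketch of a same-length
overwrite using dd with conv=notrunc. It is an illustration, not a tested
recipe for the 3.2GB file: demo.csv is a made-up stand-in, and it assumes
a grep that supports -b and -o together (GNU grep does). Try it on a copy
first.

```shell
#!/bin/sh
# Sketch: overwrite bytes in place without rewriting the rest of the file.
# demo.csv is a hypothetical stand-in for the real data file.
set -eu

file=demo.csv
printf 'column1;column2;column3;column4;column4;column6\n1;2;3;4;5;6\n' > "$file"

old='column4;column4'
new='column4;column5'

# Refuse to continue unless both strings have the same length
# (plain ASCII assumed, so characters equal bytes).
[ "${#old}" -eq "${#new}" ] || { echo "length mismatch" >&2; exit 1; }

# Find the 0-based byte offset of the match on line 1.
# grep -b -o prints "offset:match" for each match.
offset=$(head -n 1 "$file" | grep -b -o "$old" | head -n 1 | cut -d: -f1)
[ -n "$offset" ] || { echo "pattern not found in header" >&2; exit 1; }

# conv=notrunc keeps dd from truncating the file; seek= is counted in
# blocks of bs bytes, so bs=1 makes it a byte offset.
printf '%s' "$new" | dd of="$file" bs=1 seek="$offset" conv=notrunc 2>/dev/null

head -n 1 "$file"
```

Only the 15 bytes at the match are written, so the cost should not depend
on the size of the file.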

2009/12/1 Björn Gustafsson <bg-ale at bjorng.net>:
> With a really gigantic file, in the special case when you are truly
> only replacing a byte (i.e. the beginning state of the file is the
> same size as the end state), you can use seek() magic to make this a
> lot quicker.  On a test with a 500 MB file, this takes 0.003 sec
> elapsed time versus 14.2 sec for the sed -i approach. Of course, this
> *only* works if the file size stays the same: otherwise your file will
> be irrevocably corrupted.
>
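> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> # Open read/write without truncating the file.
> open(my $fh, '+<', 'test.file') or die "Cannot open test.file: $!";
>
> my $firstLine = <$fh>;
> my $oldLength = length($firstLine);
>
> # If you change the number of bytes, data is corrupted.
> $firstLine =~ s/column4;column4/column4;column5/;
> die "Line length changed; refusing to write\n" if length($firstLine) != $oldLength;
>
> # Rewind and overwrite the first line in place.
> seek($fh, 0, 0) or die "Cannot seek: $!";
> print {$fh} $firstLine;
>
> close($fh) or die "Cannot close test.file: $!";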
>
>
> On Tue, Dec 1, 2009 at 4:59 PM, JK <jknapka at kneuro.net> wrote:
>> Richard Bronosky wrote:
>>> In sed, use '[range/line number]{commands}' to limit where the edits
>>> are made. Example:
>>> mount |sed '1{s/type/TEST/}'
>>>
>>> What you want is (the trailing 2 replaces only the second match on
>>> the line):
>>> sed -i.bak '1{s/column4/column5/2}' filename
>>
>> Interesting...
>>
>> For single commands, address-space-command also works, so:
>>
>>   sed -i -e '1 s/column4;column4/column4;column5/' filename
>>
>> would work as well as the {} version.  My previous attempt
>> was wrong though, in that 0 is not a valid address in at
>> least some versions of sed. In fact, the man page for 4.2.1
>> says 0 IS valid, and means "really for sure start matching
>> at the very first line" (there are some circumstances where
>> 1 will NOT match, to wit, if line 1 matches the regexp
>> specified as the ending address); but 4.2.1 does not actually
>> accept that syntax.
>>
>> -- JK
>>
>>> A backup filename.bak will be created with that command. Use plain
>>> -i instead of -i.bak if you don't want it.
>>>
>>>
>>> On Tue, Dec 1, 2009 at 4:06 PM, Bob Kruger <bkruger at mindspring.com> wrote:
>>>> All;
>>>>
>>>> Thanks to all who assisted me with my earlier question on deleting the semicolon from the end of a line.  I have another one that may be a bit stickier.
>>>>
>>>> Again I have a large data file in text format; this one is 3.2GB.  Same as before, the fields are semicolon-delimited.  The first line of the file contains the column names.  However, I have two columns that were inadvertently given the same column name.
>>>>
>>>> Example:
>>>>
>>>> column1;column2;column3;column4;column4;column6;column7....
>>>>
>>>> I would like to change the second instance of column4 to column5 on the first line of the file.  I thought it would be simple to fire up vi and just do a simple text edit.  The edit part was simple, but the saving of the file is taking hours.
>>>>
>>>> Any thoughts or ideas using PERL, Awk, or Sed?
>>>>
>>>> Thanks in advance for any assistance.
>>>>
>>>> V/r
>>>>
>>>> Bob
>
> --
> Björn Gustafsson
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo
>

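To sanity-check the sed addressing quoted above before pointing it at the
real file, a small sketch like this may help. The file names are made up
for illustration, and it writes to new files instead of using -i so it
does not depend on any one sed's -i syntax.

```shell
#!/bin/sh
# Sketch: confirm that a line-1 address leaves data rows untouched.
# sample.csv is a hypothetical stand-in for the large data file.
set -eu

printf 'column1;column4;column4;column6\ncolumn4;column4;x;y\n' > sample.csv

# "1" restricts the command to the first line; the trailing "2" flag
# replaces only the second match on that line.
sed '1 s/column4/column5/2' sample.csv > by_flag.csv

# Equivalent here: match the duplicated pair explicitly.
sed '1 s/column4;column4/column4;column5/' sample.csv > by_pair.csv

# Both outputs should be identical, and row 2 should be unchanged.
cmp by_flag.csv by_pair.csv && cat by_flag.csv
```

Either form still reads and writes every line of the file, which is why
the sed route is so much slower than the seek() approach on large files.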


-- 
.!# RichardBronosky #!.


