[ale] hmm. yer never too old to trip on Grep Reg Expressions

Jim Kinney jim.kinney at gmail.com
Sun Sep 25 10:10:03 EDT 2016


I've noticed that different distros run grep or egrep based on how it's called,
and some always call egrep. That will fubar badly when moving between
environments. I've gotten in the habit of "grep for text, egrep for
everything else".
So "what file has string foo?" is a grep for me. I call egrep explicitly
for regex.

I would love to see GNU remove regex capability from simple, old grep
completely. Nothing but exact string matching. Then all regex usage is in
egrep. Sort of baby grep vs. grown-up grep. Hmm. Maybe an sgrep version
(simple grep or string grep), with the only options being to display substring
matches and to limit the number of characters to drop from the string for a
match.

If I dig in man grep, er, info (damn!), I'll bet that is already an option.
I think it's possible to brew coffee using grep, sed, awk and emacs.
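
For what it's worth, the literal-only mode wished for above does appear to exist already: grep takes -F (fixed strings, the old fgrep behaviour), and -E selects the extended syntax that egrep uses. A minimal sketch, assuming a throwaway file (sample.txt and its contents are made up for illustration, not from the thread):

```shell
# Hypothetical sample file, not from the thread.
tmp=$(mktemp -d)
printf '%s\n' 'a.c' 'abc' 'a+c' > "$tmp/sample.txt"

# -F: the pattern is a literal string, so '.' matches only a dot.
fixed=$(grep -F 'a.c' "$tmp/sample.txt")

# Default (basic regex): '.' matches any single character.
basic=$(grep 'a.c' "$tmp/sample.txt")

# -E: extended regex, the dialect egrep uses; '+' is a quantifier here.
extended=$(grep -E 'ab+c' "$tmp/sample.txt")

printf 'fixed:\n%s\nbasic:\n%s\nextended:\n%s\n' "$fixed" "$basic" "$extended"
rm -rf "$tmp"
```

Spelling out -F or -E on every call is one way to get the same behaviour regardless of how a given distro wires up grep and egrep.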

On Sep 25, 2016 9:15 AM, "DJ-Pfulio" <djpfulio at jdpfu.com> wrote:

Good explanation Charles, but something looks funny.   I created some files
to test:

 $ touch aaaa bb ccc i

# Then tried the regex:
 $ ls -1 |grep "[abc]+"
      <nothing returned>


 $ ls -1 |grep "[abc]"
 aaaa
 bb
 ccc


 $ ls -1 |grep "[abc]*"
 aaaa
 bb
 ccc
 i

# But if we used egrep (or grep -E if you like)
 $ ls -1 |egrep "[abc]+"
 aaaa
 bb
 ccc

Gotta know which regex engine is being used. ;)
In perl, I've used numbers after an [] group to say exactly how many of _those
things_ I needed. That didn't work with grep/egrep. Don't know why, just know it
didn't.
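
A likely explanation for both surprises, sketched with the same four names: plain grep uses basic regular expressions (BRE), where an unescaped + is a literal plus sign and a repeat count is written \{n\}; egrep (grep -E) uses extended regular expressions (ERE), where + and {n} work unescaped, as they do in perl. The names() helper below is made up for this sketch and feeds the names through printf instead of ls, so no files are created.

```shell
# Same four names created with touch above, fed through printf instead of ls.
names() { printf '%s\n' aaaa bb ccc i; }

# BRE: unescaped '+' is a literal plus sign, so nothing matches.
# '|| true' keeps the script going when grep finds nothing (exit status 1).
bre_plus=$(names | grep '[abc]+' || true)

# ERE: '+' means "one or more of the preceding item".
ere_plus=$(names | grep -E '[abc]+')

# '*' allows zero repetitions, so every line matches, including "i".
star=$(names | grep '[abc]*')

# Exact counts: \{n\} in BRE, {n} in ERE. Anchored so the whole name must fit.
bre_count=$(names | grep '^[abc]\{3\}$')
ere_count=$(names | grep -E '^[abc]{3}$')

printf 'ERE plus:\n%s\nBRE count: %s\nERE count: %s\n' \
  "$ere_plus" "$bre_count" "$ere_count"
```

Without the ^ and $ anchors a count only sets a minimum, because grep reports any line that contains a match somewhere.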

 $ ls -1 |grep "[abc]c"
 ccc
# and
 $ ls -1 |grep "^[abc]c"
 ccc

Should also mention that piping ls output into a grep is just to avoid bash
globbing so grep is really used.  That ls option is a -1 (one), not l (el).

For a few years, I had to create some very nasty regexes to match patterns in govt
documents ... so we could hyperlink the ToC and Index entries into the document
at the correct page/paragraph. Nothing like experience to teach. About 200 docs
per flight, so lots of variability.  Adobe Type 3 fonts really screwed with our
regexes since they aren't really letters to the computer. ;(

On 09/23/2016 09:01 AM, Charles Shapiro wrote:
>
> Ah, regex golf.  Try 'def.*buff.*for.*ALTPLAN'  Use "grep -i" to ignore case.
> Your initial regexp used *file* (glob) syntax, where "*" means any characters,
> any length.  In the proper formal dialects, "*" merely means any number
> (including zero) of the preceding RE, and the "." means any character. Hence,
> "foo*" in the shell matches "fooa","foob", et cetera.  But in regex, it matches
> only "fo", "foo", "fooo", "foooooo", et cetera. Watch out for quoting in the
> shell also; that's why I used single-quotes.  Knowing just a few REs can carry
> you a surprising distance.
>  [abc] matches a single character: a, b, or c.  So "[abc]+" matches aaaa, bb, or
> ccc but not i.
>
>
> This worked for me on the following file:
>
> define buffer snort for ALTPLAN
> DEFINE BUFFER BOOF for ALTPLAN
> FOO
>
> !:/home/cshapiro/Mapping_Contracts/forsythco> grep -i '^def.*buf.*for ALTPLAN' foo.txt
> define buffer snort for ALTPLAN
> DEFINE BUFFER BOOF for ALTPLAN
>
> For extra fun, try the regex golf site ( http://www.regex.alf.nu/ ).
>
> -- CHS
>
>
> On Thu, Sep 22, 2016 at 8:35 PM, DJ-Pfulio <DJPfulio at jdpfu.com
> <mailto:DJPfulio at jdpfu.com>> wrote:
>
>     I'd use perl. Trivial to read a file, find the lines matching any
>     complex regex you like, back up 3 lines and print the following 14 lines.
>     Don't forget to handle lines that happen inside the group to be
>     exported. Would be good to show file:linenum:LINE so it is clear -
>     perhaps highlight the actual line with << >> - idunno.
>
>     I like Leam's regex except the leading ^ and trailing $ - these things
>     don't need to start in col-1 or end of line. Otherwise, probably
>     restrictive enough to minimize unwanted output.
>
>     On 09/22/2016 07:30 PM, Leam Hall wrote:
>     > Why not "^def*buff*altplan$"? Then grep -v out things you don't want.
>     >
>     > On 09/22/16 14:46, Neal Rhodes wrote:
>     >> So, I need to look in about a bazillion source files for variants of
>     >>
>     >>     DEFINE BUFFER SNORT FOR ALTPLAN.
>     >>     Define Buffer Blech for AltPlan.
>     >>     Def    Buff   Blurf for AltPlan.
>     >>     Def Buff Blurf for AltPlan.
>     >>     def buff blurf for altplan.
>     >>     define buff blurf for altplan.
>     >>     define                      buffer                   blorf for altplan.
>     >>     define  new shared buffer                   blorf for altplan.
>     >>
>     >> And grab 3 lines before, 10 lines afterwards, source file and line#.
>     >>
>     >> I was thinking this would do it:
>     >>
>     >>     grep -i -B 3 -A 10 -H -n -r -f buf-grep.inp * > buf.grep.out
>     >>
>     >> Where buf-grep.inp was
>     >>
>     >>     def*buff*for*ALTPLAN
>     >>
>     >>     def*buff*for*ARM
>     >>
>     >>     def*buff*for*ARMNOTE
>     >>
>     >> Alas it is not thus, and the more I study the reg exp notes the more I
>     >> see the error of my ways, and the less I see an expression that would
>     >> work.
>     >>

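Pulling the thread together for the original question: one way the pattern file might look once each glob-style * is replaced by the regex .* suggested above, with the original grep options kept as they were. This is only a sketch under stated assumptions; the src directory, the demo.p file and its contents are invented for the demonstration.

```shell
# Throwaway directory with a made-up source file; src/demo.p is hypothetical.
work=$(mktemp -d)
mkdir "$work/src"
printf '%s\n' \
  'line one' \
  'Def    Buff   Blurf for AltPlan.' \
  'define  new shared buffer blorf for altplan.' \
  'defer nothing' \
  'DEFINE BUFFER SNORT FOR ARMNOTE.' > "$work/src/demo.p"

# Pattern file in regex syntax: '.*' is "any run of characters",
# the job that '*' alone does in a shell glob.
printf '%s\n' \
  'def.*buff.*for.*ALTPLAN' \
  'def.*buff.*for.*ARM' > "$work/buf-grep.inp"

# The original options, pointed at the sample directory:
# -B 3 / -A 10 give context, -H -n give file name and line number.
full=$(cd "$work" && grep -i -B 3 -A 10 -H -n -r -f buf-grep.inp src)

# Matching lines only (no file names), to check the patterns themselves.
hits=$(cd "$work" && grep -i -h -r -f buf-grep.inp src)

printf '%s\n---\n%s\n' "$full" "$hits"
rm -rf "$work"
```

The ARM pattern also matches lines that mention ARMNOTE, so a separate ARMNOTE entry looks unnecessary in this form.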