[ale] perl/postgresql question

Geoffrey Myers lists at serioustechnology.com
Fri Apr 1 07:40:50 EDT 2011


Jim Kinney wrote:
> I was searching for a clue and saw the post on the pg list :-) Got all 
> excited thinking I had found a solution to point to you to! Then I 
> realized, "Hey! This looks REALLY similar..."
> 
> Maybe a single run then a grep for non-ascii characters?

Not sure what you're suggesting.  The perl code is pulling the data out 
of the database, cleaning it, putting it back in.  Are you suggesting 
pulling the data out and piping to grep?  Are you thinking there's a bug 
in the perl code?

> 
> On Thu, Mar 31, 2011 at 4:11 PM, Geoffrey Myers 
> <lists at serioustechnology.com <mailto:lists at serioustechnology.com>> wrote:
> 
>     Jim Kinney wrote:
>      > The "g" _should_ do it. Quite puzzling.
>      > Can you do a run and find an example of when it fails? Is it
>     always 0xbd?
> 
>     The sad thing is, all it tells me is which table in the database has the
>     error.  It does not identify the field.  I've posted to the Postgres
>     list to see if there's any way to narrow down the culprit, there is not.
> 
>     Once it fails, it stops loading that table, so I never get pass this
>     error.
> 
>     When I first caught this error, I was simply trying to load an SQL_ASCII
>     database dump into a UTF8 database.  That's when I wrote my
>     cleandata.pl <http://cleandata.pl>
>     code.  The code cleans 62750 records.  I just don't understand why it
>     doesn't fix this one.
> 
>      > Maybe someone in Tibet doesn't like your code :-)  (It's a Tibetan
>      > character code in utf8)
> 
>     But what is it in SQL_ASCII?  What happened is, some enterprising users
>     figured out how to enter special characters from the keyboard, say, like
>     a degrees symbol.  That goes in to the SQL_ASCII database fine, but when
>     you try to load that into the UTF8 encoded database, it pukes.
> 
>      >
>      > On Thu, Mar 31, 2011 at 2:59 PM, Geoffrey Myers
>      > <lists at serioustechnology.com <mailto:lists at serioustechnology.com>
>     <mailto:lists at serioustechnology.com
>     <mailto:lists at serioustechnology.com>>> wrote:
>      >
>      >     Jim Kinney wrote:
>      >      > should it be replaced with nothing or with a NULL? Does
>     the perl
>      >     pointer
>      >      > jump to the next character automatically when a match is
>     made thus
>      >      > skipping a character in your replace? Does a double run
>     fix it?
>      >
>      >     Missed the first question.  It should be removed, not
>     replaced with an
>      >     null.  For example I have:
>      >
>      >     This is my <UGLY CHARACTER>data
>      >
>      >     Should then be:
>      >
>      >     This is my data
>      >
>      >      >
>      >      > On Thu, Mar 31, 2011 at 12:04 PM, Geoffrey Myers
>      >      > <lists at serioustechnology.com
>     <mailto:lists at serioustechnology.com>
>     <mailto:lists at serioustechnology.com
>     <mailto:lists at serioustechnology.com>>
>      >     <mailto:lists at serioustechnology.com
>     <mailto:lists at serioustechnology.com>
>      >     <mailto:lists at serioustechnology.com
>     <mailto:lists at serioustechnology.com>>>> wrote:
>      >      >
>      >      >     So, we are trying to convert a number of postgresql
>     databases
>      >     that were
>      >      >     created with SQL_ASCII encoding to UTF8 encoding.  As
>     such, I
>      >     need to
>      >      >     strip certain characters out of the data before
>     dumping and
>      >     reloading
>      >      >     the new databases.  I'm using the following:
>      >      >
>      >      >     data =~ s/(.)/((ord($1) >= 0) && (ord($1) <= 8))
>      >      >                     || (ord($1) == 11)
>      >      >                     || ((ord($1) >= 13) && (ord($1) <= 31))
>      >      >                     || ((ord($1) >= 127)) ?"": $1/egs;
>      >      >
>      >      >     Yet I'm getting the following error, which indicates a
>      >     character that
>      >      >     should be handled by the above code is still in the data:
>      >      >
>      >      >     pg_restore: [archiver (db)] COPY failed: ERROR:
>      invalid byte
>      >     sequence
>      >      >     for encoding "UTF8": 0xbd
>      >      >
>      >      >
>      >      >     Certainly, the above code should replace the 0xbd with
>     nothing?
>      >      >
>      >      >     Any perl/postgres gurus out there?
>      >      >
>      >      >     --
>      >      >     Until later, Geoffrey
>      >      >
>      >      >     "I predict future happiness for America if they can
>     prevent
>      >      >     the government from wasting the labors of the people under
>      >      >     the pretense of taking care of them."
>      >      >     - Thomas Jefferson
>      >      >     _______________________________________________
>      >      >     Ale mailing list
>      >      >     Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
>     <mailto:Ale at ale.org>> <mailto:Ale at ale.org <mailto:Ale at ale.org>
>      >     <mailto:Ale at ale.org <mailto:Ale at ale.org>>>
>      >      >     http://mail.ale.org/mailman/listinfo/ale
>      >      >     See JOBS, ANNOUNCE and SCHOOLS lists at
>      >      >     http://mail.ale.org/mailman/listinfo
>      >      >
>      >      >
>      >      >
>      >      >
>      >      > --
>      >      > --
>      >      > James P. Kinney III
>      >      > I would rather stumble along in freedom than walk
>     effortlessly in
>      >     chains.
>      >      >
>      >      >
>      >      >
>      >      >
>      >    
>     ------------------------------------------------------------------------
>      >      >
>      >      > _______________________________________________
>      >      > Ale mailing list
>      >      > Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
>     <mailto:Ale at ale.org>>
>      >      > http://mail.ale.org/mailman/listinfo/ale
>      >      > See JOBS, ANNOUNCE and SCHOOLS lists at
>      >      > http://mail.ale.org/mailman/listinfo
>      >
>      >
>      >     --
>      >     Until later, Geoffrey
>      >
>      >     "I predict future happiness for America if they can prevent
>      >     the government from wasting the labors of the people under
>      >     the pretense of taking care of them."
>      >     - Thomas Jefferson
>      >     _______________________________________________
>      >     Ale mailing list
>      >     Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
>     <mailto:Ale at ale.org>>
>      >     http://mail.ale.org/mailman/listinfo/ale
>      >     See JOBS, ANNOUNCE and SCHOOLS lists at
>      >     http://mail.ale.org/mailman/listinfo
>      >
>      >
>      >
>      >
>      > --
>      > --
>      > James P. Kinney III
>      > I would rather stumble along in freedom than walk effortlessly in
>     chains.
>      >
>      >
>      >
>      >
>     ------------------------------------------------------------------------
>      >
>      > _______________________________________________
>      > Ale mailing list
>      > Ale at ale.org <mailto:Ale at ale.org>
>      > http://mail.ale.org/mailman/listinfo/ale
>      > See JOBS, ANNOUNCE and SCHOOLS lists at
>      > http://mail.ale.org/mailman/listinfo
> 
> 
>     --
>     Until later, Geoffrey
> 
>     "I predict future happiness for America if they can prevent
>     the government from wasting the labors of the people under
>     the pretense of taking care of them."
>     - Thomas Jefferson
>     _______________________________________________
>     Ale mailing list
>     Ale at ale.org <mailto:Ale at ale.org>
>     http://mail.ale.org/mailman/listinfo/ale
>     See JOBS, ANNOUNCE and SCHOOLS lists at
>     http://mail.ale.org/mailman/listinfo
> 
> 
> 
> 
> -- 
> -- 
> James P. Kinney III
> I would rather stumble along in freedom than walk effortlessly in chains.
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo


-- 
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson


More information about the Ale mailing list