sed expressions

Vincent Lefevre vincent at vinc17.org
Mon Jan 28 12:49:26 CET 2008


Hello,

On 2008-01-28 10:43:58 +0100, Christophe Martin wrote:
> $ touch a b c d A B C D Z  z
> $ ls | cat
> a
> A
> b
> B
> c
> C
> d
> D
> z
> Z

"ls -1" is more elegant than "ls | cat":

vin:~tmp/test> LC_COLLATE=fr_FR ls -1
a
A
b
B
c
C
d
D
z
Z

and instead of creating files, you can use sort:

vin:~> printf "%s\n" a b c d A B C D Z z | LC_COLLATE=fr_FR sort
a
A
b
B
c
C
d
D
z
Z
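
For comparison, here is a minimal sketch of the same input sorted in the C
locale, where collation follows byte values, so every uppercase ASCII letter
comes before the lowercase ones. The function name sort_c is only an
illustration and does not come from the original thread:

```shell
# Hypothetical helper: sort stdin by byte value, whatever the user's locale is.
sort_c() {
    LC_ALL=C sort
}

printf "%s\n" a b c d A B C D Z z | sort_c
# Prints A B C D Z a b c d z, one per line.
```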

> as for [\-~], it is total madness in French; without translation,
> on the other hand, it works fine.
> $ echo 'Aa b- c~ dZ' | env LC_ALL=C sed -e 's/[\-~]//g'
> A -  Z
> $ echo 'Aa b- c~ dZ' | sed -e 's/[\-~]//g'
> Aa b- c dZ

It seems to delete the following characters:

vin:~> perl -e 'for (32..126, 160..255) { printf "%3d <%c%c>\n", $_, $_, $_ }' | LC_COLLATE=fr_FR sed -e 's/[\-~]//' | grep ' <.>$'
 35 <#>
 37 <%>
 38 <&>
 43 <+>
 92 <\>
 94 <^>
 96 <`>
126 <~>
168 <¨>
177 <±>
180 <´>

The specification given in the regex(7) man page:

  A bracket expression is a list of characters enclosed  in  `[]'.   It
  normally  matches any single character from the list (but see below).
  If the list begins with `^', it matches any single character (but see
  below)  not from the rest of the list.  If two characters in the list
  are separated by `-', this is shorthand for the full range of charac-
  ters  between  those  two  (inclusive) in the collating sequence, for
  example, `[0-9]' in ASCII matches any decimal  digit.   It  is  ille-
  gal(!)  for  two  ranges  to share an endpoint, for example, `a-c-e'.
  Ranges are very collating-sequence-dependent, and  portable  programs
  should avoid relying on them.
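
Following that advice, a range can often be avoided altogether. The sketch
below assumes that the intent of [\-~] was to delete only the literal
characters '-' and '~' (that is my assumption; the thread does not say so).
Inside a POSIX bracket expression a backslash is an ordinary character, so
"\-~" is read as the range from '\' to '~'. The function name strip_dash_tilde
is hypothetical:

```shell
# Hypothetical helper: delete literal '~' and '-' from stdin.
# A '-' placed last in the list is literal, so no range is involved
# and the result does not depend on the collating sequence.
strip_dash_tilde() {
    sed -e 's/[~-]//g'
}

printf '%s\n' 'Aa b- c~ dZ' | strip_dash_tilde
# Prints: Aa b c dZ

# Forcing the C locale keeps the range form usable, but it then means
# "every byte from \ (92) to ~ (126)", lowercase letters included.
printf '%s\n' 'Aa b- c~ dZ' | LC_ALL=C sed -e 's/[\-~]//g'
```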

> If someone can explain the role of the LANGUAGE variable
> (a GNU extension) to me, I'm all ears.

I believe its only purpose is to allow a list of languages to be specified.
Not everything supports it.

"info libc" says:

   This looks very familiar.  With the exception of the `LANGUAGE'
environment variable this is exactly the lookup order the `setlocale'
function uses.  But why introducing the `LANGUAGE' variable?

   The reason is that the syntax of the values these variables can have
is different to what is expected by the `setlocale' function.  If we
would set `LC_ALL' to a value following the extended syntax that would
mean the `setlocale' function will never be able to use the value of
this variable as well.  An additional variable removes this problem
plus we can select the language independently of the locale setting
which sometimes is useful.

   While for the `LC_xxx' variables the value should consist of exactly
one specification of a locale the `LANGUAGE' variable's value can
consist of a colon separated list of locale names.  The attentive
reader will realize that this is the way we manage to implement one of
our additional demands above: we want to be able to specify an ordered
list of language.
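
As a rough illustration of the difference described above, the sketch below
prints the candidate message languages in priority order (LANGUAGE, then
LC_ALL, then LC_MESSAGES, then LANG). It is a simplification, not how libc
implements the lookup, and the function name message_languages is
hypothetical:

```shell
# Hypothetical helper: list the candidate message languages, most preferred first.
# Simplification: real gettext also ignores LANGUAGE when the locale is "C".
message_languages() {
    if [ -n "${LANGUAGE:-}" ]; then
        # LANGUAGE is the only variable that may hold a colon-separated list.
        printf '%s\n' "$LANGUAGE" | tr ':' '\n'
    elif [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_MESSAGES:-}" ]; then
        printf '%s\n' "$LC_MESSAGES"
    else
        printf '%s\n' "${LANG:-C}"
    fi
}

# Run in a subshell so the assignment does not leak into the caller.
( LANGUAGE=fr:en; message_languages )
# Prints fr, then en.
```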

> In the same vein of ç&é"!(çà"!è!è'"§!&"#(!§ç!&"à!ç§&@##: in French,
> numbers use a decimal comma, not a decimal point. You always get caught
> once or twice with awk (and perl?) on numbers such as 1254.12 being
> truncated to 1254 because the system expected, in French, 1254,12

FYI, MPFR accepts both forms: the one with a point, and the one with the
decimal_point of the current locale.
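
For scripts, a defensive habit is to force the C locale around numeric
processing, so that the point is always the decimal separator. Whether a given
awk honours the locale's decimal separator depends on the implementation and
its options, so take this as a minimal sketch; the function name add_decimal
is hypothetical:

```shell
# Hypothetical helper: add two decimal numbers written with a point,
# forcing the C locale so that the point is the decimal separator.
add_decimal() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a + b }'
}

add_decimal 1254.12 1
# Prints 1255.12
```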

-- 
Vincent Lefèvre <vincent at vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)

