Article 11356 of comp.lang.perl:
From: uwe@mpi-sb.mpg.de (Uwe Waldmann)
Newsgroups: comp.lang.perl
Subject: Re: Redefining \w and \b possible?
Date: 9 Mar 1994 18:36:38 GMT
Organization: Max-Planck-Institut fuer Informatik
Lines: 27
Distribution: world
Message-ID: <2ll4vmINN963@sbusol.rz.uni-sb.de>
References: <1994Mar9.125522.20435@nntp.nta.no>
Reply-To: uwe@mpi-sb.mpg.de

In article <1994Mar9.125522.20435@nntp.nta.no>, Stein Kulseth
<stein@hal.nta.no> wrote:
> Here in Norway we are blessed/cursed with three extra vowels.
> When doing pattern matching on Norwegian text it would be very
> nice to have \b and \w accept these as letters. Is this possible?

Not as far as I know (unless Larry has changed it in the meantime).

> If not, how can I write a search pattern that will match Norwegian
> word boundaries at either end and anywhere within a string?

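# (a) Put a \000 before and after every word.  The octal escapes are
#     the ISO 8859-1 codes for the three extra Norwegian letters
#     (upper case \305 \306 \330, lower case \345 \346 \370):
s/([A-Za-z0-9_\305\306\330\345\346\370]+)/\000$1\000/g;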
# (b) Check for \000 instead of \b.
# For example, s/\b([A-Z])\b/"$1"/g becomes:
s/\000([A-Z\305\306\330])\000/"\000$1\000"/g;
# (c) Don't forget to remove all \000's after you are done:
s/\000//g;
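
For what it's worth, steps (a) to (c) can be strung together roughly
as follows.  This is only a sketch: the subroutine names mark_words,
unmark_words and quote_single_capitals are made up for illustration,
and it assumes the text is ISO 8859-1 and contains no \000 bytes of
its own.

```perl
#!/usr/bin/perl
# Sketch only: the helper names below are hypothetical.
# Assumes ISO 8859-1 text that contains no \000 bytes of its own.
use strict;
use warnings;

# Word characters: the usual \w set plus the six Norwegian letters.
# Single quotes keep the backslashes so the regex sees the octal escapes.
my $WORD = '[A-Za-z0-9_\305\306\330\345\346\370]';

# Step (a): put a \000 before and after every word.
sub mark_words {
    my ($text) = @_;
    $text =~ s/($WORD+)/\000$1\000/g;
    return $text;
}

# Step (c): remove all \000 markers again.
sub unmark_words {
    my ($text) = @_;
    $text =~ s/\000//g;
    return $text;
}

# Step (b): the example substitution, matching \000 instead of \b.
# It puts double quotes around every word that is a single capital.
sub quote_single_capitals {
    my ($text) = @_;
    $text = mark_words($text);
    $text =~ s/\000([A-Z\305\306\330])\000/"\000$1\000"/g;
    return unmark_words($text);
}

# "\305" is A-ring on its own; "B\345t" is one three-letter word.
my $in  = "\305 er A og B\345t";
my $out = quote_single_capitals($in);
print $out eq qq{"\305" er "A" og B\345t} ? "ok\n" : "not ok\n";
```

The lone \305 and the lone A get quoted, while the B in "B\345t" is
left alone because no marker separates it from the \345 that follows.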

If you have several substitutions in a row, check that every word
boundary is still marked by a \000 after each one.  It may even be
necessary to repeat steps (c) and (a), in that order, between
substitutions to re-mark the boundaries.
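
To illustrate why the markers can go stale, here is a small sketch in
which an intermediate substitution joins two words.  The subroutine
name remark_words and the sample string are invented for the example;
the same assumptions as before apply (ISO 8859-1 text, no \000 bytes
in the input).

```perl
#!/usr/bin/perl
# Sketch only: remark_words is a hypothetical helper name.
use strict;
use warnings;

my $WORD = '[A-Za-z0-9_\305\306\330\345\346\370]';

# Steps (c) and (a) in a row: drop the old markers, then set new ones.
sub remark_words {
    my ($text) = @_;
    $text =~ s/\000//g;                  # step (c)
    $text =~ s/($WORD+)/\000$1\000/g;    # step (a)
    return $text;
}

my $text = remark_words("A-B");   # now \000A\000-\000B\000
$text =~ s/-//g;                  # joins the words: \000A\000\000B\000

# With the stale markers, A and B still look like separate words.
my $stale = $text;
$stale =~ s/\000([A-Z\305\306\330])\000/"\000$1\000"/g;
$stale =~ s/\000//g;

# After readjusting, AB is a single word and is left alone.
my $fresh = remark_words($text);
$fresh =~ s/\000([A-Z\305\306\330])\000/"\000$1\000"/g;
$fresh =~ s/\000//g;

print $stale eq '"A""B"' ? "stale markers: wrong match\n" : "unexpected\n";
print $fresh eq 'AB'     ? "fresh markers: no match\n"    : "unexpected\n";
```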

-- 
Uwe Waldmann, Max-Planck-Institut fuer Informatik
Im Stadtwald, D-66123 Saarbruecken, Germany
Phone: +49 681 302-5431, Fax: +49 681 302-5401, E-Mail: uwe@mpi-sb.mpg.de