Archive-Name: editor-faq/sed
Posting-Frequency: irregular
Last-modified: 10 March 2003
Version: 015
URL: http://sed.sourceforge.net/sedfaq.html
Maintainer: Eric Pement (pemente@northpark.edu)

                            THE SED FAQ

                  Frequently Asked Questions about
                       sed, the stream editor

CONTENTS

1. GENERAL INFORMATION
1.1. Introduction - How this FAQ is organized
1.2. Latest version of the sed FAQ
1.3. FAQ revision information
1.4. How do I add a question/answer to the sed FAQ?
1.5. FAQ abbreviations
1.6. Credits and acknowledgements
1.7. Standard disclaimers

2. BASIC SED
2.1. What is sed?
2.2. What versions of sed are there, and where can I get them?

2.2.1. Free versions

2.2.1.1. Unix platforms
2.2.1.2. OS/2
2.2.1.3. Microsoft Windows (Win3x, Win9x, WinNT, Win2K)
2.2.1.4. MS-DOS
2.2.1.5. CP/M
2.2.1.6. Macintosh v8 or v9

2.2.2. Shareware and Commercial versions

2.2.2.1. Unix platforms
2.2.2.2. OS/2
2.2.2.3. Windows 95/98, Windows NT, Windows 2000
2.2.2.4. MS-DOS

2.3. Where can I learn to use sed?

2.3.1. Books
2.3.2. Mailing list
2.3.3. Tutorials, electronic text
2.3.4. General web and ftp sites

3. TECHNICAL
3.1. More detailed explanation of basic sed
3.1.1.  Regular expressions on the left side of "s///"
3.1.2.  Escape characters on the right side of "s///"
3.1.3.  Substitution switches
3.2. Common one-line sed scripts. How do I . . . ?

      - double/triple-space a file?
      - convert DOS/Unix newlines?
      - delete leading/trailing spaces?
      - do substitutions on all/certain lines?
      - delete consecutive blank lines?
      - delete blank lines at the top/end of the file?

3.3. Addressing and address ranges
3.4. Address ranges in GNU sed and HHsed
3.5. Debugging sed scripts
3.6. Notes about s2p, the sed-to-perl translator
3.7. GNU/POSIX extensions to regular expressions

4. EXAMPLES
   ONE-CHARACTER QUESTIONS
4.1.  How do I insert a newline into the RHS of a substitution?
4.2.  How do I represent control-codes or non-printable characters?
4.3.  How do I convert files with toggle characters, like +this+,
      to look like [i]this[/i]?

   CHANGING STRINGS
4.10. How do I perform a case-insensitive search?
4.11. How do I match only the first occurrence of a pattern?
4.12. How do I parse a comma-delimited (CSV) data file?
4.13. How do I handle fixed-length, columnar data?
4.14. How do I commify a string of numbers?
4.15. How do I prevent regex expansion on substitutions?
4.16. How do I convert a string to all lowercase or capital letters?

   CHANGING BLOCKS (consecutive lines)
4.20. How do I change only one section of a file?
4.21. How do I delete or change a block of text if the block contains
      a certain regular expression?
4.22. How do I locate a paragraph of text if the paragraph contains a
      certain regular expression?
4.23. How do I match a block of specific consecutive lines?
4.23.1.  Try to use a "/range/, /expression/"
4.23.2.  Try to use a "multi-line\nexpression"
4.23.3.  Try to use a block of "literal strings"
4.24. How do I address all the lines between RE1 and RE2, excluding the lines themselves?
4.25. How do I join two lines if line #1 ends in a [certain string]?
4.26. How do I join two lines if line #2 begins in a [certain string]?
4.27. How do I change all paragraphs to long lines?

   SHELL AND ENVIRONMENT
4.30.   How do I read environment variables with sed ...
4.31.1.   ... on Unix platforms?
4.31.2.   ... on MS-DOS or 4DOS platforms?
4.32.   How do I export or pass variables back into the environment ...
4.32.1.   ... on Unix platforms?
4.32.2.   ... on MS-DOS or 4DOS platforms?
4.33.   How do I handle shell quoting in sed?

   FILES, DIRECTORIES, AND PATHS
4.40.  How do I read (insert/add) a file at the top of a textfile?
4.41.  How do I make substitutions in every file in a directory, or
        in a complete directory tree?
4.41.1.   ... ssed solution
4.41.2.   ... Unix solution
4.41.3.   ... DOS solution
4.42.  How do I replace "/some/UNIX/path" in a substitution?
4.43.  How do I replace "C:\SOME\DOS\PATH" in a substitution?
4.44.  How do I emulate file-includes, using sed?

5. WHY ISN'T THIS WORKING?
5.1.  Why don't my variables like $var get expanded in my sed script?
5.2.  I'm using 'p' to print, but I have duplicate lines sometimes.
5.3.  Why does my DOS version of sed process a file part-way through
      and then quit?
5.4.  My RE isn't matching/deleting what I want it to. (Or, "Greedy vs.
      stingy pattern matching")
5.5.  What is CSDPMI*B.ZIP and why do I need it?
5.6.  Where are the man pages for GNU sed?
5.7.  How do I tell what version of sed I am using?
5.8.  Does sed issue an exit code?
5.9.  The 'r' command isn't inserting the file into the text.
5.10. Why can't I match or delete a newline using the \n escape
      sequence? Why can't I match 2 or more lines using \n?
5.11. My script aborts with an error message, "event not found".

6. OTHER ISSUES
6.1.  I have a problem that stumps me. Where can I get help?
6.2.  How does sed compare with awk, perl, and other utilities?
6.3.  When should I use sed?
6.4.  When should I NOT use sed?
6.5.  When should I ignore sed and use Awk or Perl instead?
6.6.  Known limitations among sed versions
6.7.  Known incompatibilities between sed versions

6.7.1. Issuing commands from the command line
6.7.2. Using comments (prefixed by the '#' sign)
6.7.3. Special syntax in REs
6.7.4. Word boundaries
6.7.5. Commands which operate differently

7. KNOWN BUGS AMONG SED VERSIONS
7.1. ssed v3.59
7.2. GNU sed v4.0 - v4.0.5
7.3. GNU sed v3.02.80
7.4. GNU sed v3.02
7.5. GNU sed v2.05
7.6. GNU sed v1.18
7.7. GNU sed v1.03
7.8. sed v1.6 (Briscoe)
7.9. sed v1.5 (Helman)
7.10. sedmod v1.0 (Chen)
7.11. HP-UX sed
7.12. SunOS sed v4.1
7.13. SunOS sed v5.6
7.14. Ultrix sed v4.3
7.15. Digital Unix sed


------------------------------

1. GENERAL INFORMATION

1.1. Introduction - How this FAQ is organized

   This FAQ is organized to answer common (and some uncommon)
   questions about sed, quickly. If you see a term or abbreviation in
   the examples that seems unclear, see if the term is defined in
   section 1.5. If not, send your comment to pemente[at]northpark.edu.

1.2. Latest version of the sed FAQ

   The newest version of the sed FAQ is usually here:

       http://sed.sourceforge.net/sedfaq.html (HTML version)
       http://sed.sourceforge.net/sedfaq.txt  (plain text)
       http://www.student.northpark.edu/pemente/sed/sedfaq.html
       http://www.student.northpark.edu/pemente/sed/sedfaq.txt
       http://www.faqs.org/faqs/editor-faq/sed
       ftp://rtfm.mit.edu/pub/faqs/editor-faq/sed

   Another FAQ file on sed by a different author can be found here:

       http://www.dreamwvr.com/sed-info/sed-faq.html

1.3. FAQ revision information

   In the plaintext version, changes are shown by a vertical bar (|)
   placed in column 78 of the affected lines. To remove the vertical
   bars (use double quotes for MS-DOS):

     sed 's/  *|$//' sedfaq.txt > sedfaq2.txt
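
   As a quick check of what that command does, the sketch below runs
   it on two made-up lines, one of which carries a change bar. The
   sample text and the variable names are invented for illustration;
   only the sed expression itself comes from this FAQ.

```shell
# Two hypothetical lines; the first ends in a revision bar.
sample='   changed line          |
   unchanged line'

# Delete the bar together with the run of spaces in front of it.
stripped=$(printf '%s\n' "$sample" | sed 's/  *|$//')
printf '%s\n' "$stripped"
```

   Lines without a trailing bar pass through unchanged, so the command
   is safe to run over the whole file.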

   In the HTML version, vertical bars do not appear. New or altered
   portions of the FAQ are indicated by printing in dark blue type.

   In the text version, words needing emphasis may be surrounded by
   the underscore '_' or the asterisk '*'. In the HTML version, these
   are changed to italics and boldface, respectively.

1.4. How do I add a question/answer to the sed FAQ?

   Word your question briefly and send it to pemente[at]northpark.edu,
   indicating your proposed change. We'll post it on the sed-users
   mailing list (see section 2.3.2) and discuss it there. If it's
   good, your contribution will be added to the next edition.

1.5. FAQ abbreviations

       files = one or more filenames, separated by whitespace
       gsed  = GNU sed
       ssed  = super-sed
       RE    = Regular Expressions supported by sed
       LHS   = the left-hand side ("find" part) of "s/find/repl/" command
       RHS   = the right-hand side ("replace" part) of "s/find/repl/" cmd
       nn+   = version _nn_ or higher (e.g., "15+" = version 1.5 and above)

   files: "files" stands for one or more filenames entered on the
   command line. The names may include any wildcards your shell
   understands (such as ``zork*'' or ``Aug[4-9].let''). Sed will
   process each filename passed to it by the shell.

   RE: For details on regular expressions, see section 3.1.1., below.
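
   The wildcard handling described under "files" can be seen in the
   sketch below. It creates two throwaway files in a scratch directory
   (the file names and contents are hypothetical) and lets the shell
   expand the wildcard before sed ever sees it.

```shell
# Work in a scratch directory so the wildcard matches only our files.
workdir=$(mktemp -d)
printf 'first file\n'  > "$workdir/zork1.txt"
printf 'second file\n' > "$workdir/zork2.txt"

# The shell expands zork* to both names; sed reads them in order
# as one continuous stream of lines.
joined=$(cd "$workdir" && sed 's/file/FILE/' zork*)
printf '%s\n' "$joined"

rm -rf "$workdir"
```

   Sed itself never interprets the asterisk here; it simply receives
   the list of names the shell produced.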

1.6. Credits and acknowledgements

   Many of the ideas for this FAQ were taken from the Awk FAQ:
       http://www.faqs.org/faqs/computer-lang/awk/faq/
       ftp://rtfm.mit.edu/pub/usenet/comp.lang.awk/faq

   and from the old Perl FAQ:
       http://www.perl.com/doc/FAQs/FAQ/oldfaq-html/index.html

   The following individuals have contributed significantly to this
   document, and have provided input and wording suggestions for
   questions, answers, and script examples. Credit goes to these
   contributors (in alphabetical order by last name):

   Al Aab, Yiorgos Adamopoulos, Paolo Bonzini, Walter Briscoe, Jim
   Dennis, Carlos Duarte, Otavio Exel, Sven Guckes, Aurelio Jargas,
   Mark Katz, Toby Kelsey, Eric Pement, Greg Pfeiffer, Ken Pizzini,
   Niall Smart, Simon Taylor, Peter Tillier, Greg Ubben, Laurent
   Vogel.

1.7. Standard disclaimers

   While a serious attempt has been made to ensure the accuracy of the
   information presented herein, the contributors and maintainers of
   this document do not claim the absence of errors and make no
   warranties on the information provided. If you notice any mistakes,
   please let us know so we can fix them.

------------------------------

2. BASIC SED

2.1. What is sed?

   "sed" stands for Stream EDitor. Sed is a non-interactive editor,
   written by the late Lee E. McMahon in 1973 or 1974. A brief history
   of sed's origins may be found in an early history of the Unix
   tools, at <http://www.columbia.edu/~rh120/ch106.x09>.

   Instead of altering a file interactively by moving the cursor on
   the screen (as with a word processor), the user sends a script of
   editing instructions to sed, plus the name of the file to edit (or
   the text to be edited may come as output from a pipe). In this
   sense, sed works like a filter -- deleting, inserting and changing
   characters, words, and lines of text. Its range of activity goes
   from small, simple changes to very complex ones.

   Sed reads its input from stdin (Unix shorthand for "standard
   input," normally the keyboard or a pipe) or from files (or both),
   and sends the results to stdout ("standard output," normally the
   console or screen). Most people use sed first for its substitution
   features, as a find-and-replace tool. For example,

     sed 's/Glenn/Harold/g' oldfile >newfile

   will replace every occurrence of "Glenn" with the word "Harold",
   wherever it occurs in the file. The "find" portion is a regular
   expression ("RE"), which can be a simple word or may contain
   special characters to allow greater flexibility (for example, to
   prevent "Glenn" from also matching "Glennon").
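
   One portable way to keep "Glenn" from matching inside "Glennon" is
   sketched below. It replaces "Glenn" only when the next character is
   not a letter or digit, or when "Glenn" ends the line. This is an
   illustration of ours rather than a script from the FAQ; it guards
   only the right-hand edge of the word, and the sample sentence is
   made up. Some versions of sed also accept \<...\> to delimit word
   boundaries.

```shell
# Hypothetical input containing both names.
text='Glenn met Glennon. Ask Glenn'

# 1st command: "Glenn" followed by a non-alphanumeric character,
#              which is kept by way of the \1 backreference.
# 2nd command: "Glenn" at the very end of the line.
fixed=$(printf '%s\n' "$text" |
  sed -e 's/Glenn\([^[:alnum:]]\)/Harold\1/g' -e 's/Glenn$/Harold/')
printf '%s\n' "$fixed"
```

   With this input, both standalone occurrences become "Harold" while
   "Glennon" is left alone.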

   My very first use of sed was to add 8 spaces to the left side of a
   file, so when I printed it, the printing wouldn't begin at the
   absolute left edge of a piece of paper.

     sed 's/^/        /' myfile >newfile   # my first sed script
     sed 's/^/        /' myfile | lp       # my next sed script

   Then I learned that sed could display only one paragraph of a file,
   beginning at the phrase "and where it came" and ending at the
   phrase "for all people". My script looked like this:

     sed -n '/and where it came/,/for all people/p' myfile

   Sed's normal behavior is to print (i.e., display or show on screen)
   the entire file, including the parts that haven't been altered,
   unless you use the -n switch. The "-n" stands for "no output". This
   switch is almost always used in conjunction with a 'p' command
   somewhere, which says to print only the sections of the file that
   have been specified. Together, the -n switch and the 'p' command
   allow parts of a file to be printed (i.e., sent to the console).
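
   The interaction between -n and 'p' is easy to see on a three-line
   input. In this sketch (the sample lines and variable names are
   ours), 'p' without -n prints the matching line twice, once by 'p'
   and once by sed's normal output, while with -n only the match
   appears.

```shell
lines='alpha
beta
gamma'

# Without -n: every line is printed, and 'p' prints "beta" a second time.
without_n=$(printf '%s\n' "$lines" | sed '/beta/p')

# With -n: automatic printing is off, so only the 'p' output remains.
with_n=$(printf '%s\n' "$lines" | sed -n '/beta/p')

printf '%s\n---\n%s\n' "$without_n" "$with_n"
```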

   Next, I found that sed could show me only (say) lines 12-18 of a
   file and not show me the rest. This was very handy when I needed to
   review only part of a long file and I didn't want to alter it.

     # the 'p' stands for print
     sed -n 12,18p myfile

   Likewise, sed could show me everything else BUT those particular
   lines, without physically changing the file on the disk:

     # the 'd' stands for delete
     sed 12,18d myfile

   Sed could also double-space my single-spaced file when it came time
   to print it:

     sed G myfile >newfile

   If you have many editing commands (for deleting, adding,
   substituting, etc.) which might take up several lines, those
   commands can be put into a separate file and all of the commands in
   the file applied to the file being edited:

     # 'script.sed' is the file of commands
     # 'myfile' is the file being changed
     sed -f script.sed myfile
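
   As a sketch of what such a script file might contain, the example
   below writes a small 'script.sed' and applies it to a sample
   'myfile'. The three commands and the file contents are invented for
   illustration; each command sits on its own line of the script.

```shell
workdir=$(mktemp -d)

# A sample file to edit (contents are made up).
printf 'title\nGlenn was here\ndraft note\n' > "$workdir/myfile"

# The script file: one editing command per line.
cat > "$workdir/script.sed" <<'EOF'
1d
s/Glenn/Harold/g
/^draft/d
EOF

# Apply every command in script.sed to myfile.
result=$(sed -f "$workdir/script.sed" "$workdir/myfile")
printf '%s\n' "$result"

rm -rf "$workdir"
```

   Here line 1 and the line starting with "draft" are deleted, and the
   substitution is applied to what remains.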

   It is not our intention to convert this FAQ file into a full-blown
   sed tutorial (for good tutorials, see section 2.3). Rather, we hope
   this gives the complete novice a few ideas of how sed can be used.

2.2. What versions of sed are there, and where can I get them?

2.2.1. Free versions

   Note: "Free" does not mean "public domain" nor does it necessarily
   mean you will never be charged for it. All versions of sed in this
   section except the CP/M versions are distributed under the GNU
   General Public License and are "free software" by that standard
   (for details, see http://www.gnu.org/philosophy/free-sw.html). This
   means you can get the source code and develop it further.

   At the URLs listed in this category, sed binaries or source code
   can be downloaded and used without fees or license payments.

2.2.1.1. Unix platforms

   ssed v3.60
   ssed is the version recommended by the FAQ maintainers, since it
   shares the same codebase with GNU sed, has the most options, and is
   free software (you can get the source). Though earlier versions of
   ssed were distributed, sites for these are not listed here.

       http://sed.sourceforge.net/grabbag/ssed
       http://freshmeat.net/project/sed/

   GNU sed v4.0.5
   This is the latest official version of GNU sed. It offers in-place
   text replacement as an option switch.

       ftp://ftp.gnu.org/pub/gnu/sed/sed-4.0.5.tar.gz
       http://freshmeat.net/project/sed
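
   The in-place option mentioned above is the -i switch. A minimal
   sketch, assuming GNU sed v4 or later: the form used here attaches a
   backup suffix directly to the switch (-i.bak), which writes the
   edited text back to the file and keeps the original under the name
   'notes.txt.bak'. The file name and contents are hypothetical.

```shell
workdir=$(mktemp -d)
printf 'Glenn wrote this\n' > "$workdir/notes.txt"

# Edit the file itself; the untouched original is saved with a .bak suffix.
sed -i.bak 's/Glenn/Harold/' "$workdir/notes.txt"

edited=$(cat "$workdir/notes.txt")
backup=$(cat "$workdir/notes.txt.bak")
printf '%s\n%s\n' "$edited" "$backup"

rm -rf "$workdir"
```

   Versions of sed without this switch need the older idiom of writing
   to a new file and renaming it afterwards.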

   BSD multi-byte sed (Japanese)
   Based on the latest version of GNU sed, which supports multi-byte
   characters.

       ftp://ftp1.freebsd.org/pub/FreeBSD/FreeBSD-stable/packages/Latest/ja-sed.tgz

   GNU sed v3.02.80
   An alpha test release which was the base for the development of
   ssed and GNU sed v4.0.

       ftp://alpha.gnu.org/pub/gnu/sed/sed-3.02.80.tar.gz

   GNU sed v3.02a
   Interim version with most features of GNU sed v3.02.80.

   GNU sed v3.02
       ftp://ftp.gnu.org/pub/gnu/sed/sed-3.02.tar.gz

   Precompiled versions:

   GNU sed v3.02-8
   source code and binaries for Debian GNU/Linux

       http://www.debian.org/Packages/stable/base/sed.html

   For some time, the GNU project <http://www.gnu.org> used Eric S.
   Raymond's version of sed (ESR sed v1.1), but eventually dropped it
   because it had too many built-in limits. In 1991 Howard Helman
   modified the GNU/ESR sed and produced a flexible version of sed
   v1.5 available at several sites (Helman's version permitted things
   like \<...\> to delimit word boundaries, \xHH to enter hex code and
   \n to indicate newlines in the replace string). This version did
   not catch on with the GNU project and their version of sed has
   moved in a similar but different direction.

   sed v1.3, by Eric Steven Raymond (released 4 June 1998)
       http://catb.org/~esr/sed-1.3.tar.gz

   Eric Raymond <esr@snark.thyrsus.com> wrote one of the earliest
   versions of sed. On his website <http://www.catb.org/~esr/> which
   also distributes many freeware utilities he has written or worked
   on, he describes sed v1.1 this way:

   "This is the fast, small sed originally distributed in the GNU
   toolkit and still distributed with Minix. The GNU people ditched it
   when they built their own sed around an enhanced regex package --
   but it's still better for some uses (in particular, faster and less
   memory-intensive)." (Version 1.3 fixes an unidentified bug and adds
   the L command to hexdump the current pattern space.)

2.2.1.2. OS/2

   GNU sed v3.02.80
       http://www2s.biglobe.ne.jp/~vtgf3mpr/gnu/sed.htm

   GNU sed v3.02
       http://hobbes.nmsu.edu/pub/os2/util/file/sed-3_02-r2-bin.zip # binaries
       http://hobbes.nmsu.edu/pub/os2/util/file/sed-3_02-r2.zip     # source

2.2.1.3. Microsoft Windows (Win3x, Win9x, WinNT, Win2K)

   GNU sed v4.0.5
   32-bit binaries and docs. Precompiled versions not available (yet).

   GNU sed v3.02.80
   32-bit binaries and docs, using DJGPP compiler. For details on new
   features, see Unix section, above.

       http://www.student.northpark.edu/pemente/sed/sed3028a.zip # DOS binaries
       ftp://alpha.gnu.org/pub/gnu/sed/sed-3.02.80.tar.gz        # source
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028b.zip # binaries
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028d.zip # docs
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028s.zip # source

   GNU sed v2.05
   32-bit binaries, no docs. Requires 80386 DX (SX will not run) and
   must be run in a DOS window or in a full screen DOS session under
   Microsoft Windows. Will not run in MS-DOS mode (outside Win/Win95).
   We recommend using the latest version of GNU sed.
       http://www.simtel.net/pub/win95/prog/gsed205b.zip
       ftp://ftp.cdrom.com/pub/simtelnet/win95/prog/gsed205b.zip

   GNU sed v1.03
   modified by Frank Whaley.

   This version was part of the "Virtually UN*X" toolset, hosted by
   itribe.net; that website is now closed. Gsed v1.03 supported Win9x
   long filenames, as well as hex, decimal, binary, and octal
   character representations.

   The Cygwin toolkit:
       http://www.cygwin.com

   Formerly known as "GNU-Win32 tools." According to their home page,
   "The Cygwin tools are Win32 ports of the popular GNU development
   tools for Windows NT, 95 and 98. They function through the use of
   the Cygwin library which provides a UNIX-like API on top of the
   Win32 API." The version of sed used is GNU sed v3.02.

   Minimalist GNU for Windows (MinGW):
       http://www.mingw.org
       http://mingw.sourceforge.net

   According to their home page, "MinGW ('Minimalist GNU for Windows')
   refers to a set of runtime headers, used in building a compiler
   system based on the GNU GCC and binutils projects. It compiles and
   links code to be run on Win32 platforms ... MinGW uses Microsoft
   runtime libraries, distributed with the Windows operating system."
   The version of sed used is GNU sed v3.02.

   sed v1.5 (a/k/a HHsed), by Howard Helman
   Compiled with Mingw32 for 32-bit environments described above. This
   version should support Win95 long filenames.
       http://www.dbnet.ece.ntua.gr/~george/sed/OLD/sed15.exe
       http://www.student.northpark.edu/pemente/sed/sed15exe.zip

2.2.1.4. MS-DOS

   sed v1.6 (from HHsed), by Walter Briscoe

   This is a forthcoming version, now in beta testing, but with many
   new features. It corrects all the bugs in sed v1.5, and adds the
   best features of sedmod v1.0 (below). It is available in 16-bit and
   32-bit compiled versions for MS-DOS. Sorry, no URLs available yet.

   sed v1.5 (a/k/a HHsed), by Howard Helman
   uncompiled source code (Turbo C)
       ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15.zip
       ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15.zip

   DOS executable and documentation
       ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15x.zip
       ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15x.zip

   sedmod v1.0, by Hern Chen
       http://www.ptug.org/sed/SEDMOD10.ZIP
       http://www.student.northpark.edu/pemente/sed/sedmod10.zip
       ftp://garbo.uwasa.fi/pc/unix/sedmod10.zip

   GNU sed v3.02.80
   See section 2.2.1.3 ("Microsoft Windows"), above.

   GNU sed v2.05
   Does not run under MS-DOS.

   GNU sed v1.18
   32-bit binaries and source, using DJGPP compiler. Requires 80386 SX
   or better. Also requires 3 CWS*.EXE extenders on the path. See
   section 5.5 ("What is CSDPMI*B.ZIP and why do I need it?"), below.
   We recommend using a newer version of GNU sed.
       http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
       http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip

   GNU sed v1.06
   16-bit binaries and source. Should run under any MS-DOS system.
       http://www.simtel.net/pub/gnu/gnuish/sed106.zip
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/gnuish/sed106.zip

2.2.1.5. CP/M

   ssed v2.2, by Chuck A. Forsberg

   Written for CP/M, ssed (for "small/stupid stream editor") supports
   only the a(ppend), c(hange), d(elete) and i(nsert) options, and
   apparently doesn't support regular expressions. A -u switch will
   "unsqueeze" compressed files and was used mainly in conjunction
   with DIF.COM for source code maintenance. (file: ssed22.lbr)

   change, by Michael M. Rubenstein

   Rubenstein released a version of sed called CHANGE.COM (the
   TTOOLS.LBR archive member CHANGE.CZM is a "crunched" file).
   CHANGE.COM supports full RE's except grouping and backreferences,
   and its only function is global substitution. (file: ttools.lbr)

2.2.1.6. Macintosh v8 or v9

   Since sed is a command-line utility, it is not customary to think
   of sed being used on a Mac. Nonetheless, the following instructions
   from Aurelio Jargas describe the process for running sed on MacOS
   version 8 or 9.

   (1) Download and install the Apple DiskCopy application

       ftp://ftp.apple.com/developer/Development_Kits

   (2) Download and install Apple MPW

       ftp://ftp.apple.com/developer/Tool_Chest/Core_Mac_OS_Tools/MPW_etc./

   (3) Download and expand Matthias Neeracher's GNU sed for MPW. (They
   seem to have misnumbered the sed filename.)

       ftp://sunsite.cnlab-switch.ch/software/platform/macos/src/mpw_c/sed-2.03.sit.bin

   (4) Enter the sed-3.02 directory and doubleclick the 'sed' file

   (5) MPW Shell will open up. It will be a command window instead of
   a command line, but sed should work as expected. For example:

       echo aa | sed 's/a/Z/g'<ENTER>

   Note that ENTER is different from RETURN on an iMac. Apple *also*
   has its own version of sed on MPW, called "StreamEdit", with a
   syntax fairly similar to that of normal sed.

2.2.2. Shareware and Commercial versions

2.2.2.1. Unix platforms

       [ Additional information needed. ]

2.2.2.2. OS/2

   Hamilton Labs:
       http://www.hamiltonlabs.com/cshell.htm

   A sizable set of Unix/C shell utilities designed for OS/2. Price is
   $350 in the US, $395 elsewhere, with FedEx shipping, unconditional
   guarantee, unlimited support and free updates. A demo version of
   the suite can be downloaded from this site, but a stand-alone copy
   of sed is not available.

2.2.2.3. Windows 95/98, Windows NT, Windows 2000

   Hamilton Labs:
       http://www.hamiltonlabs.com/cshell.htm

   A sizable set of Unix/C shell utilities designed for Win9x, WinNT,
   and Win2K. Price is $350 in the US, $395 elsewhere, with FedEx
   shipping, unconditional guarantee, unlimited support and free
   updates. A demo version of the suite can be downloaded from this
   site, but a stand-alone copy of sed is not available.

   Interix:
       http://www.interix.com

   Interix (formerly known as OpenNT) is advertised as "a complete
   UNIX system environment running natively on Microsoft Windows NT",
   and is licensed and supported by Softway Systems. It offers over
   200 Unix utilities, and supports Unix shells, sockets, networking,
   and more. A single-user edition runs about $200. A free demo or
   evaluation copy will run for 31 days and then quit; to continue
   using it, you must purchase the commercial version.

   MKS NuTCRACKER Professional
       http://www.datafocus.com/products/nutc/

   A different, yet related product line offered by MKS (Mortice Kern
   Systems, below); the awkward spelling "NuTCRACKER" is intentional.
   Various packages offer hundreds of Unix utilities for Win32
   environments. Sed is not available as a separate product.

   UnixDos:
       http://www.unixdos.com

   UnixDos is a suite of 82 Unix utilities ported over to the Windows
   environments. There are 16-bit versions for Win3.x and 32-bit
   versions for WinNT/Win95. It is distributed as uncrippled shareware
   for the first 30 days. After the test period, the utilities will
   not run and you must pay the registration fee of $50.

   Their version of sed supports "\n" in the RHS of expressions, and
   increases the length of input lines to 10,000 characters. By
   special arrangement with the owners, persons who want a licensed
   version of sed *only* (without the other utilities) may pay a
   license fee of $10.

   U/WIN:
       http://www.research.att.com/sw/tools/uwin/

   U/WIN is a suite of Unix utilities created for WinNT and Win95
   systems. It is owned by AT&T, created by David Korn (author of the
   Unix korn shell), and is freely distributed only to educational
   institutions, AT&T employees, or certain researchers; all others
   must pay a fee after a 90-day evaluation period expires. U/WIN
   operates best with the NTFS (WinNT file system) but will run in
   degraded mode with the FAT file system and in further degraded mode
   under Win95. A minimal installation takes about 25 to 30 megs of
   disk space. Sed is not available as a separate file for download,
   but comes with the suite.

2.2.2.4. MS-DOS

   Mix C/Utilities Toolchest
       http://www.mixsoftware.com/product/utility.htm

   According to their web page, "The C/Utilities Toolchest adds over
   40 powerful UNIX utilities to your MS-DOS operating system. The
   result is an environment very similar to UNIX operating systems,
   yet 100% compatible with MS-DOS programs and commands." The
   toolchest costs $19.95, with source code available for an
   additional fee. Mix C's version of sed is not available separately.

   MKS (Mortice Kern Systems) Toolkit
       http://www.mks.com

   Sed comes bundled with the MKS Toolkit, which is distributed only
   as commercial software; it is not available separately.

   Thompson Automation Software
       http://www.tasoft.com

   The Thompson Toolkit contains over 100 familiar Unix utilities,
   including a version of the Unix Korn shell. It runs under MS-DOS,
   OS/2, Win3.x, Win9x, and WinNT. Sed is one of the utilities, though
   Thompson is better known for its version of awk for DOS, TAWK. The
   toolkit runs about $150; sed is not available separately.

9512b1
2.3. Where can I learn to use sed?
9512b1

9512b1
2.3.1. Books
9512b1

9512b1
       _Sed & Awk, 2d edition_, by Dale Dougherty & Arnold Robbins
9512b1
       (Sebastopol, Calif: O'Reilly and Associates, 1997)
9512b1
       ISBN 1-56592-225-5
9512b1
       http://www.oreilly.com/catalog/sed2/noframes.html
9512b1

9512b1
   About 40 percent of this book is devoted to sed, and maybe 50
9512b1
   percent is devoted to awk. The other 10 percent covers regexes and
9512b1
   concepts common to both tools. If you prefer hard copy, this is
9512b1
   definitely the best single place to learn to use sed, including its
9512b1
   advanced features.
9512b1

9512b1
   The first edition is also very useful. Several typos crept into the
9512b1
   first printing of the first edition (though if you follow the
9512b1
   tutorials closely, you'll recognize them right away). A list of
9512b1
   errors from the first printing of _sed & awk_ is available at
9512b1
   <http://www.cs.colostate.edu/~dzubera/sedawk.txt>, and errors in
9512b1
   the 2nd are at <http://www.cs.colostate.edu/~dzubera/sedawk2.txt>,
9512b1
   though most of these were corrected in later printings. The second
9512b1
   edition tells how POSIX standards have affected these tools and
9512b1
   covers the popular GNU versions of sed and awk. Price is about (US)
9512b1
   $30.00.
9512b1

9512b1
   -----
9512b1

9512b1
       _Mastering Regular Expressions, 2d ed.,_ by Jeffrey E. F. Friedl
9512b1
       (Sebastopol, Calif: O'Reilly and Associates, 2002)
9512b1
       ISBN 0-596-00289-0
9512b1
       http://regex.info
9512b1
       http://www.oreilly.com/catalog/regex2/
9512b1
       http://public.yahoo.com/~jfriedl/regex/ (for the first edition)
9512b1

9512b1
   Knowing how to use "regular expressions" is essential to effective
9512b1
   use of most Unix tools. This book focuses on how regular
9512b1
   expressions can be best implemented in utilities such as perl, vi,
9512b1
   emacs, and awk, but also touches on sed. Friedl's home page
9512b1
   (above) gives links to other sites which help students learn to
9512b1
   master regular expressions. His site also gives a Perl script for
9512b1
   determining a syntactically valid e-mail address, using regexes:
9512b1

9512b1
       http://public.yahoo.com/~jfriedl/regex/code.html
9512b1

9512b1
   -----
9512b1

9512b1
       _Awk und Sed_, by Helmut Herold.
9512b1
       (Bonn: Addison-Wesley, 1994; 288 pages)
9512b1
       2nd edition to be released in March 2003
9512b1
       ISBN 3-8273-2094-1
9512b1
       http://www.addison-wesley.de/main/main.asp?page=home/bookdetails&ProductID=37214
9512b1

9512b1
2.3.2. Mailing list
9512b1

9512b1
   If you are interested in learning more about sed (its syntax, using
9512b1
   regular expressions, etc.) you are welcome to subscribe to a
9512b1
   sed-oriented mailing list. In fact, there are two mailing lists
9512b1
   about sed: one in English named "sed-users", moderated by Sven
9512b1
   Guckes; and one in Portuguese named "sed-BR" (for sed-Brazil),
9512b1
   moderated by Aurelio Marinho Jargas. The average volume of mail for
9512b1
   "sed-users" is about 35 messages a week; the average volume of mail
9512b1
   for "sed-BR" is about 15 messages a week.
9512b1

9512b1
       sed-BR mailing list:    http://br.groups.yahoo.com/group/sed-br/
9512b1
       sed-users mailing list: http://groups.yahoo.com/group/sed-users/
9512b1

9512b1
   To subscribe to sed-users, send a blank message to:
9512b1

9512b1
       sed-users-subscribe@yahoogroups.com
9512b1

9512b1
   To unsubscribe from sed-users, send a blank message to:
9512b1

9512b1
       sed-users-unsubscribe@yahoogroups.com
9512b1

9512b1
2.3.3. Tutorials, electronic text
9512b1

9512b1
   The original users manual for sed, by Lee E. McMahon, from the
9512b1
   7th edition UNIX Manual (1978), with the classic "Kubla Khan"
9512b1
   example and tutorial, in formatted text format:
9512b1
       http://sed.sourceforge.net/grabbag/tutorials/sed_mcmahon.txt
9512b1

9512b1
   The source code to the preceding manual. Use "troff -ms sed" to
9512b1
   print this file properly:
9512b1
       http://plan9.bell-labs.com/7thEdMan/vol2/sed
9512b1
       http://cm.bell-labs.com/7thEdMan/vol2/sed
9512b1

9512b1
   "Do It With Sed", by Carlos Duarte
9512b1
       http://www.dbnet.ece.ntua.gr/~george/sed/OLD/sedtut_1.html
9512b1

9512b1
   "Sed: How to use sed, a special editor for modifying files
9512b1
   automatically", by Bruce Barnett and General Electric Company
9512b1
       http://www.grymoire.com/Unix/Sed.html
9512b1

9512b1
   U-SEDIT2.ZIP, by Mike Arst (16 June 1990)
9512b1
       ftp://ftp.cs.umu.se/pub/pc/u-sedit2.zip
9512b1
       ftp://ftp.uni-stuttgart.de/pub/systems/msdos/util/unixlike/u-sedit2.zip
9512b1
       ftp://sunsite.icm.edu.pl/vol/wojsyl/garbo/pc/editor/u-sedit2.zip
9512b1
       ftp://ftp.sogang.ac.kr/pub/msdos/garbo_pc/editor/u-sedit2.zip
9512b1

9512b1
   U-SEDIT3.ZIP, by Mike Arst (24 Jan. 1992)
9512b1
       http://www.student.northpark.edu/pemente/sed/u-sedit3.zip
9512b1
       CompuServe DTPFORUM, "PC DTP Utilities" library, file SEDDOC.ZIP
9512b1

9512b1
   Another sed FAQ
9512b1
       http://www.dreamwvr.com/sed-info/sed-faq.html
9512b1

9512b1
   sed-tutorial, by Felix von Leitner
9512b1
       http://www.math.fu-berlin.de/~leitner/sed/tutorial.html
9512b1

9512b1
   "Manipulating text with sed," chapter 14 of the SCO OpenServer
9512b1
   "Operating System Users Guide"
9512b1
       http://ou800doc.caldera.com/SHL_automate/CTOC-Manipulating_text_with_sed.html
9512b1

9512b1
   "Combining the Bourne-shell, sed and awk in the UNIX environment
9512b1
   for language analysis," by Lothar Schmitt and Kiel Christianson.
9512b1
   This basic tutorial on the Bourne shell, sed and awk downloads as a
9512b1
   71-page PostScript file (compressed to 290K with gzip). You may
9512b1
   need to navigate down from the root to get the file.
9512b1
       ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/1997/97-2-007.tar.gz
9512b1
       available upon request from Lothar Schmitt <lothar@u-aizu.ac.jp>
9512b1

9512b1
2.3.4. General web and ftp sites
9512b1

9512b1
       http://sed.sourceforge.net/grabbag             # Collected scripts
9512b1
       http://main.rtfiber.com.tw/~changyj/sed/       # Yao-Jen Chang
9512b1
       http://www.math.fu-berlin.de/~guckes/sed/      # Sven Guckes
9512b1
       http://www.math.fu-berlin.de/~leitner/sed/     # Felix von Leitner
9512b1
       http://www.dbnet.ece.ntua.gr/~george/sed/      # Yiorgos Adamopoulos
9512b1
       http://www.student.northpark.edu/pemente/sed/  # Eric Pement
9512b1

9512b1
       http://spacsun.rice.edu/FAQ/sed.html
9512b1
       ftp://algos.inesc.pt/pub/users/cdua/scripts.tar.gz (sed and shell scripts)
9512b1

9512b1
   "Handy One-Liners For Sed", compiled by Eric Pement. A large list
9512b1
   of 1-line sed commands which can be executed from the command line.
9512b1
       http://sed.sourceforge.net/sed1line.txt
9512b1
       http://www.student.northpark.edu/pemente/sed/sed1line.txt
9512b1

9512b1
   "Handy One-Liners For Sed", translated to Portuguese
9512b1
       http://wmaker.lrv.ufsc.br/sed_ptBR.html
9512b1

9512b1
   The Single UNIX Specification, Version 3 (technical man page)
9512b1
       http://www.opengroup.org/onlinepubs/007904975/utilities/sed.html
9512b1

9512b1
   Getting started with sed
9512b1
       http://www.cs.hmc.edu/tech_docs/qref/sed.html
9512b1

9512b1
   masm to gas converter
9512b1
       http://www.delorie.com/djgpp/faq/converting/asm2s-sed.html
9512b1

9512b1
   mail2html.zip
9512b1
       http://www.crispen.org/src/#mail2html
9512b1

9512b1
   sample uses of sed in batch files and scripts (Benny Pederson)
9512b1
       http://users.cybercity.dk/~bse26236/batutil/help/SED.HTM
9512b1

9512b1
   dc.sed - the most complex and impressive sed script ever written.
9512b1
   This sed script by Greg Ubben emulates the Unix dc (desk
9512b1
   calculator), including base conversion, exponentiation, square
9512b1
   roots, and much more.
9512b1
       http://sed.sourceforge.net/grabbag/scripts/dc_overview.htm
9512b1

9512b1
   If you should find other tutorials or scripts that should be added
9512b1
   to this document, please forward the URLs to the FAQ maintainer.
9512b1

9512b1
------------------------------
9512b1

9512b1
3. TECHNICAL
9512b1

9512b1
3.1. More detailed explanation of basic sed
9512b1

9512b1
   Sed takes a script of editing commands and applies each command, in
9512b1
   order, to each line of input. After all the commands have been
9512b1
   applied to the first line of input, that line is output. A second
9512b1
   input line is taken for processing, and the cycle repeats. Sed
9512b1
   scripts can address a single line by line number or by matching a
9512b1
   /RE pattern/ on the line. An exclamation mark '!' after a regex
9512b1
   ('/RE/!') or line number will select all lines that do NOT match
9512b1
   that address. Sed can also address a range of lines in the same
9512b1
   manner, using a comma to separate the 2 addresses.
9512b1

9512b1
     $d               # delete the last line of the file
9512b1
     /[0-9]\{3\}/p    # print lines with 3 consecutive digits
9512b1
     5!s/ham/cheese/  # except on line 5, replace 'ham' with 'cheese'
9512b1
     /awk/!s/aaa/bb/  # unless 'awk' is found, replace 'aaa' with 'bb'
9512b1
     17,/foo/d        # delete from line 17 through the next 'foo' line
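
   The address forms above can be tried on a small sample. This is
   only a sketch: the five-line input and the variable names are made
   up for illustration, and only POSIX sed features are assumed.

```shell
# Hypothetical sample: five lines, one holding three consecutive digits.
input=$(printf '%s\n' 'ham one' 'room 101' 'ham three' 'awk aaa' 'ham five')

# $d deletes the last line, so four lines remain.
no_last=$(printf '%s\n' "$input" | sed '$d')

# With -n, /RE/p prints only the lines selected by the regex address.
digits=$(printf '%s\n' "$input" | sed -n '/[0-9]\{3\}/p')

# 5!s/// applies the substitution to every line except line 5.
not_five=$(printf '%s\n' "$input" | sed '5!s/ham/cheese/')

printf '%s\n' "$digits" "$not_five"
```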
9512b1

9512b1
   Following an address or address range, sed accepts curly braces
9512b1
   '{...}' so several commands may be applied to that line or to the
9512b1
   lines matched by the address range. On the command line, semicolons
9512b1
   ';' separate each instruction and must precede the closing brace.
9512b1

9512b1
     sed '/Owner:/{s/yours/mine/g;s/your/my/g;s/you/me/g;}' file
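
   To see the braces at work, the same script can be fed two lines,
   only one of which matches the address. The sample text is invented
   for this sketch. Note that the longer words are replaced first, so
   that "you" does not clobber part of "yours".

```shell
# Only the line matching /Owner:/ is edited; the other passes through.
out=$(printf '%s\n' 'Owner: you and yours' 'Guest: you and yours' |
  sed '/Owner:/{s/yours/mine/g;s/your/my/g;s/you/me/g;}')
printf '%s\n' "$out"
```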
9512b1

9512b1
   Range addresses operate differently depending on which version of
9512b1
   sed is used (see section 3.4, below). For further information on
9512b1
   using sed, consult the references in section 2.3, above.
9512b1

9512b1
3.1.1. Regular expressions on the left side of "s///"
9512b1

9512b1
   All versions of sed support Basic Regular Expressions (BREs). For
9512b1
   the syntax of BREs, enter "man ed" at a Unix shell prompt. A
9512b1
   technical description of BREs from IEEE POSIX 1003.1-2001 and the
9512b1
   Single UNIX Specification Version 3 is available online at:
9512b1
   http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html#tag_09_03
9512b1

9512b1
   Sed normally supports BREs plus '\n' to match a newline in the
9512b1
   pattern space, plus '\xREx' as equivalent to '/RE/', where 'x' is any
9512b1
   character other than a newline or another backslash.
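
   A short sketch of both features, using made-up path names and
   words as input. The '%' delimiter is an arbitrary choice; any
   character allowed by the rule above should do.

```shell
# \%RE% is the same address as /RE/, but '/' needs no escaping inside it.
paths=$(printf '%s\n' /usr/bin/sed /usr/local/bin/sed /bin/ed)
matched=$(printf '%s\n' "$paths" | sed -n '\%^/usr/local/%p')

# \n in the regex matches the newline that N embeds in the pattern space.
joined=$(printf '%s\n' first second | sed 'N;s/\n/ + /')

printf '%s\n' "$matched" "$joined"
```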
9512b1

9512b1
   Some versions of sed support supersets of BREs, or "extended
9512b1
   regular expressions", which offer additional metacharacters for
9512b1
   increased flexibility. For additional information on extended REs
9512b1
   in GNU sed, see sections 3.7 ("GNU/POSIX extensions to regular
9512b1
   expressions") and 6.7.3 ("Special syntax in REs"), below.
9512b1

9512b1
   Though not required by BREs, some versions of sed support \t to
9512b1
   represent a TAB, \r for carriage return, \xHH for direct entry of
9512b1
   hex codes, and so forth. Other versions of sed do not.
9512b1

9512b1
   ssed (super-sed) introduced many new features for LHS pattern
9512b1
   matching, too many to give here. The complete list is found in
9512b1
   section 6.7.3.H ("ssed"), below.
9512b1

9512b1
3.1.2. Escape characters on the right side of "s///"
9512b1

9512b1
   The right-hand side (the replacement part) in "s/find/replace/" is
9512b1
   almost always a string literal, with no interpolation of these
9512b1
   metacharacters:
9512b1

9512b1
       .   ^   $   [   ]   {   }   (   )  ?   +   *   |
9512b1

9512b1
   Three things *are* interpolated: ampersand (&), backreferences, and
9512b1
   options for special seds. An ampersand on the RHS is replaced by
9512b1
   the entire expression matched on the LHS. There is _never_ any
9512b1
   reason to use grouping like this:
9512b1

9512b1
       s/\(some-complex-regex\)/one two \1 three/
9512b1

9512b1
   since you can do this instead:
9512b1

9512b1
       s/some-complex-regex/one two & three/
9512b1

9512b1
   To enter a literal ampersand on the RHS, type '\&'.
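
   Both uses of the ampersand can be checked with two throwaway
   commands. The input strings here are invented for the sketch.

```shell
# & stands for the entire match on the LHS.
wrapped=$(echo 'call 555 now' | sed 's/[0-9][0-9]*/[&]/')

# \& inserts a literal ampersand instead.
literal=$(echo 'Tom and Jerry' | sed 's/and/\&/')

printf '%s\n' "$wrapped" "$literal"
```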
9512b1

9512b1
   Grouping and backreferences: All versions of sed support grouping
9512b1
   and backreferences on the LHS and backreferences only on the RHS.
9512b1
   Grouping allows a series of characters to be collected in a set,
9512b1
   indicating the boundaries of the set with \( and \). Then the set
9512b1
   can be designated to be repeated a certain number of times:
9512b1

9512b1
       \(like this\)*   or   \(like this\)\{5,7\}.
9512b1

9512b1
   Groups can also be nested "\(like \(this\) is here\)" and may
9512b1
   contain any valid RE. Backreferences repeat the contents of a
9512b1
   particular group, using a backslash and a digit (1-9) for each
9512b1
   corresponding group. In other words, "/\(pom\)\1/" is another way
9512b1
   of writing "/pompom/". If groups are nested, backreference numbers
9512b1
   are counted by matching \( in strict left to right order.  Thus,
9512b1
   in /..\(the \(word\)\) \("foo"\)../ the "foo" group is referenced by
9512b1
   \3. Backreferences can be used in the LHS, the RHS, and in normal
9512b1
   RE addressing (see section 3.3).  Thus,
9512b1

9512b1
       /\(.\)\1\(.\)\2\(.\)\3/;      # matches "bookkeeper"
9512b1
       /^\(.\)\(.\)\(.\)\3\2\1$/;    # finds 6-letter palindromes
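
   The two addresses above, plus a backreference on the RHS, can be
   exercised as follows. The word list is a made-up sample, and the
   word-swapping command is an extra illustration not taken from the
   text above.

```shell
words=$(printf '%s\n' bookkeeper redder sedfaq)

# Three doubled letters in a row (oo, kk, ee).
doubled=$(printf '%s\n' "$words" | sed -n '/\(.\)\1\(.\)\2\(.\)\3/p')

# Six-letter palindromes: groups 1-3, then the same groups reversed.
palin=$(printf '%s\n' "$words" | sed -n '/^\(.\)\(.\)\(.\)\3\2\1$/p')

# Backreferences on the RHS: swap two lowercase words.
swapped=$(echo 'hello world' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

printf '%s\n' "$doubled" "$palin" "$swapped"
```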
9512b1

9512b1
   Seds differ in how they treat invalid backreferences where no
9512b1
   corresponding group occurs. To insert a literal ampersand or
9512b1
   backslash into the RHS, prefix it with a backslash: \& or \\.
9512b1

9512b1
   ssed, sed16, and sedmod permit additional options on the RHS. They
9512b1
   all support changing part of the replacement string to upper case
9512b1
   (\u or \U), lower case (\l or \L), or to end case conversion (\E).
9512b1
   Both sed16 and sedmod support awk-style word references ($1, $2,
9512b1
   $3, ...) and $0 to insert the entire line before conversion.
9512b1

9512b1
     echo ab ghi | sed16 "s/.*/$0 - \U$2/"   # prints "ab ghi - GHI"
9512b1

9512b1
   *Note:* This feature of sed16 and sedmod will break sed scripts which
9512b1
   put a dollar sign and digit into the RHS. Though this is an unlikely
9512b1
   combination, it's worth remembering if you use other people's scripts.
9512b1

9512b1
3.1.3.  Substitution switches
9512b1

9512b1
   Standard versions of sed support 4 main flags or switches which may
9512b1
   be added to the end of an "s///" command. They are:
9512b1

9512b1
       N      - Replace the Nth match of the pattern on the LHS, where
9512b1
                N is an integer between 1 and 512. If N is omitted,
9512b1
                the default is to replace the first match only.
9512b1
       g      - Global replace of all matches to the pattern.
9512b1
       p      - Print the line if a replacement was made, even under -n.
9512b1
       w file - Write the pattern space to 'file' if a replacement was
9512b1
                done. If the file already exists when the script is
9512b1
                executed, it is overwritten. During script execution,
9512b1
                w appends to the file for each match.
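
   A quick sketch of the four switches. The sample input is invented,
   and the temporary file comes from mktemp, which is assumed to be
   installed (it is common on Unix systems but not guaranteed).

```shell
third=$(echo 'a-a-a-a' | sed 's/a/X/3')     # N: only the 3rd match
all=$(echo 'a-a-a-a' | sed 's/a/X/g')       # g: every match

# p with -n: print only the lines where a replacement was made.
changed=$(printf '%s\n' 'foo 1' 'bar 2' 'foo 3' | sed -n 's/foo/baz/p')

# w file: write those same lines to a file instead.
tmp=$(mktemp)
printf '%s\n' 'foo 1' 'bar 2' 'foo 3' | sed -n "s/foo/baz/w $tmp"
written=$(cat "$tmp")
rm -f "$tmp"

printf '%s\n' "$third" "$all" "$changed"
```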
9512b1

9512b1
   GNU sed 3.02 and ssed also offer the /I switch for doing a
9512b1
   case-insensitive match. For example,
9512b1

9512b1
     echo ONE TWO | gsed "s/one/unos/I"      # prints "unos TWO"
9512b1

9512b1
   GNU sed 4.x and ssed add the /M switch, to simplify working with
9512b1
   multi-line patterns: with /M, ^ and $ also match at embedded newlines.
9512b1
   \` and \' remain available to match the start and end of pattern
9512b1
   space, respectively.
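
   The effect of /M shows up once N has put two lines into the
   pattern space. This sketch assumes GNU sed 4.x (or ssed); other
   versions of sed will most likely reject the M flag.

```shell
# Without M, ^ matches only at the start of the pattern space.
plain=$(printf '%s\n' one two | sed 'N;s/^t/T/')

# With M, ^ also matches just after the embedded newline.
multi=$(printf '%s\n' one two | sed 'N;s/^t/T/M')

printf '%s\n' "$plain" "$multi"
```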
9512b1

9512b1
   ssed supports two more switches, /S and /X, when its Perl mode is
9512b1
   used. They are described in detail in section 6.7.3.H, below.
9512b1

9512b1
3.1.4. Command-line switches
9512b1

9512b1
   All versions of sed support two switches, -e and -n. Though sed
9512b1
   usually separates multiple commands with semicolons (e.g., "H;d;"),
9512b1
   in many versions certain commands cannot be followed by a semicolon.
   These include :labels, 't', and 'b'. Such a command must come last
   in its part of the script, with the parts given as separate -e
   switches. For example:
9512b1

9512b1
     # The 'ta' means jump to label :a if last s/// returns true
9512b1
     sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file
9512b1

9512b1
   The -n switch turns off sed's default behavior of printing every
9512b1
   line. With -n, lines are printed only if explicitly told to. In
9512b1
   addition, for certain versions of sed, if an external script begins
9512b1
   with "#n" as its first two characters, the output is suppressed
9512b1
   (exactly as if -n had been entered on the command line). A list of
9512b1
   which versions appears in section 6.7.2., below.
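
   The difference -n makes is easiest to see with the 'p' command.
   A minimal sketch with a made-up three-line input:

```shell
# Without -n, line 2 is printed by 'p' and again by the default output.
default=$(printf '%s\n' a b c | sed '2p')

# With -n, only the explicit 'p' produces output.
quiet=$(printf '%s\n' a b c | sed -n '2p')

printf '%s\n' "$default" "$quiet"
```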
9512b1

9512b1
   GNU sed 4.x and ssed support additional switches. -l (lowercase L),
9512b1
   followed by a number, lets you adjust the default length of the 'l'
9512b1
   and 'L' commands (note that these implementations of sed also
9512b1
   support an argument to these commands, to tailor the length
9512b1
   separately for each occurrence of the command).
9512b1

9512b1
   -i activates in-place editing (see section 4.41.1, below). -s
9512b1
   treats each file as a separate stream: sed by default joins all the
9512b1
   files, so $ represents the last line of the last file; 15 means the
9512b1
   15th line in the joined stream; and /abc/,/def/ might match across
9512b1
   files.
9512b1

9512b1
   When -s is used, however, all addresses refer to single files. For
9512b1
   example, $ represents the last line of each input file; 15 means
9512b1
   the 15th line of each input file; and /abc/,/def/ will be "reset"
9512b1
   (in other words, sed will not execute the commands and start
9512b1
   looking for /abc/ again) if a file ends before /def/ has been
9512b1
   matched. Note that -i automatically activates this interpretation
9512b1
   of addresses.
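
   The following sketch contrasts the two interpretations of '$'. It
   assumes GNU sed 4.x (for -s) and the mktemp utility; the file
   names and contents are invented for the example.

```shell
dir=$(mktemp -d)
printf '%s\n' a1 a2 > "$dir/a.txt"
printf '%s\n' b1 b2 > "$dir/b.txt"

# Default: the files form one stream, so $ is the last line of b.txt.
joined=$(sed -n '$p' "$dir/a.txt" "$dir/b.txt")

# With -s: $ matches the last line of each file in turn.
separate=$(sed -s -n '$p' "$dir/a.txt" "$dir/b.txt")

rm -rf "$dir"
printf '%s\n' "$joined" "$separate"
```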
9512b1

9512b1
3.2. Common one-line sed scripts
9512b1

9512b1
   A separate document of over 70 handy "one-line" sed commands is
9512b1
   available at
9512b1
       http://sed.sourceforge.net/sed1line.txt
9512b1

9512b1
   Here are several common sed commands for one-line use. MS-DOS users
9512b1
   should replace single quotes ('...') with double quotes ("...") in
9512b1
   these examples. A specific filename usually follows the script,
9512b1
   though the input may also come via piping or redirection.
9512b1

9512b1
   # Double space a file
9512b1
   sed G file
9512b1

9512b1
   # Triple space a file
9512b1
   sed 'G;G' file
9512b1

9512b1
   # Under UNIX: convert DOS newlines (CR/LF) to Unix format
9512b1
   sed 's/.$//' file    # assumes that all lines end with CR/LF
9512b1
   sed 's/^M$//' file   # in bash/tcsh, press Ctrl-V then Ctrl-M
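
   If typing a literal Ctrl-M is awkward, one possible workaround is
   to let printf generate the carriage return and pass it to sed in a
   shell variable. This is a sketch, not part of the original
   one-liners; the variable name 'cr' is arbitrary.

```shell
# Hold a literal carriage return in a shell variable.
cr=$(printf '\r')

# Strip a CR from the end of each line; double quotes let $cr expand,
# and \$ keeps the end-of-line anchor away from the shell.
unix=$(printf 'one\r\ntwo\r\n' | sed "s/$cr\$//")

printf '%s\n' "$unix"
```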
9512b1

9512b1
   # Under DOS: convert Unix newlines (LF) to DOS format
9512b1
   sed 's/$//' file                     # method 1
9512b1
   sed -n p file                        # method 2
9512b1

9512b1
   # Delete leading whitespace (spaces/tabs) from front of each line
9512b1
   # (this aligns all text flush left). '^t' represents a true tab
9512b1
   # character. Under bash or tcsh, press Ctrl-V then Ctrl-I.
9512b1
   sed 's/^[ ^t]*//' file
9512b1

9512b1
   # Delete trailing whitespace (spaces/tabs) from end of each line
9512b1
   sed 's/[ ^t]*$//' file               # see note on '^t', above
9512b1

9512b1
   # Delete BOTH leading and trailing whitespace from each line
9512b1
   sed 's/^[ ^t]*//;s/[ ^t]*$//' file   # see note on '^t', above
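
   Where typing a literal tab is inconvenient, the '^t' placeholder
   can be supplied from a shell variable instead. This is one
   possible approach, sketched with an invented input line; it is not
   one of the original one-liners.

```shell
# Hold a literal tab in a shell variable.
tab=$(printf '\t')

# Trim spaces and tabs from both ends of the line.
trimmed=$(printf ' \t hello world \t \n' |
  sed "s/^[ $tab]*//;s/[ $tab]*\$//")

printf '[%s]\n' "$trimmed"
```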
9512b1

9512b1
   # Substitute "foo" with "bar" on each line
9512b1
   sed 's/foo/bar/' file        # replaces only 1st instance in a line
9512b1
   sed 's/foo/bar/4' file       # replaces only 4th instance in a line
9512b1
   sed 's/foo/bar/g' file       # replaces ALL instances within a line
9512b1

9512b1
   # Substitute "foo" with "bar" ONLY for lines which contain "baz"
9512b1
   sed '/baz/s/foo/bar/g' file
9512b1

9512b1
   # Delete all CONSECUTIVE blank lines from file except the first.
9512b1
   # This method also deletes all blank lines from top and end of file.
9512b1
   # (emulates "cat -s")
9512b1
   sed '/./,/^$/!d' file       # this allows 0 blanks at top, 1 at EOF
9512b1
   sed '/^$/N;/\n$/D' file     # this allows 1 blank at top, 0 at EOF
9512b1

9512b1
   # Delete all leading blank lines at top of file (only).
9512b1
   sed '/./,$!d' file
9512b1

9512b1
   # Delete all trailing blank lines at end of file (only).
9512b1
   sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file
9512b1

9512b1
   # If a line ends with a backslash, join the next line to it.
9512b1
   sed -e :a -e '/\\$/N; s/\\\n//; ta' file
9512b1

9512b1
   # If a line begins with an equal sign, append it to the previous
9512b1
   # line (and replace the "=" with a single space).
9512b1
   sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file
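
   The last two scripts loop with a label and the 't' command, so a
   worked run may help. The input lines below are made up for the
   sketch; the sed scripts are the ones shown above.

```shell
# Lines ending in a backslash are joined to the line that follows.
cont=$(printf '%s\n' 'one \' 'two \' 'three' 'four' |
  sed -e :a -e '/\\$/N; s/\\\n//; ta')

# Lines beginning with "=" are appended to the previous line.
eq=$(printf '%s\n' a =b =c d |
  sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D')

printf '%s\n' "$cont" "$eq"
```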
9512b1

9512b1
3.3. Addressing and address ranges
9512b1

9512b1
   Sed commands may have an optional "address" or "address range"
9512b1
   prefix. If there is no address or address range given, then the
9512b1
   command is applied to all the lines of the input file or text
9512b1
   stream. Three commands cannot take an address prefix:
9512b1

9512b1
      - labels, used to branch or jump within the script
9512b1
      - the close brace, '}', which ends the '{' "command"
9512b1
      - the '#' comment character, also technically a "command"
9512b1

9512b1
   An address can be a line number (such as 1, 5, 37, etc.), a regular
9512b1
   expression (written in the form /RE/ or \xREx where 'x' is any
9512b1
   character other than '\' and RE is the regular expression), or the
9512b1
   dollar sign ($), representing the last line of the file. An
9512b1
   exclamation mark (!) after an address or address range will apply
9512b1
   the command to every line EXCEPT the ones named by the address. A
9512b1
   null regex ("//") will be replaced by the last regex which was
9512b1
   used. Also, some seds do not support \xREx as regex delimiters.
9512b1

9512b1
     5d               # delete line 5 only
9512b1
     5!d              # delete every line except line 5
9512b1
     /RE/s/LHS/RHS/g  # substitute only if RE occurs on the line
9512b1
     /^$/b label      # if the line is blank, branch to ':label'
9512b1
     /./!b label      # ... another way to write the same command
9512b1
     \%.%!b label     # ... yet another way to write this command
9512b1
     $!N              # on all lines but the last, get the Next line
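
   Two of these forms in action, on an invented list of numbers. The
   pairing command is an extra illustration of '$!N', not something
   taken from the list above.

```shell
# 2!d deletes every line except line 2.
only2=$(printf '%s\n' 1 2 3 | sed '2!d')

# $!N appends the next line except on the last line, so lines are
# joined in pairs and an odd final line is left alone.
pairs=$(printf '%s\n' 1 2 3 4 5 | sed '$!N;s/\n/ /')

printf '%s\n' "$only2" "$pairs"
```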
9512b1

9512b1
   Note that an embedded newline can be represented in an address by
9512b1
   the symbol \n, but this syntax is needed only if the script puts 2
9512b1
   or more lines into the pattern space via the N, G, or other
9512b1
   commands. The \n symbol does *not* match the newline at an
9512b1
   end-of-line because when sed reads each line into the pattern space
9512b1
   for processing, it strips off the trailing newline, processes the
9512b1
   line, and adds a newline back when printing the line to standard
9512b1
   output. To match the end-of-line, use the '$' metacharacter, as
9512b1
   follows:
9512b1

9512b1
     /tape$/       # matches the word 'tape' at the end of a line
9512b1
     /tape$deck/   # matches the word 'tape$deck' with a literal '$'
9512b1
     /tape\ndeck/  # matches 'tape' and 'deck' with a newline between
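
   The point about \n can be verified directly. In this sketch (the
   sample lines are invented), the same address fails on single lines
   and succeeds once N has joined two lines in the pattern space.

```shell
# One line at a time: there is no newline in the pattern space to match.
single=$(printf '%s\n' tape deck | sed -n '/tape\ndeck/p')

# After N, the pattern space holds "tape<newline>deck".
double=$(printf '%s\n' tape deck | sed -n 'N;/tape\ndeck/p')

# $ anchors the match to the end of the line.
ends=$(printf '%s\n' 'red tape' 'tape deck' | sed -n '/tape$/p')

printf '%s\n' "$double" "$ends"
```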
9512b1

9512b1
   The following sed commands usually accept *only* a single address.
9512b1
   All other commands (except labels, '}', and '#') accept both single
9512b1
   addresses and address ranges.
9512b1

9512b1
     =       print to stdout the line number of the current line
9512b1
     a       after printing the current line, append "text" to stdout
9512b1
     i       before printing the current line, insert "text" to stdout
9512b1
     q       quit after the current line is matched
9512b1
     r file  prints contents of "file" to stdout after line is matched
9512b1

9512b1
   Note that we said "usually." If you need to apply the '=', 'a',
9512b1
   'i', or 'r' commands to each and every line within an address
9512b1
   range, this behavior can be coerced by the use of braces. Thus,
9512b1
   "1,9=" is an invalid command, but "1,9{=;}" will print each line
9512b1
   number followed by its line for the first 9 lines (and then print
9512b1
   the rest of the file normally).
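
   A scaled-down sketch of the brace form, numbering the first 2
   lines of an invented 3-line input rather than the first 9:

```shell
# Each of lines 1-2 is preceded by its line number; line 3 is not.
numbered=$(printf '%s\n' a b c | sed '1,2{=;}')
printf '%s\n' "$numbered"
```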
9512b1

9512b1
   Address ranges occur in the form
9512b1

9512b1
       <address1>,<address2>    or    <address1>,<address2>!
9512b1

9512b1
   where the address can be a line number or a standard /regex/.
9512b1
   <address2> can also be a dollar sign, indicating the end of file.
9512b1
   Under GNU sed 3.02+, ssed, and sed15+, <address2> may also be a
9512b1
   notation of the form +num, indicating the next _num_ lines after
9512b1
   <address1> is matched.
9512b1

9512b1
   Address ranges are:
9512b1

9512b1
   (1) Inclusive. The range "/From here/,/eternity/" matches all the
9512b1
   lines containing "From here" up to and including the line
9512b1
   containing "eternity". It will not stop on the line just prior to
9512b1
   "eternity". (If you don't like this, see section 4.24.)
9512b1

9512b1
   (2) Plenary. They always match full lines, not just parts of lines.
9512b1
   In other words, a command to change or delete an address range will
9512b1
   change or delete whole lines; it won't stop in the middle of a
9512b1
   line.
9512b1

9512b1
   (3) Multi-linear. Address ranges normally match 2 lines or more.
9512b1
   The second address will never match the same line the first address
9512b1
   did; therefore a valid address range always spans at least two
9512b1
   lines, with these exceptions which match only one line:
9512b1

9512b1
      - if the first address matches the last line of the file
9512b1
      - if using the syntax "/RE/,3" and /RE/ occurs only once in the
9512b1
        file, on line 3 or later
9512b1
      - if using HHsed v1.5. See section 3.4.
9512b1

9512b1
   (4) Minimalist. In address ranges with /regex/ as <address2>, the
9512b1
   range "/foo/,/bar/" will stop at the first "bar" it finds, provided
9512b1
   that "bar" occurs on a line below "foo". If the word "bar" occurs
9512b1
   on several lines below the word "foo", the range will match all the
9512b1
   lines from the first "foo" up to the first "bar". It will not
9512b1
   continue hopping ahead to find more "bar"s. In other words, address
9512b1
   ranges are not "greedy," like regular expressions.
9512b1

9512b1
   (5) Repeating. An address range will try to match more than one
9512b1
   block of lines in a file. However, the blocks cannot nest. In
9512b1
   addition, a second match will not "take" the last line of the
9512b1
   previous block.  For example, given the following text,
9512b1

9512b1
       start
9512b1
       stop  start
9512b1
       stop
9512b1

9512b1
   the sed command '/start/,/stop/d' will only delete the first two
9512b1
   lines. It will not delete all 3 lines.
9512b1

9512b1
   (6) Relentless. If the address range finds a "start" match but
9512b1
   doesn't find a "stop", it will match every line from "start" to the
9512b1
   end of the file. Thus, beware of the following behaviors:
9512b1

9512b1
     /RE1/,/RE2/  # If /RE2/ is not found, matches from /RE1/ to the
9512b1
                  # end-of-file.
9512b1

9512b1
     20,/RE/      # If /RE/ is not found, matches from line 20 to the
9512b1
                  # end-of-file.
9512b1

9512b1
     /RE/,30      # If /RE/ occurs any time after line 30, each
9512b1
                  # occurrence will be matched in sed15+, sedmod, and
9512b1
                  # GNU sed v3.02+. GNU sed v2.05 and 1.18 will match
9512b1
                  # from the 2nd occurrence of /RE/ to the end-of-file.
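
   Several of the properties listed above can be observed on small
   inputs. The following is only a sketch with invented sample text;
   it sticks to regex addresses so it should behave the same way in
   any version of sed.

```shell
# Inclusive: the first and last lines of the range are both selected.
incl=$(printf '%s\n' a 'From here' x 'to eternity' b |
  sed -n '/From here/,/eternity/p')

# Minimalist: the range ends at the first "bar" below "foo".
mini=$(printf '%s\n' foo 1 bar 2 bar | sed -n '/foo/,/bar/p')

# Repeating: line 2 closes the first block and cannot open another,
# so the third line survives.
rept=$(printf '%s\n' start 'stop  start' stop | sed '/start/,/stop/d')

# Relentless: with no closing match, the range runs to end of file.
rele=$(printf '%s\n' 1 foo 3 4 | sed '/foo/,/nomatch/d')

printf '%s\n' "$incl" "$mini" "$rept" "$rele"
```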
9512b1

9512b1
   If these behaviors seem strange, remember that they occur because
9512b1
   sed does not look "ahead" in the file. Doing so would stop sed from
9512b1
   being a stream editor and have adverse effects on its efficiency.
9512b1
   If these behaviors are undesirable, they can be circumvented or
9512b1
   corrected by the use of nested testing within braces. The following
9512b1
   scripts work under GNU sed 3.02:
9512b1

9512b1
     # Execute your_commands on range "/RE1/,/RE2/", but if /RE2/ is
9512b1
     # not found, do nothing.
9512b1
     /RE1/{:a;N;/RE2/!ba;your_commands;}
9512b1

9512b1
     # Execute your_commands on range "20,/RE/", but if /RE/ is not
9512b1
     # found, do nothing.
9512b1
     20{:a;N;/RE/!ba;your_commands;}
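
   Here is a run of the first script with s/\n/ /g standing in for
   your_commands. The markers BEGIN and END and the input are made up
   for this sketch. The script is split across -e switches so that
   the label does not have to be followed by a semicolon. What
   happens when /RE2/ is never found depends on how a given sed
   treats N at end of file, so only the matching case is shown.

```shell
# Gather lines from BEGIN through END, then act on them as one unit.
out=$(printf '%s\n' a BEGIN x END b |
  sed -e '/BEGIN/{' -e :a -e N -e '/END/!ba' -e 's/\n/ /g' -e '}')
printf '%s\n' "$out"
```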
9512b1

9512b1
   As a side note, once we've used N to "slurp" lines together to test
9512b1
   for the ending expression, the pattern space will have gathered
9512b1
   many lines (possibly thousands) together and concatenated them as a
9512b1
   single expression, with the \n sequence marking line breaks. The
9512b1
   REs *within* the pattern space may have to be modified (e.g., you
9512b1
   must write '/\nStart/' instead of '/^Start/' and '/[^\n]*/' instead
9512b1
   of '/.*/') and other standard sed commands will be unavailable or
9512b1
   difficult to use.
9512b1

9512b1
     # Execute your_commands on range "/RE/,30", but if /RE/ occurs
9512b1
     # on line 31 or later, do not match it.
9512b1
     1,30{/RE/,$ your_commands;}
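
   The same idea scaled down to a 3-line limit, with an invented
   five-line input and s/^/+ / standing in for your_commands:

```shell
# The inner range /b/,$ is tested only on lines 1-3, so the "b" on
# line 5 never starts a new match.
out=$(printf '%s\n' a b c d b | sed '1,3{/b/,$s/^/+ /;}')
printf '%s\n' "$out"
```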
9512b1

9512b1
   For related suggestions on using address ranges, see sections 4.2,
9512b1
   4.15, and 4.19 of this FAQ. Also, note the following section.
9512b1

9512b1
3.4. Address ranges in GNU sed and HHsed
9512b1

9512b1
   (1) GNU sed 3.02+, ssed, and sed15+ all support address ranges like:
9512b1

9512b1
       /regex/,+5
9512b1

9512b1
   which match /regex/ plus the next 5 lines (or EOF, whichever comes
9512b1
   first).
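
   A small sketch of this form, assuming one of the sed versions
   named above is installed as "sed" (it is not POSIX syntax). The
   numbered input is invented.

```shell
# Print the matching line plus the 2 lines after it.
out=$(printf '%s\n' 1 2 3 4 5 6 | sed -n '/2/,+2p')
printf '%s\n' "$out"
```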
9512b1

9512b1
   (2) GNU sed v3.02.80 (and above) and ssed support address ranges of:
9512b1

9512b1
       0,/regex/
9512b1

9512b1
   as a special case to permit matching /regex/ if it occurs on the
9512b1
   first line. This syntax permits a range expression that matches
9512b1
   every line from the top of the file to the first instance of
9512b1
   /regex/, even if /regex/ is on the first line.
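
   The difference from "1,/regex/" shows up only when the regex
   matches line 1. This sketch assumes a GNU sed recent enough to
   accept the 0 address; the three-line input is invented.

```shell
# 1,/x/ starts on line 1 and looks for the end from line 2 onward,
# so the range runs through the second "x".
from_one=$(printf '%s\n' x y x | sed '1,/x/s/x/Z/')

# 0,/x/ may end on line 1 itself, so only the first "x" is changed.
from_zero=$(printf '%s\n' x y x | sed '0,/x/s/x/Z/')

printf '%s\n' "$from_one" "$from_zero"
```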
9512b1

9512b1
   (3) HHsed (sed15) has an exceptional way of implementing
9512b1

9512b1
       /regex1/,/regex2/
9512b1

9512b1
   If /RE1/ and /RE2/ both occur on the *same* line, HHsed will match
9512b1
   that single line. In other words, an address range block can
9512b1
   consist of just one line. HHsed will then look for the next
9512b1
   occurrence of /regex1/ to begin the block again.
9512b1

9512b1
   Every other version of sed (including sed16) requires 2 lines to
9512b1
   match an address range, and thus /regex1/ and /regex2/ cannot
9512b1
   successfully match just one line. See also the comments at
9512b1
   section 7.9.4, below.
9512b1

9512b1
   (4) BEGIN~STEP selection: ssed and GNU sed (v2.05 and above) offer
9512b1
   a form of addressing called "BEGIN~STEP selection". This is *not* a
9512b1
   range address, which selects an inclusive block of consecutive
9512b1
   lines from /start/ to /finish/. But I think it seems to belong here.
9512b1

9512b1
   Given an expression of the form "M~N", where M and N are integers,
9512b1
   GNU sed and ssed will select every Nth line, beginning at line M.
9512b1
   (With gsed v2.05, M had to be less than N, but this restriction is
9512b1
   no longer necessary). Both M and N may equal 0 ("0~0" selects every
9512b1
   line). These examples illustrate the syntax:
9512b1

9512b1
     sed '1~3d' file      # delete every 3rd line, starting with line 1
9512b1
                          # deletes lines 1, 4, 7, 10, 13, 16, ...
9512b1

9512b1
     sed '0~3d' file      # deletes lines 3, 6, 9, 12, 15, 18, ...
9512b1

9512b1
     sed -n '2~5p' file   # print every 5th line, starting with line 2
9512b1
                          # prints lines 2, 7, 12, 17, 22, 27, ...
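
   For seds without "M~N" addressing, a chain of "n" commands can
   give the same selection. A minimal sketch, assuming GNU sed for the
   first command only; the twelve-line input and the variable names
   are invented for the demonstration.

```shell
nums=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12)

# GNU sed/ssed: every 5th line, starting with line 2
gnu_out=$(printf '%s\n' "$nums" | sed -n '2~5p')

# Other seds: print a line, skip the next four, and repeat from line 2 on
old_out=$(printf '%s\n' "$nums" | sed -n '2,${p;n;n;n;n;}')

printf '%s\n' "$gnu_out"     # 2, 7, 12 (one per line)
printf '%s\n' "$old_out"     # the same three lines
```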
9512b1

9512b1
   (5) Finally, GNU sed v2.05 has a bug in range addressing (see
9512b1
   section 7.5), which was fixed in later versions.
9512b1

9512b1

9512b1
3.5. Debugging sed scripts
9512b1

9512b1
   The following two debuggers should make it easier to understand how
9512b1
   sed scripts operate. They can save hours of grief when trying to
9512b1
   determine the problems with a sed script.
9512b1

9512b1
   (1) sd (sed debugger), by Brian Hiles
9512b1

9512b1
   This debugger runs under a Unix shell, is powerful, and is easy to
9512b1
   use. sd has conditional breakpoints and spypoints of the pattern
9512b1
   space and hold space, on any scope defined by regex match and/or
9512b1
   script line number. It can be semi-automated, can save diagnostic
9512b1
   reports, and shows potential problems with a sed script before it
9512b1
   tries to execute it. The script is robust and requires the Unix
9512b1
   shell utilities plus the Bourne shell or Korn shell to execute.
9512b1

9512b1
       http://sed.sourceforge.net/grabbag/scripts/sd.ksh.txt (2003)
9512b1
       http://sed.sourceforge.net/grabbag/scripts/sd.sh.txt  (1998)
9512b1

9512b1
   (2) sedsed, by Aurelio Jargas
9512b1

9512b1
   This debugger requires Python to run it, and it uses your own
9512b1
   version of sed, whatever that may be. It displays the current input
9512b1
   line, the pattern space, and the hold space, before and after each
9512b1
   sed command is executed.
9512b1

9512b1
       http://sedsed.sourceforge.net
9512b1

9512b1

9512b1
3.6. Notes about s2p, the sed-to-perl translator
9512b1

9512b1
   s2p (sed to perl) is a Perl program to convert sed scripts into the
9512b1
   Perl programming language; it is included with many versions of
9512b1
   Perl. These problems have been found when using s2p:
9512b1

9512b1
   (1) Doesn't recognize the semicolon properly after s/// commands.
9512b1

9512b1
       s/foo/bar/g;
9512b1

9512b1
   (2) Doesn't trim trailing whitespace after s/// commands. Even lone
9512b1
   trailing spaces, without comments, produce an error.
9512b1

9512b1
   (3) Doesn't handle multiple commands within braces. E.g.,
9512b1

9512b1
       1,4{=;G;}
9512b1

9512b1
   will produce perl code that omits the second command ("G"). In
   fact, every command after the first one is missing from the perl
   output script, which also contains mismatched braces.
9512b1

9512b1
3.7. GNU/POSIX extensions to regular expressions
9512b1

9512b1
   GNU sed supports "character classes" in addition to regular
9512b1
   character sets, such as [0-9A-F]. Like regular character sets,
9512b1
   character classes represent any single character within a set.
9512b1

9512b1
   "Character classes are a new feature introduced in the POSIX
9512b1
   standard. A character class is a special notation for describing
9512b1
   lists of characters that have a specific attribute, but where the
9512b1
   actual characters themselves can vary from country to country
9512b1
   and/or from character set to character set. For example, the notion
9512b1
   of what is an alphabetic character differs in the USA and in
9512b1
   France." [quoted from the docs for GNU awk v3.1.0.]
9512b1

9512b1
   Though character classes don't generally conserve space on the
9512b1
   line, they help make scripts portable for international use. The
9512b1
   equivalent character sets _for U.S. users_ follow:
9512b1

9512b1
     [[:alnum:]]  - [A-Za-z0-9]     Alphanumeric characters
9512b1
     [[:alpha:]]  - [A-Za-z]        Alphabetic characters
9512b1
     [[:blank:]]  - [ \x09]         Space or tab characters only
9512b1
     [[:cntrl:]]  - [\x00-\x1F\x7F] Control characters
9512b1
     [[:digit:]]  - [0-9]           Numeric characters
9512b1
     [[:graph:]]  - [!-~]           Printable and visible characters
9512b1
     [[:lower:]]  - [a-z]           Lower-case alphabetic characters
9512b1
     [[:print:]]  - [ -~]           Printable (non-Control) characters
9512b1
     [[:punct:]]  - [!-/:-@[-`{-~]  Punctuation characters
9512b1
     [[:space:]]  - [ \t\n\r\v\f]   All whitespace chars
9512b1
     [[:upper:]]  - [A-Z]           Upper-case alphabetic characters
9512b1
     [[:xdigit:]] - [0-9a-fA-F]     Hexadecimal digit characters
9512b1

9512b1
   Note that [[:graph:]] does not match the space " ", but [[:print:]]
9512b1
   does. Some character classes may (or may not) match characters in
9512b1
   the high ASCII range (ASCII 128-255 or 0x80-0xFF), depending on
9512b1
   which C library was used to compile sed. For non-English languages,
9512b1
   [[:alpha:]] and other classes may also match high ASCII characters.
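
   As a small illustration, the following sketch strips trailing
   blanks with [[:blank:]] and masks digits with [[:digit:]]. It
   assumes a sed with POSIX character-class support; the sample text
   and the variable name are invented.

```shell
# LC_ALL=C pins the classes to their ASCII meaning for this demonstration.
cleaned=$(printf 'Room 101\t \n' |
  LC_ALL=C sed 's/[[:blank:]]*$//; s/[[:digit:]]/#/g')

printf '[%s]\n' "$cleaned"    # [Room ###]
```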
9512b1

9512b1
------------------------------
9512b1

9512b1
4. EXAMPLES
9512b1

9512b1
   ONE-CHARACTER QUESTIONS
9512b1

9512b1
4.1. How do I insert a newline into the RHS of a substitution?
9512b1

9512b1
   Several versions of sed permit '\n' to be typed directly into the
9512b1
   RHS, which is then converted to a newline on output: ssed,
9512b1
   gsed302a+, gsed103 (with the -x switch), sed15+, sedmod, and
9512b1
   UnixDOS sed. The _easiest_ solution is to use one of these
9512b1
   versions.
9512b1

9512b1
   For other versions of sed, try one of the following:
9512b1

9512b1
   (a) If typing the sed script from a Bourne shell, use one backslash
9512b1
   "\" if the script uses 'single quotes' or two backslashes "\\" if
9512b1
   the script requires "double quotes". In the example below, note
9512b1
   that the leading '>' on the 2nd line is generated by the shell to
9512b1
   prompt the user for more input. The user types in slash,
9512b1
   single-quote, and then ENTER to terminate the command:
9512b1

9512b1
     [sh-prompt]$ echo twolines | sed 's/two/& new\
9512b1
     >/'
9512b1
     two new
9512b1
     lines
9512b1
     [sh-prompt]$
9512b1

9512b1
   (b) Use a script file with one backslash '\' in the script,
9512b1
   immediately followed by a newline. This will embed a newline into
9512b1
   the "replace" portion. Example:
9512b1

9512b1
     sed -f newline.sed files
9512b1

9512b1
     # newline.sed
9512b1
     s/twolines/two new\
9512b1
     lines/g
9512b1

9512b1
   Some versions of sed may not need the trailing backslash. If so,
9512b1
   remove it.
9512b1

9512b1
   (c) Insert an unused character and pipe the output through tr:
9512b1

9512b1
     echo twolines | sed 's/two/& new=/' | tr "=" "\n"   # produces
9512b1
     two new
9512b1
     lines
9512b1

9512b1
   (d) Use the "G" command:
9512b1

9512b1
   G appends a newline, plus the contents of the hold space to the end
9512b1
   of the pattern space. If the hold space is empty, a newline is
9512b1
   appended anyway. The newline is stored in the pattern space as "\n"
9512b1
   where it can be addressed by grouping "\(...\)" and moved in the
9512b1
   RHS. Thus, to change the "twolines" example used earlier, the
9512b1
   following script will work:
9512b1

9512b1
     sed '/twolines/{G;s/\(two\)\(lines\)\(\n\)/\1\3\2/;}'
9512b1

9512b1
   (e) Inserting full lines, not breaking lines up:
9512b1

9512b1
   If one is not *changing* lines but only inserting complete lines
9512b1
   before or after a pattern, the procedure is much easier. Use the
9512b1
   "i" (insert) or "a" (append) command, making the alterations by an
9512b1
   external script. To insert "This line is new" BEFORE each line
9512b1
   matching a regex:
9512b1

9512b1
     /RE/i This line is new               # HHsed, sedmod, gsed 3.02a
9512b1
     /RE/{x;s/$/This line is new/;G;}     # other seds
9512b1

9512b1
   The two examples above are intended as "one-line" commands entered
9512b1
   from the console. If using a sed script, "i\" immediately followed
9512b1
   by a literal newline will work on all versions of sed. Furthermore,
9512b1
   the command "s/$/This line is new/" will only work if the hold
9512b1
   space is already empty (which it is by default).
9512b1

9512b1
   To append "This line is new" AFTER each line matching a regex:
9512b1

9512b1
     /RE/a This line is new               # HHsed, sedmod, gsed 3.02a
9512b1
     /RE/{G;s/$/This line is new/;}       # other seds
9512b1

9512b1
   To append 2 blank lines after each line matching a regex:
9512b1

9512b1
     /RE/{G;G;}                    # assumes the hold space is empty
9512b1

9512b1
   To replace each line matching a regex with 5 blank lines:
9512b1

9512b1
     /RE/{s/.*//;G;G;G;G;}         # assumes the hold space is empty
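
   The portable "i\" and "a\" forms mentioned in (e) can also be typed
   at a Bourne-type shell prompt, with a literal newline after the
   backslash. A minimal sketch with invented sample lines:

```shell
# Insert a line BEFORE each line matching /beta/
before=$(printf 'alpha\nbeta\n' | sed '/beta/i\
This line is new')

# Append a line AFTER each line matching /alpha/
after=$(printf 'alpha\nbeta\n' | sed '/alpha/a\
This line is new')

# Both commands put the new line between "alpha" and "beta".
printf '%s\n' "$before"
printf '%s\n' "$after"
```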
9512b1

9512b1
   (f) Use the "y///" command if possible:
9512b1

9512b1
   On some Unix versions of sed (not GNU sed!), though the s///
9512b1
   command won't accept '\n' in the RHS, the y/// command does. If
9512b1
   your Unix sed supports it, a newline after "aaa" can be inserted
9512b1
   this way (which is not portable to GNU sed or other seds):
9512b1

9512b1
     s/aaa/&~/; y/~/\n/;    # assuming no other '~' is on the line!
9512b1

9512b1
4.2. How do I represent control-codes or nonprintable characters?
9512b1

9512b1
   Several versions of sed support the notation \xHH, where "HH" are
9512b1
   two hex digits, 00-FF: ssed, GNU sed v3.02.80 and above, GNU sed
9512b1
   v1.03, sed16 and sed15 (HHsed). Try to use one of those versions.
9512b1

9512b1
   Sed is not intended to process binary or object code, and files
9512b1
   which contain nulls (0x00) will usually generate errors in most
9512b1
   versions of sed. The latest versions of GNU sed and ssed are an
9512b1
   exception; they permit nulls in the input files and also in
9512b1
   regexes.
9512b1

9512b1
   On Unix platforms, the 'echo' command may allow insertion of octal
9512b1
   or hex values, e.g., `echo "\0nnn"` or `echo -n "\0nnn"`. The echo
9512b1
   command may also support syntax like '\\b' or '\\t' for backspace
9512b1
   or tab characters. Check the man pages to see what syntax your
9512b1
   version of echo supports. Some versions support the following:
9512b1

9512b1
     # replace 0x1A (32 octal) with ASCII letters
9512b1
     sed 's/'`echo "\032"`'/Ctrl-Z/g'
9512b1

9512b1
     # note the 3 backslashes in the command below
9512b1
     sed "s/.`echo \\\b`//g"
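
   Because echo differs so much between systems, printf is often a
   more predictable way to produce the byte. The sketch below assumes
   a POSIX printf, which accepts octal escapes such as \032; the
   sample input and the variable names are hypothetical.

```shell
ctrl_z=$(printf '\032')         # 0x1A, written in octal
tab=$(printf '\t')

# The shell expands the variables before sed sees the script.
out=$(printf 'end\032of\tfile\n' |
  sed "s/$ctrl_z/<Ctrl-Z>/g; s/$tab/<TAB>/g")

printf '%s\n' "$out"            # end<Ctrl-Z>of<TAB>file
```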
9512b1

9512b1
4.3. How do I convert files with toggle characters, like +this+, to
9512b1
look like [i]this[/i]?
9512b1

9512b1
   Input files, especially message-oriented text files, often contain
9512b1
   toggle characters for emphasis, like ~this~, *this*, or =this=. Sed
9512b1
   can make the same input pattern produce alternating output each
9512b1
   time it is encountered. Typical needs might be to generate HTML
9512b1
   codes or print codes for boldface, italic, or underscore. This
9512b1
   script accommodates multiple occurrences of the toggle pattern on
9512b1
   the same line, as well as cases where the pattern starts on one
9512b1
   line and finishes several lines later, even at the end of the file:
9512b1

9512b1
     # sed script to convert +this+ to [i]this[/i]
9512b1
     :a
9512b1
     /+/{ x;        # If "+" is found, switch hold and pattern space
9512b1
       /^ON/{       # If "ON" is in the (former) hold space, then ..
9512b1
         s///;      # .. delete it
9512b1
         x;         # .. switch hold space and pattern space back
9512b1
         s|+|[/i]|; # .. turn the next "+" into "[/i]"
9512b1
         ba;        # .. jump back to label :a and start over
9512b1
       }
9512b1
     s/^/ON/;       # Else, "ON" was not in the hold space; create it
9512b1
     x;             # Switch hold space and pattern space
9512b1
     s|+|[i]|;      # Turn the first "+" into "[i]"
9512b1
     ba;            # Branch to label :a to find another pattern
9512b1
     }
9512b1
     #---end of script---
9512b1

9512b1
   This script uses the hold space to create a "flag" to indicate
9512b1
   whether the toggle is ON or not. We have added remarks to
9512b1
   illustrate the script logic, but in most versions of sed remarks
9512b1
   are not permitted after 'b'ranch commands or labels.
9512b1

9512b1
   If you are sure that the +toggle+ characters never cross line
9512b1
   boundaries (i.e., never begin on one line and end on another), this
9512b1
   script can be reduced to one line:
9512b1

9512b1
     s|+\([^+][^+]*\)+|[i]\1[/i]|g
9512b1

9512b1
   If your toggle pattern contains regex metacharacters (such as '*'
9512b1
   or perhaps '+' or '?'), remember to quote them with backslashes.
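
   A quick check of the one-line form on a sample line. The input
   text is invented; note that in a basic regular expression an
   unescaped "+" is an ordinary character.

```shell
line='plain +first+ and +second+ text'
out=$(printf '%s\n' "$line" | sed 's|+\([^+][^+]*\)+|[i]\1[/i]|g')

printf '%s\n' "$out"    # plain [i]first[/i] and [i]second[/i] text
```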
9512b1

9512b1
   CHANGING STRINGS
9512b1

9512b1
4.10. How do I perform a case-insensitive search?
9512b1

9512b1
   Several versions of sed support case-insensitive matching: ssed and
9512b1
   GNU sed v3.02+ (with I flag after s/// or /regex/); sedmod with the
9512b1
   -i switch; and sed16 (which supports both types of switches).
9512b1

9512b1
   With other versions of sed, case-insensitive searching is awkward,
9512b1
   so people may use awk or perl instead, since these programs have
9512b1
   options for case-insensitive searches. In gawk, use "BEGIN
9512b1
   {IGNORECASE=1}" and in perl, "/regex/i". For other seds, here are
9512b1
   three solutions:
9512b1

9512b1
   Solution 1: convert everything to upper case and search normally
9512b1

9512b1
     # sed script, solution 1
9512b1
     h;          # copy the original line to the hold space
9512b1
                 # convert the pattern space to solid caps
9512b1
     y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
9512b1
                 # now we can search for the word "CARLOS"
9512b1
     /CARLOS/ {
9512b1
          # add or insert lines. Note: "s/.../.../" will not work
9512b1
          # here because we are searching a modified pattern
9512b1
          # space and are not printing the pattern space.
9512b1
     }
9512b1
     x;          # get back the original pattern space
9512b1
                 # the original pattern space will be printed
9512b1
     #---end of sed script---
9512b1

9512b1
   Solution 2: search for both cases
9512b1

9512b1
   Often, proper names appear only in all lower-case ("unix"), with
   an initial capital letter ("Unix"), or in solid caps ("UNIX").
   There may be no need to search for every possibility.
9512b1

9512b1
     /UNIX/b match
9512b1
     /[Uu]nix/b match
9512b1

9512b1
   Solution 3: search for all possible cases
9512b1

9512b1
     # If you must, search for any possible combination
9512b1
     /[Cc][Aa][Rr][Ll][Oo][Ss]/ { ... }
9512b1

9512b1
   Bear in mind that as the pattern length increases, this solution
9512b1
   becomes an order of magnitude slower than Solution 1, at
9512b1
   least with some implementations of sed.
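
   As one way to put Solution 1 to work, the sketch below prints
   every line that contains "carlos" in any mix of upper and lower
   case, leaving the printed text unchanged. It should not need any
   GNU extensions; the sample lines are invented.

```shell
out=$(printf 'Hi carlos\nnobody here\nCaRlOs too\n' |
  sed -n 'h
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
/CARLOS/{
  x
  p
}')

# h saved the original line; x brings it back before p prints it.
printf '%s\n' "$out"
```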
9512b1

9512b1
4.11. How do I match only the first occurrence of a pattern?
9512b1

9512b1
   (1) The general solution is to use GNU sed or ssed, with one of
9512b1
   these range expressions. The first script ("print only the first
9512b1
   match") works with any version of sed:
9512b1

9512b1
     sed -n '/RE/{p;q;}' file       # print only the first match
9512b1
     sed '0,/RE/{//d;}' file        # delete only the first match
9512b1
     sed '0,/RE/s//to_that/' file   # change only the first match
9512b1

9512b1
   (2) If you cannot use GNU sed and if you *know* the pattern will
9512b1
   not occur on the first line, this will work:
9512b1

9512b1
     sed '1,/RE/{//d;}' file        # delete only the first match
9512b1
     sed '1,/RE/s//to_that/' file   # change only the first match
9512b1

9512b1
   (3) If you cannot use GNU sed and the pattern *might* occur on the
9512b1
   first line, use one of the following commands (credit for short GNU
9512b1
   script goes to Donald Bruce Stewart):
9512b1

9512b1
     sed '/RE/{x;/Y/!{s/^/Y/;h;d;};x;}' file       # delete (one way)
9512b1
     sed -e '/RE/{$d;N;s/.*\n//;:a' -e '$!N;$!ba' -e '}' file  # delete (another way)
     sed '/RE/{$d;N;s/.*\n//;:a;$!N;$!ba;}' file   # same script, GNU sed
9512b1
     sed -e '/RE/{s//to_that/;:a' -e '$!N;$!ba' -e '}' file  # change
9512b1

9512b1
   Still another solution, using a flag in the hold space. This is
9512b1
   portable to all seds and works if the pattern is on the first line:
9512b1

9512b1
     # sed script to change "foo" to "bar" only on the first occurrence
9512b1
     /foo/{x;/^$/{s/^/first/;x;s/foo/bar/;x;};x;}
9512b1
     #---end of script---
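
   To see that only the first match is touched, the portable "change"
   command from (3) can be run on a few sample lines. The input is
   invented for illustration.

```shell
out=$(printf 'foo 1\nmiddle\nfoo 2\n' |
  sed -e '/foo/{s//bar/;:a' -e '$!N;$!ba' -e '}')

# After the first substitution, the loop gathers the rest of the file
# into the pattern space, so /foo/ is never tested again.
printf '%s\n' "$out"
```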
9512b1

9512b1
4.12. How do I parse a comma-delimited (CSV) data file?
9512b1

9512b1
   Comma-delimited data files can come in several forms, requiring
9512b1
   increasing levels of complexity in parsing and handling. They are
9512b1
   often referred to as CSV files (for "comma separated values") and
9512b1
   occasionally as SDF files (for "standard data format"). Note that
9512b1
   some vendors use "SDF" to refer to variable-length records with
9512b1
   comma-separated fields which are "double-quoted" if they contain
9512b1
   character values, while other vendors use "SDF" to designate
9512b1
   fixed-length records with fixed-length, nonquoted fields! (For help
9512b1
   with fixed-length fields, see question 4.13.)
9512b1

9512b1
   The term "CSV" became a de-facto standard when Microsoft Excel used
9512b1
   it as an optional output file format.
9512b1

9512b1
   Here are 4 different forms you may encounter in comma-delimited data:
9512b1

9512b1
   (a) No quotes, no internal commas
9512b1

9512b1
       1001,John Smith,PO Box 123,Chicago,IL,60699
9512b1
       1002,Mary Jones,320 Main,Denver,CO,84100
9512b1

9512b1
   (b) Like (a), with quotes around each field
9512b1

9512b1
       "1003","John Smith","PO Box 123","Chicago","IL","60699"
9512b1
       "1004","Mary Jones","320 Main","Denver","CO","84100"
9512b1

9512b1
   (c) Like (b), with embedded commas
9512b1

9512b1
       "1005","Tom Hall, Jr.","61 Ash Ct.","Niles","OH","44446"
9512b1
       "1006","Bob Davis","429 Pine, Apt. 5","Boston","MA","02128"
9512b1

9512b1
   (d) Like (c), with embedded commas and quotes
9512b1

9512b1
       "1007","Sue "Red" Smith","19 Main","Troy","MI","48055"
9512b1
       "1008","Joe "Hey, guy!" Hall","POB 44","Reno","NV","89504"
9512b1

9512b1
   In each example above, we have 6 fields and 5 commas which function
9512b1
   as field separators. Case (c) is a very typical form of these data
9512b1
   files, with double quotes used to enclose each field and to protect
9512b1
   internal commas (such as "Tom Hall, Jr.") from interpretation as
9512b1
   field separators. However, many times the data may include both
9512b1
   embedded quotation marks as well as embedded commas, as seen by
9512b1
   case (d), above.
9512b1

9512b1
   Case (d) is the closest to Microsoft CSV format. *However*, the
9512b1
   Microsoft CSV format allows embedded newlines within a
9512b1
   double-quoted field. If embedded newlines within fields are a
9512b1
   possibility for your data, you should consider using something
9512b1
   other than sed to work with the data file.
9512b1

9512b1
   Before handling a comma-delimited data file, make sure that you
9512b1
   fully understand its format and check the integrity of the data.
9512b1
   Does each line contain the same number of fields? Should certain
9512b1
   fields be composed only of numbers or of two-letter state
9512b1
   abbreviations in all caps? Sed (or awk or perl) should be used to
9512b1
   validate the integrity of the data file before you attempt to alter
9512b1
   it or extract particular fields from the file.
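
   One possible validation pass for case (a) is to list every record
   that does not have exactly 6 fields, i.e. exactly 5 commas. This is
   only a sketch for unquoted data; the second sample record is
   deliberately one field short.

```shell
bad=$(printf '%s\n' \
  '1001,John Smith,PO Box 123,Chicago,IL,60699' \
  '1002,Mary Jones,320 Main,Denver,CO' |
  sed -n '/^[^,]*\(,[^,]*\)\{5\}$/!p')

printf '%s\n' "$bad"    # prints only the malformed record
```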
9512b1

9512b1
   After ensuring that each line has a valid number of fields, use sed
9512b1
   to locate and modify individual fields, using the \(...\) grouping
9512b1
   command where needed.
9512b1

9512b1
   In case (a):
9512b1

9512b1
     sed 's/^[^,]*,[^,]*,[^,]*,[^,]*,/.../'
9512b1
             ^     ^     ^
9512b1
             |     |     |_ 3rd field
9512b1
             |     |_______ 2nd field
9512b1
             |_____________ 1st field
9512b1

9512b1
     # Unix script to delete the second field for case (a)
9512b1
     sed 's/^\([^,]*\),[^,]*,/\1,,/' file
9512b1

9512b1
     # Unix script to change field 1 to 9999 for case (a)
9512b1
     sed 's/^[^,]*,/9999,/' file
9512b1

9512b1
   In cases (b) and (c):
9512b1

9512b1
     sed 's/^"[^"]*","[^"]*","[^"]*","[^"]*",/.../'
9512b1
              1st--   2nd--   3rd--   4th--
9512b1

9512b1
     # Unix script to delete the second field for case (c)
9512b1
     sed 's/^\("[^"]*"\),"[^"]*",/\1,"",/' file
9512b1

9512b1
     # Unix script to change field 1 to 9999 for case (c)
9512b1
     sed 's/^"[^"]*",/"9999",/' file
9512b1

9512b1

9512b1
   In case (d):
9512b1

9512b1
   One way to parse such files is to replace the 3-character field
9512b1
   separator "," with an unused character like the tab or vertical
9512b1
   bar. (Technically, the field separator is only the comma while the
9512b1
   fields are surrounded by "double quotes", but the net _effect_ is
9512b1
   that fields are separated by quote-comma-quote, with quote
9512b1
   characters added to the beginning and end of each record.) Search
9512b1
   your datafile _first_ to make sure that your character appears
9512b1
   nowhere in it!
9512b1

9512b1
     sed -n '/|/p' file        # search for any instance of '|'
9512b1
     # if it's not found, we can use the '|' to separate fields
9512b1

9512b1
   Then replace the 3-character field separator and parse as before:
9512b1

9512b1
     # sed script to delete the second field for case (d)
9512b1
     s/","/|/g;                  # global change of "," to bar
9512b1
     s/^\([^|]*\)|[^|]*|/\1||/;  # delete 2nd field
9512b1
     s/|/","/g;                  # global change of bar back to ","
9512b1
     #---end of script---
9512b1

9512b1
     # sed script to change field 1 to 9999 for case (d)
9512b1
     # Remember to accommodate leading and trailing quote marks
9512b1
     s/","/|/g;
9512b1
     s/^[^|]*|/"9999|/;
9512b1
     s/|/","/g;
9512b1
     #---end of script---
9512b1

9512b1
   Note that this technique works only if _each_ and _every_ field is
9512b1
   surrounded with double quotes, including empty fields.
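
   Run as a one-liner on a sample record from case (d), the
   delete-second-field technique looks like this. The sketch writes
   "[^|]*" for the second field so that a field of any length is
   matched.

```shell
rec='"1007","Sue "Red" Smith","19 Main","Troy","MI","48055"'

out=$(printf '%s\n' "$rec" |
  sed 's/","/|/g; s/^\([^|]*\)|[^|]*|/\1||/; s/|/","/g')

printf '%s\n' "$out"    # "1007","","19 Main","Troy","MI","48055"
```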
9512b1

9512b1
   The following solution is for more complex examples of (d), such
9512b1
   as: not all fields contain "double-quote" marks, or the presence of
9512b1
   embedded "double-quote" marks within fields, or extraneous
9512b1
   whitespace around field delimiters. (Thanks to Greg Ubben for this
9512b1
   script!)
9512b1

9512b1
     # sed script to convert case (d) to bar-delimited records
9512b1
     s/^ *\(.*[^ ]\) *$/|\1|/;
9512b1
     s/" *, */"|/g;
9512b1
     : loop
9512b1
     s/| *\([^",|][^,|]*\) *, */|\1|/g;
9512b1
     s/| *, */||/g;
9512b1
     t loop
9512b1
     s/  *|/|/g;
9512b1
     s/|  */|/g;
9512b1
     s/^|\(.*\)|$/\1/;
9512b1
     #---end of script---
9512b1

9512b1
   For example, it turns this (which is badly-formed but legal):
9512b1

9512b1
   first,"",unquoted ,""this" is, quoted " ,, sub "quote" inside, f", lone  "   empty:
9512b1

9512b1
   into this:
9512b1

9512b1
   first|""|unquoted|""this" is, quoted "||sub "quote" inside|f"|lone  "   empty:
9512b1

9512b1
   Note that the script preserves the "double-quote" marks, but
9512b1
   changes only the commas where they are used as field separators. I
9512b1
   have used the vertical bar "|" because it's easier to read, but you
9512b1
   may change this to another field separator if you wish.
9512b1

9512b1
   If your CSV datafile is more complex, it would probably not be
9512b1
   worth the effort to write it in sed. For such a case, you should
9512b1
   use Perl with a dedicated CSV module (there are at least two recent
9512b1
   CSV parsers available from CPAN).
9512b1

9512b1
4.13. How do I handle fixed-length, columnar data?
9512b1

9512b1
   Sed handles fixed-length fields via \(grouping\) and backreferences
9512b1
   (\1, \2, \3 ...). If we have 3 fields of 10, 25, and 9 characters
9512b1
   per field, our sed script might look like so:
9512b1

9512b1
     s/^\(.\{10\}\)\(.\{25\}\)\(.\{9\}\)/\3\2\1/;  # Change the fields
9512b1
        ^^^^^^^^^^^~~~~~~~~~~~==========           #   from 1,2,3 to 3,2,1
9512b1
         field #1   field #2   field #3
9512b1

9512b1
   This is a bit hard to read. By using GNU sed or ssed with the -r
9512b1
   switch active, it can look like this:
9512b1

9512b1
     s/^(.{10})(.{25})(.{9})/\3\2\1/;          # Using the -r switch
9512b1

9512b1
   To delete a field in sed, use grouping and omit the backreference
9512b1
   from the field to be deleted. If the data is long or difficult to
9512b1
   work with, use ssed with the -R switch and the /x flag after an s///
9512b1
   command, to insert comments and remarks about the fields.
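
   The same grouping technique on a deliberately tiny record, with
   fields of 3, 4, and 2 characters (widths and data invented for the
   example):

```shell
rec='AAABBBBCC'

# Reverse the order of the three fields
swapped=$(printf '%s\n' "$rec" |
  sed 's/^\(.\{3\}\)\(.\{4\}\)\(.\{2\}\)/\3\2\1/')

# Delete field 2 by leaving it ungrouped and out of the replacement
dropped=$(printf '%s\n' "$rec" |
  sed 's/^\(.\{3\}\).\{4\}\(.\{2\}\)/\1\2/')

printf '%s\n%s\n' "$swapped" "$dropped"   # CCBBBBAAA, then AAACC
```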
9512b1

9512b1
   For records with many fields, use GNU awk with the FIELDWIDTHS
9512b1
   variable set in the top of the script. For example:
9512b1

9512b1
     awk 'BEGIN{FIELDWIDTHS = "10 25 9"}; {print $3 $2 $1}' file
9512b1

9512b1
   This is much easier to read than a similar sed script, especially
9512b1
   if there are more than 5 or 6 fields to manipulate.
9512b1

9512b1
4.14. How do I commify a string of numbers?
9512b1

9512b1
   Use the simplest script necessary to accomplish your task. As
9512b1
   variations of the line increase, the sed script must become more
9512b1
   complex to handle additional conditions. Whole numbers are
9512b1
   simplest, followed by decimal formats, followed by embedded words.
9512b1

9512b1
   Case 1: simple strings of whole numbers separated by spaces or
9512b1
   commas, with an optional negative sign. To convert this:
9512b1

9512b1
       4381, -1222333, and 70000: - 44555666 1234567890 words
9512b1
       56890  -234567, and 89222  -999777  345888777666 chars
9512b1

9512b1
   to this:
9512b1

9512b1
       4,381, -1,222,333, and 70,000: - 44,555,666 1,234,567,890 words
9512b1
       56,890  -234,567, and 89,222  -999,777  345,888,777,666 chars
9512b1

9512b1
   use one of these one-liners:
9512b1

9512b1
     sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'                      # GNU sed
9512b1
     sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'  # other seds
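
   The loop in the second one-liner inserts one comma per pass,
   working from the right, until no run of four digits remains. A
   small check with invented input:

```shell
out=$(printf '1234567 and 89\n' |
  sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta')

printf '%s\n' "$out"    # 1,234,567 and 89
```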
9512b1

9512b1
   Case 2: strings of numbers which may have an embedded decimal
9512b1
   point, separated by spaces or commas, with an optional negative
9512b1
   sign. To change this:
9512b1

9512b1
       4381,  -6555.1212 and 70000,  7.18281828  44906982.071902
9512b1
       56890   -2345.7778 and 8.0000:  -49000000 -1234567.89012
9512b1

9512b1
   to this:
9512b1

9512b1
       4,381,  -6,555.1212 and 70,000,  7.18281828  44,906,982.071902
9512b1
       56,890   -2,345.7778 and 8.0000:  -49,000,000 -1,234,567.89012
9512b1

9512b1
   use the following command for GNU sed:
9512b1

9512b1
     sed ':a;s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g;ta'
9512b1

9512b1
   and for other versions of sed:
9512b1

9512b1
     sed -f case2.sed files
9512b1

9512b1
     # case2.sed
9512b1
     s/^/ /;                 # add space to start of line
9512b1
     :a
9512b1
     s/\( [-0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/g
9512b1
     ta
9512b1
     s/ //;                  # remove space from start of line
9512b1
     #---end of script---
9512b1

9512b1
4.15. How do I prevent regex expansion on substitutions?
9512b1

9512b1
   Sometimes you want to *match* regular expression metacharacters as
9512b1
   literals (e.g., you want to match "[0-9]" or "\n"), to be replaced
9512b1
   with something else. The ordinary way to prevent expanding
9512b1
   metacharacters is to prefix them with a backslash. Thus, if "\n"
9512b1
   matches a newline, "\\n" will match the two-character string of
9512b1
   'backslash' followed by 'n'.
9512b1

9512b1
   But doing this repeatedly can become tedious if there are many
9512b1
   regexes. The following script will replace alternating strings of
9512b1
   literals, where no character is interpreted as a regex
9512b1
   metacharacter:
9512b1

9512b1
     # filename: sub_quote.sed
9512b1
     #   author: Paolo Bonzini
9512b1
     # sed script to add backslash to find/replace metacharacters
9512b1
     N;                  # add even numbered line to pattern space
9512b1
     s,[]/\\$*[],\\&,g;  # quote all of [, ], /, \, $, or *
9512b1
     s,^,s/,;            # prepend "s/" to front of pattern space
9512b1
     s,$,/,;             # append "/" to end of pattern space
9512b1
     s,\n,/,;            # change "\n" to "/", making s/from/to/
9512b1
     #---end of script---
9512b1

9512b1
   Here's a sample of how sub_quote.sed might be used. This example
9512b1
   converts typical sed regexes to perl-style regexes. The input file
9512b1
   consists of 10 lines:
9512b1

9512b1
       [0-9]
9512b1
       \d
9512b1
       [^0-9]
9512b1
       \D
9512b1
       \+
9512b1
       +
9512b1
       \?
9512b1
       ?
9512b1
       \|
9512b1
       |
9512b1

9512b1
   Run the command "sed -f sub_quote.sed input", to transform the
9512b1
   input file (above) to 5 lines of output:
9512b1

9512b1
       s/\[0-9\]/\\d/
9512b1
       s/\[^0-9\]/\\D/
9512b1
       s/\\+/+/
9512b1
       s/\\?/?/
9512b1
       s/\\|/|/
9512b1

9512b1
   The above file is itself a sed script, which can then be used to
9512b1
   modify other files.
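
   The two-step use (generate a script, then apply it) can be
   sketched in the shell as follows. The commands of sub_quote.sed are
   given inline, the find/replace pair is invented, and the temporary
   file comes from mktemp, which is assumed to be available.

```shell
script=$(mktemp)

# Lines 1 and 2 of the input are the literal "find" and "replace" strings.
printf '%s\n' '1*2' '1x2' |
  sed 'N;s,[]/\\$*[],\\&,g;s,^,s/,;s,$,/,;s,\n,/,' > "$script"

generated=$(cat "$script")                     # s/1\*2/1x2/
result=$(printf '1*2=2\n' | sed -f "$script")  # 1x2=2
rm -f "$script"

printf '%s\n%s\n' "$generated" "$result"
```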
9512b1

9512b1
4.16. How do I convert a string to all lowercase or capital letters?
9512b1

9512b1
   The easiest method is to use a new version of GNU sed, ssed, sedmod
9512b1
   or sed16 and employ the \U, \L, or other switches on the right side
9512b1
   of an s/// command. For example, to convert any word which begins
9512b1
   with "reg" or "exp" into solid capital letters:
9512b1

9512b1
       sed -r "s/\<(reg|exp)[a-z]+/\U&/g"              # gsed4.+ or ssed
9512b1
       sed "s/\
9512b1

9512b1
   As you can see, sedmod and sed16 do not support alternation (|),
9512b1
   but they do support case conversion. If none of these versions of
9512b1
   sed are available to you, some sample scripts for this task are
9512b1
   available from the Seder's Grab Bag:
9512b1

9512b1
       http://sed.sourceforge.net/grabbag/scripts
9512b1

9512b1
   Note that some case conversion scripts are listed under "Filename
9512b1
   manipulation" and others are under "Text formatting."
9512b1

9512b1
   CHANGING BLOCKS (consecutive lines)
9512b1

9512b1
4.20. How do I change only one section of a file?
9512b1

9512b1
   You can match a range of lines by line number, by regexes (say, all
9512b1
   lines between the words "from" and "until"), or by a combination of
9512b1
   the two. For multiple substitutions on the same range, put the
9512b1
   command(s) between braces {...}. For example:
9512b1

9512b1
     # replace only between lines 1 and 20
9512b1
     1,20 s/Johnson/White/g
9512b1

9512b1
     # replace everywhere EXCEPT between lines 1 and 20
9512b1
     1,20 !s/Johnson/White/g
9512b1

9512b1
     # replace only between words "from" and "until". Note the
9512b1
     # use of \<....\> as word boundary markers in GNU sed.
9512b1
     /from/,/until/ { s/\<red\>/magenta/g; s/\<blue\>/cyan/g; }
9512b1

9512b1
     # replace only from the words "ENDNOTES:" to the end of file
9512b1
     /ENDNOTES:/,$ { s/Schaff/Herzog/g; s/Kraft/Ebbing/g; }
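
   A regex range can be checked quickly on a few invented lines; only
   the "red" between the two markers should change:

```shell
out=$(printf 'red\nfrom\nred\nuntil\nred\n' |
  sed '/from/,/until/s/red/magenta/')

printf '%s\n' "$out"
```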
9512b1

9512b1
   For technical details on using address ranges, see section 3.3
9512b1
   ("Addressing and Address ranges").
9512b1

9512b1
4.21. How do I delete or change a block of text if the block contains
9512b1
      a certain regular expression?
9512b1

9512b1
   The following deletes the block between 'start' and 'end'
9512b1
   inclusively, if and only if the block contains the string
9512b1
   'regex'. Written by Russell Davies, with additional comments:
9512b1

9512b1
     # sed script to delete a block if /regex/ matches inside it
9512b1
     :t
9512b1
     /start/,/end/ {    # For each line between these block markers..
9512b1
        /end/!{         #   If we are not at the /end/ marker
9512b1
           $!{          #     nor the last line of the file,
9512b1
              N;        #     add the Next line to the pattern space
9512b1
              bt
9512b1
           }            #   and branch (loop back) to the :t label.
9512b1
        }               # This line matches the /end/ marker.
9512b1
        /regex/d;       # If /regex/ matches, delete the block.
9512b1
     }                  # Otherwise, the block will be printed.
9512b1
     #---end of script---
9512b1

9512b1
   Note: When the script above reaches /regex/, the entire multi-line
9512b1
   block is in the pattern space. To replace items inside the block,
9512b1
   use "s///". To change the entire block, use the 'c' (change)
9512b1
   command:
9512b1

9512b1
     /regex/c\
9512b1
     1: This will replace the entire block\
9512b1
     2: with these two lines of text.
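
   The following is a small sketch of the block-deleting script run
   against made-up input with two start/end blocks, only one of which
   contains "regex". It assumes a POSIX shell and a sed that
   re-evaluates the address range after the branch, as GNU sed does.

```shell
# The FAQ script, stored in a shell variable. Labels and braces sit on
# their own lines so that older seds can parse them too.
script=':t
/start/,/end/ {
  /end/!{
    $!{
      N
      bt
    }
  }
  /regex/d
}'

out=$(sed "$script" <<'EOF'
keep 1
start
alpha regex beta
end
keep 2
start
nothing here
end
keep 3
EOF
)

printf '%s\n' "$out"
```

   The first block is removed as a unit; the second block, which lacks
   the string, is printed unchanged.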
9512b1

9512b1
4.22. How do I locate a paragraph of text if the paragraph contains a
9512b1
      certain regular expression?
9512b1

9512b1
   Assume that paragraphs are separated by blank lines. For regexes
9512b1
   that are single terms, use one of the following scripts:
9512b1

9512b1
     sed -e '/./{H;$!d;}' -e 'x;/regex/!d'      # most seds
9512b1
     sed '/./{H;$!d;};x;/regex/!d'              # GNU sed
9512b1

9512b1
   To print paragraphs only if they contain 3 specific regular
9512b1
   expressions (RE1, RE2, and RE3), in any order in the paragraph:
9512b1

9512b1
     sed -e '/./{H;$!d;}' -e 'x;/RE1/!d;/RE2/!d;/RE3/!d'
9512b1

9512b1
   With this solution and the preceding one, if the paragraphs are
9512b1
   excessively long (more than 4k in length), you may overflow sed's
9512b1
   internal buffers. If using HHsed, you must add a "G;" command
9512b1
   immediately after the "x;" in the scripts above to defeat a bug
9512b1
   in HHsed (see section 7.9(5), below, for a description).
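
   To see what the paragraph search prints, here is a minimal sketch
   with three made-up paragraphs and the sample regex "cherry". Note
   that each printed paragraph is preceded by a blank line, because
   the 'H' command always stores a leading newline in the hold space.

```shell
# Print only the paragraphs that contain "cherry".
out=$(sed -e '/./{H;$!d;}' -e 'x;/cherry/!d' <<'EOF'
apple pie
is good

banana split
with cherry

cherry tart
EOF
)

printf '%s\n' "$out"
```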
9512b1

9512b1
4.23. How do I match a block of _specific_ consecutive lines?
9512b1

9512b1
   There are three ways to approach this problem:
9512b1

9512b1
       (1) Try to use a "/range/, /expression/"
9512b1
       (2) Try to use a "/multi-line\nexpression/"
9512b1
       (3) Try to use a block of "literal strings"
9512b1

9512b1
   We describe each approach in the following sections.
9512b1

9512b1
4.23.1.  Try to use a "/range/, /expression/"
9512b1

9512b1
   If the block of lines are strings that *never change their order*
9512b1
   and if the top line never occurs outside the block, like this:
9512b1

9512b1
       Abel
9512b1
       Baker
9512b1
       Charlie
9512b1
       Delta
9512b1

9512b1
   then these solutions will work for deleting the block:
9512b1

9512b1
     sed '/^Abel$/{N;N;N;d;}' files     # for blocks with few lines
9512b1
     sed '/^Abel$/, /^Zebra$/d' files   # for blocks with many lines
9512b1
     sed '/^Abel$/,+25d' files          # HHsed, sedmod, ssed, gsed 3.02.80
9512b1

9512b1
   To change the block, use the 'c' (change) command instead of 'd'.
9512b1
   To print that block only, use the -n switch and 'p' (print) instead
9512b1
   of 'd'. To change some things inside the block, try this:
9512b1

9512b1
     /^Abel$/,/^Delta$/ {
9512b1
         :ack
9512b1
         N;
9512b1
         /\nDelta$/! b ack
9512b1
         # At this point, all the lines in the block are collected
9512b1
         s/ubstitute /somethin/g;
9512b1
     }
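
   As an illustration, this sketch deletes the Abel-to-Delta block
   from a six-line sample in two ways: by pulling in a fixed number of
   lines with 'N', and by a regex range. The surrounding lines "Zulu"
   and "Echo" are invented for the example.

```shell
input='Zulu
Abel
Baker
Charlie
Delta
Echo'

# Form 1: on the top line, append the next three lines, then delete all four.
out1=$(printf '%s\n' "$input" | sed '/^Abel$/{N;N;N;d;}')

# Form 2: delete from the top line through the known bottom line.
out2=$(printf '%s\n' "$input" | sed '/^Abel$/,/^Delta$/d')

printf '%s\n' "$out1"
```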
9512b1

9512b1
4.23.2.  Try to use a "multi-line\nexpression"
9512b1

9512b1
   If the top line of the block sometimes appears alone or is
9512b1
   sometimes followed by other lines, or if a partial block may occur
9512b1
   somewhere in the file, a multi-line expression may be required.
9512b1

9512b1
   In these examples, we give solutions for matching an N-line block.
9512b1
   The expression "/^RE1\nRE2\nRE3...$/" represents a properly formed
9512b1
   regular expression where \n indicates a newline between lines. Note
9512b1
   that the 'N' followed by the 'P;D;' commands forms a "sliding
9512b1
   window" technique. A window of N lines is formed. If the multi-line
9512b1
   pattern matches, the block is handled. If not, the top line is
9512b1
   printed and then deleted from the pattern space, and we try to
9512b1
   match at the next line.
9512b1

9512b1
     # sed script to delete 2 consecutive lines: /^RE1\nRE2$/
9512b1
     $b
9512b1
     /^RE1$/ {
9512b1
       $!N
9512b1
       /^RE1\nRE2$/d
9512b1
       P;D
9512b1
     }
9512b1
     #---end of script---
9512b1

9512b1
     # sed script to delete 3 consecutive lines. (This script
9512b1
     # fails under GNU sed v2.05 and earlier because of the 't'
9512b1
     # bug when s///n is used; see section 7.5(1) of the FAQ.)
9512b1
     : more
9512b1
     $!N
9512b1
     s/\n/&/2;
9512b1
     t enough
9512b1
     $!b more
9512b1
     : enough
9512b1
     /^RE1\nRE2\nRE3$/d
9512b1
     P;D
9512b1
     #---end of script---
9512b1

9512b1
   For example, to delete a block of 5 consecutive lines, the previous
9512b1
   script must be altered in only two places:
9512b1

9512b1
   (1) Change the 2 in "s/\n/&/2;" to a 4 (the trailing semicolon is
9512b1
   needed to work around a bug in HHsed v1.5).
9512b1

9512b1
   (2) Change the regex line to "/^RE1\nRE2\nRE3\nRE4\nRE5$/d",
9512b1
   modifying the expression as needed.
9512b1

9512b1
   Suppose we want to delete a block of two blank lines followed by
9512b1
   the word "foo" followed by another blank line (4 lines in all).
9512b1
   Other blank lines and other instances of "foo" should be left
9512b1
   alone. After changing the '2' to a '3' (always one number less than
9512b1
   the total number of lines), the regex line would look like this:
9512b1
   "/^\n\nfoo\n$/d". (Thanks to Greg Ubben for this script.)
9512b1

9512b1
   As an alternative to work around the 't' bug in older versions of
9512b1
   GNU sed, the following script will delete 4 consecutive lines:
9512b1

9512b1
     # sed script to delete 4 consecutive lines. Use this if you
9512b1
     # require GNU sed 2.05 and below.
9512b1
     /^RE1$/!b
9512b1
     $!N
9512b1
     $!N
9512b1
     :a
9512b1
     $b
9512b1
     N
9512b1
     /^RE1\nRE2\nRE3\nRE4$/d
9512b1
     P
9512b1
     s/^.*\n\(.*\n.*\n.*\)$/\1/
9512b1
     ba
9512b1
     #---end of script---
9512b1

9512b1
   Its drawback is that it must be modified in 3 places instead of 2
9512b1
   to adapt it for more lines, and as additional lines are added, the
9512b1
   's' command is forced to work harder to match the regexes. On the
9512b1
   other hand, it avoids a bug with gsed-2.05 and illustrates another
9512b1
   way to solve the problem of deleting consecutive lines.
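
   The sliding window can be watched in action with a small sketch of
   the 2-line script, using "one" for RE1 and "two" for RE2. The input
   is invented. It assumes a sed whose 'D' command restarts the cycle
   without reading new input when a newline is present, as GNU sed
   does.

```shell
# Delete the pair "one" followed by "two"; leave a lone "one" untouched.
script='$b
/^one$/ {
  $!N
  /^one\ntwo$/d
  P;D
}'

out=$(printf 'one\none\ntwo\nthree\none\n' | sed "$script")

printf '%s\n' "$out"
```

   The first "one" is printed by 'P' when the window "one\none" fails
   to match; the window then slides down by one line and matches.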
9512b1

9512b1
4.23.3.  Try to use a block of "literal strings"
9512b1

9512b1
   If you need to match a static block of text (which may occur any
9512b1
   number of times throughout a file), where the contents of the block
9512b1
   are known in advance, then this script is easy to use. It requires
9512b1
   an intermediate file, which we will call "findrep.txt" (below):
9512b1

9512b1
       A block of several consecutive lines to
9512b1
       be matched literally should be placed on
9512b1
       top. Regular expressions like .*  or [a-z]
9512b1
       will lose their special meaning and be
9512b1
       interpreted literally in this block.
9512b1
       ----
9512b1
       Four hyphens separate the two sections. Put
9512b1
       the replacement text in the lower section.
9512b1
       As above, sed symbols like &, \n, or \1 will
9512b1
       lose their special meaning.
9512b1

9512b1
   This is a 3-step process. A generic script called "blockrep.sed"
9512b1
   will read "findrep.txt" (above) and generate a custom script, which
9512b1
   is then used on the actual input file. In other words,
9512b1
   "findrep.txt" is a simplified description of the editing that you
9512b1
   want to do on the block, and "blockrep.sed" turns it into actual
9512b1
   sed commands.
9512b1

9512b1
   Use this process from a Unix shell or from a DOS prompt:
9512b1

9512b1
     sed -nf blockrep.sed findrep.txt >custom.sed
9512b1
     sed -f custom.sed input.file >output.file
9512b1
     erase custom.sed
9512b1

9512b1
   The generic script "blockrep.sed" follows below. It's fairly long.
9512b1
   Examining its output might help you understand how to use the
9512b1
   _sliding window_ technique.
9512b1

9512b1
     # filename: blockrep.sed
9512b1
     #   author: Paolo Bonzini
9512b1
     # Requires:
9512b1
     #    (1) blocks to find and replace, e.g., findrep.txt
9512b1
     #    (2) an input file to be changed, input.file
9512b1
     #
9512b1
     # blockrep.sed creates a second sed script, custom.sed,
9512b1
     # to find the lines above the row of 4 hyphens, globally
9512b1
     # replacing them with the lower block of text. GNU sed
9512b1
     # is recommended but not required for this script.
9512b1
     #
9512b1
     # Loop on the first part, accumulating the `from' text
9512b1
     # into the hold space.
9512b1
     :a
9512b1
     /^----$/! {
9512b1
        # Escape slashes, backslashes, the final newline and
9512b1
        # regular expression metacharacters.
9512b1
        s,[/\[.*],\\&,g
9512b1
        s/$/\\/
9512b1
        H
9512b1
        #
9512b1
        # Append N cmds needed to maintain the sliding window.
9512b1
        x
9512b1
        1 s,^.,s/,
9512b1
        1! s/^/N\
9512b1
     /
9512b1
        x
9512b1
        n
9512b1
        ba
9512b1
     }
9512b1
     #
9512b1
     # Change the final backslash to a slash to separate the
9512b1
     # two sides of the s command.
9512b1
     x
9512b1
     s,\\$,/,
9512b1
     x
9512b1
     #
9512b1
     # Until EOF, gather the substitution into hold space.
9512b1
     :b
9512b1
     n
9512b1
     s,[/\],\\&,g
9512b1
     $! s/$/\\/
9512b1
     H
9512b1
     $! bb
9512b1
     #
9512b1
     # Start the RHS of the s command without a leading
9512b1
     # newline, add the P/D pair for the sliding window, and
9512b1
     # print the script.
9512b1
     g
9512b1
     s,/\n,/,
9512b1
     s,$,/\
9512b1
     P\
9512b1
     D,p
9512b1
     #---end of script---
9512b1

9512b1
4.24. How do I address all the lines between RE1 and RE2, excluding the
9512b1
      lines themselves?
9512b1

9512b1
   Normally, to address the lines between two regular expressions, RE1
9512b1
   and RE2, one would do this: '/RE1/,/RE2/{commands;}'. Excluding
9512b1
   those lines takes an extra step. To put 2 arrows before each line
9512b1
   between RE1 and RE2, except for those lines:
9512b1

9512b1
     sed '1,/RE1/!{ /RE2/,/RE1/!s/^/>>/; }' input.fil
9512b1

9512b1
   The preceding script, though short, may be difficult to follow. It
9512b1
   also requires that /RE1/ cannot occur on the first line of the
9512b1
   input file. The following script, though it's not a one-liner, is
9512b1
   easier to read and it permits /RE1/ to appear on the first line:
9512b1

9512b1
     # sed script to change all lines between /RE1/ and /RE2/,
9512b1
     # without matching /RE1/ or /RE2/
9512b1
     /RE1/,/RE2/{
9512b1
       /RE1/b
9512b1
       /RE2/b
9512b1
       s/^/>>/
9512b1
     }
9512b1
     #---end of script---
9512b1

9512b1
   Contents of input.fil:         Output of sed script:
9512b1
      aaa                           aaa
9512b1
      bbb                           bbb
9512b1
      RE1                           RE1
9512b1
      aaa                           >>aaa
9512b1
      bbb                           >>bbb
9512b1
      ccc                           >>ccc
9512b1
      RE2                           RE2
9512b1
      end                           end
9512b1

9512b1
4.25. How do I join two lines if line #1 ends in a [certain string]?
9512b1

9512b1
   This question appears in the section on one-line sed scripts, but
9512b1
   it comes up so many times that it needs a place here also. Suppose
9512b1
   a line ends with a particular string (often, a line ends with a
9512b1
   backslash). How do you bring up the second line after it, even in
9512b1
   cases where several consecutive lines all end in a backslash?
9512b1

9512b1
     sed -e :a -e '/\\$/N; s/\\\n//; ta' file   # all seds
9512b1
     sed ':a; /\\$/N; s/\\\n//; ta' file        # GNU sed, ssed, HHsed
9512b1

9512b1
   Note that this replaces the backslash-newline with nothing. You may
9512b1
   want to replace the backslash-newline with a single space instead.
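
   Here is a minimal sketch of the portable form, run on made-up input
   in which two consecutive lines end in a backslash.

```shell
# Join every line ending in a backslash with the line that follows it.
out=$(sed -e :a -e '/\\$/N; s/\\\n//; ta' <<'EOF'
first \
second \
third
fourth
EOF
)

printf '%s\n' "$out"
```

   The 't' command loops back only after a successful substitution, so
   a chain of continued lines collapses into one.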
9512b1

9512b1
4.26. How do I join two lines if line #2 begins in a [certain string]?
9512b1

9512b1
   The inverse situation is another FAQ. Suppose a line begins with a
9512b1
   particular string. How do you bring that line up to follow the
9512b1
   previous line? In this example, we want to match the string "<<="
9512b1
   at the beginning of one line, bring that line up to the end of the
9512b1
   line before it, and replace the string with a single space:
9512b1

9512b1
     sed -e :a -e '$!N;s/\n<<=/ /;ta' -e 'P;D' file   # all seds
9512b1
     sed ':a; $!N;s/\n<<=/ /;ta;P;D' file             # GNU, ssed, sed15+
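
   A short sketch of the portable form on invented input, where two
   consecutive lines begin with "<<=":

```shell
# Pull each line that starts with "<<=" up onto the previous line.
out=$(sed -e :a -e '$!N;s/\n<<=/ /;ta' -e 'P;D' <<'EOF'
alpha
<<=beta
<<=gamma
delta
EOF
)

printf '%s\n' "$out"
```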
9512b1

9512b1
4.27. How do I change all paragraphs to long lines?
9512b1

9512b1
   A frequent request is how to convert DOS-style textfiles, in which
9512b1
   each line ends with a "paragraph marker", to Microsoft-style
9512b1
   textfiles, in which the "paragraph" marker only appears at the end
9512b1
   of real paragraphs. Sometimes this question is framed as, "How do I
9512b1
   remove the hard returns at the end of each line in a paragraph?"
9512b1

9512b1
   The problem occurs because newer word processors don't work the
9512b1
   same way older text editors did. Older text editors used a newline
9512b1
   (CR/LF in DOS; LF alone in Unix) to end each line on screen or on
9512b1
   disk, and used two newlines to separate paragraphs. Certain word
9512b1
   processors wanted to make paragraph reformatting and reflowing work
9512b1
   easily, so they use one newline to end a paragraph and never allow
9512b1
   newlines _within_ a paragraph. This means that textfiles created
9512b1
   with standard editors (Emacs, vi, Vedit, Boxer, etc.) appear to
9512b1
   have "hard returns" at inappropriate places. The following sed
9512b1
   script finds blocks of consecutive nonblank lines (i.e., paragraphs
9512b1
   of text), and converts each block into one long line with one "hard
9512b1
   return" at the end.
9512b1

9512b1
     # sed script to change all paragraphs to long lines
9512b1
     /./{H; $!d;}             # Put each paragraph into hold space
9512b1
     x;                       # Swap hold space and pattern space
9512b1
     s/^\(\n\)\(..*\)$/\2\1/; # Move leading \n to end of PatSpace
9512b1
     s/\n\(.\)/ \1/g;         # Replace all other \n with 1 space
9512b1
     # Uncomment the following line to remove excess blank lines:
9512b1
     # /./!d;
9512b1
     #---end of sed script---
9512b1

9512b1
   If the input files have formatting or indentation that conveys
9512b1
   special meaning (like program source code), this script will remove
9512b1
   it. But if the text still needs to be extended, try 'par'
9512b1
   (paragraph reformatter) or the 'fmt' utility with the -t or -c
9512b1
   switches and the width option (-w) set to a number like 9999.
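
   As a rough illustration, the script can be tried on two made-up
   two-line paragraphs. The sketch assumes a sed in which "." matches
   an embedded newline in the pattern space, as it does in GNU sed.

```shell
script='/./{H; $!d;}
x
s/^\(\n\)\(..*\)$/\2\1/
s/\n\(.\)/ \1/g'

out=$(printf 'one\ntwo\n\nthree\nfour\n' | sed "$script")

printf '%s\n' "$out"
```

   Each paragraph comes out as one long line followed by a blank line.
   The shell strips the trailing blank line when it captures the
   output in $out.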
9512b1

9512b1
   SHELL AND ENVIRONMENT
9512b1

9512b1
4.30. How do I read environment variables with sed?
9512b1

9512b1
4.30.1. - on Unix platforms
9512b1

9512b1
   In Unix shells, references to variables begin with a dollar sign,
   such as $TERM, $PATH, $var or $i. In sed, the dollar sign is used to
9512b1
   indicate the last line of the input file, the end of a line (in the
9512b1
   LHS), or a literal symbol (in the RHS). Sed cannot access variables
9512b1
   directly, so one must pay attention to shell quoting requirements
9512b1
   to expand the variables properly.
9512b1

9512b1
   To ALLOW the Unix shell to interpret the dollar sign, put the
9512b1
   script in double quotes:
9512b1

9512b1
     sed "s/_terminal-type_/$TERM/g" input.file >output.file
9512b1

9512b1
   To PREVENT the Unix shell from interpreting the dollar sign as a
9512b1
   shell variable, put the script in single quotes:
9512b1

9512b1
     sed 's/.$//' infile >outfile
9512b1

9512b1
   To use BOTH Unix $environment_vars and sed /end-of-line$/ pattern
9512b1
   matching, there are two solutions. (1) The easiest is to enclose
9512b1
   the script in "double quotes" so the shell can see the $variables,
9512b1
   and to prefix the sed metacharacter ($) with a backslash. Thus, in
9512b1

9512b1
     sed "s/$user\$/root/" file
9512b1

9512b1
   the shell interpolates $user and sed interprets \$ as the symbol
9512b1
   for end-of-line.
9512b1

9512b1
   (2) Another method--somewhat less readable--is to concatenate the
9512b1
   script with 'single quotes' where the $ should not be interpolated
9512b1
   and "double quotes" where variable interpolation should occur. To
9512b1
   demonstrate using the preceding script:
9512b1

9512b1
     sed "s/$user"'$/root/' file
9512b1

9512b1
   Solution #1 seems easier to remember. In either case, we search for
9512b1
   the user's name (stored in a variable called $user) when it occurs
9512b1
   at the end of the line ($), and substitute the word "root" in all
9512b1
   matches.
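
   Both quoting styles can be compared side by side in a small sketch.
   The value "alice" and the sample lines are invented, and the sketch
   assumes that $user contains no slashes or regex metacharacters.

```shell
user=alice
input='alice owns alice
bob'

# Solution 1: double quotes, with a backslash protecting sed's "$".
out1=$(printf '%s\n' "$input" | sed "s/$user\$/root/")

# Solution 2: double quotes around the variable, single quotes around "$".
out2=$(printf '%s\n' "$input" | sed "s/$user"'$/root/')

printf '%s\n' "$out1"
```

   In both cases sed receives the script s/alice$/root/, so only the
   "alice" at the end of the line is replaced.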
9512b1

9512b1
   For longer shell scripts, it is sometimes useful to begin with
9512b1
   single quote marks ('), close them upon encountering the variable,
9512b1
   enclose the variable name in double quotes ("), and resume with
9512b1
   single quotes, closing them at the end of the sed script.  Example:
9512b1

9512b1
     #! /bin/sh
9512b1
     # sed script to illustrate 'quote'"matching"'usage'
9512b1
     FROM='abcdefgh'
9512b1
     TO='ABCDEFGH'
9512b1
     sed -e '
9512b1
     y/'"$FROM"'/'"$TO"'/;    # note the quote pairing
9512b1
     # some more commands go here . . .
9512b1
     # last line is a single quote mark
9512b1
     '
9512b1

9512b1
   Thus, each character in $FROM is replaced by the corresponding
   character in $TO, and the single quotes are used to glue the
   multiple lines together in the script. (See also section 4.32,
   "How do I handle Unix shell quoting in sed?")
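
   The same quote pairing also works on a single command line. This
   is a minimal sketch with an invented input line:

```shell
FROM='abcdefgh'
TO='ABCDEFGH'

# The shell joins 'y/' + "$FROM" + '/' + "$TO" + '/' into one argument,
# so sed receives y/abcdefgh/ABCDEFGH/.
out=$(printf 'a bad cafe\n' | sed 'y/'"$FROM"'/'"$TO"'/')

printf '%s\n' "$out"
```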
9512b1

9512b1
4.30.2. - on MS-DOS and 4DOS platforms
9512b1

9512b1
   Under 4DOS and MS-DOS version 7.0 (Win95) or 7.10 (Win95 OSR2),
9512b1
   environment variables can be accessed from the command prompt.
9512b1
   Under MS-DOS v6.22 and below, environment variables can only be
9512b1
   accessed from within batch files. Environment variables should be
9512b1
   enclosed between percent signs and are case-insensitive; i.e.,
9512b1
   %USER% or %user% will display the USER variable. To generate a true
9512b1
   percent sign, just enter it twice.
9512b1

9512b1
   DOS versions of sed require that sed scripts be enclosed by double
9512b1
   quote marks "..." (not single quotes!) if the script contains
9512b1
   embedded tabs, spaces, redirection arrows or the vertical bar. In
9512b1
   fact, if the input for sed comes from piping, a sed script should
9512b1
   not contain a vertical bar, even if it is protected by double
9512b1
   quotes (this seems to be a bug in the normal MS-DOS syntax). Thus,
9512b1

9512b1
       echo blurk | sed "s/^/ |foo /"     # will cause an error
9512b1
       sed "s/^/ |foo /" blurk.txt        # will work as expected
9512b1

9512b1
   Using DOS environment variables which contain DOS path statements
9512b1
   (such as a TMP variable set to "C:\TEMP") within sed scripts is
9512b1
   discouraged because sed will interpret the backslash '\' as a
9512b1
   metacharacter to "quote" the next character, not as a normal
9512b1
   symbol. Thus,
9512b1

9512b1
       sed "s/^/%TMP% /" somefile.txt
9512b1

9512b1
   will not prefix each line with (say) "C:\TEMP ", but will prefix
9512b1
   each line with "C:TEMP "; sed will discard the backslash, which is
9512b1
   probably not what you want. Other variables such as %PATH% and
9512b1
   %COMSPEC% will also lose the backslash within sed scripts.
9512b1

9512b1
   Environment variables which do not use backslashes are usually
9512b1
   workable. Thus, all the following should work without difficulty,
9512b1
   if they are invoked from within DOS batch files:
9512b1

9512b1
       sed "s/=username=/%USER%/g" somefile.txt
9512b1
       echo %FILENAME% | sed "s/\.TXT/.BAK/"
9512b1
       grep -Ei "%string%" somefile.txt | sed "s/^/  /"
9512b1

9512b1
   while from either the DOS prompt or from within a batch file,
9512b1

9512b1
       sed "s/%%/ percent/g" input.fil >output.fil
9512b1

9512b1
   will replace each percent symbol in a file with " percent" (adding
9512b1
   the leading space for readability).
9512b1

9512b1
4.31. How do I export or pass variables back into the environment?
9512b1

9512b1
4.31.1. - on Unix platforms
9512b1

9512b1
   Suppose that line #1, word #2 of the file 'terminals' contains a
9512b1
   value to be put in your TERM environment variable. Sed cannot
9512b1
   export variables directly to the shell, but it can pass strings to
9512b1
   shell commands. To set a variable in the Bourne shell:
9512b1

9512b1
       TERM=`sed 's/^[^ ][^ ]* \([^ ][^ ]*\).*/\1/;q' terminals`;
9512b1
       export TERM
9512b1

9512b1
   If the second word were "Wyse50", this would send the shell command
9512b1
   "TERM=Wyse50".
9512b1
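
   The extraction step can be tried without touching the real TERM
   variable. In this sketch the two input lines stand in for the
   'terminals' file, and the variable name term_value is made up for
   the illustration.

```shell
# Keep only word #2 of line #1, then quit before reading line #2.
term_value=$(printf 'vt100 Wyse50 ansi\nxterm linux\n' |
  sed 's/^[^ ][^ ]* \([^ ][^ ]*\).*/\1/;q')

printf '%s\n' "$term_value"
```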

9512b1
4.31.2. - on MS-DOS or 4DOS platforms
9512b1

9512b1
   Sed cannot directly manipulate the environment. Under DOS, only
9512b1
   batch files (.BAT) can do this, using the SET instruction, since
9512b1
   they are run directly by the command shell. Under 4DOS, special
9512b1
   4DOS commands (such as ESET) can also alter the environment.
9512b1

9512b1
   Under DOS or 4DOS, sed can select a word and pass it to the SET
9512b1
   command. Suppose you want the 1st word of the 2nd line of MY.DAT
9512b1
   put into an environment variable named %PHONE%. You might do this:
9512b1

9512b1
       @echo off
9512b1
       sed -n "2 s/^\([^ ][^ ]*\) .*/SET PHONE=\1/p;3q" MY.DAT > GO_.BAT
9512b1
       call GO_.BAT
9512b1
       echo The environment variable for PHONE is %PHONE%
9512b1
       :: cleanup
9512b1
       del GO_.BAT
9512b1

9512b1
   The sed script assumes that the first character on the 2nd line is
9512b1
   not a space and uses grouping \(...\) to save the first string of
9512b1
   non-space characters as \1 for the RHS. In writing any batch files,
9512b1
   make sure that output filenames such as GO_.BAT don't overwrite
9512b1
   preexisting files of the same name.
9512b1

9512b1
4.32. How do I handle Unix shell quoting in sed?
9512b1

9512b1
   To embed a literal single quote (') in a script, use (a) or (b):
9512b1

9512b1
   (a) If possible, put the script in double quotes:
9512b1

9512b1
     sed "s/cannot/can't/g" file
9512b1

9512b1
   (b) If the script must use single quotes, then close-single-quote
9512b1
   the script just before the SPECIAL single quote, prefix the single
9512b1
   quote with a backslash, and use a 2nd pair of single quotes to
9512b1
   finish marking the script. Thus:
9512b1

9512b1
     sed 's/cannot$/can'\''t/g' file
9512b1

9512b1
   Though this looks hard to read, it breaks down to 3 parts:
9512b1

9512b1
      's/cannot$/can'   \'   't/g'
9512b1
      ---------------   --   -----
9512b1

9512b1
   To embed a literal double quote (") in a script, use (a) or (b):
9512b1

9512b1
   (a) If possible, put the script in single quotes. You don't need to
9512b1
   prefix the double quotes with anything. Thus:
9512b1

9512b1
     sed 's/14"/fourteen inches/g' file
9512b1

9512b1
   (b) If the script must use double quotes, then prefix the SPECIAL
9512b1
   double quote with a backslash (\). Thus,
9512b1

9512b1
     sed "s/$length\"/$length inches/g" file
9512b1

9512b1
   To embed a literal backslash (\) into a script, enter it twice:
9512b1

9512b1
     sed 's/C:\\DOS/D:\\DOS/g' config.sys
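
   The three cases above can be verified with a short sketch. The
   input strings are invented; each command prints one line.

```shell
# Literal single quote inside a single-quoted script: '...'\''...'
a=$(printf 'I cannot\n' | sed 's/cannot$/can'\''t/')

# Literal double quote inside a single-quoted script needs no prefix.
b=$(printf '14" pipe\n' | sed 's/14"/fourteen inches/')

# Literal backslash: doubled on both sides of the s command.
c=$(printf 'C:\\DOS\n' | sed 's/C:\\DOS/D:\\DOS/')

printf '%s\n' "$a" "$b" "$c"
```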
9512b1

9512b1
   FILES, DIRECTORIES, AND PATHS
9512b1

9512b1
4.40. How do I read (insert/add) a file at the top of a textfile?
9512b1

9512b1
   Normally, adding a "header" file to the top of a "body" file is
9512b1
   done from the command prompt before passing the file on to sed.
9512b1
   (MS-DOS below version 6.0 must use COPY and DEL instead of MOVE in
9512b1
   the following example.)
9512b1

9512b1
       copy header.txt+body temp                  # MS-DOS command 1
9512b1
       echo Y | move temp body                    # MS-DOS command 2
9512b1
                                                    #
9512b1
       cat header.txt body >temp; mv temp body    # Unix commands
9512b1

9512b1
   However, if inserting the file must occur within sed, there is a
9512b1
   way. The sed command "1 r header.txt" will not work; it will print
9512b1
   line 1 and then insert "header.txt" between lines 1 and 2. The
9512b1
   following script solves this problem; however, there must be at
9512b1
   least 2 lines in the target file for the script to work properly.
9512b1

9512b1
     # sed script to insert "header.txt" above the first line
9512b1
     1{h; r header.txt
9512b1
       D; }
9512b1
     2{x; G; }
9512b1
     #---end of sed script---
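
   The following sketch tries the script on a three-line body file in
   a scratch directory. It assumes that 'mktemp -d' is available and
   that sed writes the queued 'r' file before it fetches the next
   input line, which is how GNU sed behaves.

```shell
dir=$(mktemp -d)
printf 'HEADER\n' > "$dir/header.txt"
printf 'line 1\nline 2\nline 3\n' > "$dir/body"

# The "r" filename runs to the end of its line, so the script needs
# real newlines. Run sed inside $dir so that "header.txt" resolves.
out=$(cd "$dir" && sed '1{h; r header.txt
D; }
2{x; G; }' body)

printf '%s\n' "$out"
rm -rf "$dir"
```

   Line 1 is saved in the hold space and deleted, the header is
   written, and line 2 brings line 1 back in front of itself.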
9512b1

9512b1
4.41. How do I make substitutions in every file in a directory, or in
9512b1
      a complete directory tree?
9512b1

9512b1
4.41.1. - ssed and Perl solution
9512b1

9512b1
   The best solution for multiple files in a single directory is to
9512b1
   use ssed or gsed v4.0 or higher:
9512b1

9512b1
     sed -i.BAK 's|foo|bar|g' files       # -i does in-place replacement
9512b1

9512b1
   If you don't have ssed, there is a similar solution in Perl. (Yes,
9512b1
   we know this is a FAQ file for sed, not perl, but perl is more
9512b1
   common than ssed for many users.)
9512b1

9512b1
     perl -pi.bak -e 's|foo|bar|g' files                # or
9512b1
     perl -pi.bak -e 's|foo|bar|g' `find /pathname -name "filespec"`
9512b1

9512b1
   For each file in the filelist, sed (or Perl) renames the source
9512b1
   file to "filename.bak"; the modified file gets the original
9512b1
   filename. Remove '.bak' if you don't need backup copies. (Note the
9512b1
   use of "s|||" instead of "s///" here, and in the scripts below. The
9512b1
   vertical bars in the 's' command let you replace '/some/path' with
9512b1
   '/another/path', accommodating slashes in the LHS and RHS.)
9512b1

9512b1
   To recurse directories in Unix or GNU/Linux:
9512b1

9512b1
     # We use xargs to prevent passing too many filenames to sed, but
9512b1
     # this command will fail if filenames contain spaces or newlines.
9512b1
     find /my/path -name '*.ht' -print | xargs sed -i.BAK 's|foo|bar|g'
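
   If filenames may contain spaces, one possible alternative (not part
   of the original recipe) is to let find run sed itself with
   "-exec ... {} +", which passes each filename as a separate
   argument. This sketch assumes a sed with the -i switch (ssed or
   gsed v4.0 or higher) and 'mktemp -d'; the directory and filename
   are made up.

```shell
dir=$(mktemp -d)
printf 'foo here\n' > "$dir/my page.htm"

# find hands the names to sed directly, so spaces are preserved.
find "$dir" -type f -name '*.htm' -exec sed -i.BAK 's|foo|bar|g' {} +

new=$(cat "$dir/my page.htm")
old=$(cat "$dir/my page.htm.BAK")
printf '%s\n%s\n' "$new" "$old"
rm -rf "$dir"
```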
9512b1

9512b1
   To recurse directories under Windows 2000 (CMD.EXE or COMMAND.COM):
9512b1

9512b1
     # This syntax isn't supported under Windows 9x COMMAND.COM
9512b1
     for /R c:\my\path %f in (*.htm) do sed -i.BAK "s|foo|bar|g" %f
9512b1

9512b1
4.41.2. - Unix solution
9512b1

9512b1
   For all files in a single directory, assuming they end with *.txt
9512b1
   and you have no files named "[anything].txt.bak" already, use a
9512b1
   shell script:
9512b1

9512b1
     #! /bin/sh
9512b1
     # Source files are saved as "filename.txt.bak" in case of error
9512b1
     # The '&&' after cp is an additional safety feature
9512b1
     for file in *.txt
9512b1
     do
9512b1
        cp "$file" "$file.bak" &&
        sed 's|foo|bar|g' "$file.bak" >"$file"
9512b1
     done
9512b1

9512b1
   To do an entire directory tree, use the Unix utility find, like so
9512b1
   (thanks to Jim Dennis <jadestar@rahul.net> for this script):
9512b1

9512b1
     #! /bin/sh
9512b1
     # filename: replaceall
9512b1
     # Backup files are NOT saved in this script.
9512b1
     find . -type f -name '*.txt' -print | while read i
9512b1
     do
9512b1
        sed 's|foo|bar|g' "$i" > "$i.tmp" && mv "$i.tmp" "$i"
9512b1
     done
9512b1

9512b1
   This previous shell script recurses through the directory tree,
9512b1
   finding only files in the directory (not symbolic links, which will
9512b1
   be encountered by the shell command "for file in *.txt", above). To
9512b1
   preserve file permissions and make backup copies, use the 2-line cp
9512b1
   routine of the earlier script instead of "sed ... && mv ...". By
9512b1
   replacing the sed command 's|foo|bar|g' with something like
9512b1

9512b1
     sed "s|$1|$2|g" "${i}.bak" > "$i"
9512b1

9512b1
   using double quotes instead of single quotes, the user can also
9512b1
   employ positional parameters on the shell script command tail, thus
9512b1
   reusing the script from time to time. For example,
9512b1

9512b1
       replaceall East West
9512b1

9512b1
   would modify all your *.txt files in the current directory.
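
   Here is a rough sketch of the parameterized idea, written as a
   shell function so it can be tried in a scratch directory. The
   function form and its third argument (the directory to search) are
   additions for this illustration, as are the sample files. It
   assumes 'mktemp -d' and that neither string contains a vertical bar
   or regex metacharacters.

```shell
# Hypothetical function form of the "replaceall" script.
# $1 = text to find, $2 = replacement, $3 = directory to search
replaceall() {
  find "$3" -type f -name '*.txt' -print | while IFS= read -r i
  do
    sed "s|$1|$2|g" "$i" > "$i.tmp" && mv "$i.tmp" "$i"
  done
}

dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'East wind\n' > "$dir/top.txt"
printf 'Far East\n' > "$dir/sub/deep file.txt"

replaceall East West "$dir"

top=$(cat "$dir/top.txt")
deep=$(cat "$dir/sub/deep file.txt")
printf '%s\n%s\n' "$top" "$deep"
rm -rf "$dir"
```

   Quoting "$i" lets the loop cope with spaces in filenames, although
   names containing newlines would still break it.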
9512b1

9512b1
4.41.3. - DOS solution:
9512b1

9512b1
   MS-DOS users should use two batch files like this:
9512b1

9512b1
      @echo off
9512b1
      :: MS-DOS filename: REPLACE.BAT
9512b1
      ::
9512b1
      :: Create a destination directory to put the new files.
9512b1
      :: Note: The next command will fail under Novell NetWare
9512b1
      :: below version 4.10 unless "SHOW DOTS=ON" is active.
9512b1
      if not exist .\NEWFILES\NUL mkdir NEWFILES
9512b1
      for %%f in (*.txt) do CALL REPL_2.BAT %%f
9512b1
      echo Done!!
9512b1
      :: ---End of first batch file---
9512b1

9512b1
      @echo off
9512b1
      :: MS-DOS filename: REPL_2.BAT
9512b1
      ::
9512b1
      sed "s/foo/bar/g" %1 > NEWFILES\%1
9512b1
      :: ---End of the second batch file---
9512b1

9512b1
   When finished, the current directory contains all the original
9512b1
   files, and the newly-created NEWFILES subdirectory contains the
9512b1
   modified *.TXT files. Do not attempt a command like
9512b1

9512b1
       for %%f in (*.txt) do sed "s/foo/bar/g" %%f >NEWFILES\%%f
9512b1

9512b1
   under any version of MS-DOS because the output filename will be
9512b1
   created as a literal '%f' in the NEWFILES directory before the
9512b1
   %%f is expanded to become each filename in (*.txt). This occurs
9512b1
   because MS-DOS creates output filenames via redirection commands
9512b1
   before it expands "for..in..do" variables.
9512b1

9512b1
   To recurse through an entire directory tree in MS-DOS requires a
9512b1
   batch file more complex than we have room to describe. Examine the
9512b1
   file SWEEP.BAT in Timo Salmi's great archive of batch tricks,
9512b1
   located at <ftp://garbo.uwasa.fi/pc/link/tsbat.zip> (this file is
9512b1
   regularly updated). Another alternative is to get an external
9512b1
   program designed for directory recursion. Here are some recommended
9512b1
   programs for directory recursion. The first one, FORALL, runs under
9512b1
   either OS/2 or DOS. Unfortunately, none of these supports Win9x
9512b1
   long filenames.
9512b1

9512b1
       http://hobbes.nmsu.edu/pub/os2/util/disk/forall72.zip
9512b1
       ftp://garbo.uwasa.fi/pc/filefind/target15.zip
9512b1

9512b1
4.42. How do I replace "/some/UNIX/path" in a substitution?
9512b1

9512b1
   Technically, the normal meaning of the slash can be disabled by
9512b1
   prefixing it with a backslash. Thus,
9512b1

9512b1
     sed 's/\/some\/UNIX\/path/\/a\/new\/path/g' files
9512b1

9512b1
   But this is hard to read and write. There is a better solution.
9512b1
   The s/// substitution command allows '/' to be replaced by any
9512b1
   other character (including spaces or alphanumerics). Thus,
9512b1

9512b1
     sed 's|/some/UNIX/path|/a/new/path|g' files
9512b1

9512b1
   and if you are using variable names in a Unix shell script,
9512b1

9512b1
     sed "s|$OLDPATH|$NEWPATH|g" oldfile >newfile
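
   A minimal sketch comparing the two styles. The paths are sample
   values, and the variable form assumes that neither path contains
   the vertical bar used as the delimiter.

```shell
OLDPATH=/usr/local/bin
NEWPATH=/opt/tools/bin
input='PATH=/usr/local/bin:/bin'

# Backslash-escaped slashes: correct, but hard to read.
out1=$(printf '%s\n' "$input" |
  sed 's/\/usr\/local\/bin/\/opt\/tools\/bin/g')

# Alternate delimiter: the slashes need no escaping.
out2=$(printf '%s\n' "$input" | sed "s|$OLDPATH|$NEWPATH|g")

printf '%s\n' "$out2"
```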
9512b1

9512b1
4.43. How do I replace "C:\SOME\DOS\PATH" in a substitution?
9512b1

9512b1
   For MS-DOS users, every backslash must be doubled. Thus, to replace
9512b1
   "C:\SOME\DOS\PATH" with "D:\MY\NEW\PATH":
9512b1

9512b1
     sed "s|C:\\SOME\\DOS\\PATH|D:\\MY\\NEW\\PATH|g" infile >outfile
9512b1

9512b1
   Remember that DOS pathnames are not case sensitive and can appear
9512b1
   in upper or lower case in the input file. If this concerns you, use
9512b1
   a version of sed which can ignore case when matching (gsed, ssed,
9512b1
   sedmod, sed16).
9512b1

9512b1
       @echo off
9512b1
       :: sample MS-DOS batch file to alter path statements
9512b1
       :: requires GNU sed with the /i flag for s///
9512b1
       set old=C:\\SOME\\DOS\\PATH
9512b1
       set new=D:\\MY\\NEW\\PATH
9512b1
       gsed "s|%old%|%new%|gi" infile >outfile
9512b1
       :: or
9512b1
       ::     sedmod -i "s|%old%|%new%|g" infile >outfile
9512b1
       set old=
9512b1
       set new=
9512b1

9512b1
   Also, remember that under Windows long filenames may be stored in
9512b1
   two formats: e.g., as "C:\Program Files" or as "C:\PROGRA~1".
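
   The backslash-doubling rule can also be checked from a Unix shell.
   The sketch below assumes a POSIX shell and puts the script in
   single quotes, so that the shell leaves the backslashes alone and
   sed reads each doubled backslash as one literal backslash. The
   sample input line is invented for the example.

```shell
# Single quotes: the shell passes every backslash through to sed.
moved=$(printf '%s\n' 'cd C:\SOME\DOS\PATH' |
        sed 's|C:\\SOME\\DOS\\PATH|D:\\MY\\NEW\\PATH|g')
printf '%s\n' "$moved"
```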
9512b1

9512b1
4.44.  How do I emulate file-includes, using sed?
9512b1

9512b1
   Given an input file with file-include statements, similar to
9512b1
   C-style includes or "server-side includes" (SSI) of this format:
9512b1

9512b1
       This is the source file. It's short.
       Its name is simply 'source'. See the script below.
       <!--#include file="ashe.inc"-->
              And this is any amount of text between
       <!--#include file="jesse.inc"-->
       This is the last line of the file.
9512b1

9512b1
   How do we direct sed to import/insert whichever files are at the
9512b1
   point of the 'file="filename"' token? First, use this file:
9512b1

9512b1
     #n
9512b1
     # filename: incl.sed
9512b1
     # Comments supported by GNU sed or ssed. Leading '#n' should
9512b1
     # be on line 1, columns 1-2 of the line.
9512b1
     /<!--#include file="/{
9512b1
       =;                     #   print the line number
9512b1
       s/^[^"]*"/{r /;        #   change pattern to '{r '
9512b1
       s/".*//p;              #   delete rest to EOL, print
9512b1
                              #   and a(ppend) a delete command
9512b1
       a\
9512b1
       d;}
9512b1
     }
9512b1
     #---end of sed script---
9512b1

9512b1
   Second, use the following shell script or DOS batch file (if
9512b1
   running a DOS batch file, use "double quotes" instead of 'single
9512b1
   quotes', and use "del" instead of "rm" to remove the temp file):
9512b1

9512b1
     sed -nf incl.sed source | sed 'N;N;s/\n//' >temp.sed
9512b1
     sed -f temp.sed source >target
9512b1
     rm temp.sed
9512b1

9512b1
   If you have GNU sed or ssed, you can reduce the script even further
9512b1
   (thanks to Michael Carmack for the reminder):
9512b1

9512b1
     sed -nf incl.sed source | sed 'N;N;s/\n//' | sed -f - source >target
9512b1

9512b1
   In brief, the script replaces each filename with a 'r filename'
9512b1
   command to insert the file at that point, while omitting the
9512b1
   extraneous material. Two important things to note with this script:
9512b1
   (1) There should be only one '#include file' directive per line, and
9512b1
   (2) each '#include file' directive must be the *only* thing on that
9512b1
   line, because everything else on the line will be deleted.
9512b1

9512b1
   Though the script uses GNU sed or ssed because of the great support
9512b1
   for embedded script comments, it should run on any version of sed.
9512b1
   If not, write me and let me know.
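
   The whole procedure can be tried in a scratch directory. This is
   only a sketch: the file name 'part.inc' and the sample text are
   invented for the example, and the sed script is written without
   comments or semicolons so that it does not depend on GNU sed.

```shell
# Scratch directory for the sample files (removed at the end).
dir=$(mktemp -d) || exit 1

printf '%s\n' 'first line' \
              '<!--#include file="part.inc"-->' \
              'last line' >"$dir/source"
printf '%s\n' 'text from part.inc' >"$dir/part.inc"

# Same logic as incl.sed: emit the line number, an '{r file' command,
# and a 'd;}' line for every include directive.
cat >"$dir/incl.sed" <<'EOF'
#n
/<!--#include file="/{
=
s/^[^"]*"/{r /
s/".*//p
a\
d;}
}
EOF

# Join each group of three lines into one command, then run the
# generated script against the source file.
out=$(cd "$dir" &&
      sed -nf incl.sed source | sed 'N;N;s/\n//' >temp.sed &&
      sed -f temp.sed source)
rm -r "$dir"
printf '%s\n' "$out"
```

   For this input the generated temp.sed holds the two lines
   '2{r part.inc' and 'd;}', so line 2 of the source is replaced by
   the contents of part.inc.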
9512b1

9512b1
------------------------------
9512b1

9512b1
5. WHY ISN'T THIS WORKING?
9512b1

9512b1
5.1. Why don't my variables like $var get expanded in my sed script?
9512b1

9512b1
   Because your sed script uses 'single quotes' instead of "double
9512b1
   quotes." Unix shells never expand $variables in single quotes.
9512b1

9512b1
   This is probably the most frequently-asked sed question. For more
9512b1
   info on using variables, see section 4.30.
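
   The quoting difference is easy to see with a small test. The
   variable name and its value below are arbitrary; the sketch assumes
   a Bourne-type shell.

```shell
var=world

# Single quotes: the shell passes $var to sed untouched.
single=$(echo hello | sed 's/$/, $var/')

# Double quotes: the shell expands $var. The regex '$' is escaped so
# that the shell leaves it alone.
double=$(echo hello | sed "s/\$/, $var/")

# Mixed quoting: only the variable sits inside double quotes.
mixed=$(echo hello | sed 's/$/, '"$var"'/')

printf '%s\n' "$single" "$double" "$mixed"
```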
9512b1

9512b1
5.2. I'm using 'p' to print, but I have duplicate lines sometimes.
9512b1

9512b1
   Sed prints the entire file by default, so the 'p' command might
9512b1
   cause the duplicate lines. If you want the whole file printed,
9512b1
   try removing the 'p' from commands like 's/foo/bar/p'. If you want
9512b1
   part of the file printed, run your sed script with the -n flag to
   suppress normal output, and rewrite the script to get all output
   from the 'p' command.
9512b1

9512b1
   If you're still getting duplicate lines, you are probably finding
9512b1
   several matches for the same line. Suppose you want to print lines
9512b1
   with the words "Peter" or "James" or "John", but not the same line
9512b1
   twice. The following command will fail:
9512b1

9512b1
     sed -n '/Peter/p; /James/p; /John/p' files
9512b1

9512b1
   Since all 3 commands of the script are executed for each line,
9512b1
   you'll get extra lines. A better way is to use the 'd' (delete) or
9512b1
   'b' (branch) commands, like so (with GNU sed):
9512b1

9512b1
     sed '/Peter/b; /James/b; /John/b; d' files          # one way
9512b1
     sed -n '/Peter/{p;d;};/James/{p;d;};/John/p' files  # a 2nd way
9512b1
     sed -n '/Peter/{p;b;};/James/{p;b;};/John/p' files  # a 3rd way
9512b1
     sed '/Peter\|James\|John/!d' files                  # shortest way
9512b1

9512b1
   On standard seds, these must be broken down with -e commands:
9512b1

9512b1
     sed -e '/Peter/b' -e '/James/b' -e '/John/b' -e d files
9512b1
     sed -n -e '/Peter/{p;d;}' -e '/James/{p;d;}' -e '/John/p' files
9512b1

9512b1
   The 3rd line would require too many -e commands to fit on one line,
9512b1
   since standard versions of sed require an -e command after each 'b'
9512b1
   and also after each closing brace '}'.
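
   The effect can be reproduced on a three-line sample. The input is
   invented for this sketch; the second command is the portable
   'b' (branch) form.

```shell
input='Peter and John
James
nobody'

# Every matching command prints, so a line with two names appears twice.
dup=$(printf '%s\n' "$input" | sed -n '/Peter/p; /James/p; /John/p')

# Branch to the end of the script on the first match; delete the rest.
once=$(printf '%s\n' "$input" |
       sed -e '/Peter/b' -e '/James/b' -e '/John/b' -e d)

printf '%s\n' "--- duplicates:" "$dup" "--- each line once:" "$once"
```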
9512b1

9512b1
5.3. Why does my DOS version of sed process a file part-way through
9512b1
     and then quit?
9512b1

9512b1
   First, look for errors in the script. Have you used the -n switch
9512b1
   without telling sed to print anything to the console? Have you read
9512b1
   the docs to your version of sed to see if it has a syntax you may
9512b1
   have misused? (Look for an N or H command that gathers too much.)
9512b1

9512b1
   Next, if you are sure your sed script is valid, a probable cause is
9512b1
   an end-of-file marker embedded in the file. An EOF marker (SUB) is
9512b1
   a Control-Z character, with the value of 1A hex (26 decimal). As
9512b1
   soon as any DOS version of sed encounters a Ctrl-Z character, sed
9512b1
   stops processing.
9512b1

9512b1
   To locate the EOF character, use Vern Buerg's shareware file viewer
9512b1
   LIST.COM <http://www.buerg.com/list.html>. In text mode, look for a
9512b1
   right-arrow symbol; in hex mode (Alt-H), look for a 1A code. With
9512b1
   Unix utilities ported to DOS, use 'od' (octal dump) to display
9512b1
   hexcodes in your file, and then use sed to locate the offending
9512b1
   character:
9512b1

9512b1
       od -txC badfile.txt | sed -n "/ 1a /p; / 1a$/p"
9512b1

9512b1
   Then edit the input file to remove the offending character(s).
9512b1

9512b1
   If you would rather NOT edit the input file, there is still a fix.
9512b1
   It requires the DJGPP 32-bit port of 'tr', the Unix translate
9512b1
   program (v1.22 or higher). GNU od and tr are currently at v2.0 (for
9512b1
   DOS); they are packaged with the GNU text utilities, available at
9512b1

9512b1
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/txt20b.zip
9512b1
       http://www.simtel.net/gnudlpage.php?product=/gnu/djgpp/v2gnu/txt20b.zip&name=txt20b.zip
9512b1

9512b1
   It is important to get the DJGPP version of 'tr' because other
9512b1
   versions ported to DOS will stop processing when they encounter the
9512b1
   EOF character. Use the -d (delete) command:
9512b1

9512b1
       tr -d \32 < badfile.txt | sed -f myscript.sed
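
   On a Unix system the same check and repair can be sketched as
   follows. The sample data is invented, and 'od -An -tx1' is used as
   the portable way to get a hex dump. Note that under a Unix shell
   the octal escape has to be quoted; an unquoted \32 loses its
   backslash and tr would delete the digits 3 and 2 instead.

```shell
# Build a sample stream with an embedded Ctrl-Z (octal 032, hex 1A).
sample() { printf 'one\ntwo\032three\n'; }

# Locate it: count the od output lines that contain a 1a byte.
hits=$(sample | od -An -tx1 | grep -c '1a')

# Remove it. The escape is quoted so that tr, not the shell,
# interprets the backslash.
clean=$(sample | tr -d '\032')

echo "dump lines containing 1a: $hits"
printf '%s\n' "$clean"
```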
9512b1

9512b1
5.4. My RE isn't matching/deleting what I want it to. (Or, "Greedy vs.
9512b1
     stingy pattern matching")
9512b1

9512b1
   The two most common causes for this problem are: (1) misusing the
9512b1
   '.' metacharacter, and (2) misusing the '*' metacharacter. The RE
9512b1
   '.*' is designed to be "greedy" (i.e., matching as many characters
9512b1
   as possible). However, sometimes users need an expression which is
9512b1
   "stingy," matching the shortest possible string.
9512b1

9512b1
   (1) On single-line patterns, the '.' metacharacter matches any
9512b1
   single character on the line. ('.' cannot match the newline at the
9512b1
   end of the line because the newline is removed when the line is put
9512b1
   into the pattern space; sed adds a newline automatically when the
9512b1
   pattern space is printed.) On multi-line patterns obtained with the
9512b1
   'N' or 'G' commands, '.' _will_ match a newline in the middle of the
9512b1
   pattern space. If there are 3 lines in the pattern space, "s/.*//"
9512b1
   will delete all 3 lines, not just the first one (leaving 1 blank
9512b1
   line, since the trailing newline is added to the output).
9512b1

9512b1
   Normal misuse of '.' occurs in trying to match a word or bounded
9512b1
   field, and forgetting that '.' will also cross the field limits.
9512b1
   Suppose you want to delete the first word in braces:
9512b1

9512b1
       echo {one} {two} {three} | sed 's/{.*}/{}/'       # fails
9512b1
       echo {one} {two} {three} | sed 's/{[^}]*}/{}/'    # succeeds
9512b1

9512b1
   's/{.*}/{}/' is not the solution, since the regex '.' will match
9512b1
   any character, including the close braces. Replace the '.' with
9512b1
   '[^}]', which signifies a negated character set '[^...]' containing
9512b1
   anything other than a right brace. FWIW, we know that 's/{one}/{}/'
9512b1
   would also solve our question, but we're trying to illustrate the
9512b1
   use of the negated character set: [^anything-but-this].
9512b1

9512b1
   A negated character set should be used for matching words between
9512b1
   quote marks, for fields separated by commas, and so on. See also
9512b1
   section 4.12 ("How do I parse a comma-delimited data file?").
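
   A few more cases of the negated character set, as a sketch. The
   input strings are invented; the last command picks the second
   comma-separated field.

```shell
# '.*' runs to the LAST closing brace; '[^}]*' stops at the first one.
greedy=$(echo '{one} {two} {three}' | sed 's/{.*}/{}/')
stingy=$(echo '{one} {two} {three}' | sed 's/{[^}]*}/{}/')

# Empty the first quoted string only.
quoted=$(echo 'say "hi" and "bye"' | sed 's/"[^"]*"/""/')

# Keep only the second comma-separated field.
field2=$(echo 'a,b,c' | sed 's/^[^,]*,\([^,]*\).*/\1/')

printf '%s\n' "$greedy" "$stingy" "$quoted" "$field2"
```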
9512b1

9512b1
   (2) The '*' metacharacter represents zero or more instances of the
9512b1
   previous expression. The '*' metacharacter looks for the leftmost
9512b1
   possible match first and will match zero characters. Thus,
9512b1

9512b1
       echo foo | sed 's/o*/EEE/'
9512b1

9512b1
   will generate 'EEEfoo', not 'fEEE' as one might expect. This is
9512b1
   because /o*/ matches the null string at the beginning of the word.
9512b1

9512b1
   After finding the leftmost possible match, the '*' is GREEDY; it
9512b1
   always tries to match the longest possible string. When two or
9512b1
   three instances of '.*' occur in the same RE, the leftmost instance
9512b1
   will grab the most characters. Consider this example, which uses
9512b1
   grouping '\(...\)' to save patterns:
9512b1

9512b1
       echo bar bat bay bet bit | sed 's/^.*\(b.*\)/\1/'
9512b1

9512b1
   What will be displayed is 'bit', never anything longer, because the
9512b1
   leftmost '.*' took the longest possible match. Remember this rule:
9512b1
   "leftmost match, longest possible string, zero also matches."
9512b1
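
   The rule can be checked with a short sketch. The second and fourth
   commands are possible rewrites (not the only ones) that avoid the
   surprise: 'oo*' demands at least one 'o', and '[^ ]*' cannot run
   past the first space.

```shell
# 'o*' matches the empty string at the start of the line.
a=$(echo foo | sed 's/o*/EEE/')

# 'oo*' needs at least one 'o', so the match starts at the first 'o'.
b=$(echo foo | sed 's/oo*/EEE/')

# The leftmost '.*' takes all it can, leaving only the last 'b' word.
c=$(echo 'bar bat bay bet bit' | sed 's/^.*\(b.*\)/\1/')

# Limiting the first part to one word keeps the rest of the line.
d=$(echo 'bar bat bay bet bit' | sed 's/^[^ ]* \(b.*\)/\1/')

printf '%s\n' "$a" "$b" "$c" "$d"
```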

9512b1
5.5. What is CSDPMI*B.ZIP and why do I need it?
9512b1

9512b1
   If you use MS-DOS outside of Windows and try to use GNU sed v1.18
9512b1
   or 3.02, you may encounter the following error message:
9512b1

9512b1
       no DPMI - Get csdpmi*b.zip
9512b1

9512b1
   "DPMI" stands for DOS Protected Mode Interface; it's basically a
9512b1
   means of running DOS in Protected Mode (as opposed to Real Mode),
9512b1
   which allows programs to share resources in extended memory without
9512b1
   conflicting with one another. Running HIMEM.SYS and EMM386.EXE is
9512b1
   not enough. The "CSDPMI*B.ZIP" refers to files written by Charles
9512b1
   Sandmann to provide DPMI services for 32-bit computers (i.e.,
9512b1
   386SX, 386DX, 486SX, etc.). Download the binary file (the source
9512b1
   code is also available):
9512b1

9512b1
       http://www.delorie.com/djgpp/dl/ofc/simtel/v2misc/csdpmi5b.zip  # binaries
9512b1
       http://www.delorie.com/djgpp/dl/ofc/simtel/v2misc/csdpmi5s.zip  # source
9512b1
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2misc/csdpmi5b.zip # binaries
9512b1
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2misc/csdpmi5s.zip # source
9512b1

9512b1
   and extract CWSDPMI.EXE, CWSDPR0.EXE and CWSPARAM.EXE from the ZIP
9512b1
   file. Put all 3 CWS*.EXE files in the same directory as GSED.EXE
9512b1
   and you're all set. There are DOC files enclosed, but they're
9512b1
   nearly incomprehensible for the average computer user. (Another
9512b1
   case of user-vicious documentation.)
9512b1

9512b1
   If you're running Windows and you normally use a DOS session to run
9512b1
   GNU sed (i.e., you get to a DOS prompt with a resizable window or
9512b1
   you press Alt-Enter to switch to full-screen mode), you don't need
9512b1
   the CWS*.EXE files at all, since Windows uses DPMI already.
9512b1

9512b1
5.6. Where are the man pages for GNU sed?
9512b1

9512b1
   Prior to GNU sed v3.02, there weren't any. Until recently, man
9512b1
   pages distributed with gsed were borrowed from old sources or from
9512b1
   other compilations. None of them were "official." GNU sed v3.02 had
9512b1
   the first real set of official man pages, and the documentation has
9512b1
   greatly improved with GNU sed version 4.0, which now includes both
9512b1
   man pages and texinfo pages.
9512b1

9512b1
5.7. How do I tell what version of sed I am using?
9512b1

9512b1
   Try entering "sed" all by itself on the command line, followed by
9512b1
   no arguments or parameters.  Also, try "sed --version".  In a
9512b1
   pinch, you can also try this:
9512b1

9512b1
       strings sed | grep -i ver
9512b1

9512b1
   Your version of 'strings' must be a version of the Unix utility of
9512b1
   this name. It should not be the DOS utility STRINGS.COM by Douglas
9512b1
   Boling.
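
   In a shell script, the "--version" test can be wrapped so that a
   sed which rejects the switch does not abort the script. This is a
   sketch; the fallback message is an arbitrary placeholder.

```shell
if sed --version >/dev/null 2>&1; then
  # Keep only the first line of the version banner.
  version=$(sed --version 2>/dev/null | sed 1q)
else
  version='unknown (this sed rejects --version)'
fi
printf '%s\n' "$version"
```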
9512b1

9512b1
5.8. Does sed issue an exit code?
9512b1

9512b1
   Most versions of sed do not, but check the documentation that came
9512b1
   with whichever version you are using. GNU sed issues an exit code
9512b1
   of 0 if the program terminated normally, 1 if there were errors in
9512b1
   the script, and 2 if there were errors during script execution.
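
   Where sed does set an exit code, a shell script can test it as in
   the sketch below. It assumes only that a valid script returns 0
   and an invalid one returns something else; the exact nonzero
   values differ between versions. The letter 'k' is used as the
   invalid script because it is not a sed command.

```shell
# A valid script: the exit code is expected to be 0.
if echo x | sed -n p >/dev/null 2>&1; then ok=0; else ok=$?; fi

# An invalid script: the exit code is expected to be nonzero.
if echo x | sed k >/dev/null 2>&1; then bad=0; else bad=$?; fi

echo "valid script: $ok, invalid script: $bad"
```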
9512b1

9512b1
5.9. The 'r' command isn't inserting the file into the text.
9512b1

9512b1
   On most versions of sed (but not all), the 'r' (read) and 'w'
9512b1
   (write) commands must be followed by exactly one space, then the
9512b1
   filename, and then terminated by a newline. Any additional
9512b1
   characters before or after the filename are interpreted as *part*
9512b1
   of the filename. Thus
9512b1

9512b1
       /RE/r  insert.me
9512b1

9512b1
   would try to locate a file called ' insert.me' (note the leading
   space!). If the file is not found, most versions of sed say
   nothing, not even an error message.
9512b1

9512b1
   When sed scripts are used on the command line, every 'r' and 'w'
9512b1
   must be the last command in that part of the script. Thus,
9512b1

9512b1
       sed -e '/regex/{r insert.file;d;}' source         # will fail
9512b1
       sed -e '/regex/{r insert.file' -e 'd;}' source    # will succeed
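
   The working form can be tried in a scratch directory, as in this
   sketch. The contents of insert.file and the three input lines are
   invented for the example.

```shell
dir=$(mktemp -d) || exit 1
printf '%s\n' 'inserted text' >"$dir/insert.file"

# The 'r' command ends its -e expression, so the file name stops at
# 'insert.file'. The matching line is deleted and the file is
# written in its place.
out=$(cd "$dir" &&
      printf '%s\n' before regex after |
      sed -e '/regex/{r insert.file' -e 'd;}')
rm -r "$dir"
printf '%s\n' "$out"
```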
9512b1

9512b1
5.10. Why can't I match or delete a newline using the \n escape sequence?
9512b1
      Why can't I match 2 or more lines using \n?
9512b1

9512b1
   The \n will never match the newline at the end-of-line because the
9512b1
   newline is always stripped off before the line is placed into the
9512b1
   pattern space. To get 2 or more lines into the pattern space, use
9512b1
   the 'N' command or something similar (such as 'H;...;g;').
9512b1

9512b1
   Sed works like this: sed reads one line at a time, chops off the
9512b1
   terminating newline, puts what is left into the pattern space where
9512b1
   the sed script can address or change it, and when the pattern space
9512b1
   is printed, appends a newline to stdout (or to a file). If the
9512b1
   pattern space is entirely or partially deleted with 'd' or 'D', the
9512b1
   newline is *not* added in such cases. Thus, scripts like
9512b1

9512b1
       sed 's/\n//' file       # to delete newlines from each line
9512b1
       sed 's/\n/foo\n/' file  # to add a word to the end of each line
9512b1

9512b1
   will _never_ work, because the trailing newline is removed _before_
9512b1
   the line is put into the pattern space. To perform the above tasks,
9512b1
   use one of these scripts instead:
9512b1

9512b1
       tr -d '\n' < file              # use tr to delete newlines
9512b1
       sed ':a;N;$!ba;s/\n//g' file   # GNU sed to delete newlines
9512b1
       sed 's/$/ foo/' file           # add "foo" to end of each line
9512b1

9512b1
   Since versions of sed other than GNU sed have limits to the size of
9512b1
   the pattern buffer, the Unix 'tr' utility is to be preferred here.
9512b1
   If the last line of the file contains a newline, GNU sed will add
9512b1
   that newline to the output but delete all others, whereas tr will
9512b1
   delete all newlines.
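
   These points can be verified with a short sketch. The sed loop is
   written with separate -e expressions so that it does not depend on
   GNU sed; the sample input is invented.

```shell
# No effect: the pattern space never holds the trailing newline.
noop=$(printf 'a\nb\n' | sed 's/\n//')

# tr deletes every newline.
joined=$(printf 'a\nb\nc\n' | tr -d '\n')

# sed: gather all lines into the pattern space, then delete the
# embedded newlines.
viased=$(printf 'a\nb\nc\n' |
         sed -e ':a' -e 'N' -e '$!ba' -e 's/\n//g')

# Add text at the end of each line by matching '$', not '\n'.
suffixed=$(printf 'a\nb\n' | sed 's/$/ foo/')

printf '%s\n' "$noop" "$joined" "$viased" "$suffixed"
```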
9512b1

9512b1
   To match a block of two or more lines, there are 3 basic choices:
9512b1
   (1) use the 'N' command to add the Next line to the pattern space;
9512b1
   (2) use the 'H' command at least twice to append the current line
9512b1
   to the Hold space, and then retrieve the lines from the hold space
9512b1
   with x, g, or G; or (3) use address ranges (see section 3.3, above)
9512b1
   to match lines between two specified addresses.
9512b1

9512b1
   Choices (1) and (2) will put an \n into the pattern space, where it
9512b1
   can be addressed as desired ('s/ABC\nXYZ/alphabet/g'). One example
9512b1
   of using 'N' to delete a block of lines appears in section 4.13
9512b1
   ("How do I delete a block of _specific_ consecutive lines?"). This
9512b1
   example can be modified by changing the delete command to something
9512b1
   else, like 'p' (print), 'i' (insert), 'c' (change), 'a' (append),
9512b1
   or 's' (substitute).
9512b1

9512b1
   Choice (3) will not put an \n into the pattern space, but it _does_
9512b1
   match a block of consecutive lines, so it may be that you don't
9512b1
   even need the \n to find what you're looking for. Since several
9512b1
   versions of sed support this syntax:
9512b1

9512b1
       sed '/start/,+4d'  # to delete "start" plus the next 4 lines,
9512b1

9512b1
   in addition to the traditional '/from here/,/to there/{...}' range
9512b1
   addresses, it may be possible to avoid the use of \n entirely.
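
   As a sketch of choices (1) and (3), the first command below uses
   'N' to bring two lines together so that \n can be matched, and the
   second uses a plain address range. The sample input and the words
   "start" and "stop" are invented for the example.

```shell
# Choice (1): a sliding two-line window; \n is matched explicitly.
merged=$(printf 'ABC\nXYZ\nend\n' |
         sed '$!N;s/ABC\nXYZ/alphabet/;P;D')

# Choice (3): an address range deletes the block without any \n.
ranged=$(printf 'a\nstart\nb\nstop\nc\n' | sed '/start/,/stop/d')

printf '%s\n' "$merged" "--" "$ranged"
```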
9512b1

9512b1
5.11. My script aborts with an error message, "event not found".
9512b1

9512b1
   This error is generated by the csh or tcsh shells, not by sed. The
9512b1
   exclamation mark (!) is special to csh/tcsh, and if you use it in
9512b1
   command-line or shell scripts--even within single quotes--it must
9512b1
   be preceded by a backslash. Thus, under the csh/tcsh shell:
9512b1

9512b1
       sed '/regex/!d'      # will fail
9512b1
       sed '/regex/\!d'     # will succeed
9512b1

9512b1
   The exclamation mark should not be prefixed with a backslash when
9512b1
   the script is called from a file, as "-f script.file".
9512b1

9512b1
------------------------------
9512b1

9512b1
6. OTHER ISSUES
9512b1

9512b1
6.1. I have a certain problem that stumps me. Where can I get help?
9512b1

9512b1
   Post your question on the "sed-users" mailing list (section 2.3.2),
9512b1
   where many sed users will be able to see your question. You will have
9512b1
   to subscribe to have posting privileges.
9512b1

9512b1
   Your other alternative is one of these newsgroups:
9512b1

9512b1
      - alt.comp.editors.batch
9512b1
      - comp.editors
9512b1
      - comp.unix.questions
9512b1
      - comp.unix.shell
9512b1

9512b1
6.2. How does sed compare with awk, perl, and other utilities?
9512b1

9512b1
   Awk is a much richer language with many features of a programming
9512b1
   language, including variable names, math functions, arrays, system
9512b1
   calls, etc. Its command structure is similar to sed:
9512b1

9512b1
      address { command(s) }
9512b1

9512b1
   which means that for each line or range of lines that matches the
9512b1
   address, execute the command(s). In both sed and awk, an address
9512b1
   can be a line number or a RE somewhere on the line, or both.
9512b1

9512b1
   In program size, awk is 3-10 times larger than sed. Awk has most of
9512b1
   the functions of sed, but not all. Notably, sed supports
9512b1
   backreferences (\1, \2, ...) to previous expressions, and awk does
9512b1
   not have any comparable syntax. (One exception: GNU awk v3.0
9512b1
   introduced gensub(), which supports backreferences only on
9512b1
   substitutions.)
9512b1

9512b1
   Perl is a general-purpose programming language, with many features
9512b1
   beyond text processing and interprocess communication, taking it
9512b1
   well past awk or other scripting languages. Perl supports every
9512b1
   feature sed does and has its own set of extended regular
9512b1
   expressions, which give it extensive power in pattern matching and
9512b1
   processing. (Note: the standard perl distribution comes with 's2p',
9512b1
   a sed-to-perl conversion script. See section 3.6 for more info.)
9512b1
   Like sed and awk, perl scripts do not need to be compiled into
9512b1
   binary code. Like sed, perl can also run many useful "one-liners"
9512b1
   from the command line, though with greater flexibility; see
9512b1
   question 4.41 ("How do I make substitutions in every file in a
9512b1
   directory, or in a complete directory tree?").
9512b1

9512b1
   On the other hand, the current version of perl is from 8 to 35
9512b1
   times larger than sed in its executables alone (perl's library
9512b1
   modules and allied files not included!). Further, for most simple
9512b1
   tasks such as substitution, sed executes more quickly than either
9512b1
   perl or awk. All these utilities serve to process input text,
9512b1
   transforming it to meet our needs . . . or our arbitrary whims.
9512b1

9512b1
6.3. When should I use sed?
9512b1

9512b1
   When you need a small, fast program to modify words, lines, or
9512b1
   blocks of lines in a textfile.
9512b1

9512b1
6.4. When should I NOT use sed?
9512b1

9512b1
   You should not use sed when you have "dedicated" tools which can do
9512b1
   the job faster or with an easier syntax. Do not use sed when you
9512b1
   only want to:
9512b1

9512b1
   - print individual lines, based on patterns within the line itself.
9512b1
     Instead, use "grep".
9512b1

9512b1
   - print blocks of lines, with 1 or more lines of context above or
9512b1
     below a specific regular expression. Instead, use the GNU version
9512b1
     of grep as follows:
9512b1

9512b1
        grep -A{number} -B{number} "regex"
9512b1

9512b1
   - remove individual lines, based on patterns within the line
9512b1
     itself. Instead, use "grep -v".
9512b1

9512b1
   - print line numbers.  Instead, use "nl" or "cat -n".
9512b1

9512b1
   - reformat lines or paragraphs. Instead, use "fold", "fmt" or "par".
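
   For comparison, here is a sketch of the first and third cases done
   both ways. The output is the same; the point of the list above is
   that the dedicated tool is shorter to type and usually faster. The
   sample input is invented.

```shell
input='apple
banana
cherry'

# Print matching lines: sed -n '/RE/p' versus grep.
sed_match=$(printf '%s\n' "$input" | sed -n '/an/p')
grep_match=$(printf '%s\n' "$input" | grep 'an')

# Remove matching lines: sed '/RE/d' versus grep -v.
sed_rest=$(printf '%s\n' "$input" | sed '/an/d')
grep_rest=$(printf '%s\n' "$input" | grep -v 'an')

printf '%s\n' "$grep_match" "--" "$grep_rest"
```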
9512b1

9512b1
   The tr utility is also more suited than sed to some simple tasks. For
9512b1
   example, to:
9512b1

9512b1
   - delete individual characters. Instead of "s/[a-d]//g", use
9512b1

9512b1
        tr -d "[a-d]"
9512b1

9512b1
   - squeeze sequential characters. Instead of "s/ee*/e/g", use
9512b1

9512b1
        tr -s "{character-set}"
9512b1

9512b1
   - change individual characters. Instead of "y/abcdef/ABCDEF/", use
9512b1

9512b1
        tr "[a-f]" "[A-F]"
9512b1

9512b1
   Note, however, that tr does not support giving input files on the
9512b1
   command line, so the syntax is:
9512b1

9512b1
     tr {options-and-patterns} < input-file
9512b1

9512b1
   or, to process multiple files:
9512b1

9512b1
     cat input-file1 input-file2 | tr {options-and-patterns}
9512b1

9512b1
   If you have multiple files, replacing sed with tr is often more of
   an exercise than a practical gain. Although sed can perfectly emulate
9512b1
   certain functions of cat, grep, nl, rev, sort, tac, tail, tr, uniq,
9512b1
   and other utilities, producing identical output, the native utilities
9512b1
   are usually optimized to do the job more quickly than sed.
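
   The three tr equivalents can be compared with sed side by side, as
   in this sketch. It assumes a POSIX-style tr, where ranges are
   written without brackets; with such a tr, the brackets in "[a-d]"
   would themselves be deleted along with the letters, although some
   older versions of tr expect them.

```shell
# Delete individual characters.
del_sed=$(echo abcdef | sed 's/[a-d]//g')
del_tr=$(echo abcdef | tr -d 'a-d')

# Squeeze runs of the same character.
sq_sed=$(echo feeed | sed 's/ee*/e/g')
sq_tr=$(echo feeed | tr -s 'e')

# Change individual characters.
up_sed=$(echo abcdef | sed 'y/abcdef/ABCDEF/')
up_tr=$(echo abcdef | tr 'a-f' 'A-F')

printf '%s\n' "$del_tr" "$sq_tr" "$up_tr"
```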
9512b1

9512b1
6.5. When should I ignore sed and use awk or Perl instead?
9512b1

9512b1
   If you can write the same script in awk or Perl and do it in less
9512b1
   time, then use Perl or awk. There's no reason to spend an hour
9512b1
   writing and debugging a sed script if you can do it in Perl in 10
9512b1
   minutes (assuming that you know Perl already) and if the processing
9512b1
   time or memory use is not a factor. Don't hunt pheasants with a .22
9512b1
   if you have a shotgun at your side . . . unless you simply enjoy
9512b1
   the challenge!
9512b1

9512b1
   Specifically, use awk or perl if you need to:
9512b1

9512b1
      - count fields or words on a line. (awk)
9512b1
      - count lines in a block or objects in a file.
9512b1
      - check lengths of strings or do math operations.
9512b1
      - handle very long lines or need very large buffers. (or gsed)
9512b1
      - handle binary data (control characters). (perl: binmode)
9512b1
      - loop through an array or list.
9512b1
      - test for file existence, filesize, or fileage.
9512b1
      - treat each paragraph as a line. (well, not always)
9512b1

9512b1
6.6. Known limitations among sed versions
9512b1

9512b1
   The limits below apply to the distributed versions; the source code
   for most free versions of sed can be modified and recompiled. As
   used below, "no limit" means there is no "fixed" limit. Limits are
   actually determined by one's hardware, memory, operating system,
   and which C library is used to compile sed.
9512b1

9512b1
6.6.1. Maximum line length
9512b1

9512b1
      GNU sed:        no limit
9512b1
      ssed:           no limit
9512b1
      sedmod v1.0:    4096 bytes
9512b1
      HHsed v1.5:     4000 bytes
9512b1
      sed v1.6:       [pending]
9512b1

9512b1
6.6.2. Maximum size for all buffers (pattern space + hold space)
9512b1

9512b1
      GNU sed:        no limit
9512b1
      ssed:           no limit
9512b1
      sedmod v1.0:    4096 bytes
9512b1
      HHsed v1.5:     4000 bytes
9512b1
      sed v1.6:       [pending]
9512b1

9512b1
6.6.3. Maximum number of files that can be read with read command
9512b1

9512b1
      GNU sed v3+:    no limit
9512b1
      ssed:           no limit
9512b1
      GNU sed v2.05:  total no. of r and w commands may not exceed 32
9512b1
      sedmod v1.0:    total no. of r and w commands may not exceed 20
9512b1
      sed v1.6:       [pending]
9512b1

9512b1
6.6.4. Maximum number of files that can be written with 'w' command
9512b1

9512b1
      GNU sed v3+:    no limit (but typical Unix is 253)
9512b1
      ssed:           no limit (but typical Unix is 253)
9512b1
      GNU sed v2.05:  total no. of r and w commands may not exceed 32
9512b1
      sedmod v1.0:    10
9512b1
      HHsed v1.5:     10
9512b1
      sed v1.6:       [pending]
9512b1

9512b1
6.6.5. Limits on length of label names
9512b1

9512b1
      GNU sed:        no limit
9512b1
      ssed:           no limit
9512b1
      HHsed v1.5:     no limit
9512b1
      sed v1.6:       [pending]
9512b1
      BSD sed:        8 characters
9512b1

9512b1
   Note that GNU sed and ssed both consider a semicolon to terminate a
9512b1
   label name.
9512b1

9512b1
6.6.6. Limits on length of write-file names
9512b1

9512b1
      GNU sed:        no limit
9512b1
      ssed:           no limit
9512b1
      HHsed v1.5:     no limit
9512b1
      sed v1.6:       [pending]
9512b1
      BSD sed:        40 characters
9512b1

9512b1
6.6.7. Limits on branch/jump commands
9512b1

9512b1
      GNU sed:        no limit
9512b1
      ssed:           no limit
9512b1
      HHsed v1.5:     50
9512b1
      sed v1.6:       [pending]
9512b1

9512b1
   As a practical consequence, this means that HHsed will not read
9512b1
   more than 50 lines into the pattern space via an N command, even if
9512b1
   the pattern space is only a few hundred bytes in size. HHsed exits
9512b1
   with an error message, "infinite branch loop at line {nn}".
9512b1

9512b1
6.7. Known incompatibilities between sed versions
9512b1

9512b1
6.7.1. Issuing commands from the command line
9512b1

9512b1
   Most versions of sed permit multiple commands to be issued on the
   command line, separated by a semicolon (;). Thus,
9512b1

9512b1
       sed 'G;G' file
9512b1

9512b1
   should triple-space a file. However, for non-GNU sed, some commands
9512b1
   *require* separate expressions on the command line. These include:
9512b1

9512b1
      - all labels (':a', ':more', etc.)
9512b1
      - all branching instructions ('b', 't')
9512b1
      - commands to read and write files ('r' and 'w')
9512b1
      - any closing brace, '}'
9512b1

9512b1
   If these commands are used, they must be the LAST commands of an
9512b1
   expression. Subsequent commands must use another expression
9512b1
   (another -e switch plus arguments).  E.g.,
9512b1

9512b1
     sed  -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' files
9512b1

9512b1
   GNU sed, ssed, sed15 and sed16 all permit these commands to be
9512b1
   followed by a semicolon, so the previous script can be written:
9512b1

9512b1
     sed  ':a;s/^.\{1,77\}$/ &/;ta;s/\( *\)\1/\1/' files
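
   A script that has to run on more than one sed can keep the
   portable form and try the semicolon form only where it is likely
   to work. In the sketch below, acceptance of "--version" is used as
   a rough test for GNU sed; that is an assumption, not a guarantee.
   The script centers the word "abc" in a 79-column line.

```shell
# Portable form: each label and branch ends its own -e expression.
portable=$(echo abc |
  sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/')

# Semicolon form, attempted only if this sed accepts --version.
if sed --version >/dev/null 2>&1; then
  compact=$(echo abc | sed ':a;s/^.\{1,77\}$/ &/;ta;s/\( *\)\1/\1/')
else
  compact=$portable
fi

printf '[%s]\n' "$portable" "$compact"
```

   The loop pads the line on the left to 78 characters (75 spaces
   plus "abc"), and the last substitution halves the leading run of
   spaces, leaving 38 spaces before the word.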
9512b1

9512b1
   Versions differ in implementing the 'a' (append), 'c' (change), and
9512b1
   'i' (insert) commands:
9512b1

9512b1
      sed "/foo/i New text here"              # HHsed/sedmod/gsed-30280
9512b1
      gsed -e "/foo/i\\" -e "New text here"   # GNU sed
9512b1
      sed1 -e "/foo/i" -e "New text here"     # one version of sed
9512b1
      sed2 "/foo/i\ New text here"            # another version
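
   Under a Unix shell, the form most likely to work everywhere puts a
   backslash and a real newline after the command letter, with the
   text on the following line. A sketch, reusing the sample text from
   above:

```shell
# i\ : insert the text before the matching line.
before=$(echo foo | sed '/foo/i\
New text here')

# a\ : append the text after the matching line.
after=$(echo foo | sed '/foo/a\
New text here')

# c\ : replace the matching line with the text.
changed=$(echo foo | sed '/foo/c\
New text here')

printf '%s\n' "$before" "--" "$after" "--" "$changed"
```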
9512b1

9512b1
6.7.2. Using comments (prefixed by the '#' sign)
9512b1

9512b1
   Most versions of sed permit comments to appear in sed scripts only
9512b1
   on the first line of the script. Comments on line 2 or thereafter
9512b1
   are not recognized and will generate an error like "unrecognized
9512b1
   command" or "command [bad-line-here] has trailing garbage".
9512b1

9512b1
   GNU sed, HHsed, sedmod, and HP-UX sed permit comments to appear on
9512b1
   any line of the script, except after labels and branching commands
9512b1
   (b,t), *provided* that a semicolon (;) occurs after the command
9512b1
   itself. This syntax makes sed similar to awk and perl, which use a
9512b1
   similar commenting structure in their scripts.  Thus,
9512b1

9512b1
      # GNU style sed script
9512b1
      $!N;                        # except for last line, get next line
9512b1
      s/^\([0-9]\{5\}\).*\n\1.*//;    # if first 5 digits of each line
9512b1
                                      # match, delete BOTH lines.
9512b1
      t skip
9512b1
      P;                              # print 1st line only if no match
9512b1
      :skip
9512b1
      D;                    # delete 1st line of pattern space and loop
9512b1
      #---end of script---
9512b1

9512b1
   is a valid script for GNU-based versions of sed, but is
9512b1
   unrecognized for most other versions of sed.
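
   For a sed that rejects such comments, the same commands can be
   given without them as separate -e expressions. The sketch below
   wraps them in a shell function (the name 'dedupe' and the sample
   input are invented) and runs them on four lines, two of which
   share their first 5 digits.

```shell
# Comment-free form of the script: when two consecutive lines begin
# with the same 5 digits, delete both.
dedupe() {
  sed -e '$!N' -e 's/^\([0-9]\{5\}\).*\n\1.*//' -e 't skip' \
      -e P -e ':skip' -e D
}

out=$(printf '%s\n' '11111 x' '22222 y' '22222 z' '33333 w' | dedupe)
printf '%s\n' "$out"
```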
9512b1

9512b1
   Finally, if the first two characters in a disk file script are
9512b1
   "#n", the output is suppressed, exactly as if -n were entered on
9512b1
   the command line. This is true for the following versions of sed:
9512b1

9512b1
      - ssed v3.57 and above
9512b1
      - gsed
9512b1
      - HHsed v1.5
9512b1
      - sed v1.6
9512b1

9512b1
   This syntax is not recognized by these versions of sed:
9512b1

9512b1
      - ssed v3.45 to v3.50 (other versions untested)
9512b1
      - sedmod v1.0
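
   Whether a given sed honors "#n" can be tested with a two-line
   script file, as in this sketch (the file name and input are
   arbitrary). If "#n" is honored, only the line matching /b/ is
   printed; otherwise every line appears and the matching line
   appears twice.

```shell
dir=$(mktemp -d) || exit 1
printf '%s\n' '#n' '/b/p' >"$dir/script.sed"

# No -n on the command line: suppression must come from '#n'.
out=$(printf '%s\n' a b c | sed -f "$dir/script.sed")
rm -r "$dir"
printf '%s\n' "$out"
```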
9512b1

9512b1
6.7.3. Special syntax in REs
9512b1

9512b1
A. HHsed v1.5 (by Howard Helman)
9512b1

9512b1
   The following expressions can be used for /RE/ addresses or in the
   LHS of a substitution:
9512b1

9512b1
      +    - 1 or more occurrences of previous RE: same as \{1,\}
9512b1
      \<   - boundary between nonword and word character
9512b1
      \>   - boundary between word and nonword character
9512b1

9512b1
   The following expressions can be used for /RE/ addresses or on
9512b1
   either side of a substitution:
9512b1

9512b1
      \a   - bell         (ASCII 07, 0x07)
9512b1
      \b   - backspace    (ASCII 08, 0x08)
9512b1
      \e   - escape       (ASCII 27, 0x1B)
9512b1
      \f   - formfeed     (ASCII 12, 0x0C)
9512b1
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
9512b1
      \r   - return       (ASCII 13, 0x0D)
9512b1
      \t   - tab          (ASCII 09, 0x09)
9512b1
      \v   - vertical tab (ASCII 11, 0x0B)
9512b1
      \xHH - the ASCII character corresponding to 2 hex digits HH.
9512b1

9512b1
B. sed v1.6 (by Walter Briscoe)
9512b1

9512b1
   sed v1.6 accepts every expression supported by sed v1.5 (above),
9512b1
   plus the following elements, which can also be used in the RHS of a
9512b1
   substitution (in addition to those listed above):
9512b1

9512b1
      \\~  - insert replacement pattern defined in last s/// command
9512b1
             (must be used alone in the RHS)
9512b1
      \l   - change next element to lower case
9512b1
      \L   - change remaining elements to lower case
9512b1
      \u   - change next element to upper case
9512b1
      \U   - change remaining elements to upper case
9512b1
      \e   - end case conversion of next element
9512b1
      \E   - end case conversion of remaining elements
9512b1
      $0   - insert pattern space BEFORE the substitution
9512b1
      $1-$9 - match Nth word on the pattern space
9512b1

9512b1

9512b1
C. sedmod v1.0 (by Hern Chen)
9512b1

9512b1
   The following expressions can be used for /RE/ addresses or in the LHS
9512b1
   of a substitution:
9512b1

9512b1
      +    - 1 or more occurrences of previous RE: same as \{1,\}
9512b1
      \a   - any alphanumeric: same as [a-zA-Z0-9]
9512b1
      \A   - 1 or more alphas: same as \a+
9512b1
      \d   - any digit: same as [0-9]
9512b1
      \D   - 1 or more digits: same as \d+
9512b1
      \h   - any hex digit: same as [0-9a-fA-F]
9512b1
      \H   - 1 or more hexdigits: same as \h+
9512b1
      \l   - any letter: same as [A-Za-z]
9512b1
      \L   - 1 or more letters: same as \l+
9512b1
      \n   - newline      (read as 2 bytes, 0D 0A or ^M^J, in DOS)
9512b1
      \s   - any whitespace character: space, tab, or vertical tab
9512b1
      \S   - 1 or more whitespace chars: same as \s+
9512b1
      \t   - tab          (ASCII 09, 0x09)
9512b1
      \<   - boundary between nonword and word character
9512b1
      \>   - boundary between word and nonword character
9512b1

9512b1
   The following expressions can be used in the RHS of a substitution.
9512b1
   "Elements" refer to \1 .. \9, &, $0, or $1 .. $9:
9512b1

9512b1
      &    - insert regexp defined on LHS
9512b1
      \e   - end case conversion of next element
9512b1
      \E   - end case conversion of remaining elements
9512b1
      \l   - change next element to lower case
9512b1
      \L   - change remaining elements to lower case
9512b1
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
9512b1
      \t   - tab          (ASCII 09, 0x09)
9512b1
      \u   - change next element to upper case
9512b1
      \U   - change remaining elements to upper case
9512b1
      $0   - insert the original pattern space
9512b1
      $1-$9 - match Nth word on the pattern space
9512b1

9512b1
D. UnixDos sed
9512b1

9512b1
   The following expressions can be used in text, LHS, and RHS:
9512b1

9512b1
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
9512b1

9512b1
E. GNU sed v1.03 (by Frank Whaley)
9512b1

9512b1
   When used with the -x (extended) switch on the command line, or
9512b1
   when '#x' occurs as the first line of a script, Whaley's gsed103
9512b1
   supports the following expressions in both the LHS and RHS of a
9512b1
   substitution:
9512b1

9512b1
      \|      matches the expression on either side
9512b1
      ?       0 or 1 occurrences of previous RE: same as \{0,1\}
9512b1
      +       1 or more occurrences of previous RE: same as \{1,\}
9512b1
      \a      "alert" beep     (BEL, Ctrl-G, 0x07)
9512b1
      \b      backspace        (BS, Ctrl-H, 0x08)
9512b1
      \f      formfeed         (FF, Ctrl-L, 0x0C)
9512b1
      \n      newline          (LF, Ctrl-J, 0x0A)
9512b1
      \r      carriage-return  (CR, Ctrl-M, 0x0D)
9512b1
      \t      horizontal tab   (HT, Ctrl-I, 0x09)
9512b1
      \v      vertical tab     (VT, Ctrl-K, 0x0B)
9512b1
      \bBBB   binary char, where BBB are 1-8 binary digits, [0-1]
9512b1
      \dDDD   decimal char, where DDD are 1-3 decimal digits, [0-9]
9512b1
      \oOOO   octal char, where OOO are 1-3 octal digits, [0-7]
9512b1
      \xHH    hex char, where HH are 1-2 hex digits, [0-9A-F]
9512b1

9512b1
   In normal mode, with or without the -x switch, the following escape
9512b1
   sequences are also supported in regex addressing or in the LHS of a
9512b1
   substitution:
9512b1

9512b1
      \`      matches beginning of pattern space: same as /^/
9512b1
      \'      matches end of pattern space: same as /$/
9512b1
      \B      boundary between 2 word or 2 nonword characters
9512b1
      \w      any nonword character [*BUG!* should be a word char]
9512b1
      \W      any nonword character: same as /[^A-Za-z0-9]/
9512b1
      \<      boundary between nonword and word char
9512b1
      \>      boundary between word and nonword char
9512b1

9512b1
F. GNU sed v2.05 and higher versions
9512b1

9512b1
   The following expressions can be used for /RE/ addresses or in the
9512b1
   LHS of a substitution:
9512b1

9512b1
      \`  - matches the beginning of the pattern space (same as "^")
9512b1
      \'  - matches the end of the pattern space (same as "$")
9512b1
      \?  - 0 or 1 occurrence of previous character: same as \{0,1\}
9512b1
      \+  - 1 or more occurrences of previous character: same as \{1,\}
9512b1
      \|  - matches the string on either side, e.g., foo\|bar
9512b1
      \b  - boundary between word and nonword chars (reversible)
9512b1
      \B  - boundary between 2 word or between 2 nonword chars
9512b1
      \n  - embedded newline (usable after N, G, or similar commands)
9512b1
      \w  - any word character: [A-Za-z0-9_]
9512b1
      \W  - any nonword char: [^A-Za-z0-9_]
9512b1
      \<  - boundary between nonword and word character
9512b1
      \>  - boundary between word and nonword character
9512b1

9512b1
   On \b, \B, \<, and \>, see section 6.7.4 ("Word boundaries"),
9512b1
   below.
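
   A few of the expressions in the table above, tried on throwaway
   input. This is only a sketch and assumes that "sed" is a recent
   GNU sed; the variable names are arbitrary:

```shell
# Assumes GNU sed; \|, \+, \? and \W are GNU extensions to basic REs.
alt_out=$(echo 'foo bar baz'  | sed 's/foo\|baz/X/g')    # X bar X
plus_out=$(echo 'aaa b'       | sed 's/a\+/A/')          # A b
opt_out=$(echo 'color colour' | sed 's/colou\?r/C/g')    # C C
nonword_out=$(echo 'a_b-c'    | sed 's/\W/ /g')          # a_b c
printf '%s\n' "$alt_out" "$plus_out" "$opt_out" "$nonword_out"
```

   Note that the underscore survives the last command, since it
   counts as a word character.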
9512b1

9512b1
   Undocumented -r switch:
9512b1

9512b1
   Beginning with version 3.02, GNU sed has a -r switch
9512b1
   (undocumented till version 4.0), activating Extended Regular
9512b1
   Expressions in the following manner:
9512b1

9512b1
       ?      -  0 or 1 occurrence of previous character
9512b1
       +      -  1 or more occurrences of previous character
9512b1
       |      -  matches the string on either side, e.g., foo|bar
9512b1
       (...)  -  enable grouping without backslash
9512b1
       {...}  -  enable interval expression without backslash
9512b1

9512b1
   When the -r switch (mnemonic: "regular expression") is used, prefix
9512b1
   these symbols with a backslash to disable the special meaning.
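
   For instance, assuming a GNU sed that accepts -r (the sample
   strings are invented):

```shell
# Assumes GNU sed with the -r switch.
ere_out=$(echo 'foofoo bar' | sed -r 's/(foo|bar)+/X/g')   # X X
lit_out=$(echo 'a+b'        | sed -r 's/a\+b/X/')          # X
printf '%s\n' "$ere_out" "$lit_out"
```

   In the second command the backslash turns "+" back into a literal
   plus sign, the reverse of its meaning without -r.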
9512b1

9512b1
   Escape sequences:
9512b1

9512b1
   Beginning with version 3.02.80, the following escape sequences can
9512b1
   now be used on both sides of a "s///" substitution:
9512b1

9512b1
      \a      "alert" beep     (BEL, Ctrl-G, 0x07)
9512b1
      \f      formfeed         (FF, Ctrl-L, 0x0C)
9512b1
      \n      newline          (LF, Ctrl-J, 0x0A)
9512b1
      \r      carriage-return  (CR, Ctrl-M, 0x0D)
9512b1
      \t      horizontal tab   (HT, Ctrl-I, 0x09)
9512b1
      \v      vertical tab     (VT, Ctrl-K, 0x0B)
9512b1
      \oNNN   a character with the octal value NNN
9512b1
      \dNNN   a character with the decimal value NNN
9512b1
      \xHH    a character with the hexadecimal value HH
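
   A quick sketch of these escapes in use, assuming GNU sed 3.02.80
   or later. The letter "A" is ASCII 65, i.e., hex 41 or octal 101:

```shell
# Assumes GNU sed 3.02.80 or later.
tab_out=$(printf 'a\tb\n' | sed 's/\t/,/')       # a,b
hex_out=$(echo 'A-A-A'    | sed 's/\x41/x/')     # x-A-A
oct_out=$(echo 'A-A-A'    | sed 's/\o101/o/')    # o-A-A
dec_out=$(echo 'A-A-A'    | sed 's/\d065/d/')    # d-A-A
nl_out=$(echo 'a b'       | sed 's/ /\n/')       # "a" and "b" on two lines
printf '%s\n' "$tab_out" "$hex_out" "$oct_out" "$dec_out" "$nl_out"
```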
9512b1

9512b1
   Note that GNU sed also supports "character classes", a POSIX
9512b1
   extension to regexes, described in section 3.7, above.
9512b1

9512b1
G. sed 4.0 and higher versions
9512b1

9512b1
   The following expressions can be used in the RHS of a substitution.
9512b1

9512b1
      \e   - end case conversion
9512b1
      \l   - change next character to lower case
9512b1
      \L   - change remaining text to lower case
9512b1
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
9512b1
      \t   - tab          (ASCII 09, 0x09)
9512b1
      \u   - change next character to upper case
9512b1
      \U   - change remaining text to upper case
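
   The case-conversion escapes might be used like this. A sketch
   only, assuming GNU sed 4.0 or later; the sample text is invented:

```shell
# Assumes GNU sed 4.0 or later.
cap_out=$(echo 'hello world' | sed 's/\w\+/\u&/g')            # Hello World
up_out=$(echo 'hello world'  | sed 's/.*/\U&/')               # HELLO WORLD
low_out=$(echo 'HELLO World' | sed 's/\(.\)\(.*\)/\1\L\2/')   # Hello world
printf '%s\n' "$cap_out" "$up_out" "$low_out"
```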
9512b1

9512b1
   In addition, GNU sed 4.0 can modify the way ^ and $ are interpreted,
9512b1
   so that ^ can also match an empty string after a newline character,
9512b1
   and $ can also match an empty string before a newline character (to
9512b1
   do this, add an "M" after the regular expression terminator, like
9512b1
   /^>/M -- see section 3.1.1). Even if you use this feature, \` and \'
9512b1
   still match the beginning and the end of the pattern space,
9512b1
   respectively.
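
   To see the difference, here is a sketch that assumes GNU sed 4.0
   or later. N is used only to get an embedded newline into the
   pattern space:

```shell
# Assumes GNU sed 4.0 or later; N joins the two input lines.
multi_out=$(printf 'a\nb\n' | sed 'N;s/^/> /Mg')     # "> a" and "> b"
start_out=$(printf 'a\nb\n' | sed 'N;s/\`/> /Mg')    # "> a" and "b"
printf '%s\n' "$multi_out" "$start_out"
```

   With the M flag, ^ matches after the embedded newline as well, so
   both lines are prefixed. \` matches only at the very start of the
   pattern space, so only the first line is prefixed.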
9512b1

9512b1
H. ssed
9512b1

9512b1
   Everything that was said for GNU sed applies to ssed as well. In
9512b1
   addition, in Perl-mode (-R switch), these become active or inactive:
9512b1

9512b1
      .     - no longer matches new-line characters
9512b1
      \A    - matches beginning of pattern space
9512b1
      \Z    - matches end of pattern space or last newline in the PS
9512b1
      \z    - matches end of pattern space
9512b1
      \d    - matches any digit: same as [0-9]
9512b1
      \D    - matches any non-digit: same as [^0-9]
9512b1
      \`    - no longer matches beginning of pattern space
9512b1
      \'    - no longer matches end of pattern space
9512b1
      \<    - no longer matches boundary between nonword & word char
9512b1
      \>    - no longer matches boundary between word & nonword char
9512b1
      \oNNN - no longer matches char with octal value NNN
9512b1
      \dNNN - no longer matches char with decimal value NNN
9512b1
      \NNN  - matches char with octal value NNN
9512b1

9512b1
   Perl mode supports lookahead (?=match) and lookbehind (?<=match)
9512b1
   pattern matching.  The matched text is NOT captured in "&" for s///
9512b1
   replacements!
9512b1

9512b1
      foo(?=bar)   - match "foo" only if "bar" follows it
9512b1
      foo(?!bar)   - match "foo" only if "bar" does NOT follow it
9512b1
      (?<=foo)bar  - match "bar" only if "foo" precedes it
9512b1
      (?<!foo)bar  - match "bar" only if "foo" does NOT precede it

      (?<!in|on|at)foo
                  - match "foo" only if NOT preceded by "in", "on" or "at"
      (?<=\d{3})(?<!999)foo
                  - match "foo" only if preceded by 3 digits other than "999"
9512b1

9512b1
   In Perl mode, there are two new switches in /addressing/ or s///
   commands. Switches may be lowercase in s/// commands, but must be
   uppercase in /addressing/:
9512b1

9512b1
       /S  - lets "." match a newline also
9512b1
       /X  - extra whitespace is ignored. See below, for sample usage.
9512b1

9512b1
   Here are some examples of Perl-style regular expressions. Use the -R
9512b1
   switch.
9512b1

9512b1
     (?i)abc    - case-insensitive match of abc, ABC, aBc, ABc, etc.
9512b1
     ab(?i)c    - same as above; the (?i) applies throughout the pattern
9512b1
     (ab(?i)c)  - matches abc or abC; the outer parens make the difference!
9512b1
     (?m)       - multi-line pattern space: same as "s/FIND/REPL/M"
9512b1
     (?s)       - set "." to match newline also: same as "s/FIND/REPL/S"
9512b1
     (?x)       - ignore whitespace and #comments; see section (9) below.
9512b1

9512b1
     (?:abc)foo    - match "abcfoo", but do not capture 'abc' in \1
9512b1
     (?:ab|cd)ef   - match "abef" or "cdef"; nothing is captured in \1
9512b1
     (?#remark)xy  - match "xy"; remarks after "#" are ignored.
9512b1

9512b1
   And here are some sample uses of /X switch to add comments to complex
9512b1
   expressions. To embed literal spaces, precede with \ or put inside
9512b1
   [brackets].
9512b1

9512b1
     # ssed script to change "(123) 456-7890" into "[ac123] 456-7890"
9512b1
     #
9512b1
     s/ # BACKSLASH IS NEEDED AT END OF EACH LINE!   \
9512b1
     \(                   # literal left paren, (    \
9512b1
     (\d{3})              # 3 digits                 \
9512b1
     \)                   # literal right paren, )   \
9512b1
     [ \t]*               # zero or more spaces or tabs  \
9512b1
     (\d{3}-\d{4})        # 3 digits, hyphen, 4 digits   \
9512b1
     /[ac\1] \2/gx;       # replace g(lobally), with e(x)tended spacing
9512b1

9512b1
6.7.4. Word boundaries
9512b1

9512b1
   GNU sed, ssed, sed16, sed15 and sedmod use certain symbols to define
9512b1
   the boundary between a "word character" and a nonword character. A
9512b1
   word character fits the regex "[A-Za-z0-9_]". Note: a word character
9512b1
   includes the underscore "_" but not the hyphen, probably because the
9512b1
   underscore is permissible as a label in sed and in other scripting
9512b1
   languages. (In gsed103, a word character did NOT include the
9512b1
   underscore; it included alphanumerics only.)
9512b1

9512b1
   These symbols include '\<' and '\>' (gsed, ssed, sed15, sed16,
9512b1
   sedmod) and '\b' and '\B' (gsed only). Note that the boundary
9512b1
   symbols do not represent a character, but a position on the line.
9512b1
   Word boundaries are used with literal characters or character sets
9512b1
   to let you match (and delete or alter) whole words without
9512b1
   affecting the spaces or punctuation marks outside of those words.
9512b1
   They can only be used in a "/pattern/" address or in the LHS of a
9512b1
   's/LHS/RHS/' command. The following table shows how these symbols
9512b1
   may be used in HHsed and GNU sed. Sedmod matches the syntax of
9512b1
   HHsed.
9512b1

9512b1
      Match position      Possible word boundaries   HHsed   GNU sed
9512b1
      ---------------------------------------------------------------
9512b1
      start of word    [nonword char]^[word char]      \<    \< or \b
9512b1
      end of word         [word char]^[nonword char]   \>    \> or \b
9512b1
      middle of word      [word char]^[word char]     none      \B
9512b1
      outside of word  [nonword char]^[nonword char]  none      \B
9512b1
      ---------------------------------------------------------------
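
   The GNU sed column of the table can be tried out as follows. This
   is a sketch assuming GNU sed; the sample words are invented:

```shell
# Assumes GNU sed (\b and \B are not available in HHsed or sedmod).
word_out=$(echo 'cat concat cats' | sed 's/\<cat\>/dog/g')   # dog concat cats
b_out=$(echo 'cat concat cats'    | sed 's/\bcat\b/dog/g')   # dog concat cats
mid_out=$(echo 'concat'           | sed 's/\Bcat/X/')        # conX
printf '%s\n' "$word_out" "$b_out" "$mid_out"
```

   In the last command, \B requires that "cat" begin in the middle of
   a word, so a standalone "cat" would not be changed.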
9512b1

9512b1
   In ssed, the symbols '\<' and '\>' lose their special meaning when
9512b1
   the -R switch is used to invoke Perl-style expressions. However,
9512b1
   the identical meaning of '\<' and '\>' can be obtained through
9512b1
   these nonmatching, zero-width assertions:
9512b1

9512b1
       (?<!\w)(?=\w)    # same as \<
       (?<=\w)(?!\w)    # same as \>
9512b1

9512b1
6.7.5. Commands which operate differently
9512b1

9512b1
A. GNU sed version 3.02 and 3.02.80
9512b1

9512b1
   The N command no longer discards the contents of the pattern space
9512b1
   upon reaching the end of file. This is not a bug, it's a feature.
9512b1
   However, it breaks certain scripts which relied on the older
9512b1
   behavior of N.
9512b1

9512b1
   'N' adds the Next line to the pattern space, enabling multiple
9512b1
   lines to be stored and acted upon. Upon reaching the last line of
9512b1
   the file, if the N command was issued again, the contents of the
9512b1
   pattern space would be silently deleted and the script would abort
9512b1
   (this has been the traditional behavior). For this reason, sed
9512b1
   users generally wrote:
9512b1

9512b1
       $!N;   # to add the Next line to every line but the last one.
9512b1

9512b1
   However, certain sed scripts relied on this behavior, such as the
9512b1
   script to delete trailing blank lines at the end of a file (see
9512b1
   script #12 in section 3.2, "Common one-line sed scripts", above).
9512b1
   Also, classic textbooks such as Dale Dougherty and Arnold Robbins'
9512b1
   _sed & awk_ documented the older behavior.
9512b1

9512b1
   The GNU sed maintainer felt that despite the portability problems
9512b1
   this would cause, changing the N command to print (rather than
9512b1
   delete) the pattern space was more consistent with one's intuitions
9512b1
   about how a command to "append the Next line" _ought_ to behave.
9512b1
   Another fact favoring the change was that, under the older behavior,
   "{N;command;}" deletes the last line if the file has an odd number
   of lines, but prints it if the file has an even number of lines.
9512b1

9512b1
   To convert scripts which used the former behavior of N (deleting
9512b1
   the pattern space upon reaching the EOF) to scripts compatible with
9512b1
   all versions of sed, change a lone "N;" to "$d;N;".
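
   The two portable spellings can be compared on a 3-line input.
   This is a sketch; both commands should give the same result with
   either the old or the new behavior of N:

```shell
# "$!N" keeps an unpaired last line; "$d;N" drops it, as the old N did.
keep_out=$(printf '1\n2\n3\n' | sed '$!N;s/\n/ /')    # "1 2" then "3"
drop_out=$(printf '1\n2\n3\n' | sed '$d;N;s/\n/ /')   # "1 2" only
printf '%s\n' "$keep_out" "$drop_out"
```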
9512b1

9512b1
------------------------------
9512b1

9512b1
7. KNOWN BUGS AMONG SED VERSIONS
9512b1

9512b1
   Most versions of GNU sed and ssed contain a "buglist" in the
9512b1
   archive source code of known errors or reported behaviors that may
9512b1
   be misconstrued as bugs. This portion of the sed FAQ does _not_
9512b1
   attempt to fully reproduce those buglist files. However, we do
9512b1
   seek to do some substantial reporting, particularly where certain
9512b1
   programs have no "buglist" of their own or are not being actively
9512b1
   maintained.
9512b1

9512b1
   As a rule of thumb, if the bug "bites" someone on the sed-users
9512b1
   mailing list, I tend to report it.
9512b1

9512b1
7.1. ssed v3.59 (by Paolo Bonzini)
9512b1

9512b1
   (1) N does not discard the contents of the pattern space upon
9512b1
   reaching the end of file; not a bug. See section 6.7.5.A, above.
9512b1

9512b1
   (2) If \x26 is entered into the RHS of a substitution, it is
9512b1
   interpreted as an ampersand metacharacter, and the entire pattern
9512b1
   matched in the "find" portion is inserted at that point. A literal
9512b1
   ampersand should be inserted instead.
9512b1

9512b1
   (3) Under Windows 2000, the -i switch doesn't create backup files
9512b1
   properly. When passed one or more files to process, the source
9512b1
   file(s) are unchanged, and the output changed files are given
9512b1
   filenames like sedDOSxyz with no way to correspond them with the
9512b1
   names of the source files.
9512b1

9512b1
7.2. GNU sed v4.0 - v4.0.5
9512b1

9512b1
   (1) N does not discard the contents of the pattern space upon
9512b1
   reaching the end of file; not a bug. See section 6.7.5.A, above.
9512b1

9512b1
   (2) If \x26 is entered into the RHS of a substitution, it is
9512b1
   interpreted as an ampersand metacharacter, and the entire pattern
9512b1
   matched in the "find" portion is inserted at that point. A literal
9512b1
   ampersand should be inserted instead.
9512b1

9512b1
7.3. GNU sed v3.02.80
9512b1

9512b1
   (1) N does not discard the contents of the pattern space upon
9512b1
   reaching the end of file; not a bug. See section 6.7.5.A, above.
9512b1

9512b1
   (2) Same as #2 for GNU sed v4.0, above.
9512b1

9512b1
7.4. GNU sed v3.02
9512b1

9512b1
   (1) Affects only v3.02 binaries compiled with DJGPP for MS-DOS and
9512b1
   MS-Windows: 'l' (list) command does not display a lone carriage
9512b1
   return (0x0D, ^M) embedded in a line.
9512b1

9512b1
   (2) The expression "\<" causes problems when attempting the
9512b1
   following types of substitutions, which should print "+aaa +bbb":
9512b1

9512b1
       echo aaa bbb | sed 's/\</+/g'    # prints "+a+a+a +b+b+b"
9512b1
       echo aaa bbb | sed 's/\<./+&/g'  # prints "+a+a+a +b+b+b"
9512b1

9512b1
   (3) The N command no longer discards the contents of the pattern
9512b1
   space upon reaching the end of file. This is not a bug, it's a
9512b1
   feature. See section 6.7.5, "Commands which operate differently".
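
   To find out whether the sed at hand is affected by bug (2), the
   two commands can be run as a probe. This sketch assumes nothing
   beyond a shell and a sed that supports "\<":

```shell
# A sed without the "\<" bug gives "+aaa +bbb" for both commands.
bow_out=$(echo aaa bbb  | sed 's/\</+/g')
bow2_out=$(echo aaa bbb | sed 's/\<./+&/g')
printf '%s\n' "$bow_out" "$bow2_out"
```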
9512b1

9512b1
7.5. GNU sed v2.05
9512b1

9512b1
   (1) If a number follows the substitute command (e.g., s/f/F/10) and
9512b1
   the number exceeds the possible matches on the pattern space, the
9512b1
   command 't label' _always_ jumps to the specified label. 't' should
9512b1
   jump only if the substitution was successful (or returned "true").
9512b1

9512b1
   (2) 'l' (list) command does not convert the following characters to
9512b1
   hex values, but passes them through unchanged: 0xF7, 0xFB, 0xFC,
9512b1
   0xFD, 0xFE.
9512b1

9512b1
   (3) A range address like "/foo/,14" is supposed to match every line
9512b1
   from the first occurrence of "foo" until line 14, inclusive, and
9512b1
   then match only those lines containing "foo" thereafter. In gsed
9512b1
   v2.05, if "foo" occurs later in the file, every line from there to
9512b1
   the end of file will be matched (since gsed is looking for line 14
9512b1
   to occur again!).
9512b1

9512b1
   (4) The regexes /\`/ and /\'/ are not interpreted as a backquote
9512b1
   and apostrophe, as might be expected. Instead, they are used to
9512b1
   represent the beginning-of-line and end-of-line (respectively), to
9512b1
   conform with similar regexes in the GNU versions of Emacs and awk.
9512b1
   As a consequence, there is no clear way to indicate an apostrophe,
9512b1
   since a bare apostrophe (') has special meaning to the Unix shell
9512b1
   and the quoted apostrophe (\') is interpreted as the EOL. A
9512b1
   double-quote apostrophe (\\') was interpreted as a backslash to sed
9512b1
   and a quote mark to the shell--again, not providing the expected
9512b1
   results. This syntax changed in the next version of gsed.
9512b1

9512b1
   (5) Multiple occurrences of the 'w' command fail, as shown here,
9512b1
   given that both "aaa" and "bbb" occur within the file:
9512b1

9512b1
       gsed -e "/aaa/w FILE" -e "/bbb/w FILE" input.txt
9512b1

9512b1
   (6) The expression "\<" causes problems when attempting the
9512b1
   following type of substitution, which should print "+aaa +bbb":
9512b1

9512b1
       echo aaa bbb | sed 's/\</+/g'    # sed hangs up with no output
9512b1

9512b1
   The syntax 's/\<./+&/g' issues the proper output.
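
   The intended behavior for items (1), (3) and (5) above can be
   checked with a short probe. This is a sketch that assumes a POSIX
   shell with mktemp and a sed free of these bugs; the sample data,
   file names and variable names are made up for illustration:

```shell
# (1) 't' must not branch when s/f/F/10 finds no 10th match.
t_out=$(echo fff |
        sed -e 's/f/F/10' -e 't done' -e 's/$/ (no 10th match)/' -e ':done')

# (3) "/foo/,3": once line 3 has passed, a later "foo" selects 1 line only.
range_out=$(printf 'a\nfoo1\nb\nc\nfoo2\nd\n' | sed -n '/foo/,3p')

# (5) Two 'w' commands naming the same file should both write to it.
dir=$(mktemp -d)
printf 'aaa\nccc\nbbb\n' > "$dir/input.txt"
sed -n -e "/aaa/w $dir/FILE" -e "/bbb/w $dir/FILE" "$dir/input.txt"
w_out=$(cat "$dir/FILE")
rm -rf "$dir"

printf '%s\n' "$t_out" "$range_out" "$w_out"
```

   A correct sed appends " (no 10th match)" to "fff", selects the
   lines "foo1", "b" and "foo2" for the range, and leaves both "aaa"
   and "bbb" in FILE.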
9512b1

9512b1
7.6. GNU sed v1.18
9512b1

9512b1
   (1) Same as #1 for GNU sed v2.05, above.
9512b1

9512b1
   (2) The following command will lock the computer under Win95. Echos
9512b1
   is an echo command that does not issue a trailing newline:
9512b1

9512b1
       echos any_word | gsed "s/[ ]*$//"
9512b1

9512b1
   (3) Same as #3 for GNU sed v2.05, above.
9512b1

9512b1
7.7. GNU sed v1.03 (by Frank Whaley)
9512b1

9512b1
   (1) The \w and \W escape sequences both match only nonword
9512b1
   characters. \w is misdefined and should match word characters.
9512b1

9512b1
   (2) The underscore is defined as a nonword character; it should be
9512b1
   defined as a word character.
9512b1

9512b1
   (3) Same as #3 for GNU sed v2.05, above.
9512b1

9512b1
7.8. sed v1.6 (by Walter Briscoe) - still in beta version
9512b1

9512b1
   (1) Duplicated subexpressions (still) do not match an empty set as
9512b1
   they should. This problem was inherited from HHsed15.
9512b1

9512b1
       echo 123 | sed "s/\([a-z][a-z]\)*/=\1/"  # does not return '='
9512b1

9512b1
   (2) If grouping is followed by a + operator, nothing is matched.
9512b1
   This problem was inherited from HHsed; sed v1.6 fixed a bug with
   the * operator, but the problem with the + operator persists.
9512b1

9512b1
       echo aaa | sed "/\(a\)+/d"          # nothing is deleted.
9512b1

9512b1
   (3) With the interval expressions \{1,\} and +, there is a bug
9512b1
   related to the & replacement character. This affected the BETA
9512b1
   release, and it's not known if it affects the final release.
9512b1

9512b1
       echo ab | sed "s/a[^a]*/&c/"        # returns 'abc'. Okay.
9512b1
       echo ab | sed "s/a[^a]+/&c/"        # returns 'ab'. Bug!
9512b1
       echo ab | sed "s/a[^a]\{1,\}/&c/"   # returns 'ab'. Bug!
9512b1

9512b1
7.9. HHsed v1.5 (by Howard Helman)
9512b1

9512b1
   (1) If a number follows the substitute command (e.g., s/foo/bar/2),
9512b1
   in a sed script entered from the command line, two semicolons must
9512b1
   follow the number, or they must be separated by an -e switch.
9512b1
   Normally, only 1 semicolon is needed to separate commands.
9512b1

9512b1
       echo bit bet | HHsed "s/b/n/2;;s/b/B/"          # solution 1
9512b1
       echo bit bet | HHsed -e "s/b/n/2" -e "s/b/B/"   # solution 2
9512b1

9512b1
   (2) If the substitute command is followed by a number and a "p"
9512b1
   flag, when the -n switch is used, the "p" flag must occur first.
9512b1

9512b1
       echo aaa | HHsed -n "s/./B/3p"    # bug! nothing prints
9512b1
       echo aaa | HHsed -n "s/./B/p3"    # prints "aaB" as expected
9512b1

9512b1
   (3) The following commands will cause HHsed to lock the computer
9512b1
   under MS-DOS or Win95. Note that they occur because of malformed
9512b1
   regular expressions which will match no characters.
9512b1

9512b1
       sed -n "p;s/\<//g;" file
9512b1
       sed -n "p;s/[char-set]*//g;" file
9512b1

9512b1
   (4) The range command '/RE1/,/RE2/' in HHsed will match one line if
9512b1
   both regexes occur on the same line (see section 3.4(3), above).
9512b1
   Though this could be construed as a feature, it should probably be
9512b1
   considered a bug since its operation differs from every other
9512b1
   version of sed. For example, '/----/,/----/{s/^/>>/;}' should put
9512b1
   two angle brackets ">>" before every line which is sandwiched
9512b1
   between two rows of 4 or more hyphens. With HHsed, this command will
9512b1
   only prefix the hyphens themselves with the angle brackets.
9512b1

9512b1
   (5) If the hold space is empty, the H command copies the pattern
9512b1
   space to the hold space but fails to prepend a leading newline. The
9512b1
   H command is supposed to add a newline, followed by the contents of
9512b1
   the pattern space, to the hold space at all times. A workaround is
9512b1
   "{G;s/^\(.*\)\(\n\)$/\2\1/;H;s/\n$//;}", but it requires knowing
9512b1
   that the hold space is empty and using the command only once.
9512b1
   Another alternative is to use the G or the h command alone at key
9512b1
   points in the script.
9512b1

9512b1
   (6) If grouping is followed by an '*' or '+' operator, HHsed does
9512b1
   not match the pattern, but issues no warning. See below:
9512b1

9512b1
       echo aaa | HHsed "/\(a\)*/d"      # nothing is deleted
9512b1
       echo aaa | HHsed "/\(a\)+/d"      # nothing is deleted
9512b1
       echo aaa | HHsed "s/\(a\)*/\1B/"  # nothing is changed
9512b1
       echo aaa | HHsed "s/\(a\)+/\1B/"  # nothing is changed
9512b1

9512b1
   (7) If grouping is followed by an interval expression, HHsed halts
9512b1
   with the error message "garbled command", in all of the following
9512b1
   examples:
9512b1

9512b1
       echo aaa | HHsed "/\(a\)\{3\}/d"
9512b1
       echo aaa | HHsed "/\(a\)\{1,5\}/d"
9512b1
       echo aaa | HHsed "s/\(a\)\{3\}/\1B/"
9512b1

9512b1
   (8) In interval expressions, 0 is not supported. E.g., \{0,3\}
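
   For comparison, here is how items (4) and (5) behave in a sed
   without these HHsed bugs. This is a sketch with invented sample
   input, assuming any standard sed:

```shell
# (4) A range that starts on a "----" line runs until a LATER line
#     matches the end RE, so the text in between is prefixed too.
box_out=$(printf '%s\n' '----' 'text' '----' 'after' |
          sed '/----/,/----/{s/^/>>/;}')

# (5) H on an empty hold space still stores a leading newline, so the
#     swapped-in pattern space prints as an empty line followed by "a".
h_lines=$(printf 'a\n' | sed 'H;x' | sed -n '$=')

printf '%s\n' "$box_out" "$h_lines"
```

   The first command prefixes the first three lines with ">>" and
   leaves "after" alone. The second reports 2 output lines.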
9512b1

9512b1
7.10. sedmod v1.0 (by Hern Chen)
9512b1

9512b1
   Technically, the following are limits (or features?) of sedmod, not
9512b1
   bugs, since the docs for sedmod do not claim to support these
9512b1
   missing features.
9512b1

9512b1
   (1) sedmod does not support standard interval expressions  \{...\}
9512b1
   present in nearly all versions of sed.
9512b1

9512b1
   (2) If grouping is followed by an '*' or '+' operator, sedmod gives
9512b1
   a "garbled command" message. However, if the grouped expressions
9512b1
   are string literals with no metacharacters, a partial workaround
9512b1
   can be done like so:
9512b1

9512b1
       \(string\)\1*    # matches 1 or more instances of 'string'
9512b1
       \(string\)\1+    # matches 2 or more instances of 'string'
9512b1

9512b1
   (3) sedmod does not support a numeric argument after the s///
9512b1
   command, as in 's/a/b/3', present in nearly all versions of sed.
9512b1

9512b1
   The following are bugs in sedmod v1.0:
9512b1

9512b1
   (4) When the -i (ignore case) switch is used, the '/regex/d'
9512b1
   command is not properly obeyed. Sedmod may miss one or more lines
9512b1
   matching the expression, regardless of where they occur in the
9512b1
   script. Workaround: use "/regex/{d;}" instead.
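
   The back-reference workaround in item (2) is ordinary RE syntax,
   so its effect can be previewed with another sed. This sketch runs
   it under whatever "sed" is installed, not under sedmod itself, and
   the sample strings are invented:

```shell
# \(ab\)\1* : one "ab", then zero or more further copies of it.
rep_out=$(echo 'abababX' | sed 's/\(ab\)\1*/Y/')    # YX
one_out=$(echo 'abX'     | sed 's/\(ab\)\1*/Y/')    # YX
printf '%s\n' "$rep_out" "$one_out"
```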
9512b1

9512b1
7.11. HP-UX sed
9512b1

9512b1
   (1) Versions of HP-UX sed up to and including version 10.20 are
9512b1
   buggy. According to the README file, which comes with the GNU cc
9512b1
   at <ftp://ftp.ntua.gr/pub/gnu/sed/sed-2.05.bin.README>:
9512b1

9512b1
   "When building gcc on a hppa*-*-hpux10 platform, the `fixincludes'
9512b1
   step (which involves running a sed script) fails because of a bug
9512b1
   in the vendor's implementation of sed.  Currently the only known
9512b1
   workaround is to install GNU sed before building gcc.  The file
9512b1
   sed-2.05.bin.hpux10 is a precompiled binary for that platform."
9512b1

9512b1
7.12. SunOS sed v4.1
9512b1

9512b1
   (1) Bug occurs in RE pattern matching when a non-null '[char-set]*'
9512b1
   is followed by a null '\NUM' pattern recall, illustrated here and
9512b1
   reported by Greg Ubben:
9512b1

9512b1
       s/\(a\)\(b*\)cd\1[0-9]*\2foo/bar/  # between '[0-9]*' and '\2'
9512b1
       s/\(a\{0,1\}\).\{0,1\}\1/bar/      # between '.\{0,1\}' and '\1'
9512b1

9512b1
   Workaround: add a do-nothing 'X*' expression which will not match
9512b1
   any characters on the line between the two components. E.g.,
9512b1

9512b1
       s/\(a\)\(b*\)cd\1[0-9]*X*\2foo/bar/
9512b1
       s/\(a\{0,1\}\).\{0,1\}X*\1/bar/
9512b1

9512b1
7.13. SunOS sed v5.6
9512b1

9512b1
   (1) If grouping is followed by an asterisk, SunOS sed does not match
9512b1
   the null string, which it should do. The following command:
9512b1

9512b1
       echo foo | sed 's/f\(NO-MATCH\)*/g\1/'
9512b1

9512b1
   should transform "foo" to "goo" under normal versions of sed.
9512b1

9512b1
7.14. Ultrix sed v4.3
9512b1

9512b1
   (1) If grouping is followed by an asterisk, Ultrix sed replies with
9512b1
   "command garbled", as shown in the following example:
9512b1

9512b1
       echo foo | sed 's/f\(NO-MATCH\)*/g\1/'
9512b1

9512b1
   (2) If grouping is followed by a numeric operator such as \{0,9\},
9512b1
   Ultrix sed does not find the match.
9512b1

9512b1
7.15. Digital Unix sed
9512b1

9512b1
   (1) The following comes from the man pages for sed distributed with
9512b1
   new, 1998 versions of Digital Unix (reformatted to fit our
9512b1
   margins):
9512b1

9512b1
   [Digital]  The h subcommand for sed does not work properly.  When
9512b1
   you use the  h subcommand to place text into the hold area, only
9512b1
   the last line of the specified text is saved.  You can use the H
9512b1
   subcommand to append text to the hold area. The H subcommand and
9512b1
   all others dealing with the hold area work correctly.
9512b1

9512b1
   (2) "$d" command issues an error message, "cannot parse".  Reported
9512b1
   by Carlos Duarte on 8 June 1998.
9512b1

9512b1
[end-of-file]