Archive-Name: editor-faq/sed
Posting-Frequency: irregular
Last-modified: 10 March 2003
Version: 015
URL: http://sed.sourceforge.net/sedfaq.html
Maintainer: Eric Pement (pemente@northpark.edu)

                            THE SED FAQ

                  Frequently Asked Questions about
                       sed, the stream editor

CONTENTS

1. GENERAL INFORMATION
1.1. Introduction - How this FAQ is organized
1.2. Latest version of the sed FAQ
1.3. FAQ revision information
1.4. How do I add a question/answer to the sed FAQ?
1.5. FAQ abbreviations
1.6. Credits and acknowledgements
1.7. Standard disclaimers

2. BASIC SED
2.1. What is sed?
2.2. What versions of sed are there, and where can I get them?

2.2.1. Free versions

2.2.1.1. Unix platforms
2.2.1.2. OS/2
2.2.1.3. Microsoft Windows (Win3x, Win9x, WinNT, Win2K)
2.2.1.4. MS-DOS
2.2.1.5. CP/M
2.2.1.6. Macintosh v8 or v9

2.2.2. Shareware and Commercial versions

2.2.2.1. Unix platforms
2.2.2.2. OS/2
2.2.2.3. Windows 95/98, Windows NT, Windows 2000
2.2.2.4. MS-DOS

2.3. Where can I learn to use sed?

2.3.1. Books
2.3.2. Mailing list
2.3.3. Tutorials, electronic text
2.3.4. General web and ftp sites

3. TECHNICAL
3.1. More detailed explanation of basic sed
3.1.1.  Regular expressions on the left side of "s///"
3.1.2.  Escape characters on the right side of "s///"
3.1.3.  Substitution switches
3.2. Common one-line sed scripts. How do I . . . ?

      - double/triple-space a file?
      - convert DOS/Unix newlines?
      - delete leading/trailing spaces?
      - do substitutions on all/certain lines?
      - delete consecutive blank lines?
      - delete blank lines at the top/end of the file?

3.3. Addressing and address ranges
3.4. Address ranges in GNU sed and HHsed
3.5. Debugging sed scripts
3.6. Notes about s2p, the sed-to-perl translator
3.7. GNU/POSIX extensions to regular expressions

4. EXAMPLES
   ONE-CHARACTER QUESTIONS
4.1.  How do I insert a newline into the RHS of a substitution?
4.2.  How do I represent control-codes or non-printable characters?
4.3.  How do I convert files with toggle characters, like +this+,
      to look like [i]this[/i]?

   CHANGING STRINGS
4.10. How do I perform a case-insensitive search?
4.11. How do I match only the first occurrence of a pattern?
4.12. How do I parse a comma-delimited (CSV) data file?
4.13. How do I handle fixed-length, columnar data?
4.14. How do I commify a string of numbers?
4.15. How do I prevent regex expansion on substitutions?
4.16. How do I convert a string to all lowercase or capital letters?

   CHANGING BLOCKS (consecutive lines)
4.20. How do I change only one section of a file?
4.21. How do I delete or change a block of text if the block contains
      a certain regular expression?
4.22. How do I locate a paragraph of text if the paragraph contains a
      certain regular expression?
4.23. How do I match a block of specific consecutive lines?
4.23.1.  Try to use a "/range/, /expression/"
4.23.2.  Try to use a "multi-line\nexpression"
4.23.3.  Try to use a block of "literal strings"
4.24. How do I address all the lines between RE1 and RE2, excluding the lines themselves?
4.25. How do I join two lines if line #1 ends in a [certain string]?
4.26. How do I join two lines if line #2 begins in a [certain string]?
4.27. How do I change all paragraphs to long lines?

   SHELL AND ENVIRONMENT
4.30.   How do I read environment variables with sed ...
4.31.1.   ... on Unix platforms?
4.31.2.   ... on MS-DOS or 4DOS platforms?
4.32.   How do I export or pass variables back into the environment ...
4.32.1.   ... on Unix platforms?
4.32.2.   ... on MS-DOS or 4DOS platforms?
4.33.   How do I handle shell quoting in sed?

   FILES, DIRECTORIES, AND PATHS
4.40.  How do I read (insert/add) a file at the top of a textfile?
4.41.  How do I make substitutions in every file in a directory, or
        in a complete directory tree?
4.41.1.   ... ssed solution
4.41.2.   ... Unix solution
4.41.3.   ... DOS solution
4.42.  How do I replace "/some/UNIX/path" in a substitution?
4.43.  How do I replace "C:\SOME\DOS\PATH" in a substitution?
4.44.  How do I emulate file-includes, using sed?

5. WHY ISN'T THIS WORKING?
5.1.  Why don't my variables like $var get expanded in my sed script?
5.2.  I'm using 'p' to print, but I have duplicate lines sometimes.
5.3.  Why does my DOS version of sed process a file part-way through
      and then quit?
5.4.  My RE isn't matching/deleting what I want it to. (Or, "Greedy vs.
      stingy pattern matching")
5.5.  What is CSDPMI*B.ZIP and why do I need it?
5.6.  Where are the man pages for GNU sed?
5.7.  How do I tell what version of sed I am using?
5.8.  Does sed issue an exit code?
5.9.  The 'r' command isn't inserting the file into the text.
5.10. Why can't I match or delete a newline using the \n escape
      sequence? Why can't I match 2 or more lines using \n?
5.11. My script aborts with an error message, "event not found".

6. OTHER ISSUES
6.1.  I have a problem that stumps me. Where can I get help?
6.2.  How does sed compare with awk, perl, and other utilities?
6.3.  When should I use sed?
6.4.  When should I NOT use sed?
6.5.  When should I ignore sed and use Awk or Perl instead?
6.6.  Known limitations among sed versions
6.7.  Known incompatibilities between sed versions

6.7.1. Issuing commands from the command line
6.7.2. Using comments (prefixed by the '#' sign)
6.7.3. Special syntax in REs
6.7.4. Word boundaries
6.7.5. Commands which operate differently

7. KNOWN BUGS AMONG SED VERSIONS
7.1. ssed v3.59
7.2. GNU sed v4.0 - v4.0.5
7.3. GNU sed v3.02.80
7.4. GNU sed v3.02
7.5. GNU sed v2.05
7.6. GNU sed v1.18
7.7. GNU sed v1.03
7.8. sed v1.6 (Briscoe)
7.9. sed v1.5 (Helman)
7.10. sedmod v1.0 (Chen)
7.11. HP-UX sed
7.12. SunOS sed v4.1
7.13. SunOS sed v5.6
7.14. Ultrix sed v4.3
7.15. Digital Unix sed


------------------------------

1. GENERAL INFORMATION

1.1. Introduction - How this FAQ is organized

   This FAQ is organized to answer common (and some uncommon)
   questions about sed, quickly. If you see a term or abbreviation in
   the examples that seems unclear, see if the term is defined in
   section 1.5. If not, send your comment to pemente[at]northpark.edu.

1.2. Latest version of the sed FAQ

   The newest version of the sed FAQ is usually here:

       http://sed.sourceforge.net/sedfaq.html (HTML version)
       http://sed.sourceforge.net/sedfaq.txt  (plain text)
       http://www.student.northpark.edu/pemente/sed/sedfaq.html
       http://www.student.northpark.edu/pemente/sed/sedfaq.txt
       http://www.faqs.org/faqs/editor-faq/sed
       ftp://rtfm.mit.edu/pub/faqs/editor-faq/sed

   Another FAQ file on sed by a different author can be found here:

       http://www.dreamwvr.com/sed-info/sed-faq.html

1.3. FAQ revision information

   In the plaintext version, changes are shown by a vertical bar (|)
   placed in column 78 of the affected lines. To remove the vertical
   bars (use double quotes for MS-DOS):

     sed 's/  *|$//' sedfaq.txt > sedfaq2.txt
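
   To see what the bar-stripping command does, here is a minimal
   sketch run against a two-line sample. The file name demo.txt and
   its contents are made up for illustration; they are not part of
   the FAQ distribution.

```shell
# demo.txt is a hypothetical stand-in for sedfaq.txt
printf '%s\n' 'an unchanged line' 'a changed line      |' > demo.txt

# delete one or more spaces followed by a bar at the end of a line
stripped=$(sed 's/  *|$//' demo.txt)
printf '%s\n' "$stripped"

rm -f demo.txt
```

   Only a bar at the very end of a line (preceded by at least one
   space) is removed; a bar elsewhere in the line is left alone.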

   In the HTML version, vertical bars do not appear. New or altered
   portions of the FAQ are indicated by printing in dark blue type.

   In the text version, words needing emphasis may be surrounded by
   the underscore '_' or the asterisk '*'. In the HTML version, these
   are changed to italics and boldface, respectively.

1.4. How do I add a question/answer to the sed FAQ?

   Word your question briefly and send it to pemente[at]northpark.edu,
   indicating your proposed change. We'll post it on the sed-users
   mailing list (see section 2.3.2) and discuss it there. If it's
   good, your contribution will be added to the next edition.

1.5. FAQ abbreviations

       files = one or more filenames, separated by whitespace
       gsed  = GNU sed
       ssed  = super-sed
       RE    = Regular Expressions supported by sed
       LHS   = the left-hand side ("find" part) of "s/find/repl/" command
       RHS   = the right-hand side ("replace" part) of "s/find/repl/" cmd
       nn+   = version _nn_ or higher (e.g., "15+" = version 1.5 and above)

   files: "files" stands for one or more filenames entered on the
   command line. The names may include any wildcards your shell
   understands (such as ``zork*'' or ``Aug[4-9].let''). Sed will
   process each filename passed to it by the shell.
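
   As a rough illustration of "files" with a wildcard, the sketch
   below creates two sample files (zork1.txt and zork2.txt, names and
   contents invented here) and lets the shell expand the pattern
   before sed runs. It assumes a Unix-style shell.

```shell
# two made-up sample files
printf '%s\n' 'first file, line 1' 'first file, line 2' > zork1.txt
printf '%s\n' 'second file, line 1' > zork2.txt

# the shell expands zork*.txt to both names; sed then reads the
# files one after another and writes one combined result to stdout
changed=$(sed 's/line/row/' zork*.txt)
printf '%s\n' "$changed"

rm -f zork1.txt zork2.txt
```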

   RE: For details on regular expressions, see section 3.1.1., below.

1.6. Credits and acknowledgements

   Many of the ideas for this FAQ were taken from the Awk FAQ:
       http://www.faqs.org/faqs/computer-lang/awk/faq/
       ftp://rtfm.mit.edu/pub/usenet/comp.lang.awk/faq

   and from the old Perl FAQ:
       http://www.perl.com/doc/FAQs/FAQ/oldfaq-html/index.html

   The following individuals have contributed significantly to this
   document, and have provided input and wording suggestions for
   questions, answers, and script examples. Credit goes to these
   contributors (in alphabetical order by last name):

   Al Aab, Yiorgos Adamopoulos, Paolo Bonzini, Walter Briscoe, Jim
   Dennis, Carlos Duarte, Otavio Exel, Sven Guckes, Aurelio Jargas,
   Mark Katz, Toby Kelsey, Eric Pement, Greg Pfeiffer, Ken Pizzini,
   Niall Smart, Simon Taylor, Peter Tillier, Greg Ubben, Laurent
   Vogel.

1.7. Standard disclaimers

   While a serious attempt has been made to ensure the accuracy of the
   information presented herein, the contributors and maintainers of
   this document do not claim the absence of errors and make no
   warranties on the information provided. If you notice any mistakes,
   please let us know so we can fix them.

------------------------------

2. BASIC SED

2.1. What is sed?

   "sed" stands for Stream EDitor. Sed is a non-interactive editor,
   written by the late Lee E. McMahon in 1973 or 1974. A brief history
   of sed's origins may be found in an early history of the Unix
   tools, at <http://www.columbia.edu/~rh120/ch106.x09>.

   Instead of altering a file interactively by moving the cursor on
   the screen (as with a word processor), the user sends a script of
   editing instructions to sed, plus the name of the file to edit (or
   the text to be edited may come as output from a pipe). In this
   sense, sed works like a filter -- deleting, inserting and changing
   characters, words, and lines of text. Its range of activity goes
   from small, simple changes to very complex ones.

   Sed reads its input from stdin (Unix shorthand for "standard
   input," normally the keyboard or a pipe) or from files (or both),
   and sends the results to stdout ("standard output," normally the
   console or screen). Most people use sed first for its substitution
   features. Sed is often used as a find-and-replace tool. For
   example,

     sed 's/Glenn/Harold/g' oldfile >newfile

   will replace every occurrence of "Glenn" with the word "Harold",
   wherever it occurs in the file. The "find" portion is a regular
   expression ("RE"), which can be a simple word or may contain
   special characters to allow greater flexibility (for example, to
   prevent "Glenn" from also matching "Glennon").
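
   To make the "Glennon" caveat concrete, here is a minimal sketch
   comparing the plain substitution with a stricter one. The sample
   text is invented, and the stricter pattern is only one portable way
   to do it: it checks the character after "Glenn" but not the one
   before it, so it is not a full word-boundary test.

```shell
text='Glenn met Glennon. Call Glenn'

# plain substitution: also rewrites the first five letters of "Glennon"
loose=$(printf '%s\n' "$text" | sed 's/Glenn/Harold/g')

# stricter sketch: match "Glenn" only when it is followed by a
# non-letter, or when it sits at the end of the line
strict=$(printf '%s\n' "$text" |
  sed -e 's/Glenn\([^A-Za-z]\)/Harold\1/g' -e 's/Glenn$/Harold/')

printf '%s\n' "$loose" "$strict"
```

   The first result contains "Haroldon"; the second leaves "Glennon"
   untouched.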

   My very first use of sed was to add 8 spaces to the left side of a
   file, so when I printed it, the printing wouldn't begin at the
   absolute left edge of a piece of paper.

     sed 's/^/        /' myfile >newfile   # my first sed script
     sed 's/^/        /' myfile | lp       # my next sed script

   Then I learned that sed could display only one paragraph of a file,
   beginning at the phrase "and where it came" and ending at the
   phrase "for all people". My script looked like this:

     sed -n '/and where it came/,/for all people/p' myfile

   Sed's normal behavior is to print (i.e., display or show on screen)
   the entire file, including the parts that haven't been altered,
   unless you use the -n switch. The "-n" stands for "no output". This
   switch is almost always used in conjunction with a 'p' command
   somewhere, which says to print only the sections of the file that
   have been specified. Together, the -n switch and the 'p' command
   allow selected parts of a file to be printed (i.e., sent to the
   console).
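
   The effect of -n is easiest to see side by side. This is a small
   sketch on a three-line sample file (demo.txt, a made-up name):

```shell
printf '%s\n' one two three > demo.txt

# without -n, line 2 appears twice: once from the 'p' command and
# once from sed's normal end-of-cycle printing
with_default=$(sed '2p' demo.txt)

# with -n, only the line selected by 'p' is shown
only_selected=$(sed -n '2p' demo.txt)

printf '%s\n' "$with_default" '---' "$only_selected"
rm -f demo.txt
```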

   Next, I found that sed could show me only (say) lines 12-18 of a
   file and not show me the rest. This was very handy when I needed to
   review only part of a long file and I didn't want to alter it.

     # the 'p' stands for print
     sed -n 12,18p myfile

   Likewise, sed could show me everything else BUT those particular
   lines, without physically changing the file on the disk:

     # the 'd' stands for delete
     sed 12,18d myfile
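
   The two commands are complements of each other. A scaled-down
   sketch, using lines 2-4 of a six-line sample file (demo.txt is a
   made-up name), shows both results and confirms that the file itself
   is not modified:

```shell
printf '%s\n' 1 2 3 4 5 6 > demo.txt

kept=$(sed -n '2,4p' demo.txt)      # lines 2-4 only
dropped=$(sed '2,4d' demo.txt)      # everything except lines 2-4
unchanged=$(cat demo.txt)           # the file on disk is as before

printf '%s\n' "$kept" '---' "$dropped"
rm -f demo.txt
```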

   Sed could also double-space my single-spaced file when it came time
   to print it:

     sed G myfile >newfile
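
   For the curious: the G command appends a newline plus the contents
   of the hold space (empty by default) to each line, which is what
   produces the blank line. A short sketch on piped input:

```shell
# two sample lines arrive on stdin; each gets a blank line after it
doubled=$(printf '%s\n' alpha beta | sed G)
printf '%s\n' "$doubled"
```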

   If you have many editing commands (for deleting, adding,
   substituting, etc.) which might take up several lines, those
   commands can be put into a separate file and all of the commands in
   the file applied to the file being edited:

     # 'script.sed' is the file of commands
     # 'myfile' is the file being changed
     sed -f script.sed myfile
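
   As a sketch of what such a script file might hold, the example
   below writes three commands to script.sed and applies them to a
   small myfile. The contents of both files are invented for
   illustration.

```shell
# sample input (made up)
printf '%s\n' 'HEADER' 'Glenn was here' 'Glenn left' > myfile

# one sed command per line: drop line 1, replace a name, indent
cat > script.sed <<'EOF'
1d
s/Glenn/Harold/g
s/^/        /
EOF

result=$(sed -f script.sed myfile)
printf '%s\n' "$result"

rm -f script.sed myfile
```

   The commands run in order on each input line, so the indentation is
   applied after the name has been replaced.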

   It is not our intention to convert this FAQ file into a full-blown
   sed tutorial (for good tutorials, see section 2.3). Rather, we hope
   this gives the complete novice a few ideas of how sed can be used.

2.2. What versions of sed are there, and where can I get them?

2.2.1. Free versions

   Note: "Free" does not mean "public domain" nor does it necessarily
   mean you will never be charged for it. All versions of sed in this
   section except the CP/M versions are distributed under the GNU
   General Public License and are "free software" by that standard
   (for details, see http://www.gnu.org/philosophy/free-sw.html). This
   means you can get the source code and develop it further.

   At the URLs listed in this category, sed binaries or source code
   can be downloaded and used without fees or license payments.

2.2.1.1. Unix platforms

   ssed v3.60
   ssed is the version recommended by the FAQ maintainers, since it
   shares the same codebase with GNU sed, has the most options, and is
   free software (you can get the source). Although earlier versions
   of ssed were distributed, sites for them are not listed here.

       http://sed.sourceforge.net/grabbag/ssed
       http://freshmeat.net/project/sed/

   GNU sed v4.0.5
   This is the latest official version of GNU sed. It offers in-place
   text replacement as an option switch.

       ftp://ftp.gnu.org/pub/gnu/sed/sed-4.0.5.tar.gz
       http://freshmeat.net/project/sed

   BSD multi-byte sed (Japanese)
   Based on the latest version of GNU sed, which supports multi-byte
   characters.

       ftp://ftp1.freebsd.org/pub/FreeBSD/FreeBSD-stable/packages/Latest/ja-sed.tgz

   GNU sed v3.02.80
   An alpha test release which was the base for the development of
   ssed and GNU sed v4.0.

       ftp://alpha.gnu.org/pub/gnu/sed/sed-3.02.80.tar.gz

   GNU sed v3.02a
   Interim version with most features of GNU sed v3.02.80.

   GNU sed v3.02
       ftp://ftp.gnu.org/pub/gnu/sed/sed-3.02.tar.gz

   Precompiled versions:

   GNU sed v3.02-8
   source code and binaries for Debian GNU/Linux

       http://www.debian.org/Packages/stable/base/sed.html

   For some time, the GNU project <http://www.gnu.org> used Eric S.
   Raymond's version of sed (ESR sed v1.1), but eventually dropped it
   because it had too many built-in limits. In 1991 Howard Helman
   modified the GNU/ESR sed and produced a flexible version of sed
   v1.5 available at several sites (Helman's version permitted things
   like \<...\> to delimit word boundaries, \xHH to enter hex code and
   \n to indicate newlines in the replace string). This version did
   not catch on with the GNU project and their version of sed has
   moved in a similar but different direction.

   sed v1.3, by Eric Steven Raymond (released 4 June 1998)
       http://catb.org/~esr/sed-1.3.tar.gz

   Eric Raymond <esr@snark.thyrsus.com> wrote one of the earliest
   versions of sed. On his website <http://www.catb.org/~esr/>, which
   also distributes many freeware utilities he has written or worked
   on, he describes sed v1.1 this way:

   "This is the fast, small sed originally distributed in the GNU
   toolkit and still distributed with Minix. The GNU people ditched it
   when they built their own sed around an enhanced regex package --
   but it's still better for some uses (in particular, faster and less
   memory-intensive)." (Version 1.3 fixes an unidentified bug and adds
   the L command to hexdump the current pattern space.)

2.2.1.2. OS/2

   GNU sed v3.02.80
       http://www2s.biglobe.ne.jp/~vtgf3mpr/gnu/sed.htm

   GNU sed v3.02
       http://hobbes.nmsu.edu/pub/os2/util/file/sed-3_02-r2-bin.zip # binaries
       http://hobbes.nmsu.edu/pub/os2/util/file/sed-3_02-r2.zip     # source

2.2.1.3. Microsoft Windows (Win3x, Win9x, WinNT, Win2K)

   GNU sed v4.0.5
   32-bit binaries and docs. Precompiled versions not available (yet).

   GNU sed v3.02.80
   32-bit binaries and docs, using DJGPP compiler. For details on new
   features, see Unix section, above.

       http://www.student.northpark.edu/pemente/sed/sed3028a.zip # DOS binaries
       ftp://alpha.gnu.org/pub/gnu/sed/sed-3.02.80.tar.gz        # source
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028b.zip # binaries
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028d.zip # docs
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028s.zip # source

   GNU sed v2.05
   32-bit binaries, no docs. Requires 80386 DX (SX will not run) and
   must be run in a DOS window or in a full screen DOS session under
   Microsoft Windows. Will not run in MS-DOS mode (outside Win/Win95).
   We recommend using the latest version of GNU sed.
       http://www.simtel.net/pub/win95/prog/gsed205b.zip
       ftp://ftp.cdrom.com/pub/simtelnet/win95/prog/gsed205b.zip

   GNU sed v1.03
   modified by Frank Whaley.

   This version was part of the "Virtually UN*X" toolset, hosted by
   itribe.net; that website is now closed. Gsed v1.03 supported Win9x
   long filenames, as well as hex, decimal, binary, and octal
   character representations.

   The Cygwin toolkit:
       http://www.cygwin.com

   Formerly known as "GNU-Win32 tools." According to their home page,
   "The Cygwin tools are Win32 ports of the popular GNU development
   tools for Windows NT, 95 and 98. They function through the use of
   the Cygwin library which provides a UNIX-like API on top of the
   Win32 API." The version of sed used is GNU sed v3.02.

   Minimalist GNU for Windows (MinGW):
       http://www.mingw.org
       http://mingw.sourceforge.net

   According to their home page, "MinGW ('Minimalist GNU for Windows')
   refers to a set of runtime headers, used in building a compiler
   system based on the GNU GCC and binutils projects. It compiles and
   links code to be run on Win32 platforms ... MinGW uses Microsoft
   runtime libraries, distributed with the Windows operating system."
   The version of sed used is GNU sed v3.02.

   sed v1.5 (a/k/a HHsed), by Howard Helman
   Compiled with Mingw32 for 32-bit environments described above. This
   version should support Win95 long filenames.
       http://www.dbnet.ece.ntua.gr/~george/sed/OLD/sed15.exe
       http://www.student.northpark.edu/pemente/sed/sed15exe.zip

2.2.1.4. MS-DOS

   sed v1.6 (from HHsed), by Walter Briscoe

   This is a forthcoming version, now in beta testing, but with many
   new features. It corrects all the bugs in sed v1.5, and adds the
   best features of sedmod v1.0 (below). It is available in 16-bit and
   32-bit compiled versions for MS-DOS. Sorry, no URLs available yet.

   sed v1.5 (a/k/a HHsed), by Howard Helman
   uncompiled source code (Turbo C)
       ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15.zip
       ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15.zip

   DOS executable and documentation
       ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15x.zip
       ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15x.zip

   sedmod v1.0, by Hern Chen
       http://www.ptug.org/sed/SEDMOD10.ZIP
       http://www.student.northpark.edu/pemente/sed/sedmod10.zip
       ftp://garbo.uwasa.fi/pc/unix/sedmod10.zip

   GNU sed v3.02.80
   See section 2.2.1.3 ("Microsoft Windows"), above.

   GNU sed v2.05
   Does not run under MS-DOS.

   GNU sed v1.18
   32-bit binaries and source, using DJGPP compiler. Requires 80386 SX
   or better. Also requires 3 CWS*.EXE extenders on the path. See
   section 5.5 ("What is CSDPMI*B.ZIP and why do I need it?"), below.
   We recommend using a newer version of GNU sed.
       http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
       http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip

   GNU sed v1.06
   16-bit binaries and source. Should run under any MS-DOS system.
       http://www.simtel.net/pub/gnu/gnuish/sed106.zip
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/gnuish/sed106.zip

2.2.1.5. CP/M

   ssed v2.2, by Chuck A. Forsberg

   Written for CP/M, ssed (for "small/stupid stream editor") supports
   only the a(ppend), c(hange), d(elete) and i(nsert) options, and
   apparently doesn't support regular expressions. A -u switch will
   "unsqueeze" compressed files and was used mainly in conjunction
   with DIF.COM for source code maintenance. (file: ssed22.lbr)

   change, by Michael M. Rubenstein

   Rubenstein released a version of sed called CHANGE.COM (the
   TTOOLS.LBR archive member CHANGE.CZM is a "crunched" file).
   CHANGE.COM supports full RE's except grouping and backreferences,
   and its only function is global substitution. (file: ttools.lbr)

2.2.1.6. Macintosh v8 or v9

   Since sed is a command-line utility, it is not customary to think
   of sed being used on a Mac. Nonetheless, the following instructions
   from Aurelio Jargas describe the process for running sed on MacOS
   version 8 or 9.

   (1) Download and install the Apple DiskCopy application

       ftp://ftp.apple.com/developer/Development_Kits

   (2) Download and install Apple MPW

       ftp://ftp.apple.com/developer/Tool_Chest/Core_Mac_OS_Tools/MPW_etc./

   (3) Download and expand Matthias Neeracher's GNU sed for MPW. (They
   seem to have misnumbered the sed filename.)

       ftp://sunsite.cnlab-switch.ch/software/platform/macos/src/mpw_c/sed-2.03.sit.bin

   (4) Enter the sed-3.02 directory and doubleclick the 'sed' file

   (5) MPW Shell will open up. It will be a command window instead of
   a command line, but sed should work as expected. For example:

       echo aa | sed 's/a/Z/g'<ENTER>

   Note that ENTER is different from RETURN on an iMac. Apple *also*
   has its own version of sed on MPW, called "StreamEdit", with a
   syntax fairly similar to that of normal sed.

2.2.2. Shareware and Commercial versions

2.2.2.1. Unix platforms

       [ Additional information needed. ]

2.2.2.2. OS/2

   Hamilton Labs:
       http://www.hamiltonlabs.com/cshell.htm

   A sizable set of Unix/C shell utilities designed for OS/2. Price is
   $350 in the US, $395 elsewhere, with FedEx shipping, unconditional
   guarantee, unlimited support and free updates. A demo version of
   the suite can be downloaded from this site, but a stand-alone copy
   of sed is not available.

2.2.2.3. Windows 95/98, Windows NT, Windows 2000

   Hamilton Labs:
       http://www.hamiltonlabs.com/cshell.htm

   A sizable set of Unix/C shell utilities designed for Win9x, WinNT,
   and Win2K. Price is $350 in the US, $395 elsewhere, with FedEx
   shipping, unconditional guarantee, unlimited support and free
   updates. A demo version of the suite can be downloaded from this
   site, but a stand-alone copy of sed is not available.

   Interix:
       http://www.interix.com

   Interix (formerly known as OpenNT) is advertised as "a complete
   UNIX system environment running natively on Microsoft Windows NT",
   and is licensed and supported by Softway Systems. It offers over
   200 Unix utilities, and supports Unix shells, sockets, networking,
   and more. A single-user edition runs about $200. A free demo or
   evaluation copy will run for 31 days and then quit; to continue
   using it, you must purchase the commercial version.

   MKS NuTCRACKER Professional
       http://www.datafocus.com/products/nutc/

   A different, yet related product line offered by MKS (Mortice Kern
   Systems, below); the awkward spelling "NuTCRACKER" is intentional.
   Various packages offer hundreds of Unix utilities for Win32
   environments. Sed is not available as a separate product.

   UnixDos:
       http://www.unixdos.com

   UnixDos is a suite of 82 Unix utilities ported over to the Windows
   environments. There are 16-bit versions for Win3.x and 32-bit
   versions for WinNT/Win95. It is distributed as uncrippled shareware
   for the first 30 days. After the test period, the utilities will
   not run and you must pay the registration fee of $50.

   Their version of sed supports "\n" in the RHS of expressions, and
   increases the length of input lines to 10,000 characters. By
   special arrangement with the owners, persons who want a licensed
   version of sed *only* (without the other utilities) may pay a
   license fee of $10.

   U/WIN:
       http://www.research.att.com/sw/tools/uwin/

   U/WIN is a suite of Unix utilities created for WinNT and Win95
   systems. It is owned by AT&T, created by David Korn (author of the
   Unix Korn shell), and is freely distributed only to educational
   institutions, AT&T employees, or certain researchers; all others
   must pay a fee after a 90-day evaluation period expires. U/WIN
   operates best with the NTFS (WinNT file system) but will run in
   degraded mode with the FAT file system and in further degraded mode
   under Win95. A minimal installation takes about 25 to 30 megs of
   disk space. Sed is not available as a separate file for download,
   but comes with the suite.

2.2.2.4. MS-DOS

   Mix C/Utilities Toolchest
       http://www.mixsoftware.com/product/utility.htm

   According to their web page, "The C/Utilities Toolchest adds over
   40 powerful UNIX utilities to your MS-DOS operating system. The
   result is an environment very similar to UNIX operating systems,
   yet 100% compatible with MS-DOS programs and commands." The
   toolchest costs $19.95, with source code available for an
   additional fee. Mix C's version of sed is not available separately.

   MKS (Mortice Kern Systems) Toolkit
       http://www.mks.com

   Sed comes bundled with the MKS Toolkit, which is distributed only
   as commercial software; it is not available separately.

   Thompson Automation Software
       http://www.tasoft.com

   The Thompson Toolkit contains over 100 familiar Unix utilities,
   including a version of the Unix Korn shell. It runs under MS-DOS,
   OS/2, Win3.x, Win9x, and WinNT. Sed is one of the utilities, though
   Thompson is better known for its version of awk for DOS, TAWK. The
   toolkit runs about $150; sed is not available separately.

2.3. Where can I learn to use sed?

2.3.1. Books

       _Sed & Awk, 2d edition_, by Dale Dougherty & Arnold Robbins
       (Sebastopol, Calif: O'Reilly and Associates, 1997)
       ISBN 1-56592-225-5
       http://www.oreilly.com/catalog/sed2/noframes.html

   About 40 percent of this book is devoted to sed, and maybe 50
   percent is devoted to awk. The other 10 percent covers regexes and
   concepts common to both tools. If you prefer hard copy, this is
   definitely the best single place to learn to use sed, including its
   advanced features.

   The first edition is also very useful. Several typos crept into the
   first printing of the first edition (though if you follow the
   tutorials closely, you'll recognize them right away). A list of
   errors from the first printing of _sed & awk_ is available at
   <http://www.cs.colostate.edu/~dzubera/sedawk.txt>, and errors in
   the second edition are listed at
   <http://www.cs.colostate.edu/~dzubera/sedawk2.txt>, though most of
   these were corrected in later printings. The second edition tells
   how POSIX standards have affected these tools and covers the
   popular GNU versions of sed and awk. Price is about (US) $30.00.

   -----

       _Mastering Regular Expressions, 2d ed.,_ by Jeffrey E. F. Friedl
       (Sebastopol, Calif: O'Reilly and Associates, 2002)
       ISBN 0-596-00289-0
       http://regex.info
       http://www.oreilly.com/catalog/regex2/
       http://public.yahoo.com/~jfriedl/regex/ (for the first edition)

   Knowing how to use "regular expressions" is essential to effective
   use of most Unix tools. This book focuses on how regular
   expressions can be best implemented in utilities such as perl, vi,
   emacs, and awk, but also touches on sed. Friedl's home page
   (above) gives links to other sites which help students learn to
   master regular expressions. His site also gives a Perl script for
   determining a syntactically valid e-mail address, using regexes:

       http://public.yahoo.com/~jfriedl/regex/code.html

   -----

       _Awk und Sed_, by Helmut Herold.
       (Bonn: Addison-Wesley, 1994; 288 pages)
       2nd edition to be released in March 2003
       ISBN 3-8273-2094-1
       http://www.addison-wesley.de/main/main.asp?page=home/bookdetails&ProductID=37214

2.3.2. Mailing list

   If you are interested in learning more about sed (its syntax, using
   regular expressions, etc.) you are welcome to subscribe to a
   sed-oriented mailing list. In fact, there are two mailing lists
   about sed: one in English named "sed-users", moderated by Sven
   Guckes; and one in Portuguese named "sed-BR" (for sed-Brazil),
   moderated by Aurelio Marinho Jargas. The average volume of mail for
   "sed-users" is about 35 messages a week; the average volume of mail
   for "sed-BR" is about 15 messages a week.

       sed-BR mailing list:    http://br.groups.yahoo.com/group/sed-br/
       sed-users mailing list: http://groups.yahoo.com/group/sed-users/

   To subscribe to sed-users, send a blank message to:

       sed-users-subscribe@yahoogroups.com

   To unsubscribe from sed-users, send a blank message to:

       sed-users-unsubscribe@yahoogroups.com

2.3.3. Tutorials, electronic text

   The original users manual for sed, by Lee E. McMahon, from the
   7th edition UNIX Manual (1978), with the classic "Kubla Khan"
   example and tutorial, as formatted text:
       http://sed.sourceforge.net/grabbag/tutorials/sed_mcmahon.txt

   The source code to the preceding manual. Use "troff -ms sed" to
   print this file properly:
       http://plan9.bell-labs.com/7thEdMan/vol2/sed
       http://cm.bell-labs.com/7thEdMan/vol2/sed

   "Do It With Sed", by Carlos Duarte
       http://www.dbnet.ece.ntua.gr/~george/sed/OLD/sedtut_1.html

   "Sed: How to use sed, a special editor for modifying files
   automatically", by Bruce Barnett and General Electric Company
       http://www.grymoire.com/Unix/Sed.html

   U-SEDIT2.ZIP, by Mike Arst (16 June 1990)
       ftp://ftp.cs.umu.se/pub/pc/u-sedit2.zip
       ftp://ftp.uni-stuttgart.de/pub/systems/msdos/util/unixlike/u-sedit2.zip
       ftp://sunsite.icm.edu.pl/vol/wojsyl/garbo/pc/editor/u-sedit2.zip
       ftp://ftp.sogang.ac.kr/pub/msdos/garbo_pc/editor/u-sedit2.zip

   U-SEDIT3.ZIP, by Mike Arst (24 Jan. 1992)
       http://www.student.northpark.edu/pemente/sed/u-sedit3.zip
       CompuServe DTPFORUM, "PC DTP Utilities" library, file SEDDOC.ZIP

   Another sed FAQ
       http://www.dreamwvr.com/sed-info/sed-faq.html

   sed-tutorial, by Felix von Leitner
       http://www.math.fu-berlin.de/~leitner/sed/tutorial.html

   "Manipulating text with sed," chapter 14 of the SCO OpenServer
   "Operating System Users Guide"
       http://ou800doc.caldera.com/SHL_automate/CTOC-Manipulating_text_with_sed.html

   "Combining the Bourne-shell, sed and awk in the UNIX environment
   for language analysis," by Lothar Schmitt and Kiel Christianson.
   This basic tutorial on the Bourne shell, sed and awk downloads as a
   71-page PostScript file (compressed to 290K with gzip). You may
   need to navigate down from the root to get the file.
       ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/1997/97-2-007.tar.gz
       available upon request from Lothar Schmitt <lothar@u-aizu.ac.jp>

2.3.4. General web and ftp sites

       http://sed.sourceforge.net/grabbag             # Collected scripts
       http://main.rtfiber.com.tw/~changyj/sed/       # Yao-Jen Chang
       http://www.math.fu-berlin.de/~guckes/sed/      # Sven Guckes
       http://www.math.fu-berlin.de/~leitner/sed/     # Felix von Leitner
       http://www.dbnet.ece.ntua.gr/~george/sed/      # Yiorgos Adamopoulos
       http://www.student.northpark.edu/pemente/sed/  # Eric Pement

       http://spacsun.rice.edu/FAQ/sed.html
       ftp://algos.inesc.pt/pub/users/cdua/scripts.tar.gz (sed and shell scripts)

   "Handy One-Liners For Sed", compiled by Eric Pement. A large list
   of 1-line sed commands which can be executed from the command line.
       http://sed.sourceforge.net/sed1line.txt
       http://www.student.northpark.edu/pemente/sed/sed1line.txt

   "Handy One-Liners For Sed", translated to Portuguese
       http://wmaker.lrv.ufsc.br/sed_ptBR.html

   The Single UNIX Specification, Version 3 (technical man page)
       http://www.opengroup.org/onlinepubs/007904975/utilities/sed.html

   Getting started with sed
       http://www.cs.hmc.edu/tech_docs/qref/sed.html

   masm to gas converter
       http://www.delorie.com/djgpp/faq/converting/asm2s-sed.html

   mail2html.zip
       http://www.crispen.org/src/#mail2html

   sample uses of sed in batch files and scripts (Benny Pederson)
       http://users.cybercity.dk/~bse26236/batutil/help/SED.HTM

   dc.sed - the most complex and impressive sed script ever written.
   This sed script by Greg Ubben emulates the Unix dc (desk
   calculator), including base conversion, exponentiation, square
   roots, and much more.
       http://sed.sourceforge.net/grabbag/scripts/dc_overview.htm

   If you should find other tutorials or scripts that should be added
   to this document, please forward the URLs to the FAQ maintainer.

------------------------------

3. TECHNICAL

3.1. More detailed explanation of basic sed

   Sed takes a script of editing commands and applies each command, in
   order, to each line of input. After all the commands have been
   applied to the first line of input, that line is output. A second
   input line is taken for processing, and the cycle repeats. Sed
   scripts can address a single line by line number or by matching a
   /RE pattern/ on the line. An exclamation mark '!' after a regex
   ('/RE/!') or line number will select all lines that do NOT match
   that address. Sed can also address a range of lines in the same
   manner, using a comma to separate the 2 addresses.

     $d               # delete the last line of the file
     /[0-9]\{3\}/p    # print lines with 3 consecutive digits
     5!s/ham/cheese/  # except on line 5, replace 'ham' with 'cheese'
     /awk/!s/aaa/bb/  # unless 'awk' is found, replace 'aaa' with 'bb'
     17,/foo/d        # delete all lines from line 17 up to 'foo'

   Following an address or address range, sed accepts curly braces
   '{...}' so several commands may be applied to that line or to the
   lines matched by the address range. On the command line, semicolons
   ';' separate each instruction and must precede the closing brace.

     sed '/Owner:/{s/yours/mine/g;s/your/my/g;s/you/me/g;}' file

   Range addresses operate differently depending on which version of
   sed is used (see section 3.4, below). For further information on
   using sed, consult the references in section 2.3, above.

3.1.1. Regular expressions on the left side of "s///"

   All versions of sed support Basic Regular Expressions (BREs). For
   the syntax of BREs, enter "man ed" at a Unix shell prompt. A
   technical description of BREs from IEEE POSIX 1003.1-2001 and the
   Single UNIX Specification Version 3 is available online at:
   http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html#tag_09_03

   Sed normally supports BREs plus '\n' to match a newline in the
   pattern space, plus '\xREx' as equivalent to '/RE/', where 'x' is any
   character other than a newline or another backslash.
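
   A minimal sketch of both features, assuming a POSIX shell with
   printf. The sample input lines and variable names are made up for
   illustration:

```shell
# \%RE% is the same as /RE/ with '%' as the delimiter, which saves
# escaping the slashes in a path.
path_hit=$(printf '/usr/bin/sed\n/opt/local\n' | sed -n '\%^/usr/bin%p')
printf '%s\n' "$path_hit"

# \n matches the newline that N leaves inside the pattern space.
joined=$(printf 'tape\ndeck\n' | sed 'N;s/tape\ndeck/tapedeck/')
printf '%s\n' "$joined"
```

   The first command prints only the line beginning with /usr/bin;
   the second joins the two input lines into "tapedeck".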

   Some versions of sed support supersets of BREs, or "extended
   regular expressions", which offer additional metacharacters for
   increased flexibility. For additional information on extended REs
   in GNU sed, see sections 3.7 ("GNU/POSIX extensions to regular
   expressions") and 6.7.3 ("Special syntax in REs"), below.

   Though not required by BREs, some versions of sed support \t to
   represent a TAB, \r for carriage return, \xHH for direct entry of
   hex codes, and so forth. Other versions of sed do not.

   ssed (super-sed) introduced many new features for LHS pattern
   matching, too many to give here. The complete list is found in
   section 6.7.3.H ("ssed"), below.

3.1.2. Escape characters on the right side of "s///"

   The right-hand side (the replacement part) in "s/find/replace/" is
   almost always a string literal, with no interpolation of these
   metacharacters:

       .   ^   $   [   ]   {   }   (   )  ?   +   *   |

   Three things *are* interpolated: ampersand (&), backreferences, and
   options for special seds. An ampersand on the RHS is replaced by
   the entire expression matched on the LHS. There is _never_ any
   reason to use grouping like this:

       s/\(some-complex-regex\)/one two \1 three/

   since you can do this instead:

       s/some-complex-regex/one two & three/

   To enter a literal ampersand on the RHS, type '\&'.
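
   As a small illustration (the input strings and variable names are
   invented for this sketch), compare an interpolated ampersand with
   an escaped one:

```shell
# & stands for the whole match; \& is a literal ampersand.
wrapped=$(printf 'price: 42\n' | sed 's/[0-9][0-9]*/<&>/')
literal=$(printf 'salt pepper\n' | sed 's/ / \& /')
printf '%s\n%s\n' "$wrapped" "$literal"
```

   The first result is "price: <42>" and the second is
   "salt & pepper".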

   Grouping and backreferences: All versions of sed support grouping
   and backreferences on the LHS and backreferences only on the RHS.
   Grouping allows a series of characters to be collected in a set,
   indicating the boundaries of the set with \( and \). Then the set
   can be designated to be repeated a certain number of times

       \(like this\)*   or   \(like this\)\{5,7\}.

   Groups can also be nested "\(like \(this\) is here\)" and may
   contain any valid RE. Backreferences repeat the contents of a
   particular group, using a backslash and a digit (1-9) for each
   corresponding group. In other words, "/\(pom\)\1/" is another way
   of writing "/pompom/". If groups are nested, backreference numbers
   are counted by matching \( in strict left to right order.  Thus,
   in /..\(the \(word\)\) \("foo"\)../ the group \("foo"\) is the one
   referenced by \3. Backreferences can be used in the LHS, the RHS,
   and in normal RE addressing (see section 3.3).  Thus,

       /\(.\)\1\(.\)\2\(.\)\3/;      # matches "bookkeeper"
       /^\(.\)\(.\)\(.\)\3\2\1$/;    # finds 6-letter palindromes
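
   The following sketch exercises backreferences on the RHS and in
   addresses. The word list and variable names are assumptions made
   up for the example:

```shell
# Swap two words with RHS backreferences.
swapped=$(printf 'hello world\n' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

# Backreferences in an address: print only the words that match.
words='noon
bookkeeper
redder
sedate'
doubled=$(printf '%s\n' "$words" | sed -n '/\(.\)\1\(.\)\2\(.\)\3/p')
palindromes=$(printf '%s\n' "$words" | sed -n '/^\(.\)\(.\)\(.\)\3\2\1$/p')
printf '%s\n' "$swapped" "$doubled" "$palindromes"
```

   Of the four sample words, only "bookkeeper" has three doubled
   letters in a row, and only "redder" is a 6-letter palindrome.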

   Seds differ in how they treat invalid backreferences where no
   corresponding group occurs. To insert a literal ampersand or
   backslash into the RHS, prefix it with a backslash: \& or \\.

   ssed, sed16, and sedmod permit additional options on the RHS. They
   all support changing part of the replacement string to upper case
   (\u or \U), lower case (\l or \L), or to end case conversion (\E).
   Both sed16 and sedmod support awk-style word references ($1, $2,
   $3, ...) and $0 to insert the entire line before conversion.

     echo ab ghi | sed16 "s/.*/$0 - \U$2/"   # prints "ab ghi - GHI"

   *Note:* This feature of sed16 and sedmod will break sed scripts which
   put a dollar sign and digit into the RHS. Though this is an unlikely
   combination, it's worth remembering if you use other people's scripts.

3.1.3.  Substitution switches

   Standard versions of sed support 4 main flags or switches which may
   be added to the end of an "s///" command. They are:

       N      - Replace the Nth match of the pattern on the LHS, where
                N is an integer between 1 and 512. If N is omitted,
                the default is to replace the first match only.
       g      - Global replace of all matches to the pattern.
       p      - Print the pattern space to stdout if a replacement was
                made, even if the -n switch is used.
       w file - Write the pattern space to 'file' if a replacement was
                done. If the file already exists when the script is
                executed, it is overwritten. During script execution,
                w appends to the file for each match.
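
   A short sketch of the four flags in action. The sample text and
   variable names are invented, and mktemp is assumed to be available
   for creating the scratch file used by the w flag:

```shell
sample='a-a-a-a'
third=$(printf '%s\n' "$sample" | sed 's/a/X/3')    # only the 3rd match
all=$(printf '%s\n' "$sample" | sed 's/a/X/g')      # every match

# With -n, the p flag prints only lines where a replacement was made.
changed=$(printf 'foo\nbar\n' | sed -n 's/foo/baz/p')

# The w flag writes each changed line to a file (a temporary one here).
tmpfile=$(mktemp)
printf 'foo\nbar\nfood\n' | sed -n "s/foo/baz/w $tmpfile"
written=$(cat "$tmpfile")
rm -f "$tmpfile"

printf '%s\n' "$third" "$all" "$changed" "$written"
```

   Note that the file name after w runs to the end of the command, so
   w must be the last flag given.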

   GNU sed 3.02 and ssed also offer the /I switch for doing a
   case-insensitive match. For example,

     echo ONE TWO | gsed "s/one/unos/I"      # prints "unos TWO"

   GNU sed 4.x and ssed add the /M switch, to simplify working with
   multi-line patterns: when it is used, ^ and $ also match at the
   beginning and end of each line within the pattern space. \` and \'
   remain available to match the start and end of pattern space,
   respectively.
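
   A minimal sketch of the difference, assuming GNU sed 4.x or later
   is installed as "sed" (the /M switch is not part of POSIX sed).
   The variable names are made up:

```shell
# N joins two input lines in the pattern space before s/// runs.
plain=$(printf 'one\ntwo\n' | sed 'N;s/^/> /g')    # ^ = start of pattern space
multi=$(printf 'one\ntwo\n' | sed 'N;s/^/> /Mg')   # ^ = start of every line
printf '%s\n--\n%s\n' "$plain" "$multi"
```

   Without /M only the first line gains the "> " prefix; with /M both
   lines do.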

   ssed supports two more switches, /S and /X, when its Perl mode is
   used. They are described in detail in section 6.7.3.H, below.

3.1.4. Command-line switches

   All versions of sed support two switches, -e and -n. Though sed
   usually separates multiple commands with semicolons (e.g., "H;d;"),
   certain commands cannot accept a semicolon command separator in
   many versions of sed. These include :labels, 't', and 'b'. Such a
   command must occur last in its part of the script, with the parts
   separated by -e option switches. For example:

     # The 'ta' means jump to label :a if last s/// returns true
     sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file

   The -n switch turns off sed's default behavior of printing every
   line. With -n, lines are printed only if explicitly told to. In
   addition, for certain versions of sed, if an external script begins
   with "#n" as its first two characters, the output is suppressed
   (exactly as if -n had been entered on the command line). A list of
   which versions appears in section 6.7.2., below.
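
   To see the effect of -n, compare the same 'p' command with and
   without it. This is only a sketch; the three input lines and the
   variable names are invented:

```shell
# Default: every line is printed, and 2p prints line 2 a second time.
default_out=$(printf 'alpha\nbeta\ngamma\n' | sed '2p')
# With -n: only what p explicitly prints.
quiet_out=$(printf 'alpha\nbeta\ngamma\n' | sed -n '2p')
printf '%s\n--\n%s\n' "$default_out" "$quiet_out"
```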

   GNU sed 4.x and ssed support additional switches. -l (lowercase L),
   followed by a number, lets you adjust the default line length for
   the 'l' and 'L' commands (note that these implementations of sed
   also support an argument to these commands, to tailor the length
   separately for each occurrence of the command).

   -i activates in-place editing (see section 4.41.1, below). -s
   treats each file as a separate stream: sed by default joins all the
   files, so $ represents the last line of the last file; 15 means the
   15th line in the joined stream; and /abc/,/def/ might match across
   files.

   When -s is used, however, all addresses refer to single files. For
   example, $ represents the last line of each input file; 15 means
   the 15th line of each input file; and /abc/,/def/ will be "reset"
   (in other words, sed will stop executing the commands and start
   looking for /abc/ again) if a file ends before /def/ has been
   matched. Note that -i automatically activates this interpretation
   of addresses.
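
   The difference can be sketched with two scratch files, assuming GNU
   sed 4.x or later is installed as "sed" and that mktemp is
   available. The file names and contents are made up:

```shell
workdir=$(mktemp -d)
printf 'a1\na2\n' > "$workdir/a.txt"
printf 'b1\nb2\n' > "$workdir/b.txt"

# Default: one joined stream, so $ is the last line of the last file.
joined_last=$(sed -n '$p' "$workdir/a.txt" "$workdir/b.txt")
# With -s: $ is the last line of each file.
each_last=$(sed -s -n '$p' "$workdir/a.txt" "$workdir/b.txt")

rm -r "$workdir"
printf '%s\n--\n%s\n' "$joined_last" "$each_last"
```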

3.2. Common one-line sed scripts

   A separate document of over 70 handy "one-line" sed commands is
   available at
       http://sed.sourceforge.net/sed1line.txt

   Here are several common sed commands for one-line use. MS-DOS users
   should replace single quotes ('...') with double quotes ("...") in
   these examples. A specific filename usually follows the script,
   though the input may also come via piping or redirection.

   # Double space a file
   sed G file

   # Triple space a file
   sed 'G;G' file

   # Under UNIX: convert DOS newlines (CR/LF) to Unix format
   sed 's/.$//' file    # assumes that all lines end with CR/LF
   sed 's/^M$//' file   # in bash/tcsh, press Ctrl-V then Ctrl-M

   # Under DOS: convert Unix newlines (LF) to DOS format
   sed 's/$//' file                     # method 1
   sed -n p file                        # method 2

   # Delete leading whitespace (spaces/tabs) from front of each line
   # (this aligns all text flush left). '^t' represents a true tab
   # character. Under bash or tcsh, press Ctrl-V then Ctrl-I.
   sed 's/^[ ^t]*//' file

   # Delete trailing whitespace (spaces/tabs) from end of each line
   sed 's/[ ^t]*$//' file               # see note on '^t', above

   # Delete BOTH leading and trailing whitespace from each line
   sed 's/^[ ^t]*//;s/[ ^t]*$//' file   # see note on '^t', above

   # Substitute "foo" with "bar" on each line
   sed 's/foo/bar/' file        # replaces only 1st instance in a line
   sed 's/foo/bar/4' file       # replaces only 4th instance in a line
   sed 's/foo/bar/g' file       # replaces ALL instances within a line

   # Substitute "foo" with "bar" ONLY for lines which contain "baz"
   sed '/baz/s/foo/bar/g' file

   # Delete all CONSECUTIVE blank lines from file except the first.
   # This method also deletes all blank lines from top and end of file.
   # (emulates "cat -s")
   sed '/./,/^$/!d' file       # this allows 0 blanks at top, 1 at EOF
   sed '/^$/N;/\n$/D' file     # this allows 1 blank at top, 0 at EOF

   # Delete all leading blank lines at top of file (only).
   sed '/./,$!d' file

   # Delete all trailing blank lines at end of file (only).
   sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file

   # If a line ends with a backslash, join the next line to it.
   sed -e :a -e '/\\$/N; s/\\\n//; ta' file

   # If a line begins with an equal sign, append it to the previous
   # line (and replace the "=" with a single space).
   sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file
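
   The last two one-liners are the hardest to picture, so here is a
   sketch that feeds them a few invented lines through a pipe instead
   of a file. The variable names are assumptions for the example:

```shell
# Join a line ending in a backslash with the line that follows it.
cont=$(printf 'one \\\ntwo\nthree\n' | sed -e :a -e '/\\$/N; s/\\\n//; ta')

# Append lines that start with '=' to the previous line.
eq=$(printf 'a\n=b\nc\n' | sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D')

printf '%s\n--\n%s\n' "$cont" "$eq"
```

   The first input becomes "one two" and "three"; the second becomes
   "a b" and "c".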

3.3. Addressing and address ranges

   Sed commands may have an optional "address" or "address range"
   prefix. If there is no address or address range given, then the
   command is applied to all the lines of the input file or text
   stream. Three commands cannot take an address prefix:

      - labels, used to branch or jump within the script
      - the close brace, '}', which ends the '{' "command"
      - the '#' comment character, also technically a "command"

   An address can be a line number (such as 1, 5, 37, etc.), a regular
   expression (written in the form /RE/ or \xREx where 'x' is any
   character other than '\' and RE is the regular expression), or the
   dollar sign ($), representing the last line of the file. An
   exclamation mark (!) after an address or address range will apply
   the command to every line EXCEPT the ones named by the address. A
   null regex ("//") will be replaced by the last regex which was
   used. Also, some seds do not support \xREx as regex delimiters.

     5d               # delete line 5 only
     5!d              # delete every line except line 5
     /RE/s/LHS/RHS/g  # substitute only if RE occurs on the line
     /^$/b label      # if the line is blank, branch to ':label'
     /./!b label      # ... another way to write the same command
     \%.%!b label     # ... yet another way to write this command
     $!N              # on all lines but the last, get the Next line

   Note that an embedded newline can be represented in an address by
   the symbol \n, but this syntax is needed only if the script puts 2
   or more lines into the pattern space via the N, G, or other
   commands. The \n symbol does *not* match the newline at an
   end-of-line because when sed reads each line into the pattern space
   for processing, it strips off the trailing newline, processes the
   line, and adds a newline back when printing the line to standard
   output. To match the end-of-line, use the '$' metacharacter, as
   follows:

     /tape$/       # matches the word 'tape' at the end of a line
     /tape$deck/   # matches the word 'tape$deck' with a literal '$'
     /tape\ndeck/  # matches 'tape' and 'deck' with a newline between

   The following sed commands usually accept *only* a single address.
   All other commands (except labels, '}', and '#') accept both single
   addresses and address ranges.

     =       print to stdout the line number of the current line
     a       after printing the current line, append "text" to stdout
     i       before printing the current line, insert "text" to stdout
     q       quit after the current line is matched
     r file  prints contents of "file" to stdout after line is matched

   Note that we said "usually." If you need to apply the '=', 'a',
   'i', or 'r' commands to each and every line within an address
   range, this behavior can be coerced by the use of braces. Thus,
   "1,9=" is an invalid command, but "1,9{=;}" will print each line
   number followed by its line for the first 9 lines (and then print
   the rest of the file normally).
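
   A scaled-down sketch of the brace technique, using the range 1,2 on
   three invented input lines instead of 1,9:

```shell
# '=' prints the line number, then sed prints the line itself.
numbered=$(printf 'a\nb\nc\n' | sed '1,2{=;}')
printf '%s\n' "$numbered"
```

   Lines 1 and 2 are each preceded by their line number; line 3 is
   printed normally.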

   Address ranges occur in the form

       <address1>,<address2>    or    <address1>,<address2>!

   where the address can be a line number or a standard /regex/.
   <address2> can also be a dollar sign, indicating the end of file.
   Under GNU sed 3.02+, ssed, and sed15+, <address2> may also be a
   notation of the form +num, indicating the next _num_ lines after
   <address1> is matched.

   Address ranges are:

   (1) Inclusive. The range "/From here/,/eternity/" matches all the
   lines containing "From here" up to and including the line
   containing "eternity". It will not stop on the line just prior to
   "eternity". (If you don't like this, see section 4.24.)

   (2) Plenary. They always match full lines, not just parts of lines.
   In other words, a command to change or delete an address range will
   change or delete whole lines; it won't stop in the middle of a
   line.

   (3) Multi-linear. Address ranges normally match 2 lines or more.
   The second address will never match the same line the first address
   did; therefore a valid address range always spans at least two
   lines, with these exceptions which match only one line:

      - if the first address matches the last line of the file
      - if using the syntax "/RE/,3" and /RE/ occurs only once in the
        file, on line 3 or later
      - if using HHsed v1.5. See section 3.4.

   (4) Minimalist. In address ranges with /regex/ as <address2>, the
   range "/foo/,/bar/" will stop at the first "bar" it finds, provided
   that "bar" occurs on a line below "foo". If the word "bar" occurs
   on several lines below the word "foo", the range will match all the
   lines from the first "foo" up to the first "bar". It will not
   continue hopping ahead to find more "bar"s. In other words, address
   ranges are not "greedy," like regular expressions.

   (5) Repeating. An address range will try to match more than one
   block of lines in a file. However, the blocks cannot nest. In
   addition, a second match will not "take" the last line of the
   previous block.  For example, given the following text,

       start
       stop  start
       stop

   the sed command '/start/,/stop/d' will only delete the first two
   lines. It will not delete all 3 lines.

   (6) Relentless. If the address range finds a "start" match but
   doesn't find a "stop", it will match every line from "start" to the
   end of the file. Thus, beware of the following behaviors:

     /RE1/,/RE2/  # If /RE2/ is not found, matches from /RE1/ to the
                  # end-of-file.

     20,/RE/      # If /RE/ is not found, matches from line 20 to the
                  # end-of-file.

     /RE/,30      # If /RE/ occurs any time after line 30, each
                  # occurrence will be matched in sed15+, sedmod, and
                  # GNU sed v3.02+. GNU sed v2.05 and 1.18 will match
                  # from the 2nd occurrence of /RE/ to the end-of-file.
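
   Three of the properties above can be checked directly. This is a
   sketch with invented input lines and variable names; it uses only
   the plain '/start/,/stop/d' range:

```shell
# (1) Inclusive: both the start and the stop lines are deleted.
inclusive=$(printf 'a\nstart\nb\nstop\nc\n' | sed '/start/,/stop/d')

# (5) Repeating: the "start" on line 2 does not open a new block,
# because line 2 has already closed the first one.
repeating=$(printf 'start\nstop  start\nstop\n' | sed '/start/,/stop/d')

# (6) Relentless: with no "stop", the range runs to end-of-file.
relentless=$(printf 'a\nstart\nb\nc\n' | sed '/start/,/stop/d')

printf '%s\n--\n%s\n--\n%s\n' "$inclusive" "$repeating" "$relentless"
```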

   If these behaviors seem strange, remember that they occur because
   sed does not look "ahead" in the file. Doing so would stop sed from
   being a stream editor and have adverse effects on its efficiency.
   If these behaviors are undesirable, they can be circumvented or
   corrected by the use of nested testing within braces. The following
   scripts work under GNU sed 3.02:

     # Execute your_commands on range "/RE1/,/RE2/", but if /RE2/ is
     # not found, do nothing.
     /RE1/{:a;N;/RE2/!ba;your_commands;}

     # Execute your_commands on range "20,/RE/", but if /RE/ is not
     # found, do nothing.
     20{:a;N;/RE/!ba;your_commands;}

   As a side note, once we've used N to "slurp" lines together to test
   for the ending expression, the pattern space will have gathered
   many lines (possibly thousands) together and concatenated them as a
   single expression, with the \n sequence marking line breaks. The
   REs *within* the pattern space may have to be modified (e.g., you
   must write '/\nStart/' instead of '/^Start/' and '/[^\n]*/' instead
   of '/.*/') and other standard sed commands will be unavailable or
   difficult to use.

     # Execute your_commands on range "/RE/,30", but if /RE/ occurs
     # on line 31 or later, do not match it.
     1,30{/RE/,$ your_commands;}
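
   Here is a sketch of these workarounds with concrete commands
   substituted for your_commands. It assumes GNU sed installed as
   "sed", since the one-line ":a;N;..." label syntax is not accepted
   by every version. BEGIN, END, and the variable names are made-up
   placeholders, and the range is shortened from 30 lines to 3:

```shell
# Join the block from BEGIN to END into one line, but only if END
# is actually found.
script='/BEGIN/{:a;N;/END/!ba;s/\n/ /g;}'
found=$(printf 'x\nBEGIN\ny\nEND\nz\n' | sed "$script")
missing=$(printf 'x\nBEGIN\ny\n' | sed "$script")

# Emulate "/RE/,3": occurrences of RE after line 3 are ignored.
limited=$(printf 'a\nRE\nb\nc\nRE\nd\n' | sed '1,3{/RE/,$ s/^/# /;}')

printf '%s\n--\n%s\n--\n%s\n' "$found" "$missing" "$limited"
```

   When END is missing, the input passes through unchanged. In the
   last command, the second "RE" (on line 5) is left alone.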

   For related suggestions on using address ranges, see sections 4.2,
   4.15, and 4.19 of this FAQ. Also, note the following section.

3.4. Address ranges in GNU sed and HHsed

   (1) GNU sed 3.02+, ssed, and sed15+ all support address ranges like:

       /regex/,+5

   which match /regex/ plus the next 5 lines (or EOF, whichever comes
   first).

   (2) GNU sed v3.02.80 (and above) and ssed support address ranges of:

       0,/regex/

   as a special case to permit matching /regex/ if it occurs on the
   first line. This syntax permits a range expression that matches
   every line from the top of the file to the first instance of
   /regex/, even if /regex/ is on the first line.

   (3) HHsed (sed15) has an exceptional way of implementing

       /regex1/,/regex2/

   If /regex1/ and /regex2/ both occur on the *same* line, HHsed will
   match that single line. In other words, an address range block can
   consist of just one line. HHsed will then look for the next
   occurrence of /regex1/ to begin the block again.

   Every other version of sed (including sed16) requires 2 lines to
   match an address range, and thus /regex1/ and /regex2/ cannot
   successfully match just one line. See also the comments at
   section 7.9.4, below.

5a67ee
   (4) BEGIN~STEP selection: ssed and GNU sed (v2.05 and above) offer
5a67ee
   a form of addressing called "BEGIN~STEP selection". This is *not* a
5a67ee
   range address, which selects an inclusive block of consecutive
5a67ee
   lines from /start/ to /finish/. But I think it seems to belong here.
5a67ee

5a67ee
   Given an expression of the form "M~N", where M and N are integers,
5a67ee
   GNU sed and ssed will select every Nth line, beginning at line M.
5a67ee
   (With gsed v2.05, M had to be less than N, but this restriction is
5a67ee
   no longer necessary). Both M and N may equal 0 ("0~0" selects every
5a67ee
   line). These examples illustrate the syntax:
5a67ee

5a67ee
     sed '1~3d' file      # delete every 3d line, starting with line 1
5a67ee
                          # deletes lines 1, 4, 7, 10, 13, 16, ...
5a67ee

5a67ee
     sed '0~3d' file      # deletes lines 3, 6, 9, 12, 15, 18, ...
5a67ee

5a67ee
     sed -n '2~5p' file   # print every 5th line, starting with line 2
5a67ee
                          # prints lines 2, 7, 12, 17, 22, 27, ...
5a67ee

5a67ee
   (5) Finally, GNU sed v2.05 has a bug in range addressing (see
5a67ee
   section 7.5), which was fixed in later versions.
5a67ee

5a67ee

5a67ee
3.5. Debugging sed scripts
5a67ee

5a67ee
   The following two debuggers should make it easier to understand how
5a67ee
   sed scripts operate. They can save hours of grief when trying to
5a67ee
   determine the problems with a sed script.
5a67ee

5a67ee
   (1) sd (sed debugger), by Brian Hiles
5a67ee

5a67ee
   This debugger runs under a Unix shell, is powerful, and is easy to
5a67ee
   use. sd has conditional breakpoints and spypoints of the pattern
5a67ee
   space and hold space, on any scope defined by regex match and/or
5a67ee
   script line number. It can be semi-automated, can save diagnostic
5a67ee
   reports, and shows potential problems with a sed script before it
5a67ee
   tries to execute it. The script is robust and requires the Unix
5a67ee
   shell utilities plus the Bourne shell or Korn shell to execute.
5a67ee

5a67ee
       http://sed.sourceforge.net/grabbag/scripts/sd.ksh.txt (2003)
5a67ee
       http://sed.sourceforge.net/grabbag/scripts/sd.sh.txt  (1998)
5a67ee

5a67ee
   (2) sedsed, by Aurelio Jargas
5a67ee

5a67ee
   This debugger requires Python to run it, and it uses your own
5a67ee
   version of sed, whatever that may be. It displays the current input
5a67ee
   line, the pattern space, and the hold space, before and after each
5a67ee
   sed command is executed.
5a67ee

5a67ee
       http://sedsed.sourceforge.net
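
   When neither debugger is at hand, a low-tech alternative is to
   sprinkle the standard "l" (list) command through a script. It
   prints the pattern space unambiguously, showing tabs and embedded
   newlines as escapes and marking the end with "$". The one-liner
   below is only a sketch; the input is made up.

```shell
# G appends a newline plus the (empty) hold space; "l" then lists the
# pattern space with the tab shown as \t, the newline as \n, and "$"
# marking the end.
out=$(printf 'a\tb\n' | sed -n 'G;l')
printf '%s\n' "$out"
```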
5a67ee

5a67ee

5a67ee
3.6. Notes about s2p, the sed-to-perl translator
5a67ee

5a67ee
   s2p (sed to perl) is a Perl program to convert sed scripts into the
5a67ee
   Perl programming language; it is included with many versions of
5a67ee
   Perl. These problems have been found when using s2p:
5a67ee

5a67ee
   (1) Doesn't recognize the semicolon properly after s/// commands.
5a67ee

5a67ee
       s/foo/bar/g;
5a67ee

5a67ee
   (2) Doesn't trim trailing whitespace after s/// commands. Even lone
5a67ee
   trailing spaces, without comments, produce an error.
5a67ee

5a67ee
   (3) Doesn't handle multiple commands within braces. E.g.,
5a67ee

5a67ee
       1,4{=;G;}
5a67ee

5a67ee
   will produce perl code that omits the second command ("G"). In
   fact, any commands after the first one are missed in the perl
   output script, which will also contain mismatched braces.
5a67ee

5a67ee
3.7. GNU/POSIX extensions to regular expressions
5a67ee

5a67ee
   GNU sed supports "character classes" in addition to regular
5a67ee
   character sets, such as [0-9A-F]. Like regular character sets,
5a67ee
   character classes represent any single character within a set.
5a67ee

5a67ee
   "Character classes are a new feature introduced in the POSIX
5a67ee
   standard. A character class is a special notation for describing
5a67ee
   lists of characters that have a specific attribute, but where the
5a67ee
   actual characters themselves can vary from country to country
5a67ee
   and/or from character set to character set. For example, the notion
5a67ee
   of what is an alphabetic character differs in the USA and in
5a67ee
   France." [quoted from the docs for GNU awk v3.1.0.]
5a67ee

5a67ee
   Though character classes don't generally conserve space on the
5a67ee
   line, they help make scripts portable for international use. The
5a67ee
   equivalent character sets _for U.S. users_ follow:
5a67ee

5a67ee
     [[:alnum:]]  - [A-Za-z0-9]     Alphanumeric characters
5a67ee
     [[:alpha:]]  - [A-Za-z]        Alphabetic characters
5a67ee
     [[:blank:]]  - [ \x09]         Space or tab characters only
5a67ee
     [[:cntrl:]]  - [\x00-\x1F\x7F] Control characters
5a67ee
     [[:digit:]]  - [0-9]           Numeric characters
5a67ee
     [[:graph:]]  - [!-~]           Printable and visible characters
5a67ee
     [[:lower:]]  - [a-z]           Lower-case alphabetic characters
5a67ee
     [[:print:]]  - [ -~]           Printable (non-Control) characters
5a67ee
     [[:punct:]]  - [!-/:-@[-`{-~]  Punctuation characters
5a67ee
     [[:space:]]  - [ \t\n\v\f\r]   All whitespace chars
5a67ee
     [[:upper:]]  - [A-Z]           Upper-case alphabetic characters
5a67ee
     [[:xdigit:]] - [0-9a-fA-F]     Hexadecimal digit characters
5a67ee

5a67ee
   Note that [[:graph:]] does not match the space " ", but [[:print:]]
5a67ee
   does. Some character classes may (or may not) match characters in
5a67ee
   the high ASCII range (ASCII 128-255 or 0x80-0xFF), depending on
5a67ee
   which C library was used to compile sed. For non-English languages,
5a67ee
   [[:alpha:]] and other classes may also match high ASCII characters.
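
   A short example of character classes in use follows. It assumes a
   sed that understands POSIX classes (GNU sed does); the input string
   is invented for the illustration.

```shell
# Squeeze each run of spaces/tabs to one space, then mask every digit.
out=$(printf 'id  42\tok\n' |
  sed 's/[[:space:]][[:space:]]*/ /g; s/[[:digit:]]/#/g')
printf '%s\n' "$out"
```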
5a67ee

5a67ee
------------------------------
5a67ee

5a67ee
4. EXAMPLES
5a67ee

5a67ee
   ONE-CHARACTER QUESTIONS
5a67ee

5a67ee
4.1. How do I insert a newline into the RHS of a substitution?
5a67ee

5a67ee
   Several versions of sed permit '\n' to be typed directly into the
5a67ee
   RHS, which is then converted to a newline on output: ssed,
5a67ee
   gsed302a+, gsed103 (with the -x switch), sed15+, sedmod, and
5a67ee
   UnixDOS sed. The _easiest_ solution is to use one of these
5a67ee
   versions.
5a67ee

5a67ee
   For other versions of sed, try one of the following:
5a67ee

5a67ee
   (a) If typing the sed script from a Bourne shell, use one backslash
5a67ee
   "\" if the script uses 'single quotes' or two backslashes "\\" if
5a67ee
   the script requires "double quotes". In the example below, note
5a67ee
   that the leading '>' on the 2nd line is generated by the shell to
5a67ee
   prompt the user for more input. The user types in slash,
5a67ee
   single-quote, and then ENTER to terminate the command:
5a67ee

5a67ee
     [sh-prompt]$ echo twolines | sed 's/two/& new\
5a67ee
     >/'
5a67ee
     two new
5a67ee
     lines
5a67ee
     [sh-prompt]$
5a67ee

5a67ee
   (b) Use a script file with one backslash '\' in the script,
5a67ee
   immediately followed by a newline. This will embed a newline into
5a67ee
   the "replace" portion. Example:
5a67ee

5a67ee
     sed -f newline.sed files
5a67ee

5a67ee
     # newline.sed
5a67ee
     s/twolines/two new\
5a67ee
     lines/g
5a67ee

5a67ee
   Some versions of sed may not need the trailing backslash. If so,
5a67ee
   remove it.
5a67ee

5a67ee
   (c) Insert an unused character and pipe the output through tr:
5a67ee

5a67ee
     echo twolines | sed 's/two/& new=/' | tr "=" "\n"   # produces
5a67ee
     two new
5a67ee
     lines
5a67ee

5a67ee
   (d) Use the "G" command:
5a67ee

5a67ee
   G appends a newline, plus the contents of the hold space to the end
5a67ee
   of the pattern space. If the hold space is empty, a newline is
5a67ee
   appended anyway. The newline is stored in the pattern space as "\n"
5a67ee
   where it can be addressed by grouping "\(...\)" and moved in the
5a67ee
   RHS. Thus, to change the "twolines" example used earlier, the
5a67ee
   following script will work:
5a67ee

5a67ee
     sed '/twolines/{G;s/\(two\)\(lines\)\(\n\)/\1\3\2/;}'
5a67ee

5a67ee
   (e) Inserting full lines, not breaking lines up:
5a67ee

5a67ee
   If one is not *changing* lines but only inserting complete lines
5a67ee
   before or after a pattern, the procedure is much easier. Use the
5a67ee
   "i" (insert) or "a" (append) command, making the alterations by an
5a67ee
   external script. To insert "This line is new" BEFORE each line
5a67ee
   matching a regex:
5a67ee

5a67ee
     /RE/i This line is new               # HHsed, sedmod, gsed 3.02a
5a67ee
     /RE/{x;s/$/This line is new/;G;}     # other seds
5a67ee

5a67ee
   The two examples above are intended as "one-line" commands entered
5a67ee
   from the console. If using a sed script, "i\" immediately followed
5a67ee
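   by a literal newline will work on all versions of sed. Furthermore,
   the command "s/$/This line is new/" will only work if the hold
   space is empty when a line matches (which it is by default).
   Because "x" leaves the matched line in the hold space, the second
   script inserts correctly only at the first match.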
5a67ee

5a67ee
   To append "This line is new" AFTER each line matching a regex:
5a67ee

5a67ee
     /RE/a This line is new               # HHsed, sedmod, gsed 3.02a
5a67ee
     /RE/{G;s/$/This line is new/;}       # other seds
5a67ee

5a67ee
   To append 2 blank lines after each line matching a regex:
5a67ee

5a67ee
     /RE/{G;G;}                    # assumes the hold space is empty
5a67ee

5a67ee
   To replace each line matching a regex with 5 blank lines:
5a67ee

5a67ee
     /RE/{s/.*//;G;G;G;G;}         # assumes the hold space is empty
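
   For comparison, here is a sketch of the "i\" and "a\" forms with a
   literal newline after the backslash, typed directly at a Bourne
   shell prompt. The sample input and the inserted text are made up
   for the example.

```shell
# Insert one line before, and append one line after, each /RE/ match.
out=$(printf 'one\nRE here\ntwo\n' | sed -e '/RE/i\
This line is new' -e '/RE/a\
This line follows')
printf '%s\n' "$out"
```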
5a67ee

5a67ee
   (f) Use the "y///" command if possible:
5a67ee

5a67ee
   On some Unix versions of sed (not GNU sed!), though the s///
5a67ee
   command won't accept '\n' in the RHS, the y/// command does. If
5a67ee
   your Unix sed supports it, a newline after "aaa" can be inserted
5a67ee
   this way (which is not portable to GNU sed or other seds):
5a67ee

5a67ee
     s/aaa/&~/; y/~/\n/;    # assuming no other '~' is on the line!
5a67ee

5a67ee
4.2. How do I represent control-codes or nonprintable characters?
5a67ee

5a67ee
   Several versions of sed support the notation \xHH, where "HH" are
5a67ee
   two hex digits, 00-FF: ssed, GNU sed v3.02.80 and above, GNU sed
5a67ee
   v1.03, sed16 and sed15 (HHsed). Try to use one of those versions.
5a67ee

5a67ee
   Sed is not intended to process binary or object code, and files
5a67ee
   which contain nulls (0x00) will usually generate errors in most
5a67ee
   versions of sed. The latest versions of GNU sed and ssed are an
5a67ee
   exception; they permit nulls in the input files and also in
5a67ee
   regexes.
5a67ee

5a67ee
   On Unix platforms, the 'echo' command may allow insertion of octal
5a67ee
   or hex values, e.g., `echo "\0nnn"` or `echo -n "\0nnn"`. The echo
5a67ee
   command may also support syntax like '\\b' or '\\t' for backspace
5a67ee
   or tab characters. Check the man pages to see what syntax your
5a67ee
   version of echo supports. Some versions support the following:
5a67ee

5a67ee
     # replace 0x1A (32 octal) with ASCII letters
5a67ee
     sed 's/'`echo "\032"`'/Ctrl-Z/g'
5a67ee

5a67ee
     # note the 3 backslashes in the command below
5a67ee
     sed "s/.`echo \\\b`//g"
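
   Where 'echo' proves unreliable, the 'printf' utility is one
   possible substitute, since it accepts octal escapes of the form
   "\nnn". The following is a sketch only; the variable name and the
   sample text are invented.

```shell
# Store a single Ctrl-Z byte (0x1A, octal 032) in a shell variable.
ctrl_z=$(printf '\032')

# Double quotes let the shell paste the byte into the sed script.
out=$(printf 'end\032of file\n' | sed "s/$ctrl_z/Ctrl-Z/g")
printf '%s\n' "$out"
```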
5a67ee

5a67ee
4.3. How do I convert files with toggle characters, like +this+, to
5a67ee
look like [i]this[/i]?
5a67ee

5a67ee
   Input files, especially message-oriented text files, often contain
5a67ee
   toggle characters for emphasis, like ~this~, *this*, or =this=. Sed
5a67ee
   can make the same input pattern produce alternating output each
5a67ee
   time it is encountered. Typical needs might be to generate HTML
   codes or print codes for boldface, italic, or underscore. This
   script accommodates multiple occurrences of the toggle pattern on
5a67ee
   the same line, as well as cases where the pattern starts on one
5a67ee
   line and finishes several lines later, even at the end of the file:
5a67ee

5a67ee
     # sed script to convert +this+ to [i]this[/i]
5a67ee
     :a
5a67ee
     /+/{ x;        # If "+" is found, switch hold and pattern space
5a67ee
       /^ON/{       # If "ON" is in the (former) hold space, then ..
5a67ee
         s///;      # .. delete it
5a67ee
         x;         # .. switch hold space and pattern space back
5a67ee
         s|+|[/i]|; # .. turn the next "+" into "[/i]"
5a67ee
         ba;        # .. jump back to label :a and start over
5a67ee
       }
5a67ee
     s/^/ON/;       # Else, "ON" was not in the hold space; create it
5a67ee
     x;             # Switch hold space and pattern space
5a67ee
     s|+|[i]|;      # Turn the first "+" into "[i]"
5a67ee
     ba;            # Branch to label :a to find another pattern
5a67ee
     }
5a67ee
     #---end of script---
5a67ee

5a67ee
   This script uses the hold space to create a "flag" to indicate
5a67ee
   whether the toggle is ON or not. We have added remarks to
5a67ee
   illustrate the script logic, but in most versions of sed remarks
5a67ee
   are not permitted after 'b'ranch commands or labels.
5a67ee

5a67ee
   If you are sure that the +toggle+ characters never cross line
5a67ee
   boundaries (i.e., never begin on one line and end on another), this
5a67ee
   script can be reduced to one line:
5a67ee

5a67ee
     s|+\([^+][^+]*\)+|[i]\1[/i]|g
5a67ee

5a67ee
   If your toggle pattern contains regex metacharacters (such as '*'
5a67ee
   or perhaps '+' or '?'), remember to quote them with backslashes.
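
   As a rough check of the hold-space version, the sketch below runs
   the same logic with the remarks stripped out and one command per
   line, a layout that should suit seds which reject comments after
   labels or branches. The two-line input, in which the toggle opens
   on one line and closes on the next, is invented.

```shell
# The toggle script, one command per line and without remarks.
script=':a
/+/{
x
/^ON/{
s///
x
s|+|[/i]|
ba
}
s/^/ON/
x
s|+|[i]|
ba
}'

out=$(printf 'an +odd\nline+ here\n' | sed "$script")
printf '%s\n' "$out"
```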
5a67ee

5a67ee
   CHANGING STRINGS
5a67ee

5a67ee
4.10. How do I perform a case-insensitive search?
5a67ee

5a67ee
   Several versions of sed support case-insensitive matching: ssed and
5a67ee
   GNU sed v3.02+ (with I flag after s/// or /regex/); sedmod with the
5a67ee
   -i switch; and sed16 (which supports both types of switches).
5a67ee

5a67ee
   With other versions of sed, case-insensitive searching is awkward,
5a67ee
   so people may use awk or perl instead, since these programs have
5a67ee
   options for case-insensitive searches. In gawk, use "BEGIN
5a67ee
   {IGNORECASE=1}" and in perl, "/regex/i". For other seds, here are
5a67ee
   three solutions:
5a67ee

5a67ee
   Solution 1: convert everything to upper case and search normally
5a67ee

5a67ee
     # sed script, solution 1
5a67ee
     h;          # copy the original line to the hold space
5a67ee
                 # convert the pattern space to solid caps
5a67ee
     y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
5a67ee
                 # now we can search for the word "CARLOS"
5a67ee
     /CARLOS/ {
5a67ee
          # add or insert lines. Note: "s/.../.../" will not work
5a67ee
          # here because we are searching a modified pattern
5a67ee
          # space and are not printing the pattern space.
5a67ee
     }
5a67ee
     x;          # get back the original pattern space
5a67ee
                 # the original pattern space will be printed
5a67ee
     #---end of sed script---
5a67ee

5a67ee
   Solution 2: search for both cases
5a67ee

5a67ee
   Often, proper names will be all lower-case ("unix"), start with an
   initial capital letter ("Unix"), or occur in solid caps ("UNIX").
   There may be no need to search for every possibility.
5a67ee

5a67ee
     /UNIX/b match
5a67ee
     /[Uu]nix/b match
5a67ee

5a67ee
   Solution 3: search for all possible cases
5a67ee

5a67ee
     # If you must, search for any possible combination
5a67ee
     /[Cc][Aa][Rr][Ll][Oo][Ss]/ { ... }
5a67ee

5a67ee
   Bear in mind that as the pattern length increases, this solution
   becomes an order of magnitude slower than Solution 1, at least
   with some implementations of sed.
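
   Solution 1 can be condensed into a one-liner when all that is
   wanted is to print the matching lines in their original case. This
   is a sketch under that assumption; the sample names are made up.

```shell
# h   keep a copy of the original line in the hold space
# y   fold the pattern space to upper case
# x;p on a match, swap the original line back in and print it
out=$(printf 'Carlos came\nnobody\ncArLoS left\n' |
  sed -n 'h;y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/;/CARLOS/{x;p;}')
printf '%s\n' "$out"
```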
5a67ee

5a67ee
4.11. How do I match only the first occurrence of a pattern?
5a67ee

5a67ee
   (1) The general solution is to use GNU sed or ssed, with one of
5a67ee
   these range expressions. The first script ("print only the first
5a67ee
   match") works with any version of sed:
5a67ee

5a67ee
     sed -n '/RE/{p;q;}' file       # print only the first match
5a67ee
     sed '0,/RE/{//d;}' file        # delete only the first match
5a67ee
     sed '0,/RE/s//to_that/' file   # change only the first match
5a67ee

5a67ee
   (2) If you cannot use GNU sed and if you *know* the pattern will
5a67ee
   not occur on the first line, this will work:
5a67ee

5a67ee
     sed '1,/RE/{//d;}' file        # delete only the first match
5a67ee
     sed '1,/RE/s//to_that/' file   # change only the first match
5a67ee

5a67ee
   (3) If you cannot use GNU sed and the pattern *might* occur on the
5a67ee
   first line, use one of the following commands (credit for short GNU
5a67ee
   script goes to Donald Bruce Stewart):
5a67ee

5a67ee
     sed '/RE/{x;/Y/!{s/^/Y/;h;d;};x;}' file       # delete (one way)
     # delete (another way): drop the matched line, then read in the
     # rest of the file so that /RE/ is never tested again
     sed -e '/RE/{$d;N;s/^.*\n//;:a' -e '$!N;$!ba' -e '}' file
     sed '/RE/{$d;N;s/^.*\n//;:a;$!N;$!ba;}' file  # same script, GNU sed
     sed -e '/RE/{s//to_that/;:a' -e '$!N;$!ba' -e '}' file  # change
5a67ee

5a67ee
   Still another solution, using a flag in the hold space. This is
5a67ee
   portable to all seds and works if the pattern is on the first line:
5a67ee

5a67ee
     # sed script to change "foo" to "bar" only on the first occurrence
5a67ee
     1{x;s/^/first/;x;}
     /foo/{x;/first/{s///;x;s/foo/bar/;x;};x;}
5a67ee
     #---end of script---
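
   The flag-in-the-hold-space idea can be checked quickly from the
   shell. The sketch below uses the "delete (one way)" command from
   item (3) on an invented input whose first line already matches,
   which is the case that defeats a plain "1,/RE/" range.

```shell
# The first /RE/ line finds no "Y" flag in the hold space, so it sets
# the flag and is deleted; later /RE/ lines find the flag and survive.
out=$(printf 'RE a\nb\nRE c\n' | sed '/RE/{x;/Y/!{s/^/Y/;h;d;};x;}')
printf '%s\n' "$out"
```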
5a67ee

5a67ee
4.12. How do I parse a comma-delimited (CSV) data file?
5a67ee

5a67ee
   Comma-delimited data files can come in several forms, requiring
5a67ee
   increasing levels of complexity in parsing and handling. They are
5a67ee
   often referred to as CSV files (for "comma separated values") and
5a67ee
   occasionally as SDF files (for "standard data format"). Note that
5a67ee
   some vendors use "SDF" to refer to variable-length records with
5a67ee
   comma-separated fields which are "double-quoted" if they contain
5a67ee
   character values, while other vendors use "SDF" to designate
5a67ee
   fixed-length records with fixed-length, nonquoted fields! (For help
5a67ee
   with fixed-length fields, see question 4.13.)
5a67ee

5a67ee
   The term "CSV" became a de-facto standard when Microsoft Excel used
5a67ee
   it as an optional output file format.
5a67ee

5a67ee
   Here are 4 different forms you may encounter in comma-delimited data:
5a67ee

5a67ee
   (a) No quotes, no internal commas
5a67ee

5a67ee
       1001,John Smith,PO Box 123,Chicago,IL,60699
5a67ee
       1002,Mary Jones,320 Main,Denver,CO,84100
5a67ee

5a67ee
   (b) Like (a), with quotes around each field
5a67ee

5a67ee
       "1003","John Smith","PO Box 123","Chicago","IL","60699"
5a67ee
       "1004","Mary Jones","320 Main","Denver","CO","84100"
5a67ee

5a67ee
   (c) Like (b), with embedded commas
5a67ee

5a67ee
       "1005","Tom Hall, Jr.","61 Ash Ct.","Niles","OH","44446"
5a67ee
       "1006","Bob Davis","429 Pine, Apt. 5","Boston","MA","02128"
5a67ee

5a67ee
   (d) Like (c), with embedded commas and quotes
5a67ee

5a67ee
       "1007","Sue "Red" Smith","19 Main","Troy","MI","48055"
5a67ee
       "1008","Joe "Hey, guy!" Hall","POB 44","Reno","NV","89504"
5a67ee

5a67ee
   In each example above, we have 6 fields and 5 commas which function
5a67ee
   as field separators. Case (c) is a very typical form of these data
5a67ee
   files, with double quotes used to enclose each field and to protect
5a67ee
   internal commas (such as "Tom Hall, Jr.") from interpretation as
5a67ee
   field separators. However, many times the data may include both
5a67ee
   embedded quotation marks as well as embedded commas, as seen by
5a67ee
   case (d), above.
5a67ee

5a67ee
   Case (d) is the closest to Microsoft CSV format. *However*, the
5a67ee
   Microsoft CSV format allows embedded newlines within a
5a67ee
   double-quoted field. If embedded newlines within fields are a
5a67ee
   possibility for your data, you should consider using something
5a67ee
   other than sed to work with the data file.
5a67ee

5a67ee
   Before handling a comma-delimited data file, make sure that you
5a67ee
   fully understand its format and check the integrity of the data.
5a67ee
   Does each line contain the same number of fields? Should certain
5a67ee
   fields be composed only of numbers or of two-letter state
5a67ee
   abbreviations in all caps? Sed (or awk or perl) should be used to
5a67ee
   validate the integrity of the data file before you attempt to alter
5a67ee
   it or extract particular fields from the file.
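
   One possible integrity check for case (a) is sketched below: it
   prints every line that does not have exactly 6 fields (5 commas).
   The second input record is a deliberately damaged, made-up example.

```shell
# A valid line is one field followed by exactly five ",field" groups;
# "!p" prints only the lines that fail that test.
out=$(printf '%s\n' \
  '1001,John Smith,PO Box 123,Chicago,IL,60699' \
  '1002,Mary Jones,320 Main,Denver,CO' |
  sed -n '/^[^,]*\(,[^,]*\)\{5\}$/!p')
printf '%s\n' "$out"
```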
5a67ee

5a67ee
   After ensuring that each line has a valid number of fields, use sed
5a67ee
   to locate and modify individual fields, using the \(...\) grouping
5a67ee
   command where needed.
5a67ee

5a67ee
   In case (a):
5a67ee

5a67ee
     sed 's/^[^,]*,[^,]*,[^,]*,[^,]*,/.../'
5a67ee
             ^     ^     ^
5a67ee
             |     |     |_ 3rd field
5a67ee
             |     |_______ 2nd field
5a67ee
             |_____________ 1st field
5a67ee

5a67ee
     # Unix script to delete the second field for case (a)
5a67ee
     sed 's/^\([^,]*\),[^,]*,/\1,,/' file
5a67ee

5a67ee
     # Unix script to change field 1 to 9999 for case (a)
5a67ee
     sed 's/^[^,]*,/9999,/' file
5a67ee

5a67ee
   In cases (b) and (c):
5a67ee

5a67ee
     sed 's/^"[^"]*","[^"]*","[^"]*","[^"]*",/.../'
5a67ee
              1st--   2nd--   3rd--   4th--
5a67ee

5a67ee
     # Unix script to delete the second field for case (c)
5a67ee
     sed 's/^\("[^"]*"\),"[^"]*",/\1,"",/' file
5a67ee

5a67ee
     # Unix script to change field 1 to 9999 for case (c)
5a67ee
     sed 's/^"[^"]*",/"9999",/' file
5a67ee

5a67ee

5a67ee
   In case (d):
5a67ee

5a67ee
   One way to parse such files is to replace the 3-character field
5a67ee
   separator "," with an unused character like the tab or vertical
5a67ee
   bar. (Technically, the field separator is only the comma while the
5a67ee
   fields are surrounded by "double quotes", but the net _effect_ is
5a67ee
   that fields are separated by quote-comma-quote, with quote
5a67ee
   characters added to the beginning and end of each record.) Search
5a67ee
   your datafile _first_ to make sure that your character appears
5a67ee
   nowhere in it!
5a67ee

5a67ee
     sed -n '/|/p' file        # search for any instance of '|'
5a67ee
     # if it's not found, we can use the '|' to separate fields
5a67ee

5a67ee
   Then replace the 3-character field separator and parse as before:
5a67ee

5a67ee
     # sed script to delete the second field for case (d)
5a67ee
     s/","/|/g;                  # global change of "," to bar
     s/^\([^|]*\)|[^|]*|/\1||/;  # delete 2nd field
     s/|/","/g;                  # global change of bar back to ","
5a67ee
     #---end of script---
5a67ee

5a67ee
     # sed script to change field 1 to 9999 for case (d)
5a67ee
     # Remember to accommodate leading and trailing quote marks
5a67ee
     s/","/|/g;
5a67ee
     s/^[^|]*|/"9999|/;
5a67ee
     s/|/","/g;
5a67ee
     #---end of script---
5a67ee

5a67ee
   Note that this technique works only if _each_ and _every_ field is
5a67ee
   surrounded with double quotes, including empty fields.
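
   As a worked example of the bar technique, the sketch below deletes
   the second field of record 1007 from case (d). It assumes, as
   stated above, that every field is quoted and that "|" appears
   nowhere in the data.

```shell
# 1. turn each "," separator into a bar
# 2. empty the second bar-delimited field
# 3. turn the bars back into ","
out=$(printf '%s\n' '"1007","Sue "Red" Smith","19 Main","Troy","MI","48055"' |
  sed 's/","/|/g;s/^\([^|]*\)|[^|]*|/\1||/;s/|/","/g')
printf '%s\n' "$out"
```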
5a67ee

5a67ee
   The following solution is for more complex examples of (d), such
5a67ee
   as: not all fields contain "double-quote" marks, or the presence of
5a67ee
   embedded "double-quote" marks within fields, or extraneous
5a67ee
   whitespace around field delimiters. (Thanks to Greg Ubben for this
5a67ee
   script!)
5a67ee

5a67ee
     # sed script to convert case (d) to bar-delimited records
5a67ee
     s/^ *\(.*[^ ]\) *$/|\1|/;
     s/" *, */"|/g;
     : loop
     s/| *\([^",|][^,|]*\) *, */|\1|/g;
     s/| *, */||/g;
     t loop
     s/  *|/|/g;
     s/|  */|/g;
     s/^|\(.*\)|$/\1/;
5a67ee
     #---end of script---
5a67ee

5a67ee
   For example, it turns this (which is badly-formed but legal):
5a67ee

5a67ee
   first,"",unquoted ,""this" is, quoted " ,, sub "quote" inside, f", lone  " empty:
5a67ee

5a67ee
   into this:
5a67ee

5a67ee
   first|""|unquoted|""this" is, quoted "||sub "quote" inside|f"|lone  "   empty:
5a67ee

5a67ee
   Note that the script preserves the "double-quote" marks, but
5a67ee
   changes only the commas where they are used as field separators. I
5a67ee
   have used the vertical bar "|" because it's easier to read, but you
5a67ee
   may change this to another field separator if you wish.
5a67ee

5a67ee
   If your CSV datafile is more complex, it would probably not be
5a67ee
   worth the effort to write it in sed. For such a case, you should
5a67ee
   use Perl with a dedicated CSV module (there are at least two recent
5a67ee
   CSV parsers available from CPAN).
5a67ee

5a67ee
4.13. How do I handle fixed-length, columnar data?
5a67ee

5a67ee
   Sed handles fixed-length fields via \(grouping\) and backreferences
5a67ee
   (\1, \2, \3 ...). If we have 3 fields of 10, 25, and 9 characters
5a67ee
   per field, our sed script might look like so:
5a67ee

5a67ee
     s/^\(.\{10\}\)\(.\{25\}\)\(.\{9\}\)/\3\2\1/;  # Change the fields
5a67ee
        ^^^^^^^^^^^~~~~~~~~~~~==========           #   from 1,2,3 to 3,2,1
5a67ee
         field #1   field #2   field #3
5a67ee

5a67ee
   This is a bit hard to read. By using GNU sed or ssed with the -r
5a67ee
   switch active, it can look like this:
5a67ee

5a67ee
     s/^(.{10})(.{25})(.{9})/\3\2\1/;          # Using the -r switch
5a67ee

5a67ee
   To delete a field in sed, use grouping and omit the backreference
5a67ee
   from the field to be deleted. If the data is long or difficult to
5a67ee
   work with, use ssed with the -R switch and the /x flag after an s///
5a67ee
   command, to insert comments and remarks about the fields.
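
   To make the grouping easier to follow, here is a scaled-down
   sketch with fields of 3, 2, and 4 characters instead of 10, 25, and
   9. The record and the variable names are invented for the example.

```shell
record='AAABBCCCC'   # hypothetical record: fields of 3, 2, and 4 chars

# Reverse the field order, as in the 10/25/9 example.
swapped=$(echo "$record" | sed 's/^\(.\{3\}\)\(.\{2\}\)\(.\{4\}\)/\3\2\1/')

# Delete field 2 by leaving it ungrouped and out of the replacement.
dropped=$(echo "$record" | sed 's/^\(.\{3\}\).\{2\}\(.\{4\}\)/\1\2/')

printf '%s\n%s\n' "$swapped" "$dropped"
```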
5a67ee

5a67ee
   For records with many fields, use GNU awk with the FIELDWIDTHS
5a67ee
   variable set in the top of the script. For example:
5a67ee

5a67ee
     awk 'BEGIN{FIELDWIDTHS = "10 25 9"}; {print $3 $2 $1}' file
5a67ee

5a67ee
   This is much easier to read than a similar sed script, especially
5a67ee
   if there are more than 5 or 6 fields to manipulate.
5a67ee

5a67ee
4.14. How do I commify a string of numbers?
5a67ee

5a67ee
   Use the simplest script necessary to accomplish your task. As
5a67ee
   variations of the line increase, the sed script must become more
5a67ee
   complex to handle additional conditions. Whole numbers are
5a67ee
   simplest, followed by decimal formats, followed by embedded words.
5a67ee

5a67ee
   Case 1: simple strings of whole numbers separated by spaces or
5a67ee
   commas, with an optional negative sign. To convert this:
5a67ee

5a67ee
       4381, -1222333, and 70000: - 44555666 1234567890 words
5a67ee
       56890  -234567, and 89222  -999777  345888777666 chars
5a67ee

5a67ee
   to this:
5a67ee

5a67ee
       4,381, -1,222,333, and 70,000: - 44,555,666 1,234,567,890 words
5a67ee
       56,890  -234,567, and 89,222  -999,777  345,888,777,666 chars
5a67ee

5a67ee
   use one of these one-liners:
5a67ee

5a67ee
     sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'                      # GNU sed
5a67ee
     sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'  # other seds
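
   The portable one-liner works from right to left: each pass inserts
   one comma before the rightmost run of three digits that still has
   a digit in front of it, and "ta" repeats until nothing changes. A
   quick sketch on a made-up line:

```shell
# Pass 1: 1222333 -> 1222,333   Pass 2: -> 1,222,333   Pass 3: 4381 -> 4,381
out=$(echo 'x 4381 y 1222333' |
  sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta')
printf '%s\n' "$out"
```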
5a67ee

5a67ee
   Case 2: strings of numbers which may have an embedded decimal
5a67ee
   point, separated by spaces or commas, with an optional negative
5a67ee
   sign. To change this:
5a67ee

5a67ee
       4381,  -6555.1212 and 70000,  7.18281828  44906982.071902
5a67ee
       56890   -2345.7778 and 8.0000:  -49000000 -1234567.89012
5a67ee

5a67ee
   to this:
5a67ee

5a67ee
       4,381,  -6,555.1212 and 70,000,  7.18281828  44,906,982.071902
5a67ee
       56,890   -2,345.7778 and 8.0000:  -49,000,000 -1,234,567.89012
5a67ee

5a67ee
   use the following command for GNU sed:
5a67ee

5a67ee
     sed ':a;s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g;ta'
5a67ee

5a67ee
   and for other versions of sed:
5a67ee

5a67ee
     sed -f case2.sed files
5a67ee

5a67ee
     # case2.sed
5a67ee
     s/^/ /;                 # add space to start of line
5a67ee
     :a
5a67ee
     s/\( [-0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/g
5a67ee
     ta
5a67ee
     s/ //;                  # remove space from start of line
5a67ee
     #---end of script---
5a67ee

5a67ee
4.15. How do I prevent regex expansion on substitutions?
5a67ee

5a67ee
   Sometimes you want to *match* regular expression metacharacters as
5a67ee
   literals (e.g., you want to match "[0-9]" or "\n"), to be replaced
5a67ee
   with something else. The ordinary way to prevent expanding
5a67ee
   metacharacters is to prefix them with a backslash. Thus, if "\n"
5a67ee
   matches a newline, "\\n" will match the two-character string of
5a67ee
   'backslash' followed by 'n'.
5a67ee

5a67ee
   But doing this repeatedly can become tedious if there are many
5a67ee
   regexes. The following script will replace alternating strings of
5a67ee
   literals, where no character is interpreted as a regex
5a67ee
   metacharacter:
5a67ee

5a67ee
     # filename: sub_quote.sed
5a67ee
     #   author: Paolo Bonzini
5a67ee
     # sed script to add backslash to find/replace metacharacters
5a67ee
     N;                  # add even numbered line to pattern space
     s,[]/\\$*[],\\&,g;  # quote all of [, ], /, \, $, or *
     s,^,s/,;            # prepend "s/" to front of pattern space
     s,$,/,;             # append "/" to end of pattern space
     s,\n,/,;            # change "\n" to "/", making s/from/to/
5a67ee
     #---end of script---
5a67ee

5a67ee
   Here's a sample of how sub_quote.sed might be used. This example
5a67ee
   converts typical sed regexes to perl-style regexes. The input file
5a67ee
   consists of 10 lines:
5a67ee

5a67ee
       [0-9]
5a67ee
       \d
5a67ee
       [^0-9]
5a67ee
       \D
5a67ee
       \+
5a67ee
       +
5a67ee
       \?
5a67ee
       ?
5a67ee
       \|
5a67ee
       |
5a67ee

5a67ee
   Run the command "sed -f sub_quote.sed input", to transform the
5a67ee
   input file (above) to 5 lines of output:
5a67ee

5a67ee
       s/\[0-9\]/\\d/
5a67ee
       s/\[^0-9\]/\\D/
5a67ee
       s/\\+/+/
5a67ee
       s/\\?/?/
5a67ee
       s/\\|/|/
5a67ee

5a67ee
   The above file is itself a sed script, which can then be used to
5a67ee
   modify other files.
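
   The same transformation can be tried from the command line without
   creating sub_quote.sed. The sketch below feeds in just the first
   pair of lines from the sample input, with the script's commands
   joined on one line.

```shell
# printf '%s\n' passes the backslash in '\d' through untouched.
out=$(printf '%s\n' '[0-9]' '\d' |
  sed 'N;s,[]/\\$*[],\\&,g;s,^,s/,;s,$,/,;s,\n,/,')
printf '%s\n' "$out"
```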
5a67ee

5a67ee
4.16. How do I convert a string to all lowercase or capital letters?
5a67ee

5a67ee
   The easiest method is to use a new version of GNU sed, ssed, sedmod
5a67ee
   or sed16 and employ the \U, \L, or other switches on the right side
5a67ee
   of an s/// command. For example, to convert any word which begins
5a67ee
   with "reg" or "exp" into solid capital letters:
5a67ee

5a67ee
       sed -r "s/\<(reg|exp)[a-z]+/\U&/g"              # gsed4.+ or ssed
5a67ee
       sed "s/\
5a67ee

5a67ee
   As you can see, sedmod and sed16 do not support alternation (|),
5a67ee
   but they do support case conversion. If none of these versions of
5a67ee
   sed are available to you, some sample scripts for this task are
5a67ee
   available from the Seder's Grab Bag:
5a67ee

5a67ee
       http://sed.sourceforge.net/grabbag/scripts
5a67ee

5a67ee
   Note that some case conversion scripts are listed under "Filename
5a67ee
   manipulation" and others are under "Text formatting."
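
   A quick sketch of the \U switch in action, assuming a GNU sed that
   accepts the -r switch; the sample sentence is made up.

```shell
# \U upper-cases everything that follows it in the replacement,
# here the whole matched word (&).
out=$(echo 'a regular expression here' |
  sed -r 's/\<(reg|exp)[a-z]+/\U&/g')
printf '%s\n' "$out"
```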
5a67ee

5a67ee
   CHANGING BLOCKS (consecutive lines)
5a67ee

5a67ee
4.20. How do I change only one section of a file?
5a67ee

5a67ee
   You can match a range of lines by line number, by regexes (say, all
5a67ee
   lines between the words "from" and "until"), or by a combination of
5a67ee
   the two. For multiple substitutions on the same range, put the
5a67ee
   command(s) between braces {...}. For example:
5a67ee

5a67ee
     # replace only between lines 1 and 20
5a67ee
     1,20 s/Johnson/White/g
5a67ee

5a67ee
     # replace everywhere EXCEPT between lines 1 and 20
5a67ee
     1,20 !s/Johnson/White/g
5a67ee

5a67ee
     # replace only between words "from" and "until". Note the
5a67ee
     # use of \<....\> as word boundary markers in GNU sed.
5a67ee
     /from/,/until/ { s/\<red\>/magenta/g; s/\<blue\>/cyan/g; }
5a67ee

5a67ee
     # replace only from the words "ENDNOTES:" to the end of file
5a67ee
     /ENDNOTES:/,$ { s/Schaff/Herzog/g; s/Kraft/Ebbing/g; }
5a67ee

5a67ee
   For technical details on using address ranges, see section 3.3
5a67ee
   ("Addressing and Address ranges").
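
   A small test of a regex-delimited range follows, as a sketch on an
   invented five-line input. Only the "red" that lies between the
   "from" and "until" lines is changed.

```shell
out=$(printf 'red one\nfrom here\nred two\nuntil here\nred three\n' |
  sed '/from/,/until/s/red/magenta/g')
printf '%s\n' "$out"
```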
5a67ee

5a67ee
4.21. How do I delete or change a block of text if the block contains
5a67ee
      a certain regular expression?
5a67ee

5a67ee
   The following deletes the block between 'start' and 'end'
5a67ee
   inclusively, if and only if the block contains the string
5a67ee
   'regex'. Written by Russell Davies, with additional comments:
5a67ee

5a67ee
     # sed script to delete a block if /regex/ matches inside it
5a67ee
     :t
5a67ee
     /start/,/end/ {    # For each line between these block markers..
5a67ee
        /end/!{         #   If we are not at the /end/ marker
5a67ee
           $!{          #     nor the last line of the file,
5a67ee
              N;        #     add the Next line to the pattern space
5a67ee
              bt
5a67ee
           }            #   and branch (loop back) to the :t label.
5a67ee
        }               # This line matches the /end/ marker.
5a67ee
        /regex/d;       # If /regex/ matches, delete the block.
5a67ee
     }                  # Otherwise, the block will be printed.
5a67ee
     #---end of script---
5a67ee

5a67ee
   Note: When the script above reaches /regex/, the entire multi-line
5a67ee
   block is in the pattern space. To replace items inside the block,
5a67ee
   use "s///". To change the entire block, use the 'c' (change)
5a67ee
   command:
5a67ee

5a67ee
     /regex/c\
5a67ee
     1: This will replace the entire block\
5a67ee
     2: with these two lines of text.
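
   To try the block-deleting script without creating a script file,
   something like the following sketch can be used. The sample text
   is invented, and the label is passed in its own -e so that the
   command should not depend on GNU sed:

```shell
# Hypothetical input: two start/end blocks; only the second holds "regex".
input='keep 1
start
alpha
end
start
has regex here
end
keep 2'

# Same logic as the script above: gather the block, then test /regex/.
out=$(printf '%s\n' "$input" | sed -e ':t' -e '/start/,/end/{
/end/!{
$!{
N
bt
}
}
/regex/d
}')
printf '%s\n' "$out"
```

   The first block is printed untouched; the second block, which
   contains "regex", is deleted as a unit.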
5a67ee

5a67ee
4.22. How do I locate a paragraph of text if the paragraph contains a
5a67ee
      certain regular expression?
5a67ee

5a67ee
   Assume that paragraphs are separated by blank lines. For regexes
5a67ee
   that are single terms, use one of the following scripts:
5a67ee

5a67ee
     sed -e '/./{H;$!d;}' -e 'x;/regex/!d'      # most seds
5a67ee
     sed '/./{H;$!d;};x;/regex/!d'              # GNU sed
5a67ee

5a67ee
   To print paragraphs only if they contain 3 specific regular
5a67ee
   expressions (RE1, RE2, and RE3), in any order in the paragraph:
5a67ee

5a67ee
     sed -e '/./{H;$!d;}' -e 'x;/RE1/!d;/RE2/!d;/RE3/!d'
5a67ee

5a67ee
   With this solution and the preceding one, if the paragraphs are
5a67ee
   excessively long (more than 4k in length), you may overflow sed's
5a67ee
   internal buffers. If using HHsed, you must add a "G;" command
5a67ee
   immediately after the "x;" in the scripts above to defeat a bug
5a67ee
   in HHsed (see section 7.9(5), below, for a description).
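
   To see the paragraph "grep" at work, the following sketch feeds
   three made-up paragraphs to the first script above and keeps only
   the one containing "fudge" (the sample text and the variable names
   are ours, for illustration only):

```shell
# Hypothetical input: three paragraphs separated by blank lines.
input='apple pie
with cream

banana split
with fudge

cherry tart'

# Collect each paragraph in the hold space; print it only if it matches.
out=$(printf '%s\n' "$input" | sed -e '/./{H;$!d;}' -e 'x;/fudge/!d')
printf '%s\n' "$out"
```

   Each paragraph that is printed begins with an empty line, because
   the H command puts a newline in front of every line it appends to
   the hold space.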
5a67ee

5a67ee
4.23. How do I match a block of _specific_ consecutive lines?
5a67ee

5a67ee
   There are three ways to approach this problem:
5a67ee

5a67ee
       (1) Try to use a "/range/, /expression/"
5a67ee
       (2) Try to use a "/multi-line\nexpression/"
5a67ee
       (3) Try to use a block of "literal strings"
5a67ee

5a67ee
   We describe each approach in the following sections.
5a67ee

5a67ee
4.23.1.  Try to use a "/range/, /expression/"
5a67ee

5a67ee
   If the block of lines are strings that *never change their order*
5a67ee
   and if the top line never occurs outside the block, like this:
5a67ee

5a67ee
       Abel
5a67ee
       Baker
5a67ee
       Charlie
5a67ee
       Delta
5a67ee

5a67ee
   then these solutions will work for deleting the block:
5a67ee

5a67ee
     sed '/^Abel$/{N;N;N;d;}' files     # for blocks with few lines
5a67ee
     sed '/^Abel$/, /^Zebra$/d' files   # for blocks with many lines
5a67ee
     sed '/^Abel$/,+25d' files          # HHsed, sedmod, ssed, gsed 3.02.80
5a67ee

5a67ee
   To change the block, use the 'c' (change) command instead of 'd'.
5a67ee
   To print that block only, use the -n switch and 'p' (print) instead
5a67ee
   of 'd'. To change some things inside the block, try this:
5a67ee

5a67ee
     # Use one address, not a /^Abel$/,/^Delta$/ range: the loop
     # consumes the "Delta" line, so a range would never be closed.
     /^Abel$/ {
         :ack
         N;
         /\nDelta$/! b ack
         # At this point, all the lines in the block are collected
         s/ubstitute /somethin/g;
     }
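
   A minimal runnable sketch of the collecting loop, addressed here
   by the top line only and run on invented input. The substitution
   (Baker to Bravo) is just a placeholder for whatever change is
   wanted inside the block:

```shell
# Hypothetical input containing the Abel..Delta block once.
input='intro
Abel
Baker
Charlie
Delta
outro'

# Gather lines from "Abel" through "Delta", then edit the whole block.
out=$(printf '%s\n' "$input" | sed '/^Abel$/ {
:ack
N
/\nDelta$/! b ack
s/\(\n\)Baker\(\n\)/\1Bravo\2/
}')
printf '%s\n' "$out"
```

   Because the newlines are part of the pattern space at that point,
   the regex can insist that "Baker" occupies a whole line.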
5a67ee

5a67ee
4.23.2.  Try to use a "multi-line\nexpression"
5a67ee

5a67ee
   If the top line of the block sometimes appears alone or is
5a67ee
   sometimes followed by other lines, or if a partial block may occur
5a67ee
   somewhere in the file, a multi-line expression may be required.
5a67ee

5a67ee
   In these examples, we give solutions for matching an N-line block.
5a67ee
   The expression "/^RE1\nRE2\nRE3...$/" represents a properly formed
5a67ee
   regular expression where \n indicates a newline between lines. Note
5a67ee
   that the 'N' followed by the 'P;D;' commands forms a "sliding
5a67ee
   window" technique. A window of N lines is formed. If the multi-line
5a67ee
   pattern matches, the block is handled. If not, the top line is
5a67ee
   printed and then deleted from the pattern space, and we try to
5a67ee
   match at the next line.
5a67ee

5a67ee
     # sed script to delete 2 consecutive lines: /^RE1\nRE2$/
5a67ee
     $b
5a67ee
     /^RE1$/ {
5a67ee
       $!N
5a67ee
       /^RE1\nRE2$/d
5a67ee
       P;D
5a67ee
     }
5a67ee
     #---end of script---
5a67ee

5a67ee
     # sed script to delete 3 consecutive lines. (This script
     # fails under GNU sed v2.05 and earlier because of the 't'
     # bug when s///n is used; see section 7.5(1) of the FAQ.)
     : more
     $!N
     s/\n/&/2;
     t enough
     $!b more
     : enough
     /^RE1\nRE2\nRE3$/d
     P;D
     #---end of script---
5a67ee

5a67ee
   For example, to delete a block of 5 consecutive lines, the previous
5a67ee
   script must be altered in only two places:
5a67ee

5a67ee
   (1) Change the 2 in "s/\n/&/2;" to a 4 (the trailing semicolon is
   needed to work around a bug in HHsed v1.5).
5a67ee

5a67ee
   (2) Change the regex line to "/^RE1\nRE2\nRE3\nRE4\nRE5$/d",
5a67ee
   modifying the expression as needed.
5a67ee

5a67ee
   Suppose we want to delete a block of two blank lines followed by
5a67ee
   the word "foo" followed by another blank line (4 lines in all).
5a67ee
   Other blank lines and other instances of "foo" should be left
5a67ee
   alone. After changing the '2' to a '3' (always one number less than
5a67ee
   the total number of lines), the regex line would look like this:
5a67ee
   "/^\n\nfoo\n$/d". (Thanks to Greg Ubben for this script.)
5a67ee

5a67ee
   As an alternative to work around the 't' bug in older versions of
5a67ee
   GNU sed, the following script will delete 4 consecutive lines:
5a67ee

5a67ee
     # sed script to delete 4 consecutive lines. Use this if you
5a67ee
     # require GNU sed 2.05 and below.
5a67ee
     /^RE1$/!b
5a67ee
     $!N
5a67ee
     $!N
5a67ee
     :a
5a67ee
     $b
5a67ee
     N
5a67ee
     /^RE1\nRE2\nRE3\nRE4$/d
5a67ee
     P
5a67ee
     s/^.*\n\(.*\n.*\n.*\)$/\1/
5a67ee
     ba
5a67ee
     #---end of script---
5a67ee

5a67ee
   Its drawback is that it must be modified in 3 places instead of 2
5a67ee
   to adapt it for more lines, and as additional lines are added, the
5a67ee
   's' command is forced to work harder to match the regexes. On the
5a67ee
   other hand, it avoids a bug with gsed-2.05 and illustrates another
5a67ee
   way to solve the problem of deleting consecutive lines.
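
   The sliding window can be watched in action with a sketch like the
   one below. It uses the numbered-substitution form of the script
   (so it assumes a sed without the old 't' bug), and the literal
   lines "one", "two", "three" stand in for RE1, RE2, and RE3; the
   sample input is invented:

```shell
# Hypothetical input: the block one/two/three occurs once in full and
# once as a partial block (one/two) that must be left alone.
input='keep
one
two
three
tail
one
two
end'

out=$(printf '%s\n' "$input" | sed '
# Fill the window until it holds 3 lines (2 newlines) or input ends.
:more
$!N
s/\n/&/2
t enough
$!b more
:enough
# Delete the window if it is the wanted block; otherwise slide by one line.
/^one\ntwo\nthree$/d
P
D')
printf '%s\n' "$out"
```

   Only the complete 3-line block disappears; the partial block near
   the end of the input is printed unchanged.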
5a67ee

5a67ee
4.23.3.  Try to use a block of "literal strings"
5a67ee

5a67ee
   If you need to match a static block of text (which may occur any
5a67ee
   number of times throughout a file), where the contents of the block
5a67ee
   are known in advance, then this script is easy to use. It requires
5a67ee
   an intermediate file, which we will call "findrep.txt" (below):
5a67ee

5a67ee
       A block of several consecutive lines to
5a67ee
       be matched literally should be placed on
5a67ee
       top. Regular expressions like .*  or [a-z]
5a67ee
       will lose their special meaning and be
5a67ee
       interpreted literally in this block.
5a67ee
       ----
5a67ee
       Four hyphens separate the two sections. Put
5a67ee
       the replacement text in the lower section.
5a67ee
       As above, sed symbols like &, \n, or \1 will
5a67ee
       lose their special meaning.
5a67ee

5a67ee
   This is a 3-step process. A generic script called "blockrep.sed"
5a67ee
   will read "findrep.txt" (above) and generate a custom script, which
5a67ee
   is then used on the actual input file. In other words,
5a67ee
   "findrep.txt" is a simplified description of the editing that you
5a67ee
   want to do on the block, and "blockrep.sed" turns it into actual
5a67ee
   sed commands.
5a67ee

5a67ee
   Use this process from a Unix shell or from a DOS prompt:
5a67ee

5a67ee
     sed -nf blockrep.sed findrep.txt >custom.sed
5a67ee
     sed -f custom.sed input.file >output.file
5a67ee
     erase custom.sed
5a67ee

5a67ee
   The generic script "blockrep.sed" follows below. It's fairly long.
5a67ee
   Examining its output might help you understand how to use the
5a67ee
   _sliding window_ technique.
5a67ee

5a67ee
     # filename: blockrep.sed
5a67ee
     #   author: Paolo Bonzini
5a67ee
     # Requires:
5a67ee
     #    (1) blocks to find and replace, e.g., findrep.txt
5a67ee
     #    (2) an input file to be changed, input.file
5a67ee
     #
5a67ee
     # blockrep.sed creates a second sed script, custom.sed,
5a67ee
     # to find the lines above the row of 4 hyphens, globally
5a67ee
     # replacing them with the lower block of text. GNU sed
5a67ee
     # is recommended but not required for this script.
5a67ee
     #
5a67ee
     # Loop on the first part, accumulating the `from' text
5a67ee
     # into the hold space.
5a67ee
     :a
5a67ee
     /^----$/! {
5a67ee
        # Escape slashes, backslashes, the final newline and
5a67ee
        # regular expression metacharacters.
5a67ee
        s,[/\[.*],\\&,g
5a67ee
        s/$/\\/
5a67ee
        H
5a67ee
        #
5a67ee
        # Append N cmds needed to maintain the sliding window.
5a67ee
        x
5a67ee
        1 s,^.,s/,
5a67ee
        1! s/^/N\
5a67ee
     /
5a67ee
        x
5a67ee
        n
5a67ee
        ba
5a67ee
     }
5a67ee
     #
5a67ee
     # Change the final backslash to a slash to separate the
5a67ee
     # two sides of the s command.
5a67ee
     x
5a67ee
     s,\\$,/,
5a67ee
     x
5a67ee
     #
5a67ee
     # Until EOF, gather the substitution into hold space.
5a67ee
     :b
5a67ee
     n
5a67ee
     s,[/\],\\&,g
5a67ee
     $! s/$/\\/
5a67ee
     H
5a67ee
     $! bb
5a67ee
     #
5a67ee
     # Start the RHS of the s command without a leading
5a67ee
     # newline, add the P/D pair for the sliding window, and
5a67ee
     # print the script.
5a67ee
     g
5a67ee
     s,/\n,/,
5a67ee
     s,$,/\
5a67ee
     P\
5a67ee
     D,p
5a67ee
     #---end of script---
5a67ee

5a67ee
4.24. How do I address all the lines between RE1 and RE2, excluding the
5a67ee
      lines themselves?
5a67ee

5a67ee
   Normally, to address the lines between two regular expressions, RE1
5a67ee
   and RE2, one would do this: '/RE1/,/RE2/{commands;}'. Excluding
5a67ee
   those lines takes an extra step. To put 2 arrows before each line
5a67ee
   between RE1 and RE2, except for those lines:
5a67ee

5a67ee
     sed '1,/RE1/!{ /RE2/,/RE1/!s/^/>>/; }' input.fil
5a67ee

5a67ee
   The preceding script, though short, may be difficult to follow. It
5a67ee
   also requires that /RE1/ cannot occur on the first line of the
5a67ee
   input file. The following script, though it's not a one-liner, is
5a67ee
   easier to read and it permits /RE1/ to appear on the first line:
5a67ee

5a67ee
     # sed script to replace all lines between /RE1/ and /RE2/,
5a67ee
     # without matching /RE1/ or /RE2/
5a67ee
     /RE1/,/RE2/{
5a67ee
       /RE1/b
5a67ee
       /RE2/b
5a67ee
       s/^/>>/
5a67ee
     }
5a67ee
     #---end of script---
5a67ee

5a67ee
   Contents of input.fil:         Output of sed script:
5a67ee
      aaa                           aaa
5a67ee
      bbb                           bbb
5a67ee
      RE1                           RE1
5a67ee
      aaa                           >>aaa
5a67ee
      bbb                           >>bbb
5a67ee
      ccc                           >>ccc
5a67ee
      RE2                           RE2
5a67ee
      end                           end
5a67ee

5a67ee
4.25. How do I join two lines if line #1 ends in a [certain string]?
5a67ee

5a67ee
   This question appears in the section on one-line sed scripts, but
5a67ee
   it comes up so many times that it needs a place here also. Suppose
5a67ee
   a line ends with a particular string (often, a line ends with a
5a67ee
   backslash). How do you bring up the second line after it, even in
5a67ee
   cases where several consecutive lines all end in a backslash?
5a67ee

5a67ee
     sed -e :a -e '/\\$/N; s/\\\n//; ta' file   # all seds
5a67ee
     sed ':a; /\\$/N; s/\\\n//; ta' file        # GNU sed, ssed, HHsed
5a67ee

5a67ee
   Note that this replaces the backslash-newline with nothing. You may
5a67ee
   want to replace the backslash-newline with a single space instead.
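
   A minimal sketch of the single-space variant, run on invented
   input in which two consecutive lines end in a backslash:

```shell
# Hypothetical input; inside single quotes the backslashes are literal.
input='first\
second\
third
fourth'

# Join each line ending in a backslash with the line that follows it,
# replacing the backslash-newline pair with one space.
out=$(printf '%s\n' "$input" | sed -e :a -e '/\\$/N; s/\\\n/ /; ta')
printf '%s\n' "$out"
```

   The three continued lines come out as one line, "first second
   third", and "fourth" is left as it was.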
5a67ee

5a67ee
4.26. How do I join two lines if line #2 begins in a [certain string]?
5a67ee

5a67ee
   The inverse situation is another FAQ. Suppose a line begins with a
5a67ee
   particular string. How do you bring that line up to follow the
5a67ee
   previous line? In this example, we want to match the string "<<="
5a67ee
   at the beginning of one line, bring that line up to the end of the
5a67ee
   line before it, and replace the string with a single space:
5a67ee

5a67ee
     sed -e :a -e '$!N;s/\n<<=/ /;ta' -e 'P;D' file   # all seds
5a67ee
     sed ':a; $!N;s/\n<<=/ /;ta;P;D' file             # GNU, ssed, sed15+
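
   As an illustration, here is the first command applied to a small
   made-up input (the sample lines and variable names are ours):

```shell
# Hypothetical input: the second line begins with the marker "<<=".
input='alpha
<<=beta
gamma'

# Pull any line starting with "<<=" up onto the end of the previous
# line, replacing the marker with a single space.
out=$(printf '%s\n' "$input" | sed -e :a -e '$!N;s/\n<<=/ /;ta' -e 'P;D')
printf '%s\n' "$out"
```

   The result is two lines, "alpha beta" and "gamma". Note that a
   space following "<<=" in the input would be kept, giving two
   spaces in the joined line.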
5a67ee

5a67ee
4.27. How do I change all paragraphs to long lines?
5a67ee

5a67ee
   A frequent request is how to convert DOS-style textfiles, in which
5a67ee
   each line ends with a "paragraph marker", to Microsoft-style
5a67ee
   textfiles, in which the "paragraph" marker only appears at the end
5a67ee
   of real paragraphs. Sometimes this question is framed as, "How do I
5a67ee
   remove the hard returns at the end of each line in a paragraph?"
5a67ee

5a67ee
   The problem occurs because newer word processors don't work the
5a67ee
   same way older text editors did. Older text editors used a newline
5a67ee
   (CR/LF in DOS; LF alone in Unix) to end each line on screen or on
5a67ee
   disk, and used two newlines to separate paragraphs. Certain word
5a67ee
   processors wanted to make paragraph reformatting and reflowing work
5a67ee
   easily, so they use one newline to end a paragraph and never allow
5a67ee
   newlines _within_ a paragraph. This means that textfiles created
5a67ee
   with standard editors (Emacs, vi, Vedit, Boxer, etc.) appear to
5a67ee
   have "hard returns" at inappropriate places. The following sed
5a67ee
   script finds blocks of consecutive nonblank lines (i.e., paragraphs
5a67ee
   of text), and converts each block into one long line with one "hard
5a67ee
   return" at the end.
5a67ee

5a67ee
     # sed script to change all paragraphs to long lines
5a67ee
     /./{H; $!d;}             # Put each paragraph into hold space
5a67ee
     x;                       # Swap hold space and pattern space
5a67ee
     s/^\(\n\)\(..*\)$/\2\1/; # Move leading \n to end of PatSpace
5a67ee
     s/\n\(.\)/ \1/g;         # Replace all other \n with 1 space
5a67ee
     # Uncomment the following line to remove excess blank lines:
5a67ee
     # /./!d;
5a67ee
     #---end of sed script---
5a67ee

5a67ee
   If the input files have formatting or indentation that conveys
5a67ee
   special meaning (like program source code), this script will remove
5a67ee
   it. But if the text still needs to be extended, try 'par'
5a67ee
   (paragraph reformatter) or the 'fmt' utility with the -t or -c
5a67ee
   switches and the width option (-w) set to a number like 9999.
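
   To check what the script does, it can be run on a short invented
   sample such as the one below. The sketch relies on "." matching an
   embedded newline in the pattern space, which holds for GNU sed and
   should hold for other POSIX seds:

```shell
# Hypothetical input: two paragraphs of two lines each.
input='line one
line two

second para
continues'

out=$(printf '%s\n' "$input" | sed '
/./{H; $!d;}
x
s/^\(\n\)\(..*\)$/\2\1/
s/\n\(.\)/ \1/g')
printf '%s\n' "$out"
```

   Each paragraph becomes one long line, and the paragraphs remain
   separated by an empty line.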
5a67ee

5a67ee
   SHELL AND ENVIRONMENT
5a67ee

5a67ee
4.30. How do I read environment variables with sed?
5a67ee

5a67ee
4.30.1. - on Unix platforms
5a67ee

5a67ee
   In Unix, environment variables begin with a dollar sign, such as
5a67ee
   $TERM, $PATH, $var or $i. In sed, the dollar sign is used to
5a67ee
   indicate the last line of the input file, the end of a line (in the
5a67ee
   LHS), or a literal symbol (in the RHS). Sed cannot access variables
5a67ee
   directly, so one must pay attention to shell quoting requirements
5a67ee
   to expand the variables properly.
5a67ee

5a67ee
   To ALLOW the Unix shell to interpret the dollar sign, put the
5a67ee
   script in double quotes:
5a67ee

5a67ee
     sed "s/_terminal-type_/$TERM/g" input.file >output.file
5a67ee

5a67ee
   To PREVENT the Unix shell from interpreting the dollar sign as a
5a67ee
   shell variable, put the script in single quotes:
5a67ee

5a67ee
     sed 's/.$//' infile >outfile
5a67ee

5a67ee
   To use BOTH Unix $environment_vars and sed /end-of-line$/ pattern
5a67ee
   matching, there are two solutions. (1) The easiest is to enclose
5a67ee
   the script in "double quotes" so the shell can see the $variables,
5a67ee
   and to prefix the sed metacharacter ($) with a backslash. Thus, in
5a67ee

5a67ee
     sed "s/$user\$/root/" file
5a67ee

5a67ee
   the shell interpolates $user and sed interprets \$ as the symbol
5a67ee
   for end-of-line.
5a67ee

5a67ee
   (2) Another method--somewhat less readable--is to concatenate the
5a67ee
   script with 'single quotes' where the $ should not be interpolated
5a67ee
   and "double quotes" where variable interpolation should occur. To
5a67ee
   demonstrate using the preceding script:
5a67ee

5a67ee
     sed "s/$user"'$/root/' file
5a67ee

5a67ee
   Solution #1 seems easier to remember. In either case, we search for
5a67ee
   the user's name (stored in a variable called $user) when it occurs
5a67ee
   at the end of the line ($), and substitute the word "root" in all
5a67ee
   matches.
5a67ee

5a67ee
   For longer shell scripts, it is sometimes useful to begin with
5a67ee
   single quote marks ('), close them upon encountering the variable,
5a67ee
   enclose the variable name in double quotes ("), and resume with
5a67ee
   single quotes, closing them at the end of the sed script.  Example:
5a67ee

5a67ee
     #! /bin/sh
5a67ee
     # sed script to illustrate 'quote'"matching"'usage'
5a67ee
     FROM='abcdefgh'
5a67ee
     TO='ABCDEFGH'
5a67ee
     sed -e '
5a67ee
     y/'"$FROM"'/'"$TO"'/;    # note the quote pairing
5a67ee
     # some more commands go here . . .
5a67ee
     # last line is a single quote mark
5a67ee
     '
5a67ee

5a67ee
   Thus, each character in $FROM is replaced by the corresponding
   character in $TO, and the single quotes are used to glue the
   multiple lines together in the script. (See also section 4.32,
   "How do I handle Unix shell quoting in sed?")
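
   The quoting styles described in this section can be compared side
   by side. In this sketch the value of $user and the sample input
   are invented, and the value is assumed to contain no slash,
   ampersand, or backslash (any of which would need escaping before
   being placed inside "s///"):

```shell
# Hypothetical values; $user is assumed to hold no "/", "&" or "\".
user=alice
input='owner: alice
alice owns this'

# (1) double quotes, with the sed "$" protected by a backslash
out1=$(printf '%s\n' "$input" | sed "s/$user\$/root/")
# (2) double quotes for the variable, single quotes for the sed "$"
out2=$(printf '%s\n' "$input" | sed "s/$user"'$/root/')
# (3) mostly single quotes, opened up only around the variable
out3=$(printf '%s\n' "$input" | sed 's/'"$user"'$/root/')
printf '%s\n' "$out1"
```

   All three commands hand sed the same script, s/alice$/root/, so
   only the "alice" at the end of the first line is replaced.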
5a67ee

5a67ee
4.30.2. - on MS-DOS and 4DOS platforms
5a67ee

5a67ee
   Under 4DOS and MS-DOS version 7.0 (Win95) or 7.10 (Win95 OSR2),
5a67ee
   environment variables can be accessed from the command prompt.
5a67ee
   Under MS-DOS v6.22 and below, environment variables can only be
5a67ee
   accessed from within batch files. Environment variables should be
5a67ee
   enclosed between percent signs and are case-insensitive; i.e.,
5a67ee
   %USER% or %user% will display the USER variable. To generate a true
5a67ee
   percent sign, just enter it twice.
5a67ee

5a67ee
   DOS versions of sed require that sed scripts be enclosed by double
5a67ee
   quote marks "..." (not single quotes!) if the script contains
5a67ee
   embedded tabs, spaces, redirection arrows or the vertical bar. In
5a67ee
   fact, if the input for sed comes from piping, a sed script should
5a67ee
   not contain a vertical bar, even if it is protected by double
5a67ee
   quotes (this seems to be a bug in the normal MS-DOS syntax). Thus,
5a67ee

5a67ee
       echo blurk | sed "s/^/ |foo /"     # will cause an error
5a67ee
       sed "s/^/ |foo /" blurk.txt        # will work as expected
5a67ee

5a67ee
   Using DOS environment variables which contain DOS path statements
5a67ee
   (such as a TMP variable set to "C:\TEMP") within sed scripts is
5a67ee
   discouraged because sed will interpret the backslash '\' as a
5a67ee
   metacharacter to "quote" the next character, not as a normal
5a67ee
   symbol. Thus,
5a67ee

5a67ee
       sed "s/^/%TMP% /" somefile.txt
5a67ee

5a67ee
   will not prefix each line with (say) "C:\TEMP ", but will prefix
5a67ee
   each line with "C:TEMP "; sed will discard the backslash, which is
5a67ee
   probably not what you want. Other variables such as %PATH% and
5a67ee
   %COMSPEC% will also lose the backslash within sed scripts.
5a67ee

5a67ee
   Environment variables which do not use backslashes are usually
5a67ee
   workable. Thus, all the following should work without difficulty,
5a67ee
   if they are invoked from within DOS batch files:
5a67ee

5a67ee
       sed "s/=username=/%USER%/g" somefile.txt
5a67ee
       echo %FILENAME% | sed "s/\.TXT/.BAK/"
5a67ee
       grep -Ei "%string%" somefile.txt | sed "s/^/  /"
5a67ee

5a67ee
   while from either the DOS prompt or from within a batch file,
5a67ee

5a67ee
       sed "s/%%/ percent/g" input.fil >output.fil
5a67ee

5a67ee
   will replace each percent symbol in a file with " percent" (adding
5a67ee
   the leading space for readability).
5a67ee

5a67ee
4.31. How do I export or pass variables back into the environment?
5a67ee

5a67ee
4.31.1. - on Unix platforms
5a67ee

5a67ee
   Suppose that line #1, word #2 of the file 'terminals' contains a
5a67ee
   value to be put in your TERM environment variable. Sed cannot
5a67ee
   export variables directly to the shell, but it can pass strings to
5a67ee
   shell commands. To set a variable in the Bourne shell:
5a67ee

5a67ee
       TERM=`sed 's/^[^ ][^ ]* \([^ ][^ ]*\).*/\1/;q' terminals`;
5a67ee
       export TERM
5a67ee

5a67ee
   If the second word were "Wyse50", this would send the shell command
5a67ee
   "TERM=Wyse50".
5a67ee
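
   The same idea can be tested without touching your real TERM. In
   this sketch the input is piped in rather than read from the file
   'terminals', the variable name term_from_file is made up, and the
   POSIX $(...) form is used in place of backticks:

```shell
# Hypothetical stand-in for line 1 (and line 2) of the "terminals" file.
term_from_file=$(printf '%s\n' 'vt Wyse50 extra words' 'second line' |
                 sed 's/^[^ ][^ ]* \([^ ][^ ]*\).*/\1/;q')
printf '%s\n' "$term_from_file"
```

   The 'q' command stops sed after the first line, so only word #2 of
   line #1 reaches the variable.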

5a67ee
4.31.2. - on MS-DOS or 4DOS platforms
5a67ee

5a67ee
   Sed cannot directly manipulate the environment. Under DOS, only
5a67ee
   batch files (.BAT) can do this, using the SET instruction, since
5a67ee
   they are run directly by the command shell. Under 4DOS, special
5a67ee
   4DOS commands (such as ESET) can also alter the environment.
5a67ee

5a67ee
   Under DOS or 4DOS, sed can select a word and pass it to the SET
5a67ee
   command. Suppose you want the 1st word of the 2nd line of MY.DAT
5a67ee
   put into an environment variable named %PHONE%. You might do this:
5a67ee

5a67ee
       @echo off
5a67ee
       sed -n "2 s/^\([^ ][^ ]*\) .*/SET PHONE=\1/p;3q" MY.DAT > GO_.BAT
5a67ee
       call GO_.BAT
5a67ee
       echo The environment variable for PHONE is %PHONE%
5a67ee
       :: cleanup
5a67ee
       del GO_.BAT
5a67ee

5a67ee
   The sed script assumes that the first character on the 2nd line is
5a67ee
   not a space and uses grouping \(...\) to save the first string of
5a67ee
   non-space characters as \1 for the RHS. In writing any batch files,
5a67ee
   make sure that output filenames such as GO_.BAT don't overwrite
5a67ee
   preexisting files of the same name.
5a67ee

5a67ee
4.32. How do I handle Unix shell quoting in sed?
5a67ee

5a67ee
   To embed a literal single quote (') in a script, use (a) or (b):
5a67ee

5a67ee
   (a) If possible, put the script in double quotes:
5a67ee

5a67ee
     sed "s/cannot/can't/g" file
5a67ee

5a67ee
   (b) If the script must use single quotes, then close-single-quote
5a67ee
   the script just before the SPECIAL single quote, prefix the single
5a67ee
   quote with a backslash, and use a 2nd pair of single quotes to
5a67ee
   finish marking the script. Thus:
5a67ee

5a67ee
     sed 's/cannot$/can'\''t/g' file
5a67ee

5a67ee
   Though this looks hard to read, it breaks down to 3 parts:
5a67ee

5a67ee
      's/cannot$/can'   \'   't/g'
5a67ee
      ---------------   --   -----
5a67ee

5a67ee
   To embed a literal double quote (") in a script, use (a) or (b):
5a67ee

5a67ee
   (a) If possible, put the script in single quotes. You don't need to
5a67ee
   prefix the double quotes with anything. Thus:
5a67ee

5a67ee
     sed 's/14"/fourteen inches/g' file
5a67ee

5a67ee
   (b) If the script must use double quotes, then prefix the SPECIAL
5a67ee
   double quote with a backslash (\). Thus,
5a67ee

5a67ee
     sed "s/$length\"/$length inches/g" file
5a67ee

5a67ee
   To embed a literal backslash (\) into a script, enter it twice:
5a67ee

5a67ee
     sed 's/C:\\DOS/D:\\DOS/g' config.sys
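
   The rules above can be verified with a few one-line tests. The
   sample strings are invented, and printf is used in place of echo
   because some versions of echo interpret backslashes themselves:

```shell
# Single quote inside double quotes, then inside single quotes.
out1=$(printf '%s\n' 'I cannot go' | sed "s/cannot/can't/g")
out2=$(printf '%s\n' 'I cannot go' | sed 's/cannot/can'\''t/g')
# Double quote inside single quotes needs no prefix.
out3=$(printf '%s\n' 'a 14" pipe' | sed 's/14"/fourteen inch/g')
# A literal backslash is doubled on both sides of s///.
out4=$(printf '%s\n' 'C:\DOS' | sed 's/C:\\DOS/D:\\DOS/g')
printf '%s\n' "$out1" "$out2" "$out3" "$out4"
```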
5a67ee

5a67ee
   FILES, DIRECTORIES, AND PATHS
5a67ee

5a67ee
4.40. How do I read (insert/add) a file at the top of a textfile?
5a67ee

5a67ee
   Normally, adding a "header" file to the top of a "body" file is
5a67ee
   done from the command prompt before passing the file on to sed.
5a67ee
   (MS-DOS below version 6.0 must use COPY and DEL instead of MOVE in
5a67ee
   the following example.)
5a67ee

5a67ee
       copy header.txt+body temp                  # MS-DOS command 1
5a67ee
       echo Y | move temp body                    # MS-DOS command 2
5a67ee
                                                    #
5a67ee
       cat header.txt body >temp; mv temp body    # Unix commands
5a67ee

5a67ee
   However, if inserting the file must occur within sed, there is a
5a67ee
   way. The sed command "1 r header.txt" will not work; it will print
5a67ee
   line 1 and then insert "header.txt" between lines 1 and 2. The
5a67ee
   following script solves this problem; however, there must be at
5a67ee
   least 2 lines in the target file for the script to work properly.
5a67ee

5a67ee
     # sed script to insert "header.txt" above the first line
5a67ee
     1{h; r header.txt
5a67ee
       D; }
5a67ee
     2{x; G; }
5a67ee
     #---end of sed script---
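
   The script can be tried out safely in a scratch directory. This
   sketch assumes that "mktemp -d" is available on your system; the
   file contents and the script name header.sed are made up:

```shell
# Build a throwaway directory with a header, a 3-line body, and the script.
dir=$(mktemp -d)
printf 'HEADER\n' > "$dir/header.txt"
printf 'line 1\nline 2\nline 3\n' > "$dir/body"
cat > "$dir/header.sed" <<'EOF'
1{h; r header.txt
  D; }
2{x; G; }
EOF

# Run from inside the directory so that "r header.txt" finds the file.
out=$(cd "$dir" && sed -f header.sed body)
rm -rf "$dir"
printf '%s\n' "$out"
```

   Line 1 is held back while the header is written, and is then
   printed together with line 2.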
5a67ee

5a67ee
4.41. How do I make substitutions in every file in a directory, or in
5a67ee
      a complete directory tree?
5a67ee

5a67ee
4.41.1. - ssed and Perl solution
5a67ee

5a67ee
   The best solution for multiple files in a single directory is to
5a67ee
   use ssed or gsed v4.0 or higher:
5a67ee

5a67ee
     sed -i.BAK 's|foo|bar|g' files       # -i does in-place replacement
5a67ee

5a67ee
   If you don't have ssed, there is a similar solution in Perl. (Yes,
5a67ee
   we know this is a FAQ file for sed, not perl, but perl is more
5a67ee
   common than ssed for many users.)
5a67ee

5a67ee
     perl -pi.bak -e 's|foo|bar|g' files                # or
5a67ee
     perl -pi.bak -e 's|foo|bar|g' `find /pathname -name "filespec"`
5a67ee

5a67ee
   For each file in the filelist, sed (or Perl) renames the source
5a67ee
   file to "filename.bak"; the modified file gets the original
5a67ee
   filename. Remove '.bak' if you don't need backup copies. (Note the
5a67ee
   use of "s|||" instead of "s///" here, and in the scripts below. The
5a67ee
   vertical bars in the 's' command let you replace '/some/path' with
5a67ee
   '/another/path', accommodating slashes in the LHS and RHS.)
5a67ee

5a67ee
   To recurse directories in Unix or GNU/Linux:
5a67ee

5a67ee
     # We use xargs to prevent passing too many filenames to sed, but
5a67ee
     # this command will fail if filenames contain spaces or newlines.
5a67ee
     find /my/path -name '*.ht' -print | xargs sed -i.BAK 's|foo|bar|g'
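
   If your find supports "-exec ... {} +" (it is specified by POSIX,
   though older systems may lack it), filenames containing spaces can
   be handled without xargs. The sketch below assumes GNU sed or ssed
   for the -i switch and "mktemp -d" for a scratch directory; all
   file and directory names are invented:

```shell
# Throwaway tree with a space in both a directory name and a filename.
dir=$(mktemp -d)
mkdir "$dir/sub dir"
printf 'foo one\n' > "$dir/a.ht"
printf 'foo two\n' > "$dir/sub dir/b c.ht"

# find passes the names to sed directly, so no word splitting occurs.
find "$dir" -type f -name '*.ht' -exec sed -i.BAK 's|foo|bar|g' {} +

out=$(cat "$dir/a.ht" "$dir/sub dir/b c.ht")
bak=$(cat "$dir/sub dir/b c.ht.BAK")
rm -rf "$dir"
printf '%s\n' "$out"
```

   Both files are edited in place, and each original is kept beside
   it with the .BAK suffix.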
5a67ee

5a67ee
   To recurse directories under Windows 2000 (CMD.EXE or COMMAND.COM):
5a67ee

5a67ee
     # This syntax isn't supported under Windows 9x COMMAND.COM
5a67ee
     for /R c:\my\path %f in (*.htm) do sed -i.BAK "s|foo|bar|g" %f
5a67ee

5a67ee
4.41.2. - Unix solution
5a67ee

5a67ee
   For all files in a single directory, assuming they end with *.txt
5a67ee
   and you have no files named "[anything].txt.bak" already, use a
5a67ee
   shell script:
5a67ee

5a67ee
     #! /bin/sh
5a67ee
     # Source files are saved as "filename.txt.bak" in case of error
5a67ee
     # The '&&' after cp is an additional safety feature
5a67ee
     for file in *.txt
     do
        cp "$file" "$file.bak" &&
        sed 's|foo|bar|g' "$file.bak" >"$file"
     done
5a67ee

5a67ee
   To do an entire directory tree, use the Unix utility find, like so
5a67ee
   (thanks to Jim Dennis <jadestar@rahul.net> for this script):
5a67ee

5a67ee
     #! /bin/sh
5a67ee
     # filename: replaceall
5a67ee
     # Backup files are NOT saved in this script.
5a67ee
     find . -type f -name '*.txt' -print | while IFS= read -r i
     do
        sed 's|foo|bar|g' "$i" > "$i.tmp" && mv "$i.tmp" "$i"
     done
5a67ee

5a67ee
   This previous shell script recurses through the directory tree,
5a67ee
   finding only files in the directory (not symbolic links, which will
5a67ee
   be encountered by the shell command "for file in *.txt", above). To
5a67ee
   preserve file permissions and make backup copies, use the 2-line cp
5a67ee
   routine of the earlier script instead of "sed ... && mv ...". By
5a67ee
   replacing the sed command 's|foo|bar|g' with something like
5a67ee

5a67ee
     sed "s|$1|$2|g" "${i}.bak" > "$i"
5a67ee

5a67ee
   using double quotes instead of single quotes, the user can also
5a67ee
   employ positional parameters on the shell script command tail, thus
5a67ee
   reusing the script from time to time. For example,
5a67ee

5a67ee
       replaceall East West
5a67ee

5a67ee
   would modify all your *.txt files in the current directory.
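
   Putting these pieces together, one possible form of the reusable
   script is sketched below as a shell function. The function body is
   our own arrangement, not Jim Dennis's original; it assumes that
   neither argument contains a vertical bar, and the first argument
   is treated as a regex, not as a literal string. The demonstration
   uses "mktemp -d" and an invented filename:

```shell
# replaceall OLD NEW: edit every *.txt file below the current directory,
# keeping each original as "name.txt.bak".
replaceall() {
    find . -type f -name '*.txt' -print | while IFS= read -r f
    do
        cp "$f" "$f.bak" &&
        sed "s|$1|$2|g" "$f.bak" > "$f"
    done
}

# Try it in a throwaway directory on a filename containing a space.
dir=$(mktemp -d)
printf 'East wind\n' > "$dir/my notes.txt"
out=$(cd "$dir" && replaceall East West && cat "my notes.txt")
rm -rf "$dir"
printf '%s\n' "$out"
```

   Quoting "$f" everywhere lets the loop cope with spaces in names,
   though filenames containing newlines would still break it.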
5a67ee

5a67ee
4.41.3. - DOS solution:
5a67ee

5a67ee
   MS-DOS users should use two batch files like this:
5a67ee

5a67ee
      @echo off
5a67ee
      :: MS-DOS filename: REPLACE.BAT
5a67ee
      ::
5a67ee
      :: Create a destination directory to put the new files.
5a67ee
      :: Note: The next command will fail under Novell NetWare
5a67ee
      :: below version 4.10 unless "SHOW DOTS=ON" is active.
5a67ee
      if not exist .\NEWFILES\NUL mkdir NEWFILES
5a67ee
      for %%f in (*.txt) do CALL REPL_2.BAT %%f
5a67ee
      echo Done!!
5a67ee
      :: ---End of first batch file---
5a67ee

5a67ee
      @echo off
5a67ee
      :: MS-DOS filename: REPL_2.BAT
5a67ee
      ::
5a67ee
      sed "s/foo/bar/g" %1 > NEWFILES\%1
5a67ee
      :: ---End of the second batch file---
5a67ee

5a67ee
   When finished, the current directory contains all the original
5a67ee
   files, and the newly-created NEWFILES subdirectory contains the
5a67ee
   modified *.TXT files. Do not attempt a command like
5a67ee

5a67ee
       for %%f in (*.txt) do sed "s/foo/bar/g" %%f >NEWFILES\%%f
5a67ee

5a67ee
   under any version of MS-DOS because the output filename will be
5a67ee
   created as a literal '%f' in the NEWFILES directory before the
5a67ee
   %%f is expanded to become each filename in (*.txt). This occurs
5a67ee
   because MS-DOS creates output filenames via redirection commands
5a67ee
   before it expands "for..in..do" variables.
5a67ee

5a67ee
   To recurse through an entire directory tree in MS-DOS requires a
5a67ee
   batch file more complex than we have room to describe. Examine the
5a67ee
   file SWEEP.BAT in Timo Salmi's great archive of batch tricks,
5a67ee
   located at <ftp://garbo.uwasa.fi/pc/link/tsbat.zip> (this file is
5a67ee
   regularly updated). Another alternative is to get an external
5a67ee
   program designed for directory recursion. Here are some recommended
5a67ee
   programs for directory recursion. The first one, FORALL, runs under
5a67ee
   either OS/2 or DOS. Unfortunately, none of these supports Win9x
5a67ee
   long filenames.
5a67ee

5a67ee
       http://hobbes.nmsu.edu/pub/os2/util/disk/forall72.zip
5a67ee
       ftp://garbo.uwasa.fi/pc/filefind/target15.zip
5a67ee

5a67ee
4.42. How do I replace "/some/UNIX/path" in a substitution?
5a67ee

5a67ee
   Technically, the normal meaning of the slash can be disabled by
5a67ee
   prefixing it with a backslash. Thus,
5a67ee

5a67ee
     sed 's/\/some\/UNIX\/path/\/a\/new\/path/g' files
5a67ee

5a67ee
   But this is hard to read and write. There is a better solution.
5a67ee
   The s/// substitution command allows '/' to be replaced by any
5a67ee
   other character (including spaces or alphanumerics). Thus,
5a67ee

5a67ee
     sed 's|/some/UNIX/path|/a/new/path|g' files
5a67ee

5a67ee
   and if you are using variable names in a Unix shell script,
5a67ee

5a67ee
     sed "s|$OLDPATH|$NEWPATH|g" oldfile >newfile
5a67ee

5a67ee
4.43. How do I replace "C:\SOME\DOS\PATH" in a substitution?
5a67ee

5a67ee
   For MS-DOS users, every backslash must be doubled. Thus, to replace
5a67ee
   "C:\SOME\DOS\PATH" with "D:\MY\NEW\PATH":
5a67ee

5a67ee
     sed "s|C:\\SOME\\DOS\\PATH|D:\\MY\\NEW\\PATH|g" infile >outfile
5a67ee

5a67ee
   Remember that DOS pathnames are not case sensitive and can appear
5a67ee
   in upper or lower case in the input file. If this concerns you, use
5a67ee
   a version of sed which can ignore case when matching (gsed, ssed,
5a67ee
   sedmod, sed16).
5a67ee

5a67ee
       @echo off
5a67ee
       :: sample MS-DOS batch file to alter path statements
5a67ee
       :: requires GNU sed with the /i flag for s///
5a67ee
       set old=C:\\SOME\\DOS\\PATH
5a67ee
       set new=D:\\MY\\NEW\\PATH
5a67ee
       gsed "s|%old%|%new%|gi" infile >outfile
5a67ee
       :: or
5a67ee
       ::     sedmod -i "s|%old%|%new%|g" infile >outfile
5a67ee
       set old=
5a67ee
       set new=
5a67ee

5a67ee
   Also, remember that under Windows long filenames may be stored in
5a67ee
   two formats: e.g., as "C:\Program Files" or as "C:\PROGRA~1".
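
   Under a Unix shell, the same substitution might look like the
   sketch below. Each backslash is doubled because a single backslash
   is sed's escape character on both sides of s///. The sample line
   is invented, and the trailing 'I' (ignore case) flag is assumed to
   be available, which holds for GNU sed and ssed but not for
   standard sed:

```shell
# Single quotes keep the shell from touching the backslashes.
line='copy c:\some\dos\path\file.txt a:'

# 'g' replaces every occurrence; 'I' ignores case (GNU extension).
result=$(printf '%s\n' "$line" |
         sed 's|C:\\SOME\\DOS\\PATH|D:\\MY\\NEW\\PATH|gI')
printf '%s\n' "$result"      # prints: copy D:\MY\NEW\PATH\file.txt a:
```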
5a67ee

5a67ee
4.44.  How do I emulate file-includes, using sed?
5a67ee

5a67ee
   Given an input file with file-include statements, similar to
5a67ee
   C-style includes or "server-side includes" (SSI) of this format:
5a67ee

5a67ee
       This is the source file. It's short.
       Its name is simply 'source'. See the script below.
       <!--#include file="filename"-->
              And this is any amount of text between
       <!--#include file="filename"-->
       This is the last line of the file.
5a67ee

5a67ee
   How do we direct sed to import/insert whichever files are at the
5a67ee
   point of the 'file="filename"' token? First, use this file:
5a67ee

5a67ee
     #n
     # filename: incl.sed
     # Comments supported by GNU sed or ssed. Leading '#n' should
     # be on line 1, columns 1-2 of the line.
     /#include file="/{       # for each include line:
       =;                     #   print the line number
       s/^[^"]*"/{r /;        #   change pattern to '{r '
       s/".*//p;              #   delete rest to EOL, print
                              #   and a(ppend) a delete command
       a\
       d;}
     }
     #---end of sed script---
5a67ee

5a67ee
   Second, use the following shell script or DOS batch file (if
5a67ee
   running a DOS batch file, use "double quotes" instead of 'single
5a67ee
   quotes', and use "del" instead of "rm" to remove the temp file):
5a67ee

5a67ee
     sed -nf incl.sed source | sed 'N;N;s/\n//' >temp.sed
5a67ee
     sed -f temp.sed source >target
5a67ee
     rm temp.sed
5a67ee

5a67ee
   If you have GNU sed or ssed, you can reduce the script even further
5a67ee
   (thanks to Michael Carmack for the reminder):
5a67ee

5a67ee
     sed -nf incl.sed source | sed 'N;N;s/\n//' | sed -f - source >target
5a67ee

5a67ee
   In brief, the script replaces each filename with a 'r filename'
5a67ee
   command to insert the file at that point, while omitting the
5a67ee
   extraneous material. Two important things to note with this script:
5a67ee
   (1) There should be only one '#include file' directive per line, and
5a67ee
   (2) each '#include file' directive must be the *only* thing on that
5a67ee
   line, because everything else on the line will be deleted.
5a67ee

5a67ee
   Though the script uses GNU sed or ssed because of the great support
5a67ee
   for embedded script comments, it should run on any version of sed.
5a67ee
   If not, write me and let me know.
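
   To watch the two passes at work, here is a minimal end-to-end
   sketch for a Unix shell. Everything concrete in it is invented for
   the demonstration: the scratch directory, the files a.inc and
   b.inc, their contents, and the assumption that each directive
   looks like <!--#include file="name"-->. The comments are left out
   of incl.sed here so the sketch does not depend on GNU-style
   commenting:

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical source file with two include directives.
cat > source <<'EOF'
first line
<!--#include file="a.inc"-->
middle line
<!--#include file="b.inc"-->
last line
EOF
echo 'contents of a' > a.inc
echo 'contents of b' > b.inc

# incl.sed, stripped of comments.
cat > incl.sed <<'EOF'
/#include file="/{
=
s/^[^"]*"/{r /
s/".*//p
a\
d;}
}
EOF

# Pass 1 emits three lines per directive (line number, '{r file',
# 'd;}'); the second sed glues the number onto the '{r file' line.
sed -nf incl.sed source | sed 'N;N;s/\n//' > temp.sed
# temp.sed now reads:
#   2{r a.inc
#   d;}
#   4{r b.inc
#   d;}

# Pass 2 runs the generated script against the same source file.
result=$(sed -f temp.sed source)
printf '%s\n' "$result"

cd / && rm -rf "$tmp"
```

   Each directive line is replaced by the contents of the file it
   names; all other lines pass through unchanged.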
5a67ee

5a67ee
------------------------------
5a67ee

5a67ee
5. WHY ISN'T THIS WORKING?
5a67ee

5a67ee
5.1. Why don't my variables like $var get expanded in my sed script?
5a67ee

5a67ee
   Because your sed script uses 'single quotes' instead of "double
5a67ee
   quotes." Unix shells never expand $variables in single quotes.
5a67ee

5a67ee
   This is probably the most frequently-asked sed question. For more
5a67ee
   info on using variables, see section 4.30.
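
   The difference is easy to see side by side. A small sketch for a
   Bourne-type shell (the variable name and the sample text are made
   up):

```shell
var=world      # hypothetical variable

# Single quotes: the shell passes $var to sed untouched.
single=$(echo 'hello NAME' | sed 's/NAME/$var/')

# Double quotes: the shell replaces $var before sed ever runs.
double=$(echo 'hello NAME' | sed "s/NAME/$var/")

echo "$single"    # prints: hello $var
echo "$double"    # prints: hello world
```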
5a67ee

5a67ee
5.2. I'm using 'p' to print, but I have duplicate lines sometimes.
5a67ee

5a67ee
   Sed prints the entire file by default, so the 'p' command might
5a67ee
   cause the duplicate lines. If you want the whole file printed,
5a67ee
   try removing the 'p' from commands like 's/foo/bar/p'. If you want
5a67ee
   part of the file printed, run your sed script with the -n flag to
   suppress normal output, and rewrite the script to get all output
   from the 'p' command.
5a67ee

5a67ee
   If you're still getting duplicate lines, you are probably finding
5a67ee
   several matches for the same line. Suppose you want to print lines
5a67ee
   with the words "Peter" or "James" or "John", but not the same line
5a67ee
   twice. The following command will fail:
5a67ee

5a67ee
     sed -n '/Peter/p; /James/p; /John/p' files
5a67ee

5a67ee
   Since all 3 commands of the script are executed for each line,
5a67ee
   you'll get extra lines. A better way is to use the 'd' (delete) or
5a67ee
   'b' (branch) commands, like so (with GNU sed):
5a67ee

5a67ee
     sed '/Peter/b; /James/b; /John/b; d' files          # one way
5a67ee
     sed -n '/Peter/{p;d;};/James/{p;d;};/John/p' files  # a 2nd way
5a67ee
     sed -n '/Peter/{p;b;};/James/{p;b;};/John/p' files  # a 3rd way
5a67ee
     sed '/Peter\|James\|John/!d' files                  # shortest way
5a67ee

5a67ee
   On standard seds, these must be broken down with -e commands:
5a67ee

5a67ee
     sed -e '/Peter/b' -e '/James/b' -e '/John/b' -e d files
5a67ee
     sed -n -e '/Peter/{p;d;}' -e '/James/{p;d;}' -e '/John/p' files
5a67ee

5a67ee
   The 3rd line would require too many -e commands to fit on one line,
5a67ee
   since standard versions of sed require an -e command after each 'b'
5a67ee
   and also after each closing brace '}'.
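
   The effect can be checked on a few lines of input. This sketch
   uses only the -e forms, so it should not depend on GNU sed; the
   sample text is invented, and its second line deliberately contains
   two of the names:

```shell
input=$(printf '%s\n' 'Peter went out' 'James and John stayed' 'nobody here')

# Every matching 'p' fires, so line 2 comes out twice (3 lines total).
dup=$(printf '%s\n' "$input" |
      sed -n -e '/Peter/p' -e '/James/p' -e '/John/p')

# 'b' with no label jumps to the end of the script, where the line is
# printed once by default; lines matching nothing fall through to 'd'.
once=$(printf '%s\n' "$input" |
       sed -e '/Peter/b' -e '/James/b' -e '/John/b' -e d)

printf '%s\n' "--- with p:" "$dup" "--- with b:" "$once"
```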
5a67ee

5a67ee
5.3. Why does my DOS version of sed process a file part-way through
5a67ee
     and then quit?
5a67ee

5a67ee
   First, look for errors in the script. Have you used the -n switch
5a67ee
   without telling sed to print anything to the console? Have you read
5a67ee
   the docs to your version of sed to see if it has a syntax you may
5a67ee
   have misused? (Look for an N or H command that gathers too much.)
5a67ee

5a67ee
   Next, if you are sure your sed script is valid, a probable cause is
5a67ee
   an end-of-file marker embedded in the file. An EOF marker (SUB) is
5a67ee
   a Control-Z character, with the value of 1A hex (26 decimal). As
5a67ee
   soon as any DOS version of sed encounters a Ctrl-Z character, sed
5a67ee
   stops processing.
5a67ee

5a67ee
   To locate the EOF character, use Vern Buerg's shareware file viewer
5a67ee
   LIST.COM <http://www.buerg.com/list.html>. In text mode, look for a
5a67ee
   right-arrow symbol; in hex mode (Alt-H), look for a 1A code. With
5a67ee
   Unix utilities ported to DOS, use 'od' (octal dump) to display
5a67ee
   hexcodes in your file, and then use sed to locate the offending
5a67ee
   character:
5a67ee

5a67ee
       od -txC badfile.txt | sed -n "/ 1a /p; / 1a$/p"
5a67ee

5a67ee
   Then edit the input file to remove the offending character(s).
5a67ee

5a67ee
   If you would rather NOT edit the input file, there is still a fix.
5a67ee
   It requires the DJGPP 32-bit port of 'tr', the Unix translate
5a67ee
   program (v1.22 or higher). GNU od and tr are currently at v2.0 (for
5a67ee
   DOS); they are packaged with the GNU text utilities, available at
5a67ee

5a67ee
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/txt20b.zip
5a67ee
       http://www.simtel.net/gnudlpage.php?product=/gnu/djgpp/v2gnu/txt20b.zip&name=txt20b.zip
5a67ee

5a67ee
   It is important to get the DJGPP version of 'tr' because other
5a67ee
   versions ported to DOS will stop processing when they encounter the
5a67ee
   EOF character. Use the -d (delete) command:
5a67ee

5a67ee
       tr -d \32 < badfile.txt | sed -f myscript.sed
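
   The same hunt-and-remove steps can be rehearsed from a Unix-style
   shell, where the octal escape has to be quoted. A minimal sketch,
   assuming POSIX versions of printf, od and tr; the sample text and
   the final sed command are invented stand-ins:

```shell
# Build a two-line sample with a Ctrl-Z (octal 032, hex 1A) in line 2.
sample=$(printf 'line one\nline \032two\n')

# Locate it: dump the bytes in hex, one code per line, count the 1a's.
hits=$(printf '%s\n' "$sample" | od -An -tx1 | tr ' ' '\n' | grep -c '^1a$')

# Remove it on the fly, then hand the clean stream to sed.
clean=$(printf '%s\n' "$sample" | tr -d '\032' | sed 's/two/2/')

echo "Ctrl-Z characters found: $hits"
printf '%s\n' "$clean"
```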
5a67ee

5a67ee
5.4. My RE isn't matching/deleting what I want it to. (Or, "Greedy vs.
5a67ee
     stingy pattern matching")
5a67ee

5a67ee
   The two most common causes for this problem are: (1) misusing the
5a67ee
   '.' metacharacter, and (2) misusing the '*' metacharacter. The RE
5a67ee
   '.*' is designed to be "greedy" (i.e., matching as many characters
5a67ee
   as possible). However, sometimes users need an expression which is
5a67ee
   "stingy," matching the shortest possible string.
5a67ee

5a67ee
   (1) On single-line patterns, the '.' metacharacter matches any
5a67ee
   single character on the line. ('.' cannot match the newline at the
5a67ee
   end of the line because the newline is removed when the line is put
5a67ee
   into the pattern space; sed adds a newline automatically when the
5a67ee
   pattern space is printed.) On multi-line patterns obtained with the
5a67ee
   'N' or 'G' commands, '.' _will_ match a newline in the middle of the
5a67ee
   pattern space. If there are 3 lines in the pattern space, "s/.*//"
5a67ee
   will delete all 3 lines, not just the first one (leaving 1 blank
5a67ee
   line, since the trailing newline is added to the output).
5a67ee

5a67ee
   Normal misuse of '.' occurs in trying to match a word or bounded
5a67ee
   field, and forgetting that '.' will also cross the field limits.
5a67ee
   Suppose you want to delete the first word in braces:
5a67ee

5a67ee
       echo {one} {two} {three} | sed 's/{.*}/{}/'       # fails
5a67ee
       echo {one} {two} {three} | sed 's/{[^}]*}/{}/'    # succeeds
5a67ee

5a67ee
   's/{.*}/{}/' is not the solution, since the regex '.' will match
5a67ee
   any character, including the close braces. Replace the '.' with
5a67ee
   '[^}]', which signifies a negated character set '[^...]' containing
5a67ee
   anything other than a right brace. FWIW, we know that 's/{one}/{}/'
5a67ee
   would also solve our question, but we're trying to illustrate the
5a67ee
   use of the negated character set: [^anything-but-this].
5a67ee

5a67ee
   A negated character set should be used for matching words between
5a67ee
   quote marks, for fields separated by commas, and so on. See also
5a67ee
   section 4.12 ("How do I parse a comma-delimited data file?").
5a67ee

5a67ee
   (2) The '*' metacharacter represents zero or more instances of the
   previous expression. The RE engine looks for the leftmost possible
   match first, and since '*' is satisfied by zero instances, that
   match may be the empty string. Thus,
5a67ee

5a67ee
       echo foo | sed 's/o*/EEE/'
5a67ee

5a67ee
   will generate 'EEEfoo', not 'fEEE' as one might expect. This is
5a67ee
   because /o*/ matches the null string at the beginning of the word.
5a67ee

5a67ee
   After finding the leftmost possible match, the '*' is GREEDY; it
5a67ee
   always tries to match the longest possible string. When two or
5a67ee
   three instances of '.*' occur in the same RE, the leftmost instance
5a67ee
   will grab the most characters. Consider this example, which uses
5a67ee
   grouping '\(...\)' to save patterns:
5a67ee

5a67ee
       echo bar bat bay bet bit | sed 's/^.*\(b.*\)/\1/'
5a67ee

5a67ee
   What will be displayed is 'bit', never anything longer, because the
5a67ee
   leftmost '.*' took the longest possible match. Remember this rule:
5a67ee
   "leftmost match, longest possible string, zero also matches."
5a67ee
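
   Since sed has no "stingy" quantifier, the usual workaround is a
   negated character set. The sketch below reruns the examples above
   and adds one such variant; the variable names are ours, not part
   of any sed syntax:

```shell
text='bar bat bay bet bit'

# Greedy: the leading '.*' swallows everything up to the LAST 'b'.
greedy=$(echo "$text" | sed 's/^.*\(b.*\)/\1/')

# Stingy substitute: '[^b]*' cannot cross a 'b', so the group starts
# at the FIRST 'b', and '[^ ]*' stops at the end of that word.
stingy=$(echo "$text" | sed 's/^[^b]*\(b[^ ]*\).*/\1/')

# Negated set for a bounded field: stops at the first closing brace.
braces=$(echo '{one} {two} {three}' | sed 's/{[^}]*}/{}/')

echo "$greedy"    # prints: bit
echo "$stingy"    # prints: bar
echo "$braces"    # prints: {} {two} {three}
```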

5a67ee
5.5. What is CSDPMI*B.ZIP and why do I need it?
5a67ee

5a67ee
   If you use MS-DOS outside of Windows and try to use GNU sed v1.18
5a67ee
   or 3.02, you may encounter the following error message:
5a67ee

5a67ee
       no DPMI - Get csdpmi*b.zip
5a67ee

5a67ee
   "DPMI" stands for DOS Protected Mode Interface; it's basically a
5a67ee
   means of running DOS in Protected Mode (as opposed to Real Mode),
5a67ee
   which allows programs to share resources in extended memory without
5a67ee
   conflicting with one another. Running HIMEM.SYS and EMM386.EXE is
5a67ee
   not enough. The "CSDPMI*B.ZIP" refers to files written by Charles
5a67ee
   Sandmann to provide DPMI services for 32-bit computers (e.g.,
5a67ee
   386SX, 386DX, 486SX, etc.). Download the binary file (the source
5a67ee
   code is also available):
5a67ee

5a67ee
       http://www.delorie.com/djgpp/dl/ofc/simtel/v2misc/csdpmi5b.zip  # binaries
5a67ee
       http://www.delorie.com/djgpp/dl/ofc/simtel/v2misc/csdpmi5s.zip  # source
5a67ee
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2misc/csdpmi5b.zip # binaries
5a67ee
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2misc/csdpmi5s.zip # source
5a67ee

5a67ee
   and extract CWSDPMI.EXE, CWSDPR0.EXE and CWSPARAM.EXE from the ZIP
5a67ee
   file. Put all 3 CWS*.EXE files in the same directory as GSED.EXE
5a67ee
   and you're all set. There are DOC files enclosed, but they're
5a67ee
   nearly incomprehensible for the average computer user. (Another
5a67ee
   case of user-vicious documentation.)
5a67ee

5a67ee
   If you're running Windows and you normally use a DOS session to run
5a67ee
   GNU sed (i.e., you get to a DOS prompt with a resizable window or
5a67ee
   you press Alt-Enter to switch to full-screen mode), you don't need
5a67ee
   the CWS*.EXE files at all, since Windows uses DPMI already.
5a67ee

5a67ee
5.6. Where are the man pages for GNU sed?
5a67ee

5a67ee
   Prior to GNU sed v3.02, there weren't any. Until recently, man
5a67ee
   pages distributed with gsed were borrowed from old sources or from
5a67ee
   other compilations. None of them were "official." GNU sed v3.02 had
5a67ee
   the first real set of official man pages, and the documentation has
5a67ee
   greatly improved with GNU sed version 4.0, which now includes both
5a67ee
   man pages and Texinfo pages.
5a67ee

5a67ee
5.7. How do I tell what version of sed I am using?
5a67ee

5a67ee
   Try entering "sed" all by itself on the command line, followed by
5a67ee
   no arguments or parameters.  Also, try "sed --version".  In a
5a67ee
   pinch, you can also try this:
5a67ee

5a67ee
       strings sed | grep -i ver
5a67ee

5a67ee
   Your version of 'strings' must be a version of the Unix utility of
5a67ee
   this name. It should not be the DOS utility STRINGS.COM by Douglas
5a67ee
   Boling.
5a67ee

5a67ee
5.8. Does sed issue an exit code?
5a67ee

5a67ee
   Most versions of sed do not, but check the documentation that came
5a67ee
   with whichever version you are using. GNU sed issues an exit code
5a67ee
   of 0 if the program terminated normally, 1 if there were errors in
5a67ee
   the script, and 2 if there were errors during script execution.
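
   To find out what your own sed reports, capture the shell's $?
   right after the call. A rough sketch: going by the description
   above, GNU sed should give 0 for the first call and a nonzero
   status for the second, but other versions may behave differently.
   The bogus command 'k' is simply an example of a script error:

```shell
# A valid script on empty input should end normally.
if sed -n 'p' </dev/null; then ok_status=0; else ok_status=$?; fi

# 'k' is not a sed command, so the script cannot be compiled. The
# diagnostic is thrown away; only the status is kept.
if sed 'k' </dev/null 2>/dev/null; then bad_status=0; else bad_status=$?; fi

echo "valid script: $ok_status   broken script: $bad_status"
```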
5a67ee

5a67ee
5.9. The 'r' command isn't inserting the file into the text.
5a67ee

5a67ee
   On most versions of sed (but not all), the 'r' (read) and 'w'
5a67ee
   (write) commands must be followed by exactly one space, then the
5a67ee
   filename, and then terminated by a newline. Any additional
5a67ee
   characters before or after the filename are interpreted as *part*
5a67ee
   of the filename. Thus
5a67ee

5a67ee
       /RE/r  insert.me
5a67ee

5a67ee
   would try to locate a file called ' insert.me' (note the
5a67ee
   leading space!). If the file was not found, most versions of sed
5a67ee
   say nothing, not even an error message.
5a67ee

5a67ee
   When sed scripts are used on the command line, every 'r' and 'w'
5a67ee
   must be the last command in that part of the script. Thus,
5a67ee

5a67ee
       sed -e '/regex/{r insert.file;d;}' source         # will fail
5a67ee
       sed -e '/regex/{r insert.file' -e 'd;}' source    # will succeed
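
   Here is the working form run against throwaway files. The scratch
   directory, the file contents, and the marker word "regex" are all
   invented for this sketch:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'header' 'regex marks this line' 'footer' > "$tmp/source"
printf '%s\n' 'inserted text' > "$tmp/insert.file"

# 'r' is the last command in its -e chunk, so the filename ends where
# the chunk ends; 'd' then removes the marker line itself.
result=$(cd "$tmp" && sed -e '/regex/{r insert.file' -e 'd;}' source)
printf '%s\n' "$result"

rm -rf "$tmp"
```

   The output should be "header", "inserted text", "footer": the file
   is read in at the marker line, and the marker line is deleted.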
5a67ee

5a67ee
5.10. Why can't I match or delete a newline using the \n escape sequence?
5a67ee
      Why can't I match 2 or more lines using \n?
5a67ee

5a67ee
   The \n will never match the newline at the end-of-line because the
5a67ee
   newline is always stripped off before the line is placed into the
5a67ee
   pattern space. To get 2 or more lines into the pattern space, use
5a67ee
   the 'N' command or something similar (such as 'H;...;g;').
5a67ee

5a67ee
   Sed works like this: sed reads one line at a time, chops off the
5a67ee
   terminating newline, puts what is left into the pattern space where
5a67ee
   the sed script can address or change it, and when the pattern space
5a67ee
   is printed, appends a newline to stdout (or to a file). If the
5a67ee
   pattern space is entirely or partially deleted with 'd' or 'D', the
5a67ee
   newline is *not* added in such cases. Thus, scripts like
5a67ee

5a67ee
       sed 's/\n//' file       # to delete newlines from each line
5a67ee
       sed 's/\n/foo\n/' file  # to add a word to the end of each line
5a67ee

5a67ee
   will _never_ work, because the trailing newline is removed _before_
5a67ee
   the line is put into the pattern space. To perform the above tasks,
5a67ee
   use one of these scripts instead:
5a67ee

5a67ee
       tr -d '\n' < file              # use tr to delete newlines
5a67ee
       sed ':a;N;$!ba;s/\n//g' file   # GNU sed to delete newlines
5a67ee
       sed 's/$/ foo/' file           # add "foo" to end of each line
5a67ee

5a67ee
   Since versions of sed other than GNU sed have limits to the size of
5a67ee
   the pattern buffer, the Unix 'tr' utility is to be preferred here.
5a67ee
   If the last line of the file ends with a newline, GNU sed will add
5a67ee
   that newline to the output but delete all others, whereas tr will
5a67ee
   delete all newlines.
5a67ee

5a67ee
   To match a block of two or more lines, there are 3 basic choices:
5a67ee
   (1) use the 'N' command to add the Next line to the pattern space;
5a67ee
   (2) use the 'H' command at least twice to append the current line
5a67ee
   to the Hold space, and then retrieve the lines from the hold space
5a67ee
   with x, g, or G; or (3) use address ranges (see section 3.3, above)
5a67ee
   to match lines between two specified addresses.
5a67ee

5a67ee
   Choices (1) and (2) will put an \n into the pattern space, where it
5a67ee
   can be addressed as desired ('s/ABC\nXYZ/alphabet/g'). One example
5a67ee
   of using 'N' to delete a block of lines appears in section 4.13
5a67ee
   ("How do I delete a block of _specific_ consecutive lines?"). This
5a67ee
   example can be modified by changing the delete command to something
5a67ee
   else, like 'p' (print), 'i' (insert), 'c' (change), 'a' (append),
5a67ee
   or 's' (substitute).
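
   A short sketch of choice (1), using the 'ABC\nXYZ' pattern from
   the paragraph above. The three input lines are invented; the
   'P;D' pair is one common way to slide a two-line window down the
   file, not the only one:

```shell
input=$(printf '%s\n' 'ABC' 'XYZ' 'end')

# No newline is ever in the pattern space here, so nothing changes.
unchanged=$(printf '%s\n' "$input" | sed 's/\n//')

# '$!N' appends the next line (except on the last line), which puts a
# real \n into the pattern space for s/// to match. 'P' prints up to
# the first newline and 'D' deletes that much before looping.
joined=$(printf '%s\n' "$input" | sed '$!N;s/ABC\nXYZ/alphabet/;P;D')

printf '%s\n' "$joined"
```

   The second command should print "alphabet" followed by "end".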
5a67ee

5a67ee
   Choice (3) will not put an \n into the pattern space, but it _does_
5a67ee
   match a block of consecutive lines, so it may be that you don't
5a67ee
   even need the \n to find what you're looking for. Since several
5a67ee
   versions of sed support this syntax:
5a67ee

5a67ee
       sed '/start/,+4d'  # to delete "start" plus the next 4 lines,
5a67ee

5a67ee
   in addition to the traditional '/from here/,/to there/{...}' range
5a67ee
   addresses, it may be possible to avoid the use of \n entirely.
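
   Both kinds of range can be compared on the same input. In this
   sketch the sample lines are invented, and the second command
   assumes a sed that supports the 'addr,+N' form (GNU sed or ssed):

```shell
input=$(printf '%s\n' one start two three four five)

# Traditional range: from 'start' through the line matching 'four'.
by_range=$(printf '%s\n' "$input" | sed '/start/,/four/d')

# addr,+N form: the 'start' line plus the next 3 lines.
by_count=$(printf '%s\n' "$input" | sed '/start/,+3d')

printf '%s\n' "$by_range"
```

   Both commands should leave only "one" and "five", and neither
   needs a \n in the pattern space.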
5a67ee

5a67ee
5.11. My script aborts with an error message, "event not found".
5a67ee

5a67ee
   This error is generated by the csh or tcsh shells, not by sed. The
5a67ee
   exclamation mark (!) is special to csh/tcsh, and if you use it in
5a67ee
   command-line or shell scripts--even within single quotes--it must
5a67ee
   be preceded by a backslash. Thus, under the csh/tcsh shell:
5a67ee

5a67ee
       sed '/regex/!d'      # will fail
5a67ee
       sed '/regex/\!d'     # will succeed
5a67ee

5a67ee
   The exclamation mark should not be prefixed with a backslash when
5a67ee
   the script is called from a file, as "-f script.file".
5a67ee

5a67ee
------------------------------
5a67ee

5a67ee
6. OTHER ISSUES
5a67ee

5a67ee
6.1. I have a certain problem that stumps me. Where can I get help?
5a67ee

5a67ee
   Post your question on the "sed-users" mailing list (section 2.3.2),
5a67ee
   where many sed users will be able to see your question. You will have
5a67ee
   to subscribe to have posting privileges.
5a67ee

5a67ee
   Your other alternative is one of these newsgroups:
5a67ee

5a67ee
      - alt.comp.editors.batch
5a67ee
      - comp.editors
5a67ee
      - comp.unix.questions
5a67ee
      - comp.unix.shell
5a67ee

5a67ee
6.2. How does sed compare with awk, perl, and other utilities?
5a67ee

5a67ee
   Awk is a much richer language with many features of a programming
5a67ee
   language, including variable names, math functions, arrays, system
5a67ee
   calls, etc. Its command structure is similar to sed:
5a67ee

5a67ee
      address { command(s) }
5a67ee

5a67ee
   which means that for each line or range of lines that matches the
5a67ee
   address, execute the command(s). In both sed and awk, an address
5a67ee
   can be a line number or a RE somewhere on the line, or both.
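
   The parallel is easiest to see with the two tools doing the same
   job. A small sketch, assuming any POSIX awk is installed; the
   three-line input is invented:

```shell
input=$(printf '%s\n' 'alpha 1' 'beta 2' 'gamma 3')

# Address is a line number.
sed_line=$(printf '%s\n' "$input" | sed -n '2p')
awk_line=$(printf '%s\n' "$input" | awk 'NR == 2 { print }')

# Address is a regular expression.
sed_re=$(printf '%s\n' "$input" | sed -n '/^g/p')
awk_re=$(printf '%s\n' "$input" | awk '/^g/ { print }')

printf '%s\n' "$sed_line" "$awk_line" "$sed_re" "$awk_re"
```

   Each pair should print the same line: "beta 2" for the line-number
   address and "gamma 3" for the RE address.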
5a67ee

5a67ee
   In program size, awk is 3-10 times larger than sed. Awk has most of
5a67ee
   the functions of sed, but not all. Notably, sed supports
5a67ee
   backreferences (\1, \2, ...) to previous expressions, and awk does
5a67ee
   not have any comparable syntax. (One exception: GNU awk v3.0
5a67ee
   introduced gensub(), which supports backreferences only on
5a67ee
   substitutions.)
5a67ee

5a67ee
   Perl is a general-purpose programming language, with many features
5a67ee
   beyond text processing and interprocess communication, taking it
5a67ee
   well past awk or other scripting languages. Perl supports every
5a67ee
   feature sed does and has its own set of extended regular
5a67ee
   expressions, which give it extensive power in pattern matching and
5a67ee
   processing. (Note: the standard perl distribution comes with 's2p',
5a67ee
   a sed-to-perl conversion script. See section 3.6 for more info.)
5a67ee
   Like sed and awk, perl scripts do not need to be compiled into
5a67ee
   binary code. Like sed, perl can also run many useful "one-liners"
5a67ee
   from the command line, though with greater flexibility; see
5a67ee
   question 4.41 ("How do I make substitutions in every file in a
5a67ee
   directory, or in a complete directory tree?").
5a67ee

5a67ee
   On the other hand, the current version of perl is from 8 to 35
5a67ee
   times larger than sed in its executables alone (perl's library
5a67ee
   modules and allied files not included!). Further, for most simple
5a67ee
   tasks such as substitution, sed executes more quickly than either
5a67ee
   perl or awk. All these utilities serve to process input text,
5a67ee
   transforming it to meet our needs . . . or our arbitrary whims.
5a67ee

5a67ee
6.3. When should I use sed?
5a67ee

5a67ee
   When you need a small, fast program to modify words, lines, or
5a67ee
   blocks of lines in a textfile.
5a67ee

5a67ee
6.4. When should I NOT use sed?
5a67ee

5a67ee
   You should not use sed when you have "dedicated" tools which can do
5a67ee
   the job faster or with an easier syntax. Do not use sed when you
5a67ee
   only want to:
5a67ee

5a67ee
   - print individual lines, based on patterns within the line itself.
5a67ee
     Instead, use "grep".
5a67ee

5a67ee
   - print blocks of lines, with 1 or more lines of context above or
5a67ee
     below a specific regular expression. Instead, use the GNU version
5a67ee
     of grep as follows:
5a67ee

5a67ee
        grep -A{number} -B{number} "regex"
5a67ee

5a67ee
   - remove individual lines, based on patterns within the line
5a67ee
     itself. Instead, use "grep -v".
5a67ee

5a67ee
   - print line numbers.  Instead, use "nl" or "cat -n".
5a67ee

5a67ee
   - reformat lines or paragraphs. Instead, use "fold", "fmt" or "par".
5a67ee

5a67ee
   The tr utility is also more suited than sed to some simple tasks. For
5a67ee
   example, to:
5a67ee

5a67ee
   - delete individual characters. Instead of "s/[a-d]//g", use
5a67ee

5a67ee
        tr -d "[a-d]"
5a67ee

5a67ee
   - squeeze sequential characters. Instead of "s/ee*/e/g", use
5a67ee

5a67ee
        tr -s "{character-set}"
5a67ee

5a67ee
   - change individual characters. Instead of "y/abcdef/ABCDEF/", use
5a67ee

5a67ee
        tr "[a-f]" "[A-F]"
5a67ee

5a67ee
   Note, however, that tr does not support giving input files on the
5a67ee
   command line, so the syntax is:
5a67ee

5a67ee
     tr {options-and-patterns} < input-file
5a67ee

5a67ee
   or, to process multiple files:
5a67ee

5a67ee
     cat input-file1 input-file2 | tr {options-and-patterns}
5a67ee

5a67ee
   If you have multiple files, using tr instead of sed is often more of
5a67ee
   an exercise than a useful thing. Although sed can perfectly emulate
5a67ee
   certain functions of cat, grep, nl, rev, sort, tac, tail, tr, uniq,
5a67ee
   and other utilities, producing identical output, the native utilities
5a67ee
   are usually optimized to do the job more quickly than sed.
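
   For comparison, here are the three tr tasks next to their sed
   equivalents. The sample string is invented. One assumption to
   note: the ranges are written without brackets, because POSIX-style
   versions of tr (GNU tr among them) treat '[' and ']' as ordinary
   characters, whereas some older System V versions expected the
   brackets around a range:

```shell
text='abcdef beeeet'

# Delete characters.
del_tr=$(echo "$text" | tr -d 'a-d')
del_sed=$(echo "$text" | sed 's/[a-d]//g')

# Squeeze runs of 'e' down to a single 'e'.
sq_tr=$(echo "$text" | tr -s 'e')
sq_sed=$(echo "$text" | sed 's/ee*/e/g')

# Change characters one for one.
up_tr=$(echo "$text" | tr 'a-f' 'A-F')
up_sed=$(echo "$text" | sed 'y/abcdef/ABCDEF/')

printf '%s\n' "$del_tr" "$sq_tr" "$up_tr"
```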
5a67ee

5a67ee
6.5. When should I ignore sed and use awk or Perl instead?
5a67ee

5a67ee
   If you can write the same script in awk or Perl and do it in less
5a67ee
   time, then use Perl or awk. There's no reason to spend an hour
5a67ee
   writing and debugging a sed script if you can do it in Perl in 10
5a67ee
   minutes (assuming that you know Perl already) and if the processing
5a67ee
   time or memory use is not a factor. Don't hunt pheasants with a .22
5a67ee
   if you have a shotgun at your side . . . unless you simply enjoy
5a67ee
   the challenge!
5a67ee

5a67ee
   Specifically, use awk or perl if you need to:
5a67ee

5a67ee
      - count fields or words on a line. (awk)
5a67ee
      - count lines in a block or objects in a file.
5a67ee
      - check lengths of strings or do math operations.
5a67ee
      - handle very long lines or need very large buffers. (or gsed)
5a67ee
      - handle binary data (control characters). (perl: binmode)
5a67ee
      - loop through an array or list.
5a67ee
      - test for file existence, filesize, or fileage.
5a67ee
      - treat each paragraph as a line. (well, not always)
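
   As a taste of the first and third items, here is how little awk
   needs for counting and arithmetic. A sketch only, assuming a POSIX
   awk; the two input lines are invented:

```shell
input=$(printf '%s\n' 'one two three' 'four five')

# NF is awk's built-in count of whitespace-separated fields per line.
fields=$(printf '%s\n' "$input" | awk '{ print NF }')

# length() plus a running sum: total characters over all lines.
chars=$(printf '%s\n' "$input" | awk '{ n += length($0) } END { print n }')

printf '%s\n' "$fields"    # prints 3, then 2
echo "$chars"              # prints: 22
```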
5a67ee

5a67ee
6.6. Known limitations among sed versions
5a67ee

5a67ee
   The limits below apply to the distributed versions, although the
   source code for most free versions of sed can be modified and
   recompiled to change them. As used below, "no limit" means there
   is no "fixed" limit. Limits are actually determined by one's
   hardware, memory, operating system, and which C library is used to
   compile sed.
5a67ee

5a67ee
6.6.1. Maximum line length
5a67ee

5a67ee
      GNU sed:        no limit
5a67ee
      ssed:           no limit
5a67ee
      sedmod v1.0:    4096 bytes
5a67ee
      HHsed v1.5:     4000 bytes
5a67ee
      sed v1.6:       [pending]
5a67ee

5a67ee
6.6.2. Maximum size for all buffers (pattern space + hold space)
5a67ee

5a67ee
      GNU sed:        no limit
5a67ee
      ssed:           no limit
5a67ee
      sedmod v1.0:    4096 bytes
5a67ee
      HHsed v1.5:     4000 bytes
5a67ee
      sed v1.6:       [pending]
5a67ee

5a67ee
6.6.3. Maximum number of files that can be read with 'r' command
5a67ee

5a67ee
      GNU sed v3+:    no limit
5a67ee
      ssed:           no limit
5a67ee
      GNU sed v2.05:  total no. of r and w commands may not exceed 32
5a67ee
      sedmod v1.0:    total no. of r and w commands may not exceed 20
5a67ee
      sed v1.6:       [pending]
5a67ee

5a67ee
6.6.4. Maximum number of files that can be written with 'w' command
5a67ee

5a67ee
      GNU sed v3+:    no limit (but typical Unix is 253)
5a67ee
      ssed:           no limit (but typical Unix is 253)
5a67ee
      GNU sed v2.05:  total no. of r and w commands may not exceed 32
5a67ee
      sedmod v1.0:    10
5a67ee
      HHsed v1.5:     10
5a67ee
      sed v1.6:       [pending]
5a67ee

5a67ee
6.6.5. Limits on length of label names
5a67ee

5a67ee
      GNU sed:        no limit
5a67ee
      ssed:           no limit
5a67ee
      HHsed v1.5:     no limit
5a67ee
      sed v1.6:       [pending]
5a67ee
      BSD sed:        8 characters
5a67ee

5a67ee
   Note that GNU sed and ssed both consider a semicolon to terminate a
5a67ee
   label name.
5a67ee

5a67ee
6.6.6. Limits on length of write-file names
5a67ee

5a67ee
      GNU sed:        no limit
5a67ee
      ssed:           no limit
5a67ee
      HHsed v1.5:     no limit
5a67ee
      sed v1.6:       [pending]
5a67ee
      BSD sed:        40 characters
5a67ee

5a67ee
6.6.7. Limits on branch/jump commands
5a67ee

5a67ee
      GNU sed:        no limit
5a67ee
      ssed:           no limit
5a67ee
      HHsed v1.5:     50
5a67ee
      sed v1.6:       [pending]
5a67ee

5a67ee
   As a practical consequence, this means that HHsed will not read
5a67ee
   more than 50 lines into the pattern space via an N command, even if
5a67ee
   the pattern space is only a few hundred bytes in size. HHsed exits
5a67ee
   with an error message, "infinite branch loop at line {nn}".
5a67ee

5a67ee
6.7. Known incompatibilities between sed versions
5a67ee

5a67ee
6.7.1. Issuing commands from the command line
5a67ee

5a67ee
   Most versions of sed permit multiple commands to be issued on the
5a67ee
   command line, separated by a semicolon (;). Thus,
5a67ee

5a67ee
       sed 'G;G' file
5a67ee

5a67ee
   should triple-space a file. However, for non-GNU sed, some commands
5a67ee
   *require* separate expressions on the command line. These include:
5a67ee

5a67ee
      - all labels (':a', ':more', etc.)
5a67ee
      - all branching instructions ('b', 't')
5a67ee
      - commands to read and write files ('r' and 'w')
5a67ee
      - any closing brace, '}'
5a67ee

5a67ee
   If these commands are used, they must be the LAST commands of an
5a67ee
   expression. Subsequent commands must use another expression
5a67ee
   (another -e switch plus arguments).  E.g.,
5a67ee

5a67ee
     sed  -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' files
5a67ee

5a67ee
   GNU sed, ssed, sed15 and sed16 all permit these commands to be
5a67ee
   followed by a semicolon, so the previous script can be written:
5a67ee

5a67ee
     sed  ':a;s/^.\{1,77\}$/ &/;ta;s/\( *\)\1/\1/' files
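
   The two spellings can be checked against each other. To keep the
   sample short, this sketch centers text in a 10-column field rather
   than 78 columns (so the bound is 9, not 77); the input word is
   invented, and the one-expression form assumes GNU sed or another
   version that accepts a semicolon after labels and branches:

```shell
# Pad on the left until the line is 10 characters long, then halve
# the run of leading spaces.
portable=$(echo 'abcd' |
           sed -e :a -e 's/^.\{1,9\}$/ &/;ta' -e 's/\( *\)\1/\1/')

# The same script as a single expression.
compact=$(echo 'abcd' | sed ':a;s/^.\{1,9\}$/ &/;ta;s/\( *\)\1/\1/')

printf '[%s]\n' "$portable" "$compact"    # prints [   abcd] twice
```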
5a67ee

5a67ee
   Versions differ in implementing the 'a' (append), 'c' (change), and
5a67ee
   'i' (insert) commands:
5a67ee

5a67ee
      sed "/foo/i New text here"              # HHsed/sedmod/gsed-30280
5a67ee
      gsed -e "/foo/i\\" -e "New text here"   # GNU sed
5a67ee
      sed1 -e "/foo/i" -e "New text here"     # one version of sed
5a67ee
      sed2 "/foo/i\ New text here"            # another version
5a67ee

5a67ee
6.7.2. Using comments (prefixed by the '#' sign)
5a67ee

5a67ee
   Most versions of sed permit comments to appear in sed scripts only
5a67ee
   on the first line of the script. Comments on line 2 or thereafter
5a67ee
   are not recognized and will generate an error like "unrecognized
5a67ee
   command" or "command [bad-line-here] has trailing garbage".
5a67ee

5a67ee
   GNU sed, HHsed, sedmod, and HP-UX sed permit comments to appear on
5a67ee
   any line of the script, except after labels and branching commands
5a67ee
   (b,t), *provided* that a semicolon (;) occurs after the command
5a67ee
   itself. This syntax makes sed similar to awk and perl, which use a
5a67ee
   similar commenting structure in their scripts.  Thus,
5a67ee

5a67ee
      # GNU style sed script
5a67ee
      $!N;                        # except for last line, get next line
5a67ee
      s/^\([0-9]\{5\}\).*\n\1.*//;    # if first 5 digits of each line
5a67ee
                                      # match, delete BOTH lines.
5a67ee
      t skip
5a67ee
      P;                              # print 1st line only if no match
5a67ee
      :skip
5a67ee
      D;                    # delete 1st line of pattern space and loop
5a67ee
      #---end of script---
5a67ee

5a67ee
   is a valid script for GNU-based versions of sed, but is
5a67ee
   unrecognized for most other versions of sed.
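
   To try the commented script, save it to a file and feed it a few
   lines. In this sketch the file name dedupe.sed, the scratch
   directory, and the sample data are all invented, and a sed that
   accepts GNU-style comments is assumed:

```shell
tmp=$(mktemp -d)
cat > "$tmp/dedupe.sed" <<'EOF'
# GNU style sed script
$!N;                            # except for last line, get next line
s/^\([0-9]\{5\}\).*\n\1.*//;    # if first 5 digits of each line
                                # match, delete BOTH lines.
t skip
P;                              # print 1st line only if no match
:skip
D;                    # delete 1st line of pattern space and loop
EOF

result=$(printf '%s\n' '11111 a' '22222 b' '22222 c' '33333 d' |
         sed -f "$tmp/dedupe.sed")
printf '%s\n' "$result"

rm -rf "$tmp"
```

   The two lines that begin with 22222 should vanish, leaving
   "11111 a" and "33333 d".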
5a67ee

5a67ee
   Finally, if the first two characters in a disk file script are
5a67ee
   "#n", the output is suppressed, exactly as if -n were entered on
5a67ee
   the command line. This is true for the following versions of sed:
5a67ee

5a67ee
      - ssed v3.57 and above
5a67ee
      - gsed
5a67ee
      - HHsed v1.5
5a67ee
      - sed v1.6
5a67ee

5a67ee
   This syntax is not recognized by these versions of sed:
5a67ee

5a67ee
      - ssed v3.45 to v3.50 (other versions untested)
5a67ee
      - sedmod v1.0
5a67ee

5a67ee
6.7.3. Special syntax in REs
5a67ee

5a67ee
A. HHsed v1.5 (by Howard Helman)
5a67ee

5a67ee
   The following expressions can be used for /RE/ addresses or in the
5a67ee
   LHS of a substitution:
5a67ee

5a67ee
      +    - 1 or more occurrences of previous RE: same as \{1,\}
5a67ee
      \<   - boundary between nonword and word character
5a67ee
      \>   - boundary between word and nonword character
5a67ee

5a67ee
   The following expressions can be used for /RE/ addresses or on
5a67ee
   either side of a substitution:
5a67ee

5a67ee
      \a   - bell         (ASCII 07, 0x07)
5a67ee
      \b   - backspace    (ASCII 08, 0x08)
5a67ee
      \e   - escape       (ASCII 27, 0x1B)
5a67ee
      \f   - formfeed     (ASCII 12, 0x0C)
5a67ee
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
5a67ee
      \r   - return       (ASCII 13, 0x0D)
5a67ee
      \t   - tab          (ASCII 09, 0x09)
5a67ee
      \v   - vertical tab (ASCII 11, 0x0B)
5a67ee
      \xHH - the ASCII character corresponding to 2 hex digits HH.
5a67ee

B. sed v1.6 (by Walter Briscoe)

   sed v1.6 accepts every expression supported by HHsed v1.5 (above),
   plus the following elements, which can also be used in the RHS of
   a substitution (in addition to those listed above):

      \\~  - insert replacement pattern defined in last s/// command
             (must be used alone in the RHS)
      \l   - change next element to lower case
      \L   - change remaining elements to lower case
      \u   - change next element to upper case
      \U   - change remaining elements to upper case
      \e   - end case conversion of next element
      \E   - end case conversion of remaining elements
      $0   - insert pattern space BEFORE the substitution
      $1-$9 - match Nth word on the pattern space

C. sedmod v1.0 (by Hern Chen)

   The following expressions can be used for /RE/ addresses or in the
   LHS of a substitution:

      +    - 1 or more occurrences of previous RE: same as \{1,\}
      \a   - any alphanumeric: same as [a-zA-Z0-9]
      \A   - 1 or more alphas: same as \a+
      \d   - any digit: same as [0-9]
      \D   - 1 or more digits: same as \d+
      \h   - any hex digit: same as [0-9a-fA-F]
      \H   - 1 or more hexdigits: same as \h+
      \l   - any letter: same as [A-Za-z]
      \L   - 1 or more letters: same as \l+
      \n   - newline      (read as 2 bytes, 0D 0A or ^M^J, in DOS)
      \s   - any whitespace character: space, tab, or vertical tab
      \S   - 1 or more whitespace chars: same as \s+
      \t   - tab          (ASCII 09, 0x09)
      \<   - boundary between nonword and word character
      \>   - boundary between word and nonword character

   The following expressions can be used in the RHS of a substitution.
   "Elements" refer to \1 .. \9, &, $0, or $1 .. $9:

      &    - insert regexp defined on LHS
      \e   - end case conversion of next element
      \E   - end case conversion of remaining elements
      \l   - change next element to lower case
      \L   - change remaining elements to lower case
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
      \t   - tab          (ASCII 09, 0x09)
      \u   - change next element to upper case
      \U   - change remaining elements to upper case
      $0   - insert the original pattern space
      $1-$9 - match Nth word on the pattern space

D. UnixDos sed

   The following expressions can be used in text, LHS, and RHS:

      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)

E. GNU sed v1.03 (by Frank Whaley)

   When used with the -x (extended) switch on the command line, or
   when '#x' occurs as the first line of a script, Whaley's gsed103
   supports the following expressions in both the LHS and RHS of a
   substitution:

      \|      matches the expression on either side
      ?       0 or 1 occurrences of previous RE: same as \{0,1\}
      +       1 or more occurrences of previous RE: same as \{1,\}
      \a      "alert" beep     (BEL, Ctrl-G, 0x07)
      \b      backspace        (BS, Ctrl-H, 0x08)
      \f      formfeed         (FF, Ctrl-L, 0x0C)
      \n      newline          (LF, Ctrl-J, 0x0A)
      \r      carriage-return  (CR, Ctrl-M, 0x0D)
      \t      horizontal tab   (HT, Ctrl-I, 0x09)
      \v      vertical tab     (VT, Ctrl-K, 0x0B)
      \bBBB   binary char, where BBB are 1-8 binary digits, [0-1]
      \dDDD   decimal char, where DDD are 1-3 decimal digits, [0-9]
      \oOOO   octal char, where OOO are 1-3 octal digits, [0-7]
      \xHH    hex char, where HH are 1-2 hex digits, [0-9A-F]

   In normal mode, with or without the -x switch, the following escape
   sequences are also supported in regex addressing or in the LHS of a
   substitution:

      \`      matches beginning of pattern space: same as /^/
      \'      matches end of pattern space: same as /$/
      \B      boundary between 2 word or 2 nonword characters
      \w      any nonword character [*BUG!* should be a word char]
      \W      any nonword character: same as /[^A-Za-z0-9]/
      \<      boundary between nonword and word char
      \>      boundary between word and nonword char

F. GNU sed v2.05 and higher versions

   The following expressions can be used for /RE/ addresses or in the
   LHS of a substitution:

      \`  - matches the beginning of the pattern space (same as "^")
      \'  - matches the end of the pattern space (same as "$")
      \?  - 0 or 1 occurrence of previous character: same as \{0,1\}
      \+  - 1 or more occurrences of previous character: same as \{1,\}
      \|  - matches the string on either side, e.g., foo\|bar
      \b  - boundary between word and nonword chars (reversible)
      \B  - boundary between 2 word or between 2 nonword chars
      \n  - embedded newline (usable after N, G, or similar commands)
      \w  - any word character: [A-Za-z0-9_]
      \W  - any nonword char: [^A-Za-z0-9_]
      \<  - boundary between nonword and word character
      \>  - boundary between word and nonword character

   On \b, \B, \<, and \>, see section 6.7.4 ("Word boundaries"),
   below.
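
   The GNU operators listed above can be tried directly from the
   shell. This is only a sketch: it assumes a reasonably current GNU
   sed installed as "sed", and the sample strings are made up.

```shell
# Sketch: GNU sed extensions to basic regular expressions.
alt=$(echo 'foo bar baz'   | sed 's/foo\|baz/X/g')    # \| alternation
plus=$(echo 'aaa b'        | sed 's/a\+/A/')          # \+ one or more
opt=$(echo 'color colour'  | sed 's/colou\?r/C/g')    # \? zero or one
word=$(echo 'a_1 + b-2'    | sed 's/\w\+/W/g')        # \w word characters
printf '%s\n' "$alt" "$plus" "$opt" "$word"
```

   Note that the underscore counts as a word character for \w, while
   the hyphen does not.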

   Undocumented -r switch:

   Beginning with version 3.02, GNU sed has a -r switch (undocumented
   until version 4.0) that activates Extended Regular Expressions in
   the following manner:

       ?      -  0 or 1 occurrence of previous character
       +      -  1 or more occurrences of previous character
       |      -  matches the string on either side, e.g., foo|bar
       (...)  -  enable grouping without backslash
       {...}  -  enable interval expression without backslash

   When the -r switch (mnemonic: "regular expression") is used, prefix
   these symbols with a backslash to disable the special meaning.
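
   A short sketch of the -r switch, assuming a current GNU sed
   installed as "sed". The input strings are invented for the
   example.

```shell
# Sketch: extended regular expressions with GNU sed's -r switch.
group=$(echo 'foobar baz' | sed -r 's/(foo|bar)+/[&]/')   # grouping, |, +
count=$(echo 'aaaa'       | sed -r 's/a{2}/X/')           # interval, no backslash
plain=$(echo 'a+b'        | sed -r 's/a\+b/X/')           # \+ is a literal plus
printf '%s\n' "$group" "$count" "$plain"
```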

   Escape sequences:

   Beginning with version 3.02.80, the following escape sequences can
   be used on both sides of a "s///" substitution:

      \a      "alert" beep     (BEL, Ctrl-G, 0x07)
      \f      formfeed         (FF, Ctrl-L, 0x0C)
      \n      newline          (LF, Ctrl-J, 0x0A)
      \r      carriage-return  (CR, Ctrl-M, 0x0D)
      \t      horizontal tab   (HT, Ctrl-I, 0x09)
      \v      vertical tab     (VT, Ctrl-K, 0x0B)
      \oNNN   a character with the octal value NNN
      \dNNN   a character with the decimal value NNN
      \xHH    a character with the hexadecimal value HH

   Note that GNU sed also supports "character classes", a POSIX
   extension to regexes, described in section 3.7, above.
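
   The escape sequences can be checked with a few one-liners. This
   sketch assumes a current GNU sed installed as "sed"; the hyphen
   (decimal 45, octal 055, hex 2D) was chosen arbitrarily as the
   character to insert.

```shell
# Sketch: escape sequences on both sides of s/// in GNU sed.
tab=$(printf 'a\tb\n' | sed 's/\t/,/')       # \t on the LHS matches a tab
hex=$(echo 'a b'      | sed 's/ /\x2D/')     # \x2D on the RHS inserts "-"
dec=$(echo 'a b'      | sed 's/ /\d045/')    # \d045 is also "-"
oct=$(echo 'a b'      | sed 's/ /\o055/')    # \o055 is also "-"
printf '%s\n' "$tab" "$hex" "$dec" "$oct"
```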

G. sed 4.0 and higher versions

   The following expressions can be used in the RHS of a substitution.

      \E   - end case conversion
      \l   - change next character to lower case
      \L   - change remaining text to lower case
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
      \t   - tab          (ASCII 09, 0x09)
      \u   - change next character to upper case
      \U   - change remaining text to upper case

   In addition, GNU sed 4.0 can modify the way ^ and $ are interpreted,
   so that ^ can also match an empty string after a newline character,
   and $ can also match an empty string before a newline character (to
   do this, add an "M" after the regular expression terminator, like
   /^>/M -- see section 3.1.1). Even if you use this feature, \` and \'
   still match the beginning and the end of the pattern space,
   respectively.
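
   A sketch of case conversion and of the M flag, assuming a current
   GNU sed 4.x installed as "sed". The sample text is invented, and
   \E is used here as the sequence that ends a \U or \L conversion.

```shell
# Sketch: case conversion and the M flag in GNU sed 4.x.
cap=$(echo 'hello world'   | sed 's/\w\+/\u&/g')                  # \u: next char
upper=$(echo 'hello world' | sed 's/.*/\U&/')                     # \U: the rest
mixed=$(echo 'foo bar'     | sed 's/\(foo\) \(bar\)/\U\1\E \2/')  # \E ends \U
# With M, ^ and $ also match next to an embedded newline.
multi=$(printf '%s\n' one two | sed 'N;s/^two$/TWO/M')
printf '%s\n' "$cap" "$upper" "$mixed" "$multi"
```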

H. ssed

   Everything that was said for GNU sed applies to ssed as well. In
   addition, in Perl-mode (-R switch), these become active or inactive:

      .     - no longer matches new-line characters
      \A    - matches beginning of pattern space
      \Z    - matches end of pattern space or last newline in the PS
      \z    - matches end of pattern space
      \d    - matches any digit: same as [0-9]
      \D    - matches any non-digit: same as [^0-9]
      \`    - no longer matches beginning of pattern space
      \'    - no longer matches end of pattern space
      \<    - no longer matches boundary between nonword & word char
      \>    - no longer matches boundary between word & nonword char
      \oNNN - no longer matches char with octal value NNN
      \dNNN - no longer matches char with decimal value NNN
      \NNN  - matches char with octal value NNN

   Perl mode supports lookahead (?=match) and lookbehind (?<=match)
   pattern matching.  The text matched by the lookahead or lookbehind
   is NOT captured in "&" for s/// replacements!

      foo(?=bar)   - match "foo" only if "bar" follows it
      foo(?!bar)   - match "foo" only if "bar" does NOT follow it
      (?<=foo)bar  - match "bar" only if "foo" precedes it
      (?<!foo)bar  - match "bar" only if "foo" does NOT precede it

      (?<!in|on|at)foo
                  - match "foo" only if NOT preceded by "in", "on" or "at"
      (?<=\d{3})(?<!999)foo
                  - match "foo" only if preceded by 3 digits other than "999"

   In Perl mode, there are two new switches in /addressing/ or s///
   commands. Switches may be lowercase in s/// commands, but must be
   uppercase in /addressing/:

       /S  - lets "." match a newline also
       /X  - extra whitespace is ignored. See below, for sample usage.

   Here are some examples of Perl-style regular expressions. Use the -R
   switch.

     (?i)abc    - case-insensitive match of abc, ABC, aBc, ABc, etc.
     ab(?i)c    - same as above; the (?i) applies throughout the pattern
     (ab(?i)c)  - matches abc or abC; the outer parens make the difference!
     (?m)       - multi-line pattern space: same as "s/FIND/REPL/M"
     (?s)       - set "." to match newline also: same as "s/FIND/REPL/S"
     (?x)       - ignore whitespace and #comments; see the /X sample below.

     (?:abc)foo    - match "abcfoo", but do not capture 'abc' in \1
     (?:ab|cd)ef   - match "abef" or "cdef"; nothing is captured in \1
     (?#remark)xy  - match "xy"; remarks after "#" are ignored.

   And here are some sample uses of /X switch to add comments to complex
   expressions. To embed literal spaces, precede with \ or put inside
   [brackets].

     # ssed script to change "(123) 456-7890" into "[ac123] 456-7890"
     #
     s/ # BACKSLASH IS NEEDED AT END OF EACH LINE!   \
     \(                   # literal left paren, (    \
     (\d{3})              # 3 digits                 \
     \)                   # literal right paren, )   \
     [ \t]*               # zero or more spaces or tabs  \
     (\d{3}-\d{4})        # 3 digits, hyphen, 4 digits   \
     /[ac\1] \2/gx;       # replace g(lobally), with e(x)tended spacing

6.7.4. Word boundaries

   GNU sed, ssed, sed16, sed15 and sedmod use certain symbols to define
   the boundary between a "word character" and a nonword character. A
   word character fits the regex "[A-Za-z0-9_]". Note: a word character
   includes the underscore "_" but not the hyphen, probably because the
   underscore is permissible as a label in sed and in other scripting
   languages. (In gsed103, a word character did NOT include the
   underscore; it included alphanumerics only.)

   These symbols include '\<' and '\>' (gsed, ssed, sed15, sed16,
   sedmod) and '\b' and '\B' (gsed only). Note that the boundary
   symbols do not represent a character, but a position on the line.
   Word boundaries are used with literal characters or character sets
   to let you match (and delete or alter) whole words without
   affecting the spaces or punctuation marks outside of those words.
   They can only be used in a "/pattern/" address or in the LHS of a
   's/LHS/RHS/' command. The following table shows how these symbols
   may be used in HHsed and GNU sed. Sedmod matches the syntax of
   HHsed.

      Match position      Possible word boundaries   HHsed   GNU sed
      ---------------------------------------------------------------
      start of word    [nonword char]^[word char]      \<    \< or \b
      end of word         [word char]^[nonword char]   \>    \> or \b
      middle of word      [word char]^[word char]     none      \B
      outside of word  [nonword char]^[nonword char]  none      \B
      ---------------------------------------------------------------
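
   The rows of the table can be illustrated with GNU sed. This is a
   sketch only: it assumes a current GNU sed installed as "sed", and
   the sample sentence is made up.

```shell
# Sketch: word-boundary symbols in GNU sed.
whole=$(echo 'cat concat cats' | sed 's/\<cat\>/dog/g')   # whole word only
start=$(echo 'cat concat cats' | sed 's/\bcat/dog/g')     # at start of a word
inner=$(echo 'cat concat cats' | sed 's/\Bcat/dog/g')     # inside a word only
printf '%s\n' "$whole" "$start" "$inner"
```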

   In ssed, the symbols '\<' and '\>' lose their special meaning when
   the -R switch is used to invoke Perl-style expressions. However,
   the identical meaning of '\<' and '\>' can be obtained through
   these nonmatching, zero-width assertions:

       (?<!\w)(?=\w)    # same as \<
       (?<=\w)(?!\w)    # same as \>

6.7.5. Commands which operate differently

A. GNU sed version 3.02 and 3.02.80

   The N command no longer discards the contents of the pattern space
   upon reaching the end of file. This is not a bug, it's a feature.
   However, it breaks certain scripts which relied on the older
   behavior of N.

   'N' adds the Next line to the pattern space, enabling multiple
   lines to be stored and acted upon. Upon reaching the last line of
   the file, if the N command was issued again, the contents of the
   pattern space would be silently deleted and the script would abort
   (this has been the traditional behavior). For this reason, sed
   users generally wrote:

       $!N;   # to add the Next line to every line but the last one.

   However, certain sed scripts relied on the traditional behavior,
   such as the script to delete trailing blank lines at the end of a
   file (see script #12 in section 3.2, "Common one-line sed
   scripts", above). Also, classic textbooks such as Dale Dougherty
   and Arnold Robbins' _sed & awk_ documented the older behavior.

   The GNU sed maintainer felt that despite the portability problems
   this would cause, changing the N command to print (rather than
   delete) the pattern space was more consistent with one's intuitions
   about how a command to "append the Next line" _ought_ to behave.
   Another fact favoring the change was that, under the older
   behavior, "{N;command;}" deletes the last line if the file has an
   odd number of lines, but prints the last line if the file has an
   even number of lines.

   To convert scripts which used the former behavior of N (deleting
   the pattern space upon reaching the EOF) to scripts compatible with
   all versions of sed, change a lone "N;" to "$d;N;".
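
   The difference can be seen with a three-line input. This sketch
   assumes a current GNU sed installed as "sed" and run without the
   POSIXLY_CORRECT environment variable; the input is invented.

```shell
# Sketch: N on the last line, GNU behavior versus the emulated old behavior.
unset POSIXLY_CORRECT    # GNU sed follows the older behavior when this is set

# GNU sed prints the pattern space when N finds no next line.
gnu=$(printf '%s\n' 1 2 3 | sed 'N;s/\n/+/')

# "$d;N;" deletes the last line first, as the older N would have done.
old=$(printf '%s\n' 1 2 3 | sed '$d;N;s/\n/+/')
printf 'gnu:\n%s\nold:\n%s\n' "$gnu" "$old"
```

   The first command keeps the unpaired line "3"; the second drops it.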

------------------------------

7. KNOWN BUGS AMONG SED VERSIONS

   Most versions of GNU sed and ssed contain a "buglist" in the
   archive source code of known errors or reported behaviors that may
   be misconstrued as bugs. This portion of the sed FAQ does _not_
   attempt to fully reproduce those buglist files. However, we do
   seek to do some substantial reporting, particularly where certain
   programs have no "buglist" of their own or are not being actively
   maintained.

   As a rule of thumb, if the bug "bites" someone on the sed-users
   mailing list, I tend to report it.

7.1. ssed v3.59 (by Paolo Bonzini)

   (1) N does not discard the contents of the pattern space upon
   reaching the end of file; not a bug. See section 6.7.5.A, above.

   (2) If \x26 is entered into the RHS of a substitution, it is
   interpreted as an ampersand metacharacter, and the entire pattern
   matched in the "find" portion is inserted at that point. A literal
   ampersand should be inserted instead.

   (3) Under Windows 2000, the -i switch doesn't create backup files
   properly. When passed one or more files to process, the source
   file(s) are unchanged, and the changed output files are given
   filenames like sedDOSxyz, with no way to match them to the names
   of the source files.

7.2. GNU sed v4.0 - v4.0.5

   (1) N does not discard the contents of the pattern space upon
   reaching the end of file; not a bug. See section 6.7.5.A, above.

   (2) If \x26 is entered into the RHS of a substitution, it is
   interpreted as an ampersand metacharacter, and the entire pattern
   matched in the "find" portion is inserted at that point. A literal
   ampersand should be inserted instead.

7.3. GNU sed v3.02.80

   (1) N does not discard the contents of the pattern space upon
   reaching the end of file; not a bug. See section 6.7.5.A, above.

   (2) Same as #2 for GNU sed v4.0, above.

7.4. GNU sed v3.02

   (1) Affects only v3.02 binaries compiled with DJGPP for MS-DOS and
   MS-Windows: 'l' (list) command does not display a lone carriage
   return (0x0D, ^M) embedded in a line.

   (2) The expression "\<" causes problems when attempting the
   following types of substitutions, which should print "+aaa +bbb":

       echo aaa bbb | sed 's/\</+/g'    # prints "+a+a+a +b+b+b"
       echo aaa bbb | sed 's/\<./+&/g'  # prints "+a+a+a +b+b+b"

   (3) The N command no longer discards the contents of the pattern
   space upon reaching the end of file. This is not a bug, it's a
   feature. See section 6.7.5, "Commands which operate differently".

7.5. GNU sed v2.05

   (1) If a number follows the substitute command (e.g., s/f/F/10) and
   the number exceeds the possible matches on the pattern space, the
   command 't label' _always_ jumps to the specified label. 't' should
   jump only if the substitution was successful (or returned "true").
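
   The intended behavior of 't' can be checked on a sed without this
   bug. The sketch below assumes a current GNU sed installed as
   "sed"; the label name and the appended text are made up.

```shell
# Sketch: 't' must not jump when s/f/F/10 finds fewer than 10 matches.
out=$(echo 'fff' | sed -e 's/f/F/10' -e 't end' \
                       -e 's/$/ (no 10th f)/' -e ':end')
echo "$out"
```

   On a correct sed the second substitution runs, so the note is
   appended. A sed with the bug described in (1) would jump to the
   label and print "fff" unchanged.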

   (2) 'l' (list) command does not convert the following characters to
   hex values, but passes them through unchanged: 0xF7, 0xFB, 0xFC,
   0xFD, 0xFE.

   (3) A range address like "/foo/,14" is supposed to match every line
   from the first occurrence of "foo" until line 14, inclusive, and
   then match only those lines containing "foo" thereafter. In gsed
   v2.05, if "foo" occurs later in the file, every line from there to
   the end of file will be matched (since gsed is looking for line 14
   to occur again!).
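
   The expected range behavior can be shown with a smaller line
   number. This sketch assumes a current GNU sed installed as "sed"
   and uses "/foo/,2" on an invented four-line input.

```shell
# Sketch: "/foo/,2" when "foo" turns up again after line 2.
out=$(printf '%s\n' foo x foo y | sed -n '/foo/,2p')
echo "$out"
```

   Lines 1 and 2 are printed as a range. The "foo" on line 3 is
   printed by itself, and line 4 is not printed.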

   (4) The regexes /\`/ and /\'/ are not interpreted as a backquote
   and apostrophe, as might be expected. Instead, they are used to
   represent the beginning-of-line and end-of-line (respectively), to
   conform with similar regexes in the GNU versions of Emacs and awk.
   As a consequence, there is no clear way to indicate an apostrophe,
   since a bare apostrophe (') has special meaning to the Unix shell
   and the quoted apostrophe (\') is interpreted as the EOL. A
   doubly quoted apostrophe (\\') was interpreted as a backslash by
   sed and a quote mark by the shell--again, not providing the
   expected results. This syntax changed in the next version of gsed.
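
   For comparison, here is a sketch of three ways to pass a literal
   apostrophe to sed from a POSIX shell without writing \'. It
   assumes a current GNU sed installed as "sed"; the \x27 form is a
   GNU extension, and the sample word is made up.

```shell
# Sketch: three ways to hand sed a literal apostrophe from a POSIX shell.
dq=$(echo "it's" | sed "s/'/_/")          # double-quote the whole script
sq=$(echo "it's" | sed 's/'\''/_/')       # close, escape, reopen single quotes
hx=$(echo "it's" | sed 's/\x27/_/')       # GNU sed only: \x27 is the apostrophe
printf '%s\n' "$dq" "$sq" "$hx"
```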

   (5) Multiple occurrences of the 'w' command fail, as shown here,
   given that both "aaa" and "bbb" occur within the file:

       gsed -e "/aaa/w FILE" -e "/bbb/w FILE" input.txt

   (6) The expression "\<" causes problems when attempting the
   following type of substitution, which should print "+aaa +bbb":

       echo aaa bbb | sed 's/\</+/g'    # sed hangs up with no output

   The syntax 's/\<./+&/g' issues the proper output.

7.6. GNU sed v1.18

   (1) Same as #1 for GNU sed v2.05, above.

   (2) The following command will lock the computer under Win95. Here,
   "echos" is an echo command that does not issue a trailing newline:

       echos any_word | gsed "s/[ ]*$//"

   (3) Same as #3 for GNU sed v2.05, above.

7.7. GNU sed v1.03 (by Frank Whaley)

   (1) The \w and \W escape sequences both match only nonword
   characters. \w is misdefined and should match word characters.

   (2) The underscore is defined as a nonword character; it should be
   defined as a word character.

   (3) Same as #3 for GNU sed v2.05, above.

7.8. sed v1.6 (by Walter Briscoe) - still in beta version

   (1) Duplicated subexpressions (still) do not match an empty set as
   they should. This problem was inherited from HHsed15.

       echo 123 | sed "s/\([a-z][a-z]\)*/=\1/"  # does not return '='

   (2) If grouping is followed by a + operator, nothing is matched.
   This problem was inherited from HHsed; sed v1.6 fixed a bug with
   the * operator, but the problem with the + operator persists.

       echo aaa | sed "/\(a\)+/d"          # nothing is deleted.

   (3) With the interval expressions \{1,\} and +, there is a bug
   related to the & replacement character. This affected the BETA
   release, and it's not known if it affects the final release.

       echo ab | sed "s/a[^a]*/&c/"        # returns 'abc'. Okay.
       echo ab | sed "s/a[^a]+/&c/"        # returns 'ab'. Bug!
       echo ab | sed "s/a[^a]\{1,\}/&c/"   # returns 'ab'. Bug!

7.9. HHsed v1.5 (by Howard Helman)

   (1) If a number follows the substitute command (e.g., s/foo/bar/2),
   in a sed script entered from the command line, two semicolons must
   follow the number, or they must be separated by an -e switch.
   Normally, only 1 semicolon is needed to separate commands.

       echo bit bet | HHsed "s/b/n/2;;s/b/B/"          # solution 1
       echo bit bet | HHsed -e "s/b/n/2" -e "s/b/B/"   # solution 2

   (2) If the substitute command is followed by a number and a "p"
   flag, when the -n switch is used, the "p" flag must occur first.

       echo aaa | HHsed -n "s/./B/3p"    # bug! nothing prints
       echo aaa | HHsed -n "s/./B/p3"    # prints "aaB" as expected

   (3) The following commands will cause HHsed to lock the computer
   under MS-DOS or Win95. Note that they occur because of malformed
   regular expressions which will match no characters.

       sed -n "p;s/\<//g;" file
       sed -n "p;s/[char-set]*//g;" file

   (4) The range command '/RE1/,/RE2/' in HHsed will match one line if
   both regexes occur on the same line (see section 3.4(3), above).
   Though this could be construed as a feature, it should probably be
   considered a bug since its operation differs from every other
   version of sed. For example, '/----/,/----/{s/^/>>/;}' should put
   two angle brackets ">>" before every line which is sandwiched
   between a row of 4 or more hyphens. With HHsed, this command will
   only prefix the hyphens themselves with the angle brackets.
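
   The behavior expected from other versions of sed can be sketched
   as follows, assuming a current GNU sed installed as "sed" and an
   invented five-line input.

```shell
# Sketch: '/----/,/----/' on a sed that looks for the end of the range
# starting with the line AFTER the one that opened it.
out=$(printf '%s\n' '----' a b '----' c | sed '/----/,/----/s/^/>>/')
echo "$out"
```

   Both rows of hyphens and the two lines between them are prefixed;
   the final line "c" is left alone.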

   (5) If the hold space is empty, the H command copies the pattern
   space to the hold space but fails to prepend a leading newline. The
   H command is supposed to add a newline, followed by the contents of
   the pattern space, to the hold space at all times. A workaround is
   "{G;s/^\(.*\)\(\n\)$/\2\1/;H;s/\n$//;}", but it requires knowing
   that the hold space is empty and using the command only once.
   Another alternative is to use the G or the h command alone at key
   points in the script.
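
   The correct behavior of H is easy to observe on another sed. The
   sketch below assumes a current GNU sed installed as "sed".

```shell
# Sketch: H on an empty hold space still stores a leading newline.
out=$(echo 'a' | sed 'H;x')     # hold space becomes "\na", then is printed
echo "$out"
```

   The output is an empty line followed by "a". A sed with the HHsed
   bug would print "a" alone.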

   (6) If grouping is followed by an '*' or '+' operator, HHsed does
   not match the pattern, but issues no warning. See below:

       echo aaa | HHsed "/\(a\)*/d"      # nothing is deleted
       echo aaa | HHsed "/\(a\)+/d"      # nothing is deleted
       echo aaa | HHsed "s/\(a\)*/\1B/"  # nothing is changed
       echo aaa | HHsed "s/\(a\)+/\1B/"  # nothing is changed

   (7) If grouping is followed by an interval expression, HHsed halts
   with the error message "garbled command", in all of the following
   examples:

       echo aaa | HHsed "/\(a\)\{3\}/d"
       echo aaa | HHsed "/\(a\)\{1,5\}/d"
       echo aaa | HHsed "s/\(a\)\{3\}/\1B/"

   (8) In interval expressions, 0 is not supported. E.g., \{0,3\}

7.10. sedmod v1.0 (by Hern Chen)

   Technically, the following are limits (or features?) of sedmod, not
   bugs, since the docs for sedmod do not claim to support these
   missing features.

   (1) sedmod does not support standard interval expressions  \{...\}
   present in nearly all versions of sed.

   (2) If grouping is followed by an '*' or '+' operator, sedmod gives
   a "garbled command" message. However, if the grouped expressions
   are string literals with no metacharacters, a partial workaround
   can be done like so:

       \(string\)\1*    # matches 1 or more instances of 'string'
       \(string\)\1+    # matches 2 or more instances of 'string'

   (3) sedmod does not support a numeric argument after the s///
   command, as in 's/a/b/3', present in nearly all versions of sed.

   The following are bugs in sedmod v1.0:

   (4) When the -i (ignore case) switch is used, the '/regex/d'
   command is not properly obeyed. Sedmod may miss one or more lines
   matching the expression, regardless of where they occur in the
   script. Workaround: use "/regex/{d;}" instead.

7.11. HP-UX sed

   (1) Versions of HP-UX sed up to and including version 10.20 are
   buggy. According to the README file, which comes with the GNU cc
   at <ftp://ftp.ntua.gr/pub/gnu/sed/sed-2.05.bin.README>:

   "When building gcc on a hppa*-*-hpux10 platform, the `fixincludes'
   step (which involves running a sed script) fails because of a bug
   in the vendor's implementation of sed.  Currently the only known
   workaround is to install GNU sed before building gcc.  The file
   sed-2.05.bin.hpux10 is a precompiled binary for that platform."

7.12. SunOS sed v4.1

   (1) Bug occurs in RE pattern matching when a non-null '[char-set]*'
   is followed by a null '\NUM' pattern recall, illustrated here and
   reported by Greg Ubben:

       s/\(a\)\(b*\)cd\1[0-9]*\2foo/bar/  # between '[0-9]*' and '\2'
       s/\(a\{0,1\}\).\{0,1\}\1/bar/      # between '.\{0,1\}' and '\1'

   Workaround: add a do-nothing 'X*' expression which will not match
   any characters on the line between the two components. E.g.,

       s/\(a\)\(b*\)cd\1[0-9]*X*\2foo/bar/
       s/\(a\{0,1\}\).\{0,1\}X*\1/bar/

7.13. SunOS sed v5.6

   (1) If grouping is followed by an asterisk, SunOS sed does not match
   the null string, which it should do. The following command:

       echo foo | sed 's/f\(NO-MATCH\)*/g\1/'

   should transform "foo" to "goo" under normal versions of sed.
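
   The same command can be used as a quick check of any sed. This
   sketch assumes the sed under test is installed as "sed"; with a
   current GNU sed it gives the expected result.

```shell
# Sketch: a starred group that matches zero times leaves \1 empty.
out=$(echo foo | sed 's/f\(NO-MATCH\)*/g\1/')
echo "$out"
```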

7.14. Ultrix sed v4.3

   (1) If grouping is followed by an asterisk, Ultrix sed replies with
   "command garbled", as shown in the following example:

       echo foo | sed 's/f\(NO-MATCH\)*/g\1/'

   (2) If grouping is followed by a numeric operator such as \{0,9\},
   Ultrix sed does not find the match.

7.15. Digital Unix sed

   (1) The following comes from the man pages for sed distributed with
   new, 1998 versions of Digital Unix (reformatted to fit our
   margins):

   [Digital]  The h subcommand for sed does not work properly.  When
   you use the  h subcommand to place text into the hold area, only
   the last line of the specified text is saved.  You can use the H
   subcommand to append text to the hold area. The H subcommand and
   all others dealing with the hold area work correctly.

   (2) "$d" command issues an error message, "cannot parse".  Reported
   by Carlos Duarte on 8 June 1998.

[end-of-file]