Archive-Name: editor-faq/sed
d0cde9
Posting-Frequency: irregular
d0cde9
Last-modified: 10 March 2003
d0cde9
Version: 015
d0cde9
URL: http://sed.sourceforge.net/sedfaq.html
d0cde9
Maintainer: Eric Pement (pemente@northpark.edu)
d0cde9

d0cde9
                            THE SED FAQ
d0cde9

d0cde9
                  Frequently Asked Questions about
d0cde9
                       sed, the stream editor
d0cde9

d0cde9
CONTENTS
d0cde9

d0cde9
1. GENERAL INFORMATION
d0cde9
1.1. Introduction - How this FAQ is organized
d0cde9
1.2. Latest version of the sed FAQ
d0cde9
1.3. FAQ revision information
d0cde9
1.4. How do I add a question/answer to the sed FAQ?
d0cde9
1.5. FAQ abbreviations
d0cde9
1.6. Credits and acknowledgements
d0cde9
1.7. Standard disclaimers
d0cde9

d0cde9
2. BASIC SED
d0cde9
2.1. What is sed?
d0cde9
2.2. What versions of sed are there, and where can I get them?
d0cde9

d0cde9
2.2.1. Free versions
d0cde9

d0cde9
2.2.1.1. Unix platforms
d0cde9
2.2.1.2. OS/2
d0cde9
2.2.1.3. Microsoft Windows (Win3x, Win9x, WinNT, Win2K)
d0cde9
2.2.1.4. MS-DOS
d0cde9
2.2.1.5. CP/M
d0cde9
2.2.1.6. Macintosh v8 or v9
d0cde9

d0cde9
2.2.2. Shareware and Commercial versions
d0cde9

d0cde9
2.2.2.1. Unix platforms
d0cde9
2.2.2.2. OS/2
d0cde9
2.2.2.3. Windows 95/98, Windows NT, Windows 2000
d0cde9
2.2.2.4. MS-DOS
d0cde9

d0cde9
2.3. Where can I learn to use sed?
d0cde9

d0cde9
2.3.1. Books
d0cde9
2.3.2. Mailing list
d0cde9
2.3.3. Tutorials, electronic text
d0cde9
2.3.4. General web and ftp sites
d0cde9

d0cde9
3. TECHNICAL
d0cde9
3.1. More detailed explanation of basic sed
d0cde9
3.1.1.  Regular expressions on the left side of "s///"
d0cde9
3.1.2.  Escape characters on the right side of "s///"
d0cde9
3.1.3.  Substitution switches
d0cde9
3.2. Common one-line sed scripts. How do I . . . ?
d0cde9

d0cde9
      - double/triple-space a file?
d0cde9
      - convert DOS/Unix newlines?
d0cde9
      - delete leading/trailing spaces?
d0cde9
      - do substitutions on all/certain lines?
d0cde9
      - delete consecutive blank lines?
d0cde9
      - delete blank lines at the top/end of the file?
d0cde9

d0cde9
3.3. Addressing and address ranges
d0cde9
3.4. Address ranges in GNU sed and HHsed
d0cde9
3.5. Debugging sed scripts
d0cde9
3.6. Notes about s2p, the sed-to-perl translator
d0cde9
3.7. GNU/POSIX extensions to regular expressions
d0cde9

d0cde9
4. EXAMPLES
d0cde9
   ONE-CHARACTER QUESTIONS
d0cde9
4.1.  How do I insert a newline into the RHS of a substitution?
d0cde9
4.2.  How do I represent control-codes or non-printable characters?
d0cde9
4.3.  How do I convert files with toggle characters, like +this+,
d0cde9
      to look like [i]this[/i]?
d0cde9

d0cde9
   CHANGING STRINGS
d0cde9
4.10. How do I perform a case-insensitive search?
d0cde9
4.11. How do I match only the first occurrence of a pattern?
d0cde9
4.12. How do I parse a comma-delimited (CSV) data file?
d0cde9
4.13. How do I handle fixed-length, columnar data?
d0cde9
4.14. How do I commify a string of numbers?
d0cde9
4.15. How do I prevent regex expansion on substitutions?
d0cde9
4.16. How do I convert a string to all lowercase or capital letters?
d0cde9

d0cde9
   CHANGING BLOCKS (consecutive lines)
d0cde9
4.20. How do I change only one section of a file?
d0cde9
4.21. How do I delete or change a block of text if the block contains
d0cde9
      a certain regular expression?
d0cde9
4.22. How do I locate a paragraph of text if the paragraph contains a
d0cde9
      certain regular expression?
d0cde9
4.23. How do I match a block of specific consecutive lines?
d0cde9
4.23.1.  Try to use a "/range/, /expression/"
d0cde9
4.23.2.  Try to use a "multi-line\nexpression"
d0cde9
4.23.3.  Try to use a block of "literal strings"
d0cde9
4.24. How do I address all the lines between RE1 and RE2, excluding the lines themselves?
d0cde9
4.25. How do I join two lines if line #1 ends in a [certain string]?
d0cde9
4.26. How do I join two lines if line #2 begins in a [certain string]?
d0cde9
4.27. How do I change all paragraphs to long lines?
d0cde9

d0cde9
   SHELL AND ENVIRONMENT
d0cde9
4.30.   How do I read environment variables with sed ...
d0cde9
4.30.1.   ... on Unix platforms?
4.30.2.   ... on MS-DOS or 4DOS platforms?
d0cde9
4.32.   How do I export or pass variables back into the environment ...
d0cde9
4.32.1.   ... on Unix platforms?
d0cde9
4.32.2.   ... on MS-DOS or 4DOS platforms?
d0cde9
4.33.   How do I handle shell quoting in sed?
d0cde9

d0cde9
   FILES, DIRECTORIES, AND PATHS
d0cde9
4.40.  How do I read (insert/add) a file at the top of a textfile?
d0cde9
4.41.  How do I make substitutions in every file in a directory, or
d0cde9
        in a complete directory tree?
d0cde9
4.41.1.   ... ssed solution
d0cde9
4.41.2.   ... Unix solution
d0cde9
4.41.3.   ... DOS solution
d0cde9
4.42.  How do I replace "/some/UNIX/path" in a substitution?
d0cde9
4.43.  How do I replace "C:\SOME\DOS\PATH" in a substitution?
d0cde9
4.44.  How do I emulate file-includes, using sed?
d0cde9

d0cde9
5. WHY ISN'T THIS WORKING?
d0cde9
5.1.  Why don't my variables like $var get expanded in my sed script?
d0cde9
5.2.  I'm using 'p' to print, but I have duplicate lines sometimes.
d0cde9
5.3.  Why does my DOS version of sed process a file part-way through
d0cde9
      and then quit?
d0cde9
5.4.  My RE isn't matching/deleting what I want it to. (Or, "Greedy vs.
d0cde9
      stingy pattern matching")
d0cde9
5.5.  What is CSDPMI*B.ZIP and why do I need it?
d0cde9
5.6.  Where are the man pages for GNU sed?
d0cde9
5.7.  How do I tell what version of sed I am using?
d0cde9
5.8.  Does sed issue an exit code?
d0cde9
5.9.  The 'r' command isn't inserting the file into the text.
d0cde9
5.10. Why can't I match or delete a newline using the \n escape
d0cde9
      sequence? Why can't I match 2 or more lines using \n?
d0cde9
5.11. My script aborts with an error message, "event not found".
d0cde9

d0cde9
6. OTHER ISSUES
d0cde9
6.1.  I have a problem that stumps me. Where can I get help?
d0cde9
6.2.  How does sed compare with awk, perl, and other utilities?
d0cde9
6.3.  When should I use sed?
d0cde9
6.4.  When should I NOT use sed?
d0cde9
6.5.  When should I ignore sed and use Awk or Perl instead?
d0cde9
6.6.  Known limitations among sed versions
d0cde9
6.7.  Known incompatibilities between sed versions
d0cde9

d0cde9
6.7.1. Issuing commands from the command line
d0cde9
6.7.2. Using comments (prefixed by the '#' sign)
d0cde9
6.7.3. Special syntax in REs
d0cde9
6.7.4. Word boundaries
d0cde9
6.7.5. Commands which operate differently
d0cde9

d0cde9
7. KNOWN BUGS AMONG SED VERSIONS
d0cde9
7.1. ssed v3.59
d0cde9
7.2. GNU sed v4.0 - v4.0.5
d0cde9
7.3. GNU sed v3.02.80
d0cde9
7.4. GNU sed v3.02
d0cde9
7.5. GNU sed v2.05
d0cde9
7.6. GNU sed v1.18
d0cde9
7.7. GNU sed v1.03
d0cde9
7.8. sed v1.6 (Briscoe)
d0cde9
7.9. sed v1.5 (Helman)
d0cde9
7.10. sedmod v1.0 (Chen)
d0cde9
7.11. HP-UX sed
d0cde9
7.12. SunOS sed v4.1
d0cde9
7.13. SunOS sed v5.6
d0cde9
7.14. Ultrix sed v4.3
d0cde9
7.15. Digital Unix sed
d0cde9

d0cde9

d0cde9
------------------------------
d0cde9

d0cde9
1. GENERAL INFORMATION
d0cde9

d0cde9
1.1. Introduction - How this FAQ is organized
d0cde9

d0cde9
   This FAQ is organized to answer common (and some uncommon)
d0cde9
   questions about sed, quickly. If you see a term or abbreviation in
d0cde9
   the examples that seems unclear, see if the term is defined in
d0cde9
   section 1.5. If not, send your comment to pemente[at]northpark.edu.
d0cde9

d0cde9
1.2. Latest version of the sed FAQ
d0cde9

d0cde9
   The newest version of the sed FAQ is usually here:
d0cde9

d0cde9
       http://sed.sourceforge.net/sedfaq.html (HTML version)
d0cde9
       http://sed.sourceforge.net/sedfaq.txt  (plain text)
d0cde9
       http://www.student.northpark.edu/pemente/sed/sedfaq.html
d0cde9
       http://www.student.northpark.edu/pemente/sed/sedfaq.txt
d0cde9
       http://www.faqs.org/faqs/editor-faq/sed
d0cde9
       ftp://rtfm.mit.edu/pub/faqs/editor-faq/sed
d0cde9

d0cde9
   Another FAQ file on sed by a different author can be found here:
d0cde9

d0cde9
       http://www.dreamwvr.com/sed-info/sed-faq.html
d0cde9

d0cde9
1.3. FAQ revision information
d0cde9

d0cde9
   In the plaintext version, changes are shown by a vertical bar (|)
d0cde9
   placed in column 78 of the affected lines. To remove the vertical
d0cde9
   bars (use double quotes for MS-DOS):
d0cde9

d0cde9
     sed 's/  *|$//' sedfaq.txt > sedfaq2.txt
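
   As a quick check of what that command does, here is a small sketch
   that runs the same substitution on two sample lines. The sample
   text and the variable names are invented for illustration; only the
   sed expression comes from the FAQ.

```shell
# Two invented sample lines: the first ends with a change bar, the second does not.
sample='changed text       |
unchanged text'

# Remove the run of spaces and the bar at the end of a line.
# Lines without a trailing bar pass through untouched.
stripped=$(printf '%s\n' "$sample" | sed 's/  *|$//')

printf '%s\n' "$stripped"
```

   In a POSIX basic regular expression the '|' is an ordinary
   character, so it needs no backslash here.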
d0cde9

d0cde9
   In the HTML version, vertical bars do not appear. New or altered
d0cde9
   portions of the FAQ are indicated by printing in dark blue type.
d0cde9

d0cde9
   In the text version, words needing emphasis may be surrounded by
d0cde9
   the underscore '_' or the asterisk '*'. In the HTML version, these
d0cde9
   are changed to italics and boldface, respectively.
d0cde9

d0cde9
1.4. How do I add a question/answer to the sed FAQ?
d0cde9

d0cde9
   Word your question briefly and send it to pemente[at]northpark.edu,
d0cde9
   indicating your proposed change. We'll post it on the sed-users
d0cde9
   mailing list (see section 2.3.2) and discuss it there. If it's
d0cde9
   good, your contribution will be added to the next edition.
d0cde9

d0cde9
1.5. FAQ abbreviations
d0cde9

d0cde9
       files = one or more filenames, separated by whitespace
d0cde9
       gsed  = GNU sed
d0cde9
       ssed  = super-sed
d0cde9
       RE    = Regular Expressions supported by sed
d0cde9
       LHS   = the left-hand side ("find" part) of "s/find/repl/" command
d0cde9
       RHS   = the right-hand side ("replace" part) of "s/find/repl/" cmd
d0cde9
       nn+   = version _nn_ or higher (e.g., "15+" = version 1.5 and above)
d0cde9

d0cde9
   files: "files" stands for one or more filenames entered on the
d0cde9
   command line. The names may include any wildcards your shell
d0cde9
   understands (such as ``zork*'' or ``Aug[4-9].let''). Sed will
d0cde9
   process each filename passed to it by the shell.
d0cde9

d0cde9
   RE: For details on regular expressions, see section 3.1.1., below.
d0cde9

d0cde9
1.6. Credits and acknowledgements
d0cde9

d0cde9
   Many of the ideas for this FAQ were taken from the Awk FAQ:
d0cde9
       http://www.faqs.org/faqs/computer-lang/awk/faq/
d0cde9
       ftp://rtfm.mit.edu/pub/usenet/comp.lang.awk/faq
d0cde9

d0cde9
   and from the old Perl FAQ:
d0cde9
       http://www.perl.com/doc/FAQs/FAQ/oldfaq-html/index.html
d0cde9

d0cde9
   The following individuals have contributed significantly to this
d0cde9
   document, and have provided input and wording suggestions for
d0cde9
   questions, answers, and script examples. Credit goes to these
d0cde9
   contributors (in alphabetical order by last name):
d0cde9

d0cde9
   Al Aab, Yiorgos Adamopoulos, Paolo Bonzini, Walter Briscoe, Jim
d0cde9
   Dennis, Carlos Duarte, Otavio Exel, Sven Guckes, Aurelio Jargas,
d0cde9
   Mark Katz, Toby Kelsey, Eric Pement, Greg Pfeiffer, Ken Pizzini,
d0cde9
   Niall Smart, Simon Taylor, Peter Tillier, Greg Ubben, Laurent
d0cde9
   Vogel.
d0cde9

d0cde9
1.7. Standard disclaimers
d0cde9

d0cde9
   While a serious attempt has been made to ensure the accuracy of the
d0cde9
   information presented herein, the contributors and maintainers of
d0cde9
   this document do not claim the absence of errors and make no
d0cde9
   warranties on the information provided. If you notice any mistakes,
   please let us know so we can fix them.
d0cde9

d0cde9
------------------------------
d0cde9

d0cde9
2. BASIC SED
d0cde9

d0cde9
2.1. What is sed?
d0cde9

d0cde9
   "sed" stands for Stream EDitor. Sed is a non-interactive editor,
d0cde9
   written by the late Lee E. McMahon in 1973 or 1974. A brief history
d0cde9
   of sed's origins may be found in an early history of the Unix
d0cde9
   tools, at <http://www.columbia.edu/~rh120/ch106.x09>.
d0cde9

d0cde9
   Instead of altering a file interactively by moving the cursor on
d0cde9
   the screen (as with a word processor), the user sends a script of
d0cde9
   editing instructions to sed, plus the name of the file to edit (or
d0cde9
   the text to be edited may come as output from a pipe). In this
d0cde9
   sense, sed works like a filter -- deleting, inserting and changing
d0cde9
   characters, words, and lines of text. Its range of activity goes
d0cde9
   from small, simple changes to very complex ones.
d0cde9

d0cde9
   Sed reads its input from stdin (Unix shorthand for "standard
   input," normally the keyboard or a pipe) or from files (or both),
   and sends the results to stdout ("standard output," normally the
   console or screen). Most people use sed first for its substitution
   features. Sed is often used as a find-and-replace tool.
d0cde9

d0cde9
     sed 's/Glenn/Harold/g' oldfile >newfile
d0cde9

d0cde9
   will replace every occurrence of "Glenn" with the word "Harold",
d0cde9
   wherever it occurs in the file. The "find" portion is a regular
d0cde9
   expression ("RE"), which can be a simple word or may contain
d0cde9
   special characters to allow greater flexibility (for example, to
d0cde9
   prevent "Glenn" from also matching "Glennon").
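
   To make the "Glennon" problem concrete, here is a minimal sketch.
   The sample sentence and variable names are invented. The guarded
   version uses only POSIX basic regular expressions and checks just
   the character that follows the name, so it is an approximation of
   a real word-boundary test (some versions of sed offer \<...\> for
   that purpose).

```shell
# Invented sample text containing the name and a longer word that embeds it.
line='Glenn met Glennon; Glenn left.'

# Plain substitution: "Glennon" is changed too, which is usually not wanted.
plain=$(printf '%s\n' "$line" | sed 's/Glenn/Harold/g')

# Guarded substitution: replace "Glenn" only when it is followed by a
# non-letter (which is kept via \1) or by the end of the line.
guarded=$(printf '%s\n' "$line" |
    sed -e 's/Glenn\([^[:alpha:]]\)/Harold\1/g' -e 's/Glenn$/Harold/')

printf '%s\n%s\n' "$plain" "$guarded"
```

   Note that this sketch does not guard the left side of the word, so
   a string such as "McGlenn" would still be changed.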
d0cde9

d0cde9
   My very first use of sed was to add 8 spaces to the left side of a
d0cde9
   file, so when I printed it, the printing wouldn't begin at the
d0cde9
   absolute left edge of a piece of paper.
d0cde9

d0cde9
     sed 's/^/        /' myfile >newfile   # my first sed script
d0cde9
     sed 's/^/        /' myfile | lp       # my next sed script
d0cde9

d0cde9
   Then I learned that sed could display only one paragraph of a file,
d0cde9
   beginning at the phrase "and where it came" and ending at the
d0cde9
   phrase "for all people". My script looked like this:
d0cde9

d0cde9
     sed -n '/and where it came/,/for all people/p' myfile
d0cde9

d0cde9
   Sed's normal behavior is to print (i.e., display or show on screen)
d0cde9
   the entire file, including the parts that haven't been altered,
d0cde9
   unless you use the -n switch. The "-n" stands for "no output". This
d0cde9
   switch is almost always used in conjunction with a 'p' command
d0cde9
   somewhere, which says to print only the sections of the file that
d0cde9
   have been specified. The -n switch with the 'p' command allows
   parts of a file to be printed (i.e., sent to the console).
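
   The effect of -n is easiest to see by running the same range
   command with and without it. The following is a small sketch on
   invented input; the variable names are ours.

```shell
# Five invented input lines.
input='alpha
beta
gamma
delta
epsilon'

# With -n: automatic printing is off, so only the lines in the range
# from /beta/ through /delta/ are shown.
with_n=$(printf '%s\n' "$input" | sed -n '/beta/,/delta/p')

# Without -n: every line is printed once automatically, and 'p' prints
# the lines inside the range a second time.
without_n=$(printf '%s\n' "$input" | sed '/beta/,/delta/p')

printf '%s\n' '--- with -n ---' "$with_n" '--- without -n ---' "$without_n"
```

   Forgetting -n when using 'p' is a common source of unexpected
   duplicate lines.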
d0cde9

d0cde9
   Next, I found that sed could show me only (say) lines 12-18 of a
d0cde9
   file and not show me the rest. This was very handy when I needed to
d0cde9
   review only part of a long file and I didn't want to alter it.
d0cde9

d0cde9
     # the 'p' stands for print
d0cde9
     sed -n 12,18p myfile
d0cde9

d0cde9
   Likewise, sed could show me everything else BUT those particular
d0cde9
   lines, without physically changing the file on the disk:
d0cde9

d0cde9
     # the 'd' stands for delete
d0cde9
     sed 12,18d myfile
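
   The 'p' and 'd' forms are complementary, which a short sketch on
   invented input may make clearer. A smaller range (2,4) is used here
   so the whole result fits on screen; the variable names are ours.

```shell
# Six invented input lines, one word per line.
input=$(printf '%s\n' one two three four five six)

# Show only lines 2 through 4.
kept=$(printf '%s\n' "$input" | sed -n '2,4p')

# Show everything except lines 2 through 4. The input itself is not changed.
rest=$(printf '%s\n' "$input" | sed '2,4d')

printf '%s\n' '--- 2,4p ---' "$kept" '--- 2,4d ---' "$rest"
```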
d0cde9

d0cde9
   Sed could also double-space my single-spaced file when it came time
d0cde9
   to print it:
d0cde9

d0cde9
     sed G myfile >newfile
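
   Why does a single 'G' double-space a file? The 'G' command appends
   a newline plus the contents of the hold space to the current line,
   and in this one-command script the hold space is always empty. The
   sketch below, on two invented input lines, counts the output lines
   to confirm it; the variable names are ours.

```shell
# 'G' adds a newline and the (empty) hold space after each input line,
# so every line is followed by one blank line.
doubled=$(printf 'first\nsecond\n' | sed G)

# Command substitution drops the final blank line, so count the lines
# directly from the pipe instead. 'tr' removes padding some 'wc' versions add.
line_count=$(printf 'first\nsecond\n' | sed G | wc -l | tr -d ' ')

printf '%s\n' "$doubled"
printf 'output lines: %s\n' "$line_count"
```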
d0cde9

d0cde9
   If you have many editing commands (for deleting, adding,
d0cde9
   substituting, etc.) which might take up several lines, those
d0cde9
   commands can be put into a separate file and all of the commands in
   the file applied to the file being edited:
d0cde9

d0cde9
     # 'script.sed' is the file of commands
     # 'myfile' is the file being changed
     sed -f script.sed myfile
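
   For readers who have never written a script file, here is a
   minimal sketch of the whole round trip. The file contents are
   invented, and the files are created in a temporary directory (via
   mktemp, which we assume is available) so nothing real is touched.
   The '#' comment lines inside the script are accepted by most
   current versions of sed, but not by every older one.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)

# 'myfile' is invented sample input.
printf '%s\n' 'Glenn wrote this.' '' 'DRAFT: remove me' 'Glenn signed it.' \
    > "$workdir/myfile"

# 'script.sed' holds one editing command per line.
cat > "$workdir/script.sed" <<'EOF'
# delete lines that start with DRAFT
/^DRAFT/d
# delete empty lines
/^$/d
# rename Glenn everywhere
s/Glenn/Harold/g
EOF

# Apply every command in the script to each line of the input file.
result=$(sed -f "$workdir/script.sed" "$workdir/myfile")
printf '%s\n' "$result"

rm -rf "$workdir"
```

   Each input line is run through the commands in order; once a 'd'
   command deletes a line, the remaining commands are skipped for it.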
d0cde9

d0cde9
   It is not our intention to convert this FAQ file into a full-blown
d0cde9
   sed tutorial (for good tutorials, see section 2.3). Rather, we hope
d0cde9
   this gives the complete novice a few ideas of how sed can be used.
d0cde9

d0cde9
2.2. What versions of sed are there, and where can I get them?
d0cde9

d0cde9
2.2.1. Free versions
d0cde9

d0cde9
   Note: "Free" does not mean "public domain" nor does it necessarily
d0cde9
   mean you will never be charged for it. All versions of sed in this
d0cde9
   section except the CP/M versions are distributed under the GNU
   General Public License and are "free software" by that standard (for
d0cde9
   details, see http://www.gnu.org/philosophy/free-sw.html). This
d0cde9
   means you can get the source code and develop it further.
d0cde9

d0cde9
   At the URLs listed in this category, sed binaries or source code
d0cde9
   can be downloaded and used without fees or license payments.
d0cde9

d0cde9
2.2.1.1. Unix platforms
d0cde9

d0cde9
   ssed v3.60
d0cde9
   ssed is the version recommended by the FAQ maintainers, since it
d0cde9
   shares the same codebase with GNU sed, has the most options, and is
d0cde9
   free software (you can get the source). Though earlier versions of
   ssed were distributed, sites for these are not listed.
d0cde9

d0cde9
       http://sed.sourceforge.net/grabbag/ssed
d0cde9
       http://freshmeat.net/project/sed/
d0cde9

d0cde9
   GNU sed v4.0.5
d0cde9
   This is the latest official version of GNU sed. It offers in-place
d0cde9
   text replacement as an option switch.
d0cde9

d0cde9
       ftp://ftp.gnu.org/pub/gnu/sed/sed-4.0.5.tar.gz
d0cde9
       http://freshmeat.net/project/sed
d0cde9

d0cde9
   BSD multi-byte sed (Japanese)
d0cde9
   Based on the latest version of GNU sed, which supports multi-byte
d0cde9
   characters.
d0cde9

d0cde9
       ftp://ftp1.freebsd.org/pub/FreeBSD/FreeBSD-stable/packages/Latest/ja-sed.tgz
d0cde9

d0cde9
   GNU sed v3.02.80
d0cde9
   An alpha test release which was the base for the development of
d0cde9
   ssed and GNU sed v4.0.
d0cde9

d0cde9
       ftp://alpha.gnu.org/pub/gnu/sed/sed-3.02.80.tar.gz
d0cde9

d0cde9
   GNU sed v3.02a
d0cde9
   Interim version with most features of GNU sed v3.02.80.
d0cde9

d0cde9
   GNU sed v3.02
d0cde9
       ftp://ftp.gnu.org/pub/gnu/sed/sed-3.02.tar.gz
d0cde9

d0cde9
   Precompiled versions:
d0cde9

d0cde9
   GNU sed v3.02-8
d0cde9
   source code and binaries for Debian GNU/Linux
d0cde9

d0cde9
       http://www.debian.org/Packages/stable/base/sed.html
d0cde9

d0cde9
   For some time, the GNU project <http://www.gnu.org> used Eric S.
d0cde9
   Raymond's version of sed (ESR sed v1.1), but eventually dropped it
d0cde9
   because it had too many built-in limits. In 1991 Howard Helman
d0cde9
   modified the GNU/ESR sed and produced a flexible version of sed
d0cde9
   v1.5 available at several sites (Helman's version permitted things
d0cde9
   like \<...\> to delimit word boundaries, \xHH to enter hex code and
d0cde9
   \n to indicate newlines in the replace string). This version did
d0cde9
   not catch on with the GNU project and their version of sed has
d0cde9
   moved in a similar but different direction.
d0cde9

d0cde9
   sed v1.3, by Eric Steven Raymond (released 4 June 1998)
d0cde9
       http://catb.org/~esr/sed-1.3.tar.gz
d0cde9

d0cde9
   Eric Raymond <esr@snark.thyrsus.com> wrote one of the earliest
d0cde9
   versions of sed. On his website <http://www.catb.org/~esr/> which
d0cde9
   also distributes many freeware utilities he has written or worked
d0cde9
   on, he describes sed v1.1 this way:
d0cde9

d0cde9
   "This is the fast, small sed originally distributed in the GNU
d0cde9
   toolkit and still distributed with Minix. The GNU people ditched it
d0cde9
   when they built their own sed around an enhanced regex package --
d0cde9
   but it's still better for some uses (in particular, faster and less
d0cde9
   memory-intensive)." (Version 1.3 fixes an unidentified bug and adds
d0cde9
   the L command to hexdump the current pattern space.)
d0cde9

d0cde9
2.2.1.2. OS/2
d0cde9

d0cde9
   GNU sed v3.02.80
d0cde9
       http://www2s.biglobe.ne.jp/~vtgf3mpr/gnu/sed.htm
d0cde9

d0cde9
   GNU sed v3.02
d0cde9
       http://hobbes.nmsu.edu/pub/os2/util/file/sed-3_02-r2-bin.zip # binaries
d0cde9
       http://hobbes.nmsu.edu/pub/os2/util/file/sed-3_02-r2.zip     # source
d0cde9

d0cde9
2.2.1.3. Microsoft Windows (Win3x, Win9x, WinNT, Win2K)
d0cde9

d0cde9
   GNU sed v4.0.5
d0cde9
   32-bit binaries and docs. Precompiled versions not available (yet).
d0cde9

d0cde9
   GNU sed v3.02.80
d0cde9
   32-bit binaries and docs, using DJGPP compiler. For details on new
d0cde9
   features, see Unix section, above.
d0cde9

d0cde9
       http://www.student.northpark.edu/pemente/sed/sed3028a.zip # DOS binaries
d0cde9
       ftp://alpha.gnu.org/pub/gnu/sed/sed-3.02.80.tar.gz        # source
d0cde9
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028b.zip # binaries
d0cde9
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028d.zip # docs
d0cde9
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028s.zip # source
d0cde9

d0cde9
   GNU sed v2.05
d0cde9
   32-bit binaries, no docs. Requires 80386 DX (SX will not run) and
d0cde9
   must be run in a DOS window or in a full screen DOS session under
d0cde9
   Microsoft Windows. Will not run in MS-DOS mode (outside Win/Win95).
d0cde9
   We recommend using the latest version of GNU sed.
d0cde9
       http://www.simtel.net/pub/win95/prog/gsed205b.zip
d0cde9
       ftp://ftp.cdrom.com/pub/simtelnet/win95/prog/gsed205b.zip
d0cde9

d0cde9
   GNU sed v1.03
d0cde9
   modified by Frank Whaley.
d0cde9

d0cde9
   This version was part of the "Virtually UN*X" toolset, hosted by
d0cde9
   itribe.net; that website is now closed. Gsed v1.03 supported Win9x
d0cde9
   long filenames, as well as hex, decimal, binary, and octal
d0cde9
   character representations.
d0cde9

d0cde9
   The Cygwin toolkit:
d0cde9
       http://www.cygwin.com
d0cde9

d0cde9
   Formerly known as "GNU-Win32 tools." According to their home page,
d0cde9
   "The Cygwin tools are Win32 ports of the popular GNU development
d0cde9
   tools for Windows NT, 95 and 98. They function through the use of
d0cde9
   the Cygwin library which provides a UNIX-like API on top of the
d0cde9
   Win32 API." The version of sed used is GNU sed v3.02.
d0cde9

d0cde9
   Minimalist GNU for Windows (MinGW):
d0cde9
       http://www.mingw.org
d0cde9
       http://mingw.sourceforge.net
d0cde9

d0cde9
   According to their home page, "MinGW ('Minimalist GNU for Windows')
d0cde9
   refers to a set of runtime headers, used in building a compiler
d0cde9
   system based on the GNU GCC and binutils projects. It compiles and
d0cde9
   links code to be run on Win32 platforms ... MinGW uses Microsoft
d0cde9
   runtime libraries, distributed with the Windows operating system."
d0cde9
   The version of sed used is GNU sed v3.02.
d0cde9

d0cde9
   sed v1.5 (a/k/a HHsed), by Howard Helman
d0cde9
   Compiled with Mingw32 for 32-bit environments described above. This
d0cde9
   version should support Win95 long filenames.
d0cde9
       http://www.dbnet.ece.ntua.gr/~george/sed/OLD/sed15.exe
d0cde9
       http://www.student.northpark.edu/pemente/sed/sed15exe.zip
d0cde9

d0cde9
2.2.1.4. MS-DOS
d0cde9

d0cde9
   sed v1.6 (from HHsed), by Walter Briscoe
d0cde9

d0cde9
   This is a forthcoming version, now in beta testing, but with many
d0cde9
   new features. It corrects all the bugs in sed v1.5, and adds the
d0cde9
   best features of sedmod v1.0 (below). It is available in 16-bit and
d0cde9
   32-bit compiled versions for MS-DOS. Sorry, no URLs available yet.
d0cde9

d0cde9
   sed v1.5 (a/k/a HHsed), by Howard Helman
d0cde9
   uncompiled source code (Turbo C)
d0cde9
       ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15.zip
d0cde9
       ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15.zip
d0cde9

d0cde9
   DOS executable and documentation
d0cde9
       ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15x.zip
d0cde9
       ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15x.zip
d0cde9

d0cde9
   sedmod v1.0, by Hern Chen
d0cde9
       http://www.ptug.org/sed/SEDMOD10.ZIP
d0cde9
       http://www.student.northpark.edu/pemente/sed/sedmod10.zip
d0cde9
       ftp://garbo.uwasa.fi/pc/unix/sedmod10.zip
d0cde9

d0cde9
   GNU sed v3.02.80
d0cde9
   See section 2.2.1.3 ("Microsoft Windows"), above.
d0cde9

d0cde9
   GNU sed v2.05
d0cde9
   Does not run under MS-DOS.
d0cde9

d0cde9
   GNU sed v1.18
d0cde9
   32-bit binaries and source, using DJGPP compiler. Requires 80386 SX
d0cde9
   or better. Also requires 3 CWS*.EXE extenders on the path. See
d0cde9
   section 5.5 ("What is CSDPMI*B.ZIP and why do I need it?"), below.
d0cde9
   We recommend using a newer version of GNU sed.
d0cde9
       http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
d0cde9
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
d0cde9
       http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip
d0cde9
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip
d0cde9

d0cde9
   GNU sed v1.06
d0cde9
   16-bit binaries and source. Should run under any MS-DOS system.
d0cde9
       http://www.simtel.net/pub/gnu/gnuish/sed106.zip
d0cde9
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/gnuish/sed106.zip
d0cde9

d0cde9
2.2.1.5. CP/M
d0cde9

d0cde9
   ssed v2.2, by Chuck A. Forsberg
d0cde9

d0cde9
   Written for CP/M, ssed (for "small/stupid stream editor") supports
d0cde9
   only the a(ppend), c(hange), d(elete) and i(nsert) options, and
d0cde9
   apparently doesn't support regular expressions. A -u switch will
d0cde9
   "unsqueeze" compressed files and was used mainly in conjunction
d0cde9
   with DIF.COM for source code maintenance. (file: ssed22.lbr)
d0cde9

d0cde9
   change, by Michael M. Rubenstein
d0cde9

d0cde9
   Rubenstein released a version of sed called CHANGE.COM (the
d0cde9
   TTOOLS.LBR archive member CHANGE.CZM is a "crunched" file).
d0cde9
   CHANGE.COM supports full RE's except grouping and backreferences,
d0cde9
   and its only function is global substitution. (file: ttools.lbr)
d0cde9

d0cde9
2.2.1.6. Macintosh v8 or v9
d0cde9

d0cde9
   Since sed is a command-line utility, it is not customary to think
d0cde9
   of sed being used on a Mac. Nonetheless, the following instructions
d0cde9
   from Aurelio Jargas describe the process for running sed on MacOS
   version 8 or 9.
d0cde9

d0cde9
   (1) Download and install the Apple DiskCopy application
d0cde9

d0cde9
       ftp://ftp.apple.com/developer/Development_Kits
d0cde9

d0cde9
   (2) Download and install Apple MPW
d0cde9

d0cde9
       ftp://ftp.apple.com/developer/Tool_Chest/Core_Mac_OS_Tools/MPW_etc./
d0cde9

d0cde9
   (3) Download and expand Matthias Neeracher's GNU sed for MPW. (They
d0cde9
   seem to have misnumbered the sed filename.)
d0cde9

d0cde9
       ftp://sunsite.cnlab-switch.ch/software/platform/macos/src/mpw_c/sed-2.03.sit.bin
d0cde9

d0cde9
   (4) Enter the sed-3.02 directory and double-click the 'sed' file
d0cde9

d0cde9
   (5) MPW Shell will open up. It will be a command window instead of
d0cde9
   a command line, but sed should work as expected. For example:
d0cde9

d0cde9
       echo aa | sed 's/a/Z/g'<ENTER>
d0cde9

d0cde9
   Note that ENTER is different from RETURN on an iMac. Apple *also*
d0cde9
   has its own version of sed on MPW, called "StreamEdit", with a
d0cde9
   syntax fairly similar to that of normal sed.
d0cde9

d0cde9
2.2.2. Shareware and Commercial versions
d0cde9

d0cde9
2.2.2.1. Unix platforms
d0cde9

d0cde9
       [ Additional information needed. ]
d0cde9

d0cde9
2.2.2.2. OS/2
d0cde9

d0cde9
   Hamilton Labs:
d0cde9
       http://www.hamiltonlabs.com/cshell.htm
d0cde9

d0cde9
   A sizable set of Unix/C shell utilities designed for OS/2. Price is
d0cde9
   $350 in the US, $395 elsewhere, with FedEx shipping, unconditional
d0cde9
   guarantee, unlimited support and free updates. A demo version of
d0cde9
   the suite can be downloaded from this site, but a stand-alone copy
d0cde9
   of sed is not available.
d0cde9

d0cde9
2.2.2.3. Windows 95/98, Windows NT, Windows 2000
d0cde9

d0cde9
   Hamilton Labs:
d0cde9
       http://www.hamiltonlabs.com/cshell.htm
d0cde9

d0cde9
   A sizable set of Unix/C shell utilities designed for Win9x, WinNT,
d0cde9
   and Win2K. Price is $350 in the US, $395 elsewhere, with FedEx
d0cde9
   shipping, unconditional guarantee, unlimited support and free
d0cde9
   updates. A demo version of the suite can be downloaded from this
d0cde9
   site, but a stand-alone copy of sed is not available.
d0cde9

d0cde9
   Interix:
d0cde9
       http://www.interix.com
d0cde9

d0cde9
   Interix (formerly known as OpenNT) is advertised as "a complete
d0cde9
   UNIX system environment running natively on Microsoft Windows NT",
d0cde9
   and is licensed and supported by Softway Systems. It offers over
d0cde9
   200 Unix utilities, and supports Unix shells, sockets, networking,
d0cde9
   and more. A single-user edition runs about $200. A free demo or
d0cde9
   evaluation copy will run for 31 days and then quit; to continue
d0cde9
   using it, you must purchase the commercial version.
d0cde9

d0cde9
   MKS NuTCRACKER Professional
d0cde9
       http://www.datafocus.com/products/nutc/
d0cde9

d0cde9
   A different, yet related product line offered by MKS (Mortice Kern
d0cde9
   Systems, below); the awkward spelling "NuTCRACKER" is intentional.
d0cde9
   Various packages offer hundreds of Unix utilities for Win32
d0cde9
   environments. Sed is not available as a separate product.
d0cde9

d0cde9
   UnixDos:
d0cde9
       http://www.unixdos.com
d0cde9

d0cde9
   UnixDos is a suite of 82 Unix utilities ported over to the Windows
d0cde9
   environments. There are 16-bit versions for Win3.x and 32-bit
d0cde9
   versions for WinNT/Win95. It is distributed as uncrippled shareware
d0cde9
   for the first 30 days. After the test period, the utilities will
d0cde9
   not run and you must pay the registration fee of $50.
d0cde9

d0cde9
   Their version of sed supports "\n" in the RHS of expressions, and
d0cde9
   increases the length of input lines to 10,000 characters. By
d0cde9
   special arrangement with the owners, persons who want a licensed
d0cde9
   version of sed *only* (without the other utilities) may pay a
d0cde9
   license fee of $10.
d0cde9

d0cde9
   U/WIN:
d0cde9
       http://www.research.att.com/sw/tools/uwin/
d0cde9

d0cde9
   U/WIN is a suite of Unix utilities created for WinNT and Win95
d0cde9
   systems. It is owned by AT&T, created by David Korn (author of the
d0cde9
   Unix Korn shell), and is freely distributed only to educational
d0cde9
   institutions, AT&T employees, or certain researchers; all others
d0cde9
   must pay a fee after a 90-day evaluation period expires. U/WIN
d0cde9
   operates best with the NTFS (WinNT file system) but will run in
d0cde9
   degraded mode with the FAT file system and in further degraded mode
d0cde9
   under Win95. A minimal installation takes about 25 to 30 megs of
d0cde9
   disk space. Sed is not available as a separate file for download,
d0cde9
   but comes with the suite.
d0cde9

d0cde9
2.2.2.4. MS-DOS
d0cde9

d0cde9
   Mix C/Utilities Toolchest
d0cde9
       http://www.mixsoftware.com/product/utility.htm
d0cde9

d0cde9
   According to their web page, "The C/Utilities Toolchest adds over
d0cde9
   40 powerful UNIX utilities to your MS-DOS operating system. The
d0cde9
   result is an environment very similar to UNIX operating systems,
d0cde9
   yet 100% compatible with MS-DOS programs and commands." The
d0cde9
   toolchest costs $19.95, with source code available for an
d0cde9
   additional fee. Mix C's version of sed is not available separately.
d0cde9

d0cde9
   MKS (Mortice Kern Systems) Toolkit
d0cde9
       http://www.mks.com
d0cde9

d0cde9
   Sed comes bundled with the MKS Toolkit, which is distributed only
d0cde9
   as commercial software; it is not available separately.
d0cde9

d0cde9
   Thompson Automation Software
d0cde9
       http://www.tasoft.com
d0cde9

d0cde9
   The Thompson Toolkit contains over 100 familiar Unix utilities,
d0cde9
   including a version of the Unix Korn shell. It runs under MS-DOS,
d0cde9
   OS/2, Win3.x, Win9x, and WinNT. Sed is one of the utilities, though
d0cde9
   Thompson is better known for its version of awk for DOS, TAWK. The
d0cde9
   toolkit runs about $150; sed is not available separately.
d0cde9

d0cde9
2.3. Where can I learn to use sed?
d0cde9

d0cde9
2.3.1. Books
d0cde9

d0cde9
       _Sed & Awk, 2d edition_, by Dale Dougherty & Arnold Robbins
d0cde9
       (Sebastopol, Calif: O'Reilly and Associates, 1997)
d0cde9
       ISBN 1-56592-225-5
d0cde9
       http://www.oreilly.com/catalog/sed2/noframes.html
d0cde9

d0cde9
   About 40 percent of this book is devoted to sed, and maybe 50
d0cde9
   percent is devoted to awk. The other 10 percent covers regexes and
d0cde9
   concepts common to both tools. If you prefer hard copy, this is
d0cde9
   definitely the best single place to learn to use sed, including its
d0cde9
   advanced features.
d0cde9

d0cde9
   The first edition is also very useful. Several typos crept into the
d0cde9
   first printing of the first edition (though if you follow the
d0cde9
   tutorials closely, you'll recognize them right away). A list of
d0cde9
   errors from the first printing of _sed & awk_ is available at
d0cde9
   <http://www.cs.colostate.edu/~dzubera/sedawk.txt>, and errors in
d0cde9
   the 2nd edition are at <http://www.cs.colostate.edu/~dzubera/sedawk2.txt>,
d0cde9
   though most of these were corrected in later printings. The second
d0cde9
   edition tells how POSIX standards have affected these tools and
d0cde9
   covers the popular GNU versions of sed and awk. Price is about (US)
   $30.00.
d0cde9

d0cde9
   -----
d0cde9

d0cde9
       _Mastering Regular Expressions, 2d ed.,_ by Jeffrey E. F. Friedl
d0cde9
       (Sebastopol, Calif: O'Reilly and Associates, 2002)
d0cde9
       ISBN 0-596-00289-0
d0cde9
       http://regex.info
d0cde9
       http://www.oreilly.com/catalog/regex2/
d0cde9
       http://public.yahoo.com/~jfriedl/regex/ (for the first edition)
d0cde9

d0cde9
   Knowing how to use "regular expressions" is essential to effective
d0cde9
   use of most Unix tools. This book focuses on how regular
d0cde9
   expressions can be best implemented in utilities such as perl, vi,
d0cde9
   emacs, and awk, but also touches on sed. Friedl's home page
d0cde9
   (above) gives links to other sites which help students learn to
d0cde9
   master regular expressions. His site also gives a Perl script for
d0cde9
   determining a syntactically valid e-mail address, using regexes:
d0cde9

d0cde9
       http://public.yahoo.com/~jfriedl/regex/code.html
d0cde9

d0cde9
   -----
d0cde9

d0cde9
       _Awk und Sed_, by Helmut Herold.
d0cde9
       (Bonn: Addison-Wesley, 1994; 288 pages)
d0cde9
       2nd edition to be released in March 2003
d0cde9
       ISBN 3-8273-2094-1
d0cde9
       http://www.addison-wesley.de/main/main.asp?page=home/bookdetails&ProductID=37214
d0cde9

d0cde9
2.3.2. Mailing list
d0cde9

d0cde9
   If you are interested in learning more about sed (its syntax, using
d0cde9
   regular expressions, etc.) you are welcome to subscribe to a
d0cde9
   sed-oriented mailing list. In fact, there are two mailing lists
d0cde9
   about sed: one in English named "sed-users", moderated by Sven
d0cde9
   Guckes; and one in Portuguese named "sed-BR" (for sed-Brazil),
d0cde9
   moderated by Aurelio Marinho Jargas. The average volume of mail for
d0cde9
   "sed-users" is about 35 messages a week; the average volume of mail
d0cde9
   for "sed-BR" is about 15 messages a week.
d0cde9

d0cde9
       sed-BR mailing list:    http://br.groups.yahoo.com/group/sed-br/
d0cde9
       sed-users mailing list: http://groups.yahoo.com/group/sed-users/
d0cde9

d0cde9
   To subscribe to sed-users, send a blank message to:
d0cde9

d0cde9
       sed-users-subscribe@yahoogroups.com
d0cde9

d0cde9
   To unsubscribe from sed-users, send a blank message to:
d0cde9

d0cde9
       sed-users-unsubscribe@yahoogroups.com
d0cde9

d0cde9
2.3.3. Tutorials, electronic text
d0cde9

d0cde9
   The original users manual for sed, by Lee E. McMahon, from the
d0cde9
   7th edition UNIX Manual (1978), with the classic "Kubla Khan"
d0cde9
   example and tutorial, as formatted text:
d0cde9
       http://sed.sourceforge.net/grabbag/tutorials/sed_mcmahon.txt
d0cde9

d0cde9
   The source code to the preceding manual. Use "troff -ms sed" to
d0cde9
   print this file properly:
d0cde9
       http://plan9.bell-labs.com/7thEdMan/vol2/sed
d0cde9
       http://cm.bell-labs.com/7thEdMan/vol2/sed
d0cde9

d0cde9
   "Do It With Sed", by Carlos Duarte
d0cde9
       http://www.dbnet.ece.ntua.gr/~george/sed/OLD/sedtut_1.html
d0cde9

d0cde9
   "Sed: How to use sed, a special editor for modifying files
d0cde9
   automatically", by Bruce Barnett and General Electric Company
d0cde9
       http://www.grymoire.com/Unix/Sed.html
d0cde9

d0cde9
   U-SEDIT2.ZIP, by Mike Arst (16 June 1990)
d0cde9
       ftp://ftp.cs.umu.se/pub/pc/u-sedit2.zip
d0cde9
       ftp://ftp.uni-stuttgart.de/pub/systems/msdos/util/unixlike/u-sedit2.zip
d0cde9
       ftp://sunsite.icm.edu.pl/vol/wojsyl/garbo/pc/editor/u-sedit2.zip
d0cde9
       ftp://ftp.sogang.ac.kr/pub/msdos/garbo_pc/editor/u-sedit2.zip
d0cde9

d0cde9
   U-SEDIT3.ZIP, by Mike Arst (24 Jan. 1992)
d0cde9
       http://www.student.northpark.edu/pemente/sed/u-sedit3.zip
d0cde9
       CompuServe DTPFORUM, "PC DTP Utilities" library, file SEDDOC.ZIP
d0cde9

d0cde9
   Another sed FAQ
d0cde9
       http://www.dreamwvr.com/sed-info/sed-faq.html
d0cde9

d0cde9
   sed-tutorial, by Felix von Leitner
d0cde9
       http://www.math.fu-berlin.de/~leitner/sed/tutorial.html
d0cde9

d0cde9
   "Manipulating text with sed," chapter 14 of the SCO OpenServer
d0cde9
   "Operating System Users Guide"
d0cde9
       http://ou800doc.caldera.com/SHL_automate/CTOC-Manipulating_text_with_sed.html
d0cde9

d0cde9
   "Combining the Bourne-shell, sed and awk in the UNIX environment
d0cde9
   for language analysis," by Lothar Schmitt and Kiel Christianson.
d0cde9
   This basic tutorial on the Bourne shell, sed and awk downloads as a
d0cde9
   71-page PostScript file (compressed to 290K with gzip). You may
d0cde9
   need to navigate down from the root to get the file.
d0cde9
       ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/1997/97-2-007.tar.gz
d0cde9
       available upon request from Lothar Schmitt <lothar@u-aizu.ac.jp>
d0cde9

d0cde9
2.3.4. General web and ftp sites
d0cde9

d0cde9
       http://sed.sourceforge.net/grabbag             # Collected scripts
d0cde9
       http://main.rtfiber.com.tw/~changyj/sed/       # Yao-Jen Chang
d0cde9
       http://www.math.fu-berlin.de/~guckes/sed/      # Sven Guckes
d0cde9
       http://www.math.fu-berlin.de/~leitner/sed/     # Felix von Leitner
d0cde9
       http://www.dbnet.ece.ntua.gr/~george/sed/      # Yiorgos Adamopoulos
d0cde9
       http://www.student.northpark.edu/pemente/sed/  # Eric Pement
d0cde9

d0cde9
       http://spacsun.rice.edu/FAQ/sed.html
d0cde9
       ftp://algos.inesc.pt/pub/users/cdua/scripts.tar.gz (sed and shell scripts)
d0cde9

d0cde9
   "Handy One-Liners For Sed", compiled by Eric Pement. A large list
d0cde9
   of 1-line sed commands which can be executed from the command line.
d0cde9
       http://sed.sourceforge.net/sed1line.txt
d0cde9
       http://www.student.northpark.edu/pemente/sed/sed1line.txt
d0cde9

d0cde9
   "Handy One-Liners For Sed", translated to Portuguese
d0cde9
       http://wmaker.lrv.ufsc.br/sed_ptBR.html
d0cde9

d0cde9
   The Single UNIX Specification, Version 3 (technical man page)
d0cde9
       http://www.opengroup.org/onlinepubs/007904975/utilities/sed.html
d0cde9

d0cde9
   Getting started with sed
d0cde9
       http://www.cs.hmc.edu/tech_docs/qref/sed.html
d0cde9

d0cde9
   masm to gas converter
d0cde9
       http://www.delorie.com/djgpp/faq/converting/asm2s-sed.html
d0cde9

d0cde9
   mail2html.zip
d0cde9
       http://www.crispen.org/src/#mail2html
d0cde9

d0cde9
   sample uses of sed in batch files and scripts (Benny Pederson)
d0cde9
       http://users.cybercity.dk/~bse26236/batutil/help/SED.HTM
d0cde9

d0cde9
   dc.sed - the most complex and impressive sed script ever written.
d0cde9
   This sed script by Greg Ubben emulates the Unix dc (desk
d0cde9
   calculator), including base conversion, exponentiation, square
d0cde9
   roots, and much more.
d0cde9
       http://sed.sourceforge.net/grabbag/scripts/dc_overview.htm
d0cde9

d0cde9
   If you should find other tutorials or scripts that should be added
d0cde9
   to this document, please forward the URLs to the FAQ maintainer.
d0cde9

d0cde9
------------------------------
d0cde9

d0cde9
3. TECHNICAL
d0cde9

d0cde9
3.1. More detailed explanation of basic sed
d0cde9

d0cde9
   Sed takes a script of editing commands and applies each command, in
d0cde9
   order, to each line of input. After all the commands have been
d0cde9
   applied to the first line of input, that line is output. A second
d0cde9
   input line is taken for processing, and the cycle repeats. Sed
d0cde9
   scripts can address a single line by line number or by matching a
d0cde9
   /RE pattern/ on the line. An exclamation mark '!' after a regex
d0cde9
   ('/RE/!') or line number will select all lines that do NOT match
d0cde9
   that address. Sed can also address a range of lines in the same
d0cde9
   manner, using a comma to separate the 2 addresses.
d0cde9

d0cde9
     $d               # delete the last line of the file
d0cde9
     /[0-9]\{3\}/p    # print lines with 3 consecutive digits
d0cde9
     5!s/ham/cheese/  # except on line 5, replace 'ham' with 'cheese'
d0cde9
     /awk/!s/aaa/bb/  # unless 'awk' is found, replace 'aaa' with 'bb'
d0cde9
     17,/foo/d        # delete all lines from line 17 up to 'foo'
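
   As a hedged illustration of the addresses above, the following
   sketch runs three of them against a small sample input. The
   function name "demo_input" and its six lines of data are invented
   for this example.

```shell
# Hypothetical sample input: six short lines, one containing three digits.
demo_input() {
    printf '%s\n' 'ham one' 'ham 123' 'no match' 'ham four' 'ham five' 'last line'
}

# $d deletes only the last line of the input.
demo_input | sed '$d'

# With -n, /RE/p prints only the lines that match the regex.
demo_input | sed -n '/[0-9]\{3\}/p'

# 5!s/// applies the substitution to every line except line 5.
demo_input | sed '5!s/ham/cheese/'
```

   In the last command, line 5 ("ham five") is the only line that
   keeps the word 'ham'.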
d0cde9

d0cde9
   Following an address or address range, sed accepts curly braces
d0cde9
   '{...}' so several commands may be applied to that line or to the
d0cde9
   lines matched by the address range. On the command line, semicolons
d0cde9
   ';' separate each instruction and must precede the closing brace.
d0cde9

d0cde9
     sed '/Owner:/{s/yours/mine/g;s/your/my/g;s/you/me/g;}' file
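
   The order of the commands inside the braces matters: the longest
   pattern ('yours') is handled first so that the shorter ones do not
   clobber it. A small sketch, using an invented helper name
   "swap_owner" and made-up input:

```shell
# Hypothetical helper: rewrite pronouns only on lines containing "Owner:".
swap_owner() {
    sed '/Owner:/{s/yours/mine/g;s/your/my/g;s/you/me/g;}'
}

# The second line has no "Owner:" and passes through unchanged.
printf '%s\n' 'Owner: you and your cat; it is yours' 'Guest: you' | swap_owner
```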
d0cde9

d0cde9
   Range addresses operate differently depending on which version of
d0cde9
   sed is used (see section 3.4, below). For further information on
d0cde9
   using sed, consult the references in section 2.3, above.
d0cde9

d0cde9
3.1.1. Regular expressions on the left side of "s///"
d0cde9

d0cde9
   All versions of sed support Basic Regular Expressions (BREs). For
d0cde9
   the syntax of BREs, enter "man ed" at a Unix shell prompt. A
d0cde9
   technical description of BREs from IEEE POSIX 1003.1-2001 and the
d0cde9
   Single UNIX Specification Version 3 is available online at:
d0cde9
   http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html#tag_09_03
d0cde9

d0cde9
   Sed normally supports BREs plus '\n' to match a newline in the
d0cde9
   pattern space, plus '\xREx' as equivalent to '/RE/', where 'x' is any
d0cde9
   character other than a newline or another backslash.
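
   The '\xREx' form is handy when the regex itself contains slashes.
   The sketch below (helper name "find_path" and the sample lines are
   invented) uses a comma as the delimiter. As noted elsewhere in this
   FAQ, a few older seds may not accept this syntax.

```shell
# Hypothetical helper: print lines containing the path /usr/local/bin.
# The leading backslash makes ',' the regex delimiter, so the slashes
# in the path need no escaping.
find_path() {
    sed -n '\,/usr/local/bin,p'
}

printf '%s\n' 'PATH=/usr/bin' 'PATH=/usr/local/bin:/bin' | find_path
```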
d0cde9

d0cde9
   Some versions of sed support supersets of BREs, or "extended
d0cde9
   regular expressions", which offer additional metacharacters for
d0cde9
   increased flexibility. For additional information on extended REs
d0cde9
   in GNU sed, see sections 3.7 ("GNU/POSIX extensions to regular
d0cde9
   expressions") and 6.7.3 ("Special syntax in REs"), below.
d0cde9

d0cde9
   Though not required by BREs, some versions of sed support \t to
d0cde9
   represent a TAB, \r for carriage return, \xHH for direct entry of
d0cde9
   hex codes, and so forth. Other versions of sed do not.
d0cde9

d0cde9
   ssed (super-sed) introduced many new features for LHS pattern
d0cde9
   matching, too many to give here. The complete list is found in
d0cde9
   section 6.7.3.H ("ssed"), below.
d0cde9

d0cde9
3.1.2. Escape characters on the right side of "s///"
d0cde9

d0cde9
   The right-hand side (the replacement part) in "s/find/replace/" is
d0cde9
   almost always a string literal, with no interpolation of these
d0cde9
   metacharacters:
d0cde9

d0cde9
       .   ^   $   [   ]   {   }   (   )  ?   +   *   |
d0cde9

d0cde9
   Three things *are* interpolated: ampersand (&), backreferences, and
d0cde9
   options for special seds. An ampersand on the RHS is replaced by
d0cde9
   the entire expression matched on the LHS. There is _never_ any
d0cde9
   reason to use grouping like this:
d0cde9

d0cde9
       s/\(some-complex-regex\)/one two \1 three/
d0cde9

d0cde9
   since you can do this instead:
d0cde9

d0cde9
       s/some-complex-regex/one two & three/
d0cde9

d0cde9
   To enter a literal ampersand on the RHS, type '\&'.
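
   A short sketch of both uses of the ampersand. The helper names
   "wrap_numbers" and "escape_amp" and the sample text are invented
   for illustration.

```shell
# & on the RHS stands for whatever the LHS matched.
wrap_numbers() {
    sed 's/[0-9][0-9]*/<&>/g'
}

# \& on the RHS inserts a literal ampersand.
escape_amp() {
    sed 's/and/\&/'
}

echo 'order 66 of 100' | wrap_numbers    # order <66> of <100>
echo 'salt and pepper' | escape_amp      # salt & pepper
```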
d0cde9

d0cde9
   Grouping and backreferences: All versions of sed support grouping
d0cde9
   and backreferences on the LHS and backreferences only on the RHS.
d0cde9
   Grouping allows a series of characters to be collected in a set,
d0cde9
   indicating the boundaries of the set with \( and \). Then the set
d0cde9
   can be designated to be repeated a certain number of times
d0cde9

d0cde9
       \(like this\)*   or   \(like this\)\{5,7\}.
d0cde9

d0cde9
   Groups can also be nested "\(like \(this\) is here\)" and may
d0cde9
   contain any valid RE. Backreferences repeat the contents of a
d0cde9
   particular group, using a backslash and a digit (1-9) for each
d0cde9
   corresponding group. In other words, "/\(pom\)\1/" is another way
d0cde9
   of writing "/pompom/". If groups are nested, backreference numbers
d0cde9
   are counted by matching \( in strict left to right order. Thus, in
   /..\(the \(word\)\) \("foo"\)../ the group \("foo"\) is referenced
   by \3. Backreferences can be used in the LHS, the RHS, and in normal
d0cde9
   RE addressing (see section 3.3).  Thus,
d0cde9

d0cde9
       /\(.\)\1\(.\)\2\(.\)\3/;      # matches "bookkeeper"
d0cde9
       /^\(.\)\(.\)\(.\)\3\2\1$/;    # finds 6-letter palindromes
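
   The following sketch shows a backreference on the RHS and another
   in an address. The helper names "swap_names" and "doubled" and the
   sample data are invented for this example.

```shell
# Swap "Last, First" into "First Last" using \1 and \2 on the RHS.
swap_names() {
    sed 's/^\([^,]*\), *\(.*\)$/\2 \1/'
}

# Print only lines containing a doubled character: \1 in the address
# must repeat whatever the group \(.\) matched.
doubled() {
    sed -n '/\(.\)\1/p'
}

echo 'Lovelace, Ada' | swap_names                 # Ada Lovelace
printf '%s\n' 'book' 'desk' 'lamp' | doubled      # book
```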
d0cde9

d0cde9
   Seds differ in how they treat invalid backreferences where no
d0cde9
   corresponding group occurs. To insert a literal ampersand or
d0cde9
   backslash into the RHS, prefix it with a backslash: \& or \\.
d0cde9

d0cde9
   ssed, sed16, and sedmod permit additional options on the RHS. They
d0cde9
   all support changing part of the replacement string to upper case
d0cde9
   (\u or \U), lower case (\l or \L), or to end case conversion (\E).
d0cde9
   Both sed16 and sedmod support awk-style word references ($1, $2,
d0cde9
   $3, ...) and $0 to insert the entire line before conversion.
d0cde9

d0cde9
     echo ab ghi | sed16 "s/.*/$0 - \U$2/"   # prints "ab ghi - GHI"
d0cde9

d0cde9
   *Note:* This feature of sed16 and sedmod will break sed scripts which
d0cde9
   put a dollar sign and digit into the RHS. Though this is an unlikely
d0cde9
   combination, it's worth remembering if you use other people's scripts.
d0cde9

d0cde9
3.1.3.  Substitution switches
d0cde9

d0cde9
   Standard versions of sed support 4 main flags or switches which may
d0cde9
   be added to the end of an "s///" command. They are:
d0cde9

d0cde9
       N      - Replace the Nth match of the pattern on the LHS, where
d0cde9
                N is an integer between 1 and 512. If N is omitted,
d0cde9
                the default is to replace the first match only.
d0cde9
       g      - Global replace of all matches to the pattern.
d0cde9
       p      - Print the results to stdout, even if -n switch is used.
d0cde9
       w file - Write the pattern space to 'file' if a replacement was
d0cde9
                done. If the file already exists when the script is
d0cde9
                executed, it is overwritten. During script execution,
d0cde9
                w appends to the file for each match.
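
   The four standard flags can be compared side by side. This is only
   a sketch: the sample text is invented, and the output file is a
   temporary file created with mktemp.

```shell
# Numeric flag: replace only the 3rd match on the line.
echo 'a a a a' | sed 's/a/X/3'          # a a X a

# g flag: replace every match on the line.
echo 'a a a a' | sed 's/a/X/g'          # X X X X

# p flag with -n: print only the lines where a replacement was made.
printf '%s\n' 'foo' 'bar' | sed -n 's/foo/FOO/p'    # FOO

# w flag: changed lines are also written to a file. Normal output
# still goes to stdout, so it is discarded here.
log=$(mktemp)
printf '%s\n' 'foo' 'bar' | sed "s/foo/FOO/w $log" > /dev/null
cat "$log"                              # FOO
rm -f "$log"
```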
d0cde9

d0cde9
   GNU sed 3.02 and ssed also offer the /I switch for doing a
d0cde9
   case-insensitive match. For example,
d0cde9

d0cde9
     echo ONE TWO | gsed "s/one/unos/I"      # prints "unos TWO"
d0cde9

d0cde9
   GNU sed 4.x and ssed add the /M switch, to simplify working with
d0cde9
   multi-line patterns: when it is used, ^ or $ will match BOL or EOL.
d0cde9
   \` and \' remain available to match the start and end of pattern
d0cde9
   space, respectively.
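
   A minimal sketch of the /M switch, assuming GNU sed 4.x or later
   (other seds will probably reject the flag). The two-line input is
   invented; N joins both lines into one pattern space first.

```shell
# Without M, ^ matches only at the start of the pattern space, so
# "^t" cannot match the second line and nothing changes.
printf '%s\n' 'one' 'two' | sed 'N;s/^t/T/'

# With M, ^ also matches just after the embedded newline, so the
# second line becomes "Two".
printf '%s\n' 'one' 'two' | sed 'N;s/^t/T/M'
```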
d0cde9

d0cde9
   ssed supports two more switches, /S and /X, when its Perl mode is
d0cde9
   used. They are described in detail in section 6.7.3.H, below.
d0cde9

d0cde9
3.1.4. Command-line switches
d0cde9

d0cde9
   All versions of sed support two switches, -e and -n. Though sed
   usually separates multiple commands with semicolons (e.g., "H;d;"),
   in some versions certain commands cannot be followed by a semicolon
   command separator. These include :labels, 't', and 'b'. Such
   commands must occur last in a script fragment, with the fragments
   separated by -e option switches. For example:
d0cde9

d0cde9
     # The 'ta' means jump to label :a if last s/// returns true
d0cde9
     sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file
d0cde9

d0cde9
   The -n switch turns off sed's default behavior of printing every
d0cde9
   line. With -n, lines are printed only if explicitly told to. In
d0cde9
   addition, for certain versions of sed, if an external script begins
d0cde9
   with "#n" as its first two characters, the output is suppressed
d0cde9
   (exactly as if -n had been entered on the command line). A list of
d0cde9
   which versions appears in section 6.7.2., below.
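
   The effect of -n is easiest to see with the 'p' command. In this
   sketch (the three-line input is invented), the same script is run
   with and without the switch.

```shell
# Without -n, every line is printed once by default, and line 2 is
# printed a second time by the p command.
printf '%s\n' 'a' 'b' 'c' | sed '2p'       # a, b, b, c

# With -n, only the explicitly printed line appears.
printf '%s\n' 'a' 'b' 'c' | sed -n '2p'    # b
```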
d0cde9

d0cde9
   GNU sed 4.x and ssed support additional switches. -l (lowercase L),
d0cde9
   followed by a number, lets you adjust the default length of the 'l'
d0cde9
   and 'L' commands (note that these implementations of sed also
d0cde9
   support an argument to these commands, to tailor the length
d0cde9
   separately for each occurrence of the command).
d0cde9

d0cde9
   -i activates in-place editing (see section 4.41.1, below). -s
d0cde9
   treats each file as a separate stream: sed by default joins all the
d0cde9
   files, so $ represents the last line of the last file; 15 means the
d0cde9
   15th line in the joined stream; and /abc/,/def/ might match across
d0cde9
   files.
d0cde9

d0cde9
   When -s is used, however, all addresses refer to single files. For
d0cde9
   example, $ represents the last line of each input file; 15 means
d0cde9
   the 15th line of each input file; and /abc/,/def/ will be "reset"
d0cde9
   (in other words, sed will not execute the commands and start
d0cde9
   looking for /abc/ again) if a file ends before /def/ has been
d0cde9
   matched. Note that -i automatically activates this interpretation
d0cde9
   of addresses.
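
   The difference is visible with the '$' address. The sketch below
   assumes GNU sed 4.x or later for -s. The directory and file names
   are temporary and invented for the example.

```shell
# Two small sample files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'a1' 'a2' > "$dir/a.txt"
printf '%s\n' 'b1' 'b2' > "$dir/b.txt"

# Default: the files form one stream, so $ is the last line of b.txt.
sed -n '$p' "$dir/a.txt" "$dir/b.txt"       # b2

# With -s: $ is the last line of each file.
sed -s -n '$p' "$dir/a.txt" "$dir/b.txt"    # a2, b2

rm -rf "$dir"
```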
d0cde9

d0cde9
3.2. Common one-line sed scripts
d0cde9

d0cde9
   A separate document of over 70 handy "one-line" sed commands is
d0cde9
   available at
d0cde9
       http://sed.sourceforge.net/sed1line.txt
d0cde9

d0cde9
   Here are several common sed commands for one-line use. MS-DOS users
d0cde9
   should replace single quotes ('...') with double quotes ("...") in
d0cde9
   these examples. A specific filename usually follows the script,
d0cde9
   though the input may also come via piping or redirection.
d0cde9

d0cde9
   # Double space a file
d0cde9
   sed G file
d0cde9

d0cde9
   # Triple space a file
d0cde9
   sed 'G;G' file
d0cde9

d0cde9
   # Under UNIX: convert DOS newlines (CR/LF) to Unix format
d0cde9
   sed 's/.$//' file    # assumes that all lines end with CR/LF
d0cde9
   sed 's/^M$//' file   # in bash/tcsh, press Ctrl-V then Ctrl-M
d0cde9

d0cde9
   # Under DOS: convert Unix newlines (LF) to DOS format
d0cde9
   sed 's/$//' file                     # method 1
d0cde9
   sed -n p file                        # method 2
d0cde9

d0cde9
   # Delete leading whitespace (spaces/tabs) from front of each line
d0cde9
   # (this aligns all text flush left). '^t' represents a true tab
d0cde9
   # character. Under bash or tcsh, press Ctrl-V then Ctrl-I.
d0cde9
   sed 's/^[ ^t]*//' file
d0cde9

d0cde9
   # Delete trailing whitespace (spaces/tabs) from end of each line
d0cde9
   sed 's/[ ^t]*$//' file               # see note on '^t', above
d0cde9

d0cde9
   # Delete BOTH leading and trailing whitespace from each line
d0cde9
   sed 's/^[ ^t]*//;s/[ ^t]*$//' file   # see note on '^t', above
d0cde9

d0cde9
   # Substitute "foo" with "bar" on each line
d0cde9
   sed 's/foo/bar/' file        # replaces only 1st instance in a line
d0cde9
   sed 's/foo/bar/4' file       # replaces only 4th instance in a line
d0cde9
   sed 's/foo/bar/g' file       # replaces ALL instances within a line
d0cde9

d0cde9
   # Substitute "foo" with "bar" ONLY for lines which contain "baz"
d0cde9
   sed '/baz/s/foo/bar/g' file
d0cde9

d0cde9
   # Delete all CONSECUTIVE blank lines from file except the first.
d0cde9
   # This method also deletes all blank lines from top and end of file.
d0cde9
   # (emulates "cat -s")
d0cde9
   sed '/./,/^$/!d' file       # this allows 0 blanks at top, 1 at EOF
d0cde9
   sed '/^$/N;/\n$/D' file     # this allows 1 blank at top, 0 at EOF
d0cde9

d0cde9
   # Delete all leading blank lines at top of file (only).
d0cde9
   sed '/./,$!d' file
d0cde9

d0cde9
   # Delete all trailing blank lines at end of file (only).
d0cde9
   sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file
d0cde9

d0cde9
   # If a line ends with a backslash, join the next line to it.
d0cde9
   sed -e :a -e '/\\$/N; s/\\\n//; ta' file
d0cde9

d0cde9
   # If a line begins with an equal sign, append it to the previous
d0cde9
   # line (and replace the "=" with a single space).
d0cde9
   sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file
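
   To see how the last two scripts behave, here is a sketch that
   feeds them a few invented lines. The wrapper names "join_backslash"
   and "join_equals" are hypothetical. The results described in the
   comments were worked out for GNU sed.

```shell
# Join a line ending in a backslash with the line that follows it.
join_backslash() {
    sed -e :a -e '/\\$/N; s/\\\n//; ta'
}

# Append a line that begins with "=" to the previous line,
# replacing the "=" with a single space.
join_equals() {
    sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'
}

# "a \" and "b" become "a b"; "c" is unchanged.
printf '%s\n' 'a \' 'b' 'c' | join_backslash

# "x" and "=y" become "x y"; "z" is unchanged.
printf '%s\n' 'x' '=y' 'z' | join_equals
```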
d0cde9

d0cde9
3.3. Addressing and address ranges
d0cde9

d0cde9
   Sed commands may have an optional "address" or "address range"
d0cde9
   prefix. If there is no address or address range given, then the
d0cde9
   command is applied to all the lines of the input file or text
d0cde9
   stream. Three commands cannot take an address prefix:
d0cde9

d0cde9
      - labels, used to branch or jump within the script
d0cde9
      - the close brace, '}', which ends the '{' "command"
d0cde9
      - the '#' comment character, also technically a "command"
d0cde9

d0cde9
   An address can be a line number (such as 1, 5, 37, etc.), a regular
d0cde9
   expression (written in the form /RE/ or \xREx where 'x' is any
d0cde9
   character other than '\' and RE is the regular expression), or the
d0cde9
   dollar sign ($), representing the last line of the file. An
d0cde9
   exclamation mark (!) after an address or address range will apply
d0cde9
   the command to every line EXCEPT the ones named by the address. A
d0cde9
   null regex ("//") will be replaced by the last regex which was
d0cde9
   used. Also, some seds do not support \xREx as regex delimiters.
d0cde9

d0cde9
     5d               # delete line 5 only
d0cde9
     5!d              # delete every line except line 5
d0cde9
     /RE/s/LHS/RHS/g  # substitute only if RE occurs on the line
d0cde9
     /^$/b label      # if the line is blank, branch to ':label'
d0cde9
     /./!b label      # ... another way to write the same command
d0cde9
     \%.%!b label     # ... yet another way to write this command
d0cde9
     $!N              # on all lines but the last, get the Next line
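
   Two of the points above, negation with '!' and the null regex
   "//", can be tried directly. The sample input in this sketch is
   invented.

```shell
# 2!d deletes every line except line 2.
printf '%s\n' 'one' 'two' 'three' | sed '2!d'      # two

# The empty regex in s//bar/g reuses the last regex used, /foo/.
echo 'foo food' | sed '/foo/s//bar/g'              # bar bard
```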
d0cde9

d0cde9
   Note that an embedded newline can be represented in an address by
d0cde9
   the symbol \n, but this syntax is needed only if the script puts 2
d0cde9
   or more lines into the pattern space via the N, G, or other
d0cde9
   commands. The \n symbol does *not* match the newline at an
d0cde9
   end-of-line because when sed reads each line into the pattern space
d0cde9
   for processing, it strips off the trailing newline, processes the
d0cde9
   line, and adds a newline back when printing the line to standard
d0cde9
   output. To match the end-of-line, use the '$' metacharacter, as
d0cde9
   follows:
d0cde9

d0cde9
     /tape$/       # matches the word 'tape' at the end of a line
d0cde9
     /tape$deck/   # matches the word 'tape$deck' with a literal '$'
d0cde9
     /tape\ndeck/  # matches 'tape' and 'deck' with a newline between
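
   A sketch of the difference, using two invented input lines: the
   \n address can only match after N has put both lines into the
   pattern space.

```shell
# Each line is read separately, so no pattern space ever contains
# a newline and nothing is printed.
printf '%s\n' 'tape' 'deck' | sed -n '/tape\ndeck/p'

# N appends the next line, so the pattern space is "tape\ndeck"
# and both lines are printed.
printf '%s\n' 'tape' 'deck' | sed -n 'N;/tape\ndeck/p'
```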
d0cde9

d0cde9
   The following sed commands usually accept *only* a single address.
d0cde9
   All other commands (except labels, '}', and '#') accept both single
d0cde9
   addresses and address ranges.
d0cde9

d0cde9
     =       print to stdout the line number of the current line
d0cde9
     a       after printing the current line, append "text" to stdout
d0cde9
     i       before printing the current line, insert "text" to stdout
d0cde9
     q       quit after the current line is matched
d0cde9
     r file  prints contents of "file" to stdout after line is matched
d0cde9

d0cde9
   Note that we said "usually." If you need to apply the '=', 'a',
d0cde9
   'i', or 'r' commands to each and every line within an address
d0cde9
   range, this behavior can be coerced by the use of braces. Thus,
d0cde9
   "1,9=" is an invalid command, but "1,9{=;}" will print each line
d0cde9
   number followed by its line for the first 9 lines (and then print
d0cde9
   the rest of the file normally).
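
   A small sketch of the brace form, shortened to the first two lines
   of an invented three-line input:

```shell
# Print the line number before each of lines 1-2; line 3 is
# printed without a number.
printf '%s\n' 'a' 'b' 'c' | sed '1,2{=;}'
```

   The output is "1", "a", "2", "b", "c", each on its own line.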
d0cde9

d0cde9
   Address ranges occur in the form
d0cde9

d0cde9
       <address1>,<address2>    or    <address1>,<address2>!
d0cde9

d0cde9
   where the address can be a line number or a standard /regex/.
d0cde9
   <address2> can also be a dollar sign, indicating the end of file.
d0cde9
   Under GNU sed 3.02+, ssed, and sed15+, <address2> may also be a
d0cde9
   notation of the form +num, indicating the next _num_ lines after
d0cde9
   <address1> is matched.
d0cde9

d0cde9
   Address ranges are:
d0cde9

d0cde9
   (1) Inclusive. The range "/From here/,/eternity/" matches all the
d0cde9
   lines containing "From here" up to and including the line
d0cde9
   containing "eternity". It will not stop on the line just prior to
d0cde9
   "eternity". (If you don't like this, see section 4.24.)
d0cde9

d0cde9
   (2) Plenary. They always match full lines, not just parts of lines.
d0cde9
   In other words, a command to change or delete an address range will
d0cde9
   change or delete whole lines; it won't stop in the middle of a
d0cde9
   line.
d0cde9

d0cde9
   (3) Multi-linear. Address ranges normally match 2 lines or more.
d0cde9
   The second address will never match the same line the first address
d0cde9
   did; therefore a valid address range always spans at least two
d0cde9
   lines, with these exceptions which match only one line:
d0cde9

d0cde9
      - if the first address matches the last line of the file
d0cde9
      - if using the syntax "/RE/,3" and /RE/ occurs only once in the
d0cde9
        file at line 3 or later
d0cde9
      - if using HHsed v1.5. See section 3.4.
d0cde9

d0cde9
   (4) Minimalist. In address ranges with /regex/ as <address2>, the
d0cde9
   range "/foo/,/bar/" will stop at the first "bar" it finds, provided
d0cde9
   that "bar" occurs on a line below "foo". If the word "bar" occurs
d0cde9
   on several lines below the word "foo", the range will match all the
d0cde9
   lines from the first "foo" up to the first "bar". It will not
d0cde9
   continue hopping ahead to find more "bar"s. In other words, address
d0cde9
   ranges are not "greedy," as regular expressions are.
d0cde9

d0cde9
   (5) Repeating. An address range will try to match more than one
d0cde9
   block of lines in a file. However, the blocks cannot nest. In
d0cde9
   addition, a second match will not "take" the last line of the
d0cde9
   previous block.  For example, given the following text,
d0cde9

d0cde9
       start
d0cde9
       stop  start
d0cde9
       stop
d0cde9

d0cde9
   the sed command '/start/,/stop/d' will only delete the first two
d0cde9
   lines. It will not delete all 3 lines.
d0cde9

d0cde9
   (6) Relentless. If the address range finds a "start" match but
d0cde9
   doesn't find a "stop", it will match every line from "start" to the
d0cde9
   end of the file. Thus, beware of the following behaviors:
d0cde9

d0cde9
     /RE1/,/RE2/  # If /RE2/ is not found, matches from /RE1/ to the
d0cde9
                  # end-of-file.
d0cde9

d0cde9
     20,/RE/      # If /RE/ is not found, matches from line 20 to the
d0cde9
                  # end-of-file.
d0cde9

d0cde9
     /RE/,30      # If /RE/ occurs any time after line 30, each
d0cde9
                  # occurrence will be matched in sed15+, sedmod, and
d0cde9
                  # GNU sed v3.02+. GNU sed v2.05 and 1.18 will match
d0cde9
                  # from the 2nd occurrence of /RE/ to the end-of-file.
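
   The "repeating" and "relentless" properties can be checked with a
   short sketch. The first input is the three-line text from item (5).
   The second input and the helper name "sample" are invented.

```shell
# The text from item (5): the range ends on line 2, and the "start"
# on that same line does not open a new range.
sample() {
    printf '%s\n' 'start' 'stop  start' 'stop'
}
sample | sed '/start/,/stop/d'       # prints only the final "stop"

# Relentless: the second "start" has no "stop" after it, so the
# range runs to end-of-file and "d" is deleted as well.
printf '%s\n' 'a' 'start' 'b' 'stop' 'c' 'start' 'd' |
    sed '/start/,/stop/d'            # prints "a" and "c"
```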
d0cde9

d0cde9
   If these behaviors seem strange, remember that they occur because
d0cde9
   sed does not look "ahead" in the file. Doing so would stop sed from
d0cde9
   being a stream editor and have adverse effects on its efficiency.
d0cde9
   If these behaviors are undesirable, they can be circumvented or
d0cde9
   corrected by the use of nested testing within braces. The following
d0cde9
   scripts work under GNU sed 3.02:
d0cde9

d0cde9
     # Execute your_commands on range "/RE1/,/RE2/", but if /RE2/ is
d0cde9
     # not found, do nothing.
d0cde9
     /RE1/{:a;N;/RE2/!ba;your_commands;}
d0cde9

d0cde9
     # Execute your_commands on range "20,/RE/", but if /RE/ is not
d0cde9
     # found, do nothing.
d0cde9
     20{:a;N;/RE/!ba;your_commands;}
d0cde9

d0cde9
   As a side note, once we've used N to "slurp" lines together to test
d0cde9
   for the ending expression, the pattern space will have gathered
d0cde9
   many lines (possibly thousands) together and concatenated them as a
d0cde9
   single expression, with the \n sequence marking line breaks. The
d0cde9
   REs *within* the pattern space may have to be modified (e.g., you
d0cde9
   must write '/\nStart/' instead of '/^Start/' and '/[^\n]*/' instead
d0cde9
   of '/.*/') and other standard sed commands will be unavailable or
d0cde9
   difficult to use.
d0cde9

d0cde9
     # Execute your_commands on range "/RE/,30", but if /RE/ occurs
d0cde9
     # on line 31 or later, do not match it.
d0cde9
     1,30{/RE/,$ your_commands;}
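
   Here is a sketch of two of these workarounds with concrete (and
   invented) markers and input. It assumes GNU sed: a label followed
   by ';' on one line, and N printing the pattern space when there is
   no next line, are GNU behaviors that other seds may not share.

```shell
# Delete a BEGIN...END block only if END is actually found. The last
# BEGIN has no END, so its lines are printed instead of deleted.
printf '%s\n' 'a' 'BEGIN' 'b' 'END' 'c' 'BEGIN' 'd' |
    sed '/BEGIN/{:a;N;/END/!ba;d;}'      # a, c, BEGIN, d

# Nested test: the inner range /b/,$ is only consulted on lines 1-3,
# so the "b" on line 5 is left alone.
printf '%s\n' 'a' 'b' 'c' 'd' 'b' 'e' |
    sed '1,3{/b/,$s/^/# /;}'             # a, # b, # c, d, b, e
```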
d0cde9

d0cde9
   For related suggestions on using address ranges, see sections 4.2,
d0cde9
   4.15, and 4.19 of this FAQ. Also, note the following section.
d0cde9

d0cde9
3.4. Address ranges in GNU sed and HHsed
d0cde9

d0cde9
   (1) GNU sed 3.02+, ssed, and sed15+ all support address ranges like:
d0cde9

d0cde9
       /regex/,+5
d0cde9

d0cde9
   which match /regex/ plus the next 5 lines (or EOF, whichever comes
d0cde9
   first).
d0cde9

d0cde9
   (2) GNU sed v3.02.80 (and above) and ssed support address ranges of:
d0cde9

d0cde9
       0,/regex/
d0cde9

d0cde9
   as a special case to permit matching /regex/ if it occurs on the
d0cde9
   first line. This syntax permits a range expression that matches
d0cde9
   every line from the top of the file to the first instance of
d0cde9
   /regex/, even if /regex/ is on the first line.
d0cde9

d0cde9
   (3) HHsed (sed15) has an exceptional way of implementing
d0cde9

d0cde9
       /regex1/,/regex2/
d0cde9

d0cde9
   If /regex1/ and /regex2/ both occur on the *same* line, HHsed will match
d0cde9
   that single line. In other words, an address range block can
d0cde9
   consist of just one line. HHsed will then look for the next
d0cde9
   occurrence of /regex1/ to begin the block again.
d0cde9

d0cde9
   Every other version of sed (including sed16) requires 2 lines to
d0cde9
   match an address range, and thus /regex1/ and /regex2/ cannot
d0cde9
   successfully match just one line. See also the comments at
d0cde9
   section 7.9.4, below.
d0cde9

d0cde9
   (4) BEGIN~STEP selection: ssed and GNU sed (v2.05 and above) offer
d0cde9
   a form of addressing called "BEGIN~STEP selection". This is *not* a
d0cde9
   range address, which selects an inclusive block of consecutive
d0cde9
   lines from /start/ to /finish/, but it seems to belong here.
d0cde9

d0cde9
   Given an expression of the form "M~N", where M and N are integers,
d0cde9
   GNU sed and ssed will select every Nth line, beginning at line M.
d0cde9
   (With gsed v2.05, M had to be less than N, but this restriction is
d0cde9
   no longer necessary). Both M and N may equal 0 ("0~0" selects every
d0cde9
   line). These examples illustrate the syntax:
d0cde9

d0cde9
     sed '1~3d' file      # delete every 3d line, starting with line 1
d0cde9
                          # deletes lines 1, 4, 7, 10, 13, 16, ...
d0cde9

d0cde9
     sed '0~3d' file      # deletes lines 3, 6, 9, 12, 15, 18, ...
d0cde9

d0cde9
     sed -n '2~5p' file   # print every 5th line, starting with line 2
d0cde9
                          # prints lines 2, 7, 12, 17, 22, 27, ...
d0cde9

d0cde9
   (5) Finally, GNU sed v2.05 has a bug in range addressing (see
d0cde9
   section 7.5), which was fixed in later versions.
d0cde9

d0cde9

d0cde9
3.5. Debugging sed scripts
d0cde9

d0cde9
   The following two debuggers should make it easier to understand how
d0cde9
   sed scripts operate. They can save hours of grief when trying to
d0cde9
   determine the problems with a sed script.
d0cde9

d0cde9
   (1) sd (sed debugger), by Brian Hiles
d0cde9

d0cde9
   This debugger runs under a Unix shell, is powerful, and is easy to
d0cde9
   use. sd has conditional breakpoints and spypoints of the pattern
d0cde9
   space and hold space, on any scope defined by regex match and/or
d0cde9
   script line number. It can be semi-automated, can save diagnostic
d0cde9
   reports, and shows potential problems with a sed script before it
d0cde9
   tries to execute it. The script is robust and requires the Unix
d0cde9
   shell utilities plus the Bourne shell or Korn shell to execute.
d0cde9

d0cde9
       http://sed.sourceforge.net/grabbag/scripts/sd.ksh.txt (2003)
d0cde9
       http://sed.sourceforge.net/grabbag/scripts/sd.sh.txt  (1998)
d0cde9

d0cde9
   (2) sedsed, by Aurelio Jargas
d0cde9

d0cde9
   This debugger requires Python to run it, and it uses your own
d0cde9
   version of sed, whatever that may be. It displays the current input
d0cde9
   line, the pattern space, and the hold space, before and after each
d0cde9
   sed command is executed.
d0cde9

d0cde9
       http://sedsed.sourceforge.net
d0cde9

d0cde9

d0cde9
3.6. Notes about s2p, the sed-to-perl translator
d0cde9

d0cde9
   s2p (sed to perl) is a Perl program to convert sed scripts into the
d0cde9
   Perl programming language; it is included with many versions of
d0cde9
   Perl. These problems have been found when using s2p:
d0cde9

d0cde9
   (1) Doesn't recognize the semicolon properly after s/// commands.
d0cde9

d0cde9
       s/foo/bar/g;
d0cde9

d0cde9
   (2) Doesn't trim trailing whitespace after s/// commands. Even lone
d0cde9
   trailing spaces, without comments, produce an error.
d0cde9

d0cde9
   (3) Doesn't handle multiple commands within braces. E.g.,
d0cde9

d0cde9
       1,4{=;G;}
d0cde9

d0cde9
   will produce perl code with missing braces, and miss the "G"
d0cde9
   command as well. In fact, any commands after the first one are
d0cde9
   missed in the perl output script, and the output perl script will
d0cde9
   also contain mismatched braces.
d0cde9

d0cde9
3.7. GNU/POSIX extensions to regular expressions
d0cde9

d0cde9
   GNU sed supports "character classes" in addition to regular
d0cde9
   character sets, such as [0-9A-F]. Like regular character sets,
d0cde9
   character classes represent any single character within a set.
d0cde9

d0cde9
   "Character classes are a new feature introduced in the POSIX
d0cde9
   standard. A character class is a special notation for describing
d0cde9
   lists of characters that have a specific attribute, but where the
d0cde9
   actual characters themselves can vary from country to country
d0cde9
   and/or from character set to character set. For example, the notion
d0cde9
   of what is an alphabetic character differs in the USA and in
d0cde9
   France." [quoted from the docs for GNU awk v3.1.0.]
d0cde9

d0cde9
   Though character classes don't generally conserve space on the
d0cde9
   line, they help make scripts portable for international use. The
d0cde9
   equivalent character sets _for U.S. users_ follow:
d0cde9

d0cde9
     [[:alnum:]]  - [A-Za-z0-9]     Alphanumeric characters
d0cde9
     [[:alpha:]]  - [A-Za-z]        Alphabetic characters
d0cde9
     [[:blank:]]  - [ \x09]         Space or tab characters only
d0cde9
     [[:cntrl:]]  - [\x00-\x1F\x7F] Control characters
d0cde9
     [[:digit:]]  - [0-9]           Numeric characters
d0cde9
     [[:graph:]]  - [!-~]           Printable and visible characters
d0cde9
     [[:lower:]]  - [a-z]           Lower-case alphabetic characters
d0cde9
     [[:print:]]  - [ -~]           Printable (non-Control) characters
d0cde9
     [[:punct:]]  - [!-/:-@[-`{-~]  Punctuation characters
d0cde9
     [[:space:]]  - [ \t\n\v\f\r]   All whitespace chars
d0cde9
     [[:upper:]]  - [A-Z]           Upper-case alphabetic characters
d0cde9
     [[:xdigit:]] - [0-9a-fA-F]     Hexadecimal digit characters
d0cde9

d0cde9
   Note that [[:graph:]] does not match the space " ", but [[:print:]]
d0cde9
   does. Some character classes may (or may not) match characters in
d0cde9
   the high ASCII range (ASCII 128-255 or 0x80-0xFF), depending on
d0cde9
   which C library was used to compile sed. For non-English languages,
d0cde9
   [[:alpha:]] and other classes may also match high ASCII characters.
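
   A few small filters show how the classes are written in practice.
   This is only a sketch: the function names are made up, and it
   assumes a sed that understands POSIX bracket classes.

```shell
# The class name always sits inside an outer bracket expression,
# hence the doubled brackets.

# Delete every digit on the line.
strip_digits() { sed 's/[[:digit:]]//g'; }

# Delete trailing spaces and tabs.
rtrim() { sed 's/[[:blank:]]*$//'; }

# A class can be combined with other characters in the same set:
# print only lines made up of hex digits and hyphens.
hex_or_dash() { sed -n '/^[[:xdigit:]-]*$/p'; }

echo 'a1b22c' | strip_digits
printf 'dead-beef\nxyz\n' | hex_or_dash
```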
d0cde9

d0cde9
------------------------------
d0cde9

d0cde9
4. EXAMPLES
d0cde9

d0cde9
   ONE-CHARACTER QUESTIONS
d0cde9

d0cde9
4.1. How do I insert a newline into the RHS of a substitution?
d0cde9

d0cde9
   Several versions of sed permit '\n' to be typed directly into the
d0cde9
   RHS, which is then converted to a newline on output: ssed,
d0cde9
   gsed302a+, gsed103 (with the -x switch), sed15+, sedmod, and
d0cde9
   UnixDOS sed. The _easiest_ solution is to use one of these
d0cde9
   versions.
d0cde9

d0cde9
   For other versions of sed, try one of the following:
d0cde9

d0cde9
   (a) If typing the sed script from a Bourne shell, use one backslash
d0cde9
   "\" if the script uses 'single quotes' or two backslashes "\\" if
d0cde9
   the script requires "double quotes". In the example below, note
d0cde9
   that the leading '>' on the 2nd line is generated by the shell to
d0cde9
   prompt the user for more input. The user types in slash,
d0cde9
   single-quote, and then ENTER to terminate the command:
d0cde9

d0cde9
     [sh-prompt]$ echo twolines | sed 's/two/& new\
d0cde9
     >/'
d0cde9
     two new
d0cde9
     lines
d0cde9
     [sh-prompt]$
d0cde9

d0cde9
   (b) Use a script file with one backslash '\' in the script,
d0cde9
   immediately followed by a newline. This will embed a newline into
d0cde9
   the "replace" portion. Example:
d0cde9

d0cde9
     sed -f newline.sed files
d0cde9

d0cde9
     # newline.sed
d0cde9
     s/twolines/two new\
d0cde9
     lines/g
d0cde9

d0cde9
   Some versions of sed may not need the trailing backslash. If so,
d0cde9
   remove it.
d0cde9

d0cde9
   (c) Insert an unused character and pipe the output through tr:
d0cde9

d0cde9
     echo twolines | sed 's/two/& new=/' | tr "=" "\n"   # produces
d0cde9
     two new
d0cde9
     lines
d0cde9

d0cde9
   (d) Use the "G" command:
d0cde9

d0cde9
   G appends a newline, plus the contents of the hold space to the end
d0cde9
   of the pattern space. If the hold space is empty, a newline is
d0cde9
   appended anyway. The newline is stored in the pattern space as "\n"
d0cde9
   where it can be addressed by grouping "\(...\)" and moved in the
d0cde9
   RHS. Thus, to change the "twolines" example used earlier, the
d0cde9
   following script will work:
d0cde9

d0cde9
     sed '/twolines/{G;s/\(two\)\(lines\)\(\n\)/\1\3\2/;}'
d0cde9

d0cde9
   (e) Inserting full lines, not breaking lines up:
d0cde9

d0cde9
   If one is not *changing* lines but only inserting complete lines
d0cde9
   before or after a pattern, the procedure is much easier. Use the
d0cde9
   "i" (insert) or "a" (append) command, making the alterations by an
d0cde9
   external script. To insert "This line is new" BEFORE each line
d0cde9
   matching a regex:
d0cde9

d0cde9
     /RE/i This line is new               # HHsed, sedmod, gsed 3.02a
d0cde9
     /RE/{x;s/$/This line is new/;G;}     # other seds
d0cde9

d0cde9
   The two examples above are intended as "one-line" commands entered
d0cde9
   from the console. If using a sed script, "i\" immediately followed
d0cde9
   by a literal newline will work on all versions of sed. Furthermore,
d0cde9
   the command "s/$/This line is new/" will only work if the hold
d0cde9
   space is already empty (which it is by default).
d0cde9

d0cde9
   To append "This line is new" AFTER each line matching a regex:
d0cde9

d0cde9
     /RE/a This line is new               # HHsed, sedmod, gsed 3.02a
d0cde9
     /RE/{G;s/$/This line is new/;}       # other seds
d0cde9

d0cde9
   To append 2 blank lines after each line matching a regex:
d0cde9

d0cde9
     /RE/{G;G;}                    # assumes the hold space is empty
d0cde9

d0cde9
   To replace each line matching a regex with 5 blank lines:
d0cde9

d0cde9
     /RE/{s/.*//;G;G;G;G;}         # assumes the hold space is empty
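
   The "i\" and "a\" forms followed by a literal newline can also be
   typed from a Bourne-type shell, because a backslash-newline inside
   single quotes reaches sed unchanged. A minimal sketch, with
   invented function names:

```shell
# The text line is deliberately not indented: some seds keep leading
# whitespace in the inserted text and others strip it.
insert_before() {
    sed '/RE/i\
This line is new'
}

append_after() {
    sed '/RE/a\
This line is new'
}

printf 'a\nRE\nb\n' | insert_before
printf 'a\nRE\nb\n' | append_after
```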
d0cde9

d0cde9
   (f) Use the "y///" command if possible:
d0cde9

d0cde9
   On some Unix versions of sed (not GNU sed!), though the s///
d0cde9
   command won't accept '\n' in the RHS, the y/// command does. If
d0cde9
   your Unix sed supports it, a newline after "aaa" can be inserted
d0cde9
   this way (which is not portable to GNU sed or other seds):
d0cde9

d0cde9
     s/aaa/&~/; y/~/\n/;    # assuming no other '~' is on the line!
d0cde9

d0cde9
4.2. How do I represent control-codes or nonprintable characters?
d0cde9

d0cde9
   Several versions of sed support the notation \xHH, where "HH" are
d0cde9
   two hex digits, 00-FF: ssed, GNU sed v3.02.80 and above, GNU sed
d0cde9
   v1.03, sed16 and sed15 (HHsed). Try to use one of those versions.
d0cde9

d0cde9
   Sed is not intended to process binary or object code, and files
d0cde9
   which contain nulls (0x00) will usually generate errors in most
d0cde9
   versions of sed. The latest versions of GNU sed and ssed are an
d0cde9
   exception; they permit nulls in the input files and also in
d0cde9
   regexes.
d0cde9

d0cde9
   On Unix platforms, the 'echo' command may allow insertion of octal
d0cde9
   or hex values, e.g., `echo "\0nnn"` or `echo -n "\0nnn"`. The echo
d0cde9
   command may also support syntax like '\\b' or '\\t' for backspace
d0cde9
   or tab characters. Check the man pages to see what syntax your
d0cde9
   version of echo supports. Some versions support the following:
d0cde9

d0cde9
     # replace 0x1A (32 octal) with ASCII letters
d0cde9
     sed 's/'`echo "\032"`'/Ctrl-Z/g'
d0cde9

d0cde9
     # note the 3 backslashes in the command below
d0cde9
     sed "s/.`echo \\\b`//g"
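
   Where the behavior of echo is in doubt, the POSIX printf utility
   is usually a more predictable way to build the character. The
   following is a sketch under that assumption; the variable and
   function names are illustrative only.

```shell
# printf '\032' emits the single byte 0x1A (octal 032, Ctrl-Z).
ctrl_z=$(printf '\032')
replace_ctrl_z() { sed "s/${ctrl_z}/Ctrl-Z/g"; }

# A literal tab, for seds that do not understand \t in a regex.
tab=$(printf '\t')
tabs_to_spaces() { sed "s/${tab}/ /g"; }

printf 'a\032b\n' | replace_ctrl_z
printf 'a\tb\n' | tabs_to_spaces
```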
d0cde9

d0cde9
4.3. How do I convert files with toggle characters, like +this+, to
d0cde9
look like [i]this[/i]?
d0cde9

d0cde9
   Input files, especially message-oriented text files, often contain
d0cde9
   toggle characters for emphasis, like ~this~, *this*, or =this=. Sed
d0cde9
   can make the same input pattern produce alternating output each
d0cde9
   time it is encountered. Typical needs might be to generate HTML
d0cde9
   codes or print codes for boldface, italic, or underscore. This
d0cde9
   script accommodates multiple occurrences of the toggle pattern on
d0cde9
   the same line, as well as cases where the pattern starts on one
d0cde9
   line and finishes several lines later, even at the end of the file:
d0cde9

d0cde9
     # sed script to convert +this+ to [i]this[/i]
d0cde9
     :a
d0cde9
     /+/{ x;        # If "+" is found, switch hold and pattern space
d0cde9
       /^ON/{       # If "ON" is in the (former) hold space, then ..
d0cde9
         s///;      # .. delete it
d0cde9
         x;         # .. switch hold space and pattern space back
d0cde9
         s|+|[/i]|; # .. turn the next "+" into "[/i]"
d0cde9
         ba;        # .. jump back to label :a and start over
d0cde9
       }
d0cde9
     s/^/ON/;       # Else, "ON" was not in the hold space; create it
d0cde9
     x;             # Switch hold space and pattern space
d0cde9
     s|+|[i]|;      # Turn the first "+" into "[i]"
d0cde9
     ba;            # Branch to label :a to find another pattern
d0cde9
     }
d0cde9
     #---end of script---
d0cde9

d0cde9
   This script uses the hold space to create a "flag" to indicate
d0cde9
   whether the toggle is ON or not. We have added remarks to
d0cde9
   illustrate the script logic, but in most versions of sed remarks
d0cde9
   are not permitted after 'b'ranch commands or labels.
d0cde9

d0cde9
   If you are sure that the +toggle+ characters never cross line
d0cde9
   boundaries (i.e., never begin on one line and end on another), this
d0cde9
   script can be reduced to one line:
d0cde9

d0cde9
     s|+\([^+][^+]*\)+|[i]\1[/i]|g
d0cde9

d0cde9
   If your toggle pattern contains regex metacharacters (such as '*'
d0cde9
   or perhaps '+' or '?'), remember to quote them with backslashes.
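
   As a quick check of the one-line form, here it is wrapped in two
   shell functions (the names and the [b] markup are invented for
   this example). In basic regular expressions the "+" is already a
   literal, while the "*" toggle needs a backslash outside of a
   bracket expression:

```shell
# +this+ becomes [i]this[/i]; toggles must not span lines.
italicize() { sed 's|+\([^+][^+]*\)+|[i]\1[/i]|g'; }

# *this* becomes [b]this[/b]; note the quoted \* outside the brackets.
embolden() { sed 's|\*\([^*][^*]*\)\*|[b]\1[/b]|g'; }

echo 'a +b+ c +d e+ f' | italicize
echo 'x *y* z' | embolden
```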
d0cde9

d0cde9
   CHANGING STRINGS
d0cde9

d0cde9
4.10. How do I perform a case-insensitive search?
d0cde9

d0cde9
   Several versions of sed support case-insensitive matching: ssed and
d0cde9
   GNU sed v3.02+ (with I flag after s/// or /regex/); sedmod with the
d0cde9
   -i switch; and sed16 (which supports both types of switches).
d0cde9

d0cde9
   With other versions of sed, case-insensitive searching is awkward,
d0cde9
   so people may use awk or perl instead, since these programs have
d0cde9
   options for case-insensitive searches. In gawk, use "BEGIN
d0cde9
   {IGNORECASE=1}" and in perl, "/regex/i". For other seds, here are
d0cde9
   three solutions:
d0cde9

d0cde9
   Solution 1: convert everything to upper case and search normally
d0cde9

d0cde9
     # sed script, solution 1
d0cde9
     h;          # copy the original line to the hold space
d0cde9
                 # convert the pattern space to solid caps
d0cde9
     y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
d0cde9
                 # now we can search for the word "CARLOS"
d0cde9
     /CARLOS/ {
d0cde9
          # add or insert lines. Note: "s/.../.../" will not work
d0cde9
          # here because we are searching a modified pattern
d0cde9
          # space and are not printing the pattern space.
d0cde9
     }
d0cde9
     x;          # get back the original pattern space
d0cde9
                 # the original pattern space will be printed
d0cde9
     #---end of sed script---
d0cde9

d0cde9
   Solution 2: search for both cases
d0cde9

d0cde9
   Often, proper names will either occur in all lower-case ("unix"),
d0cde9
   with an initial capital letter ("Unix") or occur in solid caps
d0cde9
   ("UNIX"). There may be no need to search for every possibility.
d0cde9

d0cde9
     /UNIX/b match
d0cde9
     /[Uu]nix/b match
d0cde9

d0cde9
   Solution 3: search for all possible cases
d0cde9

d0cde9
     # If you must, search for any possible combination
d0cde9
     /[Cc][Aa][Rr][Ll][Oo][Ss]/ { ... }
d0cde9

d0cde9
   Bear in mind that as the pattern length increases, this solution
d0cde9
   becomes an order of magnitude slower than Solution 1, at
d0cde9
   least with some implementations of sed.
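
   Solution 1 can be turned into a small case-insensitive "grep" that
   prints matching lines in their original case. This is a sketch
   only; the function names are invented, and the second function
   assumes GNU sed or ssed for the I flag.

```shell
# Portable: search an upper-cased copy, print the original line.
grep_carlos() {
    sed -n '
        h
        y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
        /CARLOS/{
            x
            p
        }'
}

# GNU sed / ssed only: the I flag after the regex.
grep_carlos_gnu() { sed -n '/carlos/Ip'; }

printf 'Carlos was here\nnobody\ncArLoS too\n' | grep_carlos
```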
d0cde9

d0cde9
4.11. How do I match only the first occurrence of a pattern?
d0cde9

d0cde9
   (1) The general solution is to use GNU sed or ssed, with one of
d0cde9
   these range expressions. The first script ("print only the first
d0cde9
   match") works with any version of sed:
d0cde9

d0cde9
     sed -n '/RE/{p;q;}' file       # print only the first match
d0cde9
     sed '0,/RE/{//d;}' file        # delete only the first match
d0cde9
     sed '0,/RE/s//to_that/' file   # change only the first match
d0cde9

d0cde9
   (2) If you cannot use GNU sed and if you *know* the pattern will
d0cde9
   not occur on the first line, this will work:
d0cde9

d0cde9
     sed '1,/RE/{//d;}' file        # delete only the first match
d0cde9
     sed '1,/RE/s//to_that/' file   # change only the first match
d0cde9

d0cde9
   (3) If you cannot use GNU sed and the pattern *might* occur on the
d0cde9
   first line, use one of the following commands (credit for short GNU
d0cde9
   script goes to Donald Bruce Stewart):
d0cde9

d0cde9
     sed '/RE/{x;/Y/!{s/^/Y/;h;d;};x;}' file       # delete (one way)
d0cde9
     sed -e '/RE/{$d;N;s/.*\n//;:a' -e '$!N;$!ba' -e '}' file  # delete (another way)
     sed '/RE/{$d;N;s/.*\n//;:a;$!N;$!ba;}' file   # same script, GNU sed
d0cde9
     sed -e '/RE/{s//to_that/;:a' -e '$!N;$!ba' -e '}' file  # change
d0cde9

d0cde9
   Still another solution, using a flag in the hold space. This is
d0cde9
   portable to all seds and works if the pattern is on the first line:
d0cde9

d0cde9
     # sed script to change "foo" to "bar" only on the first occurrence
d0cde9
     /foo/{x;/^$/{x;s/foo/bar/;x;s/^/done/;};x;}
d0cde9
     #---end of script---
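
   The portable commands for printing or deleting only the first
   match can be checked against an input whose very first line
   matches. The function names below are invented for this sketch,
   and the hold space is assumed to be empty and otherwise unused.

```shell
# Print the first line matching RE, then quit.
print_first() { sed -n '/RE/{p;q;}'; }

# Delete the first line matching RE; "Y" in the hold space marks
# that the deletion has already happened.
delete_first() { sed '/RE/{x;/Y/!{s/^/Y/;h;d;};x;}'; }

printf 'RE 1\nx\nRE 2\n' | print_first
printf 'RE 1\nx\nRE 2\n' | delete_first
```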
d0cde9

d0cde9
4.12. How do I parse a comma-delimited (CSV) data file?
d0cde9

d0cde9
   Comma-delimited data files can come in several forms, requiring
d0cde9
   increasing levels of complexity in parsing and handling. They are
d0cde9
   often referred to as CSV files (for "comma separated values") and
d0cde9
   occasionally as SDF files (for "standard data format"). Note that
d0cde9
   some vendors use "SDF" to refer to variable-length records with
d0cde9
   comma-separated fields which are "double-quoted" if they contain
d0cde9
   character values, while other vendors use "SDF" to designate
d0cde9
   fixed-length records with fixed-length, nonquoted fields! (For help
d0cde9
   with fixed-length fields, see question 4.13.)
d0cde9

d0cde9
   The term "CSV" became a de-facto standard when Microsoft Excel used
d0cde9
   it as an optional output file format.
d0cde9

d0cde9
   Here are 4 different forms you may encounter in comma-delimited data:
d0cde9

d0cde9
   (a) No quotes, no internal commas
d0cde9

d0cde9
       1001,John Smith,PO Box 123,Chicago,IL,60699
d0cde9
       1002,Mary Jones,320 Main,Denver,CO,84100
d0cde9

d0cde9
   (b) Like (a), with quotes around each field
d0cde9

d0cde9
       "1003","John Smith","PO Box 123","Chicago","IL","60699"
d0cde9
       "1004","Mary Jones","320 Main","Denver","CO","84100"
d0cde9

d0cde9
   (c) Like (b), with embedded commas
d0cde9

d0cde9
       "1005","Tom Hall, Jr.","61 Ash Ct.","Niles","OH","44446"
d0cde9
       "1006","Bob Davis","429 Pine, Apt. 5","Boston","MA","02128"
d0cde9

d0cde9
   (d) Like (c), with embedded commas and quotes
d0cde9

d0cde9
       "1007","Sue "Red" Smith","19 Main","Troy","MI","48055"
d0cde9
       "1008","Joe "Hey, guy!" Hall","POB 44","Reno","NV","89504"
d0cde9

d0cde9
   In each example above, we have 6 fields and 5 commas which function
d0cde9
   as field separators. Case (c) is a very typical form of these data
d0cde9
   files, with double quotes used to enclose each field and to protect
d0cde9
   internal commas (such as "Tom Hall, Jr.") from interpretation as
d0cde9
   field separators. However, many times the data may include both
d0cde9
   embedded quotation marks as well as embedded commas, as seen by
d0cde9
   case (d), above.
d0cde9

d0cde9
   Case (d) is the closest to Microsoft CSV format. *However*, the
d0cde9
   Microsoft CSV format allows embedded newlines within a
d0cde9
   double-quoted field. If embedded newlines within fields are a
d0cde9
   possibility for your data, you should consider using something
d0cde9
   other than sed to work with the data file.
d0cde9

d0cde9
   Before handling a comma-delimited data file, make sure that you
d0cde9
   fully understand its format and check the integrity of the data.
d0cde9
   Does each line contain the same number of fields? Should certain
d0cde9
   fields be composed only of numbers or of two-letter state
d0cde9
   abbreviations in all caps? Sed (or awk or perl) should be used to
d0cde9
   validate the integrity of the data file before you attempt to alter
d0cde9
   it or extract particular fields from the file.
d0cde9

d0cde9
   After ensuring that each line has a valid number of fields, use sed
d0cde9
   to locate and modify individual fields, using the \(...\) grouping
d0cde9
   command where needed.
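
   A field-count check of this kind might look like the following
   sketch. It applies to case (a) only, where no field contains a
   comma, and it assumes records of exactly 6 fields (5 commas); the
   function name is illustrative.

```shell
# Print the line number and text of every record that does NOT
# consist of exactly six comma-separated fields.
bad_records() { sed -n '/^[^,]*\(,[^,]*\)\{5\}$/!{=;p;}'; }

printf '%s\n' \
    '1001,John Smith,PO Box 123,Chicago,IL,60699' \
    '1002,Mary Jones,320 Main,Denver,CO' | bad_records
```

   No output at all means that every record has the expected number
   of fields.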
d0cde9

d0cde9
   In case (a):
d0cde9

d0cde9
     sed 's/^[^,]*,[^,]*,[^,]*,[^,]*,/.../'
d0cde9
             ^     ^     ^
d0cde9
             |     |     |_ 3rd field
d0cde9
             |     |_______ 2nd field
d0cde9
             |_____________ 1st field
d0cde9

d0cde9
     # Unix script to delete the second field for case (a)
d0cde9
     sed 's/^\([^,]*\),[^,]*,/\1,,/' file
d0cde9

d0cde9
     # Unix script to change field 1 to 9999 for case (a)
d0cde9
     sed 's/^[^,]*,/9999,/' file
d0cde9

d0cde9
   In cases (b) and (c):
d0cde9

d0cde9
     sed 's/^"[^"]*","[^"]*","[^"]*","[^"]*",/.../'
d0cde9
              1st--   2nd--   3rd--   4th--
d0cde9

d0cde9
     # Unix script to delete the second field for case (c)
d0cde9
     sed 's/^\("[^"]*"\),"[^"]*",/\1,"",/' file
d0cde9

d0cde9
     # Unix script to change field 1 to 9999 for case (c)
d0cde9
     sed 's/^"[^"]*",/"9999",/' file
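
   The same patterns can be used to extract a field instead of
   changing it: group the wanted field and discard the rest of the
   line. A sketch with invented function names:

```shell
# Case (a): print only the 3rd field.
third_field_a() { sed 's/^[^,]*,[^,]*,\([^,]*\),.*/\1/'; }

# Cases (b) and (c): print only the 2nd field, without its quotes.
# Embedded commas are safe because the field ends at the next quote.
second_field_c() { sed 's/^"[^"]*","\([^"]*\)".*/\1/'; }

echo '1001,John Smith,PO Box 123,Chicago,IL,60699' | third_field_a
echo '"1005","Tom Hall, Jr.","61 Ash Ct.","Niles","OH","44446"' | second_field_c
```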
d0cde9

d0cde9

d0cde9
   In case (d):
d0cde9

d0cde9
   One way to parse such files is to replace the 3-character field
d0cde9
   separator "," with an unused character like the tab or vertical
d0cde9
   bar. (Technically, the field separator is only the comma while the
d0cde9
   fields are surrounded by "double quotes", but the net _effect_ is
d0cde9
   that fields are separated by quote-comma-quote, with quote
d0cde9
   characters added to the beginning and end of each record.) Search
d0cde9
   your datafile _first_ to make sure that your character appears
d0cde9
   nowhere in it!
d0cde9

d0cde9
     sed -n '/|/p' file        # search for any instance of '|'
d0cde9
     # if it's not found, we can use the '|' to separate fields
d0cde9

d0cde9
   Then replace the 3-character field separator and parse as before:
d0cde9

d0cde9
     # sed script to delete the second field for case (d)
d0cde9
     s/","/|/g;                  # global change of "," to bar
d0cde9
     s/^\([^|]*\)|[^|]*|/\1||/;  # delete 2nd field
d0cde9
     s/|/","/g;                  # global change of bar back to ","
d0cde9
     #---end of script---
d0cde9

d0cde9
     # sed script to change field 1 to 9999 for case (d)
d0cde9
     # Remember to accommodate leading and trailing quote marks
d0cde9
     s/","/|/g;
d0cde9
     s/^[^|]*|/"9999|/;
d0cde9
     s/|/","/g;
d0cde9
     #---end of script---
d0cde9

d0cde9
   Note that this technique works only if _each_ and _every_ field is
d0cde9
   surrounded with double quotes, including empty fields.
d0cde9

d0cde9
   The following solution is for more complex examples of (d), such
d0cde9
   as: not all fields contain "double-quote" marks, or the presence of
d0cde9
   embedded "double-quote" marks within fields, or extraneous
d0cde9
   whitespace around field delimiters. (Thanks to Greg Ubben for this
d0cde9
   script!)
d0cde9

d0cde9
     # sed script to convert case (d) to bar-delimited records
d0cde9
     s/^ *\(.*[^ ]\) *$/|\1|/;
d0cde9
     s/" *, */"|/g;
d0cde9
     : loop
d0cde9
     s/| *\([^",|][^,|]*\) *, */|\1|/g;
d0cde9
     s/| *, */||/g;
d0cde9
     t loop
d0cde9
     s/  *|/|/g;
d0cde9
     s/|  */|/g;
d0cde9
     s/^|\(.*\)|$/\1/;
d0cde9
     #---end of script---
d0cde9

d0cde9
   For example, it turns this (which is badly-formed but legal):
d0cde9

d0cde9
   first,"",unquoted ,""this" is, quoted " ,, sub "quote" inside, f", lone  " empty:
d0cde9

d0cde9
   into this:
d0cde9

d0cde9
   first|""|unquoted|""this" is, quoted "||sub "quote" inside|f"|lone  " empty:
d0cde9

d0cde9
   Note that the script preserves the "double-quote" marks, but
d0cde9
   changes only the commas where they are used as field separators. I
d0cde9
   have used the vertical bar "|" because it's easier to read, but you
d0cde9
   may change this to another field separator if you wish.
d0cde9

d0cde9
   If your CSV datafile is more complex, it would probably not be
d0cde9
   worth the effort to write it in sed. For such a case, you should
d0cde9
   use Perl with a dedicated CSV module (there are at least two recent
d0cde9
   CSV parsers available from CPAN).
d0cde9

d0cde9
4.13. How do I handle fixed-length, columnar data?
d0cde9

d0cde9
   Sed handles fixed-length fields via \(grouping\) and backreferences
d0cde9
   (\1, \2, \3 ...). If we have 3 fields of 10, 25, and 9 characters
d0cde9
   per field, our sed script might look like so:
d0cde9

d0cde9
     s/^\(.\{10\}\)\(.\{25\}\)\(.\{9\}\)/\3\2\1/;  # Change the fields
d0cde9
        ^^^^^^^^^^^~~~~~~~~~~~==========           #   from 1,2,3 to 3,2,1
d0cde9
         field #1   field #2   field #3
d0cde9

d0cde9
   This is a bit hard to read. By using GNU sed or ssed with the -r
d0cde9
   switch active, it can look like this:
d0cde9

d0cde9
     s/^(.{10})(.{25})(.{9})/\3\2\1/;          # Using the -r switch
d0cde9

d0cde9
   To delete a field in sed, use grouping and omit the backreference
d0cde9
   from the field to be deleted. If the data is long or difficult to
d0cde9
   work with, use ssed with the -R switch and the /x flag after an s///
d0cde9
   command, to insert comments and remarks about the fields.
d0cde9

d0cde9
   For records with many fields, use GNU awk with the FIELDWIDTHS
d0cde9
   variable set in the top of the script. For example:
d0cde9

d0cde9
     awk 'BEGIN{FIELDWIDTHS = "10 25 9"}; {print $3 $2 $1}' file
d0cde9

d0cde9
   This is much easier to read than a similar sed script, especially
d0cde9
   if there are more than 5 or 6 fields to manipulate.
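
   For a concrete picture of the sed approach, here is a sketch using
   a made-up record layout of 3, 2, and 4 characters (a code, a state
   abbreviation, and a year). The layout and the function names are
   assumptions for this example only.

```shell
# Reverse the order of the three fields.
reorder() { sed 's/^\(.\{3\}\)\(.\{2\}\)\(.\{4\}\)/\3\2\1/'; }

# Delete the middle field by leaving \2 out of the replacement.
drop_middle() { sed 's/^\(.\{3\}\)\(.\{2\}\)\(.\{4\}\)/\1\3/'; }

echo 'ABCIL2003' | reorder
echo 'ABCIL2003' | drop_middle
```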
d0cde9

d0cde9
4.14. How do I commify a string of numbers?
d0cde9

d0cde9
   Use the simplest script necessary to accomplish your task. As
d0cde9
   variations of the line increase, the sed script must become more
d0cde9
   complex to handle additional conditions. Whole numbers are
d0cde9
   simplest, followed by decimal formats, followed by embedded words.
d0cde9

d0cde9
   Case 1: simple strings of whole numbers separated by spaces or
d0cde9
   commas, with an optional negative sign. To convert this:
d0cde9

d0cde9
       4381, -1222333, and 70000: - 44555666 1234567890 words
d0cde9
       56890  -234567, and 89222  -999777  345888777666 chars
d0cde9

d0cde9
   to this:
d0cde9

d0cde9
       4,381, -1,222,333, and 70,000: - 44,555,666 1,234,567,890 words
d0cde9
       56,890  -234,567, and 89,222  -999,777  345,888,777,666 chars
d0cde9

d0cde9
   use one of these one-liners:
d0cde9

d0cde9
     sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'                      # GNU sed
d0cde9
     sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'  # other seds
d0cde9

d0cde9
   Case 2: strings of numbers which may have an embedded decimal
d0cde9
   point, separated by spaces or commas, with an optional negative
d0cde9
   sign. To change this:
d0cde9

d0cde9
       4381,  -6555.1212 and 70000,  7.18281828  44906982.071902
d0cde9
       56890   -2345.7778 and 8.0000:  -49000000 -1234567.89012
d0cde9

d0cde9
   to this:
d0cde9

d0cde9
       4,381,  -6,555.1212 and 70,000,  7.18281828  44,906,982.071902
d0cde9
       56,890   -2,345.7778 and 8.0000:  -49,000,000 -1,234,567.89012
d0cde9

d0cde9
   use the following command for GNU sed:
d0cde9

d0cde9
     sed ':a;s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g;ta'
d0cde9

d0cde9
   and for other versions of sed:
d0cde9

d0cde9
     sed -f case2.sed files
d0cde9

d0cde9
     # case2.sed
d0cde9
     s/^/ /;                 # add space to start of line
d0cde9
     :a
d0cde9
     s/\( [-0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/g
d0cde9
     ta
d0cde9
     s/ //;                  # remove space from start of line
d0cde9
     #---end of script---
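
   Both portable variants can be wrapped in shell functions and
   tried on short inputs. This is a sketch with invented function
   names; it also shows why the Case 1 script should not be used on
   decimal numbers.

```shell
# Case 1 (whole numbers only), portable form.
commify_ints() { sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'; }

# Case 2 (decimals allowed), the same commands as case2.sed.
commify_decimals() {
    sed '
        s/^/ /
        :a
        s/\( [-0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/g
        ta
        s/ //'
}

echo '1234567 and 89' | commify_ints
echo '3.14159' | commify_ints        # wrong tool: gives 3.14,159
echo '44906982.071902 and -1234567.89012' | commify_decimals
```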
d0cde9

d0cde9
4.15. How do I prevent regex expansion on substitutions?
d0cde9

d0cde9
   Sometimes you want to *match* regular expression metacharacters as
d0cde9
   literals (e.g., you want to match "[0-9]" or "\n"), to be replaced
d0cde9
   with something else. The ordinary way to prevent expanding
d0cde9
   metacharacters is to prefix them with a backslash. Thus, if "\n"
d0cde9
   matches a newline, "\\n" will match the two-character string of
d0cde9
   'backslash' followed by 'n'.
d0cde9

d0cde9
   But doing this repeatedly can become tedious if there are many
d0cde9
   regexes. The following script will replace alternating strings of
d0cde9
   literals, where no character is interpreted as a regex
d0cde9
   metacharacter:
d0cde9

d0cde9
     # filename: sub_quote.sed
d0cde9
     #   author: Paolo Bonzini
d0cde9
     # sed script to add backslash to find/replace metacharacters
d0cde9
     N;                  # add even numbered line to pattern space
d0cde9
     s,[]/\\$*[],\\&,g;  # quote all of [, ], /, \, $, or *
d0cde9
     s,^,s/,;            # prepend "s/" to front of pattern space
d0cde9
     s,$,/,;             # append "/" to end of pattern space
d0cde9
     s,\n,/,;            # change "\n" to "/", making s/from/to/
d0cde9
     #---end of script---
d0cde9

d0cde9
   Here's a sample of how sub_quote.sed might be used. This example
d0cde9
   converts typical sed regexes to perl-style regexes. The input file
d0cde9
   consists of 10 lines:
d0cde9

d0cde9
       [0-9]
d0cde9
       \d
d0cde9
       [^0-9]
d0cde9
       \D
d0cde9
       \+
d0cde9
       +
d0cde9
       \?
d0cde9
       ?
d0cde9
       \|
d0cde9
       |
d0cde9

d0cde9
   Run the command "sed -f sub_quote.sed input", to transform the
d0cde9
   input file (above) to 5 lines of output:
d0cde9

d0cde9
       s/\[0-9\]/\\d/
d0cde9
       s/\[^0-9\]/\\D/
d0cde9
       s/\\+/+/
d0cde9
       s/\\?/?/
d0cde9
       s/\\|/|/
d0cde9

d0cde9
   The above file is itself a sed script, which can then be used to
d0cde9
   modify other files.
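
   For a short list of pairs, the generated script can be passed
   straight to a second sed without a temporary file. The sketch
   below packs the commands of sub_quote.sed into one line; the
   function and variable names are invented for this example.

```shell
# Same commands as sub_quote.sed, written as a one-liner.
quote_pairs() {
    sed 'N;s,[]/\\$*[],\\&,g;s,^,s/,;s,$,/,;s,\n,/,'
}

# Build a one-command script from a single find/replace pair ...
script=$(printf '%s\n' '[0-9]' '\d' | quote_pairs)
echo "$script"

# ... and use it: the literal text "[0-9]" is replaced by "\d".
echo 'a[0-9]b' | sed "$script"
```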
d0cde9

d0cde9
4.16. How do I convert a string to all lowercase or capital letters?
d0cde9

d0cde9
   The easiest method is to use a new version of GNU sed, ssed, sedmod
d0cde9
   or sed16 and employ the \U, \L, or other switches on the right side
d0cde9
   of an s/// command. For example, to convert any word which begins
d0cde9
   with "reg" or "exp" into solid capital letters:
d0cde9

d0cde9
       sed -r "s/\<(reg|exp)[a-z]+/\U&/g"              # gsed4.+ or ssed
d0cde9
       sed "s/\
d0cde9

d0cde9
   As you can see, sedmod and sed16 do not support alternation (|),
d0cde9
   but they do support case conversion. If none of these versions of
d0cde9
   sed are available to you, some sample scripts for this task are
d0cde9
   available from the Seder's Grab Bag:
d0cde9

d0cde9
       http://sed.sourceforge.net/grabbag/scripts
d0cde9

d0cde9
   Note that some case conversion scripts are listed under "Filename
d0cde9
   manipulation" and others are under "Text formatting."
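
   A short sketch of the conversions, assuming GNU sed 4.x for the
   first two functions (\U, \u, \< and -r are extensions); the third
   uses only y/// and should work with any sed. Function names are
   illustrative.

```shell
# GNU sed: \U upper-cases the whole match.
shout_words() { sed -r 's/\<(reg|exp)[a-z]+/\U&/g'; }

# GNU sed: \u upper-cases only the next character.
capitalize() { sed 's/\<[a-z]/\u&/g'; }

# Any sed: convert the whole line with y///.
upper_line() { sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'; }

echo 'a regular sed expression' | shout_words
echo 'hello world' | capitalize
echo 'hello world' | upper_line
```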
d0cde9

d0cde9
   CHANGING BLOCKS (consecutive lines)
d0cde9

d0cde9
4.20. How do I change only one section of a file?
d0cde9

d0cde9
   You can match a range of lines by line number, by regexes (say, all
d0cde9
   lines between the words "from" and "until"), or by a combination of
d0cde9
   the two. For multiple substitutions on the same range, put the
d0cde9
   command(s) between braces {...}. For example:
d0cde9

d0cde9
     # replace only between lines 1 and 20
d0cde9
     1,20 s/Johnson/White/g
d0cde9

d0cde9
     # replace everywhere EXCEPT between lines 1 and 20
d0cde9
     1,20 !s/Johnson/White/g
d0cde9

d0cde9
     # replace only between words "from" and "until". Note the
d0cde9
     # use of \<....\> as word boundary markers in GNU sed.
d0cde9
     /from/,/until/ { s/\<red\>/magenta/g; s/\<blue\>/cyan/g; }
d0cde9

d0cde9
     # replace only from the words "ENDNOTES:" to the end of file
d0cde9
     /ENDNOTES:/,$ { s/Schaff/Herzog/g; s/Kraft/Ebbing/g; }
d0cde9

d0cde9
   For technical details on using address ranges, see section 3.3
d0cde9
   ("Addressing and Address ranges").
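
   The effect of a regex range and of its negation can be seen on a
   five-line sample. The input text and function names below are
   made up for this sketch.

```shell
# Substitute only inside the block from "from" to "until".
inside_block() { sed '/from/,/until/s/red/magenta/'; }

# Substitute everywhere EXCEPT inside that block.
outside_block() { sed '/from/,/until/!s/red/magenta/'; }

sample() {
    printf '%s\n' 'red one' 'from here' 'red two' 'until here' 'red three'
}

sample | inside_block     # only "red two" is changed
sample | outside_block    # "red one" and "red three" are changed
```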
d0cde9

d0cde9
4.21. How do I delete or change a block of text if the block contains
d0cde9
      a certain regular expression?
d0cde9

d0cde9
   The following deletes the block between 'start' and 'end'
d0cde9
   inclusively, if and only if the block contains the string
d0cde9
   'regex'. Written by Russell Davies, with additional comments:
d0cde9

d0cde9
     # sed script to delete a block if /regex/ matches inside it
d0cde9
     :t
d0cde9
     /start/,/end/ {    # For each line between these block markers..
d0cde9
        /end/!{         #   If we are not at the /end/ marker
d0cde9
           $!{          #     nor the last line of the file,
d0cde9
              N;        #     add the Next line to the pattern space
d0cde9
              bt
d0cde9
           }            #   and branch (loop back) to the :t label.
d0cde9
        }               # This line matches the /end/ marker.
d0cde9
        /regex/d;       # If /regex/ matches, delete the block.
d0cde9
     }                  # Otherwise, the block will be printed.
d0cde9
     #---end of script---
d0cde9

d0cde9
   Note: When the script above reaches /regex/, the entire multi-line
d0cde9
   block is in the pattern space. To replace items inside the block,
d0cde9
   use "s///". To change the entire block, use the 'c' (change)
d0cde9
   command:
d0cde9

d0cde9
     /regex/c\
d0cde9
     1: This will replace the entire block\
d0cde9
     2: with these two lines of text.
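
   To see the deletion script at work, here is a sketch that feeds
   it two made-up blocks, only one of which contains "regex". The
   comments are stripped from the script so it can be held in a shell
   variable; the names "script" and "result" and the sample lines are
   assumptions for this illustration.

```shell
# The block-deleting script from above, without its comments.
script=':t
/start/,/end/ {
   /end/!{
      $!{
         N
         bt
      }
   }
   /regex/d
}'
# Block A contains "regex" and should vanish; block B should survive.
result=$(printf '%s\n' 'keep 1' 'start A' 'regex inside' 'end A' \
                       'keep 2' 'start B' 'plain' 'end B' | sed "$script")
printf '%s\n' "$result"
```

   Block A is collected into the pattern space and deleted as a
   unit; block B is collected the same way but printed unchanged.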
d0cde9

d0cde9
4.22. How do I locate a paragraph of text if the paragraph contains a
d0cde9
      certain regular expression?
d0cde9

d0cde9
   Assume that paragraphs are separated by blank lines. For regexes
d0cde9
   that are single terms, use one of the following scripts:
d0cde9

d0cde9
     sed -e '/./{H;$!d;}' -e 'x;/regex/!d'      # most seds
d0cde9
     sed '/./{H;$!d;};x;/regex/!d'              # GNU sed
d0cde9

d0cde9
   To print paragraphs only if they contain 3 specific regular
d0cde9
   expressions (RE1, RE2, and RE3), in any order in the paragraph:
d0cde9

d0cde9
     sed -e '/./{H;$!d;}' -e 'x;/RE1/!d;/RE2/!d;/RE3/!d'
d0cde9

d0cde9
   With this solution and the preceding one, if the paragraphs are
d0cde9
   excessively long (more than 4k in length), you may overflow sed's
d0cde9
   internal buffers. If using HHsed, you must add a "G;" command
d0cde9
   immediately after the "x;" in the scripts above to defeat a bug
d0cde9
   in HHsed (see section 7.9(5), below, for a description).
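
   A small sketch of the single-term form, run on three made-up
   paragraphs. The search term "needle" and the variable "result" are
   placeholders chosen for this example.

```shell
# Three paragraphs separated by blank lines; only the second has "needle".
result=$(printf '%s\n' 'alpha one' 'alpha two' '' \
                       'beta one' 'needle here' '' 'gamma' |
  sed -e '/./{H;$!d;}' -e 'x;/needle/!d')
printf '%s\n' "$result"
```

   Because 'H' puts a newline in front of each line it appends to
   the hold space, the matching paragraph is printed with one blank
   line above it.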
d0cde9

d0cde9
4.23. How do I match a block of _specific_ consecutive lines?
d0cde9

d0cde9
   There are three ways to approach this problem:
d0cde9

d0cde9
       (1) Try to use a "/range/, /expression/"
d0cde9
       (2) Try to use a "/multi-line\nexpression/"
d0cde9
       (3) Try to use a block of "literal strings"
d0cde9

d0cde9
   We describe each approach in the following sections.
d0cde9

d0cde9
4.23.1.  Try to use a "/range/, /expression/"
d0cde9

d0cde9
   If the block consists of strings that *never change their order*
d0cde9
   and if the top line never occurs outside the block, like this:
d0cde9

d0cde9
       Abel
d0cde9
       Baker
d0cde9
       Charlie
d0cde9
       Delta
d0cde9

d0cde9
   then these solutions will work for deleting the block:
d0cde9

d0cde9
     sed '/^Abel$/{N;N;N;d;}' files     # for blocks with few lines
d0cde9
     sed '/^Abel$/, /^Zebra$/d' files   # for blocks with many lines
d0cde9
     sed '/^Abel$/,+25d' files          # HHsed, sedmod, ssed, gsed 3.02.80
d0cde9

d0cde9
   To change the block, use the 'c' (change) command instead of 'd'.
d0cde9
   To print that block only, use the -n switch and 'p' (print) instead
d0cde9
   of 'd'. To change some things inside the block, try this:
d0cde9

d0cde9
     /^Abel$/,/^Delta$/ {
d0cde9
         :ack
d0cde9
         N;
d0cde9
         /\nDelta$/! b ack
d0cde9
         # At this point, all the lines in the block are collected
d0cde9
         s/ubstitute /somethin/g;
d0cde9
     }
d0cde9

d0cde9
4.23.2.  Try to use a "multi-line\nexpression"
d0cde9

d0cde9
   If the top line of the block sometimes appears alone or is
d0cde9
   sometimes followed by other lines, or if a partial block may occur
d0cde9
   somewhere in the file, a multi-line expression may be required.
d0cde9

d0cde9
   In these examples, we give solutions for matching an N-line block.
d0cde9
   The expression "/^RE1\nRE2\nRE3...$/" represents a properly formed
d0cde9
   regular expression where \n indicates a newline between lines. Note
d0cde9
   that the 'N' followed by the 'P;D;' commands forms a "sliding
d0cde9
   window" technique. A window of N lines is formed. If the multi-line
d0cde9
   pattern matches, the block is handled. If not, the top line is
d0cde9
   printed and then deleted from the pattern space, and we try to
d0cde9
   match at the next line.
d0cde9

d0cde9
     # sed script to delete 2 consecutive lines: /^RE1\nRE2$/
d0cde9
     $b
d0cde9
     /^RE1$/ {
d0cde9
       $!N
d0cde9
       /^RE1\nRE2$/d
d0cde9
       P;D
d0cde9
     }
d0cde9
     #---end of script---
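
     The 2-line script can be tried on made-up input like this. It
     is a sketch only: "foo" and "bar" stand in for RE1 and RE2, and
     the variable names are ours.

```shell
# Sliding window of 2 lines: delete "foo" only when "bar" follows it.
script='$b
/^foo$/ {
  $!N
  /^foo\nbar$/d
  P;D
}'
result=$(printf '%s\n' 'foo' 'baz' 'foo' 'bar' 'end' | sed "$script")
printf '%s\n' "$result"
```

     The first "foo" is followed by "baz", so it is printed and the
     window slides down one line; the second "foo" is followed by
     "bar", so both lines are deleted.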
d0cde9

d0cde9
     # sed script to delete 3 consecutive lines. (This script
d0cde9
     # fails under GNU sed v2.05 and earlier because of the 't'
d0cde9
     # bug when s///n is used; see section 7.5(1) of the FAQ.)
d0cde9
     : more
d0cde9
     $!N
d0cde9
     s/\n/&/2;
d0cde9
     t enough
d0cde9
     $!b more
d0cde9
     : enough
d0cde9
     /^RE1\nRE2\nRE3$/d
d0cde9
     P;D
d0cde9
     #---end of script---
d0cde9

d0cde9
   For example, to delete a block of 5 consecutive lines, the previous
d0cde9
   script must be altered in only two places:
d0cde9

d0cde9
   (1) Change the 2 in "s/\n/&/2;" to a 4 (the trailing semicolon is
d0cde9
   needed to work around a bug in HHsed v1.5).
d0cde9

d0cde9
   (2) Change the regex line to "/^RE1\nRE2\nRE3\nRE4\nRE5$/d",
d0cde9
   modifying the expression as needed.
d0cde9

d0cde9
   Suppose we want to delete a block of two blank lines followed by
d0cde9
   the word "foo" followed by another blank line (4 lines in all).
d0cde9
   Other blank lines and other instances of "foo" should be left
d0cde9
   alone. After changing the '2' to a '3' (always one number less than
d0cde9
   the total number of lines), the regex line would look like this:
d0cde9
   "/^\n\nfoo\n$/d". (Thanks to Greg Ubben for this script.)
d0cde9

d0cde9
   As an alternative to work around the 't' bug in older versions of
d0cde9
   GNU sed, the following script will delete 4 consecutive lines:
d0cde9

d0cde9
     # sed script to delete 4 consecutive lines. Use this if you
d0cde9
     # must run under GNU sed 2.05 or earlier.
d0cde9
     /^RE1$/!b
d0cde9
     $!N
d0cde9
     $!N
d0cde9
     :a
d0cde9
     $b
d0cde9
     N
d0cde9
     /^RE1\nRE2\nRE3\nRE4$/d
d0cde9
     P
d0cde9
     s/^.*\n\(.*\n.*\n.*\)$/\1/
d0cde9
     ba
d0cde9
     #---end of script---
d0cde9

d0cde9
   Its drawback is that it must be modified in 3 places instead of 2
d0cde9
   to adapt it for more lines, and as additional lines are added, the
d0cde9
   's' command is forced to work harder to match the regexes. On the
d0cde9
   other hand, it avoids a bug with gsed-2.05 and illustrates another
d0cde9
   way to solve the problem of deleting consecutive lines.
d0cde9

d0cde9
4.23.3.  Try to use a block of "literal strings"
d0cde9

d0cde9
   If you need to match a static block of text (which may occur any
d0cde9
   number of times throughout a file), where the contents of the block
d0cde9
   are known in advance, then this script is easy to use. It requires
d0cde9
   an intermediate file, which we will call "findrep.txt" (below):
d0cde9

d0cde9
       A block of several consecutive lines to
d0cde9
       be matched literally should be placed on
d0cde9
       top. Regular expressions like .*  or [a-z]
d0cde9
       will lose their special meaning and be
d0cde9
       interpreted literally in this block.
d0cde9
       ----
d0cde9
       Four hyphens separate the two sections. Put
d0cde9
       the replacement text in the lower section.
d0cde9
       As above, sed symbols like &, \n, or \1 will
d0cde9
       lose their special meaning.
d0cde9

d0cde9
   This is a 3-step process. A generic script called "blockrep.sed"
d0cde9
   will read "findrep.txt" (above) and generate a custom script, which
d0cde9
   is then used on the actual input file. In other words,
d0cde9
   "findrep.txt" is a simplified description of the editing that you
d0cde9
   want to do on the block, and "blockrep.sed" turns it into actual
d0cde9
   sed commands.
d0cde9

d0cde9
   Use this process from a Unix shell or from a DOS prompt:
d0cde9

d0cde9
     sed -nf blockrep.sed findrep.txt >custom.sed
d0cde9
     sed -f custom.sed input.file >output.file
d0cde9
     erase custom.sed
d0cde9

d0cde9
   The generic script "blockrep.sed" follows below. It's fairly long.
d0cde9
   Examining its output might help you understand how to use the
d0cde9
   _sliding window_ technique.
d0cde9

d0cde9
     # filename: blockrep.sed
d0cde9
     #   author: Paolo Bonzini
d0cde9
     # Requires:
d0cde9
     #    (1) blocks to find and replace, e.g., findrep.txt
d0cde9
     #    (2) an input file to be changed, input.file
d0cde9
     #
d0cde9
     # blockrep.sed creates a second sed script, custom.sed,
d0cde9
     # to find the lines above the row of 4 hyphens, globally
d0cde9
     # replacing them with the lower block of text. GNU sed
d0cde9
     # is recommended but not required for this script.
d0cde9
     #
d0cde9
     # Loop on the first part, accumulating the `from' text
d0cde9
     # into the hold space.
d0cde9
     :a
d0cde9
     /^----$/! {
d0cde9
        # Escape slashes, backslashes, the final newline and
d0cde9
        # regular expression metacharacters.
d0cde9
        s,[/\[.*],\\&,g
d0cde9
        s/$/\\/
d0cde9
        H
d0cde9
        #
d0cde9
        # Append N cmds needed to maintain the sliding window.
d0cde9
        x
d0cde9
        1 s,^.,s/,
d0cde9
        1! s/^/N\
d0cde9
     /
d0cde9
        x
d0cde9
        n
d0cde9
        ba
d0cde9
     }
d0cde9
     #
d0cde9
     # Change the final backslash to a slash to separate the
d0cde9
     # two sides of the s command.
d0cde9
     x
d0cde9
     s,\\$,/,
d0cde9
     x
d0cde9
     #
d0cde9
     # Until EOF, gather the substitution into hold space.
d0cde9
     :b
d0cde9
     n
d0cde9
     s,[/\],\\&,g
d0cde9
     $! s/$/\\/
d0cde9
     H
d0cde9
     $! bb
d0cde9
     #
d0cde9
     # Start the RHS of the s command without a leading
d0cde9
     # newline, add the P/D pair for the sliding window, and
d0cde9
     # print the script.
d0cde9
     g
d0cde9
     s,/\n,/,
d0cde9
     s,$,/\
d0cde9
     P\
d0cde9
     D,p
d0cde9
     #---end of script---
d0cde9

d0cde9
4.24. How do I address all the lines between RE1 and RE2, excluding the
d0cde9
      lines themselves?
d0cde9

d0cde9
   Normally, to address the lines between two regular expressions, RE1
d0cde9
   and RE2, one would do this: '/RE1/,/RE2/{commands;}'. Excluding
d0cde9
   those lines takes an extra step. To put 2 arrows before each line
d0cde9
   between RE1 and RE2, but not before the RE1 and RE2 lines themselves:
d0cde9

d0cde9
     sed '1,/RE1/!{ /RE2/,/RE1/!s/^/>>/; }' input.fil
d0cde9

d0cde9
   The preceding script, though short, may be difficult to follow. It
d0cde9
   also requires that /RE1/ not occur on the first line of the
d0cde9
   input file. The following script, though it's not a one-liner, is
d0cde9
   easier to read and it permits /RE1/ to appear on the first line:
d0cde9

d0cde9
     # sed script to change all lines between /RE1/ and /RE2/,
d0cde9
     # without matching /RE1/ or /RE2/
d0cde9
     /RE1/,/RE2/{
d0cde9
       /RE1/b
d0cde9
       /RE2/b
d0cde9
       s/^/>>/
d0cde9
     }
d0cde9
     #---end of script---
d0cde9

d0cde9
   Contents of input.fil:         Output of sed script:
d0cde9
      aaa                           aaa
d0cde9
      bbb                           bbb
d0cde9
      RE1                           RE1
d0cde9
      aaa                           >>aaa
d0cde9
      bbb                           >>bbb
d0cde9
      ccc                           >>ccc
d0cde9
      RE2                           RE2
d0cde9
      end                           end
d0cde9

d0cde9
4.25. How do I join two lines if line #1 ends in a [certain string]?
d0cde9

d0cde9
   This question appears in the section on one-line sed scripts, but
d0cde9
   it comes up so many times that it needs a place here also. Suppose
d0cde9
   a line ends with a particular string (often, a line ends with a
d0cde9
   backslash). How do you bring up the second line after it, even in
d0cde9
   cases where several consecutive lines all end in a backslash?
d0cde9

d0cde9
     sed -e :a -e '/\\$/N; s/\\\n//; ta' file   # all seds
d0cde9
     sed ':a; /\\$/N; s/\\\n//; ta' file        # GNU sed, ssed, HHsed
d0cde9

d0cde9
   Note that this replaces the backslash-newline with nothing. You may
d0cde9
   want to replace the backslash-newline with a single space instead.
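
   For the single-space variant, only the RHS of the substitution
   changes. A minimal sketch on made-up input (the sample lines and
   the variable "result" are ours):

```shell
# Two continuation lines ending in a backslash, then two ordinary lines.
result=$(printf '%s\n' 'one\' 'two\' 'three' 'four' |
  sed -e :a -e '/\\$/N; s/\\\n/ /; ta')
printf '%s\n' "$result"
```

   The three continued lines come out as the single line "one two
   three", and "four" is left alone.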
d0cde9

d0cde9
4.26. How do I join two lines if line #2 begins in a [certain string]?
d0cde9

d0cde9
   The inverse situation is another FAQ. Suppose a line begins with a
d0cde9
   particular string. How do you bring that line up to follow the
d0cde9
   previous line? In this example, we want to match the string "<<="
d0cde9
   at the beginning of one line, bring that line up to the end of the
d0cde9
   line before it, and replace the string with a single space:
d0cde9

d0cde9
     sed -e :a -e '$!N;s/\n<<=/ /;ta' -e 'P;D' file   # all seds
d0cde9
     sed ':a; $!N;s/\n<<=/ /;ta;P;D' file             # GNU, ssed, sed15+
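
   A short sketch of the portable form on made-up input. The sample
   lines and the variable "result" are for illustration only.

```shell
# The second line starts with "<<=" and should be pulled up to the first.
result=$(printf '%s\n' 'first' '<<=second' 'third' |
  sed -e :a -e '$!N;s/\n<<=/ /;ta' -e 'P;D')
printf '%s\n' "$result"
```

   The newline and the "<<=" marker together become one space, giving
   "first second", while "third" stays on its own line.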
d0cde9

d0cde9
4.27. How do I change all paragraphs to long lines?
d0cde9

d0cde9
   A frequent request is how to convert DOS-style textfiles, in which
d0cde9
   each line ends with a "paragraph marker", to Microsoft-style
d0cde9
   textfiles, in which the "paragraph" marker only appears at the end
d0cde9
   of real paragraphs. Sometimes this question is framed as, "How do I
d0cde9
   remove the hard returns at the end of each line in a paragraph?"
d0cde9

d0cde9
   The problem occurs because newer word processors don't work the
d0cde9
   same way older text editors did. Older text editors used a newline
d0cde9
   (CR/LF in DOS; LF alone in Unix) to end each line on screen or on
d0cde9
   disk, and used two newlines to separate paragraphs. Certain word
d0cde9
   processors wanted to make paragraph reformatting and reflowing work
d0cde9
   easily, so they use one newline to end a paragraph and never allow
d0cde9
   newlines _within_ a paragraph. This means that textfiles created
d0cde9
   with standard editors (Emacs, vi, Vedit, Boxer, etc.) appear to
d0cde9
   have "hard returns" at inappropriate places. The following sed
d0cde9
   script finds blocks of consecutive nonblank lines (i.e., paragraphs
d0cde9
   of text), and converts each block into one long line with one "hard
d0cde9
   return" at the end.
d0cde9

d0cde9
     # sed script to change all paragraphs to long lines
d0cde9
     /./{H; $!d;}             # Put each paragraph into hold space
d0cde9
     x;                       # Swap hold space and pattern space
d0cde9
     s/^\(\n\)\(..*\)$/\2\1/; # Move leading \n to end of PatSpace
d0cde9
     s/\n\(.\)/ \1/g;         # Replace all other \n with 1 space
d0cde9
     # Uncomment the following line to remove excess blank lines:
d0cde9
     # /./!d;
d0cde9
     #---end of sed script---
d0cde9

d0cde9
   If the input files have formatting or indentation that conveys
d0cde9
   special meaning (like program source code), this script will remove
d0cde9
   it. But if the text still needs to be extended, try 'par'
d0cde9
   (paragraph reformatter) or the 'fmt' utility with the -t or -c
d0cde9
   switches and the width option (-w) set to a number like 9999.
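
   Here is the paragraph-joining script as a command line, applied
   to two made-up paragraphs. This is a sketch: the commands are the
   same as in the script above, split across -e options, and the
   variable "result" is ours.

```shell
# Two 2-line paragraphs separated by one blank line.
result=$(printf '%s\n' 'line a' 'line b' '' 'line c' 'line d' |
  sed -e '/./{H;$!d;}' -e 'x' \
      -e 's/^\(\n\)\(..*\)$/\2\1/' -e 's/\n\(.\)/ \1/g')
printf '%s\n' "$result"
```

   Each paragraph becomes one long line followed by a blank line.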
d0cde9

d0cde9
   SHELL AND ENVIRONMENT
d0cde9

d0cde9
4.30. How do I read environment variables with sed?
d0cde9

d0cde9
4.30.1. - on Unix platforms
d0cde9

d0cde9
   In the Unix shell, variables are referenced with a dollar sign, such as
d0cde9
   $TERM, $PATH, $var or $i. In sed, the dollar sign is used to
d0cde9
   indicate the last line of the input file, the end of a line (in the
d0cde9
   LHS), or a literal symbol (in the RHS). Sed cannot access variables
d0cde9
   directly, so one must pay attention to shell quoting requirements
d0cde9
   to expand the variables properly.
d0cde9

d0cde9
   To ALLOW the Unix shell to interpret the dollar sign, put the
d0cde9
   script in double quotes:
d0cde9

d0cde9
     sed "s/_terminal-type_/$TERM/g" input.file >output.file
d0cde9

d0cde9
   To PREVENT the Unix shell from interpreting the dollar sign as a
d0cde9
   shell variable, put the script in single quotes:
d0cde9

d0cde9
     sed 's/.$//' infile >outfile
d0cde9

d0cde9
   To use BOTH Unix $environment_vars and sed /end-of-line$/ pattern
d0cde9
   matching, there are two solutions. (1) The easiest is to enclose
d0cde9
   the script in "double quotes" so the shell can see the $variables,
d0cde9
   and to prefix the sed metacharacter ($) with a backslash. Thus, in
d0cde9

d0cde9
     sed "s/$user\$/root/" file
d0cde9

d0cde9
   the shell interpolates $user and sed interprets \$ as the symbol
d0cde9
   for end-of-line.
d0cde9

d0cde9
   (2) Another method--somewhat less readable--is to concatenate the
d0cde9
   script with 'single quotes' where the $ should not be interpolated
d0cde9
   and "double quotes" where variable interpolation should occur. To
d0cde9
   demonstrate using the preceding script:
d0cde9

d0cde9
     sed "s/$user"'$/root/' file
d0cde9

d0cde9
   Solution #1 seems easier to remember. In either case, we search for
d0cde9
   the user's name (stored in a variable called $user) when it occurs
d0cde9
   at the end of the line ($), and substitute the word "root" in all
d0cde9
   matches.
d0cde9

d0cde9
   For longer shell scripts, it is sometimes useful to begin with
d0cde9
   single quote marks ('), close them upon encountering the variable,
d0cde9
   enclose the variable name in double quotes ("), and resume with
d0cde9
   single quotes, closing them at the end of the sed script.  Example:
d0cde9

d0cde9
     #! /bin/sh
d0cde9
     # sed script to illustrate 'quote'"matching"'usage'
d0cde9
     FROM='abcdefgh'
d0cde9
     TO='ABCDEFGH'
d0cde9
     sed -e '
d0cde9
     y/'"$FROM"'/'"$TO"'/;    # note the quote pairing
d0cde9
     # some more commands go here . . .
d0cde9
     # last line is a single quote mark
d0cde9
     '
d0cde9

d0cde9
   Thus, each character in $FROM is replaced by its $TO counterpart, and the single
d0cde9
   quotes are used to glue the multiple lines together in the script.
d0cde9
   (See also section 4.32, "How do I handle Unix shell quoting in sed?")
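
   The two ways of combining a shell variable with sed's end-of-line
   "$" can be compared side by side. In this sketch the value of
   "user", the sample input, and the result variables "r1" and "r2"
   are all made up for the demonstration.

```shell
user=alice                       # hypothetical value
input='login alice
alice login'
# Solution 1: double quotes, with sed's $ protected by a backslash.
r1=$(printf '%s\n' "$input" | sed "s/$user\$/root/")
# Solution 2: double quotes around the variable, single quotes for the rest.
r2=$(printf '%s\n' "$input" | sed "s/$user"'$/root/')
printf '%s\n' "$r1" "$r2"
```

   Both commands hand sed the same script, s/alice$/root/, so only
   the "alice" at the end of the first line is replaced.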
d0cde9

d0cde9
4.30.2. - on MS-DOS and 4DOS platforms
d0cde9

d0cde9
   Under 4DOS and MS-DOS version 7.0 (Win95) or 7.10 (Win95 OSR2),
d0cde9
   environment variables can be accessed from the command prompt.
d0cde9
   Under MS-DOS v6.22 and below, environment variables can only be
d0cde9
   accessed from within batch files. Environment variables should be
d0cde9
   enclosed between percent signs and are case-insensitive; i.e.,
d0cde9
   %USER% or %user% will display the USER variable. To generate a true
d0cde9
   percent sign, just enter it twice.
d0cde9

d0cde9
   DOS versions of sed require that sed scripts be enclosed by double
d0cde9
   quote marks "..." (not single quotes!) if the script contains
d0cde9
   embedded tabs, spaces, redirection arrows or the vertical bar. In
d0cde9
   fact, if the input for sed comes from piping, a sed script should
d0cde9
   not contain a vertical bar, even if it is protected by double
d0cde9
   quotes (this seems to be a bug in the normal MS-DOS syntax). Thus,
d0cde9

d0cde9
       echo blurk | sed "s/^/ |foo /"     # will cause an error
d0cde9
       sed "s/^/ |foo /" blurk.txt        # will work as expected
d0cde9

d0cde9
   Using DOS environment variables which contain DOS path statements
d0cde9
   (such as a TMP variable set to "C:\TEMP") within sed scripts is
d0cde9
   discouraged because sed will interpret the backslash '\' as a
d0cde9
   metacharacter to "quote" the next character, not as a normal
d0cde9
   symbol. Thus,
d0cde9

d0cde9
       sed "s/^/%TMP% /" somefile.txt
d0cde9

d0cde9
   will not prefix each line with (say) "C:\TEMP ", but will prefix
d0cde9
   each line with "C:TEMP "; sed will discard the backslash, which is
d0cde9
   probably not what you want. Other variables such as %PATH% and
d0cde9
   %COMSPEC% will also lose the backslash within sed scripts.
d0cde9

d0cde9
   Environment variables which do not use backslashes are usually
d0cde9
   workable. Thus, all the following should work without difficulty,
d0cde9
   if they are invoked from within DOS batch files:
d0cde9

d0cde9
       sed "s/=username=/%USER%/g" somefile.txt
d0cde9
       echo %FILENAME% | sed "s/\.TXT/.BAK/"
d0cde9
       grep -Ei "%string%" somefile.txt | sed "s/^/  /"
d0cde9

d0cde9
   while from either the DOS prompt or from within a batch file,
d0cde9

d0cde9
       sed "s/%%/ percent/g" input.fil >output.fil
d0cde9

d0cde9
   will replace each percent symbol in a file with " percent" (adding
d0cde9
   the leading space for readability).
d0cde9

d0cde9
4.31. How do I export or pass variables back into the environment?
d0cde9

d0cde9
4.31.1. - on Unix platforms
d0cde9

d0cde9
   Suppose that line #1, word #2 of the file 'terminals' contains a
d0cde9
   value to be put in your TERM environment variable. Sed cannot
d0cde9
   export variables directly to the shell, but it can pass strings to
d0cde9
   shell commands. To set a variable in the Bourne shell:
d0cde9

d0cde9
       TERM=`sed 's/^[^ ][^ ]* \([^ ][^ ]*\).*/\1/;q' terminals`;
d0cde9
       export TERM
d0cde9

d0cde9
   If the second word were "Wyse50", this would send the shell command
d0cde9
   "TERM=Wyse50".
d0cde9

d0cde9
4.31.2. - on MS-DOS or 4DOS platforms
d0cde9

d0cde9
   Sed cannot directly manipulate the environment. Under DOS, only
d0cde9
   batch files (.BAT) can do this, using the SET instruction, since
d0cde9
   they are run directly by the command shell. Under 4DOS, special
d0cde9
   4DOS commands (such as ESET) can also alter the environment.
d0cde9

d0cde9
   Under DOS or 4DOS, sed can select a word and pass it to the SET
d0cde9
   command. Suppose you want the 1st word of the 2nd line of MY.DAT
d0cde9
   put into an environment variable named %PHONE%. You might do this:
d0cde9

d0cde9
       @echo off
d0cde9
       sed -n "2 s/^\([^ ][^ ]*\) .*/SET PHONE=\1/p;3q" MY.DAT > GO_.BAT
d0cde9
       call GO_.BAT
d0cde9
       echo The environment variable for PHONE is %PHONE%
d0cde9
       :: cleanup
d0cde9
       del GO_.BAT
d0cde9

d0cde9
   The sed script assumes that the first character on the 2nd line is
d0cde9
   not a space and uses grouping \(...\) to save the first string of
d0cde9
   non-space characters as \1 for the RHS. In writing any batch files,
d0cde9
   make sure that output filenames such as GO_.BAT don't overwrite
d0cde9
   preexisting files of the same name.
d0cde9

d0cde9
4.32. How do I handle Unix shell quoting in sed?
d0cde9

d0cde9
   To embed a literal single quote (') in a script, use (a) or (b):
d0cde9

d0cde9
   (a) If possible, put the script in double quotes:
d0cde9

d0cde9
     sed "s/cannot/can't/g" file
d0cde9

d0cde9
   (b) If the script must use single quotes, then close-single-quote
d0cde9
   the script just before the SPECIAL single quote, prefix the single
d0cde9
   quote with a backslash, and use a 2nd pair of single quotes to
d0cde9
   finish marking the script. Thus:
d0cde9

d0cde9
     sed 's/cannot$/can'\''t/g' file
d0cde9

d0cde9
   Though this looks hard to read, it breaks down to 3 parts:
d0cde9

d0cde9
      's/cannot$/can'   \'   't/g'
d0cde9
      ---------------   --   -----
d0cde9

d0cde9
   To embed a literal double quote (") in a script, use (a) or (b):
d0cde9

d0cde9
   (a) If possible, put the script in single quotes. You don't need to
d0cde9
   prefix the double quotes with anything. Thus:
d0cde9

d0cde9
     sed 's/14"/fourteen inches/g' file
d0cde9

d0cde9
   (b) If the script must use double quotes, then prefix the SPECIAL
d0cde9
   double quote with a backslash (\). Thus,
d0cde9

d0cde9
     sed "s/$length\"/$length inches/g" file
d0cde9

d0cde9
   To embed a literal backslash (\) into a script, enter it twice:
d0cde9

d0cde9
     sed 's/C:\\DOS/D:\\DOS/g' config.sys
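
   The quoting forms above can be checked from a Unix shell with a
   few made-up inputs. This is a sketch; the sample strings and the
   variables "r1" through "r4" are ours.

```shell
# Single quote inside a double-quoted script.
r1=$(printf '%s\n' 'I cannot go' | sed "s/cannot/can't/g")
# Single quote inside a single-quoted script: close, \', reopen.
r2=$(printf '%s\n' 'I cannot go' | sed 's/cannot/can'\''t/g')
# Double quote inside a single-quoted script needs no prefix.
r3=$(printf '%s\n' 'a 14" monitor' | sed 's/14"/fourteen inch/g')
# A literal backslash is entered twice.
r4=$(printf '%s\n' 'C:\DOS\EDIT.COM' | sed 's/C:\\DOS/D:\\DOS/g')
printf '%s\n' "$r1" "$r2" "$r3" "$r4"
```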
d0cde9

d0cde9
   FILES, DIRECTORIES, AND PATHS
d0cde9

d0cde9
4.40. How do I read (insert/add) a file at the top of a textfile?
d0cde9

d0cde9
   Normally, adding a "header" file to the top of a "body" file is
d0cde9
   done from the command prompt before passing the file on to sed.
d0cde9
   (MS-DOS below version 6.0 must use COPY and DEL instead of MOVE in
d0cde9
   the following example.)
d0cde9

d0cde9
       copy header.txt+body temp                  # MS-DOS command 1
d0cde9
       echo Y | move temp body                    # MS-DOS command 2
d0cde9
                                                    #
d0cde9
       cat header.txt body >temp; mv temp body    # Unix commands
d0cde9

d0cde9
   However, if inserting the file must occur within sed, there is a
d0cde9
   way. The sed command "1 r header.txt" will not work; it will print
d0cde9
   line 1 and then insert "header.txt" between lines 1 and 2. The
d0cde9
   following script solves this problem; however, there must be at
d0cde9
   least 2 lines in the target file for the script to work properly.
d0cde9

d0cde9
     # sed script to insert "header.txt" above the first line
d0cde9
     1{h; r header.txt
d0cde9
       D; }
d0cde9
     2{x; G; }
d0cde9
     #---end of sed script---
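
     The header script can be tried in a scratch directory as
     sketched below. The use of mktemp, the file contents, and the
     variable names are assumptions for the demonstration; the sed
     commands are the ones above, split across -e options so that
     the 'r' filename ends at the end of its own fragment.

```shell
dir=$(mktemp -d)                         # scratch directory
printf '%s\n' 'HEADER' > "$dir/header.txt"
printf '%s\n' 'body 1' 'body 2' 'body 3' > "$dir/body"
# Line 1 is saved and suppressed, the header is queued for output,
# and line 2 brings line 1 back in front of itself.
result=$(cd "$dir" && sed -e '1{h; r header.txt' -e 'D; }' -e '2{x; G; }' body)
rm -rf "$dir"
printf '%s\n' "$result"
```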
d0cde9

d0cde9
4.41. How do I make substitutions in every file in a directory, or in
d0cde9
      a complete directory tree?
d0cde9

d0cde9
4.41.1. - ssed and Perl solution
d0cde9

d0cde9
   The best solution for multiple files in a single directory is to
d0cde9
   use ssed or gsed v4.0 or higher:
d0cde9

d0cde9
     sed -i.BAK 's|foo|bar|g' files       # -i does in-place replacement
d0cde9

d0cde9
   If you don't have ssed, there is a similar solution in Perl. (Yes,
d0cde9
   we know this is a FAQ file for sed, not perl, but perl is more
d0cde9
   common than ssed for many users.)
d0cde9

d0cde9
     perl -pi.bak -e 's|foo|bar|g' files                # or
d0cde9
     perl -pi.bak -e 's|foo|bar|g' `find /pathname -name "filespec"`
d0cde9

d0cde9
   For each file in the filelist, sed (or Perl) renames the source
d0cde9
   file to "filename.bak"; the modified file gets the original
d0cde9
   filename. Remove '.bak' if you don't need backup copies. (Note the
d0cde9
   use of "s|||" instead of "s///" here, and in the scripts below. The
d0cde9
   vertical bars in the 's' command let you replace '/some/path' with
d0cde9
   '/another/path', accommodating slashes in the LHS and RHS.)
d0cde9

d0cde9
   To recurse directories in Unix or GNU/Linux:
d0cde9

d0cde9
     # We use xargs to prevent passing too many filenames to sed, but
d0cde9
     # this command will fail if filenames contain spaces or newlines.
d0cde9
     find /my/path -name '*.ht' -print | xargs sed -i.BAK 's|foo|bar|g'
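
   One way around the whitespace limitation is to let find run sed
   itself. The sketch below assumes a find that supports the POSIX
   "-exec ... {} +" form and a sed with the -i switch (ssed or gsed
   4.0 and higher, as above); the scratch directory and file names
   are made up.

```shell
dir=$(mktemp -d)                               # scratch tree
mkdir "$dir/sub dir"
printf '%s\n' 'foo here' > "$dir/my page.ht"
printf '%s\n' 'more foo' > "$dir/sub dir/other.ht"
# find passes each pathname to sed as one argument, spaces and all.
find "$dir" -type f -name '*.ht' -exec sed -i.BAK 's|foo|bar|g' {} +
r1=$(cat "$dir/my page.ht")
r2=$(cat "$dir/sub dir/other.ht")
b1=$(cat "$dir/my page.ht.BAK")
rm -rf "$dir"
printf '%s\n' "$r1" "$r2" "$b1"
```

   Like xargs, the "+" form batches many filenames into each sed
   invocation, but no filename ever passes through a pipe.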
d0cde9

d0cde9
   To recurse directories under Windows 2000 (CMD.EXE or COMMAND.COM):
d0cde9

d0cde9
     # This syntax isn't supported under Windows 9x COMMAND.COM
d0cde9
     for /R c:\my\path %f in (*.htm) do sed -i.BAK "s|foo|bar|g" %f
d0cde9

d0cde9
4.41.2. - Unix solution
d0cde9

d0cde9
   For all files in a single directory, assuming they end with *.txt
d0cde9
   and you have no files named "[anything].txt.bak" already, use a
d0cde9
   shell script:
d0cde9

d0cde9
     #! /bin/sh
d0cde9
     # Source files are saved as "filename.txt.bak" in case of error
d0cde9
     # The '&&' after cp is an additional safety feature
d0cde9
     for file in *.txt
d0cde9
     do
d0cde9
        cp "$file" "$file.bak" &&
        sed 's|foo|bar|g' "$file.bak" >"$file"
d0cde9
     done
d0cde9

d0cde9
   To do an entire directory tree, use the Unix utility find, like so
d0cde9
   (thanks to Jim Dennis <jadestar@rahul.net> for this script):
d0cde9

d0cde9
     #! /bin/sh
d0cde9
     # filename: replaceall
d0cde9
     # Backup files are NOT saved in this script.
d0cde9
     find . -type f -name '*.txt' -print | while IFS= read -r i
d0cde9
     do
d0cde9
        sed 's|foo|bar|g' "$i" > "$i.tmp" && mv "$i.tmp" "$i"
d0cde9
     done
d0cde9

d0cde9
   This previous shell script recurses through the directory tree,
d0cde9
   finding only files in the directory (not symbolic links, which will
d0cde9
   be encountered by the shell command "for file in *.txt", above). To
d0cde9
   preserve file permissions and make backup copies, use the 2-line cp
d0cde9
   routine of the earlier script instead of "sed ... && mv ...". By
d0cde9
   replacing the sed command 's|foo|bar|g' with something like
d0cde9

d0cde9
     sed "s|$1|$2|g" "${i}.bak" > "$i"
d0cde9

d0cde9
   using double quotes instead of single quotes, the user can also
d0cde9
   employ positional parameters on the shell script command tail, thus
d0cde9
   reusing the script from time to time. For example,
d0cde9

d0cde9
       replaceall East West
d0cde9

d0cde9
   would modify all your *.txt files in the current directory tree.
d0cde9

d0cde9
4.41.3. - DOS solution:
d0cde9

d0cde9
   MS-DOS users should use two batch files like this:
d0cde9

d0cde9
      @echo off
d0cde9
      :: MS-DOS filename: REPLACE.BAT
d0cde9
      ::
d0cde9
      :: Create a destination directory to put the new files.
d0cde9
      :: Note: The next command will fail under Novell NetWare
d0cde9
      :: below version 4.10 unless "SHOW DOTS=ON" is active.
d0cde9
      if not exist .\NEWFILES\NUL mkdir NEWFILES
d0cde9
      for %%f in (*.txt) do CALL REPL_2.BAT %%f
d0cde9
      echo Done!!
d0cde9
      :: ---End of first batch file---
d0cde9

d0cde9
      @echo off
d0cde9
      :: MS-DOS filename: REPL_2.BAT
d0cde9
      ::
d0cde9
      sed "s/foo/bar/g" %1 > NEWFILES\%1
d0cde9
      :: ---End of the second batch file---
d0cde9

d0cde9
   When finished, the current directory contains all the original
d0cde9
   files, and the newly-created NEWFILES subdirectory contains the
d0cde9
   modified *.TXT files. Do not attempt a command like
d0cde9

d0cde9
       for %%f in (*.txt) do sed "s/foo/bar/g" %%f >NEWFILES\%%f
d0cde9

d0cde9
   under any version of MS-DOS because the output filename will be
d0cde9
   created as a literal '%f' in the NEWFILES directory before the
d0cde9
   %%f is expanded to become each filename in (*.txt). This occurs
d0cde9
   because MS-DOS creates output filenames via redirection commands
d0cde9
   before it expands "for..in..do" variables.
d0cde9

d0cde9
   To recurse through an entire directory tree in MS-DOS requires a
d0cde9
   batch file more complex than we have room to describe. Examine the
d0cde9
   file SWEEP.BAT in Timo Salmi's great archive of batch tricks,
d0cde9
   located at <ftp://garbo.uwasa.fi/pc/link/tsbat.zip> (this file is
d0cde9
   regularly updated). Another alternative is to get an external
d0cde9
   program designed for directory recursion. Here are some recommended
d0cde9
   programs for directory recursion. The first one, FORALL, runs under
d0cde9
   either OS/2 or DOS. Unfortunately, none of these supports Win9x
d0cde9
   long filenames.
d0cde9

d0cde9
       http://hobbes.nmsu.edu/pub/os2/util/disk/forall72.zip
d0cde9
       ftp://garbo.uwasa.fi/pc/filefind/target15.zip
d0cde9

d0cde9
4.42. How do I replace "/some/UNIX/path" in a substitution?
d0cde9

d0cde9
   Technically, the normal meaning of the slash can be disabled by
d0cde9
   prefixing it with a backslash. Thus,
d0cde9

d0cde9
     sed 's/\/some\/UNIX\/path/\/a\/new\/path/g' files
d0cde9

d0cde9
   But this is hard to read and write. There is a better solution.
d0cde9
   The s/// substitution command allows '/' to be replaced by any
d0cde9
   other character (including spaces or alphanumerics). Thus,
d0cde9

d0cde9
     sed 's|/some/UNIX/path|/a/new/path|g' files
d0cde9

d0cde9
   and if you are using variable names in a Unix shell script,
d0cde9

d0cde9
     sed "s|$OLDPATH|$NEWPATH|g" oldfile >newfile
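
   A sketch of the variable form with made-up values. It assumes
   that neither variable contains the "|" delimiter or regex
   metacharacters such as "." or "*"; otherwise they would have to
   be escaped first.

```shell
OLDPATH=/usr/local/bin           # hypothetical values
NEWPATH=/opt/tools/bin
result=$(printf '%s\n' 'PATH=/usr/local/bin:/bin' |
  sed "s|$OLDPATH|$NEWPATH|g")
printf '%s\n' "$result"
```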
d0cde9

d0cde9
4.43. How do I replace "C:\SOME\DOS\PATH" in a substitution?
d0cde9

d0cde9
   For MS-DOS users, every backslash must be doubled. Thus, to replace
d0cde9
   "C:\SOME\DOS\PATH" with "D:\MY\NEW\PATH":
d0cde9

d0cde9
     sed "s|C:\\SOME\\DOS\\PATH|D:\\MY\\NEW\\PATH|g" infile >outfile
d0cde9

d0cde9
   Remember that DOS pathnames are not case sensitive and can appear
d0cde9
   in upper or lower case in the input file. If this concerns you, use
d0cde9
   a version of sed which can ignore case when matching (gsed, ssed,
d0cde9
   sedmod, sed16).
d0cde9

d0cde9
       @echo off
d0cde9
       :: sample MS-DOS batch file to alter path statements
d0cde9
       :: requires GNU sed with the /i flag for s///
d0cde9
       set old=C:\\SOME\\DOS\\PATH
d0cde9
       set new=D:\\MY\\NEW\\PATH
d0cde9
       gsed "s|%old%|%new%|gi" infile >outfile
d0cde9
       :: or
d0cde9
       ::     sedmod -i "s|%old%|%new%|g" infile >outfile
d0cde9
       set old=
d0cde9
       set new=
d0cde9

d0cde9
   Also, remember that under Windows long filenames may be stored in
d0cde9
   two formats: e.g., as "C:\Program Files" or as "C:\PROGRA~1".
d0cde9

d0cde9
4.44.  How do I emulate file-includes, using sed?
d0cde9

d0cde9
   Given an input file with file-include statements, similar to
d0cde9
   C-style includes or "server-side includes" (SSI) of this format:
d0cde9

d0cde9
       This is the source file. It's short.
d0cde9
       Its name is simply 'source'. See the script below.
       <!--#include file="ashe.inc"-->
       And this is any amount of text between
       <!--#include file="jesse.inc"-->
d0cde9
       This is the last line of the file.
d0cde9

d0cde9
   How do we direct sed to import/insert whichever files are at the
d0cde9
   point of the 'file="filename"' token? First, use this file:
d0cde9

d0cde9
     #n
d0cde9
     # filename: incl.sed
d0cde9
     # Comments supported by GNU sed or ssed. Leading '#n' should
d0cde9
     # be on line 1, columns 1-2 of the line.
d0cde9
     /<!--#include file="/{
d0cde9
       =;                     #   print the line number
d0cde9
       s/^[^"]*"/{r /;        #   change pattern to '{r '
d0cde9
       s/".*//p;              #   delete rest to EOL, print
d0cde9
                              #   and a(ppend) a delete command
d0cde9
       a\
d0cde9
       d;}
d0cde9
     }
d0cde9
     #---end of sed script---
d0cde9

d0cde9
   Second, use the following shell script or DOS batch file (if
d0cde9
   running a DOS batch file, use "double quotes" instead of 'single
d0cde9
   quotes', and use "del" instead of "rm" to remove the temp file):
d0cde9

d0cde9
     sed -nf incl.sed source | sed 'N;N;s/\n//' >temp.sed
d0cde9
     sed -f temp.sed source >target
d0cde9
     rm temp.sed
d0cde9

d0cde9
   If you have GNU sed or ssed, you can reduce the script even further
d0cde9
   (thanks to Michael Carmack for the reminder):
d0cde9

d0cde9
     sed -nf incl.sed source | sed 'N;N;s/\n//' | sed -f - source >target
d0cde9

d0cde9
   In brief, the script replaces each filename with a 'r filename'
d0cde9
   command to insert the file at that point, while omitting the
d0cde9
   extraneous material. Two important things to note with this script:
d0cde9
   (1) There should be only one '#include file' directive per line, and
d0cde9
   (2) each '#include file' directive must be the *only* thing on that
d0cde9
   line, because everything else on the line will be deleted.
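
   To make the two passes concrete, here is a minimal sketch that
   builds a small source file and runs the whole pipeline in a
   scratch directory. The file names 'a.inc' and 'b.inc', their
   contents, and the comment-free copy of the generator script are
   assumptions made for this illustration.

```shell
# Work in a scratch directory so the 'r' commands find the include files.
workdir=$(mktemp -d)
oldpwd=$PWD
cd "$workdir" || exit 1

# Hypothetical sample files (names invented for this sketch).
printf '%s\n' 'first line' '<!--#include file="a.inc"-->' \
              'middle line' '<!--#include file="b.inc"-->' \
              'last line' >source
echo 'contents of A' >a.inc
echo 'contents of B' >b.inc

# Comment-free copy of the generator script.
cat >incl.sed <<'EOF'
/<!--#include file="/{
=
s/^[^"]*"/{r /
s/".*//p
a\
d;}
}
EOF

# Pass 1 emits three lines per directive: the line number, '{r file',
# and 'd;}'. The second sed joins the number onto the '{r file' line.
sed -n -f incl.sed source | sed 'N;N;s/\n//' >temp.sed
generated=$(cat temp.sed)

# Pass 2 runs the generated script against the same source.
target=$(sed -f temp.sed source)

cd "$oldpwd" && rm -rf "$workdir"
printf '%s\n' "$generated" '---' "$target"
```

   The generated script holds one numbered block per directive (here
   '2{r a.inc' and '4{r b.inc', each followed by 'd;}'), so the second
   pass replaces exactly those lines with the contents of the files.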
d0cde9

d0cde9
   Though the script uses GNU sed or ssed because of the great support
d0cde9
   for embedded script comments, it should run on any version of sed.
d0cde9
   If not, write me and let me know.
d0cde9

d0cde9
------------------------------
d0cde9

d0cde9
5. WHY ISN'T THIS WORKING?
d0cde9

d0cde9
5.1. Why don't my variables like $var get expanded in my sed script?
d0cde9

d0cde9
   Because your sed script uses 'single quotes' instead of "double
d0cde9
   quotes." Unix shells never expand $variables in single quotes.
d0cde9

d0cde9
   This is probably the most frequently-asked sed question. For more
d0cde9
   info on using variables, see section 4.30.
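
   The difference is easy to see side by side. In this small sketch
   the variable name and sample text are made up for illustration:

```shell
var=world

# Single quotes: the shell passes the literal text $var to sed,
# so sed inserts those four characters unchanged.
single=$(echo 'hello there' | sed 's/hello/$var/')

# Double quotes: the shell expands $var before sed sees the script.
double=$(echo 'hello there' | sed "s/hello/$var/")

printf '%s\n' "$single" "$double"
```

   The first command prints '$var there'; the second prints
   'world there'.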
d0cde9

d0cde9
5.2. I'm using 'p' to print, but I have duplicate lines sometimes.
d0cde9

d0cde9
   Sed prints the entire file by default, so the 'p' command might
d0cde9
   cause the duplicate lines. If you want the whole file printed,
d0cde9
   try removing the 'p' from commands like 's/foo/bar/p'. If you want
d0cde9
   part of the file printed, run your sed script with the -n flag to
d0cde9
   suppress normal output, and rewrite the script to get all output
d0cde9
   from the 'p' command.
d0cde9

d0cde9
   If you're still getting duplicate lines, you are probably finding
d0cde9
   several matches for the same line. Suppose you want to print lines
d0cde9
   with the words "Peter" or "James" or "John", but not the same line
d0cde9
   twice. The following command will fail:
d0cde9

d0cde9
     sed -n '/Peter/p; /James/p; /John/p' files
d0cde9

d0cde9
   Since all 3 commands of the script are executed for each line,
d0cde9
   you'll get extra lines. A better way is to use the 'd' (delete) or
d0cde9
   'b' (branch) commands, like so (with GNU sed):
d0cde9

d0cde9
     sed '/Peter/b; /James/b; /John/b; d' files          # one way
d0cde9
     sed -n '/Peter/{p;d;};/James/{p;d;};/John/p' files  # a 2nd way
d0cde9
     sed -n '/Peter/{p;b;};/James/{p;b;};/John/p' files  # a 3rd way
d0cde9
     sed '/Peter\|James\|John/!d' files                  # shortest way
d0cde9

d0cde9
   On standard seds, these must be broken down with -e commands:
d0cde9

d0cde9
     sed -e '/Peter/b' -e '/James/b' -e '/John/b' -e d files
d0cde9
     sed -n -e '/Peter/{p;d;}' -e '/James/{p;d;}' -e '/John/p' files
d0cde9

d0cde9
   The 3rd line would require too many -e commands to fit on one line,
d0cde9
   since standard versions of sed require an -e command after each 'b'
d0cde9
   and also after each closing brace '}'.
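
   The effect shows up as soon as one line matches two of the
   patterns. The following sketch uses made-up sample text and the
   portable -e form of the commands:

```shell
# Sample input: the second line names two of the three people.
input=$(printf '%s\n' 'Peter went out' 'James met John' 'nobody here')

# Every matching command prints, so 'James met John' appears twice.
dup=$(printf '%s\n' "$input" | sed -n -e '/Peter/p' -e '/James/p' -e '/John/p')

# Branching to the end of the script on the first match prints the
# line once (via the default output); 'd' drops everything else.
once=$(printf '%s\n' "$input" | sed -e '/Peter/b' -e '/James/b' -e '/John/b' -e d)

printf '%s\n' "$dup" '---' "$once"
```

   The first command yields three lines of output from two matching
   input lines; the second yields each matching line exactly once.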
d0cde9

d0cde9
5.3. Why does my DOS version of sed process a file part-way through
d0cde9
     and then quit?
d0cde9

d0cde9
   First, look for errors in the script. Have you used the -n switch
d0cde9
   without telling sed to print anything to the console? Have you read
d0cde9
   the docs to your version of sed to see if it has a syntax you may
d0cde9
   have misused? (Look for an N or H command that gathers too much.)
d0cde9

d0cde9
   Next, if you are sure your sed script is valid, a probable cause is
d0cde9
   an end-of-file marker embedded in the file. An EOF marker (SUB) is
d0cde9
   a Control-Z character, with the value of 1A hex (26 decimal). As
d0cde9
   soon as any DOS version of sed encounters a Ctrl-Z character, sed
d0cde9
   stops processing.
d0cde9

d0cde9
   To locate the EOF character, use Vern Buerg's shareware file viewer
d0cde9
   LIST.COM <http://www.buerg.com/list.html>. In text mode, look for a
d0cde9
   right-arrow symbol; in hex mode (Alt-H), look for a 1A code. With
d0cde9
   Unix utilities ported to DOS, use 'od' (octal dump) to display
d0cde9
   hexcodes in your file, and then use sed to locate the offending
d0cde9
   character:
d0cde9

d0cde9
       od -txC badfile.txt | sed -n "/ 1a /p; / 1a$/p"
d0cde9

d0cde9
   Then edit the input file to remove the offending character(s).
d0cde9

d0cde9
   If you would rather NOT edit the input file, there is still a fix.
d0cde9
   It requires the DJGPP 32-bit port of 'tr', the Unix translate
d0cde9
   program (v1.22 or higher). GNU od and tr are currently at v2.0 (for
d0cde9
   DOS); they are packaged with the GNU text utilities, available at
d0cde9

d0cde9
       ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/txt20b.zip
d0cde9
       http://www.simtel.net/gnudlpage.php?product=/gnu/djgpp/v2gnu/txt20b.zip&name=txt20b.zip
d0cde9

d0cde9
   It is important to get the DJGPP version of 'tr' because other
d0cde9
   versions ported to DOS will stop processing when they encounter the
d0cde9
   EOF character. Use the -d (delete) command:
d0cde9

d0cde9
       tr -d \32 < badfile.txt | sed -f myscript.sed
d0cde9

d0cde9
5.4. My RE isn't matching/deleting what I want it to. (Or, "Greedy vs.
d0cde9
     stingy pattern matching")
d0cde9

d0cde9
   The two most common causes for this problem are: (1) misusing the
d0cde9
   '.' metacharacter, and (2) misusing the '*' metacharacter. The RE
d0cde9
   '.*' is designed to be "greedy" (i.e., matching as many characters
d0cde9
   as possible). However, sometimes users need an expression which is
d0cde9
   "stingy," matching the shortest possible string.
d0cde9

d0cde9
   (1) On single-line patterns, the '.' metacharacter matches any
d0cde9
   single character on the line. ('.' cannot match the newline at the
d0cde9
   end of the line because the newline is removed when the line is put
d0cde9
   into the pattern space; sed adds a newline automatically when the
d0cde9
   pattern space is printed.) On multi-line patterns obtained with the
d0cde9
   'N' or 'G' commands, '.' _will_ match a newline in the middle of the
d0cde9
   pattern space. If there are 3 lines in the pattern space, "s/.*//"
d0cde9
   will delete all 3 lines, not just the first one (leaving 1 blank
d0cde9
   line, since the trailing newline is added to the output).
d0cde9

d0cde9
   Normal misuse of '.' occurs in trying to match a word or bounded
d0cde9
   field, and forgetting that '.' will also cross the field limits.
d0cde9
   Suppose you want to delete the first word in braces:
d0cde9

d0cde9
       echo {one} {two} {three} | sed 's/{.*}/{}/'       # fails
d0cde9
       echo {one} {two} {three} | sed 's/{[^}]*}/{}/'    # succeeds
d0cde9

d0cde9
   's/{.*}/{}/' is not the solution, since the regex '.' will match
d0cde9
   any character, including the close braces. Replace the '.' with
d0cde9
   '[^}]', which signifies a negated character set '[^...]' containing
d0cde9
   anything other than a right brace. FWIW, we know that 's/{one}/{}/'
d0cde9
   would also solve our question, but we're trying to illustrate the
d0cde9
   use of the negated character set: [^anything-but-this].
d0cde9

d0cde9
   A negated character set should be used for matching words between
d0cde9
   quote marks, for fields separated by commas, and so on. See also
d0cde9
   section 4.12 ("How do I parse a comma-delimited data file?").
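
   The same idea applied to quote marks and comma-separated fields
   might look like the following sketch (the sample strings are made
   up for illustration):

```shell
# Empty the first double-quoted string only.
quoted=$(echo 'say "one" and "two"' | sed 's/"[^"]*"/""/')

# Replace the first comma-separated field only.
field=$(echo 'alpha,beta,gamma' | sed 's/^[^,]*/FIRST/')

# For contrast, the greedy form runs from the first quote to the last.
greedy=$(echo 'say "one" and "two"' | sed 's/".*"/""/')

printf '%s\n' "$quoted" "$field" "$greedy"
```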
d0cde9

d0cde9
   (2) The '*' metacharacter represents zero or more instances of the
d0cde9
   previous expression. The '*' metacharacter looks for the leftmost
d0cde9
   possible match first and will match zero characters. Thus,
d0cde9

d0cde9
       echo foo | sed 's/o*/EEE/'
d0cde9

d0cde9
   will generate 'EEEfoo', not 'fEEE' as one might expect. This is
d0cde9
   because /o*/ matches the null string at the beginning of the word.
d0cde9

d0cde9
   After finding the leftmost possible match, the '*' is GREEDY; it
d0cde9
   always tries to match the longest possible string. When two or
d0cde9
   three instances of '.*' occur in the same RE, the leftmost instance
d0cde9
   will grab the most characters. Consider this example, which uses
d0cde9
   grouping '\(...\)' to save patterns:
d0cde9

d0cde9
       echo bar bat bay bet bit | sed 's/^.*\(b.*\)/\1/'
d0cde9

d0cde9
   What will be displayed is 'bit', never anything longer, because the
d0cde9
   leftmost '.*' took the longest possible match. Remember this rule:
d0cde9
   "leftmost match, longest possible string, zero also matches."
d0cde9
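
   The rule can be tried out directly. This sketch repeats the two
   commands discussed in this answer and adds one variation (an
   illustration only, not the sole way to do it) that keeps the first
   word instead of the last:

```shell
words='bar bat bay bet bit'

# The leading '.*' swallows as much as it can, leaving only the last
# b-word for the group.
last=$(echo "$words" | sed 's/^.*\(b.*\)/\1/')

# Anchoring the group at the start and limiting it to non-spaces
# keeps the first word instead.
first=$(echo "$words" | sed 's/^\(b[^ ]*\).*/\1/')

# '*' matches zero characters at the leftmost position.
empty=$(echo foo | sed 's/o*/EEE/')

printf '%s\n' "$last" "$first" "$empty"
```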

d0cde9
5.5. What is CSDPMI*B.ZIP and why do I need it?
d0cde9

d0cde9
   If you use MS-DOS outside of Windows and try to use GNU sed v1.18
d0cde9
   or 3.02, you may encounter the following error message:
d0cde9

d0cde9
       no DPMI - Get csdpmi*b.zip
d0cde9

d0cde9
   "DPMI" stands for DOS Protected Mode Interface; it's basically a
d0cde9
   means of running DOS in Protected Mode (as opposed to Real Mode),
d0cde9
   which allows programs to share resources in extended memory without
d0cde9
   conflicting with one another. Running HIMEM.SYS and EMM386.EXE is
d0cde9
   not enough. The "CSDPMI*B.ZIP" refers to files written by Charles
d0cde9
   Sandmann to provide DPMI services for 32-bit computers (i.e.,
d0cde9
   386SX, 386DX, 486SX, etc.). Download the binary file (the source
d0cde9
   code is also available):
d0cde9

d0cde9
       http://www.delorie.com/djgpp/dl/ofc/simtel/v2misc/csdpmi5b.zip  # binaries
d0cde9
       http://www.delorie.com/djgpp/dl/ofc/simtel/v2misc/csdpmi5s.zip  # source
d0cde9
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2misc/csdpmi5b.zip # binaries
d0cde9
       ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2misc/csdpmi5s.zip # source
d0cde9

d0cde9
   and extract CWSDPMI.EXE, CWSDPR0.EXE and CWSPARAM.EXE from the ZIP
d0cde9
   file. Put all 3 CWS*.EXE files in the same directory as GSED.EXE
d0cde9
   and you're all set. There are DOC files enclosed, but they're
d0cde9
   nearly incomprehensible for the average computer user. (Another
d0cde9
   case of user-vicious documentation.)
d0cde9

d0cde9
   If you're running Windows and you normally use a DOS session to run
d0cde9
   GNU sed (i.e., you get to a DOS prompt with a resizable window or
d0cde9
   you press Alt-Enter to switch to full-screen mode), you don't need
d0cde9
   the CWS*.EXE files at all, since Windows uses DPMI already.
d0cde9

d0cde9
5.6. Where are the man pages for GNU sed?
d0cde9

d0cde9
   Prior to GNU sed v3.02, there weren't any. Until recently, man
d0cde9
   pages distributed with gsed were borrowed from old sources or from
d0cde9
   other compilations. None of them were "official." GNU sed v3.02 had
d0cde9
   the first real set of official man pages, and the documentation has
d0cde9
   greatly improved with GNU sed version 4.0, which now includes both
d0cde9
   man pages and Texinfo pages.
d0cde9

d0cde9
5.7. How do I tell what version of sed I am using?
d0cde9

d0cde9
   Try entering "sed" all by itself on the command line, followed by
d0cde9
   no arguments or parameters.  Also, try "sed --version".  In a
d0cde9
   pinch, you can also try this:
d0cde9

d0cde9
       strings sed | grep -i ver
d0cde9

d0cde9
   Your version of 'strings' must be a version of the Unix utility of
d0cde9
   this name. It should not be the DOS utility STRINGS.COM by Douglas
d0cde9
   Boling.
d0cde9

d0cde9
5.8. Does sed issue an exit code?
d0cde9

d0cde9
   Most versions of sed do not, but check the documentation that came
d0cde9
   with whichever version you are using. GNU sed issues an exit code
d0cde9
   of 0 if the program terminated normally, 1 if there were errors in
d0cde9
   the script, and 2 if there were errors during script execution.
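
   If your sed does issue exit codes, a shell script can test them.
   The sketch below assumes only that success is 0 and that a script
   error is nonzero; the exact nonzero values differ between
   versions, so check your own documentation before relying on them.

```shell
# A valid script on readable input: expect status 0.
ok_status=0
echo 'some text' | sed 's/some/any/' >/dev/null || ok_status=$?

# An unterminated s command is a script error; only "nonzero" is
# assumed here, not a particular value.
bad_status=0
echo 'some text' | sed 's/some/any' >/dev/null 2>&1 || bad_status=$?

echo "valid script: $ok_status, broken script: $bad_status"
```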
d0cde9

d0cde9
5.9. The 'r' command isn't inserting the file into the text.
d0cde9

d0cde9
   On most versions of sed (but not all), the 'r' (read) and 'w'
d0cde9
   (write) commands must be followed by exactly one space, then the
d0cde9
   filename, and then terminated by a newline. Any additional
d0cde9
   characters before or after the filename are interpreted as *part*
d0cde9
   of the filename. Thus
d0cde9

d0cde9
       /RE/r  insert.me
d0cde9

d0cde9
   would try to locate a file called ' insert.me' (note the
d0cde9
   leading space!). If the file was not found, most versions of sed
d0cde9
   say nothing, not even an error message.
d0cde9

d0cde9
   When sed scripts are used on the command line, every 'r' and 'w'
d0cde9
   must be the last command in that part of the script. Thus,
d0cde9

d0cde9
       sed -e '/regex/{r insert.file;d;}' source         # will fail
d0cde9
       sed -e '/regex/{r insert.file' -e 'd;}' source    # will succeed
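
   Here is the succeeding form run against real files, as a minimal
   sketch. The file contents and the 'MARK' pattern are made up for
   this illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'line one' 'MARK' 'line three' >"$workdir/source"
echo 'inserted text' >"$workdir/insert.file"

# The filename runs to the end of its -e fragment, so 'd;}' goes in a
# second -e. The matched line is replaced by the file's contents.
replaced=$(sed -e "/MARK/{r $workdir/insert.file" -e 'd;}' "$workdir/source")

rm -rf "$workdir"
printf '%s\n' "$replaced"
```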
d0cde9

d0cde9
5.10. Why can't I match or delete a newline using the \n escape sequence?
d0cde9
      Why can't I match 2 or more lines using \n?
d0cde9

d0cde9
   The \n will never match the newline at the end-of-line because the
d0cde9
   newline is always stripped off before the line is placed into the
d0cde9
   pattern space. To get 2 or more lines into the pattern space, use
d0cde9
   the 'N' command or something similar (such as 'H;...;g;').
d0cde9

d0cde9
   Sed works like this: sed reads one line at a time, chops off the
d0cde9
   terminating newline, puts what is left into the pattern space where
d0cde9
   the sed script can address or change it, and when the pattern space
d0cde9
   is printed, appends a newline to stdout (or to a file). If the
d0cde9
   pattern space is entirely or partially deleted with 'd' or 'D', the
d0cde9
   newline is *not* added in such cases. Thus, scripts like
d0cde9

d0cde9
       sed 's/\n//' file       # to delete newlines from each line
d0cde9
       sed 's/\n/foo\n/' file  # to add a word to the end of each line
d0cde9

d0cde9
   will _never_ work, because the trailing newline is removed _before_
d0cde9
   the line is put into the pattern space. To perform the above tasks,
d0cde9
   use one of these scripts instead:
d0cde9

d0cde9
       tr -d '\n' < file              # use tr to delete newlines
d0cde9
       sed ':a;N;$!ba;s/\n//g' file   # GNU sed to delete newlines
d0cde9
       sed 's/$/ foo/' file           # add "foo" to end of each line
d0cde9

d0cde9
   Since versions of sed other than GNU sed have limits to the size of
d0cde9
   the pattern buffer, the Unix 'tr' utility is to be preferred here.
d0cde9
   If the last line of the file contains a newline, GNU sed will add
d0cde9
   that newline to the output but delete all others, whereas tr will
d0cde9
   delete all newlines.
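
   The difference in the final newline can be seen by counting
   bytes. This sketch uses a made-up three-line sample and writes the
   sed loop with separate -e fragments so that it does not depend on
   GNU sed:

```shell
# Three short lines, each ending in a newline (6 bytes in all).
sample() { printf 'a\nb\nc\n'; }

# tr removes every newline, including the final one: 3 bytes remain.
tr_bytes=$(( $(sample | tr -d '\n' | wc -c) ))

# sed joins the lines in the pattern space, then prints the result
# with its usual trailing newline: 4 bytes remain.
sed_bytes=$(( $(sample | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n//g' | wc -c) ))

# Appending text needs no newline handling at all.
appended=$(sample | sed 's/$/ foo/')

echo "tr: $tr_bytes bytes, sed: $sed_bytes bytes"
printf '%s\n' "$appended"
```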
d0cde9

d0cde9
   To match a block of two or more lines, there are 3 basic choices:
d0cde9
   (1) use the 'N' command to add the Next line to the pattern space;
d0cde9
   (2) use the 'H' command at least twice to append the current line
d0cde9
   to the Hold space, and then retrieve the lines from the hold space
d0cde9
   with x, g, or G; or (3) use address ranges (see section 3.3, above)
d0cde9
   to match lines between two specified addresses.
d0cde9

d0cde9
   Choices (1) and (2) will put an \n into the pattern space, where it
d0cde9
   can be addressed as desired ('s/ABC\nXYZ/alphabet/g'). One example
d0cde9
   of using 'N' to delete a block of lines appears in section 4.13
d0cde9
   ("How do I delete a block of _specific_ consecutive lines?"). This
d0cde9
   example can be modified by changing the delete command to something
d0cde9
   else, like 'p' (print), 'i' (insert), 'c' (change), 'a' (append),
d0cde9
   or 's' (substitute).
d0cde9

d0cde9
   Choice (3) will not put an \n into the pattern space, but it _does_
d0cde9
   match a block of consecutive lines, so it may be that you don't
d0cde9
   even need the \n to find what you're looking for. Since several
d0cde9
   versions of sed support this syntax:
d0cde9

d0cde9
       sed '/start/,+4d'  # to delete "start" plus the next 4 lines,
d0cde9

d0cde9
   in addition to the traditional '/from here/,/to there/{...}' range
d0cde9
   addresses, it may be possible to avoid the use of \n entirely.
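
   Choices (1) and (3) can be compared on the same input. The sample
   lines below are made up, and the 'P;D' sliding-window idiom is one
   common way to use 'N', not the only one:

```shell
input=$(printf '%s\n' 'ABC' 'XYZ' 'start' 'one' 'two' 'end' 'tail')

# (1) '$!N' pulls the next line into the pattern space so that \n can
# be matched; 'P;D' then print and drop the first line and restart.
joined=$(printf '%s\n' "$input" | sed '$!N;s/ABC\nXYZ/alphabet/;P;D')

# (3) A range address deletes a block without ever matching \n.
ranged=$(printf '%s\n' "$input" | sed '/start/,/end/d')

printf '%s\n' "$joined" '---' "$ranged"
```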
d0cde9

d0cde9
5.11. My script aborts with an error message, "event not found".
d0cde9

d0cde9
   This error is generated by the csh or tcsh shells, not by sed. The
d0cde9
   exclamation mark (!) is special to csh/tcsh, and if you use it in
d0cde9
   command-line or shell scripts--even within single quotes--it must
d0cde9
   be preceded by a backslash. Thus, under the csh/tcsh shell:
d0cde9

d0cde9
       sed '/regex/!d'      # will fail
d0cde9
       sed '/regex/\!d'     # will succeed
d0cde9

d0cde9
   The exclamation mark should not be prefixed with a backslash when
d0cde9
   the script is called from a file, as "-f script.file".
d0cde9

d0cde9
------------------------------
d0cde9

d0cde9
6. OTHER ISSUES
d0cde9

d0cde9
6.1. I have a certain problem that stumps me. Where can I get help?
d0cde9

d0cde9
   Post your question on the "sed-users" mailing list (section 2.3.2),
d0cde9
   where many sed users will be able to see your question. You will have
d0cde9
   to subscribe to have posting privileges.
d0cde9

d0cde9
   Your other alternative is one of these newsgroups:
d0cde9

d0cde9
      - alt.comp.editors.batch
d0cde9
      - comp.editors
d0cde9
      - comp.unix.questions
d0cde9
      - comp.unix.shell
d0cde9

d0cde9
6.2. How does sed compare with awk, perl, and other utilities?
d0cde9

d0cde9
   Awk is a much richer language with many features of a programming
d0cde9
   language, including variable names, math functions, arrays, system
d0cde9
   calls, etc. Its command structure is similar to sed:
d0cde9

d0cde9
      address { command(s) }
d0cde9

d0cde9
   which means that for each line or range of lines that matches the
d0cde9
   address, execute the command(s). In both sed and awk, an address
d0cde9
   can be a line number or a RE somewhere on the line, or both.
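
   The parallel is easiest to see with matching one-liners. In this
   sketch the sample text is made up, and the awk line-number test is
   written as the expression 'NR == 2':

```shell
input=$(printf '%s\n' 'apple pie' 'banana split' 'apple tart')

# Regex address: act on lines matching /apple/.
sed_re=$(printf '%s\n' "$input" | sed -n '/apple/p')
awk_re=$(printf '%s\n' "$input" | awk '/apple/ { print }')

# Line-number address: act on line 2 only.
sed_ln=$(printf '%s\n' "$input" | sed -n '2p')
awk_ln=$(printf '%s\n' "$input" | awk 'NR == 2 { print }')

printf '%s\n' "$sed_re" "$sed_ln"
```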
d0cde9

d0cde9
   In program size, awk is 3-10 times larger than sed. Awk has most of
d0cde9
   the functions of sed, but not all. Notably, sed supports
d0cde9
   backreferences (\1, \2, ...) to previous expressions, and awk does
d0cde9
   not have any comparable syntax. (One exception: GNU awk v3.0
d0cde9
   introduced gensub(), which supports backreferences only on
d0cde9
   substitutions.)
d0cde9

d0cde9
   Perl is a general-purpose programming language, with many features
d0cde9
   beyond text processing and interprocess communication, taking it
d0cde9
   well past awk or other scripting languages. Perl supports every
d0cde9
   feature sed does and has its own set of extended regular
d0cde9
   expressions, which give it extensive power in pattern matching and
d0cde9
   processing. (Note: the standard perl distribution comes with 's2p',
d0cde9
   a sed-to-perl conversion script. See section 3.6 for more info.)
d0cde9
   Like sed and awk, perl scripts do not need to be compiled into
d0cde9
   binary code. Like sed, perl can also run many useful "one-liners"
d0cde9
   from the command line, though with greater flexibility; see
d0cde9
   question 4.41 ("How do I make substitutions in every file in a
d0cde9
   directory, or in a complete directory tree?").
d0cde9

d0cde9
   On the other hand, the current version of perl is from 8 to 35
d0cde9
   times larger than sed in its executables alone (perl's library
d0cde9
   modules and allied files not included!). Further, for most simple
d0cde9
   tasks such as substitution, sed executes more quickly than either
d0cde9
   perl or awk. All these utilities serve to process input text,
d0cde9
   transforming it to meet our needs . . . or our arbitrary whims.
d0cde9

d0cde9
6.3. When should I use sed?
d0cde9

d0cde9
   When you need a small, fast program to modify words, lines, or
d0cde9
   blocks of lines in a textfile.
d0cde9

d0cde9
6.4. When should I NOT use sed?
d0cde9

d0cde9
   You should not use sed when you have "dedicated" tools which can do
d0cde9
   the job faster or with an easier syntax. Do not use sed when you
d0cde9
   only want to:
d0cde9

d0cde9
   - print individual lines, based on patterns within the line itself.
d0cde9
     Instead, use "grep".
d0cde9

d0cde9
   - print blocks of lines, with 1 or more lines of context above or
d0cde9
     below a specific regular expression. Instead, use the GNU version
d0cde9
     of grep as follows:
d0cde9

d0cde9
        grep -A{number} -B{number} "regex"
d0cde9

d0cde9
   - remove individual lines, based on patterns within the line
d0cde9
     itself. Instead, use "grep -v".
d0cde9

d0cde9
   - print line numbers.  Instead, use "nl" or "cat -n".
d0cde9

d0cde9
   - reformat lines or paragraphs. Instead, use "fold", "fmt" or "par".
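
   For the first and third items, the sed and grep forms produce the
   same output, as this small sketch with made-up input shows:

```shell
input=$(printf '%s\n' 'keep this' 'drop that' 'keep more')

# Printing matching lines: sed -n '/re/p' does what grep does.
sed_print=$(printf '%s\n' "$input" | sed -n '/keep/p')
grep_print=$(printf '%s\n' "$input" | grep 'keep')

# Removing matching lines: sed '/re/d' does what grep -v does.
sed_drop=$(printf '%s\n' "$input" | sed '/keep/d')
grep_drop=$(printf '%s\n' "$input" | grep -v 'keep')

printf '%s\n' "$grep_print" '---' "$grep_drop"
```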
d0cde9

d0cde9
   The tr utility is also more suited than sed to some simple tasks. For
d0cde9
   example, to:
d0cde9

d0cde9
   - delete individual characters. Instead of "s/[a-d]//g", use
d0cde9

d0cde9
        tr -d "[a-d]"
d0cde9

d0cde9
   - squeeze sequential characters. Instead of "s/ee*/e/g", use
d0cde9

d0cde9
        tr -s "{character-set}"
d0cde9

d0cde9
   - change individual characters. Instead of "y/abcdef/ABCDEF/", use
d0cde9

d0cde9
        tr "[a-f]" "[A-F]"
d0cde9

d0cde9
   Note, however, that tr does not support giving input files on the
d0cde9
   command line, so the syntax is:
d0cde9

d0cde9
     tr {options-and-patterns} < input-file
d0cde9

d0cde9
   or, to process multiple files:
d0cde9

d0cde9
     cat input-file1 input-file2 | tr {options-and-patterns}
d0cde9

d0cde9
   If you have multiple files, using tr instead of sed is often more of
d0cde9
   an exercise than a useful thing. Although sed can perfectly emulate
d0cde9
   certain functions of cat, grep, nl, rev, sort, tac, tail, tr, uniq,
d0cde9
   and other utilities, producing identical output, the native utilities
d0cde9
   are usually optimized to do the job more quickly than sed.
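
   The three tr substitutes can be checked against their sed
   counterparts. This sketch assumes a POSIX-style tr, where ranges
   are written without brackets; on such a tr the brackets are
   ordinary characters, so 'tr -d "[a-d]"' would also delete any
   literal '[' and ']' in the input. The sample text is made up.

```shell
text='a bad cheese feed'

# Delete the characters a-d.
sed_del=$(echo "$text" | sed 's/[a-d]//g')
tr_del=$(echo "$text" | tr -d 'a-d')

# Squeeze runs of 'e' down to a single 'e'.
sed_sq=$(echo "$text" | sed 's/ee*/e/g')
tr_sq=$(echo "$text" | tr -s 'e')

# Map a-f to A-F.
sed_up=$(echo "$text" | sed 'y/abcdef/ABCDEF/')
tr_up=$(echo "$text" | tr 'a-f' 'A-F')

printf '%s\n' "$tr_del" "$tr_sq" "$tr_up"
```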
d0cde9

d0cde9
6.5. When should I ignore sed and use awk or Perl instead?
d0cde9

d0cde9
   If you can write the same script in awk or Perl and do it in less
d0cde9
   time, then use Perl or awk. There's no reason to spend an hour
d0cde9
   writing and debugging a sed script if you can do it in Perl in 10
d0cde9
   minutes (assuming that you know Perl already) and if the processing
d0cde9
   time or memory use is not a factor. Don't hunt pheasants with a .22
d0cde9
   if you have a shotgun at your side . . . unless you simply enjoy
d0cde9
   the challenge!
d0cde9

d0cde9
   Specifically, use awk or perl if you need to:
d0cde9

d0cde9
      - count fields or words on a line. (awk)
d0cde9
      - count lines in a block or objects in a file.
d0cde9
      - check lengths of strings or do math operations.
d0cde9
      - handle very long lines or need very large buffers. (or gsed)
d0cde9
      - handle binary data (control characters). (perl: binmode)
d0cde9
      - loop through an array or list.
d0cde9
      - test for file existence, filesize, or fileage.
d0cde9
      - treat each paragraph as a line. (well, not always)
d0cde9

d0cde9
6.6. Known limitations among sed versions
d0cde9

d0cde9
   These limits apply to the distributed versions, although the
   source code for most free versions of sed allows for modification
   and recompilation. As
d0cde9
   used below, "no limit" means there is no "fixed" limit. Limits are
d0cde9
   actually determined by one's hardware, memory, operating system,
d0cde9
   and which C library is used to compile sed.
d0cde9

d0cde9
6.6.1. Maximum line length
d0cde9

d0cde9
      GNU sed:        no limit
d0cde9
      ssed:           no limit
d0cde9
      sedmod v1.0:    4096 bytes
d0cde9
      HHsed v1.5:     4000 bytes
d0cde9
      sed v1.6:       [pending]
d0cde9

d0cde9
6.6.2. Maximum size for all buffers (pattern space + hold space)
d0cde9

d0cde9
      GNU sed:        no limit
d0cde9
      ssed:           no limit
d0cde9
      sedmod v1.0:    4096 bytes
d0cde9
      HHsed v1.5:     4000 bytes
d0cde9
      sed v1.6:       [pending]
d0cde9

d0cde9
6.6.3. Maximum number of files that can be read with 'r' command
d0cde9

d0cde9
      GNU sed v3+:    no limit
d0cde9
      ssed:           no limit
d0cde9
      GNU sed v2.05:  total no. of r and w commands may not exceed 32
d0cde9
      sedmod v1.0:    total no. of r and w commands may not exceed 20
d0cde9
      sed v1.6:       [pending]
d0cde9

d0cde9
6.6.4. Maximum number of files that can be written with 'w' command
d0cde9

d0cde9
      GNU sed v3+:    no limit (but typical Unix is 253)
d0cde9
      ssed:           no limit (but typical Unix is 253)
d0cde9
      GNU sed v2.05:  total no. of r and w commands may not exceed 32
d0cde9
      sedmod v1.0:    10
d0cde9
      HHsed v1.5:     10
d0cde9
      sed v1.6:       [pending]
d0cde9

d0cde9
6.6.5. Limits on length of label names
d0cde9

d0cde9
      GNU sed:        no limit
d0cde9
      ssed:           no limit
d0cde9
      HHsed v1.5:     no limit
d0cde9
      sed v1.6:       [pending]
d0cde9
      BSD sed:        8 characters
d0cde9

d0cde9
   Note that GNU sed and ssed both consider a semicolon to terminate a
d0cde9
   label name.
d0cde9

d0cde9
6.6.6. Limits on length of write-file names
d0cde9

d0cde9
      GNU sed:        no limit
d0cde9
      ssed:           no limit
d0cde9
      HHsed v1.5:     no limit
d0cde9
      sed v1.6:       [pending]
d0cde9
      BSD sed:        40 characters
d0cde9

d0cde9
6.6.7. Limits on branch/jump commands
d0cde9

d0cde9
      GNU sed:        no limit
d0cde9
      ssed:           no limit
d0cde9
      HHsed v1.5:     50
d0cde9
      sed v1.6:       [pending]
d0cde9

d0cde9
   As a practical consequence, this means that HHsed will not read
d0cde9
   more than 50 lines into the pattern space via an N command, even if
d0cde9
   the pattern space is only a few hundred bytes in size. HHsed exits
d0cde9
   with an error message, "infinite branch loop at line {nn}".
d0cde9

d0cde9
6.7. Known incompatibilities between sed versions
d0cde9

d0cde9
6.7.1. Issuing commands from the command line
d0cde9

d0cde9
   Most versions of sed permit multiple commands to be issued on the
d0cde9
   command line, separated by a semicolon (;). Thus,
d0cde9

d0cde9
       sed 'G;G' file
d0cde9

d0cde9
   should triple-space a file. However, for non-GNU sed, some commands
d0cde9
   *require* separate expressions on the command line. These include:
d0cde9

d0cde9
      - all labels (':a', ':more', etc.)
d0cde9
      - all branching instructions ('b', 't')
d0cde9
      - commands to read and write files ('r' and 'w')
d0cde9
      - any closing brace, '}'
d0cde9

d0cde9
   If these commands are used, they must be the LAST commands of an
d0cde9
   expression. Subsequent commands must use another expression
d0cde9
   (another -e switch plus arguments).  E.g.,
d0cde9

d0cde9
     sed  -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' files
d0cde9

d0cde9
   GNU sed, ssed, sed15 and sed16 all permit these commands to be
d0cde9
   followed by a semicolon, so the previous script can be written:
d0cde9

d0cde9
     sed  ':a;s/^.\{1,77\}$/ &/;ta;s/\( *\)\1/\1/' files
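
   Where both forms are accepted they behave identically. The sketch
   below runs the same centering script both ways; the width of 19
   (instead of 77) and the sample word are assumptions chosen to keep
   the output short, and the second command assumes a sed that
   allows ';' after labels and branches.

```shell
# Portable form: the label and the branch each end their -e fragment.
portable=$(echo hello | sed -e :a -e 's/^.\{1,19\}$/ &/;ta' -e 's/\( *\)\1/\1/')

# Same commands as a single semicolon-separated script.
compact=$(echo hello | sed ':a;s/^.\{1,19\}$/ &/;ta;s/\( *\)\1/\1/')

# Brackets make the leading spaces visible.
printf '[%s]\n' "$portable" "$compact"
```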
d0cde9

d0cde9
   Versions differ in implementing the 'a' (append), 'c' (change), and
d0cde9
   'i' (insert) commands:
d0cde9

d0cde9
      sed "/foo/i New text here"              # HHsed/sedmod/gsed-30280
d0cde9
      gsed -e "/foo/i\\" -e "New text here"   # GNU sed
d0cde9
      sed1 -e "/foo/i" -e "New text here"     # one version of sed
d0cde9
      sed2 "/foo/i\ New text here"            # another version
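
   In a Unix shell, the traditional backslash-newline form can be
   given as a single multi-line argument. A minimal sketch with
   made-up input lines:

```shell
# Backslash-newline form: the text to insert sits on its own line of
# the script, which here is a single-quoted multi-line argument.
inserted=$(printf '%s\n' 'before' 'foo' 'after' | sed '/foo/i\
New text here')

printf '%s\n' "$inserted"
```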
d0cde9

d0cde9
6.7.2. Using comments (prefixed by the '#' sign)
d0cde9

d0cde9
   Most versions of sed permit comments to appear in sed scripts only
d0cde9
   on the first line of the script. Comments on line 2 or thereafter
d0cde9
   are not recognized and will generate an error like "unrecognized
d0cde9
   command" or "command [bad-line-here] has trailing garbage".
d0cde9

d0cde9
   GNU sed, HHsed, sedmod, and HP-UX sed permit comments to appear on
d0cde9
   any line of the script, except after labels and branching commands
d0cde9
   (b,t), *provided* that a semicolon (;) occurs after the command
d0cde9
   itself. This syntax makes sed similar to awk and perl, which use a
d0cde9
   similar commenting structure in their scripts.  Thus,
d0cde9

d0cde9
      # GNU style sed script
d0cde9
      $!N;                        # except for last line, get next line
d0cde9
      s/^\([0-9]\{5\}\).*\n\1.*//;    # if first 5 digits of each line
d0cde9
                                      # match, delete BOTH lines.
d0cde9
      t skip
d0cde9
      P;                              # print 1st line only if no match
d0cde9
      :skip
d0cde9
      D;                    # delete 1st line of pattern space and loop
d0cde9
      #---end of script---
d0cde9

d0cde9
   is a valid script for GNU-based versions of sed, but is
d0cde9
   unrecognized for most other versions of sed.
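
   To try a commented script of this kind, save it to a file and run
   it with -f. The sketch below assumes a sed that accepts trailing
   comments (GNU sed, for instance); the temporary file and the four
   sample lines are made up for this illustration.

```shell
script=$(mktemp)
cat >"$script" <<'EOF'
# Drop both lines of any adjacent pair sharing the same 5 leading digits.
$!N;                            # except on the last line, get next line
s/^\([0-9]\{5\}\).*\n\1.*//;    # matching pair: empty the pattern space
t skip
P;                              # no match: print the first line
:skip
D;                              # delete first line and loop
EOF

deduped=$(printf '%s\n' '11111 a' '22222 b' '22222 c' '33333 d' | sed -f "$script")
rm -f "$script"
printf '%s\n' "$deduped"
```

   Only the '11111' and '33333' lines survive, since the two '22222'
   lines form a matching pair.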
d0cde9

d0cde9
   Finally, if the first two characters in a disk file script are
d0cde9
   "#n", the output is suppressed, exactly as if -n were entered on
d0cde9
   the command line. This is true for the following versions of sed:
d0cde9

d0cde9
      - ssed v3.57 and above
d0cde9
      - gsed
d0cde9
      - HHsed v1.5
d0cde9
      - sed v1.6
d0cde9

d0cde9
   This syntax is not recognized by these versions of sed:
d0cde9

d0cde9
      - ssed v3.45 to v3.50 (other versions untested)
d0cde9
      - sedmod v1.0
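
   A quick way to find out whether your sed honours "#n" is to run a
   two-line script file without -n. This is a minimal sketch; the
   temporary file and sample input are made up for illustration.

```shell
script=$(mktemp)
# '#n' must be the first two characters of the file, followed
# directly by a newline.
printf '%s\n' '#n' '/foo/p' >"$script"

# No -n on the command line, yet only the matching line is printed
# by a sed that honours '#n'.
quiet=$(printf '%s\n' 'foo one' 'bar two' | sed -f "$script")
rm -f "$script"
printf '%s\n' "$quiet"
```

   A sed that ignores "#n" prints 'foo one' twice and 'bar two' once.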
d0cde9

d0cde9
6.7.3. Special syntax in REs
d0cde9

d0cde9
A. HHsed v1.5 (by Howard Helman)
d0cde9

d0cde9
   The following expressions can be used for /RE/ addresses or in the
d0cde9
   LHS of a substitution:
d0cde9

d0cde9
      +    - 1 or more occurrences of previous RE: same as \{1,\}
d0cde9
      \<   - boundary between nonword and word character
d0cde9
      \>   - boundary between word and nonword character
d0cde9

d0cde9
   The following expressions can be used for /RE/ addresses or on
d0cde9
   either side of a substitution:
d0cde9

d0cde9
      \a   - bell         (ASCII 07, 0x07)
d0cde9
      \b   - backspace    (ASCII 08, 0x08)
d0cde9
      \e   - escape       (ASCII 27, 0x1B)
d0cde9
      \f   - formfeed     (ASCII 12, 0x0C)
d0cde9
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
d0cde9
      \r   - return       (ASCII 13, 0x0D)
d0cde9
      \t   - tab          (ASCII 09, 0x09)
d0cde9
      \v   - vertical tab (ASCII 11, 0x0B)
d0cde9
      \xHH - the ASCII character corresponding to 2 hex digits HH.
d0cde9

B. sed v1.6 (by Walter Briscoe)

   sed v1.6 accepts every expression supported by HHsed v1.5 (above),
   plus the following elements, which can also be used in the RHS of
   a substitution (in addition to those listed above):

      \\~  - insert replacement pattern defined in last s/// command
             (must be used alone in the RHS)
      \l   - change next element to lower case
      \L   - change remaining elements to lower case
      \u   - change next element to upper case
      \U   - change remaining elements to upper case
      \e   - end case conversion of next element
      \E   - end case conversion of remaining elements
      $0   - insert pattern space BEFORE the substitution
      $1-$9 - match Nth word on the pattern space

C. sedmod v1.0 (by Hern Chen)

   The following expressions can be used for /RE/ addresses or in the
   LHS of a substitution:

      +    - 1 or more occurrences of previous RE: same as \{1,\}
      \a   - any alphanumeric: same as [a-zA-Z0-9]
      \A   - 1 or more alphas: same as \a+
      \d   - any digit: same as [0-9]
      \D   - 1 or more digits: same as \d+
      \h   - any hex digit: same as [0-9a-fA-F]
      \H   - 1 or more hex digits: same as \h+
      \l   - any letter: same as [A-Za-z]
      \L   - 1 or more letters: same as \l+
      \n   - newline      (read as 2 bytes, 0D 0A or ^M^J, in DOS)
      \s   - any whitespace character: space, tab, or vertical tab
      \S   - 1 or more whitespace chars: same as \s+
      \t   - tab          (ASCII 09, 0x09)
      \<   - boundary between nonword and word character
      \>   - boundary between word and nonword character

   The following expressions can be used in the RHS of a substitution.
   "Elements" refer to \1 .. \9, &, $0, or $1 .. $9:

      &    - insert the text matched by the regexp on the LHS
      \e   - end case conversion of next element
      \E   - end case conversion of remaining elements
      \l   - change next element to lower case
      \L   - change remaining elements to lower case
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
      \t   - tab          (ASCII 09, 0x09)
      \u   - change next element to upper case
      \U   - change remaining elements to upper case
      $0   - insert the original pattern space
      $1-$9 - match Nth word on the pattern space

D. UnixDos sed

   The following expressions can be used in text, LHS, and RHS:

      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)

E. GNU sed v1.03 (by Frank Whaley)

   When used with the -x (extended) switch on the command line, or
   when '#x' occurs as the first line of a script, Whaley's gsed103
   supports the following expressions in both the LHS and RHS of a
   substitution:

      \|      matches the expression on either side
      ?       0 or 1 occurrences of previous RE: same as \{0,1\}
      +       1 or more occurrences of previous RE: same as \{1,\}
      \a      "alert" beep     (BEL, Ctrl-G, 0x07)
      \b      backspace        (BS, Ctrl-H, 0x08)
      \f      formfeed         (FF, Ctrl-L, 0x0C)
      \n      newline          (LF, Ctrl-J, 0x0A)
      \r      carriage-return  (CR, Ctrl-M, 0x0D)
      \t      horizontal tab   (HT, Ctrl-I, 0x09)
      \v      vertical tab     (VT, Ctrl-K, 0x0B)
      \bBBB   binary char, where BBB are 1-8 binary digits, [0-1]
      \dDDD   decimal char, where DDD are 1-3 decimal digits, [0-9]
      \oOOO   octal char, where OOO are 1-3 octal digits, [0-7]
      \xHH    hex char, where HH are 1-2 hex digits, [0-9A-F]

   In normal mode, with or without the -x switch, the following escape
   sequences are also supported in regex addressing or in the LHS of a
   substitution:

      \`      matches beginning of pattern space: same as /^/
      \'      matches end of pattern space: same as /$/
      \B      boundary between 2 word or 2 nonword characters
      \w      any nonword character [*BUG!* should be a word char]
      \W      any nonword character: same as /[^A-Za-z0-9]/
      \<      boundary between nonword and word char
      \>      boundary between word and nonword char

F. GNU sed v2.05 and higher versions

   The following expressions can be used for /RE/ addresses or in the
   LHS of a substitution:

      \`  - matches the beginning of the pattern space (same as "^")
      \'  - matches the end of the pattern space (same as "$")
      \?  - 0 or 1 occurrence of previous character: same as \{0,1\}
      \+  - 1 or more occurrences of previous character: same as \{1,\}
      \|  - matches the string on either side, e.g., foo\|bar
      \b  - boundary between word and nonword chars (reversible)
      \B  - boundary between 2 word or between 2 nonword chars
      \n  - embedded newline (usable after N, G, or similar commands)
      \w  - any word character: [A-Za-z0-9_]
      \W  - any nonword char: [^A-Za-z0-9_]
      \<  - boundary between nonword and word character
      \>  - boundary between word and nonword character

   On \b, \B, \<, and \>, see section 6.7.4 ("Word boundaries"),
   below.
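
   A few of the GNU extensions listed above, shown in use. This is
   only a sketch and assumes GNU sed; other versions may treat \+, \?
   and \| as literal text. The sample strings and variable names are
   arbitrary.

```shell
# \+ : one or more occurrences of the previous expression
plus=$(echo 'aaa bbb' | sed 's/a\+/X/')

# \? : zero or one occurrence
opt=$(echo 'color colour' | sed 's/colou\?r/C/g')

# \| : alternation
alt=$(echo 'foo bar baz' | sed 's/foo\|baz/X/g')

# \W : any nonword character (the underscore counts as a word character)
word=$(echo 'a_1-b' | sed 's/\W/ /g')

printf '%s\n' "$plus" "$opt" "$alt" "$word"
```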

   Undocumented -r switch:

   Beginning with version 3.02, GNU sed has a -r switch (undocumented
   until version 4.0) that activates Extended Regular Expressions in
   the following manner:

       ?      -  0 or 1 occurrence of previous character
       +      -  1 or more occurrences of previous character
       |      -  matches the string on either side, e.g., foo|bar
       (...)  -  enable grouping without backslash
       {...}  -  enable interval expression without backslash

   When the -r switch (mnemonic: "regular expression") is used, prefix
   these symbols with a backslash to disable the special meaning.
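
   The same substitution written both ways, as a rough sketch that
   assumes a GNU sed recent enough to accept -r. The sample strings
   are arbitrary.

```shell
# Basic syntax: grouping and interval both need backslashes.
bre=$(echo 'ababab-xyz' | sed 's/\(ab\)\{2,\}/[&]/')

# With -r the same operators are written bare.
ere=$(echo 'ababab-xyz' | sed -r 's/(ab){2,}/[&]/')

# With -r, a backslash turns an operator back into a literal character.
lit=$(echo '1+1=2' | sed -r 's/\+/ plus /')

printf '%s\n' "$bre" "$ere" "$lit"
```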

   Escape sequences:

   Beginning with version 3.02.80, the following escape sequences can
   be used on both sides of a "s///" substitution:

      \a      "alert" beep     (BEL, Ctrl-G, 0x07)
      \f      formfeed         (FF, Ctrl-L, 0x0C)
      \n      newline          (LF, Ctrl-J, 0x0A)
      \r      carriage-return  (CR, Ctrl-M, 0x0D)
      \t      horizontal tab   (HT, Ctrl-I, 0x09)
      \v      vertical tab     (VT, Ctrl-K, 0x0B)
      \oNNN   a character with the octal value NNN
      \dNNN   a character with the decimal value NNN
      \xHH    a character with the hexadecimal value HH
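
   A short sketch of three of these escapes, assuming GNU sed
   3.02.80 or later. The sample strings are arbitrary.

```shell
# \t on the LHS: turn tabs into commas
csv=$(printf 'a\tb\tc\n' | sed 's/\t/,/g')

# \n on the RHS: split one line into two
split=$(echo 'left:right' | sed 's/:/\n/')

# \xHH on the LHS: 0x2D is the hyphen
hex=$(echo 'a-b' | sed 's/\x2D/+/')

printf '%s\n' "$csv" "$split" "$hex"
```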

   Note that GNU sed also supports "character classes", a POSIX
   extension to regexes, described in section 3.7, above.

G. sed 4.0 and higher versions

   The following expressions can be used in the RHS of a substitution:

      \E   - end case conversion
      \l   - change next character to lower case
      \L   - change remaining text to lower case
      \n   - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
      \t   - tab          (ASCII 09, 0x09)
      \u   - change next character to upper case
      \U   - change remaining text to upper case

   In addition, GNU sed 4.0 can modify the way ^ and $ are interpreted,
   so that ^ can also match an empty string after a newline character,
   and $ can also match an empty string before a newline character (to
   do this, add an "M" after the regular expression terminator, like
   /^>/M -- see section 3.1.1). Even if you use this feature, \` and \'
   still match the beginning and the end of the pattern space,
   respectively.
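
   Case conversion and the M modifier in use. This is a sketch that
   assumes GNU sed 4.0 or later, where an uppercase \E ends a \U or
   \L conversion. The sample strings are arbitrary.

```shell
# \u : upper-case the next character (here, the first letter of each word)
title=$(echo 'hello world' | sed 's/\w\+/\u&/g')

# \U ... \E : upper-case everything up to \E
shout=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\U\1\E \2/')

# M modifier: ^ and $ also match next to an embedded newline,
# so "b" can be matched as a whole line inside "a\nb".
multi=$(printf 'a\nb\n' | sed 'N;s/^b$/X/M')

printf '%s\n' "$title" "$shout" "$multi"
```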

H. ssed

   Everything that was said for GNU sed applies to ssed as well. In
   addition, in Perl-mode (-R switch), these become active or inactive:

      .     - no longer matches new-line characters
      \A    - matches beginning of pattern space
      \Z    - matches end of pattern space or last newline in the PS
      \z    - matches end of pattern space
      \d    - matches any digit: same as [0-9]
      \D    - matches any non-digit: same as [^0-9]
      \`    - no longer matches beginning of pattern space
      \'    - no longer matches end of pattern space
      \<    - no longer matches boundary between nonword & word char
      \>    - no longer matches boundary between word & nonword char
      \oNNN - no longer matches char with octal value NNN
      \dNNN - no longer matches char with decimal value NNN
      \NNN  - matches char with octal value NNN

   Perl mode supports lookahead (?=match) and lookbehind (?<=match)
   pattern matching.  The matched text is NOT captured in "&" for s///
   replacements!

      foo(?=bar)   - match "foo" only if "bar" follows it
      foo(?!bar)   - match "foo" only if "bar" does NOT follow it
      (?<=foo)bar  - match "bar" only if "foo" precedes it
      (?<!foo)bar  - match "bar" only if "foo" does NOT precede it

      (?<!in|on|at)foo
                  - match "foo" only if NOT preceded by "in", "on" or "at"
      (?<=\d{3})(?<!999)foo
                  - match "foo" only if preceded by 3 digits other than "999"

   In Perl mode, there are two new switches in /addressing/ or s///
   commands. Switches may be lowercase in s/// commands, but must be
   uppercase in /addressing/:

       /S  - lets "." match a newline also
       /X  - extra whitespace is ignored. See below, for sample usage.

   Here are some examples of Perl-style regular expressions. Use the -R
   switch.

     (?i)abc    - case-insensitive match of abc, ABC, aBc, ABc, etc.
     ab(?i)c    - same as above; the (?i) applies throughout the pattern
     (ab(?i)c)  - matches abc or abC; the outer parens make the difference!
     (?m)       - multi-line pattern space: same as "s/FIND/REPL/M"
     (?s)       - set "." to match newline also: same as "s/FIND/REPL/S"
     (?x)       - ignore whitespace and #comments; see the /X sample below.

     (?:abc)foo    - match "abcfoo", but do not capture 'abc' in \1
     (?:ab|cd)ef   - match "abef" or "cdef"; nothing is captured in \1
     (?#remark)xy  - match "xy"; remarks after "#" are ignored.

   And here is a sample use of the /X switch to add comments to a
   complex expression. To embed literal spaces, precede them with \ or
   put them inside [brackets].

     # ssed script to change "(123) 456-7890" into "[ac123] 456-7890"
     #
     s/ # BACKSLASH IS NEEDED AT END OF EACH LINE!   \
     \(                   # literal left paren, (    \
     (\d{3})              # 3 digits                 \
     \)                   # literal right paren, )   \
     [ \t]*               # zero or more spaces or tabs  \
     (\d{3}-\d{4})        # 3 digits, hyphen, 4 digits   \
     /[ac\1] \2/gx;       # replace g(lobally), with e(x)tended spacing
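
   For seds without Perl mode, roughly the same substitution can be
   written on one line with extended regular expressions. This is a
   sketch only, assuming a GNU sed that accepts -r; the function name
   "to_ac" and the sample text are invented, [0-9] stands in for \d,
   and [[:blank:]] stands in for [ \t].

```shell
# to_ac: rewrite "(123) 456-7890" as "[ac123] 456-7890".
# Under -r, \( and \) are literal parens; bare ( ) capture.
to_ac() {
    sed -r 's/\(([0-9]{3})\)[[:blank:]]*([0-9]{3}-[0-9]{4})/[ac\1] \2/g'
}

echo 'Call (123) 456-7890 or (555)867-5309.' | to_ac
```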

6.7.4. Word boundaries

   GNU sed, ssed, sed16, sed15 and sedmod use certain symbols to define
   the boundary between a "word character" and a nonword character. A
   word character fits the regex "[A-Za-z0-9_]". Note: a word character
   includes the underscore "_" but not the hyphen, probably because the
   underscore is permissible as a label in sed and in other scripting
   languages. (In gsed103, a word character did NOT include the
   underscore; it included alphanumerics only.)

   These symbols include '\<' and '\>' (gsed, ssed, sed15, sed16,
   sedmod) and '\b' and '\B' (gsed only). Note that the boundary
   symbols do not represent a character, but a position on the line.
   Word boundaries are used with literal characters or character sets
   to let you match (and delete or alter) whole words without
   affecting the spaces or punctuation marks outside of those words.
   They can only be used in a "/pattern/" address or in the LHS of a
   's/LHS/RHS/' command. The following table shows how these symbols
   may be used in HHsed and GNU sed. Sedmod matches the syntax of
   HHsed.

      Match position      Possible word boundaries   HHsed   GNU sed
      ---------------------------------------------------------------
      start of word    [nonword char]^[word char]      \<    \< or \b
      end of word         [word char]^[nonword char]   \>    \> or \b
      middle of word      [word char]^[word char]     none      \B
      outside of word  [nonword char]^[nonword char]  none      \B
      ---------------------------------------------------------------
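
   The rows of the table can be tried out as follows. This sketch
   assumes a current GNU sed (\b and \B are gsed only); the sample
   words are arbitrary.

```shell
# \< and \> : replace "cat" only when it stands alone as a word
whole=$(echo 'cat concat cats' | sed 's/\<cat\>/dog/g')

# \b matches at either edge of a word
edge=$(echo 'cat concat cats' | sed 's/\bcat\b/dog/g')

# \B : match "cat" only when it does NOT begin a word
inner=$(echo 'cat concat cats' | sed 's/\Bcat/dog/g')

printf '%s\n' "$whole" "$edge" "$inner"
```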

   In ssed, the symbols '\<' and '\>' lose their special meaning when
   the -R switch is used to invoke Perl-style expressions. However,
   the identical meaning of '\<' and '\>' can be obtained through
   these nonmatching, zero-width assertions:

       (?<!\w)(?=\w)   - same as '\<'
       (?<=\w)(?!\w)   - same as '\>'

6.7.5. Commands which operate differently

A. GNU sed version 3.02 and 3.02.80

   The N command no longer discards the contents of the pattern space
   upon reaching the end of file. This is not a bug, it's a feature.
   However, it breaks certain scripts which relied on the older
   behavior of N.

   'N' adds the Next line to the pattern space, enabling multiple
   lines to be stored and acted upon. Upon reaching the last line of
   the file, if the N command was issued again, the contents of the
   pattern space would be silently deleted and the script would abort
   (this has been the traditional behavior). For this reason, sed
   users generally wrote:

       $!N;   # to add the Next line to every line but the last one.

   However, certain sed scripts relied on the traditional behavior,
   such as the script to delete trailing blank lines at the end of a
   file (see script #12 in section 3.2, "Common one-line sed
   scripts", above). Also, classic textbooks such as Dale Dougherty
   and Arnold Robbins' _sed & awk_ documented the older behavior.

   The GNU sed maintainer felt that despite the portability problems
   this would cause, changing the N command to print (rather than
   delete) the pattern space was more consistent with one's intuitions
   about how a command to "append the Next line" _ought_ to behave.
   Another fact favoring the change was that, under the older
   behavior, "{N;command;}" will delete the last line if the file has
   an odd number of lines, but print the last line if the file has an
   even number of lines.

   To convert scripts which used the former behavior of N (deleting
   the pattern space upon reaching the EOF) to scripts compatible with
   all versions of sed, change a lone "N;" to "$d;N;".
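
   The two portable spellings side by side, as a small sketch. Both
   functions join each pair of lines with a space; the function names
   "keep_last" and "drop_last" are invented for the example.

```shell
# Skip N on the last line, so an unpaired final line is printed.
keep_last() { sed '$!N;s/\n/ /'; }

# Emulate the traditional N: an unpaired final line is deleted
# explicitly, before N can be reached.
drop_last() { sed '$d;N;s/\n/ /'; }

printf '1\n2\n3\n' | keep_last
printf '1\n2\n3\n' | drop_last
```

   With three input lines, keep_last prints "1 2" and then "3", while
   drop_last prints only "1 2". With an even number of lines the two
   give the same output.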

------------------------------

7. KNOWN BUGS AMONG SED VERSIONS

   Most versions of GNU sed and ssed contain a "buglist" in the
   archive source code of known errors or reported behaviors that may
   be misconstrued as bugs. This portion of the sed FAQ does _not_
   attempt to fully reproduce those buglist files. However, we do
   seek to do some substantial reporting, particularly where certain
   programs have no "buglist" of their own or are not being actively
   maintained.

   As a rule of thumb, if the bug "bites" someone on the sed-users
   mailing list, I tend to report it.

7.1. ssed v3.59 (by Paolo Bonzini)

   (1) N does not discard the contents of the pattern space upon
   reaching the end of file; not a bug. See section 6.7.5.A, above.

   (2) If \x26 is entered into the RHS of a substitution, it is
   interpreted as an ampersand metacharacter, and the entire pattern
   matched in the "find" portion is inserted at that point. A literal
   ampersand should be inserted instead.

   (3) Under Windows 2000, the -i switch doesn't create backup files
   properly. When passed one or more files to process, the source
   file(s) are unchanged, and the changed output files are given
   filenames like sedDOSxyz with no way to match them to the names
   of the source files.

7.2. GNU sed v4.0 - v4.0.5

   (1) N does not discard the contents of the pattern space upon
   reaching the end of file; not a bug. See section 6.7.5.A, above.

   (2) If \x26 is entered into the RHS of a substitution, it is
   interpreted as an ampersand metacharacter, and the entire pattern
   matched in the "find" portion is inserted at that point. A literal
   ampersand should be inserted instead.

7.3. GNU sed v3.02.80

   (1) N does not discard the contents of the pattern space upon
   reaching the end of file; not a bug. See section 6.7.5.A, above.

   (2) Same as #2 for GNU sed v4.0, above.

7.4. GNU sed v3.02

   (1) Affects only v3.02 binaries compiled with DJGPP for MS-DOS and
   MS-Windows: 'l' (list) command does not display a lone carriage
   return (0x0D, ^M) embedded in a line.

   (2) The expression "\<" causes problems when attempting the
   following types of substitutions, which should print "+aaa +bbb":

       echo aaa bbb | sed 's/\</+/g'    # prints "+a+a+a +b+b+b"
       echo aaa bbb | sed 's/\<./+&/g'  # prints "+a+a+a +b+b+b"

   (3) The N command no longer discards the contents of the pattern
   space upon reaching the end of file. This is not a bug, it's a
   feature. See section 6.7.5, "Commands which operate differently".

7.5. GNU sed v2.05

   (1) If a number follows the substitute command (e.g., s/f/F/10) and
   the number exceeds the possible matches on the pattern space, the
   command 't label' _always_ jumps to the specified label. 't' should
   jump only if the substitution was successful (or returned "true").

   (2) 'l' (list) command does not convert the following characters to
   hex values, but passes them through unchanged: 0xF7, 0xFB, 0xFC,
   0xFD, 0xFE.

   (3) A range address like "/foo/,14" is supposed to match every line
   from the first occurrence of "foo" until line 14, inclusive, and
   then match only those lines containing "foo" thereafter. In gsed
   v2.05, if "foo" occurs later in the file, every line from there to
   the end of file will be matched (since gsed is looking for line 14
   to occur again!).
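
   The expected behavior of such a range, scaled down to "/foo/,3" on
   a six-line input. This is a sketch that assumes a sed without the
   bug; the input lines are invented.

```shell
# Lines 1-3 are printed as a range; the later "foo" on line 5
# selects only itself, because line 3 has already gone by.
out=$(printf 'foo\nb\nc\nd\nfoo\nf\n' | sed -n '/foo/,3p')
printf '%s\n' "$out"
```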

   (4) The regexes /\`/ and /\'/ are not interpreted as a backquote
   and apostrophe, as might be expected. Instead, they are used to
   represent the beginning-of-line and end-of-line (respectively), to
   conform with similar regexes in the GNU versions of Emacs and awk.
   As a consequence, there is no clear way to indicate an apostrophe,
   since a bare apostrophe (') has special meaning to the Unix shell
   and the quoted apostrophe (\') is interpreted as the EOL. A
   double-quote apostrophe (\\') was interpreted as a backslash to sed
   and a quote mark to the shell--again, not providing the expected
   results. This syntax changed in the next version of gsed.
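
   For comparison, three ways of matching a literal apostrophe from a
   Bourne-type shell with a current sed. A sketch only: the third
   form relies on the \xHH escape and so assumes GNU sed 3.02.80 or
   later.

```shell
# 1. Put the sed script in double quotes, so ' needs no escaping.
a=$(echo "it's" | sed "s/'/_/")

# 2. Stay in single quotes: close them, add an escaped ', reopen.
b=$(echo "it's" | sed 's/'\''/_/')

# 3. GNU sed only: name the character by its hex code, 0x27.
c=$(echo "it's" | sed 's/\x27/_/')

printf '%s\n' "$a" "$b" "$c"
```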

   (5) Multiple occurrences of the 'w' command fail, as shown here,
   given that both "aaa" and "bbb" occur within the file:

       gsed -e "/aaa/w FILE" -e "/bbb/w FILE" input.txt

   (6) The expression "\<" causes problems when attempting the
   following type of substitution, which should print "+aaa +bbb":

       echo aaa bbb | sed 's/\</+/g'    # sed hangs up with no output

   The syntax 's/\<./+&/g' issues the proper output.

7.6. GNU sed v1.18

   (1) Same as #1 for GNU sed v2.05, above.

   (2) The following command will lock the computer under Win95. Echos
   is an echo command that does not issue a trailing newline:

       echos any_word | gsed "s/[ ]*$//"

   (3) Same as #3 for GNU sed v2.05, above.

7.7. GNU sed v1.03 (by Frank Whaley)

   (1) The \w and \W escape sequences both match only nonword
   characters. \w is misdefined and should match word characters.

   (2) The underscore is defined as a nonword character; it should be
   defined as a word character.

   (3) Same as #3 for GNU sed v2.05, above.

7.8. sed v1.6 (by Walter Briscoe) - still in beta version

   (1) Duplicated subexpressions (still) do not match an empty set as
   they should. This problem was inherited from HHsed15.

       echo 123 | sed "s/\([a-z][a-z]\)*/=\1/"  # does not return '='

   (2) If grouping is followed by a + operator, nothing is matched.
   This problem was inherited from HHsed; sed v1.6 fixed a bug with
   the * operator, but the problem with the + operator persists.

       echo aaa | sed "/\(a\)+/d"          # nothing is deleted.

   (3) With the interval expressions \{1,\} and +, there is a bug
   related to the & replacement character. This affected the BETA
   release, and it's not known if it affects the final release.

       echo ab | sed "s/a[^a]*/&c/"        # returns 'abc'. Okay.
       echo ab | sed "s/a[^a]+/&c/"        # returns 'ab'. Bug!
       echo ab | sed "s/a[^a]\{1,\}/&c/"   # returns 'ab'. Bug!

7.9. HHsed v1.5 (by Howard Helman)

   (1) If a number follows the substitute command (e.g., s/foo/bar/2),
   in a sed script entered from the command line, two semicolons must
   follow the number, or the two commands must be separated by an -e
   switch. Normally, only 1 semicolon is needed to separate commands.

       echo bit bet | HHsed "s/b/n/2;;s/b/B/"          # solution 1
       echo bit bet | HHsed -e "s/b/n/2" -e "s/b/B/"   # solution 2

   (2) If the substitute command is followed by a number and a "p"
   flag, when the -n switch is used, the "p" flag must occur first.

       echo aaa | HHsed -n "s/./B/3p"    # bug! nothing prints
       echo aaa | HHsed -n "s/./B/p3"    # prints "aaB" as expected

   (3) The following commands will cause HHsed to lock the computer
   under MS-DOS or Win95. Note that they occur because of malformed
   regular expressions which will match no characters.

       sed -n "p;s/\<//g;" file
       sed -n "p;s/[char-set]*//g;" file

   (4) The range command '/RE1/,/RE2/' in HHsed will match one line if
   both regexes occur on the same line (see section 3.4(3), above).
   Though this could be construed as a feature, it should probably be
   considered a bug since its operation differs from every other
   version of sed. For example, '/----/,/----/{s/^/>>/;}' should put
   two angle brackets ">>" before every line which is sandwiched
   between two rows of 4 or more hyphens. With HHsed, this command
   will only prefix the hyphens themselves with the angle brackets.
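
   The behavior of other versions of sed can be seen in this small
   sketch, which assumes a standard sed such as GNU sed. The input
   lines are invented.

```shell
# The closing /----/ is searched for starting on the line AFTER the
# opening one, so the whole block between the rows is prefixed.
out=$(printf 'a\n----\nb\nc\n----\nd\n' | sed '/----/,/----/{s/^/>>/;}')
printf '%s\n' "$out"
```

   Here both rows of hyphens and the two lines between them receive
   the ">>" prefix, while "a" and "d" are left alone.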

   (5) If the hold space is empty, the H command copies the pattern
   space to the hold space but fails to prepend a leading newline. The
   H command is supposed to add a newline, followed by the contents of
   the pattern space, to the hold space at all times. A workaround is
   "{G;s/^\(.*\)\(\n\)$/\2\1/;H;s/\n$//;}", but it requires knowing
   that the hold space is empty and using the command only once.
   Another alternative is to use the G or the h command alone at key
   points in the script.
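
   A sketch of the standard H behavior, and of one common way to
   sidestep the leading newline by using h for the first line. It
   assumes a standard sed such as GNU sed; the variable names and
   input are invented.

```shell
# Standard behavior: H on an empty hold space stores newline + line,
# so after x the output is an empty line followed by "a" (2 lines).
lines=$(echo a | sed 'H;x' | wc -l)

# Use h for line 1 and H for the rest, so the hold space never
# begins with a newline. On the last line, fetch the collected
# text and join it with commas.
joined=$(printf 'a\nb\nc\n' | sed -n '1h;1!H;${x;s/\n/,/g;p;}')

printf '%s\n' "$lines" "$joined"
```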

   (6) If grouping is followed by an '*' or '+' operator, HHsed does
   not match the pattern, but issues no warning. See below:

       echo aaa | HHsed "/\(a\)*/d"      # nothing is deleted
       echo aaa | HHsed "/\(a\)+/d"      # nothing is deleted
       echo aaa | HHsed "s/\(a\)*/\1B/"  # nothing is changed
       echo aaa | HHsed "s/\(a\)+/\1B/"  # nothing is changed

   (7) If grouping is followed by an interval expression, HHsed halts
   with the error message "garbled command", in all of the following
   examples:

       echo aaa | HHsed "/\(a\)\{3\}/d"
       echo aaa | HHsed "/\(a\)\{1,5\}/d"
       echo aaa | HHsed "s/\(a\)\{3\}/\1B/"

   (8) In interval expressions, 0 is not supported. E.g., \{0,3\}

7.10. sedmod v1.0 (by Hern Chen)

   Technically, the following are limits (or features?) of sedmod, not
   bugs, since the docs for sedmod do not claim to support these
   missing features.

   (1) sedmod does not support standard interval expressions \{...\}
   present in nearly all versions of sed.

   (2) If grouping is followed by an '*' or '+' operator, sedmod gives
   a "garbled command" message. However, if the grouped expressions
   are string literals with no metacharacters, a partial workaround
   can be done like so:

       \(string\)\1*    # matches 1 or more instances of 'string'
       \(string\)\1+    # matches 2 or more instances of 'string'
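
   The back-reference trick is not specific to sedmod. The sketch
   below tries it with a standard sed, using only the * operator; the
   sample strings are invented.

```shell
# \(ab\)\1* : one or more copies of "ab"
one=$(echo 'abababX' | sed 's/\(ab\)\1*/Y/')

# \(ab\)\1\1* : two or more copies of "ab"; the lone "ab" is skipped
two=$(echo 'abX ababX' | sed 's/\(ab\)\1\1*/Y/')

printf '%s\n' "$one" "$two"
```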

   (3) sedmod does not support a numeric argument after the s///
   command, as in 's/a/b/3', present in nearly all versions of sed.

   The following are bugs in sedmod v1.0:

   (4) When the -i (ignore case) switch is used, the '/regex/d'
   command is not properly obeyed. Sedmod may miss one or more lines
   matching the expression, regardless of where they occur in the
   script. Workaround: use "/regex/{d;}" instead.

7.11. HP-UX sed

   (1) Versions of HP-UX sed up to and including version 10.20 are
   buggy. According to the README file, which comes with the GNU cc
   at <ftp://ftp.ntua.gr/pub/gnu/sed/sed-2.05.bin.README>:

   "When building gcc on a hppa*-*-hpux10 platform, the `fixincludes'
   step (which involves running a sed script) fails because of a bug
   in the vendor's implementation of sed.  Currently the only known
   workaround is to install GNU sed before building gcc.  The file
   sed-2.05.bin.hpux10 is a precompiled binary for that platform."

7.12. SunOS sed v4.1

   (1) Bug occurs in RE pattern matching when a non-null '[char-set]*'
   is followed by a null '\NUM' pattern recall, illustrated here and
   reported by Greg Ubben:

       s/\(a\)\(b*\)cd\1[0-9]*\2foo/bar/  # between '[0-9]*' and '\2'
       s/\(a\{0,1\}\).\{0,1\}\1/bar/      # between '.\{0,1\}' and '\1'

   Workaround: add a do-nothing 'X*' expression which will not match
   any characters on the line between the two components. E.g.,

       s/\(a\)\(b*\)cd\1[0-9]*X*\2foo/bar/
       s/\(a\{0,1\}\).\{0,1\}X*\1/bar/

7.13. SunOS sed v5.6

   (1) If grouping is followed by an asterisk, SunOS sed does not match
   the null string, which it should do. The following command:

       echo foo | sed 's/f\(NO-MATCH\)*/g\1/'

   should transform "foo" to "goo" under normal versions of sed.

7.14. Ultrix sed v4.3

   (1) If grouping is followed by an asterisk, Ultrix sed replies with
   "command garbled", as shown in the following example:

       echo foo | sed 's/f\(NO-MATCH\)*/g\1/'

   (2) If grouping is followed by a numeric operator such as \{0,9\},
   Ultrix sed does not find the match.

7.15. Digital Unix sed

   (1) The following comes from the man pages for sed distributed with
   new, 1998 versions of Digital Unix (reformatted to fit our
   margins):

   [Digital]  The h subcommand for sed does not work properly.  When
   you use the  h subcommand to place text into the hold area, only
   the last line of the specified text is saved.  You can use the H
   subcommand to append text to the hold area. The H subcommand and
   all others dealing with the hold area work correctly.

   (2) "$d" command issues an error message, "cannot parse".  Reported
   by Carlos Duarte on 8 June 1998.

[end-of-file]