Archive-Name: editor-faq/sed
Posting-Frequency: irregular
Last-modified: 10 March 2003
Version: 015
URL: http://sed.sourceforge.net/sedfaq.html
Maintainer: Eric Pement (pemente@northpark.edu)

THE SED FAQ

Frequently Asked Questions about
sed, the stream editor

CONTENTS

1. GENERAL INFORMATION
1.1. Introduction - How this FAQ is organized
1.2. Latest version of the sed FAQ
1.3. FAQ revision information
1.4. How do I add a question/answer to the sed FAQ?
1.5. FAQ abbreviations
1.6. Credits and acknowledgements
1.7. Standard disclaimers

2. BASIC SED
2.1. What is sed?
2.2. What versions of sed are there, and where can I get them?

  2.2.1. Free versions

    2.2.1.1. Unix platforms
    2.2.1.2. OS/2
    2.2.1.3. Microsoft Windows (Win3x, Win9x, WinNT, Win2K)
    2.2.1.4. MS-DOS
    2.2.1.5. CP/M
    2.2.1.6. Macintosh v8 or v9

  2.2.2. Shareware and Commercial versions

    2.2.2.1. Unix platforms
    2.2.2.2. OS/2
    2.2.2.3. Windows 95/98, Windows NT, Windows 2000
    2.2.2.4. MS-DOS

2.3. Where can I learn to use sed?

  2.3.1. Books
  2.3.2. Mailing list
  2.3.3. Tutorials, electronic text
  2.3.4. General web and ftp sites

3. TECHNICAL
3.1. More detailed explanation of basic sed
  3.1.1. Regular expressions on the left side of "s///"
  3.1.2. Escape characters on the right side of "s///"
  3.1.3. Substitution switches
3.2. Common one-line sed scripts. How do I . . . ?

      - double/triple-space a file?
      - convert DOS/Unix newlines?
      - delete leading/trailing spaces?
      - do substitutions on all/certain lines?
      - delete consecutive blank lines?
      - delete blank lines at the top/end of the file?

3.3. Addressing and address ranges
3.4. Address ranges in GNU sed and HHsed
3.5. Debugging sed scripts
3.6. Notes about s2p, the sed-to-perl translator
3.7. GNU/POSIX extensions to regular expressions

4. EXAMPLES
ONE-CHARACTER QUESTIONS
4.1. How do I insert a newline into the RHS of a substitution?
4.2. How do I represent control-codes or non-printable characters?
4.3. How do I convert files with toggle characters, like +this+,
     to look like [i]this[/i]?

CHANGING STRINGS
4.10. How do I perform a case-insensitive search?
4.11. How do I match only the first occurrence of a pattern?
4.12. How do I parse a comma-delimited (CSV) data file?
4.13. How do I handle fixed-length, columnar data?
4.14. How do I commify a string of numbers?
4.15. How do I prevent regex expansion on substitutions?
4.16. How do I convert a string to all lowercase or capital letters?

CHANGING BLOCKS (consecutive lines)
4.20. How do I change only one section of a file?
4.21. How do I delete or change a block of text if the block contains
      a certain regular expression?
4.22. How do I locate a paragraph of text if the paragraph contains a
      certain regular expression?
4.23. How do I match a block of specific consecutive lines?
  4.23.1. Try to use a "/range/, /expression/"
  4.23.2. Try to use a "multi-line\nexpression"
  4.23.3. Try to use a block of "literal strings"
4.24. How do I address all the lines between RE1 and RE2, excluding the lines themselves?
4.25. How do I join two lines if line #1 ends in a [certain string]?
4.26. How do I join two lines if line #2 begins in a [certain string]?
4.27. How do I change all paragraphs to long lines?

SHELL AND ENVIRONMENT
4.30. How do I read environment variables with sed ...
  4.31.1. ... on Unix platforms?
  4.31.2. ... on MS-DOS or 4DOS platforms?
4.32. How do I export or pass variables back into the environment ...
  4.32.1. ... on Unix platforms?
  4.32.2. ... on MS-DOS or 4DOS platforms?
4.33. How do I handle shell quoting in sed?

FILES, DIRECTORIES, AND PATHS
4.40. How do I read (insert/add) a file at the top of a textfile?
4.41. How do I make substitutions in every file in a directory, or
      in a complete directory tree?
  4.41.1. ... ssed solution
  4.41.2. ... Unix solution
  4.41.3. ... DOS solution
4.42. How do I replace "/some/UNIX/path" in a substitution?
4.43. How do I replace "C:\SOME\DOS\PATH" in a substitution?
4.44. How do I emulate file-includes, using sed?

5. WHY ISN'T THIS WORKING?
5.1. Why don't my variables like $var get expanded in my sed script?
5.2. I'm using 'p' to print, but I have duplicate lines sometimes.
5.3. Why does my DOS version of sed process a file part-way through
     and then quit?
5.4. My RE isn't matching/deleting what I want it to. (Or, "Greedy vs.
     stingy pattern matching")
5.5. What is CSDPMI*B.ZIP and why do I need it?
5.6. Where are the man pages for GNU sed?
5.7. How do I tell what version of sed I am using?
5.8. Does sed issue an exit code?
5.9. The 'r' command isn't inserting the file into the text.
5.10. Why can't I match or delete a newline using the \n escape
      sequence? Why can't I match 2 or more lines using \n?
5.11. My script aborts with an error message, "event not found".

6. OTHER ISSUES
6.1. I have a problem that stumps me. Where can I get help?
6.2. How does sed compare with awk, perl, and other utilities?
6.3. When should I use sed?
6.4. When should I NOT use sed?
6.5. When should I ignore sed and use Awk or Perl instead?
6.6. Known limitations among sed versions
6.7. Known incompatibilities between sed versions

  6.7.1. Issuing commands from the command line
  6.7.2. Using comments (prefixed by the '#' sign)
  6.7.3. Special syntax in REs
  6.7.4. Word boundaries
  6.7.5. Commands which operate differently

7. KNOWN BUGS AMONG SED VERSIONS
7.1. ssed v3.59
7.2. GNU sed v4.0 - v4.0.5
7.3. GNU sed v3.02.80
7.4. GNU sed v3.02
7.5. GNU sed v2.05
7.6. GNU sed v1.18
7.7. GNU sed v1.03
7.8. sed v1.6 (Briscoe)
7.9. sed v1.5 (Helman)
7.10. sedmod v1.0 (Chen)
7.11. HP-UX sed
7.12. SunOS sed v4.1
7.13. SunOS sed v5.6
7.14. Ultrix sed v4.3
7.15. Digital Unix sed


------------------------------

1. GENERAL INFORMATION

1.1. Introduction - How this FAQ is organized

This FAQ is organized to answer common (and some uncommon)
questions about sed, quickly. If you see a term or abbreviation in
the examples that seems unclear, see if the term is defined in
section 1.5. If not, send your comment to pemente[at]northpark.edu.

1.2. Latest version of the sed FAQ

The newest version of the sed FAQ is usually here:

    http://sed.sourceforge.net/sedfaq.html (HTML version)
    http://sed.sourceforge.net/sedfaq.txt (plain text)
    http://www.student.northpark.edu/pemente/sed/sedfaq.html
    http://www.student.northpark.edu/pemente/sed/sedfaq.txt
    http://www.faqs.org/faqs/editor-faq/sed
    ftp://rtfm.mit.edu/pub/faqs/editor-faq/sed

Another FAQ file on sed by a different author can be found here:

    http://www.dreamwvr.com/sed-info/sed-faq.html

1.3. FAQ revision information

In the plaintext version, changes are shown by a vertical bar (|)
placed in column 78 of the affected lines. To remove the vertical
bars (use double quotes for MS-DOS):

    sed 's/ *|$//' sedfaq.txt > sedfaq2.txt
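
As a rough illustration of what that command does, here is a small sketch run on two made-up lines rather than on the FAQ itself. The sample text and the variable names are assumptions chosen for the example: one line ends in padding spaces and a bar, the other has no bar.

```shell
# Hypothetical sample: the first line carries a change bar, the second does not.
sample=$(printf 'A changed line          |\nAn unchanged line\n')

# Strip any run of trailing spaces plus the bar, but only at end of line.
stripped=$(printf '%s\n' "$sample" | sed 's/ *|$//')
printf '%s\n' "$stripped"
```

Because the pattern is anchored with `$`, a vertical bar elsewhere in a line is left alone.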

In the HTML version, vertical bars do not appear. New or altered
portions of the FAQ are indicated by printing in dark blue type.

In the text version, words needing emphasis may be surrounded by
the underscore '_' or the asterisk '*'. In the HTML version, these
are changed to italics and boldface, respectively.

1.4. How do I add a question/answer to the sed FAQ?

Word your question briefly and send it to pemente[at]northpark.edu,
indicating your proposed change. We'll post it on the sed-users
mailing list (see section 2.3.2) and discuss it there. If it's
good, your contribution will be added to the next edition.

1.5. FAQ abbreviations

    files = one or more filenames, separated by whitespace
    gsed  = GNU sed
    ssed  = super-sed
    RE    = Regular Expressions supported by sed
    LHS   = the left-hand side ("find" part) of "s/find/repl/" command
    RHS   = the right-hand side ("replace" part) of "s/find/repl/" cmd
    nn+   = version _nn_ or higher (e.g., "15+" = version 1.5 and above)

files: "files" stands for one or more filenames entered on the
command line. The names may include any wildcards your shell
understands (such as ``zork*'' or ``Aug[4-9].let''). Sed will
process each filename passed to it by the shell.
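
The point about wildcards can be sketched as follows. The scratch directory, the file names zork1.txt and zork2.txt, and their contents are all invented for the example, and a Unix-style shell with `mktemp` is assumed.

```shell
# Work in a scratch directory so the wildcard matches only our sample files.
workdir=$(mktemp -d)
printf 'one\n' > "$workdir/zork1.txt"
printf 'two\n' > "$workdir/zork2.txt"

# The shell expands zork* into both names before sed starts;
# sed then reads the named files in order, as one continuous stream.
combined=$(sed 's/^/> /' "$workdir"/zork*)
printf '%s\n' "$combined"
rm -r "$workdir"
```

Sed itself never sees the asterisk here; it only receives the list of names the shell produced.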

RE: For details on regular expressions, see section 3.1.1., below.

1.6. Credits and acknowledgements

Many of the ideas for this FAQ were taken from the Awk FAQ:
    http://www.faqs.org/faqs/computer-lang/awk/faq/
    ftp://rtfm.mit.edu/pub/usenet/comp.lang.awk/faq

and from the old Perl FAQ:
    http://www.perl.com/doc/FAQs/FAQ/oldfaq-html/index.html

The following individuals have contributed significantly to this
document, and have provided input and wording suggestions for
questions, answers, and script examples. Credit goes to these
contributors (in alphabetical order by last name):

Al Aab, Yiorgos Adamopoulos, Paolo Bonzini, Walter Briscoe, Jim
Dennis, Carlos Duarte, Otavio Exel, Sven Guckes, Aurelio Jargas,
Mark Katz, Toby Kelsey, Eric Pement, Greg Pfeiffer, Ken Pizzini,
Niall Smart, Simon Taylor, Peter Tillier, Greg Ubben, Laurent
Vogel.

1.7. Standard disclaimers

While a serious attempt has been made to ensure the accuracy of the
information presented herein, the contributors and maintainers of
this document do not claim the absence of errors and make no
warranties on the information provided. If you notice any mistakes,
please let us know so we can fix them.

------------------------------

2. BASIC SED

2.1. What is sed?

"sed" stands for Stream EDitor. Sed is a non-interactive editor,
written by the late Lee E. McMahon in 1973 or 1974. A brief history
of sed's origins may be found in an early history of the Unix
tools, at <http://www.columbia.edu/~rh120/ch106.x09>.

Instead of altering a file interactively by moving the cursor on
the screen (as with a word processor), the user sends a script of
editing instructions to sed, plus the name of the file to edit (or
the text to be edited may come as output from a pipe). In this
sense, sed works like a filter -- deleting, inserting and changing
characters, words, and lines of text. Its range of activity goes
from small, simple changes to very complex ones.

Sed reads its input from stdin (Unix shorthand for "standard
input," i.e., the console) or from files (or both), and sends the
results to stdout ("standard output," normally the console or
screen). Most people use sed first for its substitution features.
Sed is often used as a find-and-replace tool.

    sed 's/Glenn/Harold/g' oldfile >newfile

will replace every occurrence of "Glenn" with the word "Harold",
wherever it occurs in the file. The "find" portion is a regular
expression ("RE"), which can be a simple word or may contain
special characters to allow greater flexibility (for example, to
prevent "Glenn" from also matching "Glennon").
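
The "Glennon" problem can be sketched like this. The sample sentence is invented, and the stricter expression is only one possible approach using basic REs; it does not guard against letters immediately *before* "Glenn", and sed versions with word-boundary operators can express the same idea more compactly.

```shell
line='Glenn met Glennon and Glenn'

# Plain substitution also rewrites the start of "Glennon".
naive=$(printf '%s\n' "$line" | sed 's/Glenn/Harold/g')

# Stricter version: match "Glenn" at end of line, or "Glenn"
# followed by a character that cannot continue a word.
strict=$(printf '%s\n' "$line" |
  sed -e 's/Glenn$/Harold/' -e 's/Glenn\([^[:alnum:]_]\)/Harold\1/g')

printf '%s\n%s\n' "$naive" "$strict"
```

The first result contains "Haroldon"; the second leaves "Glennon" untouched while still changing both standalone names.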

My very first use of sed was to add 8 spaces to the left side of a
file, so when I printed it, the printing wouldn't begin at the
absolute left edge of a piece of paper.

    sed 's/^/        /' myfile >newfile   # my first sed script
    sed 's/^/        /' myfile | lp       # my next sed script

Then I learned that sed could display only one paragraph of a file,
beginning at the phrase "and where it came" and ending at the
phrase "for all people". My script looked like this:

    sed -n '/and where it came/,/for all people/p' myfile
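
To make the behavior of that range concrete, here is a sketch on five invented lines standing in for "myfile". Only the two phrases come from the text above; everything else is an assumption for the example.

```shell
text=$(printf '%s\n' 'intro line' 'and where it came from' 'middle line' \
  'for all people here' 'closing line')

# Print from the first line matching the start RE
# through the next line matching the end RE, inclusive.
para=$(printf '%s\n' "$text" | sed -n '/and where it came/,/for all people/p')
printf '%s\n' "$para"
```

Both boundary lines are printed along with everything between them; the lines before and after the range are suppressed.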

Sed's normal behavior is to print (i.e., display or show on screen)
the entire file, including the parts that haven't been altered,
unless you use the -n switch. The "-n" stands for "no output". This
switch is almost always used in conjunction with a 'p' command
somewhere, which says to print only the sections of the file that
have been specified. The -n switch with the 'p' command allows
parts of a file to be printed (i.e., sent to the console).
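
The difference the -n switch makes can be seen in a minimal sketch on a three-line input (the letters a, b, c are placeholders, not anything from the FAQ):

```shell
input=$(printf '%s\n' a b c)

# Without -n, sed prints every line, and 'p' prints line 2 a second time.
with_default=$(printf '%s\n' "$input" | sed '2p')

# With -n, automatic printing is off, so only the 'p' output remains.
with_n=$(printf '%s\n' "$input" | sed -n '2p')

printf '%s\n---\n%s\n' "$with_default" "$with_n"
```

Forgetting -n while using 'p' is the usual cause of unexpectedly duplicated lines.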

Next, I found that sed could show me only (say) lines 12-18 of a
file and not show me the rest. This was very handy when I needed to
review only part of a long file and I didn't want to alter it.

    # the 'p' stands for print
    sed -n 12,18p myfile

Likewise, sed could show me everything else BUT those particular
lines, without physically changing the file on the disk:

    # the 'd' stands for delete
    sed 12,18d myfile
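
The two commands are complements of each other, which a short sketch can show. A six-line input and the range 2,4 are used here instead of 12,18 purely to keep the example small.

```shell
numbers=$(printf '%s\n' 1 2 3 4 5 6)

# Show only lines 2 through 4.
shown=$(printf '%s\n' "$numbers" | sed -n '2,4p')

# Show everything except lines 2 through 4.
hidden=$(printf '%s\n' "$numbers" | sed '2,4d')

printf '%s\n---\n%s\n' "$shown" "$hidden"
```

In both cases the input is only read, never modified; the result goes to standard output.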

Sed could also double-space my single-spaced file when it came time
to print it:

    sed G myfile >newfile
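
The reason this works is that the G command appends a newline, followed by the contents of the hold space, to each line; the hold space is empty here, so the net effect is an extra blank line. A small sketch on two invented lines:

```shell
# Each input line gains a trailing blank line.
doubled=$(printf 'first\nsecond\n' | sed G)
printf '%s\n' "$doubled"

# Two input lines become four output lines.
line_count=$(printf 'first\nsecond\n' | sed G | wc -l | tr -d ' ')
printf 'lines: %s\n' "$line_count"
```

Note that the shell's command substitution drops the final blank line when storing the result in a variable, which is why the line count is taken from the pipeline directly.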

If you have many editing commands (for deleting, adding,
substituting, etc.) which might take up several lines, those
commands can be put into a separate file and all of the commands in
the file applied to the file being edited:

    # 'script.sed' is the file of commands
    # 'myfile' is the file being changed
    sed -f script.sed myfile
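
A minimal sketch of the script-file approach follows. The two commands placed in script.sed and the contents of myfile are made up for the example, and a Unix-style shell with `mktemp` is assumed.

```shell
workdir=$(mktemp -d)

# script.sed holds one sed command per line:
# delete empty lines, then replace every "Glenn" with "Harold".
cat > "$workdir/script.sed" <<'EOF'
/^$/d
s/Glenn/Harold/g
EOF

printf 'Glenn was here\n\nGlenn left\n' > "$workdir/myfile"

# Apply all commands in the script to each line of myfile.
edited=$(sed -f "$workdir/script.sed" "$workdir/myfile")
printf '%s\n' "$edited"
rm -r "$workdir"
```

Keeping the commands in a file also avoids most shell-quoting problems, since the shell never interprets the script's contents.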

It is not our intention to convert this FAQ file into a full-blown
sed tutorial (for good tutorials, see section 2.3). Rather, we hope
this gives the complete novice a few ideas of how sed can be used.

2.2. What versions of sed are there, and where can I get them?

2.2.1. Free versions

Note: "Free" does not mean "public domain" nor does it necessarily
mean you will never be charged for it. All versions of sed in this
section except the CP/M versions are distributed under the GNU
General Public License and are "free software" by that standard (for
details, see http://www.gnu.org/philosophy/free-sw.html). This
means you can get the source code and develop it further.

At the URLs listed in this category, sed binaries or source code
can be downloaded and used without fees or license payments.

2.2.1.1. Unix platforms

ssed v3.60
ssed is the version recommended by the FAQ maintainers, since it
shares the same codebase with GNU sed, has the most options, and is
free software (you can get the source). Although earlier versions
of ssed were distributed, sites for them are not listed here.

    http://sed.sourceforge.net/grabbag/ssed
    http://freshmeat.net/project/sed/

GNU sed v4.0.5
This is the latest official version of GNU sed. It offers in-place
text replacement as an option switch.

    ftp://ftp.gnu.org/pub/gnu/sed/sed-4.0.5.tar.gz
    http://freshmeat.net/project/sed
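
In GNU sed 4.x the in-place switch is -i, optionally followed directly by a backup suffix. The sketch below assumes such a sed is installed as "sed"; older versions and many vendor seds do not have this switch. The file name notes.txt and its contents are invented for the example.

```shell
workdir=$(mktemp -d)
printf 'Glenn was here\n' > "$workdir/notes.txt"

# -i edits the file itself; the attached suffix keeps a backup copy
# of the original as notes.txt.bak.
sed -i.bak 's/Glenn/Harold/' "$workdir/notes.txt"

edited=$(cat "$workdir/notes.txt")
backup=$(cat "$workdir/notes.txt.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"
rm -r "$workdir"
```

With versions of sed that lack this switch, the usual substitute is to write to a new file and then rename it over the original.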

BSD multi-byte sed (Japanese)
Based on the latest version of GNU sed, which supports multi-byte
characters.

    ftp://ftp1.freebsd.org/pub/FreeBSD/FreeBSD-stable/packages/Latest/ja-sed.tgz

GNU sed v3.02.80
An alpha test release which was the base for the development of
ssed and GNU sed v4.0.

    ftp://alpha.gnu.org/pub/gnu/sed/sed-3.02.80.tar.gz

GNU sed v3.02a
Interim version with most features of GNU sed v3.02.80.

GNU sed v3.02
    ftp://ftp.gnu.org/pub/gnu/sed/sed-3.02.tar.gz

Precompiled versions:

GNU sed v3.02-8
source code and binaries for Debian GNU/Linux

    http://www.debian.org/Packages/stable/base/sed.html

For some time, the GNU project <http://www.gnu.org> used Eric S.
Raymond's version of sed (ESR sed v1.1), but eventually dropped it
because it had too many built-in limits. In 1991 Howard Helman
modified the GNU/ESR sed and produced a flexible version, sed v1.5,
available at several sites (Helman's version permitted things
like \<...\> to delimit word boundaries, \xHH to enter hex code and
\n to indicate newlines in the replace string). This version did
not catch on with the GNU project and their version of sed has
moved in a similar but different direction.

sed v1.3, by Eric Steven Raymond (released 4 June 1998)
    http://catb.org/~esr/sed-1.3.tar.gz

Eric Raymond <esr@snark.thyrsus.com> wrote one of the earliest
versions of sed. On his website <http://www.catb.org/~esr/>, which
also distributes many freeware utilities he has written or worked
on, he describes sed v1.1 this way:

"This is the fast, small sed originally distributed in the GNU
toolkit and still distributed with Minix. The GNU people ditched it
when they built their own sed around an enhanced regex package --
but it's still better for some uses (in particular, faster and less
memory-intensive)." (Version 1.3 fixes an unidentified bug and adds
the L command to hexdump the current pattern space.)

2.2.1.2. OS/2

GNU sed v3.02.80
    http://www2s.biglobe.ne.jp/~vtgf3mpr/gnu/sed.htm

GNU sed v3.02
    http://hobbes.nmsu.edu/pub/os2/util/file/sed-3_02-r2-bin.zip # binaries
    http://hobbes.nmsu.edu/pub/os2/util/file/sed-3_02-r2.zip # source

2.2.1.3. Microsoft Windows (Win3x, Win9x, WinNT, Win2K)

GNU sed v4.0.5
32-bit binaries and docs. Precompiled versions not available (yet).

GNU sed v3.02.80
32-bit binaries and docs, using DJGPP compiler. For details on new
features, see Unix section, above.

    http://www.student.northpark.edu/pemente/sed/sed3028a.zip # DOS binaries
    ftp://alpha.gnu.org/pub/gnu/sed/sed-3.02.80.tar.gz # source
    ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028b.zip # binaries
    ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028d.zip # docs
    ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed3028s.zip # source

GNU sed v2.05
32-bit binaries, no docs. Requires 80386 DX (SX will not run) and
must be run in a DOS window or in a full screen DOS session under
Microsoft Windows. Will not run in MS-DOS mode (outside Win/Win95).
We recommend using the latest version of GNU sed.
    http://www.simtel.net/pub/win95/prog/gsed205b.zip
    ftp://ftp.cdrom.com/pub/simtelnet/win95/prog/gsed205b.zip

GNU sed v1.03
modified by Frank Whaley.

This version was part of the "Virtually UN*X" toolset, hosted by
itribe.net; that website is now closed. Gsed v1.03 supported Win9x
long filenames, as well as hex, decimal, binary, and octal
character representations.

The Cygwin toolkit:
    http://www.cygwin.com

Formerly known as "GNU-Win32 tools." According to their home page,
"The Cygwin tools are Win32 ports of the popular GNU development
tools for Windows NT, 95 and 98. They function through the use of
the Cygwin library which provides a UNIX-like API on top of the
Win32 API." The version of sed used is GNU sed v3.02.

Minimalist GNU for Windows (MinGW):
    http://www.mingw.org
    http://mingw.sourceforge.net

According to their home page, "MinGW ('Minimalist GNU for Windows')
refers to a set of runtime headers, used in building a compiler
system based on the GNU GCC and binutils projects. It compiles and
links code to be run on Win32 platforms ... MinGW uses Microsoft
runtime libraries, distributed with the Windows operating system."
The version of sed used is GNU sed v3.02.

sed v1.5 (a/k/a HHsed), by Howard Helman
Compiled with Mingw32 for 32-bit environments described above. This
version should support Win95 long filenames.
    http://www.dbnet.ece.ntua.gr/~george/sed/OLD/sed15.exe
    http://www.student.northpark.edu/pemente/sed/sed15exe.zip

2.2.1.4. MS-DOS

sed v1.6 (from HHsed), by Walter Briscoe

This is a forthcoming version, now in beta testing, but with many
new features. It corrects all the bugs in sed v1.5, and adds the
best features of sedmod v1.0 (below). It is available in 16-bit and
32-bit compiled versions for MS-DOS. Sorry, no URLs available yet.

sed v1.5 (a/k/a HHsed), by Howard Helman
uncompiled source code (Turbo C)
    ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15.zip
    ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15.zip

DOS executable and documentation
    ftp://ftp.simtel.net/pub/simtelnet/msdos/txtutl/sed15x.zip
    ftp://ftp.cdrom.com/pub/simtelnet/msdos/txtutl/sed15x.zip

sedmod v1.0, by Hern Chen
    http://www.ptug.org/sed/SEDMOD10.ZIP
    http://www.student.northpark.edu/pemente/sed/sedmod10.zip
    ftp://garbo.uwasa.fi/pc/unix/sedmod10.zip

GNU sed v3.02.80
See section 2.2.1.3 ("Microsoft Windows"), above.

GNU sed v2.05
Does not run under MS-DOS.

GNU sed v1.18
32-bit binaries and source, using DJGPP compiler. Requires 80386 SX
or better. Also requires 3 CWS*.EXE extenders on the path. See
section 5.5 ("What is CSDPMI*B.ZIP and why do I need it?"), below.
We recommend using a newer version of GNU sed.
    http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
    ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118b.zip
    http://www.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip
    ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2gnu/sed118s.zip

GNU sed v1.06
16-bit binaries and source. Should run under any MS-DOS system.
    http://www.simtel.net/pub/gnu/gnuish/sed106.zip
    ftp://ftp.cdrom.com/pub/simtelnet/gnu/gnuish/sed106.zip

2.2.1.5. CP/M

ssed v2.2, by Chuck A. Forsberg

Written for CP/M, ssed (for "small/stupid stream editor") supports
only the a(ppend), c(hange), d(elete) and i(nsert) options, and
apparently doesn't support regular expressions. A -u switch will
"unsqueeze" compressed files and was used mainly in conjunction
with DIF.COM for source code maintenance. (file: ssed22.lbr)

change, by Michael M. Rubenstein

Rubenstein released a version of sed called CHANGE.COM (the
TTOOLS.LBR archive member CHANGE.CZM is a "crunched" file).
CHANGE.COM supports full RE's except grouping and backreferences,
and its only function is global substitution. (file: ttools.lbr)

2.2.1.6. Macintosh v8 or v9

Since sed is a command-line utility, it is not customary to think
of sed being used on a Mac. Nonetheless, the following instructions
from Aurelio Jargas describe the process for running sed on MacOS
version 8 or 9.

(1) Download and install the Apple DiskCopy application

    ftp://ftp.apple.com/developer/Development_Kits

(2) Download and install Apple MPW

    ftp://ftp.apple.com/developer/Tool_Chest/Core_Mac_OS_Tools/MPW_etc./

(3) Download and expand Matthias Neeracher's GNU sed for MPW. (They
seem to have misnumbered the sed filename.)

    ftp://sunsite.cnlab-switch.ch/software/platform/macos/src/mpw_c/sed-2.03.sit.bin

(4) Enter the sed-3.02 directory and doubleclick the 'sed' file

(5) MPW Shell will open up. It will be a command window instead of
a command line, but sed should work as expected. For example:

    echo aa | sed 's/a/Z/g'<ENTER>

Note that ENTER is different from RETURN on an iMac. Apple *also*
has its own version of sed on MPW, called "StreamEdit", with a
syntax fairly similar to that of normal sed.

2.2.2. Shareware and Commercial versions

2.2.2.1. Unix platforms

[ Additional information needed. ]

2.2.2.2. OS/2

Hamilton Labs:
    http://www.hamiltonlabs.com/cshell.htm

A sizable set of Unix/C shell utilities designed for OS/2. Price is
$350 in the US, $395 elsewhere, with FedEx shipping, unconditional
guarantee, unlimited support and free updates. A demo version of
the suite can be downloaded from this site, but a stand-alone copy
of sed is not available.

2.2.2.3. Windows 95/98, Windows NT, Windows 2000

Hamilton Labs:
    http://www.hamiltonlabs.com/cshell.htm

A sizable set of Unix/C shell utilities designed for Win9x, WinNT,
and Win2K. Price is $350 in the US, $395 elsewhere, with FedEx
shipping, unconditional guarantee, unlimited support and free
updates. A demo version of the suite can be downloaded from this
site, but a stand-alone copy of sed is not available.

Interix:
    http://www.interix.com

Interix (formerly known as OpenNT) is advertised as "a complete
UNIX system environment running natively on Microsoft Windows NT",
and is licensed and supported by Softway Systems. It offers over
200 Unix utilities, and supports Unix shells, sockets, networking,
and more. A single-user edition runs about $200. A free demo or
evaluation copy will run for 31 days and then quit; to continue
using it, you must purchase the commercial version.

MKS NuTCRACKER Professional
    http://www.datafocus.com/products/nutc/

A different, yet related product line offered by MKS (Mortice Kern
Systems, below); the awkward spelling "NuTCRACKER" is intentional.
Various packages offer hundreds of Unix utilities for Win32
environments. Sed is not available as a separate product.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
UnixDos:
|
|
|
d0cde9 |
http://www.unixdos.com
|
|
|
d0cde9 |
|
|
|
d0cde9 |
UnixDos is a suite of 82 Unix utilities ported over to the Windows
|
|
|
d0cde9 |
environments. There are 16-bit versions for Win3.x and 32-bit
|
|
|
d0cde9 |
versions for WinNT/Win95. It is distributed as uncrippled shareware
|
|
|
d0cde9 |
for the first 30 days. After the test period, the utilities will
|
|
|
d0cde9 |
not run and you must pay the registration fee of $50.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Their version of sed supports "\n" in the RHS of expressions, and
|
|
|
d0cde9 |
increases the length of input lines to 10,000 characters. By
|
|
|
d0cde9 |
special arrangement with the owners, persons who want a licensed
|
|
|
d0cde9 |
version of sed *only* (without the other utilities) may pay a
|
|
|
d0cde9 |
license fee of $10.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
U/WIN:
|
|
|
d0cde9 |
http://www.research.att.com/sw/tools/uwin/
|
|
|
d0cde9 |
|
|
|
d0cde9 |
U/WIN is a suite of Unix utilities created for WinNT and Win95
|
|
|
d0cde9 |
systems. It is owned by AT&T, created by David Korn (author of the
|
|
|
d0cde9 |
Unix korn shell), and is freely distributed only to educational
|
|
|
d0cde9 |
institutions, AT&T employees, or certain researchers; all others
|
|
|
d0cde9 |
must pay a fee after a 90-day evaluation period expires. U/WIN
|
|
|
d0cde9 |
operates best with the NTFS (WinNT file system) but will run in
|
|
|
d0cde9 |
degraded mode with the FAT file system and in further degraded mode
|
|
|
d0cde9 |
under Win95. A minimal installation takes about 25 to 30 megs of
|
|
|
d0cde9 |
disk space. Sed is not available as a separate file for download,
|
|
|
d0cde9 |
but comes with the suite.
|
|
|
d0cde9 |
|
2.2.2.4. MS-DOS

Mix C/Utilities Toolchest
http://www.mixsoftware.com/product/utility.htm

According to their web page, "The C/Utilities Toolchest adds over
40 powerful UNIX utilities to your MS-DOS operating system. The
result is an environment very similar to UNIX operating systems,
yet 100% compatible with MS-DOS programs and commands." The
toolchest costs $19.95, with source code available for an
additional fee. Mix C's version of sed is not available separately.

MKS (Mortice Kern Systems) Toolkit
http://www.mks.com

Sed comes bundled with the MKS Toolkit, which is distributed only
as commercial software; it is not available separately.

Thompson Automation Software
http://www.tasoft.com

The Thompson Toolkit contains over 100 familiar Unix utilities,
including a version of the Unix Korn shell. It runs under MS-DOS,
OS/2, Win3.x, Win9x, and WinNT. Sed is one of the utilities, though
Thompson is better known for its version of awk for DOS, TAWK. The
toolkit runs about $150; sed is not available separately.
|
2.3. Where can I learn to use sed?

2.3.1. Books

_Sed & Awk, 2d edition_, by Dale Dougherty & Arnold Robbins
(Sebastopol, Calif: O'Reilly and Associates, 1997)
ISBN 1-56592-225-5
http://www.oreilly.com/catalog/sed2/noframes.html

About 40 percent of this book is devoted to sed, and maybe 50
percent is devoted to awk. The other 10 percent covers regexes and
concepts common to both tools. If you prefer hard copy, this is
definitely the best single place to learn to use sed, including its
advanced features.

The first edition is also very useful. Several typos crept into the
first printing of the first edition (though if you follow the
tutorials closely, you'll recognize them right away). A list of
errors from the first printing of _sed & awk_ is available at
<http://www.cs.colostate.edu/~dzubera/sedawk.txt>, and errors in
the second edition are listed at
<http://www.cs.colostate.edu/~dzubera/sedawk2.txt>,
though most of these were corrected in later printings. The second
edition tells how POSIX standards have affected these tools and
covers the popular GNU versions of sed and awk. Price is about (US)
$30.00.

-----

_Mastering Regular Expressions, 2d ed.,_ by Jeffrey E. F. Friedl
(Sebastopol, Calif: O'Reilly and Associates, 2002)
ISBN 0-596-00289-0
http://regex.info
http://www.oreilly.com/catalog/regex2/
http://public.yahoo.com/~jfriedl/regex/ (for the first edition)

Knowing how to use "regular expressions" is essential to effective
use of most Unix tools. This book focuses on how regular
expressions can be best implemented in utilities such as perl, vi,
emacs, and awk, but also touches on sed. Friedl's home page
(above) gives links to other sites which help students learn to
master regular expressions. His site also gives a Perl script for
determining a syntactically valid e-mail address, using regexes:

http://public.yahoo.com/~jfriedl/regex/code.html

-----

_Awk und Sed_, by Helmut Herold.
(Bonn: Addison-Wesley, 1994; 288 pages)
2nd edition to be released in March 2003
ISBN 3-8273-2094-1
http://www.addison-wesley.de/main/main.asp?page=home/bookdetails&ProductID=37214
|
2.3.2. Mailing list

If you are interested in learning more about sed (its syntax, using
regular expressions, etc.) you are welcome to subscribe to a
sed-oriented mailing list. In fact, there are two mailing lists
about sed: one in English named "sed-users", moderated by Sven
Guckes; and one in Portuguese named "sed-BR" (for sed-Brazil),
moderated by Aurelio Marinho Jargas. The average volume of mail for
"sed-users" is about 35 messages a week; the average volume of mail
for "sed-BR" is about 15 messages a week.

sed-BR mailing list:    http://br.groups.yahoo.com/group/sed-br/
sed-users mailing list: http://groups.yahoo.com/group/sed-users/

To subscribe to sed-users, send a blank message to:

sed-users-subscribe@yahoogroups.com

To unsubscribe from sed-users, send a blank message to:

sed-users-unsubscribe@yahoogroups.com
|
2.3.3. Tutorials, electronic text

The original users manual for sed, by Lee E. McMahon, from the
7th edition UNIX Manual (1978), with the classic "Kubla Khan"
example and tutorial, as formatted text:
http://sed.sourceforge.net/grabbag/tutorials/sed_mcmahon.txt

The source code to the preceding manual. Use "troff -ms sed" to
print this file properly:
http://plan9.bell-labs.com/7thEdMan/vol2/sed
http://cm.bell-labs.com/7thEdMan/vol2/sed

"Do It With Sed", by Carlos Duarte
http://www.dbnet.ece.ntua.gr/~george/sed/OLD/sedtut_1.html

"Sed: How to use sed, a special editor for modifying files
automatically", by Bruce Barnett and General Electric Company
http://www.grymoire.com/Unix/Sed.html

U-SEDIT2.ZIP, by Mike Arst (16 June 1990)
ftp://ftp.cs.umu.se/pub/pc/u-sedit2.zip
ftp://ftp.uni-stuttgart.de/pub/systems/msdos/util/unixlike/u-sedit2.zip
ftp://sunsite.icm.edu.pl/vol/wojsyl/garbo/pc/editor/u-sedit2.zip
ftp://ftp.sogang.ac.kr/pub/msdos/garbo_pc/editor/u-sedit2.zip

U-SEDIT3.ZIP, by Mike Arst (24 Jan. 1992)
http://www.student.northpark.edu/pemente/sed/u-sedit3.zip
CompuServe DTPFORUM, "PC DTP Utilities" library, file SEDDOC.ZIP

Another sed FAQ
http://www.dreamwvr.com/sed-info/sed-faq.html

sed-tutorial, by Felix von Leitner
http://www.math.fu-berlin.de/~leitner/sed/tutorial.html

"Manipulating text with sed," chapter 14 of the SCO OpenServer
"Operating System Users Guide"
http://ou800doc.caldera.com/SHL_automate/CTOC-Manipulating_text_with_sed.html

"Combining the Bourne-shell, sed and awk in the UNIX environment
for language analysis," by Lothar Schmitt and Kiel Christianson.
This basic tutorial on the Bourne shell, sed and awk downloads as a
71-page PostScript file (compressed to 290K with gzip). You may
need to navigate down from the root to get the file.
ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/1997/97-2-007.tar.gz
available upon request from Lothar Schmitt <lothar@u-aizu.ac.jp>
|
2.3.4. General web and ftp sites

http://sed.sourceforge.net/grabbag             # Collected scripts
http://main.rtfiber.com.tw/~changyj/sed/       # Yao-Jen Chang
http://www.math.fu-berlin.de/~guckes/sed/      # Sven Guckes
http://www.math.fu-berlin.de/~leitner/sed/     # Felix von Leitner
http://www.dbnet.ece.ntua.gr/~george/sed/      # Yiorgos Adamopoulos
http://www.student.northpark.edu/pemente/sed/  # Eric Pement

http://spacsun.rice.edu/FAQ/sed.html
ftp://algos.inesc.pt/pub/users/cdua/scripts.tar.gz (sed and shell scripts)

"Handy One-Liners For Sed", compiled by Eric Pement. A large list
of 1-line sed commands which can be executed from the command line.
http://sed.sourceforge.net/sed1line.txt
http://www.student.northpark.edu/pemente/sed/sed1line.txt

"Handy One-Liners For Sed", translated to Portuguese
http://wmaker.lrv.ufsc.br/sed_ptBR.html

The Single UNIX Specification, Version 3 (technical man page)
http://www.opengroup.org/onlinepubs/007904975/utilities/sed.html

Getting started with sed
http://www.cs.hmc.edu/tech_docs/qref/sed.html

masm to gas converter
http://www.delorie.com/djgpp/faq/converting/asm2s-sed.html

mail2html.zip
http://www.crispen.org/src/#mail2html

sample uses of sed in batch files and scripts (Benny Pederson)
http://users.cybercity.dk/~bse26236/batutil/help/SED.HTM

dc.sed - the most complex and impressive sed script ever written.
This sed script by Greg Ubben emulates the Unix dc (desk
calculator), including base conversion, exponentiation, square
roots, and much more.
http://sed.sourceforge.net/grabbag/scripts/dc_overview.htm

If you should find other tutorials or scripts that should be added
to this document, please forward the URLs to the FAQ maintainer.
|
------------------------------

3. TECHNICAL

3.1. More detailed explanation of basic sed

Sed takes a script of editing commands and applies each command, in
order, to each line of input. After all the commands have been
applied to the first line of input, that line is output. A second
input line is taken for processing, and the cycle repeats. Sed
scripts can address a single line by line number or by matching a
/RE pattern/ on the line. An exclamation mark '!' after a regex
('/RE/!') or line number will select all lines that do NOT match
that address. Sed can also address a range of lines in the same
manner, using a comma to separate the 2 addresses.

     $d               # delete the last line of the file
     /[0-9]\{3\}/p    # print lines with 3 consecutive digits
     5!s/ham/cheese/  # except on line 5, replace 'ham' with 'cheese'
     /awk/!s/aaa/bb/  # unless 'awk' is found, replace 'aaa' with 'bb'
     17,/foo/d        # delete all lines from line 17 up to 'foo'
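
To see these addresses in action, here is a small sketch that feeds
sed a few lines with printf. The sample input and the variable names
are made up for illustration, and line 2 stands in for the larger
line numbers used above so that the input can stay short.

```shell
# Illustrative only: sample data and variable names are assumptions.

# '$d' deletes the last line of input.
last_deleted=$(printf 'one\ntwo\nthree\n' | sed '$d')

# '2!s/ham/cheese/' substitutes on every line except line 2.
not_line_2=$(printf 'ham\nham\nham\n' | sed '2!s/ham/cheese/')

# '2,/foo/d' deletes from line 2 through the next line matching 'foo'.
range_deleted=$(printf 'a\nb\nfoo\nc\n' | sed '2,/foo/d')

printf '%s\n' "$last_deleted" "$not_line_2" "$range_deleted"
```

The single quotes matter under a Unix shell: they keep the shell
from expanding '$d' or '!' before sed sees the script.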
|
Following an address or address range, sed accepts curly braces
'{...}' so several commands may be applied to that line or to the
lines matched by the address range. On the command line, semicolons
';' separate each instruction and must precede the closing brace.

     sed '/Owner:/{s/yours/mine/g;s/your/my/g;s/you/me/g;}' file
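
As a quick check of how the braces limit the commands to the
addressed line, the sketch below runs the same script on two lines
of made-up input. Only the line containing "Owner:" should change.

```shell
# Illustrative input; the "Guest:" line is there to show that
# unaddressed lines pass through untouched.
result=$(printf 'Owner: you and your dog\nGuest: you\n' |
  sed '/Owner:/{s/yours/mine/g;s/your/my/g;s/you/me/g;}')

printf '%s\n' "$result"
```

The order of the three substitutions is deliberate: if s/you/me/g
ran first, it would turn "your" into "mer" before the longer words
could be matched.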
|
Range addresses operate differently depending on which version of
sed is used (see section 3.4, below). For further information on
using sed, consult the references in section 2.3, above.

3.1.1. Regular expressions on the left side of "s///"

All versions of sed support Basic Regular Expressions (BREs). For
the syntax of BREs, enter "man ed" at a Unix shell prompt. A
technical description of BREs from IEEE POSIX 1003.1-2001 and the
Single UNIX Specification Version 3 is available online at:
http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html#tag_09_03

Sed normally supports BREs plus '\n' to match a newline in the
pattern space, plus '\xREx' as equivalent to '/RE/', where 'x' is any
character other than a newline or another backslash.
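
The '\xREx' form is most useful when the regex itself contains
slashes. The sketch below uses '%' as the delimiter; the sample
paths and variable names are only for illustration.

```shell
# Print lines containing "/usr/". With % as the delimiter, the
# slashes in the regex need no escaping.
custom=$(printf '/usr/bin\n/etc\n' | sed -n '\%/usr/%p')

# The same match with the default delimiter needs \/ for each slash.
default=$(printf '/usr/bin\n/etc\n' | sed -n '/\/usr\//p')

printf '%s\n' "$custom" "$default"
```

Both commands select the same line; the first is simply easier to
read.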
|
Some versions of sed support supersets of BREs, or "extended
regular expressions", which offer additional metacharacters for
increased flexibility. For additional information on extended REs
in GNU sed, see sections 3.7 ("GNU/POSIX extensions to regular
expressions") and 6.7.3 ("Special syntax in REs"), below.

Though not required by BREs, some versions of sed support \t to
represent a TAB, \r for carriage return, \xHH for direct entry of
hex codes, and so forth. Other versions of sed do not.

ssed (super-sed) introduced many new features for LHS pattern
matching, too many to give here. The complete list is found in
section 6.7.3.H ("ssed"), below.

3.1.2. Escape characters on the right side of "s///"

The right-hand side (the replacement part) in "s/find/replace/" is
almost always a string literal, with no interpolation of these
metacharacters:

     . ^ $ [ ] { } ( ) ? + * |

Three things *are* interpolated: ampersand (&), backreferences, and
options for special seds. An ampersand on the RHS is replaced by
the entire expression matched on the LHS. There is _never_ any
reason to use grouping like this:

     s/\(some-complex-regex\)/one two \1 three/

since you can do this instead:

     s/some-complex-regex/one two & three/

To enter a literal ampersand on the RHS, type '\&'.
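
A short sketch of both cases, using made-up input: an unescaped
ampersand stands for the matched text, while '\&' inserts the
character itself.

```shell
# & is replaced by whatever the LHS matched (here, "4.8").
wrapped=$(echo 'sed 4.8' | sed 's/[0-9][0-9.]*/<&>/')

# \& inserts a literal ampersand instead.
literal=$(echo 'A and B' | sed 's/and/\&/')

printf '%s\n' "$wrapped" "$literal"
```

Forgetting the backslash in the second command would replace "and"
with itself, leaving the line unchanged.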
|
Grouping and backreferences: All versions of sed support grouping
and backreferences on the LHS and backreferences only on the RHS.
Grouping allows a series of characters to be collected in a set,
indicating the boundaries of the set with \( and \). Then the set
can be designated to be repeated a certain number of times:

     \(like this\)* or \(like this\)\{5,7\}.

Groups can also be nested "\(like \(this\) is here\)" and may
contain any valid RE. Backreferences repeat the contents of a
particular group, using a backslash and a digit (1-9) for each
corresponding group. In other words, "/\(pom\)\1/" is another way
of writing "/pompom/". If groups are nested, backreference numbers
are counted by matching \( in strict left to right order. Thus,
in /..\(the \(word\)\) \("foo"\)../ the third group, \("foo"\), is
the one referenced by \3. Backreferences can be used in the LHS,
the RHS, and in normal RE addressing (see section 3.3). Thus,

     /\(.\)\1\(.\)\2\(.\)\3/;       # matches "bookkeeper"
     /^\(.\)\(.\)\(.\)\3\2\1$/;     # finds 6-letter palindromes
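
The sketch below tries the two address examples above on a few
sample words, and adds one substitution that uses backreferences
on the RHS to swap two groups. The word lists and variable names
are illustrative assumptions.

```shell
# Backreferences on the RHS: swap the two captured words.
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')

# Backreferences in an address: three doubled letters in a row.
doubled=$(printf 'bookkeeper\nbanana\n' |
  sed -n '/\(.\)\1\(.\)\2\(.\)\3/p')

# Six-letter palindromes: the groups must recur in reverse order.
palindrome=$(printf 'redder\nbetter\n' |
  sed -n '/^\(.\)\(.\)\(.\)\3\2\1$/p')

printf '%s\n' "$swapped" "$doubled" "$palindrome"
```

Note that -n with the p command is used here so that only the
matching lines are printed.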
|
Seds differ in how they treat invalid backreferences where no
corresponding group occurs. To insert a literal ampersand or
backslash into the RHS, prefix it with a backslash: \& or \\.

ssed, sed16, and sedmod permit additional options on the RHS. They
all support changing part of the replacement string to upper case
(\u or \U), lower case (\l or \L), or to end case conversion (\E).
Both sed16 and sedmod support awk-style word references ($1, $2,
$3, ...) and $0 to insert the entire line before conversion.

     echo ab ghi | sed16 "s/.*/$0 - \U$2/"   # prints "ab ghi - GHI"

*Note:* This feature of sed16 and sedmod will break sed scripts which
put a dollar sign and digit into the RHS. Though this is an unlikely
combination, it's worth remembering if you use other people's scripts.

3.1.3. Substitution switches

Standard versions of sed support 4 main flags or switches which may
be added to the end of an "s///" command. They are:

     N      - Replace the Nth match of the pattern on the LHS, where
              N is an integer between 1 and 512. If N is omitted,
              the default is to replace the first match only.
     g      - Global replace of all matches to the pattern.
     p      - Print the results to stdout, even if -n switch is used.
     w file - Write the pattern space to 'file' if a replacement was
              done. If the file already exists when the script is
              executed, it is overwritten. During script execution,
              w appends to the file for each match.
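
The four switches can be tried out with the sketch below. The
input strings are made up, and the temporary file created with
mktemp is just a convenient place for the 'w' switch to write to.

```shell
# N: replace only the 2nd match on the line.
second=$(echo 'a a a a' | sed 's/a/X/2')

# g: replace every match on the line.
all=$(echo 'a a a a' | sed 's/a/X/g')

# p: with -n, print only the lines where a replacement was made.
printed=$(printf 'one\ntwo\n' | sed -n 's/two/2/p')

# w file: write changed lines to a file (a temporary one here).
log=$(mktemp)
printf 'one\ntwo\n' | sed -n "s/one/1/w $log"
written=$(cat "$log")
rm -f "$log"

printf '%s\n' "$second" "$all" "$printed" "$written"
```

The 'w' script is in double quotes so the shell can expand $log;
everything after 'w' up to the end of the script line is taken as
the file name.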
|
GNU sed 3.02 and ssed also offer the /I switch for doing a
case-insensitive match. For example,

     echo ONE TWO | gsed "s/one/unos/I"      # prints "unos TWO"

GNU sed 4.x and ssed add the /M switch, to simplify working with
multi-line patterns: when it is used, ^ and $ match the beginning
and end of each line (BOL and EOL) within the pattern space.
\` and \' remain available to match the start and end of pattern
space, respectively.

ssed supports two more switches, /S and /X, when its Perl mode is
used. They are described in detail in section 6.7.3.H, below.

3.1.4. Command-line switches

All versions of sed support two switches, -e and -n. Though sed
usually separates multiple commands with semicolons (e.g., "H;d;"),
certain commands could not accept a semicolon command separator.
These include :labels, 't', and 'b'. These commands had to occur
last in a script, separated by -e option switches. For example:

     # The 'ta' means jump to label :a if last s/// returns true
     sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file

The -n switch turns off sed's default behavior of printing every
line. With -n, lines are printed only if explicitly told to. In
addition, for certain versions of sed, if an external script begins
with "#n" as its first two characters, the output is suppressed
(exactly as if -n had been entered on the command line). A list of
which versions appears in section 6.7.2., below.
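
The effect of -n is easiest to see by running the same 'p' command
with and without it. The three-line input below is illustrative.

```shell
# With -n, only the explicitly printed line appears.
with_n=$(printf 'a\nb\nc\n' | sed -n '2p')

# Without -n, every line is printed once by default, and line 2 is
# printed a second time by the p command.
without_n=$(printf 'a\nb\nc\n' | sed '2p')

printf '%s\n---\n%s\n' "$with_n" "$without_n"
```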
|
GNU sed 4.x and ssed support additional switches. -l (lowercase L),
followed by a number, lets you adjust the default length of the 'l'
and 'L' commands (note that these implementations of sed also
support an argument to these commands, to tailor the length
separately for each occurrence of the command).

-i activates in-place editing (see section 4.41.1, below). -s
treats each file as a separate stream: sed by default joins all the
files, so $ represents the last line of the last file; 15 means the
15th line in the joined stream; and /abc/,/def/ might match across
files.

When -s is used, however, all addresses refer to single files. For
example, $ represents the last line of each input file; 15 means
the 15th line of each input file; and /abc/,/def/ will be "reset"
(in other words, sed will not execute the commands and start
looking for /abc/ again) if a file ends before /def/ has been
matched. Note that -i automatically activates this interpretation
of addresses.
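
The difference between the joined stream and -s can be seen with
two small files. This sketch assumes GNU sed 4.x or later, since
-s is not available in every sed; the file names and contents are
made up, and the files live in a temporary directory.

```shell
# Assumes GNU sed (the -s switch is an extension).
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/a.txt"
printf 'b1\nb2\n' > "$dir/b.txt"

# Default: one joined stream, so $ is the last line of b.txt only.
joined=$(sed -n '$p' "$dir/a.txt" "$dir/b.txt")

# With -s: $ is the last line of each input file.
separate=$(sed -s -n '$p' "$dir/a.txt" "$dir/b.txt")

printf '%s\n---\n%s\n' "$joined" "$separate"
rm -r "$dir"
```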
|
3.2. Common one-line sed scripts

A separate document of over 70 handy "one-line" sed commands is
available at
http://sed.sourceforge.net/sed1line.txt

Here are several common sed commands for one-line use. MS-DOS users
should replace single quotes ('...') with double quotes ("...") in
these examples. A specific filename usually follows the script,
though the input may also come via piping or redirection.

     # Double space a file
     sed G file

     # Triple space a file
     sed 'G;G' file

     # Under UNIX: convert DOS newlines (CR/LF) to Unix format
     sed 's/.$//' file    # assumes that all lines end with CR/LF
     sed 's/^M$//' file   # in bash/tcsh, press Ctrl-V then Ctrl-M

     # Under DOS: convert Unix newlines (LF) to DOS format
     sed 's/$//' file     # method 1
     sed -n p file        # method 2

     # Delete leading whitespace (spaces/tabs) from front of each line
     # (this aligns all text flush left). '^t' represents a true tab
     # character. Under bash or tcsh, press Ctrl-V then Ctrl-I.
     sed 's/^[ ^t]*//' file

     # Delete trailing whitespace (spaces/tabs) from end of each line
     sed 's/[ ^t]*$//' file                  # see note on '^t', above

     # Delete BOTH leading and trailing whitespace from each line
     sed 's/^[ ^t]*//;s/[ ^t]*$//' file      # see note on '^t', above

     # Substitute "foo" with "bar" on each line
     sed 's/foo/bar/' file    # replaces only 1st instance in a line
     sed 's/foo/bar/4' file   # replaces only 4th instance in a line
     sed 's/foo/bar/g' file   # replaces ALL instances within a line

     # Substitute "foo" with "bar" ONLY for lines which contain "baz"
     sed '/baz/s/foo/bar/g' file

     # Delete all CONSECUTIVE blank lines from file except the first.
     # This method also deletes all blank lines from top and end of file.
     # (emulates "cat -s")
     sed '/./,/^$/!d' file    # this allows 0 blanks at top, 1 at EOF
     sed '/^$/N;/\n$/D' file  # this allows 1 blank at top, 0 at EOF

     # Delete all leading blank lines at top of file (only).
     sed '/./,$!d' file

     # Delete all trailing blank lines at end of file (only).
     sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file

     # If a line ends with a backslash, join the next line to it.
     sed -e :a -e '/\\$/N; s/\\\n//; ta' file

     # If a line begins with an equal sign, append it to the previous
     # line (and replace the "=" with a single space).
     sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file
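
The last two one-liners are the hardest to follow by eye, so here
is a sketch that runs them on small made-up inputs supplied by
printf instead of a file. The variable names are illustrative.

```shell
# Join a line ending in a backslash with the line that follows it.
# The printf format '\\\n' emits a backslash and then a newline.
joined_bs=$(printf 'one \\\ntwo\nthree\n' |
  sed -e :a -e '/\\$/N; s/\\\n//; ta')

# Append a line beginning with "=" to the previous line, replacing
# the "=" with a single space.
joined_eq=$(printf 'a\n=b\nc\n' |
  sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D')

printf '%s\n---\n%s\n' "$joined_bs" "$joined_eq"
```

In both scripts the 't' command loops back to label :a after a
successful substitution, so that a run of several continued lines
is joined in a single pass.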
|
3.3. Addressing and address ranges

Sed commands may have an optional "address" or "address range"
prefix. If there is no address or address range given, then the
command is applied to all the lines of the input file or text
stream. Three commands cannot take an address prefix:

   - labels, used to branch or jump within the script
   - the close brace, '}', which ends the '{' "command"
   - the '#' comment character, also technically a "command"

An address can be a line number (such as 1, 5, 37, etc.), a regular
expression (written in the form /RE/ or \xREx where 'x' is any
character other than '\' and RE is the regular expression), or the
dollar sign ($), representing the last line of the file. An
exclamation mark (!) after an address or address range will apply
the command to every line EXCEPT the ones named by the address. A
null regex ("//") will be replaced by the last regex which was
used. Also, some seds do not support \xREx as regex delimiters.

     5d               # delete line 5 only
     5!d              # delete every line except line 5
     /RE/s/LHS/RHS/g  # substitute only if RE occurs on the line
     /^$/b label      # if the line is blank, branch to ':label'
     /./!b label      # ... another way to write the same command
     \%.%!b label     # ... yet another way to write this command
     $!N              # on all lines but the last, get the Next line
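
A few of these forms, including the null regex, can be tried with
the sketch below. The inputs are made up, and line 2 stands in for
line 5 to keep them short.

```shell
# Null regex: the empty // in s//bar/g reuses the address regex /foo/.
reused=$(echo 'foo foo' | sed '/foo/s//bar/g')

# '2!d' deletes every line except line 2.
only_2=$(printf 'a\nb\nc\n' | sed '2!d')

# '/./!d' deletes lines that contain no character, i.e. blank lines.
no_blank=$(printf 'a\n\nb\n' | sed '/./!d')

printf '%s\n' "$reused" "$only_2" "$no_blank"
```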
|
Note that an embedded newline can be represented in an address by
the symbol \n, but this syntax is needed only if the script puts 2
or more lines into the pattern space via the N, G, or other
commands. The \n symbol does *not* match the newline at an
end-of-line because when sed reads each line into the pattern space
for processing, it strips off the trailing newline, processes the
line, and adds a newline back when printing the line to standard
output. To match the end-of-line, use the '$' metacharacter, as
follows:

     /tape$/       # matches the word 'tape' at the end of a line
     /tape$deck/   # matches the word 'tape$deck' with a literal '$'
     /tape\ndeck/  # matches 'tape' and 'deck' with a newline between
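
The sketch below shows why \n only matters once a second line has
been pulled into the pattern space. The input is made up; the
second command is expected to print nothing, because each line is
examined on its own.

```shell
# '$' anchors the match at the end of the line.
eol=$(printf 'red tape\ntape deck\n' | sed -n '/tape$/p')

# One line at a time: the pattern space never contains a newline,
# so /tape\ndeck/ cannot match.
one_line=$(printf 'tape\ndeck\n' | sed -n '/tape\ndeck/p')

# After N appends the next line, the embedded newline can be matched.
two_lines=$(printf 'tape\ndeck\n' | sed -n '$!N;/tape\ndeck/p')

printf '[%s]\n[%s]\n[%s]\n' "$eol" "$one_line" "$two_lines"
```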
|
The following sed commands usually accept *only* a single address.
All other commands (except labels, '}', and '#') accept both single
addresses and address ranges.

     =       print to stdout the line number of the current line
     a       after printing the current line, append "text" to stdout
     i       before printing the current line, insert "text" to stdout
     q       quit after the current line is matched
     r file  prints contents of "file" to stdout after line is matched

Note that we said "usually." If you need to apply the '=', 'a',
'i', or 'r' commands to each and every line within an address
range, this behavior can be coerced by the use of braces. Thus,
"1,9=" is an invalid command, but "1,9{=;}" will print each line
number followed by its line for the first 9 lines (and then print
the rest of the file normally).
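
Here is the brace form in miniature, using a range of 1,2 on three
lines of made-up input so the whole result fits on screen.

```shell
# Lines 1 and 2 are each preceded by their line number; line 3 is
# outside the range and is printed normally.
numbered=$(printf 'a\nb\nc\n' | sed '1,2{=;}')

printf '%s\n' "$numbered"
```

Each line number appears on its own line, ahead of the text it
belongs to, because '=' writes the number followed by a newline.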
|
|
|
d0cde9 |
Address ranges occur in the form
|
|
|
d0cde9 |
|
|
|
d0cde9 |
<address1>,<address2> or <address1>,<address2>!
|
|
|
d0cde9 |
|
|
|
d0cde9 |
where the address can be a line number or a standard /regex/.
|
|
|
d0cde9 |
<address2> can also be a dollar sign, indicating the end of file.
|
|
|
d0cde9 |
Under GNU sed 3.02+, ssed, and sed15+, <address2> may also be a
|
|
|
d0cde9 |
notation of the form +num, indicating the next _num_ lines after
|
|
|
d0cde9 |
<address1> is matched.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Address ranges are:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(1) Inclusive. The range "/From here/,/eternity/" matches all the
|
|
|
d0cde9 |
lines containing "From here" up to and including the line
|
|
|
d0cde9 |
containing "eternity". It will not stop on the line just prior to
|
|
|
d0cde9 |
"eternity". (If you don't like this, see section 4.24.)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(2) Plenary. They always match full lines, not just parts of lines.
|
|
|
d0cde9 |
In other words, a command to change or delete an address range will
|
|
|
d0cde9 |
change or delete whole lines; it won't stop in the middle of a
|
|
|
d0cde9 |
line.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(3) Multi-linear. Address ranges normally match 2 lines or more.
|
|
|
d0cde9 |
The second address will never match the same line the first address
|
|
|
d0cde9 |
did; therefore a valid address range always spans at least two
|
|
|
d0cde9 |
lines, with these exceptions which match only one line:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
- if the first address matches the last line of the file
|
|
|
d0cde9 |
- if using the syntax "/RE/,3" and /RE/ occurs only once in the
|
|
|
d0cde9 |
file at line 3 or later
|
|
|
d0cde9 |
- if using HHsed v1.5. See section 3.4.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(4) Minimalist. In address ranges with /regex/ as <address2>, the
|
|
|
d0cde9 |
range "/foo/,/bar/" will stop at the first "bar" it finds, provided
|
|
|
d0cde9 |
that "bar" occurs on a line below "foo". If the word "bar" occurs
|
|
|
d0cde9 |
on several lines below the word "foo", the range will match all the
|
|
|
d0cde9 |
lines from the first "foo" up to the first "bar". It will not
|
|
|
d0cde9 |
continue hopping ahead to find more "bar"s. In other words, address
|
|
|
d0cde9 |
ranges, unlike regular expressions, are not "greedy."
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(5) Repeating. An address range will try to match more than one
|
|
|
d0cde9 |
block of lines in a file. However, the blocks cannot nest. In
|
|
|
d0cde9 |
addition, a second match will not "take" the last line of the
|
|
|
d0cde9 |
previous block. For example, given the following text,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
start
|
|
|
d0cde9 |
stop start
|
|
|
d0cde9 |
stop
|
|
|
d0cde9 |
|
|
|
d0cde9 |
the sed command '/start/,/stop/d' will only delete the first two
|
|
|
d0cde9 |
lines. It will not delete all 3 lines.
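The three-line text above can be fed to sed directly to check this
behavior (a sketch; the variable name is an assumption):

```shell
# Line 2 closes the first block, so its "start" does not open a new one.
kept=$(printf 'start\nstop start\nstop\n' | sed '/start/,/stop/d')
printf '%s\n' "$kept"   # prints: stop
```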
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(6) Relentless. If the address range finds a "start" match but
|
|
|
d0cde9 |
doesn't find a "stop", it will match every line from "start" to the
|
|
|
d0cde9 |
end of the file. Thus, beware of the following behaviors:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/RE1/,/RE2/ # If /RE2/ is not found, matches from /RE1/ to the
|
|
|
d0cde9 |
# end-of-file.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
20,/RE/ # If /RE/ is not found, matches from line 20 to the
|
|
|
d0cde9 |
# end-of-file.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/RE/,30 # If /RE/ occurs any time after line 30, each
|
|
|
d0cde9 |
# occurrence will be matched in sed15+, sedmod, and
|
|
|
d0cde9 |
# GNU sed v3.02+. GNU sed v2.05 and 1.18 will match
|
|
|
d0cde9 |
# from the 2nd occurrence of /RE/ to the end-of-file.
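Two of the properties above, "relentless" and "minimalist", can be
seen with tiny made-up inputs. This is only a sketch; the words "foo"
and "bar" and the variable names are placeholders:

```shell
# No line matches /bar/, so the range runs from "foo" to end-of-file.
open_ended=$(printf 'one\nfoo\nthree\nfour\n' | sed -n '/foo/,/bar/p')

# With two "bar" lines, the range stops at the first one it meets.
first_bar=$(printf 'foo\nbar\nmid\nbar\n' | sed -n '/foo/,/bar/p')

printf '%s\n' "$open_ended" "--" "$first_bar"
```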
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If these behaviors seem strange, remember that they occur because
|
|
|
d0cde9 |
sed does not look "ahead" in the file. Doing so would stop sed from
|
|
|
d0cde9 |
being a stream editor and have adverse effects on its efficiency.
|
|
|
d0cde9 |
If these behaviors are undesirable, they can be circumvented or
|
|
|
d0cde9 |
corrected by the use of nested testing within braces. The following
|
|
|
d0cde9 |
scripts work under GNU sed 3.02:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# Execute your_commands on range "/RE1/,/RE2/", but if /RE2/ is
|
|
|
d0cde9 |
# not found, do nothing.
|
|
|
d0cde9 |
/RE1/{:a;N;/RE2/!ba;your_commands;}
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# Execute your_commands on range "20,/RE/", but if /RE/ is not
|
|
|
d0cde9 |
# found, do nothing.
|
|
|
d0cde9 |
20{:a;N;/RE/!ba;your_commands;}
|
|
|
d0cde9 |
|
|
|
d0cde9 |
As a side note, once we've used N to "slurp" lines together to test
|
|
|
d0cde9 |
for the ending expression, the pattern space will have gathered
|
|
|
d0cde9 |
many lines (possibly thousands) together and concatenated them as a
|
|
|
d0cde9 |
single expression, with the \n sequence marking line breaks. The
|
|
|
d0cde9 |
REs *within* the pattern space may have to be modified (e.g., you
|
|
|
d0cde9 |
must write '/\nStart/' instead of '/^Start/' and '/[^\n]*/' instead
|
|
|
d0cde9 |
of '/.*/') and other standard sed commands will be unavailable or
|
|
|
d0cde9 |
difficult to use.
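A minimal sketch of the slurping technique, written one command per
line so it does not depend on GNU's handling of semicolons after
labels. The markers BEGIN and END and the sample input are made up;
note that the loop test uses '\nEND' because '^END' would not match
inside the gathered pattern space:

```shell
joined=$(printf 'x\nBEGIN\na\nEND\ny\n' | sed '/BEGIN/{
:a
N
/\nEND/!ba
s/\n/ /g
}')
printf '%s\n' "$joined"
```

Here "your_commands" is simply 's/\n/ /g', which joins the gathered
block into the single line "BEGIN a END".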
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# Execute your_commands on range "/RE/,30", but if /RE/ occurs
|
|
|
d0cde9 |
# on line 31 or later, do not match it.
|
|
|
d0cde9 |
1,30{/RE/,$ your_commands;}
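To see the difference the outer "1,30{...}" makes, here is a scaled-down
sketch that uses line 3 in place of line 30 and a five-line made-up
input in which "RE" appears on lines 2 and 5:

```shell
sample() { printf 'a\nRE\nc\nd\nRE\n'; }

# Plain range: the late "RE" on line 5 is still matched (as one line).
plain=$(sample | sed '/RE/,3 s/^/> /')

# Nested form: nothing after line 3 is touched.
guarded=$(sample | sed '1,3{/RE/,$ s/^/> /;}')

printf '%s\n' "$plain" "--" "$guarded"
```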
|
|
|
d0cde9 |
|
|
|
d0cde9 |
For related suggestions on using address ranges, see sections 4.2,
|
|
|
d0cde9 |
4.15, and 4.19 of this FAQ. Also, note the following section.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
3.4. Address ranges in GNU sed and HHsed
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(1) GNU sed 3.02+, ssed, and sed15+ all support address ranges like:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/regex/,+5
|
|
|
d0cde9 |
|
|
|
d0cde9 |
which match /regex/ plus the next 5 lines (or EOF, whichever comes
|
|
|
d0cde9 |
first).
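A short sketch of the "+num" form, assuming GNU sed (other seds may
reject the syntax); inputs and variable names are made up:

```shell
# The matching line plus the next 2 lines.
ctx=$(printf '1\n2\nX\n4\n5\n6\n' | sed -n '/X/,+2p')

# Fewer than 5 lines remain, so the range simply ends at EOF.
tail_ctx=$(printf 'a\nX\nb\n' | sed -n '/X/,+5p')

printf '%s\n' "$ctx" "--" "$tail_ctx"
```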
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(2) GNU sed v3.02.80 (and above) and ssed support address ranges of:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
0,/regex/
|
|
|
d0cde9 |
|
|
|
d0cde9 |
as a special case to permit matching /regex/ if it occurs on the
|
|
|
d0cde9 |
first line. This syntax permits a range expression that matches
|
|
|
d0cde9 |
every line from the top of the file to the first instance of
|
|
|
d0cde9 |
/regex/, even if /regex/ is on the first line.
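The effect of the special "0" address is easiest to see next to the
ordinary "1" address. This sketch assumes GNU sed and a made-up input
in which every line is "foo":

```shell
sample() { printf 'foo\nfoo\nfoo\n'; }

# 0,/foo/ may end on line 1, so only the first "foo" is changed.
zero_form=$(sample | sed '0,/foo/s/foo/bar/')

# 1,/foo/ cannot end before line 2, so two lines are changed.
one_form=$(sample | sed '1,/foo/s/foo/bar/')

printf '%s\n' "$zero_form" "--" "$one_form"
```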
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(3) HHsed (sed15) has an exceptional way of implementing
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/regex1/,/regex2/
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If /RE1/ and /RE2/ both occur on the *same* line, HHsed will match
|
|
|
d0cde9 |
that single line. In other words, an address range block can
|
|
|
d0cde9 |
consist of just one line. HHsed will then look for the next
|
|
|
d0cde9 |
occurrence of /regex1/ to begin the block again.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Every other version of sed (including sed16) requires 2 lines to
|
|
|
d0cde9 |
match an address range, and thus /regex1/ and /regex2/ cannot
|
|
|
d0cde9 |
successfully match just one line. See also the comments at
|
|
|
d0cde9 |
section 7.9.4, below.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(4) BEGIN~STEP selection: ssed and GNU sed (v2.05 and above) offer
|
|
|
d0cde9 |
a form of addressing called "BEGIN~STEP selection". This is *not* a
|
|
|
d0cde9 |
range address, which selects an inclusive block of consecutive
|
|
|
d0cde9 |
lines from /start/ to /finish/. But I think it seems to belong here.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Given an expression of the form "M~N", where M and N are integers,
|
|
|
d0cde9 |
GNU sed and ssed will select every Nth line, beginning at line M.
|
|
|
d0cde9 |
(With gsed v2.05, M had to be less than N, but this restriction is
|
|
|
d0cde9 |
no longer necessary). Both M and N may equal 0 ("0~0" selects every
|
|
|
d0cde9 |
line). These examples illustrate the syntax:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed '1~3d' file # delete every 3rd line, starting with line 1
|
|
|
d0cde9 |
# deletes lines 1, 4, 7, 10, 13, 16, ...
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed '0~3d' file # deletes lines 3, 6, 9, 12, 15, 18, ...
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -n '2~5p' file # print every 5th line, starting with line 2
|
|
|
d0cde9 |
# prints lines 2, 7, 12, 17, 22, 27, ...
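The same examples can be run on generated input to confirm which lines
are selected. This sketch assumes GNU sed, since M~N is an extension;
the numbered input lines are made up:

```shell
# 0~3 selects lines 3 and 6 of a seven-line input.
every_third=$(printf '%s\n' 1 2 3 4 5 6 7 | sed '0~3d')

# 2~5 selects lines 2, 7, and 12 of a twelve-line input.
from_two=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 | sed -n '2~5p')

printf '%s\n' "$every_third" "--" "$from_two"
```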
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(5) Finally, GNU sed v2.05 has a bug in range addressing (see
|
|
|
d0cde9 |
section 7.5), which was fixed in the higher versions.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
|
|
|
d0cde9 |
3.5. Debugging sed scripts
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The following two debuggers should make it easier to understand how
|
|
|
d0cde9 |
sed scripts operate. They can save hours of grief when trying to
|
|
|
d0cde9 |
determine the problems with a sed script.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(1) sd (sed debugger), by Brian Hiles
|
|
|
d0cde9 |
|
|
|
d0cde9 |
This debugger runs under a Unix shell, is powerful, and is easy to
|
|
|
d0cde9 |
use. sd has conditional breakpoints and spypoints of the pattern
|
|
|
d0cde9 |
space and hold space, on any scope defined by regex match and/or
|
|
|
d0cde9 |
script line number. It can be semi-automated, can save diagnostic
|
|
|
d0cde9 |
reports, and shows potential problems with a sed script before it
|
|
|
d0cde9 |
tries to execute it. The script is robust and requires the Unix
|
|
|
d0cde9 |
shell utilities plus the Bourne shell or Korn shell to execute.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
http://sed.sourceforge.net/grabbag/scripts/sd.ksh.txt (2003)
|
|
|
d0cde9 |
http://sed.sourceforge.net/grabbag/scripts/sd.sh.txt (1998)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(2) sedsed, by Aurelio Jargas
|
|
|
d0cde9 |
|
|
|
d0cde9 |
This debugger requires Python to run it, and it uses your own
|
|
|
d0cde9 |
version of sed, whatever that may be. It displays the current input
|
|
|
d0cde9 |
line, the pattern space, and the hold space, before and after each
|
|
|
d0cde9 |
sed command is executed.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
http://sedsed.sourceforge.net
|
|
|
d0cde9 |
|
|
|
d0cde9 |
|
|
|
d0cde9 |
3.6. Notes about s2p, the sed-to-perl translator
|
|
|
d0cde9 |
|
|
|
d0cde9 |
s2p (sed to perl) is a Perl program to convert sed scripts into the
|
|
|
d0cde9 |
Perl programming language; it is included with many versions of
|
|
|
d0cde9 |
Perl. These problems have been found when using s2p:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(1) Doesn't recognize the semicolon properly after s/// commands.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
s/foo/bar/g;
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(2) Doesn't trim trailing whitespace after s/// commands. Even lone
|
|
|
d0cde9 |
trailing spaces, without comments, produce an error.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(3) Doesn't handle multiple commands within braces. E.g.,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
1,4{=;G;}
|
|
|
d0cde9 |
|
|
|
d0cde9 |
will produce perl code with missing braces, and miss the second ("G")
|
|
|
d0cde9 |
command as well. In fact, any commands after the first one are
|
|
|
d0cde9 |
missed in the perl output script, and the output perl script will
|
|
|
d0cde9 |
also contain mismatched braces.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
3.7. GNU/POSIX extensions to regular expressions
|
|
|
d0cde9 |
|
|
|
d0cde9 |
GNU sed supports "character classes" in addition to regular
|
|
|
d0cde9 |
character sets, such as [0-9A-F]. Like regular character sets,
|
|
|
d0cde9 |
character classes represent any single character within a set.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
"Character classes are a new feature introduced in the POSIX
|
|
|
d0cde9 |
standard. A character class is a special notation for describing
|
|
|
d0cde9 |
lists of characters that have a specific attribute, but where the
|
|
|
d0cde9 |
actual characters themselves can vary from country to country
|
|
|
d0cde9 |
and/or from character set to character set. For example, the notion
|
|
|
d0cde9 |
of what is an alphabetic character differs in the USA and in
|
|
|
d0cde9 |
France." [quoted from the docs for GNU awk v3.1.0.]
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Though character classes don't generally conserve space on the
|
|
|
d0cde9 |
line, they help make scripts portable for international use. The
|
|
|
d0cde9 |
equivalent character sets _for U.S. users_ follow:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
[[:alnum:]] - [A-Za-z0-9] Alphanumeric characters
|
|
|
d0cde9 |
[[:alpha:]] - [A-Za-z] Alphabetic characters
|
|
|
d0cde9 |
[[:blank:]] - [ \x09] Space or tab characters only
|
|
|
d0cde9 |
[[:cntrl:]] - [\x00-\x1F\x7F] Control characters
|
|
|
d0cde9 |
[[:digit:]] - [0-9] Numeric characters
|
|
|
d0cde9 |
[[:graph:]] - [!-~] Printable and visible characters
|
|
|
d0cde9 |
[[:lower:]] - [a-z] Lower-case alphabetic characters
|
|
|
d0cde9 |
[[:print:]] - [ -~] Printable (non-Control) characters
|
|
|
d0cde9 |
[[:punct:]] - [!-/:-@[-`{-~] Punctuation characters
|
|
|
d0cde9 |
[[:space:]] - [ \t\n\v\f\r] All whitespace chars
|
|
|
d0cde9 |
[[:upper:]] - [A-Z] Upper-case alphabetic characters
|
|
|
d0cde9 |
[[:xdigit:]] - [0-9a-fA-F] Hexadecimal digit characters
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Note that [[:graph:]] does not match the space " ", but [[:print:]]
|
|
|
d0cde9 |
does. Some character classes may (or may not) match characters in
|
|
|
d0cde9 |
the high ASCII range (ASCII 128-255 or 0x80-0xFF), depending on
|
|
|
d0cde9 |
which C library was used to compile sed. For non-English languages,
|
|
|
d0cde9 |
[[:alpha:]] and other classes may also match high ASCII characters.
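A brief sketch of character classes in use, run in the C locale so the
results match the U.S. table above. The sample strings and variable
names are made up:

```shell
# Digits become "#", space and tab become "_", capitals become "U".
classes=$(printf 'abc 123\tXYZ\n' |
  LC_ALL=C sed 's/[[:digit:]]/#/g; s/[[:blank:]]/_/g; s/[[:upper:]]/U/g')

# [[:graph:]] skips the space; [[:print:]] includes it.
graph=$(printf 'a b\n' | LC_ALL=C sed 's/[[:graph:]]/G/g')
print=$(printf 'a b\n' | LC_ALL=C sed 's/[[:print:]]/P/g')

printf '%s\n' "$classes" "$graph" "$print"
```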
|
|
|
d0cde9 |
|
|
|
d0cde9 |
------------------------------
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4. EXAMPLES
|
|
|
d0cde9 |
|
|
|
d0cde9 |
ONE-CHARACTER QUESTIONS
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.1. How do I insert a newline into the RHS of a substitution?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Several versions of sed permit '\n' to be typed directly into the
|
|
|
d0cde9 |
RHS, which is then converted to a newline on output: ssed,
|
|
|
d0cde9 |
gsed302a+, gsed103 (with the -x switch), sed15+, sedmod, and
|
|
|
d0cde9 |
UnixDOS sed. The _easiest_ solution is to use one of these
|
|
|
d0cde9 |
versions.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
For other versions of sed, try one of the following:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(a) If typing the sed script from a Bourne shell, use one backslash
|
|
|
d0cde9 |
"\" if the script uses 'single quotes' or two backslashes "\\" if
|
|
|
d0cde9 |
the script requires "double quotes". In the example below, note
|
|
|
d0cde9 |
that the leading '>' on the 2nd line is generated by the shell to
|
|
|
d0cde9 |
prompt the user for more input. The user types in slash,
|
|
|
d0cde9 |
single-quote, and then ENTER to terminate the command:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
[sh-prompt]$ echo twolines | sed 's/two/& new\
|
|
|
d0cde9 |
>/'
|
|
|
d0cde9 |
two new
|
|
|
d0cde9 |
lines
|
|
|
d0cde9 |
[sh-prompt]$
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(b) Use a script file with one backslash '\' in the script,
|
|
|
d0cde9 |
immediately followed by a newline. This will embed a newline into
|
|
|
d0cde9 |
the "replace" portion. Example:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -f newline.sed files
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# newline.sed
|
|
|
d0cde9 |
s/twolines/two new\
|
|
|
d0cde9 |
lines/g
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Some versions of sed may not need the trailing backslash. If so,
|
|
|
d0cde9 |
remove it.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(c) Insert an unused character and pipe the output through tr:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
echo twolines | sed 's/two/& new=/' | tr "=" "\n" # produces
|
|
|
d0cde9 |
two new
|
|
|
d0cde9 |
lines
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(d) Use the "G" command:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
G appends a newline, plus the contents of the hold space to the end
|
|
|
d0cde9 |
of the pattern space. If the hold space is empty, a newline is
|
|
|
d0cde9 |
appended anyway. The newline is stored in the pattern space as "\n"
|
|
|
d0cde9 |
where it can be addressed by grouping "\(...\)" and moved in the
|
|
|
d0cde9 |
RHS. Thus, to change the "twolines" example used earlier, the
|
|
|
d0cde9 |
following script will work:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed '/twolines/{G;s/\(two\)\(lines\)\(\n\)/\1\3\2/;}'
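Run against the earlier "twolines" input, the G-based script behaves
as sketched here (the variable name is an assumption, and the hold
space is assumed to be empty):

```shell
# G appends "\n"; the s/// then moves that newline between the words.
split=$(echo twolines | sed '/twolines/{G;s/\(two\)\(lines\)\(\n\)/\1\3\2/;}')
printf '%s\n' "$split"
```

The output is "two" and "lines" on separate lines.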
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(e) Inserting full lines, not breaking lines up:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If one is not *changing* lines but only inserting complete lines
|
|
|
d0cde9 |
before or after a pattern, the procedure is much easier. Use the
|
|
|
d0cde9 |
"i" (insert) or "a" (append) command, making the alterations by an
|
|
|
d0cde9 |
external script. To insert "This line is new" BEFORE each line
|
|
|
d0cde9 |
matching a regex:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/RE/i This line is new # HHsed, sedmod, gsed 3.02a
|
|
|
d0cde9 |
/RE/{x;s/$/This line is new/;G;} # other seds
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The two examples above are intended as "one-line" commands entered
|
|
|
d0cde9 |
from the console. If using a sed script, "i\" immediately followed
|
|
|
d0cde9 |
by a literal newline will work on all versions of sed. Furthermore,
|
|
|
d0cde9 |
the command "s/$/This line is new/" will only work if the hold
|
|
|
d0cde9 |
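space is already empty (which it is by default). In the "x" example
above, the hold space holds the matched line after the first match, so
use "s/.*/This line is new/" instead if /RE/ can match more than once.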
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To append "This line is new" AFTER each line matching a regex:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/RE/a This line is new # HHsed, sedmod, gsed 3.02a
|
|
|
d0cde9 |
/RE/{G;s/$/This line is new/;} # other seds
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To append 2 blank lines after each line matching a regex:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/RE/{G;G;} # assumes the hold space is empty
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To replace each line matching a regex with 5 blank lines:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/RE/{s/.*//;G;G;G;G;} # assumes the hold space is empty
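The following sketch exercises the insert and append forms on a
made-up input where /RE/ matches twice. It uses the "i\" and "a\"
forms followed by a literal newline, plus a hold-space variant that
overwrites the swapped-in text with "s/.*/.../" so it still works on
the second match. The function and variable names are assumptions:

```shell
sample() { printf 'RE one\nplain\nRE two\n'; }

inserted=$(sample | sed '/RE/i\
This line is new')

appended=$(sample | sed '/RE/a\
This line is new')

# Hold-space variant: same result as "i\", even for repeated matches.
held=$(sample | sed '/RE/{x;s/.*/This line is new/;G;}')

printf '%s\n' "$inserted" "--" "$appended"
```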
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(f) Use the "y///" command if possible:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
On some Unix versions of sed (not GNU sed!), though the s///
|
|
|
d0cde9 |
command won't accept '\n' in the RHS, the y/// command does. If
|
|
|
d0cde9 |
your Unix sed supports it, a newline after "aaa" can be inserted
|
|
|
d0cde9 |
this way (which is not portable to GNU sed or other seds):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
s/aaa/&~/; y/~/\n/; # assuming no other '~' is on the line!
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.2. How do I represent control-codes or nonprintable characters?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Several versions of sed support the notation \xHH, where "HH" are
|
|
|
d0cde9 |
two hex digits, 00-FF: ssed, GNU sed v3.02.80 and above, GNU sed
|
|
|
d0cde9 |
v1.03, sed16 and sed15 (HHsed). Try to use one of those versions.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Sed is not intended to process binary or object code, and files
|
|
|
d0cde9 |
which contain nulls (0x00) will usually generate errors in most
|
|
|
d0cde9 |
versions of sed. The latest versions of GNU sed and ssed are an
|
|
|
d0cde9 |
exception; they permit nulls in the input files and also in
|
|
|
d0cde9 |
regexes.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
On Unix platforms, the 'echo' command may allow insertion of octal
|
|
|
d0cde9 |
or hex values, e.g., `echo "\0nnn"` or `echo -n "\0nnn"`. The echo
|
|
|
d0cde9 |
command may also support syntax like '\\b' or '\\t' for backspace
|
|
|
d0cde9 |
or tab characters. Check the man pages to see what syntax your
|
|
|
d0cde9 |
version of echo supports. Some versions support the following:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# replace 0x1A (32 octal) with ASCII letters
|
|
|
d0cde9 |
sed 's/'`echo "\032"`'/Ctrl-Z/g'
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# note the 3 backslashes in the command below
|
|
|
d0cde9 |
sed "s/.`echo \\\b`//g"
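Where echo's escape handling is uncertain, printf is one alternative,
since its octal escapes are specified by POSIX. This is a sketch, not
part of the original recipe; the variable names are made up:

```shell
ctrl_z=$(printf '\032')     # a literal 0x1A byte (32 octal)

# Double quotes let the shell place the raw byte into the sed script.
visible=$(printf 'a\032b\n' | sed "s/$ctrl_z/Ctrl-Z/g")
printf '%s\n' "$visible"    # prints: aCtrl-Zb
```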
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.3. How do I convert files with toggle characters, like +this+, to
|
|
|
d0cde9 |
look like [i]this[/i]?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Input files, especially message-oriented text files, often contain
|
|
|
d0cde9 |
toggle characters for emphasis, like ~this~, *this*, or =this=. Sed
|
|
|
d0cde9 |
can make the same input pattern produce alternating output each
|
|
|
d0cde9 |
time it is encountered. Typical needs might be to generate HTML
|
|
|
d0cde9 |
codes or print codes for boldface, italic, or underscore. This
|
|
|
d0cde9 |
script accommodates multiple occurrences of the toggle pattern on
|
|
|
d0cde9 |
the same line, as well as cases where the pattern starts on one
|
|
|
d0cde9 |
line and finishes several lines later, even at the end of the file:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to convert +this+ to [i]this[/i]
|
|
|
d0cde9 |
:a
|
|
|
d0cde9 |
/+/{ x; # If "+" is found, switch hold and pattern space
|
|
|
d0cde9 |
/^ON/{ # If "ON" is in the (former) hold space, then ..
|
|
|
d0cde9 |
s///; # .. delete it
|
|
|
d0cde9 |
x; # .. switch hold space and pattern space back
|
|
|
d0cde9 |
s|+|[/i]|; # .. turn the next "+" into "[/i]"
|
|
|
d0cde9 |
ba; # .. jump back to label :a and start over
|
|
|
d0cde9 |
}
|
|
|
d0cde9 |
s/^/ON/; # Else, "ON" was not in the hold space; create it
|
|
|
d0cde9 |
x; # Switch hold space and pattern space
|
|
|
d0cde9 |
s|+|[i]|; # Turn the first "+" into "[i]"
|
|
|
d0cde9 |
ba; # Branch to label :a to find another pattern
|
|
|
d0cde9 |
}
|
|
|
d0cde9 |
#---end of script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
This script uses the hold space to create a "flag" to indicate
|
|
|
d0cde9 |
whether the toggle is ON or not. We have added remarks to
|
|
|
d0cde9 |
illustrate the script logic, but in most versions of sed remarks
|
|
|
d0cde9 |
are not permitted after 'b'ranch commands or labels.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If you are sure that the +toggle+ characters never cross line
|
|
|
d0cde9 |
boundaries (i.e., never begin on one line and end on another), this
|
|
|
d0cde9 |
script can be reduced to one line:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
s|+\([^+][^+]*\)+|[i]\1[/i]|g
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If your toggle pattern contains regex metacharacters (such as '*'
|
|
|
d0cde9 |
or perhaps '+' or '?'), remember to quote them with backslashes.
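Both forms can be tried on small made-up inputs. In this sketch the
long script is stored in a shell variable with the remarks removed,
one command per line, so that it should run on seds that do not allow
comments after branch commands:

```shell
toggle_script=':a
/+/{
x
/^ON/{
s///
x
s|+|[/i]|
ba
}
s/^/ON/
x
s|+|[i]|
ba
}'

# A toggle that opens on line 1 and closes on line 2, then a full pair.
spanning=$(printf 'one +two\nthree+ four +five+\n' | sed "$toggle_script")

# The one-line form, for toggles that never cross a line boundary.
same_line=$(echo 'a +b+ c +d+' | sed 's|+\([^+][^+]*\)+|[i]\1[/i]|g')

printf '%s\n' "$spanning" "--" "$same_line"
```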
|
|
|
d0cde9 |
|
|
|
d0cde9 |
CHANGING STRINGS
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.10. How do I perform a case-insensitive search?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Several versions of sed support case-insensitive matching: ssed and
|
|
|
d0cde9 |
GNU sed v3.02+ (with I flag after s/// or /regex/); sedmod with the
|
|
|
d0cde9 |
-i switch; and sed16 (which supports both types of switches).
|
|
|
d0cde9 |
|
|
|
d0cde9 |
With other versions of sed, case-insensitive searching is awkward,
|
|
|
d0cde9 |
so people may use awk or perl instead, since these programs have
|
|
|
d0cde9 |
options for case-insensitive searches. In gawk, use "BEGIN
|
|
|
d0cde9 |
{IGNORECASE=1}" and in perl, "/regex/i". For other seds, here are
|
|
|
d0cde9 |
three solutions:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Solution 1: convert everything to upper case and search normally
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script, solution 1
|
|
|
d0cde9 |
h; # copy the original line to the hold space
|
|
|
d0cde9 |
# convert the pattern space to solid caps
|
|
|
d0cde9 |
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
|
|
|
d0cde9 |
# now we can search for the word "CARLOS"
|
|
|
d0cde9 |
/CARLOS/ {
|
|
|
d0cde9 |
# add or insert lines. Note: "s/.../.../" will not work
|
|
|
d0cde9 |
# here because we are searching a modified pattern
|
|
|
d0cde9 |
# space and are not printing the pattern space.
|
|
|
d0cde9 |
}
|
|
|
d0cde9 |
x; # get back the original pattern space
|
|
|
d0cde9 |
# the original pattern space will be printed
|
|
|
d0cde9 |
#---end of sed script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Solution 2: search for both cases
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Often, proper names will either be written in all lower-case ("unix"),
|
|
|
d0cde9 |
with an initial capital letter ("Unix") or occur in solid caps
|
|
|
d0cde9 |
("UNIX"). There may be no need to search for every possibility.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/UNIX/b match
|
|
|
d0cde9 |
/[Uu]nix/b match
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Solution 3: search for all possible cases
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# If you must, search for any possible combination
|
|
|
d0cde9 |
/[Cc][Aa][Rr][Ll][Oo][Ss]/ { ... }
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Bear in mind that as the pattern length increases, this solution
|
|
|
d0cde9 |
becomes an order of magnitude slower than Solution 1, at
|
|
|
d0cde9 |
least with some implementations of sed.
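As one concrete use of Solution 1, this sketch prints every line that
contains "carlos" in any mix of cases, in its original spelling. The
sample input and variable name are made up:

```shell
found=$(printf 'Carlos here\nnobody\ncArLoS there\n' | sed -n 'h
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
/CARLOS/{
x
p
}')
printf '%s\n' "$found"
```

The test runs against the upper-cased copy, and "x" brings back the
untouched original from the hold space before it is printed.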
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.11. How do I match only the first occurrence of a pattern?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(1) The general solution is to use GNU sed or ssed, with one of
|
|
|
d0cde9 |
these range expressions. The first script ("print only the first
|
|
|
d0cde9 |
match") works with any version of sed:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -n '/RE/{p;q;}' file # print only the first match
|
|
|
d0cde9 |
sed '0,/RE/{//d;}' file # delete only the first match
|
|
|
d0cde9 |
sed '0,/RE/s//to_that/' file # change only the first match
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(2) If you cannot use GNU sed and if you *know* the pattern will
|
|
|
d0cde9 |
not occur on the first line, this will work:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed '1,/RE/{//d;}' file # delete only the first match
|
|
|
d0cde9 |
sed '1,/RE/s//to_that/' file # change only the first match
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(3) If you cannot use GNU sed and the pattern *might* occur on the
|
|
|
d0cde9 |
first line, use one of the following commands (credit for short GNU
|
|
|
d0cde9 |
script goes to Donald Bruce Stewart):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed '/RE/{x;/Y/!{s/^/Y/;h;d;};x;}' file # delete (one way)
|
|
|
d0cde9 |
sed -e '/RE/{$d;N;s/.*\n//;:a' -e '$!N;$!ba' -e '}' file # delete (another way)
|
|
|
d0cde9 |
sed '/RE/{$d;N;s/.*\n//;:a;$!N;$!ba;}' file # same script, GNU sed
|
|
|
d0cde9 |
sed -e '/RE/{s//to_that/;:a' -e '$!N;$!ba' -e '}' file # change
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Still another solution, using a flag in the hold space. This is
|
|
|
d0cde9 |
portable to all seds and works if the pattern is on the first line:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to change "foo" to "bar" only on the first occurrence
|
|
|
d0cde9 |
1{x;s/^/first/;x;}
|
|
|
d0cde9 |
1,/foo/{x;/first/s///;x;s/foo/bar/;}
|
|
|
d0cde9 |
#---end of script---
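A variant that tests the flag explicitly is sketched below. It is not
the script above but a rearrangement of the same idea: the hold space
starts out empty, and the first match stores "done" there so that
later matches are left alone. The input and names are made up:

```shell
first_only=$(printf 'foo 1\nfoo 2\nx foo\n' | sed '/foo/{
x
/^$/{
s/^/done/
x
s/foo/bar/
b
}
x
}')
printf '%s\n' "$first_only"
```

Only the "foo" on the first line is changed, even though the pattern
occurs on the first line and again later.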
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.12. How do I parse a comma-delimited (CSV) data file?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Comma-delimited data files can come in several forms, requiring
|
|
|
d0cde9 |
increasing levels of complexity in parsing and handling. They are
|
|
|
d0cde9 |
often referred to as CSV files (for "comma separated values") and
|
|
|
d0cde9 |
occasionally as SDF files (for "standard data format"). Note that
|
|
|
d0cde9 |
some vendors use "SDF" to refer to variable-length records with
|
|
|
d0cde9 |
comma-separated fields which are "double-quoted" if they contain
|
|
|
d0cde9 |
character values, while other vendors use "SDF" to designate
|
|
|
d0cde9 |
fixed-length records with fixed-length, nonquoted fields! (For help
|
|
|
d0cde9 |
with fixed-length fields, see question 4.13.)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The term "CSV" became a de-facto standard when Microsoft Excel used
|
|
|
d0cde9 |
it as an optional output file format.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Here are 4 different forms you may encounter in comma-delimited data:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(a) No quotes, no internal commas
|
|
|
d0cde9 |
|
|
|
d0cde9 |
1001,John Smith,PO Box 123,Chicago,IL,60699
|
|
|
d0cde9 |
1002,Mary Jones,320 Main,Denver,CO,84100
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(b) Like (a), with quotes around each field
|
|
|
d0cde9 |
|
|
|
d0cde9 |
"1003","John Smith","PO Box 123","Chicago","IL","60699"
|
|
|
d0cde9 |
"1004","Mary Jones","320 Main","Denver","CO","84100"
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(c) Like (b), with embedded commas
|
|
|
d0cde9 |
|
|
|
d0cde9 |
"1005","Tom Hall, Jr.","61 Ash Ct.","Niles","OH","44446"
|
|
|
d0cde9 |
"1006","Bob Davis","429 Pine, Apt. 5","Boston","MA","02128"
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(d) Like (c), with embedded commas and quotes
|
|
|
d0cde9 |
|
|
|
d0cde9 |
"1007","Sue "Red" Smith","19 Main","Troy","MI","48055"
|
|
|
d0cde9 |
"1008","Joe "Hey, guy!" Hall","POB 44","Reno","NV","89504"
|
|
|
d0cde9 |
|
|
|
d0cde9 |
In each example above, we have 6 fields and 5 commas which function
|
|
|
d0cde9 |
as field separators. Case (c) is a very typical form of these data
|
|
|
d0cde9 |
files, with double quotes used to enclose each field and to protect
|
|
|
d0cde9 |
internal commas (such as "Tom Hall, Jr.") from interpretation as
|
|
|
d0cde9 |
field separators. However, many times the data may include both
|
|
|
d0cde9 |
embedded quotation marks and embedded commas, as seen in
|
|
|
d0cde9 |
case (d), above.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Case (d) is the closest to Microsoft CSV format. *However*, the
|
|
|
d0cde9 |
Microsoft CSV format allows embedded newlines within a
|
|
|
d0cde9 |
double-quoted field. If embedded newlines within fields are a
|
|
|
d0cde9 |
possibility for your data, you should consider using something
|
|
|
d0cde9 |
other than sed to work with the data file.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Before handling a comma-delimited data file, make sure that you
|
|
|
d0cde9 |
fully understand its format and check the integrity of the data.
|
|
|
d0cde9 |
Does each line contain the same number of fields? Should certain
|
|
|
d0cde9 |
fields be composed only of numbers or of two-letter state
|
|
|
d0cde9 |
abbreviations in all caps? Sed (or awk or perl) should be used to
|
|
|
d0cde9 |
validate the integrity of the data file before you attempt to alter
|
|
|
d0cde9 |
it or extract particular fields from the file.
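For case (a) data, one way to check the field count is sketched here.
It prints every line that does not have exactly 6 comma-separated
fields; the second record is a made-up bad line with only 5 fields,
and the variable name is an assumption:

```shell
bad=$(printf '%s\n' \
  '1001,John Smith,PO Box 123,Chicago,IL,60699' \
  '1002,Mary Jones,320 Main,Denver,CO' |
  sed -n '/^[^,]*\(,[^,]*\)\{5\}$/!p')
printf '%s\n' "$bad"
```

An empty result means every line has the expected number of fields.
This check only suits data without embedded commas.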
|
|
|
d0cde9 |
|
|
|
d0cde9 |
After ensuring that each line has a valid number of fields, use sed
|
|
|
d0cde9 |
to locate and modify individual fields, using the \(...\) grouping
|
|
|
d0cde9 |
command where needed.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
In case (a):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed 's/^[^,]*,[^,]*,[^,]*,[^,]*,/.../'
|
|
|
d0cde9 |
(Each "[^,]*," above matches one field and its comma: the 1st field,
then the 2nd, then the 3rd, and so on.)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# Unix script to delete the second field for case (a)
|
|
|
d0cde9 |
sed 's/^\([^,]*\),[^,]*,/\1,,/' file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# Unix script to change field 1 to 9999 for case (a)
|
|
|
d0cde9 |
sed 's/^[^,]*,/9999,/' file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
In cases (b) and (c):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed 's/^"[^"]*","[^"]*","[^"]*","[^"]*",/.../'
|
|
|
d0cde9 |
1st-- 2nd-- 3rd-- 4th--
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# Unix script to delete the second field for case (c)
|
|
|
d0cde9 |
sed 's/^\("[^"]*"\),"[^"]*",/\1,"",/' file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# Unix script to change field 1 to 9999 for case (c)
|
|
|
d0cde9 |
sed 's/^"[^"]*",/"9999",/' file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
|
|
|
d0cde9 |
In case (d):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
One way to parse such files is to replace the 3-character field
|
|
|
d0cde9 |
separator "," with an unused character like the tab or vertical
|
|
|
d0cde9 |
bar. (Technically, the field separator is only the comma while the
|
|
|
d0cde9 |
fields are surrounded by "double quotes", but the net _effect_ is
|
|
|
d0cde9 |
that fields are separated by quote-comma-quote, with quote
|
|
|
d0cde9 |
characters added to the beginning and end of each record.) Search
|
|
|
d0cde9 |
your datafile _first_ to make sure that your character appears
|
|
|
d0cde9 |
nowhere in it!
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -n '/|/p' file # search for any instance of '|'
|
|
|
d0cde9 |
# if it's not found, we can use the '|' to separate fields
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Then replace the 3-character field separator and parse as before:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to delete the second field for case (d)
|
|
|
d0cde9 |
s/","/|/g; # global change of "," to bar
|
|
|
d0cde9 |
s/^\([^|]*\)|[^|]*|/\1||/; # delete 2nd field
|
|
|
d0cde9 |
s/|/","/g; # global change of bar back to ","
|
|
|
d0cde9 |
#---end of script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to change field 1 to 9999 for case (d)
|
|
|
d0cde9 |
# Remember to accommodate leading and trailing quote marks
|
|
|
d0cde9 |
s/","/|/g;
|
|
|
d0cde9 |
s/^[^|]*|/"9999|/;
|
|
|
d0cde9 |
s/|/","/g;
|
|
|
d0cde9 |
#---end of script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Note that this technique works only if _each_ and _every_ field is
|
|
|
d0cde9 |
surrounded with double quotes, including empty fields.
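The bar-substitution technique can be checked on one of the case (d)
records. This is a sketch: the three steps are given as a single
script, and the delete step uses "[^|]*" so that a field of any
length is matched. Variable names are made up:

```shell
rec='"1007","Sue "Red" Smith","19 Main","Troy","MI","48055"'

# Change field 1 to 9999.
renumbered=$(printf '%s\n' "$rec" |
  sed 's/","/|/g; s/^[^|]*|/"9999|/; s/|/","/g')

# Empty out field 2.
blanked=$(printf '%s\n' "$rec" |
  sed 's/","/|/g; s/^\([^|]*\)|[^|]*|/\1||/; s/|/","/g')

printf '%s\n' "$renumbered" "$blanked"
```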
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The following solution is for more complex examples of (d), such
|
|
|
d0cde9 |
as: not all fields contain "double-quote" marks, or the presence of
|
|
|
d0cde9 |
embedded "double-quote" marks within fields, or extraneous
|
|
|
d0cde9 |
whitespace around field delimiters. (Thanks to Greg Ubben for this
|
|
|
d0cde9 |
script!)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to convert case (d) to bar-delimited records
|
|
|
d0cde9 |
s/^ *\(.*[^ ]\) *$/|\1|/;
|
|
|
d0cde9 |
s/" *, */"|/g;
|
|
|
d0cde9 |
: loop
|
|
|
d0cde9 |
s/| *\([^",|][^,|]*\) *, */|\1|/g;
|
|
|
d0cde9 |
s/| *, */||/g;
|
|
|
d0cde9 |
t loop
|
|
|
d0cde9 |
s/ *|/|/g;
|
|
|
d0cde9 |
s/| */|/g;
|
|
|
d0cde9 |
s/^|\(.*\)|$/\1/;
|
|
|
d0cde9 |
#---end of script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
For example, it turns this (which is badly-formed but legal):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
first,"",unquoted ,""this" is, quoted " ,, sub "quote" inside, f", lone " empty:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
into this:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
first|""|unquoted|""this" is, quoted "||sub "quote" inside|f"|lone " empty:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Note that the script preserves the "double-quote" marks, but
|
|
|
d0cde9 |
changes only the commas where they are used as field separators. I
|
|
|
d0cde9 |
have used the vertical bar "|" because it's easier to read, but you
|
|
|
d0cde9 |
may change this to another field separator if you wish.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If your CSV datafile is more complex, it would probably not be
|
|
|
d0cde9 |
worth the effort to write it in sed. For such a case, you should
|
|
|
d0cde9 |
use Perl with a dedicated CSV module (there are at least two recent
|
|
|
d0cde9 |
CSV parsers available from CPAN).
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.13. How do I handle fixed-length, columnar data?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Sed handles fixed-length fields via \(grouping\) and backreferences
|
|
|
d0cde9 |
(\1, \2, \3 ...). If we have 3 fields of 10, 25, and 9 characters
|
|
|
d0cde9 |
per field, our sed script might look like so:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
s/^\(.\{10\}\)\(.\{25\}\)\(.\{9\}\)/\3\2\1/;  # Change the fields
   ^^^^^^^^^^^~~~~~~~~~~~==========           #   from 1,2,3 to 3,2,1
    field #1   field #2   field #3
|
|
|
d0cde9 |
|
|
|
d0cde9 |
This is a bit hard to read. By using GNU sed or ssed with the -r
|
|
|
d0cde9 |
switch active, it can look like this:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
s/^(.{10})(.{25})(.{9})/\3\2\1/; # Using the -r switch
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To delete a field in sed, use grouping and omit the backreference
|
|
|
d0cde9 |
from the field to be deleted. If the data is long or difficult to
|
|
|
d0cde9 |
work with, use ssed with the -R switch and the /x flag after an s///
|
|
|
d0cde9 |
command, to insert comments and remarks about the fields.
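As a small sketch of both operations (reordering and deleting), the commands below use field widths of 3, 4, and 2 characters so that the result is easy to check by eye. The widths, the sample record, and the variable names are assumptions chosen for illustration only.

```shell
# A 9-character record: field 1 = AAA, field 2 = BBBB, field 3 = CC.
record='AAABBBBCC'

# Reorder the fields from 1,2,3 to 3,2,1.
swapped=$(printf '%s\n' "$record" |
  sed 's/^\(.\{3\}\)\(.\{4\}\)\(.\{2\}\)/\3\2\1/')

# Delete field 2: match it without grouping it, and leave it out of the RHS.
trimmed=$(printf '%s\n' "$record" |
  sed 's/^\(.\{3\}\).\{4\}\(.\{2\}\)/\1\2/')

printf '%s\n' "$swapped" "$trimmed"
```

The same pattern scales to the 10/25/9 layout above by changing the three interval counts.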
|
|
|
d0cde9 |
|
|
|
d0cde9 |
For records with many fields, use GNU awk with the FIELDWIDTHS
|
|
|
d0cde9 |
variable set in the top of the script. For example:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
awk 'BEGIN{FIELDWIDTHS = "10 25 9"}; {print $3 $2 $1}' file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
This is much easier to read than a similar sed script, especially
|
|
|
d0cde9 |
if there are more than 5 or 6 fields to manipulate.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.14. How do I commify a string of numbers?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Use the simplest script necessary to accomplish your task. As
|
|
|
d0cde9 |
variations of the line increase, the sed script must become more
|
|
|
d0cde9 |
complex to handle additional conditions. Whole numbers are
|
|
|
d0cde9 |
simplest, followed by decimal formats, followed by embedded words.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Case 1: simple strings of whole numbers separated by spaces or
|
|
|
d0cde9 |
commas, with an optional negative sign. To convert this:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4381, -1222333, and 70000: - 44555666 1234567890 words
|
|
|
d0cde9 |
56890 -234567, and 89222 -999777 345888777666 chars
|
|
|
d0cde9 |
|
|
|
d0cde9 |
to this:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4,381, -1,222,333, and 70,000: - 44,555,666 1,234,567,890 words
|
|
|
d0cde9 |
56,890 -234,567, and 89,222 -999,777 345,888,777,666 chars
|
|
|
d0cde9 |
|
|
|
d0cde9 |
use one of these one-liners:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed ':a;s/\B[0-9]\{3\}\>/,&/;ta' # GNU sed
|
|
|
d0cde9 |
sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta' # other seds
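To see how the loop in the second one-liner works, here is a short sketch wrapped in a shell function (the name commify_ints is hypothetical). Each pass inserts one comma before the rightmost group of three digits that still has a digit in front of it; "ta" repeats until no substitution is made.

```shell
# commify_ints: hypothetical wrapper around the "other seds" one-liner.
commify_ints() {
  sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

printf '%s\n' '4381 -1222333 and 70000' | commify_ints
```

Because the pattern knows nothing about decimal points, it would also commify the fractional part of a number such as 7.18281828. That is the situation Case 2 addresses.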
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Case 2: strings of numbers which may have an embedded decimal
|
|
|
d0cde9 |
point, separated by spaces or commas, with an optional negative
|
|
|
d0cde9 |
sign. To change this:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4381, -6555.1212 and 70000, 7.18281828 44906982.071902
|
|
|
d0cde9 |
56890 -2345.7778 and 8.0000: -49000000 -1234567.89012
|
|
|
d0cde9 |
|
|
|
d0cde9 |
to this:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4,381, -6,555.1212 and 70,000, 7.18281828 44,906,982.071902
|
|
|
d0cde9 |
56,890 -2,345.7778 and 8.0000: -49,000,000 -1,234,567.89012
|
|
|
d0cde9 |
|
|
|
d0cde9 |
use the following command for GNU sed:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed ':a;s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g;ta'
|
|
|
d0cde9 |
|
|
|
d0cde9 |
and for other versions of sed:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -f case2.sed files
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# case2.sed
|
|
|
d0cde9 |
s/^/ /; # add space to start of line
|
|
|
d0cde9 |
:a
|
|
|
d0cde9 |
s/\( [-0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/g
|
|
|
d0cde9 |
ta
|
|
|
d0cde9 |
s/ //; # remove space from start of line
|
|
|
d0cde9 |
#---end of script---
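The case2.sed script can also be tried inline, as in the sketch below. The function name commify_decimals is an assumption. The script works because a group of digits is only commified when it is preceded by a space, so the digits after a decimal point are never touched.

```shell
# commify_decimals: hypothetical wrapper holding the case2.sed commands.
commify_decimals() {
  sed '
    s/^/ /
    :a
    s/\( [-0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/g
    ta
    s/ //
  '
}

printf '%s\n' '4381, -6555.1212 and 70000, 7.18281828' | commify_decimals
```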
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.15. How do I prevent regex expansion on substitutions?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Sometimes you want to *match* regular expression metacharacters as
|
|
|
d0cde9 |
literals (e.g., you want to match "[0-9]" or "\n"), to be replaced
|
|
|
d0cde9 |
with something else. The ordinary way to prevent expanding
|
|
|
d0cde9 |
metacharacters is to prefix them with a backslash. Thus, if "\n"
|
|
|
d0cde9 |
matches a newline, "\\n" will match the two-character string of
|
|
|
d0cde9 |
'backslash' followed by 'n'.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
But doing this repeatedly can become tedious if there are many
|
|
|
d0cde9 |
regexes. The following script will replace alternating strings of
|
|
|
d0cde9 |
literals, where no character is interpreted as a regex
|
|
|
d0cde9 |
metacharacter:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# filename: sub_quote.sed
|
|
|
d0cde9 |
# author: Paolo Bonzini
|
|
|
d0cde9 |
# sed script to add backslash to find/replace metacharacters
|
|
|
d0cde9 |
N; # add even numbered line to pattern space
|
|
|
d0cde9 |
s,[]/\\$*[],\\&,g; # quote all of [, ], /, \, $, or *
|
|
|
d0cde9 |
s,^,s/,; # prepend "s/" to front of pattern space
|
|
|
d0cde9 |
s,$,/,; # append "/" to end of pattern space
|
|
|
d0cde9 |
s,\n,/,; # change "\n" to "/", making s/from/to/
|
|
|
d0cde9 |
#---end of script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Here's a sample of how sub_quote.sed might be used. This example
|
|
|
d0cde9 |
converts typical sed regexes to perl-style regexes. The input file
|
|
|
d0cde9 |
consists of 10 lines:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
[0-9]
|
|
|
d0cde9 |
\d
|
|
|
d0cde9 |
[^0-9]
|
|
|
d0cde9 |
\D
|
|
|
d0cde9 |
\+
|
|
|
d0cde9 |
+
|
|
|
d0cde9 |
\?
|
|
|
d0cde9 |
?
|
|
|
d0cde9 |
\|
|
|
|
d0cde9 |
|
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Run the command "sed -f sub_quote.sed input", to transform the
|
|
|
d0cde9 |
input file (above) to 5 lines of output:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
s/\[0-9\]/\\d/
|
|
|
d0cde9 |
s/\[^0-9\]/\\D/
|
|
|
d0cde9 |
s/\\+/+/
|
|
|
d0cde9 |
s/\\?/?/
|
|
|
d0cde9 |
s/\\|/|/
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The above file is itself a sed script, which can then be used to
|
|
|
d0cde9 |
modify other files.
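The two-stage use of sub_quote.sed can be sketched as follows. The function name quote_pairs and the sample text are assumptions. The quoting command carries the "g" flag, which the sample output above requires, since both "[" and "]" on one line are escaped.

```shell
# quote_pairs: hypothetical wrapper holding the sub_quote.sed commands.
# Input: alternating lines of literal "find" and "replace" strings.
# Output: one s/find/replace/ command per pair.
quote_pairs() {
  sed '
    N
    s,[]/\\$*[],\\&,g
    s,^,s/,
    s,$,/,
    s,\n,/,
  '
}

# Stage 1: build a sed script from one find/replace pair.
generated=$(printf '%s\n' '[0-9]' '\d' | quote_pairs)

# Stage 2: apply the generated script to some text.
result=$(printf '%s\n' 'x[0-9]y' | sed "$generated")

printf '%s\n' "$generated" "$result"
```

Note that the script escapes only the characters listed in its bracket expression. A find string containing "." or "^", or a replacement containing "&", would still need attention by hand.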
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.16. How do I convert a string to all lowercase or capital letters?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The easiest method is to use a new version of GNU sed, ssed, sedmod
|
|
|
d0cde9 |
or sed16 and employ the \U, \L, or other switches on the right side
|
|
|
d0cde9 |
of an s/// command. For example, to convert any word which begins
|
|
|
d0cde9 |
with "reg" or "exp" into solid capital letters:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -r "s/\<(reg|exp)[a-z]+/\U&/g" # gsed4.+ or ssed
|
|
|
d0cde9 |
sed "s/\<reg[a-z]+/\U&/g; s/\<exp[a-z]+/\U&/g" # sedmod, sed16
|
|
|
d0cde9 |
|
|
|
d0cde9 |
As you can see, sedmod and sed16 do not support alternation (|),
|
|
|
d0cde9 |
but they do support case conversion. If none of these versions of
|
|
|
d0cde9 |
sed are available to you, some sample scripts for this task are
|
|
|
d0cde9 |
available from the Seder's Grab Bag:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
http://sed.sourceforge.net/grabbag/scripts
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Note that some case conversion scripts are listed under "Filename
|
|
|
d0cde9 |
manipulation" and others are under "Text formatting."
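If none of those versions is at hand and whole lines (rather than selected words) are to be converted, the standard "y" command is one portable fallback. This is only a sketch under that assumption; the function names are hypothetical, and only the 26 unaccented letters are handled.

```shell
# to_upper / to_lower: hypothetical helpers built on the y command,
# which maps each character in the first list to the one in the
# same position in the second list.
to_upper() {
  sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'
}
to_lower() {
  sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
}

printf '%s\n' 'Regular Expression' | to_upper
printf '%s\n' 'Regular Expression' | to_lower
```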
|
|
|
d0cde9 |
|
|
|
d0cde9 |
CHANGING BLOCKS (consecutive lines)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.20. How do I change only one section of a file?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
You can match a range of lines by line number, by regexes (say, all
|
|
|
d0cde9 |
lines between the words "from" and "until"), or by a combination of
|
|
|
d0cde9 |
the two. For multiple substitutions on the same range, put the
|
|
|
d0cde9 |
command(s) between braces {...}. For example:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# replace only between lines 1 and 20
|
|
|
d0cde9 |
1,20 s/Johnson/White/g
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# replace everywhere EXCEPT between lines 1 and 20
|
|
|
d0cde9 |
1,20 !s/Johnson/White/g
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# replace only between words "from" and "until". Note the
|
|
|
d0cde9 |
# use of \<....\> as word boundary markers in GNU sed.
|
|
|
d0cde9 |
/from/,/until/ { s/\<red\>/magenta/g; s/\<blue\>/cyan/g; }
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# replace only from the words "ENDNOTES:" to the end of file
|
|
|
d0cde9 |
/ENDNOTES:/,$ { s/Schaff/Herzog/g; s/Kraft/Ebbing/g; }
|
|
|
d0cde9 |
|
|
|
d0cde9 |
For technical details on using address ranges, see section 3.3
|
|
|
d0cde9 |
("Addressing and Address ranges").
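A quick way to watch a regex range at work is sketched below. The function name recolor and the sample lines are assumptions, and the word-boundary markers are left out so that the command does not depend on GNU sed.

```shell
# recolor: hypothetical helper; edits only the lines from the first
# line matching /from/ through the next line matching /until/.
recolor() {
  sed '
    /from/,/until/ {
      s/red/magenta/g
      s/blue/cyan/g
    }
  '
}

printf '%s\n' 'red sky' 'from here' 'red and blue' 'until here' 'blue sea' |
  recolor
```

Only the third line changes; the first and last lines lie outside the range and keep their colors.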
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.21. How do I delete or change a block of text if the block contains
|
|
|
d0cde9 |
a certain regular expression?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The following deletes the block between 'start' and 'end'
|
|
|
d0cde9 |
inclusively, if and only if the block contains the string
|
|
|
d0cde9 |
'regex'. Written by Russell Davies, with additional comments:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to delete a block if /regex/ matches inside it
|
|
|
d0cde9 |
:t
|
|
|
d0cde9 |
/start/,/end/ { # For each line between these block markers..
|
|
|
d0cde9 |
/end/!{ # If we are not at the /end/ marker
|
|
|
d0cde9 |
$!{ # nor the last line of the file,
|
|
|
d0cde9 |
N; # add the Next line to the pattern space
|
|
|
d0cde9 |
bt
|
|
|
d0cde9 |
} # and branch (loop back) to the :t label.
|
|
|
d0cde9 |
} # This line matches the /end/ marker.
|
|
|
d0cde9 |
/regex/d; # If /regex/ matches, delete the block.
|
|
|
d0cde9 |
} # Otherwise, the block will be printed.
|
|
|
d0cde9 |
#---end of script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Note: When the script above reaches /regex/, the entire multi-line
|
|
|
d0cde9 |
block is in the pattern space. To replace items inside the block,
|
|
|
d0cde9 |
use "s///". To change the entire block, use the 'c' (change)
|
|
|
d0cde9 |
command:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/regex/c\
|
|
|
d0cde9 |
1: This will replace the entire block\
|
|
|
d0cde9 |
2: with these two lines of text.
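The block-deleting script can be exercised as in this sketch. The function name drop_blocks and the sample input are assumptions; the markers "start", "end" and "regex" are the literal strings used in the script above.

```shell
# drop_blocks: hypothetical wrapper around the block-deleting script.
drop_blocks() {
  sed '
    :t
    /start/,/end/ {
      /end/!{
        $!{
          N
          bt
        }
      }
      /regex/d
    }
  '
}

# Two blocks: the first contains "regex" and is deleted, the second
# does not and is printed unchanged.
printf '%s\n' keep1 start 'has regex' end start plain end keep2 |
  drop_blocks
```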
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.22. How do I locate a paragraph of text if the paragraph contains a
|
|
|
d0cde9 |
certain regular expression?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Assume that paragraphs are separated by blank lines. For regexes
|
|
|
d0cde9 |
that are single terms, use one of the following scripts:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -e '/./{H;$!d;}' -e 'x;/regex/!d' # most seds
|
|
|
d0cde9 |
sed '/./{H;$!d;};x;/regex/!d' # GNU sed
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To print paragraphs only if they contain 3 specific regular
|
|
|
d0cde9 |
expressions (RE1, RE2, and RE3), in any order in the paragraph:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -e '/./{H;$!d;}' -e 'x;/RE1/!d;/RE2/!d;/RE3/!d'
|
|
|
d0cde9 |
|
|
|
d0cde9 |
With this solution and the preceding one, if the paragraphs are
|
|
|
d0cde9 |
excessively long (more than 4k in length), you may overflow sed's
|
|
|
d0cde9 |
internal buffers. If using HHsed, you must add a "G;" command
|
|
|
d0cde9 |
immediately after the "x;" in the scripts above to defeat a bug
|
|
|
d0cde9 |
in HHsed (see section 7.9(5), below, for a description).
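The paragraph search can be wrapped in a shell function, as sketched here. The name grep_paragraphs is hypothetical, and the regex passed to it is assumed to contain no "/" characters. Each printed paragraph is preceded by a blank line, because the "H" command puts a newline in front of every line it appends to the hold space.

```shell
# grep_paragraphs: hypothetical helper; $1 is a basic regular
# expression with no "/" in it. Prints each blank-line-separated
# paragraph that contains a match.
grep_paragraphs() {
  sed -e '/./{H;$!d;}' -e 'x;/'"$1"'/!d'
}

printf '%s\n' alpha 'beta needle' '' gamma delta | grep_paragraphs needle
```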
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.23. How do I match a block of _specific_ consecutive lines?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
There are three ways to approach this problem:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(1) Try to use a "/range/, /expression/"
|
|
|
d0cde9 |
(2) Try to use a "/multi-line\nexpression/"
|
|
|
d0cde9 |
(3) Try to use a block of "literal strings"
|
|
|
d0cde9 |
|
|
|
d0cde9 |
We describe each approach in the following sections.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.23.1. Try to use a "/range/, /expression/"
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If the block of lines are strings that *never change their order*
|
|
|
d0cde9 |
and if the top line never occurs outside the block, like this:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Abel
|
|
|
d0cde9 |
Baker
|
|
|
d0cde9 |
Charlie
|
|
|
d0cde9 |
Delta
|
|
|
d0cde9 |
|
|
|
d0cde9 |
then these solutions will work for deleting the block:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed '/^Abel$/{N;N;N;d;}' files # for blocks with few lines
|
|
|
d0cde9 |
sed '/^Abel$/, /^Zebra$/d' files # for blocks with many lines
|
|
|
d0cde9 |
sed '/^Abel$/,+25d' files # HHsed, sedmod, ssed, gsed 3.02.80
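The first form can be checked on a small sample, as in this sketch (the surrounding lines "before" and "after" are assumptions). On the line "Abel", three N commands pull in the next three lines, and "d" then deletes all four at once.

```shell
# Delete the 4-line block that starts at the line "Abel".
remaining=$(printf '%s\n' before Abel Baker Charlie Delta after |
  sed '/^Abel$/{N;N;N;d;}')

printf '%s\n' "$remaining"
```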
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To change the block, use the 'c' (change) command instead of 'd'.
|
|
|
d0cde9 |
To print that block only, use the -n switch and 'p' (print) instead
|
|
|
d0cde9 |
of 'd'. To change some things inside the block, try this:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/^Abel$/,/^Delta$/ {
|
|
|
d0cde9 |
:ack
|
|
|
d0cde9 |
N;
|
|
|
d0cde9 |
/\nDelta$/! b ack
|
|
|
d0cde9 |
# At this point, all the lines in the block are collected
|
|
|
d0cde9 |
s/ubstitute /somethin/g;
|
|
|
d0cde9 |
}
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.23.2. Try to use a "multi-line\nexpression"
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If the top line of the block sometimes appears alone or is
|
|
|
d0cde9 |
sometimes followed by other lines, or if a partial block may occur
|
|
|
d0cde9 |
somewhere in the file, a multi-line expression may be required.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
In these examples, we give solutions for matching an N-line block.
|
|
|
d0cde9 |
The expression "/^RE1\nRE2\nRE3...$/" represents a properly formed
|
|
|
d0cde9 |
regular expression where \n indicates a newline between lines. Note
|
|
|
d0cde9 |
that the 'N' followed by the 'P;D;' commands forms a "sliding
|
|
|
d0cde9 |
window" technique. A window of N lines is formed. If the multi-line
|
|
|
d0cde9 |
pattern matches, the block is handled. If not, the top line is
|
|
|
d0cde9 |
printed and then deleted from the pattern space, and we try to
|
|
|
d0cde9 |
match at the next line.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to delete 2 consecutive lines: /^RE1\nRE2$/
|
|
|
d0cde9 |
$b
|
|
|
d0cde9 |
/^RE1$/ {
|
|
|
d0cde9 |
$!N
|
|
|
d0cde9 |
/^RE1\nRE2$/d
|
|
|
d0cde9 |
P;D
|
|
|
d0cde9 |
}
|
|
|
d0cde9 |
#---end of script---
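Here is the 2-line script with concrete regexes substituted, as a sketch. The words "foo" and "bar" stand in for RE1 and RE2, and the function name is hypothetical. A lone "foo" is printed untouched; only "foo" immediately followed by "bar" is removed.

```shell
# delete_pair: hypothetical helper; deletes the line "foo" only when
# the very next line is "bar".
delete_pair() {
  sed '
    $b
    /^foo$/ {
      $!N
      /^foo\nbar$/d
      P
      D
    }
  '
}

printf '%s\n' foo other foo bar tail | delete_pair
```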
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to delete 3 consecutive lines. (This script
|
|
|
d0cde9 |
# fails under GNU sed v2.05 and earlier because of the 't'
|
|
|
d0cde9 |
# bug when s///n is used; see section 7.5(1) of the FAQ.)
|
|
|
d0cde9 |
: more
|
|
|
d0cde9 |
$!N
|
|
|
d0cde9 |
s/\n/&/2;
|
|
|
d0cde9 |
t enough
|
|
|
d0cde9 |
$!b more
|
|
|
d0cde9 |
: enough
|
|
|
d0cde9 |
/^RE1\nRE2\nRE3$/d
|
|
|
d0cde9 |
P;D
|
|
|
d0cde9 |
#---end of script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
For example, to delete a block of 5 consecutive lines, the previous
|
|
|
d0cde9 |
script must be altered in only two places:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(1) Change the 2 in "s/\n/&/2;" to a 4 (the trailing semicolon is
|
|
|
d0cde9 |
needed to work around a bug in HHsed v1.5).
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(2) Change the regex line to "/^RE1\nRE2\nRE3\nRE4\nRE5$/d",
|
|
|
d0cde9 |
modifying the expression as needed.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Suppose we want to delete a block of two blank lines followed by
|
|
|
d0cde9 |
the word "foo" followed by another blank line (4 lines in all).
|
|
|
d0cde9 |
Other blank lines and other instances of "foo" should be left
|
|
|
d0cde9 |
alone. After changing the '2' to a '3' (always one number less than
|
|
|
d0cde9 |
the total number of lines), the regex line would look like this:
|
|
|
d0cde9 |
"/^\n\nfoo\n$/d". (Thanks to Greg Ubben for this script.)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
As an alternative to work around the 't' bug in older versions of
|
|
|
d0cde9 |
GNU sed, the following script will delete 4 consecutive lines:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to delete 4 consecutive lines. Use this if you
|
|
|
d0cde9 |
# must run under GNU sed 2.05 or below.
|
|
|
d0cde9 |
/^RE1$/!b
|
|
|
d0cde9 |
$!N
|
|
|
d0cde9 |
$!N
|
|
|
d0cde9 |
:a
|
|
|
d0cde9 |
$b
|
|
|
d0cde9 |
N
|
|
|
d0cde9 |
/^RE1\nRE2\nRE3\nRE4$/d
|
|
|
d0cde9 |
P
|
|
|
d0cde9 |
s/^.*\n\(.*\n.*\n.*\)$/\1/
|
|
|
d0cde9 |
ba
|
|
|
d0cde9 |
#---end of script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Its drawback is that it must be modified in 3 places instead of 2
|
|
|
d0cde9 |
to adapt it for more lines, and as additional lines are added, the
|
|
|
d0cde9 |
's' command is forced to work harder to match the regexes. On the
|
|
|
d0cde9 |
other hand, it avoids a bug with gsed-2.05 and illustrates another
|
|
|
d0cde9 |
way to solve the problem of deleting consecutive lines.
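As a sketch of the sliding-window idea described in this section, the 3-line script that relies on the numeric flag ("s/\n/&/2") is shown below with concrete regexes. The words "one", "two", "three" and the function name are assumptions. The "s" command changes nothing; it succeeds only when the pattern space already holds two newlines, which is how the script knows that the window is full.

```shell
# delete_triplet: hypothetical helper; deletes the three consecutive
# lines "one", "two", "three" and prints everything else.
delete_triplet() {
  sed '
    :more
    $!N
    s/\n/&/2
    t enough
    $!b more
    :enough
    /^one\ntwo\nthree$/d
    P
    D
  '
}

printf '%s\n' a one two three b | delete_triplet
```

When the window does not match, "P" prints its top line and "D" drops that line, so the next cycle tops the window up again with one more line of input.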
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.23.3. Try to use a block of "literal strings"
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If you need to match a static block of text (which may occur any
|
|
|
d0cde9 |
number of times throughout a file), where the contents of the block
|
|
|
d0cde9 |
are known in advance, then this script is easy to use. It requires
|
|
|
d0cde9 |
an intermediate file, which we will call "findrep.txt" (below):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
A block of several consecutive lines to
|
|
|
d0cde9 |
be matched literally should be placed on
|
|
|
d0cde9 |
top. Regular expressions like .* or [a-z]
|
|
|
d0cde9 |
will lose their special meaning and be
|
|
|
d0cde9 |
interpreted literally in this block.
|
|
|
d0cde9 |
----
|
|
|
d0cde9 |
Four hyphens separate the two sections. Put
|
|
|
d0cde9 |
the replacement text in the lower section.
|
|
|
d0cde9 |
As above, sed symbols like &, \n, or \1 will
|
|
|
d0cde9 |
lose their special meaning.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
This is a 3-step process. A generic script called "blockrep.sed"
|
|
|
d0cde9 |
will read "findrep.txt" (above) and generate a custom script, which
|
|
|
d0cde9 |
is then used on the actual input file. In other words,
|
|
|
d0cde9 |
"findrep.txt" is a simplified description of the editing that you
|
|
|
d0cde9 |
want to do on the block, and "blockrep.sed" turns it into actual
|
|
|
d0cde9 |
sed commands.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Use this process from a Unix shell or from a DOS prompt:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -nf blockrep.sed findrep.txt >custom.sed
|
|
|
d0cde9 |
sed -f custom.sed input.file >output.file
|
|
|
d0cde9 |
erase custom.sed
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The generic script "blockrep.sed" follows below. It's fairly long.
|
|
|
d0cde9 |
Examining its output might help you understand how to use the
|
|
|
d0cde9 |
_sliding window_ technique.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# filename: blockrep.sed
|
|
|
d0cde9 |
# author: Paolo Bonzini
|
|
|
d0cde9 |
# Requires:
|
|
|
d0cde9 |
# (1) blocks to find and replace, e.g., findrep.txt
|
|
|
d0cde9 |
# (2) an input file to be changed, input.file
|
|
|
d0cde9 |
#
|
|
|
d0cde9 |
# blockrep.sed creates a second sed script, custom.sed,
|
|
|
d0cde9 |
# to find the lines above the row of 4 hyphens, globally
|
|
|
d0cde9 |
# replacing them with the lower block of text. GNU sed
|
|
|
d0cde9 |
# is recommended but not required for this script.
|
|
|
d0cde9 |
#
|
|
|
d0cde9 |
# Loop on the first part, accumulating the `from' text
|
|
|
d0cde9 |
# into the hold space.
|
|
|
d0cde9 |
:a
|
|
|
d0cde9 |
/^----$/! {
|
|
|
d0cde9 |
# Escape slashes, backslashes, the final newline and
|
|
|
d0cde9 |
# regular expression metacharacters.
|
|
|
d0cde9 |
s,[/\[.*],\\&,g
|
|
|
d0cde9 |
s/$/\\/
|
|
|
d0cde9 |
H
|
|
|
d0cde9 |
#
|
|
|
d0cde9 |
# Append N cmds needed to maintain the sliding window.
|
|
|
d0cde9 |
x
|
|
|
d0cde9 |
1 s,^.,s/,
|
|
|
d0cde9 |
1! s/^/N\
|
|
|
d0cde9 |
/
|
|
|
d0cde9 |
x
|
|
|
d0cde9 |
n
|
|
|
d0cde9 |
ba
|
|
|
d0cde9 |
}
|
|
|
d0cde9 |
#
|
|
|
d0cde9 |
# Change the final backslash to a slash to separate the
|
|
|
d0cde9 |
# two sides of the s command.
|
|
|
d0cde9 |
x
|
|
|
d0cde9 |
s,\\$,/,
|
|
|
d0cde9 |
x
|
|
|
d0cde9 |
#
|
|
|
d0cde9 |
# Until EOF, gather the substitution into hold space.
|
|
|
d0cde9 |
:b
|
|
|
d0cde9 |
n
|
|
|
d0cde9 |
s,[/\],\\&,g
|
|
|
d0cde9 |
$! s/$/\\/
|
|
|
d0cde9 |
H
|
|
|
d0cde9 |
$! bb
|
|
|
d0cde9 |
#
|
|
|
d0cde9 |
# Start the RHS of the s command without a leading
|
|
|
d0cde9 |
# newline, add the P/D pair for the sliding window, and
|
|
|
d0cde9 |
# print the script.
|
|
|
d0cde9 |
g
|
|
|
d0cde9 |
s,/\n,/,
|
|
|
d0cde9 |
s,$,/\
|
|
|
d0cde9 |
P\
|
|
|
d0cde9 |
D,p
|
|
|
d0cde9 |
#---end of script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.24. How do I address all the lines between RE1 and RE2, excluding the
|
|
|
d0cde9 |
lines themselves?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Normally, to address the lines between two regular expressions, RE1
|
|
|
d0cde9 |
and RE2, one would do this: '/RE1/,/RE2/{commands;}'. Excluding
|
|
|
d0cde9 |
those lines takes an extra step. To put 2 arrows before each line
|
|
|
d0cde9 |
between RE1 and RE2, except for those lines:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed '1,/RE1/!{ /RE2/,/RE1/!s/^/>>/; }' input.fil
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The preceding script, though short, may be difficult to follow. It
|
|
|
d0cde9 |
also requires that /RE1/ cannot occur on the first line of the
|
|
|
d0cde9 |
input file. The following script, though it's not a one-liner, is
|
|
|
d0cde9 |
easier to read and it permits /RE1/ to appear on the first line:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to replace all lines between /RE1/ and /RE2/,
|
|
|
d0cde9 |
# without matching /RE1/ or /RE2/
|
|
|
d0cde9 |
/RE1/,/RE2/{
|
|
|
d0cde9 |
/RE1/b
|
|
|
d0cde9 |
/RE2/b
|
|
|
d0cde9 |
s/^/>>/
|
|
|
d0cde9 |
}
|
|
|
d0cde9 |
#---end of script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Contents of input.fil: Output of sed script:
|
|
|
d0cde9 |
aaa aaa
|
|
|
d0cde9 |
bbb bbb
|
|
|
d0cde9 |
RE1 RE1
|
|
|
d0cde9 |
aaa >>aaa
|
|
|
d0cde9 |
bbb >>bbb
|
|
|
d0cde9 |
ccc >>ccc
|
|
|
d0cde9 |
RE2 RE2
|
|
|
d0cde9 |
end end
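Both forms can be compared on the sample input shown above, as in this sketch. The helper name sample and the variable names are assumptions.

```shell
# sample: hypothetical helper that emits the contents of input.fil.
sample() {
  printf '%s\n' aaa bbb RE1 aaa bbb ccc RE2 end
}

# The one-liner (requires that RE1 is not on line 1).
one_liner=$(sample | sed '1,/RE1/!{ /RE2/,/RE1/!s/^/>>/; }')

# The longer, more readable script.
readable=$(sample | sed '
  /RE1/,/RE2/{
    /RE1/b
    /RE2/b
    s/^/>>/
  }
')

printf '%s\n' "$readable"
```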
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.25. How do I join two lines if line #1 ends in a [certain string]?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
This question appears in the section on one-line sed scripts, but
|
|
|
d0cde9 |
it comes up so many times that it needs a place here also. Suppose
|
|
|
d0cde9 |
a line ends with a particular string (often, a line ends with a
|
|
|
d0cde9 |
backslash). How do you bring up the second line after it, even in
|
|
|
d0cde9 |
cases where several consecutive lines all end in a backslash?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -e :a -e '/\\$/N; s/\\\n//; ta' file # all seds
|
|
|
d0cde9 |
sed ':a; /\\$/N; s/\\\n//; ta' file # GNU sed, ssed, HHsed
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Note that this replaces the backslash-newline with nothing. You may
|
|
|
d0cde9 |
want to replace the backslash-newline with a single space instead.
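A short sketch of the first command, wrapped in a function whose name (join_continued) is an assumption. Two consecutive continuation lines are folded into the line that follows them.

```shell
# join_continued: hypothetical helper; joins every line that ends in
# a backslash with the line after it.
join_continued() {
  sed -e :a -e '/\\$/N; s/\\\n//; ta'
}

printf '%s\n' 'one \' 'two \' 'three' 'four' | join_continued
```

The "ta" branch is what handles the run of several continuation lines: after each successful join, the script tests the joined line again.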
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.26. How do I join two lines if line #2 begins in a [certain string]?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The inverse situation is another FAQ. Suppose a line begins with a
|
|
|
d0cde9 |
particular string. How do you bring that line up to follow the
|
|
|
d0cde9 |
previous line? In this example, we want to match the string "<<="
|
|
|
d0cde9 |
at the beginning of one line, bring that line up to the end of the
|
|
|
d0cde9 |
line before it, and replace the string with a single space:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -e :a -e '$!N;s/\n<<=/ /;ta' -e 'P;D' file # all seds
|
|
|
d0cde9 |
sed ':a; $!N;s/\n<<=/ /;ta;P;D' file # GNU, ssed, sed15+
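The first command can be tried on three lines, as sketched below (the function name join_leading and the sample text are assumptions).

```shell
# join_leading: hypothetical helper; a line that begins with "<<=" is
# appended to the previous line, with "<<=" replaced by one space.
join_leading() {
  sed -e :a -e '$!N;s/\n<<=/ /;ta' -e 'P;D'
}

printf '%s\n' 'alpha' '<<=beta' 'gamma' | join_leading
```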
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.27. How do I change all paragraphs to long lines?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
A frequent request is how to convert DOS-style textfiles, in which
|
|
|
d0cde9 |
each line ends with a "paragraph" marker, to Microsoft-style
|
|
|
d0cde9 |
textfiles, in which the "paragraph" marker only appears at the end
|
|
|
d0cde9 |
of real paragraphs. Sometimes this question is framed as, "How do I
|
|
|
d0cde9 |
remove the hard returns at the end of each line in a paragraph?"
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The problem occurs because newer word processors don't work the
|
|
|
d0cde9 |
same way older text editors did. Older text editors used a newline
|
|
|
d0cde9 |
(CR/LF in DOS; LF alone in Unix) to end each line on screen or on
|
|
|
d0cde9 |
disk, and used two newlines to separate paragraphs. Certain word
|
|
|
d0cde9 |
processors wanted to make paragraph reformatting and reflowing work
|
|
|
d0cde9 |
easily, so they use one newline to end a paragraph and never allow
|
|
|
d0cde9 |
newlines _within_ a paragraph. This means that textfiles created
|
|
|
d0cde9 |
with standard editors (Emacs, vi, Vedit, Boxer, etc.) appear to
|
|
|
d0cde9 |
have "hard returns" at inappropriate places. The following sed
|
|
|
d0cde9 |
script finds blocks of consecutive nonblank lines (i.e., paragraphs
|
|
|
d0cde9 |
of text), and converts each block into one long line with one "hard
|
|
|
d0cde9 |
return" at the end.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to change all paragraphs to long lines
|
|
|
d0cde9 |
/./{H; $!d;} # Put each paragraph into hold space
|
|
|
d0cde9 |
x; # Swap hold space and pattern space
|
|
|
d0cde9 |
s/^\(\n\)\(..*\)$/\2\1/; # Move leading \n to end of PatSpace
|
|
|
d0cde9 |
s/\n\(.\)/ \1/g; # Replace all other \n with 1 space
|
|
|
d0cde9 |
# Uncomment the following line to remove excess blank lines:
|
|
|
d0cde9 |
# /./!d;
|
|
|
d0cde9 |
#---end of sed script---
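The script can be tested on two short paragraphs, as in this sketch. The function name unwrap_paragraphs and the sample text are assumptions. Each paragraph comes out as one long line followed by one blank line.

```shell
# unwrap_paragraphs: hypothetical wrapper around the script above.
unwrap_paragraphs() {
  sed '
    /./{H; $!d;}
    x
    s/^\(\n\)\(..*\)$/\2\1/
    s/\n\(.\)/ \1/g
  '
}

printf '%s\n' 'first line' 'second line' '' 'next para' 'ends here' |
  unwrap_paragraphs
```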
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If the input files have formatting or indentation that conveys
|
|
|
d0cde9 |
special meaning (like program source code), this script will remove
|
|
|
d0cde9 |
it. But if the text still needs to be extended, try 'par'
|
|
|
d0cde9 |
(paragraph reformatter) or the 'fmt' utility with the -t or -c
|
|
|
d0cde9 |
switches and the width option (-w) set to a number like 9999.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
SHELL AND ENVIRONMENT
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.30. How do I read environment variables with sed?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.30.1. - on Unix platforms
|
|
|
d0cde9 |
|
|
|
d0cde9 |
In Unix, environment variables begin with a dollar sign, such as
|
|
|
d0cde9 |
$TERM, $PATH, $var or $i. In sed, the dollar sign is used to
|
|
|
d0cde9 |
indicate the last line of the input file, the end of a line (in the
|
|
|
d0cde9 |
LHS), or a literal symbol (in the RHS). Sed cannot access variables
|
|
|
d0cde9 |
directly, so one must pay attention to shell quoting requirements
|
|
|
d0cde9 |
to expand the variables properly.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To ALLOW the Unix shell to interpret the dollar sign, put the
|
|
|
d0cde9 |
script in double quotes:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed "s/_terminal-type_/$TERM/g" input.file >output.file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To PREVENT the Unix shell from interpreting the dollar sign as a
|
|
|
d0cde9 |
shell variable, put the script in single quotes:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed 's/.$//' infile >outfile
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To use BOTH Unix $environment_vars and sed /end-of-line$/ pattern
|
|
|
d0cde9 |
matching, there are two solutions. (1) The easiest is to enclose
|
|
|
d0cde9 |
the script in "double quotes" so the shell can see the $variables,
|
|
|
d0cde9 |
and to prefix the sed metacharacter ($) with a backslash. Thus, in
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed "s/$user\$/root/" file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
the shell interpolates $user and passes \$ to sed as a bare $, the symbol
|
|
|
d0cde9 |
for end-of-line.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(2) Another method--somewhat less readable--is to concatenate the
|
|
|
d0cde9 |
script with 'single quotes' where the $ should not be interpolated
|
|
|
d0cde9 |
and "double quotes" where variable interpolation should occur. To
|
|
|
d0cde9 |
demonstrate using the preceding script:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed "s/$user"'$/root/' file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Solution #1 seems easier to remember. In either case, we search for
|
|
|
d0cde9 |
the user's name (stored in a variable called $user) when it occurs
|
|
|
d0cde9 |
at the end of the line ($), and substitute the word "root" in all
|
|
|
d0cde9 |
matches.
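The two quoting styles can be checked side by side, as in this sketch. The value "alice" and the sample lines are assumptions. In both cases sed ends up receiving the script s/alice$/root/.

```shell
user=alice

# Solution 1: double quotes, with the end-of-line "$" backslashed
# so that the shell leaves it alone.
out1=$(printf '%s\n' 'owner alice' 'alice owner' | sed "s/$user\$/root/")

# Solution 2: double quotes around the variable, single quotes
# around the rest of the script.
out2=$(printf '%s\n' 'owner alice' 'alice owner' | sed "s/$user"'$/root/')

printf '%s\n' "$out1"
```

Only the first line changes, since "alice" is at the end of the line there and at the beginning in the second line.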
|
|
|
d0cde9 |
|
|
|
d0cde9 |
For longer shell scripts, it is sometimes useful to begin with
|
|
|
d0cde9 |
single quote marks ('), close them upon encountering the variable,
|
|
|
d0cde9 |
enclose the variable name in double quotes ("), and resume with
|
|
|
d0cde9 |
single quotes, closing them at the end of the sed script. Example:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
#! /bin/sh
|
|
|
d0cde9 |
# sed script to illustrate 'quote'"matching"'usage'
|
|
|
d0cde9 |
FROM='abcdefgh'
|
|
|
d0cde9 |
TO='ABCDEFGH'
|
|
|
d0cde9 |
sed -e '
|
|
|
d0cde9 |
y/'"$FROM"'/'"$TO"'/; # note the quote pairing
|
|
|
d0cde9 |
# some more commands go here . . .
|
|
|
d0cde9 |
# last line is a single quote mark
|
|
|
d0cde9 |
'
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Thus, each character in $FROM is replaced by its counterpart in $TO, and the single
|
|
|
d0cde9 |
quotes are used to glue the multiple lines together in the script.
|
|
|
d0cde9 |
(See also section 4.32, "How do I handle Unix shell quoting in sed?")
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.30.2. - on MS-DOS and 4DOS platforms
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Under 4DOS and MS-DOS version 7.0 (Win95) or 7.10 (Win95 OSR2),
|
|
|
d0cde9 |
environment variables can be accessed from the command prompt.
|
|
|
d0cde9 |
Under MS-DOS v6.22 and below, environment variables can only be
|
|
|
d0cde9 |
accessed from within batch files. Environment variables should be
|
|
|
d0cde9 |
enclosed between percent signs and are case-insensitive; i.e.,
|
|
|
d0cde9 |
%USER% or %user% will display the USER variable. To generate a true
|
|
|
d0cde9 |
percent sign, just enter it twice.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
DOS versions of sed require that sed scripts be enclosed by double
|
|
|
d0cde9 |
quote marks "..." (not single quotes!) if the script contains
|
|
|
d0cde9 |
embedded tabs, spaces, redirection arrows or the vertical bar. In
|
|
|
d0cde9 |
fact, if the input for sed comes from piping, a sed script should
|
|
|
d0cde9 |
not contain a vertical bar, even if it is protected by double
|
|
|
d0cde9 |
quotes (this seems to be a bug in the normal MS-DOS syntax). Thus,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
echo blurk | sed "s/^/ |foo /" # will cause an error
|
|
|
d0cde9 |
sed "s/^/ |foo /" blurk.txt # will work as expected
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Using DOS environment variables which contain DOS path statements
|
|
|
d0cde9 |
(such as a TMP variable set to "C:\TEMP") within sed scripts is
|
|
|
d0cde9 |
discouraged because sed will interpret the backslash '\' as a
|
|
|
d0cde9 |
metacharacter to "quote" the next character, not as a normal
|
|
|
d0cde9 |
symbol. Thus,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed "s/^/%TMP% /" somefile.txt
|
|
|
d0cde9 |
|
|
|
d0cde9 |
will not prefix each line with (say) "C:\TEMP ", but will prefix
|
|
|
d0cde9 |
each line with "C:TEMP "; sed will discard the backslash, which is
|
|
|
d0cde9 |
probably not what you want. Other variables such as %PATH% and
|
|
|
d0cde9 |
%COMSPEC% will also lose the backslash within sed scripts.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Environment variables which do not use backslashes are usually
|
|
|
d0cde9 |
workable. Thus, all the following should work without difficulty,
|
|
|
d0cde9 |
if they are invoked from within DOS batch files:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed "s/=username=/%USER%/g" somefile.txt
|
|
|
d0cde9 |
echo %FILENAME% | sed "s/\.TXT/.BAK/"
|
|
|
d0cde9 |
grep -Ei "%string%" somefile.txt | sed "s/^/ /"
|
|
|
d0cde9 |
|
|
|
d0cde9 |
while from either the DOS prompt or from within a batch file,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed "s/%%/ percent/g" input.fil >output.fil
|
|
|
d0cde9 |
|
|
|
d0cde9 |
will replace each percent symbol in a file with " percent" (adding
|
|
|
d0cde9 |
the leading space for readability).
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.31. How do I export or pass variables back into the environment?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.31.1. - on Unix platforms
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Suppose that line #1, word #2 of the file 'terminals' contains a
|
|
|
d0cde9 |
value to be put in your TERM environment variable. Sed cannot
|
|
|
d0cde9 |
export variables directly to the shell, but it can pass strings to
|
|
|
d0cde9 |
shell commands. To set a variable in the Bourne shell:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
TERM=`sed 's/^[^ ][^ ]* \([^ ][^ ]*\).*/\1/;q' terminals`;
|
|
|
d0cde9 |
export TERM
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If the second word were "Wyse50", this would send the shell command
|
|
|
d0cde9 |
"TERM=Wyse50".
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.31.2. - on MS-DOS or 4DOS platforms
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Sed cannot directly manipulate the environment. Under DOS, only
|
|
|
d0cde9 |
batch files (.BAT) can do this, using the SET instruction, since
|
|
|
d0cde9 |
they are run directly by the command shell. Under 4DOS, special
|
|
|
d0cde9 |
4DOS commands (such as ESET) can also alter the environment.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Under DOS or 4DOS, sed can select a word and pass it to the SET
|
|
|
d0cde9 |
command. Suppose you want the 1st word of the 2nd line of MY.DAT
|
|
|
d0cde9 |
put into an environment variable named %PHONE%. You might do this:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
@echo off
|
|
|
d0cde9 |
sed -n "2 s/^\([^ ][^ ]*\) .*/SET PHONE=\1/p;3q" MY.DAT > GO_.BAT
|
|
|
d0cde9 |
call GO_.BAT
|
|
|
d0cde9 |
echo The environment variable for PHONE is %PHONE%
|
|
|
d0cde9 |
:: cleanup
|
|
|
d0cde9 |
del GO_.BAT
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The sed script assumes that the first character on the 2nd line is
|
|
|
d0cde9 |
not a space and uses grouping \(...\) to save the first string of
|
|
|
d0cde9 |
non-space characters as \1 for the RHS. In writing any batch files,
|
|
|
d0cde9 |
make sure that output filenames such as GO_.BAT don't overwrite
|
|
|
d0cde9 |
preexisting files of the same name.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.32. How do I handle Unix shell quoting in sed?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To embed a literal single quote (') in a script, use (a) or (b):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(a) If possible, put the script in double quotes:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed "s/cannot/can't/g" file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(b) If the script must use single quotes, then close-single-quote
|
|
|
d0cde9 |
the script just before the SPECIAL single quote, prefix the single
|
|
|
d0cde9 |
quote with a backslash, and use a 2nd pair of single quotes to
|
|
|
d0cde9 |
finish marking the script. Thus:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed 's/cannot$/can'\''t/g' file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Though this looks hard to read, it breaks down to 3 parts:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
's/cannot$/can' \' 't/g'
|
|
|
d0cde9 |
--------------- -- -----
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To embed a literal double quote (") in a script, use (a) or (b):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(a) If possible, put the script in single quotes. You don't need to
|
|
|
d0cde9 |
prefix the double quotes with anything. Thus:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed 's/14"/fourteen inches/g' file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(b) If the script must use double quotes, then prefix the SPECIAL
|
|
|
d0cde9 |
double quote with a backslash (\). Thus,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed "s/$length\"/$length inches/g" file
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To embed a literal backslash (\) into a script, enter it twice:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed 's/C:\\DOS/D:\\DOS/g' config.sys
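The quoting rules above can be tried out side by side. This is only
a sketch: the sample strings and variable names (in_double,
in_single, inches, drive) are invented for the demonstration, and
printf is used instead of echo so that backslashes reach sed
untouched.

```shell
# Both commands change "cannot" to "can't"; only the shell quoting differs.
in_double=$(printf '%s\n' 'I cannot' | sed "s/cannot/can't/")
in_single=$(printf '%s\n' 'I cannot' | sed 's/cannot/can'\''t/')

# A literal double quote inside a double-quoted script needs a backslash.
length=14
inches=$(printf '%s\n' 'a 14" disk' | sed "s/$length\"/$length inches/")

# A literal backslash is entered twice inside the script.
drive=$(printf '%s\n' 'C:\DOS' | sed 's/C:\\DOS/D:\\DOS/')

printf '%s\n' "$in_double" "$in_single" "$inches" "$drive"
```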
|
|
|
d0cde9 |
|
|
|
d0cde9 |
FILES, DIRECTORIES, AND PATHS
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.40. How do I read (insert/add) a file at the top of a textfile?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Normally, adding a "header" file to the top of a "body" file is
|
|
|
d0cde9 |
done from the command prompt before passing the file on to sed.
|
|
|
d0cde9 |
(MS-DOS below version 6.0 must use COPY and DEL instead of MOVE in
|
|
|
d0cde9 |
the following example.)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
copy header.txt+body temp # MS-DOS command 1
|
|
|
d0cde9 |
echo Y | move temp body # MS-DOS command 2
|
|
|
d0cde9 |
#
|
|
|
d0cde9 |
cat header.txt body >temp; mv temp body # Unix commands
|
|
|
d0cde9 |
|
|
|
d0cde9 |
However, if inserting the file must occur within sed, there is a
|
|
|
d0cde9 |
way. The sed command "1 r header.txt" will not work; it will print
|
|
|
d0cde9 |
line 1 and then insert "header.txt" between lines 1 and 2. The
|
|
|
d0cde9 |
following script solves this problem; however, there must be at
|
|
|
d0cde9 |
least 2 lines in the target file for the script to work properly.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# sed script to insert "header.txt" above the first line
|
|
|
d0cde9 |
1{h; r header.txt
|
|
|
d0cde9 |
D; }
|
|
|
d0cde9 |
2{x; G; }
|
|
|
d0cde9 |
#---end of sed script---
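To see the script at work, here is a small test harness. It is a
sketch only: the file contents and the script name "addhead.sed" are
made up, and the 'r' command looks for header.txt in the current
directory, so the sed command is run from inside the scratch
directory.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'HEADER' > "$workdir/header.txt"
printf '%s\n' 'line 1' 'line 2' 'line 3' > "$workdir/body"

# The same three-line script as above, saved to a file.
cat > "$workdir/addhead.sed" <<'EOF'
1{h; r header.txt
D; }
2{x; G; }
EOF

# Line 1 is held back and re-attached in front of line 2, after the header.
merged=$(cd "$workdir" && sed -f addhead.sed body)
printf '%s\n' "$merged"

rm -r "$workdir"
```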
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.41. How do I make substitutions in every file in a directory, or in
|
|
|
d0cde9 |
a complete directory tree?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.41.1. - ssed and Perl solution
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The best solution for multiple files in a single directory is to
|
|
|
d0cde9 |
use ssed or gsed v4.0 or higher:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -i.BAK 's|foo|bar|g' files # -i does in-place replacement
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If you don't have ssed, there is a similar solution in Perl. (Yes,
|
|
|
d0cde9 |
we know this is a FAQ file for sed, not perl, but perl is more
|
|
|
d0cde9 |
common than ssed for many users.)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
perl -pi.bak -e 's|foo|bar|g' files # or
|
|
|
d0cde9 |
perl -pi.bak -e 's|foo|bar|g' `find /pathname -name "filespec"`
|
|
|
d0cde9 |
|
|
|
d0cde9 |
For each file in the filelist, sed (or Perl) renames the source
|
|
|
d0cde9 |
file to "filename.bak"; the modified file gets the original
|
|
|
d0cde9 |
filename. Remove '.bak' if you don't need backup copies. (Note the
|
|
|
d0cde9 |
use of "s|||" instead of "s///" here, and in the scripts below. The
|
|
|
d0cde9 |
vertical bars in the 's' command let you replace '/some/path' with
|
|
|
d0cde9 |
'/another/path', accommodating slashes in the LHS and RHS.)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To recurse directories in Unix or GNU/Linux:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# We use xargs to prevent passing too many filenames to sed, but
|
|
|
d0cde9 |
# this command will fail if filenames contain spaces or newlines.
|
|
|
d0cde9 |
find /my/path -name '*.ht' -print | xargs sed -i.BAK 's|foo|bar|g'
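If filenames may contain spaces, one possible workaround is to let
find run sed itself with "-exec ... {} +", which hands each filename
to sed as a separate argument and still batches them the way xargs
does. The sketch below assumes a find that supports the "+"
terminator and a sed that supports -i; the directory and file names
are invented for the test.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/sub dir"
printf '%s\n' 'foo one' > "$tree/a.htm"
printf '%s\n' 'foo two' > "$tree/sub dir/b c.htm"

# No pipe and no word splitting: find passes the names directly to sed.
find "$tree" -type f -name '*.htm' -exec sed -i.BAK 's|foo|bar|g' {} +

first=$(cat "$tree/a.htm")
second=$(cat "$tree/sub dir/b c.htm")
backup=$(cat "$tree/sub dir/b c.htm.BAK")
printf '%s\n' "$first" "$second" "$backup"

rm -r "$tree"
```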
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To recurse directories under Windows 2000 (CMD.EXE or COMMAND.COM):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
# This syntax isn't supported under Windows 9x COMMAND.COM
|
|
|
d0cde9 |
for /R c:\my\path %f in (*.htm) do sed -i.BAK "s|foo|bar|g" %f
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.41.2. - Unix solution
|
|
|
d0cde9 |
|
|
|
d0cde9 |
For all files in a single directory, assuming they end with *.txt
|
|
|
d0cde9 |
and you have no files named "[anything].txt.bak" already, use a
|
|
|
d0cde9 |
shell script:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
#! /bin/sh
|
|
|
d0cde9 |
# Source files are saved as "filename.txt.bak" in case of error
|
|
|
d0cde9 |
# The '&&' after cp is an additional safety feature
|
|
|
d0cde9 |
for file in *.txt
|
|
|
d0cde9 |
do
|
|
|
d0cde9 |
cp "$file" "$file.bak" &&
|
|
|
d0cde9 |
sed 's|foo|bar|g' "$file.bak" >"$file"
|
|
|
d0cde9 |
done
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To do an entire directory tree, use the Unix utility find, like so
|
|
|
d0cde9 |
(thanks to Jim Dennis <jadestar@rahul.net> for this script):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
#! /bin/sh
|
|
|
d0cde9 |
# filename: replaceall
|
|
|
d0cde9 |
# Backup files are NOT saved in this script.
|
|
|
d0cde9 |
find . -type f -name '*.txt' -print | while IFS= read -r i
|
|
|
d0cde9 |
do
|
|
|
d0cde9 |
sed 's|foo|bar|g' "$i" > "$i.tmp" && mv "$i.tmp" "$i"
|
|
|
d0cde9 |
done
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The previous shell script recurses through the directory tree,
|
|
|
d0cde9 |
finding only files in the directory (not symbolic links, which will
|
|
|
d0cde9 |
be encountered by the shell command "for file in *.txt", above). To
|
|
|
d0cde9 |
preserve file permissions and make backup copies, use the 2-line cp
|
|
|
d0cde9 |
routine of the earlier script instead of "sed ... && mv ...". By
|
|
|
d0cde9 |
replacing the sed command 's|foo|bar|g' with something like
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed "s|$1|$2|g" "${i}.bak" > "$i"
|
|
|
d0cde9 |
|
|
|
d0cde9 |
using double quotes instead of single quotes, the user can also
|
|
|
d0cde9 |
employ positional parameters on the shell script command tail, thus
|
|
|
d0cde9 |
reusing the script from time to time. For example,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
replaceall East West
|
|
|
d0cde9 |
|
|
|
d0cde9 |
would modify all your *.txt files in the current directory tree.
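Putting the pieces together, a parameterized version might look like
the sketch below. It is written as a shell function so it can be
tested in one place; the third argument (the directory to search) is
an addition made for this example and is not part of the script
above. Note that $1 is still treated as a regular expression, so
characters such as '|', '&' and '\' in either argument would need
escaping.

```shell
replaceall() {
    # $1 = text to find, $2 = replacement, $3 = directory (hypothetical extra)
    find "$3" -type f -name '*.txt' -print | while IFS= read -r path
    do
        # Keep a backup, then rewrite the original from the backup.
        cp "$path" "$path.bak" &&
        sed "s|$1|$2|g" "$path.bak" > "$path"
    done
}

demo=$(mktemp -d)
printf '%s\n' 'East side' > "$demo/notes.txt"
replaceall East West "$demo"
result=$(cat "$demo/notes.txt")
saved=$(cat "$demo/notes.txt.bak")
printf '%s\n' "$result" "$saved"
rm -r "$demo"
```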
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.41.3. - DOS solution:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
MS-DOS users should use two batch files like this:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
@echo off
|
|
|
d0cde9 |
:: MS-DOS filename: REPLACE.BAT
|
|
|
d0cde9 |
::
|
|
|
d0cde9 |
:: Create a destination directory to put the new files.
|
|
|
d0cde9 |
:: Note: The next command will fail under Novell NetWare
|
|
|
d0cde9 |
:: below version 4.10 unless "SHOW DOTS=ON" is active.
|
|
|
d0cde9 |
if not exist .\NEWFILES\NUL mkdir NEWFILES
|
|
|
d0cde9 |
for %%f in (*.txt) do CALL REPL_2.BAT %%f
|
|
|
d0cde9 |
echo Done!!
|
|
|
d0cde9 |
:: ---End of first batch file---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
@echo off
|
|
|
d0cde9 |
:: MS-DOS filename: REPL_2.BAT
|
|
|
d0cde9 |
::
|
|
|
d0cde9 |
sed "s/foo/bar/g" %1 > NEWFILES\%1
|
|
|
d0cde9 |
:: ---End of the second batch file---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
When finished, the current directory contains all the original
|
|
|
d0cde9 |
files, and the newly-created NEWFILES subdirectory contains the
|
|
|
d0cde9 |
modified *.TXT files. Do not attempt a command like
|
|
|
d0cde9 |
|
|
|
d0cde9 |
for %%f in (*.txt) do sed "s/foo/bar/g" %%f >NEWFILES\%%f
|
|
|
d0cde9 |
|
|
|
d0cde9 |
under any version of MS-DOS because the output filename will be
|
|
|
d0cde9 |
created as a literal '%f' in the NEWFILES directory before the
|
|
|
d0cde9 |
%%f is expanded to become each filename in (*.txt). This occurs
|
|
|
d0cde9 |
because MS-DOS creates output filenames via redirection commands
|
|
|
d0cde9 |
before it expands "for..in..do" variables.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To recurse through an entire directory tree in MS-DOS requires a
|
|
|
d0cde9 |
batch file more complex than we have room to describe. Examine the
|
|
|
d0cde9 |
file SWEEP.BAT in Timo Salmi's great archive of batch tricks,
|
|
|
d0cde9 |
located at <ftp://garbo.uwasa.fi/pc/link/tsbat.zip> (this file is
|
|
|
d0cde9 |
regularly updated). Another alternative is to get an external
|
|
|
d0cde9 |
program designed for directory recursion. Here are some recommended
|
|
|
d0cde9 |
programs for directory recursion. The first one, FORALL, runs under
|
|
|
d0cde9 |
either OS/2 or DOS. Unfortunately, none of these supports Win9x
|
|
|
d0cde9 |
long filenames.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
http://hobbes.nmsu.edu/pub/os2/util/disk/forall72.zip
|
|
|
d0cde9 |
ftp://garbo.uwasa.fi/pc/filefind/target15.zip
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.42. How do I replace "/some/UNIX/path" in a substitution?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Technically, the normal meaning of the slash can be disabled by
|
|
|
d0cde9 |
prefixing it with a backslash. Thus,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed 's/\/some\/UNIX\/path/\/a\/new\/path/g' files
|
|
|
d0cde9 |
|
|
|
d0cde9 |
But this is hard to read and write. There is a better solution.
|
|
|
d0cde9 |
The s/// substitution command allows '/' to be replaced by any
|
|
|
d0cde9 |
other character except backslash or newline (even spaces or alphanumerics). Thus,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed 's|/some/UNIX/path|/a/new/path|g' files
|
|
|
d0cde9 |
|
|
|
d0cde9 |
and if you are using variable names in a Unix shell script,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed "s|$OLDPATH|$NEWPATH|g" oldfile >newfile
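A short comparison, using made-up input, shows that the escaped-slash
form and the alternate-delimiter form give the same result. The
variable names "escaped" and "piped" are hypothetical. Keep in mind
that the delimiter you pick must not occur in the variables
themselves.

```shell
OLDPATH=/some/UNIX/path
NEWPATH=/a/new/path

# Escaping every slash works, but is hard to read.
escaped=$(printf '%s\n' 'cd /some/UNIX/path' |
    sed 's/\/some\/UNIX\/path/\/a\/new\/path/g')

# With '|' as the delimiter, the paths need no escaping at all.
piped=$(printf '%s\n' 'cd /some/UNIX/path' | sed "s|$OLDPATH|$NEWPATH|g")

printf '%s\n' "$escaped" "$piped"
```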
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.43. How do I replace "C:\SOME\DOS\PATH" in a substitution?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
For MS-DOS users, every backslash must be doubled. Thus, to replace
|
|
|
d0cde9 |
"C:\SOME\DOS\PATH" with "D:\MY\NEW\PATH":
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed "s|C:\\SOME\\DOS\\PATH|D:\\MY\\NEW\\PATH|g" infile >outfile
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Remember that DOS pathnames are not case sensitive and can appear
|
|
|
d0cde9 |
in upper or lower case in the input file. If this concerns you, use
|
|
|
d0cde9 |
a version of sed which can ignore case when matching (gsed, ssed,
|
|
|
d0cde9 |
sedmod, sed16).
|
|
|
d0cde9 |
|
|
|
d0cde9 |
@echo off
|
|
|
d0cde9 |
:: sample MS-DOS batch file to alter path statements
|
|
|
d0cde9 |
:: requires GNU sed with the /i flag for s///
|
|
|
d0cde9 |
set old=C:\\SOME\\DOS\\PATH
|
|
|
d0cde9 |
set new=D:\\MY\\NEW\\PATH
|
|
|
d0cde9 |
gsed "s|%old%|%new%|gi" infile >outfile
|
|
|
d0cde9 |
:: or
|
|
|
d0cde9 |
:: sedmod -i "s|%old%|%new%|g" infile >outfile
|
|
|
d0cde9 |
set old=
|
|
|
d0cde9 |
set new=
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Also, remember that under Windows long filenames may be stored in
|
|
|
d0cde9 |
two formats: e.g., as "C:\Program Files" or as "C:\PROGRA~1".
|
|
|
d0cde9 |
|
|
|
d0cde9 |
4.44. How do I emulate file-includes, using sed?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Given an input file with file-include statements, similar to
|
|
|
d0cde9 |
C-style includes or "server-side includes" (SSI) of this format:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
This is the source file. It's short.
|
|
|
d0cde9 |
Its name is simply 'source'. See the script below.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
And this is any amount of text between
|
|
|
d0cde9 |
|
|
|
d0cde9 |
This is the last line of the file.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
How do we direct sed to import/insert whichever files are at the
|
|
|
d0cde9 |
point of the 'file="filename"' token? First, use this file:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
#n
|
|
|
d0cde9 |
# filename: incl.sed
|
|
|
d0cde9 |
# Comments supported by GNU sed or ssed. Leading '#n' should
|
|
|
d0cde9 |
# be on line 1, columns 1-2 of the line.
|
|
|
d0cde9 |
/#include file="/{
|
|
|
d0cde9 |
=; # print the line number
|
|
|
d0cde9 |
s/^[^"]*"/{r /; # change pattern to '{r '
|
|
|
d0cde9 |
s/".*//p; # delete rest to EOL, print
|
|
|
d0cde9 |
# and a(ppend) a delete command
|
|
|
d0cde9 |
a\
|
|
|
d0cde9 |
d;}
|
|
|
d0cde9 |
}
|
|
|
d0cde9 |
#---end of sed script---
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Second, use the following shell script or DOS batch file (if
|
|
|
d0cde9 |
running a DOS batch file, use "double quotes" instead of 'single
|
|
|
d0cde9 |
quotes', and use "del" instead of "rm" to remove the temp file):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -nf incl.sed source | sed 'N;N;s/\n//' >temp.sed
|
|
|
d0cde9 |
sed -f temp.sed source >target
|
|
|
d0cde9 |
rm temp.sed
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If you have GNU sed or ssed, you can reduce the script even further
|
|
|
d0cde9 |
(thanks to Michael Carmack for the reminder):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -nf incl.sed source | sed 'N;N;s/\n//' | sed -f - source >target
|
|
|
d0cde9 |
|
|
|
d0cde9 |
In brief, the script replaces each filename with a 'r filename'
|
|
|
d0cde9 |
command to insert the file at that point, while omitting the
|
|
|
d0cde9 |
extraneous material. Two important things to note with this script:
|
|
|
d0cde9 |
(1) There should be only one '#include file' directive per line, and
|
|
|
d0cde9 |
(2) each '#include file' directive must be the *only* thing on that
|
|
|
d0cde9 |
line, because everything else on the line will be deleted.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Though the script uses GNU sed or ssed because of the great support
|
|
|
d0cde9 |
for embedded script comments, it should run on any version of sed.
|
|
|
d0cde9 |
If not, write me and let me know.
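The two-pass technique can be tried end to end with the sketch
below. Everything in it is an assumption made for the demonstration:
the directive is written as '#include file="name"' alone on its line,
the include files are called part1.inc and part2.inc, and the address
in incl.sed simply matches the text '#include file="'. Adjust the
address to whatever form your own directives take.

```shell
work=$(mktemp -d)
printf '%s\n' 'PART ONE' > "$work/part1.inc"
printf '%s\n' 'PART TWO' > "$work/part2.inc"
printf '%s\n' 'first line' '#include file="part1.inc"' 'middle line' \
    '#include file="part2.inc"' 'last line' > "$work/source"

# Pass 1 script: for each directive emit the line number, an '{r file'
# command, and a queued 'd;}' that closes the block.
cat > "$work/incl.sed" <<'EOF'
#n
/#include file="/{
=
s/^[^"]*"/{r /
s/".*//p
a\
d;}
}
EOF

# Join "N", "{r file", "d;}" into "N{r file" + "d;}", then run the result.
(
    cd "$work" &&
    sed -nf incl.sed source | sed 'N;N;s/\n//' > temp.sed &&
    sed -f temp.sed source > target
)
expanded=$(cat "$work/target")
printf '%s\n' "$expanded"
rm -r "$work"
```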
|
|
|
d0cde9 |
|
|
|
d0cde9 |
------------------------------
|
|
|
d0cde9 |
|
|
|
d0cde9 |
5. WHY ISN'T THIS WORKING?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
5.1. Why don't my variables like $var get expanded in my sed script?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Because your sed script uses 'single quotes' instead of "double
|
|
|
d0cde9 |
quotes." Unix shells never expand $variables in single quotes.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
This is probably the most frequently-asked sed question. For more
|
|
|
d0cde9 |
info on using variables, see section 4.30.
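The difference is easy to demonstrate. In this sketch the variable
name "var" and the sample text are invented; the only thing that
changes between the two commands is the kind of quote around the
script.

```shell
var=world

# Single quotes: sed receives the four characters $var as the replacement.
unexpanded=$(printf '%s\n' 'hello X' | sed 's/X/$var/')

# Double quotes: the shell substitutes the value before sed ever runs.
expanded=$(printf '%s\n' 'hello X' | sed "s/X/$var/")

printf '%s\n' "$unexpanded" "$expanded"
```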
|
|
|
d0cde9 |
|
|
|
d0cde9 |
5.2. I'm using 'p' to print, but I have duplicate lines sometimes.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Sed prints the entire file by default, so the 'p' command might
|
|
|
d0cde9 |
cause the duplicate lines. If you want the whole file printed,
|
|
|
d0cde9 |
try removing the 'p' from commands like 's/foo/bar/p'. If you want
|
|
|
d0cde9 |
part of the file printed, run your sed script with -n flag to
|
|
|
d0cde9 |
suppress normal output, and rewrite the script to get all output
|
|
|
d0cde9 |
from the 'p' command.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If you're still getting duplicate lines, you are probably finding
|
|
|
d0cde9 |
several matches for the same line. Suppose you want to print lines
|
|
|
d0cde9 |
with the words "Peter" or "James" or "John", but not the same line
|
|
|
d0cde9 |
twice. The following command will fail:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -n '/Peter/p; /James/p; /John/p' files
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Since all 3 commands of the script are executed for each line,
|
|
|
d0cde9 |
you'll get extra lines. A better way is to use the 'd' (delete) or
|
|
|
d0cde9 |
'b' (branch) commands, like so (with GNU sed):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed '/Peter/b; /James/b; /John/b; d' files # one way
|
|
|
d0cde9 |
sed -n '/Peter/{p;d;};/James/{p;d;};/John/p' files # a 2nd way
|
|
|
d0cde9 |
sed -n '/Peter/{p;b;};/James/{p;b;};/John/p' files # a 3rd way
|
|
|
d0cde9 |
sed '/Peter\|James\|John/!d' files # shortest way
|
|
|
d0cde9 |
|
|
|
d0cde9 |
On standard seds, these must be broken down with -e commands:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -e '/Peter/b' -e '/James/b' -e '/John/b' -e d files
|
|
|
d0cde9 |
sed -n -e '/Peter/{p;d;}' -e '/James/{p;d;}' -e '/John/p' files
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The 3rd line would require too many -e commands to fit on one line,
|
|
|
d0cde9 |
since standard versions of sed require an -e command after each 'b'
|
|
|
d0cde9 |
and also after each closing brace '}'.
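A three-line test file (its contents are made up) makes the
difference visible. The first command prints the line that mentions
both Peter and John twice; the second, which uses the portable -e
form shown above, prints it once.

```shell
names=$(mktemp)
printf '%s\n' 'Peter and John' 'Andrew' 'James' > "$names"

# Every matching command prints, so line 1 comes out twice.
duplicated=$(sed -n '/Peter/p; /James/p; /John/p' "$names")

# 'b' jumps to the end of the script on the first match; 'd' drops the rest.
printed_once=$(sed -e '/Peter/b' -e '/James/b' -e '/John/b' -e d "$names")

printf '%s\n' "$duplicated" '--' "$printed_once"
rm "$names"
```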
|
|
|
d0cde9 |
|
|
|
d0cde9 |
5.3. Why does my DOS version of sed process a file part-way through
|
|
|
d0cde9 |
and then quit?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
First, look for errors in the script. Have you used the -n switch
|
|
|
d0cde9 |
without telling sed to print anything to the console? Have you read
|
|
|
d0cde9 |
the docs to your version of sed to see if it has a syntax you may
|
|
|
d0cde9 |
have misused? (Look for an N or H command that gathers too much.)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Next, if you are sure your sed script is valid, a probable cause is
|
|
|
d0cde9 |
an end-of-file marker embedded in the file. An EOF marker (SUB) is
|
|
|
d0cde9 |
a Control-Z character, with the value of 1A hex (26 decimal). As
|
|
|
d0cde9 |
soon as any DOS version of sed encounters a Ctrl-Z character, sed
|
|
|
d0cde9 |
stops processing.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To locate the EOF character, use Vern Buerg's shareware file viewer
|
|
|
d0cde9 |
LIST.COM <http://www.buerg.com/list.html>. In text mode, look for a
|
|
|
d0cde9 |
right-arrow symbol; in hex mode (Alt-H), look for a 1A code. With
|
|
|
d0cde9 |
Unix utilities ported to DOS, use 'od' (octal dump) to display
|
|
|
d0cde9 |
hexcodes in your file, and then use sed to locate the offending
|
|
|
d0cde9 |
character:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
od -txC badfile.txt | sed -n "/ 1a /p; / 1a$/p"
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Then edit the input file to remove the offending character(s).
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If you would rather NOT edit the input file, there is still a fix.
|
|
|
d0cde9 |
It requires the DJGPP 32-bit port of 'tr', the Unix translate
|
|
|
d0cde9 |
program (v1.22 or higher). GNU od and tr are currently at v2.0 (for
|
|
|
d0cde9 |
DOS); they are packaged with the GNU text utilities, available at
|
|
|
d0cde9 |
|
|
|
d0cde9 |
ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/txt20b.zip
|
|
|
d0cde9 |
http://www.simtel.net/gnudlpage.php?product=/gnu/djgpp/v2gnu/txt20b.zip&name=txt20b.zip
|
|
|
d0cde9 |
|
|
|
d0cde9 |
It is important to get the DJGPP version of 'tr' because other
|
|
|
d0cde9 |
versions ported to DOS will stop processing when they encounter the
|
|
|
d0cde9 |
EOF character. Use the -d (delete) command:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
tr -d \32 < badfile.txt | sed -f myscript.sed
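If you try the same filter from a Unix shell rather than from DOS,
the backslash must be protected from the shell, or tr will be asked
to delete the digits 3 and 2 instead. A minimal sketch, with invented
sample data and a hypothetical variable name "cleaned":

```shell
# \032 (octal) is the Ctrl-Z character, 1A hex; the quotes keep the
# backslash intact until tr sees it.
cleaned=$(printf 'one\ntwo\032three\n' | tr -d '\032')
printf '%s\n' "$cleaned"
```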
|
|
|
d0cde9 |
|
|
|
d0cde9 |
5.4. My RE isn't matching/deleting what I want it to. (Or, "Greedy vs.
|
|
|
d0cde9 |
stingy pattern matching")
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The two most common causes for this problem are: (1) misusing the
|
|
|
d0cde9 |
'.' metacharacter, and (2) misusing the '*' metacharacter. The RE
|
|
|
d0cde9 |
'.*' is designed to be "greedy" (i.e., matching as many characters
|
|
|
d0cde9 |
as possible). However, sometimes users need an expression which is
|
|
|
d0cde9 |
"stingy," matching the shortest possible string.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(1) On single-line patterns, the '.' metacharacter matches any
|
|
|
d0cde9 |
single character on the line. ('.' cannot match the newline at the
|
|
|
d0cde9 |
end of the line because the newline is removed when the line is put
|
|
|
d0cde9 |
into the pattern space; sed adds a newline automatically when the
|
|
|
d0cde9 |
pattern space is printed.) On multi-line patterns obtained with the
|
|
|
d0cde9 |
'N' or 'G' commands, '.' _will_ match a newline in the middle of the
|
|
|
d0cde9 |
pattern space. If there are 3 lines in the pattern space, "s/.*//"
|
|
|
d0cde9 |
will delete all 3 lines, not just the first one (leaving 1 blank
|
|
|
d0cde9 |
line, since the trailing newline is added to the output).
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Normal misuse of '.' occurs in trying to match a word or bounded
|
|
|
d0cde9 |
field, and forgetting that '.' will also cross the field limits.
|
|
|
d0cde9 |
Suppose you want to delete the first word in braces:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
echo {one} {two} {three} | sed 's/{.*}/{}/' # fails
|
|
|
d0cde9 |
echo {one} {two} {three} | sed 's/{[^}]*}/{}/' # succeeds
|
|
|
d0cde9 |
|
|
|
d0cde9 |
's/{.*}/{}/' is not the solution, since the regex '.' will match
|
|
|
d0cde9 |
any character, including the close braces. Replace the '.' with
|
|
|
d0cde9 |
'[^}]', which signifies a negated character set '[^...]' containing
|
|
|
d0cde9 |
anything other than a right brace. FWIW, we know that 's/{one}/{}/'
|
|
|
d0cde9 |
would also solve our question, but we're trying to illustrate the
|
|
|
d0cde9 |
use of the negated character set: [^anything-but-this].
|
|
|
d0cde9 |
|
|
|
d0cde9 |
A negated character set should be used for matching words between
|
|
|
d0cde9 |
quote marks, for fields separated by commas, and so on. See also
|
|
|
d0cde9 |
section 4.12 ("How do I parse a comma-delimited data file?").
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(2) The '*' metacharacter represents zero or more instances of the
|
|
|
d0cde9 |
previous expression. The '*' metacharacter looks for the leftmost
|
|
|
d0cde9 |
possible match first and will match zero characters. Thus,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
echo foo | sed 's/o*/EEE/'
|
|
|
d0cde9 |
|
|
|
d0cde9 |
will generate 'EEEfoo', not 'fEEE' as one might expect. This is
|
|
|
d0cde9 |
because /o*/ matches the null string at the beginning of the word.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
After finding the leftmost possible match, the '*' is GREEDY; it
|
|
|
d0cde9 |
always tries to match the longest possible string. When two or
|
|
|
d0cde9 |
three instances of '.*' occur in the same RE, the leftmost instance
|
|
|
d0cde9 |
will grab the most characters. Consider this example, which uses
|
|
|
d0cde9 |
grouping '\(...\)' to save patterns:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
echo bar bat bay bet bit | sed 's/^.*\(b.*\)/\1/'
|
|
|
d0cde9 |
|
|
|
d0cde9 |
What will be displayed is 'bit', never anything longer, because the
|
|
|
d0cde9 |
leftmost '.*' took the longest possible match. Remember this rule:
|
|
|
d0cde9 |
"leftmost match, longest possible string, zero also matches."
|
|
|
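All of the examples in this section can be run together to confirm
the rule. This is simply the commands from above collected into one
sketch, with the input quoted and the results stored in variables
whose names are invented here.

```shell
# '.' crosses the closing braces, so the greedy form eats the whole line.
greedy=$(printf '%s\n' '{one} {two} {three}' | sed 's/{.*}/{}/')

# The negated set [^}] stops at the first right brace.
stingy=$(printf '%s\n' '{one} {two} {three}' | sed 's/{[^}]*}/{}/')

# 'o*' matches the empty string before the 'f', which is the leftmost match.
leftmost=$(printf '%s\n' 'foo' | sed 's/o*/EEE/')

# The first '.*' takes all it can, leaving only the last 'b' word to save.
last_word=$(printf '%s\n' 'bar bat bay bet bit' | sed 's/^.*\(b.*\)/\1/')

printf '%s\n' "$greedy" "$stingy" "$leftmost" "$last_word"
```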
d0cde9 |
|
|
|
d0cde9 |
5.5. What is CSDPMI*B.ZIP and why do I need it?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If you use MS-DOS outside of Windows and try to use GNU sed v1.18
|
|
|
d0cde9 |
or 3.02, you may encounter the following error message:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
no DPMI - Get csdpmi*b.zip
|
|
|
d0cde9 |
|
|
|
d0cde9 |
"DPMI" stands for DOS Protected Mode Interface; it's basically a
|
|
|
d0cde9 |
means of running DOS in Protected Mode (as opposed to Real Mode),
|
|
|
d0cde9 |
which allows programs to share resources in extended memory without
|
|
|
d0cde9 |
conflicting with one another. Running HIMEM.SYS and EMM386.EXE is
|
|
|
d0cde9 |
not enough. The "CSDPMI*B.ZIP" refers to files written by Charles
|
|
|
d0cde9 |
Sandmann to provide DPMI services for 32-bit computers (i.e.,
|
|
|
d0cde9 |
386SX, 386DX, 486SX, etc.). Download the binary file (the source
|
|
|
d0cde9 |
code is also available):
|
|
|
d0cde9 |
|
|
|
d0cde9 |
http://www.delorie.com/djgpp/dl/ofc/simtel/v2misc/csdpmi5b.zip # binaries
|
|
|
d0cde9 |
http://www.delorie.com/djgpp/dl/ofc/simtel/v2misc/csdpmi5s.zip # source
|
|
|
d0cde9 |
ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2misc/csdpmi5b.zip # binaries
|
|
|
d0cde9 |
ftp://ftp.cdrom.com/pub/simtelnet/gnu/djgpp/v2misc/csdpmi5s.zip # source
|
|
|
d0cde9 |
|
|
|
d0cde9 |
and extract CWSDPMI.EXE, CWSDPR0.EXE and CWSPARAM.EXE from the ZIP
|
|
|
d0cde9 |
file. Put all 3 CWS*.EXE files in the same directory as GSED.EXE
|
|
|
d0cde9 |
and you're all set. There are DOC files enclosed, but they're
|
|
|
d0cde9 |
nearly incomprehensible for the average computer user. (Another
|
|
|
d0cde9 |
case of user-vicious documentation.)
|
|
|
d0cde9 |
|
|
|
d0cde9 |
If you're running Windows and you normally use a DOS session to run
|
|
|
d0cde9 |
GNU sed (i.e., you get to a DOS prompt with a resizable window or
|
|
|
d0cde9 |
you press Alt-Enter to switch to full-screen mode), you don't need
|
|
|
d0cde9 |
the CWS*.EXE files at all, since Windows uses DPMI already.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
5.6. Where are the man pages for GNU sed?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Prior to GNU sed v3.02, there weren't any. Until recently, man
|
|
|
d0cde9 |
pages distributed with gsed were borrowed from old sources or from
|
|
|
d0cde9 |
other compilations. None of them were "official." GNU sed v3.02 had
|
|
|
d0cde9 |
the first real set of official man pages, and the documentation has
|
|
|
d0cde9 |
greatly improved with GNU sed version 4.0, which now includes both
|
|
|
d0cde9 |
man pages and texinfo pages.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
5.7. How do I tell what version of sed I am using?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Try entering "sed" all by itself on the command line, followed by
|
|
|
d0cde9 |
no arguments or parameters. Also, try "sed --version". In a
|
|
|
d0cde9 |
pinch, you can also try this:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
strings sed | grep -i ver
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Your version of 'strings' must be a version of the Unix utility of
|
|
|
d0cde9 |
this name. It should not be the DOS utility STRINGS.COM by Douglas
|
|
|
d0cde9 |
Boling.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
5.8. Does sed issue an exit code?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Most versions of sed do not, but check the documentation that came
|
|
|
d0cde9 |
with whichever version you are using. GNU sed issues an exit code
|
|
|
d0cde9 |
of 0 if the program terminated normally, 1 if there were errors in
|
|
|
d0cde9 |
the script, and 2 if there were errors during script execution.
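To find out how your own sed behaves, capture $? after a valid and an
invalid script. The sketch below assumes only that 'k' is not a sed
command (it is not in any version we know of); the variable names are
hypothetical. The exact nonzero value depends on the version.

```shell
# A valid script on empty input should exit with 0.
ok_status=0
sed -n 'p' /dev/null || ok_status=$?

# 'k' is not a sed command, so the script is rejected before any input is read.
bad_status=0
sed 'k' /dev/null 2>/dev/null || bad_status=$?

echo "valid script: $ok_status, invalid script: $bad_status"
```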
|
|
|
d0cde9 |
|
|
|
d0cde9 |
5.9. The 'r' command isn't inserting the file into the text.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
On most versions of sed (but not all), the 'r' (read) and 'w'
|
|
|
d0cde9 |
(write) commands must be followed by exactly one space, then the
|
|
|
d0cde9 |
filename, and then terminated by a newline. Any additional
|
|
|
d0cde9 |
characters before or after the filename are interpreted as *part*
|
|
|
d0cde9 |
of the filename. Thus
|
|
|
d0cde9 |
|
|
|
d0cde9 |
/RE/r insert.me
|
|
|
d0cde9 |
|
|
|
d0cde9 |
would try to locate a file called ' insert.me' (note the
|
|
|
d0cde9 |
leading space!). If the file was not found, most versions of sed
|
|
|
d0cde9 |
say nothing, not even an error message.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
When sed scripts are used on the command line, every 'r' and 'w'
|
|
|
d0cde9 |
must be the last command in that part of the script. Thus,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed -e '/regex/{r insert.file;d;}' source # will fail
|
|
|
d0cde9 |
sed -e '/regex/{r insert.file' -e 'd;}' source # will succeed
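The working form can be verified with a few scratch files. The file
contents below are invented; the sed command is the second one shown
above, run from the directory that holds insert.file.

```shell
work=$(mktemp -d)
printf '%s\n' 'INSERTED' > "$work/insert.file"
printf '%s\n' 'before' 'regex here' 'after' > "$work/source"

# Ending the first -e right after the filename terminates the 'r' command.
replaced=$(cd "$work" && sed -e '/regex/{r insert.file' -e 'd;}' source)
printf '%s\n' "$replaced"

rm -r "$work"
```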
|
|
|
d0cde9 |
|
|
|
d0cde9 |
5.10. Why can't I match or delete a newline using the \n escape sequence?
|
|
|
d0cde9 |
Why can't I match 2 or more lines using \n?
|
|
|
d0cde9 |
|
|
|
d0cde9 |
The \n will never match the newline at the end-of-line because the
|
|
|
d0cde9 |
newline is always stripped off before the line is placed into the
|
|
|
d0cde9 |
pattern space. To get 2 or more lines into the pattern space, use
|
|
|
d0cde9 |
the 'N' command or something similar (such as 'H;...;g;').
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Sed works like this: sed reads one line at a time, chops off the
|
|
|
d0cde9 |
terminating newline, puts what is left into the pattern space where
|
|
|
d0cde9 |
the sed script can address or change it, and when the pattern space
|
|
|
d0cde9 |
is printed, appends a newline to stdout (or to a file). If the
|
|
|
d0cde9 |
pattern space is entirely or partially deleted with 'd' or 'D', the
|
|
|
d0cde9 |
newline is *not* added in such cases. Thus, scripts like
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed 's/\n//' file # to delete newlines from each line
|
|
|
d0cde9 |
sed 's/\n/foo\n/' file # to add a word to the end of each line
|
|
|
d0cde9 |
|
|
|
d0cde9 |
will _never_ work, because the trailing newline is removed _before_
|
|
|
d0cde9 |
the line is put into the pattern space. To perform the above tasks,
|
|
|
d0cde9 |
use one of these scripts instead:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
tr -d '\n' < file # use tr to delete newlines
|
|
|
d0cde9 |
sed ':a;N;$!ba;s/\n//g' file # GNU sed to delete newlines
|
|
|
d0cde9 |
sed 's/$/ foo/' file # add "foo" to end of each line
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Since versions of sed other than GNU sed have limits to the size of
|
|
|
d0cde9 |
the pattern buffer, the Unix 'tr' utility is to be preferred here.
|
|
|
d0cde9 |
If the last line of the file contains a newline, GNU sed will add
|
|
|
d0cde9 |
that newline to the output but delete all others, whereas tr will
|
|
|
d0cde9 |
delete all newlines.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
To match a block of two or more lines, there are 3 basic choices:
|
|
|
d0cde9 |
(1) use the 'N' command to add the Next line to the pattern space;
|
|
|
d0cde9 |
(2) use the 'H' command at least twice to append the current line
|
|
|
d0cde9 |
to the Hold space, and then retrieve the lines from the hold space
|
|
|
d0cde9 |
with x, g, or G; or (3) use address ranges (see section 3.3, above)
|
|
|
d0cde9 |
to match lines between two specified addresses.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Choices (1) and (2) will put an \n into the pattern space, where it
|
|
|
d0cde9 |
can be addressed as desired ('s/ABC\nXYZ/alphabet/g'). One example
|
|
|
d0cde9 |
of using 'N' to delete a block of lines appears in section 4.13
|
|
|
d0cde9 |
("How do I delete a block of _specific_ consecutive lines?"). This
|
|
|
d0cde9 |
example can be modified by changing the delete command to something
|
|
|
d0cde9 |
else, like 'p' (print), 'i' (insert), 'c' (change), 'a' (append),
|
|
|
d0cde9 |
or 's' (substitute).
|
|
|
d0cde9 |
|
|
|
d0cde9 |
Choice (3) will not put an \n into the pattern space, but it _does_
|
|
|
d0cde9 |
match a block of consecutive lines, so it may be that you don't
|
|
|
d0cde9 |
even need the \n to find what you're looking for. Since several
|
|
|
d0cde9 |
versions of sed support this syntax:
|
|
|
d0cde9 |
|
|
|
d0cde9 |
sed '/start/,+4d' # to delete "start" plus the next 4 lines,
|
|
|
d0cde9 |
|
|
|
d0cde9 |
in addition to the traditional '/from here/,/to there/{...}' range
|
|
|
d0cde9 |
addresses, it may be possible to avoid the use of \n entirely.
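Two of these choices are sketched below with made-up input. The first
command uses '$!N' rather than a bare 'N' because some versions of
sed quit without printing when 'N' is issued on the last line; that
guard is an addition for this example. The second command uses a
traditional address range and never sees a \n at all.

```shell
# Choice (1): N puts two lines in the pattern space, joined by \n.
two_lines=$(printf '%s\n' 'ABC' 'XYZ' 'tail' | sed '$!N;s/ABC\nXYZ/alphabet/')

# Choice (3): an address range handles the block one line at a time.
ranged=$(printf '%s\n' 'keep' 'start' 'a' 'b' 'end' 'keep too' |
    sed '/start/,/end/d')

printf '%s\n' "$two_lines" '--' "$ranged"
```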
|
|
|
d0cde9 |
|
|
|
d0cde9 |
5.11. My script aborts with an error message, "event not found".

This error is generated by the csh or tcsh shells, not by sed. The
exclamation mark (!) is special to csh/tcsh, and if you use it on
the command line or in shell scripts--even within single quotes--it
must be preceded by a backslash. Thus, under the csh/tcsh shell:

    sed '/regex/!d'     # will fail
    sed '/regex/\!d'    # will succeed

The exclamation mark should not be prefixed with a backslash when
the script is called from a file, as "-f script.file".

------------------------------

6. OTHER ISSUES

6.1. I have a certain problem that stumps me. Where can I get help?

Post your question on the "sed-users" mailing list (section 2.3.2),
where many sed users will be able to see your question. You will have
to subscribe to have posting privileges.

Your other alternative is one of these newsgroups:

- alt.comp.editors.batch
- comp.editors
- comp.unix.questions
- comp.unix.shell

6.2. How does sed compare with awk, perl, and other utilities?

Awk is a much richer language with many features of a programming
language, including variable names, math functions, arrays, system
calls, etc. Its command structure is similar to sed:

    address { command(s) }

which means that for each line or range of lines that matches the
address, execute the command(s). In both sed and awk, an address
can be a line number or an RE somewhere on the line, or both.

In program size, awk is 3-10 times larger than sed. Awk has most of
the functions of sed, but not all. Notably, sed supports
backreferences (\1, \2, ...) to previous expressions, and awk does
not have any comparable syntax. (One exception: GNU awk v3.0
introduced gensub(), which supports backreferences only on
substitutions.)

Perl is a general-purpose programming language, with many features
beyond text processing and interprocess communication, taking it
well past awk or other scripting languages. Perl supports every
feature sed does and has its own set of extended regular
expressions, which give it extensive power in pattern matching and
processing. (Note: the standard perl distribution comes with 's2p',
a sed-to-perl conversion script. See section 3.6 for more info.)
Like sed and awk, perl scripts do not need to be compiled into
binary code. Like sed, perl can also run many useful "one-liners"
from the command line, though with greater flexibility; see
question 4.41 ("How do I make substitutions in every file in a
directory, or in a complete directory tree?").

On the other hand, the current version of perl is from 8 to 35
times larger than sed in its executables alone (perl's library
modules and allied files not included!). Further, for most simple
tasks such as substitution, sed executes more quickly than either
perl or awk. All these utilities serve to process input text,
transforming it to meet our needs . . . or our arbitrary whims.

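To make the backreference point concrete, the sketch below swaps two
words once with sed backreferences and once with awk field variables.
It assumes a POSIX shell with sed and awk installed; the function
names swap_sed and swap_awk are invented for this example.

```shell
# sed: capture each word in \( \) and emit the captures in reverse order.
swap_sed() { sed 's/^\([^ ]*\) \([^ ]*\)$/\2 \1/'; }

# awk: no backreferences needed, because fields are already split out.
swap_awk() { awk '{ print $2, $1 }'; }

echo 'hello world' | swap_sed    # prints: world hello
echo 'hello world' | swap_awk    # prints: world hello
```

The two tools reach the same output by different routes: sed works
on the text of the line, while awk works on its fields.
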
6.3. When should I use sed?

When you need a small, fast program to modify words, lines, or
blocks of lines in a textfile.

6.4. When should I NOT use sed?

You should not use sed when you have "dedicated" tools which can do
the job faster or with an easier syntax. Do not use sed when you
only want to:

- print individual lines, based on patterns within the line itself.
  Instead, use "grep".

- print blocks of lines, with 1 or more lines of context above or
  below a specific regular expression. Instead, use the GNU version
  of grep as follows:

      grep -A{number} -B{number} "regex"

- remove individual lines, based on patterns within the line
  itself. Instead, use "grep -v".

- print line numbers. Instead, use "nl" or "cat -n".

- reformat lines or paragraphs. Instead, use "fold", "fmt" or "par".

The tr utility is also more suited than sed to some simple tasks. For
example, to:

- delete individual characters. Instead of "s/[a-d]//g", use

      tr -d "a-d"

  (Some older System V versions of tr expect the range in brackets,
  "[a-d]". With POSIX and GNU tr, the bracketed form would also
  delete literal "[" and "]" characters.)

- squeeze sequential characters. Instead of "s/ee*/e/g", use

      tr -s "{character-set}"

- change individual characters. Instead of "y/abcdef/ABCDEF/", use

      tr "[a-f]" "[A-F]"

Note, however, that tr does not support giving input files on the
command line, so the syntax is:

    tr {options-and-patterns} < input-file

or, to process multiple files:

    cat input-file1 input-file2 | tr {options-and-patterns}

If you have multiple files, using tr instead of sed is often more of
an exercise than a useful thing. Although sed can perfectly emulate
certain functions of cat, grep, nl, rev, sort, tac, tail, tr, uniq,
and other utilities, producing identical output, the native utilities
are usually optimized to do the job more quickly than sed.

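The pairs below put a few of these equivalences side by side. This
is only a sketch: it assumes a POSIX shell with grep and tr
installed, and the three-line sample input is made up.

```shell
# Hypothetical sample input.
sample() { printf '%s\n' 'apple' 'banana' 'cherry'; }

sample | sed -n '/an/p'      # prints lines matching the pattern ...
sample | grep 'an'           # ... as grep does more directly

sample | sed '/an/d'         # removes lines matching the pattern ...
sample | grep -v 'an'        # ... as grep -v does

sample | sed 'y/abc/ABC/'    # changes individual characters ...
sample | tr 'abc' 'ABC'      # ... as tr does
```

Each pair produces the same output, so the choice comes down to
speed and readability.
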
6.5. When should I ignore sed and use awk or Perl instead?

If you can write the same script in awk or Perl and do it in less
time, then use Perl or awk. There's no reason to spend an hour
writing and debugging a sed script if you can do it in Perl in 10
minutes (assuming that you know Perl already) and if the processing
time or memory use is not a factor. Don't hunt pheasants with a .22
if you have a shotgun at your side . . . unless you simply enjoy
the challenge!

Specifically, use awk or perl if you need to:

- count fields or words on a line. (awk)
- count lines in a block or objects in a file.
- check lengths of strings or do math operations.
- handle very long lines or need very large buffers. (or gsed)
- handle binary data (control characters). (perl: binmode)
- loop through an array or list.
- test for file existence, filesize, or fileage.
- treat each paragraph as a line. (well, not always)

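A few of the items above take one short line each in awk, which is
part of the reason for the advice. The sketch below assumes a POSIX
shell and any standard awk; the helper names and sample data are
hypothetical.

```shell
# Count the fields on each line.
count_fields() { awk '{ print NF }'; }

# Print the length of each line.
line_lengths() { awk '{ print length($0) }'; }

# Add up the numbers in the first column.
sum_first_column() { awk '{ total += $1 } END { print total }'; }

printf '%s\n' 'a b c' 'd e' | count_fields        # prints: 3, then 2
printf '%s\n' 'abc' 'de'    | line_lengths        # prints: 3, then 2
printf '%s\n' '10 x' '32 y' | sum_first_column    # prints: 42
```
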
6.6. Known limitations among sed versions

The limits below apply to the distributed versions; the source code
for most free versions of sed allows for modification and
recompilation. As used below, "no limit" means there is no "fixed"
limit. Limits are actually determined by one's hardware, memory,
operating system, and which C library is used to compile sed.

6.6.1. Maximum line length

    GNU sed:       no limit
    ssed:          no limit
    sedmod v1.0:   4096 bytes
    HHsed v1.5:    4000 bytes
    sed v1.6:      [pending]

6.6.2. Maximum size for all buffers (pattern space + hold space)

    GNU sed:       no limit
    ssed:          no limit
    sedmod v1.0:   4096 bytes
    HHsed v1.5:    4000 bytes
    sed v1.6:      [pending]

6.6.3. Maximum number of files that can be read with read command

    GNU sed v3+:     no limit
    ssed:            no limit
    GNU sed v2.05:   total no. of r and w commands may not exceed 32
    sedmod v1.0:     total no. of r and w commands may not exceed 20
    sed v1.6:        [pending]

6.6.4. Maximum number of files that can be written with 'w' command

    GNU sed v3+:     no limit (but typical Unix is 253)
    ssed:            no limit (but typical Unix is 253)
    GNU sed v2.05:   total no. of r and w commands may not exceed 32
    sedmod v1.0:     10
    HHsed v1.5:      10
    sed v1.6:        [pending]

6.6.5. Limits on length of label names

    GNU sed:      no limit
    ssed:         no limit
    HHsed v1.5:   no limit
    sed v1.6:     [pending]
    BSD sed:      8 characters

Note that GNU sed and ssed both consider a semicolon to terminate a
label name.

6.6.6. Limits on length of write-file names

    GNU sed:      no limit
    ssed:         no limit
    HHsed v1.5:   no limit
    sed v1.6:     [pending]
    BSD sed:      40 characters

6.6.7. Limits on branch/jump commands

    GNU sed:      no limit
    ssed:         no limit
    HHsed v1.5:   50
    sed v1.6:     [pending]

As a practical consequence, this means that HHsed will not read
more than 50 lines into the pattern space via an N command, even if
the pattern space is only a few hundred bytes in size. HHsed exits
with an error message, "infinite branch loop at line {nn}".

6.7. Known incompatibilities between sed versions

6.7.1. Issuing commands from the command line

Most versions of sed permit multiple commands to be issued on the
command line, separated by a semicolon (;). Thus,

    sed 'G;G' file

should triple-space a file. However, for non-GNU sed, some commands
*require* separate expressions on the command line. These include:

- all labels (':a', ':more', etc.)
- all branching instructions ('b', 't')
- commands to read and write files ('r' and 'w')
- any closing brace, '}'

If these commands are used, they must be the LAST commands of an
expression. Subsequent commands must use another expression
(another -e switch plus arguments). E.g.,

    sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' files

GNU sed, ssed, sed15 and sed16 all permit these commands to be
followed by a semicolon, so the previous script can be written:

    sed ':a;s/^.\{1,77\}$/ &/;ta;s/\( *\)\1/\1/' files

Versions differ in implementing the 'a' (append), 'c' (change), and
'i' (insert) commands:

    sed "/foo/i New text here"               # HHsed/sedmod/gsed-30280
    gsed -e "/foo/i\\" -e "New text here"    # GNU sed
    sed1 -e "/foo/i" -e "New text here"      # one version of sed
    sed2 "/foo/i\ New text here"             # another version

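When portability matters, one conservative approach is to write 'i'
with a backslash followed by a real newline, and to split commands
across -e switches rather than rely on semicolons. The sketch below
assumes a POSIX shell; the function names are made up for this
example.

```shell
# Portable "i": a backslash after the command, with the text on the next line.
insert_before_foo() {
  sed '/foo/i\
New text here'
}

# Two spellings of the triple-spacing script; the -e form avoids semicolons.
triple_semicolon() { sed 'G;G'; }
triple_dash_e() { sed -e G -e G; }

printf '%s\n' 'foo' 'bar' | insert_before_foo   # New text here, foo, bar
printf '%s\n' 'a' 'b' | triple_dash_e           # each line followed by 2 blank lines
```
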
6.7.2. Using comments (prefixed by the '#' sign)

Most versions of sed permit comments to appear in sed scripts only
on the first line of the script. Comments on line 2 or thereafter
are not recognized and will generate an error like "unrecognized
command" or "command [bad-line-here] has trailing garbage".

GNU sed, HHsed, sedmod, and HP-UX sed permit comments to appear on
any line of the script, except after labels and branching commands
(b,t), *provided* that a semicolon (;) occurs after the command
itself. This syntax makes sed similar to awk and perl, which use a
similar commenting structure in their scripts. Thus,

    # GNU style sed script
    $!N;                            # except for last line, get next line
    s/^\([0-9]\{5\}\).*\n\1.*//;    # if first 5 digits of each line
                                    # match, delete BOTH lines.
    t skip
    P;                              # print 1st line only if no match
    :skip
    D;                              # delete 1st line of pattern space and loop
    #---end of script---

is a valid script for GNU-based versions of sed, but is not
recognized by most other versions of sed.

Finally, if the first two characters in a disk file script are
"#n", the output is suppressed, exactly as if -n were entered on
the command line. This is true for the following versions of sed:

- ssed v3.57 and above
- gsed
- HHsed v1.5
- sed v1.6

This syntax is not recognized by these versions of sed:

- ssed v3.45 to v3.50 (other versions untested)
- sedmod v1.0

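The "#n" convention is easy to check against whatever sed is
installed. The sketch below assumes a POSIX shell, the mktemp
utility, and a sed that honors "#n" (GNU sed does); the temporary
script file and the variable names are hypothetical.

```shell
# Write a two-line script file whose first line is exactly "#n".
script=$(mktemp)
printf '%s\n' '#n' '/keep/p' > "$script"

# If "#n" is honored, only the explicitly printed line appears.
out=$(printf '%s\n' 'keep me' 'drop me' | sed -f "$script")
rm -f "$script"

printf '%s\n' "$out"    # prints: keep me
```

A sed that does not recognize "#n" would print "keep me" twice and
"drop me" once, since the first line would be an ordinary comment.
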
6.7.3. Special syntax in REs

A. HHsed v1.5 (by Howard Helman)

The following expressions can be used for /RE/ addresses or in the
LHS of a substitution:

    +     - 1 or more occurrences of previous RE: same as \{1,\}
    \<    - boundary between nonword and word character
    \>    - boundary between word and nonword character

The following expressions can be used for /RE/ addresses or on
either side of a substitution:

    \a    - bell         (ASCII 07, 0x07)
    \b    - backspace    (ASCII 08, 0x08)
    \e    - escape       (ASCII 27, 0x1B)
    \f    - formfeed     (ASCII 12, 0x0C)
    \n    - newline      (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
    \r    - return       (ASCII 13, 0x0D)
    \t    - tab          (ASCII 09, 0x09)
    \v    - vertical tab (ASCII 11, 0x0B)
    \xHH  - the ASCII character corresponding to 2 hex digits HH.

B. sed v1.6 (by Walter Briscoe)

sed v1.6 accepts every expression supported by sed v1.5 (above),
plus the following elements, which can also be used in the RHS of a
substitution (in addition to those listed above):

    \\~    - insert replacement pattern defined in last s/// command
             (must be used alone in the RHS)
    \l     - change next element to lower case
    \L     - change remaining elements to lower case
    \u     - change next element to upper case
    \U     - change remaining elements to upper case
    \e     - end case conversion of next element
    \E     - end case conversion of remaining elements
    $0     - insert pattern space BEFORE the substitution
    $1-$9  - match Nth word on the pattern space

C. sedmod v1.0 (by Hern Chen)

The following expressions can be used for /RE/ addresses or in the
LHS of a substitution:

    +    - 1 or more occurrences of previous RE: same as \{1,\}
    \a   - any alphanumeric: same as [a-zA-Z0-9]
    \A   - 1 or more alphas: same as \a+
    \d   - any digit: same as [0-9]
    \D   - 1 or more digits: same as \d+
    \h   - any hex digit: same as [0-9a-fA-F]
    \H   - 1 or more hexdigits: same as \h+
    \l   - any letter: same as [A-Za-z]
    \L   - 1 or more letters: same as \l+
    \n   - newline (read as 2 bytes, 0D 0A or ^M^J, in DOS)
    \s   - any whitespace character: space, tab, or vertical tab
    \S   - 1 or more whitespace chars: same as \s+
    \t   - tab (ASCII 09, 0x09)
    \<   - boundary between nonword and word character
    \>   - boundary between word and nonword character

The following expressions can be used in the RHS of a substitution.
"Elements" refer to \1 .. \9, &, $0, or $1 .. $9:

    &      - insert regexp defined on LHS
    \e     - end case conversion of next element
    \E     - end case conversion of remaining elements
    \l     - change next element to lower case
    \L     - change remaining elements to lower case
    \n     - newline (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
    \t     - tab (ASCII 09, 0x09)
    \u     - change next element to upper case
    \U     - change remaining elements to upper case
    $0     - insert the original pattern space
    $1-$9  - match Nth word on the pattern space

D. UnixDos sed

The following expressions can be used in text, LHS, and RHS:

    \n   - newline (printed as 2 bytes, 0D 0A or ^M^J, in DOS)

E. GNU sed v1.03 (by Frank Whaley)

When used with the -x (extended) switch on the command line, or
when '#x' occurs as the first line of a script, Whaley's gsed103
supports the following expressions in both the LHS and RHS of a
substitution:

    \|      matches the expression on either side
    ?       0 or 1 occurrences of previous RE: same as \{0,1\}
    +       1 or more occurrences of previous RE: same as \{1,\}
    \a      "alert" beep     (BEL, Ctrl-G, 0x07)
    \b      backspace        (BS, Ctrl-H, 0x08)
    \f      formfeed         (FF, Ctrl-L, 0x0C)
    \n      newline          (LF, Ctrl-J, 0x0A)
    \r      carriage-return  (CR, Ctrl-M, 0x0D)
    \t      horizontal tab   (HT, Ctrl-I, 0x09)
    \v      vertical tab     (VT, Ctrl-K, 0x0B)
    \bBBB   binary char, where BBB are 1-8 binary digits, [0-1]
    \dDDD   decimal char, where DDD are 1-3 decimal digits, [0-9]
    \oOOO   octal char, where OOO are 1-3 octal digits, [0-7]
    \xHH    hex char, where HH are 1-2 hex digits, [0-9A-F]

In normal mode, with or without the -x switch, the following escape
sequences are also supported in regex addressing or in the LHS of a
substitution:

    \`      matches beginning of pattern space: same as /^/
    \'      matches end of pattern space: same as /$/
    \B      boundary between 2 word or 2 nonword characters
    \w      any nonword character [*BUG!* should be a word char]
    \W      any nonword character: same as /[^A-Za-z0-9]/
    \<      boundary between nonword and word char
    \>      boundary between word and nonword char

F. GNU sed v2.05 and higher versions

The following expressions can be used for /RE/ addresses or in the
LHS of a substitution:

    \`  - matches the beginning of the pattern space (same as "^")
    \'  - matches the end of the pattern space (same as "$")
    \?  - 0 or 1 occurrence of previous character: same as \{0,1\}
    \+  - 1 or more occurrences of previous character: same as \{1,\}
    \|  - matches the string on either side, e.g., foo\|bar
    \b  - boundary between word and nonword chars (reversible)
    \B  - boundary between 2 word or between 2 nonword chars
    \n  - embedded newline (usable after N, G, or similar commands)
    \w  - any word character: [A-Za-z0-9_]
    \W  - any nonword char: [^A-Za-z0-9_]
    \<  - boundary between nonword and word character
    \>  - boundary between word and nonword character

On \b, \B, \<, and \>, see section 6.7.4 ("Word boundaries"),
below.

Undocumented -r switch:

Beginning with version 3.02, GNU sed has an -r switch (undocumented
until version 4.0), which activates Extended Regular Expressions in
the following manner:

    ?      - 0 or 1 occurrence of previous character
    +      - 1 or more occurrences of previous character
    |      - matches the string on either side, e.g., foo|bar
    (...)  - enable grouping without backslash
    {...}  - enable interval expression without backslash

When the -r switch (mnemonic: "regular expression") is used, prefix
these symbols with a backslash to disable the special meaning.

Escape sequences:

Beginning with version 3.02.80, the following escape sequences can
now be used on both sides of a "s///" substitution:

    \a      "alert" beep     (BEL, Ctrl-G, 0x07)
    \f      formfeed         (FF, Ctrl-L, 0x0C)
    \n      newline          (LF, Ctrl-J, 0x0A)
    \r      carriage-return  (CR, Ctrl-M, 0x0D)
    \t      horizontal tab   (HT, Ctrl-I, 0x09)
    \v      vertical tab     (VT, Ctrl-K, 0x0B)
    \oNNN   a character with the octal value NNN
    \dNNN   a character with the decimal value NNN
    \xHH    a character with the hexadecimal value HH

Note that GNU sed also supports "character classes", a POSIX
extension to regexes, described in section 3.7, above.

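The one-liners below compare a few of these GNU extensions with a
portable spelling. They are a sketch that assumes GNU sed is
installed as "sed"; other versions may reject the first, second,
fourth and fifth commands.

```shell
echo 'caaat' | sed 's/a\+/X/'               # GNU extension in basic REs: cXt
echo 'caaat' | sed -r 's/a+/X/'             # same with -r (extended REs): cXt
echo 'caaat' | sed 's/a\{1,\}/X/'           # portable interval expression: cXt
echo 'foo bar baz' | sed 's/foo\|baz/X/g'   # GNU alternation: X bar X
echo 'a b' | sed 's/ /\t/'                  # \t in the RHS: a, a tab, then b
```

For scripts that must run under several versions of sed, the
interval form \{1,\} is the safer choice.
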
G. sed 4.0 and higher versions

The following expressions can be used in the RHS of a substitution.

    \E   - end case conversion
    \l   - change next character to lower case
    \L   - change remaining text to lower case
    \n   - newline (printed as 2 bytes, 0D 0A or ^M^J, in DOS)
    \t   - tab (ASCII 09, 0x09)
    \u   - change next character to upper case
    \U   - change remaining text to upper case

In addition, GNU sed 4.0 can modify the way ^ and $ are interpreted,
so that ^ can also match an empty string after a newline character,
and $ can also match an empty string before a newline character (to
do this, add an "M" after the regular expression terminator, like
/^>/M -- see section 3.1.1). Even if you use this feature, \` and \'
still match the beginning and the end of the pattern space,
respectively.

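A short sketch of the case-conversion escapes and the M modifier
follows. It assumes GNU sed 4.0 or later installed as "sed"; the
input strings are made up.

```shell
echo 'hello world' | sed 's/hello/\U&/'     # prints: HELLO world
echo 'hello world' | sed 's/\w\+/\u&/g'     # prints: Hello World

# After N the pattern space is "a\nb"; with M, ^ also matches after the \n.
printf '%s\n' 'a' 'b' | sed 'N;s/^b/B/M'    # prints: a, then B
```

Without the M modifier the last command would leave the second line
unchanged, because ^ would match only at the start of the pattern
space.
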
H. ssed

Everything that was said for GNU sed applies to ssed as well. In
addition, in Perl-mode (-R switch), these become active or inactive:

    .      - no longer matches new-line characters
    \A     - matches beginning of pattern space
    \Z     - matches end of pattern space or last newline in the PS
    \z     - matches end of pattern space
    \d     - matches any digit: same as [0-9]
    \D     - matches any non-digit: same as [^0-9]
    \`     - no longer matches beginning of pattern space
    \'     - no longer matches end of pattern space
    \<     - no longer matches boundary between nonword & word char
    \>     - no longer matches boundary between word & nonword char
    \oNNN  - no longer matches char with octal value NNN
    \dNNN  - no longer matches char with decimal value NNN
    \NNN   - matches char with octal value NNN

Perl mode supports lookahead (?=match) and lookbehind (?<=match)
pattern matching. The text matched by the lookahead or lookbehind
is NOT captured in "&" for s/// replacements!

    foo(?=bar)   - match "foo" only if "bar" follows it
    foo(?!bar)   - match "foo" only if "bar" does NOT follow it
    (?<=foo)bar  - match "bar" only if "foo" precedes it
    (?<!foo)bar  - match "bar" only if "foo" does NOT precede it

    (?<!in|on|at)foo
        - match "foo" only if NOT preceded by "in", "on" or "at"
    (?<=\d{3})(?<!999)foo
        - match "foo" only if preceded by 3 digits other than "999"

In Perl mode, there are two new switches in /addressing/ or s///
commands. Switches may be lowercase in s/// commands, but must be
uppercase in /addressing/:

    /S  - lets "." match a newline also
    /X  - extra whitespace is ignored. See below, for sample usage.

Here are some examples of Perl-style regular expressions. Use the -R
switch.

    (?i)abc     - case-insensitive match of abc, ABC, aBc, ABc, etc.
    ab(?i)c     - same as above; the (?i) applies throughout the pattern
    (ab(?i)c)   - matches abc or abC; the outer parens make the difference!
    (?m)        - multi-line pattern space: same as "s/FIND/REPL/M"
    (?s)        - set "." to match newline also: same as "s/FIND/REPL/S"
    (?x)        - ignore whitespace and #comments; see the sample /X
                  script below.

    (?:abc)foo    - match "abcfoo", but do not capture 'abc' in \1
    (?:ab|cd)ef   - match "abef" or "cdef"; the group is not captured in \1
    (?#remark)xy  - match "xy"; remarks after "#" are ignored.

And here are some sample uses of /X switch to add comments to complex
expressions. To embed literal spaces, precede with \ or put inside
[brackets].

    # ssed script to change "(123) 456-7890" into "[ac123] 456-7890"
    #
    s/                 # BACKSLASH IS NEEDED AT END OF EACH LINE!  \
     \(                # literal left paren, (                     \
     (\d{3})           # 3 digits                                  \
     \)                # literal right paren, )                    \
     [ \t]*            # zero or more spaces or tabs               \
     (\d{3}-\d{4})     # 3 digits, hyphen, 4 digits                \
    /[ac\1] \2/gx;     # replace g(lobally), with e(x)tended spacing

6.7.4. Word boundaries

GNU sed, ssed, sed16, sed15 and sedmod use certain symbols to define
the boundary between a "word character" and a nonword character. A
word character fits the regex "[A-Za-z0-9_]". Note: a word character
includes the underscore "_" but not the hyphen, probably because the
underscore is permissible as a label in sed and in other scripting
languages. (In gsed103, a word character did NOT include the
underscore; it included alphanumerics only.)

These symbols include '\<' and '\>' (gsed, ssed, sed15, sed16,
sedmod) and '\b' and '\B' (gsed only). Note that the boundary
symbols do not represent a character, but a position on the line.
Word boundaries are used with literal characters or character sets
to let you match (and delete or alter) whole words without
affecting the spaces or punctuation marks outside of those words.
They can only be used in a "/pattern/" address or in the LHS of a
's/LHS/RHS/' command. The following table shows how these symbols
may be used in HHsed and GNU sed. Sedmod matches the syntax of
HHsed.

    Match position    Possible word boundaries        HHsed   GNU sed
    ------------------------------------------------------------------
    start of word     [nonword char]^[word char]      \<      \< or \b
    end of word       [word char]^[nonword char]      \>      \> or \b
    middle of word    [word char]^[word char]         none    \B
    outside of word   [nonword char]^[nonword char]   none    \B
    ------------------------------------------------------------------

In ssed, the symbols '\<' and '\>' lose their special meaning when
the -R switch is used to invoke Perl-style expressions. However,
the identical meaning of '\<' and '\>' can be obtained through
these nonmatching, zero-width assertions:

    (?<!\w)(?=\w)    # same as \<  (start of word)
    (?<=\w)(?!\w)    # same as \>  (end of word)

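The boundary symbols are easiest to see on a line where the same
letters occur both as a whole word and inside longer words. The
sketch below assumes GNU sed installed as "sed"; the sample text is
made up.

```shell
echo 'cat concat cats' | sed 's/\<cat\>/dog/g'   # prints: dog concat cats
echo 'cat concat cats' | sed 's/\bcat\b/dog/g'   # prints: dog concat cats
echo 'cat concat cats' | sed 's/\Bcat/X/g'       # prints: cat conX cats
```

In the last command, \B requires that "cat" NOT begin at a word
boundary, so only the occurrence inside "concat" is changed.
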
6.7.5. Commands which operate differently

A. GNU sed version 3.02 and 3.02.80

The N command no longer discards the contents of the pattern space
upon reaching the end of file. This is not a bug, it's a feature.
However, it breaks certain scripts which relied on the older
behavior of N.

'N' adds the Next line to the pattern space, enabling multiple
lines to be stored and acted upon. Upon reaching the last line of
the file, if the N command was issued again, the contents of the
pattern space would be silently deleted and the script would abort
(this has been the traditional behavior). For this reason, sed
users generally wrote:

    $!N;    # to add the Next line to every line but the last one.

However, certain sed scripts relied on this behavior, such as the
script to delete trailing blank lines at the end of a file (see
script #12 in section 3.2, "Common one-line sed scripts", above).
Also, classic textbooks such as Dale Dougherty and Arnold Robbins'
_sed & awk_ documented the older behavior.

The GNU sed maintainer felt that despite the portability problems
this would cause, changing the N command to print (rather than
delete) the pattern space was more consistent with one's intuitions
about how a command to "append the Next line" _ought_ to behave.
Another fact favoring the change was that, under the older
behavior, "{N;command;}" will delete the last line if the file has
an odd number of lines, but print the last line if the file has an
even number of lines.

To convert scripts which used the former behavior of N (deleting
the pattern space upon reaching the EOF) to scripts compatible with
all versions of sed, change a lone "N;" to "$d;N;".

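Both workarounds can be checked on a three-line input, where the
last line has no partner to be joined with. This is a sketch
assuming a POSIX shell; the helper names three, keep_last and
drop_last are invented for this example.

```shell
# Hypothetical input with an odd number of lines.
three() { printf '%s\n' '1' '2' '3'; }

# "$!N" never runs N on the last line, so line 3 is printed by every sed.
keep_last() { sed '$!N;s/\n/ /'; }

# "$d;N" deletes the leftover last line explicitly, as traditional N did.
drop_last() { sed '$d;N;s/\n/ /'; }

three | keep_last    # prints: "1 2", then "3"
three | drop_last    # prints: "1 2"
```

Because neither script ever issues N on the last line, the result
should not depend on which behavior of N a given sed implements.
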
------------------------------

7. KNOWN BUGS AMONG SED VERSIONS

Most versions of GNU sed and ssed include in the source archive a
"buglist" of known errors or reported behaviors that may be
misconstrued as bugs. This portion of the sed FAQ does _not_
attempt to fully reproduce those buglist files. However, we do
seek to do some substantial reporting, particularly where certain
programs have no "buglist" of their own or are not being actively
maintained.

As a rule of thumb, if the bug "bites" someone on the sed-users
mailing list, I tend to report it.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.1. ssed v3.59 (by Paolo Bonzini)

(1) N does not discard the contents of the pattern space upon
reaching the end of file; not a bug. See section 6.7.5.A, above.

(2) If \x26 is entered into the RHS of a substitution, it is
interpreted as an ampersand metacharacter, and the entire pattern
matched in the "find" portion is inserted at that point. A literal
ampersand should be inserted instead.
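As a hedged illustration of item (2): one way to sidestep the
\x26 problem is to write the literal ampersand as "\&" in the
replacement, which is the portable escape. The function name and
the sample text are made up for this sketch.

```shell
# Insert a literal ampersand in the replacement text by escaping it
# as \& rather than writing it as the hex escape \x26.
literal_amp() {
    sed 's/ / \& /'
}

echo "salt pepper" | literal_amp     # prints "salt & pepper"
```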
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(3) Under Windows 2000, the -i switch doesn't create backup files
properly. When passed one or more files to process, the source
file(s) are unchanged, and the changed output files are given
filenames like sedDOSxyz with no way to match them to the names
of the source files.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.2. GNU sed v4.0 - v4.0.5

(1) N does not discard the contents of the pattern space upon
reaching the end of file; not a bug. See section 6.7.5.A, above.

(2) If \x26 is entered into the RHS of a substitution, it is
interpreted as an ampersand metacharacter, and the entire pattern
matched in the "find" portion is inserted at that point. A literal
ampersand should be inserted instead.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.3. GNU sed v3.02.80

(1) N does not discard the contents of the pattern space upon
reaching the end of file; not a bug. See section 6.7.5.A, above.

(2) Same as #2 for GNU sed v4.0, above.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.4. GNU sed v3.02

(1) Affects only v3.02 binaries compiled with DJGPP for MS-DOS and
MS-Windows: 'l' (list) command does not display a lone carriage
return (0x0D, ^M) embedded in a line.

(2) The expression "\<" causes problems when attempting the
following types of substitutions, which should print "+aaa +bbb":

     echo aaa bbb | sed 's/\</+/g'     # prints "+a+a+a +b+b+b"
     echo aaa bbb | sed 's/\<./+&/g'   # prints "+a+a+a +b+b+b"
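Where "\<" misbehaves or is not supported, the start of a word can
be matched with an explicit character class instead. This is only
a sketch under one assumption: a "word" is taken to be a run of
letters, digits, or underscores. The function name is made up.

```shell
# Prefix every word with "+" without relying on \<.
# Assumption: a word is a run of letters, digits, or underscores.
prefix_words() {
    sed 's/[A-Za-z0-9_][A-Za-z0-9_]*/+&/g'
}

echo aaa bbb | prefix_words     # prints "+aaa +bbb"
```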
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(3) The N command no longer discards the contents of the pattern
space upon reaching the end of file. This is not a bug, it's a
feature. See section 6.7.5, "Commands which operate differently".
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.5. GNU sed v2.05

(1) If a number follows the substitute command (e.g., s/f/F/10) and
the number exceeds the possible matches on the pattern space, the
command 't label' _always_ jumps to the specified label. 't' should
jump only if the substitution was successful (or returned "true").
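A quick way to check a given sed for the 't' problem in item (1)
might look like the sketch below. The label name "done", the
function name, and the marker text are made up for illustration.

```shell
# s/f/F/10 cannot succeed on "fff" (there is no 10th "f"), so a
# correct sed does not take the 't' branch and goes on to run the
# final s command, which appends a marker to the line.
t_check() {
    sed -e 's/f/F/10' -e 't done' -e 's/$/ (t not taken)/' -e ':done'
}

echo fff | t_check     # a correct sed prints "fff (t not taken)"
```

A sed with the bug described above would print "fff" alone, since
it jumps to the label without running the last substitution.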
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(2) 'l' (list) command does not convert the following characters to
hex values, but passes them through unchanged: 0xF7, 0xFB, 0xFC,
0xFD, 0xFE.

(3) A range address like "/foo/,14" is supposed to match every line
from the first occurrence of "foo" until line 14, inclusive, and
then match only those lines containing "foo" thereafter. In gsed
v2.05, if "foo" occurs later in the file, every line from there to
the end of file will be matched (since gsed is looking for line 14
to occur again!).
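The expected behavior of the range address in item (3) can be
checked with a small test file. This is a sketch with made-up
data: 20 numbered lines, with "foo" on lines 3 and 17 only. The
helper name make_input is likewise hypothetical.

```shell
# Build 20 numbered lines with "foo" on lines 3 and 17 (made-up data).
make_input() {
    awk 'BEGIN {
        for (i = 1; i <= 20; i++) {
            word = (i == 3 || i == 17) ? "foo" : "bar"
            print word, i
        }
    }'
}

# Expected: lines 3 through 14, then line 17 by itself, because the
# closing address (14) has already passed when "foo" matches again.
make_input | sed -n '/foo/,14p'
```

A sed with the bug would also print lines 18 through 20.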
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(4) The regexes /\`/ and /\'/ are not interpreted as a backquote
and apostrophe, as might be expected. Instead, they are used to
represent the beginning-of-line and end-of-line (respectively), to
conform with similar regexes in the GNU versions of Emacs and awk.
As a consequence, there is no clear way to indicate an apostrophe,
since a bare apostrophe (') has special meaning to the Unix shell
and the quoted apostrophe (\') is interpreted as the EOL. A
doubly quoted apostrophe (\\') is interpreted as a backslash by sed
and a quote mark by the shell--again, not providing the expected
results. This syntax changed in the next version of gsed.

(5) Multiple occurrences of the 'w' command fail, as shown here,
given that both "aaa" and "bbb" occur within the file:

     gsed -e "/aaa/w FILE" -e "/bbb/w FILE" input.txt

(6) The expression "\<" causes problems when attempting the
following type of substitution, which should print "+aaa +bbb":

     echo aaa bbb | sed 's/\</+/g'     # sed hangs up with no output

The syntax 's/\<./+&/g' issues the proper output.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.6. GNU sed v1.18

(1) Same as #1 for GNU sed v2.05, above.

(2) The following command will lock the computer under Win95. Echos
is an echo command that does not issue a trailing newline:

     echos any_word | gsed "s/[ ]*$//"

(3) Same as #3 for GNU sed v2.05, above.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.7. GNU sed v1.03 (by Frank Whaley)

(1) The \w and \W escape sequences both match only nonword
characters. \w is misdefined and should match word characters.

(2) The underscore is defined as a nonword character; it should be
defined as a word character.

(3) Same as #3 for GNU sed v2.05, above.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.8. sed v1.6 (by Walter Briscoe) - still in beta version

(1) Duplicated subexpressions (still) do not match an empty set as
they should. This problem was inherited from HHsed v1.5.

     echo 123 | sed "s/\([a-z][a-z]\)*/=\1/"   # should return '=123'

(2) If grouping is followed by a + operator, nothing is matched.
This problem was inherited from HHsed; sed v1.6 fixed the bug with
the * operator, but the problem with the + operator persists.

     echo aaa | sed "/\(a\)+/d"                # nothing is deleted.

(3) With the interval expressions \{1,\} and +, there is a bug
related to the & replacement character. This affected the BETA
release, and it's not known if it affects the final release.

     echo ab | sed "s/a[^a]*/&c/"          # returns 'abc'. Okay.
     echo ab | sed "s/a[^a]+/&c/"          # returns 'ab'. Bug!
     echo ab | sed "s/a[^a]\{1,\}/&c/"     # returns 'ab'. Bug!
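For comparison, the sketch below shows what a conforming sed
should print for this kind of expression. The function name and
the second test line are made up for illustration, and the "+"
operator is deliberately left out because not every sed supports
it.

```shell
# Each command is followed by the output a conforming sed should give.
probe_grouping() {
    echo 123 | sed 's/\([a-z][a-z]\)*/=\1/'   # "=123": group matches empty
    echo aaa | sed 's/\(a\)*/[&]/'            # "[aaa]"
    echo ab  | sed 's/a[^a]\{1,\}/&c/'        # "abc"
}

probe_grouping
```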
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.9. HHsed v1.5 (by Howard Helman)

(1) If a number follows the substitute command (e.g., s/foo/bar/2),
in a sed script entered from the command line, two semicolons must
follow the number, or the commands must be separated by an -e
switch. Normally, only 1 semicolon is needed to separate commands.

     echo bit bet | HHsed "s/b/n/2;;s/b/B/"           # solution 1
     echo bit bet | HHsed -e "s/b/n/2" -e "s/b/B/"    # solution 2

(2) If the substitute command is followed by a number and a "p"
flag, when the -n switch is used, the "p" flag must occur first.

     echo aaa | HHsed -n "s/./B/3p"   # bug! nothing prints
     echo aaa | HHsed -n "s/./B/p3"   # prints "aaB" as expected

(3) The following commands will cause HHsed to lock the computer
under MS-DOS or Win95. Note that the lockups are triggered by
malformed regular expressions which will match no characters.

     sed -n "p;s/\<//g;" file
     sed -n "p;s/[char-set]*//g;" file

(4) The range command '/RE1/,/RE2/' in HHsed will match one line if
both regexes occur on the same line (see section 3.4(3), above).
Though this could be construed as a feature, it should probably be
considered a bug since its operation differs from every other
version of sed. For example, '/----/,/----/{s/^/>>/;}' should put
two angle brackets ">>" before every line which is sandwiched
between two rows of 4 or more hyphens. With HHsed, this command
will only prefix the hyphens themselves with the angle brackets.
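The usual range behavior described in item (4) can be seen with a
five-line sample. This is a sketch: the function name and the
input lines are made up for illustration.

```shell
# Prefix ">>" to each line from one row of hyphens through the next.
# The closing regex is first tested on the line after the opening
# match, so the two rows of hyphens bound a single range.
mark_block() {
    sed '/----/,/----/s/^/>>/'
}

printf '%s\n' before ---- text ---- after | mark_block
# prints:
#   before
#   >>----
#   >>text
#   >>----
#   after
```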
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(5) If the hold space is empty, the H command copies the pattern
space to the hold space but fails to prepend a leading newline. The
H command is supposed to add a newline, followed by the contents of
the pattern space, to the hold space at all times. A workaround is
"{G;s/^\(.*\)\(\n\)$/\2\1/;H;s/\n$//;}", but it requires knowing
that the hold space is empty and using the command only once.
Another alternative is to use the G or the h command alone at key
points in the script.
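The leading newline that H is supposed to add can be made visible
with a short script. This is a sketch: the function name and the
three-line input are made up for illustration.

```shell
# Collect every line in the hold space, then print the result on one
# line with each newline shown as a comma. The leading comma is the
# newline that H adds even when the hold space starts out empty.
collect() {
    sed -n 'H;${x;s/\n/,/g;p;}'
}

printf 'a\nb\nc\n' | collect     # a correct sed prints ",a,b,c"
```

A sed with the bug described in item (5) would print "a,b,c"
instead, with no leading comma.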
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(6) If grouping is followed by an '*' or '+' operator, HHsed does
not match the pattern, but issues no warning. See below:

     echo aaa | HHsed "/\(a\)*/d"         # nothing is deleted
     echo aaa | HHsed "/\(a\)+/d"         # nothing is deleted
     echo aaa | HHsed "s/\(a\)*/\1B/"     # nothing is changed
     echo aaa | HHsed "s/\(a\)+/\1B/"     # nothing is changed

(7) If grouping is followed by an interval expression, HHsed halts
with the error message "garbled command", in all of the following
examples:

     echo aaa | HHsed "/\(a\)\{3\}/d"
     echo aaa | HHsed "/\(a\)\{1,5\}/d"
     echo aaa | HHsed "s/\(a\)\{3\}/\1B/"

(8) In interval expressions, 0 is not supported. E.g., \{0,3\}
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.10. sedmod v1.0 (by Hern Chen)

Technically, the following are limits (or features?) of sedmod, not
bugs, since the docs for sedmod do not claim to support these
missing features.

(1) sedmod does not support standard interval expressions \{...\}
present in nearly all versions of sed.

(2) If grouping is followed by an '*' or '+' operator, sedmod gives
a "garbled command" message. However, if the grouped expressions
are string literals with no metacharacters, a partial workaround
can be done like so:

     \(string\)\1*    # matches 1 or more instances of 'string'
     \(string\)\1+    # matches 2 or more instances of 'string'
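The back-reference workaround in item (2) also works in other
versions of sed, so it can be tried out anywhere. This is a
sketch: the function name and the sample input are made up for
illustration.

```shell
# Match one or more repeats of the literal "ab" without putting an
# operator directly after the group: one "ab", then zero or more
# copies of the same text via the back-reference \1.
bracket_run() {
    sed 's/\(ab\)\1*/<&>/'
}

echo abababX | bracket_run     # prints "<ababab>X"
```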
|
|
|
d0cde9 |
|
|
|
d0cde9 |
(3) sedmod does not support a numeric argument after the s///
command, as in 's/a/b/3', present in nearly all versions of sed.

The following are bugs in sedmod v1.0:

(4) When the -i (ignore case) switch is used, the '/regex/d'
command is not properly obeyed. Sedmod may miss one or more lines
matching the expression, regardless of where they occur in the
script. Workaround: use "/regex/{d;}" instead.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.11. HP-UX sed

(1) Versions of HP-UX sed up to and including version 10.20 are
buggy. According to the README file, which comes with the GNU cc
at <ftp://ftp.ntua.gr/pub/gnu/sed/sed-2.05.bin.README>:

"When building gcc on a hppa*-*-hpux10 platform, the `fixincludes'
step (which involves running a sed script) fails because of a bug
in the vendor's implementation of sed. Currently the only known
workaround is to install GNU sed before building gcc. The file
sed-2.05.bin.hpux10 is a precompiled binary for that platform."
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.12. SunOS sed v4.1

(1) Bug occurs in RE pattern matching when a non-null '[char-set]*'
is followed by a null '\NUM' pattern recall, illustrated here and
reported by Greg Ubben:

     s/\(a\)\(b*\)cd\1[0-9]*\2foo/bar/   # between '[0-9]*' and '\2'
     s/\(a\{0,1\}\).\{0,1\}\1/bar/       # between '.\{0,1\}' and '\1'

Workaround: add a do-nothing 'X*' expression which will not match
any characters on the line between the two components. E.g.,

     s/\(a\)\(b*\)cd\1[0-9]*X*\2foo/bar/
     s/\(a\{0,1\}\).\{0,1\}X*\1/bar/
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.13. SunOS sed v5.6

(1) If grouping is followed by an asterisk, SunOS sed does not match
the null string, which it should do. The following command:

     echo foo | sed 's/f\(NO-MATCH\)*/g\1/'

should transform "foo" to "goo" under normal versions of sed.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.14. Ultrix sed v4.3

(1) If grouping is followed by an asterisk, Ultrix sed replies with
"command garbled", as shown in the following example:

     echo foo | sed 's/f\(NO-MATCH\)*/g\1/'

(2) If grouping is followed by a numeric operator such as \{0,9\},
Ultrix sed does not find the match.
|
|
|
d0cde9 |
|
|
|
d0cde9 |
7.15. Digital Unix sed

(1) The following comes from the man pages for sed distributed with
new, 1998 versions of Digital Unix (reformatted to fit our
margins):

[Digital] The h subcommand for sed does not work properly. When
you use the h subcommand to place text into the hold area, only
the last line of the specified text is saved. You can use the H
subcommand to append text to the hold area. The H subcommand and
all others dealing with the hold area work correctly.

(2) "$d" command issues an error message, "cannot parse". Reported
by Carlos Duarte on 8 June 1998.
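Where "$d" is rejected, the same result (every line but the last)
can be had by suppressing automatic printing and printing all
lines except the last. This is only a sketch with a made-up
function name, and it has not been tested on Digital Unix, so it
may or may not avoid the parse error there.

```shell
# Print every line except the last without using "$d":
# -n suppresses automatic printing, and "$!p" prints each line
# that is not the last one.
drop_last_line() {
    sed -n '$!p'
}

printf '1\n2\n3\n' | drop_last_line     # prints "1" and then "2"
```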
|
|
|
d0cde9 |
|
|
|
d0cde9 |
[end-of-file]