,TITLE.24009 Page 1 Tuesday, October 9, 2001 1:55 AM
Effective awk Programming
Third Edition
Arnold Robbins
Beijing • Cambridge • Farnham • Köln • Paris • Sebastopol • Taipei • Tokyo
About the Author
Arnold Robbins, a native of Atlanta, Georgia, is a professional programmer and a
technical author. He is also a happy husband, the father of four very cute
children, and an amateur Talmudist (Babylonian and Jerusalem). Since late 1997, he
and his family have been living happily in Israel.
Arnold has been working with Unix systems since 1980, when he was introduced
to a PDP-11 running a version of Sixth Edition Unix. He has been a heavy awk
user since 1987, when he became involved with gawk, the GNU project’s version
of awk. As a member of the POSIX 1003.2 balloting group, he helped shape the
POSIX standard for awk. He currently maintains gawk and its documentation (i.e.,
this book). The documentation is also available from the Free Software
Foundation (http://www.gnu.org).
In previous incarnations, Arnold was a systems administrator and taught
continuing education classes in Unix and networking. He has also had more than
one poor experience with startup software companies, about which he prefers not
to think anymore.
O’Reilly & Associates has been keeping him busy. In addition to this book, Arnold
is the author of Unix in a Nutshell, Third Edition, and the sed & awk Pocket
Reference; he is the coauthor of sed & awk, Second Edition, and Learning the vi
Editor, Sixth Edition.
Colophon
Our look is the result of reader comments, our own experimentation, and
feedback from distribution channels. Distinctive covers complement our distinctive
approach to technical topics, breathing personality and life into potentially dry
subjects.
The animal on the cover of Effective awk Programming, Third Edition, is a great
auk, a powerful symbol of nineteenth-century European and American arrogance
toward nature. In using great auks as food and for their oil, and later collecting
specimens for the kind of trivial display so popular with the inhabitants of
mansions in Victorian England, mankind showed no mercy; mankind did not take
care to effectively manage the few delicate populations as sustainable resources,
much less treat the great auk as a living species worthy of respect. In 1844, sailors
working for a British collector killed the last two great auks and stole their
incubating egg on an island off the coast of Iceland.
The original penguins, great auks were large, black and white, flightless seabirds
with pronounced, bent, orange beaks. The auks nested for three to four weeks
each spring on craggy islands in the North Atlantic. When not nesting with their
lifelong mates, great auks swam the seas in extended-family groups, occasionally
deep-sea diving for large fish. Sixteenth-century sailors who exploited nesting
populations for food during long voyages called the birds penguins, a name they
also gave to the smaller-beaked seabirds of the Southern Hemisphere that still exist
today.
Jeffrey Holcomb was the production editor for Effective awk Programming, Third
Edition. Claire Cloutier was the production manager. Mary Brady was the
copyeditor, and Maureen Dempsey was the proofreader. Rachel Wheeler, Matt
Hutchinson, and Claire Cloutier provided quality control. Kimo Carter and Matt
Hutchinson provided production support. Arnold Robbins and Nancy Crumpton
wrote the index.
Hanna Dyer designed the cover of this book, based on a series design by Edie
Freedman. The cover image is a 19th-century engraving from Century Illustrated
Monthly Magazine. Emma Colby produced the cover layout with QuarkXPress 4.1
using Adobe’s ITC Garamond font.
David Futato designed the interior layout based on a series design by Nancy Priest.
Using a version of makeinfo modified by Phillippe Martin to create DocBook and
enhanced by the author, the book was converted by the author from the Texinfo
source into DocBook XML. Arnold then post-processed the generated DocBook
with no less than six awk scripts (of course!), finally tuning the DocBook source
files by hand. The print version of this book was created by translating the
DocBook XML markup of its source files into a set of groff macros using a filter
developed at O’Reilly & Associates by Norman Walsh. Steve Talbott designed and
wrote the underlying macro set on the basis of the GNU troff -mgs macros; Lenny
Muellner adapted them to XML and implemented the book design. The GNU groff
text formatter Version 1.11.1 was used to generate PostScript output. The text and
heading fonts are ITC Garamond Light and Garamond Book; the code font is
Constant Willison. The illustrations that appear in the book were produced by
Robert Romano and Jessamyn Read using Macromedia FreeHand 9 and Adobe
Photoshop 6. This colophon was written by Sarah Jane Shangraw.
Whenever possible, our books use a durable and flexible lay-flat binding. If the
page count exceeds this binding’s limit, perfect binding is used.
To Miriam, for making me complete.
To Chana, for the joy you bring us.
To Rivka, for the exponential increase.
To Nachum, for the added dimension.
To Malka, for the new beginning.
9 October 2001 01:44
Table of Contents
Foreword .............................................................................................................. xiii
Preface .................................................................................................................... xv
I. The awk Language and gawk ........................................... 1
1. Getting Started with awk ......................................................................... 3
How to Run awk Programs ................................................................................ 4
Datafiles for the Examples ............................................................................... 10
Some Simple Examples .................................................................................... 11
An Example with Two Rules ........................................................................... 13
A More Complex Example ............................................................................... 14
awk Statements Versus Lines ........................................................................... 15
Other Features of awk ..................................................................................... 17
When to Use awk ............................................................................................. 17
2. Regular Expressions ................................................................................. 19
How to Use Regular Expressions .................................................................... 19
Escape Sequences ............................................................................................ 21
Regular Expression Operators ......................................................................... 23
Using Character Lists ........................................................................................ 26
gawk-Specific Regexp Operators ..................................................................... 28
Case Sensitivity in Matching ............................................................................ 29
How Much Text Matches? ................................................................................ 31
Using Dynamic Regexps .................................................................................. 31