AWK Programming
awk is a processing language with a man page that can be printed on eight sheets of paper. Most importantly, it is a standard part of any Unix-like environment, so it can be used in contexts where the features of a target installation are unknown.
Program structure has three sections, all of which are optional:
BEGIN { ... }
/patternN/ { ... }
END { ... }
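As a minimal sketch of the three sections working together, the following counts the lines that match a pattern. The input lines and the /error/ pattern are made up for illustration.

```shell
out=$(printf 'ok\nerror: disk\nok\nerror: net\n' | awk '
BEGIN   { print "scanning" }      # runs once, before any input is read
/error/ { n++ }                   # runs for each line matching the pattern
END     { print n " matches" }    # runs once, after the last line
')
printf '%s\n' "$out"
```

Dropping the BEGIN or END rule leaves a valid program, since each section is optional.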
Reference the Current Script
A shell script has the name of the current script in $0, but this information is not available in awk:
ARGV[0] = awk
ARGV[1] = first argument
This is important because recursive calls to the same script are only possible if the path can be found. Checking for an environment variable, with the current working directory as a fallback, sometimes works:
sd = ENVIRON["SD"]
if (!sd) sd = "."
system(sd "/my.awk x=y")
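The fallback logic can be checked in isolation before wiring it into system(). This is only a sketch: SD is the variable name used above, while /opt/scripts is a hypothetical install path, and the program prints the resolved path instead of executing it.

```shell
# /opt/scripts is a made-up directory, used only to show the lookup.
prog='BEGIN { sd = ENVIRON["SD"]; if (!sd) sd = "."; print sd "/my.awk" }'
with_env=$(SD=/opt/scripts awk "$prog")      # SD set: use it
without_env=$(unset SD; awk "$prog")         # SD unset: fall back to "."
printf '%s\n%s\n' "$with_env" "$without_env"
```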
Make Targets
make(1) recognizes some common extensions such as .sh and .c by default. .awk files can also be processed by setting the .SUFFIXES rule.
.SUFFIXES: .awk
.awk:
	sed -e 's/$${release}/${RELEASE}/' $< > $@
	@chmod +x $@
Argument Parsing
Command line arguments can also be processed in order:
for (i=1; i<ARGC; i++) {
    printf " %s", ARGV[i]
}
printf "\n"
All arguments are assumed to be input files to process. To shift an argument, assign the empty string to it:
if (ARGV[1] == "-q") {
    ARGV[1] = ""
    quiet = 1
}
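Put together, the flag check belongs in a BEGIN rule so that it runs before awk tries to open ARGV[1] as a file. The sketch below assumes the -q flag from the example above; the input lines and the line-count report are made up for illustration.

```shell
prog='
BEGIN {
    if (ARGV[1] == "-q") {   # consume the flag so it is not opened as a file
        ARGV[1] = ""
        quiet = 1
    }
}
{ n++; if (!quiet) print }   # echo input unless -q was given
END { print n " lines" }
'
loud=$(printf 'one\ntwo\n' | awk "$prog")
silent=$(printf 'one\ntwo\n' | awk "$prog" -q)
printf '%s\n---\n%s\n' "$loud" "$silent"
```

Once every element of ARGV is empty, awk reads standard input, which is why the second invocation still sees both lines.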
Literals
Perhaps the simplest way to escape single quotes is to use the octal or hex ASCII code (the \x form is not specified by POSIX, so the octal form is the more portable of the two):
print "\047"
print "\x27"
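This matters most when the awk program is itself wrapped in single quotes on a shell command line. A minimal sketch using the octal form; the message text is made up:

```shell
# \047 stands in for each single quote, so the shell quoting is never broken.
out=$(awk 'BEGIN { print "it\047s quoted: \047awk\047" }')
printf '%s\n' "$out"
```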
Special Characters
When performing string substitution with sub() or gsub(), & is substituted with the fragment of text matching the regex. To work around this, escape any & in variables used as replacement text. This example escapes a user-supplied variable:
awk -v template="$1" 'BEGIN { gsub("&", "\\\\&", template) } { ... }'
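To see the difference the escaping makes, the sketch below substitutes the same value with and without it. The NAME placeholder and the sample value are made up for illustration.

```shell
template='Tom & Jerry'
escaped=$(echo 'Hello NAME' | awk -v template="$template" '
BEGIN { gsub("&", "\\\\&", template) }   # turn each & into \&
{ sub("NAME", template); print }
')
raw=$(echo 'Hello NAME' | awk -v template="$template" '
{ sub("NAME", template); print }         # & expands to the matched text
')
printf '%s\n%s\n' "$escaped" "$raw"
```

Without the escaping step, the & in the value is replaced by the matched text NAME.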
Translating Newlines
Sometimes it is useful to convert newlines into a literal \n to format a JSON value. One way to accomplish this is by setting the output record separator to a string:
awk 1 ORS='\\n'
The first argument is an always-true condition, and print is implied.
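A short run shows the effect. The two input lines are made up; note that the final record also ends with the two-character sequence rather than a real newline.

```shell
# ORS is given as a command-line assignment, where \\ collapses to a single backslash.
out=$(printf 'line one\nline two\n' | awk 1 ORS='\\n')
printf '%s\n' "$out"
```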
Collecting Statistics
An associative array is handy for accumulating statistics. Variables that are not initialized are assumed to start at zero:
/aliases/ { hosts[$2]++ }
END { for (h in hosts) print hosts[h], h }
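Run against a few made-up input lines, the counter behaves as follows. The traversal order of for (h in hosts) is unspecified, so this sketch pipes the result through sort(1) to get stable output.

```shell
out=$(printf 'aliases alpha\naliases beta\naliases alpha\nother gamma\n' | awk '
/aliases/ { hosts[$2]++ }                    # first increment starts from zero
END { for (h in hosts) print hosts[h], h }
' | sort)
printf '%s\n' "$out"
```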
Safe Mode
From the OpenBSD man page:
Disable file output (print >, print >>), process creation (cmd | getline, print |, system), and access to the environment (ENVIRON; see the section on variables below).
BSD/Mac | -safe |
GNU Awk | -S, --sandbox |
Busybox | not available |
Filter Daemon
Awk can be run as a persistent process that filters input from a pipe! Examples:
- add_list_headers.awk used by the scriptedconfiguration.org mailing list
- entr(1) status filters
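When a filter stays attached to a pipe for a long time, output buffering can delay what the reader sees. The sketch below is not taken from either project listed above. It assumes an awk that provides fflush() (gawk, mawk, BSD awk and Busybox do), and the "[filtered]" prefix is made up.

```shell
# Flush after every record so a downstream reader sees each line promptly.
out=$(printf 'first\nsecond\n' | awk '{ print "[filtered] " $0; fflush() }')
printf '%s\n' "$out"
```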
Limitations
Standard awk is missing some important features:
- No non-greedy pattern matching: regular expressions cannot be limited to the first match. There is no easy solution, but a workaround can be built using a loop and index().
- No function for sorting arrays. [1]
[1] gawk has asort() and asorti().
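One possible shape for the loop-and-index() workaround is sketched below. A greedy regex such as <b>.*</b> would span from the first <b> to the last </b>, whereas this plain string search stops at the nearest closing delimiter. The function name spans, its parameters and the sample input are all hypothetical.

```shell
out=$(echo '<b>one</b> and <b>two</b>' | awk '
# Return every span between ldelim and the nearest following rdelim,
# joined with commas. Plain string search, no regular expressions.
function spans(s, ldelim, rdelim,    i, j, out) {
    while ((i = index(s, ldelim)) > 0) {
        s = substr(s, i + length(ldelim))   # drop text up to and including ldelim
        j = index(s, rdelim)
        if (j == 0) break                   # unterminated span: stop
        if (out != "") out = out ","
        out = out substr(s, 1, j - 1)
        s = substr(s, j + length(rdelim))   # continue after rdelim
    }
    return out
}
{ print spans($0, "<b>", "</b>") }
')
printf '%s\n' "$out"
```

Because the delimiters are fixed strings, this only covers cases where the start and end markers are literal text.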