Eric Radman : a Journal

AWK Programming

awk is a processing language with a man page that can be printed on eight sheets of paper. Most importantly, it is a standard part of any Unix-like environment, so it can be used in contexts where the features of a target installation are unknown.

An awk program has three kinds of sections, all of which are optional

BEGIN { ... }
/patternN/ { ... }
END { ... }
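
As a minimal sketch of how the three sections fit together, the following sums the second column of every line that matches a pattern. The function name sum_disk and the input format are made up for illustration.

```shell
# sum_disk is a hypothetical helper; the "name value" input format is assumed.
sum_disk() {
  awk '
    BEGIN  { total = 0 }               # runs once, before any input is read
    /disk/ { total += $2 }             # runs for each line that matches
    END    { print "total:", total }   # runs once, after all input is read
  '
}

printf 'disk 10\ncpu 99\ndisk 5\n' | sum_disk   # prints "total: 15"
```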

Regex Limitations

The most severe limitation of awk is that it does not support non-greedy pattern matching, which means that a regular expression always takes the longest match and cannot be limited to the first one. There is no easy solution, but a workaround can often be built using a loop and index().
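
One way such a workaround might look: the sketch below prints the text between each <b> and the nearest following </b> using index() and substr() in a loop, where the regex <b>.*</b> would run on to the last closing tag on the line. The tag strings and the function name shortest_matches are assumptions chosen for the example.

```shell
# shortest_matches is a hypothetical helper; the <b> tags are example delimiters.
shortest_matches() {
  awk '
    {
      otag = "<b>"; ctag = "</b>"
      rest = $0
      # Find each opening tag, then the closest closing tag after it.
      while ((s = index(rest, otag)) > 0) {
        rest = substr(rest, s + length(otag))
        e = index(rest, ctag)
        if (e == 0) break                    # unterminated tag: stop
        print substr(rest, 1, e - 1)         # text up to the nearest close
        rest = substr(rest, e + length(ctag))
      }
    }
  '
}

printf '%s\n' 'a <b>one</b> b <b>two</b> c' | shortest_matches
# prints "one" and "two" on separate lines
```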

Reference the Current Script

A shell script has the name of the current script in $0, but this information is not available in awk

ARGV[0] = awk
ARGV[1] = first argument

This is important because a script can only call itself recursively if its path can be found. Checking for an environment variable and falling back to the current working directory sometimes works

sd = ENVIRON["SD"]
if (!sd) sd = "."
system(sd "/my.awk x=y")

Make Targets

make(1) recognizes some common extensions such as .sh and .c by default. .awk files can also be processed by adding the extension to .SUFFIXES and defining a suffix rule for it.

.SUFFIXES: .awk
.awk:
    sed -e 's/$${release}/${RELEASE}/' $< > $@
    @chmod +x $@

Argument Parsing

Command line arguments can be processed in order

for (i=1; i<ARGC; i++) { printf " %s", ARGV[i] }
printf "\n"

All arguments are assumed to be input files to process. To shift an argument out of the list, assign the empty string to it so that awk skips it

if (ARGV[1] == "-q") {
  ARGV[1] = ""
  quiet = 1
}
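
Put together, the flag handling above might be used as follows. This is a sketch only: count_lines is a hypothetical wrapper, and the output format is invented for the example.

```shell
# count_lines is a hypothetical wrapper around the -q handling shown above.
count_lines() {
  awk '
    BEGIN {
      if (ARGV[1] == "-q") {
        ARGV[1] = ""   # an empty entry is skipped, so -q is not opened as a file
        quiet = 1
      }
    }
    END { if (quiet) print NR; else print "lines:", NR }
  ' "$@"
}

printf 'a\nb\n' | count_lines      # prints "lines: 2"
printf 'a\nb\n' | count_lines -q   # prints "2"
```

With every remaining ARGV entry empty, awk falls back to reading standard input, which is why the piped input is still counted.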

Literals

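Perhaps the simplest way to escape single quotes is to use the octal or hex ASCII code. The octal form is the portable one; hex escapes are an extension that not every implementation supports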

print "\047"
print "\x27"
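
The escape is most useful when the awk program itself is wrapped in single quotes on a shell command line. A small sketch, where quote_word is a hypothetical helper name:

```shell
# quote_word is a hypothetical helper that wraps its argument in single quotes.
quote_word() {
  awk -v w="$1" 'BEGIN { print "\047" w "\047" }'
}

quote_word hello   # prints 'hello' including the quotes
```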

Special Characters

When performing string substitution with sub() or gsub(), an & in the replacement string is substituted with the fragment of text matching the regex. To insert a literal &, escape it with a backslash. This example escapes every & in a user-supplied variable

awk -v template="$1" 'BEGIN { gsub("&", "\\\\&", template) } { ... }'
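
To see the effect end to end, the sketch below substitutes a user-supplied string for a placeholder. The placeholder NAME and the function fill_template are assumptions made for the example. Without the gsub() call in BEGIN, the & would be replaced by the matched text and the output would read "Hello Tom NAME Jerry".

```shell
# fill_template is a hypothetical helper; NAME is an example placeholder.
fill_template() {
  awk -v repl="$1" '
    BEGIN { gsub("&", "\\\\&", repl) }   # turn each & into \& so it is taken literally
    { sub(/NAME/, repl); print }
  '
}

printf 'Hello NAME\n' | fill_template 'Tom & Jerry'   # prints "Hello Tom & Jerry"
```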

Collecting Statistics

An associative array is handy for accumulating statistics. Variables that are not initialized are assumed to start at zero

/aliases/ { hosts[$2]++ }
END {
  for (h in hosts) print hosts[h], h
}

In standard awk there is no easy way to order items; if this is needed, pipe the results to sort(1).
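
A minimal sketch of that pipeline, ordering by count (highest first) and then by name. The function name count_hosts and the sample input are invented for illustration.

```shell
# count_hosts is a hypothetical wrapper around the counting program above.
count_hosts() {
  awk '
    /aliases/ { hosts[$2]++ }
    END { for (h in hosts) print hosts[h], h }
  ' | sort -k1,1nr -k2,2   # numeric count descending, then host name
}

printf 'aliases alpha\naliases beta\naliases alpha\nother gamma\n' | count_hosts
# prints "2 alpha" then "1 beta"
```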

Safe Mode

From the OpenBSD man page

Disable file output (print >, print >>), process creation (cmd | getline, print |, system), and access to the environment (ENVIRON; see the section on variables below).

Implementation  Option
BSD/Mac         -safe
GNU Awk         -S, --sandbox
Busybox         not available

Filter Daemon

Awk can be run as a persistent process that filters input from a pipe! Examples: