Eric Radman : a Journal

AWK Programming

awk is a text-processing language with a man page that can be printed on eight sheets of paper. Most importantly, it is a standard part of any Unix-like environment, so it can be used in contexts where the features of a target installation are unknown.

An awk program has three kinds of sections, all of which are optional:

BEGIN { ... }
/patternN/ { ... }
END { ... }
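
As a minimal sketch, the following wraps a program with all three sections in a shell function. The function name sum_alpha and the sample input are made up for illustration.

```shell
# sum_alpha is a hypothetical helper: it totals field 2 of lines matching /alpha/.
sum_alpha() {
    awk '
    BEGIN   { print "start" }      # runs once, before any input is read
    /alpha/ { sum += $2 }          # runs for each record matching the pattern
    END     { print "sum", sum }   # runs once, after the last record
    '
}

printf 'alpha 1\nbeta 2\nalpha 3\n' | sum_alpha
```

Only the two alpha lines reach the accumulator, so the END block reports their total.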

Reference the Current Script

A shell script can read its own name from $0, but awk has no equivalent:

ARGV[0] = awk
ARGV[1] = first argument

This matters because a script can only call itself recursively if it can find its own path. One workaround is to check an environment variable and fall back to the current working directory:

sd = ENVIRON["SD"]
if (!sd) sd = "."
system(sd "/my.awk x=y")
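
A self-contained sketch of this lookup might look like the following. It writes a throwaway my.awk to a temporary directory, points SD at it, and lets the script call itself once. The helper name recurse_demo and the depth argument are assumptions for the demo, and the recursive call goes through "awk -f" rather than executing the file directly, so no shebang line or execute bit is needed.

```shell
# recurse_demo is a hypothetical helper; my.awk and SD follow the snippet above.
recurse_demo() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/my.awk" <<'EOF'
BEGIN {
    sd = ENVIRON["SD"]
    if (!sd) sd = "."
    depth = ARGV[1] + 0
    print "depth", depth
    # Call the same script once more, located through sd.
    if (depth < 1) system("awk -f " sd "/my.awk " (depth + 1))
}
EOF
    SD="$dir" awk -f "$dir/my.awk" 0
    rm -rf "$dir"
}

recurse_demo
```

Because SD is exported to the child process, the second invocation resolves the same path without any extra arguments.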

Make Targets

make(1) recognizes some common extensions such as .sh and .c by default. .awk files can be processed the same way by adding the extension to .SUFFIXES and defining a suffix rule:

.SUFFIXES: .awk
.awk:
    sed -e 's/$${release}/${RELEASE}/' $< > $@
    @chmod +x $@

Argument Parsing

Command-line arguments can be processed in order:

for (i = 1; i < ARGC; i++) printf " %s", ARGV[i]
printf "\n"

Every argument is assumed to be an input file to process. To consume an option, assign the empty string to its ARGV element so that awk skips it:

if (ARGV[1] == "-q") {
  ARGV[1] = ""
  quiet = 1
}
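
Put together, a complete filter with an optional flag could be sketched as below. The helper name count_lines and its output format are invented for the example; the -q handling is the same as above and has to run in a BEGIN block, before awk starts opening files.

```shell
# count_lines is a hypothetical helper: it counts input lines and accepts an optional -q flag.
count_lines() {
    awk '
    BEGIN {
        if (ARGV[1] == "-q") {
            ARGV[1] = ""    # an empty element is skipped, not opened as a file
            quiet = 1
        }
    }
    { n++ }
    END {
        if (quiet) print n + 0
        else print "lines:", n + 0
    }
    ' "$@"
}

printf 'a\nb\n' | count_lines -q
```

With -q blanked out no file operands remain, so awk falls back to reading standard input.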

Literals

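Perhaps the simplest way to write a single quote inside a shell-quoted program is to use its octal or hex ASCII code. The octal form is the portable one; hex escapes are an extension that not every awk supports.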

print "\047"
print "\x27"
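
For instance, a one-line program that wraps each input line in single quotes can stay inside a single-quoted shell argument. The helper name quote_lines is made up for this sketch.

```shell
# quote_lines is a hypothetical helper: it wraps every input line in single quotes.
quote_lines() {
    awk '{ print "\047" $0 "\047" }'   # \047 is the octal code for a single quote
}

printf 'hello\n' | quote_lines
```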

Special Characters

When performing string substitution with sub() or gsub(), an & in the replacement text is replaced with the fragment of text matching the regex. To insert a literal &, escape it with a backslash. This example escapes every & in a user-supplied variable:

awk -v template="$1" 'BEGIN { gsub("&", "\\\\&", template) } { ... }'
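
A fuller sketch shows the escaped variable being used as a replacement. The helper name fill_template and the @NAME@ placeholder are assumptions made for illustration; only the BEGIN block comes from the example above.

```shell
# fill_template is a hypothetical helper: it replaces @NAME@ on each input line
# with the text given as $1, which may contain a literal ampersand.
fill_template() {
    awk -v template="$1" '
    BEGIN { gsub("&", "\\\\&", template) }    # turn each & into \&
    { gsub("@NAME@", template); print }       # \& now yields a literal &
    '
}

printf 'Hello @NAME@\n' | fill_template 'Tom & Jerry'
```

Without the BEGIN block, the & in the replacement would expand to the matched placeholder and the output would still contain @NAME@.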

Translating Newlines

Sometimes it is useful to convert newlines into a literal \n to format a JSON value. One way to accomplish this is to set the output record separator to that string:

awk 1 ORS='\\n'

The first argument is an always-true condition, and print is implied.
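
A quick way to see the effect is to wrap the command in a function and feed it two lines. The name join_lines is made up for this sketch.

```shell
# join_lines is a hypothetical helper: it replaces each newline with a literal backslash-n.
join_lines() {
    awk 1 ORS='\\n'
}

printf 'a\nb\n' | join_lines
```

The output is a single line containing a\nb\n with no real newline at the end. Other characters that JSON requires to be escaped, such as double quotes, are left untouched.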

Collecting Statistics

An associative array is handy for accumulating statistics. Uninitialized variables and array elements are treated as zero when used as numbers:

/aliases/ { hosts[$2]++ }
END {
  for (h in hosts) print hosts[h], h
}
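
Run against some sample input, the counter behaves as sketched below. The helper name count_hosts and the input lines are invented for the example. The order in which "for (h in hosts)" visits keys is unspecified, so the output is piped through sort(1) to make it stable.

```shell
# count_hosts is a hypothetical helper: it counts how often each value appears in field 2
# of lines matching /aliases/.
count_hosts() {
    awk '
    /aliases/ { hosts[$2]++ }
    END {
        for (h in hosts) print hosts[h], h
    }
    ' | sort -k2
}

printf 'aliases web1\naliases db1\naliases web1\nother x\n' | count_hosts
```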

Safe Mode

From the OpenBSD man page:

Disable file output (print >, print >>), process creation (cmd | getline, print |, system), and access to the environment (ENVIRON; see the section on variables below).

Implementation  Option
BSD/Mac         -safe
GNU Awk         -S, --sandbox
Busybox         not available

Filter Daemon

Awk can run as a persistent process that filters input arriving on a pipe.
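
As a rough sketch of the idea, the filter below stays alive for as long as its input pipe is open and flushes after every match, so downstream readers see each line immediately. The helper name filter_errors, the /error/ pattern, and the ALERT prefix are assumptions made for the example. fflush() is widely supported, but check the local man page.

```shell
# filter_errors is a hypothetical helper: it passes through only lines containing "error".
filter_errors() {
    awk '/error/ { print "ALERT:", $0; fflush() }'   # flush so output is not held in a buffer
}

printf 'ok\nerror: disk\n' | filter_errors
```

In a long-running setup the printf would be replaced by a process that keeps writing, for example the output of tail -f on a log file.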

Limitations

Standard awk is missing some important features:

[1] gawk has asort() and asorti()