Great awk

Gavin Wraith continues his series on awk. (If you missed the first part, there's a copy of the Mawk interpreter in the SOFTWARE directory on the CD-ROM.)

Records and Fields: Example 3

In the previous article I said that pattern-action statements applied to the text line by line and that the variable $n denoted the n-th word of the current line. That was a deliberate oversimplification. Adopting jargon from database tradition, we may say that every textfile can be thought of as a sequence of records, separated by a record-separator, and that each record can be thought of as a sequence of fields, separated by a field-separator. Awk has a number of built in variables, among them:

    RS    the input record-separator
    ORS   the output record-separator
    FS    the input field-separator
    OFS   the output field-separator

The default value for both RS and ORS is the string '\n' - in other words,
a newline character. This means that by default a
record is a line of text. The default value for FS and OFS is ' ',
a single blank space. When FS takes this special value, input fields
are separated by blank spaces and/or tabs, and leading blank spaces
and tabs are ignored. So, in effect, fields are words.
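
The default splitting described above can be tried out with a one-line program. This is only a sketch: it assumes a POSIX shell with an awk (mawk, for instance) on the command path, and the sample line and the variable name fields are invented for illustration.

```sh
# Assumes a POSIX shell with awk (for instance mawk) on the path.
# The sample line has leading blanks, a double blank and a tab in it.
fields=$(printf '   The  quick\tbrown fox\n' |
  awk '{ print NF; print $1; print $3 }')

# NF is the number of fields; $1 and $3 are the first and third words.
echo "$fields"
```

With the default FS the leading blanks are skipped and the tab counts as a separator, so awk reports four fields, with "The" as the first and "brown" as the third.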
The variable OFS determines how the comma symbol is interpreted
between the expressions following print.

If RS is set to an empty string '' then input records are separated by one or more blank lines, and field-separators are either newlines or values given by FS. So the following settings:

    BEGIN { RS = ""; FS = "\n" }

would interpret paragraphs as records and individual lines as fields, in a 'standard' text file where paragraph breaks are blank lines.

Incidentally, as the example above shows, the semicolon
can be used to put many statements on the
same line. Unlike C, the semicolon is a statement separator,
not a statement terminator, which is why we don't need one
after FS = "\n".

The Authors file contains:

    J D Salinger
    The Catcher in the Rye

    M Peake
    Gormenghast

    J Cowper Powys
    Wolf Solent

    Himself
    Augustus Carp

By applying them to the Authors file, can you explain the difference in behaviour between the following awk programs?
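
As a hedged illustration of the kind of contrast at issue, here are two one-line programs that differ only in a comma. They are stand-ins written for this sketch, not necessarily the programs the question refers to; the variable names with_commas and without_commas are invented, and a POSIX shell with an awk such as mawk on the path is assumed.

```sh
# Assumed stand-in for part of the Authors file: author, title, blank line.
authors='J D Salinger
The Catcher in the Rye

M Peake
Gormenghast'

# Hypothetical program 1: a comma between the expressions, so OFS is inserted.
with_commas=$(printf '%s\n' "$authors" |
  awk 'BEGIN { RS = ""; FS = "\n"; OFS = " was written by " } { print $2, $1 }')

# Hypothetical program 2: no comma, so the two fields are simply concatenated.
without_commas=$(printf '%s\n' "$authors" |
  awk 'BEGIN { RS = ""; FS = "\n"; OFS = " was written by " } { print $2 $1 }')

printf '%s\n---\n%s\n' "$with_commas" "$without_commas"
```

The first program prints lines such as "Gormenghast was written by M Peake"; the second runs title and author together as "GormenghastM Peake".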
Without those commas, OFS plays no role. To undo the effect of convert2 try the following awk program:

    # revert
    BEGIN { FS = " was written by " }
    { printf("%s\n%s\n\n", $2,$1) }

Command Line Arguments and Files

Awk was devised for Unix in the days before Desktop environments. Have a look at !mawk.Docs.manpage for the complete specification of the command line arguments for the executable file !mawk.bin.mawk. As awk programs are often so short, awk was written to accept 'throwaway' programs inside single quotes on the command line to the awk (in our case mawk) command. In RISC OS, where path names are limited to 256 characters, this is not so convenient. Instead one must use the -f <program pathname> option. If you shift-double-click on the Obey file !mawk.Apps.!RunAwk.!Run you will see in the last line how this is used when you drag a textfile onto the !RunAwk icon. The contents of this file are shown below:

    | AWK
    if "<Awk$Prog>" = "" then echo No awk program chosen
    if "<Awk$Prog>" = "" then obey
    if "%*0" = "" then obey
    ?leaf <Awk$Prog>
    do taskwindow "mawk -f <Awk$Prog> %*0" -name <leaf>=>%0 -quit

There is nothing to stop you using the mawk command with
its full variety of command line options in an Obey file.
Furthermore, you can use the -f option more than once, to load several program files:

    mawk -f <prog_1> ... -f <prog_m> <file_1> ... <file_n>

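
The following sketch shows two program files loaded with repeated -f options. It assumes a POSIX shell, the mktemp utility and an awk such as mawk on the path; the file names lib.awk, main.awk and data, and the function twice(), are all invented for illustration.

```sh
# Work in a scratch directory so nothing is left behind.
tmp=$(mktemp -d)

# Hypothetical first program file: contains a function definition only.
cat > "$tmp/lib.awk" <<'EOF'
function twice(x) { return 2 * x }
EOF

# Hypothetical second program file: uses twice() from lib.awk.
cat > "$tmp/main.awk" <<'EOF'
{ print twice($1) }
EOF

printf '1\n2\n3\n' > "$tmp/data"

# Both program files are read before any input is processed.
doubled=$(awk -f "$tmp/lib.awk" -f "$tmp/main.awk" "$tmp/data")
echo "$doubled"

rm -r "$tmp"
```

Splitting a program this way lets function definitions be kept in one file and shared between several short main programs.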
A word about the sequence in which things happen
needs to be said here. The program does not run until
all m program files have been read in. Then all the function
definitions found in them are compiled, the built in variable
ARGC is given the value n+1 (the command name counts as ARGV[0]),
and the built in array ARGV is initialized so that, for i from 1 to n, ARGV[i] has the value
<file_i>. Then all the BEGIN actions are executed, and only after that are the files named in ARGV read as input.

There is a useful trick for passing in values. The command line arguments do not have to be the pathnames of input files. For example, suppose you wanted <file_k> to specify an output file. Then you would include in one of the programs a line of the form

    BEGIN { out = ARGV[k] ; ARGV[k] = "" }

Setting the k-th element of ARGV to a null string will suppress input of the k-th command line argument. The variable out can then be used for output statements. This technique will be demonstrated in the next example.

Invoicing customers: Example 4

Suppose you are the milkman. You want to add up the amount due from customers and send out bills to them. The directory Invoices holds a collection of files (how many there are and what they are called is irrelevant), in each of which a record is kept of milk deliveries, each record having the form:

    <customer> <amount>

Double-click on the Obey file Bill to create a directory of bills to send out to customers. Shift-double-clicking on Bill will reveal its contents:

    | Bill customers
    dir <Obey$Dir>
    enumdir Invoices invlist
    cdir Letters
    mawk -f Total invlist Invoices Letters
    delete invlist

The fifth line makes the awk program Total act on the temporary file invlist, and passes the input and output directory names as ARGV[2] and ARGV[3]. Shift-double-clicking on Total will reveal:

    # Total up invoices
    BEGIN { invoicesdirectory = ARGV[2]
            outdir = ARGV[3]
            ARGV[2] = ARGV[3] = ""       # remove args
            sep = "." }                  # directory separator symbol

    NF { list[$1] = "" }                 # read what invoice files there are

    END { for (file in list)
            invoice(invoicesdirectory sep file, account)
          for (customer in account)
            if ((owing = account[customer]))   # no bill if zero
              letter(outdir sep customer,customer,owing) }

    function invoice(f,a) {
      while ((getline < f) > 0)
        a[$1] += $2                      # add each delivery to the customer's total
      close(f) }

    function letter(out,customer,amount) {
      print "Milk Bill for",sysvar("Sys$Date") > out
      print "Dear",customer > out
      print "You owe", amount, "pence." > out
      print "Thank you." > out
      close(out) }

One or two points need comment:
You can execute RISC OS commands from within an awk program
using the standard built in function system(), which has an
analogue in Basic.

This milk account example is, of course, just a skeleton. There are a great many aspects of the model that we have left out or simplified. Nevertheless, I hope you can see that by using Obey files and awk programs together you can achieve a great deal with very few lines of code.

Summary

In this article we have seen that awk parses text as a sequence of records, each made up of a sequence of fields. Combining awk programs with Obey files lets us construct more involved applications that use the full range of the command line, whereas the click and drag approach given in the first article restricts awk programs to the role of text filters. In later articles we will look more closely at patterns and consider how to use other applications to display output in tables or graphs.

Gavin Wraith (gavin@wraith.u-net.com)