AWK in Java with JBang!

Mar 1, 2022 · 815 words · 4 minute read

A few days ago I learned about pz, a Python library that exposes a few simple one-letter shorthands for line-based editing of pipes at the command line. I immediately thought there could be potential.

This is simple and clever. The default `s` variable holds the contents of stdin. @jbangdev idea? 🤔 https://t.co/pBBcIfEIJb
— Edoardo Vacchi (@evacchi) February 12, 2022

I liked the idea so I pestered Max Andersen: what if JBang supported that kind of shorthand syntax? It turned out Max was already working on something:

This might be possible sooner than I thought it would be. pic.twitter.com/nO4CHgQenV
— Max Rydahl Andersen (@maxandersen) February 3, 2022

This has been available since JBang 0.90.0 (but you should use v0.90.1).

I was pretty sure that this new feature could be used to implement AWK-like scripting using Java:

This+data frame lib = jawk !
— Edoardo Vacchi (@evacchi) February 18, 2022

Over the following days I played around with a short Prelude to enhance these kinds of one-liners on JBang. But today I came across a blog post about Prig on Hacker News. Prig is «like AWK, but uses Go for “scripting”».

Well, that did it. I had to show I could do the same with JBang. I give you: Prelude.jsh

It is a very short collection of utilities. The main idea is that the class Line may be used to split the fields of a stdin line into an AWK-like “record”. I didn’t call it record so as not to confuse it with Java 16+’s record feature.

class Line {
    private final String line;
    private final String pattern;
    private final String[] fields;
    public final int nf;

    Line(String line, String pattern) {
        this.line = line;
        this.pattern = pattern; // a final field must be assigned in the constructor
        this.fields = line.split(pattern);
        // an empty line splits into one empty string: report zero fields instead
        this.nf = line.isEmpty() ? 0 : fields.length;
    }
    Line(String line) { this(line, "\\s+"); }

    // s(0) is the whole line, s(1)..s(nf) are the individual fields
    public String s(int n) { return (n == 0) ? line : fields[n-1]; }
    public int i(int n) { return Integer.parseInt(s(n)); }
    public double d(int n) { return Double.parseDouble(s(n)); }
    public String toString() { return line; }
}

All it does is split a line into whitespace-separated fields. You can access a field with Line#s. At index 0 you’ll find the entire line; at 1..n you’ll find the first..n-th field.

Line#d and Line#i are just shorthands to convert the n-th field to a double or an integer.

Line#nf gives you the number of fields, just like AWK’s NF.
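To make the indexing concrete, here is a small standalone sketch you can run without JBang. FieldDemo and its field helper are made-up names: a stripped-down stand-in for Line#s, not the Prelude itself.

```java
// Hypothetical stand-in for Line#s: index 0 is the whole line, 1..n are the fields.
class FieldDemo {
    static String field(String line, int n) {
        String[] fields = line.split("\\s+");
        return n == 0 ? line : fields[n - 1];
    }

    public static void main(String[] args) {
        String line = "GET /robots.txt HTTP/1.1";
        System.out.println(field(line, 0)); // GET /robots.txt HTTP/1.1
        System.out.println(field(line, 2)); // /robots.txt
        // What Line#i does: parse the n-th field as an int
        System.out.println(Integer.parseInt(field("a b 400", 3)) + 1); // 401
    }
}
```

As in AWK, the indices are 1-based, so asking for a field past the last one throws an ArrayIndexOutOfBoundsException rather than returning an empty string.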

There you go. Now suppose you want to print the second field of each line in logs.txt:

$ cat logs.txt
GET /robots.txt HTTP/1.1
HEAD /README.md HTTP/1.1
GET /wp-admin/ HTTP/1.0

You would write:

$ cat logs.txt | jbang -s Prelude.jsh -c \
    'lines().map(Line::new).map(l -> l.s(2)).forEach(s -> println(s))'

of course, you’ll need to first download Prelude.jsh:

$ curl -L https://bit.ly/prelude-jsh -o Prelude.jsh

oh, by the way, since JBang is awesome, you can also write:

$ cat logs.txt | jbang -s https://bit.ly/prelude-jsh -c \
    'lines().map(Line::new).map(l -> l.s(2)).forEach(s -> println(s))'

🚨 Update: JBang v0.91.0 has become even awesomer: you can now skip the download and use the catalog I posted here:

$ cat logs.txt | jbang -s prelude@evacchi -c \
    'lines().map(Line::new).map(l -> l.s(2)).forEach(s -> println(s))'

Now, because creating a Line object, then mapping it and then printing each result is so frequent, I also defined a few shorthands for you:

Stream<Line> $lines() { return lines().map(Line::new); }
void $$(Function<Line, Object> f) { $lines().map(f).forEach(o -> println(o)); }

There you go, now you can write:

$ cat logs.txt | jbang -s Prelude.jsh -c '$$(l -> l.s(2))'
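If you are wondering what the shorthand boils down to, here is a rough plain-Java equivalent. It assumes that JBang's lines() is a Stream<String> over stdin and that println is System.out.println; that is my reading of the examples, not something checked against JBang's source. ShorthandSketch, stdinLines and mapLines are made-up names.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ShorthandSketch {
    // Assumed equivalent of JBang's lines(): a stream over stdin.
    static Stream<String> stdinLines() {
        return new BufferedReader(new InputStreamReader(System.in)).lines();
    }

    // Like $$, but returns the mapped values instead of printing them.
    // Each line is passed to f as its whitespace-separated fields (0-based here).
    static List<Object> mapLines(Stream<String> lines, Function<String[], Object> f) {
        return lines.map(l -> l.split("\\s+")).map(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample input; swap in stdinLines() to read from a pipe.
        Stream<String> input = Stream.of("GET /robots.txt HTTP/1.1", "HEAD /README.md HTTP/1.1");
        mapLines(input, fields -> fields[1]).forEach(System.out::println);
    }
}
```

Compare the amount of ceremony with the one-liner: that difference is the whole point of the Prelude.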

and of course, now we can implement the example found in the Prig blog post

$ cat logs.txt | jbang -s Prelude.jsh -c '$$(l -> "https://example.com" + l.s(2))'

But let’s see how we may implement the other examples as well.

The average of the third column in average.txt:

$ cat average.txt
a b 400
c d 200
e f 200
g h 200

would be:

$ cat average.txt | jbang -s Prelude.jsh -c \
    '$lines().mapToInt(l -> l.i(l.nf)).average().ifPresent(d -> println(d))'
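As a sanity check on the arithmetic, the same pipeline can be sketched in plain Java with the sample data inlined. AverageSketch and averageOfLastField are hypothetical names, and the trim() call is my own addition to tolerate leading whitespace.

```java
import java.util.OptionalDouble;
import java.util.stream.Stream;

class AverageSketch {
    // Parse the last whitespace-separated field of each line as an int and average them.
    static OptionalDouble averageOfLastField(Stream<String> lines) {
        return lines
            .map(l -> l.trim().split("\\s+"))
            .mapToInt(f -> Integer.parseInt(f[f.length - 1]))
            .average();
    }

    public static void main(String[] args) {
        OptionalDouble avg = averageOfLastField(
            Stream.of("a b 400", "c d 200", "e f 200", "g h 200"));
        // (400 + 200 + 200 + 200) / 4 = 250.0
        avg.ifPresent(System.out::println);
    }
}
```

average() returns an OptionalDouble because an empty input has no average, which is why the one-liner ends with ifPresent.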

Format the third column of millis.txt as milliseconds:

$ cat millis.txt
1 GET 3.14159
2 HEAD 4.0
3 GET 1.0

is just:

$ cat millis.txt | jbang -s Prelude.jsh -c \
    '$lines().filter(l -> l.s(0).matches(".*(GET|HEAD).*"))
        .forEach(l -> printf("%.0fms\n", l.d(3)*1000))'

This is only slightly more cumbersome because String#matches must match the entire string, hence the leading .*( and the trailing ).* in the pattern. You can easily add a shorthand to Line that decorates the pattern and avoids the noise.

e.g.:

boolean matches(int n, String pattern) { return s(n).matches(".*(" + pattern + ").*"); }
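An alternative to decorating the pattern is Matcher#find, which looks for the pattern anywhere in the string. The sketch below compares the two, and also shows why the parentheses matter when the pattern contains an alternation. MatchVsFind and contains are made-up names.

```java
import java.util.regex.Pattern;

class MatchVsFind {
    // True if regex matches anywhere in s, with no ".*" decoration needed.
    static boolean contains(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String line = "2 HEAD 4.0";
        System.out.println(line.matches("GET|HEAD"));       // false: the whole string must match
        System.out.println(line.matches(".*(GET|HEAD).*")); // true
        // Without the group, the alternation reads as ".*GET" OR "HEAD.*"
        System.out.println(line.matches(".*GET|HEAD.*"));   // false
        System.out.println(contains(line, "GET|HEAD"));     // true
    }
}
```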

Finally, counting word frequency in words.txt

$ cat words.txt 
The foo barfs
foo the the the

In fact, this does not even require the Prelude!

$ cat words.txt | jbang -c \
    'println(lines().flatMap(s -> Stream.of(s.split("\\s+")))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting())))'
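One caveat: groupingBy collects into a HashMap by default, so the order of the printed entries is unspecified. If you want stable, sorted output, a variation is to ask for a TreeMap. Here it is as a standalone sketch with the sample input inlined; WordCountSketch and count are hypothetical names, and the empty-word filter is my own addition.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordCountSketch {
    // Count occurrences of each whitespace-separated word, sorted by key.
    static Map<String, Long> count(Stream<String> lines) {
        return lines
            .flatMap(s -> Stream.of(s.split("\\s+")))
            .filter(w -> !w.isEmpty()) // skip the empty token produced by blank lines
            .collect(Collectors.groupingBy(
                Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Uppercase sorts before lowercase, and "The" and "the" are distinct keys
        System.out.println(count(Stream.of("The foo barfs", "foo the the the")));
        // {The=1, barfs=1, foo=2, the=3}
    }
}
```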

The JBang line-editing feature does not stop here. You have all of JBang’s power at your fingertips: you can declare dependencies, extend the prelude further… have fun!

Thanks to Ben Hoyt for nerd-sniping me!
