Brian Kernighan on Successful Language Design

Datetime: 2016-08-23 01:00:15    Topic: AWK

What makes language design successful? This is the question that Brian Kernighan, one of the contributors to the development of Unix and a co-creator of Awk, tries to answer in a talk at the University of Nottingham.

For his talk, Kernighan focuses on domain-specific languages, i.e. languages designed to solve a specific problem associated with a given application. They are usually “little” languages with a narrow domain of applicability, and they are not necessarily Turing-complete. What makes them interesting to Kernighan is that they offer a much more accessible playground for language design and language designers than complex languages such as C++. Examples of successful domain-specific languages include regular expressions, the shell, Awk, XML, HTML, SQL, and R.

The notation a language provides is a fundamental part of its design. Notation can make the difference between a language that lets developers solve a problem with almost no effort and one that makes the same problem hard (while perhaps being better suited to other kinds of problems). Moreover, Kernighan says, a language’s notation affects the way we think about a problem.

The first language Kernighan analyzes is Awk, which uses a pattern-action paradigm to specify what is to be done: a program is a list of patterns, each paired with an action to perform on the input lines that match it. This choice was motivated by Awk’s design goal of making 1-liners simple to write; although Awk can be “abused” to write really long programs, it is most effective for short ones. Another feature of Awk’s notation was its similarity to C, which Kernighan considers an important advantage because people do not have to learn something new.
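To show what the pattern-action model means in practice, here is a small Python emulation of it. This is a sketch, not Awk itself: `Rule` and `run_rules` are hypothetical names, fields are split on whitespace as in Awk's default, and every rule is tried against every input line in order.

```python
from typing import Callable, Iterable

# One rule = (pattern, action). The pattern is a predicate over the fields
# of a line; the action runs only when the pattern is true, as in Awk's
# `pattern { action }`. Rule and run_rules are illustrative names.
Rule = tuple[Callable[[list[str]], bool], Callable[[list[str]], None]]

def run_rules(lines: Iterable[str], rules: list[Rule]) -> None:
    for line in lines:
        fields = line.split()  # whitespace splitting, like Awk's $1, $2, ...
        for pattern, action in rules:  # every rule is tried on every line, in order
            if pattern(fields):
                action(fields)

# Equivalent of the Awk one-liner:  $3 > 0 { print $1, $2 * $3 }
output: list[str] = []
rules: list[Rule] = [
    (lambda f: float(f[2]) > 0,
     lambda f: output.append(f"{f[0]} {float(f[1]) * float(f[2]):g}")),
]
run_rules(["Beth 4.00 0", "Dan 3.75 0", "Kathy 4.00 10", "Mark 5.00 20"], rules)
print(output)  # prints ['Kathy 40', 'Mark 100']
```

The Python version needs a type alias, a loop, and two lambdas to say what the Awk one-liner says in a single line, which is the point Kernighan makes about notation.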

The focus on 1-liners led to specific features of the language, such as the absence of type declarations, automatic initialization, built-in variables, operators that work on both strings and numbers, support for associative arrays and regular expressions, control flow statements, and built-in functions.
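Two of these features, associative arrays with automatic initialization and operators that accept both strings and numbers, can be approximated in Python. This is a sketch rather than Awk semantics: `awk_num` and `word_counts` are hypothetical helpers, and the conversion only approximates Awk's rule of using the leading numeric prefix of a string (falling back to 0).

```python
import re
from collections import defaultdict

# Approximation of Awk's string-to-number conversion: take the leading
# numeric prefix, else 0, so "3 apples" + 0 is 3 and "abc" + 0 is 0.
_NUM_PREFIX = re.compile(r"\s*[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")

def awk_num(s: str) -> float:
    m = _NUM_PREFIX.match(s)
    return float(m.group(0)) if m else 0.0

def word_counts(lines):
    # Awk: { for (i = 1; i <= NF; i++) count[$i]++ }
    # defaultdict(int) stands in for automatic initialization: incrementing
    # a key that was never assigned works because it starts at 0.
    count = defaultdict(int)
    for line in lines:
        for word in line.split():
            count[word] += 1
    return dict(count)

print(word_counts(["the cat", "the hat"]))  # prints {'the': 2, 'cat': 1, 'hat': 1}
print(awk_num("3 apples") + awk_num("abc"))  # prints 3.0
```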

Awk also taught some significant lessons. First and foremost, people use tools in unexpected ways; moreover, mistakes are inevitable and hard to change. In particular, Kernighan considers it fundamental to avoid adding new features to a language, or even trying to fix or improve specific things, since this impairs the language’s stability. Furthermore, notation added as an afterthought generally makes the syntax inconsistent.

The second example Kernighan presents is AMPL, an algebraic modeling language for formulating and solving optimization problems in terms of a set of variables and constraints on their values. Kernighan was a co-designer of the language as well as its initial implementer. AMPL provides a declarative syntax for describing the problem’s variables and their constraints, a data specification language for describing the data, and a command language to control what AMPL does.
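The split between model, data, and commands can be illustrated with a toy Python sketch. It is not AMPL syntax and not how AMPL solves anything: AMPL hands models to real solvers, whereas this sketch simply enumerates small integer quantities. The products, profits, hours, and capacity are hypothetical numbers chosen for illustration.

```python
from itertools import product

# Model: states what the problem is, in terms of data it does not contain.
def objective(data, plan):
    return sum(data["profit"][p] * plan[p] for p in data["products"])

def constraints(data, plan):
    # Each entry is one constraint; a plan is feasible if all are True.
    return [sum(data["hours"][p] * plan[p] for p in data["products"]) <= data["capacity"]]

# Data: kept separate from the model, in the spirit of an AMPL data file.
data = {
    "products": ["chairs", "tables"],
    "profit": {"chairs": 25, "tables": 30},
    "hours": {"chairs": 1, "tables": 2},
    "capacity": 10,
}

# "Command": search small integer quantities exhaustively for the best feasible plan.
def solve(data, max_qty=10):
    best_plan, best_value = None, None
    for qtys in product(range(max_qty + 1), repeat=len(data["products"])):
        plan = dict(zip(data["products"], qtys))
        if all(constraints(data, plan)):
            value = objective(data, plan)
            if best_value is None or value > best_value:
                best_plan, best_value = plan, value
    return best_plan, best_value

print(solve(data))  # prints ({'chairs': 10, 'tables': 0}, 250)
```

Because the model never mentions concrete numbers, the same `objective` and `constraints` can be reused with a different data dictionary, which is the benefit of keeping model and data apart.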

According to Kernighan, a few features of AMPL made it moderately successful. In particular, thanks to its narrow focus on a specific kind of problem, universities taught it and industry quickly adopted it. Its notation was initially purely declarative. Later, more mechanisms were added, such as conditionals, loops, and functions, i.e. what typically makes a language Turing-complete. This also brought some odd and irregular syntax. Another peculiar aspect of AMPL is that it is closed source, which surely contributed to its success, though, Kernighan says, this is one of those things that could be argued about.

The next case study Kernighan considers is EQN, a mathematical typesetting language he designed with Lorinda Cherry in 1974. EQN, which served as an inspiration for TeX, was designed to let people write mathematical formulas the way they are spoken:

x sup 2 + y sup 2 = z sup 2
f(t) = 2 pi int sin(omega t) dt

According to Kernighan, this syntax allowed typists at Bell Labs who were not mathematicians to use EQN. One fundamental decision about EQN was to implement it as a preprocessor whose output is piped into troff, mainly because of the memory constraints of computers at the time. Relying on existing tools is, according to Kernighan, a good approach to implementing DSLs while keeping implementation complexity low.
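To make both ideas concrete, the spoken-style notation and the preprocessor that sits in a pipeline, here is a toy Python filter. It is an assumption-laden sketch, not EQN: it understands only `sup`, `sub`, a few Greek letters and function names, and it emits LaTeX instead of the troff commands the real EQN produces. Like EQN, it copies ordinary text through unchanged and rewrites only the lines between `.EQ` and `.EN`, so another stage could consume its output, much as troff consumes EQN's.

```python
import re

# Toy tokenizer: words, numbers, and single punctuation characters.
_TOKEN = re.compile(r"[A-Za-z]+|\d+|\S")
_GREEK = {"alpha", "beta", "theta", "pi", "omega"}
_NAMED = {"sin", "cos", "int", "sum"}

def eqn_to_latex(line: str) -> str:
    tokens = _TOKEN.findall(line)
    out: list[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("sup", "sub") and out and i + 1 < len(tokens):
            # Attach the next token as a superscript/subscript of the previous one.
            mark = "^" if tok == "sup" else "_"
            out[-1] += mark + "{" + tokens[i + 1] + "}"
            i += 2
        else:
            # Greek letters and named functions become LaTeX commands.
            out.append("\\" + tok if tok in _GREEK | _NAMED else tok)
            i += 1
    return " ".join(out)

def eqn_filter(lines):
    # Preprocessor-style stage: pass text through, translate only .EQ/.EN blocks.
    in_eq = False
    for line in lines:
        if line.strip() == ".EQ":
            in_eq = True
        elif line.strip() == ".EN":
            in_eq = False
        elif in_eq:
            line = eqn_to_latex(line)
        yield line

print(eqn_to_latex("x sup 2 + y sup 2 = z sup 2"))  # prints x^{2} + y^{2} = z^{2}
```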

Another typesetting language by Kernighan was Pic, a language for converting a textual description into a line drawing. Pic made systematic changes easy, guaranteed correct dimensions, and allowed loops and conditionals to build repetitive structures. Again, troff was the output stage, with the Pic preprocessor piping its output into it. Pic was implemented using YACC and LEX, which made it easy to experiment with syntax and to build a free-form, English-like syntax that did not rely much on explicit coordinates. This is how a simple flow chart would be specified in Pic:

arrow;
box "input";
arrow;
box "process";
arrow;
box "output";
arrow
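The flow chart above needs no coordinates because each object is placed where the previous one ended, moving right. A minimal Python sketch of that placement rule follows. It is not Pic's implementation, handles only `box` and `arrow`, and assumes widths of 0.75 inch for boxes and 0.5 inch for arrows (believed to match Pic's defaults, but treat them as assumptions).

```python
import re

# Assumed object widths in inches (boxwid 0.75, linewid 0.5).
WIDTH = {"arrow": 0.5, "box": 0.75}

def layout(program: str):
    """Place each object where the previous one ended, moving right."""
    x = 0.0
    placed = []
    for stmt in re.split(r"[;\n]", program):
        stmt = stmt.strip()
        if not stmt:
            continue
        kind = stmt.split()[0]
        label = re.search(r'"([^"]*)"', stmt)
        start, x = x, x + WIDTH[kind]  # the next object starts where this one ends
        placed.append((kind, label.group(1) if label else None, start, x))
    return placed

flow_chart = """
arrow;
box "input";
arrow;
box "process";
arrow;
box "output";
arrow
"""
print(layout(flow_chart)[1])  # prints ('box', 'input', 0.5, 1.25)
```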

Based on the examples above, Kernighan then suggests a few principles that explain why languages succeed:

  • They solve real problems in a clearly better way thanks, among other things, to their notation.
  • They are culturally compatible and look familiar. This applies, e.g., to many languages that have relied on C-like syntax to make it easy to get started.
  • They are compatible with the existing tool ecosystem and do not require buying into a completely new environment or doing away with standard tools.
  • They are open source – this is especially true nowadays, Kernighan says.
  • They have weak competition and good luck.

On the other hand, a few factors can make languages fail, although languages never disappear completely:

  • They live in a niche or domain that disappears, as happened to Kernighan’s troff-based languages.
  • They are too big, too complex, too slow, or too late. Two notable mentions here are Perl 6, which is arguably late, and C++, which is big and complex, although C++ is still used for many new developments.
  • They are based on poor philosophical choices, such as favouring ideology at the expense of functionality, or being too “mathematical” or too “different”, both of which reduce the number of potential users.

Kernighan closes with a quote by Alan Perlis, which could serve as inspiration and guidance for creating new languages that combine the best of what came before with a new approach that makes them better:

There will always be things we wish to say in our programs that in all known languages can only be said poorly.