
The Ascetic Programmer

How asceticism benefits programming, science, and the arts

Antonio Piccolboni


Cover illustration based on a random dungeon generator in obfuscated C, a program by Robert Nystrom [183]. Used by permission.

Many thanks to Marco Scanu and David Bylsma for providing feedback on an early draft, Tom Lindsey and my wife Maria for patient proofreading and to Maria for her unwavering support.


© 2023 Antonio Piccolboni CC BY-NC-ND

Introduction

Ascetic programming is the writing of computer programs under restrictions unrelated to resources such as memory or power. They range from using fewer lines of code to forgoing certain programming language features. My interest in ascetic programming started with challenges among coworkers, which we undertook to keep our skills sharp, our curiosity for programming alive, and, not least, for bragging rights.

We were not the first to think of imposing restrictions on programming other than those on resources. For instance, you may be familiar with Edsger W. Dijkstra’s condemnation of the goto statement (see Sec. 1.6). However, among possible restrictions, the one on program length has caught my long-term interest. It has connections to different facets of computing, from maintainability to testing to theory. These associations are mostly positive, meaning that shorter programs are better along several dimensions: for instance, they are more readable and maintainable and contain fewer errors.

Asceticism originated as a spiritual practice millennia ago. Ascetic restrictions traditionally target worldly pleasures and promise salvation or enlightenment in return. This book will focus mainly on code length restrictions and, more generally, description length restrictions, and our rewards will be plenty and varied, but, unfortunately, won’t include salvation.

Asceticism in programming comes in two flavors, roughly described by the slogans “doing more with less” and “less is more.” On the one hand, we seek to deliver the same functionality with fewer lines of code; on the other, we aim to identify essential features and discard the rest. I will refer to the former as conciseness and to the latter as frugality. We will discuss both as they apply to programming at different scales, from snippets of code to complete systems.

Beyond programming, I discovered unexpected examples of asceticism in statistics — where the preferred jargon is parsimony — philosophy, the visual arts, music, literature, and more. Outside science and engineering, the function of an artifact is less clear or undefined, and thus the boundary between conciseness and frugality becomes blurred. The equivalent of program length may or may not be available: for instance, it is, to a degree, for statistical models but less so for the visual arts. I hope you will find these more speculative forays interesting.

This book is a collection of cases or testimonials where brevity plays a positive role and where verbosity is the villain. The opposing view that verbosity is superior has also been allotted some space, as a counterpoint, without any pretense of equal treatment. Don’t expect a grand theory or a practical method, but rather a series of interconnected anecdotes and expert opinions, interspersed with the occasional theorem and research paper, some from giants in their respective fields and some from regular people working in the trenches. Let the case for asceticism build from the totality of the book rather than a single, decisive argument.

I hope that, by the end, you will be convinced that asceticism benefits engineering, science, the arts, and more. May this book inspire you to adopt unnecessary restrictions in your endeavors.

Who should read this book?

This book covers a wide variety of subjects, from computer programming to science to the arts. It doesn’t go particularly deep in any of them, but, reflecting my own background, it provides the most detail in the section about computing, less so in the one about science and only makes a cursory pass through the arts. I hope that anyone who has some familiarity with programming and mathematics will be able to enjoy most of the book. While there are plenty of internal connections, many of which I highlighted with cross-references, they are not essential for understanding. Therefore, skipping a subsection here or there won’t prevent you from enjoying the rest.

Reading advice

On devices that allow resizing the text view or the font, I recommend you choose settings such that the display shows a full line of code without wrapping — up to 88 characters. Where supported, long lines will scroll horizontally, and only as a last resort will they wrap. Since indentation is meaningful in computer programs, and this book often discusses line counts, wrapped lines would reduce clarity.

For the most part, footnotes contain definitions of concepts or entities that may be unfamiliar to some. If you don’t feel like you need to read a footnote to improve your understanding, it is probably safe to skip it. Other footnotes provide license information for material reproduced in this book when appropriate.

Code snippets are for illustrative purposes only, are stripped of import or equivalent statements to access libraries, and are not guaranteed to be correct even after restoring those statements.

1 Computing

Figure 1: The console of an IBM 650 automatic calculator [1],1 see Sec. 1.6.

1.1 Metrics

A first step towards espousing conciseness as a goal in software engineering is convincing ourselves that lines of code, or LOCs, are a valuable measure of program complexity. Certainly, 100,000 lines of well-structured code are simpler to understand or maintain than the same amount of so-called “spaghetti” code, which contains many interactions between its parts. However, one of the pioneering figures in computer science, Edsger W. Dijkstra, posed the problem of size as a fundamental one in software engineering [2]:

[…] widespread underestimation of the specific difficulties of size seems one of the major underlying causes of the current software failure.

Elsewhere, Dijkstra warned again about the dangers of accumulating large codebases [3]:

[…] if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

Dijkstra’s preferred solution, namely structured programming, is intended to make it easier to deal with large codebases rather than to reduce code size (see Sec. 1.2.4).

Microsoft co-founder Bill Gates, according to Internet folklore, said:

Measuring programming progress by lines of code is like measuring aircraft building progress by weight.

In more recent times, this sentiment is still present among developers. For example, free software advocate Chris Warburton writes on Hacker News2 [4]:

Software is a liability, not an asset. We should produce as little as needed, and delete it if possible.

In the classic software engineering book The Mythical Man-Month, Frederick P. Brooks Jr. states [5]:

Productivity seems constant in terms of elementary statements, a conclusion that is reasonable in terms of the thought a statement requires and the errors it may include.

Albeit expressed in statements rather than lines, as was customary at the time, this supports the idea that LOCs are a reasonable measure of complexity. This book follows the common practice of counting LOCs. Yet this count is partially dependent on the vagaries of code formatting (see Sec. 1.2.7). An alternative is counting tokens, that is, measuring the size of the representation of a program that a computer builds in order to run it. It is a formatting-independent measure, which we will adopt in one specific example (see Quicksort in Sec. 1.2.6).

We often run into invariants like the one referenced above, expressed in terms of LOCs. For instance, John Lions, the author of a well-known guide to the Unix operating system (see Sec. 1.3.6), states [6]:

It has often been suggested that 1000 lines of code represents [sic] the practical limit in size for a program which is to be understood and maintained by a single individual.

While the exact value in this statement may be debatable and its origin unclear, what is noteworthy is that it uses LOCs as its unit of measure. Code structure, programming language, or application domain appear to be secondary.

No other quantity has emerged as a viable substitute for LOCs. Herraiz and Hassan review some of the alternatives and come to the following conclusion [7]:

All the complexity metrics are highly correlated with lines of code, and therefore the more complex metrics provide no further information that could not be measured simply with lines of code.

Among the alternatives is cyclomatic complexity [8]. It focuses on the flow of control in a program by viewing it as a graph. A node represents a sequence of statements devoid of control statements, and if and while statements correspond to the subgraphs in Fig. 2.


Figure 2: Graphs corresponding to if and while statements resp.

With this graph representation, the cyclomatic complexity of a program is one more than the circuit rank3 of its associated graph, that is, E - N + 2P, where E is the number of edges, N is the number of nodes, and P is the number of connected components. If you suspected this is a long-winded way to count the number of control statements, you would be right, unless we consider multiway branching statements like switch in C. It is possible to double the size of a program without altering its control structure — say, interleaving logging statements with the original ones — and, therefore, without changing its cyclomatic complexity. However, experimental studies show that it correlates well with program size, and those that used it to predict the number of bugs4 while controlling for program size had inconsistent results. A possible conclusion is that cyclomatic complexity isn’t so valuable “since program size is not a controllable feature of commercial software” [8]. However, program size, controllable or not, predicts crucial properties of software.
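
As a concrete illustration of the “counting control statements” reading, here is a minimal sketch of my own (not taken from the studies cited above) that approximates the cyclomatic complexity of a piece of Python code as one plus the number of branching statements, using the standard ast module that reappears in Sec. 1.2.6:

import ast

def cyclomatic_complexity(source):
    # one more than the number of branching statements; ignores refinements
    # such as boolean operators or multiway branches
    branches = (ast.If, ast.For, ast.While)
    return 1 + sum(isinstance(node, branches) for node in ast.walk(ast.parse(source)))

Interleaving logging statements with the original ones would double the line count of the source while leaving this count unchanged, which is precisely the weakness noted above.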

One of them is the number of bugs. According to Capers Jones, a consultant on software methodologies, it is more than proportional to program size, falling in the range of 0-25 defects per KLOC (a thousand LOCs) for programs smaller than 2 KLOCs, but rising to 4-100 for those over 512 KLOCs ([9]; as quoted in [10]).
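
At those rates, a 1 KLOC utility would contain at most around 25 defects in total, whereas a 512 KLOC system at 100 defects per KLOC would contain on the order of 50,000: the defect count grows much faster than the codebase.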

However, even more important than the number of bugs is the probability of project failure, defined as substantially missing a project’s goals. For example, Stack Overflow and Discourse co-founder Jeff Atwood writes [11]:

Here’s the single most important decision you can make on your software project if you want it to be successful: keep it small. Small may not accomplish much, but the odds of outright failure — a disturbingly common outcome for most software projects — is [sic] low.

W. Wayt Gibbs, a science writer for Scientific American, explains [12]:

Software is exploding in size as society comes to rely on more powerful computer systems. That faith is often rewarded by disappointment as most large software projects overrun their schedules and many fail outright — usually after most of the development money has been spent.

1.2 Concise Software

1.2.1 Programming

I actually found a way to shorten his code by 1 word and it had only taken me 20 years to do it. […] it was the only time I’d managed to reduce some of Gosper’s code.5 It felt like a real victory. And it was a beautiful piece of code.

In this passage from an interview with Guy L. Steele Jr., inventor of the Lisp dialect Scheme, we read how the quest for conciseness transcended its practical origins and mutated into an aesthetic one [14]. Back in the day when Steele and Bill Gosper, also a founding figure for symbolic computations and Lisp, left their mark on computing, computer memory was so scarce that shortening a program even by one word could have been necessary. Twenty years later, the evolution of hardware and software had been such that there could no longer be practical motivations to shave off a word; only the intellectual ones remained.

Yukihiro Matsumoto, inventor of the programming language Ruby, agrees with the aesthetic qualities of concise code but also adds the language of ethics [15]:

Brevity is one element that helps make code beautiful. […] In the vocabulary of programming, brevity is a virtue.

Dan Ingalls, a key contributor to the implementation of the language Smalltalk, recalls that, as he learned the language FORTRAN, he “tried to see how much shorter I could make programs, and stuff like that,” suggesting that, for him, it was a learning or playful activity [14].

For an experimental study, Lutz Prechelt asked several mostly inexperienced programmers to solve a simple string problem in various languages, monitoring several properties of the solutions, including length. He concludes [16]:

Interpersonal variability, that is the capability and behavior differences between programmers using the same language, tends to account for more differences between programs than a change of the programming language.

Focusing on program length, this study finds that choice of language and programmer skill or style equally contribute to variation. Yet the point that matters for us is the wide variability in program length for the same task in the same language (see Sec. 1.2.9 for language-dependent conciseness). In this study, the programmers were allowed to code according to their preferred style. Would they have performed differently had they been asked to minimize length?

They may have, but it isn’t something they will often be asked to do in their careers. We have seen that code size was a priority in the pioneering age of computing, and it might still be when developing software for underpowered devices. However, I am not aware that code size currently is an externally imposed constraint on major platforms. Web development should be an exception, since program download is part of the load time of a page; yet that hasn’t prevented web pages from ballooning in size (see the “web obesity crisis,” Sec. 1.5). Not only is minimizing program length no longer a goal: at least in some situations, there has been pressure to increase it. In an interview, former Microsoft CEO Steve Ballmer recalls how IBM required billing by the KLOC and that there was no way to convince them that this system discouraged the search for more concise solutions [17]. Once a task was estimated to require, say, 20 KLOCs, that put a cap on how much money the developer could make, and it became a target to hit.

Free from external pressure, do programmers aim to keep their programs short? Some argue in favor of slowing down, which is related to program length since time is a limited resource. For example, Poul-Henning Kamp, the author of the web caching program Varnish, estimates that he wrote what he intended to be his “highest quality program — ever” at the breakneck speed of 5 LOCs per hour [18]. If that sounds slow, the equivalent estimate in the previously mentioned Mythical Man-Month is much slower: 10 LOCs per programmer-day.6 Kamp adds that for life-and-death software, where correctness is paramount, such as that developed for human space flight missions, the same estimate drops to one LOC per programmer-day. Surely such a coding speed must discourage code bloat (but see Sec. 1.5 for bloated space software).

It isn’t only rocket science. Matt Lacey, a mobile developer and consultant, in a post titled “You’ve only added two lines — why did that take two days!” lists plenty of reasons why, at times, development may slow down to such a crawl [19]. He reports on fixing a bug: first understanding it in detail, from which subsystems are affected to how to reproduce it. Then, unsatisfied with just making the issue disappear, Lacey endeavors to find its root cause, to determine whether the problem represents a class of related ones, and to fix it with a high-quality solution that won’t break his program elsewhere. Finally, he spends some more time preventing this or similar problems from happening in the future. Two days to perform all these tasks, even if the solution is only two lines, don’t seem like a waste of time.

Everyone in the software industry knows that slowing down is the opposite of how it works. There are entire methodologies aimed at creating arbitrary deadlines and fostering a fictional sense of emergency. Limits on size, on the other hand, are rarely, if ever, encountered. No boss ever told my teammates or me: “you could have done this with less code,” or “let’s try to keep it under 1000 LOCs.” However, I found a counter-example (see Dwm in Sec. 1.2.11 for an additional one). An anonymous Quora user asks if he should “fire someone who unnecessarily wrote 500 lines of code” without context [20]. We don’t know what these lines were supposed to achieve or if there was a pattern of verbosity in this engineer’s work. Rejecting the rush to summary judgment, another user replies:

As a programmer, I’ve written many, many lines of unnecessary code. Why? Because sometimes you have to write 500 unnecessary lines of code to get to the 50 absolutely necessary lines of code.

(for code deletion, see Sec. 1.2.2).

Going back to Kamp and his Varnish, he further writes:

Back when I was a kid I could write 1000 lines in a single sleep-deprived session across midnight, but it didn’t take that long before I discovered that I had to throw out most of it once I woke up again.
I was 40 years old when I started Varnish and I had 22 years of professional experience, a lot of them staring at, and often fixing/improving/refactoring, other peoples [sic] source code.

Patrick Burns, the author of Tao Te Programming, agrees that experience is a factor [21]:

Novices may think that the increase in the number of lines is good. More experienced programmers will know that the good parts are when the number decreases.

He goes one step further: instead of just accruing code slowly and carefully, Burns talks about reducing it (for code deletion, see the next section). As to bug fixes, Burns recommends making them as short as possible. He writes:

Change as little as possible. A one-character change is optimal.

One character is what would have saved the Mariner I probe from destruction (but it wasn’t in its control program, see Sec. 1.5). Burns brings this line of thought to its ultimate consequences and thinks “of programming as an exercise in compression.” While others use this language (see Steele’s “Huffman encoding problem” in Sec. 1.2.9 and Armstrong’s system compressor in Sec. 1.2.12), I believe the word “compression” has a more specific meaning in computing distinct from conciseness. Compressed data aren’t directly usable: they need to undergo a process of decompression that reconstitutes the original before being, for instance, displayed as images. Concise programs are run, read, and edited as they are, and there is no way to reconstruct a verbose version from which they may have been derived. A concise program is an independent entity, which can’t be said about compressed data, tightly bound to its uncompressed version. However, we could also take the other side of the argument: that decompressing a program is one of many transformations programs may undergo before being executed, such as macro expansion or translation into intermediate or machine language. The crucial difference between a compressed program and a concise one is that the latter is human-intelligible; it is the version developers read, modify and maintain. For compressed programs, that is the uncompressed version. A compressed program isn’t intelligible to developers, at least with known compression techniques.

It is unfair to complain that engineers write unnecessarily verbose code when they are not taught or rewarded for being concise. However, one company goes against the grain: Comma.ai, an AI startup, added conciseness to its requirements for aspiring employees [22]:

We value those who can do more with less code; software engineering doesn’t have an external selection pressure for low part count and cost like EE and ME, but you pay massive costs down the line in debugging and upkeep.

In an echo of “art for art’s sake,” James Hague, a video game developer, enthusiastically declares [23]:

Writing concise to-the-purpose solutions is a primary reason for programming in the twenty-first century.

If that is the future, it isn’t “evenly distributed yet” [24].

1.2.2 Deleting Code

If code isn’t written directly in its most concise form, a solution is to delete some of it. Of course, removing random lines won’t work: we need to identify useless ones, if there are any. “I love deleting code,” declares Nathan Marz, the author of the pioneering stream processing system Storm [25]. He isn’t alone. For instance, Chromium browser developer Devlin Cronin writes in a related bug-tracking thread [26]:

It [a proposed code change] has concrete benefits […] the ability to potentially delete a bit of UI code (something that always makes me happy!).

It may sound like an odd statement for someone whose craft is writing code. Artists have destroyed or mutilated their work in fits of disappointment [27] or as a performance piece [28] or have deliberately created ephemeral works [29]. However, it isn’t a widespread practice. In other branches of engineering, there is a pressure to reduce part count, and simplifying a design may be rewarding in and of itself. However, owing to a clear distinction between design and production, unnecessary parts come off a blueprint, not a finished product. Once removed, it is no longer necessary to make, procure or assemble them, and they don’t contribute to the cost or weight of an artifact anymore. In software engineering, once created, a line of code’s further costs are in the future and harder to estimate, in the form of debugging, learning, documentation, maintenance, and, as we have seen, even project failure. In my decades of experience with several employers and many more clients, we hardly ever tried to estimate those delayed costs. Therefore, the incentives to delete are weaker. However, good programmers know how much code they have to read and that someone has to maintain each line and so on. They understand the burdensome side of a line of code.

Know-how on code deletion is sparse: nobody ever taught me how to delete code. A post titled Advice on Deleting Code sparked my interest [30]. At 1600 words long, it doesn’t seem to convey much more than an appeal not to leave dead code7 behind, whether commented out8 or otherwise disabled or forgotten. That alone, though, is a good suggestion, as the following anecdote exemplifies.

I once worked on a codebase where a static analysis9 marked 1/3 of the code as unreachable. We supported this evidence with an analysis based on logging in production and for an extended time. Nonetheless, we couldn’t overcome resistance to deleting code that might still be useful to someone someday. This even after giving engineers veto power on removing any code for which they knew there was a use, and even after explaining that “deleting it” just meant “preserving it forever” in version control,10 from where it could be restored on demand. The unspoken truth was that few believed dead code to be harmful enough to warrant any risk of removing active code, thus creating bugs. The business side could not believe we were spending time on a non-customer-facing issue, and, in a way, they were right because nothing came of it.

Inexperience doesn’t help. Developer Neil Kakkar writes [31]:

I was very uncomfortable deleting shit or obsolete code. I considered what was written eons ago sacred.

Under the guidance of senior engineers, he reports having become more aggressive in deleting code but still recommends removing only dead code.

Some people even advocate the appointment of “code removal leaders.” For instance, in a Hacker News discussion about career prospects for senior developers, we read [32]:

I think the most valuable engineering leader would be someone who can remove code, or at least prevent unnecessary code from being written and maintained.

That reminded me of one of the best developers with whom I have ever worked. He had discovered another coworker was generating low-quality code in copious amounts, primarily by cut and paste. To address this, he spent the last half hour of each workday reviewing the other developer’s work, reverting all of his changes if they affected foundational modules. Luckily, the functionality to replace was minimal. This arrangement seemed to work well for everyone. What did this developer get from deleting all that code? His goal was to keep the amount of code he needed to understand and maintain in check. However, that isn’t the only valid motivation.11

Joe Armstrong, creator of the language Erlang, talked about “removing” as fundamental to how he went about his job [14]:

I would write a specific algorithm removing all things that were not necessary for this program. Whenever I got the program, it became shorter as it became more specific.

Reportedly, some can go beyond that. For example, Jon Bentley, inventor of the k-d tree data structure and author of the Programming Pearls series of books, reports that he “once heard a master programmer praised with the phrase, ‘He adds function by deleting code’ ” [15].

Others would like to fix bugs by deleting code. This tongue-in-cheek exchange between Michael B. Johnson, a software architect who contributed to many Pixar films, and Pete Barber, a software engineer, proposes this approach [33]:

In order to fix that bug I had to:
- have a good night’s sleep
- take a 2 hour hike
and then write one line of code.

deleting one line to fix it would have been even better

Fine. Ruin my little victory …

According to psychological studies, fixing by removing doesn’t readily occur to regular people, not just programmers (see Sec. 2.3). Only surgeons have made an art of it, owing to the redundancy in the human body and the scarcity of spare parts for it.

In another passage from [14], Armstrong goes beyond the practical:

If you start removing things, if you get to the point where if you were to remove anything more it would not work any more — at this point it is beautiful.

We will later read a similar statement generalized to all human artifacts and even natural forms (see Sec. 2.3). A crucial property that distinguishes an engineered object from a work of art is that the former serves a specific function: a program computes something or a building must let people in and out — unless it is a prison. In the arts, authors can push conciseness to nihilism, to the silent piece of music (see Sec. 3.3) or the empty canvas (see Sec. 3.2.3). Actually, these extremes can be reached by programmers as well, on occasion. This is the description for JavaScript module Noop2 [34]:

No operation as a module™. The long awaited successor to noop. Noop2 does even less than its predecessor, and is 100% tested to guarantee nothing happens.

Its author, Yoshua Wuyts, considers it “both a joke and actually useful,” reportedly to write tests.

1.2.3 Repetition

The most obvious opportunity for deletion is repetition. With physical artifacts, it is a form of order: the pillars of a bridge, its equally sized spans, the columns of a church, and so forth. Repetition brings about economies of scale, simplifies maintenance, and provides visual rhythm. On the contrary, it is hard to justify in computer programs since it is avoidable. There are many ways to eliminate repetition: for instance, loops, to execute the same code multiple times; and functions or methods, to name a snippet of code and recall it by name. These can optionally have parameters, like their mathematical brethren (see Sec. 1.2.4). Detecting repetition doesn’t require advanced skills or creativity: it is simple and mechanical. So much so that the code analysis service Code Climate includes an analysis tool to detect duplicated code [35]. Even if, according to their website, it “uses a fairly simple algorithm,” it detected a few almost identical duplications that had snuck into my own code. Unfortunately, their removal and the testing of the resulting code are still the user’s responsibility.

Indeed, repetitions need not be identical, but that is a fuzzy statement. Where does the quasi-repetition stop, and where does the “vaguely related code” start? In a simple case, we may find a simple parametrization that unifies all the repetitions under consideration. In a complex one, the corresponding abstraction may or may not be expressible in the language of choice, or it may require highly unsafe constructs whose use is discouraged (see Sec. 1.2.9.2). Another observation is that it is easy to generate duplications even with the best of intentions. Even if we aren’t maliciously wielding the cut-and-paste function in the editor to exceed our KLOC targets (see IBM in Sec. 1.2.1), a repetition could result from the convergence of two similar ideas. We wrote the first copy too long ago to remember, or it is different enough that recognizing it as duplicative requires careful analysis. Even when identification is easy, there may not be time to eliminate repetition because of the work required. If that entails writing a new function, coding standards may also require unit tests,12 which copy-and-paste code escapes. Therefore, the elimination of the repetition must wait. However, writing a ticket13 to mark some duplication we left unattended generally isn’t a career-boosting move, and those tickets never get taken care of regardless. The repetition lives to see another day and we move on to more glorious endeavors, like launching a new feature. Duplications accumulate in codebases like in genomes (see Sec. 2.4), slowly drifting apart over time.

Repetitions aren’t without drawbacks, though. Burns issues the following warnings [21]:

If you want to make a change to the repetitious code, then you have to change every occurrence. And before you can change them, you have to find them. If the change is a bug fix and you miss one, then you have just created a pernicious, hard-to-find bug. (“I already fixed that bug, it has to be somewhere else.”) If the change is a modification to the behavior and you miss one, then you have just created a pernicious, hard-to-find buglet. Buglets can grow into bugs.

(Burns doesn’t define a buglet, but, if I understand it correctly, it is a software defect that hasn’t yet manifested itself, a potential bug). That is why many programmers embrace the so-called DRY principle [36]. In the words of Matsumoto [15]:

Redundancy is defined as the duplication of information. In order to eliminate redundancy, we follow the DRY principle: Don’t Repeat Yourself. The concept of DRY is the antithesis of copy-and-paste coding.

Despite Matsumoto’s strong statement, there is nothing in Ruby to enforce this principle, nor in any other language I am aware of. I can only speculate why that is the case. The problem may be deciding what repetitions are worth removing, particularly how small and after how many repetitions. Is printing a new line worth a special construct? Is println() better than print("\n")? Or is x++ better than x = x + 1? Different languages give different answers to these questions. If you had to print the same message n times, for what value of n is the following solution

print(message)
print(message)
...
print(message) # n times

better than this alternative:

for i in range(n):
  print(message)

I know which one most people would choose for n = 1 or n = 10, but, for intermediate values, it is anyone’s guess.14 Without the lens of conciseness, these choices are seen as debatable and even just a matter of style.

Another obstacle to enforcing the DRY principle may be the issue of approximate repetition. Banning only identical copies would only represent a modest level of enforcement. Extending the proscription to approximate ones would face the problem of choosing which ones to remove. Moreover, if we replaced two code snippets with a function f(x), invoked with different values of x to replace each one, we would need to prove or at least corroborate the equivalence of those snippets to f. The equivalence problem is, in general, extremely difficult,15 but less so when the original snippets are similar.
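
As a minimal sketch of that unification step (the snippets, names, and thresholds below are hypothetical, chosen only to fix ideas), two near-duplicate blocks differing in a single constant collapse into one parametrized function:

# two near-duplicate snippets, differing only in a threshold
def large_orders(orders):
    return [o for o in orders if o["amount"] > 1000]

def large_refunds(refunds):
    return [r for r in refunds if r["amount"] > 500]

# a single parametrized replacement, the f(x) of the text
def large_records(records, threshold):
    return [r for r in records if r["amount"] > threshold]

Corroborating that large_records(orders, 1000) behaves like large_orders(orders) is easy here precisely because the snippets were so similar to begin with; that check is what becomes hard in the general case.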

However, the simplest explanation may be that code quality is considered outside the purview of compilers and interpreters and something that can only be enforced and improved by programmers. Nonetheless, there are separate tools, such as CPD, the aptly named Copy-Paste Detector, to detect duplications [37]. In my experience, their use could be more widespread in the industry.

Sometimes the obstacles preventing this process are in our skills and the complexities of the code: finding repetitions and unifying them. At other times, they are a question of priorities and goals. In an interesting anecdote, Dan Abramov recalls how he identified repetitions in an otherwise working piece of code [38]. After refactoring it, this is how he described the results:

The code is half the total size, and the duplication is gone completely! So clean. If we want to change the behavior for a particular direction or a shape, we could do it in a single place instead of updating methods all over the place.

The remarkable twist is that he had to revert his elegant solution under his boss’ orders and became convinced the messy code with the duplications was better, in what may be an engineering version of Stockholm syndrome. The reasons he brings range from the social (he didn’t ask permission) to concerns about over-engineering (the clean solution could not support some future hypothetical use cases; see YAGNI in Sec. 1.3.1). Adding and removing code aren’t treated as two equally important sides of programming: one is considered an ordinary, day-to-day activity; the other, an exceptional, dangerous one requiring broad consensus.

1.2.4 Abstraction

When talking about approximate repetitions, we saw that we could sometimes replace two blocks of code with one, possibly shorter than the sum of the original two, which performs the function of both. To achieve this, we need to unify their commonalities and reduce their differences to a small fraction of the code or, even better, parametrize them. This process is a special case of what is known as abstraction. As Dijkstra said in his Turing Award Lecture [39]:

[…] the only mental tool by means of which a very finite piece of reasoning can cover a myriad cases is called “abstraction”; as a result the effective exploitation of his powers of abstraction must be regarded as one of the most vital activities of a competent programmer.

It is natural to think that a piece of code that covers “a myriad cases” would spare us from the effort of coding each case one by one, thus reducing code length by a corresponding factor. For instance, C. A. R. Hoare, Turing award recipient and inventor of the Quicksort algorithm, declared: “The way to shorten programs is to use procedures”.16

However, that isn’t the mechanism proposed by Dijkstra. In the same speech — indeed, the same paragraph — he advocated using abstraction to reduce the “intellectual effort” required to write a program of a given length, not to reduce its length. He didn’t formulate this recommendation based on empirical data or a mathematical theory. He started instead from the common-sense observation that if all parts of a program interact, this results in many interactions that need to be understood, debugged, maintained, etc. Their number grows with the square of the size of the program. On the other hand, if each part interacts with a small number of others, this results in a much more manageable linear number of interactions.

Dijkstra appears to assume that abstract code results in programs of the latter type, but this is far from obvious. On the contrary, a more general function creates indirect dependencies between parts of a program and, thus, more of the entanglement that Dijkstra suggests reducing. Taking sorting as an example, let’s assume we have two independent programs, one that needs to sort arrays of integers and the other of strings. In the first scenario, each program uses its specific sorting function, and there is no other code-sharing between the two. In the other, they rely on a shared library function that can sort any collection. This sharing constitutes an indirect dependence between the two programs. If changes in the sorting code were to become necessary because of the needs of one program, a bug could appear that cripples the other one.

If keeping the two programs completely independent is paramount, an option is duplicating the sorting function, no matter how general, and letting each program depend on its separate copy. When involving entire libraries, developers call this forking the shared dependency (from the idea of a fork in the road). Although forking results in more code to maintain, developer, blogger, and speaker Thomas Figg advocates accepting repetition temporarily, to prevent coupling or to let a better abstraction emerge from examples ([41], see also “interpolation” below).

There are other ways to ensure isolation between parts of a program besides duplication. For instance, modules or functions which are clearly defined and independently tested minimize the coupling risk. In my previous example of modifications to a sorting function, they would be unusual for a task that is so clearly specified. Because of this, such a function is simple to test, and extensive testing reduces the likelihood of introducing bugs when modifying it. Therefore, sharing a sorting function between modules doesn’t induce any coupling worth worrying about. Indeed, such a function is available in any language I can think of as a built-in or standard library function, and few developers would bother creating their own.

A program built primarily out of functions as clearly specified as a sorting function behaves more like several small programs than a single large one. All its components will still run on one machine, sharing its resources and crashing all at once if only one is faulty. However, from the point of view of development, the different parts are more swappable: we can replace a buggy function with another; a slow implementation can give way to a faster one. Microservices take decoupling to the next level by running parts of a program or application in isolation.17 This way, a microservice can fail without dragging down the rest, and graceful degradation is possible. At an extreme scale, that is why the Internet works despite being based on, possibly, billions of LOCs:18 it can keep working even if at any given time parts of it aren’t — there are nonetheless occasional large-scale failures [42].

In summary, conciseness and modularity are two sides of coping with size: if size can’t decrease, by splitting code neatly into modules, we can deal with smaller chunks at once.

If writing more general code is harder and introduces more dependencies, does it offer any advantages? Well, what is better than covering “a myriad cases” (see Dijkstra’s talk above) with a single coding effort? “Wherever there is repetition, there is an opportunity for abstraction,” states Burns [21]. For example, imagine having a sorting function for each type of collection and then for each element type: one for arrays of integers, one for lists of strings, and so on. That would result in a large amount of repetitive code. Indeed we will see an example where a library (in java.util.Arrays, see Sec. 1.5) contains seven copies of essentially the same functions because the language hampers further generalization. Moreover, bugs can lurk for a very long time. For example, an implementation of binary search,19 generally accepted as correct, hid a bug for 20 years after being published [43]. Its Java standard library version contained the same error for nine years [44]. Imagine the effort to fix this bug if every project maintained its version of binary search.

Abstracting over different data types is only one instance of a general phenomenon. Often code can grow as an accumulation of special cases until we realize a general rule or solution that can replace them and covers additional ones as a bonus. Using a statistical analogy, we are replacing individual data points with a model and predicting new ones as a result (for statistical modeling, see Sec. 2.1).

Rich Skrenta has observed something similar [45]:

[…] more code often means less flexibility and functionality. This is counter-intuitive, but a lot of times a simple, elegant solution is faster and more general than the plodding mess of code produced by a programmer of lesser talent.

I had a stark demonstration of this when I was still in college. As part of standard computer science coursework, I was working with other students on implementing a toy programming language. Our instructor asked us to use a modular architecture that subdivided its compiler20 into multiple interacting programs, each assigned to a sub-team. After preparatory meetings and some development, we all met to check that all parts worked together. One of them seemed to work on the simplest programs, but it failed at the first hint of nesting — a for loop inside another for loop, for instance. The student responsible for it promised to fix it and meet us again in a couple of days. At the following meeting, his program failed again, this time on two levels of nesting. This sequence of events repeated itself a few times, with his program producing errors each time on slightly more complicated examples. Concerned about the fate of our project, we asked if we could review his source code. We found that he treated the no-, single- and double-nesting cases separately. He had written a large amount of code to cover all the examples we had provided. He had yet to use recursion21 to deal with unlimited nesting, as required by the definition of the language. It looked like he thought that, in practice, the programs to be compiled wouldn’t be too complicated and that his approach would be enough. The result was a program that had to grow in size proportionally to the complexity of the examples it was required to handle. In this respect, it had something in common with applications whose complexity is proportional to the number of screens to display; or machine learning models, whose complexity grows with the size of the training set (for more on this, see Sec. 2.1.4). Only much later in my career did I realize that my teammate22 was anticipating the future of the software industry:

  1. he was pioneering test-driven development;
  2. he focused on his customers’ requests, with his teammates being the customers;
  3. he maximized customer value — higher levels of nesting are used less often;
  4. he met his self-imposed rapid-fire deadlines;
  5. he wrote code quickly.

One obstacle to achieving the level of generality of a sorting function or a compiler is that not all abstractions emerge with the same level of clarity. For example, sometimes we create a do_stuff function that we reuse somewhere else before we understand what “stuff” exactly means. One consequence is that it keeps changing as we modify code elsewhere. It may also be harder to test in isolation from the contexts in which it runs. These are the telltale signs of a poor abstraction, but they are seldom perfect from the start. Sometimes the correct generalization emerges over time after seeing it applied in different contexts, akin to interpolation in mathematics. Marz writes [46]:

I view the development of beautiful abstractions as similar to statistical regression: you have a set of points on a graph (your use cases) and you’re looking for the simplest curve that fits those points (a set of abstractions).

The alternative, letting repetitive code proliferate and fixing it at a later time, has its drawbacks, as we have seen.

Another problem is that the connection between abstraction and conciseness isn’t considered fundamental. We have seen how Dijkstra thought abstract code reduced the intellectual effort required per line written, not the number of lines in a program. Furthermore, he didn’t look kindly at attempts to solve problems with a single line of code (see “one-liners” in Sec. 1.5). In my training, different abstraction techniques were touted for their potential to reduce coupling and not for any conciseness advantage. Without a measurable goal, the best level of abstraction becomes a matter of opinion. However, there is some agreement that it can be taken too far (see the LinkedIn example in Sec. 1.2.9.4), for instance, creating hierarchies of classes23 where most classes are never instantiated.

A few pay attention to the conciseness potential of specific programming techniques. For instance, software engineer and blogger Nikolay Grozev wrote an introduction to monads, the advanced programming technique inspired by category theory [47]. First, he develops an example that he deems “far from perfect, as we have repeated ‘glue code.’ ” Later, after applying monads, he observes:

This [using monads] avoids the shortcomings of our previous approach because the bind function implements all the glue code and we don’t have to repeat it.

The crucial point is ensuring that abstractions can be combined seamlessly, providing appropriate combinators. Only then can we reap the full benefits of creating those abstractions. An example could be modifying a module for arithmetic calculations where a division by 0 terminates a program so that it would instead generate a special error value, say "err". This module features four functions, add, sub, mul, and div, for the four basic arithmetical operations. Since a division by 0 can only occur during the execution of div, we start by modifying that function:

def div(num, den):
  return "err" if den==0 else num/den

The problem is that the other functions don’t know how to deal with the value "err", so we can’t just write add(1, div(1, 0)) to calculate 1 + 1/0. The intended behavior is that any operation involving "err" also results in the same value. One approach is to check every time we use div:

a = div(1, 0)
b = "err" if a == "err" else add(1, a)

This approach is verbose and error prone. Another option is to modify all four operations to deal with "err" appropriately. However, the “monadic” approach is to define a function, traditionally named bind, to modify the evaluation of the operations:

def bind(op, x, y):
  return "err" if x == "err" or y == "err" else op(x,y)

This way, we can write complex expressions as follows:

bind(add, 1, bind(div, 1, 0))
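
Putting the pieces together, here is a minimal runnable sketch of this module; add, sub, and mul are my own one-line stand-ins, while div and bind are as defined above:

def add(x, y): return x + y
def sub(x, y): return x - y
def mul(x, y): return x * y

def div(num, den):
  return "err" if den == 0 else num / den

def bind(op, x, y):
  return "err" if x == "err" or y == "err" else op(x, y)

bind(add, 1, bind(div, 1, 0))  # "err": the error value propagates
bind(add, 1, bind(div, 4, 2))  # 3.0

Only bind needs to know about "err"; the four arithmetic functions remain oblivious to it.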

Another example from functional programming24 circles relates to higher-order functions, which accept as arguments or return other functions, and are available in many popular languages but aren’t used widely in practice. Software engineer and blogger Gio Carlo Cielo guides us through a complete example in which “our code verbosity has been significantly reduced through the use of the higher-order functions” [48]. A minimal illustration of their use could be applying a function to each element in a list. Without higher-order functions, this is what we would write in Python:

def f(x):
  ... # some definition for f

result = []
for x in range(10):
    result.append(f(x))

With the standard (higher-order) function map, the same example becomes:

map(f, range(10))

This version is simultaneously more concise and more abstract. We obtained a more concise version by hiding details like creating an empty list and appending elements to it. Python offers an alternate form for this task, list comprehensions, using which our example becomes:

[f(x) for x in range(10)]

This code doesn’t eliminate the loop variable x and is limited in its generality. For instance, there is no list comprehension for the fold or reduce higher-order function typical of functional programming. The likely reason is that it is the element of functional programming that Python’s author Guido van Rossum has “always hated the most” and for which he needs to “grab pen and paper to diagram what’s actually being fed into that function before I understand what the reduce() is supposed to do” [49]. However, this issue is wider-reaching than reduce itself and van Rossum’s foibles: it is whether languages should provide the most common abstractions or the tools to build them. Higher-order functions are in the second group. For instance, extending map to a data structure such as a tree is straightforward, as we will see shortly.
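
For reference, here is a minimal illustration of reduce, my own and not from the sources just cited; in Python it lives in the functools module and folds a two-argument function over a sequence while carrying an accumulator along:

from functools import reduce

reduce(lambda acc, x: acc + x, range(10), 0)  # 0 + 0 + 1 + ... + 9 == 45

Even in this tiny case, one has to track what flows into acc at each step, which is the pen-and-paper exercise van Rossum objects to.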

Let’s start with some definitions. A tree is made out of nodes. A relation exists among them such that some nodes are parents to other nodes, called children. An ancestor to a node is either parent to that node or an ancestor to its parent. An ancestor is connected to its descendant nodes through a unique path. Finally, there must be a node that is an ancestor to all other nodes, called root (see Fig. 3).


Figure 3: A tree.

In Python, let’s represent a node with the list of its children and a tree with its root node. For instance, [[],[]] represents a tree with three nodes, one the parent of the other two. Since trees are most useful when holding some data on each node, let’s say the first element in the list isn’t a child but such data, for instance [1,[2],[3]], where the node holding 1 is the root, which is parent to the other two nodes, containing 2 and 3 respectively. With these definitions and representation conventions, we can define a map for trees as follows:

def map_tree(T, f):
  return [f(T[0])] + list(map(partial(map_tree, f=f), T[1:])) if T else []

With two lines25 of code, we have extended the concept of a map function to trees.26 There is no such possibility for list comprehensions. They are a useful pre-defined abstraction but aren’t amenable to being extended to other data structures.
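
As a quick check of the definition, here is an example of my own (partial comes from the functools module, omitted above per the book’s convention on imports):

from functools import partial

map_tree([1, [2], [3]], lambda x: x * 10)  # returns [10, [20], [30]]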

There isn’t anything special about functional techniques other than that they are powerful tools for abstraction and, therefore, also for conciseness. Moreover, I wouldn’t want to generate the impression that abstraction is the result of applying sophisticated techniques or adopting esoteric languages, whereas, for the rest of us, there is copy-paste-modify. Sharing a complex method between two classes by introducing a common parent class also helps conciseness. As we have seen above, Hoare found procedures already fit for this task.

1.2.5 Rewrites

A rewrite is a controversial and radical step in software development: take what you have built, throw it away and start from scratch. Rewrites come in different sizes, from those targeting a complete application to those affecting just a few functions. They have been described as company-destroying mistakes [50] or a normal phase of development [5]. Opinion dominates, and I don’t have much to add to this debate but to consider what it means for conciseness. Rewriting some portion of a program isn’t guaranteed to result in less code. Famously, Netscape discarded its Navigator code base for various reasons, one being its size [51]. A prototype released to showcase the new rendering engine Gecko was about a tenth of the size of contemporary browsers and could fit on a single floppy disk — 1.44 MB. It had swollen to 17 MB by the time the 1.0 version was released, negating any size advantages.

However, it doesn’t have to end that way. Instead, we will see how switching to a higher level language or one better matched with the application can result in radical code savings, for instance, in a 60-KLOC rewrite performed at LinkedIn (see Sec. 1.2.9.4). In that case, the rewrite was management initiated. In other ones, rewrites start from the grassroots. For example, Apple developer Bill Atkinson used to report a negative number of lines contributed in a weekly form that he had to submit, including a “-2000” in the week in which he had rewritten some routines in QuickDraw to be six times faster. Eventually, he wasn’t asked to fill the form anymore [52].

Another example involves Bret Taylor, co-inventor of Google Maps [53]:

[Taylor] was frustrated that the JavaScript was really slow. It would take forever to load. So, one weekend he said, “I’m going to rewrite it.” And he rewrote the entire thing, making it 10 times as fast and a third of the size. He completely threw away the code that a team of engineers had been working on for months.

Unfortunately, we don’t know if the same result could have been achieved incrementally or by giving the right incentives to the original team.

An additional example is a cryptocurrency named XLM, initially based on a fork of a pre-existing one, XRP. Its developers were having doubts about the security of XRP and decided there were better options, eventually opting for the Stellar Consensus Protocol. However, they did not merge the new protocol into the fork: instead, they “created an entirely new codebase designed with safety, minimalism, scalability, interoperability, and multiple implementations in mind,” elevating minimalism to a top-level concern [54].

In some instances, the impetus for a rewrite comes from a shift in hardware technology that makes an existing implementation obsolete. For example, the trend towards using GPUs27 for numerical computation prompted a complete rewrite for an application named Weather Research and Forecasting (WRF; [55]):

WRF is well over half a million lines of code, mostly rooted in Fortran. This new model is based on the same ideas but with less code. Still, the effort to get this up and running on Power9/Volta GPU systems took two and a half years.

We will see more of this as heterogeneous computing28 spreads to all types of devices, from supercomputers to smartphones, prompting partial or complete rewrites [56].

dogpile.cache is a generic API29 that can use different caching30 backends. It is a rewrite of a similar product, Beaker, by the developer of both, Mike Bayer, who explains [57]:

All the ideas of Beaker which “work” are re-implemented in dogpile.cache in a more efficient and succinct manner, and all the cruft […] relegated to the trash heap.

Here we have an example of a rewrite that doesn’t involve management taking a new direction (see the LinkedIn JavaScript rewrite in Sec. 1.2.9.4) or a programming superstar showing his mettle (Taylor, see above). Instead, we witness a developer realizing that an incremental path from his existing code to where he wanted to be was less practicable than a fresh start. It is also an example where succinctness rises to top-level concern instead of being a happy accident.

1.2.6 Algorithms

Algorithms are generally small enough that their code length isn’t an issue; length is more of a concern for libraries, applications, and systems. With algorithms, we are more interested in correctness and efficiency. However, that doesn’t mean keeping the line count low isn’t a nice extra. For example, Daniel Lemire, inventor of the xor filter, a replacement for bloom filters31 and other related algorithms, asks [58]:

Could we do even better [on speed and memory usage] while limiting the code to something you can hold in your head?

His xor filter reportedly marks an improvement on traditional measures of algorithm efficiency (speed and memory usage) while also keeping it relatively short:

The complete implementation in Go fits in less than 300 lines and I did not even try to be short. Indeed, any semi-competent Go coder can make it fit within 200 lines.

While I doubt I can memorize 300 LOCs, it is 50 lines shorter [59] than a bloom filter implementation by the same group [60].

Moving onto a more run-of-the-mill Quicksort, a mainstream sorting algorithm, there are different ways to implement it. Angel Sola Orbaiceta, a software engineer and book author, experiments with object-oriented and functional paradigms, using Java and Clojure [61]. He finds out that “functional code is much more readable and compact” for the same task and believes “the more code there is, more time and effort has to be put in maintaining the codebase and more bugs tend to appear.” Specifically, skipping empty lines or those with just a bracket or parenthesis, he ends up with 27 lines for Java and 9 for Clojure.

I tried to replicate his experiment in Python. Quicksort is an application of the divide et impera approach: pick an element, called the pivot, split the remaining ones into those smaller and those larger than the pivot, then sort the two splits recursively and combine the results.

When a split is empty or only has one element, return it as is. Some subtleties in how the partition is performed create the opportunity for some variants. Namely, the Quicksort function (abbreviated to qs in the following code blocks) should never invoke itself on its own input, or the algorithm won’t terminate (and run out of memory in practice). We can achieve this by excluding the pivot or all elements equal to the pivot from further sorting. Another choice is either to sort in place — moving entries around in a single array — or to create new arrays as needed: the former is more memory-efficient, the latter quite a bit simpler.

The following version performs the sorting in place:

def qs(xx, lo = 0, hi = None):
    hi = len(xx) - 1 if hi is None else hi
    if (hi <= lo) or lo < 0:
        return
    p = partition(xx, lo, hi)
    qs(xx, lo, p - 1)
    qs(xx, p + 1, hi)

def partition(xx, lo, hi):
    pivot = xx[hi]
    i = lo - 1
    for j in range(lo, hi):
        if xx[j] <= pivot:
            i = i + 1
            xx[i], xx[j] = xx[j], xx[i]
    i = i + 1
    xx[i], xx[hi] = xx[hi], xx[i]
    return i

It takes 17 lines, which compares favorably with the 27 lines of Java in Orbaiceta’s blog post. However, on closer inspection, his version isn’t in-place. Removing that constraint, we can produce this version:

def qs(xx):
    if len(xx) <= 1:
        return xx
    return (
        qs([x for x in xx if x < (pivot := xx[0])])
        + [x for x in xx if x == pivot]
        + qs([x for x in xx if x > pivot])
    )

This code is just seven lines, ignoring the line with just a parenthesis, and it is also straightforward. There is no index accounting to master, no swapping of elements, just plain and simple divide et impera. However, the ascetic programmer can’t help but notice a certain repetitiveness in the three list comprehensions that perform the partition. The only element that changes is the comparison operator used. I looked unsuccessfully for a library function combining these steps; defining our own would take too many lines. However, we still can factor out the filtering operation:

fo = lambda op, xx: [x for x in xx if op(x, xx[0])]

where op is one of <, >, =. We can’t pass these operators around as parameters in their most familiar form, but there are equivalent functions in the standard package Operator: lt, gt, and eq, respectively. With this parametrization, we can write qs as follows:

qs = lambda xx: xx if len(xx) <= 1 else qs(fo(lt, xx)) + fo(eq, xx) + qs(fo(gt, xx))

I used a short name for fo to fit qs onto one line (formatted with a popular tool, Black, which uses an 88-character bound). For the same reason, I used lambda definitions instead of def. Those tricks are possible because line count is a blunt instrument. If we used token count, a measure of program complexity insensitive to the vagaries of formatting or naming (see the next section), those tricks would not pay off. Indeed, it is easy to perform a token count with the help of package Ast (for abstract syntax tree, a data structure used to compile programs):

def token_count(source):
    return len(list(ast.walk(ast.parse(source, "source", "exec"))))

According to this function, the in-place Quicksort contains 200 tokens, whereas the shorter versions are 75 and 74 tokens long, respectively. Remarkably, the two-line version wins by just one token, owing to its “density.” It is natural to ask if we can go any lower than this. It is possible, indeed, by eight tokens. To achieve this feat, we need to define the partitioning filter as a function of one instead of two arguments. This change requires its definition to be inside the scope of xx:

def qs(xx):
    if len(xx) <= 1:
        return xx
    filter_xx = lambda op: [x for x in xx if op(x, xx[0])]
    return qs(filter_xx(lt)) + filter_xx(eq) + qs(filter_xx(gt))

It makes sense to limit the arguments of a function to those that change within a scope of interest: what else could we be filtering other than xx? What may look like a silly search to shave off a few additional tokens — a pursuit that Dijkstra thought to lack “any conceptual relevance” (see Sec. 1.5) — spurred me to tighten or specialize the filtering abstraction to the needed level of generality, and nothing more than that. There may be a performance penalty for defining a function inside a recursive call, certainly in Python, but we are not trying to optimize performance in this book, and occasionally we aim for the opposite (see the Busy Beaver numbers in Sec. 1.2.11.1).
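
Out of curiosity, one way to check for that penalty is to time the two variants on random input. This is a rough sketch of mine; the names are made up, and the numbers will vary by machine:

from operator import eq, gt, lt
from random import sample
from timeit import timeit

# filter defined once, at the top level
fo = lambda op, xx: [x for x in xx if op(x, xx[0])]
qs_outer = lambda xx: (xx if len(xx) <= 1
                       else qs_outer(fo(lt, xx)) + fo(eq, xx) + qs_outer(fo(gt, xx)))

# filter redefined inside every recursive call
def qs_inner(xx):
    if len(xx) <= 1:
        return xx
    filter_xx = lambda op: [x for x in xx if op(x, xx[0])]
    return qs_inner(filter_xx(lt)) + filter_xx(eq) + qs_inner(filter_xx(gt))

data = sample(range(100000), 1000)  # 1000 distinct integers in random order
print(timeit(lambda: qs_outer(data), number=100))
print(timeit(lambda: qs_inner(data), number=100))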

Code length is also used to evaluate the effort needed to prove the correctness of algorithms (see Sec. 2.2). For instance, Martin Kleppmann, author of Designing Data-Intensive Applications, implemented, in 60 LOCs, a distributed data structure to support group editing, but the proof of its correctness took 800 lines of math, for a proof-to-code ratio over 13:1 [62]. Kleppmann maintains it was worthwhile, not only to raise confidence in the correctness of a less-than-intuitive algorithm but also to increase its understanding. Nonetheless, the large size of the proof highlights an obstacle to applying formal methods in software engineering. Another barrier is that proofs can only show that an algorithm satisfies its stated requirements; whether those requirements reflect the real world is beyond their reach. Still, formal methods are valuable in high-stakes applications such as the control software for the French railway network [12] or the S3 storage system [63].

1.2.7 Notation and Formatting

If recognizing and implementing abstractions is considered good programming, abbreviating identifiers or skipping white space is generally frowned upon. Identifiers, as we were taught in school, should be self-explanatory; white space should be used liberally to impart visual structure to the source code.

An exchange from a Hacker News forum shows that at least some developers, including database architect Andres Freund, challenge the orthodoxy [64]:

Usage of acronyms is one of the worst offenders in bad code.

Not really on board with that … It’s a balance. Brevity does have its value too.

The trade-off is memorizing the abbreviations in exchange for dealing with more compact and readable expressions. If a long variable name works as documentation, is having to read it every time you run into a variable a good idea? Whether or not a program uses acronyms, identifiers’ definitions and related documentation must be easily accessible, usually through features of an IDE.32 When a program deals with complex concepts, the idea that even long identifiers can be self-explanatory is unrealistic.

We should look at mathematics as a model. Variables are generally single letters. Mathematicians and physicists develop specialized notations to make expressions more compact. About Einstein’s general relativity, the biographer Walter Isaacson writes [65]:

Einstein’s final equations used the condensed notations of tensors to compress sprawling complexities into squiggly symbols and subscripts.

Clearly, “sprawling complexities” aren’t conducive to readability!

Charles Babbage, who designed the first programmable mechanical computer, seemed to agree (as quoted in Kenneth Iverson’s Turing Award lecture [66]):

The quantity of meaning compressed into a small space by algebraic signs is another circumstance that facilitates the reasonings we are accustomed to carry on by their aid.

Even newlines should not get a free pass. In a thread on GitHub, a discussion developed between supporters of a format with shorter lines, allegedly more friendly towards text-oriented diff tools,33 and supporters of a more compact one. User tweakimp writes [67]:

I think you can call it [the choice of format] more than preference because if there is no other difference, less [sic] lines is [sic] better.

In hugovks [sic] example the second version is shorter, easier to read and should be preferred in my opinion.

Optimizing code to be read by primitive comparison tools sounds like a bad idea, independent of conciseness. In the seminal book Structure and Interpretation of Computer Programs (SICP [68]), we read that “programs must be written for people to read, and only incidentally for machines to execute.” Even more so, we shouldn’t be writing them to ease the work of primitive diff tools but switch to better ones, like those that analyze the structure of programs.

1.2.8 Documentation and Comments

When it comes to documentation, the more documentation, the better, correct? Actually, who enjoys reading it? Documentation is a cost like code is. It needs to be written and maintained, developers must read it, and it silently becomes outdated.

Michele Simionato, author of several Python packages, reported this about halving the size of the documentation for one of them [69]:

[dropping Python 2] is a huge bonus, since I could remove over 2,000 lines of duplicated documentation/doctests. Having to maintain separate docs for Python 2 and Python 3 effectively stopped any development on the module for several years.

What may be hard to understand with large teams becomes evident for a one-person volunteer project.

Can we eliminate documentation? Joshua Bloch, known for his work on Java, thinks so. About API design, he writes [70]:

APIs should be self-documenting: It should rarely require documentation to read code written to a good API. Indeed, it should rarely require documentation to write it.

Early drafts of APIs should be short, typically one page with class and method signatures and one-line descriptions. This makes it easy to restructure the API when you don’t get it right the first time.

It may look like a lofty goal, but Bloch was likely thinking about some of his work on the Java standard library. Building on fundamental concepts in software engineering, with which all developers should be familiar, alleviates the need for documentation; simple and consistent API design should complete the task of minimizing it. Does the Java standard library meet these lofty goals? For instance, in module java.util.Arrays we find entries such as [71]:

static int binarySearch(byte[] a, byte key)
static byte[] copyOf(byte[] original, int newLength)
static byte[] copyOfRange(byte[] original, int from, int to)

Their behavior should be intuitive. However, why does copyOf need a newLength argument? Shouldn’t a copy be length-preserving? And what happens if newLength is larger or smaller than the original? Luckily, the API reference provides a simple one-line explanation that copyOf is more than a copy: it will happily stop after copying only newLength elements, or continue, padding the new array with 0s until newLength has been reached. The padding value isn’t configurable and depends on the type of the original array — copyOf is overloaded34 for different types. Scanning the different forms of copyOf I found

static <T> T[] copyOf(T[] original, int newLength),

an entry which uses Java generics to parametrize its parameter and return types, for which the padding value is null. So what if T equals byte? Will copyOf pad its result with zeros or ‘nulls’? The answer is that Java generics cannot be instantiated with primitive types, and therefore T can never be instantiated with byte [72]. Thus the apparent ambiguity is resolved. It gets more confusing if we consider this additional form of copyOf:

static <T,U> T[] copyOf(U[] original, int newLength,
                        Class<? extends T[]> newType)

which represents a copy with conversion from type U to type T. What happens if you need to perform the same operation from or to primitive types? I would check with a Java expert before giving up, but I am afraid you will have to write 49 more functions, one for each pair of primitive types.35 This limitation stems from the Java generics design, like the quantity of almost identical copyOf entries for each primitive type, and the API developer must adapt (see java.util.Arrays in Sec. 1.5). However, the documentation leaves the user helpless here, suggesting a consistency in the library that is only aspirational.

In a thread about the “best code bases,” Hacker News user taspeotis has this to say about a code base touted for its line-by-line comments:36 “Every line? Oh dear” [73]. He and other users follow up by sampling the uninformative comments, which add nothing to what is already obvious from the code:

/* Set verbose flag */
  verbose++;
   ...
/* Set advanced flag */
  advanced = 1;

User teknopaul comments:

IMHO Coding guides should ban “SBO comments” Statin [sic] the Bleedin [sic] Obvious. Let devs [sic] decide what SBO is, usually its [sic] bleedin [sic] obvious.

Which sounds agreeable, but for the circular definition of “bleedin [sic] obvious.”

Martin Fowler, author of popular software engineering books, doesn’t like comments [74]:

Comments are not bad — but there are often better options. I always try to write code that doesn’t need comments …

One such option is assert statements, which specify properties to be satisfied by a program when executing at a certain position in the code. While they are yet more code to write, understand and maintain, they can express simple properties that aren’t obvious from reading the rest of the code. These properties are checked at run-time, helping in the early detection of bugs, near their origin, rather than later as consequences snowball. However, even without running any program, they work “as documentation for other developers reading the code, who see the assert and can confidently say that its condition holds” [75]. For instance, we can express the property that a list is sorted with:

is_sorted = lambda xx: all([x <= y for (x,y) in zip(xx[:-1], xx[1:])])

We can use this in the implementation of the algorithm Quicksort (see Sec. 1.2.6) to check and document that each call has to return a sorted list:

def qs(xx):
    if len(xx) <= 1:
        return xx
    filter_op = lambda op: list(filter(lambda x: op(x, xx[0]), xx))
    yy =  qs(filter_op(lt)) + filter_op(eq) + qs(filter_op(gt))
    assert is_sorted(yy)
    return yy

Another option is randomized or property-based tests, which express, in executable form, required properties of inputs and outputs of a function. The idea is that, instead of specifying a set of inputs with corresponding desired outputs for a function, we can define a process to generate inputs and properties that the input-output pair must verify. For instance, when testing a matrix multiplication function, we would pick a matrix generator to provide inputs. Tests could verify properties such as associativity, distributivity, and having the correct dimensions. Additionally, they could check the results of operations involving identity and null matrices. This approach originated in the Haskell language community and was implemented in the package QuickCheck, where it relies on the type system to provide useful default generators even for user-defined types [76]. It has been imitated in many languages to different degrees of fidelity. In Python, the package Hypothesis supports property-based testing [77]. To continue with our Quicksort example, a test that the output is sorted would look as follows:

@given(xx=lists(integers()))
def test_is_sorted(xx):
    assert is_sorted(qs(xx))

In this test, the first line is a decorator that specifies what generator to use for the argument to the test function, one that produces lists of integers. Writing a generator is also a form of documentation because it tells us that qs accepts lists of integers. qs is more general than that, which should prompt us to write generators like one_of(lists(integers()), lists(floats()), lists(text())), which will produce lists of integers, floats, or strings (not mixed in the same list; see the sketch after the next code block). The second line contains the name and signature of the test function, following the convention used by the package Pytest. The body of the test asserts that the result of applying qs to its argument is sorted. This condition is necessary but not sufficient for qs to be correct, as it could always return the same sorted list. We also need to ensure its output is a permutation of its input, that is, that it contains the same number of occurrences of each element, a task suitable for the data structure Counter from the library Collections:

@given(xx=lists(integers()))
def test_is_permutation(xx):
    assert Counter(qs(xx)) == Counter(xx)
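
As an aside, a generator producing such unmixed lists could look as follows. This is a sketch; it assumes the qs and is_sorted definitions above, and it excludes NaN, which would break the sortedness check:

from hypothesis import given
from hypothesis.strategies import floats, integers, lists, one_of, text

# lists whose elements all come from a single one of the three strategies
@given(xx=one_of(lists(integers()),
                 lists(floats(allow_nan=False)),
                 lists(text())))
def test_is_sorted_general(xx):
    assert is_sorted(qs(xx))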

These two property tests are equivalent to a user-selectable number of conventional ones because a property test can repeatedly run on newly generated inputs. They can replace hundreds or thousands of lines of tests written one by one, provide a high level of confidence in the implementation, and a readable specification for qs (for theoretical work showing how much testing is sufficient, see Sec. 1.2.11.1). This flexibility also mitigates the issue of slow-running tests because, by changing a configuration parameter, we can run a more superficial but quick battery of tests or a much slower but more thorough one. Indeed I applied this two-speed strategy in developing the Rmr2 package [78], using Quickcheck for R [79]. While adding new features, I ran a quick set of tests until most bugs were fixed. Then, as we approached a release, I increased the number and size of tests to build more confidence in the correctness of the package. You may be more of a believer in quality over quantity in the writing of tests. In my experience, handwritten tests have several limitations. First, they are often trivial because the correct output has to be calculated by hand. When written in response to bug reports, they catch problems that have already been discovered. Finally, they sometimes share the same faulty assumptions as the code to be tested and, therefore, won’t probe it outside of those.

All these elements make property-based testing valuable, but there are a couple of catches. First, a developer must state properties that a function’s inputs and outputs must satisfy, and implement them. This task is easy for mathematical objects like matrices or well-defined algorithms like those for sorting and searching but less so in other domains. For example, how would we test that a graphical routine draws a circle? Another problem is that this approach relies on referential transparency, a complicated way to say that computer functions behave like mathematical functions, that is, their outputs depend only on their inputs. For example, if a function has access to a database, its return value may depend on its contents. In this case, there are workarounds, like starting with an empty test database and testing read and write functions in combination, but the code to be tested has to be amenable to this technique. Because of these limitations, we should be wary of emulating Dijkstra’s maximalism — the view that all software should be proven correct like a theorem (see testing in Sec. 1.2.11.1). Like formal methods, property testing has a domain of applicability, and within it, it is a great asset.

Another alternative to traditional documentation is doctests, generally written using literate programming — a technique to interleave text and code so that the latter can be executed [80]. If a chunk of code fails, the documentation related to it may need updating. One instance of this is the “Examples” section from the documentation of package Altair_recipes, which includes both the test suite and the output for each test, consisting of statistical graphics [81]. Maintaining such an extensive set of examples covering the whole library and a comprehensive test suite would have been daunting for a one-person, spare-time project. However, since they are one and the same, the cost is cut in half.
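
For a minimal illustration of the mechanism (a sketch of mine, unrelated to Altair_recipes): a doctest embeds runnable examples in a docstring, and running python -m doctest on the file executes them and reports any mismatch with the recorded output. Here the earlier sortedness check is rewritten as a documented function:

def is_sorted(xx):
    """Check that a list is sorted in non-decreasing order.

    >>> is_sorted([1, 2, 2, 3])
    True
    >>> is_sorted([3, 1, 2])
    False
    """
    return all(x <= y for x, y in zip(xx, xx[1:]))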

1.2.9 Languages

Paul Graham, Lisp hacker turned venture capitalist, wrote an essay devoted to conciseness, “Succinctness is power.” He sees succinctness as a productivity multiplier and believes it to be related to other positive qualities of code, such as readability. He focuses specifically on programming languages. He makes a characteristically articulate case for it [82]:

[…] clearly succinctness is a large part of what higher-level languages are for. If it is not all they’re for, then what else are they for […]

However, his point of view, while related, is different from the one espoused in this book, as he sees conciseness as a desirable feature in languages but not necessarily in programs:

We should be clear that we are talking about the succinctness of languages, not of individual programs. It is certainly possible for individual programs to be too densely written.

While it is possible to run into cryptic code (for instance, see Sec. 1.2.11.2 and the front cover), we shouldn’t regard general programming as a substantially different activity from designing languages. The boundary between core language and standard library is often blurred regardless, more a matter of packaging and versioning than anything else. In SICP, the authors write [68]:

To appreciate this point [that language interpreters are just programs] is to change our images of ourselves as programmers. We come to see ourselves as designers of languages, rather than only users of languages designed by others.

Therefore it is hard to agree with the point made by Graham that, while higher-level languages exist primarily for the sake of program conciseness, this should not be pursued when using these languages. He considers moderate succinctness in programs acceptable but not a goal.

Any non-trivial program should define the concepts relevant to the task at hand and implement data structures and operations to represent and manipulate these concepts, and doing so extends the language. However, in many languages, some extensions aren’t possible. To pick an example in Python, let’s assume we need to increment a group of variables by one, as in the following example:

a+=1
b+=1
c+=1

It is impossible to encapsulate this snippet in a function such as inc(a, b, c, by = 1) to better express intent and promote reuse. The reason is that Python passes arguments by assigning the caller’s values to the function’s parameters: for an integer, rebinding the parameter inside the function, as += does, changes only the local name and leaves the caller’s variable untouched. Despite this, there actually is a way to implement this function, as shown by Stack Overflow user zwer [83]. Zwer uses a built-in mechanism to access the call stack, a data structure used by the language run-time to implement function calls. Through it, we can access the list of variables in the calling environment and modify them at will. To my knowledge, debugging is the only recommended use for these features. Indeed, zwer’s solution comes with so many caveats and warnings that the author himself can’t recommend it as a viable one and named it here_be_dragons.
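
A minimal illustration of the obstacle (my sketch, not zwer’s here_be_dragons):

def inc(x, by=1):
    x += by    # rebinds the local name x only
    return x

a = 1
inc(a)
print(a)       # still 1: the caller's variable is unchanged
a = inc(a)     # the caller must rebind explicitly
print(a)       # 2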

I didn’t select this example to pick on Python: languages are defined by what they allow and what they prevent. Some features are considered “dangerous” because they are believed to lead to the creation of bugs. Functional programming encourages limiting side effects, when it allows them at all. In C++, primarily an object-oriented language, parameters that allow side effects must be of reference (or pointer) type. Whether or not we agree with the specific choices made in a language, we need to operate within its limitations. Hence the lofty vision promoted in SICP of programmers as language designers must be tempered with the reality that it has to be achieved within the confines of the language one is using.

Possible ways to push beyond language limits are macros and reflection. The former enables defining abstractions that generate or modify code before execution; the latter gives the ability to operate on the structures of the language run-time from the program being run, which is what here_be_dragons uses. Lisp macros are notable for being expressed in Lisp itself, unlike, for instance, C macros, which use a dedicated macro language. This is natural in Lisp, since, in that language, expressions are also data structures. Every interpreter or compiler for any language needs to build a representation of the program, which can only be a data structure, but in Lisp the relationship between code and representation is particularly simple, as programs are represented by lists [84]. In an example, developer and author Adam Tornhill shows how to eliminate code duplication for a specific web application using a macro: “Now […] we have a concise way to express web-pages with a uniform look […]” [85]. Not only is this solution concise, but conciseness also begets consistency because the same code generates all pages. Unfortunately, he doesn’t explain why macros are necessary in this case. Graham devotes a chapter in his “On Lisp” to explaining their domain of application [86]:

Macros can do two things that functions can’t: they can control (or prevent) the evaluation of their arguments, and they are expanded right into the calling context. Any application which requires macros requires, in the end, one or both of these properties.

For instance, being expanded in its calling context, a macro would allow implementing the inc function described above. However, this possibility isn’t unique to macros: pointer and reference parameters allow modifying the calling context in C++, as do var arguments in Pascal, but in a much more controlled fashion. As to the ability to control argument evaluation, see Sec. 1.2.9.3 for an alternative mechanism.

Besides these specific applications, Graham recommends using macros only “when nothing else will do” [86] because both macros and reflection can make programs harder to understand and hide subtle bugs [87]. Nonetheless, macros have an important role in Lisp. For example, the dominant object-oriented extension to Lisp, the Common Lisp Object System, is implemented with macros, as are several iteration constructs. Graham even shows how to implement a completely different language, Prolog (see Sec. 1.2.9.6), with just macros [86].

Steele thinks that the quest for the ideal, all-powerful language is bound to be frustrated. He says [14]:

[…] it’s really hard to make a language that’s great at everything, in part just because there are only so many concise notations to go around. There’s this Huffman encoding37 problem.

If this observation is correct, it suggests three ideas. Please note that these are based on an abstract, mathematical view of programs rather than a practical one. The extent to which they apply in practice is debatable. First, languages that forbid many constructs give up the short programs that use them; developers will consequently have to write longer programs to define the same computations. An example is variable declarations and explicit initializations, which many advocate on a safety basis (see the Mariner I disaster, Sec. 1.5). Second, if two programs define the same computation, one of the two is wasted, in a sense: it could have defined a different computation, which now has to be expressed by a longer program. It is reasonable to conclude that a language with many ways to express the same computation will lead to more verbose programs. For instance, it is debatable whether a language needs both a map function, which applies a function to each element in a collection and returns a collection of its return values, and a list comprehension:

map(function, a_list)
[function(x) for x in a_list]

Finally, defining more common computations with shorter programs and accepting some verbosity for the less common ones could be a good compromise, but it implies different languages tailored for different application domains. For instance, a sum of two vectors in R is an element-by-element sum, even if the two vectors have different lengths, a problem solved with a technique known as recycling. In Python, the sum of two lists is their concatenation. In R, the concatenation function is concisely named c(), probably owing to its widespread use. In Python, there is no element-wise sum for lists, but we can perform it with a relatively complex expression: list(map(lambda x, y: x + y, x_list, y_list)). There are vector libraries for Python, such as Numpy, whereby the sum of two vectors is defined as in R, save for the recycling rules. While we can accomplish the same results in both, the choices of each language and library reflect the decisions of their respective developers, who likely took into account what the most common use cases were expected to be and reserved the most concise forms for them.
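
To make the comparison concrete, here is a small Python sketch of the three options just mentioned (the variable names are mine):

import numpy as np

x_list, y_list = [1, 2, 3], [10, 20, 30]

print(x_list + y_list)                                # [1, 2, 3, 10, 20, 30]: concatenation
print(list(map(lambda x, y: x + y, x_list, y_list)))  # [11, 22, 33]: element-wise sum
print(np.array(x_list) + np.array(y_list))            # [11 22 33]: element-wise sum with Numpy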

1.2.9.1 Lisp

We started talking about Lisp in the previous section because of its macros, but it is notable in other ways. Its inventor, John McCarthy, designed it as a concise alternative to the Turing machine38[88]:

Another way to show that Lisp was neater than Turing machines was to write a universal Lisp function and show that it is briefer and more comprehensible than the description of a universal Turing machine.39

The idea to turn it into an actual programming language came to a student of McCarthy’s, Steve Russell, who wrote its first interpreter in assembly language.40 It wasn’t a large, challenging project, unlike other languages (both Ada and Algol68 raised concerns about their implementability; see Sec. 1.5). An interpreter for the Lisp in the original paper takes only 53 lines of Common Lisp — a more complex but also practical dialect thereof [89]. This is a remarkable combination: a simpler language should also be less powerful and therefore require more code for the same tasks, including the implementation of its own compiler. However, Lisp sidesteps this trade-off to some degree. The so-called “Computer Language Shootout,” a systematic comparison of many languages focused on small, well-defined tasks, corroborates its conciseness [90]. It ranks Common Lisp first among over 50 languages (including multiple implementations of the same language).

Clojure is a modernized cousin of Lisp, particularly focused on supporting concurrency. It hasn’t lost Lisp’s expressivity while getting none of the verbosity of the Java language, with which it shares the target virtual machine.41 Daniel Lebrero compares two programs for the same task, one in each language, and asks [91]:

So given that the inherent complexity of the problem is the same, and that the Clojure version is able to express the solution in 58 lines of code while the Java version require 441 lines of code, and that the Clojure version is easier to understand, what are those extra 383 (87% of the codebase) lines of perfectly idiomatic Java code about?

Lebrero also notices how much easier the Clojure version is to review, and that size is one factor, maybe the determining one. Sure, concise code could be cryptic, and verbose code trivial, but this isn’t the case here:

[…] the Clojure version was way simpler to understand, and […] the fact of having a single file with 58 lines of code was a very important reason for it.

He ascribes the additional LOCs to “incidental complexity”, which he thinks is caused by the use of inappropriate tools. He concludes that LOCs aren’t helpful as a measure of productivity but as one of complexity (see Sec. 1.1).

1.2.9.2 Python

Paul Prescod, Python developer and book author, is quoted by Graham as stating that “Python’s goal is regularity and readability, not succinctness” [82]. Graham considers it a “damning thing to claim about a programming language.” However, many others think Python is concise, in practice, if not by design. For example, Nicholas Hahn, a software engineer, compared Python, Go, and Rust. He finds that “Python has the least code by far” but also that “it’s leaning heavily on features of the image library it’s using, but this is indicative of the general experience of using Python” [92]. For instance, in the Python version, the difference between two images requires a single method call, whereas it is done pixel by pixel in the other two languages. This observation doesn’t tell us much about the intrinsic qualities of each language while still being important in practice. Availability of libraries could depend on popularity, age — older languages accumulate more — and ease of development, which in turn depends on tools and resources available, as well as those intrinsic qualities. The “Computer Language Shootout” mentioned in the previous section ranks Python 19th out of 51 languages for conciseness [90].

Python is less verbose than C in many cases. For example, the README file for a project by the CMU CyLab contains equivalent examples in several languages, including Python. However, it omits the one in C because “the same program in C will take too much space and will not fit into the README format” [93]. One of the advantages of concise code is that it is easier to discuss and present. It will fit into a few slides or, in this case, a README file. Verbose code is something to skim in long, painful, lonely sessions and to hide.

Python practitioners aren’t short on enthusiasm for their language of choice and may occasionally overstate the power of certain features. For example, Mahmoud Hashemi, author of package Boltons, would have us believe that “decorators are among Python’s most elegant and succinct language features.” However, all they replace is one assignment: @decorator instead of f = decorator(f) [94]. Hashemi may be referring to the use of higher-order functions, popularized by decorators, rather than their syntax (see higher order functions in Sec. 1.2.4).
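
For reference, this is the equivalence in question (a sketch; trace is a made-up decorator):

def trace(f):
    # a made-up decorator that logs calls to the wrapped function
    def wrapper(*args, **kwargs):
        print(f"calling {f.__name__}")
        return f(*args, **kwargs)
    return wrapper

@trace
def greet():
    print("hello")

# the decorator syntax above is shorthand for the assignment greet = trace(greet)
greet()  # prints "calling greet", then "hello"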

1.2.9.3 R

RStudio co-founder Joe Cheng, also known for the R package Shiny, thinks R is more expressive than Python and other languages. Since he is the CTO of a company devoted to supporting this language, we should be aware of his biases. However, he is referring to specific features that are unique to R, among popular languages. In an interview, he said (best effort transcription from [95]):

[…] each one of these libraries has a bunch of features and a bunch of syntax that would not be possible certainly with Python, certainly with Java, certainly with C++ […] when you look at the interfaces of these packages users don’t necessarily know what magic we had to do to make this happen, they just know oh it’s really easy it’s really concise when I want to express my plots this way when I express my data manipulations this way […]

The “magic” he refers to includes a macro-like facility that optionally suspends the regular evaluation of expressions and replaces it with specialized variants. For instance, we can refer to columns in a tabular data structure, the data frame, in certain expressions without constantly repeating the name of the table, as in mutate(table, a = b + c), as opposed to the repetitive table$a = table$b + table$c, where a, b and c are names of columns in table. Users love these more concise forms, but having multiple evaluation rules makes programs harder to understand and hinders their parametrization and generalization. For instance, if we wanted to make the above expression general with respect to the names of the columns involved, we would need to use a different syntax (see package Dplyr in Sec. 1.2.10.1).

One task that R performs well is operating on vectors. It had better, as it doesn’t even provide scalar types, which are replaced by vectors of length one. While most languages offer extensions for vectors or other types of collections, in R, by design, all library functions need to accept vectors. R developers think in vectors, and the comparatively modest speed of iterative constructs in R is such that they have to use vector functions in day-to-day work. That often helps with conciseness. For instance, Rasmus Bååth takes a simple spelling corrector in 21 lines of Python and translates it to two lines of R without using any extensions [96]. They are two oversized lines, though, and, once automatically reformatted in the R IDE RStudio, his code swells to 12 lines: more than 2, but still a great deal shorter than the original. I took the liberty of modifying this code to eliminate one repetition and to use the pipe operator (%>%) to reorder the operations. The “classic” style of function composition in R is f(g(h(a1, a2), a3), a4), which many find counter-intuitive because functions execute in a temporal order opposite to that in which they appear, but also because arguments to the same function can end up far apart. An alternative is using a temporary variable:

x = h(a1, a2)
x = g(x, a3)
f(x, a4)

This approach adds some unnecessary steps and doesn’t convey the rigid organization of a pipeline. Using R’s macro-like facility, we can define a pipe operator similar to the ones in F# and the Unix shell and available in package Magrittr, such that the above expression is equivalent to h(a1, a2) %>% g(a3) %>% f(a4) [96]. In other words, the pipe operator turns the expression on its left side into the first argument to the function call on its right side, pushing the other arguments to the right. With these modifications, here is the concise spell corrector in R:42

readLines("http://www.norvig.com/big.txt") %>%
  paste(collapse = " ") %>%
  tolower %>%
  strsplit("[^a-z]+") %>%
  table %>%
  sort(decreasing = TRUE) %>%
  names -> sorted_words

In this step, we build a vector of words by:

  1. reading a large text corpus, line by line, from Norvig’s website;
  2. collapsing the lines into a single string;
  3. converting it to lower case;
  4. splitting it on sequences of non-letter characters;
  5. tabulating the frequency of each resulting word;
  6. sorting the table by decreasing frequency;
  7. keeping only the names, that is, the words themselves.

This sorting order will help return the most common among equally close matches for a word. Now onto the actual spell-correcting function:

correct <-
  function(word) {
    distances = adist(word, sorted_words)
    c(sorted_words[distances <= min(distances, 2)],
      word)[1] }

It works by computing the edit distance43 between a given word and our list of correct, frequency-sorted words. This code’s conciseness stems from using vector functions instead of loops and the vast number of functions considered “core” to R (about 700). Bååth observes that, in particular, the function adist, calculating the edit distance between words, performs much of the difficult work. He adds that R also provides a function aspell in the standard library, which would reduce his program to a single line. Unlike the features referred to by Chang earlier as “magic,” I don’t think this example highlights any conciseness advantage intrinsic to the language, save for the use of the pipe operator. Instead, it shows the benefits of a vectorized programming style, whereby we can process entire vectors in one step, and the importance of being familiar with available libraries. We could likely achieve a similar result in Python, but this programming style may be less natural to practitioners of that language. It is a fine line between what a language allows and what it encourages or enforces.
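
For comparison, the word-frequency step has a reasonably natural Python counterpart (a sketch of mine, leaning on the standard library rather than vector functions):

import re
from collections import Counter
from urllib.request import urlopen

# read the corpus, lower-case it, and split it on runs of non-letter characters
text = urlopen("http://www.norvig.com/big.txt").read().decode(errors="ignore").lower()
words = [w for w in re.split("[^a-z]+", text) if w]
sorted_words = [w for w, _ in Counter(words).most_common()]  # most frequent first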

1.2.9.4 JavaScript

While Python is the target of unwavering praise and R of a mix of praise and blame [97], JavaScript is subject to relentless condemnation (see Sec. 1.3.2) while, ironically, being arguably the most widely adopted language. It certainly is so when measured by the number of JavaScript programs simultaneously running since it is the programming language used for web development on the browser side. Because of its ubiquitousness and perceived shortcomings, developers have defined new languages that translate into JavaScript, correcting some of its defects but maintaining a considerable degree of compatibility. One of these, CoffeeScript, is touted for its conciseness. For example, software engineer Simon Grondin implemented “a rate limiter in 31 lines of CoffeeScript” and thinks this is “a demonstration of its exceptional expressiveness” [98]. However, when I revisited this implementation after first becoming aware of it [99], I found that it had grown to 8 times its original size [100] and that finding the core rate limiter logic among the cruft had become difficult. You may wonder whether the initial, concise implementation was more the result of a fresh take on the subject rather than the intrinsic qualities of the language. Eventually, this code seems to have succumbed to the drift toward complexity.

Not so the three-line implementation of partial application,44 also in CoffeeScript, by Smooth CoffeeScript’s author E. Hoigaard, which retained its original purity [101]. partial takes a function func of $n$ arguments and returns a function of $m \le n$ arguments by providing $n - m$ values in advance. In this specific implementation, this is accomplished by using the undefined constant as a marker for an argument that has no pre-specified value. For instance, a function rectangle(x, y, h, w), which draws rectangles centered in $(x, y)$ with height $h$ and width $w$, can be used to define a function that draws only unit squares as follows:

square = partial (rectangle, undefined, undefined, 1, 1)

Hoigaard starts from someone else’s 15-line implementation (see below), bringing it gradually down to its final svelte size, which I further trimmed down to two lines:45

partial = (func, a...) -> (b...) ->
  func (for arg in a then arg ?= b.shift())...

Even without knowing much about CoffeeScript, I could understand this implementation of partial after researching a couple of constructs. It says that partial is a function that accepts another function func plus a variable number of arguments in vector a and returns a function of a variable number of arguments (vector b). The body of this returned function invokes func with a list of arguments assembled in a list comprehension: each element of a where it is defined, otherwise the next value from b. Not only is it remarkably concise without being cryptic, but it also implements an important feature that other languages lack and can’t add with a simple extension. So does the 15-line version — here reduced to 14 — but its meaning must be extracted from a jungle of indexes and tests:46

partial15lines = () ->
  [func, args...] = arguments
  wrapper = () ->
    i = 0
    j = 0
    res_args = []
    while i < args.length
      if args[i] == undefined
        res_args.push arguments[j]
        j++
      else
        res_args.push args[i]
      i++
    return func.apply null, res_args

Converted to JavaScript, the first step to execute CoffeeScript code, the two-line version expands to 21 LOCs, but an automated conversion may be more verbose than a skilled programmer’s.
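
For comparison, a placeholder-based partial can also be sketched in a few lines of Python (my sketch; None plays the role of undefined, and the standard functools.partial would not cover this case, since it only pre-fills arguments from the left):

def partial(func, *preset):
    # replace each None placeholder with the next argument supplied later
    def wrapper(*later):
        supplied = iter(later)
        return func(*(next(supplied) if v is None else v for v in preset))
    return wrapper

rectangle = lambda x, y, h, w: ("rectangle", x, y, h, w)  # stand-in for a drawing routine
square = partial(rectangle, None, None, 1, 1)
print(square(5, 7))  # ('rectangle', 5, 7, 1, 1)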

In more recent versions, starting with ES6, JavaScript has evolved to look and work more like CoffeeScript. However, even the unimproved JavaScript, warts and all, seems to offer the potential for significant code reductions. Kiran Prasad, a software executive, recalls how LinkedIn switched from Ruby to server-side JavaScript (node.js) to meet their scalability goals for some mobile products [102]. In the process, their code base shrank from 60 KLOCs to 1-2 KLOCs, a reduction greater than 30-fold that wasn’t even a goal for this transition. He explains this remarkable achievement in three parts: the node.js version is “framework-free,” therefore free of framework-related cruft (I assume he means web framework, see Sec. 1.2.10.3); they switched to a more functional style from an object-oriented one, which had resulted in a lot of unneeded abstraction layers; and they moved rendering47 from server-side to client-side,48 which relocated, rather than eliminated, some of the code. With so many concurrent changes, it is impossible to tell which ones were instrumental to the code savings. For me, the most important lesson is that a 30-fold reduction in a codebase, which presumably freed 29 out of 30 engineers from maintenance work, was achieved as a side effect of the quest for scalability rather than being an explicit goal. Had LinkedIn selected a different approach, they might still be saddled with a 60-KLOC code base.

1.2.9.5 Erlang

Maybe not so, had they gone with Erlang. In a video presentation, its creator, Joe Armstrong, shows how a couple of pages of C translate to one line of Erlang (near 1:20 [103]). Unfortunately, this video is too fuzzy to read the C code. Later, a co-worker cites an internal study at telecommunication firm Ericsson that estimates the ratio between C and Erlang code for the same tasks as 7-fold. A factor to consider is that, just as node.js (see previous section) supports event-driven programming, Erlang includes specialized features to build reliable systems. In particular, it aims to solve the problem of creating them despite the reality of software errors by supporting a system’s ability to recover. Erlang achieves this goal by providing the developer with lightweight processes that don’t share resources and communicate via asynchronous message passing. They are organized in hierarchies, whereby those higher up supervise those below. Rather than defensively coding a single process to keep it going at all costs, developers focus on recovering when something goes wrong. The goal isn’t eliminating crashes but making them ordinary, recoverable events. It is easier to understand this concept in the context of the original target application: software for telephone switches. The goal, in that case, is to keep a large number of conversations going, a task that naturally lends itself to concurrency. Other applications written in Erlang are the message broker RabbitMQ and the messaging app WhatsApp. However, its noteworthy features aren’t limited to its concurrency and fault tolerance models. It is also a functional language with pattern matching and many other modern features. Armstrong wrote a simple side-by-side comparison of Java and Erlang programs computing the area of geometric shapes [104]. This is the Erlang version:

area({rectangle, Width, Ht}) -> Width * Ht;
area({square, X}) -> X * X;
area({circle, R}) -> 3.14159 * R * R.

And this is the Java version:

abstract class Shape {
    abstract double area();
}

class Circle extends Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    double area() { return Math.PI * radius*radius; }
}

class Rectangle extends Shape {
    final double ht;
    final double width;
    Rectangle(double width, double height) {
        this.ht = height;
      this.width = width;
    }
    double area() { return width * ht; }
}

class Square extends Shape {
    final double side;
    Square(double side) {
      this.side = side;
    }
    double area() { return side * side; }
}

Skipping empty or single-character lines, we have a 3:17 ratio in favor of the Erlang program. However, despite its conciseness, more than a million lines of Erlang were necessary for its first production deployment, the ATM switch AXD 301 [105].
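
As an aside, Python’s structural pattern matching (available since version 3.10) allows a similarly terse formulation; a rough sketch of mine, not part of Armstrong’s comparison:

import math

def area(shape):
    # dispatch on the shape tag and destructure its fields in one step
    match shape:
        case ("rectangle", width, height):
            return width * height
        case ("square", side):
            return side * side
        case ("circle", radius):
            return math.pi * radius * radius

print(area(("circle", 1)))  # 3.141592653589793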

1.2.9.6 Prolog

Armstrong supported conciseness in contexts other than his programming language. In another interview, this is what he said about Prolog [14]:

[Prolog is] not widely used. And it’s a great shame because programs are incredibly short.

While it is possible to write all types of applications in Prolog (Armstrong used it for the first version of the Erlang interpreter [106]), that is uncommon. Prolog is a logic-based language used primarily in AI, specifically the flavor known as symbolic AI. Programs consist of facts and rules that allow inferring more facts. A classic example is the implementation of a family tree. For instance, let’s consider the ancestry of Charles II of Spain going back to Philip I and Joanna of Castile in Fig. 4.


Figure 4: A portion of the ancestry of Charles II of Spain.49

We can implement this tree and some simple searches in a few lines of Prolog. First, we need to enter what would be considered data in other languages, that is, the parent-offspring relation by listing Prolog facts:

parent("Philip I of Castile", "Charles V Holy Roman Emperor").
parent("Joanna of Castile", "Charles V Holy Roman Emperor").
parent("Philip I of Castile", "Ferdinand I Holy Roman Emperor").
parent("Joanna of Castile", "Ferdinand I Holy Roman Emperor").
/* more parent-child pairs as shown in the graph */

Then we have a set of rules that allow us to derive new facts from existing ones:

offspring(X,Y):- parent(Y, X).
ancestor(X, Y):- parent(X, Y).
ancestor(X, Y):- parent(X, Z), ancestor(Z, Y).
descendant(X, Y):- ancestor(Y, X).

These four lines define three relations:

  1. offspring is the inverse of parent;
  2. ancestor contains pairs (X, Y) for which either X is a parent to Y or there exists a Z such that X is a parent to Z and Z is an ancestor to Y;
  3. descendant is the inverse of ancestor.

There is nothing wrong with having two rules for ancestor, as it is a way to express disjunction: either one, applied successfully, is sufficient to establish that a specific pair is an ancestor-descendant pair. With just a handful of LOCs, we can already ask questions like “Who were the ancestors of Anna of Austria?”:

?- findall(X, ancestor(X, "Anna of Austria"), XX).
XX = ["Ferdinand I Holy Roman Emperor", "Anna of Bohemia and Hungary", "Philip I of Castile", "Joanna of Castile"].

If we want to see how a person is an ancestor of another, and not just whether that person is, we need only two more rules:

path(X, Y, []):- parent(X, Y).
path(X, Y, [Z|P]):- parent(X, Z), path(Z, Y, P).

which mimic the definition of ancestor but add a list variable P to accumulate the ancestry path. [Z|P] is the list obtained by prepending Z to P. We can paraphrase the second rule as “There is a path consisting of Z followed by P between X and Y if there is a Z that has X as a parent and there is a path P between Z and Y”.

?- path("Philip I of Castile", "Charles II of Spain", XX).
XX = ["Charles V Holy Roman Emperor", "Philip II of Spain", "Philip III of Spain", "Maria Anna of Spain", "Mariana of Austria"] .

We can combine path with findall to ask the interpreter to provide not one but all the paths connecting two members of the dynasty (the output is truncated):

?- findall(XX, path("Philip I of Castile", "Charles II of Spain", XX), YY).
YY = [["Charles V Holy Roman Emperor", "Philip II of Spain", "Philip III of Spain", "Maria Anna of Spain", "Mariana of Austria"],
["Charles V Holy Roman Emperor", "Philip II of Spain", "Philip III of Spain", "Philip IV of Spain"], ["Charles V Holy Roman Emperor", "Maria of Spain", "Anna of Austria '49", "Philip III of Spain", "Maria Anna of Spain", "Mariana of Austria"],
["Charles V Holy Roman Emperor", "Maria of Spain", "Anna of Austria '49", "Philip III of Spain", "Philip IV of Spain"],
["Ferdinand I Holy Roman Emperor", "Maximilian II Holy Roman Emperor", "Anna of Austria '49", "Philip III of Spain"|...],
["Ferdinand I Holy Roman Emperor", "Maximilian II Holy Roman Emperor", "Anna of Austria '49"|...], ["Ferdinand I Holy Roman Emperor", "Anna of Austria"|...], ["Ferdinand I Holy Roman Emperor"|...], [...|...]|...].

In ordinary families, there should be a unique path linking an ancestor and a descendant, but not in European royal families of the 16th and 17th centuries, and certainly not in this one. Since political expedience dictated marriages, unions between cousins or an uncle and his niece were not unusual. The cumulative effect over several generations was to reduce the genetic diversity to unhealthy levels: Charles II of Spain was extraordinarily sickly and died at just 39; his sister only made it to 21.

Leaving royal inbreeding aside, we can add more features to this genealogy system: finding if two individuals are half-siblings, if they are related in any way, which common ancestors they share, or which one is the closest. Each feature requires one or two new rules. I can’t even think how long it would take me to implement this from first principles in a more traditional language. Behind the scenes, while most languages execute statements in the order they are written or as specified by control structures, Prolog applies a search and unification algorithm. The unification part solves the problem of matching parent(X, "Charles II of Spain") with parent("Philip IV of Spain", "Charles II of Spain") for X = "Philip IV of Spain". The search part explores all the connections between matching facts and rules. It may help to think of it as a database with a powerful query language. It looks like Prolog would free the developer from worrying about algorithms, that is, the specific steps needed to produce results. Unfortunately, it is more complicated. Sometimes the rule order has a drastic impact on the search space. In other cases, there are different sets of rules to accomplish the same task, but they can make the search process more or less efficient: for instance, see how different sorting algorithms can be implemented [107].

1.2.9.7 Kotlin

Kotlin is a language that runs on the JVM50 and is thus seen as an alternative to Java. It prominently advertises its conciseness [108]:

Concise
Reduce boilerplate code […] For example, one can now create a POJO [Plain Old Java Object] with getters, setters, equals(), hashCode(), toString() and copy() in a single line.

POJOs are just Java objects that don’t satisfy any additional specifications. It is customary to provide them with the methods listed in the quote, which used to be written by hand. I distinctly remember when, early in my career, I discovered I had to write POJOs for the simplest tasks, like accessing a database table. It was like running while submerged to the waist: a trivial curiosity required half an hour of mindless typing. I couldn’t believe it hadn’t been automated in some way. Senior colleagues reassured me that it was standard procedure and that I would improve with practice.

In Kotlin, data classes replace POJOs. An example would be:

data class Person(var name: String, var age: Int)

In Java, this would look like the following snippet, the longest included in this book:

public class Person {

  private String name;
  private int age;

  public Person() {
  }

  public Person(String name, int age) {
    this.name = name;
    this.age = age;
  }

  public String getName() {
    return name;
  }

  public void setName(String name) {
    this.name = name;
  }

  public int getAge() {
    return age;
  }

  public void setAge(int age) {
    this.age = age;
}

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder();
    sb.append(Person.class.getName())
      .append('@')
      .append(Integer.toHexString( System.identityHashCode(this)))
      .append('[');
    sb.append("name");
    sb.append('=');
    sb.append(((this.name == null)?"<null>":this.name));
    sb.append(',');
    sb.append("age");
    sb.append('=');
    sb.append(this.age);
    sb.append(',');
    if (sb.charAt((sb.length()- 1)) == ',') {
      sb.setCharAt((sb.length()- 1), ']');
    } else {
      sb.append(']');
    }
    return sb.toString();
  }

  @Override
  public int hashCode() {
    int result = 1;
    result = ((result* 31)
             +((this.name == null)? 0 :this.name.hashCode()));
    result = ((result* 31)+ this.age);
    return result;
  }

  @Override
  public boolean equals(Object other) {
  if (other == this) {
    return true;
  }
  if ((other instanceof Person) == false) {
    return false;
  }
  Person rhs = ((Person) other);
    return (((this.name == rhs.name)
           ||((this.name!= null)
             &&this.name.equals(rhs.name)))
          &&(this.age == rhs.age));
  }
}

Java developers aren’t fools and, in this day and age, use code generators to write this boilerplate51 for them. However, there is a problem with this approach. Generated code needs to be read, understood, and maintained. Granted, it doesn’t look all that difficult, but how do we know that no one changed some detail in the equals method after its generation? That is why code generation isn’t a suitable replacement for abstraction. Moreover, I believe POJOs tell us something about Java. What is wrong with a language that doesn’t allow abstracting this boilerplate away? For instance, one in which a developer can’t express with code the following algorithm: “To implement equality comparison for a class, for each property in that class, call the equals method on that property and return true if all of these calls return true?” It seems like one of the most straightforward and useful abstractions you could come up with. In Python, the specific issue of boilerplate methods can be solved with a library (for instance, Attrs; see Sec. 1.2.10.4), but, apparently, not in Java. Kotlin offers much more than data classes, and replacing POJOs was only one of the motivations behind the language, but one that will get the attention of POJO-aggrieved Java programmers.
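
For reference, this is what the Python route looks like (a sketch; I use the standard-library dataclasses module here, while the text mentions the Attrs package as another option):

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

# __init__, __repr__ and __eq__ are generated from the declaration above
print(Person("Ada", 36) == Person("Ada", 36))  # True
print(Person("Ada", 36))                       # Person(name='Ada', age=36)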

1.2.9.8 C++

R’s vector functions perform well because they are written in other, lower-level, but more efficient languages, typically C or C++. The most downloaded R package is indeed one that facilitates mixed R and C++ development, Rcpp. Its maintainers took advantage of new features in C++11, a relatively recent update to the C++ standard, to refactor or, should I say, cull the Rcpp code base. Software engineer Kevin Ushey reports the following [109]:

Eliminating the code bloat and replacing [it] with variadic templates52 was awesome. […] For some hard numbers, I count 31,428 lines of code in the inst/include directory of Rcpp11, while there are 92,726 total lines of code in inst/include for Rcpp.

This process resulted in a 3-fold reduction stemming from a single new language feature and without any impact on the Rcpp user, for whom it is entirely transparent. Templates enable the parametrization of variable types, but until C++11, the number of type parameters had to be constant. As a workaround, Rcpp developers, to provide the flexibility of functions with a variable number of arguments, painstakingly replicated and adapted their code for each number up to a somewhat arbitrary maximum, 20 (not unlike what my team-mate did with nesting levels in Sec. 1.2.4, but in that case a better solution was readily available). For example, searching for all the Vector create definitions in Rcpp, we find 20 instances [110]:

static Vector create(const T1& t1)
static Vector create(const T1& t1, const T2& t2)
static Vector create(const T1& t1, const T2& t2,
                     const T3& t3)
static Vector create(const T1& t1, const T2& t2,
                     const T3& t3, const T4& t4)
static Vector create(const T1& t1, const T2& t2,
                     const T3& t3, const T4& t4,
                     const T5& t5)
[14 entries omitted]
static Vector create(const T1& t1, const T2& t2,
                     const T3& t3, const T4& t4,
                     const T5& t5, const T6& t6,
                     const T7& t7, const T8& t8,
                     const T9& t9, const T10& t10,
                     const T11& t11, const T12& t12,
                     const T13& t13, const T14& t14,
                     const T15& t15, const T16& t16,
                     const T17& t17, const T18& t18,
                     const T19& t19, const T20& t20)

To add insult to injury, each definition needs a largely repetitive but slightly customized implementation — here omitted. The developers were aware of this massive amount of repetition. They prefaced this code with a comment describing an imaginary abstraction that deals with the 20 cases in a unified way, which was impossible before C++11.

In Rcpp11, instead, there is only one definition for all these cases, and a single implementation [111]:

static Vector create(Args&&... args)

The line count went from 1245 to 144. However, there is a price to pay for this code reduction: an ever more sprawling C++ standard for a language already quite complex in its first incarnation (see Sec. 1.3.2). Its evolution has been guided by an unwavering commitment to backward compatibility,53 which implies features can’t be removed, and an openness to new features. Its standard committee doesn’t seem particularly concerned about the language becoming harder to learn and read. They aren’t alone in taking this position: for instance, Python has been growing as well, adding list comprehensions, coroutines, and f-strings, among other features. Unlike C++, though, Python abandoned full backward compatibility in its 3.0 version. The two incompatible versions were officially supported in parallel for 12 years and still coexist in the wild at the time of this writing, which explains why language steering committees often accept backward compatibility as a constraint, no matter how burdensome.

1.2.9.9 Query Languages

SQL queries can quickly grow to be unmanageable. Luckily, there are libraries to side-step this problem, generating SQL from much more concise expressions. Dplyr (see Sec. 1.2.10.1) is one such library. For instance, here is a typical R expression using Dplyr:

mtcars %>%
  group_by(cyl) %>%
  filter(mpg == min(mpg) | mpg == max(mpg)) %>%
  select(mpg, model) %>%
  arrange(mpg)

First, remember the meaning of %>%, the pipe operator (see Sec. 1.2.9.3), which takes the expression on its left hand side and makes it the first argument for the function call on its right side. This code takes the data in table mtcars, a car-related dataset, groups it by the number of cylinders, filters the rows that match the minimum and maximum fuel mileage in each group, keeps only specific columns, and sorts the rows by increasing gas mileage. The value of this expression is the following table:

cyl   mpg model
  8  10.4 Cadillac Fleetwood
  8  10.4 Lincoln Continental
  6  17.8 Merc 280C
  8  19.2 Pontiac Firebird
  6  21.4 Hornet 4 Drive
  4  21.4 Volvo 142E
  4  33.9 Toyota Corolla

Dplyr can process datasets in memory or database tables, the latter by generating SQL queries. For example, the following is generated from the above expression (assuming that the target database is Postgres, as SQL is still subject to proprietary variations, despite the existence of a standard):

SELECT `cyl`, `mpg`, `model`
FROM (SELECT `cyl`, `mpg`, `model`
FROM (SELECT `cyl`, `mpg`, `model`,
  MIN(`mpg`) OVER (PARTITION BY `cyl`) AS `q01`,
  MAX(`mpg`) OVER (PARTITION BY `cyl`) AS `q02`
FROM `mtcars`) `q01`
WHERE (`mpg` = `q01` OR `mpg` = `q02`)) `q02`
ORDER BY `mpg`

It is possible that a skilled SQL developer could come up with a more concise query, but there is no denying that the R-Dplyr expression is clearer and more concise.

Impala, an SQL variant specialized for big data, adds support for complex types. One of the advantages of this extension is that “common querying patterns can be expressed concisely” [112]. This statement reprises the concept that this isn’t possible for all patterns (see Steele’s “Huffman encoding problem” in Sec. 1.2.9). With knowledge of which patterns are more valuable, it is possible to make those more concise, while allowing the expression of less common patterns in more complex ways rather than dropping them altogether.

1.2.10 Libraries

As we have seen in Sec. 1.2.9, libraries should not be seen as totally distinct from languages. On the contrary, they are language extensions, and the line between them is sometimes blurred. It is no surprise, then, that we can look at them from the point of view of their conciseness, meaning the conciseness they enable in the programs that use them — we will talk about frugal APIs later on (see Sec. 1.3.3). It is almost a given that using a library allows us to write less code than managing without it (but see Sec. 1.2.9.4 for an exception), possibly at the cost of a steep learning curve. However, not all libraries are created equal. We will see through several examples that some libraries specifically target conciseness. Users value this property, whereas they criticize or avoid libraries that force verbose code, particularly in the form of boilerplate.

1.2.10.1 Data Analysis and Processing

A remarkable example is the library Keras for deep learning (which is covered in Sec. 2.1.4). The brainchild of François Chollet, it entered a segment populated by many strong competitors. It doesn’t intend to replace them but only to provide a better API that makes developing deep learning models easier. On the implementation side, it relies on a user-selectable preexisting deep learning library: you can see it as a thin layer of peanut butter and jelly, with the underlying bread provided by others. Therefore, it isn’t any faster or more powerful than any of these libraries; it is only more convenient and makes developers more productive. These properties are based, at least partly, on the conciseness it enables. Version 0.2.0 opened its README with “Keras is a minimalist, highly modular neural network library” [113]. Among the design principles explained in the same document, we find: “Minimalism. Each module54 should be kept short and simple (<100 lines of code).” Strangely, this principle was dropped in later versions, but not the following one [114]:

No separate models [sic] configuration files in a declarative format. Models are described in Python code, which is compact […].

Keras avoids defining a new, separate description language, finding Python is perfectly adequate and highlighting the compactness of model descriptions, which are also modular and easy to combine. Its users agree: fast.ai co-founder Rachel Thomas calls one of the alternatives to Keras, Tensorflow, “verbose and confusing” and recommends switching [115]. Indeed Tensorflow developers gave it the strongest endorsement, by adopting it as their recommended API and incorporating it into their package distribution. A run-of-the-mill feed-forward neural network in Keras looks as straightforward as it should be:

model = Sequential()
model.add(Dense(10, input_dim=4, activation="relu"))
model.add(Dense(1))

There is a small terminology catch: what everyone else calls a feed-forward neural network here is a sequential model, which may suggest a model for sequences of variable length, such as time series. Its input has length four, as the input_dim parameter specifies. The two layers of this network are added one at a time, calling a function named with the desired connectivity pattern, Dense, and parametrized with their sizes and, optionally, activation functions. Everything else is set to reasonable defaults customizable with additional parameters. Let’s see what this looked like in Tensorflow before Keras was bundled with it:

inputs = placeholder(float32, shape=(None, 4))

# Hidden layer
hidden_size = 10
weights_hidden = Variable(random_normal([hidden_size, 4], stddev=0.01))
bias_hidden = Variable(constant(0.1, shape=(hidden_size, 1)))
hidden = relu(add(matmul(weights_hidden, transpose(inputs)), bias_hidden))

# Output layer
weights_output = Variable(random_normal([1, hidden_size], stddev=0.01), name='weights_output')
bias_output = Variable(random_normal([1, 1]))
output = add(matmul(weights_output, hidden), bias_output)

The general structure is the same: we start with an input layer, created with the opaquely named placeholder, then create a hidden layer, and finally an output layer, but each requires 3-4 lines. Crucial to creating each is the also opaquely named Variable, which creates a vector or matrix object that can be initialized and later optimized during training. You can see different initialization strategies being used. The layers’ definitions make explicit the underlying mathematical operations, such as add, matmul, and relu, and are combined by function composition instead of the add method. On the one hand, this approach enables the complex architectures used in deep learning, for which the Sequential API in Keras is inadequate. On the other, it exposes lots of details practitioners are often happy to leave at their defaults, resulting in three times as much code. Particularly burdensome are those details about layer dimensions that need to be compatible between layers and are all calculated automatically in Keras. Confusingly, the layer dimensions, for which there are no reasonable defaults, must be specified as parameters to the initialization strategy, for which, instead, there are defaults. It may seem a classic case of power vs. ease-of-use trade-off: after all, reducing a layer to its mathematical definition means we can swap one transformation for another and create all sorts of experimental neural modules. Moreover, composing those modules by name means great flexibility in defining complex architectures. However, Keras isn’t only for feed-forward neural networks: its Sequential API is just a teaser for newcomers or the result of recognizing this type of network is perfectly adequate in many applications. For more advanced ones, it offers a Functional API, more similar to the one native to Tensorflow, but with significant differences. The following is how the same network can be defined using this API:

inputs = keras.Input(shape=(4,))
dense = layers.Dense(10, activation="relu")
hidden = dense(inputs)
outputs = layers.Dense(1)(hidden)
model = keras.Model(inputs=inputs, outputs=outputs)

This approach adds the flexibility researchers need to define cutting edge architectures while not being substantially more verbose than its Sequential API. Like the original Tensorflow API, it uses function composition to assemble neural modules but avoids several of its pitfalls: this API is still centered around machine learning concepts of input and layer rather than the more opaque Variable and placeholder as in Tensorflow; the layer dimensions are specified directly, not through initialization; and there are reasonable defaults for initialization and numeric types. When necessary, we can always define a custom layer at the level of mathematical operations.
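
As a sketch of that escape hatch, here is a custom fully connected layer written against Keras’s documented Layer subclassing API (the class name and comments are mine; this is an illustration, not library code):

import tensorflow as tf
from tensorflow import keras

class MyDense(keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # explicit weight matrix and bias vector, at the level of mathematical operations
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

An instance such as MyDense(10) can then be used alongside the built-in layers in either the Sequential or the Functional API.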

However, as good as Keras is, some people think there are better options: it is a very active field at the time of this writing. If Thomas vouches for Keras, fast.ai’s other co-founder Jeremy Howard has already moved on. Reporting on a comparison between Keras and PyTorch, another library for the same class of applications, he writes [116]:

We built models that are faster, more accurate, and more complex than those using Keras, yet were written with much less code.

This code wasn’t immediately available. While I don’t have direct experience with PyTorch, a quick look at plain-vanilla examples showed an API at an abstraction level somewhere between Keras’s functional API and Tensorflow’s, with additional object-oriented boilerplate layered on top, and I wasn’t impressed. However, PyTorch was built for experimentation and not for plain-vanilla models. Therefore, its true worth may not be evident unless someone works with models with the complexity Howard is interested in. Whichever the current champion, it doesn’t look like anyone is finding value in verbose machine learning code, while models are getting more complex over time, putting additional emphasis on compact descriptions. Moreover, no API designer can escape the compromise between the complexity of an API, its power, and the complexity of user code. Nonetheless, some navigate it better than others.
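
For reference, the following is a sketch of the same small network in the plain-vanilla PyTorch style (an illustration based on such examples, not the code from Howard’s comparison); note the object-oriented scaffolding around otherwise familiar layer definitions:

import torch
from torch import nn

class FeedForward(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 10)   # input size 4, hidden size 10
        self.output = nn.Linear(10, 1)

    def forward(self, x):
        return self.output(torch.relu(self.hidden(x)))

model = FeedForward()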

Large deep-learning models are trained on GPUs. While most practitioners don’t have to code directly for the GPU, those writing advanced custom models often do, a task that requires highly specialized skills. Triton, a library from OpenAI, aims to change that. In their words [117]:

Triton makes it possible to reach peak hardware performance with relatively little effort; for example, it can be used to write FP16 matrix multiplication kernels that match the performance of cuBLAS — something that many GPU programmers can’t do — in under 25 lines of code. […] although a variety of systems have recently emerged to make this process easier, we have found them to be either too verbose, lack flexibility or generate code noticeably slower than our hand-tuned baselines.

As a taste of programming for the GPU, let’s consider the “Hello World”55 of this field, which is the sum of two vectors. Here is the Triton version (lightly edited from [117]):

BLOCK = 512
@jit
def add(x, y, z, n):
    pid = program_id(0)
    index = pid * BLOCK + arange(BLOCK)
    mask = index < n
    gpu_x = load(x + index, mask=mask)
    gpu_y = load(y + index, mask=mask)
    store(z + index, gpu_x + gpu_y, mask=mask)

GPUs operate with a high degree of parallelism, and individual threads are organized in blocks. Here, the program_id call provides an id for the current block as the function add runs in parallel on each block of size BLOCK. Within each block, there can be even more parallelism, but the library manages this transparently. The load and store calls are responsible for moving data to the GPU and back. mask prevents out-of-range accesses. Contrast this with the corresponding program written in C using a library called CUDA (lightly edited from [118]):

__global__ void add_kernel(int *a, int *b, int *c, int n) {
  int index = threadIdx.x + blockIdx.x * blockDim.x;
  if (index < n)
    c[index] = a[index] + b[index];
}

#define N (2048*2048)
#define THREADS_PER_BLOCK 512
#define BLOCKS ((N + THREADS_PER_BLOCK-1) / THREADS_PER_BLOCK)
void add(int *a, int *b, int *c) {
  int *gpu_a, *gpu_b, *gpu_c;
  int size = N * sizeof(int);

  // allocate device memory and copy the inputs to the GPU
  cudaMalloc((void **)&gpu_a, size);
  cudaMalloc((void **)&gpu_b, size);
  cudaMalloc((void **)&gpu_c, size);
  cudaMemcpy(gpu_a, a, size, cudaMemcpyHostToDevice);
  cudaMemcpy(gpu_b, b, size, cudaMemcpyHostToDevice);

  add_kernel<<<BLOCKS,THREADS_PER_BLOCK>>>(gpu_a, gpu_b, gpu_c, N);
  cudaMemcpy(c, gpu_c, size, cudaMemcpyDeviceToHost);
  cudaFree(gpu_a);
  cudaFree(gpu_b);
  cudaFree(gpu_c);
}

We need a pair of functions, the first of which is a so-called “kernel,” the part that runs on the GPU. It is marked by the keyword __global__, as the documentation explains:

__global__ is a CUDA C/C++ keyword meaning [function] will execute on the device [and] will be called from the host.

Here “device” means GPU. If you find __global__ a confusing choice to mark these functions, you aren’t alone since “global” is already a charged word in programming, mostly having to do with the scope of variables. Calculating which indexes to operate upon follows a logic close to the Triton example. However, in this case, instead of an index range, we have a purely scalar operation. The second function runs sequentially on the CPU and takes care of memory allocation, deallocation, and moving data in and out of the GPU. It invokes the kernel with a unique triple chevron syntax that provides information about the desired level of parallelism. While already demonstrating the conciseness of Triton, this basic example doesn’t highlight its ability to perform automatic optimizations that are far from trivial and need to be implemented manually in C and CUDA. An example where this ability is on display is the implementation of the function softmax, available in Python with Triton [117] and C with CUDA [119]. However, a complete understanding of these examples requires a more in-depth treatment of GPU programming than I can provide. Libraries such as PyTorch [120] and Numba [121] enable equally or even more concise GPU programming. Unfortunately, they also impose a significant performance penalty over Triton or hand-optimized C CUDA kernels, in a highly performance-sensitive application domain.
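
As a sketch of that route, here is the same vector addition written with Numba’s CUDA support (an illustration of its documented @cuda.jit interface, not code from the cited sources):

import numpy as np
from numba import cuda

@cuda.jit
def add(x, y, z):
    i = cuda.grid(1)      # absolute index of this thread
    if i < z.size:        # guard against out-of-range threads
        z[i] = x[i] + y[i]

n = 2048 * 2048
x = np.ones(n, dtype=np.float32)
y = np.ones(n, dtype=np.float32)
z = np.zeros_like(x)
add[(n + 511) // 512, 512](x, y, z)   # grid size, block size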

The quest for conciseness isn’t limited to deep learning. For example, the authors of SparkSQL, a module that extends the big data processing system Spark with relational capabilities,56 are keen on touting its conciseness, both of its internals and of its sample programs. They do so at least seven times in the paper where it was first described [122]. SparkSQL attempts to combine relational processing with semi-structured, ETL-type57 processing. A basic example from the original article is the following:

employees
  .join(dept, employees("deptId") === dept("id"))
  .where(employees("gender") === "female")
  .groupBy(dept("id"), dept("name"))
  .agg(count("name"))

In this expression, employees and dept are two relational tables. The authors remark that this being just a library within a fully featured programming language (either Scala or Python) means that we can use control structures, variables, and all the features of a general-purpose language together with the powerful declarative style of SQL (see also Sec. 1.2.10.1). Programming directly in SQL would not afford these possibilities. Since evaluation in SparkSQL is delayed as much as feasible, the developer can build abstractions for common subqueries without a performance penalty, because only the final, fully assembled query will be submitted to Spark and therefore optimized as a unit. As a SparkSQL user put it, it is “concise and declarative like SQL, except I can name intermediate results,” and naming is essential to reuse.
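
A sketch of what naming an intermediate result looks like in the Python incarnation of the same API (assuming employees and dept have already been loaded as DataFrames in a SparkSession): because evaluation is lazy, the named subquery costs nothing until an action runs.

from pyspark.sql import functions as F

female_employees = employees.where(employees["gender"] == "female")  # named, reusable subquery

result = (female_employees
          .join(dept, female_employees["deptId"] == dept["id"])
          .groupBy(dept["id"], dept["name"])
          .agg(F.count("name")))

result.show()  # only now is the fully assembled query optimized and executed as a unit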

The authors of Scalding, also a Scala library, picked this byline for their twitter account [123]:

Scalding is a scala DSL58 for Cascading, running on Hadoop. It’s concise and functional, and it’s just good, clean fun.

We don’t need to take a slogan at face value. On the other hand, it is what the Scalding team wanted to advertise, as opposed to, say, that it is scalable. Since it runs on top of Hadoop,59 it inherits its scalability properties. However, since Hadoop is written in Java, that was the only language in which to develop programs running on it, and its Java API was phenomenally verbose. For example, I once counted approximately 250 lines to perform a simple relational join (that was using an early version, and I hope more concise alternatives have been developed since). Therefore, it is easy to understand why Scalding developers would like us to focus on its conciseness, which we can appreciate in the following example of a join, despite the verbiage devoted to static typing [124]:

case class Book(title: String, author: String)
case class ExtendedBook(title: String, author: String, copies: Long)

val library1: TypedPipe[Book]
val library2: TypedPipe[ExtendedBook]
// somehow load them with data
val group1: CoGroupable[String, Book] = library1.groupBy { _.title }
val group2: CoGroupable[String, ExtendedBook] = library2.groupBy { _.title }
val theJoin: CoGrouped[String, (Book, ExtendedBook)] = group1.join(group2)

Many data manipulation libraries have the data frame as their central data structure. It can be described as tabular or relational data or, in the jargon of statistics, a two-dimensional collection with one row per observation and one column per variable. There is something about this data structure that makes it so valuable. For example, Spark co-author and Databricks Chief Technologist Matei Zaharia says that “[data frames] give you a very concise way of writing expressions, doing operations” (my transcription from [125]).

Data frames are built into R as the data structure to represent empirical observations, but how R developers work with them has changed over time. An R library, Dplyr, provides an alternative to the built-in capabilities and won out with users, who praise its conciseness — its speed also helps. Take for instance this tweet thread between Vince Buffalo and J. J. Emerson, two evolutionary geneticists [126]:

I ♥ how R makes much of data analysis trivial. How good are genotype imputations by freq class? table(real_geno, imputed_geno, cut(freq, 5))
Some of my old R code (100’s [sic] of lines) could now be reduced to a 20-liner with plyr, ggplot, etc.
Haha, yea mine too. The Hadley-sphere is a black hole for wasted/unnecessary lines of code.

The Hadley-sphere mentioned here is a set of packages, including Dplyr, by the same developer, Hadley Wickham. While not all equally popular, some — Dplyr and Ggplot2 above all — have largely replaced corresponding built-ins or core libraries bundled with the language for most users, a rare feat among libraries, as the advantages of incumbency are as real as in politics (see also Requests in Sec. 1.2.10.3).

Tony Fischetti puts it in no uncertain terms in his blog post “How dplyr replaced my most common R idioms” [127]:

Perhaps more importantly, the dplyr code uses many fewer lines and assignments and is more terse and probably more readable with its neat-o “%.%” operator.

His blog post clarifies that these conciseness advantages are more important than a 2x speed-up for his use cases. The operator mentioned in the quote is a version of the pipe operator (see Sec. 1.2.9.3).

A problem with Dplyr is that its author, to facilitate interactive use, made life harder for people who want to create programs with it [128]. R is a programming language, but, even more importantly, a console for statistical analysis by non-programmers. This dual use is uncommon among programming languages — other examples are MATLAB and SAS. Nonetheless, it is better to support interactive and programming uses with the same constructs and libraries to ease the transition from one to the other. To achieve that, Wickham designed two very similar sub-APIs in the same package, one for each use case. I was involved in a discussion about this as the developer of a related package, and I resisted that option calling it “a doubling of the API footprint” [129]. However, my solution required slightly more boilerplate in programming use cases. More recent versions of Dplyr have gone down a similar path, deprecating one of the sub-APIs (for frugal APIs, see Sec. 1.3.3).

We have already touched upon some of Dplyr’s features in Sec. 1.2.9.3, as they are the best known application of R’s unique macro-like facility. When a data frame table includes columns a and b, we can easily create a new data frame with an additional column c with table = mutate(table, c = a + b), as opposed to the standard R table$c = table$a + table$b. However, what if we wanted to parametrize the columns involved in this operation? Let’s assume the names of the columns are stored, as strings, in three variables a_name, b_name, and c_name. In this case, neither of the above expressions will work. Standard R offers a different operator, [[]] for this case: table[[c_name]] = table[[a_name]] + table[[b_name]]. Dplyr provides a now deprecated sub API of functions, whose names are suffixed with a cryptic underscore character, to support this case: table = mutate_(table, c_name = paste(a_name, "+", b_name)), where paste concatenates its argument to yield the string "a + b". While this form avoids the endless repetition of the data frame’s name, it requires forming R expressions by string operations, an error-prone and hard-to-read approach. The currently recommended solution is to refer to the data frame not by its name but by a conventional name that can occur inside Dplyr expressions only as in table = mutate(table, c_name = .data[[a_name]] + .data[[b_name]]), which is the longest of all the different versions examined here, and restores the repetition of names necessary in standard R. As Steele said about languages, “there are only so many concise notations to go around,” and the same seems to hold for libraries (see Sec. 1.2.9). This is an instructive example of the subtle trade-offs between complexity in a library and in its users’ code and optimizing for different use cases.

Like Keras (see Sec. 1.2.10.1), Dplyr delegates computation to one of several backends, including relational databases. Its developer chose to add advanced SQL features known as window functions, but with limited support [130]:

[…] you can’t access rarely used window function capabilities (unless you write raw SQL), but in return, frequent operations are much more succinct.

Conciseness is difficult; therefore, as we have seen before (see Impala in Sec. 1.2.9.9), an option is making popular features and idioms more concise while allowing a more verbose expression for the less common ones. Here the cost goes further, though, since we have to “punch through the abstraction,”60 reaching for the separate, lower-level language of the backend, SQL, and losing the portability61 of “pure” Dplyr expressions.

Unlike R, Python doesn’t come with data frames built-in, but the library Pandas provides equivalent capabilities. However, it is the opinion of some practitioners that it is less elegant than its R counterpart, Dplyr. The project Pandas-ply aims to provide a more Dplyr-like API for Python. Its authors write [131]:

In our opinion, this pandas-ply code is cleaner, more expressive, more readable, more concise, and less error-prone than the original pandas code.

It is still in an experimental stage, and it is far from having the impact on the Python user base that Dplyr had on R users. You can judge whether it should by looking at the following examples, from the Pandas-ply introductory material, the first of which is written using Pandas:

grouped_flights = flights.groupby(['year', 'month', 'day'])
output = pd.DataFrame()
output['arr'] = grouped_flights.arr_delay.mean()
output['dep'] = grouped_flights.dep_delay.mean()
filtered_output = output[(output.arr > 30) & (output.dep > 30)]

Pandas-ply lets you instead write:

(flights
  .groupby(['year', 'month', 'day'])
  .ply_select(
    arr = X.arr_delay.mean(),
    dep = X.dep_delay.mean())
  .ply_where(X.arr > 30, X.dep > 30))

While the line count is similar, the Pandas example is more “busy,” with temporary results stored in intermediate variables and more repeated variable names. The Pandas-ply version uses a chained method call syntax that avoids intermediate variables and uses a special object X to simulate some of the Dplyr syntax based on R’s macro-like facility, which has no equivalent in Python.62 However, even more telling is the comparison with their R closest counterpart, Dplyr. Recalling the example (see Sec. 1.2.9.9) where we sought to compute, for each group of data, the entries matching the maximum and minimum for a variable, it is instructive to try to reproduce that with Pandas and Pandas-ply. Simply applying a filter operation to grouped data doesn’t work for either package because grouped data frames don’t have the same methods as their ungrouped counterparts. In Pandas, we can use this workaround:

(mtcars
  .groupby("cyl")
  .apply(lambda x: x[(x["mpg"] == x["mpg"].max()) | (x["mpg"] == x["mpg"].min())])
  [["mpg", "model"]]
  .sort_values("mpg"))

The same workaround also works for Pandas-ply:

(mtcars
  .groupby("cyl")
  .apply(lambda x: x.ply_where((X.mpg == X.mpg.max()) | (X.mpg == X.mpg.min())))
  .ply_select("mpg", "model")
  .sort_values("mpg"))

In Pandas or Pandas-ply, a grouped data set is more like a list of groups, and we can use an approach from functional programming to apply a function to every element in this list, each a data frame. In this function, created with a lambda expression, the desired selection is performed using standard Pandas data frame operators. It is hard to ignore the numerous references to X, similar to the .data identifier in Dplyr, but for which Dplyr offers more concise, albeit less general, alternatives. On the result of this apply call, we can select variables of interest and perform a sort, alternating operators like [[]] and methods like sort_values, but in closer analogy with Dplyr. Unlike Dplyr, Pandas doesn’t provide specialized idioms for interactive sessions, as this use case is less dominant and more recent in Python. Independent of the use case, treating grouped data frames differently from ungrouped ones makes the API harder to use for no increase in power. It represents a loss of orthogonality, the ability for a set of functions or methods to operate on different types of objects with no exclusions. Here we have an example of a lost opportunity to use the more concise form:

table.group_by(...).filter(...)

instead of the more verbose:

table.group_by(...).apply(lambda x: x.filter(...))

while the former, an arguably intuitive line of code, is simply disallowed and returns an error instead of something useful.

One of the advantages of being an incumbent, like Pandas, is third-party support. For example, using the package Modin together with Pandas, we can partially parallelize a program [132]. Just by swapping the Pandas import for Modin’s drop-in replacement, otherwise unmodified Pandas code will run in parallel. It is the most concise parallelization I have ever seen (compare it with MPI in Sec. 1.5).
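
A sketch of the swap (the CSV file name is only a placeholder):

# import pandas as pd          # the original import...
import modin.pandas as pd      # ...replaced by Modin's drop-in equivalent

df = pd.read_csv("flights.csv")      # now reads and computes in parallel
print(df.groupby("year").size())     # the Pandas API is otherwise unchanged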

1.2.10.2 Statistical Graphics

In statistical graphics, the quest for conciseness starts with its design, independent of any implementation. In The Visual Display of Quantitative Information, author Edward Tufte defined the data-ink ratio, the ratio of ink spent to display data vs. the total amount used in a chart, and recommended maximizing it [133]. It is a somewhat counterintuitive definition, as it may seem to encourage the use of heavy marks, like thick lines and large dots; this outcome, though, is prevented by another one of Tufte’s principles, that of data density — the number of data points per unit of area. The result is graphics that display lots of data with just enough ink, minimize and de-emphasize everything that isn’t data, like grids and scales, and eschew ornamentation altogether.

SimplexCT, a data analysis consultancy, provides a step-by-step example of transforming a chart that Tufte wouldn’t hesitate to call chartjunk into one following his principles [134]. Its starting point is full of useless and distracting graphical elements, as in Fig. 5.

Figure 5: An example of chartjunk.

The final chart uses graphical elements sparingly, emphasizes the data, and presents it in two different ways while adding insightful comments (see Fig. 6).

Figure 6: The same data displayed according to Tuftian principles.

An intermediate step between a graphic conception and its display is its high-level formal description. The best-known language for this task is the grammar of graphics. Invented by Leland Wilkinson, it identifies the elements and combinators needed to define a large variety of statistical graphics [135]. It avoids the verbose generality of using raster or vector graphics — describing a graphics point by point or line by line — but also the limitations of predefined graphics types: the scatterplot, the boxplot, and so forth. It has inspired several concrete implementations. A popular one is Ggplot2 (see below), and its author defines the grammar this way [136]:

A grammar of graphics is a tool that enables us to concisely describe the components of a graphic.

The Vega project develops implementations of the grammar in JSON, including extensions for interactive graphics. Vega-Lite is the most compact of the different implementations, because “Verbose specification impedes rapid authoring and hinders systematic exploration of alternative designs” [137]. Here is a simple example:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {"url": "data/penguins.json"},
  "mark": "point",
  "encoding": {
    "x": {
      "field": "Flipper Length (mm)",
      "type": "quantitative",
      "scale": {"zero": false}
    },
    "y": {
      "field": "Body Mass (g)",
      "type": "quantitative",
      "scale": {"zero": false}
    },
    "color": {"field": "Species"},
    "shape": {"field": "Species"}
  }
}

The first line, $schema, points to the exact specification of the language this chart is written in, in this case, Vega-Lite version 5. Then there is the data source and the type of mark or sign to associate with the data. The encoding entry defines the association between visual properties of the mark, like position, color, and shape, and variables in the data set, together with the type of each variable, and details of the scale to use — in this case, it doesn’t need to include zero if the data doesn’t. Many other properties can be specified, but defaults are well thought-out. You can write these specifications directly or use libraries in different languages to generate them. The resulting graphic can be seen in Fig. 7.

Figure 7: A scatterplot generated with Vega.
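
One such library is Altair, which generates Vega-Lite specifications from Python; the following sketch should produce an equivalent chart, assuming the same data/penguins.json file is available locally:

import altair as alt
import pandas as pd

penguins = pd.read_json("data/penguins.json")  # the file referenced in the spec above

chart = alt.Chart(penguins).mark_point().encode(
    x=alt.X(field="Flipper Length (mm)", type="quantitative",
            scale=alt.Scale(zero=False)),
    y=alt.Y(field="Body Mass (g)", type="quantitative",
            scale=alt.Scale(zero=False)),
    color=alt.Color(field="Species", type="nominal"),
    shape=alt.Shape(field="Species", type="nominal"),
)
chart.save("penguins.html")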

We can violate Tuftian principles using Vega-Lite, but its defaults prod the user in the right direction. There are some compromises to accept, though. For instance, it was impossible to use polar coordinates or more than 5000 data points in the version I used. I think these are more temporary omissions than fundamental limitations. The main focus for Vega-Lite isn’t reproducing every type of statistical graphics ever conceived but exploring its cutting edge, interactivity.

Much more mature is Bokeh, a statistical graphics package for Python. Its documentation reads [138]:

Its goal is to provide elegant, concise construction of novel graphics […]

It may be its goal, but it doesn’t necessarily excel at it. Newer competitors surpass it along the conciseness dimension. For instance, comparing Bokeh and Dash, Flavian Hautbois reports that the former took 139 LOCs to implement a simple dashboard, whereas the latter only took 105 [139]. He also notes that the Bokeh implementation took much longer and felt harder, beyond what the line count suggests. The above Vega-Lite example can be reproduced with the following Python Bokeh code (simplified from [140]):

# data is in data frame "data"
SPECIES = sorted(data.species.unique())
MARKERS = ['hex', 'circle_x', 'triangle']

p = figure()
p.xaxis.axis_label = 'Flipper Length (mm)'
p.yaxis.axis_label = 'Body Mass (g)'

p.scatter("flipper_length_mm", "body_mass_g", source=data,
          legend_group="species", size=12,
          marker=factor_mark('species', MARKERS, SPECIES),
          color=factor_cmap('species', 'Category10_3', SPECIES))

p.legend.location = "top_left"
p.legend.title = "Species"

show(p)

There are many differences, but the most important is that p.scatter call: all this code is needed to create a pre-defined type of plot, the scatterplot, not to build one from its constituent elements, as in the systems that espouse the Grammar of Graphics. Indeed, Bokeh is, by its developers’ own admission, only “inspired” by it but does not implement it. Another noteworthy difference is how the scales (mappings between data values and graphical properties) must be created from scratch for the marker (shape in Vega-Lite) and the color attributes. Also, there is no default for axis labels and legend title, and the default legend location obstructs the data. As a user of Vega-Lite and Ggplot2 (see below), I seldom had to deal with these details. Picking reasonable defaults that will work for most data and plot types is far from trivial and essential to these libraries’ user experience.

For R users, when it comes to graphics, there is a dominant package: Ggplot2 [141]. An implementation of Wilkinson’s Grammar of Graphics, as the initialism in its name suggests, it followed the same path as Dplyr or Keras (see Sec. 1.2.10.1). It didn’t bring radically new capabilities to its users, and pre-existing alternatives are built right into the standard library. Nonetheless, it was able to displace them by sheer force of convenience, partly based on its conciseness. For instance, Bradley Eck, a research scientist, tweeted: “Still blows me away that I can make this plot with 3 lines (read, melt, qplot63) thanks to @hadleywickham and #rstats” [142]. It may be instructive to look at how to produce the same scatterplot in Fig. 7 with Ggplot2:

theme_set(theme_minimal())
ggplot(data = penguins,
       aes(x = flipper_length_mm,
           y = body_mass_g)) +
  geom_point(aes(color = species, shape = species), size = 3)

Ggplot2 allows us to define a ubiquitous graphics type, the scatterplot, with a minimum of code and a complete focus on the essential choices to be made when designing statistical graphics.64 An aspect where Ggplot2 is starting to show its age is its lack of support for interactivity. Wickham has endeavored to build a replacement, Ggvis, to address that. Explaining its relation to some technologies Ggvis is built upon, he writes [143]:

ggvis makes many more assumptions about what you’re trying to do: this allows it to be much more concise, at some cost of generality.

It is clear that, in his view, conciseness is worth giving up some power. However, unlike other examples (see Impala in Sec. 1.2.9.9) where full generality could be achieved at the cost of more code, with Ggvis, as with Dplyr (see Sec. 1.2.10.1), we have to access an underlying, lower-level technology — a more challenging step.

If we are willing to go lower level to produce statistical graphics, an option is D3.js [144]. Billed as a “library for manipulating documents based on data,” it is more general than libraries like Ggplot2, focused on statistical graphics and quick data exploration. On the other hand, it undoubtedly requires more work for the same output. Niko McCarty, a science journalist, endeavored to reproduce our scatterplot in D3.js, but his code is too long to show here — over 100 lines, despite help from additional libraries [145]. That doesn’t mean D3’s author, Mike Bostock, isn’t thinking about minimizing the effort, including a quite ambitious idea [146]:

I’m exploring whether we can have better passive reusability in d3.express. Where by leveraging the structure of reactive documents, we can more easily repurpose code, even if that code wasn’t carefully designed to be reusable.

As if designing for reusability wasn’t challenging enough, Bostock is aiming for unplanned reusability, the philosopher’s stone of software development. However, limited to the specific context of D3, he may be able to pull it off.

1.2.10.3 Web Development

There is nothing special about libraries for data analysis or statistical graphics regarding conciseness, and the selection up to this point reflects my professional experience and interests. Similar observations can be made about other types of libraries. For instance, many developers deal with web frameworks65 and bemoan their complexity. They may refer to two unpleasant characteristics. First, some frameworks force onto developers a specific DBMS,66 an object-relational mapping67 library, an architecture such as MVC,68 and more. This lack of modularity is sometimes positively spun as coming “batteries included,” meaning that any integration challenges are taken care of. Second, some frameworks are more verbose than others: they force developers to specify details they are not interested in or repeat boilerplate code in every application.

In a comparison of three popular frameworks, Django, Flask, and Pyramid, Ryan Scott Brown, a backend developer, crowns Flask the winner [147]:

Flask’s Hello World app69 has to be the simplest out there, clocking in at a puny 7 lines of code in a single Python file.

An advantage of a seven-line application is that we can show it here in its entirety [148]:70

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "Hello, World!"

if __name__ == "__main__":
    app.run()

The main elements here are creating an application object; creating a function that returns some HTML with a Python decorator that associates it with a path, which will go to form a URL; and finally, some boilerplate code to start the app. It looks like the bare minimum, but it is possible to perform slightly better (see Hug and FastAPI below). In Pyramid, it gets a bit more complicated [149]:

def hello_world(request):
    return Response('Hello World!')

if __name__ == '__main__':
    with Configurator() as config:
        config.add_route('hello', '/')
        config.add_view(hello_world, route_name='hello')
        app = config.make_wsgi_app()

    server = make_server('0.0.0.0', 6543, app)
    server.serve_forever()

We start with a simple function that returns some content. Then, that function and a path must be added to a Configurator object, in which they are associated through a name, 'hello'. What a roundabout way of creating a mapping from URLs to content! Finally, we need to create an app from this configuration, pass that to a web server, and get it started. While this is a typical architecture, there is no need to be reminded of it in every app, like there is no need to know about inodes when opening a file in Unix. These are details that Flask, thankfully, abstracts away. Smaller code size and tighter abstraction go hand in hand. In addition, Pyramid provides a utility to bootstrap a more complex project consisting of several directories and files, but we can get away with a single file in the minimal case. Django also includes such a utility, which is recommended in most tutorials. Eventually, I figured out how to run a single-file Django app (based on [150]):

settings.configure(
    DEBUG=True,
    ROOT_URLCONF=__name__,
)

def hello_world(request):
    return HttpResponse("Hello World")

urlpatterns = (re_path(r"^$", hello_world),)

if __name__ == "__main__":
    execute_from_command_line(sys.argv)

After some configuration (the DEBUG mode allows skipping complex security-related code; too bad safety isn’t the default), we can see the familiar content-generating function, an associated URL definition, and a way to start the application that is close to Flask’s. One way Django distinguishes itself from the other two is that it is more monolithic, meaning that it comes with an included database, ORM,71 template language,72 and more to provide a complete environment for web development. The other two let the developer choose these components among many available options, a more modular approach. A final word of caution is that a minimal application should not be considered a complete quality test of a library by any stretch of the imagination: here, we use it to highlight how differences in conciseness can be apparent even at this beginner level.

Among web frameworks, the conciseness champion may be a slightly less well-known entry, Hug, targeting mostly web APIs.73 Its author, Timothy Crosley, has done the comparison work for us and concluded [151]:

For almost every common Web API task the code written to accomplish it in hug is a small fraction of what is required in other Frameworks.

Its approach is based on designing APIs in familiar Python, then turning them into web APIs by simply decorating, in the Python sense, each function we want to be invoked in response to an HTTP request. All the gory details, from (de)serializing74 arguments and return values to route definition, are taken care of with reasonable defaults. We can also get a standard command line interface out of the same code without additional effort.
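
A sketch of that style, along the lines of the examples in Hug’s documentation (the route and function here are illustrative):

import hug

@hug.cli()              # the same function doubles as a command line interface...
@hug.get("/multiply")   # ...and as a web API route
def multiply(x: hug.types.number, y: hug.types.number):
    """Multiply two numbers."""
    return x * y

The decorators and the type annotations take care of routing, argument parsing and validation, and serialization of the result.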

Crosley omitted the newer FastAPI from his comparison. Later on, this is what he had to say about it [152]:

Honestly, what you’ve built looks super solid and polished. In many ways, it is what I wanted hug to be - it is really inspiring to see someone build that.

In an exchange of pleasantries, FastAPI developer Sebastián Ramírez acknowledges Hug was an inspiration to his work [153]. FastAPI is explicit about its conciseness. Among its key features, it highlights the following:

Short: Minimize code duplication. Multiple features from each parameter declaration. Fewer bugs.

FastAPI uses an approach similar to Hug’s, with decorators converting regular Python functions into web API routes. It makes better use of Python 3 type annotations and takes better advantage of open standards. While FastAPI isn’t officially intended to build web applications, there is nothing to stop us from recreating our “Hello World!” app:

app = FastAPI()

@app.get("/")
def root():
    return Response("<p>Hello World</p>")

It isn’t very different from the Flask version, save for the boilerplate necessary to start an app. Here, it is unnecessary because an application is started with a command such as Uvicorn with a file containing this code as an argument. FastAPI particularly shines in the processing of arguments and complex data. For example, we can modify the above example to accept two parameters, a string and an int, and return a JSON dictionary containing them:

app = FastAPI()

@app.get("/{fruit}/{how_many}")
def root(fruit: str, how_many: int):
    return {fruit: how_many}

For instance, a valid URL could be “https://example.com/orange/3” returning the dictionary {"orange":3} as a response, with data validation, conversion and reasonable error messages built-in.

The freedom granted by microframeworks to pick your preferred template language allows choosing a new entrant that, strictly speaking, isn’t one [154]:

Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It enables writing HTML pages in pure Python very concisely, which eliminates the need to learn another template language, and to take advantage of Python’s more powerful features.

We usually talk about conciseness in terms of LOC reduction, and defining new domain-specific languages is a way to achieve that. However, templates don’t seem to offer decisive advantages over a library. For example, Dominate and Shiny [155], a dashboard-focused web framework for R, both eschew templates altogether, replacing them with a regular library that generates HTML with more flexibility and conciseness.

Here are some simple examples to get a feel for how to create the same page with these different libraries or template languages. A well-known template language, Jinja2, like all similar technologies, embeds bits of code in a custom programming language inside a string containing HTML [156]:

<!DOCTYPE html>
 <html>
   <head>
     <title>Hello World</title>
   </head>
   <body>
     <div id="header">
       <ol>
       {% for page in ["home", "about", "contact"] %}
         <li>
           <a href="/{{page}}.html">{{page|title}}</a>
         </li>
       {% endfor %}
       </ol>
     </div>
     <div class="body">
       <p>Lorem ipsum..</p>
     </div>
   </body>
 </html>

The only bit of programming here is in that for loop in the middle, which is used to build a three-item list of links. Statements are surrounded by {% %}, expressions by {{ }}, and functions are invoked using a pipe-like syntax: page|title. It is a quirky language and one more subject that developers need to learn, but it may represent the path of least resistance for HTML content creators who aren’t programmers. Dominate can generate the same HTML with this code (simplified from the Dominate documentation [154]):

doc = dominate.document(title="Hello World")

with doc:
    with div(id="header").add(ol()):
        for page in ["home", "about", "contact"]:
            li(a(page.title(), href="/%s.html" % page))

    with div():
        attr(cls="body")
        p("Lorem ipsum..")

print(doc)

Not only is it more concise, but it also uses the familiar Python language. However, it may go one step too far by offering three separate ways to create complex HTML structures: the with statement, the add method, and direct support for iterables75 as arguments. It may support different programming styles, but it makes the code less readable. Shiny’s API is exclusively based on functions. Unfortunately, since it is intended to create dashboards, it didn’t allow me to reproduce the same HTML exactly. However, the following two snippets generate the same head and body as the previous code blocks:

tags$title("Hello World")
tags$div(id = "header",
         tags$ol(lapply(list("home", "about", "contact"), function(x)
           tags$li(tags$a(href = paste(x, "html", sep = "."), toTitleCase(x))))),
         tags$div(class = "body",
                  tags$p("Lorem ipsum..")))

Nesting in HTML becomes the nesting of calls in shiny. Each tags function can take one or more arguments or a list and operate on it appropriately. If only we didn’t have to write tags every time! The likely reason for this verbiage is that importing only selected symbols from a library isn’t well supported in R, and the potential for name conflicts would be too high without the tags object.

The now-abandoned Minima took the programming-language-first approach of Dominate and Shiny and extended it to all aspects of web development [157]. Web developers must use a plethora of languages: HTML, CSS, JavaScript, and template, query, and backend languages. Minima cut that list down to a single entry: Python. It is too bad we won’t see that project brought to full fruition.

A Python library that, like Dplyr and Ggplot2 for R (see Sec. 1.2.10.1), dislodged the incumbents is Requests. Despite being a replacement for Urllib, which is bundled with every Python distribution, it is downloaded millions of times per week. It is yet another HTTP client library and performs all the operations you can expect from such a package. However, it requires little code, unlike its predecessors. Daniel Greenfeld, a Python developer, writes [158]:

Nuked a 1200 LOC spaghetti code library with 10 lines of code thanks to @kennethreitz’s request [sic] library. Today has been AWESOME.

A 120x code reduction is the second largest shrinkage factor in a concrete project reported in this book (see Sec. 1.2.11 for the record holder). I don’t have the code for this example, nor would I saddle you with 1200 LOCs of spaghetti code, but I have a short one by Requests’ author Kenneth Reitz, which I have ported to modern Python 3 and slightly shortened [159]. First, the Urllib version:

gh_url = "https://api.github.com"
req = urllib.request.Request(gh_url)
password_manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, gh_url, "user", "pass")
auth_manager = urllib.request.HTTPBasicAuthHandler(password_manager)
opener = urllib.request.build_opener(auth_manager)
urllib.request.install_opener(opener)
handler = urllib.request.urlopen(req)
print(handler.getcode())
print(handler.headers.get("content-type"))

And then the Requests version:

response = requests.get('https://api.github.com', auth=('user', 'pass'))
print(response.status_code)
print(response.headers['content-type'])

The Requests version is for the laywoman. She needs to access a URL for which she provides authentication credentials. She is interested in the status code and content type header associated with the response. Knowing the basics of the HTTP protocol is enough to understand this version. The Urllib version is for some high priestess of the web. She needs to be initiated into the mysteries of password and authentication managers, URL openers and handlers. It looks like the manifesto of over-engineering, combined with a willingness to rename everything with generic names that obfuscate meaning. The usual offenders are “manager,” “handler,” and “context”: catch-all terms with an aura of competence.

1.2.10.4 More Libraries

Logging code clutters a program’s logic (that is why StackOverflow removed all of it, see Sec. 1.3.1) and is easy to get wrong if consistency is a goal. log_calls aims to solve that. Its creator writes [160]:

log_calls can save you from writing, rewriting, copying, pasting and tweaking a lot of ad hoc, debug-only, boilerplate code — and it can keep your codebase free of that clutter.

log_calls, as the name suggests, is focused on logging function calls but sports additional features, including logging custom expressions at any point in the code. It does so with a bare minimum of code and produces well-organized output. It is based on decorating with @log_calls functions whose invocation needs to be logged:

@log_calls(log_retval=True)
def div_w_remainder(x, y):
    return (x // y, x % y)

The call div_w_remainder(5,4) will generate the following output:

div_w_remainder <== called by <module>
    arguments: x=5, y=4
    div_w_remainder return value: (1, 1)
div_w_remainder ==> returning to <module>

To reproduce something of this sort without the assistance of Log_calls, here is what I came up with:

def div_w_remainder(x, y):
    caller = inspect.stack()[1][3]
    print(f"div_w_remainder <== called by {caller}")
    print(f"    arguments: x={x}, y={y}")
    retval = (x // y, x % y)
    print(f"    div_w_remainder return value: {retval}")
    print(f"div_w_remainder ==> returning to {caller}")
    return retval

The amount of logging code in the latest snippet is a multiple of the non-logging code and hampers its readability. If one decorator per function or method is too noisy, we can even log all the methods in a class by decorating the class, or all callables in a module with a single call.76

Another programming concern that should not be polluting the main logic is data validation, which means checking that variables contain what they are supposed to. Validation is required when accepting user input, but it may also be appropriate for a function to perform on its arguments. Its goal is to have a program fail early and with an exception or an error message if an invalid argument is provided. While assert condition(x) may look like a good idea, the problem is that the evaluation of condition(x) may fail with its own exception; or it may return False, in which case an AssertionError will be raised. Developers often want more control over what exceptions are raised. To this end, they will have to wrap the condition evaluation in a try-except statement. Imagine doing so for every argument of a function.
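
To make the cost concrete, here is a sketch of that wrapping for a single argument (ValidationError stands for whatever exception the application prefers, and condition is a placeholder predicate):

class ValidationError(Exception):
    pass

def condition(x):
    return x > 0      # may raise its own exception, e.g. TypeError for a string

def check_x(x):
    try:
        ok = condition(x)
    except Exception as e:
        raise ValidationError(f"error while validating x: {e}") from e
    if not ok:
        raise ValidationError("x failed validation")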

A possible solution is Valid8, a validation package for Python that makes a strong conciseness claim: it “provides tools to perform validation in a minimum number of lines” [161]. While it is hard to prove its optimality, it certainly simplifies the problem of validating data. For example, let’s say an argument x is required to be a positive number. This condition could be checked as follows:

assert isinstance(x, Number) and x > 0, "Not a positive number"

However, assuming control over the exception raised is required, we need a different approach:

if not (isinstance(x, Number) and x > 0):
    print("Not a positive number", file=sys.stderr)
    raise ValidationError

Using Valid8 we can instead write:

def is_positive(x):
    return x > 0
validate("x", x,
    instance_of=Number,
    custom=is_positive,
    help_msg="Not a positive number")

Well, not any more concise, even if we could reuse is_positive elsewhere, but at least it produces a nicely formatted message with more information, for instance when x == 'c':

ValidationError[TypeError]: Not a positive number. Error validating [x=c]. HasWrongType: Value should be an instance of <class 'numbers.Number'>. Wrong value: 'c'.

or x == -1:

ValidationError[ValueError]: Not a positive number. Error validating [x=-1]. InvalidValue: Function [is_positive] returned [False] for value -1.

We have yet to achieve a more concise solution, which can be done using the companion packages Vtypes and Pyfields. These two packages incorporate validation into type checking and classes, respectively. With Vtypes, we can integrate validation into a type [162]:

PositiveInt = vtype('PositiveInt', int, {'should be positive': lambda x: x >= 0})

With this definition, isinstance performs type and validation checks as in assert isinstance(2, PositiveInt).

Could we tie this into type annotations for a most concise validation of arguments? Type annotations are parameter type declarations ignored by default but available to tools and libraries for static and run-time checking. Using Vtypes as type annotations with a checking library like Pytypes would give a concise way of validating function arguments. This idea didn’t pan out since Pytypes and Vtypes don’t work together, so I decided to roll my own decorator:77

from functools import wraps
from inspect import getcallargs, get_annotations
# vtype and is_vtype come from the Vtypes package introduced above

def typechecked(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        # Map every argument in the call to its value, then validate each
        # one against its annotation, if any.
        for arg, val in getcallargs(f, *args, **kwargs).items():
            ann_type = get_annotations(f).get(arg)
            if ann_type is not None:
                if not is_vtype(ann_type):
                    # Promote a plain type to a vtype with no extra checks.
                    name = ann_type.__name__
                    ann_type = vtype(name + "__", ann_type)
                ann_type.validate(arg, val)
        return f(*args, **kwargs)

    return wrapper

This decorator exploits type annotations to type check and validate each argument for which an annotation is available. For instance, with the following definition:

@typechecked
def f(x: PositiveInt, y, z: int):
    ...

before the body of f runs, f’s first and last arguments will be checked to be a positive integer or any integer, respectively, whereas no checks will affect the second argument. Now it is concise to my satisfaction and very readable for good measure. If you are interested in how this decorator works, here is a quick explanation. It takes a function as an argument and returns one, like any decorator. The returned function, wrapper, is decorated with wraps from the Functools package, which adjusts its signature and other metadata to be the same as function f. The body of wrapper uses getcallargs, from package Inspect, to obtain a dictionary of all the arguments in the function call and their values. Iterating through each of them, it does nothing if there is no annotation (ann_type); otherwise, if the annotation isn’t a vtype, it converts it to one and then uses it to validate the argument value,78 before calling f with its arguments.

I have brought up Java mostly on the verbose side of programming (see Scalding in Sec. 1.2.10.1, and Kotlin in Sec. 1.2.9.7), and there will be more opportunities to do so (see java.util.Arrays in Sec. 1.5). However, here is some good news: we can implement a simple but significant data structure, an LRU cache,79 in just ten lines of Java [163]. On second thought, its author, engineer Christopher Wu, leveraged Java’s extensive libraries rather than the language’s intrinsic capabilities. He observed the similarity between the task he had to solve and a lesser-known data structure already in the standard library, the LinkedHashMap. It is a data structure similar to a hash map but with a guaranteed iteration order corresponding to insertion or most recent access order (as determined by the third argument to the super call below). It even provides an overridable80 eviction policy (removeEldestEntry, limited to “when” to remove, whereas the order is the same as the iteration order), suggesting that the authors of this data structure foresaw caching as its most likely use. These features lead to a concise implementation for an LRU cache:

public class LRUCache<K, V> extends LinkedHashMap<K, V> {
  private int cacheSize;

  public LRUCache(int cacheSize) {
    super(16, 0.75f, true); // true selects access order for iteration
    this.cacheSize = cacheSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > cacheSize;
  }
}

Since someone else has done the work of implementing LinkedHashMap and added it to the standard library, clearing a high bar for software quality, we are unlikely ever to have to worry about its implementation and conciseness. However, this shortcut is unavailable when we need to build an abstraction from lower-level ingredients. The opposite trade-off would be a language with better abstraction capabilities but less extensive libraries. Is the extra expressive power worth having to develop more libraries from scratch? Which choice will result in fewer LOCs to develop? I don’t think there is an answer that is valid for every project.
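
As an aside of mine, not taken from the cited article, Python’s standard library allows a comparably short LRU cache, leaning on collections.OrderedDict much as the Java version leans on LinkedHashMap:

from collections import OrderedDict

class LRUCache(OrderedDict):
    """A minimal LRU cache built on OrderedDict (a sketch, not from the source)."""
    def __init__(self, cache_size):
        super().__init__()
        self.cache_size = cache_size

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.move_to_end(key)          # mark as most recently used
        return value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.move_to_end(key)
        if len(self) > self.cache_size:
            self.popitem(last=False)   # evict the least recently used entry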

To wrap up this sampling of conciseness-enabling libraries, let me mention three libraries that, like Vtypes (see above), are language extensions. By this, I mean that, rather than focusing on an application domain, they add language capabilities like manipulating function signatures, defining classes, and imitating constructs from other languages.

The first is Forge, which focuses on building APIs. As we will see in the section on APIs (Sec. 1.3.3), proponents of the humane interface support expanding an API with different variants or combinations of the same calls, often with simple implementations, as long as they capture common usage patterns and prevent repetitive code. Unfortunately, writing such APIs usually requires boilerplate code to shuffle arguments around, declare the same parameter list with minor variations and so on. Devin Fee, author of Forge, takes the popular Python Requests (see Sec. 1.2.10.3) as a use case. Analyzing a file in which Requests’ API is defined, he comments [164]:

The code is a little more than 150 lines, with about 90% of that being boilerplate. Using forge we can get that back down to about 10% its current size, while increasing the literacy of the code.

Indeed there are eight functions defined therein [165], each with a one-line implementation. The first, request, calls out to a different module. All the others invoke request with minor variations in the accepted arguments. However, its author duly repeats the documentation eight times, unaware of or uninterested in any alternative, leading to an eight-fold duplication in some documentation entries and many more close matches for a total of over 150 lines. In contrast, Fee manages to get away with about 20:

request = forge.copy(requests.Session.request, exclude='self')(requests.request)

def with_method(method):
    revised = forge.modify(
        'method', default=method, bound=True,
        kind=forge.FParameter.POSITIONAL_ONLY,
    )(request)
    revised.__name__ = method.lower()
    return revised

post = with_method('POST')
get = with_method('GET')
put = with_method('PUT')
delete = with_method('DELETE')
options = with_method('OPTIONS')
head = with_method('HEAD')
patch = with_method('PATCH')

He does so in three parts: first, he replaces the signature of request with one from another equivalent function in that package that doesn’t hide most arguments inside kwargs.81 This is because he thinks that fully specified signatures are better, and kwargs should be reserved for when the set of allowable arguments is infinite, for example, for the function dict. Then he defines a higher-order function that takes request, binds its argument method to the appropriate value (say GET or POST), changes its name accordingly (say "get" or "post"), and returns it. Finally, he uses this function seven times to create all the other entries in the API. Et voilà, seven functions in seven LOCs, with all their documentation, as the metadata is preserved by default. However, there is a catch: to do so, he had to force seven of the eight functions to share the same signature, whereas in the original there are four different variants, and some of the “excess” arguments, unsurprisingly, don’t make sense in all cases. For example, an HTTP POST can submit data to the server, but an HTTP GET can’t. Finding in the signature of get an argument data, which will be ignored, is confusing. However, this is just poetic license, not a limitation of the library, which has all the tools to define several signatures, sharing some arguments but not others. Even a more refined version of this example would come out way ahead in line count and internal consistency.
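
As a side note (my sketch, not Fee’s code), the core of the trick, a higher-order function that binds the method argument and renames the result, takes only a few lines of plain Python. What such a version loses compared with Forge is precisely the signature and documentation metadata, which is the library’s reason to exist:

import requests

def with_method(method):
    # Bind the HTTP method and rename the wrapper; unlike forge, this
    # leaves a generic (url, **kwargs) signature and no documentation.
    def wrapper(url, **kwargs):
        return requests.request(method, url, **kwargs)
    wrapper.__name__ = method.lower()
    return wrapper

get = with_method("GET")
post = with_method("POST")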

Another library for Python, Attrs, focuses on defining classes [166]:

Its main goal is to help you to write concise and correct software without slowing down your code.

Its author, Hynek Schlawack, starts from the observation that many classes require the definition of dunder methods: initialization, comparisons and string representations almost invariably perform the same tasks no matter the class they belong to (see POJOs in Sec. 1.2.9.7). He wryly notes that “while we’re fans of all things artisanal, writing the same nine methods all over again doesn’t qualify for me.” Such a blatant violation of the DRY principle (see Sec. 1.2.3) required a remedy, so much so that some features of Attrs were later incorporated in Python as data classes. The original, though, is still more powerful. Here is a minimal example from the documentation:

from attrs import define

@define
class Coordinates:
    x: int
    y: int

That’s all! How much will you miss writing:

def __init__(self, x: int, y: int):
    self.x = x
    self.y = y

Dull, isn’t it? And it isn’t just that method; you also need __eq__, __ne__, and all the other comparisons, __str__, __repr__, and __hash__. Beyond that, Attrs provides features such as validation and conversion of fields at initialization, frozen classes, and much more. Given the popular Pythonista’s motto “explicit is better than implicit,” it isn’t surprising that the language would require practitioners to write all the dunder methods explicitly. Luckily, the language is flexible enough, and its community is creative enough, to remedy this design flaw.
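
For comparison, here is the same example using the standard library’s data classes rather than Attrs (my sketch, not from the Attrs documentation):

from dataclasses import dataclass

@dataclass
class Coordinates:
    x: int
    y: int

# __init__, __repr__ and __eq__ are generated automatically;
# order=True and frozen=True would add comparisons and immutability.
assert Coordinates(1, 2) == Coordinates(1, 2)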

Vadr for R “implements workalikes for the author’s (and perhaps your) favorite features from other languages, making R programs shorter and more expressive” [167]. The original Vadr has been supplanted by seven more focused packages, of which, unfortunately, only two are currently available. Among other features, Vadr included destructuring assignments, a feature present in a growing list of languages. It allows writing expressions like bind[first, ...=middle, last]= a_list instead of:

first = a_list[1]
middle = a_list[2:(length(a_list)-1)]
last = a_list[length(a_list)]
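
As an aside of mine, not from the Vadr documentation, Python ships the same convenience natively as starred assignment:

a_list = [1, 2, 3, 4, 5]
first, *middle, last = a_list
# first == 1, middle == [2, 3, 4], last == 5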

There are several ways a destructuring assignment is more concise: we don’t need to repeat the name of the structure being operated upon or the assignment operator again and again, and the ellipsis symbol ... replaces working directly with the indexes in many cases. bind supports nested lists, and more complex examples would further highlight its conciseness. Pipelines, also included in Vadr, allow chaining operations in a more natural order without creating intermediate variables, as we have seen in Sec. 1.2.9.3. I believe Vadr’s implementation predates the one that has become the most adopted in R, from package Magrittr. Here instead of relying on a custom operator, pipelined operations are listed as arguments to the function chain:

chain(n,
      seq(length=.+1, 0, 2*pi),
      cbind(sin(.), cos(.)),
      apply(2, diff),
      .^2,
      rowSums,
      sqrt,
      sum)

This expression calculates the perimeter of a regular polygon with n sides. Please note the . symbol represents the output of the previous operation in this context.

In addition, Vadr offered macros, whose power we have already discussed talking about Lisp (see Sec. 1.2.9.1), partial application, an essential tool for functional programming, and more.

1.2.11 Programs

Before proceeding to examine some programs, a caveat is in order. The equivalence of programs is a more nuanced proposition than the equivalence of functions, and we will see examples where the larger program has more features. However, the smaller programs still broadly perform the same role while being much more concise (for programs focused on frugality, see Sec. 1.3.4).

If two programs fulfill the same need, should we care whether one is larger than the other? A smaller program takes less storage and loads faster, but those are practical rather than ascetic concerns. Indeed, nothing has prevented programs from ballooning to ever larger sizes [12]. However, some people pay attention to code length. For example, comparing VPNs,82 Ars Technica reporter Jim Salter states the following [168]:

WireGuard weighs in at around 4,000 lines of code; this compares to 600,000 total lines of code for OpenVPN + OpenSSL or 400,000 total lines of code for XFRM+StrongSwan for an IPSEC VPN. Two orders of magnitude fewer lines of code mean a lot less attack surface to find flaws in.
A much smaller codebase also means code that’s more likely to work the way it’s supposed to.

Software reviews seldom highlight conciseness or any other property of the source code. To add to this reviewer’s opinion, Linus Torvalds, author of the Linux operating system, commented:

Can I just once again state my love for it [WireGuard] and hope it gets merged soon? Maybe the code isn’t perfect, but I’ve skimmed it, and compared to the horrors that are OpenVPN and IPSec, it’s a work of art.

This praise sounds even more significant compared to Torvalds’ often abrasive posts.

Lwan is a web server, like many out there, but with a notable feature from our point of view: “Its simple architecture and tiny source code guarantees the entire code base can be easily grokked” [169]. Unlike Lwan, many software systems have reached a level of complexity such that reading the complete source code is no longer practical. This fact has several undesirable consequences, including that developers give up on reading source code beyond what is strictly necessary to add their contributions; and that they write code as if it will never be read, save maybe for a quick code review by a fellow programmer. Imagine how literature would be if writers didn’t read whole books and if no one but the authors themselves ever read most pages. Lwan is like a classic, according to the definition given by Italian writer Italo Calvino [170]: you can read it again and again, as Twitter user neurodrone reports [171]:

Lwan is a work of art. Every time I read through it, I am almost always awe-struck.

We have seen that program size isn’t considered a controllable parameter (see Sec. 1.1), at least according to some. However, there is a counterexample: a program for which a length upper bound has been set [172]:

dwm is only a single binary, and its source code is intended to never exceed 2000 SLOC.

Should there be an upper limit on how many LOCs are used to implement a program? That would be an innovation in software engineering. I am not aware of any methodology that imposes constraints of this sort. Throughout my career, I was never asked to limit my LOC output, nor were my coworkers. If you need one KLOC to fix a bug and can produce it within the available time, so be it. The maintenance bill doesn’t reach the original author. You may object that Dwm, a spartan window manager that is freely available, doesn’t prove this is a viable approach to software engineering. However, its list of features seems to omit only advanced items such as Lua scripting or remote control. On the other hand, its configuration requires recompilation, which “[…] keeps its userbase small and elitist. No novices asking stupid questions.” It is an original twist on the luxury market strategy of keeping goods and services out of reach for most.

A recent trend resulting in smaller programs is replacing statistics- or heuristic-based code with machine learning models. These models are anything but simple (see Deep Learning, Sec. 2.1.4), so you may argue that the complexity has just morphed, but it is learned from data rather than hand-coded. A stunning example is Google Translate, which, in the process of switching from a more traditional approach to deep learning, shed 99.9% of its code, from 500 KLOCs to 500 LOCs [173]. This is the largest code shrinkage factor reported in this book, by quite a margin. The newer version apparently also provides better translations [174].

This approach can benefit even small amateur projects. For example, Jacques Mattheij needed to sort 2 tons of Lego and created a robotic system to help with the task [175]. When he switched his approach to machine learning, this is what changed:

[…] approximately 2000 lines of feature detection code, another 2000 or so of tests and glue was [sic] replaced by less than 200 lines of (quite readable) Keras code.

(see Keras, Sec. 1.2.10.1).

Small programs can generate very large outputs with properties such as pseudo-randomness, the attribute of a deterministic sequence of numbers that looks as if it were generated by a stochastic process and can pass tests of randomness. It replaces true randomness in most applications that need it, including Monte Carlo simulations, online security, and privacy. The art of developing programs of this type, called random number generators, is a small but active subfield of computer science. A simple option, which gives results good enough to be still in use, is the family of linear congruential generators, defined by the recursion $x_{n+1} = (a x_n + c) \bmod m$. $x_0$ is the starting value or “seed” and can be user-provided or based on sources of genuine randomness. The choice of parameters $a$, $c$, and $m$ is crucial and can result in generators of drastically different quality. More modern options use additional operations, including bit-wise logical operations, bit rotations, and shifts, but aren’t much more complex. A random number generator currently used in most web browsers comes in several versions, each roughly 13 lines of C code (here showing xoroshiro128+, [176]):83

static inline uint64_t rotl(const uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

static uint64_t s[2];

uint64_t next(void) {
    const uint64_t s0 = s[0];
    uint64_t s1 = s[1];
    const uint64_t result = s0 + s1;

    s1 ^= s0;
    s[0] = rotl(s0, 24) ^ s1 ^ (s1 << 16); // a, b
    s[1] = rotl(s1, 37); // c

    return result;
}

It isn’t hard to follow, but for people unaccustomed to C, let’s recall that << and >> shift the bits in an integer left and right, respectively, by a given amount, while | and ^ are the bit-wise OR and XOR operators (the helper rotl combines two shifts into a rotation). Understanding why successive calls to next return a pseudo-random sequence is much less intuitive. One of the co-authors of this generator, Sebastiano Vigna, uses a biological metaphor to describe this class of programs (my translation from [177]):

[Random number] generators start from a few lines of code but generate billions of numbers. It’s like creating a living being and making it live.
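
To make the linear congruential recursion from the start of this section concrete, here is a minimal Python sketch; the specific parameters are the widely used Numerical Recipes values, an assumption of mine rather than anything taken from the cited sources:

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    # x_{n+1} = (a * x_n + c) mod m
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

generator = lcg(seed=42)
sample = [next(generator) for _ in range(3)]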

1.2.11.1 Concise Programs in Mathematics

If we drop the requirement that a short program outputs anything, how long can it go before stopping? It may sound silly, but this question has been the starting point for a mathematical inquiry into the busy beaver numbers [178]. The nth such number, or $BB(n)$, is the largest number of steps a Turing machine limited to $n$ states can perform before stopping. Using a Turing machine in the definition is helpful because it is easy to define its size, but similar numbers could be defined in any programming language. An alternative definition requires a Turing machine to write the largest number of 1s onto its output tape (the two definitions lead to distinct but closely related functions). Why wouldn’t a simple infinite loop be enough? Because it doesn’t end. The name of the game is preventing the machine from terminating for as long as possible, but not forever. Otherwise stated, we are trying to encode the biggest number that can fit in a size-constrained Turing machine. If you have ever played the game of writing the largest number that will fit on a sticky note, using digits and operators, this is a more mathematical version of that game.

$BB(n)$ is a function with some properties that I would describe as magical. It increases at a ferocious pace: faster than exponential, even faster than the Ackermann function.84 Indeed, no function growing faster than $BB(n)$ is computable, and neither is $BB(n)$ itself. That is, no Turing machine can take $n$ as input and return $BB(n)$. The proof of this fact is straightforward if we recall the archetypal uncomputable problem: deciding whether a given Turing machine will halt. $BB(n)$ limits how long we have to run a machine before concluding with certainty that, if it hasn’t stopped so far, it never will. This bound would provide an easy, if slow, solution for the halting problem: given a Turing machine with $n$ states, run it for $BB(n)$ steps; if it stops before reaching that mark, we have our answer. If it doesn’t, it never will. In a way, it turns an infinite question into a finite one. Only a handful of $BB(n)$ values are known, and some relatively small ones are provably independent of a standard formulation of set theory. Therefore, no mathematical proof of their values is possible without non-standard axioms.

If you want to try designing one of these machines, they seem to follow a common template. The main idea is to compute some big numbers and do so as slowly as possible. They implement iterated functions, not unlike the random number generators in the previous section, but closer to the Collatz sequence:

$$f(n) = \begin{cases} \frac{n}{2} & \text{if } n \equiv 0 \pmod{2} \\ 3n + 1 & \text{otherwise} \end{cases}$$
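
For what it is worth, one step of this map is a one-liner in Python (my sketch, added for illustration):

def collatz_step(n):
    # Halve even numbers; map odd n to 3n + 1.
    return n // 2 if n % 2 == 0 else 3 * n + 1

def collatz_steps(n):
    # Number of iterations needed to reach 1 from n.
    count = 0
    while n != 1:
        n = collatz_step(n)
        count += 1
    return count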

These produce a non-repetitive sequence of numbers, including some large numbers, when carefully tuned. All intermediate operations are implemented with inefficient algorithms that work at a glacial pace but also fit within the narrow confines of these constrained machines. Numbers are encoded using unary notation,85 which simplifies program logic — for instance, a sum becomes just a concatenation — but requires more elementary steps. Performing slow arithmetical operations on large unary numbers gives these machines their prodigious run times. It is a bit of a culture shock for engineers who prize algorithms that are thrifty with time and memory, but here the goals are reversed: use as much time as possible writing only a tiny bit of code, while memory is no object. Although this approach worked before, there isn’t a rule to generate these machines, and their discovery has been halting.

The only complaint I have about these numbers is about their name. Since they are designed to spin idly for the longest time, they should be called super-procrastinator numbers (for procrastination in human endeavors, see Sec. 3.4).

This concept of a small machine that produces a maximal string of all 1s is related to that of Kolmogorov complexity, which is, for any given string $x$, the size of the smallest Turing machine or program that can reproduce $x$ [179].86 In other words, the theory of Kolmogorov complexity is the theory of maximally concise programs for a given output. It has applications in the theory of computability, probability (see algorithmic probability in Sec. 2.1.2) and program testing. Concerning the latter, Luc Longpre, Vladik Kreinovich, and Ann Quiroz Gates formally show that short programs are easier to test [180]. An informal statement of their main result reads:

Theorem 1 says that to check whether a given program $p$ is correct it is sufficient to check this program only on inputs that are not too complicated […] The more complicated the program $p$ and/or the specification […] the more examples we need to test.

In this quote, “not too complicated” for inputs means having low Kolmogorov complexity, whereas “complicated” for programs and specifications means having more characters. Therefore short programs are easier to test. Please note that this theorem allows proving the correctness of programs from a finite test set, contradicting Dijkstra’s famous statement: “Program testing can be used to show the presence of bugs, but never to show their absence!” [3]. Unfortunately, there are many obstacles to making this research valuable in practice: first and foremost, the Kolmogorov complexity is an uncomputable function. An additional obstacle is that simple inputs can be huge, since small programs can generate them, as we have learned by looking into busy beaver numbers (see above). We are still a long way from proving real programs correct using this approach.

1.2.11.2 Obfuscation

So far, we have considered line, statement and token count (see Sec. 1.2.6), as well as number of states (see the previous section), as useful measures of program complexity, but there is a type of programming competition that focuses on character count, known as code golfing. Programs created to win these contests sacrifice everything, readability first of all, in the name of program length, and resort to dedicated languages created for this purpose, such as GolfScript. For instance, this program prints the first thousand digits of $\pi$ [181]:87

;''
6666,-2%{2+.2/@*\/10.3??2*+}*
`1000<~\;

Unlike most code blocks in this book, we will not try to understand this one.

Counting characters may seem to cross the line between terse and silly, but there are good reasons to do so in specific contexts. For example, we used Turing machines as a computing model while discussing Kolmogorov complexity (see the previous section). Had we used any programming language, a workable definition would have had to limit the character length of a program, not just the number of tokens or lines therein. The reason is that it is possible to encode the desired output, no matter how complex, in a single integer or string, and those count as one token in every programming language I am familiar with. Later on, we will see in Sec. 2.1.2 how statisticians have looked at statistical models as programs. Limiting their character length means, among other constraints, that they can’t include numbers of unbounded magnitude or precision in their code. This constraint is essential because allowing such numbers can have surprising statistical consequences (for instance, see example three in [182]). A limit on the number of tokens or lines alone would not enable proving theorems equivalent to those that follow from a character length limit.

To close this section on concise programs, an example of conciseness veering into obfuscation or, maybe, the visual arts. A random dungeon generator is a program that generates layouts for the game Dungeons & Dragons. Author and developer Robert Nystrom endeavored to write one, in C, that could be printed on a business card ([183]; see also the front cover). GitHub user Joker-vD took a keen interest in it and commented:

Really nice, I had some fun uncompressing and deobfuscating it — there are some nifty tricks in it! […] it is amazing how much can be accomplished in so few lines of code.

1.2.12 Systems

What is valid for programs continues to be so at the level of system architecture, even if the program length metric may not be the one used. For instance, software engineer Brandur Leach explains how the cloud platform Heroku practices architectural minimalism [184]:

Only by concertedly building a minimal stack that’s stable and nearly perfectly operable can we maximize our ability to push forward with new products and ideas.

One takeaway is that maintaining simplicity in the face of supporting new features requires a constant, organized, resourced effort. Another one is that this minimalism is a necessary condition for the evolution of the product, not a “nice to have.”

If the pressure to build new features engenders complexity, we can work to reduce it after the fact. The progression toward complexity is reversible. For example, Stephen Kell, a lecturer in Computer Science, writes [185]:

An underrated way to make progress is by distillation. It’s to recognise repetition, to integrate and unify existing systems where possible.

It is such an important activity that Dru Riley (who also writes a concise newsletter, see Sec. 3.1.1) included it in his LinkedIn resume [186]:

Building scalable services, consolidating duplicate systems and extracting insights.

It is an unusual sighting while scanning resumes and job descriptions. The emphasis is always on building systems and delivering features. The complexity left behind is someone else’s problem, as engineers are shuffled from one assignment to the next and “builders” job hop before the complexity bill is due. Working on reducing complexity is generally considered second-class work and not career-boosting.

Summarizing a keynote by Armstrong, Kell writes [185]:

A theme […] was that creating stuff doesn’t necessarily make progress overall. Proliferation reduces value. Entropy is our enemy. He mentioned the bewildering array of build systems, packaging systems, unidentifiable system software […] and, more generally, copies of the same or similar code and data. Building a “compressor” [see Burn’s “exercise in compression” in Sec. 1.2.1 and Steele’s “Huffman encoding” in Sec. 1.2.9] was his likeably vague solution.

There are even tools to deal with complex architectures and dedicated architect roles whose task is to define them without contributing a single line of code. Both are counterproductive because they will encourage complexity, according to Gergely Orosz, who worked on large infrastructure projects at Uber and Microsoft [187]. If specialized tools are needed to describe an architecture, simplify it. Design tools won’t reduce the coding and maintenance burden for a team.

1.3 Frugal Software

1.3.1 Programming

There is already a specialization of less is more (see Sec. 3.2.1 for the origin of this phrase) for computing: worse is better. It is due to Richard P. Gabriel, a computer scientist known for his work on the Lisp language [188]. His main argument is that software quality doesn’t necessarily increase with the number of features. There is a point where adding more features makes an application less practical because, for instance, it needs more expensive hardware to run; or less usable since it becomes harder to learn. Despite this concept having been formulated early on (see Brooks on “anomalous features” in Sec. 1.3.6) and widely disseminated, it doesn’t seem to have prevented the progressive bloating of applications. Software is still developed as an accumulation of features: the more, the merrier. Marketers want to claim a feature list longer than the competition. Software reviews go through checklists of features. Hardware continues to become more powerful at a roughly exponential rate, removing one reason for restraint. Software resilience engineer Alex Gaynor observed the following [189]:

The most natural implementation of any feature request is additive […]. As this process is repeated, the simplicity of a system is lost and complexity takes its place. […] Every feature request has a constituency — some group who wants it implemented, because they benefit from it. Simplicity does not have a constituency in the same way, it’s what economists call a non-excludable good — everyone benefits from it.

However, there are a few voices in support of simplicity, if a little isolated. Will Shipley, the winner of three Apple design awards, warns us to “start with brevity” [190]. All the other good properties of software should be only increased as necessary — in his words, “as required by testing.” For instance, don’t build flexibility into some software just because flexibility is good, but only if it is a requirement. Likewise, there’s no need to make a videogame as robust as a nuclear installation’s control software.

Software architect and developer Lawrence Kesteloot is even blunter: “[…] refuse to add any features or lines of code unless you need it right now and you need it badly” [191]. Imagine if your boss asked you to perform a task, and your only retort were: “Do we need it badly?” That conversation would be short, I suspect. However, it could last marginally longer if you let your boss know that programming superstars at Google ask this type of question. In the words of Hacker News user nostrademons [192]:

For those who work inside Google, it’s well worth it to look at Jeff & Sanjay’s commit history and code review dashboard. They aren’t actually all that much more productive in terms of code written than a decent SWE3 who knows his codebase.
The reason they have a reputation as rockstars is that they can apply this productivity to tasks that really matter; they’re able to pick out the really important parts of the problem and then focus their efforts there, so that the end result ends up being much more impactful than what the SWE3 wrote. The SWE3 may spend his time writing a bunch of unit tests that catch bugs that wouldn’t really have happened anyway, or migrating from one system to another that isn’t really a large improvement, or going down an architectural dead end that’ll just have to be rewritten later.

It isn’t so much the code they write, but what they decide to work on, which implies they have the agency not to work on something less relevant. You can’t become a superstar if you have to pick the next entry in a so-called backlog.88 The programmers referred to in this excerpt are, most likely, Jeff Dean and Sanjay Ghemawat, whose work includes MapReduce, GFS, BigTable, and Spanner, cornerstones of the Google infrastructure and influential projects throughout the industry.

Tero Piirainen sees frugality as a competitive advantage [193]:

Minimalism: an undervalued development skill

Here’s a cheap trick to make a difference: focus on the bare essentials and get rid of the rest. It’s easy to be different, because others are mostly doing the opposite: tons of crap.

As the CEO of a company making a web analytics product, he must be concerned about page load times, which his software contributes to (see the “website obesity crisis” in Sec. 1.5). Nevertheless, his argument goes beyond that: the additional features his software doesn’t have aren’t left out as a compromise with speed; they are left out because they would make his product worse. However, I can’t rule out a hint of sour grapes as well.

Atwood goes further by calling code “our enemy” [194]:

Every new line of code you willingly bring into the world is code that has to be debugged, code that has to be read and understood, code that has to be supported. Every time you write new code, you should do so reluctantly, under duress, because you completely exhausted all your other options. Code is only our enemy because there are so many of us programmers writing so damn much of it. If you can’t get away with no code, the next best thing is to start with brevity.

Similarly, Kesteloot calls for writing not just less but “the least code that will get the job done” [195]. Elsewhere he writes [196]:

Every line of code you write is a potential bug. Do not write any line of code unless you absolutely need it right now and your program will suffer for the lack of it. Do not write routines speculatively. Do not write abstraction layers you don’t need right now. If an optimization will add any complexity whatsoever […], resist it. You will be sorry in five years when your code is riddled with potentially-buggy [sic] code that you never really needed to write.

It is a resounding rejection of planning for future or predicted needs and a warning against over-engineering. The value of keeping a project small and the uncertainty about future requirements make a strict focus on the present the best strategy. Anyway, “You Aren’t Gonna Need It,” (YAGNI). This principle is part of a software methodology called Extreme Programming that indeed rejects the usefulness of planning, even if its supporters don’t word it this way [197].

While some of these warnings may sound like they border on paranoia, think of the Heartbleed bug (see Sec. 1.5) and other major security breaches or mission- and life-critical software. In those contexts, a defensive attitude is warranted. However, when developing a game or other entertainment software, the best balance between features and robustness or maintainability may be different.

All good software must have one feature: logging; well, until now. From now on, it is optional, with Atwood’s blessing [198]:

Logging means more code. If you’re using a traditional logging framework like log4net, every logged event is at least one additional line of code. The more you log, the larger your code grows. This is a serious problem, because code is the enemy. Visible logging code is clutter — like excessive comments, it actively obscures the code that’s doing the real work in the application. […] We’ve since removed all logging from Stack Overflow, relying exclusively on exception logging. Honestly, I don’t miss it at all.

Blogger and developer Itamar Turner-Tauring urges his readers to stop being a “10x programmer” and become not just an average one but a “0.1x programmer” [199]. Some of his suggestions are about doing more with less, from choosing a higher-level language to increasing code reuse. Others are about doing less, including “drop the feature” and a radical “drop the product.” The former is an invitation to filter feature requests before queuing them. Clients may not know other ways to achieve their goals and may request capabilities that are seldom needed and not worth developing, learning, and maintaining. “Dropping the product” is a reminder that not all good ideas are well received, and that it is essential to test the market fit before devoting a lot of resources to developing one. On the other hand, users may not know they need a product or believe in its feasibility until it is in their hands. Please note that, in both cases, “drop” is construed as “do not develop.” Dropping an existing feature or product presents additional hurdles by breaking users’ expectations.

If dropping one product isn’t enough, the management at Basecamp went even further: they dropped all but one, spinning off or selling the others [200]. Their motivation was that they felt a little scattered and didn’t want to grow into a larger company. An external observer may counter that the other products were not remotely as successful as their main one or had been integrated into it. The move also partially reneged on a promise to support their products forever (they added a new product again more recently, see Sec. 3.4).

They aren’t the only ones to drop products. Google has retired so many products that people are wary of adopting the ones that remain [201]. This results from an internal reward structure that favors product launches over incremental improvements rather than any frugal thinking. There is nothing frugal about today’s Google.

1.3.2 Languages

No matter how simple or complex at conception, languages tend to drift towards complexity by incorporating new features over time: generics in Java, coroutines in Python, and objects in Lisp. Therefore, maintaining simplicity in a language doesn’t seem to be a primary concern. We have already seen how the original Lisp, or some of its dialects like Scheme, are relatively small languages with few but powerful features. Yet, it isn’t clear that it was a primary goal in their design. As we have learned in Sec. 1.2.9.1, the original aim was to replace the Turing machine with a language resulting in more concise programs, not a simpler language. However, simplicity helped greatly in creating its first interpreter, for instance.

A language that should drop features is JavaScript. Facundo Olano writes [202]:

JavaScript is loaded with crap (the bad and awful parts); there’s a great functional language somewhere in there, striving to get out; sometimes, a lot of the times, less is more. […] We can (and we should) make a better language out of JavaScript by subsetting it.

There is even a book about it: JavaScript: The Good Parts [203]. However, in practice, those who avoid the bad parts can’t prevent others from using them and, therefore, still have to deal with programs written in the entire language. Hence, a better option may be writing in JavaScript derivatives such as CoffeeScript or TypeScript, which address JavaScript’s shortcomings while maintaining a high level of compatibility.

The idea of subsetting a language isn’t new. Hoare was scathing in his critique of complex languages such as PL/I and Ada and proposed to fix them by subsetting, turning some features into extensions to be used when appropriate [204] (see also Sec. 1.5). Indeed, at least two dialects of Ada exist, SPARK and HAC, which are both strict subsets of the original language. On the other hand, he praised Pascal for its simplicity and observed how it was the starting point for several dialects for different applications. Another example of subsetting is RPython, a dialect of Python “amenable to static analysis” [205]. It doesn’t have a rigorous definition: “The exact definition is ‘RPython is everything that our [Pypy’s development team’s] translation toolchain can accept’ :).” It nonetheless serves its purpose as the implementation language for the Python interpreter Pypy. Comparing it to the traditional one in C, “the clear advantage is that such a description [the one in RPython] is shorter and simpler to read, and many implementation details vanish.”

The typesetting language TeX may or may not be considered frugal, but it sure isn’t going to become any more feature rich. Its version number is approaching the constant $\pi$, indicating that it is reaching its final state, and new versions only incorporate bug fixes. The associated program Metafont’s version is also converging, in its case to $e$, the base of the natural logarithms. Despite this choice, TeX remains the standard for typesetting scientific documents.

Flouting the trend towards complexity, the designers of two lesser-known languages intend to simplify them over time. The authors of Newspeak named their language after the one described in Orwell’s novel 1984 [206]. That language shrank over time to limit the expression of subversive ideas, foretelling keyword-based censorship on today’s social media. Newspeak’s authors call it “an ideal to strive for — a shrinkable language.” However, they aren’t clear about how they intend to shrink it, and, at the time of this writing, one addition and no subtractions appear to be planned. Maybe we need to write a few subversive programs in Newspeak to spring them into action.

Nim is a new statically typed, compiled systems programming language. One of its main contributors, Dominik Picheta, admits that it “has bloated a little and we’re now trying to remove as many features as we can” without specifying what criteria would be followed [207]. In the same discussion, Hacker News user weberc2 talks about stripping down Python by, for instance, removing classes. However, it is unclear why classes and not some other feature. Removing features is always controversial and, for programming languages, reliably breaks some of the existing code base. Without a clear and, ideally, automated way to convert code from the full-featured language to the reduced one, shrinking isn’t an option for any programming language with an existing user base. However, as we have seen above, it is possible to define new languages from existing ones for specific purposes by removing features, with no expectation of carrying over existing codebases.

1.3.3 Libraries

We have already talked about libraries from the point of view of how they help developers write concise code. However, we haven’t talked about the libraries themselves, particularly the part visible to their users, the API. There are at least two approaches in API design: minimal and humane. According to Fowler, a minimal interface provides the capabilities of a library with a minimum of complexity, that is, a minimum number of functions, methods, or classes accessible to a user [208]. A humane interface strives to support as many reasonable and prevalent use cases as possible [209]. That means a developer writing against a minimal API will have to write more code but learn less than against a humane one. For example, imagine an array library that provides two functions, one to sort an array and one to extract its $n$ leading elements. We can combine them to find the $n$ largest elements (let’s ignore the existence of a specialized algorithm for this problem). Should an API provide this capability or leave it to its users to implement?
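
To make the example concrete, the humane addition would be a one-line composition of the two minimal primitives; a sketch of mine, with made-up names:

def top_n(xs, n):
    # Compose the two minimal primitives: sort (descending) and take n.
    return sorted(xs, reverse=True)[:n]

top_n([3, 1, 4, 1, 5, 9, 2, 6], 3)   # [9, 6, 5]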

Joshua Bloch, co-author of the Java standard library, wrote this about API design [70]:

Every facet of an API should be as small as possible, but no smaller. You can always add things later, but you can’t take them away.

Don’t make the client do anything the library could do. Violating this rule leads to boilerplate code in the client, which is annoying and error-prone.

The two requirements seem to almost directly contradict each other: keep the API small but make it do as much as reasonably possible. Others have also observed this contradiction. For example, discussing the omission of the function getLast from the Java List API, Matthew Murdoch, author of a unit testing framework for Arduino, writes [210]:

[…] design guidelines often conflict and the hardest part of an API designers [sic] job is to balance these conflicts.

Stating the trade-offs and giving guidance on finding the right balance would be more helpful than listing clear but incompatible goals.

What is more frugal than an API with a single entry, a function that does it all? Of course, we can always force unrelated functionality into the same function, but that isn’t particularly helpful. Nonetheless, I found four libraries that can get away, without strain, with a single entry point:

These are four highly non-trivial libraries for meaningful tasks, which are all exposing a most minimalistic API.

Testit, a testing library for R, is close to this ideal as well, with two calls implemented in about 30 LOCs — neither KLOCs nor MLOCs. Its author, Yihui Xie, is explicit about its minimalism: “There is no plan to add new features or reinvent anything in this package. It is an intentionally tiny package” [215]. That “reinvent” could be construed as a jab directed at a competing package, testthat, which forces users to rewrite boolean expressions in a new language of expectations, as in expect_equal(x, y) instead of x == y, for no apparent advantage [216].

Another library for R is Tidyr. While not as minimalistic as the previous ones, it follows an approach of improvement through shrinking: “Just as reshape2 did less than reshape, tidyr does less than reshape2” [217]. These three packages officially aren’t versions of each other, but they cover similar ground, and development remained active only on the most recent at any given time. The renaming strategy is meant to allow their author to break backward compatibility and rectify any perceived design errors in previous packages while minimizing user opposition. The reduction in features, more than the result of a selection, is the outcome of eliminating duplications between this and other related packages while improving their interoperability.

What if we could take libraries that manipulate different types of sequences — in memory, on disk, lazy,92 and so on — and make them one and the same? Just one library to implement and to learn? That is what Rich Hickey, author of the Lisp derivative Clojure, has tried with transducers [218]. I recommend the referenced video over other online documentation as it is the most effective in comparing and contrasting designs with and without transducers. Transducers are just like the familiar functional primitives map, filter, and reduce; only they abstract away from the actual collection they work on. They aren’t specific to lists or files or streams etc. To apply them to data, we need to call a function, transduce, or other transducer-aware functions, which accept a sequence and a transducer as arguments and trigger the computation. It is a subtle difference that provides a unifying abstraction over a variety of sequence-like collections, which significantly reduces the size of the API and the libraries themselves.
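
To give a flavor of the idea outside Clojure, here is a minimal Python sketch of transducer-like functions and a transduce driver; it mimics the concept only and is not Clojure’s actual API:

from functools import reduce

def mapping(f):
    # A transducer: turns a reducing function into one that applies f first.
    def transducer(step):
        return lambda acc, x: step(acc, f(x))
    return transducer

def filtering(pred):
    # A transducer: keeps only the items that satisfy pred.
    def transducer(step):
        return lambda acc, x: step(acc, x) if pred(x) else acc
    return transducer

def transduce(xform, step, init, items):
    # Works unchanged on any iterable: list, generator, file lines, ...
    return reduce(xform(step), items, init)

def add(acc, x):
    return acc + x

# Sum the squares of the even numbers in 0..9, regardless of the source.
xform = lambda step: filtering(lambda x: x % 2 == 0)(mapping(lambda x: x * x)(step))
total = transduce(xform, add, 0, range(10))   # 120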

How often are companies racing to endow their products with the most extensive feature list, even if the user experience suffers as a result? Fortunately, that isn’t the case with StaticFrame. Instead, it sports an impressive “negative feature list,” a list of features that it doesn’t offer, goals it doesn’t aim for, and a svelte codebase [219]:

StaticFrame is not a drop-in replacement for Pandas. […] Further, as StaticFrame does not support in-place mutation, architectures that made significant use of mutability in Pandas will require refactoring.

StaticFrame is lightweight. It has few dependencies (Pandas is not a dependency). The core library is less than 10,000 lines of code, less than 5% the size of the Pandas code base.

StaticFrame does not aspire to be an all-in-one framework for all aspects of data processing and visualization. […]

StaticFrame does not implement its own types or numeric computation routines, relying entirely on NumPy.

This definition by negation reminds me of the closing verses of a poem by Nobel Prize winner Eugenio Montale, Non chiederci la parola (Don’t ask us for the word) [220]:

Codesto solo oggi possiamo dirti,
ciò che non siamo, ciò che non vogliamo.

(All we can tell you today is this, / what we are not, what we do not want). I am afraid this approach did not secure an enormous user base for StaticFrame, but it is a breath of fresh air compared to some of the cluttered do-it-all packages out there.

1.3.4 Programs

If few libraries and frameworks have frugality as a goal, even fewer complete programs and apps do so. The Unix utility less seems a promising candidate, at least judging from its name. Yet it originally was a version of the utility more for scrolling through text files, with the additional ability to scroll backward. Therefore, confusingly, less has more features than more.

The Unix philosophy requires keeping programs small and focused (see Sec. 1.5). Unix hasn’t always stayed faithful to this mandate in its more recent versions, for instance, in the case of Systemd (also covered in Sec. 1.5). Another Unix program affected by bloating is Xterm. The people behind suckless.org condemn it in no uncertain terms [221]:

xterm is bloated and unmaintainable. […] It has over 65K lines of code and emulates obscure and obsolete terminals you will never need.

The popular alternative, rxvt, has only 32K lines of code. This is just too much for something as simple as a terminal emulator.

suckless.org provides an alternative with st, a simple terminal emulator that focuses on the most common terminals and avoids feature duplication with terminal multiplexers93 like Tmux. You may wonder why st belongs in the frugal section, whereas dwm, another project championed by suckless.org (see Sec. 1.2.11), belongs in the concise section. It is a matter of emphasis. dwm seems to have the essential features of a window manager but endeavors to deliver them within a maximum of 2 KLOCs. st’s focus is on dropping emulation of obscure terminals and features provided by multiplexers, but it is no heavyweight at about 6 KLOCs. I will concede, though, that the line between frugal and concise is blurred for complex software artifacts.

Why not write your own application when nothing on the market is frugal enough? It is a rite of passage among tech bloggers: developing a custom platform. For example, Hague completed his with “269 lines of commented Perl” [23]. There is no question his codebase size is a small fraction of that of an off-the-shelf alternative like Jekyll. However, with the abundance of blogging solutions available, those 269 LOCs didn’t have to be written and will, presumably, be lightly used and tested. Nonetheless, running and maintaining your own bare-bones blogging platform may be easier than doing the same with WordPress.

The preceding one was a small selection of programs that are frugal by design, and while there are no claims of completeness anywhere in this book, its brevity reflects the paucity of examples. The industry has long moved decisively in the opposite direction. Programs compete on features, not on simplicity. Code bases have been growing for decades, with no end in sight [12].

1.3.5 User Interfaces

If APIs connect a program’s modules, user interfaces or UIs connect programs and people. It may not be surprising, then, that some of the observations that are valid for one (see Sec. 1.3.3) carry over to the other. For instance, Quip founders Bret Taylor and Kevin Gibbs wrote that “When designing a user interface, it’s much harder to remove something than to add in something new” ([222]; compare with Bloch’s guidelines for APIs in Sec. 1.2.8). Users may be more adaptable than programs, but while they can ignore a new feature, they can’t replace one that has been removed, which is an option for developers. The result is this constant pull towards complexity, which needs to be deliberately countered.

Regarding graphical UIs, Rian van der Merwe formulates principles modeled after Tufte’s data-ink for statistical graphics (see Sec. 1.2.10.2). These principles mandate that a good UI needs to show data of interest to the user, do it efficiently with the pixels available and avoid wasting pixels on anything else [223]. Remember skeuomorphic UIs, which tried to show, say, a calendar as if it were made of paper and leather? Or web 2.0 sites with only 48-point and larger fonts? Just the opposite. Looking at fake leather gets old quickly, and huge fonts spread the same information over many screens. A similar approach works for both statistical data and family pictures from a UI standpoint. If the user is after either, the screen should show more of those and less of everything else. The same is true for input buttons or fields or any UI elements.

A radical approach to UI design is using real-world objects instead of graphical elements. It has been adopted by Dynamicland, the brainchild of Bret Victor [224]. This system can track physical items’ positions, recognize text and other marks on their surface and project text and symbols onto them. The cameras and projectors are hidden in the ceiling. No code or data is needed to define UI elements’ appearance and behavior: users stick colorful labels on those they want the system to track. The learning curve is flatter as ordinary objects are already familiar. Compare this restrained approach with the goal of Mark Zuckerberg’s Meta: replacing the real world with a privately-owned artificial one. Its intrusiveness is matched only by the eye-watering development costs: $15B/year most recently and $100B94 over the project’s lifetime up to 2022 [225]. The only downside of ordinary objects is that they are susceptible to coffee spills, unlike their virtual counterparts.

Turner-Tauring loves proposing radical solutions, and UI is no exception. After he suggested we “drop the product” and become a “0.1x programmer” in another post (see Sec. 1.3.1), what could he recommend about the UI? Remove it [226]! He observes that user input isn’t a goal and that if we can help users achieve their aims without it, we should. For instance, we could provide a default value, guess a value from other inputs or just manage without.

1.3.6 Systems

FreeRTOS is a little operating system that reportedly fits in only three files of C code [227]. Before you dismiss it as a toy project, let me add that it was backed by a dedicated company that later passed its stewardship to Amazon AWS. It is used mainly for microcontrollers.95 When verifying this information, I found it somewhat outdated [228]. The whole project is over 160 KLOCs at the time of this writing, but most files seem related to porting to different hardware. If I got that distinction right, the core is below 30 KLOCs, comprising six C files and fewer than 20 header files. Still small, but not just three files.

A much more considerable expansion affected the UNIX kernel. About an early version, John Lions writes [6]:

Not least amongst the charms and virtues of the UNIX Time-sharing System is the compactness of its source code. The source code for the permanently resident “nucleus” of the system when only a small number of peripheral devices is represented, is comfortably less than 9,000 lines of code.

This quote comes from a book containing a full copy of that code: A Commentary on the Sixth Edition UNIX Operating System. It became the centerpiece of operating systems classes around the world in the late 70s. Operating systems with 10 or even 100 times more LOCs already existed then.

I lost track of what happened to this UNIX flavor, but it is easier to find information about an independent UNIX implementation, Linux. A book modeled after the Commentary but devoted to this operating system explains why they picked a fairly old version thereof [229]:

The current Linux kernel source code amount is in the number of millions of lines, the 2.6.0 version of the kernel code line is about 5.92 million lines, and the 4.18.X version of the kernel code is extremely large, and it has exceeded 25 million lines! So it is almost impossible to fully annotate and elaborate on these kernels. The 0.12 version of the kernel does not exceed 20,000 lines of code, so it can be explained and commented clearly in a book. Small but complete.

In 2021, the 5.11 release of the Linux kernel had around 30.34 million LOCs, and the expansion has accelerated since [230]. A great deal of this code inflation is due to the inclusion of device drivers in the kernel source, a unique arrangement that moves some of the burdens of driver maintenance to the kernel developers. However, it isn’t just the drivers. As we will see in Sec. 1.5, Systemd, the replacement for the init process, has passed the 1.3 MLOCs mark and has been criticized for this and other reasons. Luckily, Linux kernels are highly configurable, allowing the creation of minimalistic ones for underpowered devices, teaching purposes, or just for fun. Minimal Linux, for instance, is about 1.6 MLOCs [231].

We can find hints at frugality even in The Mythical Man-Month (see Sec. 1.1), hailing from a time when resource constraints dominated any other consideration (even if the book was published in the 90s). In it, Brooks calls for unity of design above feature completeness [5]:

It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.

It is a timid step toward frugality, but it is a start.

1.4 Hardware

I didn’t think about concise hardware until I ran into a keyboard described as follows [232]:

Concise and portable: Designed with ultra-thin, light weight, small size and concise outlook, [the] RK61 keyboard is handy, comfortable and portable to carry to [sic] everywhere for familiar typing in any room.

On second thought, it isn’t a particularly ascetic device but rather one subject to physical constraints. Any keyboard is a compromise between having more and larger keys and keeping them within easy reach or making the device portable, two physical constraints. Some keyboards sport dedicated media keys or an extra-large enter key; others tout their compactness and are even foldable.

A better example of ascetic hardware is the missing keyboard of the iPhone. Introduced by Steve Jobs as a single device to replace three distinct ones — a phone, a music player, and an internet-connected device — it crucially lacked the tiny keyboard of contemporary competitors in exchange for a much larger screen [233]. It was a bold step. John Markoff wrote in the New York Times [234]:

If there is a billion-dollar gamble underlying Apple’s iPhone, it lies in what this smart cellphone does not have: a mechanical keyboard.

The size and weight constraints smartphone designers operate under make this not a purely ascetic choice. Nonetheless, iPhone competitors quickly responded with devices combining sliding keyboards and large touchscreens. Despite this and the fact that on-screen typing is slower, mobile keyboards have gone extinct. Rumor had it that the iPhone’s few remaining buttons — all side buttons, with the home button long gone — were on the chopping block for the iPhone 13, but that turned out not to be the case [235]. Maybe a future iPhone will turn on when you look at it, or will simply always be on.

Another less well-known but relevant hardware example is the RISC architecture, short for Reduced Instruction Set Computer [236]. Just the name is ascetic: why would we want to reduce the instruction set when CPUs96 with rich instruction sets are plentiful? CPU designers must choose a trade-off between a rich instruction set, which allows accomplishing more computation in a single instruction, and a simplified one, which can be implemented more efficiently. With the help of compilers, interpreters, and emulators, the same programs can run on either type of CPU. From that point of view, the two architectures are equivalent, and therefore, in our jargon, we can see RISC as a concise architecture. Even before compilers were available, some liked programming with a smaller instruction set (see the IBM 650 in Sec. 1.6). When RISC was proposed in the 80s, despite evidence that it could work better, the dominant players in the industry were too invested in its opposite, CISC, for Complex Instruction Set Computer, to switch. RISC survived as an appropriate choice for low-power applications or where performance was more important than backward compatibility, for instance, in game consoles. One of those low-power applications was the smartphone, which quickly made RISC the most widespread CPU type. Now that Apple has chosen RISC for most of its product line, including the desktop, we may be approaching a day of reckoning for CISC. All CPUs today work under energy or power constraints, either because they are battery-powered or because of thermal requirements. In this context, the nimble RISC architecture has the potential to dominate.

1.5 Verbosity and Bloatware

While this book’s main focus so far has been on concise or frugal software, with its verbose and bloated counterpart mentioned only as a point of comparison, this section is devoted to the latter. Just as the bad guys are often more entertaining in fiction, “verboseware” and bloatware have inspired some juicy stories.

They also have some surprising supporters. Concise software has a bad reputation that goes back to the same Dijkstra who, as quoted in Sec. 1.1, warned about the threat represented by large programs [39]:

[…] a specific phenomenon occurs that even has a well-established name: it is called “the one-liners”. […] one programmer places a one-line program on the desk of another and […] adds the question “Can you code this in less symbols?” — as if this were of any conceptual relevance!

It is somewhat puzzling that he would not consider for a moment that there may be some relationship between reducing 10 LOCs to 1 and reducing 1 million to 100,000, even if different techniques apply at different scales. However, as we have seen in Sec. 1.2.4, his answer was to produce better-structured code, not less of it.

He isn’t the only instance of a Turing awardee defending verbosity in his Turing lecture. Recalling what he considered a youthful mistake, Hoare said [204]:

[For the successor to ALGOL60] my suggestion was to […] relax the ALGOL60 rule of compulsory declaration of variable names […] It was pointed out that the redundancy of ALGOL60 was the best protection against programming and coding errors […] The story of the Mariner space rocket to Venus, lost because of the lack of compulsory declarations in FORTRAN, wasn’t to be published until later.

Hoare seems to embrace the widespread belief that a missed hyphen in Mariner I’s control program caused its premature loss. An inquiry later established that the error was equally minute but in its specification, not in the program itself [237]. Nonetheless, the false belief stuck, as good stories have a way of doing. He continues:

I was eventually persuaded of the need to design programming notations so as to maximize the number of errors which cannot be made, or if made, can be reliably detected at compile time. Perhaps this would make the text of programs longer. Never mind! Wouldn’t you be delighted if your Fairy Godmother offered to wave her wand over your program to remove all its errors and only made the condition that you should write out and key in your whole program three times!

Unfortunately, variable declarations, let alone retyping entire programs, are a primitive way to spell-check variables. I haven’t seen any evidence that languages with this characteristic have a lower bug density, but there is plenty of evidence that they have more lines. Modern IDEs will catch misspelled and uninitialized variables; the declarative information is “vital” mostly because some compilers use it to allocate the correct amount of memory.
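As an illustration of the trade-off (my own example, not Hoare’s), here is how a misspelling plays out in Python, a language without compulsory declarations; a linter or IDE flags the unused name, much as a compiler requiring declarations would.

total_amount = 0
for amount in [10, 20, 30]:
  total_ammount = total_amount + amount  # typo: silently creates a new variable

print(total_amount)  # still 0; any decent linter warns that total_ammount is unused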

The warnings about the pitfalls of large programs have been around for a long time. The Unix Philosophy, as summarized by open source advocate Eric Raymond, includes this rule about large programs [238]:

Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do.

That rule hasn’t prevented modern versions of Unix from violating it. For instance, Systemd, a Linux program, is “an init system used to bootstrap user space and manage user processes,” which “also provides replacements for various daemons and utilities, including device management, login management, network connection management, and event logging” [239]. It exceeds 1.3 MLOCs and controversially centralizes several responsibilities previously dealt with by separate, smaller programs. Several notable Linux developers have criticized it, but major Linux distributions install it by default [240].

This trend is by no means limited to the Unix world, but, in its case, it is in stark contrast with its original philosophy. It affects the software industry as a whole, despite its negative consequences [12]. We have seen in Sec. 1.1 that the size of a program predicts, to some degree, the number of errors to be found in it and the probability of its failure as an engineering project. Yet, there are further ramifications, such as security-related ones.

For instance, consider signing onto web sites using your Facebook credentials (a feature known as single sign-on). The problem isn’t just creating a single point of failure for all of your authentication needs; it is trusting the security of Facebook authentication in the first place. Jason Polakis, an assistant professor of computer science at the University of Illinois at Chicago and an expert on single sign-on, puts it as follows [241]:

The codebase of these services is massive. You have different teams working on different components, and they can interplay in different ways, and you can have a crazy hack that no one expects.

He could have mentioned Facebook’s “move fast and break things” motto, which doesn’t bode well for security, or other reputation-busting activities that made the news. Instead, he focused on the enormous attack surface represented by its sprawling code base, a problem shared with all the other tech behemoths. Polakis was being interviewed on the occasion of a security breach at the social network, affecting 50 million users.

A similar, code-size-based explanation was proposed by Zulfikar Ramzan, CTO at a cloud security startup, for a well-publicized bug known as Heartbleed [242]:

[…] when I looked at the flaw myself I said, “Obviously, this is a pretty simple error.” This comes down to the issue that there’s so much code out there right now, and there’s so much code people are writing. There was a particular protocol called Heartbeat that did not get as much scrutiny.

To be precise, Heartbeat is an extension of TLS and DTLS, two protocols underlying secure communication on the web [243]. We can imagine a world without this feature but also free of the criminal activity enabled by its implementation errors. Indeed, before a patch became available, a recommended workaround was to turn it off. Whoever decided to add it and whoever, most likely inadvertently, introduced this bug will not foot the bill for said activity.

A company that may regret writing so much code is Boeing. It is building Starliner, a space capsule for crewed missions. After a bug was identified and corrected in its control software while in orbit, Boeing decided to review the whole code base for the project, as several additional bugs pointed to a development process failure. Unfortunately for them, there were already a million LOCs to review [244]. It would seem wise to check a company’s development process before reaching the one MLOC mark, let alone having a spaceship in orbit, but that didn’t happen in this instance. I can’t help but imagine a million LOCs written at the speed that some ascribe to this kind of project, namely, one LOC per programmer per day (see Sec. 1.2.1).

Talking about space exploration, I ran into this code comment, part of the container97 management software Kubernetes [245]:

PLEASE DO NOT ATTEMPT TO SIMPLIFY THIS CODE. KEEP THE SPACE SHUTTLE FLYING. This controller is intentionally written in a very verbose style.

The comments further dictate that, because it is written in this style, this section of the program may only be added to, never simplified. Reportedly, this style is used at NASA in some contexts to prevent bugs from occurring. Its rationale is that if the code lists all possible conditions in a cascade of ifs, then no condition will be unaccounted for, even if some branches do nothing. However, I don’t know if this approach is widely adopted at NASA or if there is any evidence it contributes to program correctness.

Can large code bases be responsible for crimes, or at least provide a mitigating circumstance for the people accused of those crimes? Eddie Tipton, the confessed mastermind of the largest lottery fraud in US history, apparently tried to suggest this idea. Speaking in court about the system running the lottery, he said [246]:

[The lottery code] just kept growing and growing, […] It became spaghetti codes [sic], unfortunately.

The jury wasn’t moved: he was convicted and sentenced to 10 years, which eventually added up to 25 in combination with other sentences. Rigging spaghetti code, albeit unpleasant, is still rigging.

Website operators and developers should be particularly sensitive to the size issue because web pages, with their associated media and code, have to travel over the wire to impatient users, who may move on rather than wait. Unfortunately, this is not the case. While many techniques have been adopted to accelerate page load, restraint in designing pages isn’t one of those. In an entertaining presentation, Maciej Cegłowski, a web entrepreneur and activist, details what he calls “the website obesity crisis” [247]. He compares the size of pages of popular websites to that of major works of literature. His prescription is drastic:

I want to share with you my simple two-step secret to improving the performance of any website.
Make sure that the most important elements of the page download and render first.
Stop there.
You don’t need all that other crap. Have courage in your minimalism.

He calls for less of everything other than human-readable text: fewer images and videos, less JavaScript, and fewer tracking tags. His blog’s home page is over 56% text; by comparison, the corresponding figure for cnn.com is 0.21% [248].

Typically, more planning and thinking go into designing a language than a website. Language designers should know that keeping their creations small makes them more accessible. On the other hand, faced with a long list of requirements, they may try to meet them at the cost of more complexity. That might be what happened to the Ada language, designed in response to a call for proposals by the Department of Defense. None of the existing languages had been found to meet the complex requirements for a general-purpose programming language for defense-related projects. The result wasn’t encouraging [249]:

[The developers of] early Ada compilers struggled to implement the large, complex language, and both compile-time and run-time performance tended to be slow and tools primitive.

Hoare took time during his Turing Award Lecture to criticize Ada, as well as PL/I and Algol 68, as too complex and therefore unreliable (apparently, being the awardee didn’t put him in a charitable mood) [204]. His experience with complex languages goes back to his work on designing a successor to Algol 60, during which he had to watch in disbelief as a simple, incremental proposal was sidelined in favor of a much more elaborate and, in his opinion, unfeasible design. Of this, he wrote:

[…] there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

It became, after many delays, the quickly abandoned Algol 68.

About Ada, which was designed to become the standard programming language for the Department of Defense, he warned:

Do not allow this language in its present state to be used in applications where reliability is critical, i.e., nuclear power stations, cruise missiles, early warning systems, anti-ballistic missile defense systems. The next rocket to go astray as a result of a programming language error may not be an exploratory space rocket on a harmless trip to Venus [see Mariner I above]: It may be a nuclear warhead exploding over one of our own cities. An unreliable programming language generating unreliable programs constitutes a far greater risk to our environment and to our society than unsafe cars, toxic pesticides, or accidents at nuclear power stations.

It is at least debatable whether this prediction has proven accurate. I can’t think of a software error that caused devastation even remotely comparable to that of the Chernobyl or Fukushima disasters, even if the MCAS-related accidents (see Sec. 1.6), for instance, resulted in more short-term casualties. However, by most accounts, they were not due to the programming language used.

Ada continues to be used in defense and other safety-critical applications, like flight control for aviation and space. Yet, its adoption rate pales compared to that of another problematic language, SQL, which is “by far the most used database language” [250]. The same reference contains a well-reasoned criticism of SQL but, for our goals, the following paragraph stands out:

SQL is not a small language. At the time of writing the PostgreSQL implementation contains 469 keywords. Just part 2 (out of 14) of the SQL:2016 standard has 1732 pages.

By comparison, JavaScript has 64 reserved words in its ECMAScript 5/6 standard [251]. Other issues with SQL make queries long and hard to read and inhibit query reuse (see Sec. 1.2.9.9 and Sec. 1.2.10.1). The inventor of the relational model, Edgar F. Codd, devotes a whole chapter in one of his books to SQL’s “serious flaws” [252].

If the DRY principle (see Sec. 1.2.3) guides Python and Ruby programmers, Java should be considered a WET language — for Write Explicitly Tenfold. We have already encountered its proverbial verbosity in comparisons to Clojure (see Sec. 1.2.9.1) and Scala (see Sec. 1.2.10.1). However, the most extreme example of irreducible verbosity must be the one in java.util.Arrays, a module of the Java standard library. One of its authors is the same Joshua Bloch who preaches the gospel of minimalistic APIs (see Sec. 1.2.8). In an older version, we could find the following fatalist comment [253]:

The code for each of the seven primitive types is largely identical. C’est la vie.98

Let us consider that this is as idiomatic Java as it gets, co-authored by three principal contributors to the development of the language. Apparently, Java has no abstraction mechanism to avoid this 7-fold duplication without a performance penalty, which isn’t acceptable in a foundational array library. More recently, the comment disappeared, but the repetitive code remains.

Verbosity isn’t just an inconvenience or a maintenance burden: it can kill entire subfields of computing. At least, this is what Jonathan Dursi puts forward in his post “HPC is dying, and MPI is killing it” [254]. He accuses MPI, a dominant library for High-Performance Computing, of forcing practitioners to a lower level of abstraction than most need. Moreover, compared to modern alternatives like Spark and Chapel, he shows that MPI requires over twice as much code while leaving fault tolerance to the developer.

Anecdotally, most engineers have had to deal with awful code in their careers, particularly of the verbose variety. Hacker News user conceptjunkie comes to a sobering conclusion [255]:

I don’t know about you, but I’ve rarely worked on a codebase that wasn’t awful in some way, and I am definitely not early in my career. I’ve come to the conclusion that most programmers are simply awful at their jobs, and the developers that can write clear, concise code are a small minority. I’ve known a few, but not many.

I hold a slightly less pessimistic view. It may be that too many people have rapidly entered this field for most of them to have a talent for or a deep interest in computing. However, terrible developers don’t need to be a majority to produce a much larger number of LOCs than their concise peers. Good, concise code is hidden among the KLOCs and MLOCs of verbose horrors.

Moreover, bad code begets bad code. For example, I once needed a particular time-related function, and I thought someone at a client of mine had likely written it before. I found related functions everywhere, from database-related modules to the UI. The search had already taken me longer than writing my own implementation would have. Should I have started a refactor to bring all these functions into one module while eliminating duplications, or just followed the most direct route to meet my deadline? In most companies, an engineer is not free to make such a decision. You can’t go to a team meeting and say that you won’t meet a deadline because you started a refactor of your own accord. You would be asked immediately: is this refactor a blocker? That is, is its completion a necessary condition to perform the original task? Of course it isn’t, because you can pile up a bit more technical debt,99 let the rot spread a little and meet your deadline. Indeed, you are required to.

This experience brings up a final point: the dominant incentive structure isn’t aligned with quality. Development these days is fragmented into a sequence of self-inflicted emergencies. Invented deadlines and the jargon used by management (“backlog”, “sprint”) are meant to create pressure. It is like trying to win a marathon by going as fast as possible for the first 100 yards, then again and again. It is a technical debt factory. We know the aphorism: “quality, time or resources, pick two.” Since individual contributors don’t have access to additional engineering resources, and time is set by management or the fastest, least scrupulous coworker, quality has to yield. It becomes a race to the bottom, doing the least for a new feature to pass equally rushed tests, product, and code reviews. Week after week, a developer will be called to work on someone else’s awful code, and someone else will mess up whatever good there is in this developer’s own. It is irrational to expect good work to be performed in this environment. It is inevitable that conciseness, which is seldom even recognized as a quality factor, is one of the victims.

1.6 Code Considered Harmful

Some engineers react to the realities of horrible code by trying to write good code, including eliminating verbosity. Others take a more pessimistic position, denying the possibility of writing concise code or using features appropriately. Their approach is more radical: elimination.

Developers likely recognized, in this section’s title, a play on “Go To Statement Considered Harmful,” an essay in which Dijkstra criticized the goto statement, which causes a program to continue at a line identified by a number or a label [256]. Dijkstra endeavored to rewrite programs without the goto and reported that “In all cases tried […] the program without goto statements turned out to be shorter and more lucid” [257]. It is the prototype of all attacks on language features and the only one to succeed to a great extent. Indeed, many modern languages don’t support this construct. This omission can be worked around at the cost of a modest increase in code length or by using alternative instructions such as break or return. That goto isn’t needed is also supported by the Böhm-Jacopini theorem, which shows that any program can be rewritten without goto ([258]; see also [259]). More than the statement itself, it is noteworthy that its proof builds a goto-free version of a program via incremental modifications, rather than just claiming Turing equivalence.100 In other words, ridding programs of gotos doesn’t require rewriting them from the ground up. The main argument against the goto is that it made programs hard to understand by allowing a combinatorial explosion of the possible execution paths through the code. Well-known voices expressing qualified support for the goto are those of Donald Knuth [260], author of The Art of Computer Programming, and Torvalds [261]. Both find that it has applications where it seems the most appropriate construct. I have seen excellent programs written with goto as the only available control statement and terrible ones written without a single goto. Indeed, it is a feature prone to abuse, but many other dangerous features are defended based on their power and under the optimistic assumption that only competent people will use them. I don’t miss the goto, but I feel like a double standard was applied to it.
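To give a flavor of what a goto-free rewrite can look like (a toy sketch of mine, not the construction used in the theorem’s proof), arbitrary jumps can be emulated with a single structured loop and an explicit state variable:

def collatz_steps(n):
  # one loop plus a "program counter" stands in for goto-style jumps
  steps, state = 0, "test"
  while state != "done":
    if state == "test":
      state = "done" if n == 1 else "step"
    elif state == "step":
      n = n // 2 if n % 2 == 0 else 3 * n + 1
      steps += 1
      state = "test"
  return steps

print(collatz_steps(27))  # 111

The price is exactly the kind of modest length increase mentioned above, plus a little bookkeeping.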

On the evidence that suppressing the goto didn’t solve the ills of software development — but it may have mitigated them — the scapegoating of language features could have fallen out of favor, but it hasn’t. There is a budding effort to discourage the use of the if statement. Francesco Cirillo, the consultant who invented the Pomodoro Technique, has penned an “Anti-If Manifesto” [262]:

Have you ever wondered how IFs impact on [sic] your code? Avoid dangerous IFs and use Objects to build a code that is flexible, changeable and easily testable, and will help avoid a lot of headaches and weekends spent debugging!

I understand this is an effort to replace ifs with method dispatch,101 substituting the implicit checking of types for the explicit checking of conditions. Equivalently, it moves the responsibility for type checking from the caller to the callee, an instance of what Bloch recommends for APIs (don’t let the user do what the API can take care of, see Sec. 1.2.8). I am less clear why this is framed as an attack on the if statement rather than a pitch for object-oriented programming and, specifically, polymorphism102 or whether Cirillo writes or foresees programs free of ifs.
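As a small illustration (mine, not Cirillo’s) of the transformation I understand the manifesto to advocate, in Python:

# before: the caller checks a type tag explicitly
def area_with_ifs(shape):
  if shape["kind"] == "circle":
    return 3.14159 * shape["radius"] ** 2
  elif shape["kind"] == "square":
    return shape["side"] ** 2

# after: each class knows its own area; method dispatch replaces the ifs
class Circle:
  def __init__(self, radius):
    self.radius = radius
  def area(self):
    return 3.14159 * self.radius ** 2

class Square:
  def __init__(self, side):
    self.side = side
  def area(self):
    return self.side ** 2

print(area_with_ifs({"kind": "circle", "radius": 2}), Circle(2).area())  # same result

Adding a new shape then means adding a class rather than editing every conditional that inspects the type tag.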

While if fights for its life, the detractors of class inheritance,103 which is closely related to polymorphism, are building up their ranks [263]. As language evolution continues to be based on a popularity contest, as opposed to hard evidence, we miss opportunities to make actual progress.

If sending individual features into oblivion isn’t enough, why not reduce code to fragments? This is more or less what Robert C. Martin proposes in his best-seller Clean Code [264]:

The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that.

There is only one way to fulfill this requirement: limiting every function to one line — I considered the zero-line solution but ran into mathematical difficulties. It likely isn’t meant to be taken too literally because, elsewhere in the book, the ideal function length is set at two to four lines. Either way, the precept is absolute and independent of any other consideration.

You may be writing this madness off as a necessary hyperbole to make a book sell, but that would be premature. I once reviewed a 60-KLOC code base that approached this ideal closely. Function names were summaries of their own minuscule bodies, and formal and actual parameters had the same names. I expect this to be hard to imagine even for battle-hardened programmers, so let me clarify with an example (not from the actual code). A statement like table2 = join("table0", "table1") would be transformed into:

def join_table0_table1(table0, table1):
  return join(table0, table1)

table0 = "table0"
table1 = "table1"
table2 = join_table0_table1(table0, table1)

As a result, no function was invoked twice, ever. If every sequence of more than two LOCs needs to be broken down into separate functions, the reusability of these functions can’t be a priority anymore. To add insult to injury, every constant was assigned to a variable named after its value because of another well-known software engineering guideline according to which hardwired constants are to be avoided. This code was being maintained at a Fortune 500 company, not a fly-by-night operation.

Some put the blame for code verbosity squarely on developers. For example, Tyler Treat, a developer himself by training, writes [265]:

[…] you are not paid to write code. You have never been paid to write code. Indeed, code is a nasty byproduct of being a software engineer.

These statements must sound outlandish to whoever makes a living writing code and are possibly explained by Treat having reached the level of “managing partner,” several levels above the coding trenches. The decisions about whether to write code or not aren’t made by developers in most contexts. A developer not producing code is considered an idle one. For instance, news outlets reported that Twitter engineers, facing a review in preparation for massive layoffs, had to provide management with 50 pages of code produced during the last 30 days, or roughly 90 LOCs per day — a “nasty byproduct” of trying to keep their jobs (and 18 times faster than Kamp wrote Varnish, see Sec. 1.2.1) [266].

A piece of code that turned out to be harmful, even deadly, was the infamous MCAS, a patch in the control software of the new Boeing 737 Max necessitated by changes in engine size and position [267]:

After weighing many possibilities […] Boeing decided to add a new program — what engineers described as essentially some lines of code — to the aircraft’s existing flight control system to counter the destabilizing pitching forces from the new engines.
That program was M. C. A. S.

The new code, together with hardware flaws, resulted in the total loss of two airplanes, with all 346 passengers and crew. It may look like a concise addition to the existing control software — just a few LOCs. However, an ill-advised addition it was: a plane initially designed with intrinsic stability and ultimate pilot control over any electronic aid was modified into one with a computer override in a crucially unstable condition ([268], [269]). As an additional risk factor, the changes had to be downplayed to convince clients and regulatory authorities that the new airplane was essentially the same as the previous 737, sidestepping expensive retraining and certification requirements. Aircraft developed more recently, both by Boeing and its main competitor, Airbus, are designed for computer control. As a result, they didn’t incur any of the software problems of the 737 MAX.

Going back to Brooks and the early days of computing, he discusses the delicate balance between adding features and adding complexity, both for the user and the developer. A system that he is particularly fond of is the IBM 650 ([5]; see Fig. 1):

I am haunted by the memory of the ease of use of the IBM 650, even without an assembler or any other software at all.

You got it right: his favorite system was software-less, bare hardware. Later, it received an assembly language and other software, possibly spoiling the user experience.

Knuth got his computing career started working on the same model. He, too, has fond memories of it [270]:

[…] There was something special about the IBM 650, something that has provided much of the inspiration for my life’s work. Somehow this machine was powerful in spite of its severe limitations. Somehow it was friendly in spite of its primitive man-machine interface. […] The 650 had only 44 operation codes, while the 709 had more than 200 [see Sec. 1.4]; yet I never enjoyed coding for the 709, because I never seemed to be able to write short and elegant programs for it — the opcodes didn’t blend together especially well. By contrast, it was somehow quite easy and pleasant to do complex things on the 650 with very few instructions.

1.7 Methodologies

In a time of proliferating and ever more irrational software methodologies, it is likely for the better that there aren’t ascetic programming workshops, books, and expensive ascetic consultants to promise unrealistic results and to dodge responsibility when they aren’t achieved. However, on a small scale, there are initiatives related to this book’s ideas.

Atwood points us to Spartan Programming, which he finds particularly close to his programming style [271]. This methodology consists of a long list of concrete programming guidelines. These can be described as the simultaneous minimization of various code-related quantities, from the number of lines to the number and scope of variables. Minimizing all those measures may result in trade-offs between them, but the guidelines don’t explain how to find a compromise. Unfortunately, the original Spartan programming description is now a dead link, but you can automatically apply Spartan programming to Java code using Spartanizer, an online service [272].

The take of suckless.org is much more practical [273]. They develop concrete software “with a focus on simplicity, clarity and frugality.” Their philosophy can be summarized as “keeping things simple, minimal and usable.” They host the development of several projects, including Dwm (see Sec. 1.2.11), a window manager with a 2-KLOC upper bound on code length, and st (see Sec. 1.3.4), a simplified terminal emulator. They claim that “the tendency for complex, error-prone and slow software seems to be prevalent in the present-day software industry.” Surprisingly, their intended audience is “advanced and experienced computer users,” whereas it seems reasonable that their emphasis on simplicity would be most appealing to beginner users. Their notion of simplicity seems to focus on objective measures such as the number of features or LOCs:

The more code lines you have removed, the more progress you have made. As the number of lines of code in your software shrinks, the more skilled you have become and the less your software sucks.

Also related is the slow code movement, obviously inspired by the slow food movement. It has a great slogan: “Slower. Less code. But better” [274]. Its website presents an anecdote on solving a seemingly complex problem with two LOCs after an appropriately extended period of analysis, but that is about all there is. Being a slow movement, we are still hoping it will eventually blossom.

2 Science and Engineering

Figure 8: A Caudron Simoun similar to the one flown by Antoine de Saint-Exupéry [275],104 see Sec. 2.3.

2.1 Epistemology, Statistics and Machine Learning

2.1.1 The Parsimony Principle

Ockham’s razor is a well-known parsimony principle in philosophy [276]:

Pluralitas non est ponenda sine necessitate.

(Plurality should not be posited without necessity). William of Ockham, a medieval friar, theologian, and philosopher, used it in several of his manuscripts for different purposes, the most important of which was to conclude that “God’s existence cannot be deduced by reason alone” [277]. However, this principle, despite its name, did not originate with him. It was well-known in Scholastic philosophy and can be traced back all the way to Aristotle. He wrote: “We may assume the superiority, other things being equal, of the demonstration which derives from fewer postulates or hypotheses” [278]. In more modern times, we find it restated in several variants, for instance, by Isaac Newton: “We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances” [277].

This bias for simplicity continues in modern statistics and data analysis. For example, John Tukey, a statistician who co-invented the Fast Fourier Transform algorithm and invented the boxplot, wrote [279]:

The importance of parsimony in data analysis can hardly be overstated. By parsimony we mean both the use of few numerical constants and also the avoidance of undue complexity of form in summarizing and displaying. The need for parsimony is both esthetic and practical.

It isn’t just an experience-based statement. For instance, well-known model selection methods, such as the Akaike Information Criterion and the Bayesian Information Criterion, call for the minimization of a quantity that includes a term proportional to the number of parameters in a model. In particular, when the fit to the data is equally good, both methods will choose the model with fewer parameters.
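For reference, and not tied to any particular source cited here, the standard definitions are as follows, with $k$ the number of estimated parameters, $n$ the number of observations, and $\hat{L}$ the maximized likelihood:

$\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}.$

The penalty terms $2k$ and $k\ln n$ are the parts proportional to the number of parameters.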

Not everyone agrees. For instance, statistician and social scientist Andrew Gelman writes [280]:

A lot has been written in statistics about “parsimony” — that is, the desire to explain phenomena using fewer parameters — but I’ve never seen any good general justification for parsimony. (I don’t count “Occam’s Razor,” or “Ockham’s Razor,” or whatever, as a justification. You gotta do better than digging up a 700-year-old quote.)
Maybe it’s because I work in social science, but my feeling is: if you can approximate reality with just a few parameters, fine. If you can use more parameters to fold in more information, that’s even better.
In practice, I often use simple models — because they are less effort to fit and, especially, to understand. But I don’t kid myself that they’re better than more complicated efforts!

Gelman sees parsimony as specific to physics, not as a general statistical or epistemological principle [281].

These two points of view aren’t compatible. I initially thought this could be explained by the existence of two major schools of thought in statistics, frequentist and Bayesian, with Gelman an exponent of the latter [282]. Broadly speaking, in frequentist statistics, the more parameters a model has, the better it fits the data. The statistician needs to actively counter this tendency to prevent overfitting, which is when a model fits the noise as well as the signal, undermining its use, for instance, in prediction or parameter estimation. Intuitively, such a model “memorizes” past data, as opposed to capturing its regularities. In Bayesian statistics, all parameters are assumed to be drawn from a probability distribution, the prior, either because they are stochastic or to represent our preexisting state of knowledge or ignorance about them. This fact, in and of itself, is an antidote to introducing too many parameters. Let’s see how this works with an example of Bayesian model comparison [283]. $M_1$ models a fair coin with equal probability for heads or tails and no parameters. $M_2$ models a biased coin, with an unknown amount and direction of its bias, represented by a parameter also assumed to be a random variable. The probability of $M_2$, conditional on the data, is obtained by integrating over the possible values of its parameter, weighted with their prior probability. With a peanut butter analogy, $M_2$ is “spread thinner” than $M_1$ because all probability distributions are normalized to one. With a betting one, $M_1$ makes a sharp bet on a single parameter value, 1/2, whereas $M_2$ keeps its options open. In frequentist statistics, the bias parameter would be considered an unknown but not a random variable. It could be set to 1/2 if the data supported that, and there would be no way $M_1$ could fit the data better than $M_2$: the latter “contains” $M_1$ as a special case. Frequentists aren’t fools and have developed ways to fix that. However, for Bayesians, the solution is built into their approach. Wikipedia concludes:

$M_2$ is a more complex model than $M_1$ because it has a free parameter which allows it to model the data more closely. The ability of Bayes factors to take this into account is a reason why Bayesian inference has been put forward as a theoretical justification for and generalisation of Occam’s razor […].
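To make the coin example concrete, here is a minimal Python sketch of the Bayes factor (my own illustration, assuming a uniform prior on the bias and a specific sequence of 6 heads and 4 tails):

from math import comb

def likelihood_fair(heads, tails):
  # M1: a fair coin assigns probability (1/2)^n to any specific sequence
  return 0.5 ** (heads + tails)

def marginal_likelihood_biased(heads, tails):
  # M2: unknown bias with a uniform prior; integrating
  # theta**heads * (1 - theta)**tails over theta in [0, 1]
  # gives 1 / ((n + 1) * C(n, heads))
  n = heads + tails
  return 1.0 / ((n + 1) * comb(n, heads))

heads, tails = 6, 4
print(likelihood_fair(heads, tails) / marginal_likelihood_biased(heads, tails))
# about 2.26: the data mildly favor the simpler model

With a 6-4 split, the sharp bet of $M_1$ pays off; a more lopsided outcome would be needed before $M_2$ overtakes it.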

Therefore, while the parsimony principle may only be implicit in Bayesian statistics, it is compatible with it. Jefferys and Berger agree [284]:

Ockham’s razor, far from being merely an ad hoc principle, can under many practical situations in science be justified as a consequence of Bayesian inference.

In the coin example, though, the prior probabilities of models $M_1$ and $M_2$ were arbitrarily set to the same value. Additionally, we glossed over the choice of prior for the bias parameter in $M_2$. These choices are mainly left to the statistician or researcher. This has often been cited as making Bayesian statistics inadequate for science, as conclusions are too dependent on the choice of prior — that is why its dominant variant is called subjective Bayesianism [285]. Bayesians reply with several counter-arguments, from the theoretical difficulties affecting frequentism to the existence of uninformative priors, which, at least in simple cases, lead to a convergence of Bayesian and frequentist results.

2.1.2 A Universal Prior

The question of the prior could be solved if we could find something like a universal prior. It would function as a proto-belief, the prior that predates all data collection and thus solves the Bayesian conundrum. Such a concept exists in a subfield called algorithmic probability [286]. As the name suggests, this is a research domain at the intersection of computer science and statistics. The foundational step is to consider statistical models as programs, not just because it is practical to run them on computers rather than with pencil and paper, but also because of advantages at a conceptual level. Traditionally, statistical models take the form of mathematical expressions, and it seems impossible even to contemplate the set of all possible models. Philosopher Elliott Sober states [276]:

Unfortunately, the idea that we can now enumerate the set of all possible future scientific theories is illusory.

Suppose we instead confine ourselves to computable models, that is, those that can be expressed as terminating programs. In that case, even if there is an infinite number of them, each program is a finite object; therefore, they can be enumerated, at least in theory — their number grows exponentially with size. Moreover, bounding the size of a program-as-model provides a simultaneous limit on its parameters’ number and magnitude (or precision). This limits its complexity in a way agnostic to its nature (say linear vs. non-linear). An objection to the generality of this approach may be that even a simple linear model $y = ax$ may not be computable. It may be surprising, but most real numbers are uncomputable; that is, no computer can produce their digits to an arbitrary precision [287]. If the parameter $a$ in $y = ax$ were uncomputable, the model itself would be too. However, since all measurements result in rational numbers, it is debatable whether we will ever have to deal with such an uncomputability barrier, or if it will forever remain a potential problem. Indeed, there are quantum-mechanical and information-theoretic arguments against the possibility of infinite-precision measurements [288].

It is possible to define a probability distribution over the set of computable models that we can use as a prior before any data becomes available. For a program of size $L$,105 its a priori probability is proportional to $2^{-L}$. In other words, a model that is one unit larger than another is considered half as likely. Just the fact that we are able to proceed with this definition is progress, but we shouldn’t assume that reality bends to fit mathematics. Nonetheless, for the sake of this book’s thesis, all we need to observe is that all alternative priors share the property that models represented by large programs are, in an aggregate sense, less likely than those represented by small ones. This result can be stated more precisely: for every $\epsilon>0$ there is an $L$ such that all models larger than $L$, in aggregate, have probability less than $\epsilon$. Therefore, no matter how we tinker with the prior, there is no way to rigorously state “prefer large models over small ones.” Let’s not overstate what this theorem means: it doesn’t lead to a unique choice of prior.106 Ray Solomonoff, one of the inventors of algorithmic probability, concedes as much in a relatively recent paper where he says he has come to think of this as a strength rather than a liability [289]. However, no prior can systematically prefer larger models over smaller ones. There is no way to escape parsimony, at least in this aggregate sense.
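As a toy numerical illustration of the aggregate claim (with a simplification of mine: one weight per description length rather than one per individual program), the combined prior weight of all models above a length threshold vanishes quickly:

def tail_weight(threshold, max_length=1000):
  # weights proportional to 2 ** -L for lengths L = 1 .. max_length
  total = sum(2.0 ** -L for L in range(1, max_length + 1))
  tail = sum(2.0 ** -L for L in range(threshold + 1, max_length + 1))
  return tail / total

print(tail_weight(10))  # roughly 0.001: long models are collectively improbable
print(tail_weight(20))  # roughly 1e-06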

The next step is assuming that the process to be modeled is itself computational; that is, a program can simulate it. We could equivalently state that theories of the natural world can be expressed in a computable way without loss. This is a bit more of a leap of faith and I refer you to the work of Jack Copeland, Mark Sprevak and Oron Shagrir, where the question of whether the Universe is computational is addressed [290]. They examine several alternative answers: the Universe could be a computing device, albeit an enormous one; or, while non-computational, it could be effectively approximated by such a device; or, at least in some reaches we haven’t explored yet, it could be impervious to computational modeling. Either one of the first two answers allows us to proceed with our argument.

Now that we have a prior on programs which, hopefully, can effectively model Nature, deep mathematical results are possible, including the derivation of a prediction algorithm with an error bound depending only on the Kolmogorov complexity of the process to be modeled. Let’s recall that if such a process produces a certain sequence of observations $x$, its Kolmogorov complexity is defined as the length of the shortest program107 able to reproduce $x$ (see Sec. 1.2.11.1). Imagine a random $x$ generated by fair, independent coin tosses: the only way to reproduce $x$ is to store it toss by toss. Hence its length and its Kolmogorov complexity are similar in magnitude. At the opposite end, imagine $x$ is a run of $\lvert x\rvert$ zeros. It is easy to build a short program repeating 0, at least if by “short” we mean $O(\log \lvert x\rvert)$.108 The Kolmogorov complexity of these observations is much smaller than in the previous case. It is intuitive that a prediction method would perform better on the repetitive sequence as opposed to the random one, and the above error bound formalizes that. The universal prior has other interesting properties that we can’t cover here, and its presentation has been quite informal; therefore, I encourage you to check the sources. Unfortunately, it isn’t appropriate for practical applications, or at least not yet. There are related approaches that are more practical and also rely on short descriptions of data and models, as we will see in the next section.
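Before moving on, an informal Python sketch of the two extremes just described (an illustration, not the formal definition):

def all_zeros(n):
  # a tiny program whose only "data" is the number n, about log2(n) digits
  return "0" * n

print(len(all_zeros(1000)))  # 1000 characters from a three-line program
# A specific random sequence of 1000 coin tosses, by contrast, admits no such
# shortcut: a program reproducing it essentially has to contain it verbatim.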

2.1.3 Minimum Message Length

The relationship between the size of statistical models and their performance is at the center of a model selection method known as Minimum Message Length [291]. Imagine we need to communicate a dataset over a transmission line and decide to send it in two parts: an encoding of a statistical model and then a separate encoding of the data, conditional on the choice of model, for instance, encoding only the error in the model’s predictions. Plug into this the Shannon-Fano code, which assigns a code length of $-\log P(x)$ to a code word $x$, and it turns out that minimizing code length is equivalent to maximizing the posterior probability $P(M)\,P(D\mid M)$, where the first factor is the probability of the model and the second is the probability of the data given the model. Up to this point, this may look like a Bayesian take on model selection.109 However, the formulation in terms of encodings means that the models have to be expressed finitely, for instance, with finite-precision numbers instead of real numbers, creating a bridge with algorithmic probability (see the previous section). This relation is explored in depth by Wallace and Dowe [291] and Vitányi and Li [292]. Here, though, the class of models isn’t necessarily the class of all computable models but can be defined more flexibly in a problem-dependent fashion. Thus, we could restrict the space of models to finite-precision linear models and apply Minimum Message Length to choose among competing ones, giving up on any universality claims but, at the same time, obtaining a more practical method. Vitányi and Li go so far as to state:

It is widely believed that the better a theory compresses the data concerning some phenomenon under investigation, the better we have learned, generalized, and the better the theory predicts unknown data. This belief is vindicated in practice and is a form of “Occam’s razor” paradigm about “simplicity” but apparently has not been rigorously proved in a general setting. Here we show that data compression is almost always the best strategy, both in hypotheses identification by using an ideal form of the minimum description length (MDL) principle and in prediction of sequences.

The Minimum Description Length principle is an independently developed relative of Minimum Message Length which tries to free itself from the dependence on the availability of a prior [293]. The “ideal form” mentioned above is a bridge to the algorithmic probability of the previous section. These different lines of research point to the discretization of statistical models and the adoption of size, of either concise programs or concise codes, as a means to control complexity and, as a consequence, ensure good statistical performance.
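Going back to the two-part message, here is a schematic Python sketch of how the comparison works (the probabilities are made-up placeholders, not the output of a real model):

from math import log2

def message_length_bits(p_model, p_data_given_model):
  # Shannon-Fano lengths: bits for the model plus bits for the data given it;
  # minimizing this sum is the same as maximizing P(M) * P(D|M)
  return -log2(p_model) - log2(p_data_given_model)

simple_model = message_length_bits(0.5, 0.010)    # about 7.6 bits
complex_model = message_length_bits(0.05, 0.015)  # about 10.4 bits
print(min(simple_model, complex_model))           # the shorter message wins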

2.1.4 Deep Learning

Neural networks, a biologically inspired class of statistical models, have gone through different phases of enthusiasm followed by disillusion as researchers explore their possibilities and limitations. In the 2000s, one of the growth phases started when it became possible to train (fit, in more traditional statistical jargon) substantially larger networks than before, dubbed deep neural networks because of their structure organized in many layers [294]. These new systems broke performance records on several established AI benchmarks related to vision and language tasks. There was a consensus that scaling to larger networks and data sets was the most direct path to improved performance [295] and even general intelligence [296], a level at which man and machine become indistinguishable.

After a phase of rapid progress and exuberant optimism, we are now in one of diminishing returns. Autonomous driving is perennially a few years away from fruition. Google has been going at it for 12 years, yet its cars still struggle to make left turns in ideal conditions [297]. Moreover, machine learning systems are brittle: they fail under small changes in their statistical environment or on examples deliberately crafted to “trick” them into giving wrong answers, so-called adversarial examples [294]. The best neural nets for natural language processing, known as language models, were trained on entire crawls of the internet, vast collections of digitized books, and more. It isn’t far-fetched to say that they ingested everything there is to read, yet their performance is uneven: superhuman on some tasks, disappointing on others [298].

One of the reasons may be that researchers focused too much on statistical performance at all costs, on achieving the coveted SOTA (State Of The Art, the best performance on a benchmark). Throwing more data and neurons at the problem was a reliable way to improve performance, but the rate of progress got overlooked. After observing a decade of progress, Sun and co-authors concluded that “performance increases logarithmically based on volume of training data” [299]. Otherwise stated, an additive improvement requires order-of-magnitude larger datasets and models and two orders of magnitude more computing power, as each data point needs to be processed by a proportionally larger network. The next step is multi-modal systems that process text, images, sound and video, and are trained to perform many disparate tasks [300]. Unfortunately, few well-funded labs can pursue this research, as enormous computational resources are needed.

Two major innovations that led to the current renaissance of neural nets are convolutional nets and deep nets, which, for the same size, show better performance than their non-convolutional and shallow counterparts. Paradoxically, ideas at the origin of the current size race are ways to squeeze more performance out of a system of a given size. Some researchers are starting to question the consensus and finding that smaller networks (still in the 10s of billions of parameters) trained on more data under a fixed computational budget perform better than some of the previous champions [301]. Others are looking again into data efficiency [302].

Deep neural networks seem as far as possible from the concept of parsimony. They include hundreds of billions of adjustable parameters, a large number even considering the giant data sets on which they are trained. Theory and experiments show they should overfit the data but, in practice, they don’t, and researchers are trying to figure out why [303]. In one case, they created abridged versions of large networks without significantly changing their performance [304]. This wasn’t done for practical reasons, such as deployment onto mobile devices [305], but to prove that the larger networks provided good predictions, like their smaller versions. We could say that the parsimonious models were hiding inside the verbose ones.

2.1.5 Extreme parsimony

Ockham’s razor warns against complex explanations beyond what is necessary (see Sec. 2.1.1). Different thinkers and researchers have provided different definitions for “complex” and “necessary,” resulting in various principles and methods. However, an ancient philosopher, Parmenides, stands out for simply dropping the “necessary” part of the razor and reducing his theory of everything to a bare minimum of complexity. He embraced a tautology — “whatever is is, and what is not cannot be” — to the extent that he denied the existence of anything other than an undifferentiated being and considered all perception an illusion [306]:

For this [view], that That Which Is Not exists, can never predominate. You must debar your thought from this way of search, nor let ordinary experience in its variety force you along this way, (namely, that of allowing) the eye, sightless as it is, and the ear, full of sound, and the tongue, to rule; but (you must) judge by means of the Reason (Logos) the much-contested proof which is expounded by me.

Ancient and modern philosophers provided many interpretations of Parmenides’ thought, which we know only through fragments of his writings or indirect sources [307]. Among those, the most appealing to me is the most extreme, known as strict existential monism. According to it, only one entity exists, while every distinction, every difference, and becoming itself must all be an illusion. I associate this barebones ontology with other expressions of extreme minimalism, some of which appear in this book: the empty software module (see noop2 in Sec. 1.2.2), the paintings showing a single square (see Malevich’s squares in Sec. 3.2.3), and the silent piece of music (see Cage’s 4’33” in Sec. 3.3). As for those, it is possible to suspect Parmenides’ philosophy of being a prank for intellectuals. However, his current standing as the father of ontology 2500 years later suggests the opposite.

2.2 Mathematics

Being the first to devise a proof for some important conjecture or open problem provides the most recognition and notoriety, independent of its length. Take Andrew Wiles’s proof of Fermat’s last theorem [308]. This theorem has a deceptively simple formulation. We know there are right triangles with integer-length sides, for instance, 3, 4, and 5, and, by Pythagoras’ Theorem, $3^2+4^2=5^2$. Since mathematicians like to try to generalize simple concepts, the problem was expanded as follows: are there positive integer solutions to $a^n+b^n=c^n$? Solutions for $n=1,2$ were known since antiquity, but none for $n\ge3$. Fermat stated he had proved that none existed. His proof being lost or maybe only imagined, his statement stood as a conjecture to the attention of the greatest mathematicians for centuries, but progress was halting: Fermat himself proved the $n=4$ case, which implies further work need only focus on prime exponents.110 Then $n=3,5,7$ fell, followed by entire classes of primes. However, the general proof was considered untouchable by most mathematicians. Wiles’s solution took years to put together while also building on voluminous work by other mathematicians, who had created bridges between distant fields in mathematics. When first announced, it took a three-day workshop to explain; when published, after some errors were found and corrected, a full issue of the Annals of Mathematics. It is no lightweight proof, but the problem is solved, and prizes and honors have been bestowed on Wiles. Nonetheless, other mathematicians have already started simplifying it. This type of work is also valued, albeit not as celebrated as the first proof of an original result. For instance, reporting on the award of the Fields Medals, the highest recognition in Mathematics, to Peter Scholze, Kenneth Chang writes [309]:

Dr. Scholze gained prominence when he was still in graduate school in 2010, simplifying a complicated book-length, 288-page proof to a novella-size 37-page version.

That more svelte proof did not earn Scholze his Fields Medal, though. Instead, it was his work at the intersection of number theory and geometry, where he made several original contributions that spurred new lines of research. Novelty and fertility beat conciseness.

If long proofs are acceptable, is there a limit? When is a proof too long to be checked accurately? One case stretching the boundary is that of the four-color theorem. It is another deceptively simple statement, this time about coloring maps. Specifically, we are concerned with political maps and the assignment of colors to each state or country. It is required that no two bordering states be assigned the same color. Corner contact such as that between Utah and New Mexico doesn’t count. The theorem states that four colors are enough for any map, real or imagined. Kenneth Appel and Wolfgang Haken published a hybrid proof, with a section written by them and another generated by a program [310]. It wouldn’t have been feasible for humans to perform an exhaustive check on the thousands of cases to which the original statement was reduced. Not only did such a proof create a certain amount of controversy, but it also managed to conceal some errors for at least four years. The authors finally rectified these errors in a book, including a computer-generated 400-page appendix. Flawed proofs weren’t unprecedented for this problem: two previous ones, of a more traditional kind, had each withstood scrutiny for 11 years before unrecoverable errors were discovered. The controversy about the recent proof was partly because checking it was beyond human ability. Indeed, mathematicians directly verified the program that wrote the more repetitive part of the proof. Therefore, why not publish that program instead of its 400-page output? I guess that would have been an even more radical innovation. Moreover, some mathematicians find such a verbose proof doesn’t help in understanding why the theorem is correct, a somewhat vague complaint. As mathematician and pioneering computer scientist John von Neumann said: “In mathematics you don’t understand things. You just get used to them,” recognizing that modern mathematics is past a level where intuition is helpful [311].
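
Checking whether one particular map can be four-colored is, by contrast, well within reach of a short program. The following sketch is my own illustration, in no way the Appel and Haken program: it tries colors by backtracking on a toy map given as an adjacency list, where corner-only contacts, like Utah and New Mexico, are simply not listed as borders.

    # Backtracking search for a four-coloring of a small map.
    def four_color(borders, colors=4):
        regions = list(borders)

        def assign(i, coloring):
            if i == len(regions):
                return coloring
            region = regions[i]
            for c in range(colors):
                if all(coloring.get(neighbor) != c for neighbor in borders[region]):
                    solution = assign(i + 1, {**coloring, region: c})
                    if solution:
                        return solution
            return None

        return assign(0, {})

    # The four states meeting at the Four Corners; only true borders are listed.
    print(four_color({"UT": ["CO", "AZ"], "CO": ["UT", "NM"],
                      "NM": ["CO", "AZ"], "AZ": ["UT", "NM"]}))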

The four-color theorem is neither unique nor the record holder for size. Wikipedia maintains a list of long proofs, with the worst offender estimated at between 10,000 and 20,000 pages [312]. I don’t know whether this count covers a single proof or also includes the independent results used as stepping stones.

The prolific mathematician Paul Erdős believed the most elegant and concise proofs come from a mythical book he referred to as The Book. He reportedly said: “You need not believe in God, but, as a mathematician, you should believe in The Book.” Its earthly materialization came into existence in 1998: Proofs from The Book [313]. Its authors, mathematicians Günter Ziegler and Martin Aigner, explain that there are many criteria to pick a proof for the book: its clarity, innovative ideas and ability to reveal something new about the problem. However, there is a clear-cut criterion: “A proof that eats more than 10 pages cannot be a proof for our book” [314]. The authors readily admit that there could be perfect proofs that are longer than ten pages, but they aren’t sure they are within human reach.

2.3 Engineering and Design

In branches of engineering outside computing, as we have mentioned (see Sec. 1.2.1), there is built-in pressure towards lowering the part count and the materials bill, as well as assembly costs. Therefore it is harder to identify or even clearly define any ascetic tendencies. However, some designs accomplish more than the sum of their parts. For example, in the Car of the Century Award ranking, of the top five cars, three are entry-level models that made driving accessible to large swaths of the population for the first time: the Ford Model T, the BMC Mini, and the Volkswagen Beetle [315]. Notably missing from the top five is “the most intelligent application of minimalism ever to succeed as a car,” the Citroën 2CV [316]. Yet the 2CV stayed in production longer than any other model, a record 42 years, graduating from a practical vehicle for farmers to a lifestyle statement.

Likewise, in the evolution of the airplane, simpler shapes are primarily the result of aerodynamic or structural constraints. Yet Antoine de Saint-Exupéry, writer and aviator of the pioneering age (see Fig. 8), found that simplicity was a unifying theme in all engineering endeavors, one shared with natural forms. He wrote [317]:

In anything at all, perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away, when a body has been stripped down to its nakedness.

This idea seems to have taken hold in modern industrial design, as demonstrated, for instance, in the work of Dieter Rams, the designer behind iconic small appliances and electronic devices made by consumer product company Braun in the 50s and 60s, like the T3 radio, the cylindrical table lighter or the PS2 turntable [318]. The last, but hopefully not least, of his ten design principles is that “Good design is as little design as possible” [319]. This restrained approach resulted in designs that rise above the ebbs and flows of fashion and have aged gracefully, even as technology has made them obsolete. Moreover, these principles were in the service of other goals, including durability. In a recent documentary dedicated to him, Rams expressed his condemnation of the rapid replacement rate of modern technology [320].

If designing with fewer parts is now mainstream, fixing artifacts by removing parts isn’t, as we have seen for computer programs. Psychology research supports the idea that people, facing a broken device, think of adding parts rather than removing them: a piece of tape here, a screw there [321]. The same research finds that subtractive solutions are often overlooked in various tasks, from essay writing to itinerary definition.

Even if engineering has a built-in pressure towards reducing part counts, there are spectacular exceptions. Brick houses require roughly eight bricks per square foot of habitable surface, often exceeding 10,000 bricks for a typical US house.111 Small bricks take more time and mortar to lay but, as long as unassisted workers perform the job, there are limits to the size they can use. Consider how many other aspects of manual work are assisted by power tools: bricklaying isn’t one of them, save for mortar preparation.
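
A back-of-the-envelope check, with the house size being my assumption rather than a figure from the source: at eight bricks per square foot, a modest 2,000-square-foot house already needs

$8\ \text{bricks/ft}^2 \times 2{,}000\ \text{ft}^2 = 16{,}000\ \text{bricks},$

well past the 10,000 mark.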

At least a home can be built with one or a few types of bricks. Their number may be in the five figures, but the parts are interchangeable. Quite the opposite is true for the Eiffel Tower, which is made of 18,038 metal parts described in 5,329 drawings [322]. Two factors contributed to this inflation of unique elements: a limit on the weight that could be lifted up the tower during construction and its curved profile, based on Eiffel’s calculations for optimal wind resistance [323]. While the tower’s overall design is simple, it did not result in a significant amount of modularity.

2.4 Biology

The question of whether biology favors conciseness is too big to handle here. Yet, when talking about genomes, size seems highly variable and not associated with essential characteristics of an organism. Sure, viruses tend to have smaller genomes than bacteria, and these, in turn, smaller ones than multicellular organisms, but there are wild variations even between closely related species. Additionally, we don’t know what the function of most of the genome is, and it is generally believed to include a lot of filler or detritus collectively dismissed as junk DNA. Nonetheless, the parts of unknown function have been slowly shrinking. Mappings for regulatory regions, small RNAs and more have been slowly added to the better-understood mapping for genes. Shrinking doesn’t mean disappearing, though. In 2004, Nobrega and co-authors showed they could remove two megabases of the mouse genome that are devoid of protein-coding genes and grow perfectly viable (“indistinguishable,” as they report) litters of mice [324]. Then, one glorious day, the vast majority of the human genome was declared functional — the opposite of junk — by the researchers of a mega-project named ENCODE112 [325]. Never mind that it later emerged that its principal investigators were debating among themselves whether to make the top-line figure 20%, 80%, or anything in between, a telltale sign of an arbitrary number not supported by evidence. The chief bioinformaticist on the project, Ewan Birney, wrote, in a strangely naive confession, that they selected the higher figure because it “best conveys the difference between a genome made mostly of dead wood and one that is alive with activity” [326]. This statement should make clear beyond doubt that these numbers aren’t estimates of anything; they are instead messaging tools in a propaganda war for more funding. And funding they do need: the ENCODE project burned through roughly 400 million dollars, a significant portion of the NIH budget that couldn’t be awarded to “normal” research, and multiple follow-ups are already planned [327]. Jumping on the ENCODE bandwagon may become the only way to keep a laboratory open for a young investigator. Nature could not sit idle in the face of this travesty and made available an incredibly concise genome, that of a little carnivorous plant, Utricularia gibba, which has eliminated almost everything in its genome that isn’t directly related to genes [328]. These are precisely the types of genomic elements that the ENCODE project, which worked only on human samples, newly claimed as functional. If you are thinking of Homo sapiens exceptionalism and hypothesize that these mysterious genomic regions may be related to higher cognitive functions or free will, I will have to disappoint you. Consider, as a comparison, the common tomato, a not-so-distant cousin of Utricularia, with which it shares an 87-million-year-old ancestor. While the number of genes in the two genomes is similar, the tomato genome is ten times the size of Utricularia’s and is packed with the same filler abundant in the human genome. That something evolution can dispose of so quickly, in evolutionary time, could be of the utmost importance in both tomatoes and humans strains credulity [329]. Discovering a single plant genome lacking all protein-coding genes would revolutionize biology, but we won’t, because genes are essential to an organism. Concise genomes may not be advantageous for life, but they sure helped us understand it.

3 Literature, Visual Arts and More

Figure 9: Blue and Green Music, 1921, by Georgia O’Keeffe [330],113 see Sec. 3.2.3.

Anything expressed just right, whether it’s being really concise or just capturing it — like anything, a really well-put-together sentence or a little doodle, a caricature that looks just like someone but only used four lines, that kind of thing …

This is how Jamie Zawinski, a programmer best known for his role in creating the Netscape Browser, explains beauty in code [14]. It provides a perfect transition from engineering and science into the arts.

3.1 Speaking and Writing

3.1.1 Concise Writing

Calls for conciseness in writing abound and come from writers of all types. For example, The Elements of Style’s author William Strunk Jr. writes [331]:

Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts.

The Elements of Style is itself a concise book. E. B. White, author of its second edition, describes it as “[…] a forty-three-page summation of the case for cleanliness, accuracy, and brevity in the use of English” [332].

Atwood joins the praise for this book [333]:

There is perhaps no greater single reference on the topic of writing than Strunk and White’s The Elements of Style. It’s one of those essential books you discover in high school or college, and then spend the rest of your life wondering why other textbooks waste your time with all those unnecessary words to get their point across.

The Star Copy Style sheet, used at The Kansas City Star newspaper, warns to “use short sentences […] short first paragraphs” and to “eliminate every superfluous word” [334]. Ernest Hemingway, who worked there as a reporter, credited it for “containing the best rules I ever learned for the business of writing.”

However, writing short sentences doesn’t necessarily produce short books. Hemingway proposed the iceberg theory of omissions [335]:

This [the real ending] was omitted on my new theory that you could omit anything if you knew that you omitted and the omitted part would strengthen the story and make people feel something more than they understood.

(The iceberg metaphor was introduced elsewhere). To buy into conciseness, you don’t need to be a Nobel Prize winner. Dru Riley, author of a popular newsletter, makes conciseness itself a goal [336]:

People don’t realize constraints breed quality, and without constraints you just have a lot of noise. Some weeks I’ll write 13,000 words and I have to distill it down to less than ten percent of that. […] The goal is for the final report to be concise, dense, and readable.

Blogger Mike Crittenden supports reducing the number of ideas or sentences in a post to just one [337]:

There’s no law that says a blog post needs more than one idea or more than one sentence.

Software developer Bruce Eckel doesn’t consider any length too short [338]:

Less is More. The more words you have, the more work your reader must do. Shorter sentences are easier to read and understand.

I am not sure Eckel strictly follows his own rules: the post this quote is taken from has more than 2500 words and takes roughly 16 minutes to read. However, I can’t prove they weren’t necessary.

Having the right goals is already helpful. For example, Jonathan Maltz made conciseness his New Year’s resolution [339]:

Biggest one for me in 2020 was succinct written communications. I look back at stuff I’ve written in the past and I’m embarrassed by how many words I wasted.

Even computer programs encourage conciseness, and I don’t mean that of their source code, which is covered in the first part of this book. For example, the text editor Zettlr offers an assist mode to improve a writer’s style. Its “algorithm will give you text credits for short and concise sentences,” among other style criteria [340].

Remember JavaScript: The Good Parts, the book about subsetting JavaScript to make it better (see Sec. 1.3.2)? For a double dose of conciseness, that book is only 100 pages long. Olano writes [202]:

Sometimes, a lot of the times, less is more. The book itself is the perfect example: 100 pages long, mostly filled with snippets and examples, and still it had a clearer purpose and communicated it more effectively than most computer books.

What about speeches? Let’s learn from Lord Pannick, counsel for the challengers in a United Kingdom Supreme Court hearing about an unprecedented suspension of Parliament [341]:

[Lord Pannick’s] delivery was smooth and unruffled. Never using two words where one would do.

The counsel for the Government, Lord Keen, was less in a hurry:

He began by spending 20 minutes arguing there was no difference between Scottish and English law in this case. This left people scratching their heads as no one had previously said there was.

The challenge prevailed.

Four words can make a poem. The complete text of Italian poet Giuseppe Ungaretti’s Mattina roughly translates to “I shine / of immensity” [342]. I am not sure he got his point across, but I don’t think he was concerned about being intelligible. He was an exponent of the hermeticist literary current, thus named for its use of obscure metaphors. A draft, written on a postcard sent from the war front, was found to contain five lines, later reduced to two in the published version. To put such a short poem in context — short even by his standards — we need to consider that each word was vital for him and selected after a painstaking process. In Commiato he writes [343]:

Quando trovo
in questo mio silenzio
una parola
scavata è nella mia vita
come un abisso

(When I find / in this silence of mine / a word / it is dug into my life / like an abyss). Fittingly, he read his poems at a deliberate pace [344].

3.1.2 Reducing Text

Sometimes it is possible to compare a text in long and short forms. What better opportunity to evaluate the merits of conciseness?

In a review of director Simon Godwin’s Romeo and Juliet, New York Times’s Jesse Green writes [345]:

At 90 minutes, it is even shorter than the “two hours’ traffic of our stage” promised in its first lines but rarely honored in performance. (The entire play normally takes about three hours.) Yet […] this […] version scores point after point while whizzing past, or outright cutting, the elements that can make you think it was written not by Shakespeare but by O. Henry on a bender.

If the cutting merely left what remains with a much higher proportion of penetrating insight and powerful feeling, that would be enough […]. But the speed serves another function here: telling a story that’s mostly about teenagers with a teenage intensity and recklessness.

It seems that in this case, paraphrasing McLuhan, the length of the medium is the message.

Revisions of a poem or novel may provide insights into how its author sees length. Poems come in all shapes and sizes, from single-verse fragments to multi-page odes. It is thus reasonable to think that there is little external pressure towards conciseness. Yet, in the case of “The Waste Land” by T. S. Eliot, the drafts of the poem show that it lost about half of its lines before being published, partly on the advice of fellow poet Ezra Pound [346].

State of the Union speeches by American Presidents also vary considerably in length. Yet, Cody Keenan, President Obama’s chief speechwriter in his second term, stayed up all night to perform some trimming [347]:

By 5 a.m., a more succinct draft was on its way to the president.

Researchers often chafe at the strict paper length limits enforced by some journals. Not Mark Blaxter, professor of genomics, who was able to recognize that there were also advantages [348]:

I was shocked at how much of our carefully crafted text could be pruned […] to leave only the essential core of what had to be said. Many a choice turn of phrase and illustrative simile had to be sacrificed to precision, flow and brevity. This was (probably) a good thing.

Other writers even enjoy the process. For example, Pamela Erens, author of the novels The Virgins and The Understory, writes [349]:

My favorite part of writing is taking stuff out. […] when I can remove a limp adjective or superfluous sentence from a novel chapter or essay, I feel a rush that is a bit like being airborne.

3.1.3 Conciseness is Hard

When we run into verbose writings or speeches, we should cut the author some slack, because being concise is hard. “I would have written a shorter letter, but I did not have the time” wrote philosopher Blaise Pascal — this quote has also been misattributed to a half dozen other intellectuals [350].

Blogger Mike Crittenden, selecting quotes from his favorite reads for the year, can’t settle on a single one: “I couldn’t pick just one favorite quote from it, so here are two” [351]. It may sound tongue-in-cheek, but it makes sense that the decision between the top two options could be the hardest, as those are closer to the optimum than any other. A mathematical parallel is choosing an element from each of the sets in a collection. The possibility of doing so for infinite collections rests on the axiom of choice, which is independent114 of the other axioms of set theory and was initially looked upon with suspicion by mathematicians [352]. Luckily, no one has read an infinite number of books, so picking one quote from each does not rely on this axiom.

President Obama, who asked his speechwriter to shorten his State of the Union address (see the previous section), wasn’t as good with his memoir, which exceeds 500 pages [353]. He said [354]:

[I am] painfully aware that a more gifted writer could have found a way to tell the same story with greater brevity ([…] a signed copy of the 272-word Gettysburg Address rests inside a glass case)

Even computers find conciseness hard. For example, researchers working on GPT-2, a record-setting AI system to process natural language, reported [355]:

Fine-tuning for the stylistic continuation tasks is sample efficient: 5,000 human samples suffice for strong performance according to humans. For summarization, models trained with 60,000 comparisons learn to copy whole sentences from the input while skipping irrelevant preamble.

In other words, plausibly extending a text was easier for this AI system than summarizing it.

3.1.4 The Law

The text of the law, in the United States as well as in many other countries, is ponderous, to put it mildly. The Affordable Care Act alone runs close to a thousand pages. How are we supposed to access healthcare while knowing our rights and obligations? Even attempting a count of federal laws is considered futile by the experts at the Library of Congress [356]. The complexity of the law, fueled by its size, jargon, and ambiguous language, forces people to rely on lawyers to interact with the justice system and each other. This is a source of discrimination between those who can afford to hire the best and those who can’t.

In an interview, Google co-founder Larry Page said he “would dearly love all laws to be limited to 50 pages. Every time a new law is enacted an old one should be removed” [357]. It can be easily dismissed as a billionaire’s delusion, but if we accept that ignorance of the law isn’t excusable, then a limit on its length should follow as a corollary.

While there is no size limit on the text of laws, there is a law that sets some constraints on the length of forms the United States Government asks its citizens to fill out: the Paperwork Reduction Act [358]. Passed in 1980 and amended in 1995, it doesn’t impose a size limit outright. Instead, all agencies wishing to collect information must fulfill several requirements, such as demonstrating the need for such collection, submitting a plan for its use, and so on. The central authority for these checks is the Office of Information and Regulatory Affairs, which, as a consequence, can estimate the burden of information gathering on citizens: 9.78 billion hours in 2016, roughly 6 minutes per person per day.

I couldn’t find statistics on private contracts, but, in my experience, they are always too long. Lawyered-up companies try to gain the upper hand on smaller ones or private citizens by overwhelming them with long and complex contracts or user agreements, sometimes exceeding 20,000 words [359]. Here is one refreshing exception, though. An energy startup, Voltus, offers one-page contracts and asks: “Could we make all contracts one page?” [360].

3.1.5 Counterpoint

This book was never meant to be an unbiased take on conciseness, but I would be remiss if I didn’t mention that there are texts, specifically novels, of incredible heft. According to Wikipedia, the record holder is Venmurasu, at an estimated 3,680,000 words [361]. I am not accusing it of being verbose: according to its author, it paints “a canvas as big as time itself,” a demanding task. The first English entry, A Chronicle of Ancient Sunlight by Henry Williamson, is a distant second with 2,436,924 words. In both cases, it is arguable that they are cycles of novels rather than single novels. There are about ten entries over a million words. All these will likely be eclipsed by Richard Grossman’s Breeze Avenue, a collective multi-media work expected to total three million pages [362]. Not so much a novel as a “band of meaning impelled by forces analogous to those that undergird physical reality,” it is planned to “emanate,” every 36 days, “a one-of-a-kind 5,000-page evanescent novel.” Five thousand of these works are scheduled over five hundred years, each published online for a short time. The Guinness Book of Records must employ a tighter definition of a novel, as it lists as the current champion Marcel Proust’s Remembrance of Things Past [363], totaling 1,267,069 words in 7 volumes [364]. I do know people who credibly claim to have read all of it.

It’s one thing to write unrepentantly long novels, and another thing to belittle conciseness. Cegłowski’s blog byline is “Brevity is for the weak,” without further explanation [365]. His most-read posts have an average length of about 3500 words for a reading time of 15-20 minutes. There is no contradiction, though, with his views on web design (see the “web obesity crisis” in Sec. 1.5), as his pages are only rich in text and load almost instantly.

3.2 Visual Arts

3.2.1 Architecture

We have mentioned in the Introduction that “Less is more” is an aphorism for frugality. While the concept does not originate with him, it is most commonly associated with architect Ludwig Mies van der Rohe, whose work was characterized by “extreme clarity and simplicity” [366]. He is regarded as one of the pioneers of modernist architecture, and the same minimalistic attitude guides the whole movement. Some perhaps took it a little too seriously, as the title of Adolf Loos’ essay Ornament and Crime suggests [367]. He knew something about crime, being convicted of one later in life.

3.2.2 Photography

Alfred Stieglitz is credited with driving the acceptance of photography as an art form with his pictures as well as his work as a promoter [368]. His style was aimed at drawing as clear a distinction as possible between photography and painting:

Stieglitz described his most recent work as ‘intensely direct. … Not a trace of hand work on either negative or prints. No diffused focus. Just the straight goods. On [some things] the lens stopped down to 128.115 But everything simplified in spite of endless detail.’

In particular, he was reacting to an early photographic movement known as pictorialism, which emphasized staged subjects and effects. Stieglitz instead saw photography as fundamentally anchored in reality.

Reality can be messy, but the photographer, no matter how direct his technique is, still exercises control over when, where, and in what direction to shoot, which results in the inclusion or exclusion of some visual elements. Landscape photographer Huntington Witherill sees this selection as fundamental [369]:

Finding the strongest way of seeing is really, to my way of thinking, intellectualizing within myself what it is that attracted me to the scene in the first place. And then, doing my best to include all of that within the photograph itself and eliminate everything else out of the photograph.

Editing a shot is an available artistic choice in most genres of photography, but not in photojournalism, where it is strictly limited to keep a picture true to reality. When Steve McCurry, one of the most celebrated photojournalists of his generation, was found to have routinely and secretly edited or even staged his pictures, a scandal erupted [370]. Even the famous Afghan Girl has been published in at least two versions, one more “cleaned up” than the other. Inspection of several edited images reveals that removing distracting elements and enhancing colors are the most common transformations, a window into the thinking of a great photographer, if no longer a photojournalist, as he now fashions himself a “visual storyteller.”

The virtues of visual simplicity are evident not only to outstanding photographers but also to regular professionals. For example, real estate and landscape photographer Anton Gorlin doesn’t mince words [371]:

Take it this way — if it doesn’t add to the story, exclude it. You got it right — include as little as possible.
Indeed, to build great compositions, you must learn to exclude with no remorse or regrets. […] Always aim to include as little as possible.

Magnum Photos photojournalist Jerome Sessini goes one step further [372]: “I look to make photos better by subtraction rather than addition.” He acknowledges that his style evolved towards conciseness but that, initially, he was “introducing the maximum number of elements into the frame” following the example set by two of his role models (see Gilden in Sec. 3.2.4).

3.2.3 Painting

Compared to photography, painting affords the artist the highest degree of control and, specifically, the freedom to include any number of elements or details in an image. Georgia O’Keeffe, a painter (see Fig. 9) personally and professionally related to Stieglitz (see Sec. 3.2.2), advocated including fewer of them [373]:

Nothing is less real than realism. Details are confusing. It is only by selection, by elimination, and by emphasis that we get at the real meaning of things.

O’Keeffe’s quest isn’t just aesthetic but one for meaning. Her words suggest an analogy with statistics (see Sec. 2.1.1). Over-fit models can reproduce every minute detail in the data but, unable to distinguish the signal from the noise, fail to capture the laws governing the system under consideration — a failure seen in their inability to generalize to new data points.
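
The statistical analogy can be made concrete with a few lines of code. The sketch below is mine and purely illustrative: it fits a quadratic and a degree-9 polynomial to noisy samples of a quadratic law, and the flexible model scores better on the training points but worse on fresh ones.

    # Overfitting in miniature: flexible models chase noise and generalize worse.
    import numpy as np

    def law(x):
        return 3 * x ** 2 - 2 * x  # the simple rule generating the data

    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 1, 20)
    y_train = law(x_train) + rng.normal(0, 0.05, x_train.size)
    x_new = np.linspace(0, 1, 200)

    for degree in (2, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_error = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        new_error = np.mean((np.polyval(coeffs, x_new) - law(x_new)) ** 2)
        print(f"degree {degree}: train error {train_error:.5f}, "
              f"error on new points {new_error:.5f}")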

“Art is the elimination of the unnecessary,” said Picasso [374]. Or did he? Despite the abundance of attributions on the web, I couldn’t find one pointing to a source, and exact matches in online searches are all dated 2000 or later. An earlier variant read: “Design is the elimination of the unnecessary.” It probably seemed like a good aphorism, just needing a prestigious, if inauthentic, voice.

Nonetheless, Picasso was familiar with self-imposed limitations [375]:

For a long time I limited myself to one color — as a form of discipline.

He is probably referring to his blue and rose periods — they are two distinct periods, one for each color — during which he used severely restricted palettes, arguably matching his dominant mood at that time.

His contemporary Henri Matisse applied minimalism not just to the color but to every element of a painting [376]:

In a picture every part will be visible and will play its appointed role, whether it be principal or secondary. Everything that is not useful in the picture is, it follows, harmful. A work of art must be harmonious in its entirety: any superfluous detail would replace some other essential detail in the mind of the spectator.

As with music (see 4’33”, Sec. 3.3), it is possible to push simplicity to its extreme consequences, and I suspect nobody represents that possibility better than Kazimir Malevich. He is known, among other works, for his trifecta of paintings depicting just one square each: Black Square, Red Square, and White on White. Peter Schjeldahl, an art critic, wrote [377]:

Malevich is monumental not for what he put into pictorial space but for what he took out: bodily experience, the fundamental theme of Western art since the Renaissance.

This statement places Malevich at the forefront of abstract art, but I think he was another few steps ahead. In correspondence about Black Square, Malevich wrote: “It is from zero, in zero, that the true movement of being begins” [378]. To me, White on White, a painting representing, unsurprisingly, a white square on an otherwise empty canvas, is even closer to zero, with its minimal amount of contrast. Malevich seems to be asking how abstract a painting can be and what can’t be left out of it. He stops right on the edge of answering: “nothing.” Like the silent piece 4’33” but unlike their less famous precursors (see below), Malevich’s squares have stood the test of time, are still displayed in exhibits around the world, and have a place in the history of art.

Hidden under the paint of Black Square, there is a reference to similar, earlier pieces by French humorist Alphonse Allais. He produced several monochrome canvases, filled edge-to-edge with the same color, and a silent funeral march, all satirical works [379]. Allais did not author the first published black square, though. As far as I know, that distinction goes to Paracelsian physician Robert Fludd in his 1617 The History of the Physical and Metaphysical Cosmos, where, among a set of 60 detailed engravings, it appears as an illustration of infinite darkness, the primordial state before creation [380]. It seems that, for Malevich’s squares, timing and context were as consequential as the artwork itself.

3.2.4 Counterpoint

I would be remiss if I didn’t reference some counterexamples to conciseness in the visual arts. Talking about the photography of Jerome Sessini (see Sec. 3.2.2), I mentioned that he moved away from a more crowded style that had been inspired by two of his role models, Mark Cohen and Bruce Gilden. After researching their work, I could find only a few examples of this style in Gilden’s portfolio (for instance, [381] from Coney Island [382]). These images include multiple subjects, none clearly more important than the others, filling the frame, with no background-to-foreground separation. They seem to represent a small fraction of his work [383]. We can find this style thoroughly embraced by the Flemish Renaissance masters Hieronymus Bosch (Garden of Earthly Delights; Triptych of the Haywain) and Pieter Bruegel the Elder (Carnival and Lent; Proverbs [384]) or French engraver Jean Duvet (Fall of Babylon; Apocalypse [385]). The latter’s work, in particular, can be considered an example of horror vacui, the fear of or distaste for empty space. Originally a term used in physics dating back to Aristotle, it was later applied, as a pejorative, to Victorian architecture. Baroque architecture also makes extensive use of ornamentation to avoid empty space. An artistic need to fill all the available space predates all of these examples, going back at least to the Geometric Age in Greek art. However, that style, like the Islamic arabesque, seems to prefer orderly patterns to Duvet’s complex, almost chaotic arrangements.

3.3 Music

Music, like literature, doesn’t seem constrained by particularly severe length limits, with compositions lasting minutes, several hours, or anything in between. Notably, classical music, with its planned repetitions and taste for variation, seems bent on keeping the music going. Particularly during the romantic period, which saw the premiere of Beethoven’s 9th Symphony — 70 minutes — and Richard Wagner’s opera Die Meistersinger von Nürnberg (The Master-Singers of Nuremberg) — 5 hours and fifteen minutes — it seems conciseness wasn’t an important consideration. Yet Franz Liszt, in his praise of Robert Schumann’s Carnaval, wrote [386]:

[Carnaval] will assume its natural place in the public eye alongside Beethoven’s Diabelli Variations, which in my opinion it even surpasses in melodic invention and conciseness.

Hungarian composer Béla Bartók, who devoted much of his work to the study of folk music, wrote [387]:

[Folk music] is the classical model of how to express an idea musically in the most concise form, with the greatest simplicity of means, freshness and life, briefly yet completely and properly proportioned.

Moving closer to contemporary music, the work of Anton Webern stands out for its brevity. “Webern’s compositions are concise, distilled, and select; just thirty-one of his compositions were published in his lifetime,” Wikipedia reports [388]. A comprehensive recording of his opera omnia fits on 6 CDs, and “the Six Bagatelles for string quartet (1913), for instance, last about three minutes in total.” Arnold Schoenberg, a contemporary whom Webern reciprocally influenced, described his own musical aesthetics as follows: “My music must be brief. Concise! In two notes: not built, but ‘expressed’!!” Later, though, he deviated from these more clearly Webernian ideas.

An alternative to writing short pieces of music is removing the sound. John Cage achieved this with his 4’33”, in which any ensemble performing it is instructed not to make a sound [389]. It wasn’t an original idea, being preceded by half a century by Il Silenzio: pezzo caratteristico e descrittivo (stile moderno) (Silence: characteristic and descriptive piece (modern style)), a pseudonymous work tentatively attributed to Edgardo Del Valle de Paz [390]. However, unlike its predecessors, it was meant to be taken seriously. Cage hesitated to “write it,” concerned it would be considered a prank. Whereas the function of computer programs or buildings is relatively clear, music doesn’t have an obvious one. Is 4’33” a concise piece of music? Or does it take that much time to accomplish nothing, like a computer program spinning idly (see the busy beaver numbers in Sec. 1.2.11.1 for interesting yet idle programs)? Or is the goal of 4’33” to provide some time to take in background noise or to reflect? I don’t have a definitive answer, but 4’33” has worked like a music piece. It has been recorded, performed in front of an audience, and its score is available for purchase. As with Malevich’s squares (Sec. 3.2.3), it seems the timing and context of the work had more to do with its success than the actual “music.”

While revolutionary, the first “serious” silent piece didn’t leave room for many follow-ups, even if Cage himself wrote two of them, 0’0” and One3 = 4’33” (0’00”) + 𝄞. Don’t rush to publish your own 4’32” or 4’34” and hope to wow audiences and critics the way the original did. However, other approaches to minimalistic music don’t require sound to shrink or disappear, only the score. Terry Riley’s In C shows one possibility [391]. While this piece takes only one page of musical notation, its typical performances last 45 minutes to an hour. It specifies 53 loops or phrases to be repeated and combined according to somewhat loose instructions that allow performers a lot of discretion. These instructions are related to the canon, a traditional form prescribing that phrases be performed in the order they are written, with staggered starts for each instrument. Unlike in a canon, though, performers are allowed a variable number of repetitions of each phrase. The result is a repetitive but ever so slightly changing mass of sound, strangely suspended between the calm and the frantic, forever stuck in a single harmony, as the title suggests — one of the choices, together with the fixed tempo, that allow so much performance freedom. Unlike the silent piece, this approach allowed for many variations, attracted several composers, and became known as minimal music [392].
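
The process behind In C is so close to an algorithm that it is tempting to write it down as one. The sketch below is a loose simulation of mine, not Riley’s score: it assumes, arbitrarily, that each performer repeats each phrase between one and six times, whereas the real instructions leave the count to the performer’s discretion.

    # A toy rendering of In C's process: everyone plays the same 53 phrases in
    # order, but repeats each one a different number of times, so the parts
    # drift apart and realign.
    import random

    N_PHRASES = 53
    N_PERFORMERS = 4
    random.seed(433)

    for performer in range(1, N_PERFORMERS + 1):
        part = []
        for phrase in range(1, N_PHRASES + 1):
            part.extend([phrase] * random.randint(1, 6))
        print(f"performer {performer}: starts {part[:10]} ... "
              f"({len(part)} phrase plays)")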

3.4 Management

Ascetic management sounds like an oxymoron. Leadership is about getting more done, conquering new markets, vanquishing the competition, and so forth. In this context, you may think of the lean startup, but that is penny-pinching rather than asceticism [393]. Nonetheless, a few isolated voices sing a different tune. Jotform founder and CEO Aytekin Tank had caught the classic entrepreneur’s busyness bug: “constantly be building, working, growing and developing the next thing” [394]. Then he started seeing that being busy and being successful aren’t equivalent, and he decided that it was time for some downtime. “But, doing less or nothing at all is easier said than done,” he writes. He almost echoes photographer Saul Leiter, who wrote [395]: “I have a great respect for people who do nothing.” Or singer-songwriter Franco Battiato [396]: “Si salverà chi non ha voglia di far niente, non sa fare niente” (Those will be saved who don’t feel like doing anything, who don’t know how to do anything). Tank primarily faults societal pressure to look constantly busy for his inability to slow down. The remedy he adopted, following the likes of Bill Gates, is to take a block of time completely off. While Gates goes to a retreat to read books, Tank spends a week picking olives. We have no reason to doubt the healing effects of olive picking, but it still is work, and a week per year is not a lifestyle change. While it will take more for Tank to become a leader in idleness, we wish him well on his path to recovery.

Software entrepreneurs Jason Fried and David Heinemeier Hansson are well known for adopting some non-standard management practices at their company, Basecamp. In their book It Doesn’t Have to Be Crazy at Work, they propose their alternative to the culture of workism [397]:

The answer isn’t more hours, it’s less bullshit. Less waste, not more production. And far fewer things that induce distraction, always-on anxiety, and stress.

More recently, they added political discussions to their list of distractions, banning them and offering a buyout to employees who didn’t like the change. A third of the company took the offer and left, making the organization more frugal overnight [398].

Fried is in favor of procrastination. Familiar with the mandatory “bias for action” in startup job ads? Fried doesn’t buy that [399]:

Not only can most things wait, most things should. You either can’t stop thinking about them, or those thoughts fade away. Let time do some work for you.

In other words, time acts as a filter. Downsides of an idea may manifest themselves, opposition to it may solidify, or alternatives may emerge. Sure, a project may have time-sensitive requirements that preclude the luxury of time. For example, some hustling may be warranted if a project relies on a once-in-175-years alignment of planets, as the Voyager missions did [400]. However, hurrying for the sake of it used to be called “being impulsive,” and it wasn’t a compliment. We may call this approach a bias for inaction.

Fried and Heinemeier Hansson run their company to their own tune and, in particular, aren’t pursuing growth at all costs, or, should I say, at any cost. Heinemeier Hansson writes [401]:

WHAT’S NEXT?! Which is really a question of WHAT’S MORE? What else are you going to do in addition to all the shit you’re already doing? It’s so ingrained in our entrepreneurial culture that you must always be on a conquest. Once a set of territories have been subdued, you’re honor-bound to push further north.

Thanks, but no thanks. […] We’ve always been great fans of constraints, and capping the headcount in the face of growth is perhaps the biggest constraint of all.

They have indeed capped headcount after whittling their product portfolio down to one (see Sec. 1.3.1). Now, with a third of the company gone, they may have hiring on their task list again. It appears from their LinkedIn page that they have already more than doubled their stated headcount limit and launched a second product. The call of the North may be too hard to resist.

3.5 Humor

Can asceticism be applied throughout your lifestyle? It would seem so from a post by none other than Google’s director of research Peter Norvig, which touts a service called “Functional Lifestyles […] dedicated to making our clients stronger, healthier, more robust, more concise […]” [402]. On closer inspection, the date of the post is April 1. But even without the help of dedicated training, people have been integrating conciseness into their lives. For example, startup founder Jon Schlossberg chose as a byline to his blogging profile “Bacon, hummus, brevity,” elevating brevity to bacon-level concern [403].

Conclusion

What I tried to corroborate with the present book is that conciseness, frugality and parsimony, in their various manifestations, aren’t just stinginess, or deprivations endured for the sake of character building. Neither are they nice-to-haves, a sort of generalized decluttering that makes things look neater. Instead, they go to the heart of how we understand reality, by turning the complexity of observations into simple laws. Likewise, they go to the heart of how we build artifacts, which need to be predictable and reliable, that is, to follow their own simple laws. If we agree with O’Keeffe that art is a way to get “at the real meaning of things,” the same could apply in that domain as well.

The special role that computer programs have in all of this isn’t just as engineered artifacts themselves, but also as an effective way to express these laws. Traditionally, this is the domain of mathematical models, but constraining them to be computable has distinct practical and conceptual advantages. Simplicity or elegance need no longer be in the eye of the beholder but coincide with the existence, implicit or explicit, of short computational descriptions.

The surprising discovery is that to get to the truth – a program true to its specifications, a model true to a natural phenomenon, a painting true to its subject – deploying an abundance of means is counter-productive. Instead of getting more, we get less: less correctness, less maintainability, less accuracy, less elegance and even less beauty. The activities of knowing, building and creating all rest on similar notions of concise effective descriptions.

I hope this book leaves you with a newfound appreciation for the role of asceticism in the modern world and a desire to adopt it in your own endeavors.

Bibliography

For references to on-line material I tried to follow these criteria:

  1. I provided a Wayback Machine URL, which is frozen in time and relies on the commitment by an organization, the Internet Archive, to never take it down [404]. Since these links are formed by a prefix followed by the original URL, the original link is also implicitly available. Unfortunately, some web sites do not allow this type of archiving and, as a consequence, some “rotting” of links will be unavoidable. As a backup, I attempted to save PDF images of every URL in the references, but a minority failed.
  2. To set a date for a reference item, whenever the content includes a date (post date, reply date, most recent commit in a software project and so on) I used that date. This doesn’t imply the item hasn’t been updated since, but that was the posted date at the time I accessed the material. Sometimes I calculated the post date from information such as “posted x years ago.” When no such date was available, I listed the date at which I accessed the material.
  3. As far as authorship goes, I trusted the listed author or used the owner of a source code repository as the author. When only a username or handle was listed, I tried to follow links to find a real name, but I did not put on my detective hat to achieve this result, not only in the interest of time but because I didn’t intend to out anyone who wanted to remain anonymous. I am sure I confused group and individual authorship in some cases, and the boundary can be blurry for software; for any mistakes I offer my apologies.

     

[1]
Aarre Ekholm / Lehtikuva, “Ensi computer,” 1958. https://commons.wikimedia.org/wiki/File:Ensi-computer-1958.jpg
[2]
E. W. Dijkstra, “Notes on structured programming,” Technical University Eindhoven, 1970, Available: http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF
[3]
E. W. Dijkstra et al., “On the cruelty of really teaching computing science,” Communications of the ACM, vol. 32, no. 12, pp. 1398–1404, 1989, Available: http://www.cs.utexas.edu/~EWD/transcriptions/EWD10xx/EWD1036.html
[4]
C. Warburton, “Ask HN: How common is it to work on projects with no testing?” 2019. https://web.archive.org/web/20200504095748/https://news.ycombinator.com/item?id=19834777
[5]
F. P. Brooks Jr., The mythical man-month: Essays on software engineering. Pearson Education, 1995.
[6]
J. Lions, Lions’ commentary on UNIX 6th Edition with source code. Peer-to-Peer Communications, 1996.
[7]
I. Herraiz and A. E. Hassan, “Beyond lines of code: Do we need more complexity metrics?” in Making software: What really works, and why we believe it, A. Oram and G. Wilson, Eds. O’Reilly Media, 2010, pp. 125–141.
[8]
[9]
C. Jones, Program quality and programmer productivity. IBM General Products Division, Santa Teresa Laboratory, 1977.
[10]
S. McConnell, Code complete. Pearson Education, 2004.
[11]
[12]
W. Wayt Gibbs, “Software’s chronic crisis,” Scientific American, vol. 271, no. 3, pp. 86–95, 1994.
[13]
R. W. Gosper Jr., “Artificial intelligence memo no. 239.” MIT, 1972.
[14]
P. Seibel, Coders at work: Reflections on the craft of programming. Apress, 2009.
[15]
A. Oram and G. Wilson, Beautiful code leading programmers explain how they think. O’Reilly Media, Inc, 2007.
[16]
L. Prechelt, “An empirical comparison of C, C++, Java, Perl, Python, Rexx and Tcl,” IEEE Computer, vol. 33, no. 10, pp. 23–29, 2000.
[17]
P. Sen, “The triumph of the nerds: The rise of accidental empires.” 1996. Available: https://web.archive.org/web/20211217111305/https://www.imdb.com/title/tt0115398/
[18]
P.-H. Kamp, “Going fast slowly — Varnish version 6.2.3 documentation,” 2021. https://web.archive.org/web/20210409230129/https://varnish-cache.org/docs/6.2/phk/thatslow.html
[19]
M. Lacey, “You’ve only added two lines — why did that take two days!” 2020. https://web.archive.org/web/20220206121940/https://www.mrlacey.com/2020/07/youve-only-added-two-lines-why-did-that.html
[20]
Anonymous, “Should I fire someone who unnecessarily wrote 500 lines of code?” 2017. https://www.quora.com/Should-I-fire-someone-who-unnecessarily-wrote-500-lines-of-code
[21]
P. Burns, Tao te programming. Lulu.com, 2012.
[22]
[23]
[24]
[25]
[26]
D. Cronin, “Issue 111700: Handle conflicts between web request extensions gracefully,” 2012. https://web.archive.org/web/20211102015547/https://bugs.chromium.org/p/chromium/issues/detail?id=111700#38
[27]
[28]
L. Givetash, “Banksy painting shreds itself moments after being sold for $1.4 million at london auction,” 2018. https://web.archive.org/web/20220120131436/https://www.nbcnews.com/news/world/banksy-painting-shreds-itself-moments-after-being-sold-1-4-n917326
[29]
T. Riedelsheimer, “Rivers and tides.” 2004. Available: https://web.archive.org/web/20211207114709/https://www.imdb.com/title/tt0307385/
[30]
[31]
N. Kakkar, “Things I learnt from a senior software engineer,” 2019. https://web.archive.org/web/20211124163250/https://neilkakkar.com/things-I-learnt-from-a-senior-dev.html
[32]
rhinoceraptor, “Senior developers are getting rejected for jobs,” 2019. https://web.archive.org/web/20201112034058/https://news.ycombinator.com/item?id=19918902
[33]
M. B. Johnson and P. Barber, “Deleting one line would have been even better,” 2015. https://web.archive.org/web/20220623225137/https://twitter.com/foobarber/status/653327149001273344
[34]
[35]
[36]
A. Hunt and D. Thomas, The pragmatic programmer: From journeyman to master. Pearson Education, 1999.
[37]
PMD contributors, “Finding duplicated code with CPD,” 2021. https://web.archive.org/web/20211028085353/https://pmd.github.io/latest/pmd_userdocs_cpd.html
[38]
[39]
E. W. Dijkstra, “The humble programmer,” ACM, vol. 15, no. 10, pp. 859–866, 1972, Available: https://www.cs.utexas.edu/~EWD/transcriptions/EWD03xx/EWD340.html
[40]
J. v. Neumann and H. H. Goldstine, “Planning and coding of problems for an electronic computing instrument,” Institute for Advanced Study, Princeton, New Jersey, 1948.
[41]
T. Figg, “Write code that is easy to delete, not easy to... — programming is terrible,” 2013. https://web.archive.org/web/20220111204045/https://programmingisterrible.com/post/139222674273/how-to-write-disposable-code-in-large-systems
[42]
W. F. Slater III, “The Internet outage and attacks of October 2002,” Chicago Chapter of the Internet Society, pp. 1–14, 2002.
[43]
J. Bentley, Programming pearls. Pearson Education, 2016.
[44]
[45]
R. Skrenta, “Code is our enemy (skrentablog).” 2007. Available: https://web.archive.org/web/20220111165205/http://www.skrenta.com/2007/05/code_is_our_enemy.html
[46]
[47]
[48]
G. C. Cielo, “Prelude into underscorejs: Higher-order functions,” 2015. https://web.archive.org/web/20170227121554/http://giocc.com/prelude-into-underscorejs-higher-order-functions.html
[49]
[50]
J. Spolsky, “Things you should never do, part I — Joel on Software,” 2000. https://web.archive.org/web/20220207014605/https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/
[51]
[52]
[53]
C. Treichler, “Gmail creator and YC partner Paul Buchheit on joining Google, how to become a great engineer, and happiness,” 2018. https://web.archive.org/web/20211110223307/https://triplebyte.com/blog/interview-with-gmail-creator-and-y-combinator-partner-paul-buchheit
[54]
[55]
[56]
S. H. Fuller and L. I. Millett, The future of computing performance: Game over or next level? National Academy Press, 2011.
[57]
[58]
T. M. Graf and D. Lemire, “Xor filters: Faster and smaller than bloom and cuckoo filters,” Journal of Experimental Algorithmics (JEA), vol. 25, pp. 1–16, 2020.
[59]
D. Lemire, “Xorfilter/xorfilter.go at master · FastFilter/xorfilter · GitHub,” 2021. https://web.archive.org/web/20211124215704/https://github.com/FastFilter/xorfilter/blob/master/xorfilter.go
[60]
D. Lemire, “Bloom/bloom.go at master · bits-and-blooms/bloom · GitHub,” 2021. https://web.archive.org/web/20220112182302/https://github.com/bits-and-blooms/bloom/blob/master/bloom.go
[61]
[62]
M. Kleppmann, “Figuring out the future of distributed data systems (interview),” 2019. https://web.archive.org/web/20210412201847/https://martin.kleppmann.com/2019/06/27/hydra-interview.html
[63]
C. Newcombe, T. Rath, F. Zhang, B. Munteanu, M. Brooker, and M. Deardeuff, “How Amazon web services uses formal methods,” Communications of the ACM, vol. 58, no. 4, pp. 66–73, 2015.
[64]
dwild and A. Freund, “Ask HN: What are the ‘best’ codebases that you’ve encountered?” 2019. https://web.archive.org/web/20201107224323/http://news.ycombinator.com/item?id=20556336
[65]
[66]
K. E. Iverson, “Notation as a tool of thought,” in ACM Turing award lectures, 2007, p. 1979.
[67]
tweakimp, “[Feature request] optimize imports · issue #333 · psf/black · GitHub,” 2018. https://web.archive.org/web/20220113011431/https://github.com/psf/black/issues/333
[68]
H. Abelson, G. J. Sussman, and J. Sussman, Structure and interpretation of computer programs. MIT Press, 1999.
[69]
M. Simionato, “Introduction — decorator 4.1.2 documentation,” 2017. https://web.archive.org/web/20170728231901/http://decorator.readthedocs.io/en/master/tests.documentation.html
[70]
[71]
[72]
[73]
taspeotis, “Ask HN: What are the ‘best’ codebases that you’ve encountered?” 2019. https://web.archive.org/web/20201107224323/http://news.ycombinator.com/item?id=20556336
[74]
[75]
[76]
J. Hughes, “QuickCheck: An automatic testing tool for Haskell,” 2000. https://web.archive.org/web/20210601171144/http://www.cse.chalmers.se/~rjmh/QuickCheck/
[77]
[78]
[79]
[80]
D. E. Knuth, “Literate programming,” The computer journal, vol. 27, no. 2, pp. 97–111, 1984.
[81]
[82]
[83]
[84]
[85]
[86]
P. Graham, On Lisp: Advanced techniques for Common Lisp. Prentice Hall, 1994.
[87]
E. Kohlbecker, D. P. Friedman, M. Felleisen, and B. Duba, “Hygienic macro expansion,” in Proceedings of the 1986 ACM conference on LISP and functional programming, 1986, pp. 151–161.
[88]
[89]
[90]
A. Calpini, “Computer language shootout scorecard,” 2022. https://web.archive.org/web/20220111162846/http://dada.perl.it/shootout/craps_loc.html
[91]
D. Lebrero, “Lines of code — does it matter?” 2016. https://web.archive.org/web/20220111163128/https://labs.ig.com/lines-of-code-matter
[92]
[93]
CMU Cylab, “BinaryAnalysisPlatform/bap at 3d8bcfae39238c49fdefb0fd7d186336d56662e9,” 2019. https://web.archive.org/web/20220112170607/https://github.com/BinaryAnalysisPlatform/bap/tree/3d8bcfae39238c49fdefb0fd7d186336d56662e9
[94]
M. Hashemi, “Funcutils — functools fixes — boltons 21.0.0 documentation,” 2021. https://web.archive.org/web/20210411001326/https://boltons.readthedocs.io/en/latest/funcutils.html
[95]
J. Chang, “Interview by DataScience.LA at useR 2014,” 2014. https://www.youtube.com/watch?v=uJm-its3ZWM#t=540
[96]
[97]
P. Burns, The R inferno. Lulu. com, 2012.
[98]
S. Grondin, “A rate limiter in 31 lines of coffeescript. A demonstration of it exceptional expressiveness,” 2014. https://web.archive.org/web/20140910095918/http://www.reddit.com/r/coffeescript/comments/1vngd3/a_rate_limiter_in_31_lines_of_coffeescript_a/
[99]
[100]
[101]
E. Hoigaard, “Partial with free variables — smooth CoffeeScript,” 2011. https://web.archive.org/web/20201030230014/https://autotelicum.github.io/Smooth-CoffeeScript/literate/partial.html
[102]
K. Prasad, K. Norton, and T. Coatta, “Node at LinkedIn: The pursuit of thinner, lighter, faster: A discussion with Kiran Prasad, Kelly Norton, and Terry Coatta,” Queue, vol. 11, no. 12, pp. 40–48, 2013.
[103]
“Erlang: The movie,” 1991. https://archive.org/details/ErlangTheMovie
[104]
J. Armstrong, “What’s all this fuss about Erlang?” 2007. https://web.archive.org/web/20071028192349/http://www.pragprog.com/articles/erlang
[105]
J. Armstrong, “A history of Erlang,” in Proceedings of the third ACM SIGPLAN conference on history of programming languages, 2007, pp. 6–1.
[106]
[107]
[108]
SuperKotlin contributors, “SuperKotlin — because Kotlin give you superpowers,” 2021. https://web.archive.org/web/20210506084944/https://superkotlin.com/
[109]
K. Ushey, “Ideas: Variadic templates · issue #9 · Rcpp11/rjournal_article · GitHub,” 2014. https://web.archive.org/web/20200916075832/https://github.com/Rcpp11/rjournal_article/issues/9
[110]
[111]
[112]
[113]
[114]
[115]
R. Thomas, “Big deep learning news: Google Tensorflow chooses Keras · fast.ai,” 2017. https://web.archive.org/web/20220111053319/https://www.fast.ai/2017/01/03/keras/
[116]
[117]
OpenAI, “Introducing Triton: Open-source GPU programming for neural networks,” 2021. https://web.archive.org/web/20220112220248/https://openai.com/blog/triton/
[118]
[119]
[120]
[121]
[122]
M. Armbrust et al., “Spark SQL: Relational data processing in Spark,” in Proceedings of the 2015 ACM SIGMOD international conference on management of data, 2015, pp. 1383–1394.
[123]
Scalding contributors, “A DSL that is concise and fun,” 2021. https://web.archive.org/web/20211216060806/https://twitter.com/scalding
[124]
[125]
[126]
V. Buffalo and J. J. Emerson, “I ♥ how r makes much of data analysis trivial. How good are genotype imputations by freq class? Table(real_geno, imputed_geno, cut(freq, 5)),” 2014. https://web.archive.org/web/20220112062711/https://twitter.com/vsbuffalo/status/489161013879508992
[127]
[128]
H. Wickham, R. François, L. Henry, and K. Müller, “Programming with dplyr,” 2022. https://web.archive.org/web/20220510140801/https://dplyr.tidyverse.org/articles/programming.html
[129]
A. Piccolboni, “Make it possible to program with dplyr · issue #352 · tidyverse/dplyr · GitHub,” 2014. https://web.archive.org/web/20220111053627/https://github.com/tidyverse/dplyr/issues/352
[130]
[131]
pandas-ply, “Pandas-ply: Functional data manipulation for pandas — pandas-ply 0.2.1 documentation,” 2020. https://web.archive.org/web/20201206015936/https://pythonhosted.org/pandas-ply/
[132]
RISELab, “Scale your pandas workflow by changing a single line of code — Modin,” 2020. https://web.archive.org/web/20201212113414/https://modin.readthedocs.io/en/latest/index.html
[133]
E. Tufte, The visual display of quantitative information. Graphics Press, 1983.
[134]
[135]
L. Wilkinson, D. Wills, D. Rope, A. Norton, and R. Dubbs, The grammar of graphics. Springer New York, 2005.
[136]
H. Wickham, “A layered grammar of graphics,” Journal of Computational and Graphical Statistics, vol. 19, no. 1, pp. 3–28, 2010, doi: 10.1198/jcgs.2009.07098.
[137]
[138]
[139]
F. Hautbois, “Bokeh vs Dash — which is the best dashboard framework for Python?” 2018. https://web.archive.org/web/20200730043435/https://www.sicara.ai/blog/2018-01-30-bokeh-dash-best-dashboard-framework-python
[140]
[141]
H. Wickham, ggplot2: Elegant graphics for data analysis. Springer, 2009.
[142]
[143]
[144]
M. Bostock, “D3.js — data-driven documents,” 2022. https://web.archive.org/web/20220112061739/https://d3js.org/
[145]
[146]
[147]
R. S. Brown, “Django vs flask vs pyramid: Choosing a Python web framework,” 2014. https://web.archive.org/web/20210827143752/https://www.airpair.com/python/posts/django-flask-pyramid
[148]
[149]
[150]
[151]
T. Crosley, “Hug: Embrace the APIs of the future,” 2022. https://web.archive.org/web/20220111053049/http://www.hug.rest/website/learn/
[152]
T. Crosley, “Show HN: FastAPI: Build Python APIs with Go-like speed and automatic UI docs,” 2019. https://web.archive.org/web/20200920151608/https://news.ycombinator.com/item?id=19441195
[153]
S. Ramírez, “Fastapi/README.md at master · tiangolo/fastapi · GitHub,” 2021. https://web.archive.org/web/20211208173952/https://github.com/tiangolo/fastapi/blob/master/README.md
[154]
T. Flanagan, “Knio/dominate: Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure Python very concisely, which eliminate the need to learn another template language, and to take advantage of the more powerful features of Python.” 2020. https://web.archive.org/web/20211023143453/https://github.com/Knio/dominate
[155]
[156]
Jinja contributors, “Jinja documentation (3.1.x),” 2022. https://web.archive.org/web/20220602015124/https://jinja.palletsprojects.com/en/3.1.x/
[157]
T. Filiba, “Minima/README.md at master · tomerfiliba/minima · GitHub,” 2012. https://web.archive.org/web/20200920215526/https://github.com/tomerfiliba/minima/blob/master/README.md
[158]
D. R. Greenfeld, “Requests III: HTTP for humans and machines, alike — requests 2.21.0 documentation,” 2021. https://web.archive.org/web/20211216171941/https://3.python-requests.org/
[159]
[160]
[161]
[162]
S. Marié, “Smarie/python-vtypes: Validating types for Python — use ‘isinstance()‘ to validate both type and value.” 2020. https://web.archive.org/web/20201110035159/https://github.com/smarie/python-vtypes/
[163]
[164]
D. Fee, “Forge (Python) signatures for fun and profit — forge 18.6.0 documentation,” 2021. https://web.archive.org/web/20210305232705/https://python-forge.readthedocs.io/en/latest/
[165]
[166]
H. Schlawack, “Why not… — attrs 21.4.0 documentation,” 2021. https://web.archive.org/web/20211224224519/https://www.attrs.org/en/stable/why.html
[167]
P. Meilstrup, “Crowding/vadr: Making R a better language,” 2015. https://web.archive.org/web/20201202234130/https://github.com/crowding/vadr
[168]
[169]
L. A. F. Pereira, “Lwan web server,” 2021. https://web.archive.org/web/20211223131551/https://lwan.ws//
[170]
I. Calvino, Perché leggere i classici. Mondadori, 2010.
[171]
neurodrone, “Lwan is a work of art. Every time I read through it, I am almost always awe-struck,” 2013. https://web.archive.org/web/20200513043536/https://twitter.com/neurodrone/status/359296080283840513
[172]
dwm contributors, “Dwm — dynamic window manager,” 2022. https://web.archive.org/web/20220110041251/https://dwm.suckless.org/
[173]
P. Bailey, “Blue lines: @Google’s old translate program, 500k lines of stats-focused code. Green: Now, 500 lines of @tensorflow,” 2017. https://web.archive.org/web/20210630055315/https://twitter.com/DynamicWebPaige/status/915326707107844097
[174]
D. Castelvecchi, “Deep learning boosts Google Translate tool,” Nature, vol. 27, 2016.
[175]
J. Mattheij, “Sorting 2 tons of lego, the software side,” 2017. https://web.archive.org/web/20211028233449/https://jacquesmattheij.com/sorting-lego-the-software-side/
[176]
D. Blackman and S. Vigna, “xoroshiro128plus.c,” 2018. https://web.archive.org/web/20220704174758/https://prng.di.unimi.it/xoroshiro128plus.c
[177]
S. Vigna, “Il prof della Statale conquista la Silicon Valley con un algoritmo,” 2016. https://web.archive.org/web/20201108092100/https://www.ilgiorno.it/milano/cronaca/sebastiano-vigna-algoritmo-chrome-1.1648818
[178]
S. Aaronson, “The busy beaver frontier,” ACM SIGACT News, vol. 51, no. 3, pp. 32–54, 2020.
[179]
M. Andreev, “Busy beavers and Kolmogorov complexity,” in Conference on computability in Europe, 2016, pp. 195–204.
[180]
A. Q. Gates, V. Kreinovich, and L. Longpre, “Kolmogorov complexity justifies software engineering heuristics,” vol. 66, pp. 150–154, 1998.
[181]
[182]
[183]
R. Nystrom, “A random dungeon generator that fits on a business card · GitHub,” 2019. https://web.archive.org/web/20220112171138/https://gist.github.com/munificent/b1bcd969063da3e6c298be070a22b604
[184]
B. Leach, “In pursuit of production minimalism — brandur.org,” 2017. https://web.archive.org/web/20220112062214/https://brandur.org/minimalism
[185]
[186]
D. Riley, “Deduplicator,” 2022. https://www.linkedin.com/in/drurly/
[187]
G. Orosz, “Software architecture is overrated, clear and simple design is underrated,” 2019. https://web.archive.org/web/20220111054416/https://blog.pragmaticengineer.com/software-architecture-is-overrated/
[188]
[189]
[190]
[191]
[192]
nostrademons, “For those who work inside google, it’s well worth it to look at Jeff & Sanjay ...” 2013. https://web.archive.org/web/20210512003539/https://news.ycombinator.com/item?id=5496914
[193]
[194]
[195]
[196]
[197]
K. Beck, C. Andres, and E. Gamma, Extreme programming explained: Embrace change. Addison-Wesley, 2004.
[198]
[199]
I. Turner-Trauring, “From 10x programmer to 0.1x programmer: Creating more with less,” 2016. https://web.archive.org/web/20220202042625/https://codewithoutrules.com/2016/08/25/the-01x-programmer/
[200]
[201]
C. Ogden, “Google graveyard — killed by Google,” 2022. https://web.archive.org/web/20220104055105/https://killedbygoogle.com/
[202]
[203]
D. Crockford, JavaScript: The good parts. O’Reilly Media, Inc., 2008.
[204]
C. A. R. Hoare, “The emperor’s old clothes,” in ACM Turing award lectures, 2007, p. 1980.
[205]
Pypy contributors, “RPython language — RPython documentation,” 2021. https://web.archive.org/web/20211123102802/https://rpython.readthedocs.io/en/latest/rpython.html
[206]
[207]
D. Picheta, “Coconut: Pythonic functional programming,” 2019. https://web.archive.org/web/20201112190052/https://news.ycombinator.com/item?id=18815125
[208]
[209]
[210]
[211]
D. Halter, “Davidhalter/parso: A Python parser,” 2021. https://web.archive.org/web/20220111053517/https://github.com/davidhalter/parso
[212]
Google Research, “Google AI blog: Tangent: Source-to-source debuggable derivatives,” 2017. https://web.archive.org/web/20210517041533/https://ai.googleblog.com/2017/11/tangent-source-to-source-debuggable.html
[213]
N. Marz, “Redplanetlabs/specter: Clojure(Script)’S missing piece,” 2021. https://web.archive.org/web/20210814183655/https://github.com/redplanetlabs/specter
[214]
[215]
Y. Xie, “Yihui/testit: A simple package for testing R packages,” 2021. https://web.archive.org/web/20210416103236/https://github.com/yihui/testit
[216]
H. Wickham, “Testthat: Get started with testing,” The R Journal, vol. 3, pp. 5–10, 2011, Available: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf
[217]
[218]
[219]
C. Ariza, “About StaticFrame — StaticFrame 0.8.33 documentation,” 2022. https://web.archive.org/web/20220112013622/https://static-frame.readthedocs.io/en/latest/intro.html
[220]
E. Montale, Ossi di seppia. Torino: Piero Gobetti editore, 1925.
[221]
st contributors, “St — simple terminal,” 2022. https://web.archive.org/web/20220103005804/http://st.suckless.org/
[222]
B. Taylor and K. Gibbs, “Introducing Quip,” 2013. https://web.archive.org/web/20220111054053/https://quip.com/blog/introducing-quip
[223]
R. van der Merwe, “The data-pixel approach to improving user experience — Smashing magazine,” 2011. https://web.archive.org/web/20210411221532/https://www.smashingmagazine.com/2011/11/the-data-pixel-approach-to-improving-user-experience/
[224]
[225]
[226]
I. Turner-Trauring, “It may not be your fault, but it’s always your responsibility,” 2017. https://web.archive.org/web/20220111054056/https://codewithoutrules.com/2017/06/27/its-always-your-fault-in-practice/
[227]
R. Barry, Using the FreeRTOS real time kernel: A practical guide. Real Time Engineers, 2010.
[228]
Amazon AWS, “FreeRTOS/FreeRTOS-kernel: FreeRTOS kernel files only, submoduled into https://github.com/FreeRTOS/FreeRTOS and various other repos,” 2021. https://web.archive.org/web/20211113212542/https://github.com/FreeRTOS/FreeRTOS-Kernel
[229]
Z. Jiong, “A heavily commented Linux kernel source code,” 2019. https://web.archive.org/web/20220207190152/http://oldlinux.org/download/ECLK-5.0-WithCover.pdf
[230]
[231]
[232]
RK ROYAL KLUDGE, “RK ROYAL KLUDGE royal kludge RK61 61 keys wired/wireless multi-device yellow LED backlit mechanical gaming/office keyboard for iOS, android, win,” 2022. https://web.archive.org/web/20220111195001/https://www.kmart.com/rk-royal-kludge-royal-kludge-rk61-61-keys/p-A034750050
[233]
S. Jobs, “Steve Jobs introduces iPhone in 2007,” 2007. https://web.archive.org/web/20220207075239/https://www.youtube.com/watch?v=MnrJzXM7a6o
[234]
J. Markoff, “That iPhone has a keyboard, but it’s not mechanical,” 2007. https://web.archive.org/web/20200603010850/http://www.nytimes.com/2007/06/13/technology/13phone.ready.html
[235]
B. Bennett, “iPhone 13 is coming soon, but will apple’s new iPhone be buttonless?” 2021. https://web.archive.org/web/20210621182754/https://www.cnet.com/news/iphone-13-could-completely-buttonless-apple-might-instead/
[236]
D. A. Patterson and C. H. Sequin, “RISC I: A reduced instruction set VLSI computer,” in 25 years of the international symposia on computer architecture (selected papers), 1998, pp. 216–230.
[237]
[238]
E. S. Raymond, The art of UNIX programming. Pearson Education, 2003.
[239]
[240]
T. Anderson, “Linux in 2020: 27.8 million lines of code in the kernel, 1.3 million in systemd,” 2020. https://web.archive.org/web/20220112232321/https://www.theregister.com/2020/01/06/linux_2020_kernel_systemd_code/
[241]
[242]
B. X. Chen and Z. Ramzan, “Q&A on Heartbleed: A flaw missed by the masses,” 2014. https://web.archive.org/web/20140409194244/http://bits.blogs.nytimes.com/2014/04/09/qa-on-heartbleed-a-flaw-missed-by-the-masses/
[243]
M. Williams, M. Tüxen, and R. Seggelmann, “Transport layer security (TLS) and datagram transport layer security (DTLS) heartbeat extension,” 2012. https://web.archive.org/web/20220706062727/https://datatracker.ietf.org/doc/rfc6520/
[244]
K. Chang, “Boeing Starliner flight’s flaws show ‘fundamental problem,’ NASA says,” 2020. https://web.archive.org/web/20210604191825/https://www.nytimes.com/2020/02/07/science/boeing-starliner-nasa.html
[245]
[246]
[247]
M. Cegłowski, “The website obesity crisis [web directions 2015 keynote],” 2015. https://web.archive.org/web/20210516230735/https://www.youtube.com/watch?v=iYpl0QVCr6U
[248]
[249]
R. E. Sward, “The rise, fall and persistence of Ada,” in Proceedings of the ACM SIGAda annual international conference on SIGAda, 2010, pp. 71–74.
[250]
[251]
[252]
E. F. Codd, The relational model for database management: Version 2. Addison-Wesley Longman Publishing Co., Inc., 1990.
[253]
J. Bloch, N. Gafter, and J. Rose, “Java/6 : Java/util/arrays.java,” 2006. https://web.archive.org/web/20220207173411/https://code.yawk.at/java/6/java/util/Arrays.java
[254]
[255]
ConceptJunkie, “Ask HN: Have you ever inherited a codebase nobody on the team could understand?” 2018. https://web.archive.org/web/20201108091110/http://news.ycombinator.com/item?id=18554272
[256]
E. W. Dijkstra, “The goto statement considered harmful,” Communications of the ACM, vol. 11, no. 3, pp. 147–148, 1968, Available: https://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf
[257]
E. Dijkstra, “Programming considered as a human activity,” in Classics in software engineering, 1979, pp. 1–9.
[258]
C. Böhm and G. Jacopini, “Flow diagrams, Turing machines and languages with only two formation rules,” Communications of the ACM, vol. 9, no. 5, pp. 366–371, 1966.
[259]
S. R. Kosaraju, “Analysis of structured programs,” Journal of Computer and System Sciences, vol. 9, no. 3, pp. 232–255, 1974.
[260]
D. E. Knuth, “Structured programming with go to statements,” ACM Computing Surveys (CSUR), vol. 6, no. 4, pp. 261–301, 1974.
[261]
L. Torvalds, “Linux: Using goto in kernel code,” 2003. https://web.archive.org/web/20051128093253/http://kerneltrap.org/node/553/2131
[262]
[263]
[264]
R. C. Martin, Clean code: A handbook of agile software craftsmanship. Prentice Hall, 2009.
[265]
[266]
[267]
J. Glanz, J. Creswell, T. Kaplan, and Z. Wichter, “After a Lion Air 737 Max crashed in october, questions about the plane arose,” 2019. https://web.archive.org/web/20220116135942/https://www.nytimes.com/2019/02/03/world/asia/lion-air-plane-crash-pilots.html
[268]
G. Travis, “How the Boeing 737 Max disaster looks to a software developer,” IEEE Spectrum, vol. 18, 2019.
[269]
[270]
D. E. Knuth, Selected papers on computer science. Cambridge University Press, 1996.
[271]
[272]
[273]
Suckless contributors, “Software that sucks less,” 2022. https://web.archive.org/web/20220110071413/https://suckless.org/philosophy/
[274]
[275]
G. Garitan, “Caudron Renault Simoun 06772,” 2013. https://commons.wikimedia.org/wiki/File:Caudron_Renault_Simoun_06772.JPG
[276]
E. Sober, Ockham’s razors: A user’s manual. Cambridge University Press, 2015.
[277]
[278]
W. D. Ross, Prior and posterior analytics. Clarendon Press, 1949.
[279]
J. W. Tukey and M. B. Wilk, “Data analysis and statistics: An expository overview,” pp. 695–709, 1966.
[280]
[281]
A. Gelman, “Bayes, Jeffreys, prior distributions and the philosophy of statistics,” Statistical Science, vol. 24, no. 2, pp. 176–178, 2009.
[282]
B. Efron, “Why isn’t everyone a Bayesian?” The American Statistician, vol. 40, no. 1, pp. 1–5, 1986.
[283]
[284]
W. H. Jefferys and J. O. Berger, “Sharpening Ockham’s razor on a Bayesian strop,” Technical Report, p. 157, 1991.
[285]
B. Efron, “Controversies in the foundations of statistics,” The American Mathematical Monthly, vol. 85, no. 4, pp. 231–246, 1978.
[286]
M. Hutter, S. Legg, and P. M. B. Vitanyi, “Algorithmic probability,” 2007. https://web.archive.org/web/20211225105259/http://www.scholarpedia.org/article/Algorithmic_probability
[287]
A. M. Turing, “On computable numbers, with an application to the Entscheidungsproblem,” Proceedings of the London Mathematical Society, vol. s2-42, pp. 230–265, 1936.
[288]
F. Del Santo and N. Gisin, “Physics without determinism: Alternative interpretations of classical physics,” Physical Review A, vol. 100, no. 6, p. 062107, 2019.
[289]
R. J. Solomonoff, “Algorithmic probability: Theory and applications,” in Information theory and statistical learning, Springer, 2009, pp. 1–23.
[290]
J. Copeland, M. Sprevak, and O. Shagrir, “Is the universe computational?” in The turing guide, B. J. Copeland, J. Bowen, M. Sprevak, and R. Wilson, Eds. Oxford University Press, 2017, pp. 445–462.
[291]
C. S. Wallace and D. L. Dowe, “Minimum message length and Kolmogorov complexity,” The Computer Journal, vol. 42, no. 4, pp. 270–283, 1999.
[292]
P. M. Vitányi and M. Li, “Minimum description length induction, Bayesianism, and Kolmogorov complexity,” IEEE Transactions on information theory, vol. 46, no. 2, pp. 446–464, 2000.
[293]
J. Rissanen, “Modeling by shortest data description,” Automatica, vol. 14, no. 5, pp. 465–471, 1978.
[294]
I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT Press, 2016.
[295]
A. Halevy, P. Norvig, and F. Pereira, “The unreasonable effectiveness of data,” IEEE intelligent systems, vol. 24, no. 2, pp. 8–12, 2009.
[296]
[297]
M. McFarland, “Waymo’s self-driving taxi struggles with left turns and puddles. But it’s still winning over some Arizona riders,” 2021. https://web.archive.org/web/20211230182629/https://www.cnn.com/2021/10/12/tech/waymo-one-year/index.html
[298]
T. Brown et al., “Language models are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020.
[299]
C. Sun, A. Shrivastava, S. Singh, and A. Gupta, “Revisiting unreasonable effectiveness of data in deep learning era,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 843–852.
[300]
[301]
J. Hoffmann et al., “Training compute-optimal large language models,” arXiv preprint arXiv:2203.15556, 2022.
[302]
A. Adadi, “A survey on data-efficient algorithms in big data era,” Journal of Big Data, vol. 8, no. 1, pp. 1–54, 2021.
[303]
C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understanding deep learning (still) requires rethinking generalization,” Communications of the ACM, vol. 64, no. 3, pp. 107–115, 2021, Available: https://arxiv.org/pdf/1611.03530.pdf
[304]
S. Arora, “Proving generalization of deep nets via compression — off the convex path,” 2018. https://web.archive.org/web/20210426184505/http://www.offconvex.org/2018/02/17/generalization2/
[305]
[306]
A. B. Hakim, Historical introduction to philosophy. Taylor & Francis, 2016.
[307]
[308]
“Are mathematicians finally satisfied with Andrew Wiles’s proof of fermat’s last theorem? Why has this theorem been so difficult to prove?” Scientific American, 1999, Available: https://web.archive.org/web/20211009031030/https://www.scientificamerican.com/article/are-mathematicians-finall/
[309]
[310]
K. I. Appel and W. Haken, Every planar map is four colorable. American Mathematical Society, 1989.
[311]
G. Zukav, The dancing wu li masters: An overview of the new physics. HarperCollins, 2009.
[312]
Wikipedia contributors, “List of long mathematical proofs,” 2022. https://web.archive.org/web/20220105041705/https://en.wikipedia.org/wiki/List_of_long_mathematical_proofs
[313]
M. Aigner, K. H. Hofmann, and G. M. Ziegler, Proofs from THE BOOK. Springer Berlin Heidelberg, 2018.
[314]
[315]
[316]
L. J. K. Setright, Drive on!: A social history of the motor car. Granta, 2003.
[317]
A. de Saint-Exupéry and L. Galantière, Wind, sand and stars. Harcourt Brace Jovanovich, 1992.
[318]
[319]
[320]
S. Jordahn, “Dieter Rams ’regrets contributing to culture of overconsumption’,” 2019. https://web.archive.org/web/20220129005633/https://www.dezeen.com/2019/03/05/dieter-rams-documentary-gary-hustwit-interview/
[321]
G. S. Adams, B. A. Converse, A. H. Hales, and L. E. Klotz, “People systematically overlook subtractive changes,” Nature, vol. 592, no. 7853, pp. 258–261, 2021.
[322]
J. Harriss, The tallest tower: Eiffel and the belle epoque. Unlimited Publishing LLC, 2004.
[323]
P. Weidman and I. Pinelis, “Model equations for the Eiffel tower profile: Historical perspective and new results,” Comptes Rendus Mecanique, vol. 332, no. 7, pp. 571–584, 2004.
[324]
M. A. Nobrega, Y. Zhu, I. Plajzer-Frick, V. Afzal, and E. M. Rubin, “Megabase deletions of gene deserts result in viable mice,” Nature, vol. 431, no. 7011, pp. 988–993, 2004.
[325]
[326]
[327]
T. Timpson, D. Graur, and M. Eisen, “Debating ENCODE: Dan Graur, Michael Eisen,” 2013. https://web.archive.org/web/20160422024813/http://mendelspod.com/podcast/debating-encode-with-dan-graur-and-michael-eisen/
[328]
E. Ibarra-Laclette, E. Lyons, G. Hernández-Guzmán, et al., “Architecture and evolution of a minute plant genome,” Nature, vol. 498, pp. 94–98, 2013, Available: https://www.nature.com/articles/nature12132
[329]
[330]
[331]
W. Strunk Jr. and R. De A’Morelli, The elements of style: Classic edition (2018): With editor’s notes, new chapters & study guide. Spectrum Ink, 2018.
[332]
E. B. White and W. Strunk, The elements of style. Macmillan New York, 1972.
[333]
[334]
[335]
E. Hemingway, A moveable feast. Good Press, 2021.
[336]
[337]
M. Crittenden, “Atomic blog posts — Mike Crittenden,” 2021. https://web.archive.org/web/20210120234133/https://critter.blog/2021/01/06/atomic-blog-posts/
[338]
[339]
[340]
[341]
[342]
G. Ungaretti, R. Angelica, and C. M. Romano, Sentimento del tempo. Fondazione A. e A. Mondadori, 1988.
[343]
G. Ungaretti, L’allegria. Milano: Preda, 1931.
[344]
[345]
[346]
T. S. Eliot, V. Eliot, and E. Pound, The waste land: A facsimile and transcript of the original drafts, including the annotations of Ezra Pound. Harcourt Brace Jovanovich, 1974.
[347]
[348]
M. Blaxter, “Eight things I learnt from #tardigate — nematodes.org,” 2016. https://web.archive.org/web/20220112061157/http://nematodes.org/blog/eight-things-i-learnt-from-tardigate/
[349]
[350]
[351]
M. Crittenden, “2020 BookRiot read harder challenge — mike crittenden,” 2020. https://web.archive.org/web/20210420183825/https://critter.blog/2020/12/21/2020-bookriot-read-harder-challenge/
[352]
J. L. Bell, “The axiom of choice (Stanford encyclopedia of philosophy),” 2021. https://web.archive.org/web/20211210191223/https://plato.stanford.edu/entries/axiom-choice/
[353]
B. Obama, A promised land. Crown, 2020.
[354]
M. Pengelly, “Barack obama: ’Americans spooked by black man in White House’ led to Trump presidency,” 2020. https://web.archive.org/web/20210817155612/https://www.theguardian.com/us-news/2020/nov/12/barack-obama-memoir-donald-trump
[355]
D. M. Ziegler et al., “Fine-tuning language models from human preferences,” arXiv preprint arXiv:1909.08593, 2019.
[356]
[357]
C. Matyszczyk, “Google’s Brin: Individual car ownership needs to go,” 2014. https://web.archive.org/web/20210411173114/https://www.cnet.com/news/googles-brin-we-want-to-end-individual-car-ownership/
[358]
United States Congress, House Committee on Oversight and Government Reform, Paperwork Reduction Act of 1995. 1995.
[359]
[360]
[361]
[362]
[363]
[364]
[365]
[366]
[367]
A. Loos, Ornament and crime. Penguin Books Limited, 2019.
[368]
[369]
H. Witherill, “Video: How to improve your compositions, from a photographer who worked with Ansel Adams,” 2021. https://web.archive.org/web/20220111061534/https://www.dpreview.com/news/3358443646/video-how-to-improve-your-compositions-from-a-photographer-who-worked-with-ansel-adams
[370]
[371]
A. Gorlin, “Photography composition: The definitive guide — landscape photography,” 2018. https://web.archive.org/web/20220111165249/https://antongorlin.com/blog/photography-composition-definitive-guide/
[372]
J. Sessini, “14 Magnum photographers reveal the single image that ’changed everything’,” 2015. https://web.archive.org/web/20201027121724/https://www.featureshoot.com/2015/06/magnum-photographers-reveal-the-single-image-that-changed-everything/
[373]
G. O’Keeffe, J. Stuhlman, B. B. Lynes, N. M. of Art, G. O. Museum, and M. I. of Arts, “Georgia O’Keeffe: Circling around abstraction,” 2007.
[374]
Unknown, “Quote by Pablo Picasso: ’Art is the elimination of the unnecessary’,” 2017. https://web.archive.org/web/20170411221156/http://www.goodreads.com/quotes/163897-art-is-the-elimination-of-the-unnecessary
[375]
P. Picasso and D. Ashton, Picasso on art: A selection of views. Thames; Hudson, 1972.
[376]
H. Matisse, J. Flam, and J. D. Flam, Matisse on art. University of California Press, 1995.
[377]
[378]
M. Drutt et al., Kazimir Malevich: Suprematism. Guggenheim Museum, 2003.
[379]
A. Allais, Album primo-avrilesque. Bellenand, 1962.
[380]
[381]
[382]
B. Gilden, Coney Island, 1969–1986. Magnum Editions, 2002.
[383]
[384]
J. Snyder, Northern renaissance art: Painting, sculpture, the graphic arts from 1350 to 1575. Prentice-Hall, 1985.
[385]
C. T. Eisler, The master of the unicorn: The life and work of Jean Duvet. Abaris Books, 1979.
[386]
R. Schumann, Carnaval, op. 9. Concert Hall, 2000.
[387]
B. Bartók, “Folk music and the "free and equal treatment of the twelve tones": Aspects of Béla Bartók’s synthetic methods,” 2019. https://web.archive.org/web/20201110033030/https://symposium.music.org/index.php/27/item/2019-folk-music-and-the-free-and-equal-treatment-of-the-twelve-tones-aspects-of-bela-bartoks-synthetic-methods
[388]
[389]
J. Cage, 4’33": John Cage centennial edition. Henmar Press, 2012.
[390]
Wikipedia contributors, “List of silent musical compositions,” 2021. https://web.archive.org/web/20210528142141/https://en.wikipedia.org/wiki/List_of_silent_musical_compositions
[391]
T. Riley, Terry Riley in C. Wergo, 2002.
[392]
W. Mertens, M. Nyman, and J. Hautekiet, American minimal music: La Monte Young, Terry Riley, Steve Reich, Philip Glass. Kahn & Averill, 1988.
[393]
E. Ries, The lean startup. New York: Crown Business, 2011.
[394]
[395]
S. Leiter, M. Erb, P. Vermare, and M. Shibata, All about Saul Leiter. Thames & Hudson, 2018.
[396]
F. Battiato, “La torre.” EMI Italiana S.p.A., 1982.
[397]
J. Fried and D. H. Hansson, It doesn’t have to be crazy at work. HarperBusiness, 2018.
[398]
T. Hatmaker, “Basecamp sees mass employee exodus after CEO bans political discussions — TechCrunch,” 2021. https://web.archive.org/web/20211021201349/https://techcrunch.com/2021/04/30/basecamp-employees-quit-ceo-letter/
[399]
J. Fried, “Jason Fried on Twitter: "Not only can most things wait, most things should. You either can’t stop thinking about them, or those thoughts fade away. Let time do some work for you.",” 2018. https://web.archive.org/web/20210428070816/https://twitter.com/jasonfried/status/1048365057862770688
[400]
[401]
D. H. Hansson, “Things are going so well we’re doing a hiring freeze,” 2018. https://web.archive.org/web/20211120023718/https://m.signalvnoise.com/things-are-going-so-well-were-doing-a-hiring-freeze/
[402]
[403]
[404]
G. R. Notess, “The Wayback Machine: The web’s archive.” Online, vol. 26, no. 2, pp. 59–61, 2002.

  1. This work is in the public domain, via Wikimedia Commons.↩︎

  2. A discussion board for developers.↩︎

  3. The smallest number of edges that can be removed from a graph to make it acyclic.↩︎

  4. Errors in a program that cause it to fail or behave in unintended ways.↩︎

  5. It can be found in [13] as item 158, without Steele’s change.↩︎

  6. For team projects, speed is measured in lines written by one developer per unit of time, even if multiple developers are working at the same time.↩︎

  7. Code that can never be executed, and can be removed without altering the behavior of a program.↩︎

  8. A way to de-activate code by turning it into a comment.↩︎

  9. An automated analysis that doesn’t require running the target program.↩︎

  10. A program used by software engineers to preserve, compare and combine different versions of the same code.↩︎

  11. The verbose developer is now a department head. The concise one is still an individual contributor.↩︎

  12. Tests that exercise individual components.↩︎

  13. Also known as an issue, a unit of work with an attached description, stored in a system known as an issue tracker, in support of software development.↩︎

  14. Python also offers the option print((message + '\n') * n, end='').↩︎

  15. Undecidable, to be precise; that is, there can be no program that solves it.↩︎

  16. A precursor to the computing concept of a function, known from the early days of computing [40].↩︎

  17. On different machines, real or virtual, or at least separate processes.↩︎

  18. Google’s codebase alone has passed the 2-billion-LOC mark, an unknown fraction thereof currently in use; I don’t think there is a way to make a reasonable industry-wide estimate.↩︎

  19. A fundamental but also elementary algorithm.↩︎

  20. A program to translate code written in a programming language into instructions directly executable by a machine.↩︎

  21. A programming technique whereby a function is defined using itself; it isn’t a circular definition because at each invocation of the function its arguments become smaller or simpler.↩︎
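
  For instance, a minimal Python sketch (mine, not from the book):

      def factorial(n):
          # base case: no further recursion needed
          if n <= 1:
              return 1
          # recursive case: the argument shrinks at each call
          return n * factorial(n - 1)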

  22. If you are worrying for him, he is now a successful IT manager.↩︎

  23. In object-oriented programming, classes of objects can be associated in an inheritance relation whereby the inheriting class has all the capabilities of the other and adds some of its own.↩︎

  24. A programming paradigm whereby composing and applying functions is emphasized, whereas changes of state and side effects are minimized.↩︎

  25. Using the automatic formatting tool Black.↩︎

  26. This implementation is for educational purposes only. Fundamental Python data structures are typically implemented in C and avoid recursion, for efficiency reasons. partial is from the standard library module functools.↩︎

  27. A type of processing unit specialized in graphics and image processing, but also used for a variety of other numerical computations.↩︎

  28. Based on systems that include different types of processor, typically optimized for different tasks.↩︎

  29. Application Programming Interface or how programmers access libraries and services.↩︎

  30. A technique whereby data is stored temporarily for performance reasons.↩︎

  31. A data structure to represent sets with particular emphasis on memory efficiency.↩︎

  32. Integrated Development Environment, a program used by engineers to develop applications.↩︎

  33. Used to compare different versions of a program; text-oriented ones aren’t aware of the syntax of a program and process it as a generic sequence of characters.↩︎

  34. A programming language feature whereby different functions can share the same name as long as their arguments’ types are different.↩︎

  35. I hope you charge by the LOC if you are an independent contractor.↩︎

  36. A form of documentation interleaved with code.↩︎

  37. A data compression technique, see Sec. 1.2.1.↩︎

  38. A theoretical model of computation.↩︎

  39. Universal in this context means able to read a description of another Turing machine or Lisp function as input and execute it.↩︎

  40. A language close to the machine language that computers can understand directly, but readable for people.↩︎

  41. An abstract computer that is emulated in software; it represents an intermediate, optional step between programming languages and machine language which enhances portability, the ability of programs to run on different types of computers.↩︎

  42. Code derived from Bååth’s is available under an Attribution 4.0 International (CC BY 4.0) license.↩︎

  43. The minimum number of one-letter deletions, insertions and substitutions necessary to transform a word into another.↩︎
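
  For concreteness, a deliberately naive recursive sketch in Python (mine, not the book's); real implementations use dynamic programming:

      def edit_distance(a, b):
          # distance to or from the empty string is the length of the other string
          if not a:
              return len(b)
          if not b:
              return len(a)
          if a[0] == b[0]:
              return edit_distance(a[1:], b[1:])
          return 1 + min(
              edit_distance(a[1:], b),      # deletion
              edit_distance(a, b[1:]),      # insertion
              edit_distance(a[1:], b[1:]),  # substitution
          )

      edit_distance("kitten", "sitting")  # 3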

  44. The process of creating a function from another with a larger number of arguments by setting values for a subset of them.↩︎
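
  Python's functools.partial does exactly this; a minimal sketch (mine):

      from functools import partial

      def power(base, exponent):
          return base ** exponent

      square = partial(power, exponent=2)  # fix one of the two arguments
      square(7)  # 49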

  45. Code derived from Hoigaard’s is licensed under an Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.↩︎

  46. As quoted in Hoigaard’s post. A link to the original was “dead” and no licensing information was provided.↩︎

  47. In the context of web applications, the generation of HTML for a page.↩︎

  48. In this context, it means “in the browser.”↩︎

  49. Names are disambiguated where needed with the year of birth.↩︎

  50. Java Virtual Machine.↩︎

  51. Code that is largely repetitive but can’t be avoided.↩︎

  52. [A type of template that accepts a variable number of arguments.]↩︎

  53. For languages, it means that working programs written for an older version of a language keep working under the definition of the new version.↩︎

  54. [Meaning a neural network module, not a software module.]↩︎

  55. Traditionally the simplest program we can write is one that prints this message and terminates.↩︎

  56. Similar to those of relational databases and inspired by the mathematical concept of relation.↩︎

  57. For Extract Transform Load, an umbrella term for a variety of computations necessary to load a raw data set into a database.↩︎

  58. [A Domain Specific Language; to the best of my knowledge Scalding is just a library for Scala, despite this statement.]↩︎

  59. A system for the processing of large amounts of data.↩︎

  60. This idiom derives from the fact that code is often organized in layers which typically only access the layer immediately underneath.↩︎

  61. The ability of programs to run on different types of computers.↩︎

  62. Despite attempts to extend Python with macros, there is nothing available beyond the experimental stage.↩︎

  63. [qplot is a function in Ggplot2.]↩︎

  64. That theme_set call is necessary to remove the questionable default gray background: not even Ggplot2 is perfect.↩︎

  65. I am discussing frameworks together with libraries, as I don’t see the distinction as fundamental.↩︎

  66. Data Base Management System.↩︎

  67. A way to map programming language objects to tables in a database.↩︎

  68. For Model View Controller, a widely adopted structure for web applications.↩︎

  69. [The simplest possible app that performs something, like printing “Hello, World.”]↩︎

  70. However, it follows the convention used throughout this book of stripping any import or equivalent statements.↩︎

  71. Object Relational Mapping.↩︎

  72. A component in a web application that supports the generation of HTML documents.↩︎

  73. Like APIs, but accessed through the HTTP protocol rather than function or method calls.↩︎

  74. Converting from and to a format suitable for storage or transmission over a network.↩︎

  75. Data structures that support iteration.↩︎

  76. In my testing the module-wide feature didn’t work.↩︎

  77. This code is available under the GNU GPLv3 license.↩︎

  78. The performance penalty on decorated calls has been ignored for the sake of simplicity; it can be reduced by moving the creation of new vtypes to decoration time rather than call time, or by memoizing vtype.↩︎

  79. A temporary data store that discards the Least Recently Used item to make room for new ones.↩︎
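
  Python offers one such cache as a decorator in its standard library; a minimal sketch (mine, unrelated to the code under discussion):

      from functools import lru_cache

      @lru_cache(maxsize=128)  # keep up to 128 results, evicting the least recently used
      def slow_square(x):
          return x * x

      slow_square(3)  # computed
      slow_square(3)  # returned from the cache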

  80. One that can be replaced with a custom definition.↩︎

  81. A catch-all argument for otherwise undeclared keyword arguments, those specified as key=value.↩︎
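
  A minimal Python sketch (mine) of such an argument:

      def report(**kwargs):
          # kwargs collects any keyword arguments not declared explicitly
          for key, value in kwargs.items():
              print(key, "=", value)

      report(color="red", size=10)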

  82. Virtual Private Networks, a networking technology that improves privacy.↩︎

  83. This code is in the public domain.↩︎

  84. This function extends the sequence of sum, multiplication and exponentiation to ever faster growing functions.↩︎
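
  The two-argument Ackermann function is the textbook example of such an extension; whether it is the exact function meant here is my assumption. A minimal Python sketch:

      def ackermann(m, n):
          # m = 1, 2, 3 roughly correspond to addition, multiplication and exponentiation;
          # beyond that the function grows faster than any of them
          if m == 0:
              return n + 1
          if n == 0:
              return ackermann(m - 1, 1)
          return ackermann(m - 1, ackermann(m, n - 1))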

  85. For instance, 5 is represented as 11111.↩︎

  86. This size needs to include the size of any input or, alternatively, no input is allowed.↩︎

  87. This code is available under a Creative Commons Attributions ShareAlike 3.0 unported license.↩︎

  88. The depressing name some software methodologies use for a list of planned features or other changes.↩︎

  89. The analysis of the syntactical structure of a computer program.↩︎

  90. To be precise, Parso has a second, much less important, function.↩︎

  91. The process of finding the derivative of a function.↩︎

  92. Generated when needed rather than as soon as possible (eager).↩︎

  93. Programs to control multiple computers from a single terminal window.↩︎

  94. A high estimate, others being as low as 1/3 of that; Meta did not separately report its VR division results until recently.↩︎

  95. Small computers on a single chip used for specific purposes, as opposed to the more general purpose ones used in phones, laptops or data centers.↩︎

  96. Central Processing Unit, or the most important part of a computer, responsible for reading program instructions and executing them.↩︎

  97. A software bundle of an application with the libraries and configuration it needs to work. It is meant to allow the execution of said application on a standard infrastructure.↩︎

  98. [That’s life]↩︎

  99. The amount of work caused by choosing a quick solution rather than a high quality one.↩︎

  100. The property of different models of computation to describe the same set of computations as the Turing machine.↩︎

  101. The ability present in many object-oriented languages to execute different code based on the type of an object.↩︎

  102. An object-oriented language feature whereby objects of different types can be dealt with in a uniform way.↩︎

  103. In object-oriented programming, a relation between types according to which an object of the “child” type can be operated on as if it were an instance of the “parent” type.↩︎

  104. This image is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license, via Wikimedia Commons.↩︎

  105. This is usually defined in terms of universal Turing Machines, but any Turing-equivalent model of computation leads to similar results.↩︎

  106. Because there are multiple choices of computing models, with the only requirement that they be Turing-equivalent.↩︎

  107. A precise definition requires using a specific type of Turing machines, but any variant thereof or reasonable programming language will result in a definition with the same properties, albeit different concrete values.↩︎

  108. The $O()$ notation means “growing at most like.”↩︎
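
  A standard formal definition, in my wording rather than the book's: $f(n) = O(g(n))$ if and only if there exist constants $c > 0$ and $n_0$ such that $|f(n)| \le c\, g(n)$ for all $n \ge n_0$.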

  109. Even if a pure Bayesian approach would dictate that the only thing that matters is the posterior distribution over models, not which one ends up winning.↩︎

  110. Because if $n = ml$ then $a^{ml} + b^{ml} = c^{ml}$ and $(a^m)^l + (b^m)^l = (c^m)^l$, which is also covered by the Fermat conjecture unless $l = 2$.↩︎

  111. Notwithstanding that brick construction isn’t the most popular in that country.↩︎

  112. I was involved, in a minor role, in a preliminary phase of this project, at least 5 years before the events described here; this account is based only on public information.↩︎

  113. This work is in the public domain in the United States, via Wikimedia Commons.↩︎

  114. It neither contradicts it nor is implied by it.↩︎

  115. [This refers to a camera setting which maximizes the areas of a photograph that appear sharp.]↩︎