Linguistic Antipatterns
This is a website to teach you to identify and fix linguistic antipatterns in your code

What is a linguistic antipattern?

Have you ever had a gnarly bug, or even just a frustrating coding session, that could ultimately be traced back to something that just didn't do what its name led you to expect? We certainly have. These can be caused by two people who interpret a word differently, or by one person making too many assumptions. But more often than not, they're caused by a name that predictably leads people to believe a function does something it simply doesn't. The ways in which this happens are linguistic antipatterns. As defined by the original researchers:

Linguistic Antipatterns (LAs) in software systems are recurring poor practices in the naming, documentation, and choice of identifiers in the implementation of an entity, thus possibly impairing program understanding.

This website is dedicated to cataloguing types of linguistic antipatterns and discussing the deeper reasons they cause problems and how to fix them.

Origin

Linguistic antipatterns were first studied in a series of papers led by Venera Arnaoudova. In the two main papers, the authors identified many types of linguistic antipatterns by scrutinizing several codebases, built scanners to find large numbers of examples, and then conducted a study in which 11 professional engineers and 19 graduate students were asked for their opinions on examples of each antipattern.

We take inspiration from Arnaoudova's work but depart significantly from it, offering a smaller set of broader antipatterns.

How does this website differ from the original Linguistic Antipatterns papers?

The original Linguistic Antipattern papers catalogued 18 types. Each of these was very narrow, and some are exceptionally rare. This larger set is great for people trying to build static analyzers to find them. But we believe a smaller list of broader antipatterns is better for learning and memory. We have collapsed the original list of 18 narrow patterns into 3 broader ones, and then added several of our own based on stories of bad bugs caused by poor names. We also try to connect each of the antipattern types to deeper software-engineering principles. For more of this philosophy, read our newsletter on Why Not to Study Design Patterns.

Who are we !?

Through our intense courses and 1-on-1 coaching, we have trained over 250 software engineers at the advanced level. This website is part of our mission to make the world's software less buggy and easier to change by creating common knowledge of scientific coding principles. Many of the examples in this website are ones we have directly gathered from students asked to share stories about difficult bugs they've encountered.

1 Formerly James Koppel Coaching, LLC.

Linguistic Antipatterns

Multiple methods with confusably-similar names and effects

Description

A class or namespace has two functions with similar names. A programmer who wants the functionality of the first function may mistakenly call the second. If the effects are similar, casual testing may mislead the programmer into thinking they had called the correct function, even if the two functions differ in some important way.

Examples

The classic example of this is the confusion between Thread.start() and Thread.run() in Java. Python's standard threading module was loosely based on Java's threading model, and it has the same problem. For example, consider these snippets, the first in Java and the second in Python:

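    // Java
    Thread myThread = new Thread(() -> doSomethingExpensive());
    myThread.run();  // executes in the current thread

    # Python
    import threading

    class MyThread(threading.Thread):
        def run(self):
            doSomethingExpensive()

    myThread = MyThread()
    myThread.run()  # executes in the current thread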

In both of these, the programmer intended to call myThread.start(), which creates a new background thread and then runs doSomethingExpensive on that background thread. Instead, they have called myThread.run(), which runs doSomethingExpensive() in the current thread. This happened because start and run are confusingly similar names. Further, the application will appear to work, but it will be slower because something which should be done in the background is blocking important behavior. Because of that, this bug can go undetected in a codebase for a long time.
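One way to see the difference for yourself is to record which thread actually executes the work. The following is a minimal sketch, not part of the original examples: the names Worker, record_thread, and where_did_it_run are hypothetical helpers introduced only for illustration.

```python
import threading


def record_thread(log):
    # Append the name of whichever thread is executing this function.
    log.append(threading.current_thread().name)


class Worker(threading.Thread):
    """Hypothetical Thread subclass whose run() records where it executed."""

    def __init__(self, log):
        super().__init__(name="background-worker")
        self.log = log

    def run(self):
        record_thread(self.log)


def where_did_it_run(method_name):
    # Invoke the worker through either run() or start() and report
    # the name of the thread that ended up doing the work.
    log = []
    worker = Worker(log)
    if method_name == "run":
        worker.run()    # plain method call: no new thread is created
    else:
        worker.start()  # creates a new thread, which then calls run()
        worker.join()
    return log[0]


caller = threading.current_thread().name
print(where_did_it_run("run") == caller)  # True: the work ran in the caller's thread
print(where_did_it_run("start"))          # background-worker
```

Both calls complete without error and both execute the work, which is why casual testing does not reveal the mistake. Only the identity of the executing thread differs.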

For another example, consider the battle between the various load functions in PyYAML. Originally, in versions 3.12 and below, PyYAML had two functions called load and safe_load, where safe_load had a safe behavior while load could execute arbitrary code. In PyYAML 4.1, they renamed the old load to danger_load. They removed the old safe_load function and created a new load function, which is also unsafe. This caused significant controversy. The story is chronicled in this blog post.

Both pairs of functions raised issues of confusability. Users of a function called load will see YAML parse correctly and believe they have implemented this functionality correctly, but they have in fact introduced an arbitrary-code-execution vulnerability into their software. And when there is a function called danger_load, a programmer may be tempted to think that the load function implements the safer options, but in this library load was in some ways more dangerous than danger_load.

Discussion and Lessons

Having confusable methods is one way to violate the Representable/Valid Principle, that there should be a 1-1 mapping between representable and valid states of the program. For the thread example, the state where doSomethingExpensive() has been run in the main thread is an error state which should never occur. It should not be possible to call the Thread.run() method. In PyYAML 3.12, calling load enters a state where arbitrary code may have been executed, which is invalid; it should therefore not be possible for a programmer to inadvertently call it. In comparison, after the changes of version 4.1, an ordinary programmer would only call load and never danger_load, meaning the result of having called danger_load is not representable in any program surviving minimal code review. However, this only makes uses of the load function more likely to pass code review, even though it is also insecure.
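As a rough illustration of making the error state unrepresentable in the thread example, one could wrap the thread in an interface that has no run method at all. This is only a sketch under our own assumptions: BackgroundTask and its start and wait methods are hypothetical names, not part of any standard library, and error handling is omitted.

```python
import threading


class BackgroundTask:
    """Hypothetical wrapper that exposes start() and wait(), but no run()."""

    def __init__(self, work):
        self._result = []
        # The underlying Thread is private, so callers have no obvious
        # way to execute the work on their own thread by accident.
        self._thread = threading.Thread(
            target=lambda: self._result.append(work())
        )

    def start(self):
        # The only way to execute the work: on a new background thread.
        self._thread.start()
        return self

    def wait(self):
        # Block until the background work finishes, then return its result.
        # Simplification: if work() raised, no result was stored and this fails.
        self._thread.join()
        return self._result[0]


task = BackgroundTask(lambda: 6 * 7)
print(hasattr(task, "run"))  # False: the confusable method does not exist
print(task.start().wait())   # 42
```

With this shape, a programmer who types task.run() gets an AttributeError immediately instead of a program that silently works but blocks the calling thread.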