Recently, I tweeted about an observation I have made a couple of times in various Scala projects:
Observation: People are using Option too often where their business logic clearly indicates they should use their own, custom ADT.— Daniel Westheide (@kaffeecoder) April 20, 2016
Since 140 characters are not nearly enough to discuss this issue appropriately and there were a few questions around this whole topic, I decided to put my thoughts down in a blog post, providing a few examples of what I am actually talking about.
Option is great, Option is good!
As Scala programmers, we can all give ourselves a pat on the back for being upright users of the Option type. Option is great! It allows us to express in types that a value may or may not be there. Option is good! It allows us to work with potentially missing values in a safe and elegant way, without having to check for presence all the time or risking those dreaded NullPointerExceptions. All hail Option!
The problem of overloaded semantics
Option is also bad! Or, more precisely, the way that this type is being used is somewhat problematic. The semantics of the Option type are pretty clear: it is about potential absence of a value. For example, if you try to get a value for a specific key from a Map, the result may or may not be there:
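A minimal sketch of such a lookup, assuming a hypothetical stock map that is not part of the original example:

```scala
object MapLookupExample {
  // Hypothetical data, for illustration only.
  val stock: Map[String, Int] = Map("laptop" -> 3, "phone" -> 0)

  // Map#get returns an Option: the key may or may not be present.
  val present: Option[Int] = stock.get("laptop") // Some(3)
  val missing: Option[Int] = stock.get("tablet") // None

  // Absence is handled explicitly, without any null check.
  val tabletCount: Int = missing.getOrElse(0)
}
```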
This usage of the Option type is consistent with its semantics. However, sometimes, we attach different or additional meaning to this type. An example of this has been summarised by @tksfz as a reply to my tweet:
@kaffeecoder An example: A Query type where empty means match everything. Have that in our code right now.— tksfz (@tksfz) April 20, 2016
Since this is such a good example, I decided to shamelessly make use of it to explain the problem. Imagine you are developing a system that allows users to search for various offers, both by retailers and private sellers. A very simplified version of the search function could look like this:
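A minimal sketch of such a signature. Retailer and Offer are hypothetical placeholder types, and the body is left unimplemented on purpose, since the signature alone is what matters here:

```scala
object OptionSearchExample {
  // Hypothetical domain types, for illustration only.
  final case class Retailer(name: String)
  final case class Offer(productTitle: String)

  // Both criteria are optional, but the signature does not say
  // what a None is supposed to mean for either of them.
  def searchOffers(
      productTitle: Option[String],
      retailer: Option[Retailer]
  ): Seq[Offer] = ???
}
```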
Apparently, there are two search criteria you can provide to filter the results: the product, and the retailer offering the product.
But what does this really mean? It looks like we have attached some new semantics to the Option type. It seems that if productTitle is None, the user wants to search for all offers, regardless of the product title, which means that None has the meaning of a wildcard. However, if productTitle is a Some, is this supposed to be an exact match or does the provided search string just have to be contained in the product title?
For the retailer, the same semantics might apply if the Option is undefined. Or maybe, in this case, None actually means that the user wants to search for offers that are not provided by a professional retailer.
Who knows? As a team member seeing this code for the first time, or coming back to it after a few months, I would probably be confused.
The problem is that we are overloading the semantics of None and Some. The former is very similar to how null in Java is sometimes used with meanings that are different from simple absence.
Towards meaningful types
Luckily, we can do better in Scala. Whenever someone in your team is confused about the exact meaning of an Option in your code, this is a good indicator that you should introduce your own algebraic data type that captures the implicit semantics of your domain. In the example above, something like this would probably be clearer for the reader:
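A sketch of what this could look like. Only RetailerCriteria and searchOffers are names taken from the text; SearchCriteria, the case names, the Retailer and Offer types, and the in-memory offers list are assumptions made for illustration:

```scala
object CriteriaSearchExample {
  // Hypothetical domain types. The Option here is plain absence:
  // an offer from a private seller simply has no retailer.
  final case class Retailer(name: String)
  final case class Offer(productTitle: String, retailer: Option[Retailer])

  // Assumed name for the product criteria type.
  sealed trait SearchCriteria
  object SearchCriteria {
    case object MatchAll extends SearchCriteria
    final case class Contains(text: String) extends SearchCriteria
  }

  sealed trait RetailerCriteria
  object RetailerCriteria {
    case object AnyRetailer extends RetailerCriteria
    final case class Only(retailer: Retailer) extends RetailerCriteria
  }

  // In-memory stand-in for a real offer store.
  val offers: Seq[Offer] = Seq(
    Offer("Scala book", Some(Retailer("BookShop"))),
    Offer("Used Scala book", None),
    Offer("Coffee grinder", Some(Retailer("KitchenCo")))
  )

  private def matchesProduct(criteria: SearchCriteria, offer: Offer): Boolean =
    criteria match {
      case SearchCriteria.MatchAll       => true
      case SearchCriteria.Contains(text) => offer.productTitle.contains(text)
    }

  private def matchesRetailer(criteria: RetailerCriteria, offer: Offer): Boolean =
    criteria match {
      case RetailerCriteria.AnyRetailer    => true
      case RetailerCriteria.Only(retailer) => offer.retailer.contains(retailer)
    }

  // Because both traits are sealed, the compiler can warn when a
  // pattern match above misses one of the cases.
  def searchOffers(product: SearchCriteria, retailer: RetailerCriteria): Seq[Offer] =
    offers.filter(o => matchesProduct(product, o) && matchesRetailer(retailer, o))
}
```

A call such as `searchOffers(SearchCriteria.Contains("Scala"), RetailerCriteria.AnyRetailer)` then states at the call site what is being asked for.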
Introducing your own algebraic data types here allows you to close the gap between the language of your domain and the one used in your code, bringing you closer to a ubiquitous language, one of the core values in domain-driven design.
In this example, any confusion about whether the product title is an exact search in the Some case or whether None is a wildcard is now eliminated. In the searchOffers implementation, we can simply use pattern matching on the product criteria and the RetailerCriteria, and since they are sealed traits, we will get a warning if our pattern matching is not exhaustive, i.e. it does not cover all the cases. If you want to read more about algebraic data types and sealed traits in Scala, read this excellent blog post by Noel Welsh.
Now, you might say that the product criteria type and RetailerCriteria have the same shape as Option, apart from the fact that they are not generic. Wouldn’t it be good enough to use some type aliases then in order to get the benefits of the ubiquitous language?
You could certainly do that, but having your own algebraic data types is more future-proof. You can easily extend your algebraic data types to enable additional functionality. If we want to allow users to do an exact search for the product title, and to specify that they are only interested in articles that are not offered by retailers, the following will do the trick:
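A sketch of the extended types. RetailerCriteria is the name used in the text; SearchCriteria, all case names (including Exactly and NoRetailer for the two new cases), and the Retailer and Offer types are assumptions made for illustration:

```scala
object ExtendedCriteriaExample {
  // Hypothetical domain types, for illustration only.
  final case class Retailer(name: String)
  final case class Offer(productTitle: String, retailer: Option[Retailer])

  sealed trait SearchCriteria
  object SearchCriteria {
    case object MatchAll extends SearchCriteria
    final case class Contains(text: String) extends SearchCriteria
    final case class Exactly(text: String) extends SearchCriteria // new case
  }

  sealed trait RetailerCriteria
  object RetailerCriteria {
    case object AnyRetailer extends RetailerCriteria
    final case class Only(retailer: Retailer) extends RetailerCriteria
    case object NoRetailer extends RetailerCriteria // new case
  }

  // Existing matches only need one additional branch per new case.
  def matchesProduct(criteria: SearchCriteria, offer: Offer): Boolean =
    criteria match {
      case SearchCriteria.MatchAll       => true
      case SearchCriteria.Contains(text) => offer.productTitle.contains(text)
      case SearchCriteria.Exactly(text)  => offer.productTitle == text
    }

  def matchesRetailer(criteria: RetailerCriteria, offer: Offer): Boolean =
    criteria match {
      case RetailerCriteria.AnyRetailer    => true
      case RetailerCriteria.Only(retailer) => offer.retailer.contains(retailer)
      case RetailerCriteria.NoRetailer     => offer.retailer.isEmpty
    }
}
```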
Since we are already using our own algebraic data types, no big refactorings should be necessary.
Of course, it is possible to encode the retailer criteria as an Either[Unit, Option[Retailer]], but can you tell me immediately what each of the possible cases is supposed to mean?
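To make the question concrete, here is one possible reading of that encoding. The mapping of cases to meanings is purely an assumption, as are the Retailer type and the alias name; nothing in the type itself documents or enforces it:

```scala
object EitherEncodingExample {
  // Hypothetical domain type, for illustration only.
  final case class Retailer(name: String)

  type EncodedRetailerCriteria = Either[Unit, Option[Retailer]]

  // One guess at the intended meaning; a different team member
  // could just as plausibly swap the first two cases.
  def describe(criteria: EncodedRetailerCriteria): String = criteria match {
    case Left(_)               => "any retailer"
    case Right(None)           => "no retailer, private sellers only"
    case Right(Some(retailer)) => s"only ${retailer.name}"
  }
}
```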
The previous example was mainly about Option being used in function arguments. It seems like this is rarely a good idea. Here is another example where Option is used as a field of a case class and as a return type of a function, with confusing semantics.
Imagine you are working at Nextflix, the next big platform for watching TV series online. Things being as they are, you need to block certain TV series from users located in specific countries. To do that, you could make use of a filter chain in your web application. One filter in this scenario needs to immediately return a response if the content is blocked in the user’s country, or forward to the next filter in the chain if the content is accessible from the user’s country. Here is what this could look like in Scala code:
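A minimal sketch of such a filter. Request, Response, Territory, checkTerritory, territoryFilter, and the set of blocked territories are hypothetical stand-ins for whatever web framework and data source are actually in use:

```scala
object OptionFilterExample {
  // Hypothetical types, for illustration only.
  final case class Territory(countryCode: String)
  final case class Request(path: String, territory: Territory)
  final case class Response(status: Int, body: String)

  // Hypothetical block list.
  val blockedTerritories: Set[Territory] = Set(Territory("XX"))

  // Some(territory) means "blocked here", None means "all good".
  def checkTerritory(request: Request): Option[Territory] =
    if (blockedTerritories.contains(request.territory)) Some(request.territory)
    else None

  def territoryFilter(request: Request, next: Request => Response): Response =
    checkTerritory(request) match {
      case None => next(request) // happy path lives in the None case
      case Some(territory) =>
        Response(403, s"Content is blocked in ${territory.countryCode}")
    }
}
```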
If the Option returned by the territory check is None, that’s the happy path and we can call the next filter in the chain. If it is defined, however, we need to short-circuit and immediately return a response informing the user that the content is blocked. Since Option is success-biased and normally short-circuits when it is None, this doesn’t look very intuitive to me.
What if we had our own algebraic data type for the verdict on the territory that a request is from? Such a solution could look like this:
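A sketch under the same kind of assumptions: Verdict and its cases, as well as Request, Response, Territory, checkTerritory, and territoryFilter, are hypothetical names chosen for illustration:

```scala
object VerdictFilterExample {
  // Hypothetical types, for illustration only.
  final case class Territory(countryCode: String)
  final case class Request(path: String, territory: Territory)
  final case class Response(status: Int, body: String)

  // The verdict names both outcomes explicitly.
  sealed trait Verdict
  object Verdict {
    case object Allowed extends Verdict
    final case class Blocked(territory: Territory) extends Verdict
  }

  // Hypothetical block list.
  val blockedTerritories: Set[Territory] = Set(Territory("XX"))

  def checkTerritory(request: Request): Verdict =
    if (blockedTerritories.contains(request.territory)) Verdict.Blocked(request.territory)
    else Verdict.Allowed

  def territoryFilter(request: Request, next: Request => Response): Response =
    checkTerritory(request) match {
      case Verdict.Allowed => next(request)
      case Verdict.Blocked(territory) =>
        Response(403, s"Content is blocked in ${territory.countryCode}")
    }
}
```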
I find that to be a lot more readable, and again, it has the advantage that it is easier to extend this in the future with additional verdict types.
It is often possible to express a concept from your domain as an Option, or if that doesn’t work, as an Either. Nevertheless, it is sometimes better to not use these generic types. If your actual semantics are different from potential absence of values, don’t force it, as this causes unnecessary indirection when reasoning about the code and the domain. Unsurprisingly, this can easily lead to bugs. The domain logic is easier to understand if you use your own types from the ubiquitous language.
Do you have more examples from your projects where the lack of custom algebraic data types has led to bugs or code that is difficult to understand? Feel free to share those in the comments or via Twitter.