Better Domain Modeling with Discriminated Unions
When I think about software, I like to design it so that doing the right thing is easy and doing the wrong thing is impossible (or at least very hard). This approach is typically called falling into the pit of success.
Having a well-defined domain model can prevent many mistakes simply because the code won't let them happen (either through a compilation error or other mechanisms).
I'm a proponent of functional programming as it allows us to model software in a better way that can reduce the number of errors we make.
Let's look at one of my favorite techniques: discriminated unions.
Motivation
In the GitHub API, there's an endpoint that allows you to get the events that have occurred for a pull request.
Let's take a look at the example response in the docs.
Based on the name of the docs, we'd expect to get back an array of events. Let's call this TimelineEvent[].
Let's go ahead and define the TimelineEvent type. One approach is to start copying the fields from the events in the array. By doing this, we would get the following.
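Here's a rough sketch of what that catch-all type might look like. The field shapes are simplified assumptions for this post (three event kinds and a handful of fields), not the full GitHub schema, and "nullable" is modeled here as optional properties.

```typescript
// Simplified, assumed shape: one type that tries to cover every event.
type TimelineEvent = {
  id: number;
  event: "locked" | "labeled" | "renamed";
  created_at: string;
  // Only meaningful for "locked" events
  lock_reason?: string;
  // Only meaningful for "labeled" events
  label?: { name: string; color: string };
  // Only meaningful for "renamed" events
  rename?: { from: string; to: string };
};

// A labeled event: nothing stops us from leaving label out by mistake.
const labeledEvent: TimelineEvent = {
  id: 1,
  event: "labeled",
  created_at: "2024-01-01T00:00:00Z",
  label: { name: "bug", color: "d73a4a" },
};
```

Notice that the compiler would accept this same object without the label field, which is exactly the kind of mistake we want to rule out.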
The Problem
This definition will work, as it covers all the data. However, the problem with this approach is that lock_reason, label, and rename had to be defined as nullable because they are only sometimes present (for example, lock_reason isn't specified for a labeled event).
Let's say that we wanted to write a function that prints data about a TimelineEvent. We would have to write something like the following:
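A minimal sketch of such a function, assuming the simplified catch-all type from this post (the field shapes are my assumptions, and formatEvent is a hypothetical helper added so the message is easy to inspect):

```typescript
// Assumed, simplified catch-all type with optional fields.
type TimelineEvent = {
  id: number;
  event: "locked" | "labeled" | "renamed";
  lock_reason?: string;
  label?: { name: string; color: string };
  rename?: { from: string; to: string };
};

// Hypothetical helper: builds the message for a single event.
function formatEvent(e: TimelineEvent): string {
  switch (e.event) {
    case "locked":
      // "!" tells the compiler "trust me, this is defined"
      return `Locked: ${e.lock_reason!}`;
    case "labeled":
      return `Labeled: ${e.label!.name}`;
    case "renamed":
      return `Renamed from ${e.rename!.from} to ${e.rename!.to}`;
  }
}

function printData(e: TimelineEvent): void {
  console.log(formatEvent(e));
}

printData({ id: 1, event: "labeled", label: { name: "bug", color: "d73a4a" } });
```

Every non-null assertion here is a promise the compiler cannot check. If the wrong property is read in the wrong branch, the failure only shows up at runtime.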
The main problem is that we have to remember that the labeled event has a label property but not a lock_reason property. It might not be a big deal right now, but given that the GitHub API has over 40 event types, the odds of forgetting which properties belong to which event are high.
The pattern here is that we have a type TimelineEvent that can have different, separate shapes, and we need a type that can represent all the shapes.
The Solution
One of the cool things about TypeScript is that it has a union operator (|) that allows you to define a type as one of several other types.
Let's refactor our TimelineEvent model to use the union operator.
First, we need to define the different events as their own types.
At this point, we have three types, one for each specific event. A LockedEvent has no knowledge of a label property and a RenamedEvent has no knowledge of a lock_reason property.
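As a sketch, the three types could look something like this. The exact fields are assumptions for illustration. The important part is that each type pins the event property to a single string literal and carries only the data that belongs to it.

```typescript
// Assumed, simplified fields: each event type owns only its own data.
type LockedEvent = {
  id: number;
  event: "locked";
  lock_reason: string;
};

type LabeledEvent = {
  id: number;
  event: "labeled";
  label: { name: string; color: string };
};

type RenamedEvent = {
  id: number;
  event: "renamed";
  rename: { from: string; to: string };
};

// The fields are now required, so forgetting lock_reason is a compile error.
const locked: LockedEvent = { id: 1, event: "locked", lock_reason: "resolved" };
```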
Next, we can update our definition of TimelineEvent to use the union operator like so.
This would be read as: a TimelineEvent can either be a LockedEvent or a LabeledEvent or a RenamedEvent.
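In code, the union might be declared as below. The three event types are repeated in compact form (with assumed, simplified fields) so the snippet stands on its own.

```typescript
// Assumed, simplified event types.
type LockedEvent = { id: number; event: "locked"; lock_reason: string };
type LabeledEvent = { id: number; event: "labeled"; label: { name: string } };
type RenamedEvent = { id: number; event: "renamed"; rename: { from: string; to: string } };

// A TimelineEvent is exactly one of the three shapes.
type TimelineEvent = LockedEvent | LabeledEvent | RenamedEvent;

// An array can mix shapes, but each element must fully match one of them.
const events: TimelineEvent[] = [
  { id: 1, event: "locked", lock_reason: "too heated" },
  { id: 2, event: "labeled", label: { name: "bug" } },
  { id: 3, event: "renamed", rename: { from: "Old title", to: "New title" } },
];
```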
With this new definition, let's rewrite the printData function.
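A sketch of the rewritten function, again under the simplified types assumed in this post. Switching on the event property lets the compiler narrow e to the matching shape in each branch. The formatEvent helper and the never check in the default branch are my additions for illustration.

```typescript
// Assumed, simplified event types.
type LockedEvent = { id: number; event: "locked"; lock_reason: string };
type LabeledEvent = { id: number; event: "labeled"; label: { name: string } };
type RenamedEvent = { id: number; event: "renamed"; rename: { from: string; to: string } };
type TimelineEvent = LockedEvent | LabeledEvent | RenamedEvent;

// Hypothetical helper: builds the message for a single event.
function formatEvent(e: TimelineEvent): string {
  switch (e.event) {
    case "locked":
      // e is a LockedEvent here, so lock_reason is known to exist
      return `Locked: ${e.lock_reason}`;
    case "labeled":
      // e is a LabeledEvent here; e.lock_reason would not compile
      return `Labeled: ${e.label.name}`;
    case "renamed":
      return `Renamed from ${e.rename.from} to ${e.rename.to}`;
    default: {
      // If a new shape is added to the union and not handled above,
      // this assignment stops compiling.
      const unreachable: never = e;
      throw new Error(`Unhandled event: ${JSON.stringify(unreachable)}`);
    }
  }
}

function printData(e: TimelineEvent): void {
  console.log(formatEvent(e));
}

printData({ id: 2, event: "labeled", label: { name: "bug" } });
```

The default branch is optional, but it turns a forgotten event type into a compile error instead of a silent gap.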
Not only do we no longer need the ! operator to bypass type safety, but we also get better autocomplete (note that lock_reason and rename don't appear when working with a labeled event).
Deeper Dive
At a general level, what we've modeled is a sum type and it's great for when you have a type that can take on a finite number of differing shapes.
Sum types are implemented as either tagged unions or untagged unions. TypeScript has untagged unions, whereas other languages, like Haskell and F#, use tagged unions. Let's see what the same implementation would look like in F#.
A tagged union is one where each shape has its own constructor. So in the F# version, Locked is the tag for the LockedEvent, Labeled is the tag for the LabeledEvent, and so on. In the TypeScript example, we got away without an explicit tag because the event property is on every TimelineEvent and has a different value for each shape.
If that weren't true, we would have had to add a field to each TimelineEvent shape (typically called kind or tag) to help us differentiate between the various shapes.
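To illustrate, here is a small sketch with hypothetical shapes that share no distinguishing property of their own, so a kind field is added purely to act as the tag. The type names and fields are made up for this example.

```typescript
// Hypothetical shapes with no natural discriminator, tagged by hand.
type Locked = { kind: "locked"; reason: string };
type Labeled = { kind: "labeled"; name: string };
type Renamed = { kind: "renamed"; from: string; to: string };

type Change = Locked | Labeled | Renamed;

// Hypothetical helper: the kind field plays the role that a constructor
// name plays in a tagged union.
function describeChange(c: Change): string {
  switch (c.kind) {
    case "locked":
      return `locked (${c.reason})`;
    case "labeled":
      return `labeled ${c.name}`;
    case "renamed":
      return `renamed ${c.from} -> ${c.to}`;
  }
}

console.log(describeChange({ kind: "renamed", from: "Old", to: "New" }));
```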
Wrapping Up
When defining domain models where the model can have different shapes, you can use a sum type to define the model.