Skip to content

Functional

Today I Learned - Iterating Through Union Types

In a previous post, we cover on how using union types in TypeScript is a great approach for domain modeling because it limits the possible values that a type can have.

For example, let's say that we're modeling a card game with a standard deck of playing cards. We could model the domain as such.

1
2
3
4
5
type Rank = "Ace" | "Two" | "Three" | "Four" | "Five" | "Six" | "Seven"
           | "Eight" | "Nine" | "Ten" |"Jack" | "Queen" | "King"
type Suite = "Hearts" | "Clubs" | "Spades" | "Diamonds"

type Card = {rank:Rank; suite:Suit}

With this modeling, there's no way to create a Card such that it has an invalid Rank or Suite.

With this definition, let's create a function to build the deck.

function createDeck(): Card[] {
  const ranks = ["Ace", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten", "Jack", "Queen", "King"];
  const suites = ["Hearts", "Clubs", "Spades", "Diamonds"];

  const deck:Card[] = [];
  for (const rank of ranks) {
    for (const suite of suites) {
      deck.push({rank, suite});
    }
  }
  return deck;
}

This code works, however, I don't like the fact that I had to formally list the option for both Rank and Suite as this means that I have two different representtions for Rank and Suite, which implies tthat if we needed to add a new Rank or Suite, then we'd need to add it in two places (a violation of DRY).

Doing some digging, I found this StackOverflow post that gave a different way of defining our Rank and Suite types. Let's try that new definition.

1
2
3
4
const ranks = ["Ace", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten", "Jack", "Queen", "King"] as const;
type Rank = typeof ranks[number];
const suites = ["Hearts", "Clubs", "Spades", "Diamonds"] as const;
type Suite = typeof suites[number]

In this above code, we're saying that ranks cannot change (either by assignment or by operations like push). With that definition, we can say that Rank is some entry in the ranks array. Similar approach for our suites array and Suite type.

I prefer this approach much more because we have our ranks and suites defined in one place and our code reads cleaner as this says Here are the possible ranks and Rank can only be one of those choices.

Limitations

The main limitation is that it only works for "enum" style unions. Let's change example and say that we want to model a series of shapes with the following.

1
2
3
4
5
type Circle = {radius:number};
type Square = {length:number};
type Rectangle = {height:number, width:number}

type Shape = Circle | Square | Rectangle

To use the same trick, we would need to have an array of constant values. However, we can't have a constant value for any of the Shapes because there are an infinite number of valid Circles, Squares, and Rectangles.

The Humble Function - Foundation to Functional Programming

When learning about functional programming, you won't go far before you run into the concept of a function. But we're not talking about some syntax or keywords in the language, but from a mathematical sense.

A function is a mapping between two sets such that for every element in the first set, it's mapped to a single element in the second set.

Since a set is a collection of elements, this is similar to a type where values are part of that type. With this understanding, we can modify our definition of a function to be:

A function is a mapping between two types such that for every value in the first type, it's mapped to a single value in the second type.

So What Does a Function Look Like?

Before diving into code, let's build up our understanding of functions more. When drawing out the function, we can model it like this.

Generic mapping from A to B

Generic mapping from A to B

The first type (A) is a circle where each possible value is listed, using ... to denote a pattern. The arrows map from each value of A to a value in B.

With this example, we know we have a function if the mapping satisfies the following rule:

Every element on the left maps to a single element on the right.

This rule seems easy enough to follow, but let's look at a mapping where this rule doesn't hold.

Functional Heartbreak

Let's say that we needed to write some code that could take a given month and return the number of days it has. Given this, here's what the mapping would look like.

Days in Month Mapping

Mapping from month name to days in month

To check if we have a function, we need to see if there's any element on the left with two or more arrows coming out.

In this case, February is breaking our rule because it could map to 28 or 29, depending on if it's a leap year. Since there isn't a parameter to denote if it's a leap year, our mapping isn't consistent and can't be a function.

One way to fix this would be to change our type on the left from MonthName to MonthName and year. Making this change gives us this new mapping.

Days in Month Mapping with Month Name and Year

Month and year mapping to days in month

Hip to Be Square

Let's look at a mapping that is a function, the square function.

Square mapping from number to number

Square mapping from number to number

Does every value on the left map to a single value on the right?

Yep, every value does. In fact, there are some values on the left that map to the same value on the right, which isn't a problem.

If we wanted to, we could restrict the type on the right from number to non-negative number, but there's no harm in having it be wider than needed.

Kinds of Functions

With this understanding of functions, let's talk about the two kinds of functions we can write and how they interact with each other.

Pure Functions

First, we have the pure function. These functions depend wholly on their inputs and they do not interact with outside state. For example, pure functions won't interact with databases, file systems, random generation, or time.

Pure functions are great because they're easy to test, composed with other functions, and don't modify state. Another advantage is that pure functions can be replaced with the result of their call (in other words, you could replace a call to square(3) with its result, 9 and the program is the same). This is known as referential transparency and is a great tool for troubleshooting an application.

The main downside to pure functions is that since they don't talk to other systems (including input/output), it's impossible to write a useful program with just pure functions.

Impure Functions

Impure functions, on the other hand, focus on interacting with outside state. These functions will call to the database or other systems, get the time, and generate random data as needed.

They allow us to write useful programs because they interact with input/output, however, the trade off is that they can be harder to test, they don't compose, and since they modify state, you may not be able to run them multiple times and get the same result.

One way to identify an impure function is by looking at its type signatures. For example, a function that takes inputs but returns void has to be modifying state or talking to another system, otherwise, why would you call it? Another signature is a function that takes no inputs, but it can return a value (like readLine() from nodejs), where did it get the value from? Just short of returning a constant value, it had to get it from somewhere.

Building an Application

Building an application requires both pure and impure functions, but how do we leverage the best of both worlds?

When I think about software, I think about how data flows through the system. For example, if someone is using a console application, they're inputting their commands to the terminal, which in turn converts them to commands to run, where the output is displayed on the screen.

As such, an application is made of three kinds of functions.

  • Boundary functions - These are responsible for getting input/output. They should have zero business rules (or the least possible as they are impure functions.
  • Business functions - These are the business specific rules that need to be ran on the given inputs. As these are typically the most important of an application, they are designed as pure functions
  • Workflow functions - This is the combination of boundary components and business components to build something useful. Since they're using impure functions, this will also be impure, but I often use this as my composition root for the application.

Effervescent Applications with Fizz Buzz

To demonstrate, let's build a version of FizzBuzz that uses this mindset.

For the problem, we have the following requirements.

  • If the number is divisible by 3, print "Fizz".
  • If the number is divisible by 5, print "Buzz".
  • If the number is divisible by both 3 and 5, print "FizzBuzz".
  • If the number isn't divisible by any of these, then print the number.

Given that we're building a console application, we will need to support getting input via the console and printing to the console.

Let's go ahead and build up our boundary functions.

// Impure function that allows us to get number from user
function getInput(): number {
  // using prompt-sync https://github.com/heapwolf/prompt-sync
  const prompt = require("prompt-sync")({ sigint: true });
  const response = prompt("What number to calculate FizzBuzz to?");
  if (!+response || +response < 1) {
    console.log(
      "Invalid response, please enter a positive number greater than 1"
    );
    return getInput();
  }
  return +response;
}

// Function that wraps console.log for printing
function printOutput(content: string): void {
  console.log(content);
}

At this point, we have a way of getting a number via getInput and a way to print a string via printOutput. In printOutput, this is a tiny function with no business rules whatsoever. getInput, however, has some business rules about validation, but we'll see later on how to refactor this.

For now, let's leave these two and look into creating our business rule functions.

// Business rules for FizzBuzz
function calculateFizzBuzz(input: number): string {
  if (input % 3 == 0 && input % 5 == 0) {
    return "FizzBuzz";
  }
  if (input % 3 == 0) {
    return "Fizz";
  }
  if (input % 5 == 0) {
    return "Buzz";
  }
  return `${input}`;
}

// Helper function to create a range of numbers from [1..end]
function createRangeFromOneTo(end: number): number[] {
  if (number < 1) {
    return [];
  }
  return Array.from(Array[number].keys()).map((x) => x + 1);
}

With calculateFizzBuzz defined, we could write unit tests to ensure the correctness of the behavior. We could also create a mapping to double-check that we have a function.

Now, let's revisit our getInput function. We've got some business rules that deal with validation (e.g. the input must be a number and greater than 1). Given that this is a light business rule, we could leave it here; however, testing this becomes harder because we don't have a way to ensure that the validation works as expected.

To solve this problem, we could extract the validation logic to its own pure function and update getInput to use the new function.

function isInputValid(input: string): boolean {
  if (!+input) {
    return false;
  }
  return +input > 1;
}

function getInput(): number {
  // using prompt-sync https://github.com/heapwolf/prompt-sync
  const prompt = require("prompt-sync")({ sigint: true });
  const response = prompt("What number to calculate FizzBuzz to?");
  if (!isInputValid(response)) {
    console.log(
      "Invalid response, please enter a positive number greater than 1"
    );
    return getInput();
  }
  return +response;
}

Nice! With this in place, we can go ahead and implement our last function, the FizzBuzz workflow.

function runFizzBuzzWorkflow(): void {
  // Data coming in
  const maximumNumber = getInput();

  // Calculating results
  const results = createRangeFromOneTo(maximumNumber).map((x) =>
    calculateFizzBuzz(x)
  );

  // Print Results
  results.forEach((x) => printOutput(x));
}

// example invocation
runFizzBuzzWorkflow();

This is a straightforward implementation as we get the maximumNumber to calculate, create an array of numbers from 1 to maximumNumber, map each of those to their FizzBuzz representation, and then print them to the screen.

Let's go one step forward. In our example, we assumed that the input and output was coming from the console, but what if we needed to change to read and write to a file?

We could move the boundary functions to be parameters to runFizzBuzzWorkflow, let's take a look at what that would give us.

function runFizzBuzzWorkflow(
  readInput: () => number,
  writeOutput: (string) => void
) {
  // Data coming in
  const maximumNumber = readInput();

  // Calculating results
  const results = createRangeFromOneTo(maximumNumber).map((x) =>
    calculateFizzBuzz(x)
  );

  // Print Results
  results.forEach((x) => writeOutput(x));
}

// example invocations
runFizzBuzzWorkflow(getInput, printOutput); // using console read/write
runFizzBuzzWorkflow(() => 42, printOutput); // using hardcoded input with console log

With this change, we can now swap out how we can input or output by creating new functions with the right type signatures. This makes testing workflow components easy because you can stub in your own mocks (no need for mocking frameworks).

If you understand the power of switching out your boundaries, then you also understand other modern architectures like Ports and Adapters as they follow a similar principle.

Wrapping Up

In this post, we looked at what a function is, how it relates to types, and how to tell if a mapping is a function. From there, we covered the differences between pure and impure functions and how you need both to build any useful application. Finally, we wrapped up by implementing the FizzBuzz problem using this approach.

Better Domain Modeling with Discriminated Unions

When I think about software, I like designing software so that doing the right things are easy and doing the wrong things are impossible (or at least very hard). This approach is typically called falling into the pit of success.

Having a well-defined domain model can prevent many mistakes from happening just because the code literally won't let it happen (either through a compilation error or other mechanisms).

I'm a proponent of functional programming as it allows us to model software in a better way that can reduce the number of errors we make.

Let's at one of my favorite techniques discriminated unions.

Motivation

In the GitHub API, there's an endpoint that allows you to get the events that have occurred for a pull request.

Let's take a look at the example response in the docs.

[
  {
    "id": 6430295168,
    "url": "https://api.github.com/repos/github/roadmap/issues/events/6430295168",
    "event": "locked",
    "commit_id": null,
    "commit_url": null,
    "created_at": "2022-04-13T20:49:13Z",
    "lock_reason": null
  },
  {
    "id": 6430296748,
    "url": "https://api.github.com/repos/github/roadmap/issues/events/6430296748",
    "event": "labeled",
    "commit_id": null,
    "commit_url": null,
    "created_at": "2022-04-13T20:49:34Z",
    "label": {
      "name": "beta",
      "color": "99dd88"
    }
  },
  {
    "id": 6635165802,
    "url": "https://api.github.com/repos/github/roadmap/issues/events/6635165802",
    "event": "renamed",
    "commit_id": null,
    "commit_url": null,
    "created_at": "2022-05-18T19:29:01Z",
    "rename": {
      "from": "Secret scanning: dry-runs for enterprise-level custom patterns (cloud)",
      "to": "Secret scanning: dry-runs for enterprise-level custom patterns"
    }
  }
]

Based on the name of the docs, it seems like we'd expect to get back an array of events, let's call this TimelineEvent[].

Let's go ahead and define the TimelineEvent type. One approach is to start copying the fields from the events in the array. By doing this, we would get the following.

type TimelineEvent = {
  id: number;
  url: string;
  event: string;
  commit_id?: string;
  commit_url?: string;
  created_at: string;
  lock_reason?: string;
  label?: {
    name: string;
    color: string;
  };
  rename?: {
    from: string;
    to: string;
  };
};

The Problem

This definition will work, as it will cover all the data. However, the problem with this approach is that lock_reason, label, and rename had to be defined as nullable as they can sometimes be specified, but not always (for example, the lock_reason isn't specified for a label event).

Let's say that we wanted to write a function that printed data about TimelineEvent, we would have to write something like the following:

1
2
3
4
5
6
7
8
9
function printData(event: TimelineEvent) {
  if (event.event === "labeled") {
    console.log(event.label!.name); // note the ! here, to tell TypeScript that I know it'll have a value
  } else if (event.event == "locked") {
    console.log(event.lock_reason);
  } else {
    console.log(event.rename!.from); // note the ! here, to tell Typescript that I know it'll have a value
  }
}

The main problem is that the we have to remember that the labeled event has a label property, but not the locked property. It might not be a big deal right now, but given that the GitHub API has over 40 event types, the odds of forgetting which properties belong where can be challenging.

The pattern here is that we have a type TimelineEvent that can have different, separate shapes, and we need a type that can represent all the shapes.

The Solution

One of the cool things about Typescript is that there is a union operator (|), that allows you to define a type as one of the other types.

Let's refactor our TimelineEvent model to use the union operator.

First, we need to define the different events as their own types

type LockedEvent = {
  id: number;
  url: string;
  event: "locked"; // note the hardcoded value for event
  commit_id?: string;
  commit_url?: string;
  created_at: string;
  lock_reason?: string;
};

type LabeledEvent = {
  id: number;
  url: string;
  event: "labeled"; // note the hardcoded value for event
  commit_id?: string;
  commit_url: string;
  created_at: string;
  label: {
    name: string;
    color: string;
  };
};

type RenamedEvent = {
  id: number;
  url: string;
  event: "renamed"; // note the hardcoded value for event
  commit_id?: string;
  commit_url?: string;
  created_at: string;
  rename: {
    from: string;
    to: string;
  };
};

At this point, we have three types, one for each specific event. A LockedEvent has no knowledge of a label property and a RenamedEvent has no knowledge of a lock_reason property.

Next, we can update our definition of TimelineEvent to use the union operator as so.

type TimelineEvent = LockedEvent | LabeledEvent | RenamedEvent;

This would be read as A TimelineEvent can either be a LockedEvent or a LabeledEvent or a RenamedEvent.

With this new definition, let's rewrite the printData function.

1
2
3
4
5
6
7
8
9
function printData(event: TimelineEvent) {
  if (event.event == "labeled") {
    console.log(event.label.name); // note that we no longer need !
  } else if (event.event == "locked") {
    console.log(event.lock_reason);
  } else {
    console.log(event.rename.to); // note that we no longer need !
  }
}

Not only do we not have to use the ! operator to ignore type safety, but we also have better autocomplete (note that locked_reason and rename don't appear when working with a labeled event). Better autocomplete

Deeper Dive

At a general level, what we've modeled is a sum type and it's great for when you have a type that can take on a finite number of differing shapes.

Sum types are implemented as either tagged unions or untagged unions. Typescript has untagged unions, however, other languages like Haskell and F#, use tagged unions. Let's see what the same implementation in F# would have looked like.

// specific type definitions omitted since they're
// similar to typescript definition
// ....
type TimelineEvent = Locked of LockedEvent | Labeled of LabeledEvent | Renamed of RenamedEvent

let printData e =
    match e with
    | Locked l -> printf "%s" l.lock_reason
    | Labeled l -> printf "%s" l.label.name
    | Renamed r -> printf "%s" r.rename.``to`` // the `` is needed here as to is a reserved word in F#

A tagged union is when each shape has a specific constructor. So in the F# version, the Locked is the tag for the LockedEvent, Labeled is the tag for the LabeledEvent, so on and so forth. In the Typescript example, we worked around it because the event property is on every TimelineEvent and is a different value.

If that wasn't true, then we would had to have added a field to TimelineEvent (typically called kind or tag) that would help us differentiate between the various shapes.

Wrapping Up

When defining domain models where the model can have different shapes, you can use a sum type to define the model.

Using F# To Solve a Constraints Problem

In this post, I’m going to solve a logic puzzle using C# and F#. First, I’ll define the problem being solved and what our restrictions are. Next, I’ll show how I’d break down the problem and write an easy-to-read, extendable solution using idiomatic C#. Afterwards, I’ll solve the same problem and write an easy-to-read, extendable solution writing in idiomatic F#. Finally, we’ll compare the two solutions and see why the F# solution is the better solution.

The Problem

For this problem, I’m going to write a constraint solver (thanks to Geoff Mazeroff for the inspiration).

If you’re not familiar with the concept, a constraint is simply some rule that must be followed (such as all numbers must start with a 4). So a constraint solver is something that takes all the constraints and a source of inputs and returns all values that fit all the constraints.

With that being said, our source will be a list of positive integers and our constraints are the following:

  • 4 digits long (so 1000 – 9999)
  • Must be even (so 1000, 1002, 1004, etc…)
  • The first digit must match the last digit (2002, 2012, 2022, etc…)

To further restrict solutions, all code will be production ready. This includes handling error conditions (like input being null), being maintainable (easily adding more constraints) and easy to read.

To quickly summarize, we need to find a robust, maintainable, and readable solution to help us find all 4 digit number that are even and that the first and last digit are equal.

Implementing a Solution in C

For the C# solution, I’m going to need a class for every constraint, a class to execute all constraints against a source (positive integers) and a runner that ties all the pieces together.

Starting with the smaller blocks and building up, I’m going to start with the constraint classes. Each constraint is going to take a single number and will return true if the number follows the constraint, false otherwise.

With that being said, I’d first implement the constraint that all numbers must be 4 digits long

1
2
3
4
5
6
7
class MustBeFourDigitsLongConstraint
{
    public bool FollowsConstraint(int value)
    {
        return value.ToString().Length == 4;
    }
}

Second, I’d write the constraint that all numbers must be even

1
2
3
4
5
6
7
class MustBeEvenConstraint
{
    public bool FollowsConstraint(int value)
    {
        return value % 2 == 0;
    }
}

Third, I’d implement the constraint that all numbers must have the same first digit and the last digit

1
2
3
4
5
6
7
8
class FirstDigitMustEqualLastDigitConstraint
{
    public bool FollowsConstraint(int value)
    {
        var valueString = value.ToString();
        return valueString[0] == valueString[valueString.Length-1];
    }
}

At this point, I have the constraints written, but I need them to follow a general contract so that the Constraint Solver (about to be defined) can take a list of these constraints. I’ll introduce an interface, IConstraint and update my classes to implement that interface.

1
2
3
4
5
6
7
8
9
public interface IConstraint
{
    bool FollowsConstraint(int value);
}
class MustBeFourDigitsLongConstraint : IConstraint {/* Implementation Details Omitted */}

class MustBeEvenConstraint : IConstraint {/* Implementation Details Omitted */}

class FirstDigitMustEqualLastDigitConstraint : IConstraint {/* Implementation Details Omitted */}

So now I have the constraints defined and they’re now implementing a uniform interface, I can now create the constraint solver. This class is responsible for taking the list of numbers and the list of constraints and then returning a list of numbers that follow all constraints.

class ConstraintSolver
{
    public List FindValues(List constraints, List values)
    {
        if (constraints == null) throw new ArgumentNullException("constraints");
        if (values == null) throw new ArgumentNullException("values");

        var result = values;
        foreach (var constraint in constraints)
        {
            result = result.Where(x => constraint.FollowsConstraint(x)).ToList();
        }
        return result;
    }
}

Finally, I can put all the pieces together using LINQPad (Full C# solution can be found here).

void Main()
{
    var numbers = Enumerable.Range(0, 10000).ToList();
    var constraints = new List<IConstraint>{new MustBeFourDigitsLongConstraint(), new MustBeEvenConstraint(), 
             new FirstDigitMustEqualLastDigitConstraint()};

    var constraintSolver = new ConstraintSolver();
    var result = constraintSolver.FindValues(constraints, numbers.ToList());

    result.Dump();
}

This solution is easily extendable because if we need to add another constraint, we just add another class that implements the IConstraint interface and change the Main method to add an instance of the new constraint to the list of constraints.

Implementing a Solution in F

Now that we have the C# solution, let’s take a look at how I would solve the problem using F#.

Similar to the C# solution, I’m going to create a function for every constraint, a function to execute all constraints against a source (positive integers) and a runner that ties all the pieces together.

Also similar to the C# solution, I’m going to start with creating the constraints and then work on the constraint solver function.

First, I’d implement that the number must be four digits long constraint.

let mustBeFourDigit number = 
    number.ToString().Length = 4

Next, the number must be even constraint.

let mustBeEven number =
    number % 2 = 0

Lastly, the first digit is the same as the last digit constraint.

1
2
3
4
5
let firstDigitMustBeEqualLast number =
    let numberString = number.ToString().ToCharArray()
    let firstDigit = numberString.GetValue(0)
    let lastDigit = numberString.GetValue(numberString.Length-1)
    firstDigit = lastDigit

At this stage in the C# solution, I had to create an interface, IConstraint, so that the constraint solver could take a list of constraints. What’s cool with F# is that I don’t have to define the interface. The F# type inference is saying that each of these functions are taking the same input (some generic `a) and returning a bool, so I can add all of them to the list. This is pretty convenient since I don’t have to worry about this piece of plumbing.

Now that the different constraints are defined, I’d go ahead and write the last function that takes a list of constraints and a list of numbers and returns the numbers that the constraints fit. (Confused by this function? Take a look at Implementing your own version of # List.Filter)

1
2
3
4
5
6
let rec findValidNumbers numbers constraints = 
    match constraints with
    | [] -> numbers
    | firstConstraint::remainingConstraints ->
        let validNumbers = numbers |> List.filter firstConstraint
        findValidNumbers validNumbers remainingConstraints

Finally, all the pieces are in place, so I can now put all the pieces together using LINQPad.

1
2
3
4
5
6
let numbers = [1000 .. 9999]
let constraints = [mustBeFourDigits; mustBeEven; firstDigitMustEqualLast;]

let result = findValidNumbers numbers constraints

printf "%A" result

Comparing Both Solutions

Now that we have both solutions written up, let’s compare and see which solution is better.

First, the same design was used for both solutions. I decided to use this design for both because it’s flexible enough that we could add new constraints if needed (such as, the 2nd digit must be odd). As an example, for the C# solution, I’d create a new class that implemented IConstraint and then update the runner to use the new class. For the F# solution, I’d create a new function and update the runner. So, I’d think it’s safe to say that both solutions score about the same from a maintainability and extendability point of view.

From an implementation perspective, both solutions are production ready since the code handles possible error conditions (C# with null checks in the ConstraintSolver class, F# with none because it doesn’t support null). In addition to being robust, both solutions are readable by using ample whitespace and having all variables, classes, and interfaces clearly described.

With that being said, this is where the similarities end. When we look at how much code was written to solve the problem, we have a stark difference. For the C# solution, I ended up with 48 lines of code (omitting blank lines), however, for the F# solution, it only took 19. Now you could argue that I could have written the C# solution in fewer lines of code by removing curly braces around one line statements or ignoring null inputs. However, this would lead the code to be less robust.

Another difference between the F# solution and C# is that I was able to focus on the solution without having to wire up an interface. You’ll often hear developers talk about the how little plumbing you need for F# to “just work” and this small example demonstrates that point.

Another difference (albeit subtle) is that the F# solution is that I can use the findValidNumbers function with any generic list of values and any generic list of functions that take the generic type and return true/false.

In other words, if I had another constraint problem using strings, I’d still implement the different constraints, but I could use the same findValidNumbers function. At that point, however, I’d probably rename it to findValidValues to signify the generic solution.

What’s awesome about this is that I didn’t have to do any more work to have a generic solution, F# did that for me. To be fair, the C# solution can easily be made generic, but that would have to be a conscious design decision and I think that’s a downside.

TL;DR

In this post, we took a look at solving a number constraint problem by using idiomatic C# and F#. Even though both solutions are easy to read and easy to extend, the F# version was less than 1/2 the size of the C# solution. In addition, I didn’t have to do any plumbing for the F# version, but had to do some for the C# solution, and to top it off, the F# solution is generically solved, whereas the C# solution is not.

Implementing Your Own Version of F#’s List.Filter

As I’ve been thinking more about F#, I began to wonder how certain methods in the F# stack work, so I decided to implement F#’s List.filter method.

For those of you who aren’t familiar, List.Filter takes a function that returns true or false and a list of values. The result of the call is all values that fulfill the fuction.

For example, if we wanted to keep just the even numbers in our list, then the following would accomplish that goal.

1
2
3
4
5
6
let values = [1;2;3;4]
let isItEven x = x % 2 = 0


let evenValues = List.filter isItEven values
// val it : int list = [2; 4]

Now that we know the problem, how would we begin to implement? First, we need to define a function called filter:

let filter () =

However, to match the signature for List.filter, it needs to take a function that maps integers to bools and the list of values to work on

let filter (func:int->bool) (values:int List) =

Now that we have the signature, let’s add some logic to match on the list of values. When working with lists, there are two possibilities, an empty list and a non-empty list. Let’s first explore the empty list option.

In the case of an empty list of values, then it doesn’t matter what the func parameter does, there are no possible results, so we should return an empty list for the result.

1
2
3
let filter (func:int->bool) (values:int List) =
    match values with
    | [] -> []

Now that we’ve handled the empty list, let’s explore the non-empty list scenario. In this branch, the list must have a head and a tail, so we can deconstruct the list to follow that pattern.

1
2
3
4
let filter (func:int->bool) (values:int List) =
    match values with
    | [] -> []
    | head::tail -> // what goes here?

Now that we’ve deconstructed the list, we can now use the func parameter with the head element. If the value satisfies the func parameter, then we want to add the head element to the list of results and continue processing the rest of the list. To do that, we can use recursion to call back into filter with the same func parameter and the rest of the list:

1
2
3
4
5
let rec filter (func:int->bool) (values:int List) =
    match values with
    | [] -> []
    | head::tail -> 
         if func head then head :: filter func tail

At this point, we need to handle the case where the head element does not satisfy the func parameter. In this case, we should not add the element to the list of results and we should let filter continue the work

1
2
3
4
5
6
let rec filter (func:int->bool) (values:int List) =
    match values with
    | [] -> []
    | head::tail -> 
         if func head then head :: filter func tail
         else filter func tail

By handling the base case first (an empty list), filter can focus on the current element in the list (head) and then recurse to process the rest of the list. This solution works, but we can make this better by removing the type annotations. Interestingly enough, we don’t care if we’re working with integers, strings, or whatever. Just as long as the function takes some type and returns bool and the list of values matches the same type as the func parameter, it works. So then we end up with the following:

1
2
3
4
let rec filter func values =
    match values with
    | [] -> []
    | head::tail -> if func head then head :: filter func tail else filter func tail

In general, when working with lists, I tend to start by matching the list with either an empty list or non-empty. From there, I’ve got my base case, so I can focus on the implementation for the first element. After performing the work for the first element, I can then recurse to the next element.