Clean Code

Thinking With Properties: Examining Where

(Note: This post is for C# Advent Calendar 2020 organized by Matthew Groves. Check out some of the other posts that are happening over the month of December!)

What Do We Mean By Properties?

When I think about software, I will generally think about the properties that the solution has to have where properties are the characteristics that the code has. Sometimes, the property is obvious (for example, if you square a number, the result should always be positive). In this post, we’re going to look at LINQ’s Where method, examine some of the properties, and come up with a more efficient way of creating filters.

Examining LINQ’s Where Method

For those not familiar with Where, it’s a method that promises that it will return a subset of a list where each item fulfills some criteria (referred to as a predicate). The type signature for Where is

IEnumerable<T> => Func<T, bool> => IEnumerable<T>

At face value, this sounds pretty straightforward, but there are a few more properties that Where provides that aren’t obvious at first, but are beneficial

  • The results can’t be null (worse case, it’ll be an empty list because no item fulfilled the criteria)
  • The results can’t be larger than the original list
  • The results are in the same order that the original list was in (i.e. if we’re trying to find the even numbers in a list that is 1..10, then you’re guaranteed to get 2, 4, 6, 8, and 10. Not, 8, 2, 6, 10, 4)
  • The results will only contain elements that were in the original list (i.e. it can’t create elements out of thin air and it can’t copy elements in the original list)

Common LINQ Mistake With Where

That’s a ton of guarantees that you get from leveraging Where instead of doing your own filtering inside loops. With these properties in mind, let’s take a look at a common mistake that developers make when working with Where

// Generate numbers from -10 .. 100
var numbers = Enumerable.Range(-10, 111);
// Determines all positive numbers that are divisible by 6
var positiveDivisbleBySix = numbers
.Where(x=>x > 0) // iterates through the whole list (111 comparisons, returning 100 results)
.Where(x=>x % 2 == 0) // iterates through the new list (100 comparisons, returning 50 results)
.Where(x=>x % 3 == 0); // iterates through the new list (50 comparisons, returning 16 results)
// Overall metrics: 261 comparisons over 111 total elements)

By leveraging multiple Where statements, the list will be iterated once per statement. which may not be a big deal for small lists, but for larger lists, this will become a performance hit. In order to help cut down on the iterations, it’d be ideal to combine the multiple Where statements into a single one like so

// Generate numbers from -10 .. 100
var numbers = Enumerable.Range(-10, 111);
var positiveDivisibleBySix = numbers.Where(x => x > 0 && x % 2 == 0 && x % 3 == 0);

By combining the criteria in a single Where statement, we eliminate the multiple iteration problem, however, we introduce code that’s a bit harder to read and if we want to combine a non-fixed number of predicates, then this approach won’t work.

Since the goal is to take multiple predicates and combine them to a single predicate, my intuition is to leverage LINQ’s Aggregate method where we can take a List of items and reduce down to a single item.

Refactoring Multiple Where With Aggregate

In order to leverage Aggregate, we’ll first need to have a list of item to reduce down. Since all of the predicates are Func<int, bool>, we can easily create a List like so

Func<int, bool> isPositive = x => x > 0;
Func<int, bool> isEven = x => x % 2 == 0;
Func<int, bool> isDivisibleByThree = x => x % 3 == 0;
var predicates = new List<Func<int, bool>> {isPositive, isEven, isDivisibleByThree};

Now that we have a list of predicates, we can go ahead and start stubbing out the Aggregate call.

var combinedPredicate = predicates.Aggregate(...., ....);

In order to use Aggregate, we need to determine two pieces of information. First, what should the predicate be if there are no predicates to combine? Second, how do we we combine two predicates into a single predicate?

Defining The Base Case

When using Aggregate, the first thing that we need to think about is the base case, or in other words, what should the default value be if there are no elements to reduce down?

Given that the result needs to be a predicate, we know that the type should be Func<int, bool>, but how do we implement that? We’ve got one of two choices for the base case, we can either filter every item out (i.e. if no predicates are specified, then no items are kept) or we keep every item.

For our use case, we want to keep every item if there are no predicates, so our base case looks like the following

Func<int, bool> andIdentity = _ => true;

Defining How To Combine Predicates

Since we’re combining predicates, our combine function will need to have the following type

Func<int, bool> => Func<int, bool> => Func<int, bool>

Func<int, bool> combinedPredicateWithAnd(Func<int, bool> a, Func<int, bool> b)
{
return x => ...;
}

With this in mind, we know that for an item to be valid, it has to match every predicate in the list which implies that we’ll be leveraging the && operator

Func<int, bool> combinedPredicateWithAnd(Func<int, bool> a, Func<int, bool> b)
{
return x => ... && ...;
}

Now that we know to use &&, we can then use a and b to determine if the item is valid

Func<int, bool> combinedPredicateWithAnd(Func<int, bool> a, Func<int, bool> b)
{
return x => a(x) && b(x);
}

Bringing It All Together

With the base case established and a way to combine predicates, here’s how we can solve the original problem.

// Define the predicates
Func<int, bool> isPositive = x => x > 0;
Func<int, bool> isEven = x => x % 2 == 0;
Func<int, bool> isDivisibleByThree = x => x % 3 == 0;
var predicates = new List<Func<int, bool>> {isPositive, isEven, isDivisibleByThree};
// Defining the Aggregate functions
Func<int, bool> andIdentity = _ => true;
Func<int, bool> combinedPredicateWithAnd(Func<int, bool> a, Func<int, bool> b)
{
return x => a(x) && b(x);
}
// Combining the predicates
Func<int, bool> combinedAndPredicate = predicates.Aggregate(andIdentity, combinedPredicateWithAnd);
// The new solution
// Generate numbers from -10 .. 100
var numbers = Enumerable.Range(-10, 111);
var positiveDivisbleBySix = numbers.Where(combinedAndPredicate);

Going forward, if we need to add more predicates, all we need to do is to add it to the List and the rest of the application will work as expected

Wrapping Up

In this post, we explored LINQ’s Where method by examining its various properties. From there, we took a look at a common mistake developers make with Where and then showed how to resolve that issue by using Aggregate.

Shout out to Matthew Groves for letting me participate in C# Christmas (csadvent.christmas)

4 comments

  1. djcarter85

    You say that using .Where() multiple times results in the list being iterated multiple times, and that flattening it into a single .Where() call removes this problem.

    I don’t think this is true. LINQ uses deferred evaluation, meaning that the initial enumerable is only iterated when the result is iterated. Chaining the .Where() calls together therefore doesn’t make any difference to the number of times the initial list is iterated.

    I would say that the chained .Where() calls actually end up easier to read than using Aggregate! That said, it’s interesting to see how to combine predicates in this way – I think this means that Func is a monoid using the combinedPredicateWithAnd operation.

    1. Cameron Presley

      You’re totally correct that LINQ uses deferred execution and that it won’t process until required (by using .ToList() or First as examples) and I should have made that much clearer in the above!

      That being said, once we do call .ToList(), it will then iterate the list, first keeping only positives (hence the 111 comparisons, 100 results on the beginning), then keeping only evens, finally by the evenly divisible by three.

      And nice observation that there’s a monoid running around here! It’s definitely safe to say that the type Func is a monoid over the && operator given the seed that we use above and the combining function!

  2. John

    Your affirmation regarding multiple Where’s is incorrect. Sure there is a performance penalty, but not the one you think, and it isn’t as bad as you think.

    If you have several where’s the list is not iterated several times because Where returns an IEnumerable which isn’t an actual list, it is a thing that can enumerate the elements later on. This measn that if you chain a second Where after another then nothing happens yet. The filtering only occurs when you try to read the resulting IEnumerable.

    To recap
    var positiveDivisbleBySix = numbers
    .Where(x=>x > 0) // iterates through the whole list (111 comparisons, returning 100 results)
    .Where(x=>x % 2 == 0) // iterates through the new list (100 comparisons, returning 50 results)
    .Where(x=>x % 3 == 0); // iterates through the new list (50 comparisons, returning 16 results)
    does not read the list of numbers, it just sets up the filtering operations.

    To convert the result to a list, you can call .ToList(), which will only iterate over the list once, passing each element through the pipeline of conditions.

    var list = positiveDivisibleBySix.ToList();

    So if you do
    var first = positiveDivisbleBySix.First();
    that will just read the original list until it finds an element that matches the three conditions, then it doesn’t read any further

    If you then did
    var firstTwo = positiveDivisibleBySix.Take(2).ToList()
    then it will start reading from the beginning again (a trap to watch out for), but it will only read enough to find the first two results.

    1. Cameron Presley

      Yep, you’re right that by just setting up the Where chain isn’t bad because of the deferred execution, (I should have made that much clearer in the above example, thank you for calling that out!).

      I may not have made it clear, but my point was that using ToList when leveraging multiple Where’s can cause some performance problems. You’re completely correct that using Take, Skip, First, Single, and others of its kind can side-step a lot of the performance concerns.

Leave a Reply

%d bloggers like this: