Skip to content

2014

Today I Learned – How to Break Down A Massive Method

During this past week, I was working with our intern and showing him some cool debugging tricks when I came across a massive method. I gave him 30 seconds to look at it and tell me what he thought the method was doing. After a while, he was able to figure it out, but the fact that it wasn’t easy discernible was enough to give pause.

The lesson here is that if you can’t determine what the method is doing easily, then it’s probably doing way too much (violating the Single Responsibility Principle) and needs to be broken into more easily readable pieces.

To demonstrate what I mean, I wrote a program that inserts Messages into a database. A Message contains a description, the number (for identification when troubleshooting) and the module. We would have issues where different messages would have the same number which would cause confusion when troubleshooting errors.

In the program I wrote, the user provides the message and what module the message belongs to and the program automatically generates the message number and inserts the message into the database.

For brevity’s sake, shown below is the logic for determining what the next message number should be.

public int GetNextAlertAndErrorModuleNumber(string module)
{
  if (String.IsNullOrEmpty(module))
    throw new ArgumentException("module cannot be null or empty");
  if (_connection == null)
    _connection = CreateConnection();

  var results = new List<int>();

  _connection.Open();
  var cmd = new SqlCommand("dbo.GetAlertData", _connection);
  cmd.CommandType = CommandType.StoredProcedure;

  var reader = cmd.ExecuteReader();
  while (reader.Read())
  {
    if (!reader["ALERT_ID_NUMBER"].ToString().Contains(module))
      continue;

    var pieces = reader["ALERT_ID_NUMBER"].ToString().Split( );

    results.Add(Int32.Parse(pieces[1]));
  }
  if (reader != null)
    reader.Close();

  cmd = new SqlCommand("dbo.GetErrorData";, _connection);
  cmd.CommandType = CommandType.StoredProcedure;

  reader = cmd.ExecuteReader();
  while (reader.Read())
  {
    if (!reader["ERROR_ID_NUMBER"].ToString().Contains(module))
      continue;

    var pieces = reader["ERROR_ID_NUMBER"].ToString().Split( );

    results.Add(Int32.Parse(pieces[1]));
  }
  if (reader != null)
    reader.Close();

  if (_connection != null)
    _connection.Close();

  return results.Max() + 1;
}

The method itself isn’t complex, just calling some stored procedures, parsing the output and adding the number to a list. However, it’s not abundantly clear what the purpose of the calling the stored procedures.

First, it looks like we’re reading the alerts error numbers from a stored procedure call, why don’t we extract that logic out to a helper method and have the public method call the helper?

public int GetNextAlertAndErrorModuleNumber(string module)
{
  if (String.IsNullOrEmpty(module))
    throw new ArgumentException(&amp;amp;quot;module cannot be null or empty&amp;amp;quot;);

  if (_connection == null)
    _connection = CreateConnection();

  var results = new List<int>();

  _connection.Open();
  results.AddRange(ReadAlerts(module.ToUpper()));

  var cmd = new SqlCommand("dbo.GetErrorData", _connection);
  cmd.CommandType = CommandType.StoredProcedure;

  var reader = cmd.ExecuteReader();
  while (reader.Read())
  {
    if (!reader["ERROR_ID_NUMBER"].ToString().Contains(module))
      continue;

    var pieces = reader["ERROR_ID_NUMBER&"].ToString().Split( );

    results.Add(Int32.Parse(pieces[1]));
  }
  if (reader != null)
    reader.Close();

  if (_connection != null)
    _connection.Close();

  return results.Max() + 1;
  }

  private List<int> ReadAlerts(string module)
  {
    var results = new List<int>();
    var cmd = new SqlCommand("dbo.GetAlertData", _connection);
    cmd.CommandType = CommandType.StoredProcedure;

    var reader = cmd.ExecuteReader();
    while (reader.Read())
    {
      if (!reader["ALERT_ID_NUMBER"].ToString().Contains(module))
        continue;

      var pieces = reader["ALERT_ID_NUMBER"].ToString().Split( );
      results.Add(Int32.Parse(pieces[1]));
    }
    if (reader != null)
    reader.Close();

    return results;
}

By doing this, we fix two issues at once. First, we’ve given a name to the process of reading the alerts which in turns allows us to quickly understand what the public method should be doing (i.e. improved readability).

Second, it allows us for easier debugging because we now have smaller components. For example, let’s say that we were getting the wrong value. In the first implementation, we would have to put breakpoints in different areas trying to determine which part was broken. However, in the new form, we can check to see if ReadAlerts is behaving correctly. If it isn’t, we now know the bug has to be in that method, otherwise, it’s in the rest.

For the next step, you may have noticed that we can repeat the same refactoring trick again, except this time, we can extract the logic for reading the errors into a helper method.

public int GetNextAlertAndErrorModuleNumber(string module)
{
  if (String.IsNullOrEmpty(module))
    throw new ArgumentException("module cannot be null or empty");
  if (_connection == null)
    _connection = CreateConnection();

  _connection.Open();

  var results = new List<int>();
  results.AddRange(ReadAlerts(module.ToUpper()));
  results.AddRange(ReadErrors(module.ToUpper()));

  if (_connection != null)
    _connection.Close();

  return results.Max() + 1;
}

private List<int> ReadAlerts(string module)
{
  var results = new List<int>();
  var cmd = new SqlCommand("dbo.GetAlertData", _connection);
  cmd.CommandType = CommandType.StoredProcedure;

  var reader = cmd.ExecuteReader();
  while (reader.Read())
  {
    if (!reader["ALERT_ID_NUMBER"].ToString().Contains(module))
      continue;

    var pieces = reader["ALERT_ID_NUMBER"].ToString().Split( );
    results.Add(Int32.Parse(pieces[1]));
  }
  if (reader != null)
    reader.Close();

  return results;
}

private List<int> ReadErrors(string module)
{
  var results = new List<int>();
  var cmd = new SqlCommand("dbo.GetErrorData", _connection);
  cmd.CommandType = CommandType.StoredProcedure;

  var reader = cmd.ExecuteReader();
  while (reader.Read())
  {
    if (!reader["ERROR_ID_NUMBER"].ToString().Contains(module))
      continue;

    var pieces = reader["ERROR_ID_NUMBER"].ToString().Split( );
    results.Add(Int32.Parse(pieces[1]));
  }
  if (reader != null)
    reader.Close();

  return results;
}

After the changes, anyone who looks at the public API can easily see that it’s reading from both Alerts and Errors. This is really powerful because now you can communicate with non-technical people about requirements and what the code is doing.

Let’s say that in the future, the requirements change and this conversation plays out:

Alice (QA, finding an error) – Hey Bob, I was running one of our test plans and it looks like that we’re getting the wrong message number if we’re trying to add a new message and there are warning messages in the database. We are including the warning table when figuring that out, right?

Bob (Engineer, finding the root cause) – Hey you’re right, it looks like we’re only using the alerts and error tables when calculating the next number. Why don’t we write up a ticket for that and get a fix in?

The key point is that no matter how large a method is, there always have to be steps being performed in some order (by definition of an algorithm) and this is the key to refactoring larger methods into more maintainable pieces of code. The trick is determining what those steps are and making decisions on whether to make helper methods or helper classes.

If those steps become complicated, then they should be broken out into helper methods. As time progresses and those helper methods start to become more complicated, then those helper methods should in turn become classes of their own.

Today I Learned: The Law of Demeter

Don’t pass in more information that you need. It sounds simple, but when working with messy legacy code, it’s easy to forget.

The impetus for this post came from a peer code review. During the review, I found this method:

1
2
3
4
5
6
7
8
9
public IStrategy GetStrategy(Project project, bool isAffected)
{
  var type = project.Type;
  if (type == ProjectType.A && isAffected)
    return new ProjectAIsAffectedStrategy();
  if (type == ProjectType.B)
    return new ProjectBStrategy();
  // Similar if statements
}

At first glance, it looks pretty good. Logic was sound and it seemed to be returning a class implementing an interface similar to what we would expect for the Factory pattern. However, there’s a slight problem, can you spot it?

The issue is with the first parameter, Project. The method takes a Project, however, we’re only really depending on the Project’s Type property.

1
2
3
4
5
6
7
8
9
public IStrategy GetStrategy(Project project, bool isAffected)
{
  var type = project.Type;
  if (type == ProjectType.A && isAffected)
    return new ProjectAIsAffectedStrategy();
  if (type == ProjectType.B)
    return new ProjectBStrategy();
  // Similar if statements
}

So why don’t we get rid of the dependency on the Project and instead replace it with the dependency on the ProjectType instead?

1
2
3
4
5
6
7
8
public IStrategy GetStrategy(ProjectType type, bool isAffected)
{
  if (type == ProjectType.A &amp;&amp; isAffected)
    return new ProjectAIsAffectedStrategy();
  if (type == ProjectType.B)
    return new ProjectBStrategy();
  // Similar if statements
}

Instinctual, I knew this was the right call, but I couldn’t remember why I knew it was a good choice. After some digging, I remembered that this is a Law of Demeter violation, or better known as the Principle of Least Knowledge violation.

In general, this principle states that a method should have the least amount of information it needs to do it’s job. Other classic violations of this principle is when you use a class’s internals internals. For example,

SomeClassA.SomePropertyB.WithSomeMethodC()

One of the reasons that I really like the Law of Demeter is that if you follow it, you create easier to test methods. Don’t believe me? Which is easier to create, the Project class (which may have many dependencies that would need to be stubbed) or the ProjectType enum (which by definition has zero dependencies)?

Another reason that following the Law of Demeter is good practice is that it forces your code to be explicit about what dependencies are required. For example, in the first implementation, the caller knew that the method needed a Project, but had no clue on how much of the Project it needed (does it need all of the properties set? Does it need further setup besides basic instantiation?). However, with the refactored version, now it’s much clearer that the method has a looser dependency on not Project, but just the ProjectType.

Beginner Basics: Establishing a SOLID Foundation – The Dependency Inversion Principle

Welcome to the final installment of Establishing a SOLID Foundation series. In this post, we’ll be exploring the fifth part of SOLID, the Dependency Inversion Principle.

What is the Dependency Inversion Principle?

When working with object-oriented languages, we take large problems and break them down into smaller pieces. These smaller pieces in turn are broken down into even smaller, more manageable pieces to work on. As part of the breaking down process, we inherently have to introduce dependencies between the larger pieces and the smaller pieces.

How we weave these dependencies together is the difference between easily changing behavior and spending the next week pulling your hair out.

When working with classes, dependencies are usually introduced by constructing them in the class that they’re used in. For example, let’s say that we’ve been asked to write an utility that emulates an calculator but it also keeps a transaction log for record keeping purposes.

class Logger
  def log (content)
    File.open("C:\\temp\\results.txt", 'a') {|f| f.write(content)}
  end
end

class Calculator
  def initialize
      @logger = Logger.new()
  end
  def add (a, b)
      log(a, b, "+")
      return a + b
  end
  def sub (a, b)
      log(a, b, "-")
      return a - b
  end
  def mult (a, b)
      log(a,b,"*")
      return a * b
  end
  def div (a, b)
      log(a,b,"/")
      return a.to_f / b
  end

  def log(a, b, sym)
      text = a.to_s + " " + sym + " " + b.to_s + " = "
      if sym == "+"
            text += (a + b).to_s
      elsif sym == "-"
            text += (a-b).to_s
      elsif sym == "*"
            text += (a*b).to_s
      else
            text += (a.to_f/b).to_s
      end
      text += "\n"
      @logger.log(text)
  end
end

# Usage
calc = Calculator.new()
puts calc.add(4,3)
puts calc.sub(2,1)
puts calc.mult(100,2)
puts calc.div(5,2)

So far so good, we have two classes (Logger and Calculator) that is responsible for logging and the calculations. Even though this is a small amount of code (~50 lines for both classes), there are three dependencies in the code. The first dependency that I notice is that Calculator depends on Logger. We can see this by looking at the initialize method for Calculator (as hinted above):

1
2
3
4
5
class Calculator
  def initialize
    @logger = Logger.new("C:\\temp\\results.txt")
  end
end

The second dependency is a bit trickier to find, but in the Logger class’ log method, we use a hard coded file path.

1
2
3
4
5
class Logger
  def log (content)
      File.open("C:\\temp\\results.txt", 'a') {|f| f.write(content)}
  end
end

The third dependency is probably the hardest to find, but in the the Logger class’ log method, we are also depending on the file system for the machine by using Ruby’s File class. But wait a minute, I hear you say, why is the File class considered a dependency, that’s a part of the Ruby language? I agree with you, but it’s still a dependency in the code and something that we should keep in mind.

From these three dependencies, we’re going to focus on resolving the first two. We could make resolve the dependency on the file system, but it would take so much effort for so little gain.

Why don’t we resolve the file dependency issue?

One thing that I keep in mind when identifying which dependencies to invert is to focus on inverting dependencies outside of the framework. In this case, we’re going to ignore the file system dependency because I, the programmer, depend on Ruby to behave correctly. If Ruby stops working, then a broken file system is the least of my worries. Therefore, it’s not worth the effort to resolve.

Making the Calculator and Logger more flexible

In order to resolve these DIP violations, we need to expose ways to drop in these dependencies. There are two ways of doing this. We can either:

  • Expose the dependency via the constructor
  • Expose the dependency via a public property

Using the Calculator example, we can expose the logger dependency via the constructor and we can expose the filePath dependency by exposing it as a property.

For the Calculator class, I’m going to first change the initialize method so that it takes a logger instead of constructing it’s own.

1
2
3
4
5
class Calculator
  def initialize(logger)
    @logger = logger
  end
end

Next, I will construct the logger in the usage and pass the logger as part of the constructor.

1
2
3
# Usage
logger = Logger.new()
calc = Calculator.new(logger)

A quick run of our program tells us that everything is still working correctly.

Now that we’ve finished up exposing the logger dependency, it’s time to expose the file path. First, I’m going to add a public property on the Logger class call filePath and use that property in the log method

1
2
3
4
5
6
class Logger
  def log (content)
      File.open(filePath, 'a') {|f| f.write(content)}
  end
  attr_accessor :filePath
end

Now that we’ve introduced a seam for the file path dependency, we use that seam in the program and assign the property```ruby

1
2
3
4
# Usage
logger = Logger.new()
logger.filePath = "C:\\temp\\results.txt"
calc = Calculator.new(logger)

Multiple ways of solving the issue, which one is best?

When using the constructor method, it’s very clear to see what dependencies the class has, just look at the constructor. However, adding a new dependency to the constructor may cause other code to fail because the signature of the constructor has changed. This in turn can lead to cascading changes where multiple places of code need to be updated to pass in the dependency.

On the other hand, using the property method, the change is less invasive because the property can be set independently of the when the object was constructed. However, it’s harder to see the dependencies for the class because now the properties are containing the dependencies. Also, it’s very easy to forget to set a property before using the object.

Both of these methods are valid, but when I’m working with DIP, I prefer to expose the dependencies via the constructor because if my class starts to gain more and more dependencies, then it’s a sign that my class is doing too much and violating the Single Responsibility Principle (SRP). You can say that DIP is the canary in the coal mine for SRP.

TL;DR

In summary, the Dependency Inversion Principle (DIP) tells us that we should we should have the outside world pass in our dependencies. If we don’t follow this rule, then we will not. In order to resolve violations, we need to determine what dependencies we have and modify our constructor to accept those dependencies. If our constructor becomes too large, then our class might be violating the Single Responsibility Principle. By following the DIP, we expose the dependencies our classes require and allows for greater decoupling. As always, don’t forget to refactor as you go along.

Establishing a SOLID Foundation Series