Today I Learned

During this past week, I was working with our intern, showing him some cool debugging tricks, when I came across a massive method. I gave him 30 seconds to look at it and tell me what he thought the method was doing. After a while, he was able to figure it out, but the fact that it wasn't easily discernible was enough to give me pause.

The lesson here is that if you can’t determine what the method is doing easily, then it’s probably doing way too much (violating the Single Responsibility Principle) and needs to be broken into more easily readable pieces.

To demonstrate what I mean, I wrote a program that inserts Messages into a database. A Message contains a description, a number (used for identification when troubleshooting), and the module it belongs to. We had issues where different messages would end up with the same number, which caused confusion when troubleshooting errors.

In the program I wrote, the user provides the message text and the module it belongs to, and the program automatically generates the message number and inserts the message into the database.
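
For reference, here's a minimal sketch of what that Message might look like as a class; the property names are my own guesses for illustration, not the actual schema:

[code language="csharp"]
// A minimal sketch of the Message described above; property names are
// assumptions for illustration, not the real schema.
public class Message
{
    public string Description { get; set; } // the text the user provides
    public string Module { get; set; }      // the module the message belongs to
    public int Number { get; set; }         // generated automatically, used when troubleshooting
}
[/code]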

For brevity’s sake, shown below is the logic for determining what the next message number should be.

[code language="csharp"]
public int GetNextAlertAndErrorModuleNumber(string module)
{
    if (String.IsNullOrEmpty(module))
        throw new ArgumentException("module cannot be null or empty");

    if (_connection == null)
        _connection = CreateConnection();

    var results = new List<int>();

    _connection.Open();
    var cmd = new SqlCommand("dbo.GetAlertData", _connection);
    cmd.CommandType = CommandType.StoredProcedure;

    var reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        if (!reader["ALERT_ID_NUMBER"].ToString().Contains(module))
            continue;

        // ALERT_ID_NUMBER looks like "MODULE 123"; keep the numeric part.
        var pieces = reader["ALERT_ID_NUMBER"].ToString().Split(' ');

        results.Add(Int32.Parse(pieces[1]));
    }
    if (reader != null)
        reader.Close();

    cmd = new SqlCommand("dbo.GetErrorData", _connection);
    cmd.CommandType = CommandType.StoredProcedure;

    reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        if (!reader["ERROR_ID_NUMBER"].ToString().Contains(module))
            continue;

        var pieces = reader["ERROR_ID_NUMBER"].ToString().Split(' ');

        results.Add(Int32.Parse(pieces[1]));
    }
    if (reader != null)
        reader.Close();

    if (_connection != null)
        _connection.Close();

    return results.Max() + 1;
}
[/code]
The method itself isn't complex: it just calls some stored procedures, parses the output, and adds the numbers to a list. However, it's not abundantly clear what the purpose of calling those stored procedures is.

First, it looks like we're reading the alert numbers from a stored procedure call, so why don't we extract that logic into a helper method and have the public method call the helper?

[code language="csharp" highlight="12, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55"]
public int GetNextAlertAndErrorModuleNumber(string module)
{
    if (String.IsNullOrEmpty(module))
        throw new ArgumentException("module cannot be null or empty");

    if (_connection == null)
        _connection = CreateConnection();

    var results = new List<int>();

    _connection.Open();
    results.AddRange(ReadAlerts(module.ToUpper()));

    var cmd = new SqlCommand("dbo.GetErrorData", _connection);
    cmd.CommandType = CommandType.StoredProcedure;

    var reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        if (!reader["ERROR_ID_NUMBER"].ToString().Contains(module))
            continue;

        var pieces = reader["ERROR_ID_NUMBER"].ToString().Split(' ');

        results.Add(Int32.Parse(pieces[1]));
    }
    if (reader != null)
        reader.Close();

    if (_connection != null)
        _connection.Close();

    return results.Max() + 1;
}

private List<int> ReadAlerts(string module)
{
    var results = new List<int>();
    var cmd = new SqlCommand("dbo.GetAlertData", _connection);
    cmd.CommandType = CommandType.StoredProcedure;

    var reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        if (!reader["ALERT_ID_NUMBER"].ToString().Contains(module))
            continue;

        var pieces = reader["ALERT_ID_NUMBER"].ToString().Split(' ');
        results.Add(Int32.Parse(pieces[1]));
    }
    if (reader != null)
        reader.Close();

    return results;
}
[/code]

By doing this, we fix two issues at once. First, we've given a name to the process of reading the alerts, which in turn allows us to quickly understand what the public method should be doing (i.e. improved readability).

Second, it makes debugging easier because we now have smaller components. For example, let's say that we were getting the wrong value. In the first implementation, we would have to put breakpoints in different areas, trying to determine which part was broken. However, in the new form, we can check whether ReadAlerts is behaving correctly. If it isn't, we know the bug has to be in that method; otherwise, it's in the rest of the code.

For the next step, you may have noticed that we can repeat the same refactoring trick, except this time we extract the logic for reading the errors into a helper method.

[code language="csharp" highlight="11,12"]
public int GetNextAlertAndErrorModuleNumber(string module)
{
    if (String.IsNullOrEmpty(module))
        throw new ArgumentException("module cannot be null or empty");

    if (_connection == null)
        _connection = CreateConnection();
    _connection.Open();

    var results = new List<int>();
    results.AddRange(ReadAlerts(module.ToUpper()));
    results.AddRange(ReadErrors(module.ToUpper()));

    if (_connection != null)
        _connection.Close();

    return results.Max() + 1;
}

private List<int> ReadAlerts(string module)
{
    var results = new List<int>();
    var cmd = new SqlCommand("dbo.GetAlertData", _connection);
    cmd.CommandType = CommandType.StoredProcedure;

    var reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        if (!reader["ALERT_ID_NUMBER"].ToString().Contains(module))
            continue;

        var pieces = reader["ALERT_ID_NUMBER"].ToString().Split(' ');
        results.Add(Int32.Parse(pieces[1]));
    }
    if (reader != null)
        reader.Close();

    return results;
}

private List<int> ReadErrors(string module)
{
    var results = new List<int>();
    var cmd = new SqlCommand("dbo.GetErrorData", _connection);
    cmd.CommandType = CommandType.StoredProcedure;

    var reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        if (!reader["ERROR_ID_NUMBER"].ToString().Contains(module))
            continue;

        var pieces = reader["ERROR_ID_NUMBER"].ToString().Split(' ');
        results.Add(Int32.Parse(pieces[1]));
    }
    if (reader != null)
        reader.Close();

    return results;
}
[/code]

After the changes, anyone who looks at the public API can easily see that it’s reading from both Alerts and Errors. This is really powerful because now you can communicate with non-technical people about requirements and what the code is doing.

Let’s say that in the future, the requirements change and this conversation plays out:

Alice (QA, finding an error) – Hey Bob, I was running one of our test plans and it looks like we're getting the wrong message number when we try to add a new message and there are warning messages in the database. We are including the warning table when figuring that out, right?

Bob (Engineer, finding the root cause) – Hey, you're right, it looks like we're only using the alert and error tables when calculating the next number. Why don't we write up a ticket for that and get a fix in?
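
Because the public method is now just a list of named steps, that fix is nearly mechanical. Here's a hedged sketch of what it might look like (ReadWarnings and its backing stored procedure are hypothetical, mirroring ReadAlerts and ReadErrors; the method name would probably deserve an update as well):

[code language="csharp"]
public int GetNextAlertAndErrorModuleNumber(string module)
{
    if (String.IsNullOrEmpty(module))
        throw new ArgumentException("module cannot be null or empty");

    if (_connection == null)
        _connection = CreateConnection();
    _connection.Open();

    var results = new List<int>();
    results.AddRange(ReadAlerts(module.ToUpper()));
    results.AddRange(ReadErrors(module.ToUpper()));
    results.AddRange(ReadWarnings(module.ToUpper())); // hypothetical helper mirroring ReadAlerts/ReadErrors

    if (_connection != null)
        _connection.Close();

    return results.Max() + 1;
}
[/code]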

The key point is that no matter how large a method is, it always performs a series of steps in some order (that's what an algorithm is), and those steps are the key to refactoring large methods into more maintainable pieces of code. The trick is identifying the steps and deciding whether to turn them into helper methods or helper classes.

If those steps become complicated, they should be broken out into helper methods. As time goes on and those helper methods start to become more complicated themselves, they should in turn become classes of their own.
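
As a hedged illustration of that last step, here's one shape the extraction could take. The interface, class names, and constructor wiring below are my own assumptions, not code from the project:

[code language="csharp"]
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

// Each reader owns the details of one stored procedure; the calculator
// just composes them.
public interface IMessageNumberReader
{
    IEnumerable<int> ReadNumbers(string module);
}

public class AlertNumberReader : IMessageNumberReader
{
    private readonly SqlConnection _connection;

    public AlertNumberReader(SqlConnection connection)
    {
        _connection = connection;
    }

    public IEnumerable<int> ReadNumbers(string module)
    {
        var results = new List<int>();
        var cmd = new SqlCommand("dbo.GetAlertData", _connection)
        {
            CommandType = CommandType.StoredProcedure
        };

        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                var id = reader["ALERT_ID_NUMBER"].ToString();
                if (!id.Contains(module))
                    continue;

                // Assuming IDs look like "MODULE 123", as the original
                // split-on-space implies; keep the numeric part.
                results.Add(Int32.Parse(id.Split(' ')[1]));
            }
        }

        return results;
    }
}

// ErrorNumberReader would look the same, but call dbo.GetErrorData
// and read the ERROR_ID_NUMBER column.

public class MessageNumberCalculator
{
    private readonly IEnumerable<IMessageNumberReader> _readers;

    public MessageNumberCalculator(IEnumerable<IMessageNumberReader> readers)
    {
        _readers = readers;
    }

    public int GetNextNumber(string module)
    {
        return _readers
            .SelectMany(reader => reader.ReadNumbers(module.ToUpper()))
            .Max() + 1;
    }
}
[/code]

With this shape, each reader can be exercised (or faked) on its own, and supporting another table later means adding one more implementation rather than another copy-pasted loop.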

Today I Learned

When working in a start-up, without fail, there's always a need for a quick, painless way to install our desktop application onto customer machines. This in itself requires quite a bit of research (is xcopy enough, or do I need a dedicated tool?), and in typical fashion you have to make a decision without knowing all of the requirements.

The biggest problem with current installer technologies is that it takes a significant amount of work to get one going. Going with a scripting solution like NSIS or WiX for the first time? Be prepared for a steep learning curve (and in the meantime you won't have a lot to show for your work). Using an installer wizard like InstallShield? Be prepared for both the learning curve and the massive amount of documentation to wade through just to create the installer.

What I really needed was some tool that could easily make an installer for me without many bells and whistles just so that management could use the tool and give feedback. In short, I was really looking for a minimum viable installer.

After digging around, I found Excelsior, a very easy-to-use tool for creating installer packages. It's not the easiest thing to automate, but for creating one-off installers or creating them infrequently, this tool is fantastic.

Getting Started With Excelsior

After installation, go ahead and start Excelsior and click “New”.

In general, the flow is to first choose which files need to be included in the installer. In this application, it's pretty easy to specify which files to copy:

[Screenshot: excelsior_step_1 – selecting the files to include in the installer]

On the next screen, you can specify the names of the company and product, the version, and (in my opinion the most useful) shortcuts for your application.

[Screenshot: excelsior_step_2 – company, product, version, and shortcut settings]

On the third screen, you can add the End User License Agreement (EULA), a splash screen, and custom actions that can be run after install (updating configuration files or launching other applications). In this screenshot, I added a EULA and an "always run" action to start the installed application.

[Screenshot: excelsior_step_3 – EULA, splash screen, and post-install custom actions]

After finishing that screen, save the installation package files and let Excelsior create the installer. Saving the package files gives Excelsior a way to track whether there's a newer version of each file in the directory you specified earlier. Auto-updating existing files works great; however, there's no way for it to determine whether new files should be added.

Pros

  • Simple to get started with (stumbling on my own, I was able to get an installer up and running within an hour).
  • Has a decent amount of customization (shortcuts, EULA, custom actions).
  • Can export to NSIS, so installer creation can be incorporated into continuous integration.
  • Free for personal and commercial use; they have another product, Excelsior Delivery, with a few more bells and whistles that costs about $100.

Cons

  • Does not support custom screens out of the box. For example, I needed a screen that would allow the end user to specify a SQL Server database instance. Excelsior cannot do this in the UI, but it can be done by coding it in NSIS.
  • There's not a whole lot of documentation on the product, so it's usually faster to stumble around in the application looking for a feature.
  • There's no way for it to automatically add new files to the list; however, it can auto-update files that have already been added.

Today I Learned

Don't pass in more information than you need. It sounds simple, but when working with messy legacy code, it's easy to forget.

The impetus for this post came from a peer code review. During the review, I found this method:

[code language="csharp"]
public IStrategy GetStrategy(Project project, bool isAffected)
{
    var type = project.Type;
    if (type == ProjectType.A && isAffected)
        return new ProjectAIsAffectedStrategy();
    if (type == ProjectType.B)
        return new ProjectBStrategy();
    // Similar if statements
}
[/code]

At first glance, it looks pretty good. The logic was sound and it seemed to be returning a class implementing an interface, similar to what we would expect from the Factory pattern. However, there's a slight problem. Can you spot it?

The issue is with the first parameter, Project. The method takes a Project; however, we only really depend on the Project's Type property.

[code language="csharp" highlight="3"]
public IStrategy GetStrategy(Project project, bool isAffected)
{
    var type = project.Type;
    if (type == ProjectType.A && isAffected)
        return new ProjectAIsAffectedStrategy();
    if (type == ProjectType.B)
        return new ProjectBStrategy();
    // Similar if statements
}
[/code]

So why don't we get rid of the dependency on Project and replace it with a dependency on ProjectType instead?

[code language="csharp" highlight="1"]
public IStrategy GetStrategy(ProjectType type, bool isAffected)
{
    if (type == ProjectType.A && isAffected)
        return new ProjectAIsAffectedStrategy();
    if (type == ProjectType.B)
        return new ProjectBStrategy();
    // Similar if statements
}
[/code]

Instinctively, I knew this was the right call, but I couldn't remember why. After some digging, I remembered that this is a Law of Demeter violation, better known as a violation of the Principle of Least Knowledge.

In general, this principle states that a method should be given only the information it needs to do its job. Another classic violation of this principle is reaching through a class's internals. For example:

[code language="csharp"]SomeClassA.SomePropertyB.WithSomeMethodC()[/code]
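
As a slightly fuller (and entirely invented) illustration of that smell, compare a method that reaches through an object's internals with one that asks only for the value it actually uses:

[code language="csharp"]
using System;

// Order and Customer here are made-up types purely for illustration.
public class Customer { public string Email { get; set; } }
public class Order { public Customer Customer { get; set; } }

public class Notifier
{
    // Violation: takes a whole Order just to read one nested property,
    // and now also depends on Order's internal structure.
    public void Notify(Order order)
    {
        Send(order.Customer.Email);
    }

    // Better: depend only on the piece of data that's actually used.
    public void Notify(string email)
    {
        Send(email);
    }

    private void Send(string email)
    {
        Console.WriteLine($"Sending notification to {email}");
    }
}
[/code]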

One of the reasons I really like the Law of Demeter is that following it leads to methods that are easier to test. Don't believe me? Which is easier to create: the Project class (which may have many dependencies that would need to be stubbed) or the ProjectType enum (which by definition has zero dependencies)?
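
To make that concrete, here's a hedged sketch of a test against the refactored version; the StrategyFactory class name and the xUnit-style assertion are my assumptions about how this code is hosted, not part of the original codebase:

[code language="csharp"]
using Xunit;

public class StrategyFactoryTests
{
    [Fact]
    public void ProjectTypeA_WhenAffected_ReturnsProjectAIsAffectedStrategy()
    {
        // No Project to construct, no properties to fill in, nothing to stub:
        // an enum value and a bool are the entire setup.
        var factory = new StrategyFactory(); // hypothetical class that owns GetStrategy

        var strategy = factory.GetStrategy(ProjectType.A, isAffected: true);

        Assert.IsType<ProjectAIsAffectedStrategy>(strategy);
    }
}
[/code]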

Another reason that following the Law of Demeter is good practice is that it forces your code to be explicit about what dependencies are required. In the first implementation, the caller knew that the method needed a Project, but had no clue how much of the Project it needed (does it need all of the properties set? does it need further setup beyond basic instantiation?). With the refactored version, it's much clearer that the method depends not on a whole Project, but just on the ProjectType.