Sunday, June 27, 2010

Turning functional code into great code: Part 2: Language usage

Welcome! This post is part of a series; if you missed Part 1, go ahead and read it.

Today, we're going to focus on how the StylesheetCreator class can make better use of the Java language and its standard API. We'll leave third-party API usage and design improvements for another day, although there is always a bit of overlap among these three areas.

The first improvement is at the very end of the createStylesheet method when writing out the file.


FileOutputStream fo = new FileOutputStream( stylesheetDirectory
+ outputFilename );
PrintStream ps = new PrintStream( fo );
ps.println( xsltText );


There are two issues here:

  • The stream is never closed. It will be closed when the garbage collector cleans it up and runs the stream's finalize method, but we don't know when that is. So we could run out of open file descriptors if the method is called many times repeatedly.
  • We are using an OutputStream to write characters to a file. InputStream and OutputStream are for reading and writing byte streams; Reader and Writer are for reading and writing character streams. They also give us control over the character encoding.


So our new code looks like this:


FileWriter fw = new FileWriter(
new File( stylesheetDirectory, outputFilename ) );
try
{
fw.write( xsltText );
fw.write( "\n" );
}
finally
{
fw.close();
}


If we run the test again, it still works, so we know that our change hasn't broken anything.

Now let's go back and look at this code:


String ruleContents = "";
...
for ( int z = 0; z < ruleList.size(); z++ )
{
...
switch ( ruleId )
{
case 1: // concat
{
System.out.println( "Case 1" );
ruleContents += stylesheetHelper.concatRule( z + 1,
columnName, params );
break;
}
...
}
}


Do you notice what is happening? We are building the ruleContents string in a loop via the += operator. Since strings in Java are immutable, this will construct a brand-new string each time through the loop, by copying the contents of the old string into the new one, then appending the new rule text. Basically, in Java, the code a += b is equivalent to:


a = new StringBuffer( a ).append( b ).toString();


This is fine if we're only doing it a few times, but it can get pretty inefficient to do it over and over in a loop.

So how do we fix it? By using either StringBuffer, or (in Java 5 and up) StringBuilder instead. The difference is that StringBuffer is synchronized, meaning that you can safely call it from multiple threads at once; and StringBuilder is not. We're not creating multiple threads within the code that is building up our strings, so we'll just use StringBuilder as follows.


StringBuilder ruleContents = new StringBuilder();
...
for ( int z = 0; z < ruleList.size(); z++ )
{
...
switch ( ruleId )
{
case 1: // concat
{
System.out.println( "Case 1" );
ruleContents.append( stylesheetHelper.concatRule( z + 1,
columnName, params ) );
break;
}
...
}
}


We can modify the other cases similarly. Once again, we run our test, and it still passes. Excellent! (By the way, I'll just assume that we run the test after each subsequent modification mentioned in this post to ensure that we haven't broken anything.)

Now let's look at a somewhat odd bit of code: the params variable. Here's how it's constructed and two samples of how it's used.


String params = "";
if ( ruleId == 1 || ruleId == 2 || ruleId == 3 || ruleId == 5 )
{
params = param1;
}
else
{
params = param1 + "," + param2;
}
...
switch ( ruleId )
{
...
case 2: // greater than
{
System.out.println( "Case 2" );
if ( params.indexOf( "-" ) < 0 )
{
int paramInt = Integer.parseInt( params );
ruleContents.append( stylesheetHelper.greaterThanRule(
z + 1, columnName, paramInt ) );
}
else
{
ruleContents.append( stylesheetHelper.greaterThanRule(
z + 1, columnName, params ) );
}
break;
}
...
case 4: // replace
{
System.out.println( "Case 4" );
String paramArr[] = params.split( "," );
ruleContents.append( stylesheetHelper.replaceRule( z + 1,
paramArr[0], columnName, paramArr[1] ) );
break;
}
...
}


This looks rather suspect to me. param1 and param2 (only where applicable) are concatenated together into params separated by a comma, and then they are split apart again in some cases. This seems unnecessary and even fragile, especially if the strings themselves contain commas. Fortunately, we can entirely avoid the use of the params variable as follows.


...
switch ( ruleId )
{
...
case 2: // greater than
{
System.out.println( "Case 2" );
if ( param1.indexOf( "-" ) < 0 )
{
int paramInt = Integer.parseInt( param1 );
ruleContents.append( stylesheetHelper.greaterThanRule(
ruleNum, columnName, paramInt ) );
}
else
{
ruleContents.append( stylesheetHelper.greaterThanRule(
ruleNum, columnName, param1 ) );
}
break;
}
...
case 4: // replace
{
System.out.println( "Case 4" );
ruleContents.append( stylesheetHelper.replaceRule( z + 1,
param1, columnName, param2 ) );
break;
}
...
}


We can make similar adjustments for all the other cases.

Now let's look at the signature of the createStylesheet method:


public void createStylesheet( String parentNodeName, String childNodeNames,
List ruleList, String outputFilename ) throws Exception


When I created the test case in part 1 of this series, the hardest part was trying to determine what I needed to pass in for childNodeNames and ruleList. As it turns out, childNodeNames is a comma-separated list of XML node names, which are strings. So, if it is a list of Strings, why not just use a List<String>? Indeed we should do that, but we'll postpone that modification until part 3, when we take a look at StylesheetHelper.

So what about ruleList? The trouble is that I don't know exactly what needs to go in the list -- and in fact, Java will give you a warning when you try to compile the file, complaining that we're using a raw type (one without type parameters) instead of a parameterized type. Happily, we can figure out what type of object goes in the list by looking at the code that uses it.


for ( int z = 0; z < ruleList.size(); z++ )
{
ArrayList rule = (ArrayList) ruleList.get( z );

String columnName = (String) rule.get( 1 );
int ruleId = ( Integer.parseInt( rule.get( 2 ) + "" ) );
String param1 = (String) ( rule.get( 3 ) );
String param2 = (String) ( rule.get( 4 ) );
...
}


So, ruleList is a list of ArrayLists, each of which contains five elements. The first element (index 0) is never used; the second, fourth and fifth are strings, and the third is unknown. Since I don't have all the original code, I'm going to go out on a limb here and assume that all the elements of each inner list are strings. I don't know for sure, but it makes this example easier. So we have a list of ArrayLists of strings:


public void createStylesheet( String parentNodeName, String childNodeNames,
List<ArrayList<String>> ruleList, String outputFilename ) throws Exception


But we can make this a little better. Why should we require the inner lists to be ArrayLists? It is true that we get elements out via random access (i.e. the get method), and LinkedLists require traversing all the way from the beginning of the list to the required element. However, these inner lists only contain five elements, so the performance penalty will be minimal. So let's just use List instead of ArrayList and permit the caller to pass in any type of list. Our method signature now looks like this:


public void createStylesheet( String parentNodeName, String childNodeNames,
List<List<String>> ruleList, String outputFilename ) throws Exception


Note that this is still not an optimal solution, but since we're just showing language usage for now and not delving into design, we'll leave it like this.

So now that we have a properly parameterized list, we can make good use of it and remove all the casts from the loop:


for ( int z = 0; z < ruleList.size(); z++ )
{
List<String> rule = ruleList.get( z );

String columnName = rule.get( 1 );
int ruleId = ( Integer.parseInt( rule.get( 2 ) ) );
String param1 = rule.get( 3 );
String param2 = rule.get( 4 );
...
}


Also, for the aforementioned performance reasons, we should avoid using random access on long lists if we can -- and the outermost list of rules could theoretically be long. Happily, there's no reason in this case why we need to use random access (List.get) to get the rules out of the ruleList. In fact, Java 5 makes this super-easy for us.


for ( List<String> rule : ruleList )
{
String columnName = rule.get( 1 );
...
switch ( ruleId )
{
case 1: // concat
{
System.out.println( "Case 1" );
ruleContents.append( stylesheetHelper.concatRule( z + 1,
columnName, params );
break;
}
...
}
}


Unfortunately, the above code won't compile. We no longer have the loop index z that we were using to represent the rule number, which is passed into all the StylesheetHelper methods. As a result, we need to simulate the old z index:


int z = 0;
for ( List<String> rule : ruleList )
{
String columnName = rule.get( 1 );
...
switch ( ruleId )
{
case 1: // concat
{
System.out.println( "Case 1" );
ruleContents.append( stylesheetHelper.concatRule( z + 1,
columnName, params );
break;
}
...
}

z++;
}


If you take a look at the way that z is used throughout the code, it's always z + 1, a rule number. So let's give it a meaningful name, and initialize it to 1 so that we don't constantly have to add 1 everywhere.


int ruleNum = 1;
for ( List<String> rule : ruleList )
{
String columnName = rule.get( 1 );
int ruleId = Integer.parseInt( rule.get( 2 ) );
String param1 = rule.get( 3 );
String param2 = rule.get( 4 );
System.out.println( "Rule table values: " + columnName + " " +
ruleId + " " + param1 + " " + param2 );

switch ( ruleId )
{
case 1: // concat
{
System.out.println( "Case 1" );
ruleContents.append( stylesheetHelper.concatRule( ruleNum,
columnName, params );
break;
}
...
}

ruleNum++;
}


While we're looking at the switch statement, there are a couple of improvements that can be made. Firstly, what happens if we don't match any case? Right now, nothing. The code will just bypass the errant rule ID and keep going with the next one. However, if I were debugging this code in production, I'd want to know that someone passed an unexpected rule ID to this method! So let's add a default case. In fact, in general, one should always have a default case in a switch statement.


int ruleNum = 1;
for ( List<String> rule : ruleList )
{
String columnName = rule.get( 1 );
int ruleId = Integer.parseInt( rule.get( 2 ) );
String param1 = rule.get( 3 );
String param2 = rule.get( 4 );
System.out.println( "Rule table values: " + columnName + " " +
ruleId + " " + param1 + " " + param2 );

switch ( ruleId )
{
case 1: // concat
{
...
}
case 2: // greater than
{
...
}
...
default:
{
System.out.println( "WARNING: Tried to invoke unknown rule ID " + ruleId );
}
}

ruleNum++;
}


Now let's get rid of the "magic numbers" above (case 1, 2, 3, 4, 5, 16). We could use public static final int constants to do that, but we can do better.

Java 5 has a new feature for representing a selection of known values: enums. Unlike in C, where enums are basically just glorified integers, Java represents enums as classes, with each enum value being an instance of that class. Since each enum has an "ordinal" (integer) value, enums can be used in switch statements too.


package net.galluzzo.example.stylesheetbuilder.refactored.part2;

public enum RuleType
{
CONCAT,
GREATER_THAN,
LESS_THAN,
REPLACE,
LENGTH,
BETWEEN
}


This is a good start; but in our case, we have an existing rule ID that we want to map to a RuleType -- we don't want to use Java's default numbering scheme that starts at 0 and goes up sequentially. So instead of doing external mapping, we'll let the enumeration itself do the mapping.


package net.galluzzo.example.stylesheetbuilder.refactored.part2;

import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

public enum RuleType
{
CONCAT( 1 ),
GREATER_THAN( 2 ),
LESS_THAN( 3 ),
REPLACE( 4 ),
LENGTH( 5 ),
BETWEEN( 16 );

private static final Map<Integer, RuleType> ruleIdToRuleTypeMap =
new HashMap<Integer, RuleType>();

private int ruleId;


static
{
for ( RuleType ruleType : EnumSet.allOf( RuleType.class ) )
{
ruleIdToRuleTypeMap.put( ruleType.getRuleId(), ruleType );
}
}

public static RuleType valueFromId( int ruleId )
{
return ruleIdToRuleTypeMap.get( ruleId );
}


private RuleType( int ruleId )
{
this.ruleId = ruleId;
}

public int getRuleId()
{
return ruleId;
}
}


The methods are all pretty self-explanatory except for the static initializer, which uses the built-in EnumSet class to loop over all the enum values in order to build a map from rule IDs to their corresponding RuleTypes. We use this map in the valueFromId method, which retrieves the RuleType instance for a particular rule ID.

So the loop in StylesheetCreator.createStylesheet now looks like this:


int ruleNum = 1;
for ( List<String> rule : ruleList )
{
String columnName = rule.get( 1 );
int ruleId = Integer.parseInt( rule.get( 2 ) );
RuleType ruleType = RuleType.valueFromId( ruleId );
String param1 = rule.get( 3 );
String param2 = rule.get( 4 );
System.out.println( "Rule table values: " + columnName + " " +
ruleId + " " + param1 + " " + param2 );

switch ( ruleType )
{
case CONCAT:
{
System.out.println( "Case 1" );
ruleContents.append( stylesheetHelper.concatRule( ruleNum,
columnName, param1 ) );
break;
}
case GREATER_THAN:
{
System.out.println( "Case 2" );
if ( param1.indexOf( "-" ) < 0 )
{
int paramInt = Integer.parseInt( param1 );
ruleContents.append( stylesheetHelper.greaterThanRule(
ruleNum, columnName, paramInt ) );
}
else
{
ruleContents.append( stylesheetHelper.greaterThanRule(
ruleNum, columnName, param1 ) );
}
break;
}
...
}

ruleNum++;
}


We replaced the switch statement with a switch over the RuleType enum values, which is more readable. In fact, we were able to ditch the comments after each case describing what it was supposed to do; they're redundant, because the enum name gives us the same information that the comment used to.

Let's go one step further by consolidating the System.out.printlns at the beginning of each case into one place, and let's make the code print out the rule type enumeration instead of the rule ID:


int ruleNum = 1;
for ( List<String> rule : ruleList )
{
String columnName = rule.get( 1 );
int ruleId = Integer.parseInt( rule.get( 2 ) );
RuleType ruleType = RuleType.valueFromId( ruleId );
String param1 = rule.get( 3 );
String param2 = rule.get( 4 );
System.out.println( "Rule table values: " + columnName + " " +
ruleId + " " + param1 + " " + param2 );

if ( ruleType != null )
{
System.out.println( "Case " + ruleType );
}

switch ( ruleType )
{
case CONCAT:
{
ruleContents.append( stylesheetHelper.concatRule( ruleNum,
columnName, param1 ) );
break;
}
case GREATER_THAN:
{
if ( param1.indexOf( "-" ) < 0 )
{
int paramInt = Integer.parseInt( param1 );
ruleContents.append( stylesheetHelper.greaterThanRule(
ruleNum, columnName, paramInt ) );
}
else
{
ruleContents.append( stylesheetHelper.greaterThanRule(
ruleNum, columnName, param1 ) );
}
break;
}
...
}

ruleNum++;
}


This enum type is pretty nifty, but we'll discuss an even better solution in another couple of posts.

That's about it for now. We've covered Writer vs. OutputStream, StringBuffer vs. StringBuilder vs. concatenation, type parameters, the enhanced for loop, and enums. Next time, we'll look beyond the standard Java runtime to third-party APIs, and take our first look at the mysterious StylesheetHelper class. Stay tuned!

No comments: