XPlorations

Refactorings from Writing Efficient Programs (February 2000)

Jon Bentley wasn't using the term "refactoring" in his books like Writing Efficient Programs, but many of his optimization guidelines can be looked at that way. This paper describes several of those rules, and points out a few subtleties concerning them.

The refactoring rules in Writing Efficient Programs tend to be at a lower level than the refactorings described in M. Fowler's Refactoring book: they're focused more on performance issues, sometimes even at the expense of code readability. (If you use these rules for performance tuning, measure your program to ensure that performance is actually improving: some rules work better "forward" on some machines and "backwards" on others, and some can interfere with what your compiler does.) Not all the rules are strictly performance-based: some will enhance your understanding of what the program does.

We'll discuss these rules in four categories:

  • Expressions
  • Loops
  • Procedures
  • Others

The names don't always agree with Bentley's originals; I've tried to use Fowler's approach of using an active verb phrase ("do this").

The approach will be to show a motivating example "before and after", like this:

Name of refactoring
Before

Sample code before refactoring

After

Same code after refactoring

The part that's going to change (in the Before section) is shown in italics; the part that has changed (in the After section) is shown in bold. Each example is followed by any discussion.

Expressions

Short-circuit a Monotone Function
Before
if (a & b & c)
   ...
After
if (a && b && c) 
   ...

Some functions can only move in one direction: "and" conditions can never go from false to true, and positive integers added together only get bigger. This table suggests a few possibilities:

Operator     Type              Result can only be...
& ("and")    boolean           more false
| ("or")     boolean           more true
+            positive numbers  bigger
+            String            longer
-            positive numbers  smaller
add()        collection        bigger
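As a sketch of what the table enables (my own example, with illustrative names, not Bentley's): a running sum of positive values is monotone, so a "does the total exceed the limit?" search can stop as soon as the answer becomes known.

```java
// A running sum of positive values can only grow, so we can
// short-circuit the scan the moment the limit is exceeded.
public class MonotoneSum {
    static boolean exceeds(int[] positiveValues, int limit) {
        int total = 0;
        for (int i = 0; i < positiveValues.length; i++) {
            total += positiveValues[i];
            if (total > limit) {
                return true;   // short-circuit: the sum can only get bigger
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(exceeds(new int[] {5, 10, 20}, 12)); // true
        System.out.println(exceeds(new int[] {1, 2, 3}, 12));   // false
    }
}
```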

Initialize Earlier
Before
void f() {
    double value = Math.sin(0.25);
    // rest of original
}
After
final static double defaultValue = 
    Math.sin(0.25);
  :
void f() {
    double value = defaultValue;
    // rest of original
}

In the example, we've moved initialization from when the routine is called to class loading time (and possibly compile-time). This table shows initialization time from latest to earliest:

Initialization time
  • In loop
  • Before loop
  • In function (local variable creation)
  • In constructor (instance creation)
  • As static (class load-time creation)
  • Compile-time

Notice the use of final static in the example - this says to create the value only once, when the class is loaded, and never permit it to be changed after that.


Pre-Compute Values
("Lookup Table")
Before
int f(int i) {
    if (i < 10) {return i * i - i;}
    return 0;
}
After
final static int[] values = 
  {0*0-0, 1*1-1, 2*2-2, ..., 9*9-9};

int f(int i) {
    if (i < 10) {return values[i];}
    return 0;
}

This is an example of "Initialize Earlier", implemented in a specific way.


Replace Boolean with "If"
Before
boolean condition = 
  age > 21 && income < 100000;

f();

if (condition) {
    // do something
} else {
    // do something else
}
After
if (age > 21 && income < 100000) {
    f();
    // do something
} else {
    f();
    // do something else
}

This code assumes that nothing else uses the computed boolean value (so we can eliminate the variable). It may seem odd to end up with two copies of the intervening body ("f()" in the example), but once this code is in each branch it can be customized for the situation at hand.


Pair Related Computations
("Two can live as cheaply as one")
Before
double x = something;

f1 = Math.sin(x);
f2 = Math.cos(x);
After
double x = something;
Point p = sincos(x);
f1 = p.x;
f2 = p.y;

There are times when you're computing related functions on the same set of inputs. If you only knew that both results were needed, you could compute both at the same time more cheaply than one after the other. This may be because you're able to apply other techniques (such as Fuse Loops) to the paired code, or it may be because of some mathematical relationship between them (sin() and cos() share some intermediate computations).

The function can be fairly complicated. In a previous paper (Refactoring: An Example) we described a situation where a template would repeatedly scan a string, replacing one pattern per pass; if the template knew all the patterns and their replacement strings up front, it could do the job in one pass by working on whichever pattern it found next.


Remove Common Sub-Expression
Before
int start = 10 + f.length();
int endPos = 15 + f.length();
After
int length = f.length();
int start = 10 + length;
int endPos = 15 + length;

The expression being moved must not have side-effects (because it goes from being evaluated twice to being evaluated only once), and it must not be able to change its value between the two uses (even in the presence of threads).


Apply Algebraic Identity
Before
// Assume a>0
assert (a > Math.sqrt(b));
  :
return a*a + 3*a + 2;
After
// Assume a>0
assert (a*a > b);
  :
return (a+1) * (a+2);

This rule lets you replace computations with equivalent but cheaper ones.

Loops

Combine Tests ("Sentinels")
Before
final static int SIZE=200;
int[] a = new int[SIZE];
  :

int pos = 0;
while (pos < SIZE && a[pos] != searchValue) {
    pos++;
}
return pos;
After
final static int SIZE=200;
int[] a = new int[SIZE + 1];
  :
a[SIZE] = searchValue;
int pos = 0;
while (a[pos] != searchValue) {
    pos++;
}
return pos;

By adding a little extra storage and putting a sentinel value there, we're able to combine the tests "pos < SIZE" and "a[pos] != searchValue" into one.


Hoist Loop-Invariant Code
Before
void f(double d) {


    for (int i=0; i<100; i++) {
        plot(i, i*Math.sin(d));
    }
}
After
void f(double d) {
    double dsin=Math.sin(d);

    for (int i=0; i<100; i++) {
        plot(i, i*dsin);
    }
}

This is also a special case of Initialize Earlier. ("Hoist" brings in the idea of lifting the code up.) Because the computation is inside a loop, its cost may be paid multiple times. Be careful that other threads can't change the value (lest it not be constant in the loop). The expression must have no side effects (as it's now evaluated only once).


Unroll Loop
Before
for (int i=0; i<3; i++) {
    sum += q*i - i*7;
}
After
int i = 0;
sum += q*i - i*7;
i++;
sum += q*i - i*7;
i++;
sum += q*i - i*7;

This is one of those "touchy" transformations. You may find yourself running it "backwards" more often than forwards (i.e., realizing that a repetitive sequence of statements can be put in a loop). The benefit is that it reduces or eliminates the overhead of the loop; this can usually only pay off when the loop overhead is high relative to the computation being done. You don't have to completely unroll the loop; you could run it half as many times by having two copies of the body (and making any adjustments for odd numbers). Make sure the subsequent copies of the body take into account the updated value of the loop index.
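The partial-unrolling variant mentioned above can be sketched like this (the names are illustrative, not from the book): two copies of the body per pass, with a cleanup step when the element count is odd.

```java
// Partial unrolling: process two elements per pass, then clean up
// any leftover element when the length is odd.
public class PartialUnroll {
    static int sum(int[] a) {
        int sum = 0;
        int i = 0;
        int limit = a.length - 1;
        while (i < limit) {
            sum += a[i];        // first copy of the body
            sum += a[i + 1];    // second copy, with the index adjusted
            i += 2;
        }
        if (i < a.length) {     // cleanup for an odd number of elements
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4, 5})); // 15
    }
}
```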


Fuse Loops
Before
int i;
for (i=0; i<k; i++) {
    sum += f(i);
}

for (i=0; i<k; i++) {
    prod *= g(i);
}
After
int i;
for (i=0; i<k; i++) {
    sum += f(i);
    prod *= g(i);
}

The two loops must run the same number of times (or be made to). The two bodies should be independent, so merging them won't let them interfere with each other.


Reduce Operator Strength
Before
int i;
for (i=0; i<100; i++) {
    plot(i, 4*i);
}
After
int i, i4;
for (i=0, i4=0; i<100; i++, i4+=4) {
    plot(i, i4);
}

This is the compiler optimization "Reduction in Strength". The "reduction" refers to the fact that we've converted the multiplication to an addition. This transformation is related to "Apply Algebraic Identity".

Procedures

The "Procedure" rules J. Bentley describes are more complicated than I can properly explain, so I'll summarize them and refer you to Writing Efficient Programs for the details.

Inline Method

Bentley's first procedure rule is equivalent to M. Fowler's refactoring "Inline Method" (and "Extract Method"). See Refactoring for a full discussion.

Use Coroutines for Multi-Pass Algorithms

A coroutine is a variant of the idea of a subroutine where the two routines call each other in a peer relationship rather than master-slave. (See Knuth, vol. 1.) Multiple calls of the coroutine lead to it resuming where it left off, rather than recursively starting at the top again. Java uses subroutines, not coroutines, so you're not going to easily create code that uses that approach. (A Java program could probably use state machines of some sort to simulate them but it would not be an "accidental" implementation.)

Bentley points out a related issue, though: some designs read and write complete files in a multi-pass way, but an implementation could be designed to have the various passes communicate via in-memory buffers, a little at a time, rather than through the file system. The classes java.io.PipedInputStream and java.io.PipedOutputStream can help you implement this in a multi-threaded way. (This is related to the Unix "|" pipe-and-filter mechanism.)
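Here's a minimal sketch of that buffer-based approach using the piped stream classes; the roundTrip helper and its structure are my own illustration. A producer thread writes into the pipe while the caller reads the data back a little at a time, all in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// One "pass" writes into the pipe from a separate thread while a
// second "pass" reads it back, without touching the file system.
public class PipeDemo {
    static String roundTrip(final String text) {
        try {
            final PipedOutputStream out = new PipedOutputStream();
            PipedInputStream in = new PipedInputStream(out);

            Thread producer = new Thread(new Runnable() {
                public void run() {
                    try {
                        out.write(text.getBytes());
                        out.close();   // signals end-of-stream to the reader
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            });
            producer.start();

            ByteArrayOutputStream result = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {   // consume until end-of-stream
                result.write(b);
            }
            producer.join();
            return result.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello")); // hello
    }
}
```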

Introduce Helper Method for Small Cases

Some methods are better for large cases than small ones. (You often see this for recursive solutions.) The method can divide-and-conquer until it hits the small case, then call a special method tuned for that. (Bentley uses a good example of this in discussing sorting: a quicksort may be the chosen generic sorting method, but it has a lot of overhead in sorting only a few elements.)
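A hedged sketch of the sorting case: a quicksort that hands any range below an assumed cutoff to an insertion-sort helper. The cutoff value and method names are illustrative; in practice you'd measure to pick the threshold.

```java
import java.util.Arrays;

// Quicksort divides and conquers; below the cutoff it delegates to
// an insertion sort, which has much less overhead on tiny ranges.
public class HybridSort {
    static final int CUTOFF = 8;   // assumed threshold; tune by measuring

    static void sort(int[] a, int lo, int hi) {   // sorts a[lo..hi]
        if (hi - lo < CUTOFF) {
            insertionSort(a, lo, hi);   // helper tuned for small cases
            return;
        }
        int pivot = a[(lo + hi) / 2];
        int i = lo, j = hi;
        while (i <= j) {                // standard partition step
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        }
        if (lo < j) sort(a, lo, j);
        if (i < hi) sort(a, i, hi);
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int v = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > v) {   // shift larger elements right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 9, 1, 7, 2, 8, 6, 4, 0};
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
    }
}
```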

Remove Tail-Recursion

If the last thing a routine does is recursively call itself, you can instead adjust the arguments and branch back to the top, saving the overhead of stacking contexts and making another procedure call.
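For example (a sketch with my own names, not from the book): a tail-recursive greatest-common-divisor becomes a loop that adjusts its arguments and branches back to the top.

```java
public class TailCall {
    // Tail-recursive version: the recursive call is the last thing done.
    static int gcdRecursive(int a, int b) {
        if (b == 0) return a;
        return gcdRecursive(b, a % b);
    }

    // Same logic with the tail call replaced by adjusting the
    // arguments and looping back to the top.
    static int gcdIterative(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;   // the "arguments" for the next round
            b = t;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(gcdIterative(48, 18)); // 6
    }
}
```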

Introduce an Explicit Stack

Recursion works by having a hidden call stack. A routine can simulate this by maintaining its own stack. When a recursive call would be made, push the current state onto the stack, adjust the arguments, and branch to the top. This is a pretty dramatic change in a procedure, and can definitely reduce the clarity of the routine. (You may be able to use it backwards, converting a stack-based routine into a recursive routine - the recursive routine will usually be much shorter and much easier to understand.)
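A small sketch of the idea (the Node class and method names are mine): summing a binary tree recursively versus with an explicit stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ExplicitStack {
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    // Recursive version: the hidden call stack remembers where we are.
    static int sumRecursive(Node n) {
        if (n == null) return 0;
        return n.value + sumRecursive(n.left) + sumRecursive(n.right);
    }

    // Explicit-stack version: push the work we'd otherwise recurse
    // into, and loop until the stack is empty.
    static int sumWithStack(Node root) {
        int sum = 0;
        Deque<Node> stack = new ArrayDeque<Node>();
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            sum += n.value;
            if (n.left != null) stack.push(n.left);
            if (n.right != null) stack.push(n.right);
        }
        return sum;
    }

    public static void main(String[] args) {
        Node tree = new Node(1, new Node(2, null, null), new Node(3, null, null));
        System.out.println(sumWithStack(tree)); // 6
    }
}
```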

Other Techniques

Handle Common Cases First
Before
if ((i&0x07)==0) {
    // divisible by 8
} else if ((i&0x01)==0) {
    // even but not div. by 8
} else {
    // odd
}
After
if ((i&0x01)==1) {
    // odd
} else if ((i&0x07)!=0) {
    // even but not div. by 8
} else {
    // divisible by 8
}

Suppose the example code is in a loop where i runs from 0 to a large value. In the original, odd numbers are tested last even though they make up half the numbers. In the revised version, the tests are ordered so that the most common values are tested first.

Be very careful when twisting around the if-clauses and revising the conditions.


Exploit Hardware Parallelism
Before
public class Status {
    boolean hasBand;
    boolean inLine;
    boolean hasNews;
}


compatible = 
  s1.hasBand == s2.hasBand &&
  s1.inLine == s2.inLine &&
  s1.hasNews == s2.hasNews;
After
public class Status {
    public final static int HAS_BAND=1;
    public final static int IN_LINE=2;
    public final static int HAS_NEWS=4;

    public int status;
}

compatible = 
  s1.status == s2.status;

In this example, we've changed a series of flags to use the bits in a word, so we can use word-level operations (including equality, "and", "or", etc.). The principle goes further, though: you might be able to use multiple threads across CPUs; you may be able to arrange things so reads and writes of values can be overlapped; you may be able to overlap processing and I/O; and so on.

Trade Off Time versus Space

Bentley has a series of rules that discuss how to spend space to save time, and vice versa. His rules cover the gamut - from pre-computing all results to introducing interpreters.

Summary

We've discussed a number of refactorings, based on Bentley's book Writing Efficient Programs. He discussed them in the context of performance tuning, but they are applicable outside that domain. We've classified them like this:

  • Expressions
    • Short-Circuit a Monotone Function
    • Initialize Earlier
    • Pre-Compute Values
    • Replace Boolean with "If"
    • Pair Related Computations
    • Remove Common Sub-Expression
    • Apply Algebraic Identity
  • Loops
    • Combine Tests
    • Hoist Loop-Invariant Code
    • Unroll Loop
    • Fuse Loops
    • Reduce Operator Strength
  • Procedures
    • Inline Method
    • Use Coroutines for Multi-Pass Algorithms
    • Introduce Helper Method for Small Cases
    • Remove Tail-Recursion
    • Introduce Explicit Stack
  • Other
    • Handle Common Cases First
    • Exploit Hardware Parallelism
    • Trade Off Time Versus Space

I encourage you to use these refactorings when they improve your code. I also urge you to pick up and read a copy of Jon Bentley's Writing Efficient Programs, both for a better explanation of these rules, and a glimpse of how performance tuning can resemble refactoring for design.

Resources

[Written 2-11-2000; added Auer/Beck and Smith references, 2-19-2000; corrected "more false" problem caught by Kieran Barry 12-22-2001.]

Copyright 1994-2010, William C. Wake - William.Wake@acm.org