XPlorations

Refactorings from Writing Efficient Programs (February 2000)

Jon Bentley wasn't using the term "refactoring" in his books like Writing Efficient Programs, but many of his optimization guidelines can be looked at that way. This paper describes several of those rules, and points out a few subtleties concerning them.

The refactoring rules in Writing Efficient Programs tend to be at a lower level than the refactorings described in M. Fowler's Refactoring book: they're focused more on performance issues, sometimes even at the expense of code readability. (If you use these rules for performance tuning, measure your program to ensure that performance is actually improving: some rules work better "forward" on some machines and "backwards" on others, and some can interfere with what your compiler does.) Not all the rules are strictly performance-based: some will enhance your understanding of what the program does.

We'll discuss these rules in four categories:

  • Expressions
  • Loops
  • Procedures
  • Others

The names don't always agree with Bentley's originals; I've tried to use Fowler's approach of using an active verb phrase ("do this").

The approach will be to show a motivating example "before and after", like this:

Name of refactoring
Before

Sample code before refactoring

After

Same code after refactoring

The part that's going to change (in the Before section) is shown in italics; the part that has changed (in the After section) is shown in bold. Each example is followed by any discussion.

Expressions

Short-circuit a Monotone Function
Before
if (a & b & c)
   ...
After
if (a && b && c) 
   ...

Some functions can only move in one direction: "and" conditions can never go from false to true, and positive integers added together only get bigger. This table suggests a few possibilities:

Operator     Type              Result can only be...
& ("and")    boolean           more false
| ("or")     boolean           more true
+            positive numbers  bigger
+            String            longer
-            positive numbers  smaller
add()        collection        bigger
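As a sketch of what the table enables (my own example, with illustrative names, not Bentley's): a running sum of positive values is monotone, so a "does the total exceed the limit?" search can stop as soon as the answer becomes known.

```java
// A running sum of positive values can only grow, so we can
// short-circuit the scan the moment the limit is exceeded.
public class MonotoneSum {
    static boolean exceeds(int[] positiveValues, int limit) {
        int total = 0;
        for (int i = 0; i < positiveValues.length; i++) {
            total += positiveValues[i];
            if (total > limit) {
                return true;   // short-circuit: the sum can only get bigger
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(exceeds(new int[] {5, 10, 20}, 12)); // true
        System.out.println(exceeds(new int[] {1, 2, 3}, 12));   // false
    }
}
```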

Initialize Earlier
Before
void f() {
    double value = Math.sin(0.25);
    // rest of original
}
After
final static double defaultValue = 
    Math.sin(0.25);
  :
void f() {
    double value = defaultValue;
    // rest of original
}

In the example, we've moved initialization from when the routine is called to class loading time (and possibly compile-time). This table shows initialization time from latest to earliest:

Initialization time
  • In loop
  • Before loop
  • In function (local variable creation)
  • In constructor (instance creation)
  • As static (class load-time creation)
  • Compile-time

Notice the use of final static in the example - this says to create the value only once, when the class is loaded, and never permit it to be changed after that.


Pre-Compute Values
("Lookup Table")
Before
int f(int i) {
    if (i < 10) {return i * i - i;}
    return 0;
}
After
final static int[] values = 
  {0*0-0, 1*1-1, 2*2-2, ..., 9*9-9};

int f(int i) {
    if (i < 10) {return values[i];}
    return 0;
}

This is an example of "Initialize Earlier", implemented in a specific way.


Replace Boolean with "If"
Before
boolean condition = 
  age > 21 && income < 100000;

f();

if (condition) {
    // do something
} else {
    // do something else
}
After
if (age > 21 && income < 100000) {
    f();
    // do something
} else {
    f();
    // do something else
}

This code assumes that nothing else uses the computed boolean value (so we can eliminate the variable). It may seem odd to end up with two copies of the intervening body ("f()" in the example), but once this code is in each branch it can be customized for the situation at hand.


Pair Related Computations
("Two can live as cheaply as one")
Before
double x = something;

f1 = Math.sin(x);
f2 = Math.cos(x);
After
double x = something;
Point p = sincos(x);
f1 = p.x;
f2 = p.y;

There are times when you're computing related functions on the same set of inputs. If you only knew that both results were needed, you could compute both at the same time more cheaply than one after the other. This may be because you're able to apply other techniques (such as Fuse Loops) to the paired code, or it may be because of some mathematical relationship between them (sin() and cos() share some intermediate computations).

The function can be fairly complicated. In a previous paper (Refactoring: An Example) we described a situation where a template would repeatedly scan a string, replacing one pattern per pass; if the template knew all the patterns and their replacement strings up front, it could do the job in one pass by working on whichever pattern it found next.


Remove Common Sub-Expression
Before
int start = 10 + f.length();
int endPos = 15 + f.length();
After
int length = f.length();
int start = 10 + length;
int endPos = 15 + length;

The expression being moved must not have side-effects (because it goes from being evaluated twice to being evaluated only once), and it must not be able to change its value between the two uses (even in the presence of threads).


Apply Algebraic Identity
Before
// Assume a>0
assert (a > Math.sqrt(b));
  :
return a*a + 3*a + 2;
After
// Assume a>0
assert (a*a > b);
  :
return (a+1) * (a+2);

This rule lets you replace computations with equivalent but cheaper ones.

Loops

Combine Tests ("Sentinels")
Before
final static int SIZE=200;
int[] a = new int[SIZE];
  :

int pos = 0;
while (pos < SIZE && a[pos] != searchValue) {
    pos++;
}
return pos;
After
final static int SIZE=200;
int[] a = new int[SIZE + 1];
  :
a[SIZE] = searchValue;
int pos = 0;
while (a[pos] != searchValue) {
    pos++;
}
return pos;

By adding a little extra storage and putting a sentinel value there, we're able to combine the tests "pos < SIZE" and "a[pos] != searchValue" into one.


Hoist Loop-Invariant Code
Before
void f(double d) {


    for (int i=0; i<100; i++) {
        plot(i, i*Math.sin(d));
    }
}
After
void f(double d) {
    double dsin=Math.sin(d);

    for (int i=0; i<100; i++) {
        plot(i, i*dsin);
    }
}

This is also a special case of Initialize Earlier. ("Hoist" brings in the idea of lifting the code up.) Because the computation is inside a loop, its cost may be paid multiple times. Be careful that other threads can't change the value (lest it not be constant in the loop). The expression must have no side effects (as it's now evaluated only once).


Unroll Loop
Before
for (int i=0; i<3; i++) {
    sum += q*i - i*7;
}
After
int i = 0;
sum += q*i - i*7;
i++;
sum += q*i - i*7;
i++;
sum += q*i - i*7;

This is one of those "touchy" transformations. You may find yourself running it "backwards" more often than forwards (i.e., realizing that a repetitive sequence of statements can be put in a loop). The benefit is that it reduces or eliminates the overhead of the loop; this can usually only pay off when the loop overhead is high relative to the computation being done. You don't have to completely unroll the loop; you could run it half as many times by having two copies of the body (and making any adjustments for odd numbers). Make sure the subsequent copies of the body take into account the updated value of the loop index.
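The partial-unrolling variant mentioned above can be sketched like this (the names are illustrative, not from the book): two copies of the body per pass, with a cleanup step when the element count is odd.

```java
// Partial unrolling: process two elements per pass, then clean up
// any leftover element when the length is odd.
public class PartialUnroll {
    static int sum(int[] a) {
        int sum = 0;
        int i = 0;
        int limit = a.length - 1;
        while (i < limit) {
            sum += a[i];        // first copy of the body
            sum += a[i + 1];    // second copy, with the index adjusted
            i += 2;
        }
        if (i < a.length) {     // cleanup for an odd number of elements
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4, 5})); // 15
    }
}
```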


Fuse Loops
Before
int i;
for (i=0; i<k; i++) {
    sum += f(i);
}

for (i=0; i<k; i++) {
    prod *= g(i);
}
After
int i;
for (i=0; i<k; i++) {
    sum += f(i);
    prod *= g(i);
}

The two loops must run the same number of times (or be made to). The two bodies should be independent, so merging them won't let them interfere with each other.


Reduce Operator Strength
Before
int i;
for (i=0; i<100; i++) {
    plot(i, 4*i);
}
After
int i, i4;
for (i=0, i4=0; i<100; i++, i4+=4) {
    plot(i, i4);
}

This is the compiler optimization "Reduction in Strength". The "reduction" refers to the fact that we've converted the multiplication to an addition. This transformation is related to "Apply Algebraic Identity".

Procedures

The "Procedure" rules J. Bentley describes are more complicated than I can properly explain, so I'll summarize them and refer you to Writing Efficient Programs for the details.

Inline Method

Bentley's first procedure rule is equivalent to M. Fowler's refactoring "Inline Method" (and "Extract Method"). See Refactoring for a full discussion.

Use Coroutines for Multi-Pass Algorithms

A coroutine is a variant of the idea of a subroutine where the two routines call each other in a peer relationship rather than master-slave. (See Knuth, vol. 1.) Multiple calls of the coroutine lead to it resuming where it left off, rather than recursively starting at the top again. Java uses subroutines, not coroutines, so you're not going to easily create code that uses that approach. (A Java program could probably use state machines of some sort to simulate them but it would not be an "accidental" implementation.)

Bentley points out a related issue, though: some designs read and write complete files in a multi-pass way, but an implementation could be designed to have the various passes communicate via in-memory buffers, a little at a time, rather than through the file system. The classes java.io.PipedInputStream and java.io.PipedOutputStream can help you implement this in a multi-threaded way. (This is related to the Unix "|" pipe-and-filter mechanism.)
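Here's a minimal sketch of that buffer-based approach using the piped stream classes; the roundTrip helper and its structure are my own illustration. A producer thread writes into the pipe while the caller reads the data back a little at a time, all in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// One "pass" writes into the pipe from a separate thread while a
// second "pass" reads it back, without touching the file system.
public class PipeDemo {
    static String roundTrip(final String text) {
        try {
            final PipedOutputStream out = new PipedOutputStream();
            PipedInputStream in = new PipedInputStream(out);

            Thread producer = new Thread(new Runnable() {
                public void run() {
                    try {
                        out.write(text.getBytes());
                        out.close();   // signals end-of-stream to the reader
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            });
            producer.start();

            ByteArrayOutputStream result = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {   // consume until end-of-stream
                result.write(b);
            }
            producer.join();
            return result.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello")); // hello
    }
}
```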

Introduce Helper Method for Small Cases

Some methods are better for large cases than small ones. (You often see this for recursive solutions.) The method can divide-and-conquer until it hits the small case, then call a special method tuned for that. (Bentley uses a good example of this in discussing sorting: a quicksort may be the chosen generic sorting method, but it has a lot of overhead in sorting only a few elements.)
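A hedged sketch of the sorting case: a quicksort that hands any range below an assumed cutoff to an insertion-sort helper. The cutoff value and method names are illustrative; in practice you'd measure to pick the threshold.

```java
import java.util.Arrays;

// Quicksort divides and conquers; below the cutoff it delegates to
// an insertion sort, which has much less overhead on tiny ranges.
public class HybridSort {
    static final int CUTOFF = 8;   // assumed threshold; tune by measuring

    static void sort(int[] a, int lo, int hi) {   // sorts a[lo..hi]
        if (hi - lo < CUTOFF) {
            insertionSort(a, lo, hi);   // helper tuned for small cases
            return;
        }
        int pivot = a[(lo + hi) / 2];
        int i = lo, j = hi;
        while (i <= j) {                // standard partition step
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        }
        if (lo < j) sort(a, lo, j);
        if (i < hi) sort(a, i, hi);
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int v = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > v) {   // shift larger elements right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 9, 1, 7, 2, 8, 6, 4, 0};
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
    }
}
```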

Remove Tail-Recursion

If the last thing a routine does is recursively call itself, you can instead adjust the arguments and branch back to the top, saving the overhead of stacking contexts and making another procedure call.
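For example (a sketch with my own names, not from the book): a tail-recursive greatest-common-divisor becomes a loop that adjusts its arguments and branches back to the top.

```java
public class TailCall {
    // Tail-recursive version: the recursive call is the last thing done.
    static int gcdRecursive(int a, int b) {
        if (b == 0) return a;
        return gcdRecursive(b, a % b);
    }

    // Same logic with the tail call replaced by adjusting the
    // arguments and looping back to the top.
    static int gcdIterative(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;   // the "arguments" for the next round
            b = t;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(gcdIterative(48, 18)); // 6
    }
}
```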

Introduce an Explicit Stack

Recursion works by having a hidden call stack. A routine can simulate this by maintaining its own stack. When a recursive call would be made, push the current state onto the stack, adjust the arguments, and branch to the top. This is a pretty dramatic change in a procedure, and can definitely reduce the clarity of the routine. (You may be able to use it backwards, converting a stack-based routine into a recursive routine - the recursive routine will usually be much shorter and much easier to understand.)
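A small sketch of the idea (the Node class and method names are mine): summing a binary tree recursively versus with an explicit stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ExplicitStack {
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    // Recursive version: the hidden call stack remembers where we are.
    static int sumRecursive(Node n) {
        if (n == null) return 0;
        return n.value + sumRecursive(n.left) + sumRecursive(n.right);
    }

    // Explicit-stack version: push the work we'd otherwise recurse
    // into, and loop until the stack is empty.
    static int sumWithStack(Node root) {
        int sum = 0;
        Deque<Node> stack = new ArrayDeque<Node>();
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            sum += n.value;
            if (n.left != null) stack.push(n.left);
            if (n.right != null) stack.push(n.right);
        }
        return sum;
    }

    public static void main(String[] args) {
        Node tree = new Node(1, new Node(2, null, null), new Node(3, null, null));
        System.out.println(sumWithStack(tree)); // 6
    }
}
```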

Other Techniques

Handle Common Cases First
Before
if ((i&0x07)==0) {
    // divisible by 8
} else if ((i&0x01)==0) {
    // even but not div. by 8
} else {
    // odd
}
After
if ((i&0x01)==1) {
    // odd
} else if ((i&0x07)!=0) {
    // even but not div. by 8
} else {
    // divisible by 8
}

Suppose the example code is in a loop where i runs from 0 to a large value. In the original, odd numbers are tested last even though they make up half the numbers. In the revised version, the tests are ordered so that the most common values are tested first.

Be very careful when twisting around the if-clauses and revising the conditions.


Exploit Hardware Parallelism
Before
public class Status {
    boolean hasBand;
    boolean inLine;
    boolean hasNews;
}


compatible = 
  s1.hasBand == s2.hasBand &&
  s1.inLine == s2.inLine &&
  s1.hasNews == s2.hasNews;
After
public class Status {
    public final static int HAS_BAND=1;
    public final static int IN_LINE=2;
    public final static int HAS_NEWS=4;

    public int status;
}

compatible = 
  s1.status == s2.status;

In this example, we've changed a series of flags to use the bits in a word, so we can use word-level operations (including equality, "and", "or", etc.). The principle goes further, though: you might be able to use multiple threads across CPUs; you may be able to arrange things so reads and writes of values can be overlapped; you may be able to overlap processing and I/O; and so on.

Trade Off Time versus Space

Bentley has a series of rules that discuss how to spend space to save time, and vice versa. His rules cover the gamut - from pre-computing all results to introducing interpreters.

Summary

We've discussed a number of refactorings, based on Bentley's book Writing Efficient Programs. He discussed them in the context of performance tuning, but they are applicable outside that domain. We've classified them like this:

  • Expressions
    • Short-Circuit a Monotone Function
    • Initialize Earlier
    • Pre-Compute Values
    • Replace Boolean with "If"
    • Pair Related Computations
    • Remove Common Sub-Expression
    • Apply Algebraic Identity
  • Loops
    • Combine Tests
    • Hoist Loop-Invariant Code
    • Unroll Loop
    • Fuse Loops
    • Reduce Operator Strength
  • Procedures
    • Inline Method
    • Use Coroutines for Multi-Pass Algorithms
    • Introduce Helper Method for Small Cases
    • Remove Tail-Recursion
    • Introduce Explicit Stack
  • Other
    • Handle Common Cases First
    • Exploit Hardware Parallelism
    • Trade Off Time Versus Space

I encourage you to use these refactorings when they improve your code. I also urge you to pick up and read a copy of Jon Bentley's Writing Efficient Programs, both for a better explanation of these rules, and a glimpse of how performance tuning can resemble refactoring for design.

Resources

[Written 2-11-2000; added Auer/Beck and Smith references, 2-19-2000; corrected "more false" problem caught by Kieran Barry 12-22-2001.]

Copyright 1994-2010, William C. Wake - William.Wake@acm.org