XPlorations
BackgroundSuppose we're testing a library search system. The library has a database of book descriptions, and we give it queries. Each query produces a result set of the matching items.When we consider this system, we might think of the database as a collection of records, and think of the query as a string. The query string represents a boolean expression, so we want to test its logical nature. Finally, the query string conforms to a grammar, so it might be well- or ill-formed. Test StrategyTo get ideas on testing, we might go to a site like http://www.testing.com, and see Brian Marick's suggestions. Summarizing:
How will we define the tests? These tests work well with a spreadsheet. We can have one table for the database contents, and another for queries.
The semantics will be: load each line of the data table (after the first two), then run each query. Apply the test (with its optional value) to the result. The tests I have in mind are:
QueriesQueries have a simple structure:query = termThe precedence is left to right, so "fish or beef and steak" is interpreted "((fish or beef) and steak" (though the parentheses are not allowed). Matching should not be case sensitive; the operator names are not case sensitive either. Blanks are ignored. The NOT operator may need a little explanation. In classical logic, not is a unary operator. In the search system, "a NOT b" is interpreted "a AND NOT b" (but the latter is not a legal expression in our query language). The NOT operator is used to eliminate unwanted terms. Thus, "extreme AND programming NOT sports" would match "Extreme Programming Explored" but not "Is Extreme Programming an Extreme Sport?" Test Strategy RevisitedWe'll start with the collection rules, and make four separate test databases: empty, one record, many records, and duplicates. For each of these, we'll try a variety of queries.We'll test with a blank and non-blank queries, and we'll try grammatical and non-grammatical queries. We'll apply the logical operator testing rules (which we haven't covered yet). For the integer values, there are really only two places that use integers: the number of records in the collection (which we covered), and the size of the result. Results can have anywhere from zero to the number of items in the database; only 0 and "max" apply from the testing rules. But we'll certainly hit a result set of size "1" as well as some intermediate values. The Simplest Test: An Empty CollectionWhat can we learn from an empty collection? It seems like that isn't even worth trying--who would have a library with only one book? So: the data set is empty. (We may not even create the spreadsheet page for it.) Let's cover the test case for a blank or non-blank query string:
Install this test and run it. That will show you the first reason to start with the simplest test: setting up the environment and running the test is non-trivial. You'd rather start with a small test set. In our example, there's a second reason it pays to start with an empty collection: we decided that a blank query returns a result with 0 items. We could have decided that it was an error. Murphy's Law tells you that if you didn't say, the programmers would have done the opposite of what you ended up wanting. We can also test non-grammatical queries with an empty collection. It seems reasonable that error checking would happen before the query actually does its work, so it won't matter if the collection is empty or not. (And a quick chat with the programmers confirms our intuition.)
We can certainly find other non-grammatical queries to try. Finally, we might toss in one legal but complicated and tricky one:
One-Element CollectionWe need a table with one record:
(What else would I use?) You might start by repeating the earlier queries; they'll get the same answer. We can check that a record is found as expected for a simple query:
Brian Marick (www.testing.com) has suggested some good rules for testing booleans:
In the comments, TF/FT/etc. indicates which term is true and which is false.
Note that we've checked a couple other things by having variety in the test cases: we've checked things in all three columns, and we've mixed case in both terms and operators. This can be beneficial or not. On the good side, it cuts down the number of test cases. On the bad side, it confounds two things. If "dive OR wake" fails, is it because matching is case-sensitive, or because there's a problem in "OR"? You'll have to come to your own balance on this. If you find yourself unable to figure out what the problem is, it can be a sign to simplify your tests. This can be partially mitigated by keeping early tests simple, focused on one aspect, and let later tests build complexity. We can explore a few other aspects of booleans. For one thing, there's a set of identity rules: "A OR A = A"; "A AND A = A"; "A NOT A = false". This shows up in our queries like this:
We might want to verify that the term order doesn't matter. (We want "Programming AND Extreme" to have the same result as "Extreme AND Programming".)
Finally, we'd like to verify that operator precedence is right. Remember that "a OR b AND c" should be interpreted "(a OR b) AND c" rather than "a OR (b AND c)". (Some grammars for logical operators, and most programming languages, treat AND as higher precedence than OR; we'd like to make sure that doesn't bubble out to the query language.) A few minutes thought (or a half hour of logical derivation:) will convince you that if "(a OR b) AND c" is true, then "a OR (b AND c)" will also be true. To show that the interpretation is wrong, we'd have to find a case where the second one matches but the first one doesn't. This case exists when A matches, but B and C do not.
We're definitely hitting some subtleties now. What happens if we don't test this case? Are we doomed? What about the many other ways queries could be wrong? You always have to draw a line somewhere. It took me about an hour to convince myself that the precedence problem could exist, to come up with an example, and to prove that the example could be logically derived. If I hadn't done this, there's a chance this bug would surface. The biggest danger is that the programmers would misunderstand the specification, and assume "normal" precedence. But as customer, I'm in the room with them talking about it. If we had just missed this subtlety, note that this is an obscure case in the grand scheme of things: I've tested one- and two-term queries more thoroughly, and those are by far the most common ones users will form. That doesn't completely excuse the potential problem. If I were worried enough, I could get someone to write a small program that generated "all" queries with 3 or 4 terms. ("All" meaning all combinations of operators and terms present or not, not "all" possible terms.) "a op b op c op d": Each term might be present or not (2^4=16), and each operator could have one of 3 values (3*3*3), so there are 16*27=432 combinations I might look at. This is an uncomfortable number, but not unbearable. In normal use, I'd probably just pick a handful of problematic-looking ones, and trust the result. Collection with Many ElementsWe'll define a bigger collection, and repeat our queries. (They'll have slightly different result counts, but none will go from 0 to a positive number or vice versa.)
We'll also add a new query to cover the case "all records are in the result set." We had this for the one-record case, but we'd feel better seeing it on a bigger set as well.
Collection with DuplicatesVerify that duplicated entries are treated the same. We decide we won't "de-dup" but rather just return what we find.
ConclusionWe've created an initial set of tests for a simple query system that might show up in library software. Several things stand out:
Sidebar: Running the TestsA spreadsheet provides the test creator with a comfortable, well-understood interface, and it can write easily processed files.For the example we used, the tester and the programmer have to agree on the columns and their meanings, and the file format. When the tester is done writing a file, they can save it into a tab-delimited or bar-delimited file (whatever they agree on with the programmers). The programmers can write a simple program that will read the file and do the appropriate function. The programmers might generate code, or just have the program read and execute functions directly. I think of the GUI (graphical user interface) as being lopped off or set aside, and the spreadsheet as providing an alternative interface. It's often best to focus on feature testing first, without worrying about the GUI until the features are right. A spreadsheet is nice for this approach: the tester still gets a GUI, just not one customized to the problem at hand. Resources and Related Articles
[May 25, 2001.] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Copyright 1994-2009, William C. Wake - William.Wake@acm.org |