Tuesday, September 30, 2008

Is coverage really enough?

What is coverage?
In this week's assignment, our professor assigned a plethora of reading material relating to coverage. Yet after reading through all the articles, none of them really explained what coverage is in a way that's simple and concise enough for an undergraduate to grasp. So I worked out my own definition: coverage is the percentage of code that is executed when the program runs. For example, if a test run executes 80 of a program's 100 lines, line coverage is 80%. That may seem like an overly simple definition, but it's actually quite accurate, and the reason why will become obvious soon.

How to test coverage?
For this exercise, we were given a Java project that implements a stack using an ArrayList, along with some JUnit test cases. Using Ant and Emma to generate a nicely-formatted HTML report, we can see not only the coverage percentage but also which code is being executed and which is not. You can see a sample Emma report here: sample report

How to improve coverage?
Our first task was to write additional code to raise the stack project's coverage from its initial 80% to 100%. Going back to our earlier definition, coverage measures what percentage of the code is executed at runtime. That means every method needs to be called and every line executed, including each branch of every condition. So what's the easiest way to increase coverage? Write code that runs whatever wasn't executed! For this exercise, the most sensible way to do that is to write additional JUnit tests. By adding tests that call the unused methods and trigger the exception paths, the coverage was easily brought up to 100%.

Screenshot of some of the JUnit test cases used
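In case the screenshot is hard to read, here is a minimal, self-contained sketch of the kind of test that exercises an exception path. The Stack class below is a stand-in I wrote for illustration; its method names and the exception it throws are assumptions, not the assignment's actual API.

```java
import java.util.ArrayList;

// Minimal stand-in for the assignment's ArrayList-backed stack
// (the real project's method names and exceptions may differ).
public class StackCoverageDemo {
    static class Stack {
        private final ArrayList<Object> elements = new ArrayList<Object>();

        public void push(Object obj) {
            this.elements.add(obj);
        }

        public Object pop() {
            if (this.elements.isEmpty()) {
                throw new IllegalStateException("Stack is empty");
            }
            return this.elements.remove(this.elements.size() - 1);
        }
    }

    // In the spirit of the added JUnit cases: exercise the exception
    // path so the previously-unexecuted branch counts as covered.
    public static boolean popOnEmptyThrows() {
        Stack stack = new Stack();
        try {
            stack.pop();
            return false; // should never get here
        } catch (IllegalStateException expected) {
            return true;  // the uncovered branch has now run
        }
    }

    public static void main(String[] args) {
        if (!popOnEmptyThrows()) {
            throw new AssertionError("empty-pop branch not exercised");
        }
        System.out.println("exception path covered");
    }
}
```

A real JUnit test would wrap the same try/catch in a @Test method; the point is simply that the failing path has to be deliberately provoked before a coverage tool will count it.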

How does coverage help?
So how does coverage help a developer? It shows which parts of the program are untouched, and therefore untested. Also, if the code is covered and there were no errors at runtime, then nothing went wrong with the execution for that particular trial. So does that mean that if coverage is 100% and there were no runtime problems, there is nothing wrong with the code? This is the most misleading part, and as we'll see shortly, code coverage is not the same thing as code robustness.

Why should we not rely solely on coverage?
In the previous scenario, we raised the coverage to 100% and the code ran without a hitch. It's tempting to conclude that because the code ran fine with 100% coverage, there is nothing wrong with it. To disprove this, we'll now purposely introduce a bug that doesn't break the 100% coverage but will blatantly cause problems. The easiest way to do this is to impose a cap on the number of elements the stack can hold. We'll do this by modifying the push method in the Stack class from:
public void push(Object obj) {
    this.elements.add(obj);
}
to:
public void push(Object obj) {
    if (this.elements.size() <= 4) {
        this.elements.add(obj);
    }
}
As you can see, this makes it so that a maximum of only 5 items can be added to the stack. Because the JUnit tests add at most 3 elements to the stack, they still pass, and Emma still shows 100% coverage. However, if we write a new JUnit test that pushes more than 5 items and then pops them off, the popped elements will obviously be wrong. This goes to show that high code coverage does not mean the code is free of problems.
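To make the failure concrete, here is a self-contained sketch of that buggy push together with a driver that pushes six items. The class is a stand-in for the assignment's Stack, not the actual project code.

```java
import java.util.ArrayList;

// Demonstrates how pushing past the hidden 5-element cap exposes
// the bug that 100% coverage missed (stand-in for the real Stack).
public class CapBugDemo {
    private final ArrayList<Object> elements = new ArrayList<Object>();

    // The deliberately buggy push: silently drops the sixth element.
    public void push(Object obj) {
        if (this.elements.size() <= 4) {
            this.elements.add(obj);
        }
    }

    public Object pop() {
        return this.elements.remove(this.elements.size() - 1);
    }

    public int size() {
        return this.elements.size();
    }

    public static void main(String[] args) {
        CapBugDemo stack = new CapBugDemo();
        for (int i = 0; i < 6; i++) {
            stack.push(Integer.valueOf(i));
        }
        // A correct stack would now hold 6 elements and pop 5 first;
        // the buggy one holds only 5 and pops 4.
        System.out.println("size after six pushes: " + stack.size()); // prints 5
        System.out.println("top element: " + stack.pop());            // prints 4
    }
}
```

A JUnit assertion like assertEquals(6, stack.size()) after six pushes would fail here, which is exactly the kind of out-of-boundary check the original test suite lacked.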

Why is it not good to solely rely on coverage?
Going back to our earlier definition once more, coverage ONLY tells us which code is executed at runtime and which is not. Just because a line of code executes successfully doesn't mean it's robust and can handle any situation. That job is left to the developer, who must write good test cases that exercise not only the expected parameters but also the out-of-boundary situations. Coverage is one of the tools developers can use to debug their code, but it should never be the sole measure of code quality. Where coverage comes in handy is in showing the developer which pieces of code have not been tested yet, but it is in no way a replacement for writing good, robust test cases.

You can get the source code for both versions here:
Stack project without bug
Stack project with bug

Wednesday, September 24, 2008

Automated vs. Manual QA

Bugs, bugs, bugs, every programmer's nightmare. A program may take a month to write, but several months to debug. Even if all the bugs are caught, more are bound to pop up in the most unexpected of places. Fortunately for us, just as in real life, there are tools that we can use to kill those pesky critters once and for all.

Exercise
In this exercise, we used three quality assurance tools: Checkstyle, FindBugs, and PMD. Each one has a different focus, which I'll describe in more detail later in this post. Our professor provided a test project for us to try these tools on, and we used the Ant build system to run each QA tool against it. We were then supposed to fix up the code as much as possible and upload the final build, which I've provided here:

http://finalranma.imap.cc/stack-danielftian-6.0.924.zip

Checkstyle
Checkstyle mainly checks the formatting of the source code to ensure that it follows code formatting guidelines. Of course, the guidelines can vary from organization to organization, but the important thing is for everybody to use the same one, so that people don't have to spend hours deciphering each other's code and can spend that time fixing it instead. In the provided project, there were a few formatting errors, such as misplaced curly braces, that were easily fixed once they were pointed out. Checkstyle generates an HTML report that lists every place a formatting error occurs. It will even find problems with Javadoc comments, such as missing parameters and incorrect sentence formatting. However, it obviously won't catch things like ambiguous method and variable names, and those tasks are still better left to real people.

FindBugs
FindBugs specializes in finding bugs that normally won't be flagged at compile time but might become issues at runtime. The one example that came up in the test project was that Integer one = new Integer(1); is much less efficient at runtime than Integer one = Integer.valueOf(1);. FindBugs generates an HTML report with detailed descriptions of each problem and its solution in plain English that even beginner programmers can understand. This is a great QA tool that can find problems veteran programmers know about but less-experienced ones would be unaware of.
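The difference is easy to demonstrate: Integer.valueOf draws small values from a shared cache (the language guarantees this for values from -128 to 127), while the constructor always allocates a new object. A quick sketch:

```java
// Why FindBugs flags new Integer(1): Integer.valueOf reuses cached
// instances for small values, while the constructor allocates a
// fresh object on every call.
public class ValueOfDemo {
    public static void main(String[] args) {
        Integer a = Integer.valueOf(1);
        Integer b = Integer.valueOf(1);
        Integer c = new Integer(1);
        Integer d = new Integer(1);

        // Cached instances are the very same object...
        System.out.println(a == b);      // true
        // ...while constructor calls never are.
        System.out.println(c == d);      // false
        // Both represent the same value, of course.
        System.out.println(a.equals(c)); // true
    }
}
```

So valueOf avoids an allocation (and the garbage it creates) whenever the value falls in the cached range, which is why the tool recommends it.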

PMD
PMD, like FindBugs, focuses on finding bugs that show up at runtime, but I find that it uses a more thorough, stricter ruleset, which can be both a good and a bad thing. For example, when I ran PMD on the test project, it reported an empty catch block and even detected a method that created a variable and returned it immediately, when it could have just returned the value directly. It also suggested that certain variables could be made final, since they are only assigned in their declaration or in the constructor. However, one particular error confused me. The description said "Avoid using implementation types like 'ArrayList'; use the interface instead" for the code ArrayList list = new ArrayList();. Because of this one lingering error (I fixed all the Checkstyle and FindBugs errors), I was unable to use Ant to verify the build. Just like FindBugs, PMD generates an HTML report with detailed descriptions of the problems it found, but unfortunately the description pages all link to PMD's website, which is a problem if the computer doesn't have internet access. Also, the description pages show examples of what the wrong code looks like but none of what the correct code should be, which became very troublesome in this case since I couldn't find a solution elsewhere.
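In hindsight, the fix PMD seems to be asking for is to declare the variable by its interface type and keep the concrete class only on the right-hand side. A sketch (the variable names here are mine, not the project's):

```java
import java.util.ArrayList;
import java.util.List;

// The change PMD's "avoid implementation types" rule appears to
// want: depend on the List interface, not on ArrayList itself.
public class InterfaceTypeDemo {
    public static void main(String[] args) {
        // Flagged by PMD: implementation type on the left.
        // ArrayList<String> flagged = new ArrayList<String>();

        // Preferred: callers depend only on the List interface, so
        // the backing implementation can be swapped out freely.
        List<String> tools = new ArrayList<String>();
        tools.add("Checkstyle");
        tools.add("FindBugs");
        tools.add("PMD");
        System.out.println(tools.size()); // prints 3
    }
}
```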

CodeRuler and QA tools
Our professor also asked us to run the QA tools on our earlier CodeRuler assignment to see how well our code held up to conventions. Checkstyle reported a bunch of Javadoc errors, along with several lines longer than 100 characters. All of the errors Checkstyle caught had also been flagged in a peer review of our code by one of our classmates. FindBugs reported no errors, but PMD turned up several, illustrating the differences between their philosophies. PMD reported that one of our methods was empty, and also told us to avoid if (x != y) in order to keep the code consistent with the "if same, else different" convention. That problem was understandably not caught in the peer review, because it's more of a stylistic issue than a potential source of bugs.

Automated vs. Manual QA
So in the end, which is better, automated or manual QA? I'd have to say both. Manual QA can catch things that only a human will notice, such as badly-named variables and methods or poorly-worded comments. Automated QA, on the other hand, encodes the knowledge of veteran programmers and catches bad practices that would otherwise go unnoticed, but even so, a human still needs to review each reported error and decide whether a fix is necessary. By combining the two, a programmer can write code that not only works, but is also well-formatted and more efficient than it would have been without any QA tools.

Monday, September 8, 2008

Lessons learned from Code Ruler

Our instructor provided us with an assignment to implement an AI script for CodeRuler, a game that was created by IBM. You can find the official site here:

CodeRuler: http://www.alphaworks.ibm.com/tech/coderuler

The Premise
You are a ruler who has been given a castle, 10 peasants, and 10 knights. Peasants can capture land, earning you points and faster reinforcements from your castle, but die to a single blow from a knight. Knights can kill opposing peasants and knights and capture castles, and castles themselves provide a new peasant or knight every few seconds, faster if you own more land. Points are gathered by killing the enemy and capturing their castles, as well as by claiming land and having surviving units at the end of the game. Of course, the player with the highest number of points wins. The twist, however, is that the player has no direct control over the units and must write an AI script in Java to control them.

The Implementation
For this assignment, we were assigned partners to work with, and I had the pleasure of working with Tyler Wolff. The strategy we decided to implement was to have the peasants capture land in a 'fill' fashion: each peasant moves up and down until it encounters a square that's already captured, then moves left or right one block and repeats the process. The implementation for the peasants is actually quite simple, since this is the only thing they can do.

CodeRuler showing our 'fill' implementation for peasants

The meat of the coding, at least in my opinion, is writing an effective AI for the knights. Because the knights have the biggest impact on the outcome of the game, we decided to focus most of our efforts on an effective strategy for them. What we ultimately decided on was to eliminate the biggest threat first - the enemy knights - and take care of the other units only when there are no knights left to oppose ours. To that end, we try to 'gang up' on a single knight by picking the one closest to any of our knights and sending all of our knights to kill that single one. Unlike peasants, knights have health, and it takes several attacks to bring one down, so focusing our attacks should prove more effective than selecting targets at random.

However, there are some other factors to consider. Since only one unit can occupy any given space at a time, a situation can occur where all the knights head toward one enemy unit but get stuck behind each other, forming a line that's easily defeated. This happens because when a knight tries to move to a square already occupied by a friendly knight, it can't, and therefore stops in its tracks, essentially wasting a turn. The fix was to make the knight turn 45 degrees to the left or right and head in that direction instead, moving around the friendly knight rather than bumping into it repeatedly. This worked quite well and improved our results drastically. We even took it one step further and added a check for an enemy in the new direction, attacking it if one is there. However, the code only performs two checks, and if both spaces are occupied by friendly units, the knight will not move or attack.
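Here's a rough sketch of that sidestep logic. The real CodeRuler API isn't reproduced: directions are modeled as ints 0-7 in 45-degree steps, blocked[] stands in for "a friendly knight occupies the square in that direction", and trying left before right is an assumption on my part.

```java
// Sketch of the knights' sidestep behavior described above
// (not the actual CodeRuler API; directions 0-7 are compass
// points at 45-degree intervals).
public class SidestepDemo {
    // Returns the direction to actually move: the desired one if
    // it's free, otherwise 45 degrees to one side, otherwise -1
    // (the knight waits, as ours did when both checks failed).
    public static int chooseDirection(int desired, boolean[] blocked) {
        if (!blocked[desired]) {
            return desired;
        }
        int left = (desired + 7) % 8;  // 45 degrees counter-clockwise
        int right = (desired + 1) % 8; // 45 degrees clockwise
        if (!blocked[left]) {
            return left;
        }
        if (!blocked[right]) {
            return right;
        }
        return -1; // boxed in: skip this turn
    }

    public static void main(String[] args) {
        boolean[] blocked = new boolean[8];
        blocked[0] = true; // a friendly knight directly ahead
        System.out.println(chooseDirection(0, blocked)); // prints 7
        blocked[7] = true; // that side is taken too
        System.out.println(chooseDirection(0, blocked)); // prints 1
    }
}
```

The real version also peeked at the sidestep square for an enemy and attacked instead of moving, but the routing decision is the part sketched here.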

If all the enemy knights are dead or if a castle happens to be closer than the closest enemy knight, the knights will attempt to capture the castle. The process of capturing a castle is the same as capturing land, except only a knight can do it. After all the enemy knights and castles are killed and captured, the knights will then go after the closest peasant. Our code is written so that if there is already a knight chasing after a peasant, another knight will not join the chase and will instead pick a random direction to move in until it finds a peasant that is not being pursued by another knight. This strategy works best if the knights are spread out, as each knight can chase down a different peasant, but does not work so well if they are bunched together because they will all see one peasant as the closest one, but only one knight will chase after it while the rest move in random directions. However, at this point the enemy has no castles and no knights, so the player has essentially won and killing the peasants slowly or quickly has little bearing on the final outcome.

Playtesting
So all this is good in theory, but how does it hold up against other AIs? Although we couldn't test our AI against the other groups that also worked on this assignment, we did manage to test it with the built-in AIs and the results were impressive:

Our AI vs. Migrant Ruler
Trial 1: 864 vs. 0
Trial 2: 812 vs. 0
Trial 3: 639 vs. 0

Our AI vs. Gang Up Ruler
Trial 1: 712 vs. 84
Trial 2: 739 vs. 133
Trial 3: 809 vs. 60

Our AI vs. Smart Split Up Ruler
Trial 1: 788 vs. 70
Trial 2: 797 vs. 86
Trial 3: 801 vs. 58

Our strategy was very effective at eliminating the enemy knights, and our AI can easily win any one-on-one match. We even tried a 6-player free-for-all, where our AI can win up to 70% of the games. There is only one potential problem: CodeRuler imposes a time limit on how long the code may run, a mere 0.5 seconds per turn. Because our code contains a large number of loops and checks, it can hit this limit, especially when the knights are chasing peasants. When the limit is exceeded, CodeRuler stops the offending AI script while all the other AIs continue to run, leaving your units sitting idle while the enemy wipes out your forces with impunity. This would be a good way to cap the amount of logic people can program in, striking a balance between complexity and optimization, except for one problem: it is completely dependent on the speed of the computer CodeRuler is running on. Someone with a faster system can get away with more complex code than someone with a slower one. In my opinion, there needs to be a better way to impose this limit, because even within our small class of 20 students there is a big variance in computer speeds.

Lessons Learned
Through this experience, I feel I've learned a great deal about programming itself. Eclipse surprised me a number of times, especially when it pointed out which variables I hadn't used, and going from a text editor (albeit one with syntax coloring) to an IDE has been enlightening. CodeRuler is, I believe, a really great alternative to writing simple Java programs, because it presents a clear and fun goal that can be constantly improved upon. If the processor-speed issue is fixed, it has the potential to be the next "Hello world" program. Lastly, working with Tyler has been a very good experience. Usually when I'm put into groups, I try to do everything myself, because my partner almost always ends up being incompetent or lazy, and I find it much more reassuring to finish everything on my own rather than place my grades in hands I don't trust. Tyler, on the other hand, has shown me that when people get together and pull their weight, they can produce something that is more than the sum of its parts.

You can obtain a copy of our code and our Javadoc from here:

http://www2.hawaii.edu/~wolfft/tylerwolff-daniel.f.tian.zip