Engineering Blog (temporary title): November 2008

Monday, November 24, 2008

Implementing a stack in Wicket - a simple example that leads to frustration

Wicket good
Now that we've been working on the DueDates project for the past few weeks, the next step is to design a web interface for it. Unfortunately, many of us don't know the difference between a CSS tag and a kangaroo, so our professor gave us a very simple assignment to help us 'jump start' our way into using Wicket.

What is Wicket?
So what is Wicket? According to Wikipedia, it's a "lightweight component-based web application framework for the Java programming language." In simpler words, it's a tool that allows Java developers to write Java programs to display web pages. This is different from Java applets that run inside a web page; Wicket allows developers to take advantage of Java to generate web pages for them. This means that all the features of Java can be used, such as inheritance and encapsulation, and dynamic websites can be created much like writing a normal Java program.

Stacks of Wicket
Our assignment this time was to implement a stack in a web page using Wicket. Three buttons are available to push, pop, and clear the stack, and a text box was provided to push new strings onto the stack. This program seems deceptively simple, but it proved to be a challenge to do in Wicket, even with the professor's ample examples and bootstrap code for us to use. Coupled with the poorly-written Javadocs for Wicket, I found it hard to make progress while writing the code. In the end I did finish the assignment, but I don't feel like I can do it if asked to write it from scratch.

My Wicket stack project

One problem that I did run into and could not resolve was how to write a JUnit test case for Wicket. The JUnit assert classes that Wicket provides have very lacking Javadocs and I couldn't figure out how to use most of the methods in the classes. I attempted to write some tests based on what I could figure out, but in the end a coverage report showed that the tests were not working. A specific example would be the submit button; I have three submit buttons and I overrided each one's onSubmit method to do what it's supposed to do. However, I couldn't figure out how to 'press' a specific button in the unit test because the FormTester class (provided by Wicket) only has a submit() and submit(string) method, and the submit(string) method doesn't work when I give it the name of the button. The Javadoc comment for that method is also close to non-existent, only listing it as an alternative method to...something else that it fails to mention.

Although it's easy to see how powerful a tool Wicket can be, it's also disappointing that the Javadocs are poorly written and how it's significantly harder to turn a normally simple program such as a stack into a web application. I was expecting the transition to be easier than this but perhaps this is because I'm still unfamiliar with how to use all the classes. Hopefully by the time DueDates 2.0 rolls around, I'll be able to master the art of Wicket-fu.

You can find a copy of my Wicket stack implementation by clicking here.

Monday, November 17, 2008

DueDates 1.2, with more features anew

Scotty, I need more features!
Continuing our DueDates project, we've brought the version number up to 1.2 with some new nifty features. In addition to querying the library websites and printing a list of borrowed books to the console, the program can now send e-mail too. It also supports a wakeup function that keeps the program running in the background and wakes up every so often to check for borrowed books. If there are any books due, it sends out an e-mail.

Agree to disagree
Implementing these new features was quite simple so my lab partner and I did not need to meet face-to-face this week to get them done. However, another problem did arise. We both had an idea of how we wanted to structure the program, but unfortunately our two ideas were complete opposite of each other and could not be reasonably combined. Aric wanted to have a DueDates class that processed the user input and formatted the output, and each UI class, for example a GUI or the console, would simply take the user input and pass it to DueDates, and DueDates will give the UI an output to print. So the flow goes something like this:

DueDates entry point creates Console class -> Console reads user input and passes to DueDates-> DueDates processes user input, gets output, and calls print method in Console -> Console class prints output based on information it receives from DueDates

I really don't like this way because it means that adding any methods in the UI classes meant adding additional code into DueDates to call those methods, and you can't just create an instance of the UI classes and expect it to work how you'd expect it to because it expects the calling class to be DueDates. In other words, the UI classes -only- work as user interfaces and expects another class, in this case DueDates, to handle all the processing. This seems like a very circular relationship to me, where DueDates expects to receive information from a UI class and the UI class requires DueDates to handle processing.

What I want to do is have the DueDates class be nothing more than an entry point into the application. The UI classes themselves will use another class, for example LenderHolder, with all the necessary methods to create, login, and query the libraries. The flow goes something like this:

DueDates entry point creates instance of Console -> Console reads user input and calls methods in LenderHolder -> LenderHolder creates, logs in, and queries the libraries -> Console gets the borrowed books from LenderHolder and prints out the data

I believe this is a much more elegant solution because each class can be instantiated and used on its own. DueDates is the entry point and is used if the person is simply trying to use the program. The Console class can be used by outside developers if they want to use the console to read in user input and do what the DueDates project does, and if they want to develop their own UI, they can use the LenderHolder class. In this case, there is no circular relationship and each class serves a useful function that would be usable by another developer working on another project.

We've yet to work out this disagreement, but I hope to resolve things before the next version is out. As for the overall software development process, I only ran into one other problem: my internet connection was really slow one day so whenever the Eclipse Hackystat sensor plugin tried to send DevTime data to the Hackystat server, it would take a really long time and freeze up Eclipse in the meantime. This was frustrating to say the least and made me quit developing that day because every 5 minutes, Eclipse would freeze for 1 minute while waiting for the sensor plugin to upload the data.

Look forward to more updates in the future!

Friday, November 7, 2008

Software ICU - making sure your code doesn't go blue

Code blue, code blue! We need an ICU!
In a hospital, some patients are put in an Intensive Care Unit (ICU) which monitors all their vitals and ensures that they don't end up with something like cardiac arrest, prompting doctors to come running down the hallway yelling "Code blue!" while pushing along a defibrillator machine. Likewise, for software development, we can also put our projects into a software ICU that monitors its 'health', preventing us from running down the halls yelling "Code red!" when we see all the compiler errors and unhandled exceptions.

Hackystat, it's in your project, hacking your stats
The software ICU we're using for our project is called Hackystat, written and maintained by our professor for this course. It records the data generated from several Java development tools and graph it over time:

Hackystat server screenshot showing our project's health

In the screenshot, you can see several columns:

Coverage: code coverage measured using Emma, higher is better.
Complexity: cyclomatic complexity measured using JavaNCSS, a measure of how many branches and loops the code has, lower is better.
Coupling: how dependent a class is on other classes (such as how many imports it uses), measured using DependencyFinder, lower is better.
Churn: the number of lines that are modified in a SVN commit, measured using the SVN changelogs, lower is better (commit early, commit often).
CodeIssue: the number of issues that the QA tools reported, in our case Checkstyle, FindBugs, and PMD, ideally this should be zero.
Size: size of the project in lines of code measured using SCLC.
DevTime: amount of time spent developing the project, measured using an Eclipse plug-in provided by Hackystat, it records whether there was any work done in 5-minute increments.
Commit: number of SVN commits made, measured using the SVN changelogs.
Build: number of builds made, measured using Ant.
Test: number of unit tests done on the project, measured using JUnit.

As you can see, some of the columns are colored (green for good, red for bad) and some aren't. The columns that aren't colored are the ones that are hard to use as metrics for evaluating a project's health. For example, how do you determine how healthy a project is based on the number of lines of code it has? A simple program will obviously have less lines than a large project, but that's no indicator of how healthy either of them are.

As for our project, you can see that the complexity is steadily falling and churn is going down too, indicating that we're simplifying the code and committing it more often. Unfortunately, our coverage has taken quite a huge drop, from around 80% to 50%. This is because I had to disable a unit test that was failing half the time. It was querying the same web page three times in a row and it would occasionally fail to download the page, resulting in a failure. This made me realize that I had to redesign that class because the unit tests for it are clunky and hard to write. Apart from this issue though, our project is quite healthy - lots of greens, and as long as we get those unit tests written the project status is looking good.

I want me a piece of that Hackystat too
So now you've seen what Hackystat can do, and you might be thinking that you want to use it for your projects too. But how do you set it up? Fortunately, the Hackystat website provides a detailed tutorial on how to set up your development environment correctly. Unfortunately, you have to set up each sensor manually by writing several configuration files, installing the necessary tools you want to gather data from, and then setting up the Ant build files to report the sensor data to the Hackystat server. I myself didn't run into any significant problems while setting up my system but several of my classmates did. If you're going to do it yourself, especially if you want to set up all the sensors that Hackystat supports, be prepared to spend a good deal of time getting everything to work. You only have to do it once for each computer you want to use the Hackystat system for, so it's better to just bite the bullet now and reap the benefits of having a software ICU automatically monitor your project.

So why use Hackystat?
So why use Hackystat if it's difficult to set up? After all, the tools that Hackystat monitors can all be run manually without any significant difficulties. Well, as with any tool, it's only effective if it's being used. To expect the developers (or even yourself!) to manually run each tool every day is practically asking them to forget to do it. Also, running the tools manually can only tell you the status of the project at that given moment. Hackystat can display the trend of the project over time, which can reveal information that would be hidden otherwise. Perhaps the coverage of the project is slowly dropping over time, indicating that more unit tests need to be written. Seeing a graph of the coverage over the period of a week can instantly reveal this, but it might not be obvious if the coverage percentage is still high. All in all, having another tool to monitor a project with is always a good thing.

Monday, November 3, 2008

One small step for man, one giant leap for DueDates

Finally, version 1.1
As I've posted previously, we are now adding shiny new features to our DueDates project! In addition to displaying borrowed books from either the Hawaii State Public Library System or the University of Hawaii at Manoa Library System, our project now supports sorting by either library name or by earliest due date, and also displaying only the books that are due within a certain number of days. We managed to implement everything that our professor wanted for this milestone, but it was a hard battle to get there.

It's as easy as quantum physics!
Implementing the new features was easy enough because our code is was written with modularity in mind so all we had to do was drop in the new classes and modify a few things here and there. However, as I've learned over the years of programming, it's the error checking and testing that really makes people wonder why anyone would choose computer science as their major. In DueDates 1.0, we didn't implement any error checking or error reporting, so this time around I wanted to put all of that in there in droves. At least that was the plan, until I realized that up until now I've only learned the theory behind exception handling but haven't had any practical experience implementing them. After spending several days reading up on proper ways of working with them, I think I've finally grasped the concept of throwing and catching exceptions and I put them into our project. That was the easy part.

Then came the test cases. Oh boy, this is where improper project planning really came back to bite us in the rear. Rather than writing the test cases first and then doing the implementation based on them, we decided to first write the classes and then the test cases to verify them. This doesn't sound like a bad idea in theory, but in practice it's like a visit through hell. First of all, although JUnit supports 'command-line variables' (actually, they're VM environment variables you set at runtime, but let's not be a Java Nazi), I couldn't figure out how to modify our custom Ant buildfile to use them. I could use them when I ran JUnit by itself though in Eclipse. In the end, I had to temporarily hardwire the variables into the code. The next version of DueDates will definitely resolve this issue.

The next problem was testing private methods and getting access to private fields. There are various guides floating around the internet that say if you need to test a private method, you should probably reconsider why it's private in the first place, but sometimes it just doesn't make sense to make it public or when making it package-private doesn't help because your test cases are in a different package. We had a situation like that where our entry point, the main method in the DueDates class, was calling a private method in the same class and the test case was in a different package. In order to get around this problem, I had to use reflection, a way to access private methods and fields by 'unlocking' them.

This is when the third and biggest problem cropped up. While I was writing the test cases for the DueDates class, I soon realized that I was creating a new instance of the DueDates class over and over again because I had no other way to set certain variables or perform certain operations. In turn, this bumped the JUnit run time up to more than 3 minutes! It got to the point where I didn't even want to run the tests anymore because they took so long. This made me realize that I need to break up the DueDates class into more methods to make JUnit testing easier and bring down the run time. If I had written the test cases first, I probably would've realized this early on and wouldn't have to redesign the DueDates class.

When coders collide
Once again I worked with my partner Aric, and because of how complex our code has gotten, there were multiple times where we had collisions when one of us tried to SVN commit our changes. I learned just how important it is to assign tasks that don't overlap with each other since it can take quite some time to work out the collisions. Sometimes it was really obvious whose changes to use, but other times it was a mental struggle to figure out whose code was 'superior'. Either way it is a colossal waste of time. The diff tool that comes with TortoiseSVN is pretty junky too, not only is the interface reminiscent of the MS-DOS days, but sometimes me and Aric edited the same line with the same exact text and it screamed "collision!" (why would you NOT merge two changes that are exactly the same?).

We found even less time to meet up this time around due to the differences in our schedules, but we still communicated online and as much as possible. Every time I made any sort of major change to the system, I shot him an e-mail letting him know what I did. In the future, if we can't meet up very often, it becomes even more important that we split up the project correctly to minimize overlaps.

Hudson, the best friend you've never had
Hudson provides 'continuous integration', which is just a fancy way of saying "it automatically does things you want it to." The 'things' that it does can be very helpful though, such as periodically compiling your code or automatically running QA tools. In our case, we set up Hudson to periodically check for any SVN updates, and if there is one, it will compile the code and run the JUnit tests, Checkstyle, PMD, and FindBugs. If it encounters any errors, it will set the project status to 'fail' and automatically send out an e-mail to all the group members. It's a very nice tool in this regard since it provides an extra 'set of eyes' for our code. The only problem I can foresee, which doesn't apply to us, is if Hudson is set up incorrectly. Since all it does is automatically do what a human would do, the development environment in Hudson must be set up the same way as the developers have it. Our professor gave us strict outlines on how to set up our environment, so we've had no problems compiling and verifying our code with Hudson. Although this version of DueDates was harder to work on than the 1.0 release because of all the extra features and error checking that we added in, Hudson helped alleviate some of the burden by automatically telling us when we screwed up.

Lessons learned through pain and misery
So what lessons have I learned this week, now that I've had even more experience with the DueDates project? Well...

1. WRITE YOUR TEST CASES FIRST. I can't stress this point enough. They are almost in a sense an interface for your classes. By designing a test case, you're also designing how you want your methods and variables to be accessed. That's not to say that the test cases are immutable and that the classes should conform to them absolutely. On the contrary, the test cases should be improved upon and rewritten if there are any problems with them. However, by designing them first, you can save yourself a lot of hassle later on when you realize that you can't access the methods or data members that you need.

2. Delegate out tasks to minimize overlap. Object-oriented programming tries to teach the concept of modularity, that classes should be written so that the individual pieces can be assembled together into the complete system. Likewise, developers should try to 'modularize' their work by having different people assigned to different tasks. This way you maximize everybody's time instead of butting heads trying to figure out whose version of the same source code is the 'superior' one. You don't want to start an impromptu Mortal Code Kombat, trust me.

3. Focus on one thing at a time. One thing that I've noticed is that when I come across a bug or something that needs improving but I'm not quite sure how to do it, I'd rather work on something that I do know how to do and save the other issue for later. That's what I did for the 1.0 version of the project, but for 1.1 I tried to force myself to address the problem instead of putting it off. What I realized is that although it's more frustrating to have to do the research and put in the time to trace down the problems, in the long run it's much, much better for development because you solve the problems that are affecting the system -now- instead of writing new code that can potentially introduce even more problems. As a good friend of mine once told me, "You gotta do what you gotta do." So do what you gotta do now or else you'll end up with more do's on your hand than you know what to do with and we all know what that leads to: a big pile of do-do.

Look forward to more posts as we round the corner to DueDates 1.2!

Saturday, November 1, 2008

Code review revisited: something old, something new

It's (code) hammer time!
Now that we've gotten further on our DueDates project, our professor asked us to do a code review again for a (hopefully) new and improved version. Last week we reviewed three other projects by our classmates, but due to the added functionality and length of the code, we only reviewed one project this week, DueDates-Gold.

Something new
So what's new this time around? We added another library the user could get their borrowed book info from, the Hawaii Public State Library system. We also added in sorting by due date and by library name, and also a -within flag to show only the books due within a certain number of days. Finally, we tried out a new feature of Google Project Hosting that makes code review easier by tying in the review to the specific branch of code rather than the revision number.

Something old
Unfortunately, this code review has seen a bunch of old problems crop up. The DueDates-Gold project is, simply put, not modular at all. It only has three classes excluding JUnit test cases and exceptions, and one class performs almost all of the functionality of the program. The second class is simply an ADT, and the last one contains only one method. Just like the last code review, some people in the class seem to lack a fundamental understanding of how object-oriented programming works.

One thing I did learn from reviewing their project though was that it's important to make your classes modular, but it's also important to make your project settings modular too. Everyone in the class is using the Eclipse IDE, and our projects come with settings for it so anyone using Eclipse doesn't have to manually set up things such as external libraries and build paths. When I imported the DueDates-Gold project, there were a number of problems that I had to fix before I could even compile and test it, mostly having to do with missing jar files. Also, their JUnit test cases relied on reading environment variables that the user has to set up themselves, which I believe is a very bad practice. It helped to illustrate just how important it is to set up your project so that other developers can easily download your source code and work on it without having to spend hours just trying to get their development environment the same as yours.

Bad GPH, bad boy
Google Project Hosting, you're up to your old tricks again. You tempt us with the allure of a new feature that cures cancer and eliminates hunger, but really it's just the same old crap in a new box. In my last code review, I complained about how annoying the code review process is in GPH. Any comments that people make are tied into the revision number they made them on, but you can't tell if a revision has any comments when looking at the revision changelog. This means that it's not only difficult for the reviewers to find their own comments after submitting them, but it's doubly difficult for the developer to find them in the first place. Our professor had us try a new technique this time by creating a branch of our code and reviewing the code in the branch, but there is no difference in how the code review works. In the end I had to write a guide on how to keep all the comments in one revision number so that it's easy to find for us. You can view the guide here. This also illustrates yet another problem with GPH; when you create an issue, you can't edit the description of the issue, you can only add comments or change the issue title.

Strangely, we were supposed to receive a code review from the DueDates-Green group, but we didn't get any comments from them. When I looked at their project, it seems like they haven't set up their own project for this code review either.

I'll get you next time
Although we had a better idea of what we wanted our reviewers to scrutinize this time around, our review request is still pretty general. There are problems with the code that we're already well-aware of already, for example we don't have enough error checking, so asking the reviewers to take a look at that would be a waste of their time and ours. Next time we do a code review, I want to pinpoint more specific problems that I want them to look at, such as whether the project is portable enough for them to download and compile right away, which is something that is hard to test on our own systems since they're already set up correctly. Unfortunately, because we didn't receive a code review, this question will have to remain unanswered until the next code review.

One more thing: until the GPH code review problem gets resolved, it's probably a good idea to write explicit instructions for the reviewers on how to keep their reviews in one revision, unless you enjoy doing a scavenger hunt through your revision log.

Engineering Blog (temporary title)