Thursday, October 23, 2008

Code reviewing - a coder's litmus test

I'd like a second opinion, doc
Unless your entire coding career has consisted of nothing more than "Hello world" programs, chances are you've looked over your code to see if there are any areas that can be improved and optimized. However, since you wrote it yourself, chances are you also think it's the bomb diggity when it's actually more along the lines of what your dog did on your front yard this morning. This is where having a second opinion comes in really handy, and to give us some practice in code reviewing, our professor had us review three other DueDates projects done by the other groups in our class. If you're unfamiliar with the DueDates project, refer to this post. I was assigned the following projects to review:

DueDates-Purple
DueDates-Violet
DueDates-Red

DueDates-Purple
This project, in my opinion, could use a lot of work. All of their methods are in one class file which to me seems like they either rushed to get it done or don't really understand what objected-oriented programming is. For example, their command line parser is very easily broken; it assumes that the ID number will always be 9 characters long and that the dash will always be the 6th character. The parser also only checks to see if there are 3 or 4 command-line arguments, and if it's 4, it expects '-verbose' to be the first argument. This will become a problem once we start to use multiple login credentials in one command where we'll be using multiple flags. I realize this is only the first revision of the project but to me it seems like they'll just have to end up rewriting everything anyway because in its current form it's not modular at all.

DueDates-Violet
This project was the best of the three that I reviewed. I was asked to only look at the JUnit test cases class for the ErrorLogger class, a class that keeps track of which errors occurred and then outputs them to the console. Strangely, I was not asked to look at the ErrorLogger class itself, which may have been helpful in figuring out if the test cases were good or not. Fortunately, the test cases were very well-written and I was only able to bring up some minor cases that they could've added. Also, I didn't look at the other classes, but the file names makes me feel like they understand the concept of object-oriented programming and are striving to make their code modular.

DueDates-Red
This project, like DueDates-Purple, put all of their methods into one class. I was asked to only look at their getHawaiiStateInfo method, which logs the user into the Hawaii State Library system, gets the user's borrowed items, and prints them out to the console. Apart from the fact that this one method is doing too many things by itself, it should also be put into a new class by itself and the return type should be changed to something more modular rather than outputting directly to the console. This project will also require significant rewrites.

DueDates-Orange, our own project
Likewise, we also received code reviews from everyone that was assigned to look over our project. The reviews generally pointed out one thing: that we didn't spend enough time on error reporting. Other than that, most of the comments were for minor issues, but nobody pointed out any major faults. We even received some high praises for making the system easily expandable! You can view our project here.

Some interesting observations
Throughout the reviews, I've noticed some interesting trends. First of all, people actually wrote nice comments such as "I like how you did this" or "This is what I wanted to do too". I was surprised by this because I thought a code review was supposed to point out only the bad parts of the code, although I suppose some words of encouragement would be nice to hear for the code writers. However, I think this leads to another problem that I've noticed, which is that people generally left only a few comments here and there and a good number of them were 'nice' comments. In my opinion, a code review is a critical examination of the code and any problems should be pointed out. By being nice and holding back potentially 'hurtful' comments, it pretty much renders the review useless because it sidesteps the entire point of doing a code review in the first place. The reviewers need to be brutal and say what needs to be said instead of dancing around the point.

It also seems like some people don't really grasp the concept of object-oriented programming. When we started this project, we were told that we'll be expanding it as we go along, adding more and more features over time. Therefore, it was to our best interest to make the project as modular as possible from the get-go so we don't have to end up rewriting big portions of it. Unfortunately, 2 out of the 3 projects I reviewed wrote all of the code in one class, and it was written to specifically perform only what the requirements wanted but had no future expandability in mind. I hope this is because they rushed to get the project done by the due date, but I have to say it's sort of shameful to see code still written like this in a 400-level course.

Google Project Hosting, I love to hate you
In order to do our code review, we used Google Project Hosting. In my previous blog post, I praised how GPH was a good all-in-one package for programmers who were looking for all the tools they needed to manage their projects. After this code review experience, I'm going to have to retract that statement. For reviewers, commenting on code is easy enough; you open up the source file in GPH, double-click on the line you want to edit, then type in your comments and save them. What GPH doesn't tell you, however, is that your comments don't get posted until you tell GPH to publish them, but the kicker is that there's no 'Publish comments' link anywhere to be found! To get to it, you need to click on the revision number in the changelog that your comment was for and then publish it from there, but this is completely non-obvious and no instructions are provided. I lost and entire project's worth of comments because I didn't know this.

Where's the "Publish comments" button/link?


Here it is! (on the right side)

The second issue is for people trying to look at code review comments. The comments themselves are assigned to a revision number and by default, all comments submitted are associated with the most recent revision. Since you can only view one revision at a time, this presents a problem if the comments are spread across multiple revisions. To make matters worse, the revision browser doesn't show which revisions have comments and which ones don't.

Can you tell which revisions have comments?

In our case, what happened was that some people reviewed our code and submitted comments, then we made some more revisions and some other people reviewed our code after that. Because by default GPH submits the comments to the most recent revision, the comments were split across two different revisions and it made reading through them a pain.

Bruises licked and lessons learned
So what are some of the lessons that I got out of this code review? First of all, make sure you tell your reviewers to look at the right classes. I was asked to only look at a JUnit test class, but was not asked to look at the class that it tests! Obviously I can't really evaluate a test case class very well if I'm not to look at the class it tests too. Secondly, if you're doing a code review, be as brutally honest as possible. It does you and the reviewer no good to play 'nice' and beat around the bush when there are glaring problems with the code. If the code sucks, just say it and save everybody a lot of time. Lastly, write all of your comments for a single revision, and the code writers need to tell the reviewers which revision number to write their comments for. Otherwise it makes it hard for the code writers to view the comments and a super helpful comment might go by unread, lost forever in the internet void. If these tips are followed, it should make the review process much easier for both the coders and the reviewers and prevent you from pulling out your hair in sheer frustration. Now if you'll excuse me, I need to get a haircut...

Monday, October 20, 2008

DueDates project, a first taste in multi-developer coding

Coders don't play nice with each another
From the moment we started taking ICS courses, we were told by our professors that if we use someone else's code for our assignments, it constitutes as plagiarism. As a result, the easiest way for us to prevent against this is to not share our code with others in the first place. Maybe this is the reason why so many ICS students seem so anti-social, it's all because they want to keep their code to themselves!

Luckily, our ICS 413 class aims to break this viciously bad habit because, in the real world, not sharing code pretty much means not receiving a paycheck. To help us learn that sharing code and collaborating with one another is a -good- thing, we were assigned to implement a simple program. All it had to do was check to see if we had any borrowed books from the UH library system and print it out to the console.

Every beginning...already has another beginning
To facilitate this process, our professor provided us with a Java class that uses the HttpUnit library. This library (and supporting libraries, since it relies on other stuff too) allows developers to use 'screen scraping', which is a shorter and smarter-sounding way of saying "parse HTML, automatically fill out form fields, and submit it". The Java class he gave us started us off by providing methods to log into the UH library system and pull the borrowed books data from a table. All we had to do was program it so it could be run from a console using arguments.

I'd like to order Subversion for two, please
Previously we played around with Subversion by ourselves, but this is the first assignment where more than one person had to actively work on a single project. I worked with Aric West on this project, and it was definitely a much different experience than coding individually.

We managed to implement the requirements, but not without some significant difficulties. All the lessons learned about creating test cases and planning out your project before you even begin to code really hit home. At first we decided to do a quick-and-dirty implementation to test the code that our professor gave us, but after that we decided to just expand on it since it was already written and working. This is where we should've done things differently. Although we tried to write it initially with modularity in mind, we should have first planned it out thoroughly because some parts of it required almost a complete rewrite. For example, I was tasked with creating a command line parser class. Although the class itself was pretty easy to write, the hardest part was making it play nice with everything else and I had to modify every other class to support error checking for the parser class. This problem could have been mollified if we planned out our project structure before we started coding, and although it was small enough to handle for a small program with only two developers, it's easy to see how it can become a HUGE problem when you have to tell a big group of coders "Hey guys, I need to update 20 classes to support mine because we didn't design things properly." Not exactly an efficient way of establishing good co-worker relations...

Also, just like there are different literary styles when writing a book, there are also different styles that people use when they code. Aric wrote some code that got the job done, but I wasn't a big fan of how it was written, even though there was nothing wrong with it. The control freak in me wanted to change everything around to how I wanted it to look, but I had to suppress that urge and just accept it as an alternate road leading to the same destination.

Don't call me, I'll call you
Our professor stressed that we should meet up for at least 30 minutes each day to work on the project. This is easy to do in an office environment; walk over to the other guy's cubicle and strike up a conversation with him. As students though, it was much harder to do because of conflicting schedules, difficulties in transportation, or even something as simple as finding a place to meet up at. As a result, we were only able to meet up for the first two days of the project, and we made the most progress in those two days. After the first day we already finished the quick-and-dirty implementation, and after the second we managed to modularize a big part of it. However, after that things kinda fell apart and we lost communications. Neither of us worked on the project again until the day before it was due and we could only collaborate online. Even then, that was a much different experience than in person. There's just something about working with someone else next to you that really motivates you to get the work done and avoid all distractions. Since I'm assuming that this project will continue on in the future, I'd definitely try to meet up with my group members more often, though it might become harder as the number of people increases. Still, the advantages far outweigh the hassle of synchronizing everyone's schedules, and after this project I'm sure everyone else will value it highly too.

You can read me like an open-source project
Open source development is an interesting way of working on a project. Normally people who are working on a project also see each other in person or are in the same company, but when an open-source project gets put online, there could be any number of developers from various locations, and there isn't so much a development team as there are just contributors. Hence, there need to be tools so that people can keep track of the project's progress. Luckily these tools do exist, and Google Project Hosting does a pretty good job of providing these tools in one consolidated package. It offers SVN access for both visitors and developers, a wiki to store relevant information, file hosting, and an issue tracker to delegate out tasks. However, it's not perfect and lacks in certain features that I'd personally want to see. One thing in particular is that I'm not a big fan of how you can't edit anything once it's been committed, and a history of all the changes is always kept. Nice if someone comes in and destroys all your data, but not so nice if you made a huge spelling error or posted something stupid. A feature to be able to edit or remove your changes within 15 minutes of committing them would have been nice. Overall though, Google Project Hosting is pretty awesome given that it's free and offers pretty much everything someone needs to maintain an open-source project.

Thursday, October 9, 2008

Configuration Management, Part 2

Be nice to your neighbors, now
Rarely in the real world will you be working on a programming assignment by yourself. Usually you'll have to collaborate with several people and the end result is a compilation of everybody's efforts. In part 1 of configuration management, we created and modified two Java stack projects, and we're going to do the same again, this time for a classmate. I was partnered up with Creighton Okada, who created his stack project here. Just as with the last exercise, I checked out the source code, made a slight modification, then committed the changes back to the SVN server.

SVN changelog for stack-okada project

Creighton did an excellent job on his stack project. There were no bugs or errors that Ant could find, and unlike me, he remembered to add in descriptions for all his SVN commits. All I made was a simple Javadoc comment change to his project. Although in this case, Creighton and I were not doing edits at the same time, I can see how being able to do so is a great convenience. When I was working on the CodeRuler project with Tyler, I programmed a certain block of code while he did another. We had a slight clash when we tried to merge the code because I edited some of his loops and function calls to make them more efficient, but he changed his functions so that my calls wouldn't work on his code anymore. In the end we decided to go with my changes, but this problem could've been altogether avoided if we used SVN to keep each other up to date on what the other was coding. If a small issue like this can happen between just two programmers, imagine how much more complex it will get when more people are involved. Even when there's only one developer, SVN can be helpful by keeping revision histories for every file, and it's a cinch to bring a new developer into the project. I know I'll be SVN for my projects in the future since I'm notorious for making big changes to some code that ends up breaking the system, and only then do I realize I forgot to make a backup...

Google Project Hosting, what will they think of next?
Google Project Hosting (GPH) is an interesting beast. The basic premise of the site is to host people's projects and allow anyone to use SVN to manage their projects. There is also a mailing list to automatically e-mail people any time the project is updated, though it would have been nice if GPH itself had a built-in mailer rather than having to create a separate Google Groups mailer and configuring GPH to use it. GPH is very friendly and very open-source; anyone with a Google account can create a project, and anyone can browse and download the source code for any project. They even offer different open-source licenses that your project can fall under. Although not quite the beast as SourceForge is, it offers a simple way for people to use SVN for their projects without having to set up the server themselves. And who knows, putting the project online can even attract new developers, and since it's already on a SVN server and set up, all you need to do is add them as a member into the project and they're good to go!

Wednesday, October 8, 2008

Configuration Management, Part 1

What is configuration management?
Let's suppose that I just started working at a small firm with 10 employees, and we're working on a programming project. Bob and I became fast friends and we get along great. But Bob and I have also been assigned to work on the same piece of code in the same source file. This obviously creates a problem. Just last week, Bob made the mistake of working on the file when I was also working on it, and the 2 hours I spent writing that nifty function was lost forever when Bob overwrote my changes with his. Now Bob e-mails me whenever he starts to work on the file and when he stops working on it so we don't come across this problem again. However, Bob sent me an e-mail today telling me he's going to work on the file, and it's been 4 hours since then...oh look, here comes Bob now. He just told me that he forgot to e-mail me 3 hours ago telling me he was done with the file. What did I do with my 4 hours? Twiddle my thumbs, re-arrange my desktop, change my wallpaper, and take a quick nap. Now I get to work on the file, and Bob gets to take a paid mini-vacation thanks to me.

As you can see, this is a very big problem. What if Bob and I need to work on the same file, but on different parts of it that don't overlap? What happens if Bob accidentally overwrites my changes? Is there any way of handling overlaps when they do happen? This is exactly what configuration management aims to control automatically so that the programmers can do their job rather than send out "are you done yet?" e-mails. In this case we will be using Subversion, abbreviated as SVN, to get the job done.

Subversion? Sounds scary!
A secret agent, clad in an expensive tuxedo and armed with nothing more than a small pistol, somehow manages to infiltrate a top secret enemy base and subvert the computers that are aiming nuclear warheads at a friendly country. Cue the explosions and the dashing escape, and complete the mission successfully while getting the girl to boot. Fortunately (or maybe unfortunately), this is not what happens when we use Subversion to maintain our source code. So what is Subversion? At its core, it's nothing more than a file server that allows people to upload and download files. However, it's the additional features that differentiates Subversion from other file servers. It does three things that the others don't do:

1. Revision history - SVN tracks all changes made on files so that it's possible to get any previous versions of the file. It's like a little time machine that allows you to go back to any point in the file's lifetime.
2. File edit merging - SVN allows multiple users to be working on the same file at the same time. Users 'check out' a local copy of the file, make their edits, and then upload them to the server. If two people check out a file at the same time but one of them uploads it first, the second person won't overwrite the first person's changes when he uploads his copy; the two files will be merged together with both of the edits - unless there's a overlap.
3. Merge overlap handling - if two people try to upload a file and there is an overlap in the edits, SVN will notify the user and let him handle it. This way one user cannot overwrite another user's work without his knowledge. So as we can see, SVN solves the issues mentioned earlier quite well, allowing more than one person to work on a file at once, merging their changes, and dealing with overlaps. Now that we know what SVN can do, we can move on to the practical part - actually using SVN to manage a project.

Step 1. Install an SVN client
Our assignment for this week begins off with us installing an SVN client, TortoiseSVN in this case. Don't let the name fool you, TortoiseSVN is not only free, but quite powerful. It integrates into the shell, allowing you to right click on a folder and instantly turn it into a SVN-managed folder. I ran into no problems installing and running this SVN client and I highly recommend it if you don't already have a favorite SVN client that you use.


TortoiseSVN right-click shell menu

Step 2. Modify an existing Google project
To start off simple, we're going to take an existing Google project, modify it, and then upload it to the SVN server. A nice, simple project has been provided for us by our professor, the same stack project we've worked on before. I created a folder and 'checked out' all the project files into that folder, then modified a Javadoc comment slightly and then uploaded my changes to the SVN server. The process is quite simple, and the only cumbersome part was having to use the command-line-only Ant to verify the project before and after the modifications to make sure it wasn't broken to begin with and it isn't broken after the changes were made.

SVN changelog for stack-johnson project

Step 3. Create a new Google Hosting Project
Now that we've modified an existing project and have learned how to use SVN, we're going to create our own project and upload it to Google Project Hosting so that it's SVN-managed. After creating the project page on Google Code, all we need to do is create a folder on our computer, use SVN to check out the files from the SVN server (which should contain only 3 empty folders), and then add in our source code and commit it to the server. The process is pretty straightforward, but remember to check the project settings to see which directory SVN checks out by default. Mine was set to the trunk directory, and when I uploaded my files, I put them in a folder called trunk, resulting in all my files being put into the trunk/trunk folder. Also, it's very easy to forget to add a description of what was changed, and after you finish committing it I don't think there's any way you can modify it, so don't forget to add in a description of what you modified each time.

SVN changelog for my own stack project, notice how I forgot the description the first two times

Hurray for SVN!
Although the exercises in this post don't really show off the true usefulness of SVN, it's easy to see how this tool is crucial when there are multiple people working on one project. Even when there's only one developer, it can be used to keep track of file changes, which might come in especially handy when a big part of the code is modified and the entire program breaks but there are no backups of the older versions, a situation I've been in before. SVN is much more than just a simple file server, so take advantage of all the features that it offers!