Thursday, February 21, 2008

Learning and Fixing our Bugs

I had a particularly nasty issue assigned to me a week or two ago. It was one of those bugs I spent so much time and effort on that it was difficult to enjoy my evenings and weekends. It was nasty for a variety of reasons.


First, it involved a program component originally written by a developer who had already decided to leave the company but chose to finish this component before he went. As you can imagine, the code is in horrible shape. Few people have any knowledge of this component, and none of those who do understand the entire system.


Second, this component could be considered an entire software solution in its own right. The size of the project is large enough that it is often installed on its own machine in production.


Third, this component was installed in a testing environment that had taken six weeks to set up (normally, just a few days are required).


Fourth, the error message was one that had occurred before and, in most cases, had pointed to some particular setting being incorrect. That made it very difficult to rule out environmental issues.


Finally, this component is used not by one software team, but by three. Since all three teams use the same component, no one “owns” it. However, there are “satellite” assemblies of this component, which each of the three teams provides on its own.


Because of the third and fourth reasons (a hard-to-build environment and a familiar error that usually pointed to environment issues), I spent a good amount of time traveling down that path. In the end, it proved to be a very complicated issue, and there were multiple causes for the errors. Most of them could be traced back to the final reason: no one owns the software component. Because of that, and because of the satellite assemblies, code for this component was not delivered correctly.


I learned a couple of valuable lessons from this error, though. First, I had great difficulty recreating the error because of a poor setup on my development machine, and I can never let that happen again. While I normally detest debugging, debugging this component is absolutely essential, because the size of the project and the poor state of the code are too much for one developer to grasp at one time.


Second, and more importantly, I learned it is a VERY bad idea to have a large internal component that is used by several teams but that no one team is responsible for maintaining. In hindsight, this is obvious, and it probably rarely happens in other companies (if it does in yours, I'd like to know).


While I could have handled this error much better and solved it much more quickly, I have at least taken a couple of things away from my mistakes, and my boss has recognized that this component needs to be developed by one team, and one team only.


So, recognize that mistakes will happen, but always learn from them, especially in the field of development.