Tuesday, July 18, 2006

The 5 Magic Rules of Systems Development

I recently read about an ISP engineer who accidentally deleted 700GB worth of customer emails. He got confused between a window session to the live server and one to the development server. We made exactly the same mistake a while back, and as a result we came up with a short list of inalienable rules of systems development:

#1: The smallest change can make the biggest difference

Ever heard the chaos theory metaphor about a butterfly flapping its wings in Tokyo, causing it to rain in New York? It must have been a programmer who came up with that, because nowhere else are such effects more apparent than in the wild west of systems development. I've seen entire distributed systems go down because an SQL query got split onto multiple lines using "\" characters. Distributed systems integrate software from multiple vendors running on various platforms. They don't all use the same coding conventions. Live by this mantra and the rest of the rules follow naturally.
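I can't reproduce the original failure here, and this is not necessarily what went wrong in that case, but a small Python sketch shows how fragile a "\" line split can be. The query, table and column names below are made up for illustration. The only difference between the two strings is a single space before the backslash:

```python
# Hypothetical query; the table and column names are made up for illustration.
# A backslash at the end of a line inside a Python string literal joins the
# two lines and drops the newline, keeping whatever whitespace surrounds it.

with_space = "SELECT id, name \
FROM customers"

without_space = "SELECT id, name\
FROM customers"

print(with_space)     # SELECT id, name FROM customers
print(without_space)  # SELECT id, nameFROM customers
```

The second string is no longer the query you meant to send, and nothing complains until it reaches the database.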

#2: Don't dive without a buddy

Never make changes to the live server without testing, consultation and oversight by another team member. Take time to do 'checks' with your teammate before you commit any changes. Explain exactly what you're planning to do, in what order, which files will be changed, where they've been backed up and how you can reverse the action if it fails. 9 times out of 10 everything goes fine. Prepare for the 10th.
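If you want the buddy check to be more than a conversation, the items above can be written down as a small structure that refuses to call a change 'ready' until every one is filled in. This is only a rough sketch: the `ChangePlan` class, its field names and the example values are all my own invention, not part of any tool.

```python
from dataclasses import dataclass, field


@dataclass
class ChangePlan:
    """Hypothetical checklist: what you should be able to tell your buddy."""
    author: str
    reviewer: str = ""                                       # the buddy
    steps: list[str] = field(default_factory=list)           # what, in order
    files_changed: list[str] = field(default_factory=list)
    backup_location: str = ""
    rollback: str = ""                                       # how to reverse it

    def missing_items(self) -> list[str]:
        """List every checklist item that has not been filled in yet."""
        missing = []
        if not self.steps:
            missing.append("ordered steps")
        if not self.files_changed:
            missing.append("files to be changed")
        if not self.backup_location:
            missing.append("backup location")
        if not self.rollback:
            missing.append("rollback procedure")
        # Reviewing your own change does not count as having a buddy.
        if not self.reviewer or self.reviewer == self.author:
            missing.append("a second team member as reviewer")
        return missing

    def ready(self) -> bool:
        """True only when nothing is missing from the plan."""
        return not self.missing_items()


# Example with made-up values: a plan that is not ready yet.
plan = ChangePlan(author="alice", steps=["stop mail queue", "apply patch"])
print(plan.ready())
print(plan.missing_items())
```

The point of the design is that the author cannot sign off alone: the plan stays incomplete until someone else is named as reviewer and the rollback is written down.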

#3: Thou shalt not boondoggle!

Leave a software developer alone without clear timelines and deliverables and he will come up with a brilliant solution to a problem that doesn't exist. Have a clear development plan that everyone on your development team agrees to. Do not recode everything to make it 'more like Web 2.0' while there are security holes in your system that you know about!

#4: The Power of 'One'

Do not work on many problems at once. Inevitably, one feature will seem more important today than the one you started working on yesterday. Tomorrow the same thing happens. All of a sudden, you're working on 10 features at once, and you're expected to finish all of them by the end of the month. You don't get to test all of them thoroughly before you upload to the live server on a Friday afternoon, and you start getting frantic phone calls from clients on Sunday. You now have to fix it, but first you need to find the cause. You have 5000 lines of code to filter through and you can't 'see' the problem.

Code 'one' thing, test 'one' thing, implement 'one' thing.

#5: All the world's a stage

Sometimes everything works perfectly on the development server in your office, but the moment you upload to your live server, everything breaks!

Follow the Power of 'One' and test changes on a staging server before you implement them live. If anything breaks, you can figure out why and prevent the same thing happening when you're ready to go live. Your staging server should approximate your live server as closely as possible. The only difference should be the IP address! Before you upload changes to the live server, upload them once, and only once, to the staging server. Test as if it's live. If you can, have some of your clients use the staging server for a while, and see if they experience any problems.
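One way to keep yourself honest about 'the only difference should be the IP address' is to compare the two servers' settings and flag anything else that differs. Here's a minimal sketch, assuming you can collect each server's settings into a plain dictionary; the `config_drift` function, the keys and the values are all hypothetical.

```python
_MISSING = "<missing>"  # marks a setting that exists on only one server


def config_drift(staging: dict, live: dict,
                 allowed=frozenset({"ip_address"})) -> dict:
    """Return {setting: (staging_value, live_value)} for each unexpected difference.

    Settings named in `allowed` are expected to differ and are skipped.
    """
    drift = {}
    for key in sorted(set(staging) | set(live)):
        if key in allowed:
            continue
        staging_value = staging.get(key, _MISSING)
        live_value = live.get(key, _MISSING)
        if staging_value != live_value:
            drift[key] = (staging_value, live_value)
    return drift


# Made-up settings for illustration only.
staging = {"ip_address": "10.0.0.5", "timezone": "UTC", "mail_queue": "on"}
live = {"ip_address": "10.0.0.9", "timezone": "EST", "mail_queue": "on",
        "cache": "on"}

for setting, (on_staging, on_live) in config_drift(staging, live).items():
    print(f"{setting}: staging={on_staging} live={on_live}")
```

An empty result means staging and live agree on everything except the settings you explicitly allowed to differ.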

Following the rules above could just save your job or your startup business. Either way, it will make things a lot less stressful.