An Engineer's guide to Scrum
Agile development, or Scrum, is a management technique that's become very popular in software companies...especially game software companies.
Teams and Sprints
The company is split into teams of between maybe 4 and perhaps at most 10 people. Life proceeds in "Sprints" of between 2 and 4 weeks - longer when you're in long-term development cycles, shorter when you approach a deliverable. People may move from one team to another from one sprint to the next - but generally, we try not to do that too much. A sprint is a self-contained event - we take in "Stories" at the beginning, we finish them and produce releasable, essentially bug-free, working code at the end.
Each sprint has a pattern - with some ritualized meetings.
Everything starts with the "Stakeholders" (marketing people, senior managers, product managers) writing "Stories" describing the things they want to be able to do with the product. These are kinda formulaic and are supposed to fit on a half sheet of paper:
- As a user of WonderProduct2000, I need the ability for the spreadsheet component to bake bread.
- SUCH THAT:
- The bread is light and fluffy.
- Spreadsheet processing is not degraded while baking.
- There is a pleasant "BING!" sound when it's done.
- CONDITIONS OF SATISFACTION:
- Bread fluffy-ometer readings of 0.4 or above are obtained.
- Bread is not burned.
- BING! Sound makes small children yell "Mommy, mommy, can I have some?"
- BONUS FEATURES:
Stories are supposed to describe desired results - they should never describe the method by which the result is to be achieved.
Some stories are just investigation tasks - with conditions of satisfaction (COS) that include documentation - maybe a stand-up presentation and a sign-off from some committee or other. Most result in a concrete "thing" - code, documentation, art, audio. Something that can be demonstrated as meeting its COS.
Other stories are too big for the team to complete within one sprint - but it's a rule that those have to be broken down into smaller sub-stories. Stories are thus hierarchical in nature. "Epic" stories may embody the guiding principles for an entire multi-year development cycle, but the process of recursive sub-division must always result in stories that are one sprint or less in duration.
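To make the "recursive sub-division" rule concrete, here is a minimal sketch of a story tree with a check that every leaf story fits in one sprint. The Story class, the points field and the sprint_points budget are all made up for illustration; real teams track this on paper or in whatever tool they like.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Story:
    """Hypothetical story record: a title, an estimate, and optional sub-stories."""
    title: str
    points: int = 0  # estimate; only meaningful for leaf stories in this sketch
    children: List["Story"] = field(default_factory=list)


def leaf_stories(story):
    """Walk the hierarchy and return the stories that have no sub-stories."""
    if not story.children:
        return [story]
    leaves = []
    for child in story.children:
        leaves.extend(leaf_stories(child))
    return leaves


def too_big_for_sprint(story, sprint_points):
    """Titles of leaf stories whose estimate exceeds the (assumed) sprint budget."""
    return [s.title for s in leaf_stories(story) if s.points > sprint_points]


epic = Story("Spreadsheet bakes bread", children=[
    Story("Play BING! sound", points=3),
    Story("Bake without degrading processing", points=40),
])
print(too_big_for_sprint(epic, sprint_points=20))
```

Anything that shows up in that list still needs to be split into smaller sub-stories before it can go into a sprint.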
Generally, the COS should be designed to be testable - preferably by the QA department and/or by a demonstration at the sprint retrospective meeting. A research story might end up with there being a document - and the COS might include "* Eric has read the final document and is happy with it."
(Which may in fact happen on the preceding Friday...)
At the start of the sprint, the team and their immediate manager spend half a day going through the stories - some of which are new and urgent - some are old and dusty (the "Product backlog"). We discuss how we're going to implement each new one and briefly ensure that nothing has changed about the old ones (maybe they've become moot - or need to be rewritten in some way) - and for each new or revised one, we estimate the complexity of the chosen approach.
In small or skill-diverse teams, the person who will be working on the job can do a complexity estimation directly in man-hours. This is emphatically NOT a managerial role. When the engineer says "this is a 16 man-hour job" - they are effectively entering a promise to complete the work in that amount of time.
Larger, more skill-uniform teams use "Planning Poker" (I kid you not!). We have a hand of special "planning poker" playing cards for each engineer (they have goats on them!) - they are marked with numbers in the corners. The numbers are in units of "difficulty" - which is an abstract quantity that doesn't relate to "effort" or "man-hours" (well, not directly). We have a handful of old "benchmark" stories that we can compare to. We have a gut feel for what is a 10 point story, what is a 3 pointer and so on.
Each person with the skill-set to work on the story places a card face-down on the table representing his/her estimate for the difficulty of the task and on the count of three we all show our estimates. If everyone more or less agrees - we write that estimate into the story and move quickly along to the next one. If we don't agree then the highest and lowest outliers explain why they disagree with the majority...and we play another round of cards. Pretty soon we more or less agree. If someone digs their heels in for a low-ball number - then they can expect to be doing the work! But generally, the resulting estimate should be such that all of the people involved would feel happy to pick up the task and complete it at that estimate.
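One round of planning poker can be sketched in a few lines. This is only an illustration under stated assumptions: the card values in DECK are hypothetical, "more or less agrees" is taken to mean that all cards are within one step of each other in the deck, and on agreement the higher card is recorded so that everyone would be comfortable with the estimate.

```python
DECK = [1, 2, 3, 5, 8, 10, 15, 20]  # hypothetical card values


def poker_round(cards, deck=DECK, max_spread=1):
    """Evaluate one round of planning poker.

    cards maps each engineer's name to the card they played.
    Returns (estimate, []) when the team more or less agrees, or
    (None, outliers) naming who played the highest and lowest cards.
    """
    positions = {name: deck.index(value) for name, value in cards.items()}
    low, high = min(positions.values()), max(positions.values())
    if high - low <= max_spread:
        # Assumption: record the higher card when the team nearly agrees.
        return deck[high], []
    # The highest and lowest players explain themselves before the next round.
    outliers = sorted(n for n, p in positions.items() if p in (low, high))
    return None, outliers


print(poker_round({"ann": 5, "bob": 5, "cy": 8}))
print(poker_round({"ann": 2, "bob": 5, "cy": 20}))
```

The first call ends in agreement; the second sends the two outliers off to explain their cards before everyone plays again.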
Repeat until all new stories are scored. Stories carried over from previous sprints may need to be re-visited if conditions surrounding their implementation have changed. But by the end, we have good estimates for all of the stories. Big stories that will need to be broken down later will have more approximate estimates - but those can be updated as their component stories emerge and can be accurately estimated.
Now, management can put together the estimates and pick a selection of high priority ones that should be worked on this sprint. Engineers then take these stories and break them down into individual "tasks" - atoms of work - generally between 4 and 16 hours each. These are written on yellow post-it notes with fat black "sharpie" pens to guarantee that the task can be described in just a few words...with the hourly estimate written in the bottom-right corner.
This discipline of breaking into bite sized chunks seems to help with estimating. Using a black sharpie and a post-it note ensures that we have bite-sized descriptions.
Now, after lunch, usually, we take another look at the total workload for the sprint - we may adjust the hours, split some tasks up into sub-stories. If, after detailed estimation, we don't think we can get all of the stories done - then we take the lowest priority stories and transfer them onto pink post-its to indicate that these are "BONUS" tasks - or maybe bump the story out of the sprint altogether.
If we think we can do more work than we estimated in poker, then we grab new stories and task them out onto pink BONUS post-its too.
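The yellow/pink split is really a team judgment call, but the arithmetic behind it looks roughly like this. It's a sketch only: the (name, priority, hours) tuples and the capacity_hours figure are invented for the example, and priority 1 is assumed to be the most important.

```python
def split_committed_and_bonus(stories, capacity_hours):
    """Split stories into committed (yellow) and BONUS (pink) lists.

    stories is a list of (name, priority, hours) tuples; a lower priority
    number means more important. Once a story no longer fits, it and all
    lower-priority stories become BONUS.
    """
    committed, bonus = [], []
    used = 0
    overflowed = False
    for name, _priority, hours in sorted(stories, key=lambda s: s[1]):
        if not overflowed and used + hours <= capacity_hours:
            committed.append(name)
            used += hours
        else:
            overflowed = True
            bonus.append(name)
    return committed, bonus


stories = [("fluffy bread", 1, 40), ("BING! sound", 3, 20), ("no slowdown", 2, 30)]
print(split_committed_and_bonus(stories, capacity_hours=75))
```

With 75 hours available, the two highest priority stories stay yellow and the third goes onto a pink post-it.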
Nobody leaves the meeting until every one of the engineers agrees that the yellow tasks can and WILL be completed in the upcoming sprint. This is VITALLY important. If you agree - then you've made a personal commitment to finishing that story. If you don't agree - then even the lowliest engineer can force a story out of the sprint in the face of any amount of management insistence...it's a rule. If managers don't like that - then it's their job to modify the stories, maybe split a big story into smaller ones - maybe shift priorities, get more staff...the one thing they cannot do is to cajole an engineer into accepting a lower estimate...not EVER.
We stick all of the yellow and pink post-its onto a whiteboard out in the corridor outside our office...someplace public, uncomfortable, and with no chairs! Planning Monday is over.
The Scrum Board
OK - to the "SCRUM BOARD"!
The Scrum board is laid out in several columns with boxes at the bottom marked "MOOT" and "BLOCKED".
Different projects and different companies have different layouts: I've seen 5 columns: "PENDING", "IN PROGRESS", "IN CODE-REVIEW", "IN QA" and "COMPLETE" representing the life-cycle of a task. But "PENDING", "WORKING", "COMMITTED", "IN QA" and "COMPLETE" is another option...whatever works. Art projects will have different setups...maybe "MESH-COMPLETE", "TEXTURE COMPLETE", "ANIMATION RIGGED", etc. There should always be a "PENDING", a "COMPLETE" and one or two 'working' columns.
Tasks start in PENDING and work their way across to COMPLETE over the life of the sprint. By the end of the sprint, every single yellow post-it MUST be in the right-hand column. The only way out is for the task to be considered "MOOT" (we decided that this wasn't necessary in order to complete the story because we found an easier way) - or to be "BLOCKED" by some external issue that's beyond our control (my hard drive exploded, Fred who works in our other office didn't do what he promised - and we relied on that for our estimates).
BLOCKED tasks are a monumental issue - it is the task of the 'Scrum Master' to find out why tasks are blocked and what can be done to unblock them. It is super-rare for a task to stay blocked past the end of the sprint.
When stories don't make it into the final column, hard questions must be asked. Did someone get sick? Did the estimate prove to be wildly wrong...and why? Is there something wrong with our process? These bad stories must be carried over into the next sprint - but that's a failure of the team...it's not supposed to happen, and if it happens often, then management need to step in and figure out what's broken.
Another thing is that the scrum board is "owned" by the team. If one story isn't making it - the people who aren't working on it should be asking themselves if they have the time/ability to help out the engineer who's struggling. The TEAM is responsible for pushing the stories over the line - and if one person isn't getting things done, they'll notice and peer-pressure should kick in.
The Scrum Master
The scrum master shouldn't be a management position - it's generally a part-time job and it doesn't come with ANY authority. It's purely an administrative job.
Some teams have their lead engineer do the job - or maybe even a QA guy - for a while we took it in turns to be scrum master with a different person taking the job each sprint. Some companies actually employ someone specifically to do this job.
Now, at a fixed time - typically early each morning - all of the engineers on the team and their "Scrum Master" (and anyone else who cares to come along to listen) meet up at the scrum board. The engineers are called "Pigs" and the other people "Chickens". There is an old joke that explains these names (When you have ham and eggs for breakfast, the chicken is involved but the pig is committed!). If you are late by even a second for the kick-off - it costs you a dollar.
The scrum has rules. Only pigs are allowed to speak - chickens keep quiet unless asked to contribute (which is rare). The scrum is FAST - 10 minutes is too long...hence doing it out in the corridor - with no chairs!
Each pig takes a turn to stand up in front of the board, briefly explain what they did during the previous day, and cross off hours from the post-it notes to show how much work they estimate is left to do on the task(s) they worked on. That number can go up or down or stay the same...it's an estimate of what's left - not an indication of the number of hours you spent on it.
If there is less than 8 hours left on the current task - or if the task is finished, they pick another yellow post-it from the board and initial it to indicate that they have taken responsibility for doing it. In some teams (and especially in bug-fix sprints) people can grab any task they fancy - but in many cases, we put each person in charge of a complete story that they work from start to finish.
When we run out of yellow (and blue!) post-its - we start in on the pink "BONUS" stories.
When a task is done, the post-it moves on to the next column - so it gets peer-reviewed, QA'ed, Documented, whatever. At some point (depending on company policy) it crosses a line that requires it to be checked into the source repository. When it reaches the right-hand column, the code will have been checked in, tested, documented and it's done, completely, utterly DONE!
If you are BLOCKED for some reason, it is CRITICAL that you explain this at the scrum-board. If everyone agrees that you are stuck - you put the post-it into the BLOCKED box and find another one to work on. Scrum finishes with everyone discussing how to un-block the tasks in the BLOCKED box...this is very important.
The Burn Down Chart
After scrum, the scrum-master writes down the number of hours that were crossed off and updates a "Burn Down" graph - we keep a spreadsheet, but some people just draw it on paper. The graph is taped to the scrum board so anyone wandering down the corridor can see at a glance whether the team is performing on target.
At its simplest, this graph has "DAYS" on the X axis and "HOURS" on the Y. There are two curves - one is a straight line that goes from (8 hours)*(sprint_length_in_days-2)*(number_of_engineers) on day #1 (day #0 was planning day) down to zero on the last day of the sprint - this is what we hope will happen. The other curve shows the total number of hours left in tasks that aren't in the COMPLETE column on each day of the sprint - it shows what's actually happening! If the second curve is higher than the first one...everyone works harder...stays late! If we're a little below the first curve, we feel comfortable. If we're too far below the curve, management ask why we overestimated our tasks!
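Here is a small sketch of the "what we hope will happen" line, using the formula above. It assumes the days are numbered 0 to sprint_days - 1, with day 0 as planning day and the final day as the last day of the sprint. The function names and the status labels are mine, not part of any standard.

```python
HOURS_PER_DAY = 8  # the "8 hours" from the formula in the text


def ideal_hours_left(day, sprint_days, engineers):
    """Ideal burn-down value for a given day.

    Assumes days run 0 .. sprint_days - 1, where day 0 is planning day.
    The line starts at 8 * (sprint_days - 2) * engineers on day 1 and
    falls in equal daily steps to zero on the last day.
    """
    last_day = sprint_days - 1
    if not 1 <= day <= last_day:
        raise ValueError("day must be between 1 and the last day of the sprint")
    # Equal daily steps mean the line drops by 8 hours per engineer per day.
    return HOURS_PER_DAY * engineers * (last_day - day)


def line_status(actual_hours_left, ideal):
    """Compare the actual curve with the ideal line for one day."""
    if actual_hours_left > ideal:
        return "above"
    if actual_hours_left < ideal:
        return "below"
    return "on"


# A 3-week sprint (15 working days including planning day) with 5 engineers.
print(ideal_hours_left(1, 15, 5))   # 8 * 13 * 5 = 520
print(ideal_hours_left(14, 15, 5))  # 0 on the last day
print(line_status(500, ideal_hours_left(2, 15, 5)))
```

On day 2 the ideal line sits at 480 hours, so a team with 500 hours still on the board is "above" the line.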
If a "quick popup task" arrives mid-sprint (like a nasty bug or something) - we call it a "found task" - we estimate the hours and write it on a blue post-it note. The team has to make a snap decision to either agree to somehow fit it in with the other tasks - or to punt it off into the next sprint. If you're above the line on the burndown chart - you should probably punt it. If you're under the line, you can see if you have the capacity to add it to this sprint.
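The snap decision on a found task can be written down as a rule of thumb. This is a sketch, not a policy: the three outcomes are hypothetical labels, and "capacity" is assumed to mean that adding the task's hours still leaves the team at or under the ideal line.

```python
def found_task_decision(actual_hours_left, ideal_hours_left, task_hours):
    """Suggest what to do with a found task (blue post-it).

    Returns "punt" when the team is already above the line, "accept" when
    the task fits under the line, and "discuss" when it is a close call.
    """
    if actual_hours_left > ideal_hours_left:
        return "punt"
    if actual_hours_left + task_hours <= ideal_hours_left:
        return "accept"
    return "discuss"


print(found_task_decision(500, 480, 8))  # already above the line
print(found_task_decision(440, 480, 8))  # comfortably under the line
print(found_task_decision(476, 480, 8))  # under the line, but the task would push us over
```

In practice the engineers still make the call at the board; the numbers just tell them how much room they have.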
Nobody other than the engineers is empowered to put a blue post-it onto the board. But once it's there - it has to make it down to the end with all the others...when the engineers make a commitment - they are expected to make it.
If it's a long sprint, we might spend an hour in a "mid-sprint review" - where we identify any problems if we're "above the line" on the sprint. Maybe reshuffle some tasks between people if one person is behind and another ahead.
End of sprint
At the end of the sprint - hopefully - all of the tasks are done, and with luck, we have some pink "BONUS" and blue "FOUND TASK" post-its down there too.
On the last day, we have two more meetings - one is spent convincing each other (and management) that we met the "CONDITIONS OF SATISFACTION" of the stories. This entails live demonstrations, display of screenshots, pointing to documentation on the Wiki...that kind of thing. There are cheers and applause when people stand up and explain what they did.
We lock down the repository, make a release with the resulting code and branch the version control system ready to start again on the first Monday of the next sprint.
The second meeting is "Sprint retrospective" where we ask (and answer) the three ritual questions: "What did we do right? What did we do wrong? What didn't we do?"...after a bad sprint, this is a long, painful meeting. After a good one, it's a lot of fun.
We often have an end-of-sprint party...free food, etc. Sadly, the government frowned on us spending their money on beer - so we don't do that anymore! :-(
That's the basics from an engineer's point of view. There is more to it than this though. Managers have "Epic" Stories...stories that are broken down into lesser stories. Engineers can also write stories - either simple ones describing a bug fix - larger ones describing things that we know we need to get done - but which are "invisible" to managers - even Epic ones if we have some major new idea to put forward.
When there are lots of teams - and lots of scrums, the scrum-masters get together once a day and have a "Scrum of scrums" where they describe what their teams are doing, how it's going, what the cross-team blockers are, etc. Similarly, they have scrum-of-scrum planning, scrum-of-scrum sprint retrospectives, etc. In a giant company, there might even be scrum-of-scrum-of-scrum levels.
Isn't it a little...um...silly?
The rituals, post-it notes and sharpies, pigs and chickens, planning poker?! Sometimes the process seems a little crazy - but it works! It's like magic. The necessity of standing up in front of your team-mates and explaining what you did yesterday is a strong incentive not to goof-off. If you read email, surfed the web, ate junk food in the kitchen and talked to your buddies all day - then when you come to stand at the scrum board the next day and can only cross 2 hours off an 8 hour task...you really feel the pain...and you can see at a glance that you've got to pull it back by staying on-task.
I get a real high from showing off my latest stuff at the end-of-sprint meeting.
You get into the rhythm of sprint cycles - we mostly have 3 week sprints and the pressure towards the end is balanced by the relative relaxation at the start of the next.
People seem to be VASTLY more productive in a scrum system. Also, people feel more empowered. The system doesn't allow managers to pressure you into taking on more work than you can manage. The responsibility to make deadlines is entirely of your own making - so when things go wrong, you have nobody to blame but yourself. Of course, if you habitually quote high numbers of hours to do work then that's noticeable from the ratio of planning poker points to hours estimated. It's no protection against you getting a crappy annual review! But it's hard to be a slacker when it's your peers who are seeing your estimates and sharing the pain when you fail.
If you don't make the work by the end of the sprint, you have NOBODY to blame but yourself. It's a horrible feeling. People work extra hours just to avoid that happening...nobody forces them to crunch - they do it because they feel bad about making a promise and not meeting it. You said how long it would take you - you swore on a pile of post-it notes that your estimate was good - and you discussed it with everyone else. You could have chosen to do less. The pressure not to overrun is gigantic...and when you do overrun, you rapidly learn to make larger estimates next time!
NOBODY can derail a story once it's started - NOBODY can inject a new "Panic" story in mid-sprint. Life is calmer, more measured.
Despite that seeming-rigidity, the system is "AGILE" because managers can inject new stories and redirect the team at 2 to 4 weeks' notice. They know that the team will become free to consider new stories at the end of the sprint. They know to the day when their new requirement will be met.
They can plan ahead very accurately because they have the stories in the queue, marked by "difficulty" from the planning poker - and the stories that have been completed - along with their difficulty scores AND the actual number of man-hours the team took to complete them. They can use statistics to track the ratio between the difficulty number and the man-hours to get a pretty good idea of how long some major new "Epic" story/feature will take to become a reality. (They talk about "team momentum"). They can also note when the ratio of hours consumed to difficulty-estimated gets worse - a sign that something is going wrong with the team - a morale issue maybe? Someone on the team not pulling their weight perhaps?
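The statistics involved can be very simple. As a rough sketch (the function names and the sample numbers are invented, and a real manager would probably look at trends per sprint rather than one overall average), the ratio and the forecast look like this:

```python
def hours_per_point(completed):
    """Average man-hours per difficulty point.

    completed is a list of (points, actual_hours) pairs for finished stories.
    """
    total_points = sum(points for points, _hours in completed)
    total_hours = sum(hours for _points, hours in completed)
    if total_points == 0:
        raise ValueError("need at least one completed story with points")
    return total_hours / total_points


def forecast_hours(epic_points, completed):
    """Estimate the man-hours for an Epic from the team's past ratio."""
    return epic_points * hours_per_point(completed)


# Hypothetical history: (difficulty points, actual man-hours).
history = [(3, 12), (10, 40), (5, 20)]
print(hours_per_point(history))     # 72 hours / 18 points = 4.0
print(forecast_hours(50, history))  # 50 points * 4.0 = 200.0
```

If the hours-per-point figure creeps upward from sprint to sprint, that is the "ratio gets worse" signal described above.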
Since each sprint is atomic - you (theoretically) have a releasable product at the end of each sprint cycle (although in practice, we generally insert a 2-week mini-sprint before each release - and in those sprints, the only stories we work are bug fixes).
Releases always happen on-time - although they may not have all of the features ("stories") you wanted them to have.
Conclusion - Does It Work?
I honestly believe in this system. Having worked this way in two companies, it's MAGICAL. Everyone loves it, we're happier, we get more work done, management are happy. It just works! People who come for job interviews often ask "Do you guys do scrum?" and they're very happy when you say "Yes!". Lots of people (myself included) would never consider going back to the bad old days.
What I describe here is pretty much the standard flavor of scrum...but there are other variations.