Application Kata "Comparative Estimation"

In software development, estimates are used at least to assess the value of a requirement and the budget required to implement it. However, as countless teams can attest, such estimates are very, very difficult to make. Nevertheless, product owners and developers are constantly asked to provide them.

Absolute estimate


A popular method for estimating the expected effort required to implement a requirement is the use of story points (SP). Instead of estimating real time in hours or days, an abstract metric is used. Whether a requirement estimated at 5 SP takes 5 hours or 5 days or 10 days is not known and not important. What is important is that 5 SP is more than 3 SP and less than 10 SP.

At least, that is the theory. In practice, however, whoever needs an estimate will soon start converting SP estimates into durations. The reasoning goes: "If it took the team 10 days to implement 85 SP last time, it will take them 10 days again next time. The team delivers 8.5 SP per day."

This reasoning is understandable, but thoroughly counterproductive. It puts pressure on development as soon as an estimate is turned into a deadline: "Oh, so you're estimating this at 110 SP? That means you'll have it ready in 13 days."
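The conversion criticized here boils down to a single division. The following Python lines only reproduce the arithmetic of the example (85 SP in 10 days, a new estimate of 110 SP) to show how quickly an estimate turns into a deadline; they illustrate the anti-pattern and are not a recommendation.

```python
import math

# Numbers from the example above; the conversion itself is the anti-pattern.
delivered_sp = 85
elapsed_days = 10
velocity = delivered_sp / elapsed_days  # 8.5 SP per day

# Turning a new estimate into a "deadline":
new_estimate_sp = 110
deadline_days = math.ceil(new_estimate_sp / velocity)  # 110 / 8.5 ≈ 12.9, rounded up

print(f"velocity = {velocity} SP/day, 'deadline' = {deadline_days} days")
```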

Although SP estimates should arguably never be used this way, they very often are.

The situation is similar when estimates are based on T-shirt sizes, e.g. S, M, L, XL.

The core problem lies in the absolute metric: a specific value (e.g. 8 or M) is used to express the size of the required budget.

Comparative estimate


To avoid abstract estimates being mapped to absolute time and money values, relative estimates can be used. Instead of assigning a number to a requirement, it is simply compared with other requirements. The result of these comparisons is an ordering.

Absolute estimation: Requirements A, B, C are estimated at A=5, B=8, C=3.

Relative estimation: Requirements A, B, C are estimated with B>A>C.

If A+B+C then takes 10 days to implement, nothing can be carried over to the next estimate of X, Y, Z! And that is a feature, not a bug.

Relative (or comparative) estimation works by comparing alternatives in pairs. For the requirements A, B, C, D, these pairs have to be compared: A:B, A:C, A:D, B:C, B:D, C:D. That is 3+2+1 = 6 comparisons for 4 alternatives. More generally, n alternatives require (n-1)+(n-2)+...+1 = n(n-1)/2 comparisons.
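Generating the pairs to be compared is straightforward. A minimal Python sketch (the function names are illustrative, not prescribed by the kata) using `itertools.combinations`, which yields the pairs in exactly the order listed above:

```python
from itertools import combinations


def comparison_pairs(items):
    """Return every unordered pair of items that has to be compared."""
    return list(combinations(items, 2))


def max_comparisons(n):
    """(n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons for n alternatives."""
    return n * (n - 1) // 2


pairs = comparison_pairs(["A", "B", "C", "D"])
print(pairs)
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
print(len(pairs), max_comparisons(4))  # 6 6
```

With the upper limit of 10 items per project mentioned in the requirements, a team member would face at most `max_comparisons(10)`, i.e. 45, comparisons.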

For each pair, a decision must be made as to which alternative is classified as "larger", "heavier", "more valuable", "more costly"... Example:

  1. A:B -> B>A
  2. A:C -> A>C
  3. A:D -> D>A
  4. B:C -> B>C*
  5. B:D -> D>B
  6. C:D -> D>C*

The resulting sequence: D,B,A,C
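The text does not say how the sequence is derived from the pairwise decisions. One simple approach, assumed here, is to count for each item how many comparisons it "won". For a complete and consistent set of comparisons the win counts are n-1, n-2, ..., 0, so sorting by them reproduces the order; for a complete set, inconsistent answers necessarily produce at least one tie.

```python
from collections import Counter

# Pairwise results of the example above, each written as (larger, smaller).
results = [("B", "A"), ("A", "C"), ("D", "A"), ("B", "C"), ("D", "B"), ("D", "C")]


def order_by_wins(items, results):
    """Sort items by how many comparisons they were judged larger in."""
    wins = Counter(larger for larger, _ in results)
    # Items that never won get 0 from the Counter; sorted() keeps ties stable.
    return sorted(items, key=lambda item: wins[item], reverse=True)


print(order_by_wins(["A", "B", "C", "D"], results))  # ['D', 'B', 'A', 'C']
```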

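Comparisons 4. and 6. are marked with * because a different result would cause an inconsistency. Their results follow from earlier comparisons, so strictly speaking they would not have needed to be made at all.

Since B>A (1.) and A>C (2.), B must also be estimated > C (4.) by transitivity. Likewise, D>A (3.) and A>C (2.) imply D>C (6.).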

Perhaps then comparisons 4. and 6. should not be offered at all. On the other hand, it might be worthwhile to allow inconsistencies to occur because they indicate previously hidden confusion or misunderstandings that need to be resolved before a reasonable estimate can be made.
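Whether a comparison is already implied, and whether a new answer contradicts earlier ones, can be checked by treating every result as a directed edge "larger → smaller" and searching for a path. The following Python sketch (hypothetical function names) uses a simple depth-first search; it could support either strategy: skipping implied pairs, or deliberately asking them to surface inconsistencies.

```python
def implied(results, larger, smaller):
    """True if larger > smaller follows from the results by transitivity."""
    # Build the "is larger than" graph from (larger, smaller) pairs.
    graph = {}
    for a, b in results:
        graph.setdefault(a, set()).add(b)
    # Depth-first search for a path from `larger` to `smaller`.
    stack, seen = [larger], set()
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt == smaller:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def contradicts(results, larger, smaller):
    """True if answering larger > smaller conflicts with earlier results."""
    return implied(results, smaller, larger)


first_three = [("B", "A"), ("A", "C"), ("D", "A")]  # comparisons 1.-3.
print(implied(first_three, "B", "C"))      # True: comparison 4. could be skipped
print(implied(first_three, "D", "C"))      # True: comparison 6. could be skipped
print(implied(first_three, "B", "D"))      # False: comparison 5. must be asked
print(contradicts(first_three, "C", "B"))  # True: C>B would be inconsistent
```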

If more than one person makes a relative estimate of the alternatives, the overall result can be calculated by adding up ranks, e.g.

[Tables: the rankings of Estimator 1, Estimator 2 and Estimator 3]
The total rank of an alternative is the sum of its ranks, e.g. 1+1+2 = 4 for D.
[Table: rank of each alternative for Estimator 1, Estimator 2 and Estimator 3, and its total rank]

Sorted by total rank (lowest total first), the result of the relative estimate is: D(4), B(6), A(9), C(11)
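Adding up ranks takes only a few lines. In this Python sketch the three rankings are hypothetical (the individual rankings of the original example are not given); they were chosen only so that the totals match the result above. Ties in the total are not treated specially here.

```python
def total_ranks(rankings):
    """Sum each item's 1-based rank over all estimators; lowest total first."""
    totals = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            totals[item] = totals.get(item, 0) + rank
    return sorted(totals.items(), key=lambda entry: entry[1])


# Hypothetical rankings of three estimators, chosen to reproduce the totals above.
rankings = [
    ["D", "B", "A", "C"],
    ["D", "A", "B", "C"],
    ["B", "D", "C", "A"],
]
print(total_ranks(rankings))  # [('D', 4), ('B', 6), ('A', 9), ('C', 11)]
```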



The relative estimate can be used to assess the expected effort of implementing requirements. One way to interpret the example result above would be: D will take longer to implement than B, which will take longer than A, which in turn will take longer than C.

However, it is not known how long D or B will take, nor how much longer D will take than B.

Alternatively, such an estimate can be used to assess the value a requirement could have for a customer (or a sales department). The above result could then be interpreted as follows: D will deliver more value than B, which will deliver more value than A, which in turn will be more valuable than C.

However, it is not known how much absolute value a requirement has.

This is also a feature, not a flaw of this approach!

Absolute estimates are difficult, perhaps even impossible, within reasonable margins of error. This applies even to assigning abstract numbers to requirements. Equating D with 13 and B with 8 claims considerable precision (D would be about 1.6 times as large as B), even if SPs are only meant to express relative values.

But a gut feeling is enough for a purely comparative assessment in pairs. That is honest, because developers have no more than a gut feeling when estimating (and even that could be wrong).

The decision B>A in step 1. above stands for no more than "My gut feeling is that B will take longer than A. But by how much? I don't know." or "My gut feeling is that B is more valuable than A. But by how much? I don't know."

Perhaps a comparison could be qualified with a factor such as "B takes a little longer" or "B takes much longer". But more than 2 or 3 factors (e.g. "more" (>), "much more" (>>), "enormously more" (>>>)) make no sense. A gut feeling cannot be more precise than that.

Such a factor would then of course influence the overall rank. It could be reflected by greater distances between alternatives (> = 1, >> = 3, >>> = 10). Instead of receiving contiguous ranks, the alternatives could be ranked sparsely. Example: D>B>>A>>>C could be translated into

1. D
2. B
5. A
15. C
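The sparse ranking can be computed by adding the distance of each relation to the previous rank. This Python sketch assumes the distances given above (1, 3, 10) and a chain written as a string such as `D>B>>A>>>C`; the function name is hypothetical, item names must not contain `>`, and relations other than `>`, `>>`, `>>>` raise a `KeyError`.

```python
import re

# Distances assumed from the text: > = 1, >> = 3, >>> = 10.
DISTANCE = {">": 1, ">>": 3, ">>>": 10}


def sparse_ranks(chain):
    """Translate a chain such as "D>B>>A>>>C" into sparse ranks."""
    # Splitting on a capturing group keeps the separators:
    # "D>B>>A>>>C" -> ['D', '>', 'B', '>>', 'A', '>>>', 'C']
    parts = re.split(r"(>+)", chain)
    rank = 1
    ranks = {parts[0]: rank}
    for relation, item in zip(parts[1::2], parts[2::2]):
        rank += DISTANCE[relation]
        ranks[item] = rank
    return ranks


print(sparse_ranks("D>B>>A>>>C"))  # {'D': 1, 'B': 2, 'A': 5, 'C': 15}
```

Summing such sparse ranks across estimators works exactly like summing contiguous ranks; the larger gaps simply give strong preferences more weight.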

Your task

Functional requirements

Write a program to support comparative estimation in a team.

Each estimation begins with entering an optional title for the estimation project (e.g. "Sprint 12") and the items (user stories) to be compared. The number of items does not have to be limited. Once all items have been entered, the maximum number of comparisons is displayed.

The estimation project is then distributed to the team members (e.g. by telling them its ID or sending them a URL) so that they can perform their comparisons.

Team members compare the project items in pairs. This can be as simple as just selecting which item in a pair is considered larger/more valuable. Or a factor can also be specified, e.g. "much bigger" or "much, much more valuable".

The overall ranking of the items can be viewed at any time. It is updated each time a team member submits their comparisons. (If a team member submits comparisons for the same project more than once, the new comparisons overwrite the previous ones.)

Users identify themselves simply by entering their e-mail address. No further authentication or authorization is required.
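The overwrite rule falls out naturally if submissions are stored per project in a mapping keyed by e-mail address. The following in-memory Python sketch is one possible design, not a prescribed one: class and method names are hypothetical, each member's ranking is derived by counting won comparisons, and the overall ranking by summing ranks, as described in the first part of the kata.

```python
from collections import Counter


class Project:
    """Hypothetical in-memory model of one estimation project."""

    def __init__(self, title, items):
        self.title = title
        self.items = items
        # Keyed by e-mail address: a second submission by the same
        # team member replaces their first one.
        self.submissions = {}

    def submit(self, email, results):
        """Store one member's comparisons as (larger, smaller) pairs."""
        self.submissions[email] = results

    def ranking_of(self, results):
        """Order the items by the number of comparisons they won."""
        wins = Counter(larger for larger, _ in results)
        return sorted(self.items, key=lambda item: wins[item], reverse=True)

    def total_ranking(self):
        """Sum each item's rank over all submissions; lowest total first."""
        totals = {item: 0 for item in self.items}
        for results in self.submissions.values():
            for rank, item in enumerate(self.ranking_of(results), start=1):
                totals[item] += rank
        return sorted(self.items, key=lambda item: totals[item])


project = Project("Sprint 12", ["A", "B", "C", "D"])
# A first attempt by one team member ...
project.submit("ann@example.com", [("A", "B"), ("A", "C"), ("A", "D"),
                                   ("B", "C"), ("B", "D"), ("C", "D")])
# ... is overwritten by a resubmission with the answers of the console example.
project.submit("ann@example.com", [("B", "A"), ("C", "A"), ("D", "A"),
                                   ("C", "B"), ("D", "B"), ("D", "C")])
print(len(project.submissions))  # 1
print(project.total_ranking())   # ['D', 'C', 'B', 'A']
```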

Non-functional requirements

The program should run on an intranet. Several team members can work on their comparisons simultaneously on their computers. An evaluation is possible at any time.

The number of team members is low (1..10). The number of items per project is low (2..10).

Each team estimates between 4 and 100 projects per year.

Supporting team members who are distributed across the Internet would be nice, but is not necessary.

Projects are identified by an automatically generated ID. This ID is linked to the creator of the project. Project lists only show projects that belong to the current user. However, any valid project ID can be entered for a comparative estimate. No authorization is necessary. Anyone who happens to know a project ID is considered sufficiently authorized by this knowledge to make contributions.
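The ID and ownership rules can be sketched as a small in-memory registry. It assumes sequential integer IDs (as suggested by "Project #3" in the console sketch); the class and method names are hypothetical, and a real tool would have to persist the projects between runs (e.g. in a file or database), which is omitted here. Since anyone who knows an ID may contribute, guessable sequential IDs fit the requirements; if the tool were exposed on the Internet, random IDs (e.g. from Python's `secrets` module) would be harder to guess.

```python
import itertools
from dataclasses import dataclass


@dataclass
class EstimationProject:
    id: int
    owner: str          # e-mail address of the creator
    title: str
    items: list[str]


class ProjectRegistry:
    """Hypothetical in-memory registry of estimation projects."""

    def __init__(self):
        self._next_id = itertools.count(1)  # automatically generated IDs 1, 2, 3, ...
        self._projects = {}

    def create(self, owner, title, items):
        project = EstimationProject(next(self._next_id), owner, title, items)
        self._projects[project.id] = project
        return project

    def list_for(self, user):
        """Project lists only show projects created by the current user."""
        return [p for p in self._projects.values() if p.owner == user]

    def get(self, project_id):
        """No authorization check: knowing the ID is sufficient."""
        return self._projects.get(project_id)


registry = ProjectRegistry()
registry.create("ann@example.com", "Sprint 11", ["A", "B"])
registry.create("bob@example.com", "Refactoring", ["X", "Y", "Z"])
sprint12 = registry.create("ann@example.com", "Sprint 12", ["A", "B", "C", "D"])

print(sprint12.id)                                              # 3
print([p.title for p in registry.list_for("ann@example.com")])  # ['Sprint 11', 'Sprint 12']
print(registry.get(2).title)                                    # Refactoring
```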

Console user interface sketch

The app does not need a fancy GUI; a simple console user interface is sufficient. It could look like this, where ¶ marks the point at which the user presses the ENTER key after an input:
$ compest¶
N(ew estimation project, C(ompare, E(valuate project, L(ist projects, eX(it: N¶
Project #3
Title (optional): Sprint 12¶
Item A: Add button¶
Item B: Change background¶
Item C: Personalize UI¶
Item D: Add persistence¶
Item E:¶
4 items, max. number of comparisons: 6
Ok [Y/n]:¶
N(ew estimation project, C(ompare, E(valuate project, L(ist projects, eX(it: X¶
$ compest¶
N(ew estimation project, C(ompare, E(valuate project, L(ist projects, eX(it: C¶
Project number: 3¶
Sprint 12
Compare A: "Add button" to B: "Change background": B¶
Compare A: "Add button" to C: "Personalize UI": C¶
Compare A: "Add button" to D: "Add persistence": D¶
Compare B: "Change background" to C: "Personalize UI": C¶
Compare B: "Change background" to D: "Add persistence": D¶
Compare C: "Personalize UI" to D: "Add persistence": D¶
Your ranking:
 1. D: Add persistence
 2. C: Personalize UI
 3. B: Change background
 4. A: Add button
N(ew estimation project, C(ompare, E(valuate project, L(ist projects, eX(it: X¶
$ compest¶
$ compest¶
$ compest¶
N(ew estimation project, C(ompare, E(valuate project, L(ist projects, eX(it: e¶
Project number: 3¶
Total ranking of Sprint 12:
 1. D: Add persistence
 2. C: Personalize UI
 3. A: Add button
 4. B: Change background
N(ew estimation project, C(ompare, E(valuate project, L(ist projects, eX(it: L¶
1. Sprint 11
2. Refactoring
3. Sprint 12
N(ew estimation project, C(ompare, E(valuate project, L(ist projects, eX(it: x¶

Variation: More complex user interface

A web UI or desktop GUI is of course welcome.