What is not quite so commonly discussed is that outcomes are affected not only by the inherent variation of the durations themselves; the task durations may also be affected by the outcomes of other tasks or by outside influences.

For example, we may have a case where we know that if one task takes longer than expected, another task will also take longer. In practice we see this all the time: a requirements definition suddenly becomes much bigger than planned, with additional features, which in turn makes design take longer than planned because there is now much more for the solution to cover. And so on.

In a general way, we accept this because we do risk identification and try to work that into our estimates. This is a start, but integrating probabilities in our heads is difficult indeed.

In fact, we humans are notoriously bad at interpreting and accounting for the effects of such dependencies or correlations in anything but trivial networks. When these connections form bona fide feedback loops, it gets a lot worse. Now, Monte Carlo simulation is not necessarily the best tool for analysing networks with extensive feedback loops (System Dynamics, for example, is far better suited to that task), but we can nevertheless address a certain level of correlation.

Some authors claim that correlations between tasks in a project schedule may matter more to the distribution of the expected finish times than the choice of input distribution (e.g., David M. Wall, *Distributions and correlations in Monte Carlo simulation*, Construction Management and Economics, 1997, 15:3, 241-258).

If this is indeed the case, then understanding how to apply such correlations to a project schedule when doing a Monte Carlo simulation would certainly be useful.

One way to explore this subject is to run an experiment. In the following I will use the correlation group feature of Full Monte, a Monte Carlo simulation tool for MS-Project, to illustrate how correlations may affect a schedule. I am grateful for the many useful comments received from Tony Welsh of Barbecana, the creator of Full Monte, clarifying the Correlation Group feature in Full Monte and making this experiment possible.

## Objectives of the experiment

- Use a simple set of experiments with Full Monte to demonstrate how to technically apply Correlation Groups in Full Monte
- Demonstrate the effects of correlation with two scenarios (correlated tasks are in parallel, correlated tasks are sequential)
- See if there are any observable changes in the expected finish dates
- Present key learning points for practical application of correlations to real projects

## Project structure for all experiments

- Two tasks, A & B, with identical, deterministic duration of 209 weeks (large time frame makes it easier to see effects even if subtle).
- Correlation between the two tasks is either 0% (none) or 100% (strong).
- Tasks are either in parallel (A and B occur at the same time) or in sequence (A occurs before B).
- This gives us four different scenarios (S):
- S1 – Parallel, 0% correlation
- S2 – Parallel, 100% correlation
- S3 – Sequential, 0% correlation
- S4 – Sequential, 100% correlation

Note: I used a fixed seed for the runs in order to remove spurious variation from the effects of random seed application (small effect with large number of variations, but let’s only vary the real experimental variables).
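Before walking through the tool, the four scenarios can also be reproduced outside Full Monte in a few lines of code. The sketch below assumes a ±10% Uniform range around the 209-week estimate (the experiment relies on Full Monte's default percentages, which may differ); the scenario logic — `max` for parallel tasks, a sum for sequential tasks, a copied sample for 100% correlation — is the point, not the exact numbers.

```python
import numpy as np

# Re-creation of the four scenarios outside Full Monte. The +/-10%
# Uniform range around the 209-week estimate is an assumption; the
# experiment uses Full Monte's default percentages.
rng = np.random.default_rng(42)  # fixed seed, as in the experiment
n = 10_000
lo, hi = 209 * 0.9, 209 * 1.1

a = rng.uniform(lo, hi, n)        # task A
b_indep = rng.uniform(lo, hi, n)  # task B, independent of A
b_corr = a                        # task B, 100% correlated with A

s1 = np.maximum(a, b_indep)  # S1: parallel, 0% correlation
s2 = np.maximum(a, b_corr)   # S2: parallel, 100% correlation
s3 = a + b_indep             # S3: sequential, 0% correlation
s4 = a + b_corr              # S4: sequential, 100% correlation

for name, x in [("S1", s1), ("S2", s2), ("S3", s3), ("S4", s4)]:
    print(f"{name}: mean = {x.mean():7.2f}  std = {x.std():5.2f} weeks")
```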

## S1 – Parallel tasks, 0% correlation

This is the baseline where A and B take place at the same time, i.e., the project structure has two parallel tasks.

The tasks are set up in Full Monte in the Edit screen with a Uniform distribution. The Most Likely estimate is greyed out because this simply uses the deterministic estimate of 209 weeks. I left the default percentages for Optimistic and Pessimistic as suggested by the software, simply because I don’t care about the range at this point. We just want to see the main behaviour of using correlations, so the actual variation does not really matter.

As we can see, there is no Correlation Group specified for the baseline.

Next I set up the run in the Risk and Analysis screen. I only care about finish times, not cost, so I excluded cost. Similarly, I don’t care about sensitivity for this experiment. The choice of 10,000 simulations is arbitrary; I just want a high enough number to stabilize the results. For a two-task project this should be plenty (you can test the assumption by gradually stepping up the number of simulations and observing the point where the standard deviation of the outcomes stabilizes, i.e., stops changing).
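That stabilization check can be sketched directly: step up the number of simulations and watch the sample standard deviation settle. A toy version on a two-task parallel project with Uniform(0, 1) durations (my illustration, not Full Monte output):

```python
import numpy as np

# Convergence check: increase the number of simulations and watch the
# sample standard deviation of the finish times settle down.
# Toy project: two parallel, independent Uniform(0, 1) tasks.
rng = np.random.default_rng(1)
for n in (100, 1_000, 10_000, 100_000):
    finish = np.maximum(rng.uniform(size=n), rng.uniform(size=n))
    print(f"{n:>7} runs: std = {finish.std():.4f}")
```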

Clicking ‘OK’ kicks off the simulation, and I got an outcome with a mean finish date of 10/2/2016 with a standard deviation of 49.52 weeks, plus various confidence levels for the distribution of expected finish dates. (I’ll summarize the results for S1 and S2 below for comparison.)

The most immediate and striking outcome is the shape of the distribution, which is now triangular.
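This triangular shape is what we would expect analytically: the finish of two parallel, independent tasks is the maximum of two Uniform draws, whose density rises linearly toward the pessimistic end, with its mean two thirds of the way along the range. A quick numerical check on a toy Uniform(0, 1) range rather than the actual dates:

```python
import numpy as np

# The parallel finish is max(A, B). For two independent Uniform(0, 1)
# draws this has density 2x: a right triangle rising toward the
# pessimistic end, with mean 2/3 instead of the single task's 1/2.
rng = np.random.default_rng(0)
m = np.maximum(rng.uniform(size=100_000), rng.uniform(size=100_000))

counts, _ = np.histogram(m, bins=10, range=(0.0, 1.0))
print("mean:", round(m.mean(), 3))  # close to 2/3
print("rising bin counts:", counts)
```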

Next I ran S2 which factors in a correlation.

## S2 – Parallel tasks, 100% correlation

Once again, the tasks are set up as for S1, but this time we add a Correlation Group, which I simply called S2 so I could keep the graphics apart later, and then associated the 100% correlation as our experimental parameter value.

The simulation setup remains the same for all the scenarios.

Now for the outcome. As we can see, this time the outcome distribution is a Uniform distribution.
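This also follows from the construction: with 100% correlation, B duplicates A on every iteration, so max(A, B) collapses to A itself and keeps A's Uniform shape. A minimal check on a toy Uniform(0, 1) range:

```python
import numpy as np

# With 100% correlation, B is a copy of A on every iteration, so the
# parallel finish max(A, B) is just A and stays Uniform.
rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, 100_000)
b = a                        # perfect positive correlation
finish = np.maximum(a, b)

counts, _ = np.histogram(finish, bins=10, range=(0.0, 1.0))
print("flat bin counts:", counts)
print("std:", round(finish.std(), 3))  # ~1/sqrt(12), the Uniform's std
```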

## Comparing S1 and S2

I have summarized the results from the histograms in the table below, plus added results for something I call S1′, which we’ll get to shortly.

| Scenario | 0% | 10% | 50% | 90% | 100% | Mean | Std.Dev (weeks) |
|----------|------|------|------|------|------|------|------|
| S1  | 02/10/14 | 05/11/15 | 12/07/16 | 11/17/17 | 02/19/18 | 10/02/16 | 49.52 |
| S2  | 01/06/14 | 06/20/14 | 01/27/16 | 09/07/17 | 02/26/18 | 01/31/16 | 61.19 |
| S1′ | 01/06/14 | 06/27/14 | 02/09/16 | 09/14/17 | 02/26/18 | 02/07/16 | 61.19 |

(The date columns are the finish dates at each confidence level.)

The observation we can make here is that when the parallel tasks are correlated, the outcome exhibits higher variation (the standard deviation grows from 49.52 to 61.19 weeks) while the mean finish moves earlier. This is because when we correlate 100%, as in this case, we give pretty much total control to the first task (A) and basically end up sampling from one distribution.

We can prove this by re-running S1 with just one task (I simply set B to 0 weeks duration). The results are marked S1′ and as we can see, the distribution is very similar to S2.

It’s a bit easier to see this if instead of dates we calculate the durations in days from start to the dates for the respective confidence values and then show that in a bar graph like this:

Here we see the effects of the correlations: the ranges for the confidence intervals changed when we added correlations (and we see that S2 is quite similar to the ‘one task’ project per S1′).

## S3 – Sequential tasks, 0% correlation

Now, let’s look at the effects when the two tasks are sequential. First the baseline, which is identical to S1 except that task B now follows task A.

Obviously, the project is now taking longer than for S1 and S2. Also, the distribution of expected finish dates is tending towards normal which we would expect from the Central Limit Theorem.
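Analytically, the uncorrelated sequential finish is the sum of two independent Uniform draws, which follows a symmetric triangular (Irwin–Hall) distribution — the first step toward Normal that the Central Limit Theorem promises. A quick check with toy Uniform(0, 1) durations:

```python
import numpy as np

# Sequential project: finish = A + B. For independent Uniform(0, 1)
# tasks the sum follows a symmetric triangular (Irwin-Hall)
# distribution peaking at 1.0 -- the first step toward Normal.
rng = np.random.default_rng(0)
total = rng.uniform(size=100_000) + rng.uniform(size=100_000)

counts, _ = np.histogram(total, bins=10, range=(0.0, 2.0))
print("mean:", round(total.mean(), 3))  # close to 1.0
print("peaked bin counts:", counts)
```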

But what happens when we add correlation?

## S4 – Sequential tasks, 100% correlation

I added a Correlation Group S4 with 100% correlation between A and B and ran another simulation.

For S4, we see that the distribution of expected finish dates tends towards a Uniform distribution. This will not be the case if either of the tasks has a distribution other than Uniform (try it; the outcomes will tend towards Normal).
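With 100% correlation, the sequential total becomes A + A = 2A, which is again Uniform, and its standard deviation is twice a single task's — a factor of √2 larger than the uncorrelated sum's. That ratio matches the experiment's figures: 122.62 / 86.19 ≈ 1.42 ≈ √2. A toy check:

```python
import numpy as np

# Perfect correlation in sequence: total = A + A = 2A, still Uniform,
# with std 2*sigma versus sqrt(2)*sigma for the independent sum --
# a ratio of sqrt(2), about 1.41.
rng = np.random.default_rng(0)
a = rng.uniform(size=100_000)
total_indep = a + rng.uniform(size=100_000)  # S3-style: independent B
total_corr = a + a                           # S4-style: B copies A
print("std ratio:", round(total_corr.std() / total_indep.std(), 2))
```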

## Comparing S3 and S4

I put the two outcomes in a table, and we can now see that the addition of correlation changed the risk profile of the project. The dates at the lower confidence levels moved earlier, the dates at 90% and above moved later, and the variability (or volatility) increased, as seen from the larger standard deviation.

| Scenario | 0% | 10% | 50% | 90% | 100% | Mean | Std.Dev (weeks) |
|----------|------|------|------|------|------|------|------|
| S3 | 02/22/16 | 11/19/17 | 02/02/20 | 05/05/22 | 01/28/24 | 02/09/20 | 86.19 |
| S4 | 12/28/15 | 11/07/16 | 01/31/20 | 04/18/23 | 03/18/24 | 02/02/20 | 122.62 |

(The date columns are the finish dates at each confidence level.)

Again, it’s a little easier to see the differences if we create a bar graph as for S1 and S2.

The effect tends in the same direction as for the parallel task example.

## Learning points

Obviously, the effect of between-task correlations will differ depending upon both the strength and the direction of the correlation. In these two examples, I only tried a strong, positive correlation in order to test how to set up the problem in the tool and to get an idea of the resulting behaviours of such correlations.
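For intermediate correlation strengths, one common construction is a Gaussian copula — an illustration only, not necessarily what Full Monte does internally: mix two standard normals with weight ρ, then map each back to a Uniform marginal. Note that the resulting Pearson correlation of the uniforms comes out slightly below the input ρ:

```python
import numpy as np
from scipy.stats import norm  # assumes scipy is available

# Gaussian-copula sketch for partial correlation between two
# Uniform(0, 1) task durations (illustrative construction).
rho = 0.5  # target correlation in the underlying normals
rng = np.random.default_rng(0)
z1 = rng.standard_normal(100_000)
z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(100_000)
a = norm.cdf(z1)  # Uniform(0, 1) marginal
b = norm.cdf(z2)  # Uniform(0, 1) marginal, correlated with a
r = np.corrcoef(a, b)[0, 1]
print("Pearson correlation of the uniforms:", round(r, 2))
```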

The effects can be subtle and may not always be so easy to spot in large project structures.

In practice, I would build up the project structure and estimates in stages by creating a baseline and then adding variations one at a time to develop a progression of the effects.

As any experimenter knows, if you make more than one change, you cannot know which one specifically is responsible for the observed changes in outcomes.

That said, there are experimental techniques, such as the Taguchi method, that introduce more than one change at a time, but it is not clear how such a method could easily be applied to project structures, given the critically low availability of prior, empirical information about durations and their distributions.

Suffice it to say that setting up the problem in a tool like Full Monte is technically not terribly difficult. The hard part is knowing what to set up in the first place. It is worth noting that the concept of a correlation group makes it easy to link many tasks together to model correlations, rather than just two tasks as in the example above. Knowing when this is a smart thing to do is another story.