By now all of you must have heard about the debacle that is the latest census. The intent of going electronic was right, but the execution left a lot to be desired! Unfortunately, the issues we had with the site have had a number of negative consequences. A full review will be commissioned, but in the meantime we can expect “heads to roll” and the IT supplier is likely to have to pay some sort of compensation. However, from my perspective, the biggest fallout is the dent this has put in the general public’s confidence. Indeed, it will be a pity if this lack of confidence stops future technology projects from getting off the ground.
Having looked at the fallout, let’s try to analyse what may have happened. I say ‘may’ because no formal investigation has been conducted and so there is no official word on the cause. From what we have read and heard, any one of the following scenarios could have occurred:
- A Distributed Denial of Service (DDoS) attack may have been launched from overseas, overwhelming either the internet connection to the website or the servers hosting the website
- A similar attack could have originated locally, from within Australia
- The web infrastructure hosting the website was not designed with enough capacity, and when a large portion of the population tried to log on at the same time on Tuesday evening, the website failed
- A fourth scenario is possible where a failure in one of the system components of the web infrastructure led to the outage, but this is perhaps the least likely of all the scenarios.
As stated before, without a full investigation being conducted, it is difficult to say which of the above may have eventuated.
Having discussed the possible scenarios above, let’s now look at the mitigations that could have been in place to prevent the issue. I will only concentrate on the first three as these are the likely scenarios:
- An international DDoS – technology providers offer many solutions to help mitigate DDoS attacks. Attacks that flood the internet connection to the website are best addressed in conjunction with the internet provider, while attacks that target the website itself can be addressed in conjunction with the website designer. To stop an overseas-based DDoS attack, your internet provider could simply filter out all requests to the website originating from an overseas IP address. The point here is that solutions to a DDoS attack are readily available.
- A local DDoS – the solutions discussed above are equally applicable to a local DDoS attack. The only exception is geographic filtering: blocking local IP addresses would be counterproductive, as it would also shut out the very people trying to complete the census. DDoS attacks are quite common now, and for something as prominent as the census website, protection against them should have been a mandatory requirement.
- Inadequate capacity within the web infrastructure – this again can, and should have been, addressed with relative ease. Estimating peak load (e.g. 70% of the population logging on between 7.30pm and 11.30pm on census night) and then testing the ability of the website to handle this load should have clearly revealed its robustness in this scenario. If issues were found, adding more capacity and/or applying other technology solutions such as load balancers should have addressed them. The point to remember here is that your estimate of peak load has to be accurate in order to model the right amount of traffic and subsequently detect and remediate any issues. I suspect that an inaccurate estimate may have led to some of the issues we saw.
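To make the peak-load point concrete, here is a rough back-of-the-envelope sketch in Python. Only the 70% participation and the four-hour window (7.30pm to 11.30pm) come from the example above. The number of households, the peak factor and the requests per session are made-up figures purely for illustration, not actual census numbers, and the function name is my own.

```python
from typing import NamedTuple


class PeakLoad(NamedTuple):
    sessions: float               # total sessions expected in the window
    avg_sessions_per_sec: float   # if arrivals were perfectly even
    peak_sessions_per_sec: float  # allowing for a busy spike
    peak_requests_per_sec: float  # what the web tier must actually serve


def estimate_peak_load(households, participation, window_hours,
                       peak_factor, requests_per_session):
    """Rough peak-load estimate for a fixed submission window.

    All inputs are assumptions chosen by whoever is doing the planning:
    - households: number of units expected to submit a form
    - participation: fraction of them submitting inside the window
    - window_hours: length of the window in hours
    - peak_factor: how much busier the worst moment is than the average
    - requests_per_session: HTTP requests one form submission generates
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    if not 0 <= participation <= 1:
        raise ValueError("participation must be between 0 and 1")

    sessions = households * participation
    avg = sessions / (window_hours * 3600)
    peak = avg * peak_factor
    return PeakLoad(sessions, avg, peak, peak * requests_per_session)


if __name__ == "__main__":
    # Hypothetical inputs: 10 million households, 70% participation,
    # a 4-hour window, a 3x spike and 20 requests per session.
    load = estimate_peak_load(10_000_000, 0.7, 4.0, 3.0, 20)
    print(f"Average sessions/sec: {load.avg_sessions_per_sec:,.0f}")
    print(f"Peak sessions/sec:    {load.peak_sessions_per_sec:,.0f}")
    print(f"Peak requests/sec:    {load.peak_requests_per_sec:,.0f}")
```

The resulting peak requests-per-second figure is the sort of number you would feed into a load test, ideally with some headroom on top. Note how sensitive the answer is to the peak factor: if traffic bunches up more than assumed, the infrastructure is undersized even though the testing "passed".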
The government’s initiative to use electronic means to carry out the census was a good call in utilising technology to make things more efficient and effective. Unfortunately, the execution was not quite on the mark and has potentially led to a general loss of confidence in projects of this type. A number of possible causes have been discussed, from overseas and local DDoS attacks to the infrastructure not being able to handle peak load. I have also discussed some simple steps to mitigate these issues: technology solutions to counter DDoS attacks, and load testing followed by remediation of any issues found. One can only hope that all parties involved learn some key lessons from this situation and that we do not get a repeat of this with future technology initiatives.