Steve Bowes-Phipps, a senior consultant at PTS Consulting Group, discusses operational risk assessment in the data centre…
We all think we know what a risk is. You can probably think of two or three examples straight away and most of the time, when discussing data centre risks, we might hear about the risk of the power failing or the risk of someone unauthorised trying to gain access to the facility. Those are valid risks; however, they only tell part of the story for a professional data centre manager.
Raising risks, recording them and executing mitigation and/or elimination strategies is more time-consuming than it seems. A risk is still a risk, even if you feel you have a containment strategy in place. Just because you build a Tier IV data centre, it doesn’t mean that the risk of power or cooling failure goes away – it just becomes less service-affecting, so long as it doesn’t get out of control.
A root-and-branch risk review is a key procedure that most data centre managers should carry out, either on a rolling basis with limited scope (‘Waves’) or as a significant project to overhaul a risk-sensitive data centre. Categorising and prioritising risks is fundamental to ensuring that the limited resources at your disposal are spent wisely and with the greatest operational impact.
Clarify the purpose
The fundamental principle is to align the risk assessment objectives with what the data centre is there for – why does it exist? What is the business purpose?
Once that has been clarified, the risks that may prevent the data centre from meeting its business purpose can be identified from an operational perspective. It is vital that operational staff take a full part in this process, because it is they who know what works and what doesn’t. They all have their own ‘stories’ of the organisation, and that retained knowledge needs to be put to full use.
It is amazing how, even in a data centre with a good operating record, there are still a significant number of risks that have never been addressed properly and sit out there waiting to trip the organisation up. It is important to get a third party view because familiarity with daily operations can cause ‘risk blindness’.
While risks are typically identified through the data centre’s operational staff, the prioritisation of effort to mitigate or eliminate them is driven by senior management and the ‘risk appetite’ of the business. In this way, resources are directed where they matter most and have senior backing to get things done.
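As a rough illustration of prioritising against a risk appetite, the minimal sketch below scores each entry in a small risk register and keeps only those above a threshold. The 1-to-5 likelihood and impact scales, the multiplication used for scoring, the threshold value and all names (Risk, prioritise, register) are assumptions made for this example; they are not taken from the article or from any particular standard.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)
    raised_by: str   # the operational team that identified the risk

    @property
    def score(self) -> int:
        # Assumed scoring rule: likelihood multiplied by impact.
        return self.likelihood * self.impact


def prioritise(risks, risk_appetite):
    """Return the risks scoring above the appetite threshold, highest first."""
    above = [r for r in risks if r.score > risk_appetite]
    return sorted(above, key=lambda r: r.score, reverse=True)


# Hypothetical register entries; the ratings are invented for illustration.
register = [
    Risk("Utility power failure", likelihood=2, impact=5, raised_by="facilities"),
    Risk("Unauthorised access to the facility", likelihood=2, impact=4, raised_by="security"),
    Risk("Cooling failure", likelihood=1, impact=5, raised_by="facilities"),
]

for risk in prioritise(register, risk_appetite=6):
    print(f"{risk.score:>2}  {risk.name}")
```

In this sketch the operational staff supply the entries and ratings, while the threshold stands in for the appetite set by senior management. Raising or lowering it changes which risks receive attention first.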
Understanding the impact
The final piece of the puzzle is the review cycle, where randomly selected processes are checked to ensure they are being followed accurately. The aim here is not to catch anyone out, but to determine whether controls are being ignored through lax discipline or because more effective ways of working have been discovered. Processes should adapt to the business and, if there is a faster, more effective way of doing a task, an assessment and new control can be crafted to ensure that any risk to the business is fully understood and managed appropriately.
Communicating and fully engaging staff in this process has its ‘light bulb’ moments, where people see the impact of what they do, no matter how minor they think it is, and how it affects someone else’s role, the company’s goals or the customer experience. Sometimes all three!
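One simple way to choose processes for each review cycle is to draw them at random, so that no process is predictably exempt. The sketch below is only an assumption about how that could be done; the function name pick_for_review and the process names are hypothetical.

```python
import random


def pick_for_review(processes, sample_size, seed=None):
    """Pick distinct processes at random for this review cycle."""
    rng = random.Random(seed)  # pass a seed only if the draw must be repeatable
    k = min(sample_size, len(processes))  # never ask for more than exist
    return rng.sample(processes, k)


# Hypothetical process list for illustration.
processes = [
    "Visitor sign-in",
    "Generator test run",
    "Change approval",
    "Cable patching",
    "Backup media handling",
]

selected = pick_for_review(processes, sample_size=2)
print(selected)
```

Recording the outcome of each check against the process then shows, over time, which controls are routinely bypassed and may need redesigning.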
This post originated at Data Centre Management magazine, from the same publisher as The Stack.