Scientists in Italy have published a paper arguing that data centres can only grow to the sizes they will need to reach in future if they are unmanned and driven by analytical, automated systems.

The paper, Towards Operator-less Data Centers Through Data-Driven, Predictive, Proactive Autonomics, outlines the work of two researchers at the Department of Computer Science in the University of Pisa, who explain why the legendarily low-staffed data centre needs to completely divest itself of human dependence:

‘For the most part, current automated data center management tools are limited to low-level infrastructure provisioning, resource allocation, scheduling or monitoring tasks with no predictive capabilities. This leaves the brunt of the problem in detecting and resolving undesired behaviors to armies of operators who continuously monitor streams of data being displayed on monitors. Even at the highly optimistic rate of 26,000 servers managed per staffer, this situation is not sustainable if data centers are ever to be scaled to extreme dimensions.’

The team’s theory is that analysis of data centre logs and other event records can teach machine learning systems how to regulate and respond to system changes. The researchers envision what they describe as ‘Autonomics 2.0’: a data-driven, predictive and proactive holistic model which regards the data centre environment as an ecosystem, even taking into account physical and socio-political factors when forming new decisions.

The team performed an exploratory feature analysis, using the BigQuery data tool provided by the Google Cloud Platform. The challenge of analysing the logs and records in a data centre is formidable – the project required the analysis of 12 terabytes of data containing over 11 billion rows.

The machine learning model assigned to the task was an array of instances of Random Forest (RF) classifiers, each of which is itself an ensemble of decision trees.
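To make the idea of an array of forests concrete, here is a minimal toy sketch in plain Python. It is not the researchers' implementation: the feature names, the sample data, the use of one-split trees (stumps) in place of full decision trees, and the choice to train each forest on a random subsample and combine them by majority vote are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical per-machine features: (cpu_load, memory_use, disk_error_rate).
# Invented data for illustration only, not taken from the paper.
ROWS = [(0.10 + 0.03 * i, 0.20 + 0.02 * i, 0.15 + 0.02 * i) for i in range(10)] + \
       [(0.60 + 0.03 * i, 0.70 + 0.02 * i, 0.65 + 0.02 * i) for i in range(10)]
LABELS = ["ok"] * 10 + ["fail"] * 10


def majority(labels, fallback):
    """Most common label, or the fallback when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else fallback


def train_stump(rows, labels, feature_ids):
    """Depth-1 tree: pick the (feature, threshold) split with fewest training errors."""
    fallback = majority(labels, None)
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = majority(left, fallback)
            right_label = majority(right, fallback)
            errors = sum(y != left_label for y in left) + \
                     sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def train_forest(rows, labels, n_trees, rng):
    """Each tree sees a bootstrap sample of rows and a random subset of features."""
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample with replacement
        feats = rng.sample(range(n_features), k)         # random feature subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote across the trees of one forest."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]


def train_ensemble(rows, labels, n_forests, n_trees, seed):
    """An 'array' of forests, each trained on a different 70% subsample (an assumption)."""
    rng = random.Random(seed)
    size = max(2, int(0.7 * len(rows)))
    forests = []
    for _ in range(n_forests):
        idx = rng.sample(range(len(rows)), size)
        forests.append(train_forest([rows[i] for i in idx],
                                    [labels[i] for i in idx], n_trees, rng))
    return forests


def predict_ensemble(forests, row):
    """Majority vote across the forests."""
    return Counter(predict_forest(f, row) for f in forests).most_common(1)[0][0]


if __name__ == "__main__":
    forests = train_ensemble(ROWS, LABELS, n_forests=3, n_trees=15, seed=0)
    print(predict_ensemble(forests, (0.0, 0.0, 0.0)))
    print(predict_ensemble(forests, (1.0, 1.0, 1.0)))
```

In practice a library implementation such as scikit-learn's RandomForestClassifier would replace the hand-written stumps; the sketch only shows how votes flow from trees to forests to the final prediction.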

‘Our results show that models with sufficient predictive powers can be built based on data found in typical logs and that they can form the basis of an effective data-driven, predictive and proactive autonomic manager.’