Data Centers in the Wild: A Large Performance Study

With the advancement of virtualization technologies and the benefit of economies of scale, industries are seeking scalable IT solutions, such as data centers hosted either in-house or by a third party. Despite the ubiquity of data centers, little is known about their in-production performance. This study fills this gap by conducting a large-scale performance survey of several thousand data center servers over a period of two years. We provide an in-depth analysis of the diversity and time evolution of existing data centers by statistically characterizing typical data center server workloads and highlighting similarities and differences in the usage of basic resource components, including CPU, memory, disk, and file system. In addition, we quantify the time variability and seasonality of resource demands and how they change across geographical locations as well as across virtual and physical operating systems. This survey provides a baseline for workload calibration, which is critical for the development of scalable and efficient resource management and capacity planning for future data centers.

By: R. Birke, L.Y. Chen, E. Smirni

Published in: RZ3820 in 2012


This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

