DCIM and Thermal Monitoring as a Data Center Monitoring Tool
Data center infrastructure management (DCIM) comprises processes and technologies used to monitor, measure, and manage a data center's physical and virtual infrastructure. DCIM utilizes tools, software, and applications to keep track of a range of key areas in data centers, such as:
1. Physical security: This includes preventing unauthorized access and malicious activities through the use of cameras, door lock monitoring, and other sensors that detect intrusions and provide alerts.
2. Environmental security: Environmental conditions such as dust, humidity, and temperature can be hazardous and threaten the smooth running of data centers. DCIM systems help reduce equipment risk from these hazards. Equipment in data centers requires a significant amount of energy, much of which is released as heat, so it is crucial to cool and monitor the airflow in a data center to prevent equipment from overheating. Humidity must also be kept within a specific range to avoid corrosion.
3. Asset security: DCIM monitors data center assets such as storage devices, network equipment, and servers to identify unauthorized activities occurring on critical assets.
4. Logical security: DCIM monitors system logs, network traffic, and other data to alert personnel to suspicious activities and to data or network breaches.
DCIM uses monitoring tools to gather asset data that helps improve operational efficiency across the entire organization. These tools can be divided into different levels, including:
1. Enterprise-class monitoring: Many nodes across numerous data centers can be managed through monitoring, data collection, thresholds, and alerts. This covers environmental sensors, busways, busbars/bus ducts, UPS, PDUs, Remote Power Panels (RPPs), and Computer Room Air Handling (CRAH) units, as well as multiple protocols such as Modbus, SNMP, and Building Automation and Control Network (BACnet).
2. Data distribution and storage management.
3. Infrastructure monitoring.
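The "thresholds and alerts" part of enterprise-class monitoring can be illustrated with a minimal Python sketch. It assumes readings have already been collected (for example over SNMP or Modbus) into a simple dictionary. The point names, the `Threshold` class, and the limit values are hypothetical and chosen only for illustration, not taken from any particular DCIM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Warning and critical limits for one monitored point (illustrative values)."""
    warning: float
    critical: float


def classify(reading: float, limits: Threshold) -> str:
    """Map a single reading to an alert level."""
    if reading >= limits.critical:
        return "critical"
    if reading >= limits.warning:
        return "warning"
    return "normal"


def evaluate(readings: dict[str, float], limits: dict[str, Threshold]) -> dict[str, str]:
    """Return the alert level for every reading that has a configured threshold."""
    return {
        point: classify(value, limits[point])
        for point, value in readings.items()
        if point in limits
    }


# Hypothetical point names and limits; real values depend on the equipment.
limits = {
    "pdu-a1/inlet_temp_c": Threshold(warning=27.0, critical=32.0),
    "ups-1/battery_temp_c": Threshold(warning=30.0, critical=40.0),
}
readings = {"pdu-a1/inlet_temp_c": 28.5, "ups-1/battery_temp_c": 24.0}

alerts = evaluate(readings, limits)
print(alerts)
```

In this sketch the PDU inlet reading falls between its warning and critical limits, so it is reported as a warning, while the UPS reading stays normal.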
Thermal monitoring is the process of collecting and analyzing data about the temperature of critical electrical assets in a data center.
Thermal monitoring is used in data centers to monitor the temperature of the electrical equipment and infrastructure to prevent overheating and, therefore, equipment failure. This is an important element that contributes to power availability and system uptime.
Temperature rise, especially on electrical joints, is a warning sign that a potential issue such as a loose or compromised connection may be present. Left unchecked, it increases the risk of electrical equipment failure, which in turn puts personnel working on or around these critical electrical assets at higher risk. Monitoring the temperature of electrical joints therefore helps not only to avoid downtime, corrupt data, and damage to critical infrastructure, but also to keep personnel safe around assets.
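One way to spot an abnormal temperature rise on a joint is to compare it with similar joints that carry a similar load, such as the three phases of the same connection. The Python sketch below flags any joint that reads well above the group median. The `hot_joints` function, the phase labels, and the 10 °C margin are assumptions made for illustration; appropriate margins depend on the equipment and the applicable maintenance guidance.

```python
from statistics import median


def hot_joints(temps_c: dict[str, float], max_delta_c: float = 10.0) -> list[str]:
    """Return joints reading more than max_delta_c above the group median.

    Assumes all joints in temps_c carry a comparable load, so a large
    difference points to the joint itself rather than to load imbalance.
    """
    baseline = median(temps_c.values())
    return sorted(name for name, t in temps_c.items() if t - baseline > max_delta_c)


# Hypothetical readings for the three phases of one connection, in deg C.
phases = {"L1": 41.0, "L2": 43.5, "L3": 58.0}

flagged = hot_joints(phases)
print(flagged)
```

Here the median is 43.5 °C, so only the L3 joint, at 14.5 °C above it, is flagged for inspection.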
Data center operators face several challenges, but equipment overheating is one of the most critical. Overheating equipment can lead to unplanned downtime, which has a detrimental effect on service reliability for customers and leads to significant financial and reputational costs. As reliance on data increases, there is a greater need for technology such as continuous thermal monitoring to help prevent outages and avoid unplanned downtime.
The adoption of thermal monitoring in data centers is accelerating because it is helping engineering teams minimize equipment damage and reduce the likelihood of outages that can result from undetected faults.
Thermal monitoring can be implemented in data centers in a variety of ways, which include:
Continuous Thermal Monitoring (CTM): CTM is a condition-based monitoring approach that can take the place of periodic inspection using infrared (IR) imaging cameras. It is a proactive way of monitoring the temperature of electrical infrastructure in data centers and other industries that rely on critical infrastructure. Sensors continuously measure the temperature of electrical assets across the data center, providing real-time data on asset health and alerting personnel to temperature rises before they exceed safe thresholds. The data from these sensors can then be gathered and analyzed to identify potential faults and support informed decisions. The sensors can be integrated into SCADA/BMS systems, providing alarms, notifications, trends, and analysis that help with predictive maintenance.
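To show how continuous readings can warn of a rise before a threshold is exceeded, the following Python sketch fits a straight line to recent samples and estimates how long it would take to reach a limit if the trend continued. It is a simplified illustration under stated assumptions, not how any specific CTM product works: the function names, the sample data, and the 70 °C limit are all hypothetical, and a linear trend is only a rough approximation of real thermal behavior.

```python
from typing import Optional


def slope_per_minute(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of temperature (deg C) against time (minutes)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den


def minutes_to_limit(samples: list[tuple[float, float]], limit_c: float) -> Optional[float]:
    """Estimate minutes until limit_c is reached if the current trend continues.

    Returns 0.0 if the latest reading is already at or above the limit,
    and None if the temperature is not rising.
    """
    latest = samples[-1][1]
    if latest >= limit_c:
        return 0.0
    rate = slope_per_minute(samples)
    if rate <= 0:
        return None
    return (limit_c - latest) / rate


# Hypothetical samples: (minutes since start, joint temperature in deg C).
history = [(0.0, 50.0), (10.0, 51.0), (20.0, 52.0), (30.0, 53.0)]

eta = minutes_to_limit(history, limit_c=70.0)
print(eta)
```

With these samples the joint warms by about 0.1 °C per minute, so the estimate is roughly 170 minutes of lead time, which is the kind of early warning a periodic inspection would not provide.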
Thermal imaging cameras: Using thermal imaging cameras, or IR thermography, is another thermal monitoring method. These cameras capture images of the heat that electrical equipment emits, revealing hot spots and other issues that are not visible to the naked eye. This approach has historically been popular but is rapidly being replaced by more predictive approaches such as CTM, outlined above.
Audits and maintenance: This is a preventive maintenance approach carried out at regular intervals to ensure that cooling systems, HVAC (Heating, Ventilation and Air Conditioning), and other critical infrastructure are operating optimally.
Building greater resilience in data centers is critical for owners and operators to run reliable and sustainable facilities that meet future demands. Maintaining efficiency and electrical safety is essential; monitoring the temperature of critical assets therefore helps operators understand, in advance of an outage, where failures in critical equipment are likely to occur. The alerts from temperature monitoring provide information that can be used to schedule predictive maintenance and give operational personnel a more proactive approach.