When people talk about AI, two words tend to come up: chips and Nvidia. Jensen Huang’s company is the largest company on earth primarily thanks to sales of the GPUs (Graphics Processing Units) that power the AI workloads on which the world increasingly relies. Most people also know that the infrastructure needed to run these AI workloads can be summed up in two words: data centers. But while most people are familiar with what a data center is, few understand how one operates and what its key components are.
Over the past few weeks, I have been looking into three companies, two of which operate in the HVAC industry. I initially wanted to look into these companies because they have been growing at a good pace and their stocks are off by more than 40% from their highs. However, I soon discovered the source of their rapid growth: data centers! The reason the data center cooling market has been growing rapidly is somewhat straightforward: outside of semiconductors and general IT equipment, the most significant expense in a data center is cooling infrastructure. Finding a reliable estimate for market size is challenging because the industry is still in its infancy. However, some sources indicate that around $450 billion of Capex is planned for data center infrastructure over the next couple of years. Capex tends to be divided as follows in an average data center construction project:
60%-70% to the equipment contained within the racks (you’ll understand what this is in a bit)
15%-20% to cooling infrastructure
10%-20% to the construction of the building itself and other expenses
Assuming the $450 billion estimate is correct, that would mean that around $67-$90 billion will be spent on cooling infrastructure over the next couple of years. While this might not sound impressive considering the TAMs being discussed around AI, we should not forget that this market was mostly non-existent 10 years ago. I wouldn’t take these numbers at face value either, but it’s undeniable that the money spent on data center cooling infrastructure is expected to be significant over the coming years.
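For those who like to see the arithmetic spelled out, here’s a quick back-of-envelope sketch of that estimate (the $450 billion figure and the 15%-20% cooling share are the rough numbers cited above, not hard data):

```python
# Back-of-envelope estimate of the cooling capex implied by the figures above.
total_capex = 450e9            # planned data center capex over the next couple of years (rough estimate)
cooling_share = (0.15, 0.20)   # cooling's typical share of a data center build's capex

low, high = (total_capex * s for s in cooling_share)
print(f"Implied cooling capex: ${low / 1e9:.1f}B - ${high / 1e9:.1f}B")
# -> Implied cooling capex: $67.5B - $90.0B
```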
The explosion of said expenditure is coming from two main players: hyperscalers and co-locators. Whilst I’d imagine you’d know what a hyperscaler is by now (Amazon, Google, Microsoft, Meta…), a co-locator might be somewhat unknown to you. A co-locator is simply a company that builds the “shell” of a data center and provides related horizontal services (such as cooling, security, and network connections) on behalf of several end consumers. A co-locator’s customer is responsible for providing the remaining equipment (mainly what goes inside the rack). Co-locators are the reason why some companies can “bypass” hyperscalers without facing the risk of excess capacity inherent to data center infrastructure operations. Managing capacity is pretty tough in this industry, and not every company can excel at it:
I think one of the least understood [parts] about AWS over the last 18 years has been what a massive logistics challenge it is to run that business. If you end up actually with too little capacity, then you have service disruptions, which really nobody does because it means companies can’t scale their applications.
So, most companies deliver more capacity than they need. However, if you deliver too much capacity, the economics are pretty woeful and you don’t like the returns or the operating income. And we have built models over a long period of time that are algorithmic and sophisticated that land the right amount of capacity.
Source: Andy Jassy, Amazon’s CEO, during the Q2 2024 earnings call
Co-locators can be thought of as the demand aggregators for the part of the industry that wants to “own” its infrastructure.
While the data center cooling market may seem interesting at first glance, it comes with its fair share of challenges that set its constituents apart from a case like Nvidia’s. Recall that Nvidia has accrued a good portion of the AI benefits thanks to its market-leading GPUs and the inability of its competitors to offer (thus far) something similar. This is unlikely to be the case for data center cooling providers, which should serve as a valuable lesson for any investor: large and growing markets do not necessarily result in attractive investment opportunities. It’s not just about the demand but about the demand in the context of the available supply! (I recommend reading Capital Returns by Edward Chancellor to understand why).
To better understand the data center cooling market and its opportunities and challenges, I’ll go over the following topics in this article:
The basics of data center infrastructure
The importance of cooling
The evolution of cooling needs and infrastructure
Opportunities and challenges
Without further ado, let’s get started.
1. The basics of data center infrastructure
While I’ve met many people who know what a data center is, I haven't met many who understand how data centers operate. The basic structure of a data center is pretty straightforward. It consists of a shell (i.e., the building) that houses numerous racks, typically installed in an aisle configuration, and the cooling infrastructure that ensures this equipment can run at optimal temperatures:
“Racks” are simply vertical frame structures (think of them as cabinets) that house the servers, networking equipment, storage equipment, power infrastructure, and a portion of the cooling infrastructure. It’s within the servers contained in a rack that one finds the famous CPUs and GPUs that have made Nvidia the most valuable company on earth (at least as of today).
Before understanding why cooling is key, it is essential to understand the two main types of data centers that coexist today. I’ve named them general-purpose data centers and AI or high-performance compute (‘HPC’) data centers. The difference between the two is key, not only for chip manufacturers (AI data centers mostly run on GPUs, whereas general-purpose data centers mostly run on CPUs), but also for cooling infrastructure providers (AI data centers require significantly more cooling infrastructure).
A general-purpose data center is essentially a data center designed to support the “basic” operations of an enterprise. This has important implications because, even though it processes data, it doesn’t have to do so continuously, and its servers don't tend to run at peak utilization. Additionally, the data it processes tends to be somewhat structured.
An AI data center, however, primarily processes unstructured data and does so 24/7. This means chips might run at 90%-100% utilization for several days straight, and therefore, the cooling infrastructure must dissipate “peak heat” at all times. In short, AI workloads require significantly more computing power than general-purpose data centers, which is a key difference that translates into increased heat generation and, consequently, higher cooling needs.
2. The importance of cooling
If you’ve used any piece of technological equipment for a sustained period, or if you’ve run several compute-heavy programs on your computer simultaneously, you’ll know that technological devices eventually get hot. The reason is that most of the electrical power consumed by a computing device is ultimately converted into heat. This is a consequence of electrical resistance: more computing work requires more electrical current, and pushing that current through the IT equipment dissipates energy as heat.
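A simple way to internalize this: practically every watt a server draws ends up as heat that the cooling system must remove. A tiny illustration with made-up numbers:

```python
# Practically every watt a server draws ends up as heat the cooling system must remove.
server_power_draw_w = 1_000   # hypothetical server drawing 1 kW
heat_fraction = 0.99          # nearly all electrical power is dissipated as heat

heat_to_remove_w = server_power_draw_w * heat_fraction
print(f"Cooling must remove ~{heat_to_remove_w:.0f} W of heat for this server")
# -> Cooling must remove ~990 W of heat for this server
```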
If this heat is not correctly dissipated, the chips and the equipment can surpass optimal temperatures, which can have undesirable effects:
Thermal throttling: When chips get hot, they run slower to stay within their specified temperature limits (which typically top out around 100°C). This means they start to perform below their potential
Downtime: If thermal throttling is not enough to stop a chip from overheating, it might end up shutting down, leading to very costly downtime
Hardware durability: Operating in very hot environments reduces the lifetime of the hardware and results in higher maintenance and replacement costs down the line
Cooling infrastructure is responsible for maintaining the optimal temperature in a data center, allowing equipment to run optimally and reliably. But it’s not just about cooling; it’s about doing it in an energy-efficient way. Let me introduce a very important metric here: PUE (Power Usage Effectiveness). PUE is a metric that all data center operators track, as it measures the ratio of the total energy consumed by the data center to the energy consumed by the IT equipment. The minimum this metric can show is 1, which theoretically means that all the energy consumed by the data center is being used by the IT equipment.
In the real world, this metric will always be higher than 1 because a portion of the total power consumption is consumed by the cooling infrastructure. The goal is to bring it as close as possible to 1 while having the optimal temperature in the data center. Getting close to 1 at the expense of chips overheating and performing below their optimal level is not ideal either.
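To make the mechanics concrete, here’s a minimal sketch of how PUE is calculated; the facility numbers are entirely hypothetical and chosen only for illustration:

```python
# PUE = total facility power / IT equipment power.
# All numbers below are hypothetical, chosen only to show the mechanics.
it_power_kw = 10_000       # servers, storage, networking
cooling_power_kw = 3_500   # cooling infrastructure
other_overhead_kw = 500    # lighting, power distribution losses, etc.

total_facility_kw = it_power_kw + cooling_power_kw + other_overhead_kw
pue = total_facility_kw / it_power_kw
print(f"PUE = {pue:.2f}")  # -> PUE = 1.40 (i.e., 40% overhead on top of IT power)
```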
We’ve already seen that cooling infrastructure makes up a significant portion of data center Capex (15%-20%), but what about Opex? A pretty significant chunk as well. Power is the most significant operating expense for a data center, typically accounting for 40% to 60% of total Opex. Cooling infrastructure typically accounts for around 30%-40% of total power usage, which means that approximately 12% to 24% of total data center Opex goes to powering cooling infrastructure. Typical PUEs in the industry range from 1.1 to 1.6, which is consistent with roughly 60%-90% of total power consumption being allocated to IT equipment (around 70% at the midpoint).
If we add this extremely high Opex to the significant Capex expense, and consider that an additional 10%-15% of Opex is allocated to maintaining cooling equipment, there’s no denying that cooling is front and center in data center costs. Adding it all together, a typical data center will spend around 15% of Capex on cooling infrastructure and around 30% of Opex to power and maintain this equipment. Many people discuss the significant amount hyperscalers spend on chips, but few discuss the substantial amount they spend on cooling.
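Here is how those ballpark percentages stack up to the roughly 30% of Opex figure (again, these are the rough ranges cited above, not measured data):

```python
# Rough share of total data center Opex attributable to cooling,
# stacking the ballpark ranges cited above.
power_share_of_opex = (0.40, 0.60)        # power as a share of total Opex
cooling_share_of_power = (0.30, 0.40)     # cooling's share of total power draw
cooling_maintenance_share = (0.10, 0.15)  # maintaining cooling equipment, as a share of Opex

low = power_share_of_opex[0] * cooling_share_of_power[0] + cooling_maintenance_share[0]
high = power_share_of_opex[1] * cooling_share_of_power[1] + cooling_maintenance_share[1]
print(f"Cooling-related Opex: ~{low:.0%} to ~{high:.0%} of total Opex")
# -> Cooling-related Opex: ~22% to ~39% of total Opex (roughly 30% around the midpoint)
```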
Cooling has always been crucial in the data center industry, but AI has been a significant inflection point. AI workloads require much more computing power, and higher computing power has been achieved by increasing density within the racks (putting chips closer together increases computing power). So far, so good. The problem is that this increased density results in stratospheric amounts of heat being generated by AI racks. ASML mentioned during its latest Capital Markets Day that the limit to AI is not computing power but rather cost and energy consumption:

The numbers here are quite staggering. The industry typically uses a metric called kW per rack, which measures the electrical power consumed by the equipment installed in a rack at any given moment. General-purpose data centers used to average somewhere between 3 and 20 kW per rack (the latter for the most advanced cloud deployments), whereas AI and HPC data centers tend to average somewhere around 30-80 kW per rack (higher than 80 kW is also normal), and this metric keeps increasing as companies look for more computing power. This means that AI data centers can consume up to 10x more power per rack than their general-purpose counterparts.
To understand the scale of this power consumption, it’s worth comparing it to something we all use and are familiar with: our homes. The average US home draws around 1.5 kW, meaning that the power used by a single HPC rack could supply roughly 40 homes (assuming 60 kW per rack). No wonder people are starting to worry about whether current power generation and grid infrastructure can support AI.
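A quick sanity check on that comparison, using the same rough assumptions (a 60 kW rack and an average household draw of about 1.5 kW):

```python
# How many average US homes does a dense AI/HPC rack's power draw equal?
rack_power_kw = 60     # assumed power draw of a dense AI/HPC rack
home_draw_kw = 1.5     # rough average continuous draw of a US home

homes_equivalent = rack_power_kw / home_draw_kw
print(f"A {rack_power_kw} kW rack draws as much power as ~{homes_equivalent:.0f} homes")
# -> A 60 kW rack draws as much power as ~40 homes
```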
Now, rack density is a double whammy for power usage. The higher the density within a rack, the more power the IT equipment draws, and all of that extra power becomes extra heat, which in turn forces the cooling equipment to consume even more power to remove it. This means that the importance of cooling in the data center goes far beyond providing optimal temperatures; it’s also closely tied to sustainability.
3. The evolution of cooling needs and infrastructure
Now that we understand the crucial role cooling plays in digital infrastructure, let’s examine its evolution over the years. Before the arrival of AI, data centers were typically cooled like a regular building: with air cooling provided by HVAC manufacturers (many of which have been calling data centers a key market for a while). To maximize efficiency, data centers use a hot/cold aisle configuration. The backs of the racks (through which heat dissipates) face each other, creating a hot aisle, whereas the front of the racks receives an inflow of cool air. Simple air cooling infrastructure + this configuration was pretty much all it took for the servers to operate at optimal temperatures:
Air cooling, however, is falling behind the needs of AI data centers. The reason is that higher computing power generates more heat in the same space, meaning that significantly more heat per rack must be dissipated. A new method, and new infrastructure, is therefore required: enter liquid cooling.
There are two main types of liquid cooling (so far):
Direct liquid cooling (‘DLC’)
Immersion cooling
Let’s start with DLC. In direct liquid cooling, water or a refrigerant is circulated through cold plates mounted directly on the chips (the source of the heat) to carry the heat away. The warm liquid flows out of the rack and is replaced by cool liquid in a continuous loop:
Direct liquid cooling is a much more efficient solution for AI workloads for several reasons. First, it targets the source of the heat directly. Second, water/refrigerant carries heat away far more effectively than air. For AI data centers, air simply can’t do the job.
Immersion cooling takes it to the next level. Racks are entirely submerged in a dielectric liquid (it needs to be dielectric so that it doesn’t conduct electricity and damage the IT equipment). This is, in theory, the most effective solution because the liquid is in direct contact with the equipment and captures essentially 100% of the heat generated by the servers:
Liquid cooling is in its infancy (around 95% of data centers are still air cooled), but it’s quickly taking center stage with the arrival of AI. All modern AI data centers require a hybrid air/liquid solution.
There’s just 1% of the data center industry that’s liquid cooled, and still, 99% has to be liquid-cooled [over] the next 10-15 years.
Source: Former Data Center Operations Manager at Meta
4. Sizing the opportunity and understanding the challenges
I believe there are several opportunities and challenges in the industry. While the timing is anyone’s guess, it does seem that the cooling opportunity AI brings with it is significant. With hyperscalers and co-locators expected to invest significant sums in AI data centers over the coming years, a substantial portion of this spending is likely to be allocated to cooling infrastructure, thereby significantly expanding the industry. With these data centers working day in and day out, the maintenance opportunity also seems significant over time.
Now, there are a couple of things to be aware of here. I would categorize the challenges into two main groups:
The evolution of the technology
Expanding supply coupled with a lack of differentiation
Let’s start with the first. As discussed throughout the article, cooling technology is evolving rapidly. Over the past five years, we’ve gone from air cooling to direct liquid cooling to immersion cooling. The industry is hungry for more computing power, which means more heat will be generated. However, the industry is also seeking more efficient solutions and cooling methods, making innovation a requirement. This ultimately means that whatever solution is most used today may be displaced within a couple of years, and nobody really knows how the technology will evolve.
This “flaw” is much more dangerous when coupled with the second one. There’s currently a supply crunch in the industry because hyperscalers are investing significant amounts of money into data centers, and the cooling industry is unprepared. This supply/demand imbalance might create a false impression that providers are differentiated. While there might be some differentiation in terms of customized solutions (as a former Meta employee said: “Those kind of custom solutions and those kind of custom partnerships is what the hyperscalers look for”), commodity equipment is not differentiated, and numerous companies can offer it.
The cooling industry, however, is aggressively investing in capacity to satisfy the needs of the data center industry, which may eventually lead to a situation where demand and supply balance out and providers start to compete on price to win contracts. This would be accelerated if hyperscalers and co-locators decide to slow down the buildout. Right now, margins and backlogs can be a tad misleading because demand outpaces supply, but it’s unlikely that the industry will remain in said supply crunch forever. This might also be what the market is worried about: the stocks of these infrastructure providers are significantly off their highs after delivering staggering returns over the past couple of years, even though backlogs and growth remain at a good level.
Two of these companies disappointed on margins, which honestly is consistent with competitive intensity increasing as supply catches up with demand:
Even though everything has a price, this seems to be a good example of an okay industry being disguised as a great industry by a temporary supply/demand imbalance. The imbalance will not correct itself overnight, but when it does, the industry could look very different than it does today. Operating margins are currently acceptable for these providers, but this is a capital-intensive industry where excess supply can quickly erode them.
I am currently looking into one of the three cooling infrastructure providers I shared in the graph above (in many cases, this is not their only business), which has a somewhat differentiated approach to data center cooling infrastructure and has been growing staggeringly fast. I hope to bring an in-depth report for paid subscribers soon.
Have a great 4th of July,
Leandro