Memory Chips: Barriers to Innovation and Bridging the Gap
While logic chips have advanced in parallel processing, memory has failed to keep pace with that innovation, creating a bandwidth bottleneck for which a solution is becoming increasingly important.
Think about all the technological developments of the last ten years: faster smartphones, LLMs, cars that can parallel park themselves. Much of this progress has happened because logic chip innovation has continued to follow the path set by Moore's Law, albeit with some hitches along the way. Meanwhile, Logic's little brother, Memory, has been left out in the cold, and his innovation has slowed. Today I want to explore the physical reasons why this has happened and frame the investment opportunities within the memory chip space.
What are logic and memory chips? How do they differ, and how do they work together?
It's easy to think of the logic chip as the "brains" of an electronic device. Logic chips are more customizable and thus more complex than memory chips, and innovation in logic is centered primarily on speed and efficiency. Importantly, the logic chip performs the calculations, and the transistor shrink at each node jump (see here if I'm speaking gibberish) historically moved in line with decreasing power consumption, in what we'll later explain as Dennard scaling. Put more simply, smaller transistors on logic chips require less voltage to operate, which reduces dynamic power consumption even as the sheer number of transistors on a chip increases.
Note: Dynamic power consumption refers to the amount of power a digital system uses when its signals switch between 0 and 1. It's easy to think of this in the context of riding a bike. When you pedal (a transistor switches), you're using your body's energy. The faster you pedal, the more energy you need (the more often transistors switch on and off, the higher the frequency). If you're pedaling uphill, you need more force to turn the pedals (higher voltage is needed to move the electrical signals through the chip). And if you're carrying a very heavy backpack, it's harder to keep pedaling because of the weight on your back (similarly, the higher the capacitance, the more energy it takes to charge and discharge, consuming more power in the process).
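To make the bike math concrete, here is a minimal sketch of the textbook dynamic power relationship (power scales with activity, capacitance, voltage squared, and frequency) and of why classic Dennard scaling kept power density flat: shrink everything by a factor k and each transistor uses roughly 1/k² the power, while k² more of them fit in the same area. The numbers below are made up and purely illustrative.

```python
# A minimal sketch of dynamic power, P ~ alpha * C * V^2 * f, and of why
# ideal Dennard scaling kept power per unit area flat as transistors shrank.
# All values are illustrative, not measurements of any real chip.

def dynamic_power(alpha, capacitance, voltage, frequency):
    """Average switching power of a digital circuit (activity factor alpha)."""
    return alpha * capacitance * voltage ** 2 * frequency

# Baseline transistor (made-up, order-of-magnitude values).
alpha, C, V, f = 0.1, 1e-15, 1.0, 2e9        # farads, volts, hertz
p_old = dynamic_power(alpha, C, V, f)

# Ideal Dennard scaling: shrink linear dimensions by k = 1.4 (one "node").
# Capacitance and voltage fall by 1/k, frequency rises by k,
# and k^2 more transistors fit in the same area.
k = 1.4
p_new = dynamic_power(alpha, C / k, V / k, f * k)

print(f"power per transistor: {p_new / p_old:.2f}x")           # ~1/k^2
print(f"power per unit area:  {(p_new * k**2) / p_old:.2f}x")  # ~1.0, i.e. flat
```

The punchline: as long as voltage could keep dropping with each shrink, more transistors did not mean more heat per square millimeter. That assumption is exactly what broke down, as we'll see below.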
Memory chips, meanwhile, store data either permanently (NAND flash) or temporarily (DRAM). They hold the data that the logic chip needs to access in order to perform its computational tasks. During LLM training, for example, the model is constantly reading large batches of data and updating weights in real time. Because of that need for very low latency, DRAM (dynamic random access memory) is preferred for LLM training.
In keeping with the bicycle analogy, a memory chip's role can be either temporary or permanent. Imagine you are biking from San Francisco to Washington DC. You'll likely have a backpack carrying all your short-term needs (DRAM), and you may have a support vehicle carrying the extra gear that isn't immediately needed but could be called upon (NAND). If you want a quick sip of water, you'd rather keep the bottle in your backpack, in the same way a computer wants the data needed for the logic chip's quick computations to be easily accessible in DRAM.
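For a rough sense of why the backpack matters, here is an illustrative sketch of order-of-magnitude access latencies across the memory hierarchy. The exact figures vary widely by part and generation, so treat these as ballpark values rather than benchmarks.

```python
# Rough, order-of-magnitude access latencies -- illustrative values only,
# not benchmarks of any specific part.
latencies_ns = {
    "on-chip cache (SRAM)": 1,                  # the water bottle in your hand
    "DRAM (the backpack)": 100,                 # quick to reach, but volatile
    "NAND SSD (the support vehicle)": 100_000,  # capacious, but a long reach
}

dram = latencies_ns["DRAM (the backpack)"]
for tier, ns in latencies_ns.items():
    print(f"{tier:>32}: ~{ns:>9,} ns  ({ns / dram:g}x DRAM)")
```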
Barriers to Innovation in Both
In logic chips, we are potentially nearing the end of Moore’s Law. I say potentially because although we are approaching the limits of physics as we know it at 3nm, science has continuously engineered solutions to these problems across the semiconductor value chain, from lithography to process technology to design automation. That being said, the shrinking of transistors has posed a few problems:
Manufacturing precision: At very small nodes, manufacturing defects can increase as a result of the level of precision required to fab, lowering yields and driving up costs, first at the foundry level and then throughout the system
Quantum tunneling: “As chipmakers have squeezed ever more transistors onto a chip, transistors have gotten smaller, and the distances between different transistor regions have decreased. So today, electronic barriers that were once thick enough to block current are now so thin that electrons can barrel right through them.” (IEEE Spectrum)
Additionally, logic chip power, which has grown in multiples with the adoption of GPUs, has created thermal limits on innovation, sparking demand for cooling solutions. This has emerged as a key investable idea for KKR, which acquired CoolIT Systems for $270m last year, and for ExxonMobil and Intel, who partnered earlier this year on a liquid-cooling venture.
On the memory side of things, the end of Dennard scaling (2005-2006, by most accounts) was extremely important.
Very simply, Dennard scaling observed that as transistors shrank, power density would remain the same. By 2006, however, smaller no longer meant better. Smaller transistors led to increasing power consumption and heat generation due to leakage currents: unintended currents that flow even when a transistor is off, and which get worse as transistors shrink. As leakage pushed power consumption up, we arrived at the end of Dennard scaling.
While logic chip advancements were highly encouraged for the sake of creating faster technology (and rightly so), shrinking cells posed a fundamental challenge to the memory chip manufacturer. You can only shrink the capacitors (which store the data) so much before they can't hold charge, which creates problems with data retention. To offset that, you need more frequent refresh cycles, in which DRAM rewrites the data stored in its memory to prevent loss from charge leakage. But ya know what that means? More power consumption!
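Here is a back-of-the-envelope sketch of that refresh tax, with made-up energy numbers: if smaller, leakier cells can only hold their charge for half as long, the power spent just rewriting data roughly doubles.

```python
# A back-of-the-envelope sketch (made-up energy numbers) of why leakier,
# smaller DRAM cells mean more refresh power: if charge leaks faster, every
# row has to be rewritten more often just to keep the data alive.

def refresh_power(num_rows, energy_per_row_refresh_j, retention_ms):
    """Average power spent refreshing the whole array once per retention window."""
    refreshes_per_second = num_rows * (1000 / retention_ms)
    return refreshes_per_second * energy_per_row_refresh_j

rows = 65_536   # rows in a hypothetical DRAM bank
e_row = 50e-9   # joules per row refresh (illustrative)

print(f"64 ms retention: {refresh_power(rows, e_row, 64) * 1e3:.1f} mW")
print(f"32 ms retention: {refresh_power(rows, e_row, 32) * 1e3:.1f} mW")  # 2x the power
```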
On the NAND side, smaller transistors make it harder to trap electrons, increasing the risk of data loss. To offset this risk, NAND (long-term storage) needs more data management, all of which consumes power and slows performance. The key takeaway is that as memory cells shrink, keeping them reliable requires more power and overhead, and the result is a memory bottleneck that will require innovation to solve. So if the old-fashioned way of shrinking transistors to increase memory bandwidth no longer works, what other options have we created?
Shifting Towards 3D and Layered Memory Architectures
The shift towards 3D chip architectures in both logic and memory is a response to the limitations of the 2D design approach described above. In the realm of logic, 3D architectures such as the fin field-effect transistor (FinFET, see below for an image from an extremely helpful LinkedIn post) build the transistor structure vertically to improve performance and power efficiency.
Similarly, in memory-land, 3D NAND and High Bandwidth Memory (HBM), both of which use vertical stacking to increase capacity and bandwidth, are gaining ground. 3D NAND increases capacity and storage density by building upwards instead of just shrinking the cells. It is also more power efficient, reducing power consumption per bit stored, which makes it ideal for mobile devices and data centers, both of which demand high energy efficiency. That said, this approach doesn't quite solve the performance bottleneck of data transfer speed; in fact, latency is sometimes higher because the stacked architecture creates a longer pathway for data access.
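The capacity math behind "build upwards" is simple multiplication: layers times cells per layer times bits per cell. The sketch below uses a hypothetical die with made-up cell counts purely for illustration.

```python
# Illustrative arithmetic (made-up cell counts) for how 3D NAND grows density
# by stacking layers instead of shrinking the cell footprint.

def die_capacity_gbit(layers, cells_per_layer, bits_per_cell):
    return layers * cells_per_layer * bits_per_cell / 1e9

cells = 2e9  # memory cells per layer on a hypothetical die

print(f" 32-layer TLC die: {die_capacity_gbit(32,  cells, 3):.0f} Gbit")
print(f"176-layer TLC die: {die_capacity_gbit(176, cells, 3):.0f} Gbit")
```

Same footprint, a few more years of etching know-how, and roughly five times the bits per die, without having to make any single cell smaller or leakier.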
High Bandwidth Memory (HBM) is perhaps the most promising solution as it relates to our backpack (DRAM). Using through-silicon vias (TSVs), memory is once again built upwards (usually anywhere from 4 to 8 chips in a stack), allowing for closer proximity to the processor via a technology called 2.5D packaging. Exactly how this works is beyond the scope of what I've learned, but see below for a graph from Jefferies that helps visualize it.
The operative point is that close proximity to the processor reduces the distance data needs to travel, reducing latency and increasing bandwidth. This is mission critical for high-bandwidth workloads such as training LLMs, because GPUs need to access massive datasets quickly. As processing power has increased, HBM is an opportunity to bridge the gap between the demand for data and the speed at which memory can deliver it.
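The bandwidth advantage falls straight out of the arithmetic: a stack's bandwidth is its interface width times the per-pin data rate, and HBM's 1024-bit interface is more than an order of magnitude wider than a conventional DRAM chip's. The sketch below uses headline HBM2/HBM3 figures as a rough illustration; actual products vary.

```python
# Why the wide, stacked interface matters: a stack's bandwidth is just
# (interface width x per-pin data rate). The figures below are headline
# HBM2/HBM3 numbers used as a rough illustration.

def stack_bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    return bus_width_bits * gbps_per_pin / 8  # bits -> bytes

print(f"HBM2 stack (1024-bit @ 2.0 Gb/s/pin): {stack_bandwidth_gb_s(1024, 2.0):.0f} GB/s")
print(f"HBM3 stack (1024-bit @ 6.4 Gb/s/pin): {stack_bandwidth_gb_s(1024, 6.4):.0f} GB/s")
print(f"...and a GPU with 6 such stacks:      {6 * stack_bandwidth_gb_s(1024, 6.4):.0f} GB/s")
```

Put a handful of these stacks next to a GPU on a 2.5D interposer and you get the terabytes-per-second figures that accelerator spec sheets advertise.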
As you may have imagined, there is a high degree of complexity to this memory architecture, and it creates one of the biggest drawbacks to HBM.
HBM requires a sophisticated fabrication process that involves stacking multiple DRAM chips on a silicon interposer using through-silicon vias (TSVs) and micro bumps. TSVs are vertical electrical connections that pass through the silicon wafer, while micro bumps are tiny solder balls that connect the DRAM chips to the interposer. These techniques enable high-density and high-speed data transfer between the chips, but they also increase the cost and difficulty of production. Advanced packaging and metrology requirements are more capital intensive in HBM manufacturing given the 2.5/3D stacked die arrangement.
- Prakash Vijayan, Driehaus Capital Management: June 2024
Now there are other emerging memory technologies like MRAM, ReRAM, and Dodge RAM (last one was a joke), but we’ll skirt those for the sake of this article. But I do want to get at how this is an investable trend within the chip world.
Investing in the Memory Bottleneck
Companies engaged in HBM are my clear winners. This article from Fabricated Knowledge helps to frame the commoditization of the memory space and how it can be likened to Oil and Gas.
Let's pretend that oil and gas producers magically found a new type of energy, which we'll call SuperOil+. They can repurpose and buy equipment they are already familiar with to drill SuperOil+, but it takes 2x as much equipment to yield the same volume. However, SuperOil+ sells for 5x as much as oil. Additionally, SuperOil+ is in insatiable demand: as much SuperOil+ as you can drill will be bought, at least for the next few years.
So what do you do? If you're an energy producer, you will likely put every dollar of capex towards SuperOil+ because the returns are better, and pull capex back from oil and gas, which ironically lowers supply growth there. The supply and demand curves of oil and gas are not impacted by SuperOil+, so slowing supply growth helps prices stabilize even faster. SuperOil+ is a windfall as a new market, and a windfall for the traditional oil and gas market: the entire ecosystem becomes more profitable because supply growth in the traditional ecosystem is going down. That's exactly what's happening today, except replace SuperOil+ with HBM and traditional oil and gas with conventional memory.
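To make that capital allocation logic explicit, here is a toy calculation using the analogy's own numbers (5x the price, 2x the equipment). The mapping to HBM wafer economics in the final comment is my own rough gloss, not a precise figure.

```python
# The SuperOil+ arithmetic, made explicit: if the new product sells for 5x
# the price but takes 2x the equipment (capacity) per unit of output, the
# revenue earned per unit of equipment is still 2.5x better -- so capex
# migrates to it, starving supply growth of the old product.

oil_price, superoil_price = 1.0, 5.0           # relative prices
oil_equipment, superoil_equipment = 1.0, 2.0   # equipment per unit of output

oil_revenue_per_rig = oil_price / oil_equipment
superoil_revenue_per_rig = superoil_price / superoil_equipment

print(f"revenue per unit of equipment, oil:       {oil_revenue_per_rig:.1f}x")
print(f"revenue per unit of equipment, SuperOil+: {superoil_revenue_per_rig:.1f}x")

# Rough mapping (my gloss, not exact figures): HBM consumes noticeably more
# wafer capacity per bit than conventional DRAM but sells at a large premium,
# so marginal wafers go to HBM and conventional DRAM supply growth slows.
```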
Because of this, I would invest in those with the most exposure to HBM, which happens to mean the Korean semiconductor industry, in particular SK Hynix (~50% market share) and Samsung (~40%). The margin capture has plenty of upside for these companies once they get through the capex cycle, which I would expect to be large given the amount of spend across the tech spectrum.
Of course memory companies have to be cautious, because as a commodity, memory oversupply can lead to price declines and give back the margin gains. But because of (1) how long the memory bottleneck has been constraining the developments in logic and (2) how fast logic is progressing forward, I expect HBM to be a highly investable area.