Lone Henchman

Sometimes about tools, sometimes about graphics, sometimes I just ramble.

Memory and Physics

September 05, 2025
This post begins the series Memory and Caching.

Computers have both a superpower and a problem: they are physical objects. And what the laws of physics give in terms of the massive potential to do work in parallel, they take away in the speed-of-light limit and heat. All of this has to be engineered around.

I'm writing this for an audience of programmers less experienced with low-level systems concepts: both junior devs and more senior devs who've spent their careers in higher-level code. Basically, I just want a link that I can share the next time I need to explain this stuff beginning from first principles. If this is all old news to you, skip on ahead to another section.

Let's consider RAM - specifically, a 16GB stick of DRAM. That has \(16 \times 2^{30} \times 8\) (which is \(2^{37}\)) bits of storage space. And each of those bits needs its own physical cluster of atoms to store its value. Now, atoms are tiny, so that's not actually all that much material, but we also need a way to route signals between each of those bits and the CPU's working silicon. That's a lot of (tiny) wires.
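
If you want that arithmetic spelled out, here's a trivial sketch of it (using the binary convention above, where 1GB is \(2^{30}\) bytes):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* 16GB in the binary convention: 16 * 2^30 bytes = 2^34 bytes. */
        uint64_t bytes = 16ULL << 30;
        /* 8 bits per byte: 2^34 * 2^3 = 2^37 bits. */
        uint64_t bits = bytes * 8;

        printf("%llu bytes, %llu bits\n",
               (unsigned long long)bytes, (unsigned long long)bits);
        return 0;
    }

That prints 17179869184 bytes and 137438953472 bits - \(2^{34}\) and \(2^{37}\) respectively.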

The problem of space

The first problem physics presents is that physical things take up physical space, and that finding a physical thing in physical space is not, in fact, a simple problem. Ever lost your keys in a small apartment, or just around your desk at work? Now go try to find them in an entire greater metro region.

An analogy to cities

Hardware designers solve this problem the same way city planners solve it. In a city, there are many rooms, and people might need to go from any one of those rooms to any other. But nobody would try to build a path from each individual room to every possible other room in the region. That would be impossible - the paths wouldn't fit. Worse, they'd intersect one another in so many places that there would be constant gridlock.

Instead, the city connects its spaces like this:

- rooms are grouped into apartment units,
- units are grouped into buildings,
- buildings sit along shared streets,
- streets and blocks make up suburbs,
- and suburbs connect to one another through a handful of major roads.

This also influences how we address space within the city: an address names the suburb, the street, and the building, and finding the particular unit (and the right room within it) is left as a local problem, solved once you're at the front door.

Mapping analogy to hardware

RAM is structured much like the city in that example.

Individual bits are treated like rooms - no one bothers with them. The smallest addressable element of memory is logically the byte, which is (on all modern machines except maybe some very special snowflake embedded devices) a cluster of eight bits. But even that leaves the problem space too large: the total number of bits is enormous, and a factor of 8 only goes so far.
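
To make "the byte is the smallest addressable element" concrete, here's a small sketch in C (nothing machine-specific beyond assuming a 4-byte int): a pointer to unsigned char steps through memory one byte at a time, and each of the four bytes of an int has its own address.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t value = 0x11223344;                 /* a 4-byte integer */
        unsigned char *bytes = (unsigned char *)&value;

        /* Each byte of 'value' has its own address, one apart from the next.
           (Which byte shows up at which address depends on the machine's
           endianness, but the addressing granularity is the byte either way.) */
        for (int i = 0; i < 4; i++)
            printf("address %p holds byte 0x%02x\n",
                   (void *)(bytes + i), bytes[i]);
        return 0;
    }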

At the level of a RAM chip, bytes are further clustered into larger units - rows of memory cells - which must be read or written all together. This is a bit like a city where everyone lives in an apartment: in order to get somewhere you need the address of the building, and you can figure out how to get to the individual unit (and then the correct room within that unit) later.

RAM chips are in turn grouped onto RAM sticks, several chips to a stick. This is a bit like the different suburbs of a city.

And, depending on the particulars of the machine, this hierarchical grouping of bits into clusters (of clusters (of clusters (of ...))) produces a structure which, like a building address, turns the problem of finding the value of a particular bit among a sea of bits into a series of smaller problems: talk to the right RAM stick, then to the right chip on that stick, then to the right row of memory cells within the chip. Returning to the analogy, solving this series of smaller problems effectively navigates the CPU to the right apartment block, and then it gets to figure out how to get to the specific unit within that block (and room within that unit) on its own.
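
For a feel of what that series of smaller problems looks like from the address side, here's an illustrative sketch. The field widths and their order below are made up - real memory controllers choose their own layouts and often interleave the bits - but the shape of the idea is the same: one big "find this byte" question becomes a handful of small "which one at this level?" questions.

    #include <stdio.h>
    #include <stdint.h>

    /* A made-up split of a physical address into DRAM-style coordinates.
       Real controllers use their own (often interleaved) bit layouts; this
       only shows the shape of the idea. */
    struct dram_location {
        unsigned channel;  /* which group of sticks            */
        unsigned rank;     /* which stick (or side of one)     */
        unsigned row;      /* which row of cells in the chips  */
        unsigned column;   /* which chunk within that row      */
        unsigned offset;   /* which byte within that chunk     */
    };

    static struct dram_location decode(uint64_t addr) {
        struct dram_location loc;
        loc.offset  =  addr        & 0x3F;    /* low 6 bits   */
        loc.column  = (addr >> 6)  & 0x3FF;   /* next 10 bits */
        loc.row     = (addr >> 16) & 0xFFFF;  /* next 16 bits */
        loc.rank    = (addr >> 32) & 0x3;     /* next 2 bits  */
        loc.channel = (addr >> 34) & 0x1;     /* next 1 bit   */
        return loc;
    }

    int main(void) {
        struct dram_location loc = decode(0x123456789AULL);
        printf("channel %u, rank %u, row %u, column %u, offset %u\n",
               loc.channel, loc.rank, loc.row, loc.column, loc.offset);
        return 0;
    }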

But the CPU isn't driving tiny cars on tiny roads. The CPU is sending and receiving signals. That means that what's actually going on is physical connections are being made between (speaking loosely here) teeny tiny little wires. Each step of the hierarchical addressing problem results in groups of transistors physically disconnecting one set of wires and instead connecting another.

The problem of time

In solving the problems of space, we pick up problems of time.

In some sense, the speed of light being what it is, problems of space are problems of time: it takes more time to send a signal down a long wire than a short one.

In order to deal with space we wound up dividing it into a hierarchical structure. And at each level of that hierarchy, we decide which subdivision we're interested in by toggling transistors. Time is gained by packing memory tightly and thus reducing the total length of wire between any memory cell and the CPU. But time is also lost, because each of those junctions between levels in the hierarchy takes time to reconfigure between read and write operations.

Then there's the problem of electrons simply wandering away over time. Yeah, that's a thing - physics is fun like that. And we can't just insulate our memory cells better to prevent it, because that would make them slower (for several reasons). DRAM deals with this by looping through its contents, periodically reading and rewriting them in order to replenish the electrons in memory cells that have started losing them (and to flush excess electrons from memory cells that have started accumulating extras). Any request to read or write that the CPU makes has to be coordinated around the work of the refresh cycle. (Refresh cycles are relatively infrequent, so the CPU only very rarely has to wait for one to complete. However, the circuitry which makes sure that the CPU's request doesn't happen to collide with the refresh cycle is not, itself, free.)

How this impacts the CPU

From the CPU's perspective, main RAM poses two problems:

- memory can only be read or written in relatively large chunks, and
- every trip between the CPU and RAM is slow.

The first problem could be trivially solved if not for the second one. (After all, what could be the harm in reading more memory than necessary and then just ignoring the unneeded bits?) The second problem is where all the action is, because the CPU can move data around internally hundreds of times faster than data can be moved between the CPU and RAM. That means that anything a CPU can do to reduce the frequency with which it needs to talk to RAM is going to have a big payoff.

How CPUs (and their designers) deal with this

The CPU handles the problems posed by memory being a physical thing by having a cache of recently accessed memory. That is, the CPU keeps a copy of some of the data that's in RAM so that it can be manipulated without paying the cost of a full trip out to the DRAM chips.

Cache memory is also built differently from DRAM - it is much faster. The problem with it is that it's also much more power-hungry, so there can only be a little bit of it before it becomes impractical to operate.

This memory is organized around a structure called a cache line. A cache line is just some number of bytes (typically 64) plus a little bit of extra storage where the CPU keeps track of exactly which 64 bytes are stored there.
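
To make the "which 64 bytes" part concrete: assuming 64-byte lines, the low six bits of an address say where it falls within a line, and the rest says which line it belongs to. (This is just the arithmetic - it isn't how the cache's tag hardware is physically laid out.)

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE 64  /* assumed cache line size, in bytes */

    int main(void) {
        int data[32];

        for (int i = 0; i < 32; i++) {
            uintptr_t addr   = (uintptr_t)&data[i];
            uintptr_t line   = addr / LINE_SIZE;  /* which 64-byte line    */
            uintptr_t offset = addr % LINE_SIZE;  /* where within the line */
            printf("data[%2d] -> line %llu, offset %2llu\n",
                   i, (unsigned long long)line, (unsigned long long)offset);
        }
        return 0;
    }

With 4-byte ints, sixteen consecutive elements share a line, which is why touching data[0] generally drags data[1] through data[15] into the cache along with it (give or take where the array happens to start).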

In addition to recently-used memory, the cache keeps a copy of memory that might soon be used. Basically, the CPU's prefetching hardware keeps track of what is being read and written. When it spots certain common patterns in those memory accesses, it assumes that the pattern will continue. If there's then a lull in traffic between the CPU and RAM, it'll fill that gap by prefetching a copy of the next bit of data that the pattern predicts.
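
Here's a sketch of the kind of pattern that gets picked up versus one that doesn't. Whether any particular access actually gets prefetched is up to the specific CPU; the point is just the difference between the two access patterns.

    #include <stddef.h>

    /* A plain forward walk: addresses go N, N+4, N+8, ... - exactly the kind
       of steady stride a prefetcher can spot and run ahead of. */
    long sum_array(const int *data, size_t n) {
        long sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += data[i];
        return sum;
    }

    /* Pointer chasing: the next address isn't known until the current load
       finishes, and for a list whose nodes are scattered around memory there
       is no simple stride to predict. */
    struct node { int value; struct node *next; };

    long sum_list(const struct node *head) {
        long sum = 0;
        for (const struct node *p = head; p != NULL; p = p->next)
            sum += p->value;
        return sum;
    }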

And finally, once the CPU has fetched its own copy of a given block of memory it can use its fast internal circuitry to quickly isolate the individual byte (or even bit) that it was interested in.
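
That last step is just register-level shift-and-mask work - roughly like this (a sketch of the arithmetic, not of the cache hardware itself):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* Pretend these 8 bytes arrived as part of a cache line. */
        uint64_t chunk = 0x1122334455667788ULL;

        unsigned byte_index = 2;   /* the byte we actually wanted */
        uint8_t byte = (uint8_t)((chunk >> (8 * byte_index)) & 0xFF);

        unsigned bit_index = 5;    /* or even a single bit of it  */
        unsigned bit = (byte >> bit_index) & 1u;

        printf("byte %u is 0x%02x, bit %u of it is %u\n",
               byte_index, byte, bit_index, bit);
        return 0;
    }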

This post is part of the series Memory and Caching. The next part is How Cache Impacts Software.