From Inodes to HDFS: How Usage Patterns Reshaped File Systems
From Inodes to the Cloud: even file systems eventually move out of their parents’ directory!
Early file systems were designed in a world of modest workloads. Users created small files, often under 200 kilobytes, edited them occasionally, and deleted them rarely. Under these assumptions, designers optimized for low latency: directory lookups had to be quick, disk seeks had to be short, and metadata consistency mattered because files were updated in place.
A coworker once told me about his wife, a psychologist, who ran her entire career out of a single text file. She never created new documents and never organized folders. Instead she stuffed everything into that one file and used bookmarks to hop around. Over the years it grew so massive that ordinary editors like Notepad waved a white flag.
Most of us would have split the data into separate files, but she stuck to her one-file model. Eventually she had to buy UltraEdit just to keep working, because it was the only tool that could even open the beast. My coworker joked that she had basically invented her own version of the Google File System, except instead of 64 MB chunks spread across servers, it was one giant file chunked across her patience, with replication done entirely in her head.
On Unix systems, this design revolved around the inode. An inode stores a file's metadata, including its size, permissions, and the addresses of the disk blocks that hold its contents. Directories map human-readable names to inode numbers, so several names (hard links) can refer to the same inode, while a symbolic link stores a path of its own and can redirect across directories. The programming interface is simple and elegant: programs call the open, read, write, and close system calls, and the operating system resolves them to block-level operations. Consider the following example, which reads the contents of a file in C.
#include <fcntl.h>
#include <unistd.h>

int main() {
    /* Resolve the path to an inode and get back a file descriptor. */
    int fd = open("input.txt", O_RDONLY);
    if (fd < 0) return 1;

    /* Copy the file to standard output in 1 KB chunks. */
    char buffer[1024];
    ssize_t n;
    while ((n = read(fd, buffer, sizeof(buffer))) > 0) {
        write(STDOUT_FILENO, buffer, n);
    }

    close(fd);
    return 0;
}
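The name-to-inode indirection is easy to observe directly. Below is a minimal sketch, with the file names input.txt and alias.txt as placeholders, that creates a hard link and then uses stat to show that both names resolve to the same inode.

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    /* Add a second directory entry for an existing file.
       Both file names here are placeholders. */
    if (link("input.txt", "alias.txt") < 0) return 1;

    /* Fetch the inode metadata behind each name. */
    struct stat a, b;
    if (stat("input.txt", &a) < 0 || stat("alias.txt", &b) < 0) return 1;

    /* Both names resolve to the same inode, so the inode numbers
       match and the inode's link count includes the new name. */
    printf("input.txt -> inode %lu, links %lu\n",
           (unsigned long)a.st_ino, (unsigned long)a.st_nlink);
    printf("alias.txt -> inode %lu, links %lu\n",
           (unsigned long)b.st_ino, (unsigned long)b.st_nlink);

    /* Removing the extra name deletes a directory entry,
       not the inode itself. */
    unlink("alias.txt");
    return 0;
}

Run in a directory that already contains input.txt, both lines print the same inode number, and the link count reflects the extra name. Deleting alias.txt afterward removes only the directory entry; the inode and the data it points to survive as long as at least one name remains.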
This model worked perfectly in the age of desktop computing. A single machine, low failure rates, and interactive human activity shaped the design. The system was built for reliability at the scale of one disk and one user.