Databases: The Last 10 Years and the Next 10
A reflection on how databases evolved over the last ~10 years, and what those changes suggest about the decade ahead in the era of Agentic AI in the enterprise
This was originally delivered as a guest lecture to graduate students at Boston University’s engineering department. I am sharing it here with adjustments for a broader audience, because I believe the topic is relevant to anyone interested in data in the AI era.
Most people don’t think about databases.
They sit quietly underneath modern software systems, powering applications, storing operational data, and enabling everything from simple transactions to complex business workflows. It’s only when something fails—a payment doesn’t go through, a page doesn’t load, a number looks wrong—that their importance becomes visible.
Over the past decade, the demands on these systems have changed significantly. The rise of the internet and SaaS reshaped how they were used and what was expected of them. We are now entering another transition, where the shift is in how software systems are constructed and used in the age of AI.
This is a reflection on how databases evolved over the last ten years, and what those changes suggest about the decade ahead with the full adoption of AI in the enterprise.

Let’s quickly recap the old world. Before the SaaS era, most operational systems ran in environments where growth was relatively predictable. Databases were designed accordingly. Systems like Oracle RAC and IBM DB2 dominated, optimized for large enterprises with stable workloads, centralized infrastructure, and the ability to provision capacity ahead of demand.
These systems were sophisticated and powerful, but they were shaped by assumptions that no longer held true: that scale would increase gradually, that vertical scaling was sufficient, and that reliability justified a high price. In many cases they provided capabilities well beyond what the average workload required, and enterprises were accustomed to paying for peak capacity provisioned ahead of time.
SaaS and Internet-Scale Growth
The shift to SaaS disrupted all those assumptions.
Internet distribution enabled companies to reach large user bases almost immediately, and growth curves became both steeper and less predictable. Instead of steady increases in load, systems had to accommodate sudden surges in users, transactions, and data volume, often within compressed timeframes.
More importantly, operational data became tightly coupled to revenue. Systems managing customers, orders, inventory, or advertising were no longer back-office infrastructure—they were the business itself. Database performance and reliability moved directly onto the critical path.
At the same time, with the explosion of SaaS, the companies experiencing this growth were not large enterprises. They were cost-sensitive, operationally lean startups and SMBs, for which traditional enterprise database solutions were too costly and impractical for their initially simple needs.
Adapting Existing Systems
Rather than replacing databases outright, teams adapted what was available.
One path involved using systems that were already capable of handling large volumes of data—key-value stores, early NoSQL systems, and even data warehouse infrastructure. These systems provided scalability, but they were not designed for transactional workloads. Key-value stores sacrificed structure, and warehouses were optimized for analytical queries, not low-latency operations.
Using them for operational systems meant relinquishing relational models, transactional guarantees, and SQL as a unifying abstraction. The cost of that decision was not always immediately visible, but it manifested as increased complexity in application logic. Responsibilities traditionally handled by the database—joins, consistency, integrity—were pushed outward into the application layer.
The other solution that companies reached for to preserve transactional consistency was to extend relational systems themselves.
Sharding became (and in a lot of cases still is) a common strategy: partitioning data across multiple databases and routing requests at the application level. This approach preserved familiar semantics within a shard, but introduced significant complexity. Data had to be continuously rebalanced, cross-shard queries became difficult, and application logic grew more complex and brittle.
Many large-scale systems relied on some form of sharded architecture, but the operational overhead was substantial. It worked, but it required constant intervention by teams of data engineers and DB experts.
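The routing layer behind this pattern can be sketched in a few lines. This is a minimal illustration, assuming a fixed hash-based shard map; the shard names are hypothetical, and in practice each entry would be a connection pool to a separate database instance.

```python
import hashlib

# Hypothetical shard endpoints; in a real deployment each would be a
# connection pool to a separate MySQL/Postgres instance.
SHARDS = ["users_shard_0", "users_shard_1", "users_shard_2", "users_shard_3"]

def shard_for(customer_id: str) -> str:
    """Route a customer to a shard via a stable hash of its key."""
    digest = hashlib.sha256(customer_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]
```

The same key always routes to the same shard, which preserves per-shard semantics. The brittleness is visible too: a query spanning customers must fan out to every shard, and adding a shard changes the modulus, forcing a rebalance of existing data.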
Architectural Constraints
Across both approaches, a common pattern emerged: systems were being stretched beyond the assumptions they were originally built on.
Relational databases like MySQL and Postgres were fundamentally designed for a single-node architecture. Scaling them beyond that model required layering additional mechanisms—sharding, replication, routing—on top of that core assumption.
This led to an increasingly clear realization: architectural boundaries are not easily bypassed. A system can be extended beyond its original design, but scaling will eventually hit the limits of the original architecture, and the complexity of working around those limits compounds over time.
Globally-distributed Databases
This realization led to the next phase in database evolution, where the adjustment was not incremental but a fundamental shift in relational database architecture.
A new class of operational database emerged, designed from the ground up with scalability as a first-class concern. Systems like Spanner integrated ideas from distributed storage and file systems directly into the database layer. Instead of treating scale and data distribution as an external concern, they became part of the core design.
This had several implications. Storage models became more append-oriented. Concurrency control mechanisms changed. Transactional guarantees were preserved, but required new approaches to coordination and consistency.
Perhaps more importantly, tradeoffs that were previously implicit became explicit. Concepts like snapshot reads, consistency levels, and latency versus correctness were directly surfaced to the application developers.
The system did not eliminate these tradeoffs. It made them visible.
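As a hedged sketch of what surfacing these tradeoffs looks like in practice, consider a read API where freshness is an explicit parameter. The API below is illustrative, modeled loosely on the knobs globally-distributed databases expose, not any particular system's interface.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

@dataclass
class ReadOptions:
    """Illustrative read options: the latency/freshness tradeoff is explicit."""
    consistency: str = "strong"            # "strong" or "bounded_staleness"
    max_staleness: Optional[timedelta] = None

def plan_read(opts: ReadOptions) -> str:
    """Decide where a read can be served, given the requested guarantees."""
    if opts.consistency == "bounded_staleness" and opts.max_staleness:
        # A bounded-staleness read can be served by a nearby replica.
        return f"nearest replica, up to {opts.max_staleness.total_seconds():.0f}s stale"
    # A strong read must coordinate with the leader for the latest data.
    return "leader, strongly consistent"
```

The point is not the implementation but the signature: the application developer, not the database, decides whether lower latency is worth reading slightly stale data.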
A New Shift: Who Uses Databases
If the last decade was defined by how databases adapted to scale, the next shift is defined by how they are used—and by whom.
Two distinct but overlapping patterns are emerging.
Agents as Users
In the world of AI agents managing and controlling business workflows, databases are created, used, and discarded by agents as part of a task.
An agent may provision a database to organize intermediate state, populate it with structured data, query it, and tear it down once the task is complete. In this model, the database is not a persistent system of record but a transient tool. Decisions about schema, storage, and lifecycle are made internally by the agent and never exposed to the end user.
This shifts the emphasis toward:
rapid provisioning and teardown
API-first interaction models
systems that can be composed and discarded efficiently
The database becomes part of the execution environment rather than a system directly managed by humans.
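A minimal sketch of that lifecycle, using an in-memory SQLite database as the throwaway tool; the schema and task are invented for illustration.

```python
import sqlite3

def run_task(records):
    """Provision a scratch database, use it for one task, then discard it."""
    conn = sqlite3.connect(":memory:")  # provisioned on demand, never persisted
    conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", records)
    (total,) = conn.execute("SELECT SUM(qty) FROM items").fetchone()
    conn.close()  # teardown: schema and data vanish with the connection
    return total
```

No human ever sees the schema or the data; the database exists only for the duration of the task, exactly as an agent would use it.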
Existing systems with (non-AI) users
At the same time, human-operated systems—and the databases underlying them—are not going away anytime soon.
Most business-critical data continues to reside in long-lived systems of record:
relational databases and warehouses
SaaS platforms exposed through APIs
internal services built over years of iteration
These systems encode not just data, but business processes and institutional knowledge. Replacing them completely is just not practical.
Instead, agents automating existing workflows need to operate within this landscape—querying data, triggering workflows, and integrating across systems that were not designed with them in mind.
The result is a dual mode of interaction: databases as ephemeral tools within agent-driven workflows, and databases as durable systems of record shaped by human decisions over time but accessed by agents.

Semantics: the old new frontier
In both cases, access to data is no longer the primary challenge. Understanding it is.
Databases provide structure through schemas and store the underlying data, but they do not fully capture the meaning of that data. Concepts such as “churn,” “active user,” or “revenue” are dependent on business context, internal definitions, and assumptions that vary across teams and over time.
This kind of information exists outside the database and is fragmented. It lives in the heads of data engineers and analysts, in documentation that is often incomplete or outdated, in application code where business logic is embedded in calculations, and sometimes in internal catalogs that themselves require maintenance.
For human teams, this fragmentation is manageable through shared context and communication. For AI systems to be effective with operational data, this fragmented knowledge needs to be understood and codified. A semantic layer becomes the way these definitions, assumptions, and business meanings are made explicit and usable.
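As a hedged sketch, a semantic layer entry might pair each business term with its definition, owner, and canonical SQL. The metrics, table names, and queries below are invented for illustration.

```python
# Hypothetical semantic-layer entries: each business term carries an explicit
# definition, an owning team, and the SQL that computes it, so an agent does
# not have to guess what "active user" means in this organization.
SEMANTIC_LAYER = {
    "active_user": {
        "definition": "Logged in at least once in the trailing 30 days",
        "owner": "growth-analytics",
        "sql": "SELECT COUNT(DISTINCT user_id) FROM logins "
               "WHERE login_at >= DATE('now', '-30 days')",
    },
    "revenue": {
        "definition": "Recognized revenue, net of refunds, in USD",
        "owner": "finance-data",
        "sql": "SELECT SUM(amount_usd) FROM bookings WHERE status = 'recognized'",
    },
}

def resolve(metric: str) -> str:
    """Return the canonical SQL for a business metric, or fail loudly."""
    entry = SEMANTIC_LAYER.get(metric)
    if entry is None:
        raise KeyError(f"No semantic definition for {metric!r}")
    return entry["sql"]
```

The value is less in the code than in the contract: an agent that resolves "active_user" through this layer uses the organization's definition, not a plausible guess.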
That is why it is such a hot topic of discussion in the current zeitgeist. Model intelligence and reasoning are no longer the bottleneck; getting the right information accurately to the agent is.
Implications for the Next 10 Years
As AI is fully embraced in enterprise workflows, the ability of agents to understand and operate on the right information becomes critical. In fact, it is one of the key gaps holding operational agents back today.
Data has always been and will continue to be distributed across systems—databases, APIs, and services. Schemas will remain necessary, but insufficient on their own to convey meaning. Correctness will depend not just on executing queries accurately, but on selecting the appropriate sources and interpreting them within context.
At the same time, interfaces to data systems will need to accommodate both human users and programmatic agents, which places different demands on abstraction, clarity, and efficiency. So data system APIs will need to evolve to accommodate this shift.
A Practical Perspective
The evolution of databases over the past decade reflects a consistent pattern: systems adapt to the constraints imposed on them, whether those constraints are scale, cost, or operational complexity.
The next decade introduces a different set of constraints—how systems are constructed, how they are interacted with, and how meaning is conveyed across increasingly complex environments.
Understanding how data systems are structured, what assumptions they make, and where their boundaries lie is essential to working with them effectively, especially when AI is ultimately managing the interactions and workflows.
Why Fundamentals Still Matter
One of the biggest misconceptions that I see today in the age of AI is that systems knowledge and engineering fundamentals are becoming less important.
The opposite is true.
Agents will increasingly write code. They will build systems. They will interact with infrastructure. But they will do what you ask them to do.
This means it is extremely important that those guiding the AI to build complex software systems:
Understand systems deeply
Make the right architectural decisions
Orient the agent toward correct outcomes
Even as systems evolve, query planning continues to influence performance, indexing strategies shape access patterns, and operational tasks such as index backfills become more complex in distributed environments. Databases remain on the critical path of application performance, where small inefficiencies can have outsized effects.
Understanding these systems is not simply a matter of familiarity with tools, but of understanding the tradeoffs embedded in their design.
The more clearly we understand the fundamentals of the underlying software infrastructure, the better positioned we are to navigate what comes next.
Thanks to Prof. Ed Solovey for inviting me and to the students for their great questions and interactions!
Subscribe to our blog to follow our journey as we share our learnings!


