What is the Modern Data Stack?
And why does it matter?
I enjoy the definitions we use to explain how digital transformation happens as our use of data and machine learning evolves. There has been a lot of hype around “the modern data stack” in recent years, so I thought I’d try to cover the topic.
The modern data stack (MDS) is a suite of tools used for data integration. These tools include, in order of how the data flows:
a fully managed ELT data pipeline
a cloud-based columnar warehouse or data lake as a destination
a data transformation tool
a business intelligence or data visualization platform.
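To make the flow above concrete, here is a minimal sketch of the four stages in Python. It rests on assumptions: an in-memory SQLite database stands in for the cloud warehouse, and the sample rows, table names, and the `run_elt` function are all made up for illustration. A real MDS would use managed services for each stage.

```python
import sqlite3

# Hypothetical raw records, standing in for what a managed ELT pipeline extracts.
RAW_ORDERS = [
    ("o1", "alice", "20.00"),
    ("o2", "bob", "5.00"),
    ("o3", "alice", "30.00"),
]


def run_elt(raw_rows):
    # Assumption: in-memory SQLite plays the role of the cloud warehouse.
    warehouse = sqlite3.connect(":memory:")

    # Extract + Load: land the data as-is, with untyped text columns.
    warehouse.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

    # Transform: runs inside the warehouse, after loading (the "T" comes last in ELT).
    warehouse.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY customer
        """
    )

    # BI / visualization: a dashboard would issue a query like this one.
    rows = warehouse.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    warehouse.close()
    return rows


report = run_elt(RAW_ORDERS)
print(report)
```

The point of the sketch is the ordering: raw data is loaded first and only then reshaped with SQL inside the warehouse, which is what separates ELT from older ETL pipelines.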
Rapid innovation in cloud data technology and exponential growth in the number of new products and companies in the data and analytics space are likely to reshape the future of the MDS.
The term “modern data stack” seems to have appeared in late 2019 and has become more mainstream since the beginning of 2020.
TL;DR: The term refers to a best-of-breed data architecture that is centered on a cloud data warehouse (or cloud data lake) and built to cope with massive amounts of data.
Ease of Use
The most important difference between a modern data stack and a legacy data stack is that the modern data stack is hosted in the cloud and requires little technical configuration by the user.
The Modern Data Stack (MDS) has been popular for a couple of years, but only recently has its definition begun to converge. In short, if you use any of the following, you likely have the foundational piece of a Modern Data Stack:
BigQuery (GCP)
You can visualize the MDS as shown above.
Designed for Scalability
Massive scalability: Another key aspect of modern tools (e.g. Snowflake, Databricks, ThoughtSpot) is that they are designed for massive scale. This is partly due to the elastic nature of the cloud and partly because most of these systems are built on distributed systems principles that allow horizontal scaling, unlike older desktop and single-server applications.
Composable Data Stack
Composable data stack: The idea behind a composable data stack is that each product behaves like a configurable component in a larger architecture, rather than its own island.
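As a rough illustration of composability, the sketch below treats each tool as a small configurable component with the same shape (rows in, rows out), so that one piece can be swapped without touching the others. The names `compose`, `drop_incomplete`, `add_tax`, and `top_n` are hypothetical and exist only for this example.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Component = Callable[[List[Row]], List[Row]]


def compose(*components: Component) -> Component:
    """Chain components so the output of one feeds the next."""
    def stack(rows: List[Row]) -> List[Row]:
        for component in components:
            rows = component(rows)
        return rows
    return stack


def drop_incomplete(rows: List[Row]) -> List[Row]:
    # Cleaning component: discard rows with no amount.
    return [r for r in rows if r.get("amount") is not None]


def add_tax(rate: float) -> Component:
    # Configurable transformation component: the rate is a setting, not code.
    def component(rows: List[Row]) -> List[Row]:
        return [{**r, "total": round(r["amount"] * (1 + rate), 2)} for r in rows]
    return component


def top_n(n: int) -> Component:
    # Reporting component: keep the n largest totals.
    def component(rows: List[Row]) -> List[Row]:
        return sorted(rows, key=lambda r: r["total"], reverse=True)[:n]
    return component


rows = [
    {"id": 1, "amount": 100.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 250.0},
]

# Two stacks that differ in a single component's configuration.
stack_a = compose(drop_incomplete, add_tax(0.10), top_n(1))
stack_b = compose(drop_incomplete, add_tax(0.20), top_n(1))

result_a = stack_a(rows)
result_b = stack_b(rows)
print(result_a, result_b)
```

Because every component shares one interface, replacing the transformation step does not require changes to the cleaning or reporting steps, which is the property the composable stack is after.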
MDS is Replacing TDS
The traditional data stack (TDS) is failing to keep up with the data demands of modern organizations. To maintain a competitive advantage, organizations need data they can act on at the right time and a stack flexible enough to adapt to change. A TDS typically refers to on-premises Hadoop (and its ecosystem) and SQL warehouses, which are both tightly coupled and complex.
The MDS is elegant and it enables new solutions and a more sophisticated DataOps future.
The traditional data stack (TDS) is dying mainly due to:
1. Long turn-around time to untangle and set up infrastructure
2. Slow response to new information
3. Expensive journey to insights
So what is the main result of the MDS replacing the TDS?
Automation: Once user workflows are represented in code, they are fairly easy to automate. This allows tools to be used at a much larger scale than was previously possible.
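Here is a minimal sketch of what "workflows represented in code" can look like, assuming a workflow is a set of named tasks plus their dependencies. The task names and the `run_workflow` helper are hypothetical; the ordering comes from Python's standard-library `graphlib` (Python 3.9+). Real orchestrators add scheduling, retries, and monitoring on top of the same idea.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, depends_on):
    """Run each task after all of its dependencies; return the execution order."""
    completed = []
    for name in TopologicalSorter(depends_on).static_order():
        tasks[name]()
        completed.append(name)
    return completed


# Shared state standing in for data moving between steps.
state = {}


def extract():
    state["raw"] = [3, 1, 2]


def load():
    state["loaded"] = list(state["raw"])


def transform():
    state["sorted"] = sorted(state["loaded"])


def report():
    state["report"] = f"max={state['sorted'][-1]}"


TASKS = {"extract": extract, "load": load, "transform": transform, "report": report}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDS_ON = {
    "extract": set(),
    "load": {"extract"},
    "transform": {"load"},
    "report": {"transform"},
}

run_order = run_workflow(TASKS, DEPENDS_ON)
print(run_order, state["report"])
```

Once the workflow is plain data and functions like this, it can be run on a schedule, repeated for many datasets, or version-controlled, with no one clicking through the steps by hand.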
Innovation Through Agility
You can tell how innovative a company is by measuring how long it takes to go from an idea to validation (or invalidation). If you compare companies that innovate quickly with their stagnant counterparts, it is not so much that the latter lack people with great ideas. It’s that the former have a much better environment for supporting innovation.
Advocates argue, then, that the MDS boosts innovation and therefore profitability.
The Narrative of DataOps entering the MDS Era
If data science and DataOps were a story, the 2020s would be an interesting chapter. Driven by demand from businesses seeking greater value from, and more control over, their customer data, a new approach to data analysis and activation has emerged and become the “go-to” approach for building a data architecture.
The Modern Data Stack (MDS) is centered around an ecosystem of tools businesses use to collect, move, store, transform, analyze, and operationalize their data.
I think the MDS is a precursor to more unified DataOps platforms that can do more with less (i.e. more no-code interoperability).
How Fast is the MDS Growing in Investment and Adoption?
The MDS market has seen explosive growth in the last 5+ years. In fact, between 2015 and 2020 alone, the top 30 data infrastructure startups raised over $8 billion of venture capital.
Rise of CDPs
A customer data platform is a collection of software which creates a persistent, unified customer database that is accessible to other systems.
Some believe that in the evolution of the cloud, CDPs were the bridge to the MDS. Think about it: CDPs are often a foundational piece of an MDS. They make integrations, one of the most challenging parts of building a traditional data stack, easy. This easy integration has allowed tools that run on customer data to flourish and an ecosystem of data tools to form.
More areas of the MDS will integrate into all-in-one solutions. In marketing, for instance, all-in-one platforms such as Salesforce Marketing Cloud and HubSpot Marketing Hub have become popular over the last 10-15 years. But what will be the all-in-one ML and DataOps winners of the future? I don’t know, but I find it quite interesting.
Last Thoughts on Benefits of MDS Adoption
Speed and scale of data. Since MDS tools are almost always managed SaaS, they can scale elastically with demand.
Easier and faster implementation. It doesn’t take weeks or months to add a new data system anymore. You don’t have to provision infrastructure or deploy software.
Reduced cost and maintenance. MDS tools generally have consumption-based pricing. You only pay for what you use.
Easier to build on top of. MDS tools are built to plug in to other tools. They frequently offer out-of-the-box integrations with popular cloud services and platforms, so stitching together low-code or no-code systems is easier.
Solve bleeding-edge use cases. MDS tools give your digital business a competitive advantage by helping you get more value out of your customer data, faster.
In short, I think it’s clear that a modern data stack (MDS) can help an organization save time, effort, and money. It is faster, more scalable, and more accessible than the traditional data stack.
What do you think?
Data Science Learning Center is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.