The Modern Data Stack: a cloud data warehouse connected to various specialized Software-as-a-Service (SaaS) tools, and the foundation for a Marketing Data Hub. It lets you integrate multiple data sources into a data warehouse and operationalize that data in marketing channels or other systems (Composable CDP). But where do you start building a modern data stack for marketing, and how can marketing benefit from it?
The modern data stack consists of various best-of-breed solutions, each providing a piece of functionality. Think of it as a modular system where you can relatively easily add or replace building blocks.
The data and marketing 'technology stack' are blending together.
Within the modern data stack, it's not just about collecting data in a central database (data warehouse) so that only analysts or data scientists can do their work. Data democratization (giving everyone in the organization access to reliable data) and the ability to operationalize that data are increasingly becoming an integral part of the system (the stack).
Data from the warehouse can be operationalized in other systems (such as marketing channels or CRM systems). So, data doesn't just go into the warehouse, but also comes out. For instance, there are tools that allow non-technical users to easily assemble target groups and automatically share them with a marketing automation tool or Google Ads. All of this is based on the data already present in the data warehouse. This development is now known as the Composable Customer Data Platform (CDP).
Also, components such as data transformation, governance, and self-service business intelligence are, in most cases, a fixed part of the Modern Data Stack. The fact is that the data and marketing technology stacks are slowly blending together...
The rise of the Modern Data Stack
Several developments led to the rise of the modern data stack:
The shift from on-premise to cloud-native data warehouses (e.g. analytical databases like Amazon Redshift, Snowflake, or Google BigQuery).
The rise of SaaS tools and services within cloud environments that have a modular approach (building blocks).
The emergence of self-service business intelligence tools to make (reliable) data available to everyone within an organization (data democratization).
The great advantage of the modern data stack is that it has become less complex to set up such a system. You don't need an army of data engineers to stack the building blocks on top of each other. However, this doesn't mean the role of the data engineer has become redundant. Quite the opposite: the modern data stack can make the life of the data engineer easier and more productive.
Thanks to tools that can be integrated into the stack, end users themselves can work with the data. This includes both gaining insights and operationalizing the data. This applies not only to analysts but also to marketing, sales, or operations teams.
Another plus is that the system is modularly built, and costs are often dependent on what you use. For both small and large organizations, a modern data stack is highly suitable: it can be expanded when necessary, in both scale and functionality. But what components make up such a modern data stack, and where do you start building?
The components of a Modern Data Stack
You can divide the stack into the following main categories, which I will explain step by step:
Data sources & integration
Data storage and transformation (ELT)
Data analysis and visualization
Data activation (operations)
Data governance and Observability
1. Data integration: importing sources to the data warehouse
As mentioned, it has become much simpler to connect all the various data sources within your organization to the data warehouse, without the need for an army of IT specialists and data engineers. This doesn't mean that a data engineer is any less valuable.
The tools available in the market often contain 100 or more ready-made connectors. Think of pulling in marketing performance data from Facebook, Google, or LinkedIn, as well as CRM data from Hubspot, Salesforce, or Shopify. All of this can be configured without much technical expertise. However, it's advisable to have someone knowledgeable oversee the setup. Some popular providers of data integration tools are:
These tools ultimately pull data to the analytical database in the cloud data warehouse.
Not just customer data...
Given that we mention the Composable Customer Data Platform (CDP) concept, one might get the impression that only customer data is imported. However, this is absolutely not the case. Other examples include performance data from marketing channels, product and pricing data, socio-demographic data, weather data, etc.
2. Data storage & transformation
The centerpiece of the modern data stack is the (cloud-native) analytical database. These types of databases are capable of rapidly analyzing vast amounts of data. The most well-known products are:
Azure Synapse / Fabric
Almost all data integration tools mentioned in the previous section support the above databases as their destination. From experience, we know that choosing between different databases can be challenging, although their capabilities aren't vastly different. For an excellent comparison, I recommend checking out this article by Rogier Werschkull.
To keep data consistent and usable throughout the process, it's essential to transform it (think cleaning and merging). Since cloud-native analytical databases are so powerful, it's entirely feasible to load untransformed data (as-is) into the data warehouse and tackle data transformation as a subsequent step. This shifts the process from ETL (Extract, Transform, Load) to ELT (Extract, Load, Transform). It also changes the functionality expected from the data integration tools discussed in the previous section: they mainly need to extract and load. All data transformation is then centralized in the data warehouse rather than being split between the integration process and the warehouse.
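To make the ELT idea concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for a cloud data warehouse, and the table names, columns, and sample rows are hypothetical. The raw data is loaded as-is first, and the cleaning and deduplication happen afterwards inside the database with SQL.

```python
import sqlite3

# Hypothetical raw export from a CRM connector; names and values are illustrative.
RAW_CONTACTS = [
    (1, " Anna@Example.com ", "NL"),
    (2, "bob@example.com", "nl"),
    (2, "bob@example.com", "nl"),   # duplicate row from the source
    (3, None, "BE"),                # missing email
]


def load_raw(conn, rows):
    """Extract + Load: store the source rows as-is, without any cleaning."""
    conn.execute("CREATE TABLE raw_contacts (id INTEGER, email TEXT, country TEXT)")
    conn.executemany("INSERT INTO raw_contacts VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform: clean and deduplicate inside the database using SQL."""
    conn.execute(
        """
        CREATE TABLE contacts AS
        SELECT DISTINCT id,
               LOWER(TRIM(email)) AS email,
               UPPER(country) AS country
        FROM raw_contacts
        WHERE email IS NOT NULL
        """
    )


# SQLite in memory is only a stand-in for the analytical database.
conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_CONTACTS)
transform(conn)

clean_contacts = conn.execute(
    "SELECT id, email, country FROM contacts ORDER BY id"
).fetchall()
print(clean_contacts)
```

Because the raw table is kept untouched, the transformation step can be changed and re-run later without extracting the data from the source again.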
3. Data analysis: Towards a ‘Self-Service’ Model
Making data accessible to everyone within an organization is crucial for cultivating a data-driven culture. Hence, even without technical knowledge like SQL, retrieving desired data should be straightforward. It's therefore essential to choose a modern business intelligence tool that integrates with cloud-native databases. Most tools now offer self-service functionality, enabling end users to resolve their inquiries using data. This could involve easily exploring data sources or delving deeper into existing reports. Renowned providers include:
Looker / Looker Studio
Apache Superset (open source)
4. Data Operations: Activate Data from the Data Warehouse
Wouldn't it be great not only to analyze the data but also to (automatically) use it in, for instance, your marketing channels or CRM, so that the data warehouse effectively becomes part of the marketing technology stack?
Around 2021, tools designed to operationalize data from a data warehouse became more and more popular. This is often referred to as Reverse ETL. As the name suggests, Reverse ETL does the exact opposite of ETL (Extract, Transform, Load): it reintroduces data from your data warehouse back to its source or other systems.
This doesn't just benefit marketing but also departments like sales or HR. Examples include automatically creating Google Ads or Facebook audiences, enriching insights in a CRM system (360º customer view), or personalizing email campaigns. Popular tools:
From Insight to Action
The foundation for a Composable CDP
Reverse ETL essentially positions the modern data stack to compete with stand-alone Customer Data Platforms (CDPs). One could view the Marketing Data Hub as the foundation for a Composable CDP.
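The Reverse ETL flow described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular tool works: SQLite stands in for the data warehouse, the tables and the `min_orders` rule are hypothetical, and `sync_audience` only writes to a local list where a real tool would call the API of the ad platform or CRM.

```python
import sqlite3


def build_audience(conn, min_orders):
    """Select a segment in the warehouse: customers with at least min_orders orders."""
    rows = conn.execute(
        """
        SELECT c.email
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.email
        HAVING COUNT(*) >= ?
        ORDER BY c.email
        """,
        (min_orders,),
    ).fetchall()
    return [email for (email,) in rows]


def sync_audience(emails, destination):
    """Hypothetical sync step: replace the destination's audience with the new one."""
    destination.clear()
    destination.extend(emails)
    return len(emails)


# Tiny in-memory stand-in for the data warehouse, with illustrative data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES
        (1, 'anna@example.com'), (2, 'bob@example.com'), (3, 'cara@example.com');
    INSERT INTO orders (customer_id) VALUES (1), (1), (2), (3), (3), (3);
    """
)

ads_audience = []  # stand-in for an audience list in a marketing channel
synced = sync_audience(build_audience(conn, min_orders=2), ads_audience)
print(synced, ads_audience)
```

The segment definition lives in the warehouse query, so the same audience logic can feed several destinations without being rebuilt in each tool.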
5. Data governance and Observability
The final but by no means the least important step: data governance and data quality. Trust in the data is, in fact, one of the most crucial components for the entire system to succeed. This is a category where a lot of development is currently happening, and it is expected to soar in 2022. Broadly, there are two types of tools in this category: data catalogs and data quality & observability.
How can you tell from which data source a specific dashboard metric originates? Or conversely, if table X changes, which dashboards or systems might be potentially affected? Data catalogs help map this out. These tools link various data sources, tables, dashboards, or other data creators and users (data lineage). They often also provide documentation capabilities and a search function. Some tools include:
Data quality & observability
Besides insights and documentation, different data sources can also be automatically monitored. This can be either based on machine learning that detects data anomalies or manually added checks (like missing essential fields or data updated within the last 24 hours). Upon discrepancies, alerts can be dispatched, for instance, through email or Slack. Tools include:
Elementary (for dbt)
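As a rough illustration of the manually added checks mentioned above, the sketch below tests for missing essential fields and for data updated within the last 24 hours. All names are hypothetical, the rows are made up, and `notify` is a placeholder for whatever alert channel is used (for instance email or Slack); dedicated tools offer far more than this.

```python
from datetime import datetime, timedelta, timezone


def check_required_fields(rows, required):
    """Return the indexes of rows where a required field is missing or empty."""
    return [
        i for i, row in enumerate(rows)
        if any(row.get(field) in (None, "") for field in required)
    ]


def check_freshness(rows, now, max_age=timedelta(hours=24)):
    """Return True if the newest 'updated_at' value is within max_age of now."""
    if not rows:
        return False
    newest = max(row["updated_at"] for row in rows)
    return now - newest <= max_age


def run_checks(rows, now, notify):
    """Run both checks and pass one message per failed check to notify."""
    failures = []
    bad_rows = check_required_fields(rows, required=("email",))
    if bad_rows:
        failures.append(f"missing required fields in rows {bad_rows}")
    if not check_freshness(rows, now):
        failures.append("no update within the last 24 hours")
    for message in failures:
        notify(message)
    return failures


# Illustrative data: one row has an empty email and the newest update is 26 hours old.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"email": "anna@example.com", "updated_at": now - timedelta(hours=30)},
    {"email": "", "updated_at": now - timedelta(hours=26)},
]

alerts = []  # stand-in for an email or Slack channel
run_checks(rows, now, notify=alerts.append)
print(alerts)
```

Passing `notify` as a function keeps the checks independent of the alert channel, so the same checks can post to different destinations.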
Start with a business case
Our top advice: start with a business case. The Modern Data Stack is perfect for this approach. Not every source needs to be connected right from the start. Begin with one or a few business cases and expand from there. This way, you can quickly create value and support. Demonstrate that it's not just a tech matter. With the right setup, the output should be easily accessible to the entire organization, both as insights and as operationalized data, within marketing but also in other departments.
Initially, key fundamental decisions will have to be made, but expansion can be gradual afterward. Engage the right people and ensure the necessary expertise is on hand to establish a solid foundation.