Alex Dovenmuehle
Alex is the co-founder at Big Time Data. Previously, he worked in data engineering for Mattermost and Heroku. Alex loves to build data systems and drive business impact with cutting-edge data technology.

Eric Dodds
Eric leads Customer Success and supports Growth teams at RudderStack. He has a long history of helping companies architect customer data stacks and use their data to grow.

Our current world is defined by information overload. From your watch tracking your health to your washing machine notifying you when your clothes are done and your phone tracking everything, our personal lives are constantly emitting streams of data.

It’s no different for businesses. Many businesses collect the types of product data mentioned above, of course, but that’s only one kind of data. Tracking sales leads, marketing journeys and customer experiences, to name just a few forms of business data, is now standard. The amount of data produced by your business is immense; if harnessed properly, it can also be immensely valuable.

In fact, we argue that companies that can leverage this data at scale are able to understand their businesses better and iterate more quickly. This unique competitive advantage ultimately helps them win in the marketplace.

Importantly, your efforts at implementing the tools and processes required to create value from data aren’t static. As your business grows in size and complexity, you iterate to adapt to demands from customers and the market. This changes how you create value from data.

We call this process “The Data Stack Journey.” This post is meant to guide you in making good foundational decisions that will give you short-term benefit, but reduce or eliminate the all-too-common growing pains that happen as your business and data grow.

We will cover company stages, the core components of the customer data stack, and then dig into Alex’s recommendations on which components to use at which stage, based on his experience building data stacks at companies like Heroku and Mattermost.

The Data Stack Journey

The moment your business first provisions a database to store data, you’ve taken your first step on the Data Stack Journey. Understanding where you are on your journey can help you choose the right data tools for your business. This toolkit will enable you to leverage your data for its maximum impact.

Ultimately, the Data Stack Journey is the evolution of your data, data tools and processes over time as your business and data grow and change. By choosing the right tools and technologies at the appropriate steps of your Journey, you will be able to maximize the value that you’re able to extract from your data while also controlling your spending on tools and resources. Perhaps most importantly, you can make wise decisions early that pay dividends as you grow, helping you avoid migration/integration work and the need for re-architecting for scale.

Company Stages

Let’s start by looking at the stages businesses generally go through as they grow, from small startup to international enterprise.

As companies evolve, so do their data and data technology needs. With each stage of growth come new challenges, and we’ll call out targeted advice for each stage.

First Steps: You have a small number of customers with whom you maintain high-touch relationships. The founders are wearing all the hats, and most back-office processes are handled manually.

Seed: Your product has gained traction and you’re no longer able to personally message every customer. Your business has started to generate revenue and you are iterating quickly on the product. This is generally where Seed- and Series A-stage venture-backed companies are. Most traditional departments, such as Sales and Marketing, are staffed by small teams of one to three people.

Growth: You’ve found product market fit and are aggressively growing your market share. You have thousands (or even more) of customers (the number also depends on the business model). Annual revenue is now in the millions. Your organization is starting to feel growing pains and needs to scale and mature. This is around where most Series B and C companies find themselves.

Mature: Your business is a big player in its market and is considered one of the de facto choices in your segment. Product iteration tends to be more deliberate and calculated. Optimizing any business metric by even a handful of percentage points could yield millions of dollars in cost savings or revenue.

Your Grandma Has Heard of Them: Your business deals with actual “Big Data.” We’re talking FAANG-level here, where data sizes are measured in petabytes. You’re building some of your own internal tools to deal with the scale of your data because off-the-shelf solutions don’t cover all of your data needs.

Data Stack Components

Now that we’ve established the high-level stages companies go through on the Data Stack Journey, let’s take a look at the core components of the data stack. The customer data stack is a comprehensive, complex system, so for this post we’ve trimmed the list down to the core components at the foundation of the stack that every company needs.

Data warehouse: the “center” of the stack, where all of your data is unified and accessible (to teams, tools, apps, etc.)

Data transformation: processing and transformation to ensure your data is usable for downstream functions

Data visualization: translates data into an interface humans can use for all kinds of purposes, primarily analytical ones

Event stream: user behavior data from your websites and apps

ETL/ELT (tabular): table data from your cloud apps

Reverse-ETL: sending data from your warehouse to apps and other destinations

Data governance: keeping your data clean & compliant across the entire stack

Data Stack Components by Stage

This is where the rubber meets the road for companies — deciding which components to use at which stage. (This analysis comes from Alex’s experience building and re-building data stacks at various stages at companies like Heroku and Mattermost.)

Data Warehouse

First Steps and Seed: You most likely can just query your production database, though I would recommend using a read replica/follower. You can run dbt against your production database, but I’d be very careful and might just skip dbt altogether until you can get a database dedicated to analytics.

For a while, you should be able to get away with using a Postgres database as your data warehouse.
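To make that concrete, an early-stage analytics question is often just a SQL query run straight against the replica. Here is a minimal sketch, assuming a typical users table with a created_at column (both names are illustrative):

    -- weekly signups over the last 90 days, run against a read replica of the
    -- production Postgres database (table and column names are illustrative)
    select
        date_trunc('week', created_at) as signup_week,
        count(*) as new_users
    from users
    where created_at >= now() - interval '90 days'
    group by 1
    order by 1;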

Growth and larger: Once you get to a certain size, you’ll most likely be looking for a proper data warehouse.

The top three players at the moment are Google’s BigQuery, Amazon Redshift, and Snowflake. I personally put Snowflake and BigQuery as 1a and 1b.

  • BigQuery has capabilities for streaming data into your warehouse very quickly (e.g. a handful of seconds to get certain sources of data queryable).
  • I find Snowflake to be the easiest to use, with a more conventional database permission system. Being able to control compute power by resizing your “Virtual Warehouse” also makes it easy to run your big, beefy jobs in a decent runtime (see the sketch after this list). It also has interesting features like Snowpipe for pulling data into the warehouse.
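As a rough sketch of what that sizing control looks like in Snowflake (the warehouse name here is made up):

    -- create a small virtual warehouse that suspends itself when idle
    create warehouse if not exists transform_wh
      with warehouse_size = 'XSMALL'
           auto_suspend = 60
           auto_resume = true;

    -- temporarily size it up before a big, beefy job...
    alter warehouse transform_wh set warehouse_size = 'LARGE';

    -- ...and size it back down afterward to keep the bill in check
    alter warehouse transform_wh set warehouse_size = 'XSMALL';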

With both BigQuery and Snowflake, you need to keep an eye on costs, as you’re generally paying for the compute resources you consume.

Redshift is great but is more like a traditional database system to run. The benefit to Redshift is that you’re just paying a flat amount regardless of how much you do with it. You have to spend more time on database administration tasks, however, such as optimizing column encodings, sort keys and distribution keys. Also, your compute power isn’t separate from your storage. So if you just need more storage, you have to add more nodes to your cluster, even if you don’t necessarily want to pay for more compute power.
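To illustrate the kind of administration involved, here is a hedched-down sketch of a Redshift table definition that sets the distribution style, distribution key, sort key and column encodings by hand (the table and columns are invented for the example):

    -- illustrative Redshift DDL: the engineer explicitly chooses the
    -- distribution style, distribution key, sort key and column encodings
    create table page_views (
        user_id    bigint        encode az64,
        page_url   varchar(2048) encode zstd,
        viewed_at  timestamp     encode az64
    )
    diststyle key
    distkey (user_id)
    sortkey (viewed_at);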

Data Transformation

The one and only dbt. This is the one tool I’d recommend for companies of any size. It provides a framework for transforming data as part of your ELT process. With dbt, you declare your raw data sources, build models in templated SQL, and reference those models from other models. Because you’ve defined a hierarchy of models, dbt can build them in the correct order based on the dependency graph.
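Here is a minimal sketch of what that looks like in practice, with invented model and source names:

    -- models/staging/stg_users.sql: a staging model over a declared raw source
    select
        id as user_id,
        created_at,
        email
    from {{ source('app_db', 'users') }}

    -- models/marts/fct_weekly_signups.sql: a model built on the staging model
    select
        date_trunc('week', created_at) as signup_week,
        count(*) as signups
    from {{ ref('stg_users') }}
    group by 1

Because fct_weekly_signups references stg_users through ref(), dbt knows to build the staging model first and can draw the lineage between the two.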

Dbt will then automatically generate a docs site that allows you to visualize the lineage graph of your models and raw data sources. This can really help with onboarding new analysts and data engineers by giving them an easy way to explore how the data is modeled and its lineage.

dbt Cloud is $50/month per developer user, with unlimited “view-only” users. You can also run dbt yourself through a tool like Airflow. In most cases, I’d start with dbt Cloud, since the operational burden of running Airflow yourself can be more than it’s worth, especially if you’re only using it to run dbt.

Data Visualization

For First Steps and Seed, I’d recommend Metabase. It provides data visualization that won’t break the bank. For future scalability and maintainability, I’d recommend leveraging dbt to build models that can answer most questions with simple SQL queries. By putting most of the complexity in your dbt models, you keep users from getting themselves into trouble by writing SQL incorrectly. This also gives you more portability if you choose to move to a different visualization platform.
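The payoff is that a question in Metabase can stay as simple as selecting from a model dbt has already shaped. A quick sketch, using a hypothetical model name:

    -- a Metabase question can be this simple when dbt has already handled the
    -- joins and business logic upstream (the model name is illustrative)
    select signup_week, signups
    from analytics.fct_weekly_signups
    order by signup_week desc
    limit 12;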

Pricing for Metabase can be as low as $100/mo if you use their cloud service.

For Growth and larger, I would recommend Looker. The only reason I wouldn’t recommend it for the smaller company stages is that the cost is much higher than alternatives such as Metabase. With Looker you define your data model in LookML, which Looker then uses to provide a drag-and-drop interface for end users that enables them to build their own visualizations without needing to write SQL. This lets your analytics team scale by not getting bogged down answering one-off questions from end users or having to build every chart or graph that your users need.

Event Stream

There are really only two choices here: Segment and RudderStack. They are both affordable in the First Steps and Seed stages, but RudderStack scales much better through the Growth, Mature and Enterprise phases.

Segment: Segment is a mature product with a heavy lean toward marketing and, more recently, revenue operations users. While Segment will sync your data to your data warehouse, a lot of features are restricted to using only the data that lives in Segment itself, which makes it difficult to unlock the real power of your data. Also, with MTU-based pricing, you can be forced into situations where you’re trying to determine whether implementing new telemetry is worth the cost.

RudderStack: RudderStack doesn’t store any data (they enable you to build a CDP on your own warehouse), and their event stream capabilities are as good as or better than Segment’s. They also offer Cloud Extract (ELT) and Warehouse Actions (reverse-ETL), meaning you don’t have to manage your event stream separately from your other pipelines. Lastly, because they are open source, there’s no vendor lock-in.
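Whichever tool you choose, once your events land in the warehouse they are just tables you can query and model with dbt. A hedged sketch, assuming the conventional tracks table layout and an invented event name:

    -- daily active users firing a key product event, queried from the event
    -- stream tables in the warehouse (schema, table and event names are
    -- illustrative; both tools load a similar "tracks" table)
    select
        date_trunc('day', received_at) as event_day,
        count(distinct user_id) as active_users
    from website.tracks
    where event = 'project_created'
    group by 1
    order by 1;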

Data Governance

This is closely related to the transformations component we covered above, so we don’t need to go into a ton of detail. Ultimately, you need to bake data governance into the stack from the beginning and dbt/Looker are the tools for the job.

First Steps and Seed: With dbt, you can hide the complexity of transforming raw data into usable models and also ensure that internal users are only looking at data from the vetted dbt models.
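One concrete way dbt supports this is with data tests. Here is a sketch of a “singular” test, a SQL file in the tests/ directory that fails the run if it returns any rows (the model and column names are illustrative):

    -- tests/assert_no_duplicate_users.sql
    -- dbt treats any returned rows as a test failure, so duplicate user_ids
    -- in the vetted model break the build instead of reaching end users
    select
        user_id,
        count(*) as num_rows
    from {{ ref('dim_users') }}
    group by user_id
    having count(*) > 1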

Growth and larger: With the combination of dbt and Looker, you can maintain good data governance. Looker gives you the platform to let you unify the definitions of business metrics so that everyone is playing off the same sheet of music.

ETL/ELT (tabular)

ETL/ELT solutions for getting tabular data from your cloud tools to your warehouse are becoming commoditized and that functionality is now ubiquitous among data stacks. There are various options and some up-and-comers to keep an eye out for.

Pipelinewise: An open source framework for running Singer.io taps and targets. It’s relatively easy to run yourself and can get you started with ETL. Recommended for First Steps and Seed (but you can also keep it around as just an additional tool, since it’s pretty straightforward to use).

Stitchdata: A SaaS product that also runs Singer.io taps and targets. The UX is relatively straightforward, though there are some rough edges, and you can’t use cron-style scheduling unless you purchase their enterprise plan. The pricing is based purely on the number of rows synced and is generally not very expensive.

FiveTran: This is the premium product in this space with a polished UX, but it also comes at a higher cost. They also have a pricing plan based on “monthly active rows,” which can make costs a little more challenging to calculate. Recommended for Growth or larger companies.

RudderStack Cloud Extract: With Cloud Extract, you can pull data out of sources and into your warehouse. Because it’s integrated with the rest of the RudderStack product, it is easier to manage without having tons of different tools you have to log into.

Airbyte: This is a newcomer to the space and is open source. They are looking to move away from Singer.io formats and are also looking to integrate better with dbt. Keep an eye on this one.

Reverse-ETL

Enriching data in your business systems from your data warehouse is becoming increasingly important. The key is to ensure that each business system is populated with accurate data. This also allows the various teams in your company to work with the technologies they are familiar with, without having to jump between a bunch of different systems. For instance, enriching lead information in your CRM with product usage data can help your sales reps engage with customers better and lead to more conversions.
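Most tools in this space let you point a SQL query like the sketch below at your warehouse and map its columns to fields in the destination (all table and column names here are illustrative):

    -- an illustrative query a reverse-ETL tool could sync to a CRM: one row per
    -- lead, enriched with recent product usage pulled from warehouse models
    select
        u.email,
        u.company_name,
        count(distinct e.project_id) as projects_created_last_30d,
        max(e.received_at) as last_seen_at
    from analytics.dim_users u
    left join analytics.fct_project_events e
        on e.user_id = u.user_id
       and e.received_at >= current_date - 30
    group by 1, 2;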

Reverse-ETL can also be instrumental in helping to automate and make business processes more efficient. There are a lot of competitors in this space and the dust hasn’t yet settled on which is the premier solution.

Census: As of this writing, Census is one of the more mature products in this space. Don’t underestimate how much a good UX helps to drive adoption within an organization. The default pricing model is per connector, with no extra charge based on volume. If your organization syncs a ton of data to one destination, this could be the solution for you.

Hightouch: This product is developing quickly. One interesting feature is their Git sync for dbt, which gives you “configuration as code” and helps with scalability and maintainability. Their pricing is based on the volume of unique records synced to any number of destinations. This could be the solution for you if you’re looking to sync a relatively small number of records to a lot of different destinations.

Polytomic: This product is a new entrant into the space and they have a couple of interesting twists. One is that they don’t write to the data source they’re pulling data from, which means you can hook it up to a read-only replica without issue. They can also join together data from a variety of sources, including Google Sheets, and push the combined data to a variety of destinations. If you’re in the early stages of your Data Stack Journey, this is definitely worth a look.

RudderStack Warehouse Actions: RudderStack’s warehouse-first architecture dovetails nicely into using your warehouse to send data to other destinations. This feature is still developing but looks promising, and it keeps you from adding “Yet Another Tool” if you have relatively straightforward reverse-ETL needs.

Only You Can Prevent a Messy Stack

It’s a great time to be a data engineer architecting a customer data stack. The tools available allow you to build from the data layer up with your warehouse at the center, ultimately giving you flexibility and scalability. Perhaps the best news is that tools like dbt and RudderStack will scale with you from your first steps all the way through becoming an enterprise company, drastically simplifying your work on the stack and letting you focus on your product.

At the end of the day, though, tools are only the conduit: the best companies make data itself a first-class citizen in the organization and invest time upfront in data modeling, transformations and governance to ensure that no matter which tools are used, data stays clean and usable through every stage of growth.

Featured image by Pixabay.

InApps is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Metabase.