An App Builder for the Data Science Team
Companies are gathering not just massive amounts of data but also new kinds of data, such as location data and sentiment analysis, and are using it not only to chart the past but also, with machine learning, to forecast the future.
However, many have been unable to exploit that data, because sharing it internally takes too much time and too many people to build the applications needed to put it to work.
Enter Streamlit, an open-source platform that allows data scientists to easily construct online apps that will enable them to access and explore machine learning models, advanced algorithms, and complicated data types.
What is Streamlit?
“There is a completely new class of business intelligence problems that didn’t exist five years ago, and the traditional ways, using Tableau or Microsoft Power BI, just saying, ‘Let’s put up a dashboard, and let’s put up some graphs, and, and graph this data’ just no longer works in this world,” said Adrien Treuille, co-founder and CEO of Streamlit.
He and co-founders Amanda Kelly and Thiago Teixeira met while working at the innovation lab Google X in 2013.
They began with the question: What if we could make building tools as easy as writing Python scripts?
They wanted data scientists and machine learning engineers to be able to build apps that would let them interact with the data without having to call in a tools team or manage backend data engineering tasks.
Today the San Francisco-based company, which open-sourced the technology in 2019, has more than 16,000 GitHub stars and a community of more than 30,000 developers around the world. It is used by the likes of Delta Dental, Caterpillar, 7-Eleven, Uber, Ford and Pfizer.
“Building a small web app in Streamlit takes me 10% as long as it’d take to build the same thing with a conventional app-building approach. Streamlit is an even bigger win for data scientists who don’t know JavaScript since Streamlit lets them build everything in Python,” said former Google data scientist Dan Becker, founder of Kaggle Learn and Decision.ai, now vice president of product, Decision Intelligence at DataRobot.
“Historically I’d have to manage frontend code, backend code and communication between them. With Streamlit, I can specify how I want the page to work in Python, and it takes care of everything. The pages look nice by default, saving me the trouble of writing CSS. Streamlit is uniquely easy to learn. It takes about 10 minutes to learn enough to be productive.”
Part of Existing Workflow
Rather than build a one-size-fits-all tool, the idea was to create Lego-like capabilities to let users create their own ways to make sense of their data. That might mean building sliders with different variables or pulling out subsets of data into sidebars to look at it in different ways.
These apps are visualizations of data written as just a few lines of Python code, the mainstay of data scientists’ existing workflow. React is the frontend framework used to render data on the screen.
Streamlit treats widgets as variables. Every interaction simply reruns the script from top to bottom.
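The rerun model can be illustrated with a toy sketch in plain Python. It deliberately does not use Streamlit's own API, so it runs with the standard library alone; `make_slider`, `script` and `run_script` are hypothetical names invented for this illustration, not Streamlit functions.

```python
# Toy model of "widgets are variables; every interaction reruns the script".
# NOT Streamlit's API: every name below is a hypothetical stand-in.

def make_slider(widget_state):
    """Build a slider function that reads its current value from widget_state."""
    def slider(label, default):
        # Calling a widget just returns its current value, like reading a variable.
        return widget_state.get(label, default)
    return slider


def script(slider):
    """The 'app': ordinary top-to-bottom code with no callbacks."""
    x = slider("x", default=3)
    return f"{x} squared is {x * x}"


def run_script(widget_state):
    """Rerun the whole script from the top with the latest widget values."""
    return script(make_slider(widget_state))


state = {}
first_render = run_script(state)   # initial page load uses the default value
state["x"] = 5                     # the user drags the slider to 5
second_render = run_script(state)  # that interaction triggers a full rerun

print(first_render)
print(second_render)
```

The point of the design is that the app author writes only `script`; there is no event handler to register, because each interaction simply produces a fresh top-to-bottom run with updated values.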
It downloads the data only once, using a cache primitive that acts as a persistent, immutable data store that enables the app to safely reuse information. That eliminates redundant data fetches and computation.
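Why a cache matters under a rerun-everything model can be sketched in a few lines. This example assumes standard-library memoization (`functools.lru_cache`) as a stand-in for Streamlit's cache primitive, and the `load_data` function and `sales.csv` source name are hypothetical.

```python
import functools

fetch_count = 0  # counts how often the expensive download really happens


@functools.lru_cache(maxsize=None)
def load_data(source):
    """Stand-in for a slow download; returns an immutable tuple."""
    global fetch_count
    fetch_count += 1
    return tuple(range(5))


def script():
    """One top-to-bottom run of the app."""
    data = load_data("sales.csv")  # hypothetical source name
    return sum(data)


# Three reruns, as if the user had interacted with the app three times.
totals = [script() for _ in range(3)]

print(totals)       # the same result on every rerun
print(fetch_count)  # the download ran only once
```

Returning an immutable value (here a tuple) is what makes reuse safe: no rerun can modify the stored result out from under another.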
The product deploys apps directly from private Git repos and updates instantly on commits.
It integrates with popular Python libraries used in data science such as NumPy, Pandas, Matplotlib, Scikit-learn and others.
“From my perspective, Streamlit is by far the fastest method to turn an interesting bit of analysis, machine learning model or clever visualization into a data product that you can easily show to other people online,” said Tyler Richards, a data scientist at Facebook who also wrote a book on Streamlit.
“I consistently have this problem where I have an awesome result at work or for a personal project and am forced between dumbing it down to something I can stick easily into a dashboard or Word doc (a static graph or some basic performance stats on my model), or spending a huge amount of time creating a custom Flask/Django app. Streamlit is the best of both of these worlds because I can just directly create a fully functioning web app from my already created Python script and use their tools to host it easily.”
Hours, Not Weeks
Treuille drew on his experience working with students on machine learning projects as a professor at Carnegie Mellon University and as a vice president at autonomous vehicle startup Zoox.
With Streamlit, a project that previously would take weeks can be done in a few hours, he said.
“[The data science] group has unique challenges the company has never seen before, particularly when it comes to how do we make available the insights that we’re producing, scalably, so that the marketing team can directly benefit from a model that we’ve built that predicts the future, or so that the product team can themselves look through all of this geographic data filtered in ways that are not traditionally possible, and then jump in and see sentiment analysis applied to this or that country,” he said.
“So those are the kinds of like, next-generation challenges that data scientists and machine learning engineers are very good at solving, but which have not been systematically shared more broadly in the company.”
The company's commercial offering builds on the open-source technology, adding enterprise-grade data security and authentication as well as collaboration features for both data scientists and their customers.
“Literally in an afternoon, within the work that you’re already doing, you can go from an analysis that was primarily for yourself … to something that’s interactive and shareable with somebody else,” said Kelly.
“We’ve had people tell us all the time, ‘This would have been 10,000 lines of code if I had to put this in a different language like Flask, and it was, like 100 lines [in Streamlit].’ Or ‘This took another team three and a half months to build; I replicated the exact same thing in six hours.’”
New Features in 1.0
Though Streamlit can be deployed anywhere, the company recently announced Streamlit Cloud to handle containers, authentication, scaling, security and more.
The company’s physical infrastructure is hosted and managed on Google Cloud Platform (GCP), taking advantage of its built-in security, privacy and redundancy features.
Users' permission levels are those assigned on GitHub. Those with write access to a particular app can make changes, but only those with admin access can deploy an app or delete it.
The technology recently reached the 1.0 milestone.
“We spent basically all of 2020, and a good chunk of 2021, both adding these features but also hardening, making sure that we were really testing with the community, really figuring out and saying, ‘Is this not just the fastest way to go out and build an app, but the best way to do that in terms of the primitives and ease of use,’” Kelly said.
Among those new features:
- Improving caching by harnessing Apache Arrow for serialization and memory management, which added speed and responsiveness.
- Providing more customization with app layout primitives and themes to enable users to match their company brand.
- Adding statefulness with session state and forms to enable users to create more complex apps.
- Adding components and integrations to enable users to write their own components or pull in libraries like spaCy, HiPlot or Folium. New functionality also includes the ability to send and receive video or draw on a canvas.
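Of the features listed above, statefulness is the least intuitive in a framework that reruns the whole script on every interaction. A toy model, again in plain Python rather than Streamlit's own API, shows the idea: a per-session store survives between reruns, and a form holds edits back until it is submitted. The `script` function and the `session_state` and `pending` dictionaries are hypothetical stand-ins for this sketch.

```python
def script(session_state, submitted_form=None):
    """One rerun. session_state survives between reruns; local variables do not."""
    session_state.setdefault("entries", [])
    if submitted_form is not None:
        # A form delivers all of its fields together, once, on submit.
        session_state["entries"].append(submitted_form)
    return len(session_state["entries"])


session_state = {}  # one store per user session
pending = {}        # edits made inside a form; they do not trigger a rerun

counts = [script(session_state)]  # initial page load

pending["name"] = "Ada"           # the user fills in the form: no rerun yet
pending["score"] = 7
counts.append(script(session_state, dict(pending)))  # submit: exactly one rerun

counts.append(script(session_state))  # a later, unrelated rerun keeps the entry

print(counts)
print(session_state["entries"])
```

Without the session store, the submitted entry would vanish on the next rerun; without the form, each keystroke would cause its own rerun.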
Its roadmap includes plans to add to its widget library, improve the developer experience and make sharing of code, components and apps easier.
In a blog post, Crystal Huang, a self-described aspiring data scientist, detailed her project, which uses Streamlit and deep learning algorithms to apply face mask detection to photos.
Streamlit has raised $62 million, most recently a $35 million Series B round announced in April from Sequoia and previous investors Gradient Ventures and GGV Capital.