The amount of data businesses process today would have seemed unfathomable to a database administrator only a decade ago. Where we once talked about megabytes, gigabytes, and terabytes, today we more often talk about hundreds of terabytes, if not petabytes, of data. New and constantly evolving data streams have overwhelmed legacy systems, leading data warehouse architects and database administrators to develop highly specialized, platform-specific skills to create workarounds, custom tools, and new capabilities.
Despite the increased volume and complexity, businesses recognize the importance of data in making decisions, and they’re investing more time, tools, and people to manage their data. More than half (57%) of organizations worldwide expect to increase technology investments in the next 12 months, with cloud computing and business process management as two of the top three areas that tech CIOs are targeting for greater investment, according to IDG. And recent IDC research highlights the rapid acceleration of data-driven business priorities, predicting that:
- Worldwide revenue for big data and business analytics solutions will reach $260 billion in 2022, representing an 11.9% compound annual growth rate (CAGR) from 2017-2022.
- The installed base of storage capacity worldwide will more than double over the 2018-2023 forecast period, growing to 11.7 zettabytes (ZB) in 2023.
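To put those growth figures in perspective, the short sketch below works through the compound-growth arithmetic behind them. The function names are illustrative, and the derived values are only what the quoted figures imply, not numbers reported in the research itself.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Back out the starting value implied by an ending value and a CAGR."""
    return end_value / (1.0 + cagr) ** years


def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# $260 billion in 2022 at an 11.9% CAGR over the five years from 2017.
base_2017 = implied_start_value(260.0, 0.119, 5)

# "More than double" over 2018-2023 means a growth factor above 2 in five years,
# so this is a lower bound on the annual growth rate, not a reported figure.
min_storage_cagr = compound_annual_growth_rate(1.0, 2.0, 5)

print(f"Implied 2017 revenue: ${base_2017:.0f} billion")
print(f"Minimum annual storage growth for doubling: {min_storage_cagr:.1%}")
```

In other words, the quoted revenue forecast implies a 2017 base a little under $150 billion, and doubling storage capacity in five years requires sustained growth of nearly 15% a year.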
As technology advances to meet new data demands, it also creates new areas of opportunity for business growth and operational efficiency. Today, for example, some technologies already enable automatic query optimization, while machine learning algorithms help automate a variety of once-manual functions. Advances in technology are even starting to let data warehouses tune themselves. This capability is accelerating the speed at which data warehouses deliver value to businesses.
With the advance of machine learning and availability of near-infinite storage and computing power in the cloud, we’re headed toward an exciting new era: the age of the self-adapting data warehouse.
Toward the Self-Adapting Data Warehouse
Fortunately, computing power and memory have advanced to the point that machines can process much larger and more complicated data sets. Data warehousing can greatly reduce the time to value for a company’s data and make data management processes more efficient.
As a result, many businesses are shifting to cloud-based data warehousing as a service, in which they can access and query their data in near real time without incurring the costs of housing, managing, and maintaining an on-premises solution in a data center. The shift also transforms workload tuning, one of the critical parts of a successful data warehouse, from a manual in-house activity into part of the service offering. With self-adapting data warehouses, tuning is not just part of the service offering but can be increasingly automated with machine learning. The scale and volume of data are no longer limiting factors. Instead, they are an advantage, helping the system discern patterns more quickly and deliver the right data to the right person at the right time.
In this way, data warehousing will resemble other smart applications, which incorporate data-driven, actionable insights into the user experience itself. Like the updates to mobile applications, data warehousing as a service has advanced to the point that upgrades and patches to the system are completely invisible to the end users.
More broadly, the move to a self-adapting data warehouse complements an industry-wide shift toward agile methodologies, in which businesses extract value faster through regular deployment of incremental improvements. In the world of software, agile methods help developers get to market faster with a minimum viable product (MVP). In the world of data warehousing, self-adapting systems help businesses get insights faster, with less manual intervention. As the data volume grows and more queries are executed, the self-adapting data warehouse will deliver progressively better results, because it has more inputs to analyze.
As we enter the age of the self-adapting data warehouse, three key benefits await those businesses that make the shift.
Benefit #1: Enjoy greater speed and agility
The self-adapting data warehouse will load, process, and present data in the best possible way for each user without human intervention, delivering more business value faster. A self-adapting data warehouse will analyze larger data sets, which were previously out of reach, more quickly, and without the months of preparation and planning required for legacy on-premises platforms. With a self-adapting data warehouse, it’s possible to load the data and begin analyzing it immediately, without the need to spend time tuning.
Benefit #2: Anticipate patterns faster and optimize queries
The more data a machine learning algorithm is exposed to, the faster and more accurately it will recognize patterns. That said, it takes a lot of data to reach a statistically significant, accurate conclusion. Data warehouses have an advantage in this arena, since they see data from across dozens of sources and potentially millions of queries. With these inputs, a self-adapting data warehouse will anticipate future user queries (in much the same way auto-fill completes common searches), or even recommend query rewrites based on past query patterns. The self-adapting data warehouse will also provide intelligent query rewrites automatically, with the engine determining which view of the data will deliver the fastest result to each user for each query.
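The auto-fill comparison can be made concrete with a minimal sketch. It assumes a plain log of past query strings and ranks suggestions purely by how often each query has been seen. The `QuerySuggester` class is hypothetical and frequency counting stands in for the machine learning a real engine would use, so treat it as an illustration of the idea rather than how any particular warehouse works.

```python
from collections import Counter
from typing import List


class QuerySuggester:
    """Suggest likely queries from a history log, ranked by past frequency."""

    def __init__(self) -> None:
        self._counts = Counter()

    @staticmethod
    def _normalize(query: str) -> str:
        # Simplification: lowercase and collapse whitespace so trivially
        # different spellings count as one query. A real system would parse
        # the SQL instead, since this also lowercases string literals.
        return " ".join(query.lower().split())

    def record(self, query: str) -> None:
        """Add one executed query to the history."""
        self._counts[self._normalize(query)] += 1

    def suggest(self, prefix: str, limit: int = 3) -> List[str]:
        """Return up to `limit` past queries starting with `prefix`, most frequent first."""
        prefix = self._normalize(prefix)
        matches = [(q, n) for q, n in self._counts.items() if q.startswith(prefix)]
        # Sort by descending frequency, then alphabetically for stable output.
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [q for q, _ in matches[:limit]]


suggester = QuerySuggester()
for past_query in [
    "SELECT region, SUM(sales) FROM orders GROUP BY region",
    "select region,  sum(sales) from orders group by region",
    "SELECT region FROM stores",
    "SELECT customer_id FROM orders",
]:
    suggester.record(past_query)

print(suggester.suggest("SELECT region"))
```

The more history the suggester sees, the more reliable its ranking becomes, which is the same dynamic the paragraph above describes at a much larger scale.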
Benefit #3: Automate data organization and optimize workloads
As any database administrator can attest, determining how to index and organize the data for greatest concurrency, performance, and accessibility is a challenge in legacy platforms. Getting performance and value out of a data warehouse historically took a lot of time and manual effort for planning and maintenance.
Self-adapting data warehouses will automate the organization of the data, eliminating the manual work required to design an optimal layout. Eliminating these complex, error-prone, and manual tasks creates an opportunity for database administrators and architects to focus on broader strategic data management initiatives.
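As a rough illustration of what automating data organization could look like, the sketch below picks the column to sort or cluster a table by from the query history alone. It assumes the warehouse already records which columns each query filtered on. The `pick_sort_column` function and the sample column names are hypothetical, and a simple frequency count stands in for whatever model a real system would use.

```python
from collections import Counter
from typing import Iterable, List, Optional


def pick_sort_column(filter_columns_per_query: Iterable[List[str]]) -> Optional[str]:
    """Return the column most often used in query filters, or None if there is no history."""
    counts = Counter()
    for columns in filter_columns_per_query:
        # Count each column once per query, even if it appears in several predicates.
        counts.update(set(columns))
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically so the result is deterministic.
    return min(counts, key=lambda column: (-counts[column], column))


# Hypothetical history: the filter columns used by four past queries.
history = [
    ["order_date", "region"],
    ["order_date"],
    ["customer_id"],
    ["order_date", "customer_id"],
]

print(pick_sort_column(history))
```

A decision like this is exactly the kind of task an administrator would otherwise revisit by hand whenever query patterns shift.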
A self-adapting data warehouse will also optimize workloads. In the past, when multiple users accessed a very large data set simultaneously, legacy data warehouses could not easily optimize all queries and response times. But a self-adapting data warehouse with an underlying machine learning algorithm can determine how best to organize compute resources for the various workloads. This includes automatically scaling compute to support increased concurrency when there is an unexpected surge in usage, maintaining a consistent level of service. A self-adapting data warehouse will organize data and the compute power automatically so that it provides the best performance for each user and maximizes the number, and type, of workloads it can execute concurrently.
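The scaling behavior described above can be sketched with a simple sizing rule. This version assumes each compute cluster can serve a fixed number of concurrent queries and that the service is bounded by minimum and maximum cluster counts. The function name, the parameters, and their default values are all illustrative, and the fixed rule is a stand-in for the learned policy a self-adapting warehouse would apply.

```python
import math


def clusters_needed(
    concurrent_queries: int,
    slots_per_cluster: int = 8,   # assumed capacity of one cluster
    min_clusters: int = 1,        # assumed floor so the service stays available
    max_clusters: int = 10,       # assumed ceiling to cap cost
) -> int:
    """Return how many compute clusters to run for the current query load."""
    if slots_per_cluster <= 0:
        raise ValueError("slots_per_cluster must be positive")
    if min_clusters > max_clusters:
        raise ValueError("min_clusters cannot exceed max_clusters")
    # Enough clusters to give every concurrent query a slot, rounded up.
    wanted = math.ceil(max(concurrent_queries, 0) / slots_per_cluster)
    # Clamp to the configured bounds.
    return max(min_clusters, min(max_clusters, wanted))


# A quiet period, a normal load, and an unexpected surge.
for load in (0, 20, 500):
    print(load, "concurrent queries ->", clusters_needed(load), "clusters")
```

Scaling out when the load crosses a cluster's capacity and back in when it drops is what keeps response times steady during a surge without paying for idle compute the rest of the time.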
Conclusion
The rapidly growing scale and variety of enterprise data, once a concern among data warehouse experts, are quickly becoming a key asset for reducing time to value. By leveraging large-scale data sets and query history, the self-adapting data warehouse will recognize patterns faster, anticipating user needs and optimizing queries, views, and workloads.
The move to a self-adapting data warehouse represents the next phase in the evolution of data warehousing. But at its foundation, it’s a return to the real value of any data warehouse: helping businesses derive more value from their data, faster.