Microsoft has revealed another piece of its emerging cloud stack for big data analysis.
On Monday, the company announced it had formulated a new query language called U-SQL, designed to run on the Azure Data Lake Store, which Microsoft plans to launch in preview mode by year’s end.
Microsoft is positioning the Data Lake Store as a service for analyzing large-scale unstructured data. It is designed to support tools built for the Hadoop Distributed File System (HDFS), but promises to be easier to manage than running a Hadoop or Spark cluster in-house.
Microsoft announced the Azure Data Lake back in April, but U-SQL finally answers the question of how useful information could be panned from the petabytes of corporate data that a data lake would collect. In answering that question, the company set out to simplify the process of big data analysis.
Most organizations already have big data of some sort — log files, customer purchasing records, video footage — but don’t really have the advanced in-house programming chops to make use of it. Many do have Microsoft techies, however.
“Microsoft’s goal is to make big data technology simpler and more accessible to the greatest number of people possible,” emailed Oliver Chiu, Microsoft product marketing manager for Hadoop, big data, and data warehousing.
As a result, Microsoft built U-SQL to fuse the declarative familiarity of SQL with the expressive capabilities of a programming language, in this case Microsoft’s C#. In theory, at least, the language combines the best of both worlds.
“We’ve heard that many data engineers struggle to process data with today’s tools. Code-based solutions offer great power, but are complex to learn; SQL-based tools are easy to start with but difficult to extend,” Chiu wrote.
A blog post by Michael Rys, Microsoft principal program manager for big data, goes into greater detail.
U-SQL lets programmers mix SQL keywords with C# expressions, so that within a single script a programmer can impose a schema on data from an unstructured source, use SQL to aggregate the data into the desired form, and then write the output to a file or table. Because a script can work over the data in multiple steps, U-SQL sets the stage for more complex analysis.
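U-SQL itself is a mix of SQL and C#, so it is not reproduced here. As a rough analogy only, the minimal Python sketch below walks through the same three stages the paragraph describes: impose a schema on raw text, aggregate it, then write the result out. The tab-separated log format, the column names, and the functions `extract`, `aggregate`, and `output` are all hypothetical illustrations, not part of U-SQL or any Microsoft API.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw, tab-separated log lines (illustrative data only).
RAW_LOG = """\
2015-09-28\tuser1\t/home\t120
2015-09-28\tuser2\t/search\t340
2015-09-29\tuser1\t/search\t90
2015-09-29\tuser3\t/home\t200
"""

def extract(text):
    """Stage 1 (schematize): give each raw line named, typed columns."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for date, user, page, duration in reader:
        yield {"date": date, "user": user, "page": page, "duration_ms": int(duration)}

def aggregate(rows):
    """Stage 2 (aggregate): roughly SELECT page, SUM(duration_ms) ... GROUP BY page."""
    totals = defaultdict(int)
    for row in rows:
        # An inline expression computing the grouping key; this is the kind of
        # spot where a U-SQL script would embed a C# expression instead.
        key = row["page"].lstrip("/")
        totals[key] += row["duration_ms"]
    return dict(totals)

def output(totals, out):
    """Stage 3 (output): write the aggregated rows as CSV."""
    writer = csv.writer(out)
    writer.writerow(["page", "total_duration_ms"])
    for page, total in sorted(totals.items()):
        writer.writerow([page, total])

if __name__ == "__main__":
    buf = io.StringIO()
    output(aggregate(extract(RAW_LOG)), buf)
    print(buf.getvalue())
```

The point of the analogy is the shape of the pipeline: in U-SQL, each stage would be a statement within a single script rather than a separate function, with C# expressions appearing wherever this sketch uses Python ones.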
Only time will tell if U-SQL’s approach will make big data analysis more palatable. Some Microsoft shops have already been kicking the tires, however.
Belgian IT services company Codit has been testing the technology with a big data job in mind: It is developing a system that combines usage data from smart home energy meters with energy spot market prices, in order to highlight those times when dialing back the power could save the most money.
For Codit, Azure Data Lake holds the promise that the company could test and build out such big data services without fussing over the underlying infrastructure, Codit chief technology officer Sam Vanhoutte wrote in an email.
U-SQL should also save time because it will be familiar to developers who already know SQL and C#, sparing them from having to learn additional languages. “If you know both languages, it’s indeed rather easy to get up to speed,” Vanhoutte wrote.
Having a single tool to query relational, NoSQL, and unstructured data sources could also be handy.
“I love the idea of having a standard tool that would allow me to query a diverse range of NoSQL or Hadoop-based big data services,” wrote Australian .NET consultant Michael Pine, in a Twitter direct message. He noted that more companies are moving their data to the cloud, so a technology such as U-SQL could be valuable.
“For me, the key requirements would be a tool that allows me to get to the data in a consistent manner across all sorts of data sources,” Pine wrote.
The combination of U-SQL and the Azure Data Lake Store can’t address all big data use cases, Vanhoutte cautioned. If you want to do machine learning or stream processing within the Microsoft cloud, you will have to familiarize yourself with other Azure technologies, such as Azure Machine Learning and Azure Stream Analytics, respectively.
Another potential limitation is the specialized nature of U-SQL: Microsoft did not say whether U-SQL would be made available for non-Azure, or even non-Microsoft, platforms. “We are taking customer and market feedback around U-SQL to determine what we [will] do in the future,” Chiu wrote.
The company did note that, in addition to the Data Lake Store, U-SQL could also be used to query data in the Azure SQL Database and Azure SQL Data Warehouse.
To be sure, other technologies already combine SQL with programming language hooks, such as Spark SQL and Teradata’s SQL-MapReduce.
“Basically, this sounds like Microsoft is planting flowers in its walled garden to emulate what’s been growing outside for a while,” emailed Curt Monash, head of IT analyst firm Monash Research.
Feature image: “Big_Data_Prob” by KamiPhuc is licensed under CC BY 2.0.