How “able” is your ETL process? Modernizing data prep, part 1
For all the exciting discovery that data analytics enables, data preparation involves, for most users, an equal amount of drudgery. That’s true for a number of reasons, first and foremost being that enterprise data is rarely structured for analytic use; it’s often designed for transactional system performance or to minimize storage. Wrangling data that is spread across different locations and technologies (databases, cubes, cloud-based and on-premises systems, flat files, etc.), and then cleaning up “dirty” (incorrect, improperly encoded, duplicated or blank) data, is a time-consuming and labor-intensive task, constantly repeated as data sources come and go.
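To make the “dirty data” cleanup concrete, here is a minimal Python sketch of the kind of step that gets repeated for each new source. It is illustrative only: the function name `clean_records` and the sample field names are hypothetical, and it assumes each record is a flat dictionary of simple values (strings, numbers or `None`). It handles only stray whitespace, blank rows and exact duplicates; fixing incorrect or improperly encoded values needs source-specific rules.

```python
from typing import Iterable


def clean_records(rows: Iterable[dict]) -> list[dict]:
    """Trim whitespace, drop blank rows, and remove exact duplicates.

    Hypothetical helper for illustration; assumes flat records whose
    values are hashable (strings, numbers, None).
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Normalize: strip surrounding whitespace from every string value.
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in row.items()
        }
        # Skip rows in which every field is empty or missing.
        if all(value in ("", None) for value in normalized.values()):
            continue
        # De-duplicate on the full normalized content, keeping first occurrence.
        fingerprint = tuple(sorted(normalized.items(), key=lambda kv: kv[0]))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Sample input with a padded duplicate and a blank row (made-up values).
raw = [
    {"customer": " Acme ", "region": "West"},
    {"customer": "Acme", "region": "West"},
    {"customer": "", "region": None},
    {"customer": "Globex", "region": "East "},
]
result = clean_records(raw)
```

With the sample input above, `result` holds two records: one for Acme and one for Globex, each with whitespace trimmed.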
Many different data preparation technologies exist to solve these problems, but is every data prep tool “able” to solve the entire data preparation problem?
Why old (and newer) tools fall short
The term ETL is itself a vestige from computing’s past; “extract, transform and load” has been required to make stored data usable since mainframe days. Today, traditional ETL tools still serve their purpose, receiving input from any data source and being able to output to any target. These tools support a wide range of transformations, using a client/server approach: the server application processes the data, while IT developers work through a client application. However, today’s ETL tools generally are not specific to business intelligence (BI). IT users require data modeling training to optimize the data for eventual analytic consumption by business users.
Far newer are the desktop data preparation tools, which have proliferated over the last decade or so. Designed for business analysts, these tools are installed on desktop computers. They can extract, transform and load enterprise data, and output it to the file formats of specific BI tools. But desktop data prep tools cannot output data in robust and scalable formats that are the foundation of modern enterprise BI. These limitations in their functionality and scalability make desktop data preparation tools a niche solution for business users’ ever-growing demand for more data.
What data prep for analytics must be able to deliver
At a high level, the right data preparation solution for modern analytics is one that is highly “able” – able to be used by any level of data consumer, from casual business analyst to IT developer. The ETL process must be able to receive input from any source and output transformed data to any analytic data model. It must also be architected for modern multi-tenant cloud use, with appropriate user interfaces for both business and IT audiences.
Importantly, the data prep process must support users’ insatiable demand for data, placing no limits on data volume, source complexity or user counts.
As illustrated below, these requirements translate into a concise list of capabilities. The right data preparation tool is:
- Approachable: The solution needs to be appropriate for the job role.
- Accessible: The solution should be browser-based.
- Programmable: The solution should not be limited to built-in transforms in the user interface.
- Scalable: The tool should support unlimited scalability.
- Flexible: The data prep tool should be flexible, supporting multiple input and output options and reacting easily to data source changes.
- Networkable: The data preparation efforts of each user should be available for others to leverage without re-creating connectivity or logic.
- Repeatable: The tool should allow for full automation of the entire data preparation process.
- Extensible: Data preparation should not be limited to what a single tool can do; the ability to extend the data preparation process to a “network” of services is key.
Birst has taken a smarter and more modern approach to developing data preparation tools to meet the ambitious requirements of today’s enterprise users. Importantly, the Birst solution makes data prep easy and accessible to all users, enabling each data analyst to create their own view of the business while still maintaining consistent calculations and business rules across the board. Birst eliminates desktop software installs that create file-based local data silos, enabling data to be networked throughout an entire organization.
Further, Birst’s purpose-built ETL capabilities are specifically designed for scalable, enterprise-class analytics, automatically creating a dimensional data model that delivers analytic-ready data. Designed from the ground up to be highly scalable in cloud analytic deployments, Birst provides browser-based, always-on access to analytics, even during data processing.
The table below summarizes the difference between desktop data prep, traditional ETL tools, and Birst ETL.
Part 2 of this blog will take a deeper dive into the eight ways that Birst is far better “able” to meet the needs of data-driven enterprises.