Informatica doubles down on the cloud data pipeline

On Jan 14, 2020

Over the past few years, we’ve been focusing our research on how cloud-native architecture has transformed databases. By separating compute from storage, database providers have been able to simplify operation through serverless deployment; improve service levels by leveraging fast interconnects and cheap storage to scale replication; and reinvent the way that transactions are committed through leveraging the intelligence built into storage.

Because enterprises increasingly have to work with data originating both inside and outside the bounds of the data center, Integration Platform-as-a-Service (IPaaS) has become a very competitive market. It is also a crowded one, with a range of providers that focus on application integration, data integration, or both.

We recently had a chance to do a deep dive into Informatica Intelligent Cloud Services (IICS). This is actually the second go-round for Informatica in the cloud, one in which it reengineered the entire Informatica portfolio into a cloud-native platform. Introduced in 2016, IICS recently reached critical mass in its coverage of the Informatica portfolio.

The company has been in the news lately for reaching several key milestones. For the first time since going private, it cleared the $1 billion mark in recurring revenues, and it just announced a changing of the guard, with its head of products stepping into the CEO role.

And to get a bit more mundane, yes, Informatica too has caught the data catalog bug. Yes, we have our own rant about catalogs, as we’ve always believed that data catalogs should be part of a broader integration solution rather than their own silo. But the reality is that in most organizations, a single enterprise data catalog will prove about as practical and likely as an enterprise data model or an enterprise-wide master data management scheme. And anyway, there are many types of data catalogs. Data can be kind of personal to the people or line organizations that own or consume it, so department- or division-level data catalogs have had unusual staying power. OK, we’re talking about you, Alation.

So, it’s realistic that Informatica is positioning its data catalog as “a catalog of catalogs” because most organizations are probably going to have more than one. Informatica’s catalog aims to be the connective tissue that sources metadata and discovers relationships from a variety of points. Informatica is embedding its catalog in the metadata tier of its portfolio.

Cutting back to the chase, Informatica’s cloud Integration Platform-as-a-Service (IPaaS) shows what’s possible when you refactor and rearchitect what was previously a series of disparate tools into a cloud-based microservices architecture.

Building on the common metadata core, there are distinct tiers for connectivity and integration, where ingest and ETL operations are triggered; for user experience; and for processing engines, where you can select the execution method of choice, such as Spark for batch or a streaming engine for real-time processing. The system optimizes automatically for the mode of execution by pushing processing close to where the data is stored. For instance, SQL calls could be pushed inside the database and Spark processing to the data lake, while real-time IoT processing is directed to the streaming engine. The architecture also allows physical separation of workloads, so ingestion, replication, query processing, data quality, and data profiling can operate side by side while minimizing or eliminating interference.
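To make the idea of pushing processing close to the data a bit more concrete, here is a minimal sketch of our own in Python. It is not Informatica's implementation or API; the `Job`, `Engine`, and `choose_engine` names and the source labels are hypothetical, chosen only to mirror the three examples above (SQL in the database, Spark on the data lake, IoT on the streaming engine).

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    """Hypothetical execution targets, mirroring the examples in the text."""
    DATABASE = "in-database SQL pushdown"
    SPARK = "Spark on the data lake"
    STREAMING = "streaming engine"


@dataclass(frozen=True)
class Job:
    name: str
    source_kind: str  # assumed labels: "rdbms", "data_lake", "iot_stream"


# Assumed routing table: each kind of data source maps to the engine
# that runs closest to where that data is stored.
ROUTING = {
    "rdbms": Engine.DATABASE,
    "data_lake": Engine.SPARK,
    "iot_stream": Engine.STREAMING,
}


def choose_engine(job: Job) -> Engine:
    """Return the engine closest to the job's data, or fail loudly."""
    try:
        return ROUTING[job.source_kind]
    except KeyError:
        raise ValueError(
            f"no engine registered for source kind {job.source_kind!r}"
        ) from None


if __name__ == "__main__":
    for job in (
        Job("nightly_orders_rollup", "rdbms"),
        Job("clickstream_sessionize", "data_lake"),
        Job("sensor_alerts", "iot_stream"),
    ):
        print(f"{job.name}: {choose_engine(job).value}")
```

A real optimizer would presumably weigh far more than the source type, such as data volume, connector capabilities, and cost, but the lookup captures the basic principle: the job definition stays the same and only the execution target changes.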

Separating compute from storage and abstracting common functionality into microservices allows the platform to take advantage of serverless operation, relieving the customer of the burden of managing autoscaling. Separating the “core platform” (e.g., the runtime, secure agent, clustering, and connectivity) from the data management services facilitates reuse and enables the system to better withstand faults, which is key to delivering consistent SLAs.

Of course, no discussion of a managed cloud service is complete without spotlighting the role of machine learning, as the two have a symbiotic relationship. A key draw of the cloud is the operational simplicity that it can deliver, and in a managed cloud service, the service provider can “learn” from how customers use its service to provide guided experiences that further simplify operation and rapidly debug operational issues that would otherwise bog down the IT folks in troubleshooting. In turn, by operating a managed service, the cloud provider has the treasure trove of logs that feed the ML models. In our 2019 annual look ahead, we spoke about the relationship of cloud and machine learning to databases: cloud would redefine the architecture of a database, and machine learning would run it. The same applies to any platform-as-a-service offering, such as IPaaS.

Informatica has been steadily increasing the use of machine learning in its services and has branded it CLAIRE. It’s not a monolithic feature or single algorithm, but an umbrella of ML that automates the legwork to keep operations simple. For instance, if an integration job fails because of a problem with caching, CLAIRE “responds” with recommendations for retuning the cache. While it might not sound as dramatic as an AI model that solves world hunger, in a managed cloud service, the modest optimizations (such as retuning the cache) that ML delivers add up.