How an Industrial DataOps Company Builds Its Data Pipelines

At Cognite, teams work to enable scale from the jump.

Written by Michael Hines
Published on Jul. 14, 2021

According to Microsoft’s 2019 Manufacturing Trends Report, the price of an industrial IoT sensor was expected to drop to just $0.38 in 2020, and the number of connected “smart” machines was predicted to reach a whopping 36.13 billion by 2021.

The fourth Industrial Revolution is in full swing, and it promises to usher in an age of optimization across the manufacturing, oil and gas, and utility industries. Of course, the actual ushering will have to be done by someone — or more accurately, by teams of people at companies like Cognite.

The Norwegian company, which has an office in Austin and is growing its team after a recent fundraising round, has built an industrial DataOps platform designed to help what it refers to as companies in “heavy-asset industries” un-silo their data and do more with it. Enabling companies with thousands of machines that generate data seemingly every second to glean insights from that information requires data pipelines that are built to scale from the jump. According to Vibha Srinivasan, senior director of data science, this is precisely what Cognite has done.

We recently sat down with Srinivasan to learn more about the tech used to build data pipelines that handle an almost unfathomable amount of data.

 

Vibha Srinivasan
Senior Director, Data Science • Cognite Inc.

What technologies or tools are you currently using to build your data pipeline, and why did you choose those technologies specifically?

The Cognite industry solutions team implements data-driven solutions for our clients in the energy, manufacturing, and power and utilities sectors. Fortunately, we have an amazing solution architecture team that does the bulk of the data extraction and pipeline work. They use everything from custom-built integrations that connect to specialized industrial systems, to middleware like Kafka and MQTT for IIoT data, to integration platforms like Azure Data Factory for ingesting and transforming client data.
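
To make the IIoT side of that concrete, here is a minimal sketch of subscribing to a stream of sensor readings over MQTT using the open-source paho-mqtt client. The broker address, topic filter and message format are hypothetical placeholders, not Cognite’s actual configuration.

```python
# Minimal sketch: buffer IIoT sensor readings arriving over MQTT.
# Uses the paho-mqtt 1.x client API; broker host, topic filter and
# message schema are hypothetical placeholders.
import json
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # Each payload is assumed to be a JSON reading, e.g.
    # {"sensor": "pump_01_pressure", "ts": 1626264000000, "value": 4.2}
    reading = json.loads(msg.payload)
    userdata.append(reading)  # buffer for downstream ingestion

buffer = []
client = mqtt.Client(userdata=buffer)
client.on_message = on_message
client.connect("broker.example.com", 1883)  # hypothetical broker
client.subscribe("plant/+/sensors/#")       # hypothetical topic filter
client.loop_forever()
```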

All data is ingested into Cognite’s core offering, Cognite Data Fusion, or CDF, which is a cloud data warehouse optimized for industrial data. The data scientists use homegrown Python SDKs to extract data from CDF, typically as Pandas dataframes, for their work. We also use Cognite Functions, which are similar to cloud functions, to run our pipelines on a timed schedule.
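
For a sense of what that extraction step can look like, here is a minimal sketch using the public cognite-sdk Python package. Exact client construction and method names vary by SDK version, and the time series ID below is a hypothetical placeholder.

```python
# Minimal sketch: pull CDF time series data into a Pandas DataFrame
# with the public cognite-sdk package (pip install cognite-sdk).
# Method names vary by SDK version; the external_id is hypothetical.
from cognite.client import CogniteClient

# Project and credentials are assumed to be configured in the
# environment; construction details differ across SDK versions.
client = CogniteClient()

# Retrieve a week of hourly-averaged sensor readings as a DataFrame.
df = client.time_series.data.retrieve_dataframe(
    external_id="pump_01_discharge_pressure",  # hypothetical sensor
    start="7d-ago",
    end="now",
    aggregates=["average"],
    granularity="1h",
)
print(df.describe())
```

Cognite Functions then wrap logic like this in a `handle(client, data)` entry point that the platform invokes on a timed schedule.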

Our solutions are designed to be highly scalable from the get-go.
 

As your company — and thus, your volume of data — grows, what steps are you taking to ensure your data pipeline continues to scale with the business?

Since our clients are in heavy-asset industries, the data we work with is high in volume and variety: think sensor data, work orders and equipment hierarchies. The CDF platform and the custom extractors were designed for scalability, and the engineering team continually improves the APIs for speedier data extraction and transformation. Thanks to their support, our solutions are designed to be highly scalable from the get-go.
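
As an illustration of the kind of pattern that keeps extraction fast as the asset count grows, here is a generic sketch of fanning out retrieval across many sensors in parallel. The `fetch_dataframe` helper and sensor IDs are hypothetical stand-ins, not Cognite’s internal API.

```python
# Generic sketch: parallelize per-sensor retrieval so wall-clock time
# stays roughly flat as the fleet grows. fetch_dataframe is a
# hypothetical stand-in for an SDK/API call returning one sensor's data.
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

def fetch_dataframe(sensor_id: str) -> pd.DataFrame:
    # Placeholder: in practice this would be a network call per sensor.
    return pd.DataFrame({"sensor": [sensor_id], "value": [0.0]})

sensor_ids = [f"sensor_{i:05d}" for i in range(10_000)]  # hypothetical fleet

# Threads suit this workload because the per-sensor calls are I/O-bound.
with ThreadPoolExecutor(max_workers=16) as pool:
    frames = list(pool.map(fetch_dataframe, sensor_ids))

combined = pd.concat(frames, ignore_index=True)
print(len(combined))
```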

Responses have been edited for length and clarity.
