Datafiniti mines and cleans web data for your business

Imagine for a moment you’re Nickelodeon, and one of your revenue streams is licensing characters from your shows to toy manufacturers. You can’t dictate how much the manufacturers and retailers sell the toys for, but you’d prefer people not think of Teenage Mutant Ninja Turtles as cheap, or Dora The Explorer as out of reach.

Written by Colin Morris
Published on Sep. 22, 2015

Fortunately for you, there’s an overwhelming amount of data freely available online painting a picture of which toys cost how much, who’s buying them and where, and whether they’re satisfied. The only problem is that picture is broken up into a million pieces scattered across a hundred websites.

This is where Datafiniti comes in.

The Austin-based company, which bills itself as “the first search engine for data,” was founded in 2011 by CEO Shion Deysarkar with a vision to make all that scattered web data instantly searchable, so you could theoretically pull together a data set depicting Dora’s market positioning as easily as a list of neighborhood take-out spots.

Of course, there’s a little more to it than that. Gathering the data is only the first step of a sophisticated process. Experts say data scientists typically spend 80 percent of their time cleaning the data they find, leaving very little time for actually analyzing it.

“The issue with using web scraping tools is that businesses naively assume that the scraped data will be immediately useful,” Deysarkar said.

Datafiniti’s quality control process combines human intervention with machine learning so they only have to set up each data source once, which dramatically streamlines their process compared to those of competitors and gives their clients more time to focus on analyzing the data.

“Our engineers are great at setting up the process and overall structure of how data flows through our system,” Deysarkar said. “Say we have an issue with usernames on reviews. The human process can highlight where the issue is happening and can tweak the automated process accordingly.”

Datafiniti’s suite of tools and services, including 80legs, their self-serve data search engine, can be applied to virtually any industry. Every time we review a product, service or business, we’re contributing to a massive body of online data that companies can use to make strategic decisions. Once Datafiniti finds, cleans and packages this data for analysis, the possibilities and implications are profound.

Among those implications are privacy risks for the online users whose data is being mined, so Datafiniti created the Charter for Responsible and Ethical Acquisition of Data, or CREAD. Among other things, the document commits Datafiniti to gathering only publicly available data and to not imitating users or services to gain access to data.

 
