RSAC insights: Concentric AI directs Google’s search techniques towards locking down data sprawl

By Byron V. Acohido

In order to extract value from the Internet, data sprawl first must get reined in. This has always been the case.

What good is connecting applications, servers and networks across the public cloud if you’re unable to securely operationalize the datasets that these interconnected systems store and access?

Solving data sprawl has now become a focal point of cybersecurity. It’s about time. Much of the buzz as RSA Conference 2022 happens this week (June 6 – 9) in San Francisco will be around innovations to help companies make sense of data as it gets increasingly dispersed to far-flung pockets of the public cloud.

I had the chance to visit with Karthik Krishnan, CEO of San Jose, Calif.-based Concentric AI, which is in the thick of this development. Concentric got its start in 2018 to help companies solve data sprawl from the data security and governance perspective, and has grown to 50 employees, with $22 million in venture capital backing. For a full drill down of our discussion, please give the accompanying podcast a listen. Here are a few key takeaways.

Crawling, classifying

Jeff Bezos solved data sprawl for selling books and gave us Amazon. Larry Page and Sergey Brin solved data sprawl for generalized information lookups and gave us Google.

In much the same sense, companies must now solve data sprawl associated with moving to an increasingly interconnected digital ecosystem. And addressing data security has become paramount.

Up until recently, Krishnan observes, companies could get away with taking a “vault” approach to securing their sensitive data. Security teams generally knew which on-premises server held certain batches of both structured and unstructured data; they could narrowly control access to a company’s on-premises data centers, and that was deemed good-enough, security-wise, he says.

Today sensitive information, especially unstructured data, routinely gets dispersed across multiple cloud-connected platforms. Software developers share code in public repositories; legal and finance teams email records to each other; the operations, marketing and sales staff collaborate with each other and connect to third-party suppliers.


“It’s no longer just customer data or personally identifiable information, it’s also strategic business content, confidential data, financial information and intellectual property,” Krishnan says.

The idea to launch Concentric struck Krishnan when he was working as an executive at HP and a customer admitted losing track of sensitive data.

“He was very fearful that he couldn’t pass an audit and would get fined because his company’s data had sprawled all over the Internet and he just didn’t know where all of the information was,” Krishnan recalls. “It has become very, very clear that data security is pretty broken for most enterprises.”

The solution Krishnan came up with was to fundamentally do what Amazon did for e-commerce and what Google did for search. Concentric’s SaaS solution crawls a company’s network to discover and classify every bit of sensitive data; it granularly categorizes each piece of data, and this puts the company in a position to set wise governance policies for each and every data element.
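To make the crawl-and-classify idea concrete, here is a minimal sketch. It is not Concentric’s method: the article describes deep learning, while this stand-in uses a handful of keyword patterns, and the category names, patterns and function names are all hypothetical, chosen only for illustration.

```python
import os
import re

# Hypothetical categories and patterns (assumptions for illustration only).
PATTERNS = {
    "pii": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like number
    "financial": re.compile(r"\b(invoice|tax|revenue)\b", re.I),
    "legal": re.compile(r"\b(non-disclosure|nda|confidential)\b", re.I),
}


def classify_text(text):
    """Return the sorted list of categories whose pattern matches the text."""
    return sorted(name for name, pat in PATTERNS.items() if pat.search(text))


def crawl(root):
    """Walk a directory tree and map each file path to its categories."""
    inventory = {}
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            path = os.path.join(dirpath, fname)
            # Read as text and skip undecodable bytes; real tools parse file formats.
            with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                inventory[path] = classify_text(fh.read())
    return inventory
```

Calling `crawl()` on a folder returns an inventory keyed by file path, which is roughly the raw material a governance policy would be written against.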

Ultimate target

Taking this granular approach to data security and governance gives companies a direct, hands-on capability to shrink their attack surface at the data layer, Krishnan says. It will remain vital, going forward, to keep an eagle eye on sensitive data because, at the end of the day, that’s what threat actors are focused on, he says.

“Either somebody wants to steal your information because they want to put it up for sale in the Dark Web or somebody wants to encrypt your data and extort a ransom from you,” he says. “Data is the foundational element. . . the data layer must be scrutinized, and governance has to be put in place because data is the ultimate target.”

Concentric seeks to make verifying the security of sensitive data as quick and easy as Googling a restaurant review. To accomplish this, Krishnan says, the company crawls data with advanced analysis technologies and brings “deep learning” data analytics to bear.

Krishnan gave me the example of a technology company that was concerned about employees flouting a company ban on the use of personal email accounts to share proprietary documents.

It took a few days to exhaustively scan the tech company’s Google Drive account and discover each and every sensitive document, applying deep learning along the way. “We inventoried everything from tax filings to NDAs to design documents to even things they may not care about, like resumes,” he says.

Every last bit of granular information held in the company’s Google Drive was organized into “clusters of contextually similar data,” he says. This included financial, human resources, sales and marketing, and engineering records and documents.
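As a rough illustration of what grouping “contextually similar data” can mean, the sketch below clusters documents by word overlap. This is a simplified stand-in, not the deep learning Krishnan describes: it uses bag-of-words cosine similarity with a greedy pass, and the function names and the 0.5 threshold are assumptions made for this example.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.5):
    """Greedy clustering: a document joins the first cluster whose seed is similar enough."""
    clusters = []  # each entry is (seed_vector, [document names])
    for name, text in docs.items():
        vec = vectorize(text)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster was close enough, so this document seeds a new one.
            clusters.append((vec, [name]))
    return [members for _seed, members in clusters]
```

Given two NDAs with largely shared wording and one resume, `cluster()` would place the NDAs together and leave the resume on its own, with no hand-written rule naming either category.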

“This is all done autonomously,” Krishnan says. “The customer doesn’t have to write a single rule, nor define a single policy; the machine learning and the AI go in and do all of this work to give them this view.”

The tech company was then able to go into any data cluster and compare how each document had been touched; for example, how a given file was classified, where it had been recently moved to and who the file might have been shared with. “We compare document usage to how similar documents are used, which lets us autonomously find risk and flag it with widgets that represent those risks across a bunch of dimensions,” Krishnan says.
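The peer-comparison idea can be sketched in a few lines. This is an assumption-laden toy, not Concentric’s risk model: it treats a sharing audience as normal when at least half the documents in a cluster use it, and flags anything else. The function name, the audience labels and the 0.5 cutoff are hypothetical.

```python
from collections import Counter


def flag_outliers(cluster_docs, min_peer_share=0.5):
    """Flag documents whose sharing audience is unusual for their cluster.

    cluster_docs maps a document name to the set of audiences it is shared with.
    An audience counts as normal when at least min_peer_share of the cluster's
    documents are shared with it.
    """
    counts = Counter(aud for audiences in cluster_docs.values() for aud in audiences)
    total = len(cluster_docs)
    normal = {aud for aud, n in counts.items() if n / total >= min_peer_share}
    flags = {}
    for name, audiences in cluster_docs.items():
        unusual = sorted(audiences - normal)
        if unusual:
            flags[name] = unusual
    return flags
```

In a cluster of three design documents where only one has also been shared with a personal email account, that one document is the only entry returned, which mirrors the personal-email scenario Krishnan described.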

In this case, the tech company was able to flush out the employees who routinely violated company data security rules. What’s more, it was able to refine its Google Drive usage policies, tightening in some areas and loosening where appropriate. This type of day-to-day, hands-on tweaking of data hygiene is something all companies need to embrace.

To be sure, the data layer is just one of several layers that require constant attention. It has become vital, of course, to also proactively discover, classify and manage risk in other crucial security layers: authentication, endpoint detection and response, SaaS configurations and vulnerability management.

It’s encouraging that advanced cyber hygiene tools to do all of this are gaining traction. I’ll keep watch and keep reporting.


Pulitzer Prize-winning business journalist Byron V. Acohido is dedicated to fostering public awareness about how to make the Internet as private and secure as it ought to be.

(LW provides consulting services to the vendors we cover.)
