MY TAKE: Why monetizing data lakes will require applying ‘attribute-based’ access rules to encryption

By Byron V. Acohido

The amount of data in the world topped an astounding 59 zettabytes in 2020, much of it pooling in data lakes.

Related:  The importance of basic research

We’ve barely scratched the surface of applying artificial intelligence and advanced data analytics to the raw data collecting in these gargantuan cloud-storage structures erected by Amazon, Microsoft and Google. But it’s coming, in the form of driverless cars, climate-restoring infrastructure and next-gen healthcare technology.

In order to get there, one big technical hurdle must be surmounted. A new form of agile cryptography must get established in order to robustly preserve privacy and security as all this raw data gets put to commercial use.

I recently had the chance to discuss this with Kei Karasawa, vice president of strategy, and Fang Wu, consultant, at NTT Research, a Silicon Valley-based think tank which is in the thick of deriving the math formulas that will get us there.

They outlined why something called attribute-based encryption, or ABE, has emerged as the basis for a new form of agile cryptography that we will need in order to kick digital transformation into high gear.

For a drill down on our discussion, please give the accompanying podcast a listen. Here are the key takeaways:

Cloud exposures

Data lakes continue to swell because each second of every day, every human, on average, is creating 1.7 megabytes of fresh data. These are the rivulets feeding the data lakes.

A zettabyte equals one trillion gigabytes. Big data just keeps getting bigger. And we humans crunch as much of it as we can by applying machine learning and artificial intelligence to derive cool new digital services. But we’re going to need the help of quantum computers to get to the really amazing stuff, and that hardware is coming.

As we press ahead into our digital future, however, we’ll also need to retool the public key infrastructure. PKI is the authentication and encryption framework on which the Internet is built. It works by issuing digital certificates to verify the authenticity of the servers ingesting the data trickling in from our smartphones, Internet of Things sensors and the like.

Just as crucially, PKI is the framework for encrypting data in transit; it works by issuing paired keys – a public key on the server side, and a private key on the user side. This arrangement has gotten us this far – but it is too brittle, from a security perspective, to carry us forward.

Karasawa cites the example of a company email server that has a public key and issues private keys to all its users. Each private key serves a narrow function: it gives the same type of authenticity and level of access to each user.

This creates exposure. The best evidence of this is how email has become a battleground where companies must continually defend against attackers’ endlessly creative efforts to manipulate email to circulate malware and distribute phishing ruses.

And this exposure doesn’t go away by replacing the company’s on-premises email server with a cloud-based email service, Karasawa says. In fact, it highlights how the migration to cloud services has expanded the attack surface.


“When you create an email archive in the cloud, you need to share secret keys to the whole dataset, so everyone can read all of the data in the cloud,” Karasawa says. All the attacker needs to do, he says, is to take over the account of a legitimate user to attain deep access to a lot of sensitive information stored in the cloud. And threat actors have become adept at account takeovers.

Attribute-based access

Clearly, our approach to issuing secret user keys needs rethinking. And this is where attribute-based encryption, or ABE, enters the picture.

Some context: attribute-based access control, or ABAC, is a long-standing methodology by which attributes, or characteristics, rather than roles (such as authorized email user) are used to determine access.
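To make the idea concrete, here is a minimal sketch of an attribute-based access decision. It is an illustration only, not NIST’s reference model or any vendor’s product; the `Rule` class, the `is_permitted` function and every attribute name (department, category, network) are hypothetical and chosen just for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attributes = Dict[str, str]


@dataclass
class Rule:
    """One hypothetical ABAC rule: allows `action` when `condition` holds."""
    action: str
    condition: Callable[[Attributes, Attributes, Attributes], bool]


def is_permitted(rules: List[Rule], action: str, subject: Attributes,
                 resource: Attributes, environment: Attributes) -> bool:
    """Deny by default; permit only if some rule for this action matches."""
    return any(
        rule.action == action and rule.condition(subject, resource, environment)
        for rule in rules
    )


# Assumed policy: cardiology staff may read heart-monitor data,
# but only while connected to the hospital network.
rules = [
    Rule(
        "read",
        lambda s, r, e: s.get("department") == "cardiology"
        and r.get("category") == "heart-monitor"
        and e.get("network") == "hospital",
    ),
]

nurse = {"department": "cardiology"}
contractor = {"department": "billing"}
reading = {"category": "heart-monitor"}
on_site = {"network": "hospital"}

print(is_permitted(rules, "read", nurse, reading, on_site))       # True
print(is_permitted(rules, "read", contractor, reading, on_site))  # False
```

Note that the decision never asks whether the requester is an “authorized user” in the abstract. It looks only at the characteristics of the person, the data and the circumstances, which is what separates this approach from role-based access.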

The National Institute of Standards and Technology has issued extensive ABAC guidelines. The NIST standards serve as a roadmap showing how to more granularly manage access rights for people and systems without unduly burdening users or system administrators.

ABE is a new form of public-key encryption by which it’s possible to issue private keys that work only when a specific set of conditions is met – and those conditions can range from simplistic to extensive.

In short, ABE makes it possible to issue highly customized private keys designed to serve very granular purposes. This capacity to push attribute-based access down to the encryption level is going to be essential to maintain the integrity of the next-gen smart infrastructure and smart digital services.

These leading-edge systems, in fact, are ramping up today, as 5G connectivity, which allows for much denser distribution of IoT sensors, gains more and more traction. Our cities, transportation systems, homes, workplaces and even clothing are getting smarter, day by day, trickling ever more data into the data lakes. As the absolute number of legacy encryption keys keeps rising, it is becoming more and more evident that one-size-fits-most cryptography solutions won’t be enough.

Encrypting just once

Consider the example of an elderly couple relying on smart services. Data from IoT sensors in this hypothetical couple’s home, car, appliances and various health monitors will trickle into a data lake. A small army of specialists — from software developers and sundry service providers to medical staff and family members — will leverage apps to support this couple.

These apps, in turn, will make use of data stored in data lakes. Yet each party in this example needs access to only a few specific drops of data from the lake. And the couple’s privacy will need to be preserved.


“If you used the conventional way of doing individualized encryption, you’d pretty much have to encrypt every time with different keys,” Wu says. “ABE allows you to encrypt once, and then issue different kinds of keys to different users, depending on your policies. So from that perspective this gives you a much more efficient way of protecting the data.”
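The “encrypt once, issue many keys” idea Wu describes can be sketched as a toy model. To be clear, this is not real cryptography and not NTT Research’s scheme: the payloads below are stored in the clear, and the code models only the access semantics, assuming a key-policy flavor of ABE in which each record is tagged with attributes and each key carries its own policy. All of the names (`SealedRecord`, `PolicyKey`, `seal`, `open_record`, `readable`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class SealedRecord:
    """Stands in for a ciphertext produced once and tagged with attributes."""
    attributes: FrozenSet[str]
    payload: str  # NOT encrypted here; a real scheme would hold ciphertext


@dataclass(frozen=True)
class PolicyKey:
    """Stands in for a private key whose policy is fixed when it is issued."""
    holder: str
    policy: Callable[[FrozenSet[str]], bool]


def seal(payload: str, *attributes: str) -> SealedRecord:
    """'Encrypt' a record a single time, labeling it with its attributes."""
    return SealedRecord(frozenset(attributes), payload)


def open_record(key: PolicyKey, record: SealedRecord) -> Optional[str]:
    """Return the payload only if the key's policy accepts the record's tags."""
    return record.payload if key.policy(record.attributes) else None


def readable(key: PolicyKey, lake: List[SealedRecord]) -> List[str]:
    """Everything in the lake that this particular key can open."""
    return [r.payload for r in lake if key.policy(r.attributes)]


# Each record is sealed exactly once, no matter how many people need it.
lake = [
    seal("resting heart rate 64 bpm", "health", "heart-monitor"),
    seal("front door unlocked 07:42", "home", "door-sensor"),
    seal("brake pads at 30%", "car", "maintenance"),
]

# Different parties get different keys, each with its own policy.
cardiologist = PolicyKey("cardiologist", lambda a: {"health", "heart-monitor"} <= a)
mechanic = PolicyKey("mechanic", lambda a: "car" in a and "maintenance" in a)
daughter = PolicyKey("daughter", lambda a: "home" in a or "health" in a)

for key in (cardiologist, mechanic, daughter):
    print(key.holder, readable(key, lake))
```

In this model the lake holds three records, each sealed once, and a stolen mechanic’s key exposes only the brake-pad reading. In an actual ABE scheme that restriction would be enforced by the mathematics of the key itself rather than by an `if` statement a compromised server could skip.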

This approach aligns with NIST’s long-established standards for attribute-based access control (ABAC) in general. ABAC is a proven methodology of authorization based on evaluating attributes associated with the subject or object and correlating that information with policies, rules and relationships.

By applying those same principles to PKI keys, ABE “provides for a much more fine-grained definition of sensitive data,” Karasawa told me. “People can think about providing more flexible IoT services and other types of services.”

We are going to need a more granular approach to encrypting the information flowing into our data lakes. The cool new services derived from the vast amounts of data collected by next-gen IoT systems demand it – these services simply cannot be easy to corrupt.

Karasawa pointed out the example of facial-recognition systems gaining the capacity to accurately identify faces in video streams. “That data can include uncertain privacy, so those access rights need to be carefully authorized,” he observes.

ABE is on the cusp of wider productization. It holds the potential to help us make fuller use of all those zettabytes of data flowing into data lakes – in ways that respect personal privacy and keep operational data secure. I’ll keep watch, and keep reporting.


Pulitzer Prize-winning business journalist Byron V. Acohido is dedicated to fostering public awareness about how to make the Internet as private and secure as it ought to be.

(LW provides consulting services to the vendors we cover.)

