SHARED INTEL: Reviving ‘observability’ as a means to deeply monitor complex modern networks

By Byron V. Acohido

An array of promising security trends is in motion.

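New frameworks, like Secure Access Service Edge (SASE), cloud workload protection platforms (CWPP) and cloud security posture management (CSPM), seek to weave security more robustly into the highly dynamic, intensely complex architecture of modern business networks.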

Related: 5 Top SIEM myths

A slew of new application security technologies has also arrived, designed specifically to infuse security deeply into specific software components, both as new code is being developed and after it gets deployed and begins running in live use.

Now comes another security initiative worth noting. A broad push is underway to retool an old-school software monitoring technique, called observability, and bring it to bear on modern business networks. I had the chance to sit down with George Gerchow, chief security officer at Sumo Logic, to get into the weeds on this.

Based in Redwood City, Calif., Sumo Logic supplies advanced cloud monitoring services and is in the thick of this drive to adapt classic observability to the convoluted needs of company networks, today and going forward. For a drill down on this lively discussion, please give the accompanying podcast a listen. Here are the main takeaways:

Seeing inside systems

From the 1950s through the 1990s, software was developed sequentially using the waterfall method. One phase followed the next (conception, design, build, test, deploy, maintain), and it typically took many months to complete all phases. Security typically got bolted on, post-deployment.

Meanwhile, control system engineers needed a way to assess how well deployed software worked together in the field. So observability arose as a way to infer the condition of a system’s internal states by monitoring its external outputs. This was very doable during the decades when on-premises datacenters were the heart and soul of company networks.

Fast forward to 2021. The waterfall method has long since dried up. The iterative software development process, i.e., DevOps, has taken over. Forget phases. Today modular snippets of code, called microservices, get cobbled together in containers to rapidly push out minimally viable software – to learn where it works or fails.

Iteration and remediation occur on the fly, and agility is the watchword. Security, meanwhile, is still largely bolted on, and this has enticed threat actors to take full advantage. This is precisely why there’s a push to develop and implement new cloud-centric security tools and frameworks.

With all of this going on, and with complexity mushrooming, there remains a huge need for site reliability engineers (SREs) – the control engineers of today – to monitor system health on a day-in, day-out basis. This means observability, as a concept, is more vital than ever.

“Back in the day, people would look externally to try to figure out what was going on internally,” Gerchow told me. “And now you’ve got to somehow get at the root of code, within code, using tools to help define what you’re looking at . . . because there has to be a way to gain understanding about these new architectural systems.”

Single source of truth

Over the past 3 to 5 years, observability has been updated to meet this need. Instead of just monitoring event logs and analyzing traffic patterns at a surface level, observability tools today leverage machine learning and advanced data analytics to shed light on three data formats: metrics, traces and logs. Gerchow walked me through how each piece fits — and how all three work together.

•Metrics. Metrics are measurements taken over a span of time, such as how much memory or processing power a workload consumes. It is now possible to delineate the precise amount of computing power that should be used for each legitimate task. Thus, if too much or too little gets used, correlations can be drawn to possible reasons, and judgments can be made about what to do.

•Traces. A trace is a record of every event, across multiple systems, that stems from the same request flow. When a problem crops up, tracing can highlight which functions and which systems may have behaved abnormally; it can also help an SRE determine just how far the user was able to get.

•Logs. An event log is a snapshot of an event that happened on any given network system, time-stamped and written into a permanent file. Event logs have been around forever; they were the main component of classic observability tools; and they’ve also emerged as the mainstay of Security Information and Event Management (SIEM) systems.
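To make the first two formats concrete, here is a minimal Python sketch. It assumes hypothetical per-task memory baselines (the EXPECTED_MEMORY_MB table and the 25 percent tolerance are illustrative, not figures from Sumo Logic or any product) and a simplified span record carrying a step number and a success flag. Real tracing systems, such as those built on OpenTelemetry, record parent-child relationships between spans rather than a flat step counter.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MetricSample:
    task: str
    memory_mb: float  # memory observed for this task over the sampling window


@dataclass
class Span:
    trace_id: str  # shared by every hop in the same request flow
    service: str
    step: int      # simplification: position of this hop in the request flow
    ok: bool       # did this hop complete successfully?


# Hypothetical baselines: expected memory use (MB) for each legitimate task.
EXPECTED_MEMORY_MB = {"checkout": 256.0, "search": 128.0}


def check_metric(sample: MetricSample, tolerance: float = 0.25) -> str:
    """Compare observed usage against the task's baseline, +/- tolerance."""
    expected = EXPECTED_MEMORY_MB[sample.task]
    if sample.memory_mb > expected * (1 + tolerance):
        return "too_high"
    if sample.memory_mb < expected * (1 - tolerance):
        return "too_low"
    return "normal"


def furthest_step(spans: List[Span], trace_id: str) -> Tuple[int, Optional[str]]:
    """Return the last step a request completed and the first failing service, if any."""
    hops = sorted((s for s in spans if s.trace_id == trace_id), key=lambda s: s.step)
    last_ok = 0
    for hop in hops:
        if not hop.ok:
            return last_ok, hop.service
        last_ok = hop.step
    return last_ok, None
```

Given spans for a gateway hop, a cart hop and a failing payment hop, furthest_step reports that the user got through step 2 and stalled at the payment service, which is the "how far did the user get" question an SRE would ask.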

It’s the meshing of those three data formats that translates into effective observability. “Metrics, tracing and logs can and should be combined as a single source of truth,” Gerchow says. “This allows anyone in an organization to be able to leverage that information and be able to provide agility, reliability and then, down the line, cybersecurity resiliency as well.”
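One way to picture a “single source of truth” is to key all three data formats on a shared request identifier and merge them into one time-ordered view. The sketch below assumes, hypothetically, that every metric sample, span and log line has already been tagged with a trace_id; in practice metrics are usually aggregated and linked to traces more loosely, so treat this as an illustration of the idea rather than how any particular tool stores its data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    kind: str         # "metric", "trace" or "log"
    timestamp: float  # seconds since the epoch
    trace_id: str     # hypothetical: assumes every record carries a trace ID
    detail: str


def build_timelines(events: List[Event]) -> Dict[str, List[Event]]:
    """Merge metrics, spans and log lines that share a trace ID into one time-ordered list per request."""
    timelines: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        timelines[event.trace_id].append(event)
    for trace_events in timelines.values():
        trace_events.sort(key=lambda e: e.timestamp)
    return dict(timelines)
```

For a failed checkout, the merged timeline would place the gateway span, the memory spike and the “payment declined” log line side by side, in the order they happened, so anyone in the organization reads the same story.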

Adoption drivers

This next-gen observability, if you will, is only in the nascent stage of making a material impact across the business landscape. At this stage, it is clear that the primary driver for embracing observability in enterprise settings isn’t to improve security, Gerchow says; it’s to streamline cloud migration.

Early adopters are using observability for things like fine-tuning the amount of memory dedicated to a new software container deployed in support of a new mobile app, and then replicating that approach across other new apps, he told me. Clearly, the main reason companies are attracted to observability at this point is that it could help them dramatically reduce the cost of cloud-native deployments and ongoing maintenance.
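The memory fine-tuning Gerchow describes can be approximated with a simple right-sizing rule. This sketch assumes a hypothetical policy (take the 95th percentile of observed memory samples, add 20 percent headroom, round up to a 64 MB increment); those numbers are illustrative choices, not a recommendation from Sumo Logic.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of observed samples (pct between 0 and 100)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_memory_mb(samples: List[float], pct: float = 95.0,
                        headroom: float = 0.2, step_mb: int = 64) -> int:
    """Suggest a container memory limit: observed percentile plus headroom, rounded up to step_mb."""
    target = percentile(samples, pct) * (1 + headroom)
    return int(math.ceil(target / step_mb) * step_mb)
```

Applied to ten samples peaking at 200 MB, recommend_memory_mb suggests a 256 MB limit, which could then serve as a starting template for similar containers backing other new apps.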


“The Holy Grail is that, at some point, you really want the intelligence of these systems to take over and start doing self-healing,” Gerchow observes. “Instead of an end user or a dashboard notifying you that there’s an issue and then having to have a human go and fix something.”

Pursuing observability in this manner, to smooth the bumpy migration to cloud-native operations, is a good thing. Ultimately, this should naturally lead to improving security, as well.

Observes Gerchow: “The first thing that you lose when you move out to the cloud, if you’re not prepared, is visibility. ‘Oh, my gosh. What’s happening? How many workloads do we have out there, and how much access do people have to those workloads? How secure are they?’ All of a sudden there’s a new blast surface open to attack.

“Security teams are going to have to get out in front of this; they have to really understand what developers are doing and how to best work with them. And they have to start partnering up to bake security into their DNA.”

The wild card suggesting this will happen sooner, rather than later, is that most SREs are “natural security people,” Gerchow says. “SREs care deeply about security because for things to be reliable, they have to be secure.”

It’s encouraging that observability is gaining steam at the same time that robust security frameworks and advanced app security solutions are making a splash. Many marbles are rolling in a very positive direction. I’ll keep watch and keep reporting.


Pulitzer Prize-winning business journalist Byron V. Acohido is dedicated to fostering public awareness about how to make the Internet as private and secure as it ought to be.

(LW provides consulting services to the vendors we cover.)

