The State of Machine Learning in Cybersecurity

Author: TK Keanini
September 2019

No matter where you stand on Machine Learning (ML) and Artificial Intelligence (AI), there’s still plenty to talk about when it comes to how we as an industry are currently making use of them. With that in mind, I’d like to share some thoughts on ways we need to view machine learning and artificial intelligence as well as how we need to shift the conversation around them.

More Effective = Less Obvious

I’m still amazed by how machine learning has remained a hot topic. That’s not to say it doesn’t deserve to be an area of interest. Rather, I’m suggesting that what we should be talking about are the outcomes and capabilities it delivers. Some of you may remember when XML was a big deal and no one could stop talking about it. Fast forward to today and no one advertises that they use XML, since that would just be obvious and users care more about the functionality it enables. Machine learning will follow the same path. In time, it will become an essential aspect of the way we approach security and simply another background process. Once that happens, we can instead focus on talking about the analytical outcomes it enables.

An Ensemble Cast Featuring Machine Learning

Anyone who has built an effective security analytics pipeline knows that job one is to ensure that it is resilient to active evasion. Threat actors know as much as or more than you do about the detection methods within the environments they wish to penetrate and persist in. The job of security analytics is to find the most stealthy and evasive threat actor activity in the network, and to do this you cannot rely on a single technique. For that detection to happen, you need a diverse set of techniques that all complement one another. While a threat actor will be able to evade one or two of them simultaneously, they don’t stand a chance against hundreds of them! The true power of detection is in diversity!

To better illustrate this idea, let’s take a moment to consider a modern bank vault. Vaults employ a diverse set of detection techniques such as motion sensors, thermal sensors, and laser arrays, so that an intruder will trip an alarm on at least one physical dimension and the appropriate response will ensue. We do the same in the digital world, where machine learning helps us model the timing or volumetric aspects of behaviors that are statistically normal so that we can signal on outliers. This can be done all the way down at the protocol level, where models are deterministic, or all the way up at the level of application or user behavior, which can sometimes be less deterministic. If you are in the same camp as I am, we have had years to refine these analytical techniques and have published well over 50 papers on the topic in the past 12 years.
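To make "signal on outliers" a little more concrete, here is a minimal sketch of one volumetric check. It assumes a z-score test against a per-host baseline, which is only one of many possible techniques and not necessarily what any given product uses. The function name, the threshold of 3.0, and the traffic numbers are all hypothetical.

```python
from statistics import mean, stdev

def find_outliers(baseline, observed, threshold=3.0):
    """Return (index, value, z_score) for observed values far from the baseline.

    baseline: samples collected during a period assumed to be normal.
    threshold: how many standard deviations count as "far" (an assumption).
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is an outlier.
        return [(i, v, float("inf")) for i, v in enumerate(observed) if v != mu]
    outliers = []
    for i, value in enumerate(observed):
        z = (value - mu) / sigma
        if abs(z) > threshold:
            outliers.append((i, value, z))
    return outliers

# Hypothetical bytes-per-minute sent by one host during a normal week.
baseline = [1200, 1350, 1100, 1280, 1320, 1190, 1240, 1305]
# Hypothetical new observations; the third one is a large burst.
observed = [1250, 1310, 98000, 1180]

flagged = find_outliers(baseline, observed)
for index, value, z in flagged:
    print(f"sample {index}: {value} bytes/min is {z:.1f} standard deviations from normal")
```

On its own, a check like this is easy to evade by staying under the threshold, which is exactly why it would be one member of the ensemble rather than the whole cast.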

The Precision and Scale of ML

So, at this point, you might be asking yourself some important questions. Why can’t we just keep using lists of bad things and lists of good things? Why do we need machine learning in security analytics? What unique value does it bring us? The first thing I want to say here is that we should not be religious about machine learning or AI. To us, they are just another set of tools in the larger analytics pipeline. In fact, the most helpful analytics comes from using a bit of everything.

If you hand me a list and say, “If you ever see these patterns, let me know about it immediately!” I’m good with that. I can do that all day long and at very high speeds. But what if we are looking for something that cannot be known before the list is made? What if what we are looking for cannot be seen but only inferred? The shadows of the objects but never the objects themselves, if you will. What if we are not really sure what something is or the role it plays in the larger system (i.e., categorization and classification)? All these questions are where machine learning has contributed a great deal to security analytics. Let’s point to a few examples.

The Essence of Encrypted Traffic Analytics

Encryption has made what was observable in the network impossible to observe. You can argue with me on this, but mathematics is not on your side, so let’s just accept the fact that deep packet inspection is a thing of the past. We need a new strategy, and that strategy is the power of inference. At Cisco, like other companies developing TLS fingerprinting techniques, we leverage the fact that all encrypted sessions begin unencrypted and that the routers and switches can send us an “Observable Derivative.” This metadata coming from the network is a mathematical shadow of the payloads we cannot inspect directly because of encryption. Machine learning helps us train on these observable derivatives so that if their shape and size over time match those of some malicious behavior, we can bring this to your attention, all without having to deal with decryption.
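As a rough illustration of working from "shape and size over time" rather than payloads, the sketch below summarizes a flow using only packet sizes and inter-arrival gaps, then labels it by its closest known example. This is a plain 1-nearest-neighbor comparison chosen for brevity. It is an assumption for illustration, not a description of how Cisco's analytics work, and the feature choice, labels, and flow values are all hypothetical.

```python
import math

def flow_features(packets):
    """Summarize a flow given as (size_bytes, gap_ms) pairs.

    Only sizes and timing are used; payload bytes are never read.
    """
    sizes = [size for size, _ in packets]
    gaps = [gap for _, gap in packets]
    return (
        sum(sizes) / len(sizes),  # mean packet size
        max(sizes),               # largest packet
        sum(gaps) / len(gaps),    # mean inter-arrival gap
    )

def nearest_label(features, labeled_examples):
    """1-nearest-neighbor: return the label of the closest known example."""
    best_label, best_dist = None, math.inf
    for label, example in labeled_examples:
        dist = math.dist(features, example)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical labeled flows: bursty web browsing versus a slow, regular beacon.
labeled = [
    ("benign-web", flow_features([(517, 0), (1400, 20), (1400, 5), (900, 30)])),
    ("beacon-like", flow_features([(200, 0), (210, 60000), (205, 60000), (198, 60000)])),
]

# A new encrypted flow we cannot decrypt, described only by its metadata.
new_flow = [(202, 0), (207, 59000), (199, 61000), (204, 60000)]
verdict = nearest_label(flow_features(new_flow), labeled)
print(verdict)  # beacon-like
```

A real pipeline would need many more examples and features on comparable scales, but the principle is the same: the decision is made from the shadow, never from the object.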

Why is this Printer Browsing Netflix?

Sometimes we are lucky enough to know the identity and role of a user, application, or device as it interacts with systems across the network. The reality is that most days we are far from 100% on this, so machine learning can help us cluster network activity to make an assertion like, “based on the behavior and interactions of this thing, we can call it a printer!” When you are dealing with thousands upon thousands of computers interacting with one another across your digital business, even if you had a list at some point in time, it is likely not up to date. The value of this labeling is not just that you have objects with the most accurate labels, but that you can infer suspicious behavior based on a device's trusted role. For example, if a network device is labeled a printer, it is expected to act like a printer, so its future behavior can be predicted. If one day it starts to browse Netflix or checks out some code from a repository, it should be brought to your attention. With machine learning, you can infer from behavior what something is or, if you already know what something is, you can predict its “normal” behavior and flag any behavior that is “not normal.”
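The two halves of that idea, inferring a role from behavior and then flagging behavior that does not fit the role, can be sketched as follows. A simple set-overlap (Jaccard) comparison against hand-written profiles stands in for real clustering here, so treat it as a simplification. The role names, profile contents, and activity labels are hypothetical.

```python
# Hypothetical profiles: the kinds of activity each role is expected to show.
ROLE_PROFILES = {
    "printer": {"ipp", "snmp", "mdns", "dhcp"},
    "workstation": {"https", "dns", "dhcp", "smb", "ssh"},
}

def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def infer_role(observed):
    """Pick the role whose profile best overlaps the observed activity."""
    return max(ROLE_PROFILES, key=lambda role: jaccard(observed, ROLE_PROFILES[role]))

def unexpected_activity(role, observed):
    """Return observed activity that falls outside the role's profile."""
    return sorted(observed - ROLE_PROFILES[role])

# Step 1: label an unknown device from its historical behavior.
history = {"ipp", "snmp", "dhcp"}
role = infer_role(history)
print(role)  # printer

# Step 2: flag anything today that a device in that role should not be doing.
today = {"ipp", "dhcp", "https-streaming", "git"}
alerts = unexpected_activity(role, today)
print(alerts)  # ['git', 'https-streaming']
```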

Pattern Matching Versus Behavioral Analytics

Lists are great! Hand me a high-fidelity list and I will hand you back high-fidelity alerts generated from that list. Hand me a noisy or low-fidelity list and I will hand you back noise. Back in 1959, computer gaming and AI pioneer Arthur Samuel defined machine learning as a “field of study that gives computers the ability to learn without being explicitly programmed.” In security analytics, we can use it for just that: we can have analytical processes that implicitly program a list for you given the activity observed (the telemetry they are presented with). Machine learning helps us implicitly put together a list that could not have been known a priori. In security, we complement what we know with what we can infer through negation. A simple example would be “if these are my sanctioned DNS servers and activities, then what is this other thing here?!” Logically, instead of saying something is A (or a member of set A), we are saying not-A, but that is only practical if we have already closed off the world to {A, B}: not-A is B if the set is closed. If, however, we did not close off the world to a fixed set of members, not-A could be anything in the universe, which is not helpful.
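The DNS example reduces to a set difference once the world is closed. In the sketch below, the closed world is "every DNS destination actually seen in the telemetry", so not-A is a finite, reviewable set rather than anything in the universe. The addresses and the function name are hypothetical.

```python
# Set A: the DNS servers this (hypothetical) organization has sanctioned.
SANCTIONED_DNS = {"10.0.0.53", "10.0.1.53"}

def unsanctioned_resolvers(dns_flows):
    """Return DNS destinations seen in telemetry that are not in set A.

    dns_flows: iterable of (client_ip, dns_server_ip) pairs.
    The closed world is the set of destinations actually observed,
    so "not-A" is a finite set we can review.
    """
    observed = {server for _, server in dns_flows}
    return sorted(observed - SANCTIONED_DNS)

# Hypothetical DNS telemetry: two clients talk to servers outside set A.
flows = [
    ("10.0.5.12", "10.0.0.53"),
    ("10.0.5.40", "8.8.8.8"),
    ("10.0.5.12", "10.0.1.53"),
    ("10.0.5.77", "203.0.113.9"),
]

unknown = unsanctioned_resolvers(flows)
print(unknown)  # ['203.0.113.9', '8.8.8.8']
```

Nothing here says the two unknown resolvers are malicious. The negation only answers "what is this other thing here?" and hands that short list to a person or to another technique in the pipeline.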

Useful Info for Your Day-to-day Tasks

I had gone my entire career measuring humans as if they were machines, and not measuring humans as humans. We cannot forget that no matter how fancy we get with the data science, if a human in the end will need to act on this information, they ultimately need to understand it. I had gone my entire career thinking that the data science could explain the results and while this is academically accurate, it is not helpful to the person who needs to understand the analytical outcome. The sense-making of the data is square in the domain of human understanding and this is why the only question we want to ask is “Was this alert helpful? Yes or no?” At the end of the day, we want to make sure that the person behind the console understands why an alert was triggered and if that helped them.

Conclusion

We owe a big round of applause to artificial intelligence for birthing the child we know and love named machine learning and all that it has contributed to security analytics over the past year. We remain pragmatic in its application as we know that, just because it is the new kid on the block, we cannot turn our backs on simple or complex lists of rules, simple statistical analysis, and any other method that has got us to where we are today.

Lucky for us, machine learning has already shown signs of playing well with its peers as we continue to find ways to improve existing security processes through pairing them with ML. It can’t solve every single problem on its own, but when it works together with the people and processes that have come before it, we get that much closer to a more secure future. And if machine learning is the child of AI, who then are its brothers and sisters that we have yet to explore in Security Analytics?