How to Measure and Improve Decision Automation for Cybersecurity (Next Gen SOAR)
Author: Kumar Saurabh
September 2019
Facing an increasingly sophisticated barrage of threats, Security Operations Centers (SOCs) today are evaluating a variety of security tools, including security automation tools.
Security Orchestration, Automation and Response (SOAR) systems promise to automate data collection and threat remediation. They collect alerts and log data and, once an analyst has studied this data and decided upon a course of action, they perform automated steps, such as closing ports and deleting files, to contain the threat. Some security automation platforms go even further, promising to completely automate the analysis and decision-making.
If these platforms could automate not just the comparatively simple steps of data collection and task automation, but also the more advanced work of threat analysis and decision-making that constitute the most difficult part of a security analyst’s job, then security automation would bring an unprecedented level of efficiency and consistency to Security Operations. And analysts, who in nearly every SOC struggle to keep up with an unending torrent of alerts and user requests, would finally have more free time for proactive threat hunting and other critical – but often deferred – tasks.
Most SOC teams are hesitant, and often skeptical, about automating the more cognitive steps: steps that require domain knowledge, the tricks and techniques they have acquired over the years, and the expertise to know what the data means and how to turn that data into a decision.
How do we objectively determine whether an automation platform is capable of automating analysis and decision-making?
Addressing SOC Skeptics: Can Automated Decision Analysis Reach the Right Conclusions?
An intelligent security automation platform can automate the cognitive work of analysts for any of hundreds or thousands of threat types. If the platform can do this job well, there’s no reason for SOCs not to adopt this type of automation.
Hence, once we have automated a playbook, we want to measure the automation’s performance and ensure that it is working as accurately as a human analyst would.
Undoubtedly, there will be some differences between what an automated system will do and what an analyst would do. After all, sometimes two analysts in the same SOC will offer different conclusions and recommendations even when evaluating the same threat and working from the same playbook.
The goal isn’t (and, practically, it can’t be) to have automation match the conclusions of an analyst 100% of the time, since 1) even analysts differ in their conclusions, and 2) it’s unlikely that machines and people will achieve the same results 100% of the time.
What SOCs really want to know is that any gap in accuracy between a human analyst and an automated system is within an acceptable threshold. The security platform and the analyst can differ – just not too much. In the event that they do differ by too much, further analysis is warranted before jumping to conclusions about the analyst, the automation, or both.
It’s worth noting that even when automation cannot make a definitive conclusion itself about a threat, it can still automate a lot of an analyst’s work, substantially reducing a SOC’s overall workload.
With these preliminaries settled, let’s consider how a SOC could go about methodically comparing the results delivered by a security automation platform to the results delivered by a senior analyst whose work in the SOC is considered exemplary.
Methodology: Measuring Results in Phishing Triage
To compare the results of security automation and human analysis, it’s helpful to focus on a single area or domain of security analysis.
Phishing triage is a common and – unfortunately – necessary part of daily life in just about any SOC. The goal of phishing triage is to determine which suspicious emails flagged by users or security tools are phishing attempts and which are benign.
Security automation can evaluate suspicious emails flagged as possible phishing attempts. As it analyzes each email, it sorts it into one of three categories (sketched in code after the list below):
Malicious (phishing)
Benign (not phishing)
Needs Manual Review (the system cannot automatically decide whether the email is a genuine phishing attempt)
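As a purely illustrative way to picture these categories, here is a minimal sketch in Python; the class and value names are assumptions for this example, not the API of any particular SOAR product.

    from enum import Enum

    # Illustrative verdict categories for automated phishing triage.
    # These names are assumptions for this sketch, not a vendor API.
    class Verdict(Enum):
        MALICIOUS = "malicious"                      # phishing
        BENIGN = "benign"                            # not phishing
        NEEDS_MANUAL_REVIEW = "needs_manual_review"  # automation cannot decide; route to an analyst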
To establish a benchmark for this comparison, we will ask a human security analyst to triage a batch of emails (in this example, 30 emails) and sort them into two folders, Malicious and Benign. (We don’t need a third category for emails needing manual review, since the analyst is already manually reviewing every email submitted for analysis.)
Metrics for Evaluating Results
After the security automation platform and the human analyst have both examined and sorted the complete batch of suspicious emails, we can compare the results. To make this comparison, we’ll use these two metrics:
Accuracy
When the email was examined more closely or permitted to be delivered, did its classification turn out to be accurate?

Coverage
For what percentage of emails did the automation platform fully automate analysis and reach a decision (as opposed to marking the email “Needs Manual Review”)? Note that coverage refers to complete automation, obviating the need for an analyst to review the email at all. (A short sketch of how both metrics can be computed follows.)
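Here is a minimal computation sketch, assuming each triaged email is recorded with the automation’s verdict and a ground-truth label established by the analyst benchmark or by later investigation; the field names and verdict strings are assumptions for illustration only.

    # Minimal sketch of the two metrics. Assumes each result is a dict holding
    # the automation's verdict and a ground-truth label from the benchmark.
    # Field names and verdict strings are illustrative assumptions.

    def coverage(results):
        """Fraction of emails the platform decided on its own (no manual review)."""
        decided = [r for r in results if r["verdict"] != "needs_manual_review"]
        return len(decided) / len(results)

    def accuracy(results):
        """Fraction of automated decisions that matched the ground truth."""
        decided = [r for r in results if r["verdict"] != "needs_manual_review"]
        if not decided:
            return 0.0
        correct = sum(1 for r in decided if r["verdict"] == r["ground_truth"])
        return correct / len(decided)

    # Example: three emails; one deferred to manual review, one automated decision wrong.
    results = [
        {"verdict": "malicious", "ground_truth": "malicious"},
        {"verdict": "benign", "ground_truth": "malicious"},
        {"verdict": "needs_manual_review", "ground_truth": "benign"},
    ]
    print(coverage(results))  # ~0.67: two of three emails decided automatically
    print(accuracy(results))  # 0.5: one of two automated decisions correct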
Ideally, both these metrics should be high. Let’s examine why. Consider two possibilities:
High accuracy, low coverage
If an automation platform delivered high accuracy (for example, 95%) but could reach a determination in only a limited number of cases (for example, covering only 50% of the emails being considered), the platform would end up serving primarily as a labor-saving device. For example, a coverage rate of 50% could eliminate half the SOC’s analytical workload.

Low accuracy, high coverage
Conversely, if the platform delivered high coverage but low accuracy, it would not be useful to SOCs at all. Evaluating the majority of emails but miscategorizing many of them is hardly the result any SOC is seeking. (A simple numeric illustration of both scenarios follows.)
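To put the trade-off in numbers, here is a small, purely hypothetical calculation over a batch of 1,000 suspicious emails; the coverage and accuracy figures are assumptions chosen to mirror the two scenarios above, not measured results.

    # Illustrative only: how coverage and accuracy play out on a hypothetical
    # batch of 1,000 suspicious emails. All figures are assumptions, not data.
    def triage_impact(total_emails, coverage, accuracy):
        auto_decided = round(total_emails * coverage)        # decided without an analyst
        manual_review = total_emails - auto_decided          # still routed to analysts
        auto_errors = round(auto_decided * (1 - accuracy))   # automated decisions that are wrong
        return auto_decided, manual_review, auto_errors

    # High accuracy, low coverage: half the workload disappears, with few mistakes.
    print(triage_impact(1000, coverage=0.50, accuracy=0.95))  # (500, 500, 25)

    # Low accuracy, high coverage: almost everything is decided, much of it wrongly.
    print(triage_impact(1000, coverage=0.95, accuracy=0.60))  # (950, 50, 380)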
Of the two metrics, accuracy is ultimately more important, because if accuracy is high, the SOC can rely on automation to eliminate manual review of whatever percentage of emails is covered, and hope that this percentage will rise over time.
Ideally, both metrics should be high. Then SOCs would have a proven, accurate solution for evaluating the majority of suspicious emails.
Success Criteria
If accuracy is around 90% or better and coverage is 75% or better, most SOCs we talk to would consider the automation good enough. That’s because the analysts’ workload has been reduced by 75%. And with the workload most security analysts are carrying today, that’s a welcome relief.
What about accuracy rates? Results will obviously vary from playbook to playbook and from SOC to SOC, but generally we find the following (a back-of-the-envelope comparison follows the list):
Security analyst error rates range from 16% to 25%.
Automation platform error rates range from 1% to 5%.
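As a back-of-the-envelope illustration of what those error-rate ranges imply, here is a tiny calculation over a hypothetical batch of 1,000 triaged emails; the 20% and 3% figures are assumptions picked from within the ranges above, not measurements.

    # Hypothetical: misclassifications across 1,000 triaged emails, using one
    # value from each error-rate range quoted above (assumed, not measured).
    emails = 1000
    analyst_error_rate = 0.20      # within the 16%-25% range
    automation_error_rate = 0.03   # within the 1%-5% range

    print(round(emails * analyst_error_rate))     # 200 emails misclassified by analysts
    print(round(emails * automation_error_rate))  # 30 emails misclassified by automation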
Keep in mind that most first-generation SOAR systems have a coverage rate of 0%, because they never fully automate their analysis. Instead, they require an analyst to stop work and review whatever data the SOAR is delivering an alert about.
When we say that coverage is 75%, we mean that three times out of four, the security analysts in the SOC need to take no action whatsoever in order for the security alert or incident to be fully resolved. That rate applies regardless of whether the resolution turns out to be dismissing the alert as a false positive or diagnosing it correctly as a specific type of threat and taking action to remediate it.
What if for a particular playbook or situation, the automation platform’s accuracy or coverage is not good enough?
At LogicHub, when we encounter this situation, we spend an hour each week with a human analyst to understand why they would have made a different decision than our automation platform did. We then update the playbook within two business days, take another set of measurements over the following two days, and review the new metrics on the fifth day.