The Cyber Fed Model: Creating Communities of Trust – Lessons Learned
Author: Kathy Lee Simunich
September 2019
Around the beginning of this millennium, as computer intrusions were gaining attention, the idea arose of sharing cyberthreat information from machine to machine. At Argonne National Laboratory (Argonne), around 2004, a team of cybersecurity specialists and software engineers designed and implemented an automated cyberthreat-sharing system called the Cyber Fed Model (CFM).[1] A fundamental design goal was to build trust and flexible distribution into the system from the start, in order to create flexible, dynamic communities for sharing cyberthreat indicators (CTI) among the various national laboratories and other U.S. Department of Energy (DOE) sites and plants. Even today, it is quite difficult to convince organizations to share their indicators of compromise, so creating trust communities was the first challenge to tackle. The DOE community consists of the laboratories, sites, and plants across the DOE complex. DOE is also the Sector-Specific Agency for energy, so another trust community consists of energy-sector organizations, both public and private, as well as the DOE Power Marketing Administrations (PMAs).
CFM utilizes a set of web servers that physically separate the four trust communities of its clients: DOE, the energy sector, other U.S. government departments and agencies, and general (a catch-all for other private-industry and non-government entities). Within each domain, client sites are grouped into “federations,” and a site may belong to one or more federations. Federations can also be defined hierarchically; sub-federations can represent different groups within an organization. In addition to being structured around organizations, federations can be organized around shared interest areas (e.g., high-performance computers, oil and natural gas, advanced manufacturing). Non-browser-based client programs at the member sites automatically upload (publish) and download (subscribe to) data.
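The federation structure can be pictured as a simple hierarchy of named groups whose members are sites. The following is a minimal sketch of that idea only; the names (Federation, resolve_members) and the example sites are hypothetical and are not taken from the CFM codebase.

```python
from dataclasses import dataclass, field

@dataclass
class Federation:
    """A named sharing group: members are site IDs, children are sub-federations."""
    name: str
    sites: set[str] = field(default_factory=set)
    children: list["Federation"] = field(default_factory=list)

    def resolve_members(self) -> set[str]:
        """Return all sites in this federation, including its sub-federations."""
        members = set(self.sites)
        for child in self.children:
            members |= child.resolve_members()
        return members

# Hypothetical example: a DOE federation containing a sub-federation of national labs.
labs = Federation("doe-national-labs", sites={"ANL", "ORNL", "LLNL"})
doe = Federation("doe", sites={"HQ", "PMA-1"}, children=[labs])
assert "ANL" in doe.resolve_members()
```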
To address the issue of trust, CFM has a flexible and dynamic distribution system that gives a site control over its own information each time it uploads data. For example, a DOE laboratory can tag its data to be shared with all DOE sites in the DOE federation, with only the other national laboratories, or with a single specific site, and it may change these permissions for each individual upload. The CTI data is doubly encrypted in transit and remains encrypted at rest. Each site and each federation maintains a GPG (GNU Privacy Guard) key pair. The uploading site encrypts its data using the public keys of the federations and/or sites with which it wishes to share. The CFM domain server uses these designations to limit availability to the intended recipients (i.e., members of the specified federations and any individually specified sites), who download the data on their next poll of the server.
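The per-recipient encryption step can be illustrated with the standard gpg command line, invoked with one --recipient flag per target federation or site key. This is a minimal sketch, assuming the recipient public keys are already in the local keyring; the key IDs and file names are hypothetical and not CFM's actual tooling.

```python
import subprocess

def encrypt_for_recipients(infile: str, outfile: str, key_ids: list[str]) -> None:
    """Encrypt a CTI file so that only the listed federation/site keys can decrypt it."""
    cmd = ["gpg", "--batch", "--yes", "--encrypt", "--output", outfile]
    for key_id in key_ids:              # one --recipient per federation or site key
        cmd += ["--recipient", key_id]
    cmd.append(infile)
    subprocess.run(cmd, check=True)

# Hypothetical usage: share with a federation key and one specific site key.
encrypt_for_recipients("indicators.xml", "indicators.xml.gpg",
                       ["doe-federation@example.org", "site-anl@example.org"])
```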
Around 2013–2014, DOE (represented by Argonne) became a participant in the Enhance Shared Situational Awareness (ESSA) initiative, a multi-federal-agency consortium that includes the federal cybersecurity centers. One task of the ESSA community was to create the Access Control Specification (ACS), which defines how data may be accessed and further shared once CTI is shared outside an organization to a central repository. Full implementation of this specification will be needed before federal agencies can fully share their CTI. Another outcome of the ESSA effort was the agreement to use the emerging standard for formatting CTI, STIX™ (Structured Threat Information eXpression), which is now a standard of OASIS (the Organization for the Advancement of Structured Information Standards).
More recently, a new interagency and private-industry community has been taking shape under the U.S. Department of Homeland Security (DHS) through the Cybersecurity Information Sharing Act (CISA) of 2015. Even before CISA, Argonne, through CFM, shared CTI with other federal agencies and private industry. At the inception of DHS's Automated Indicator Sharing (AIS) system, CFM was one of the first participants to begin sharing CTI data.
Once an organization achieves the capability to share CTI, it faces the challenge of metrics: how much data is being shared, with whom, and what types and quantities of data are being reported, as well as distinguishing duplicate indicators from multiple sightings of the same indicator. CFM stores each upload in a database as an encrypted file and, if the data is shared with CFM itself, the individual indicators are extracted and stored as well. The file representation must be maintained so that distribution can be correctly directed; the individual indicators are needed to compile metrics on the types of data and how much of each type exists, and to support any analysis or searching required for reporting.
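One way to picture this dual storage (the encrypted file kept for distribution, plus extracted indicators kept for metrics) is as a pair of tables. The schema below is a hedged illustration only, not the actual CFM database layout; table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect("cfm_metrics.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS uploads (          -- encrypted file kept for distribution
    upload_id   INTEGER PRIMARY KEY,
    source_site TEXT NOT NULL,
    received_at TEXT NOT NULL,
    payload     BLOB NOT NULL                 -- still GPG-encrypted at rest
);
CREATE TABLE IF NOT EXISTS indicators (       -- extracted only when shared with CFM
    indicator_id INTEGER PRIMARY KEY,
    upload_id    INTEGER REFERENCES uploads(upload_id),
    ind_type     TEXT NOT NULL,               -- e.g., ip, domain, file hash
    value        TEXT NOT NULL
);
""")

# Metrics such as "how much of each type" then become simple aggregate queries.
for ind_type, count in conn.execute(
        "SELECT ind_type, COUNT(*) FROM indicators GROUP BY ind_type"):
    print(ind_type, count)
```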
Upon upload, a file must pass through a series of processes within the CFM server before final storage, and errors that result in quarantine or rejection may occur at various points. The first step is to verify that the upload comes from a known and trusted source. Next, the system verifies that the upload contents are in a valid file format. Then the indicators are checked against a whitelist and are individually removed if they are either whitelisted or in an invalid format. Last, in certain cases, the file and header metadata (e.g., who originated the data) go through a source obfuscation process so that recipients do not know where the data originated. This anonymization step was required before asset owners in the energy sector would agree to become CFM member sites, and it is also important when crossing certain relationship boundaries.
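The ingest sequence might be sketched as a chain of checks, each able to reject or strip content before storage. The sketch below is illustrative only: the trusted-source list, whitelist, and the use of an IP-format check as the "valid format" test are hypothetical stand-ins for CFM's real processing steps.

```python
import ipaddress

TRUSTED_SOURCES = {"site-anl", "site-ornl"}      # hypothetical trusted uploaders
WHITELIST = {"10.0.0.1"}                         # hypothetical benign indicators

def well_formed(value: str) -> bool:
    """Illustrative format check: accept only values that parse as IP addresses."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

def process_upload(source: str, indicators: list[str], anonymize: bool) -> dict:
    """Sketch of the steps described above: source check, format check,
    whitelist filtering, and optional source obfuscation."""
    if source not in TRUSTED_SOURCES:                          # 1. trusted source?
        return {"status": "rejected", "reason": "unknown source"}
    kept = [i for i in indicators
            if well_formed(i) and i not in WHITELIST]          # 2-3. format + whitelist
    metadata = {"origin": "REDACTED" if anonymize else source} # 4. obfuscation
    return {"status": "accepted", "indicators": kept, "metadata": metadata}

print(process_upload("site-anl", ["10.0.0.1", "203.0.113.7", "not-an-ip"], True))
```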
Another challenge for automated machine-to-machine sharing is knowing when something goes wrong, whether on the client side or the server side. CFM uses a centralized logging system to record the status of the running systems: when files are received, from whom, any metadata included in the headers from the uploading sites, and any warnings or errors in processing the files. Monitoring systems watch the health of the various CFM processes and email alerts to the CFM team under defined conditions; they also watch connections to external partners and send alerts on connection failures. Custom CFM monitoring processes continuously connect to each domain server and email an alert if any machine is down or excessive errors occur.
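A minimal version of such an external check might poll each domain server and email an alert when a check fails. This sketch assumes a local SMTP relay and uses hypothetical endpoint URLs and addresses; it is not the actual CFM monitoring code.

```python
import smtplib
import urllib.request
from email.message import EmailMessage

DOMAIN_SERVERS = ["https://cfm-doe.example.org/health",
                  "https://cfm-energy.example.org/health"]    # hypothetical endpoints

def alert(subject: str, body: str) -> None:
    """Email the CFM team via a local SMTP relay (assumed to exist)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "cfm-monitor@example.org"
    msg["To"] = "cfm-team@example.org"
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

for url in DOMAIN_SERVERS:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            if resp.status != 200:
                alert("CFM health check failed", f"{url} returned {resp.status}")
    except OSError as exc:                  # connection refused, timeout, DNS failure
        alert("CFM domain server unreachable", f"{url}: {exc}")
```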
Echo cancellation was also a difficult hurdle. If a participating site both uploads and downloads data, it generally does not want to receive back the same data it previously uploaded, which becomes especially challenging when the server obfuscates the source of the data. For instance, Argonne had to add a marker to the STIX documents it uploads to the AIS feeds so that the marker can be checked on the next download call and DOE data is not re-introduced into CFM. The CFM server implements a “what’s new” algorithm that maintains a “bookmark” for every client, recording which files that client has already downloaded on each call. By default, this also filters out the client’s own data so that the client does not have to.
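The bookkeeping can be pictured as a per-client bookmark into the stream of stored files, combined with a filter on the uploader's own data. This is a hedged sketch under those assumptions; the real CFM algorithm is server-side and more involved, and the names below are illustrative.

```python
# Files are stored in upload order; each client keeps a bookmark (position of the
# last file it has seen) and, by default, never receives files it uploaded itself.
FILES = [("ANL", "file-001"), ("ORNL", "file-002"), ("ANL", "file-003")]
bookmarks = {}   # client id -> index of the next file to consider

def whats_new(client: str, include_own: bool = False) -> list[str]:
    start = bookmarks.get(client, 0)
    new = [name for owner, name in FILES[start:]
           if include_own or owner != client]   # echo cancellation
    bookmarks[client] = len(FILES)              # advance the bookmark
    return new

print(whats_new("ANL"))   # -> ['file-002']  (its own uploads are filtered out)
print(whats_new("ANL"))   # -> []            (nothing new since the last poll)
```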
Interoperability became an issue once we started sharing outside the CFM platform. Argonne had to build a custom translation layer that could not only convert the XML-based format of CFM messages to the more widely adopted STIX format but also work with different authentication schemes. CFM utilizes Basic web authentication, custom validation processing, and GPG encryption, whereas DHS AIS uses PKI (Public Key Infrastructure) certificate-based authentication. Each external platform may use a different authentication scheme, and CFM needed to support each one in order to interact with the platform that uses it.
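The authentication differences can be handled by a thin adapter per platform. The sketch below uses the requests library to contrast Basic authentication for a CFM-style endpoint with client-certificate (PKI) authentication for an AIS-style endpoint; the URLs, credentials, and certificate paths are hypothetical, and the STIX translation step itself is not shown.

```python
import requests

def fetch_from_cfm(url: str, user: str, password: str) -> bytes:
    """CFM-style endpoint: HTTP Basic authentication (payload still GPG-encrypted)."""
    resp = requests.get(url, auth=(user, password), timeout=30)
    resp.raise_for_status()
    return resp.content

def fetch_from_ais(url: str, cert_path: str, key_path: str) -> bytes:
    """AIS-style endpoint: PKI client-certificate authentication."""
    resp = requests.get(url, cert=(cert_path, key_path), timeout=30)
    resp.raise_for_status()
    return resp.content

# Hypothetical usage; a translation step would then convert between the
# CFM XML payload and STIX before re-publishing.
# cfm_data = fetch_from_cfm("https://cfm.example.org/download", "site-anl", "secret")
# ais_data = fetch_from_ais("https://ais.example.gov/poll", "client.crt", "client.key")
```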
One lesson the CFM team learned was not to limit its focus to indicators alone. Although sharing indicators is the first step, most indicators need some context attached to them to be useful; users want to know the quality of the data and what their organization’s response to it should be. The STIX architecture was designed after CFM became operational, and one of its strengths is that it tries to capture much more of the context that can be associated with an indicator, such as courses of action taken; tactics, techniques, and procedures (TTPs); threat actors; and campaigns. All of this information would help another analyst defend their organization’s networks. Participants need to adhere closely to standard representations of these higher-level concepts so that the meaning does not get lost in interpretation. Version 2.x of the STIX architecture has gone a long way toward rectifying the overly flexible STIX 1.x specifications, in which there were multiple ways to encode the same CTI data. Version 2.x also captures relationships between observed indicators and their higher-level context elements. These relationships and the additional associated context will be critical to the evolving role of CTI sharing.
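As a small illustration of that added context, the fragment below sketches a STIX 2.1-style indicator tied to a campaign through an explicit relationship object, expressed here as Python dictionaries. The identifiers, timestamps, campaign name, and IP value are placeholders chosen for the example.

```python
import json

# An indicator for a suspicious IP, the campaign it was observed in, and the
# relationship object that links the two (STIX 2.1-style, values are placeholders).
indicator = {
    "type": "indicator", "spec_version": "2.1",
    "id": "indicator--00000000-0000-4000-8000-000000000001",
    "created": "2019-09-01T00:00:00Z", "modified": "2019-09-01T00:00:00Z",
    "pattern": "[ipv4-addr:value = '203.0.113.7']",
    "pattern_type": "stix", "valid_from": "2019-09-01T00:00:00Z",
}
campaign = {
    "type": "campaign", "spec_version": "2.1",
    "id": "campaign--00000000-0000-4000-8000-000000000002",
    "created": "2019-09-01T00:00:00Z", "modified": "2019-09-01T00:00:00Z",
    "name": "Example Campaign",
}
relationship = {
    "type": "relationship", "spec_version": "2.1",
    "id": "relationship--00000000-0000-4000-8000-000000000003",
    "created": "2019-09-01T00:00:00Z", "modified": "2019-09-01T00:00:00Z",
    "relationship_type": "indicates",
    "source_ref": indicator["id"], "target_ref": campaign["id"],
}
print(json.dumps({"type": "bundle",
                  "id": "bundle--00000000-0000-4000-8000-000000000004",
                  "objects": [indicator, campaign, relationship]}, indent=2))
```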
CTI sharing needs to evolve by incorporating added context and relational data, sharing the behaviors of adversaries, and aiding in the composition and sharing of “playbooks.” Adding playbooks to CTI sharing will allow organizations to capture and orchestrate potential responses that can be automated to defend organizations more quickly and thoroughly. In addition, implementing a distributed search capability would move the focus of cyberthreat information sharing from a “publish” model to a “research” model. A distributed search system could allow participants to search unpublished data from an organization (such as full packet capture, or even cyber-physical data) so that an analyst or team can build up the context or playbook and then share it with the community. Analysts would be able to discover behaviors that span multiple attacks and define proactive defenses.
Shifting to a behavior-based approach would focus on the bigger picture of the cyberthreat rather than isolated pieces of the puzzle. Instead of focusing on a single action, analysts can begin to view attacks as a sequence of actions, possibly using MITRE’s ATT&CK matrix. Playbooks that define workflows for response or remediation against the behaviors and sequences of TTPs associated with individual campaigns and adversaries can then be shared to protect the community at large.
Solving these challenges will require collaboration and a coordinated effort. However, with a more comprehensive understanding of the threat, and with automation and orchestration capabilities available, analysts will be able to disrupt adversaries in ways that cost them significantly more time and effort to work around than today’s typical response of blocking a single indicator. All of this will contribute to the long-term improvement of cyber-situational awareness.
[1] The system was known as the “Federated Model for Cyber Security” until 2010 and was the recipient of the 2009 DOE Innovation award.