Understanding How to Conduct a Risk and Resilience Assessment (RRA)
by Kevin Owens
This article covers ANSI/AWWA J100-10 “Standard for Risk and Resilience Management of Water and Wastewater Systems.” This standard can help facilitate compliance with America’s Water Infrastructure Act of 2018 (AWIA).
Overview of Risk
Security is all about minimizing risk. Risk is expressed as a calculation:
R = T x V x C
R = Risk ($/year)
T = Threat Likelihood (per year)
V = Vulnerability
C = Consequence ($)
The first part of the equation, Threat times Vulnerability, can be looked at as a probability: what is the likelihood of a vulnerability being affected by a specific threat? This threat could be a natural disaster (earthquake, hurricane, tornado, etc.) or a malicious actor launching a cyber or physical attack against your assets.
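As a minimal sketch of the risk equation, the Python function below multiplies the three terms. The function name and the example figures (a threat expected once every 10 years, a 30% chance that it succeeds, and a $2,000,000 consequence) are illustrative assumptions, not values from J100.

```python
def annual_risk(threat_likelihood: float, vulnerability: float, consequence: float) -> float:
    """Return risk in $/year as R = T x V x C.

    threat_likelihood: expected events per year (T)
    vulnerability: probability the event succeeds, given that it occurs (V), 0 to 1
    consequence: loss in dollars if the event succeeds (C)
    """
    if not 0.0 <= vulnerability <= 1.0:
        raise ValueError("vulnerability must be between 0 and 1")
    if threat_likelihood < 0 or consequence < 0:
        raise ValueError("threat_likelihood and consequence must be non-negative")
    return threat_likelihood * vulnerability * consequence


# Hypothetical scenario: T = 0.1 per year, V = 0.3, C = $2,000,000
risk = annual_risk(0.1, 0.3, 2_000_000)
print(f"Risk: ${risk:,.0f} per year")
```

With these assumed inputs the risk works out to about $60,000 per year, which can be compared directly against the annual cost of a countermeasure.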
The probability of natural disaster events can be drawn from historical databases, but the likelihood of an attack by a threat actor is difficult to measure because it depends on several factors, including:
- Skill of the malicious actor
- Motive of the malicious actor
- Opportunity – whether the malicious actor possesses the required knowledge and access
- Resources – financial resources and number of malicious actors
How is a vulnerability discovered by a malicious actor? This is done via reconnaissance, scanning, and information disclosure. The likelihood of this vulnerability being discovered and exploited can be based on:
- Ease of discovery – Is there a service or application that indicates the version number alerting a malicious actor to an unpatched asset?
- Ease of exploitation – Does a tool currently exist that is easy to use, or does it require several difficult steps to achieve?
- Awareness – Has the vulnerability been disclosed to the public?
- Detection – Can the steps to exploit the vulnerability be easily detected? Can the organization take countermeasures to block this exploit?
When examining vulnerabilities, one needs to look beyond system and device vulnerabilities and also consider the human factor. If the system is secure with no vulnerabilities, but a user can open an email attachment without restriction, this should also be treated as a vulnerability.
The last part of the equation describes the consequences, or impact, of an event that has occurred. This is defined by two main factors:
- Technical impact – Described by confidentiality, integrity, and availability of the data or system
- Business impact – Described by business impact analysis, which accounts for financial damage, noncompliance as a result of a breach, and legal or privacy implications
The combination of the likelihood of an event and the impact describes the severity of that risk.
One could limit the consequences and severity of a malicious act by imposing security policies, processes, and procedures. This will not prevent a breach, but it can greatly reduce the impact of any intrusion.
Table 1. A simple example of a Risk Matrix.
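A risk matrix can also be expressed as a small lookup function. The sketch below assumes a three-level scale (Low, Medium, High) for both likelihood and impact, and an additive scoring rule to combine them. Both are illustrative choices made for this example, not values taken from J100.

```python
LEVELS = ("Low", "Medium", "High")


def severity(likelihood: str, impact: str) -> str:
    """Combine a likelihood rating and an impact rating into a severity rating."""
    # Each rating maps to 0, 1, or 2; the combined score ranges from 0 to 4.
    score = LEVELS.index(likelihood) + LEVELS.index(impact)
    if score <= 1:
        return "Low"
    if score == 2:
        return "Medium"
    return "High"


# Print the matrix with likelihood on the rows and impact on the columns.
print(f"{'':<8}" + "".join(f"{impact:<8}" for impact in LEVELS))
for likelihood in reversed(LEVELS):
    row = [severity(likelihood, impact) for impact in LEVELS]
    print(f"{likelihood:<8}" + "".join(f"{cell:<8}" for cell in row))
```

An organization would normally tune the scale and the thresholds to its own risk tolerance.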
What needs to be assessed for an AWIA RRA?
The act requires assessing the following things at a minimum:
- Risks to the system from malevolent acts and natural hazards
- Resilience of components: pipes and constructed conveyances, physical barriers, source water, water collection and intake, pretreatment, treatment, storage and distribution facilities, electronic, computer, or other automated systems
- Monitoring practices of the system (water quality, security surveillance systems, access control systems, cyber security systems, energy management systems, or others)
- Financial infrastructure (accounting and financial business systems that may be vulnerable to cyber attacks)
- Use, storage, or handling of various chemicals by the system
- Operation and maintenance of the system
- Evaluation of capital and operational needs for risk/resilience management
The last bullet focuses on what management needs in order to assess capital and operational needs. Note that there is an increased emphasis on cybersecurity threats to process controls and business enterprise systems, e.g. “financial infrastructure.”
The J100 Methodology
The Seven-Step J100 Methodology is as follows:
The J100 Standard is intended to enable water utilities to make sound decisions when allocating scarce resources toward reducing risk and improving resilience. J100 is not the most comprehensive and detailed risk assessment method, but it is practical and efficient to apply.
It provides both consistency and comparability because of:
- Common terminology
- Common metrics
- Common processes
- Common scenarios
- Consistent results
J100 provides methodology and resource materials that can be used for addressing these requirements. This standard represents the consensus of the water sector and it is a voluntary standard that provides minimum requirements. Note that it does not supersede laws, regulations, or codes but the proper application of the standards is a basis for demonstrating due diligence.
Step 1: Asset Characterization
During this step, one determines which assets, if compromised, could result in prolonged or widespread service interruption or degradation, injuries, fatalities, and/or detrimental impact. An asset is critical if the utility’s mission is significantly degraded when it is lost or unavailable.
- Pretreatment process (zebra mussels, disinfection, corrosion control)
- Storage and distribution facilities (reservoirs, elevated tanks, distribution network)
- Process control systems
- Select enterprise systems (Financial infrastructure – billing, procurement)
There are six sub-steps to asset characterization:
- Determine which assets are involved in mission or critical functions
- Create a list of potentially critical assets
- Identify the internal/external supporting infrastructure
- Examine protective countermeasures and mitigation strategies
- Estimate worst-case scenarios for each of these assets
- Using those estimates, determine the most critical assets
Reduction Note: During this step, some assets could be removed from this list if they are:
- Non-Critical & Low Consequence Facility
- Low-Consequence Assets in Critical Facilities
- Low-Consequence or Low Threat Likelihood Critical Assets in Critical Facilities
If you identify any of those items in this step or in later steps, stop, defer, or eliminate them from your list.
Step 2: Threat Characterization
Consider applicable malevolent threats, natural hazards, and dependency/proximity hazards:
- Natural hazards (hurricanes, tornados, earthquakes, floods, wildfires, ice storms)
- Dependency and proximity hazards (utilities, key suppliers, key employees)
- Newer threats added (hydrologic change, Derecho, workplace violence, ransomware, supply chain dependencies)
These threat-asset pairs need to be assessed and ranked according to magnitude. These threat-asset pairs are what are analyzed through the rest of the process.
Reduction Note: During this step, some threats could be removed from this list if they belong to Low-Consequence or Low Threat Likelihood Threat-Asset Pairs.
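The pairing and screening described in this step can be sketched as follows. The asset and threat names, the 1 to 5 magnitude scale, and the screening threshold are all hypothetical values chosen for illustration. In practice the team assigns them.

```python
from itertools import product

assets = ["Treatment plant", "Elevated tank", "Process control system"]
threats = ["Flood", "Ransomware", "Loss of power supplier"]

# Every threat/asset combination is a candidate threat-asset pair.
pairs = list(product(threats, assets))

# Team-assigned magnitude on an assumed scale of 1 (negligible) to 5 (severe).
# Pairs default to 3 until the team rates them.
magnitude = {pair: 3 for pair in pairs}
magnitude[("Ransomware", "Process control system")] = 5
magnitude[("Flood", "Treatment plant")] = 4
magnitude[("Ransomware", "Elevated tank")] = 1
magnitude[("Flood", "Elevated tank")] = 1

# Screen out low-magnitude pairs, then rank the rest from highest to lowest.
SCREENING_THRESHOLD = 2  # assumed cut-off
retained = sorted(
    (pair for pair in pairs if magnitude[pair] >= SCREENING_THRESHOLD),
    key=lambda pair: magnitude[pair],
    reverse=True,
)

for threat, asset in retained:
    print(f"{magnitude[(threat, asset)]}  {threat} -> {asset}")
```

Only the retained pairs are carried forward into the consequence, vulnerability, and threat analyses.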
Step 3: Consequence Analysis
Identify the worst reasonable consequences that can be caused by specific threats on the assets, including serious injuries, fatalities, financial loss to the utility, and the economic impact on the regional community.
- Apply reasonable worst-case scenarios
- Estimate the consequences of those scenarios
- Evaluate additional consequences, if needed
Record these results and use the J100 Appendix B and the EPA “Baseline Information on Malevolent Acts for Community Water Systems.”
Note: The value of a statistical life is $7.4M (in 2006 dollars), and the value of a statistical serious injury is 35% of that.
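As a worked example of monetizing consequences, the sketch below applies the statistical life and serious injury values from the note above. Adding life-safety, utility, and regional losses into a single dollar figure is a simplifying assumption for illustration, and the scenario numbers are hypothetical.

```python
VALUE_OF_STATISTICAL_LIFE = 7_400_000  # 2006 dollars, from the note above
SERIOUS_INJURY_FRACTION = 0.35         # a serious injury is valued at 35% of a life


def monetized_consequence(
    fatalities: int,
    serious_injuries: int,
    utility_loss: float,
    regional_loss: float,
) -> float:
    """Return the total consequence of a scenario in dollars.

    Assumption: all four consequence categories are summed into one figure.
    """
    life_safety = VALUE_OF_STATISTICAL_LIFE * (
        fatalities + SERIOUS_INJURY_FRACTION * serious_injuries
    )
    return life_safety + utility_loss + regional_loss


# Hypothetical worst reasonable case: 1 fatality, 4 serious injuries,
# $500,000 loss to the utility, $2,000,000 regional economic impact.
total = monetized_consequence(1, 4, 500_000, 2_000_000)
print(f"Consequence: ${total:,.0f}")
```

With these inputs, each serious injury is valued at $2.59M and the total comes to about $20.26M.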
Step 4: Vulnerability Analysis
Next, one needs to analyze the ability of each critical asset and its protective systems to withstand each identified threat. Things to examine are weaknesses in facilities, policies/procedures, and personnel behavior.
There are several security protective measures/principles, which I will discuss further from the ANSI/AWWA G430 Standard:
- Deny adversaries access to the information and other resources they require to conduct attack planning
- Dissuade adversaries from conducting an attack through emphasis of the likelihood of failure and capture
- Project a sufficiently hostile view of the environment to an adversary to make an attack difficult or too unachievable to progress
- Amplify the effectiveness of security measures and messaging
- Messaging on the corporate website about the effectiveness of security measures (including the monitoring of visitors/cookies to enhance the user experience)
- Limiting the information available about the asset forcing a physical reconnaissance visit to the asset (increasing the likelihood of detection)
- Ensuring that the approaches to and areas around the asset are clear, easily monitored and that there is an appropriate challenge by the security officers or staff to unknown individuals (“Can I help you”)
- Messaging for the entire attacker journey, from the website through to the physical approaches to the site, that provide reassuring messages about the security measures in place
- Identify threat or attack behaviors at every stage of an attack – planning, reconnaissance, and deployment
- Initiate an appropriate response to a threat or attack as early in the attack timeline as possible
- Monitor for the loss of information or assets which have been moved off site
- Detecting hostile reconnaissance through the monitoring and detection of suspicious activities on the corporate website and visits to the asset
- Implement a CCTV monitoring system covering beyond the site perimeter to identify an attack team approaching
- Use an information/asset logging system to identify patterns of information/assets not being returned or accounted for
- Maximizing the time between the detection of an attack (at any of the stages in the attack timeline) and an attack reaching an asset’s perimeter
- Limit availability/access to information in order to prevent an adversary developing an optimized attack plan – thereby increasing the attack timeline and further increasing the chances of detection
- Monitor the area beyond the perimeter enabling early detection and maximizing delay time for an adversary to transition the ground
- Ensure an adversary requires multiple or extended visits to a site to gather information for an attack plan – increasing the risk of detection and extending the attack planning timeline
- Maximize stand-off to any form of attack
- Minimize single points of failure beyond your perimeter
- Understand the potential effects of an attack on the surrounding environment and its impact on your site
- Use of vehicle security barriers to enforce an appropriate stand-off distance
- Use of resilient power supply, preventing single point of failure
- Locating key servers at the core of the building
- Use of local business/security forums to discuss impacts of attacks on neighbors and potential mitigations that could be used
- Determine what external response is required to the range of threats your site faces and ensure measures are in place to initiate the response
- Where appropriate, exercise your plans with external response forces, including communicating with neighbors
- Establish an out-of-hours system to deliver a nominated keyholder to the site within an appropriate time frame
- Law enforcement response
There are several different methods to estimate vulnerability which include, but are not limited to:
- Event trees
- Path analysis
- Vulnerability logic diagrams
- Professional judgement/experience
- Hybrid methods
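As one illustration of the event tree method, the sketch below treats an attack as a sequence of protective layers, each of which either stops the adversary or fails. Assuming the layers are independent (a simplification), the vulnerability is the probability that every layer fails. The layer names and probabilities are hypothetical.

```python
from math import prod


def vulnerability_from_event_tree(stop_probabilities: dict[str, float]) -> float:
    """Return the probability that an attack defeats every protective layer.

    stop_probabilities maps each layer name to the probability that the layer
    stops the attack. Layers are assumed to act independently.
    """
    for layer, p in stop_probabilities.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {layer!r} must be between 0 and 1")
    # The attack succeeds only along the branch where every layer fails.
    return prod(1.0 - p for p in stop_probabilities.values())


# Hypothetical protective layers for a pump station
layers = {
    "Perimeter fence": 0.5,
    "Intrusion detection and response": 0.6,
    "Hardened door": 0.3,
}
v = vulnerability_from_event_tree(layers)
print(f"Vulnerability: {v:.2f}")
```

Here the attack succeeds with probability 0.5 x 0.4 x 0.7 = 0.14, and that value would be used as V for this threat-asset pair.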
Step 5: Threat Analysis
During this step, one estimates the likelihood of malevolent events and dependent hazards as well as estimating the probability of natural hazards. There are three approaches to estimate threat likelihood:
- Proxy Measure: Examining the attractiveness of the target (big city, government customers, etc.)
- Best Estimate: This is based on informed experience, based on federal, state, local, professional, or other guidance
- Conditional Assignment: Probability analysis of 0 to 1.0
Reference the EPA “Baseline Information on Malevolent Acts for Community Water Systems”, July 2019.
Step 6: Risk and Resilience Analysis
During this step, one estimates the owner's risk and resilience relative to each threat-asset pair, going back to the risk equation above (R = T x V x C).
One then needs to examine their current level of resilience, which can include factors such as connectivity, interdependencies, preparedness, continuity of operations, and recovery plans.
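The risk part of this step can be sketched by applying the risk equation to each threat-asset pair and ranking the results. The class name and every input value below are hypothetical, and the sketch covers risk only, not the resilience factors.

```python
from dataclasses import dataclass


@dataclass
class ThreatAssetPair:
    threat: str
    asset: str
    threat_likelihood: float  # T, events per year
    vulnerability: float      # V, probability between 0 and 1
    consequence: float        # C, dollars

    @property
    def risk(self) -> float:
        """Risk in $/year, R = T x V x C."""
        return self.threat_likelihood * self.vulnerability * self.consequence


# Hypothetical pairs carried forward from the earlier steps
pairs = [
    ThreatAssetPair("Flood", "Treatment plant", 0.02, 0.8, 5_000_000),
    ThreatAssetPair("Ransomware", "Process control system", 0.2, 0.3, 1_500_000),
    ThreatAssetPair("Tornado", "Elevated tank", 0.01, 0.5, 800_000),
]

# Rank the pairs from highest to lowest risk and total the baseline risk.
ranked = sorted(pairs, key=lambda p: p.risk, reverse=True)
baseline_risk = sum(p.risk for p in pairs)

for p in ranked:
    print(f"{p.threat} -> {p.asset}: ${p.risk:,.0f} per year")
print(f"Total baseline risk: ${baseline_risk:,.0f} per year")
```

In this example the ransomware pair ranks first even though the flood has the larger consequence, because its threat likelihood is ten times higher.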
Step 7: Risk and Resilience Management
Determine whether actions are needed to enhance all-hazards security and/or resilience. If so, decide on and implement actions to achieve an acceptable level of risk at an acceptable cost. This includes net benefit/cost ratio calculations. Steps for this include:
- From the Step 6 threat-asset pairs, determine the risk and resilience levels that are acceptable
- List the countermeasures and mitigation strategy options for those threat-asset pairs that are not acceptable
- Estimate costs for each of those options
- Assess those options
- Identify if any of those options will mitigate more than one threat-asset pair
- Calculate the net benefits and benefits-cost ratio
- Review those results
- Continuously monitor and evaluate the selected options that were installed to ensure that they meet your needs
- Conduct periodic risk analyses to stay current
Total Gross Benefit = Baseline Risk – Mitigation Risk
Net Benefit = Total Gross Benefit – Mitigation Cost
Net Benefit / Cost Ratio = Net Benefit / Mitigation Cost
Give priority to mitigation measures that have the highest net benefit (lives saved and injuries avoided). Use the benefit/cost ratio as a “tie breaker” among options with similar net benefit levels. A benefit/cost ratio greater than 1.0 means there is a return on investment (ROI) in the mitigation strategy.
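The three formulas above can be sketched in Python as follows. The option names and dollar figures are hypothetical, and the sketch assumes that risk and mitigation cost are expressed over the same period (for example, annualized).

```python
def evaluate_option(baseline_risk: float, mitigation_risk: float, mitigation_cost: float) -> dict:
    """Apply the gross benefit, net benefit, and net benefit/cost ratio formulas."""
    if mitigation_cost <= 0:
        raise ValueError("mitigation_cost must be positive")
    gross_benefit = baseline_risk - mitigation_risk
    net_benefit = gross_benefit - mitigation_cost
    return {
        "gross_benefit": gross_benefit,
        "net_benefit": net_benefit,
        "net_benefit_cost_ratio": net_benefit / mitigation_cost,
    }


BASELINE_RISK = 90_000  # hypothetical risk for one threat-asset pair, $/year

options = {
    "Network segmentation": evaluate_option(BASELINE_RISK, 30_000, 20_000),
    "Full control system replacement": evaluate_option(BASELINE_RISK, 10_000, 50_000),
}

# Rank by net benefit first; the ratio breaks ties between equal net benefits.
ranked = sorted(
    options.items(),
    key=lambda item: (item[1]["net_benefit"], item[1]["net_benefit_cost_ratio"]),
    reverse=True,
)

for name, result in ranked:
    print(
        f"{name}: net benefit ${result['net_benefit']:,.0f}, "
        f"ratio {result['net_benefit_cost_ratio']:.1f}"
    )
```

In this example the replacement option removes more risk, but segmentation ranks first because its net benefit ($40,000 versus $30,000) is higher once cost is subtracted.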
Reduction Note: During this step, some assets could be removed from this list if they are:
- Threat Asset Pairs w/ Acceptable Risk/Resilience
- Low Net Benefit Options
Core Team for RRA
Your core team to perform your RRA could include:
- Team Leader
- Risk Analyst
- Treatment/Distribution Operations and Maintenance
- Information Technology
Additional Stakeholders could be:
- Customer Service
- Laboratory / Compliance
- First Responders
During your engagement with stakeholders, seek to achieve consensus on:
- Critical Assets
- Relevant threats and hazards
- Risk/resilience management recommendations
And create a balance between:
- Mitigation resources that are needed
- Risk acceptance/transfer
- Ownership of results
Results of Performing the J100 Process
J100 aids in defining the capital and operational needs (an optional phase of the AWIA requirements). Benefits include:
- Risk and Resilience Management calculations yield life cycle cost estimates for individual mitigation measures and mitigation portfolios
- Grouping of mitigation measures to create portfolios helps to form projects
- J100 results help integrate security and preparedness into CIP planning
- Prioritizing projects, for phased implementation, helps utilities understand budget needs in out years
Lastly, your RRA will aid in your development of your Emergency Response Plan (ERP):
- RRA can help define and prioritize risks
- It can help characterize dependencies
- It can help quantify operational impacts
- It can help identify cost-effective mitigation strategies
About the Author
Kevin Owens is an experienced consultant with more than 20 years of cybersecurity knowledge, uniquely qualified to examine a network from the adversary's point-of-view and then increase the security posture/defense/detection. Subject matter expert (SME) in Industrial Control Systems (ICS)/SCADA, Security Assessments, Cybersecurity, Cyber Defense, and Teaching.
This article was originally published at: https://www.linkedin.com/pulse/understanding-how-conduct-risk-resilience-assessments-kevin-owens/