Texas power outages demonstrate grid cyber vulnerability
Recent Texas power outages and the loss of both electricity and water across Texas demonstrate how vulnerable ERCOT and Texas are not only to natural disasters such as snowstorms and hurricanes but also to manmade and malicious activities. More than that, they also demonstrate the vulnerability of the entire U.S. energy grid. The good news is that many of the root causes of the Texas debacle can be addressed with improved policies, better risk planning, updated energy demand models, and modified FERC/NERC cybersecurity requirements for securing and monitoring control systems. To be clear, most of the issues are not technical issues but leadership and regulatory issues that can be resolved.
As experienced by Texas, the electric grid and many of our critical infrastructures are particularly vulnerable to disruption from natural disasters (e.g., major earthquakes, floods, blizzards, and ice storms). Closer analysis shows that the same disruptions and cascading effects created by natural disasters can also be triggered by adversaries exploiting control systems (e.g., SCADA systems, plant distributed control systems, controllers, relays, process instrumentation, etc.). Cyber vulnerabilities are often more exposed during natural disasters, when the focus is elsewhere and many security procedures and processes are suspended in order to restore operations and connectivity expeditiously.
There are other considerations that affect the organizations that operate the grid and other critical infrastructures such as insurance and impacts on credit ratings. Because of the increased cyber risk that can lead to catastrophes such as what occurred in Texas, insurance and credit ratings can be significantly impacted.
A Closer Look at Cyber Exploit and Disruption Threats
A cyber event directed at the grid places vulnerable grid and other control systems at risk. However, in the case of ERCOT and the Southwest Power Pool (SPP), the severe weather was the primary factor creating the potential for grid cybersecurity events. One driving factor behind this vulnerability is the dependency on remote access to the digital networks and control system devices used to monitor and operate control systems and associated equipment in real time. Following natural disasters, remote access is also used to help bring critical facilities back on-line.
Hurricanes Katrina and Harvey are earlier examples where cybersecurity considerations were intentionally “bypassed” to expeditiously bring facilities back on-line. As a result, I was part of an Idaho National Laboratory (INL) team that provided guidance to those facilities.
From a cyber security perspective what has changed over the years is the cyber capability of nation-state actors such as China and Russia to not only monitor but also affect the magnitude and recovery of events such as what happened in Texas. Think of what additional impacts could have occurred if there were hardware backdoors in Chinese-made transformers that were manipulated or if the SolarWinds cyber compromise were used to manipulate the Operational Technology (OT) networks and building control systems in power grid and natural gas control centers and plant control rooms.
This is even more relevant given that China’s foreign ministry said February 19th that “seeing the plight of Americans suffering in a severe winter storm that hit the state of Texas this week reinforced a belief among Chinese citizens that their country is ‘on the right path’”. What has also been evident is that industry exercises such as GridEx seem to miss real incidents such as what happened in Texas. Missing these real events raises questions such as whether established operating standards like the NERC Reliability Standards and the ERCOT and SPP operating practices are being adequately considered. In my view, not enough questions are being asked about the cyber issues. The easiest way to spot a vulnerability is to ask appropriate questions; if they can’t be answered satisfactorily, the underlying issues should be investigated.
The risk management decision to not winterize power plants, pipelines, and other critical grid infrastructure has been identified as one of the causes of the power outages. A hard-freeze and heavy snows in the Southwest are low probability, but as seen in Texas and elsewhere, can cause very high consequence events. Burying pipes underground or providing adequate thermal insulation is not a cyber problem. However, providing remotely controlled heaters or electric heat tracing for critical equipment including instrumentation creates a cyber vulnerability that can be exploited.
The Polar Vortex that affected Texas the week of February 15, 2021 required power, water, and energy facilities to get back on-line as soon as possible, similar to what occurred after Hurricanes Katrina and Harvey. The Texas grid experienced significant frequency drops that caused system-wide impacts. Those impacts could potentially be unintentionally or maliciously exploited to cause long-term damage to grid and other critical infrastructure equipment. Because of the lack of appropriate control system cyber forensics and training, bad actors could exacerbate the problem without being detected. A real concern is that there aren’t enough critical spares and crews to get the grid back on-line, and that doesn’t account for potentially being held hostage by foreign manufacturers.
A Demonstration that Remains Relevant
The Aurora vulnerability is the remote opening of breakers and reclosing them out-of-phase with the grid. With respect to the Texas outage, at 1:55 a.m. Central Time on February 15, 2021, the frequency across the Texas grid dropped from 60 Hz to 59.308 Hz. Maintaining the frequency within a range around 60 Hz is imperative to keep power generation plants online and supplying electricity. That roughly 0.7 Hz drop was enough to automatically shut down power generators as being out of sync with the grid. Before the frequency drop, at 1:50 a.m., the statewide electricity load was 62,439 megawatts (MW). When the frequency drop occurred, the grid automatically shed about 9,500 MW of load, falling to 52,950 MW. When power plants instantaneously shut down, it creates a shock to the grid that can impact large electric equipment, and the ERCOT operator had to continue shedding load. By 1 p.m. on Monday, the grid was down to 46,281 MW, about 16,000 MW less than its 1:50 a.m. high.
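As a quick check on the figures quoted above, the differences can be worked out directly. This is only arithmetic on the numbers in the text; the variable names are illustrative and nothing here comes from ERCOT data beyond those quoted values.

```python
# Figures quoted in the text for the Texas grid on February 15, 2021.
NOMINAL_HZ = 60.0
observed_hz = 59.308          # frequency at 1:55 a.m.
load_0150_mw = 62_439         # statewide load at 1:50 a.m.
load_after_shed_mw = 52_950   # load after the automatic shed
load_1300_mw = 46_281         # load at 1 p.m.

# Size of the frequency excursion below nominal.
freq_drop_hz = round(NOMINAL_HZ - observed_hz, 3)

# Load removed by the automatic shed, and in total by 1 p.m.
initial_shed_mw = load_0150_mw - load_after_shed_mw
total_shed_mw = load_0150_mw - load_1300_mw

# The same reductions as a share of the 1:50 a.m. load.
initial_shed_pct = 100.0 * initial_shed_mw / load_0150_mw
total_shed_pct = 100.0 * total_shed_mw / load_0150_mw

print(f"Frequency drop: {freq_drop_hz} Hz")
print(f"Initial shed:   {initial_shed_mw} MW ({initial_shed_pct:.1f}%)")
print(f"Shed by 1 p.m.: {total_shed_mw} MW ({total_shed_pct:.1f}%)")
```

On these numbers the frequency fell 0.692 Hz, the automatic shed removed 9,489 MW, and the total reduction by 1 p.m. was 16,158 MW.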
When rolling blackouts are used to help stabilize an overloaded electric grid, one consideration is NERC Reliability Standard PRC-006 – “Automatic Under Frequency Load Shedding”. Regional transmission organizations, such as ERCOT and SPP, have established operating requirements based on the measures outlined in PRC-006. In the case of ERCOT, these under frequency requirements establish three levels of load shedding and the corresponding frequencies. ERCOT sets operating requirements of 59.3 Hz with load relief of at least 5% at level 1 (the actual drop that occurred February 15th), 58.9 Hz with load relief of at least 15% at level 2, and 58.5 Hz with load relief of at least 25% at level 3 to prevent long term cascading outages.
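The three under frequency levels described above can be sketched as a simple lookup. This is a minimal illustration, not ERCOT's implementation: the thresholds and percentages are the ones quoted in the text, while the names (`UflsStage`, `required_relief_pct`) are hypothetical, and the sketch assumes the percentages are cumulative minimums and that a level applies once frequency is at or below its threshold.

```python
from typing import NamedTuple


class UflsStage(NamedTuple):
    level: int
    trip_hz: float            # stage applies at or below this frequency (assumption)
    min_relief_pct: float     # minimum load relief, treated as cumulative (assumption)


# Thresholds and load-relief percentages as quoted in the text.
UFLS_STAGES = (
    UflsStage(level=1, trip_hz=59.3, min_relief_pct=5.0),
    UflsStage(level=2, trip_hz=58.9, min_relief_pct=15.0),
    UflsStage(level=3, trip_hz=58.5, min_relief_pct=25.0),
)


def required_relief_pct(frequency_hz: float) -> float:
    """Return the minimum load relief (%) for a given grid frequency."""
    relief = 0.0
    for stage in UFLS_STAGES:
        # Stages are ordered from highest to lowest threshold, so the
        # last stage whose threshold has been reached wins.
        if frequency_hz <= stage.trip_hz:
            relief = stage.min_relief_pct
    return relief


for hz in (60.0, 59.3, 58.7, 58.5):
    print(f"{hz:.1f} Hz -> at least {required_relief_pct(hz):.0f}% load relief")
```

Under these assumptions, 60.0 Hz requires no relief, 59.3 Hz requires at least 5%, 58.7 Hz at least 15%, and 58.5 Hz at least 25%.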
What does this have to do with Aurora? The drop in frequency resulted in an automatic load shed (shutdown) of the generating plants. This is the first of the two steps of Aurora – remotely opening the breakers. Step 2 is remotely reclosing the breakers out-of-phase with the grid, which is where the long-term damage occurs. The damage comes from the large mechanical torques and electric current spikes produced when the equipment is restarted out-of-phase with the grid; they act like sticks of dynamite. The reclosing of the breakers can be malicious, to deliberately cause damage, or unintentional, as technicians try to bring generating plants back on-line too quickly. I talked to grid expert Mike Swearingen about his thoughts. According to Mike, creating Aurora events affecting multiple generating sites is a possibility. Per ERCOT operating requirements, the Texas grid initiated rolling blackouts by shedding load to avoid dropping below 58.5 Hz, which would cause a cascading long-term loss of the system. This means the grid operators would be monitoring the re-energizing of their systems under controlled frequency at certain loads.
However, if the sensors within the relay control systems were inaccurate, they could cause a frequency imbalance which would affect phase angles enough to cause potential Aurora events. In the Texas event, frozen instruments were blamed for many of the outages. Frozen or inaccurate instruments could be the erroneous input needed to cause single or multiple Aurora events. The second path to Aurora is the remote access required to re-establish the grid. Reclosing breakers out-of-phase, whether malicious or unintentional, will cause Aurora impacts. This can occur if a technician, in a hurry to restart a facility, takes a shortcut, or if remote access to relays is maliciously compromised, whether through SolarWinds or some other route. Recognize that Aurora causes physical damage to long-lead equipment, which can lead to very long outages, possibly on the order of 9-18 MONTHS.
These possibilities have not been adequately addressed in grid reliability or insurance studies – the numbers simply are too big. Given what occurred in Texas, Mike Rogers’ article is particularly relevant – “Why America would not survive a real first strike cyberattack today” at https://thehill.com/opinion/cybersecurity/539826-we-would-not-survive-true-first-strike-cyberattack.
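To make the out-of-phase reclosing described in this section more concrete, the sketch below shows the kind of check that guards against it. It is an illustration under stated assumptions, not how any particular relay works: the function names are hypothetical, the angle and slip limits are placeholder values chosen for the example, and the voltage formula assumes two sources of equal 1.0 per-unit magnitude.

```python
import math


def phase_difference_deg(gen_angle_deg: float, grid_angle_deg: float) -> float:
    """Signed angle between generator and grid phasors, in (-180, 180]."""
    diff = (gen_angle_deg - grid_angle_deg + 180.0) % 360.0 - 180.0
    return 180.0 if diff == -180.0 else diff


def voltage_across_breaker_pu(delta_deg: float) -> float:
    """Voltage across an open breaker between two 1.0 pu sources.

    For equal magnitudes, |V1 - V2| = 2 * sin(delta / 2), so the
    voltage is zero in phase and 2.0 pu at 180 degrees apart.
    """
    return 2.0 * abs(math.sin(math.radians(delta_deg) / 2.0))


def close_permitted(delta_deg: float, slip_hz: float,
                    max_angle_deg: float = 10.0,   # hypothetical limit
                    max_slip_hz: float = 0.067) -> bool:  # hypothetical limit
    """Allow closing only when angle and frequency difference are small."""
    return abs(delta_deg) <= max_angle_deg and abs(slip_hz) <= max_slip_hz


for gen, grid in ((2.0, 0.0), (350.0, 10.0), (120.0, 0.0), (180.0, 0.0)):
    delta = phase_difference_deg(gen, grid)
    print(f"delta={delta:7.1f} deg  "
          f"breaker voltage={voltage_across_breaker_pu(delta):.2f} pu  "
          f"close permitted={close_permitted(delta, slip_hz=0.02)}")
```

The point of the sketch is that the check depends entirely on the measured angles and frequencies being correct and on the check not being bypassed. Inaccurate instruments or a forced remote close defeat it, which is the Aurora scenario.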
Fair Weather Threats
In plain English, the weather event that occurred in Texas occurred during the same time frame as the Chinese-made transformer hardware backdoor issue and the Russian SolarWinds hack. These cyber threats present significant new threats that must be addressed in Texas and elsewhere.
FERC has issued a Notice of Proposed Rulemaking (NOPR) on incentives for grid cybersecurity. NERC Critical Infrastructure Protection (CIP) rules use a tiered approach (Bright Line) to categorize assets on the bulk power system as high, medium, or low impact facilities, with more stringent security requirements for larger facilities. Large electric grid control centers are considered high impact. Medium facilities, which begin at 1,500 MW, include some smaller control centers, ultra-high voltage transmission, and large substations and generating facilities.
According to the NERC CIPs and the FERC NOPR, everything else on the bulk power system is low impact including the Chinese-made transformers which provide approximately 10% of the power to New York City!
What happened in Texas occurred to High, Medium, and Low impact facilities. Weather and cyber adversaries do not care about Bright Lines nor meeting compliance which is the heart of the CIP standards. The reliability part of NERC which issued NERC PRC-006 doesn’t care about Bright Lines either. FERC, credit rating agencies, and insurance companies should not either.
Who is going to explain to the almost 20 million people affected by the Texas outages that CIP rules are more important than keeping their lights on?
What Can Be Done
Learning from what happened in Texas, it is time for FERC to revisit its NOPR and base incentives on utilities demonstrating the effectiveness of their cybersecurity upgrades in hardening their grid instead of relying on Bright Lines. Similar to the existing rate recovery framework for generation and transmission facilities, FERC needs to establish cybersecurity rate recovery mechanisms based on the effectiveness of cyber improvements, which has yet to be demonstrated for control system devices.
Just like Texas having its grid isolated from the rest of the country and assuming more risk for extreme weather cases, having all electric distribution excluded from NERC cybersecurity requirements is a recipe for disaster. Rep. Michael McCaul (R-Texas) defended his state’s independent power grid Sunday, February 21st, telling CNN’s Dana Bash (https://www.politico.com/news/2021/02/21/mccaul-texas-power-grid-winter-weather-470576) that “we’re not used to this type of weather” and pointing to a decade-old report on winterization as the way forward. The Texas power grid “was set up that way to be independent of federal oversight and regulations,” McCaul said on “State of the Union.” “That’s very good with things like cybersecurity; not so good when it comes to an Arctic blast like this one.”
Unfortunately, Congressman McCaul misspoke when he stated that cybersecurity is a benefit of Texas being independent of federal oversight and regulations. There have been more than 350 control system cyber incidents in the North American electric grid, including many in Texas. Many of the control system cyber incidents in other regions also applied to Texas. ERCOT and Texas utilities participate in the NERC CIP process and follow the same regulatory requirements as utilities in the rest of the country. From a cybersecurity perspective, there is nothing unique about the Texas grid. Utilities in Texas use the same power generation and grid equipment (the same can be said for water and natural gas control system equipment) from the same vendors with the same remote access as other utilities outside Texas and even outside North America. Additionally, ERCOT uses SCADA systems from the same vendors as other ISO/RTOs.
Ironically, I was scheduled to give a presentation with a senior scientist from one of the national laboratories February 16th at the Texas A&M Instrumentation and Automation Symposium until it was deferred until the week of March 30th because of the power outages in Texas.
We have not only a responsibility but also an opportunity to use the Texas experience to make needed changes to regulations and guidance on cybersecurity of critical infrastructures. This includes all organizations that have a financial stake in critical infrastructure protection including credit rating agencies and insurance companies. From a cybersecurity perspective, the Texas debacle has pointed out the following:
Isolating the Texas grid from other sources of power contributed to the debacle.
The NERC CIP directives, which are followed by ERCOT and the utilities in Texas, directly contribute to the cyber insecurity of the grid by excluding electric distribution, control system devices, non-routable (serial) device networks, using the Bright Line criteria to exclude all but the largest generating plants and highest voltage transmission systems even though other facilities are also cyber vulnerable, not requiring the expeditious removal of identified malware, ignoring power flows, etc. These gaps apply to all US utilities and have been exploited resulting in wide-spread outages and equipment damage.
It is evident that our adversaries are watching what happened, how we are responding, and what is being done to prevent future grid impacts. As such, resilience means addressing what could possibly be expected. The solution to building and operating a more resilient grid and other critical infrastructures lies with leadership in the industry, government, Congress, and stakeholders such as credit rating agencies and insurance companies.