Since SCADA Systems are typically considered industrial-grade and robust, it’s common to assume they’re flawless, right?
However, many SCADA systems don’t perform as expected or as advertised by their manufacturers. It’s important not to hastily conclude that a malfunctioning SCADA product reflects its overall quality and warrants replacement. To draw a loose analogy, it’s akin to replacing a V8 Supercar when the driver or pit crew lacks knowledge about how the car was configured for the specific track conditions. Perhaps the car was tuned for dry weather, and now it’s raining. Regardless of the quality of this analogy, the reality is that SCADA Systems in practical use often differ significantly from glossy presentations.
A clear indicator of trouble is when one contacts the vendor for support and hears the response, “the product is operating according to its design.” This leads to questions about why there are so many issues with SCADA Systems. Is there a magic bullet to fix these problems? And how do automation vendors sustain their business if their SCADA products seem to have performance issues?
Before delving into answers to these thought-provoking questions, let’s first identify what we believe are the top five issues with SCADA SYSTEMS. It’s worth noting that we use the term SYSTEM because evaluating SCADA software in isolation is impractical. With that context in mind, here’s our list, presented without a specific ranking order.
1. Incorrect selection and application of SCADA network protocol stack
Certain SCADA Systems support Sequence of Events (SOE), or timestamped, data. In these systems, when the field device supports time series data, each data sample is timestamped by the device’s real-time clock before the message is presented to the SCADA system. This approach effectively addresses network latency issues and helps mitigate the problems that occur when a field device is temporarily disconnected from the system. It is a valuable practice employed by most large utilities managing distributed assets today, although it differs from historical system configurations.
In contrast, if your system operates on MODBUS, it does not employ the SOE approach. Instead, it uses real-time acquisition: the SCADA system scans the available data points and records each process value (PV) it reads as the actual value, alongside the server’s timestamp. Real-time acquisition systems are better suited to plant-based systems with high-speed local area networks. They tend to rely on high-availability networks and often incorporate redundant communication paths to overcome intermittent communications. With higher-speed communication systems, “disconnects” can be more noticeable.
Mixing these two methodologies can be problematic. When both system types are combined, especially for remote field assets on slow communication networks, operators can become confused when events appear to be recorded in a different sequence from the one in which they were received in real time. This confusion may be further compounded by what field personnel report. Mixing different system data types can result in irreparable data capture issues, essentially creating a system that is flawed by design.
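To make the ordering problem concrete, here is a minimal Python sketch. The `Sample` type, tag names, and timings are hypothetical and chosen only for illustration; they do not come from any particular SCADA product. It shows how the same two events sort differently depending on whether the device timestamp (SOE) or the server arrival time (polled acquisition) is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    tag: str
    device_time: float   # seconds, stamped by the field device's clock (SOE style)
    arrival_time: float  # seconds, when the SCADA server received or polled the value


def order_by_device_time(samples):
    """Order events the way an SOE-based system would record them."""
    return sorted(samples, key=lambda s: s.device_time)


def order_by_arrival_time(samples):
    """Order events the way a polled, server-timestamped system would record them."""
    return sorted(samples, key=lambda s: s.arrival_time)


# Hypothetical events: the pump trip happened first, but it travelled over a
# slow remote link and reached the server 30 seconds late.
samples = [
    Sample("PUMP_TRIP", device_time=100.0, arrival_time=130.0),
    Sample("LOW_FLOW_ALARM", device_time=105.0, arrival_time=106.0),
]

soe_order = [s.tag for s in order_by_device_time(samples)]
polled_order = [s.tag for s in order_by_arrival_time(samples)]

print(soe_order)     # ['PUMP_TRIP', 'LOW_FLOW_ALARM']
print(polled_order)  # ['LOW_FLOW_ALARM', 'PUMP_TRIP']
```

In a mixed system, an operator could see either ordering depending on which path each point took, which is the confusion described above.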
2. SCADA Application Redundancy
A system can encompass various modes of redundancy, each influenced by factors like interfaces, the underlying computer infrastructure hosting the applications, failover strategies, return-to-normal procedures, and the inherent functionality of redundancy features. Redundancy is inherently intricate. Even when an application is touted as “easier to configure than brand XXXX” or as simply “plug and play,” it raises deeper questions. What transpires with the data when a software component experiences a failure? How does the system bounce back? Is there data loss and how long does it operate in a degraded state? What alerts are at the disposal of operators or system administrators to signal a state of degradation?
If your system’s redundancy was either preconfigured or set up without a comprehensive examination of potential fault scenarios, you have every reason to be nervous. SCADA Systems are complex applications. Having a paired redundant interface, a missing software component, or an additional interface connection can impact performance significantly. A systems engineering approach, including rigorous testing, is required to fully understand whether a SCADA System in a state of failure is behaving as designed or not.
Consider this scenario: If the SCADA system is struggling due to Fault A, it’s plausible that an operator or an uninformed engineer might focus on diagnosing the degraded performance instead of addressing Fault A. Fault A should be the sole concern because the system operates in a degraded state by design. These aspects can be understood and managed to meet an acceptable operational standard if the chosen redundancy architecture was evaluated from the outset and periodically tested.
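One way to keep attention on Fault A is to write down each fault scenario and its expected system state ahead of time, then test against that table periodically. The Python sketch below assumes a simple two-server pair; the names `redundancy_state` and `operator_alert` are hypothetical, and real redundancy architectures have many more components and modes than this.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    FAILED = "failed"


def redundancy_state(primary_ok: bool, standby_ok: bool) -> State:
    """Classify a two-server redundant pair from the health of each server."""
    if primary_ok and standby_ok:
        return State.NORMAL
    if primary_ok or standby_ok:
        return State.DEGRADED  # still running, but with no spare
    return State.FAILED


def operator_alert(primary_ok: bool, standby_ok: bool) -> str:
    """Build an alert that names the root fault, not the symptom."""
    state = redundancy_state(primary_ok, standby_ok)
    if state is State.NORMAL:
        return "Redundancy healthy."
    if state is State.FAILED:
        return "Both servers failed: no SCADA service."
    failed = "primary" if not primary_ok else "standby"
    return f"Degraded by design: restore the {failed} server."


# Fault scenarios agreed at design time, with the state each should produce.
scenarios = {
    (True, True): State.NORMAL,
    (True, False): State.DEGRADED,
    (False, True): State.DEGRADED,
    (False, False): State.FAILED,
}

for (primary_ok, standby_ok), expected in scenarios.items():
    assert redundancy_state(primary_ok, standby_ok) is expected

print(operator_alert(True, False))
```

The point of the table is that degraded performance with one server down is an expected, documented outcome, so the alert directs the operator to the failed server.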
3. Direct connection to SCADA from external enterprise systems
SCADA Systems primarily serve as visualization, data aggregation, and alarm annunciation tools. While they typically include moderate storage capabilities with compression, their ability to handle hundreds of simultaneous clients or third-party interface connections is often limited. As a SCADA System expands in size, the number of clients/interfaces can significantly affect the core SCADA system’s performance. It is generally considered poor practice to excessively extract large volumes of data directly from SCADA storage systems.
A more effective approach to managing data is to capture information as it is presented to the SCADA System and store the substantial volumes of process data in a process historian as it is generated. One of the process historian’s most significant advantages is that it provides isolation for the SCADA system and ensures a deterministic data stream to the core SCADA System. This model allows SCADA to fulfill its intended purpose without overburdening it with tasks beyond its primary function.
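The isolation pattern can be sketched in a few lines of Python. This is an illustrative model only: the `Historian` and `ScadaCore` classes and their methods are hypothetical and do not represent any vendor’s API. The SCADA core forwards each sample once as it arrives, and enterprise clients query the historian rather than the SCADA core.

```python
class Historian:
    """Hypothetical process historian: stores samples and serves bulk queries."""

    def __init__(self):
        self._store = {}

    def append(self, tag, timestamp, value):
        self._store.setdefault(tag, []).append((timestamp, value))

    def query(self, tag, start, end):
        """Return (timestamp, value) pairs with start <= timestamp <= end."""
        return [(t, v) for t, v in self._store.get(tag, []) if start <= t <= end]


class ScadaCore:
    """Hypothetical SCADA core: holds current values for operators only."""

    def __init__(self, historian):
        self._historian = historian
        self.current = {}

    def on_sample(self, tag, timestamp, value):
        # Update the live value, then forward the sample to the historian once.
        self.current[tag] = value
        self._historian.append(tag, timestamp, value)


historian = Historian()
scada = ScadaCore(historian)

for ts, level in [(0, 1.20), (60, 1.25), (120, 1.31)]:
    scada.on_sample("TANK_LEVEL", ts, level)

# An enterprise report pulls history from the historian, not from SCADA.
report = historian.query("TANK_LEVEL", start=60, end=120)
print(report)               # [(60, 1.25), (120, 1.31)]
print(scada.current)        # {'TANK_LEVEL': 1.31}
```

However many reporting clients are added, the SCADA core’s workload stays at one forward per sample.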
4. Poor engineering practice
It’s difficult to admit, but the majority of SCADA Systems wouldn’t meet the standards required for a mission to the moon. This assertion isn’t a critique of the software systems themselves but rather highlights shortcomings in engineering practices, standards management, change management, and inadequate or absent staged testing.
So, how can you identify subpar engineering practices? The following checklist is not exhaustive but offers a glimpse of what should be in place for critical infrastructure SCADA Systems where the safety of people and assets is paramount. It’s essential to emphasize that safety is always crucial, and engineers cannot simply say, “What can I do when there’s no budget?” The checklist includes:
The absence of these elements means that engineers are improvising as they work on the project. It also means that new engineers involved in maintenance, support, or design extensions are more likely to follow suit. Essentially, they are encouraged to improvise, and even a single instance of this behavior perpetuates it.
Engineers rarely have the time or scope to conduct a comprehensive audit of a system for minor changes, which often marks the beginning of a decline in standards. It takes courage to insist on adhering to standards, such as making the device conform to specifications or correcting wiring errors. This may involve addressing challenging situations or confronting colleagues or contractors. However, this discipline is essential to controlling long-term costs and ensuring the sustainability of a system. A lack of adherence to these standards not only leads to operational confusion but also raises safety concerns.
5. Tech Bias
Summary
Returning to the initial questions, when faced with a malfunctioning SCADA system, is there a simple solution, a silver bullet? In most cases, it’s not a single factor but a complex web of issues that contribute to the problem. SCADA systems are intricate, distributed applications with numerous subsystems and external interfaces, making them highly dynamic and complex. Sometimes, the performance of a single interface, such as a database connection, can render the entire SCADA system unusable. This underscores the importance of SCADA System Administrators having staging environments for rigorous functional testing, load testing, and risk mitigation.
It’s worth considering whether SCADA is the right fit for your needs or if an IoT solution might be more suitable. Many of the principles discussed earlier also apply to IoT, so it’s essential to carefully evaluate IoT as a potential replacement for a struggling or crippled SCADA system. Before discarding your current system, it’s crucial to understand the underlying reasons (“the why”) and avoid making hasty decisions.
On a positive note, well-engineered and properly maintained SCADA SYSTEMS can operate seamlessly for many years without downtime. While this is a reality for some operations, it’s not the norm for the majority. And does it all come down to the brand of SCADA? The answer, as discussed, goes beyond brand considerations.