Alarm Bells

Many SCADA systems do not operate as intended or as described by the manufacturer. Even so, it doesn’t make sense to conclude that because a vendor’s SCADA product isn’t working, it is a bad product and should be replaced. That would be like replacing a Porsche 911 when the real problem is a driver or pit crew who know the basics but don’t know how the car was set up for that driver and that specific track. Maybe the car was tuned for dry conditions and it has since rained. However poor this analogy is, what holds true is that SCADA Systems in the field usually don’t look like the glossy presentation.

 

A tell-tale sign that you are in trouble is calling the vendor for support and hearing the response: “the product is working as per design”. This raises the question: why are there so many problems with SCADA Systems? Is there a silver bullet to fix them? How do the automation vendors survive if their SCADA products don’t seem to work so well?

 

Before we answer those provocative questions, we should address what are, in our opinion, the top 5 issues with SCADA SYSTEMS. We emphasise the word SYSTEM because SCADA software by itself is impossible to assess. With that context, here is our list, in no particular order.

1. Incorrect selection and application of SCADA network protocol stack

Some SCADA Systems support Sequence of Events (SOE), or timestamped, data. This means the field device has its own real-time clock, and the event time is attached to the process variable (PV) reading. This overcomes network latency and mitigates some of the issues that arise if the field device is temporarily disconnected from the system. It is a fantastic principle, applied by most large utilities managing distributed assets today; however, it is not how systems have historically been configured. Standard MODBUS, for example, carries no timestamp with the data, so if your system is MODBUS, then your system is not SOE.

In contrast, in a real-time acquisition system the SCADA system scans the points available to it and records each process variable (PV) reading against its own clock (server time), treating the scanned value as the real value. This approach is better suited to plant-based systems on high-speed local area networks. Real-time acquisition systems are also more dependent on high-availability networks, and they often have redundant communication systems.

Mixing the two methodologies spells DANGER! When both system types are mixed (particularly for remote field assets on slow communications networks), operators can be confused when events appear to be recorded in a different order from what makes logical sense, from what field staff reported or, worse, from what customers reported! Mixing system data types can result in bad data capture which is not repairable. It is bad by design.
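
To make the ordering problem concrete, here is a minimal sketch in Python. The tag names, times and the `Event` structure are all invented for illustration and do not come from any particular SCADA product. It assumes a pump trip that happens first but is reported over a slow link, followed by a low-pressure alarm that reaches the server quickly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    tag: str
    device_time: float   # seconds, stamped by the field device's own clock (SOE)
    arrival_time: float  # seconds, when the SCADA server scanned or received it


# Hypothetical scenario: the pump trips at t=100 but its report arrives at t=130;
# the pressure alarm occurs at t=102 and arrives at t=103.
EVENTS = [
    Event("PUMP_1_TRIP", device_time=100.0, arrival_time=130.0),
    Event("PRESSURE_LOW", device_time=102.0, arrival_time=103.0),
]


def soe_order(events):
    """Order events by the time the field device recorded them."""
    return [e.tag for e in sorted(events, key=lambda e: e.device_time)]


def server_time_order(events):
    """Order events by server time, as a polled real-time system would."""
    return [e.tag for e in sorted(events, key=lambda e: e.arrival_time)]


def mixed_order(events, soe_tags):
    """Order events when only the tags in soe_tags carry device timestamps."""
    def recorded_time(e):
        return e.device_time if e.tag in soe_tags else e.arrival_time
    return [e.tag for e in sorted(events, key=recorded_time)]


print(soe_order(EVENTS))
print(server_time_order(EVENTS))
print(mixed_order(EVENTS, soe_tags={"PRESSURE_LOW"}))
```

With device timestamps the trip is recorded before the alarm. With server time, or with only the pressure tag timestamped at the device, the alarm is recorded before the trip that caused it, which is exactly the kind of record that confuses operators and cannot be repaired afterwards.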

2. SCADA Application Redundancy

There are many modes of redundancy that a system can cater for. Modes can be limited by the interfaces, the computer infrastructure hosting the applications, the failover strategy, the return-to-normal strategy and how the redundancy features operate natively. Redundancy is always complex. Even if one application is described as “much easier to configure than brand XXXX” or “it just works”, the deeper questions are: What is happening with the data when a software component fails? How does the system recover, is there data loss and how long is it in a degraded state? What alerts are available to the operator or system administrator so that they know something is in a degraded state?

 

If your system redundancy was implemented out of the box, or there wasn’t a workshop to step through all of the possible fault scenarios, chances are that when a failure occurs you have every reason to be nervous. As per point 1 above, SCADA Systems are complex application stacks, and having a paired redundant interface, a software component missing, or an additional interface connected is likely to affect performance. A systems engineering approach, including thorough testing, is mandatory to know whether a SCADA System in failure mode is doing what it was designed to do. Consider this: if the SCADA system is limping along because of fault A, an operator or uninformed engineer who sees the degraded performance may begin to fault-find the degraded performance instead of resolving fault A. Fault A is the only issue of interest, because the system operates in a degraded state either by product design or by system design. All of this can be known and managed to an accepted operational standard if the chosen redundancy architecture was exposed from the beginning and is tested periodically.
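
As a starting point for such a workshop, the fault scenarios can be enumerated mechanically so that none is skipped. The sketch below is illustrative only: the component names are hypothetical, and it assumes that considering up to two simultaneous failures is enough for a first pass.

```python
from itertools import combinations

# Hypothetical components of a redundant SCADA System; substitute your own.
COMPONENTS = [
    "primary_server",
    "standby_server",
    "io_driver_a",
    "io_driver_b",
    "network_link",
]


def fault_scenarios(components, max_simultaneous=2):
    """Yield every combination of failed components, up to max_simultaneous."""
    for count in range(1, max_simultaneous + 1):
        yield from combinations(components, count)


def workshop_sheet(components, max_simultaneous=2):
    """Build one blank row per scenario with the questions the workshop must answer."""
    return [
        {
            "failed": failed,
            "expected_behaviour": None,  # what should the system do?
            "data_loss": None,           # is data lost, and how much?
            "operator_alert": None,      # how does the operator find out?
            "tested": False,             # has this actually been exercised?
        }
        for failed in fault_scenarios(components, max_simultaneous)
    ]


sheet = workshop_sheet(COMPONENTS)
print(len(sheet), "scenarios to walk through")
```

Any row that still has an empty answer or `tested` set to `False` after the workshop is a failure mode nobody can vouch for.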

3. Direct connection to SCADA from external enterprise systems

SCADA Systems are visualisation and data collection (aggregation) tools. Even though most also offer moderate storage with some compression, what they usually do not do well is support hundreds or thousands of simultaneous clients or 3rd party interface connections. As a SCADA System grows in size, the number of clients and interfaces can have a significant impact on core SCADA system performance. “Dragging” huge amounts of data from SCADA storage systems is, simply put, bad practice. The more suitable method of warehousing the data is to capture the information as it is presented to the SCADA System and store the large volumes of process data in a process historian as it is created. Perhaps one of the process historian’s most significant benefits is isolating the core SCADA System from external demand and ensuring it receives a deterministic data stream. This model allows SCADA to do what it is designed to do, rather than making it serve more than its primary purpose.
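
The isolation idea can be sketched as follows. This is a toy model under stated assumptions, not any vendor’s architecture: `ScadaCore`, `Historian` and their methods are hypothetical names, and the in-memory buffer stands in for whatever store-and-forward mechanism a real product provides.

```python
from collections import defaultdict, deque


class Historian:
    """Hypothetical stand-in for a process historian that serves external clients."""

    def __init__(self):
        self._samples = defaultdict(list)

    def append(self, tag, timestamp, value):
        self._samples[tag].append((timestamp, value))

    def query(self, tag, start, end):
        """Return samples for tag with start <= timestamp <= end."""
        return [(t, v) for (t, v) in self._samples[tag] if start <= t <= end]


class ScadaCore:
    """Keeps only current values and hands each sample on exactly once."""

    def __init__(self, buffer_size=10_000):
        self.current = {}
        # Bounded buffer: if it fills, the oldest samples are dropped. A real
        # system would need durable store-and-forward instead.
        self._outbox = deque(maxlen=buffer_size)

    def on_value(self, tag, timestamp, value):
        self.current[tag] = value
        self._outbox.append((tag, timestamp, value))

    def forward_to(self, historian):
        """Drain buffered samples into the historian; return how many were sent."""
        forwarded = 0
        while self._outbox:
            historian.append(*self._outbox.popleft())
            forwarded += 1
        return forwarded


scada = ScadaCore()
historian = Historian()
for ts, flow in [(1.0, 10.5), (2.0, 11.0), (3.0, 11.5)]:
    scada.on_value("FLOW_1", ts, flow)
scada.forward_to(historian)

# Enterprise systems query the historian, never the SCADA core.
print(historian.query("FLOW_1", 0.0, 2.0))
```

The point of the design is that the load on the SCADA core is fixed at one write per sample, however many reports, dashboards or 3rd party interfaces later read the data.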

4. Poor Engineering Practice

It hurts to admit this, but the state of most SCADA Systems would not be adequate to put men on the moon! This brash statement has little to do with the software systems themselves and more to do with the discipline of engineering: engineering standards management, change management, and staged testing that is incomplete or missing altogether.

 

So how do you detect poor engineering practice? The following checklist is not meant to be thorough, but it provides a hint of what should be available for critical infrastructure SCADA Systems where safety of people and assets is important. So when is that not important? Trick question. It is always important, and engineers cannot say “when there is no budget what can I do!” So, the list:

  • Defined requirements
  • An approved design
  • A design management plan
  • Installation and Transition Management Plan
  • Configuration management and version control
  • Test Plans/Procedures
  • Requirements Verification Matrix which cross checks all requirements to a test procedure item
  • Tag or Point lists which establish conventions, i.e. a dictionary. The idea of inverting bits because someone has wired an input back to front is a train wreck waiting for a person to blame.
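
The Requirements Verification Matrix item in the checklist above lends itself to an automated check. The sketch below is a minimal illustration: the requirement IDs, test procedure IDs and their wording are invented, and a real project would load them from its requirements and test management tools.

```python
# Hypothetical requirements register: ID -> description.
REQUIREMENTS = {
    "REQ-001": "Alarms are timestamped at the field device",
    "REQ-002": "Failover to the standby server completes without operator action",
    "REQ-003": "Historian receives every process value change",
}

# Hypothetical test procedures: ID -> requirement IDs each procedure verifies.
TEST_PROCEDURES = {
    "TP-01": ["REQ-001", "REQ-002"],
    "TP-02": ["REQ-002"],
    "TP-03": ["REQ-009"],  # refers to a requirement that does not exist
}


def verify_matrix(requirements, test_procedures):
    """Return (untested requirements, test references to unknown requirements)."""
    covered = {req for reqs in test_procedures.values() for req in reqs}
    untested = sorted(set(requirements) - covered)
    unknown = sorted(covered - set(requirements))
    return untested, unknown


untested, unknown = verify_matrix(REQUIREMENTS, TEST_PROCEDURES)
print("Requirements with no test:", untested)
print("Tests citing unknown requirements:", unknown)
```

Both lists should be empty before a design is signed off. A requirement with no test is an unverified promise, and a test that cites an unknown requirement usually means the requirements changed without the test plan being updated.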

Not having all of these things in place means engineers on the job are making it up as they go. It also means new engineers in maintenance, support or design extension are more likely to make it up. They are basically being invited to make it up, and one single instance perpetuates this behaviour. Rarely does an engineer have the time or scope to completely audit a system to make a minor change, so minor changes are usually the start of the avalanche of standards degradation. It takes courage to say, “make the device fit the standard” or “rewire it correctly”. That means confronting a situation or challenging a colleague or contractor. This is the discipline required to keep costs down in the long run and have a system sustainable for life. A lack of these standards not only leads to operational confusion but also raises safety concerns.

5. Tech Bias

Now this is going to make someone angry. If you ask an engineer which is the best software application for your system, what is the answer going to be? Of course, the one they are good at! We just can’t help preserving our sense of importance. I say we, because I am an engineer too! It’s hard to be impartial; it takes practice. It takes a system that removes any chance of bias. The only way to select the right products for the job is to have a system of measurement that goes beyond the basic opinions of assessors. Is that even possible? The reality is, many products are suitable for most applications if configured to perform inside the operating envelope of the specific product. SCADA Products perform better when they are not customised to look like “another vendor’s” product. So, applying any software product as “commercial off the shelf” reduces the risk of forcing square pegs into round holes. It is not that you cannot bang it hard enough to get that square peg in; it is just that sometimes the peg may get a little stuck. With enough money you can always get a big enough hammer to make something work, but it won’t work “as designed”, and support will be challenged when they hear about your creative, award-winning, highly customised bespoke solution. In the background they may even be laughing.
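
One possible form for such a system of measurement is a weighted scoring matrix, with weights agreed before any product is scored and with assessor disagreement flagged for evidence rather than averaged away. The sketch below is only an illustration: the criteria, weights, products and scores are all invented.

```python
from statistics import mean

# Hypothetical criteria and weights, agreed before any product is assessed.
WEIGHTS = {"protocol_support": 0.40, "redundancy": 0.35, "historian_integration": 0.25}

# Hypothetical scores (1 to 5) given independently by each assessor.
SCORES = {
    "product_a": {
        "assessor_1": {"protocol_support": 4, "redundancy": 3, "historian_integration": 5},
        "assessor_2": {"protocol_support": 4, "redundancy": 3, "historian_integration": 4},
    },
    "product_b": {
        "assessor_1": {"protocol_support": 5, "redundancy": 2, "historian_integration": 3},
        "assessor_2": {"protocol_support": 2, "redundancy": 2, "historian_integration": 3},
    },
}


def weighted_score(assessor_scores, weights):
    """Average each criterion across assessors, then apply the agreed weights."""
    return sum(
        weight * mean(scores[criterion] for scores in assessor_scores.values())
        for criterion, weight in weights.items()
    )


def rank(scores, weights):
    """Return (product, weighted score) pairs, best first."""
    totals = {product: weighted_score(a, weights) for product, a in scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def disagreements(scores, weights, max_spread=1):
    """Flag (product, criterion) pairs where assessors differ by more than max_spread."""
    flagged = []
    for product, assessors in scores.items():
        for criterion in weights:
            values = [s[criterion] for s in assessors.values()]
            if max(values) - min(values) > max_spread:
                flagged.append((product, criterion))
    return flagged


print(rank(SCORES, WEIGHTS))
print(disagreements(SCORES, WEIGHTS))
```

A flagged pair is where bias is most likely hiding: one assessor knows the product well, or likes it, and the other does not. Those scores should be settled with a demonstration or a test, not with a louder opinion.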

Ok, I admit I have a problem. Is there any hope?

So back to the big questions at the start. When my SCADA is broken, is there a silver bullet? Simply put, it could be broken because of one factor, but usually it is much more complex. Because SCADA can be a unified distributed application with subsystems and external interfaces, the number of inter-dependencies makes the system very complex and extremely dynamic. Sometimes the performance of one particular interface, e.g. a database connection, may render the SCADA System unusable. This places great emphasis on the need for SCADA System Administrators to have staging environments for functional testing, load testing and removing controllable risks.

 

SCADA might not be for you. Maybe you need an IoT solution. Most of the principles above also apply to IoT, so be wary of IoT as the silver bullet for a limping or crippled SCADA System. We must understand “the why” before we throw out the baby with the bath water and start again!

 

On the brighter side, well-engineered and well-maintained SCADA SYSTEMS can operate for years and years without downtime. This is the reality for some operators, just not the majority. Does it have much to do with the brand of SCADA? I think the answer is above.