Is a single source of truth possible for an enterprise information management system?
Yes!
The single source of truth buzzword is commonplace today. Being clear about what it means to a data consumer helps remove some of the mystery and articulate some of the complexities hidden beneath the surface. Perhaps the most important myth to dispel is the idea that everything lives in one data warehouse; in practice there is usually more than one. Even though there is usually a single data warehouse for all process data, other non-process databases hold important information. The single source of truth principle relates more to a single user interface that organises and contextualises what the data consumer needs to see.
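To make this concrete, here is a minimal sketch in Python of a single access layer sitting over several stores. Everything in it (the class names, the in-memory stand-ins for a historian and a laboratory database, the tag names) is an illustrative assumption, not a reference design.

```python
class InMemorySource:
    """Stand-in for a real repository (process historian, LIMS, asset register)."""
    def __init__(self, rows):
        self.rows = rows

    def query(self, **criteria):
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in criteria.items())]


class TruthFacade:
    """One user-facing interface over several stores; routing is internal."""
    def __init__(self, **sources):
        self.sources = sources

    def query(self, domain, **criteria):
        # The consumer asks one interface; which database answers is hidden.
        return self.sources[domain].query(**criteria)


facade = TruthFacade(
    process=InMemorySource([{"tag": "FLOW_01", "daily_avg": 10.3}]),
    lab=InMemorySource([{"sample": "S-100", "assay_pct": 0.52}]),
)
print(facade.query("process", tag="FLOW_01"))
```

The "truth" here is the interface, not any one database; the consumer never needs to know how many repositories sit behind it.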
Operational or data analysis – which will it be?
It is not so much a matter of one data type or the other (mutually exclusive outcomes), unless the system design, and in particular its growth strategy, is left to chance. That is akin to fitting a Mack truck bullbar to a Morris Mini and expecting great performance. In its own right each item is useful and has its place, but together there is more to consider.
Traditionally, process plants have been funded and managed by engineering groups. Increasingly, planners, modellers and asset managers are seeking to measure and operate the enterprise's entire asset investment more efficiently. Many process plants were originally developed for operational management only. The control system's communications would normally be designed for very short access times, so that alarms and control actions occur virtually instantaneously and operators see them in real time. For example, the time to annunciate an alarm condition on the plant-floor workstation might be 2 seconds, and sampling rates for process variables, important as they are to the “head office”, would be second in importance to alarm annunciation.
Secondary to this requirement is the transfer of large volumes of data, particularly batch-type data. Historically, logging analogue information in the SCADA or HMI system has been supported by these relatively fast access times. The drawback appears as the number of analogues (process variables) in a plant grows: how they are processed by the control and monitoring equipment has a large, direct impact on how the HMI system performs from a data-resolution perspective. So, as a system grows, its data performance can decrease even if the information is being reported centrally and regularly. The larger question is how fast the software platform can scan the PLCs/RTUs on the plant.
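To see how scan performance degrades as analogues are added, the back-of-the-envelope sketch below estimates one full poll cycle over a single serial Modbus-style link. Every figure in it (baud rate, framing overhead, turnaround time, registers per request) is an assumption chosen for illustration, not a vendor specification.

```python
# Rough estimate of how adding analogues stretches the poll cycle on a
# serial Modbus-style link. All figures are illustrative assumptions.

BAUD_RATE = 9600            # bits per second on the field link (assumed)
BITS_PER_BYTE = 10          # 8 data bits plus start/stop framing
REGISTERS_PER_ANALOGUE = 1  # one 16-bit register per value (assumed)
FRAME_OVERHEAD_BYTES = 8    # address, function code, CRC per request (assumed)
TURNAROUND_S = 0.05         # RTU processing and line turnaround (assumed)

def poll_cycle_seconds(analogues: int, regs_per_request: int = 100) -> float:
    """Estimate one full scan of `analogues` values over a single link."""
    requests = -(-analogues * REGISTERS_PER_ANALOGUE // regs_per_request)  # ceil
    bytes_total = requests * FRAME_OVERHEAD_BYTES + analogues * REGISTERS_PER_ANALOGUE * 2
    wire_time = bytes_total * BITS_PER_BYTE / BAUD_RATE
    return wire_time + requests * TURNAROUND_S

for n in (50, 500, 5000):
    print(f"{n:>5} analogues -> ~{poll_cycle_seconds(n):.2f} s per scan")
```

On these assumptions, 50 analogues scan in well under a second, but 5,000 stretch the cycle past ten seconds, far outside a 2-second alarm annunciation target; the usual remedies are more links, faster media or fewer points per scan.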
Modelling data accuracy and resolution
In any process control system it is imperative that data resolution be determined early in the design. Resolution inevitably determines the volume of sampled data to be reported, which affects both the communications bandwidth requirements and, ultimately, the database size. Oversampling, although it feels safe to those planning a system without intimate knowledge of the process to be managed, can render a communications system virtually unusable in some situations. These potential stop points are usually referred to as bottlenecks. Field instrumentation accuracy and stability should be matched by the control and monitoring equipment. How often the analogues are sampled, and at what time resolution, are critical decisions for many processes. The monitoring equipment's communication protocol should also support the time-base resolution required by both the sampling equipment and the data repository. This is perhaps most apparent with time-series historical data that arrives in the repository some time after the events have occurred in the remote plant.
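A quick sizing sketch makes the trade-off visible. The point count, sample period and record size below are assumptions chosen only to show the arithmetic.

```python
# Rough sizing of link load and archive growth from sampling decisions.
# Point count, period and record size are illustrative assumptions.

ANALOGUES = 2000        # points reported to the historian (assumed)
SAMPLE_PERIOD_S = 5     # one sample per point every 5 seconds (assumed)
BYTES_PER_SAMPLE = 16   # timestamp + value + quality, uncompressed (assumed)

samples_per_day = ANALOGUES * 86_400 / SAMPLE_PERIOD_S
mb_per_day = samples_per_day * BYTES_PER_SAMPLE / 1e6
avg_bps = ANALOGUES / SAMPLE_PERIOD_S * BYTES_PER_SAMPLE * 8

print(f"{samples_per_day:,.0f} samples/day ≈ {mb_per_day:,.0f} MB/day")
print(f"Average link load ≈ {avg_bps / 1000:,.1f} kbit/s before protocol overhead")
```

Halving the sample period doubles both figures, which is exactly the kind of decision that should be made early, against the bandwidth actually available.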
Capturing the minimum reporting element
The minimum reporting element is the smallest pre-processed measure of quantity the data modellers wish to see or archive. For example, instantaneous flow may be nice to have, but even high-speed data acquisition loggers lack the memory capacity to support that dream. Depending on the objective and the process to be controlled, it may be reasonable to pre-process the flow by sampling every second and averaging every 5 samples. The resolution of such a process might still be 12-bit data with an accuracy of 0.5%, and it would be clearly understood by the data owners. Armed with a summary of all the analogues and their minimum reporting definitions, the system designer can embark on balancing the operational performance specifications with the data modelling requirements, which determine what is stored in the “single source of truth” process data warehouse.
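As a minimal sketch of that example (sample once per second, report the average of every 5 samples), the Python below collapses a 1 Hz stream into its minimum reporting elements. The signal values are invented for illustration.

```python
from statistics import mean

def minimum_reporting_elements(samples, block=5):
    """Collapse 1 Hz samples into 5-sample averages (the reported element)."""
    for i in range(0, len(samples) - block + 1, block):
        yield mean(samples[i:i + block])

# Ten seconds of simulated 1 Hz flow readings (illustrative values).
one_hz_flow = [10.2, 10.4, 10.1, 10.3, 10.5, 11.0, 10.9, 11.2, 11.1, 10.8]
print(list(minimum_reporting_elements(one_hz_flow)))  # [10.3, 11.0]
```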
Managing this balance requires attention to detail, especially on low-bandwidth communications media.
Why not have it all?
Having it all comes at a price, particularly for distributed control systems. The main engineering discipline for a balanced design is communications bandwidth planning, based on an intimate understanding of the protocols, the electrical interfacing standards and, of course, the data volumes and access requirements discussed above. The approach must begin with the end in mind: the data outputs (the back-end system) must be defined before the front end (the data-gathering engines and devices) is designed or even chosen.
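One way to honour "begin with the end in mind" is to make the back-end reporting requirements the master record and derive the front-end scan configuration from them. The sketch below assumes hypothetical tag names, reporting intervals and a simple factor-of-two sampling margin.

```python
# A sketch of deriving front-end scan configuration from back-end
# reporting requirements. Tags, intervals and the factor-of-two margin
# are hypothetical, for illustration only.

REPORT_REQUIREMENTS = {
    # tag:      finest reporting interval the back end needs (seconds)
    "FLOW_01":   5,
    "LEVEL_01": 60,
    "TEMP_01":  30,
}

def derive_poll_intervals(requirements, margin=2):
    """Front-end poll interval per tag, driven by the defined outputs."""
    return {tag: interval / margin for tag, interval in requirements.items()}

for tag, interval in derive_poll_intervals(REPORT_REQUIREMENTS).items():
    print(f"{tag}: poll every {interval:g} s")
```

The direction of derivation is the point: if a report needs a value every 5 seconds, the poller must deliver it; the poller's convenience never dictates what the report can show.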
What about legacy systems?
You may ask, what about my system? You may conclude that no one thought much about information systems or reports when your system was designed. It is true that the system may not have been designed with reports or back-end database connections in mind. The good news is that, more often than not, the way a system is built, rather than its design, is the cause of most performance issues. The other good news is that sometimes going back to the total system design and replacing a single component can solve the problem. The thing to be wary of is addressing a bottleneck without understanding what is really going on; that is fraught with risk, and removing one bottleneck sometimes just shifts it somewhere else. Effective management of distributed systems requires control of data communications, and this doesn't happen by accident.
The most important mindset for a single source of truth
Even today, HMI software applications generally have some type of reporting system built in, but its functionality and openness are usually focussed on raw data and lack significant localised intelligence. This is because it doesn't make sense to distribute business intelligence throughout a network, and field devices rarely have the processing grunt to run business intelligence algorithms. It is also why it is important to capture the right data from the field without degrading its quality, so that it is still useful when it finally makes its way to the BI platform where smart processing can be performed. The golden rule is to understand the data at the source, have a control mechanism for data management, and then ensure the communications network will support it.
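One concrete way to avoid degrading data on its way to the BI platform is to carry a quality flag with every sample and propagate it through any pre-processing. The sketch below is an assumption-laden illustration; the quality codes are loosely modelled on OPC-style good/uncertain/bad flags, not any particular product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Sample:
    tag: str
    timestamp: datetime
    value: float
    quality: str  # "good" | "uncertain" | "bad" (assumed convention)

def average_good(samples):
    """Aggregate only good-quality values; flag the result if any were not."""
    good = [s.value for s in samples if s.quality == "good"]
    quality = "good" if len(good) == len(samples) else "uncertain"
    return (sum(good) / len(good) if good else float("nan"), quality)

batch = [
    Sample("FLOW_01", datetime.now(timezone.utc), 10.3, "good"),
    Sample("FLOW_01", datetime.now(timezone.utc), 10.4, "uncertain"),
]
print(average_good(batch))  # (10.3, 'uncertain')
```

The aggregate arrives at the BI platform with its pedigree intact, so smart processing downstream can still judge how much to trust it.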
How did we end up with broken enterprise reporting systems?
Product developers cannot guess how an end user's plant is defined, so tools are provided to the marketplace to facilitate reporting. The reporting suites (visualisation and metrics) provided have little to do with the performance of the underlying platform. Reporting on poor-quality platforms has plagued the industry for years; a well-designed platform, by contrast, is unaffected by any report a user throws at it.
Usually, external information systems are required to provide generic tools for accessing and manipulating the data appropriately. For process control engineers, reporting is usually left until last. “Let's see what comes out first and then we can customise your reports to match your particular requirements” might comfort a client who wants to feel part of the design process, but it overlooks the significant demand that back-end systems place on front-end real-time applications. You simply cannot do this and expect a good outcome.
SCADA/HMI real-time systems should be developed with the information system's specifications completed first. Design the inputs and their associated controls according to the overall objectives, or, simply put, define all the outputs first!