From time to time we are asked, “what is the best reporting system for my operational system?”
What is the best reporting system for my process historian or my SCADA system?
First, we will answer a slightly different question. These are the reporting tools we most commonly see used with process historians: Power BI, Microsoft SSRS, Tableau, Oracle BI, Qlik and TIBCO.
In addition, each historian vendor usually bundles native analysis tools, optimised trending widgets or plugins, all of which help novice users start consuming data with relative ease. In terms of performance, the bundled tools are often optimised for their own historian product and do a pretty decent job. To use a more generic reporting tool, for example a Microsoft BI tool, extra effort is required to package or shape the data so that it can be digested, even by the most experienced report developers. In general terms, data needs to be shaped, contextualised and organised for easy consumption (warehoused), so that by the time the average report writer comes along to prepare a new report, the hard work has already been done. Right? We will call this process of organising the data “dimensioning” the data. This can be a complex task if done manually by a software developer or database administrator, or a simple task if done by a software product purpose-built for creating a layer between the plant historian and the enterprise.
This layer of abstraction organises and publishes the specific data sets used for reporting. In essence, a “single source of truth” is created for “reporting data”. Many experts advocate the process historian as the “single source of truth” for data, and that remains true for raw data. When dimensioning the data, the new data sets are captured at the rate at which they are most likely to be consumed by reports, dashboards and casual users. This minimises the number of samples held at the abstraction layer, so client consumption of information is exceptionally fast. The raw data remains available to technical users (engineers, data scientists, analysts), and dimensioned data is made available to the wider audience, who do not have to be process experts to gain insights from what is provided. In our experience, report writers no longer have to build reports with 8 or 9 joins just to produce simple output, because the information has been dimensioned in such a way that data retrieval is efficient and the results are simple to digest.
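As a minimal sketch of what dimensioning might look like, the Python below rolls raw historian samples up to an hourly reporting rate and attaches asset context to each row, so a report can read one flat table without joins. The tag name, the context fields (site, asset, measure, unit) and the hourly rate are all assumptions made for illustration; a real abstraction layer would take them from the plant's own asset model and reporting requirements.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def dimension_hourly(samples, tag_context):
    """Aggregate raw (tag, timestamp, value) samples into hourly rows with context."""
    buckets = defaultdict(list)
    for tag, ts, value in samples:
        # Truncate each timestamp to the start of its hour (the assumed reporting rate).
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(tag, hour)].append(value)

    rows = []
    for (tag, hour), values in sorted(buckets.items()):
        ctx = tag_context[tag]  # hypothetical lookup: tag name -> asset context
        rows.append({
            "site": ctx["site"],
            "asset": ctx["asset"],
            "measure": ctx["measure"],
            "unit": ctx["unit"],
            "hour": hour,
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            "samples": len(values),
        })
    return rows


# Illustrative data only: one hypothetical flow tag with three raw samples.
raw_samples = [
    ("FT101.PV", datetime(2024, 1, 1, 8, 0, 0), 10.0),
    ("FT101.PV", datetime(2024, 1, 1, 8, 30, 0), 14.0),
    ("FT101.PV", datetime(2024, 1, 1, 9, 15, 0), 20.0),
]
tag_context = {
    "FT101.PV": {"site": "Plant A", "asset": "Pump 1", "measure": "flow", "unit": "L/s"},
}

rows = dimension_hourly(raw_samples, tag_context)
for row in rows:
    print(row["asset"], row["measure"], row["hour"], row["avg"], row["unit"])
```

The raw samples are left untouched in the historian. Only the summarised, contextualised rows are published to the reporting layer, which is why the report writer never needs to know the tag naming scheme.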
Back to the original question: perhaps the best reporting system for your process historian is the system you already have. Most organisations have selected a BI tool suitable for their organisation, independent of the operational technology stack (process historian, SCADA, IoT). This makes complete sense: given the number of users across the organisation, why select a BI tool for a handful of use cases centred around process data? So, as you can see, the answer to the original question is loaded with challenges.
How important is “dimensioning” data?
The larger the data sets and the more complex the relationships between devices, assets, instruments, and “ideas”, the more important it becomes to shape the data, i.e. to warehouse it efficiently. To complicate matters, it is a significant challenge to find people who understand process data. They are usually process engineers and analysts, and are a precious resource in most organisations. They also rarely have the technical skills to manually create data warehouses for process data. Likewise, they are rarely report writers or database administrators. For this reason, technology can be used to solve this organisational human resource challenge, and also to overcome the complexities and performance limitations introduced when humans attempt to “code” complex data relationships. This is the stuff that technology was created to solve, the stuff humans should be happy to set up and step back from. It is mind-numbing work. It is supposed to be a one-time event, i.e. a set-and-forget situation.
Recently I heard “on the street” that one organisation paid upward of a million dollars for a single report to be developed. I am chilled to consider that they may have needed more than one report! Now, that was probably one serious report, and without knowing who, what and why, this is just street talk. Even so, I would be very confident in stating that the complexity of the alleged report suggests the data set was not dimensioned, organised and simplified for reporting purposes. If an organisation has a thousand reports to build, then setting up templates for repeat use cannot be allowed to remain complex. It doesn’t make economic sense.
Too often the core products (e.g. the BI tool, the process historian, or both) are blamed for a poor outcome. This finger pointing is exacerbated when one party implements the OT software stack and another party implements the BI tools. One could easily assume, based on the wonderful presentations the available BI tools produce, that anyone could achieve the same result. From a presentation perspective, that may be true, and organisations do in fact get great graphical results. Often, however, the data is of poor quality because either the source has not been qualified or the data has not been dimensioned and considered for consumption. The very act of dimensioning process data as outlined above implies carefully considering how data is to be consumed and ensuring the raw data supports the desired result. As new ideas and correlations in raw data are detected (perhaps using artificial intelligence), new dimensions can be created without disturbing history or anything that has been done before. Thankfully, dimensioning can be applied retrospectively to systems that already hold historical data. This means organisations that are currently struggling to produce the desired outcomes need not panic that they are starting from scratch. It is not necessarily time to throw it all away; just rethink how the data is organised and consumed, then dimension and share it. It is also not all about planning and organising. We have hinted that there is a quality factor to consider.
Trusted data trumps everything?
Just because we have data stored and that data is connected to an instrument, it doesn’t mean the data is accurate or useful. The data samples may be repeatable, yet repeatably inaccurate. Perhaps information is sampled too often, or not often enough. These data attributes are mostly a matter of IoT or OT system configuration, and they are important when making plans to publish information to audiences who may not be process savvy. Developers need to keep in mind that quality is assumed by consumers, and once confidence is lost, it is difficult to restore.
The big challenge of the last decade has been getting data into process historians efficiently. That problem was solved with improvements in technology and supporting infrastructure. IoT now provides even more methods to acquire data for us to consume. Now that we are storing trillions of bytes of information in some systems every week, how can we possibly query that much information? The good news is, you don’t have to, if you understand the nature of the data.
Data needs context so that it can be accurately understood. Two decades ago the motto was: if you don’t know what you want the data for, store it all. In the old days, with slow acquisition rates and no one making significant decisions on the face value of the data, that may have been OK. Today, the world is a very different place. We acquire information faster and rely upon it to make decisions, without speaking to a room of data scientists or engineers who can provide context about the accuracy and reliability of the specific information we are concerned about on any given day. Data has come of age and must stand on its own. The reporting systems that expose and publish our data warehouses must be architected intentionally for success. This has not been done well in the past.
How do we ensure process data is correct? Does it really matter?
There are a number of techniques being applied to data streams to ensure the data can be relied upon. “Edge”, new storage technology and data cleansing techniques aside, there is no substitute for making sure that what gets stored is calibrated in the first instance. This is more an engineering practice than anything, and it is as essential today as it was 20 years ago. Yes, calibrate the data at the first instance and never compromise on this. However, we do not live in a perfect world, especially the digital world. The next step is to detect anomalies in existing data streams. This can be done at the operational level (e.g. SCADA), where operators or users tag the problem or notify maintenance personnel. This seems like a sensible approach for any organisation that has invested in maintenance and operational staff who manage assets and equipment; however, the larger the system gets, the more challenging this becomes. When the number of maintenance events, exceptions and issues to address each day grows beyond one person’s ability to manage, the efficiency of human management starts to nosedive and better systems are required. Despite this obvious drawback, we can use new data-driven technology to solve this issue no matter what the size of the organisation. For example, machine or process learning, as a specific extension to anomaly detection, can be used to determine changes or deviations to operating envelopes for individual instruments or very complex multivariable processes. We have now reached the stage where technology can detect process degradation with greater confidence, and with better results, than humans. The same technology is used to pinpoint the most significant contributors to changes in how our systems are supposed to work.
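To make the idea of an operating envelope concrete, here is a deliberately simple Python sketch for a single instrument. It is a plain statistical stand-in (mean plus or minus k standard deviations learned from a known-good period), not the machine learning described above, and the readings, the function names and the choice of k = 3 are assumptions made for illustration only.

```python
from statistics import mean, stdev


def learn_envelope(baseline, k=3.0):
    """Learn a (low, high) operating envelope from a known-good baseline period."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return centre - k * spread, centre + k * spread


def find_deviations(values, envelope):
    """Return (index, value) pairs for samples that fall outside the envelope."""
    low, high = envelope
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]


# Illustrative readings only: a stable baseline, then new data with two excursions.
baseline = [50.1, 49.8, 50.3, 49.9, 50.0, 50.2, 49.7, 50.0]
envelope = learn_envelope(baseline)

new_values = [50.1, 49.9, 53.5, 50.0, 46.0]
deviations = find_deviations(new_values, envelope)
print(envelope)
print(deviations)
```

A multivariable process would need a model that relates several signals to each other rather than a per-instrument band, but the workflow is the same: establish the envelope from trusted history, then let the system flag departures instead of relying on one person to spot them.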
Do I need new technology for Digital Transformation?
While we are still sorting out our data systems and ensuring we have reliable data sets for others to consume, we sometimes miss the point that the new technologies we wish to apply to these clean data sets can also be used to purify what we already have! Machine learning isn’t only about the future state and having a shiny, utopian, fully automated asset that warns us three months before it fails. It is about surviving today, and knowing that the investment made to date can still be leveraged even if the data is not calibrated and its accuracy is yet to be established. We have tolerated the lack of calibration and accuracy for some time already and survived. The most important message of data management is establishing the new baseline.
To produce a useful baseline, the infrastructure must be reliable, and the architecture must support dimensioning the data so that advanced analytics and artificial intelligence can discover the value hidden deep within the organisation’s existing data repositories. Even though Big Data Analytics and improved real time information about operations are likely to foster significant process improvements and have transferable context within an industry, the hidden value is specific and localised to the enterprise. These improvements are derived from correlating seemingly unrelated local circumstances and process inputs about specific equipment and assets.
The world now appears to have matured sufficiently that the low-hanging fruit of operational efficiency has mostly been “picked” by smart humans and encapsulated in standard operating procedures and purpose-built bespoke software. However, it is the repeatability of the data stream that unlocks new value… value we are yet to fully uncover. This new value can be achieved today with existing technology. The challenge is to dimension, baseline and then efficiently measure our process data systems, using humans for the setup and management and technology for the rest. Digital Transformation relies on reliable data streams as outlined above. We will address Digital Transformation in another article more specific to process, practice and how we interface with our technology.