In the recent past, most of our work revolved around I/O systems, networks, controllers and processing of I/O data to make machines operate efficiently and correctly. I think most of us have come to understand that our jobs are now more than that. Yes, we will still have to know about EtherNet/IP, PROFINET IO, Modbus TCP and OPC UA and how controllers work, but what we really must understand is data – how to manage it, process it, store it and normalize it.
The last one, normalizing it, creates a lot of confusion. What is data normalization? Is EtherNet/IP or PROFINET IO data normalized? If not, what does it mean to normalize it? It’s a pretty big subject, but I’m going to tackle it in this article and the following one.
The term normalization, or data normalization, has its roots in database development. Normalization from a database perspective is strictly defined as the organization of data in one or more databases in such a way that data is not duplicated, redundant or inconsistent. As a simple example, you wouldn’t want a person’s social security number duplicated in both the Payroll database and the Human Resources database. Database people also use the term to describe how to design tables and establish relationships between tables in ways that protect the data and make it more flexible. Of course, the confusion arises because normalization doesn’t mean any of those things in the manufacturing world.
In the manufacturing world, there are unlimited sources of data. Data is created by a myriad of sensors, automation devices, controllers, human input devices and much more. Data arrives in every format imaginable. Luckily, we can count on 8-bit bytes (the de facto standard since the 1960s), but our data may be organized as one, two, four or more bytes. Decoding may mean interpreting some number of bytes as simple binary, floating-point, ASCII or something else. And worse, different devices sometimes use different scaling and engineering units. You could have two flow meters with different engineering units: one reporting flow in gallons per minute and the other in liters per minute. Two identical devices could use different scaling, reporting a single byte of data on different scales. Fifty means one half of full scale on a 0 to 100 scale, while fifty is only about one-fifth of full scale on a 0 to 255 scale.
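To make the scaling and unit problems above concrete, here is a minimal sketch of the kind of normalization logic involved. The function names, device scales and full-scale values are illustrative assumptions, not part of any particular protocol or product:

```python
# Sketch: normalizing raw device readings to a common engineering unit (GPM).
# All device parameters below are hypothetical examples.

GAL_PER_LITER = 0.264172  # US gallons per liter

def normalize_flow(raw: int, full_scale_raw: int, full_scale_gpm: float) -> float:
    """Convert a raw integer reading to gallons per minute.

    raw            -- the value as reported on the wire
    full_scale_raw -- the device's maximum raw value (e.g. 100 or 255)
    full_scale_gpm -- the flow, in GPM, that full scale represents
    """
    return (raw / full_scale_raw) * full_scale_gpm

def lpm_to_gpm(lpm: float) -> float:
    """Convert liters per minute to gallons per minute."""
    return lpm * GAL_PER_LITER

# Two meters both report a raw value of 50, but mean very different flows:
meter_a = normalize_flow(50, 100, 200.0)  # 0-100 scale: half of 200 GPM
meter_b = normalize_flow(50, 255, 200.0)  # 0-255 scale: about one-fifth

# A third meter reports directly in liters per minute:
meter_c = lpm_to_gpm(100.0)
```

Only after every reading is carried to the same scale and unit, as in this sketch, can the values from different devices be meaningfully compared or charted together.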
The world is messy, and for a long time, it didn’t matter all that much. Data from these devices with different data formats, engineering units and scales would be processed individually by the local controller in logic that a programmer would write. It wasn’t optimal but it was tolerable. But in the Industry 4.0 world, messy data is a big problem.
Now, we want to bring data together from all sorts of devices to view it, chart it, report it, archive it, do process analytics on it and much more. It’s downright impossible to do any of that if you can’t correlate two data flows because their data formats, scaling and engineering units differ. And you really don’t want to waste the very expensive time of your process analytics people writing software to clean up the values they’ve collected from your factory floor.
We’re never going to clean up the inputs. The day will never come when every sensor manufacturer and device vendor agrees on the best way to present data. They may agree on the media to use and offer Ethernet or IO-Link or something else, but they’re never going to agree on exactly what the bytes traversing those physical media mean.
In the next article, we’ll describe a solution for this mess and how RTA is handling that problem in our product line (Hint: OPC UA).