Hi, my name is John Patton. I’m a Cloud Solutions Architect at NorthBay Solutions (NBS) with 20 years in software development and solutions design. At NorthBay, we’re always working to understand the state of the art in technology and application/solutions delivery.
The beauty of NBS is that if you’re interested in a particular technology, you can make a pitch to work in that domain. For instance, I’m interested in the Internet of Things, or IoT, so I expressed interest in working in this area, and since I’m writing a blog about it, you can guess the answer.
Internet of Things (IoT)
The Internet of Things can include everything from your Amazon Alexa, Google Home, or Nest Hub through to smart plugs, lightbulbs, cameras, and doorbell cameras.
In a business setting, there’s the Industrial Internet of Things, or IIoT, in which the connected device can be a smart machine, such as a refrigerator, motor, or robot. Such a device provides business value by, for example, reporting its running parameters so that alarms or alerts can be issued when it runs outside normal operating parameters. Depending on the device, one could also change its settings to account for a changing external environment or a change in business requirements.
This could mean changing the settings on a manufacturing machine due to environmental temperature, or settings on a motor to maintain optimal operations without inducing failure in the bearings of the motor, for example.
The work I undertook in the IoT area was specifically within Industrial IoT, or IIoT. In December 2020, AWS released a service called Amazon Monitron.
https://aws.amazon.com/monitron/
Monitron is an IoT solution with built-in machine learning capabilities. The service is quite simple to set up within the AWS Console; you only need to set up a single sign-on (SSO) user and a Monitron project. Everything else is managed through a smartphone running the Amazon Monitron app, which is downloadable from the Google Play store, since it is only supported on Android as of this writing.
Monitron relies on dedicated hardware, which consists of an IoT gateway and sensors. The gateway is a very simple device whose sole physical connection is a power input; it is configured to connect to the local wireless network using a phone with the Amazon Monitron app.
SCREENSHOT of Phone
Once the gateway is connected to the internet and your Monitron instance, you add sensors to it, again using the Monitron app on your phone, this time together with NFC.
When adding a sensor, you can enter details such as the class of device the sensor will be monitoring, the site, and the sensor’s location on the monitored device. Once the sensor is added to the gateway and physically attached to the device to be monitored, temperature and vibration data will be sent to the Monitron service. In the initial phase, data is only collected and fed to the machine learning in order to establish the baseline behavior. Any failure which occurs during the baselining phase will not be reported and must be added manually; otherwise the algorithm will assume it is normal behavior. Baselining takes between 2 and 7 days. After baselining, and depending on the threshold set by the class of device selected for the particular sensor, alarms will be triggered within the Monitron app.
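To illustrate why a failure during baselining has to be reported manually, here is a minimal sketch. It is not Monitron's algorithm, which is not visible to the user; it swaps in a simple mean and standard deviation baseline, and the function names, the sample readings, and the factor k=3 are all assumptions made for illustration.

```python
from statistics import mean, stdev

def build_baseline(readings, known_failures=frozenset()):
    """Return (mean, stdev) of the readings, skipping indices flagged as failures."""
    normal = [r for i, r in enumerate(readings) if i not in known_failures]
    return mean(normal), stdev(normal)

def is_anomalous(value, baseline, k=3.0):
    """Flag a value that lies more than k standard deviations from the baseline mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma

# Hypothetical vibration readings (mm/s); index 4 is a failure during baselining.
readings = [1.0, 1.1, 0.9, 1.0, 9.0, 1.0]

flagged = build_baseline(readings, known_failures={4})  # failure reported manually
unflagged = build_baseline(readings)                    # failure treated as normal

print(is_anomalous(9.0, flagged))    # baseline excludes the failure
print(is_anomalous(9.0, unflagged))  # baseline has absorbed the failure
```

With the failure flagged, a later reading of 9.0 stands out against the baseline. When it is left in, it widens the baseline enough that the same reading no longer looks unusual.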
One thing I found interesting is that no other services are set up within the AWS account to provide backend services to Monitron. There are no related CloudWatch logs and no visible S3 buckets, only the Monitron service and the SSO user accounts set up initially. All interaction with the Monitron service takes place through the app itself: adding sensors, reporting failures, receiving notifications and alerts, taking ad hoc readings, everything.
This is a very plug-and-play service. It is easy to set up, monitor, and run.
My Test Environment
One of the challenges of doing this type of PoC is that I don’t work in a factory or large industrial shop, and due to the all-too-familiar Covid-19 restrictions, I had to do the testing at home. Luckily, I live in a detached house, so there are some long-running machines to which I have unfettered access. Another of my many interests, and another PoC I’m working on, is cryptocurrency mining. Since I had an Amazon Monitron Starter Kit, which includes the IoT gateway and five sensors, I came up with a test plan. My mining rig has three RTX 3080 GPUs, so I put one sensor on each. Given that this is a PoC and those GPUs are expensive, rather than super-gluing the sensors to them, I simply set them on top.
Three sensors utilized! Additionally, I have a heat pump, which provides both heating and cooling for my house, so it is a peaky load, coming on and off based on the temperature in the house. I taped a sensor on top, since I didn’t want super glue there either.
That left me with a single sensor. Since houses built in the 21st century are very airtight, there is another long-running motor in my house: the air exchanger. This device runs all the time. It has a fan inside, which pulls in outside air for circulation throughout the house and pumps stale air outside. I also taped a sensor on top of it.
Now, with all sensors deployed and collecting data, I waited for the system to establish the baseline. I was curious how the bursty operation of the heat pump would be interpreted by the service. To be honest, I thought I might get alarms just by virtue of the fact that the heat pump runs for short periods of time and then is off. Days went by and no alarms.
The GPUs and air exchanger had more constant run temperatures and vibration ranges, so I watched the lines of the temperature and vibration measurements get longer. One thing I did notice within the Monitron app was two horizontal lines on the Y-axis of the vibration chart, labelled ISO Warning at 2.8 mm/s and ISO Alarm at 7.1 mm/s. I learned that in order to cause an alarm, the values would have to exceed these standard thresholds.
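The threshold behavior I saw in the chart can be sketched in a few lines. This is only an illustration of the two ISO lines shown in the app, not how Monitron decides on alarms internally; the function name and the strict "greater than" comparison are my assumptions.

```python
# Thresholds as labelled on the vibration chart in the Monitron app (mm/s).
ISO_WARNING_MM_S = 2.8
ISO_ALARM_MM_S = 7.1

def classify_vibration(velocity_mm_s: float) -> str:
    """Classify a vibration velocity against the two ISO lines.

    Assumption: a level is reached only when the reading strictly
    exceeds the threshold.
    """
    if velocity_mm_s > ISO_ALARM_MM_S:
        return "alarm"
    if velocity_mm_s > ISO_WARNING_MM_S:
        return "warning"
    return "ok"

# Hypothetical readings, not values from my sensors.
for reading in (1.2, 3.5, 8.0):
    print(reading, classify_vibration(reading))
```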
Test Rig
Now that the five sensors were detecting and feeding metrics to the service and I was able to see the baseline forming, I began to think about how I might induce a failure. I certainly didn’t want to cause a failure in an expensive GPU or in an important and expensive system in my house. When I started the PoC, I imagined building a complex Rube Goldberg machine and impressing the world with my craftiness. I started by removing the sensor from the heat pump. I thought an alarm might be induced by the complete absence of the vibrations and temperature variations which should’ve been in the baseline. Nothing: no alarm was triggered. I deduced that, contrary to what I’d thought, the sensors don’t cover below-threshold situations.
Next, I thought I’d apply some heat to the metal contact plate at the bottom of the sensor with a soldering iron. That worked to a degree, pun intended, but the temperature didn’t rise fast enough and the sensor appeared to start to melt. Not wanting to destroy the sensor, I abandoned that approach.
In the end, the answer was a much simpler solution. I wanted something that could run for a fairly long time unsupervised and with which I could introduce the change in metrics needed to trigger an alarm.
My old but trusty corded drill came to the rescue. Because it is a standard drill, I had previously bought an attachment that makes it work like a hammer drill, for drilling into concrete and rock. It serves the purpose well. When I ran it for a short time, it still didn’t induce the type of vibrations I wanted, so I added a chuck key to the mix, counting on its shape and obvious imbalance. I clamped the drill to my cabinet, zip-tied the sensor to the drill, and zip-tied the trigger in the “On” position.
It didn’t take too long to induce an alarm.
View Alarms
Alarm Handling
Alarm handling involves two states: an “Alarm” state, where the device has passed a threshold but the alarm has not yet been acknowledged by repair staff, and a “Maintenance” state, where the alarm has been acknowledged but the device has not yet been repaired.
The intent here is that someone acknowledges the alarm so others needn’t worry about it, and that person then works to remediate the issue by performing the maintenance, be it repairing or replacing the failed component. Following that work, the person should enter the failure conditions within the Monitron app, both to clear the issue and to feed the ML so that it can learn about failure conditions. Once this has been done, the device’s sensor should be operating within tolerances and the alarm should be cleared.
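The alarm lifecycle described above can be modelled as a small state machine. This is a sketch of the workflow as I understand it from the app, not Monitron's implementation; the “Healthy” state name and the event names are assumptions I've introduced for illustration.

```python
from enum import Enum

class AssetState(Enum):
    HEALTHY = "healthy"          # assumed name for the no-alarm state
    ALARM = "alarm"              # threshold passed, not yet acknowledged
    MAINTENANCE = "maintenance"  # acknowledged, not yet repaired

# Allowed transitions: (current state, event) -> next state.
# Event names are hypothetical labels for the actions taken in the app.
TRANSITIONS = {
    (AssetState.HEALTHY, "threshold_exceeded"): AssetState.ALARM,
    (AssetState.ALARM, "acknowledge"): AssetState.MAINTENANCE,
    (AssetState.MAINTENANCE, "resolve"): AssetState.HEALTHY,
}

def next_state(state: AssetState, event: str) -> AssetState:
    """Return the state that follows `event`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"'{event}' is not valid in state {state.name}") from None

# Walk one alarm through its full lifecycle.
state = AssetState.HEALTHY
for event in ("threshold_exceeded", "acknowledge", "resolve"):
    state = next_state(state, event)
    print(event, "->", state.name)
```

Modelling it this way makes the ordering explicit: in this sketch an alarm cannot be resolved until someone has acknowledged it.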