David Linthicum
Contributor

Where edge computing breaks down: Operations

analysis
Apr 05, 2022 | 5 mins
Cloud Computing

As edge computing becomes more widespread, many are experiencing unexpected operational challenges.


Let’s say that your job is to monitor oil well operations across a small country. You have a device installed at every oil pumpjack, the mechanism that pumps oil out of the ground from an existing well. This device monitors local weather and pump operations. It even automates local processes on the pumpjack.

Collectively, these devices are known as edge computers. They have their own processors, local storage systems, operating systems, and networking interfaces that allow them to communicate with a centralized collection and analysis system. This centralized system uses artificial intelligence and data analytics to determine when human operators need to be dispatched. For instance, the device can determine when a pump motor is about to fail or when oil flow is too high or too low.

The devices also leverage centralized data collection to monitor overall production and provide oversight for all the pumpjacks producing oil. There are 500 devices in this specific edge computing network, one for each remote pumpjack, and all devices communicate back to a centralized system on a public cloud provider.

The first few months of using these edge computing devices to monitor remote and unmanned pumping operations went fine. However, storage systems on the devices soon began to fail due to a known flaw, and network interfaces stopped and had to be reset. Most often, some key sensors used on the pumpjack stopped working. These problems could only be fixed by sending out humans to fix them, thus incurring a cost that defeated the purpose of leveraging these devices to automate pump operations.

To address this problem, we should treat edge computing like any other compute and storage platform under operational control and do the basics. Back up the data on each edge device to the central system. Remotely update the operating systems and firmware, much as you would on a smartphone. Support application updates that include changes to the data structure. Also, track the configuration, including operating system releases, application updates and patches, and even the software versions running on some of the smart sensors.
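To make the configuration-tracking point concrete, here is a minimal sketch of how a central system might compare what one device reports against a desired baseline. The component names, version strings, and the `DeviceConfig` and `find_drift` helpers are hypothetical illustrations, not part of any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DeviceConfig:
    """Versions reported by one edge device (hypothetical record layout)."""
    device_id: str
    versions: Dict[str, str] = field(default_factory=dict)  # component -> version


def find_drift(
    baseline: Dict[str, str], device: DeviceConfig
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {component: (reported, wanted)} for every baseline component
    that is missing on the device or reports a different version."""
    drift = {}
    for component, wanted in baseline.items():
        reported = device.versions.get(component)  # None if never reported
        if reported != wanted:
            drift[component] = (reported, wanted)
    return drift


# Assumed baseline and device report, invented for illustration only.
baseline = {
    "os": "5.10.3",
    "firmware": "2.4.1",
    "pump_app": "1.8.0",
    "flow_sensor_fw": "0.9.2",
}
device = DeviceConfig(
    device_id="pumpjack-017",
    versions={"os": "5.10.3", "firmware": "2.3.9", "pump_app": "1.8.0"},
)

drift = find_drift(baseline, device)
for component, (reported, wanted) in drift.items():
    print(f"{device.device_id}: {component} is {reported}, expected {wanted}")
```

Running a check like this across the whole fleet would give operators a list of devices that need a remote update before a known flaw turns into a site visit.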

In the oil well example, about 500 different hardware and software components are tracked just for one device that controls one pumpjack. Remember, there are 500 pumpjacks. So, 250,000 hardware and software components must be tracked and operated.

The trouble with edge computing comes when you look at how things really work. Yes, we have very solid, quality components for our edge computer, such as network interfaces, storage systems, and processors that are all resistant to environmental challenges such as heat and humidity. However, if any one of these components fails, many or most of the other components can no longer do their jobs either. For example, if the oil temperature sensor stops working, we can’t deal with oil flow issues because we don’t know the temperature of the oil. If we can’t fix our oil flow issue, then we’ll need to shut the entire pumpjack down until someone can fix the sensor and restart production.
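One way to reason about this kind of cascade is to write the dependencies down and walk them. The sketch below assumes a small, made-up dependency map for a single pumpjack device; the component and capability names are hypothetical, and a real device would have far more entries.

```python
from collections import deque

# Hypothetical dependency map: each capability lists what it directly needs.
DEPENDS_ON = {
    "oil_temp_reading": ["oil_temp_sensor"],
    "flow_control": ["oil_temp_reading", "flow_sensor"],
    "pump_operation": ["flow_control", "pump_motor"],
    "telemetry_upload": ["network_interface", "local_storage"],
}


def impacted_by(failed, depends_on):
    """Return everything that directly or transitively depends on `failed`."""
    # Invert the map so we can look up "who needs this component?"
    dependents = {}
    for item, needs in depends_on.items():
        for need in needs:
            dependents.setdefault(need, []).append(item)

    # Breadth-first walk outward from the failed component.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for item in dependents.get(current, []):
            if item not in impacted:
                impacted.add(item)
                queue.append(item)
    return impacted


print(sorted(impacted_by("oil_temp_sensor", DEPENDS_ON)))
print(sorted(impacted_by("network_interface", DEPENDS_ON)))
```

In this toy map, losing the temperature sensor takes out flow control and, with it, pump operation, while losing the network interface only affects telemetry upload. Knowing that difference ahead of time helps decide which failures justify a shutdown and a truck roll.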

Similar issues can extend to medical device operations, remote factory operations, farming, or any scenario where you have to operate computing systems that are not easily accessible. The types of edge computing experiences described here are not uncommon today.

I suggest we do a bit more planning upfront about how these types of edge computing systems should operate. With the growing use of edge computing, we need to understand how we leverage configuration management and operations systems and then think through how to deal with what we’ll likely see in the field. We need to manage many of the same components, including their interdependence and the ability to deal with varying levels of failures. We also need to minimize the number of humans required to fix the issues and reduce the downtime of the edge computing systems and connected sensors.

Possible solutions come from a few different schools of thought. In many cases, the manufacturer or owner of the devices will develop a custom solution for the specific environment (like the oil field problems). Some promote using a redundant array of devices and sensors that can increase reliability to five 9s (99.999% availability). Edge computing device platforms (computing, storage, and networking) typically cost less than $200 per unit. Why not leverage several in a redundant array? Ask the same question about sensors.
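A rough way to size such a redundant array is the standard parallel-availability formula: if each unit is up with probability a and units fail independently, at least one of n units is up with probability 1 - (1 - a)^n. The sketch below applies it; the per-unit availability figures are assumptions for illustration, and the independence assumption is a strong one, since a shared cause such as a storm or power loss can take out every unit at a site at once.

```python
def combined_availability(unit_availability: float, units: int) -> float:
    """Probability that at least one of `units` independent units is up."""
    return 1.0 - (1.0 - unit_availability) ** units


def units_needed(unit_availability: float, target: float) -> int:
    """Smallest number of independent units whose combined availability
    meets `target`. Both arguments must be strictly between 0 and 1."""
    if not 0.0 < unit_availability < 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1")
    units = 1
    while combined_availability(unit_availability, units) < target:
        units += 1
    return units


FIVE_NINES = 0.99999

# Assumed per-unit availabilities, not measured values.
for assumed in (0.95, 0.99):
    n = units_needed(assumed, FIVE_NINES)
    print(f"unit availability {assumed}: {n} units needed for five 9s")
```

Under these assumptions, a handful of inexpensive units gets to five 9s on paper. The harder operational question is whether failures in the field are really independent.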

I predict that we’ll need some industry standards and best practices to make reliable Internet of Things systems a workable reality for edge systems that run outside of data centers or other easily controlled environments. If everyone builds one-off solutions to serve their specific operational needs, nothing will end up being a true solution. It will take cross-collaboration between edge computing technology providers and industries. If we want edge computing to scale, we’ll first need some innovative thinking.


David S. Linthicum is an internationally recognized industry expert and thought leader. Dave has authored 13 books on computing, the latest of which is An Insider’s Guide to Cloud Computing. Dave’s industry experience includes tenures as CTO and CEO of several successful software companies, and upper-level management positions in Fortune 100 companies. He keynotes leading technology conferences on cloud computing, SOA, enterprise application integration, and enterprise architecture. Dave writes the Cloud Computing blog for InfoWorld. His views are his own.
