Machine Learning for Condition Monitoring

Communication networks, such as satellite systems, involve many complex pieces of equipment. Technicians who manage such systems often need software that assists them in monitoring all of the components so that everything keeps running reliably. The stakes can be high, since a single failure can have a large monetary impact. In 2009, a 30-second Super Bowl ad cost $3 million, or $100,000 per second of airtime. If a satellite link needed to transmit that ad failed for even two minutes, a broadcaster could lose $12 million. Even a slight improvement in technicians' ability to foresee such an event is valuable. Better foresight into failures could also reduce many smaller, more frequent costs, for instance in managing the inventory of spare parts or in scheduling technician trips to service equipment at remote sites.

My research applies machine learning, which combines computer science and statistics, to predicting failures in network hardware, particularly satellite Earth terminals. This task is known as condition monitoring. Existing software systems record events on the devices in satellite terminals, such as changes in signal strength or device temperature. My research aims to produce quantitative predictions of the time until each device fails. These predictions can be represented in much the same way as the life tables used in the insurance industry. Unlike a static life table, however, the predictions from my methods can be updated immediately in response to new events, which should make them more accurate.
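As a rough illustration of the life-table representation, the sketch below builds a discrete life table from observed device lifetimes and reads off the conditional probability that a device of a given age survives some additional time. It is a minimal sketch under simplifying assumptions, not the method proposed here: the function names life_table and prob_survive are hypothetical, lifetimes are assumed to be whole time units (e.g. days), every device is assumed to have been observed until failure (no censoring), and the lifetimes in the example are made up. A table like this is static; the point of the proposed work is that the estimate would also be revised as new events arrive.

```python
from collections import Counter


def life_table(lifetimes, max_age):
    """Build a discrete life table from observed device lifetimes.

    Hypothetical simplifications: lifetimes are whole time units, and every
    device was observed until it failed (no censored observations).
    Each row is (age, number_at_risk, failures, q, survival), where
    q = P(fail at this age | alive at the start of it) and
    survival = P(lifetime >= age).
    """
    failures = Counter(lifetimes)
    at_risk = len(lifetimes)
    survival = 1.0
    table = []
    for age in range(max_age + 1):
        failed = failures.get(age, 0)
        q = failed / at_risk if at_risk else 0.0
        table.append((age, at_risk, failed, q, survival))
        # Devices that fail at this age are no longer at risk afterwards.
        survival *= 1.0 - q
        at_risk -= failed
    return table


def prob_survive(table, age, extra):
    """P(lifetime >= age + extra | lifetime >= age), read off the table.

    Requires age + extra <= max_age and a nonzero survival at `age`.
    """
    survival = {row[0]: row[4] for row in table}
    return survival[age + extra] / survival[age]


# Made-up lifetimes (in days) for six hypothetical devices.
table = life_table([2, 3, 3, 5, 7, 8], max_age=8)
print(prob_survive(table, age=3, extra=2))  # roughly 0.6
```

Of the five made-up devices that reached age 3, three also reached age 5, which is where the value of about 0.6 comes from.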

Consider a concrete example from existing software systems: every time a weather station at an Earth terminal detects a change in wind speed, the change is recorded in the data. This matters because an increase in wind speed might be predictive of a loss of satellite signal, since the wind could be a sign of an impending storm. Any given system may monitor hundreds of different events. A human technician cannot discover all of the possible relationships among them without software that can pinpoint candidate relationships between events. The task is not easy for computer programs either. Because they lack complete knowledge of the systems whose faults they predict, they must base their predictions on empirical evidence, and they will not be perfect predictors. In other words, they must learn from experience, which is where machine learning becomes useful.
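One simple way software could flag candidate relationships like the wind-speed example is to measure, for each event type, how often an occurrence is followed by a loss of signal within a fixed time window. The sketch below does this. The log format (a list of (time, event_type) pairs), the function name precursor_rates, the event names, and the window length are all hypothetical, and the toy data are invented. It is a crude co-occurrence statistic rather than a learned predictive model, and it says nothing about causation.

```python
from bisect import bisect_right
from collections import defaultdict


def precursor_rates(events, failures, window):
    """For each event type, estimate P(failure within `window` after the event).

    events:   list of (time, event_type) pairs (hypothetical log format)
    failures: failure times, e.g. loss-of-signal timestamps
    window:   look-ahead horizon, in the same time units as the timestamps
    """
    failures = sorted(failures)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, kind in events:
        totals[kind] += 1
        # Index of the first failure strictly after time t.
        i = bisect_right(failures, t)
        if i < len(failures) and failures[i] <= t + window:
            hits[kind] += 1
    return {kind: hits[kind] / totals[kind] for kind in totals}


# Invented toy log: times in minutes, two signal losses at t=3 and t=24.
events = [
    (0, "wind_speed_up"),
    (10, "temp_change"),
    (20, "wind_speed_up"),
    (40, "temp_change"),
    (50, "wind_speed_up"),
]
print(precursor_rates(events, failures=[3, 24], window=5))
# {'wind_speed_up': 0.6666666666666666, 'temp_change': 0.0}
```

With hundreds of event types, rates like these would likely need to be compared against the baseline failure rate, and a learning algorithm would combine many such signals rather than inspecting them one at a time.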

Machine learning is the study of algorithms that improve with increased exposure to data. Current monitoring systems, by contrast, behave the same no matter how much data they process, because their entire functionality is specified in advance by the programmer. I will use machine learning to add intelligent prediction to monitoring software for satellite systems. These methods, which combine statistical and computer science techniques, adapt to perform better with experience and do not require a set of equations that completely specifies the behavior of the system. This should enable technicians to manage their systems much more efficiently.

Typical computer software follows a set of rigid rules specified by the programmer. Computers excel at applying such rules to well-defined problems, such as adding up all the entries in an accounting record. Input from the real world is much harder to process when it is noisy or generated by a process that is not completely understood in advance. My research addresses this harder problem in the setting of condition monitoring.

Machine learning algorithms can be applied in many different disciplines, but in practice they are usually tailored to a particular application area and draw on specialized domain knowledge. Such areas include speech processing, computer vision, bioinformatics, finance, and the focus of my research, condition monitoring. Nevertheless, the methods I develop for condition monitoring could potentially be adapted to these other application areas.