Real-Time, Online and Offline Complex Event Processing
Using NIST as a computer science reference, an online algorithm is one that processes data (including events) element by element, serially, without having the entire problem space available from the beginning. In contrast, an offline algorithm is given the entire problem set from the start.
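A toy example makes the distinction concrete: computing an average offline (with the full data set in hand) versus online (one element at a time, never holding the full set). The names here are illustrative, not from any particular library.

```python
def offline_mean(values):
    """Offline: the entire problem set is available up front."""
    return sum(values) / len(values)

class OnlineMean:
    """Online: process elements serially, one at a time."""
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental update: only the running state is kept,
        # never the full stream of past elements.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean

stream = [4, 8, 15, 16, 23, 42]
om = OnlineMean()
for x in stream:
    current = om.update(x)

# Both approaches arrive at the same answer; the difference is
# when the data must be available, not what is computed.
assert abs(current - offline_mean(stream)) < 1e-9
```

The online version is what a real-time event processor looks like: each event updates state and is then discarded.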
Hence, real-time event processing applications generally involve online processing, while offline processing is useful for creating the models that those online applications will use. Most people would, in fact, equate real-time event processing with online processing.
However, let’s assume that our online algorithm is also learning in near real-time. In that case, the online algorithm is updating based on a complete set of data already input to the system; by definition, that is offline processing (an output based on a complete set of input data). Hence, real-time adaptive or learning systems have elements of both online and offline processing.
Now, one interesting aspect is what happens between the output of the offline process and the update of the online process. If we are very confident in our offline learning algorithm(s), we can update the online process in near real-time. If we are less confident, we need to test the new model before updating the online algorithm. Naturally, the consequences and costs of false positives versus false negatives play an important role in our confidence level.
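That gating step can be sketched as a simple deployment check: score the offline-trained candidate on held-out events, and push it to the online process only if the score clears a threshold. All names here (`maybe_deploy`, `CONFIDENCE_THRESHOLD`, the model-as-function interface) are hypothetical, a minimal sketch rather than any particular product's API.

```python
CONFIDENCE_THRESHOLD = 0.95  # illustrative cutoff; set it from the
                             # relative cost of false positives vs. negatives

def evaluate(model, holdout):
    """Score a candidate model on held-out (event, label) pairs."""
    correct = sum(1 for event, label in holdout if model(event) == label)
    return correct / len(holdout)

def maybe_deploy(candidate, holdout, deploy):
    """Update the online process with an offline-trained model,
    but only if we are confident enough in it."""
    score = evaluate(candidate, holdout)
    if score >= CONFIDENCE_THRESHOLD:
        deploy(candidate)   # near real-time update of the online algorithm
        return True
    return False            # keep the current model; test further first
```

Lowering the threshold shortens the offline-to-online update lifecycle at the price of deploying weaker models, which is exactly the confidence trade-off described above.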
Recently we discussed Apache Mahout, and some very talented folks commented on how Google’s MapReduce framework is well suited to offline processing of large data sets, with the output of an offline process used to update an online application. This is a very interesting discussion relative to complex event processing.
For example, suppose we have a massive set of continually changing data (like network traffic in an intrusion or fraud detection scenario) and we are using this data to build and update detection models. When does the distinction between online and offline processing become blurred? If one system has a much shorter offline-online update lifecycle than another, and its detection confidence is quite high, isn’t this a key competitive advantage?
In other words, decision support has always been about decreasing the time between observation and action (with high accuracy, of course). Consulting for the US military, we referred to this as the OODA loop (observe-orient-decide-act). The goal of most systems is to gain advantage over an opponent by having a shorter (in time) OODA loop. Naturally, if our fighter plane is in the sky processing events in real-time, it is not very “good” if the plane must land to update its real-time, online process.
In business, the same is true. The goal is to have a shorter OODA loop than our competitors. Hence, having learning systems that can process offline data in near real-time and update real-time, online systems is the ultimate goal of most CEP applications.
Complex event processing, then, is ultimately about reducing the OODA-loop lifecycle, and to accomplish this we need both offline and online processing.