Classification in Complex Event Processing

Posted on 02/01/09 2 Comments

Following up on the excellent discussion in Predicting Events with Logistic Regression, I think it is time to talk a bit about the importance of classification in complex event processing.  CEP is, by definition, about detecting business opportunities and threats in real time.  It follows that CEP is centered around classifying and discriminating complex events as either an opportunity or a threat.

In earlier posts,  I have often mentioned the importance of Bayesian analytics in CEP/EP.  The Apache Mahout development team specifically lists Support Vector Machines (SVM), Logistic regression, Bayesian networks, Perceptron and Winnow and Neural Networks as classification algorithms.

Note: The key concept of Bayes’s theorem is that the true rates of false positives and false negatives are not a function of the accuracy of the test alone, but also of the actual rate or frequency of occurrence of the condition within the population being tested; often, that base rate is the more powerful factor.
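To make the note above concrete, here is a minimal sketch in Python. The detector accuracy (99% sensitivity and 99% specificity) and the three base rates are assumptions picked purely for illustration; they do not come from any real system.

```python
def posterior_given_alert(base_rate, sensitivity, specificity):
    """Return P(threat | alert) using Bayes's theorem.

    base_rate   -- P(threat): how often real threats occur in the stream
    sensitivity -- P(alert | threat): the true positive rate of the detector
    specificity -- P(no alert | no threat): the true negative rate
    """
    true_alert = sensitivity * base_rate
    false_alert = (1.0 - specificity) * (1.0 - base_rate)
    return true_alert / (true_alert + false_alert)


# Same hypothetical detector, three different (assumed) base rates.
for base_rate in (0.5, 0.01, 0.0001):
    p = posterior_given_alert(base_rate, sensitivity=0.99, specificity=0.99)
    print(f"base rate {base_rate}: P(threat | alert) = {p:.4f}")
```

With these assumed numbers, an alert is right about 99% of the time when threats make up half the stream, only half the time when threats are 1 in 100, and about 1% of the time when threats are 1 in 10,000, even though the detector itself never changed.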

However, before diving into these methods (we will continue in future posts), let’s discuss classification a bit more.

Looking to our friend Wikipedia, statistical classification is a procedure in which individual objects are grouped based on quantitative analysis on one or more characteristics inherent in the objects and is based on a training set of previously grouped objects.   We often see this form of classification in network-based intrusion detection, where neural networks are trained to baseline normal network traffic and this training set is used to classify network traffic as normal or abnormal.   We see a similar application in spam detection where Bayesian networks are used to classify text as spam or ham.   We like ham; spam is bad.

There is no single classifier that works best on all given problems, and various tests must be performed to compare classifier performance.  In a statistical classification problem, precision is the number of true positives divided by the total number of elements labeled as belonging to the class (i.e. the sum of true positives and false positives).  False positives are objects incorrectly labeled as belonging to the class (ham classified as spam, for example).  False negatives are objects which were not labeled as belonging to that class but should have been (spam classified as ham, for example).  Recall, in this case, is defined as the number of true positives divided by the total number of elements that actually belong to the class (i.e. the sum of true positives and false negatives).  These ratios can be read directly as estimated probabilities: precision estimates the chance that an object labeled as belonging to the class really does, and recall estimates the chance that an object in the class is labeled as such.
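As a small illustration of these definitions, the following Python sketch counts true and false positives and negatives for the spam class and derives precision and recall from them. The eight labeled messages are made-up data, and the function names are my own, not from any particular library.

```python
def confusion_counts(actual, predicted, positive="spam"):
    """Count (tp, fp, fn, tn) for the given positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # spam correctly labeled spam
        elif p == positive:
            fp += 1          # ham incorrectly labeled spam
        elif a == positive:
            fn += 1          # spam incorrectly labeled ham
        else:
            tn += 1          # ham correctly labeled ham
    return tp, fp, fn, tn


def precision(tp, fp):
    """True positives over everything labeled as the class."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """True positives over everything actually in the class."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical ground truth and classifier output for eight messages.
actual    = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham",  "spam", "spam", "ham", "ham", "ham", "spam"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"precision={precision(tp, fp):.2f} recall={recall(tp, fn):.2f}")
```

For this toy data the counts are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so precision and recall both come out at 0.75.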

Basically, it should be trivial to see that CEP problems of detecting opportunities and threats in real-time can be viewed as a classification problem where precision, recall, true positives, true negatives, false positives, and false negatives are key concepts.   Most CEP classes of problems are based around optimizing the tradeoffs of falsely classifying an object as belonging to a group (for example, a false positive threat) and missing the threat altogether (a false negative).  Terms like Type I error (a false positive) and Type II error (a false negative) are used to describe possible detection errors created in statistical decision processes.
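The tradeoff between the two error types can be sketched by sweeping a detection threshold over scored events. In the Python sketch below, the scores, the labels and the three thresholds are invented for illustration; a real detector would produce its own scores.

```python
# Each event is (detector score, whether it really was a threat).
# These eight events are hypothetical.
scored_events = [
    (0.95, True), (0.80, True), (0.70, False), (0.60, True),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False),
]


def error_counts(events, threshold):
    """Return (type_1, type_2) errors when alerting on score >= threshold.

    Type I  -- false positive: alert raised, but no real threat
    Type II -- false negative: real threat, but no alert raised
    """
    type_1 = sum(1 for score, is_threat in events
                 if score >= threshold and not is_threat)
    type_2 = sum(1 for score, is_threat in events
                 if score < threshold and is_threat)
    return type_1, type_2


for threshold in (0.9, 0.5, 0.25):
    type_1, type_2 = error_counts(scored_events, threshold)
    print(f"threshold {threshold}: Type I = {type_1}, Type II = {type_2}")
```

On this toy data, lowering the threshold from 0.9 to 0.25 drives the Type II errors from 3 down to 0 while the Type I errors rise from 0 to 2, which is the tradeoff described above in miniature.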

In some classification problems, there can be zero tolerance for any false negatives.  One example would be the threat of a nuclear strike.  In this case, a false negative has a far greater impact on missile defense than a false positive.  Well, that might not be true if the false positive results in a counter missile strike!!  This outlines some of the core challenges of detecting opportunities and threats in real time.  CEP is non-trivial, but let’s not get into game theory this year.
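One common way to express unequal tolerance for the two kinds of error is to attach a cost to each and alert whenever the expected cost of staying silent exceeds the expected cost of alerting. The Python sketch below assumes that simple expected-cost rule; the cost figures are arbitrary placeholders, not estimates for any real scenario.

```python
def alert_threshold(cost_false_positive, cost_false_negative):
    """Probability above which alerting has the lower expected cost."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_alert(p_threat, cost_false_positive, cost_false_negative):
    """Alert when the expected cost of a miss exceeds that of a false alarm."""
    expected_cost_of_silence = p_threat * cost_false_negative
    expected_cost_of_alert = (1.0 - p_threat) * cost_false_positive
    return expected_cost_of_silence > expected_cost_of_alert


# Hypothetical costs: a miss is assumed to be 99 times worse than a false alarm.
print(alert_threshold(1, 99))        # alert once P(threat) passes this value
print(should_alert(0.02, 1, 99))     # a 2% threat probability already triggers
print(should_alert(0.02, 1, 1))      # with equal costs it does not
```

As the assumed cost of a false negative grows relative to a false positive, the threshold shrinks toward zero, which is one way to read "zero tolerance". If the false positive itself becomes catastrophic, as with a counter strike, the threshold moves back up.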

Classification of events is critical for complex event processing.  Complex events are harder to classify than simple events.   This might be a good place to start when formulating a quantitative definition of a complex event.

2 Comments

  1. Rainer von Ammon says:
    Sunday, February 1, 2009 at 2:35pm

    Hi Tim,
    around 2 years ago, I wrote in the complex-events blog of David Luckham that the problem of recall, precision, noise and so on was very well investigated in the discipline of Information Retrieval, starting in the late sixties or seventies. A great and famous guy was Professor Gerard Salton http://de.wikipedia.org/wiki/Gerard_Salton. There was a long discussion of positive or negative, meaning relevant or irrelevant, a bit relevant, sometimes relevant … Probabilistic approaches, fuzzy set approaches. We should use this knowledge for CEP. I tried to write a summary around 1984, in German unfortunately, but there is a lot of better original English literature.

    Best regards,
    Rainer v. Ammon

  2. Tim Bass says:
    Sunday, February 1, 2009 at 7:52pm

    Hi Rainer,

    Thanks for stopping by and commenting. Great to see you.

    Yes, I think many of us could not agree more that the dialog about CEP/EP should be about established detection theory and related classification and discrimination topics (and also clustering, etc.)

    Unfortunately, a lot of time has been wasted in the last few years because the CEP/EP dialog has been dominated by semi-meaningless marketing buzzwords like BAM and EDA to name a few. The field is dominated by vendors and their ecosystem analyst-partners who do not have a clue about detection theory, reducing CEP/EP to jargon and buzzwords.

    Let’s keep the dialog going here. Almost all of the “other sites, forums and blogs” are too far off track to be of use in furthering the state-of-the-art of event processing. We have way too many people reinventing the wheel, and the NIH (not invented here) attitude does little to help businesses solve their challenging CEP/EP problems.

    The foundations for event processing were here long before the buzzword “CEP” was created. This buzzword, as it is currently used, does little to further the state-of-the-art; in fact, it has proven itself to be a huge step backwards, I am disappointed to say.

    Yours sincerely, Tim
