Recently we completed the installation and training of an open-source Bayesian classifier to replace a rule-based approach for managing forum spam. In a nutshell, we found the rule-based approach highly prone to both false positives and false negatives, while the statistical (Bayesian) approach has turned out to be far superior. We are now applying the same approach to real-time threat analysis and other classification problems.
The engineering question is not “should we completely get rid of rules and replace rule-based approaches with more sophisticated analytics?” Rules are useful and work well for many simple processing problems. However, rules alone are highly inefficient for most classes of non-trivial problems. I have pointed this out a number of times over the years and I think most people “get it”; so I was a bit surprised when I read this post last year by Paul Vincent, CEP versus ESP – an essay (or maybe a rant). In that post, Paul blogged:
“The wider “complex event processing” term additionally covers other mechanisms like ECA rules, production rules, and so forth …”
I think the industry would be a lot better off (grow faster, solve more problems, be more profitable) if folks selling hammers would cease to define the world based on what can be tooled with hammers; and folks who sell screwdrivers would stop defining the world based on what screwdrivers can do well. To Paul’s credit, he does conclude, correctly in my view:
…in most enterprises there are usually multiple use cases for multiple types of CEP that are best handled by multiple paradigms (such as specialist ESP, event-driven business processes, rule-driven event processing, event-based business rules, event-driven analytics, etc). One should no more expect a large company to rely on a single CEP paradigm as it would on a single computer hardware technology.
This brings me back to our various classification projects, one of which is documented in A New Bayesian Spam Classifier Using B8. In that project, we used some simple rules to pre-process the text, but the more sophisticated and complex processing is performed by a statistical classifier.
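To make the division of labor concrete, here is a minimal sketch (my illustration, not the actual B8 pipeline) of the kind of simple rule-based pre-processing that might run before text reaches a statistical classifier — stripping markup and normalizing links so the classifier sees clean tokens:

```python
import re

def preprocess(raw: str) -> str:
    """Hypothetical rule-based clean-up applied before classification."""
    text = re.sub(r"<[^>]+>", " ", raw)            # strip HTML tags
    text = re.sub(r"https?://\S+", " URL ", text)  # normalize links to a token
    text = re.sub(r"\s+", " ", text)               # collapse whitespace
    return text.strip().lower()
```

Rules are a perfectly good fit here: the transformations are fixed, mechanical, and easy to state. The hard judgment call — spam or not — is left to the classifier.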
Let me simply conclude by voicing my continued frustration at anyone who believes that rule-based (or query-based) approaches are a metaphor for complex event processing. They are not. Rule- and query-based approaches are more closely aligned with simple event processing. Writing IF-THEN-ELSE logic is quite simple. Adding a new condition to an IF-THEN-ELSE statement is also simple. The only thing complex about this approach is managing a large set of rules, because the more complex the problem, the more unscalable and difficult to manage any rule-based approach becomes.
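A toy example makes the point. Here is a hypothetical rule-based spam filter (invented rules, not anyone's production system) — each individual IF is trivial, and the problems only start when the rule set grows and the rules begin to interact:

```python
# Hypothetical hand-written rules; each one is trivially simple on its own.
BLOCKED_WORDS = {"viagra", "lottery", "winner"}

def is_spam(message: str) -> bool:
    text = message.lower()
    if any(word in text for word in BLOCKED_WORDS):       # rule 1: keyword list
        return True
    if text.count("http") > 3:                            # rule 2: too many links
        return True
    if message == message.upper() and len(message) > 20:  # rule 3: all-caps shouting
        return True
    return False
```

Every new spam tactic means another IF branch, and at a few hundred rules nobody can predict what a new branch will do to the messages the old branches already handled.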
The opposite is true of systems specialized in complex event processing. Complex data sets become training sets for more advanced statistical methods. In fact, “the more the merrier” is a good way to describe it.
For example, if you have a rule- or query-processing system and a new condition appears, your system will produce either a false positive or a false negative. Then you must go write a set of rules to manage that new condition. That new set of rules might adversely affect your existing rule base (we have seen this in practice) and cause an unexpected false positive (or negative) later on down the road. Rules are simply not efficient for complex data processing problems.
However, when you use advanced analytics, like a well-designed Bayesian classifier, and a new condition appears, it is not necessary to write any more logic. No coding. No new configuration. No new rules. You simply send the new condition(s) to the classifier and the system “learns” from the experience.
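To illustrate, here is a minimal naive Bayes classifier sketch (my own illustration, not the B8 library). Note that handling a brand-new spam pattern is just another call to `train` — no new branches, no redeploy:

```python
import math
from collections import defaultdict

class NaiveBayesClassifier:
    """Minimal naive Bayes text classifier with Laplace smoothing."""

    def __init__(self):
        self.word_counts = {"spam": defaultdict(int), "ham": defaultdict(int)}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # New conditions are absorbed here: training, not rule-writing.
        self.doc_counts[label] += 1
        for word in text.lower().split():
            self.word_counts[label][word] += 1

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        scores = {}
        for label in ("spam", "ham"):
            # log prior, smoothed so an untrained class never zeroes out
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label].get(word, 0)
                # Laplace-smoothed log likelihood per word
                score += math.log((count + 1) / (total_words + vocab + 1))
            scores[label] = score
        return max(scores, key=scores.get)
```

When spammers invent a new trick, you feed a few labeled examples of it to `train` and the probabilities shift accordingly — which is exactly the “no new rules” property described above.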
We would all be better off if the folks in the CEP space (including my friends, and ex-close friends, at TIBCO, StreamBase, Progress, etc) would stop using CEP as a metaphor for rule and query-based event data processing. In fact, the opposite is more likely true. True “complex event processing” is the processing task that rules do not perform efficiently – the software that Paul Vincent (and others) marginalizes as “and so forth” because they work for a company (or companies) that are selling “a hammer” (rule-based software) and so therefore everything out there must be defined as “a nail” (a problem that can be solved with rules).
This is one key reason that CEP, the term and the technologies, continues to flounder and sink. Customers and end users need more sophisticated methods, but the vendors keep trying to tell us that “simple” is “complex” and “complex” is “simple”. Only software vendors, analysts and advertisers sing the praises of CEP because most complex problems cannot be efficiently solved with rule or query-based approaches (alone). The new users, the people with the complex problems, are not “buying the hype”.
Perhaps we should rename the CEP space “Orwellian Event Processing” ?
“… describes the situation, idea, or societal condition that George Orwell identified as being destructive to the welfare of a free [professional] society. It connotes an attitude and a policy of control by propaganda, surveillance, misinformation, denial of truth, and manipulation of the past.”