MapReduce is a software framework introduced by Google to support parallel computation over large (multi-petabyte) data sets on clusters of computers; Google's implementation is written in C++ with interfaces in Python and Java.  The Apache Hadoop project is a free, open-source Java implementation of MapReduce.  Mahout is an Apache project, built on Hadoop, whose objective is to produce scalable, Apache-licensed machine learning libraries.
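To make the programming model concrete, here is a minimal sketch of the classic word-count job expressed as map and reduce phases. This is plain Java using only the standard library, not the actual Hadoop API; the class and method names are illustrative assumptions, and the in-memory "shuffle" stands in for what a real cluster does across machines.

```java
import java.util.*;
import java.util.stream.*;

// Toy illustration of the MapReduce model (not the Hadoop API).
// Map emits (word, 1) pairs; the framework groups pairs by key
// (the shuffle); reduce sums the values for each key.
public class WordCount {

    // Map phase: split one input record into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String record) {
        return Arrays.stream(record.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce phase: group pairs by key and sum the values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> records = List.of("the quick brown fox", "the lazy dog");
        List<Map.Entry<String, Integer>> pairs = records.stream()
                .flatMap(r -> map(r).stream())
                .collect(Collectors.toList());
        Map<String, Integer> counts = reduce(pairs);
        System.out.println(counts.get("the")); // → 2
    }
}
```

On a real cluster, the map calls run in parallel over partitions of the input and the grouped pairs are routed over the network to reducer nodes; the logic of the two functions, however, is exactly what a Hadoop job supplies.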

The Mahout team is initially focused on implementing the ten machine learning algorithms detailed in "Map-Reduce for Machine Learning on Multicore," a paper by seven members of Stanford's computer science department.  These algorithms, some, if not all, critical for "real" complex event processing, are:

  1. Locally Weighted Linear Regression (LWLR),
  2. Naive Bayes (NB),
  3. Gaussian Discriminative Analysis (GDA),
  4. k-means,
  5. Logistic Regression (LR),
  6. Neural Network (NN),
  7. Principal Components Analysis (PCA),
  8. Independent Component Analysis (ICA),
  9. Expectation Maximization (EM), and
  10. Support Vector Machine (SVM).
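The key insight of the Stanford paper is that each of these algorithms can be rewritten so that its expensive step is a sum over the data, which MapReduce parallelizes naturally. As a hedged sketch of that idea, here is one iteration of k-means in plain Java (not Mahout's API; all names are illustrative): the "map" step assigns each point to its nearest centroid, and the "reduce" step averages each group to produce updated centroids.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of one k-means iteration in the MapReduce style, using
// plain Java collections rather than a real cluster or Mahout's API.
public class KMeansStep {

    // "Map": find the index of the centroid nearest to a point
    // (squared Euclidean distance).
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int i = 0; i < point.length; i++) {
                double diff = point[i] - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    // "Shuffle + reduce": group points by nearest centroid, then
    // average each group to get the updated centroids.
    static double[][] step(double[][] points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = Arrays.stream(points)
                .collect(Collectors.groupingBy(p -> nearest(p, centroids)));
        double[][] updated = centroids.clone();
        groups.forEach((c, members) -> {
            double[] mean = new double[members.get(0).length];
            for (double[] p : members)
                for (int i = 0; i < mean.length; i++) mean[i] += p[i];
            for (int i = 0; i < mean.length; i++) mean[i] /= members.size();
            updated[c] = mean;
        });
        return updated;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] centroids = {{0, 0.5}, {9, 9}};
        double[][] next = step(points, centroids);
        System.out.println(Arrays.deepToString(next));
        // → [[0.0, 0.5], [10.0, 10.5]]
    }
}
```

Because the per-point assignment is independent and the per-group sums are associative, the map work can be split across many machines and the partial sums combined by reducers; the same pattern covers the other algorithms in the list.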

Ready to move beyond rules and rule-based systems to process complex events? Interested folks should visit the Apache Mahout Wiki.

Note: See also, Map-Reduce: A Relevance to Analytics?