MapReduce is a software framework introduced by Google to support parallel computation over large (multi-petabyte) data sets on clusters of computers; Google's implementation is written in C++, with interfaces in Python and Java. The Apache Hadoop project is a free, open-source Java implementation of MapReduce. Mahout is an Apache project, built on Hadoop, whose objective is to build scalable, Apache-licensed machine learning libraries.
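The programming model behind the framework can be illustrated with the canonical word-count example. The sketch below simulates the map, shuffle, and reduce phases in plain Python; it is an illustration of the idea only, not Hadoop's actual API (which defines `Mapper` and `Reducer` classes in Java):

```python
from collections import defaultdict

def map_phase(document):
    # Mapper: emit a (word, 1) pair for every word in the input split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group intermediate pairs by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: sum the counts emitted for one word.
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["the"])  # -> 2
```

Because each mapper sees only its own split and each reducer sees only one key's values, both phases can run on many machines at once; the framework handles the data movement in between.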
The Mahout team is initially focused on building the ten machine learning libraries detailed in Map-Reduce for Machine Learning on Multicore, a paper by seven members of Stanford's computer science department. These libraries, some if not all of which are critical for "real" complex event processing, include:
- Locally Weighted Linear Regression (LWLR),
- Naive Bayes (NB),
- Gaussian Discriminative Analysis (GDA),
- Logistic Regression (LR),
- Neural Network (NN),
- Principal Components Analysis (PCA),
- Independent Component Analysis (ICA),
- Expectation Maximization (EM),
- Support Vector Machine (SVM), and
- k-means clustering.
Ready to move beyond rules and rule-based systems to process complex events? Interested folks should visit the Apache Mahout Wiki.
Note: See also Map-Reduce: A Relevance to Analytics?