Category: Apache Mahout

Mahout on Elastic MapReduce: Running k-means Clustering

Posted on 05/07/09 No Comments

Following up on KMeans Clustering Now Running on Elastic MapReduce, Stephen Green has generously documented, on the Apache Lucene Mahout wiki, the steps that were necessary to get an example of k-means clustering up and running on Amazon’s Elastic MapReduce (EMR): Mahout on Elastic MapReduce by Stephen Green. As a side note, there has been [...]

Read more

KMeans Clustering Now Running on Elastic MapReduce

Posted on 04/19/09 1 Comment

Stephen Green, blogger and principal investigator of the AURA project in Sun Labs, has moved the state of the art in analytics-as-a-service a few steps forward with the first documented working Mahout application on Amazon’s Elastic MapReduce (EMR). EMR was announced on April 1st, and on April 15th Stephen announced to the Mahout users group that he was [...]

Read more

[ANNOUNCE] Apache Mahout 0.1 Released

Posted on 04/08/09 1 Comment

The Apache Lucene project is pleased to announce the release of Apache Mahout 0.1. Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable machine learning algorithm implementations under the Apache license.  The first public release includes implementations for clustering, classification, collaborative filtering and evolutionary programming. Highlights include: Taste Collaborative Filtering [...]

Read more

Real CEP News: Amazon Announces Elastic MapReduce

Posted on 04/02/09 4 Comments

Yesterday Amazon announced the public beta of Amazon Elastic MapReduce, a web-based service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.  Amazon Elastic MapReduce utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service [...]

Read more

A Review of Zabbix – Zabbix Rules! (Part 2)

Posted on 03/23/09 2 Comments

In A Review of Zabbix – Zabbix Rules! (Part 1) we provided a brief introduction to Zabbix in the context of network and security management.  In this post I will discuss Zabbix as an event processing platform. Zabbix is like most event processing platforms.  Zabbix provides both agent-initiated events and server-requested events.  In [...]

Read more

Real-Time, Online and Offline Complex Event Processing

Posted on 02/08/09 No Comments

Using NIST as a computer science reference, an online algorithm is an algorithm that processes data (including events) element-by-element (and event-by-event), serially, without having the entire problem space available from the beginning.  In contrast, an offline algorithm is provided the entire problem set from the start. Hence, real-time event processing applications generally involve online processing.  Offline processing is useful when creating [...]

Read more
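
As a rough illustration of the online/offline distinction in the excerpt above, here is a minimal Python sketch. It is not taken from the original post: the function names `offline_mean` and `online_mean` and the sample `events` values are hypothetical, and a running mean is used only because it is a simple quantity that can be computed either way.

```python
from typing import Iterable, Iterator, List


def offline_mean(values: List[float]) -> float:
    """Offline: the entire problem set is available from the start."""
    return sum(values) / len(values)


def online_mean(stream: Iterable[float]) -> Iterator[float]:
    """Online: consume one element at a time and yield the running mean."""
    count = 0
    mean = 0.0
    for x in stream:
        count += 1
        # Incremental update: only the current mean and count are kept,
        # never the full history of events.
        mean += (x - mean) / count
        yield mean


# Hypothetical event values, chosen only for illustration.
events = [4.0, 8.0, 6.0]

# The online version produces an answer after every event ...
running = list(online_mean(iter(events)))

# ... while the offline version needs the complete list before it can answer.
total = offline_mean(events)
```

Once every event has been seen, the last value yielded by the online version agrees with the offline result; the difference is that the online version has a usable estimate after each event, which is what a real-time event processing application would typically need.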

Analytics as a Service (A3S)

Posted on 02/02/09 No Comments

We are seeing more interest in using cloud computing infrastructures like Amazon’s EC2 and S3 for number crunching.  I can easily see how folks in network security could use a Neural Network service to baseline their network traffic patterns, perhaps crunching massive log files, or even crunching near-real-time data if they wanted to pay [...]

Read more

Classification in Complex Event Processing

Posted on 02/01/09 2 Comments

Following up on the excellent discussion in Predicting Events with Logistic Regression, I think it is time to talk a bit about the importance of classification in complex event processing.  CEP is, by definition, about detecting business opportunities and threats in real-time.  It follows that CEP is, by definition, centered on classifying and discriminating complex [...]

Read more

Predicting Events with Logistic Regression

Posted on 01/27/09 19 Comments

In earlier posts, CEP by Apache Mahout via the Google MapReduce Framework and Apache Mahout: Real-Time Decisioning in the MapReduce Framework, we started to look at the Google MapReduce framework and the planned analytics of the Apache Mahout development team.  In this post, we will look at the first algorithm mentioned by the Mahout team, [...]

Read more