Mahout on Elastic MapReduce: Running k-means Clustering
Following up on KMeans Clustering Now Running on Elastic MapReduce, Stephen Green has generously documented on the Apache Lucene Mahout wiki the steps that were necessary to get an example of k-means clustering up and running on Amazon’s Elastic MapReduce (EMR).
Mahout on Elastic MapReduce by Stephen Green
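For readers new to the idea, here is a minimal in-memory sketch of how one k-means iteration splits into a map step and a reduce step. Broadly, this is the pattern Mahout's k-means follows on Hadoop: mappers assign each point to its nearest centroid, reducers average the points per cluster, and a driver repeats the job until the centroids stop moving. The function names (nearest, kmeans_map, kmeans_reduce, kmeans) are hypothetical and are not Mahout's API. The sketch assumes dense numeric vectors and squared Euclidean distance.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))


def kmeans_map(points, centroids):
    """Map phase (hypothetical): emit (cluster_id, point) for every input point."""
    for point in points:
        yield nearest(point, centroids), point


def kmeans_reduce(pairs, dim):
    """Reduce phase (hypothetical): average the points grouped under each cluster id."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for cid, point in pairs:
        for j, x in enumerate(point):
            sums[cid][j] += x
        counts[cid] += 1
    return {cid: [s / counts[cid] for s in sums[cid]] for cid in sums}


def kmeans(points, centroids, max_iter=10, tol=1e-6):
    """Driver: run one map/reduce pass per iteration until the centroids stop moving."""
    dim = len(points[0])
    for _ in range(max_iter):
        new = kmeans_reduce(kmeans_map(points, centroids), dim)
        # Keep the previous centroid for any cluster that received no points.
        updated = [new.get(i, c) for i, c in enumerate(centroids)]
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift < tol:
            break
    return centroids
```

For example, kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)]) converges after two passes to centroids [0.0, 0.5] and [10.0, 10.5]. On a real cluster, each pass would be a separate MapReduce job that reads its input from HDFS or S3.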
As a side note, there has been considerable discussion about whether MapReduce is useful primarily for batch processing. However, given how easy it is to upload data to S3, it is a small leap of the imagination to picture uploading real-time event data from myriad sources and processing it, complex events included, in near real time with EMR.
On the other hand, if Amazon’s EMR implementation proves overly restrictive for a CEP-style application, it might be necessary to build our own Mahout/Hadoop/MapReduce Amazon Machine Image (AMI). Stay tuned.
Perhaps some of our FSI readers and gurus could port (install) some event handlers to EC2 and provide us with public AMIs to experiment with?
Note: Amazon Elastic MapReduce Developer Guide (API Version 2009-03-31)