Real CEP News: Amazon Announces Elastic MapReduce

Yesterday Amazon announced the public beta of Amazon Elastic MapReduce, a web-based service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.  Amazon Elastic MapReduce utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

Using Amazon Elastic MapReduce, CEP developers, for example, can instantly provision as much or as little capacity as they need to perform data-intensive tasks for applications such as indexing, data mining, log file analysis, time-series analysis, anomaly detection, profiling, machine learning, financial analysis, scientific simulation, or bioinformatics research. Amazon Elastic MapReduce lets developers focus on crunching and analyzing data (including historical event data) without having to worry about the set-up, management, or tuning of Hadoop clusters and the associated computational capacity.

To work with Amazon Elastic MapReduce, a CEP developer, for example, could develop an application to run statistical algorithms against an “event cloud.” The developer uploads event data to Amazon S3, uses the AWS Management Console or APIs to specify the number and type of instances required for the analysis, and clicks “Create Job Flow.” Amazon manages the rest: running Hadoop over the specified number of instances, providing progress monitoring, and delivering the output to Amazon S3.
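
To make the idea of running a statistical algorithm against an event cloud a little more concrete, here is a minimal sketch of the map and reduce steps in plain Python. It is only an illustration under stated assumptions: the "source,value" record format and the function names map_event, reduce_stats, and run_job are hypothetical, and the code runs in a single process rather than calling Hadoop or any Amazon API. A real job flow would read its input from Amazon S3 and distribute the same two steps across the requested instances.

```python
from collections import defaultdict
from math import sqrt


def map_event(line):
    """Map step: parse one 'source,value' record and emit a (key, value) pair.

    The record format is an assumption made for this sketch.
    """
    source, value = line.strip().split(",")
    yield source, float(value)


def reduce_stats(source, values):
    """Reduce step: summarize every value observed for one event source."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    return source, {"count": n, "mean": mean, "stddev": sqrt(variance)}


def run_job(lines):
    """In-process stand-in for the map, shuffle, and reduce phases."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_event(line):
            grouped[key].append(value)  # shuffle: group values by key
    return dict(reduce_stats(key, values) for key, values in grouped.items())


# Hypothetical historical event data; in practice this would be far larger.
events = ["sensor-a,10.0", "sensor-b,3.0", "sensor-a,14.0", "sensor-b,5.0"]
baseline = run_job(events)
print(baseline["sensor-a"])
```

The per-source statistics produced this way are the kind of off-line baseline that a real-time event processing application could later compare incoming events against.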

Amazon hopes this new service offering will prove a powerful tool for data and event processing requirements.  Anyone can sign up and start using the service today at aws.amazon.com/elasticmapreduce.

4 Responses to “Real CEP News: Amazon Announces Elastic MapReduce”

  1. Hello Tim,

    I would not call this (Complex) Event Processing. For me, Event Processing is about processing data interactively (i.e. as soon as the data arrives); it is not about processing data offline or in batch. MapReduce is not interactive; it is a design pattern for implementing batch computations. Am I wrong?

    Regards,

    PatternStorm

  2. Hi Claudi,

    Great to hear from you; thanks for commenting.

    You are correct that MapReduce is primarily used to process massive amounts of data off-line.

    However, you must keep in mind that most complex event processing classes of problems require much more “heavy lifting” in off-line processing than in on-line processing. Most complex event problems require intensive model building, baselining, event cloud profiling, training, etc., and this is all done off-line in support of on-line real-time processing.

    As an analogy, it is like a football game. Most of the work in winning a football game goes into what happens before the game. You must find the right players, create models for playing based on the players, you train, train, train, you practice, practice, practice, you must study the opposition and you build models against various scenarios. Then, you go to the field and play the game in real-time. Without all the off-line work, there is little chance to win the game in real-time.

    According to your comments, all the work that is done prior to the players taking the field in a league game is not “football”. I can assure you that all football players and coaches will disagree with you.

    The same is true in almost all areas of complex event processing. The vast majority of the work that must be done to detect complex events and situations in real-time is done off-line. I don’t have any reliable statistics in my back pocket, but I would venture to say that in most “real” CEP applications, as in football, more than 90 to 95 percent of the work (maybe much more in some applications) is done off-line, preparing for the on-line game.

    Yes, I very much believe you are wrong in saying that using MapReduce for off-line event processing is not complex event processing.

    Yours sincerely, Tim

  3. To add to my reply above, I wanted to point out that many people like to equate CEP to the human mind (decisioning) or situational awareness. Most experts with experience in building systems that attempt to provide situational awareness will agree that the vast majority of the work happens “in the past” ... or “off-line,” depending on the terminology we adopt in our explanation.

    For example, detecting a missile launch and determining if the missile is a threat requires myriad off-line processing long before actual real-time processing occurs. Without off-line processing ahead of real-time, real-time detection is near impossible (similar to the football analogy in my earlier comment).

    The same is true, actually, in how our human minds work. We learn to crawl before we learn to walk and before we learn to run and before we play football. Life is a training process and all that we learn, in the past, is what gives us the ability to respond in real-time, now.

    In other words, we don’t just “show up” on the planet, one day a small egg fused with sperm, the next day a football star; it takes years of human learning and training to become a footballer.

    A similar concept is true in complex event processing classes of problems. Only relatively simple problems can be solved without historical data or prior knowledge based on off-line processing. That is why most of the self-described CEP applications today are really more like “simple event processing”: most of these applications are built on simple forward chaining rules engines that have limited “vision” based on a relatively small sliding time window.

    Rule-based systems that process events in a forward chaining manner across relatively short sliding time windows in real-time have a very limited capability to provide situational awareness for most complex event processing applications if there has not been intensive off-line processing prior to the near real-time application.

    It is a common fallacy to think we can process complex events without prior knowledge, off-line processing, training, learning, modelling, model testing, etc. Most of the work happens long before “real-time.”

    ... as I mentioned before, you cannot win many professional football games without intensive “off-line” processing before league play begins.

    Yours faithfully, Tim

  4. [...] Elastic MapReduce - although well blogged by other people, I’m more interested in when a CEP vendor (Aleri?) [...]

Copyright © 2007-2008, The CEP Blog, All Rights Reserved.