Category: Open Source

Orwellian Event Processing

Posted on 02/28/10 16 Comments

Recently we completed the installation and training of an open source Bayesian classifier to replace a rule-based approach to managing forum spam. In a nutshell, we found the rule-based approach highly prone to both false positives and false negatives, whereas the statistical, Bayesian approach has turned out to be far superior. [...]
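
The post does not name the classifier it uses, so the following is only a generic illustration of the idea: a minimal multinomial naive Bayes spam filter in Python with add-one (Laplace) smoothing, assuming simple lowercase word tokens. The class name `NaiveBayesSpamFilter` and the training examples are hypothetical, and both labels must be trained at least once before classifying.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes filter with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        # Count one more document for this label and add its word frequencies.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens: list[str], label: str, vocab_size: int) -> float:
        # Log prior: fraction of training documents with this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        for tok in tokens:
            # Add-one smoothing so an unseen word does not zero out the score.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        return score

    def classify(self, text: str) -> str:
        tokens = tokenize(text)
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        scores = {lbl: self._log_score(tokens, lbl, vocab_size) for lbl in ("spam", "ham")}
        # Pick the label with the highest log posterior (up to a shared constant).
        return max(scores, key=scores.get)


# Hypothetical training data for illustration only.
spam_filter = NaiveBayesSpamFilter()
for text, label in [
    ("cheap pills buy now", "spam"),
    ("limited offer buy cheap watches", "spam"),
    ("how do I configure the hadoop classpath", "ham"),
    ("thanks, the mahout example worked for me", "ham"),
]:
    spam_filter.train(text, label)

print(spam_filter.classify("buy cheap pills"))        # spam
print(spam_filter.classify("mahout hadoop example"))  # ham
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together; unlike a fixed rule set, the filter's behavior shifts as more labeled posts are fed to `train`.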

GeoIP and Geo-Targeting

Posted on 09/08/09 No Comments

Lately I have been busy with a web-based geo-targeting project. For those of you not familiar with geo-targeting: the deeper you get into it, the more you realize how important and interesting it is. Geo-targeting is used for fraud detection, personalization, ad targeting, content delivery, and more. In addition, the same basic concept is used [...]

Mahout on Elastic MapReduce: Running k-means Clustering

Posted on 05/07/09 No Comments

Following up on "KMeans Clustering Now Running on Elastic MapReduce", Stephen Green has generously documented, on the Apache Lucene Mahout wiki, the steps that were necessary to get an example of k-means clustering up and running on Amazon's Elastic MapReduce (EMR) (see "Mahout on Elastic MapReduce" by Stephen Green). As a side note, there has been [...]

KMeans Clustering Now Running on Elastic MapReduce

Posted on 04/19/09 1 Comment

Stephen Green, blogger and principal investigator of the AURA project in Sun Labs, has moved the state of the art of analytics-as-a-service a few steps forward with the first documented working Mahout application on Amazon's Elastic MapReduce (EMR). EMR was announced on April 1st, and on April 15th Stephen announced to the Mahout users group that he was [...]

[ANNOUNCE] Apache Mahout 0.1 Released

Posted on 04/08/09 1 Comment

The Apache Lucene project is pleased to announce the release of Apache Mahout 0.1. Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable machine learning algorithm implementations under the Apache license. The first public release includes implementations for clustering, classification, collaborative filtering, and evolutionary programming. Highlights include: Taste Collaborative Filtering [...]

A Review of Zabbix – Zabbix Rules! (Part 2)

Posted on 03/23/09 2 Comments

In "A Review of Zabbix – Zabbix Rules! (Part 1)" we provided a brief introduction to Zabbix in the context of network and security management. In this post I will discuss Zabbix as an event processing platform. Like most event processing platforms, Zabbix supports both agent-initiated events and server-requested events. In [...]

Real-Time Predictive Analytics for Web Servers

Posted on 03/02/09 6 Comments

We recently made the decision to move to Zabbix to monitor one of our busy production Apache web servers. One of the things we need to do in the future is try to predict system outages and take corrective action before the system actually goes down. For example, a busy server recently experienced an outage [...]

Analytics as a Service (A3S)

Posted on 02/02/09 No Comments

We are seeing more interest in using cloud computing infrastructures like Amazon's EC2 and S3 for number crunching. I can easily see how folks in network security could use a neural network service to baseline their network traffic patterns, perhaps crunching massive log files, or even crunching near-real-time data if they wanted to pay [...]

CEP by Apache Mahout via the Google MapReduce Framework

Posted on 11/24/08 No Comments

MapReduce is a software framework introduced by Google to support parallel computations over large (multiple-petabyte) data sets on clusters of computers; Google's implementation is written in C++, with interfaces in Python and Java. The Apache Hadoop project is a free, open source Java implementation of MapReduce. Mahout is an Apache project, based on Hadoop, with an objective to [...]
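
To make the programming model concrete, here is a single-process Python sketch of the classic word-count job: a map function emits (word, 1) pairs, a shuffle step groups them by key, and a reduce function sums each group. It only imitates the data flow a real cluster performs in parallel; it is not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(doc_id: int, text: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record.

    doc_id plays the role of the input key; word count does not need it.
    """
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reducer: combine all values emitted for one key."""
    return word, sum(counts)


def run_job(documents: list[str]) -> dict[str, int]:
    # On a real cluster each mapper and reducer call could run on a different node.
    intermediate = chain.from_iterable(
        map_phase(i, doc) for i, doc in enumerate(documents)
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(w, c) for w, c in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["hadoop runs mapreduce", "mahout runs on hadoop"]
    print(run_job(docs))
    # {'hadoop': 2, 'mahout': 1, 'mapreduce': 1, 'on': 1, 'runs': 2}
```

Because each mapper sees only its own record and each reducer sees only one key's values, the framework is free to spread both phases across many machines; that independence is what lets libraries such as Mahout scale on Hadoop.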

Streaming SQL Approaches Insist in Ignoring Causality by PatternStorm

Posted on 09/05/08 1 Comment

The following excellent discussion is reposted from "Streaming SQL approaches insist in ignoring causality" by PatternStorm. The recent paper "Towards a Streaming SQL Standard" by Oracle and StreamBase unifies and generalizes two different execution models of Streaming SQL: Oracle's and StreamBase's. While it's true that the generalization succeeds in overcoming the inability of both execution models [...]