Category: Open Source
Orwellian Event Processing
Recently we completed the installation and training of an open source Bayesian classifier to replace a rule-based approach to managing forum spam. In a nutshell, we found that the rule-based approach was highly prone to both false positives and false negatives, whereas a statistical approach using a Bayesian classifier has turned out to be far superior. [...]
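The excerpt does not say which classifier package was used. The following is a minimal sketch that assumes a multinomial naive Bayes model over lowercase word tokens with add-one (Laplace) smoothing. The class name, the `tokenize` helper, and the training messages are all hypothetical and are not the package described in the post.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase word tokens only.
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayesSpamFilter:
    """Minimal multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Count one document and its tokens under the given label.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        # log P(label) + sum over tokens of log P(token | label)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total + len(vocab)))
        return score

    def classify(self, text):
        # Assumes both labels have at least one training document.
        tokens = tokenize(text)
        return max(("spam", "ham"), key=lambda label: self._log_score(tokens, label))

# Hypothetical training data.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("cheap watches buy now", "spam")
spam_filter.train("how do I configure the mahout clustering job", "ham")
spam_filter.train("thanks, the zabbix agent works now", "ham")
```

Unlike a fixed rule set, the model's word probabilities shift as moderators label more posts. This is one plausible reason a statistical filter can cut both false positives and false negatives.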
GeoIP and Geo-Targeting
Lately I have been busy with a web-based geo-targeting project. For those of you not familiar with geo-targeting: the deeper you get into it, the more you realize how important and interesting it is. Geo-targeting is used for fraud detection, personalization, ad targeting, content delivery, and more. In addition, the same basic concept is used [...]
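At its core, GeoIP-style geo-targeting maps an IP address to a location by searching a table of address ranges. The sketch below assumes a small in-memory table of IPv4 ranges sorted by start address and uses binary search. The ranges are RFC 5737 documentation blocks with made-up country codes, and `GeoLookup` is a hypothetical name rather than any GeoIP library's API.

```python
import bisect
import ipaddress

# Hypothetical range table: (first_ip, last_ip, country_code).
# Real deployments would load ranges from a GeoIP database instead.
RANGES = [
    ("192.0.2.0", "192.0.2.255", "US"),
    ("198.51.100.0", "198.51.100.255", "DE"),
    ("203.0.113.0", "203.0.113.255", "JP"),
]

class GeoLookup:
    def __init__(self, ranges):
        # Convert addresses to integers and sort by range start for binary search.
        self.rows = sorted(
            (int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi)), cc)
            for lo, hi, cc in ranges
        )
        self.starts = [row[0] for row in self.rows]

    def country(self, ip):
        n = int(ipaddress.IPv4Address(ip))
        # Find the rightmost range whose start is <= n.
        i = bisect.bisect_right(self.starts, n) - 1
        if i >= 0 and n <= self.rows[i][1]:
            return self.rows[i][2]
        return None  # address not covered by the table

geo = GeoLookup(RANGES)
```

Sorting once and using `bisect` keeps each lookup at O(log n). This matters when every incoming web request is geo-targeted.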
Mahout on Elastic MapReduce: Running k-means Clustering
Following up on KMeans Clustering Now Running on Elastic MapReduce, Stephen Green has generously documented, on the Apache Lucene Mahout wiki, the steps that were necessary to get an example of k-means clustering up and running on Amazon’s Elastic MapReduce (EMR): Mahout on Elastic MapReduce, by Stephen Green. As a side note, there has been [...]
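Mahout's k-means runs as Hadoop jobs. The following is only a single-machine sketch of the general idea. Each iteration splits into a "map" step that assigns every point to its nearest centroid and a "reduce" step that averages the points in each cluster into a new centroid. It is not Mahout's code, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    # Index of the centroid closest to point (Euclidean distance).
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans_map(points, centroids):
    # "Map": emit (cluster_id, (point, 1)) for every input point.
    for p in points:
        yield nearest(p, centroids), (p, 1)

def kmeans_reduce(pairs):
    # "Reduce": sum points and counts per cluster, then average.
    sums, counts = {}, defaultdict(int)
    for cid, (p, n) in pairs:
        if cid in sums:
            sums[cid] = [a + b for a, b in zip(sums[cid], p)]
        else:
            sums[cid] = list(p)
        counts[cid] += n
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}

def kmeans(points, centroids, max_iter=20, tol=1e-6):
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        new = kmeans_reduce(kmeans_map(points, centroids))
        # A cluster that received no points keeps its previous centroid.
        updated = [new.get(i, c) for i, c in enumerate(centroids)]
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift < tol:  # converged: centroids stopped moving
            break
    return centroids

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centers = kmeans(points, [(0, 0), (10, 10)])
```

On a cluster, the map step parallelizes across input splits and the reduce step runs once per cluster ID. This split is what makes k-means a natural fit for a MapReduce service like EMR.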
KMeans Clustering Now Running on Elastic MapReduce
Stephen Green, blogger and principal investigator of the AURA project in Sun Labs, has moved the state of the art of analytics as a service a few steps forward with the first documented working Mahout application on Amazon’s Elastic MapReduce (EMR). EMR was announced on April 1st, and on April 15th Stephen announced to the Mahout users group that he was [...]
[ANNOUNCE] Apache Mahout 0.1 Released
The Apache Lucene project is pleased to announce the release of Apache Mahout 0.1. Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable machine learning algorithm implementations under the Apache license. The first public release includes implementations for clustering, classification, collaborative filtering and evolutionary programming. Highlights include: Taste Collaborative Filtering [...]
A Review of Zabbix – Zabbix Rules! (Part 2)
In A Review of Zabbix – Zabbix Rules! (Part 1) we provided a brief introduction to Zabbix in the context of network and security management. In this post I will discuss Zabbix as an event processing platform. Like most event processing platforms, Zabbix supports both agent-initiated events and server-requested events. In [...]
Real-Time Predictive Analytics for Web Servers
We recently decided to move to Zabbix to monitor one of our busy production Apache web servers. One of the things we need to do in the future is to predict system outages and take corrective action before the system actually goes down. For example, a busy server recently experienced an outage [...]
Analytics as a Service (A3S)
We are seeing more interest in using cloud computing infrastructures like Amazon’s EC2 and S3 for number crunching. I can easily see how folks in network security could use a neural network service to baseline their network traffic patterns, perhaps crunching massive log files, or even crunching near-real-time data if they wanted to pay [...]
CEP by Apache Mahout via the Google MapReduce Framework
MapReduce is a software framework introduced by Google, implemented in C++ with interfaces in Python and Java, to support parallel computations over large (multi-petabyte) data sets on clusters of computers. The Apache Hadoop project is a free, open source Java MapReduce implementation. Mahout is an Apache project, based on Hadoop, with an objective to [...]
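The MapReduce programming model comes down to three phases: map each record to key/value pairs, shuffle (group) the values by key, and reduce each group to a result. The sketch below is a single-process Python illustration of that data flow using the classic word-count example. It is not Hadoop's Java API, and the names `map_reduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:
        # Map: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce: combine each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}

def word_count_mapper(line):
    for word in line.lower().split():
        yield word, 1

def word_count_reducer(word, counts):
    return sum(counts)

counts = map_reduce(
    ["the quick fox", "The lazy dog", "the end"],
    word_count_mapper,
    word_count_reducer,
)
```

In a real framework the map calls run in parallel on different machines, and the shuffle moves data across the network. Because the mapper and reducer are pure functions of their inputs, the framework can distribute and retry them freely.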
Streaming SQL Approaches Insist in Ignoring Causality by PatternStorm
The following excellent discussion is reposted from Streaming SQL approaches insist in ignoring causality by PatternStorm. The recent paper “Towards a Streaming SQL Standard” by Oracle and StreamBase unifies and generalizes two different execution models of Streaming SQL: Oracle’s and StreamBase’s. While it’s true that the generalization succeeds in overcoming the inability of both execution models [...]