Clustered Databases Versus Virtualization for CEP Applications
In my earlier post, A Model For Distributed Event Processing, I promised to address grid computing, distributed object caching and virtualization, and how these technologies relate to complex event processing. Some of my readers might forget my earlier roots in networking if I continue to talk about higher-level abstractions! So, in this follow-up post I will discuss virtualization relative to database clustering.
In typical clustered database environments there are quite a few major performance constraints. These constraints limit our ability to architect and design solutions for distributed, complex, cooperative event processing problems and scenarios. Socket-based interprocess communications (IPCs) within database clusters create a performance bottleneck through low bandwidth, high latency, and processing overhead.
In addition, the communications performance between the application layer and the database layer can be limited by both TCP and operating system overhead. To make matters worse, hardware input-output constraints limit scalability when connecting database servers to disk storage. These are standard distributed computing constraints.
The physical architecture to address scalability in emerging distributed CEP solutions requires a low-latency network communications infrastructure (sometimes called a fabric). This simply means that event processing agents (EPAs) require virtualization technologies such as Remote Direct Memory Access (RDMA). CEP agents (often called CEP engines) should have the capability to write data directly to the memory spaces of a CEP agent fabric (sometimes called an event processing network, EPN). This is similar to the concept of shared memory as an IPC in UNIX-based systems, applied to distributed computing, so all “old hat” UNIX systems engineers will easily grok these concepts.
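For readers who have not used shared memory as an IPC, here is a minimal single-host sketch of the analogy in Python, using the standard multiprocessing.shared_memory module. It is not RDMA and it does not cross a network; it only shows one agent writing an event directly into a memory segment that another agent reads by name, with no socket in between. The event layout and the function names (write_event, read_event, demo) are my own assumptions for illustration.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical event layout: little-endian int64 event id + float64 value.
EVENT_FORMAT = "<qd"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)


def write_event(shm_name, event_id, value):
    """Attach to an existing segment by name and write one event into it."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        struct.pack_into(EVENT_FORMAT, shm.buf, 0, event_id, value)
    finally:
        shm.close()  # detach this handle; the segment itself stays alive


def read_event(shm_name):
    """Attach to the same segment by name and read the event back."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        return struct.unpack_from(EVENT_FORMAT, shm.buf, 0)
    finally:
        shm.close()


def demo():
    """Create a segment, pass only its name around, then clean up."""
    segment = shared_memory.SharedMemory(create=True, size=EVENT_SIZE)
    try:
        write_event(segment.name, 42, 101.5)
        return read_event(segment.name)
    finally:
        segment.close()
        segment.unlink()  # release the segment once both sides are done


if __name__ == "__main__":
    print(demo())
```

In a real deployment the writer and reader would be separate processes (or, with RDMA, separate hosts), and some form of synchronization would be needed so that a reader never sees a half-written event.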
RDMA virtualization helps improve performance by bypassing operating-system and TCP overhead, resulting in significantly higher bandwidth and lower latency in the EPF (Event Processing Fabric; I just minted a new three-letter acronym, sorry!). This, in turn, improves the communication speed between event processing agents in an event processing network (EPN), or EPF (depending on your taste in acronyms).
Scheduling tasks such as distributed semaphore checking and lock management can also operate more efficiently and with higher performance. Distributed table scans, decision tree searches, rule-engine evaluations, and Bayesian and neural analytics can all be performed in parallel, dramatically improving both the performance and scalability of distributed event processing applications.
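To make the "in parallel" part concrete, here is a rough sketch in Python of the partition-and-merge shape of a distributed scan: each partition of events is checked against a rule independently, and the matches are merged at the end. The rule, the event fields, and the function names are hypothetical, and a thread pool on one machine stands in for agents spread across a fabric, so treat this as an illustration of the structure rather than a performance demonstration.

```python
from concurrent.futures import ThreadPoolExecutor


def price_spike_rule(event):
    """Hypothetical rule: flag any event whose price exceeds 100."""
    return event["price"] > 100.0


def scan_partition(partition):
    """Evaluate the rule over one partition and return matching event ids."""
    return [event["id"] for event in partition if price_spike_rule(event)]


def parallel_scan(partitions, max_workers=4):
    """Scan every partition concurrently, then merge in partition order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves the order of the input partitions.
        results = pool.map(scan_partition, partitions)
    return [event_id for matches in results for event_id in matches]


if __name__ == "__main__":
    partitions = [
        [{"id": 1, "price": 99.0}, {"id": 2, "price": 150.0}],
        [{"id": 3, "price": 101.0}],
        [],
    ]
    print(parallel_scan(partitions))
```

The point of the shape is that no partition needs to wait on another until the final merge, which is exactly where a low-latency fabric pays off when the partitions live on different hosts.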
In addition, by adopting transparent protocols with existing socket APIs, the CEP architect can bypass both operating-system and TCP protocol overhead. In other words, communications infrastructures for CEP that optimize networking, interprocess communications, and storage provide architects with the underlying tools to build better solutions to computationally complex problems.
Many of the communications constraints of earlier distributed architectures for solving complex problems, such as blackboard architectures, can be mitigated with advances in virtualization. So, in a nutshell, virtualization technologies are one of the most important underlying capabilities required for distributed, high-performance CEP applications, in my opinion.
The article, Virtualization hot; ITIL, IPv6 not, appears to indicate that some of the top IT managers at Interop New York might agree with me.
Unfortunately for a few software vendors, virtualization threatens to dilute their market share for ESB and message bus sales. (OBTW, SOA is DOA.) “Old hat” UNIX system programmers will recall how the UNIX IPC called “message queues” lost favor to sockets, pipes and shared memory. A similar trend is happening in the virtualization world, with RDMA as a distributed shared memory technology displacing message-based communications technologies. I will opine more on this topic later.