Neon Enterprise Software Blog

Data Management Today by Craig Mullins

News, views, and issues involved in managing data as a valuable corporate asset.

InfoSphere Streams: Analyzing Any Data, Anywhere, All the Time (IOD2009)

I am still at the IBM Information on Demand conference in Las Vegas and today IBM briefed me on their stream computing solution - InfoSphere Streams. I mentioned this briefly in yesterday's blog posting about analytics but I want to get into the topic in much more depth today.

So what is stream computing? Basically, it is the ingestion of data -- structured or unstructured -- from arbitrary sources and the processing of that data without necessarily persisting it. Any digitized data is fair game for stream computing. As the data streams in, it is analyzed and processed in a problem-specific manner. The "sweet spot" applications for stream computing are those where devices produce large amounts of instrumentation data on a regular basis. That data is difficult for humans to interpret easily and is likely to be too voluminous to store in a database somewhere. Examples of data well-suited for stream computing include healthcare, weather, telephony, stock trades... you get the idea.

By analyzing large streams of data and looking for trends, patterns, and "interesting" data, stream computing can solve problems that were not practical to solve using traditional computing methods. Another useful way of thinking about this is as RTAP - Real-Time Analytical Processing (as opposed to OLAP, On-Line Analytical Processing). 

The IBM product for stream computing is called InfoSphere Streams. It runs on xSeries blades (up to 125 x86 blades) using Linux. It is based on three main abstractions:

  1. The stream - bit pipes of data which can be subscribed to
  2. Operators - analytical calculation processors
  3. Topology - the integration of streams to operators

The data streams into the system, which is built as a series of cascading steps. Each step progressively refines the analysis, looking for information, patterns, trends, and diagnoses. Consider, for example, a law enforcement application with a stream of video surveillance data. Much of the stream will not be interesting. It becomes interesting when a person shows up in the video. So the operators would be analyzing the video, performing scene detection and face identification. When a face is found, that section of video can be captured and retained. And the face might even be matched automatically against a database of known criminals.
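To make the three abstractions concrete, here is a minimal sketch of the surveillance example in plain Python. It is not Spade and not the InfoSphere Streams API; the function names, the frame fields ("person", "face_id"), and the watchlist are all hypothetical, made up for illustration. Generators stand in for streams, each function is an operator, and the way they are chained is the topology.

```python
from typing import Iterable, Iterator

Frame = dict  # hypothetical record: one video frame plus metadata


def source(frames: Iterable[Frame]) -> Iterator[Frame]:
    """The stream: hands frames downstream one at a time, storing nothing."""
    for frame in frames:
        yield frame


def detect_person(stream: Iterator[Frame]) -> Iterator[Frame]:
    """Operator 1: drop the uninteresting frames (no person in view)."""
    for frame in stream:
        if frame.get("person"):
            yield frame


def identify_face(stream: Iterator[Frame], watchlist: set) -> Iterator[Frame]:
    """Operator 2: tag each remaining frame with a watchlist match flag."""
    for frame in stream:
        yield {**frame, "match": frame.get("face_id") in watchlist}


def build_topology(frames: Iterable[Frame], watchlist: set) -> Iterator[Frame]:
    """The topology: wire the stream through the operators in order."""
    return identify_face(detect_person(source(frames)), watchlist)


# Made-up sample data; a real deployment would read a live video feed.
frames = [
    {"t": 0, "person": False},
    {"t": 1, "person": True, "face_id": "A17"},
    {"t": 2, "person": True, "face_id": "B02"},
    {"t": 3, "person": False},
]
retained = list(build_topology(frames, watchlist={"B02"}))
```

Only the frames that survive every step are retained; everything else flows through and is discarded, which is the point of not persisting the raw stream.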

Another example: IBM and the University of Ontario Institute of Technology (UOIT) are using InfoSphere Streams to help doctors detect subtle changes in the condition of critically ill premature babies. The software ingests a constant stream of biomedical data, such as heart rate and respiration, along with clinical information about the babies. Monitoring "preemies" as a patient group is especially important because certain life-threatening conditions, such as infection, may be detected up to 24 hours in advance by observing changes in physiological data streams. Constantly monitoring the stream of healthcare data can enable many types of early diagnoses that would take medical professionals much longer to reach. For example, a rhythmic heartbeat can indicate problems (like infections); a normal heartbeat is more variable. Analyzing an ECG stream can highlight this pattern and alert medical professionals to a problem that might otherwise go undetected for a long period. Detecting the problem early can allow doctors to treat an infection before it causes great harm.
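The heartbeat idea can be sketched as a sliding-window operator. The Python below is only an illustration under stated assumptions: the function name, the five-beat window, and the 5 ms threshold are invented for the example and have no clinical basis, and this is not how the UOIT system is implemented. It flags stretches where beat-to-beat intervals become suspiciously regular.

```python
from collections import deque
from statistics import pstdev
from typing import Iterable, Iterator, Tuple


def low_variability_alerts(
    intervals_ms: Iterable[float],
    window: int = 5,            # assumed window size, illustrative only
    threshold_ms: float = 5.0,  # assumed threshold, not a clinical value
) -> Iterator[Tuple[int, float]]:
    """Yield (index, spread) when the last `window` intervals barely vary."""
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    for i, interval in enumerate(intervals_ms):
        recent.append(interval)
        if len(recent) == window:
            spread = pstdev(recent)  # population standard deviation
            if spread < threshold_ms:
                yield i, spread


# Made-up beat-to-beat intervals: variable at first, then nearly constant.
intervals = [400, 430, 390, 445, 410, 420, 415, 415, 416, 415, 415, 416]
alerts = list(low_variability_alerts(intervals))
```

Because the operator holds only the last few values in memory, it can run indefinitely over a live feed without storing the full ECG history.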

A stream computing application can get quite complex. Continuous applications, composed of individual operators, can be interconnected and operate on multiple data streams. Again, think about the healthcare example. There can be multiple streams (blood pressure, heart, temperature, etc.), from multiple patients (because infections travel from patient to patient), having multiple diagnoses.
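As a rough sketch of that complexity, the Python below merges several signals from several patients into one keyed stream and raises a ward-level alert when more than one patient looks abnormal at the same time. Everything here is an assumption made for illustration: the signal names, the "normal" ranges (which are not clinical values), and the two-patient rule.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple

# Hypothetical normal ranges, for illustration only; not clinical values.
NORMAL = {"heart_rate": (100, 180), "temperature": (36.5, 37.5)}

Reading = Tuple[str, str, float]  # (patient id, signal name, value)


def ward_alerts(readings: Iterable[Reading],
                min_patients: int = 2) -> Iterator[List[str]]:
    """Yield the flagged patients whenever enough are abnormal at once."""
    abnormal = defaultdict(set)  # patient -> signals currently out of range
    for patient, signal, value in readings:
        low, high = NORMAL[signal]
        if low <= value <= high:
            abnormal[patient].discard(signal)
        else:
            abnormal[patient].add(signal)
        flagged = sorted(p for p, signals in abnormal.items() if signals)
        if len(flagged) >= min_patients:
            yield flagged


# Made-up interleaved readings from two patients.
readings = [
    ("p1", "heart_rate", 150),
    ("p1", "temperature", 38.2),
    ("p2", "heart_rate", 140),
    ("p2", "temperature", 38.0),
    ("p1", "temperature", 37.0),
]
alerts = list(ward_alerts(readings))
```

The only state kept is the small per-patient set of out-of-range signals, so the cross-patient check runs as the readings arrive rather than as a later query against stored data.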

IBM's stream computing offerings and research are the result of more than 20 years of IBM information management expertise, five years of development by IBM Research, and more than 200 patents. The solution relies upon a new programming language called Spade (soon to be renamed) to express topologies and operators. Processing millions of data points per second and performing advanced analytics on that data can help to usher in a shift in the way we manage and deal with vast amounts of data.

It is all a part of what IBM is referring to as new intelligence for a smarter planet: systems are more instrumented, interconnected, and intelligent. And that will enable organizations to better focus on value, exploit opportunities more effectively, and change and move more quickly.

The future is here and it might be time to re-think the way we do business... by joining the stream.

Published Tuesday, October 27, 2009 11:05 PM by cmullins

About cmullins

Craig S. Mullins is a data management strategist for NEON Enterprise Software. Craig has extensive experience in the field of database management, having worked as an application developer, a DBA, and an instructor with multiple database management systems, including working with DB2 for z/OS since Version 1. Craig is also an IBM Information Champion and is the author of two books: "DB2 Developer’s Guide" and "Database Administration: Practices and Procedures."

Be sure to visit my web site at http://www.craigsmullins.com