I am still at the IBM Information on Demand conference in Las Vegas, and today IBM briefed me on its stream computing solution, InfoSphere Streams. I mentioned this briefly in yesterday's blog posting about analytics, but today I want to get into the topic in much more depth.
So what is stream computing? Basically, it is the ingestion of data (structured or unstructured) from arbitrary sources, and the processing of that data without necessarily persisting it. Any digitized data is fair game for stream computing. As the data streams in, it is analyzed and processed in a problem-specific manner. The "sweet spot" applications for stream computing are those in which devices regularly produce large amounts of instrumentation data. Such data is difficult for humans to interpret and is likely to be too voluminous to store in a database. Examples of data well-suited for stream computing include healthcare monitoring, weather, telephony, and stock trade data... you get the idea.
By analyzing large streams of data and looking for trends, patterns, and "interesting" data, stream computing can solve problems that were not practical to solve using traditional computing methods. Another useful way of thinking about this is as RTAP - Real-Time Analytical Processing (as opposed to OLAP, On-Line Analytical Processing).
The IBM product for stream computing is called InfoSphere Streams. It runs on xSeries blades (up to 125 x86 blades) using Linux. It is based on three main abstractions:
- Streams: bit pipes of data that can be subscribed to
- Operators: processors that perform the analytical calculations
- Topology: the way streams are connected to operators
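To make the three abstractions concrete, here is a minimal Python sketch. It is not InfoSphere Streams or Spade code; it simply assumes a stream can be modeled as a Python iterator of records, an operator as a function that consumes one stream and produces another, and a topology as the ordered wiring of operators onto a stream. All names (`source`, `threshold`, `tag`, `topology`) are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical model for illustration only; not the InfoSphere Streams API.
Record = dict
Operator = Callable[[Iterable[Record]], Iterator[Record]]


def source(readings: Iterable[float]) -> Iterator[Record]:
    """Stream: wrap raw readings as records tagged with a sequence number."""
    for seq, value in enumerate(readings):
        yield {"seq": seq, "value": value}


def threshold(limit: float) -> Operator:
    """Operator factory: pass through only records whose value exceeds `limit`."""
    def op(stream: Iterable[Record]) -> Iterator[Record]:
        for rec in stream:
            if rec["value"] > limit:
                yield rec
    return op


def tag(label: str) -> Operator:
    """Operator factory: annotate each record with a label."""
    def op(stream: Iterable[Record]) -> Iterator[Record]:
        for rec in stream:
            yield {**rec, "label": label}
    return op


def topology(stream: Iterable[Record], *operators: Operator) -> Iterable[Record]:
    """Topology: connect a stream to a chain of operators, in order."""
    for op in operators:
        stream = op(stream)
    return stream


alerts = topology(source([1.0, 5.0, 2.0, 7.5]), threshold(4.0), tag("high"))
print(list(alerts))
```

Because every stage is a generator, records flow through one at a time and nothing is persisted along the way, which mirrors the "process without storing" idea in spirit.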
The data streams into the system, which is built as a series of progressive, cascading steps. Each step further refines the analysis, looking for information, patterns, trends, and diagnoses. Consider, for example, a law enforcement application that analyzes a stream of video surveillance data. Much of the stream will not be interesting; it becomes interesting when a person shows up in the video. So the operators would analyze the video, performing scene detection and face identification. When a face is found, that section of video can be captured and retained, and the face might even be matched automatically against a database of known criminals.
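The cascade can be sketched the same way. In this hypothetical Python example, frames are plain dicts with made-up fields (`t`, `motion`, `face`) standing in for real video analysis. A cheap check runs on every frame, the more expensive detector runs only on frames that pass it, and only a small bounded buffer of recent frames is held in memory so that the segment leading up to a detection can be retained while everything else is discarded.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List

Frame = dict  # hypothetical frame record, e.g. {"t": 0, "motion": 0.1, "face": False}


def capture_segments(
    frames: Iterable[Frame],
    cheap_check: Callable[[Frame], bool],
    expensive_check: Callable[[Frame], bool],
    context: int = 2,
) -> Iterator[List[Frame]]:
    """Two-stage cascade that retains only 'interesting' video segments.

    Keeps at most `context` preceding frames in memory; when the detector
    fires, yields those frames plus the triggering frame as one segment.
    """
    recent: deque = deque(maxlen=context)  # bounded: old frames fall off
    for frame in frames:
        # `and` short-circuits, so expensive_check only runs on frames
        # that already passed cheap_check.
        if cheap_check(frame) and expensive_check(frame):
            yield list(recent) + [frame]
            recent.clear()  # start the next segment's context fresh
        else:
            recent.append(frame)


raw = [(0.0, False), (0.1, False), (0.9, False), (0.8, True), (0.0, False)]
frames = [{"t": i, "motion": m, "face": f} for i, (m, f) in enumerate(raw)]
segments = list(capture_segments(frames, lambda f: f["motion"] > 0.5, lambda f: f["face"]))
print([[f["t"] for f in seg] for seg in segments])
```

Here only frames 1 through 3 are kept; the rest of the stream is never stored.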
Another example: IBM and the University of Ontario Institute of Technology (UOIT) are using InfoSphere Streams to help doctors detect subtle changes in the condition of critically ill premature babies. The software ingests a constant stream of biomedical data, such as heart rate and respiration, along with clinical information about the babies. Monitoring "preemies" as a patient group is especially important because certain life-threatening conditions, such as infection, may be detected up to 24 hours in advance by observing changes in physiological data streams. Constantly monitoring the stream of healthcare data can enable many types of early diagnoses that would take medical professionals much longer to reach. For example, an unusually rhythmic heartbeat can indicate problems (like infections), whereas a normal heartbeat is more variable. Analyzing an ECG stream can highlight this pattern and alert medical professionals to a problem that might otherwise go undetected for a long period. Detecting the problem early can allow doctors to treat an infection before it causes great harm.
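One simple way to express the "too rhythmic" heartbeat pattern is a sliding-window variability check over beat-to-beat (RR) intervals: if their standard deviation over the last few beats falls below a threshold, raise an alert. This is an illustrative Python sketch, not the actual UOIT or IBM algorithm; the function name, the window size, and the 10 ms threshold are hypothetical placeholder values, not clinical guidance.

```python
from collections import deque
from statistics import pstdev
from typing import Iterable, Iterator, Tuple


def low_variability_alerts(
    rr_intervals_ms: Iterable[float],
    window: int = 5,
    min_stdev_ms: float = 10.0,
) -> Iterator[Tuple[int, float]]:
    """Yield (index, stdev) whenever the last `window` beat-to-beat intervals
    are suspiciously uniform, i.e. their population standard deviation drops
    below `min_stdev_ms`. Thresholds are illustrative placeholders only."""
    recent: deque = deque(maxlen=window)  # only the latest beats are kept
    for i, rr in enumerate(rr_intervals_ms):
        recent.append(rr)
        if len(recent) == window:
            sd = pstdev(recent)
            if sd < min_stdev_ms:
                yield i, sd


variable = [400, 430, 390, 450, 410]  # healthy-looking variation
flat = [420, 421, 420, 419, 420]      # suspiciously regular beats
for index, sd in low_variability_alerts(variable + flat):
    print(f"beat {index}: stdev {sd:.2f} ms")
```

With these made-up numbers, alerts fire at beats 8 and 9, once the window is dominated by the regular intervals.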
A stream computing application can get quite complex. Continuous applications, composed of individual operators, can be interconnected and can operate on multiple data streams. Again, think about the healthcare example. There can be multiple streams (blood pressure, heart rate, temperature, etc.) from multiple patients (because infections travel from patient to patient), each with multiple possible diagnoses.
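Handling many patients and many signals in one application usually means keeping separate state per stream. The hypothetical Python sketch below assumes all readings arrive interleaved as `(patient_id, signal_name, value)` tuples and keeps an independent sliding window for each (patient, signal) pair, emitting a rolling mean once that window fills. The event schema and function name are assumptions for illustration, not part of InfoSphere Streams.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical event schema: (patient_id, signal_name, value)
Event = Tuple[str, str, float]


def keyed_rolling_mean(
    events: Iterable[Event], window: int = 3
) -> Iterator[Tuple[str, str, float]]:
    """Maintain a separate sliding window per (patient, signal) pair within
    one interleaved event stream; yield the rolling mean once a window fills."""
    windows: Dict[Tuple[str, str], Deque[float]] = defaultdict(
        lambda: deque(maxlen=window)
    )
    for patient, signal, value in events:
        buf = windows[(patient, signal)]
        buf.append(value)
        if len(buf) == window:
            yield patient, signal, sum(buf) / window


events = [
    ("p1", "hr", 150), ("p2", "hr", 160),
    ("p1", "hr", 152), ("p2", "hr", 170),
    ("p1", "hr", 154), ("p2", "hr", 180),
]
print(list(keyed_rolling_mean(events)))
```

Keying state this way lets a single operator serve any number of patients and signals without their readings interfering with each other.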
IBM's stream computing offerings and research are the result of more than 20 years of IBM information management expertise, five years of development by IBM Research, and more than 200 patents. The solution relies on a new programming language, called Spade (soon to be renamed), to express topologies and operators. By processing millions of data points per second and performing advanced analytics on them, stream computing can help usher in a shift in the way we manage and deal with vast amounts of data.
It is all part of what IBM refers to as new intelligence for a smarter planet: systems are more instrumented, interconnected, and intelligent. That will enable organizations to better focus on value, exploit opportunities more effectively, and change and move more quickly.
The future is here and it might be time to re-think the way we do business... by joining the stream.