Neon Enterprise Software Blog


Data Management Today by Craig Mullins

News, views, and issues involved in managing data as a valuable corporate asset.

  • What is Generosity Factor?

    It seems that there is some confusion “out there” regarding the generosity factor associated with using a zIIP specialty processor… so I thought I’d try to clarify things in today’s blog entry.

    If you are unacquainted with specialty processors, you really should take some time to learn a bit more about them before you read on here. If you are interested, you can find a nice introductory piece here on my blog. At any rate, when you activate your zIIP processor, some percentage of the relevant workload can be redirected off of the main CP onto the zIIP – but not 100% of the workload. This can be frustrating for those unaccustomed to running with specialty processors. Even with zIIPs installed, not all of the workload that could run on the zIIP will be redirected to it – only a percentage of it. This is what is referred to as the IBM “generosity factor.”

    To understand the generosity factor, we must first define some “new” terminology: qualified and eligible work. “Qualified work” is anything that can run on the specialty processor (in this case the zIIP). “Eligible work” is anything that is actually marked as dispatchable to a zIIP. Just because work is qualified does not make it eligible.

    When you are reviewing your performance reports, zIIP qualified time will be a larger number than zIIP eligible time. Qualified time is the total time that could have run on the zIIP (because it is a distributed DB2 request, XML, a parallel request, or other enclave SRB work). Eligible time is the portion of that work that IBM marks as eligible to actually run on the zIIP processor. This does not mean it actually ran on the zIIP; perhaps the zIIP was running at capacity and IIPHONORPRIORITY was set to YES. In that case zIIP eligible workload can run on a general purpose CP.

    The IBM generosity factor, then, can be calculated by dividing zIIP eligible time by zIIP qualified time. Although IBM does not publish its generosity factor, testing has shown it to be about 55%. (Is that generous or stingy? Maybe it should be the stinginess factor instead, huh?)
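
    As a rough illustration of that division, here is a minimal Python sketch. The function name and the sample times are hypothetical, made up for this example; they do not come from any real performance report or IBM interface.

```python
def generosity_factor(eligible_seconds: float, qualified_seconds: float) -> float:
    """Return zIIP eligible time divided by zIIP qualified time."""
    if qualified_seconds <= 0:
        raise ValueError("qualified time must be positive")
    if not 0 <= eligible_seconds <= qualified_seconds:
        raise ValueError("eligible time must be between 0 and qualified time")
    return eligible_seconds / qualified_seconds


# Hypothetical figures chosen to match the roughly 55% seen in testing.
factor = generosity_factor(eligible_seconds=550.0, qualified_seconds=1000.0)
print(f"Generosity factor: {factor:.0%}")
```

    The range check simply encodes the point made above: eligible time can never exceed qualified time.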

    All of the above discussion really does not take into account ISV workload that gets moved to a zIIP. But, really, this type of workload doesn’t come into play when speaking about a generosity factor. ISVs that zIIP enable workload using enclave SRBs typically are not going to throttle the redirection of their workload like IBM does. Although the API provides an option to set a “generosity factor” it is uncommon for it to be set to anything other than 100 percent. Of course, that does not mean that 100 percent of the vendor product gets zIIP enabled (more on that in a moment).

    When specialty processors are “in the mix,” monitoring CPU time is not as straightforward as before. The CPU your machine consumes will fall into four areas:

    1. General purpose CP work running on the CP
    2. zIIP-qualified work not marked as zIIP-eligible, so it ran on a general purpose CP. This is the work that fell outside of the generosity factor.
    3. zIIP-eligible work that ran on a zIIP. This is the work that actually ran on the zIIP.
    4. zIIP-eligible work that overflowed to a general purpose CP. This is the work that could have run on the zIIP, but did not due to capacity issues.
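
    The four areas above can be sketched as a small Python data structure. This is only an illustration under my own assumptions: the class name, field names, and sample numbers are hypothetical and do not correspond to fields in any particular monitor or report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuBreakdown:
    """CPU seconds split into the four areas listed above (hypothetical names)."""
    cp_only: float                 # 1. general purpose work on a CP
    qualified_not_eligible: float  # 2. zIIP-qualified, not eligible, ran on a CP
    eligible_on_ziip: float        # 3. zIIP-eligible, ran on the zIIP
    eligible_on_cp: float          # 4. zIIP-eligible, overflowed to a CP

    @property
    def eligible(self) -> float:
        # Eligible work is areas 3 and 4, wherever it ended up running.
        return self.eligible_on_ziip + self.eligible_on_cp

    @property
    def qualified(self) -> float:
        # Qualified work is everything that could have run on the zIIP.
        return self.qualified_not_eligible + self.eligible

    @property
    def general_purpose_total(self) -> float:
        # Everything that actually consumed general purpose CP time.
        return self.cp_only + self.qualified_not_eligible + self.eligible_on_cp

    @property
    def generosity_factor(self) -> float:
        return self.eligible / self.qualified if self.qualified else 0.0


# Made-up sample numbers, for illustration only.
sample = CpuBreakdown(cp_only=4000.0, qualified_not_eligible=450.0,
                      eligible_on_ziip=500.0, eligible_on_cp=50.0)
print(f"CP total: {sample.general_purpose_total:.0f}s, "
      f"generosity factor: {sample.generosity_factor:.0%}")
```

    Keeping area 4 separate from area 3 matters for capacity planning: it is work that was allowed onto the zIIP but still landed on a general purpose CP.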

    So What Does It All Mean?

    Well, a few things spring to mind after digesting all of this. First of all, in my opinion, this whole generosity factor thing is annoying. You bought some hardware (a zIIP processor) so why can’t it be fully exploited? In other words, isn’t it reasonable to expect the generosity factor to be 100 percent, at least most of the time? Yes, yes, if the zIIP is overburdened you’ll want to avoid sending more workload to it, but other than that, why wouldn’t you want to use it at its full capacity?

    I do not know of any ISV that has zIIP enabled their products and implemented a “generosity factor” of less than 100 percent using the API. But do not get confused by terminology. ISVs cannot achieve 100 percent product exploitation on zIIPs because a lot of the code cannot run in SRB mode -- that is, either too much of the functionality has already been written (and it would be too costly or difficult to re-write) or there are no SRB mode APIs to support what they want to do.

    Also, keep in mind that ISVs typically pay a performance penalty due to the increased path lengths from switching back and forth between SRB and TCB mode. This can actually increase the CPU consumption of the application after things get zIIP enabled, so be careful.

    Another implication of specialty processor adoption is that we need to spend a little bit more time on capacity planning. You’ll need to know what the generosity factor is for all of the software you’ll be using that runs on zIIPs (and zAAPs, for that matter). Of course, the amount of work redirected to zIIPs for most system management type products will be inconsequential in the grand scheme of your overall workload.

    Think about it this way: you’ve got an ISV product for loading DB2 data. The ISV has zIIP enabled the utility and that is great but… how frequently are you actually loading data into your DB2 tables? What percentage of your overall workload is that? So how much money is it actually going to save you? And it is all about saving money, isn’t it? The more you can redirect to run on specialty processors, the more money you save.

    Now assume that 75 percent of your workload is DB2 related… and 15 percent of that is distributed requests, which can be redirected to zIIPs. Right off the bat you’d think that about 11.25 percent of your workload could run on the zIIP (100 x .75 x .15 = 11.25), but you’re forgetting the “generosity factor” of 55 percent, which brings it down to about 6 percent (100 x .75 x .15 x .55 = 6.1875).
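
    The same back-of-the-envelope arithmetic can be written as a short Python sketch, which makes it easy to plug in your own shop’s numbers. The function and parameter names are hypothetical; the percentages are the ones assumed in the paragraph above.

```python
def ziip_share_percent(db2_share: float,
                       distributed_share: float,
                       generosity_factor: float = 1.0) -> float:
    """Estimate the percentage of total workload that lands on the zIIP.

    All three inputs are fractions between 0 and 1.
    """
    for value in (db2_share, distributed_share, generosity_factor):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be fractions between 0 and 1")
    return 100 * db2_share * distributed_share * generosity_factor


# Naive estimate, ignoring the generosity factor.
naive = ziip_share_percent(0.75, 0.15)
# Estimate after applying the roughly 55 percent generosity factor.
realistic = ziip_share_percent(0.75, 0.15, 0.55)

print(f"Naive estimate:     {naive:.2f} percent")
print(f"With 55% generosity: {realistic:.4f} percent")
```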

    So the bottom line here is that you should test, on your own systems, any product claiming to exploit specialty processors… only then can you really determine how well it exploits them, as well as any potential benefit you might gain by using it.

  • Mainframes: The Safe IT Career Choice

    A recent Computerworld article (Bank of America touts mainframe work as a safe career) touts the mainframe as a safe haven for those considering a career in IT. This is an interesting article because the usual spiel you hear in industry trade rags is that the mainframe is dying and only a fool would work on such a platform. It is good to hear an alternate opinion on the matter in a journal as respected as Computerworld. (Of course, the fact that I agree with this opinion might have a little something to do with my cheer upon reading the article.)

    One of the highlights of this particular article is the discussion of available mainframe jobs at sites such as Monster (764 jobs over 30 days) and Dice.com (1,200 ads over 30 days). These are significant numbers of jobs, especially in a down economy.

    Another interesting tidbit from this piece is that "IBM says its mainframe revenue has grown in eight of the last 13 quarters." This is impressive considering the difficult server market, coupled with the impression that the platform is dying.

    Speaking of the death of the mainframe, don't you believe it for a minute. People have been predicting the death of the mainframe since the advent of client/server in the late 1980s. That is more than 20 years! Think of all the things that have died in that timespan while the mainframe keeps on chugging away: IBM's PC business, Circuit City, Koogle peanut butter, public pay phones, Johnny Cash... the list is endless.

    Some may counter that they recall reading about companies that were going to eliminate their mainframe. Well, yes, I'm sure you do remember those; I do, too. But do you recall reading many articles about companies that SUCCESSFULLY eliminated their mainframes? Many tried, few succeeded. Indeed, the re-Boot Hill web site provides examples of companies that tried to eliminate the mainframe but could not (hence, they had to re-boot). If you follow the link to the re-Boot Hill site, click on the little tombstones to read the stories of failure.

    So, the mainframe is a rock-solid platform, continues to grow, and is producing a significant number of job opportunities... what is not to like?

  • It's 2010! Time for New Year’s Resolutions for DBAs.

    At the beginning of every year many of us take the time to cobble together some resolutions for the coming year. We plan to lose weight, save money, stop smoking, and so on. Usually, it doesn’t take long before we’ve abandoned these resolutions. Perhaps we’d be wiser to make some business related resolutions. With that in mind, here are some thoughts on the New Year’s resolutions you might be wise to make as a DBA in 2010.

    Are you insatiably curious? A good DBA must become a jack-of-all-trades. DBAs are expected to know everything about everything -- at least in terms of how it works with databases. From technical and business jargon to the latest management and technology fads, the DBA is expected to be "in the know." So perhaps “be more curious” would be a useful DBA resolution.

    Most DBAs know that private time is a luxury we cannot afford. A DBA must be prepared for interruptions at any time to answer any type of question -- and not just about databases, either. With that in mind, how are your people skills? DBAs are usually respected as database gurus, but also frequently criticized as curmudgeons with limited people skills. Just about every database programmer has his or her favorite DBA story. You know, those anecdotes that begin with "I had a problem..." and end with "and then he told me to stop bothering him and read the manual." DBAs simply do not have a "warm and fuzzy" image. However, this perception probably has more to do with the nature and scope of the job than with anything else. The DBMS spans the enterprise, effectively placing the DBA on call for the applications of the entire organization. As such, you will interact with many different people and take on many different roles. To be successful, you will need an easy-going and somewhat amiable manner. So another good New Year’s resolution might be to “improve your people skills.” Take a Dale Carnegie course or start by reading Carnegie’s seminal book, How to Win Friends and Influence People.

    How adaptable are you? A day in the life of a DBA is usually quite hectic. The DBA maintains production and test environments, monitors active application development projects, attends strategy and design meetings, selects and evaluates new products, and connects legacy systems to the Web. And, of course, Joe in Accounting just resubmitted that query from hell that's bringing the system to a halt. Can you do something about that? All of this can occur within a single workday. You must be able to embrace the chaos to succeed as a DBA. So a third resolution might be to “roll with the punches” better – and without complaining!

    Of course, you need to be organized and capable of succinct planning, too. Being able to plan for changes and implement new functionality is a key component of database administration. And although this may seem to clash with the need to be flexible and adaptable, it doesn't really. Not once you get used to it. You just need to prepare yourself to be adaptable and organized so that you can incorporate change more rapidly than others. So my final suggestion for a 2010 New Year’s resolution is to adopt a planning methodology and stick to it. Buy a planner – either electronic or not – and use it this year. You might even consider taking a time management class.

    If you keep all of these resolutions, just imagine how productive you will be in 2010. And then you can use 2011 to lose weight and save money and…

  • The Constantly Changing Role of the DBA

    One of the biggest challenges I see these days for DBAs is the ongoing redefinition of the job roles and responsibilities. Oh, most people know the rudimentary aspects of the job, namely keeping your organization's databases and applications running up to par. The DBA has to be the resident DBMS expert (whether that is DB2, Oracle or SQL Server, or most likely a combination of those). He or she has to be able to solve thorny performance problems, ensure backups are taken, recover and restore data when problems occur, make operational changes to database structures and, really, be able to tackle any issue that arises that is data-related.

    All of these roles continue to be requirements of the job, but that is no longer sufficient for most organizations. The DBA is expected to take on numerous additional -- mostly technical -- roles. These can include application development, managing the application server, enterprise application integration, managing Web services, network administration and more.

    Indeed, I would guess that if you compare the job description of DBAs across several organizations, no two of them would match exactly. This is both good and bad. It is good because it continually challenges the technically minded employees who tend to become DBAs. But it can be bad, too; because the job differs so much from company to company, it becomes more difficult to replace a DBA who leaves or retires. And no one can deny that database administration is a full-time, stressful job all on its own. But the stress level just keeps increasing as additional duties get tacked onto the DBA's "to do" list.

  • The History of ERP

    I was recently introduced to this "wiki" timeline that covers the history of manufacturing & ERP software. It is nicely done with useful information, so I thought I would share it with my readers.

    The timeline covers 17 key events that have shaped the ERP software industry over the last 50 years. It's a quick way to understand the context of current enterprise software events.

    Of course, 50 years is a long time and there are gaps in the coverage. The originator of this timeline tells me that the intent is to grow this to 35+ events and dates… with the help of industry professionals. So I hope today's blog entry helps to get out the word.

    Please share your ideas and suggestions for improving this ERP Wiki.

  • Cleaning Out My Closet (Predicting the Future, Part 4)

    As I continue to pore over old magazines and documents that were clogging my closet, I keep on finding interesting things. Yes, I am still, slowly, methodically cleaning out the closet in my home office. It was quite a mess - and stuffed with many articles, research notes, and things that I just couldn't part with over the years. Now, the case can be made that I cannot get rid of some of these things because I'm writing about them... and you never know when I might have to refer back to some of this material. Just what I needed… another excuse to hang onto things!

    Of course, I am throwing out a lot of stuff, too. For example, that binder of PC standards from the mid-90s? Trashed. And I threw out a bunch of old parallel printer cables (not all of them, mind you, but most of them)... after all, I am still a packrat, just a cleaner one now!

    There continue to be some interesting things I’m finding in my closet, too. Regular readers know that I have outlined some of them in recent posts here; and today I am going to subject you to a few more.

    I came across a research note from Gartner Group on the value of training IS professionals (from 1995, hence the IS instead of IT). In it, Gartner notes that the per-employee cost of retraining a programmer on a new development paradigm (that was a big word back then) is almost $7,000 over 25 days. I wonder if this still holds true. The note goes on to discuss the hidden costs, too: things like productivity loss, the cost of the underground support network, and diminished service levels. I think this is all still very valid. The more things change, the more they stay the same... Of course, you have to re-train your staff on new development methods, right? If not, we'd all still be coding COBOL.

    Another interesting Gartner Group research piece I pulled out of an old pile of papers was on the topic of metadata (from 1994). The note basically described the function of metadata for data transformation, data delivery, and informational purposes. It closed by stating that organizations need to recognize the importance of metadata to achieve success. And this is still true today. Indeed, there is somewhat of a renaissance of metadata management occurring in businesses today as they struggle to cope with growing mounds of data and stringent regulatory requirements. That is (sort of) what data governance is all about.

    And here is a white paper on how to inspire an audience as you give a presentation. With all the presentations I give each year I understand why I kept this. According to Carmine Gallo, author of Fire Them Up!, there are 7 keys for inspiring an audience, and they form an acronym that reads INSPIRE:

    1. Ignite Your Enthusiasm
    2. Navigate the Way
    3. Sell the Benefit
    4. Paint a Picture
    5. Invite Participation
    6. Reinforce Optimism
    7. Encourage Potential

    Finally, at least for today, I came across a 1997 research note on the Repository market. This research note talked about the consolidation that was going on in that space back then: Brownstone and Reltech acquired by Platinum technology; R&O acquired by Viasoft. Interestingly, neither of the acquirers exists as an independent software vendor today: Platinum was acquired by CA and Viasoft was acquired by ASG. And today the big Repository in the sky concept is basically dead (or dying).

    I’ve written about this before; check it out here if you’d like - The Importance of Metadata, Part 3.

    I'm taking a break from the cleaning effort the next couple of days, but I doubt this will be the last blog post on this topic as I continue to review the "stuff" that comes out of my closet. So check back regularly for more…

  • Predicting The Future, Part 3

    As I continue the task of cleaning my home office I keep stumbling across stuff that attempted to predict the future. As readers of my first two posts in this series will know, predicting the future can be a messy business. Nobody can see into the future, no matter what some fortune tellers say. And technology changes quickly and sometimes something revolutionary will show up and render your prediction (guess) irrelevant. As an example, few people in the late 1980s foresaw the Internet revolution. Or blogs like this one, for that matter!

    I try to avoid predicting the future; every now and then I may alert readers about some technology that I think will be disruptive (e.g. RFID, stream computing), but I don’t usually try to predict what software or companies will succeed or perish.

    Part of reorganizing my office is managing the hundreds, if not thousands, of technical books that I have accumulated. While doing this, I happened across an old book written by John C. Dvorak called Dvorak Predicts. I have enjoyed Dvorak's columns for years, as well as his web show CrankyGeeks. (By the way, does anyone know why I can't get CrankyGeeks on my TiVo any more?)

    John is entertaining and very knowledgeable. But as I flipped through his book it became evident that his powers of prediction are poor. And why wouldn’t they be? No one knows what tomorrow will bring...

    Now I don’t do what I am about to do to discredit Dvorak specifically, but to alert everyone that even the most talented writers and pundits have a hard time predicting the future. Here, direct from his book, Dvorak Predicts (published in 1994 by McGraw-Hill, ISBN 0-07-881981-4), are a few real stinker predictions (in bold with page number), followed by my comments (in italics):

    We can expect IBM to someday quit the mainframe business just as it quit the scientific computer business. (page 20)

    Although this prediction may come true some day, it won't be any time soon. Here it is, 15 years later, and the mainframe is still viable and still a core offering from IBM. Why didn’t he predict that IBM would quit the PC business (which it did in 2004)?

    Voice recognition will be the killer application of the 1990s. (page 21)

    Didn’t happen. Oh, IBM and Dragon had some voice recognition applications that sold in so-so amounts, but killer app? Nope... and it still isn't widely adopted.

    Microsoft will open stores. (page 64)

    Never happened; good thing for Microsoft, too, since other companies that opened stores didn’t fare all that well (e.g. Gateway).

    Furthermore, Dvorak predicted that Unicode would lead to the death of ASCII by 1995. Well, in the long term I’m sure that prediction will come true, but the “by 1995” part didn't! Did he ever really believe that? I mean, even though Unicode is important today, ASCII is still around...

    And a few things that Dvorak does not even mention include the World Wide Web, Java, and spam. One would think that a prescient prognosticator would foresee these three facts of everyday life in the world of IT. But no...

    Of course, Dvorak did get some things absolutely correct. For example: "Piracy will increase despite efforts to stop it." He wrote that prediction about software piracy but it is absolutely applicable today with regard to media, especially music and movies. And he said that "Gerstner will be good for IBM," which he undoubtedly was. And he predicted a rosy future for recordable CD and optical media (then again, back then, who wasn’t predicting that?)

    A lot of the predictions (and there are many more than are mentioned here) strike me, here in the future, as of the "who cares" variety. By that I mean, I would expect a useful book of predictions to make predictions about things that matter in the future, and not about things that are just dust bunnies from the past (e.g. Apple Newton, OS/2). And he never foresees the Palm or smartphones; he just calls the basic idea behind the Apple Newton "stupid" (page 83).

    So, what good is a book of predictions if the majority of them don’t come true? Exactly! No dang good at all -- other than (maybe) as an amusing read (or as blog fodder). The book is out of print, but if you find a copy you might consider shelling out the couple pennies or so it takes to buy it these days. And here in the future, 15 years later, you might get a chuckle out of some of it... or at least enjoy reading it as an historical piece on an interesting time in the history of computers -- that being, the timeframe right before the Web exploded.

    I hope my readers are enjoying reading this series on "predicting the future" as much as I am enjoying writing it... please post a comment with your thoughts and check back again soon as I continue cleaning out my closet...

  • But When They're Right... (Predicting The Future, Part 2)

    In a recent blog entry I pointed out several wrong and/or suspect predictions made by industry pundits. But they're not always wrong. Some predictions seem so simple that anyone could have made them, at least sitting here in the future it can seem that way… but sometimes futuristic soothsaying can be correct, and quite useful to know.

    While I was cleaning out my closet I came upon the 1996 conference proceedings of Gartner's Getting Your Data in Shape event. I thought it might be interesting to see what Gartner was saying back then, and what has transpired since. And guess what? Gartner did a very good job predicting the future. Now I don't want to be seen as a shill for Gartner - and sometimes I'm sure they are not as prescient as they were for this particular event - but having a dedicated team of analysts investigating a field and making informed, reasonable projections can be helpful to IT organizations.

    In the next few paragraphs I will examine some of the forward-looking projections I found in this set of proceedings. Keep in mind that Gartner applies a probability to its predictions -- sort of a way of hedging its bets. Generally speaking, the Gartner analyst feels very comfortable about anything with a probability over 0.7... and a 0.9 is almost a certainty.

    For the next ten years, the extended relational model will remain the most viable platform for generic data management (0.8 probability).

    This prediction was spot-on. The extended relational products (DB2, Oracle, SQL Server, and to a lesser extent Sybase) are the most commonly used DBMS products for most varieties of data management activities. You might scoff at this prediction and say something like "that was an easy call" - but to be confident of such a prediction over a 10 year span is impressive. And OO was threatening to be a big disruptive force in DBMS back then.

    Although repository products offer help with the data warehouse metadata issues, by 2000, vendors will not provide a robust solution until distributed repository architectures with specialized "subrepositories" are delivered (0.8 probability)

    Not bad here either. Remember, this planning assumption was made during the age of the big "R" Repository - Platinum technology was touting their Repository (which was a merge of the Brownstone and RelTech repositories), Microsoft was making noise about theirs, and repositories were being viewed as the glue that holds the IT infrastructure together. Of course, the big "R" repository has basically died and no real "robust" solution has ever been delivered. CMDB seems to have supplanted repository as the term du jour in this area.

    Specialty data warehouse DBMS products will provide support for predictable queries appropriate for specific DSS applications only, and thus will fail as an enterprise data warehouse by 1997 (0.8 probability)

    For the most part, this has come true, too. The specialty DBMS products (e.g. Red Brick and Arbor) are either gone or surviving only on the periphery.

    Organizations should plan to spend at least 2.5 times their legacy systems management budgets on distributed systems management through 2001 (0.8 probability)

    This statement was made at a time when the general consensus was that client/server implementations would be a lot less costly than mainframe implementations. And though Gartner doesn't use these terms specifically, they are basically saying that management of distributed systems is complex and costly. Another good call.

    The maximum-sized database in production will always outpace the maximum manageable database size.

    This one does not have a probability because it was drawn from a chart that graphed database size versus manageability. I've been quoting this nugget of information in my presentations for some time now. And it is true. There is always an implementation out there that is straining against the limits of what can be done today - and managed today. I think it means something like this: it would take longer to recover this database than to rebuild it from scratch so we won't back it up. But once again, a good piece of knowledge from Gartner.

    Overall, the information in these proceedings would have been very worthwhile to an IT department back in 1996 as it worked to establish budgets and plans for the future. So, skepticism is a virtue, but sometimes analytical "guesses" can be helpful... just don't bet your career on them!

  • Wordle Graphic of My Blog

    Wordle: DMToday Blog Wordle

    The image that appears in this blog posting is a “word cloud” that was generated automatically by Wordle. You just feed Wordle an RSS feed or a list of words and it produces "word cloud" art out of it.

    It is somewhat similar to the tag cloud that appears in the sidebar of this blog, but it is based on all of the text in all of my postings, not just the tags I specify for each post. The Wordle "word cloud" gives greater prominence to words that appear more frequently in the source text. So it looks like I focused mostly on data, DBMS, database, enterprise software, and open source... which is very close to my initial intentions for this blog.

    Kinda neat, huh? If you want to see a bigger version of this Wordle, just click on the image.
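
    For the curious, the basic idea of giving greater prominence to more frequent words can be sketched in a few lines of Python. This is not Wordle’s actual code, which I have not seen; the function name, the tiny stop-word list, and the sample text are all my own assumptions for illustration.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stop-word list.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "is", "to", "in", "for"})


def word_weights(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n most frequent words with their counts.

    A word cloud would draw words with higher counts in a larger font.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


sample_text = "Data and the database: managing data in a DBMS is data management."
for word, count in word_weights(sample_text, top_n=3):
    print(f"{word}: {count}")
```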

  • Predicting the Future?

    Have you ever wanted to go back in time to review all of those confident pundit predictions to determine which came true and which, well… didn't?  I think about this especially at this time of year when all the web sites and IT publications start printing their annual wrap up articles with 10 Bold Predictions for 2010... or 5 Things That Will Change Next Year!

    Just recently I’ve been cleaning out the closet in my home office again. This is where I regularly store those treasures that I think I might want to refer to again, but usually never do. So the clutter starts to grow and grow and eventually, if I ever want to step inside the closet ever again (or save anything new in there), I have to go through that “stuff” and get rid of some of it.

    As I started the process I noticed many interesting things lurking about. How about IBM redbooks from the DB2 V2 era (yes, mainframe DB2)? And there were some ancient manuals on VSAM and the IBM Repository and tons of proceedings books from industry conferences. If you have attended any conferences lately you know that they no longer hand out printed materials (and many no longer even hand out CDs). So, yes, these proceedings “books” are from the early 90s (from IDUG, the DB2 Technical Conference, Gartner Symposium, and even Database and Client/Server World). I’m a packrat, so I’m going to keep these relics as mementos of days gone by… one day I may box them up and put them away in the attic instead of keeping them nearby, but that would take too much time right now.

    As I continued my excavation I came upon some old magazines, as well as research reports from industry analyst groups. These proved to be quite entertaining. As I read through some of this material I became even more adamant that it never pays to believe prognosticators -- even professional ones. Sure, reading this stuff when it is current can be helpful as a guide and maybe to give you something to think about… but it is not very wise to base your future plans on the predictions in these magazines and research reports.

    I’m going to point out some predictions that went awry. I will not attribute them to any particular publication or analyst group, though. (Because I don’t want to get into any online shouting matches or arguments.) Suffice it to say that all of the following come from respected sources. Here are some of my favorites:

    From an analyst/pundit in 1996 commenting on the DBMS market “Illustra now gives Informix a significant lead…Oracle’s greater market share and financial resources should compensate for its technical handicaps. No other vendor seems likely to mount a credible challenge to these two.” (My thoughts: At least this guy didn’t forget about IBM, errr, wait-a-minute, I mean at least this guy didn’t put Sybase in his top two.)

    On Message Oriented Middleware (MOM) from an analyst group in 1995 “(Our group) believes there are interoperability issues looming that may or may not be serious issues to users, depending on whether highly automatic message-delivery assurance, security, and traffic management are important at this early stage.” (My thoughts: Hmmm. Now what should I do with this sage advice? I “may or may not” ignore it.)

    A Y2K survey from a financial analyst in 1997 “Only 6 percent had made no plans in this area (Y2K)” (My thoughts: Here is an example where using this survey as input we might’ve been able to say, hey, looks like most everyone will fix this problem and we won’t need to build bunkers in Wyoming in case the world falls apart.)

    A 1997 magazine on the future of the PC “Your PC will become an appliance, not making nearly the same demands on your time as today’s PCs.” (My thoughts: This guy did NOT anticipate Vista… and I bet he has had to buy a new PC since then requiring a re-installation of everything. And oh, wait-a-minute, it is my Mom calling me because her CD drive no longer works and she can’t find anything she downloads. Yes, it is all so easy now.)

    I’m going to stop here for the time-being. But as I keep on cleaning out my closet I will post additional blog entries on this topic. So keep on checking back in.

     

  • A Quick Look at the Open Source DBMS Market

    The rapid acceptance and usage of Linux as a platform for enterprise computing has enlivened the Open Source community. The term “open source” refers to software that users are free to run, copy, distribute, study, change and improve. Often “open source” gets misinterpreted to mean free software. This is understandable, but the open source concept of free is closer to liberty than it is to no charge.

    Open Source software adheres to the following beliefs:

    • Users are free to run the program, for any purpose.
    • Users are free to inspect the actual source code of the program to determine how it works.
    • Users are free to modify and adapt the software to their specific needs.
    • Users are free to redistribute copies to whomever.
    • Users are free to release code improvements to the public, to benefit the whole community.

    LAMP architectures will be used for certain database applications and systems that meet a specific set of criteria. The acronym LAMP is commonly used as a shortcut to specify the most popular open source software. LAMP stands for Linux, Apache Web server, MySQL DBMS and the PHP/Python/Perl development languages. It is a collective of open source software that can be used to deploy applications with minimal cost, which is the intriguing part to most of its adopters.

    The "no cost" aspect of open source and LAMP is both a positive and a negative. It is positive for the obvious reasons; I mean, no one wants to spend money for something if they can avoid it, right? Of course, the "no cost" label only applies to the initial acquisition cost of the software, and then only maybe. Red Hat, MySQL and others sell distributions of the software to make implementation and management easier.

    Additionally, support is crucial. Do you really want to implement a mission-critical application on free LAMP software and then not have anyone to turn to if problems occur? In other words, who will support the software when you have issues? Companies exist that sell this support. But now if we are buying the software and support to go along with it, how much different is it than commercial software?

    At any rate, open source and LAMP will succeed for those projects that could not be cost-justified or are not affordable with commercial OS/DBMS/Web server software. As the projects gain acceptance in the company, additional support can be purchased to ensure the viability of the applications.

    But where are the limits of open source? I would not predict that a major insurance company, for example, would choose to implement its policy and customer system on "open source"/LAMP technologies. Such mission critical applications require the robust functionality and durability of commercial DBMS software. In some cases, though, the Linux operating system may be chosen for some of these “mission critical” applications. This is the case because Linux has had the time in the market to improve and garner a reputation for stability and functionality. It likely will take longer for open source DBMS products to garner a similar reputation. Today, open source DBMS software is being used in conjunction with the major enterprise DBMSs (Oracle, DB2, and SQL Server) and in SMBs that cannot cost-justify an enterprise DBMS.

    Interestingly, I think open source database technology is in the process of facing up to some problems today. The major open source DBMS products (MySQL, Firebird, PostgreSQL and Sleepycat/Berkeley DB) are mostly simpler to use than enterprise DBMS products because they do not have all the bells and whistles of the enterprise software. Over time, these are being added to the open source players. Triggers, stored procedures, integrity constraints and so on will make the open source DBMS products more complex to use, and, therefore, smaller organizations will have more difficulty deploying them rapidly. The software may be moving away from its sweet spot in terms of how and when it is implemented.

    Another issue being faced by the open source DBMS world is Oracle’s acquisitive ways. Of the four DBMS products mentioned above, one is already owned by Oracle, and when (if) Oracle’s acquisition of Sun is completed MySQL will be an Oracle product, too. What will the future of open source DBMS hold if the major players are gobbled up by the commercial DBMS companies? This is the question that the European Commission is asking as it questions whether to approve the Oracle acquisition of Sun.

    Unique in the open source DBMS market is Ingres, which began its life as a commercial product. Ingres was open sourced by CA (its previous owner) in May 2004. Somewhat different than other open source DBMS offerings, Ingres’ heritage enables it to deliver high-volume transaction processing, high availability, multi-platform support, and security for mission-critical application deployments. Believe it or not, there actually was a time, in the deep dark past, when Ingres outsold Oracle. If you are looking for a high quality open source DBMS you would do well to consider Ingres.

    And we cannot cover the open source DBMS market without discussing EnterpriseDB. Basically, EnterpriseDB is to open source DBMS (PostgreSQL) as Red Hat is to open source operating systems (Linux). EnterpriseDB offers subscription plans and support for the PostgreSQL DBMS. The company has earned a reputation for offering Oracle compatibility, touting the ability of its customers to replace the Oracle DBMS with PostgreSQL and not have to change its application code. EnterpriseDB seems to be succeeding where others have failed in the past. Anyone remember Great Bridge? Several years ago Great Bridge tried to duplicate for PostgreSQL what Red Hat did for Linux. They failed. But perhaps the time was not ripe for such an offering, whereas today it is?

    The bottom line is that there is a wealth of options if you are interested in using an open source DBMS product. But know what your needs are and what features are available in the open source DBMS products before diving headfirst into the open source waters.

  • Data Modeling Concepts Every DBA Should Know

    Organizations often force the DBA to take on the job of data modeling. That does not mean that DBAs are well trained in data modeling, nor does it mean that DBAs are best suited to take on the task of data modeling. Data administration (DA) separates the business aspects of data resource management from the technology used to manage data. When the DA function exists in an organization, it is more closely aligned with the actual business users of data. The DA group is responsible for understanding the business lexicon and translating it into a logical data model.

    That said, many organizations lump DA and DBA together into a DBA group. As such, the DA tasks usually suffer. One of these tasks is data modeling. You must learn to discover the entire truth of the data needs of your business. You cannot simply ask one user or rely upon a single expert because his or her scope of experience will not be comprehensive. The goal of a data model is to record the data requirements of a business process. The scope of the data model for each line of business must be comprehensive. If an enterprise data model exists for the organization, then each individual line-of-business data model must be verified against the overall enterprise data model for correctness. An enterprise data model is a single data model that comprehensively describes the data needs of the entire organization. Managing and maintaining an enterprise data model is fraught with many non-database-related distractions such as corporate politics and ROI that is hard to quantify.

    Data modeling begins as a conceptual venture. The first objective of conceptual data modeling is to understand the requirements. A data model, in and of itself, is of limited value. Of course, a data model delivers value by enhancing communication and understanding, and it can be argued that these are quite valuable. But the primary value of a data model is its ability to be used as a blueprint to build a physical database. When databases are built from a well-designed data model, the resulting structures provide increased value to the organization. The value derived from the data model exhibits itself in the form of minimized redundancy, maximized data integrity, increased stability, better data sharing, increased consistency, more timely access to data, and better usability. These qualities are achieved because the data model outlines the data resource requirements and relationships in a clear, concise manner. Building databases from a data model will result in a better database implementation because you will have a better understanding of the data to be stored in your databases.

    A data model can clarify data patterns and potential uses for data that would remain hidden without the data blueprint provided by the data model. Discovery of such patterns can change the way your business operates and can potentially lead to a competitive advantage and increased revenue for your organization.

    Data modeling requires a different mindset than requirements gathering for application development and process-oriented tasks. It is important to think "what" is of interest instead of "how" tasks are accomplished.

    • Think conceptual -- focus on business issues and terms.
    • Think structure -- how something is done is not important for data modeling. The things that processes are being done to are what is important to data modeling.
    • Think relationship -- the way that things are related to one another is important because relationships map the data model blueprint.

    As you create your data models, you are developing the lexicon of your organization's business. If you are a DBA with data modeling responsibilities I recommend that you find your way to a class, or at least pick up a few good books on the topic. The following books are quite good: Data Modeling Essentials by Graeme Simsion (Morgan Kaufmann, 2004); Mastering Data Modeling: A User-Driven Approach by John Carlis and Joseph Maguire (Addison-Wesley, 2001).

  • The High Cost of Enterprise Software

    I believe that enterprise software is too costly and something needs to be done about it. The software environment ten years from now needs to look completely different than it does today. Of course, that last statement will probably be true notwithstanding my observation about the cost of software simply because of technological advancement. For example, the user interface for computing devices will likely be more like the iPod Touch than like the current mouse-driven GUI. But that is not what I want to talk about today.

    No, today I want to rail on about the extreme cost of enterprise software -- the software that runs the computing infrastructure of medium to large businesses. It is not uncommon for companies to spend multiple millions of dollars on licenses and support contracts for enterprise software packages. This comprises not only operating systems, but database systems, business intelligence and analytics, transaction processing systems, web servers, portals, system management and DBA tools, and so on.

    Now don't get me wrong. I realize that there is intrinsic value in enterprise software. Properly utilized and deployed it can help to better run your business, deliver value, and frequently it can even offer competitive advantage. But what is a fair value for enterprise software?

    Let's look at something simple, like a performance monitor. Nice software, helps you find problems, probably costs anywhere from several hundred thousand dollars to over a million depending on the size of the machines you are running it on. Why does it cost that much? Well, because companies have been willing to pay that much. Not because the software HAS to cost that much to develop. I mean, how many lines of code are in that monitor? Probably less than Microsoft Excel and I can get that for a hundred bucks or so. And I can almost guarantee that Excel has a larger development and support team than whatever monitor you choose to mention.

    So the pricing is skewed not based on what it costs to develop, but what the market will bear. That is fine, after all we live in a free market economy (depending on where you live, I guess). But I don't believe that the free market will continue to support such expensive software. And the open source movement is kind of bearing that out. We have open source operating systems, database systems, BI tools, and so on. But I don't know that open source is the full answer to the problem. There are some companies that would rather purchase commercial software than rely on open source software.

    As I continue to think about enterprise software a bit further... In many cases, enterprise software vendors have migrated away from selling new software licenses to selling mostly maintenance and support. For some companies, as much as half of their revenue comes from maintenance and support instead of selling new software. Viewed another way, you could be excused for thinking that some of these companies are doing little more than asking their customers to pay for the continued right to use the software. Quite often, in mature enterprise software segments, there is little maintenance going on, so what is that support contract buying you? Sounds like a nice little racket... you know what I'm talking about? So you pay several million for the software and then hundreds of thousands, maybe millions more for the continued right to use it and get bug fixes (shouldn't bug fixes be free?).

    Another problem with enterprise software is feature bloat. Enterprise software can be so expensive because vendors want to price it as if all of its features will be used by the customer. But seldom is that true. Usually only a few features are needed and used on a regular basis. Part of the problem, though, is that those few features can be different for each organization. But really, the core features that most customers want don't differ all that much. Think back to the monitor example: what do you expect it to do? Inspect the system and provide reporting metrics on CPU usage, elapsed time, locking issues, etc. Probably 90% or more of the core functionality is what most customers desire. But then there is that 10%, right? How can you cost effectively deal with that?

    One way vendors deal with this is to offer many separately-priced features enabled by key. But that is complicated for the user and it requires additional resources for the software vendor to develop and support it.

    So what is the answer? Gee, I wish I knew... I've got ideas and thoughts and perhaps I'll share them with you all in a later blog post. But I think I've babbled on enough for today... What do you think about the state of the enterprise software market today? Feel free to write your comment here...

  • Accuracy Versus Speed

    Brevity is the enemy of accuracy. I could stop writing right there, but then that would be too brief, and I would much rather be accurate. Most of us strive to keep our days moving along. Making things faster is usually looked upon as beneficial. We have fast food, quick copy centers, rapid delivery, and speed dialing, just to name a few of the fast features of daily life. And most of us would not want to be without them.

    But all too often we sacrifice accuracy for the sake of speed. So what if it isn’t exactly right, it is “good enough” and we did it very quickly, didn’t we? Doesn’t that sound familiar? I’m sure it does for most of us in the technology field. The computer industry is one of the worst offenders when it comes to favoring expedience over accuracy.

    For the purposes of this blog entry, I’m not talking about software quality and bugs. I could, because I think they are a symptom of our “faster, quicker” mindset. But there are many more complex issues surrounding software quality. That may be a topic for future entries. Today, I’m simply talking about the inconsistent and inaccurate ways in which IT professionals use terminology. Let’s look at a couple of examples.

    Database Versus DBMS

    What is a database? I bet most people reading this article believe they know the answer to that question. But many of them would be wrong. DB2 is not a database, it is a DBMS, or Database Management System. You can use DB2 to create a database, but DB2, in and of itself, is not a database. Same goes for Oracle (which is a DBMS and a company) and SQL Server (just a DBMS).

    So what is a database? A database is an organized store of data wherein the data is accessible by named data elements (for example, fields, records, and files). It does not even have to be computerized to be a database. The phone book is a database. (Why do they still send out phone books? Does anyone even use them any more? Now I'm way off topic, so let's get back on track.)

    A DBMS is software that enables end users or application programmers to share data. It provides a systematic method of creating, updating, retrieving and storing information in a database. DBMSs also are generally responsible for data integrity, data access control, and automated rollback, restart and recovery.

    In layman’s terms, you can think of a database as a filing system. You can think of the filing cabinet itself along with the file folders and labels as the DBMS. A DBMS manages databases. You implement and access database instances using the capabilities of the DBMS.

    So, DB2 and Oracle and SQL Server and MySQL are database management systems. Your payroll application uses the payroll database, which may be implemented using DB2 or Oracle or...
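    To make the distinction concrete, here is a minimal sketch using SQLite through Python's standard sqlite3 module. SQLite stands in for the DBMS only because it ships with Python; it is not one of the products named above. The employee table, its columns, and the sample rows are all made up for illustration.

```python
import sqlite3

# SQLite (reached through the sqlite3 module) plays the role of the DBMS.
# The connection below opens a database: the organized store of data that
# the DBMS manages. An in-memory database is used so no file is left behind.
conn = sqlite3.connect(":memory:")

# The DBMS lets us define named data elements (hypothetical payroll table).
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)

# The DBMS stores data in the database on our behalf.
conn.executemany(
    "INSERT INTO employee (emp_id, name, salary) VALUES (?, ?, ?)",
    [(1, "Pat", 52000.0), (2, "Lee", 61000.0)],
)
conn.commit()

# The DBMS is also responsible for rollback: this bad update never persists.
conn.execute("UPDATE employee SET salary = 0 WHERE emp_id = 1")
conn.rollback()

# Retrieval goes through the DBMS as well.
rows = conn.execute(
    "SELECT emp_id, name, salary FROM employee ORDER BY emp_id"
).fetchall()
print(rows)
```

    The payroll data is the database; everything that created, updated, rolled back, and retrieved it is the DBMS at work.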

    Why is that important? If we do not use precise terms when we write, speak, and work, confusion can result. And confusion leads to over-budget projects, improperly developed systems, and lost productivity. So precision must be important to us.

    Do you want another example?

    Information Versus Data

    The basic building block of knowledge is data. Data is a fact represented as an item or event out of context and with no relation to other things. Examples of data are 27, 010110, and JAN. Without additional details we know nothing about any of these three pieces of data. Consider:

    • Is 27 a number in base ten, or is it in octal (which would translate to 23 in base ten)?
    • If 27 is a number in base ten what does it represent? Is it an age, a dollar amount, an IQ, a shoe size, or something else entirely?
    • What about 010110? Is it a binary number? Or is it a representation of a date, perhaps January 1, 1910? January 1, 2010? Or something else entirely?
    • Finally, what does JAN represent? Is it a woman's name (or a man's name)? Or does it represent the first month of the year?

    All of these are examples of data because of the lack of context.
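    The ambiguity is easy to demonstrate. The sketch below reads the same raw strings under several assumed contexts; the choice of contexts (number base, date layout) is mine, and nothing in the data itself says which one is right.

```python
from datetime import datetime

# The same raw strings, read under different (assumed) contexts.
interpretations = {
    "27 as base ten": int("27", 10),
    "27 as octal": int("27", 8),
    "010110 as binary": int("010110", 2),
    # Two-digit years are themselves ambiguous; Python's %y resolves 10 to
    # 2010, but the raw data cannot rule out 1910.
    "010110 as MMDDYY": datetime.strptime("010110", "%m%d%y").date().isoformat(),
    "010110 as YYMMDD": datetime.strptime("010110", "%y%m%d").date().isoformat(),
}

for context, value in interpretations.items():
    print(f"{context}: {value}")
```

    Each line of output is a different answer to "what does this value mean?", and only the context supplied alongside the data selects among them.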

    Information, on the other hand, adds context through relationships between data, and possibly other information. Data with metadata and context makes information. The relationships may represent information, yet the relations do not actually constitute information until they are understood. Also, the relationships that represent data have a tendency to be limited in context, mostly about the past or present, with little if any implication for the future.

    Knowledge adds understanding and retention to information. It is the next natural progression after information. To have "knowledge" requires information in conjunction with patterns between data, information, and other knowledge, coupling it with understanding and cognition.

    The final step would be to move from knowledge to wisdom. Wisdom can be thought of as knowledge applied. You may have the knowledge that fatty foods are bad for you, but if you eat them anyway, you are not wise.

    Now how often do you precisely use these terms, especially data and information? 

    Synopsis

    There are plenty of other examples, too. When was the last time you heard or read an acronym and had no idea what it meant? What about industry buzzwords like SOA or cloud computing? They are used all the time, but rarely defined. I bet if we got twelve cloud computing "experts" and put them in separate rooms we'd come up with at least twelve different definitions of cloud computing.

    As skilled IT professionals, we need to be more precise in our day to day language. Doing so preserves knowledge and minimizes confusion. And it is laudable to pursue both of those goals. Furthermore, it is cost effective in terms of clarity and productivity. And isn’t that the reason we go to work every day?

  • InfoSphere Streams: Analyzing Any Data, Anywhere, All the Time (IOD2009)

    I am still at the IBM Information on Demand conference in Las Vegas and today IBM briefed me on their stream computing solution - InfoSphere Streams. I mentioned this briefly in yesterday's blog posting about analytics but I want to get into the topic in much more depth today.

    So what is stream computing? Basically, it is the ingestion of data -- structured or unstructured -- from arbitrary sources and processing it without necessarily persisting it. Any digitized data is fair game for stream computing. As the data streams it is analyzed and processed in a problem-specific manner. The "sweet spot" applications for stream computing are when devices produce large amounts of instrumentation data on a regular basis. The data is difficult for humans to interpret easily and is likely to be too voluminous to be stored in a database somewhere. Examples of types of data that are well-suited for stream computing include healthcare, weather, telephony, stock trades... you get the idea.

    By analyzing large streams of data and looking for trends, patterns, and "interesting" data, stream computing can solve problems that were not practical to solve using traditional computing methods. Another useful way of thinking about this is as RTAP - Real-Time Analytical Processing (as opposed to OLAP, On-Line Analytical Processing). 

    The IBM product for stream computing is called InfoSphere Streams. It runs on xSeries blades (up to 125 x86 blades) using Linux. It is based on three main abstractions:

    1. The stream - bit pipes of data which can be subscribed to
    2. Operators - analytical calculation processors
    3. Topology - the integration of streams to operators

    The data streams into the system, which is built as a series of progressing, cascading steps. Each step progressively refines the analysis looking for the information, patterns, trends, and diagnoses. Consider, for example, a law enforcement application with a stream of video surveillance data. Much of the stream will not be interesting. It becomes interesting when a person shows up in the video. So the operators would be analyzing the video, performing scene detection and face identification. When a person is found, that section of video can be captured and retained. And the face might even be matched automatically against a database of known criminals.
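    The three abstractions and the cascading steps can be sketched with plain Python generators. This is only an illustration of the idea, not InfoSphere Streams or its Spade language; the frame layout, the function names, and the face identifiers are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set

Frame = Dict[str, object]  # hypothetical stand-in for one unit of video data


def frame_stream(frames: Iterable[Frame]) -> Iterator[Frame]:
    """The stream: hands data along one item at a time without persisting it."""
    for frame in frames:
        yield frame


def detect_person(stream: Iterable[Frame]) -> Iterator[Frame]:
    """Operator 1: pass along only the frames in which a person appears."""
    for frame in stream:
        if frame["person_present"]:
            yield frame


def match_known_faces(stream: Iterable[Frame], known: Set[str]) -> Iterator[Frame]:
    """Operator 2: flag each remaining frame whose face is on the known list."""
    for frame in stream:
        yield {**frame, "match": frame["face_id"] in known}


# Made-up surveillance data: most of the stream is not interesting.
frames = [
    {"t": 0, "person_present": False, "face_id": None},
    {"t": 1, "person_present": True, "face_id": "face-17"},
    {"t": 2, "person_present": True, "face_id": "face-42"},
    {"t": 3, "person_present": False, "face_id": None},
]

# The topology: how the stream is wired to the operators, step by step.
pipeline = match_known_faces(detect_person(frame_stream(frames)), {"face-42"})

# Only the interesting sections are captured and retained.
retained = list(pipeline)
print(retained)
```

    Each operator sees one frame at a time and discards what it does not need, so nothing has to be stored until the final step decides a frame is worth keeping.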

    Another example: IBM and the University of Ontario Institute of Technology (UOIT) are using InfoSphere Streams to help doctors detect subtle changes in the condition of critically ill premature babies. The software ingests a constant stream of biomedical data, such as heart rate and respiration, along with clinical information about the babies. Monitoring "preemies" as a patient group is especially important as certain life-threatening conditions such as infection may be detected up to 24 hours in advance by observing changes in physiological data streams. Constantly monitoring the stream of healthcare data can enable many types of early diagnoses that would take medical professionals much longer to reach. For example, a rhythmic heartbeat can indicate problems (like infections); a normal heartbeat is more variable. Analyzing an ECG stream can highlight this pattern and alert medical professionals to a problem that might otherwise go undetected for a long period. Detecting the problem early can allow doctors to treat an infection before it causes great harm.
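    As a rough illustration of the heartbeat pattern described above, the sketch below watches a stream of beat-to-beat intervals and flags stretches where the variability drops. The window size, the threshold, and the sample readings are invented for the example; this is not the IBM/UOIT method and has no clinical meaning.

```python
from collections import deque
from statistics import pstdev
from typing import Iterable, Iterator


def low_variability_alerts(
    intervals_ms: Iterable[float],
    window: int = 5,             # hypothetical: number of recent beats to compare
    min_stdev_ms: float = 10.0,  # hypothetical threshold, not a clinical value
) -> Iterator[int]:
    """Yield the index of each reading at which the last `window`
    beat-to-beat intervals vary by less than `min_stdev_ms`."""
    recent = deque(maxlen=window)  # only the sliding window is kept in memory
    for i, interval in enumerate(intervals_ms):
        recent.append(interval)
        if len(recent) == window and pstdev(recent) < min_stdev_ms:
            yield i


# Made-up readings: a variable heartbeat and a suspiciously rhythmic one.
variable = [810, 760, 845, 790, 830, 770]
rhythmic = [800, 801, 799, 800, 800, 801]

print(list(low_variability_alerts(variable)))
print(list(low_variability_alerts(rhythmic)))
```

    The variable series raises no alerts, while the rhythmic series is flagged as soon as a full window of near-identical intervals has been seen.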

    A stream computing application can get quite complex. Continuous applications, composed of individual operators, can be interconnected and operate on multiple data streams. Again, think about the healthcare example. There can be multiple streams (blood pressure, heart, temperature, etc.), from multiple patients (because infections travel from patient to patient), having multiple diagnoses.

    IBM's stream computing offerings and research are the result of more than 20 years of IBM information management expertise, five years of development by IBM Research, and more than 200 patents. The solution relies upon a new programming language called Spade (soon to be renamed) to express topologies and operators. Processing millions of data points per second and performing advanced analytics on the data can help to usher in a shift in the way we manage and deal with vast amounts of data.

    It is all a part of what IBM is referring to as new intelligence for a smarter planet: systems are more instrumented, interconnected, and intelligent. And that will enable organizations to better focus on value, exploit opportunities more effectively, and change and move more quickly.

    The future is here and it might be time to re-think the way we do business... by joining the stream.

Be sure to visit my web site at http://www.craigsmullins.com