Information Taming Technologies – The New Buzz Phrase?

IDC published their key findings from the annual Digital Universe study (sponsored by EMC), highlighting just how much information we’re all creating and using.  In EMC’s press release, we highlighted the following: “‘Information Taming’ technologies are driving down the cost of creating, capturing, managing and storing information—one-sixth the cost in 2011 versus 2005.”  Information Taming Technologies!  What a name!

I would consider the SourceOne family of Information Governance products to be “information taming technologies.”  For example, SourceOne products can help archive inactive content from production environments. This leads to improved application performance, improved backup operations and reduced costs through tiered storage – all while preserving the user experience.  If those benefits are delivered, I would say that information taming has occurred successfully! One area where I think this is particularly significant is when managing Microsoft SharePoint environments.  As more and more content gets created and stored within SharePoint, how can organizations maintain high service levels for users while containing or reducing operational costs?

From my vantage point, governance policies are important to taming information growth.  Critically important.  Consider IDC’s forecast that organizations will need to deal with 50x more information by 2020 than they’re managing today.  50x more!  I wonder how many organizations are thrilled at that prospect when they’re already struggling to gain control of today’s volume of information.  I think organizations will find governance policies that reduce hard operational costs and enhance performance to be important.  Don’t you?

The content residing in Microsoft SharePoint is not immune to the growth figures IDC presents.  Microsoft SharePoint stores content within a SQL Server database, which means that as content accumulates, organizations will need strategies to ensure that end users don’t see degradation in performance, such as slower search and retrieval times.  SourceOne for Microsoft SharePoint can help organizations externalize active content from their SharePoint environment, in effect storing the content separately from the SharePoint SQL database.  This dramatically reduces the burden on the application, enables potential reductions in storage costs and still ensures that users can go about their business without disruption.  Sounds pretty good, right?  Would you say it sounds like information has been tamed?  I would.
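To make the externalization idea concrete, here’s a minimal conceptual sketch of the pattern in Python: the database keeps only metadata and a pointer, while the content itself lives on external (potentially cheaper, tiered) storage.  To be clear, this illustrates the general pattern, not SourceOne’s actual implementation, and the table layout and store path are hypothetical.

```python
import sqlite3
from pathlib import Path

# Hypothetical external store standing in for cheaper, tiered storage.
EXTERNAL_STORE = Path("external_store")
EXTERNAL_STORE.mkdir(exist_ok=True)

# SQLite stands in for the SharePoint SQL database in this toy example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, name TEXT, blob_path TEXT)")

def add_document(name: str, content: bytes) -> None:
    path = EXTERNAL_STORE / name
    path.write_bytes(content)  # the content itself lives outside the database
    db.execute("INSERT INTO docs (name, blob_path) VALUES (?, ?)",
               (name, str(path)))  # the database keeps only a pointer

def fetch_document(name: str) -> bytes:
    (blob_path,) = db.execute(
        "SELECT blob_path FROM docs WHERE name = ?", (name,)).fetchone()
    return Path(blob_path).read_bytes()  # the user experience is unchanged

add_document("plan.docx", b"quarterly plan")
print(fetch_document("plan.docx"))
```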

Then there is the inactive content residing in SharePoint, and content that is left over from collaborative projects.  Do you think governance policies for how this content is handled at project conclusion are important?  I do.  If content is no longer being actively used and accessed, why clog up the system with it?  Regularly archiving inactive content frees up the SharePoint environment and enables retention, disposition and eDiscovery capabilities to be invoked when appropriate.  This sounds pretty good too, right?  Information is tamed again!
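For illustration only, here’s a toy sketch of what such a policy might boil down to.  The thresholds, the item structure and the policy logic are all invented for this example; this is not how SourceOne actually evaluates policies.

```python
from datetime import datetime, timedelta

# Invented policy thresholds for illustration.
ARCHIVE_AFTER = timedelta(days=180)    # idle this long -> archive it
RETAIN_FOR = timedelta(days=7 * 365)   # idle past retention -> disposition

items = [
    {"name": "project-notes.docx", "last_accessed": datetime(2011, 1, 10)},
    {"name": "old-report.xlsx", "last_accessed": datetime(2010, 9, 1)},
    {"name": "kickoff-deck.pptx", "last_accessed": datetime(2004, 3, 2)},
]

now = datetime(2011, 7, 1)
for item in items:
    idle = now - item["last_accessed"]
    if idle > RETAIN_FOR:
        print(f"{item['name']}: eligible for disposition")
    elif idle > ARCHIVE_AFTER:
        print(f"{item['name']}: archive out of SharePoint")
    else:
        print(f"{item['name']}: leave in place")
```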

I’ve provided these two examples to help make a point – IDC’s study gives us a tremendous amount of information to think about.  Based on the findings, information governance will continue to grow in importance as organizations look for effective ways to manage this explosive growth.

As for me, I still like the term “Information Taming” technologies.  I firmly believe that governance policies are critical in helping organizations effectively manage and store information. And, the SourceOne family will continue to help organizations tame their information even as it replicates at an unimaginable pace.  So, look for me to weave “information taming technologies” into my next discussion on the SourceOne family of Information Governance products, especially when it relates to managing Microsoft SharePoint.


The Cloud and Software Engineering

In my role here at EMC, I have the opportunity to speak to many of our software development teams about how software development is changing. One of the presentations that I give discusses industry trends that are impacting the way we think about designing and developing software. In that particular presentation, there’s a rather boring graphic that looks like this:

[Image: a slide graphic highlighting “the cloud” as a key industry trend impacting software design]

This graphic represents the fact that “the cloud” is an important concept that software developers (especially those of us working in the IT Infrastructure management realm) need to be aware of and understand.

How “The Cloud” Impacts Software Engineering

Many people would argue that “The Cloud” (be it a private cloud, a public cloud or a hybrid) doesn’t really change anything about how software is engineered since it’s all about infrastructure, and after all, infrastructure is not the concern of the software engineer, right? The problem with that mode of thinking is that one big promise of the cloud movement is flexibility, and software that runs “in the cloud” needs to embrace that flexibility. Consider these points:

  1. Cloud offerings billing themselves as “Platform as a Service (PaaS)” generally attempt to abstract the physical hardware environment from the virtualized application platform that software is built on. At a very high level, what this means is that developers who build applications for PaaS offerings don’t have any control over how the hardware is configured, maintained and protected.
  2. Cloud offerings billing themselves as “Infrastructure as a Service (IaaS)” generally provide the ability to host virtual machines that are fully controlled by the customer but, like PaaS offerings, abstract the physical implementation and associated maintenance from the customer.

In both cases above, software engineers need to understand that the environment their software executes on is subject to change without notice.  In the case of PaaS offerings, developers need to think in terms of persistent storage and volatile storage rather than simply writing to disk.  For example, consider a typical application that writes data to a storage device.  In many cases, the developer would choose to write that data to a logical disk hosted by the server the application is executing on.  The issue here is that with PaaS, in most cases the “disk” storage available to your application is volatile in nature: if the environment hosting your app changes (failover, patch maintenance, network maintenance, etc.), the state of the “disk” is not guaranteed.  So for data that must be persisted, developers need to create and maintain non-volatile storage.  In the case of IaaS offerings, if a failover or some other event forces the hosted virtual machine to move, there is no guarantee that the state of the VM will be maintained (it’s possible that the VM could be reset to its initial state), so developers need to consider building in mechanisms to deal with that possibility.
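To make the PaaS storage point concrete, here’s a minimal sketch in Python.  The DurableStore class is a hypothetical stand-in for whatever persistent storage API your platform actually provides (a blob or object service, for example); the point is simply the separation between scratch data on volatile local “disk” and data that must survive a host change.

```python
import json
import tempfile
from pathlib import Path

class DurableStore:
    """Hypothetical stand-in for a platform's persistent storage API
    (e.g., a blob/object service).  Local disk in PaaS is volatile."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

def save_order(store: DurableStore, order: dict) -> None:
    # Scratch files on the local "disk" may vanish on failover or patching...
    scratch = Path(tempfile.gettempdir()) / f"order-{order['id']}.tmp"
    scratch.write_bytes(json.dumps(order).encode("utf-8"))
    # ...so anything that must be persisted goes to the durable store.
    store.put(f"order-{order['id']}.json", scratch.read_bytes())

store = DurableStore(Path("durable_store"))  # hypothetical mount/endpoint
save_order(store, {"id": 42, "item": "quinoa"})
print(store.get("order-42.json"))
```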

The point of the above is that software engineers need to consider the inherent flexibility requirements of cloud infrastructure when designing and building software that is deployed “to the cloud”.

Perfmon Counter of the Month: Disk Queue Length

OK, so maybe Perfmon Counter of the Week was a little optimistic.  Let’s say month, OK?

This one is another disk-related counter, but I like it because there’s so much myth around it.  It seems like anytime I see a document about disk performance as it relates to an application, I see references to “disk queue length”.  And the advice is either really vague, like “monitor the disk queue length – if it’s high, you may have problems with the disk subsystem,” or really specific, like “if your disk queue length is consistently higher than 2, you may have a problem with your disk”.

And neither piece of advice is quite right, but neither is quite wrong, either.

Here’s the thing with disk queue length, and any other queue-oriented counter:  Whether a queue length is “bad” or “good” depends on the number of resources servicing the queue. 

Imagine you’re in line at the grocery store.  There’s a single cashier, and there are nine people in front of you.  That’s a queue length of ten, and you’ll start wondering whether you really need the quinoa your wife told you to get.  Now imagine that there’s a single line with nine people in front of you, but there are five cashiers.  You still have a queue length of ten, but you’re going to be through it in a lot less time.  In fact, it’s better than having one person in front of you with a single cashier, because that dude in front of you might have 63 coupons and a checkbook.  If there are other cashiers, he’s only dominating one of multiple resources.

Unfortunately, that doesn’t make the quinoa look any more appetizing.

So does that mean that disk queue length is useless?  Not at all.  If you know how many resources are servicing the LUN, then you can see whether it’s really a problem or not.  The typical rule of thumb is 2 outstanding I/Os per disk.  So a queue length of 6 may indicate a bottleneck if you only have 2 disks in the RAID group servicing the LUN, but it’s really not a problem if you have a 10-disk R1/0 RAID group.  The same goes for processor queue length.
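If it helps to see the arithmetic, the rule of thumb reduces to a one-liner.  Here’s a tiny Python sketch; the threshold of 2 outstanding I/Os per spindle is just the rule of thumb above, not a hard limit:

```python
def queue_length_ok(avg_queue_length: float, spindle_count: int,
                    per_disk_threshold: float = 2.0) -> bool:
    """Rule-of-thumb check: flag a possible bottleneck when the average
    queue length exceeds ~2 outstanding I/Os per spindle in the group."""
    return avg_queue_length <= per_disk_threshold * spindle_count

# A queue length of 6 against a 2-disk RAID group looks like trouble...
print(queue_length_ok(6, 2))   # False -> possible bottleneck
# ...but the same 6 against a 10-disk R1/0 group is nothing to worry about.
print(queue_length_ok(6, 10))  # True
```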

The other reason this counter is interesting is trending.  I’m a huge fan of measuring performance stats when application performance is good so that you have something to compare against when things go downhill.  So it’s useful as a comparative value as well.  And even if you don’t have performance data from the “good times”, you can still visualize the counter with a time-series chart and watch for upward drift.
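Here’s one way that comparison might look in practice.  This is a toy sketch that assumes you’ve already collected queue-length samples; the 1.5x tolerance is an arbitrary number for illustration, not a standard:

```python
from statistics import mean

def compare_to_baseline(baseline, current, tolerance=1.5):
    """Flag the counter when its current average drifts past
    tolerance x the 'good times' baseline average."""
    b, c = mean(baseline), mean(current)
    if c > b * tolerance:
        return f"avg queue length {c:.1f} is {c / b:.1f}x baseline - investigate"
    return f"avg queue length {c:.1f} is within normal range"

# Samples captured while performance was good vs. during a slowdown.
print(compare_to_baseline([1.2, 1.5, 1.1, 1.4], [4.8, 5.2, 4.9]))
```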

SQL Server Worldwide User Group (SSWUG) PowerPivot Analytics Expo

The folks at SSWUG are putting on a PowerPivot Analytics expo on July 15th. I’ll be speaking at the event on “SharePoint 2010 Business Intelligence Feature Review”. My session will be broadcast from 11:30am – 1:00pm Mountain Daylight Time. The session will cover Excel Services, and how it fits into the larger Business Intelligence ecosystem for SharePoint 2010. I will also be covering an introduction to PowerPivot and how it helps to deliver “BI for the masses” when coupled with SharePoint as a delivery mechanism.


These expos are designed to give people a one-day seminar of focused content, and they also introduce the Virtual Conference system, which works very well.  People can sit at their desks and still have that “going to a technical conference” feeling.

Check it out! To learn more and to register for the event, see http://www.vconferenceonline.com/event/home.aspx?id=281

Moving the Blog Back Here!

Well, after playing around with the blog on my local server, I’ve decided that it really does work best here on Blogger.  I have changed the URL to http://blog.sqltrainer.com, so hopefully that won’t confuse things too much.


I always say this, but I’m hoping to spend more time blogging about stuff here over the next year. Work has been exceptionally exciting and I’ve had the awesome opportunity to work with some very cool technologies that I’d love to spend more time educating others about. Time is always an issue though, so we’ll see how it goes.