Extract from Electronic Messaging Association Messaging Magazine, January/February 1996
Author: Clive Horton, President, ReSoft International
Defining a Service Level for the Messaging System
A few weeks ago, I was invited, at the eleventh hour, to attend a show by
the old Motown group The Temptations at our local theater in Stamford, CT.
The show was terrific. One hundred percent entertainment by a group of
professionals who really knew their stuff and went through all their old songs
and dance routines. Most of the audience was representative of people growing
up in the mid sixties, and the whole thing took me back nostalgically to my
youth as I thought about how easy things seemed to be in those days, wishing,
perhaps, that we were back there again.
The changing role of e-mail
At the same time, a thought popped into my head about some messaging throughput
problems a particular customer had experienced earlier that same day, which
got me thinking about how it used to be in the 'old days' of messaging. In
the not-so-distant past, albeit not as far back as when The Temptations were
top of the charts, e-mail was a much simpler method of communication.
Consider your own use of e-mail a few years ago. You probably had e-mail
attached to your main system, maybe PROFS or All-In-One. With this came the
ability to compose messages and send to colleagues on the same system and
perhaps, if you had adopted an expensive mail switch, to other users in your
own organization, regardless of their mail system, and maybe to selected
external partners in remote locations. And if you were really advanced, you
may have also had the ability to attach revisable documents to be able to
share information with other users.
And supporting it was simple, as pretty much everything was centralized. If
the system went down, everybody, but everybody, knew about it.
But fast forward to today. The past few years have seen an explosion in a
number of technology developments that directly affect the use of e-mail -
more affordable bandwidth in the network to send larger items - more affordable
and more powerful intelligent desktops allowing the user to be in control
of more functions - and as a result, more pressure from users to do more
at the desktop. And, as we know, this has encouraged users to want to share
information electronically. Coupled with the mail-enabling capabilities available
today from many applications and word-processing systems, the face of the
e-mail system has changed - no longer a casual communication system, but
the backbone for delivering many time-sensitive and mission-critical pieces
of information to users internally and externally.
End users have now begun to realize the importance of the messaging network,
and they are now demanding it be treated the same way as their other
mission-critical applications: with the dreaded service level agreement from
the IS department to guarantee availability. And why not? Why shouldn't the messaging
system be treated the same as, say, the accounting system or the banking system,
where one of the key critical success factors for the system is based around
an agreed-to service level which is defined, published and, most importantly,
measured on a regular basis?
In these times of quality-measurement and empowerment, not only is this critical
to the business, but it becomes critical to the Messaging manager and the
measurement of the success of that role to deliver mail within some agreed
guidelines.
But where do you start?
When we analyze the dynamics of measuring the availability of the messaging
system, the issue is complicated by the differences between Messaging as
an application and, say, the accounting or banking system.
Messaging is a distributed system. One of its major benefits is in being
able to run at a local level without impacting other user groups. It can
run without needing immediate interaction with other parts of the network.
This localizes the application and reduces traffic across the network - not
everyone needs to network up to the mainframe and back again every time.
But at the same time, the distributed nature of the system makes it harder to
centralize the support skills. If a local post office slows down or stops
working, who will know about it first? The local user!
Messaging is an asynchronous process. Although local response time is important
to the individual user, this is generally not a messaging issue. Once the
message has left the user interface and entered the mail system, the issue
then becomes how long does it take to get to the other end - wherever that
may be - and how consistent this delivery is. Acceptable speed of delivery
for such an asynchronous mode of delivery in the past could be measured in
days (or not measured at all). Today its importance is measured by the users
in minutes!
Messaging is a communications system. And to communicate requires connectivity
to everyone else to whom the user needs to talk, both internally and to those
external users deemed necessary - vendors, sales reps, customers,
etc. In most cases this requires gateways between different systems, in different
formats across different networks as well as interfaces to the Internet and
X.400 networks. The resultant mix of multiple platforms creates challenges
in maintaining a consistent method of measurement across the organization.
Now the measurement problem starts to get bigger.
Messaging is a scalable system. Outside of the strengths and weaknesses of
particular e-mail systems, their interoperability or migration capabilities,
e-mail is a single application. Organizations need to be confident that the
Messaging application will support them today and grow with them to meet
their needs in the future. Therefore, the tools to manage the system need
to scale from an initial implementation to the broadest user-population -
encompassing company mergers, buyouts and downsizing; encompassing moving
between different messaging platforms for strategic reasons. But always
maintaining consistency in the background. It is not unusual to see a messaging
system comprising several hundred different post offices of different types,
each of which has the capability to fail at any time and for any reason.
Messaging is the single biggest application you support - compared to other
applications such as Accounting, Banking or Manufacturing, the messaging
system is one of the few systems that touches just about everybody across
the organization. So its availability, or lack of it, affects everyone.
Where do we find tools?
So how do you set and agree on a realistic service level agreement for messaging
availability - across many different Post Offices, gateways, routers and
MTAs, running on many different platforms, across many different locations,
both internal and external - with your users? And, of equal importance, how
do you demonstrate compliance with what you have committed to?
Before we can define what the Service Level for messaging should be, we need
to find an accurate method for being able to measure it. The foundation for
such a system can be found in the mail-monitoring system. A sound mail monitoring
system provides a robust and highly scalable way of
proactively alerting the e-mail administrator to poorly performing and
non-performing e-mail nodes across the largest and most diverse of corporate networks.
Without a mail monitoring system, the users typically find out about a problem
in the chain of events before the Mail Administrator does - primarily because
messaging is a regular and important part of their business process and some
part of that process has noticeably failed to deliver. Mail monitoring can
send out individual tests across the e-mail network to determine the availability
of all the links in the e-mail chain. Severity thresholds can be set for
each test to determine how severe the lateness of each test is. And routines
can be incorporated to alert the appropriate person in the event of outages
and tardiness and can report those in the appropriate manner (via e-mail,
pager etc.) far earlier than the user will find out. The result - MIS is
proactively managing the real-time availability of the e-mail system.
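As a rough illustration of the severity thresholds described above, the sketch below classifies how late each outstanding test is and keeps only the nodes that need attention. The threshold values, the severity labels and the function names `classify` and `alerts` are all assumptions made for the example, not features of any particular monitoring product.

```python
from datetime import timedelta

# Hypothetical thresholds: (minimum lateness, severity label), most severe first.
THRESHOLDS = [
    (timedelta(minutes=60), "critical"),
    (timedelta(minutes=30), "major"),
    (timedelta(minutes=10), "warning"),
]


def classify(elapsed, thresholds=THRESHOLDS):
    """Return the severity label for a test outstanding for `elapsed`."""
    for limit, label in thresholds:
        if elapsed >= limit:
            return label
    return "ok"


def alerts(outstanding, thresholds=THRESHOLDS):
    """Map each node to its severity, keeping only nodes that need attention.

    `outstanding` maps a node name to how long its test has been in flight.
    """
    result = {}
    for node, elapsed in outstanding.items():
        severity = classify(elapsed, thresholds)
        if severity != "ok":
            result[node] = severity
    return result
```

In practice the returned dictionary would be handed to whatever notification route suits the severity, such as e-mail for a warning and a pager for a critical outage.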
But the mail monitoring system can be leveraged in a very different way -
to determine a history of the performance of the e-mail network. For instance,
consider all the mail monitoring tests going across the network. Each one
comes back to a central point and contains time and date stamp information
showing how long that test took. Now wouldn't it be a good idea to capture that
information and use it to determine the historical performance of all tests
across a given time period?
For one thing, such information could be used to determine the performance
history for individual nodes. Think of the power of being able to analyze
all tests performed against a particular node over the past month, three
months or six months. This would help you investigate the complaint by a
user group in Peoria who claims their mail delivery seems to slow down on
a Monday morning, where currently you have no method to trap that information.
Such an analysis may help to point you in the direction of a solution - perhaps
the problem is network bandwidth related. Perhaps doing some load balancing
on the network will relieve this post office from some of the traffic passing
through this location. Perhaps the post office disk space needs expanding
or reorganizing.
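To make the Peoria example concrete, here is a minimal sketch of how captured test results might be grouped by day of the week for one node. The record layout (node name, send time, round-trip minutes) and the function name `mean_minutes_by_weekday` are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def mean_minutes_by_weekday(results, node):
    """Average round-trip minutes per weekday for one node.

    `results` is an iterable of (node, sent_at, minutes) tuples, where
    `sent_at` is a datetime and `minutes` is the measured round trip.
    """
    buckets = defaultdict(list)
    for name, sent_at, minutes in results:
        if name == node:
            # datetime.weekday() returns 0 for Monday through 6 for Sunday.
            buckets[DAYS[sent_at.weekday()]].append(minutes)
    return {day: mean(values) for day, values in buckets.items()}
```

A Monday average that sits well above the other days would support the users' complaint and suggest where to look next.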
But even more important to the users, if this information could be consolidated
into a macro-level report for the whole messaging system, then a service
level can be implemented because it can be measured. Consider being able
to automatically create a report that relates the actual availability across
the whole messaging system over the last week or the last month. A report
that maintains accuracy by taking into account the characteristics of each
of the post offices on the network, for instance, reporting only within that
post office's stated operating hours and ignoring time allocated for regularly
scheduled preventative maintenance. A report that takes into account issues
such as overlapping mail tests when calculating outages and slowdowns.
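The adjustment for operating hours and scheduled maintenance can be sketched as simple interval arithmetic. The example below assumes, purely for illustration, that all times fall on a single day and are expressed in minutes since midnight; the function names `overlap` and `countable_outage` are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two intervals, or 0 if they are disjoint."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def countable_outage(outage, operating, maintenance=None):
    """Minutes of `outage` that should count against the service level.

    Each argument is a (start, end) pair in minutes since midnight.
    Time outside the post office's operating hours, and time inside a
    scheduled maintenance window, is not counted.
    """
    counted = overlap(*outage, *operating)
    if maintenance is not None:
        # Only the part of maintenance inside operating hours can offset
        # outage time that was counted in the first place.
        m_start = max(maintenance[0], operating[0])
        m_end = min(maintenance[1], operating[1])
        if m_start < m_end:
            counted -= overlap(*outage, m_start, m_end)
    return counted
```

For instance, an outage from 7:00 to 10:00 at a post office that operates from 8:00 to 18:00, with maintenance scheduled from 9:00 to 9:30, would count as 90 minutes rather than 180.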
Overlapping tests occur when there is an outage at a remote point which causes
scheduled tests traveling to or through that point to stack up. The mail
monitoring system should sense the delay, should alert the operator and suspend
its testing, but probably not before a number of tests have been sent and
become late. Once the problem is corrected and mail is flowing again, the
outstanding tests will return and be time-stamped. But statistically, the
service level reporting procedure should not add up the sum of those multiple
tests or the statistical impact of the outage will be erroneously inflated.
The reporting algorithm must subtract those times where tests overlap a specific
period of time through a specific node, tallying that time only once, or
the resultant report will be neither accurate nor credible.
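One way to tally overlapping time only once is to merge the late tests' time spans before adding them up. The sketch below is an assumption about how such a routine might look, not a description of any specific product; it takes the late tests for a single node as (start, end) pairs in minutes on a shared clock.

```python
def merged_outage_minutes(late_tests):
    """Total outage minutes for one node, counting overlapping tests once.

    `late_tests` is a list of (start, end) pairs in minutes on a shared clock.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(late_tests):
        if current_end is None or start > current_end:
            # Disjoint from the running span: bank it and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps or touches the running span: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total
```

Three tests sent at minutes 0, 10 and 20 that all return at minute 50 would naively add up to 120 minutes of lateness, whereas the merged figure is the 50 minutes the node was actually affected.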
What can go wrong will go wrong
So now that we have found a way to accurately measure the service level,
how do we define the service level for the organization? This is much more
specific to each organization. In defining a service level, it is important
to define something that is achievable, determined by the size, complexity
and geographical spread of the network. The two key factors are the time
it should take for mail to be delivered if nothing goes wrong, and the
percentage of mail that is delivered within this time. If you decide that
mail should be delivered end to end within 45 minutes, it is not realistic
for 100% of mail to be delivered within that time. Things will go wrong in
a complex network, particularly where parts of the system are outside of
your direct control. So negotiate a number less than 100% that both MIS and
the Users can live with. Some organizations have found that 95% delivery
within 30 minutes is acceptable, others use 98% within 2 hours. Whatever
you choose to publish should reflect the complexities of your own network.
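The two factors translate directly into a simple calculation. The following sketch, with hypothetical function names `service_level` and `complies`, computes the percentage of test messages delivered within a target time and compares it with the agreed figure.

```python
def service_level(delivery_minutes, target_minutes):
    """Percentage of deliveries completed within `target_minutes`.

    `delivery_minutes` is a list of measured delivery times in minutes.
    A message that never arrived can be recorded as float("inf").
    """
    if not delivery_minutes:
        raise ValueError("no deliveries to measure")
    on_time = sum(1 for m in delivery_minutes if m <= target_minutes)
    return 100.0 * on_time / len(delivery_minutes)


def complies(delivery_minutes, target_minutes, required_percent):
    """True if the measured service level meets the agreed percentage."""
    return service_level(delivery_minutes, target_minutes) >= required_percent
```

With nineteen tests delivered in 5 minutes and one in 45, the measured level against a 30-minute target is 95%, which would meet a 95% commitment but fall short of a 98% one.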
Once the service level has been established, a tool should be introduced
to help you maintain compliance. Its mail monitoring component should collect
statistics on all of the node availability tests it sends out. And its reporting
facilities should allow you to consolidate the collected data to report the
actual service level to the end users, as well as publish statistics on demand
at a global level and for each individual post office.
In summary, applying a mail monitoring tool to help you measure
your service level:
- Provides you with early warning of potential messaging and network problems
- Provides an opportunity to proactively manage and recover from problems
- Helps maintain a higher level of user satisfaction.
I dusted off some of my old albums and dragged out my Best of The Temptations
to play on an aging record deck that has lived in the attic. It was great
to reminisce but the record was so scratched and the quality so poor compared
to today's compact discs. It's good in many ways that things have moved on.
Author Profile
Clive Horton is President of ReSoft International LLC in New Canaan, CT. He
has been involved in the messaging industry for over 10 years and formed
ReSoft International in 1994 to provide tools to help companies better manage
multiple messaging systems across their organizations. He can be reached
at clive.horton@re-soft.com