Monday, April 23, 2012

Network Sizing: Assumptions or Well Informed Decisions?

In an earlier post, I mentioned I would do some more work on the Exchange Client Bandwidth Calculator from Microsoft.

In this post, I will try to explain why I strongly disagree with this kind of tool.

Each Domino server logs the activity performed by end users. For each user session, an entry is written to the log that records the start and stop times (and duration) of the session, along with the network bytes the client sent to and received from the server. More data is logged, but it is not relevant to this article.

Because DNA collects all this data over a 7-day period, we can aggregate the session data across all activities of all end users for each DNA customer:
  • kbps_rcvd = 8 × sum(bytes_received) / 1024
  • kbps_sent = 8 × sum(bytes_sent) / 1024
Doing so for the 100 most recent DNA engagements gives the average score for each customer, shown in the graph below:

Both axes are scaled logarithmically for visualization purposes, and the overall DNA average is shown by the cross-hair. Each icon in the graph represents a customer organization; the larger the icon, the more users the organization has (seat counts range from 10 to 93,000).
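The two formulas above produce totals in kilobits; to report kbps per user, those totals still have to be divided by a number of users and by a time base. The post does not say exactly which denominators DNA uses, so the minimal Python sketch below assumes a per-user average over a caller-supplied observation period. The `Session` record and the `per_user_kbps` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user: str
    bytes_received: int  # bytes the client received from the server
    bytes_sent: int      # bytes the client sent to the server


def per_user_kbps(sessions, period_seconds):
    """Return (kbps_rcvd, kbps_sent) averaged per distinct user.

    Assumption: totals are divided by the number of distinct users and by
    period_seconds; DNA's actual normalization is not stated in the post.
    """
    users = {s.user for s in sessions}
    if not users or period_seconds <= 0:
        raise ValueError("need at least one session and a positive period")
    # Same aggregation as the formulas: 8 bits per byte, 1024 bits per kilobit.
    kbits_rcvd = 8 * sum(s.bytes_received for s in sessions) / 1024
    kbits_sent = 8 * sum(s.bytes_sent for s in sessions) / 1024
    user_seconds = len(users) * period_seconds
    return kbits_rcvd / user_seconds, kbits_sent / user_seconds


# Hypothetical one-hour sample for two users.
sample = [
    Session("ann", bytes_received=6_144_000, bytes_sent=1_228_800),
    Session("bob", bytes_received=3_072_000, bytes_sent=614_400),
]
print(per_user_kbps(sample, period_seconds=3600))  # (10.0, 2.0)
```

Choosing a different period (for example only the hours users are actually connected) changes the result considerably, which is exactly why the denominator should be stated whenever such averages are compared.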

Two things we can observe from these customer scores:
  1. Average client network bandwidth consumption ranges from 1 to 49 kbps per user across these 100 recent DNA customers;
  2. The overall DNA average equals 10.03 kbps received and 2.40 kbps sent by users.
Let's take a look at what Microsoft is doing in their calculation workbook: 
  1. The workbook (worksheet 'Data Tables') lists peak traffic levels in kbps for each type of client;
  2. Microsoft assumes that user demand follows a normal distribution over a typical working day.
Using the default profiles in the workbook, this would result in per-user averages of 6.36 kbps received for Very Heavy Users and 1.42 kbps received for Light Users. Applying these averages to our real-world customers clearly shows that they would cause significant under-capacity in 70 of our 100 recent customers:
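At its core, the under-capacity count compares each customer's measured per-user average against the workbook's figure. A minimal Python sketch, assuming a customer counts as under-provisioned when its measured average exceeds even the Very Heavy User value of 6.36 kbps received (the post does not spell out the exact rule); the customer names and figures are invented for illustration.

```python
# Workbook default for Very Heavy Users (kbps received per user), as quoted above.
VERY_HEAVY_RCVD_KBPS = 6.36


def under_provisioned(measured_kbps, budget_kbps=VERY_HEAVY_RCVD_KBPS):
    """Names of customers whose measured per-user average exceeds the budget."""
    return sorted(name for name, kbps in measured_kbps.items() if kbps > budget_kbps)


# Hypothetical measured averages (kbps received per user), not real DNA data.
measured = {"A": 1.2, "B": 10.0, "C": 6.1, "D": 49.0}
short = under_provisioned(measured)
print(f"{len(short)} of {len(measured)} under-provisioned: {short}")
# 2 of 4 under-provisioned: ['B', 'D']
```

Passing `budget_kbps=1.42` applies the Light User profile instead, which flags even more customers.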

What's causing the huge differences?

I think Microsoft makes a big mistake in assuming that the working pattern of all users follows a normal distribution, as shown in their graph (worksheet 'Tools'):

The reality, however, is completely different in several ways:

Analyzing the number of distinct hours in which users are active across my 100 recent customers shows that 20% of all users are active for less than two hours per day (blue line), and 40% of all users are active for no more than five hours per day.
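Because each log entry carries a start and stop time, "distinct active hours" can be derived by marking every clock hour a session overlaps. A minimal Python sketch, assuming sessions are given as hypothetical `(user, start, stop)` tuples and that shares are computed per user-day; `active_hours` and `share` are illustrative names, not part of DNA.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def active_hours(sessions):
    """Map (user, date) -> number of distinct clock hours touched by sessions."""
    hours = defaultdict(set)
    for user, start, stop in sessions:
        t = start.replace(minute=0, second=0, microsecond=0)
        while t < stop:  # walk every clock hour the session overlaps
            hours[(user, t.date())].add(t.hour)
            t += timedelta(hours=1)
    return {key: len(h) for key, h in hours.items()}


def share(counts, predicate):
    """Fraction of (user, day) entries whose hour count satisfies predicate."""
    values = list(counts.values())
    return sum(1 for v in values if predicate(v)) / len(values) if values else 0.0


# Hypothetical Monday log: (user, start, stop).
day = datetime(2012, 4, 23)
log = [
    ("ann", day.replace(hour=8, minute=10), day.replace(hour=8, minute=40)),
    ("bob", day.replace(hour=7), day.replace(hour=12, minute=30)),
    ("cy", day.replace(hour=9, minute=50), day.replace(hour=11, minute=5)),
    ("cy", day.replace(hour=14), day.replace(hour=15)),
]
counts = active_hours(log)  # ann: 1, bob: 6, cy: 4
print(share(counts, lambda h: h < 2), share(counts, lambda h: h <= 5))
```

Counting distinct hours rather than summed session time means two short sessions within the same hour count once, which matches the "distinct number of hours" phrasing.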
This means that a normal distribution, as Microsoft assumes in its workbook, is not realistic. In fact, the workload analysis for one of my customers illustrates why Microsoft is wrong:

Notice how most users are online between 7:00 and 17:00, but the network load is heavily concentrated in the morning hours. This is when remote workers (who are typically online 1 to 3 hours per day) retrieve all their new mail, especially on Monday morning.
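An hourly load profile like this one can be built from the session log by spreading each session's bytes over the clock hours it covers. A minimal Python sketch, assuming traffic is uniform within a session (the log only holds per-session totals, so this is a simplification) and using hypothetical `(start, stop, bytes)` tuples; `hourly_load` is an illustrative name.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_load(sessions):
    """Spread each session's bytes over the clock hours it overlaps.

    Assumption: traffic is uniform within a session, because the log only
    records per-session totals. Returns {hour_of_day: bytes}.
    """
    load = defaultdict(float)
    for start, stop, nbytes in sessions:
        duration = (stop - start).total_seconds()
        if duration <= 0:
            load[start.hour] += nbytes  # zero-length session: book it in its start hour
            continue
        t = start
        while t < stop:
            next_hour = t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            end = min(next_hour, stop)
            load[t.hour] += nbytes * (end - t).total_seconds() / duration
            t = end
    return dict(load)


# Hypothetical sessions: (start, stop, bytes received).
day = datetime(2012, 4, 23)
profile = hourly_load([
    (day.replace(hour=8, minute=30), day.replace(hour=9, minute=30), 1_000_000),
    (day.replace(hour=8), day.replace(hour=8, minute=15), 300_000),  # short remote session
    (day.replace(hour=13), day.replace(hour=14), 200_000),
])
print(max(profile, key=profile.get))  # 8
```

To turn a bucket into kbps, multiply by 8, divide by 1024 and by 3600 seconds, and divide by the number of days observed. Short, dense sessions (like the 15-minute one above) are what push the morning bucket up.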

Given its experience with large customers, Microsoft should realize that users can be located in different time zones. Unless you place a data center in each time zone and have all users connect only to the data center in their own time zone, you will see demand patterns that do not match a normal distribution. Instead, you will see multiple distributions come together in one data center.
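To see why, give every office its own bell-shaped demand curve and add them up on the data center's UTC clock. A minimal Python sketch under purely hypothetical assumptions: each office peaks at local noon with a spread (`sigma`) of 2.5 hours, and three equally weighted offices sit at UTC-5, UTC+1 and UTC+8. Even though each office is bell-shaped on its own, the combined load has several peaks.

```python
import math


def office_demand(utc_hour, utc_offset, peak_local=12.0, sigma=2.5):
    """Bell-shaped demand of one office at a given UTC hour (hypothetical model)."""
    local = (utc_hour + utc_offset) % 24
    gap = abs(local - peak_local)
    gap = min(gap, 24 - gap)  # distance on the 24-hour clock
    return math.exp(-0.5 * (gap / sigma) ** 2)


def data_center_demand(offices):
    """Sum demand per UTC hour for (utc_offset, weight) offices sharing one data center."""
    return [sum(w * office_demand(h, off) for off, w in offices) for h in range(24)]


def count_peaks(series):
    """Number of strict local maxima in a circular 24-hour series."""
    n = len(series)
    return sum(
        1 for i in range(n)
        if series[i] > series[i - 1] and series[i] > series[(i + 1) % n]
    )


one_site = data_center_demand([(1, 1.0)])
three_sites = data_center_demand([(-5, 1.0), (1, 1.0), (8, 1.0)])
print(count_peaks(one_site), count_peaks(three_sites))  # 1 3
```

In practice the weights would be the user counts per office, and the per-office curves would come from measured data rather than an assumed bell shape.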

Finally, let me show you the real world: kbps received and sent for 928,420 Lotus Notes users (I love Tableau Software...):

Dear Microsoft, please be aware that the real distribution of network bandwidth consumption across end users spans as much as 8 decades (a factor of 10^8). Each customer is unique. Even within a single customer, you should analyze the real end-user demand in each office location before calculating network requirements.
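A decade on a log axis is a factor of ten, so the spread of a set of per-user values is log10 of the ratio between the largest and smallest value. A small Python sketch that reports this spread per office location; location names and values are hypothetical, and users with zero traffic are skipped (an assumption) because they cannot be placed on a log scale.

```python
import math
from collections import defaultdict


def decades_spanned(values):
    """log10(max / min) over strictly positive values; 8.0 means a factor of 10**8."""
    positive = [v for v in values if v > 0]  # a log scale cannot place zero traffic
    if len(positive) < 2:
        return 0.0
    return math.log10(max(positive) / min(positive))


def spread_by_location(per_user):
    """Decades spanned by per-user kbps within each office location."""
    groups = defaultdict(list)
    for location, kbps in per_user:
        groups[location].append(kbps)
    return {location: decades_spanned(v) for location, v in groups.items()}


# Hypothetical per-user kbps values tagged with an office location.
users = [("HQ", 0.01), ("HQ", 3.5), ("HQ", 100.0), ("Branch", 2.0), ("Branch", 20.0)]
print(spread_by_location(users))
```

A location that spans several decades is a strong hint that a single "typical user" profile will not size its link correctly.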
