SQL Server Using VMware
ESX Server 3.5
The results of a benchmarking study performed in Brocade
test labs demonstrate that SQL Server can be deployed on
VMware ESX Server 3.5 for Online Transaction Processing
(OLTP) applications in very favorable server consolidation
ratios to meet corporate IT business requirements.
The astounding success of the virtualization of applications on
the VMware® ESX Server platform has changed the character of
IT infrastructures worldwide. Organizations today are continuing to
deploy applications using virtual machines (VMs) on ESX Servers
at a record pace.
There are, however, classes of applications that organizations have
been unwilling to move to virtualized environments. One example is Online Transaction
Processing (OLTP) applications running on Microsoft SQL Server database engines.
These applications
are characterized by high levels of Input/Output (I/O), in which
the number of transactions is very high and the amount of data
transferred per transaction is very low. Database Administrators
(DBAs) suspect that these applications will put too much stress
on ESX Server and, as a result, are somewhat hesitant to deploy
I/O-intensive applications on virtualized platforms.
This paper presents the results of a benchmarking study, in which Brocade® used an
industry-standard online transaction processing benchmark (TPC-C) in an ESX Server cluster
to determine if ESX Server version 3.5 could run 10 simultaneously active VMs running SQL
Server with a significant workload on a single ESX Server and 20 such VMs in a cluster of
four hosts. The results demonstrate that ESX Server is more than capable of handling the
task. Brocade is confident that DBAs can deploy SQL Server on ESX Server 3.5 for OLTP
applications and obtain server consolidation ratios similar to those we observed, ratios
that are very favorable and make strong business sense.
Included in this paper is supporting performance data for ESX Server and the VMs running
under ESX Server, which clearly shows that the workload performed successfully with
platform memory, processor, and I/O resources to spare. In addition, the data provides no
evidence that ESX Server was a bottleneck in any way.
Deploying virtualized servers and applications in volume on VMware ESX platforms started
more than two years ago. Even though I/O-intensive applications could achieve the same
consolidation ratios as the other applications being virtualized today, end users have been reluctant
to deploy I/O-intensive applications on ESX because of the perceived risk associated with
those deployments. The general perception is that the ESX I/O abstraction layers cannot
support the I/O throughput required to service I/O-intensive applications. Since many of
these servers are considered much more critical to the enterprise, prevailing opinion is that
it is better to avoid the perceived risk by continuing to deploy the applications on dedicated
server platforms. If considered at all, consolidation of I/O-intensive applications was, and for
the most part still is, limited to increasing the number of application instances on more
powerful dedicated platforms. This leaves a significant portion of end-user infrastructures,
made up of Windows and Intel-based Linux and Sun Solaris servers, which are still not
deployed in virtualized environments. And unfortunately, end users are missing the business
benefits that would accrue from those deployments.
NOTE: I/O-intensive applications are those that either produce a lot of storage traffic or
generate a lot of I/Os per Second (IOPS). Microsoft SQL Server, Microsoft Exchange, and
Oracle database servers are popular examples of I/O-intensive applications.
Until now there has been very little third-party ESX benchmark data to support the
deployment of I/O-intensive applications in virtual environments. What data was available
was very limited; typically one guest running a load generator such as IOMETER in a single
virtualized instance on a single ESX platform.
When VMware released ESX Server 3.5, Brocade felt it was time to revisit the possibility of
virtualizing I/O-intensive applications on ESX Server. At the same time, more ISVs
now support their applications in virtual environments. The time seemed right to generate
data to convince end users to consider deploying more important applications in virtualized
environments. Because of its wide deployment in IT environments, Brocade decided to
generate benchmark data on Microsoft SQL Server.
The goal of this study was to provide meaningful data that would be relevant to end users
to give them enough confidence to consider proliferating SQL Server in an ESX Server
environment. Details were as follows:
• Use an application deployed in large numbers by end users on dedicated platforms today.
• Demonstrate that I/O-intensive applications are viable candidates for virtualization.
• Show an application consolidation ratio that makes business sense.
• Demonstrate that the risk is minimal when the application is virtualized.
• Highlight the probable bottleneck areas of such an implementation.
• Simulate a typical IT environment as much as possible (resources, knowledge, and
infrastructure). Note that the goal of the testing was not to break performance records or
set high-water marks.
• Show that results apply equally in enterprise environments and Small-to-Medium Business (SMB) environments.
• Apply no application optimization that might skew results. In other words, out-of-the-box
configurations were used for the guest OS and ESX Server as much as possible.
• Limit the benchmarks to a specific workload type with clearly defined performance criteria.
• Perform benchmark testing that can be duplicated.
VMware has data that suggests that the workload for typical SQL Server applications in most
organizations is quite small: 90 percent are fewer than 20 Transactions per Second (TPS).
For the purposes of this benchmark study, we established a performance goal of 50 TPS per
virtualized SQL Server instance as measured by TPC-C. This corresponds to 500 concurrent
users working at twice the normal speed on each VM running SQL Server, a workload that is
very much greater than the average reported by VMware.
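These targets reduce to simple arithmetic. The sketch below only restates figures quoted in the text (the 50 TPS goal, the sub-20 TPS typical load, and 500 users per VM):

```python
# Sanity-check arithmetic for the study's workload goal, using the
# figures quoted above: a 50 TPS goal per virtualized SQL Server
# instance, versus the sub-20 TPS load VMware reports for 90 percent
# of real deployments, with 500 concurrent TPC-C users per VM.

TPS_GOAL_PER_VM = 50       # benchmark target per SQL Server VM
TYPICAL_TPS_CEILING = 20   # 90 percent of deployments fall below this
USERS_PER_VM = 500         # concurrent TPC-C users per VM

# The goal is 2.5x the ceiling that covers 90 percent of deployments.
headroom = TPS_GOAL_PER_VM / TYPICAL_TPS_CEILING
print(f"goal vs. typical ceiling: {headroom:.1f}x")  # 2.5x

# Transaction rate each simulated user must sustain to hit the goal.
per_user_tps = TPS_GOAL_PER_VM / USERS_PER_VM
print(f"per-user rate: {per_user_tps:.2f} TPS")  # 0.10 TPS
```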
An objective benchmark that is well known in the DBA community was used. This, combined
with the decision to concentrate on an OLTP workload, led to the choice of the Transaction
Processing Performance Council TPC-C benchmark.

NOTE: TPC-C is a standard benchmarking tool that emulates an OLTP processing environment,
for example, an inventory management system. The tool reproduces new order entries, order
status inquiries, and payment settlements and can be considered a true application with data
entry. (Input field values, however, are not sanity-checked.) Multiple warehouse inventories are
scanned, stocking levels are checked, and deliveries are scheduled. Individual transactions can
generate one or more additional transactions. No batch operations were performed; however,
they were simulated by causing an application checkpoint to be performed once during each
run. Only new order transactions are counted in the TPS metric. The actual transaction score
would have been higher if all transactions had been counted; TPS should be considered
artificially low. The data generated is representative of real-world demands, and end users
could validate it on their own if they wished. TPC-C has been available for a relatively long time
and is credible as a benchmarking tool. Transactions per Second (TPS) or Transactions per
Minute (TPM) are commonly accepted metrics, and it is relatively easy to reproduce a given
test. TPC-C generates a lot of IOPS; storage and network latency will be critical success factors.

There were two phases, single-platform tests and a cluster test, as detailed below:

• A “bare metal” (BM) baseline was established by running the TPC-C benchmark on one of the
hosts in native Windows before ESX Server was installed. The SQL Server application was
limited to one of the Central Processing Unit (CPU) cores and was allowed to use as much
memory as it needed.
• ESX 3.5 was then installed on all four Systems under Test (SUTs) designated to run the
SQL Server application.
• In the single-platform phase, 10 separate tests were run on one of the platforms; with
each new test, an additional virtualized instance of SQL Server running the same workload
was added. The tests were stopped at 10 simultaneous instances on a single platform due
to time constraints.
• For the cluster test, five instances of SQL Server were run per ESX Server, for a total of
20 SQL Server instances running simultaneously on the four platforms in the ESX cluster.
The limit of five instances per ESX platform was due to licensing constraints with the
benchmark driver product used in testing.

The benchmarks were performed in the Brocade Solutions Center Labs at Brocade’s
corporate headquarters in San Jose, California.

The test environment, shown in Figure 1, included the following:

• Quest Benchmark Factory was used to generate the TPC-C workload with 500 concurrent
end users per guest Operating System (OS).
• The Systems under Test were dual-socket, quad-core ESX Servers containing 32 gigabytes
of memory.
• Storage was configured with a shared Virtual Machine File System (VMFS) volume for the
guest OS files on the Storage Area Network (SAN) and Raw Device Mapping (RDM) LUNs
for the database and log files.
• The SAN consisted of two Brocade 200E switches and a Hitachi AMS1000 storage array.
• No optimization was performed for ESX Server, Windows, or SQL Server. Detailed
configurations for these components are available separately from the Brocade Web site
(http://www.brocade.com/).
• One CPU was allocated to each SQL Server instance, but no limits were set on how much
of the CPU core the instance could use. 1.5 GB of memory was reserved for each VM running
a SQL Server instance. The memory configuration was determined by running the benchmark
while monitoring the paging function: the memory allocation was increased until paging
stopped, and then a small amount of memory was added.
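The sizing loop in the last bullet can be sketched as follows. This is an illustrative reconstruction of the procedure, not the team's actual tooling; the function name, step sizes, and the stand-in paging model are all assumptions:

```python
def size_vm_memory(page_rate_for, start_mb=512, step_mb=256,
                   buffer_mb=128, max_mb=4096):
    """Sketch of the sizing procedure described above: grow the VM's
    memory reservation until the guest stops paging under the benchmark
    load, then add a small safety buffer. `page_rate_for` stands in for
    running the benchmark at a given reservation and reading the guest's
    pages/sec counter; all sizes and steps here are illustrative."""
    mem = start_mb
    while mem <= max_mb:
        if page_rate_for(mem) == 0:  # no paging observed at this size
            return mem + buffer_mb   # add a small amount of headroom
        mem += step_mb               # still paging: grow the reservation
    raise RuntimeError("paging never stopped within the tested range")

# Example with a fake paging model: this guest stops paging at 1280 MB,
# so the procedure settles on 1280 + 128 = 1408 MB.
print(size_vm_memory(lambda mb: max(0, 1280 - mb)))  # 1408
```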
NOTE: See “Detailed Configuration Information” toward the end of the paper to find out
how to access detailed information on all the components. If you wish to duplicate this
infrastructure, refer to the Brocade Web site, from which you can download the configurations
for the components used in the benchmark.
Figure 1. Benchmark infrastructure.
The following provides additional details about the test infrastructure illustrated in Figure 1:
• The console is the Benchmark Factory control point; it is connected to the driver
systems shown below it, which run the agents driving the SUT with 1 to 10 VMs
configured with SQL Server.
• The SUT is shown in blue near the bottom.
• The ESX 3.5 host is dual-connected to the Fibre Channel SAN composed of two Brocade
200E 4 Gbit/sec switches. The switches in turn are connected to a Hitachi storage array.
• The SAN provides redundant paths to the storage, but only one side is actually active at a
time, which means that all the traffic to and from the storage array flows through the same
switch. In the case of a failure in any Fibre Channel component in the path, it is possible
to fail over to the alternate switch and continue operations. This normally occurs without
interruption to the applications.
Emulating the End-User Experience
The benchmark study was conducted by testers who represent typical IT Windows System
Administrators. For example, they had no prior Fibre Channel (FC) experience and no previous
knowledge of TPC-C, Benchmark Factory, or storage optimization. (Lack of optimization in the
SUTs was discussed earlier.)
The results of the benchmark study and the degree of performance and stability experienced
with ESX Server 3.5 were satisfying. Workload goals were met with no adverse effects to
the hosts, SAN, or storage environments. The results are presented below in two parts:
single-platform tests and the ESX Server cluster tests. Table 1 summarizes the TPC-C scores
obtained for the “bare-metal” (BM) run and the 10 virtualized instance runs.
Results for Single-Platform Tests
Table 1. TPC-C transaction scores (average guest TPS) for 1 to 10 concurrent instances.
While there are minor variations, the results are very consistent for each run and VM
instance. A single ESX Server demonstrated an overall rate of 31,260 TPM servicing 5,000
users in 10 VMs while using only about 17 percent of the CPU resources available on the SUT.
Running 10 VMs, each with one virtual CPU, on an 8-core host means that the ESX Server
was over-committed in terms of CPU resources. The results show that ESX Server is more
than capable of handling such over-commitment.
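As a cross-check, the consolidation figures quoted above are internally consistent; every input in the sketch below comes from the results in this section:

```python
# Cross-check of the single-host results quoted above.
vms = 10                 # concurrent SQL Server VMs
vcpus_per_vm = 1         # one virtual CPU per VM
physical_cores = 8       # dual-socket, quad-core SUT
tpm_total = 31_260       # overall TPM across the 10 VMs

# CPU over-commitment: 10 vCPUs scheduled onto 8 physical cores.
overcommit = vms * vcpus_per_vm / physical_cores
print(f"vCPU:core ratio = {overcommit:.2f}")  # 1.25 (over-committed)

# Per-VM throughput in TPS, for comparison with the 50 TPS goal.
tps_per_vm = tpm_total / 60 / vms
print(f"per-VM rate: {tps_per_vm:.1f} TPS")  # 52.1, above the 50 TPS goal
```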
Table 2 summarizes average ESX Server and virtual machine CPU utilization for 1 through
10 concurrent SQL Server instances. The “Average VM CPU Utilization” column shows the
CPU utilized by each VM as a percentage of the amount it was allocated. (The ESX Server
information is also graphed in Figure 2.)
Table 2. ESX Server and virtual machine CPU utilization.
The first thing to notice is that the ESX Server CPU utilization does not increase much as
VMs are added. The data shows that the incremental ESX processor utilization is less than
2.5 percent across the range of concurrent instances tested. The incremental overhead for
running the SQL Server instances using the TPC-C workload in these tests over the range of
VMs tested is minimal and very predictable.
Virtual machine CPU utilization is also very consistent, which is to be expected because every
SQL Server instance is performing the same work with the same data.
Figure 2. ESX Server CPU utilization for 1 to 10 concurrent instances.
NOTE: Virtual machine accounting data is subject to the overall functioning of the virtualizing
platform. VMware has produced a detailed white paper on VM time keeping (Timekeeping in
VMware Virtual Machines, http://www.vmware.com/pdf/vmware_timekeeping.pdf).
Table 3 summarizes average ESX Server and virtual machine memory usage from 1 through
10 concurrent SQL Server instances. The “Average VM Memory Utilization” column shows the
memory utilized by each VM as a percentage of the amount it was allocated.
Table 3. ESX Server and VM memory utilization.
Two trends are immediately apparent:
• The extra memory required by ESX Server for each new instance is in the range of 3 to
5 percent for environments in which memory is not constrained. This illustrates very clearly
the effects of ESX Server VM page sharing. ESX Server provides an additional optimization
to memory management by sharing pages among VMs. In this instance, page sharing
decreases memory requirements for each VM running SQL Server even further.
• Each virtual machine consumes an average of less than half of the amount of memory
with which it was originally configured, and the average memory utilization declines as
more VMs are added.
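The declining per-VM average is what page sharing predicts: identical guests share more pages as their count grows, so later VMs cost less than the first. The sketch below illustrates only the shape of that effect; the shared fraction is a hypothetical value, not a number measured in this study:

```python
# Illustrative model of page sharing across identical VMs. Each VM is
# configured with 1.5 GB (as in this study), but a fraction of each
# additional VM's pages is assumed to be shared with its peers. The
# 40 percent shared fraction is hypothetical, not a measured value.

CONFIGURED_MB = 1536    # per-VM memory reservation used in the tests
SHARED_FRACTION = 0.4   # hypothetical deduplicated share of pages

def host_memory_mb(n_vms):
    """Host memory consumed by n identical VMs under this model: the
    first VM pays full cost, each later VM pays only its unshared part."""
    if n_vms == 0:
        return 0.0
    return CONFIGURED_MB + (n_vms - 1) * CONFIGURED_MB * (1 - SHARED_FRACTION)

for n in (1, 5, 10):
    print(f"{n:>2} VMs: avg {host_memory_mb(n) / n:.0f} MB per VM")
# The average per-VM cost falls as identical VMs are added.
```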
Figure 3. ESX memory utilization for 1 to 10 concurrent instances.
The I/O generated by the TPC-C workload can be characterized by a very large number of short block
transfers. This section presents the results for the ESX Server only, to show how much I/O
actually gets through the ESX Server abstraction layers and out the Fibre Channel Host
Bus Adapter (HBA) ports. Table 4 shows the total platform I/O throughput obtained in each
iteration of the test in megabytes per second.
Table 4. ESX Server total I/O utilization (average and peak host I/O).

The first column is the number of SQL Server VMs; the second column shows the average
transfer level in megabytes per second; and the third column shows the peak reached
during each run. Notice that the traffic levels are not particularly high by FC standards. This
is due to the nature of the test, in which the number of IOPS is high, but the amount of data
transferred is low.
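That profile, high IOPS but modest MB/s, is just block-size arithmetic. In the sketch below, the 8 KB transfer size matches SQL Server's page size, but the IOPS figure is illustrative rather than a measurement from this study:

```python
# Throughput = IOPS x transfer size. OLTP transfers are small, so even
# a heavy IOPS load yields modest MB/s. The 8 KB size matches SQL
# Server's page size; 5,000 IOPS is an illustrative figure, not a
# measurement from this study.

block_size_kb = 8
iops = 5_000

throughput_mb_s = iops * block_size_kb / 1024
print(f"{iops} IOPS x {block_size_kb} KB = {throughput_mb_s:.1f} MB/s")  # 39.1 MB/s
```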
Storage latency plays a critical role in the ability of an application such as SQL Server to
maintain the level of performance for the response times required by OLTP applications.
Given that the host platform has no bottlenecks, both the storage array and the storage
network must respond quickly to keep response times low. Proper storage array configuration
minimizes device delays, and the Brocade FC SAN ensures that the network adds negligible
latency. Latency is not a friend to OLTP environments, because it can cause application
performance issues and higher response times. At worst, it triggers resource budgeting
activities in the database servers, which can cause longer-term performance issues. In
extreme cases, the database engine may decide that the device is no longer functional and
take it offline.
Figure 4 graphs the level of I/O traffic for the ten benchmark runs. Since each run activates
an additional instance of SQL Server running the same workload, the regular increase in
workload is expected. The start of each run is clearly shown by a spike in the traffic, caused
by end users logging in to the application and some SQL Server startup activity such as
building up the buffer cache. The OLTP transaction portion of the benchmark follows. This
accurately simulates the use case associated with start-of-day activities, when end users
come into work and start logging on to their applications.
Figure 4. ESX I/O levels in kilobytes per second for 1 to 10 concurrent instances.
In Figure 4, the 10 runs show a decrease in I/O activity to 0 (zero) between each run. The
result curves for all 10 runs have a similar shape with a peak of I/O activity at the start of
the run followed by a tapering off of activity as the run progresses. A benchmark run starts
by logging on the 500 users (the initial spike of resource usage including some SQL Server
buffer caching), followed by the running of the TPC-C transactions. Once the transactions are
complete, the I/O activity goes to 0 (zero) and the test ends.