Performance issues and how to approach them

What exactly is “performance”? 

Performance is how well a device, piece of software, or service does its job, typically measured by its response time.

When we talk about the performance of a car or bike, we look at how fast it can travel and how much it costs.

Deviation

The discrepancy between the expected and actual results is defined as the deviation.

Straying from a well-trodden path or acceptable standard

The aforementioned definition will help us determine the types of difficulties we will face.

Types of performance problems

The majority of performance concerns can be divided into two categories: technical and perceptional.

  • Technical
      • These problems are further subdivided into three types: configuration, sizing, and break/fix.
      • The severity of these problems can be measured with the following metrics:
          • Response time
          • Utilization percentage
          • Throughput
          • Number of transactions completed
          • Input/output operations per second (IOPS)
  • Perceptional
      • These problems are frequently linked to a customer’s or user’s expectations.
      • Expectations can be shaped by a variety of factors, including:
          • Misunderstanding vendor-supplied performance benchmarks
          • Comparisons between the old and new setup, hardware, settings, and so on
      • Performance metrics can be used to quantify these problems.
      • The next step is to “provide enough information and evidence to the customer” to back up your claims.
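
The technical metrics listed above can all be derived from a handful of raw measurements. The following is a minimal sketch in Python, assuming a hypothetical `Sample` record for one measurement window; the field names and the sample values are made up for illustration and do not come from any real system or tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One measurement window (hypothetical structure, illustrative values)."""
    window_s: float            # length of the measurement window, in seconds
    busy_s: float              # time the device spent servicing requests, in seconds
    latencies_ms: list[float]  # response time of each completed I/O, in milliseconds
    bytes_moved: int           # total bytes transferred during the window


def summarize(s: Sample) -> dict[str, float]:
    """Derive the headline metrics from one measurement window."""
    ops = len(s.latencies_ms)
    return {
        "response_time_ms": mean(s.latencies_ms),          # average response time
        "utilization_pct": 100.0 * s.busy_s / s.window_s,  # busy time / total time
        "throughput_mb_s": s.bytes_moved / s.window_s / 1_000_000,
        "transactions": float(ops),                        # operations completed
        "iops": ops / s.window_s,                          # operations per second
    }


# Made-up numbers, purely to show the arithmetic.
sample = Sample(window_s=2.0, busy_s=1.5,
                latencies_ms=[1.0, 2.0, 3.0, 2.0],
                bytes_moved=4_000_000)
report = summarize(sample)
for name, value in report.items():
    print(f"{name}: {value}")
```

Having the numbers in this form makes it easier to show a customer where the measured values sit against what they expected.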

Scoping performance issues

  • What exactly is the problem?
  • What is the impact of the performance issue?
      • Is it preventing email from being sent?
      • Is it stopping reports from running?
  • What is the expected level of performance?
  • Was it ever working as expected?
  • When did it stop working as expected? (Date and time)
  • Was anything changed before that time?
  • What is unaffected by the problem?
  • What is the difference between what works and what does not?
  • How frequently do we encounter the problem?
  • How significant is the deviation from the expected performance?
  • Is the problem easily reproducible?
  • Can we pinpoint the location of the performance issue?
      • Clusters
      • Hosts / hardware versions / vSphere versions
      • Datastores and their variants
      • Virtual machines / guest operating systems / specific workloads (DB or mail server, for example)
      • Network / VLANs / subnets / sites
  • What is the host’s power management configuration?

Example:

A realistic example of how to use the above questions to narrow down the problem.

A customer-supplied statement

In our VMware cluster, we are now experiencing inconsistency in performance across hosts.

Questionnaire 

  • What exactly is the problem you’re having?

We have a large farm with over 200 hosts spread around the environment. We procured an additional 50 hosts, each with the same configuration as the existing ones. In our VMware clusters, we are currently experiencing inconsistent performance across various hosts. To avoid disrupting the environment, we have been doing a lot of testing with a few test VMs on two of the ten hosts in one specific cluster, both connected to a common FC storage device.

  • Was this working as expected before? (At any point in the product's lifecycle)

Yes. The older hosts are working absolutely fine; the issue is only with the new hosts that we procured. All the hosts are identical in hardware configuration as well as in vSphere version.

  • Is there any specific time when the problem is experienced?

We are seeing the issue on the VMs hosted on new hosts all day. We have tested the performance during various times of the day on various hosts. We have a dedicated test VM which is a clone of one of the production VMs.

  • Testing Setup

VM names: Test-Win2016-1, Test-Linux-1

Guest OS: Windows Server 2016 Std./Red Hat Linux 7/base install

Datastore/Volume: TestVMFS

Benchmarking tool: CrystalDiskMark 6.0.0

  • How is the testing performed?

We placed the VM on the old hosts and the new hosts in turn and ran the benchmark tool each time. We found that the performance issue appears on the new hosts. Hence, we created a test cluster with two hosts: one new and one old.

  • What are the test results? 

Hosting the VM on Oldhost-01 produces 5000 write IOPS. Hosting the VM on the new host-10 produces 2500 write IOPS. The same test was performed on both hosts at various times of the day.

With the above questions, we have already eliminated common factors such as VMs, Guest Operating Systems, vSphere versions, Datastore, Testing Process, Storage Connectivity, Storage Protocol, and Time of the Day.
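
The elimination step above amounts to comparing the good and bad test runs factor by factor and keeping only what differs. Here is a small Python sketch of that idea; the `differing_factors` helper and the dictionary keys are hypothetical names chosen for illustration, while the values are taken from the customer's answers.

```python
def differing_factors(good: dict[str, str],
                      bad: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return only the factors whose values differ between two test runs."""
    keys = sorted(set(good) | set(bad))
    return {
        k: (good.get(k, "<missing>"), bad.get(k, "<missing>"))
        for k in keys
        if good.get(k) != bad.get(k)
    }


# Factors that were held constant in the customer's tests.
common = {
    "vm": "Test-Win2016-1",
    "datastore": "TestVMFS",
    "benchmark": "CrystalDiskMark 6.0.0",
    "storage_protocol": "FC",
}
old_run = {**common, "host": "Oldhost-01"}
new_run = {**common, "host": "host-10"}

remaining = differing_factors(old_run, new_run)
print(remaining)
```

Whatever is left in `remaining` is where the investigation should continue; here only the host differs between the two runs.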

Improved Problem Statement: 

Write performance is poor when a VM is hosted on the newly procured hosts.

or 

New hosts produce 50% fewer write IOPS compared to identical old hosts with the same set of VMs.
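
The 50% figure follows directly from the definition of deviation given earlier (the gap between expected and actual results), applied to the two test results. A minimal Python sketch, where `deviation_pct` is a hypothetical helper name and the old host's result is treated as the expected value:

```python
def deviation_pct(expected: float, actual: float) -> float:
    """Shortfall of actual against expected, as a percentage of expected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100.0 * (expected - actual) / expected


old_host_iops = 5000  # write IOPS measured on Oldhost-01
new_host_iops = 2500  # write IOPS measured on the new host

drop = deviation_pct(old_host_iops, new_host_iops)
print(f"New host delivers {drop:.0f}% fewer write IOPS")
```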

Further elimination

  • Compare the workload on the good and bad hosts.
  • Run esxtop to validate CPU, memory, storage, and network performance.
  • Compare drivers, firmware, BIOS versions, etc. on working and non-working hosts.
  • Check for any SCSI read/write errors that can be seen on the

This is one example of how to approach a performance issue, not a hard and fast rule that must always be followed. But once we have a precise problem statement, it is much easier to isolate the issue and get it fixed.

Let’s discuss some performance basics in upcoming posts. Please feel free to add your comments so that I can improve.
