Getting a SharePoint system up and running - with data correlation and data analysis

This paper deals with time series correlation. In recent years, it has become one of the central techniques for tracking down application performance problems.

Let's get to the concrete problem: Our customer, a medium-sized mechanical engineering company, had massive problems with the response times of its SharePoint pages after migrating to the new version of SharePoint.

Excerpt from the company's IT landscape

The company's Windows infrastructure includes the following components:

  • Active Directory
  • DNS
  • File services
  • SharePoint
  • SQL

The hardware environment consists mainly of HPE components: HPE BladeSystem c7000 enclosures with the corresponding BL460c G7 server blades, each with SAN connections of 8 Gbit/s and network (LAN) connections of 10 Gbit/s. The disk storage system is an HPE P9500, which was one of the fastest disk subsystems on the market before the widespread introduction of "All Flash" systems.

After the SharePoint migration, many things no longer worked properly

The SharePoint service is the most important basis for cooperation between employees in the company. Response times of one to one and a half seconds are considered normal. After the migration, however, response times for certain pages and SharePoint applications (web parts) exceeded five seconds, and employees complained of poor to very poor response times. Since SharePoint is classified as a business-critical application, urgent action was required.

The IT department could not find the cause with the available monitoring systems. All infrastructure areas, such as Windows Server, MS-SQL server, network and storage, reported "green".

The various monitoring systems showed single-digit or low double-digit percentage values for basic metrics such as CPU, main memory and network. So everything was in the green zone after all? How could that be?

Analysis and correlation: finding the real causes

To fix the problem, management set up a working group and tasked our company with finding the root causes of the poor response times. We installed the SightLine® analysis and correlation system to collect data from:

  • Windows Server
  • Active Directory
  • MS-SQL server
  • SharePoint
  • Storage
  • Network services (DHCP and DNS)
  • Network

The result: by correlating the measured values of the various infrastructure components, the tempdb database of the Microsoft SQL server was identified as the cause of the problem.
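To illustrate the idea of correlating measured values, here is a minimal Python sketch. It is not how SightLine® works internally, which this paper does not describe; it simply ranks a few metric series by their Pearson correlation with the page response time. All metric names and sample values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def rank_metrics(target, metrics):
    """Sort metric names by absolute correlation with the target series."""
    scores = {name: pearson(target, series) for name, series in metrics.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical samples, one value per measurement interval.
response_time_s = [1.2, 1.3, 5.4, 5.1, 1.4, 6.0]
metrics = {
    "cpu_percent": [8, 9, 9, 8, 9, 8],
    "tempdb_write_latency_ms": [4, 5, 48, 45, 6, 55],
    "network_percent": [3, 4, 3, 4, 3, 4],
}

ranking = rank_metrics(response_time_s, metrics)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

In this made-up data set, CPU and network utilisation stay "green" throughout, while the hypothetical tempdb latency series rises and falls together with the response time and therefore ranks first. A high correlation is only a pointer to where to look; the cause still has to be confirmed on the system itself.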

SharePoint and SQL Server: Take targeted measures

The SharePoint sites and SharePoint applications use the SQL server databases in a way that requires a very large number of accesses to tempdb. By optimising tempdb and constantly monitoring its performance, the required response times were achieved again, and performance even improved to over 200%.
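The paper does not state which tempdb values were monitored. As one possible approach, the following Python sketch derives average I/O latency from two snapshots of cumulative counters. The field names follow the columns of SQL Server's sys.dm_io_virtual_file_stats view; the snapshot values and the function name are hypothetical, and reading the counters from the server is left out.

```python
def avg_latency_ms(before, after):
    """Average read/write latency (ms per operation) between two snapshots.

    Both arguments are dicts of cumulative counters for one database file.
    """
    def per_op(stall_key, count_key):
        ops = after[count_key] - before[count_key]
        stall = after[stall_key] - before[stall_key]
        return stall / ops if ops > 0 else 0.0  # no I/O in the interval

    return {
        "read_ms": per_op("io_stall_read_ms", "num_of_reads"),
        "write_ms": per_op("io_stall_write_ms", "num_of_writes"),
    }


# Hypothetical snapshots of a tempdb data file, taken one interval apart.
snap_1 = {"num_of_reads": 1000, "io_stall_read_ms": 4000,
          "num_of_writes": 2000, "io_stall_write_ms": 10000}
snap_2 = {"num_of_reads": 1500, "io_stall_read_ms": 9000,
          "num_of_writes": 3000, "io_stall_write_ms": 60000}

latency = avg_latency_ms(snap_1, snap_2)
print(latency)
```

Working with the difference between two snapshots matters because the counters are cumulative; a single reading would average over the whole time since the counters were last reset and hide a recent slowdown.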

The great benefit for the company:

Thanks to analysis and correlation across the entire IT landscape, all systems are performing at their best again. Employees can once more access information and documents quickly. Overall, the performance of the SharePoint environment improved, and with it the productivity of the entire company.