CN113535407B - Optimization method, system, equipment and storage medium of server - Google Patents

Optimization method, system, equipment and storage medium of server

Info

Publication number
CN113535407B
CN113535407B (application CN202110875903.4A)
Authority
CN
China
Prior art keywords
server
hardware
instruction
index
floating point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110875903.4A
Other languages
Chinese (zh)
Other versions
CN113535407A (en)
Inventor
段谊海
郭锋
王晓通
王朋飞
赵阳阳
荆亚
刘畅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Data Technology Co Ltd
Original Assignee
Jinan Inspur Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Data Technology Co Ltd filed Critical Jinan Inspur Data Technology Co Ltd
Priority to CN202110875903.4A priority Critical patent/CN113535407B/en
Publication of CN113535407A publication Critical patent/CN113535407A/en
Application granted granted Critical
Publication of CN113535407B publication Critical patent/CN113535407B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses an optimization method of a server, which comprises the following steps: collecting various hardware indexes while a target service is running; normalizing the hardware indexes; and, when the value of any normalized hardware index exceeds the preset threshold corresponding to that index for a first duration, taking the hardware index as a server bottleneck influence factor to optimize the server. By applying this scheme, the server can be optimized effectively and the limitation caused by server hardware bottlenecks can be avoided, so that the performance of the server is brought into full play. The application also discloses an optimization system, device and storage medium of the server, which have corresponding technical effects.

Description

Optimization method, system, equipment and storage medium of server
Technical Field
The present invention relates to the field of server technologies, and in particular, to a method, a system, an apparatus, and a storage medium for optimizing a server.
Background
With the rapid development of artificial intelligence, the amount of data computed by users keeps increasing. However, owing to insufficient knowledge of the server, services often fail to run well because of server bottlenecks, and the various capabilities of the server cannot be fully utilized.
In summary, how to effectively optimize the server so that its performance can be brought into full play is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a method, a system, equipment and a storage medium for optimizing a server, so that the server can be optimized effectively and its performance brought into full play.
In order to solve the technical problems, the invention provides the following technical scheme:
an optimization method of a server, comprising:
collecting various hardware indexes during the operation of a target service;
normalizing the hardware indexes;
and when the numerical value of the hardware index after any one normalization in the first time period exceeds a preset threshold corresponding to the hardware index, taking the hardware index as a server bottleneck influence factor to optimize the server.
Preferably, the normalizing of the hardware indexes includes:
normalizing each hardware index by determining the percentage of its current value relative to the corresponding theoretical value of that hardware index.
Preferably, the preset thresholds corresponding to the hardware indexes are the same.
Preferably, the method further comprises:
determining respective execution times of each instruction in a preset instruction library in the running process of the target service by collecting each micro-architecture index of the target service in the running process;
and when the execution times of any instruction do not exceed the times threshold corresponding to the instruction, taking the instruction as a server bottleneck influence factor to optimize the server.
Preferably, the method further comprises:
determining the utilization rate of the vectorized instruction of the floating point operation according to each micro-architecture index;
when the utilization rate of the vectorized instruction of the floating point operation is lower than a first utilization rate threshold value, the floating point operation is used as a server bottleneck influence factor to optimize the server.
Preferably, the determining, according to each micro-architecture indicator, the use rate of the vectorized instruction of the floating point operation includes:
determining total double-precision floating point operation instructions, total single-precision floating point operation instructions and the total execution times of x87 double-precision floating point operation instructions according to each micro-architecture index;
and dividing the sum of the execution times by the sum of each floating point operation instruction and each vector operation instruction to obtain the utilization rate of the vectorized instruction of the floating point operation.
Preferably, the method further comprises:
collecting each level of cache hit rate of the target service in the running process;
and when the cache hit rate of any level is lower than the cache hit rate threshold, taking each level of cache as a server bottleneck influence factor to optimize the server.
An optimization system for a server, comprising:
the hardware index acquisition module is used for acquiring various hardware indexes during the operation of the target service;
the normalization module is used for normalizing the hardware indexes;
and the first execution module is used for taking the hardware index as a server bottleneck influence factor to optimize the server when the value of any normalized hardware index exceeds the preset threshold corresponding to that index for the first duration.
An optimization apparatus of a server, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the method of optimizing a server as described in any one of the above.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of optimizing a server according to any of the preceding claims.
By applying the technical scheme provided by the embodiments of the invention, hardware bottlenecks of the server can be found from the statistical results, so that the server can be optimized accordingly and its performance brought into full play. Specifically, the scheme collects various hardware indexes while the target service is running and then normalizes them. Normalization is performed because different hardware indexes are represented in different forms, and the normalization operation makes subsequent comparison convenient. When the value of any normalized hardware index exceeds the preset threshold corresponding to that index for the first duration, the hardware index is considered to place a significant limit on server performance; it is therefore taken as a server bottleneck influence factor and the server can be optimized accordingly. In this way the server can be optimized effectively and the limitation caused by server hardware bottlenecks can be avoided, so that the performance of the server is brought into full play.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an optimization method of a server according to the present invention;
fig. 2 is a schematic structural diagram of an optimization system of a server in the present invention.
Detailed Description
The core of the invention is to provide an optimization method of the server, which can effectively optimize the server, avoid the limitation caused by the hardware bottleneck of the server, and is beneficial to fully playing the performance of the server.
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a method for optimizing a server according to the present invention, where the method for optimizing a server may include the following steps:
step S101: and collecting various hardware indexes during the operation of the target service.
The specific content of the target service can be set according to actual needs. When the hardware indexes of the target service are collected, collection can usually be completed by reading system files, calling third-party drivers, using system tools, and the like.
The specific content of the hardware indexes may also be set and adjusted according to actual needs. For example, in one specific embodiment of the present invention, the hardware indexes at least include: CPU utilization, memory bandwidth, network card real-time rate, NFS real-time rate, IB network real-time rate, memory utilization, swap utilization, GPU memory utilization, and PCIE bandwidth.
The hardware indexes in this implementation generally give a fairly comprehensive picture of the hardware state of the server while the target service is running, and they also facilitate analyzing and determining the hardware bottlenecks of the server. Of course, in other embodiments the specific content of the hardware indexes may be adjusted according to actual needs without affecting the implementation of the present invention.
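As an illustrative, non-limiting sketch of how such collection might be implemented on a Linux server, the following Python code reads a few of the above indexes from system files. The file paths, field positions and the interface name eth0 are assumptions for illustration and are not part of the original disclosure.

import time

def read_cpu_times():
    """Return (busy, total) jiffies parsed from the first line of /proc/stat."""
    with open("/proc/stat") as f:
        fields = [int(v) for v in f.readline().split()[1:]]
    idle = fields[3] + fields[4]  # idle + iowait
    return sum(fields) - idle, sum(fields)

def cpu_utilization(interval=1.0):
    """Sample the CPU utilization (0..1) over a short interval."""
    busy1, total1 = read_cpu_times()
    time.sleep(interval)
    busy2, total2 = read_cpu_times()
    return (busy2 - busy1) / max(total2 - total1, 1)

def memory_utilization():
    """Memory utilization (0..1) derived from /proc/meminfo."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # values are reported in kB
    return 1.0 - info["MemAvailable"] / info["MemTotal"]

def nic_rx_rate(iface="eth0", interval=1.0):
    """Real-time receive rate of a network card in bytes per second."""
    path = f"/sys/class/net/{iface}/statistics/rx_bytes"
    with open(path) as f:
        rx1 = int(f.read())
    time.sleep(interval)
    with open(path) as f:
        rx2 = int(f.read())
    return (rx2 - rx1) / interval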
Step S102: and (5) normalizing various hardware indexes.
After each hardware index of the target service in operation is collected, normalization of each hardware index is needed.
The normalization is performed because different hardware indexes are represented in different forms. For example, the network card real-time rate is a rate with a unit, while the CPU utilization is a dimensionless percentage. If each collected hardware index were analyzed separately, the programming would be complex and subsequent comparison would be inconvenient.
Each hardware index can be normalized by determining the percentage of its current value relative to the corresponding theoretical value. For example, the ratio of the real-time rate of the network card to the theoretical rate of that network card is used as the normalization result of that index. As another example, the ratio of the memory utilization to the theoretical memory utilization is used as the normalization result; of course, since the theoretical memory utilization is 100%, i.e. 1, the collected value of the memory utilization index is already the normalized value.
In other embodiments, other normalization manners may be used, as long as each normalized hardware index falls between 0 and 1; the specific manner can be set and adjusted according to actual needs.
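A minimal sketch of the percentage-based normalization described above is given below. Each raw index is divided by its theoretical value so that every index falls between 0 and 1; the theoretical values in the table are illustrative assumptions, not values taken from the disclosure.

# Theoretical maxima used for normalization; the concrete numbers are
# illustrative assumptions (e.g. a 10 Gb/s network card).
THEORETICAL_MAX = {
    "cpu_utilization": 1.0,             # already a ratio
    "memory_utilization": 1.0,          # already a ratio
    "nic_rate_bytes_per_s": 1.25e9,     # 10 Gb/s network card
    "memory_bandwidth_bytes_per_s": 2.0e11,
    "pcie_bandwidth_bytes_per_s": 1.6e10,
}

def normalize(raw_indexes):
    """Map each collected hardware index to [0, 1] by dividing it by its
    theoretical value; values above the theoretical maximum are clamped."""
    return {
        name: min(value / THEORETICAL_MAX[name], 1.0)
        for name, value in raw_indexes.items()
        if name in THEORETICAL_MAX
    }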
Step S103: when the value of any normalized hardware index exceeds the preset threshold corresponding to that index for the first duration, taking the hardware index as a server bottleneck influence factor to optimize the server.
The preset thresholds corresponding to the normalized hardware indexes may be the same or different. It should be noted, however, that because every hardware index is normalized to a value between 0 and 1, using the same preset threshold for all indexes simplifies the design and unifies the requirements placed on them, so in practical applications the preset thresholds corresponding to the hardware indexes are generally the same.
For example, in one case the preset threshold corresponding to each hardware index is set to 0.95; that is, when the value of any normalized hardware index exceeds 0.95 for the first duration, that hardware index is taken as a server bottleneck influence factor to optimize the server.
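The following sketch shows one possible reading of this check, under the assumption that the normalized indexes are sampled periodically during the first duration and that an index counts as a bottleneck only if it stays above the threshold in every sample; the sampling scheme and the helper names are illustrative.

PRESET_THRESHOLD = 0.95  # the example value used above

def find_bottlenecks(samples, threshold=PRESET_THRESHOLD):
    """samples: a list of dicts of normalized hardware indexes, one dict per
    sampling point within the first duration. Returns the names of indexes
    whose value exceeds the threshold in every sample, i.e. the candidate
    server bottleneck influence factors."""
    if not samples:
        return []
    return [
        name for name in samples[0]
        if all(sample.get(name, 0.0) > threshold for sample in samples)
    ]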
After one or more hardware indexes have been determined to be server bottleneck influence factors, the server can be optimized in various ways; for example, prompt information can be output so that staff notice the situation and then replace the hardware associated with those indexes.
The specific way of optimizing the server once a hardware index has been taken as a server bottleneck influence factor can be chosen according to the actual situation, but it should be understood that after optimization the hardware index should no longer act as a server bottleneck limiting server performance.
In one embodiment of the present invention, the method may further include:
determining respective execution times of each instruction in a preset instruction library in the running process of the target service by collecting each micro-architecture index of the target service in the running process;
when the execution times of any one instruction do not exceed the times threshold corresponding to the instruction, the instruction is used as a server bottleneck influence factor to optimize the server.
The foregoing embodiment optimizes the server by analyzing its hardware bottlenecks. In some cases, however, the server hardware may not be the bottleneck at all; instead, the server software limits its performance. This embodiment therefore optimizes the server by analyzing its software bottlenecks.
Specifically, this implementation collects various micro-architecture indexes of the target service while it runs, which can be done by reading register values. For example, in one specific scenario the collected micro-architecture indexes include the numbers of: total double-precision floating point operations, total single-precision floating point operations, x87 double-precision floating point operations, SSE PACKED double-precision floating point operations, SSE SCALAR double-precision floating point operations, AVX PACKED double-precision floating point operations, AVX512 PACKED double-precision floating point operations, SSE PACKED single-precision floating point operations, SSE SCALAR single-precision floating point operations, AVX PACKED single-precision floating point operations, AVX512 PACKED single-precision floating point operations, SSE PACKED double-precision vector operations, AVX PACKED double-precision vector operations, AVX512 PACKED double-precision vector operations, SSE PACKED single-precision vector operations, AVX PACKED single-precision vector operations, and AVX512 PACKED single-precision vector operations.
After the micro-architecture indexes of the target service have been collected during its run, the respective execution counts of the instructions in a preset instruction library can be determined for that run. For example, in one case the preset instruction library contains the AVX512 PACKED double-precision vector operation instruction; the number of times this instruction is executed while the target service runs can be determined from the collected micro-architecture indexes. If that count is below the times threshold corresponding to the instruction, it indicates that server software may be limiting server performance, so the instruction can be taken as a server bottleneck influence factor to optimize the server.
For example, in one scenario the server processor supports AVX512 instructions, but AVX512 instructions are rarely or never invoked while the target service runs, probably because the server software has not been updated and therefore cannot use them. When optimizing the server, staff can download or write a program that supports AVX512 instructions. An AVX512 instruction can complete a 512-bit computation in one clock cycle, so if int32 computations were previously performed one at a time, sixteen int32 computations can now be completed per clock cycle, i.e. a sixteen-fold performance improvement.
The specific content of the preset instruction library can be set and adjusted as needed, but it should be understood that the instructions in the preset instruction library are usually those beneficial to server performance.
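As a hedged sketch of how such execution counts might be obtained on Linux, the following Python code runs the target service under the perf stat tool and reads one hardware performance counter. The event name fp_arith_inst_retired.512b_packed_double is an Intel-specific assumption and may not exist on other CPUs; the command ./target_service and the times threshold are likewise placeholders.

import subprocess

EVENT = "fp_arith_inst_retired.512b_packed_double"  # assumed Intel event name
TIMES_THRESHOLD = 1_000_000                          # illustrative threshold

def count_event(command, event=EVENT):
    """Run the target service under `perf stat` and parse the counter value
    for a single event from perf's CSV (-x ,) output, which goes to stderr."""
    result = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", event] + command,
        capture_output=True, text=True,
    )
    for line in result.stderr.splitlines():
        parts = line.split(",")
        if len(parts) > 2 and parts[2] == event:
            return int(parts[0]) if parts[0].isdigit() else 0
    return 0

if __name__ == "__main__":
    executed = count_event(["./target_service"])
    if executed <= TIMES_THRESHOLD:
        print(f"{EVENT} executed {executed} times: possible software "
              "bottleneck; consider rebuilding the service with AVX512 support")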
Further, in a specific embodiment of the present invention, the method may further include:
determining the utilization rate of the vectorized instruction of the floating point operation according to various microarchitectural indexes;
when the utilization rate of the vectorized instruction of the floating point operation is lower than a first utilization rate threshold value, the floating point operation is used as a server bottleneck influence factor to optimize the server.
The foregoing embodiment considers individual micro-architecture indexes, whereas this embodiment makes a comprehensive judgment over multiple micro-architecture indexes. Specifically, the utilization rate of vectorized floating point instructions is determined from the micro-architecture indexes; if this utilization rate is lower than a first utilization rate threshold, there is room for optimization, so floating point operation can be taken as a server bottleneck influence factor to optimize the server.
The utilization rate of vectorized floating point instructions can be determined in various ways. For example, in one embodiment of the present invention, determining the utilization rate of vectorized floating point instructions according to the micro-architecture indexes may specifically include:
determining, from the micro-architecture indexes, the total execution counts of the total double-precision floating point operation instructions, the total single-precision floating point operation instructions and the x87 double-precision floating point operation instructions;
dividing the sum of these execution counts by the sum of the execution counts of all floating point operation instructions and all vector operation instructions to obtain the utilization rate of the vectorized floating point instructions.
In this embodiment, the dividend generally covers three kinds of instructions: the total double-precision floating point operation instructions, the total single-precision floating point operation instructions and the x87 double-precision floating point operation instructions, so the sum of the execution counts of these three kinds of floating point operation instructions is calculated. The sum of all floating point operation instructions and all vector operation instructions generally covers: total double-precision floating point operations, total single-precision floating point operations, x87 double-precision floating point operations, SSE PACKED double-precision floating point operations, SSE SCALAR double-precision floating point operations, AVX PACKED double-precision floating point operations, AVX512 PACKED double-precision floating point operations, SSE PACKED single-precision floating point operations, SSE SCALAR single-precision floating point operations, AVX PACKED single-precision floating point operations, AVX512 PACKED single-precision floating point operations, SSE PACKED double-precision vector operations, AVX PACKED double-precision vector operations, AVX512 PACKED double-precision vector operations, SSE PACKED single-precision vector operations, AVX PACKED single-precision vector operations, and AVX512 PACKED single-precision vector operations.
After the sum of all floating point operation instructions and all vector operation instructions has been obtained, the sum of the execution counts above is divided by it, which yields the utilization rate of the vectorized floating point instructions.
When the utilization of vectorized instructions of floating point operations is below a first utilization threshold, such as below 80%, the floating point operations may be used as server bottleneck influencing factors for optimization of the server.
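The ratio described above can be computed as in the following sketch, where the dictionary keys mirror the list of micro-architecture indexes given earlier; the key names themselves and the 80% threshold are illustrative assumptions.

# Counter names for the dividend: the three kinds of floating point
# instruction counts named above.
DIVIDEND_KEYS = ["total_double_fp", "total_single_fp", "x87_double_fp"]

# Counter names for the divisor: all floating point and vector operation
# counts listed in the description.
DIVISOR_KEYS = DIVIDEND_KEYS + [
    "sse_packed_double_fp", "sse_scalar_double_fp",
    "avx_packed_double_fp", "avx512_packed_double_fp",
    "sse_packed_single_fp", "sse_scalar_single_fp",
    "avx_packed_single_fp", "avx512_packed_single_fp",
    "sse_packed_double_vec", "avx_packed_double_vec", "avx512_packed_double_vec",
    "sse_packed_single_vec", "avx_packed_single_vec", "avx512_packed_single_vec",
]

FIRST_UTILIZATION_THRESHOLD = 0.8  # e.g. 80%, as in the example above

def vectorization_utilization(counters):
    """Divide the summed execution counts of the three dividend instruction
    kinds by the summed counts of all floating point and vector operations."""
    dividend = sum(counters.get(k, 0) for k in DIVIDEND_KEYS)
    divisor = sum(counters.get(k, 0) for k in DIVISOR_KEYS)
    return dividend / divisor if divisor else 0.0

def floating_point_is_bottleneck(counters):
    return vectorization_utilization(counters) < FIRST_UTILIZATION_THRESHOLD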
In one embodiment of the present invention, the method may further include:
collecting the hit rate of each level of cache of the target service in the running process;
when the cache hit rate of any level is lower than the cache hit rate threshold, each level of cache is used as a server bottleneck influence factor to optimize the server.
In this embodiment, in addition to the bottleneck analysis of the server according to the hardware index and the micro-architecture index, other indexes affecting the performance bottleneck of the server may be analyzed to optimize the server, mainly some software indexes, such as CPI, total memory bandwidth, memory read/write bandwidth, PCIE read/write bandwidth, cache hit rates at different levels, and so on.
For example, in this embodiment the cache hit rate of each level can be collected while the target service runs. When the hit rate of any level is lower than the cache hit rate threshold, the caches can be taken as a server bottleneck influence factor to optimize the server; for instance, operations on the same memory can be processed together so as to improve the hit rate of each cache level.
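A sketch of such a check follows, assuming the relevant cache counters are collected with the generic Linux perf events L1-dcache-loads, L1-dcache-load-misses, cache-references and cache-misses; event availability varies by CPU, and the 0.9 hit rate threshold is illustrative.

CACHE_HIT_RATE_THRESHOLD = 0.9  # illustrative threshold

# (loads event, misses event) per cache level, using generic perf event names.
CACHE_LEVELS = {
    "L1d": ("L1-dcache-loads", "L1-dcache-load-misses"),
    "LLC": ("cache-references", "cache-misses"),
}

def cache_hit_rates(counters):
    """counters maps perf event names to values collected while the target
    service runs; returns the hit rate of each cache level."""
    rates = {}
    for level, (loads_evt, misses_evt) in CACHE_LEVELS.items():
        loads = counters.get(loads_evt, 0)
        misses = counters.get(misses_evt, 0)
        rates[level] = 1.0 - misses / loads if loads else 1.0
    return rates

def cache_is_bottleneck(counters, threshold=CACHE_HIT_RATE_THRESHOLD):
    """Treat the cache hierarchy as a server bottleneck influence factor if
    the hit rate of any level falls below the threshold."""
    return any(rate < threshold for rate in cache_hit_rates(counters).values())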
By applying the technical scheme provided by the embodiments of the invention, hardware bottlenecks of the server can be found from the statistical results, so that the server can be optimized accordingly and its performance brought into full play. Specifically, the scheme collects various hardware indexes while the target service is running and then normalizes them. Normalization is performed because different hardware indexes are represented in different forms, and the normalization operation makes subsequent comparison convenient. When the value of any normalized hardware index exceeds the preset threshold corresponding to that index for the first duration, the hardware index is considered to place a significant limit on server performance; it is therefore taken as a server bottleneck influence factor and the server can be optimized accordingly. In this way the server can be optimized effectively and the limitation caused by server hardware bottlenecks can be avoided, so that the performance of the server is brought into full play.
Corresponding to the above method embodiments, an embodiment of the invention also provides an optimization system of a server; the method and the system can be referred to in correspondence with each other.
Referring to fig. 2, the optimization system of the server may include:
the hardware index acquisition module 201 is used for acquiring various hardware indexes during the operation of the target service;
the normalization module 202 is used for normalizing various hardware indexes;
the first execution module 203 is configured to take the hardware indicator as a server bottleneck influence factor to optimize the server when the value of the hardware indicator after any one of the normalization in the first duration exceeds a preset threshold corresponding to the hardware indicator.
In one embodiment of the present invention, the normalization module 202 is specifically configured to:
normalize each hardware index by determining the percentage of its current value relative to the corresponding theoretical value of that hardware index.
In a specific embodiment of the present invention, the preset thresholds corresponding to the hardware indexes are the same.
In a specific embodiment of the present invention, the hardware indexes at least include: CPU utilization, memory bandwidth, network card real-time rate, NFS real-time rate, IB network real-time rate, memory utilization, swap utilization, GPU memory utilization, and PCIE bandwidth.
In one embodiment of the present invention, the system further comprises:
the micro-architecture index acquisition module is used for determining the respective execution times of each instruction in a preset instruction library in the running process of the target service by acquiring each micro-architecture index of the target service in the running process;
and the second execution module is used for taking the instruction as a server bottleneck influence factor to optimize the server when the execution times of any instruction do not exceed the times threshold corresponding to the instruction.
In one embodiment of the present invention, the system further comprises:
the floating point operation vectorization instruction utilization rate calculation module is used for determining the utilization rate of the vectorization instruction of the floating point operation according to various micro-architecture indexes;
and the third execution module is used for taking the floating point operation as a server bottleneck influence factor to optimize the server when the utilization rate of the vectorized instruction of the floating point operation is lower than the first utilization rate threshold value.
In one embodiment of the present invention, the floating point operation vectorization instruction utilization rate calculation module is specifically configured to:
determining total double-precision floating point operation instructions, total single-precision floating point operation instructions and the total execution times of x87 double-precision floating point operation instructions according to various microarchitectural indexes;
dividing the sum of the execution times by the sum of each floating point operation instruction and each vector operation instruction to obtain the utilization rate of the vectorization instruction of the floating point operation.
In one embodiment of the present invention, the system further comprises:
the cache hit rate acquisition module is used for acquiring the cache hit rate of each level of the target service in the running process;
and the fourth execution module is used for taking each level of cache as a server bottleneck influence factor to optimize the server when the cache hit rate of any level is lower than the cache hit rate threshold.
Corresponding to the above method and system embodiments, the embodiments of the present invention further provide an optimization device of a server and a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the optimization method of a server as in any of the embodiments described above.
The optimizing device of the server may include:
a memory for storing a computer program;
a processor for executing a computer program to implement the steps of the method of optimizing a server as in any of the embodiments described above.
It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The principles and embodiments of the present invention have been described herein with reference to specific examples, but the description of the examples above is only for aiding in understanding the technical solution of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims (9)

1. A method for optimizing a server, comprising:
collecting various hardware indexes during the operation of a target service;
normalizing the hardware indexes;
when the numerical value of the hardware index after normalization in the first time period exceeds a preset threshold corresponding to the hardware index, taking the hardware index as a server bottleneck influence factor to optimize the server;
further comprises:
determining respective execution times of each instruction in a preset instruction library in the running process of the target service by collecting each micro-architecture index of the target service in the running process;
and when the execution times of any instruction do not exceed the times threshold corresponding to the instruction, taking the instruction as a server bottleneck influence factor to optimize the server.
2. The method for optimizing a server according to claim 1, wherein the normalizing the hardware metrics includes:
and normalizing each hardware index by determining the percentage of the current value of each hardware index and the corresponding theoretical value of the hardware index.
3. The optimization method of the server according to claim 2, wherein the preset thresholds corresponding to the hardware indexes are the same.
4. The optimization method of a server according to claim 1, further comprising:
determining the utilization rate of the vectorized instruction of the floating point operation according to each micro-architecture index;
when the utilization rate of the vectorized instruction of the floating point operation is lower than a first utilization rate threshold value, the floating point operation is used as a server bottleneck influence factor to optimize the server.
5. The method for optimizing a server according to claim 4, wherein determining the utilization rate of the vectorized instruction of the floating point operation according to each of the microarchitectural indexes comprises:
determining total double-precision floating point operation instructions, total single-precision floating point operation instructions and the total execution times of x87 double-precision floating point operation instructions according to each micro-architecture index;
and dividing the sum of the execution times by the sum of each floating point operation instruction and each vector operation instruction to obtain the utilization rate of the vectorized instruction of the floating point operation.
6. A method of optimizing a server according to any one of claims 1 to 3, further comprising:
collecting each level of cache hit rate of the target service in the running process;
and when the cache hit rate of any level is lower than the cache hit rate threshold, taking each level of cache as a server bottleneck influence factor to optimize the server.
7. An optimization system for a server, comprising:
the hardware index acquisition module is used for acquiring various hardware indexes during the operation of the target service;
the normalization module is used for normalizing the hardware indexes;
the first execution module is used for taking the hardware index as a server bottleneck influence factor to optimize the server when the numerical value of the hardware index in a first duration after any normalization exceeds a preset threshold corresponding to the hardware index;
further comprises:
the micro-architecture index acquisition module is used for determining the respective execution times of each instruction in a preset instruction library in the running process of the target service by acquiring each micro-architecture index of the target service in the running process;
and the second execution module is used for taking the instruction as a server bottleneck influence factor to optimize the server when the execution times of any instruction do not exceed the times threshold corresponding to the instruction.
8. An optimization apparatus of a server, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the method of optimizing a server according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the optimization method of a server according to any of claims 1 to 6.
CN202110875903.4A 2021-07-30 2021-07-30 Optimization method, system, equipment and storage medium of server Active CN113535407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110875903.4A CN113535407B (en) 2021-07-30 2021-07-30 Optimization method, system, equipment and storage medium of server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110875903.4A CN113535407B (en) 2021-07-30 2021-07-30 Optimization method, system, equipment and storage medium of server

Publications (2)

Publication Number Publication Date
CN113535407A CN113535407A (en) 2021-10-22
CN113535407B true CN113535407B (en) 2024-03-19

Family

ID=78089964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110875903.4A Active CN113535407B (en) 2021-07-30 2021-07-30 Optimization method, system, equipment and storage medium of server

Country Status (1)

Country Link
CN (1) CN113535407B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1506823A (en) * 2002-12-13 2004-06-23 英业达股份有限公司 Test method of overall server performance
JP2006293578A (en) * 2005-04-08 2006-10-26 Hitachi Ltd Method for determining bottleneck in information processing device
CN104717236A (en) * 2013-12-11 2015-06-17 中国移动通信集团公司 Equipment performance test method and device
CN105721187A (en) * 2014-12-03 2016-06-29 中国移动通信集团江苏有限公司 Service fault diagnosis method and apparatus
CN106713069A (en) * 2016-12-16 2017-05-24 四川长虹电器股份有限公司 System resource monitoring and bottleneck identification method
CN110851322A (en) * 2019-10-11 2020-02-28 平安科技(深圳)有限公司 Hardware equipment abnormity monitoring method, server and computer readable storage medium
CN111090551A (en) * 2019-11-22 2020-05-01 苏州浪潮智能科技有限公司 Method and device for testing and optimizing performance of storage server
CN111897706A (en) * 2020-07-15 2020-11-06 中国工商银行股份有限公司 Server performance prediction method, device, computer system and medium
CN112000472A (en) * 2020-08-11 2020-11-27 苏州浪潮智能科技有限公司 Method and device for tuning performance bottleneck of GPU (graphics processing Unit) of high-performance server and storage medium
CN113127314A (en) * 2019-12-31 2021-07-16 航天信息股份有限公司 Method and device for detecting program performance bottleneck and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on monitoring methods for cloud computing cluster server systems; Dong Bo; Shen Qing; Xiao Debao; Computer Engineering & Science (No. 10); pp. 68-72 *

Also Published As

Publication number Publication date
CN113535407A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN111401914B (en) Risk assessment model training and risk assessment method and device
CN110764714B (en) Data processing method, device and equipment and readable storage medium
CN109388550B (en) Cache hit rate determination method, device, equipment and readable storage medium
CN113723618B (en) SHAP optimization method, equipment and medium
CN111199469A (en) User payment model generation method and device and electronic equipment
CN111949498A (en) Application server abnormity prediction method and system
CN111045912B (en) AI application performance evaluation method, device and related equipment
CN114513470A (en) Network flow control method, device, equipment and computer readable storage medium
CN115934490A (en) Server performance prediction model training method, device, equipment and storage medium
CN113535407B (en) Optimization method, system, equipment and storage medium of server
CN113902260A (en) Information prediction method, information prediction device, electronic equipment and medium
CN110389840A (en) Load consumption method for early warning, device, computer equipment and storage medium
CN112416814A (en) Management method for garbage collection in solid state disk, storage medium and electronic device
Hult et al. On importance sampling with mixtures for random walks with heavy tails
CN113220463B (en) Binding strategy inference method and device, electronic equipment and storage medium
CN112333155B (en) Abnormal flow detection method and system, electronic equipment and storage medium
CN109978172A (en) A kind of resource pool usage forecast method and device based on extreme learning machine
CN111984652B (en) Method for searching idle block in bitmap data and related components
CN109901931B (en) Reduction function quantity determination method, device and system
CN112446490A (en) Network training data set caching method, device, equipment and storage medium
CN110298742B (en) Data processing method and device
CN110874813B (en) Image processing method, device and equipment and readable storage medium
CN117008876A (en) Use condition data processing method and related device for software reuse assets
CN113905400B (en) Network optimization processing method and device, electronic equipment and storage medium
CN113298120B (en) Fusion model-based user risk prediction method, system and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant