CN112231102A - Method, device, equipment and product for improving performance of storage system - Google Patents

Method, device, equipment and product for improving performance of storage system

Info

Publication number
CN112231102A
CN112231102A CN202011109038.4A
Authority
CN
China
Prior art keywords
thread
cpu
binding
core
logic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011109038.4A
Other languages
Chinese (zh)
Inventor
刘伟锋
张在贵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202011109038.4A priority Critical patent/CN112231102A/en
Publication of CN112231102A publication Critical patent/CN112231102A/en
Withdrawn legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/4557Distribution of virtual machine instances; Migration and load balancing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a device, equipment and a product for improving the performance of a storage system. The method comprises the following steps: querying the CPU configuration of a physical server; selecting the logical cores to bind according to the CPU configuration; creating a thread and binding the created thread to the selected CPU logical cores; and, when the thread is scheduled, dispatching it to the bound CPU logical cores. Because each created thread is bound to designated CPU logical cores, the operating system dispatches the thread to those cores according to the binding policy when scheduling it, which reduces the cost of migrating threads among logical cores and improves the performance of the storage system.

Description

Method, device, equipment and product for improving performance of storage system
Technical Field
The invention relates to the technical field of performance improvement of storage systems, in particular to a method, a device, equipment and a product for improving the performance of a storage system.
Background
Server virtualization converts physical resources into logically manageable virtual resources and consolidates multiple logical servers onto a single physical server, so that several virtual environments run simultaneously. This reduces server cost and makes servers easier and safer to manage.
QEMU-KVM is a common solution for server virtualization today, especially in private clouds built with OpenStack. KVM provides CPU and memory virtualization in kernel space, while QEMU virtualizes hardware I/O in user space. When I/O issued by the VM's operating system is intercepted by KVM, it is handed to QEMU for processing. When QEMU is connected to a distributed storage system, it generally accesses the backend storage by directly calling the librbd client.
To improve parallel task processing capability, physical servers generally adopt multi-core CPUs in SMP or NUMA configurations, so that multiple threads can run in parallel on multiple logical cores. The operating system's scheduling algorithm assigns threads to suitable logical cores while balancing the load among cores: if the load on one logical core is too high, threads running on it are migrated to other logical cores, so that the threads of all processes end up roughly evenly distributed across all cores of the CPU. However, if the target logical core and the current logical core are not in the same NUMA node, or the migration spans physical CPUs, the cost of thread migration is high and system performance degrades.
On a physical server, one VM corresponds to one QEMU process, and one QEMU process can create multiple parallel threads to process I/O tasks, i.e., to perform read and write access to the storage backend. At present, after QEMU calls the librbd client to create an I/O thread, scheduling of that thread depends entirely on the scheduling policy of the physical server's operating system. When several QEMU processes or other processes run on the physical server at the same time, the bursty nature of VM I/O causes librbd threads to migrate among the CPU's cores, so the cost of migrating threads between logical cores is high and the I/O performance of the storage system drops.
Disclosure of Invention
The invention provides a method, a device, equipment and a product for improving the performance of a storage system, aimed at the above problem: when several QEMU processes or other processes coexist on a physical server, the bursty I/O of the VMs causes librbd threads to migrate among the CPU's cores, the migration cost is high, and the I/O performance of the storage system drops.
The technical scheme of the invention is as follows:
In a first aspect, the technical solution of the present invention provides a method for improving the performance of a storage system, comprising the following steps:
querying the CPU configuration of a physical server;
selecting the logical cores to bind according to the CPU configuration;
creating a thread, and binding the created thread to the selected CPU logical cores;
and, when the thread is scheduled, dispatching it to the bound CPU logical cores. This reduces the cost of migrating threads among logical cores and improves the performance of the storage system.
Further, the step of creating a thread and binding the created thread to the selected CPU logical cores comprises:
writing the selected logical cores into a configuration file;
creating the thread, parsing the configuration file when the thread is created, and reading the binding switch and binding parameters from the configuration file;
setting the hard affinity of the created thread to the configured logical cores, which realizes the binding. The created thread is thus bound to designated CPU logical cores.
Further, the step of querying the configuration of the physical server's CPU comprises:
acquiring the number of NUMA nodes, the number of CPUs per node (i.e., the number of physical cores), and the number of logical cores per CPU.
Further, the step of selecting the logical cores to bind according to the CPU configuration comprises:
judging according to the CPU configuration;
if there are multiple NUMA nodes, keeping the CPUs hosting the bound logical cores within the same NUMA node;
if there are multiple physical cores, judging whether hyper-threading is enabled: if not, binding adjacent logical cores; if so, choosing bound logical cores distributed on the same physical core. When the operating system schedules the threads, it dispatches them to the designated CPU logical cores according to the binding policy, which reduces thread migration overhead and improves the performance of the storage system.
In a second aspect, the technical solution of the present invention provides a device for improving the performance of a storage system, comprising a query module, a selection module, a binding module and a processing module;
the query module is used for querying the configuration of the CPU of the physical server;
the selection module is used for selecting the bound logic core according to the configuration of the CPU;
the binding module is used for creating a thread and binding the created thread with the selected logic core of the CPU;
and the processing module is used for distributing the threads to the bound CPU logic cores when the threads are dispatched.
Furthermore, the binding module comprises a writing unit, an analyzing and reading unit and a binding unit;
the write-in unit is used for writing the selected logic core into a configuration file;
the analysis reading unit is used for creating a thread, analyzing a configuration file when the thread is created, and reading a binding switch and binding parameters in the configuration file;
and the binding unit is used for setting the hard affinity of the created thread as the configured logic core to realize the binding of the logic core.
Furthermore, the query module comprises a node number acquisition unit, a physical core number acquisition unit and a logic core number acquisition unit;
a node number obtaining unit for obtaining the number of NUMA nodes;
a physical core number obtaining unit, configured to obtain the number of CPUs per node, i.e., the number of physical cores;
and the logic core number acquisition unit is used for acquiring the number of the logic cores of each CPU.
Further, the selection module comprises a judging unit and a selecting unit;
the judging unit is configured to judge according to the CPU configuration, and further, when there are multiple physical cores, to judge whether hyper-threading is enabled;
the selecting unit is configured to keep the CPUs hosting the bound logical cores within the same NUMA node when the judging unit determines that there are multiple NUMA nodes; to select and bind adjacent logical cores when the judging unit determines that hyper-threading is not enabled; and to select bound logical cores distributed on the same physical core when the judging unit determines that hyper-threading is enabled.
In a third aspect, the present invention further provides an electronic device comprising a memory and a processor that communicate with each other via a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the method for improving the performance of a storage system according to the first aspect.
In a fourth aspect, the present invention also provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer executes the method for improving the performance of a storage system according to the first aspect.
According to the technical scheme, the invention has the following advantages: the created thread is bound to the appointed CPU logic core, and the thread is distributed to the appointed CPU logic core according to the binding strategy when the operating system schedules the thread, so that the thread migration overhead is reduced, and the performance of the storage system is improved.
In addition, the invention has a reliable design principle and a simple structure, and has very broad application prospects.
Therefore, compared with the prior art, the invention has prominent substantive features and represents remarkable progress, and the beneficial effects of its implementation are also obvious.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. It is obvious that those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.
Fig. 2 is a schematic block diagram of an apparatus of one embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the technical solution of the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort shall fall within the protection scope of the present invention.
As shown in fig. 1, the technical solution of the present invention provides a method for improving the performance of a storage system, comprising the following steps:
S1: querying the CPU configuration of a physical server;
S2: selecting the logical cores to bind according to the CPU configuration;
S3: creating a thread, and binding the created thread to the selected CPU logical cores;
S4: when the thread is scheduled, dispatching it to the bound CPU logical cores.
In some embodiments, the step of creating a thread and binding the created thread to the selected CPU logical cores in step S3 comprises:
S31: writing the selected logical cores into a configuration file;
S32: creating the thread, parsing the configuration file when the thread is created, and reading the binding switch and binding parameters from the configuration file;
S33: setting the hard affinity of the created thread to the configured logical cores, which realizes the binding. The created thread is thus bound to designated CPU logical cores. The logical cores selected in step S2 are written into the configuration file of librbd, specifically as follows:
Bind_mask_flag = true (or false)
Bind_mask0 = 0xF00 (logical cores 0 to 63)
Bind_mask1 = 0x0 (logical cores 64 to 127)
Bind_mask2 = 0x0 (logical cores 128 to 191)
Bind_mask3 = 0x0 (logical cores 192 to 255)
Each mask is a 64-bit unsigned number in which each bit represents one logical core ID. Example: decimal 3840 = hexadecimal 0xF00 = binary 111100000000; bits 8 to 11 are 1, so logical cores 8 to 11 are bound.
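As a sketch of how such a 64-bit bind mask maps to logical core IDs (the helper function below is illustrative and is not librbd's actual parser; only the bit-to-core convention comes from the description):

```python
def mask_to_cores(mask, base=0):
    """Return the logical-core IDs selected by a 64-bit bind mask.

    Each set bit i in `mask` selects logical core base + i, so
    Bind_mask0 covers cores 0-63, Bind_mask1 (base=64) covers 64-127, etc.
    """
    return [base + i for i in range(64) if mask >> i & 1]

# Example from the description: decimal 3840 = 0xF00 = binary 111100000000,
# bits 8-11 are set, so logical cores 8-11 are bound.
print(mask_to_cores(0xF00))         # [8, 9, 10, 11]
print(mask_to_cores(0x1, base=64))  # [64]
```

The same convention extends to Bind_mask2 and Bind_mask3 with base 128 and 192.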
In step S1, the step of querying the configuration of the physical server's CPU comprises:
acquiring the number of NUMA nodes, the number of CPUs per node (i.e., the number of physical cores), and the number of logical cores per CPU.
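On Linux, one way to obtain these counts is to parse the machine-readable output of `lscpu -p=CPU,CORE,SOCKET,NODE`. The sketch below is an illustrative assumption; the patent does not specify how the query is implemented, and the sample topology text is invented for the example:

```python
def parse_topology(lscpu_p):
    """Count (NUMA nodes, physical cores, logical cores) from
    `lscpu -p=CPU,CORE,SOCKET,NODE` style output."""
    nodes, phys_cores, logical = set(), set(), set()
    for line in lscpu_p.splitlines():
        if not line or line.startswith("#"):  # skip comments/blank lines
            continue
        cpu, core, socket, node = (int(x) for x in line.split(","))
        logical.add(cpu)
        phys_cores.add((socket, core))  # core IDs are unique per socket
        nodes.add(node)
    return len(nodes), len(phys_cores), len(logical)

# Illustrative 2-node machine: 4 physical cores, hyper-threading on
# (each physical core presents two logical cores, 8 in total).
SAMPLE = """# CPU,CORE,SOCKET,NODE
0,0,0,0
1,1,0,0
2,2,1,1
3,3,1,1
4,0,0,0
5,1,0,0
6,2,1,1
7,3,1,1"""
print(parse_topology(SAMPLE))  # (2, 4, 8)
```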
In some embodiments, the step of selecting the logical cores to bind according to the CPU configuration in step S2 comprises:
S21: judging according to the CPU configuration;
S22: if there are multiple NUMA nodes, keeping the CPUs hosting the bound logical cores within the same NUMA node;
S23: if there are multiple physical cores, judging whether hyper-threading is enabled; if not, executing step S24, and if so, executing step S25;
S24: binding adjacent logical cores;
S25: choosing bound logical cores distributed on the same physical core. When the operating system schedules the threads, it dispatches them to the designated CPU logical cores according to the binding policy, which reduces thread migration overhead and improves the performance of the storage system. It should be noted that what is selected here are the logical cores that librbd binds. If there are multiple NUMA nodes, the CPUs hosting the bound logical cores are kept within the same NUMA node. If there are multiple physical cores and hyper-threading is not enabled, each physical core is exactly one logical core, and adjacent logical cores are bound. If the physical cores have hyper-threading enabled, the bound logical cores are distributed on the same physical core; note that with hyper-threading enabled, each physical core presents two logical cores.
In step S4, it should be noted that when QEMU calls librbd to create an I/O thread, librbd parses the configuration file, reads the core-binding switch and core-binding parameters, and sets the hard affinity of the created thread to the configured logical cores. This reduces the cost of migrating threads among logical cores and improves the performance of the storage system.
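Setting hard affinity from user space can be sketched with Linux's sched_setaffinity call, exposed in Python as `os.sched_setaffinity`. This is an illustrative equivalent only: librbd itself is C++ and a real implementation would use something like `pthread_setaffinity_np` on the created thread, and the helper below is not part of the patent:

```python
import os

def bind_current_thread(cores):
    """Pin the calling thread to the given logical cores (hard affinity).

    Only cores the process is already permitted to run on can be used;
    pid 0 means "the calling thread" to the Linux kernel.
    """
    allowed = os.sched_getaffinity(0)
    target = set(cores) & allowed
    if not target:
        raise ValueError("none of %r is in the allowed set %r" % (cores, allowed))
    os.sched_setaffinity(0, target)  # Linux-only hard-affinity call
    return os.sched_getaffinity(0)

# Bind to one allowed core, then restore the original mask.
original = os.sched_getaffinity(0)
one = min(original)
print(bind_current_thread([one]) == {one})  # True
os.sched_setaffinity(0, original)
```

Once the affinity mask is set this way, the scheduler will only dispatch the thread to the bound cores, which is exactly the behavior step S4 relies on.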
As shown in fig. 2, the technical solution of the present invention provides a device for improving the performance of a storage system, comprising a query module, a selection module, a binding module and a processing module;
the query module is used for querying the configuration of the CPU of the physical server;
the selection module is used for selecting the bound logic core according to the configuration of the CPU;
the binding module is used for creating a thread and binding the created thread with the selected logic core of the CPU;
and the processing module is used for distributing the threads to the bound CPU logic cores when the threads are dispatched.
In some embodiments, the binding module includes a writing unit, a parsing reading unit, and a binding unit;
the write-in unit is used for writing the selected logic core into a configuration file;
the analysis reading unit is used for creating a thread, analyzing a configuration file when the thread is created, and reading a binding switch and binding parameters in the configuration file;
and the binding unit is used for setting the hard affinity of the created thread as the configured logic core to realize the binding of the logic core.
In some embodiments, the query module includes a node number obtaining unit, a physical core number obtaining unit, and a logical core number obtaining unit;
a node number obtaining unit for obtaining the number of NUMA nodes;
a physical core number obtaining unit configured to obtain the number of CPUs, i.e., the number of physical cores, of each node;
and the logic core number acquisition unit is used for acquiring the number of the logic cores of each CPU.
In some embodiments, the selection module comprises a judging unit and a selecting unit;
the judging unit is configured to judge according to the CPU configuration, and further, when there are multiple physical cores, to judge whether hyper-threading is enabled;
the selecting unit is configured to keep the CPUs hosting the bound logical cores within the same NUMA node when the judging unit determines that there are multiple NUMA nodes; to select and bind adjacent logical cores when the judging unit determines that hyper-threading is not enabled; and to select bound logical cores distributed on the same physical core when the judging unit determines that hyper-threading is enabled.
As shown in fig. 3, an embodiment of the present invention provides an electronic device, which may include a processor, a communication interface, a memory and a bus, where the processor, the communication interface and the memory communicate with each other via the bus. The bus may be used for information transfer between the electronic device and a sensor. The processor may call logic instructions in the memory to perform the following method: S1: querying the CPU configuration of a physical server; S2: selecting the logical cores to bind according to the CPU configuration; S3: creating a thread, and binding the created thread to the selected CPU logical cores; S4: when the thread is scheduled, dispatching it to the bound CPU logical cores.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method of the above method embodiments, for example, comprising: s1: inquiring the configuration of a CPU of a physical server; s2: selecting a bound logic core according to the configuration of the CPU; s3: creating a thread, and binding the created thread with a selected logic core of the CPU; s4: and when the thread is dispatched, the thread is distributed to the bound CPU logic core.
Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from its spirit and scope, and such modifications or substitutions, which a person skilled in the art could easily conceive within the technical scope disclosed herein, fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the appended claims.

Claims (10)

1. A method for improving the performance of a storage system, comprising the following steps:
querying the CPU configuration of a physical server;
selecting the logical cores to bind according to the CPU configuration;
creating a thread, and binding the created thread to the selected CPU logical cores;
and, when the thread is scheduled, dispatching it to the bound CPU logical cores.
2. The method for improving the performance of a storage system according to claim 1, wherein the step of creating a thread and binding the created thread to the selected CPU logical cores comprises:
writing the selected logical cores into a configuration file;
creating the thread, parsing the configuration file when the thread is created, and reading the binding switch and binding parameters from the configuration file;
setting the hard affinity of the created thread to the configured logical cores, which realizes the binding.
3. The method for improving the performance of a storage system according to claim 1, wherein the step of querying the configuration of the physical server's CPU comprises:
acquiring the number of NUMA nodes, the number of CPUs per node (i.e., the number of physical cores), and the number of logical cores per CPU.
4. The method for improving the performance of a storage system according to claim 3, wherein the step of selecting the logical cores to bind according to the CPU configuration comprises:
judging according to the CPU configuration;
if there are multiple NUMA nodes, keeping the CPUs hosting the bound logical cores within the same NUMA node;
if there are multiple physical cores, judging whether hyper-threading is enabled; if not, binding adjacent logical cores; if so, choosing bound logical cores distributed on the same physical core.
5. A device for improving the performance of a storage system is characterized by comprising an inquiry module, a selection module, a binding module and a processing module;
the query module is used for querying the configuration of the CPU of the physical server;
the selection module is used for selecting the bound logic core according to the configuration of the CPU;
the binding module is used for creating a thread and binding the created thread with the selected logic core of the CPU;
and the processing module is used for distributing the threads to the bound CPU logic cores when the threads are dispatched.
6. The apparatus for improving the performance of a storage system according to claim 5, wherein the binding module includes a writing unit, a parsing reading unit, and a binding unit;
the write-in unit is used for writing the selected logic core into a configuration file;
the analysis reading unit is used for creating a thread, analyzing a configuration file when the thread is created, and reading a binding switch and binding parameters in the configuration file;
and the binding unit is used for setting the hard affinity of the created thread as the configured logic core to realize the binding of the logic core.
7. The apparatus for improving performance of a storage system according to claim 6, wherein the query module includes a node number obtaining unit, a physical core number obtaining unit, and a logical core number obtaining unit;
a node number obtaining unit for obtaining the number of NUMA nodes;
a physical core number obtaining unit configured to obtain the number of CPUs, i.e., the number of physical cores, of each node;
and the logic core number acquisition unit is used for acquiring the number of the logic cores of each CPU.
8. The apparatus for improving the performance of a storage system according to claim 7, wherein the selection module comprises a judging unit and a selecting unit;
the judging unit is configured to judge according to the CPU configuration, and further, when there are multiple physical cores, to judge whether hyper-threading is enabled;
the selecting unit is configured to keep the CPUs hosting the bound logical cores within the same NUMA node when the judging unit determines that there are multiple NUMA nodes; to select and bind adjacent logical cores when the judging unit determines that hyper-threading is not enabled; and to select bound logical cores distributed on the same physical core when the judging unit determines that hyper-threading is enabled.
9. An electronic device is characterized by comprising a memory and a processor, wherein the memory and the processor are communicated with each other through a bus; the memory stores program instructions executable by the processor, the processor calling the program instructions to perform the method of improving performance of a storage system according to any one of claims 1 to 4.
10. A computer program product, characterized in that the computer program product comprises a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to carry out the method of improving the performance of a storage system according to any one of claims 1 to 4.
CN202011109038.4A 2020-10-16 2020-10-16 Method, device, equipment and product for improving performance of storage system Withdrawn CN112231102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011109038.4A CN112231102A (en) 2020-10-16 2020-10-16 Method, device, equipment and product for improving performance of storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011109038.4A CN112231102A (en) 2020-10-16 2020-10-16 Method, device, equipment and product for improving performance of storage system

Publications (1)

Publication Number Publication Date
CN112231102A true CN112231102A (en) 2021-01-15

Family

ID=74117675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011109038.4A Withdrawn CN112231102A (en) 2020-10-16 2020-10-16 Method, device, equipment and product for improving performance of storage system

Country Status (1)

Country Link
CN (1) CN112231102A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860530A (en) * 2021-01-27 2021-05-28 中山大学 Method for improving parallelization NumPy calculation performance by utilizing non-uniform memory access architecture characteristics
CN112860530B (en) * 2021-01-27 2022-09-27 中山大学 Method for improving parallelization NumPy calculation performance by utilizing non-uniform memory access architecture characteristics
CN113176950A (en) * 2021-04-09 2021-07-27 杭州迪普科技股份有限公司 Message processing method, device, equipment and computer readable storage medium
CN113176950B (en) * 2021-04-09 2023-10-27 杭州迪普科技股份有限公司 Message processing method, device, equipment and computer readable storage medium
CN113672373A (en) * 2021-08-30 2021-11-19 浙江大华技术股份有限公司 Thread binding method and device and electronic equipment
CN113821174A (en) * 2021-09-26 2021-12-21 迈普通信技术股份有限公司 Storage processing method, device, network card equipment and storage medium
CN113821174B (en) * 2021-09-26 2024-03-22 迈普通信技术股份有限公司 Storage processing method, storage processing device, network card equipment and storage medium
CN117971441A (en) * 2024-04-01 2024-05-03 之江实验室 High-performance thread model implementation method and device for all-in-one machine
CN117971441B (en) * 2024-04-01 2024-06-11 之江实验室 High-performance thread model implementation method and device for all-in-one machine

Similar Documents

Publication Publication Date Title
CN112231102A (en) Method, device, equipment and product for improving performance of storage system
CN113243005B (en) Performance-based hardware emulation in an on-demand network code execution system
US10373284B2 (en) Capacity reservation for virtualized graphics processing
US10416996B1 (en) System and method for translating affliction programming interfaces for cloud platforms
US9026630B2 (en) Managing resources in a distributed system using dynamic clusters
US10649790B1 (en) Multithreaded rendering for virtualized graphics processing
US9807152B2 (en) Distributed processing device and distributed processing system as well as distributed processing method
US10362097B1 (en) Processing an operation with a plurality of processing steps
WO2021022964A1 (en) Task processing method, device, and computer-readable storage medium based on multi-core system
AU2021104528A4 (en) Task scheduling and load balancing in cloud computing using firefly algorithm
CN114385351A (en) Cloud management platform load balancing performance optimization method, device, equipment and medium
Mengistu et al. Scalability in distributed multi-agent based simulations: The JADE case
Markthub et al. Using rCUDA to reduce GPU resource-assignment fragmentation caused by job scheduler
Liu et al. Scheduling parallel jobs using migration and consolidation in the cloud
CN116795492A (en) Resource scheduling method, device and equipment of cloud platform and readable storage medium
CN112912849B (en) Graph data-based calculation operation scheduling method, system, computer readable medium and equipment
CN116302307A (en) Multi-virtual machine migration method, device, equipment and medium
CN113821174B (en) Storage processing method, storage processing device, network card equipment and storage medium
Satoh 5G-enabled edge computing for MapReduce-based data pre-processing
CN114610485A (en) Resource processing system and method
CN114253709A (en) Load scheduling method and system
CN112764837A (en) Data reporting method, device, storage medium and terminal
CN112463748A (en) Storage system file lock identification method, system, terminal and storage medium
KR101542605B1 (en) Parallel processing apparatus and processing apparatus for semantic heterogeneity of ontology matching
CN113282405B (en) Load adjustment optimization method and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 2021-01-15