CN115878335A - Lock transmission method and related device

Info

Publication number: CN115878335A
Authority: CN (China)
Prior art keywords: lock, locks, thread, node, release
Legal status: Pending (assumed, not a legal conclusion)
Application number: CN202111136893.9A
Other languages: Chinese (zh)
Inventors: 陈更, 付明, 雷继棠
Assignee (current and original): Huawei Technologies Co Ltd

Landscapes

  • Multi Processors (AREA)

Abstract

The application discloses a lock transmission method applied to a non-uniform memory access (NUMA) architecture. The method comprises the following steps: acquiring a lock release request from a first thread, where the lock release request requests release of a plurality of locks controlling access to a shared resource, each of the plurality of locks corresponding to a node of one level in the NUMA architecture; if there is a thread waiting to preempt the first lock, releasing the first lock and retaining the holding relationships among the plurality of locks, so that the plurality of locks are transferred to the thread waiting to preempt the first lock, the first lock being the lock corresponding to the lowest-level node among the plurality of locks; if there is no thread waiting to preempt the first lock, releasing the first lock and a second lock held by the first lock, so that the second lock is transferred to a thread waiting to preempt the second lock. The access condition of the shared resource is that a thread holds a lock corresponding to a node of each level in the NUMA architecture. Based on this scheme, cross-NUMA-domain lock transfer can be avoided as much as possible, and lock performance is improved.

Description

Lock transmission method and related device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a lock delivery method and a related device.
Background
Non-uniform memory access (NUMA) is a type of multiprocessor computer memory architecture. Computer systems employing NUMA architectures typically include a plurality of NUMA domain nodes, each including one or more processors and physical memory, and the physical memory in each NUMA domain node is managed by a separate memory controller. A processor accesses physical memory under its own NUMA domain node faster than physical memory under other NUMA domain nodes. Briefly, in a computer system that deploys a NUMA architecture, the speed at which a processor accesses memory depends on the location of the memory relative to the processor.
In order to prevent multiple threads from accessing the shared resource at the same time, when multiple threads in a computer system need to access the same shared resource, each thread needs to contend for a spinlock corresponding to the shared resource before accessing the shared resource. At any time, a spinlock can only be held by one thread at most, and only the thread holding the spinlock corresponding to the shared resource can access the shared resource. Currently, the order in which a thread acquires spin locks is determined mainly based on the order in which the thread applies for spin locks.
However, when a spin lock is passed among multiple threads purely in the order in which they applied for it, the spin lock is frequently transferred across NUMA domains. This causes a large number of cross-NUMA-domain memory accesses, lowers the transfer efficiency of the spin lock, and degrades the threads' access to the shared resource.
Disclosure of Invention
The application provides a lock transfer method which can avoid transferring locks across NUMA domains as much as possible and improve the performance of the locks.
The application provides a lock transfer method, which is applied to a NUMA architecture. The NUMA architecture is a tree structure that includes a plurality of levels, each level in the tree structure including one or more nodes. Each node in the NUMA architecture may be referred to as a NUMA domain: the higher a node's level, the larger its corresponding NUMA domain; the lower the level, the smaller the corresponding NUMA domain.
The lock transfer method comprises the following steps: acquiring a lock release request from a first thread, where the lock release request requests release of a plurality of locks controlling access to a shared resource, each lock in the plurality of locks corresponds to a node of one level in the NUMA architecture, and the plurality of nodes corresponding to the plurality of locks have connection relationships with one another. Among the plurality of locks, the locks corresponding to two nodes in adjacent levels have a holding relationship: the lock corresponding to the node in the lower level holds the lock corresponding to the node in the higher level. Because the lock corresponding to the node of each level holds the lock corresponding to the node of the level above, a thread holds the locks corresponding to the nodes of all levels in the NUMA architecture by holding the lock corresponding to the node of the lowest level.
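For illustration, this layered holding relationship might be represented roughly as follows. This is a hedged sketch in C; the type and field names (numa_lock, parent, holds_parent, waiters) are invented for this example and are not taken from the patent.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Hypothetical per-node lock in the NUMA tree. Each node of each level
     * owns one such lock; "parent" points to the lock of the node one level
     * up (NULL at the root, i.e., the system level). */
    struct numa_lock {
        atomic_flag locked;        /* the spin lock itself                  */
        struct numa_lock *parent;  /* lock of the node one level higher     */
        bool holds_parent;         /* holding relation: this lock currently
                                      holds the lock of its parent node     */
        atomic_int waiters;        /* threads spinning to preempt this lock */
    };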
If there is a thread waiting to preempt the first lock, i.e., a thread located under the same NUMA domain node as the first thread (a thread belonging to the same NUMA domain as the first thread), the first lock is released according to the lock release request and the holding relationships among the plurality of locks are retained, so that the plurality of locks are transferred to the thread waiting to preempt the first lock; the first lock is the lock corresponding to the lowest-level node among the plurality of locks. When the holding relationships of the plurality of locks are preserved, the locks corresponding to lower-level nodes still hold the locks corresponding to higher-level nodes, i.e., the locks corresponding to smaller NUMA domains hold the locks corresponding to larger NUMA domains. In this way, after another thread preempts the first lock released by the first thread, it obtains the locks corresponding to the nodes of every level in the NUMA architecture, and thereby obtains the access right to the shared resource.
And if the thread waiting for preempting the first lock does not exist, releasing the first lock and a second lock held by the first lock according to the lock release request so as to transmit the second lock to the thread waiting for preempting the second lock.
And the access condition of the shared resource is that a thread holds a lock corresponding to a node of each level under the NUMA architecture.
In this scheme, by setting a corresponding lock for each node in the multi-level NUMA architecture, a thread obtains permission to access the shared resource corresponding to the locks only when it obtains the locks corresponding to the nodes of every level in the NUMA architecture. During lock transfer, the lock of the bottommost node is preferentially chosen for release, so that other threads under the same NUMA domain node acquire the lock first, which avoids cross-NUMA-domain lock transfer as much as possible and improves lock performance.
In a possible implementation manner, if there is no thread waiting to preempt the first lock, releasing the first lock and a second lock held by the first lock according to the lock release request, and reserving the holding relationship among the remaining locks in the plurality of locks, so as to transfer the remaining locks in the plurality of locks to the thread waiting to preempt the second lock; wherein remaining ones of the plurality of locks are others of the plurality of locks other than the first lock.
That is, the processor releases the holding relationship between the first thread and the first lock and the holding relationship between the first lock and the second lock according to the lock release request from the first thread, and the other holding relationships among the plurality of locks are not released. In this way, since the thread preempts the locks in the order from the lower level to the upper level, for the thread waiting to preempt the second lock, after the thread preempts the second lock, the second lock and the other locks (i.e., the remaining locks except the first lock) in the levels above the second lock can be obtained, and thus the locks corresponding to the nodes in each level under the NUMA architecture are obtained.
In this scheme, while the first lock and the second lock are being released, the holding relationships among the other locks are retained, so that other threads located in a NUMA domain adjacent to the first thread can preferentially preempt the locks of the nodes of each level; the locks are thereby prevented from being transferred across large NUMA domains, improving lock performance as much as possible.
In a possible implementation manner, in the process of releasing the second lock held by the first lock, if there is no thread waiting to preempt the second lock, the second lock and the locks held by the second lock are released, so that the other locks, except the first lock and the second lock, in the plurality of locks are transferred to the thread waiting to preempt the lock held by the second lock.
In short, when releasing a plurality of locks held by a thread, the locks to be released are determined in order from a lower hierarchy to a higher hierarchy, and the determined locks to be released are released one by one. When releasing the lock of any one hierarchy node, if a thread waiting for preempting the lock of the node exists, only releasing the lock of the node and reserving the holding relationship among the rest locks; if there are no threads waiting to preempt the lock of the node, then the lock of the node is released and the release of the lock of the node at the previous level of the node is triggered.
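A minimal sketch of this bottom-up release cascade in C, reusing the hypothetical struct numa_lock above (the recursive call into the parent's release mirrors the "release and trigger the previous level" step; it is an illustration under those assumptions, not the patent's implementation):

    #include <stdatomic.h>

    /* struct numa_lock as sketched earlier. Release starts from the
     * lowest-level lock and stops at the first level that has waiters,
     * so the remaining holding relations stay intact. */
    void numa_lock_release(struct numa_lock *lk)
    {
        if (atomic_load(&lk->waiters) == 0 && lk->parent != NULL) {
            /* Nobody under this node wants the lock: also give up the
             * lock one level up so a sibling NUMA domain can win it. */
            lk->holds_parent = false;
            numa_lock_release(lk->parent);
        }
        /* Unlock this level; a waiter that grabs it inherits whatever
         * higher-level locks are still held (holds_parent == true). */
        atomic_flag_clear(&lk->locked);
    }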
According to the scheme, the locks are sequentially released from the low level to the high level, and when a thread waiting for preemption of the lock of a certain node exists, the locks of other level nodes above the node are reserved, so that each transmission of the locks is transmitted in the range of the smallest NUMA domain, the long-distance transmission of the locks is avoided, and the lock performance is improved.
In a possible implementation manner, in order to avoid that threads under other NUMA domain nodes wait for too long time, a corresponding mechanism may be set to avoid that a lock is transmitted for too long time under the same NUMA domain node.
Specifically, if there is a thread waiting for preemption of a first lock and the number of times of release of the first lock is less than a first threshold, the first lock is released according to the lock release request and the holding relationship among the locks is reserved, where the number of times of release of the first lock is used to indicate the number of times of transfer of the first lock between threads under a node corresponding to the first lock.
And if the thread waiting for preempting the first lock does not exist, or the releasing times of the first lock are larger than or equal to the first threshold value, releasing the first lock and a second lock held by the first lock according to the lock releasing request.
In this scheme, by recording the number of releases of the lock corresponding to each node, the number of lock transfers under a given node can be determined. When the lock has been passed too many times under the same node, the locks of that node and of the node one level up are released, which prevents threads under other nodes in the NUMA domain from waiting too long.
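The count-based hand-over limit could be sketched as a small variant of the release above (hedged: the release_count field and the RELEASE_LIMIT constant are invented for illustration, assuming struct numa_lock additionally carries an int release_count):

    #define RELEASE_LIMIT 64   /* first threshold: max hand-overs per node */

    void numa_lock_release_bounded(struct numa_lock *lk)
    {
        int local = atomic_load(&lk->waiters) > 0 &&
                    lk->release_count < RELEASE_LIMIT;
        if (local) {
            lk->release_count++;      /* one more transfer inside this node */
        } else if (lk->parent != NULL) {
            lk->release_count = 0;    /* lock leaves this domain: reset     */
            lk->holds_parent = false;
            numa_lock_release_bounded(lk->parent);
        }
        atomic_flag_clear(&lk->locked);
    }

The time-based variant described next would replace the counter check with a comparison of the lock-holding duration against the second threshold.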
In a possible implementation manner, if there is a thread waiting to preempt the first lock and the time for which the first lock has held the second lock is less than a second threshold, the first lock is released according to the lock release request and the holding relationships among the locks are retained, where the time for which the first lock holds the second lock indicates how long the locks have been passed among the threads under the node corresponding to the first lock.
And if the thread waiting for preempting the first lock does not exist, or the time for the first lock to hold the second lock is greater than or equal to a second threshold value, releasing the first lock and the second lock held by the first lock according to the lock release request.
In this scheme, by recording how long the lock has stayed under a given node, the locks of that node and of the node one level up are released when the lock has been passed under the same node for too long, which prevents threads under other nodes in the NUMA domain from waiting too long.
In one possible implementation, the method further includes: and acquiring a lock application request from a second thread, wherein the lock application request is used for requesting to apply for controlling the lock for accessing the shared resource. Wherein, the lock application request from the second thread may carry the identifier of the second thread to indicate the identity of the applicant.
And determining a plurality of target locks corresponding to the second thread according to the lock application request, wherein the target locks respectively correspond to nodes of each level in the NUMA architecture, the nodes corresponding to adjacent locks in the target locks are located in adjacent levels, and a connection relationship exists between the nodes corresponding to the adjacent locks in the target locks.
The plurality of target locks are then preempted in sequence, from the lowest to the highest level of the nodes to which they correspond, until the second thread has successfully preempted all of the target locks. Specifically, after the second thread successfully preempts the target lock of a certain level, it may be checked whether that lock already holds the lock of the level above. If it does, the second thread can be considered to have successfully preempted the target locks of all levels, so preemption can stop; if it does not, the second thread continues by preempting the lock of the level above.
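The bottom-up acquisition loop might look as follows (a hedged sketch under the same struct numa_lock assumptions; the early exit on holds_parent is the "stop once the preempted lock already holds the level above" check):

    /* Acquire from the leaf up. If the lock we just won still holds its
     * parent (a previous owner left the holding relation intact), every
     * level above is implicitly ours and the climb can stop. */
    void numa_lock_acquire(struct numa_lock *leaf)
    {
        for (struct numa_lock *lk = leaf; lk != NULL; lk = lk->parent) {
            atomic_fetch_add(&lk->waiters, 1);
            while (atomic_flag_test_and_set(&lk->locked))
                ;                              /* spin */
            atomic_fetch_sub(&lk->waiters, 1);
            if (lk->holds_parent)
                break;       /* higher levels already held: done */
            /* Mark the relation before climbing; nothing is released
             * until every level is held. */
            lk->holds_parent = true;
        }
    }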
In one possible implementation, the release of the lock may be implemented by way of a function call. Specifically, a first release function is called according to the lock release request, where the first release function is the lock release function corresponding to the first lock; the first release function is executed, and, because the number of threads waiting to preempt the first lock is not 0, the first lock is released and the holding relationships among the locks are retained.
In another possible implementation, a first release function is called according to the lock release request, where the first release function is the lock release function corresponding to the first lock; because the number of threads waiting to preempt the first lock is 0, a second release function indicated in the first release function is called, where the second release function is the lock release function corresponding to the second lock; the second release function is executed to release the second lock held by the first lock; and the first release function is executed to release the first lock.
In one possible implementation, the plurality of locks are all spin locks.
In one possible implementation, the plurality of locks held by the first thread may all be locks of the same type, for example, the plurality of locks held by the first thread are all MCS locks or Ticket locks. Or, the plurality of locks include locks of different types, for example, the plurality of locks include MCS lock, ticket lock, and CLH lock.
In one possible implementation, the type of each lock in the plurality of locks is determined according to the operational performance of different locks in each level of the NUMA architecture. That is, the type of each lock in the plurality of locks held by the first thread is determined according to the performance of different locks in each level of the NUMA architecture. Moreover, the NUMA domains within the same level of a NUMA architecture are equivalent, so all nodes in a given level use the same type of lock.
Because the number of threads under nodes of different hierarchies in the NUMA architecture can be different, the use performance of different types of locks in each hierarchy is determined through testing, and the lock type with the highest performance is adopted in each hierarchy in the NUMA architecture, so that the lock transfer performance in the NUMA architecture can be improved.
In one possible implementation, the plurality of levels in the NUMA architecture include a system level, a NUMA node level, a cache group level, and a physical core level; or, the plurality of levels in the NUMA architecture include a system level, a socket level, a NUMA node level, and a cache group level.
A second aspect of the present application provides a lock delivery apparatus, where the apparatus is applied to a non-uniform memory access NUMA architecture, where the NUMA architecture is a tree structure including multiple hierarchies, and each hierarchy in the tree structure includes one or more nodes, and the apparatus includes: an obtaining unit, configured to obtain a lock release request from a first thread, where the lock release request is used to request release of a plurality of locks controlling access to a shared resource, and each lock in the plurality of locks corresponds to a node in each hierarchy in the NUMA architecture, where, in the plurality of locks, locks corresponding to two nodes adjacent to each hierarchy have a holding relationship therebetween, and a lock corresponding to a node in a lower hierarchy holds a lock corresponding to a node in an upper hierarchy; the processing unit is used for releasing the first lock according to the lock release request and reserving holding relations among the plurality of locks if a thread waiting for preempting the first lock exists, so that the plurality of locks are transmitted to the thread waiting for preempting the first lock, and the first lock is a lock corresponding to a lowest level node in the plurality of locks; the processing unit is further configured to release the first lock and a second lock held by the first lock according to the lock release request if there is no thread waiting to preempt the first lock, so as to transfer the second lock to a thread waiting to preempt the second lock; and the access condition of the shared resource is that a thread holds a lock corresponding to a node of each level under the NUMA architecture.
In a possible implementation manner, the processing unit is specifically configured to: releasing the first lock and a second lock held by the first lock according to the lock release request, and reserving holding relations among the rest locks in the plurality of locks so as to transfer the rest locks in the plurality of locks to a thread waiting for preempting the second lock; wherein the remaining ones of the plurality of locks are others of the plurality of locks other than the first lock.
In a possible implementation manner, the processing unit is specifically configured to: in the process of releasing the second lock held by the first lock, if there is no thread waiting to preempt the second lock, releasing the second lock and the locks held by the second lock, so as to transfer the other locks except the first lock and the second lock in the plurality of locks to the thread waiting to preempt the lock held by the second lock.
In a possible implementation manner, the processing unit is specifically configured to: if threads waiting for preemption of a first lock exist and the release times of the first lock are smaller than a first threshold value, releasing the first lock according to the lock release request and reserving the holding relationship among the locks, wherein the release times of the first lock are used for indicating the transfer times of the first lock among the threads under the node corresponding to the first lock; and if the thread waiting for preempting the first lock does not exist, or the releasing times of the first lock are larger than or equal to the first threshold value, releasing the first lock and a second lock held by the first lock according to the lock releasing request.
In a possible implementation manner, the processing unit is specifically configured to: if there is a thread waiting to preempt the first lock and the time for which the first lock has held the second lock is less than a second threshold, release the first lock according to the lock release request and retain the holding relationships among the locks, where the time for which the first lock holds the second lock indicates how long the locks have been passed among the threads under the node corresponding to the first lock; and if there is no thread waiting to preempt the first lock, or the time for which the first lock has held the second lock is greater than or equal to the second threshold, release the first lock and the second lock held by the first lock according to the lock release request.
In a possible implementation manner, the obtaining unit is further configured to obtain a lock application request from a second thread, where the lock application request is used to request to apply for a lock that controls access to the shared resource; the processing unit is further configured to determine, according to the lock application request, a plurality of target locks corresponding to the second thread, where the plurality of target locks respectively correspond to nodes of each hierarchy in the NUMA architecture, nodes corresponding to adjacent locks in the plurality of target locks are located in adjacent hierarchies, and nodes corresponding to adjacent locks in the plurality of target locks have a connection relationship; and the processing unit is further configured to preempt the plurality of locks in sequence according to a sequence from low to high of a hierarchy in which nodes corresponding to the locks are located until the second thread preempts the plurality of target locks successfully.
In a possible implementation manner, the processing unit is specifically configured to: calling a first release function according to the lock release request, wherein the first release function is a lock release function corresponding to the first lock; and executing the first release function, and releasing the first lock and reserving the holding relationship among the locks according to the condition that the number of threads waiting to preempt the first lock is not 0.
In a possible implementation manner, the processing unit is specifically configured to: calling a first release function according to the lock release request, wherein the first release function is a lock release function corresponding to the first lock; calling a second release function indicated in the first release function according to the condition that the number of threads waiting for preemption of the first lock is 0, wherein the second release function is a lock release function corresponding to the second lock; executing the second release function to release a second lock held by the first lock; executing the first release function to release the first lock.
In one possible implementation, the plurality of locks are all spin locks.
In one possible implementation, the plurality of locks are all locks of the same type; or, the plurality of locks comprise locks of different types.
In one possible implementation, the type of each lock in the plurality of locks is determined according to an operational performance of a different lock in each tier of the NUMA architecture.
In one possible implementation, the plurality of levels in the NUMA architecture include a system level, a NUMA node level, a cache group level, and a physical core level; or, the plurality of levels in the NUMA architecture include a system level, a socket level, a NUMA node level, and a cache group level.
A third aspect of the present application provides an electronic device, comprising: a memory and a processor; the memory stores code, the processor is configured to execute the code, and when executed, the electronic device performs the method as any one of the implementation manners of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform the method of any one of the implementations of the first aspect.
A fifth aspect of the present application provides a computer program product which, when run on a computer, causes the computer to perform the method as any one of the implementations in the first aspect.
A sixth aspect of the present application provides a chip comprising one or more processors. Some or all of the processors are configured to read and execute a computer program stored in a memory, so as to perform the method in any possible implementation of any of the above aspects.
Optionally, the chip may include a memory, and the processor may be connected to the memory through a circuit or a wire. Optionally, the chip further comprises a communication interface, and the processor is connected to the communication interface. The communication interface is used for receiving data and/or information to be processed; the processor acquires the data and/or information from the communication interface, processes the data and/or information, and outputs a processing result through the communication interface. The communication interface may be an input/output interface. The method provided by the application may be implemented by one chip, or by a plurality of chips in cooperation.
Drawings
Fig. 1 is a schematic diagram of a plurality of nodes in a NUMA architecture according to an embodiment of the present application;
FIG. 2 is a NUMA structure under different hardware architectures provided by embodiments of the present application;
FIG. 3 is a schematic diagram of synchronous speed-up ratios of threads in different NUMA levels provided by an embodiment of the present application;
fig. 4 is a schematic diagram of a thread queuing and lock grabbing in a CPU according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a lock transfer according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating another lock transfer provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device 101 according to an embodiment of the present disclosure;
fig. 8 is a schematic diagram of an application architecture of a lock delivery method according to an embodiment of the present application;
FIG. 9 is a diagram illustrating a thread applying for a lock according to an embodiment of the present application;
fig. 10 is a flowchart illustrating a lock delivery method 1000 according to an embodiment of the present application;
FIG. 11 is a schematic illustration of a lock release and delivery provided by an embodiment of the present application;
FIG. 12 is a schematic illustration comparing a lock delivery provided by embodiments of the present application;
fig. 13 is a schematic diagram of an array structure of a node according to an embodiment of the present disclosure;
FIG. 14 is a system architecture diagram of LevelDB according to an embodiment of the present application;
FIG. 15 is a comparison of lock transfer performance provided by embodiments of the present application;
fig. 16 is a schematic structural diagram of a lock delivery apparatus 1600 according to an embodiment of the present application;
fig. 17 is a schematic structural diagram of a computer-readable storage medium 1700 provided in an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and claims of this application and in the foregoing drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
For the sake of understanding, technical terms related to the embodiments of the present application will be described below.
Moore's law: a law proposed by Gordon Moore, one of the founders of Intel. The content of Moore's law is: the number of transistors that can be accommodated on an integrated circuit doubles approximately every two years.
Hyper-threading: a parallel computing technique. Hyper-threading presents a single physical core to the operating system as two logical cores, so that a single processor can use thread-level parallel computation and remain compatible with multi-threaded operating systems and software. The two logical cores share the same first level cache (L1 cache).
Cache: a cache is a small but high speed memory located between the CPU and main memory.
Package: also called a socket, a slot on the computer motherboard that fixes the CPU and conducts electrical signals. With the development of multi-core technology, multiple CPUs are generally packaged together, and such an assembly is called a package.
Spin lock (spin lock): a lock for protecting a shared resource. Specifically, a spin lock ensures that only one thread or process entity can access the shared resource at a time; at most one execution unit can hold a given spin lock at any moment. If a thread requests a spin lock that is already held by another thread, it loops, repeatedly attempting to acquire the spin lock while waiting for the other thread to release it. If the spin lock is not held by any other thread, the requesting thread acquires it directly and can then access the shared resource protected by the spin lock.
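As a reference point for the lock types described below, a minimal test-and-set spin lock in C might look like this (illustrative only; not code from the patent):

    #include <stdatomic.h>

    typedef struct { atomic_flag f; } spinlock_t;

    void spin_lock(spinlock_t *l)
    {
        while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
            ;   /* busy-wait (spin) until the holder releases the lock */
    }

    void spin_unlock(spinlock_t *l)
    {
        atomic_flag_clear_explicit(&l->f, memory_order_release);
    }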
In particular, locking in an operating system is used to ensure data consistency. When multiple threads access a shared resource, a lock is needed to ensure that the shared resource remains consistent after the accesses. For example, suppose process A and process B write the same file at the same time: process A writes character a and process B writes character b. Without a lock, it is unknown which process executes first, so when the two processes are finished the content of the file is uncertain: it may be character a (process B wrote character b first, then process A wrote character a, overwriting b), or it may be character b (process A wrote character a first, then process B wrote character b, overwriting a). Therefore, to keep the result of process A and process B consistent, lock and unlock operations are required. When the first process executes, it locks the file, so that when the second process attempts its write it is blocked and suspended because the file is locked by the first process. Only after the first process finishes and unlocks the file can the second process write it; this fixes the order of the two processes' operations on the file and prevents data inconsistency.
Spin locks come in many different types, such as the MCS lock, Ticket lock, CLH lock, and Hemlock. These types of spin lock are described separately below.
MCS lock: under the MCS lock mechanism, each thread has a queue node (node) of its own. When applying for the lock, a thread adds its node to the end of the queue and repeatedly checks whether the local variable in its own node has been set to 1, i.e., whether it has acquired the lock. When releasing the lock, the thread clears the flag in its own node and sets the local variable in the next node of the queue to 1. The advantage of the MCS lock is that each thread spins only on its own local variable, which reduces shared-data traffic between caches and relieves bus pressure.
Ticket lock: under the Ticket lock mechanism, each thread applying to queue obtains a queuing number (ticket). Each thread repeatedly checks whether its own ticket equals the global now-serving number, and when they are equal, the thread has acquired the lock. When a thread releases the lock, it increments the global number by 1, so that it becomes the ticket of the next thread allowed to acquire the lock; that thread then finds during its spin check that it has acquired the spin lock.
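A hedged C sketch of a Ticket lock (the field names next_ticket and now_serving are conventional, not taken from the patent):

    #include <stdatomic.h>

    typedef struct {
        atomic_uint next_ticket;   /* number handed to the next arriving thread */
        atomic_uint now_serving;   /* ticket currently allowed to hold the lock */
    } ticket_lock_t;

    void ticket_lock(ticket_lock_t *l)
    {
        unsigned me = atomic_fetch_add(&l->next_ticket, 1);  /* take a number */
        while (atomic_load(&l->now_serving) != me)
            ;                       /* spin until our number is called */
    }

    void ticket_unlock(ticket_lock_t *l)
    {
        atomic_fetch_add(&l->now_serving, 1);  /* call the next number */
    }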
CLH lock: the CLH lock is similar to the MCS lock and is likewise a local-spin lock. Unlike the MCS lock, the CLH lock has no explicit linked list connecting the nodes; instead, each node has a pointer to its predecessor node, forming an implicit linked list. Each thread repeatedly checks the local variable in its predecessor node, and once that variable changes, the thread acquires the lock.
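A rough CLH sketch (illustrative; it assumes tail is initialized to point at a dummy node whose locked flag is false, and the implicit list exists only through the pred pointers):

    #include <stdatomic.h>
    #include <stdbool.h>

    struct clh_node { _Atomic bool locked; };

    struct clh_lock { _Atomic(struct clh_node *) tail; };

    /* Returns the predecessor node, which the caller recycles as its own
     * node for the next acquisition (the classic CLH node hand-off). */
    struct clh_node *clh_lock_acquire(struct clh_lock *l, struct clh_node *me)
    {
        atomic_store(&me->locked, true);
        struct clh_node *pred = atomic_exchange(&l->tail, me);
        while (atomic_load(&pred->locked))
            ;                        /* spin on the predecessor's flag */
        return pred;
    }

    void clh_lock_release(struct clh_node *me)
    {
        atomic_store(&me->locked, false);   /* the successor sees the change */
    }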
Hemlock: hemlock is similar to CLH lock and is a local spin lock. Unlike CLH lock, hemlock requires a subsequent node (i.e., the node corresponding to the next thread to acquire the lock) to reset the local variable flag of the lock owner node when releasing the lock.
Mutex: mutual exclusion lock, a mechanism that prevents two threads from reading and writing the same common resource.
With the failure of Moore's law, computer architectures have evolved from single-core processing architectures to multi-core processing architectures. As the number of CPU cores gradually increases, the memory controller becomes a performance bottleneck. To reduce memory controller and bus pressure, modern computer architectures employ NUMA architectures. Computer systems employing NUMA architectures typically include a plurality of NUMA domain nodes, each NUMA domain node including one or more CPUs and physical memory, and the physical memory in each NUMA domain node is managed by a separate memory controller. The NUMA domain nodes are connected through a bus. Therefore, a CPU accesses the physical memory under the same NUMA domain node (i.e., the local node) faster than the physical memory under other NUMA domain nodes. Briefly, in a computer system that deploys a NUMA architecture, the speed at which a processor accesses memory depends on the location of the memory relative to the processor.
Referring to fig. 1, fig. 1 is a schematic diagram of a plurality of nodes in a NUMA architecture according to an embodiment of the present application. As shown in fig. 1, under the NUMA architecture, a total of 4 nodes, node0 to node 3, are included, and each node includes a CPU and a memory. For the CPU0, the speed of accessing the memory 0 under the same node (i.e., node 0) by the CPU0 is fast, and the speed of accessing the memories under other nodes (i.e., memory 1, memory 2, and memory 3) by the CPU0 is slow.
Currently, in shared memory systems of multi-core architectures, there are typically multiple layers of NUMA structures. Illustratively, referring to fig. 2, fig. 2 is a NUMA structure under different hardware architectures provided by the embodiments of the present application. As shown in FIG. 2, the system architecture on the left is the NUMA system architecture under the AMD Epyc X86_64 processor, and the system architecture on the right is the NUMA system architecture under the Kunpeng920 ARMv8 processor.
In a NUMA system architecture under an AMD Epyc X86_64 processor, three NUMA tiers are included. Because the processor supports hyper-threading, a first NUMA level (NUMA level 1) in the NUMA system architecture is a physical core (core) level, a second NUMA level (NUMA level 2) is a cache group (cache group) level, and a third NUMA level (NUMA level 3) is a NUMA node (NUMA node) level. The physical core hierarchy includes two hyper-threads, which share a first level cache (L1 cache) in the same physical core. Two hyper-threads in the same physical core share data through the L1 cache, and the access speed is very high; however, the speed of the hyper-threading is slowed down by performing memory access across the physical cores.
In the architecture of the AMD Epyc X86_64 processor, 3 physical cores share the same set of level three caches (L3 caches), so the 3 physical cores that share the same L3 cache set may be called a cache group. Data shared within a cache group is accessed quickly by any of its 3 physical cores, while access across cache groups is slow, so the cache groups form the second NUMA level, i.e., the cache group level.
Furthermore, in the architecture of the AMD Epyc X86_64 processor, multiple cache groups constitute one NUMA node, and thus NUMA nodes constitute the third NUMA hierarchy, i.e. NUMA node hierarchy.
In the architecture of the Kunpeng920 ARMv8 processor, three NUMA levels are likewise included. Under this architecture the processor does not support hyper-threading, and every four physical cores share one cache group. Each group of four physical cores therefore constitutes a NUMA domain in the first NUMA level (the cache group level). Further, every 8 cache groups share one NUMA node, so the second NUMA level is the NUMA node level. Finally, every two NUMA nodes share one package, so the third NUMA level is the package level.
Generally, the individual shared memory cells in each NUMA hierarchy may be referred to as NUMA domains.
As shown in fig. 2, each NUMA level has a corresponding NUMA domain; the higher the NUMA level, the larger the corresponding NUMA domain, and the lower the NUMA level, the smaller the corresponding NUMA domain. For example, the first NUMA level (NUMA level 1) is lower than the second NUMA level (NUMA level 2), and the NUMA domains corresponding to the first NUMA level are smaller than those corresponding to the second NUMA level; similarly, the second NUMA level (NUMA level 2) is lower than the third NUMA level (NUMA level 3), and the NUMA domains corresponding to the second NUMA level are smaller than those corresponding to the third NUMA level.
Referring to fig. 3, fig. 3 is a schematic diagram of synchronous speed-up ratios of threads in different NUMA hierarchies according to an embodiment of the present application. As shown in FIG. 3, assume that the synchronous throughput rate of two threads that do not share any NUMA domain is 1.0. On the AMD Epyc X86_64 platform, the synchronous throughput rate of two threads that belong to the same NUMA node but not the same cache group is 1.55; of two threads that belong to the same cache group but not the same core, 9.07; and of two hyper-threads belonging to the same core, 12.21.
Accordingly, in Kunpeng920 ARMv8, the synchronous throughput rate of two threads belonging to the same package and different NUMA nodes is 1.76. The synchronous throughput rate of two threads which belong to the same NUMA node but not the same cache group is 2.98. The synchronous throughput rate of two threads belonging to the same cache group is 7.03.
As can be seen in fig. 3, the synchronous throughput of threads within the same NUMA domain is higher than that of threads within different NUMA domains. Therefore, to improve the performance of a NUMA system, it is necessary to minimize migration of data across NUMA domains, that is, to control data access in the same NUMA domain as much as possible.
For ease of understanding, the impact on lock delivery performance when delivering locks across NUMA domains is described in detail below in conjunction with specific examples.
Taking the MCS lock as an example, under the MCS lock mechanism each thread has a local variable node. A thread's local variable node comprises a locked variable and a next pointer: locked indicates whether the thread has acquired the lock, and the next pointer points to the node of the next thread contending for the lock. In addition, all threads share a global variable, the pointer tail, which points to the last element in the lock wait queue.
The phase in which a thread applies for the lock mainly comprises the following steps a to d.
Step a: set the next pointer in the thread's local variable node to NULL, i.e., no other lock-contending object follows this thread. The reason is that the newly added thread queues at the tail of the lock wait queue, with no other waiting threads behind it.
Step b: set the locked variable in the thread's local variable node to 1, indicating that the thread does not yet hold the lock.
Step c: atomically set the global pointer tail to the current node, i.e., make the current thread the last element of the lock wait queue (the swap returns the previous tail node, whose next pointer is then set to the current node, so that the predecessor can find its successor).
Step d: the thread spins until the locked variable in its local variable node becomes 0.
In the phase of releasing the lock, the releasing thread sets the locked variable pointed to by the next pointer in its local variable node to 0, that is, it sets the locked variable in the next thread's local variable node to 0. The next thread, on examining the locked variable in its own node, then finds that it has acquired the lock.
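Putting steps a to d and the release path together, a textbook MCS sketch in C might read as follows (hedged: this is the standard form of MCS, including the predecessor link implied by step c, and is not necessarily the patent's exact code):

    #include <stdatomic.h>
    #include <stddef.h>

    struct mcs_node {
        _Atomic int locked;                  /* 1 = still waiting, 0 = lock owned */
        _Atomic(struct mcs_node *) next;     /* next thread queued behind us      */
    };

    struct mcs_lock { _Atomic(struct mcs_node *) tail; };

    void mcs_acquire(struct mcs_lock *l, struct mcs_node *me)
    {
        atomic_store(&me->next, NULL);                          /* step a */
        struct mcs_node *pred = atomic_exchange(&l->tail, me);  /* step c */
        if (pred != NULL) {
            atomic_store(&me->locked, 1);                       /* step b */
            atomic_store(&pred->next, me);  /* let the predecessor find us */
            while (atomic_load(&me->locked))                    /* step d */
                ;
        }
    }

    void mcs_release(struct mcs_lock *l, struct mcs_node *me)
    {
        struct mcs_node *succ = atomic_load(&me->next);
        if (succ == NULL) {
            struct mcs_node *expected = me;
            if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
                return;              /* queue is empty: nothing to hand over */
            while ((succ = atomic_load(&me->next)) == NULL)
                ;                    /* successor is mid-enqueue: wait       */
        }
        atomic_store(&succ->locked, 0);      /* hand the lock to the successor */
    }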
As can be seen from the above description, when the lock is released, the CPU running the previous thread needs to access the locked variable corresponding to the next thread through the next pointer and set it to 0. If the two threads run in different NUMA domains, CPU cross-domain access to memory may occur during the lock transfer, resulting in lower performance of the lock.
Referring to fig. 4, fig. 4 is a schematic diagram of threads queuing to grab a lock in a CPU according to an embodiment of the present application. As shown in fig. 4, in the order in which the threads on the CPUs applied for the lock, the threads on CPU0, CPU 2, CPU1 and CPU 3 join the lock wait queue in sequence.
Referring to fig. 5, fig. 5 is a schematic diagram of a lock transfer according to an embodiment of the present application. As shown in fig. 5, after the threads applying for the lock are queued in the order shown in fig. 4, the lock is transferred among CPU0, CPU 2, CPU1 and CPU 3 in the order in which the threads queued. CPU0 and CPU1 are located in NUMA node0, and CPU 2 and CPU 3 are located in NUMA node 1. During the transfer, the lock first passes from CPU0 to CPU 2, one cross-NUMA-domain migration; then from CPU 2 to CPU1, another cross-NUMA-domain migration; and finally from CPU1 to CPU 3, again across NUMA domains. These three lock transfers thus cause three cross-NUMA-domain migrations, which greatly damages lock performance.
If the NUMA architecture can be sensed during lock transfer, the number of times the lock is transferred across NUMA domains can be reduced as much as possible, and lock performance can be improved. For example, referring to fig. 6, fig. 6 is a schematic diagram of another lock transfer provided by the embodiment of the present application. As shown in FIG. 6, for the lock wait queue shown in FIG. 4, the lock is transferred first from CPU0 to CPU1, then from CPU1 to CPU 2, and finally from CPU 2 to CPU 3. As can be seen from fig. 6, among the three lock transfers only the transfer from CPU1 to CPU 2 crosses NUMA domains; that is, only one of the three transfers crosses NUMA, which improves lock performance considerably.
In view of this, embodiments of the present application provide a lock delivery method capable of sensing a NUMA architecture. By setting a corresponding lock for each node under a NUMA architecture of a plurality of hierarchies, a thread can obtain the authority of accessing the shared resources corresponding to the lock only when obtaining the locks corresponding to the nodes of each hierarchy under the NUMA architecture. In the process of lock transmission, the thread preferentially releases the lock of the bottommost node, so that other threads located under the same NUMA domain node can preferentially obtain the lock, thereby avoiding the transmission of the lock across NUMA domains as much as possible and improving the performance of the lock.
Specifically, the lock delivery method provided by the embodiment of the application can be applied to electronic equipment. Illustratively, the electronic device may be, for example, a server, a smart phone (mobile phone), a Personal Computer (PC), a notebook computer, or the like.
For convenience of description, the method provided by the embodiments of the present application is described below taking application to a server as an example.
In order to facilitate understanding of the present solution, in the embodiment of the present application, first, a structure of an electronic device provided in the present application is described with reference to fig. 7.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device 101 according to an embodiment of the present application. As shown in FIG. 7, electronic device 101 includes a processor 103 coupled to a system bus 105. Processor 103 may be one or more processors, each of which may include one or more processor cores. A display adapter (video adapter) 107 may drive a display 109, which is coupled to system bus 105. System bus 105 is coupled to an input/output (I/O) bus through a bus bridge 111. I/O interface 115 is coupled to the I/O bus and communicates with various I/O devices, such as an input device 117 (e.g., a touch screen), an external memory 121 (e.g., a hard disk, floppy disk, optical disk, or flash disk), a multimedia interface, a transceiver 123 (which can send and/or receive radio communication signals), a camera 155 (which can capture still and motion digital video images), and an external USB port 125. Optionally, the interface connected to I/O interface 115 may be a USB interface.
The processor 103 may be any conventional processor, including a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, or a combination thereof. Alternatively, the processor may be a dedicated device such as an ASIC.
Electronic device 101 may communicate with software deploying server 149 via network interface 129. Illustratively, the network interface 129 is a hardware network interface, such as a network card. The network 127 may be an external network, such as the internet, or an internal network, such as an ethernet or Virtual Private Network (VPN). Optionally, the network 127 may also be a wireless network, such as a WiFi network, a cellular network, and the like.
Hard drive interface 131 is coupled to system bus 105 and is connected to hard disk drive 133. Internal memory 135 is coupled to system bus 105. The data running in internal memory 135 may include the operating system (OS) 137, applications 143, and the schedule of electronic device 101.
The processor 103 may communicate with the internal memory 135 via the system bus 105 to retrieve instructions and data from the internal memory 135 within the application program 143 to implement the execution of the program.
The operating system includes a Shell 139 and a kernel 141. Shell 139 is an interface between the user and the kernel of the operating system. The shell is the outermost layer of the operating system and manages the interaction between the user and the operating system: it waits for user input, interprets the input to the operating system, and processes the output results of the operating system.
Kernel 141 is comprised of those portions of the operating system that are used to manage memory, files, peripherals, and system resources. The kernel 141 interacts directly with the hardware, and the operating system kernel typically runs processes and provides inter-process communication, CPU slot management, interrupts, memory management, and IO management, among other things.
Illustratively, in the case where the electronic device 101 is a smartphone, the application program 143 includes an instant messaging-related program. In one embodiment, electronic device 101 may download application 143 from software deploying server 149 when execution of application 143 is required.
The device applied to the embodiment of the present application is described above, and the lock delivery method provided by the embodiment of the present application will be described in detail below.
Referring to fig. 8, fig. 8 is a schematic view of an application architecture of a lock delivery method according to an embodiment of the present disclosure. As shown in fig. 8, the lock delivery method provided by the embodiment of the present application is applied to a NUMA architecture, which is a tree structure including a plurality of hierarchies, and each hierarchy in the tree structure includes one or more nodes. The tree structure is a multi-level nested structure, and nodes in the tree structure usually have a one-to-many relationship. For a NUMA architecture, because NUMA domains at higher levels in the NUMA architecture often include NUMA domains at lower levels, the NUMA architecture may be constructed as a tree structure including a plurality of levels, and a node in the tree structure is a NUMA domain at the level of the node. And the nodes of the lower hierarchy connected with the nodes in the tree structure are the NUMA domains of the lower hierarchy included in the NUMA domain corresponding to the node.
Illustratively, taking FIG. 8 as an example, the NUMA architecture in FIG. 8 includes four levels. The four levels are a first level, a second level, a third level and a fourth level from low to high. Where the NUMA architecture is an architecture under an AMD X86_64 processor, the first level may be, for example, a physical core level, the second level may be, for example, a cache set level, the third level may be, for example, a NUMA node level, and the fourth level may be, for example, a system level.
Where the NUMA architecture is under an ARMv8 processor, the first tier may be, for example, a cache bank tier, the second tier may be, for example, a NUMA node tier, the third tier may be, for example, a slot tier, and the fourth tier may be, for example, a system tier.
In a NUMA architecture, each node in the NUMA architecture is configured with a lock. When a thread applies for a lock corresponding to a shared resource, the thread needs to apply for locks of nodes of each level in a NUMA architecture one by one according to a sequence from a lower level to a higher level based on a connection relation of the nodes in the NUMA architecture. When a thread holds a lock of a node of each level in the NUMA architecture, the thread holds a lock corresponding to the shared resource on behalf of the thread, that is, the thread obtains access rights to the shared resource.
For example, referring to fig. 9, fig. 9 is a schematic diagram of a thread applying for a lock according to an embodiment of the present application. As shown in fig. 9, thread 1 needs to apply for the locks of the nodes of each level in the NUMA architecture one by one, in order from lower level to higher level, based on the connection relationship between the node connected to thread 1 and the nodes in the NUMA architecture. The node connected to thread 1 is node 1, the node of the level above node 1 is node 7, the node of the level above node 7 is node 10, and the node of the level above node 10 is node 12. Therefore, thread 1 needs to apply for the locks corresponding to these 4 nodes one by one in the order node 1 > node 7 > node 10 > node 12. When thread 1 holds the locks corresponding to node 1, node 7, node 10 and node 12, thread 1 holds the lock corresponding to the shared resource, i.e., thread 1 obtains access rights to the shared resource.
Similarly, for thread 5 in fig. 9, thread 5 needs to apply for the locks of the nodes of each level in the NUMA architecture one by one, in order from lower level to higher level, based on the node to which thread 5 is connected and the connection relationships between nodes in the NUMA architecture. That is, thread 5 needs to apply for the locks corresponding to the 4 nodes one by one in the order node 4 > node 8 > node 10 > node 12. When thread 5 holds the locks corresponding to node 4, node 8, node 10 and node 12, thread 5 holds the lock corresponding to the shared resource, that is, thread 5 obtains access to the shared resource.
As can be seen from fig. 9, since there is only one node (i.e., the root node in the tree structure) in the highest hierarchy (i.e., the fourth hierarchy), the lock corresponding to the node in the highest hierarchy can be held by only one thread at a time. When one thread holds the lock corresponding to the node of each hierarchy in the NUMA architecture, other threads cannot hold the lock corresponding to the node of each hierarchy in the NUMA architecture, and therefore only one thread accesses the shared resources at the same time.
Referring to fig. 10, fig. 10 is a flowchart illustrating a lock delivery method 1000 according to an embodiment of the present disclosure. As shown in fig. 10, the lock delivery method 1000 includes the following steps 1001-1003.
Step 1001, a lock release request from a first thread is obtained, where the lock release request is used to request release of a plurality of locks controlling access to a shared resource.
In this embodiment, after the first thread has obtained the plurality of locks controlling access to the shared resource and has finished accessing the shared resource, the first thread generates a lock release request to request release of the plurality of locks it holds. After the processor acquires the lock release request from the first thread, it determines, according to the lock release request, the plurality of locks that the first thread requests to release. Each lock in the plurality of locks corresponds to a node of each level in the NUMA architecture, and the nodes corresponding to the plurality of locks have connection relationships with one another. That is, each lock in the plurality of locks corresponds to one node in each level of the NUMA architecture, and any two nodes of adjacent levels among these nodes are connected, meaning that the node of the lower level is located in the NUMA domain indicated by the node of the upper level.
In addition, in the plurality of locks, locks corresponding to two nodes adjacent to each other in the hierarchy have holding relationship, and locks corresponding to lower hierarchy nodes hold locks corresponding to higher hierarchy nodes. That is, in the above-described plurality of locks, except for the lock corresponding to the highest-level node, the lock corresponding to the node in each level holds the lock corresponding to the node in the previous level. In this way, because the lock corresponding to the node of each hierarchy holds the lock corresponding to the node of the previous hierarchy, the thread can hold the locks corresponding to the nodes of all hierarchies in the NUMA architecture by holding the lock corresponding to the node of the lowest hierarchy.
Taking FIG. 9 as an example, thread 1 holds the lock corresponding to node 1, the lock corresponding to node 1 holds the lock corresponding to node 7, the lock corresponding to node 7 holds the lock corresponding to node 10, and the lock corresponding to node 10 holds the lock corresponding to node 12. That is, the lock corresponding to the node of each level holds the lock corresponding to the node of the previous level, so that a layer-by-layer holding relationship is formed among the locks.
Optionally, the locks held by the first thread are all spin locks, that is, the locks of each node in the NUMA architecture are all spin locks.
In addition, the plurality of locks held by the first thread may all be locks of the same type; for example, they may all be MCS locks or all be ticket locks. Alternatively, the plurality of locks may include locks of different types; for example, the plurality of locks may include an MCS lock, a ticket lock and a CLH lock.
Specifically, in this implementation, the lock used by each level in the NUMA architecture may be determined according to how different types of locks perform at that level. That is, the type of each lock in the plurality of locks held by the first thread is determined according to the performance of different locks in each tier of the NUMA architecture. Moreover, all nodes within the same level of the NUMA architecture use the same type of lock.
Illustratively, under the ARMv8 platform, the system level in the NUMA architecture adopts an MCS lock, the socket level adopts a ticket lock, the NUMA node level adopts a CLH lock, and the cache group level adopts a ticket lock.
Under the AMD x86 platform, the system level in the NUMA architecture adopts a CLH lock, the NUMA node level adopts an MCS lock, the cache group level adopts an MCS lock, and the physical core level adopts a ticket lock.
In practical applications, the types of locks employed by the various levels in the NUMA architecture may be determined according to the platform to which the lock delivery method provided by the embodiments of the present application is applied.
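As a concrete illustration, the per-level lock types can be captured in a small configuration table. The following C sketch is illustrative only; the names and enum layout are assumptions of this description, and the values shown encode the AMD x86 mapping above:

```c
/* Illustrative sketch: fix the lock type per level ahead of time,
 * based on benchmark results for the target platform. */
typedef enum { LOCK_TICKET, LOCK_MCS, LOCK_CLH } lock_type_t;

enum {
    LEVEL_PHYS_CORE,   /* first level  */
    LEVEL_CACHE_GROUP, /* second level */
    LEVEL_NUMA_NODE,   /* third level  */
    LEVEL_SYSTEM,      /* fourth level */
    NUM_LEVELS
};

static const lock_type_t level_lock_type[NUM_LEVELS] = {
    [LEVEL_PHYS_CORE]   = LOCK_TICKET,
    [LEVEL_CACHE_GROUP] = LOCK_MCS,
    [LEVEL_NUMA_NODE]   = LOCK_MCS,
    [LEVEL_SYSTEM]      = LOCK_CLH,
};
```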
Step 1002, if there is a thread waiting to preempt a first lock, releasing the first lock according to the lock release request and reserving holding relationships among the locks, so as to transmit the locks to the thread waiting to preempt the first lock, where the first lock is a lock corresponding to a lowest-level node in the locks.
After determining the plurality of locks that the first thread requests to release, it may be determined, based on the first lock (the lock in the plurality of locks corresponding to the lowest-level node), whether there are currently other threads waiting to preempt the first lock. A thread waiting to preempt the first lock is located at the same node as the first thread, so the thread waiting to preempt the first lock and the first thread belong to the same NUMA domain.
If there is currently a thread waiting to preempt the first lock, the first lock held by the first thread may be released so that other threads waiting to preempt the first lock can preempt the first lock. In addition, in releasing the first lock held by the first thread, the holding relationship among the plurality of locks is preserved, namely, the lock corresponding to the node of the lower hierarchy among the plurality of locks still holds the lock corresponding to the node of the higher hierarchy. In this way, after the other threads waiting for preemption of the first lock acquire the first lock released by the first thread, the locks corresponding to the nodes of each hierarchy in the NUMA architecture can be acquired, so that the access right to the shared resource is acquired.
It will be appreciated that since the lock corresponding to the higher level node in the plurality of locks is actually held by the lock corresponding to the lower level node, rather than being held directly by the first thread, the first thread does not hold any other locks in the plurality of locks after releasing the first lock.
Taking FIG. 9 as an example, the lock of node 1 is held by thread 1, the lock of node 7 is held by the lock of node 1, the lock of node 10 is held by the lock of node 7, and the lock of node 12 is held by the lock of node 10. That is, the holding relationship between thread 1 and the locks is: thread 1 > lock of node 1 > lock of node 7 > lock of node 10 > lock of node 12. Assume that when thread 1 requests to release the plurality of locks it holds, there is a thread 2 waiting to preempt the lock of node 1 (i.e., the lock corresponding to the lowest-level node). Then the lock of node 1 held by thread 1 is released, and the holding relationship among the plurality of locks, i.e., lock of node 1 > lock of node 7 > lock of node 10 > lock of node 12, is preserved. Thus, after thread 2 preempts the lock of node 1, thread 2 obtains the locks of the nodes of the respective levels, that is, the holding relationship between thread 2 and the locks is: thread 2 > lock of node 1 > lock of node 7 > lock of node 10 > lock of node 12.
In particular implementations, the release of the lock may be implemented by way of a function call. For example, the processor may call a first release function according to the lock release request, where the first release function is a lock release function corresponding to the first lock. The processor then executes the first release function to release the first lock and retain the holding relationship among the plurality of locks in response to the number of threads waiting to preempt the first lock not being 0.
Step 1003, if there is no thread waiting to preempt the first lock, releasing the first lock and a second lock held by the first lock according to the lock release request, so as to transmit the second lock to the thread waiting to preempt the second lock.
In the event that it is determined that there is no thread waiting to preempt the first lock, it may be considered that no other thread in the same lowest-level NUMA domain as the first thread is waiting to access the shared resource, and therefore the first lock held by the first thread, as well as the second lock held by the first lock, may be released. The first lock is the lock corresponding to a node at the lowest level in the NUMA architecture, and the second lock held by the first lock is the lock corresponding to the node one level above the node corresponding to the first lock, that is, the second lock is a lock corresponding to a node at the second level in the NUMA architecture. In this way, after the processor releases the first lock and the second lock, a thread located in a different NUMA domain from the first thread may preempt the second lock in order to obtain the locks corresponding to the nodes of each level in the NUMA architecture.
Illustratively, taking FIG. 9 as an example, assume that the holding relationship between thread 1 and the locks is: thread 1 > lock of node 1 > lock of node 7 > lock of node 10 > lock of node 12, where "thread 1 > lock of node 1" indicates that thread 1 holds the lock of node 1. When thread 1 needs to release the plurality of locks, thread 2 has already been served, so thread 2 no longer waits to preempt the lock of node 1. Therefore, when the lock release request of thread 1 is processed, the lock of node 1 held by thread 1 and the lock of node 7 held by the lock of node 1 can both be released. Thus, when thread 3 or thread 4 needs to preempt the locks of each level, it is ensured that thread 3 and thread 4 can preempt the lock of node 7 in the second level.
In a specific implementation, the processor may call a first release function according to the lock release request, where the first release function is the lock release function corresponding to the first lock. Then, in response to the number of threads waiting to preempt the first lock being 0, the processor calls a second release function indicated in the first release function, where the second release function is the lock release function corresponding to the second lock. The processor then executes the second release function to release the second lock held by the first lock. Finally, the processor executes the first release function to release the first lock.
In this embodiment, by configuring a corresponding lock for each node in the multi-level NUMA architecture, a thread can obtain the right to access the shared resource corresponding to the locks only after obtaining the locks corresponding to the nodes of all levels in the NUMA architecture. During lock delivery, the lock of the lowest-level node is preferentially released on its own, so that other threads located under the same NUMA domain node can preferentially acquire the lock, thereby avoiding the transfer of the lock across NUMA domains as much as possible and improving lock performance.
Optionally, the processor may reserve holding relationships between remaining locks of the plurality of locks during releasing of the first lock and a second lock held by the first lock according to the lock release request, so as to transfer the remaining locks of the plurality of locks to a thread waiting to preempt the second lock. Wherein the remaining ones of the plurality of locks are others of the plurality of locks other than the first lock.
That is, the processor releases the holding relationship between the first thread and the first lock and the holding relationship between the first lock and the second lock according to the lock release request from the first thread, and the other holding relationships among the plurality of locks are not released. In this way, since the thread preempts the locks in the order from the lower hierarchy to the upper hierarchy, for the thread waiting to preempt the second lock, after the thread preempts the second lock, the second lock and the other locks in the hierarchies above the second lock (i.e., the remaining locks except the first lock in the plurality of locks) can be obtained, and thus the locks corresponding to the nodes in each hierarchy in the NUMA architecture are obtained.
For example, referring to FIG. 11, FIG. 11 is a schematic diagram of lock release and delivery according to an embodiment of the present application. As shown in FIG. 11, assume that the holding relationship between thread 1 and the plurality of locks is: thread 1 > lock of node 1 > lock of node 7 > lock of node 10 > lock of node 12, and that thread 2 is not waiting to preempt the lock of node 1 when thread 1 requests release of the plurality of locks. In addition, thread 3 has preempted the lock of node 2 and spin-waits to preempt the lock of node 7; thread 5 has preempted the lock of node 4 and the lock of node 8 and spin-waits to preempt the lock of node 10. In this case, thread 1 issues a lock release request, and the processor releases the lock of node 1 and the lock of node 7 according to the lock release request, while retaining the holding relationship between the lock of node 7 and the lock of node 10 and the holding relationship between the lock of node 10 and the lock of node 12.
In this way, after the lock of node 7 is released, thread 3, which has already obtained the lock of node 2, can preempt the lock of node 7 and thereby acquire the locks of the nodes of each level from node 7 upward, i.e., thread 3 acquires the lock of node 7, the lock of node 10 and the lock of node 12. Because thread 3 triggered the preemption of the lock of node 7 only after it had obtained the lock of node 2, once thread 3 preempts the lock of node 7, thread 3 holds the locks of the nodes of each level in the NUMA architecture, thereby obtaining the right to access the shared resource.
As can be seen in FIG. 11, thread 1 and thread 3 are located in adjacent NUMA domains, while thread 1 and thread 5 are located in non-adjacent NUMA domains. When the lock is passed from thread 1 to thread 3, only one level of NUMA domain needs to be crossed, and the performance loss is relatively small; whereas a lock passed from thread 1 to thread 5 would need to cross two levels of NUMA domains, with a relatively large performance penalty. Therefore, by retaining the holding relationships of the remaining locks from node 7 upward, thread 3, located in an adjacent NUMA domain, can preferentially obtain the locks of the nodes of each level, thereby preventing the lock from being passed across a large NUMA domain and improving lock performance as much as possible.
In the scheme, during the period of releasing the first lock and the second lock, the holding relationship among other locks is reserved, so that other threads located in adjacent NUMA domains with the first thread can preferentially preempt the locks of the nodes of each hierarchy, the locks are prevented from being transmitted across a large NUMA domain, and the lock performance is improved as much as possible.
Alternatively, the processor may choose not to retain the holding relationships among the remaining locks of the plurality of locks while releasing the first lock and the second lock held by the first lock according to the lock release request. In this case, the processor simply releases the locks of the respective level nodes, and any thread waiting to preempt any of the locks held by the first thread may preempt the released lock. In this way, threads that began waiting for a lock earlier can acquire the locks of the respective level nodes earlier and thus access the shared resource earlier, preventing threads from waiting for a long time.
Similarly, in releasing the second lock held by the first lock, if there is no thread waiting to preempt the second lock, the second lock and the locks held by the second lock are released to pass on the other locks of the plurality of locks except the first lock and the second lock to the thread waiting to preempt the lock held by the second lock.
That is, in the present embodiment, when releasing a plurality of locks held by a thread, the locks are sequentially released in order from the lower hierarchy level to the upper hierarchy level. When releasing the lock of any one hierarchy node, if a thread waiting for preempting the lock of the node exists, only releasing the lock of the node and reserving the holding relationship among the rest locks; if there are no threads waiting to preempt the lock of the node, then the lock of the node is released and the release of the lock of the node at the previous level of the node is triggered.
According to this scheme, the locks that need to be released are determined and released sequentially from the lower levels to the higher levels, and when there is a thread waiting to preempt the lock of a certain node, the locks of the nodes at the levels above that node are retained, so that each delivery of the lock is guaranteed to occur within the smallest possible NUMA domain, avoiding long-distance delivery of the lock and improving lock performance.
For example, referring to FIG. 12, FIG. 12 is a schematic diagram comparing lock deliveries according to an embodiment of the present application. As shown in (a) of FIG. 12, in the case of delivering locks based on the order in which they were applied for, as in the related art, assume that the application order of the locks is: thread 1 > thread 8 > thread 3 > thread 5. Then the delivery order of the lock is thread 1 > thread 8 > thread 3 > thread 5. Thus, passing the lock from thread 1 to thread 8 requires crossing three levels of NUMA domains; passing the lock from thread 8 to thread 3 requires crossing three levels of NUMA domains; and passing the lock from thread 3 to thread 5 requires crossing two levels of NUMA domains. That is, three deliveries require crossing eight levels of NUMA domains in total, which has a large impact on lock performance.
As shown in (b) of FIG. 12, in the case of delivering locks based on the lock delivery method provided in the embodiment of the present application, assume again that the application order of the locks is: thread 1 > thread 8 > thread 3 > thread 5. When the plurality of locks held by thread 1 are released, since there is no thread waiting to preempt the lock of node 1 but there is a thread 3 waiting to preempt the lock of node 7, the locks of node 1 and node 7 are released, so that thread 3 can acquire the locks of the respective levels after preempting the lock of node 7. Similarly, when the locks held by thread 3 are released, the locks of node 2, node 7 and node 10 are released layer by layer in order from the lower levels to the higher levels, and the holding relationship between the lock of node 10 and the lock of node 12 is retained, so that thread 5 can obtain the locks of all the levels after preempting the lock of node 10. Finally, when the locks held by thread 5 are released, the locks of the respective levels are released in sequence so that thread 8 can preempt the locks of the respective levels.
That is, in (b) of FIG. 12, the delivery order of the lock is thread 1 > thread 3 > thread 5 > thread 8. Thus, passing the lock from thread 1 to thread 3 requires crossing only one level of NUMA domain; passing the lock from thread 3 to thread 5 requires crossing two levels of NUMA domains; and passing the lock from thread 5 to thread 8 requires crossing three levels of NUMA domains. That is, three deliveries require crossing six levels of NUMA domains in total. Clearly, each delivery of the lock occurs within as small a range as possible, which improves lock performance.
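The level counts used in this comparison can be computed as the distance from the two threads' lowest-level nodes to their lowest common ancestor. Below is a minimal C sketch, assuming all lowest-level nodes sit at the same depth and each node stores a pointer to its previous-level node; the names are illustrative:

```c
/* Trimmed-down node view: only the parent pointer matters here. */
struct tree_node {
    struct tree_node *parent; /* node of the previous level; NULL at the root */
};

/* Count the levels a lock crosses when passed between the threads
 * attached to leaves a and b: walk both up in lockstep until the
 * lowest common ancestor is reached. */
int levels_crossed(struct tree_node *a, struct tree_node *b)
{
    int steps = 0;
    while (a != b) {
        a = a->parent;
        b = b->parent;
        steps++;
    }
    return steps; /* e.g. thread 1 -> thread 3: 1; thread 1 -> thread 8: 3 */
}
```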
It will be appreciated that when locks are delivered based on the method described above, the delivery order of the locks depends only on the locations, in the NUMA architecture, of the threads waiting to preempt the locks, and not on the order in which the locks were applied for. That is, each delivery of the lock covers as short a distance as possible, so threads far from the thread currently holding the lock may need to wait longer to preempt it. Therefore, to avoid excessively long waits for threads under other NUMA domain nodes, a corresponding mechanism may be set to prevent the lock from being passed under the same NUMA domain node for too long.
In one possible embodiment, after acquiring a lock release request from a first thread, if there is a thread waiting for preemption of a first lock and the number of times of release of the first lock is less than a first threshold, the first lock is released according to the lock release request and holding relationships among the locks are reserved. The release times of the first lock are used for indicating the transfer times of the first lock between threads under the node corresponding to the first lock.
If there is no thread waiting to preempt the first lock, or the number of releases of the first lock is greater than or equal to the first threshold, the first lock and the second lock held by the first lock are released according to the lock release request.
It can be understood that if there is a thread waiting for preempting the lock of a node under a node, when the lock held by the thread is released, only the lock of the node is released, and the lock of the node at the previous level of the node is not released, so that the lock is always delivered in the thread under the node. Therefore, the number of times of releasing a lock corresponding to a certain node can be regarded as the number of times of transferring the lock corresponding to the node in the NUMA domain corresponding to the node. If the number of times of transmission of the lock in the NUMA domain corresponding to a certain node is smaller than the first threshold, the number of times of transmission of the lock in the NUMA domain corresponding to the node is less, and the lock can continue to be transmitted in the NUMA domain corresponding to the node. If the number of times of transmission of the lock in the NUMA domain corresponding to a certain node is greater than or equal to the first threshold, it represents that the number of times of transmission of the lock in the NUMA domain corresponding to the node is large, and it is necessary to release the lock of the node at the previous level of the node, so that the lock can be transmitted in the NUMA domains corresponding to other nodes, and it is avoided that threads in other NUMA domains wait for too long.
It is to be noted that the value of the first threshold may be determined according to an actual application scenario, for example, the value of the first threshold may be the number of threads under a node corresponding to the first lock in the NUMA architecture, or the value of the first threshold may be half of the number of threads under a node corresponding to the first lock in the NUMA architecture. The embodiment does not limit the specific value of the first threshold.
In another possible embodiment, after acquiring a lock release request from a first thread, if there is a thread waiting to preempt the first lock and the time for which the first lock has held the second lock is less than a second threshold, the first lock is released according to the lock release request and the holding relationships among the locks are retained. The time for which the first lock holds the second lock is the time for which the first lock has been passed between threads under the node corresponding to the first lock.
If there is no thread waiting to preempt the first lock, or the time for which the first lock has held the second lock is greater than or equal to the second threshold, the first lock and the second lock held by the first lock are released according to the lock release request.
That is, if the time for which the lock has been passed among the threads under a node is less than the second threshold, the lock has been circulating under that node for only a short time and may continue to be passed among the threads under that node. If the time for which the lock has been passed among the threads under a node is greater than or equal to the second threshold, the lock has been circulating under that node for a long time, and the lock of the node at the previous level needs to be released, so that the lock can be passed to threads under other nodes and those threads are prevented from waiting too long.
It is noted that the value of the second threshold may be determined according to the actual application scenario; for example, the value of the second threshold may be 0.01 ms or 0.02 ms. This embodiment does not limit the specific value of the second threshold.
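For illustration, the time-based check can be sketched as a comparison of a monotonic timestamp, recorded when the first lock took hold of the second lock, against the second threshold. The helper names below are assumptions of this sketch, not the patent's implementation:

```c
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds (POSIX clock_gettime). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Returns 1 while the lock may keep circulating inside this NUMA domain,
 * 0 once the time budget (e.g. 10,000 ns = 0.01 ms) has been spent. */
static int keep_local_by_time(uint64_t hold_start_ns, uint64_t threshold_ns)
{
    return (now_ns() - hold_start_ns) < threshold_ns;
}
```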
The lock transfer method provided by the embodiment of the present application is described above from the perspective of the lock release process, and the lock transfer method provided by the embodiment of the present application is described below from the perspective of the lock application process.
In one possible embodiment, the method 1000 further includes the following steps 1004-1006.
Step 1004, acquiring a lock application request from the second thread, where the lock application request is used to request to apply for a lock controlling access to the shared resource.
When the second thread needs to access the shared resource, the second thread generates a lock application request to apply for a lock that controls access to the shared resource. Wherein, the lock application request from the second thread may carry the identification of the second thread to indicate the identity of the applicant.
Step 1005, determining a plurality of target locks corresponding to the second thread according to the lock application request.
In this embodiment, after the lock application request is obtained, a plurality of target locks corresponding to the second thread may be determined according to the thread identifier in the lock application request. Wherein the second thread needs to hold the plurality of target locks to be able to access the shared resource. The target locks respectively correspond to nodes of each level in the NUMA architecture, nodes corresponding to adjacent locks in the target locks are located in adjacent levels, and the nodes corresponding to the adjacent locks in the target locks have a connection relation.
Specifically, the node of the lowest hierarchy corresponding to the second thread may be determined according to the thread identifier in the lock application request, that is, the second thread is located below the node of the lowest hierarchy. And then sequentially determining the nodes of all levels according to the node of the lowest level corresponding to the second thread and the connection relation of the nodes of all levels, and finally obtaining a plurality of nodes corresponding to the second thread. Wherein the second thread is located within the NUMA domain indicated by the plurality of nodes. After determining a plurality of nodes corresponding to a second thread, the locks of the plurality of nodes may be determined as a plurality of target locks corresponding to the second thread.
Taking FIG. 9 as an example, assuming that the second thread is thread 2 in FIG. 9, it may be determined that the lowest-level node corresponding to thread 2 is node 2. Then, based on the node connection relationships between the levels, the plurality of nodes corresponding to thread 2 are, in order: node 2, node 7, node 10 and node 12. Thus, the plurality of target locks corresponding to thread 2 are the lock of node 2, the lock of node 7, the lock of node 10, and the lock of node 12.
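A minimal C sketch of step 1005 follows, under the assumption that the thread identifier maps to a lowest-level node and that each node stores a pointer to its previous-level node; all names are illustrative:

```c
/* Trimmed-down node view for this sketch. */
struct lock_node {
    struct lock_node *parent; /* node of the previous level; NULL above the root */
};

/* Walk from the thread's lowest-level node to the root, collecting the
 * nodes whose locks form the thread's target locks, in low-to-high
 * order (e.g. node 2 > node 7 > node 10 > node 12 for thread 2). */
int collect_target_locks(struct lock_node *leaf,
                         struct lock_node *targets[], int capacity)
{
    int n = 0;
    for (struct lock_node *cur = leaf; cur != NULL && n < capacity; cur = cur->parent)
        targets[n++] = cur;
    return n; /* number of levels, i.e. number of target locks */
}
```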
Step 1006, preempt the plurality of target locks in sequence, in order from the lowest to the highest level of the nodes corresponding to the locks, until the second thread successfully preempts all of the plurality of target locks.
After a plurality of target locks corresponding to the second thread are determined, the plurality of target locks corresponding to the second thread are preempted in sequence according to the sequence from the low level to the high level of the nodes until the second thread preempts all the locks in the plurality of target locks successfully.
It will be appreciated that, while the second thread is preempting the plurality of target locks, if a target lock of some level is being held by another thread, the second thread enters a spin-wait loop and repeatedly attempts to acquire the target lock held by the other thread. Once the other thread releases the target lock that the second thread is waiting to preempt, the second thread may preempt the target lock based on the preemption rules of the target lock itself.
After the second thread successfully preempts a target lock of some level, it may check whether the target lock it has preempted holds the lock of the previous level. If the preempted target lock holds the lock of the previous level, the second thread may be deemed to have successfully preempted the target locks at all levels, and may therefore stop applying for further target locks.
For convenience of understanding, the lock transmission method provided by the embodiment of the present application will be described in detail below with reference to specific examples.
In the implementation process, each node in the NUMA architecture and the connection relationship between the nodes are defined through an array structure.
For each node in the array structure, there are multiple variables. The variables of a node are respectively: the lock variable value (has_lock), the lock type (local_lock), whether the lock holds the lock of the node of the previous level (has_high_lock), the number of waiters of the lock (waiters), the number of releases of the lock (rand), and a pointer to the node of the previous level (global_lock).
For example, referring to FIG. 13, FIG. 13 is a schematic diagram of the array structure of a node according to an embodiment of the present application. As shown in FIG. 13, based on the variables in node 1: the lock variable value (has_lock) of node 1 is 1, indicating that a thread holds the lock of node 1; the lock type of node 1 is MCS lock; the value of the has_high_lock variable of node 1 is 1, that is, the lock of node 1 holds the lock of the node of the previous level; the number of threads waiting to preempt the lock of node 1 is 3; the number of times the lock of node 1 has been released is 2; and the previous-level node pointer (global_lock) of node 1 points to node 2.
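Expressed as a C structure, one element of the array structure might look as follows. The underscored field names follow the variables listed above; the exact spellings and field types are assumptions of this sketch:

```c
struct numa_lock_node {
    int has_lock;      /* lock variable value: 1 if a thread holds this node's lock */
    int local_lock;    /* lock type of this node, e.g. MCS / ticket / CLH           */
    int has_high_lock; /* 1 if this node's lock holds the previous-level lock       */
    int waiters;       /* number of threads waiting to preempt this node's lock     */
    int rand;          /* number of times this node's lock has been released        */
    struct numa_lock_node *global_lock; /* pointer to the node of the previous level */
};
```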
In addition, a mapping relationship between nodes may also be established, where the mapping relationship is used to indicate a connection relationship between nodes between two adjacent hierarchies, that is, to indicate a connection relationship between nodes in a NUMA architecture.
The three phases of lock initialization, lock application, and lock release will be described below based on the above array structure.
First, lock initialization.
In the lock initialization process, the array structure is traversed, and the global_lock pointer of each node in the array structure is set according to the mapping relationships between the nodes. In addition, the lock variable value (has_lock) of each node in the array structure is set to 0, indicating that the lock of the node is not held by any thread; and the has_high_lock variable of each node is set to 0, indicating that the lock of each node does not hold the lock of the node of the previous level.
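Reusing the numa_lock_node structure sketched above, initialization reduces to a single pass over the array. In the sketch below, parent_of() stands in for the mapping relationship between nodes and is an assumed helper, stubbed for self-containment:

```c
/* Assumed helper: look up the previous-level node of nodes[i] from the
 * mapping relationships; stubbed here. */
static struct numa_lock_node *parent_of(struct numa_lock_node nodes[], int i)
{
    (void)nodes; (void)i;
    return NULL; /* real code would consult the node mapping */
}

void locks_init(struct numa_lock_node nodes[], int count)
{
    for (int i = 0; i < count; i++) {
        nodes[i].has_lock      = 0; /* no thread holds the node's lock       */
        nodes[i].has_high_lock = 0; /* the lock holds no previous-level lock */
        nodes[i].waiters       = 0;
        nodes[i].rand          = 0;
        nodes[i].global_lock   = parent_of(nodes, i); /* wire parent pointer */
    }
}
```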
Second, lock application.
First, a respective lock application function is defined for the nodes of each level. For the nodes of every level other than the highest level, the lock application function includes an application instruction for the lock of the current node, and nests a call instruction to the lock application function of the node of the previous level of the current node.
When the processor executes the lock application function of a certain node, the processor executes an application instruction in the lock application function to apply for the lock of the current node; and when the current node does not hold the lock of the node of the previous hierarchy, calling a lock application function of the lock of the node of the previous hierarchy, thereby triggering the application of the lock of the node of the previous hierarchy. That is to say, when a thread applies for a lock of a certain node, whether the node holds a lock of a node of the previous hierarchy or not is judged, and the layer-by-layer hierarchy lock application from the low hierarchy to the high hierarchy is triggered until all the hierarchy locks are applied.
For example, taking FIG. 9 as an example, assume that the lock application function corresponding to the nodes of the first level in the NUMA architecture is cachegroup_acquire, the lock application function corresponding to the nodes of the second level is numanode_acquire, the lock application function corresponding to the nodes of the third level is Package_acquire, and the lock application function corresponding to the nodes of the fourth level is System_acquire. Then, the lock application function cachegroup_acquire includes a call instruction to the lock application function numanode_acquire, indicating that numanode_acquire is called when the node of the first level does not hold the lock of the node of the second level. Similarly, the lock application function numanode_acquire includes a call instruction to the lock application function Package_acquire; and the lock application function Package_acquire includes a call instruction to the lock application function System_acquire.
Specifically, executing the lock application function of a node includes the following steps.
Step 1, add 1 to the number of waiters (waiters) of the lock corresponding to the current node in the array structure, to indicate that there is a new waiting thread.
Step 2, call the application function of the lock corresponding to the node to apply for the lock of the node. For example, when the lock of the node is a ticket lock, the ticketlock_acquire function is called.
Step 3, subtract 1 from the number of waiters (waiters) of the lock corresponding to the current node. By the time step 3 executes, the application function of the lock in the previous step has completed successfully, meaning the lock of the current node has been obtained and the thread no longer waits, so the number of waiters of the lock corresponding to the current node is decremented by 1.
Step 4, query the has_high_lock variable of the current node in the array structure to determine whether the current node holds the lock of the node of the previous level. If the current node does not hold the lock of the previous-level node, the lock application function of the previous-level node needs to be called to apply for the lock of the previous-level node.
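Putting steps 1 through 4 together for the lowest level, and again reusing the numa_lock_node structure above, gives the following sketch; ticketlock_acquire() and numanode_acquire() are assumed stand-ins for the node's own lock-type acquire function and the previous level's application function:

```c
extern void ticketlock_acquire(struct numa_lock_node *n); /* type-specific acquire    */
extern void numanode_acquire(struct numa_lock_node *n);   /* previous level's function */

void cachegroup_acquire(struct numa_lock_node *n)
{
    __atomic_fetch_add(&n->waiters, 1, __ATOMIC_RELAXED); /* step 1: one more waiter   */
    ticketlock_acquire(n);                                /* step 2: take local lock   */
    __atomic_fetch_sub(&n->waiters, 1, __ATOMIC_RELAXED); /* step 3: no longer waiting */
    if (!n->has_high_lock) {                              /* step 4: climb one level   */
        numanode_acquire(n->global_lock);
        n->has_high_lock = 1; /* this lock now holds the previous-level lock */
    }
}
```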
Third, lock release.
First, a respective lock release function is defined for the nodes of each level. For the nodes of every level other than the highest level, the lock release function includes a release instruction for the lock of the current node, and nests a call instruction to the lock release function of the node of the previous level of the current node.
When the processor executes the lock release function of a node, the processor determines whether there is a thread waiting to preempt the lock of the current node. When there is no thread waiting to preempt the lock of the current node, the lock release function of the node of the previous level is called, thereby triggering the release of the lock of the previous-level node. When the processor determines that there is a thread waiting to preempt the lock of the current node, the processor releases the locks from that node downward, layer by layer. That is, when a thread needs to release the lock of a node, it is determined whether any thread is waiting to preempt the lock of that node. If so, the locks are released layer by layer from that node downward; if not, it is determined whether a thread is waiting to preempt the lock of the node at the previous level, and so on, until it is determined either that the locks of all levels need to be released or that the current node has a thread waiting to preempt it.
For example, taking FIG. 9 as an example, assume that the lock release function corresponding to the nodes of the first level in the NUMA architecture is cachegroup_release, the lock release function corresponding to the nodes of the second level is numanode_release, the lock release function corresponding to the nodes of the third level is Package_release, and the lock release function corresponding to the nodes of the fourth level is System_release. Then, the lock release function cachegroup_release includes a call instruction to the lock release function numanode_release, indicating that numanode_release is called when the node of the first level has no thread waiting to preempt it. Similarly, the lock release function numanode_release includes a call instruction to the lock release function Package_release; and the lock release function Package_release includes a call instruction to the lock release function System_release.
Specifically, when the lock release function of a certain node is executed, the following steps are included.
Step 1, query the waiters variable of the current node in the array structure by calling the function cachelock_has_waiters, to determine whether the number of waiters of the lock corresponding to the current node is 0. If the waiters variable is greater than 0, threads are waiting to obtain the lock of the current node, and the function cachelock_has_waiters returns 1; if the waiters variable is equal to 0, no thread is waiting to acquire the lock of the current node, and the function cachelock_has_waiters returns 0.
Step 2, query the rand variable of the current node in the array structure by calling the function cachelock_keep_local, to determine whether the number of times the lock of the current node has been released is equal to the maximum value (i.e., the above-mentioned first threshold). If the value of the rand variable is equal to the maximum value, the value of the rand variable is reset to 0 and the function cachelock_keep_local returns 0; if the value of the rand variable is not equal to the maximum value, the value of the rand variable is incremented by 1 and the function cachelock_keep_local returns 1.
Step 3, if the function cachelock_has_waiters in step 1 and the function cachelock_keep_local in step 2 both return 1, another thread is currently waiting to preempt the lock of the current node and the number of releases of the current node's lock is less than the maximum value, so only the release function of the lock of the current node is called, allowing the lock to be passed among the threads under the current node. In addition, because the lock holding relationships between the current node and the nodes above it are not changed, after another thread preempts the lock of the current node, it obtains the locks of the nodes of all levels, so the lock is guaranteed to be delivered within the NUMA domain.
If the function cachelock_has_waiters in step 1 or the function cachelock_keep_local in step 2 returns 0, then either no other thread is currently waiting to preempt the lock of the current node, or the number of releases of the current node's lock has reached the maximum value. In that case, both the release function of the lock of the current node and the release function of the lock of the node at the previous level are called, so as to release the lock of the current node and the lock of the previous-level node.
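The three steps above can be sketched for the lowest level as follows, with the same assumed helpers as in the application sketch; MAX_RELEASES stands for the first threshold and its value here is a placeholder:

```c
#define MAX_RELEASES 8 /* first threshold; an assumed placeholder value */

extern void ticketlock_release(struct numa_lock_node *n); /* type-specific release    */
extern void numanode_release(struct numa_lock_node *n);   /* previous level's function */

static int cachelock_has_waiters(struct numa_lock_node *n)
{
    return __atomic_load_n(&n->waiters, __ATOMIC_RELAXED) > 0; /* step 1 */
}

static int cachelock_keep_local(struct numa_lock_node *n)
{
    if (n->rand == MAX_RELEASES) { /* step 2: budget for this domain used up */
        n->rand = 0;
        return 0;
    }
    n->rand++;
    return 1;
}

void cachegroup_release(struct numa_lock_node *n)
{
    if (cachelock_has_waiters(n) && cachelock_keep_local(n)) {
        ticketlock_release(n); /* step 3: hand the lock on inside this domain; */
        return;                /* has_high_lock stays 1, so the chain survives */
    }
    n->has_high_lock = 0;             /* give up the previous-level lock as well */
    numanode_release(n->global_lock); /* release one level up first; that function
                                       * repeats the same waiter check at its level */
    ticketlock_release(n);            /* then release the local lock */
}
```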
To illustrate the improvement in lock transfer performance achieved by the lock transfer method provided in the embodiments of the present application, lock transfer performance was tested. Specifically, this embodiment uses LevelDB, software widely used in the industry for testing lock performance. Referring to FIG. 14, FIG. 14 is a schematic diagram of the system architecture of LevelDB according to an embodiment of the present application.
In the LevelDB implementation, a large number of exclusive locks (mutexes) are used to ensure that only one thread at a time accesses a critical section. In this embodiment, the lock and unlock operations of the mutexes are replaced by the acquire and release operations of the locks described herein, so as to implement the lock transfer method provided in this embodiment.
Specifically, referring to FIG. 15, FIG. 15 is a schematic diagram comparing lock transfer performance according to an embodiment of the present application. As shown in FIG. 15, this embodiment runs LevelDB and Kyoto Cabinet performance comparisons on the AMD Epyc x86 platform and the Kunpeng 920 ARMv8 platform, respectively. The locks participating in the test are the HMCS lock (i.e., the related-art lock) and the MCS lock, respectively.
Among them, the best-performing three-layer locks, with partial NUMA-layer awareness, are: on the x86 platform, Hemlock (cache group layer), MCS lock (NUMA node layer) and MCS lock (System layer); on the ARMv8 platform, ticket lock (NUMA node layer), CLH lock (Package layer) and MCS lock (System layer).
The best-performing four-layer locks, with full NUMA-layer awareness, are: on the x86 platform, ticket lock (core layer), MCS lock (cache group layer), MCS lock (NUMA node layer) and CLH lock (System layer); on the ARMv8 platform, ticket lock (cache group layer), CLH lock (NUMA node layer), ticket lock (Package layer) and CLH lock (System layer).
As can be seen from fig. 15, the performance advantage of the lock delivery method provided by this embodiment is obvious compared with the lock delivery method HMCS lock in the related art.
On the basis of the embodiments corresponding to fig. 1 to fig. 15, in order to better implement the above-mentioned scheme of the embodiments of the present application, the following also provides related equipment for implementing the above-mentioned scheme.
Specifically, referring to fig. 16, fig. 16 is a schematic structural diagram of a lock passing apparatus 1600 according to an embodiment of the present application, where the lock passing apparatus 1600 is applied to a non-uniform memory access NUMA architecture, where the NUMA architecture is a tree structure including a plurality of hierarchies, and each hierarchy in the tree structure includes one or more nodes. The lock transfer device 1600 includes: an obtaining unit 1601, configured to obtain a lock release request from a first thread, where the lock release request is used to request release of a plurality of locks that control access to a shared resource, and each lock in the plurality of locks corresponds to a node in each hierarchy in the NUMA architecture, where locks corresponding to two nodes adjacent to each hierarchy in the plurality of locks have a holding relationship therebetween, and a lock corresponding to a node in a lower hierarchy holds a lock corresponding to a node in an upper hierarchy; a processing unit 1602, configured to release the first lock according to the lock release request and reserve holding relationships among the locks to transmit the locks to a thread waiting to preempt the first lock if there is a thread waiting to preempt the first lock, where the first lock is a lock corresponding to a lowest-level node in the locks; the processing unit 1602, configured to release the first lock and a second lock held by the first lock according to the lock release request if there is no thread waiting to preempt the first lock, so as to transfer the second lock to a thread waiting to preempt the second lock; and the access condition of the shared resource is that a thread holds a lock corresponding to a node of each level under the NUMA architecture.
In a possible implementation manner, the processing unit 1602 is specifically configured to: releasing the first lock and a second lock held by the first lock according to the lock release request, and reserving holding relations among the rest locks in the plurality of locks so as to transfer the rest locks in the plurality of locks to a thread waiting for preempting the second lock; wherein the remaining ones of the plurality of locks are others of the plurality of locks other than the first lock.
In a possible implementation manner, the processing unit 1602 is specifically configured to: in the process of releasing the second lock held by the first lock, if there is no thread waiting to preempt the second lock, releasing the second lock and the locks held by the second lock, so as to transfer the locks other than the first lock and the second lock in the plurality of locks to the thread waiting to preempt the lock held by the second lock.
In a possible implementation manner, the processing unit 1602 is specifically configured to: if threads waiting for preempting a first lock exist and the number of times of releasing the first lock is smaller than a first threshold value, releasing the first lock according to the lock release request and reserving the holding relationship among the locks, wherein the number of times of releasing the first lock is used for indicating the number of times of transferring the first lock among the threads under the node corresponding to the first lock; and if the thread waiting for preempting the first lock does not exist, or the releasing times of the first lock is greater than or equal to the first threshold, releasing the first lock and a second lock held by the first lock according to the lock releasing request.
In a possible implementation manner, the processing unit 1602 is specifically configured to: if threads waiting for preemption of a first lock exist and the time for the first lock to hold the second lock is less than a second threshold value, releasing the first lock according to the lock release request and reserving the holding relationship among the locks, wherein the time for which the first lock holds the second lock is the time for which the first lock is passed between threads under the node corresponding to the first lock; and if the thread waiting for preempting the first lock does not exist, or the time for the first lock to hold the second lock is greater than or equal to the second threshold value, releasing the first lock and the second lock held by the first lock according to the lock release request.
In a possible implementation manner, the obtaining unit 1601 is further configured to obtain a lock application request from a second thread, where the lock application request is used to request to apply for a lock that controls access to the shared resource; the processing unit 1602, configured to determine, according to the lock application request, a plurality of target locks corresponding to the second thread, where the plurality of target locks respectively correspond to nodes of each level in the NUMA architecture, where nodes corresponding to adjacent locks in the plurality of target locks are located in adjacent levels, and nodes corresponding to adjacent locks in the plurality of target locks have a connection relationship; the processing unit 1602 is further configured to preempt the plurality of locks in sequence according to a sequence from a low level to a high level of a hierarchy where nodes corresponding to the locks are located, until the second thread preempts the plurality of target locks successfully.
In a possible implementation manner, the processing unit 1602 is specifically configured to: calling a first release function according to the lock release request, wherein the first release function is a lock release function corresponding to the first lock; and executing the first release function, and releasing the first lock and reserving the holding relation among the plurality of locks according to the condition that the number of threads waiting for preempting the first lock is not 0.
In a possible implementation manner, the processing unit 1602 is specifically configured to: calling a first release function according to the lock release request, wherein the first release function is a lock release function corresponding to the first lock; calling a second release function indicated in the first release function according to the condition that the number of threads waiting for preemption of the first lock is 0, wherein the second release function is a lock release function corresponding to the second lock; executing the second release function to release a second lock held by the first lock; executing the first release function to release the first lock.
In one possible implementation, the plurality of locks are all spin locks.
In one possible implementation, the plurality of locks are all locks of the same type; or, the plurality of locks comprise locks of different types.
In one possible implementation, the type of each lock in the plurality of locks is determined based on performance capabilities of a different lock in each tier of the NUMA architecture.
In one possible implementation, the plurality of levels in the NUMA architecture include a system level, a NUMA node level, a cache group level, and a physical core level; or, the plurality of levels in the NUMA architecture include a system level, a socket level, a NUMA node level, and a cache group level.
The lock delivery method provided by the embodiment of the present application may be specifically executed by a chip in an electronic device, where the chip includes: a processing unit, which may be, for example, a processor, and a communication unit, which may be, for example, an input/output interface, a pin or a circuit, etc. The processing unit may execute the computer executable instructions stored in the storage unit to cause the chip in the electronic device to perform the lock delivery method described in the embodiments of fig. 1 to 15. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, and the like, and the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM), and the like.
The present application further provides a computer-readable storage medium, with reference to fig. 17, and in some embodiments, the method disclosed in fig. 3 above may be embodied as computer program instructions encoded on the computer-readable storage medium in a machine-readable format or on other non-transitory media or articles of manufacture.
Fig. 17 schematically illustrates a conceptual partial view of an example computer-readable storage medium comprising a computer program for executing a computer process on a computing device, arranged in accordance with at least some embodiments presented herein.
In one embodiment, the computer-readable storage medium 1700 is provided using a signal bearing medium 1701. The signal bearing medium 1701 may include one or more program instructions 1702 that, when executed by one or more processors, may provide the functions or portions of the functions described above with respect to fig. 5. Thus, for example, referring to the embodiment illustrated in FIG. 5, one or more features of steps 501-502 may be undertaken by one or more instructions associated with the signal bearing medium 1701. Further, program instructions 1702 in FIG. 17 also describe example instructions.
In some examples, the signal bearing medium 1701 may include a computer readable medium 1703 such as, but not limited to, a hard disk drive, a Compact Disc (CD), a Digital Video Disc (DVD), a digital tape, a memory, a ROM or RAM, and so forth.
In some implementations, the signal bearing medium 1701 may include a computer recordable medium 1704 such as, but not limited to, a memory, a read/write (R/W) CD, a R/W DVD, and the like. In some implementations, the signal bearing medium 1701 may include a communication medium 1705, such as, but not limited to, a digital and/or analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communication link, a wireless communication link, etc.). Thus, for example, the signal bearing medium 1701 may be conveyed by a wireless form of the communication medium 1705 (e.g., a wireless communication medium conforming to the IEEE 802.11 standard or another transmission protocol).
The one or more program instructions 1702 may be, for example, computer-executable instructions or logic-implementing instructions. In some examples, a computing device of the computing device may be configured to provide various operations, functions, or actions in response to program instructions 1702 conveyed to the computing device by one or more of computer-readable media 1703, computer-recordable media 1704, and/or communication media 1705.
It should be understood that the arrangements described herein are for illustrative purposes only. Thus, those skilled in the art will appreciate that other arrangements and other elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used instead, and that some elements may be omitted altogether depending upon the desired results. In addition, many of the described elements are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, in any suitable combination and location.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: u disk, removable hard disk, read only memory, random access memory, magnetic or optical disk, etc. for storing program codes.

Claims (27)

1. A lock transmission method, applied to a non-uniform memory access (NUMA) architecture, wherein the NUMA architecture is a tree structure comprising a plurality of levels, and each level in the tree structure comprises one or more nodes, the method comprising:
acquiring a lock release request from a first thread, wherein the lock release request is used to request release of a plurality of locks that control access to a shared resource, each lock in the plurality of locks corresponds to a node at a respective level in the NUMA architecture, locks corresponding to two nodes in adjacent levels have a holding relationship between them, and the lock corresponding to the node in the lower level holds the lock corresponding to the node in the higher level;
if there is a thread waiting to preempt a first lock, releasing the first lock according to the lock release request and retaining the holding relationships among the plurality of locks, so as to transfer the plurality of locks to the thread waiting to preempt the first lock, wherein the first lock is the lock corresponding to the lowest-level node among the plurality of locks;
if there is no thread waiting to preempt the first lock, releasing the first lock and a second lock held by the first lock according to the lock release request, so as to transfer the second lock to a thread waiting to preempt the second lock;
wherein the access condition of the shared resource is that a thread holds a lock corresponding to a node of each level in the NUMA architecture.
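For orientation only (this sketch is not part of the claims and is not the patented implementation), the release branch described in claim 1 can be outlined with C11 atomics roughly as follows. The struct layout, the field names, and the passed hand-off flag are illustrative assumptions; a real implementation would also need to handle the race between checking for waiters and clearing the lock.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* One lock per node on the path from the lowest-level node to the
     * root; parent points to the lock of the next higher level.  The
     * held flag should be initialized with ATOMIC_FLAG_INIT.          */
    struct hlock {
        atomic_flag held;      /* set while a thread or chain holds it  */
        atomic_int  waiters;   /* threads spinning to preempt this lock */
        atomic_bool passed;    /* set when the chain above is handed over */
        struct hlock *parent;  /* the "second lock": next higher level  */
    };

    /* Release starting from the first (lowest-level) lock.  If a local
     * waiter exists, only the first lock is released and the holding
     * relationships above it are retained, so the whole chain passes to
     * a thread under the same node; otherwise the release recurses
     * upward to the second lock, matching claims 1 to 3.               */
    void hlock_release(struct hlock *l)
    {
        if (atomic_load(&l->waiters) > 0) {
            atomic_store(&l->passed, true);   /* hand the chain over    */
        } else if (l->parent != NULL) {
            hlock_release(l->parent);         /* release upward instead */
        }
        atomic_flag_clear(&l->held);          /* release this level     */
    }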
2. The method according to claim 1, wherein the releasing the first lock and a second lock held by the first lock according to the lock release request comprises:
releasing the first lock and the second lock held by the first lock according to the lock release request, and retaining the holding relationships among the remaining locks in the plurality of locks, so as to transfer the remaining locks to the thread waiting to preempt the second lock;
wherein the remaining locks are the locks in the plurality of locks other than the first lock.
3. The method according to claim 1 or 2, wherein, in the process of releasing the second lock held by the first lock, if there is no thread waiting to preempt the second lock, the second lock and the lock held by the second lock are released, so as to transfer the locks in the plurality of locks other than the first lock and the second lock to the thread waiting to preempt the lock held by the second lock.
4. The method according to any one of claims 1-3, wherein the releasing the first lock according to the lock release request and retaining the holding relationships among the plurality of locks if there is a thread waiting to preempt the first lock comprises:
if there is a thread waiting to preempt the first lock and the release count of the first lock is less than a first threshold, releasing the first lock according to the lock release request and retaining the holding relationships among the plurality of locks, wherein the release count of the first lock indicates the number of times the first lock has been transferred among the threads under the node corresponding to the first lock;
and the releasing the first lock and a second lock held by the first lock according to the lock release request if there is no thread waiting to preempt the first lock comprises:
if there is no thread waiting to preempt the first lock, or the release count of the first lock is greater than or equal to the first threshold, releasing the first lock and the second lock held by the first lock according to the lock release request.
5. The method according to any one of claims 1-3, wherein the releasing the first lock according to the lock release request and retaining the holding relationships among the plurality of locks if there is a thread waiting to preempt the first lock comprises:
if there is a thread waiting to preempt the first lock and the time for which the first lock has held the second lock is less than a second threshold, releasing the first lock according to the lock release request and retaining the holding relationships among the plurality of locks;
and the releasing the first lock and a second lock held by the first lock according to the lock release request if there is no thread waiting to preempt the first lock comprises:
if there is no thread waiting to preempt the first lock, or the time for which the first lock has held the second lock is greater than or equal to the second threshold, releasing the first lock and the second lock held by the first lock according to the lock release request.
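Claims 4 and 5 bound how long the chain may be kept under one node, by a release count and by the time the first lock has held the second lock. A hedged sketch of the combined check follows, reusing struct hlock from the sketch after claim 1; the constants, the release_count parameter, and the acquire_ns timestamp are assumptions for illustration, not values from the patent (clock_gettime is POSIX).

    #include <time.h>

    #define RELEASE_LIMIT 64               /* first threshold (assumed)  */
    #define HOLD_NS_LIMIT (2 * 1000000L)   /* second threshold (assumed) */

    static long now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long)ts.tv_sec * 1000000000L + ts.tv_nsec;
    }

    /* Keep the chain under the current node only while a waiter exists
     * and neither fairness bound is exceeded; otherwise the caller
     * should also release the second lock, as in claims 4 and 5.      */
    static bool keep_chain_local(struct hlock *first,
                                 int release_count, long acquire_ns)
    {
        return atomic_load(&first->waiters) > 0
            && release_count < RELEASE_LIMIT
            && now_ns() - acquire_ns < HOLD_NS_LIMIT;
    }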
6. The method according to any one of claims 1-5, further comprising:
acquiring a lock application request from a second thread, wherein the lock application request is used to request the locks that control access to the shared resource;
determining, according to the lock application request, a plurality of target locks corresponding to the second thread, wherein the plurality of target locks respectively correspond to nodes of each level in the NUMA architecture, nodes corresponding to adjacent locks among the plurality of target locks are located in adjacent levels, and there is a connection relationship between the nodes corresponding to the adjacent locks;
and preempting the plurality of target locks in sequence, from the lowest level to the highest level of the nodes corresponding to the locks, until the second thread successfully preempts all of the plurality of target locks.
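The acquisition order of claim 6, preempting one lock per level from the lowest node upward, might look like the following sketch. The passed flag consumed here is the illustrative hand-off flag from the sketch after claim 1: it lets a thread inherit the still-held higher-level locks instead of spinning on them.

    /* Preempt the lock of each level in order, lowest level first.
     * If a lock was handed over together with the chain above it, the
     * higher-level locks are inherited rather than re-acquired.       */
    void hlock_acquire(struct hlock *first)
    {
        for (struct hlock *l = first; l != NULL; l = l->parent) {
            atomic_fetch_add(&l->waiters, 1);
            while (atomic_flag_test_and_set(&l->held))
                ;                          /* spin to preempt this level */
            atomic_fetch_sub(&l->waiters, 1);
            if (atomic_exchange(&l->passed, false))
                break;  /* chain handed over: higher locks already held */
        }
        /* On return the thread satisfies the access condition of claim 1:
         * it holds, or has inherited, one lock per level.             */
    }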
7. The method according to any one of claims 1-3, wherein the releasing the first lock according to the lock release request and retaining the holding relationships among the plurality of locks if there is a thread waiting to preempt the first lock comprises:
calling a first release function according to the lock release request, wherein the first release function is the lock release function corresponding to the first lock;
and executing the first release function, and, on the condition that the number of threads waiting to preempt the first lock is not 0, releasing the first lock and retaining the holding relationships among the plurality of locks.
8. The method according to any one of claims 1-3, wherein the releasing the first lock and a second lock held by the first lock according to the lock release request if there is no thread waiting to preempt the first lock comprises:
calling a first release function according to the lock release request, wherein the first release function is the lock release function corresponding to the first lock;
calling a second release function indicated in the first release function on the condition that the number of threads waiting to preempt the first lock is 0, wherein the second release function is the lock release function corresponding to the second lock;
executing the second release function to release the second lock held by the first lock;
and executing the first release function to release the first lock.
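Claims 7 and 8 describe the same release path as a pair of per-level release functions, where the first lock's release function invokes the second lock's release function when no waiter remains. A minimal sketch of that call structure with illustrative names follows; it restates the recursive hlock_release above as two explicit functions.

    /* Second release function: releases the second (higher-level) lock. */
    static void second_release(struct hlock *second)
    {
        atomic_flag_clear(&second->held);
    }

    /* First release function, per claims 7 and 8: if no thread waits to
     * preempt the first lock, call the second release function first and
     * then release the first lock; otherwise release only the first lock
     * and keep the chain intact for the local waiter.                  */
    static void first_release(struct hlock *first)
    {
        if (atomic_load(&first->waiters) > 0) {
            atomic_store(&first->passed, true);  /* keep chain local    */
        } else if (first->parent != NULL) {
            second_release(first->parent);       /* release second lock */
        }
        atomic_flag_clear(&first->held);         /* then the first lock */
    }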
9. The method of any one of claims 1-8, wherein the plurality of locks are all spin locks.
10. The method according to any one of claims 1-9, wherein the plurality of locks are all of the same type; or the plurality of locks comprise locks of different types.
11. The method according to claim 10, wherein the type of each lock in the plurality of locks is determined according to the performance of different types of locks at each level of the NUMA architecture.
12. The method according to any one of claims 1-11, wherein the plurality of levels in the NUMA architecture comprise a system level, a NUMA node level, a cache bank level, and a physical core level; or the plurality of levels in the NUMA architecture comprise a system level, a socket level, a NUMA node level, and a cache group level.
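One illustrative way to encode the two level sets named in claim 12; the enum and the two arrays (ordered lowest level first, i.e. in the acquisition order of claim 6) are assumptions for the sketch only.

    /* A concrete system would pick one variant and allocate one
     * struct hlock per node of each level in that variant.            */
    enum numa_level {
        LVL_SYSTEM,         /* root: the whole machine                 */
        LVL_SOCKET,         /* second variant only                     */
        LVL_NUMA_NODE,      /* one per NUMA domain                     */
        LVL_CACHE_BANK,     /* first variant only                      */
        LVL_CACHE_GROUP,    /* second variant only                     */
        LVL_PHYSICAL_CORE   /* first variant only                      */
    };

    static const enum numa_level variant_a[] =
        { LVL_PHYSICAL_CORE, LVL_CACHE_BANK, LVL_NUMA_NODE, LVL_SYSTEM };
    static const enum numa_level variant_b[] =
        { LVL_CACHE_GROUP, LVL_NUMA_NODE, LVL_SOCKET, LVL_SYSTEM };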
13. A lock transmission apparatus, applied to a non-uniform memory access (NUMA) architecture, wherein the NUMA architecture is a tree structure comprising a plurality of levels, and each level in the tree structure comprises one or more nodes, the apparatus comprising:
an obtaining unit, configured to acquire a lock release request from a first thread, wherein the lock release request is used to request release of a plurality of locks that control access to a shared resource, each lock in the plurality of locks corresponds to a node at a respective level in the NUMA architecture, locks corresponding to two nodes in adjacent levels have a holding relationship between them, and the lock corresponding to the node in the lower level holds the lock corresponding to the node in the higher level;
a processing unit, configured to: if there is a thread waiting to preempt a first lock, release the first lock according to the lock release request and retain the holding relationships among the plurality of locks, so as to transfer the plurality of locks to the thread waiting to preempt the first lock, wherein the first lock is the lock corresponding to the lowest-level node among the plurality of locks;
the processing unit being further configured to: if there is no thread waiting to preempt the first lock, release the first lock and a second lock held by the first lock according to the lock release request, so as to transfer the second lock to a thread waiting to preempt the second lock;
wherein the access condition of the shared resource is that a thread holds a lock corresponding to a node of each level in the NUMA architecture.
14. The apparatus according to claim 13, wherein the processing unit is specifically configured to:
release the first lock and the second lock held by the first lock according to the lock release request, and retain the holding relationships among the remaining locks in the plurality of locks, so as to transfer the remaining locks to the thread waiting to preempt the second lock;
wherein the remaining locks are the locks in the plurality of locks other than the first lock.
15. The apparatus according to claim 13 or 14, wherein the processing unit is specifically configured to:
in the process of releasing the second lock held by the first lock, if there is no thread waiting to preempt the second lock, release the second lock and the lock held by the second lock, so as to transfer the locks in the plurality of locks other than the first lock and the second lock to the thread waiting to preempt the lock held by the second lock.
16. The apparatus according to any one of claims 13-15, wherein the processing unit is specifically configured to:
if there is a thread waiting to preempt the first lock and the release count of the first lock is less than a first threshold, release the first lock according to the lock release request and retain the holding relationships among the plurality of locks, wherein the release count of the first lock indicates the number of times the first lock has been transferred among the threads under the node corresponding to the first lock;
and if there is no thread waiting to preempt the first lock, or the release count of the first lock is greater than or equal to the first threshold, release the first lock and the second lock held by the first lock according to the lock release request.
17. The apparatus according to any one of claims 13-15, wherein the processing unit is specifically configured to:
if there is a thread waiting to preempt the first lock and the time for which the first lock has held the second lock is less than a second threshold, release the first lock according to the lock release request and retain the holding relationships among the plurality of locks;
and if there is no thread waiting to preempt the first lock, or the time for which the first lock has held the second lock is greater than or equal to the second threshold, release the first lock and the second lock held by the first lock according to the lock release request.
18. The apparatus according to any one of claims 13-17, wherein:
the obtaining unit is further configured to acquire a lock application request from a second thread, wherein the lock application request is used to request the locks that control access to the shared resource;
the processing unit is further configured to determine, according to the lock application request, a plurality of target locks corresponding to the second thread, wherein the plurality of target locks respectively correspond to nodes of each level in the NUMA architecture, nodes corresponding to adjacent locks among the plurality of target locks are located in adjacent levels, and there is a connection relationship between the nodes corresponding to the adjacent locks;
and the processing unit is further configured to preempt the plurality of target locks in sequence, from the lowest level to the highest level of the nodes corresponding to the locks, until the second thread successfully preempts all of the plurality of target locks.
19. The apparatus according to any one of claims 13-15, wherein the processing unit is specifically configured to:
call a first release function according to the lock release request, wherein the first release function is the lock release function corresponding to the first lock;
and execute the first release function, and, on the condition that the number of threads waiting to preempt the first lock is not 0, release the first lock and retain the holding relationships among the plurality of locks.
20. The apparatus according to any one of claims 13-15, wherein the processing unit is specifically configured to:
call a first release function according to the lock release request, wherein the first release function is the lock release function corresponding to the first lock;
call a second release function indicated in the first release function when the number of threads waiting to preempt the first lock is 0, wherein the second release function is the lock release function corresponding to the second lock;
execute the second release function to release the second lock held by the first lock;
and execute the first release function to release the first lock.
21. The apparatus according to any one of claims 13-20, wherein the plurality of locks are all spin locks.
22. The apparatus according to any one of claims 13-21, wherein the plurality of locks are all of the same type; or the plurality of locks comprise locks of different types.
23. The apparatus according to claim 22, wherein the type of each lock in the plurality of locks is determined according to the performance of different types of locks at each level of the NUMA architecture.
24. The apparatus according to any one of claims 13-23, wherein the plurality of levels in the NUMA architecture comprise a system level, a NUMA node level, a cache bank level, and a physical core level; or the plurality of levels in the NUMA architecture comprise a system level, a socket level, a NUMA node level, and a cache group level.
25. An electronic device, comprising a memory and a processor, wherein the memory stores code and the processor is configured to execute the code; when the code is executed, the electronic device performs the method according to any one of claims 1-12.
26. A computer-readable storage medium comprising computer-readable instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1-12.
27. A computer program product comprising computer-readable instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1-12.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111136893.9A CN115878335A (en) 2021-09-27 2021-09-27 Lock transmission method and related device

Publications (1)

Publication Number Publication Date
CN115878335A (en) 2023-03-31

Family

ID=85763010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111136893.9A Pending CN115878335A (en) 2021-09-27 2021-09-27 Lock transmission method and related device

Country Status (1)

Country Link
CN (1) CN115878335A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020016879A1 (en) * 2000-07-26 2002-02-07 Miller Chris D. Resource locking and thread synchronization in a multiprocessor environment
US20110238949A1 (en) * 2010-03-29 2011-09-29 International Business Machines Corporation Distributed Administration Of A Lock For An Operational Group Of Compute Nodes In A Hierarchical Tree Structured Network
CN104834505A (en) * 2015-05-13 2015-08-12 华中科技大学 Synchronization method for NUMA (Non Uniform Memory Access) sensing under multi-core and multi-thread environment
CN106844041A (en) * 2016-12-29 2017-06-13 华为技术有限公司 The method and internal storage management system of memory management
CN111277616A (en) * 2018-12-04 2020-06-12 中兴通讯股份有限公司 RDMA (remote direct memory Access) -based data transmission method and distributed shared memory system
CN112306703A (en) * 2019-07-29 2021-02-02 华为技术有限公司 Critical region execution method and device in NUMA system
CN112486695A (en) * 2020-12-07 2021-03-12 浪潮云信息技术股份公司 Distributed lock implementation method under high concurrency service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination