WO2023066141A1 - Method, apparatus and device for acquiring lock resources - Google Patents

Method, apparatus and device for acquiring lock resources

Info

Publication number
WO2023066141A1
WO2023066141A1 (PCT/CN2022/125241, CN2022125241W)
Authority
WO
WIPO (PCT)
Prior art keywords
core
duration
thread
code segment
waiting queue
Prior art date
Application number
PCT/CN2022/125241
Other languages
English (en)
French (fr)
Inventor
刘年
吴明瑜
陈海波
郭寒军
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023066141A1


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 — Arrangements for program control, e.g. control units
    • G06F9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 — Multiprogramming arrangements
    • G06F9/48 — Program initiating; program switching, e.g. by interrupt
    • G06F9/52 — Program synchronisation; mutual exclusion, e.g. by means of semaphores

Definitions

  • the embodiments of the present application relate to the field of computer technology, and in particular to a method, device and equipment for acquiring lock resources.
  • a mutex is a commonly used synchronization primitive. It provides exclusive-access guarantees for multi-threaded applications. When multiple threads need to modify shared data (such as the same entry in a database), these threads compete with each other to obtain the mutex; at any given time, only the one thread that has acquired the mutex can modify the shared data. For a mutex, if it is acquired by one thread, the other threads competing for it can only wait.
  • one common way of waiting is busy-waiting (spinning); a mutex acquired in this way is also called a spin lock.
  • to address the unfairness of spin locks, queue locks were introduced.
  • the queue lock maintains a first-in-first-out waiting queue.
  • Each thread that wants to acquire a mutex needs to join the waiting queue first, and the thread that joins the waiting queue first acquires the mutex first.
  • the heterogeneous multi-core processor can also be called an asymmetric multi-core processor, which contains multiple cores of different sizes.
  • a big core usually refers to a core with better performance, while a small core usually refers to a core with poorer performance.
  • suppose the thread running under the small core needs to acquire the mutex; if, afterwards, the thread running under the big core also needs to acquire the mutex, it must likewise join the waiting queue; since the thread running under the small core joined the waiting queue before the thread running under the big core, the thread running under the small core acquires the mutex first.
  • the thread running under the small core takes a long time to execute the critical section (the code region protected by the mutex) after acquiring the mutex, which lengthens the waiting time of the thread running under the big core and thus decreases the throughput of the processor.
  • the embodiment of the present application provides a method, apparatus, and device for acquiring lock resources.
  • the method enables the second thread, running under the second core with high priority, to join the waiting queue before the first thread running on the first core with low priority, so that the second thread acquires the lock resource first; this reduces the waiting time of the second thread and improves the throughput rate of the processor.
  • the embodiment of the present application provides a method for acquiring lock resources, which is applied to a computer device.
  • the processor of the computer device includes a first core and a second core, and the first core and the second core correspond to different priorities. Based on this, the method includes: running the first code segment through the first thread running on the first core to perform the following first operation, where the first code segment refers to a segment of code in the application program: determining that the priority of the first core is lower than the priority of the second core.
  • for example, the first core corresponds to a low-priority identifier and the second core corresponds to a high-priority identifier, so that the first thread can determine, based on the priority identifier corresponding to the first core, that the priority of the first core is lower than that of the second core; when the waiting queue is not empty, the first thread delays joining the waiting queue, so that the second thread running under the second core joins the waiting queue before the first thread, where the waiting queue is used to compete for lock resources.
  • the embodiment of the present application does not specifically limit the length of the delay.
  • the first thread running on the first core delays joining the waiting queue so that the second thread running under the second core joins the waiting queue before the first thread; in this way, the second thread can acquire the lock resource first and thus execute the critical section first.
  • in this way, threads running under high-priority cores acquire the lock resource first, so they execute the critical section before threads running under low-priority cores; this reduces the waiting time of threads under high-priority cores and increases the number of times they execute the critical section. Because a high-priority core is usually a core with better performance, a thread running under it executes the critical section in less time, while a thread running under a core with poorer performance takes longer; therefore, within a fixed period of time, the total number of critical-section executions can be increased, thereby improving the throughput.
  • in some embodiments, determining that the priority of the first core is lower than that of the second core includes: determining, based on the performance of the second core being better than that of the first core, that the priority of the first core is lower than the priority of the second core, where the performance of a core can be determined by multiple parameters, for example frequency and cache size.
  • the embodiment of the present application can thus reduce the waiting time of threads running under high-performance cores and increase the number of times they execute the critical section; since a thread running under a better-performing core executes the critical section in less time while a thread on a poorer-performing core takes longer, the total number of critical-section executions within a fixed period of time can be increased, improving throughput.
  • delaying joining the waiting queue includes: when the waiting queue is not empty, the first thread enters a backoff state, the backoff state being a state of waiting to join the waiting queue; once the time spent in the backoff state is greater than or equal to the out-of-order duration, the first thread joins the waiting queue.
  • because the first thread joins the waiting queue once its waiting time reaches the out-of-order duration, the first thread is prevented from waiting so long that the normal operation of the first code segment is affected.
  • before joining the waiting queue, the first operation also includes: obtaining the out-of-order duration.
  • there are many methods for determining the out-of-order duration, which are not specifically limited in this embodiment of the present application; for example, the out-of-order duration can be obtained based on whether the first code segment has a delay requirement.
  • the out-of-order duration is acquired before judging whether the time spent in the backoff state is greater than or equal to the out-of-order duration.
  • in some embodiments, obtaining the out-of-order duration includes: based on the delay requirement of the first code segment, obtaining the first duration corresponding to the first code segment as the out-of-order duration, where the first duration is obtained based on the delay requirement.
  • the first duration can be estimated based on the target delay specified by the delay requirement, and then the first duration can be adjusted later.
  • the first duration is usually less than the target delay.
  • the first duration is determined based on the delay requirement and used as the out-of-order duration, which prevents the out-of-order duration from making the running time of the first code segment too long to meet the delay requirement.
  • the method further includes: before running the first code segment, performing the following second operation through the first thread: setting the value of a global variable to the first identifier of the first code segment, where the global variable indicates a code segment with a delay requirement. Correspondingly, before obtaining the first duration corresponding to the first code segment as the out-of-order duration based on the delay requirement, the first operation also includes: determining, based on the value of the global variable being the first identifier, that the first code segment has a delay requirement.
  • the first code segment, which has a delay requirement, is marked by introducing a global variable whose value is set to the first identifier before the first code segment runs; in this way, the first thread can determine from the value of the global variable that the first code segment has a delay requirement, and this way of marking the first code segment with a global variable is simple and easy to implement.
  • the method further includes: after running the first code segment, performing the following third operation through the first thread: setting the value of the global variable to the second identifier, where the second identifier does not identify any code segment.
  • since the first thread can execute multiple code segments, if the value of the global variable is not reset to the second identifier, the out-of-order duration may be calculated incorrectly; for example, if the code segment after the first code segment has no delay requirement but the global variable still holds the first identifier, then while running that next code segment the first thread would mistakenly use the first duration corresponding to the first code segment as the out-of-order duration; therefore, setting the value of the global variable to the second identifier prevents incorrect calculation of the out-of-order duration.
  • the method further includes: after running the first code segment, performing the following third operation through the first thread: obtaining the actual running duration of the first code segment. Specifically, a timestamp can be recorded when the first code segment starts running and another when it finishes running; the actual running duration of the first code segment is calculated from these two timestamps. Based on the relative size of the actual running duration and the target delay specified by the delay requirement, the first duration can be adjusted, and there are many ways to adjust it.
  • since the first duration is obtained based on the delay requirement, it may not be accurate enough: if the first duration is too short, the actual running duration of the first code segment may be far less than the target delay; if it is too long, the actual running duration may far exceed the target delay. Therefore, adjusting the first duration based on the relative size of the actual running duration and the target delay makes it possible to meet the delay requirement of the first code segment while still delaying the first thread's joining.
  • adjusting the first duration based on the relative size of the actual running duration and the target delay includes: shortening the first duration when the actual running duration is greater than the target delay, where the amount of shortening can be adjusted as needed.
  • shortening the first duration can make the actual running duration of the first code segment less than the target delay, so that the delay requirement of the first code segment is met while the first thread's joining is still delayed.
  • adjusting the first duration based on the relative size between the actual running duration and the target delay includes: extending the first duration based on the fact that the actual running duration is less than the target delay.
  • when the actual running duration is less than the target delay, the first duration is extended; in this way, while still meeting the delay requirement of the first code segment, the time the first thread spends in the backoff state is extended as much as possible, so that the second thread joins the waiting queue first and throughput is improved.
  • in some embodiments, obtaining the out-of-order duration includes: obtaining the second duration as the out-of-order duration when the first code segment does not have a delay requirement.
  • the second duration is usually longer; obtaining the second duration as the out-of-order duration allows the first thread to delay joining the waiting queue for longer.
  • correspondingly, the first operation further includes: determining, based on the value of the global variable being the second identifier, that the first code segment does not have a delay requirement, where the global variable indicates a code segment with a delay requirement and the second identifier does not identify any code segment; that the second identifier identifies no code segment can also be understood as meaning that no code segment currently has a delay requirement.
  • the first code segment with a delay requirement is marked by introducing a global variable, so that it can be determined that the first code segment does not have a delay requirement based on the value of the global variable as the second identifier; This method of marking the first code segment with a global variable is simple and easy to implement.
  • delaying joining the waiting queue also includes: detecting the waiting queue while the time spent in the backoff state is less than the out-of-order duration, and joining the waiting queue when it is found to be empty. Detecting the waiting queue during backoff and joining it as soon as it is empty prevents the situation in which the waiting queue is empty but the first thread is still backing off, thereby maximizing the use of the lock resource and improving throughput.
  • the first operation further includes: if the waiting queue is empty, joining the waiting queue.
  • joining immediately in this case likewise prevents the waiting queue from being empty while the first thread is still in the backoff state, thereby maximizing the use of the lock resource and improving throughput.
  • the second aspect of the embodiment of the present application provides an apparatus for acquiring lock resources, applied to a computer device whose processor includes a first core and a second core. The apparatus includes: a determination unit, configured to determine that the priority of the first core is lower than the priority of the second core; and a queue-joining unit, configured to delay joining the waiting queue when the waiting queue is not empty, so that the second thread running under the second core joins the waiting queue before the first thread, where the waiting queue is used to compete for lock resources.
  • the determining unit is configured to determine that the priority of the first core is lower than the priority of the second core based on the performance of the second core being better than the performance of the first core.
  • the queue-joining unit is configured to have the first thread enter the backoff state when the waiting queue is not empty, the backoff state being a state of waiting to join the waiting queue, and to join the waiting queue once the time spent in the backoff state is greater than or equal to the out-of-order duration.
  • the queue join unit is used to obtain the out-of-order duration.
  • the queue-joining unit is configured to obtain the first duration corresponding to the first code segment as the out-of-order duration based on the delay requirement of the first code segment, where the first duration is obtained based on the delay requirement.
  • the apparatus also includes: a first setting unit, configured to set the value of the global variable to the first identifier of the first code segment, where the global variable indicates a code segment with a delay requirement; the queue-joining unit is further configured to determine, based on the value of the global variable being the first identifier, that the first code segment has a delay requirement.
  • the device further includes: a second setting unit, configured to set the value of the global variable as a second identifier, and the second identifier does not identify any code segment.
  • the device further includes: an adjustment unit, configured to obtain the actual running duration of the first code segment; based on the relative size of the actual running duration and the target delay specified by the delay requirement, adjust the first duration.
  • the adjustment unit is configured to shorten the first duration based on the fact that the actual running duration is greater than the target latency.
  • the adjustment unit is configured to extend the first duration based on the fact that the actual running duration is less than the target delay.
  • the queue-joining unit is configured to obtain the second duration as the out-of-order duration when the first code segment does not have a delay requirement.
  • the queue-joining unit is configured to determine, based on the value of the global variable being the second identifier, that the first code segment does not have a delay requirement, where the global variable indicates a code segment with a delay requirement and the second identifier does not identify any code segment.
  • the queue-joining unit is configured to detect the waiting queue while the time spent in the backoff state is less than the out-of-order duration, and to join the waiting queue when the waiting queue is empty.
  • the queue joining unit is configured to join the waiting queue when the waiting queue is empty.
  • the third aspect of the embodiment of the present application provides a computer device, which includes a memory and a processor; the memory is used to store computer-readable instructions (also called a computer program), and the processor is used to read the computer-readable instructions to implement the method provided by any of the foregoing implementation manners.
  • the fourth aspect of the embodiments of the present application provides a computer program product containing instructions which, when run on a computer, enable the computer to execute the method described in any of the foregoing aspects and any of their possible implementation manners.
  • the fifth aspect of the embodiment of the present application provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to execute the method described in the above first aspect and any of its possible implementation manners.
  • a sixth aspect of the embodiment of the present application provides a chip including one or more processors; some or all of the processors are used to read and execute the computer program stored in the memory, so as to execute the method in any possible implementation manner of the above first aspect.
  • optionally, the chip includes a memory, and the processor is connected to the memory through a circuit or wires. Further optionally, the chip further includes a communication interface, and the processor is connected to the communication interface.
  • the communication interface is used to receive data and/or information to be processed, and the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing result through the communication interface.
  • the communication interface may be an input-output interface.
  • some of the one or more processors may implement some of the steps in the above method through dedicated hardware; for example, processing related to a neural network model may be performed by a dedicated neural network processor or a graphics processor.
  • the method provided in the embodiment of the present application may be implemented by one chip, or may be implemented by multiple chips in cooperation.
  • FIG. 1 is a schematic diagram of the core of a computer device provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a first embodiment of a method for acquiring a lock resource provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a second embodiment of a method for acquiring a lock resource provided by an embodiment of the present application.
  • FIG. 4 is a schematic flow diagram of an embodiment of obtaining the out-of-order duration in the embodiment of the present application.
  • FIG. 5 is a schematic diagram of a third embodiment of a method for acquiring lock resources provided by an embodiment of the present application.
  • FIG. 6 is a schematic flow diagram of another embodiment of obtaining the out-of-order duration in the embodiment of the present application.
  • FIG. 7 is a schematic diagram of the process by which the first thread joins the waiting queue in the embodiment of the present application.
  • FIG. 8 is a schematic diagram of adding an out-of-order lock to an application program in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the first thread joining the waiting queue in the embodiment of the present application.
  • FIG. 10 is a schematic diagram comparing the throughput rate of the target library with the throughput rate when using other locks.
  • FIG. 11 is a schematic diagram of changes in the delay of epochs in the embodiment of the present application.
  • FIG. 12 is a schematic diagram of a device for acquiring lock resources provided by the embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • in this application, "plural" means two or more.
  • the term "and/or" or the character "/" in this application merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" or "A/B" may indicate: A exists alone, both A and B exist, or B exists alone.
  • the embodiment of the present application can be applied to the computer device shown in FIG. 1, whose processor includes multiple cores of different sizes; a big core can refer to a core with better performance, and a small core can refer to a core with poorer performance.
  • FIG. 1 shows 4 cores, and these 4 cores include 1 core A, 2 cores B and 1 core C.
  • the size of a circle indicates the size of a core; from Figure 1 it can be seen that core A is larger than core B, meaning the performance of core A is better than that of core B; similarly, core B is larger than core C, meaning the performance of core B is better than that of core C.
  • relative to core A in Figure 1, core B can be regarded as a small core; relative to core B in Figure 1, core C can also be regarded as a small core.
  • the spin lock will lead to the problem of unfairness among threads, and although the queue lock can ensure the fairness among threads, it will also lead to a decrease in throughput.
  • assume that the lock is occupied by a certain thread and that, within a period of time before the lock is released, thread 1, thread 2, thread 3, thread 4, thread 5, and thread 6 request the lock in chronological order, with thread 1 and thread 3 running under core B, and thread 2, thread 4, thread 5, and thread 6 running under core A.
  • if the occupied lock is a spin lock, thread 1, thread 2, thread 3, thread 4, thread 5, and thread 6 will compete for the lock at the same time through atomic operations; specifically, each thread will try to set the lock variable to a characteristic value (for example, 1), and the thread that succeeds in setting it acquires the spin lock.
  • the atomic operation refers to the operation that will not be interrupted by the thread scheduling mechanism.
  • thread 2, thread 4, thread 5, thread 6, and thread 1 successively acquire the spin lock, but, time being limited, thread 3 does not acquire the spin lock.
  • if the occupied lock is a queue lock, then since the order of requesting the lock is thread 1, thread 2, thread 3, thread 4, thread 5, thread 6, the order of joining the waiting queue is also thread 1, thread 2, thread 3, thread 4, thread 5, thread 6.
  • because thread 1 and thread 3 run under the small core, they take a long time to execute the critical section; with limited time, the final result of the competition is: thread 1, thread 2, thread 3, and thread 4 acquire the queue lock in turn, while thread 5 and thread 6 do not acquire the queue lock within this limited period of time.
  • thus, when the occupied lock is a queue lock, 4 threads each execute the critical section once; when the occupied lock is a spin lock, 5 threads each execute the critical section once. Therefore, the number of critical-section executions under the queue lock is less than under the spin lock, that is, the queue lock leads to a decrease in the throughput rate, where the throughput rate can be understood as the amount of service provided per unit time.
  • the queue lock mentioned above can also be called a queuing spin lock (FIFO ticket spinlock), or FIFO lock for short; a FIFO lock can be understood as a new type of spin lock.
  • in view of this, the embodiment of the present application provides a method for acquiring lock resources that reuses the existing FIFO lock but, before a thread joins the waiting queue, first judges the priority of the core the thread runs on; when that priority is low, the thread delays joining the waiting queue. In this way, threads running under high-priority cores can join the waiting queue first; therefore, even a thread running under a high-priority core that requests the lock resource later than a thread running under a low-priority core can obtain the lock resource first. The lock resource may also be referred to simply as a lock.
  • the embodiment of the present application can reduce the time for threads running under the core with high priority to acquire lock resources.
  • since a high-priority core is a core with better performance, threads running under it can preferentially acquire lock resources and execute critical sections, and they execute critical sections in less time; thus the number of critical-section executions within a given period of time can be increased, thereby improving throughput.
  • the embodiment of the present application provides a first embodiment of a method for acquiring lock resources, applied to a computer device whose processor includes multiple cores; the specific number of cores is not limited in this embodiment of the present application; for example, the number of cores may be 2, 3, or more than 3.
  • Multiple cores correspond to multiple priorities, and there are various methods for dividing priorities, which are not specifically limited in this embodiment of the present application; usually, multiple cores can be prioritized based on the performance of the cores.
  • for example, the four cores in Figure 1 can be divided into three priority levels based on the performance of the cores: specifically, core A corresponds to the first priority, core B corresponds to the second priority, and core C corresponds to the third priority.
  • the 4 cores can also be divided into 2 priorities based on the performance of the cores: specifically, core A corresponds to the first priority and core B and core C both correspond to the second priority; or, core A and core B correspond to the first priority and core C corresponds to the second priority.
  • for ease of description, the following takes the case where multiple cores are divided into two priorities as an example.
  • the method includes:
  • the first code segment is run by the first thread running on the first core to perform the first operation, wherein the first code segment refers to a segment of code in the application program, and the segment of code can specifically be used to process a certain request.
  • multiple code segments can be run by the first thread, and the first code segment can be any one of the multiple code segments.
  • the first operation includes:
  • Step 101: determine that the priority of the first core is lower than the priority of the second core.
  • specifically, step 101 may include: determining, based on the performance of the second core being better than the performance of the first core, that the priority of the first core is lower than that of the second core.
  • specifically, the first core and the second core can be assigned priorities in advance, and different priority identifiers can be set for the first core and the second core based on those priorities; the first thread then determines, from the priority identifier corresponding to the first core, that the priority of the first core is lower than the priority of the second core.
  • the priority identification of the first core is AA
  • the priority identification of the second core is BB
  • the priority indicated by AA is lower than the priority indicated by BB
  • the first thread is determining the priority of the first core
  • it is AA instead of BB, it can be determined that the priority of the first core is lower than that of the second core.
  • Step 102 if the waiting queue is not empty, delay joining the waiting queue, so that the second thread running under the second core joins the waiting queue before the first thread, and the waiting queue is used to compete for lock resources.
  • Delayed joining of the waiting queue can be understood as follows: when it is determined that the waiting queue is not empty, the first thread does not immediately join the waiting queue, but waits for a period of time before joining; during this waiting period, the second thread can join the waiting queue first.
  • Step 103 if the waiting queue is empty, join the waiting queue.
  • step 103 can improve the throughput rate.
  • In the embodiment of the present application, the first thread running on the first core delays joining the waiting queue so that the second thread running under the second core joins the waiting queue before the first thread; in this way, the second thread can preferentially acquire the lock resource and preferentially execute the critical section.
  • As a result, a thread running under a core with high priority can execute the critical section before a thread running under a core with low priority, which reduces the waiting time of the threads under high-priority cores and increases the number of times they execute the critical section.
  • Because a core with high priority is usually a core with better performance, its threads execute the critical section in a short time, while threads running under cores with poorer performance take longer; therefore, increasing the number of critical-section executions by threads under high-priority cores increases the total number of critical-section executions within a fixed time, thereby improving the throughput.
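The throughput argument above can be illustrated with simple arithmetic. The per-core costs and the time budget below are made-up numbers for illustration, not measurements from this document.

```c
/* If a big-core thread completes a critical section in big_cost time units
 * and a little-core thread needs little_cost, then giving the big core a
 * larger share of a fixed time budget increases the total number of
 * critical sections completed. */
unsigned sections_completed(unsigned budget, unsigned big_cost,
                            unsigned little_cost, unsigned big_share_pct) {
    unsigned big_time    = budget * big_share_pct / 100; /* time on big core */
    unsigned little_time = budget - big_time;            /* time on little core */
    return big_time / big_cost + little_time / little_cost;
}
```

With a big-core cost of 1 and a little-core cost of 4 per critical section, giving the big core 80% of a 100-unit budget yields 85 executions, versus 40 when it gets only 20%.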
  • Based on the relevant description of step 102 above, when the waiting queue is not empty, the delay before joining the waiting queue can be set according to actual needs.
  • For example, developers may have delay requirements for the operation of some code segments; based on this, different delay durations can be set for a first code segment that has a delay requirement and a first code segment that does not.
  • this embodiment of the present application provides a second embodiment of a method for acquiring lock resources, which includes:
  • Step 201 set the value of the global variable as the first identifier of the first code segment.
  • The global variable identifies the code segment that currently has a delay requirement.
  • this embodiment introduces a global variable; when the value of the global variable is the first identifier of the first code segment, it indicates that the first code segment has a delay requirement.
  • each code segment can be run by the first thread.
  • each code segment corresponds to a globally unique identifier, so the first identifier is also globally unique.
  • The initial value of the global variable can be set, and the initial value usually does not identify any code segment; the specific initial value can be set according to actual needs. For example, the initial value can be -1; this initial value is also referred to as the second identifier hereinafter.
  • step 201 is optional.
  • step 201 may be performed before running the first code segment; in this embodiment, the operation performed before running the first code segment is called a second operation, and accordingly, step 201 is included in the second operation.
  • The first code segment is run by the first thread to perform the first operation (i.e. step 202 to step 207); at the same time, the time stamp at which the first code segment starts to run is recorded, which is used to calculate the actual running time of the first code segment.
  • Step 202 determining that the priority of the first core is lower than the priority of the second core.
  • Step 202 is similar to step 101, for details, please refer to the relevant description of step 101 to understand step 202.
  • Step 203 obtaining out-of-sequence duration.
  • the out-of-order duration refers to the duration during which the first thread and the second thread are allowed to join the waiting queue out of order, that is, the duration during which the first thread delays joining the queue, or the duration during which the first thread waits to join the queue.
  • the out-of-sequence duration can be determined based on whether the first code segment has a delay requirement, and in this embodiment, the first code segment has a delay requirement.
  • step 203 includes:
  • Step 301: determine that the first code segment has a delay requirement.
  • In one implementation, step 301 may include: determining that the first code segment has a delay requirement based on the value of the global variable being the first identifier.
  • Step 302 based on the delay requirement of the first code segment, obtain the first duration corresponding to the first code segment as the out-of-sequence duration, and the first duration is obtained based on the delay requirement.
  • the first duration may be estimated based on the target latency specified by the latency requirement, and then the first duration may be adjusted later.
  • the out-of-order duration refers to the duration of the first thread waiting to join the queue.
  • Besides waiting to join the queue, it also takes time for the first thread to execute the other parts of the code in the first code segment; therefore, the first duration is usually shorter than the target delay required by the delay requirement.
  • the first duration may be dynamically adjusted.
  • Step 203 is executed before step 205; the embodiment of the present application does not specifically limit the order of steps 203 and 204: step 203 can be executed first and then step 204, or step 204 can be executed first and then step 203.
  • Step 204 when the waiting queue is not empty, the first thread enters a back-off state, and the back-off state is a state of waiting to join the waiting queue.
  • a time stamp of entering the backoff state may be recorded.
  • Step 205 based on the duration of entering the backoff state is greater than or equal to the out-of-sequence duration, join the waiting queue.
  • In step 205, it is necessary to calculate the duration of being in the back-off state; specifically, the current time can be obtained continuously, and the duration can be calculated from the current time and the time stamp of entering the back-off state. If the duration of being in the back-off state is greater than or equal to the out-of-order duration, the first thread joins the waiting queue.
  • If the duration of being in the back-off state is less than the out-of-sequence duration, the above operation is repeated, that is, the current time is obtained again and the duration is recalculated, until the duration of being in the back-off state is greater than or equal to the out-of-order duration.
  • Step 206 based on the time length of entering the backoff state is less than the out-of-order time length, the waiting queue is detected.
  • While waiting, the waiting queue can also be checked, to avoid the situation where the duration of being in the backoff state is still less than the out-of-order duration but the waiting queue has become empty; in that case, keeping the first thread in the backoff state would waste the lock resource.
  • the number of detections of the waiting queue can be less than the number of times of calculating the duration of entering the backoff state, so as to avoid additional time delay caused by frequent detection operations.
  • A strategy of exponentially backed-off checks can be used: the waiting queue is checked only when the number of times the backoff duration has been calculated is a power of two and the backoff duration is still less than the out-of-order duration; when the count is not a power of two, the waiting queue is not checked.
  • For example, after the first, second, fourth, and eighth calculations of the backoff duration, if the duration is still less than the out-of-order duration, the waiting queue is checked, and so on; at other counts, the waiting queue is not checked.
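Assuming "exponential multiple" means a power of two, as the 1st/2nd/4th/8th pattern suggests, the check schedule can be sketched as a single predicate (the function name is illustrative):

```c
/* Probe the waiting queue only on iterations whose count is a power of two
 * (1, 2, 4, 8, ...), so probes become rarer the longer the thread stays in
 * the backoff state, avoiding the cost of frequent queue checks. */
int should_check_queue(unsigned long iteration) {
    return iteration != 0 && (iteration & (iteration - 1)) == 0;
}
```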
  • Step 207 if the waiting queue is empty, join the waiting queue.
  • Through step 207, it is possible to prevent the situation where the waiting queue is empty but the first thread is still in the backoff state, so as to maximize the use of lock resources and improve throughput.
  • steps 203 to 207 constitute a specific implementation of step 102 .
  • the third operation may also be performed by the first thread (ie step 208 to step 210).
  • step 208 the value of the global variable is set as the second identifier, and the second identifier does not identify any code segment.
  • If step 208 is not performed, the value of the global variable remains the first identifier; in that case, while the first thread runs the next code segment, the first duration corresponding to the first code segment would mistakenly be used as the out-of-order duration.
  • step 208 can prevent miscalculation of out-of-sequence durations.
  • step 208 is optional, and generally, when step 201 is executed, step 208 will be executed.
  • The first duration is obtained based on the delay requirement, but it may not be accurate enough, so after running the first code segment the first duration can be adjusted; for example, since the first duration is derived from the delay requirement, it can be adjusted using a feedback mechanism based on that requirement, as described in steps 209 and 210 below.
  • Step 209 acquiring the actual running duration of the first code segment.
  • In one implementation, step 209 may include: obtaining the time stamp at which the first code segment ends running, and then calculating the actual running duration of the first code segment from the time stamp at which it started running and the time stamp at which it ended.
  • Step 210 adjust the first duration based on the relative size of the actual running duration and the target delay, where the target delay is the running delay of the first code segment expected by the user.
  • step 210 includes: shortening the first duration based on the fact that the actual running duration is greater than the target delay.
  • the first duration may be shortened by half.
  • step 210 includes: extending the first duration based on the fact that the actual running duration is less than the target delay.
  • When extending, the first duration can be extended by one unit at a time; to prevent the actual running duration from exceeding the target delay again after the extension, the one-unit adjustment is generally smaller than the amount by which the first duration was previously shortened.
  • the adjustment range may be (100-PCT)/PCT of the shortening range in the previous shortening process of the first duration.
  • PCT is the tail delay index of the target delay. For example: if the set target delay is P99 tail delay, the PCT is 99, and the adjustment range is 1/99 of the shortening range during the previous shortening of the first duration.
  • The tail delay refers to a specific delay such that, among all the runs of the first code segment, the delay of most runs is less than that specific delay; for example, the P99 tail delay is the delay such that, among all runs of the first code segment, 99% of the runs have a smaller delay.
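The tail-delay definition can be made concrete with a small percentile computation. This is a sketch under the simple "index = n·pct/100" convention; real measurement systems use more careful percentile estimators.

```c
#include <stdlib.h>

/* Comparator for sorting latency samples in ascending order. */
static int cmp_ull(const void *a, const void *b) {
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;
    return (x > y) - (x < y);
}

/* Return the pct-th percentile sample: the delay below which roughly pct%
 * of the recorded runs fall. Note: sorts the samples in place. */
unsigned long long tail_latency(unsigned long long *samples, size_t n, int pct) {
    qsort(samples, n, sizeof *samples, cmp_ull);
    size_t idx = n * (size_t)pct / 100;
    if (idx >= n) idx = n - 1;      /* clamp for pct == 100 */
    return samples[idx];
}
```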
  • In this way, when the target delay is exceeded, the first duration is shortened by half, and each subsequent increase is 1/99 of that reduction; if the execution situation does not change, the next 99 adjustments will not cause the target latency to be exceeded.
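The halve-then-creep-back feedback of steps 209 and 210 can be sketched as follows. The struct and function names are illustrative assumptions; the patent describes the policy but does not give this code.

```c
/* Feedback control of the out-of-order window ("first duration"): halve it
 * when the measured run time exceeds the target delay, and grow it back by
 * (100-PCT)/PCT of the last shrink when the run time is below target
 * (1/99 of the shrink for a P99 target). */
typedef struct {
    double window;      /* current first duration */
    double last_shrink; /* size of the most recent shrink */
    int pct;            /* tail-latency percentile, e.g. 99 for P99 */
} reorder_ctl;

reorder_ctl adjust_window(reorder_ctl c, double actual, double target) {
    if (actual > target) {
        c.last_shrink = c.window / 2.0;   /* shrink fast: halve the window */
        c.window -= c.last_shrink;
    } else if (actual < target) {
        /* grow slowly, so the next ~pct runs should stay under target */
        c.window += c.last_shrink * (100.0 - c.pct) / c.pct;
    }
    return c;
}
```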
  • the embodiment of the present application provides a third embodiment of a method for acquiring lock resources, which includes:
  • Step 401 determine that the priority of the first core is lower than the priority of the second core.
  • Step 401 is similar to step 202, and step 401 can be understood by referring to the relevant description of step 202 above.
  • Step 402 obtain the out-of-sequence duration.
  • step 402 is different from step 203 .
  • step 402 includes:
  • Step 501 determine that there is no delay requirement in the first code segment.
  • step 501 may include:
  • Based on the value of the global variable being the second identifier, it is determined that the first code segment does not have a delay requirement; the global variable identifies a code segment with a delay requirement, and the second identifier does not identify any code segment.
  • Step 502 based on the fact that the first code segment does not have a delay requirement, acquire the second duration as the out-of-sequence duration.
  • Since the first code segment has no delay requirement, the out-of-sequence duration could in principle be very long; but to prevent the first thread from staying in the backoff state indefinitely, which would cause the first code segment to fail to run, this embodiment uses a second duration of finite length as the out-of-sequence duration.
  • the second duration is longer than the first duration. Based on this, the second duration can also be called the maximum out-of-sequence duration.
  • the first duration can be dynamically adjusted based on delay requirements.
  • the second duration can be fixed.
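Putting the two cases together, the choice of out-of-order duration in steps 203 and 402 reduces to a simple selection. This is a sketch: the constant value and the names are illustrative, not values from this document.

```c
/* Select the out-of-order window: with a delay requirement (inside an
 * epoch), use the tunable first duration; otherwise, use the fixed, finite
 * second duration, i.e. the maximum out-of-order duration. */
#define MAX_REORDER_NS 10000000ull   /* illustrative "second duration" */

unsigned long long reorder_window_ns(int has_delay_requirement,
                                     unsigned long long first_duration_ns) {
    return has_delay_requirement ? first_duration_ns : MAX_REORDER_NS;
}
```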
  • Step 403 when the waiting queue is not empty, the first thread enters a back-off state, and the back-off state is a state of waiting to join the waiting queue.
  • Step 403 is similar to step 204, and step 403 can be understood by referring to the related description of step 204 above.
  • Step 404 based on the duration of entering the back-off state is greater than or equal to the out-of-order duration, join the waiting queue.
  • Step 404 is similar to step 205, and step 404 can be understood by referring to the relevant description of step 205 above.
  • Step 405 based on the time length of entering the backoff state is less than the out-of-order time length, the waiting queue is detected.
  • Step 405 is similar to step 206, and step 405 can be understood by referring to the relevant description of step 206 above.
  • Step 406 if the waiting queue is empty, join the waiting queue.
  • Step 406 is similar to step 207, and step 406 can be understood by referring to the relevant description of step 207 above.
  • The first thread determines that the priority of the first core is lower than the priority of the second core, and then enters the back-off state (i.e. a. delay joining the waiting queue, shown in Figure 7); after the first thread enters the back-off state, the second thread running under the second core joins the tail of the waiting queue, and the lock is passed down the waiting queue by successive lock holders until it finally reaches the second thread.
  • the first thread joins the tail of the waiting queue to acquire a lock.
  • the method provided in the embodiment of the present application can be applied to any application program, and there are various specific application modes.
  • For example, the method provided by the embodiment of the present application can be implemented by modifying the code of the application program: the locking code in the code segment is modified so that the first thread performs the above-mentioned first operation while running the modified locking code; and corresponding code is added before and after the locking code, so that the first thread performs the above-mentioned second operation while running the code before the locking code, and performs the above-mentioned third operation while running the code after the locking code.
  • the application program includes a delay-critical code segment (i.e. the first code segment in the preceding text), and the embodiment of the present application refers to the delay-critical code segment as epoch;
  • The delay-critical code segment contains code related to the mutex lock, and operations related to the mutex lock are executed when this code runs.
  • the operation related to the mutex lock can include calling the function pthread_mutex_lock.
  • the function pthread_mutex_lock is used to make the thread acquire mutex.
  • The embodiment of the present application adds a target library, which contains code related to mutexes.
  • Running the mutex-related code in the target library also performs the operations related to mutexes; and, using redirection (marked by 3 in Figure 8), the mutex-related operations performed when running the above delay-critical code segment are redirected to the mutex-related operations (such as the first operation) performed when running the mutex-related code in the target library. In this way, when the first thread calls the function pthread_mutex_lock, the first operation (marked by 4 in Figure 8) is performed automatically.
  • The first operation enables the first thread to obtain the out-of-order lock (marked by 5 in Figure 8) provided by the embodiment of the present application; based on the foregoing description, the out-of-order lock is built on the basis of the existing FIFO lock.
  • In general, redirection refers to redirecting requests to other locations through various methods.
  • The embodiment of the present application only needs to add a target library and redirect the mutex-related operations in the first code segment to the mutex-related operations in the target library, without modifying the code of the application.
  • the first thread needs to perform the second operation before performing the first operation, and the third operation needs to be performed after performing the first operation.
  • In one implementation, the second operation and the third operation are implemented as follows: before and after the first code segment, the interface epoch_start (which marks the beginning of the epoch, marked by 1 in Figure 8) and the interface epoch_end (which marks the end of the epoch, marked by 2 in Figure 8) are added; the target library also includes the code called through the interface epoch_start and the code called through the interface epoch_end. When the first thread calls the interface epoch_start, it runs the code in the target library to perform the above second operation; when the first thread calls the interface epoch_end, it runs the code in the target library to perform the above third operation.
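The global-variable bookkeeping behind epoch_start and epoch_end (steps 201 and 208) can be sketched as below. The sentinel value and the in_epoch helper are assumptions; the document only states that a global variable holds the segment identifier, and that in a real build the variable would likely be thread-local.

```c
/* Epoch markers: epoch_start records the first identifier of the
 * delay-critical segment in a global variable; epoch_end resets it to the
 * second identifier, which identifies no segment. */
#define NO_EPOCH (-1)                 /* "second identifier" sentinel */

static int g_current_epoch = NO_EPOCH;

void epoch_start(int segment_id) { g_current_epoch = segment_id; }
void epoch_end(void)             { g_current_epoch = NO_EPOCH; }
int  in_epoch(void)              { return g_current_epoch != NO_EPOCH; }
```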
  • the first thread will perform the first operation during the running of the first code segment, thereby obtaining the out-of-order lock;
  • The number of out-of-order locks can be one or multiple; when there are multiple out-of-order locks, the first thread needs to perform the first operation multiple times to obtain them, and the multiple out-of-order locks can be different locks.
  • the multiple out-of-order locks can be in a nested relationship, or can be associated through interfaces such as condition variables or trylocks.
  • When the first thread joins the waiting queue, there are three situations. The first situation: the priority of the first core is higher than the priority of the second core. The second situation: the priority of the first core is lower than that of the second core, but the first code segment has a delay requirement. The third situation: the priority of the first core is lower than that of the second core, and the first code segment has no delay requirement.
  • this embodiment of the present application sets a corresponding interface for each situation.
  • adding the first thread to the waiting queue includes:
  • the first thread first judges whether the first core is a large core (that is, whether it is a high priority);
  • the first thread calls the interface lock_immediately to directly join the waiting queue, and finally locks successfully;
  • the first thread judges whether it is in the epoch (that is, judges whether there is a delay requirement);
  • the first thread calls the interface lock_reorder to join the waiting queue by executing steps 203 to 210, and finally locks successfully;
  • the first thread calls the interface lock_eventually to join the waiting queue by executing steps 402 to 406, and finally locks successfully.
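The three-way decision above can be summarized in one function. This is a sketch: the enum and function names are illustrative, while lock_immediately, lock_reorder, and lock_eventually are the interfaces named in the text.

```c
/* Choose the locking path: big (high-priority) cores join the waiting
 * queue at once; little cores inside an epoch use the bounded reorder
 * path; little cores outside an epoch use the maximum out-of-order wait. */
typedef enum { LOCK_IMMEDIATELY, LOCK_REORDER, LOCK_EVENTUALLY } lock_path;

lock_path choose_lock_path(int on_big_core, int in_epoch_now) {
    if (on_big_core)
        return LOCK_IMMEDIATELY;         /* corresponds to lock_immediately  */
    return in_epoch_now ? LOCK_REORDER   /* lock_reorder: steps 203 to 210  */
                        : LOCK_EVENTUALLY; /* lock_eventually: steps 402-406 */
}
```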
  • the first thread needs to join the waiting queue first.
  • The three situations in which the first thread joins the waiting queue are introduced above. After acquiring the lock, the first thread executes the critical section; after executing the critical section, the first thread releases the lock. The process of releasing the lock is briefly described below.
  • the operations related to the mutex in the first code segment may include calling the function pthread_mutex_lock, which is used to enable the thread to acquire the mutex; in addition, the mutex in the first code segment Lock-related operations may also include calling the function pthread_mutex_unlock, which is used to make the thread release the lock.
  • Figure 10 shows the ratio of the throughput rate using the target library (LibASL) to the throughput rate using other locks (such as the spin lock Spinlock, the Ticket lock, the MCS lock, and the Pthread mutex lock), and whether the actual execution duration of the code segment can meet the set target latency SLO; the horizontal axis represents different target latency SLOs, and the vertical axis represents the throughput ratio.
  • For example, when the SLO is set to 6×10⁴ ns, the throughput rate of LibASL is 1.2 times the throughput rate of the spin lock (Spinlock) (i.e. the performance is improved by 20%); at this time, the total tail delay (Total Tail) of all cores is 0.948 times the set SLO, and the tail delay of the small core is 1.002 times the set SLO (not shown in Figure 10).
  • FIG. 10 also uses vertical dotted lines to represent the tail delays obtained using the other locks.
  • the tail delay (Spin Total) of all cores is 10.4×10⁴ ns, and the tail delay (Spin Little) of the small core is 13.1×10⁴ ns.
  • It can be seen that when a reasonable SLO is set, the tail delay of LibASL (whether the tail delay of the small core or of all cores) can be guaranteed (that is, kept below the SLO); compared with the Pthread mutex and the MCS lock, LibASL can increase the throughput rate by 40% to 66% under the premise of ensuring that the target delay is met; at the same time, LibASL can increase throughput by 40%.
  • Figure 11 shows the latency of each epoch in the first 350ms of the test.
  • the horizontal axis in the figure represents the time axis from 0 to 350ms, while the vertical axis represents the delay of epoch.
  • the horizontal solid line (10×10⁴ ns) in the figure indicates the size of the currently set delay SLO.
  • the red and green points in the figure represent the execution delay of all epochs on a small core and a large core, respectively.
  • LibASL can also quickly adjust the out-of-order duration to maximize throughput.
  • LibASL can also ensure that the delay is within the SLO.
  • If the set SLO is too small to be reached, it cannot be guaranteed; in that case LibASL is not used, and the delay on all cores is kept as consistent as possible. It can therefore be seen from Figure 11 that the delays on the large and small cores are the same at this time.
  • The embodiment of the present application also provides an apparatus for acquiring lock resources, which is applied to a computer device whose processor includes a first core and a second core. The apparatus includes: a determination unit 601, configured to determine that the priority of the first core is lower than the priority of the second core; and a queue joining unit 602, configured to delay joining the waiting queue when the waiting queue is not empty, so that the second thread running under the second core joins the waiting queue before the first thread, where the waiting queue is used to compete for lock resources.
  • the determining unit 601 is configured to determine that the priority of the first core is lower than the priority of the second core based on the performance of the second core being better than the performance of the first core.
  • The queue joining unit 602 is configured to make the first thread enter the back-off state when the waiting queue is not empty, the back-off state being a state of waiting to join the waiting queue, and to join the waiting queue based on the duration of being in the back-off state being greater than or equal to the out-of-sequence duration.
  • the queue joining unit 602 is configured to obtain the out-of-sequence duration.
  • the queue adding unit 602 is configured to obtain the first duration corresponding to the first code segment as the out-of-sequence duration based on the delay requirement of the first code segment, and the first duration is obtained based on the delay requirement .
  • the device further includes: a first setting unit 603, configured to set the value of the global variable as the first identifier of the first code segment, the global variable represents a code segment with a delay requirement; the queue joins The unit 602 is further configured to determine that the first code segment has a delay requirement based on the value of the global variable as the first identifier.
  • the device further includes: a second setting unit 604, configured to set the value of the global variable as a second identifier, and the second identifier does not identify any code segment.
  • the device further includes: an adjustment unit 605, configured to acquire the actual running duration of the first code segment, and to adjust the first duration based on the relative size of the actual running duration and the target delay.
  • the adjusting unit 605 is configured to shorten the first duration based on the fact that the actual running duration is greater than the target latency.
  • the adjusting unit 605 is configured to extend the first duration based on the fact that the actual running duration is less than the target delay.
  • the queue adding unit 602 is configured to acquire the second duration as the out-of-sequence duration based on the fact that the first code segment does not have a delay requirement.
  • the queue adding unit 602 is configured to determine that the first code segment does not have a delay requirement based on the value of the global variable being the second identifier, where the global variable identifies a code segment with a delay requirement, and the second identifier does not identify any code segment.
  • the queue adding unit 602 is configured to detect the waiting queue based on the time duration of entering the backoff state being shorter than the out-of-order duration; and join the waiting queue when the waiting queue is empty.
  • the queue adding unit 602 is configured to join the waiting queue when the waiting queue is empty.
  • FIG. 13 is a schematic structural diagram of a computer device provided by an embodiment of the present application, which is used to implement the function of the device for acquiring lock resources in the embodiment corresponding to FIG. 11.
  • The computer device 1800 may be implemented by one or more servers. The computer device 1800 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 1822 (for example, one or more processors), memory 1832, and one or more storage media 1830 (e.g., one or more mass storage devices) storing application programs 1842 or data 1844.
  • the memory 1832 and the storage medium 1830 may be temporary storage or persistent storage.
  • the program stored in the storage medium 1830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the computer device. Furthermore, the central processing unit 1822 may be configured to communicate with the storage medium 1830 , and execute a series of instruction operations in the storage medium 1830 on the computer device 1800 .
  • Computer device 1800 can also include one or more power supplies 1826, one or more wired or wireless network interfaces 1850, one or more input and output interfaces 1858, and/or, one or more operating systems 1841, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the central processing unit 1822 may be configured to execute the method for acquiring lock resources performed by the device for acquiring lock resources in the embodiment corresponding to FIG. 11 .
  • the central processing unit 1822 can be used for:
  • the first code segment is run by a first thread running on a first core to perform the following first operation:
  • The embodiment of the present application also provides a chip, including one or more processors; some or all of the processors are used to read and execute a computer program stored in a memory, so as to execute the methods of the foregoing embodiments.
  • Optionally, the chip includes a memory, and the processor is connected to the memory through a circuit or wires. Further optionally, the chip further includes a communication interface, and the processor is connected to the communication interface.
  • the communication interface is used to receive data and/or information to be processed, and the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing result through the communication interface.
  • the communication interface may be an input-output interface.
  • some of the one or more processors may implement some of the steps in the above methods through dedicated hardware; for example, processing related to a neural network model may be performed by a dedicated neural network processor or a graphics processor.
  • the method provided in the embodiment of the present application may be implemented by one chip, or may be implemented by multiple chips in cooperation.
  • The embodiment of the present application also provides a computer storage medium, which is used to store computer software instructions used by the above-mentioned computer device; the instructions include a program designed to be executed by the computer device.
  • the computer device may be the same as the apparatus for acquiring lock resources in the aforementioned embodiment corresponding to FIG. 11 .
  • the embodiment of the present application also provides a computer program product, the computer program product includes computer software instructions, and the computer software instructions can be loaded by a processor to implement the procedures in the methods shown in the foregoing embodiments.
  • the disclosed system, device and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • The division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

本申请实施例公开了一种获取锁资源的方法、装置及设备,该方法应用于处理器包括第一核心和第二核心的计算机设备,具体包括:通过运行在第一核心上的第一线程运行第一代码段,在运行的过程中,第一线程确定第一核心的优先级低于第二核心的优先级;在等待队列不为空的情况下,第一线程延迟加入等待队列,这样,运行在第二核心下的第二线程便可以先于第一线程加入等待队列,从而先于第一线程获取到锁资源,从而降低第二线程的等待时间,提高处理器的吞吐率。

Description

一种获取锁资源的方法、装置及设备
本申请要求于2021年10月21日提交中国专利局、申请号为CN202111227144.7、申请名称为“一种获取锁资源的方法、装置及设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及计算机技术领域,尤其涉及一种获取锁资源的方法、装置及设备。
背景技术
互斥锁是常用的一种同步原语。其用于为多线程应用程序提供互斥访问保证。当多个线程需要修改共享的数据时(如数据库中的同一条目),这些线程需要相互竞争以获取互斥锁;在同一时刻,只有获取到互斥锁的一个线程能够对该共享的数据进行修改。对于互斥锁,如果被一个线程获取到,那么其他要竞争互斥锁的线程只能进入睡眠状态。
目前,还存在其他获取互斥锁的方法。
例如,如果互斥锁被一个线程获取到,那么其他要竞争互斥锁的线程不会进入睡眠状态,会一直循环检测该互斥锁是否被释放;并在互斥锁被释放后,再去竞争互斥锁。通过这种方法获取的互斥锁又叫自旋锁。
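上述“循环检测、再竞争”的过程可以用如下示意性代码说明。这是一个基于C11原子操作的假设性最小示例,并非本申请的实现:

```c
#include <stdatomic.h>

/* 自旋锁的最小示意:锁变量为 0 表示空闲,1 表示被占有 */
typedef struct { atomic_int locked; } spinlock_t;

static void spin_lock(spinlock_t *l) {
    int expected = 0;
    /* 循环检测:反复尝试以原子操作把锁变量从 0 置为 1,
       设置成功的线程获取到自旋锁;失败则复位 expected 后重试 */
    while (!atomic_compare_exchange_weak(&l->locked, &expected, 1))
        expected = 0;
}

static void spin_unlock(spinlock_t *l) {
    atomic_store(&l->locked, 0);   /* 释放:其他循环检测中的线程可再次竞争 */
}
```

可以看到,多个线程同时调用spin_lock时,谁的原子操作先成功谁先获得锁,并不保证先来先得,这正是上文所述不公平问题的来源。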
但是,在循环检测的过程中,可能还会新增一部分线程;在自旋锁被释放后,这一部分新增的线程与此前一直循环检测自旋锁是否被释放的线程一同竞争自旋锁,从而导致新增的线程与此前一直循环检测的线程之间的不公平。
为了保证多个线程竞争锁的公平性,又出现了队列锁。队列锁维护了一个先入先出的等待队列,每个要获取互斥锁的线程都需要先加入等待队列,先加入等待队列的线程先获取到互斥锁。
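队列锁的先入先出行为可以用如下“取号叫号”式的示意代码说明(基于C11原子操作的假设性示例,并非本申请的实现):

```c
#include <stdatomic.h>

/* 排队自旋锁(ticket lock)示意:next 为下一个号,owner 为当前叫到的号 */
typedef struct { atomic_uint next; atomic_uint owner; } ticket_lock_t;

static unsigned ticket_lock(ticket_lock_t *l) {
    unsigned my = atomic_fetch_add(&l->next, 1); /* 取号:相当于加入先入先出的等待队列 */
    while (atomic_load(&l->owner) != my)
        ;                                        /* 自旋等待叫到自己的号 */
    return my;
}

static void ticket_unlock(ticket_lock_t *l) {
    atomic_fetch_add(&l->owner, 1);              /* 释放:把锁传给队列中的下一个线程 */
}
```

线程按fetch_add取号的先后顺序依次获得锁,因此先加入等待队列的线程先获取到互斥锁。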
以异构多核处理器(Asymmetric Multiprocessor,AMP)为例,异构多核处理器也可以称为非对称多核处理器,其中包含多个大小不同的核心(core),大核心通常是指性能较优的核心,小核心通常是指性能较差的核心。
若运行在小核心下的线程需要获取互斥锁,则运行在小核心下的线程便需要加入等待队列;此后,若运行在大核心下的线程需要获取互斥锁,则运行在大核心下的线程也需要加入等待队列;由于运行在小核心下的线程先于运行在大核心下的线程加入等待队列,所以运行在小核心下的线程先获取到互斥锁。
然而,由于小核心的性能较差,所以运行在小核心下的线程在获取到互斥锁后,执行临界区(互斥锁保护的代码区)的时间较长,这会导致运行在大核心下的线程的等待时间较长,导致处理器的吞吐率下降。
发明内容
本申请实施例提供了一种获取锁资源的方法、装置及设备,该方法能使得运行在优先级高的第二核心下的第二线程先于运行在优先级低的第一核心上的第一线程,加入等待队列,从而使得第二线程优先获取锁资源,以降低第二线程的等待时间,提高处理器的吞吐 率。
第一方面,本申请实施例提供了一种获取锁资源的方法,应用于计算机设备,计算机设备的处理器包括第一核心和第二核心,第一核心和第二核心对应不同的优先级;基于此,该方法包括:通过运行在第一核心上的第一线程运行第一代码段,以执行以下第一操作,其中,第一代码段是指应用程序中的一段代码:确定第一核心的优先级低于第二核心的优先级,确定方法有多种,例如,为第一核心和第二核心设定不同的优先级标识,第一核心对应低优先级的标识,第二核心对应高优先级的标识,这样,第一线程便可以基于第一核心对应的优先级标识确定第一核心的优先级低于第二核心的优先级;在等待队列不为空的情况下,延迟加入等待队列,以使得运行在第二核心下的第二线程先于第一线程加入等待队列,等待队列用于竞争锁资源,本申请实施例对延迟的时长不做具体限定。
由于第一核心的优先级低于第二核心的优先级,所以在等待队列不为空的情况下,运行在第一核心上的第一线程延迟加入等待队列,以使得运行在第二核心下的第二线程先于第一线程加入等待队列;这样,第二线程便可以优先获取到锁资源,从而可以优先执行临界区。
由此可见,按照本申请实施例提供的方法获取锁资源,使得运行在优先级高的核心下的线程可以优先于运行在优先级低的核心下的线程执行临界区,从而减少运行在优先级高的核心下的线程的等待时间,增加运行在优先级高的核心下的线程执行临界区的次数;又由于优先级高的核心通常是性能较优的核心,运行在性能较优的核心下的线程执行临界区的时长较短,而运行在性能较差的核心下的线程执行临界区的时长较长,因此,增加运行在优先级高的核心下的线程执行临界区的次数,在固定时间段内,能够提高执行临界区的总次数,从而提高吞吐率。
作为一种可实现的方式,确定第一核心的优先级低于第二核心的优先级包括:基于第二核心的性能优于第一核心的性能,确定第一核心的优先级低于第二核心的优先级,其中,核心的性能可以由多个参数确定,例如,可以由频率、缓存等参数确定。
由于第二核心的性能优于第一核心的性能,所以本申请实施例能够减少运行在性能优的核心下的线程的等待时间,增加运行在性能优的核心下的线程执行临界区的次数;又由于运行在性能较优的核心下的线程执行临界区的时长较短,而运行在性能较差的核心下的线程执行临界区的时长较长,因此,在固定时间段内,能够提高执行临界区的总次数,从而提高吞吐率。
作为一种可实现的方式,在等待队列不为空的情况下,延迟加入等待队列包括:在等待队列不为空的情况下第一线程进入退避状态,退避状态为等待加入等待队列的状态;基于进入退避状态的时长大于或等于乱序时长,加入等待队列。
由于第一线程在等待时间大于或等于乱序时长时,加入等待队列,从而防止第一线程的等待时间过长,以影响第一代码段的正常运行。
作为一种可实现的方式,加入等待队列之前,第一操作还包括:获取乱序时长,其中,乱序时长的确定方法有多种,本申请实施例对此不做具体限定,例如,可以基于第一代码段是否存在时延需求来获取乱序时长。
在确定第一核心的优先级低于第二核心之后,获取乱序时长,并通过乱序时长控制第一线程的等待时间,从而防止第一线程的等待时间过长,以影响第一代码段的正常运行。
作为一种可实现的方式,在确定第一核心的优先级低于第二核心之后,在基于进入退避状态的时长大于或等于乱序时长,获取乱序时长。
作为一种可实现的方式,获取乱序时长包括:基于第一代码段存在时延需求,获取第一代码段对应的第一时长,作为乱序时长,第一时长是基于时延需求获取的;基于时延需求确定第一时长的方法有多种,例如,可以基于时延需求规定的目标时延估算第一时长,后续再对第一时长进行调整,其中,第一时长通常小于目标时延。
由于第一代码段存在时延需求,所以基于时延需求确定第一时长,并将第一时长作为乱序时长,能够防止乱序时长过大导致第一代码段的运行时长过长,而无法满足时延需求的情况。
作为一种可实现的方式,方法还包括:在运行第一代码段前,通过第一线程执行以下第二操作:将全局变量的取值设置为第一代码段的第一标识,全局变量表示存在时延需求的代码段;在基于第一代码段存在时延需求,获取第一代码段对应的第一时长,作为乱序时长之前,第一操作还包括:基于全局变量的取值为第一标识,确定第一代码段存在时延需求。
在该实现方式中,通过引入全局变量来对存在时延需求的第一代码段进行标记,并在运行第一代码段前将全局变量的取值设置为第一代码段的第一标识,以使得第一线程可以根据全局变量的取值确定第一代码段存在时延需求,采用全局变量标记第一代码段的方式简单且容易实现。
作为一种可实现的方式,方法还包括:在运行第一代码段后,通过第一线程执行以下第三操作:将全局变量的取值设置为第二标识,第二标识不标识任何一个代码段。
由于第一线程可以执行多个代码段,若不将全局变量的取值设置为第二标识,可能造成乱序时长的错误计算;例如,第一代码段的下一个代码段不存在时延需求,若不将全局变量的取值设置为第二标识,则全局变量的取值还是第一标识;这样,在通过第一线程运行下一个代码段的过程中,会错误地将第一代码段对应的第一时长作为乱序时长;因此,将全局变量的取值设置为第二标识,能够防止乱序时长计算错误。
作为一种可实现的方式,方法还包括:在运行第一代码段后,通过第一线程执行以下第三操作:获取第一代码段的实际运行的时长,具体地,在运行第一代码段时,可以记录开始运行第一代码段的时间戳,当第一代码段运行结束时,再获取第一代码段运行结束时的时间戳,基于开始运行第一代码段的时间戳和运行结束时的时间戳便可以计算得到第一代码段的实际运行的时长;基于实际运行的时长与时延需求规定的目标时延间的相对大小,调整第一时长,调整第一时长的方法可以有多种。
由于第一时长是基于时延需求获取的,但第一时长可能不够准确,例如,第一时长较短,造成第一代码段的实际运行的时长远小于时延需求,或第一时长较长,造成第一代码段的实际运行的时长远大于时延需求;因此,基于第一代码段实际运行的时长与时延需求规定的目标时延间的相对大小,调整第一时长,从而在实现第一线程的延迟加入的情况下, 满足第一代码段的时延需求。
作为一种可实现的方式,基于实际运行的时长与目标时延间的相对大小,调整第一时长包括:基于实际运行的时长大于目标时延,缩短第一时长,其中,缩短幅度可以根据实际需要进行调整。
基于实际运行的时长大于目标时延,缩短第一时长,能够使得第一代码段实际运行的时长小于目标时延,从而在实现第一线程的延迟加入的情况下,满足第一代码段的时延需求。
作为一种可实现的方式,基于实际运行的时长与目标时延间的相对大小,调整第一时长包括:基于实际运行的时长小于目标时延,延长第一时长。
基于实际运行的时长小于目标时延,延长第一时长,从而在满足第一代码段的时延需求的情况下,尽可能地延长第一线程处于退避状态的时长,以尽可能地使得第二线程优先加入等待队列,提高吞吐率。
作为一种可实现的方式,获取乱序时长包括:基于第一代码段不存在时延需求,获取第二时长作为乱序时长。
由于第一代码段不存在时延需求,所以第二时长通常较长;获取第二时长作为乱序时长,使得第一线程可以能够延迟加入等待队列。
作为一种可实现的方式,在基于第一代码段不存在时延需求,将固定的第二时长作为乱序时长之后,第一操作还包括:基于全局变量的取值为第二标识,确定第一代码段不存在时延需求,全局变量表示存在时延需求的代码段,第二标识不标识任何一个代码段;其中,第二标识不标识任何一个代码段,也可以理解为当前无存在时延需求的代码段。
在该实现方式中,通过引入全局变量来对存在时延需求的第一代码段进行标记,这样,便可以基于全局变量的取值为第二标识,确定第一代码段不存在时延需求;这种采用全局变量标记第一代码段的方式简单且容易实现。
作为一种可实现的方式,在等待队列不为空的情况下,延迟加入等待队列还包括:基于进入退避状态的时长小于乱序时长,检测等待队列;在等待队列为空的情况下,加入等待队列。
基于进入退避状态的时长小于乱序时长,检测等待队列,并在等待队列为空的情况下,加入等待队列,可以防止等待队列为空但第一线程仍处于退避状态,从而可以最大化地利用锁资源,提高吞吐量。
作为一种可实现的方式,在确定第一核心的优先级低于第二核心之后,第一操作还包括:在等待队列为空的情况下,加入等待队列。
在等待队列为空的情况下,加入等待队列,可以防止等待队列为空但第一线程仍处于退避状态,从而可以最大化地利用锁资源,提高吞吐量。
第二方面,本申请实施例提供了一种获取锁资源的装置,应用于计算机设备,计算机设备的处理器包括第一核心和第二核心;装置包括:确定单元,用于确定第一核心的优先级低于第二核心的优先级;队列加入单元,用于在等待队列不为空的情况下,延迟加入等待队列,以使得运行在第二核心下的第二线程先于第一线程加入等待队列,等待队列用于 竞争锁资源。
作为一种可实现的方式,确定单元,用于基于第二核心的性能优于第一核心的性能,确定第一核心的优先级低于第二核心的优先级。
作为一种可实现的方式,队列加入单元,用于在等待队列不为空的情况下第一线程进入退避状态,退避状态为等待加入等待队列的状态;基于进入退避状态的时长大于或等于乱序时长,加入等待队列。
作为一种可实现的方式,队列加入单元,用于获取乱序时长。
作为一种可实现的方式,队列加入单元,用于基于第一代码段存在时延需求,获取第一代码段对应的第一时长作为乱序时长,第一时长是基于时延需求获取的。
作为一种可实现的方式,装置还包括:第一设置单元,用于将全局变量的取值设置为第一代码段的第一标识,全局变量表示存在时延需求的代码段;队列加入单元,还用于基于全局变量的取值为第一标识,确定第一代码段存在时延需求。
作为一种可实现的方式,装置还包括:第二设置单元,用于将全局变量的取值设置为第二标识,第二标识不标识任何一个代码段。
作为一种可实现的方式,装置还包括:调整单元,用于获取第一代码段的实际运行的时长;基于实际运行的时长与时延需求规定的目标时延间的相对大小,调整第一时长。
作为一种可实现的方式,调整单元,用于基于实际运行的时长大于目标时延,缩短第一时长。
作为一种可实现的方式,调整单元,用于基于实际运行的时长小于目标时延,延长第一时长。
作为一种可实现的方式,队列加入单元,用于基于第一代码段不存在时延需求,获取第二时长作为乱序时长。
作为一种可实现的方式,队列加入单元,用于基于全局变量的取值为第二标识,确定第一代码段不存在时延需求,全局变量表示存在时延需求的代码段,第二标识不标识任何一个代码段。
作为一种可实现的方式,队列加入单元,用于基于进入退避状态的时长小于乱序时长,检测等待队列;在等待队列为空的情况下,加入等待队列。
作为一种可实现的方式,队列加入单元,用于在等待队列为空的情况下,加入等待队列。
其中,以上各单元的具体实现、相关说明以及技术效果请参考本申请实施例第一方面的描述。
本申请实施例第三方面提供一种计算机设备,所述计算机设备包括存储器和处理器,所述存储器用于存储计算机可读指令(或者称之为计算机程序),所述处理器用于读取所述计算机可读指令以实现前述任意实现方式提供的方法。
本申请实施例第四方面提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得所述计算机执行如前述任一方面以及各种可能的实现方式中任一项所述的方法。
本申请实施例第五方面提供了一种计算机可读存储介质,包括指令,当所述指令在计 算机上运行时,使得计算机执行如上述第一方面以及各种可能的实现方式中任一项所述的方法。
本申请实施例第六方面提供了一种芯片,包括一个或多个处理器。所述处理器中的部分或全部用于读取并执行存储器中存储的计算机程序,以执行上述第一方面任意可能的实现方式中的方法。
可选地,该芯片还包括存储器,该处理器通过电路或电线与该存储器连接。进一步可选地,该芯片还包括通信接口,处理器与该通信接口连接。通信接口用于接收需要处理的数据和/或信息,处理器从该通信接口获取该数据和/或信息,并对该数据和/或信息进行处理,并通过该通信接口输出处理结果。该通信接口可以是输入输出接口。
在一些实现方式中,所述一个或多个处理器中还可以有部分处理器是通过专用硬件的方式来实现以上方法中的部分步骤,例如涉及神经网络模型的处理可以由专用神经网络处理器或图形处理器来实现。
本申请实施例提供的方法可以由一个芯片实现,也可以由多个芯片协同实现。
附图说明
图1为本申请实施例提供的计算机设备的核心示意图;
图2为本申请实施例提供的一种获取锁资源的方法的第一实施例的示意图;
图3为本申请实施例提供的一种获取锁资源的方法的第二实施例的示意图;
图4为本申请实施例中获取乱序时长的一个实施例的流程示意图;
图5为本申请实施例提供的一种获取锁资源的方法的第三实施例的示意图;
图6为本申请实施例中获取乱序时长的另一个实施例的流程示意图;
图7为本申请实施例中第一线程加入等待队列的过程;
图8为本申请实施例中为应用程序加可乱序锁的示意图;
图9为本申请实施例中第一线程加入等待队列示意图;
图10为目标库的吞吐率与使用其他锁的吞吐率的对照示意图;
图11为本申请实施例中epoch的时延的变化示意图;
图12为本申请实施例还提供了一种获取锁资源的装置的示意图;
图13是本申请实施例提供的计算机设备的一种结构示意图。
具体实施方式
下面结合附图,对本申请的实施例进行描述,显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或模块的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或模块,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步 骤或模块。在本申请中出现的对步骤进行的命名或者编号,并不意味着必须按照命名或者编号所指示的时间/逻辑先后顺序执行方法流程中的步骤,已经命名或者编号的流程步骤可以根据要实现的技术目的变更执行次序,只要能达到相同或者相类似的技术效果即可。
另外,在本发明的描述中,除非另有说明,“多个”的含义是两个或两个以上。本申请中的术语“和/或”或字符“/”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,或A/B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。
本申请实施例可以应用于图1所示的计算机设备,该计算机设备中的处理器包括多个大小不同的核心(core);需要说明的是,核心的大小是核的相对大小,大核心可以是性能较优的核心,小核心可以是指性能较差的核心。
具体地,图1示出了4个核心,这4个核心包括1个核心A、2个核心B和1个核心C。
在图1中,利用圆圈的大小表示核心的大小;基于图1可以看出,核心A大于核心B,具体可以表现为核心A的性能优于核心B的性能;核心B大于核心C,具体可以表现为核心B的性能优于小核心C的性能。
相对于图1中的核心A来说,图1中的核心B可以看成是小核心;相对于图1中的核心B来说,图1中的核心C也可以看成是小核心。
可以理解的是,当多个线程需要修改共享数据时,都需要通过竞争来获取保护该共享数据的锁。
其中,锁的类型有多种,例如自旋锁和队列锁等。
但自旋锁会导致线程间的不公平的问题,而队列锁虽然能够保证线程间的公平性,但同时也会导致吞吐率下降。
下面以图1中的核心A和核心B为例对此进行说明。
假设锁被某一线程占有,在锁被释放前的一段时间内,线程1、线程2、线程3、线程4、线程5和线程6按照时间的先后顺序依次请求锁,且线程1和线程3运行在核心B下,线程2、线程4、线程5和线程6运行在核心A下。
若被占有的锁为自旋锁,则不论请求锁的时间的先后顺序,在锁被释放后,线程1、线程2、线程3、线程4、线程5和线程6会通过原子操作同时竞争锁;具体地,线程1、线程2、线程3、线程4、线程5和线程6会将锁变量设置为特征值(例如1),若其中一个线程设置成功,则该线程会获取到自旋锁;其中,原子操作是指不会被线程调度机制打断的操作。
由于核心A的性能优于核心B的性能,所以线程2、线程4、线程5和线程6执行原子操作成功的概率比线程1和线程3执行原子操作成功的概率大,那么线程2、线程4、线程5和线程6竞争到自旋锁的概率便大于线程1和线程3竞争到自旋锁的概率。
最终竞争锁的结果为:线程2、线程4、线程5、线程6和线程1依次获取到自旋锁,而由于时间有限,线程3未获取到自旋锁。
若被占有的锁为队列锁,由于请求锁的时间的先后顺序为线程1、线程2、线程3、线程4、线程5和线程6,所以加入等待队列的顺序也依次为线程1、线程2、线程3、线程4、线程5和线程6。
需要说明的是,线程在竞争到锁后需要执行临界区(锁保护的代码区)以实现对共享数据的修改,且只有竞争到队列锁的线程才能进入到临界区。
由于核心B的性能较差,所以线程1和线程3在竞争到队列锁后,执行临界区的时间较长,而又由于时间有限,所以最终竞争锁的结果为:线程1、线程2、线程3、线程4依次获取到队列锁,线程5和线程6在这段有限的时间内未获取到队列锁。
基于上述说明可知,在同一时间段内,若被占有的锁为队列锁,4个线程通过队列锁各执行一次临界区;而若被占有的锁为自旋锁,5个线程通过自旋锁各执行一次临界区;所以,通过队列锁执行临界区的次数小于通过自旋锁执行临界区的次数,即队列锁会导致吞吐率降低,其中,吞吐率可以理解为单位时间内提供的服务量。
前文中的队列锁,也可以称为排队自旋锁(FIFO Ticket Spinlock),简称为FIFO锁,FIFO锁可以理解为一种新型自旋锁。
为此,本申请实施例提供了一种获取锁资源的方法,该方法利用了已有的FIFO锁,但在加入等待队列之前,先对运行线程的核心的优先级进行判断,当核心的优先级低时,则线程延迟加入等待队列;这样,在该线程加入等待队列之前,运行在优先级高的核心下的线程便可以先加入等待队列;因此,即使运行在优先级高的核心下的线程晚于运行在优先级低的核心下的线程请求锁资源,也能够优先获取到锁资源;其中,锁资源也可以简称为锁。
因此,本申请实施例能够减少运行在优先级高的核心下的线程获取锁资源的时间,当优先级高的核心为性能较优的核心时,则意味着运行在优先级高的核心下的线程可以优先获取锁资源并执行临界区,而运行在优先级高的核心下的线程执行临界区的时间较短,所以可以提高一定时间内临界区的执行次数,从而提高吞吐率。
需要说明的是,在本申请实施例中,即使运行在优先级高的核心下的线程晚于运行在优先级低的核心下的线程竞争锁资源,但也能早于运行在优先级低的核心下的线程加入等待队列并获得锁资源,所以本申请实施例中的锁可以看成是一种新型的FIFO锁;本申请实施例将这种新型的FIFO锁称为可乱序锁。
下面对本申请实施例提供的方法进行具体介绍。
如图2所示,本申请实施例提供了一种获取锁资源的方法的第一实施例,该第一实施例应用于计算机设备,该计算机设备的处理器包括多个核心,本申请实施例对核心的具体数量不做限定;例如,核心的数量可以为2个,3个,或3个以上。
多个核心对应多个优先级,划分优先级的方法有多种,本申请实施例对此不做具体限定;通常情况下,可以基于核心的性能优劣对多个核心进行优先级划分。
以图1中的4个核心为例,可以基于核心的性能优劣将这4个核心划分为3个优先级,具体地,核心A对应第一个优先级,核心B对应第二个优先级,核心C对应第三个优先级。
还可以基于核心的性能优劣将这4个核心划分为2个优先级,具体地,核心A对应第一个优先级,核心B和核心C都对应第二个优先级;又或者,核心A和核心B对应第一个优先级,核心C对应第二个优先级。
通常情况下,可以将多个核心划分为两个优先级。
为了便于说明,下文以多个核心中的第一核心和第二核心为例,对本申请实施例提供的方法进行说明。
具体地,方法包括:
通过运行在第一核心上的第一线程运行第一代码段,以执行第一操作,其中,第一代码段是指应用程序中的一段代码,该段代码具体可以用于处理某个请求。
需要说明的是,通过第一线程可以运行多个代码段,第一代码段可以是多个代码段中的任意一个。
第一操作包括:
步骤101,确定第一核心的优先级低于第二核心的优先级。
基于前述说明可知,核心的优先级通常是基于核心的性能划分的,相应地,步骤101可以包括:基于第二核心的性能优于第一核心的性能,确定第一核心的优先级低于第二核心。
具体地,可以预先将第一核心和第二核心划分相应的优先级,基于优先级的不同,为第一核心和第二核心设定不同的优先级标识;这样,第一线程便可以基于第一核心对应的优先级标识确定第一核心的优先级低于第二核心的优先级。
例如,第一核心的优先级标识为AA,第二核心的优先级标识为BB,其中,AA所表示的优先级低于BB所表示的优先级;第一线程在确定第一核心的优先级为AA而不是BB时,便可以确定第一核心的优先级低于第二核心的优先级。
步骤102,在等待队列不为空的情况下,延迟加入等待队列,以使得运行在第二核心下的第二线程先于第一线程加入等待队列,等待队列用于竞争锁资源。
延迟加入等待队列可以理解为,在确定等待队列不为空的情况下,不立即加入等待队列,而是等待一段时间后再加入等待队列;那么在等待的这一段时间内,第二线程便可以先加入等待队列。
需要说明的是,延迟加入队列的方法有多种,本申请实施例对此不作具体限定,下文会对此进行具体介绍。
由于前文对等待队列进行了说明,故在此不做详述;简而言之,先加入等待队列的线程会先获取到锁资源。
步骤103,在等待队列为空的情况下,加入等待队列。
可以理解的是,在等待队列为空的情况下,若不加入等待队列,则会浪费锁资源,也会导致吞吐率下降;因此步骤103能够提高吞吐率。
需要说明的是,当多个核心对应两个优先级时,若确定第一核心的优先级为较高的优先级,则无论等待队列是否为空,便可以直接加入等待队列。
在本申请实施例中,由于第一核心的优先级低于第二核心的优先级,所以在等待队列不为空的情况下,运行在第一核心上的第一线程延迟加入等待队列,以使得运行在第二核心下的第二线程先于第一线程加入等待队列;这样,第二线程便可以优先获取到锁资源,从而可以优先执行临界区。
基于此,按照本申请实施例提供的方法获取锁资源,使得运行在优先级高的核心下的线程可以优先于运行在优先级低的核心下的线程执行临界区,从而减少运行在优先级高的核心下的线程的等待时间,增加运行在优先级高的核心下的线程执行临界区的次数;又由于优先级高的核心通常是性能较优的核心,运行在性能较优的核心下的线程执行临界区的时长较短,而运行在性能较差的核心下的线程执行临界区的时长较长,因此,增加运行在优先级高的核心下的线程执行临界区的次数,在固定时间段内,能够提高执行临界区的总次数,从而提高吞吐率。
基于前文中步骤102的相关说明可知,在等待队列不为空的情况下,延迟加入等待队列,而延迟时间的长短可以根据实际需要进行设定。
例如,开发人员可能对某些代码段的运行具有时延需求,基于此,对于第一代码段存在时延需求,以及第一代码段不存在时延需求两种情况,设定不同长短的延迟时间。
下面分别通过第一代码段存在时延需求,以及第一代码段不存在时延需求两种情况,对本申请实施例提供的方法进行分别说明。
在这里,先介绍第一代码段存在时延需求的情况,如图3所示,本申请实施例提供了一种获取锁资源的方法的第二实施例,该实施例包括:
步骤201,将全局变量的取值设置为第一代码段的第一标识。
其中,全局变量表示存在时延需求的代码段。
为了方便确定第一代码段具有时延需求,该实施例引入了一个全局变量;当全局变量的取值为第一代码段的第一标识时,则表示第一代码段存在时延需求。
基于前述说明可知,通过第一线程可以运行多个代码段,为了能够准确确定存在时延需求的代码段,每个代码段都对应一个全局唯一的标识,所以第一标识也是全局唯一的。
在执行步骤201之前,可以设定全局变量的初始值,该初始值通常不标识任何一个代码段,初始值的具体取值可以根据实际需要进行设定;例如,该初始值可以为-1,下文也将该初始值称为第二标识。
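全局变量的标记方式可以用如下示意代码说明(假设性示例,其中接口名与标识取值仅作演示;文中提到的接口epoch_start/epoch_end在后文图8处介绍):

```c
/* 第二标识:不标识任何一个代码段,表示当前无存在时延需求的代码段 */
#define NO_EPOCH (-1)

/* 全局变量:取值为某代码段的第一标识时,表示该代码段存在时延需求 */
static int current_epoch = NO_EPOCH;

/* 第二操作:运行第一代码段前,把全局变量设置为该代码段的第一标识 */
static void epoch_start(int epoch_id) { current_epoch = epoch_id; }

/* 第三操作:运行第一代码段后,把全局变量恢复为第二标识 */
static void epoch_end(void) { current_epoch = NO_EPOCH; }

/* 第一线程据此判断当前代码段是否存在时延需求 */
static int in_epoch(void) { return current_epoch != NO_EPOCH; }
```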
需要说明的是,确定第一代码段具有时延需求的方法有多种,引入全局变量只是其中的一种方法,所以步骤201是可选的。
在该实施例中,可以在运行第一代码段之前执行步骤201;该实施例将在运行第一代码段前执行的操作称为第二操作,相应地,步骤201包含于第二操作中。
在执行第二操作后,通过第一线程运行第一代码段,以执行第一操作(即步骤202至步骤207),同时,记录第一代码段开始运行的时间戳,该时间戳用于计算第一代码段实际运行的时长。
步骤202,确定第一核心的优先级低于第二核心的优先级。
步骤202与步骤101类似,具体可参阅步骤101的相关说明对步骤202进行理解。
步骤203,获取乱序时长。
乱序时长是指允许第一线程和第二线程不按顺序加入等待队列的时长,即第一线程延迟加入队列的时长,或者说是第一线程等待加入队列的时长。
基于前述说明,该乱序时长可以基于第一代码段是否存在时延需求来确定,而在该实 施例中,第一代码段存在时延需求。
因此,作为一种可实现的方式,如图4所示,步骤203包括:
步骤301,确定第一代码段存在时延需求。
确定第一代码段存在时延需求的方法有多种,本申请实施例对此不做具体限定。
在执行步骤201的情况下,步骤301可以包括:
基于全局变量的取值为第一标识,确定第一代码段存在时延需求。
步骤302,基于第一代码段存在时延需求,获取第一代码段对应的第一时长,作为乱序时长,第一时长是基于时延需求获取的。
基于时延需求确定第一时长的方法有多种,本申请实施例对此不做限定。
例如,可以基于时延需求规定的目标时延估算第一时长,后续再对第一时长进行调整。
需要说明的是,乱序时长是指第一线程等待加入队列的时长,除此之外,通过第一线程执行第一代码段中的其他部分代码也需要时间,因此,第一时长通常小于时延需求所要求的目标时延。
在该实施例中,第一时长可以是动态调整的。
步骤203是在步骤205之前执行;本申请实施例对步骤203和步骤204的先后顺序不做具体限定,具体地,可以先执行步骤203,再执行步骤204,也可以先执行步骤204,再执行步骤203。
步骤204,在等待队列不为空的情况下第一线程进入退避状态,退避状态为等待加入等待队列的状态。
在退避状态下,第一线程不加入等待队列。
可以理解的是,为了计算第一线程进入退避状态的时长,在进入退避状态时,可以记录进入退避状态的时间戳。
步骤205,基于进入退避状态的时长大于或等于乱序时长,加入等待队列。
可以理解的是,在执行步骤205之前,需要计算进入退避状态的时长;具体地,可以不断获取当前的时刻,并基于当前的时刻和进入退避状态的时间戳计算进入退避状态的时长;若进入退避状态的时长大于或等于乱序时长,则加入等待队列。
若进入退避状态的时长小于乱序时长,则重复上述操作,即再次获取当前的时刻,并再次计算进入退避状态的时长,直到进入退避状态的时长大于或等于乱序时长。
步骤206,基于进入退避状态的时长小于乱序时长,检测等待队列。
需要说明的是,在进入退避状态的时长小于乱序时长的情况下,除了再次计算进入退避状态的时长外,也可以检测等待队列,以避免进入退避状态的时长小于乱序时长,但等待队列为空的情况;在这种情况下,若第一线程仍处于退避状态,则会造成锁资源的浪费。
并非在每次计算进入退避状态的时长后都需要检测等待队列,换句话说,检测等待队列的次数可以小于计算进入退避状态的时长的次数,从而避免频繁的检测操作带来额外的时延。
具体地,可以采用指数退避检查的策略检测等待队列,即在计算进入退避状态的时长的次数为指数倍数,且进入退避状态的时长小于乱序时长时,检测等待队列;而在计算进 入退避状态的时长的次数为非指数倍数时,不检测等待队列。
例如,在第1次计算进入退避状态的时长后,若进入退避状态的时长小于乱序时长,则检测等待队列;在第2次计算进入退避状态的时长后,且进入退避状态的时长小于乱序时长时,检测等待队列;在第4次计算进入退避状态的时长后,且进入退避状态的时长小于乱序时长时,检测等待队列;在第8次计算进入退避状态的时长后,且进入退避状态的时长小于乱序时长时,检测等待队列;依次类推。
当计算进入退避状态的时长的次数为第3次、第5次、第6次等非指数倍数时,不检测等待队列。
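指数退避检查的判断条件可以用如下示意代码说明(假设性示例):2的幂可以用位运算 (i & (i-1)) == 0 判定。

```c
/* 仅当计算进入退避状态时长的次数为 2 的幂(第1、2、4、8…次)时才检测等待队列 */
static int should_check_queue(int nth_calculation) {
    return nth_calculation > 0 && (nth_calculation & (nth_calculation - 1)) == 0;
}

/* 统计前 total 次计算中检测等待队列的次数,以体现检测次数远小于计算次数 */
static int count_checks(int total) {
    int checks = 0;
    for (int i = 1; i <= total; i++)
        if (should_check_queue(i))
            checks++;
    return checks;
}
```

例如,前8次计算中只检测4次(第1、2、4、8次),从而避免频繁检测带来额外时延。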
步骤207,在等待队列为空的情况下,加入等待队列。
通过执行步骤207,可以防止等待队列为空但第一线程仍处于退避状态,以最大化地利用锁资源,提高吞吐量。
需要说明的是,步骤203至步骤207构成了步骤102的具体实施方案。
在通过第一线程执行第一操作后,还可以通过第一线程执行第三操作(即步骤208至步骤210)。
步骤208,将全局变量的取值设置为第二标识,第二标识不标识任何一个代码段。
基于前述说明可知,通过第一线程可以执行多个代码段,在执行了步骤201的情况下,若不执行步骤208,则在通过第一线程运行下一个代码段的过程中,可能造成乱序时长的错误计算。
例如,第一代码段的下一个代码段不存在时延需求,若不执行步骤208,则全局变量的取值还是第一标识;这样,在通过第一线程运行下一个代码段的过程中,会错误地将第一代码段对应的第一时长作为乱序时长。
因此,通过步骤208可以防止乱序时长的错误计算。
需要说明的是,步骤208是可选的,通常情况下,在执行步骤201的情况下,会执行步骤208。
可以理解的是,第一时长是基于时延需求获取的,但第一时长可能不够准确,因此在运行完第一代码段后,可以对第一时长进行调整;例如,由于第一时长是基于时延需求获取的,所以可以基于时延需求,并采用反馈机制对第一时长进行调整,下面通过步骤209和步骤210对此进行说明。
步骤209,获取第一代码段的实际运行的时长。
基于前述说明可知,在执行第一操作的同时,会记录开始运行第一代码段的时间戳;基于此,步骤209可以包括:获取结束运行第一代码段的时间戳,然后基于开始运行的时间戳和结束运行的时间戳计算第一代码段的实际运行的时长。
步骤210,基于实际运行的时长与目标时延间的相对大小,调整第一时长,目标时延为用户期望的第一代码段的运行的时延。
作为一种可实现的方式,步骤210包括:基于实际运行的时长大于目标时延,缩短第一时长。
需要说明的是,缩短第一时长的方法有多种,本申请实施例对此不做具体限定;例如, 可以将第一时长缩短一半。
作为一种可实现的方式,步骤210包括:基于实际运行的时长小于目标时延,延长第一时长。
需要说明的是,延长第一时长的方法有多种,本申请实施例对此不做具体限定;例如,可以在第一时长的基础上,每次延长一个单位的调整幅度;为了防止延长第一时长后,再次导致实际运行的时长大于目标时延,一个单位的调整幅度一般小于前一次缩短第一时长过程中的缩短幅度。
例如,调整幅度可以为前一次缩短第一时长过程中的缩短幅度的(100-PCT)/PCT。其中,PCT为目标时延的尾时延指标。例如:如果设定的目标时延为P99尾时延,则该PCT为99,调整幅度为前一次缩短第一时长过程中的缩短幅度的1/99。
其中,尾时延是指一个特定的时延,且在第一代码段的所有次数运行中,多数运行的时延都会小于该特定的时延;例如,P99尾时延是指在第一代码段的所有次数运行中,99%的运行的时延都会小于的时延。
因此,在该实施例中,实际运行的时长一旦超出设定的目标时延,第一时长将缩短一半,以后每次增幅为缩小幅度的1/99。如果执行情况没有发生变化,接下来的99次调整时延都不会超过目标时延。
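上述反馈调整策略可以用如下示意代码说明(假设性示例,结构体与字段名仅作演示):

```c
/* 第一时长(乱序时长)的反馈调整:
   实际运行时长超出目标时延则缩短一半;
   否则每次延长上一次缩短幅度的 (100-PCT)/PCT,PCT 为尾时延指标(P99 时为 99) */
typedef struct {
    double first_len;   /* 第一时长 */
    double last_cut;    /* 上一次的缩短幅度 */
    int    pct;         /* 尾时延指标 PCT */
} reorder_len_t;

static void adjust_first_len(reorder_len_t *r, double actual, double target) {
    if (actual > target) {
        r->last_cut   = r->first_len / 2.0;   /* 记录缩短幅度 */
        r->first_len -= r->last_cut;          /* 缩短一半 */
    } else if (r->last_cut > 0.0) {
        /* 小步延长:增幅为上次缩短幅度的 (100-PCT)/PCT */
        r->first_len += r->last_cut * (100.0 - r->pct) / r->pct;
    }
}
```

以P99为例(PCT=99),一次缩短后约需99次延长才能回到缩短前的水平,这与上文“接下来的99次调整时延都不会超过目标时延”一致。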
下面介绍第一代码段不存在时延需求的情况,如图5所示,本申请实施例提供了一种获取锁资源的方法的第三实施例,该实施例包括:
步骤401,确定第一核心的优先级低于第二核心的优先级。
步骤401与步骤202类似,具体可参照前文中步骤202的相关说明对步骤401进行理解。
步骤402,获取乱序时长。
由于在该实施例中,第一代码段不存在时延需求,所以步骤402与步骤203不同。
具体地,作为一种可实现的方式,如图6所示,步骤402包括:
步骤501,确定第一代码段不存在时延需求。
确定第一代码段不存在时延需求的方法有多种,本申请实施例对此不做具体限定。
在执行步骤201的情况下,步骤501可以包括:
基于全局变量的取值为第二标识,确定第一代码段不存在时延需求,全局变量表示存在时延需求的代码段,第二标识不标识任何一个代码段。
由于前文对全局变量进行了说明,故在此不做赘述。
步骤502,基于第一代码段不存在时延需求,获取第二时长作为乱序时长。
需要说明的是,由于第一代码段不存在时延需求,理论上,乱序时长可以很长;但为了防止第一线程一直处于退避状态,导致第一代码段无法继续运行,该实施例将长度有限的第二时长作为乱序时长。
但由于第一代码段不存在时延需求,所以通常情况下,第二时长大于第一时长,基于此,第二时长也可以称为最大乱序时长。
此外,第一时长是可以基于时延需求动态调整的,与第一时长不同,第二时长可以是 固定的。
步骤403,在等待队列不为空的情况下第一线程进入退避状态,退避状态为等待加入等待队列的状态。
步骤403与步骤204类似,具体可参照前文中步骤204的相关说明对步骤403进行理解。
步骤404,基于进入退避状态的时长大于或等于乱序时长,加入等待队列。
步骤404与步骤205类似,具体可参照前文中步骤205的相关说明对步骤404进行理解。
步骤405,基于进入退避状态的时长小于乱序时长,检测等待队列。
步骤405与步骤206类似,具体可参照前文中步骤206的相关说明对步骤405进行理解。
步骤406,在等待队列为空的情况下,加入等待队列。
步骤406与步骤207类似,具体可参照前文中步骤207的相关说明对步骤406进行理解。
为了便于理解,下面结合图7对第一线程加入等待队列的过程进行说明。
具体地,第一线程确定第一核心的优先级低于第二核心的优先级,然后进入退避状态(即图7所示的a.延迟加入等待队列);当第一线程进入退避状态后,运行在第二核心下的第二线程加入等待队列队尾,锁会由锁的持有者依次在等待队列中向下传递,最终传递至第二线程。
当乱序时长结束(图7以b示出)后,或当等待队列为空(图7以c示出)时,第一线程加入等待队列队尾,以获取锁。
上文对本申请实施例提供的方法进行了说明,下文对本申请实施例提供的方法的应用场景进行说明。
具体地,本申请实施例提供的方法可以应用于任意应用程序,具体应用方式包括多种。
示例性地,对于一个普通的应用程序来说,可以通过修改该应用程序的代码来实现本申请实施例提供的方法;例如,应用程序包括一个代码段,该代码段中包含加锁代码,可以对该代码段中的加锁代码进行修改,以使得第一线程在运行修改后的加锁代码的过程中,执行上述第一操作;除此之外,还可以在加锁代码前和加锁代码后添加相应的代码,使得第一线程在运行加锁代码前的代码的过程中,执行上述第二操作,在运行加锁代码后的代码的过程中,执行上述第三操作。
可以理解的是,大量修改应用程序的代码会带来较大的工作量,为了减少工作量且增强本申请实施例提供的方法的实用性,可以采用下面的方法对应用程序进行改进。
如图8所示,应用程序包括时延关键代码片段(即前文中的第一代码段),本申请实施例将该时延关键代码片段称为epoch;该时延关键代码片段中包含与互斥锁相关的代码,在运行该与互斥锁相关的代码时,会执行与互斥锁相关的操作,该与互斥锁相关的操作可以包括调用函数pthread_mutex_lock,函数pthread_mutex_lock用于使得线程获取到互斥锁。
为了尽可能地不修改上述应用程序原有的代码,本申请实施例增加了一个目标库,该目标库中包含与互斥锁相关的代码,当运行目标库中与互斥锁相关的代码时,也会执行与互斥锁相关的操作;并且,采用重定向的方式(图8采用③标记),将运行上述时延关键代码片段时所执行的与互斥锁相关的操作,重定向到运行目标库内与互斥锁相关的代码时所执行的与互斥锁相关的操作(例如第一操作);这样,当第一线程调用函数pthread_mutex_lock时,会自动执行第一操作(图8采用④标记),该第一操作用于使得第一线程获取到本申请实施例提供的可乱序锁(图8采用⑤标记),基于前述说明可知,该可乱序锁是基于已有的FIFO锁建立的。
其中,重定向(Redirect)就是通过各种方法将各种网络请求重新定个方向转到其它位置。
由此可见,在不存在时延需求时,对于已有的应用程序,本申请实施例仅需要增加一个目标库,并将第一代码段中的与互斥锁相关的操作重定向到目标库中的与互斥锁相关的操作即可,不需要修改应用程序的代码。
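重定向的思路可以用如下简化的示意代码说明(假设性示例):真实实现通常借助动态链接的符号插入完成,这里仅用函数指针演示“同名加锁调用被转到目标库实现”的效果。

```c
#include <pthread.h>

static int first_operation_done = 0;

/* 目标库中的加锁实现:先执行第一操作,再按队列顺序获取锁 */
static int libasl_mutex_lock(pthread_mutex_t *m) {
    first_operation_done = 1;      /* 此处示意执行第一操作(判断核心优先级、延迟入队等) */
    return pthread_mutex_lock(m);
}

/* 应用程序实际调用的入口:默认指向原始的 pthread_mutex_lock */
static int (*mutex_lock_impl)(pthread_mutex_t *) = pthread_mutex_lock;

/* 重定向(对应图8中③):把加锁入口指向目标库的实现 */
static void redirect_to_target_lib(void) {
    mutex_lock_impl = libasl_mutex_lock;
}
```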
基于前文的说明可知,若第一代码段存在时延需求,第一线程在执行第一操作前,还需要执行第二操作,在执行第一操作后,还需要执行第三操作。
第二操作和第三操作的实现方法如下:在第一代码段前添加接口epoch_start(该接口标记epoch的开始,图8采用①标记)和接口epoch_end(该接口标记epoch的结束,图8采用②标记),目标库中还包括通过接口epoch_start调用的代码和通过接口epoch_end调用的代码;当第一线程调用接口epoch_start,则会运行目标库中的代码以执行前文中的第二操作,当第一线程调用接口epoch_end,则会运行目标库中的代码以执行前文中的第三操作。
由此可见,即使第一代码段存在时延需求,也仅需要在应用程序的代码中添加两个接口即可,不需要对应用程序的代码进行过多的修改,工作量较少。
基于前文的说明可知,第一线程在运行第一代码段的过程中,会执行第一操作,从而获取到可乱序锁;需要说明的是,本申请实施例对可乱序锁的数量不做具体限定,可乱序锁的数量可以为1个,也可以为多个;当可乱序锁的数量为多个时,则意味着第一线程需要执行多次第一操作,以获取多个可乱序锁,多个可乱序锁可以为不同的锁。
此外,当可乱序锁的数量为多个时,多个可乱序锁之间可以是嵌套的关系,也可以通过条件变量或trylock等接口关联。
基于前文三个实施例的说明可知,第一线程加入等待队列的情况包括三种,第一种情况是:第一核心的优先级高于第二核心的优先级;第二种情况是:第一核心的优先级低于第二核心的优先级,但第一代码段存在时延需求;第三种情况是:第一核心的优先级低于第二核心的优先级,但第一代码段不存在时延需求。
对应这三种情况,第一线程执行的操作不同,为了便于区分,本申请实施例为每种情况设定了相应的接口。
具体地,如图9所示,假设计算机设备的多个核心对应两个优先级;第一线程加入等待队列包括:
第一线程先判断第一核心是否为大核(即是否为高优先级);
若第一核心为大核(即高优先级),则第一线程调用接口lock_immediately,以直接加入等待队列,最终加锁成功;
若第一核心不是大核(即低优先级),则第一线程判断是否在epoch中(即判断是否存在时延需求);
若第一核心不是大核,且在epoch中(即存在时延需求),则第一线程调用接口lock_reorder,以通过执行步骤203至步骤210加入等待队列,最终加锁成功;
若第一核心不是大核,且不在epoch中(即不存在时延需求),则第一线程调用接口lock_eventually,以通过执行步骤402至步骤406加入等待队列,最终加锁成功。
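图9所示的三条加锁路径的选择逻辑可以用如下示意代码概括(假设性示例,枚举名与接口名对应上文的三个接口):

```c
/* 三条加锁路径,分别对应接口 lock_immediately、lock_reorder、lock_eventually */
typedef enum {
    LOCK_IMMEDIATELY,  /* 大核(高优先级):直接加入等待队列 */
    LOCK_REORDER,      /* 小核且在 epoch 中(存在时延需求):按第二实施例延迟加入 */
    LOCK_EVENTUALLY    /* 小核且不在 epoch 中(不存在时延需求):按第三实施例延迟加入 */
} lock_path_t;

static lock_path_t choose_lock_path(int is_big_core, int in_epoch_now) {
    if (is_big_core)
        return LOCK_IMMEDIATELY;
    return in_epoch_now ? LOCK_REORDER : LOCK_EVENTUALLY;
}
```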
可以理解的是,在获取锁的过程中,第一线程需要先加入等待队列,上文对第一线程加入等待队列的三种情况进行了介绍;在获取锁后,第一线程会执行临界区;在执行临界区后,第一线程会释放锁,下面对释放锁的过程进行简单说明。
基于前文说明可知,第一代码段中的与互斥锁相关的操作可以包括调用函数pthread_mutex_lock,函数pthread_mutex_lock用于使得线程获取到互斥锁;除此之外,第一代码段中的与互斥锁相关的操作还可以包括调用函数pthread_mutex_unlock,函数pthread_mutex_unlock用于使得线程释放锁。
由于第一代码段中的与互斥锁相关的代码,被重定向到目标库内部的与互斥锁相关的代码,所以当第一线程调用函数pthread_mutex_unlock时,会自动执行目标库内部的释放可乱序锁的操作,其中,对于前述加入等待队列的三种情况,释放可乱序锁的操作都相同。
为了体现本申请实施例提供的方法的有益效果,申请人进行了如下的测试。
图10示出了使用目标库(LibASL)的吞吐率与使用其他锁的(如自旋锁Spinlock、排号Ticket锁、MCS锁与Pthread互斥锁)吞吐率的比例,以及代码段实际运行的时长是否能够达到设定的目标时延SLO;其中,横轴代表设定不同的目标时延SLO,纵轴代表吞吐率的比例。
如图10所示,当设定时延为6×10^4 ns时,使用目标库(LibASL),LibASL的吞吐率为自旋锁(Spinlock)的吞吐率的1.2倍(即性能提升20%);并且,此时,所有核心的实际尾时延(Total Tail)与设定的SLO(6×10^4 ns)之间为0.948倍,小核的尾时延与设定的SLO(6×10^4 ns)之间为1.002倍(图10未示出)。
除此之外,图10还使用纵向的虚线来表示使用其他的锁时的时延。例如,使用自旋锁时,所有核心的尾时延(Spin Total)为10.4×10^4 ns,而小核的尾时延(Spin Little)为13.1×10^4 ns。
由此可见,当设置合理的SLO时,LibASL的尾时延(无论是小核还是全部核心的尾时延)都能得到保证(即小于SLO);而在吞吐率方面,与排号Ticket锁、MCS锁与Pthread互斥锁相比,LibASL可以在保证满足目标时延的前提下将吞吐率提升40%到66%;而与自旋锁相比,在拥有相同的小核尾时延的同时,LibASL可以将吞吐率提升40%。
图11展示了测试开始的350ms中每一个epoch的时延。图中横轴表示0到350ms的时间轴,而纵轴表示了epoch的时延。图中横实线(10×10^4 ns)标示当前设定的时延SLO的大小。在测试中,我们分别在100ms增大epoch长度(即执行所需时间)8倍;在200ms将epoch的长度恢复为初始长度;在250ms开始频繁变动epoch的长度;最后在300ms将epoch的长度扩大128倍。图中红色与绿色的点分别表示一个小核与一个大核上所有epoch的执行时延。
从图11中可以看到,在0到25ms,小核上的乱序时长逐步扩大,导致时延也逐步扩大,最终在25ms时达到近似设定的SLO的大小。此时大核上的时延也逐步减少。而在25ms到100ms时,乱序时长大小将往复调整,保证时延在SLO允许的限度内波动。当100ms将epoch长度扩大8倍时,LibASL能够快速适应新的epoch长度,缩小乱序时长的大小,保证epoch的时延在设定的SLO范围内。同样的,当在200ms恢复epoch长度时,LibASL也能快速调整乱序时长的大小最大化吞吐率。当epoch的长度在250ms开始频繁波动时,LibASL同样能够保证时延在SLO内。最后在300ms时,由于epoch的长度扩大128倍,导致设定的SLO无法保证,因此此时不采用LibASL,尽可能保证所有核心上的时延一致。因此可以从图11中看到,此时大核小核上的时延相同。
如图12所示,本申请实施例还提供了一种获取锁资源的装置,应用于计算机设备,计算机设备的处理器包括第一核心和第二核心;装置包括:确定单元601,用于确定第一核心的优先级低于第二核心的优先级;队列加入单元602,用于在等待队列不为空的情况下,延迟加入等待队列,以使得运行在第二核心下的第二线程先于第一线程加入等待队列,等待队列用于竞争锁资源。
作为一种可实现的方式,确定单元601,用于基于第二核心的性能优于第一核心的性能,确定第一核心的优先级低于第二核心的优先级。
作为一种可实现的方式,队列加入单元602,用于在等待队列不为空的情况下第一线程进入退避状态,退避状态为等待加入等待队列的状态;基于进入退避状态的时长大于或等于乱序时长,加入等待队列。
作为一种可实现的方式,队列加入单元602,用于获取乱序时长。
作为一种可实现的方式,队列加入单元602,用于基于第一代码段存在时延需求,获取第一代码段对应的第一时长作为乱序时长,第一时长是基于时延需求获取的。
作为一种可实现的方式,装置还包括:第一设置单元603,用于将全局变量的取值设置为第一代码段的第一标识,全局变量表示存在时延需求的代码段;队列加入单元602,还用于基于全局变量的取值为第一标识,确定第一代码段存在时延需求。
作为一种可实现的方式,装置还包括:第二设置单元604,用于将全局变量的取值设置为第二标识,第二标识不标识任何一个代码段。
作为一种可实现的方式,装置还包括:调整单元605,用于获取第一代码段的实际运行的时长;基于实际运行的时长与时延需求规定的目标时延间的相对大小,调整第一时长。
作为一种可实现的方式,调整单元605,用于基于实际运行的时长大于目标时延,缩短第一时长。
作为一种可实现的方式,调整单元605,用于基于实际运行的时长小于目标时延,延长第一时长。
作为一种可实现的方式,队列加入单元602,用于基于第一代码段不存在时延需求,获取第二时长作为乱序时长。
作为一种可实现的方式,队列加入单元602,用于基于全局变量的取值为第二标识, 确定第一代码段不存在时延需求,全局变量表示存在时延需求的代码段,第二标识不标识任何一个代码段。
作为一种可实现的方式,队列加入单元602,用于基于进入退避状态的时长小于乱序时长,检测等待队列;在等待队列为空的情况下,加入等待队列。
作为一种可实现的方式,队列加入单元602,用于在等待队列为空的情况下,加入等待队列。
其中,以上各单元的具体实现、相关说明以及技术效果请参考前述方法部分的描述。
请参阅图13,图13是本申请实施例提供的计算机设备的一种结构示意图,用于实现图11对应实施例中获取锁资源的装置的功能,具体的,计算机设备1800由一个或多个服务器实现,计算机设备1800可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1822(例如,一个或一个以上处理器)和存储器1832,一个或一个以上存储应用程序1842或数据1844的存储介质1830(例如一个或一个以上海量存储设备)。其中,存储器1832和存储介质1830可以是短暂存储或持久存储。存储在存储介质1830的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对计算机设备中的一系列指令操作。更进一步地,中央处理器1822可以设置为与存储介质1830通信,在计算机设备1800上执行存储介质1830中的一系列指令操作。
计算机设备1800还可以包括一个或一个以上电源1826,一个或一个以上有线或无线网络接口1850,一个或一个以上输入输出接口1858,和/或,一个或一个以上操作系统1841,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
本申请实施例中,中央处理器1822,可以用于执行图11对应实施例中获取锁资源的装置执行的获取锁资源的方法。具体的,中央处理器1822,可以用于:
通过运行在第一核心上的第一线程运行第一代码段,以执行以下第一操作:
确定第一核心的优先级低于第二核心的优先级;
在等待队列不为空的情况下,延迟加入等待队列,以使得运行在第二核心下的第二线程先于第一线程加入等待队列,等待队列用于竞争锁资源。
本申请实施例还提供一种芯片,包括一个或多个处理器。所述处理器中的部分或全部用于读取并执行存储器中存储的计算机程序,以执行前述各实施例的方法。
可选地,该芯片该包括存储器,该存储器与该处理器通过电路或电线与存储器连接。进一步可选地,该芯片还包括通信接口,处理器与该通信接口连接。通信接口用于接收需要处理的数据和/或信息,处理器从该通信接口获取该数据和/或信息,并对该数据和/或信息进行处理,并通过该通信接口输出处理结果。该通信接口可以是输入输出接口。
在一些实现方式中,所述一个或多个处理器中还可以有部分处理器是通过专用硬件的方式来实现以上方法中的部分步骤,例如涉及神经网络模型的处理可以由专用神经网络处理器或图形处理器来实现。
本申请实施例提供的方法可以由一个芯片实现,也可以由多个芯片协同实现。
本申请实施例还提供了一种计算机存储介质,该计算机存储介质用于储存为上述计算机设备所用的计算机软件指令,其包括用于执行为计算机设备所设计的程序。
该计算机设备可以如前述图11对应实施例中获取锁资源的装置。
本申请实施例还提供了一种计算机程序产品,该计算机程序产品包括计算机软件指令,该计算机软件指令可通过处理器进行加载来实现前述各个实施例所示的方法中的流程。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (18)

  1. 一种获取锁资源的方法,其特征在于,应用于计算机设备,所述计算机设备的处理器包括第一核心和第二核心;
    所述方法包括:
    通过运行在所述第一核心上的第一线程运行第一代码段,以执行以下第一操作:
    确定所述第一核心的优先级低于所述第二核心的优先级;
    在等待队列不为空的情况下,延迟加入所述等待队列,以使得运行在所述第二核心下的第二线程先于所述第一线程加入所述等待队列,所述等待队列用于竞争锁资源。
  2. 根据权利要求1所述的方法,其特征在于,所述确定所述第一核心的优先级低于所述第二核心的优先级包括:
    基于所述第二核心的性能优于所述第一核心的性能,确定所述第一核心的优先级低于所述第二核心的优先级。
  3. 根据权利要求1或2所述的方法,其特征在于,所述在等待队列不为空的情况下,延迟加入所述等待队列包括:
    在所述等待队列不为空的情况下,所述第一线程进入退避状态,所述退避状态为等待加入所述等待队列的状态;
    基于所述进入退避状态的时长大于或等于乱序时长,加入所述等待队列。
  4. 根据权利要求3所述的方法,其特征在于,所述第一操作还包括:
    获取所述乱序时长。
  5. 根据权利要求4所述的方法,其特征在于,所述获取所述乱序时长包括:
    基于所述第一代码段存在时延需求,获取所述第一代码段对应的第一时长作为所述乱序时长,所述第一时长是基于所述时延需求获得的。
  6. 根据权利要求5所述的方法,其特征在于,所述方法还包括:
    在运行所述第一代码段前,通过所述第一线程执行以下第二操作:
    将全局变量的取值设置为所述第一代码段的第一标识,所述全局变量表示存在时延需求的代码段;
    在所述基于第一代码段存在时延需求,将所述第一代码段对应的第一时长作为所述乱序时长之前,所述第一操作还包括:
    基于所述全局变量的取值为所述第一标识,确定所述第一代码段存在时延需求。
  7. 根据权利要求6所述的方法,其特征在于,所述方法还包括:
    在运行所述第一代码段后,通过所述第一线程执行以下第三操作:
    将所述全局变量的取值设置为第二标识,所述第二标识不标识任何一个代码段。
  8. 根据权利要求5或6所述的方法,其特征在于,所述方法还包括:
    在运行所述第一代码段后,通过所述第一线程执行以下第三操作:
    获取所述第一代码段的实际运行的时长;
    基于所述实际运行的时长与所述时延需求规定的目标时延间的相对大小,调整所述第一时长。
  9. 根据权利要求8所述的方法,其特征在于,所述基于所述实际运行的时长与目标时延间的相对大小,调整所述第一时长包括:
    基于所述实际运行的时长大于所述目标时延,缩短所述第一时长。
  10. 根据权利要求8所述的方法,其特征在于,所述基于所述实际运行的时长与目标时延间的相对大小,调整所述第一时长包括:
    基于所述实际运行的时长小于所述目标时延,延长所述第一时长。
  11. 根据权利要求4所述的方法,其特征在于,所述获取所述乱序时长包括:
    基于所述第一代码段不存在时延需求,获取第二时长作为所述乱序时长。
  12. 根据权利要求11所述的方法,其特征在于,在所述基于所述第一代码段不存在时延需求,获取第二时长作为所述乱序时长之后,所述第一操作还包括:
    基于全局变量的取值为第二标识,确定所述第一代码段不存在时延需求,所述全局变量表示存在时延需求的代码段,所述第二标识不标识任何一个代码段。
  13. 根据权利要求3所述的方法,其特征在于,所述在等待队列不为空的情况下,延迟加入所述等待队列还包括:
    基于所述进入退避状态的时长小于所述乱序时长,检测所述等待队列;
    在所述等待队列为空的情况下,加入所述等待队列。
  14. 根据权利要求1至12中任意一项所述的方法,其特征在于,在所述确定所述第一核心的优先级低于所述第二核心之后,所述第一操作还包括:
    在所述等待队列为空的情况下,加入所述等待队列。
  15. 一种获取锁资源的装置,其特征在于,应用于计算机设备,所述计算机设备的处理器包括第一核心和第二核心;
    所述装置包括:
    确定单元,用于确定所述第一核心的优先级低于所述第二核心的优先级;
    队列加入单元,用于在所述等待队列不为空的情况下,延迟加入所述等待队列,以使得运行在所述第二核心下的第二线程先于所述第一线程加入所述等待队列,所述等待队列用于竞争锁资源。
  16. 一种计算机设备,其特征在于,包括存储器和处理器,其中,所述存储器用于存储计算机可读指令;所述处理器用于读取所述计算机可读指令并实现如权利要求1-14任意一项所述的方法。
  17. 一种计算机存储介质,其特征在于,存储有计算机可读指令,且所述计算机可读指令在被处理器执行时实现如权利要求1-14任意一项所述的方法。
  18. 一种计算机程序产品,其特征在于,所述计算机程序产品中包含计算机可读指令,当该计算机可读指令被处理器执行时实现如权利要求1-14任意一项所述的方法。
PCT/CN2022/125241 2021-10-21 2022-10-14 一种获取锁资源的方法、装置及设备 WO2023066141A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111227144.7 2021-10-21
CN202111227144.7A CN116010040A (zh) 2021-10-21 2021-10-21 一种获取锁资源的方法、装置及设备

Publications (1)

Publication Number Publication Date
WO2023066141A1 true WO2023066141A1 (zh) 2023-04-27

Family

ID=86032226

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125241 WO2023066141A1 (zh) 2021-10-21 2022-10-14 一种获取锁资源的方法、装置及设备

Country Status (2)

Country Link
CN (1) CN116010040A (zh)
WO (1) WO2023066141A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453378A (zh) * 2023-12-25 2024-01-26 北京卡普拉科技有限公司 多应用程序间i/o请求调度方法、装置、设备及介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110185154A1 (en) * 2008-07-22 2011-07-28 Elektrobit Automotive Software Gmbh Synchronization of multiple processor cores
CN105071973A (zh) * 2015-08-28 2015-11-18 迈普通信技术股份有限公司 一种报文接收方法及网络设备
CN107329810A (zh) * 2016-04-28 2017-11-07 飞思卡尔半导体公司 用于多核处理器的信号机
US20180293113A1 (en) * 2017-04-05 2018-10-11 Cavium, Inc. Managing lock and unlock operations using active spinning
CN112765088A (zh) * 2019-11-04 2021-05-07 罗习五 利用数据标签提高多计算单元平台上数据共享的方法



Also Published As

Publication number Publication date
CN116010040A (zh) 2023-04-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22882750

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022882750

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022882750

Country of ref document: EP

Effective date: 20240416

NENP Non-entry into the national phase

Ref country code: DE