CN117009079A - Method and device for accessing critical section - Google Patents



Publication number
CN117009079A
Authority
CN
China
Prior art keywords
processor core
processor
instruction
critical section
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310962715.4A
Other languages
Chinese (zh)
Inventor
王振
邵立松
闫志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Phytium Technology Co Ltd
Original Assignee
Phytium Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Phytium Technology Co Ltd
Priority to CN202310962715.4A
Publication of CN117009079A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022 Mechanisms to release resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Power Sources (AREA)

Abstract

The application provides a method and a device for accessing a critical section, wherein the method comprises the following steps: if the lock of the critical section is occupied by a first processor core, a second processor core enters a low power consumption state; after the first processor core releases the lock of the critical section, the first processor core wakes up the second processor core in the low power consumption state based on a first instruction, wherein the first instruction carries a first parameter, and the first parameter is used for indicating that a target processor core of the first instruction is the second processor core; after the second processor core is awakened, the second processor core preempts the lock of the critical section such that the second processor core accesses the critical section. By introducing the first parameter into the instruction for waking up processor cores, the application can indicate the target processor core to be woken up and avoids the high power consumption caused by waking up all the processor cores.

Description

Method and device for accessing critical section
Technical Field
The application relates to the technical field of information technology, in particular to a method and a device for accessing a critical section.
Background
In a multi-core processor, in order to guarantee the atomicity of data accessed by the multi-core processor, a lock mechanism is generally used to protect data structures in a critical section. When a lock within a critical section is released by one processor core, the processor core wakes up all other processor cores in a low power state with a wake instruction. This wake-up approach of the processor core introduces many inefficient operations, thereby increasing the power consumption of the processor core.
Disclosure of Invention
The present application is directed to a method and apparatus for accessing critical sections, which are described in several aspects below.
In a first aspect, a method of accessing a critical section is provided, the method being applied to a processor, the processor including a first processor core and a second processor core, the method comprising: if the lock of the critical section is occupied by the first processor core, the second processor core enters a low power consumption state; after the first processor core releases the lock of the critical section, the first processor core wakes up the second processor core in the low power consumption state based on a first instruction, wherein the first instruction carries a first parameter, and the first parameter is used for indicating that a target processor core of the first instruction is the second processor core; after the second processor core is awakened, the second processor core preempts the lock of the critical section such that the second processor core accesses the critical section.
In a second aspect, there is provided a processor comprising: a storage section for storing code; a first processor core and a second processor core for executing the code to perform the steps of: if the lock of the critical section is occupied by the first processor core, the second processor core enters a low power consumption state; after the first processor core releases the lock of the critical section, the first processor core wakes up the second processor core in the low power consumption state based on a first instruction, wherein the first instruction carries a first parameter, and the first parameter is used for indicating that a target processor core of the first instruction is the second processor core; after the second processor core is awakened, the second processor core preempts the lock of the critical section such that the second processor core accesses the critical section.
In a third aspect, there is provided an electronic device comprising a processor as described in the second aspect.
In a fourth aspect, a computer readable storage medium is provided, the computer readable storage medium storing program code which, when run on a computer, causes the computer to perform the method of accessing a critical section according to the first aspect.
In a fifth aspect, a computer program product is provided, the computer program product comprising a computer program/instructions which, when executed by a processor, implement the method of accessing a critical section according to the first aspect.
In some implementations, the computer program product includes computer program code that, when run on a computer, causes the computer to perform the method of accessing a critical section according to the first aspect.
The application introduces a first parameter in the wake-up instruction. The first parameter may indicate the destination processor core to be awakened, so that the high power consumption caused by awakening all processor cores in a low power consumption state may be avoided.
Drawings
FIG. 1 is a schematic flow diagram of CPU pipeline technology.
FIG. 2 is a diagram showing the coding structure of the A64 instruction set.
FIG. 3 is a schematic flow diagram illustrating processor core access to critical sections in a multi-core processor employing spin locking.
FIG. 4 is a schematic flow chart of a method for accessing critical sections according to an embodiment of the present application.
FIG. 5 is a schematic flow chart showing a possible implementation of step S420 in FIG. 4.
FIG. 6 is a schematic diagram of a coding structure of a target instruction according to an embodiment of the present application.
FIG. 7 is a schematic diagram of a hardware processing flow of a method for accessing a critical section according to an embodiment of the present application.
FIG. 8 is a schematic structural diagram of a processor according to an embodiment of the present application.
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application.
The performance of the processor is particularly important for electronic devices. Processor performance depends largely on the processor architecture, that is, the design and implementation of the processor's instruction set, which determines characteristics of the processor such as performance, power consumption, and compatibility. Processor architectures can be broadly divided into two classes: complex instruction set computers (CISC) and reduced instruction set computers (RISC).
Of the two processor architectures described above, RISC uses simpler and more regular instructions, which allows program execution to be pipelined efficiently. A RISC architecture relies on CPU pipelining, which divides the process of executing an instruction into multiple stages, with each stage being processed by a different unit. The CPU pipeline typically divides execution of an instruction into five stages: a fetch stage, a decode stage, an execute stage, a memory access stage, and a write-back stage. During these five stages, the transfer of instructions among the units of the processor may be as shown in FIG. 1.
The fetch stage is the process of fetching instructions from memory. Processor instructions may generally include memory access instructions, arithmetic operation instructions, logic operation instructions, and the like. The fetch stage is performed by the fetch unit 110 of the processor, and the fetch unit 110 fetches instructions from memory and passes the instructions to the decode unit 120.
The decode stage is the process of translating instructions fetched from memory and is typically performed by the decode unit 120 of the processor. The decoding by the decoding unit 120 may result in operations to be performed by the instruction, such as performing operations or performing memory accesses. The destination register index of the instruction may also be obtained by decoding, which may be used to read operands from or store results into the destination register. As shown in fig. 1, decode unit 120 passes the translated instruction to issue logic 130, where issue logic 130 is responsible for issuing the instruction to execution unit 140.
The execute stage refers to the process of actually performing the operation specified by an instruction. For example, if the instruction is an add instruction, the operands are added; if it is a subtract instruction, the operands are subtracted. After an instruction arrives at the execution unit 140, it is executed by the processing unit corresponding to the instruction. For example, the execute stage of an arithmetic operation instruction may be performed by the integer unit 141 or the floating-point unit 142 of the processor.
The memory access stage refers to the process (not shown) by which memory access instructions read data from or write data to memory, and may be performed by the load/store unit 143 of the processor.
The write-back stage is the process of writing the result of instruction execution back to the general register set (write-back unit 150), and is handled by the execution unit corresponding to the instruction. For example, for an arithmetic operation instruction, the result is the value calculated in the execute stage, and it is written back into the instruction's destination register by the integer unit 141 or the floating-point unit 142; for a memory access instruction, the result is the data read from memory during the memory access stage, and it is written back to the instruction's destination register by the load/store unit 143.
As shown in fig. 1, the CPU pipeline is completed by the cooperation of the units of the processor, so that the processor can effectively perform various computing tasks. Through CPU pipeline technology, the overlapping of each stage of different instructions can be realized, so that the parallel processing of several instructions is realized, and the running process of a program is accelerated. Therefore, based on CPU pipeline technology, RISC architecture can realize efficient execution of programs.
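The overlap of stages described above can be illustrated with a small timing model. The following is a minimal sketch, assuming an ideal five-stage pipeline with no stalls or hazards; the function names and the example program are hypothetical and are not part of the application.

```python
# Idealized five-stage pipeline: one instruction enters per cycle, no stalls.
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def pipeline_schedule(instructions):
    """Map each cycle (starting at 1) to the (instruction, stage) pairs active in it."""
    schedule = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            # Instruction i enters the pipeline at cycle i + 1 and advances one stage per cycle.
            schedule.setdefault(i + s + 1, []).append((instr, stage))
    return schedule


def total_cycles(n_instructions, pipelined=True):
    """Cycles needed to finish n instructions with or without stage overlap."""
    if n_instructions == 0:
        return 0
    if pipelined:
        return len(STAGES) + n_instructions - 1
    return len(STAGES) * n_instructions


program = ["add", "ldr", "sub", "str"]  # hypothetical instruction sequence
schedule = pipeline_schedule(program)
```

In this model, four instructions finish in 8 cycles with overlap instead of 20 without it, which is the acceleration the paragraph above refers to.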
The Advanced RISC Machines (ARM) architecture is a typical RISC architecture. As technology has evolved, ARM has gone through many versions. For example, the ARMv8 architecture is one of the ARM architectures in widespread use today, and it aims to provide higher performance and lower power consumption. Compared with previous versions, the ARMv8 architecture supports a larger memory address space and higher computing performance, and also supports functions such as virtualization and security extensions, so it is widely used in mobile devices, servers, Internet of Things devices, and other fields. For example, many mobile devices such as smartphones, tablet computers, and notebook computers use a processor with the ARMv8 architecture. In the server field, the ARMv8 architecture is well suited to cloud computing and big data processing. In the field of Internet of Things devices, smart home products such as smart speakers and smart door locks, as well as autonomous vehicles, also use processors with the ARMv8 architecture.
The ARMv8 architecture supports two execution modes: a 64-bit mode and a 32-bit mode. The 64-bit mode may run a 64-bit application while the 32-bit mode is compatible with previous ARM architecture versions and may run a 32-bit application. Therefore, the ARMv8 architecture processor can run 32-bit and 64-bit application programs simultaneously, and provides more powerful computing power and wider application scenarios.
The ARMv8 architecture also provides more efficient registers than previous versions. ARMv8 provides 31 64-bit general purpose registers R0-R30 that are accessible at all times and at all exception levels. In the 32-bit execution mode, each general purpose register is 32 bits wide. In the 64-bit execution mode, each register is 64 bits wide. The increased width helps to reduce register pressure in most applications and thereby improves performance.
The ARMv8 architecture also uses a more compact A64 instruction set. An instruction set is the set of meaningful machine codes that the processor can recognize, and the design of the instruction set is the most important part of the processor architecture. The A64 instruction set supports 64-bit registers, instructions, and memory addresses. The A64 instruction set comprises data processing instructions, memory access instructions, miscellaneous instructions, and the like. The data processing instructions include arithmetic and logical operation instructions, multiply and divide instructions, conditional instructions, and the like. Memory access instructions include load instructions, store instructions, and the like. Miscellaneous instructions include branch instructions, exception handling instructions, system instructions, hint instructions, and the like.
The A64 instruction set has a fixed instruction format and a short instruction length, supports efficient control operations, and enables a high-speed pipeline with a simple structure. Each A64 instruction of ARMv8 is 32 bits wide. The general instruction encoding structure is shown in FIG. 2 (a), where bits 28-25 are the most significant opcode (opcode 0, op0), which indicates the type of instruction operation and differs between instruction types. For example, the op0 of a data processing instruction is x101, the op0 of a load/store instruction is x1x0, and the op0 of a miscellaneous instruction is 101x. Each instruction type may be further divided into subtypes, so the specific encoding structure varies between instruction types. Taking the miscellaneous instructions as an example, their general encoding structure is shown in FIG. 2 (b). Bits 31-29 are the most significant opcode op0 of the miscellaneous instructions, bits 28-26 are fixed to 101 to indicate that the instruction is a miscellaneous instruction, bits 25-12 are the next-higher opcode (opcode 1, op1), and bits 4-0 are the next-lower opcode (opcode 2, op2); op1 and op2 further determine the subtype of the instruction. Taking the hint instruction among the miscellaneous instructions as an example, op0 is 110, op1 is 01000000110010, and op2 is 11111.
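The top-level classification by op0 can be sketched as a pattern match on bits 28-25. This is a minimal illustration, assuming only the three op0 patterns quoted above, with x treated as a don't-care bit; the helper names are hypothetical, and the example words in the comment are commonly cited A64 encodings used purely for illustration.

```python
def field(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def matches(bits, pattern):
    """Compare a bit string with a pattern in which 'x' matches either bit value."""
    return len(bits) == len(pattern) and all(p in ("x", b) for b, p in zip(bits, pattern))


# op0 patterns for bits 28-25, as listed in the description.
TOP_LEVEL = [
    ("data processing", "x101"),
    ("load/store", "x1x0"),
    ("miscellaneous", "101x"),
]


def classify(word):
    """Return the top-level instruction type selected by op0, or 'other'."""
    op0 = format(field(word, 28, 25), "04b")
    for name, pattern in TOP_LEVEL:
        if matches(op0, pattern):
            return name
    return "other"


# Illustrative words: 0x8B020020 (ADD X0, X1, X2), 0xF9400020 (LDR X0, [X1]),
# 0xD503201F (NOP, a hint instruction).
kinds = [classify(w) for w in (0x8B020020, 0xF9400020, 0xD503201F)]
```

The three patterns are mutually exclusive, so the order of the table does not affect the result.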
Furthermore, hint instructions can be divided into more specific instructions, such as send event (SEV) instructions, send event local (SEVL) instructions, error synchronization barrier (ESB) instructions, wait for event (WFE) instructions, wait for event with timeout (WFET) instructions, etc. The more specific encoding format of the hint instruction is shown in FIG. 2 (c). Bits 11-8 are the control register (CRm) bits, a field used to specify a particular control register. Bits 7-5 are op2, which determines the further subdivision of the instruction. For example, the CRm of the SEV instruction is 0000 and op2 is 100; the CRm of the SEVL instruction is 0000 and op2 is 101; the CRm of the ESB instruction is 0010 and op2 is 000.
The ARMv8 architecture also provides support for systems that contain multiple processors, i.e., multiprocessor systems. A multiprocessor system may include a plurality of single-core processors or a plurality of multi-core processors of the ARMv8 architecture. A multi-core processor is a processor that contains multiple processing cores, each of which can independently execute instructions. A multi-core processor may have different numbers of cores, for example two, four, six, or eight. For example, the Cortex-A57 MPCore and Cortex-A53 MPCore processors may each contain one to four cores. A multi-core processor can process multiple tasks simultaneously, thereby improving processing speed and efficiency, especially in applications requiring a large amount of parallel processing. The overall power consumption of a multi-core processor may be significantly lower than that of a system based on a single processor core. Because multiple cores may complete program execution faster, certain elements of the system may be shut down completely for longer periods. In addition, a system with multiple cores may operate at a lower frequency than a single processor to achieve the same throughput, and the resulting lower supply voltage reduces power consumption. Having multiple cores also provides more options for system configuration, and multi-core devices may respond faster than single-core devices. Therefore, multi-core processors are widely used in high-performance computing, graphics processing, big data processing, games, general application processors, embedded systems, and other fields.
Multiple cores in a multi-core processor share resources such as memory and peripherals during execution of instructions, and multiple cores may access and/or modify certain resources at the same time, e.g., multiple cores may access and/or modify a global variable or data structure at the same time. However, if multiple cores modify the same resource at the same time, the atomicity of the data is destroyed. Atomicity of data means that an operation, or a series of operations, on the data cannot be interrupted: when other programs read the data, they can obtain only the value before or after the operation, never an intermediate value. If the atomicity of the data is compromised, i.e., some processor cores obtain intermediate data while other processor cores are operating on it, the data seen by different processor cores may be inconsistent, which may lead to program errors. In a multiprocessor operating system, therefore, the atomicity of the data accessed by the multi-core processor must be guaranteed.
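The hint encoding can be checked with a short encoder and decoder. This is a minimal sketch, assuming the field positions described for FIG. 2 (b) and FIG. 2 (c) and the published A64 hint encodings (op0 = 110, a 14-bit op1 of 01000000110010, a 5-bit op2 of 11111, and CRm = 0000 with a 3-bit op2 of 100 for SEV); the function names are hypothetical.

```python
def field(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


HINT_OP1 = 0b01000000110010  # bits 25-12 of the hint class (14 bits)

# (CRm, op2 in bits 7-5) -> mnemonic, for the hints named in the description.
HINTS = {
    (0b0000, 0b100): "SEV",
    (0b0000, 0b101): "SEVL",
    (0b0010, 0b000): "ESB",
}


def encode_hint(crm, op2):
    """Assemble a hint instruction word from its CRm and 3-bit op2 fields."""
    return ((0b110 << 29) | (0b101 << 26) | (HINT_OP1 << 12)
            | (crm << 8) | (op2 << 5) | 0b11111)


def decode_hint(word):
    """Return the hint mnemonic, 'unknown hint', or None if word is not a hint."""
    is_hint = (field(word, 31, 29) == 0b110
               and field(word, 28, 26) == 0b101
               and field(word, 25, 12) == HINT_OP1
               and field(word, 4, 0) == 0b11111)
    if not is_hint:
        return None
    return HINTS.get((field(word, 11, 8), field(word, 7, 5)), "unknown hint")


sev_word = encode_hint(0b0000, 0b100)
```

Encoding and then decoding each table entry round-trips to the same mnemonic, which gives a quick consistency check on the field boundaries.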
To guarantee the atomicity of the data accessed by the multi-core processor, the processor architecture provides corresponding mechanisms. For example, in a multi-core processor of the ARMv8 architecture, the atomicity of data in shared memory can be guaranteed by instructions related to exclusive access. These instructions rely on the capability of the core or memory system to tag a particular address, so that an exclusive access monitor can track exclusive accesses to that address. They include load exclusive register (LDXR) instructions, store exclusive register (STXR) instructions, and clear exclusive (CLREX) instructions. When a processor core uses exclusive access instructions, its store-exclusive succeeds only if no other processor core has written to the tagged address since the load-exclusive, so a read-modify-write sequence either completes without interference or is retried, which ensures the atomicity of the data.
In addition, the ARMv8 architecture provides a lock mechanism to control access of multiple processor cores to peripheral devices. The lock mechanism locks a certain memory address, which effectively prevents different processors from changing the same data structure at the same time and thus protects data structures accessed by multiple processors. Common locks include spin locks, mutex locks, and the like. A spin lock is a busy-wait lock: when a processor core attempts to acquire the lock and the lock is already held by another processor core, the processor core repeatedly retries until the lock is released. A mutex lock is a blocking lock: when a processor core attempts to acquire the lock and the lock is already held by another processor core that is accessing the data structure, the requesting processor core cannot operate on the data structure and waits for the lock to be released. Thus, with a mutex lock or a spin lock, only one processor core can use a shared resource at a time. Such a shared resource is referred to as a critical resource, and the section of the program in which a processor core accesses the critical resource is referred to as a critical section. That is, with a mutex lock or a spin lock, only one processor core can execute the critical section code at any point in time, which ensures the consistency of the data operated on in the critical section.
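The behavior of exclusive access can be illustrated with a simplified software model. This is a rough sketch, assuming a single global monitor that tags one address per core and clears every tag on an address when a store-exclusive to it succeeds; real monitors are implementation specific, and the class and method names here are hypothetical.

```python
class ExclusiveMonitor:
    """Toy model of LDXR/STXR/CLREX semantics (not the hardware implementation)."""

    def __init__(self):
        self.memory = {}  # address -> value
        self.tags = {}    # core id -> address tagged by its last load-exclusive

    def load_exclusive(self, core, addr):
        """LDXR-like: read the value and tag the address for this core."""
        self.tags[core] = addr
        return self.memory.get(addr, 0)

    def store_exclusive(self, core, addr, value):
        """STXR-like: return 0 on success, 1 if the core's tag was lost."""
        if self.tags.get(core) != addr:
            return 1
        self.memory[addr] = value
        # A successful store invalidates every core's tag on this address.
        for other in [c for c, a in self.tags.items() if a == addr]:
            del self.tags[other]
        return 0

    def clear_exclusive(self, core):
        """CLREX-like: drop this core's tag."""
        self.tags.pop(core, None)


def atomic_increment(monitor, core, addr):
    """Retry loop: repeat the load/store pair until the store-exclusive succeeds."""
    while True:
        old = monitor.load_exclusive(core, addr)
        if monitor.store_exclusive(core, addr, old + 1) == 0:
            return old + 1
```

If two cores both load the same address exclusively, only the first store-exclusive succeeds; the second returns 1 and that core has to retry, so no update is lost.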
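The difference between a spin lock and a blocking lock can be sketched with threads standing in for processor cores. This is an illustrative sketch only: the non-blocking `acquire` of Python's `threading.Lock` is used as a stand-in for an atomic test-and-set, and the class and function names are hypothetical.

```python
import threading


class SpinLock:
    """Busy-wait lock: a failed attempt is retried immediately instead of blocking."""

    def __init__(self):
        # Stand-in for an atomic flag; acquire(blocking=False) acts as test-and-set.
        self._flag = threading.Lock()

    def acquire(self):
        while not self._flag.acquire(blocking=False):
            pass  # spin until the holder releases the lock

    def release(self):
        self._flag.release()


def run_workers(lock, n_workers=4, n_iters=100):
    """Each worker enters the critical section n_iters times; return the final count."""
    counter = {"value": 0}

    def worker():
        for _ in range(n_iters):
            lock.acquire()
            try:
                counter["value"] += 1  # critical section: shared read-modify-write
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


spin_total = run_workers(SpinLock())
mutex_total = run_workers(threading.Lock())  # blocking lock: waiters sleep instead of spinning
```

Both variants protect the counter equally well; they differ in what a waiting thread does, which is the source of the power consumption concern discussed in this description.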
For ease of understanding, the process by which a certain processor core accesses critical sections is described below in connection with FIG. 3. FIG. 3 is a schematic flow diagram of a method for processor cores to access critical sections in a multi-core processor employing spin locking. The method shown in fig. 3 includes steps S310-S330.
In step S310, to access the data structure of the critical section, the processor core first determines whether the critical section can be entered, which it may do by reading the lock identifier. The lock identifier indicates the state of the lock and may be a variable or data structure, such as a simple Boolean variable or an integer variable. Taking a lock represented by an integer variable as an example, the lock is occupied when the variable is 1 and available when the variable is 0.
In step S320, if the lock is not occupied by another processor core, the current processor core occupies the lock and enters the critical section. If the lock is already occupied by another processor core, the current processor core may enter a low power state and wait for the other processor core to release the lock. Conventionally, a processor core that finds the lock occupied when accessing a critical section does not enter a low power state on its own, but keeps attempting to acquire the lock until it succeeds. This increases the power consumption of the system, so the processor architecture provides mechanisms that allow processor cores that fail to acquire the lock to enter a low power state and thereby reduce the power consumption of the system. For example, the current ARMv8 specification provides a wait for event mechanism that causes a processor core to enter a low power consumption state. The processor core may be brought into the low power consumption state by executing the WFE instruction or the WFET instruction among the hint instructions. The low power state may be understood as a power saving mode of the processor core, in which the processor core may turn off or reduce the current of its internal circuits to reduce power consumption, while maintaining enough operating state to respond quickly when needed.
In step S330, the processor core in the low power consumption state must be woken up by another processor core in order to return to the normal execution state. Waking up a processor core in a low power state likewise requires a corresponding event mechanism. For example, in the ARMv8 specification, a processor core that entered a low power state through the wait for event mechanism may be woken up through the send event mechanism. The send event mechanism provides wake-up instructions for waking a processor core in a low power state, such as the SEV instruction and the SEVL instruction among the hint instructions. When the processor core that holds the lock exits the critical section after executing its program, it can use a wake-up instruction to return the processor cores in the low power consumption state to the normal execution state. The SEV instruction sends an event signal to all processor cores in the multiprocessor system in a broadcast manner and wakes up all of them, whereas the SEVL instruction issues an event signal only to the local processor core, i.e., the current processor core, and wakes up only the local processor core. Neither the SEV instruction nor the SEVL instruction supports issuing an event signal to one particular processor core to make that processor core exit the low power state.
Since SEV instructions are in the form of broadcast, all processor cores in the system will wake up, whether or not they need to be. This is because the SEV instruction does not carry any multiprocessor identification (multiprocessor identification, MPID) value or mask identification of the processor cores, so the SEV instruction cannot wake up a particular processor core. However, in the lock use scenario, critical section data structures are only allowed to be accessed by one of the processor cores. The processor cores after awakening will still preempt the lock and eventually only one processor core will be able to enter the critical section. Other processor cores that do not enter the critical section may re-enter the low power state. In this scheme of waking up multiple processor cores in a broadcast manner, waking them up to perform corresponding operations may be regarded as an ineffective operation for the processor cores that are woken up but not preempted to the lock, which may result in greater power consumption of the processor.
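The cost of broadcast wake-up can be made concrete with a small discrete model of steps S310-S330. This is a minimal sketch, assuming that waiting cores race for the lock in list order after being woken; the class and method names are hypothetical and the model ignores timing.

```python
class Core:
    def __init__(self, core_id):
        self.core_id = core_id
        self.state = "running"
        self.wakeups = 0


class System:
    def __init__(self, n_cores):
        self.cores = [Core(i) for i in range(n_cores)]
        self.lock_owner = None  # None means the lock is free

    def try_acquire(self, core):
        """S310/S320: take the lock if it is free, otherwise wait in low power (WFE-like)."""
        if self.lock_owner is None:
            self.lock_owner = core.core_id
            core.state = "running"
            return True
        core.state = "low_power"
        return False

    def release_with_broadcast(self, core):
        """S330 with an SEV-like broadcast: every waiting core is woken."""
        assert self.lock_owner == core.core_id
        self.lock_owner = None
        woken = [c for c in self.cores if c.state == "low_power"]
        for c in woken:
            c.state = "running"
            c.wakeups += 1
        # All woken cores compete for the lock; list order stands in for the race.
        winners = [c for c in woken if self.try_acquire(c)]
        return len(woken), len(woken) - len(winners)


system = System(4)
for core in system.cores:
    system.try_acquire(core)  # core 0 gets the lock, cores 1-3 wait
woken, wasted = system.release_with_broadcast(system.cores[0])
```

With four cores, releasing the lock wakes three waiters, only one of which enters the critical section; the other two wake-ups are the ineffective operations described above, and the waste grows with the number of waiting cores.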
In view of the above problems, the embodiment of the application provides a method for accessing a critical section. The embodiment of the application introduces a first parameter in the wake-up instruction. The first parameter may indicate the destination processor core to be awakened, so that the high power consumption caused by awakening all processor cores in a low power consumption state may be avoided.
Further, in the method for accessing a critical section according to the embodiment of the present application, because the first parameter prevents the wake-up instruction from waking non-destination processor cores, ineffective operations are avoided, and the consumption of system bandwidth can therefore also be reduced.
The following describes in detail the method for accessing the critical section according to the embodiment of the present application with reference to fig. 4. It should be appreciated that the method provided by the embodiment of the present application is applicable to the multi-core processor described above. The multi-core processor includes at least a first processor core and a second processor core. Referring to fig. 4, the method provided by the embodiment of the present application includes steps S410 to S430.
In step S410, if the lock of the critical section is occupied by a first processor core, a second processor core enters a low power consumption state. As described above, in a multi-core processor, only one processor core can hold the lock at a time, which ensures the atomicity of the data. When the second processor core accesses the critical section and finds that the lock is occupied by the first processor core, the second processor core enters a low power consumption state to reduce the power consumption of the system.
In some implementations, the second processor core may determine whether the lock of the critical section is preempted by reading the identification of the lock, where the implementation of the second processor core determining whether the lock of the critical section is preempted based on the identification of the lock may be referred to above in connection with the description of fig. 3. For brevity, the description is omitted here.
In some implementations, the lock is found to be occupied by other processor cores when the second processor core accesses the critical area, at which point the second processor core may enter a low power consumption state by executing a wait for event instruction. The wait for event instruction may be a WEF or WEFT instruction, or the like.
In step S420, after the first processor core releases the lock of the critical section, the first processor core wakes up the second processor core in a low power consumption state based on the first instruction. The first instruction may carry a first parameter for indicating that the destination processor core of the first instruction is the second processor core. The format of the first instruction will be described below in conjunction with fig. 6, and is not described herein for brevity.
In some implementations, the first parameter may be an identification of a register, namely the register in which the identification of the second processor core is stored. In some scenarios, the identification of the second processor core may occupy many bits, and the first instruction may not have enough space to carry it directly. In the embodiment of the present application, the first instruction may therefore carry the identification of the second processor core indirectly, by carrying the identification of the register. Of course, if instruction space is not a concern, the identification of the second processor core may also be carried directly in the first instruction as the first parameter.
In some implementations, when the first processor core needs to wake up the second processor core, the identifier of the second processor core to be woken up may be loaded into the register first, and then an event signal may be sent to the second processor core to be woken up through the first instruction. In this way, the first processor core may issue a wake-up instruction to a specific destination processor core to exit the low power state, so as to avoid waking up other non-destination processor cores at the same time.
In some implementations, the register may be one of the 64-bit general purpose registers R0-R30. Of course, the register in the embodiments of the present application may also be another type of register.
In some implementations, the identification of the second processor core herein may be a unique identifier of the second processor core, such as an MPID value or a mask identification, etc., which helps to improve the accuracy with which the first processor core identifies the second processor core to wake up.
In step S430, after the second processor core is awakened, the second processor core preempts the lock of the critical section such that the second processor core accesses the critical section.
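The flow of steps S410 to S430 can be sketched as a small software simulation. This is only an illustration under stated assumptions: the classes `Core` and `CriticalSectionLock`, the function names, and the use of Python `threading.Event` objects to stand in for per-core event signals are all hypothetical and do not come from the application. Real hardware would use wait for event and send event instructions instead.

```python
import threading


class Core:
    """Hypothetical model of a processor core with its own event input."""

    def __init__(self, core_id):
        self.core_id = core_id
        # Stands in for the core's event input; it stays set until consumed.
        self.event = threading.Event()

    def wait_for_event(self):
        # Models staying in the low power consumption state until an event arrives.
        self.event.wait()
        self.event.clear()


class CriticalSectionLock:
    """Hypothetical lock whose holder wakes one chosen waiter on release."""

    def __init__(self):
        self._guard = threading.Lock()
        self.owner = None

    def try_acquire(self, core):
        with self._guard:
            if self.owner is None:
                self.owner = core.core_id
                return True
            return False

    def release_and_wake(self, target):
        # S420: release the lock, then signal only the destination core.
        with self._guard:
            self.owner = None
        target.event.set()


def access_critical_section(core, lock, log):
    # S410: while another core holds the lock, wait in the low power state.
    while not lock.try_acquire(core):
        core.wait_for_event()
    # S430: the lock is now held by this core; access the critical section.
    log.append(core.core_id)
```

In this sketch a third `Core` object that is never passed to `release_and_wake` keeps its event unset, which mirrors the point that non-destination cores are not woken.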
For ease of understanding, a possible implementation of the first processor core waking up the second processor core in an embodiment of the present application is described below in connection with fig. 5. Fig. 5 is a schematic flow chart diagram of one possible implementation of step S420 in fig. 4. Fig. 5 shows steps S510-S530.
In step S510, the first processor core obtains a target instruction from the memory. As described above, the first stage of instruction execution in CPU pipeline technology is the fetch stage, where the first processor core may load the target instruction from memory via the fetch unit.
In step S520, the first processor core decodes the target instruction to obtain a first parameter. For example, the first processor core may decode the target instruction by a decode unit. By decoding, the first processor core may determine that the operation of the target instruction is to wake up the destination processor core, and may obtain the first parameter indicating the destination processor core.
In step S530, the first processor core transmits an event notification signal to the second processor core according to the first parameter to wake up the second processor core in a low power consumption state. For example, the first processor core may learn from the first parameter which general purpose register holds the identifier of the second processor core, and then send an event notification signal to the second processor core according to that identifier, so as to wake up the second processor core. In this way, the first processor core may effectively identify and wake the second processor core.
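Steps S510 to S530 can be sketched as follows. This is a minimal sketch under assumptions, not the implementation of the application: the tuple form of the instruction, the opcode name `"SEND_EVENT"`, the dictionary used as a register file, and the example identifier values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical destination core, identified by an MPID-like value."""
    mpid: int
    low_power: bool = True
    events_received: int = 0

    def receive_event(self):
        # An event notification brings the core out of the low power state.
        self.events_received += 1
        self.low_power = False


def fetch(memory, pc):
    # S510: obtain the target instruction from memory.
    return memory[pc]


def decode(instruction):
    # S520: recover the operation and the first parameter (a register identifier).
    opcode, reg_id = instruction
    if opcode != "SEND_EVENT":
        raise ValueError("not a send event instruction")
    return reg_id


def send_event(reg_id, registers, cores_by_mpid):
    # S530: the register named by the first parameter holds the identifier of
    # the destination core; only that core receives the event notification.
    target_mpid = registers[reg_id]
    cores_by_mpid[target_mpid].receive_event()
    return target_mpid
```

For example, with `registers = {3: 0x101}` and `memory = {0: ("SEND_EVENT", 3)}`, calling `send_event(decode(fetch(memory, 0)), registers, cores)` signals only the core whose identifier is `0x101`.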
Additionally, in some implementations, the aforementioned target instruction may be a send event instruction obtained by modifying the send event instruction provided by the system architecture. Extending the instruction set on the basis of the existing send event instruction ensures that the target instruction conforms to the instruction format rules of the system architecture. For example, the SEV instruction in the ARMv8 architecture may be modified by adding a first parameter that carries the identification of the register mentioned above. Following the system instruction format defined by the architecture, the format of the target instruction may be as shown in fig. 6. In the ARMv8 architecture, each instruction has a fixed size of four bytes. Bits 0-4 may be set as the register identification field Rt, which holds the identification of the register in which the MPID value of the destination processor core is stored. The target instruction may then be written as the parameterized SEV instruction SEV Rt. Rt may be any of the 64-bit general purpose registers R0-R30, so its value ranges from 00000 to 11110. In addition, according to the A64 instruction set format described above, CRm takes the value 0101 and op2 takes the value 100. The function of the instruction is to send an event signal to the processor core identified by the MPID value stored in Rt, so that the processor core exits the low power consumption state.
In the following, with reference to fig. 7, the hardware processing flow of the method for accessing a critical section provided by the embodiment of the present application is described, taking as an example the case in which the target instruction is the parameterized SEV instruction shown in fig. 6 and the identifier of the second processor core is an MPID value. The method shown in fig. 7 includes steps S710-S740.
In step S710, the first processor core loads the parameterized SEV instruction from the memory (not shown) through the instruction fetch unit 710, and passes the parameterized SEV instruction to the decode unit 720.
In step S720, the decode unit 720 decodes the instruction and determines from it that the operation is an SEV operation and which register holds the MPID value of the second processor core. The decode unit 720 then passes the instruction-related information to the issue logic unit 730.
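The field layout described above can be illustrated with a small encoder. This is a sketch under assumptions rather than the format of fig. 6: the text only gives Rt in bits 0-4 and the values of CRm and op2, so the positions of op2 (bits 5-7) and CRm (bits 8-11) are taken from the general A64 system instruction layout, and the upper 20 bits are borrowed from the existing hint encoding space (the standard SEV instruction encodes as 0xD503209F). The function names are hypothetical.

```python
RT_SHIFT, RT_MASK = 0, 0x1F      # bits 0-4: register identification field Rt
OP2_SHIFT, OP2_MASK = 5, 0x7     # bits 5-7: op2 (position assumed from A64 layout)
CRM_SHIFT, CRM_MASK = 8, 0xF     # bits 8-11: CRm (position assumed from A64 layout)

CRM_VALUE = 0b0101               # value given in the text
OP2_VALUE = 0b100                # value given in the text

# ASSUMPTION: upper 20 bits reuse the A64 hint encoding space; fig. 6 may differ.
ASSUMED_UPPER_BITS = 0xD5032 << 12


def encode_sev_rt(rt):
    """Pack a hypothetical 'SEV Rt' word; rt selects one of R0-R30."""
    if not 0 <= rt <= 30:
        raise ValueError("Rt must name one of R0-R30 (00000-11110)")
    return (ASSUMED_UPPER_BITS
            | (CRM_VALUE << CRM_SHIFT)
            | (OP2_VALUE << OP2_SHIFT)
            | (rt << RT_SHIFT))


def decode_fields(word):
    """Unpack the three fields named in the text from a 32-bit word."""
    return {
        "Rt": (word >> RT_SHIFT) & RT_MASK,
        "op2": (word >> OP2_SHIFT) & OP2_MASK,
        "CRm": (word >> CRM_SHIFT) & CRM_MASK,
    }
```

Under these assumptions, `encode_sev_rt(3)` yields a four-byte word whose low five bits select R3, and the value 31 (11111) is rejected because it lies outside the stated range.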
In step S730, the issue logic unit 730 sends instruction related information to the execution unit 740.
In step S740, the parameterized SEV instruction is executed by the SEV execution component 741, which is the component in the execution unit 740 that corresponds to the SEV instruction. The SEV execution component 741 determines the event hard wire to which the MPID value corresponds, and then sends an event signal over that event hard wire from the output channel of the event interface unit 742 to the input channel of the event interface unit 742 of the second processor core, which receives and responds to the event signal. If the second processor core is in the low power consumption mode, it resumes normal execution. This completes the wake-up of the second processor core without waking other non-destination processor cores.
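The routing performed in step S740 can be modelled in software as a lookup from an MPID value to a per-core event wire. This is a rough sketch under assumptions: the class names, the dictionary that stands in for the hard-wired connections, and the example MPID values are hypothetical, and the application does not specify how the mapping is realized in hardware.

```python
class EventInterfaceUnit:
    """Hypothetical event interface of one core, with a single input channel."""

    def __init__(self):
        self.input_pending = False

    def drive_input(self):
        # Models an event signal arriving on the input channel.
        self.input_pending = True


class CoreModel:
    def __init__(self, mpid):
        self.mpid = mpid
        self.low_power = True
        self.event_interface = EventInterfaceUnit()

    def respond(self):
        # A core with a pending event consumes it and resumes normal execution.
        if self.event_interface.input_pending:
            self.event_interface.input_pending = False
            self.low_power = False


class SevExecutionComponent:
    def __init__(self, wiring):
        # wiring: MPID value -> event interface reached by that event wire
        self.wiring = wiring

    def execute(self, registers, rt):
        # Read the MPID value from the register named by Rt, pick the matching
        # event wire, and drive only that core's input channel.
        mpid = registers[rt]
        self.wiring[mpid].drive_input()
        return mpid
```

After `execute` runs, calling `respond()` on every core changes the state of the destination core only; the other cores have nothing pending and remain in the low power mode.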
The method embodiment of the present application is described above in detail with reference to fig. 1 to 7, and the apparatus embodiment of the present application is described below in detail with reference to fig. 8 and 9. It is to be understood that the description of the method embodiments corresponds to the description of the device embodiments, and that parts not described in detail can therefore be seen in the preceding method embodiments.
FIG. 8 is a schematic diagram of a processor according to an embodiment of the application. As shown in fig. 8, the processor 800 includes a memory unit 810, a first processor core 820, and a second processor core 830.
The memory unit 810 is used to store code and to provide instructions to the first processor core 820. The memory unit 810 may include read only memory and random access memory. A portion of the memory unit 810 may also include non-volatile random access memory. For example, the memory unit 810 may also store information of a device type.
The first processor core 820 and the second processor core 830 are used to execute the code to perform the following steps: if the lock of the critical section is occupied by the first processor core 820, the second processor core 830 enters a low power consumption state; after the first processor core 820 releases the lock of the critical section, the first processor core 820 wakes up the second processor core 830 in the low power consumption state based on the first instruction, where the first instruction carries a first parameter, and the first parameter is used to indicate that the destination processor core of the first instruction is the second processor core 830; and after the second processor core 830 is awakened, the second processor core 830 preempts the lock of the critical section such that the second processor core 830 accesses the critical section.
In one possible implementation, the first parameter is an identification of a register, and the register has stored therein an identification of the second processor core 830. The register type is not limited and may be one of the 64-bit general purpose registers R0 to R30. In addition, the identification of the second processor core 830 is not limited, and may be a unique identifier of the second processor core 830, such as an MPID value or a mask identification.
In one possible implementation, the first processor core 820 is configured to: acquiring a target instruction from a memory, and decoding the target instruction to acquire a first parameter; an event notification signal is sent to the second processor core 830 according to the first parameter to wake the second processor core 830 in a low power consumption state.
In one possible implementation, the second processor core 830 may execute a wait for event instruction, which may be a WFE instruction or a WFET instruction, to enter a low power state.
In one possible implementation, the target instruction is a send event instruction, which may employ the instruction structure shown in fig. 6.
It should be appreciated that in embodiments of the present application, the processor 800 may be a central processing unit (central processing unit, CPU), the processor 800 may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor 800 may be any conventional processor or the like.
In implementation, the steps of the methods described above may be performed by integrated logic circuitry in hardware or by instructions in software in the processor 800. The method for accessing a critical section disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor 800. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, or registers. To avoid repetition, a detailed description is not provided herein.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 9, the electronic device 900 includes a memory 910, a processor 920, and an input/output interface 930. The memory 910, the processor 920, and the input/output interface 930 are connected through an internal connection path. The memory 910 is configured to store instructions, and the processor 920 is configured to execute the instructions stored in the memory 910, so as to control the input/output interface 930 to receive input data and information and to output data such as an operation result.
The memory 910 may include read only memory and random access memory and provide instructions and data to the processor 920. A portion of the memory 910 may also include nonvolatile random access memory. For example, the memory 910 may also store information of the device type.
In one possible implementation, the processor 920 may be the processor 800 shown in fig. 8. It should be appreciated that in the embodiment of the present application, the processor 920 may be a general-purpose central processing unit (central processing unit, CPU), a microprocessor, an application-specific integrated circuit (application specific integrated circuit, ASIC), or one or more integrated circuits for executing related programs to implement the technical solutions provided by the embodiments of the present application.
In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or by instructions in software in the processor 920. The method for accessing a critical section disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 910, and the processor 920 reads the information in the memory 910 and performs the steps of the method in combination with its hardware. To avoid repetition, a detailed description is not provided herein.
It should be appreciated that in embodiments of the present application, the processor may be a central processing unit (central processing unit, CPU), the processor may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The present application also provides a computer readable storage medium storing program code which, when run on a computer, causes the computer to perform the above-described method of accessing critical sections.
The application also provides a computer program product comprising a computer program/instructions which, when executed by a processor, implement the above method of accessing critical sections.
In some implementations, the computer program product includes computer program code that, when run on a computer, causes the computer to perform the method of accessing critical sections described above.
It should be understood that in embodiments of the present application, "B corresponding to A" means that B is associated with A, and that B may be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
It should be understood that the term "and/or" merely describes an association relationship between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave, etc.). The computer readable storage medium may be any available medium that can be read by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of accessing a critical section, the method being applied to a processor, the processor comprising a first processor core and a second processor core, the method comprising:
if the lock in the critical section is occupied by the first processor core, the second processor core enters a low power consumption state;
after the first processor core releases the lock of the critical section, the first processor core wakes up the second processor core in a low power consumption state based on a first instruction, wherein the first instruction carries a first parameter, and the first parameter is used for indicating that a target processor core of the first instruction is the second processor core;
after the second processor core is awakened, the second processor core preempts the lock of the critical section such that the second processor core accesses the critical section.
2. The method of claim 1, wherein the first parameter is an identification of a register and the register has stored therein an identification of the second processor core.
3. The method of claim 1 or 2, wherein the first processor core wakes up the second processor core in a low power state based on a first instruction, comprising:
the first processor core acquires a target instruction from a memory;
the first processor core decodes the target instruction so as to acquire the first parameter;
the first processor core sends an event notification signal to the second processor core according to the first parameter to wake up the second processor core in the low power consumption state.
4. The method of claim 1 or 2, wherein the second processor core entering a low power consumption state comprises:
the second processor core executes a wait for event instruction to enter the low power state.
5. The method of claim 1, wherein the target instruction is a send event instruction.
6. A processor, comprising:
a storage section for storing a code;
a first processor core and a second processor core for executing the code to perform the steps of:
if the lock in the critical section is occupied by the first processor core, the second processor core enters a low power consumption state;
after the first processor core releases the lock of the critical section, the first processor core wakes up the second processor core in a low power consumption state based on a first instruction, wherein the first instruction carries a first parameter, and the first parameter is used for indicating that a target processor core of the first instruction is the second processor core;
after the second processor core is awakened, the second processor core preempts the lock of the critical section such that the second processor core accesses the critical section.
7. The processor of claim 6, wherein the first parameter is an identification of a register and the register has stored therein an identification of the second processor core.
8. The processor of claim 6 or 7, wherein the first processor core is to:
acquiring a target instruction from a memory, and decoding the target instruction to acquire the first parameter;
and sending an event notification signal to the second processor core according to the first parameter so as to wake up the second processor core in the low power consumption state.
9. An electronic device comprising the processor of any of claims 6-8.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a program code which, when run on a computer, causes the computer to perform the method of any of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310962715.4A CN117009079A (en) 2023-08-01 2023-08-01 Method and device for accessing critical section

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310962715.4A CN117009079A (en) 2023-08-01 2023-08-01 Method and device for accessing critical section

Publications (1)

Publication Number Publication Date
CN117009079A true CN117009079A (en) 2023-11-07

Family

ID=88561369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310962715.4A Pending CN117009079A (en) 2023-08-01 2023-08-01 Method and device for accessing critical section

Country Status (1)

Country Link
CN (1) CN117009079A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination