CN118034958A - Task state notification system and method for multi-process scene - Google Patents
- Publication number
- CN118034958A (application CN202410410674.2A)
- Authority
- CN
- China
- Prior art keywords
- uio
- interrupt
- independent
- shared
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/542—Event management; Broadcasting; Multicasting; Notifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
Abstract
One or more embodiments of the present disclosure provide a task state notification system and method for a multi-process scenario. The system includes: a shared UIO device, configured to receive an interrupt issued by a shared hardware queue when the hardware acceleration device completes a target task in that queue, the interrupt carrying a user vector corresponding to the user-mode process that issued the target task; and to search among a plurality of independent UIO devices based on the user vector and forward the interrupt to the independent UIO device found, the shared hardware queue being shared by the user-mode processes corresponding to the plurality of independent UIO devices; and the independent UIO devices, in one-to-one correspondence with the user-mode processes, each configured to receive the interrupt forwarded by the shared UIO device so that its corresponding user-mode process can obtain, based on the interrupt, the completion state of the target task it issued.
Description
Technical Field
One or more embodiments of the present disclosure relate to the field of computers, and more particularly, to a task state notification system and method for a multi-process scenario.
Background
Hardware acceleration devices are specialized hardware components designed to improve the performance of specific computing tasks (e.g., encryption, decryption, compression, decompression). Each hardware acceleration device typically includes multiple hardware queues to manage and optimize workload allocation and resource utilization within the device; these hardware queues play a vital role in improving system performance and concurrent processing capacity.
In the related art, each hardware queue is typically associated with a single process and is managed and scheduled by the operating system kernel. However, with the development of cloud computing, big data, and distributed computing, the demand for shared resources keeps growing, and exclusive use of a hardware queue by a single process can no longer meet the requirements of high-performance parallel processing.
Disclosure of Invention
In view of this, one or more embodiments of the present description provide a task state notification system and method for a multi-process scenario.
In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
According to a first aspect of one or more embodiments of the present specification, there is provided a task state notification system for a multi-process scenario, including:
a shared UIO device, configured to: receive an interrupt issued by a shared hardware queue when the hardware acceleration device completes a target task in the shared hardware queue, wherein the interrupt carries a user vector corresponding to the user-mode process that issued the target task; and search among a plurality of independent UIO devices based on the user vector and forward the interrupt to the independent UIO device found; wherein the shared hardware queue is shared by the user-mode processes corresponding to the plurality of independent UIO devices;

the independent UIO devices, in one-to-one correspondence with the user-mode processes, each configured to receive the interrupt forwarded by the shared UIO device, so that the user-mode process corresponding to the independent UIO device can obtain, based on the interrupt, the completion state of the target task it issued.
According to a second aspect of one or more embodiments of the present disclosure, a task state notification method for a multi-process scenario is provided, including:
when the hardware acceleration device completes a target task in a shared hardware queue, a shared UIO device receives an interrupt issued by the shared hardware queue, wherein the interrupt carries a user vector corresponding to the user-mode process that issued the target task; the shared UIO device searches among a plurality of independent UIO devices based on the user vector and forwards the interrupt to the independent UIO device found; the shared hardware queue is shared by the user-mode processes corresponding to the plurality of independent UIO devices, and the independent UIO devices are in one-to-one correspondence with the user-mode processes; and

the independent UIO device receives the interrupt forwarded by the shared UIO device, so that the user-mode process corresponding to the independent UIO device obtains, based on the interrupt, the completion state of the target task it issued.
According to a third aspect of one or more embodiments of the present specification, an electronic device is presented, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the second aspect described above when executing the computer program.
According to a fourth aspect of one or more embodiments of the present description, a computer-readable storage medium is presented, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method of the second aspect described above.
According to a fifth aspect of one or more embodiments of the present description, a computer program product is presented, comprising a computer program/instruction which, when executed by a processor, implements the method of the second aspect described above.
In the technical solution provided by this specification, when the hardware acceleration device completes the target task in the shared hardware queue, the shared UIO device receives the interrupt issued by the shared hardware queue. Because the interrupt carries the user vector corresponding to the user-mode process that issued the target task, the independent UIO device corresponding to the interrupt can be found based on the user vector, and the interrupt can then be forwarded to that independent UIO device. Because the independent UIO devices are in one-to-one correspondence with the user-mode processes, the user-mode process corresponding to the independent UIO device that receives the interrupt can obtain, based on the interrupt, the completion state of the target task it issued. With this scheme, when the hardware acceleration device completes a target task, the completion can be accurately notified to the process that issued the task, so that one hardware queue can be shared by multiple user-mode processes. This avoids hardware resource contention among multiple hardware queues, provides reliable support for multiple processes sharing hardware resources, and is better suited to cloud scenarios with a large number of processes that require hardware acceleration.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the embodiments of the disclosure.
Drawings
FIG. 1 is a schematic diagram of a computer system architecture according to an exemplary embodiment.
Fig. 2 is a block diagram of a task state notification system for a multiprocess scenario provided by an exemplary embodiment.
Fig. 3 is a block diagram of a task state notification system for a multiprocess scenario provided by an exemplary embodiment.
Fig. 4 is a block diagram of a task state notification system for a multiprocess scenario provided by an exemplary embodiment.
Fig. 5 is a flowchart of a task state notification method for a multi-process scenario provided by an exemplary embodiment.
Fig. 6 is a detailed flowchart of a task state notification method of a multi-process scenario provided by an exemplary embodiment.
Fig. 7 is a schematic diagram of an apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
User information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in this specification are both information and data authorized by the user or sufficiently authorized by the parties, and the collection, use and processing of relevant data requires compliance with relevant laws and regulations and standards of the relevant country and region, and is provided with corresponding operation portals for the user to choose authorization or denial.
Hardware acceleration devices are specialized hardware components designed to improve the performance of particular computing tasks; they achieve higher processing speeds, lower latency, and better energy efficiency by offloading computationally intensive or time-sensitive tasks from the CPU to dedicated circuits or processors. For example, Intel QuickAssist Technology (QAT) provides encryption, decryption, compression, and decompression acceleration; graphics processing units (GPUs) can be used for graphics rendering and high-performance computing; and field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) are suitable for custom computing tasks such as encryption, decryption, signal processing, and deep learning inference. A hardware queue is a data structure in, or associated with, a hardware acceleration device, used to temporarily store data packets, commands, or tasks to be processed. For example, in a storage controller a hardware queue is used to manage disk I/O requests, and in a GPU hardware queues may be used to schedule different types of graphics or computing tasks.
Currently, one hardware acceleration device may correspond to multiple hardware queues and can process tasks in those queues in parallel. Note that in the related art a hardware queue is typically associated with only a single process and is managed and scheduled by the operating system kernel. However, with the development of cloud computing, big data, and distributed computing, the demand for shared resources keeps growing: hardware resources are limited, yet the number of processes that want hardware acceleration through hardware queues is enormous. The related-art scheme is therefore severely limited, not only in container scenarios but especially in cloud scenarios, where the number of processes that want to use the hardware acceleration device is even less predictable. In addition, the related art can suffer from resource contention and lock contention among multiple hardware queues, which hurts the throughput and latency of the overall system.
To solve the above problems, the present disclosure proposes a task state notification system and method for a multi-process scenario. When the hardware acceleration device completes a target task, the completion can be accurately notified to the process that issued the task, so that one hardware queue can be shared by multiple user-mode processes. This avoids hardware resource contention among multiple hardware queues, provides reliable support for multiple processes sharing hardware resources, and is better suited to cloud scenarios with a large number of processes that require hardware acceleration.
FIG. 1 is a schematic diagram of a computer system architecture according to an exemplary embodiment. As shown in fig. 1, the architecture is a hierarchical structure, and is mainly divided into a user mode 11, a kernel mode 12 and a hardware device layer 13.
User mode 11 is the environment in which applications and libraries run; processes in user mode typically have no direct access to the underlying hardware resources. An application interacts with the kernel through the system call interface to read and write files, create processes, allocate memory, initiate network requests, and so on. Kernel mode 12 is the core part of the operating system; it has the highest privilege level and is responsible for managing hardware resources, system calls, interrupt handling, memory management, process scheduling, the file system, the network protocol stack, drivers, and other core functions. The kernel provides drivers for hardware devices; a driver acts as a bridge between hardware and software and is responsible for interacting with the hardware and controlling the behavior of the hardware device. The hardware device layer 13 includes various hardware components such as the CPU, memory, hard disks, network interface cards, and hardware acceleration devices (e.g., QAT, GPU), which are interconnected by buses and other communication mechanisms.
Typically, when a user-mode process needs to perform a privileged operation (e.g., accessing a hardware device or allocating memory), it initiates a request to the operating system via a specific system call instruction. Executing the system call triggers a context switch from user mode to kernel mode, during which the processor saves the state of the current user-mode process and switches to kernel mode to run the corresponding service routine. After the privileged operation completes, the kernel returns the result to the user-mode process that initiated the system call and restores the process's previous state so it can continue execution. In scenarios involving a hardware acceleration device, a user-mode process may communicate with the driver in the kernel by invoking specific API interfaces. These APIs allow user-mode processes to interact indirectly with the hardware acceleration device; for example, they may submit computing tasks to a hardware queue for processing, or respond to interrupt events from the hardware acceleration device so that hardware resources are properly managed and scheduled.
Fig. 2 is a block diagram of a task state notification system for a multi-process scenario provided by an exemplary embodiment. As shown in fig. 2, the system may include a shared UIO device 21 and a plurality of independent UIO devices, such as independent UIO devices 221, 222, 223, and 224.
In one embodiment, a UIO device is a Userspace I/O device. The shared UIO device 21 may be obtained by applying to the kernel UIO framework, and an independent UIO device (221, 222, 223, 224) may likewise be applied for on behalf of each user-mode process. As shown, one shared UIO device 21 may correspond to multiple independent UIO devices, with the shared UIO device and each independent UIO device residing in kernel mode. As described above, in the related art the driver of a hardware device lives in kernel space and, acting as a bridge between hardware and software (user-mode processes), is responsible for interacting with the hardware and controlling its behavior. The UIO framework provides a mechanism by which a user-mode process can directly access and control hardware registers: device memory is mapped into the process address space through the mmap() system call, and the hardware is then operated by reading and writing these memory regions. The UIO device also provides a basic interrupt handling mechanism, so that the user-mode process can monitor the occurrence of interrupts.
In one embodiment, the shared hardware queue is associated with a hardware acceleration device and holds one or more processing tasks issued by different user-mode processes; the hardware acceleration device processes the tasks in the shared hardware queue. For example, if the hardware acceleration device is Intel QuickAssist Technology (QAT), it encrypts/decrypts or compresses/decompresses the tasks in the shared hardware queue; if it is a graphics processing unit (GPU), it performs graphics rendering or high-performance computation on them. When the hardware acceleration device completes a task in the shared hardware queue, that task is treated as the target task. The target task was necessarily issued by some user-mode process, and when the hardware acceleration device finishes processing it, that process should be notified so it can execute the subsequent processing steps.
In one embodiment, when the hardware acceleration device finishes processing a target task in the shared hardware queue, an interrupt may be triggered and issued to the shared UIO device by the shared hardware queue associated with the hardware acceleration device. Note that the interrupt carries both an interrupt vector and a user vector. After the interrupt is triggered, the shared hardware queue can find the shared UIO device corresponding to the interrupt based on the interrupt vector. Meanwhile, because the interrupt carries a user vector, and the user vector identifies the independent UIO device corresponding to the user-mode process that issued the target task, the shared UIO device can search among the multiple independent UIO devices corresponding to the shared hardware queue based on the user vector and forward the interrupt to the independent UIO device found. Further, because the independent UIO devices are in one-to-one correspondence with the user-mode processes, once an independent UIO device receives the interrupt, its corresponding user-mode process can obtain the interrupt information from it. When the user-mode process finds interrupt information in its independent UIO device, the hardware acceleration device can be considered to have completed the target task issued by that process; that is, the user-mode process can obtain the completion state of the target task it issued based on the interrupt received by its corresponding independent UIO device.
For example, the hardware acceleration device may trigger an interrupt when target task a in the shared hardware queue has been processed, and the shared hardware queue issues the interrupt to the shared UIO device 21. The shared UIO device 21 uses the user vector carried by the interrupt to search among independent UIO devices 221, 222, 223, and 224. Assuming the device found is independent UIO device 223, the shared UIO device 21 forwards the interrupt to it. Independent UIO device 223 then receives the interrupt, and the user-mode process corresponding to it can monitor the interrupt and thereby obtain the completion state of the target task it issued.
As can be seen from the above embodiment, with the technical solution of this specification, when the hardware acceleration device completes a target task, the completion can be accurately notified to the process that issued the task, so that one hardware queue can be shared by multiple user-mode processes. This avoids hardware resource contention among multiple hardware queues, provides reliable support for multiple processes sharing hardware resources, and is better suited to cloud scenarios with a large number of processes that require hardware acceleration.
In the related art, the PASID (Process Address Space ID) context provides a mechanism to ensure that the hardware acceleration device can properly access the memory of the target process. When a hardware acceleration device initiates a DMA (Direct Memory Access) request carrying PASID information, the IOMMU (I/O Memory Management Unit) can map the request to the corresponding process address space based on that context, ensuring that the device can only access the memory regions authorized to it, which enhances system security and optimizes resource management. In modern operating systems, particularly where I/O virtualization and device sharing are involved, it is the kernel and the IOMMU that use the PASID context to manage hardware access to memory. When a user-mode process needs the hardware acceleration device to handle certain tasks, it sends a request to the kernel through a system call; after receiving the request, the kernel checks whether a PASID context has been set up for the process, and if not, creates or updates one and associates it with the hardware acceleration device. When the hardware acceleration device performs a DMA operation tagged with the PASID, the IOMMU translates the addresses in the request according to the mapping established for that PASID context, resolving them within the address space of the corresponding user-mode process.
Fig. 3 is a block diagram of a task state notification system for a multi-process scenario provided by an exemplary embodiment. As shown in fig. 3, the shared UIO device in the system maintains PASID context information for multiple user-mode processes, such as PASID context information 31, 32, 33, and 34, organized as a linked list. Each piece of PASID context information includes the description information of the independent UIO device corresponding to the respective user-mode process, which amounts to a one-to-one correspondence among user-mode process, independent UIO device, and PASID context information.
Specifically, each piece of PASID context information maintained by the shared UIO device may include: the PASID, the process ID, and the description information of the independent UIO device corresponding to the respective user-mode process. The description information of the independent UIO device can be matched against the user vector. When the shared UIO device receives an interrupt issued by the shared hardware queue, it can search the description information in each piece of PASID context information it maintains and determine the description information matching the user vector carried by the interrupt, along with the PASID context information containing it. Specifically, when the PASID context information is organized as a linked list, the shared UIO device can search by traversal; when it is organized as a tree structure, the shared UIO device can search the tree; of course, the PASID context information may be organized in other forms and searched accordingly, for example by building a hash table and a hash function and looking up hash values, which this specification does not limit. Because user-mode processes, independent UIO devices, and PASID context information are in one-to-one correspondence, once the matching PASID context information is determined, the corresponding independent UIO device is determined, and that device is the independent UIO device found based on the user vector.
As can be seen from the above embodiment, because the description information of each user-mode process's independent UIO device is recorded in the PASID context information, and the shared UIO device maintains the PASID context information of multiple user-mode processes, the shared UIO device can match precisely on the user vector and accurately find the independent UIO device corresponding to the interrupt. The interrupt is thus forwarded to the independent UIO device that truly corresponds to it, guaranteeing that the user-mode process subsequently obtains the correct interrupt information.
In an embodiment, the independent UIO device does not actively push information to its corresponding user-mode process; instead, when it receives an interrupt forwarded by the shared UIO device, it exposes the interrupt to that process. The user-mode process obtains the interrupt by polling the independent UIO device and, based on the interrupt, learns the completion state of the target task it issued. Specifically, the user-mode process may monitor a specific memory area in the independent UIO device through the poll function to detect the interrupt state or obtain interrupt information. If the process finds interrupt information in its independent UIO device, the hardware acceleration device has finished processing the target task it issued, and the process can then perform further operations, for example reading a hardware acceleration device register or another memory area related to the hardware acceleration device to obtain the specifics of the completed task. In this way, once the independent UIO device receives the interrupt, the corresponding user-mode process can learn, from the interrupt, the completion state of the task it issued, achieving accurate notification of the target task's completion.
Fig. 4 is a block diagram of a task state notification system for a multi-process scenario provided by an exemplary embodiment. As shown in Fig. 4, the system further includes a hardware acceleration device interrupt table 41 and a hardware acceleration device 42. The hardware acceleration device interrupt table 41 records a specific index address corresponding to the shared hardware queue, where the specific index address is applied for from the operating system by the driver of the hardware acceleration device. The hardware acceleration device 42 is configured to process the target task in the shared hardware queue and, upon completing the target task, perform a write operation on the specific index address in the hardware acceleration device interrupt table 41, so that the interrupt is triggered.
Specifically, the hardware acceleration device interrupt table 41 may include an MSI-X (Message Signaled Interrupts eXtended) table and an MSI (Message Signaled Interrupts) table. When the hardware acceleration device completes the target task, it writes a data value to the corresponding specific index address in the MSI or MSI-X table according to the rules configured by its driver during initialization, and this write operation triggers the interrupt. Meanwhile, the shared hardware queue corresponding to the interrupt, such as shared hardware queue 0, can be identified from the address targeted by the write operation, and shared hardware queue 0 can then further issue the interrupt to the shared UIO device. In this way, when the target task is completed, the hardware acceleration device can accurately trigger the interrupt and identify the corresponding shared hardware queue, so that the shared hardware queue can further issue the interrupt, guaranteeing the execution of the subsequent steps.
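Identifying the queue from the written address can be sketched with simple address arithmetic. The base address and the 16-byte MSI-X entry size below are illustrative assumptions (16 bytes is the entry size PCIe defines for MSI-X tables), not values taken from the patent.

```python
# Hypothetical layout: each shared hardware queue owns one entry of a
# contiguous interrupt table, so the queue index falls out of the offset.
MSIX_TABLE_BASE = 0xFEE00000  # assumed base address of the interrupt table
MSIX_ENTRY_SIZE = 16          # bytes per MSI-X table entry

def queue_for_write(addr):
    # Map the address targeted by the write operation back to the index of
    # the shared hardware queue that the interrupt belongs to.
    offset = addr - MSIX_TABLE_BASE
    assert offset % MSIX_ENTRY_SIZE == 0, "write must hit an entry boundary"
    return offset // MSIX_ENTRY_SIZE

print(queue_for_write(MSIX_TABLE_BASE))       # 0 -> shared hardware queue 0
print(queue_for_write(MSIX_TABLE_BASE + 32))  # 2
```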
In one embodiment, the hardware acceleration device may be any of various types of devices, such as an encryption/decryption accelerator, a compression/decompression accelerator, or a QAT (QuickAssist Technology) device supporting both encryption/decryption and compression/decompression, as well as a graphics processing unit (GPU), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a SmartNIC, a data processing unit (DPU), a digital signal processor (DSP), a storage accelerator, an AI accelerator, and the like. In practical applications, a technician may select a specific hardware acceleration device as needed, which the present application does not limit.
Fig. 5 is a flowchart of a task state notification method for a multi-process scenario provided by an exemplary embodiment. As shown in fig. 5, the method comprises the steps of:
Step 501, under the condition that a hardware acceleration device completes a target task in a shared hardware queue, a shared UIO device receives an interrupt issued by the shared hardware queue, wherein the interrupt carries a user vector corresponding to the user state process issuing the target task; a search is performed among a plurality of independent UIO devices based on the user vector, and the interrupt is issued to the found independent UIO device; the shared hardware queue is shared by the user state processes corresponding to the plurality of independent UIO devices, and the independent UIO devices are in one-to-one correspondence with the user state processes;
In step 502, the independent UIO device receives the interrupt issued by the shared UIO device, so that the user state process corresponding to the independent UIO device obtains the completion state of the target task issued by the independent UIO device based on the interrupt.
In an embodiment, the shared UIO device and the independent UIO device involved in the method are the shared UIO device and the independent UIO device in the task state notification system for the multi-process scenario shown in Fig. 2. For the relevant content, reference may be made to the corresponding description in the system-side embodiment above.
In one embodiment, the user state process may issue tasks to the hardware acceleration device based on SVM (Shared Virtual Memory) technology. Specifically, the kernel maps the memory area of the hardware acceleration device into the address space of the user state process through the IOMMU, so that the user state process can directly read and write this memory. When the user state process needs to issue a task, it can request the operating system, through a system call, to allocate resources of the hardware acceleration device, and obtain a memory pointer of the hardware acceleration device in a mapping memory area. The mapping memory area is the address space of the user state process obtained by mapping the memory area of the hardware acceleration device, and the user state process can directly read and write the mapping memory area through the memory pointer. The user state process then configures the task information and the PASID context information; in this process, the user vector is written into a user vector register to realize a one-to-one correspondence between the user vector and the PASID, and based on this correspondence, the correspondence among the user state process, the PASID context, and the independent UIO device is established. Finally, the task information is filled into the mapping memory area, completing the issuing of the target task.
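The submission path above can be sketched as follows. A temporary file stands in for the IOMMU/SVM-mapped device memory area, and the 12-byte task descriptor layout (user vector, PASID, opcode) is purely illustrative; the patent does not specify a descriptor format.

```python
import mmap
import struct
import tempfile

# A mmap-ed temporary file simulates the "mapping memory area": the address
# space the process obtains by mapping the accelerator's memory region.
REGION_SIZE = 4096
backing = tempfile.TemporaryFile()
backing.truncate(REGION_SIZE)
region = mmap.mmap(backing.fileno(), REGION_SIZE)

def issue_task(mem, user_vector, pasid, opcode):
    # Writing the user vector alongside the PASID reflects their one-to-one
    # correspondence; filling the mapped area issues the task.
    mem.seek(0)
    mem.write(struct.pack("III", user_vector, pasid, opcode))

issue_task(region, user_vector=2, pasid=0x11, opcode=7)
region.seek(0)
print(struct.unpack("III", region.read(12)))  # (2, 17, 7)
```

On real hardware the mapping would come from the driver (e.g. via `mmap` on the device node) rather than from a file, but the direct read/write pattern is the same.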
The flow of the technical solution of the present specification will be described in detail with reference to a detailed flowchart of a task state notification method of a multiprocess scenario shown in fig. 6.
After the hardware acceleration device completes the target task in the shared hardware queue:
Step 601, triggering an interrupt by writing a specific index address in the MSI-X or MSI table, wherein the interrupt carries an interrupt vector and a user vector;
step 602, based on the interrupt vector, the shared hardware queue corresponding to the interrupt further issues the interrupt to the shared UIO device;
Step 603, after the shared UIO device receives the interrupt, it searches, based on the user vector carried by the interrupt, the multiple pieces of PASID context information it maintains for the respective user state processes; using the description information of the independent UIO device contained in each piece of PASID context information, it finds the PASID context information matching the user vector, and determines the independent UIO device corresponding to that PASID context information as the independent UIO device found based on the user vector;
step 604, after finding the independent UIO device corresponding to the interrupt, the shared UIO device further issues the interrupt to the found independent UIO device;
in step 605, the user state process polls the independent UIO device, so that the user state process can finally obtain the completion status of the target task issued by the user state process.
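Steps 601 to 605 can be traced end to end with plain objects. This is a behavioral sketch under the same illustrative assumptions as above, with every hardware component replaced by a Python stand-in.

```python
# End-to-end sketch of the notification path: hardware completion -> shared
# hardware queue -> shared UIO device -> independent UIO device -> polling
# user state process.  All class names are illustrative.
class IndependentUio:
    def __init__(self):
        self.pending = 0           # interrupt count visible to the process

    def receive_interrupt(self):   # step 604: shared UIO issues the interrupt
        self.pending += 1

    def poll(self):                # step 605: the user state process polls
        return self.pending

class SharedUio:
    def __init__(self, routes):
        self.routes = routes       # user vector -> independent UIO device

    def receive_interrupt(self, user_vector):  # steps 602-603: match and route
        self.routes[user_vector].receive_interrupt()

proc_a, proc_b = IndependentUio(), IndependentUio()
shared = SharedUio({1: proc_a, 2: proc_b})

def hardware_completes_task(user_vector):  # steps 601-602 collapsed: the write
    shared.receive_interrupt(user_vector)  # triggers and forwards the interrupt

hardware_completes_task(2)
print(proc_a.poll(), proc_b.poll())  # 0 1
```

Only the process whose user vector matched sees a pending interrupt, which is exactly the isolation property the flow is designed to provide.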
Fig. 7 is a schematic block diagram of an electronic device according to an exemplary embodiment. Referring to FIG. 7, at the hardware level, the device includes a processor 702, an internal bus 704, a network interface 706, memory 708, a hardware acceleration device 710, and non-volatile storage 712, although other hardware required for other functions may be included. One or more embodiments of the present description may be implemented in a software-based manner, such as by the processor 702 reading a corresponding computer program from the non-volatile storage 712 into the memory 708 and then running. Of course, in addition to software implementation, one or more embodiments of the present disclosure do not exclude other implementation manners, such as a logic device or a combination of software and hardware, etc., that is, the execution subject of the following processing flow is not limited to each logic unit, but may also be hardware or a logic device.
Accordingly, the present specification also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method as described in any of the embodiments above.
Accordingly, embodiments of the present specification also propose a computer program product configured to perform a method according to any of the embodiments described above.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage, or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by the computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining," depending on the context.
The foregoing description of the preferred embodiments is merely intended to illustrate the present invention and is not intended to limit it to the particular embodiments described.
Claims (12)
1. A task state notification system for a multiprocess scenario, comprising:
The shared UIO equipment is used for receiving an interrupt issued by the shared hardware queue under the condition that the hardware acceleration equipment completes a target task in the shared hardware queue, wherein the interrupt carries a user vector corresponding to a user state process issuing the target task; searching in a plurality of independent UIO devices based on the user vector, and issuing the interrupt to the searched independent UIO devices; the shared hardware queue is shared by user state processes corresponding to the plurality of independent UIO devices;
The independent UIO devices are in one-to-one correspondence with the user state processes and are used for receiving the interrupt issued by the shared UIO devices so that the user state process corresponding to the independent UIO devices can acquire the completion state of the target task issued by the independent UIO devices based on the interrupt.
2. The system of claim 1, wherein the shared UIO device maintains a plurality of PASID context information corresponding to each of the user state processes, the PASID context information including description information of the independent UIO device corresponding to the corresponding user state process;
The shared UIO device is specifically configured to:
searching description information in each PASID context information, and determining the description information matched with the user vector and the PASID context information containing the description information;
and determining the independent UIO equipment corresponding to the determined PASID context information as the independent UIO equipment searched based on the user vector.
3. The system of claim 1, wherein the independent UIO device is specifically configured to:
under the condition that the interrupt issued by the shared UIO equipment is received, exposing the interrupt to a user state process corresponding to the interrupt, acquiring the interrupt by the user state process in a polling mode, and acquiring the completion state of a target task issued by the interrupt based on the interrupt.
4. The system of claim 1, further comprising:
a hardware acceleration device interrupt table records a specific index address corresponding to the shared hardware queue, wherein the specific index address is applied to an operating system by a driver of the hardware acceleration device;
And the hardware acceleration device is used for processing the target task in the shared hardware queue, and writing the specific index address in the hardware acceleration device interrupt table under the condition that the target task is completed, so that the interrupt is triggered.
5. A task state notification method for a multiprocess scenario, comprising:
Under the condition that the hardware acceleration equipment completes a target task in a shared hardware queue, the shared UIO equipment receives an interrupt issued by the shared hardware queue, wherein the interrupt carries a user vector corresponding to a user state process issuing the target task; searching in a plurality of independent UIO devices based on the user vector, and issuing the interrupt to the searched independent UIO devices; the shared hardware queue is shared by user state processes corresponding to the plurality of independent UIO devices, and the independent UIO devices are in one-to-one correspondence with the user state processes;
and the independent UIO equipment receives the interrupt issued by the shared UIO equipment so as to obtain the completion state of the target task issued by the independent UIO equipment based on the interrupt by the user state process corresponding to the independent UIO equipment.
6. The method of claim 5, wherein the shared UIO device maintains a plurality of PASID context information corresponding to each of the user state processes, the PASID context information including description information of the independent UIO device corresponding to the corresponding user state process;
The searching among a plurality of independent UIO devices based on the user vector carried by the interrupt, and issuing the interrupt to the found independent UIO device, includes:
searching description information in each PASID context information, and determining the description information matched with the user vector and the PASID context information containing the description information;
And determining the UIO equipment corresponding to the determined PASID context information as independent UIO equipment searched based on the user vector.
7. The method of claim 5, wherein the receiving, by the independent UIO device, the interrupt issued by the shared UIO device to obtain, by the corresponding user state process of the independent UIO device, a completion status of the target task issued by the independent UIO device based on the interrupt, includes:
Under the condition that the independent UIO device receives the interrupt issued by the shared UIO device, the interrupt is exposed to a user state process corresponding to the independent UIO device, the interrupt is acquired by the user state process in a polling mode, and the completion state of a target task issued by the independent UIO device is acquired based on the interrupt.
8. The method of claim 5, wherein the interrupt is triggered by:
Under the condition that the hardware acceleration device completes a target task in a shared hardware queue, writing a specific index address in an interrupt table of the hardware acceleration device so that the interrupt is triggered;
the hardware acceleration device interrupt table records a specific index address corresponding to the shared hardware queue, and the specific index address is applied to an operating system by a driver of the hardware acceleration device.
9. The method of claim 5, wherein the user-mode process issues the target task to the hardware acceleration device by:
Requesting an operating system to allocate resources of the hardware acceleration device through system call, and acquiring a memory pointer of the hardware acceleration device in a mapping memory area, wherein the mapping memory area is used for representing an address space of a user state process obtained by mapping the memory area of the hardware acceleration device;
and configuring task information and PASID context information, and filling the task information into the mapping memory area.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 5 to 9 when executing the computer program.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 5 to 9.
12. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any of claims 5 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410410674.2A CN118034958B (en) | 2024-04-07 | 2024-04-07 | Task state notification system and method for multi-process scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118034958A true CN118034958A (en) | 2024-05-14 |
CN118034958B CN118034958B (en) | 2024-08-06 |
Family
ID=90993563
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410410674.2A Active CN118034958B (en) | 2024-04-07 | 2024-04-07 | Task state notification system and method for multi-process scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118034958B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102971723A (en) * | 2011-06-28 | 2013-03-13 | 华为技术有限公司 | Distributed multi-process communication method and device |
US20200320017A1 (en) * | 2019-04-04 | 2020-10-08 | Cisco Technology, Inc. | Network interface card resource partitioning |
CN112698963A (en) * | 2020-12-22 | 2021-04-23 | 新华三技术有限公司成都分公司 | Event notification method and device |
CN113535433A (en) * | 2021-07-21 | 2021-10-22 | 广州市品高软件股份有限公司 | Control forwarding separation method, device, equipment and storage medium based on Linux system |
WO2022251998A1 (en) * | 2021-05-31 | 2022-12-08 | 华为技术有限公司 | Communication method and system supporting multiple protocol stacks |
CN115859386A (en) * | 2022-11-25 | 2023-03-28 | 阿里巴巴(中国)有限公司 | Chip accelerator, encryption and decryption method and device, computer equipment and storage medium |
CN115981685A (en) * | 2021-10-14 | 2023-04-18 | 华为技术有限公司 | Application upgrading method and device, computing equipment and chip system |
CN116383175A (en) * | 2023-03-21 | 2023-07-04 | 康键信息技术(深圳)有限公司 | Data loading method, device, equipment and computer readable medium |
WO2023179508A1 (en) * | 2022-03-22 | 2023-09-28 | 北京有竹居网络技术有限公司 | Data processing method and apparatus, readable medium and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN118034958B (en) | 2024-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110245001B (en) | Data isolation method and device and electronic equipment | |
Kato et al. | RGEM: A responsive GPGPU execution model for runtime engines | |
US10565131B2 (en) | Main memory including hardware accelerator and method of operating the same | |
US11836091B2 (en) | Secure memory access in a virtualized computing environment | |
US11086815B1 (en) | Supporting access to accelerators on a programmable integrated circuit by multiple host processes | |
CN112231007B (en) | Device driving method based on user mode and kernel mode driving cooperative processing framework | |
US11934698B2 (en) | Process isolation for a processor-in-memory (“PIM”) device | |
CN109376104B (en) | Chip and data processing method and device based on chip | |
US11500802B1 (en) | Data replication for accelerator | |
CN112330229B (en) | Resource scheduling method, device, electronic equipment and computer readable storage medium | |
US20240211256A1 (en) | Partition and isolation of a processing-in-memory (pim) device | |
US20220261489A1 (en) | Capability management method and computer device | |
US8751724B2 (en) | Dynamic memory reconfiguration to delay performance overhead | |
US20160283258A1 (en) | Sharing memory between guests | |
US11385927B2 (en) | Interrupt servicing in userspace | |
US20220066827A1 (en) | Disaggregated memory pool assignment | |
CN117349870A (en) | Transparent encryption and decryption computing system, method, equipment and medium based on heterogeneous computing | |
CN118034958B (en) | Task state notification system and method for multi-process scene | |
JP5254710B2 (en) | Data transfer device, data transfer method and processor | |
US11960420B2 (en) | Direct memory control operations on memory data structures | |
US11481255B2 (en) | Management of memory pages for a set of non-consecutive work elements in work queue designated by a sliding window for execution on a coherent accelerator | |
CN112989326A (en) | Instruction sending method and device | |
US9176910B2 (en) | Sending a next request to a resource before a completion interrupt for a previous request | |
CN116745754A (en) | System and method for accessing remote resource | |
KR102498319B1 (en) | Semiconductor device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |