CN117608763A - Task processing method and device - Google Patents

Task processing method and device

Info

Publication number
CN117608763A
Authority
CN
China
Prior art keywords
task
unit
data
task unit
storage area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311801568.9A
Other languages
Chinese (zh)
Inventor
李文凯
周强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202311801568.9A priority Critical patent/CN117608763A/en
Publication of CN117608763A publication Critical patent/CN117608763A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/455: Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533: Hypervisors; Virtual machine monitors
    • G06F 9/45558: Hypervisor-specific management and integration aspects
    • G06F 2009/45583: Memory management, e.g. access or allocation
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10: Address translation
    • G06F 12/1009: Address translation using page tables, e.g. page table structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

Embodiments of this specification provide a task processing method and device, in which each subtask unit of a first task unit is paired one-to-one with a second task unit, and each corresponding pair of subtask unit and second task unit is jointly associated with a single storage area in memory, so that the pseudo-multithreading problem can be avoided. In addition, while the first task unit executes a predetermined task, each corresponding pair transfers data through its jointly associated storage area in memory, used as shared memory, so that data transfer efficiency can be improved. Together, these measures improve task processing efficiency.

Description

Task processing method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and an apparatus for task processing.
Background
A virtual machine (VM) is a complete computer system that is emulated in software, runs in a fully isolated environment, and has the functionality of a complete hardware system. A single virtual machine has its own CMOS, hard disk, and operating system, and can be operated as if it were a physical machine. Different services may be executed by virtual machines in different environments. For example, a graph computation task whose processing objects are graph data such as a knowledge graph, a bipartite graph, or a homogeneous graph (a single node type and a single edge type, e.g., a social graph or a transaction graph) may be executed in a Java virtual machine environment, while a machine learning model based on artificial intelligence (AI) may be executed in a Python virtual machine environment. In practice, different services may need to interact. For example, in a dynamic graph computation scenario, each iteration of the graph computation task may generate sub-graph features; the sub-graph features are fed through forward-propagation inference of an AI model in the Python environment, and the inference results are returned to the graph computation task in the Java virtual machine environment to participate in the next iteration. Because different virtual machines are isolated from each other, data interaction between two virtual machines (which generalizes to any two mutually isolated processes) is an important technical problem.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for task processing to solve one or more of the problems mentioned in the background.
According to a first aspect, there is provided a task processing method for executing a predetermined task via a first task unit. The first task unit corresponds to a number of subtask units that execute the predetermined task, each subtask unit corresponding to a respective second task unit created via configuration in the first task unit, and each second task unit is isolated from the first task unit. The subtask units include a first subtask unit, which corresponds to a current second task unit and is jointly associated with it to a first storage area. The method comprises: computing first data for the predetermined task using the first subtask unit, and writing the first data into the first storage area; upon detecting the first data, reading and processing the first data from the first storage area via the current second task unit to obtain second data, and writing the second data into the first storage area; and reading the second data from the first storage area via the first subtask unit to continue executing the predetermined task.
In one embodiment, the first task unit is configured with candidate interfaces for second task units, each second task unit being created from dependencies selected by a user from the candidate interfaces.
In one embodiment, the first task unit is a Java virtual machine, the first sub task unit is a Java thread, and the second task unit is a Python virtual machine.
In one embodiment, the first subtask unit has a first mapping relationship between its own logical address and a first physical address corresponding to the first storage area, and the first physical address is synchronized to the current second task unit by the first task unit when the current second task unit is created, so that the current second task unit determines a second mapping relationship between its own logical address and the first physical address.
In one embodiment, the first data is written to the first storage area by the first subtask unit in the form of a first circular queue and read from the first circular queue by the current second task unit, and the second data is written to the first storage area by the current second task unit in the form of a second circular queue and read from the second circular queue by the first subtask unit.
In a further embodiment, the first subtask unit writes data to and reads data from the first storage area by maintaining a write pointer of the first circular queue and a read pointer of the second circular queue, and the current second task unit writes data to and reads data from the first storage area by maintaining a write pointer of the second circular queue and a read pointer of the first circular queue.
In a further embodiment, the first storage area is a memory storage area, and the current second task unit interacts with the first storage area by invoking a memory operation module based on the Cython language.
In one embodiment, the predetermined task is a graph computation task, the first subtask unit executes a subtask of computing a first sub-graph in the graph computation task, the first data is a first sub-graph feature corresponding to the first sub-graph, and the second data is an inference result obtained by the current second task unit predicting the node or entity corresponding to the first sub-graph from the first sub-graph feature.
According to a second aspect, there is provided a computing device for task processing, comprising a memory and a processor, the processor executing a predetermined task by running a first task unit and second task units. The first task unit corresponds to a number of subtask units that execute the predetermined task, each subtask unit corresponding to a respective second task unit created via configuration in the first task unit, and each second task unit is isolated from the first task unit. The subtask units include a first subtask unit, which corresponds to a current second task unit and is jointly associated with it to a first storage area. Within a task cycle:
the first subtask unit computes first data for the predetermined task and writes the first data into the first storage area;
upon detecting the first data, the current second task unit reads and processes the first data from the first storage area to obtain second data, and writes the second data into the first storage area;
the first subtask unit reads the second data from the first storage area to continue executing the predetermined task.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the steps corresponding to a single task unit in the method of the first aspect.
With the device and method provided by the embodiments of this specification, when a predetermined task is executed by the first task unit, auxiliary data processing is performed by second task units that are isolated from the first task unit. To this end, the subtask units of the first task unit are each paired one-to-one with a second task unit, and each corresponding pair of subtask unit and second task unit is jointly associated with a single storage area in memory, which avoids the pseudo-multithreading problem and improves task processing efficiency. In addition, while the first task unit executes the predetermined task, each corresponding pair exchanges data through its jointly associated storage area, used as shared memory, which improves data transfer efficiency and thereby further improves task processing efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an implementation architecture of one embodiment of the present technology;
FIG. 2 shows a flow diagram of task processing according to an embodiment of the present specification;
FIG. 3 is a schematic timing diagram of task processing in a specific example under the technical idea of the present specification;
FIG. 4 shows a block diagram of a computing device for task processing according to one embodiment of the present specification.
Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
Fig. 1 shows a schematic diagram of an implementation architecture of an embodiment of the present specification. As shown in fig. 1, in this implementation architecture, the computing platform may be any computer, device, server, or the like. The computing platform may obtain the predetermined task from a local or external device. The predetermined task here may be any of various business processing tasks, such as a graph computation task. The predetermined task may be processed via a predetermined task unit. The task unit may be implemented, for example, by a main process (e.g., process 0, i.e., the main task unit) and several threads within it (e.g., thread 1, thread 2, thread 3, and thread 4 in fig. 1). These threads can compute in parallel, and can also cooperate to complete the computation subtasks of different modules of the predetermined task.
The predetermined tasks addressed by this specification may also, during processing via a main task unit (e.g., a main process), invoke modules that need to be executed in other task units (e.g., other processes), such as the dynamic graph computation task mentioned in the background. For example, the graph computation task may be executed in a Java execution environment (e.g., one built by a Java virtual machine, JVM, implemented as a Java process), while the AI model must run in a Python execution environment (e.g., one built by a Python virtual machine, PVM, implemented in the Python language).
Because of the environmental isolation between processes (isolation of memory usage and of other resources allocated at runtime, such as the isolation between different virtual machines), inter-process communication is conventionally implemented by calling interfaces or the like. However, some business scenarios have high real-time requirements, which this approach may fail to meet.
To improve the efficiency of interaction between processes, the technical solution provided in this specification transfers data between different processes through shared memory. Shared memory is a form of inter-process communication: as the name implies, it is a single piece of physical memory that multiple (typically two) otherwise unrelated processes are allowed to access. In general, a single task unit has its own process control block (e.g., PCB) and address space (Addr Space), together with a corresponding page table that maps its virtual (logical) addresses to physical addresses and is managed by a memory management unit (MMU). When the virtual addresses of two different task units are mapped through their page tables to the same region of physical memory, the region they point to is shared memory. Shared memory is a very efficient way to share and transfer data among running processes, which map the same piece of physical memory into their respective address spaces. If one process writes data to shared memory, the change immediately becomes visible to every other process that can access the same piece of shared memory.
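For intuition, the following minimal Python sketch (an editorial illustration rather than part of the original disclosure; names and sizes are assumptions) shows this visibility property using the standard library's SharedMemory: the parent and child map the same physical block, so a write by one is immediately readable by the other.

```python
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory

def reader(name: str) -> None:
    shm = SharedMemory(name=name)      # attach to the block the parent created
    print(bytes(shm.buf[:5]))          # prints b'hello': the parent's write is visible
    shm.close()

if __name__ == "__main__":
    shm = SharedMemory(create=True, size=1024)    # one physical region
    shm.buf[:5] = b"hello"                        # write on the parent side
    p = Process(target=reader, args=(shm.name,))
    p.start()
    p.join()
    shm.close()
    shm.unlink()                                  # release the region
```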
On the other hand, considering that some task units suffer from a "pseudo-multithreading problem" that can hurt task processing efficiency, the implementation architecture of this specification adopts a multi-process approach. One example of the "pseudo-multithreading problem" is the one caused by the global interpreter lock in Python processes. Ordinarily, processes are independent of each other, while the threads within one process share its data; when multiple threads access a shared data resource, a "contention" state can occur, i.e., the data may be occupied by several threads at once, causing data corruption. The global interpreter lock limits simultaneous execution of multiple threads, ensuring that only one thread executes at a time. As a result, the threads cannot truly run in parallel, which is the pseudo-multithreading problem.
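The effect can be made concrete with a small timing experiment; the sketch below is illustrative only (the workload and worker count are arbitrary). Under CPython's global interpreter lock the four threads run essentially one at a time, while the four processes, the approach this architecture adopts, run in parallel on a multi-core machine.

```python
import time
from threading import Thread
from multiprocessing import Process

def burn() -> None:
    # CPU-bound work: the GIL allows only one thread to execute this at a time
    sum(i * i for i in range(10_000_000))

def timed(worker_cls) -> float:
    workers = [worker_cls(target=burn) for _ in range(4)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print("4 threads:  ", timed(Thread))     # roughly 4x one run: serialized by the GIL
    print("4 processes:", timed(Process))    # near one run's time on a 4-core machine
```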
In the implementation architecture shown in fig. 1, for each subtask unit (e.g., each thread) in the main task unit (e.g., process 0 in fig. 1), a separate task unit may be created in one-to-one correspondence, such as the illustrated process 1, process 2, process 3, and process 4, corresponding respectively to thread 1, thread 2, thread 3, and thread 4 in process 0. These other processes may all be created via the main process. In fig. 1, a corresponding process and thread are drawn with the same line type as their connection to the memory. A corresponding pair of subtask unit and task unit (e.g., thread 1 and process 1 in fig. 1) corresponds to the same address range, or storage area, in memory. As shown in fig. 1, memory area 1, memory area 2, memory area 3, memory area 4, and so on serve as the shared memory of each thread-process pair, through which they exchange data. In this way the pseudo-multithreading problem is effectively avoided, parallel data processing is achieved, and the processing efficiency of the predetermined task is improved.
The technical idea of the present specification is described in detail below with reference to the specific example shown in fig. 2.
FIG. 2 illustrates a task processing flow according to one embodiment. The execution subject of this flow may be a computer, device, or server with some computing power, more specifically, for example, the computing platform shown in fig. 1.
Here, the main task unit that executes the predetermined task is denoted the first task unit, e.g., a JVM (corresponding to a Java process). The first task unit may correspond to a plurality of subtask units (e.g., Java threads in the JVM). These subtask units may process the predetermined task in parallel. Via the first task unit, a plurality of second task units may be created, such as PVMs (corresponding to Python processes). The first task unit may be configured with candidate interfaces for the second task units, and each second task unit may be created from several dependency items selected from the candidate interfaces.
Each second task unit is isolated from the first task unit. For example, if the first task unit is a JVM and the second task units are PVMs, a plurality of Python processes may be created in the first task unit using the multiprocessing.Process() interface function, with dependencies such as the function to run and memory resources specified in the parentheses; these Python processes serve as the respective second task units and are then started via p.start(). The first task unit may act as a task manager for each second task unit. In a graph computation task, the first task unit serves as the graph computation engine and controls the tasks executed by the second task units, such as inference tasks using an AI model. In one embodiment, the first task unit may control the tasks executed by a second task unit by providing it with a corresponding task interface (e.g., an AI model interface).
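As a hedged Python-side sketch of this creation pattern (the worker body, region size, and worker count are hypothetical placeholders, not the patent's API): the managing side spawns one Python worker per subtask thread and hands each worker the name of the memory region it will share with that thread.

```python
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory

def run_inference(shm_name: str) -> None:
    # Hypothetical worker body: attach to the region shared with one thread,
    # then loop reading features and writing predictions back (omitted here).
    shm = SharedMemory(name=shm_name)
    ...
    shm.close()

if __name__ == "__main__":
    # One region and one worker per subtask thread; 4 workers and 10 MiB are arbitrary.
    regions = [SharedMemory(create=True, size=10 * 2**20) for _ in range(4)]
    workers = [Process(target=run_inference, args=(r.name,)) for r in regions]
    for p in workers:
        p.start()      # each started process is one "second task unit"
```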
On the other hand, the first task unit can also act as an administrator managing the CPU and memory resources of the second task units. Those skilled in the art will appreciate that when the first task unit is created, a first mapping relationship may be established between its logical addresses and physical addresses in memory. Through this first mapping relationship, corresponding physical addresses can be allocated to each subtask unit (e.g., each thread). When creating a single second task unit, the first task unit may synchronize the physical address corresponding to the respective thread (e.g., the physical address corresponding to the first storage area, hereinafter the first physical address) to that second task unit, so that the second task unit associates its own logical addresses with the first physical address under a second mapping relationship. In this way, a data-sharing association via the first storage area in memory is established between a single subtask unit of the first task unit and a single second task unit.
Upon receiving a predetermined task (e.g., a graph computation task), the first task unit may execute it in parallel through its subtask units. For example, in a graph computation task, each subtask unit separately processes a single sub-graph of the graph data. In this manner, each pair of subtask unit and second task unit handles one portion of the predetermined task. Letting the first subtask unit stand for an arbitrary subtask unit, and denoting its corresponding second task unit the current second task unit, within one task cycle the first subtask unit and the current second task unit cooperate to execute the task processing flow shown in fig. 2.
As shown in fig. 2, the task processing flow of an embodiment of the present specification may include: step 202, computing first data for the predetermined task using the first subtask unit, and writing the first data into the first storage area; step 204, upon detecting the first data, reading and processing the first data from the first storage area via the current second task unit to obtain second data, and writing the second data into the first storage area; step 206, reading the second data from the first storage area via the first subtask unit to continue executing the predetermined task.
First, in step 202, first data is computed for the predetermined task using the first subtask unit, and the first data is written into the first storage area.
Here, the first subtask unit may process a portion of the data of the predetermined task, and the resulting processing result is denoted the first data. The data processed by the first subtask unit, and the results obtained, differ with the predetermined task. For example, in a graph computation task, the portion of data processed by the first subtask unit may be a sub-graph of the graph data, and the first data may be sub-graph feature data computed for that sub-graph, and so on.
The first subtask unit may write the first data obtained by its processing into the first storage area, e.g., shared memory, which it shares in memory with the current second task unit according to the physical address determined by the first mapping relationship. The shared memory serves as a data transmission bridge and can carry data in various forms. A common way to use shared memory is to store data through, for example, the shm* family of functions. The data may be stored, for example, as a queue. Taking a queue as an example, writing is allowed at one end and reading or deleting at the other, following the first-in-first-out (FIFO) principle. Queues include sequential queues, circular queues, chain queues, and the like.
Since the architecture of this specification employs a plurality of second task units, a lock-free queue may be used to keep data interaction efficient. A lock-free queue allows fast data transfer between a single producer and a single consumer: the producer writes data at the tail of the queue, and the consumer reads and deletes data at the head. In this lock-free form, with the first subtask unit as the single producer and the current second task unit as the single consumer, data correctness can be guaranteed without adding a mutual-exclusion mechanism, improving read/write performance. The circular queue into which the first subtask unit writes the first data is denoted, for example, the first circular queue.
As an example, the lock-free queue may be implemented using a RingBuffer, i.e., a circular queue. A RingBuffer is a buffer structure that stores data in a circular ring; each buffer is a fixed amount of memory reserved in advance and holds a bounded amount of data. Such fixed-length buffers avoid memory fragmentation, reduce the overhead of dynamic memory allocation, and improve operating efficiency and reliability. With a RingBuffer-based lock-free queue, the first subtask unit may maintain a corresponding write pointer that designates the write address of the first data. For example, the address to write is determined by advancing the tail pointer by 1 modulo the queue length (the memory footprint of the queue, typically no larger than the first storage area, e.g., 10 megabytes). Thus, by moving the write pointer, the first data can be written into the lock-free queue. In an alternative example, the first subtask unit may maintain two write pointers: a write pointer in its own task unit's memory and a write pointer in the queue, associated with each other according to the first mapping relationship.
It will be appreciated that an ordinary queue might pin the head pointer, move the tail pointer back on each enqueue, and, on each dequeue, return the element at the head and then shift the whole array forward one position while decrementing the tail pointer. A circular queue instead wraps the array into a ring, and both the head and tail pointers move with enqueues and dequeues, so dequeuing does not require moving any element other than the one removed, which improves efficiency.
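The following sketch shows a minimal single-producer/single-consumer circular queue with the modulo arithmetic described above; it assumes fixed-size slots and is an illustration, not the patent's implementation. In a real cross-process queue the head and tail indices would themselves live in the shared region (e.g., in a small header); here they are plain attributes to keep the arithmetic visible.

```python
class RingBuffer:
    """Single-producer/single-consumer circular queue over a writable buffer."""

    def __init__(self, buf: memoryview, slot_size: int = 256):
        self.buf = buf
        self.slot = slot_size
        self.capacity = len(buf) // slot_size
        self.head = 0   # read index, advanced only by the consumer
        self.tail = 0   # write index, advanced only by the producer

    def put(self, item: bytes) -> bool:
        """Producer side: write one record of at most slot_size bytes."""
        if (self.tail + 1) % self.capacity == self.head:
            return False                               # full: would clobber unread data
        off = self.tail * self.slot
        self.buf[off:off + len(item)] = item           # finish the write first...
        self.tail = (self.tail + 1) % self.capacity    # ...then publish it
        return True

    def get(self):
        """Consumer side: return one record, or None if the queue is empty."""
        if self.head == self.tail:
            return None
        off = self.head * self.slot
        item = bytes(self.buf[off:off + self.slot])
        self.head = (self.head + 1) % self.capacity
        return item
```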
In other embodiments, the first data may be written to the first storage area in other manners, which are not described herein.
Then, in step 204, upon detecting the first data, the current second task unit reads and processes the first data from the first storage area to obtain second data, and writes the second data into the first storage area.
The current second task unit shares the first storage area with the first subtask unit; therefore, upon detecting that the first subtask unit has written the first data to the first storage area, the current second task unit can read the first data from it. When the first storage area is used as shared memory, which provides no built-in synchronization mechanism, read/write mutual exclusion can be achieved by means such as semaphores, controlling the current second task unit so that it does not read the first data before the first subtask unit completes the write.
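One possible form of such a semaphore handshake is sketched below (illustrative; the two semaphores and their roles are assumptions, not the patent's mechanism). A "data ready" semaphore keeps the consumer from reading before the producer's write completes, and a "space free" semaphore keeps the producer from overwriting an unread record.

```python
from multiprocessing import Semaphore

# These would be created once and passed to the worker process at creation time.
data_ready = Semaphore(0)   # counts completed writes not yet consumed
space_free = Semaphore(1)   # keeps the reader from ever observing a torn write

def produce(region: memoryview, record: bytes) -> None:
    space_free.acquire()              # wait until the slot may be overwritten
    region[:len(record)] = record     # the write completes before the signal
    data_ready.release()              # only now may the consumer read

def consume(region: memoryview, size: int) -> bytes:
    data_ready.acquire()              # blocks until a full record exists
    record = bytes(region[:size])
    space_free.release()              # hand the slot back to the producer
    return record
```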
Once the write has completed, the current second task unit may read the first data from the first storage area. The read may be performed through a read pointer: specifically, the first data can be read from the head of the queue by advancing the current head pointer by 1 modulo the queue length, which yields the location of the first data within the first storage area. Optionally, the current second task unit may maintain two read pointers, namely a read pointer in its own task unit's memory and a read pointer in the queue, associated with each other according to the second mapping relationship.
The current second task unit may process the first data; for convenience of description, this specification denotes the resulting processing result the second data. For example, where the current second task unit runs an AI model, it may perform inference on the first data, and the resulting prediction serves as the second data.
The second data obtained by the current second task unit is then transferred back to the first subtask unit for further processing. Thus, the current second task unit also writes the second data into the first storage area to share it with the first subtask unit. Similarly, the current second task unit may store the second data in the first storage area through the shm* family of functions, in the form of, for example, a queue (such as the lock-free queue described above).
When data is stored in a queue, to ensure correctness and timeliness, the current second task unit and the first subtask unit may maintain in the first storage area another queue, e.g., a second circular queue, distinct from the one into which the first subtask unit writes the first data. This other queue is likewise in single-producer/single-consumer mode, with the current second task unit writing data as producer and the first subtask unit reading data as consumer. For this queue, the current second task unit may maintain the corresponding write pointers, e.g., a write pointer in its own memory and a write pointer in the queue (the two write pointers associated via the second mapping relationship), and the first subtask unit may maintain the corresponding read pointers, e.g., a read pointer in its own memory and a read pointer in the queue (the two read pointers associated via the first mapping relationship).
It will be appreciated that where the first and second task units are Java or Python processes managed by virtual machines, their operations on the underlying physical memory (e.g., the first storage area) typically rely on an intermediate layer. To improve efficiency, Cython may be used for the interaction between a Python process and addresses in the first storage area. Cython's memory view (MemoryView) provides fast access to array elements; a memory view has a structure similar to a NumPy array, recording information such as shape, base, and strides, and it does not own a data storage area itself but obtains one from another Python object or a C array. Memory views exist in two forms in Cython, and Cython switches between them automatically according to how they are used: at the C level a memory view is a struct, which Cython automatically converts to a memoryview object when it needs to be used as a Python object.
Therefore, the Python virtual machine may perform memory operations through the interface (API) of a memory operation module written in advance in the Cython language. This removes the need for ad-hoc mapping interfaces, standardizes the mapping onto the shared memory, and lets Python operate on the underlying physical memory segment as if calling an ordinary method. Because Cython compiles down to optimized C/C++, a memory operation module written in Cython executes at compiled-C speed at runtime, can call C libraries directly, and still exposes a native interface to the Python language, so Python-based task units can use it directly. A Cython-based memory operation module therefore improves call performance between modules and, in turn, the efficiency of data transfer between task units.
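The patent's module is Cython code compiled to C; the plain-Python sketch below only suggests the interface shape such a module might expose over the shared region (the class and method names are assumptions, not the patent's API), using a zero-copy memoryview in place of Cython's typed memory view.

```python
from multiprocessing.shared_memory import SharedMemory

class MemoryOps:
    """Plain-Python stand-in for a Cython-compiled shared-memory accessor."""

    def __init__(self, shm_name: str):
        self._shm = SharedMemory(name=shm_name)
        self._view = memoryview(self._shm.buf)    # zero-copy view of the region

    def write(self, offset: int, payload: bytes) -> None:
        self._view[offset:offset + len(payload)] = payload

    def read(self, offset: int, size: int) -> bytes:
        return bytes(self._view[offset:offset + size])

    def close(self) -> None:
        self._view.release()    # release the view before closing the block
        self._shm.close()
```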
In some embodiments, the interaction of the first and second task units with the underlying physical memory may be implemented in other manners, such as Pybind11, which are not detailed here.
Next, in step 206, the second data is read from the first storage area via the first subtask unit to continue executing the predetermined task.
Upon detecting that the current second task unit has written the second data to the first storage area, the first subtask unit may read the second data from it. Similarly, when the first storage area is used as shared memory, read/write mutual exclusion can be achieved by means such as semaphores, controlling the first subtask unit so that it does not read the second data before the current second task unit completes the write. Once the write has completed, the first subtask unit may read the second data from the first storage area; this proceeds analogously to the current second task unit's reading of the first data and is not repeated here.
The first subtask unit reads the second data from the first storage area to continue executing the predetermined task. For example, where the second data is a prediction result for the first data, the first subtask unit may use the prediction result in further related processing. Where the predetermined task is a graph computation task, the related processing is, for example, the next iteration of computation over the graph data.
To further clarify the technical solution of this disclosure, fig. 3 shows a schematic diagram of a specific example in an entity linking business scenario. In the embodiment of fig. 3, the predetermined task is entity linking over a knowledge graph, a kind of graph computation task. In fig. 3, the first task unit is a JVM (Java process), and the second task units are PVMs (Python processes). As shown in fig. 3, for the currently running JVM to execute the entity linking task, a plurality of PVMs, such as PVM1, PVM2, and PVM3, are first created by configuring the Python interface in the JVM. The number of PVMs may match the number of parallel threads of the entity linking task and may be preset; in fig. 3, for example, it is 3. Meanwhile, the JVM establishes, for each thread, an association with a storage area in memory, and synchronizes the corresponding storage area to the corresponding PVM. Thus, once created, each single PVM shares a memory region, i.e., shared memory, with a single thread in the JVM.
Then, when a specific entity linking task arrives, the JVM may compute the features of the nodes of the graph data in a multithreaded, parallel manner, with a single thread computing one sub-graph, so that each thread obtains the sub-graph features of its respective sub-graph. Here, the sub-graph features may include the node features (or entity features) of the respective nodes (or entities) in the sub-graph. After computing its sub-graph features, a single thread may store them into the corresponding storage area. The corresponding PVM reads the sub-graph features from that storage area and performs identical-entity prediction; if n groups of identical entities are predicted, this data is stored back into the corresponding storage area. The corresponding thread then reads the n groups of identical entities from the storage area and performs the entity linking.
The process shown by the dashed box 301 in fig. 3 can be regarded as one entity linking cycle. The process shown by dashed box 301 may then be repeated over multiple rounds of iteration of the entity linking operation.
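Schematically, one such cycle from the Python worker's point of view might look like the loop below (a sketch; the queue objects and model.predict are hypothetical placeholders standing in for the shared-memory queues and the AI model): wait for sub-graph features, predict identical-entity groups, and write them back for the paired thread to read.

```python
import time

def worker_loop(features_queue, results_queue, model) -> None:
    # One iteration of this loop is one task cycle (dashed box 301).
    while True:
        features = features_queue.get()      # sub-graph features from the JVM thread
        if features is None:                 # nothing written yet: poll again
            time.sleep(0.001)
            continue
        groups = model.predict(features)     # n groups of identical entities
        results_queue.put(groups)            # the paired thread reads these to link entities
```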
With the task processing method provided under the technical idea of this specification, the subtask units of the first task unit are each paired one-to-one with a second task unit, and each corresponding pair of subtask unit and second task unit is jointly associated with a single storage area in memory, which avoids the pseudo-multithreading problem and improves task processing efficiency. In addition, while the first task unit executes the predetermined task, each corresponding pair exchanges data through its jointly associated storage area, used as shared memory, which improves data transfer efficiency and thereby further improves task processing efficiency.
According to an embodiment of another aspect, there is also provided a computing device for task processing. The computing device may include a memory and a processor and may be, e.g., a computer, terminal, or server with some computing capability, more specifically the computing platform shown in fig. 1. FIG. 4 illustrates a computing device 400 for task processing according to one embodiment. As shown in fig. 4, computing device 400 may include a memory 401 and a processor 402.
The processor 402 may execute a predetermined task by running a first task unit and second task units. The first task unit corresponds to a plurality of subtask units that execute the predetermined task, and each subtask unit corresponds to a second task unit created via configuration in the first task unit. Each second task unit is isolated from the first task unit. Any one of the subtask units may be denoted the first subtask unit, which corresponds to the current second task unit and is jointly associated with it to a first storage area in the memory 401.
In a single task cycle of executing the predetermined task: the first subtask unit computes first data for the predetermined task and writes it into the first storage area; upon detecting the first data, the current second task unit reads and processes the first data from the first storage area to obtain second data, and writes the second data into the first storage area; and the first subtask unit reads the second data from the first storage area to continue executing the predetermined task.
In one embodiment, the first task unit is configured with candidate interfaces for the second task units, each of the second task units being created from dependencies selected by the user from the candidate interfaces.
In one embodiment, the first task unit is a Java virtual machine, the first sub-task unit is a Java thread, and the second task unit is a Python virtual machine.
In one embodiment, the first subtask unit has a first mapping relationship between its own logical address and a first physical address corresponding to the first storage area, and the first physical address is synchronized to the current second task unit by the first task unit when the current second task unit is created, so that the current second task unit determines a second mapping relationship between its own logical address and the first physical address.
In one embodiment, first data is written to the first storage area by the first subtask unit in the form of a first circular queue and read from the first circular queue by the current second task unit, and second data is written to the first storage area by the current second task unit in the form of a second circular queue and read from the second circular queue by the first subtask unit.
In a further embodiment, the first subtask unit writes data to and reads data from the first storage area by maintaining a write pointer of the first circular queue and a read pointer of the second circular queue, and the current second task unit writes data to and reads data from the first storage area by maintaining a write pointer of the second circular queue and a read pointer of the first circular queue.
In a further embodiment, the first storage area is a memory storage area, and the current second task unit interacts with the first storage area through a memory operation module written in the Cython language.
In one embodiment, the predetermined task is a graph computation task, the first subtask unit executes a subtask of computing a first sub-graph in the graph computation task, the first data is a first sub-graph feature corresponding to the first sub-graph, and the second data is an inference result obtained by the current second task unit predicting the node or entity corresponding to the first sub-graph from the first sub-graph feature.
It should be noted that the computing device 400 shown in fig. 4 corresponds to the method described in fig. 2, and the corresponding descriptions in the method embodiment of fig. 2 also apply to the computing device 400 and are not repeated here.
According to an embodiment of another aspect, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with the single task unit in fig. 2.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present disclosure may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The specific embodiments above further describe the technical idea of the present disclosure in detail. It should be understood that the above description covers only specific embodiments of this technical idea and is not intended to limit its scope; any modification, equivalent substitution, improvement, or the like made on the basis of the technical solutions of the embodiments of this disclosure shall fall within the scope of the technical idea of this disclosure.

Claims (10)

1. A task processing method for executing a predetermined task via a first task unit, the first task unit corresponding to a number of subtask units that execute the predetermined task, each subtask unit corresponding to a respective second task unit created via configuration in the first task unit, each second task unit being isolated from the first task unit, the subtask units comprising a first subtask unit that corresponds to a current second task unit and is jointly associated with it to a first storage area; the method comprising:
computing first data for the predetermined task using the first subtask unit, and writing the first data into the first storage area;
upon detecting the first data, reading and processing the first data from the first storage area via the current second task unit to obtain second data, and writing the second data into the first storage area;
reading the second data from the first storage area via the first subtask unit to continue executing the predetermined task.
2. A method as claimed in claim 1, wherein the first task unit is configured with candidate interfaces for second task units, each second task unit being created from dependencies selected by a user from the candidate interfaces.
3. The method of claim 1, wherein the first task unit is a Java virtual machine, the first sub task unit is a Java thread, and the second task unit is a Python virtual machine.
4. The method of claim 1, wherein the first subtask unit has a first mapping relationship between its own logical address and a first physical address corresponding to the first storage area, the first physical address being synchronized by the first task unit to the current second task unit when the current second task unit is created, so that the current second task unit determines a second mapping relationship between its own logical address and the first physical address.
5. The method of claim 1, wherein the first data is written to the first storage area by the first subtask unit in the form of a first circular queue and read from the first circular queue by the current second task unit, and the second data is written to the first storage area by the current second task unit in the form of a second circular queue and read from the second circular queue by the first subtask unit.
6. The method of claim 5, wherein the first subtask unit writes data to and reads data from the first storage area by maintaining a write pointer of the first circular queue and a read pointer of the second circular queue, and the current second task unit writes data to and reads data from the first storage area by maintaining a write pointer of the second circular queue and a read pointer of the first circular queue.
7. The method of claim 1, wherein the first storage area is a memory storage area, and the current second task unit interacts with the first storage area by invoking a Cython language-based memory operation module.
8. The method of any one of claims 1-7, wherein the predetermined task is a graph computation task, the first subtask unit executes a subtask of computing a first sub-graph in the graph computation task, the first data is a first sub-graph feature corresponding to the first sub-graph, and the second data is an inference result obtained by the current second task unit predicting the node or entity corresponding to the first sub-graph from the first sub-graph feature.
9. A computing device for task processing, comprising a memory and a processor, the processor executing a predetermined task by running a first task unit and second task units, the first task unit corresponding to a number of subtask units that execute the predetermined task, each subtask unit corresponding to a respective second task unit created via configuration in the first task unit, each second task unit being isolated from the first task unit, the subtask units comprising a first subtask unit that corresponds to a current second task unit and is jointly associated with it to a first storage area; wherein, in a task cycle:
the first subtask unit computes first data for the predetermined task and writes the first data into the first storage area;
upon detecting the first data, the current second task unit reads and processes the first data from the first storage area to obtain second data, and writes the second data into the first storage area;
the first subtask unit reads the second data from the first storage area to continue executing the predetermined task.
10. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the steps corresponding to a single task unit in the method of any one of claims 1-8.
CN202311801568.9A 2023-12-25 2023-12-25 Task processing method and device Pending CN117608763A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311801568.9A CN117608763A (en) 2023-12-25 2023-12-25 Task processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311801568.9A CN117608763A (en) 2023-12-25 2023-12-25 Task processing method and device

Publications (1)

Publication Number Publication Date
CN117608763A true CN117608763A (en) 2024-02-27

Family

ID=89959924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311801568.9A Pending CN117608763A (en) 2023-12-25 2023-12-25 Task processing method and device

Country Status (1)

Country Link
CN (1) CN117608763A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117806988A (en) * 2024-02-29 2024-04-02 山东云海国创云计算装备产业创新中心有限公司 Task execution method, task configuration method, board card and server
CN117806988B (en) * 2024-02-29 2024-05-24 山东云海国创云计算装备产业创新中心有限公司 Task execution method, task configuration method, board card and server

Similar Documents

Publication Publication Date Title
US9009711B2 (en) Grouping and parallel execution of tasks based on functional dependencies and immediate transmission of data results upon availability
US8316190B2 (en) Computer architecture and method of operation for multi-computer distributed processing having redundant array of independent systems with replicated memory and code striping
US8464217B2 (en) Object-oriented support for dynamic assignment of parallel computing resources
Nukada et al. NVCR: A transparent checkpoint-restart library for NVIDIA CUDA
KR101400286B1 (en) Method and apparatus for migrating task in multi-processor system
CN111078323B (en) Data processing method and device based on coroutine, computer equipment and storage medium
US8429664B2 (en) Job scheduling apparatus and job scheduling method
CA2820081A1 (en) Distributed computing architecture
CN111309649B (en) Data transmission and task processing method, device and equipment
CN117608763A (en) Task processing method and device
US20110265093A1 (en) Computer System and Program Product
CN116467061B (en) Task execution method and device, storage medium and electronic equipment
Perarnau et al. Argo NodeOS: Toward unified resource management for exascale
US8543722B2 (en) Message passing with queues and channels
Yasugi et al. ABCL/onEM-4: A new software/hardware architecture for object-oriented concurrent computing on an extended dataflow supercomputer
Catellani et al. Challenges in the implementation of MrsP
US20080134187A1 (en) Hardware scheduled smp architectures
Tsigas et al. Non-blocking data sharing in multiprocessor real-time systems
Grelck et al. Distributed s-net: Cluster and grid computing without the hassle
Shrivastava et al. Supporting transaction predictability in replicated DRTDBS
US8918799B2 (en) Method to utilize cores in different operating system partitions
Lin et al. Master–worker model for mapreduce paradigm on the tile64 many-core platform
Jungklass et al. Memopt: Automated memory distribution for multicore microcontrollers with hard real-time requirements
CN114880104A (en) Method, system and storage medium for facilitating out-of-order execution of OpenCL workgroups
Leidel et al. Toward a scalable heterogeneous runtime system for the convey MX architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination