CN116737348B - Multi-party task processing method and device, computer equipment and storage medium - Google Patents

Multi-party task processing method and device, computer equipment and storage medium

Publication number
CN116737348B
Authority
CN
China
Prior art keywords
task
participant
data processing
data
deployed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311018571.3A
Other languages
Chinese (zh)
Other versions
CN116737348A (en
Inventor
陈瑞钦
蒋杰
刘煜宏
陈鹏
程勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202311018571.3A
Publication of CN116737348A
Application granted
Publication of CN116737348B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/466 Transaction processing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a multiparty task processing method and device, computer equipment, and a storage medium. The multiparty task is jointly executed by a first participant and a second participant and comprises a plurality of data processing subtasks. The method comprises the following steps: analyzing the multiparty task to obtain a matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant, where one data block group having a matching relationship includes the task data required for executing at least one data processing subtask in the multiparty task; acquiring resource information of the data processing resources deployed in the first participant and the second participant; setting an execution plan for the plurality of data processing subtasks based on the matching relationship and the resource information; and scheduling the first participant and the second participant to perform data processing on each data block group, based on the deployed data processing resources and according to the task execution order indicated by the execution plan, so as to jointly execute the plurality of data processing subtasks. In this way, the multiparty task can be executed successfully.

Description

Multi-party task processing method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for processing a multiparty task, a computer device, and a storage medium.
Background
With the rise of artificial intelligence, technologies such as machine learning have brought new solutions to various industries and promoted their efficient development. In general, a machine learning model requires a large amount of data for training, but in practical application scenarios the data are distributed across different organizations, companies, and departments, and data management rules prevent them from being pooled for centralized training. This forms "data islands", and the value of the data distributed in each data island cannot be fully mined. In recent years, the rapid development and practice of multi-party task execution technologies such as FL (Federated Learning) and MPC (Secure Multi-Party Computation) have provided a new way to break down "data islands" by making data usable without making it visible. A multi-party task execution technology may be used to execute multi-party tasks, where a multi-party task is a data processing task that requires multiple (i.e., two or more) participants to execute together; for example, a multi-party task may be a machine learning model training task executed jointly by multiple participants.
Disclosure of Invention
The embodiment of the application provides a method and a device for processing multiparty tasks, computer equipment and a storage medium, which can ensure that the multiparty tasks can be successfully executed.
In one aspect, an embodiment of the application provides a method for processing a multiparty task, where the multiparty task needs to be jointly executed by a first participant and a second participant and comprises a plurality of data processing subtasks; the method comprises the following steps:
analyzing the multiparty task to be executed to obtain a matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant; one data block group with a matching relationship includes task data required for executing at least one data processing subtask in the multiparty task;
acquiring resource information of data processing resources deployed in a first participant and a second participant;
setting an execution plan for a plurality of data processing subtasks based on the matching relationship and the resource information; the execution plan is used for indicating the task execution sequence of the plurality of data processing subtasks;
and scheduling the first participant and the second participant to perform data processing on each data block group based on the deployed data processing resources according to the task execution order indicated by the execution plan, so as to jointly execute the plurality of data processing subtasks.
Correspondingly, an embodiment of the application provides a device for processing a multiparty task, where the multiparty task needs to be jointly executed by a first participant and a second participant and comprises a plurality of data processing subtasks; the device comprises:
the processing unit is used for analyzing the multiparty task to be executed to obtain a matching relationship between the data block deployed in the first party and the data block deployed in the second party; one data block group with a matching relationship includes task data required for executing at least one data processing subtask in the multiparty task;
an acquisition unit configured to acquire resource information of data processing resources deployed in the first party and the second party;
the processing unit is also used for setting an execution plan for a plurality of data processing subtasks based on the matching relation and the resource information; the execution plan is used for indicating the task execution sequence of the plurality of data processing subtasks;
and the processing unit is also used for scheduling the first party and the second party to perform data processing on each data block group based on the deployed data processing resources according to the task execution sequence indicated by the execution plan and jointly executing a plurality of data processing subtasks.
In one implementation, the processing unit is configured to, when setting an execution plan for a plurality of data processing subtasks based on the matching relationship and the resource information, specifically perform the following steps:
determining an execution plan setting strategy that matches both the resource information and the matching relationship;
according to the determined execution plan setting strategy, determining a first task batch of which the task execution sequence belongs to a first priority from a plurality of data processing subtasks; the first task batch comprises one or more data processing subtasks which are jointly and parallelly executed by a first participant and a second participant;
if the plurality of data processing subtasks include residual data processing subtasks besides the data processing subtasks included in the first task batch, determining a second task batch of which the task execution sequence belongs to a second priority from the residual data processing subtasks according to the determined execution plan setting strategy; the second task batch comprises one or more data processing subtasks which are jointly and parallelly executed by the first participant and the second participant in the residual data processing subtasks;
if other data processing subtasks exist in the remaining data processing subtasks except the data processing subtasks included in the second task batch, continuing to determine the task batch of which the task execution sequence belongs to the subsequent priority until the priorities of all the data processing subtasks included in the multiparty task are determined; wherein the task execution order belonging to the first priority precedes the task execution order belonging to the second priority.
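The batch-by-batch prioritization described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the patented implementation: `plan_batches`, `pick_batch`, and `two_at_a_time` are hypothetical names, and `pick_batch` stands in for whichever execution plan setting strategy has been selected.

```python
from typing import Callable, List


def plan_batches(
    subtasks: List[str],
    pick_batch: Callable[[List[str]], List[str]],
) -> List[List[str]]:
    """Split subtasks into ordered batches; batch 0 has the first priority."""
    remaining = list(subtasks)
    batches: List[List[str]] = []
    while remaining:
        # The strategy picks the subtasks that can run in parallel right now.
        batch = pick_batch(remaining)
        if not batch:
            raise ValueError("strategy selected no subtask; cannot make progress")
        batches.append(batch)
        chosen = set(batch)
        # Whatever was not chosen is considered again for the next priority.
        remaining = [task for task in remaining if task not in chosen]
    return batches


def two_at_a_time(remaining: List[str]) -> List[str]:
    """Hypothetical strategy: at most two subtasks fit in one batch."""
    return remaining[:2]


plan = plan_batches(["t1", "t2", "t3", "t4", "t5"], two_at_a_time)
print(plan)
```

The index of each batch in the returned list plays the role of its priority: all subtasks in an earlier batch are executed before any subtask in a later batch.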
In one implementation, the resource information is used to indicate that data processing resources deployed in the first and second participants are insufficient; the matching relationship is a one-to-one matching relationship; the execution plan setting strategy matched with the resource information and the one-to-one matching relationship is a first execution plan setting strategy;
when determining, according to the first execution plan setting strategy, the first task batch whose task execution order belongs to the first priority from the plurality of data processing subtasks, the processing unit is specifically configured to perform the following steps:
selecting a candidate data block from the data blocks deployed by the first participant;
determining a matching data block belonging to the same data block group as the candidate data block in the data blocks deployed by the second participant, and sending the identification of the matching data block to the second participant, so that the second participant selects a target matching data block in the matching data blocks according to the identification of the matching data block;
receiving an identification of a target matching data block sent by a second participant, and determining target candidate data blocks belonging to the same data block group as the target matching data block in the candidate data blocks according to the identification of the target matching data block; a target candidate data block with a matching relationship and a target matching data block form a target data block group;
determining a target data processing subtask corresponding to the determined target data block group as a data processing subtask included in the first task batch; the target data block group includes the task data required to perform the target data processing subtask in the multi-party task.
In one implementation, the first participant and the second participant each include a task execution node in which the data block and the data processing resources are deployed;
the processing unit is configured to, when selecting a candidate data block from the data blocks deployed by the first party, specifically perform the following steps:
for each task execution node included in the first participant, selecting, from the data blocks deployed in that task execution node, a number of unused data blocks that is less than or equal to the number of idle data processing resources of that task execution node;
and determining unused data blocks selected by each task execution node included by the first participant as candidate data blocks.
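The candidate selection on the first participant can be sketched as follows. This is an illustrative sketch only: `select_candidates`, the node names, and the block identifiers are hypothetical, and taking the first unused blocks in list order is an assumption, since the text only bounds how many blocks each node may contribute.

```python
from typing import Dict, List, Set


def select_candidates(
    node_blocks: Dict[str, List[str]],
    idle: Dict[str, int],
    used: Set[str],
) -> List[str]:
    """Pick, per task execution node, at most `idle[node]` unused data blocks."""
    candidates: List[str] = []
    for node, blocks in node_blocks.items():
        unused = [block for block in blocks if block not in used]
        limit = max(0, idle.get(node, 0))  # a node with no idle resource adds nothing
        candidates.extend(unused[:limit])
    return candidates


node_blocks = {"exec1": ["A1", "A2", "A3"], "exec2": ["A4", "A5"]}
idle = {"exec1": 1, "exec2": 2}
candidates = select_candidates(node_blocks, idle, used={"A4"})
print(candidates)
```

Here `exec1` contributes one block because it has one idle resource, while `exec2` contributes only `A5` because `A4` has already been used.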
In one implementation, the first participant and the second participant each include a task execution node in which the data block and the data processing resources are deployed; the process of selecting a target matching data block from the matching data blocks by the second participant according to the identification of the matching data block comprises the following steps:
determining, according to the identification of each matching data block, the target task execution node in the second participant in which that matching data block is deployed;
for each target task execution node, if the number of matching data blocks deployed in the target task execution node is greater than the number of idle data processing resources of the target task execution node, deleting some of the matching data blocks deployed in the target task execution node, so that the number of matching data blocks remaining in the target task execution node after the deletion is less than or equal to the number of idle data processing resources of the target task execution node;
and determining the remaining matching data blocks in each target task execution node as target matching data blocks.
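The trimming performed by the second participant can be sketched in the same spirit. The names `trim_matches`, `block_to_node`, and the identifiers below are hypothetical. The text does not say which matching data blocks are deleted when a node is oversubscribed, so keeping the earliest ones in the received order is an assumption.

```python
from typing import Dict, List


def trim_matches(
    match_ids: List[str],
    block_to_node: Dict[str, str],
    idle: Dict[str, int],
) -> List[str]:
    """Keep, per target node, no more matching blocks than it has idle resources."""
    by_node: Dict[str, List[str]] = {}
    for block in match_ids:
        # Locate the task execution node on which each matching block is deployed.
        by_node.setdefault(block_to_node[block], []).append(block)

    targets: List[str] = []
    for node, blocks in by_node.items():
        limit = max(0, idle.get(node, 0))
        targets.extend(blocks[:limit])  # the surplus blocks are dropped
    return targets


targets = trim_matches(
    ["B1", "B2", "B3"],
    block_to_node={"B1": "h1", "B2": "h1", "B3": "h2"},
    idle={"h1": 1, "h2": 1},
)
print(targets)
```

The identifiers returned here are what the second participant would send back, so that the first participant can keep only the candidate data blocks whose partners survived the trimming.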
In one implementation, the processing device of the multiparty task is arranged in a first participant, wherein the first participant is a task coordination master, and the second participant is a task coordination receiver; the processing unit is further used for executing the following steps:
acquiring a first amount of data processing resources deployed by a first participant;
acquiring a second amount of data processing resources deployed by a second participant;
and if the first quantity is smaller than the second quantity, determining that the first party is a task coordination master, and the second party is the task coordination receiver.
In one implementation, the first participant and the second participant each include a task execution node in which the data block and the data processing resources are deployed; the processing unit is further used for executing the following steps:
if the first number is equal to the second number, acquiring a third number of task execution nodes included in the first participant and acquiring a fourth number of task execution nodes included in the second participant;
if the third number is larger than the fourth number, the first participant is determined to be a task coordination master, and the second participant is determined to be a task coordination receiver.
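The role assignment can be sketched as a two-level comparison. The source states only the conditions under which the first participant becomes the task coordination master (fewer data processing resources, or equal resources and more task execution nodes). Treating the mirrored conditions as making the second participant the master, and defaulting to the first participant on a full tie, are assumptions of this sketch, as are the names `Participant` and `choose_master`.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    resources: int  # number of deployed data processing resources
    nodes: int      # number of task execution nodes


def choose_master(first: Participant, second: Participant) -> Participant:
    """Return the participant acting as task coordination master."""
    if first.resources != second.resources:
        # Stated in the text: fewer resources on the first side makes it the master.
        # Assumption: the mirrored case makes the second participant the master.
        return first if first.resources < second.resources else second
    if first.nodes != second.nodes:
        # Stated in the text: on equal resources, more nodes on the first side wins.
        return first if first.nodes > second.nodes else second
    return first  # assumption: the text does not cover a full tie


guest = Participant("guest", resources=4, nodes=2)
host = Participant("host", resources=6, nodes=2)
print(choose_master(guest, host).name)
```

Whichever participant is not returned acts as the task coordination receiver.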
In one implementation, the resource information is used to indicate that data processing resources deployed in the first and second participants are sufficient; the matching relationship is a one-to-many matching relationship; the execution plan setting strategy matched with the resource information and the one-to-many matching relationship is a second execution plan setting strategy;
when determining, according to the second execution plan setting strategy, the first task batch whose task execution order belongs to the first priority from the plurality of data processing subtasks, the processing unit is specifically configured to perform the following steps:
generating a bipartite graph according to the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant; the bipartite graph comprises a first vertex set, a second vertex set and an edge set; vertices in the first set of vertices are used to represent data blocks deployed in the first participant and vertices in the second set of vertices are used to represent data blocks deployed in the second participant; edges in the edge set are used to represent a matching relationship between data blocks deployed in the first party and data blocks deployed in the second party; the data block group with the matching relation is expressed as a matching vertex group in the bipartite graph;
according to a bipartite graph multiple matching strategy, determining a target matching vertex group from the matching vertex groups included in the bipartite graph;
determining a data block group corresponding to the target matching vertex group as a target data block group;
determining a target data processing subtask corresponding to the determined target data block group as a data processing subtask included in the first task batch; the target data block group includes the task data required to perform the target data processing subtask in the multi-party task.
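The bipartite graph described above can be represented with two vertex sets and an edge set. The sketch below assumes that each data block group is a pair made of one data block from each participant; `build_bipartite` and the block identifiers are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (block in first participant, block in second participant)


def build_bipartite(matches: List[Edge]) -> Tuple[List[str], List[str], Set[Edge]]:
    """Build the first vertex set, the second vertex set, and the edge set."""
    first_vertices = sorted({u for u, _ in matches})
    second_vertices = sorted({v for _, v in matches})
    edges = set(matches)  # one edge per matching relationship
    return first_vertices, second_vertices, edges


# One-to-many example: A1 matches both B1 and B2.
first_vertices, second_vertices, edges = build_bipartite(
    [("A1", "B1"), ("A1", "B2"), ("A2", "B3")]
)
print(first_vertices, second_vertices, len(edges))
```

Under this representation, every edge corresponds to one matching vertex group, and therefore to one data block group.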
In one implementation, when determining the target matching vertex group from the matching vertex groups included in the bipartite graph according to the bipartite graph multiple matching strategy, the processing unit is specifically configured to perform the following steps:
selecting a reference matching vertex group from the matching vertex groups included in the bipartite graph, adding the reference matching vertex group to a matched vertex group set, and taking the matching vertex groups in the bipartite graph other than the reference matching vertex group as remaining matching vertex groups;
if the vertex belonging to the second vertex set in a first remaining matching vertex group does not coincide with any vertex belonging to the second vertex set in the matched vertex group set, adding the first remaining matching vertex group to the matched vertex group set, and continuing to traverse a second remaining matching vertex group among the remaining matching vertex groups until all remaining matching vertex groups have been traversed;
if the vertex belonging to the second vertex set in the first remaining matching vertex group coincides with a vertex belonging to the second vertex set in the matched vertex group set, skipping the first remaining matching vertex group and continuing to traverse the second remaining matching vertex group until all remaining matching vertex groups have been traversed;
and determining the matching vertex groups included in the matched vertex group set as target matching vertex groups.
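The traversal above amounts to a greedy pass over the matching vertex groups. The sketch below assumes each group is a pair `(u, v)` with `u` in the first vertex set and `v` in the second, and it takes the first group in the list as the reference group; the function name `greedy_select` and that choice of reference group are assumptions, not details given in the text.

```python
from typing import List, Set, Tuple

Group = Tuple[str, str]  # (vertex in first set, vertex in second set)


def greedy_select(groups: List[Group]) -> List[Group]:
    """Select groups whose second-set vertex has not been selected before."""
    selected: List[Group] = []
    used_second: Set[str] = set()
    for u, v in groups:
        if v in used_second:
            continue  # second-set vertex already taken: skip this group
        selected.append((u, v))
        used_second.add(v)
    return selected


groups = [("A1", "B1"), ("A1", "B2"), ("A2", "B2"), ("A2", "B3")]
target_groups = greedy_select(groups)
print(target_groups)
```

Only vertices of the second vertex set are checked for overlap, so a first-set vertex such as `A1` may appear in several selected groups, which is consistent with the one-to-many setting.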
In one implementation, the first participant and the second participant each include a task execution node in which the data blocks and the data processing resources are deployed; when scheduling the first participant and the second participant to perform data processing on the data block groups corresponding to the first task batch based on the deployed data processing resources, so as to jointly execute the data processing subtasks in the first task batch, the processing unit is specifically configured to perform the following steps:
determining a target data block group required for executing the data processing subtasks in the first task batch, wherein in the target data block group, a data block deployed in a task execution node of the first participant is a target candidate data block, and a data block deployed in a task execution node of the second participant is a target matching data block;
performing data processing on the target candidate data block based on the data processing resources deployed in the task execution node in which the target candidate data block is deployed, so as to execute, on the first participant, the task part of the data processing subtask corresponding to the target data block group;
and triggering the second participant to perform data processing on the target matching data block based on the data processing resources deployed in the task execution node in which the target matching data block is deployed, so as to execute, on the second participant, the task part of the data processing subtask corresponding to the target data block group.
In one implementation, the acquisition unit is configured to, when acquiring resource information of data processing resources deployed in the first participant and the second participant, specifically perform the following steps:
obtaining a first total number of data processing resources deployed in the first participant and the second participant;
obtaining a second total number of data blocks deployed in the first participant and the second participant;
if the first total number is smaller than the second total number, generating resource information, wherein the resource information is used for indicating that the data processing resources deployed in the first participant and the second participant are insufficient;
if the first total number is greater than or equal to the second total number, generating resource information, wherein the resource information is used for indicating that the data processing resources deployed in the first participant and the second participant are sufficient.
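The sufficiency check reduces to one comparison of two totals. A minimal sketch, in which the function name `resource_info` and the string labels are hypothetical:

```python
def resource_info(
    resources_first: int,
    resources_second: int,
    blocks_first: int,
    blocks_second: int,
) -> str:
    """Compare total data processing resources against total data blocks."""
    total_resources = resources_first + resources_second  # first total number
    total_blocks = blocks_first + blocks_second           # second total number
    return "insufficient" if total_resources < total_blocks else "sufficient"


print(resource_info(2, 2, 3, 3))
print(resource_info(3, 3, 3, 3))
```

With four resources for six data blocks the result is "insufficient"; with six resources for six data blocks it is "sufficient", because equality counts as sufficient.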
Accordingly, embodiments of the present application provide a computer device comprising:
a processor adapted to implement a computer program;
a computer readable storage medium storing a computer program adapted to be loaded by a processor and to perform the above-described method of processing a multi-party task.
Accordingly, embodiments of the present application provide a computer readable storage medium storing a computer program that, when read and executed by a processor of a computer device, causes the computer device to perform the above-described processing method of a multiparty task.
Accordingly, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the method of processing a multi-party task as described above.
In the embodiments of the application, a multi-party task needs to be jointly executed by a first participant and a second participant and may include a plurality of data processing subtasks. A task execution order may be set for the plurality of data processing subtasks according to the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant, together with the resource information of the data processing resources deployed in the first participant and the second participant, where a data block group having the matching relationship may include the task data for executing at least one of the data processing subtasks. The first participant and the second participant can then be scheduled to perform data processing on each data block group, based on the deployed data processing resources and according to the set task execution order, so as to jointly execute the plurality of data processing subtasks. Because the data processing subtasks included in the multi-party task are scheduled and executed in a reasonable order between the first participant and the second participant, the multi-party task can be executed smoothly.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a conceptual diagram of a multiparty task provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an architecture of a multi-party task execution system provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a multi-party task according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another multi-party task provided in an embodiment of the present application that cannot be successfully performed;
FIG. 5 is a flow chart of a method for processing a multiparty task according to an embodiment of the present application;
FIG. 6 is a flow chart of another method for processing a multi-party task according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a bipartite graph corresponding to a one-to-one matching relationship provided by embodiments of the present application;
FIG. 8 is a schematic diagram of task scheduling in a one-to-one matching relationship with insufficient resources according to an embodiment of the present application;
FIG. 9 is a schematic diagram of task scheduling in another resource-deficient one-to-one matching relationship provided by embodiments of the present application;
FIG. 10 is a bipartite graph diagram of a one-to-many matching relationship provided by embodiments of the present application;
FIG. 11 is a schematic diagram of task scheduling in a one-to-many matching relationship with sufficient resources according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a processing device for multiparty tasks provided in an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In order to more clearly understand the technical solutions provided by the embodiments of the present application, key terms related to the embodiments of the present application are described below.
(1) Embodiments of the present application relate to multiparty tasks. As shown in FIG. 1, a multiparty task is a data processing task that requires multiple participants to execute together. The embodiments of the present application do not limit the number of participants involved in executing a multiparty task, and are described using an example in which a multiparty task is executed jointly by two participants (e.g., a first participant and a second participant). Further, multiparty tasks may include data processing tasks executed based on multiparty task execution technologies such as FL (Federated Learning) and MPC (Secure Multi-Party Computation). Federated learning is a machine learning framework that can effectively help a plurality of participants train a machine learning model while complying with data management rules. Secure multi-party computation is a general-purpose cryptographic primitive that enables the participants to cooperatively compute any function without revealing the task data of any participant of the multiparty task; it is the cryptographic basis for applications such as electronic elections and threshold signatures. It is easy to see that the multiparty task involved in the embodiments of the present application may be a data processing task that needs to be jointly executed by a plurality of participants in any application field (for example, the machine learning field or the cryptography field). The embodiments of the present application take a machine learning model training task in the field of artificial intelligence as an example of a multiparty task; because the data of each participant remains invisible to the others in a multiparty task, lossless training of the machine learning model can be implemented using the data of each participant while the data stays isolated and is not revealed.
Artificial intelligence (AI) is the theory, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision making. Machine learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. Pre-trained models are the latest development of deep learning and integrate the above techniques.
Task data required for executing the multiparty task can be distributed among different participants, and each participant needs to participate in executing the multiparty task together because each participant is limited by the data management rules and cannot share the task data among the participants. For example, the machine learning model training task may be jointly performed by the first participant and the second participant, the task data distributed to the first participant may include tag data of the training sample, the task data distributed to the second participant may include feature data of the training sample, and the first participant and the second participant need to jointly participate in performing the machine learning model training task because the task data cannot be shared between the first participant and the second participant. In general, in a multiparty task such as a machine learning model training task, a party having tag data of a training sample may be referred to as a data application party (Guest party), a party providing feature data of the training sample may be referred to as a data provider party (Host party), that is, a first party may be referred to as a data application party (Guest party), and a second party may be referred to as a data provider party (Host party).
(2) Embodiments of the present application relate to a multiparty task execution system. The multiparty task execution system refers to a data processing system formed by all participants of the multiparty task, and all the participants in the system can participate in executing the multiparty task together. The number of participants included in the multiparty task execution system is not limited in the embodiments of the present application; the following description takes a system including two participants (for example, a first participant and a second participant) as an example. As shown in fig. 2, the multiparty task execution system may include a first participant 201 and a second participant 202, where the first participant 201 and the second participant 202 may establish a direct communication connection through wired communication, or an indirect communication connection through wireless communication. The first participant 201 and the second participant 202 may each be a distributed system including a task scheduling node (Driver) and a task execution node (Executor). A distributed system refers to a system formed by connecting multiple dispersed devices through a communication network, with its processing and control functions distributed across the devices; that is, the local tasks of each participant of the multiparty task are executed in a distributed environment. The functions of the task scheduling node and the task execution node are described below:
(1) Task scheduling node (Driver):
Each participant may include one or more task scheduling nodes, and each task scheduling node may be associated with one or more task execution nodes. The multiparty task execution system shown in fig. 2 is described by taking as an example the case where each participant includes one task scheduling node and each task scheduling node is associated with two task execution nodes. In general, a multiparty task may be split into smaller task units, which may be referred to as data processing subtasks; that is, the multiparty task may include a plurality of data processing subtasks, and the participants execute the data processing subtasks jointly. Each task scheduling node may be configured to allocate and coordinate the plurality of data processing subtasks included in the multiparty task on the task execution nodes associated with it. Here, allocation refers to assigning the plurality of data processing subtasks to different task execution nodes for execution, and coordination refers to setting the execution plan according to which each task execution node executes its assigned data processing subtasks (that is, setting the task execution order in which each task execution node executes its assigned data processing subtasks).
(2) Task execution node (Executor):
The task execution node may be configured to execute the data processing subtasks assigned by the task scheduling node.
Task data required for executing the data processing subtasks may be deployed in the task execution nodes in the form of data blocks (which may be referred to as partitions for short). In the multiparty task execution system shown in fig. 2, two data blocks are deployed in each task execution node; the task data required for executing data processing subtasks in the first participant is deployed in the task execution nodes of the first participant in the form of data blocks, and the task data required for executing data processing subtasks in the second participant is deployed in the task execution nodes of the second participant in the form of data blocks.
Further, the data blocks deployed in the first participant and the data blocks deployed in the second participant may have a matching relationship, which may include at least one of the following: a one-to-one matching relationship and a one-to-many matching relationship. A one-to-one matching relationship means that one data block deployed in the first participant matches one data block deployed in the second participant; a one-to-many matching relationship means that one data block deployed in the first participant matches a plurality of data blocks deployed in the second participant. A data block group having a matching relationship may include the task data for executing at least one of the plurality of data processing subtasks. In other words, the data block group corresponds to the at least one data processing subtask, and executing the at least one data processing subtask amounts to performing data processing on the corresponding data block group. Within such a data block group, the first participant performs data processing on the data blocks of the group that are deployed in the first participant, thereby executing the portion of the at least one data processing subtask that falls on the first participant; the second participant performs data processing on the data blocks of the group that are deployed in the second participant, thereby executing the portion that falls on the second participant. In this way, the first participant and the second participant jointly participate in executing the at least one data processing subtask corresponding to the data block group having the matching relationship.
It should be noted that, when the matching relationship is a one-to-one matching relationship, the data block group having the matching relationship corresponds to one data processing subtask; that is, a data block group having a one-to-one matching relationship may include the task data for executing one data processing subtask. When the matching relationship is a one-to-many matching relationship, the data block group corresponds to at least two data processing subtasks; that is, a data block group having a one-to-many matching relationship may include the task data of at least two data processing subtasks. It may be further understood that a one-to-many matching relationship can be split into a plurality of one-to-one matching relationships, each of which corresponds to one data processing subtask, and the number of one-to-one matching relationships obtained by the split equals the number of data processing subtasks corresponding to the one-to-many matching relationship. It should also be noted that, in a multiparty task, the number of task execution nodes included in each participant may differ, while the number of data blocks deployed in each participant may be equal.
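As a non-limiting illustration, the split of a one-to-many matching relationship into one-to-one matching relationships can be sketched as follows. The dictionary representation, the function name `split_into_subtasks` and the block numbers are assumptions made for this sketch only and are not taken from the embodiments.

```python
from typing import Dict, List, Tuple

# Hypothetical representation (an assumption of this sketch):
# block id in the first participant -> ids of matching blocks in the second participant.
Matching = Dict[int, List[int]]


def split_into_subtasks(matching: Matching) -> List[Tuple[int, int]]:
    """Split every (possibly one-to-many) match into one-to-one pairs.

    Each returned (first_block, second_block) pair stands for one
    data processing subtask.
    """
    pairs = []
    for first_block, second_blocks in sorted(matching.items()):
        for second_block in second_blocks:
            pairs.append((first_block, second_block))
    return pairs


# Block 1 of the first participant matches two blocks of the second
# participant (one-to-many); block 2 matches a single block (one-to-one).
matching = {1: [1, 2], 2: [3]}
subtasks = split_into_subtasks(matching)
```

In this sketch the number of pairs produced for a one-to-many entry equals the number of matched blocks in the second participant, which mirrors the statement that the number of one-to-one matching relationships equals the number of corresponding data processing subtasks.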
In addition, it must be ensured that, within a data block group having a matching relationship, the data in the data blocks deployed in the respective participants are aligned. Data alignment may be understood as the task data included in the data blocks deployed in the respective participants being aligned in one or more data dimensions. For example, if the second participant is the data provider and a data block deployed in the second participant includes feature data of 30 objects, and the first participant is the data application party, then the matching data block deployed in the first participant needs to include the tag data of those 30 objects, so that the two data blocks are aligned.
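A minimal sketch of such an alignment check is given below, assuming that alignment is judged on a single dimension, namely the set of object identifiers. The identifiers, the dictionary layout and the function name `is_aligned` are hypothetical and serve only as an illustration.

```python
from typing import Dict, List


def is_aligned(tag_block: Dict[str, int], feature_block: Dict[str, List[float]]) -> bool:
    """Return True when both data blocks cover exactly the same object ids.

    Assumption of this sketch: each block is keyed by object id, and
    alignment is checked on that single dimension only.
    """
    return set(tag_block) == set(feature_block)


# Data block of the data application party (Guest): tag data per object.
guest_block = {"u1": 1, "u2": 0, "u3": 1}
# Matching data block of the data provider (Host): feature data per object.
host_block = {"u1": [0.2, 1.5], "u2": [0.7, 0.1], "u3": [0.9, 3.0]}

aligned = is_aligned(guest_block, host_block)
misaligned = is_aligned(guest_block, {"u1": [0.2, 1.5], "u4": [0.3, 0.3]})
```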
The task execution node may further be provided with data processing resources, which may be used to perform data processing on the data blocks. The data processing resources may include at least one of the following: CPU (Central Processing Unit) resources, memory resources, and network resources required for communication. Each data block needs at least one unit of data processing resource for its data processing; the embodiments of the present application do not limit the number of data processing resources required for performing data processing on one data block.
In the multiparty task execution system, the task scheduling node may be a terminal or a server, and the task execution node may be a terminal or a server. The terminal mentioned in the embodiments of the present application may include, but is not limited to, any one of the following: a smart phone, a tablet computer, a notebook computer, a desktop computer, an intelligent voice interaction device, a smart watch, a vehicle-mounted terminal, an intelligent home appliance, an aircraft, and the like. The server mentioned in the embodiments of the present application may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, which is not limited in the embodiments of the present application.
It can be understood that the multiparty task execution system shown in fig. 2 is for more clearly describing the technical solution of the embodiment of the present application, and does not constitute a limitation on the technical solution provided by the embodiment of the present application, and those skilled in the art can know that, with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided by the embodiment of the present application is equally applicable to similar technical problems.
As described above, each participant included in the multiparty task execution system is a distributed system. Task scheduling policies of a distributed system generally include FIFO (First In First Out), FAIR (fair scheduling) and related variants. FIFO schedules tasks in the order in which they were allocated, so tasks allocated first are executed first; FAIR schedules tasks according to their execution time, ensuring that each task obtains a similar amount of execution time. If the task scheduling node in each participant of the multiparty task considers only the local task coordination of its own participant, without considering how the task scheduling nodes in the other participants coordinate their tasks, the multiparty task cannot be executed successfully. This is because the multiparty task requires all participants to stay synchronized at the level of the data processing subtask. Synchronization here may be understood as follows: for a data processing subtask to be executed, each participant needs to receive, within as short a time as possible, the data processing results produced by the other participants for the data blocks corresponding to that subtask, so that the subtask can be executed successfully; ideally, therefore, all participants process the data blocks corresponding to the same data processing subtask at the same time. If the participants are not synchronized in scheduling the data processing subtasks, the multiparty task cannot run normally. The task scheduling policy of an independent distributed system is therefore not suitable for scheduling multiparty tasks that communicate across distributed systems, and the allocation and coordination of such tasks must be handled by other task scheduling policies.
An example of a data processing subtask scheduling mismatch between two participants (for example, a first participant and a second participant) is as follows. Executing one data processing subtask requires data processing on the data blocks that have a matching relationship in the two participants; for example, when data block k deployed in the first participant and data block k deployed in the second participant have a matching relationship, the two data blocks need to be processed synchronously. Referring to fig. 3, only one data processing resource is deployed on each task execution node in the two participants. When task execution node 1 and task execution node 2 of the first participant are executing the data processing subtasks corresponding to data block 1 and data block 4 respectively, while task execution node 1 and task execution node 2 of the second participant are executing the data processing subtasks corresponding to data block 2 and data block 3 respectively, the data processing subtasks being executed by the two participants do not match. Both parties wait for the other party's data processing result for the corresponding data block, so the data processing subtasks keep waiting until they are terminated by timeout.
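The mismatch described above can be illustrated with a short sketch. It assumes a one-to-one matching relationship in which data block k of the first participant matches data block k of the second participant; the function name `runnable_and_blocked` is hypothetical.

```python
from typing import Set, Tuple


def runnable_and_blocked(
    first_running: Set[int], second_running: Set[int]
) -> Tuple[Set[int], Set[int]]:
    """Classify the blocks currently being processed by the two participants.

    Assumption of this sketch: block k in the first participant matches
    block k in the second participant, so a subtask can make progress only
    when block k is being processed by both participants at the same time.
    """
    runnable = first_running & second_running
    blocked = (first_running | second_running) - runnable
    return runnable, blocked


# Scenario of fig. 3: the first participant processes blocks 1 and 4 while
# the second participant processes blocks 2 and 3, so nothing matches.
runnable, blocked = runnable_and_blocked({1, 4}, {2, 3})

# A synchronized schedule: both participants process blocks 1 and 2.
ok_runnable, ok_blocked = runnable_and_blocked({1, 2}, {1, 2})
```

In the first scenario every running block waits for a counterpart that is not scheduled, which corresponds to the waiting-until-timeout situation; in the second scenario no block is left waiting.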
In order to keep the scheduling and execution of the multiparty task synchronized at the data processing subtask level, the embodiment of the application proposes the following two task scheduling strategies, which are described below:
The first task scheduling strategy is suitable for some algorithms. For example, the first participant independently completes a certain stage task for all data blocks and sends the stage results to the second participant for caching. When processing each data block, the second participant reads the stage result of the corresponding data block sent by the first participant. Because the stage results have already been sent by the first participant and can be read directly, the data processing subtasks corresponding to each data block deployed in the second participant can proceed smoothly.
The second task scheduling strategy ensures that every data block deployed on every task execution node obtains a corresponding data processing resource, so that the data processing subtasks corresponding to all data block groups having a matching relationship can be performed simultaneously and the data blocks deployed in the participants match naturally. A simple way to achieve this is for each participant to have the same number of task execution nodes, with only one data block and one data processing resource deployed on each task execution node. In this case, the data blocks of every data block group having a matching relationship can be processed in each participant at the same time, which ensures that the participants execute the data processing subtasks synchronously.
However, the task scheduling strategies described above are not applicable to all multiparty tasks. The first task scheduling strategy relies on a special design at the algorithm level and is not universal; when the participants of the multiparty task need to communicate frequently and many times, this approach is severely limited and the complexity of the algorithm implementation becomes unacceptable. The second task scheduling strategy needs to allocate corresponding data processing resources for all data blocks deployed in each participant, and requires that the matching relationships between the data blocks deployed in the participants be one-to-one matching relationships. When the matching relationship between the data blocks deployed in the participants needs to be a one-to-many matching relationship, or when the data processing resources that can be allocated to the data blocks deployed in the participants are limited, the multiparty task cannot run normally, as illustrated by the cases shown in fig. 3 and fig. 4:
As shown in fig. 3, when the data processing resources are limited, the multiparty task cannot be executed smoothly if the data blocks scheduled for data processing in the two participants do not match each other, even though the matching relationship between the data blocks deployed in the participants is a one-to-one matching relationship. Specifically, each task execution node in the first participant and the second participant shown in fig. 3 deploys only one data processing resource and two data blocks, so each task execution node can process only one data block at a time; the data blocks indicated by dashed boxes in fig. 3 are those that currently have no data processing resource allocated and are temporarily not being processed. Under the second task scheduling strategy, a scenario may arise in which the first participant is processing data block 1 and data block 4 while the second participant is processing data block 2 and data block 3. Each party then waits for the other party's data processing result for the corresponding data block, and since the wait cannot continue indefinitely it ends in a timeout, so the multiparty task cannot be executed successfully.
As shown in fig. 4, when the data processing resources are sufficient but the matching relationship between the data blocks deployed in the participants of the multiparty task is a one-to-many matching relationship, the data blocks compete for resources and the multiparty task cannot be executed smoothly. Specifically, each task execution node in the first participant and the second participant shown in fig. 4 deploys two data processing resources and two data blocks, so every data block deployed in a task execution node can be processed. However, each data block has matching relationships with other data blocks, and these matching relationships belong to different data processing subtasks. For example, data block 1 deployed in the first participant and data block 1 deployed in the second participant have a matching relationship, and data block 3 deployed in the first participant has a matching relationship with data block 1 and data block 3 deployed in the second participant. The data processing subtask corresponding to the data block group consisting of data block 1 deployed in the first participant and data block 1 and data block 3 deployed in the second participant (which may be referred to as, for example, the first data processing subtask) is different from the data processing subtask corresponding to the data block group consisting of data block 3 deployed in the first participant and data block 1 and data block 3 deployed in the second participant (which may be referred to as, for example, the second data processing subtask).
If the first participant performs data processing on data block 1 at this time in order to execute the first data processing subtask, but data block 3 of the second participant is participating in the second data processing subtask corresponding to data block 3 in the first participant, the data processing resource is occupied and the first data processing subtask corresponding to data block 1 in the first participant is blocked.
Based on this, the embodiment of the application provides a method for processing a multiparty task, which introduces a new task scheduling policy. The new policy takes into account both the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant and whether the data processing resources deployed in the first participant and the second participant are sufficient, and sets a task execution order for the plurality of data processing subtasks included in the multiparty task, so that the first participant and the second participant can be scheduled to execute the plurality of data processing subtasks in the set order. The task execution order of the plurality of data processing subtasks may specifically refer to the following: the plurality of data processing subtasks are divided into different task batches, and the task execution orders of data processing subtasks belonging to different task batches have different priorities; the data processing subtasks in a task batch with a higher priority are executed first, and those in a task batch with a lower priority are executed later; and the data processing subtasks belonging to the same task batch can be scheduled for joint parallel execution by the first participant and the second participant. Because the data processing subtasks within one task batch can be executed in parallel, the first participant and the second participant can be scheduled to execute the task batches one after another according to their priorities, thereby ensuring that the multiparty task is executed smoothly.
The embodiment of the application provides a processing method of a multiparty task, which mainly describes how to schedule and execute the plurality of data processing subtasks included in the multiparty task and how to determine whether the deployed data processing resources are sufficient. The processing method may be performed by a computer device, which may be, for example, the first participant in the multiparty task execution system described above, where the first participant is the data application party (Guest party) and the second participant is the data provider (Host party). As shown in fig. 5, the processing method of the multiparty task may include, but is not limited to, the following steps S501 to S504:
S501, parsing the multiparty task to be executed to obtain a matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant; one data block group having a matching relationship includes the task data required for executing at least one data processing subtask of the multiparty task.
The multiparty task to be executed is parsed to obtain the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant. As described previously, this matching relationship may include at least one of the following: a one-to-one matching relationship and a one-to-many matching relationship. One data block group having a matching relationship may include the task data required for executing at least one data processing subtask of the multiparty task. In other words, the data block group corresponds to the data processing subtask, and performing data processing on the data block group is the process of executing the data processing subtask corresponding to that group. The data processing of a data block group may include: the first participant performing data processing on the data blocks of the group that are deployed in the first participant, which executes the portion of the corresponding data processing subtask that falls on the first participant; and the second participant performing data processing on the data blocks of the group that are deployed in the second participant, which executes the portion of the corresponding data processing subtask that falls on the second participant.
S502, acquiring resource information of data processing resources deployed in the first participant and the second participant.
The resource information of the data processing resources deployed in the first participant and the second participant may be used to indicate whether those data processing resources are sufficient (for example, resources sufficient or resources insufficient). The resource information may be obtained by comparing the number of data processing resources deployed in the first participant and the second participant with the number of data blocks deployed in the first participant and the second participant. Specifically, a first total number of the data processing resources deployed in the first participant and the second participant may be obtained, and a second total number of the data blocks deployed in the first participant and the second participant may be obtained. If the first total number is less than the second total number, resource information indicating that the data processing resources deployed in the first participant and the second participant are insufficient may be generated; if the first total number is greater than or equal to the second total number, resource information indicating that the data processing resources deployed in the first participant and the second participant are sufficient may be generated.
For example, if the first total number of data processing resources deployed in the first participant and the second participant is 20 and the second total number of data blocks deployed in the first participant and the second participant is 30, the first total number is less than the second total number, and resource information indicating that the deployed data processing resources are insufficient may be generated. In this case, some data blocks have no data processing resource during data processing and can be processed only after data processing resources are released. As another example, if the first total number is 30 and the second total number is 30, the first total number equals the second total number, and resource information indicating that the deployed data processing resources are sufficient may be generated. In this case, a data processing resource is available for every data block, and all data blocks can be processed.
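The comparison in step S502 can be sketched as follows. The enumeration `ResourceInfo` and the function name `resource_info` are hypothetical names chosen for this sketch; only the comparison rule itself is taken from the description above.

```python
from enum import Enum


class ResourceInfo(Enum):
    """Hypothetical encoding of the resource information."""
    SUFFICIENT = "sufficient"
    INSUFFICIENT = "insufficient"


def resource_info(first_total_resources: int, second_total_blocks: int) -> ResourceInfo:
    """Compare the total number of deployed data processing resources
    (first total number) with the total number of deployed data blocks
    (second total number)."""
    if first_total_resources < second_total_blocks:
        return ResourceInfo.INSUFFICIENT
    return ResourceInfo.SUFFICIENT


# The two numerical examples given above.
case_insufficient = resource_info(20, 30)
case_sufficient = resource_info(30, 30)
```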
S503, setting an execution plan for the plurality of data processing subtasks based on the matching relation and the resource information, wherein the execution plan is used for indicating the task execution sequence of the plurality of data processing subtasks.
After the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant is obtained by parsing, and the resource information of the data processing resources deployed in the first participant and the second participant is acquired, an execution plan may be set for the plurality of data processing subtasks based on the matching relationship and the resource information, the execution plan being used to indicate the task execution order of the plurality of data processing subtasks. The task execution order may specifically refer to the following: the data processing subtasks included in the multiparty task are divided into different task batches; the task execution orders of data processing subtasks in the same task batch have the same priority, and those in different task batches have different priorities; a task batch with a higher priority is executed first, and a task batch with a lower priority is executed later. Moreover, the data processing subtasks divided into one task batch are one or more data processing subtasks, among the not-yet-executed data processing subtasks of the multiparty task, that the first participant and the second participant support executing jointly in parallel.
That the first participant and the second participant support jointly executing the data processing subtasks of one task batch in parallel specifically means that these subtasks can be executed in parallel in the first participant and the second participant, and that the two participants execute them jointly. Further, since the task data required for executing the data processing subtasks is deployed in the first participant and the second participant in the form of data blocks, jointly executing the data processing subtasks of one task batch specifically means the following: the first participant performs data processing on the data blocks, deployed in the first participant, that are required for executing the subtasks of the batch, thereby executing the portions of those subtasks that fall on the first participant; and the second participant performs data processing on the data blocks, deployed in the second participant, that are required for executing the subtasks of the batch, thereby executing the portions of those subtasks that fall on the second participant.
Specifically, the process of setting an execution plan for the plurality of data processing subtasks based on the matching relationship and the resource information may include the following. An execution plan setting policy that matches both the resource information and the matching relationship is determined. According to the determined execution plan setting policy, a first task batch whose task execution order belongs to a first priority is determined among the plurality of data processing subtasks; the first task batch includes one or more of the plurality of data processing subtasks that the first participant and the second participant support executing jointly in parallel. If, apart from the data processing subtasks included in the first task batch, there are remaining data processing subtasks, a second task batch whose task execution order belongs to a second priority is determined from the remaining data processing subtasks according to the determined execution plan setting policy; the second task batch includes one or more of the remaining data processing subtasks that the first participant and the second participant support executing jointly in parallel. If further data processing subtasks remain apart from those included in the second task batch, task batches whose task execution orders belong to subsequent priorities continue to be determined, until the priority of the task execution order of every data processing subtask included in the multiparty task has been determined. The task execution order belonging to the first priority precedes the task execution order belonging to the second priority.
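The iterative determination of task batches can be sketched as follows. The execution plan setting policy itself is not specified in this passage, so the greedy rule used here is purely an assumption for illustration: within one batch no data block is used twice, and neither participant processes more blocks than it has data processing resources. The names `plan_batches` and `slots_per_party`, and the modelling of a subtask as a one-to-one block pair, are likewise hypothetical.

```python
from typing import List, Tuple

# A subtask is modelled as a one-to-one pair:
# (block in the first participant, block in the second participant).
Subtask = Tuple[int, int]


def plan_batches(subtasks: List[Subtask], slots_per_party: int) -> List[List[Subtask]]:
    """Greedily group subtasks into batches, highest priority first.

    Illustrative policy only (an assumption, not the policy of the
    embodiments): a batch never uses the same block twice and never needs
    more than slots_per_party blocks processed per participant.
    """
    if slots_per_party < 1:
        raise ValueError("each participant needs at least one data processing resource")
    remaining = list(subtasks)
    batches: List[List[Subtask]] = []
    while remaining:
        batch: List[Subtask] = []
        used_first, used_second = set(), set()
        leftover: List[Subtask] = []
        for first_block, second_block in remaining:
            fits = (
                len(batch) < slots_per_party
                and first_block not in used_first
                and second_block not in used_second
            )
            if fits:
                batch.append((first_block, second_block))
                used_first.add(first_block)
                used_second.add(second_block)
            else:
                leftover.append((first_block, second_block))
        batches.append(batch)   # position in the list = priority of the batch
        remaining = leftover    # undetermined subtasks go to later batches
    return batches


# Blocks 1 and 3 of the first participant each match blocks 1 and 3 of the
# second participant, split into four one-to-one subtasks.
plan = plan_batches([(1, 1), (1, 3), (3, 1), (3, 3)], slots_per_party=2)
```

With two resources per participant, this sketch places subtasks (1, 1) and (3, 3) in the first batch and (1, 3) and (3, 1) in the second, so that subtasks sharing a data block never run in the same batch.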
And S504, scheduling the first party and the second party to perform data processing on each data block group based on the deployed data processing resources according to the task execution sequence indicated by the execution plan, and jointly executing a plurality of data processing subtasks.
After setting an execution plan for the plurality of data processing subtasks based on the matching relationship and the resource information, the execution plan is used for indicating the task execution sequence of the plurality of data processing subtasks, the first participant and the second participant can be scheduled to perform data processing on each data block group based on the deployed data processing resources according to the task execution sequence indicated by the execution plan, and the plurality of data processing subtasks are jointly executed. As described above, the task execution order of the plurality of data processing subtasks may specifically refer to: dividing the data processing subtasks included in the multiparty tasks into different task batches, wherein different priorities exist among task execution sequences of the different task batches; therefore, according to the task execution sequence indicated by the execution plan, the first party and the second party are scheduled to perform data processing on each data block group based on the deployed data processing resources, and a plurality of data processing subtasks are jointly executed, which may specifically be: according to the priorities of task execution sequences of the task batches, the first participator and the second participator are dispatched to perform data processing on data block groups required for executing the task batches based on deployed data processing resources, data processing subtasks in the task batches are jointly executed, task batches with high priorities of task execution are executed preferentially, and task batches with low priorities of task execution sequences are executed later.
In the embodiment of the application, a plurality of data processing subtasks included in the multiparty task can be reasonably scheduled and executed between the first participant and the second participant, and after the task execution sequence is set for the plurality of data processing subtasks based on the matching relation and the resource information, each data processing subtask can be jointly and orderly executed by the first participant and the second participant in sequence according to the task execution sequence, so that the multiparty task can be ensured to be jointly and smoothly executed by the first participant and the second participant.
The embodiment of the application provides a processing method of a multiparty task, which mainly introduces how the plurality of data processing subtasks included in the multiparty task are scheduled and executed under different combinations of resource information and matching relationship. The processing method of the multiparty task may be performed by a computer device, which may be, for example, the first participant in the multiparty task performance system described above, the first participant being a data application party (Guest party) and the second participant being a data provider party (Host party). As shown in fig. 6, the processing method of the multiparty task may include, but is not limited to, the following steps S601 to S607:
S601, analyzing the multiparty task to be executed to obtain a matching relationship between the data block deployed in the first party and the data block deployed in the second party; one set of data blocks having a matching relationship includes task data required to perform at least one data processing sub-task of the multi-party task.
In this embodiment, the execution process of step S601 is the same as the execution process of step S501 in the embodiment shown in fig. 5, and the specific execution process can be referred to the specific description of step S501 in the embodiment shown in fig. 5, which is not repeated here.
S602, acquiring resource information of data processing resources deployed in the first participant and the second participant.
In this embodiment, the execution process of step S602 is the same as the execution process of step S502 in the embodiment shown in fig. 5, and the specific execution process can be referred to the specific description of step S502 in the embodiment shown in fig. 5, which is not repeated here.
S603, determining an execution plan setting strategy matched with the resource information and the matching relation together.
The execution plan setting policy that is matched together with the resource information and the matching relationship may include a first execution plan setting policy or a second execution plan setting policy. Wherein the first execution plan setting policy may be matched together with resource information for indicating insufficient data processing resources deployed in the first and second participants, and a one-to-one matching relationship; that is, the first execution plan setting policy is a task scheduling policy in the first case including: the data processing resources deployed in the first party and the second party are insufficient, and a matching relationship is provided between one data block deployed in the first party and one data block deployed in the second party. The second execution plan setting policy may be matched together with resource information indicating that the data processing resources deployed in the first and second participants are sufficient, and a one-to-many matching relationship; that is, the second execution plan setting policy is a task scheduling policy in the second case, the second case including: the data processing resources deployed in the first and second participants are sufficient, and there is a matching relationship between one data chunk deployed in the first participant and a plurality of data chunks deployed in the second participant.
S604, determining a first task batch of which the task execution sequence belongs to a first priority from a plurality of data processing subtasks according to the determined execution plan setting strategy; the first task batch includes one or more data processing sub-tasks of the plurality of data processing sub-tasks that the first participant and the second participant are capable of executing in parallel in common.
After the execution plan setting policy that is jointly matched with the resource information and the matching relationship is determined, a first task batch whose task execution order belongs to a first priority may be determined among the plurality of data processing subtasks according to the determined execution plan setting policy; the first task batch may include one or more data processing subtasks, among the plurality of data processing subtasks, that the first participant and the second participant are capable of jointly executing in parallel. The following describes the process of determining the first task batch whose task execution order belongs to the first priority among the plurality of data processing subtasks under the first execution plan setting policy and under the second execution plan setting policy, respectively:
(1) The first execution plan setting policy (corresponding to the case of insufficient data processing resources and a one-to-one matching relationship):
Before describing the process of determining, according to the first execution plan setting policy, the first task batch whose task execution order belongs to the first priority among the plurality of data processing subtasks, the concept of a bipartite graph is introduced: the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant can be converted into a bipartite graph. Specifically, the bipartite graph may include a first vertex set, a second vertex set, and an edge set. Vertices in the first vertex set represent data blocks deployed in the first participant, vertices in the second vertex set represent data blocks deployed in the second participant, and edges in the edge set represent the matching relationships between data blocks deployed in the first participant and data blocks deployed in the second participant.
From the perspective of bipartite graph matching, as shown in fig. 7, the one-to-one matching relationship forms a perfect matching in the bipartite graph sense, and when data processing resources are sufficient, the data processing subtasks corresponding to all data block groups with a matching relationship can be executed. However, if the data processing resources are insufficient, the number of edges that can be used at the same time in the bipartite graph matching is limited, and it cannot be ensured that all data processing subtasks corresponding to the data block groups with a matching relationship can be executed at the same time. Therefore, in the case of insufficient data processing resources and a one-to-one matching relationship, determining the task execution order of the data processing subtasks corresponding to each data block group requires the following: taking the insufficiency of data processing resources into account, select, among the plurality of data processing subtasks included in the multiparty task, a first part of the data processing subtasks that can be jointly executed in parallel, and take this first part as the first task batch whose task execution order belongs to the first priority; then, again taking the insufficiency of data processing resources into account, select, among the remaining data processing subtasks, a second part that can be jointly executed in parallel, and take this second part as the second task batch whose task execution order belongs to the second priority; and so on, until the priority to which the task execution order of every data processing subtask included in the multiparty task belongs has been determined.
In more detail, the process of determining, according to the first execution plan setting policy, the first task batch whose task execution order belongs to the first priority among the plurality of data processing subtasks may specifically include the following sub-steps s11 to s14:
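The conversion of the matching relationship into a bipartite graph can be illustrated with a minimal Python sketch. The class name `BipartiteGraph`, the function name `build_bipartite_graph`, and the representation of data blocks by string identifiers are assumptions made for illustration; the two adjacency maps are also an added convenience for edge lookup and are not prescribed by the embodiment.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class BipartiteGraph:
    """Hypothetical container: two vertex sets plus an edge set."""
    first_vertices: set = field(default_factory=set)    # blocks in the first participant
    second_vertices: set = field(default_factory=set)   # blocks in the second participant
    edges: set = field(default_factory=set)              # (first_block, second_block) pairs
    first_to_second: dict = field(default_factory=dict)  # adjacency, first -> second
    second_to_first: dict = field(default_factory=dict)  # adjacency, second -> first


def build_bipartite_graph(matches) -> BipartiteGraph:
    """Build the graph from (first_block_id, second_block_id) matching pairs."""
    graph = BipartiteGraph()
    for first_block, second_block in matches:
        graph.first_vertices.add(first_block)
        graph.second_vertices.add(second_block)
        graph.edges.add((first_block, second_block))
        graph.first_to_second.setdefault(first_block, set()).add(second_block)
        graph.second_to_first.setdefault(second_block, set()).add(first_block)
    return graph


graph = build_bipartite_graph([("g1", "h1"), ("g2", "h2")])
```

Each edge of the resulting graph corresponds to one data block group, that is, to the task data of at least one data processing subtask.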
s11, selecting a candidate data block from the data blocks deployed by the first participant.
As described above, the data blocks may be deployed in the task execution node of the first participant, so that in order to ensure that the selected data processing subtasks can be successfully executed, the data blocks with the number smaller than or equal to the number of data processing resources deployed in the first participant may be selected in the first participant as candidate data blocks, so that the candidate data blocks may be allocated with data processing resources, and the data processing subtasks corresponding to the data block group to which the candidate data blocks belong may be successfully executed in parallel for the first participant. Specifically, the process of selecting a candidate data block from the data blocks deployed by the first participant may specifically include: selecting unused data blocks (i.e., data blocks not subjected to data processing) of which the number is smaller than or equal to the number of idle data processing resources (i.e., data processing resources not used for data processing on the data blocks) of each task execution node in the data blocks deployed by each task execution node for each task execution node included in the first participant; and determining unused data blocks selected by each task execution node included by the first participant as candidate data blocks.
Further, each task execution node may select, according to a data block selection rule, an unused data block having a number smaller than or equal to the number of idle data processing resources deployed by itself among the unused data blocks deployed by itself. For example, the data block selection rules may include a data block identification selection rule, where the data block identification selection rule refers to: each task execution node sorts the identifications of the unused data blocks deployed by itself, and sequentially selects the unused data blocks with the number smaller than or equal to the number of idle data processing resources deployed by itself according to the arrangement sequence of the identifications of the unused data blocks. For another example, the data block selection rule may include a data block random selection rule, where the data block random selection rule refers to: each task execution node may randomly select unused data blocks from among the unused data blocks deployed by itself, the number of unused data blocks being less than or equal to the number of free data processing resources deployed by itself.
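As a minimal sketch of sub-step s11 under the data block identification selection rule, the following Python function picks candidate data blocks per task execution node. The function name `select_candidates` and the dictionary layout (`unused_blocks`, `idle_resources`) are hypothetical and only serve the illustration.

```python
from __future__ import annotations


def select_candidates(nodes: dict[str, dict]) -> list[str]:
    """Select candidate data blocks in the first participant.

    `nodes` maps a task execution node ID to a dict holding
    "unused_blocks" (list of block IDs) and "idle_resources" (int).
    Both key names are assumptions for this sketch.
    """
    candidates: list[str] = []
    for node in nodes.values():
        # Identification rule: sort the unused block IDs, then take at most
        # as many blocks as the node has idle data processing resources.
        limit = max(0, node["idle_resources"])
        candidates.extend(sorted(node["unused_blocks"])[:limit])
    return candidates


first_participant_nodes = {
    "n1": {"unused_blocks": ["b3", "b1", "b2"], "idle_resources": 2},
    "n2": {"unused_blocks": ["b4"], "idle_resources": 3},
}
candidates = select_candidates(first_participant_nodes)
```

The random selection rule could be sketched the same way by replacing the sort with `random.sample` over the unused blocks, with the sample size capped at the number of unused blocks.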
s12, determining, among the data blocks deployed by the second participant, the matching data blocks that belong to the same data block groups as the candidate data blocks, and sending the identifications of the matching data blocks to the second participant, so that the second participant selects target matching data blocks among the matching data blocks according to the identifications of the matching data blocks.
From the process by which the first participant selects candidate data blocks among the data blocks it deploys, it can be seen that the data processing subtasks corresponding to the data block groups to which the selected candidate data blocks belong can be smoothly executed in parallel by the first participant, but not necessarily by the second participant. This is because, among the data blocks deployed by the second participant, some matching data blocks that belong to the same data block groups as the candidate data blocks may have no data processing resource available for processing them. Therefore, the second participant needs to delete, from the matching data blocks, those that cannot be processed, so that the data processing subtasks corresponding to the data block groups to which the target matching data blocks remaining after the deletion belong can be smoothly executed by both the first participant and the second participant.
Specifically, the first participant can determine, among the data blocks deployed by the second participant, the matching data blocks that belong to the same data block groups as the candidate data blocks. This determination may be based on the bipartite graph: for any candidate data block, the first participant may determine the vertex corresponding to the candidate data block in the bipartite graph, then determine the other vertex connected by an edge to that vertex, and determine the data block corresponding to the other vertex as the matching data block belonging to the same data block group as the candidate data block. Using the bipartite graph accelerates the process of determining the matching data blocks.
The first party may then send the identification of the matching data block to the second party to cause the second party to select a target matching data block among the matching data blocks based on the identification of the matching data block. The process of selecting the target matching data block from the matching data blocks by the second participant according to the identification of the matching data block may specifically include:
determining, according to the identifications of the matching data blocks, the target task execution node in the second participant on which each matching data block is deployed; for each target task execution node, if the number of matching data blocks deployed in the target task execution node is greater than the number of idle data processing resources of the target task execution node, deleting some of the matching data blocks deployed in the target task execution node (for example, matching data blocks may be deleted randomly, or deleted according to the order of their identifications), so that the number of matching data blocks remaining in the target task execution node after the deletion is less than or equal to the number of idle data processing resources of the target task execution node; that is, the second participant may delete the matching data blocks that cannot be accommodated, so that all matching data blocks remaining in the target task execution node can be processed; and determining the matching data blocks remaining in each target task execution node as target matching data blocks, so that every target matching data block can be allocated a data processing resource and can be processed by the second participant.
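The deletion performed by the second participant can be sketched as follows, assuming deletion by identification order. The function name `select_target_matches` and the two lookup tables `block_to_node` and `idle_resources` are hypothetical names introduced for this illustration.

```python
from __future__ import annotations


def select_target_matches(
    matching_ids: list[str],
    block_to_node: dict[str, str],
    idle_resources: dict[str, int],
) -> list[str]:
    """Keep, per target task execution node, only as many matching data
    blocks as that node has idle data processing resources."""
    # Group the received matching block IDs by the node that deploys them.
    per_node: dict[str, list[str]] = {}
    for block_id in matching_ids:
        per_node.setdefault(block_to_node[block_id], []).append(block_id)

    targets: list[str] = []
    for node_id, blocks in per_node.items():
        capacity = max(0, idle_resources.get(node_id, 0))
        # Blocks beyond the node's capacity are dropped (deleted by ID order).
        targets.extend(sorted(blocks)[:capacity])
    return targets


targets = select_target_matches(
    ["h3", "h1", "h2", "h4"],
    {"h1": "m1", "h2": "m1", "h3": "m1", "h4": "m2"},
    {"m1": 2, "m2": 1},
)
```

In this example node `m1` receives three matching data blocks but has only two idle resources, so one block is dropped and left for a later task batch.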
s13, receiving the identification of the target matching data block sent by the second party, and determining target candidate data blocks belonging to the same data block group as the target matching data block in the candidate data blocks according to the identification of the target matching data block; a target candidate data block and a target matching data block with a matching relationship form a target data block group.
The process of determining the target candidate data block belonging to the same data block group as the target matching data block in the candidate data blocks according to the identification of the target matching data block may also be performed based on a bipartite graph, which may accelerate the determination process. Therefore, the target candidate data blocks included in the target data block group can be smoothly subjected to data processing in the first participant, the target matching data blocks included in the target data block group can also be smoothly subjected to data processing in the second participant, and the data processing subtasks corresponding to the target data block group can be jointly and parallelly executed by the first participant and the second participant.
s14, determining the target data processing subtasks corresponding to the determined target data block groups as the data processing subtasks included in the first task batch; a target data block group includes the task data required to perform a target data processing subtask in the multiparty task.
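Sub-steps s13 and s14 can be sketched together for the one-to-one case: the first participant keeps only the candidate data blocks whose matching data block survived on the second participant and maps each resulting target data block group to its subtask. The names `build_first_batch`, `first_to_second`, and `subtask_of_group` are assumptions made for this sketch.

```python
from __future__ import annotations


def build_first_batch(
    candidates: list[str],
    target_matches: list[str],
    first_to_second: dict[str, str],
    subtask_of_group: dict[tuple[str, str], str],
) -> list[str]:
    """Return the subtasks of the target data block groups.

    `first_to_second` is the one-to-one matching (first-participant block
    ID -> second-participant block ID); `subtask_of_group` maps a data
    block group to the subtask it carries data for.
    """
    surviving = set(target_matches)
    batch: list[str] = []
    for first_block in candidates:
        second_block = first_to_second[first_block]
        if second_block in surviving:
            # Both sides can process their block, so the group is a target
            # data block group and its subtask joins the batch.
            batch.append(subtask_of_group[(first_block, second_block)])
    return batch


batch = build_first_batch(
    ["g1", "g2", "g3"],
    ["h1", "h3"],
    {"g1": "h1", "g2": "h2", "g3": "h3"},
    {("g1", "h1"): "t1", ("g2", "h2"): "t2", ("g3", "h3"): "t3"},
)
```

Here the subtask for the group `("g2", "h2")` is left out because `h2` was not selected by the second participant; it remains among the subtasks to be placed in a later task batch.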
It should be noted that, the steps s 11-s 14 are required to be executed under the condition that the first participant is the task coordination master and the second participant is the task coordination receiver; the method for determining that the first party is a task coordination master and the second party is a task coordination receiver specifically may include:
When the number of data processing resources deployed in the first participant is different from the number of data processing resources deployed in the second participant, if the number of data processing resources deployed in the first participant is smaller than the number of data processing resources deployed in the second participant, it may be determined that the first participant is the task coordination master and the second participant is the task coordination receiver. Specifically, a first number of data processing resources deployed by the first participant and a second number of data processing resources deployed by the second participant may be obtained; if the first number is smaller than the second number, the first participant can be determined to be the task coordination master and the second participant the task coordination receiver. The participant with fewer data processing resources is taken as the task coordination master because it can process fewer data blocks concurrently and therefore limits the maximum number of data processing subtasks that can be executed in parallel in the multiparty task: that maximum equals the number of data processing resources of this participant. The number of data processing resources of this participant is thus the key limit on the number of data processing subtasks that can be executed in parallel.
When the number of data processing resources deployed in the first participant is the same as the number of data processing resources deployed in the second participant, if the number of task execution nodes included in the first participant is greater than the number of task execution nodes included in the second participant, it may be determined that the first participant is the task coordination master and the second participant is the task coordination receiver. Specifically, a first number of data processing resources deployed by the first participant and a second number of data processing resources deployed by the second participant may be obtained; if the first number is equal to the second number, a third number of task execution nodes included in the first participant and a fourth number of task execution nodes included in the second participant may be obtained; if the third number is greater than the fourth number, it may be determined that the first participant is the task coordination master and the second participant is the task coordination receiver. When the two participants deploy the same number of data processing resources, the participant with more task execution nodes is selected as the task coordination master because, for the same total, the more task execution nodes there are, the fewer data processing resources are deployed on each task execution node, and the easier it is for each task execution node to select candidate data blocks. This improves the efficiency of selecting candidate data blocks, which can improve, to a certain extent, the scheduling efficiency of the data processing subtasks in the multiparty task and thus the execution efficiency of the multiparty task.
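The two rules for choosing the task coordination master can be condensed into a short Python sketch. The function name `choose_master` is hypothetical. The text only states when the first participant is the master; treating the mirrored conditions as making the second participant the master, and returning `None` when both counts are equal, are assumptions of this sketch.

```python
from __future__ import annotations

from typing import Optional


def choose_master(
    first_resources: int,
    second_resources: int,
    first_nodes: int,
    second_nodes: int,
) -> Optional[str]:
    """Return "first" or "second" for the task coordination master,
    or None when the rules described above do not decide."""
    if first_resources != second_resources:
        # The participant with fewer data processing resources bounds the
        # achievable parallelism, so it coordinates.
        return "first" if first_resources < second_resources else "second"
    if first_nodes != second_nodes:
        # Same resource total: prefer the participant with more task
        # execution nodes (fewer resources per node).
        return "first" if first_nodes > second_nodes else "second"
    return None  # Tie on both counts: not covered by the description.


master = choose_master(first_resources=4, second_resources=8,
                       first_nodes=2, second_nodes=2)
```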
(2) The second execution plan setting policy (corresponding to the case of sufficient data processing resources and a one-to-many matching relationship):
In the case of sufficient data processing resources, all data blocks can be processed, and in principle all data processing subtasks can be executed. However, because the data blocks deployed in the two participants have a one-to-many matching relationship, a data block deployed in the second participant may bear a plurality of data processing subtasks. In general, when the processing of a data block is used to execute a certain data processing subtask, the data processing resources allocated to that data block are all applied to that subtask, and the other data processing subtasks corresponding to the data block cannot be executed at the same time. Therefore, in the case of sufficient data processing resources and a one-to-many matching relationship, determining the execution order of the data processing subtasks corresponding to the respective data block groups requires the following: on the condition that no data processing resources are preempted among the data processing subtasks, select, among the plurality of data processing subtasks included in the multiparty task, a first part of the data processing subtasks that can be executed in parallel, and take this first part as the first task batch whose task execution order belongs to the first priority; then select, among the remaining data processing subtasks, a second part that can be executed in parallel, and take this second part as the second task batch whose task execution order belongs to the second priority; and so on, until the priority to which the task execution order of every data processing subtask included in the multiparty task belongs has been determined.
In more detail, the process of determining, according to the second execution plan setting policy, the first task batch whose task execution order belongs to the first priority among the plurality of data processing subtasks may specifically include:
First, a bipartite graph is generated according to the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant; a data block group having the matching relationship may be represented as a matching vertex group in the bipartite graph. Second, target matching vertex groups are determined among the matching vertex groups included in the bipartite graph according to a bipartite graph multiple matching strategy. Then, the data block groups corresponding to the target matching vertex groups can be determined as target data block groups, and the target data processing subtasks corresponding to the determined target data block groups are determined as the data processing subtasks included in the first task batch; a target data block group includes the task data required to perform a target data processing subtask in the multiparty task.
The process of determining the target matching vertex group in each matching vertex group included in the bipartite graph according to the bipartite graph multiple matching strategy may include: selecting a reference matched vertex group from all matched vertex groups included in the bipartite graph, adding the reference matched vertex group into a matched vertex group set, and taking the matched vertex groups except the reference matched vertex group in the bipartite graph as residual matched vertex groups; if the vertex belonging to the second vertex set in the first residual matched vertex set in the residual matched vertex sets does not coincide with the vertex belonging to the second vertex set in the matched vertex set, the first residual matched vertex set can be added into the matched vertex set, and the second residual matched vertex set in the residual matched vertex sets is continuously traversed until all residual matched vertex sets are traversed; if the vertex belonging to the second vertex set in the first residual matched vertex set in the residual matched vertex sets is overlapped with the vertex belonging to the second vertex set in the matched vertex set, continuing to traverse the second residual matched vertex set in the residual matched vertex sets until all residual matched vertex sets are traversed; then, the matching vertex groups included in the matching vertex group set may be determined as target matching vertex groups. 
In the process of determining the target matching vertex group in each matching vertex group included in the bipartite graph based on the bipartite graph multiple matching strategy, it is required to ensure that no coincidence exists between vertices belonging to the second vertex set in each matching vertex group included in the matching vertex group set, so that no preemption of data processing resources exists between selected data processing tasks belonging to the same task batch, and thus, the data processing tasks belonging to the same task batch can be guaranteed to be executed in parallel together.
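The traversal described for the bipartite graph multiple matching strategy amounts to a greedy pass over the matching vertex groups. The following Python sketch assumes each matching vertex group is a `(first_vertex, second_vertex)` pair and takes the first group in the list as the reference group; the function name `select_target_groups` is hypothetical. As in the description, only vertices of the second vertex set are checked for coincidence.

```python
from __future__ import annotations


def select_target_groups(groups: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Greedily pick matching vertex groups whose second-set vertices
    do not coincide with any already selected group."""
    selected: list[tuple[str, str]] = []
    used_second: set[str] = set()
    for first_vertex, second_vertex in groups:
        if second_vertex in used_second:
            # This second-participant block already serves a selected
            # subtask in this batch; skip to avoid resource preemption.
            continue
        selected.append((first_vertex, second_vertex))
        used_second.add(second_vertex)
    return selected


target_groups = select_target_groups([("g1", "h1"), ("g2", "h1"), ("g3", "h2")])
# target_groups == [("g1", "h1"), ("g3", "h2")]
```

The skipped group `("g2", "h1")` stays among the remaining matching vertex groups and can be picked up when the next task batch is determined.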
S605, if a plurality of data processing subtasks include residual data processing subtasks besides the data processing subtasks included in the first task lot, determining a second task lot of which the task execution sequence belongs to a second priority in the residual data processing subtasks according to the determined execution plan setting strategy; the second task batch includes one or more data processing sub-tasks of the remaining data processing sub-tasks that the first participant and the second participant are capable of executing in parallel in common.
If the plurality of data processing subtasks include no remaining data processing subtasks other than those included in the first task batch, all data processing subtasks included in the multiparty task belong to the first task batch, and the first participant and the second participant can jointly execute all of them in parallel. If the plurality of data processing subtasks include remaining data processing subtasks other than those included in the first task batch, a second task batch whose task execution order belongs to the second priority may be determined among the remaining data processing subtasks according to the determined execution plan setting policy, where the task execution order belonging to the first priority precedes the task execution order belonging to the second priority. In step S605, the process of determining, according to the determined execution plan setting policy, the second task batch whose task execution order belongs to the second priority among the remaining data processing subtasks is the same as the process of determining, in step S604, the first task batch whose task execution order belongs to the first priority among the plurality of data processing subtasks; for details, refer to the description of step S604, which is not repeated here. The second task batch includes one or more data processing subtasks, among the remaining data processing subtasks, that the first participant and the second participant are capable of jointly executing in parallel.
S606, if other data processing subtasks exist in the rest data processing subtasks except the data processing subtasks included in the second task batch, continuing to determine the task batch with the task execution sequence belonging to the subsequent priority until the priorities of all the data processing subtasks included in the multiparty task are determined.
If the remaining data processing subtasks include no other data processing subtasks besides those included in the second task batch, all data processing subtasks included in the multiparty task can be executed in two task batches, and the data processing subtasks included in each task batch can be jointly executed in parallel by the first participant and the second participant. If the remaining data processing subtasks include other data processing subtasks besides those included in the second task batch, task batches whose task execution order belongs to subsequent priorities may continue to be determined until the priority to which the task execution order of every data processing subtask included in the multiparty task belongs has been determined. In step S606, the process of determining a task batch whose task execution order belongs to a subsequent priority is the same as the process of determining, in step S604, the first task batch whose task execution order belongs to the first priority among the plurality of data processing subtasks according to the determined execution plan setting policy; for details, refer to the description of step S604, which is not repeated here.
S607, according to the priorities of the task execution orders of the task batches, scheduling the first participant and the second participant to perform data processing, based on the deployed data processing resources, on the data block groups required for executing each task batch, so as to jointly execute the data processing subtasks in each task batch.
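Steps S604 to S606 together form a loop that peels off one task batch after another. A minimal Python sketch of that loop is given below; `plan_batches` and the `select_batch` callback are hypothetical names, with `select_batch` standing in for whichever batch selection procedure the chosen execution plan setting policy prescribes. Subtasks are assumed to be hashable identifiers.

```python
from __future__ import annotations

from typing import Callable


def plan_batches(
    subtasks: list[str],
    select_batch: Callable[[list[str]], list[str]],
) -> list[list[str]]:
    """Split subtasks into task batches; the batch at index 0 has the
    first priority, the batch at index 1 the second priority, and so on."""
    remaining = list(subtasks)
    batches: list[list[str]] = []
    while remaining:
        batch = select_batch(remaining)
        chosen = set(batch)
        next_remaining = [task for task in remaining if task not in chosen]
        if len(next_remaining) == len(remaining):
            # Guard against a selector that makes no progress.
            raise RuntimeError("no remaining subtask was selected")
        batches.append(batch)
        remaining = next_remaining
    return batches


# Toy selector (an assumption): at most two subtasks fit in one batch.
plan = plan_batches(["t1", "t2", "t3", "t4", "t5"], lambda rest: rest[:2])
# plan == [["t1", "t2"], ["t3", "t4"], ["t5"]]
```

The resulting list order is the task execution order: earlier batches are executed before later ones.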
Taking the first task batch as an example, the process of scheduling the first participant and the second participant to perform data processing on the data block group corresponding to the first task batch based on the deployed data processing resources to jointly execute the data processing subtasks in the first task batch may specifically include: determining a target data block group required for executing the data processing subtasks in the first task batch, wherein in the target data block group, a data block deployed in a task execution node of a first participant is a target candidate data block, and a data block deployed in a task execution node of a second participant is a target matching data block; the first participant can perform data processing on the target candidate data block based on the data processing resources deployed in the task execution node for deploying the target candidate data block, and execute the task part of the data processing subtask corresponding to the target data block group on the first participant; and the first participant can trigger the second participant to perform data processing on the target matching data block based on the data processing resources deployed in the task execution node for deploying the target matching data block, and execute the task part of the data processing subtask corresponding to the target data block group on the second participant.
For the first execution plan setting policy, the manner in which the first participant triggers the second participant may include: the first participant sends the identifications of the matching data blocks to the second participant, so that the second participant selects the target matching data block from the matching data blocks according to those identifications; the second participant can then perform data processing on the target matching data block based on the data processing resources deployed in the task execution node in which the target matching data block is deployed, and execute the part of the data processing subtask corresponding to the target data block group that falls to the second participant. For the second execution plan setting policy, the manner in which the first participant triggers the second participant may include: the first participant sends the target data block group to the second participant, where the data blocks of the target data block group that are deployed in the second participant are the target matching data blocks; the second participant can then perform data processing on the target matching data blocks based on the data processing resources deployed in the task execution nodes in which the target matching data blocks are deployed, and execute the part of the data processing subtask corresponding to the target data block group that falls to the second participant.
It should be noted that steps S603-S607 above describe two stages: determining the data processing subtasks included in each task batch, and executing each task batch. The two stages may be performed sequentially, that is, the data processing subtasks included in every task batch are determined first, and the task batches are then executed. Alternatively, the two stages may be performed alternately: after the first task batch whose task execution order belongs to the first priority is determined, the first participant and the second participant are scheduled to jointly execute the data processing subtasks in the first task batch; after that execution is completed, the second task batch whose task execution order belongs to the second priority is determined, and the first participant and the second participant are scheduled to jointly execute the data processing subtasks in the second task batch; and so on, until all data processing subtasks in the multiparty task have been executed.
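The two modes described above can be contrasted in a short Python sketch. It is only an illustration under stated assumptions: the function names, the `select_batch` callback (standing in for whichever execution plan setting policy applies) and the `execute` callback are hypothetical and do not appear in the source.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical callback: given the subtasks not yet assigned, return the subset
# that both participants can jointly execute in parallel as the next batch.
SelectBatch = Callable[[Set[str]], Set[str]]
Execute = Callable[[Set[str]], None]


def plan_then_execute(subtasks: Iterable[str], select_batch: SelectBatch,
                      execute: Execute) -> List[Set[str]]:
    """Sequential mode: determine every task batch first, then run them in order."""
    remaining = set(subtasks)
    batches: List[Set[str]] = []
    while remaining:
        batch = select_batch(remaining) & remaining
        if not batch:
            raise RuntimeError("no executable subtask found; cannot make progress")
        batches.append(batch)
        remaining -= batch
    for batch in batches:  # first priority, second priority, ...
        execute(batch)
    return batches


def alternate_plan_and_execute(subtasks: Iterable[str], select_batch: SelectBatch,
                               execute: Execute) -> List[Set[str]]:
    """Alternating mode: determine one batch, run it, then determine the next."""
    remaining = set(subtasks)
    batches: List[Set[str]] = []
    while remaining:
        batch = select_batch(remaining) & remaining
        if not batch:
            raise RuntimeError("no executable subtask found; cannot make progress")
        execute(batch)  # the next batch is only determined after this one ran
        batches.append(batch)
        remaining -= batch
    return batches
```

With a deterministic `select_batch`, both functions produce the same batches; the alternating form allows the selection to take into account the state observed after each batch has finished.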
In the embodiments of the present application, when the deployed data processing resources are insufficient and the matching relationship is one-to-one, the data processing subtasks that the first participant and the second participant can jointly execute in parallel under the limited data processing resources are identified, the data processing subtasks included in the multiparty task are divided into different task batches, and the task batches are executed in sequence according to the priority to which the task execution order of each task batch belongs, so that smooth execution of the multiparty task can be ensured. When the deployed data processing resources are sufficient and the matching relationship is one-to-many, the data processing subtasks that the first participant and the second participant can jointly execute in parallel without any preemption of data processing resources among the data processing subtasks are identified, the data processing subtasks included in the multiparty task are divided into different task batches, and the task batches are executed in sequence according to the priority to which the task execution order of each task batch belongs, so that smooth execution of the multiparty task can likewise be ensured.
The following describes the multiparty task scheduling process under the first execution plan setting policy and the second execution plan setting policy with reference to specific application examples:
(1) First execution plan setting policy:
As shown in fig. 8, 6 data blocks are deployed in each of the first participant and the second participant, and the data blocks deployed in the first participant and the data blocks deployed in the second participant have a one-to-one matching relationship. The first participant includes 3 task execution nodes, each with 1 data processing resource and 2 data blocks, so its data processing resources are insufficient. The second participant includes 2 task execution nodes, one with 1 data processing resource and 3 data blocks and the other with 2 data processing resources and 3 data blocks, so its data processing resources are also insufficient. The multiparty task shown in fig. 8 therefore matches the first execution plan setting policy. In addition, the number of data processing resources deployed in the first participant (3) is the same as the number deployed in the second participant (3), and the number of task execution nodes included in the first participant (3) is greater than the number included in the second participant (2), so the first participant can act as the task coordination master and the second participant can act as the task coordination receiver, and the two jointly participate in the task scheduling of the multiparty task. Under the first execution plan setting policy, the specific task scheduling process of the multiparty task is as follows:
The first participant, as the task coordination master, may select unused data blocks as candidate data blocks in ascending order of data block identification, the number selected in each task execution node being equal to the number of data processing resources deployed in that node; for example, the selected candidate data blocks are {data block 1, data block 2, data block 3}. Among the data blocks deployed by the second participant, the matching data blocks belonging to the same data block groups as the candidate data blocks are {data block 1, data block 2, data block 3}. The first participant can send the selection result to the second participant, and the second participant performs further screening. Only 1 data processing resource is deployed in task execution node 1 of the second participant, but the matching data blocks {data block 1, data block 3} are both deployed in task execution node 1 and therefore cannot be processed in parallel; one of them needs to be selected as the target matching data block, for example, data block 1 is selected and data block 3 is removed from the selection. 2 data processing resources are deployed in task execution node 2, in which the matching data block {data block 2} is deployed, so data block 2 can be used as a target matching data block for data processing. Finally, the target data block groups that can be jointly processed in parallel are selected: {{first participant: data block 1, second participant: data block 1}, {first participant: data block 2, second participant: data block 2}}. The data processing subtasks corresponding to these target data block groups may be added to the first task batch, and the data processing subtasks in the first task batch may be scheduled for joint parallel execution by the first participant and the second participant.
The running state (also referred to as the execution progress) of the data processing subtasks at the first participant and the second participant may be monitored. Once the data processing subtasks in the first task batch have been jointly executed by the first participant and the second participant, the target data block groups that can be jointly processed in parallel may be further selected: {{first participant: data block 4, second participant: data block 4}, {first participant: data block 3, second participant: data block 3}}. The data processing subtasks corresponding to these target data block groups may be added to the second task batch, and the data processing subtasks in the second task batch may be scheduled for joint parallel execution by the first participant and the second participant. Once the data processing subtasks in the second task batch have been jointly executed by the first participant and the second participant, the target data block groups that can be jointly processed in parallel may be further selected: {{first participant: data block 5, second participant: data block 5}, {first participant: data block 6, second participant: data block 6}}. The data processing subtasks corresponding to these target data block groups may be added to a third task batch, and the data processing subtasks in the third task batch may be scheduled for joint parallel execution by the first participant and the second participant.
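The batch selection walked through for fig. 8 can be reproduced with a small Python simulation. This is a sketch under assumptions: the text gives the node counts, resource counts and block counts, but not which data block sits in which task execution node, so the layout below (node names `A1`-`A3`, `B1`, `B2` and their block lists) is hypothetical, as is the rule of keeping the smallest identifications when the second participant has to drop surplus blocks.

```python
from typing import Dict, List, Set, Tuple

# node name -> (number of data processing resources, ids of deployed data blocks)
# ASSUMED layout, chosen to be consistent with the counts stated for fig. 8.
Nodes = Dict[str, Tuple[int, List[int]]]
FIRST: Nodes = {"A1": (1, [1, 4]), "A2": (1, [2, 5]), "A3": (1, [3, 6])}
SECOND: Nodes = {"B1": (1, [1, 3, 5]), "B2": (2, [2, 4, 6])}


def select_candidates(nodes: Nodes, used: Set[int]) -> List[int]:
    """Master side: per node, pick unused blocks (ascending id) up to its resource count."""
    picked: List[int] = []
    for resources, blocks in nodes.values():
        unused = sorted(b for b in blocks if b not in used)
        picked.extend(unused[:resources])
    return sorted(picked)


def select_targets(nodes: Nodes, matching_ids: List[int]) -> List[int]:
    """Receiver side: per node, keep no more matching blocks than it has resources."""
    wanted = set(matching_ids)
    kept: List[int] = []
    for resources, blocks in nodes.values():
        local = sorted(b for b in blocks if b in wanted)
        kept.extend(local[:resources])  # assumed tie-break: keep the smallest ids
    return sorted(kept)


def schedule(first: Nodes, second: Nodes) -> List[List[int]]:
    """Repeat candidate selection and screening until every block group is batched."""
    total = sum(len(blocks) for _, blocks in first.values())
    used: Set[int] = set()
    batches: List[List[int]] = []
    while len(used) < total:
        candidates = select_candidates(first, used)
        # One-to-one matching: block i of the first participant matches block i
        # of the second participant, so the ids can be passed through unchanged.
        targets = select_targets(second, candidates)
        if not targets:
            raise RuntimeError("no block group can be processed; check resources")
        batches.append(targets)
        used.update(targets)
    return batches


print(schedule(FIRST, SECOND))  # [[1, 2], [3, 4], [5, 6]]
```

Under the assumed layout the three batches coincide with the first, second and third task batches described for fig. 8; a different layout or tie-break rule would generally give a different but equally valid schedule.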
Fig. 8 shows only one possible scheduling manner for the multiparty task; other scheduling manners are possible. As shown in fig. 9, the target data block groups that can be jointly processed in parallel may be selected as {{first participant: data block 4, second participant: data block 4}, {first participant: data block 2, second participant: data block 2}, {first participant: data block 3, second participant: data block 3}}; the data processing subtasks corresponding to these target data block groups may be added to the first task batch, and the data processing subtasks in the first task batch may be scheduled for joint parallel execution by the first participant and the second participant. The target data block groups {{first participant: data block 5, second participant: data block 5}, {first participant: data block 6, second participant: data block 6}} may be further selected; the data processing subtasks corresponding to these target data block groups may be added to the second task batch, and the data processing subtasks in the second task batch may be scheduled for joint parallel execution by the first participant and the second participant. The target data block group {{first participant: data block 1, second participant: data block 1}} may be further selected; the data processing subtask corresponding to this target data block group may be added to a third task batch, and the data processing subtask in the third task batch may be scheduled for joint execution by the first participant and the second participant.
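That both schedules are admissible can be checked mechanically: every batch must load each task execution node with no more data blocks than it has data processing resources, and every data block group must appear exactly once. The Python sketch below performs this check. The block-to-node placement is an assumption (the text states only the counts per node), and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# ASSUMED placement of data blocks on task execution nodes (not given in the text).
FIRST_NODE_OF: Dict[int, str] = {1: "A1", 4: "A1", 2: "A2", 5: "A2", 3: "A3", 6: "A3"}
SECOND_NODE_OF: Dict[int, str] = {1: "B1", 3: "B1", 5: "B1", 2: "B2", 4: "B2", 6: "B2"}
# Data processing resources per node, as stated for fig. 8.
FIRST_RESOURCES: Dict[str, int] = {"A1": 1, "A2": 1, "A3": 1}
SECOND_RESOURCES: Dict[str, int] = {"B1": 1, "B2": 2}


def batch_fits(batch: List[int], node_of: Dict[int, str],
               resources: Dict[str, int]) -> bool:
    """True if no node has to process more blocks of this batch than it has resources."""
    load = Counter(node_of[block] for block in batch)
    return all(count <= resources[node] for node, count in load.items())


def schedule_is_valid(batches: List[List[int]], all_blocks: List[int]) -> bool:
    """True if each block group is scheduled exactly once and every batch fits both sides."""
    scheduled = sorted(block for batch in batches for block in batch)
    if scheduled != sorted(all_blocks):
        return False
    return all(
        batch_fits(batch, FIRST_NODE_OF, FIRST_RESOURCES)
        and batch_fits(batch, SECOND_NODE_OF, SECOND_RESOURCES)
        for batch in batches
    )


FIG8 = [[1, 2], [4, 3], [5, 6]]   # batches as listed for fig. 8
FIG9 = [[4, 2, 3], [5, 6], [1]]   # batches as listed for fig. 9
BLOCKS = [1, 2, 3, 4, 5, 6]
print(schedule_is_valid(FIG8, BLOCKS), schedule_is_valid(FIG9, BLOCKS))  # True True
```

A schedule that placed data blocks 1 and 3 in the same batch would be rejected under this placement, because both would compete for the single data processing resource of node `B1`.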
(2) Second execution plan setting policy:
As shown in fig. 10, 4 data blocks are deployed in each of the first participant and the second participant, the data blocks deployed in the first participant and the data blocks deployed in the second participant have a one-to-many matching relationship, and each data block includes task data for executing 2 data processing subtasks. The first participant includes 2 task execution nodes, each with 2 data processing resources and 2 data blocks, so its data processing resources are sufficient. The second participant includes 2 task execution nodes, each with 2 data processing resources and 2 data blocks, so its data processing resources are also sufficient. The multiparty task shown in fig. 10 therefore matches the second execution plan setting policy. Fig. 10 also shows the bipartite graph corresponding to this multiparty task.
Based on the bipartite graph multiple matching policy, the matching result is shown in fig. 11. The target data block groups that can be jointly processed in parallel may be selected: {{first participant: data block 1, second participant: data block 1 and data block 3}, {first participant: data block 2, second participant: data block 2 and data block 4}}. The data processing subtasks corresponding to these target data block groups may be added to the first task batch, and the data processing subtasks in the first task batch may be scheduled for joint parallel execution by the first participant and the second participant. The running state (also referred to as the execution progress) of the data processing subtasks at the first participant and the second participant may be monitored. Once the data processing subtasks in the first task batch have been jointly executed by the first participant and the second participant, the target data block groups that can be jointly processed in parallel may be further selected: {{first participant: data block 3, second participant: data block 1 and data block 3}, {first participant: data block 4, second participant: data block 2 and data block 4}}. The data processing subtasks corresponding to these target data block groups may be added to the second task batch, and the data processing subtasks in the second task batch may be scheduled for joint parallel execution by the first participant and the second participant.
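The grouping in this example can be reproduced with a greedy pass over the data block groups: a group joins the current batch only if none of its second-participant data blocks is already claimed by a group in that batch. The Python sketch below is an illustration rather than the claimed procedure; the function names and the choice of ascending identification as traversal order are assumptions, while the edges in `MATCHES` are read from the data block groups listed above.

```python
from typing import Dict, List, Set

# first-participant block id -> ids of the matching second-participant blocks,
# taken from the data block groups listed for fig. 11.
MATCHES: Dict[int, Set[int]] = {1: {1, 3}, 2: {2, 4}, 3: {1, 3}, 4: {2, 4}}


def select_batch(groups: Dict[int, Set[int]]) -> List[int]:
    """Greedy pass: keep a group only if its second-side blocks are still free."""
    chosen: List[int] = []
    taken: Set[int] = set()
    for first_block in sorted(groups):       # assumed order: ascending id
        second_blocks = groups[first_block]
        if taken.isdisjoint(second_blocks):  # no overlap on the second side
            chosen.append(first_block)
            taken |= second_blocks
    return chosen


def schedule(matches: Dict[int, Set[int]]) -> List[List[int]]:
    """Apply the greedy pass repeatedly until every group is assigned to a batch."""
    remaining = dict(matches)
    batches: List[List[int]] = []
    while remaining:
        batch = select_batch(remaining)      # never empty: the first group always fits
        batches.append(batch)
        for first_block in batch:
            del remaining[first_block]
    return batches


print(schedule(MATCHES))  # [[1, 2], [3, 4]]
```

The first batch contains the groups of first-participant data blocks 1 and 2, and the second batch those of data blocks 3 and 4, which agrees with the two task batches described for fig. 11.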
The methods of the embodiments of the present application have been described in detail above. To facilitate implementation of the above solutions, the apparatus of the embodiments of the present application is provided below.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a processing apparatus for a multi-party task provided in an embodiment of the present application, where the processing apparatus for a multi-party task may be provided in a computer device provided in an embodiment of the present application, and the computer device may be a first participant in the multi-party task execution system shown in fig. 2, and the multi-party task needs to be jointly executed by the first participant and a second participant, where the multi-party task includes a plurality of data processing subtasks. The processing means of the multiparty task shown in fig. 12 may be a computer program (comprising program code) running in a computer device, which may be used to perform some or all of the steps of the method embodiments shown in fig. 5 or 6. Referring to fig. 12, the processing apparatus for multiparty tasks may include the following units:
a processing unit 1201, configured to parse the multiparty task to be executed to obtain a matching relationship between the data block deployed in the first party and the data block deployed in the second party; one data block group with a matching relationship includes task data required for executing at least one data processing subtask in the multiparty task;
An acquisition unit 1202 for acquiring resource information of data processing resources deployed in the first participant and the second participant;
a processing unit 1201, configured to set an execution plan for a plurality of data processing subtasks based on the matching relationship and the resource information; the execution plan is used for indicating the task execution sequence of the plurality of data processing subtasks;
the processing unit 1201 is further configured to schedule, according to a task execution order indicated by the execution plan, the first participant and the second participant to perform data processing on each data block group based on the deployed data processing resources, and execute a plurality of data processing subtasks together.
In one implementation, the processing unit 1201 is configured to, when setting an execution plan for a plurality of data processing subtasks based on the matching relationship and the resource information, specifically perform the following steps:
determining an execution plan setting strategy matched with the resource information and the matching relation together;
according to the determined execution plan setting strategy, determining a first task batch of which the task execution sequence belongs to a first priority from a plurality of data processing subtasks; the first task batch comprises one or more data processing subtasks which are jointly and parallelly executed by a first participant and a second participant;
If the plurality of data processing subtasks include residual data processing subtasks besides the data processing subtasks included in the first task batch, determining a second task batch of which the task execution sequence belongs to a second priority from the residual data processing subtasks according to the determined execution plan setting strategy; the second task batch comprises one or more data processing subtasks which are jointly and parallelly executed by the first participant and the second participant in the residual data processing subtasks;
if other data processing subtasks exist in the remaining data processing subtasks except the data processing subtasks included in the second task batch, continuing to determine the task batch of which the task execution sequence belongs to the subsequent priority until the priorities of all the data processing subtasks included in the multiparty task are determined; wherein the task execution order belonging to the first priority precedes the task execution order belonging to the second priority.
In one implementation, the resource information is used to indicate that data processing resources deployed in the first and second participants are insufficient; the matching relationship is a one-to-one matching relationship; the execution plan setting strategy matched with the resource information and the one-to-one matching relationship is a first execution plan setting strategy;
The processing unit 1201 is configured to set a policy according to a first execution plan, and when determining, from the plurality of data processing subtasks, that the task execution sequence belongs to a first task batch with a first priority, specifically perform the following steps:
selecting a candidate data block from the data blocks deployed by the first participant;
determining a matching data block belonging to the same data block group as the candidate data block in the data blocks deployed by the second participant, and sending the identification of the matching data block to the second participant, so that the second participant selects a target matching data block in the matching data blocks according to the identification of the matching data block;
receiving an identification of a target matching data block sent by a second participant, and determining target candidate data blocks belonging to the same data block group as the target matching data block in the candidate data blocks according to the identification of the target matching data block; a target candidate data block with a matching relationship and a target matching data block form a target data block group;
determining a target data processing subtask corresponding to the determined target data block group as a data processing subtask included in the first task batch; the target data chunk set includes task data required to perform a target data processing subtask in the multi-party task.
In one implementation, the first participant and the second participant each include a task execution node in which the data block and the data processing resources are deployed;
the processing unit 1201 is configured to, when selecting a candidate data block from the data blocks deployed by the first party, specifically perform the following steps:
selecting unused data blocks with the quantity smaller than or equal to the quantity of idle data processing resources of each task execution node from the data blocks deployed by each task execution node aiming at each task execution node included by the first participant;
and determining unused data blocks selected by each task execution node included by the first participant as candidate data blocks.
In one implementation, the first participant and the second participant each include a task execution node in which the data block and the data processing resources are deployed; the process of selecting a target matching data block from the matching data blocks by the second participant according to the identification of the matching data block comprises the following steps:
determining target task execution nodes deployed in the second party by each matching data block according to the identification of the matching data block;
for each target task execution node, if the number of matching data blocks deployed in the target task execution node is greater than the number of idle data processing resources of the target task execution node, deleting part of the matching data blocks deployed in the target task execution node from the selection, so that the number of matching data blocks remaining for the target task execution node after the deletion is less than or equal to the number of idle data processing resources of the target task execution node;
and determining the remaining matching data blocks in each target task execution node as target matching data blocks.
In one implementation, the processing device of the multiparty task is arranged in a first participant, wherein the first participant is a task coordination master, and the second participant is a task coordination receiver; the processing unit 1201 is further configured to perform the following steps:
acquiring a first amount of data processing resources deployed by a first participant;
acquiring a second amount of data processing resources deployed by a second participant;
and if the first quantity is smaller than the second quantity, determining that the first party is a task coordination master, and the second party is the task coordination receiver.
In one implementation, the first participant and the second participant each include a task execution node in which the data block and the data processing resources are deployed; the processing unit 1201 is further configured to perform the following steps:
If the first number is equal to the second number, acquiring a third number of task execution nodes included in the first participant and acquiring a fourth number of task execution nodes included in the second participant;
if the third number is larger than the fourth number, the first participant is determined to be a task coordination master, and the second participant is determined to be a task coordination receiver.
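The two rules above for choosing the task coordination master can be sketched in Python as follows. The `Participant` class, its field names and `choose_master` are hypothetical names introduced for illustration; the function returns `None` for situations that this passage does not address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Participant:
    name: str
    resource_count: int  # number of deployed data processing resources
    node_count: int      # number of task execution nodes


def choose_master(first: Participant, second: Participant) -> Optional[Participant]:
    """Return `first` when the stated conditions make it the task coordination master."""
    # Rule 1: the first number (resources of the first participant) is smaller
    # than the second number (resources of the second participant).
    if first.resource_count < second.resource_count:
        return first
    # Rule 2: equal resource counts, and the first participant has more nodes.
    if (first.resource_count == second.resource_count
            and first.node_count > second.node_count):
        return first
    return None  # not covered by the conditions stated in this passage


# Values from the fig. 8 example: 3 resources each, 3 nodes versus 2 nodes.
first = Participant("first participant", resource_count=3, node_count=3)
second = Participant("second participant", resource_count=3, node_count=2)
print(choose_master(first, second).name)  # first participant
```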
In one implementation, the resource information is used to indicate that data processing resources deployed in the first and second participants are sufficient; the matching relationship is a one-to-many matching relationship; the execution plan setting strategy matched with the resource information and the one-to-many matching relationship is a second execution plan setting strategy;
the processing unit 1201 is configured to set a policy according to the second execution plan, and when determining that the task execution sequence belongs to the first task lot with the first priority among the plurality of data processing subtasks, specifically is configured to execute the following steps:
generating a bipartite graph according to the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant; the bipartite graph comprises a first vertex set, a second vertex set and an edge set; vertices in the first set of vertices are used to represent data blocks deployed in the first participant and vertices in the second set of vertices are used to represent data blocks deployed in the second participant; edges in the edge set are used to represent a matching relationship between data blocks deployed in the first party and data blocks deployed in the second party; the data block group with the matching relation is expressed as a matching vertex group in the bipartite graph;
According to a bipartite graph multiple matching strategy, determining a target matching vertex group in each matching vertex group included in the bipartite graph;
determining a data block group corresponding to the target matching vertex group as a target data block group;
determining a target data processing subtask corresponding to the determined target data block group as a data processing subtask included in the first task batch; the target data chunk set includes task data required to perform a target data processing subtask in the multi-party task.
In one implementation, the processing unit 1201 is configured to, according to a bipartite graph multiple matching policy, determine, when determining a target matching vertex group from the matching vertex groups included in the bipartite graph, specifically perform the following steps:
selecting a reference matching vertex group from the matching vertex groups included in the bipartite graph, adding the reference matching vertex group to a matching vertex group set, and taking the matching vertex groups in the bipartite graph other than the reference matching vertex group as remaining matching vertex groups;
if the vertices belonging to the second vertex set in a first remaining matching vertex group among the remaining matching vertex groups do not overlap with the vertices belonging to the second vertex set in the matching vertex groups already in the matching vertex group set, adding the first remaining matching vertex group to the matching vertex group set, and continuing to traverse a second remaining matching vertex group among the remaining matching vertex groups until all remaining matching vertex groups have been traversed;
if the vertices belonging to the second vertex set in the first remaining matching vertex group overlap with the vertices belonging to the second vertex set in the matching vertex groups already in the matching vertex group set, continuing, without adding the first remaining matching vertex group, to traverse the second remaining matching vertex group among the remaining matching vertex groups until all remaining matching vertex groups have been traversed;
and determining the matching vertex groups included in the matching vertex group set as target matching vertex groups.
In one implementation, the first participant and the second participant each include a task execution node in which the data block and the data processing resources are deployed; the processing unit 1201 is configured to schedule the first participant and the second participant to perform data processing on the data block group corresponding to the first task batch based on the deployed data processing resources, and when executing the data processing subtasks in the first task batch together, specifically is configured to execute the following steps:
determining a target data block group required for executing the data processing subtasks in the first task batch, wherein in the target data block group, a data block deployed in a task execution node of a first participant is a target candidate data block, and a data block deployed in a task execution node of a second participant is a target matching data block;
Performing data processing on the target candidate data block based on the data processing resources deployed in the task execution node for deploying the target candidate data block, and executing the task part of the data processing subtask corresponding to the target data block group on the first participant;
and triggering the second party to perform data processing on the target matching data block based on the data processing resources deployed in the task execution node for deploying the target matching data block, and executing the task part of the data processing subtask corresponding to the target data block group on the second party.
In one implementation, the acquisition unit 1202 is configured to, when acquiring resource information of data processing resources deployed in the first participant and the second participant, specifically perform the following steps:
obtaining a first total amount of data processing resources deployed in a first participant and a second participant;
obtaining a second total number of data chunks deployed in the first and second participants;
if the first total number is smaller than the second total number, generating resource information, wherein the resource information is used for indicating that the data processing resources deployed in the first participant and the second participant are insufficient;
If the first total number is greater than or equal to the second total number, resource information is generated, the resource information indicating that data processing resources deployed in the first and second participants are sufficient.
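The comparison above, together with the pairing of resource information and matching relationship with an execution plan setting policy described earlier, can be written as a minimal Python sketch. The function names and the string labels are hypothetical; combinations that the text does not pair with a policy return `None`.

```python
from typing import Optional


def resource_info(first_resources: int, second_resources: int,
                  first_blocks: int, second_blocks: int) -> str:
    """Compare the total resource count with the total data block count."""
    first_total = first_resources + second_resources  # first total number
    second_total = first_blocks + second_blocks       # second total number
    return "insufficient" if first_total < second_total else "sufficient"


def select_policy(info: str, matching: str) -> Optional[str]:
    """Map (resource information, matching relationship) to a policy label."""
    if info == "insufficient" and matching == "one-to-one":
        return "first execution plan setting policy"
    if info == "sufficient" and matching == "one-to-many":
        return "second execution plan setting policy"
    return None  # combination not paired with a policy in the text


# Fig. 8 example: 3 + 3 resources, 6 + 6 data blocks.
print(resource_info(3, 3, 6, 6))  # insufficient
# Fig. 10 example: 4 + 4 resources, 4 + 4 data blocks.
print(resource_info(4, 4, 4, 4))  # sufficient
```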
According to another embodiment of the present application, the units of the multiparty task processing apparatus shown in fig. 12 may be combined, individually or entirely, into one or several other units, or one or more of the units may be further split into multiple functionally smaller units; either arrangement can achieve the same operations without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present application, the multiparty task processing apparatus may also include other units; in practical applications, these functions may also be implemented with the assistance of other units and through the cooperation of multiple units.
According to another embodiment of the present application, the multiparty task processing apparatus shown in fig. 12 may be constructed, and the multiparty task processing method of the embodiments of the present application may be implemented, by running a computer program (including program code) capable of executing some or all of the steps of the method shown in fig. 5 or fig. 6 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM) and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, and loaded into and executed by the above computing device.
In this embodiment, a multi-party task needs to be jointly executed by a first participant and a second participant, and the multi-party task may include a plurality of data processing subtasks. A task execution order may be set for the plurality of data processing subtasks according to the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant, together with the resource information of the data processing resources deployed in the first participant and the second participant, where a data block group having a matching relationship may include the task data for executing at least one of the plurality of data processing subtasks. The first participant and the second participant can then be scheduled, according to the set task execution order, to perform data processing on each data block group based on the deployed data processing resources, so that the plurality of data processing subtasks are jointly executed. In this way, the plurality of data processing subtasks included in the multi-party task can be reasonably scheduled and executed between the first participant and the second participant, so that smooth execution of the multi-party task can be ensured.
Based on the above method and apparatus embodiments, embodiments of the present application provide a computer device that may be a first participant in a multi-party task execution system as shown in fig. 2. Referring to fig. 13, fig. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device shown in fig. 13 includes at least a processor 1301, an input interface 1302, an output interface 1303, and a computer-readable storage medium 1304. Wherein the processor 1301, the input interface 1302, the output interface 1303, and the computer-readable storage medium 1304 may be connected by a bus or other means.
The computer-readable storage medium 1304 may be stored in a memory of the computer device. The computer-readable storage medium 1304 is used for storing a computer program comprising computer instructions, and the processor 1301 is used for executing the computer instructions stored in the computer-readable storage medium 1304. The processor 1301 (or central processing unit, CPU) is the computing core and control core of the computer device; it is adapted to implement one or more computer instructions, and in particular to load and execute one or more computer instructions to implement a corresponding method flow or a corresponding function.
The embodiment of the application also provides a computer-readable storage medium, which is a memory device in a computer device and is used for storing programs and data. It is understood that the computer-readable storage medium herein may include both a built-in storage medium of the computer device and an extended storage medium supported by the computer device. The computer-readable storage medium provides storage space that stores the operating system of the computer device. The storage space also stores one or more computer instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor. Note that the computer-readable storage medium may be a high-speed RAM or a non-volatile memory, such as at least one magnetic disk memory; optionally, it may also be at least one computer-readable storage medium located remotely from the aforementioned processor.
In some embodiments, one or more computer instructions stored in the computer-readable storage medium 1304 may be loaded and executed by the processor 1301 to implement the corresponding steps of the multi-party task processing method described above with respect to fig. 5 or 6. The multi-party task needs to be jointly executed by the first participant and the second participant and includes a plurality of data processing subtasks. In a specific implementation, the computer instructions in the computer-readable storage medium 1304 are loaded by the processor 1301 to perform the following steps:
Analyzing the multiparty task to be executed to obtain a matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant; one data block group with a matching relationship includes task data required for executing at least one data processing subtask in the multiparty task;
acquiring resource information of data processing resources deployed in a first participant and a second participant;
setting an execution plan for a plurality of data processing subtasks based on the matching relationship and the resource information; the execution plan is used for indicating the task execution sequence of the plurality of data processing subtasks;
and scheduling, according to the task execution order indicated by the execution plan, the first participant and the second participant to perform data processing on each data block group based on the deployed data processing resources, so as to jointly execute the plurality of data processing subtasks.
In one implementation, when the computer instructions in the computer-readable storage medium 1304 are loaded and executed by the processor 1301 to set an execution plan for the plurality of data processing subtasks based on the matching relationship and the resource information, they are specifically used to perform the following steps:
determining an execution plan setting strategy that matches both the resource information and the matching relationship;
According to the determined execution plan setting strategy, determining a first task batch of which the task execution sequence belongs to a first priority from a plurality of data processing subtasks; the first task batch comprises one or more data processing subtasks which are jointly and parallelly executed by a first participant and a second participant;
if the plurality of data processing subtasks include residual data processing subtasks besides the data processing subtasks included in the first task batch, determining a second task batch of which the task execution sequence belongs to a second priority from the residual data processing subtasks according to the determined execution plan setting strategy; the second task batch comprises one or more data processing subtasks which are jointly and parallelly executed by the first participant and the second participant in the residual data processing subtasks;
if other data processing subtasks exist in the remaining data processing subtasks except the data processing subtasks included in the second task batch, continuing to determine the task batch of which the task execution sequence belongs to the subsequent priority until the priorities of all the data processing subtasks included in the multiparty task are determined; wherein the task execution order belonging to the first priority precedes the task execution order belonging to the second priority.
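The batch-by-batch determination described above can be sketched as a simple loop. This is only an illustrative sketch: the callback `pick_batch` is a hypothetical stand-in for whichever execution plan setting strategy has been determined, and the subtask names are invented for the example; none of these identifiers come from the application.

```python
from typing import Callable, List, Sequence, Set


def build_execution_plan(
    subtasks: Sequence[str],
    pick_batch: Callable[[List[str]], Set[str]],
) -> List[List[str]]:
    """Split subtasks into ordered batches; batch 0 has the first priority."""
    remaining = list(subtasks)
    plan: List[List[str]] = []
    while remaining:
        # Hypothetical strategy hook: returns the subtasks to run in parallel next.
        chosen = pick_batch(remaining)
        batch = [t for t in remaining if t in chosen]
        if not batch:
            # Guard added for the sketch so that the loop cannot spin forever.
            raise ValueError("strategy selected no subtask from the remaining ones")
        plan.append(batch)
        remaining = [t for t in remaining if t not in chosen]
    return plan


def two_at_a_time(remaining: List[str]) -> Set[str]:
    """Toy strategy (assumption): at most two subtasks per batch."""
    return set(remaining[:2])


plan = build_execution_plan(["t1", "t2", "t3", "t4", "t5"], two_at_a_time)
```

Each inner list of `plan` corresponds to one task batch, and a batch at a lower index is executed before any batch at a higher index.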
In one implementation, the resource information is used to indicate that data processing resources deployed in the first and second participants are insufficient; the matching relationship is a one-to-one matching relationship; the execution plan setting strategy matched with the resource information and the one-to-one matching relationship is a first execution plan setting strategy;
when the computer instructions in the computer-readable storage medium 1304 are loaded and executed by the processor 1301 to determine, according to the first execution plan setting strategy, the first task batch whose task execution order belongs to the first priority among the plurality of data processing subtasks, they are specifically used to perform the following steps:
selecting a candidate data block from the data blocks deployed by the first participant;
determining a matching data block belonging to the same data block group as the candidate data block in the data blocks deployed by the second participant, and sending the identification of the matching data block to the second participant, so that the second participant selects a target matching data block in the matching data blocks according to the identification of the matching data block;
receiving an identification of a target matching data block sent by a second participant, and determining target candidate data blocks belonging to the same data block group as the target matching data block in the candidate data blocks according to the identification of the target matching data block; a target candidate data block with a matching relationship and a target matching data block form a target data block group;
determining the target data processing subtask corresponding to the determined target data block group as a data processing subtask included in the first task batch; the target data block group includes the task data required to perform the target data processing subtask in the multi-party task.
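One round of the exchange described in the steps above might look like the following sketch. It assumes a one-to-one mapping `match_of` from each block of the first participant to its matching block in the second participant, and it models the message exchange with the second participant as a plain callback `second_select`; both names, and the block identifiers, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def negotiate_first_batch(
    candidates: List[str],
    match_of: Dict[str, str],
    second_select: Callable[[List[str]], List[str]],
) -> List[Tuple[str, str]]:
    """Return the target data block groups as (target candidate, target match) pairs."""
    # Identifiers of the matching blocks that would be sent to the second participant.
    matching_ids = [match_of[c] for c in candidates if c in match_of]
    # The second participant answers with the identifiers it can actually serve.
    target_matching = set(second_select(matching_ids))
    # Keep only the candidates whose matching block was accepted.
    return [(c, match_of[c]) for c in candidates if match_of.get(c) in target_matching]


groups = negotiate_first_batch(
    ["a1", "a2", "a3"],
    {"a1": "b1", "a2": "b2", "a3": "b3"},
    lambda ids: ids[:2],  # toy second participant: accepts the first two identifiers
)
```

Each returned pair stands for one target data block group, and the subtasks corresponding to these groups would make up the first task batch.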
In one implementation, the first participant and the second participant each include a task execution node in which the data block and the data processing resources are deployed;
when the computer instructions in the computer-readable storage medium 1304 are loaded and executed by the processor 1301 to select candidate data blocks from the data blocks deployed by the first participant, they are specifically used to perform the following steps:
for each task execution node included in the first participant, selecting, from the data blocks deployed on that task execution node, unused data blocks whose number is less than or equal to the number of idle data processing resources of that task execution node;
and determining the unused data blocks selected on each task execution node included in the first participant as the candidate data blocks.
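The per-node candidate selection above can be sketched as follows. The dictionary layout, the node and block names, and the choice to take unused blocks in their stored order are all assumptions made for illustration; the text only requires that the number selected per node not exceed that node's idle data processing resources.

```python
from typing import Dict, List, Set


def select_candidates(
    blocks_by_node: Dict[str, List[str]],
    idle_by_node: Dict[str, int],
    used: Set[str],
) -> List[str]:
    """Pick, per task execution node, at most as many unused blocks as idle resources."""
    candidates: List[str] = []
    for node, blocks in blocks_by_node.items():
        unused = [b for b in blocks if b not in used]
        limit = max(0, idle_by_node.get(node, 0))
        candidates.extend(unused[:limit])
    return candidates


candidates = select_candidates(
    {"n1": ["a1", "a2", "a3"], "n2": ["a4"]},
    {"n1": 2, "n2": 3},
    used={"a1"},
)
```

Here node `n1` contributes its two unused blocks and node `n2` contributes its single block, even though `n2` has more idle resources than blocks.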
In one implementation, the first participant and the second participant each include a task execution node in which the data block and the data processing resources are deployed; the process of selecting a target matching data block from the matching data blocks by the second participant according to the identification of the matching data block comprises the following steps:
determining, according to the identification of each matching data block, the target task execution node of the second participant on which that matching data block is deployed;
for each target task execution node, if the number of matching data blocks deployed on the target task execution node is greater than the number of idle data processing resources of the target task execution node, removing some of the matching data blocks deployed on the target task execution node, so that the number of matching data blocks remaining on the target task execution node after the removal is less than or equal to the number of idle data processing resources of the target task execution node;
and determining the matching data blocks remaining on each target task execution node as the target matching data blocks.
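The second participant's side of the selection can be sketched in the same style. The text does not say which matching data blocks are removed when a node is over capacity, so this sketch simply keeps the earliest ones in the received order; that rule, together with all identifiers, is an assumption.

```python
from typing import Dict, List


def select_target_matches(
    matching_ids: List[str],
    node_of_block: Dict[str, str],
    idle_by_node: Dict[str, int],
) -> List[str]:
    """Trim the matching blocks on each node down to that node's idle resources."""
    per_node: Dict[str, List[str]] = {}
    for block_id in matching_ids:
        per_node.setdefault(node_of_block[block_id], []).append(block_id)
    targets: List[str] = []
    for node, blocks in per_node.items():
        limit = max(0, idle_by_node.get(node, 0))
        # Assumed removal rule: drop the blocks beyond the limit, keep the first ones.
        targets.extend(blocks[:limit])
    return targets


targets = select_target_matches(
    ["b1", "b2", "b3", "b4"],
    {"b1": "m1", "b2": "m1", "b3": "m1", "b4": "m2"},
    {"m1": 2, "m2": 1},
)
```

Node `m1` holds three matching blocks but has only two idle resources, so one block is dropped; node `m2` keeps its single block.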
In one implementation, the processing device of the multiparty task is arranged in a first participant, wherein the first participant is a task coordination master, and the second participant is a task coordination receiver; computer instructions in the computer-readable storage medium 1304 are loaded by the processor 1301 and are also used to perform the steps of:
acquiring a first quantity of data processing resources deployed by the first participant;
acquiring a second quantity of data processing resources deployed by the second participant;
and if the first quantity is smaller than the second quantity, determining that the first participant is the task coordination master and the second participant is the task coordination receiver.
In one implementation, the first participant and the second participant each include a task execution node in which the data block and the data processing resources are deployed; computer instructions in the computer-readable storage medium 1304 are loaded by the processor 1301 and are also used to perform the steps of:
if the first quantity is equal to the second quantity, acquiring a third quantity of task execution nodes included in the first participant and acquiring a fourth quantity of task execution nodes included in the second participant;
if the third quantity is larger than the fourth quantity, determining that the first participant is the task coordination master and the second participant is the task coordination receiver.
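The two role-selection rules in this and the preceding implementation can be combined into one small check. The `Participant` record and its field names are hypothetical, and the function deliberately returns `None` for every case that this passage does not cover rather than guessing at a rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    name: str
    resource_count: int  # number of deployed data processing resources
    node_count: int      # number of task execution nodes


def first_is_master(first: Participant, second: Participant) -> Optional[bool]:
    """Return True when the stated conditions make `first` the task coordination master."""
    if first.resource_count < second.resource_count:
        return True
    if (first.resource_count == second.resource_count
            and first.node_count > second.node_count):
        return True
    # Remaining cases are not specified in this passage (assumption: left undecided).
    return None


fewer_resources = first_is_master(Participant("A", 4, 2), Participant("B", 8, 2))
more_nodes = first_is_master(Participant("A", 8, 3), Participant("B", 8, 2))
unspecified = first_is_master(Participant("A", 8, 2), Participant("B", 4, 2))
```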
In one implementation, the resource information is used to indicate that data processing resources deployed in the first and second participants are sufficient; the matching relationship is a one-to-many matching relationship; the execution plan setting strategy matched with the resource information and the one-to-many matching relationship is a second execution plan setting strategy;
when the computer instructions in the computer-readable storage medium 1304 are loaded and executed by the processor 1301 to determine, according to the second execution plan setting strategy, the first task batch whose task execution order belongs to the first priority among the plurality of data processing subtasks, they are specifically used to perform the following steps:
generating a bipartite graph according to the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant; the bipartite graph comprises a first vertex set, a second vertex set and an edge set; vertices in the first vertex set are used to represent data blocks deployed in the first participant, and vertices in the second vertex set are used to represent data blocks deployed in the second participant; edges in the edge set are used to represent the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant; a data block group with a matching relationship is represented as a matching vertex group in the bipartite graph;
according to a bipartite graph multiple matching strategy, determining a target matching vertex group in each matching vertex group included in the bipartite graph;
determining a data block group corresponding to the target matching vertex group as a target data block group;
determining the target data processing subtask corresponding to the determined target data block group as a data processing subtask included in the first task batch; the target data block group includes the task data required to perform the target data processing subtask in the multi-party task.
In one implementation, when the computer instructions in the computer-readable storage medium 1304 are loaded and executed by the processor 1301 to determine the target matching vertex group among the matching vertex groups included in the bipartite graph according to the bipartite graph multiple matching strategy, they are specifically used to perform the following steps:
selecting a reference matching vertex group from the matching vertex groups included in the bipartite graph, adding the reference matching vertex group to a matching vertex group set, and taking the matching vertex groups in the bipartite graph other than the reference matching vertex group as remaining matching vertex groups;
if the vertex belonging to the second vertex set in a first remaining matching vertex group among the remaining matching vertex groups does not coincide with any vertex belonging to the second vertex set in the matching vertex group set, adding the first remaining matching vertex group to the matching vertex group set, and continuing to traverse a second remaining matching vertex group among the remaining matching vertex groups, until all remaining matching vertex groups have been traversed;
if the vertex belonging to the second vertex set in the first remaining matching vertex group among the remaining matching vertex groups coincides with a vertex belonging to the second vertex set in the matching vertex group set, continuing to traverse the second remaining matching vertex group among the remaining matching vertex groups, until all remaining matching vertex groups have been traversed;
and determining the matching vertex groups included in the matching vertex group set as the target matching vertex groups.
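The traversal above amounts to a greedy selection in which a matching vertex group is accepted only if none of its second-set vertices has already been claimed. In this sketch a matching vertex group is modelled as a pair of one first-set vertex and a set of one or more second-set vertices, and the reference group is simply the first group in the list; both modelling choices, and all identifiers, are assumptions.

```python
from typing import FrozenSet, List, Tuple

# (vertex in the first vertex set, vertices in the second vertex set)
Group = Tuple[str, FrozenSet[str]]


def select_target_groups(groups: List[Group]) -> List[Group]:
    """Greedily keep groups whose second-set vertices do not overlap earlier picks."""
    if not groups:
        return []
    selected: List[Group] = [groups[0]]   # reference matching vertex group (assumed: first)
    taken = set(groups[0][1])             # second-set vertices already in the set
    for group in groups[1:]:
        if taken.isdisjoint(group[1]):
            selected.append(group)
            taken |= group[1]
        # Otherwise the group is skipped and the traversal continues.
    return selected


selected = select_target_groups([
    ("a1", frozenset({"b1", "b2"})),
    ("a2", frozenset({"b2", "b3"})),  # overlaps on b2, so it is skipped
    ("a3", frozenset({"b4"})),
])
selected_first_vertices = [g[0] for g in selected]
```

The groups that are skipped in this round would remain among the subtasks considered for a later task batch.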
In one implementation, the first participant and the second participant each include a task execution node in which the data blocks and the data processing resources are deployed; when the computer instructions in the computer-readable storage medium 1304 are loaded and executed by the processor 1301 to schedule the first participant and the second participant to perform data processing on the data block groups corresponding to the first task batch based on the deployed data processing resources, so as to jointly execute the data processing subtasks in the first task batch, they are specifically used to perform the following steps:
Determining a target data block group required for executing the data processing subtasks in the first task batch, wherein in the target data block group, a data block deployed in a task execution node of a first participant is a target candidate data block, and a data block deployed in a task execution node of a second participant is a target matching data block;
performing data processing on the target candidate data block based on the data processing resources deployed in the task execution node for deploying the target candidate data block, and executing the task part of the data processing subtask corresponding to the target data block group on the first participant;
and triggering the second participant to perform data processing on the target matching data block based on the data processing resources deployed in the task execution node on which the target matching data block is deployed, so as to execute, on the second participant, the task part of the data processing subtask corresponding to the target data block group.
In one implementation, when the computer instructions in the computer-readable storage medium 1304 are loaded and executed by the processor 1301 to acquire the resource information of the data processing resources deployed in the first participant and the second participant, they are specifically used to perform the following steps:
acquiring a first total number of data processing resources deployed in the first participant and the second participant;
obtaining a second total number of data chunks deployed in the first and second participants;
if the first total number is smaller than the second total number, generating resource information, wherein the resource information is used for indicating that the data processing resources deployed in the first participant and the second participant are insufficient;
if the first total number is greater than or equal to the second total number, resource information is generated, the resource information indicating that data processing resources deployed in the first and second participants are sufficient.
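The comparison that produces the resource information can be written down directly. The function name, its parameters, and the string labels are illustrative assumptions; only the comparison of the two totals comes from the steps above.

```python
def resource_info(
    resources_first: int,
    resources_second: int,
    blocks_first: int,
    blocks_second: int,
) -> str:
    """Compare total data processing resources with total data blocks."""
    first_total = resources_first + resources_second   # first total number
    second_total = blocks_first + blocks_second        # second total number
    return "insufficient" if first_total < second_total else "sufficient"


scarce = resource_info(2, 3, 4, 4)   # 5 resources for 8 data blocks
ample = resource_info(4, 4, 4, 4)    # 8 resources for 8 data blocks
```

In the implementations described above, insufficient resources are paired with the first execution plan setting strategy, and sufficient resources with the second.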
In this embodiment, a multi-party task needs to be jointly executed by a first participant and a second participant, and the multi-party task may include a plurality of data processing subtasks. A task execution order may be set for the plurality of data processing subtasks according to the matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant, together with the resource information of the data processing resources deployed in the first participant and the second participant, where a data block group having a matching relationship may include the task data for executing at least one of the plurality of data processing subtasks. The first participant and the second participant can then be scheduled, according to the set task execution order, to perform data processing on each data block group based on the deployed data processing resources, so that the plurality of data processing subtasks are jointly executed. In this way, the plurality of data processing subtasks included in the multi-party task can be reasonably scheduled and executed between the first participant and the second participant, so that smooth execution of the multi-party task can be ensured.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to cause the computer device to perform the methods of processing the multi-party tasks provided in the various alternatives described above.
The foregoing is merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A method for processing a multi-party task, characterized in that the multi-party task needs to be jointly executed by a first participant and a second participant, and the multi-party task comprises a plurality of data processing subtasks; the method comprises the following steps:
analyzing the multiparty task to be executed to obtain a matching relationship between the data block deployed in the first participant and the data block deployed in the second participant; one data block group with a matching relationship includes task data required for executing at least one data processing subtask in the multiparty task;
Acquiring resource information of data processing resources deployed in the first participant and the second participant;
setting an execution plan for the plurality of data processing subtasks based on the matching relationship and the resource information; the execution plan is used for indicating the task execution sequence of the plurality of data processing subtasks;
and according to the task execution sequence indicated by the execution plan, scheduling the first participant and the second participant to perform data processing on each data block group based on deployed data processing resources so as to jointly execute the plurality of data processing subtasks.
2. The method of claim 1, wherein the setting an execution plan for the plurality of data processing subtasks based on the matching relationship and the resource information comprises:
determining an execution plan setting strategy that matches both the resource information and the matching relationship;
determining a first task batch of which the task execution sequence belongs to a first priority from the plurality of data processing subtasks according to the determined execution plan setting strategy; the first task batch comprises one or more data processing subtasks which are executed in parallel jointly by the first participant and the second participant;
If the plurality of data processing subtasks include a residual data processing subtask except the data processing subtask included in the first task batch, determining a second task batch of which the task execution sequence belongs to a second priority in the residual data processing subtask according to the determined execution plan setting strategy; the second task batch comprises one or more data processing subtasks which are jointly and parallelly executed by the first participant and the second participant in the residual data processing subtasks;
if other data processing subtasks exist in the remaining data processing subtasks except the data processing subtasks included in the second task batch, continuing to determine the task batch whose task execution order belongs to a subsequent priority, until the priority to which the task execution order of every data processing subtask included in the multi-party task belongs has been determined; wherein the task execution order belonging to the first priority is before the task execution order belonging to the second priority.
3. The method of claim 2, wherein the resource information is used to indicate insufficient data processing resources deployed in the first and second participants; the matching relationship is a one-to-one matching relationship; the execution plan setting policy which is matched with the resource information and the one-to-one matching relationship together is a first execution plan setting policy;
A process of determining, among the plurality of data processing subtasks, a first task batch whose task execution order belongs to a first priority according to the first execution plan setting policy, including:
selecting a candidate data block from the data blocks deployed by the first participant;
determining a matched data block belonging to the same data block group as the candidate data block in the data blocks deployed by the second participant, and sending the identification of the matched data block to the second participant so that the second participant selects a target matched data block from the matched data blocks according to the identification of the matched data block;
receiving the identification of the target matching data block sent by the second participant, and determining target candidate data blocks belonging to the same data block group as the target matching data block in the candidate data blocks according to the identification of the target matching data block; a target candidate data block with a matching relationship and a target matching data block form a target data block group;
determining the target data processing subtask corresponding to the determined target data block group as a data processing subtask included in the first task batch; the target data block group includes the task data required to perform the target data processing subtask in the multi-party task.
4. The method of claim 3, wherein the first participant and the second participant each comprise a task execution node, the data block and the data processing resource being disposed in the task execution node;
the selecting a candidate data block from the data blocks deployed by the first participant comprises:
selecting unused data blocks with the quantity smaller than or equal to the quantity of idle data processing resources of each task execution node from the data blocks deployed by each task execution node aiming at each task execution node included by the first participant;
and determining unused data blocks selected by each task execution node included by the first participant as the candidate data blocks.
5. The method of claim 3, wherein the first participant and the second participant each comprise a task execution node, the data block and the data processing resource being deployed in the task execution node; the process of selecting, by the second participant, a target matching data block from the matching data blocks according to the identification of the matching data block includes:
determining, according to the identification of each matching data block, the target task execution node of the second participant on which that matching data block is deployed;
for each target task execution node, if the number of matching data blocks deployed on the target task execution node is greater than the number of idle data processing resources of the target task execution node, removing some of the matching data blocks deployed on the target task execution node, so that the number of matching data blocks remaining on the target task execution node after the removal is less than or equal to the number of idle data processing resources of the target task execution node;
and determining the matching data blocks remaining on each target task execution node as the target matching data blocks.
6. The method of claim 3, wherein the method is performed by the first participant, the first participant being a task coordination master and the second participant being a task coordination receiver; the method further comprises the steps of:
acquiring a first amount of data processing resources deployed by the first participant;
obtaining a second amount of data processing resources deployed by the second participant;
and if the first quantity is smaller than the second quantity, determining that the first participant is the task coordination master and the second participant is the task coordination receiver.
7. The method of claim 6, wherein the first participant and the second participant each comprise a task execution node in which the data block and the data processing resource are deployed; the method further comprises the steps of:
if the first quantity is equal to the second quantity, acquiring a third quantity of task execution nodes included in the first participant and acquiring a fourth quantity of task execution nodes included in the second participant;
and if the third quantity is greater than the fourth quantity, determining that the first participant is the task coordination master and the second participant is the task coordination receiver.
8. The method of claim 2, wherein the resource information is used to indicate that data processing resources deployed in the first participant and the second participant are sufficient; the matching relationship is a one-to-many matching relationship; the execution plan setting policy which is matched with the resource information and the one-to-many matching relationship together is a second execution plan setting policy;
a process of determining, among the plurality of data processing sub-tasks, a first task batch whose task execution order belongs to a first priority according to the second execution plan setting policy, including:
Generating a bipartite graph according to a matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant; the bipartite graph comprises a first vertex set, a second vertex set and an edge set; vertices in the first set of vertices are used to represent data blocks deployed in the first participant and vertices in the second set of vertices are used to represent data blocks deployed in the second participant; the edges in the edge set are used for representing a matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant; the data block group with the matching relation is expressed as a matching vertex group in the bipartite graph;
determining a target matching vertex group in each matching vertex group included in the bipartite graph according to a bipartite graph multiple matching strategy;
determining the data block group corresponding to the target matching vertex group as a target data block group;
determining the target data processing subtask corresponding to the determined target data block group as a data processing subtask included in the first task batch; the target data block group includes the task data required to perform the target data processing subtask in the multi-party task.
9. The method of claim 8, wherein said determining a target matching vertex group among the matching vertex groups included in the bipartite graph according to a bipartite graph multiple matching strategy comprises:
selecting a reference matching vertex group from the matching vertex groups included in the bipartite graph, and adding the reference matching vertex group to a matched vertex group set, wherein the matching vertex groups in the bipartite graph other than the reference matching vertex group are remaining matching vertex groups;
if the vertex belonging to the second vertex set in a first remaining matching vertex group among the remaining matching vertex groups does not coincide with any vertex belonging to the second vertex set in the matched vertex group set, adding the first remaining matching vertex group to the matched vertex group set, and continuing to traverse a second remaining matching vertex group among the remaining matching vertex groups until all remaining matching vertex groups have been traversed;
if the vertex belonging to the second vertex set in the first remaining matching vertex group among the remaining matching vertex groups coincides with a vertex belonging to the second vertex set in the matched vertex group set, continuing to traverse the second remaining matching vertex group among the remaining matching vertex groups until all remaining matching vertex groups have been traversed;
and determining each matching vertex group included in the matched vertex group set as the target matching vertex group.
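The traversal in claim 9 reads as a single greedy pass, which the following Python sketch illustrates. It assumes each vertex group is a `(first_vertex, second_vertex)` pair and that the reference group is simply the first group in the input order; the claim does not state how the reference group is chosen, and the function name `select_target_groups` is hypothetical.

```python
def select_target_groups(vertex_groups):
    """Greedy pass: keep a group only if its second-set vertex is not yet used.

    Assumption: vertex_groups is a list of (first_vertex, second_vertex) pairs,
    and the reference group is taken to be the first element.
    """
    if not vertex_groups:
        return []
    reference, *remaining = vertex_groups
    selected = [reference]                 # the set of already selected groups
    used_second_vertices = {reference[1]}  # second-set vertices already covered
    for group in remaining:
        _, second_vertex = group
        if second_vertex in used_second_vertices:
            continue                       # overlap on the second vertex set: skip
        selected.append(group)
        used_second_vertices.add(second_vertex)
    return selected


groups = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B3")]
targets = select_target_groups(groups)
# targets == [("A1", "B1"), ("A1", "B2"), ("A2", "B3")]
```

Only overlap on the second vertex set is checked, as in the claim, so a first-participant block such as `A1` may appear in several selected groups while each second-participant block appears at most once.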
10. The method of claim 2, wherein the first participant and the second participant each comprise a task execution node in which the data block and the data processing resource are deployed; scheduling the first participant and the second participant to perform data processing on the data block group corresponding to the first task batch based on the deployed data processing resources, and jointly executing the data processing subtasks in the first task batch, including:
determining a target data block group required for executing the data processing subtasks in the first task batch, wherein in the target data block group, a data block deployed in a task execution node of the first participant is a target candidate data block, and a data block deployed in a task execution node of the second participant is a target matching data block;
performing data processing on the target candidate data block based on data processing resources deployed in the task execution node in which the target candidate data block is deployed, so as to execute, on the first participant, the task part of the data processing subtask corresponding to the target data block group; and
triggering the second participant to perform data processing on the target matching data block based on the data processing resources deployed in the task execution node in which the target matching data block is deployed, so as to execute, on the second participant, the task part of the data processing subtask corresponding to the target data block group.
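The two halves of claim 10 can be sketched in Python as follows. The functions `process_locally` and `trigger_second_participant` are hypothetical stand-ins: the first represents processing on the first participant's task execution node, and the second represents whatever remote call triggers the second participant. Running the two halves concurrently in a thread pool is an assumption of this sketch, not something the claim requires.

```python
from concurrent.futures import ThreadPoolExecutor


def process_locally(candidate_block):
    # Hypothetical: first participant's part of the subtask on its own node.
    return f"first:{candidate_block}"


def trigger_second_participant(matching_block):
    # Hypothetical stand-in for a remote call to the second participant.
    return f"second:{matching_block}"


def run_batch(target_groups):
    """Execute both task parts for every target data block group in a batch.

    Assumption: each group is a (target_candidate_block, target_matching_block) pair.
    """
    with ThreadPoolExecutor() as pool:
        pending = []
        for candidate_block, matching_block in target_groups:
            local = pool.submit(process_locally, candidate_block)
            remote = pool.submit(trigger_second_participant, matching_block)
            pending.append((local, remote))
        # Wait for both parts of each subtask before reporting it as done.
        return [(local.result(), remote.result()) for local, remote in pending]


results = run_batch([("A1", "B1"), ("A2", "B3")])
```

A subtask is treated as finished here only when both participants' parts have returned, which is one plausible reading of "jointly executing".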
11. The method of claim 1, wherein the obtaining resource information of data processing resources deployed in the first participant and the second participant comprises:
obtaining a first total number of data processing resources deployed in the first participant and the second participant;
obtaining a second total number of data blocks deployed in the first participant and the second participant;
if the first total number is less than the second total number, generating the resource information, the resource information being used to indicate that the data processing resources deployed in the first and second participants are insufficient;
and if the first total number is greater than or equal to the second total number, generating the resource information, wherein the resource information is used for indicating that the data processing resources deployed in the first participant and the second participant are sufficient.
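The comparison in claim 11 can be written as a short Python sketch. The function name `build_resource_info`, its parameters, and the dictionary layout of the returned resource information are assumptions made for illustration; the claim only specifies the comparison between the two totals.

```python
def build_resource_info(first_resources, second_resources, first_blocks, second_blocks):
    """Compare total data processing resources with total data blocks.

    Assumption: resources and blocks are given as per-participant counts.
    """
    first_total = first_resources + second_resources  # total data processing resources
    second_total = first_blocks + second_blocks       # total data blocks
    return {
        "resource_total": first_total,
        "block_total": second_total,
        # Sufficient when resources are at least as numerous as data blocks.
        "sufficient": first_total >= second_total,
    }


insufficient = build_resource_info(2, 1, 2, 2)  # 3 resources, 4 blocks
sufficient = build_resource_info(2, 2, 2, 2)    # 4 resources, 4 blocks
```

With 3 resources and 4 blocks the information indicates insufficient resources; with 4 and 4 it indicates sufficient resources, matching the "greater than or equal to" condition.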
12. A processing device for a multiparty task, wherein the multiparty task is executed by a first participant and a second participant, and the multiparty task comprises a plurality of data processing subtasks; the device comprises:
a processing unit configured to analyze the multiparty task to be executed to obtain a matching relationship between the data blocks deployed in the first participant and the data blocks deployed in the second participant; one data block group having a matching relationship includes task data required for executing at least one data processing subtask in the multiparty task;
an acquisition unit configured to acquire resource information of data processing resources deployed in the first participant and the second participant;
the processing unit is further configured to set an execution plan for the plurality of data processing subtasks based on the matching relationship and the resource information; the execution plan is used for indicating the task execution order of the plurality of data processing subtasks;
the processing unit is further configured to schedule, according to the task execution order indicated by the execution plan, the first participant and the second participant to perform data processing on each data block group based on deployed data processing resources, so as to jointly execute the plurality of data processing subtasks.
13. A computer device, the computer device comprising:
a processor adapted to implement a computer program;
a computer readable storage medium storing a computer program adapted to be loaded by the processor and to perform the method of processing a multi-party task as claimed in any one of claims 1-11.
14. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded by a processor and to perform the method of processing a multi-party task as claimed in any one of claims 1-11.
CN202311018571.3A 2023-08-14 2023-08-14 Multi-party task processing method and device, computer equipment and storage medium Active CN116737348B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311018571.3A CN116737348B (en) 2023-08-14 2023-08-14 Multi-party task processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311018571.3A CN116737348B (en) 2023-08-14 2023-08-14 Multi-party task processing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116737348A CN116737348A (en) 2023-09-12
CN116737348B CN116737348B (en) 2024-01-02

Family

ID=87911764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311018571.3A Active CN116737348B (en) 2023-08-14 2023-08-14 Multi-party task processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116737348B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113961318A (en) * 2020-07-20 2022-01-21 百度在线网络技术(北京)有限公司 Distributed scheduling method, device, equipment and storage medium
CN114675964A (en) * 2022-03-08 2022-06-28 杭州博盾习言科技有限公司 Distributed scheduling method, system and medium based on Federal decision tree model training
WO2022252995A1 (en) * 2021-06-02 2022-12-08 支付宝(杭州)信息技术有限公司 Smart contract deployment method and apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537508B (en) * 2021-06-18 2024-02-02 百度在线网络技术(北京)有限公司 Processing method and device for federal calculation, electronic equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113961318A (en) * 2020-07-20 2022-01-21 百度在线网络技术(北京)有限公司 Distributed scheduling method, device, equipment and storage medium
WO2022252995A1 (en) * 2021-06-02 2022-12-08 支付宝(杭州)信息技术有限公司 Smart contract deployment method and apparatus
CN114675964A (en) * 2022-03-08 2022-06-28 杭州博盾习言科技有限公司 Distributed scheduling method, system and medium based on Federal decision tree model training

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
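Federated learning and its application in the telecommunications industry; Li Jian (李鉴) et al.; Information and Communications Technology and Policy (信息通信技术与政策), No. 09; full text *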

Also Published As

Publication number Publication date
CN116737348A (en) 2023-09-12

Similar Documents

Publication Publication Date Title
US20200364608A1 (en) Communicating in a federated learning environment
CN113641457B (en) Container creation method, device, apparatus, medium, and program product
US8434085B2 (en) Scalable scheduling of tasks in heterogeneous systems
CN104317749B (en) Information write-in method and device
CN113900598B (en) Data storage method, device, equipment and storage medium based on block chain
CN112035238A (en) Task scheduling processing method and device, cluster system and readable storage medium
CN108280150A (en) A kind of distribution asynchronous service distribution method and system
CN106503091A (en) A kind of implementation method of changeable data structure automatic synchronization coupling
CN114327844A (en) Memory allocation method, related device and computer readable storage medium
CN110308984A (en) It is a kind of for handle geographically distributed data across cluster computing system
CN111860853A (en) Online prediction system, online prediction equipment, online prediction method and electronic equipment
CN112668880A (en) Work order scheduling method and device for distribution network grid, computer equipment and storage medium
CN113946431A (en) Resource scheduling method, system, medium and computing device
Wang et al. Solving coupling security problem for sustainable sensor-cloud systems based on fog computing
CN113946389A (en) Federal learning process execution optimization method, device, storage medium, and program product
CN116737348B (en) Multi-party task processing method and device, computer equipment and storage medium
CN116010051A (en) Federal learning multitasking scheduling method and device
CN111049900B (en) Internet of things flow calculation scheduling method and device and electronic equipment
CN113821313A (en) Task scheduling method and device and electronic equipment
CN102945154B (en) Resource conflict digestion method in workflow execution
Prasad et al. Performance Analysis of Schedulers to Handle Multi Jobs in Hadoop Cluster.
CN109104497A (en) A kind of method for processing business and device based on cloud platform
CN110427763B (en) Consensus method of distributed system based on predefined execution codes
CN115361382B (en) Data processing method, device, equipment and storage medium based on data group
CN113592089B (en) Gradient synchronization method for compressed sensing in distributed deep learning training scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40092349; Country of ref document: HK)
GR01 Patent grant