WO2023231781A1

WO2023231781A1 - Distributed collaborative AI task evaluation method, management apparatus, control apparatus and system

Info

Publication number: WO2023231781A1
Application number: PCT/CN2023/094843
Authority: WO
Inventors: 郑子木; 杨锦; 罗思奇; 齐飞
Original assignee: Huawei Cloud Computing Technologies Co., Ltd. (华为云计算技术有限公司)
Priority date: 2022-06-02
Filing date: 2023-05-17
Publication date: 2023-12-07
Also published as: CN117215884A

Abstract

Provided in the embodiments of the present application are a distributed collaborative artificial intelligence (AI) task evaluation method, a management apparatus, a control apparatus and a system. The method comprises: acquiring a distributed collaborative AI task configuration and a distributed collaborative AI task object, the task configuration comprising the configuration of a distributed collaborative AI task environment; receiving, according to the configuration of the task environment and the task object, a first management instruction, the first management instruction comprising a management instruction for the task environment and/or a management instruction for the task object; and managing a distributed collaborative AI task use case according to the first management instruction, the task use case comprising the task environment and the task object. By flexibly configuring AI task use cases, the execution of different AI task use cases can be observed, so that an edge-cloud collaborative distributed AI task architecture can be easily implemented and deployed.

Description

Distributed collaborative AI task evaluation method, management device, control device and system

This application claims priority to Chinese patent application No. 202210623375.8, filed with the China Patent Office on June 2, 2022 and entitled "Distributed Collaborative AI Task Evaluation Method, Management Device, Control Device and System", the entire contents of which are incorporated herein by reference.

Technical field

Embodiments of the present application relate to the field of artificial intelligence, and more specifically, to a distributed collaborative AI task evaluation method, management device, control device and system.

Background

Processing distributed collaborative artificial intelligence (AI) tasks makes effective use of the strengths of different devices so that they jointly realize AI, in particular for edge-cloud collaborative distributed AI task processing.

Because the cloud side offers large-scale computing power, performing machine learning (ML) on the cloud side has become a well-established approach, and most large cloud platform providers now offer machine learning services. Machine learning is, however, only one specific form of artificial intelligence. Moreover, when models are obtained through machine learning, much of the required data cannot be obtained directly from cloud nodes; it is collected by edge devices instead.

As the performance of edge devices improves, some machine learning tasks can be migrated to edge devices; this is known as edge AI. Edge AI has a data advantage and reduces communication delay, so it suits latency-sensitive tasks. Since the cloud side has the advantage in computing power and the edge side has the advantages of data locality and low latency, completing distributed collaborative AI tasks through edge-cloud collaboration has become a development trend.

However, distributed collaborative AI processing depends heavily on the maturity of business scenarios and demands extensive cross-team collaboration. Especially for distributed edge-cloud collaborative AI tasks, making the task processing architecture easy to deploy has become an urgent problem.

Summary

Embodiments of the present application provide a distributed collaborative AI task evaluation method, management device, control device and system. By flexibly configuring task use cases of distributed collaborative AI, and in turn flexibly managing the task evaluation containers of distributed collaborative AI, the execution status of different AI task use cases can be obtained, making the distributed collaborative AI task processing architecture easy to deploy.

In a first aspect, a task evaluation method for distributed collaborative artificial intelligence (AI) is provided. The method is applied to a control node and includes: obtaining a task configuration of distributed collaborative AI and a task object of distributed collaborative AI, where the task configuration includes a configuration of the task environment of the distributed collaborative AI; receiving a first management instruction according to the configuration of the task environment and the task object, where the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object; and managing a task use case of distributed collaborative AI according to the first management instruction, where the task use case includes the task environment and the task object.

It should be understood that, in distributed collaborative AI, the different AI processes of an AI paradigm can be implemented in different containers arranged on the same device, or in different containers arranged on different devices. For example, the machine learning paradigm includes a training process and an inference process; the two processes can be implemented in different containers on one device, or in different containers on different devices.
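The placement rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Placement` record, the container and device names, and the helpers `separate_containers` and `spans_devices` are hypothetical and do not come from the application. The sketch only checks that each process of a paradigm gets its own container, whether those containers sit on one device or on several.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical record: which container on which device runs an AI process."""
    process: str    # e.g. "training" or "inference"
    container: str  # illustrative container identifier
    device: str     # illustrative device/node identifier


def separate_containers(placements: list[Placement]) -> bool:
    """True if each process of the paradigm runs in its own container."""
    slots = {(p.device, p.container) for p in placements}
    return len(placements) > 1 and len(slots) == len(placements)


def spans_devices(placements: list[Placement]) -> bool:
    """True if the containers are spread over more than one device."""
    return len({p.device for p in placements}) > 1


# Training and inference in different containers on the same device.
same_device = [
    Placement("training", "c-train", "cloud-1"),
    Placement("inference", "c-infer", "cloud-1"),
]
# Training on a cloud device, inference on an edge device.
edge_cloud = [
    Placement("training", "c-train", "cloud-1"),
    Placement("inference", "c-infer", "edge-7"),
]

print(separate_containers(same_device), spans_devices(same_device))  # True False
print(separate_containers(edge_cloud), spans_devices(edge_cloud))    # True True
```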

In the embodiments of this application, managing the task environment and the task object makes it possible to flexibly manage task use cases corresponding to distributed collaborative AI paradigms. Service developers can therefore manage task use cases according to the business needs of actual scenarios. This facilitates the processing of different task use cases and helps obtain the execution status of different AI task use cases, which makes the edge-cloud collaborative distributed AI task architecture easy to deploy.

In a possible implementation, managing distributed collaborative AI task use cases includes: adding task use cases, deleting task use cases, modifying task use cases, or querying task use cases.
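The four management operations can be sketched as a small in-memory registry. The sketch makes several assumptions. The `TaskUseCase`, `ManagementInstruction`, and `TaskUseCaseManager` names and the environment keys are hypothetical. A modify instruction changes only the parts it carries (environment and/or object), which mirrors the "and/or" wording of the first management instruction.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TaskUseCase:
    """A task use case = task environment + task object."""
    name: str
    environment: dict  # hypothetical keys, e.g. {"paradigm": ...}
    task_object: str   # e.g. the algorithm or model under evaluation


@dataclass(frozen=True)
class ManagementInstruction:
    """Hypothetical shape of a first management instruction."""
    action: str                          # "add" | "delete" | "modify" | "query"
    name: str
    environment: Optional[dict] = None   # set to manage the task environment
    task_object: Optional[str] = None    # set to manage the task object


class TaskUseCaseManager:
    def __init__(self) -> None:
        self._cases: dict[str, TaskUseCase] = {}

    def handle(self, ins: ManagementInstruction) -> Optional[TaskUseCase]:
        if ins.action == "add":
            if ins.name in self._cases:
                raise ValueError(f"use case {ins.name!r} already exists")
            case = TaskUseCase(ins.name, ins.environment or {}, ins.task_object or "")
            self._cases[ins.name] = case
            return case
        if ins.action == "delete":
            return self._cases.pop(ins.name, None)
        if ins.action == "modify":
            old = self._cases[ins.name]  # raises KeyError for an unknown use case
            new = replace(
                old,
                environment=old.environment if ins.environment is None else ins.environment,
                task_object=old.task_object if ins.task_object is None else ins.task_object,
            )
            self._cases[ins.name] = new
            return new
        if ins.action == "query":
            return self._cases.get(ins.name)
        raise ValueError(f"unknown action {ins.action!r}")


mgr = TaskUseCaseManager()
mgr.handle(ManagementInstruction("add", "uc1", {"paradigm": "incremental_learning"}, "algo-A"))
# Change only the task object; the task environment is kept.
print(mgr.handle(ManagementInstruction("modify", "uc1", task_object="algo-B")))
```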

In a possible implementation, when the task evaluation mode is a multi-node task evaluation mode, the method further includes generating a first local management instruction for a first local control device and a second local management instruction for a second local control device. Both instructions are generated according to the task use case and a global container management configuration. The task configuration includes this global container management configuration, which contains the first local control device, the instruction action corresponding to the first local management instruction, the second local control device, and the instruction action corresponding to the second local management instruction. The method then sends the first local management instruction to the first local control device and the second local management instruction to the second local control device.

It should be understood that, in the multi-node task evaluation mode, a control cloud node coordinates multi-node distributed collaboration across task edge nodes and task cloud nodes.

In a possible implementation, the instruction action corresponding to the first local management instruction is an add instruction, a delete instruction, a modify instruction, or a query instruction for the first local control device; the instruction action corresponding to the second local management instruction is an add instruction, a delete instruction, a modify instruction, or a query instruction for the second local control device.
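The fan-out from one global configuration to per-device instructions can be sketched as follows. This is a minimal illustration only. It assumes the global container management configuration can be represented as a mapping {local control device: instruction action}. `LocalManagementInstruction`, `generate_local_instructions`, `send_all`, and the device names are hypothetical, and a plain callback stands in for the communication module.

```python
from dataclasses import dataclass
from typing import Callable

ALLOWED_ACTIONS = {"add", "delete", "modify", "query"}


@dataclass(frozen=True)
class LocalManagementInstruction:
    target: str    # local control device that should receive the instruction
    action: str    # instruction action taken from the global configuration
    use_case: str  # task use case the instruction refers to


def generate_local_instructions(
    use_case: str, global_config: dict[str, str]
) -> list[LocalManagementInstruction]:
    """Turn {local control device: instruction action} into per-device instructions."""
    instructions = []
    for device, action in global_config.items():
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action {action!r} for {device!r}")
        instructions.append(LocalManagementInstruction(device, action, use_case))
    return instructions


def send_all(instructions: list[LocalManagementInstruction],
             send: Callable[[LocalManagementInstruction], None]) -> None:
    """Stand-in for the communication module: deliver each instruction."""
    for ins in instructions:
        send(ins)


# Hypothetical device names for a task cloud node and a task edge node.
global_config = {"cloud-node/local-ctl": "add", "edge-node-1/local-ctl": "add"}
outbox: list[LocalManagementInstruction] = []
send_all(generate_local_instructions("uc1", global_config), outbox.append)
print(outbox)
```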

In a possible implementation, the method further includes: receiving an evaluation result of at least one task use case from the first local control device or the second local control device; and displaying the evaluation result of at least one task use case.

In the embodiments of this application, in the task evaluation of multi-node distributed collaborative AI, managing task use cases allows the corresponding task evaluation containers to be managed according to different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider can be kept on the edge node while evaluation results of different task use cases are still obtained. This helps obtain the execution status of different task use cases and thereby makes the edge-cloud collaborative distributed AI task architecture easy to deploy.
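The "data stays on the edge" property can be illustrated with a toy sketch. It rests on an assumption: the application does not specify how results are computed, and accuracy with sample-weighted aggregation is used here purely for illustration. The functions, the threshold model, and the data are all hypothetical. Each edge node evaluates the algorithm provider's model on its own samples and returns only aggregate metrics. The cloud node then combines these reports without ever seeing raw data.

```python
from typing import Callable, Sequence


def evaluate_on_edge(model: Callable[[float], int],
                     samples: Sequence[tuple[float, int]]) -> dict[str, float]:
    """Runs on the edge node. Raw samples stay inside this function;
    only aggregate metrics are returned for upload to the cloud."""
    if not samples:
        raise ValueError("no local samples")
    correct = sum(1 for x, y in samples if model(x) == y)
    return {"accuracy": correct / len(samples), "num_samples": float(len(samples))}


def aggregate_on_cloud(reports: list[dict[str, float]]) -> float:
    """Runs on the cloud node: sample-weighted accuracy over edge reports."""
    total = sum(r["num_samples"] for r in reports)
    return sum(r["accuracy"] * r["num_samples"] for r in reports) / total


def threshold_model(x: float) -> int:
    """Toy algorithm supplied by the algorithm provider."""
    return 1 if x >= 0.5 else 0


# Toy data held by the data provider on two edge nodes.
edge_a = [(0.9, 1), (0.2, 0), (0.6, 0), (0.4, 0)]
edge_b = [(0.7, 1), (0.1, 1), (0.8, 1), (0.3, 1)]
reports = [evaluate_on_edge(threshold_model, d) for d in (edge_a, edge_b)]
print(reports, aggregate_on_cloud(reports))
```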

In a possible implementation, when the task evaluation mode is a single-node task evaluation simulation mode, the simulation switch configuration parameter in the task configuration is set to start. In this case the method further includes managing a third task evaluation container according to the task use case and a local container management configuration. The task evaluation container includes a local simulation workflow. The task configuration includes the local container management configuration, which contains an AI task evaluation container and the management action corresponding to that AI task evaluation container. The management action corresponding to the third task evaluation container is adding, deleting, modifying, or querying the third task evaluation container.

It should be understood that the single-node task evaluation simulation mode can be a single-node task evaluation simulation on a cloud node, or a single-node task evaluation simulation on an edge node.
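The single-node flow can be sketched as a small container manager that is gated on the simulation switch. All names here are hypothetical: the configuration keys `simulation_switch` and `local_container_management`, the `TaskEvaluationContainer` and `LocalContainerManager` classes, and the container name `eval-3`. Containers are plain records rather than real runtime containers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskEvaluationContainer:
    name: str
    use_case: str
    workflow: str = "local-simulation"  # container runs a local simulation workflow


@dataclass
class LocalContainerManager:
    containers: dict[str, TaskEvaluationContainer] = field(default_factory=dict)

    def apply(self, task_config: dict, use_case: str) -> dict[str, Optional[TaskEvaluationContainer]]:
        """Apply the local container management configuration for one use case."""
        if task_config.get("simulation_switch") != "start":
            raise RuntimeError("single-node simulation mode is not switched on")
        result: dict[str, Optional[TaskEvaluationContainer]] = {}
        for name, action in task_config["local_container_management"].items():
            if action == "add":
                self.containers[name] = TaskEvaluationContainer(name, use_case)
            elif action == "delete":
                self.containers.pop(name, None)
            elif action == "modify":
                self.containers[name].use_case = use_case  # rebind to the given use case
            elif action != "query":
                raise ValueError(f"unknown action {action!r}")
            result[name] = self.containers.get(name)
        return result


config = {
    "simulation_switch": "start",
    "local_container_management": {"eval-3": "add"},
}
manager = LocalContainerManager()
print(manager.apply(config, "uc1"))
```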

In the embodiments of this application, the task use case management module manages task use cases (for example, creates AI task use cases) based on task configurations and task objects. The container management module manages task evaluation containers (for example, adds an AI task container) based on task use cases and simulation modes. Task evaluation simulation is then performed through the simulation workflow in the task evaluation container to obtain simulation results. This allows task paradigms in task use cases to be created according to task scenarios, and it also allows multi-node simulation to be implemented on a single node. For service developers, the distributed collaborative AI task evaluation architecture of the embodiments allows task use cases to be flexibly configured according to task scenarios; that is, the corresponding task use cases can be managed (for example, created) according to different AI paradigms. For algorithm developers, multi-node simulation runs on a single node. There is therefore no need to redeploy the distributed collaborative AI task evaluation architecture on the edge side and the cloud side each time the edge-side task environment changes. Nor is it necessary to build the AI task evaluation platform on multiple virtual nodes or multiple device nodes. Task use cases can be simulated and simulation results obtained directly, which greatly reduces labor and material costs.

In a possible implementation, when managing the distributed collaborative AI task use case means adding a new task use case, managing the third task evaluation container according to the task use case and the local container management configuration includes adding the third task evaluation container according to the task use case and the local container management configuration, where the third task evaluation container includes the local simulation workflow.

In a possible implementation, the task configuration further includes a simulation mode corresponding to the third task evaluation container, and the simulation mode is any one of the following: simulating and testing algorithm performance; simulating and testing system performance; simulating and testing both algorithm performance and system performance; or simulating and testing system unit performance.

In a possible implementation, the method further includes one of the following, depending on the simulation mode. When the simulation mode is simulating and testing algorithm performance, an algorithm pseudo-container is created in the third task evaluation container corresponding to that simulation mode. When the simulation mode is simulating and testing system performance, a system pseudo-container is created in the third task evaluation container corresponding to that simulation mode. When the simulation mode is simulating and testing both algorithm performance and system performance, a real container is created in the third task evaluation container corresponding to that simulation mode. When the simulation mode is simulating and testing system unit performance, the existing third task evaluation container corresponding to that simulation mode continues to be used.
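The mode-to-container rule maps directly onto a small dispatch function. This is a sketch only. The enum member names, their string values, and the returned container labels are hypothetical; the function just encodes the four cases stated above and says nothing about how pseudo-containers are implemented.

```python
from enum import Enum
from typing import Optional


class SimulationMode(Enum):
    # Hypothetical string values for the four modes named in the text.
    ALGORITHM = "algorithm_performance"
    SYSTEM = "system_performance"
    ALGORITHM_AND_SYSTEM = "algorithm_and_system_performance"
    SYSTEM_UNIT = "system_unit_performance"


def container_for(mode: SimulationMode, existing: Optional[str] = None) -> str:
    """Decide what to create inside the third task evaluation container."""
    if mode is SimulationMode.ALGORITHM:
        return "algorithm-pseudo-container"
    if mode is SimulationMode.SYSTEM:
        return "system-pseudo-container"
    if mode is SimulationMode.ALGORITHM_AND_SYSTEM:
        return "real-container"
    # SYSTEM_UNIT: keep using the existing third task evaluation container.
    if existing is None:
        raise ValueError("system-unit mode needs an existing container")
    return existing


print(container_for(SimulationMode("system_performance")))
```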

In a possible implementation, the method further includes receiving a simulation result of at least one task use case from the third task evaluation container, and displaying that simulation result. The third task evaluation container obtains the simulation result by simulating the task use case according to a local container call configuration and the simulation mode corresponding to the third task evaluation container. The task configuration includes the local container call configuration, which specifies the calling sequence of the algorithm modules in the task paradigm and the hyperparameters corresponding to those algorithm modules. The task use case includes the task environment, and the task environment includes the task paradigm.
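A local container call configuration (calling sequence plus per-module hyperparameters) can be sketched as a simple pipeline runner. Everything here is hypothetical: the three toy algorithm modules, the `sequence`/`hyperparameters` keys, and `run_call_config`. They stand in for whatever modules a real task paradigm defines. Each module's output feeds the next module, so changing the sequence changes the result.

```python
from typing import Any, Callable


# Hypothetical algorithm modules of a task paradigm.
def normalize(xs: list[float], scale: float) -> list[float]:
    return [x / scale for x in xs]


def clip(xs: list[float], low: float, high: float) -> list[float]:
    return [min(max(x, low), high) for x in xs]


def score(xs: list[float]) -> float:
    return sum(xs) / len(xs)


MODULES: dict[str, Callable[..., Any]] = {"normalize": normalize, "clip": clip, "score": score}

# Hypothetical local container call configuration.
call_config = {
    "sequence": ["normalize", "clip", "score"],
    "hyperparameters": {
        "normalize": {"scale": 10.0},
        "clip": {"low": 0.0, "high": 0.5},
    },
}


def run_call_config(config: dict, modules: dict[str, Callable[..., Any]], data: Any) -> Any:
    """Call each module in the configured order, feeding each output into the next."""
    out = data
    for name in config["sequence"]:
        out = modules[name](out, **config["hyperparameters"].get(name, {}))
    return out


print(run_call_config(call_config, MODULES, [2.0, 4.0, 8.0]))
```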

In the embodiments of this application, displaying the simulation results of different task use cases lets users intuitively compare the simulation results of task objects. Users thereby obtain objective evaluation results of the task use cases, which helps them select an appropriate target task object.
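Comparing task objects on a displayed metric can be sketched as a ranking plus a plain-text table. The sketch makes some assumptions. The metric names and the numbers are placeholders invented for illustration; they are not results from this application. `rank_task_objects` and `render` are hypothetical helpers.

```python
def rank_task_objects(results: dict[str, dict[str, float]], metric: str,
                      higher_is_better: bool = True) -> list[tuple[str, float]]:
    """Order task objects by one simulation metric, best first."""
    return sorted(((obj, r[metric]) for obj, r in results.items()),
                  key=lambda item: item[1], reverse=higher_is_better)


def render(ranked: list[tuple[str, float]], metric: str) -> str:
    """Plain-text table for display."""
    rows = [f"{'task object':<12} {metric}"]
    rows += [f"{obj:<12} {value:.3f}" for obj, value in ranked]
    return "\n".join(rows)


# Placeholder numbers for illustration only; not results from this application.
results = {
    "algo-A": {"accuracy": 0.81, "latency_ms": 35.0},
    "algo-B": {"accuracy": 0.86, "latency_ms": 52.0},
}
print(render(rank_task_objects(results, "accuracy"), "accuracy"))
```

Ranking by a lower-is-better metric such as latency only requires `higher_is_better=False`.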

In one possible implementation, the type of task evaluation is any one of benchmark testing, certification, or competition.

In a possible implementation, when the type of task evaluation is benchmark testing, the task environment is a test environment, the task object is a test object, the test object is any one of a test algorithm, a test model, a test system, or a test scenario, and the task paradigm is a test paradigm.
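The benchmark-testing vocabulary can be captured as a small, validated configuration builder. This is a sketch under stated assumptions. The three evaluation types, the term mapping, and the four test-object kinds come from the text. The dictionary layout, the `benchmark_use_case` helper, and the example paradigm name are hypothetical.

```python
from enum import Enum


class EvaluationType(Enum):
    BENCHMARK = "benchmark"
    CERTIFICATION = "certification"
    COMPETITION = "competition"


# Generic term -> benchmark-testing term, as listed in the text.
BENCHMARK_TERMS = {
    "task environment": "test environment",
    "task object": "test object",
    "task paradigm": "test paradigm",
}
TEST_OBJECT_KINDS = {"algorithm", "model", "system", "scenario"}


def benchmark_use_case(test_paradigm: str, object_kind: str, object_name: str) -> dict:
    """Build a benchmark-testing use case; the dict layout is hypothetical."""
    if object_kind not in TEST_OBJECT_KINDS:
        raise ValueError(f"test object must be one of {sorted(TEST_OBJECT_KINDS)}")
    return {
        "evaluation_type": EvaluationType.BENCHMARK.value,
        "test_environment": {"test_paradigm": test_paradigm},
        "test_object": {"kind": object_kind, "name": object_name},
    }


print(benchmark_use_case("incremental_learning", "model", "m1"))
```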

In a second aspect, a task evaluation method for distributed collaborative artificial intelligence (AI) is provided. The method is applied to a task cloud node and includes receiving a first local management instruction from a management device and managing a first local control device according to that instruction. The first local management instruction is determined according to a task use case of distributed collaborative AI, and the task use case is determined according to a first management instruction. The first management instruction includes a management instruction for the task environment and/or a management instruction for the task object, and the task use case includes the task environment and the task object.

In the embodiments of this application, in the task evaluation of multi-node distributed collaborative AI, managing task use cases allows the corresponding task evaluation containers to be managed according to different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider can be kept on the edge node while evaluation results of different task use cases are still obtained. This helps obtain the execution status of different task use cases and thereby makes the edge-cloud collaborative distributed AI task architecture easy to deploy.

In a possible implementation, the first management instruction is used to add, delete, modify, or query a task use case, and managing the first local control device means adding, deleting, modifying, or querying the first local control device.

In a possible implementation, managing the first local control device may mean adding a first local control device. In that case, the first local management instruction includes the task use case and a first local container management configuration corresponding to the task cloud node. The method further includes managing a first task evaluation container according to the task use case and the first local container management configuration. The first local container management configuration includes the first task evaluation container and the management instruction corresponding to the first task evaluation container.

In a possible implementation, the management instruction corresponding to the first task evaluation container is one of the following: an add, delete, modify, or query instruction for the first task evaluation container, or an instruction that enables communication between the first task evaluation container and an external task evaluation container. The external task evaluation container belongs to a node other than the task cloud node.
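The instruction set of the first local control device, including the "communicate with an external container" case, can be sketched as follows. `EvalContainer`, `FirstLocalControlDevice`, the `connect` action name, and the node and container names are hypothetical. Communication is modeled only as recording a peer (node, container) pair, and a peer must sit on a node other than the task cloud node.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvalContainer:
    name: str
    node: str
    workflow: str = "first-distributed-workflow"
    peers: set = field(default_factory=set)  # (node, container) pairs it may talk to


class FirstLocalControlDevice:
    """Hypothetical local control device on a task cloud node."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.containers: dict[str, EvalContainer] = {}

    def handle(self, action: str, name: str, workflow: Optional[str] = None,
               peer: Optional[tuple[str, str]] = None) -> Optional[EvalContainer]:
        if action == "add":
            self.containers[name] = EvalContainer(name, self.node)
        elif action == "delete":
            self.containers.pop(name, None)
        elif action == "modify":
            if workflow is not None:
                self.containers[name].workflow = workflow
        elif action == "connect":
            # The external container must belong to a node other than this one.
            if peer is None or peer[0] == self.node:
                raise ValueError("peer must be a container on another node")
            self.containers[name].peers.add(peer)
        elif action != "query":
            raise ValueError(f"unknown action {action!r}")
        return self.containers.get(name)


cloud = FirstLocalControlDevice("task-cloud-node")
cloud.handle("add", "eval-1")
cloud.handle("connect", "eval-1", peer=("task-edge-node-1", "eval-2"))
print(cloud.handle("query", "eval-1"))
```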

In a possible implementation, if the first task evaluation container is to be newly added, a first task evaluation container corresponding to the task cloud node is added, where the first AI task evaluation container corresponding to the task cloud node includes a first distributed workflow.

In a possible implementation, the method further includes the following steps when the evaluation result of the task use case is located in the first task evaluation container: receiving the evaluation result of at least one task use case from the first AI task evaluation container; displaying that evaluation result; and sending it to the management module.
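The receive, display, and forward sequence can be sketched as a small relay object. Several assumptions apply. `ResultRelay`, the result dictionary keys, and the value 0.75 are hypothetical. A callback stands in for the link to the management module, and "display" is reduced to returning a text rendering.

```python
from typing import Callable


class ResultRelay:
    """Hypothetical relay on a local control device: collects evaluation results
    from an AI task evaluation container, renders them, and forwards them."""

    def __init__(self, forward: Callable[[list[dict]], None]) -> None:
        self._forward = forward
        self._pending: list[dict] = []

    def on_result(self, result: dict) -> None:
        """Called when the evaluation container reports a result."""
        self._pending.append(result)

    def flush(self) -> str:
        """Render pending results for local display, then send them upstream."""
        text = "\n".join(f"{r['use_case']}: {r['metric']} = {r['value']}" for r in self._pending)
        self._forward(list(self._pending))  # send a copy before clearing
        self._pending.clear()
        return text


sent: list[list[dict]] = []
relay = ResultRelay(sent.append)
relay.on_result({"use_case": "uc1", "metric": "accuracy", "value": 0.75})
print(relay.flush())
```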

In the embodiments of this application, displaying the evaluation results of different task use cases lets the user intuitively compare the evaluation results of task objects. The user thereby obtains objective evaluation results of the task use cases, which helps the user choose an appropriate target task object.

In a third aspect, a task evaluation method for distributed collaborative artificial intelligence (AI) is provided. The method is applied to a task edge node and includes receiving a second local management instruction from a management device and managing a second local control device according to that instruction. The second local management instruction is determined according to a task use case, and the task use case is determined according to a second management instruction. The second management instruction includes a management instruction for the task environment and/or a management instruction for the task object, and the task use case includes the task environment and the task object.

In a possible implementation, the second management instruction is used to add, delete, modify, or query a task use case, and managing the second local control device means adding, deleting, modifying, or querying the second local control device.

In a possible implementation, managing the second local control device may mean adding a second local control device. In that case, the second local management instruction includes the task use case and a second local container management configuration corresponding to the task edge node. The method further includes managing a second task evaluation container according to the task use case and the second local container management configuration. The second local container management configuration includes the second task evaluation container and the management instruction corresponding to the second task evaluation container.

In a possible implementation, the management instruction corresponding to the second task evaluation container is one of the following: an add, delete, modify, or query instruction for the second task evaluation container, or an instruction that enables communication between the second task evaluation container and an external task evaluation container. The external task evaluation container belongs to a node other than the task edge node.

In a possible implementation, if the second task evaluation container is to be newly added, a second task evaluation container corresponding to the task edge node is added, where the second AI task evaluation container corresponding to the task edge node includes a second distributed workflow.

In a possible implementation, the method further includes the following steps when the evaluation result of the task use case is located in the second task evaluation container: receiving the evaluation result of at least one task use case from the second AI task evaluation container; displaying that evaluation result; and sending it to the management device.

In a fourth aspect, a management device for task evaluation of distributed collaborative artificial intelligence is provided. The device includes a task use case management module and a communication module. The communication module is configured to obtain a task configuration of distributed collaborative AI and a task object of distributed collaborative AI, where the task configuration includes a configuration of the task environment of the distributed collaborative AI. The communication module is also configured to receive a first management instruction according to the configuration of the task environment and the task object, where the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object. The task use case management module is configured to manage a task use case of distributed collaborative AI according to the first management instruction, where the task use case includes the task environment and the task object.

It should be understood that distributed collaborative AI means that different AI processes corresponding to the AI paradigm can be implemented in different containers arranged on the same device, or different AI processes can be implemented in different containers arranged on different devices. For example, the machine learning paradigm includes a training process and an inference process, where the training process and the inference process can be implemented in different containers arranged on the same device, or the training process and the inference process can be implemented in different containers on different devices.

In a possible implementation, when the task evaluation mode is a multi-node task evaluation mode, the management device is arranged at a control cloud node and further includes a container management module. The container management module is configured to generate a first local management instruction for a first local control device and a second local management instruction for a second local control device, according to the task use case and a global container management configuration. The task configuration includes the global container management configuration, which contains the first local control device, the instruction action corresponding to the first local management instruction, the second local control device, and the instruction action corresponding to the second local management instruction. The communication module is configured to send the first local management instruction to the first local control device and the second local management instruction to the second local control device.

In a possible implementation, the device further includes a second result display module. The communication module is configured to receive the execution status of at least one task use case from the first local control device or the second local control device, and the second result display module is configured to display that execution status.

In a possible implementation, when the task evaluation mode is a single-node task evaluation simulation mode, the simulation switch configuration parameter in the task configuration is set to start. The management device is then arranged on a single node and further includes a container management module. The container management module is configured to manage a third task evaluation container according to the task use case and a local container management configuration. The task evaluation container includes a local simulation workflow. The task configuration includes the local container management configuration, which contains an AI task evaluation container and the management action corresponding to that AI task evaluation container. The management action corresponding to the third task evaluation container is adding, deleting, modifying, or querying the third task evaluation container.

It should be understood that the single-node task evaluation simulation mode can perform single-node task evaluation simulation on a cloud node or on an edge node; that is, the management device can be arranged on a cloud node or on an edge node.

In the embodiments of this application, the task use case management module manages task use cases (for example, creates AI task use cases) based on task configurations and task objects. The container management module manages task evaluation containers (for example, adds an AI task container) based on task use cases and simulation modes. Task evaluation simulation is then performed through the simulation workflow in the task evaluation container to obtain simulation results. This allows task paradigms in task use cases to be created according to task scenarios, and it also allows multi-node simulation to be implemented on a single node. For service developers, the distributed collaborative AI task evaluation architecture of the embodiments allows task use cases to be flexibly configured according to task scenarios; that is, the corresponding task use cases can be managed (for example, created) according to different AI paradigms. For algorithm developers, multi-node simulation runs on a single node. There is therefore no need to redeploy the distributed collaborative AI task evaluation architecture on the edge side and the cloud side each time the edge-side task environment changes. Nor is it necessary to build the AI task evaluation platform on multiple virtual nodes or multiple device nodes. Task use cases can be simulated and simulation results obtained directly, which greatly reduces labor and material costs.

In a possible implementation, when managing the distributed collaborative AI task use case means adding a new task use case, the container management module is configured to add a third task evaluation container according to the task use case and the local container management configuration, where the third task evaluation container includes the local simulation workflow.

In a possible implementation, the task configuration further includes a simulation mode corresponding to the third task evaluation container, and the simulation mode is any one of the following: simulating and testing algorithm performance; simulating and testing system performance; simulating and testing both algorithm performance and system performance; or simulating and testing system unit performance.

In a possible implementation, the container management module is further configured to act according to the simulation mode, in the third task evaluation container corresponding to that mode. When the simulation mode is simulating and testing algorithm performance, it creates an algorithm pseudo-container. When the simulation mode is simulating and testing system performance, it creates a system pseudo-container. When the simulation mode is simulating and testing both algorithm performance and system performance, it creates a real container. When the simulation mode is simulating and testing system unit performance, it continues using the existing third task evaluation container corresponding to that simulation mode.

In a possible implementation, the management device further includes a result display module. The communication module is configured to receive a simulation result of at least one task use case from the third task evaluation container. The third task evaluation container obtains the simulation result by simulating the task use case according to a local container call configuration and the simulation mode corresponding to the third task evaluation container. The task configuration includes the local container call configuration, which specifies the calling sequence of the algorithm modules in the task paradigm and the hyperparameters corresponding to those algorithm modules. The task use case includes the task environment, which includes the task paradigm. The result display module is configured to display the simulation result of the at least one task use case.

In a possible implementation, when the type of task evaluation is benchmark testing, the task environment is a test environment, the task object is a test object, and the task paradigm is a test paradigm, where the test object is any one of a test algorithm, a test model, a test system, or a test scenario.

In a fifth aspect, a first local control device for distributed collaborative artificial intelligence (AI) task evaluation is provided. The first local control device is arranged at a task cloud node and includes a first communication module and a first container management module. The first communication module is configured to receive a first local management instruction from the management device. The first local management instruction is determined according to a distributed collaborative AI task use case, and the task use case is determined according to a first management instruction, where the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object, and the task use case includes the task environment and the task object. The first container management module is configured to manage the first local control device according to the first local management instruction.

In a possible implementation, when managing the first local control device means adding a first local control device, the first local management instruction includes the task use case and a first local container management configuration corresponding to the task cloud node. The first container management module is further configured to manage the first task evaluation container according to the task use case and the first local container management configuration, where the first local container management configuration includes the first task evaluation container and the management instruction corresponding to the first task evaluation container.

In a possible implementation, if the first task evaluation container is to be newly created, the first container management module is further configured to add a first task evaluation container corresponding to the task cloud node, where the first AI task evaluation container corresponding to the task cloud node includes a first distributed workflow.

In a possible implementation, the first local control device further includes a first result display module. When the evaluation result of the task use case is located in the first task evaluation container, the first communication module is further configured to receive, from the first AI task evaluation container, the evaluation result of at least one task use case. The first result display module is configured to display the evaluation result of the at least one task use case and to send that evaluation result to the management module.

In a sixth aspect, a second local control device for distributed collaborative artificial intelligence (AI) task evaluation is provided. The device is arranged at a task edge node and includes a second communication module and a second container management module. The second communication module is configured to receive a second local management instruction from the management device. The second local management instruction is determined based on the task use case, and the task use case is determined based on a second management instruction, where the second management instruction includes a management instruction for the task environment and/or a management instruction for the task object, and the task use case includes the task environment and the task object. The second container management module is configured to manage the second local control device according to the second local management instruction.

In a possible implementation, when managing the second local control device means adding a second local control device, the second local management instruction includes the task use case and a second local container management configuration corresponding to the task edge node. The second container management module is further configured to manage the second task evaluation container according to the task use case and the second local container management configuration, where the second local container management configuration includes the second task evaluation container and the management instruction corresponding to the second task evaluation container.

In a possible implementation, the management instruction corresponding to the second task evaluation container is an instruction to add the second task evaluation container, an instruction to delete it, an instruction to modify it, an instruction to query it, or an instruction to implement communication between the second task evaluation container and an external task evaluation container, where the external task evaluation container belongs to a node other than the task edge node.

In a possible implementation, if the second task evaluation container is to be newly created, the second container management module is further configured to add a second task evaluation container corresponding to the task edge node, where the second AI task evaluation container corresponding to the task edge node includes a second distributed workflow.

In a possible implementation, the second local control device further includes a second result display module. When the evaluation result of the task use case is located in the second task evaluation container, the second communication module is configured to receive, from the second AI task evaluation container, the evaluation result of at least one task use case. The second result display module is configured to display the evaluation result of the at least one task use case and to send that evaluation result to the management device.

In a seventh aspect, a distributed collaborative artificial intelligence (AI) task evaluation system is provided. The system includes the management device in any possible implementation of the fourth aspect, together with the first local control device in any possible implementation of the fifth aspect or the second local control device in any possible implementation of the sixth aspect.

In an eighth aspect, a computer device is provided. The device includes a memory and a processor. The memory is configured to store a program, and the processor is configured to execute the program stored in the memory. When the program is executed, the processor performs the method in the first aspect or any implementation of the first aspect, the method in the second aspect or any implementation of the second aspect, or the method in the third aspect or any implementation of the third aspect.

The processor in the eighth aspect may be a central processing unit (CPU), or a combination of a CPU and a neural network computing processor. The neural network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an application-specific integrated circuit AI accelerator custom-designed by Google for machine learning.

In a ninth aspect, an embodiment of the present application provides a computer program product. The computer program product includes computer program code that, when run on a computer, causes the computer to perform the method in any possible implementation of the first aspect, the method in any possible implementation of the second aspect, or the method in any possible implementation of the third aspect.

In a tenth aspect, an embodiment of the present application provides a computer-readable medium. The computer-readable medium stores program code that, when run on a computer, causes the computer to perform the method in any possible implementation of the first aspect, the method in any possible implementation of the second aspect, or the method in any possible implementation of the third aspect.

In an eleventh aspect, a chip is provided. The chip includes a processor and a data interface. The processor reads, through the data interface, instructions stored in a memory and performs the method in any implementation of the first aspect or the second aspect.

Optionally, in an implementation, the chip may further include a memory storing instructions, and the processor is configured to execute the instructions stored in the memory. When the instructions are executed, the processor performs the method in any implementation of the first aspect or the second aspect.

The above-mentioned chip can specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).

Description of the drawings

Figure 1 is a schematic diagram of an artificial intelligence main body framework provided by an embodiment of the present application;

Figure 2A is a schematic diagram of the architecture of a traditional edge-cloud collaborative task method provided by an embodiment of the present application;

Figure 2B is a schematic diagram of the architecture of another traditional edge-cloud collaborative task method provided by an embodiment of the present application;

Figure 3A is a schematic diagram of a distributed collaborative AI task evaluation architecture provided by an embodiment of the present application;

Figure 3B is a schematic diagram of another distributed collaborative AI task evaluation architecture provided by an embodiment of the present application;

Figure 3C is a schematic diagram of another distributed collaborative AI task architecture provided by an embodiment of the present application;

Figure 4 is a flowchart of a distributed collaborative AI task evaluation method provided by an embodiment of the present application;

Figure 5 is a schematic flow chart of a distributed collaborative AI benchmark test simulation method provided by an embodiment of the present application;

Figure 6 is a schematic flowchart of a benchmark test process for distributed edge-cloud collaborative AI provided by an embodiment of the present application;

Figure 7 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation system provided by an embodiment of the present application;

Figure 8 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation and management device provided by an embodiment of the present application.

Detailed description of embodiments

The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.

Figure 1 is a schematic diagram of an artificial intelligence main framework provided by an embodiment of the present application. The main framework describes the overall workflow of an artificial intelligence system and applies to general requirements in the field of artificial intelligence.

The artificial intelligence main framework is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "information technology (IT) value chain" (vertical axis).

The "intelligent information chain" reflects the series of processes from data acquisition to processing. For example, it may be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. Along this process, the data is condensed from "data-information-knowledge-wisdom".

The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, spanning from the underlying infrastructure and information of artificial intelligence (the provision and processing of technical implementations) up to the industrial ecosystem of the system.

(1) Infrastructure:

Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.

The infrastructure can communicate with the outside through sensors, and the computing power of the infrastructure can be provided by smart chips.

The smart chip may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another hardware acceleration chip.

The basic platform of the infrastructure may include related platform assurance and support, such as a distributed computing framework and networking, and may include cloud storage and computing, interconnection networks, and the like.

For example, for infrastructure, data can be obtained through sensors and external communication, and then the data can be provided to smart chips in the distributed computing system provided by the basic platform for calculation.

(2) Data:

Data, at the layer above the infrastructure, represents the data sources in the field of artificial intelligence. This data involves graphics, images, speech, text, and sequences, as well as Internet of Things (IoT) data from traditional devices, including business data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.

It should be understood that a sequence can be understood as a data sequence. The most common kind is time-series data, that is, data ordered in time, such as weather forecast data (temperature, wind direction, and the like) over a period of time, stock market data, or physiological data such as changes in a person's blood glucose.

(3) Data processing:

The above data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other processing methods.

Among them, machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.

Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.

Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.

(4) General capabilities:

After the data has been processed as described above, some general capabilities can be formed from the results of further processing, for example algorithms or a general-purpose system providing translation, text analysis, computer vision processing, speech recognition, image recognition, and the like.

(5) Intelligent products and industry applications:

Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, turning intelligent information decision-making into products and putting it into practical use. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, smart healthcare, smart security, autonomous driving, safe city, smart terminals, and the like.

Embodiments of the present application can be applied to many fields in artificial intelligence, such as smart manufacturing, smart transportation, smart home, smart medical care, smart security, autonomous driving, safe cities and other fields.

In order to facilitate understanding, the following first introduces the relevant terms, concepts or technologies that may be involved in the embodiments of this application:

1) Edge devices

An edge device can be understood as any device, other than a cloud-side device, that has computing resources and network resources. An edge device may be a client device or a device located between the cloud server and the client device. For example, a mobile phone can be an edge device, a sensor can be an edge device, and a gateway between a smart home terminal and the cloud server can be an edge device. Ideally, edge devices analyze or process data close to where the data is generated. Because the data does not need to be streamed elsewhere, network traffic and processing latency are reduced.

The edge device in the embodiments of the present application may be a mobile phone, a tablet personal computer (TPC), a media player, a smart home device, a laptop computer (LC), a personal digital assistant (PDA) with computing capability, a personal computer (PC), a camera, a camcorder, a smart watch, a wearable device (WD), a self-driving vehicle, or the like. It can be understood that the embodiments of the present application do not limit the specific form of the edge device.

2) Edge AI

Edge AI originates from edge computing. Edge computing uses edge devices to process data from data generation sources, which helps reduce the processing load of the overall cloud-edge collaboration system and reduces data latency. Edge AI runs AI algorithms locally on edge devices, processing and analyzing data from the data generation sources without streaming the data or storing it on the cloud side.

3) AI paradigm

An AI paradigm is an AI process recognized by industry or academia. Within an AI paradigm, the framework of the AI processing flow stays fixed, while the specific algorithms within that framework can be replaced. Take the edge-cloud collaborative AI paradigm as an example: when training and running inference with an image classification model, if the model is trained on the cloud side and the edge side performs image classification inference with the trained model, then "cloud-side training, edge-side inference" is an edge-cloud collaborative AI paradigm. In this paradigm, the specific training or inference methods are replaceable, but the "cloud-side training, edge-side inference" process remains unchanged.
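The idea of a fixed process with replaceable algorithms can be sketched as a function whose two steps are fixed while the training method is passed in. This is a toy illustration under assumed names; the threshold "models" stand in for real training methods and are not part of the application.

```python
import statistics
from typing import Callable, List, Sequence

# A toy "model" maps one numeric input to a binary label.
Model = Callable[[float], bool]


def cloud_train_edge_infer(
    train: Callable[[Sequence[float]], Model],
    cloud_data: Sequence[float],
    edge_inputs: Sequence[float],
) -> List[bool]:
    # Fixed step 1: the cloud side trains a model with the chosen training method.
    model = train(cloud_data)
    # Fixed step 2: the trained model is sent to the edge side, which runs inference.
    return [model(x) for x in edge_inputs]


# Two interchangeable (hypothetical) training methods for the same paradigm.
def mean_threshold(data: Sequence[float]) -> Model:
    t = statistics.mean(data)
    return lambda x: x >= t


def median_threshold(data: Sequence[float]) -> Model:
    t = statistics.median(data)
    return lambda x: x >= t
```

Swapping `mean_threshold` for `median_threshold` changes the algorithm but not the paradigm: the cloud still trains and the edge still infers.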

The "cloud side" may also be called the cloud, and the "edge side" may also be called the edge device side; other names may also be used, which is not limited in the embodiments of this application.

4) Test scenario

A test scenario is a business scenario that matches a specific application of edge devices. A test scenario can be represented by a business scenario description, data set settings, data feature settings, data label settings, and related metrics and standards. For example, the test scenario of a test paradigm may be a vehicle re-identification application scenario; the related metric may be the mean average precision (mAP) of vehicle re-identification, and the related standard may be the qualifying threshold of that metric. For example, in the vehicle re-identification scenario, the qualifying standard may be an mAP greater than or equal to 0.95.
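To make the metric and standard concrete, the sketch below computes mAP for a retrieval task such as re-identification and checks it against the 0.95 qualifying standard. It assumes each query is given as its ranked gallery results marked correct or incorrect, and that every correct match appears in the ranked list; real benchmarks may normalize average precision differently.

```python
from typing import List, Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """AP for one query: mean of precision@k over the ranks k that hold a correct match."""
    hits = 0
    precisions = []
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(queries: List[Sequence[bool]]) -> float:
    """mAP: the mean of per-query average precision."""
    return sum(average_precision(q) for q in queries) / len(queries)


def meets_standard(map_value: float, threshold: float = 0.95) -> bool:
    """Qualifying standard from the example scenario: mAP >= 0.95."""
    return map_value >= threshold
```

For instance, a query whose two correct matches are ranked first and second has AP 1.0, while a query whose only correct match is ranked second has AP 0.5, giving an mAP of 0.75 that fails the standard.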

5) Test object

The test object is the target instance under test, which may be an algorithm, a model, a system, a data set, a scenario, or the like. For example, in the vehicle re-identification test scenario there may be a series of vehicle re-identification algorithms. The test object can then be this series of algorithms, and the algorithm that performs best in the vehicle re-identification test scenario can be selected from the series. As another example, the test object may be a series of test scenarios, and for a specific algorithm, the best test scenario is selected from this series.
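Selecting the best test object amounts to evaluating every candidate under the same conditions and keeping the top score. A minimal generic sketch follows; the candidate names and precomputed mAP values in the usage example are hypothetical.

```python
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


def select_best(
    test_objects: Dict[str, T],
    evaluate: Callable[[T], float],
) -> Tuple[str, float]:
    """Evaluate each test object with the same function and return the best one."""
    scores = {name: evaluate(obj) for name, obj in test_objects.items()}
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]
```

The same function covers both directions in the text: the candidates may be algorithms scored in one fixed scenario, or scenarios scored with one fixed algorithm, e.g. `select_best({"reid_a": {"map": 0.91}, "reid_b": {"map": 0.96}}, lambda a: a["map"])`.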

6) Test environment

The test environment is the configuration or constraints required for edge-cloud collaborative distributed AI testing. The configuration may include the resource configuration and the AI algorithm configuration required for testing. For example, the resource configuration may include the number of CPU cores and the transmission bandwidth; the AI algorithm configuration may include the business data set, the algorithm accuracy evaluation metric, the AI algorithm test paradigm, and the like.

The test paradigm of an AI algorithm, which may also be called the AI paradigm, is illustrated here using incremental learning as the AI algorithm. The incremental learning test paradigm includes a training process and an inference process. The training process includes an initial training module and a model update module, and the inference process includes a hard example identification module.
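The three modules of the incremental learning paradigm can be sketched as replaceable slots in one fixed flow. The sketch below assumes, for illustration only, that hard examples are inputs predicted with confidence below a threshold and that the model update consumes the collected hard examples; the application does not specify these criteria.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class IncrementalLearningParadigm:
    # Replaceable module slots (hypothetical signatures).
    initial_train: Callable[[List[Any]], Any]
    update_model: Callable[[Any, List[Any]], Any]
    predict: Callable[[Any, Any], Tuple[Any, float]]  # returns (label, confidence)
    hard_threshold: float = 0.6
    model: Any = None
    hard_examples: List[Any] = field(default_factory=list)

    def train(self, data: List[Any]) -> None:
        # Training process, initial training module.
        self.model = self.initial_train(data)

    def infer(self, x: Any) -> Any:
        label, confidence = self.predict(self.model, x)
        # Inference process, hard example identification module (assumed: low confidence).
        if confidence < self.hard_threshold:
            self.hard_examples.append(x)
        return label

    def update(self) -> None:
        # Training process, model update module: retrain on collected hard examples.
        if self.hard_examples:
            self.model = self.update_model(self.model, self.hard_examples)
            self.hard_examples = []
```

Because each module is just a callable field, a benchmark can swap in a different hard example criterion or update strategy while the paradigm's train, infer, update flow stays the same.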

7) Test cases

A test case includes a test object and a test environment. It is an execution instance that runs under the constraints of the test environment to verify whether the test object meets specific performance requirements.

8) Benchmark

The unit of a benchmark test is the test case, and a benchmark test includes a series of test cases. Benchmark testing is a method, recognized by academia or industry, for evaluating edge-cloud collaborative distributed AI systems.
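The relationship among test environment, test case, and benchmark can be sketched with a few data classes. The field names are assumptions drawn from the examples in the text (CPU cores, bandwidth, data set, metric, paradigm), and the test object is modeled as a callable that returns its metric value in a given environment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TestEnvironment:
    # Resource configuration.
    cpu_cores: int
    bandwidth_mbps: float
    # AI algorithm configuration.
    dataset: str
    metric: str
    paradigm: str


@dataclass
class TestCase:
    name: str
    # Hypothetical: the test object returns its metric value in the environment.
    test_object: Callable[[TestEnvironment], float]
    environment: TestEnvironment
    requirement: float  # minimum acceptable metric value

    def run(self) -> bool:
        """Execute the test object under the environment and check the requirement."""
        return self.test_object(self.environment) >= self.requirement


def run_benchmark(cases: List[TestCase]) -> Dict[str, bool]:
    """A benchmark is a series of test cases; report pass/fail per case."""
    return {case.name: case.run() for case in cases}
```

Keeping the environment separate from the test object lets the same object be benchmarked under several environments, or several objects under one environment, by building different lists of test cases.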

9) Container technology

Container technology is an operating-system-level virtualization technology that isolates processes through operating system isolation mechanisms, such as control groups and namespaces on Linux. Unlike hardware virtualization, container technology does not virtualize hardware, and a container contains no operating system of its own, only processes. This is why containers are lighter and easier to manage than virtual machines. A set of common management operations is defined for running containers, such as start, stop, pause, and delete, so that the container life cycle can be managed uniformly. Containers are started on demand: after a created container completes its task, it can be deleted and then re-created the next time it is needed.
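The uniform life cycle management described above can be pictured as a small state machine over the common operations. This is a conceptual sketch, not the API of any real container runtime; the state names, the extra `resume` operation, and the transition rules are assumptions chosen for illustration.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"
    DELETED = "deleted"


# Assumed transitions: operation -> (states it may be applied in, resulting state).
TRANSITIONS = {
    "start": ({State.CREATED, State.STOPPED}, State.RUNNING),
    "pause": ({State.RUNNING}, State.PAUSED),
    "resume": ({State.PAUSED}, State.RUNNING),
    "stop": ({State.RUNNING, State.PAUSED}, State.STOPPED),
    "delete": ({State.CREATED, State.STOPPED}, State.DELETED),
}


class Container:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.CREATED

    def apply(self, operation: str) -> State:
        """Apply one management operation, rejecting transitions the table forbids."""
        allowed, target = TRANSITIONS[operation]
        if self.state not in allowed:
            raise ValueError(f"cannot {operation} container in state {self.state.value}")
        self.state = target
        return self.state
```

A typical on-demand cycle is start, run the task, stop, then delete; in this sketch a running container must be stopped before it can be deleted.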

Figure 2A is a schematic diagram of the architecture of a traditional edge-cloud collaborative task method provided by an embodiment of the present application. Figure 2B is a schematic diagram of the architecture of another traditional edge-cloud collaborative task method provided by an embodiment of the present application. Figure 2A and Figure 2B are each described in detail below.

Currently, there are many edge-cloud collaborative distributed AI task evaluation methods, especially benchmark tests. In the benchmark test task shown in Figure 2A, algorithm developers deploy a test paradigm of cloud-side training and edge-side inference on the cloud side and the edge side, respectively. The cloud nodes and edge nodes may be specific physical nodes. In edge-cloud distributed collaboration, the task of the cloud node is training, and the cloud node sends the trained model to the edge node; the task of the edge node is inference, and the edge node completes specific inference tasks with the trained model. As an edge device near the data generation source, the edge node can also send newly collected data to the cloud node to further train the model.

In the benchmark test task shown in Figure 2B, such as a test framework for federated learning, algorithm developers deploy a test paradigm of edge-side training and cloud-side aggregation followed by inference on the edge side and the cloud side, respectively. That is, the distributed collaboration task of the edge nodes is training: data generated by the data sources is used to train models on the edge nodes, and each edge node then sends its trained model to the cloud node. The cloud node aggregates the models obtained from the different edge nodes into an overall model, and completes specific inference tasks with that overall model.
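The text does not specify how the cloud node aggregates edge models. One widely used choice in federated learning is federated averaging (FedAvg), which weights each edge model's parameters by the number of local training samples. The sketch below assumes that scheme and represents each model as a flat list of floats.

```python
from typing import List, Tuple


def federated_average(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Aggregate edge models as a sample-weighted average of their parameters.

    Each update is (parameters, number of local training samples).
    All parameter lists are assumed to have the same length.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]
```

For example, an edge model `[1.0, 2.0]` trained on 1 sample and another `[3.0, 4.0]` trained on 3 samples aggregate to `[2.5, 3.5]`, so the edge node with more data has more influence on the overall model.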

Both the edge-cloud collaborative distributed AI benchmark architecture shown in Figure 2A and the one shown in Figure 2B have the following problems.

First, once the test paradigm is determined and the service developer has deployed it, algorithm developers can only test test cases under that paradigm. For example, once the cloud-side training and edge-side inference test paradigm in Figure 2A is determined, it cannot easily be changed; changing it requires redeploying a new test paradigm. In particular, the test paradigm under the federated learning test framework in Figure 2B applies only when the data provider and the algorithm provider are the same party. If they are different parties, the algorithm provider cannot obtain the training data and the data provider cannot obtain the algorithm, so the test paradigm in Figure 2B no longer applies. A single party acting as both algorithm provider and data provider is often an idealized case; in reality, the two are usually different.

Second, algorithm developers need to repeatedly deploy test cases in the test paradigm and deploy the related images according to the actual edge-side environment. If the test object is an algorithm, a single node is enough to test it; but if the test object is a system, a multi-node or multi-device architecture must be built to test the test case correctly, for example by creating multiple nodes on the cloud side or building an architecture of multiple Raspberry Pi devices on the edge side. This kind of testing currently incurs high labor and material costs.

In addition, current test scenarios are limited and support only a few tasks in medical or traffic classification scenarios, such as image classification, object detection, and speech recognition. As a result, the test cases are also limited: the test environments and test objects in the test cases are confined to these few test scenarios. For example, typical edge-cloud collaborative distributed AI application scenarios such as industrial quality inspection and vehicle re-identification are not supported.

To solve the above problems, embodiments of the present application propose a distributed collaborative AI task evaluation method and device. The distributed collaborative AI task evaluation method is described in detail below with reference to the accompanying drawings, mainly using edge-cloud collaborative distributed AI tasks as the example. It should be understood that the distributed collaborative architecture in the embodiments of this application is not limited to an edge-cloud collaborative distributed architecture and may be another distributed architecture; this is not limited.

First, the edge-cloud collaborative distributed AI task evaluation architecture in the embodiment of the present application will be described with reference to Figures 3A to 3C.

Edge-cloud collaborative AI task evaluation covers AI tasks that involve standardized evaluation. For example, it may be an edge-cloud collaborative AI benchmark test task, an AI application certification task for edge-cloud collaborative cloud services and products, or an edge-cloud collaborative AI competition scoring task; the embodiments of this application do not limit this. The following mainly uses the edge-cloud collaborative AI benchmark test task as the example for detailed explanation.

The edge-cloud collaborative distributed AI task evaluation architecture may include a cloud side and an edge side. The cloud side includes cloud nodes, and the edge side includes edge nodes. Cloud nodes and edge nodes may be specific physical nodes: a cloud node may be a server or other device on the cloud side, and an edge node may be a device other than the specific devices on the cloud side, such as an edge device or a client device. More specifically, the architecture may also include a cloud side, an edge side, and clients. In such a device-edge-cloud architecture, the specific devices on the edge side sit between the cloud side and the clients; for example, an edge device may be a gateway between a smart home client and a cloud-side server. The embodiments of this application do not limit the specific physical forms of edge nodes and cloud nodes. A cloud node may also be a virtual cloud node.

It should be understood that Figures 3A and 3C in the embodiments of the present application use the edge-cloud architecture as an example. The distributed collaborative AI task evaluation method and device in the embodiments of the present application also apply to the device-edge-cloud architecture.

In addition, the embodiments of the present application do not limit the numbers of cloud nodes and edge nodes; the numbers shown in Figure 3A and Figure 3B are merely examples.

Figure 3A is a schematic diagram of a distributed collaborative AI task evaluation architecture provided by an embodiment of the present application.

As shown in Figure 3A, the cloud nodes on the cloud side include a control cloud node and task cloud nodes. A management module, which may also be called a management device, is arranged in the control cloud node and implements life cycle management of global containers. The management module can manage the global container life cycle in two ways. First, the management device can create local control devices and task evaluation containers on all nodes, where the local control devices arranged on other nodes are themselves containers. Second, the management device can create a first local control device and a second local control device, where the first local control device is arranged at the task cloud node and the second local control device is arranged at the task edge node; each of the two local control devices then manages the task containers of its own node. The management device includes a container management module, which implements functions such as adding, deleting, modifying, and querying global containers. The management device also includes a communication module.

The management device further includes a task use case management module, which may also be called a global task use case management module and provides basic functions such as adding, deleting, modifying, and querying task use cases. For example, when the AI task evaluation is an AI benchmark test, the global task use case management module may be a global test case management module, which provides basic functions such as adding, deleting, modifying, and querying test environments and/or test objects.
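The two management functions described above, global container management and global task use case management, both reduce to add, delete, modify, and query operations over a registry. A minimal in-memory sketch follows; the class, method names, and the keying of containers by (node, container name) are illustrative assumptions, and a real management device would act on actual containers through its communication module.

```python
from typing import Dict, List, Optional, Tuple


class ManagementDevice:
    """Hypothetical in-memory sketch of the global management device."""

    def __init__(self) -> None:
        # Global container registry keyed by (node, container name).
        self.containers: Dict[Tuple[str, str], dict] = {}
        # Task use cases, each holding a task environment and a task object.
        self.use_cases: Dict[str, dict] = {}

    # Container management module: add / delete / modify / query.
    def add_container(self, node: str, name: str, spec: dict) -> None:
        if (node, name) in self.containers:
            raise KeyError(f"{name} already exists on {node}")
        self.containers[(node, name)] = dict(spec)

    def delete_container(self, node: str, name: str) -> None:
        del self.containers[(node, name)]

    def modify_container(self, node: str, name: str, **changes) -> None:
        self.containers[(node, name)].update(changes)

    def query_containers(self, node: Optional[str] = None) -> List[Tuple[str, str]]:
        return sorted(k for k in self.containers if node is None or k[0] == node)

    # Task use case management module: add / delete / modify.
    def add_use_case(self, name: str, environment: dict, task_object: str) -> None:
        self.use_cases[name] = {"environment": environment, "object": task_object}

    def delete_use_case(self, name: str) -> None:
        del self.use_cases[name]

    def modify_use_case(self, name: str, environment: Optional[dict] = None,
                        task_object: Optional[str] = None) -> None:
        case = self.use_cases[name]
        if environment is not None:
            case["environment"] = environment
        if task_object is not None:
            case["object"] = task_object
```

Keying containers by node makes it straightforward to query everything on one task node, which matches the second management mode, where each local control device handles only its own node's containers.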

A first local control device is arranged in the task cloud node. The first local control device implements life cycle management of the first task evaluation container and communication with external task evaluation containers. An external task evaluation container is one located on a node other than the node of the local task evaluation container, that is, on a node other than the task cloud node where the first task evaluation container resides; an example is the second task evaluation container shown in Figure 3A. The specific workflow (pipeline) in the first task evaluation container is created at the early stage of creating the first task evaluation container. For example, the first task evaluation container may include a local simulation workflow and a first distributed workflow. The first local control device also includes a first container management module and a first task case management module. The first container management module implements functions such as adding, deleting, modifying, and querying containers on the local node. The first task case management module manages task cases, for example, implementing basic functions such as adding, deleting, modifying, and querying the test objects and/or test environments in test cases. The first local control device also includes a first communication module.

The second local control device arranged on the edge-side task edge node is similar to the first local control device arranged on the task cloud node and is not described again here for brevity.

In the embodiment of the present application, if the edge-cloud collaboration task is performed through the architecture shown in Figure 3A, the management device for global management is arranged on a separate control cloud node.

In addition, the management device can also be arranged in the same cloud node as the first local control device and the first task evaluation container. Figure 3B is a schematic diagram of another distributed collaborative AI task evaluation architecture provided by an embodiment of the present application.

Figure 3B differs from Figure 3A in that the management device is not arranged separately on a control cloud node. Instead, the management device, as a high-level management module, is placed on the same cloud node as the first local control device and the first task evaluation container. The modules in Figure 3B have the same functions as those in Figure 3A and are not described again for brevity.

When task evaluation simulation of multi-node distributed collaborative AI is implemented on a single node, the simulation can be performed on an edge node alone or on a cloud node alone, thereby realizing task evaluation simulation of multi-node distributed collaborative AI on a single node, for example, benchmark test simulation of multi-node distributed collaborative AI on a single node. Figure 3C is a schematic diagram of another distributed collaborative AI task architecture provided by an embodiment of the present application.

Figure 3C shows a task evaluation simulation architecture for distributed collaborative AI that simulates multi-node task evaluation on a single node. The single-node management device implements life cycle management of all containers on the single node and includes a task case management module and a container management module. For example, in the benchmark test of distributed collaborative AI, the test case management module is used for basic functions such as adding, deleting, modifying, or querying the test environment and test objects, and the container management module is used for basic functions such as adding, deleting, modifying, or querying the third AI task evaluation container. The test container includes a local simulation workflow.

Of the three distributed task architectures above, the architectures shown in Figures 3A and 3B can be used to implement task evaluation of distributed collaborative AI on edge-cloud collaborative multiple nodes, such as a benchmark test of distributed collaborative AI, while the architecture shown in Figure 3C can be used to implement task evaluation simulation of multi-node distributed collaborative AI on a single node, such as benchmark test simulation of multi-node distributed collaborative AI on a single node.

The task evaluation method of distributed collaborative AI will be explained in detail below with reference to Figures 4 to 6.

Figure 4 is a flow chart of an evaluation method for distributed collaborative AI tasks provided by an embodiment of the present application.

S401: Obtain the task configuration of the distributed collaborative AI and the task object of the distributed collaborative AI, where the task configuration includes the configuration of the task environment of the distributed collaborative AI.

In one possible implementation, the configuration file corresponding to the task configuration and the file corresponding to the task object are read.

It should be understood that the task configuration and task objects of distributed collaborative AI are adjustable. The specific adjustment methods can be basic functions such as adding, deleting, modifying, and querying. The embodiments of this application do not limit the adjustment methods.

S402: Receive a first management instruction according to the configuration of the task environment and the task object, where the first management instruction includes a management instruction of the task environment and/or a management instruction of the task object.

It should be understood that the first management instruction may be a management instruction for the task environment, a management instruction for the task object, or a management instruction for both the task environment and the task object.

S403: Manage task use cases of the distributed collaborative AI according to the first management instruction, where a task use case includes the task environment and the task object.

As a possible implementation manner, the first management instruction may be a basic management instruction such as adding, deleting, modifying, or querying a task use case, which is not limited in the embodiments of the present application.

As described above, task evaluation of distributed collaborative AI covers AI tasks that involve standardized evaluation, for example, a benchmark test of distributed edge-cloud collaborative AI, application certification of cloud services and products for distributed edge-cloud collaborative AI, or competition rating of distributed edge-cloud collaborative AI. The following takes the benchmark test of distributed edge-cloud collaborative AI as an example and explains the task evaluation method in detail for two cases: first, multi-node benchmark test simulation on a single node; second, distributed edge-cloud collaborative benchmark testing on multiple nodes.

In this case, the task environment is the test environment, the task object is the test object, the task case is the test case, the task configuration is the test configuration, the task environment configuration is the test environment configuration, the task paradigm is the test paradigm (also called the AI paradigm), the task evaluation mode is the benchmark test mode, and the task case management module is the test case management module.

The first case performs multi-node benchmark test simulation on a single node, using the benchmark simulation architecture shown in Figure 3C. It is described in detail below with reference to Figure 5.

Figure 5 is a schematic flowchart of a distributed collaborative AI benchmark test simulation method provided by an embodiment of the present application. In this example, the test scenario is safety helmet target detection in industrial quality inspection, and the test paradigm in the test environment is edge-cloud collaborative incremental learning (IL).

Before the management device is used, it is started according to the benchmark test mode, and each module in the management device is initialized.

Specifically, according to the single-node benchmark test simulation mode, the management device is started, and the test case management module, container management module, communication module and result display module in the management device are initialized.

S510. The communication module of the management device obtains the test configuration file. The test configuration file includes a test environment configuration module, a test paradigm configuration module, an algorithm basic configuration module, an algorithm hyperparameter configuration module, a container management configuration module, and a container calling configuration module. Each of these configuration modules may or may not exist as a separate module in the test configuration file. For example, the test paradigm configuration module can be a separate configuration module or be included in the test environment configuration module, and the algorithm hyperparameter configuration may or may not exist as a sub-configuration file.

In one possible implementation, the test case management module reads the configuration parameters in the test environment configuration module, test paradigm configuration module, algorithm basic configuration module, or algorithm hyperparameter configuration module. For example, the configuration parameters of the test environment configuration module are shown in Table 1.

Table 1 Test environment configuration module parameter description

It should be understood that the test environment parameter configuration in Table 1 is only an exemplary description. There can also be other test environment parameter configurations, which can be modified accordingly according to the test scenario and the needs of the service developer. The embodiments of the present application do not limit this.

It should also be understood that when the simulation switch is enabled, each task container can use any one of the four simulation modes in Table 1. Across all task containers on a single node, the containers can all use the same simulation mode or can use any combination of the four modes. For example, if a single node creates 10 task containers and all of them are used for system performance test simulation, a system pseudo-container is created inside each of the 10 task containers. A system pseudo-container does not run directly on data; instead, it contains calculation formulas for system performance, such as bandwidth consumption or energy consumption, and these formulas are used for the system performance test simulation. As another example, if a single node creates 10 task containers, 5 for system performance test simulation and 5 for algorithm performance test simulation, then a system pseudo-container is created inside each of the 5 containers used for system performance test simulation, and an algorithm pseudo-container is created inside each of the 5 containers used for algorithm performance test simulation.
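The text does not give the formulas inside a system pseudo-container. As a minimal sketch, assuming bandwidth consumption is estimated as average uplink bits per second and energy consumption as power multiplied by busy time, a system pseudo-container could look like the following. The class name, its fields, both formulas, and the parameter values are hypothetical illustrations, not the embodiment's definitions.

```python
from dataclasses import dataclass


@dataclass
class SystemPseudoContainer:
    """Hypothetical system pseudo-container: it never runs the workload on
    data, it only evaluates system-performance formulas."""
    uplink_bytes_per_sample: float     # assumed bytes sent edge -> cloud per sample
    compute_seconds_per_sample: float  # assumed processing time per sample
    device_power_watts: float          # assumed average power draw while busy

    def bandwidth_bps(self, num_samples: int, window_seconds: float) -> float:
        # Assumed formula: average uplink bits per second over the window.
        return num_samples * self.uplink_bytes_per_sample * 8 / window_seconds

    def energy_joules(self, num_samples: int) -> float:
        # Assumed formula: energy (J) = power (W) x busy time (s).
        return self.device_power_watts * num_samples * self.compute_seconds_per_sample


# Mirrors the example in the text: 10 task containers, all simulating system
# performance, each holding one system pseudo-container (values are made up).
pseudo_containers = [SystemPseudoContainer(50_000, 0.02, 10.0) for _ in range(10)]
total_energy = sum(c.energy_joules(1_000) for c in pseudo_containers)
```

Because no data flows through the pseudo-container, changing the number of simulated containers or samples only changes arithmetic, which is what makes this mode cheap to run on a single node.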

For example, the parameter description in the test paradigm configuration module is shown in Table 2.

Table 2 Test paradigm configuration module parameter description

It should be understood that the test paradigm configuration parameters may exist in a separate module or as part of the parameters in the test environment configuration module; the embodiments of the present application do not limit this. In this embodiment, the test paradigm configuration parameters are part of the test environment configuration parameters; for clarity, the test environment module parameters and the test paradigm module parameters are shown separately here.

For example, the parameter description in the basic configuration module of the algorithm is shown in Table 3.

Table 3 Algorithm basic configuration module parameter description

The AI algorithm in the algorithm basic configuration module can be any AI algorithm.

The hyperparameters can be given directly in the algorithm hyperparameter configuration module as enumerations, or in a hyperparameter configuration file. In the former case, if the hyperparameter is the learning rate, all candidate learning rates are listed, and the hyperparameter name of the learning rate is added to the multi-parameters field of the algorithm basic configuration module. In the latter case, if hyperparameter_file in the algorithm basic configuration module is not empty, the algorithm hyperparameter configuration module is not scanned when the hyperparameter configuration is obtained; instead, the hyperparameter configuration file is read directly.
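The two ways of supplying hyperparameters can be sketched as a single resolution step. This is a hypothetical helper under stated assumptions: `hyperparameter_file` comes from the text, but the spelling `multi_parameters`, the JSON file format, and the choice to test every combination of enumerated values (a grid) are all assumptions.

```python
import itertools
import json
from typing import Any


def resolve_hyperparameters(basic_cfg: dict, hyper_cfg: dict) -> list[dict[str, Any]]:
    """Return one dict per hyperparameter combination to be tested (hypothetical)."""
    hp_file = basic_cfg.get("hyperparameter_file")
    if hp_file:
        # Non-empty hyperparameter_file: skip the hyperparameter module and read
        # the file directly (assumed to be JSON mapping name -> list of values).
        with open(hp_file, encoding="utf-8") as f:
            candidates = json.load(f)
    else:
        # Enumerated form: sweep only the names registered in multi_parameters.
        names = basic_cfg.get("multi_parameters", [])
        candidates = {name: hyper_cfg[name] for name in names}

    keys = list(candidates)
    # Assumed policy: every combination of candidate values is one test setting.
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(candidates[k] for k in keys))]
```

For example, two learning rates and one batch size would yield two settings to test; with no registered names, the single empty setting `{}` means "use defaults".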

Container management configuration includes the number of containers and the size of each container. Container management configuration is local container management configuration and/or global container management configuration. For example, when the container management configuration is a local container management configuration, the local container management configuration includes the number of local containers and the size of each local container.

The container calling configuration includes the calling sequence of the modules corresponding to the test paradigm and the hyperparameters of the modules corresponding to the test paradigm.
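To make the six configuration modules of S510 concrete, the following is one possible in-memory layout of a test configuration, with a small consistency check on the container calling configuration. Every key name and value here is a hypothetical illustration; the actual parameters are those in Tables 1 to 3, which are not reproduced in this text.

```python
# Hypothetical test configuration; every key name and value is illustrative.
TEST_CONFIG = {
    "test_environment": {
        "simulation_switch": True,
        "simulation_mode": "algorithm_performance",
        # In this embodiment the test paradigm is part of the environment.
        "test_paradigm": {"type": "incremental_learning"},
    },
    "algorithm_basic": {
        "algorithm": "helmet_detector",
        "hyperparameter_file": "",
        "multi_parameters": ["learning_rate"],
    },
    "algorithm_hyperparameters": {"learning_rate": [0.1, 0.01]},
    "container_management": {
        "local": {"num_containers": 2, "container_size": {"cpu": 4, "memory_gb": 8}},
    },
    "container_calling": {
        # Calling sequence of the modules of the test paradigm ...
        "sequence": ["initial_training", "difficult_example_identification", "model_update"],
        # ... and the hyperparameters of those modules.
        "module_hyperparameters": {"difficult_example_identification": {"threshold": 0.5}},
    },
}


def check_calling_config(cfg: dict) -> list[str]:
    """Reject hyperparameters for modules that are never called; return the sequence."""
    calling = cfg["container_calling"]
    unknown = set(calling["module_hyperparameters"]) - set(calling["sequence"])
    if unknown:
        raise ValueError(f"hyperparameters given for uncalled modules: {sorted(unknown)}")
    return calling["sequence"]
```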

S520: The communication module of the management device obtains the test object.

As a possible implementation method, when the test object is a test algorithm, that is, when the test object is an algorithm corresponding to the test paradigm, the test case management module obtains the test algorithm by reading the custom algorithm file.

When the test object is an algorithm, different custom algorithm files can be read to obtain different algorithms.

For example, the incremental learning paradigm includes an initialization training module, a difficult example identification module, and a model update module. Each module can be implemented by different algorithms, so the custom algorithm files can provide the algorithms corresponding to these three modules. Reading different custom algorithm files yields different test algorithms.
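The idea that one test algorithm is a set of implementations for the three incremental learning modules can be illustrated with a toy sketch. This is not the Sedna lib interface mentioned in the text; the dataclass, the function names, and the use of a single float as the "model" are hypothetical simplifications chosen only to show that swapping one module's implementation yields a different test algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class IncrementalLearningAlgorithm:
    """One test algorithm = one implementation per IL module (hypothetical).
    The "model" is just a float to keep the toy self-contained."""
    name: str
    initial_train: Callable[[Sequence[float]], float]
    is_difficult_example: Callable[[float, float], bool]
    update_model: Callable[[float, Sequence[float]], float]


def run_incremental_learning(algo: IncrementalLearningAlgorithm,
                             train_data: Sequence[float],
                             stream: Sequence[float]) -> float:
    model = algo.initial_train(train_data)                              # initialization training
    hard = [x for x in stream if algo.is_difficult_example(model, x)]   # difficult example identification
    return algo.update_model(model, hard) if hard else model            # model update


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


# Two hypothetical "custom algorithm files" differing only in the difficult-example rule.
strict = IncrementalLearningAlgorithm(
    "strict", mean, lambda m, x: abs(x - m) > 1.0, lambda m, h: (m + mean(h)) / 2)
loose = IncrementalLearningAlgorithm(
    "loose", mean, lambda m, x: abs(x - m) > 0.25, lambda m, h: (m + mean(h)) / 2)
```

Running both on the same data gives different updated models, which is exactly what a benchmark would then compare.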

It should be understood that the custom algorithm files are located in the algorithm directory of the edge-cloud collaborative AI benchmark testing platform library. When multi-node simulation is performed on a single node, the edge-cloud collaborative AI benchmark testing platform library needs to be deployed on that node; specifically, it can be deployed by downloading the library.

It should also be understood that the custom algorithm file is a standard template file. For example, the custom algorithm file can conform to the algorithm interface specification of Sedna lib.

S530: The communication module of the management device obtains the data set.

As a possible implementation method, the test case management module can read data files in txt format and csv format.

It should be understood that the test data can come from an open-source data set on Kaggle together with the data preprocessing algorithm corresponding to that data set, or from data collected by a data provider and obtained through an edge node.

Taking safety helmet target detection as an example, when the test case management module reads a data file in txt format, each line of the file records the index information and label information of one piece of unstructured data. The index information consists of the absolute path of the image and the coordinates of the target box, and the label information can be represented by 1 or 0.

When the test case management module reads a data file in csv format, the file represents structured data by its feature attributes and, for each record, the specific values of those feature attributes.
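A minimal reader for the two formats might look as follows. The text does not specify the exact txt layout, so this sketch assumes one whitespace-separated record per line in the order `<absolute_image_path> <x1> <y1> <x2> <y2> <label>` (paths containing spaces are not supported), and assumes the csv file has a header row naming the feature attributes. The class and function names are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HelmetSample:
    image_path: str                         # absolute path of the image
    box: tuple[float, float, float, float]  # target box (assumed x1, y1, x2, y2)
    label: int                              # 1 or 0


def read_txt_index(lines: Iterable[str]) -> list[HelmetSample]:
    """Parse unstructured-data index lines under the assumed layout above."""
    samples = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        path, x1, y1, x2, y2, label = fields
        samples.append(HelmetSample(path, (float(x1), float(y1), float(x2), float(y2)),
                                    int(label)))
    return samples


def read_csv_records(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse structured data: header row = feature attributes, each later row = one record."""
    return list(csv.DictReader(lines))
```

Both functions accept any iterable of lines, so an open file object can be passed directly.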

It should be understood that the embodiments of the present application do not limit the acquisition order of S510 to S530: the items may be acquired in the order S510, S520, S530, or in another order.

S540: The communication module of the management device receives a first management instruction according to the test configuration and the test object, where the first management instruction includes a test environment management instruction and/or a test object management instruction.

It should be understood that the test configuration includes the test environment configuration.

S550: The test case management module of the management device manages test cases according to the first management instruction. The test cases include a test environment and a test object.

As a possible implementation, test cases can be managed by adding, deleting, modifying, or querying them.

Optionally, the test case management module obtains management status information of the test cases, which includes the test cases and their status information. For example, if the first management instruction changes test cases by adding, deleting, or modifying them, the status information is the information after the change. As another example, if the first management instruction only queries a test case without changing it, the status information is the current status information of the test case.
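The management behaviour described in S550 can be sketched as a small in-memory manager that applies a first management instruction and returns the status information. The instruction format (a dict with `op` and `case_id`), the class names, and the status fields are hypothetical; the sketch only shows the rule that a change returns the post-change state while a query returns the current state.

```python
import copy
from dataclasses import dataclass


@dataclass
class CaseRecord:
    environment: dict   # test environment (including the test paradigm)
    test_object: str    # e.g. the name of a test algorithm


class UseCaseManager:
    """Hypothetical test case management module. An instruction is a dict with
    "op" in {"add", "delete", "modify", "query"} and a "case_id"."""

    def __init__(self) -> None:
        self._cases: dict[str, CaseRecord] = {}

    def handle(self, instruction: dict) -> dict:
        op, case_id = instruction["op"], instruction["case_id"]
        if op == "add":
            if case_id in self._cases:
                raise KeyError(f"test case {case_id!r} already exists")
            # Copy the environment so later modifications do not alias caller data.
            self._cases[case_id] = CaseRecord(dict(instruction["environment"]),
                                              instruction["test_object"])
        elif op == "modify":
            case = self._cases[case_id]
            case.environment.update(instruction.get("environment", {}))
            case.test_object = instruction.get("test_object", case.test_object)
        elif op == "delete":
            del self._cases[case_id]
            return {"case_id": case_id, "status": "deleted"}
        elif op != "query":
            raise ValueError(f"unsupported management instruction: {op!r}")
        # After add/modify: state after the change. After query: current state.
        return {"case_id": case_id, "status": "present",
                "case": copy.deepcopy(self._cases[case_id])}
```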

S560: When the simulation switch parameter in the test environment is enabled, the container management module of the management device manages the third task evaluation container according to the test case and the local container management configuration, where the third task evaluation container includes the local simulation workflow. Managing the third task evaluation container may include adding, deleting, modifying, or querying it.

Specifically, managing the third task evaluation container during the benchmark test may include adding a new benchmark test container, deleting the benchmark test container, modifying the benchmark test container, or querying the benchmark test container.

It should be understood that the number of benchmark test containers is not limited here; it can be determined by the container-count parameter in the local container management configuration.

For example, the test paradigm in the test case is the incremental learning paradigm, which includes an initialization training module, a difficult case identification module, and a model update module. The container management module can create a training container for the initialization training module and the model update module, and an inference container for the difficult case identification module.

Optionally, when the test case being managed is a newly added test case, the container management module of the management device adds a third benchmark test container according to the test case and the local container management configuration, where the third benchmark test container includes a local simulation workflow. The contents of the third benchmark test container are then determined according to the simulation mode parameter.

The simulation mode in the third benchmark container can be any one of the four simulation modes in Table 1.

As a possible implementation manner, when the simulation mode parameter is simulated testing of algorithm performance, the container management module of the management device creates an algorithm pseudo-container in the third task evaluation container, for example, in the third benchmark test container.

As a possible implementation manner, when the simulation mode parameter is simulated testing of system performance, the container management module of the management device creates a system pseudo-container in the third task evaluation container, for example, in the third benchmark test container.

As a possible implementation manner, when the simulation mode parameter is simulated testing of both system performance and algorithm performance, the container management module of the management device creates a real container in the third task evaluation container, for example, in the third benchmark test container.

As a possible implementation manner, when the simulation mode parameter is simulated testing of system unit performance, the container management module of the management device continues to use the third task evaluation container as created, for example, the third benchmark test container that has already been created.

Optionally, when the test case being managed is a newly added test case, the container management module of the management device adds a third benchmark test container according to the test case, the simulation mode parameter, and the local container management configuration, where the third benchmark test container includes a local simulation workflow.

As a possible implementation manner, when the simulation mode parameter is simulated testing of algorithm performance, the third task evaluation container (for example, the third benchmark test container) created by the container management module of the management device includes an algorithm pseudo-container.

As a possible implementation manner, when the simulation mode parameter is simulated testing of system performance, the third task evaluation container (for example, the third benchmark test container) created by the container management module of the management device includes a system pseudo-container.

As a possible implementation manner, when the simulation mode parameter is simulated testing of both system performance and algorithm performance, the third task evaluation container (for example, the third benchmark test container) created by the container management module of the management device includes a real container.

As a possible implementation manner, when the simulation mode parameter is simulated testing of system unit performance, the container management module of the management device creates a third task evaluation container, for example, a third benchmark test container.
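The two alternatives above differ only in when the simulation mode takes effect: either the container is created first and the inner container is added afterwards, or the mode is part of the creation request. A minimal sketch, assuming hypothetical names for the four simulation modes (Table 1 is not reproduced here) and modelling containers as plain data objects:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class SimulationMode(Enum):
    # Hypothetical names for the four simulation modes of Table 1.
    ALGORITHM_PERFORMANCE = "algorithm"
    SYSTEM_PERFORMANCE = "system"
    SYSTEM_AND_ALGORITHM = "system_and_algorithm"
    SYSTEM_UNIT = "system_unit"


# Inner container per mode; None means the evaluation container is used as is.
INNER_CONTAINER: dict[SimulationMode, Optional[str]] = {
    SimulationMode.ALGORITHM_PERFORMANCE: "algorithm_pseudo",
    SimulationMode.SYSTEM_PERFORMANCE: "system_pseudo",
    SimulationMode.SYSTEM_AND_ALGORITHM: "real",
    SimulationMode.SYSTEM_UNIT: None,
}


@dataclass
class EvaluationContainer:
    name: str
    workflow: str = "local_simulation"
    inner: list[str] = field(default_factory=list)


def create_then_populate(name: str, mode: SimulationMode) -> EvaluationContainer:
    """First alternative: add the container, then create the inner container in it."""
    container = EvaluationContainer(name)
    kind = INNER_CONTAINER[mode]
    if kind is not None:
        container.inner.append(kind)
    return container


def create_with_contents(name: str, mode: SimulationMode) -> EvaluationContainer:
    """Second alternative: the simulation mode is part of the creation request."""
    kind = INNER_CONTAINER[mode]
    return EvaluationContainer(name, inner=[] if kind is None else [kind])
```

Both alternatives end in the same container state; the choice affects only whether the container exists, briefly, before its mode-specific contents do.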

S570: The local simulation workflow of the third task evaluation container simulates the test case according to the local container calling configuration and the simulation mode corresponding to the third task evaluation container, and obtains the simulation results. The test configuration includes the local container calling configuration, which includes the calling sequence of the algorithm modules corresponding to the test paradigm and the hyperparameters of those algorithm modules.

Exemplarily, according to the calling configuration of the third benchmark test container for the test paradigm, the local simulation workflow of the third benchmark test container calls, in sequence, the algorithm modules related to the test paradigm and their hyperparameters from the test case management module, simulates the test case, and obtains the test results.

For example, if the calling sequence of the modules in the incremental learning paradigm is the initialization training module, the difficult example identification module, and then the model update module, the local simulation workflow in the training container first calls the initialization training module, the local simulation workflow in the inference container then calls the difficult example identification module, and finally the local simulation workflow in the training container calls the model update module. The container management module starts the training container and the inference container on demand.
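The calling order just described can be sketched as a loop over the configured sequence, where each module runs in its designated container type and a container type is started the first time it is needed. The module names, the `OnDemandContainers` stand-in, and the idea of threading a state dict through the modules are hypothetical; no real container runtime is invoked.

```python
from typing import Callable

# Container type that runs each module, following the example in the text.
MODULE_CONTAINER = {
    "initial_training": "training",
    "difficult_example_identification": "inference",
    "model_update": "training",
}


class OnDemandContainers:
    """Hypothetical stand-in for the container management module: a container
    type is recorded as started the first time a module needs it."""

    def __init__(self) -> None:
        self.started: list[str] = []

    def acquire(self, kind: str) -> str:
        if kind not in self.started:
            self.started.append(kind)  # a real system would start the container here
        return kind


def run_calling_sequence(sequence: list[str],
                         modules: dict[str, Callable[..., dict]],
                         module_hparams: dict[str, dict],
                         containers: OnDemandContainers) -> tuple[dict, list[tuple[str, str]]]:
    """Call the paradigm's modules in order, each in its container, passing state along."""
    state: dict = {}
    trace = []
    for name in sequence:
        where = containers.acquire(MODULE_CONTAINER[name])
        state = modules[name](state, **module_hparams.get(name, {}))
        trace.append((where, name))
    return state, trace
```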

As a possible implementation method, when the simulation mode parameter is simulated testing of algorithm performance, the data set obtained in S530 is divided into multiple parts according to the settings in the test environment configuration parameters, and the test cases are simulated and tested in the algorithm pseudo-container.
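The text does not say how the data set is divided. As one plausible policy, assuming the test environment configuration supplies a number of parts (for example, one per simulated edge node; that parameter is hypothetical), the data can be split into contiguous parts whose sizes differ by at most one:

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], num_parts: int) -> list[list[T]]:
    """Split samples into num_parts contiguous parts whose sizes differ by at most one."""
    if num_parts <= 0:
        raise ValueError("num_parts must be positive")
    base, extra = divmod(len(samples), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        end = start + base + (1 if i < extra else 0)  # the first `extra` parts get one more
        parts.append(list(samples[start:end]))
        start = end
    return parts
```

A contiguous split keeps any time ordering in the data, which matters for incremental learning; a random or label-skewed split would be an equally valid choice if the configuration asked for it.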

As a possible implementation method, when the simulation mode parameter is simulated testing of system performance, the data in the data set is not actually run in the system pseudo-container; instead, the system performance test simulation results are obtained through the system performance calculation formulas in the system pseudo-container.

As a possible implementation method, when the simulation mode parameter is simulated testing of both system performance and algorithm performance, the test cases are simulated and tested in the created real container.

As a possible implementation method, when the simulation mode parameter is simulated testing of system unit performance, the local simulation workflow of the benchmark test container calls the modules related to the test paradigm and their hyperparameters in sequence, and the test cases are simulated and tested in the benchmark test container.

The simulation results can include the trained model, inference results, evaluation metrics, and other parameters of each test case.

S580: The management device receives the simulation result of at least one task use case from the third task evaluation container, and the result display module of the management device displays the simulation result of at least one task use case.

Exemplarily, the management device receives the simulation result of at least one test case from the third benchmark test container, and the result display module of the management device displays the simulation result of the at least one test case.

As a possible implementation method, the test case execution information in the simulation results can be displayed through an online console or a user interaction interface, or the simulation results can be saved directly to an offline file.

It should be noted that the simulation result of at least one test case reflects how different test cases were executed, and the simulation results can be used to determine the target test object. For example, when a target test algorithm is to be selected for a specific test scenario, the simulation results can reflect the execution of different test algorithms; in image classification, the evaluation metric in the simulation results can be the image classification accuracy of each test algorithm.
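Selecting the target test object from the displayed results can be as simple as taking the best value of one evaluation metric. A minimal sketch, assuming results are keyed by test object name and that a higher metric value is better (true for accuracy, not for metrics such as latency); the function name and result layout are hypothetical:

```python
def select_target(results: dict[str, dict[str, float]], metric: str = "accuracy") -> str:
    """Return the test object with the best value of `metric` (higher assumed better)."""
    if not results:
        raise ValueError("no simulation results to compare")
    return max(results, key=lambda name: results[name][metric])
```

For example, `select_target({"algo_a": {"accuracy": 0.91}, "algo_b": {"accuracy": 0.88}})` returns `"algo_a"`; these numbers are placeholders, not measured results.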

In the embodiment of the present application, by displaying the simulation results, the user can intuitively compare the simulation results of the test object, thereby obtaining objective test results, which is helpful for the user to select an appropriate target object.

It should be noted that in the multi-node AI benchmark test simulation on a single node shown in Figure 5, test cases and local containers are managed by adding them. Under the architecture of this embodiment for simulating multi-node AI task evaluation on a single node, other management methods, such as deletion, modification, and query, can also be performed; they are not described in detail here.

In the embodiments of the present application, the test case management module manages test cases (for example, creates test cases) based on the test configuration and the test object, the container management module manages benchmark test work containers (for example, creates a benchmark test container), and benchmark test simulation is performed through the simulation workflow in the benchmark test work container to obtain simulation results. In this way, the test paradigm in a test case can be created based on the test scenario, and multi-node simulation can be implemented on a single node. For service developers, the edge-cloud collaborative AI benchmark testing architecture in the embodiments of this application allows test cases to be configured flexibly according to test scenarios, that is, corresponding test cases can be managed (for example, added) according to different test paradigms. For algorithm developers, implementing multi-node simulation on a single node means that the test architecture does not need to be redeployed on the edge side and cloud side each time the edge-side test environment changes, nor does a test platform need to be built on multiple virtual nodes or multiple device nodes; the algorithm and/or system can be simulated and tested and the simulation results obtained directly, which greatly reduces labor and material costs.

The second case conducts distributed edge-cloud collaborative benchmark testing on multiple nodes, using the benchmark testing framework shown in Figure 3A or Figure 3B. The following describes it in detail mainly using the framework of Figure 3A, with reference to Figure 6.

Figure 6 is a schematic flowchart of a benchmark test process for distributed edge-cloud collaborative AI provided by an embodiment of the present application. In this example, the test scenario is safety helmet target detection in industrial quality inspection, and the test paradigm in the test environment is edge-cloud collaborative incremental learning.

Before the management device is used, it is started according to the benchmark test mode, each module in the management device is initialized, and local control devices are created in the task nodes.

Specifically, according to the multi-node benchmark test mode, the management device is started, and the test case management module, container management module, communication module, and result display module in the management device are initialized. A first local control device and a second local control device are then created according to the initial test case and the test configuration in the initialized test case management module.

S601. The communication module in the management device obtains the test configuration file.

It should be understood that the test configuration file obtained by the communication module in the management device is the same as the content in S510, and will not be described again for the sake of simplicity.

It should be noted that in a benchmark test of distributed collaborative AI on multiple nodes, the simulation switch in the test environment parameter configuration can be either off or on. If the simulation switch is on, benchmark test simulation of distributed collaborative AI is performed through multiple nodes. Here, since the benchmark test mode is a multi-node distributed collaborative AI benchmark test, the simulation switch is turned off.

In the multi-node distributed collaborative AI benchmark test, the container management configuration consists of a global container management configuration and a local container management configuration. The global container management configuration includes the number of local control devices and the size of the local control devices. The local container management configuration includes, for each local control device, the number of corresponding local containers and the size of each of those local containers. For example, as shown in FIG. 3A, the global container management configuration specifies two local control devices, namely the first local control device and the second local control device.
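The two-level container management configuration can be pictured as nested records. The following is a minimal sketch only, assuming hypothetical class and field names (`LocalContainerConfig`, `GlobalContainerConfig`, `TestEnvironmentConfig`) and a free-form size label; the application does not specify a concrete file format.

```python
from dataclasses import dataclass, field


@dataclass
class LocalContainerConfig:
    """Per-device setting: how many task evaluation containers to run, and their size."""
    container_count: int
    container_size: str  # hypothetical free-form size label, e.g. "small"


@dataclass
class GlobalContainerConfig:
    """Global setting: how many local control devices exist, and their size."""
    local_device_count: int
    local_device_size: str
    # One local container management configuration per local control device.
    local_configs: dict[str, LocalContainerConfig] = field(default_factory=dict)

    def validate(self) -> None:
        if len(self.local_configs) != self.local_device_count:
            raise ValueError(
                f"expected {self.local_device_count} local configurations, "
                f"got {len(self.local_configs)}"
            )


@dataclass
class TestEnvironmentConfig:
    simulation: bool  # the simulation switch; off in this multi-node example
    containers: GlobalContainerConfig


# Multi-node setup of FIG. 3A: two local control devices, simulation switch off.
cfg = TestEnvironmentConfig(
    simulation=False,
    containers=GlobalContainerConfig(
        local_device_count=2,
        local_device_size="small",
        local_configs={
            "first_local_control_device": LocalContainerConfig(1, "small"),
            "second_local_control_device": LocalContainerConfig(1, "small"),
        },
    ),
)
cfg.containers.validate()
```

The `validate` check reflects the relationship stated above: each local control device counted in the global configuration is expected to have its own local container management configuration.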

S602. The communication module in the management device obtains the test object.

S603. The communication module in the management device receives a first management instruction according to the configuration of the test environment and the test object, where the first management instruction includes a test environment management instruction and/or a test object management instruction.

S604. The test case management module in the management device manages test cases according to the first management instruction. The test cases include a test environment and a test object.
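Steps S603 and S604 amount to applying create, delete, modify or query instructions to a test case made of a test environment and a test object. A minimal in-memory sketch, assuming hypothetical names (`Action`, `ManagementInstruction`, `TestCaseManager`) and dictionary-shaped environments and objects:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class Action(Enum):
    CREATE = "create"
    DELETE = "delete"
    MODIFY = "modify"
    QUERY = "query"


@dataclass
class ManagementInstruction:
    action: Action
    case_id: str
    environment: Optional[dict[str, Any]] = None  # task environment part, if any
    test_object: Optional[dict[str, Any]] = None  # test object part, if any


class TestCaseManager:
    """Hypothetical in-memory store for test cases (environment + test object)."""

    def __init__(self) -> None:
        self._cases: dict[str, dict[str, Any]] = {}

    def apply(self, ins: ManagementInstruction) -> Optional[dict[str, Any]]:
        if ins.action is Action.CREATE:
            if ins.case_id in self._cases:
                raise KeyError(f"test case {ins.case_id!r} already exists")
            self._cases[ins.case_id] = {
                "environment": ins.environment,
                "test_object": ins.test_object,
            }
        elif ins.action is Action.DELETE:
            self._cases.pop(ins.case_id)
        elif ins.action is Action.MODIFY:
            case = self._cases[ins.case_id]
            # An instruction may target the environment, the object, or both.
            if ins.environment is not None:
                case["environment"] = ins.environment
            if ins.test_object is not None:
                case["test_object"] = ins.test_object
        # QUERY falls through: every action returns the current state of the case.
        return self._cases.get(ins.case_id)


mgr = TestCaseManager()
mgr.apply(ManagementInstruction(
    Action.CREATE, "helmet-detection",
    environment={"paradigm": "incremental_learning"},
    test_object={"algorithm": "detector-v1"},
))
mgr.apply(ManagementInstruction(
    Action.MODIFY, "helmet-detection", test_object={"algorithm": "detector-v2"},
))
case = mgr.apply(ManagementInstruction(Action.QUERY, "helmet-detection"))
```

Modifying only the test object leaves the environment untouched, which mirrors the "and/or" in the first management instruction.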

It should be understood that S602, S603 and S604 are the same as the contents in S520, S540 and S550 respectively, and will not be described again for the sake of brevity.

S605: The container management module in the management device generates instructions for managing the first local control device and instructions for managing the second local control device based on the test case and the global container management configuration.

Generating an instruction to manage the first local control device means generating an instruction to create, delete, modify or query the first local control device. Likewise, generating an instruction to manage the second local control device means generating an instruction to create, delete, modify or query the second local control device.

As a possible implementation, when managing the test case means adding a new test case, the container management module generates, based on the test case and the global container management configuration, an instruction to add a cloud-node local control device and an instruction to add an edge-node local control device.
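Step S605 can be read as a fan-out: for each local control device in the global configuration, build one local management instruction carrying the test case, that device's local container management configuration and its container calling configuration (the three parts listed for the first local management instruction). This is a sketch under assumed names (`LocalManagementInstruction`, `generate_local_instructions`) and an assumed dictionary layout for `global_cfg`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Action(Enum):
    CREATE = "create"
    DELETE = "delete"
    MODIFY = "modify"
    QUERY = "query"


@dataclass
class LocalManagementInstruction:
    target: str  # name of the local control device
    action: Action
    test_case: dict[str, Any]
    local_container_config: dict[str, Any]
    container_call_config: dict[str, Any]


def generate_local_instructions(
    test_case: dict[str, Any], global_cfg: dict[str, Any], action: Action
) -> list[LocalManagementInstruction]:
    """Build one local management instruction per local control device."""
    return [
        LocalManagementInstruction(
            target=device,
            action=action,
            test_case=test_case,
            local_container_config=device_cfg["containers"],
            container_call_config=device_cfg["call"],
        )
        for device, device_cfg in global_cfg["local_devices"].items()
    ]


global_cfg = {
    "local_devices": {
        "cloud-node": {"containers": {"count": 1}, "call": {"peer": "edge-node"}},
        "edge-node": {"containers": {"count": 1}, "call": {"peer": "cloud-node"}},
    }
}
test_case = {"id": "helmet-detection", "paradigm": "incremental_learning"}
instructions = generate_local_instructions(test_case, global_cfg, Action.CREATE)
```

For a new test case, this yields one "create" instruction for the cloud-node device and one for the edge-node device, which the management device then sends in S606.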

Optionally, if the management device and the first local control device are arranged in the same cloud node, as shown in Figure 3A, the container management module in the management device directly manages the first local control device according to the test case and the global container management configuration.

It should be understood that the global container management configuration includes a first local control device, an instruction action corresponding to the first local management instruction, a second local control device, and an instruction action corresponding to the second local management instruction.

S606: The management device sends a first local management instruction to the first local control device and a second local management instruction to the second local control device.

Specifically, the instruction action corresponding to the first local management instruction is an instruction to create, delete, modify or query the first local control device, and the instruction action corresponding to the second local management instruction is an instruction to create, delete, modify or query the second local control device.

As a possible implementation, when managing the test case means adding a new test case, the management device sends an instruction to create the first local control device to the first local control device, and sends an instruction to create the second local control device to the second local control device.

S607: The first communication module of the first local control device receives the first local management instruction from the management device, and the first container management module of the first local control device manages the first local control device according to the first local management instruction.

It should be noted that the first local management instruction includes test cases, local container management configuration and container calling configuration.

The first local control device is used to implement lifecycle management of the first task evaluation container in the task cloud node (for example, lifecycle management of the first benchmark container) and communication with external task evaluation containers (for example, external test containers), where an external task evaluation container is arranged on a node other than the task cloud node. For example, the first local control device enables communication between the first task evaluation container in the task cloud node and the second task evaluation container in the task edge node. Specifically, control plane communication is established among the first local control device, the management device and the second local control device, so that a communication network exists between the first task evaluation container and the second task evaluation container, and data plane communication between the first task evaluation container and the second task evaluation container is achieved. For example, the first test container and the edge node test container can exchange data such as datasets, models and algorithms.

As a possible implementation, the first communication module of the first local control device receives the instruction to add the first local control device, and the first container management module of the first local control device adds the cloud-node local control device according to that instruction.
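On the receiving side (S607 and, symmetrically, S609), a local control device applies the instruction action to its own lifecycle. The sketch below is a hypothetical state model (`LocalControlDevice` with states "absent", "running" and "deleted"); the application does not define these states, they are assumed here only to show the create/modify/query/delete dispatch.

```python
from enum import Enum
from typing import Any, Optional


class Action(Enum):
    CREATE = "create"
    DELETE = "delete"
    MODIFY = "modify"
    QUERY = "query"


class LocalControlDevice:
    """Hypothetical local control device applying a local management instruction to itself."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "absent"
        self.config: dict[str, Any] = {}

    def handle(self, action: Action, config: Optional[dict[str, Any]] = None) -> str:
        if action is Action.CREATE:
            if self.state == "running":
                raise RuntimeError(f"{self.name} already exists")
            self.state, self.config = "running", dict(config or {})
        elif action is Action.MODIFY:
            self._require_running()
            self.config.update(config or {})
        elif action is Action.DELETE:
            self._require_running()
            self.state, self.config = "deleted", {}
        # QUERY leaves the device unchanged and just reports its state.
        return self.state

    def _require_running(self) -> None:
        if self.state != "running":
            raise RuntimeError(f"{self.name} is not running")


cloud = LocalControlDevice("cloud-node-local-control")
cloud.handle(Action.CREATE, {"containers": 1})
```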

S608: The first container management module of the first local control device manages the first task evaluation container according to the test case and the local container management configuration, where the first task evaluation container includes the first distributed workflow.

It should be understood that managing the first task evaluation container may be any of the basic operations of adding, deleting, modifying and querying the first task evaluation container, for example, adding the first benchmark container.

It should be understood that the number of first task evaluation containers created is not limited here and can be determined according to the container-count parameter in the local container management configuration.

For example, if the test paradigm is incremental learning, the first task evaluation container managed by the first container management module in the task cloud node is a training container.
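S608 and S610 can be sketched as creating as many task evaluation containers as the local container management configuration requests, with a role that depends on the test paradigm and the node. The role table below is an assumption that encodes only the incremental-learning example given in this embodiment (training on the cloud node, inference on the edge node); names such as `create_containers` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical (paradigm, node kind) -> container role table, covering only the
# incremental-learning example: training on the cloud node, inference on the edge node.
ROLE_BY_PARADIGM = {
    ("incremental_learning", "cloud"): "training",
    ("incremental_learning", "edge"): "inference",
}


@dataclass
class TaskEvaluationContainer:
    name: str
    role: str
    size: str


def create_containers(
    paradigm: str, node_kind: str, count: int, size: str
) -> list[TaskEvaluationContainer]:
    """Create `count` containers, as given by the local container management configuration."""
    role = ROLE_BY_PARADIGM[(paradigm, node_kind)]
    return [TaskEvaluationContainer(f"{node_kind}-{role}-{i}", role, size) for i in range(count)]


cloud_containers = create_containers("incremental_learning", "cloud", count=1, size="large")
edge_containers = create_containers("incremental_learning", "edge", count=2, size="small")
```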

S609: The second communication module of the second local control device receives an instruction to manage the second local control device, and manages the second local control device according to the instruction to manage the second local control device.

As a possible implementation, the second communication module of the second local control device receives the instruction to add the second local control device, and the second container management module of the second local control device adds the second local control device according to that instruction.

It should be noted that the instruction to manage the second local control device includes the test case, the local container management configuration and the container calling configuration.

The second local control device is used to implement lifecycle management of the second task evaluation container in the task edge node (for example, lifecycle management of the second benchmark container) and communication with external task evaluation containers (for example, external test containers), where an external task evaluation container is arranged on a node other than the task edge node. For example, the second local control device enables communication between the first task evaluation container in the task cloud node and the second task evaluation container in the task edge node. The specific implementation is similar to that of the cloud-node local control device in S607 and, for brevity, is not described in detail here.

S610: The second container management module in the second local control device manages the second task evaluation container according to the test case and the local container management configuration, where the second task evaluation container includes the second distributed workflow.

It should be understood that managing the second task evaluation container may be any of the basic operations of adding, deleting, modifying and querying the second task evaluation container, for example, adding the second benchmark container.

It should be understood that the number of second task evaluation containers created is not limited here and can be determined according to the container-count parameter in the local container management configuration.

For example, if the test paradigm is incremental learning, the second task evaluation container managed by the second container management module in the task edge node is an inference container.

It should be understood that S607 and S608 are the process in which the task cloud node manages the first local control device and the first task evaluation container, and S609 and S610 are the process in which the task edge node manages the second local control device and the second task evaluation container. The two processes may be performed in either order.

S611. The first task evaluation container sends the algorithm or model to the second task evaluation container according to the test case and the container call configuration.

As a possible implementation, when a model is to be sent to the second task evaluation container, the first distributed workflow of the first task evaluation container calls the model evaluation module from the first task case management module according to the test case and the local container calling configuration, and evaluates the test models to determine the target test model. The first distributed workflow of the first task evaluation container then sends the target test model to the second task evaluation container. The test model may be obtained directly or trained on the task cloud node; the embodiment of the present application does not limit the source of the test model.
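Determining the target test model can be sketched as scoring each candidate with an evaluation metric and keeping the best. This is an illustration only: accuracy on a toy scalar "helmet score" stands in for whatever metric the model evaluation module actually uses, and the candidates and samples are made-up inputs, not results.

```python
from typing import Callable, Sequence

Model = Callable[[float], int]
Sample = tuple[float, int]  # (feature, label); a toy stand-in for real detection data


def accuracy(model: Model, samples: Sequence[Sample]) -> float:
    correct = sum(1 for x, y in samples if model(x) == y)
    return correct / len(samples)


def select_target_model(candidates: dict[str, Model], samples: Sequence[Sample]) -> tuple[str, float]:
    """Evaluate every candidate test model and return the best one with its score."""
    scores = {name: accuracy(model, samples) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


eval_set = [(0.9, 1), (0.8, 1), (0.3, 0), (0.6, 0)]
candidates = {
    "model_a": lambda x: int(x > 0.5),
    "model_b": lambda x: int(x > 0.7),
}
target_name, target_score = select_target_model(candidates, eval_set)
```

Here `model_a` misclassifies the 0.6 sample, so `model_b` is chosen as the target test model that would be sent to the edge node.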

It should be understood that the content included in the first task use case management module is the same as the content included in the task use case management module in the management device. Therefore, when the network connection between the management device and the first local control device is disconnected, the first distributed workflow can directly call the algorithm module corresponding to the AI paradigm from the first task evaluation container.
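Because the first task case management module mirrors the one in the management device, module lookup can degrade gracefully when the management device is unreachable. A minimal sketch of that fallback, assuming a hypothetical `remote_lookup` callable that raises Python's built-in `ConnectionError` on disconnection and a plain dictionary as the local copy:

```python
from typing import Any, Callable


def resolve_algorithm_module(
    name: str,
    remote_lookup: Callable[[str], Any],
    local_registry: dict[str, Any],
) -> Any:
    """Prefer the management device's module; fall back to the local copy if unreachable."""
    try:
        return remote_lookup(name)
    except ConnectionError:
        # The local task case management module holds the same content,
        # so the distributed workflow can keep running while disconnected.
        return local_registry[name]


def unreachable(name: str) -> Any:
    raise ConnectionError("management device not reachable")


local_registry = {"incremental_learning/train": "local-train-module"}
module = resolve_algorithm_module("incremental_learning/train", unreachable, local_registry)
```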

Specifically, before the first task evaluation container sends the algorithm or model to the second task evaluation container, a communication network for data plane transmission is built between the first task evaluation container and the second task evaluation container via the first local control device, the management device and the second local control device.

In the embodiment of this application, the target test model is sent to the task edge node, with the task edge node acting as the data provider and the task cloud node acting as the algorithm provider. When the two are different parties, this prevents the data provider from obtaining the algorithm, thereby reducing the possibility that the algorithm provider's algorithm is leaked.

As a possible implementation, when sending the algorithm to the second task evaluation container, the first task evaluation container directly sends the algorithm to the second task evaluation container according to the test case and container call configuration.

S612: The second task evaluation container receives the model or algorithm from the first task evaluation container. According to the test case and the container calling configuration, the second distributed workflow of the second task evaluation container calls the inference module from the second task case management module, performs inference on the data of the task edge node based on the model, obtains the test results corresponding to the test cases, and sends the test results to the first task evaluation container.

The explanation of the second task use case management module is similar to that of the first task use case management module, so no details will be given here.

As a possible implementation, when receiving a model from the first task evaluation container, the second task evaluation container receives the target test model. The second distributed workflow of the second task evaluation container calls the inference module from the second task case management module according to the test case and the container calling configuration, runs the target test model on the inference data to obtain the test results of the target test model, and sends the test results to the first task evaluation container.

As a possible implementation, when receiving an algorithm from the first task evaluation container, the second distributed workflow of the second task evaluation container calls the training module from the second task case management module according to the test case and the container calling configuration to obtain a trained model. It then calls the inference module from the second task case management module, performs inference on the data of the task edge node based on the trained model, obtains the test results corresponding to the test cases, and sends the test results to the first task evaluation container.
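The two branches of S612 (model received versus algorithm received) can be sketched as a single edge-side function that returns only an aggregate result. This is a toy illustration under stated assumptions: `fit_threshold` is a hypothetical stand-in "algorithm", accuracy is a stand-in metric, and for brevity the algorithm branch trains and evaluates on the same toy data, whereas a real test would keep them separate.

```python
from typing import Any, Callable, Sequence

Model = Callable[[float], int]
Sample = tuple[float, int]


def fit_threshold(data: Sequence[Sample]) -> Model:
    """Toy 'algorithm': put the decision threshold halfway between the two classes."""
    lowest_positive = min(x for x, y in data if y == 1)
    highest_negative = max(x for x, y in data if y == 0)
    threshold = (lowest_positive + highest_negative) / 2
    return lambda x: int(x > threshold)


def evaluate_on_edge(kind: str, payload: Any, edge_data: Sequence[Sample]) -> dict[str, Any]:
    if kind == "model":
        model = payload  # model received: infer directly
    elif kind == "algorithm":
        model = payload(edge_data)  # algorithm received: train locally first
    else:
        raise ValueError(f"unknown payload kind: {kind}")
    correct = sum(1 for x, y in edge_data if model(x) == y)
    # Only this summary goes back to the first task evaluation container;
    # edge_data itself never leaves the edge node.
    return {"accuracy": correct / len(edge_data), "num_samples": len(edge_data)}


edge_data = [(0.9, 1), (0.8, 1), (0.3, 0), (0.6, 0)]
from_model = evaluate_on_edge("model", lambda x: int(x > 0.5), edge_data)
from_algorithm = evaluate_on_edge("algorithm", fit_threshold, edge_data)
```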

Optionally, S613: when the test results are in the second task evaluation container, the second communication module of the second local control device obtains the test result of at least one test case, the second result display module of the second local control device displays the test result of the at least one test case, and the second local control device sends the test result of the at least one test case to the global test case management module.

As a possible implementation, the execution status of at least one test case can be displayed through an online console or a user interaction interface. If the test object is an algorithm, the test results can include the test algorithm, the indicator results of the test algorithm, the test paradigm to which the test algorithm belongs, the hyperparameter configuration of the test algorithm, and so on. For example, the test results of at least one test case are displayed as a ranking list on the user interaction interface.
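A ranking list of this kind can be produced by sorting test results on one indicator. A minimal sketch, assuming a hypothetical `ranking_list` helper and result records with the fields listed above (algorithm, paradigm, hyperparameters, indicator results); the metric values are illustrative inputs, not measured results.

```python
from typing import Any


def ranking_list(
    results: list[dict[str, Any]], metric: str, higher_is_better: bool = True
) -> list[dict[str, Any]]:
    """Sort test results by one indicator and attach a 1-based rank."""
    ordered = sorted(results, key=lambda r: r["metrics"][metric], reverse=higher_is_better)
    return [{"rank": i, **r} for i, r in enumerate(ordered, start=1)]


results = [
    {"algorithm": "detector-a", "paradigm": "incremental_learning",
     "hyperparameters": {"lr": 0.01}, "metrics": {"map": 0.82}},
    {"algorithm": "detector-b", "paradigm": "incremental_learning",
     "hyperparameters": {"lr": 0.001}, "metrics": {"map": 0.91}},
]
for row in ranking_list(results, "map"):
    print(row["rank"], row["algorithm"], row["metrics"]["map"])
```

The `higher_is_better` flag covers indicators such as latency, where a smaller value should rank first.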

Optionally, S614: when the test results are in the first task evaluation container, the first communication module of the first local control device obtains the test result of at least one test case, the first result display module of the first local control device displays the test result of the at least one test case, and the first local control device sends the test result of the at least one test case to the global test case management module.

Optionally, S615, the first task evaluation container receives the test results and updates the target test model according to the test results.

It should be noted that in the multi-node distributed collaborative AI benchmark test shown in Figure 6, both the test cases and the local containers are managed by creation. The multi-node distributed collaborative AI task evaluation architecture in the embodiments of this application can also perform other management operations, such as deletion, modification and query, which are not described in detail here.

In the embodiments of this application, in the multi-node distributed collaborative AI benchmark test, managing test cases allows corresponding test containers to be managed for different test paradigms, so that benchmark tests of different test cases can be carried out. In particular, when the data provider and the algorithm provider are different parties, the model can be debugged and updated using the test results of different test cases while the data supplied by the data provider never leaves the edge node.

The distributed collaborative AI task evaluation method provided by the embodiments of the present application is explained in detail above with reference to Figures 4 to 6. The distributed collaborative AI task evaluation system and devices provided by the embodiments are described in detail below with reference to Figures 7 and 8, for the multi-node task evaluation mode and the single-node task evaluation simulation mode respectively. It should be understood that the devices described below can perform the foregoing methods of the embodiments of the present application. To avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the devices.

For a structural schematic diagram of a management device for task evaluation of distributed collaborative AI provided by the embodiment of the present application, reference can be made to the management device in Figure 3A or the management device in Figure 3C. The device includes a task use case management module and a communication module. Optionally, the management device also includes a container management module and a result display module.

The communication module is configured to: obtain the task configuration of distributed collaborative AI and the task object of distributed collaborative AI, where the task configuration includes the configuration of the task environment of the distributed collaborative AI; and receive a first management instruction according to the configuration of the task environment and the task object, where the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object.

The task use case management module is used to manage task use cases of distributed collaborative AI according to the first management instruction. The task use cases include task environments and task objects.

For simplicity, please refer to the above method embodiment for other steps.

Among them, the task case management module, container management module and result display module can all be implemented by software or can be implemented by hardware.

Illustratively, the following uses the task use case management module as an example to introduce the implementation method of the task use case management module. Similarly, the implementation of the container management module and result display module can refer to the implementation of the task case management module.

When implemented by software, the task case management module may be an application or code block running on a computer device. The computer device may be at least one of a physical host, a virtual machine, a container and other computing devices, and there may be one or more such computer devices. For example, the task case management module can be an application running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the application can be distributed in the same availability zone (AZ) or in different AZs, and in the same region or in different regions. A region usually includes multiple AZs.

Likewise, the multiple hosts/VMs/containers used to run the application can be distributed in the same virtual private cloud (VPC) or across multiple VPCs. A region usually includes multiple VPCs, and a VPC can include multiple AZs.

When implemented by hardware, the task case management module may include at least one computing device, such as a server. Alternatively, the task case management module can be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

Multiple computing devices included in the task case management module can be distributed in the same AZ or in different AZs. Multiple computing devices included in the task case management module can be distributed in the same region or in different regions. Similarly, multiple computing devices included in the task case management module can be distributed in the same VPC or in multiple VPCs. The plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

For a structural schematic diagram of a first local control device for task evaluation of distributed collaborative AI provided by an embodiment of the present application, reference can be made to the first local control device in Figure 3A. The first local control device includes a first container management module and a first communication module, and may also include a first task case management module and a first result display module.

The first communication module is configured to: receive a first local management instruction from the management device, the first local management instruction is determined according to the task use case, the task use case is determined according to the first management instruction, wherein the first management instruction includes the task environment management instructions and/or management instructions for task objects.

The first container management module is configured to manage the first local control device according to the first local management instruction.

For a structural schematic diagram of a second local control device for task evaluation of distributed collaborative AI provided by an embodiment of the present application, reference can be made to the second local control device in Figure 3A. The second local control device includes a second container management module and a second communication module, and may also include a second task case management module and a second result display module.

The second communication module is configured to: receive a second local management instruction from the management device, where the second local management instruction is determined according to the task use case, the task use case is determined according to the second management instruction, and the second management instruction includes a management instruction for the task environment and/or a management instruction for the task object.

The second container management module is configured to manage the second local control device according to the second local management instruction.

It should be noted that the above modules are divided by logical function, which does not require them to be independent hardware units. The term "module" here can be implemented in the form of software and/or hardware, and is not specifically limited.

For example, a "module" may be a software program, a hardware circuit, or a combination of both that implements the above functions. The hardware circuit may include an application-specific integrated circuit (ASIC), an electronic circuit, a processor (such as a shared processor, a dedicated processor or a group processor) and memory for executing one or more software or firmware programs, a merged logic circuit, and/or other suitable components that support the described functions.

Therefore, the modules of each example described in the embodiments of the present application can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.

Figure 7 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation system provided by an embodiment of the present application. The hardware structural diagram of the task evaluation system in Figure 7 is suitable for multi-node task evaluation mode. As shown in FIG. 7 , the computer device cluster corresponding to the system includes at least one computing device. For example, as shown in FIG. 7 , the at least one computing device may include a computing device 700A, a computing device 700B, and a computing device 700C. Each computing device includes a bus 702, a processor 704, a communication interface 708, and a memory 706. The processor 704, the memory 706 and the communication interface 708 communicate through the bus 702.

The bus 702 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses can be divided into address buses, data buses, control buses and so on. For ease of presentation, only one line is used in Figure 7, but this does not mean that there is only one bus or one type of bus. The bus 702 may include a path that carries information between the components of the computing device 700 (for example, the memory 706, the processor 704 and the communication interface 708).

The processor 704 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP) or a digital signal processor (DSP).

The memory 706 may include volatile memory, such as random access memory (RAM). The memory 706 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid state drive (SSD).

The memory 706 stores executable program code, and the processor 704 executes the executable program code to implement the functions of the modules in the aforementioned devices. As shown in Figure 7, the memory 706 in the computing device 700A stores the executable program code of each module in the management device, and the processor 704 in the computing device 700A executes it to implement the functions of the modules in the management device. The memory 706 in the computing device 700B stores the executable program code of each module in the first local control device, and the processor 704 in the computing device 700B executes it to implement the functions of the modules in the first local control device. The memory 706 in the computing device 700C stores the executable program code of each module in the second local control device, and the processor 704 in the computing device 700C executes it to implement the functions of the modules in the second local control device.

As a possible implementation manner, at least one computing device can jointly execute the instructions of the management device, the first local control device, and the second local control device for the method in FIG. 4, or FIG. 5, or FIG. 6.

The communication interface 708 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between at least one computing device 700 and other devices or communication networks.

As a possible implementation, at least one computing device may be connected via a network. Wherein, the network may be a wide area network or a local area network, etc., as shown in Figure 7.

Figure 8 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation management device provided by an embodiment of the present application; this structure is suitable for the single-node task evaluation simulation mode of distributed collaborative AI. The computing device 800 includes a bus 802, a processor 804, a communication interface 808 and a memory 806. The processor 804, the memory 806 and the communication interface 808 communicate through the bus 802.

For detailed descriptions of the hardware in the computing device 800, refer to the computing devices in Figure 7; for brevity, they are not described again here. As shown in Figure 8, the memory 806 in the computing device 800 stores the executable program code of each module in the management device, and the processor 804 in the computing device 800 executes the executable program code to implement the functions of the modules in the management device.

As a possible implementation manner, the computing device 800 may jointly execute the instructions of the management device for the method in FIG. 4, or FIG. 5, or FIG. 6.

According to the method provided by the embodiments of the present application, the present application also provides a computer program product. The computer program product includes computer program code. When the computer program code is run on a computer, the computer is caused to perform the method of the embodiment shown in Figure 4, Figure 5 or Figure 6.

According to the method provided by the embodiments of the present application, the present application also provides a computer-readable medium. The computer-readable medium stores program code. When the program code is run on a computer, the computer is caused to perform the method of the embodiment shown in Figure 4, Figure 5 or Figure 6.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented with electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The foregoing storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A task evaluation method for distributed collaborative artificial intelligence (AI), characterized in that the method is applied to a control node and comprises:

obtaining a task configuration of the distributed collaborative AI and a task object of the distributed collaborative AI, wherein the task configuration comprises a configuration of a task environment of the distributed collaborative AI;

receiving a first management instruction according to the configuration of the task environment and the task object, wherein the first management instruction comprises a management instruction for the task environment and/or a management instruction for the task object; and

managing a task use case of the distributed collaborative AI according to the first management instruction, wherein the task use case comprises the task environment and the task object.
2. The method according to claim 1, wherein managing the task use case of the distributed collaborative AI comprises:

adding the task use case, deleting the task use case, modifying the task use case, or querying the task use case.
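Claims 1 and 2 describe a control node that keeps task use cases (each pairing a task environment with a task object) and applies one of four management actions to them. The sketch below illustrates that bookkeeping under stated assumptions: the names `TaskUseCase` and `UseCaseManager` and the action strings `"add"`, `"delete"`, `"modify"`, and `"query"` are hypothetical, and an in-memory dictionary stands in for whatever store a real control node would use.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TaskUseCase:
    """Hypothetical task use case: a task environment paired with a task object."""
    environment: Dict[str, str]
    task_object: str


@dataclass
class UseCaseManager:
    """Hypothetical in-memory registry applying the four management actions of claim 2."""
    cases: Dict[str, TaskUseCase] = field(default_factory=dict)

    def apply(self, action: str, case_id: str,
              case: Optional[TaskUseCase] = None) -> Optional[TaskUseCase]:
        if action in ("add", "modify") and case is None:
            raise ValueError(f"{action!r} requires a task use case")
        if action == "add":
            if case_id in self.cases:
                raise KeyError(f"use case {case_id!r} already exists")
            self.cases[case_id] = case
            return case
        if action == "modify":
            if case_id not in self.cases:
                raise KeyError(f"use case {case_id!r} not found")
            self.cases[case_id] = case
            return case
        if action == "delete":
            # Returns the removed use case, or None if it did not exist.
            return self.cases.pop(case_id, None)
        if action == "query":
            return self.cases.get(case_id)
        raise ValueError(f"unknown management action: {action!r}")
```

Routing all four actions through one `apply` entry point mirrors the claim's single "first management instruction", so the caller does not need to know which action it carries in advance.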
3. The method according to claim 1 or 2, characterized in that, when the task evaluation mode is a multi-node task evaluation mode, the method further comprises:

generating, according to the task use case and a global container management configuration, a first local management instruction for a first local control device and a second local management instruction for a second local control device, wherein the task configuration comprises the global container management configuration, and the global container management configuration comprises the first local control device, the instruction action corresponding to the first local management instruction, the second local control device, and the instruction action corresponding to the second local management instruction; and

sending the first local management instruction to the first local control device, and sending the second local management instruction to the second local control device.
4. The method according to claim 3, wherein the instruction action corresponding to the first local management instruction is an add instruction, a delete instruction, a modify instruction, or a query instruction for the first local control device; and

the instruction action corresponding to the second local management instruction is an add instruction, a delete instruction, a modify instruction, or a query instruction for the second local control device.
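In the multi-node mode of claims 3 and 4, the control node turns one task use case plus the global container management configuration into one instruction per local control device. The following is a minimal sketch, assuming the global configuration can be modelled as a mapping from a hypothetical device identifier to its instruction action; `LocalManagementInstruction`, `generate_local_instructions`, and `dispatch` are illustrative names, and `send` stands in for whatever transport the management device actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# The four instruction actions named in claim 4.
ACTIONS = {"add", "delete", "modify", "query"}


@dataclass(frozen=True)
class LocalManagementInstruction:
    """Hypothetical instruction sent to one local control device."""
    device_id: str    # hypothetical identifier of the local control device
    action: str       # one of ACTIONS
    use_case_id: str  # task use case the instruction refers to


def generate_local_instructions(
        use_case_id: str,
        global_config: Dict[str, str]) -> List[LocalManagementInstruction]:
    """Build one instruction per local control device listed in the
    (assumed) device -> action global container management configuration."""
    instructions = []
    for device_id, action in global_config.items():
        if action not in ACTIONS:
            raise ValueError(f"unsupported action {action!r} for {device_id!r}")
        instructions.append(
            LocalManagementInstruction(device_id, action, use_case_id))
    return instructions


def dispatch(instructions: Iterable[LocalManagementInstruction],
             send: Callable[[LocalManagementInstruction], None]) -> None:
    """Send each instruction to its device; `send` is a placeholder transport."""
    for instruction in instructions:
        send(instruction)
```

Keeping generation separate from dispatch lets the same instruction list be validated or logged before anything is sent to the first and second local control devices.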
5. The method according to claim 4, further comprising:

receiving an evaluation result of at least one task use case from the first local control device or the second local control device; and

displaying the evaluation result of the at least one task use case.
6. The method according to claim 1 or 2, characterized in that, when the task evaluation mode is a single-node task evaluation simulation mode and the simulation switch configuration parameter in the task configuration is set to on, the method further comprises:

managing a third task evaluation container according to the task use case and a local container management configuration, wherein the third task evaluation container comprises a local simulation workflow, the task configuration comprises the local container management configuration, and the local container management configuration comprises the third task evaluation container and the management action corresponding to the third task evaluation container;

wherein the management action corresponding to the third task evaluation container is any one of adding the third task evaluation container, deleting the third task evaluation container, modifying the third task evaluation container, or querying the third task evaluation container.
7. The method according to claim 6, characterized in that, when managing the task use case of the distributed collaborative AI is adding the task use case, managing the third task evaluation container according to the task use case and the local container management configuration comprises:

adding the third task evaluation container according to the task use case and the local container management configuration, wherein the third task evaluation container comprises the local simulation workflow.
8. The method according to claim 7, characterized in that the task configuration further comprises a simulation mode corresponding to the third task evaluation container;

The simulation mode corresponding to the third task evaluation container is any one of the following modes:

The simulation mode is to simulate and test algorithm performance; or

The simulation mode is to simulate and test system performance; or

The simulation mode is to simulate and test the algorithm performance and the system performance; or

The simulation mode is to simulate and test system unit performance.
9. The method according to claim 8, characterized in that:

When the simulation mode is to simulate and test algorithm performance, the method further comprises: creating an algorithm pseudo-container in the third task evaluation container corresponding to that simulation mode; or

When the simulation mode is to simulate and test system performance, the method further comprises: creating a system pseudo-container in the third task evaluation container corresponding to that simulation mode; or

When the simulation mode is to simulate and test both algorithm performance and system performance, the method further comprises: creating a real container in the third task evaluation container corresponding to that simulation mode; or

When the simulation mode is to simulate and test system unit performance, the third task evaluation container corresponding to that simulation mode is used.
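Claims 8 and 9 tie each simulation mode to the kind of container prepared inside the third task evaluation container. The lookup below is only an illustration: the enum and dictionary names are hypothetical, and because the claim text for the system-unit mode states only that the third task evaluation container itself is used, the sketch maps that mode to the evaluation container rather than to any additional container.

```python
from enum import Enum


class SimulationMode(Enum):
    """Hypothetical encoding of the four simulation modes in claim 8."""
    ALGORITHM = "algorithm performance"
    SYSTEM = "system performance"
    ALGORITHM_AND_SYSTEM = "algorithm and system performance"
    SYSTEM_UNIT = "system unit performance"


# What claim 9 associates with each mode inside the third task evaluation container.
CONTAINER_FOR_MODE = {
    SimulationMode.ALGORITHM: "algorithm pseudo-container",
    SimulationMode.SYSTEM: "system pseudo-container",
    SimulationMode.ALGORITHM_AND_SYSTEM: "real container",
    # The claim only says the evaluation container itself is used for this mode.
    SimulationMode.SYSTEM_UNIT: "third task evaluation container",
}


def container_for(mode: SimulationMode) -> str:
    """Return the container kind to prepare for a given simulation mode."""
    return CONTAINER_FOR_MODE[mode]
```

A single lookup table keeps the mode-to-container rule in one place, so adding a fifth mode would mean adding one enum member and one table entry.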
10. The method according to claim 8 or 9, characterized in that the method further comprises:

receiving a simulation result of at least one task use case from the third task evaluation container, wherein the simulation result is obtained by simulating the task use case according to the local simulation workflow, the local container calling configuration, and the simulation mode corresponding to the third task evaluation container;

wherein the task configuration comprises the local container calling configuration, the local container calling configuration comprises the calling sequence of the algorithm modules in a task paradigm and the hyperparameters corresponding to the algorithm modules, the task use case comprises the task environment, and the task environment comprises the task paradigm; and

displaying the simulation result of the at least one task use case.
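Claim 10 says the local simulation workflow follows a local container calling configuration: an ordered list of algorithm modules from the task paradigm plus per-module hyperparameters. One minimal way to picture this, assuming each module is a Python callable whose output feeds the next module (a chaining rule the claims do not specify), is shown below; `run_local_simulation` and the toy `scale` and `clip` modules are hypothetical.

```python
from typing import Any, Callable, Dict, List


def run_local_simulation(calling_sequence: List[str],
                         hyperparameters: Dict[str, Dict[str, Any]],
                         modules: Dict[str, Callable[..., Any]],
                         data: Any) -> Any:
    """Call algorithm modules in the configured order.

    Assumption: each module takes the previous module's output as its first
    argument and its own hyperparameters as keyword arguments.
    """
    result = data
    for name in calling_sequence:
        module = modules[name]                  # KeyError if the module is not registered
        params = hyperparameters.get(name, {})  # modules without hyperparameters get none
        result = module(result, **params)
    return result


# Toy modules for illustration only.
def scale(values: List[float], factor: float = 1.0) -> List[float]:
    return [v * factor for v in values]


def clip(values: List[float], upper: float = 1.0) -> List[float]:
    return [min(v, upper) for v in values]


if __name__ == "__main__":
    out = run_local_simulation(
        calling_sequence=["scale", "clip"],
        hyperparameters={"scale": {"factor": 2.0}, "clip": {"upper": 3.0}},
        modules={"scale": scale, "clip": clip},
        data=[1.0, 2.0],
    )
    print(out)  # [2.0, 3.0]
```

Because the calling sequence is data rather than code, swapping `["scale", "clip"]` for `["clip", "scale"]` changes the result (here to `[2.0, 4.0]`) without touching any module, which is the kind of reconfiguration the calling configuration appears intended to allow.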
11. The method according to any one of claims 1 to 10, characterized in that the type of task evaluation is any one of benchmark testing, certification, or competition.
12. The method according to claim 11, characterized in that, when the type of task evaluation is benchmark testing:

The task environment is a test environment,

The task object is a test object, and the test object is any one of a test algorithm, a test model, a test system or a test scenario,

The task paradigm is the test paradigm.
13. A task evaluation method for distributed collaborative artificial intelligence (AI), characterized in that the method is applied to a task cloud node and comprises:

receiving a first local management instruction from a management device, wherein the first local management instruction is determined according to a task use case of the distributed collaborative AI, the task use case is determined according to a first management instruction, the first management instruction comprises a management instruction for a task environment and/or a management instruction for a task object, and the task use case comprises the task environment and the task object; and

managing a first local control device according to the first local management instruction.
14. The method according to claim 13, characterized in that the first management instruction is used to add, delete, modify, or query the task use case; and

managing the first local control device is adding, deleting, modifying, or querying the first local control device.
15. The method according to claim 13 or 14, characterized in that, when managing the first local control device is adding the first local control device, the method further comprises:

the first local management instruction comprises the task use case and a first local container management configuration corresponding to the task cloud node; and

managing a first task evaluation container according to the task use case and the first local container management configuration, wherein the first local container management configuration comprises the first task evaluation container and the management instruction corresponding to the first task evaluation container.
16. The method according to claim 15, characterized in that the management instruction corresponding to the first task evaluation container is an add instruction, a delete instruction, a modify instruction, or a query instruction for the first task evaluation container, or an instruction for establishing communication between the first task evaluation container and an external task evaluation container, wherein the external task evaluation container belongs to a node other than the task cloud node.
17. The method according to claim 16, characterized in that, if the management instruction corresponding to the first task evaluation container is to add the first task evaluation container, a first task evaluation container corresponding to the task cloud node is added, wherein the first task evaluation container corresponding to the task cloud node comprises a first distributed workflow.
18. The method according to claim 17, further comprising:

when the evaluation result of the task use case is in the first task evaluation container, receiving the evaluation result of at least one task use case from the first task evaluation container;

displaying the evaluation result of the at least one task use case; and

sending the evaluation result of the at least one task use case to the management device.
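Claims 17 and 18 have the task cloud node read evaluation results out of its first task evaluation container, display them, and forward them to the management device. A minimal sketch follows, assuming results arrive as a mapping from a hypothetical use-case identifier to a numeric score; `collect_and_forward`, `display`, and `send_to_manager` are placeholder names, and the callables stand in for a real user interface and transport.

```python
from typing import Callable, Dict


def collect_and_forward(container_results: Dict[str, float],
                        display: Callable[[str], None],
                        send_to_manager: Callable[[Dict[str, float]], None]) -> int:
    """Display each result held in the evaluation container, then forward a copy.

    Returns the number of task use cases whose results were forwarded.
    """
    if not container_results:
        return 0  # nothing in the container yet, so nothing is displayed or sent
    for case_id, score in sorted(container_results.items()):
        display(f"{case_id}: {score:.3f}")
    send_to_manager(dict(container_results))
    return len(container_results)
```

Forwarding a copy rather than the container's own dictionary keeps later local updates from silently changing what the management device has already received.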
19. The method according to any one of claims 13 to 18, characterized in that the type of task evaluation is any one of benchmark testing, certification, or competition.
20. The method according to claim 19, characterized in that, when the type of task evaluation is benchmark testing:

The task environment is a test environment,

The task object is a test object, and the test object is any one of a test algorithm, a test model, a test system or a test scenario,

The task paradigm is the test paradigm.
21. A task evaluation method for distributed collaborative artificial intelligence (AI), characterized in that the method is applied to a task cloud node and comprises:

receiving a second local management instruction from a management device, wherein the second local management instruction is determined according to a task use case, the task use case is determined according to a second management instruction, the second management instruction comprises a management instruction for a task environment and/or a management instruction for a task object, and the task use case comprises the task environment and the task object; and

managing a second local control device according to the second local management instruction.
22. The method according to claim 21, characterized in that the second management instruction is used to add, delete, modify, or query the task use case; and

managing the second local control device is adding, deleting, modifying, or querying the second local control device.
23. The method according to claim 21 or 22, characterized in that, when managing the second local control device is adding the second local control device, the method further comprises:

the second local management instruction comprises the task use case and a second local container management configuration corresponding to the task cloud node; and

managing a second task evaluation container according to the task use case and the second local container management configuration, wherein the second local container management configuration comprises the second task evaluation container and the management instruction corresponding to the second task evaluation container.
24. The method according to claim 23, characterized in that the management instruction corresponding to the second task evaluation container is an add instruction, a delete instruction, a modify instruction, or a query instruction for the second task evaluation container, or an instruction for establishing communication between the second task evaluation container and an external task evaluation container, wherein the external task evaluation container belongs to a node other than the task cloud node.
25. The method according to claim 24, characterized in that, if the management instruction corresponding to the second task evaluation container is to add the second task evaluation container, a second task evaluation container corresponding to the task cloud node is added, wherein the second task evaluation container corresponding to the task cloud node comprises a second distributed workflow.
26. The method according to claim 25, further comprising:

when the evaluation result of the task use case is in the second task evaluation container, receiving the evaluation result of at least one task use case from the second task evaluation container;

displaying the evaluation result of the at least one task use case; and

sending the evaluation result of the at least one task use case to the management device.
27. The method according to any one of claims 21 to 26, characterized in that the type of task evaluation is any one of benchmark testing, certification, or competition.
28. The method according to claim 27, characterized in that, when the type of task evaluation is benchmark testing:

The task environment is a test environment,

The task object is a test object, and the test object is any one of a test algorithm, a test model, a test system or a test scenario,

The task paradigm is the test paradigm.
29. A management device for task evaluation of distributed collaborative artificial intelligence (AI), characterized in that the management device comprises a task use case management module and a communication module, wherein:

the communication module is configured to obtain a task configuration of the distributed collaborative AI and a task object of the distributed collaborative AI, wherein the task configuration comprises a configuration of a task environment of the distributed collaborative AI;

the communication module is further configured to receive a first management instruction according to the configuration of the task environment and the task object, wherein the first management instruction comprises a management instruction for the task environment and/or a management instruction for the task object; and

the task use case management module is configured to:

manage a task use case of the distributed collaborative AI according to the first management instruction, wherein the task use case comprises the task environment and the task object.
30. A first local control device for task evaluation of distributed collaborative artificial intelligence (AI), characterized in that the first local control device comprises a first container management module and a first communication module, wherein:

the first communication module is configured to receive a first local management instruction from a management device, wherein the first local management instruction is determined according to a task use case of the distributed collaborative AI, the task use case is determined according to a first management instruction, and the first management instruction comprises a management instruction for a task environment of the distributed collaborative AI and/or a management instruction for a task object of the distributed collaborative AI; and

the first container management module is configured to:

manage the first local control device according to the first local management instruction.
31. A second local control device for task evaluation of distributed collaborative artificial intelligence (AI), characterized in that the second local control device comprises a second container management module and a second communication module, wherein:

the second communication module is configured to receive a second local management instruction from a management device, wherein the second local management instruction is determined according to a task use case of the distributed collaborative AI, the task use case is determined according to a first management instruction, and the first management instruction comprises a management instruction for a task environment of the distributed collaborative AI and/or a management instruction for a task object of the distributed collaborative AI; and

the second container management module is configured to manage the second local control device according to the second local management instruction.
32. A task evaluation system for distributed collaborative artificial intelligence (AI), characterized in that the task evaluation system comprises the management device according to claim 29, the first local control device according to claim 30, and the second local control device according to claim 31.
33. A computer device, characterized in that the computer device comprises a processor and a memory, wherein:

the memory is configured to store computer-executable instructions; and

the processor is configured to execute the computer-executable instructions stored in the memory, so that the computer device performs the method according to any one of claims 1 to 12, the method according to any one of claims 13 to 20, or the method according to any one of claims 21 to 28.
34. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when run on one or more processors, causes a computer to perform the method according to any one of claims 1 to 12, the method according to any one of claims 13 to 20, or the method according to any one of claims 21 to 28.