WO2023231781A1 - Distributed collaborative ai task evaluation method, management apparatus, control apparatus and system - Google Patents

Distributed collaborative AI task evaluation method, management apparatus, control apparatus and system

Info

Publication number
WO2023231781A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
container
management
evaluation
test
Prior art date
Application number
PCT/CN2023/094843
Other languages
French (fr)
Chinese (zh)
Inventor
郑子木
杨锦
罗思奇
齐飞
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2023231781A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Definitions

  • Embodiments of the present application relate to the field of artificial intelligence, and more specifically, to a distributed collaborative AI task evaluation method, management device, control device and system.
  • Edge AI technology has data advantages and can reduce communication delays, so it can be used in tasks with strict latency requirements. Since the cloud side has computing power advantages and the edge side has the advantages of data and low latency, distributed edge-cloud collaboration has become a development trend for completing distributed collaborative AI tasks.
  • Embodiments of the present application provide a distributed collaborative AI task evaluation method, management device, control device and system.
  • Flexibly configuring the task use cases of distributed collaborative AI, and then flexibly managing the task evaluation containers of distributed collaborative AI, helps to obtain the execution status of different AI task use cases, which makes the distributed collaborative AI task processing architecture easy to deploy.
  • A method for task evaluation of distributed collaborative artificial intelligence (AI) is provided. The method is applied to the control node.
  • The method includes: obtaining the task configuration of the distributed collaborative AI and the task object of the distributed collaborative AI, where the task configuration includes the configuration of the task environment of the distributed collaborative AI; receiving a first management instruction according to the configuration of the task environment and the task object, where the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object; and managing task use cases of the distributed collaborative AI according to the first management instruction.
  • Task use cases include task environments and task objects.
  • Distributed collaborative AI means that different AI processes corresponding to the AI paradigm can be implemented in different containers arranged on the same device, or in different containers arranged on different devices.
  • The machine learning paradigm includes a training process and an inference process, where the training process and the inference process can be implemented in different containers arranged on the same device, or in different containers on different devices.
  • Task use cases corresponding to the paradigm of distributed collaborative artificial intelligence can be flexibly managed, which allows service developers to manage task use cases according to business needs in actual scenarios, facilitates the processing of different task use cases, and helps to obtain the execution status of different AI task use cases, thereby making the edge-cloud collaborative distributed AI task architecture easy to deploy.
  • managing distributed collaborative AI task use cases includes: adding task use cases, deleting task use cases, modifying task use cases, or querying task use cases.
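The four management operations on task use cases can be sketched as a small in-memory manager. This is only an illustrative sketch: the class names, the dictionary-shaped instruction, and its keys (`action`, `name`, `environment`, `task_object`) are assumptions made for the example and do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass
class TaskUseCase:
    """A task use case pairs a task environment with a task object."""
    name: str
    environment: dict   # hypothetical shape, e.g. {"paradigm": "..."}
    task_object: str    # hypothetical identifier of an algorithm, model, or system


@dataclass
class TaskUseCaseManager:
    """Hypothetical in-memory store supporting add, delete, modify, and query."""
    _cases: dict = field(default_factory=dict)

    def handle(self, instruction: dict):
        action = instruction["action"]
        name = instruction["name"]
        if action == "add":
            case = TaskUseCase(name, instruction["environment"],
                               instruction["task_object"])
            self._cases[name] = case
            return case
        if action == "delete":
            return self._cases.pop(name, None)
        if action == "modify":
            case = self._cases[name]
            # An instruction may target the environment, the object, or both.
            if "environment" in instruction:
                case.environment = instruction["environment"]
            if "task_object" in instruction:
                case.task_object = instruction["task_object"]
            return case
        if action == "query":
            return self._cases.get(name)
        raise ValueError(f"unknown action: {action}")


manager = TaskUseCaseManager()
manager.handle({"action": "add", "name": "uc1",
                "environment": {"paradigm": "example"}, "task_object": "algo-a"})
manager.handle({"action": "modify", "name": "uc1", "task_object": "algo-b"})
print(manager.handle({"action": "query", "name": "uc1"}))
```

Keeping the environment and the object as separate fields mirrors the statement that a management instruction may address the task environment and/or the task object.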
  • When the task evaluation mode is a multi-node task evaluation mode, the method further includes: generating, according to the task use case and the global container management configuration, a first local management instruction for the first local control device and a second local management instruction for the second local control device, where the task configuration includes the global container management configuration, and the global container management configuration includes the first local control device, the instruction action corresponding to the first local management instruction, the second local control device, and the instruction action corresponding to the second local management instruction; sending the first local management instruction to the first local control device; and sending the second local management instruction to the second local control device.
  • the multi-node task evaluation mode can perform multi-node distributed collaboration at task edge nodes and task cloud nodes through control cloud nodes.
  • The instruction action corresponding to the first local management instruction is an add instruction, a delete instruction, or a modify instruction for the first local control device.
  • The instruction action corresponding to the second local management instruction is an add instruction, a delete instruction, a modify instruction, or a query instruction for the second local control device.
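As a rough illustration of the multi-node mode, the sketch below turns a global container management configuration into one local management instruction per local control device. The configuration is modelled as a plain mapping from device name to instruction action; that shape, the device names, and the function name are assumptions for the example only.

```python
from dataclasses import dataclass

# Instruction actions named in the text; the string labels are assumptions.
VALID_ACTIONS = {"add", "delete", "modify", "query"}


@dataclass(frozen=True)
class LocalManagementInstruction:
    target_device: str   # the local control device that should receive it
    action: str          # the instruction action for that device
    use_case: str        # name of the task use case being evaluated


def generate_local_instructions(use_case: str, global_config: dict) -> list:
    """Build one instruction per local control device listed in the config."""
    instructions = []
    for device, action in global_config.items():
        if action not in VALID_ACTIONS:
            raise ValueError(f"unsupported action {action!r} for {device}")
        instructions.append(LocalManagementInstruction(device, action, use_case))
    return instructions


# Hypothetical configuration: one control device on a task cloud node,
# one on a task edge node.
global_config = {"cloud-control-1": "add", "edge-control-1": "add"}
for instruction in generate_local_instructions("uc1", global_config):
    print(instruction)
```

In a real deployment each instruction would then be sent to its target device; the transport is deliberately left out here because the text does not specify one.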
  • the method further includes: receiving an evaluation result of at least one task use case from the first local control device or the second local control device; and displaying the evaluation result of at least one task use case.
  • The corresponding task evaluation containers can be managed according to different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider can be kept on the edge node. Obtaining the evaluation results of different task use cases helps to obtain the execution status of those use cases, which makes the edge-cloud collaborative distributed AI task architecture easy to deploy.
  • When the task evaluation mode is a single-node task evaluation simulation mode, the simulation switch configuration parameter in the task configuration is set to start.
  • The method also includes: managing a third task evaluation container according to the task use case and the local container management configuration, where the third task evaluation container includes a local simulation workflow, the task configuration includes the local container management configuration, and the local container management configuration includes the AI task evaluation container and the management actions corresponding to the AI task evaluation container.
  • The management actions corresponding to the third task evaluation container include adding, deleting, modifying, or querying the third task evaluation container.
  • the single-node task evaluation simulation mode can be a single-node task evaluation simulation on a cloud node, or a single-node task evaluation simulation on an edge node.
  • The task use case management module manages task use cases (for example, creating AI task use cases) through task configurations and task objects. The container management module manages task evaluation containers (for example, adding a new AI task container) through task use cases and simulation modes, and performs task evaluation simulation through the simulation workflow in the task evaluation container to obtain simulation results.
  • Task use cases can be flexibly configured according to task scenarios; that is, the corresponding task use cases can be managed according to different AI paradigms (for example, by creating the corresponding task use cases).
  • Managing the third task evaluation container according to the task use case and the local container management configuration includes: adding the third task evaluation container according to the task use case and the local container management configuration, where the third task evaluation container includes the local simulation workflow.
  • The task configuration also includes a simulation mode corresponding to the third task evaluation container. The simulation mode is any one of the following: simulating and testing algorithm performance; simulating and testing system performance; simulating and testing both algorithm performance and system performance; or simulating and testing system unit performance.
  • When the simulation mode is simulating and testing algorithm performance, the method also includes creating an algorithm pseudo-container in the corresponding third task evaluation container. When the simulation mode is simulating and testing system performance, the method also includes creating a system pseudo-container in the corresponding third task evaluation container. When the simulation mode is simulating and testing both algorithm performance and system performance, the method also includes creating a real container in the corresponding third task evaluation container. When the simulation mode is simulating and testing system unit performance, the method continues to use the corresponding third task evaluation container.
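The relationship between the four simulation modes and what is created inside the third task evaluation container can be sketched as a lookup. The enum, the string labels, and the list used to represent the evaluation container's contents are all hypothetical stand-ins, not names from the application.

```python
from enum import Enum


class SimulationMode(Enum):
    ALGORITHM = "algorithm performance"
    SYSTEM = "system performance"
    ALGORITHM_AND_SYSTEM = "algorithm and system performance"
    SYSTEM_UNIT = "system unit performance"


# What each mode creates inside the evaluation container (labels are assumed).
INNER_CONTAINER = {
    SimulationMode.ALGORITHM: "algorithm-pseudo-container",
    SimulationMode.SYSTEM: "system-pseudo-container",
    SimulationMode.ALGORITHM_AND_SYSTEM: "real-container",
}


def prepare_evaluation_container(mode: SimulationMode, inner: list) -> list:
    """Return the contents of the task evaluation container for `mode`."""
    if mode is SimulationMode.SYSTEM_UNIT:
        # System unit testing keeps using the existing evaluation container.
        return list(inner)
    return list(inner) + [INNER_CONTAINER[mode]]


for mode in SimulationMode:
    print(mode.value, "->", prepare_evaluation_container(mode, []))
```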
  • The method further includes: receiving a simulation result of at least one task use case from the third task evaluation container, where the simulation result is obtained by the third task evaluation container simulating the task use case according to the local container call configuration and the simulation mode corresponding to the third task evaluation container.
  • The task configuration includes the local container call configuration, and the local container call configuration includes the calling sequence of the algorithm modules in the task paradigm and the hyperparameters corresponding to each algorithm module.
  • The task use case includes the task environment, and the task environment includes the task paradigm.
  • The simulation result of the at least one task use case is displayed.
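A local container call configuration, as described above, carries a calling sequence for the algorithm modules and the hyperparameters of each module. The sketch below shows one way such a configuration could drive a simulation workflow. The configuration keys (`sequence`, `hyperparameters`) and the two toy modules are assumptions chosen for illustration.

```python
def scale(values, factor):
    """Toy algorithm module: multiply every value by `factor`."""
    return [v * factor for v in values]


def threshold(values, cutoff):
    """Toy algorithm module: keep values that reach `cutoff`."""
    return [v for v in values if v >= cutoff]


def run_simulation(call_config: dict, modules: dict, data):
    """Call the modules in the configured order with their hyperparameters."""
    result = data
    for name in call_config["sequence"]:
        params = call_config["hyperparameters"].get(name, {})
        result = modules[name](result, **params)
    return result


# Hypothetical call configuration: module order plus per-module hyperparameters.
call_config = {
    "sequence": ["scale", "threshold"],
    "hyperparameters": {"scale": {"factor": 2}, "threshold": {"cutoff": 4}},
}
modules = {"scale": scale, "threshold": threshold}

print(run_simulation(call_config, modules, [1, 2, 3]))  # [4, 6]
```

Keeping the order and the hyperparameters in configuration rather than in code is what lets the same evaluation container run different paradigms without being rebuilt.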
  • the type of task evaluation is any one of benchmark testing, certification, or competition.
  • the task environment is a test environment
  • the task object is a test object
  • the test object is any one of a test algorithm, a test model, a test system, or a test scenario.
  • the task paradigm is the test paradigm.
  • A method for task evaluation of distributed collaborative artificial intelligence (AI) is provided. The method is applied to task cloud nodes.
  • The method includes: receiving a first local management instruction from a management device, where the first local management instruction is determined according to the task use case of the distributed collaborative AI, the task use case is determined according to the first management instruction, the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object, and the task use case includes the task environment and the task object; and managing the first local control device according to the first local management instruction.
  • The corresponding task evaluation containers can be managed according to different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider can be kept on the edge node. Obtaining the evaluation results of different task use cases helps to obtain the execution status of those use cases, which makes the edge-cloud collaborative distributed AI task architecture easy to deploy.
  • The first management instruction is used to add, delete, modify, or query a task use case. Managing the first local control device means adding, deleting, modifying, or querying the first local control device.
  • When managing the first local control device means adding the first local control device, the first local management instruction includes the task use case and the first local container management configuration corresponding to the task cloud node, and the method further includes: managing the first task evaluation container according to the task use case and the first local container management configuration, where the first local container management configuration includes the first task evaluation container and the management instruction corresponding to the first task evaluation container.
  • The management instruction corresponding to the first task evaluation container is an add instruction, a delete instruction, a modify instruction, or a query instruction for the first task evaluation container, or an instruction to implement communication between the first task evaluation container and an external task evaluation container, where the external task evaluation container belongs to a node other than the task cloud node.
  • When the management instruction is an add instruction for the first task evaluation container, the first task evaluation container corresponding to the task cloud node is added, where that container includes a first distributed workflow.
  • The method further includes: when the evaluation result of the task use case is in the first task evaluation container, receiving the evaluation result of at least one task use case from the first task evaluation container; displaying the evaluation result of the at least one task use case; and sending the evaluation result of the at least one task use case to the management device.
  • By displaying the evaluation results after evaluating different task use cases, the user can intuitively compare the evaluation results of the task objects and obtain objective evaluation results for the task use cases, which helps the user choose the appropriate target task object.
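Displaying evaluation results so that task objects can be compared side by side could look like the following sketch. The result records, the metric name `accuracy`, its values, and the assumption that a higher score is better are all invented for the example.

```python
def rank_results(results: list, metric: str) -> list:
    """Sort evaluation results so the highest-scoring task object comes first.

    Assumes a higher metric value is better, which may not hold for every metric.
    """
    return sorted(results, key=lambda row: row[metric], reverse=True)


def format_table(ranked: list, metric: str) -> str:
    """Render a plain-text comparison table, one task object per row."""
    lines = [f"{'task object':<14}{metric}"]
    for row in ranked:
        lines.append(f"{row['task_object']:<14}{row[metric]:.2f}")
    return "\n".join(lines)


# Hypothetical evaluation results returned for two task objects.
results = [
    {"task_object": "algo-a", "accuracy": 0.81},
    {"task_object": "algo-b", "accuracy": 0.87},
]
ranked = rank_results(results, "accuracy")
print(format_table(ranked, "accuracy"))
```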
  • the type of task evaluation is any one of benchmark testing, certification, or competition.
  • the task environment is a test environment
  • the task object is a test object
  • the test object is any one of a test algorithm, a test model, a test system, or a test scenario.
  • the task paradigm is the test paradigm.
  • A method for task evaluation of distributed collaborative artificial intelligence (AI) is provided. The method is applied to task edge nodes.
  • The method includes: receiving a second local management instruction from the management device, where the second local management instruction is determined according to the task use case, the task use case is determined according to the second management instruction, the second management instruction includes a management instruction for the task environment and/or a management instruction for the task object, and the task use case includes the task environment and the task object; and managing the second local control device according to the second local management instruction.
  • The corresponding task evaluation containers can be managed according to different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider can be kept on the edge node. Obtaining the evaluation results of different task use cases helps to obtain the execution status of those use cases, which makes the edge-cloud collaborative distributed AI task architecture easy to deploy.
  • The second management instruction is used to add, delete, modify, or query a task use case. Managing the second local control device means adding, deleting, modifying, or querying the second local control device.
  • When managing the second local control device means adding the second local control device, the second local management instruction includes the task use case and the second local container management configuration corresponding to the task cloud node, and the method further includes: managing the second task evaluation container according to the task use case and the second local container management configuration, where the second local container management configuration includes the second task evaluation container and the management instruction corresponding to the second task evaluation container.
  • The management instruction corresponding to the second task evaluation container is an add instruction, a delete instruction, a modify instruction, or a query instruction for the second task evaluation container, or an instruction to implement communication between the second task evaluation container and an external task evaluation container, where the external task evaluation container belongs to a node other than the task cloud node.
  • When the management instruction is an add instruction for the second task evaluation container, the second task evaluation container corresponding to the task cloud node is added, where that container includes a second distributed workflow.
  • The method further includes: when the evaluation result of the task use case is in the second task evaluation container, receiving the evaluation result of at least one task use case from the second task evaluation container; displaying the evaluation result of the at least one task use case; and sending the evaluation result of the at least one task use case to the management device.
  • the type of task evaluation is any one of benchmark testing, certification, or competition.
  • the task environment is a test environment
  • the task object is a test object
  • the test object is any one of a test algorithm, a test model, a test system, or a test scenario.
  • the task paradigm is the test paradigm.
  • A management device for task evaluation of distributed collaborative artificial intelligence (AI) is provided. The device includes a task use case management module and a communication module.
  • The communication module is used to: obtain the task configuration of the distributed collaborative AI and the task object of the distributed collaborative AI, where the task configuration includes the configuration of the task environment of the distributed collaborative AI; and receive a first management instruction according to the configuration of the task environment and the task object, where the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object. The task use case management module is used to manage task use cases of the distributed collaborative AI according to the first management instruction.
  • The task use cases include task environments and task objects.
  • distributed collaborative AI means that different AI processes corresponding to the AI paradigm can be implemented in different containers arranged on the same device, or different AI processes can be implemented in different containers arranged on different devices.
  • the machine learning paradigm includes a training process and an inference process, where the training process and the inference process can be implemented in different containers arranged on the same device, or the training process and the inference process can be implemented in different containers on different devices.
  • Task use cases corresponding to the paradigm of distributed collaborative artificial intelligence can be flexibly managed, which allows service developers to manage task use cases according to business needs in actual scenarios, facilitates the processing of different task use cases, and helps to obtain the execution status of different AI task use cases, thereby making the edge-cloud collaborative distributed AI task architecture easy to deploy.
  • managing distributed collaborative AI task use cases includes: adding task use cases, deleting task use cases, modifying task use cases, or querying task use cases.
  • When the task evaluation mode is a multi-node task evaluation mode, the management device is arranged at the control cloud node.
  • The management device also includes a container management module. The container management module is configured to generate, according to the task use case and the global container management configuration, a first local management instruction for the first local control device and a second local management instruction for the second local control device, where the task configuration includes the global container management configuration, and the global container management configuration includes the first local control device, the instruction action corresponding to the first local management instruction, the second local control device, and the instruction action corresponding to the second local management instruction. The communication module is used to send the first local management instruction to the first local control device and to send the second local management instruction to the second local control device.
  • the multi-node task evaluation mode can perform multi-node distributed collaboration at task edge nodes and task cloud nodes through control cloud nodes.
  • The instruction action corresponding to the first local management instruction is an add instruction, a delete instruction, or a modify instruction for the first local control device.
  • The instruction action corresponding to the second local management instruction is an add instruction, a delete instruction, a modify instruction, or a query instruction for the second local control device.
  • The device further includes a second result display module. The communication module is configured to receive the execution status of at least one task use case from the first local control device or the second local control device, and the second result display module is used to display the execution status of the at least one task use case.
  • The corresponding task evaluation containers can be managed according to different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider can be kept on the edge node. Obtaining the evaluation results of different task use cases helps to obtain the execution status of those use cases, which makes the edge-cloud collaborative distributed AI task architecture easy to deploy.
  • When the task evaluation mode is a single-node task evaluation simulation mode, the simulation switch configuration parameter in the task configuration is set to start, the management device is arranged on a single node, and the management device also includes a container management module.
  • The container management module is used to manage the third task evaluation container according to the task use case and the local container management configuration, where the third task evaluation container includes the local simulation workflow, the task configuration includes the local container management configuration, and the local container management configuration includes the AI task evaluation container and the management actions corresponding to the AI task evaluation container.
  • The management actions corresponding to the third task evaluation container include adding, deleting, modifying, or querying the third task evaluation container.
  • The single-node task evaluation simulation mode can perform single-node task evaluation simulation on the cloud node or on the edge node; that is, the management device can be arranged on the edge node or on the cloud node.
  • The task use case management module manages task use cases (for example, creating AI task use cases) through task configurations and task objects. The container management module manages task evaluation containers (for example, adding a new AI task container) through task use cases and simulation modes, and performs task evaluation simulation through the simulation workflow in the task evaluation container to obtain the simulation results.
  • Task use cases can be flexibly configured according to task scenarios; that is, the corresponding task use cases can be managed according to different AI paradigms (for example, by creating the corresponding task use cases).
  • The container management module is used to add a third task evaluation container according to the task use case and the local container management configuration, where the third task evaluation container includes the local simulation workflow.
  • The task configuration also includes a simulation mode corresponding to the third task evaluation container. The simulation mode is any one of the following: simulating and testing algorithm performance; simulating and testing system performance; simulating and testing both algorithm performance and system performance; or simulating and testing system unit performance.
  • When the simulation mode is simulating and testing algorithm performance, the container management module is also used to create an algorithm pseudo-container in the corresponding third task evaluation container. When the simulation mode is simulating and testing system performance, the container management module is also used to create a system pseudo-container in the corresponding third task evaluation container. When the simulation mode is simulating and testing both algorithm performance and system performance, the container management module is also used to create a real container in the corresponding third task evaluation container. When the simulation mode is simulating and testing system unit performance, the container management module is also used to inherit the corresponding third task evaluation container.
  • The management device further includes a result display module. The communication module is configured to receive the simulation result of at least one task use case from the third task evaluation container, where the simulation result is obtained by the third task evaluation container simulating the task use case according to the local container call configuration and the simulation mode corresponding to the third task evaluation container.
  • The task configuration includes the local container call configuration, and the local container call configuration includes the calling sequence of the algorithm modules in the task paradigm and the hyperparameters corresponding to each algorithm module.
  • The task use case includes a task environment, and the task environment includes a task paradigm.
  • The result display module is used to display the simulation result of the at least one task use case.
  • the type of task evaluation is any one of benchmark testing, certification, or competition.
  • the task environment is a test environment
  • the task object is a test object
  • the test object is a test algorithm, a test model, and a test system.
  • the task paradigm is the test paradigm.
  • A first local control device for task evaluation of distributed collaborative artificial intelligence (AI) is provided. The first local control device is arranged at a task cloud node.
  • The first local control device includes a first communication module and a first container management module.
  • The first communication module is used to receive the first local management instruction from the management device, where the first local management instruction is determined according to the task use case of the distributed collaborative AI, the task use case is determined according to the first management instruction, the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object, and the task use case includes the task environment and the task object.
  • The first container management module is used to manage the first local control device according to the first local management instruction.
  • the corresponding task evaluation containers can be managed through different paradigms to realize the evaluation of different task use cases.
  • when the data provider and the algorithm provider are different parties, it can be ensured that the data supplied by the data provider does not leave the edge node; obtaining the evaluation results of different task use cases helps to obtain the execution status of those task use cases, thereby making the edge-cloud collaborative distributed AI task architecture easy to deploy.
  • the first management instruction is used to add a task use case, delete a task use case, modify a task use case, or query a task use case; managing the first local control device is to add a first local control device, delete the first local control device, modify the first local control device, or query the first local control device.
  • when managing the first local control device is to add a first local control device, the first local management instruction includes the task use case and a first local container management configuration corresponding to the task cloud node; the first container management module is also used to manage the first task evaluation container according to the task use case and the first local container management configuration.
  • the first local container management configuration includes the first task evaluation container and the management instruction corresponding to the first task evaluation container.
  • the management instruction corresponding to the first task evaluation container is an instruction to add the first task evaluation container, an instruction to delete the first task evaluation container, an instruction to modify the first task evaluation container, an instruction to query the first task evaluation container, or an instruction to implement communication between the first task evaluation container and an external task evaluation container, where the external task evaluation container belongs to a node other than the task cloud node.
  • the first container management module is also used to add a first task evaluation container corresponding to the task cloud node, where the first AI task evaluation container corresponding to the task cloud node includes a first distributed workflow.
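  • the add, delete, modify, and query instructions for a task evaluation container can be illustrated by a small dispatcher. This is a sketch under stated assumptions: `ContainerOp` and `LocalContainerManager` are hypothetical names, containers are modeled as plain dictionaries, and the instruction for communication with an external task evaluation container is omitted.

```python
from enum import Enum, auto
from typing import Dict, Optional


class ContainerOp(Enum):
    """Management instructions for a task evaluation container."""
    ADD = auto()
    DELETE = auto()
    MODIFY = auto()
    QUERY = auto()


class LocalContainerManager:
    """Hypothetical stand-in for a container management module on one node."""

    def __init__(self) -> None:
        # Container name -> container configuration (assumed representation).
        self.containers: Dict[str, dict] = {}

    def handle(self, op: ContainerOp, name: str,
               config: Optional[dict] = None) -> Optional[dict]:
        if op is ContainerOp.ADD:
            self.containers[name] = dict(config or {})
        elif op is ContainerOp.DELETE:
            self.containers.pop(name, None)
        elif op is ContainerOp.MODIFY:
            # Raises KeyError if the container does not exist.
            self.containers[name].update(config or {})
        elif op is ContainerOp.QUERY:
            return self.containers.get(name)
        return None


manager = LocalContainerManager()
manager.handle(ContainerOp.ADD, "first-task-evaluation-container",
               {"workflow": "first distributed workflow"})
print(manager.handle(ContainerOp.QUERY, "first-task-evaluation-container"))
```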
  • the first local control device further includes a first result display module; when the evaluation result of the task use case is in the first task evaluation container, the first communication module is also used to receive the evaluation result of at least one task use case from the first AI task evaluation container.
  • the first result display module is used to display the evaluation result of at least one task use case and to send the evaluation result of at least one task use case to the management module.
  • by displaying the evaluation results obtained after evaluating different task use cases, the user can intuitively compare the evaluation results of the task objects and thereby obtain objective evaluation results of the task use cases, which helps the user choose an appropriate target task object.
  • the type of task evaluation is any one of benchmark testing, certification, or competition.
  • the task environment is a test environment
  • the task object is a test object
  • the test object is any one of a test algorithm, a test model, a test system, or a test scenario.
  • the task paradigm is the test paradigm.
  • a second local control device for task evaluation of distributed collaborative artificial intelligence (AI) is provided.
  • the second local control device is arranged at a task edge node.
  • the second local control device includes a second communication module and a second container management module.
  • the second communication module is configured to receive a second local management instruction from the management device, where the second local management instruction is determined based on the task use case, the task use case is determined based on a second management instruction, and the second management instruction includes a management instruction for the task environment and/or a management instruction for the task object.
  • the task use case includes a task environment and a task object; the second container management module is used to manage the second local control device according to the second local management instruction.
  • the corresponding task evaluation containers can be managed through different paradigms to realize the evaluation of different task use cases.
  • when the data provider and the algorithm provider are different parties, it can be ensured that the data supplied by the data provider does not leave the edge node; obtaining the evaluation results of different task use cases helps to obtain the execution status of those task use cases, thereby making the edge-cloud collaborative distributed AI task architecture easy to deploy.
  • the second management instruction is used to add a task use case, delete a task use case, modify a task use case, or query a task use case; managing the second local control device is to add a second local control device, delete the second local control device, modify the second local control device, or query the second local control device.
  • when managing the second local control device is to add a second local control device, the second local management instruction includes the task use case and the second local container management configuration corresponding to the task cloud node; the second container management module is also used to manage the second task evaluation container according to the task use case and the second local container management configuration.
  • the second local container management configuration includes the second task evaluation container and the management instruction corresponding to the second task evaluation container.
  • the management instruction corresponding to the second task evaluation container is an instruction to add the second task evaluation container, an instruction to delete the second task evaluation container, an instruction to modify the second task evaluation container, an instruction to query the second task evaluation container, or an instruction to implement communication between the second task evaluation container and an external task evaluation container, where the external task evaluation container belongs to a node other than the task cloud node.
  • the second container management module is also used to add a second task evaluation container corresponding to the task cloud node, where the second AI task evaluation container corresponding to the task cloud node includes a second distributed workflow.
  • the second local control device further includes a second result display module; when the evaluation result of the task use case is in the second task evaluation container, the second communication module is configured to receive the evaluation result of at least one task use case from the second AI task evaluation container; the second result display module is used to display the evaluation result of at least one task use case and to send the evaluation result of at least one task use case to the management device.
  • by displaying the evaluation results obtained after evaluating different task use cases, the user can intuitively compare the evaluation results of the task objects and thereby obtain objective evaluation results of the task use cases, which helps the user choose an appropriate target task object.
  • the type of task evaluation is any one of benchmark testing, certification, or competition.
  • the task environment is a test environment
  • the task object is a test object
  • the test object is any one of a test algorithm, a test model, a test system, or a test scenario.
  • the task paradigm is the test paradigm.
  • a distributed collaborative artificial intelligence (AI) task evaluation system is provided, which includes the management device in any possible implementation of the device design of the fourth aspect, and the first local control device in any possible implementation of the device design of the fifth aspect, or the second local control device in any possible implementation of the device design of the sixth aspect.
  • in an eighth aspect, a computer device is provided, which includes a memory and a processor.
  • the memory is used to store programs; the processor is used to execute the programs stored in the memory.
  • the processor is configured to execute the method in the first aspect or any implementation of the first aspect, or the method in the second aspect or any implementation of the second aspect, or the method in the third aspect or any implementation of the third aspect.
  • the processor in the eighth aspect above can be either a central processing unit (CPU) or a combination of a CPU and a neural network computing processor.
  • the neural network computing processor here can include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), etc.
  • the TPU is an application-specific integrated circuit for artificial intelligence acceleration, fully customized by Google for machine learning.
  • embodiments of the present application provide a computer program product.
  • the computer program product includes computer program code.
  • when the computer program code is run on a computer, it causes the computer to execute the method in any possible implementation of the method design of the first aspect, or the method in any possible implementation of the method design of the second aspect, or the method in any possible implementation of the method design of the third aspect.
  • embodiments of the present application provide a computer-readable medium.
  • the computer-readable medium stores program code.
  • when the program code is run on a computer, it causes the computer to execute the method in any one of the above method designs of the first aspect.
  • in an eleventh aspect, a chip is provided, which includes a processor and a data interface.
  • the processor reads instructions stored in a memory through the data interface and executes the method in any implementation of the first aspect or the second aspect.
  • the chip may further include a memory, in which instructions are stored, and the processor is configured to execute the instructions stored in the memory.
  • the processor is configured to execute the method in any implementation manner of the first aspect or the second aspect.
  • the above-mentioned chip can specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • Figure 1 is a schematic diagram of an artificial intelligence main body framework provided by an embodiment of the present application.
  • Figure 2A is a schematic diagram of the architecture of a traditional edge-cloud collaborative task method provided by an embodiment of the present application
  • Figure 2B is a schematic diagram of the architecture of another traditional edge-cloud collaborative task method provided by an embodiment of the present application.
  • Figure 3A is a schematic diagram of a distributed collaborative AI task evaluation architecture provided by an embodiment of the present application.
  • Figure 3B is a schematic diagram of another distributed collaborative AI task evaluation architecture provided by an embodiment of the present application.
  • Figure 3C is a schematic diagram of another distributed collaborative AI task architecture provided by an embodiment of the present application.
  • Figure 4 is a flowchart of a distributed collaborative AI task evaluation method provided by an embodiment of the present application.
  • Figure 5 is a schematic flow chart of a distributed collaborative AI benchmark test simulation method provided by an embodiment of the present application.
  • Figure 6 is a schematic flowchart of a benchmark test process for distributed edge-cloud collaborative AI provided by an embodiment of the present application
  • Figure 7 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation system provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation and management device provided by an embodiment of the present application.
  • Figure 1 is a schematic diagram of an artificial intelligence main frame provided by an embodiment of the present application.
  • the main frame describes the overall workflow of the artificial intelligence system and is suitable for general needs in the field of artificial intelligence.
  • Intelligent information chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensation process of "data-information-knowledge-wisdom".
  • the "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the provision and processing technology implementation) to the industrial ecological process of the system.
  • Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
  • the infrastructure can communicate with the outside through sensors, and the computing power of the infrastructure can be provided by smart chips.
  • the smart chip here can be a hardware acceleration chip such as a central processing unit (CPU), a neural network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
  • the basic platform of infrastructure can include distributed computing framework and network related platform guarantees and support, and can include cloud storage and computing, interconnection networks, etc.
  • data can be obtained through sensors and external communication, and then the data can be provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence.
  • This data involves graphics, images, voice, text, sequences, and also involves IoT data of traditional equipment, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • a sequence can be understood as a data sequence, most commonly sequential (time-ordered) data, such as weather forecast data (temperature, wind direction, etc.) over a period of time, stock market data, or physiological data sequences such as changes in human blood sugar.
  • the above data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other processing methods.
  • machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of further data processing, such as an algorithm or a general system, for example translation, text analysis, computer vision processing, speech recognition, and image recognition.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Their application fields mainly include intelligent manufacturing, intelligent transportation, smart home, smart medical care, smart security, autonomous driving, safe city, smart terminal, etc.
  • Embodiments of the present application can be applied to many fields in artificial intelligence, such as smart manufacturing, smart transportation, smart home, smart medical care, smart security, autonomous driving, safe cities and other fields.
  • Edge devices can be understood as any device with computing resources and network resources other than cloud-side devices.
  • the edge device can be a client device or a device between the cloud server and the client device.
  • a mobile phone can be an edge device
  • a sensor can be an edge device
  • a gateway can be an edge device between the smart home terminal and the cloud server.
  • edge devices are designed to analyze or process data close to the source of the data. Because the data does not need to be transferred elsewhere, network traffic and processing delays are reduced.
  • the edge device in the embodiment of the present application may be a mobile phone, a tablet personal computer (TPC), a media player, a smart home device, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a camcorder, a smart watch, a wearable device, a self-driving vehicle, or another device with computing capabilities. It can be understood that the embodiments of the present application do not limit the specific form of the edge device.
  • Edge AI originates from edge computing.
  • Edge computing uses edge devices to process data from data generation sources, which helps reduce the processing load of the overall cloud-edge collaboration system and reduce data delays.
  • Edge AI processes AI algorithms locally on edge devices, processing and analyzing data from data generation sources without the need for streaming or cloud-side data storage.
  • AI paradigm is an AI process recognized by the industry or academia.
  • the AI processing process framework remains unchanged, and the specific algorithms in the processing process framework can be replaced.
  • taking the edge-cloud collaborative AI paradigm as an example: in the training and reasoning of image classification models, if the image classification model is trained on the cloud side and the edge side completes the image classification inference process using the trained model, then "cloud-side training, edge-side reasoning" is an edge-cloud collaborative AI paradigm.
  • specific training methods or reasoning methods are replaceable, but the process of "cloud-side training, edge-side reasoning" remains unchanged.
  • the "cloud side" can also be called the cloud, and the "edge side" can also be called the edge device side; other names are also possible, which is not limited in the embodiments of this application.
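  • the idea that the process of a paradigm stays fixed while the specific algorithm is replaceable can be sketched as follows. This is an illustrative assumption only: `run_cloud_train_edge_infer` and `threshold_trainer` are hypothetical names, both steps run in one process, and the toy trainer merely stands in for any replaceable training method.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], int]
Trainer = Callable[[Sequence[float], Sequence[int]], Model]


def run_cloud_train_edge_infer(train_on_cloud: Trainer,
                               train_x: Sequence[float],
                               train_y: Sequence[int],
                               edge_inputs: Sequence[float]) -> List[int]:
    """Fixed process: train on the cloud side, then infer on the edge side."""
    model = train_on_cloud(train_x, train_y)   # cloud-side step
    return [model(x) for x in edge_inputs]     # edge-side step


def threshold_trainer(xs: Sequence[float], ys: Sequence[int]) -> Model:
    """Toy replaceable algorithm: threshold at the midpoint of the class means."""
    class0 = [x for x, y in zip(xs, ys) if y == 0]
    class1 = [x for x, y in zip(xs, ys) if y == 1]
    cut = (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2
    return lambda x: int(x > cut)


predictions = run_cloud_train_edge_infer(
    threshold_trainer, [0.0, 1.0, 4.0, 5.0], [0, 0, 1, 1], [0.5, 4.5])
print(predictions)
```

  • swapping `threshold_trainer` for another trainer changes the algorithm without changing the "cloud-side training, edge-side reasoning" process itself.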
  • the test scenario is a business scenario that meets the specific application of edge devices.
  • the test scenario can be represented by business scenario description, data set settings, data feature settings, data label settings, related indicators and standards.
  • for example, the test scenario of the test paradigm can be a vehicle re-identification application scenario, the relevant indicator can be the mean average precision (mAP) of vehicle re-identification, and the relevant standard can be the qualification standard for that indicator; for example, the qualifying standard for vehicle re-identification is an mAP greater than or equal to 0.95.
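  • a test scenario with its indicator and qualifying standard can be represented compactly. The sketch below assumes a hypothetical `EvalScenario` type; only the scenario name, the mAP indicator, and the 0.95 threshold come from the text, and the measured value is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalScenario:
    """Hypothetical description of a test scenario and its qualifying standard."""
    name: str
    metric: str
    passing_threshold: float

    def is_qualified(self, measured: float) -> bool:
        # Qualified when the measured indicator meets or exceeds the standard.
        return measured >= self.passing_threshold


vehicle_reid = EvalScenario("vehicle re-identification", "mAP", 0.95)
print(vehicle_reid.is_qualified(0.96))  # 0.96 is an illustrative measured value
```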
  • the test object refers to the target instance of the test, which can be an algorithm, model, system, data set or scenario, etc.
  • for example, in the vehicle re-identification test scenario, there is a series of vehicle re-identification algorithms.
  • the test object can be this series of vehicle re-identification algorithms, and the algorithm that performs best in the vehicle re-identification test scenario can then be obtained from the series.
  • alternatively, the test object is a series of test scenarios; for a specific algorithm, the best test scenario is obtained from this series of test scenarios.
  • the test environment is the configuration or constraints required for edge-cloud collaborative distributed AI testing.
  • the configuration can include the resource configuration and AI algorithm configuration required for testing.
  • resource configuration can include CPU core number configuration, transmission bandwidth configuration, etc.
  • AI algorithm configuration can include business data set configuration, algorithm accuracy evaluation indicator configuration, AI algorithm test paradigm, etc.
  • the testing paradigm of the AI algorithm can also be called the AI paradigm.
  • for example, the incremental testing paradigm includes a training process and an inference process.
  • the training process includes the initialization training module and the model update module, and the inference process includes the difficult example identification module.
  • Test cases include test objects and test environments, which are execution instances under the constraints of the test environment to verify whether the test objects meet specific performance requirements.
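  • the relationship between test environment, test object, and test case can be sketched with plain data classes. All type names and field values below are hypothetical placeholders; only the structure (resource configuration plus AI algorithm configuration forming the environment, and a test case pairing one test object with the environment) follows the text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ResourceConfig:
    cpu_cores: int
    bandwidth_mbps: float


@dataclass
class AlgorithmConfig:
    dataset: str
    accuracy_metric: str
    paradigm: str


@dataclass
class Environment:
    """Test environment: the configuration or constraints required for testing."""
    resources: ResourceConfig
    algorithm: AlgorithmConfig


@dataclass
class Case:
    """Test case: one test object executed under one test environment."""
    test_object: str
    environment: Environment


def build_cases(test_objects: Sequence[str], environment: Environment) -> List[Case]:
    """Create one test case per test object under a shared environment."""
    return [Case(obj, environment) for obj in test_objects]


env = Environment(
    ResourceConfig(cpu_cores=4, bandwidth_mbps=100.0),           # placeholder values
    AlgorithmConfig("reid-dataset", "mAP", "incremental"),       # placeholder values
)
cases = build_cases(["algorithm-a", "algorithm-b"], env)
print(len(cases), cases[0].test_object)
```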
  • Benchmark testing is an edge-cloud collaborative distributed AI system evaluation method recognized by academia or industry.
  • Container technology is an operating system-level virtualization technology that isolates different processes through operating system isolation technology, such as control groups and namespaces under Linux.
  • Container technology is different from hardware virtualization technology in that it does not have virtual hardware, and there is no operating system inside the container, but only processes. It is precisely because of this feature of container technology that containers are lighter and more convenient to manage than virtual machines.
  • for containers, a set of common management operations is defined, such as starting, stopping, pausing, and deleting, so that the life cycle of the container can be managed uniformly.
  • when a container is run, it is started on demand; that is, after the created container completes the corresponding task, it can be deleted and re-created the next time it is needed.
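  • uniform life cycle management can be pictured as a small state machine over the common operations. The sketch below is an assumption for illustration: the state names and the set of allowed transitions in `ALLOWED` are hypothetical and are not taken from the application or from any particular container runtime.

```python
# (current state, operation) -> next state; the transition set is an assumption.
ALLOWED = {
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "stopped",
    ("paused", "stop"): "stopped",
    ("stopped", "start"): "running",
    ("created", "delete"): "deleted",
    ("stopped", "delete"): "deleted",
}


class Container:
    """Hypothetical container record tracked by a container management module."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "created"

    def apply(self, op: str) -> str:
        """Apply a management operation, rejecting invalid transitions."""
        key = (self.state, op)
        if key not in ALLOWED:
            raise ValueError(f"cannot {op} a container that is {self.state}")
        self.state = ALLOWED[key]
        return self.state


# On-demand use: create, run the task, then stop and delete the container.
c = Container("task-evaluation-container")
for op in ("start", "stop", "delete"):
    print(op, "->", c.apply(op))
```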
  • Figure 2A is a schematic diagram of the architecture of a traditional edge-cloud collaborative task method provided by an embodiment of the present application.
  • Figure 2B is a schematic diagram of the architecture of another traditional edge-cloud collaborative task method provided by an embodiment of the present application. Each is explained in detail below with reference to Figure 2A and Figure 2B respectively.
  • currently, there are many edge-cloud collaborative distributed AI task evaluation methods, especially for benchmark tests.
  • as shown in Figure 2A, algorithm developers deploy the test paradigm of cloud-side training and edge-side reasoning on the cloud side and the edge side respectively.
  • cloud nodes and edge nodes can be specific physical nodes.
  • the task of the cloud node is training.
  • the cloud node sends the trained model to the edge node, and the task of the edge node is reasoning: the edge node completes specific reasoning tasks through the trained model.
  • edge nodes can send newly collected data to cloud nodes to further train the model.
  • as shown in Figure 2B, algorithm developers deploy the test paradigm of edge-side training and cloud-side aggregation followed by reasoning on the edge side and the cloud side respectively. That is, the distributed collaboration task of the edge nodes is training: the data generated by the data source is used for training on the edge nodes, and the trained models of the edge nodes are then sent to the cloud node. The cloud node aggregates the models obtained from the different edge nodes into an overall model, and specific reasoning tasks are completed through the overall model on the cloud node.
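  • the cloud-side aggregation step can be illustrated with a sample-weighted average of the edge models' parameters, in the style of federated averaging. The text does not specify the aggregation rule, so this choice, the function name `aggregate`, and the representation of a model as a flat list of floats are all assumptions.

```python
from typing import List, Sequence


def aggregate(edge_models: Sequence[Sequence[float]],
              sample_counts: Sequence[int]) -> List[float]:
    """Combine edge-trained parameter vectors into one overall model.

    Each edge model is weighted by the number of samples it was trained on
    (an assumed rule; the source only says the models are aggregated).
    """
    if not edge_models or len(edge_models) != len(sample_counts):
        raise ValueError("need one sample count per edge model")
    total = sum(sample_counts)
    n_params = len(edge_models[0])
    return [
        sum(model[i] * n for model, n in zip(edge_models, sample_counts)) / total
        for i in range(n_params)
    ]


# Two edge nodes; the second trained on three times as much data.
overall = aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(overall)
```

  • only model parameters are sent to the cloud node in this sketch; the training data itself stays on the edge nodes.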
  • whether it is the edge-cloud collaborative distributed AI benchmark architecture shown in Figure 2A or the one shown in Figure 2B, there are the following problems.
  • once a test paradigm is deployed, the algorithm developer can only test the test cases under that test paradigm.
  • for example, the test paradigm of cloud-side training and edge-side reasoning in Figure 2A cannot be easily changed; if it is to be changed, a new test paradigm needs to be redeployed.
  • the test paradigm under the federated learning test framework in Figure 2B is only applicable when the data provider and the algorithm provider are the same party. If the algorithm provider and the data provider are different parties, the algorithm provider cannot obtain the training data, the data provider cannot obtain the algorithm, and the test paradigm in Figure 2B is no longer applicable. It is an ideal state for the algorithm provider and the data provider to be the same party; in reality, the two are often different.
  • when the test object is an algorithm, a single node can test the algorithm; this kind of testing method has relatively high labor and material costs.
  • the supported test scenarios are relatively limited and only cover limited tasks in medical scenarios or traffic classification scenarios, such as image classification, target detection, and speech recognition. Therefore, the test cases are also relatively limited, that is, the test environment and test objects in the test cases are both confined to limited test scenarios. For example, there is a lack of support for typical edge-cloud collaborative distributed AI application scenarios such as industrial quality inspection and vehicle re-identification.
  • embodiments of the present application propose a distributed collaborative AI task evaluation method and device.
  • the distributed collaborative AI task evaluation method will be described in detail below with reference to the accompanying drawings.
  • the following mainly takes edge-cloud collaborative distributed AI tasks as an example for explanation.
  • the distributed collaborative architecture in the embodiments of this application is not limited to the edge-cloud collaborative distributed architecture and can also be another distributed architecture; no restriction is imposed in this regard.
  • the AI task assessment type of edge-cloud collaboration is an AI task involving standardized assessment.
  • it can be an AI benchmark test task for edge-cloud collaboration, an AI application certification task for cloud services and products for edge-cloud collaboration, or an AI competition rating task for edge-cloud collaboration, etc.
  • the embodiments of this application do not limit this.
  • the following mainly takes the AI benchmark test task of edge-cloud collaboration as an example for detailed explanation.
  • the edge-cloud collaborative distributed AI task evaluation architecture can include both the cloud side and the edge side.
  • the cloud side includes cloud nodes
  • the edge side includes edge nodes.
  • cloud nodes and edge nodes can be specific physical nodes, that is, cloud nodes can be servers and other equipment on the cloud side, and edge nodes can be devices other than cloud-side equipment, such as edge devices or client devices.
  • the edge-cloud collaborative distributed AI task evaluation architecture can also include the cloud side, the edge side, and the client; in this case, the equipment on the edge side is a device between the cloud side and the client. For example, an edge device can be a gateway between a smart home client and a cloud-side server.
  • the embodiments of this application do not limit the specific physical forms of edge nodes and cloud nodes.
  • the cloud node can also be a virtual cloud node.
  • Figures 3A and 3C in the embodiment of the present application take the edge-cloud architecture as an example for illustration.
  • the distributed collaborative AI task evaluation method and device in the embodiment of the present application are also applicable to the device-edge-cloud architecture.
  • the embodiment of the present application does not limit the specific number of cloud nodes and edge nodes.
  • the number of cloud nodes and edge nodes in Figure 3A and Figure 3B is only an exemplary representation.
  • Figure 3A is a schematic diagram of a distributed collaborative AI task evaluation architecture provided by an embodiment of the present application.
  • the cloud nodes on the cloud side include control cloud nodes and task cloud nodes.
  • a management module is arranged in the control cloud node, which can also be called a management device.
  • the management module is used to realize the life cycle management of the global container.
  • the management module has two ways to manage the global container life cycle. First, the task management device can create local control devices and task evaluation containers in all nodes, where the local control devices arranged on other nodes are also a kind of container. Second, the task management device can create a first local control device and a second local control device, where the first local control device is arranged at the task cloud node and the second local control device is arranged at the task edge node; the two local control devices then manage the task containers of their respective local nodes.
  • the management device includes a container management module.
  • the container management module implements functions such as adding, deleting, modifying, and querying global containers.
  • the management device also includes a communication module.
  • the management device also includes a task use case management module, which can also be called a global task use case management module.
  • the global task use case management module is used for basic functions such as adding, deleting, modifying, and querying task use cases.
  • when the AI task evaluation is an AI benchmark test, the global task case management module can be a global test case management module.
  • the global test case management module is used for basic functions such as adding, deleting, modifying, and querying the test environment and/or test objects.
  • a first local control device is arranged in the task cloud node.
  • the first local control device is used to implement life cycle management of the first task evaluation container and communication with external task evaluation containers, where an external task evaluation container is one located on a node other than the node hosting the local task evaluation container, that is, on a node other than the task cloud node corresponding to the first task evaluation container; for example, the second task evaluation container shown in Figure 3A.
  • the specific workflow (pipeline) in the first task evaluation container has been created at the early stage of the creation of the first task evaluation container.
  • the first task evaluation container may include a local simulation workflow and a first distributed workflow.
  • the first local control device also includes a first container management module and a first task case management module, where the first container management module is used to implement functions such as adding, deleting, modifying, and querying containers on the local node.
  • the first task case management module is used to manage task cases, for example, to implement basic functions such as adding, deleting, modifying, and querying test objects and/or test environments in test cases.
  • the first local control device also includes a first communication module.
  • the second local control device arranged on the task edge node is similar to the first local control device arranged on the task cloud node. For the sake of simplicity, no further description will be given here.
  • the management device for global management is arranged on a separate control cloud node.
  • FIG. 3B is a schematic diagram of another distributed collaborative AI task evaluation architecture provided by an embodiment of the present application.
  • the difference lies in that the management device is not separately arranged on a control cloud node.
  • the management device, as a high-level management module, is placed on the same cloud node as the first local control device and the first task evaluation container.
  • the functions of the structural modules in Figure 3B are the same as those in Figure 3A, and will not be described again for the sake of simplicity.
  • FIG. 3C is a schematic diagram of another distributed collaborative AI task architecture provided by an embodiment of the present application.
  • Figure 3C shows a task evaluation simulation architecture for distributed collaborative AI that implements multi-node task evaluation simulation on a single node.
  • the single-node management device is used to implement life cycle management of all containers of a single node.
  • the single-node management device includes a task case management module and a container management module.
  • the test case management module is used for basic functions such as adding, deleting, modifying or querying the test environment and test objects.
  • the container management module is used to implement basic functions such as adding, deleting, modifying or querying the third AI task evaluation container.
  • a local simulation workflow is included in the test container.
  • the architecture shown in Figures 3A and 3B can be used to implement the task evaluation of distributed collaborative AI on edge-cloud collaborative multi-nodes, such as the benchmark test of distributed collaborative AI, while the architecture shown in Figure 3C can be used to implement task evaluation simulations of multi-node distributed collaborative AI on a single node, such as benchmark testing simulations of multi-node distributed collaborative AI on a single node.
  • Figure 4 is a flow chart of an evaluation method for distributed collaborative AI tasks provided by an embodiment of the present application.
  • the AI task configuration includes the configuration of the task environment of the distributed collaborative AI.
  • the task configuration and task objects of distributed collaborative AI are adjustable.
  • the specific adjustment methods can be basic functions such as adding, deleting, modifying, and querying.
  • the embodiments of this application do not limit the adjustment methods.
  • S402 Receive a first management instruction according to the configuration of the task environment and the task object, where the first management instruction includes a management instruction of the task environment and/or a management instruction of the task object.
  • the first management instruction may be a management instruction of the task environment, the first management instruction may also be a management instruction of the task object, and the first management instruction may also be a management instruction of the task environment and the task object.
  • the task use cases include AI task environments and AI task objects.
  • the first management instruction may be a basic management instruction such as adding, deleting, modifying or querying the AI task case, which is not limited in the embodiment of the present application.
  • task use cases corresponding to the paradigm of distributed collaborative artificial intelligence can be flexibly managed, which makes it easier for service developers to manage task use cases according to business needs in actual scenarios, to process different task use cases, and to obtain the execution status of different AI task use cases, thereby making the edge-cloud collaborative distributed AI task architecture easy to deploy.
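The handling of a first management instruction can be sketched as follows. This is a minimal illustration only: the names `TaskCase`, `TaskCaseManager`, and `handle`, and the string action codes, are assumptions introduced here and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TaskCase:
    """A task use case: a task environment paired with a task object."""
    environment: dict
    task_object: str


@dataclass
class TaskCaseManager:
    """Hypothetical stand-in for the task use case management module."""
    cases: Dict[str, TaskCase] = field(default_factory=dict)

    def handle(self, action: str, case_id: str,
               environment: Optional[dict] = None,
               task_object: Optional[str] = None) -> dict:
        """Apply one management instruction and return status information."""
        if action == "add":
            if case_id in self.cases:
                raise KeyError(f"case {case_id!r} already exists")
            self.cases[case_id] = TaskCase(environment or {}, task_object or "")
        elif action == "delete":
            del self.cases[case_id]
        elif action == "modify":
            case = self.cases[case_id]
            # An instruction may target the environment, the object, or both.
            if environment is not None:
                case.environment = environment
            if task_object is not None:
                case.task_object = task_object
        elif action != "query":
            raise ValueError(f"unknown action {action!r}")
        # A query changes nothing, so the status is simply the current state.
        return {"action": action, "case_id": case_id,
                "case": self.cases.get(case_id)}


manager = TaskCaseManager()
manager.handle("add", "case-1", {"paradigm": "incremental_learning"}, "algo_a")
status = manager.handle("modify", "case-1", task_object="algo_b")
```

Here the modify instruction carries only a task object, so the task environment of `case-1` is left unchanged, which mirrors the "and/or" wording of the first management instruction.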
  • the type of task evaluation of distributed collaborative AI is an AI task involving standardized evaluation.
  • it can be a benchmark test of distributed edge-cloud collaborative AI, application certification of cloud services and products for distributed edge-cloud collaborative AI, or a competition rating task for distributed edge-cloud collaborative AI, etc.
  • the following takes the benchmark test of distributed edge-cloud collaborative AI as an example.
  • the task evaluation method of distributed collaborative AI is explained in detail from the following two benchmark tests. The first is to conduct multi-node benchmark test simulation on a single node, and the second is to conduct distributed edge-cloud collaborative benchmark test on multiple nodes.
  • the task environment is the test environment
  • the task object is the test object
  • the task case is the test case
  • the task configuration is the test configuration
  • the task environment configuration is the test environment configuration
  • the task paradigm is the test paradigm (can also be called AI paradigm)
  • the task evaluation mode is the benchmark test mode
  • the task case management module is the test case management module.
  • the first type performs multi-node benchmark simulation on a single node.
  • the benchmark simulation architecture is shown in Figure 3C. This will be described in detail below with reference to Figure 5 .
  • Figure 5 is a schematic flowchart of a distributed collaborative AI benchmark test simulation method provided by an embodiment of the present application.
  • as an example, the test scenario is safety helmet target detection in industrial quality inspection, and the test paradigm in the test environment is edge-cloud collaborative incremental learning (IL).
  • the management device Before the management device starts to be used, the management device is started according to the benchmark test mode, and each module in the management device is initialized.
  • the management device is started, and the test case management module, container management module, communication module and result display module in the management device are initialized.
  • the communication module of the management device obtains the test configuration file.
  • the test configuration file includes a test environment configuration module, a test paradigm configuration module, an algorithm basic configuration module, an algorithm hyperparameter configuration module, a container management configuration module, and a container invocation configuration.
  • the configuration modules in the test configuration file may exist in the form of separate configuration modules, or may not exist in the form of separate configuration modules.
  • the test paradigm configuration module can be a separate configuration module or included in the test environment configuration module.
  • Algorithm hyperparameter configuration can exist in the form of a sub-configuration file or not.
  • the test case management module reads the configuration parameters in the test environment configuration module, test paradigm configuration module, algorithm basic configuration module or algorithm hyperparameter configuration module.
  • the configuration parameters of the test environment module are as shown in Table 1.
  • the test environment parameter configuration in Table 1 is only an exemplary description. Other test environment parameter configurations are also possible and can be modified according to the test scenario and the needs of the service developer. The embodiments of the present application do not limit this.
  • the simulation mode of each task container can be any one in Table 1.
  • the simulation mode of all task containers can be any one of these four simulation modes, or any combination of these four simulation modes. That is, all task work nodes can be simulated in the same simulation mode, or they can be simulated in different simulation modes.
  • for example, if all 10 task containers created by a single node are used for system performance test simulation, 10 system pseudo-containers will be created inside these 10 task containers.
  • a system pseudo-container can be understood as a container that does not actually run on data, but instead holds calculation formulas for system performance metrics such as bandwidth consumption or energy consumption.
  • these calculation formulas can be used for system performance test simulation. For another example, if there are 10 task containers created by a single node, 5 of which are used for system performance test simulation and the other 5 for algorithm performance test simulation, then a system pseudo-container is created inside each of the 5 containers used for system performance test simulation, and an algorithm pseudo-container is created inside each of the 5 containers used for algorithm performance test simulation.
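As a rough sketch of a system pseudo-container, the class below estimates metrics from formulas rather than by running data. The two formulas, the field names, and the helper `plan_pseudo_containers` are all assumptions made for illustration; the application does not specify the formulas.

```python
from dataclasses import dataclass


@dataclass
class SystemPseudoContainer:
    """Estimates system metrics from formulas instead of running data.

    The two formulas below are illustrative placeholders, not the ones
    used in the application.
    """
    bytes_per_sample: int      # assumed size of one uploaded sample
    power_watts: float         # assumed average power draw of the node
    seconds_per_sample: float  # assumed processing time per sample

    def bandwidth_bytes(self, uploaded_samples: int) -> int:
        return uploaded_samples * self.bytes_per_sample

    def energy_joules(self, processed_samples: int) -> float:
        return self.power_watts * self.seconds_per_sample * processed_samples


def plan_pseudo_containers(total: int, system_count: int) -> list:
    """Assign a pseudo-container kind to each of `total` task containers."""
    if not 0 <= system_count <= total:
        raise ValueError("system_count must be between 0 and total")
    return ["system"] * system_count + ["algorithm"] * (total - system_count)


plan = plan_pseudo_containers(total=10, system_count=5)
edge = SystemPseudoContainer(bytes_per_sample=200_000, power_watts=5.0,
                             seconds_per_sample=0.5)
print(plan.count("system"), plan.count("algorithm"))  # 5 5
print(edge.bandwidth_bytes(100))  # 20000000
```

The plan corresponds to the second example above, in which 5 of the 10 task containers receive a system pseudo-container and the other 5 an algorithm pseudo-container.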
  • for example, the parameter description in the test paradigm configuration module is shown in Table 2.
  • test paradigm configuration parameters may exist in a separate module or as part of the parameters in the test environment configuration module, and the embodiments of the present application do not limit this.
  • configuration parameters of the test paradigm are used as part of the test environment configuration parameters.
  • test environment module parameters and the test paradigm module parameters are shown separately.
  • the AI algorithm in the algorithm basic configuration module can be any AI algorithm.
  • the hyperparameters in the algorithm hyperparameter module can exist directly in the algorithm hyperparameter configuration module in the form of enumerations, or in the form of a hyperparameter configuration file.
  • the algorithm hyperparameter is the learning rate
  • all learning rates are listed in the list, and the hyperparameter name of the learning rate is added to the multi-parameters in the basic configuration module of the algorithm.
  • if the hyperparameter_file in the algorithm basic configuration module is not empty, then when obtaining the algorithm hyperparameter configuration, the algorithm hyperparameter configuration module is not scanned; instead, the algorithm hyperparameter configuration file is read directly.
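The precedence between the hyperparameter file and the enumerated hyperparameter module might look like the sketch below. It assumes, purely for illustration, that the configuration is held in dictionaries and that the hyperparameter file is JSON; the function names `load_hyperparameters` and `expand` are hypothetical.

```python
import itertools
import json
from pathlib import Path


def load_hyperparameters(basic_config: dict, hyperparameter_module: dict) -> dict:
    """Return {name: [candidate values]} for the algorithm under test."""
    hyperparameter_file = basic_config.get("hyperparameter_file")
    if hyperparameter_file:
        # A non-empty file entry wins; the module is not scanned at all.
        return json.loads(Path(hyperparameter_file).read_text(encoding="utf-8"))
    return dict(hyperparameter_module)


def expand(hyperparameters: dict) -> list:
    """Enumerate every combination of the listed hyperparameter values."""
    names = sorted(hyperparameters)
    combos = itertools.product(*(hyperparameters[n] for n in names))
    return [dict(zip(names, combo)) for combo in combos]


# The learning rates are listed as an enumeration in the module.
module = {"learning_rate": [0.1, 0.01, 0.001]}
settings = expand(load_hyperparameters({"hyperparameter_file": ""}, module))
print(len(settings))  # 3
```

Each entry of `settings` is one concrete hyperparameter assignment, so three listed learning rates yield three candidate runs.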
  • Container management configuration includes the number of containers and the size of each container.
  • Container management configuration is local container management configuration and/or global container management configuration.
  • the local container management configuration includes the number of local containers and the size of each local container.
  • the container calling configuration includes the calling sequence of the modules corresponding to the test paradigm and the hyperparameters of the modules corresponding to the test paradigm.
  • S520 The communication module of the management device obtains the test object.
  • the test case management module obtains the test algorithm by reading the custom algorithm file.
  • when the test object is an algorithm, the test object can be different custom algorithm files, from which different algorithms are obtained.
  • the incremental learning paradigm includes an initialization training module, a difficult example identification module, and a model update module.
  • Each module can be implemented through different algorithms, so the custom algorithm files can be the algorithms corresponding to these three modules.
  • by reading different custom algorithm files, different test algorithms can be obtained.
  • the custom algorithm files are located in the algorithm directory in the edge-cloud collaborative AI benchmark testing platform library.
  • the edge-cloud collaborative AI benchmark testing platform library needs to be deployed on a single node.
  • the deployment method can be to download the edge-cloud collaborative AI benchmark testing platform library.
  • the custom algorithm file is a standard template file.
  • the custom algorithm file can conform to the algorithm interface specification of Sedna lib.
  • S530 The communication module of the management device obtains the data set.
  • test case management module can read data files in txt format and csv format.
  • the data source of the test data can be the open source data set in Kaggle and the data preprocessing algorithm corresponding to the open source data set, or it can also be the data collected by the data provider obtained by the edge node.
  • each line of the txt file records a piece of index information and label information of unstructured data.
  • the index information in the txt format data file is represented by the absolute path of the image and the coordinates of the target box, and the label information can be represented by the number 1 or 0.
  • the data file in csv format can be represented by the characteristic attributes of structured data and the specific parameters corresponding to the characteristic attributes of each piece of structured data.
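Reading the two data file formats could be sketched as below. The exact field layout of a txt line is not given in the text, so the layout used here (path, comma-separated box coordinates, label, separated by whitespace) is an assumption, as are the names `IndexRecord`, `parse_index_line`, and `read_structured`.

```python
import csv
import io
from typing import NamedTuple, Tuple


class IndexRecord(NamedTuple):
    image_path: str
    box: Tuple[int, int, int, int]
    label: int


def parse_index_line(line: str) -> IndexRecord:
    """Parse one line of the assumed txt layout: path, box, label."""
    # Split from the right so that spaces inside the path are preserved.
    path, box_text, label_text = line.strip().rsplit(maxsplit=2)
    box = tuple(int(v) for v in box_text.split(","))
    if len(box) != 4:
        raise ValueError(f"expected 4 box coordinates, got {len(box)}")
    label = int(label_text)
    if label not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {label}")
    return IndexRecord(path, box, label)


def read_structured(csv_text: str) -> list:
    """Read csv rows as dicts keyed by the feature names in the header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


record = parse_index_line("/data/helmet/img_0001.jpg 34,50,120,160 1\n")
rows = read_structured("temperature,humidity\n21.5,0.40\n22.0,0.38\n")
```

The sample path and feature names are invented for the example. `csv.DictReader` returns every value as a string, so numeric features would still need converting before use.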
  • the acquisition order of S520 to S540 is not limited in the embodiment of the present application. It may be the acquisition order of S520, S530 and S540, or it may be other orders.
  • the communication module of the management device receives a first management instruction according to the test configuration and the test object, where the first management instruction includes a test environment management instruction and/or a test object management instruction.
  • test configuration includes test environment configuration.
  • the test case management module of the management device manages test cases according to the first management instruction.
  • the test cases include a test environment and a test object.
  • test cases can be managed by adding, deleting, modifying or querying tests.
  • the test case management module obtains the management status information of the test case.
  • the test case management information includes test cases and test case status information. For example, if the first management instruction is to process changes such as adding, deleting, and modifying test cases, then the status information of the test cases is the information after the test cases have been changed. For another example, if the first management instruction is to process the query of the test case without any changes, then the status information of the test case is the current status information of the test case.
  • the container management module of the management device manages the third task evaluation container according to the test case and the local container management configuration, where the third task evaluation container includes the local simulation workflow.
  • managing the third task evaluation container may include adding a third task evaluation container, deleting the third task evaluation container, modifying the third task evaluation container, or querying the third task evaluation container.
  • managing the third task evaluation container during the benchmark test may include adding a new benchmark test container, deleting the benchmark test container, modifying the benchmark test container, or querying the benchmark test container.
  • benchmark containers is not limited here and can be determined according to the number of containers parameters in the local container management configuration.
  • the test paradigm in the test case is an incremental learning paradigm.
  • the incremental learning paradigm includes an initialization training module, a difficult case identification module, and a model update module.
  • the container management module can create a training container for the initialization training module and the model update module, and an inference container for the difficult case identification module.
  • the container management module of the management device adds a third benchmark test container according to the test case and the local container management configuration, where the third benchmark test container includes a local simulation workflow.
  • the container conditions in the third benchmark test container are determined.
  • the simulation mode in the third benchmark container can be any one of the four simulation modes in Table 1.
  • the container management module of the management device creates an algorithm pseudo-container in the third task evaluation container.
  • the container management module of the management device creates an algorithm pseudo-container in the third benchmark container.
  • the container management module of the management device creates a system pseudo-container in the third task evaluation container.
  • the container management module of the management device creates a system pseudo-container in the third benchmark container.
  • the container management module of the management device creates a real container in the third task evaluation container.
  • the container management module of the management device creates a real container in the third benchmark container.
  • the container management module of the management device continues to use the existing third task evaluation container. For example, the container management module of the management device inherits the third benchmark container that has already been created.
  • the container management module of the management device adds a third benchmark test container according to the test case, simulation mode parameters and local container management configuration, where the third benchmark test container includes a local simulation workflow.
  • the simulation mode in the third benchmark container can be any one of the four simulation modes in Table 1.
  • the third task evaluation container created by the container management module of the management device includes an algorithm pseudo-container.
  • the third benchmark test container created by the container management module of the management device includes an algorithm pseudo-container.
  • the third task evaluation container created by the container management module of the management device includes a system pseudo-container.
  • the third benchmark test container created by the container management module of the management device includes a system pseudo-container.
  • the third task evaluation container created by the container management module of the management device includes a real container.
  • the third benchmark container created by the container management module of the management device includes the real container.
  • the container management module of the management device creates a third task evaluation container.
  • the container management module of the management device creates a third benchmark container.
  • the local simulation workflow of the third task evaluation container simulates the test case according to the local container call configuration and the simulation mode corresponding to the third task evaluation container, and obtains the simulation results, where the test configuration includes the local container call configuration, and the local container call configuration includes the calling sequence of the algorithm modules corresponding to the test paradigm and the hyperparameters of those algorithm modules.
  • the local simulation workflow of the third benchmark test container calls the algorithm modules related to the test paradigm, together with their hyperparameters, from the use case management module in sequence according to the call configuration of the third benchmark test container corresponding to the test paradigm, simulates the test cases, and obtains the test results.
  • the calling sequence of modules related to the incremental learning paradigm is the initialization training module, the difficult example identification module and the model update module.
  • the local simulation workflow in the training container first calls the initialization training module, then the local simulation workflow in the inference container calls the difficult example identification module, and finally the local simulation workflow in the training container calls the model update module.
  • the container management module starts training containers and inference containers on demand.
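The calling sequence for the incremental learning paradigm, with containers started on demand, can be sketched as follows. The module keys, the `run_paradigm` function, and the placeholder module bodies are assumptions made for illustration; real module implementations would come from the custom algorithm files.

```python
from typing import Callable, Dict, List

# Which kind of container hosts each module of the paradigm.
CONTAINER_FOR_MODULE: Dict[str, str] = {
    "initial_training": "training",
    "hard_example_identification": "inference",
    "model_update": "training",
}

# Calling sequence, as it would be given by the container calling configuration.
CALL_SEQUENCE: List[str] = [
    "initial_training",
    "hard_example_identification",
    "model_update",
]


def run_paradigm(modules: Dict[str, Callable[[dict], dict]],
                 state: dict) -> List[str]:
    """Call each module in order and record which container ran it."""
    started = set()
    trace = []
    for name in CALL_SEQUENCE:
        container = CONTAINER_FOR_MODULE[name]
        if container not in started:
            # Containers are started on demand, the first time they are needed.
            started.add(container)
            trace.append(f"start {container} container")
        state.update(modules[name](state))
        trace.append(f"{container}: {name}")
    return trace


# Placeholder module bodies standing in for the custom algorithms.
modules = {
    "initial_training": lambda s: {"model_version": 1},
    "hard_example_identification": lambda s: {"hard_examples": ["img_7", "img_9"]},
    "model_update": lambda s: {"model_version": s["model_version"] + 1},
}
state: dict = {}
trace = run_paradigm(modules, state)
```

The training container is started once and reused for the model update step, while the inference container is started only when the difficult example identification module is reached.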
  • when the simulation mode parameter specifies simulated testing of algorithm performance, the data set obtained in S530 is divided into multiple parts according to the settings in the test environment configuration parameters, and the test cases are simulated and tested in the algorithm pseudo-container.
  • when the simulation mode parameter specifies simulated testing of system performance, the data in the data set is not actually run in the system pseudo-container; instead, the system performance test simulation results are obtained through the system performance calculation formulas in the system pseudo-container.
  • when the simulation mode parameter specifies simulated testing of both system performance and algorithm performance, the test cases are simulated and tested in the created real container.
  • the local simulation workflow of the benchmark container calls the modules related to the test paradigm, together with their hyperparameters, in sequence, and performs simulation testing of the test cases in the benchmark container.
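For the algorithm performance case, the data set is divided into multiple parts before simulation. The sketch below shows one possible split, assuming contiguous chunks of near-equal size; the actual split is governed by the test environment configuration parameters, which are not reproduced here, and `split_dataset` is a hypothetical helper.

```python
from typing import List, Sequence


def split_dataset(samples: Sequence, parts: int) -> List[list]:
    """Divide samples into `parts` contiguous chunks of near-equal size."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(len(samples), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional sample each.
        end = start + base + (1 if i < extra else 0)
        chunks.append(list(samples[start:end]))
        start = end
    return chunks


chunks = split_dataset(list(range(10)), parts=3)
print([len(c) for c in chunks])  # [4, 3, 3]
```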
  • the simulation results can include the training model, inference results, evaluation indicators and other parameters of each test case.
  • S580 The management device receives the simulation result of at least one task use case from the third task evaluation container, and the result display module of the management device displays the simulation result of at least one task use case.
  • the management device receives the simulation result of at least one test case from the third benchmark test container, and the result display module of the management device displays the simulation result of the at least one test case.
  • the simulation results can display test case execution information through an online console or user interactive interface, or the simulation results can be directly saved in an offline save file.
  • the simulation result of at least one test case is the simulation execution of different test cases, and the simulation result can be used to determine the target test object.
  • the simulation results can be the execution of different test algorithms.
  • the evaluation index in the simulation results can be the image classification accuracy of different test algorithms.
  • the user can intuitively compare the simulation results of the test object, thereby obtaining objective test results, which is helpful for the user to select an appropriate target object.
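Comparing simulation results across test cases amounts to ranking them by an evaluation indicator. A minimal sketch, assuming results are held as a dictionary of metrics per test case; `rank_by_metric` is a hypothetical helper and the numbers are made up for illustration.

```python
from typing import Dict, List, Tuple


def rank_by_metric(results: Dict[str, Dict[str, float]],
                   metric: str) -> List[Tuple[str, float]]:
    """Sort test cases by one evaluation metric, best first."""
    scored = [(case, metrics[metric]) for case, metrics in results.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up numbers purely for illustration, not measured results.
results = {
    "algorithm_a": {"accuracy": 0.91},
    "algorithm_b": {"accuracy": 0.87},
    "algorithm_c": {"accuracy": 0.93},
}
ranking = rank_by_metric(results, "accuracy")
print(ranking[0][0])  # algorithm_c
```

This sketch assumes a higher value is better, which holds for accuracy but would need inverting for indicators such as latency or energy consumption.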
  • the test case management module manages test cases (for example, creates test cases) through test configurations and test objects, and the container management module manages benchmark test containers (for example, creates a benchmark test container); benchmark test simulation is then performed through the simulation workflow in the benchmark test container to obtain the simulation results.
  • test cases can be flexibly configured according to test scenarios, that is, corresponding test cases can be managed according to different test paradigms (for example, by adding corresponding test cases).
  • the second is to conduct distributed edge-cloud collaboration benchmark testing on multiple nodes.
  • the benchmark testing framework is shown in Figure 3A or Figure 3B. The following mainly describes in detail using the framework of Figure 3A in conjunction with Figure 6 .
  • Figure 6 is a schematic flowchart of a benchmark test process for distributed edge-cloud collaborative AI provided by an embodiment of the present application.
  • as an example, the test scenario is safety helmet target detection in industrial quality inspection, and the test paradigm in the test environment is edge-cloud collaborative incremental learning.
  • the management device Before the management device starts to be used, start the management device according to the benchmark test mode, initialize each module in the management device, and create a local control device in the task node.
  • the management module is started, and the test case management module, container management module, communication module and result display module in the management device are initialized. Create a first local control device and a second local control device according to the initial test case and test configuration in the initialized test case management module.
  • the communication module in the management device obtains the test configuration file.
  • test configuration file obtained by the communication module in the management device is the same as the content in S510, and will not be described again for the sake of simplicity.
  • the simulation switch in the test environment parameter configuration can be off or on. If the simulation switch is on, benchmark test simulation of distributed collaborative AI is performed through multiple nodes. Here, since the benchmark test mode is a multi-node distributed collaborative AI benchmark test, the simulation switch is turned off.
  • the container management configuration is global container management configuration and local container management configuration.
  • the global container management configuration includes the number of local control devices and the size of each local control device.
  • the local container management configuration includes the number of local containers corresponding to each local control device and the size of each of those local containers. For example, as shown in FIG. 3A, the number of local control devices included in the global container management configuration is 2, that is, the first local control device and the second local control device.
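The relationship between the global and local container management configurations can be sketched with two small data classes. The field names, the megabyte unit, and the concrete sizes are assumptions for illustration; only the count of two local control devices comes from the FIG. 3A example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LocalContainerConfig:
    container_count: int    # number of local containers under one control device
    container_size_mb: int  # size of each local container (unit is an assumption)


@dataclass(frozen=True)
class GlobalContainerConfig:
    control_device_count: int    # number of local control devices
    control_device_size_mb: int  # size of each local control device
    local: Dict[str, LocalContainerConfig]  # keyed by control device name

    def validate(self) -> None:
        if len(self.local) != self.control_device_count:
            raise ValueError("one local configuration is needed per control device")


config = GlobalContainerConfig(
    control_device_count=2,
    control_device_size_mb=256,
    local={
        "first_local_control_device": LocalContainerConfig(2, 1024),
        "second_local_control_device": LocalContainerConfig(1, 512),
    },
)
config.validate()
total_containers = sum(c.container_count for c in config.local.values())
```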
  • the communication module in the management device obtains the test object.
  • the communication module in the management device receives a first management instruction according to the configuration of the test environment and the test object, where the first management instruction includes a test environment management instruction and/or a test object management instruction.
  • the test case management module in the management device manages test cases according to the first management instruction.
  • the test cases include a test environment and a test object.
  • S602, S603 and S604 are the same as the contents in S520, S540 and S550 respectively, and will not be described again for the sake of brevity.
  • the container management module in the management device generates instructions for managing the first local control device and instructions for managing the second local control device based on the test case and the global container management configuration.
  • generating the instruction for managing the first local control device means generating an instruction to create, delete, modify, or query the first local control device; generating the instruction for managing the second local control device means generating an instruction to create, delete, modify, or query the second local control device.
  • when the test case management operation is adding a new test case, the container management module generates an instruction for adding a new cloud node local control device and an instruction for adding a new edge node local control device based on the test case and the global container management configuration.
  • the container management module in the management device directly manages the first local control device according to the test case and the global container management configuration.
  • the global container management configuration includes a first local control device, an instruction action corresponding to the first local management instruction, a second local control device, and an instruction action corresponding to the second local management instruction.
  • the management device sends a first local management instruction to the first local control device and a second local management instruction to the second local control device.
  • the instruction action corresponding to the first local management instruction is a create, delete, modify, or query instruction for the first local control device; the instruction action corresponding to the second local management instruction is a create, delete, modify, or query instruction for the second local control device.
  • when the managed test case is a new test case, the management device sends an instruction to add the first local control device to the first local control device, and sends an instruction to add the second local control device to the second local control device.
  • the first communication module of the first local control device receives the first local management instruction from the management device, and the first container management module of the first local control device manages the first local control device according to the first local management instruction.
  • the first local management instruction includes test cases, local container management configuration and container calling configuration.
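  • The following is a minimal sketch of how a management device might derive one local management instruction per local control device from a test case and a global container management configuration. The class names, field names, and dictionary layout are assumptions made for illustration; the application does not specify a data format.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The four instruction actions named in the text."""
    CREATE = "create"
    DELETE = "delete"
    MODIFY = "modify"
    QUERY = "query"


@dataclass(frozen=True)
class LocalManagementInstruction:
    """Hypothetical payload sent to one local control device."""
    target: str                   # illustrative device name, e.g. "cloud"
    action: Action
    test_case: dict
    local_container_config: dict  # local container management configuration
    container_call_config: dict   # container calling configuration


def build_instructions(test_case, global_config):
    """Derive one instruction per device listed in the (assumed) global config."""
    return [
        LocalManagementInstruction(
            target=entry["name"],
            action=Action(entry["action"]),
            test_case=test_case,
            local_container_config=entry.get("containers", {}),
            container_call_config=entry.get("calls", {}),
        )
        for entry in global_config["devices"]
    ]


# Illustrative inputs only; none of these values come from the application.
test_case = {"paradigm": "incremental_learning", "algorithm": "demo"}
global_config = {
    "devices": [
        {"name": "cloud", "action": "create", "containers": {"count": 1}},
        {"name": "edge", "action": "create", "containers": {"count": 2}},
    ]
}
instructions = build_instructions(test_case, global_config)
```

  • Carrying the test case and both configurations inside each instruction mirrors the statement that the local management instruction includes all three, so a local control device can act on the instruction without further lookups.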
  • the first local control device is used to implement life cycle management of the first task evaluation container in the task cloud node (for example, life cycle management of the first benchmark container) and communication with an external task evaluation container (for example, communication with an external test container), where the external task evaluation container is arranged on a node other than the task cloud node.
  • the first local control device is used to implement communication between the first task evaluation container in the task cloud node and the second task evaluation container in the task edge node.
  • a specific implementation may be to establish control plane communication among the first local control device, the management device, and the second local control device, so that a communication network exists between the first task evaluation container and the second task evaluation container, thereby enabling data plane communication between the first task evaluation container and the second task evaluation container.
  • the first test container and the edge node test container can transmit data such as data sets, models, and algorithms.
  • the first communication module of the first local control device receives an instruction to add the first local control device, and the first container management module of the first local control device adds the first local control device according to that instruction.
  • the first container management module of the first local control device manages the first task evaluation container according to the test case and the local container management configuration, where the first task evaluation container includes the first distributed workflow.
  • managing the first task evaluation container may be any of the basic functions of adding, deleting, modifying, and querying the first task evaluation container. For example, the addition of the first benchmark container.
  • the number of first task evaluation containers created is not limited here and can be determined according to the container-count parameter in the local container management configuration.
  • when the test paradigm is an incremental learning paradigm, the first task evaluation container managed by the first container management module in the task cloud node is a training container.
  • the second communication module of the second local control device receives an instruction to manage the second local control device, and manages the second local control device according to the instruction to manage the second local control device.
  • the second communication module of the second local control device receives an instruction to add a second local control device, and the second container management module of the second local control device responds according to the instruction of adding a second local control device. Command to add a second local control device.
  • the instruction for managing the second local control device includes test cases, a local container management configuration, and a container calling configuration.
  • the second local control device is used to implement life cycle management of the second task evaluation container in the task edge node (for example, life cycle management of the second benchmark container) and communication with an external task evaluation container (for example, communication with an external test container), where the external task evaluation container is arranged on a node other than the task edge node.
  • the second local control device is used to implement communication between the first task evaluation container in the task cloud node and the second task evaluation container in the task edge node.
  • the specific implementation process is similar to the local control device of the cloud node in S607. For the sake of simplicity, it will not be described in detail here.
  • the second container management module in the second local control device manages the second task evaluation container according to the test case and the local container management configuration, where the second task evaluation container includes the second distributed workflow.
  • managing the second task evaluation container may be any of the basic functions of adding, deleting, modifying, and querying the second task evaluation container. For example, the addition of a second benchmark container.
  • the number of second task evaluation containers created is not limited here and can be determined according to the container-count parameter in the local container management configuration.
  • when the test paradigm is an incremental learning paradigm, the second task evaluation container managed by the second container management module in the task edge node is an inference container.
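  • The add, delete, modify, and query functions of a container management module can be sketched as follows. This is an in-memory stand-in that starts no real containers; the class name, the `count` and `role` keys, and the `benchmark-N` naming scheme are assumptions for illustration. The `role_for` helper only encodes the incremental learning case described in the text (training container on the cloud node, inference container on the edge node).

```python
import itertools


class ContainerManager:
    """In-memory stand-in for a container management module (no real runtime)."""

    def __init__(self):
        self._containers = {}
        self._ids = itertools.count()  # monotonically increasing container ids

    def add(self, local_config):
        """Create as many containers as the assumed 'count' parameter requests."""
        names = []
        for _ in range(local_config.get("count", 1)):
            name = f"benchmark-{next(self._ids)}"
            self._containers[name] = {"role": local_config["role"], "state": "running"}
            names.append(name)
        return names

    def delete(self, name):
        del self._containers[name]

    def modify(self, name, **changes):
        self._containers[name].update(changes)

    def query(self, name):
        # Return a copy so callers cannot mutate internal state.
        return dict(self._containers[name])


def role_for(paradigm, node):
    """Map the incremental learning paradigm to a container role per node."""
    if paradigm != "incremental_learning":
        raise ValueError(f"no role mapping sketched for paradigm {paradigm!r}")
    return "training" if node == "cloud" else "inference"


edge_manager = ContainerManager()
edge_names = edge_manager.add(
    {"count": 2, "role": role_for("incremental_learning", "edge")}
)
edge_manager.modify(edge_names[0], state="stopped")
edge_manager.delete(edge_names[1])
```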
  • S607 and S608 are the process in which the task cloud node manages the first local control device and the first task evaluation container, and S609 and S610 are the process in which the task edge node manages the second local control device and the second task evaluation container. These two processes may be performed in any order.
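  • Because the cloud-side and edge-side setup processes have no required order, one way to picture them is as two independent jobs that run concurrently. The sketch below uses Python's standard `concurrent.futures` module; the two setup functions are hypothetical placeholders and do not reflect how the application implements these steps.

```python
from concurrent.futures import ThreadPoolExecutor


def set_up_cloud_node():
    # Placeholder for S607 and S608 (cloud-side control device and container).
    return "cloud ready"


def set_up_edge_node():
    # Placeholder for S609 and S610 (edge-side control device and container).
    return "edge ready"


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {
        "cloud": pool.submit(set_up_cloud_node),
        "edge": pool.submit(set_up_edge_node),
    }
    # Collect results by name, so completion order does not matter.
    status = {name: future.result() for name, future in futures.items()}
```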
  • the first task evaluation container sends the algorithm or model to the second task evaluation container according to the test case and the container call configuration.
  • the first distributed workflow of the first task evaluation container calls the model evaluation module from the first task use case management module according to the test case and the local container calling configuration, and evaluates the test models to determine the target test model.
  • the first distributed workflow of the first task evaluation container sends the target test model to the second task evaluation container.
  • the test model may be obtained directly, or may be trained on a task cloud node. The embodiment of the present application does not limit the source of the test model.
  • the content included in the first task use case management module is the same as the content included in the task use case management module in the management device. Therefore, when the network connection between the management device and the first local control device is disconnected, the first distributed workflow can directly call the algorithm module corresponding to the AI paradigm from the first task evaluation container.
  • a communication network for data plane transmission is built between the first task evaluation container and the second task evaluation container by way of the first local control device, the management device, and the second local control device.
  • the task edge node serves as the data provider and the task cloud node serves as the algorithm provider.
  • the data provider can be prevented from obtaining the algorithm, thereby reducing the possibility of algorithm leakage from the algorithm provider.
  • when sending the algorithm to the second task evaluation container, the first task evaluation container sends the algorithm directly to the second task evaluation container according to the test case and the container calling configuration.
  • after the second task evaluation container receives the model or algorithm from the first task evaluation container, the second distributed workflow of the second task evaluation container calls the inference module from the second task use case management module according to the test case and the container calling configuration, performs inference on the data of the task edge node based on the model, obtains the test results corresponding to the test case, and sends the test results to the first task evaluation container.
  • the explanation of the second task use case management module is similar to that of the first task use case management module, so no details will be given here.
  • when receiving the model from the first task evaluation container, the second task evaluation container receives the target test model from the first task evaluation container.
  • the second distributed workflow of the second task evaluation container calls the inference module from the second task use case management module according to the test case and the container calling configuration, performs inference with the target test model on the inference data to obtain the test results of the target test model, and sends the test results to the first task evaluation container.
  • the second distributed workflow of the second task evaluation container calls the training module from the second task use case management module according to the test case and the container calling configuration to obtain a trained model. The inference module is then called from the second task use case management module, inference is performed on the data of the task edge node based on the trained model, the test results corresponding to the test case are obtained, and the test results are sent to the first task evaluation container.
  • the second communication module of the second local control device obtains the test result of at least one test case, and the second result display module of the second local control device displays the test result of the at least one test case. The second local control device sends the test result of the at least one test case to the global test case management module.
  • the test result of at least one test case can be displayed through an online console or a user interaction interface to show the execution status of the at least one test case.
  • the test results can include the test algorithm, the metric results of the test algorithm, the test paradigm to which the test algorithm belongs, the hyperparameter configuration of the test algorithm, and so on.
  • the test results of at least one test case are displayed in the form of a ranking list on the user interaction interface.
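  • A ranking list over test results can be sketched as a simple sort on one metric. The record layout below follows the fields listed in the text (algorithm, metric results, paradigm, hyperparameter configuration), but the key names, the sample values, and the `build_ranking` function are assumptions for illustration only.

```python
def build_ranking(results, metric, higher_is_better=True):
    """Sort test results by one metric and attach 1-based ranks."""
    ordered = sorted(
        results,
        key=lambda record: record["metrics"][metric],
        reverse=higher_is_better,
    )
    return [
        (rank, record["algorithm"], record["paradigm"], record["metrics"][metric])
        for rank, record in enumerate(ordered, start=1)
    ]


# Illustrative records; the values are made up for the example.
results = [
    {"algorithm": "algo_a", "paradigm": "incremental_learning",
     "metrics": {"accuracy": 0.91}, "hyperparameters": {"lr": 0.01}},
    {"algorithm": "algo_b", "paradigm": "incremental_learning",
     "metrics": {"accuracy": 0.87}, "hyperparameters": {"lr": 0.10}},
    {"algorithm": "algo_c", "paradigm": "incremental_learning",
     "metrics": {"accuracy": 0.93}, "hyperparameters": {"lr": 0.05}},
]
ranking = build_ranking(results, "accuracy")
```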
  • the first communication module of the first local control device obtains the test result of at least one test case, and the first result display module of the first local control device displays the test result of the at least one test case. The first local control device sends the test result of the at least one test case to the global test case management module.
  • the first task evaluation container receives the test results and updates the target test model according to the test results.
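  • The exchange between the two task evaluation containers can be sketched as follows: the cloud side evaluates candidate models and selects a target test model, the edge side runs that model on its own data, and only the test results travel back. The threshold "models", the sample data, and all function and class names are toy assumptions for illustration; they stand in for whatever models and modules a real deployment would use.

```python
def make_threshold_model(threshold):
    """Toy model: predicts True when the input exceeds the threshold."""
    def model(x):
        return x > threshold
    model.threshold = threshold
    return model


def accuracy(model, samples):
    """Fraction of (x, label) pairs the model classifies correctly."""
    return sum(model(x) == label for x, label in samples) / len(samples)


def select_target_model(candidates, validation_samples):
    """Cloud side: evaluate candidate models and keep the best-scoring one."""
    return max(candidates, key=lambda m: accuracy(m, validation_samples))


class EdgeContainer:
    """Edge side: holds private data; only aggregate metrics are returned."""

    def __init__(self, samples):
        self._samples = samples  # never exposed by any method

    def run_test(self, model):
        return {"accuracy": accuracy(model, self._samples), "n": len(self._samples)}


# Cloud node: choose the target test model on cloud-side validation data.
candidates = [make_threshold_model(t) for t in (0, 3, 5)]
cloud_validation = [(1, False), (4, True), (6, True)]
target_model = select_target_model(candidates, cloud_validation)

# Edge node: run inference on local data and return only the test result.
edge = EdgeContainer([(2, False), (3, True), (7, True), (8, True)])
test_result = edge.run_test(target_model)
```

  • In this sketch the model moves to the data rather than the data to the model, which is the property the text relies on when the data provider and the algorithm provider are different parties.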
  • in the benchmark testing of multi-node distributed collaborative AI, managing test cases makes it possible to manage the corresponding test containers for different test paradigms and thus to benchmark different test cases. In particular, when the data provider and the algorithm provider are different parties, the model can be debugged and updated using the test results of different test cases while the data supplied by the data provider never leaves the edge node.
  • the device includes a task use case management module and a communication module.
  • the management device also includes a container management module and a result display module.
  • the communication module is used to: obtain the task configuration of distributed collaborative AI and the task object of distributed collaborative AI.
  • the task configuration includes the configuration of the task environment of the distributed collaborative AI; according to the configuration of the task environment and the task object, a first management instruction is received, wherein the first management instruction includes a management instruction of the task environment and/or a management instruction of the task object.
  • the task use case management module is used to manage task use cases of distributed collaborative AI according to the first management instruction.
  • the task use cases include task environments and task objects.
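  • The relationship between a management instruction and a task use case can be sketched with two small record types. A use case bundles a task environment and a task object, and an instruction may carry an update for either or both, matching the "and/or" wording in the text. The type names, field names, and merge semantics are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskUseCase:
    environment: dict   # task environment, e.g. node and container settings
    task_object: dict   # task object, e.g. algorithm and paradigm


@dataclass(frozen=True)
class ManagementInstruction:
    """May carry an environment update, an object update, or both."""
    environment_update: Optional[dict] = None
    object_update: Optional[dict] = None


def apply_instruction(use_case, instruction):
    """Return a new use case with the requested parts updated."""
    environment = use_case.environment
    task_object = use_case.task_object
    if instruction.environment_update is not None:
        environment = {**environment, **instruction.environment_update}
    if instruction.object_update is not None:
        task_object = {**task_object, **instruction.object_update}
    return TaskUseCase(environment=environment, task_object=task_object)


# Illustrative values only.
base_case = TaskUseCase(
    environment={"edge_nodes": 1},
    task_object={"paradigm": "incremental_learning", "algorithm": "algo_a"},
)
updated_case = apply_instruction(
    base_case, ManagementInstruction(object_update={"algorithm": "algo_b"})
)
```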
  • the task case management module, container management module and result display module can all be implemented by software or can be implemented by hardware.
  • the following uses the task use case management module as an example to introduce the implementation method of the task use case management module.
  • the implementation of the container management module and result display module can refer to the implementation of the task case management module.
  • the task case management module may be an application or block of code running on a computer device.
  • the computer device may be at least one of a physical host, a virtual machine, a container, and other computing devices. Further, there may be one or more such computer devices.
  • the task case management module can be an application running on multiple hosts/virtual machines/containers. It should be noted that multiple hosts/virtual machines/containers used to run the application can be distributed in the same availability zone (AZ) or in different AZs. Multiple hosts/VMs/containers used to run the application can be distributed in the same region or in different regions. Among them, usually a region can include multiple AZs.
  • multiple hosts/VMs/containers used to run the application can be distributed in the same virtual private cloud (VPC) or across multiple VPCs.
  • the task case management module may include at least one computing device, such as a server.
  • the task use case management module can also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
  • Multiple computing devices included in the task case management module can be distributed in the same AZ or in different AZs. Multiple computing devices included in the task case management module can be distributed in the same region or in different regions. Similarly, multiple computing devices included in the task case management module can be distributed in the same VPC or in multiple VPCs.
  • the plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
  • the first local control device includes a first container management module and a first communication module.
  • the first local control device may also include a first task case management module and a first result display module.
  • the first communication module is configured to: receive a first local management instruction from the management device, the first local management instruction is determined according to the task use case, the task use case is determined according to the first management instruction, wherein the first management instruction includes the task environment management instructions and/or management instructions for task objects.
  • the first container management module is configured to manage the first local control device according to the first local management instruction.
  • the second local control device includes a second container management module and a second communication module.
  • the second local control device may also include a second task case management module and a second result display module.
  • the second communication module is configured to: receive a second local management instruction from the management device, the second local management instruction is determined according to the task use case, and the task use case is determined according to the second management instruction, wherein the second management instruction includes management instructions for the task environment and/or management instructions for task objects.
  • the second container management module is configured to manage the second local control device according to the second local management instruction.
  • the above modules are divided by logical function, and this division does not require the modules to be independent hardware units.
  • the term “module” here can be implemented in the form of software and/or hardware, and is not specifically limited.
  • a “module” may be a software program, a hardware circuit, or a combination of both that implements the above functions.
  • the hardware circuit may include an application-specific integrated circuit (ASIC), an electronic circuit, a processor (such as a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, merged logic circuitry, and/or other suitable components that support the described functionality.
  • modules of each example described in the embodiments of the present application can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.
  • FIG. 7 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation system provided by an embodiment of the present application.
  • the hardware structural diagram of the task evaluation system in Figure 7 is suitable for multi-node task evaluation mode.
  • the computer device cluster corresponding to the system includes at least one computing device.
  • the at least one computing device may include a computing device 700A, a computing device 700B, and a computing device 700C.
  • Each computing device includes a bus 702, a processor 704, a communication interface 708, and a memory 706.
  • the processor 704, the memory 706 and the communication interface 708 communicate through the bus 702.
  • the bus 702 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one line is used in Figure 3, but it does not mean that there is only one bus or one type of bus.
  • Bus 702 may include a path that carries information between the components of computing device 700 (e.g., memory 706, processor 704, and communication interface 708).
  • the processor 704 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), and a digital signal processor (DSP).
  • Memory 706 may include volatile memory, such as random access memory (RAM).
  • the memory 706 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 706 stores executable program codes, and the processor 704 executes the executable program codes to respectively implement the functions of the modules in the aforementioned device.
  • the memory 706 in the computing device 700A stores executable program codes for each module in the management device, and the processor 704 in the computing device 700A executes the executable program codes to implement the functions of the modules in the management device.
  • the memory 706 in the computing device 700B stores executable program codes for each module in the first local control device, and the processor 704 in the computing device 700B executes the executable program codes to implement the functions of the modules in the first local control device.
  • the memory 706 in the computing device 700C stores executable program codes for each module in the second local control device, and the processor 704 in the computing device 700C executes the executable program codes to implement the functions of the modules in the second local control device.
  • At least one computing device can jointly execute the instructions of the management device, the first local control device, and the second local control device for the method in FIG. 4, or FIG. 5, or FIG. 6.
  • the communication interface 708 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between at least one computing device 700 and other devices or communication networks.
  • At least one computing device may be connected via a network.
  • the network may be a wide area network or a local area network, etc., as shown in Figure 7.
  • Figure 8 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation and management device provided by an embodiment of the present application.
  • the hardware structure in Figure 8 is suitable for the single-node task evaluation simulation mode of distributed collaborative AI.
  • Computing device 800 includes bus 802, processor 804, communication interface 808, and memory 806.
  • the processor 804, the memory 806 and the communication interface 808 communicate through the bus 802.
  • the memory 806 in the computing device 800 stores executable program codes for each module in the management device, and the processor 804 in the computing device 800 executes the executable program code to implement the functions of the modules in the management device.
  • the computing device 800 may jointly execute the instructions of the management device for the method in FIG. 4, or FIG. 5, or FIG. 6.
  • the present application also provides a computer program product.
  • the computer program product includes: computer program code.
  • when the computer program code is run on a computer, the computer executes the method of the embodiment shown in Figure 4, Figure 5, or Figure 6.
  • the present application also provides a computer-readable medium.
  • the computer-readable medium stores program code.
  • when the program code is run on a computer, the computer executes the method of the embodiment shown in Figure 4, Figure 5, or Figure 6.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Provided in the embodiments of the present application are a distributed collaborative artificial intelligence (AI) task evaluation method, a management apparatus, a control apparatus and a system. The method comprises: acquiring distributed collaborative AI task configuration and a distributed collaborative AI task object, the task configuration comprising the configuration of a distributed collaborative AI task environment; according to the configuration of the task environment and the task object, receiving a first management instruction, the first management instruction comprising a management instruction for the task environment and/or a management instruction for the task object; and managing a distributed collaborative AI task use case according to the first management instruction, the task use case comprising the task environment and the task object. Therefore, by means of flexibly configuring AI task use cases, the execution conditions of different AI task use cases can be obtained, and then an edge cloud collaborative distributed AI task architecture can be easily implemented and deployed.

Description

Distributed collaborative AI task evaluation method, management device, control device and system
This application claims priority to the Chinese patent application filed with the China Patent Office on June 2, 2022, with application number 202210623375.8 and entitled "Distributed Collaborative AI Task Evaluation Method, Management Device, Control Device and System", the entire contents of which are incorporated into this application by reference.
Technical field
Embodiments of the present application relate to the field of artificial intelligence, and more specifically, to a distributed collaborative AI task evaluation method, management device, control device and system.
Background
Processing distributed collaborative artificial intelligence (AI) tasks makes effective use of the strengths of different devices to jointly implement artificial intelligence, particularly in edge-cloud collaborative distributed AI task processing.
Because the cloud side has the advantage of large-scale computing power, performing machine learning (ML) on the cloud side has become a well-known approach, and most large cloud platform providers now offer machine learning services. However, machine learning is one specific form of artificial intelligence, and in the process of obtaining a model through machine learning, a large amount of the data cannot be obtained directly from cloud nodes but is instead obtained through edge devices.
As the performance of edge devices improves, some machine learning tasks can be migrated to edge devices, which is known as edge AI. Edge AI has a data advantage, can reduce the latency introduced by communication, and can be used in tasks with strict latency requirements. Since the cloud side has the advantage of computing power and the edge side has the advantages of data and low latency, completing distributed collaborative AI tasks through distributed edge-cloud collaboration has become a development trend.
However, processing distributed collaborative AI suffers from problems such as heavy dependence on the maturity of business scenarios and high demands on cross-team collaboration, especially in distributed edge-cloud collaborative AI task processing. How to make a distributed collaborative AI task processing architecture easy to deploy has therefore become an urgent problem to be solved.
发明内容Contents of the invention
本申请实施例提供一种分布式协同AI任务评估方法、管理装置、控制装置和系统,通过灵活配置分布式协同AI的任务用例,进而灵活管理分布式协同AI的任务评估容器有助于获得不同AI任务用例的执行情况,进而使得分布式协同AI任务处理架构易于落地部署。Embodiments of the present application provide a distributed collaborative AI task evaluation method, management device, control device and system. By flexibly configuring the task use cases of distributed collaborative AI, and then flexibly managing the task evaluation container of distributed collaborative AI, it helps to obtain different results. The execution of AI task use cases makes the distributed collaborative AI task processing architecture easy to deploy.
According to a first aspect, a method for task evaluation of distributed collaborative artificial intelligence (AI) is provided. The method is applied to a control node and includes: obtaining a task configuration of distributed collaborative AI and a task object of distributed collaborative AI, where the task configuration includes a configuration of a task environment of the distributed collaborative AI; receiving a first management instruction based on the configuration of the task environment and the task object, where the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object; and managing a task use case of distributed collaborative AI according to the first management instruction, where the task use case includes the task environment and the task object.
It should be understood that distributed collaborative AI means that the different AI processes corresponding to an AI paradigm may be implemented in different containers deployed on the same device, or in different containers deployed on different devices. For example, the machine learning paradigm includes a training process and an inference process; the two processes may be implemented in different containers deployed on one device, or in different containers on different devices.
In the embodiments of this application, managing the task environment and the task object makes it possible to flexibly manage the task use cases corresponding to a distributed collaborative AI paradigm. Service developers can manage task use cases according to the business requirements of real scenarios, which makes it easier to process different task use cases, helps obtain the execution status of different AI task use cases, and makes an edge-cloud collaborative distributed AI task architecture easy to deploy in practice.
In a possible implementation, managing the task use case of distributed collaborative AI includes: adding a task use case, deleting a task use case, modifying a task use case, or querying a task use case.
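The four management operations on a task use case can be illustrated with a minimal Python sketch. This is not the implementation described in the application: the class names `TaskUseCase` and `TaskUseCaseManager`, the `handle` method, the in-memory dictionary, and the field names are all assumptions introduced only to show how a management instruction might map onto add, delete, modify, and query.

```python
from dataclasses import dataclass, field

# The four management actions named in the text.
ACTIONS = ("add", "delete", "modify", "query")


@dataclass
class TaskUseCase:
    """A task use case pairs a task environment with a task object."""
    name: str
    task_environment: dict
    task_object: str


@dataclass
class TaskUseCaseManager:
    """Hypothetical in-memory manager; storage and naming are assumptions."""
    cases: dict = field(default_factory=dict)

    def handle(self, action, name, **fields):
        """Apply one management instruction to the named task use case."""
        if action not in ACTIONS:
            raise ValueError(f"unsupported action: {action!r}")
        if action == "add":
            if name in self.cases:
                raise KeyError(f"use case {name!r} already exists")
            self.cases[name] = TaskUseCase(name=name, **fields)
            return self.cases[name]
        if name not in self.cases:
            raise KeyError(f"unknown use case: {name!r}")
        if action == "delete":
            return self.cases.pop(name)
        if action == "modify":
            case = self.cases[name]
            for key, value in fields.items():
                if key not in ("task_environment", "task_object"):
                    raise AttributeError(f"unknown field: {key!r}")
                setattr(case, key, value)
            return case
        return self.cases[name]  # action == "query"
```

For example, `TaskUseCaseManager().handle("add", "uc1", task_environment={"paradigm": "machine_learning"}, task_object="model_a")` would register a use case that later `query`, `modify`, and `delete` instructions can address by name.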
In a possible implementation, when the task evaluation mode is a multi-node task evaluation mode, the method further includes: generating, according to the task use case and a global container management configuration, a first local management instruction for a first local control apparatus and a second local management instruction for a second local control apparatus, where the task configuration includes the global container management configuration, and the global container management configuration includes the first local control apparatus, the instruction action corresponding to the first local management instruction, the second local control apparatus, and the instruction action corresponding to the second local management instruction; and sending the first local management instruction to the first local control apparatus and the second local management instruction to the second local control apparatus.
It should be understood that, in the multi-node task evaluation mode, the task edge node and the task cloud node may perform multi-node distributed collaboration through the control cloud node.
In a possible implementation, the instruction action corresponding to the first local management instruction is an add instruction, a delete instruction, a modify instruction, or a query instruction for the first local control apparatus; and the management instruction corresponding to the second local management instruction is an add instruction, a delete instruction, a modify instruction, or a query instruction for the second local control apparatus.
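As a rough illustration of the multi-node mode, the Python sketch below derives one local management instruction per local control apparatus from a global container management configuration and hands each instruction to a sender. The dictionary layout of the configuration, the `LocalManagementInstruction` type, the apparatus identifiers, and the `send` callback are hypothetical; the application does not specify a data format or transport.

```python
from dataclasses import dataclass

VALID_ACTIONS = {"add", "delete", "modify", "query"}


@dataclass(frozen=True)
class LocalManagementInstruction:
    target: str    # identifier of the local control apparatus (assumed format)
    action: str    # instruction action: add, delete, modify, or query
    use_case: str  # name of the task use case the instruction refers to


def generate_local_instructions(use_case, global_container_config):
    """Build one instruction per local control apparatus.

    global_container_config is assumed to map each local control
    apparatus identifier to its instruction action.
    """
    instructions = []
    for target, action in global_container_config.items():
        if action not in VALID_ACTIONS:
            raise ValueError(f"unsupported action for {target!r}: {action!r}")
        instructions.append(LocalManagementInstruction(target, action, use_case))
    return instructions


def dispatch(instructions, send):
    """Send each instruction to its target through the supplied callback."""
    for instruction in instructions:
        send(instruction.target, instruction)
```

In a real deployment the `send` callback would wrap whatever channel connects the control cloud node to the task cloud node and the task edge node; here it is left abstract so the sketch stays self-contained.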
In a possible implementation, the method further includes: receiving an evaluation result of at least one task use case from the first local control apparatus or the second local control apparatus; and displaying the evaluation result of the at least one task use case.
In the embodiments of this application, in multi-node distributed collaborative AI task evaluation, managing task use cases allows the corresponding task evaluation containers to be managed under different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider is guaranteed not to leave the edge node, while the evaluation results of different task use cases still help obtain their execution status, which makes an edge-cloud collaborative distributed AI task architecture easy to deploy in practice.
In a possible implementation, when the task evaluation mode is a single-node task evaluation simulation mode, the simulation switch configuration parameter in the task configuration is set to enabled, and the method further includes: managing a third task evaluation container according to the task use case and a local container management configuration, where the task evaluation container includes a local simulation workflow, the task configuration includes the local container management configuration, and the local container management configuration includes an AI task evaluation container and the management action corresponding to the AI task evaluation container. The management action corresponding to the third task evaluation container includes adding, deleting, modifying, or querying the third task evaluation container.
It should be understood that the single-node task evaluation simulation mode may run the single-node task evaluation simulation on a cloud node or on an edge node.
In the embodiments of this application, the task use case management module manages task use cases (for example, creates an AI task use case) based on the task configuration and the task object. The container management module manages task evaluation containers (for example, adds an AI task container) based on the task use case and the simulation mode, and runs the task evaluation simulation through the simulation workflow in the task evaluation container to obtain a simulation result. This makes it possible not only to create the task paradigm in a task use case according to the task scenario, but also to simulate multiple nodes on a single node. For service developers, the distributed collaborative AI task evaluation architecture in the embodiments of this application allows task use cases to be configured flexibly according to the task scenario; that is, the corresponding task use cases can be managed (for example, created) according to different AI paradigms. For algorithm developers, simulating multiple nodes on a single node means there is no need to redeploy the distributed collaborative AI task evaluation architecture on the edge side and the cloud side each time the edge-side task environment changes, nor to build an AI task evaluation platform on multiple virtual nodes or multiple device nodes. Task use cases can be simulated and simulation results obtained directly, which greatly reduces labor and material costs.
In a possible implementation, when managing the task use case of distributed collaborative AI is adding a task use case, managing the third task evaluation container according to the task use case and the local container management configuration includes: adding the third task evaluation container according to the task use case and the local container management configuration, where the third task evaluation container includes the local simulation workflow.
In a possible implementation, the task configuration further includes a simulation mode corresponding to the third task evaluation container, and the simulation mode is any one of the following: simulation testing of algorithm performance; simulation testing of system performance; simulation testing of both algorithm performance and system performance; or simulation testing of system unit performance.
In a possible implementation, when the simulation mode is simulation testing of algorithm performance, the method further includes creating an algorithm pseudo-container in the third task evaluation container corresponding to that simulation mode; or, when the simulation mode is simulation testing of system performance, the method further includes creating a system pseudo-container in the third task evaluation container corresponding to that simulation mode; or, when the simulation mode is simulation testing of both algorithm performance and system performance, a real container is created in the third task evaluation container corresponding to that simulation mode; or, when the simulation mode is simulation testing of system unit performance, the third task evaluation container corresponding to that simulation mode continues to be used.
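The relationship between the four simulation modes and what is created inside the third task evaluation container can be summarized as a simple lookup. The Python sketch below is illustrative only: the enum member names, the string labels returned by `plan_container`, and the function itself are assumptions and do not correspond to any interface defined in the application.

```python
from enum import Enum


class SimulationMode(Enum):
    ALGORITHM = "algorithm_performance"
    SYSTEM = "system_performance"
    ALGORITHM_AND_SYSTEM = "algorithm_and_system_performance"
    SYSTEM_UNIT = "system_unit_performance"


# What is set up inside the third task evaluation container for each mode.
# The labels on the right are hypothetical names for the four outcomes.
_CONTAINER_PLAN = {
    SimulationMode.ALGORITHM: "algorithm_pseudo_container",
    SimulationMode.SYSTEM: "system_pseudo_container",
    SimulationMode.ALGORITHM_AND_SYSTEM: "real_container",
    SimulationMode.SYSTEM_UNIT: "reuse_existing_container",
}


def plan_container(mode):
    """Return the container action associated with a simulation mode."""
    if not isinstance(mode, SimulationMode):
        raise TypeError("mode must be a SimulationMode")
    return _CONTAINER_PLAN[mode]
```

Keeping the mapping in one table makes it easy to see that only the combined algorithm-and-system mode calls for a real container, while the system unit mode creates nothing new.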
In a possible implementation, the method further includes: receiving a simulation result of at least one task use case from the third task evaluation container, where the simulation result is obtained by the third task evaluation container simulating the task use case according to a local container invocation configuration and the simulation mode corresponding to the third task evaluation container; the task configuration includes the local container invocation configuration, which includes the invocation order of the algorithm modules in the task paradigm and the hyperparameters corresponding to the algorithm modules; the task use case includes the task environment, and the task environment includes the task paradigm; and displaying the simulation result of the at least one task use case.
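A local container invocation configuration, as described above, carries two things: the order in which the algorithm modules of a task paradigm are invoked, and the hyperparameters for each module. The Python sketch below shows one way such a configuration could drive a workflow. The list-of-dictionaries layout, the `run_workflow` function, and the toy modules `scale` and `threshold` are all hypothetical and stand in for real algorithm modules.

```python
def scale(value, factor=1.0):
    """Toy algorithm module: multiply the input by a hyperparameter."""
    return value * factor


def threshold(value, cutoff=0.0):
    """Toy algorithm module: report whether the input exceeds a cutoff."""
    return value > cutoff


# Registry of available algorithm modules (assumed, for illustration).
MODULES = {"scale": scale, "threshold": threshold}


def run_workflow(invocation_config, modules, sample):
    """Invoke modules in the configured order with their hyperparameters.

    invocation_config is assumed to be a list of steps, each naming a
    module and optionally supplying its hyperparameters.
    """
    value = sample
    for step in invocation_config:
        module = modules[step["module"]]
        value = module(value, **step.get("hyperparameters", {}))
    return value


# Example configuration: scale first, then apply the threshold.
EXAMPLE_CONFIG = [
    {"module": "scale", "hyperparameters": {"factor": 2.0}},
    {"module": "threshold", "hyperparameters": {"cutoff": 5.0}},
]
```

With `EXAMPLE_CONFIG`, an input of 3 is scaled to 6.0 and passes the cutoff of 5.0, whereas an input of 2 is scaled to 4.0 and does not. Changing the order or the hyperparameters in the configuration changes the result without touching the modules themselves.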
In the embodiments of this application, displaying the simulation results obtained for different task use cases lets users compare the simulation results of task objects directly and arrive at an objective evaluation of the task use cases, which helps users select a suitable target task object.
In a possible implementation, the type of task evaluation is any one of benchmark testing, certification, or competition.
In a possible implementation, when the type of task evaluation is benchmark testing, the task environment is a test environment, the task object is a test object, the test object is any one of a test algorithm, a test model, a test system, or a test scenario, and the task paradigm is a test paradigm.
According to a second aspect, a method for task evaluation of distributed collaborative artificial intelligence (AI) is provided. The method is applied to a task cloud node and includes: receiving a first local management instruction from a management apparatus, where the first local management instruction is determined according to a task use case of distributed collaborative AI, the task use case is determined according to a first management instruction, the first management instruction includes a management instruction for a task environment and/or a management instruction for a task object, and the task use case includes the task environment and the task object; and managing a first local control apparatus according to the first local management instruction.
In the embodiments of this application, in multi-node distributed collaborative AI task evaluation, managing task use cases allows the corresponding task evaluation containers to be managed under different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider is guaranteed not to leave the edge node, while the evaluation results of different task use cases still help obtain their execution status, which makes an edge-cloud collaborative distributed AI task architecture easy to deploy in practice.
In a possible implementation, the first management instruction is used to add, delete, modify, or query a task use case; and managing the first local control apparatus is adding, deleting, modifying, or querying the first local control apparatus.
In a possible implementation, when managing the first local control apparatus is adding the first local control apparatus, the first local management instruction includes the task use case and a first local container management configuration corresponding to the task cloud node, and the method further includes: managing a first task evaluation container according to the task use case and the first local container management configuration, where the first local container management configuration includes the first task evaluation container and the management instruction corresponding to the first task evaluation container.
In a possible implementation, the management instruction corresponding to the first task evaluation container is an add instruction, a delete instruction, a modify instruction, or a query instruction for the first task evaluation container, or an instruction for implementing communication between the first task evaluation container and an external task evaluation container, where the external task evaluation container belongs to a node other than the task cloud node.
In a possible implementation, if the management of the first task evaluation container is adding the first task evaluation container, the first task evaluation container corresponding to the task cloud node is added, where the first AI task evaluation container corresponding to the task cloud node includes a first distributed workflow.
In a possible implementation, the method further includes: when the evaluation result of the task use case is in the first task evaluation container, receiving the evaluation result of at least one task use case from the first AI task evaluation container; displaying the evaluation result of the at least one task use case; and sending the evaluation result of the at least one task use case to the management module.
In the embodiments of this application, displaying the evaluation results obtained for different task use cases lets users compare the evaluation results of task objects directly and arrive at an objective evaluation of the task use cases, which helps users select a suitable target task object.
In a possible implementation, the type of task evaluation is any one of benchmark testing, certification, or competition.
In a possible implementation, when the type of task evaluation is benchmark testing, the task environment is a test environment, the task object is a test object, the test object is any one of a test algorithm, a test model, a test system, or a test scenario, and the task paradigm is a test paradigm.
According to a third aspect, a method for task evaluation of distributed collaborative artificial intelligence (AI) is provided. The method is applied to a task edge node and includes: receiving a second local management instruction from a management apparatus, where the second local management instruction is determined according to a task use case, the task use case is determined according to a second management instruction, the second management instruction includes a management instruction for a task environment and/or a management instruction for a task object, and the task use case includes the task environment and the task object; and managing a second local control apparatus according to the second local management instruction.
In the embodiments of this application, in multi-node distributed collaborative AI task evaluation, managing task use cases allows the corresponding task evaluation containers to be managed under different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider is guaranteed not to leave the edge node, while the evaluation results of different task use cases still help obtain their execution status, which makes an edge-cloud collaborative distributed AI task architecture easy to deploy in practice.
In a possible implementation, the second management instruction is used to add, delete, modify, or query a task use case; and managing the second local control apparatus is adding, deleting, modifying, or querying the second local control apparatus.
In a possible implementation, when managing the second local control apparatus is adding the second local control apparatus, the second local management instruction includes the task use case and a second local container management configuration corresponding to the task cloud node, and the method further includes: managing a second task evaluation container according to the task use case and the second local container management configuration, where the second local container management configuration includes the second task evaluation container and the management instruction corresponding to the second task evaluation container.
In a possible implementation, the management instruction corresponding to the second task evaluation container is an add instruction, a delete instruction, a modify instruction, or a query instruction for the second task evaluation container, or an instruction for implementing communication between the second task evaluation container and an external task evaluation container, where the external task evaluation container belongs to a node other than the task cloud node.
In a possible implementation, if the management of the second task evaluation container is adding the second task evaluation container, the second task evaluation container corresponding to the task cloud node is added, where the second AI task evaluation container corresponding to the task cloud node includes a second distributed workflow.
In a possible implementation, the method further includes: when the evaluation result of the task use case is in the second task evaluation container, receiving the evaluation result of at least one task use case from the second AI task evaluation container; displaying the evaluation result of the at least one task use case; and sending the evaluation result of the at least one task use case to the management apparatus.
In a possible implementation, the type of task evaluation is any one of benchmark testing, certification, or competition.
In a possible implementation, when the type of task evaluation is benchmark testing, the task environment is a test environment, the task object is a test object, the test object is any one of a test algorithm, a test model, a test system, or a test scenario, and the task paradigm is a test paradigm.
According to a fourth aspect, a management apparatus for task evaluation of distributed collaborative artificial intelligence (AI) is provided. The apparatus includes a task use case management module and a communication module. The communication module is configured to: obtain a task configuration of distributed collaborative AI and a task object of distributed collaborative AI, where the task configuration includes a configuration of a task environment of the distributed collaborative AI; and receive a first management instruction based on the configuration of the task environment and the task object, where the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object. The task use case management module is configured to manage a task use case of distributed collaborative AI according to the first management instruction, where the task use case includes the task environment and the task object.
It should be understood that distributed collaborative AI means that the different AI processes corresponding to an AI paradigm may be implemented in different containers deployed on the same device, or in different containers deployed on different devices. For example, the machine learning paradigm includes a training process and an inference process; the two processes may be implemented in different containers deployed on one device, or in different containers on different devices.
In the embodiments of this application, managing the task environment and the task object makes it possible to flexibly manage the task use cases corresponding to a distributed collaborative AI paradigm. Service developers can manage task use cases according to the business requirements of real scenarios, which makes it easier to process different task use cases, helps obtain the execution status of different AI task use cases, and makes an edge-cloud collaborative distributed AI task architecture easy to deploy in practice.
In a possible implementation, managing the task use case of distributed collaborative AI includes: adding a task use case, deleting a task use case, modifying a task use case, or querying a task use case.
In a possible implementation, when the task evaluation mode is a multi-node task evaluation mode, the management apparatus is deployed on the control cloud node and further includes a container management module. The container management module is configured to generate, according to the task use case and a global container management configuration, a first local management instruction for a first local control apparatus and a second local management instruction for a second local control apparatus, where the task configuration includes the global container management configuration, and the global container management configuration includes the first local control apparatus, the instruction action corresponding to the first local management instruction, the second local control apparatus, and the instruction action corresponding to the second local management instruction. The communication module is configured to send the first local management instruction to the first local control apparatus and the second local management instruction to the second local control apparatus.
It should be understood that, in the multi-node task evaluation mode, the task edge node and the task cloud node may perform multi-node distributed collaboration through the control cloud node.
In a possible implementation, the instruction action corresponding to the first local management instruction is an add instruction, a delete instruction, a modify instruction, or a query instruction for the first local control apparatus; and the management instruction corresponding to the second local management instruction is an add instruction, a delete instruction, a modify instruction, or a query instruction for the second local control apparatus.
In a possible implementation, the apparatus further includes a second result display module. The communication module is configured to receive the execution status of at least one task use case from the first local control apparatus or the second local control apparatus, and the second result display module is configured to display the execution status of the at least one task use case.
In the embodiments of this application, in multi-node distributed collaborative AI task evaluation, managing task use cases allows the corresponding task evaluation containers to be managed under different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider is guaranteed not to leave the edge node, while the evaluation results of different task use cases still help obtain their execution status, which makes an edge-cloud collaborative distributed AI task architecture easy to deploy in practice.
In a possible implementation, when the task evaluation mode is a single-node task evaluation simulation mode, the simulation switch configuration parameter in the task configuration is set to enabled, the management apparatus is deployed on a single node, and the management apparatus further includes a container management module. The container management module is configured to manage a third task evaluation container according to the task use case and a local container management configuration, where the task evaluation container includes a local simulation workflow, the task configuration includes the local container management configuration, and the local container management configuration includes an AI task evaluation container and the management action corresponding to the AI task evaluation container. The management action corresponding to the third task evaluation container includes adding, deleting, modifying, or querying the third task evaluation container.
It should be understood that single-node task evaluation simulation may be performed on a cloud node or on an edge node; that is, the management device may be arranged on an edge node or on multiple nodes.
In the embodiments of this application, the task use case management module manages task use cases (for example, creates an AI task use case) through the task configuration and the task object. The container management module manages the task evaluation container (for example, adds an AI task container) according to the task use case and the simulation mode, and runs the task evaluation simulation through the simulation workflow in the task evaluation container to obtain a simulation result. In this way, the task paradigm in a task use case can be created according to the task scenario, and multi-node simulation can be carried out on a single node. For service developers, the distributed collaborative AI task evaluation architecture in the embodiments of this application allows task use cases to be configured flexibly according to the task scenario; that is, the corresponding task use cases can be managed (for example, created) according to different AI paradigms. For algorithm developers, multi-node simulation on a single node means that the distributed collaborative AI task evaluation architecture does not have to be redeployed on the edge side and the cloud side each time the edge-side task environment changes, and no AI task evaluation platform has to be built on multiple virtual nodes or multiple device nodes. Task use cases can still be simulated and simulation results obtained, which greatly reduces labor and material costs.
In a possible implementation, when managing the task use case of the distributed collaborative AI means adding a task use case, the container management module is configured to add the third task evaluation container according to the task use case and the local container management configuration, where the third task evaluation container includes the local simulation workflow.
In a possible implementation, the task configuration further includes the simulation mode corresponding to the third task evaluation container. The simulation mode corresponding to the third task evaluation container is any one of the following: simulation testing of algorithm performance; simulation testing of system performance; simulation testing of both algorithm performance and system performance; or simulation testing of system unit performance.
In a possible implementation, when the simulation mode is simulation testing of algorithm performance, the container management module is further configured to create an algorithm pseudo-container in the third task evaluation container corresponding to that mode. When the simulation mode is simulation testing of system performance, the container management module is further configured to create a system pseudo-container in the third task evaluation container corresponding to that mode. When the simulation mode is simulation testing of both algorithm performance and system performance, the container management module is further configured to create a real container in the third task evaluation container corresponding to that mode. When the simulation mode is simulation testing of system unit performance, the container management module is further configured to reuse the third task evaluation container corresponding to that mode.
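The correspondence between the four simulation modes and the container handling described above can be summarized as a lookup table. The following is a minimal sketch under stated assumptions: the enum `SimulationMode`, its member names, and the returned action strings are hypothetical labels chosen for illustration and do not appear in this application.

```python
from enum import Enum


class SimulationMode(Enum):
    """Hypothetical names for the four simulation modes."""
    ALGORITHM = "algorithm"
    SYSTEM = "system"
    ALGORITHM_AND_SYSTEM = "algorithm_and_system"
    SYSTEM_UNIT = "system_unit"


# What the container management module does inside the third task
# evaluation container for each mode (action names are illustrative).
_CONTAINER_ACTIONS = {
    SimulationMode.ALGORITHM: "create_algorithm_pseudo_container",
    SimulationMode.SYSTEM: "create_system_pseudo_container",
    SimulationMode.ALGORITHM_AND_SYSTEM: "create_real_container",
    SimulationMode.SYSTEM_UNIT: "reuse_existing_container",
}


def container_action(mode: SimulationMode) -> str:
    """Return the container handling that corresponds to a simulation mode."""
    return _CONTAINER_ACTIONS[mode]


for mode in SimulationMode:
    print(mode.value, "->", container_action(mode))
```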
In a possible implementation, the management device further includes a result display module. The communication module is configured to receive a simulation result of at least one task use case from the third task evaluation container, where the simulation result is obtained by the third task evaluation container simulating the task use case according to a local container call configuration and the simulation mode corresponding to the third task evaluation container. The task configuration includes the local container call configuration, and the local container call configuration includes the calling order of the algorithm modules in the task paradigm and the hyperparameters corresponding to the algorithm modules. The task use case includes a task environment, and the task environment includes the task paradigm. The result display module is configured to display the simulation result of the at least one task use case.
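To make the role of the local container call configuration concrete, the sketch below runs algorithm modules in a configured order and passes each module its own hyperparameters. The dictionary keys `call_order` and `hyperparameters`, the function `run_paradigm`, and the two toy modules are assumptions introduced for illustration; the application does not prescribe a concrete configuration format.

```python
def run_paradigm(call_config: dict, modules: dict, sample):
    """Invoke algorithm modules in the configured order.

    The output of each module is fed to the next one, and each module
    receives the hyperparameters listed for it in the call configuration.
    """
    value = sample
    for step in call_config["call_order"]:
        hyperparams = call_config["hyperparameters"].get(step, {})
        value = modules[step](value, **hyperparams)
    return value


# Two toy algorithm modules (hypothetical).
def preprocess(values, scale=1.0):
    return [v * scale for v in values]


def infer(values, threshold=0.5):
    return [int(v > threshold) for v in values]


call_config = {
    "call_order": ["preprocess", "infer"],
    "hyperparameters": {
        "preprocess": {"scale": 0.1},
        "infer": {"threshold": 0.5},
    },
}
modules = {"preprocess": preprocess, "infer": infer}

print(run_paradigm(call_config, modules, [2, 8]))  # prints [0, 1]
```

Because the order and the hyperparameters live in configuration rather than in code, the same modules can be recombined into a different task paradigm without changing the modules themselves.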
In the embodiments of this application, displaying the simulation results of different task use cases lets the user compare the simulation results of the task objects directly, obtain an objective evaluation of the task use cases, and select a suitable target task object.
In a possible implementation, the type of task evaluation is any one of benchmark testing, certification, or competition.
In a possible implementation, when the type of task evaluation is benchmark testing, the task environment is a test environment, the task object is a test object, the test object is any one of a test algorithm, a test model, a test system, or a test scenario, and the task paradigm is a test paradigm.
According to a fifth aspect, a first local control device for task evaluation of distributed collaborative artificial intelligence (AI) is provided. The first local control device is arranged on a task cloud node and includes a first communication module and a first container management module. The first communication module is configured to receive a first local management instruction from the management device. The first local management instruction is determined according to a task use case of the distributed collaborative AI, and the task use case is determined according to a first management instruction, where the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object, and the task use case includes the task environment and the task object. The first container management module is configured to manage the first local control device according to the first local management instruction.
In the embodiments of this application, in multi-node distributed collaborative AI task evaluation, managing task use cases allows the corresponding task evaluation containers to be managed under different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider never leaves the edge node, yet evaluation results for different task use cases can still be obtained. These results show how each task use case performs, which makes an edge-cloud collaborative distributed AI task architecture easier to deploy in practice.
In a possible implementation, the first management instruction is used to add, delete, modify, or query a task use case; managing the first local control device means adding, deleting, modifying, or querying the first local control device.
In a possible implementation, when managing the first local control device means adding the first local control device, the first local management instruction includes the task use case and a first local container management configuration corresponding to the task cloud node. The first container management module is further configured to manage a first task evaluation container according to the task use case and the first local container management configuration, where the first local container management configuration includes the first task evaluation container and the management instruction corresponding to the first task evaluation container.
In a possible implementation, the management instruction corresponding to the first task evaluation container is an instruction to add, delete, modify, or query the first task evaluation container, or an instruction to establish communication between the first task evaluation container and an external task evaluation container, where the external task evaluation container belongs to a node other than the task cloud node.
In a possible implementation, if the management instruction for the first task evaluation container is to add the first task evaluation container, the first container management module is further configured to add the first task evaluation container corresponding to the task cloud node, where the first AI task evaluation container corresponding to the task cloud node includes a first distributed workflow.
In a possible implementation, the first local control device further includes a first result display module. When the evaluation result of the task use case is in the first task evaluation container, the first communication module is further configured to receive, from the first AI task evaluation container, the evaluation result of at least one task use case. The first result display module is configured to display the evaluation result of the at least one task use case and to send the evaluation result of the at least one task use case to the management module.
In the embodiments of this application, displaying the evaluation results of different task use cases lets the user compare the evaluation results of the task objects directly, obtain an objective evaluation of the task use cases, and select a suitable target task object.
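One simple way to present evaluation results so that task objects can be compared is to rank them by a chosen metric. The sketch below is an assumption about how a result display module might order its rows; the record layout (`task_case`, `metrics`), the metric names, and the sample values are made up for illustration and are not results from this application.

```python
def rank_results(results: list, metric: str, higher_is_better: bool = True) -> list:
    """Return evaluation results sorted by one metric, best first."""
    return sorted(
        results,
        key=lambda record: record["metrics"][metric],
        reverse=higher_is_better,
    )


# Made-up evaluation results for three hypothetical task use cases.
results = [
    {"task_case": "case-a", "metrics": {"accuracy": 0.91, "latency_ms": 40}},
    {"task_case": "case-b", "metrics": {"accuracy": 0.88, "latency_ms": 25}},
    {"task_case": "case-c", "metrics": {"accuracy": 0.93, "latency_ms": 60}},
]

for record in rank_results(results, "accuracy"):
    print(record["task_case"], record["metrics"])
```

For metrics where a smaller value is preferable, such as latency, pass `higher_is_better=False`.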
In a possible implementation, the type of task evaluation is any one of benchmark testing, certification, or competition.
In a possible implementation, when the type of task evaluation is benchmark testing, the task environment is a test environment, the task object is a test object, the test object is any one of a test algorithm, a test model, a test system, or a test scenario, and the task paradigm is a test paradigm.
According to a sixth aspect, a second local control device for task evaluation of distributed collaborative artificial intelligence (AI) is provided. The second local control device is arranged on a task edge node and includes a second communication module and a second container management module. The second communication module is configured to receive a second local management instruction from the management device. The second local management instruction is determined according to a task use case, and the task use case is determined according to a second management instruction, where the second management instruction includes a management instruction for the task environment and/or a management instruction for the task object, and the task use case includes the task environment and the task object. The second container management module is configured to manage the second local control device according to the second local management instruction.
In the embodiments of this application, in multi-node distributed collaborative AI task evaluation, managing task use cases allows the corresponding task evaluation containers to be managed under different paradigms, so that different task use cases can be evaluated. In particular, when the data provider and the algorithm provider are different parties, the data supplied by the data provider never leaves the edge node, yet evaluation results for different task use cases can still be obtained. These results show how each task use case performs, which makes an edge-cloud collaborative distributed AI task architecture easier to deploy in practice.
In a possible implementation, the second management instruction is used to add, delete, modify, or query a task use case; managing the second local control device means adding, deleting, modifying, or querying the second local control device.
In a possible implementation, when managing the second local control device means adding the second local control device, the second local management instruction includes the task use case and a second local container management configuration corresponding to the task cloud node. The second container management module is further configured to manage a second task evaluation container according to the task use case and the second local container management configuration, where the second local container management configuration includes the second task evaluation container and the management instruction corresponding to the second task evaluation container.
In a possible implementation, the management instruction corresponding to the second task evaluation container is an instruction to add, delete, modify, or query the second task evaluation container, or an instruction to establish communication between the second task evaluation container and an external task evaluation container, where the external task evaluation container belongs to a node other than the task cloud node.
In a possible implementation, if the management instruction for the second task evaluation container is to add the second task evaluation container, the second container management module is further configured to add the second task evaluation container corresponding to the task cloud node, where the second AI task evaluation container corresponding to the task cloud node includes a second distributed workflow.
In a possible implementation, the second local control device further includes a second result display module. When the evaluation result of the task use case is in the second task evaluation container, the second communication module is configured to receive, from the second AI task evaluation container, the evaluation result of at least one task use case. The second result display module is configured to display the evaluation result of the at least one task use case and to send the evaluation result of the at least one task use case to the management device.
In the embodiments of this application, displaying the evaluation results of different task use cases lets the user compare the evaluation results of the task objects directly, obtain an objective evaluation of the task use cases, and select a suitable target task object.
In a possible implementation, the type of task evaluation is any one of benchmark testing, certification, or competition.
In a possible implementation, when the type of task evaluation is benchmark testing, the task environment is a test environment, the task object is a test object, the test object is any one of a test algorithm, a test model, a test system, or a test scenario, and the task paradigm is a test paradigm.
According to a seventh aspect, a distributed collaborative artificial intelligence (AI) task evaluation system is provided. The system includes the management device in any possible implementation of the device design of the fourth aspect, and either the first local control device in any possible implementation of the device design of the fifth aspect or the second local control device in any possible implementation of the device design of the sixth aspect.
According to an eighth aspect, a computer device is provided. The device includes a memory and a processor. The memory is configured to store a program, and the processor is configured to execute the program stored in the memory. When the program stored in the memory is executed, the processor is configured to perform the method in the first aspect or any implementation of the first aspect, or the method in the second aspect or any implementation of the second aspect, or the method in the third aspect or any implementation of the third aspect.
The processor in the eighth aspect may be a central processing unit (CPU), or a combination of a CPU and a neural network computing processor. The neural network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an application-specific integrated circuit for AI acceleration, fully customized by Google for machine learning.
According to a ninth aspect, an embodiment of this application provides a computer program product. The computer program product includes computer program code which, when run on a computer, causes the computer to perform the method in any possible implementation of the method design of the first aspect, or the method in any possible implementation of the method design of the second aspect, or the method in any possible implementation of the method design of the third aspect.
According to a tenth aspect, an embodiment of this application provides a computer-readable medium. The computer-readable medium stores program code which, when run on a computer, causes the computer to perform the method in any possible implementation of the method design of the first aspect, or the method in any possible implementation of the method design of the second aspect, or the method in any possible implementation of the method design of the third aspect.
According to an eleventh aspect, a chip is provided. The chip includes a processor and a data interface. The processor reads, through the data interface, instructions stored in a memory and performs the method in any implementation of the first aspect or the second aspect.
Optionally, in an implementation, the chip may further include a memory that stores instructions. The processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to perform the method in any implementation of the first aspect or the second aspect.
Specifically, the chip may be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
Description of the drawings
Figure 1 is a schematic diagram of an artificial intelligence main framework according to an embodiment of this application;
Figure 2A is a schematic architecture diagram of a conventional edge-cloud collaborative task method according to an embodiment of this application;
Figure 2B is a schematic architecture diagram of another conventional edge-cloud collaborative task method according to an embodiment of this application;
Figure 3A is a schematic diagram of a distributed collaborative AI task evaluation architecture according to an embodiment of this application;
Figure 3B is a schematic diagram of another distributed collaborative AI task evaluation architecture according to an embodiment of this application;
Figure 3C is a schematic diagram of yet another distributed collaborative AI task architecture according to an embodiment of this application;
Figure 4 is a flowchart of a distributed collaborative AI task evaluation method according to an embodiment of this application;
Figure 5 is a schematic flowchart of a distributed collaborative AI benchmark test simulation method according to an embodiment of this application;
Figure 6 is a schematic flowchart of a benchmark test for distributed edge-cloud collaborative AI according to an embodiment of this application;
Figure 7 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation system according to an embodiment of this application;
Figure 8 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation management device according to an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
Figure 1 is a schematic diagram of an artificial intelligence main framework according to an embodiment of this application. The main framework describes the overall workflow of an artificial intelligence system and applies to general requirements in the field of artificial intelligence.
The artificial intelligence main framework is described in detail below along two dimensions: the "intelligent information chain" (horizontal axis) and the "information technology (IT) value chain" (vertical axis).
The "intelligent information chain" reflects the series of processes from data acquisition to data processing. For example, it may be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data is refined from "data" to "information" to "knowledge" to "wisdom".
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the technologies for providing and processing it) up to the industrial ecosystem of the system.
(1) Infrastructure:
The infrastructure provides computing power for the artificial intelligence system, enables communication with the external world, and is supported by the basic platform.
The infrastructure can communicate with the outside through sensors, and its computing power can be provided by smart chips.
The smart chips here may be hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
The basic platform of the infrastructure may include platform assurance and support related to distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, and the like.
For example, the infrastructure can obtain data through sensors and external communication, and then provide the data to the smart chips in the distributed computing system provided by the basic platform for computation.
(2) Data:
The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, speech, text, and sequences, as well as Internet of Things data from traditional devices, including business data from existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
It should be understood that a sequence can be understood as a data sequence. The most common kind is time-series data, for example, weather forecast data over a period of time (temperature, wind direction, and so on), stock market data, or sequences of physiological data such as changes in human blood glucose.
(3) Data processing:
Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, and other processing methods.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning refers to the process in which a computer or intelligent system simulates human intelligent reasoning and, following a reasoning control strategy, uses formalized information to carry out machine thinking and solve problems. Its typical functions are search and matching.
Decision-making refers to the process of making decisions after intelligent information has been reasoned over. It usually provides functions such as classification, ranking, and prediction.
(4) General capabilities:
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。After the data is processed as mentioned above, some general capabilities can be formed based on the results of further data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, and image processing. identification, etc.
(5) Intelligent products and industry applications:
Intelligent products and industry applications are the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, turning intelligent information decision-making into products and putting it into practical use. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, smart healthcare, intelligent security, autonomous driving, safe city, and intelligent terminals.
Embodiments of this application can be applied to many fields of artificial intelligence, for example intelligent manufacturing, intelligent transportation, smart home, smart healthcare, intelligent security, autonomous driving, and safe city.
For ease of understanding, the terms, concepts, and technologies that may be involved in the embodiments of this application are introduced first:
1) Edge device
An edge device can be understood as any device, other than a cloud-side device, that has computing resources and network resources. An edge device may be a client device, or a device located between a cloud-side server and a client device. For example, a mobile phone can be an edge device, a sensor can be an edge device, and a gateway can be an edge device between a smart home terminal and a cloud-side server. Ideally, an edge device is a device that analyzes or processes data near the source where the data is generated. Because the data does not have to be transferred, network traffic and processing latency are reduced.
The edge device in the embodiments of this application may be a device with computing capability such as a mobile phone, a tablet personal computer (TPC), a media player, a smart home device, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a camcorder, a smart watch, a wearable device (WD), or a self-driving vehicle. It can be understood that the embodiments of this application do not limit the specific form of the edge device.
2) Edge AI
Edge AI originates from edge computing. In edge computing, edge devices process data at the source where it is generated, which helps reduce the processing load of the overall cloud-edge collaborative system and reduces data latency. Edge AI processes AI algorithms locally on the edge device, so that data from the data source is processed and analyzed without streaming or cloud-side data storage.
3) AI paradigm
An AI paradigm is an AI process that is commonly recognized by industry or academia. Within an AI paradigm, the framework of the AI processing flow stays fixed, while the specific algorithms within that framework can be replaced. Take the edge-cloud collaborative AI paradigm as an example. In the training and inference of an image classification model, if the cloud side trains the image classification model and the edge side uses the trained model to perform image classification inference, then "cloud-side training, edge-side inference" is an edge-cloud collaborative AI paradigm. In this AI paradigm, the specific training method or inference method can be replaced, but the flow "cloud-side training, edge-side inference" does not change.
The "cloud side" may also be called the cloud, and the "edge side" may also be called the edge device side; other names are also possible, and the embodiments of this application do not limit this.
4) Test scenario
A test scenario is a business scenario that meets a specific application of edge devices. A test scenario can be expressed through a business scenario description, dataset settings, data feature settings, data label settings, and related metrics and standards. For example, the test scenario of a test paradigm may be a vehicle re-identification application scenario. The related metric may then be the mean average precision (mAP) of vehicle re-identification, and the related standard may be the pass criterion for that metric; for example, in the vehicle re-identification scenario the pass criterion is that mAP is greater than or equal to 0.95.
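The pass criterion described for a test scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Scenario` class, its field names, and the dataset name are hypothetical; only the metric (mAP) and the 0.95 threshold come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical holder for the elements that describe a test scenario."""
    description: str   # business scenario description
    dataset: str       # dataset setting (illustrative name, not from the source)
    metric: str        # related metric, e.g. "mAP"
    threshold: float   # pass criterion for the metric

    def passes(self, measured: float) -> bool:
        """Return True when the measured metric meets the pass criterion (>=)."""
        return measured >= self.threshold


vehicle_reid = Scenario(
    description="vehicle re-identification",
    dataset="example-reid-dataset",
    metric="mAP",
    threshold=0.95,
)
```

A measured mAP would then be checked with `vehicle_reid.passes(measured_map)`; a value exactly equal to the threshold counts as a pass, matching "greater than or equal to".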
5) Test object
A test object is the target instance of a test. The test object may be an algorithm, a model, a system, a dataset, a scenario, or the like. For example, in a vehicle re-identification test scenario there is a series of vehicle re-identification algorithms. The test object can then be this series of algorithms, and the best algorithm for the vehicle re-identification test scenario is obtained from the series. As another example, the test object may be a series of test scenarios, and for a particular algorithm the best test scenario is obtained from the series.
6) Test environment
The test environment is the configuration or constraints required for edge-cloud collaborative distributed AI testing. The configuration may include the resource configuration and the AI algorithm configuration required to run the test. For example, the resource configuration may be the number of CPU cores, the transmission bandwidth, and so on; the AI algorithm configuration may include the business dataset configuration, the configuration of algorithm accuracy evaluation metrics, the test paradigm of the AI algorithm, and so on.
The test paradigm of an AI algorithm, which may also be called the AI paradigm, is illustrated here with incremental learning as the AI algorithm. The incremental test paradigm includes a training process and an inference process. The training process includes an initialization training module and a model update module, and the inference process includes a hard example identification module.
7) Test case
A test case includes a test object and a test environment. It is an execution instance, under the constraints of the test environment, used to verify whether the test object meets specific performance requirements.
8) Benchmark
The unit of a benchmark is the test case; a benchmark includes a series of test cases. A benchmark is an evaluation method for edge-cloud collaborative distributed AI systems that is commonly recognized by academia or industry.
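The relationship among the terms defined above (a test case pairs one test object with one test environment, and a benchmark is a series of test cases) can be sketched as plain data structures. All class names, field names, and sample values below are assumptions made for illustration; the application does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ObjectUnderTest:
    """Test object: an algorithm, model, system, dataset, or scenario."""
    kind: str
    name: str


@dataclass(frozen=True)
class Environment:
    """Test environment: resource and AI algorithm configuration (hypothetical fields)."""
    cpu_cores: int
    bandwidth_mbps: float
    dataset: str
    metric: str
    paradigm: str


@dataclass(frozen=True)
class Case:
    """Test case: one test object executed under one test environment."""
    target: ObjectUnderTest
    environment: Environment


@dataclass
class Benchmark:
    """Benchmark: a series of test cases."""
    cases: List[Case] = field(default_factory=list)

    def add(self, case: Case) -> None:
        self.cases.append(case)


env = Environment(cpu_cores=4, bandwidth_mbps=100.0,
                  dataset="example-dataset", metric="mAP",
                  paradigm="incremental learning")
benchmark = Benchmark()
# Two candidate algorithms evaluated under the same environment.
for algo in ("algorithm-a", "algorithm-b"):
    benchmark.add(Case(ObjectUnderTest("algorithm", algo), env))
```

Keeping the environment separate from the object means the same environment can be reused across a series of algorithms, which is the situation the test object definition describes.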
9) Container technology
Container technology is an operating-system-level virtualization technology that isolates different processes by means of operating system isolation mechanisms, for example control groups and namespaces on Linux. Container technology differs from hardware virtualization: there is no virtual hardware, and there is no operating system inside a container, only processes. This characteristic makes containers lighter than virtual machines and easier to manage. For the running state of a container, a set of common management operations is defined, such as start, stop, pause, and delete, so that the container life cycle can be managed uniformly. Containers are started on demand at run time: once a created container has completed its task it can be deleted, and it can be created again the next time it is needed.
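The uniform life cycle operations mentioned above (start, stop, pause, delete) can be modeled as a small state machine. The sketch below is an assumption-laden illustration, not a description of any real container runtime: the state names and the transition table are hypothetical, and no actual process isolation takes place.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"
    DELETED = "deleted"


# operation -> {current state: next state}.
# This table is an assumption for the sketch, not taken from the source.
TRANSITIONS = {
    "start": {State.CREATED: State.RUNNING,
              State.STOPPED: State.RUNNING,
              State.PAUSED: State.RUNNING},
    "pause": {State.RUNNING: State.PAUSED},
    "stop": {State.RUNNING: State.STOPPED, State.PAUSED: State.STOPPED},
    "delete": {State.CREATED: State.DELETED, State.STOPPED: State.DELETED},
}


class Container:
    """In-memory stand-in for a container; tracks only its life cycle state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.CREATED

    def apply(self, operation: str) -> State:
        allowed = TRANSITIONS.get(operation, {})
        if self.state not in allowed:
            raise ValueError(f"cannot {operation} a {self.state.value} container")
        self.state = allowed[self.state]
        return self.state


def run_once(name: str) -> State:
    """Create a container, run its task, then delete it (on-demand use)."""
    container = Container(name)
    container.apply("start")
    # ... the evaluation task would run here ...
    container.apply("stop")
    return container.apply("delete")
```

`run_once` mirrors the on-demand pattern in the text: the container exists only for the duration of its task and is created afresh for the next use.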
Figure 2A is a schematic architecture diagram of a conventional edge-cloud collaborative task method provided by an embodiment of this application, and Figure 2B is a schematic architecture diagram of another conventional edge-cloud collaborative task method provided by an embodiment of this application. Each is described in detail below with reference to Figure 2A and Figure 2B.
At present there are many edge-cloud collaborative distributed AI task evaluation methods, particularly for benchmarking. In the benchmark task shown in Figure 2A, the algorithm developer deploys the test paradigm of cloud-side training and edge-side inference on the cloud side and the edge side respectively. The cloud nodes and edge nodes may be specific physical nodes. In the edge-cloud collaborative distributed cooperation, the cloud node performs training and sends the trained model to the edge node; the edge node performs inference, using the trained model to complete the specific inference task. As an edge device near the data source, the edge node can send newly collected data to the cloud node so that the model can be trained further.
In the benchmark task shown in Figure 2B, for example a test framework for federated learning, the algorithm developer deploys the test paradigm of edge-side training followed by cloud-side aggregation and inference on the edge side and the cloud side respectively. That is, the distributed cooperation task of the edge nodes is training: the data generated by the data source is used for training on the edge nodes, and each edge node then sends its trained model to the cloud node. The cloud node aggregates the models obtained from the different edge nodes into an overall model and uses that overall model on the cloud node to complete the specific inference task.
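The cloud-side aggregation step can be illustrated with a short sketch. The text does not say how the models are combined, so the element-wise unweighted mean used here is an assumption chosen for simplicity; the function name `aggregate` and the representation of a model as a flat list of floats are likewise hypothetical.

```python
from typing import List, Sequence


def aggregate(edge_models: Sequence[Sequence[float]]) -> List[float]:
    """Combine edge-trained parameter vectors by element-wise unweighted mean.

    The averaging rule is an assumption of this sketch; the source only
    states that the cloud node aggregates the edge models.
    """
    if not edge_models:
        raise ValueError("at least one edge model is required")
    size = len(edge_models[0])
    if any(len(model) != size for model in edge_models):
        raise ValueError("all edge models must have the same number of parameters")
    count = len(edge_models)
    return [sum(values) / count for values in zip(*edge_models)]


# Two edge nodes, each reporting a two-parameter model.
edge_models = [[1.0, 2.0], [3.0, 4.0]]
global_model = aggregate(edge_models)
```

Only model parameters travel from the edge nodes to the cloud node in this pattern; the training data stays on the edge nodes.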
Both the edge-cloud collaborative distributed AI benchmark architecture shown in Figure 2A and the one shown in Figure 2B have the following problems.
First, once a test paradigm has been determined and the service developer has deployed it, the algorithm developer can only test the test cases under that test paradigm. For example, once the test paradigm of cloud-side training and edge-side inference in Figure 2A has been determined, it cannot easily be changed; changing it requires redeploying a new test paradigm. In particular, the test paradigm under the federated learning test framework in Figure 2B applies only when the data provider and the algorithm provider are the same party. If they are different parties, the algorithm provider cannot obtain the training data and the data provider cannot obtain the algorithm, so the test paradigm of Figure 2B no longer applies. The case where the algorithm provider and the data provider are the same is an ideal one; in practice the two are more often different.
Second, the algorithm developer has to deploy the test cases of the test paradigm, and the related images, over and over according to the actual edge-side environment. If the test object is an algorithm, a single node is enough to test it. If the test object is a system, however, a multi-node or multi-device architecture has to be built to test the test cases correctly, for example by creating multiple nodes on the cloud side or by building an architecture of multiple Raspberry Pi devices on the edge side. Currently, this way of testing is costly in both labor and materials.
In addition, current test scenarios are rather limited and support only a limited set of tasks in medical scenarios or road-condition classification scenarios, such as image classification, object detection, and speech recognition. The test cases are therefore also limited; that is, the test environments and test objects in the test cases all belong to a limited set of test scenarios. For example, typical edge-cloud collaborative distributed AI application scenarios such as industrial quality inspection and vehicle re-identification are not supported.
To solve the above problems, the embodiments of this application propose a distributed collaborative AI task evaluation method and apparatus. The distributed collaborative AI task evaluation method is described in detail below with reference to the accompanying drawings. The description mainly uses edge-cloud collaborative distributed AI tasks as an example. It should be understood that the distributed collaborative architecture in the embodiments of this application is not limited to the edge-cloud collaborative distributed architecture and may also be another distributed architecture; this is not limited.
First, the edge-cloud collaborative distributed AI task evaluation architecture in the embodiments of this application is described with reference to Figures 3A to 3C.
The type of edge-cloud collaborative AI task evaluation is an AI task that involves standardized evaluation. For example, it may be an edge-cloud collaborative AI benchmark task, AI application certification of edge-cloud collaborative cloud services and products, or an edge-cloud collaborative AI competition rating task; the embodiments of this application do not limit this. The detailed description below mainly uses the edge-cloud collaborative AI benchmark task as an example.
The edge-cloud collaborative distributed AI task evaluation architecture may include a cloud side and an edge side. The cloud side includes cloud nodes and the edge side includes edge nodes. The cloud nodes and edge nodes may be specific physical nodes; that is, a cloud node may be a device such as a server on the cloud side, and an edge node may be a device other than the specific cloud-side devices, for example an edge device or a client device. At a finer granularity, the architecture may also include a cloud side, an edge side, and a client side, that is, a device-edge-cloud architecture. In that case the specific device on the edge side is a device located between the cloud side and the client; for example, the edge-side device may be a gateway between a smart home client and a cloud-side server. The embodiments of this application do not limit the specific physical form of the edge nodes and cloud nodes. A cloud node may also be a virtual cloud node.
It should be understood that Figures 3A and 3C of the embodiments of this application use the edge-cloud architecture as an example. The distributed collaborative AI task evaluation method and apparatus in the embodiments of this application apply equally to the device-edge-cloud architecture.
In addition, the embodiments of this application do not limit the specific numbers of cloud nodes and edge nodes. The numbers of cloud nodes and edge nodes in Figure 3A and Figure 3B are only examples.
Figure 3A is a schematic diagram of a distributed collaborative AI task evaluation architecture provided by an embodiment of this application.
As shown in Figure 3A, the cloud nodes on the cloud side include a control cloud node and a task cloud node. A management module, which may also be called a management apparatus, is deployed in the control cloud node and is used to manage the life cycle of containers globally. The management module can manage the global container life cycle in two ways. In the first way, the task management apparatus creates the local control apparatuses and the task evaluation containers on all nodes, where a local control apparatus deployed on another node is itself a kind of container. In the second way, the task management apparatus creates a first local control apparatus and a second local control apparatus, where the first local control apparatus is deployed on the task cloud node and the second local control apparatus is deployed on the task edge node; each of the two local control apparatuses then manages the task containers of its local node. The management apparatus includes a container management module, which implements functions such as adding, deleting, modifying, and querying containers globally. The management apparatus also includes a communication module.
The management apparatus also includes a task case management module, which may also be called a global task case management module. It provides basic functions such as adding, deleting, modifying, and querying task cases. For example, when the AI task evaluation is an AI benchmark, the global task case management module may be a global test case management module, which provides basic functions such as adding, deleting, modifying, and querying test environments and/or test objects.
A first local control apparatus is deployed in the task cloud node. It manages the life cycle of the first task evaluation container and communicates with external task evaluation containers. An external task container belongs to a node other than the task node of the local task evaluation container, that is, a node other than the task cloud node of the first task evaluation container; an example is the second task evaluation container shown in Figure 3A. The specific workflows (pipelines) in the first task evaluation container are created when the first task evaluation container is first created. For example, the first task evaluation container may include a local simulation workflow and a first distributed workflow. The first local control apparatus also includes a first container management module and a first task case management module. The first container management module implements functions such as adding, deleting, modifying, and querying containers on the local node. The first task case management module manages task cases, for example providing basic functions such as adding, deleting, modifying, and querying the test objects and/or test environments in test cases. The first local control apparatus also includes a first communication module.
The second local control apparatus deployed on the task edge node on the edge side is similar to the first local control apparatus deployed on the task cloud node. For brevity, it is not described again here.
In the embodiments of this application, when an edge-cloud collaborative task is performed with the architecture shown in Figure 3A, the management apparatus used for global management is deployed on a separate control cloud node.
Alternatively, the management apparatus may be deployed in the same cloud node as the first local control apparatus and the first task evaluation container. Figure 3B is a schematic diagram of another distributed collaborative AI task evaluation architecture provided by an embodiment of this application.
The architecture shown in Figure 3B differs from Figure 3A in that the management apparatus is not deployed separately on a control cloud node. As the high-level management module, the management apparatus is placed on the same cloud node together with the first local control apparatus and the first task evaluation container. The functions of the structural modules in Figure 3B are the same as those in Figure 3A and, for brevity, are not described again here.
When multi-node distributed collaborative AI task evaluation is simulated on a single node, the simulation can run on just an edge node or just a cloud node, so that multi-node distributed collaborative AI task evaluation is simulated on a single node; an example is a benchmark simulation of multi-node distributed collaborative AI on a single node. Figure 3C is a schematic diagram of yet another distributed collaborative AI task architecture provided by an embodiment of this application.
Figure 3C shows the architecture of a distributed collaborative AI task evaluation simulation in which multi-node task evaluation is simulated on a single node. The single-node management apparatus manages the life cycle of all containers on the single node and includes a task case management module and a container management module. For example, in a distributed collaborative AI benchmark, the test case management module provides basic functions such as adding, deleting, modifying, or querying test environments and test objects. The container management module provides basic functions such as adding, deleting, modifying, or querying the third AI task evaluation container. The test container includes a local simulation workflow.
Of the three distributed task architectures above, the architectures shown in Figures 3A and 3B can be used for distributed collaborative AI task evaluation on multiple edge-cloud collaborative nodes, for example a distributed collaborative AI benchmark, while the architecture shown in Figure 3C can be used to simulate multi-node distributed collaborative AI task evaluation on a single node, for example a benchmark simulation of multi-node distributed collaborative AI on a single node.
The distributed collaborative AI task evaluation method is described in detail below with reference to Figures 4 to 6.
Figure 4 is a flowchart of a distributed collaborative AI task evaluation method provided by an embodiment of this application.
S401: Obtain a task configuration of the distributed collaborative AI and a task object of the distributed collaborative AI, where the AI task configuration includes the configuration of the task environment of the distributed collaborative AI.
In one possible implementation, the configuration file corresponding to the task configuration is read, and the file corresponding to the task object is read.
It should be understood that both the task configuration and the task object of the distributed collaborative AI are adjustable. The adjustment may be a basic operation such as adding, deleting, modifying, or querying; the embodiments of this application do not limit the manner of adjustment.
S402: Receive a first management instruction according to the configuration of the task environment and the task object, where the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object.
It should be understood that the first management instruction may be a management instruction for the task environment, a management instruction for the task object, or a management instruction for both the task environment and the task object.
S403: Manage a task case of the distributed collaborative AI according to the first management instruction, where the task case includes the AI task environment and the AI task object.
In one possible implementation, the first management instruction may be a basic management instruction such as adding, deleting, modifying, or querying an AI task case; the embodiments of this application do not limit this.
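Steps S402 and S403 can be sketched as a small in-memory manager that applies add, delete, modify, and query instructions to task cases. Everything concrete in the sketch is an assumption: the `TaskCaseManager` class, the dictionary layout of an instruction (`action`, `case_id`, `environment`, `object`), and the sample values are hypothetical and are not defined by the application.

```python
from typing import Dict, Optional


class TaskCaseManager:
    """Keeps task cases (task environment plus task object) keyed by a case id."""

    def __init__(self) -> None:
        self._cases: Dict[str, dict] = {}

    def handle(self, instruction: dict) -> Optional[dict]:
        """Apply one management instruction and return the affected case, if any."""
        action = instruction["action"]
        case_id = instruction["case_id"]
        if action == "add":
            if case_id in self._cases:
                raise KeyError(f"task case {case_id!r} already exists")
            self._cases[case_id] = {
                "environment": dict(instruction.get("environment", {})),
                "object": instruction.get("object"),
            }
        elif action == "modify":
            case = self._cases[case_id]
            # An instruction may target the environment, the object, or both.
            if "environment" in instruction:
                case["environment"].update(instruction["environment"])
            if "object" in instruction:
                case["object"] = instruction["object"]
        elif action == "delete":
            return self._cases.pop(case_id)
        elif action != "query":
            raise ValueError(f"unknown action: {action}")
        return self._cases.get(case_id)


manager = TaskCaseManager()
manager.handle({"action": "add", "case_id": "case-1",
                "environment": {"paradigm": "incremental learning"},
                "object": "algorithm-a"})
# Swap the task object without touching the task environment.
manager.handle({"action": "modify", "case_id": "case-1", "object": "algorithm-b"})
current = manager.handle({"action": "query", "case_id": "case-1"})
```

Because the environment and the object are modified independently, the same task case can be re-run under a new algorithm or a new environment without being redefined from scratch.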
In the embodiments of this application, managing the task environment and the task object makes it possible to flexibly manage the task cases corresponding to a distributed collaborative AI paradigm. Service developers can manage task cases according to the business requirements of the actual scenario, which makes it easier to process different task cases and to obtain the execution results of different AI task cases, and in turn makes the edge-cloud collaborative distributed AI task architecture easier to deploy in practice.
As described above, the type of distributed collaborative AI task evaluation is an AI task that involves standardized evaluation. For example, it may be a distributed edge-cloud collaborative AI benchmark, application certification of cloud services and products for distributed edge-cloud collaborative AI, or a distributed edge-cloud collaborative AI competition rating task. Taking the distributed edge-cloud collaborative AI benchmark as the example type, the distributed collaborative AI task evaluation method is described in detail below for two kinds of benchmark. The first is a multi-node benchmark simulation on a single node, and the second is a distributed edge-cloud collaborative benchmark on multiple nodes.
In this case, the task environment is the test environment, the task object is the test object, the task case is the test case, the task configuration is the test environment configuration, the task environment configuration is the test environment configuration, the task paradigm is the test paradigm (which may also be called the AI paradigm), the task evaluation mode is the benchmark mode, and the task case management module is the test case management module.
The first kind is a multi-node benchmark simulation on a single node; the architecture of this benchmark simulation is shown in Figure 3C. It is described in detail below with reference to Figure 5.
图5是本申请实施例提供的一种分布式协同AI基准测试仿真方法流程示意图。其中,以测试场景为工业质检中的安全帽目标检测为例,测试环境中的测试范式为边云协同增量学习(incremental learning,IL)。Figure 5 is a schematic flowchart of a distributed collaborative AI benchmark test simulation method provided by an embodiment of the present application. Among them, the test scenario is safety helmet target detection in industrial quality inspection as an example, and the test paradigm in the test environment is edge-cloud collaborative incremental learning (IL).
在管理装置开始使用之前,根据基准测试模式,启动管理装置,并初始化管理装置中的各个模块。Before the management device starts to be used, the management device is started according to the benchmark test mode, and each module in the management device is initialized.
Specifically, the management device is started according to the single-node benchmark test simulation mode, and the test case management module, container management module, communication module and result display module in the management device are initialized.
S510: The communication module of the management device obtains the test configuration file. The test configuration file includes a test environment configuration module, a test paradigm configuration module, an algorithm basic configuration module, an algorithm hyperparameter configuration module, a container management configuration module, and a container invocation configuration. Each configuration module in the test configuration file may or may not exist as a separate configuration module. For example, the test paradigm configuration module may be a separate configuration module or may be included in the test environment configuration module. The algorithm hyperparameter configuration may or may not exist as a sub-configuration file.
As a possible implementation, the test case management module reads the configuration parameters in the test environment configuration module, the test paradigm configuration module, the algorithm basic configuration module or the algorithm hyperparameter configuration module. By way of example, the configuration parameters of the test environment configuration module are shown in Table 1.
Table 1. Parameter description of the test environment configuration module

It should be understood that the test environment parameter configuration in Table 1 is only an example. Other test environment parameter configurations are possible, and the configuration can be modified according to the test scenario and the needs of the service developer. The embodiments of the present application do not limit this.
It should also be understood that when the simulation switch is enabled, the simulation mode of each task container can be any one of the modes in Table 1. Across all task containers on the single node, the simulation modes can all be the same one of the four modes, or any combination of the four. That is, all task work nodes can be simulated in the same simulation mode or in different simulation modes. For example, if the 10 task containers created on a single node are all used for system performance test simulation, a system pseudo-container is created inside each of the 10 task containers. A system pseudo-container does not run directly on data; instead, it holds calculation formulas for system performance quantities such as bandwidth consumption or energy consumption, and these formulas are used for system performance test simulation. As another example, if 5 of the 10 task containers created on a single node are used for system performance test simulation and the other 5 for algorithm performance test simulation, a system pseudo-container is created inside each of the 5 containers used for system performance test simulation, and an algorithm pseudo-container is created inside each of the 5 containers used for algorithm performance test simulation.
By way of example, the parameters of the test paradigm configuration module are described in Table 2.
Table 2. Parameter description of the test paradigm configuration module

It should be understood that the test paradigm configuration parameters may exist as a separate module or as part of the parameters in the test environment configuration module, and the embodiments of the present application do not limit this. In the embodiments of the present application, the configuration parameters of the test paradigm are described as part of the test environment configuration parameters; the test environment module parameters and the test paradigm module parameters are shown separately here.
By way of example, the parameters of the algorithm basic configuration module are described in Table 3.
Table 3. Parameter description of the algorithm basic configuration module
The AI algorithm in the algorithm basic configuration module can be any AI algorithm.
By way of example, the hyperparameters can either be enumerated directly in the algorithm hyperparameter configuration module or be provided in a hyperparameter configuration file. In the former case, if the algorithm hyperparameter is the learning rate, all learning rates are enumerated in a list, and the hyperparameter name of the learning rate is added to multi-parameters in the algorithm basic configuration module. In the latter case, hyperparameter_file in the algorithm basic configuration module is not empty; when the algorithm hyperparameter configuration is obtained, the algorithm hyperparameter configuration module is not scanned, and the algorithm hyperparameter configuration file is read directly.
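The two ways of supplying hyperparameters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present application: the dictionary layout, the function name `resolve_hyperparameters`, the use of JSON for the hyperparameter configuration file, and the Cartesian product over several enumerated hyperparameters are all assumptions. Only the keys `hyperparameter_file` and `multi-parameters` come from the description.

```python
import itertools
import json
from pathlib import Path


def resolve_hyperparameters(basic_cfg, hyper_cfg):
    """Return a list of hyperparameter combinations, one per candidate test case.

    basic_cfg and hyper_cfg are plain dicts standing in for the algorithm basic
    configuration module and the algorithm hyperparameter configuration module.
    """
    file_path = basic_cfg.get("hyperparameter_file")
    if file_path:
        # File form: the hyperparameter module is not scanned, the file is read directly.
        # JSON is an assumed format; the description does not specify one.
        return json.loads(Path(file_path).read_text(encoding="utf-8"))

    # Enumeration form: each name listed in multi-parameters has a list of values.
    names = basic_cfg.get("multi-parameters", [])
    value_lists = [hyper_cfg[name] for name in names]
    # Assumption: several enumerated hyperparameters are combined as a Cartesian product.
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


basic = {"hyperparameter_file": "", "multi-parameters": ["learning_rate"]}
hyper = {"learning_rate": [0.1, 0.01, 0.001]}
combos = resolve_hyperparameters(basic, hyper)
print(combos)
```

Each returned dictionary could then be paired with a test algorithm to form one test case.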
The container management configuration includes the number of containers and the size of each container. The container management configuration is a local container management configuration and/or a global container management configuration. For example, when the container management configuration is a local container management configuration, it includes the number of local containers and the size of each local container.
The container invocation configuration includes the calling order of the modules corresponding to the test paradigm and the hyperparameters of those modules.
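The container management configuration and the container invocation configuration described above can be pictured as simple data structures. The sketch below is illustrative only: the class names, field names, the `"2c4g"` size string and the module names are assumptions made for this example and are not defined by the present application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ContainerManagementConfig:
    count: int  # number of containers
    size: str   # size of each container; the "2c4g" style string is an assumed format


@dataclass
class ContainerInvocationConfig:
    call_order: List[str]  # paradigm modules in the order they are called
    module_hyperparameters: Dict[str, dict] = field(default_factory=dict)

    def validate(self) -> None:
        # Hyperparameters may only be given for modules that are actually called.
        unknown = set(self.module_hyperparameters) - set(self.call_order)
        if unknown:
            raise ValueError(f"hyperparameters given for unknown modules: {sorted(unknown)}")


@dataclass
class BenchmarkConfig:
    local_containers: Optional[ContainerManagementConfig]
    global_containers: Optional[ContainerManagementConfig]
    invocation: ContainerInvocationConfig

    def validate(self) -> None:
        # The description allows a local and/or a global container management configuration.
        if self.local_containers is None and self.global_containers is None:
            raise ValueError("a local and/or global container management configuration is required")
        self.invocation.validate()


config = BenchmarkConfig(
    local_containers=ContainerManagementConfig(count=10, size="2c4g"),
    global_containers=None,
    invocation=ContainerInvocationConfig(
        call_order=["initial_training", "hard_example_identification", "model_update"],
        module_hyperparameters={"initial_training": {"learning_rate": 0.01}},
    ),
)
config.validate()
```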
S520: The communication module of the management device obtains the test object.
As a possible implementation, when the test object is a test algorithm, that is, an algorithm corresponding to the test paradigm, the test case management module obtains the test algorithm by reading a custom algorithm file.
When the test object is an algorithm, different custom algorithm files can be used as the test object, so that different algorithms are obtained.
For example, the incremental learning paradigm includes an initialization training module, a hard example identification module and a model update module, each of which can be implemented by different algorithms, so the custom algorithm files can be the algorithms corresponding to these three modules. Different test algorithms can be obtained by reading different custom algorithm files.
It should be understood that the custom algorithm file is located in the algorithm directory of the edge-cloud collaborative AI benchmark test platform library. When multi-node simulation is performed on a single node, the edge-cloud collaborative AI benchmark test platform library needs to be deployed on that node; the specific deployment method may be downloading the library.
It should also be understood that the custom algorithm file is a standardized template file. For example, the custom algorithm file can conform to the algorithm interface specification of Sedna lib.
S530: The communication module of the management device obtains the data set.
As a possible implementation, the test case management module can obtain the data set by reading data files in txt format and in csv format.
It should be understood that the data source of the test data can be an open-source data set on Kaggle together with the data preprocessing algorithm corresponding to that data set, or data collected by a data provider and obtained through an edge node.
Taking safety helmet target detection as an example, when the test case management module reads a data file in txt format, each line of the txt file records the index information and label information of one piece of unstructured data. The index information is represented by the absolute path of the image and the coordinates of the target box, and the label information can be represented by the number 1 or 0.
When the test case management module reads a data file in csv format, the file represents structured data by its feature attributes and by the specific values of those feature attributes for each record.
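Reading the two file formats described above might look like the following sketch. The exact column layout of the txt file is not given in the description, so the layout used here (image path, four box coordinates, then the label, separated by whitespace) is an assumption, as are the function names and the sample values.

```python
import csv
import io


def parse_txt_line(line):
    """Parse one txt line: <absolute image path> <x1> <y1> <x2> <y2> <label>.

    Splitting from the right keeps image paths that contain spaces intact.
    """
    path, x1, y1, x2, y2, label = line.strip().rsplit(None, 5)
    return {
        "image": path,
        "box": (int(x1), int(y1), int(x2), int(y2)),
        "label": int(label),  # 1 or 0, as in the description
    }


def parse_csv(text):
    """Parse csv text whose header row names the feature attributes."""
    return list(csv.DictReader(io.StringIO(text)))


record = parse_txt_line("/data/helmet/img 001.jpg 10 20 110 220 1\n")
rows = parse_csv("temperature,pressure\n20.5,101\n")
print(record)
print(rows)
```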
It should be understood that the embodiments of the present application do not limit the order in which S520 to S540 are performed; the order may be S520, S530 and then S540, or another order.
S540: The communication module of the management device receives a first management instruction according to the test configuration and the test object, where the first management instruction includes a test environment management instruction and/or a test object management instruction.
It should be understood that the test configuration includes the test environment configuration.
S550: The test case management module of the management device manages test cases according to the first management instruction. A test case includes a test environment and a test object.
As a possible implementation, test cases can be managed by adding, deleting, modifying or querying test cases.
Optionally, the test case management module obtains management status information of the test cases, which includes the test cases and their status information. For example, if the first management instruction changes the test cases (adding, deleting or modifying them), the status information is the information of the test cases after the change. As another example, if the first management instruction does not change the test cases (for example, a query), the status information is the current status information of the test cases.
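The four management operations and the returned status information can be sketched as below. The function name `manage_test_cases`, the action strings, the dictionary layout and the sample identifiers are hypothetical and chosen only for illustration.

```python
def manage_test_cases(cases, action, case_id, payload=None):
    """Apply one management instruction and return the resulting status information.

    cases maps a test case id to a dict describing its test environment and test object.
    """
    if action == "add":
        if case_id in cases:
            raise KeyError(f"test case {case_id!r} already exists")
        cases[case_id] = dict(payload or {})
    elif action == "modify":
        cases[case_id].update(payload or {})
    elif action == "delete":
        del cases[case_id]
    elif action != "query":
        raise ValueError(f"unknown action: {action!r}")
    # After a change this is the post-change state; for a query it is the current state.
    state = dict(cases[case_id]) if case_id in cases else None
    return {"case_id": case_id, "action": action, "state": state}


cases = {}
added = manage_test_cases(cases, "add", "helmet-il", {"paradigm": "incremental_learning", "algorithm": "algo_a"})
modified = manage_test_cases(cases, "modify", "helmet-il", {"algorithm": "algo_b"})
queried = manage_test_cases(cases, "query", "helmet-il")
deleted = manage_test_cases(cases, "delete", "helmet-il")
```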
S560: When the simulation switch parameter in the test environment is enabled, the container management module of the management device manages a third task evaluation container according to the test case and the local container management configuration, where the third task evaluation container includes a local simulation workflow. Managing the third task evaluation container may be adding, deleting, modifying or querying the third task evaluation container.
Specifically, in a benchmark test, managing the third task evaluation container may be adding, deleting, modifying or querying a benchmark test container.
It should be understood that the number of benchmark test containers is not limited here and can be determined from the container number parameter in the local container management configuration.
By way of example, the test paradigm in the test case is the incremental learning paradigm, which includes an initialization training module, a hard example identification module and a model update module. The container management module can create a training container for the initialization training module and the model update module, and an inference container for the hard example identification module.
Optionally, when managing the test case means adding a test case, the container management module of the management device adds a third benchmark test container according to the test case and the local container management configuration, where the third benchmark test container includes a local simulation workflow. The containers inside the third benchmark test container are then determined according to the simulation mode parameter.
The simulation mode of the third benchmark test container can be any one of the four simulation modes in Table 1.
As a possible implementation, when the simulation mode parameter is simulation testing of algorithm performance, the container management module of the management device creates an algorithm pseudo-container in the third task evaluation container, for example in the third benchmark test container.
As a possible implementation, when the simulation mode parameter is simulation testing of system performance, the container management module of the management device creates a system pseudo-container in the third task evaluation container, for example in the third benchmark test container.
As a possible implementation, when the simulation mode parameter is simulation testing of both system performance and algorithm performance, the container management module of the management device creates a real container in the third task evaluation container, for example in the third benchmark test container.
As a possible implementation, when the simulation mode parameter is simulation testing of system unit performance, the container management module of the management device reuses the third task evaluation container, for example the third benchmark test container that has already been created.
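The correspondence between the four simulation modes and what is created inside the task evaluation container can be summarized in a small lookup. Because Table 1 is not reproduced here, the enum member names and string values below are assumed labels for the four modes, not the parameter values of the present application.

```python
from enum import Enum


class SimulationMode(Enum):
    ALGORITHM = "algorithm_performance"
    SYSTEM = "system_performance"
    SYSTEM_AND_ALGORITHM = "system_and_algorithm_performance"
    SYSTEM_UNIT = "system_unit_performance"


# What is created inside the task evaluation container for each mode.
# None means the task evaluation container itself is reused.
INNER_CONTAINER = {
    SimulationMode.ALGORITHM: "algorithm_pseudo_container",
    SimulationMode.SYSTEM: "system_pseudo_container",
    SimulationMode.SYSTEM_AND_ALGORITHM: "real_container",
    SimulationMode.SYSTEM_UNIT: None,
}


def plan_task_containers(modes):
    """Return one plan entry per task container, given its simulation mode."""
    return [
        {"task_container": index, "mode": mode.value, "inner": INNER_CONTAINER[mode]}
        for index, mode in enumerate(modes)
    ]


# Ten task containers: five for system performance and five for algorithm performance.
plan = plan_task_containers([SimulationMode.SYSTEM] * 5 + [SimulationMode.ALGORITHM] * 5)
```

Keeping the mapping in one table makes it easy to mix modes across the task containers of a single node.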
Optionally, when managing the test case means adding a test case, the container management module of the management device adds a third benchmark test container according to the test case, the simulation mode parameter and the local container management configuration, where the third benchmark test container includes a local simulation workflow.
The simulation mode of the third benchmark test container can be any one of the four simulation modes in Table 1.
As a possible implementation, when the simulation mode parameter is simulation testing of algorithm performance, the third task evaluation container created by the container management module of the management device includes an algorithm pseudo-container. For example, the third benchmark test container created by the container management module includes an algorithm pseudo-container.
As a possible implementation, when the simulation mode parameter is simulation testing of system performance, the third task evaluation container created by the container management module of the management device includes a system pseudo-container. For example, the third benchmark test container created by the container management module includes a system pseudo-container.
As a possible implementation, when the simulation mode parameter is simulation testing of both system performance and algorithm performance, the third task evaluation container created by the container management module of the management device includes a real container. For example, the third benchmark test container created by the container management module includes a real container.
As a possible implementation, when the simulation mode parameter is simulation testing of system unit performance, the container management module of the management device creates the third task evaluation container, for example the third benchmark test container.
S570: The local simulation workflow of the third task evaluation container simulates the test case according to the local container invocation configuration and the simulation mode corresponding to the third task evaluation container, and obtains a simulation result. The test configuration includes the local container invocation configuration, which includes the calling order of the algorithm modules corresponding to the test paradigm and the hyperparameters of those algorithm modules.
By way of example, according to the invocation configuration of the third benchmark test container corresponding to the test paradigm, the local simulation workflow of the third benchmark test container calls, in order, the algorithm modules related to the test paradigm and their hyperparameters from the test case management module, simulates the test case, and obtains the test result.
By way of example, if the calling order of the modules of the incremental learning paradigm is the initialization training module, the hard example identification module and then the model update module, the local simulation workflow in the training container first calls the initialization training module, the local simulation workflow in the inference container then calls the hard example identification module, and finally the local simulation workflow in the training container calls the model update module. The container management module starts the training container and the inference container on demand.
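The calling sequence for the incremental learning paradigm can be sketched as a small workflow driver. This is an illustration under assumptions: the function `run_workflow`, the module functions, the state dictionary and the 0.5 confidence threshold are all hypothetical, and the "containers" are represented only by their names.

```python
def run_workflow(call_order, module_to_container, modules, state):
    """Call the paradigm modules in the configured order.

    Returns the final state, the containers in the order they were started,
    and a trace of (container, module) pairs.
    """
    started = []
    trace = []
    for name in call_order:
        container = module_to_container[name]
        if container not in started:
            started.append(container)  # containers are started on demand
        trace.append((container, name))
        state = modules[name](state)
    return state, started, trace


def initial_training(state):
    return {**state, "model_version": 1}


def hard_example_identification(state):
    # Placeholder rule: samples with low confidence are treated as hard examples.
    hard = [sample for sample, confidence in state["predictions"].items() if confidence < 0.5]
    return {**state, "hard_examples": hard}


def model_update(state):
    return {**state, "model_version": state["model_version"] + 1}


call_order = ["initial_training", "hard_example_identification", "model_update"]
module_to_container = {
    "initial_training": "training",
    "hard_example_identification": "inference",
    "model_update": "training",
}
modules = {
    "initial_training": initial_training,
    "hard_example_identification": hard_example_identification,
    "model_update": model_update,
}
final_state, started, trace = run_workflow(
    call_order, module_to_container, modules, {"predictions": {"img_1": 0.9, "img_2": 0.3}}
)
```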
As a possible implementation, when the simulation mode parameter is simulation testing of algorithm performance, the data set obtained in S530 is divided into multiple parts according to the settings in the test environment configuration parameters, and the test case is simulated in the algorithm pseudo-container.
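Dividing the data set into multiple parts could, for instance, be done as in the sketch below. The description does not say how the parts are formed, so the contiguous, near-equal split and the function name `split_dataset` are assumptions.

```python
def split_dataset(samples, parts):
    """Split samples into `parts` contiguous chunks whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(samples), parts)
    chunks, start = [], 0
    for index in range(parts):
        end = start + base + (1 if index < extra else 0)
        chunks.append(samples[start:end])
        start = end
    return chunks


chunks = split_dataset(list(range(10)), 3)
print(chunks)
```

Each chunk could then be handed to one algorithm pseudo-container to stand in for the data seen by one simulated node.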
As a possible implementation, when the simulation mode parameter is simulation testing of system performance, the system pseudo-container does not actually run on the data in the data set; instead, the system performance test simulation result is obtained in the system pseudo-container through system performance calculation formulas.
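A system pseudo-container of this kind can be pictured as an object that evaluates formulas instead of processing data. The formulas used below (transfer time as payload size divided by bandwidth, energy as power multiplied by time) are generic stand-ins chosen for this sketch; the present application does not specify its formulas, and the class name, field names and input values are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemPseudoContainer:
    bandwidth_mbps: float  # assumed link bandwidth in megabits per second
    power_watts: float     # assumed average power draw of the simulated node

    def simulate(self, payload_megabytes, compute_seconds):
        """Estimate system metrics from formulas without touching any data."""
        transfer_seconds = payload_megabytes * 8 / self.bandwidth_mbps
        energy_joules = self.power_watts * (transfer_seconds + compute_seconds)
        return {
            "transfer_seconds": transfer_seconds,
            "energy_joules": energy_joules,
        }


pseudo = SystemPseudoContainer(bandwidth_mbps=100.0, power_watts=5.0)
estimate = pseudo.simulate(payload_megabytes=25.0, compute_seconds=3.0)
print(estimate)
```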
As a possible implementation, when the simulation mode parameter is simulation testing of both system performance and algorithm performance, the test case is simulated in the real container that has been created.
As a possible implementation, when the simulation mode parameter is simulation testing of system unit performance, the local simulation workflow of the benchmark test container calls, in order, the modules related to the test paradigm and their hyperparameters, and the test case is simulated in the benchmark test container.
The simulation result can include, for each test case, parameters such as the trained model, the inference results and the evaluation metrics.
S580: The management device receives the simulation result of at least one task case from the third task evaluation container, and the result display module of the management device displays the simulation result of the at least one task case.
By way of example, the management device receives the simulation result of at least one test case from the third benchmark test container, and the result display module of the management device displays the simulation result of the at least one test case.
As a possible implementation, the execution information of the test cases in the simulation result can be displayed through an online console or a user interaction interface, or the simulation result can be saved directly in an offline file.
It should be noted that the simulation result of the at least one test case reflects the simulated execution of different test cases and can be used to determine the target test object. For example, when a target test algorithm is to be determined for a specific test scenario, the simulation result can be the execution of different test algorithms; in image classification, the evaluation metric in the simulation result can be the image classification accuracy of the different test algorithms.
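Using the simulation results to pick a target test algorithm amounts to ranking the test cases by an evaluation metric. The sketch below assumes a simple result layout and a function name `rank_test_cases`; the accuracy values are made-up placeholders for illustration, not measured results.

```python
def rank_test_cases(results, metric, higher_is_better=True):
    """Sort simulation results by one evaluation metric, best first."""
    return sorted(results, key=lambda result: result["metrics"][metric], reverse=higher_is_better)


# Placeholder values only; they do not come from any real benchmark run.
results = [
    {"test_case": "algorithm_a", "metrics": {"accuracy": 0.91}},
    {"test_case": "algorithm_b", "metrics": {"accuracy": 0.94}},
    {"test_case": "algorithm_c", "metrics": {"accuracy": 0.88}},
]
ranking = rank_test_cases(results, "accuracy")
target = ranking[0]["test_case"]
print(target)
```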
In the embodiments of the present application, displaying the simulation results lets the user compare the simulation results of the test objects directly and obtain objective test results, which helps the user select a suitable target object.
It should be noted that in the multi-node AI benchmark test simulation on a single node shown in Figure 5, both the test cases and the local containers are managed by adding. Based on the architecture for multi-node AI task evaluation on a single node in the embodiments of the present application, other management operations, such as deleting, modifying and querying, can also be performed. These are not described in detail here.
In the embodiments of the present application, the test case management module manages test cases (for example, creates test cases) through the test configuration and the test object; the container management module manages benchmark test work containers (for example, creates benchmark test containers) through the test cases and the simulation mode; and the benchmark test simulation is performed through the simulation workflow in the benchmark test work container to obtain the simulation result. In this way, the test paradigm in a test case can be created according to the test scenario, and multi-node simulation can be implemented on a single node. For service developers, the edge-cloud collaborative AI benchmark test architecture in the embodiments of the present application allows test cases to be configured flexibly according to the test scenario; that is, the corresponding test cases can be managed (for example, added) according to different test paradigms. For algorithm developers, implementing multi-node simulation on a single node means that the test architecture does not have to be redeployed on the edge side and the cloud side each time the edge-side test environment changes, and no test platform has to be built on multiple virtual nodes or multiple device nodes. The algorithm and/or the system can still be simulated and tested to obtain simulation results, which greatly reduces labor and material costs.
The second kind is a distributed edge-cloud collaborative benchmark test performed on multiple nodes. The framework of this benchmark test is shown in Figure 3A or Figure 3B. The following description is mainly based on the framework of Figure 3A, with reference to Figure 6.
Figure 6 is a schematic flowchart of a benchmark test of distributed edge-cloud collaborative AI provided by an embodiment of the present application. The test scenario is, by way of example, safety helmet target detection in industrial quality inspection, and the test paradigm in the test environment is, by way of example, edge-cloud collaborative incremental learning.
Before the management device is used, the management device is started according to the benchmark test mode, each module in the management device is initialized, and the local control devices in the task nodes are created.
Specifically, the management device is started according to the multi-node benchmark test mode, and the test case management module, container management module, communication module and result display module in the management device are initialized. A first local control device and a second local control device are created according to the initial test case in the initialized test case management module and the test configuration.
S601: The communication module in the management device obtains the test configuration file.
It should be understood that the test configuration file obtained by the communication module in the management device is the same as that in S510 and, for brevity, is not described again here.
It should be noted that in a benchmark test of distributed collaborative AI on multiple nodes, the simulation switch in the test environment parameter configuration can be either disabled or enabled. If the simulation switch is enabled, a benchmark test simulation of distributed collaborative AI is performed across the multiple nodes. Here, because the benchmark test mode is the multi-node distributed collaborative AI benchmark test, the simulation switch is disabled.
In the benchmark test of multi-node distributed collaborative AI, the container management configuration consists of a global container management configuration and a local container management configuration. The global container management configuration includes the number of local control devices and the size of each local control device. The local container management configuration includes the number of local containers corresponding to each local control device and the size of each of those local containers. For example, as shown in Figure 3A, the number of local control devices in the global container management configuration is 2, namely the first local control device and the second local control device.
S602: The communication module in the management device obtains the test object.
S603: The communication module in the management device receives a first management instruction according to the configuration of the test environment and the test object, where the first management instruction includes a test environment management instruction and/or a test object management instruction.
S604: The test case management module in the management device manages test cases according to the first management instruction. A test case includes a test environment and a test object.
It should be understood that S602, S603 and S604 are the same as S520, S540 and S550 respectively and, for brevity, are not described again here.
S605: The container management module in the management device generates, according to the test case and the global container management configuration, an instruction for managing the first local control device and an instruction for managing the second local control device.
其中,生成管理第一本地控制装置的指令为生成创建第一本地控制装置的指令,或者生成删除第一本地控制装置的指令,或者生成修改第一本地控制装置的指令,或者生成查询第一本地控制装置的指令;生成管理第二本地控制装置的指令为生成创建第二本地控制装置的指令,或者生成删除第二本地控制装置的指令,或者生成修改第二本地控制装置的指令,或者生成查询第二本地控制装置的指令。Wherein, the instruction to generate and manage the first local control device is to generate an instruction to create the first local control device, or to generate an instruction to delete the first local control device, or to generate an instruction to modify the first local control device, or to generate an instruction to query the first local control device. Instructions for controlling the device; instructions for generating and managing the second local control device are generating instructions for creating the second local control device, or generating instructions for deleting the second local control device, or generating instructions for modifying the second local control device, or generating queries. Instructions from the second local control device.
作为一种可能的实现方式,当管理测试用例为新增测试用例时,容器管理模块根据测试用例和全局容器管理配置,生成新增云节点本地控制装置的指令和新增边节点本地控制装置的指令。As a possible implementation method, when the management test case is a new test case, the container management module generates instructions for adding a new cloud node local control device and a new edge node local control device based on the test case and the global container management configuration. instruction.
可选地,如果管理装置和第一本地控制装置布置于同一个云节点中,如图3A所示,那么管理装置中的容器管理模块直接根据测试用例和全局容器管理配置来管理第一本地控制装置。 Optionally, if the management device and the first local control device are arranged in the same cloud node, as shown in Figure 3A, then the container management module in the management device directly manages the first local control according to the test case and the global container management configuration. device.
应理解,全局容器管理配置包括第一本地控制装置、第一本地管理指令对应的指令动作、第二本地控制装置和第二本地管理指令对应的指令动作。It should be understood that the global container management configuration includes a first local control device, an instruction action corresponding to the first local management instruction, a second local control device, and an instruction action corresponding to the second local management instruction.
S606: The management device sends the first local management instruction to the first local control device and sends the second local management instruction to the second local control device.
Specifically, the instruction action corresponding to the first local management instruction is an instruction to add, delete, modify, or query the first local control device; the instruction action corresponding to the second local management instruction is an instruction to add, delete, modify, or query the second local control device.
As a possible implementation, when managing the test case means adding a new test case, the management device sends an instruction to add the first local control device to the first local control device, and sends an instruction to add the second local control device to the second local control device.
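The four instruction actions and the fan-out to the local control devices in S605 and S606 can be sketched as below. The names `Action`, `LocalManagementInstruction`, and `build_local_instructions` are hypothetical. The sketch assumes that the action applied to the test case is propagated unchanged to every local control device, which the text states only for the add case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    """The four instruction actions named in the text."""
    ADD = "add"
    DELETE = "delete"
    MODIFY = "modify"
    QUERY = "query"


@dataclass(frozen=True)
class LocalManagementInstruction:
    target_device: str  # which local control device receives the instruction
    action: Action      # what that device is asked to do


def build_local_instructions(
    test_case_action: Action, devices: List[str]
) -> List[LocalManagementInstruction]:
    # Assumption: one instruction per device, carrying the same action
    # that was applied to the test case.
    return [LocalManagementInstruction(d, test_case_action) for d in devices]


instructions = build_local_instructions(
    Action.ADD, ["first_local_control", "second_local_control"]
)
```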
S607: The first communication module of the first local control device receives the first local management instruction from the management device, and the first container management module of the first local control device manages the first local control device according to the first local management instruction.
It should be noted that the first local management instruction includes the test case, the local container management configuration, and the container call configuration.
The first local control device implements life cycle management of the first task evaluation container in the task cloud node (for example, life cycle management of the first benchmark test container) and communication with external task evaluation containers (for example, external test containers), where an external task evaluation container is one arranged on a node other than the task cloud node. For example, the first local control device enables communication between the first task evaluation container in the task cloud node and the second task evaluation container in the task edge node. In a specific implementation, control plane communication is established among the first local control device, the management device, and the second local control device, so that a communication network exists between the first task evaluation container and the second task evaluation container. This enables data plane communication between the first task evaluation container and the second task evaluation container; for example, the first test container and the edge-node test container can exchange data such as data sets, models, and algorithms.
As a possible implementation, the first communication module of the first local control device receives the instruction to add the first local control device, and the first container management module of the first local control device adds the cloud-node local control device according to that instruction.
S608: The first container management module of the first local control device manages the first task evaluation container according to the test case and the local container management configuration, where the first task evaluation container includes a first distributed workflow.
It should be understood that managing the first task evaluation container may be any one of the basic functions of adding, deleting, modifying, and querying the first task evaluation container, for example, adding the first benchmark test container.
It should be understood that the number of first task evaluation containers created is not limited here and can be determined from the container count parameter in the local container management configuration.
For example, when the test paradigm is the incremental learning paradigm, the first task evaluation container managed by the first container management module in the task cloud node is a training container.
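The add, delete, modify, and query functions of a container management module can be illustrated with an in-memory registry. This is a sketch under stated assumptions: `LocalContainerManager` and its methods are hypothetical names, containers are reduced to a name and a role string, and `max_containers` stands in for the container count parameter of the local container management configuration.

```python
from typing import Dict, Optional


class LocalContainerManager:
    """In-memory stand-in for a container management module (hypothetical)."""

    def __init__(self, max_containers: int) -> None:
        self.max_containers = max_containers
        self._containers: Dict[str, str] = {}  # container name -> role

    def add(self, name: str, role: str) -> None:
        if name in self._containers:
            raise ValueError(f"container {name!r} already exists")
        if len(self._containers) >= self.max_containers:
            raise RuntimeError("container count limit reached")
        self._containers[name] = role

    def delete(self, name: str) -> None:
        # Raises KeyError if the container does not exist.
        del self._containers[name]

    def modify(self, name: str, role: str) -> None:
        if name not in self._containers:
            raise KeyError(name)
        self._containers[name] = role

    def query(self, name: str) -> Optional[str]:
        # Returns the role, or None when the container is unknown.
        return self._containers.get(name)


# Incremental learning example: the cloud-side container is a training container.
manager = LocalContainerManager(max_containers=1)
manager.add("first_task_evaluation", "training")
```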
S609: The second communication module of the second local control device receives the instruction for managing the second local control device and manages the second local control device according to that instruction.
As a possible implementation, the second communication module of the second local control device receives the instruction to add the second local control device, and the second container management module of the second local control device adds the second local control device according to that instruction.
It should be noted that the instruction for managing the second local control device includes the test case, the local container management configuration, and the container call configuration.
The second local control device implements life cycle management of the second task evaluation container in the task edge node (for example, life cycle management of the second benchmark test container) and communication with external task evaluation containers (for example, external test containers), where an external task evaluation container is one arranged on a node other than the task edge node. For example, the second local control device enables communication between the first task evaluation container in the task cloud node and the second task evaluation container in the task edge node. The specific implementation is similar to that of the cloud-node local control device in S607 and is not described again for brevity.
S610: The second container management module in the second local control device manages the second task evaluation container according to the test case and the local container management configuration, where the second task evaluation container includes a second distributed workflow.
It should be understood that managing the second task evaluation container may be any one of the basic functions of adding, deleting, modifying, and querying the second task evaluation container, for example, adding the second benchmark test container.
It should be understood that the number of second task evaluation containers created is not limited here and can be determined from the container count parameter in the local container management configuration.
For example, when the test paradigm is the incremental learning paradigm, the second task evaluation container managed by the second container management module is an inference container.
It should be understood that S607 and S608 are the process in which the task cloud node manages the first local control device and the first task evaluation container, and S609 and S610 are the process in which the task edge node manages the second local control device and the second task evaluation container. These two processes may be performed in any order.
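Because the cloud-side steps (S607, S608) and the edge-side steps (S609, S610) have no ordering constraint between them, they could run concurrently. The sketch below shows one way to express that with Python's standard thread pool; `set_up_node` is a hypothetical placeholder for the real management work, and the node and role names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def set_up_node(node: str, container_role: str) -> Dict[str, str]:
    # Placeholder for "manage the local control device, then manage its
    # task evaluation container" on one node.
    return {"node": node, "container": container_role}


def set_up_all() -> Dict[str, Dict[str, str]]:
    # The two submissions are independent, so either may finish first.
    with ThreadPoolExecutor(max_workers=2) as pool:
        cloud = pool.submit(set_up_node, "task_cloud_node", "training")  # S607 + S608
        edge = pool.submit(set_up_node, "task_edge_node", "inference")   # S609 + S610
        return {"cloud": cloud.result(), "edge": edge.result()}


setup = set_up_all()
```

Only S611 onward needs both containers to exist, so waiting on both futures before continuing is sufficient.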
S611: The first task evaluation container sends the algorithm or model to the second task evaluation container according to the test case and the container call configuration.
As a possible implementation, when a model is sent to the second task evaluation container, the first distributed workflow of the first task evaluation container calls, according to the test case and the local container call configuration, the model evaluation module from the first task use case management module to evaluate the test models and determine the target test model. The first distributed workflow of the first task evaluation container then sends the target test model to the second task evaluation container. The test model may be obtained directly or may have been trained on the task cloud node; the embodiments of this application do not limit the source of the test model.
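The text does not say how the model evaluation module picks the target test model. As an illustration only, the sketch below assumes the candidate with the lowest mean absolute error on a small labelled sample is selected; `mean_abs_error`, `select_target_model`, and the toy models are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float], float]


def mean_abs_error(model: Model, samples: Sequence[Tuple[float, float]]) -> float:
    # Average absolute difference between prediction and label.
    return sum(abs(model(x) - y) for x, y in samples) / len(samples)


def select_target_model(
    candidates: Dict[str, Model], samples: Sequence[Tuple[float, float]]
) -> str:
    # Assumption: the target test model is the candidate with the lowest error.
    return min(candidates, key=lambda name: mean_abs_error(candidates[name], samples))


candidates = {"double": lambda x: 2 * x, "identity": lambda x: x}
samples = [(1.0, 2.0), (2.0, 4.0)]  # (input, expected output) pairs
target = select_target_model(candidates, samples)
```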
It should be understood that the first task use case management module includes the same content as the task use case management module in the management device. Therefore, when the network connection between the management device and the first local control device is disconnected, the first distributed workflow can call the algorithm module corresponding to the AI paradigm directly from the first task evaluation container.
Specifically, before the first task evaluation container sends the algorithm or model to the second task evaluation container, the first task evaluation container and the second task evaluation container perform data plane transmission over the communication network established among the first local control device, the management device, and the second local control device.
In the embodiments of this application, the task edge node acts as the data provider and the task cloud node acts as the algorithm provider. When the two are different parties, sending the target test model (rather than the algorithm) to the task edge node prevents the data provider from obtaining the algorithm, which reduces the likelihood that the algorithm provider's algorithm is leaked.
As a possible implementation, when an algorithm is sent to the second task evaluation container, the first task evaluation container sends the algorithm directly to the second task evaluation container according to the test case and the container call configuration.
S612: The second task evaluation container receives the model or algorithm from the first task evaluation container. The second distributed workflow of the second task evaluation container calls, according to the test case and the container call configuration, the inference module from the second task use case management module, performs inference on the data of the task edge node based on the model to obtain the test result corresponding to the test case, and sends the test result to the first task evaluation container.
The explanation of the second task use case management module is similar to that of the first task use case management module and is not repeated here.
As a possible implementation, when a model is received from the first task evaluation container, the second task evaluation container receives the target test model from the first task evaluation container. The second distributed workflow of the second task evaluation container calls, according to the test case and the container call configuration, the inference module from the second task use case management module, runs inference with the target test model on the inference data to obtain the test result of the target test model, and sends the test result to the first task evaluation container.
As a possible implementation, when an algorithm is received from the first task evaluation container, the second distributed workflow of the second task evaluation container calls, according to the test case and the container call configuration, the training module from the second task use case management module to obtain a trained model. It then calls the inference module from the second task use case management module, performs inference on the data of the task edge node based on the trained model to obtain the test result corresponding to the test case, and sends the test result to the first task evaluation container.
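The two branches of S612 (a model arrives and is used for inference directly; an algorithm arrives and is trained on edge data first) can be sketched as a single dispatch function. All names here are hypothetical, models are reduced to plain callables, and the "centering" algorithm is a toy stand-in for a real training module.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

Model = Callable[[float], float]


@dataclass(frozen=True)
class Algorithm:
    """An untrained algorithm: its train step turns edge data into a model."""
    train: Callable[[Sequence[float]], Model]


def run_on_edge(payload: Union[Model, Algorithm], edge_data: Sequence[float]) -> List[float]:
    if isinstance(payload, Algorithm):
        # Algorithm branch: call the training step first, on edge data only.
        model = payload.train(edge_data)
    else:
        # Model branch: the received model is used as is.
        model = payload
    # Both branches end with inference on the edge node's data.
    return [model(x) for x in edge_data]


def centering_algorithm(data: Sequence[float]) -> Model:
    # Toy training step: learn the mean and return a model that subtracts it.
    mean = sum(data) / len(data)
    return lambda x: x - mean
```

In both branches only the returned list (the test result) needs to leave the edge node; the edge data itself is never sent back.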
Optionally, S613: When the test result is in the second task evaluation container, the second communication module of the second local control device obtains the test result of at least one test case, the second result display module of the second local control device displays the test result of the at least one test case, and the second local control device sends the test result of the at least one test case to the global test case management module.
As a possible implementation, the test result of at least one test case can be presented through an online console or a user interaction interface to show how the test case executed. If the test object is an algorithm, the test result may include the tested algorithm, its metric results, the test paradigm to which it belongs, and its hyperparameter configuration. For example, the test results of at least one test case are displayed as a leaderboard on the user interaction interface.
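A leaderboard over test results with the fields listed above might look like the following sketch. `BenchmarkResult`, `leaderboard`, and `render` are hypothetical names, the metric values are made up, and the sketch assumes a single metric for which higher is better.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class BenchmarkResult:
    algorithm: str   # the tested algorithm
    paradigm: str    # the test paradigm it belongs to
    metric: float    # a single metric result (assumed: higher is better)
    hyperparameters: Dict[str, float] = field(default_factory=dict)


def leaderboard(results: List[BenchmarkResult]) -> List[BenchmarkResult]:
    # Best metric first.
    return sorted(results, key=lambda r: r.metric, reverse=True)


def render(results: List[BenchmarkResult]) -> str:
    # One text row per test result, ranked from 1.
    return "\n".join(
        f"{rank}. {r.algorithm} ({r.paradigm}): {r.metric:.3f}"
        for rank, r in enumerate(leaderboard(results), start=1)
    )


results = [
    BenchmarkResult("algo_a", "incremental_learning", 0.81, {"lr": 0.01}),
    BenchmarkResult("algo_b", "incremental_learning", 0.93, {"lr": 0.001}),
]
```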
Optionally, S614: When the test result is in the first task evaluation container, the first communication module of the first local control device obtains the test result of at least one test case, the first result display module of the first local control device displays the test result of the at least one test case, and the first local control device sends the test result of the at least one test case to the global test case management module.
Optionally, S615: The first task evaluation container receives the test result and updates the target test model according to the test result.
It should be noted that, in the benchmark test of multi-node distributed collaborative AI shown in Figure 6, both the test case and the local containers are managed by adding them. Based on the multi-node distributed collaborative AI task evaluation architecture in the embodiments of this application, other management functions, such as deletion, modification, and query, can also be performed. The detailed management methods are not described here.
In the embodiments of this application, in the benchmark test of multi-node distributed collaborative AI, managing test cases makes it possible to manage the corresponding test containers for different test paradigms, so that different test cases can be benchmarked. In particular, when the data provider and the algorithm provider are different parties, the model can be debugged and updated from the test results of different test cases while the data supplied by the data provider never leaves the edge node.
The foregoing describes in detail, with reference to Figures 4 to 6, the distributed collaborative AI task evaluation method provided by the embodiments of this application. The following describes in detail, with reference to Figures 7 and 8, the distributed collaborative AI task evaluation system and devices provided by the embodiments of this application, for the multi-node task evaluation mode and the single-node task evaluation simulation mode respectively. It should be understood that the devices described below can perform the methods of the foregoing embodiments; to avoid unnecessary repetition, repeated descriptions are omitted where appropriate.
For a schematic structural diagram of a management device for distributed collaborative AI task evaluation provided by an embodiment of this application, refer to the management device in Figure 3A or the management device in Figure 3C. The device includes a task use case management module and a communication module; optionally, the management device also includes a container management module and a result display module.
The communication module is configured to: obtain the task configuration of the distributed collaborative AI and the task object of the distributed collaborative AI, where the task configuration includes the configuration of the task environment of the distributed collaborative AI; and receive a first management instruction according to the configuration of the task environment and the task object, where the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object.
The task use case management module is configured to manage a task use case of the distributed collaborative AI according to the first management instruction, where the task use case includes the task environment and the task object.
For brevity, refer to the foregoing method embodiments for the other steps.
The task use case management module, the container management module, and the result display module can each be implemented in software or in hardware.
The following uses the task use case management module as an example to describe how such a module can be implemented. The container management module and the result display module can be implemented in a similar way.
When implemented in software, the task use case management module may be an application program or a code block running on a computer device. The computer device may be at least one of a physical host, a virtual machine, a container, or another computing device, and there may be one or more such computer devices. For example, the task use case management module may be an application running on multiple hosts, virtual machines, or containers. The multiple hosts, virtual machines, or containers that run the application may be distributed in the same availability zone (AZ) or in different AZs, and in the same region or in different regions. Typically, one region includes multiple AZs.
Similarly, the multiple hosts, virtual machines, or containers that run the application may be distributed in the same virtual private cloud (VPC) or across multiple VPCs. Typically, one region can include multiple VPCs, and one VPC can include multiple AZs.
When implemented in hardware, the task use case management module may include at least one computing device, such as a server. Alternatively, the task use case management module may be a device implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The multiple computing devices included in the task use case management module may be distributed in the same AZ or in different AZs, in the same region or in different regions, and in the same VPC or across multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
For a schematic structural diagram of a first local control device for distributed collaborative AI task evaluation provided by another embodiment of this application, refer to the first local control device in Figure 3A. The first local control device includes a first container management module and a first communication module, and may also include a first task use case management module and a first result display module.
The first communication module is configured to receive a first local management instruction from the management device, where the first local management instruction is determined according to the task use case, and the task use case is determined according to the first management instruction; the first management instruction includes a management instruction for the task environment and/or a management instruction for the task object.
The first container management module is configured to manage the first local control device according to the first local management instruction.
For a schematic structural diagram of a second local control device for distributed collaborative AI task evaluation provided by yet another embodiment of this application, refer to the second local control device in Figure 3A. The second local control device includes a second container management module and a second communication module, and may also include a second task use case management module and a second result display module.
The second communication module is configured to receive a second local management instruction from the management device, where the second local management instruction is determined according to the task use case, and the task use case is determined according to the second management instruction; the second management instruction includes a management instruction for the task environment and/or a management instruction for the task object.
The second container management module is configured to manage the second local control device according to the second local management instruction.
It should be noted that the above modules are divided by functional logic, which does not require them to be independent hardware units. The term "module" here may be implemented in software and/or hardware, and is not specifically limited.
For example, a "module" may be a software program, a hardware circuit, or a combination of the two that implements the above functions. The hardware circuit may include an application-specific integrated circuit (ASIC), an electronic circuit, a processor (such as a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functions.
Therefore, the modules of the examples described in the embodiments of this application can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
图7是本申请实施例提供的一种分布式协同AI的任务评估系统硬件结构性示意图。图7的任务评估系统的硬件结构性示意图适用于多节点任务评估模式。如图7所示,该系统对应的计算机设备集群包括至少一个计算设备,例如,如图7所示至少一个计算设备可以包括计算设备700A、计算设备700B和计算设备700C。其中每个计算设备中包括总线702、处理器704、通信接口708和存储器706。处理器704、存储器706和通信接口708之间通过总线702通信。Figure 7 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation system provided by an embodiment of the present application. The hardware structural diagram of the task evaluation system in Figure 7 is suitable for multi-node task evaluation mode. As shown in FIG. 7 , the computer device cluster corresponding to the system includes at least one computing device. For example, as shown in FIG. 7 , the at least one computing device may include a computing device 700A, a computing device 700B, and a computing device 700C. Each computing device includes a bus 702, a processor 704, a communication interface 708, and a memory 706. The processor 704, the memory 706 and the communication interface 708 communicate through the bus 702.
总线702可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图3中仅用一条线表示,但并不表示仅有一根总线或一种类型的总线。总线702可包括在计算设备700各个部件(例如,存储器706、处理器704、通信接口708)之间传送信息的通路。The bus 702 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one line is used in Figure 3, but it does not mean that there is only one bus or one type of bus. Bus 702 may include a path that carries information between various components of computing device 700 (eg, memory 706, processor 704, communications interface 708).
处理器704可以包括中央处理器(central processing unit,CPU)、图形处理器(graphics processing unit,GPU)、微处理器(micro processor,MP)或者数字信号处理器(digital signal processor,DSP)等处理器中的任意一种或多种。The processor 704 may include a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (micro processor, MP) or a digital signal processor (digital signal processor, DSP). any one or more of them.
The memory 706 may include volatile memory, such as random access memory (RAM). The memory 706 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 706 stores executable program code, and the processor 704 executes that code to implement the functions of the modules in the corresponding apparatus described above. As shown in Figure 7, the memory 706 in computing device 700A stores the executable program code of each module in the management device, and the processor 704 in computing device 700A executes that code to implement the functions of the modules in the management device. The memory 706 in computing device 700B stores the executable program code of each module in the first local control device, and the processor 704 in computing device 700B executes that code to implement the functions of the modules in the first local control device. The memory 706 in computing device 700C stores the executable program code of each module in the second local control device, and the processor 704 in computing device 700C executes that code to implement the functions of the modules in the second local control device.
As a possible implementation, the at least one computing device may jointly execute the instructions that the management device, the first local control device, and the second local control device use to perform the method in Figure 4, Figure 5, or Figure 6.
The communication interface 708 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the at least one computing device 700 and other devices or communication networks.
As a possible implementation, the at least one computing device may be connected through a network. The network may be a wide area network, a local area network, or the like, as shown in Figure 7.
Figure 8 is a schematic diagram of the hardware structure of a distributed collaborative AI task evaluation management device provided by an embodiment of the present application. Figure 8 applies to the single-node task evaluation simulation mode of distributed collaborative AI. The computing device 800 includes a bus 802, a processor 804, a communication interface 808, and a memory 806. The processor 804, the memory 806, and the communication interface 808 communicate with one another through the bus 802.
For a detailed description of the hardware in the computing device 800, refer to the description of each computing device in Figure 7; for brevity, it is not repeated here. As shown in Figure 8, the memory 806 in the computing device 800 stores the executable program code of each module in the management device, and the processor 804 in the computing device 800 executes that code to implement the functions of the modules in the management device.
As a possible implementation, the computing device 800 may execute the instructions that the management device uses to perform the method in Figure 4, Figure 5, or Figure 6.
According to the method provided by the embodiments of the present application, the present application further provides a computer program product. The computer program product includes computer program code which, when run on a computer, causes the computer to perform the method of the embodiment shown in Figure 4, Figure 5, or Figure 6.
According to the method provided by the embodiments of the present application, the present application further provides a computer-readable medium. The computer-readable medium stores program code which, when run on a computer, causes the computer to perform the method of the embodiment shown in Figure 4, Figure 5, or Figure 6.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above correspond to the processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative. The division into units is only a division by logical function; in an actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited to them. Any change or substitution that a person familiar with the technical field could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (34)

  1. A method for task evaluation of distributed collaborative artificial intelligence (AI), characterized in that the method is applied to a control node, and the method comprises:
    obtaining a task configuration of the distributed collaborative AI and a task object of the distributed collaborative AI, wherein the task configuration includes a configuration of a task environment of the distributed collaborative AI;
    receiving a first management instruction according to the configuration of the task environment and the task object, wherein the first management instruction includes a management instruction of the task environment and/or a management instruction of the task object;
    managing a task use case of the distributed collaborative AI according to the first management instruction, wherein the task use case includes the task environment and the task object.
  2. The method according to claim 1, characterized in that managing the task use case of the distributed collaborative AI comprises:
    adding the task use case, or deleting the task use case, or modifying the task use case, or querying the task use case.
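The four management actions of claim 2 can be illustrated with a minimal sketch. The class names `TaskUseCase` and `UseCaseRegistry`, the action strings, and the in-memory dictionary are all hypothetical and are not taken from the application; the sketch only shows one way a use case (a task environment paired with a task object) might be added, deleted, modified, or queried.

```python
from dataclasses import dataclass, field


@dataclass
class TaskUseCase:
    """Hypothetical record pairing a task environment with a task object."""
    name: str
    environment: dict
    task_object: str


@dataclass
class UseCaseRegistry:
    """Illustrative in-memory store; not the application's actual data model."""
    cases: dict = field(default_factory=dict)

    def handle(self, action, name, environment=None, task_object=None):
        """Apply one management action and return the resulting use case (or None)."""
        if action == "add":
            if name in self.cases:
                raise KeyError(f"use case {name!r} already exists")
            self.cases[name] = TaskUseCase(name, environment or {}, task_object or "")
        elif action == "delete":
            del self.cases[name]
        elif action == "modify":
            case = self.cases[name]
            if environment is not None:
                case.environment = environment
            if task_object is not None:
                case.task_object = task_object
        elif action != "query":
            raise ValueError(f"unknown action {action!r}")
        # A query changes nothing; every action reports the current state.
        return self.cases.get(name)
```

A caller would issue, for example, `registry.handle("add", "uc1", {"paradigm": "demo"}, "algorithm-A")` and later `registry.handle("query", "uc1")`.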
  3. The method according to claim 1 or 2, characterized in that, when the task evaluation mode is the multi-node task evaluation mode, the method further comprises:
    generating, according to the task use case and a global container management configuration, a first local management instruction for a first local control device and a second local management instruction for a second local control device, wherein the task configuration includes the global container management configuration, and the global container management configuration includes the first local control device, the instruction action corresponding to the first local management instruction, the second local control device, and the instruction action corresponding to the second local management instruction;
    sending the first local management instruction to the first local control device, and sending the second local management instruction to the second local control device.
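The generation step of claim 3 can be sketched as follows, assuming that the global container management configuration is a mapping from a local control device identifier to an instruction action. The names `LocalManagementInstruction` and `build_local_instructions`, the device identifiers, and the action strings are illustrative assumptions; sending the instructions over a network is left out.

```python
from dataclasses import dataclass

# Assumed action vocabulary, mirroring the add/delete/modify/query actions of claim 4.
VALID_ACTIONS = {"add", "delete", "modify", "query"}


@dataclass(frozen=True)
class LocalManagementInstruction:
    """Hypothetical instruction addressed to one local control device."""
    target: str      # identifier of the local control device
    action: str      # instruction action for that device
    use_case: str    # name of the task use case the instruction derives from


def build_local_instructions(use_case, global_container_config):
    """Create one instruction per local control device listed in the configuration."""
    instructions = []
    for target, action in global_container_config.items():
        if action not in VALID_ACTIONS:
            raise ValueError(f"unsupported action {action!r} for {target!r}")
        instructions.append(LocalManagementInstruction(target, action, use_case))
    return instructions
```

Keeping the per-device action in the configuration, rather than in the use case itself, lets the same use case drive different actions on different nodes.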
  4. The method according to claim 3, characterized in that the instruction action corresponding to the first local management instruction is an add instruction for the first local control device, or a delete instruction for the first local control device, or a modify instruction for the first local control device, or a query instruction for the first local control device;
    the management instruction corresponding to the second local management instruction is an add instruction for the second local control device, or a delete instruction for the second local control device, or a modify instruction for the second local control device, or a query instruction for the second local control device.
  5. The method according to claim 4, characterized in that the method further comprises:
    receiving an evaluation result of at least one task use case from the first local control device or the second local control device;
    displaying the evaluation result of the at least one task use case.
  6. The method according to claim 1 or 2, characterized in that, when the task evaluation mode is the single-node task evaluation simulation mode, the simulation switch configuration parameter in the task configuration is set to enabled, and the method further comprises:
    managing a third task evaluation container according to the task use case and a local container management configuration, wherein the task evaluation container includes a local simulation workflow, the task configuration includes the local container management configuration, and the local container management configuration includes the third task evaluation container and the management action corresponding to the third task evaluation container,
    wherein the management action corresponding to the third task evaluation container includes adding the third task evaluation container, or deleting the third task evaluation container, or modifying the third task evaluation container, or querying the third task evaluation container.
  7. The method according to claim 6, characterized in that, when managing the task use case of the distributed collaborative AI is adding the task use case, managing the third task evaluation container according to the task use case and the local container management configuration comprises:
    adding the third task evaluation container according to the task use case and the local container management configuration, wherein the third task evaluation container includes a local simulation workflow.
  8. The method according to claim 7, characterized in that the task configuration further includes a simulation mode corresponding to the third task evaluation container;
    the simulation mode corresponding to the third task evaluation container is any one of the following modes:
    the simulation mode is simulation testing of algorithm performance; or
    the simulation mode is simulation testing of system performance; or
    the simulation mode is simulation testing of the algorithm performance and the system performance; or
    the simulation mode is simulation testing of system unit performance.
  9. The method according to claim 8, characterized in that:
    when the simulation mode is simulation testing of algorithm performance, the method further comprises: creating an algorithm pseudo-container in the third task evaluation container corresponding to that simulation mode; or
    when the simulation mode is simulation testing of system performance, the method further comprises: creating a system pseudo-container in the third task evaluation container corresponding to that simulation mode; or
    when the simulation mode is simulation testing of the algorithm performance and the system performance, creating a real container in the third task evaluation container corresponding to that simulation mode; or
    when the simulation mode is simulation testing of system unit performance, reusing the third task evaluation container corresponding to that simulation mode.
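The mode-to-container mapping of claims 8 and 9 can be summarized in a short sketch. The enum `SimulationMode`, its member names, and the function `container_to_create` are hypothetical labels introduced here; the returned strings simply name the container kinds listed in claim 9, with `None` standing for the case in which the existing third task evaluation container is reused and nothing new is created.

```python
from enum import Enum


class SimulationMode(Enum):
    """The four simulation modes of claim 8 (member names are assumptions)."""
    ALGORITHM = "algorithm performance"
    SYSTEM = "system performance"
    ALGORITHM_AND_SYSTEM = "algorithm and system performance"
    SYSTEM_UNIT = "system unit performance"


def container_to_create(mode):
    """Return the kind of container created inside the third task evaluation
    container for the given mode, or None when the container is reused as-is."""
    mapping = {
        SimulationMode.ALGORITHM: "algorithm pseudo-container",
        SimulationMode.SYSTEM: "system pseudo-container",
        SimulationMode.ALGORITHM_AND_SYSTEM: "real container",
        SimulationMode.SYSTEM_UNIT: None,
    }
    return mapping[mode]
```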
  10. The method according to claim 8 or 9, characterized in that the method further comprises:
    receiving a simulation result of at least one task use case from the third task evaluation container, wherein the simulation result is obtained by the local simulation workflow simulating the task use case according to a local container call configuration and the simulation mode corresponding to the third task evaluation container;
    wherein the task configuration includes the local container call configuration, the local container call configuration includes the calling order of the algorithm modules in a task paradigm and the hyperparameters corresponding to the algorithm modules, the task use case includes the task environment, and the task environment includes the task paradigm;
    displaying the simulation result of the at least one task use case.
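A local container call configuration of the kind described in claim 10, consisting of a calling order for algorithm modules plus per-module hyperparameters, might drive a workflow as in the sketch below. The configuration keys `order` and `hyperparameters`, the toy modules `scale` and `offset`, and the function `run_local_workflow` are assumptions made for illustration only.

```python
def run_local_workflow(call_config, modules, sample):
    """Call each algorithm module in the configured order, passing along the
    previous module's output and that module's configured hyperparameters."""
    value = sample
    for step in call_config["order"]:
        hyperparameters = call_config["hyperparameters"].get(step, {})
        value = modules[step](value, **hyperparameters)
    return value


def scale(x, factor=1.0):
    """Toy algorithm module: multiply the input by a factor."""
    return x * factor


def offset(x, bias=0.0):
    """Toy algorithm module: add a bias to the input."""
    return x + bias


# Hypothetical call configuration: run `scale` first, then `offset`.
call_config = {
    "order": ["scale", "offset"],
    "hyperparameters": {"scale": {"factor": 2.0}, "offset": {"bias": 1.0}},
}
modules = {"scale": scale, "offset": offset}
```

With this configuration, `run_local_workflow(call_config, modules, 3.0)` first doubles the input and then adds the bias; swapping the entries of `order` changes the result without touching the modules.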
  11. The method according to any one of claims 1 to 10, characterized in that the type of the task evaluation is any one of benchmark testing, certification, or competition.
  12. The method according to claim 11, characterized in that, when the type of the task evaluation is benchmark testing,
    the task environment is a test environment,
    the task object is a test object, and the test object is any one of a test algorithm, a test model, a test system, or a test scenario,
    the task paradigm is a test paradigm.
  13. A method for task evaluation of distributed collaborative artificial intelligence (AI), the method being applied to a task cloud node, characterized in that the method comprises:
    receiving a first local management instruction from a management device, wherein the first local management instruction is determined according to a task use case of the distributed collaborative AI, the task use case is determined according to a first management instruction, the first management instruction includes a management instruction of a task environment and/or a management instruction of a task object, and the task use case includes the task environment and the task object;
    managing a first local control device according to the first local management instruction.
  14. The method according to claim 13, characterized in that the first management instruction is used to add the task use case, or delete the task use case, or modify the task use case, or query the task use case;
    managing the first local control device is adding the first local control device, or deleting the first local control device, or modifying the first local control device, or querying the first local control device.
  15. The method according to claim 13 or 14, characterized in that, when managing the first local control device is adding the first local control device, the method further comprises:
    the first local management instruction includes the task use case and a first local container management configuration corresponding to the task cloud node;
    managing a first task evaluation container according to the task use case and the first local container management configuration, wherein the first local container management configuration includes the first task evaluation container and the management instruction corresponding to the first evaluation container.
  16. The method according to claim 15, characterized in that the management instruction corresponding to the first task evaluation container is an add instruction for the first task evaluation container, or a delete instruction for the first task evaluation container, or a modify instruction for the first task evaluation container, or a query instruction for the first task evaluation container, or an instruction for implementing communication between the first task evaluation container and an external task evaluation container, wherein the external task evaluation container belongs to a node other than the task cloud node.
  17. The method according to claim 16, characterized in that, if the management instruction for the first task evaluation container is to add the first task evaluation container, the first task evaluation container corresponding to the task cloud node is added, wherein the first AI task evaluation container corresponding to the task cloud node includes a first distributed workflow.
  18. The method according to claim 17, characterized in that the method further comprises:
    when the evaluation result of the task use case is in the first task evaluation container, receiving, from the first AI task evaluation container, the evaluation result of at least one task use case;
    displaying the evaluation result of the at least one task use case;
    sending the evaluation result of the at least one task use case to the management device.
  19. The method according to any one of claims 13 to 18, characterized in that the type of the task evaluation is any one of benchmark testing, certification, or competition.
  20. The method according to claim 19, characterized in that, when the type of the task evaluation is benchmark testing,
    the task environment is a test environment,
    the task object is a test object, and the test object is any one of a test algorithm, a test model, a test system, or a test scenario,
    the task paradigm is a test paradigm.
  21. A method for task evaluation of distributed collaborative artificial intelligence (AI), the method being applied to a task cloud node, characterized in that the method comprises:
    receiving a second local management instruction from a management device, wherein the second local management instruction is determined according to a task use case, the task use case is determined according to a second management instruction, the second management instruction includes a management instruction of a task environment and/or a management instruction of a task object, and the task use case includes the task environment and the task object;
    managing a second local control device according to the second local management instruction.
  22. The method according to claim 21, characterized in that the second management instruction is used to add the task use case, or delete the task use case, or modify the task use case, or query the task use case;
    managing the second local control device is adding the second local control device, or deleting the second local control device, or modifying the second local control device, or querying the second local control device.
  23. The method according to claim 21 or 22, characterized in that, when managing the second local control device is adding the second local control device, the method further comprises:
    the second local management instruction includes the task use case and a second local container management configuration corresponding to the task cloud node;
    managing a second task evaluation container according to the task use case and the second local container management configuration, wherein the second local container management configuration includes the second task evaluation container and the management instruction corresponding to the second evaluation container.
  24. The method according to claim 23, characterized in that the management instruction corresponding to the second task evaluation container is an add instruction for the second task evaluation container, or a delete instruction for the second task evaluation container, or a modify instruction for the second task evaluation container, or a query instruction for the second task evaluation container, or an instruction for implementing communication between the second task evaluation container and an external task evaluation container, wherein the external task evaluation container belongs to a node other than the task cloud node.
  25. The method according to claim 24, characterized in that, if the management instruction for the second task evaluation container is to add the second task evaluation container, the second task evaluation container corresponding to the task cloud node is added, wherein the second AI task evaluation container corresponding to the task cloud node includes a second distributed workflow.
  26. The method according to claim 25, characterized in that the method further comprises:
    when the evaluation result of the task use case is in the second task evaluation container, receiving, from the second AI task evaluation container, the evaluation result of at least one task use case;
    displaying the evaluation result of the at least one task use case;
    sending the evaluation result of the at least one task use case to the management device.
  27. The method according to any one of claims 21 to 26, characterized in that the type of the task evaluation is any one of benchmark testing, certification, or competition.
  28. The method according to claim 27, characterized in that, when the type of the task evaluation is benchmark testing,
    the task environment is a test environment,
    the task object is a test object, and the test object is any one of a test algorithm, a test model, a test system, or a test scenario,
    the task paradigm is a test paradigm.
  29. A management device for task evaluation of distributed collaborative artificial intelligence (AI), characterized in that the management device comprises a task use case management module and a communication module,
    the communication module is configured to obtain a task configuration of the distributed collaborative AI and a task object of the distributed collaborative AI, wherein the task configuration includes a configuration of a task environment of the distributed collaborative AI;
    the communication module is configured to receive a first management instruction according to the configuration of the task environment and the task object, wherein the first management instruction includes a management instruction of the task environment and/or a management instruction of the task object;
    the task use case management module is configured to:
    manage a task use case of the distributed collaborative AI according to the first management instruction, wherein the task use case includes the task environment and the task object.
  30. A first local control device for task evaluation of distributed collaborative artificial intelligence (AI), characterized in that the first local control device comprises a first container management module and a first communication module,
    the first communication module is configured to receive a first local management instruction from a management device, wherein the first local management instruction is determined according to a task use case of the distributed collaborative AI, the task use case is determined according to a first management instruction, and the first management instruction includes a management instruction of a task environment of the distributed collaborative AI and/or a management instruction of a task object of the distributed collaborative AI;
    the first container management module is configured to:
    manage the first local control device according to the first local management instruction.
  31. A second local control device for task evaluation of distributed collaborative artificial intelligence (AI), characterized in that the device comprises a second container management module and a second communication module,
    the second communication module is configured to: receive a second local management instruction from a management device, wherein the second local management instruction is determined according to a task use case of the distributed collaborative AI, and the task use case is determined according to a first management instruction, wherein the first management instruction includes a management instruction for a task environment of the distributed collaborative AI and/or a management instruction for a task object of the distributed collaborative AI;
    the second container management module is configured to: manage the second local control device according to the second local management instruction.
  32. A task evaluation system for distributed collaborative artificial intelligence (AI), characterized in that the task evaluation system comprises the management device according to claim 29, the first local control device according to claim 30, and the second local control device according to claim 31.
  33. A computer device, characterized in that the computer device comprises a processor and a memory;
    the memory is configured to store computer-executable instructions;
    the processor is configured to execute the computer-executable instructions stored in the memory, so that the computer device performs the method according to any one of claims 1 to 12, or performs the method according to any one of claims 13 to 20, or performs the method according to any one of claims 21 to 28.
  34. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when the computer program runs on one or more processors, the computer is caused to perform the method according to any one of claims 1 to 12, or perform the method according to any one of claims 13 to 20, or perform the method according to any one of claims 21 to 28.
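As a non-limiting illustration, the cooperation between the management device of claim 29 and the local control devices of claims 30 and 31 might be sketched as follows. This is only a minimal sketch under stated assumptions, not the claimed implementation: every class, method and field name (`ManagementInstruction`, `TaskUseCase`, `obtain`, `manage`, `receive`, the "set"/"delete" actions) is hypothetical, and it is assumed for illustration that a local management instruction is simply the set of container names that a local control device should run.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementInstruction:
    """Hypothetical form of the first management instruction."""
    target: str            # "environment" or "object"
    action: str            # "set" or "delete" (illustrative actions only)
    name: str
    value: object = None


@dataclass
class TaskUseCase:
    """A task use case: the task environment plus the task objects."""
    environment: dict = field(default_factory=dict)
    objects: dict = field(default_factory=dict)


class LocalControlDevice:
    """Stand-in for the first or second local control device."""

    def __init__(self, name):
        self.name = name
        self.containers = set()

    def receive(self, desired_containers):
        # Communication module: accept the local management instruction
        # and pass it on to the container management module.
        self._manage_containers(set(desired_containers))

    def _manage_containers(self, desired):
        # Container management module (assumption): converge on the
        # desired set of containers.
        self.containers = desired


class ManagementDevice:
    """Stand-in for the management device."""

    def __init__(self, local_devices):
        self.local_devices = list(local_devices)
        self.use_case = TaskUseCase()

    def obtain(self, task_config, task_objects):
        # Communication module: obtain the task configuration and task objects.
        self.use_case.environment = dict(task_config.get("environment", {}))
        self.use_case.objects = dict(task_objects)

    def manage(self, instruction):
        # Task use case management module: update the task use case.
        if instruction.target == "environment":
            store = self.use_case.environment
        elif instruction.target == "object":
            store = self.use_case.objects
        else:
            raise ValueError(f"unknown target: {instruction.target}")
        if instruction.action == "set":
            store[instruction.name] = instruction.value
        elif instruction.action == "delete":
            store.pop(instruction.name, None)
        else:
            raise ValueError(f"unknown action: {instruction.action}")
        # Derive a local management instruction from the updated use case
        # and send it to every local control device.
        for device in self.local_devices:
            device.receive(self.use_case.objects.keys())


edge = LocalControlDevice("edge")
cloud = LocalControlDevice("cloud")
manager = ManagementDevice([edge, cloud])
manager.obtain({"environment": {"bandwidth_mbps": 10}}, {"model-a": "image-a"})
manager.manage(ManagementInstruction("object", "set", "model-b", "image-b"))
print(sorted(edge.containers))  # ['model-a', 'model-b']
```

In this sketch the task use case is the single source from which local instructions are derived, which mirrors the claim wording that the local management instruction "is determined according to the task use case"; how a real system would encode or transport those instructions is not specified here.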
PCT/CN2023/094843 2022-06-02 2023-05-17 Distributed collaborative ai task evaluation method, management apparatus, control apparatus and system WO2023231781A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210623375.8 2022-06-02
CN202210623375.8A CN117215884A (en) 2022-06-02 2022-06-02 Distributed collaborative AI task assessment method, management device, control device and system

Publications (1)

Publication Number Publication Date
WO2023231781A1 (en) 2023-12-07

Family

ID=89026937

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/094843 WO2023231781A1 (en) 2022-06-02 2023-05-17 Distributed collaborative ai task evaluation method, management apparatus, control apparatus and system

Country Status (2)

Country Link
CN (1) CN117215884A (en)
WO (1) WO2023231781A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066338A (en) * 2017-04-13 2017-08-18 中国人民解放军国防科学技术大学 The computing environment method of automatic configuration of distributed computing system
CN109787792A (en) * 2017-11-10 2019-05-21 阿里巴巴集团控股有限公司 A kind of system managing distributed service cluster
CN110389900A (en) * 2019-07-10 2019-10-29 深圳市腾讯计算机系统有限公司 A kind of distributed experiment & measurement system test method, device and storage medium
US20200257612A1 (en) * 2019-02-11 2020-08-13 Microstrategy Incorporated Validating software functionality
CN113961353A (en) * 2021-10-29 2022-01-21 深圳市慧鲤科技有限公司 Task processing method and distributed system for AI task

Also Published As

Publication number Publication date
CN117215884A (en) 2023-12-12

Similar Documents

Publication Publication Date Title
Lai et al. Fedscale: Benchmarking model and system performance of federated learning at scale
US11556746B1 (en) Fast annotation of samples for machine learning model development
US11172022B2 (en) Migrating cloud resources
Tao et al. Testing and quality validation for ai software–perspectives, issues, and practices
US11537506B1 (en) System for visually diagnosing machine learning models
CN106796526A (en) JSON Stylesheet Language Transformations
WO2015126411A1 (en) Migrating cloud resources
CN114424257A (en) Automatic rendering and extraction of form data using machine learning
US11720825B2 (en) Framework for multi-tenant data science experiments at-scale
CN113836754A (en) Multi-agent simulation modeling oriented simulation method, device, equipment and medium
CN112256537B (en) Model running state display method and device, computer equipment and storage medium
CN113407327A (en) Modeling task and data analysis method, device, electronic equipment and system
CN110717268B (en) Portable component unit packaging method based on FACE architecture
EP3887951A1 (en) Rule-based assignment of event-driven application
US20220358240A1 (en) Adaptive data privacy platform
CN112394982B (en) Method, device, medium and electronic equipment for generating voice recognition system
CN112559525B (en) Data checking system, method, device and server
CN116910567B (en) Online training sample construction method and related device for recommended service
Hine Emulating enterprise software environments
US20220012434A1 (en) Contextual diagram-text alignment through machine learning
US9117177B1 (en) Generating module stubs
WO2023231781A1 (en) Distributed collaborative ai task evaluation method, management apparatus, control apparatus and system
CN115688397A (en) Social platform business modeling method based on domain-driven design
US20220122038A1 (en) Process Version Control for Business Process Management
CN116204272A (en) Reproduction method, system and device for model training and related equipment

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23814966

Country of ref document: EP

Kind code of ref document: A1