WO2023206097A1 - AI model deployment method and apparatus, and electronic device and computer-readable medium - Google Patents


Info

Publication number
WO2023206097A1
WO2023206097A1 (PCT/CN2022/089379)
Authority
WO
WIPO (PCT)
Prior art keywords
graph
edge device
hardware configuration
node
nodes
Prior art date
Application number
PCT/CN2022/089379
Other languages
French (fr)
Chinese (zh)
Inventor
王海峰
张洪洋
Original Assignee
西门子股份公司
西门子(中国)有限公司
Priority date
Filing date
Publication date
Application filed by 西门子股份公司 and 西门子(中国)有限公司
Priority to PCT/CN2022/089379
Publication of WO2023206097A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning



Abstract

The embodiments of the present application relate mainly to the field of artificial intelligence (AI), and in particular to an AI model deployment method and apparatus, an electronic device, and a computer-readable medium. The method comprises: parsing an AI model file into a graph composed of nodes; performing a first grouping of the nodes in the graph so that each of the multiple subgraphs obtained by the grouping satisfies a hardware configuration condition of an edge device; running a simulation of the graph containing the first grouping information; and, when the hardware configuration condition required by the simulation run is less than or equal to the hardware configuration condition of the edge device, deploying the graph containing the first grouping information to the edge device, the edge device executing each of the multiple subgraphs in sequence.

Description

Deployment method, apparatus, electronic device and computer-readable medium for an AI model
Technical Field
The embodiments of the present application relate mainly to the field of artificial intelligence, and in particular to an AI model deployment method and apparatus, an electronic device, and a computer-readable medium.
Background
In recent years, the field of artificial intelligence has developed rapidly and AI models have emerged in great numbers. In the industrial field, however, large AI models often cannot run well on edge devices because of the devices' limited memory and computing power, which has long been a difficult problem in industrial applications. One current approach reduces the complexity of a large AI model through distillation or pruning, which requires modifying the model or performing additional training; this, however, may impair the model's accuracy and precision.
Summary of the Invention
Embodiments of the present application provide an AI model deployment method, apparatus, platform, and computer-readable medium that deploy a large AI model on an edge device without affecting the model's accuracy and precision.
In a first aspect, an AI model deployment method is provided, comprising: parsing an AI model file into a graph, wherein the graph is composed of nodes; performing a first grouping of the nodes in the graph so that each of the multiple subgraphs obtained by the grouping satisfies a hardware configuration condition of an edge device; running a simulation of the graph containing the first grouping information; and, when the hardware configuration condition required by the simulation run is less than or equal to the hardware configuration condition of the edge device, deploying the graph containing the first grouping information to the edge device, the edge device executing each of the multiple subgraphs in sequence.
In a second aspect, an AI model deployment apparatus is provided, comprising components for executing each step of the method provided in the first aspect.
In a third aspect, an electronic device is provided, comprising: at least one memory configured to store computer-readable code; and at least one processor configured to invoke the computer-readable code to execute each step of the method provided in the first aspect.
In a fourth aspect, a computer-readable medium is provided, on which computer-readable instructions are stored; when executed by a processor, the computer-readable instructions cause the processor to execute each step of the method provided in the first aspect.
Brief Description of the Drawings
The following drawings are intended only to illustrate and explain the embodiments of the present application schematically, and do not limit the scope of the embodiments. In the drawings:
Figure 1 is a flowchart of an AI model deployment method according to an embodiment of the present application;
Figure 2 is a schematic diagram of a method of grouping nodes in a graph according to an embodiment of the present application;
Figure 3 is a schematic diagram of an AI model deployment apparatus according to an embodiment of the present application;
Figure 4 is a schematic diagram of an electronic device according to an embodiment of the present application.
Reference Signs
100: AI model deployment method    101-104: method steps
30: AI model deployment apparatus    31: sending module    32: edge device emulator
33: graph splitter    34: graph parser    35: adjuster
400: electronic device    401: memory    402: processor
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and not to limit the scope of protection, applicability, or examples set forth in the claims. The functions and arrangement of the elements discussed may be changed without departing from the scope of the embodiments of the present application. Each example may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from that described, and individual steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "includes" and its variants denote open terms meaning "including but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first", "second", and so on may refer to different or to the same objects. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout this specification.
The embodiments of the present application are described in detail below with reference to the accompanying drawings.
Figure 1 is a flowchart of an AI model deployment method according to an embodiment of the present application. As shown in Figure 1, the AI model deployment method 100 includes:
Step 101: parse an AI model file into a graph, where the graph is composed of nodes.
Step 102: perform a first grouping of the nodes in the graph so that each of the multiple subgraphs obtained by the grouping satisfies a hardware configuration condition of the edge device.
Optionally, the hardware configuration condition of the edge device may be, for example, random access memory (RAM) space or hard disk space. Assuming that the hardware configuration condition selected for comparison is RAM space, the nodes in the graph are grouped for the first time so that the RAM space occupied by each of the resulting subgraphs is equal or close to the RAM space of the edge device.
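As a toy illustration of this RAM-driven grouping, the greedy packer below puts consecutive nodes into one subgraph until the next node would push the subgraph past the device's RAM budget, so each subgraph's footprint stays at or just under that budget. The per-node RAM figures and the purely sequential packing order are assumptions for illustration only; the embodiment described later traverses the graph by connectivity rather than in list order.

```python
def group_by_ram(nodes, edge_ram):
    """nodes: list of (name, ram_bytes) pairs in execution order.
    Returns subgraphs whose summed RAM stays at or under edge_ram."""
    subgraphs, current, used = [], [], 0
    for name, ram in nodes:
        if current and used + ram > edge_ram:  # next node would overflow
            subgraphs.append(current)          # close the current subgraph
            current, used = [], 0
        current.append((name, ram))
        used += ram
    if current:
        subgraphs.append(current)
    return subgraphs
```

For example, with a 100-unit budget, hypothetical nodes of 60, 50, and 30 units split into two subgraphs: one holding the 60-unit node, and one holding the 50- and 30-unit nodes.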
Step 103: run a simulation of the graph containing the first grouping information.
The RAM space occupied when a graph containing grouping information is actually run is difficult to estimate accurately: in addition to the AI model itself, represented by the graph with grouping information, the occupied RAM includes the cache of the associated optimizer and the storage of associated intermediate variables. Running a simulation therefore makes it possible to predict the result of the actual run in advance.
Step 104: when the hardware configuration condition required by the simulation run is less than or equal to the hardware configuration condition of the edge device, deploy the graph containing the first grouping information to the edge device, which then executes each of the multiple subgraphs in sequence.
In the embodiments of the present application, the graph corresponding to the AI model is grouped so that the hardware resources occupied by each resulting subgraph satisfy the relevant hardware configuration condition of the edge device; the grouping is then tested by a simulation run, and the graph with grouping information that meets the requirement is deployed to the edge device. The embodiments of the present application thus make it possible to deploy a large AI model on an edge device without affecting the model's accuracy and precision, greatly broadening the application scenarios of edge devices.
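Steps 101 to 104 can be read as a linear pipeline. The sketch below is a minimal illustration of that control flow, not the patent's implementation; the `parse`, `group`, `simulate`, and `deploy` callables are hypothetical stand-ins for the components described here.

```python
def deploy_ai_model(model_file, edge_ram, *, parse, group, simulate, deploy):
    """Method 100: parse, group, simulate, and deploy if the budget holds."""
    graph = parse(model_file)           # step 101: model file -> node graph
    subgraphs = group(graph, edge_ram)  # step 102: subgraphs sized to the device
    needed = simulate(subgraphs)        # step 103: simulated peak RAM demand
    if needed <= edge_ram:              # step 104: deploy only if it fits
        deploy(subgraphs)
        return "deployed"
    return "needs regrouping"
```

A "needs regrouping" result corresponds to the case the embodiments handle by regrouping the graph under a reduced hardware budget.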
Optionally, when the hardware configuration condition required by the simulation run is greater than the hardware configuration condition of the edge device, the following is executed in a loop: reduce the hardware configuration condition corresponding to the graph containing the previous grouping information, regroup the nodes in the graph according to the reduced condition, and simulate the graph containing the current grouping information; the loop repeats until the hardware configuration condition required by the running graph containing the current grouping information is less than or equal to the hardware configuration condition of the edge device, at which point the graph containing the current grouping information is deployed to the edge device.
In one embodiment, when the RAM space required by the simulation run is greater than the RAM space of the edge device, the following is executed in a loop: the RAM space corresponding to the graph containing the previous grouping information is reduced by a preset ratio, for example to 90% of that RAM space; the nodes in the graph are regrouped according to the reduced RAM space; and the graph containing the current grouping information is simulated. The loop repeats until the RAM space required by the running graph containing the current grouping information is less than or equal to the RAM space of the edge device, at which point the graph containing the current grouping information is deployed to the edge device.
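The 90% back-off can be written as a short loop around hypothetical `group_nodes` and `simulate_ram` helpers. This is an illustrative sketch of the described behaviour, not the patent's code; the 0.9 factor is the example ratio from the text, and `max_rounds` is an added safety bound.

```python
def deploy_with_backoff(graph, edge_ram, group_nodes, simulate_ram,
                        ratio=0.9, max_rounds=20):
    """Shrink the per-subgraph RAM budget until the simulated run fits."""
    budget = edge_ram
    for _ in range(max_rounds):
        subgraphs = group_nodes(graph, budget)   # regroup under current budget
        if simulate_ram(subgraphs) <= edge_ram:  # simulated run fits the device
            return subgraphs                     # this grouping can be deployed
        budget *= ratio                          # cut the budget to 90%, retry
    raise RuntimeError("no grouping fit the edge device within max_rounds")
```

Because the simulated run also accounts for optimizer caches and intermediate variables, a grouping that exactly fills the device's RAM can fail simulation; lowering the grouping budget below the physical RAM is what makes room for that overhead.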
Optionally, when the hardware configuration condition required by the simulation run is greater than that of the edge device, the hardware configuration condition used to run the graph containing the first grouping information is reduced by a preset degree to obtain the hardware configuration condition after a first reduction. Using this first reduced condition, the nodes in the graph are grouped a second time, and the graph containing the second grouping information is run. When the hardware configuration condition required by this run is less than or equal to that of the edge device, the graph containing the second grouping information is deployed to the edge device; when it is greater, the process continues by analogy until the hardware configuration condition required by the running graph of the current grouping information is less than or equal to that of the edge device, at which point the graph of the current grouping information is deployed to the edge device.
The embodiments of the present application provide an end-to-end, closed-loop edge deployment method for large AI models: whenever the result of the simulation run does not meet the requirement, the graph is regrouped to satisfy the hardware configuration condition set for the current round, until the simulation result meets the requirement and the actual deployment is completed. The embodiments require no modification of the AI model and no additional training, which not only improves deployment efficiency but also saves human resources.
In one embodiment, a method for grouping the nodes in a graph is provided: starting from at least one input node of the graph, the nodes in the graph are traversed according to preset rules and thereby grouped. The preset rules include a graph-connectivity rule and a rule that gives priority of access to nodes all of whose parent nodes have been visited and to nodes that have no parent node. The graph-connectivity rule means that the way the nodes are traversed is determined by whether any two nodes are connected by an edge. Optionally, when the rule giving priority to nodes all of whose parents have been visited conflicts with the rule giving priority to nodes without parents, the former takes precedence.
In one embodiment, Figure 2 is a schematic diagram of a method of grouping nodes in a graph according to an embodiment of the present application. As shown in Figure 2, at least one input node of the graph, for example node 7 and node 1, is added in turn to a first preset array, where the first preset array stores all nodes waiting to be visited and follows the last-in, first-out (LIFO) principle. An input node is a node of the graph that has no parent node. The LIFO principle means that a node added to the array later is taken out and visited before a node added earlier. A first input node, for example node 1, is selected from the first preset array as the first node to be visited, the first input node being the last node added to the first preset array. According to node connectivity, visiting starts from the first node to be visited, and the visited nodes are determined to be nodes of the first subgraph. Then the child nodes directly connected to the first node to be visited are added to the first preset array; for example, the child nodes directly connected to node 1, namely node 2, node 3, and node 8, are added to the first preset array.
When all parent nodes of a second node selected from the first preset array have been visited, or the second node has no parent node, for example node 2 or node 3 in Figure 2, the second node is determined to be a node of the first subgraph, and all child nodes of the second node are added to the first preset array.
When a third node selected from the first preset array has a parent node that has not yet been visited, for example node 8 in Figure 2, the third node is added to a second preset array. When all parent nodes of the third node have been visited, the third node is added to the first preset array. Optionally, in each round in which a new node is selected as the node to be visited, the nodes in the second preset array are checked for whether they satisfy the condition for being added to the first preset array, that is, whether all of their parent nodes have been visited; if the condition is satisfied, the node is removed from the second preset array and added to the first preset array.
Optionally, the step of adding the child nodes directly connected to node 1 to the first preset array takes precedence over the step of taking node 7 out of the first preset array; that is, when only one node remains to be taken out of the first preset array, the relevant nodes waiting to be added are first added to the first preset array.
By grouping the nodes in the graph in this way, multiple subgraphs are determined in the graph. The grouping method provided by the embodiments of the present application involves no additional modification of the AI model and therefore does not affect the model's accuracy and precision.
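The traversal just described, with a LIFO "first preset array" of visit-ready nodes and a "second preset array" holding nodes whose parents are not yet all visited, can be sketched as follows. This is one reading of the stated rules rather than the patent's code, and the example graph in the usage note only loosely mirrors Figure 2.

```python
def traverse_nodes(graph, inputs):
    """graph: {node: [child, ...]}; inputs: parentless nodes, added in order.
    Returns the visit order produced by the two-array rules."""
    parents = {n: set() for n in graph}
    for parent, children in graph.items():
        for child in children:
            parents[child].add(parent)
    first, second = list(inputs), []   # first array is LIFO; second holds waiters
    visited, order = set(), []
    while first or second:
        # promote any held node whose parents have now all been visited
        for node in [n for n in second if parents[n] <= visited]:
            second.remove(node)
            first.append(node)
        if not first:
            break                      # remaining nodes unreachable from inputs
        node = first.pop()             # LIFO: last added is visited first
        if node in visited:
            continue
        if parents[node] <= visited:   # all parents visited (or none exist)
            visited.add(node)
            order.append(node)
            first.extend(graph[node])  # push the directly connected children
        else:
            second.append(node)        # hold until its parents are visited
    return order
```

For a graph in which node 1 feeds nodes 2, 3, and 8 and node 7 also feeds node 8, this traversal visits node 1 first and defers node 8 until node 7 has been visited.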
Figure 3 is a schematic diagram of an AI model deployment apparatus 30 according to an embodiment of the present application. As shown in Figure 3, the AI model deployment apparatus 30 includes:
a sending module 31, configured to send the hardware configuration condition of the edge device to an edge device emulator 32 and a graph splitter 33 respectively;
a graph parser 34, configured to parse an AI model file into a graph, where the graph is composed of nodes;
the graph splitter 33, configured to perform a first grouping of the nodes in the graph so that each of the multiple subgraphs obtained by the grouping satisfies the hardware configuration condition of the edge device; and
the edge device emulator 32, configured to run the graph containing the first grouping information and, when the hardware configuration condition required by the run is less than or equal to the hardware configuration condition of the edge device, to deploy the graph containing the first grouping information to the edge device so that the edge device executes each of the multiple subgraphs in sequence.
Optionally, when the hardware configuration condition required by the run of the edge device emulator 32 is greater than the hardware configuration condition of the edge device, the following loop is entered: the edge device emulator 32 sends a simulation result indicating failure, together with the hardware configuration condition corresponding to the graph containing the previous grouping information, to an adjuster 35; the adjuster 35 reduces that hardware configuration condition; the graph splitter 33 regroups the nodes in the graph according to the reduced condition; and the edge device emulator 32 runs the graph containing the current grouping information. The loop repeats until the hardware configuration condition required by the running graph containing the current grouping information is less than or equal to that of the edge device, at which point the graph containing the current grouping information is deployed to the edge device so that the edge device executes each of the multiple subgraphs in sequence.
Optionally, the sending module 31 is further configured to collect the hardware configuration condition of the edge device.
Under these embodiments, a large AI model can be deployed on an edge device without affecting the model's accuracy and precision, greatly broadening the application scenarios of edge devices.
An embodiment of the present application further provides an electronic device 400. Figure 4 is a schematic diagram of the electronic device 400 according to an embodiment of the present application. As shown in Figure 4, the electronic device 400 includes a processor 402 and a memory 401; instructions are stored in the memory 401, and when the instructions are executed by the processor 402 the method 100 described above is implemented.
The at least one processor 402 may include a microprocessor, an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), a state machine, and so on. Examples of computer-readable media include, but are not limited to, floppy disks, CD-ROMs, magnetic disks, memory chips, ROM, RAM, ASICs, configured processors, all optical media, all magnetic tapes or other magnetic media, or any other medium from which a computer processor can read instructions. In addition, various other forms of computer-readable media can send or carry instructions to a computer, including routers, private or public networks, or other wired and wireless transmission devices or channels. The instructions may include code in any computer programming language, including C, C++, Visual Basic, Java, and JavaScript.
In addition, an embodiment of the present application also provides a computer-readable medium on which computer-readable instructions are stored; when executed by a processor, the computer-readable instructions cause the processor to execute the aforementioned AI model deployment method. Examples of computer-readable media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, and DVD+RW), magnetic tape, non-volatile memory cards, and ROM. Optionally, the computer-readable instructions may be downloaded over a communication network from a server computer or from the cloud.
It should be noted that not all of the steps and modules in the above flows and system structure diagrams are necessary; some steps or modules may be omitted according to actual needs. The order in which the steps are executed is not fixed and may be adjusted as needed. The system structures described in the above embodiments may be physical structures or logical structures; that is, some modules may be implemented by the same physical entity, some modules may be implemented separately by multiple physical entities, or some modules may be implemented jointly by certain components in multiple independent devices.

Claims (11)

  1. A method for deploying an AI model, comprising:
    - parsing (101) an AI model file into a graph, wherein the graph is composed of nodes;
    - performing a first grouping (102) of the nodes in the graph, so that each of the multiple subgraphs obtained by the grouping satisfies a hardware configuration condition of an edge device;
    - simulating a run (103) of the graph containing the first grouping information;
    - when the hardware configuration condition required by the simulated run is less than or equal to the hardware configuration condition of the edge device, deploying (104) the graph containing the first grouping information to the edge device, the edge device executing each of the multiple subgraphs in sequence.
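The flow of claim 1 (parse → group → simulate → deploy) can be illustrated with a short sketch. Everything here is a hypothetical simplification: the function and parameter names (`deploy_model`, `node_ram`, `device_ram`), the greedy sequential grouping, and the "simulation" modeled as the largest subgraph's RAM footprint are assumptions for illustration, not the patent's actual implementation.

```python
def deploy_model(model_nodes, node_ram, device_ram):
    """Group graph nodes into subgraphs that each fit the edge device's RAM,
    then check whether the simulated peak requirement fits the device."""
    subgraphs, current, used = [], [], 0
    for node in model_nodes:                 # step 102: first grouping
        if used + node_ram[node] > device_ram and current:
            subgraphs.append(current)        # close the subgraph at the budget
            current, used = [], 0
        current.append(node)
        used += node_ram[node]
    if current:
        subgraphs.append(current)
    # step 103: simulate — peak RAM is the largest subgraph's footprint
    peak = max(sum(node_ram[n] for n in sg) for sg in subgraphs)
    deployable = peak <= device_ram          # step 104: deploy only if it fits
    return subgraphs, peak, deployable
```

With nodes needing 3, 4, and 2 units of RAM on a 5-unit device, the grouping yields three single-node subgraphs and a peak requirement of 4, so deployment proceeds.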
  2. The method according to claim 1, wherein, after the run (104) of the graph containing the grouping information, the method further comprises:
    - when the hardware configuration condition required by the simulated run is greater than the hardware configuration condition of the edge device, cyclically performing the following: lowering the hardware configuration condition corresponding to the graph containing the previous grouping information, performing a next grouping of the nodes in the graph according to the lowered hardware configuration condition, and simulating a run of the graph containing the current grouping information, until the hardware configuration condition required by the run of the graph containing the current grouping information is less than or equal to the hardware configuration condition of the edge device, and then deploying the graph containing the current grouping information to the edge device.
  3. The method according to claim 1, wherein the hardware configuration condition of the edge device includes RAM space or hard disk space.
  4. The method according to claim 1, wherein making each of the multiple subgraphs obtained by the grouping satisfy the hardware configuration condition of the edge device includes:
    - making the RAM space occupied by each of the multiple subgraphs obtained by the grouping equal or close to the RAM space of the edge device.
  5. The method according to claim 2, wherein, after the run (104) of the graph containing the grouping information, the method further comprises:
    - when the RAM space required by the simulated run is greater than the RAM space of the edge device, cyclically performing the following: lowering the RAM space corresponding to the graph containing the previous grouping information by a preset ratio, performing a next grouping of the nodes in the graph according to the lowered RAM space, and simulating a run of the graph containing the current grouping information, until the RAM space required by the run of the graph containing the current grouping information is less than or equal to the RAM space of the edge device, and then deploying the graph containing the current grouping information to the edge device.
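The retry loop of claims 2 and 5 can be sketched as follows. This is a minimal, hypothetical model: `group` is a greedy sequential grouping under a RAM budget, `simulate` stands in for the simulated run (modeled here as the largest subgraph's footprint plus a fixed runtime overhead), and the default shrink `ratio` of 0.8 is an arbitrary illustrative value, not a figure from the patent.

```python
def group(nodes, ram, budget):
    """Greedy sequential grouping under a RAM budget (simplified)."""
    sgs, cur, used = [], [], 0
    for n in nodes:
        if cur and used + ram[n] > budget:
            sgs, cur, used = sgs + [cur], [], 0
        cur.append(n)
        used += ram[n]
    return sgs + [cur] if cur else sgs

def simulate(sgs, ram, overhead=1.0):
    """Stand-in for the simulated run: largest subgraph plus runtime overhead."""
    return max(sum(ram[n] for n in sg) for sg in sgs) + overhead

def iterative_deploy(nodes, ram, device_ram, ratio=0.8, max_rounds=20):
    """Claims 2/5: if the simulated run exceeds the device's RAM, shrink the
    grouping budget by a preset ratio and regroup, until the graph fits."""
    budget = device_ram
    for _ in range(max_rounds):
        sgs = group(nodes, ram, budget)
        if simulate(sgs, ram) <= device_ram:
            return sgs                 # fits — deploy this grouping
        budget *= ratio                # lower the configuration condition
    raise RuntimeError("could not fit model on edge device")
```

For four 2-unit nodes on a 4-unit device, the first grouping (two subgraphs of 4 units each) fails the simulated check once overhead is added, so the budget is reduced and the second round produces four single-node subgraphs that fit.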
  6. The method according to claim 1 or 2, wherein grouping the nodes in the graph includes:
    - starting from at least one input node in the graph, traversing the nodes in the graph according to preset rules and grouping the nodes in the graph; wherein the preset rules include: a graph-connectivity rule, and a rule that related nodes all of whose parent nodes have been visited, and related nodes having no parent node, are visited first.
  7. The method according to claim 1 or 2, wherein grouping the nodes in the graph includes:
    - adding at least one input node of the graph to a first preset array, wherein the first preset array is used to store all nodes to be visited and follows the last-in-first-out principle;
    - selecting, from the first preset array, a first input node as a first node to be visited, wherein the first input node is the last node added to the first preset array;
    - according to node connectivity, starting the visit from the first node to be visited, a visited node being determined as a node of a first subgraph;
    - adding the child nodes directly connected to the first node to be visited to the first preset array;
    - when all parent nodes of a second node selected from the first preset array have been visited, or the second node has no parent node, determining the second node as a node of the first subgraph, and adding all child nodes of the second node to the first preset array;
    - when a third node selected from the first preset array has an unvisited parent node, adding the third node to a second preset array; when all parent nodes of the third node have been visited, adding the third node to the first preset array;
    - grouping the nodes in the graph, thereby determining multiple subgraphs in the graph.
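The traversal of claim 7 can be sketched as below: a LIFO stack (the "first preset array") holds nodes awaiting a visit, and a node joins the subgraph only once all of its parents have been visited; otherwise it is parked in a second array until they have. The function and variable names are illustrative assumptions, and the sketch returns a single visit order — the cutting of that order into RAM-bounded subgraphs (claims 1 and 4) is omitted for brevity.

```python
def traverse_group(inputs, children, parents):
    """Claim-7-style traversal: LIFO stack of nodes to visit; defer any node
    that still has an unvisited parent to a second array."""
    stack = list(inputs)      # first preset array (last in, first out)
    deferred = []             # second preset array
    visited, order = set(), []
    while stack:
        node = stack.pop()    # the last node added is visited first
        if node in visited:
            continue
        if all(p in visited for p in parents.get(node, [])):
            visited.add(node)
            order.append(node)
            stack.extend(children.get(node, []))
            # re-queue parked nodes whose parents are now all visited
            ready = [n for n in deferred
                     if all(p in visited for p in parents.get(n, []))]
            for n in ready:
                deferred.remove(n)
                stack.append(n)
        else:
            deferred.append(node)
    return order
```

On a diamond graph a→{b, c}→d, node d is popped before b has been visited, so it is deferred and only rejoins the stack after both of its parents are done — yielding the order a, c, b, d.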
  8. An apparatus for deploying an AI model, comprising:
    - a sending module (31), configured to send the hardware configuration condition of an edge device to an edge device emulator (32) and a graph splitter (33), respectively;
    - a graph parser (34), configured to parse an AI model file into a graph, wherein the graph is composed of nodes;
    - the graph splitter (33), configured to perform a first grouping of the nodes in the graph, so that each of the multiple subgraphs obtained by the grouping satisfies the hardware configuration condition of the edge device;
    - the edge device emulator (32), configured to run the graph containing the first grouping information and, when the hardware configuration condition required by the run is less than or equal to the hardware configuration condition of the edge device, deploy the graph containing the first grouping information to the edge device, so that the edge device executes each of the multiple subgraphs in sequence.
  9. The apparatus according to claim 8, wherein:
    - when the hardware configuration condition required for the run by the edge device emulator (32) is greater than the hardware configuration condition of the edge device, a loop is entered:
    - an adjuster (35) lowers the hardware configuration condition corresponding to the graph containing the previous grouping information, the graph splitter (33) performs a next grouping of the nodes in the graph according to the lowered hardware configuration condition, and the edge device emulator (32) runs the graph containing the current grouping information,
    - until the hardware configuration condition required by the run of the graph containing the current grouping information is less than or equal to the hardware configuration condition of the edge device, whereupon the graph containing the current grouping information is deployed to the edge device.
  10. An electronic device, comprising:
    at least one memory (401), configured to store computer-readable code;
    at least one processor (402), configured to invoke the computer-readable code to perform the steps of the method according to any one of claims 1 to 7.
  11. A computer-readable medium having computer-readable instructions stored thereon, the computer-readable instructions, when executed by a processor, causing the processor to perform the steps of the method according to any one of claims 1 to 7.
PCT/CN2022/089379 2022-04-26 2022-04-26 Ai model deployment method and apparatus, and electronic device and computer-readable medium WO2023206097A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/089379 WO2023206097A1 (en) 2022-04-26 2022-04-26 Ai model deployment method and apparatus, and electronic device and computer-readable medium


Publications (1)

Publication Number Publication Date
WO2023206097A1 true WO2023206097A1 (en) 2023-11-02

Family

ID=88516541

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089379 WO2023206097A1 (en) 2022-04-26 2022-04-26 Ai model deployment method and apparatus, and electronic device and computer-readable medium

Country Status (1)

Country Link
WO (1) WO2023206097A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614238A (en) * 2018-12-11 2019-04-12 深圳市网心科技有限公司 A kind of recongnition of objects method, apparatus, system and readable storage medium storing program for executing
CN111240209A (en) * 2020-03-16 2020-06-05 广东工业大学 Adaptive configuration method and system for configuration dynamic control type optimal linkage response
US20210382754A1 (en) * 2021-06-12 2021-12-09 Intel Corporation Serverless computing architecture for artificial intelligence workloads on edge for dynamic reconfiguration of workloads and enhanced resource utilization
US20210390460A1 (en) * 2021-06-12 2021-12-16 Intel Corporation Compute and memory based artificial intelligence model partitioning using intermediate representation
CN113849314A (en) * 2021-09-30 2021-12-28 支付宝(杭州)信息技术有限公司 Data processing model deployment method and device


Similar Documents

Publication Publication Date Title
EP4369180A2 (en) Callpath finder
EP3032425A1 (en) Integrated automated test case generation for safety-critical software
US20210365253A1 (en) Heterogeneity-agnostic and topology-agnostic data plane programming
CN111158656B (en) Test code generation method and device based on fruit tree method
WO2014093719A2 (en) Method, apparatus, and computer-readable medium for optimized data subsetting
EP1548581A2 (en) Methods, apparatus and programs for system development
CN110109816A (en) Test cases selection method and apparatus
CN111767217B (en) JS unit test case generation method and device
WO2023040372A1 (en) Ai modeling process choreography method and system based on graph algorithm
CN109710224A (en) Page processing method, device, equipment and storage medium
CN111355696A (en) Message identification method and device, DPI (deep packet inspection) equipment and storage medium
WO2023206097A1 (en) Ai model deployment method and apparatus, and electronic device and computer-readable medium
US8417489B2 (en) Duration estimation of repeated directed graph traversal
US9679092B1 (en) Constraint handling for parameterizable hardware description language
Celik et al. S-IDE: A tool framework for optimizing deployment architecture of High Level Architecture based simulation systems
US20220197881A1 (en) Multipath verification of data transforms in a system of systems
US9111032B2 (en) Identification of performance bottlenecks
KR102006212B1 (en) Method and apparatus for generating python script used in first simulator by converting xml script used in second simulator
JP2009265996A (en) Inspection device, verification method and verification program
US20190235865A1 (en) Solving constraint satisfaction problems comprising vectors of unknown size
JP2014228974A (en) Analysis method, analyzer and analysis program
US11635945B2 (en) Mobile application development device
US9626458B2 (en) Evaluation model generation device, evaluation model generation method, and evaluation model generation program
US11748075B2 (en) Two-phase application development device
KR102006211B1 (en) Method and apparatus for generating xml script used in first simulator by converting python script used in second simulator

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22938953

Country of ref document: EP

Kind code of ref document: A1