WO2021134231A1 - Computing resource allocation method and apparatus based on inference engine, and computer device - Google Patents


Info

Publication number
WO2021134231A1
Authority
WO
WIPO (PCT)
Prior art keywords
operation layer
computing resources
layer group
layers
neural network
Application number
PCT/CN2019/129973
Other languages
French (fr)
Chinese (zh)
Inventor
庄奇
Original Assignee
深圳元戎启行科技有限公司
Application filed by 深圳元戎启行科技有限公司
Priority to PCT/CN2019/129973
Priority to CN201980037488.6A
Publication of WO2021134231A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Definitions

  • This application relates to a method, device, computer equipment and storage medium for allocating computing resources based on an inference engine.
  • As a research direction in the field of artificial intelligence, deep learning technology has been applied in many areas such as speech recognition, image recognition, and natural language processing. With the development of deep learning technology, the scale of the underlying neural network models has grown larger and larger: the models contain more operation layers, and the links between operation layers have become correspondingly more complicated.
  • An inference engine can use the neural network model together with the hardware computing resources of the computing platform (hereinafter referred to as computing resources) to realize the inference function. If the load of the computing resources is poorly balanced, some computing resources may be overused while others sit idle, which greatly affects the computing efficiency of the inference process. Therefore, for large-scale neural network models, how to effectively improve the computational efficiency of the inference process by improving the load balance of computing resources has become a technical problem that needs to be solved.
  • According to various embodiments disclosed in this application, a method, apparatus, computer device, and storage medium for allocating computing resources based on an inference engine are provided.
  • A method for allocating computing resources based on an inference engine includes:
  • obtaining computing resources of a computing platform;
  • calling a neural network model, the neural network model including multiple operation layers;
  • identifying the dependency relationships between the multiple operation layers through an inference engine, and mapping each operation layer to a corresponding computing resource; and
  • performing the inference process using the neural network model through the inference engine according to the dependency relationships and the mapped computing resources.
  • An apparatus for allocating computing resources based on an inference engine includes:
  • a resource acquisition module, used to obtain computing resources of a computing platform;
  • a model calling module, used to call a neural network model, the neural network model including multiple operation layers;
  • a relationship recognition module, used to identify the dependency relationships between the multiple operation layers through the inference engine;
  • a resource mapping module, used to map each operation layer to a corresponding computing resource; and
  • an inference execution module, used to perform the inference process using the neural network model through the inference engine according to the dependency relationships and the mapped computing resources.
  • A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
  • obtaining computing resources of a computing platform;
  • calling a neural network model, the neural network model including multiple operation layers;
  • identifying the dependency relationships between the multiple operation layers through the inference engine, and mapping each operation layer to a corresponding computing resource; and
  • performing the inference process using the neural network model through the inference engine according to the dependency relationships and the mapped computing resources.
  • One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • obtaining computing resources of a computing platform;
  • calling a neural network model, the neural network model including multiple operation layers;
  • identifying the dependency relationships between the multiple operation layers through the inference engine, and mapping each operation layer to a corresponding computing resource; and
  • performing the inference process using the neural network model through the inference engine according to the dependency relationships and the mapped computing resources.
  • Fig. 1 is an application scenario diagram of a computing resource allocation method based on an inference engine according to one or more embodiments.
  • Fig. 2 is a schematic flowchart of a method for allocating computing resources based on an inference engine according to one or more embodiments.
  • Fig. 3 is a schematic diagram of the dependency relationships and naming between the operation layers in a neural network model according to one or more embodiments.
  • Fig. 4 is a schematic flowchart of the steps of mapping each operation layer to a corresponding computing resource in an embodiment.
  • Fig. 5 is a schematic diagram of mapping the operation layers of a neural network model to computing resources according to one or more embodiments.
  • Fig. 6 is a block diagram of an apparatus for allocating computing resources based on an inference engine according to one or more embodiments.
  • Fig. 7 is a block diagram of a computer device according to one or more embodiments.
  • In one embodiment, the method for allocating computing resources based on an inference engine provided in this application can be applied in the field of autonomous driving, where the neural network model can include at least one of an image recognition model, a behavior prediction model, or a risk assessment model.
  • For example, the neural network model may be an image recognition model, and the method provided in this application may be applied in the application environment shown in Fig. 1.
  • The autonomous vehicle may include a sensor 102 and a computer device 104, and the sensor 102 can communicate with the computer device 104.
  • The sensor 102 can collect images of the environment within its visual range. For example, when the autonomous vehicle drives up to an intersection, the sensor 102 can collect traffic signal light images.
  • The computer device 104 performs image recognition on the signal light image collected by the sensor 102 and judges the color of the signal light in the image. Specifically, the computer device 104 can obtain multiple computing resources and call a neural network model.
  • The neural network model includes multiple operation layers.
  • Through the inference engine, the computer device 104 identifies the dependency relationships between the multiple operation layers and maps each operation layer to a corresponding computing resource.
  • According to the dependency relationships and the mapped computing resources, the computer device 104 performs the inference process using the neural network model through the inference engine to obtain the color of the signal light in the image.
  • It can be understood that the inference method provided in this application implements inference with a neural network model and can be applied in a variety of application environments, and the neural network model can be of multiple types.
  • For example, the neural network model may include a convolutional neural network model, a recurrent neural network model, and a recursive neural network model.
  • The neural network model can be used to process a variety of different data.
  • For example, the neural network model may include an image recognition model, a feature extraction model, a speech recognition model, a text recognition model, and a scene classification model.
  • In one embodiment, a method for allocating computing resources based on an inference engine is provided. Taking the method as applied to the computer device in Fig. 1 as an example, it includes the following steps:
  • Step 202: Obtain computing resources of the computing platform.
  • The computing platform may be a platform used by a computer device to perform automatic control operations.
  • The computer device may be a stand-alone device, such as a vehicle-mounted computer device.
  • The computing platform has corresponding computing resources. The computing resources include multiple microprocessors, and each microprocessor includes multiple computation streams.
  • The computer device may read the computing resources corresponding to a preset condition; the computing resources read in this way may be only part of the computing resources of the computing platform.
  • The computer device may also read all the computing resources of the computing platform.
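
As a minimal sketch of this resource model (the structure, names, and counts below are illustrative assumptions, not part of the patent), the platform's resources can be represented as microprocessors that each expose several computation streams:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComputeStream:
    gpu_id: int
    stream_id: int

@dataclass
class Microprocessor:
    gpu_id: int
    streams: List[ComputeStream] = field(default_factory=list)

def read_computing_resources(num_gpus: int = 1, streams_per_gpu: int = 4) -> List[Microprocessor]:
    """Read part (or all) of the platform's resources; the counts stand in
    for the preset reading condition mentioned above."""
    return [
        Microprocessor(g, [ComputeStream(g, s) for s in range(streams_per_gpu)])
        for g in range(num_gpus)
    ]
```
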
  • Step 204: Call the neural network model; the neural network model includes multiple operation layers.
  • The neural network model is pre-stored in the computer device, and it can be pre-trained.
  • The neural network model can be used by the inference engine to implement the inference process on the computing platform.
  • The neural network model includes multiple operation layers.
  • When facing different business requirements, the inference engine can use different neural network models to realize the corresponding inference processes. For example, when performing image recognition, a neural network model related to image recognition can be used for the inference computation; when performing natural language processing, a neural network model related to natural language processing can be used.
  • Step 206: Identify the dependency relationships between the multiple operation layers through the inference engine, and map each operation layer to a corresponding computing resource.
  • The multiple operation layers of the neural network model have dependency relationships between them. A dependency means that the input of one operation layer depends on the output of other operation layers.
  • The two operation layers in a dependency relationship can be called the dependent layer and the depended-on layer, respectively.
  • The output of the depended-on layer forms the input of the dependent layer.
  • The current dependent layer can in turn be the depended-on layer of other operation layers.
  • The depended-on layers corresponding to a dependent layer can be one layer or multiple layers.
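
A minimal sketch of this structure (layer names are hypothetical; they mirror the Fig. 3 example discussed below):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class OperationLayer:
    name: str
    inputs: List[str] = field(default_factory=list)  # names of the depended-on layers

def depended_on_layers(layer: OperationLayer, model: Dict[str, OperationLayer]) -> List[OperationLayer]:
    """The layers whose outputs form this layer's input (zero, one, or several)."""
    return [model[n] for n in layer.inputs]

# In Fig. 3, operation layer 6 depends on operation layers 2 and 5.
model = {
    "layer2": OperationLayer("layer2"),
    "layer5": OperationLayer("layer5"),
    "layer6": OperationLayer("layer6", inputs=["layer2", "layer5"]),
}
assert [l.name for l in depended_on_layers(model["layer6"], model)] == ["layer2", "layer5"]
```
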
  • An inference engine is installed in the computer device.
  • Through the inference engine, the dependency relationship between any two operation layers in the neural network can be identified, and each operation layer can be mapped to its corresponding computing resource.
  • After identifying the dependency relationships between the operation layers, the computer device can also use the inference engine to divide the operation layers with dependency relationships into corresponding operation layer groups and map each operation layer group to a corresponding computing resource.
  • The dependency relationships between the operation layers can be pre-configured, or they can be obtained by searching and analyzing the operation layers.
  • Step 208: Perform the inference process using the neural network model through the inference engine, according to the dependency relationships and the mapped computing resources.
  • The computer device assigns corresponding computing resources to the operation layers of the neural network model, so that each operation layer is mapped to a corresponding microprocessor, or to a computation stream of the corresponding microprocessor.
  • Operation layers with dependency relationships can be mapped to the same microprocessor.
  • Operation layers without dependency relationships can be mapped to different microprocessors. In this way, the computing resources for the multiple operation layers of the neural network model are allocated reasonably and effectively.
  • The inference engine can then perform the inference process using the neural network model and the computing resources allocated to each operation layer.
  • In this embodiment, by identifying the dependency relationships between the multiple operation layers in the neural network model, each operation layer is mapped to a corresponding computing resource of the computing platform. Computing resources are thereby allocated reasonably, effectively improving the load balance between them, so that when the inference engine performs the inference process using the neural network model and the allocated computing resources, the computational efficiency of the inference process is effectively improved.
  • In one embodiment, the computer device may obtain a configuration file corresponding to the neural network model and read from it the dependency relationships between the multiple operation layers of the model.
  • The configuration file may be configured by the user in advance according to the structure of the neural network model.
  • The configuration file can record the next operation layer corresponding to each operation layer, that is, the dependency relationships between the operation layers.
  • The configuration file can also record the name of each operation layer.
  • The computer device can map each operation layer to its corresponding computing resource according to the operation layer's name, as in the sketch below.
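
A hypothetical configuration of this kind, written as a Python literal (the schema and layer names are illustrative assumptions; the resource encoding follows the Fig. 3 layout described below):

```python
# Each entry records a layer's name (which encodes its target GPU and stream)
# and its next layer(s), i.e. the dependency links between operation layers.
model_config = {
    "layers": [
        {"name": "layer0_gpu0_stream0", "next": ["layer1_gpu0_stream0"]},
        {"name": "layer1_gpu0_stream0", "next": ["layer2_gpu0_stream0"]},
        {"name": "layer2_gpu0_stream0", "next": ["layer6_gpu0_stream2"]},
        {"name": "layer3_gpu0_stream1", "next": ["layer4_gpu0_stream1"]},
        {"name": "layer4_gpu0_stream1", "next": ["layer5_gpu0_stream1"]},
        {"name": "layer5_gpu0_stream1", "next": ["layer6_gpu0_stream2"]},
        {"name": "layer6_gpu0_stream2", "next": []},
    ]
}
```
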
  • In one embodiment, mapping each operation layer to the corresponding computing resource includes: obtaining the name corresponding to each operation layer; parsing the name according to a preset naming format to identify the computing resource that has a mapping relationship with each operation layer; and allocating each operation layer to the computing resource with the mapping relationship.
  • In this embodiment, when creating the neural network model, the dependency relationships between operation layers can be defined according to the structure of the model. A corresponding computing resource can also be specified for each operation layer through the layer's name. The names of the operation layers can be recorded in the corresponding configuration file.
  • When the inference engine needs to perform the inference process, it can read the dependency relationships between the operation layers and the name of each operation layer from the configuration file.
  • The inference engine can obtain the preset naming format, parse the name of each operation layer according to that format, identify the computing resource that has a mapping relationship with each operation layer, and allocate each operation layer to that computing resource.
  • The computing resources include microprocessors, or microprocessors together with the computation streams within the microprocessors.
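
The patent does not disclose the naming format itself, so the format below is an assumption for illustration; the sketch parses a name of the form `<layer>_gpu<g>_stream<s>` into the resource it designates:

```python
import re

# Assumed naming format (hypothetical): "<layer>_gpu<g>_stream<s>".
NAME_PATTERN = re.compile(r"^(?P<layer>\w+?)_gpu(?P<gpu>\d+)_stream(?P<stream>\d+)$")

def resolve_mapping(layer_name: str) -> tuple:
    """Parse a layer name and return the (gpu, stream) it is mapped to."""
    m = NAME_PATTERN.match(layer_name)
    if m is None:
        raise ValueError(f"name does not follow the preset format: {layer_name}")
    return int(m.group("gpu")), int(m.group("stream"))

# Operation layer 6 of Fig. 3 is mapped to stream 2 of GPU0.
assert resolve_mapping("layer6_gpu0_stream2") == (0, 2)
```
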
  • A schematic diagram of the dependency relationships and naming between the operation layers in the neural network model is shown in Fig. 3.
  • The arrows between the operation layers indicate the dependencies between them.
  • The depended-on layers corresponding to a dependent layer can be one layer or multiple layers.
  • In Fig. 3, the depended-on layer of operation layer 2 is operation layer 1, and the depended-on layers of operation layer 6 are operation layer 2 and operation layer 5.
  • The name of each operation layer encodes the operation layer, the microprocessor, and the computation stream. Operation layer 0, operation layer 1, and operation layer 2, which have dependency relationships, can be mapped to computation stream 0 of the microprocessor GPU0.
  • Operation layer 3, operation layer 4, and operation layer 5, which have dependency relationships, can be mapped to computation stream 1 of the microprocessor GPU0, and operation layer 6 can be mapped to computation stream 2 of the microprocessor GPU0.
  • That is, in Fig. 3, different operation layers can be mapped to the same microprocessor, operation layers with dependency relationships can be mapped to the same computation stream of that microprocessor, and operation layers with different dependency relationships can be mapped to different computation streams of the same microprocessor.
  • In this embodiment, better load balancing can be achieved by configuring each operation layer of the neural network model in this way.
  • The inference process can thus be reasonably allocated to the corresponding computing resources for processing, effectively improving the computational efficiency of the inference process.
  • In one embodiment, as shown in Fig. 4, the step of mapping each operation layer to a corresponding computing resource includes:
  • Step 402: Topologically sort the multiple operation layers, and search the sorted operation layers in sequence.
  • Step 404: Generate multiple operation layer groups according to the search results, and map the operation layer groups to corresponding computing resources.
  • The computer device topologically sorts all the operation layers of the neural network model, where the sorting can be based on the input-output relationships between the operation layers.
  • The sorted operation layers are then searched in sequence according to the order of inputs and outputs. Through the search, operation layers with dependency relationships can be classified into the same operation layer group.
  • Since the outputs of input layers and constant layers do not depend on other operation layers, when an input layer or a constant layer is encountered, it can be skipped and the search proceeds directly to the next layer.
  • When the next layer is an operation layer that is not an input layer or a constant layer (which can also be called an operation layer with an output dependency), its dependency relationship with each existing operation layer group is checked.
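
The patent does not prescribe a particular sorting algorithm; a standard Kahn-style topological sort over the layer graph would fit this step, as a sketch:

```python
from collections import deque
from typing import Dict, List

def topological_sort(inputs: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm; `inputs` maps each layer name to the names of the
    layers whose outputs feed it (empty for input/constant layers)."""
    indegree = {name: len(deps) for name, deps in inputs.items()}
    consumers = {name: [] for name in inputs}
    for name, deps in inputs.items():
        for dep in deps:
            consumers[dep].append(name)
    queue = deque(n for n, d in indegree.items() if d == 0)  # input/constant layers first
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in consumers[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return order
```
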
  • In one embodiment, searching the sorted operation layers in sequence includes: checking the dependency relationships between each sorted operation layer and the existing operation layer groups; counting the dependency relationships between the sorted operation layer and the existing operation layer groups; and classifying the sorted operation layer into the corresponding operation layer group according to the count.
  • Specifically, when the first operation layer with an output dependency (referred to below simply as a sorted operation layer) is searched, no operation layer group exists yet, and the first sorted operation layer can be counted into a first operation layer group.
  • When the second sorted operation layer is searched, check whether it has a dependency relationship with the first operation layer group. If so, record the dependency relationship between the second sorted operation layer and the first operation layer group; otherwise, count the second sorted operation layer into a second operation layer group.
  • By analogy, each subsequent sorted operation layer is compared with the existing operation layer groups to check whether a dependency relationship exists. There are many ways to check for a dependency relationship; for example, when searching a layer, all of its inputs can be searched forward, and if an operation layer that directly or indirectly feeds this layer's input is found, a dependency relationship exists.
  • An operation layer group includes at least one operation layer.
  • When an operation layer group includes two or more operation layers, if the operation layer currently being searched has a dependency relationship with any one of the operation layers in the group, the operation layer currently being searched has a dependency relationship with that operation layer group, and the relationship is recorded.
  • The computer device classifies the sorted operation layer into the corresponding operation layer group according to the count, as follows. If the count is 0, the sorted operation layer has no dependency relationship with any operation layer group, and it is classified into a first independent operation layer group. If the count is 1, the sorted operation layer has a dependency relationship with only one existing operation layer group, and it is classified into that group. If the count is greater than 1, the sorted operation layer has dependency relationships with multiple existing operation layer groups; it then does not belong to any of them, and it is classified into a second independent operation layer group, and the dependency relationships between the second independent operation layer group and the multiple existing operation layer groups are recorded.
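
A minimal sketch of this grouping rule (it assumes the sorted list already excludes input and constant layers, and that the "count is 1" branch joins the existing group, which is how the preceding description reads):

```python
from typing import Dict, List, Set, Tuple

def group_layers(sorted_layers: List[str],
                 inputs: Dict[str, List[str]]) -> Tuple[List[Set[str]], Dict[int, Set[int]]]:
    """Count each layer's dependencies on existing groups: 0 -> new independent
    group, 1 -> join that group, >1 -> new independent group whose inter-group
    dependencies are recorded."""
    groups: List[Set[str]] = []            # each group is a set of layer names
    group_deps: Dict[int, Set[int]] = {}   # group index -> groups it depends on
    layer_group: Dict[str, int] = {}       # layer name -> group index
    for layer in sorted_layers:
        hit = {layer_group[d] for d in inputs[layer] if d in layer_group}
        if len(hit) == 1:
            gid = hit.pop()
            groups[gid].add(layer)
        else:
            gid = len(groups)
            groups.append({layer})
            group_deps[gid] = hit          # empty set when the count is 0
        layer_group[layer] = gid
    return groups, group_deps
```
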
  • In one embodiment, the computer device can group the computing resources of the computing platform.
  • The computer device can group all available computing resources in the computing platform, or it can obtain a preset number of microprocessors, and of computation streams within the microprocessors, according to preset resource demand conditions.
  • The computer device can group preferentially by microprocessor.
  • The computer device can also group the computation streams within a microprocessor.
  • Alternatively, the computer device first groups the microprocessors and then groups the computation streams within each microprocessor.
  • The computer device can score the computational complexity of each operation layer group to obtain a corresponding complexity score, and use the dependency relationships and the complexity scores to determine the computing resource that has a mapping relationship with each operation layer group.
  • The computing resources with mapping relationships may be the grouped computing resources.
  • The operation layer groups can then be mapped to the grouped computing resources.
  • Different operation layer groups can be allocated different computing resources, which enables multiple operation layer groups to run simultaneously. This effectively improves the load balance of the computing platform, so that the inference engine can effectively improve the computational efficiency of the inference process when using the neural network model.
  • Moreover, the allocation of computing resources can be completed automatically, effectively reducing manual configuration work.
  • In one embodiment, the computer device can visit the operation layer groups in the order in which they were generated.
  • The computer device can map the first operation layer group to the first computing resource.
  • It then visits the next operation layer group; for simplicity of description, the next operation layer group to be visited is also referred to as the current operation layer group. If another operation layer group that has a dependency relationship with the current operation layer group is performing operations on one of the computing resources, the current operation layer group is kept in a waiting state until that operation layer group completes its operation, after which the current operation layer group enters the same computing resource and starts computing.
  • In this way, a single inference process is effectively prevented from being spread across different microprocessors, saving memory transfers between microprocessors and thereby effectively improving the computational efficiency of the inference process when the inference engine executes it.
  • The subsequent operation layer groups after the current operation layer group are then examined. If a subsequent operation layer group has no dependency relationship with any running operation layer group, the computing resources corresponding to the current and subsequent operation layer groups are determined according to their complexity scores.
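
One plausible reading of this placement policy, as a sketch: a group is placed on its depended-on group's resource only when there is exactly one such group, which is consistent with the Fig. 5 layout below. That refinement is an assumption, and the complexity-score selection is sketched separately afterwards.

```python
from typing import Dict, List, Set

def assign_resources(group_order: List[int],
                     group_deps: Dict[int, Set[int]],
                     resources: List[str]) -> Dict[int, str]:
    """Greedy placement: a group with exactly one depended-on group waits for
    it and reuses its resource (avoiding cross-processor memory transfers);
    other groups are spread across the remaining resources."""
    placement: Dict[int, str] = {}
    next_free = 0
    for gid in group_order:
        deps = group_deps.get(gid, set())
        if len(deps) == 1:
            placement[gid] = placement[next(iter(deps))]
        else:
            placement[gid] = resources[next_free % len(resources)]
            next_free += 1
    return placement
```
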
  • The complexity score may be calculated by the computer device across multiple dimensions for each operation layer group.
  • The calculation process of the complexity score includes: the computer device obtains multiple dimensions for scoring the operation layer group, together with the weight corresponding to each dimension. The dimensions can include the input size corresponding to the operation layer, the content of the operation layer, and the time required to compute on the input.
  • A corresponding range and score can be preset for each dimension.
  • For each operation layer group, the computer device aggregates the scores and weights corresponding to the range each operation layer falls into on each dimension, obtaining the complexity score of the operation layer group. The higher the complexity score, the more complicated the calculation process and the longer the calculation time; the lower the complexity score, the simpler the calculation process and the shorter the calculation time.
  • The computer device compares the complexity score of the current operation layer group, and of each subsequent operation layer group, with the complexity score of the operation layer group currently computing, obtaining multiple comparison results. From these comparison results, the operation layer group whose complexity score is closest to that of the computing operation layer group is selected and mapped to a computing resource that is not yet running anything. Using complexity scores to determine the computing resources for operation layer groups without dependency relationships makes the running times of those groups roughly equivalent across all computing resources, so that each computing resource finishes its operations at about the same time. Such synchronized operation effectively improves the load balance of the computing platform and thereby promotes the computational efficiency of the inference process.
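
A sketch of the scoring and selection (the dimensions, weights, and numbers are hypothetical):

```python
from typing import Dict

def complexity_score(dim_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over the preset scoring dimensions."""
    return sum(weights[d] * s for d, s in dim_scores.items())

def pick_closest_group(candidate_scores: Dict[int, float], running_score: float) -> int:
    """Among groups with no pending dependencies, pick the one whose score is
    closest to the group currently computing, so resources finish together."""
    return min(candidate_scores, key=lambda g: abs(candidate_scores[g] - running_score))

# Hypothetical dimensions and weights.
weights = {"input_size": 0.5, "operator_content": 0.3, "compute_time": 0.2}
running = complexity_score({"input_size": 8, "operator_content": 6, "compute_time": 9}, weights)  # 7.6
assert pick_closest_group({1: 7.1, 2: 7.6}, running) == 2
```
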
  • A mapping of the operation layers of a neural network model to computing resources is shown in Fig. 5.
  • The operation layers of the neural network model are divided into four operation layer groups in the manner provided in the foregoing embodiments, and each operation layer group is mapped to a corresponding computation stream:
  • operation layer group 0 is mapped to computation stream 0;
  • operation layer group 1 is mapped to computation stream 1;
  • operation layer group 2 is mapped to computation stream 2; and
  • operation layer group 3 is mapped to computation stream 3.
  • Operation layer groups 0, 1, and 2 have no dependency relationships with one another.
  • Computation streams 0, 1, and 2 can therefore perform the calculations of the inference process synchronously, while operation layer group 3 takes the operation results of operation layer groups 0, 1, and 2 as its input and performs its part of the inference process in computation stream 3.
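
For illustration, running the grouping and placement sketches above on a graph shaped like Fig. 5 reproduces this layout (layer names are hypothetical; a0, b0, and c0 stand for compute layers fed directly by the model input, with the input layer itself omitted):

```python
# Three independent chains feeding one final layer, as in Fig. 5.
inputs = {
    "a0": [], "a1": ["a0"],      # becomes operation layer group 0
    "b0": [], "b1": ["b0"],      # becomes operation layer group 1
    "c0": [], "c1": ["c0"],      # becomes operation layer group 2
    "d":  ["a1", "b1", "c1"],    # becomes operation layer group 3
}
groups, deps = group_layers(topological_sort(inputs), inputs)
streams = ["stream0", "stream1", "stream2", "stream3"]
placement = assign_resources(list(range(len(groups))), deps, streams)
assert deps[3] == {0, 1, 2}
assert placement == {0: "stream0", 1: "stream1", 2: "stream2", 3: "stream3"}
```
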
  • In one embodiment, a computing resource allocation apparatus based on an inference engine is provided, including: a resource acquisition module 602, a model calling module 604, a relationship recognition module 606, a resource mapping module 608, and an inference execution module 610, where:
  • the resource acquisition module 602 is used to obtain computing resources of the computing platform;
  • the model calling module 604 is used to call a neural network model, the neural network model including multiple operation layers;
  • the relationship recognition module 606 is used to identify the dependency relationships between the multiple operation layers through the inference engine;
  • the resource mapping module 608 is used to map each operation layer to a corresponding computing resource; and
  • the inference execution module 610 is used to perform the inference process using the neural network model through the inference engine according to the dependency relationships and the mapped computing resources.
  • In one embodiment, the relationship recognition module 606 is also used to obtain a configuration file corresponding to the neural network model, and to read from the configuration file the dependency relationships between the multiple operation layers of the neural network model.
  • In one embodiment, the resource mapping module 608 is also used to obtain the name corresponding to each operation layer; to parse the name according to a preset naming format, identifying the computing resource that has a mapping relationship with each operation layer; and to allocate each operation layer to the computing resource with the mapping relationship.
  • In one embodiment, the resource mapping module 608 is also used to topologically sort the multiple operation layers and search the sorted operation layers in sequence; and to generate multiple operation layer groups according to the search results and map the operation layer groups to corresponding computing resources.
  • In one embodiment, the resource mapping module 608 is also used to check the dependency relationships between each sorted operation layer and the existing operation layer groups; to count the dependency relationships between the sorted operation layer and the existing operation layer groups; and to classify the sorted operation layer into the corresponding operation layer group according to the count.
  • In one embodiment, the resource mapping module 608 is further configured to classify the sorted operation layer into the first independent operation layer group when the count is 0; to classify the sorted operation layer into the only existing operation layer group with which it has a dependency relationship when the count is 1; and, when the count is greater than 1, to classify the sorted operation layer into the second independent operation layer group and record the dependency relationships between the second independent operation layer group and the multiple existing operation layer groups.
  • In one embodiment, the resource mapping module 608 is further configured to obtain the generation order of all the operation layer groups when dependency relationships exist between operation layer groups; to allocate corresponding computing resources to each operation layer group according to the generation order; and, while a depended-on operation layer group is running, to keep the dependent layer group in a waiting state until the depended-on operation layer group completes its operation, after which the dependent layer group enters the corresponding computing resource to perform its operation.
  • In one embodiment, the resource mapping module 608 is further configured to obtain the complexity score corresponding to each operation layer group when no dependency relationships exist between the operation layer groups, and to use the complexity scores to determine the computing resource that has a mapping relationship with each operation layer group.
  • In one embodiment, the resource mapping module 608 is also used to compare the complexity score of the running operation layer group with the complexity scores of the multiple operation layer groups that have no dependency relationships, obtaining multiple comparison results; to select from these the operation layer group whose complexity score is closest to that of the operation layer group currently computing; and to map that closest operation layer group to a computing resource on which no computation has started.
  • The modules in the foregoing inference-engine-based computing resource allocation apparatus can be implemented in whole or in part by software, hardware, or combinations thereof.
  • The above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
  • In one embodiment, a computer device is provided, and its internal structure may be as shown in Fig. 7.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • The processor of the computer device is used to provide calculation and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • The internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium.
  • The database of the computer device is used to store the inference data of the neural network model.
  • The network interface of the computer device is used to communicate with external terminals through a network connection.
  • Those skilled in the art can understand that the structure shown in Fig. 7 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied.
  • A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • In one embodiment, a computer device is provided, including a memory and one or more processors.
  • The memory stores computer-readable instructions.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors execute the steps of the above method embodiments.
  • In one embodiment, one or more non-volatile computer-readable storage media storing computer-readable instructions are provided.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the above method embodiments.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).


Abstract

Provided is a computing resource allocation method based on an inference engine, the method comprising: acquiring computing resources of a computing platform (202); calling a neural network model, wherein the neural network model comprises a plurality of operation layers (204); identifying the dependency relationships between the plurality of operation layers by means of an inference engine, and mapping each operation layer to a corresponding computing resource (206); and, according to the dependency relationships and the mapped computing resources, using the neural network model to perform an inference process by means of the inference engine (208).

Description

基于推理引擎的计算资源分配方法、装置和计算机设备Calculation resource allocation method, device and computer equipment based on reasoning engine 技术领域Technical field
本申请涉及一种基于推理引擎的计算资源分配方法、装置、计算机设备和存储介质。This application relates to a method, device, computer equipment and storage medium for allocating computing resources based on an inference engine.
背景技术Background technique
深度学习技术作为人工智能领域的研究方向,在语音识别、图像识别、自然语言处理等多个方面得以应用。随着深度学习技术的发展,深度学习技术的基础神经网络模型的规模也变得越大巨大。神经网络模型的操作层变得更多,操作层与操作层之间的链接也随之变得更加复杂。As a research direction in the field of artificial intelligence, deep learning technology has been applied in many aspects such as speech recognition, image recognition, and natural language processing. With the development of deep learning technology, the scale of the basic neural network model of deep learning technology has also become larger and larger. The operation layer of the neural network model becomes more, and the link between the operation layer and the operation layer also becomes more complicated.
推理引擎可以利用神经网络模型与计算平台的硬件计算资源(下文简称为计算资源)实现推理功能。如果计算资源的负载均衡做的不好,部分计算资源可能被过度使用,也可能被闲置,由此会导致推理过程的计算效率受到较大影响。因此,在面对大型的神经网络模型时,如何通过改善计算资源的负载均衡有效提高推理过程的计算效率成为目前需要解决的一个技术问题。The inference engine can use the neural network model and the hardware computing resources of the computing platform (hereinafter referred to as computing resources) to realize the inference function. If the load balancing of computing resources is not done well, some computing resources may be overused or left unused, which will greatly affect the computing efficiency of the inference process. Therefore, in the face of large-scale neural network models, how to effectively improve the computational efficiency of the inference process by improving the load balance of computational resources has become a technical problem that needs to be solved at present.
发明内容Summary of the invention
根据本申请公开的各种实施例,提供一种基于推理引擎的计算资源分配方法、装置、计算机设备和存储介质。According to various embodiments disclosed in the present application, a method, device, computer device, and storage medium for allocating computing resources based on an inference engine are provided.
一种基于推理引擎的计算资源分配方法,包括:A method for allocating computing resources based on an inference engine, including:
获取计算平台的计算资源;Obtain the computing resources of the computing platform;
调用神经网络模型,所述神经网络模型包括多个操作层;Calling a neural network model, the neural network model including multiple operation layers;
通过推理引擎识别所述多个操作层之间的依赖关系,将每个操作层映射至对应的计算资源;及Identify the dependency between the multiple operation layers through the inference engine, and map each operation layer to a corresponding computing resource; and
根据所述依赖关系以及所映射的计算资源,通过所述推理引擎利用所述神经网络模型进行推理过程。According to the dependency relationship and the mapped computing resources, the neural network model is used to perform the inference process through the inference engine.
一种基于推理引擎的计算资源分配装置,包括:A computing resource allocation device based on an inference engine includes:
资源获取模块,用于获取计算平台的计算资源;The resource acquisition module is used to acquire the computing resources of the computing platform;
模型调用模块,用于调用神经网络模型,所述神经网络模型包括多个操作层;The model calling module is used to call a neural network model, the neural network model including multiple operation layers;
关系识别模块,用于通过推理引擎识别所述多个操作层之间的依赖关系;The relationship recognition module is used to recognize the dependency relationship between the multiple operation layers through the inference engine;
资源映射模块,用于将每个操作层映射至对应的计算资源;及The resource mapping module is used to map each operation layer to the corresponding computing resource; and
推理执行模块,用于根据所述依赖关系以及所映射的计算资源,通过所述推理引擎利用所述神经网络模型进行推理过程。The reasoning execution module is used to perform the reasoning process using the neural network model through the reasoning engine according to the dependency relationship and the mapped computing resources.
一种计算机设备,包括存储器及一个或多个处理器,所述存储器中储存有计算机可读指令,所述计算机可读指令被所述一个或多个处理器执行时,使得所述一个或多个处理器执行以下步骤:A computer device includes a memory and one or more processors. The memory stores computer-readable instructions. When the computer-readable instructions are executed by the one or more processors, the one or more Each processor performs the following steps:
获取计算平台的计算资源;Obtain the computing resources of the computing platform;
调用神经网络模型,所述神经网络模型包括多个操作层;Calling a neural network model, the neural network model including multiple operation layers;
通过推理引擎识别所述多个操作层之间的依赖关系,将每个操作层映射至对应的计算资源;及Identify the dependency between the multiple operation layers through the inference engine, and map each operation layer to a corresponding computing resource; and
根据所述依赖关系以及所映射的计算资源,通过所述推理引擎利用所述神经网络模型进行推理过程。According to the dependency relationship and the mapped computing resources, the neural network model is used to perform the inference process through the inference engine.
一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行以下步骤:One or more non-volatile computer-readable storage media storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:
获取计算平台的计算资源;Obtain the computing resources of the computing platform;
调用神经网络模型,所述神经网络模型包括多个操作层;Calling a neural network model, the neural network model including multiple operation layers;
通过推理引擎识别所述多个操作层之间的依赖关系,将每个操作层映射至对应的计算资源;及Identify the dependency between the multiple operation layers through the inference engine, and map each operation layer to a corresponding computing resource; and
根据所述依赖关系以及所映射的计算资源,通过所述推理引擎利用所述神经网络模型进行推理过程。According to the dependency relationship and the mapped computing resources, the neural network model is used to perform the inference process through the inference engine.
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请 的其它特征和优点将从说明书、附图以及权利要求书变得明显。The details of one or more embodiments of the present application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the description, drawings and claims.
附图说明Description of the drawings
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其它的附图。In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. A person of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
图1为根据一个或多个实施例中基于推理引擎的计算资源分配方法的应用场景图。Fig. 1 is an application scenario diagram of a calculation resource allocation method based on an inference engine according to one or more embodiments.
图2为根据一个或多个实施例中基于推理引擎的计算资源分配方法的流程示意图。Fig. 2 is a schematic flowchart of a method for allocating computing resources based on an inference engine according to one or more embodiments.
图3为根据一个或多个实施例中神经网络模型中各操作层之间的依赖关系与命名的示意图。FIG. 3 is a schematic diagram of the dependency relationship and naming between the various operation layers in the neural network model according to one or more embodiments.
图4为一个实施例中将每个操作层映射至对应的计算资源的步骤的流程示意图。Fig. 4 is a schematic flowchart of the steps of mapping each operation layer to a corresponding computing resource in an embodiment.
图5为根据一个或多个实施例中神经网络模型的各个操作层映射至计算资源的示意图。Fig. 5 is a schematic diagram of mapping various operation layers of a neural network model to computing resources according to one or more embodiments.
图6为根据一个或多个实施例中基于推理引擎的计算资源分配装置的框图。Fig. 6 is a block diagram of an apparatus for allocating computing resources based on an inference engine according to one or more embodiments.
图7为根据一个或多个实施例中计算机设备的框图。Figure 7 is a block diagram of a computer device according to one or more embodiments.
具体实施方式Detailed ways
为了使本申请的技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。In order to make the technical solutions and advantages of the present application clearer, the following further describes the present application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, and are not used to limit the present application.
在其中一个实施例中,本申请提供的基于推理引擎的计算资源分配方法具体可以应用于自动驾驶领域中,神经网络模型具体可以包括图像识别模型、 行为预测模型或者风险评估模型等中的至少一种。例如,神经网络模型可以是图像识别模型,本申请提供的基于推理引擎的计算资源分配方法可以应用于如图1所示的应用环境中。自动驾驶车辆可以包括传感器102和计算机设备104,传感器102可以与计算机设备104进行通信。传感器102可以采集视觉范围内的环境图像。比如在自动驾驶车辆行驶至路口时,传感器102可以采集交通信号灯图像。计算机设备104根据传感器102采集的信号灯图像进行图像识别,判断图像中信号灯的颜色。具体的,计算机设备104可以获取多个计算资源,调用神经网络模型,神经网络模型包括多个操作层,通过推理引擎识别多个操作层之间的依赖关系,将每个操作层映射至对应的计算资源。计算机设备104根据依赖关系以及所映射的计算资源,通过推理引擎利用神经网络模型进行推理过程,得到信号灯图像中信号灯的颜色。In one of the embodiments, the method for allocating computational resources based on the inference engine provided in this application can be specifically applied in the field of autonomous driving, and the neural network model can specifically include at least one of an image recognition model, a behavior prediction model, or a risk assessment model. Kind. For example, the neural network model may be an image recognition model, and the calculation resource allocation method based on the reasoning engine provided in this application may be applied to the application environment as shown in FIG. 1. The autonomous vehicle may include a sensor 102 and a computer device 104, and the sensor 102 may communicate with the computer device 104. The sensor 102 can collect an image of the environment within the visual range. For example, when an autonomous vehicle is driving to an intersection, the sensor 102 can collect traffic signal images. The computer device 104 performs image recognition according to the signal light image collected by the sensor 102, and judges the color of the signal light in the image. Specifically, the computer device 104 can obtain multiple computing resources and call a neural network model. The neural network model includes multiple operation layers. The inference engine recognizes the dependencies between the multiple operation layers, and maps each operation layer to the corresponding Computing resources. The computer device 104 uses the neural network model to perform the inference process through the inference engine according to the dependency relationship and the mapped computing resources to obtain the color of the signal light in the signal light image.
可以理解的,本申请提供的神经网络模型的推理方法实现对神经网络模型进行推理,可以应用于多种应用环境,神经网络模型可以包括多种类型。例如,神经网络模型可以包括卷积神经网络模型、循环神经网络模型以及递归神经网络模型等。神经网络模型可以用于处理多种不同的数据。例如,神经网络模型具体可以包括图像识别模型、特征提取模型、语音识别模型、文本识别模型以及场景分类模型等。It is understandable that the neural network model inference method provided in the present application implements the inference of the neural network model and can be applied to a variety of application environments, and the neural network model can include multiple types. For example, the neural network model may include a convolutional neural network model, a recurrent neural network model, and a recurrent neural network model. The neural network model can be used to process a variety of different data. For example, the neural network model may specifically include an image recognition model, a feature extraction model, a speech recognition model, a text recognition model, and a scene classification model.
在一个实施例中,提供了一种基于推理引擎的计算资源分配方法,以该方法应用于图1中的计算机设备为例进行说明,具体包括以下步骤:In one embodiment, a method for allocating computing resources based on an inference engine is provided. Taking the method applied to the computer device in FIG. 1 as an example for description, the method specifically includes the following steps:
步骤202,获取计算平台的计算资源。Step 202: Obtain computing resources of the computing platform.
计算平台可以是计算机设备用于进行自动控制运算的平台。计算机设备可以是独立的设备,例如车载计算机设备等。计算机平台具有相应的计算资源。计算资源包括多个微处理器,每个微处理器包括多个计算流。计算机设备可以根据预先设置的条件,读取与该条件相对应的计算资源,读取到的计算资源可以是计算平台的部分计算资源。计算机设备也可以读取计算平台所有的计算资源。The computing platform may be a platform used by computer equipment to perform automatic control operations. The computer device may be a stand-alone device, such as a vehicle-mounted computer device. The computer platform has corresponding computing resources. Computing resources include multiple microprocessors, and each microprocessor includes multiple computing streams. The computer device may read the computing resource corresponding to the condition according to a preset condition, and the read computing resource may be a part of the computing resource of the computing platform. The computer equipment can also read all the computing resources of the computing platform.
步骤204,调用神经网络模型,神经网络模型包括多个操作层。In step 204, the neural network model is invoked, and the neural network model includes multiple operation layers.
计算机设备中预设存储了神经网络模型,神经网络模型可以是预先训练好的。神经网络模型可以用于推理引擎在计算平台中实现推理过程。神经网络模型包括多个操作层。在面对不同的业务需求时,推理引擎可以采用不同的神经网络模型,实现相应的推理过程。例如,在进行图像识别时,可以采用与图像识别相关的神经网络模型进行推理过程运算。在进行自然语言处理时,可以采用与自然语言处理相关的神经网络模型进行推理过程运算。The neural network model is pre-stored in the computer equipment, and the neural network model can be pre-trained. The neural network model can be used in the inference engine to implement the inference process in the computing platform. The neural network model includes multiple operating layers. When facing different business requirements, the reasoning engine can use different neural network models to realize the corresponding reasoning process. For example, when performing image recognition, a neural network model related to image recognition can be used to perform inference process operations. When performing natural language processing, neural network models related to natural language processing can be used to perform inference process operations.
步骤206,通过推理引擎识别多个操作层之间的依赖关系,将每个操作层映射至对应的计算资源。Step 206: Identify the dependency between multiple operation layers through the inference engine, and map each operation layer to a corresponding computing resource.
神经网络模型的多个操作层之间具有依赖关系。依赖关系是指其中一个操作层的输入依赖于其他操作层的输出。其中,具有依赖关系的操作层可以分别被称为依赖层与被依赖层。被依赖层的输出可以形成依赖层的输入。当前的依赖层也可以是其他操作层对应的被依赖层。依赖层所对应的被依赖层可以是一层也可以是多层。There are dependencies among the multiple operating layers of the neural network model. Dependency means that the input of one operating layer depends on the output of other operating layers. Among them, the operation layer with dependency relationship can be called the dependent layer and the dependent layer respectively. The output of the dependent layer can form the input of the dependent layer. The current dependent layer can also be a dependent layer corresponding to other operating layers. The dependent layer corresponding to the dependent layer can be one layer or multiple layers.
计算机设备中安装了推理引擎。通过推理引擎可以识别神经网络中每个操作层之间的依赖关系,将每个操作层分别映射至对应的计算资源。计算机设备还可以通过推理引擎在识别到每个操作层之间的依赖关系后,将存在依赖关系的操作层划分为相应的操作层组,将操作层组映射至对应的计算资源。操作层之间的依赖关系可以是预先配置的,也可以是通过对操作层进行搜索后分析得到的。An inference engine is installed in the computer equipment. The inference engine can identify the dependency between each operation layer in the neural network, and map each operation layer to the corresponding computing resource. The computer device can also use the inference engine to identify the dependency relationship between each operation layer, divide the operation layer with the dependency relationship into the corresponding operation layer group, and map the operation layer group to the corresponding computing resource. The dependency between the operation layers can be pre-configured, or it can be obtained by searching and analyzing the operation layers.
步骤208,根据依赖关系以及所映射的计算资源,通过推理引擎利用神经网络模型进行推理过程。 Step 208, according to the dependency relationship and the mapped computing resources, use the neural network model to perform the inference process through the inference engine.
计算机设备将神经网络模型的操作层应设置对应的计算资源,可以使得每个操作层映射至对应的微处理器,或者映射至对应微处理器的数据流。其中,具有依赖关系的操作层可以映射至同一微处理器。不具有依赖关系的操作层可以映射至不同的微处理器。由此实现对神经网络模型的多个操作层的计算资源进行合理有效分配。推理引擎可以利用神经网络模型以及每一层操作层所分配得到的计算资源进行推理过程。The computer equipment should set the corresponding computing resources to the operating layer of the neural network model, so that each operating layer can be mapped to the corresponding microprocessor, or mapped to the data stream of the corresponding microprocessor. Among them, the operating layers with dependencies can be mapped to the same microprocessor. Operating layers that do not have dependencies can be mapped to different microprocessors. In this way, the calculation resources of the multiple operation layers of the neural network model can be reasonably and effectively allocated. The inference engine can use the neural network model and the computing resources allocated by each operating layer to perform the inference process.
本实施例中,通过识别神经网络模型中多个操作层之间的依赖关系,将每个操作层映射至计算平台中对应的计算资源,由此对计算资源进行合理分配,有效改善了计算资源之间的负载均衡。从而使得推理引擎在利用神经网络模型以及每一层操作层所分配得到的计算资源进行推理过程时,能够有效提高推理过程的计算效率。In this embodiment, by identifying the dependency between multiple operating layers in the neural network model, each operating layer is mapped to the corresponding computing resource in the computing platform, thereby reasonably allocating computing resources, effectively improving computing resources Load balancing between. Therefore, the inference engine can effectively improve the calculation efficiency of the inference process when the inference engine uses the neural network model and the computing resources allocated by each operating layer to perform the inference process.
在一个实施例中,计算机设备可以获取与神经网络模型对应的配置文件;在配置文件中读取神经网络模型中多个操作层之间的依赖关系。In one embodiment, the computer device may obtain a configuration file corresponding to the neural network model; read the dependency relationship between multiple operation layers in the neural network model in the configuration file.
配置文件可以是用户预先根据神经网络模型的结构配置的。配置文件中可以记录每个操作层对应的下一个操作层,即记录了操作层之间的依赖关系。配置文件中还可以记录了每个操作操作层的命名。计算机设备根据操作层的命名可以将每个操作层映射至对应的计算资源。在其中一个实施例中,将每个操作层映射至对应的计算资源包括:获取每个操作层对应的命名;根据预设命名格式对命名进行解析,识别与每个操作层具有映射关系的计算资源;将每个操作层分配到具有映射关系的计算资源。The configuration file may be configured by the user in advance according to the structure of the neural network model. The configuration file can record the next operation layer corresponding to each operation layer, that is, record the dependency relationship between the operation layers. The configuration file can also record the naming of each operation layer. The computer device can map each operation layer to the corresponding computing resource according to the naming of the operation layer. In one of the embodiments, mapping each operation layer to the corresponding computing resource includes: obtaining the name corresponding to each operation layer; parsing the name according to a preset naming format, and identifying calculations that have a mapping relationship with each operation layer Resources; each operating layer is allocated to computing resources with a mapping relationship.
本实施例中,在创建神经网络模型时,可以根据神经网络模型的结构定义操作层与操作层之间的依赖关系。还可以通过命名操作层的方式对每个操作层指定对应的计算资源。操作层的命名可以记录在相应的配置文件中。当推理引擎需要进行推理过程时,推理引擎可以在配置文件中读取操作层之间的依赖关系,以及每个操作层对应的命名。推理引擎可以获取预设命名格式,根据预设命名格式对每个操作层的命名进行解析,识别与每个操作层具有映射关系的计算资源,将每个操作层分配到具有映射关系的计算资源。计算资源包括微处理,或者微处理器以及微处理器中的计算流。In this embodiment, when creating the neural network model, the dependency relationship between the operation layer and the operation layer can be defined according to the structure of the neural network model. You can also specify the corresponding computing resource for each operation layer by naming the operation layer. The naming of the operating layer can be recorded in the corresponding configuration file. When the inference engine needs to perform the inference process, the inference engine can read the dependency relationship between the operation layers and the corresponding naming of each operation layer in the configuration file. The reasoning engine can obtain the preset naming format, analyze the naming of each operation layer according to the preset naming format, identify the computing resources that have a mapping relationship with each operation layer, and assign each operation layer to the computing resources with the mapping relationship . Computing resources include microprocessors, or microprocessors and calculation flows in microprocessors.
神经网络模型中各操作层之间的依赖关系与命名的示意图,可以如图3所示。在图3中,操作层之间的箭头可以表示彼此之间的依赖关系。一个依赖层对应的被依赖层可以是一层,也可以是多层。图3中,操作层2对应的被依赖层为操作层1,操作层6对应的被依赖层为操作层2和操作层5。每个操作层的命名中都包含了操作层、微处理器、计算流。其中,具有依赖关系的 操作层0、操作层1以及操作层2可以映射至微处理器GPU0的计算流0。具有依赖关系的操作层3、操作层4以及操作层5可以映射至微处理器GPU0的计算流1,操作层6可以映射至微处理器GPU0的计算流2。也就是说,在图3中,不同的操作层可以映射至同一微处理器,具有依赖关系的操作层可以映射至同一微处理器的同一计算流,具有不同依赖关系的操作层可以映射至同一处理器的不同计算流。The schematic diagram of the dependency relationship and naming between the various operating layers in the neural network model can be shown in Figure 3. In Figure 3, the arrows between the operation layers can indicate the dependencies between each other. The dependent layer corresponding to a dependent layer can be one layer or multiple layers. In FIG. 3, the dependent layer corresponding to operation layer 2 is operation layer 1, and the dependent layer corresponding to operation layer 6 is operation layer 2 and operation layer 5. The naming of each operation layer includes the operation layer, the microprocessor, and the calculation flow. Among them, operation layer 0, operation layer 1, and operation layer 2 with dependencies can be mapped to the calculation flow 0 of the microprocessor GPU0. The operation layer 3, the operation layer 4, and the operation layer 5 with dependencies can be mapped to the calculation flow 1 of the microprocessor GPU0, and the operation layer 6 can be mapped to the calculation flow 2 of the microprocessor GPU0. In other words, in Figure 3, different operating layers can be mapped to the same microprocessor, operating layers with dependencies can be mapped to the same calculation flow of the same microprocessor, and operating layers with different dependencies can be mapped to the same microprocessor. Different calculation flows of the processor.
In this embodiment, by configuring each operation layer in the neural network model, better load balancing can be achieved, so that the inference process can be reasonably allocated to the corresponding computing resources for processing, effectively improving the computational efficiency of the inference process.
In one embodiment, as shown in FIG. 4, the step of mapping each operation layer to a corresponding computing resource includes:
Step 402: topologically sort the multiple operation layers, and search the sorted operation layers in sequence.
Step 404: generate multiple operation layer groups according to the search results, and map the operation layer groups to corresponding computing resources.
The computer device topologically sorts all operation layers of the neural network model. The sorting can be based on the input/output relationships between operation layers, and the sorted operation layers are then searched in order of their inputs and outputs. Through the search, operation layers that have dependency relationships can be placed in the same operation layer group.
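By way of illustration, a minimal sketch of such a topological sort over the layer dependency graph, using Kahn's algorithm and assuming each layer records the layers whose outputs it consumes (all layer names are illustrative):

```python
from collections import deque

def topological_sort(inputs_of: dict[str, list[str]]) -> list[str]:
    """Topologically sort operation layers; inputs_of maps a layer to the
    layers whose outputs it consumes (its dependencies)."""
    indegree = {layer: len(deps) for layer, deps in inputs_of.items()}
    consumers: dict[str, list[str]] = {layer: [] for layer in inputs_of}
    for layer, deps in inputs_of.items():
        for dep in deps:
            consumers[dep].append(layer)
    ready = deque(layer for layer, d in indegree.items() if d == 0)
    order = []
    while ready:
        layer = ready.popleft()
        order.append(layer)
        for nxt in consumers[layer]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(inputs_of):
        raise ValueError("dependency graph contains a cycle")
    return order

# FIG. 3-style example: layer2 depends on layer1, layer6 on layer2 and layer5.
order = topological_sort({
    "layer0": [], "layer1": ["layer0"], "layer2": ["layer1"],
    "layer3": [], "layer4": ["layer3"], "layer5": ["layer4"],
    "layer6": ["layer2", "layer5"],
})
```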
Since the outputs of input layers and constant layers do not depend on other operation layers, when an input layer or a constant layer is encountered in the search, it can be skipped and the search proceeds to the next layer. When the next layer is not an input layer or a constant layer (such a layer may also be called an operation layer with output dependencies), its dependency relationship with each existing operation layer group is checked.
In one of the embodiments, searching the sorted operation layers in sequence includes: checking the dependency relationships between a sorted operation layer and the existing operation layer groups; counting the dependency relationships between the sorted operation layer and the existing operation layer groups; and classifying the sorted operation layer into the corresponding operation layer group according to the statistical result.
Specifically, when searching the first operation layer with output dependencies (referred to as a sorted operation layer for short), no operation layer group exists yet, so the first sorted operation layer is placed into the first operation layer group. When searching the second sorted operation layer, it is checked whether a dependency relationship with the first operation layer group exists. If so, the dependency relationship between the second sorted operation layer and the first operation layer group is recorded; otherwise, the second sorted operation layer is placed into a second operation layer group. By analogy, when searching each subsequent sorted operation layer, it is compared with the existing operation layer groups to check whether dependency relationships exist. There are many ways to check for a dependency relationship; for example, when searching a layer, all of its inputs can be searched forward, and if an operation layer that directly or indirectly feeds that layer is found, a dependency relationship exists. An operation layer group may include at least one operation layer. When an operation layer group includes two or more operation layers, if the operation layer currently being searched has a dependency relationship with any one operation layer in the group, the operation layer currently being searched has a dependency relationship with that group, and this is recorded.
In one implementation, the computer device classifies the sorted operation layer into the corresponding operation layer group according to the statistical result, as follows. If the statistical result is 0, the sorted operation layer has no dependency relationship with any operation layer, and it belongs to a first independent operation layer group. If the statistical result is 1, the sorted operation layer has a dependency relationship with only one existing operation layer group. If the statistical result is greater than 1, the sorted operation layer has dependency relationships with multiple existing operation layer groups; it then does not belong to any one of those groups, and is placed into a second independent operation layer group, with the dependency relationships between the second independent operation layer group and the multiple existing operation layer groups being recorded.
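For illustration only, the counting rule could be sketched as follows; the group bookkeeping shown, and the assumption that input and constant layers have already been filtered out, are not part of the original disclosure.

```python
def group_layers(order: list[str],
                 inputs_of: dict[str, list[str]]) -> tuple[list[set[str]], dict[int, set[int]]]:
    """Classify each sorted operation layer into an operation layer group by
    counting how many existing groups its inputs fall into."""
    groups: list[set[str]] = []
    group_deps: dict[int, set[int]] = {}  # new group -> groups it depends on
    for layer in order:
        hits = {i for i, g in enumerate(groups)
                if any(dep in g for dep in inputs_of[layer])}
        if len(hits) == 1:
            groups[hits.pop()].add(layer)   # statistical result 1: join that group
        else:
            groups.append({layer})          # result 0 or >1: independent group
            if hits:                        # result >1: record the dependencies
                group_deps[len(groups) - 1] = hits
    return groups, group_deps

# FIG. 3-style example: yields three groups, the third depending on the first two.
groups, deps = group_layers(
    ["layer0", "layer1", "layer2", "layer3", "layer4", "layer5", "layer6"],
    {"layer0": [], "layer1": ["layer0"], "layer2": ["layer1"],
     "layer3": [], "layer4": ["layer3"], "layer5": ["layer4"],
     "layer6": ["layer2", "layer5"]},
)
```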
The computer device can group the computing resources of the computing platform. The computer device can group all available computing resources of the computing platform, or it can obtain a preset number of microprocessors, and of compute streams within those microprocessors, according to preset resource demand conditions. When grouping, the computer device can group by microprocessor first. When there is a single microprocessor, the computer device can group the compute streams within that microprocessor. When there are two or more microprocessors, the computer device first groups the microprocessors and then groups the compute streams within each microprocessor.
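As a minimal illustration, the grouped resources could be represented as (microprocessor, compute stream) pairs, grouping by microprocessor first; the device counts below are assumptions.

```python
def group_resources(num_gpus: int, streams_per_gpu: int) -> dict[int, list[tuple[int, int]]]:
    """Group computing resources by microprocessor first, then enumerate the
    compute streams within each microprocessor."""
    return {gpu: [(gpu, s) for s in range(streams_per_gpu)] for gpu in range(num_gpus)}

# One GPU with four streams, matching the FIG. 5 setting discussed later.
resource_groups = group_resources(num_gpus=1, streams_per_gpu=4)
```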
The computer device can score the computational complexity of each operation layer group to obtain a corresponding complexity score, and use the dependency relationships and the complexity scores to compute the computing resource that has a mapping relationship with each operation layer group. The computing resource with the mapping relationship may be a grouped computing resource.
In this embodiment, by classifying the operation layers into corresponding operation layer groups according to the dependency relationships, the operation layer groups can be mapped to the grouped computing resources. Different operation layer groups can be allocated different computing resources, allowing multiple operation layer groups to compute concurrently. This effectively improves the load balance of the computing platform, and the inference engine can effectively improve the computational efficiency of the inference process when using the neural network model. Moreover, the allocation of computing resources can be completed automatically, effectively reducing manual configuration work.
In the traditional approach, when computing resources are allocated to the operation layers of a neural network model, the inference engine is usually not taken into account. The inference process then has to be scheduled across different microprocessors, which causes frequent memory transfers between microprocessors and degrades the computational efficiency of the inference process.
In one of the embodiments, the computer device can access the operation layer groups in the order in which they were generated. The computer device can map the first operation layer group to the first computing resource, and then access the next operation layer group; for brevity, the next operation layer group being accessed is also referred to as the current operation layer group. If another operation layer group that has a dependency relationship with the current operation layer group is computing on one of the computing resources, the current operation layer group is kept waiting until the computation of that related group completes, after which the current operation layer group enters the same computing resource and starts computing. By allocating operation layer groups that have dependency relationships to the same computing resource, the same inference process is effectively prevented from spanning different microprocessors, saving memory transfers between microprocessors and thereby effectively improving the computational efficiency of the inference process when the inference engine executes it.
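A sketch of this placement decision only (the runtime waiting itself is not modeled); when a group depends on several groups, the sketch arbitrarily reuses the resource of the earliest of them, which is an assumption rather than the disclosed behavior.

```python
def assign_groups(num_groups: int,
                  group_deps: dict[int, set[int]],
                  resources: list[tuple[int, int]]) -> dict[int, tuple[int, int]]:
    """Visit operation layer groups in generation order: a group with a
    dependency waits for the depended-on group and then reuses its resource;
    an independent group takes the next idle resource."""
    placement: dict[int, tuple[int, int]] = {}
    idle = list(resources)
    for g in range(num_groups):
        deps = group_deps.get(g, set())
        if deps:
            # Kept waiting until the depended-on group finishes, then the
            # dependent group enters the same (GPU, stream) resource.
            placement[g] = placement[min(deps)]
        else:
            placement[g] = idle.pop(0)  # next unused resource
    return placement

# Example: group 2 depends on groups 0 and 1, so it reuses group 0's resource.
plan = assign_groups(3, {2: {0, 1}}, [(0, 0), (0, 1), (0, 2)])
```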
In one of the embodiments, if the current operation layer group has no dependency relationship with any operation layer group that is currently computing, the subsequent operation layer groups after the current one are examined. If a subsequent operation layer group also has no dependency relationship with the running operation layer groups, the computing resources corresponding to the current operation layer group and the subsequent operation layer group are computed according to the complexity scores.
The complexity score can be computed by the computer device for each operation layer group over multiple dimensions. The computation proceeds as follows: the computer device obtains the multiple dimensions used to score an operation layer group and the weight corresponding to each dimension. The dimensions may include the input size corresponding to an operation layer, the content of the operation layer, the time required to compute the input, and so on, and a corresponding range and score can be preset for each dimension. For each operation layer group, the computer device aggregates the scores and weights corresponding to the dimension ranges of each operation layer to obtain the complexity score of the group. A higher complexity score indicates a more complex and more time-consuming computation; a lower complexity score indicates a simpler and faster computation.
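By way of illustration, the weighted per-dimension scoring could look as follows; the dimension names, ranges, and weights are assumptions chosen for this sketch.

```python
# Hypothetical dimensions with preset (upper bound, score) ranges and weights.
DIMENSIONS = {
    "input_size":   {"weight": 0.5, "ranges": [(1e3, 1), (1e6, 2), (float("inf"), 3)]},
    "compute_time": {"weight": 0.5, "ranges": [(1e-4, 1), (1e-2, 2), (float("inf"), 3)]},
}

def dimension_score(value: float, ranges: list[tuple[float, int]]) -> int:
    """Map a measured value to the preset score of the range it falls into."""
    for upper, score in ranges:
        if value <= upper:
            return score
    return ranges[-1][1]

def complexity_score(group: list[dict[str, float]]) -> float:
    """Sum the weighted per-dimension scores over all layers in a group."""
    return sum(
        DIMENSIONS[dim]["weight"] * dimension_score(layer[dim], DIMENSIONS[dim]["ranges"])
        for layer in group for dim in DIMENSIONS
    )

# Example group of two layers with measured dimension values.
score = complexity_score([
    {"input_size": 5e5, "compute_time": 2e-3},
    {"input_size": 1e2, "compute_time": 5e-5},
])
```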
The computer device compares the complexity score of the current operation layer group with the complexity scores of the operation layer groups currently computing, and compares the complexity scores of the subsequent operation layer groups with those of the operation layer groups currently computing, obtaining multiple comparison results. From these comparison results, the operation layer group whose complexity score is closest to that of an operation layer group currently computing is selected and mapped to a computing resource that is not yet computing. By computing the corresponding computing resources for dependency-free operation layer groups based on the complexity scores, the computation times of the dependency-free groups across all computing resources can be made comparable, so that the computing resources finish computing at approximately the same time. Such synchronized computation can effectively improve the load balance of the computing platform and thereby promote the computational efficiency of the inference process.
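Building on the scoring sketch above, the closest-score selection could be sketched as follows; the group indices and score values are illustrative.

```python
def pick_closest_group(running_score: float, candidate_scores: dict[int, float]) -> int:
    """Among dependency-free candidate groups, pick the one whose complexity
    score is closest to that of the group currently computing."""
    return min(candidate_scores, key=lambda g: abs(candidate_scores[g] - running_score))

# Example: a group with score 4.0 is running; groups 5 and 7 are candidates.
chosen = pick_closest_group(4.0, {5: 3.5, 7: 6.0})  # -> 5, mapped to an idle resource
```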
Taking one microprocessor with four compute streams as an example, a schematic diagram of mapping the operation layers of the neural network model to computing resources is shown in FIG. 5. The operation layers of the neural network model are divided into four operation layer groups in the manner provided in the foregoing embodiments, and each operation layer group is mapped to a corresponding compute stream: operation layer group 0 to compute stream 0, group 1 to stream 1, group 2 to stream 2, and group 3 to stream 3. Operation layer groups 0, 1, and 2 have no dependency relationships with one another, so the inference computation can proceed concurrently on compute streams 0, 1, and 2. Operation layer group 3 takes the results of groups 0, 1, and 2 as input and performs its part of the inference process on compute stream 3. This improves the load balance of the computing platform while effectively improving the computational efficiency of the inference process.
It should be understood that although the steps in the flowcharts of FIG. 2 and FIG. 4 are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 2 and FIG. 4 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time, but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turns or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
In one of the embodiments, as shown in FIG. 6, a computing resource allocation apparatus based on an inference engine is provided, including a resource acquisition module 602, a model invocation module 604, a relationship recognition module 606, a resource mapping module 608, and an inference execution module 610, wherein:
the resource acquisition module 602 is configured to acquire computing resources of a computing platform;
the model invocation module 604 is configured to invoke a neural network model, the neural network model including multiple operation layers;
the relationship recognition module 606 is configured to recognize, through an inference engine, the dependency relationships between the multiple operation layers;
the resource mapping module 608 is configured to map each operation layer to a corresponding computing resource; and
the inference execution module 610 is configured to perform, through the inference engine, an inference process using the neural network model according to the dependency relationships and the mapped computing resources.
In one embodiment, the relationship recognition module 606 is further configured to obtain a configuration file corresponding to the neural network model, and to read from the configuration file the dependency relationships between the multiple operation layers of the neural network model.
In one embodiment, the resource mapping module 608 is further configured to obtain the name corresponding to each operation layer; parse the name according to a preset naming format and identify the computing resource that has a mapping relationship with each operation layer; and allocate each operation layer to the computing resource with the mapping relationship.
In one embodiment, the resource mapping module 608 is further configured to topologically sort the multiple operation layers and search the sorted operation layers in sequence; and generate multiple operation layer groups according to the search results and map the operation layer groups to corresponding computing resources.
In one embodiment, the resource mapping module 608 is further configured to check the dependency relationships between a sorted operation layer and the existing operation layer groups; count the dependency relationships between the sorted operation layer and the existing operation layer groups; and classify the sorted operation layer into the corresponding operation layer group according to the statistical result.
In one embodiment, the resource mapping module 608 is further configured to classify the sorted operation layer into a first independent operation layer group when the statistical result is 0; when the statistical result is 1, the sorted operation layer has a dependency relationship with only one existing operation layer group; and when the statistical result is greater than 1, classify the sorted operation layer into a second independent operation layer group and record the dependency relationships between the second independent operation layer group and the multiple existing operation layer groups.
In one embodiment, the resource mapping module 608 is further configured to obtain the generation order of all operation layer groups when dependency relationships exist between operation layer groups; allocate a corresponding computing resource to each operation layer group according to the generation order; and, while a depended-on operation layer group is running, keep the depending layer group in a waiting state until the computation of the depended-on operation layer group completes, whereupon the depending layer group enters the corresponding computing resource to compute.
In one embodiment, the resource mapping module 608 is further configured to obtain the complexity scores corresponding to the operation layer groups when no dependency relationships exist between them, and to use the complexity scores to compute the computing resource that has a mapping relationship with each operation layer group.
In one embodiment, the resource mapping module 608 is further configured to compare the complexity score of a running operation layer group with the complexity scores of multiple operation layer groups without dependency relationships to obtain multiple comparison results; select from the multiple comparison results the operation layer group whose complexity score is closest to that of the operation layer group currently computing; and map the closest operation layer group to a computing resource that is not computing.
For specific limitations of the inference-engine-based computing resource allocation apparatus, reference may be made to the above limitations of the inference-engine-based computing resource allocation method, which will not be repeated here. Each module of the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, whose internal structure diagram may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store the inference data of the neural network model. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a method for allocating computing resources based on an inference engine.
Those skilled in the art can understand that the structure shown in FIG. 7 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to implement the steps in the above method embodiments.
One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps in the above method embodiments.
A person of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (20)

  1. A method for allocating computing resources based on an inference engine, comprising:
    acquiring computing resources of a computing platform;
    invoking a neural network model, the neural network model comprising multiple operation layers;
    recognizing, through an inference engine, dependency relationships between the multiple operation layers, and mapping each operation layer to a corresponding computing resource; and
    performing, through the inference engine, an inference process using the neural network model according to the dependency relationships and the mapped computing resources.
  2. The method according to claim 1, wherein the recognizing dependency relationships between the multiple operation layers comprises:
    obtaining a configuration file corresponding to the neural network model; and
    reading, from the configuration file, the dependency relationships between the multiple operation layers of the neural network model.
  3. The method according to claim 1, wherein the mapping each operation layer to a corresponding computing resource comprises:
    obtaining a name corresponding to each operation layer;
    parsing the name according to a preset naming format, and identifying the computing resource that has a mapping relationship with each operation layer; and
    allocating each operation layer to the computing resource with the mapping relationship.
  4. The method according to claim 1, wherein the mapping each operation layer to a corresponding computing resource comprises:
    topologically sorting the multiple operation layers, and searching the sorted operation layers in sequence; and
    generating multiple operation layer groups according to the search results, and mapping the multiple operation layer groups to corresponding computing resources.
  5. The method according to claim 4, wherein the searching the sorted operation layers in sequence comprises:
    checking dependency relationships between a sorted operation layer and existing operation layer groups;
    counting the dependency relationships between the sorted operation layer and the existing operation layer groups; and
    classifying the sorted operation layer into a corresponding operation layer group according to the statistical result.
  6. The method according to claim 5, wherein the classifying the sorted operation layer into a corresponding operation layer group according to the statistical result comprises:
    when the statistical result is 0, classifying the sorted operation layer into a first independent operation layer group;
    when the statistical result is 1, the sorted operation layer having a dependency relationship with only one existing operation layer group; and
    when the statistical result is greater than 1, classifying the sorted operation layer into a second independent operation layer group, and recording dependency relationships between the second independent operation layer group and multiple existing operation layer groups.
  7. The method according to claim 4, wherein the mapping the multiple operation layer groups to corresponding computing resources comprises:
    when dependency relationships exist between operation layer groups, obtaining a generation order corresponding to all operation layer groups;
    allocating a corresponding computing resource to each operation layer group according to the generation order; and
    when a depended-on operation layer group is running, keeping the depending layer group in a waiting state until computation of the depended-on operation layer group completes, whereupon the depending layer enters the corresponding computing resource to compute.
  8. The method according to claim 4, wherein the mapping the multiple operation layer groups to corresponding computing resources comprises:
    when no dependency relationships exist between operation layer groups, obtaining complexity scores corresponding to the operation layer groups; and
    computing, using the complexity scores, the computing resource that has a mapping relationship with each operation layer group.
  9. The method according to claim 8, wherein the computing, using the complexity scores, the computing resource that has a mapping relationship with each operation layer group comprises:
    comparing a complexity score of a running operation layer group with complexity scores of multiple operation layer groups without dependency relationships to obtain multiple comparison results;
    selecting, from the multiple comparison results, an operation layer group whose complexity score is closest to that of the operation layer group currently computing; and
    mapping the closest operation layer group to a computing resource that is not computing.
  10. A computing resource allocation apparatus based on an inference engine, comprising:
    a resource acquisition module, configured to acquire computing resources of a computing platform;
    a model invocation module, configured to invoke a neural network model, the neural network model comprising multiple operation layers;
    a relationship recognition module, configured to recognize, through an inference engine, dependency relationships between the multiple operation layers;
    a resource mapping module, configured to map each operation layer to a corresponding computing resource; and
    an inference execution module, configured to perform, through the inference engine, an inference process using the neural network model according to the dependency relationships and the mapped computing resources.
  11. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    acquiring computing resources of a computing platform;
    invoking a neural network model, the neural network model comprising multiple operation layers;
    recognizing, through an inference engine, dependency relationships between the multiple operation layers, and mapping each operation layer to a corresponding computing resource; and
    performing, through the inference engine, an inference process using the neural network model according to the dependency relationships and the mapped computing resources.
  12. The computer device according to claim 11, wherein the one or more processors further perform the following steps:
    obtaining a configuration file corresponding to the neural network model; and
    reading, from the configuration file, the dependency relationships between the multiple operation layers of the neural network model.
  13. The computer device according to claim 11, wherein the one or more processors further perform the following steps:
    obtaining a name corresponding to each operation layer;
    parsing the name according to a preset naming format, and identifying the computing resource that has a mapping relationship with each operation layer; and
    allocating each operation layer to the computing resource with the mapping relationship.
  14. The computer device according to claim 11, wherein the one or more processors further perform the following steps:
    topologically sorting the multiple operation layers, and searching the sorted operation layers in sequence; and
    generating multiple operation layer groups according to the search results, and mapping the multiple operation layer groups to corresponding computing resources.
  15. The computer device according to claim 14, wherein the one or more processors further perform the following steps:
    checking dependency relationships between a sorted operation layer and existing operation layer groups;
    counting the dependency relationships between the sorted operation layer and the existing operation layer groups; and
    classifying the sorted operation layer into a corresponding operation layer group according to the statistical result.
  16. The computer device according to claim 15, wherein the one or more processors further perform the following steps:
    when the statistical result is 0, classifying the sorted operation layer into a first independent operation layer group;
    when the statistical result is 1, the sorted operation layer having a dependency relationship with only one existing operation layer group; and
    when the statistical result is greater than 1, classifying the sorted operation layer into a second independent operation layer group, and recording dependency relationships between the second independent operation layer group and multiple existing operation layer groups.
  17. The computer device according to claim 14, wherein the one or more processors further perform the following steps:
    when dependency relationships exist between operation layer groups, obtaining a generation order corresponding to all operation layer groups;
    allocating a corresponding computing resource to each operation layer group according to the generation order; and
    when a depended-on operation layer group is running, keeping the depending layer group in a waiting state until computation of the depended-on operation layer group completes, whereupon the depending layer enters the corresponding computing resource to compute.
  18. The computer device according to claim 14, wherein the one or more processors further perform the following steps:
    when no dependency relationships exist between operation layer groups, obtaining complexity scores corresponding to the operation layer groups; and
    computing, using the complexity scores, the computing resource that has a mapping relationship with each operation layer group.
  19. The computer device according to claim 18, wherein the one or more processors further perform the following steps:
    comparing a complexity score of a running operation layer group with complexity scores of multiple operation layer groups without dependency relationships to obtain multiple comparison results;
    selecting, from the multiple comparison results, an operation layer group whose complexity score is closest to that of the operation layer group currently computing; and
    mapping the closest operation layer group to a computing resource that is not computing.
  20. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of any one of claims 1 to 9.