WO2021134231A1 - Computing resource allocation method and apparatus based on inference engine, and computer device - Google Patents


Info

Publication number
WO2021134231A1
Authority
WO
WIPO (PCT)
Prior art keywords
operation layer
computing resources
layer group
layers
neural network
Application number
PCT/CN2019/129973
Other languages
French (fr)
Chinese (zh)
Inventor
庄奇
Original Assignee
深圳元戎启行科技有限公司
Application filed by 深圳元戎启行科技有限公司
Priority to PCT/CN2019/129973
Priority to CN201980037488.6A
Publication of WO2021134231A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Definitions

  • This application relates to a method, device, computer equipment and storage medium for allocating computing resources based on an inference engine.
  • As a research direction in the field of artificial intelligence, deep learning technology has been applied in many areas such as speech recognition, image recognition, and natural language processing. With the development of deep learning technology, the scale of the underlying neural network models has grown larger and larger: the models contain more operation layers, and the links between operation layers have become correspondingly more complicated.
  • An inference engine can use the neural network model together with the hardware computing resources of the computing platform (hereinafter referred to as computing resources) to realize the inference function. If the load of the computing resources is poorly balanced, some computing resources may be overused while others sit idle, which greatly affects the computing efficiency of the inference process. Therefore, for large-scale neural network models, how to effectively improve the computational efficiency of the inference process by improving the load balance of computing resources has become a technical problem that needs to be solved.
  • According to various embodiments disclosed in this application, a method, apparatus, computer device, and storage medium for allocating computing resources based on an inference engine are provided.
  • A method for allocating computing resources based on an inference engine includes:
  • obtaining computing resources of a computing platform;
  • calling a neural network model, the neural network model including multiple operation layers;
  • identifying the dependency relationships between the multiple operation layers through an inference engine, and mapping each operation layer to a corresponding computing resource; and
  • performing the inference process using the neural network model through the inference engine according to the dependency relationships and the mapped computing resources.
  • An apparatus for allocating computing resources based on an inference engine includes:
  • a resource acquisition module, used to obtain computing resources of a computing platform;
  • a model calling module, used to call a neural network model, the neural network model including multiple operation layers;
  • a relationship recognition module, used to identify the dependency relationships between the multiple operation layers through the inference engine;
  • a resource mapping module, used to map each operation layer to a corresponding computing resource; and
  • an inference execution module, used to perform the inference process using the neural network model through the inference engine according to the dependency relationships and the mapped computing resources.
  • A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
  • obtaining computing resources of a computing platform;
  • calling a neural network model, the neural network model including multiple operation layers;
  • identifying the dependency relationships between the multiple operation layers through the inference engine, and mapping each operation layer to a corresponding computing resource; and
  • performing the inference process using the neural network model through the inference engine according to the dependency relationships and the mapped computing resources.
  • One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • obtaining computing resources of a computing platform;
  • calling a neural network model, the neural network model including multiple operation layers;
  • identifying the dependency relationships between the multiple operation layers through the inference engine, and mapping each operation layer to a corresponding computing resource; and
  • performing the inference process using the neural network model through the inference engine according to the dependency relationships and the mapped computing resources.
  • Fig. 1 is an application scenario diagram of a computing resource allocation method based on an inference engine according to one or more embodiments.
  • Fig. 2 is a schematic flowchart of a method for allocating computing resources based on an inference engine according to one or more embodiments.
  • Fig. 3 is a schematic diagram of the dependency relationships and naming between the operation layers in a neural network model according to one or more embodiments.
  • Fig. 4 is a schematic flowchart of the steps of mapping each operation layer to a corresponding computing resource in an embodiment.
  • Fig. 5 is a schematic diagram of mapping the operation layers of a neural network model to computing resources according to one or more embodiments.
  • Fig. 6 is a block diagram of an apparatus for allocating computing resources based on an inference engine according to one or more embodiments.
  • Fig. 7 is a block diagram of a computer device according to one or more embodiments.
  • In one embodiment, the method for allocating computing resources based on an inference engine provided in this application can be applied in the field of autonomous driving, where the neural network model can include at least one of an image recognition model, a behavior prediction model, or a risk assessment model.
  • For example, the neural network model may be an image recognition model, and the method provided in this application may be applied in the application environment shown in Fig. 1.
  • The autonomous vehicle may include a sensor 102 and a computer device 104, and the sensor 102 can communicate with the computer device 104.
  • The sensor 102 can collect images of the environment within its visual range. For example, when the autonomous vehicle drives up to an intersection, the sensor 102 can collect traffic signal light images.
  • The computer device 104 performs image recognition on the signal light image collected by the sensor 102 and judges the color of the signal light in the image. Specifically, the computer device 104 can obtain multiple computing resources and call a neural network model.
  • The neural network model includes multiple operation layers.
  • Through the inference engine, the computer device 104 identifies the dependency relationships between the multiple operation layers and maps each operation layer to a corresponding computing resource.
  • According to the dependency relationships and the mapped computing resources, the computer device 104 performs the inference process using the neural network model through the inference engine to obtain the color of the signal light in the image.
  • It can be understood that the inference method provided in this application implements inference with a neural network model and can be applied in a variety of application environments, and the neural network model can be of multiple types.
  • For example, the neural network model may include a convolutional neural network model, a recurrent neural network model, and a recursive neural network model.
  • The neural network model can be used to process a variety of different data.
  • For example, the neural network model may include an image recognition model, a feature extraction model, a speech recognition model, a text recognition model, and a scene classification model.
  • In one embodiment, a method for allocating computing resources based on an inference engine is provided. Taking the method as applied to the computer device in Fig. 1 as an example, it includes the following steps:
  • Step 202: Obtain computing resources of the computing platform.
  • The computing platform may be a platform used by a computer device to perform automatic control operations.
  • The computer device may be a stand-alone device, such as a vehicle-mounted computer device.
  • The computing platform has corresponding computing resources. The computing resources include multiple microprocessors, and each microprocessor includes multiple computation streams.
  • The computer device may read the computing resources corresponding to a preset condition; the computing resources read in this way may be only part of the computing resources of the computing platform.
  • The computer device may also read all the computing resources of the computing platform.
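
As a minimal sketch of this resource model (the structure, names, and counts below are illustrative assumptions, not part of the patent), the platform's resources can be represented as microprocessors that each expose several computation streams:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComputeStream:
    gpu_id: int
    stream_id: int

@dataclass
class Microprocessor:
    gpu_id: int
    streams: List[ComputeStream] = field(default_factory=list)

def read_computing_resources(num_gpus: int = 1, streams_per_gpu: int = 4) -> List[Microprocessor]:
    """Read part (or all) of the platform's resources; the counts stand in
    for the preset reading condition mentioned above."""
    return [
        Microprocessor(g, [ComputeStream(g, s) for s in range(streams_per_gpu)])
        for g in range(num_gpus)
    ]
```
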
  • Step 204: Call the neural network model; the neural network model includes multiple operation layers.
  • The neural network model is pre-stored in the computer device, and it can be pre-trained.
  • The neural network model can be used by the inference engine to implement the inference process on the computing platform.
  • The neural network model includes multiple operation layers.
  • When facing different business requirements, the inference engine can use different neural network models to realize the corresponding inference processes. For example, when performing image recognition, a neural network model related to image recognition can be used for the inference computation; when performing natural language processing, a neural network model related to natural language processing can be used.
  • Step 206: Identify the dependency relationships between the multiple operation layers through the inference engine, and map each operation layer to a corresponding computing resource.
  • The multiple operation layers of the neural network model have dependency relationships between them. A dependency means that the input of one operation layer depends on the output of other operation layers.
  • The two operation layers in a dependency relationship can be called the dependent layer and the depended-on layer, respectively.
  • The output of the depended-on layer forms the input of the dependent layer.
  • The current dependent layer can in turn be the depended-on layer of other operation layers.
  • The depended-on layers corresponding to a dependent layer can be one layer or multiple layers.
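
A minimal sketch of this structure (layer names are hypothetical; they mirror the Fig. 3 example discussed below):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class OperationLayer:
    name: str
    inputs: List[str] = field(default_factory=list)  # names of the depended-on layers

def depended_on_layers(layer: OperationLayer, model: Dict[str, OperationLayer]) -> List[OperationLayer]:
    """The layers whose outputs form this layer's input (zero, one, or several)."""
    return [model[n] for n in layer.inputs]

# In Fig. 3, operation layer 6 depends on operation layers 2 and 5.
model = {
    "layer2": OperationLayer("layer2"),
    "layer5": OperationLayer("layer5"),
    "layer6": OperationLayer("layer6", inputs=["layer2", "layer5"]),
}
assert [l.name for l in depended_on_layers(model["layer6"], model)] == ["layer2", "layer5"]
```
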
  • An inference engine is installed in the computer device.
  • Through the inference engine, the dependency relationship between any two operation layers in the neural network can be identified, and each operation layer can be mapped to its corresponding computing resource.
  • After identifying the dependency relationships between the operation layers, the computer device can also use the inference engine to divide the operation layers with dependency relationships into corresponding operation layer groups and map each operation layer group to a corresponding computing resource.
  • The dependency relationships between the operation layers can be pre-configured, or they can be obtained by searching and analyzing the operation layers.
  • Step 208: Perform the inference process using the neural network model through the inference engine, according to the dependency relationships and the mapped computing resources.
  • The computer device assigns corresponding computing resources to the operation layers of the neural network model, so that each operation layer is mapped to a corresponding microprocessor, or to a computation stream of the corresponding microprocessor.
  • Operation layers with dependency relationships can be mapped to the same microprocessor.
  • Operation layers without dependency relationships can be mapped to different microprocessors. In this way, the computing resources for the multiple operation layers of the neural network model are allocated reasonably and effectively.
  • The inference engine can then perform the inference process using the neural network model and the computing resources allocated to each operation layer.
  • In this embodiment, by identifying the dependency relationships between the multiple operation layers in the neural network model, each operation layer is mapped to a corresponding computing resource of the computing platform. Computing resources are thereby allocated reasonably, effectively improving the load balance between them, so that when the inference engine performs the inference process using the neural network model and the allocated computing resources, the computational efficiency of the inference process is effectively improved.
  • In one embodiment, the computer device may obtain a configuration file corresponding to the neural network model and read from it the dependency relationships between the multiple operation layers of the model.
  • The configuration file may be configured by the user in advance according to the structure of the neural network model.
  • The configuration file can record the next operation layer corresponding to each operation layer, that is, the dependency relationships between the operation layers.
  • The configuration file can also record the name of each operation layer.
  • The computer device can map each operation layer to its corresponding computing resource according to the operation layer's name, as in the sketch below.
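
A hypothetical configuration of this kind, written as a Python literal (the schema and layer names are illustrative assumptions; the resource encoding follows the Fig. 3 layout described below):

```python
# Each entry records a layer's name (which encodes its target GPU and stream)
# and its next layer(s), i.e. the dependency links between operation layers.
model_config = {
    "layers": [
        {"name": "layer0_gpu0_stream0", "next": ["layer1_gpu0_stream0"]},
        {"name": "layer1_gpu0_stream0", "next": ["layer2_gpu0_stream0"]},
        {"name": "layer2_gpu0_stream0", "next": ["layer6_gpu0_stream2"]},
        {"name": "layer3_gpu0_stream1", "next": ["layer4_gpu0_stream1"]},
        {"name": "layer4_gpu0_stream1", "next": ["layer5_gpu0_stream1"]},
        {"name": "layer5_gpu0_stream1", "next": ["layer6_gpu0_stream2"]},
        {"name": "layer6_gpu0_stream2", "next": []},
    ]
}
```
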
  • In one embodiment, mapping each operation layer to the corresponding computing resource includes: obtaining the name corresponding to each operation layer; parsing the name according to a preset naming format to identify the computing resource that has a mapping relationship with each operation layer; and allocating each operation layer to the computing resource with the mapping relationship.
  • In this embodiment, when creating the neural network model, the dependency relationships between operation layers can be defined according to the structure of the model. A corresponding computing resource can also be specified for each operation layer through the layer's name. The names of the operation layers can be recorded in the corresponding configuration file.
  • When the inference engine needs to perform the inference process, it can read the dependency relationships between the operation layers and the name of each operation layer from the configuration file.
  • The inference engine can obtain the preset naming format, parse the name of each operation layer according to that format, identify the computing resource that has a mapping relationship with each operation layer, and allocate each operation layer to that computing resource.
  • The computing resources include microprocessors, or microprocessors together with the computation streams within the microprocessors.
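
The patent does not disclose the naming format itself, so the format below is an assumption for illustration; the sketch parses a name of the form `<layer>_gpu<g>_stream<s>` into the resource it designates:

```python
import re

# Assumed naming format (hypothetical): "<layer>_gpu<g>_stream<s>".
NAME_PATTERN = re.compile(r"^(?P<layer>\w+?)_gpu(?P<gpu>\d+)_stream(?P<stream>\d+)$")

def resolve_mapping(layer_name: str) -> tuple:
    """Parse a layer name and return the (gpu, stream) it is mapped to."""
    m = NAME_PATTERN.match(layer_name)
    if m is None:
        raise ValueError(f"name does not follow the preset format: {layer_name}")
    return int(m.group("gpu")), int(m.group("stream"))

# Operation layer 6 of Fig. 3 is mapped to stream 2 of GPU0.
assert resolve_mapping("layer6_gpu0_stream2") == (0, 2)
```
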
  • A schematic diagram of the dependency relationships and naming between the operation layers in the neural network model is shown in Fig. 3.
  • The arrows between the operation layers indicate the dependencies between them.
  • The depended-on layers corresponding to a dependent layer can be one layer or multiple layers.
  • In Fig. 3, the depended-on layer of operation layer 2 is operation layer 1, and the depended-on layers of operation layer 6 are operation layer 2 and operation layer 5.
  • The name of each operation layer encodes the operation layer, the microprocessor, and the computation stream. Operation layer 0, operation layer 1, and operation layer 2, which have dependency relationships, can be mapped to computation stream 0 of the microprocessor GPU0.
  • Operation layer 3, operation layer 4, and operation layer 5, which have dependency relationships, can be mapped to computation stream 1 of the microprocessor GPU0, and operation layer 6 can be mapped to computation stream 2 of the microprocessor GPU0.
  • That is, in Fig. 3, different operation layers can be mapped to the same microprocessor, operation layers with dependency relationships can be mapped to the same computation stream of that microprocessor, and operation layers with different dependency relationships can be mapped to different computation streams of the same microprocessor.
  • In this embodiment, better load balancing can be achieved by configuring each operation layer of the neural network model in this way.
  • The inference process can thus be reasonably allocated to the corresponding computing resources for processing, effectively improving the computational efficiency of the inference process.
  • In one embodiment, as shown in Fig. 4, the step of mapping each operation layer to a corresponding computing resource includes:
  • Step 402: Topologically sort the multiple operation layers, and search the sorted operation layers in sequence.
  • Step 404: Generate multiple operation layer groups according to the search results, and map the operation layer groups to corresponding computing resources.
  • The computer device topologically sorts all the operation layers of the neural network model, where the sorting can be based on the input-output relationships between the operation layers.
  • The sorted operation layers are then searched in sequence according to the order of inputs and outputs. Through the search, operation layers with dependency relationships can be classified into the same operation layer group.
  • Since the outputs of input layers and constant layers do not depend on other operation layers, when an input layer or a constant layer is encountered, it can be skipped and the search proceeds directly to the next layer.
  • When the next layer is an operation layer that is not an input layer or a constant layer (which can also be called an operation layer with an output dependency), its dependency relationship with each existing operation layer group is checked.
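
The patent does not prescribe a particular sorting algorithm; a standard Kahn-style topological sort over the layer graph would fit this step, as a sketch:

```python
from collections import deque
from typing import Dict, List

def topological_sort(inputs: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm; `inputs` maps each layer name to the names of the
    layers whose outputs feed it (empty for input/constant layers)."""
    indegree = {name: len(deps) for name, deps in inputs.items()}
    consumers = {name: [] for name in inputs}
    for name, deps in inputs.items():
        for dep in deps:
            consumers[dep].append(name)
    queue = deque(n for n, d in indegree.items() if d == 0)  # input/constant layers first
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in consumers[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return order
```
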
  • In one embodiment, searching the sorted operation layers in sequence includes: checking the dependency relationships between each sorted operation layer and the existing operation layer groups; counting the dependency relationships between the sorted operation layer and the existing operation layer groups; and classifying the sorted operation layer into the corresponding operation layer group according to the count.
  • Specifically, when the first operation layer with an output dependency (referred to below simply as a sorted operation layer) is searched, no operation layer group exists yet, and the first sorted operation layer can be counted into a first operation layer group.
  • When the second sorted operation layer is searched, check whether it has a dependency relationship with the first operation layer group. If so, record the dependency relationship between the second sorted operation layer and the first operation layer group; otherwise, count the second sorted operation layer into a second operation layer group.
  • By analogy, each subsequent sorted operation layer is compared with the existing operation layer groups to check whether a dependency relationship exists. There are many ways to check for a dependency relationship; for example, when searching a layer, all of its inputs can be searched forward, and if an operation layer that directly or indirectly feeds this layer's input is found, a dependency relationship exists.
  • An operation layer group includes at least one operation layer.
  • When an operation layer group includes two or more operation layers, if the operation layer currently being searched has a dependency relationship with any one of the operation layers in the group, the operation layer currently being searched has a dependency relationship with that operation layer group, and the relationship is recorded.
  • The computer device classifies the sorted operation layer into the corresponding operation layer group according to the count, as follows. If the count is 0, the sorted operation layer has no dependency relationship with any operation layer group, and it is classified into a first independent operation layer group. If the count is 1, the sorted operation layer has a dependency relationship with only one existing operation layer group, and it is classified into that group. If the count is greater than 1, the sorted operation layer has dependency relationships with multiple existing operation layer groups; it then does not belong to any of them, and it is classified into a second independent operation layer group, and the dependency relationships between the second independent operation layer group and the multiple existing operation layer groups are recorded.
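
A minimal sketch of this grouping rule (it assumes the sorted list already excludes input and constant layers, and that the "count is 1" branch joins the existing group, which is how the preceding description reads):

```python
from typing import Dict, List, Set, Tuple

def group_layers(sorted_layers: List[str],
                 inputs: Dict[str, List[str]]) -> Tuple[List[Set[str]], Dict[int, Set[int]]]:
    """Count each layer's dependencies on existing groups: 0 -> new independent
    group, 1 -> join that group, >1 -> new independent group whose inter-group
    dependencies are recorded."""
    groups: List[Set[str]] = []            # each group is a set of layer names
    group_deps: Dict[int, Set[int]] = {}   # group index -> groups it depends on
    layer_group: Dict[str, int] = {}       # layer name -> group index
    for layer in sorted_layers:
        hit = {layer_group[d] for d in inputs[layer] if d in layer_group}
        if len(hit) == 1:
            gid = hit.pop()
            groups[gid].add(layer)
        else:
            gid = len(groups)
            groups.append({layer})
            group_deps[gid] = hit          # empty set when the count is 0
        layer_group[layer] = gid
    return groups, group_deps
```
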
  • In one embodiment, the computer device can group the computing resources of the computing platform.
  • The computer device can group all available computing resources in the computing platform, or it can obtain a preset number of microprocessors, and of computation streams within the microprocessors, according to preset resource demand conditions.
  • The computer device can group preferentially by microprocessor.
  • The computer device can also group the computation streams within a microprocessor.
  • Alternatively, the computer device first groups the microprocessors and then groups the computation streams within each microprocessor.
  • The computer device can score the computational complexity of each operation layer group to obtain a corresponding complexity score, and use the dependency relationships and the complexity scores to determine the computing resource that has a mapping relationship with each operation layer group.
  • The computing resources with mapping relationships may be the grouped computing resources.
  • The operation layer groups can then be mapped to the grouped computing resources.
  • Different operation layer groups can be allocated different computing resources, which enables multiple operation layer groups to run simultaneously. This effectively improves the load balance of the computing platform, so that the inference engine can effectively improve the computational efficiency of the inference process when using the neural network model.
  • Moreover, the allocation of computing resources can be completed automatically, effectively reducing manual configuration work.
  • In one embodiment, the computer device can visit the operation layer groups in the order in which they were generated.
  • The computer device can map the first operation layer group to the first computing resource.
  • It then visits the next operation layer group; for simplicity of description, the next operation layer group to be visited is also referred to as the current operation layer group. If another operation layer group that has a dependency relationship with the current operation layer group is performing operations on one of the computing resources, the current operation layer group is kept in a waiting state until that operation layer group completes its operation, after which the current operation layer group enters the same computing resource and starts computing.
  • In this way, a single inference process is effectively prevented from being spread across different microprocessors, saving memory transfers between microprocessors and thereby effectively improving the computational efficiency of the inference process when the inference engine executes it.
  • The subsequent operation layer groups after the current operation layer group are then examined. If a subsequent operation layer group has no dependency relationship with any running operation layer group, the computing resources corresponding to the current and subsequent operation layer groups are determined according to their complexity scores.
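
One plausible reading of this placement policy, as a sketch: a group is placed on its depended-on group's resource only when there is exactly one such group, which is consistent with the Fig. 5 layout below. That refinement is an assumption, and the complexity-score selection is sketched separately afterwards.

```python
from typing import Dict, List, Set

def assign_resources(group_order: List[int],
                     group_deps: Dict[int, Set[int]],
                     resources: List[str]) -> Dict[int, str]:
    """Greedy placement: a group with exactly one depended-on group waits for
    it and reuses its resource (avoiding cross-processor memory transfers);
    other groups are spread across the remaining resources."""
    placement: Dict[int, str] = {}
    next_free = 0
    for gid in group_order:
        deps = group_deps.get(gid, set())
        if len(deps) == 1:
            placement[gid] = placement[next(iter(deps))]
        else:
            placement[gid] = resources[next_free % len(resources)]
            next_free += 1
    return placement
```
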
  • The complexity score may be calculated by the computer device across multiple dimensions for each operation layer group.
  • The calculation process of the complexity score includes: the computer device obtains multiple dimensions for scoring the operation layer group, together with the weight corresponding to each dimension. The dimensions can include the input size corresponding to the operation layer, the content of the operation layer, and the time required to compute on the input.
  • A corresponding range and score can be preset for each dimension.
  • For each operation layer group, the computer device aggregates the scores and weights corresponding to the range each operation layer falls into on each dimension, obtaining the complexity score of the operation layer group. The higher the complexity score, the more complicated the calculation process and the longer the calculation time; the lower the complexity score, the simpler the calculation process and the shorter the calculation time.
  • The computer device compares the complexity score of the current operation layer group, and of each subsequent operation layer group, with the complexity score of the operation layer group currently computing, obtaining multiple comparison results. From these comparison results, the operation layer group whose complexity score is closest to that of the computing operation layer group is selected and mapped to a computing resource that is not yet running anything. Using complexity scores to determine the computing resources for operation layer groups without dependency relationships makes the running times of those groups roughly equivalent across all computing resources, so that each computing resource finishes its operations at about the same time. Such synchronized operation effectively improves the load balance of the computing platform and thereby promotes the computational efficiency of the inference process.
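
A sketch of the scoring and selection (the dimensions, weights, and numbers are hypothetical):

```python
from typing import Dict

def complexity_score(dim_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over the preset scoring dimensions."""
    return sum(weights[d] * s for d, s in dim_scores.items())

def pick_closest_group(candidate_scores: Dict[int, float], running_score: float) -> int:
    """Among groups with no pending dependencies, pick the one whose score is
    closest to the group currently computing, so resources finish together."""
    return min(candidate_scores, key=lambda g: abs(candidate_scores[g] - running_score))

# Hypothetical dimensions and weights.
weights = {"input_size": 0.5, "operator_content": 0.3, "compute_time": 0.2}
running = complexity_score({"input_size": 8, "operator_content": 6, "compute_time": 9}, weights)  # 7.6
assert pick_closest_group({1: 7.1, 2: 7.6}, running) == 2
```
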
  • A mapping of the operation layers of a neural network model to computing resources is shown in Fig. 5.
  • The operation layers of the neural network model are divided into four operation layer groups in the manner provided in the foregoing embodiments, and each operation layer group is mapped to a corresponding computation stream:
  • operation layer group 0 is mapped to computation stream 0;
  • operation layer group 1 is mapped to computation stream 1;
  • operation layer group 2 is mapped to computation stream 2; and
  • operation layer group 3 is mapped to computation stream 3.
  • Operation layer groups 0, 1, and 2 have no dependency relationships with one another.
  • Computation streams 0, 1, and 2 can therefore perform the calculations of the inference process synchronously, while operation layer group 3 takes the operation results of operation layer groups 0, 1, and 2 as its input and performs its part of the inference process in computation stream 3.
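
For illustration, running the grouping and placement sketches above on a graph shaped like Fig. 5 reproduces this layout (layer names are hypothetical; a0, b0, and c0 stand for compute layers fed directly by the model input, with the input layer itself omitted):

```python
# Three independent chains feeding one final layer, as in Fig. 5.
inputs = {
    "a0": [], "a1": ["a0"],      # becomes operation layer group 0
    "b0": [], "b1": ["b0"],      # becomes operation layer group 1
    "c0": [], "c1": ["c0"],      # becomes operation layer group 2
    "d":  ["a1", "b1", "c1"],    # becomes operation layer group 3
}
groups, deps = group_layers(topological_sort(inputs), inputs)
streams = ["stream0", "stream1", "stream2", "stream3"]
placement = assign_resources(list(range(len(groups))), deps, streams)
assert deps[3] == {0, 1, 2}
assert placement == {0: "stream0", 1: "stream1", 2: "stream2", 3: "stream3"}
```
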
  • In one embodiment, a computing resource allocation apparatus based on an inference engine is provided, including: a resource acquisition module 602, a model calling module 604, a relationship recognition module 606, a resource mapping module 608, and an inference execution module 610, where:
  • the resource acquisition module 602 is used to obtain computing resources of the computing platform;
  • the model calling module 604 is used to call a neural network model, the neural network model including multiple operation layers;
  • the relationship recognition module 606 is used to identify the dependency relationships between the multiple operation layers through the inference engine;
  • the resource mapping module 608 is used to map each operation layer to a corresponding computing resource; and
  • the inference execution module 610 is used to perform the inference process using the neural network model through the inference engine according to the dependency relationships and the mapped computing resources.
  • In one embodiment, the relationship recognition module 606 is also used to obtain a configuration file corresponding to the neural network model, and to read from the configuration file the dependency relationships between the multiple operation layers of the neural network model.
  • In one embodiment, the resource mapping module 608 is also used to obtain the name corresponding to each operation layer; to parse the name according to a preset naming format, identifying the computing resource that has a mapping relationship with each operation layer; and to allocate each operation layer to the computing resource with the mapping relationship.
  • In one embodiment, the resource mapping module 608 is also used to topologically sort the multiple operation layers and search the sorted operation layers in sequence; and to generate multiple operation layer groups according to the search results and map the operation layer groups to corresponding computing resources.
  • In one embodiment, the resource mapping module 608 is also used to check the dependency relationships between each sorted operation layer and the existing operation layer groups; to count the dependency relationships between the sorted operation layer and the existing operation layer groups; and to classify the sorted operation layer into the corresponding operation layer group according to the count.
  • In one embodiment, the resource mapping module 608 is further configured to classify the sorted operation layer into the first independent operation layer group when the count is 0; to classify the sorted operation layer into the only existing operation layer group with which it has a dependency relationship when the count is 1; and, when the count is greater than 1, to classify the sorted operation layer into the second independent operation layer group and record the dependency relationships between the second independent operation layer group and the multiple existing operation layer groups.
  • In one embodiment, the resource mapping module 608 is further configured to obtain the generation order of all the operation layer groups when dependency relationships exist between operation layer groups; to allocate corresponding computing resources to each operation layer group according to the generation order; and, while a depended-on operation layer group is running, to keep the dependent layer group in a waiting state until the depended-on operation layer group completes its operation, after which the dependent layer group enters the corresponding computing resource to perform its operation.
  • In one embodiment, the resource mapping module 608 is further configured to obtain the complexity score corresponding to each operation layer group when no dependency relationships exist between the operation layer groups, and to use the complexity scores to determine the computing resource that has a mapping relationship with each operation layer group.
  • In one embodiment, the resource mapping module 608 is also used to compare the complexity score of the running operation layer group with the complexity scores of the multiple operation layer groups that have no dependency relationships, obtaining multiple comparison results; to select from these the operation layer group whose complexity score is closest to that of the operation layer group currently computing; and to map that closest operation layer group to a computing resource on which no computation has started.
  • The modules in the foregoing inference-engine-based computing resource allocation apparatus can be implemented in whole or in part by software, hardware, or combinations thereof.
  • The above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
  • In one embodiment, a computer device is provided, and its internal structure may be as shown in Fig. 7.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • The processor of the computer device is used to provide calculation and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • The internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium.
  • The database of the computer device is used to store the inference data of the neural network model.
  • The network interface of the computer device is used to communicate with external terminals through a network connection.
  • Those skilled in the art can understand that the structure shown in Fig. 7 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied.
  • A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • In one embodiment, a computer device is provided, including a memory and one or more processors.
  • The memory stores computer-readable instructions.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors execute the steps of the above method embodiments.
  • In one embodiment, one or more non-volatile computer-readable storage media storing computer-readable instructions are provided.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the above method embodiments.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).


Abstract

Provided is a computing resource allocation method based on an inference engine, the method comprising: acquiring computing resources of a computing platform (202); calling a neural network model, wherein the neural network model comprises a plurality of operation layers (204); identifying the dependency relationships between the plurality of operation layers by means of an inference engine, and mapping each operation layer to a corresponding computing resource (206); and, according to the dependency relationships and the mapped computing resources, using the neural network model to perform an inference process by means of the inference engine (208).

Description

基于推理引擎的计算资源分配方法、装置和计算机设备Calculation resource allocation method, device and computer equipment based on reasoning engine 技术领域Technical field
本申请涉及一种基于推理引擎的计算资源分配方法、装置、计算机设备和存储介质。This application relates to a method, device, computer equipment and storage medium for allocating computing resources based on an inference engine.
背景技术Background technique
深度学习技术作为人工智能领域的研究方向,在语音识别、图像识别、自然语言处理等多个方面得以应用。随着深度学习技术的发展,深度学习技术的基础神经网络模型的规模也变得越大巨大。神经网络模型的操作层变得更多,操作层与操作层之间的链接也随之变得更加复杂。As a research direction in the field of artificial intelligence, deep learning technology has been applied in many aspects such as speech recognition, image recognition, and natural language processing. With the development of deep learning technology, the scale of the basic neural network model of deep learning technology has also become larger and larger. The operation layer of the neural network model becomes more, and the link between the operation layer and the operation layer also becomes more complicated.
推理引擎可以利用神经网络模型与计算平台的硬件计算资源(下文简称为计算资源)实现推理功能。如果计算资源的负载均衡做的不好,部分计算资源可能被过度使用,也可能被闲置,由此会导致推理过程的计算效率受到较大影响。因此,在面对大型的神经网络模型时,如何通过改善计算资源的负载均衡有效提高推理过程的计算效率成为目前需要解决的一个技术问题。The inference engine can use the neural network model and the hardware computing resources of the computing platform (hereinafter referred to as computing resources) to realize the inference function. If the load balancing of computing resources is not done well, some computing resources may be overused or left unused, which will greatly affect the computing efficiency of the inference process. Therefore, in the face of large-scale neural network models, how to effectively improve the computational efficiency of the inference process by improving the load balance of computational resources has become a technical problem that needs to be solved at present.
发明内容Summary of the invention
根据本申请公开的各种实施例,提供一种基于推理引擎的计算资源分配方法、装置、计算机设备和存储介质。According to various embodiments disclosed in the present application, a method, device, computer device, and storage medium for allocating computing resources based on an inference engine are provided.
一种基于推理引擎的计算资源分配方法,包括:A method for allocating computing resources based on an inference engine, including:
获取计算平台的计算资源;Obtain the computing resources of the computing platform;
调用神经网络模型,所述神经网络模型包括多个操作层;Calling a neural network model, the neural network model including multiple operation layers;
通过推理引擎识别所述多个操作层之间的依赖关系,将每个操作层映射至对应的计算资源;及Identify the dependency between the multiple operation layers through the inference engine, and map each operation layer to a corresponding computing resource; and
根据所述依赖关系以及所映射的计算资源,通过所述推理引擎利用所述神经网络模型进行推理过程。According to the dependency relationship and the mapped computing resources, the neural network model is used to perform the inference process through the inference engine.
一种基于推理引擎的计算资源分配装置,包括:A computing resource allocation device based on an inference engine includes:
资源获取模块,用于获取计算平台的计算资源;The resource acquisition module is used to acquire the computing resources of the computing platform;
模型调用模块,用于调用神经网络模型,所述神经网络模型包括多个操作层;The model calling module is used to call a neural network model, the neural network model including multiple operation layers;
关系识别模块,用于通过推理引擎识别所述多个操作层之间的依赖关系;The relationship recognition module is used to recognize the dependency relationship between the multiple operation layers through the inference engine;
资源映射模块,用于将每个操作层映射至对应的计算资源;及The resource mapping module is used to map each operation layer to the corresponding computing resource; and
推理执行模块,用于根据所述依赖关系以及所映射的计算资源,通过所述推理引擎利用所述神经网络模型进行推理过程。The reasoning execution module is used to perform the reasoning process using the neural network model through the reasoning engine according to the dependency relationship and the mapped computing resources.
一种计算机设备,包括存储器及一个或多个处理器,所述存储器中储存有计算机可读指令,所述计算机可读指令被所述一个或多个处理器执行时,使得所述一个或多个处理器执行以下步骤:A computer device includes a memory and one or more processors. The memory stores computer-readable instructions. When the computer-readable instructions are executed by the one or more processors, the one or more Each processor performs the following steps:
获取计算平台的计算资源;Obtain the computing resources of the computing platform;
调用神经网络模型,所述神经网络模型包括多个操作层;Calling a neural network model, the neural network model including multiple operation layers;
通过推理引擎识别所述多个操作层之间的依赖关系,将每个操作层映射至对应的计算资源;及Identify the dependency between the multiple operation layers through the inference engine, and map each operation layer to a corresponding computing resource; and
根据所述依赖关系以及所映射的计算资源,通过所述推理引擎利用所述神经网络模型进行推理过程。According to the dependency relationship and the mapped computing resources, the neural network model is used to perform the inference process through the inference engine.
一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行以下步骤:One or more non-volatile computer-readable storage media storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:
获取计算平台的计算资源;Obtain the computing resources of the computing platform;
调用神经网络模型,所述神经网络模型包括多个操作层;Calling a neural network model, the neural network model including multiple operation layers;
通过推理引擎识别所述多个操作层之间的依赖关系,将每个操作层映射至对应的计算资源;及Identify the dependency between the multiple operation layers through the inference engine, and map each operation layer to a corresponding computing resource; and
根据所述依赖关系以及所映射的计算资源,通过所述推理引擎利用所述神经网络模型进行推理过程。According to the dependency relationship and the mapped computing resources, the neural network model is used to perform the inference process through the inference engine.
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请 的其它特征和优点将从说明书、附图以及权利要求书变得明显。The details of one or more embodiments of the present application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the description, drawings and claims.
附图说明Description of the drawings
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其它的附图。In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. A person of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
图1为根据一个或多个实施例中基于推理引擎的计算资源分配方法的应用场景图。Fig. 1 is an application scenario diagram of a calculation resource allocation method based on an inference engine according to one or more embodiments.
图2为根据一个或多个实施例中基于推理引擎的计算资源分配方法的流程示意图。Fig. 2 is a schematic flowchart of a method for allocating computing resources based on an inference engine according to one or more embodiments.
图3为根据一个或多个实施例中神经网络模型中各操作层之间的依赖关系与命名的示意图。FIG. 3 is a schematic diagram of the dependency relationship and naming between the various operation layers in the neural network model according to one or more embodiments.
图4为一个实施例中将每个操作层映射至对应的计算资源的步骤的流程示意图。Fig. 4 is a schematic flowchart of the steps of mapping each operation layer to a corresponding computing resource in an embodiment.
图5为根据一个或多个实施例中神经网络模型的各个操作层映射至计算资源的示意图。Fig. 5 is a schematic diagram of mapping various operation layers of a neural network model to computing resources according to one or more embodiments.
图6为根据一个或多个实施例中基于推理引擎的计算资源分配装置的框图。Fig. 6 is a block diagram of an apparatus for allocating computing resources based on an inference engine according to one or more embodiments.
图7为根据一个或多个实施例中计算机设备的框图。Figure 7 is a block diagram of a computer device according to one or more embodiments.
具体实施方式Detailed ways
为了使本申请的技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。In order to make the technical solutions and advantages of the present application clearer, the following further describes the present application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, and are not used to limit the present application.
在其中一个实施例中,本申请提供的基于推理引擎的计算资源分配方法具体可以应用于自动驾驶领域中,神经网络模型具体可以包括图像识别模型、 行为预测模型或者风险评估模型等中的至少一种。例如,神经网络模型可以是图像识别模型,本申请提供的基于推理引擎的计算资源分配方法可以应用于如图1所示的应用环境中。自动驾驶车辆可以包括传感器102和计算机设备104,传感器102可以与计算机设备104进行通信。传感器102可以采集视觉范围内的环境图像。比如在自动驾驶车辆行驶至路口时,传感器102可以采集交通信号灯图像。计算机设备104根据传感器102采集的信号灯图像进行图像识别,判断图像中信号灯的颜色。具体的,计算机设备104可以获取多个计算资源,调用神经网络模型,神经网络模型包括多个操作层,通过推理引擎识别多个操作层之间的依赖关系,将每个操作层映射至对应的计算资源。计算机设备104根据依赖关系以及所映射的计算资源,通过推理引擎利用神经网络模型进行推理过程,得到信号灯图像中信号灯的颜色。In one of the embodiments, the method for allocating computational resources based on the inference engine provided in this application can be specifically applied in the field of autonomous driving, and the neural network model can specifically include at least one of an image recognition model, a behavior prediction model, or a risk assessment model. Kind. For example, the neural network model may be an image recognition model, and the calculation resource allocation method based on the reasoning engine provided in this application may be applied to the application environment as shown in FIG. 1. The autonomous vehicle may include a sensor 102 and a computer device 104, and the sensor 102 may communicate with the computer device 104. The sensor 102 can collect an image of the environment within the visual range. For example, when an autonomous vehicle is driving to an intersection, the sensor 102 can collect traffic signal images. The computer device 104 performs image recognition according to the signal light image collected by the sensor 102, and judges the color of the signal light in the image. Specifically, the computer device 104 can obtain multiple computing resources and call a neural network model. The neural network model includes multiple operation layers. The inference engine recognizes the dependencies between the multiple operation layers, and maps each operation layer to the corresponding Computing resources. The computer device 104 uses the neural network model to perform the inference process through the inference engine according to the dependency relationship and the mapped computing resources to obtain the color of the signal light in the signal light image.
可以理解的,本申请提供的神经网络模型的推理方法实现对神经网络模型进行推理,可以应用于多种应用环境,神经网络模型可以包括多种类型。例如,神经网络模型可以包括卷积神经网络模型、循环神经网络模型以及递归神经网络模型等。神经网络模型可以用于处理多种不同的数据。例如,神经网络模型具体可以包括图像识别模型、特征提取模型、语音识别模型、文本识别模型以及场景分类模型等。It is understandable that the neural network model inference method provided in the present application implements the inference of the neural network model and can be applied to a variety of application environments, and the neural network model can include multiple types. For example, the neural network model may include a convolutional neural network model, a recurrent neural network model, and a recurrent neural network model. The neural network model can be used to process a variety of different data. For example, the neural network model may specifically include an image recognition model, a feature extraction model, a speech recognition model, a text recognition model, and a scene classification model.
在一个实施例中,提供了一种基于推理引擎的计算资源分配方法,以该方法应用于图1中的计算机设备为例进行说明,具体包括以下步骤:In one embodiment, a method for allocating computing resources based on an inference engine is provided. Taking the method applied to the computer device in FIG. 1 as an example for description, the method specifically includes the following steps:
步骤202,获取计算平台的计算资源。Step 202: Obtain computing resources of the computing platform.
计算平台可以是计算机设备用于进行自动控制运算的平台。计算机设备可以是独立的设备,例如车载计算机设备等。计算机平台具有相应的计算资源。计算资源包括多个微处理器,每个微处理器包括多个计算流。计算机设备可以根据预先设置的条件,读取与该条件相对应的计算资源,读取到的计算资源可以是计算平台的部分计算资源。计算机设备也可以读取计算平台所有的计算资源。The computing platform may be a platform used by computer equipment to perform automatic control operations. The computer device may be a stand-alone device, such as a vehicle-mounted computer device. The computer platform has corresponding computing resources. Computing resources include multiple microprocessors, and each microprocessor includes multiple computing streams. The computer device may read the computing resource corresponding to the condition according to a preset condition, and the read computing resource may be a part of the computing resource of the computing platform. The computer equipment can also read all the computing resources of the computing platform.
步骤204,调用神经网络模型,神经网络模型包括多个操作层。In step 204, the neural network model is invoked, and the neural network model includes multiple operation layers.
计算机设备中预设存储了神经网络模型,神经网络模型可以是预先训练好的。神经网络模型可以用于推理引擎在计算平台中实现推理过程。神经网络模型包括多个操作层。在面对不同的业务需求时,推理引擎可以采用不同的神经网络模型,实现相应的推理过程。例如,在进行图像识别时,可以采用与图像识别相关的神经网络模型进行推理过程运算。在进行自然语言处理时,可以采用与自然语言处理相关的神经网络模型进行推理过程运算。The neural network model is pre-stored in the computer equipment, and the neural network model can be pre-trained. The neural network model can be used in the inference engine to implement the inference process in the computing platform. The neural network model includes multiple operating layers. When facing different business requirements, the reasoning engine can use different neural network models to realize the corresponding reasoning process. For example, when performing image recognition, a neural network model related to image recognition can be used to perform inference process operations. When performing natural language processing, neural network models related to natural language processing can be used to perform inference process operations.
步骤206,通过推理引擎识别多个操作层之间的依赖关系,将每个操作层映射至对应的计算资源。Step 206: Identify the dependency between multiple operation layers through the inference engine, and map each operation layer to a corresponding computing resource.
神经网络模型的多个操作层之间具有依赖关系。依赖关系是指其中一个操作层的输入依赖于其他操作层的输出。其中,具有依赖关系的操作层可以分别被称为依赖层与被依赖层。被依赖层的输出可以形成依赖层的输入。当前的依赖层也可以是其他操作层对应的被依赖层。依赖层所对应的被依赖层可以是一层也可以是多层。There are dependencies among the multiple operating layers of the neural network model. Dependency means that the input of one operating layer depends on the output of other operating layers. Among them, the operation layer with dependency relationship can be called the dependent layer and the dependent layer respectively. The output of the dependent layer can form the input of the dependent layer. The current dependent layer can also be a dependent layer corresponding to other operating layers. The dependent layer corresponding to the dependent layer can be one layer or multiple layers.
计算机设备中安装了推理引擎。通过推理引擎可以识别神经网络中每个操作层之间的依赖关系,将每个操作层分别映射至对应的计算资源。计算机设备还可以通过推理引擎在识别到每个操作层之间的依赖关系后,将存在依赖关系的操作层划分为相应的操作层组,将操作层组映射至对应的计算资源。操作层之间的依赖关系可以是预先配置的,也可以是通过对操作层进行搜索后分析得到的。An inference engine is installed in the computer equipment. The inference engine can identify the dependency between each operation layer in the neural network, and map each operation layer to the corresponding computing resource. The computer device can also use the inference engine to identify the dependency relationship between each operation layer, divide the operation layer with the dependency relationship into the corresponding operation layer group, and map the operation layer group to the corresponding computing resource. The dependency between the operation layers can be pre-configured, or it can be obtained by searching and analyzing the operation layers.
步骤208,根据依赖关系以及所映射的计算资源,通过推理引擎利用神经网络模型进行推理过程。 Step 208, according to the dependency relationship and the mapped computing resources, use the neural network model to perform the inference process through the inference engine.
计算机设备将神经网络模型的操作层应设置对应的计算资源,可以使得每个操作层映射至对应的微处理器,或者映射至对应微处理器的数据流。其中,具有依赖关系的操作层可以映射至同一微处理器。不具有依赖关系的操作层可以映射至不同的微处理器。由此实现对神经网络模型的多个操作层的计算资源进行合理有效分配。推理引擎可以利用神经网络模型以及每一层操作层所分配得到的计算资源进行推理过程。The computer equipment should set the corresponding computing resources to the operating layer of the neural network model, so that each operating layer can be mapped to the corresponding microprocessor, or mapped to the data stream of the corresponding microprocessor. Among them, the operating layers with dependencies can be mapped to the same microprocessor. Operating layers that do not have dependencies can be mapped to different microprocessors. In this way, the calculation resources of the multiple operation layers of the neural network model can be reasonably and effectively allocated. The inference engine can use the neural network model and the computing resources allocated by each operating layer to perform the inference process.
本实施例中,通过识别神经网络模型中多个操作层之间的依赖关系,将每个操作层映射至计算平台中对应的计算资源,由此对计算资源进行合理分配,有效改善了计算资源之间的负载均衡。从而使得推理引擎在利用神经网络模型以及每一层操作层所分配得到的计算资源进行推理过程时,能够有效提高推理过程的计算效率。In this embodiment, by identifying the dependency between multiple operating layers in the neural network model, each operating layer is mapped to the corresponding computing resource in the computing platform, thereby reasonably allocating computing resources, effectively improving computing resources Load balancing between. Therefore, the inference engine can effectively improve the calculation efficiency of the inference process when the inference engine uses the neural network model and the computing resources allocated by each operating layer to perform the inference process.
在一个实施例中,计算机设备可以获取与神经网络模型对应的配置文件;在配置文件中读取神经网络模型中多个操作层之间的依赖关系。In one embodiment, the computer device may obtain a configuration file corresponding to the neural network model; read the dependency relationship between multiple operation layers in the neural network model in the configuration file.
配置文件可以是用户预先根据神经网络模型的结构配置的。配置文件中可以记录每个操作层对应的下一个操作层,即记录了操作层之间的依赖关系。配置文件中还可以记录了每个操作操作层的命名。计算机设备根据操作层的命名可以将每个操作层映射至对应的计算资源。在其中一个实施例中,将每个操作层映射至对应的计算资源包括:获取每个操作层对应的命名;根据预设命名格式对命名进行解析,识别与每个操作层具有映射关系的计算资源;将每个操作层分配到具有映射关系的计算资源。The configuration file may be configured by the user in advance according to the structure of the neural network model. The configuration file can record the next operation layer corresponding to each operation layer, that is, record the dependency relationship between the operation layers. The configuration file can also record the naming of each operation layer. The computer device can map each operation layer to the corresponding computing resource according to the naming of the operation layer. In one of the embodiments, mapping each operation layer to the corresponding computing resource includes: obtaining the name corresponding to each operation layer; parsing the name according to a preset naming format, and identifying calculations that have a mapping relationship with each operation layer Resources; each operating layer is allocated to computing resources with a mapping relationship.
本实施例中,在创建神经网络模型时,可以根据神经网络模型的结构定义操作层与操作层之间的依赖关系。还可以通过命名操作层的方式对每个操作层指定对应的计算资源。操作层的命名可以记录在相应的配置文件中。当推理引擎需要进行推理过程时,推理引擎可以在配置文件中读取操作层之间的依赖关系,以及每个操作层对应的命名。推理引擎可以获取预设命名格式,根据预设命名格式对每个操作层的命名进行解析,识别与每个操作层具有映射关系的计算资源,将每个操作层分配到具有映射关系的计算资源。计算资源包括微处理,或者微处理器以及微处理器中的计算流。In this embodiment, when creating the neural network model, the dependency relationship between the operation layer and the operation layer can be defined according to the structure of the neural network model. You can also specify the corresponding computing resource for each operation layer by naming the operation layer. The naming of the operating layer can be recorded in the corresponding configuration file. When the inference engine needs to perform the inference process, the inference engine can read the dependency relationship between the operation layers and the corresponding naming of each operation layer in the configuration file. The reasoning engine can obtain the preset naming format, analyze the naming of each operation layer according to the preset naming format, identify the computing resources that have a mapping relationship with each operation layer, and assign each operation layer to the computing resources with the mapping relationship . Computing resources include microprocessors, or microprocessors and calculation flows in microprocessors.
神经网络模型中各操作层之间的依赖关系与命名的示意图,可以如图3所示。在图3中,操作层之间的箭头可以表示彼此之间的依赖关系。一个依赖层对应的被依赖层可以是一层,也可以是多层。图3中,操作层2对应的被依赖层为操作层1,操作层6对应的被依赖层为操作层2和操作层5。每个操作层的命名中都包含了操作层、微处理器、计算流。其中,具有依赖关系的 操作层0、操作层1以及操作层2可以映射至微处理器GPU0的计算流0。具有依赖关系的操作层3、操作层4以及操作层5可以映射至微处理器GPU0的计算流1,操作层6可以映射至微处理器GPU0的计算流2。也就是说,在图3中,不同的操作层可以映射至同一微处理器,具有依赖关系的操作层可以映射至同一微处理器的同一计算流,具有不同依赖关系的操作层可以映射至同一处理器的不同计算流。The schematic diagram of the dependency relationship and naming between the various operating layers in the neural network model can be shown in Figure 3. In Figure 3, the arrows between the operation layers can indicate the dependencies between each other. The dependent layer corresponding to a dependent layer can be one layer or multiple layers. In FIG. 3, the dependent layer corresponding to operation layer 2 is operation layer 1, and the dependent layer corresponding to operation layer 6 is operation layer 2 and operation layer 5. The naming of each operation layer includes the operation layer, the microprocessor, and the calculation flow. Among them, operation layer 0, operation layer 1, and operation layer 2 with dependencies can be mapped to the calculation flow 0 of the microprocessor GPU0. The operation layer 3, the operation layer 4, and the operation layer 5 with dependencies can be mapped to the calculation flow 1 of the microprocessor GPU0, and the operation layer 6 can be mapped to the calculation flow 2 of the microprocessor GPU0. In other words, in Figure 3, different operating layers can be mapped to the same microprocessor, operating layers with dependencies can be mapped to the same calculation flow of the same microprocessor, and operating layers with different dependencies can be mapped to the same microprocessor. Different calculation flows of the processor.
In this embodiment, by configuring each operation layer in the neural network model, better load balancing can be achieved, so that the inference process can be reasonably allocated to the corresponding computing resources for processing, effectively improving the computational efficiency of the inference process.
In one embodiment, as shown in FIG. 4, the step of mapping each operation layer to a corresponding computing resource includes:
Step 402: topologically sort the multiple operation layers, and search the sorted operation layers in sequence.
Step 404: generate multiple operation layer groups according to the search results, and map the operation layer groups to corresponding computing resources.
The computer device topologically sorts all operation layers of the neural network model. The sorting can be based on the input/output relationships between operation layers, and the sorted operation layers are then searched in order of their inputs and outputs. Through the search, operation layers that have dependency relationships can be placed in the same operation layer group.
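By way of illustration, a minimal sketch of such a topological sort over the layer dependency graph, using Kahn's algorithm and assuming each layer records the layers whose outputs it consumes (all layer names are illustrative):

```python
from collections import deque

def topological_sort(inputs_of: dict[str, list[str]]) -> list[str]:
    """Topologically sort operation layers; inputs_of maps a layer to the
    layers whose outputs it consumes (its dependencies)."""
    indegree = {layer: len(deps) for layer, deps in inputs_of.items()}
    consumers: dict[str, list[str]] = {layer: [] for layer in inputs_of}
    for layer, deps in inputs_of.items():
        for dep in deps:
            consumers[dep].append(layer)
    ready = deque(layer for layer, d in indegree.items() if d == 0)
    order = []
    while ready:
        layer = ready.popleft()
        order.append(layer)
        for nxt in consumers[layer]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(inputs_of):
        raise ValueError("dependency graph contains a cycle")
    return order

# FIG. 3-style example: layer2 depends on layer1, layer6 on layer2 and layer5.
order = topological_sort({
    "layer0": [], "layer1": ["layer0"], "layer2": ["layer1"],
    "layer3": [], "layer4": ["layer3"], "layer5": ["layer4"],
    "layer6": ["layer2", "layer5"],
})
```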
Since the outputs of input layers and constant layers do not depend on other operation layers, when an input layer or a constant layer is encountered in the search, it can be skipped and the search proceeds to the next layer. When the next layer is not an input layer or a constant layer (such a layer may also be called an operation layer with output dependencies), its dependency relationship with each existing operation layer group is checked.
In one of the embodiments, searching the sorted operation layers in sequence includes: checking the dependency relationships between a sorted operation layer and the existing operation layer groups; counting the dependency relationships between the sorted operation layer and the existing operation layer groups; and classifying the sorted operation layer into the corresponding operation layer group according to the statistical result.
Specifically, when searching the first operation layer with output dependencies (referred to as a sorted operation layer for short), no operation layer group exists yet, so the first sorted operation layer is placed into the first operation layer group. When searching the second sorted operation layer, it is checked whether a dependency relationship with the first operation layer group exists. If so, the dependency relationship between the second sorted operation layer and the first operation layer group is recorded; otherwise, the second sorted operation layer is placed into a second operation layer group. By analogy, when searching each subsequent sorted operation layer, it is compared with the existing operation layer groups to check whether dependency relationships exist. There are many ways to check for a dependency relationship; for example, when searching a layer, all of its inputs can be searched forward, and if an operation layer that directly or indirectly feeds that layer is found, a dependency relationship exists. An operation layer group may include at least one operation layer. When an operation layer group includes two or more operation layers, if the operation layer currently being searched has a dependency relationship with any one operation layer in the group, the operation layer currently being searched has a dependency relationship with that group, and this is recorded.
In one implementation, the computer device classifies the sorted operation layer into the corresponding operation layer group according to the statistical result, as follows. If the statistical result is 0, the sorted operation layer has no dependency relationship with any operation layer, and it belongs to a first independent operation layer group. If the statistical result is 1, the sorted operation layer has a dependency relationship with only one existing operation layer group. If the statistical result is greater than 1, the sorted operation layer has dependency relationships with multiple existing operation layer groups; it then does not belong to any one of those groups, and is placed into a second independent operation layer group, with the dependency relationships between the second independent operation layer group and the multiple existing operation layer groups being recorded.
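For illustration only, the counting rule could be sketched as follows; the group bookkeeping shown, and the assumption that input and constant layers have already been filtered out, are not part of the original disclosure.

```python
def group_layers(order: list[str],
                 inputs_of: dict[str, list[str]]) -> tuple[list[set[str]], dict[int, set[int]]]:
    """Classify each sorted operation layer into an operation layer group by
    counting how many existing groups its inputs fall into."""
    groups: list[set[str]] = []
    group_deps: dict[int, set[int]] = {}  # new group -> groups it depends on
    for layer in order:
        hits = {i for i, g in enumerate(groups)
                if any(dep in g for dep in inputs_of[layer])}
        if len(hits) == 1:
            groups[hits.pop()].add(layer)   # statistical result 1: join that group
        else:
            groups.append({layer})          # result 0 or >1: independent group
            if hits:                        # result >1: record the dependencies
                group_deps[len(groups) - 1] = hits
    return groups, group_deps

# FIG. 3-style example: yields three groups, the third depending on the first two.
groups, deps = group_layers(
    ["layer0", "layer1", "layer2", "layer3", "layer4", "layer5", "layer6"],
    {"layer0": [], "layer1": ["layer0"], "layer2": ["layer1"],
     "layer3": [], "layer4": ["layer3"], "layer5": ["layer4"],
     "layer6": ["layer2", "layer5"]},
)
```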
The computer device can group the computing resources of the computing platform. The computer device can group all available computing resources of the computing platform, or it can obtain a preset number of microprocessors, and of compute streams within those microprocessors, according to preset resource demand conditions. When grouping, the computer device can group by microprocessor first. When there is a single microprocessor, the computer device can group the compute streams within that microprocessor. When there are two or more microprocessors, the computer device first groups the microprocessors and then groups the compute streams within each microprocessor.
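As a minimal illustration, the grouped resources could be represented as (microprocessor, compute stream) pairs, grouping by microprocessor first; the device counts below are assumptions.

```python
def group_resources(num_gpus: int, streams_per_gpu: int) -> dict[int, list[tuple[int, int]]]:
    """Group computing resources by microprocessor first, then enumerate the
    compute streams within each microprocessor."""
    return {gpu: [(gpu, s) for s in range(streams_per_gpu)] for gpu in range(num_gpus)}

# One GPU with four streams, matching the FIG. 5 setting discussed later.
resource_groups = group_resources(num_gpus=1, streams_per_gpu=4)
```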
The computer device can score the computational complexity of each operation layer group to obtain a corresponding complexity score, and use the dependency relationships and the complexity scores to compute the computing resource that has a mapping relationship with each operation layer group. The computing resource with the mapping relationship may be a grouped computing resource.
In this embodiment, by classifying the operation layers into corresponding operation layer groups according to the dependency relationships, the operation layer groups can be mapped to the grouped computing resources. Different operation layer groups can be allocated different computing resources, allowing multiple operation layer groups to compute concurrently. This effectively improves the load balance of the computing platform, and the inference engine can effectively improve the computational efficiency of the inference process when using the neural network model. Moreover, the allocation of computing resources can be completed automatically, effectively reducing manual configuration work.
In the traditional approach, when computing resources are allocated to the operation layers of a neural network model, the inference engine is usually not taken into account. The inference process then has to be scheduled across different microprocessors, which causes frequent memory transfers between microprocessors and degrades the computational efficiency of the inference process.
In one of the embodiments, the computer device can access the operation layer groups in the order in which they were generated. The computer device can map the first operation layer group to the first computing resource, and then access the next operation layer group; for brevity, the next operation layer group being accessed is also referred to as the current operation layer group. If another operation layer group that has a dependency relationship with the current operation layer group is computing on one of the computing resources, the current operation layer group is kept waiting until the computation of that related group completes, after which the current operation layer group enters the same computing resource and starts computing. By allocating operation layer groups that have dependency relationships to the same computing resource, the same inference process is effectively prevented from spanning different microprocessors, saving memory transfers between microprocessors and thereby effectively improving the computational efficiency of the inference process when the inference engine executes it.
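A sketch of this placement decision only (the runtime waiting itself is not modeled); when a group depends on several groups, the sketch arbitrarily reuses the resource of the earliest of them, which is an assumption rather than the disclosed behavior.

```python
def assign_groups(num_groups: int,
                  group_deps: dict[int, set[int]],
                  resources: list[tuple[int, int]]) -> dict[int, tuple[int, int]]:
    """Visit operation layer groups in generation order: a group with a
    dependency waits for the depended-on group and then reuses its resource;
    an independent group takes the next idle resource."""
    placement: dict[int, tuple[int, int]] = {}
    idle = list(resources)
    for g in range(num_groups):
        deps = group_deps.get(g, set())
        if deps:
            # Kept waiting until the depended-on group finishes, then the
            # dependent group enters the same (GPU, stream) resource.
            placement[g] = placement[min(deps)]
        else:
            placement[g] = idle.pop(0)  # next unused resource
    return placement

# Example: group 2 depends on groups 0 and 1, so it reuses group 0's resource.
plan = assign_groups(3, {2: {0, 1}}, [(0, 0), (0, 1), (0, 2)])
```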
In one of the embodiments, if the current operation layer group has no dependency relationship with any operation layer group that is currently computing, the subsequent operation layer groups after the current one are examined. If a subsequent operation layer group also has no dependency relationship with the running operation layer groups, the computing resources corresponding to the current operation layer group and the subsequent operation layer group are computed according to the complexity scores.
The complexity score can be computed by the computer device for each operation layer group over multiple dimensions. The computation proceeds as follows: the computer device obtains the multiple dimensions used to score an operation layer group and the weight corresponding to each dimension. The dimensions may include the input size corresponding to an operation layer, the content of the operation layer, the time required to compute the input, and so on, and a corresponding range and score can be preset for each dimension. For each operation layer group, the computer device aggregates the scores and weights corresponding to the dimension ranges of each operation layer to obtain the complexity score of the group. A higher complexity score indicates a more complex and more time-consuming computation; a lower complexity score indicates a simpler and faster computation.
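By way of illustration, the weighted per-dimension scoring could look as follows; the dimension names, ranges, and weights are assumptions chosen for this sketch.

```python
# Hypothetical dimensions with preset (upper bound, score) ranges and weights.
DIMENSIONS = {
    "input_size":   {"weight": 0.5, "ranges": [(1e3, 1), (1e6, 2), (float("inf"), 3)]},
    "compute_time": {"weight": 0.5, "ranges": [(1e-4, 1), (1e-2, 2), (float("inf"), 3)]},
}

def dimension_score(value: float, ranges: list[tuple[float, int]]) -> int:
    """Map a measured value to the preset score of the range it falls into."""
    for upper, score in ranges:
        if value <= upper:
            return score
    return ranges[-1][1]

def complexity_score(group: list[dict[str, float]]) -> float:
    """Sum the weighted per-dimension scores over all layers in a group."""
    return sum(
        DIMENSIONS[dim]["weight"] * dimension_score(layer[dim], DIMENSIONS[dim]["ranges"])
        for layer in group for dim in DIMENSIONS
    )

# Example group of two layers with measured dimension values.
score = complexity_score([
    {"input_size": 5e5, "compute_time": 2e-3},
    {"input_size": 1e2, "compute_time": 5e-5},
])
```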
The computer device compares the complexity score of the current operation layer group with the complexity scores of the operation layer groups currently computing, and compares the complexity scores of the subsequent operation layer groups with those of the operation layer groups currently computing, obtaining multiple comparison results. From these comparison results, the operation layer group whose complexity score is closest to that of an operation layer group currently computing is selected and mapped to a computing resource that is not yet computing. By computing the corresponding computing resources for dependency-free operation layer groups based on the complexity scores, the computation times of the dependency-free groups across all computing resources can be made comparable, so that the computing resources finish computing at approximately the same time. Such synchronized computation can effectively improve the load balance of the computing platform and thereby promote the computational efficiency of the inference process.
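Building on the scoring sketch above, the closest-score selection could be sketched as follows; the group indices and score values are illustrative.

```python
def pick_closest_group(running_score: float, candidate_scores: dict[int, float]) -> int:
    """Among dependency-free candidate groups, pick the one whose complexity
    score is closest to that of the group currently computing."""
    return min(candidate_scores, key=lambda g: abs(candidate_scores[g] - running_score))

# Example: a group with score 4.0 is running; groups 5 and 7 are candidates.
chosen = pick_closest_group(4.0, {5: 3.5, 7: 6.0})  # -> 5, mapped to an idle resource
```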
Taking one microprocessor with four compute streams as an example, a schematic diagram of mapping the operation layers of the neural network model to computing resources is shown in FIG. 5. The operation layers of the neural network model are divided into four operation layer groups in the manner provided in the foregoing embodiments, and each operation layer group is mapped to a corresponding compute stream: operation layer group 0 to compute stream 0, group 1 to stream 1, group 2 to stream 2, and group 3 to stream 3. Operation layer groups 0, 1, and 2 have no dependency relationships with one another, so the inference computation can proceed concurrently on compute streams 0, 1, and 2. Operation layer group 3 takes the results of groups 0, 1, and 2 as input and performs its part of the inference process on compute stream 3. This improves the load balance of the computing platform while effectively improving the computational efficiency of the inference process.
It should be understood that although the steps in the flowcharts of FIG. 2 and FIG. 4 are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 2 and FIG. 4 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time, but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turns or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
In one of the embodiments, as shown in FIG. 6, a computing resource allocation apparatus based on an inference engine is provided, including a resource acquisition module 602, a model invocation module 604, a relationship recognition module 606, a resource mapping module 608, and an inference execution module 610, wherein:
the resource acquisition module 602 is configured to acquire computing resources of a computing platform;
the model invocation module 604 is configured to invoke a neural network model, the neural network model including multiple operation layers;
the relationship recognition module 606 is configured to recognize, through an inference engine, the dependency relationships between the multiple operation layers;
the resource mapping module 608 is configured to map each operation layer to a corresponding computing resource; and
the inference execution module 610 is configured to perform, through the inference engine, an inference process using the neural network model according to the dependency relationships and the mapped computing resources.
In one embodiment, the relationship recognition module 606 is further configured to obtain a configuration file corresponding to the neural network model, and to read from the configuration file the dependency relationships between the multiple operation layers of the neural network model.
In one embodiment, the resource mapping module 608 is further configured to obtain the name corresponding to each operation layer; parse the name according to a preset naming format and identify the computing resource that has a mapping relationship with each operation layer; and allocate each operation layer to the computing resource with the mapping relationship.
In one embodiment, the resource mapping module 608 is further configured to topologically sort the multiple operation layers and search the sorted operation layers in sequence; and generate multiple operation layer groups according to the search results and map the operation layer groups to corresponding computing resources.
In one embodiment, the resource mapping module 608 is further configured to check the dependency relationships between a sorted operation layer and the existing operation layer groups; count the dependency relationships between the sorted operation layer and the existing operation layer groups; and classify the sorted operation layer into the corresponding operation layer group according to the statistical result.
In one embodiment, the resource mapping module 608 is further configured to classify the sorted operation layer into a first independent operation layer group when the statistical result is 0; when the statistical result is 1, the sorted operation layer has a dependency relationship with only one existing operation layer group; and when the statistical result is greater than 1, classify the sorted operation layer into a second independent operation layer group and record the dependency relationships between the second independent operation layer group and the multiple existing operation layer groups.
In one embodiment, the resource mapping module 608 is further configured to obtain the generation order of all operation layer groups when dependency relationships exist between operation layer groups; allocate a corresponding computing resource to each operation layer group according to the generation order; and, while a depended-on operation layer group is running, keep the depending layer group in a waiting state until the computation of the depended-on operation layer group completes, whereupon the depending layer group enters the corresponding computing resource to compute.
In one embodiment, the resource mapping module 608 is further configured to obtain the complexity scores corresponding to the operation layer groups when no dependency relationships exist between them, and to use the complexity scores to compute the computing resource that has a mapping relationship with each operation layer group.
In one embodiment, the resource mapping module 608 is further configured to compare the complexity score of a running operation layer group with the complexity scores of multiple operation layer groups without dependency relationships to obtain multiple comparison results; select from the multiple comparison results the operation layer group whose complexity score is closest to that of the operation layer group currently computing; and map the closest operation layer group to a computing resource that is not computing.
For specific limitations of the inference-engine-based computing resource allocation apparatus, reference may be made to the above limitations of the inference-engine-based computing resource allocation method, which will not be repeated here. Each module of the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, whose internal structure diagram may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store the inference data of the neural network model. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a method for allocating computing resources based on an inference engine.
Those skilled in the art can understand that the structure shown in FIG. 7 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to implement the steps in the above method embodiments.
One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps in the above method embodiments.
A person of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (20)

  1. A method for allocating computing resources based on an inference engine, comprising:
    acquiring computing resources of a computing platform;
    invoking a neural network model, the neural network model comprising multiple operation layers;
    recognizing, through an inference engine, dependency relationships between the multiple operation layers, and mapping each operation layer to a corresponding computing resource; and
    performing, through the inference engine, an inference process using the neural network model according to the dependency relationships and the mapped computing resources.
  2. The method according to claim 1, wherein the recognizing dependency relationships between the multiple operation layers comprises:
    obtaining a configuration file corresponding to the neural network model; and
    reading, from the configuration file, the dependency relationships between the multiple operation layers of the neural network model.
  3. The method according to claim 1, wherein the mapping each operation layer to a corresponding computing resource comprises:
    obtaining a name corresponding to each operation layer;
    parsing the name according to a preset naming format, and identifying the computing resource that has a mapping relationship with each operation layer; and
    allocating each operation layer to the computing resource with the mapping relationship.
  4. The method according to claim 1, wherein the mapping each operation layer to a corresponding computing resource comprises:
    topologically sorting the multiple operation layers, and searching the sorted operation layers in sequence; and
    generating multiple operation layer groups according to the search results, and mapping the multiple operation layer groups to corresponding computing resources.
  5. The method according to claim 4, wherein the searching the sorted operation layers in sequence comprises:
    checking dependency relationships between a sorted operation layer and existing operation layer groups;
    counting the dependency relationships between the sorted operation layer and the existing operation layer groups; and
    classifying the sorted operation layer into a corresponding operation layer group according to the statistical result.
  6. The method according to claim 5, wherein the classifying the sorted operation layer into a corresponding operation layer group according to the statistical result comprises:
    when the statistical result is 0, classifying the sorted operation layer into a first independent operation layer group;
    when the statistical result is 1, the sorted operation layer having a dependency relationship with only one existing operation layer group; and
    when the statistical result is greater than 1, classifying the sorted operation layer into a second independent operation layer group, and recording dependency relationships between the second independent operation layer group and multiple existing operation layer groups.
  7. The method according to claim 4, wherein the mapping the multiple operation layer groups to corresponding computing resources comprises:
    when dependency relationships exist between operation layer groups, obtaining a generation order corresponding to all operation layer groups;
    allocating a corresponding computing resource to each operation layer group according to the generation order; and
    when a depended-on operation layer group is running, keeping the depending layer group in a waiting state until computation of the depended-on operation layer group completes, whereupon the depending layer enters the corresponding computing resource to compute.
  8. The method according to claim 4, wherein the mapping the multiple operation layer groups to corresponding computing resources comprises:
    when no dependency relationships exist between operation layer groups, obtaining complexity scores corresponding to the operation layer groups; and
    computing, using the complexity scores, the computing resource that has a mapping relationship with each operation layer group.
  9. The method according to claim 8, wherein the computing, using the complexity scores, the computing resource that has a mapping relationship with each operation layer group comprises:
    comparing a complexity score of a running operation layer group with complexity scores of multiple operation layer groups without dependency relationships to obtain multiple comparison results;
    selecting, from the multiple comparison results, an operation layer group whose complexity score is closest to that of the operation layer group currently computing; and
    mapping the closest operation layer group to a computing resource that is not computing.
  10. A computing resource allocation apparatus based on an inference engine, comprising:
    a resource acquisition module, configured to acquire computing resources of a computing platform;
    a model invocation module, configured to invoke a neural network model, the neural network model comprising multiple operation layers;
    a relationship recognition module, configured to recognize, through an inference engine, dependency relationships between the multiple operation layers;
    a resource mapping module, configured to map each operation layer to a corresponding computing resource; and
    an inference execution module, configured to perform, through the inference engine, an inference process using the neural network model according to the dependency relationships and the mapped computing resources.
  11. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    acquiring computing resources of a computing platform;
    invoking a neural network model, the neural network model comprising multiple operation layers;
    recognizing, through an inference engine, dependency relationships between the multiple operation layers, and mapping each operation layer to a corresponding computing resource; and
    performing, through the inference engine, an inference process using the neural network model according to the dependency relationships and the mapped computing resources.
  12. The computer device according to claim 11, wherein the one or more processors further perform the following steps:
    obtaining a configuration file corresponding to the neural network model; and
    reading, from the configuration file, the dependency relationships between the multiple operation layers of the neural network model.
  13. The computer device according to claim 11, wherein the one or more processors further perform the following steps:
    obtaining a name corresponding to each operation layer;
    parsing the name according to a preset naming format, and identifying the computing resource that has a mapping relationship with each operation layer; and
    allocating each operation layer to the computing resource with the mapping relationship.
  14. The computer device according to claim 11, wherein the one or more processors further perform the following steps:
    topologically sorting the multiple operation layers, and searching the sorted operation layers in sequence; and
    generating multiple operation layer groups according to the search results, and mapping the multiple operation layer groups to corresponding computing resources.
  15. The computer device according to claim 14, wherein the one or more processors further perform the following steps:
    checking dependency relationships between a sorted operation layer and existing operation layer groups;
    counting the dependency relationships between the sorted operation layer and the existing operation layer groups; and
    classifying the sorted operation layer into a corresponding operation layer group according to the statistical result.
  16. The computer device according to claim 15, wherein the one or more processors further perform the following steps:
    when the statistical result is 0, classifying the sorted operation layer into a first independent operation layer group;
    when the statistical result is 1, the sorted operation layer having a dependency relationship with only one existing operation layer group; and
    when the statistical result is greater than 1, classifying the sorted operation layer into a second independent operation layer group, and recording dependency relationships between the second independent operation layer group and multiple existing operation layer groups.
  17. The computer device according to claim 14, wherein the one or more processors further perform the following steps:
    when dependency relationships exist between operation layer groups, obtaining a generation order corresponding to all operation layer groups;
    allocating a corresponding computing resource to each operation layer group according to the generation order; and
    when a depended-on operation layer group is running, keeping the depending layer group in a waiting state until computation of the depended-on operation layer group completes, whereupon the depending layer enters the corresponding computing resource to compute.
  18. The computer device according to claim 14, wherein the one or more processors further perform the following steps:
    when no dependency relationships exist between operation layer groups, obtaining complexity scores corresponding to the operation layer groups; and
    computing, using the complexity scores, the computing resource that has a mapping relationship with each operation layer group.
  19. The computer device according to claim 18, wherein the one or more processors further perform the following steps:
    comparing a complexity score of a running operation layer group with complexity scores of multiple operation layer groups without dependency relationships to obtain multiple comparison results;
    selecting, from the multiple comparison results, an operation layer group whose complexity score is closest to that of the operation layer group currently computing; and
    mapping the closest operation layer group to a computing resource that is not computing.
  20. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of any one of claims 1 to 9.