WO2023115776A1 - Neural network reasoning method and apparatus, and computer device, computer-readable storage medium and computer program product - Google Patents

Neural network reasoning method and apparatus, and computer device, computer-readable storage medium and computer program product

Info

Publication number
WO2023115776A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
network
neural network
network layer
deployment
Application number
PCT/CN2022/090030
Other languages
French (fr)
Chinese (zh)
Inventor
李天健
许思
Original Assignee
上海商汤智能科技有限公司
Application filed by 上海商汤智能科技有限公司
Publication of WO2023115776A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Definitions

  • The embodiments of the present disclosure are based on the Chinese patent application with application number 202111595072.1, filed on December 24, 2021 and entitled "a neural network reasoning method, device, computer equipment and storage medium", and claim the priority of that Chinese patent application, the entire content of which is hereby incorporated into this disclosure by reference.
  • The present disclosure relates to, but is not limited to, the field of computer technology, and in particular to a neural network reasoning method and apparatus, a computer device, a computer-readable storage medium, and a computer program product.
  • In neural network inference, an inference engine is usually used to optimize the configuration parameters of different calculations, which can improve the performance of neural network inference.
  • Embodiments of the present disclosure provide a neural network reasoning method and device, computer equipment, a computer-readable storage medium, and a computer program product.
  • An embodiment of the present disclosure provides a neural network reasoning method, including:
  • determining the target configuration parameters corresponding to a target network layer based on the network parameters corresponding to the target network layer and a predetermined correspondence between sample network layers and configuration parameters; wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are the configuration information of the algorithm used when performing the operation corresponding to the sample network layer;
  • In the above manner, the target configuration parameters corresponding to the target network layer are automatically determined based on the network parameters corresponding to the target network layer and the predetermined correspondence between the sample network layer and the configuration parameters, and the target neural network is deployed based on the target configuration parameters, which saves the time for configuring parameters in the neural network deployment initialization phase, thereby improving the deployment efficiency of the neural network.
  • An embodiment of the present disclosure also provides a neural network reasoning device, including:
  • the analysis part is configured to obtain the target neural network to be deployed, analyze the target neural network, and determine the network parameters corresponding to each network layer of the target neural network;
  • the determining part is configured to determine the target configuration parameters corresponding to the target network layer based on the network parameters corresponding to the target network layer and a predetermined correspondence between sample network layers and configuration parameters; wherein the sample network layer is of the same type as the target network layer, and the configuration parameter corresponding to the sample network layer is configuration information of an algorithm used when performing an operation corresponding to the sample network layer;
  • the reasoning part is configured to deploy the target neural network based on the target configuration parameters, and perform network reasoning based on the target neural network.
  • An embodiment of the present disclosure also provides a computer device, including a processor, a memory, and a bus; the memory stores machine-readable instructions executable by the processor; the processor and the memory communicate with each other through the bus, and the machine-readable instructions are executed by the processor to perform the steps of the above neural network reasoning method.
  • An embodiment of the present disclosure also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the steps of the above neural network reasoning method are executed.
  • An embodiment of the present disclosure also provides a computer program product, where the computer program product includes a computer program or instructions, and when the computer program or instructions are run on an electronic device, the electronic device is caused to execute the steps of the above method.
  • FIG. 1 shows a flowchart of a neural network reasoning method provided by an embodiment of the present disclosure
  • FIG. 2 shows a flowchart of a method for determining target configuration parameters corresponding to a target network layer in the neural network reasoning method provided by an embodiment of the present disclosure
  • FIG. 3 shows a flowchart of a method for determining at least one candidate sample network layer corresponding to a target network layer in the neural network reasoning method provided by an embodiment of the present disclosure
  • FIG. 4 shows a flowchart of a method for deploying a target neural network in the neural network reasoning method provided by an embodiment of the present disclosure
  • FIG. 5 shows a flowchart of a method for generating the target deployment code corresponding to the target neural network in the neural network reasoning method provided by an embodiment of the present disclosure
  • FIG. 6 shows a flowchart of a method for network reasoning in the neural network reasoning method provided by an embodiment of the present disclosure
  • FIG. 7 shows a schematic diagram of the architecture of a neural network reasoning device provided by an embodiment of the present disclosure
  • FIG. 8 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
  • In order to obtain better inference performance, the inference engine often needs to traverse a large number of configuration parameter combinations in the preprocessing stage, actually deploy the neural network according to the configuration parameter combinations obtained through the traversal, and select a better combination of configuration parameters according to the test results after actual deployment. This makes the preprocessing stage take a long time and reduces the deployment efficiency of the neural network.
  • the present disclosure provides a neural network reasoning method, device, computer equipment, and storage medium.
  • In the method, the network parameters corresponding to the target network layer and the predetermined correspondence between the sample network layer and configuration parameters are used to determine the target configuration parameters corresponding to the target network layer.
  • the execution subject of the neural network reasoning method provided in the embodiments of the present disclosure is generally a computer device with a certain computing power.
  • the computer device includes, for example: a terminal device or a server or other processing device, and the terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, and the like.
  • the neural network reasoning method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • the method includes S101 to S103, wherein:
  • S101 Obtain a target neural network to be deployed, analyze the target neural network, and determine network parameters corresponding to each network layer of the target neural network.
  • S102 Determine the target configuration parameters corresponding to the target network layer based on the network parameters corresponding to the target network layer and the predetermined correspondence between the sample network layer and the configuration parameters; wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of the algorithm used when performing the operation corresponding to the sample network layer.
  • S103 Deploy the target neural network based on the target configuration parameters, and perform network reasoning based on the target neural network.
  • The network parameters corresponding to each network layer of the target neural network include weight parameters, bias parameters, convolution parameters of the convolution layer, activation parameters of the activation layer, etc. By determining the network parameters corresponding to the respective layers of the target neural network, the type of each network layer of the target neural network to be deployed, and the parameter values of the network parameters corresponding to each network layer, can be determined.
  • For example, the network parameters corresponding to the convolutional layer in the target neural network are convolution parameters such as the amount of convolution operations, the size of the convolution kernel, and the convolution stride, where the amount of convolution operations can be represented by the length and width of the feature map participating in the convolution operation; the target neural network can represent a specific neural network among multiple alternative neural networks.
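  As an illustration of the parsing step above, the following minimal Python sketch extracts the layer type and parameter values for each network layer from a toy network description; the description format and the name `parse_network` are assumptions made for this example and are not part of the disclosure:

```python
# Illustrative sketch: parse a toy network description into per-layer
# network parameters (layer type plus parameter values).
def parse_network(description):
    """Return {layer_name: {"type": ..., "params": {...}}}."""
    layers = {}
    for layer in description["layers"]:
        layers[layer["name"]] = {
            "type": layer["type"],
            "params": {k: v for k, v in layer.items()
                       if k not in ("name", "type")},
        }
    return layers

toy_net = {
    "layers": [
        {"name": "conv1", "type": "conv",
         "kernel": 3, "stride": 1, "feature_map": (224, 224)},
        {"name": "relu1", "type": "activation", "kind": "relu"},
    ]
}

parsed = parse_network(toy_net)
```

  Once the per-layer types and parameter values are available in this form, the later lookup against the sample-layer correspondence becomes a dictionary operation.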
  • When analyzing the target neural network, it is also possible to determine both the network parameters corresponding to the network layers of the target neural network and the hierarchical relationship between the network layers of the target neural network, and then perform network reasoning.
  • S102 Determine the target configuration parameters corresponding to the target network layer based on the network parameters corresponding to the target network layer and the predetermined correspondence between the sample network layer and the configuration parameters; wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of the algorithm used when performing the operation corresponding to the sample network layer.
  • the target network layer can be a convolutional layer or a network layer performing matrix operations. Since these network layers have a large amount of computation, corresponding configuration parameters can be set to improve computing efficiency.
  • Here, the algorithm represents the method used to perform the operation corresponding to the network layer, including whether to use a specific computing mechanism, the amount of computation performed by each computing unit during the operation, etc.
  • the target network layer may represent a specific network layer among multiple network layers in the target neural network
  • the target configuration parameter may represent a configuration parameter corresponding to the target network layer.
  • The configuration parameters corresponding to the sample network layer are optimal configuration parameters for neural network deployment of the sample network layer under various network parameter settings; for example, they may be the optimal solution of the configuration parameters corresponding to the target network layer.
  • For example, the configuration parameters corresponding to the sample network layer include the manner in which a matrix product calculation is decomposed among the Compute Unified Device Architecture (CUDA) operation units in a Graphics Processing Unit (GPU), the iteration step size of each loop calculation of each CUDA operation unit, the iteration step size of each loop calculation of each minimum operation unit, etc.
  • A plurality of sample network layers with different network parameters can be determined in an exhaustive manner, and before the neural network is deployed, the neural network reasoning engine can pre-determine the correspondence between the sample network layers and the configuration parameters. For any sample network layer, the configuration parameters corresponding to that sample network layer may be the configuration parameters used in a deployment whose operation results meet a preset condition; the preset condition may be that the inference speed of the deployed neural network is greater than a preset threshold, where the preset threshold may be a positive number.
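  The pre-computation of this correspondence can be sketched as follows; the `benchmark` function, the parameter grids, and the speed model are stand-ins invented for illustration (a real implementation would measure inference speed by actually deploying on the test device):

```python
# Illustrative sketch: exhaustively enumerate sample network layers with
# different network parameters, benchmark each candidate configuration,
# and record, per sample layer, a configuration whose inference speed
# exceeds a preset threshold.
def benchmark(sample_params, config):
    # Toy proxy for measured inference speed (higher is better).
    return config["tile"] / (1 + abs(sample_params["kernel"] - config["tile"] // 8))

def build_correspondence(kernel_sizes, tile_sizes, speed_threshold):
    table = {}
    for kernel in kernel_sizes:
        sample = {"kernel": kernel}
        best = None
        for tile in tile_sizes:
            config = {"tile": tile}
            speed = benchmark(sample, config)
            if speed > speed_threshold and (best is None or speed > best[1]):
                best = (config, speed)
        if best is not None:
            # Key the table by layer type and sample network parameters.
            table[("conv", kernel)] = best[0]
    return table

table = build_correspondence([1, 3, 5], [16, 32, 64], speed_threshold=0.0)
```

  Because this table is built once, offline, the per-deployment traversal of configuration combinations described in the related art is avoided.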
  • It should be noted that the device used for neural network deployment with the sample network layer is a test device, not the target deployment device on which the target network layer is actually deployed. Due to hardware differences between devices, the same configuration parameters may produce different running results on different deployment devices; that is, when the target neural network is deployed according to configuration parameters with better running results on the test device, the final running results may not be better.
  • In this way, the optimal configuration parameters for neural network deployment of the sample network layer under each network parameter setting can be obtained. Since the type of the sample network layer is the same as that of the target network layer, when subsequently determining the target configuration parameters of the target network layer, the target configuration parameters corresponding to the target network layer may be determined based on the network parameters corresponding to the target network layer and the predetermined correspondence between the sample network layer and configuration parameters. Therefore, a better configuration-parameter solution suited to a specific convolutional layer or a network layer performing matrix operations (the target network layer) can be quickly selected, saving the time for configuring parameters in the initial stage of neural network deployment.
  • the target configuration parameters corresponding to the target network layer may be determined through the following steps:
  • S201 Based on the similarity between the network parameters corresponding to the target network layer and the sample network parameters corresponding to the sample network layer, determine at least one candidate sample network layer corresponding to the target network layer.
  • the similarity between the network parameters corresponding to the target network layer and the sample network parameters corresponding to the sample network layer may be a cosine similarity between network parameters.
  • one or more of the network parameters may be used to determine the similarity.
  • In practice, the similarity of each network parameter selected for determining the similarity can be computed separately, the similarities of these network parameters are weighted and summed, and the weighted sum is taken as the similarity between the network parameters corresponding to the target network layer and the sample network parameters corresponding to the sample network layer.
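  The weighted-sum similarity described above can be sketched as follows; which network parameters are compared, how they are vectorized, and the weights are illustrative assumptions:

```python
import math

# Illustrative sketch: per-parameter cosine similarity, combined by a
# weighted sum into a single layer-to-layer similarity score.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def layer_similarity(target_params, sample_params, weights):
    total = 0.0
    for name, weight in weights.items():
        total += weight * cosine(target_params[name], sample_params[name])
    return total

target = {"conv": [3, 1, 224], "shape": [64, 64]}
sample = {"conv": [3, 1, 256], "shape": [64, 64]}
sim = layer_similarity(target, sample, weights={"conv": 0.6, "shape": 0.4})
```

  With weights summing to one, the score stays in [0, 1] for non-negative parameter vectors, so a single preset similarity threshold can be applied uniformly.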
  • At least one candidate sample network layer corresponding to the target network layer may be determined through the following steps:
  • S2011 Determine the sample network layer whose similarity is greater than a preset similarity as an initial sample network layer.
  • For example, if their similarities are greater than the preset similarity, sample network layer 2 and sample network layer 4 can be determined as initial sample network layers, so that a candidate set including multiple initial sample network layers can be obtained.
  • the optimal solution is selected and recorded by traversing the candidate set.
  • the candidate set can be reduced by pruning.
  • S2012 Determine the at least one candidate sample network layer from the initial sample network layers based on the configuration information screening condition matching the network parameters corresponding to the target network layer and the configuration parameters corresponding to each initial sample network layer.
  • Here, the configuration information screening conditions can be obtained by means of data analysis; for example, by performing data analysis on the network parameters corresponding to the target network layer, the maximum value and minimum value of each configuration parameter corresponding to those network parameters can be obtained, and the corresponding configuration information screening condition is that the parameter value of the configuration parameter must be no greater than the corresponding maximum value and no less than the corresponding minimum value.
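  A minimal sketch of this screening condition, assuming the per-parameter minimum and maximum values have already been derived by data analysis (the layer names and parameter names are invented for the example):

```python
# Illustrative sketch: keep only initial sample network layers whose
# configuration parameters fall inside per-parameter [min, max] bounds
# derived for the target network layer.
def screen_candidates(initial_layers, bounds):
    """initial_layers: {layer_id: {param: value}}; bounds: {param: (lo, hi)}."""
    kept = {}
    for layer_id, config in initial_layers.items():
        ok = all(lo <= config[p] <= hi
                 for p, (lo, hi) in bounds.items() if p in config)
        if ok:
            kept[layer_id] = config
    return kept

initial = {
    "sample2": {"tile": 32, "unroll": 4},
    "sample4": {"tile": 128, "unroll": 2},
}
candidates = screen_candidates(initial, bounds={"tile": (16, 64), "unroll": (1, 8)})
```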
  • In this way, each initial sample network layer is screened and the at least one candidate sample network layer is determined from the initial sample network layers, which can reduce the amount of calculation when subsequently determining the configuration parameters of the target network layer, thereby improving the deployment efficiency of the neural network.
  • S202 For any of the candidate sample network layers, deploy the target neural network based on the configuration parameters corresponding to the candidate sample network layer, and determine the operation result of the target neural network in the deployment mode corresponding to the candidate sample network layer.
  • Here, the neural network reasoning engine can be used to deploy the neural network; after the target neural network is deployed to the target deployment device, the running result of the target neural network in the deployment mode corresponding to the candidate sample network layer can be determined. The running result may be inference speed, inference accuracy, etc., and the deployment effect of the target neural network in this deployment mode can be determined through the running result.
  • S203 Based on the operation results of the target neural network in the deployment mode corresponding to each of the candidate sample network layers, determine the target candidate sample network layer, and use the configuration parameters corresponding to the target candidate sample network layer as the target configuration parameter.
  • When determining the target candidate sample network layer based on the operation results of the deployment modes corresponding to the candidate sample network layers, the target candidate sample network layer with a better operation result can be determined from the candidate sample network layers according to a preset operation result evaluation rule, and the configuration parameters corresponding to the target candidate sample network layer are used as the target configuration parameters.
  • the target candidate sample network layer may represent a specific candidate sample network layer in at least one candidate sample network layer.
  • For example, the operation result evaluation rule may be to select a candidate sample network layer whose inference speed is greater than a preset threshold as the target candidate sample network layer.
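  This evaluation rule can be sketched as follows; the run results are assumed to have already been measured on the target deployment device for each candidate's configuration parameters, and the candidate names are invented:

```python
# Illustrative sketch of the evaluation rule: among candidate deployments,
# keep those whose measured inference speed exceeds a preset threshold,
# then pick the fastest as the target candidate sample network layer.
def select_target_candidate(run_results, speed_threshold):
    """run_results: {candidate_id: inference_speed}."""
    eligible = {cid: s for cid, s in run_results.items() if s > speed_threshold}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)

results = {"sample2": 120.0, "sample4": 95.0, "sample7": 150.0}
target_candidate = select_target_candidate(results, speed_threshold=100.0)
```

  Returning None when no candidate clears the threshold leaves room for a fallback, e.g. relaxing the threshold or falling back to a default configuration.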
  • S103 Deploy the target neural network based on the target configuration parameters, and perform network reasoning based on the target neural network.
  • the network reasoning is to perform data processing on the input data based on the target neural network, so as to obtain a data processing result corresponding to the input data.
  • Taking the target neural network as an image recognition network as an example, after the target neural network is deployed, when a picture containing a cat is input into the target neural network, the reasoning result "cat" can be obtained through the network reasoning of the target neural network.
  • the target neural network can be deployed through the following steps:
  • S401 Based on the target configuration parameters, determine the first deployment code corresponding to the target network layer; and, based on the network parameters of the network layers in the target neural network other than the target network layer, determine the second deployment code corresponding to those other network layers.
  • Here, the first deployment code and the second deployment code are codes that can be recognized by the central processing unit, and the deployment configuration of the target neural network can be recorded in the central processing unit, where the central processing unit is the device on which the neural network reasoning engine is deployed and is used for deploying the target neural network.
  • In practice, when determining the first deployment code corresponding to the target network layer, the target configuration parameters corresponding to the target network layer may be encapsulated based on a preset code encapsulation rule; when determining the second deployment code corresponding to the other network layers based on the network parameters of the network layers other than the target network layer, the second deployment code corresponding to those network layers may be obtained from the neural network reasoning engine according to their network parameters.
  • The other network layers may represent any network layer in the target neural network other than the target network layer, or two or more such network layers.
  • The code encapsulation rule may define a template for code encapsulation. When encapsulating the target configuration parameters corresponding to the target network layer based on the preset code encapsulation rule, the target configuration parameters may be added to the corresponding positions of the template according to the correspondence between the target configuration parameters and the template, thereby generating the first deployment code corresponding to the target network layer. The code encapsulation rule may indicate a correspondence between a configuration parameter and a code encapsulation template.
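  A minimal sketch of template-based code encapsulation; the template text, the `${...}` placeholder syntax, and the parameter names are invented for this example:

```python
# Illustrative sketch: fill a code-encapsulation template with the target
# configuration parameters to produce the first deployment code.
def encapsulate(template, config):
    code = template
    for name, value in config.items():
        # Replace each ${name} placeholder with the parameter value.
        code = code.replace("${" + name + "}", str(value))
    return code

KERNEL_TEMPLATE = (
    "__global__ void conv_kernel(/* ... */) {\n"
    "  const int TILE = ${tile};\n"
    "  const int UNROLL = ${unroll};\n"
    "  /* tiled convolution body elided */\n"
    "}\n"
)

first_deployment_code = encapsulate(KERNEL_TEMPLATE, {"tile": 32, "unroll": 4})
```

  Because the template is filled at deployment time, no pre-generated variant of the code has to be stored in the inference engine for every possible configuration.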
  • In practice, fusion information corresponding to the target network layer may also be obtained, and code encapsulation may be performed on the target network layer together with the other network layers that have a fusion relationship with the target network layer according to the fusion information.
  • Taking the target network layer as a convolutional layer as an example, if it is determined according to the fusion information that another network layer having a fusion relationship with the convolutional layer is an activation layer, then the configuration parameters corresponding to the convolutional layer and the network parameters corresponding to the activation layer can be code-encapsulated at the same time, which can improve the efficiency of initial deployment code generation.
  • the convolutional code may be split into multiple parts. In this way, these split code parts can be spliced to obtain the deployment code.
  • Code encapsulation through preset encapsulation rules can automatically generate the encapsulated code when deploying the neural network, without adding corresponding codes to the neural network inference engine in advance, thereby reducing the space occupied by the neural network inference engine and improving the deployment efficiency of the neural network.
  • S402 Based on the first deployment code and the second deployment code, generate a target deployment code corresponding to the target neural network, and add the target deployment code to a target deployment device.
  • The target deployment device may be a hardware device, such as a graphics processor, that can be used for neural network deployment. After the target deployment code is added to the target deployment device, the deployment of the target neural network is completed.
  • the target deployment code may refer to a specific deployment code generated by the first deployment code and the second deployment code.
  • the corresponding code can be generated according to the fusion effect of convolution/matrix operations, and the performance of neural network reasoning will be improved.
  • the target deployment code corresponding to the target neural network can be generated through the following steps:
  • S4021 Concatenate the first deployment code and the second deployment code to determine an initial deployment code.
  • Here, the first deployment code and the second deployment code can be spliced according to the connection relationship between the network layers in the target neural network, and the spliced deployment code is the initial deployment code.
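  The splicing of S4021 can be sketched as follows, assuming one code fragment per network layer and a connection order already derived from the network's layer relationships (the fragment contents are placeholders):

```python
# Illustrative sketch: splice per-layer deployment code fragments in the
# order given by the connection relationship between network layers to
# obtain the initial deployment code.
def splice_deployment_code(fragments, layer_order):
    """fragments: {layer_name: code}; layer_order: list of layer names."""
    return "\n".join(fragments[name] for name in layer_order)

fragments = {
    "conv1": "// first deployment code (conv1, generated from target config)",
    "relu1": "// second deployment code (relu1)",
    "fc1": "// second deployment code (fc1)",
}
initial_deployment_code = splice_deployment_code(fragments, ["conv1", "relu1", "fc1"])
```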
  • S4022 Call the target interface function of the target deployment device to compile the initial deployment code, and generate the target deployment code, where the target deployment code is code running on the target deployment device.
  • the target interface function may represent a specific interface function of the target deployment device.
  • For example, the target interface function of the NVRTC (NVIDIA Runtime Compilation) interface can be called to compile the initial deployment code and generate target deployment code that can run on the graphics processor.
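  The compile step can be illustrated with a stand-in; a real GPU implementation would go through the NVRTC API (for example `nvrtcCompileProgram` followed by retrieving the compiled PTX), while the registry and toy "compiler" below are invented so the control flow can be shown and tested without a GPU:

```python
# Illustrative stand-in for compiling the initial deployment code through
# the target device's interface function. The registry maps a device kind
# to its interface (compile) function.
COMPILERS = {}

def register_compiler(device_kind, fn):
    COMPILERS[device_kind] = fn

def compile_for_device(device_kind, source):
    try:
        interface_fn = COMPILERS[device_kind]
    except KeyError:
        raise ValueError(f"no interface function for {device_kind!r}")
    return interface_fn(source)

# Toy "compiler": merely tags the source as a compiled artifact.
register_compiler("gpu", lambda src: {"binary": src.encode(), "target": "gpu"})

artifact = compile_for_device("gpu", "// initial deployment code")
```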
  • In this way, the target deployment code of the target neural network can be generated in real time when deploying the neural network, which can improve the deployment efficiency of the neural network.
  • network reasoning can be performed through the following steps:
  • S601 Receive deployment information corresponding to the target deployment code sent by the target deployment device; wherein the deployment information is used to describe a deployment location of codes corresponding to each network layer of the target neural network.
  • Here, the target deployment device may deploy the target deployment code in the target deployment device and send the deployment information to the neural network reasoning engine, so that the target deployment device performs neural network inference after receiving an inference instruction.
  • S602 Perform neural network inference based on the deployment information and the hierarchical relationship between network layers obtained by parsing the target neural network.
  • the codes corresponding to each deployment information may be sequentially run according to the hierarchical relationship to perform neural network reasoning.
  • The hierarchical relationship between the network layers of the target neural network may be obtained by the same analysis that determines the network parameters corresponding to each network layer of the target neural network, or by a separate analysis of the target neural network.
  • In practice, the inference sequence of the network layers that need to be used when performing neural network inference can be determined according to the hierarchical relationship, and the neural network inference engine can, according to that sequence, send inference instructions to the target deployment device in turn to instruct the target deployment device to run the corresponding codes according to the inference sequence to perform neural network inference.
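  Deriving the inference sequence from the hierarchical relationship can be sketched as a topological sort over the layer graph; the layer names, the edge list, and the stand-in for sending inference instructions are illustrative assumptions:

```python
# Illustrative sketch: topologically sort the layers according to the
# hierarchical relationship (edges src -> dst), then issue per-layer
# inference instructions in that order.
def inference_order(layers, edges):
    incoming = {layer: 0 for layer in layers}
    for src, dst in edges:
        incoming[dst] += 1
    ready = [l for l in layers if incoming[l] == 0]
    order = []
    while ready:
        layer = ready.pop(0)
        order.append(layer)
        for src, dst in edges:
            if src == layer:
                incoming[dst] -= 1
                if incoming[dst] == 0:
                    ready.append(dst)
    return order

layers = ["conv1", "relu1", "fc1"]
edges = [("conv1", "relu1"), ("relu1", "fc1")]
executed = []
for layer in inference_order(layers, edges):
    # Stand-in for sending an inference instruction to the target device.
    executed.append(f"run {layer}")
```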
  • In summary, the neural network reasoning method provided by the embodiments of the present disclosure automatically determines, when deploying the target neural network, the target configuration parameters corresponding to the target network layer based on the network parameters corresponding to the target network layer and the predetermined correspondence between the sample network layer and the configuration parameters, and deploys the target neural network based on the target configuration parameters, which saves the time for configuring parameters in the neural network deployment initialization stage, thereby improving the deployment efficiency of the neural network.
  • The writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • The embodiment of the present disclosure also provides a neural network reasoning device corresponding to the neural network reasoning method. Since the problem-solving principle of the device in the embodiment of the present disclosure is similar to that of the above neural network reasoning method, the implementation of the device can refer to the implementation of the method.
  • FIG. 7 it is a schematic diagram of the architecture of a neural network inference device provided by an embodiment of the present disclosure.
  • The device includes various parts, which can be implemented by a processor in a computer device, or by a specific logic circuit. In the process of implementation, the processor can be a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or a GPU, etc.
  • the neural network reasoning device includes: an analysis part 701, a determination part 702, and a reasoning part 703; wherein,
  • the parsing part 701 is configured to acquire the target neural network to be deployed, analyze the target neural network, and determine network parameters corresponding to each network layer of the target neural network;
  • the determining part 702 is configured to determine the target configuration parameter corresponding to the target network layer based on the network parameter corresponding to the target network layer and the predetermined correspondence between the sample network layer and the configuration parameter; wherein the sample network layer is of the same type as the target network layer, and the configuration parameter corresponding to the sample network layer is configuration information of an algorithm used when performing an operation corresponding to the sample network layer;
  • the reasoning part 703 is configured to deploy the target neural network based on the target configuration parameters, and perform network reasoning based on the target neural network.
  • when determining the target configuration parameters corresponding to the target network layer based on the network parameters corresponding to the target network layer and the predetermined correspondence between sample network layers and configuration parameters, the determining part 702 is configured to:
  • for any of the candidate sample network layers, deploy the target neural network based on the configuration parameters corresponding to that candidate sample network layer, and determine the operation result under the deployment mode corresponding to that candidate sample network layer;
  • based on the operation results of the target neural network under the deployment modes corresponding to the respective candidate sample network layers, determine a target candidate sample network layer, and use the configuration parameters corresponding to the target candidate sample network layer as the target configuration parameters.
  • when determining at least one candidate sample network layer, the determining part 702 is configured to:
  • determine the at least one candidate sample network layer from initial sample network layers, based on a screening condition matching the network parameters corresponding to the target network layer and on the configuration parameters corresponding to each initial sample network layer.
  • when deploying the target neural network based on the target configuration parameters, the reasoning part 703 is configured to:
  • when determining the first deployment code corresponding to the target network layer based on the target configuration parameters and the network parameters, the reasoning part 703 is configured to:
  • encapsulate the target configuration parameters corresponding to the target network layer to determine the first deployment code corresponding to the target network layer.
  • when generating the target deployment code corresponding to the target neural network based on the first deployment code and the second deployment code, the reasoning part 703 is configured to:
  • before performing network reasoning based on the target neural network, the reasoning part 703 is further configured to:
  • receive deployment information corresponding to the target deployment code sent by the target deployment device; wherein the deployment information is used to describe the deployment position of the code corresponding to each network layer of the target neural network;
  • when parsing the target neural network and determining the network parameters corresponding to each network layer of the target neural network, the parsing part 701 is configured to:
  • parse the target neural network to determine the network parameters corresponding to the respective network layers of the target neural network and the hierarchical relationship between the network layers of the target neural network;
  • when performing network reasoning based on the target neural network, the reasoning part 703 is configured to:
  • Neural network reasoning is performed based on the deployment information and the hierarchical relationship.
  • when performing neural network reasoning based on the deployment information and the hierarchical relationship, the reasoning part 703 is configured to:
  • the codes corresponding to the deployment information are sequentially run to perform neural network reasoning.
  • When deploying the target neural network, the neural network reasoning device automatically determines the target configuration parameters corresponding to the target network layer based on the network parameters corresponding to the target network layer and the predetermined correspondence between sample network layers and configuration parameters, and deploys the target neural network based on the target configuration parameters. This saves the time spent configuring parameters in the neural network deployment initialization stage, thereby improving the deployment efficiency of the neural network.
  • A "part" may be a part of a circuit, a part of a processor, a part of a program or software, and so on; it may also be a unit, and may be modular or non-modular.
  • FIG. 8 is a schematic structural diagram of a computer device 800 provided by an embodiment of the present disclosure, which includes a processor 801, a memory 802, and a bus 803.
  • The memory 802 is configured to store execution instructions, and includes an internal memory 8021 and an external memory 8022; the internal memory 8021 is configured to temporarily store computation data for the processor 801 and to exchange data with the external memory 8022, such as a hard disk.
  • the processor 801 exchanges data with the external memory 8022 through the memory 8021.
  • the processor 801 communicates with the memory 802 through the bus 803, so that the processor 801 executes the following instructions:
  • determine the target configuration parameters corresponding to the target network layer based on the network parameters corresponding to the target network layer and the predetermined correspondence between sample network layers and configuration parameters; wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of an algorithm used when performing the operation corresponding to the sample network layer;
  • the processor 801 may also be called a CPU.
  • the processor 801 may be an integrated circuit chip with signal processing capability.
  • the processor 801 may also be a general processor, DSP, ASIC, FPGA, GPU or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the processor 801 may be jointly implemented by integrated circuit chips.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is run by a processor, the steps of the neural network reasoning method described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile computer-readable storage medium or a non-volatile computer-readable storage medium.
  • Embodiments of the present disclosure also provide a computer program product, the computer program product carries program code, and the instructions included in the program code can be used to execute the steps of the neural network reasoning method described in the method embodiment above.
  • For details, please refer to the above-mentioned method embodiments.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium, and in other embodiments, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) and the like.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are illustrative.
  • the division of the units is a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some communication interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the functions are realized in the form of software function units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code.
  • Embodiments of the present disclosure provide a neural network reasoning method and apparatus, a computer device, a computer-readable storage medium, and a computer program product. The neural network reasoning method includes: acquiring a target neural network to be deployed, and parsing the target neural network to determine network parameters corresponding to each network layer of the target neural network; determining, based on the network parameters corresponding to a target network layer and a predetermined correspondence between sample network layers and configuration parameters, target configuration parameters corresponding to the target network layer, wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of an algorithm used when performing the operation corresponding to the sample network layer; and deploying the target neural network based on the target configuration parameters, and performing network reasoning based on the target neural network.
  • the above scheme saves the time for configuring parameters in the initialization phase of neural network deployment, thereby improving the deployment efficiency of the neural network.
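The candidate-selection behaviour described in the parts above (deploying the target neural network under each candidate sample network layer's configuration and keeping the configuration with the best measured operation result) can be sketched as follows. All names and structures here are illustrative assumptions, not the actual inference-engine implementation:

```python
import time

def select_target_config(candidates, deploy_fn, run_fn, warmup=1, iters=5):
    """Deploy under each candidate configuration, time it, keep the fastest.

    deploy_fn(config) is assumed to return a deployed network; run_fn(net)
    performs one inference pass. Both are placeholders for the real engine.
    """
    best_config, best_latency = None, float("inf")
    for config in candidates:
        net = deploy_fn(config)            # deployment mode for this candidate
        for _ in range(warmup):
            run_fn(net)                    # discard warm-up runs
        start = time.perf_counter()
        for _ in range(iters):
            run_fn(net)
        latency = (time.perf_counter() - start) / iters
        if latency < best_latency:
            best_config, best_latency = config, latency
    return best_config
```

Here "operation result" is reduced to average latency; a real engine could equally rank candidates by throughput or memory footprint.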

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

Provided are a neural network reasoning method and apparatus, a computer device, and a computer-readable storage medium. The neural network reasoning method comprises: acquiring a target neural network to be deployed, and parsing the target neural network to determine network parameters corresponding to the network layers of the target neural network; determining, on the basis of a network parameter corresponding to a target network layer and a predetermined correspondence between a sample network layer and a configuration parameter, a target configuration parameter corresponding to the target network layer, wherein the sample network layer is of the same type as the target network layer, and the configuration parameter corresponding to the sample network layer is configuration information of an algorithm used when an operation corresponding to the sample network layer is executed; and deploying the target neural network on the basis of the target configuration parameter, and performing network reasoning on the basis of the target neural network.

Description

Neural network reasoning method and apparatus, computer device, computer-readable storage medium, and computer program product

Cross-Reference to Related Applications

The embodiments of the present disclosure are based on, and claim priority to, the Chinese patent application No. 202111595072.1, filed on December 24, 2021 and entitled "Neural network reasoning method, apparatus, computer device and storage medium"; the entire content of the Chinese patent application is hereby incorporated into the present disclosure by reference.

Technical Field

The present disclosure relates to, but is not limited to, the field of computer technology, and in particular to a neural network reasoning method and apparatus, a computer device, a computer-readable storage medium, and a computer program product.

Background

With the development of deep learning, the variety of neural networks keeps growing, and so does the number of configuration parameters for convolution computation in neural networks. When deploying a neural network, an inference engine is usually used to optimize the configuration parameters of different computations, which can improve the performance of neural network inference.
Summary

Embodiments of the present disclosure provide a neural network reasoning method and apparatus, a computer device, a computer-readable storage medium, and a computer program product.

An embodiment of the present disclosure provides a neural network reasoning method, including:

acquiring a target neural network to be deployed, and parsing the target neural network to determine network parameters corresponding to each network layer of the target neural network;

determining, based on the network parameters corresponding to a target network layer and a predetermined correspondence between sample network layers and configuration parameters, target configuration parameters corresponding to the target network layer; wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of an algorithm used when performing the operation corresponding to the sample network layer; and

deploying the target neural network based on the target configuration parameters, and performing network reasoning based on the target neural network.

In this way, when deploying the target neural network, the target configuration parameters corresponding to the target network layer are determined automatically based on the network parameters corresponding to the target network layer and the predetermined correspondence between sample network layers and configuration parameters, and the target neural network is deployed based on the target configuration parameters. This saves the time spent configuring parameters in the neural network deployment initialization stage, thereby improving the deployment efficiency of the neural network.
An embodiment of the present disclosure further provides a neural network reasoning apparatus, including:

a parsing part, configured to acquire a target neural network to be deployed, parse the target neural network, and determine network parameters corresponding to each network layer of the target neural network;

a determining part, configured to determine, based on the network parameters corresponding to a target network layer and a predetermined correspondence between sample network layers and configuration parameters, target configuration parameters corresponding to the target network layer; wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of an algorithm used when performing the operation corresponding to the sample network layer; and

a reasoning part, configured to deploy the target neural network based on the target configuration parameters, and perform network reasoning based on the target neural network.
An embodiment of the present disclosure further provides a computer device, including: a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the computer device runs, the processor communicates with the memory through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the above neural network reasoning method.

An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored; when the computer program is run by a processor, the steps of the above neural network reasoning method are performed.

An embodiment of the present disclosure further provides a computer program product, which includes a computer program or instructions; when the computer program or instructions run on an electronic device, the electronic device is caused to perform the steps of the above method.

For a description of the effects of the above neural network reasoning apparatus, computer device, computer-readable storage medium, and computer program product, refer to the description of the above neural network reasoning method.

To make the above objects, features and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
Brief Description of the Drawings

To describe the technical solutions of the embodiments of the present disclosure more clearly, the following briefly introduces the accompanying drawings used in the embodiments. The accompanying drawings here are incorporated into and constitute a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings may be derived from these drawings without creative effort.
FIG. 1 is a flowchart of a neural network reasoning method provided by an embodiment of the present disclosure;

FIG. 2 is a flowchart of a method for determining target configuration parameters corresponding to a target network layer in the neural network reasoning method provided by an embodiment of the present disclosure;

FIG. 3 is a flowchart of a method for determining at least one candidate sample network layer corresponding to a target network layer in the neural network reasoning method provided by an embodiment of the present disclosure;

FIG. 4 is a flowchart of a method for deploying a target neural network in the neural network reasoning method provided by an embodiment of the present disclosure;

FIG. 5 is a flowchart of a method for generating target deployment code corresponding to a target neural network in the neural network reasoning method provided by an embodiment of the present disclosure;

FIG. 6 is a flowchart of a method for performing network reasoning in the neural network reasoning method provided by an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of the architecture of a neural network reasoning apparatus provided by an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the figures herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort shall fall within the protection scope of the present disclosure.

It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not require further definition or explanation in subsequent figures.

The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein indicates any one of multiple items, or any combination of at least two of multiple items; for example, including at least one of A, B and C may indicate including any one or more elements selected from the set consisting of A, B and C.

Research underlying the embodiments of the present disclosure has found that, in the related art, to obtain better inference performance, an inference engine often needs to traverse a large number of configuration parameter combinations in the preprocessing stage and actually deploy the neural network under each traversed configuration parameter combination, so as to select a better configuration parameter combination according to the test results after actual deployment. This makes the preprocessing stage time-consuming and reduces the deployment efficiency of the neural network.

Based on the above research, the present disclosure provides a neural network reasoning method and apparatus, a computer device, and a storage medium. When deploying a target neural network, the target configuration parameters corresponding to a target network layer are determined automatically based on the network parameters corresponding to the target network layer and a predetermined correspondence between sample network layers and configuration parameters, and the target neural network is deployed based on the target configuration parameters. This saves the time spent configuring parameters in the neural network deployment initialization stage, thereby improving the deployment efficiency of the neural network.

To facilitate understanding of the embodiments, a neural network reasoning method disclosed in the embodiments of the present disclosure is first introduced in detail. The execution subject of the neural network reasoning method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, such as a terminal device, a server, or another processing device; the terminal device may be User Equipment (UE), a mobile device, a user terminal, a terminal, or the like. In some possible implementations, the neural network reasoning method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to FIG. 1, which is a flowchart of a neural network reasoning method provided by an embodiment of the present disclosure, the method includes S101 to S103:

S101: Acquire a target neural network to be deployed, and parse the target neural network to determine network parameters corresponding to each network layer of the target neural network.

S102: Determine, based on the network parameters corresponding to a target network layer and a predetermined correspondence between sample network layers and configuration parameters, target configuration parameters corresponding to the target network layer; wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of an algorithm used when performing the operation corresponding to the sample network layer.

S103: Deploy the target neural network based on the target configuration parameters, and perform network reasoning based on the target neural network.
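Steps S101 to S103 can be sketched end to end as follows; the parser, correspondence table and deployment backend are placeholder assumptions, not the actual inference engine:

```python
def infer_with_auto_config(model, parse_fn, correspondence, deploy_fn, inputs):
    # S101: parse the target neural network into per-layer network parameters
    layer_params = parse_fn(model)          # e.g. {layer_name: params_key}
    # S102: look up the target configuration parameters for each target layer
    configs = {name: correspondence[key]
               for name, key in layer_params.items()
               if key in correspondence}
    # S103: deploy with the chosen configurations, then run inference
    deployed = deploy_fn(model, configs)
    return deployed(inputs)
```

Because S102 is a table lookup rather than an on-device search, the deployment initialization stage avoids benchmarking every configuration combination.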
The following is a detailed description of the above steps.
Regarding S101, the network parameters corresponding to each network layer of the target neural network include weight parameters, bias parameters, convolution parameters of a convolutional layer, activation parameters of an activation layer, and so on. By determining the network parameters corresponding to each network layer of the target neural network, the type of each network layer of the target neural network to be deployed and the parameter values of the network parameters corresponding to each network layer can be determined.

For example, by parsing the target neural network, it can be determined that the network parameters corresponding to a convolutional layer in the target neural network are convolution parameters such as the convolution operation volume, the convolution kernel size and the convolution stride, where the convolution operation volume can be represented by the height and width of the feature map participating in the convolution operation. The target neural network may denote one specific neural network among multiple alternative neural networks.
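As a concrete illustration of this parsing step for convolutional layers, the sketch below walks a hypothetical serialized model description and collects the convolution parameters named above (feature-map size as the operation volume, kernel size, stride). The layer-dict format is an assumption for illustration, not any real framework's format:

```python
def parse_conv_params(model_desc):
    """model_desc: a list of layer dicts, e.g. loaded from a serialized model."""
    params = {}
    for layer in model_desc:
        if layer["type"] == "conv":
            params[layer["name"]] = {
                # operation volume, represented by the feature map's height/width
                "feature_map": (layer["in_h"], layer["in_w"]),
                "kernel": layer["kernel"],
                "stride": layer["stride"],
            }
    return params
```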
In addition, when parsing the target neural network, the network parameters corresponding to each network layer of the target neural network and the hierarchical relationship between the network layers of the target neural network may also be determined, and network reasoning may then be performed based on the hierarchical relationship.
S102:基于目标网络层对应的网络参数、以及预先确定的样本网络层与配置参数之间的对应关系,确定与所述目标网络层对应的目标配置参数;其中,所述样本网络层与所述目标网络层的类型相同,所述样本网络层对应的配置参数为执行所述样本网络层对应的运算时的算法的配置信息。S102: Determine the target configuration parameters corresponding to the target network layer based on the network parameters corresponding to the target network layer and the predetermined correspondence between the sample network layer and the configuration parameters; wherein, the sample network layer and the The types of the target network layers are the same, and the configuration parameters corresponding to the sample network layers are configuration information of algorithms when performing operations corresponding to the sample network layers.
这里,所述目标网络层可以是卷积层或者进行矩阵运算的网络层,由于这些网络层的运算量较大,因此可以设置相应的配置参数以提高运算效率,所述算法表示进行运算时的方法,包括是否使用特定的运算机制、运算时各运算单元所执行的运算量等。其中,目标网络层可以表示目标神经网络中的多个网络层中的特定的网络层,目标配置参数可以表示与目标网络层对应的配置参数。Here, the target network layer can be a convolutional layer or a network layer performing matrix operations. Since these network layers have a large amount of computation, corresponding configuration parameters can be set to improve computing efficiency. The algorithm represents the Methods, including whether to use a specific computing mechanism, the amount of computing performed by each computing unit during computing, etc. Wherein, the target network layer may represent a specific network layer among multiple network layers in the target neural network, and the target configuration parameter may represent a configuration parameter corresponding to the target network layer.
Here, the configuration parameters corresponding to the sample network layer are the preferred configuration parameters for neural network deployment of the sample network layer under each network parameter setting; for example, they may be the optimal solution for the configuration parameters corresponding to the target network layer.

Exemplarily, taking the target network layer and the sample network layer both being convolutional layers as an example, the configuration parameters corresponding to the sample network layer include: the amount of convolution computation performed by each Compute Unified Device Architecture (CUDA) computing unit in a Graphics Processing Unit (GPU), the amount of convolution computation performed by each minimum computing unit, whether a double-buffering mechanism is used for the convolution operation, whether a split mechanism is used to decompose the convolution computation, the iteration step size of each loop computation of each CUDA computing unit, the iteration step size of each loop computation of each minimum computing unit, and so on.

In some embodiments, multiple sample network layers with different network parameters can be determined exhaustively. Before the neural network is deployed, the correspondence between the sample network layers and the configuration parameters can be predetermined in the neural network reasoning engine. For any sample network layer, the configuration parameters corresponding to that sample network layer may be the configuration parameters used when deploying a neural network whose running result satisfies a preset condition; the preset condition may be that the inference speed of the deployed neural network is greater than a preset threshold, where the preset threshold may be a positive number.
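The predetermined correspondence described above can be pictured as a lookup table built offline by exhaustively trial-running each sample-layer setting. A minimal Python sketch, with entirely hypothetical parameter names and values:

```python
# Hypothetical table: a sample layer's network parameters -> its preferred
# configuration parameters, determined offline before deployment.
SAMPLE_CONFIG_TABLE = {
    # (feature_h, feature_w, kernel, stride) -> configuration parameters
    (56, 56, 3, 1): {"cuda_tile": 128, "unit_tile": 8, "double_buffer": True,  "split_k": False},
    (28, 28, 3, 2): {"cuda_tile": 64,  "unit_tile": 4, "double_buffer": True,  "split_k": True},
    (14, 14, 1, 1): {"cuda_tile": 32,  "unit_tile": 4, "double_buffer": False, "split_k": False},
}

def lookup_config(net_params):
    """Return the preferred configuration for an exactly matching sample layer,
    or None if no sample layer with these network parameters was recorded."""
    return SAMPLE_CONFIG_TABLE.get(tuple(net_params))

print(lookup_config((28, 28, 3, 2)))
```

In practice the table is queried by similarity rather than exact match, as the following steps describe.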
It should be noted that the device used when deploying a neural network with the sample network layers is a test device, not the target deployment device on which the target network layer is actually deployed. Because of hardware differences between devices, the same configuration parameters may produce different running results on different deployment devices; that is, if the target neural network is deployed with configuration parameters that produced better running results on the test device, the final running result is not necessarily better.

In this way, before the neural network is deployed, the preferred configuration parameters for deploying the sample network layers under each network parameter setting can be obtained. Since the sample network layer is of the same type as the target network layer, the target configuration parameters of the target network layer can subsequently be determined based on the network parameters corresponding to the target network layer and the predetermined correspondence between sample network layers and configuration parameters. Therefore, a good solution for the configuration parameters of a specific convolutional layer or network layer performing matrix operations (the target network layer) can be selected quickly, saving the time spent configuring parameters during the initialization stage of neural network deployment.
In some implementations, as shown in FIG. 2, the target configuration parameters corresponding to the target network layer may be determined through the following steps:

S201: Determine at least one candidate sample network layer corresponding to the target network layer based on the similarity between the network parameters corresponding to the target network layer and the sample network parameters corresponding to the sample network layers.
Here, the similarity between the network parameters corresponding to the target network layer and the sample network parameters corresponding to a sample network layer may be the cosine similarity between the network parameters.

In some embodiments, one or more of the network parameters may be used when determining the similarity. When multiple network parameters are used, the similarity of each network parameter selected for determining the similarity can be determined separately, a weighted sum of the per-parameter similarities can be computed, and the weighted sum can be taken as the similarity between the network parameters corresponding to the target network layer and the sample network parameters corresponding to the sample network layer.
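The weighted-sum similarity described above can be sketched as follows, under the assumption that each network parameter is represented as a small numeric vector (e.g., feature-map dimensions); the parameter names and weights are illustrative, not from the source:

```python
import math

def cosine(u, v):
    """Cosine similarity between two numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def layer_similarity(target, sample, weights):
    """Per-parameter cosine similarities combined by a weighted sum."""
    return sum(w * cosine(target[name], sample[name])
               for name, w in weights.items())

target = {"fmap": (56, 56), "conv": (3, 3, 1)}   # kernel h, kernel w, stride
sample = {"fmap": (48, 64), "conv": (3, 3, 2)}
print(layer_similarity(target, sample, {"fmap": 0.6, "conv": 0.4}))
```

With weights summing to 1, an identical layer scores 1.0 and dissimilar layers score lower, which matches the thresholding step that follows.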
In some implementations, as shown in FIG. 3, the at least one candidate sample network layer corresponding to the target network layer may be determined through the following steps:

S2011: Determine the sample network layers whose similarity is greater than a preset similarity as initial sample network layers.

Exemplarily, suppose the preset similarity is 0.8 and the similarities between sample network layers 1 to 4 and the target network layer are 0.6, 0.9, 0.75, and 0.85, respectively. Then sample network layer 2 and sample network layer 4 can be determined as the initial sample network layers, yielding a candidate set containing multiple initial sample network layers.
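A minimal sketch of this thresholding step, reproducing the numbers from the example above:

```python
def initial_candidates(similarities, preset=0.8):
    """Keep the sample layers whose similarity to the target layer
    exceeds the preset similarity threshold."""
    return [layer for layer, s in similarities.items() if s > preset]

sims = {"sample_1": 0.6, "sample_2": 0.9, "sample_3": 0.75, "sample_4": 0.85}
print(initial_candidates(sims))  # sample layers 2 and 4 survive
```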
Exemplarily, the optimal solution is selected and recorded by traversing the candidate set. Here, to improve the accuracy of the solution matching the configuration parameters corresponding to the network layer, the candidate set can be reduced by pruning.

S2012: Determine the at least one candidate sample network layer from the initial sample network layers based on a configuration-information screening condition matching the network parameters corresponding to the target network layer and on the configuration parameters corresponding to each initial sample network layer.

Here, the configuration-information screening condition can be obtained through data analysis. For example, data analysis of the network parameters corresponding to the target network layer can yield the maximum and minimum values of each configuration parameter corresponding to those network parameters; the corresponding screening condition is then that the value of each configuration parameter must be no greater than the corresponding maximum and no less than the corresponding minimum.
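The min/max screening condition can be sketched as a pruning pass over the initial sample layers; the configuration parameter names and bounds below are hypothetical:

```python
def screen(candidates, bounds):
    """Keep only candidates whose every configuration parameter lies inside
    the [min, max] range derived for the target layer's network parameters."""
    kept = []
    for cand in candidates:
        ok = all(lo <= cand["config"][k] <= hi for k, (lo, hi) in bounds.items())
        if ok:
            kept.append(cand)
    return kept

bounds = {"cuda_tile": (32, 128), "unit_tile": (4, 8)}
cands = [
    {"name": "a", "config": {"cuda_tile": 64,  "unit_tile": 4}},
    {"name": "b", "config": {"cuda_tile": 256, "unit_tile": 8}},  # pruned: tile too large
]
print([c["name"] for c in screen(cands, bounds)])
```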
In the embodiments of S2011 to S2012, by setting a configuration-information screening condition matching the network parameters corresponding to the target network layer, each initial sample network layer is screened and the at least one candidate sample network layer is determined from the initial sample network layers, which reduces the amount of computation when subsequently determining the configuration parameters of the target network layer, thereby improving the deployment efficiency of the neural network.

S202: For any candidate sample network layer, deploy the target neural network based on the configuration parameters corresponding to that candidate sample network layer, and determine the running result of the target neural network under the deployment mode corresponding to that candidate sample network layer.

In this way, the neural network reasoning engine can be used when deploying the neural network. After the target neural network is deployed on the target deployment device, the running result of the target neural network under the deployment mode corresponding to the candidate sample network layer can be determined; the running result may be inference speed, inference accuracy, and so on, and the deployment effect of the target neural network under that deployment mode can be determined from the running result.
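The trial runs of S202 can be sketched as a timing loop over the candidate configurations; `deploy_fn` and `run_fn` are stand-ins for the reasoning engine's deployment and execution interfaces, which the source does not specify:

```python
import time

def trial_run(deploy_fn, run_fn, candidates, warmup=3, iters=10):
    """Deploy the network once per candidate configuration and record its
    running result (here: average inference latency in seconds)."""
    results = {}
    for cand in candidates:
        net = deploy_fn(cand)        # deploy with this candidate's config
        for _ in range(warmup):
            run_fn(net)              # warm-up runs are not timed
        start = time.perf_counter()
        for _ in range(iters):
            run_fn(net)
        results[cand] = (time.perf_counter() - start) / iters
    return results
```

Warm-up iterations are included because the first runs of a freshly deployed network typically pay one-time initialization costs that would distort the measured result.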
S203: Determine a target candidate sample network layer based on the running results of the target neural network under the deployment modes corresponding to the candidate sample network layers, and take the configuration parameters corresponding to the target candidate sample network layer as the target configuration parameters.

In some embodiments, when determining the target candidate sample network layer based on the running results under the deployment modes corresponding to the candidate sample network layers, a target candidate sample network layer with a better running result can be determined from the candidate sample network layers according to a preset running-result evaluation rule, and the configuration parameters corresponding to the target candidate sample network layer are then taken as the target configuration parameters. The target candidate sample network layer may be a specific candidate sample network layer among the at least one candidate sample network layer.

Exemplarily, taking the running result being inference speed as an example, the corresponding running-result evaluation rule may be to select a candidate sample network layer whose inference speed is greater than a preset threshold as the target candidate sample network layer.
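A sketch of this evaluation rule, assuming the running result recorded for each candidate is an inference speed in runs per second (the tie-breaking choice of the fastest qualifying candidate is an illustrative assumption):

```python
def pick_target_candidate(results, threshold):
    """Evaluation rule: among candidates whose inference speed exceeds the
    preset threshold, pick the fastest; None if no candidate qualifies."""
    fast = {c: v for c, v in results.items() if v > threshold}
    if not fast:
        return None
    return max(fast, key=fast.get)

speeds = {"cand_a": 120.0, "cand_b": 95.0, "cand_c": 140.0}
print(pick_target_candidate(speeds, threshold=100.0))  # -> cand_c
```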
In the embodiments of S201 to S203, a candidate set containing at least one candidate sample network layer is determined, and trial runs (i.e., neural network deployments using the configuration parameters of the sample network layers) and selection of the relevant configuration parameters are performed based on the candidate set. Compared with the related art, which performs trial runs exhaustively and then selects the preferred configuration parameters, fewer trial runs are needed during the initialization stage of neural network deployment, saving parameter configuration time in that stage; fast parameter configuration enables fast algorithm selection for the target network layer, thereby improving the deployment efficiency of the neural network.

S103: Deploy the target neural network based on the target configuration parameters, and perform network reasoning based on the target neural network.

Here, network reasoning means performing data processing on input data based on the target neural network, so as to obtain a data processing result corresponding to the input data.

Exemplarily, taking the target neural network being an image recognition network as an example, after the target neural network is deployed, a picture containing a cat is input into the target neural network, and through the network reasoning of the target neural network the inference result "cat" can be obtained.
In some implementations, as shown in FIG. 4, the target neural network can be deployed through the following steps:

S401: Determine, based on the target configuration parameters, first deployment code corresponding to the target network layer; and determine, based on the network parameters of the network layers of the target neural network other than the target network layer, second deployment code corresponding to those other network layers.

Here, the first deployment code and the second deployment code are code that the central processing unit can recognize. By generating the first deployment code and the second deployment code, the deployment configuration of the target neural network can be recorded in the central processing unit, where the central processing unit is the device on which the neural network reasoning engine is deployed and is used to deploy the target neural network.
In some embodiments, when determining the first deployment code corresponding to the target network layer based on the target configuration parameters, the target configuration parameters corresponding to the target network layer can be encapsulated based on a preset code encapsulation rule to determine the first deployment code corresponding to the target network layer. When determining the second deployment code corresponding to the other network layers based on the network parameters of the network layers of the target neural network other than the target network layer, the second deployment code corresponding to the other network layers can be obtained from the neural network reasoning engine according to the network parameters of those other network layers. Here, the other network layers may be any one network layer of the target neural network other than the target network layer, or two or more such network layers.

The code encapsulation rule may define a template used for code encapsulation. When encapsulating the target configuration parameters corresponding to the target network layer based on the preset code encapsulation rule, the target configuration parameters can be added at the corresponding positions in the template according to the correspondence between the target configuration parameters and the template, thereby generating the first deployment code corresponding to the target network layer. Here, the code encapsulation rule may represent the correspondence between the target configuration parameters and the code encapsulation template.
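A minimal sketch of template-based encapsulation, using Python string formatting as the stand-in encapsulation rule; the CUDA-like template text and slot names are purely illustrative:

```python
# Hypothetical code-encapsulation rule: a kernel template whose slots are
# filled from the target configuration parameters to form the first
# deployment code. Double braces render as literal braces in the output.
CONV_TEMPLATE = """
__global__ void conv_kernel(const float* in, const float* w, float* out) {{
    // tile sizes chosen by the target configuration parameters
    const int CUDA_TILE = {cuda_tile};
    const int UNIT_TILE = {unit_tile};
    // ... convolution body elided ...
}}
"""

def encapsulate(config):
    """Add each configuration parameter at its corresponding template slot."""
    return CONV_TEMPLATE.format(**config)

code = encapsulate({"cuda_tile": 128, "unit_tile": 8})
print("const int CUDA_TILE = 128;" in code)
```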
In some implementations, when performing code encapsulation, fusion information corresponding to the target network layer can also be obtained, and code encapsulation can be performed, according to the fusion information, on the target network layer together with the other network layers that have a fusion relationship with the target network layer.

Exemplarily, taking the target network layer being a convolutional layer as an example, if it can be determined from the fusion information that the other network layer having a fusion relationship with the convolutional layer is an activation layer, then the configuration parameters corresponding to the convolutional layer and the network parameters corresponding to the activation layer can be encapsulated together during code encapsulation, which can improve the generation efficiency of the initial deployment code.

Exemplarily, before code encapsulation, the convolution code can be split into multiple parts. In this way, these split code parts can be spliced together to obtain the deployment code.

In this way, by performing code encapsulation through preset encapsulation rules, the encapsulated code can be generated automatically when deploying the neural network, without adding the corresponding code to the neural network reasoning engine in advance, which can reduce the space occupied by the neural network reasoning engine and improve the deployment efficiency of the neural network.
S402: Generate, based on the first deployment code and the second deployment code, target deployment code corresponding to the target neural network, and add the target deployment code to a target deployment device.

Here, the target deployment device may be a hardware device that can be used for neural network deployment, such as a graphics processor; once the target deployment code has been added to the target deployment device, the deployment of the target neural network is complete.

The target deployment code may refer to the specific deployment code generated from the first deployment code and the second deployment code.

In the embodiments of S401 to S402, by automatically generating the deployment code and fusing the generated code, the storage space of the neural network reasoning engine can be saved compared with the related art in which the neural network reasoning engine stores all of the deployment code, thereby improving the deployment efficiency of the neural network.

Moreover, compared with fully static compilation, the corresponding code can be generated according to the effect of fusing convolution/matrix operations, so the performance of neural network reasoning is improved.
In some implementations, as shown in FIG. 5, the target deployment code corresponding to the target neural network can be generated through the following steps:

S4021: Splice the first deployment code and the second deployment code to determine initial deployment code.

Here, when splicing the first deployment code and the second deployment code, they can be spliced according to the connection relationships between the network layers of the target neural network; the spliced deployment code is the initial deployment code.
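The splicing step can be sketched as a traversal that follows the connection relationships between layers, emitting each layer's code after the code of the layers it depends on; the layer names and the predecessor-list encoding of the connections are assumptions:

```python
def splice(layer_codes, connections):
    """Concatenate per-layer deployment code following the connection
    relationships of the network (each layer lists its input layers)."""
    order, seen = [], set()

    def visit(layer):
        if layer in seen:
            return
        seen.add(layer)
        for pred in connections.get(layer, []):
            visit(pred)           # emit inputs before the layer itself
        order.append(layer)

    for layer in layer_codes:
        visit(layer)
    return "\n".join(layer_codes[l] for l in order)

codes = {"conv1": "// conv1", "relu1": "// relu1", "fc": "// fc"}
deps = {"relu1": ["conv1"], "fc": ["relu1"]}
print(splice(codes, deps))
```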
S4022: Call a target interface function of the target deployment device to compile the initial deployment code and generate the target deployment code, where the target deployment code is code that runs on the target deployment device.

Here, the target interface function may be a specific interface function of the target deployment device.

Exemplarily, taking the target deployment device being a graphics processor as an example, a target interface function of the NVRTC interface can be called to compile the initial deployment code and generate target deployment code that can run on the graphics processor.

In the embodiments of S4021 to S4022, by splicing the automatically generated first deployment code and second deployment code, and calling the target interface function to compile the spliced initial deployment code, the target deployment code used to deploy the target neural network can be generated in real time, thereby improving the deployment efficiency of the neural network.
In some implementations, as shown in FIG. 6, network reasoning can be performed through the following steps:

S601: Receive deployment information corresponding to the target deployment code sent by the target deployment device; where the deployment information is used to describe the deployment locations of the code corresponding to each network layer of the target neural network.

Here, after receiving the target deployment code, the target deployment device can deploy the target deployment code on the target deployment device and send the deployment information to the neural network reasoning engine, so that the target deployment device can perform neural network reasoning after receiving an inference instruction.

S602: Perform neural network reasoning based on the deployment information and the hierarchical relationships between the network layers obtained by parsing the target neural network.
Here, when performing neural network reasoning based on the deployment information and the hierarchical relationships, the code corresponding to each item of deployment information can be run in turn according to the hierarchical relationships to perform neural network reasoning.

In some embodiments, the hierarchical relationships between the network layers of the target neural network obtained by parsing the target neural network may be obtained in the same parse that determines the network parameters corresponding to each network layer of the target neural network, or in another parse of the target neural network.

In some embodiments, the inference order among the at least one network layer that needs to be used when performing neural network reasoning can be determined according to the hierarchical relationships, and the neural network reasoning engine can send inference instructions to the target deployment device in turn according to the inference order, instructing the target deployment device to run the corresponding code in that order to perform neural network reasoning.
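The instruction-driven reasoning loop above can be sketched as follows; the engine class, the fake device, and the deployment-location encoding are all hypothetical stand-ins for the interfaces described in the text:

```python
class ReasoningEngine:
    """Sketch: the engine walks the inference order derived from the
    hierarchical relationships and issues one inference instruction per
    layer, each carrying the deployment location of that layer's code."""

    def __init__(self, order, deploy_info, device):
        self.order = order              # layers in inference order
        self.deploy_info = deploy_info  # layer -> code deployment location
        self.device = device

    def infer(self, x):
        for layer in self.order:
            x = self.device.run(self.deploy_info[layer], x)
        return x

class FakeDevice:
    """Hypothetical deployment device: 'runs' the code at a given
    location by applying a stored function to the data."""
    def __init__(self, kernels):
        self.kernels = kernels
    def run(self, location, x):
        return self.kernels[location](x)

device = FakeDevice({"addr0": lambda x: x + 1, "addr1": lambda x: x * 2})
engine = ReasoningEngine(["conv", "fc"], {"conv": "addr0", "fc": "addr1"}, device)
print(engine.infer(3))  # (3 + 1) * 2 = 8
```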
In the neural network reasoning method provided by the embodiments of the present disclosure, when deploying the target neural network, the target configuration parameters corresponding to the target network layer are automatically determined based on the network parameters corresponding to the target network layer and the predetermined correspondence between sample network layers and configuration parameters, and the target neural network is deployed based on the target configuration parameters, which saves the time spent configuring parameters during the initialization stage of neural network deployment, thereby improving the deployment efficiency of the neural network.

Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.

Based on the same inventive concept, the embodiments of the present disclosure also provide a neural network reasoning apparatus corresponding to the neural network reasoning method. Since the problem-solving principle of the apparatus in the embodiments of the present disclosure is similar to that of the above neural network reasoning method of the embodiments of the present disclosure, the implementation of the apparatus can refer to the implementation of the method.

Referring to FIG. 7, which is a schematic diagram of the architecture of a neural network reasoning apparatus provided by an embodiment of the present disclosure, the parts included in the apparatus can be implemented by a processor in a computer device, or of course by specific logic circuits; in implementation, the processor may be a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a GPU, or the like. The neural network reasoning apparatus includes: a parsing part 701, a determining part 702, and a reasoning part 703, where:
the parsing part 701 is configured to acquire a target neural network to be deployed, parse the target neural network, and determine the network parameters corresponding to each network layer of the target neural network;

the determining part 702 is configured to determine target configuration parameters corresponding to a target network layer based on the network parameters corresponding to the target network layer and a predetermined correspondence between sample network layers and configuration parameters, where the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of the algorithm used when performing the operation corresponding to the sample network layer;

the reasoning part 703 is configured to deploy the target neural network based on the target configuration parameters and perform network reasoning based on the target neural network.
In some implementations, the determining part 702, when determining the target configuration parameters corresponding to the target network layer based on the network parameters corresponding to the target network layer and the predetermined correspondence between sample network layers and configuration parameters, is configured to:

determine at least one candidate sample network layer corresponding to the target network layer based on the similarity between the network parameters corresponding to the target network layer and the sample network parameters corresponding to the sample network layers;

for any candidate sample network layer, deploy the target neural network based on the configuration parameters corresponding to that candidate sample network layer, and determine the running result under the deployment mode corresponding to that candidate sample network layer; and

determine a target candidate sample network layer based on the running results of the target neural network under the deployment modes corresponding to the candidate sample network layers, and take the configuration parameters corresponding to the target candidate sample network layer as the target configuration parameters.

In some implementations, the determining part 702, when determining the at least one candidate sample network layer corresponding to the target network layer based on the similarity between the network parameters corresponding to the target network layer and the sample network parameters corresponding to the sample network layers, is configured to:

determine the sample network layers whose similarity is greater than a preset similarity as initial sample network layers; and

determine the at least one candidate sample network layer from the initial sample network layers based on a configuration-information screening condition matching the network parameters corresponding to the target network layer and on the configuration parameters corresponding to each initial sample network layer.
在一些实施方式中,所述推理部分703,在基于所述目标配置参数部署所述目标神经网络时,被配置为:In some implementations, the reasoning part 703, when deploying the target neural network based on the target configuration parameters, is configured to:
基于所述目标配置参数,确定所述目标网络层对应的第一部署代码;以及,基于所述目标神经网络中除所述目标网络层外的其他网络层的网络参数,确定所述其他网络层对应的第二部署代码;Based on the target configuration parameters, determine the first deployment code corresponding to the target network layer; and, based on the network parameters of other network layers in the target neural network except the target network layer, determine the other network layers The corresponding second deployment code;
基于所述第一部署代码和所述第二部署代码,生成所述目标神经网络对应的目标部署代码,并将所述目标部署代码添加至目标部署设备。Based on the first deployment code and the second deployment code, generate a target deployment code corresponding to the target neural network, and add the target deployment code to a target deployment device.
In some implementations, when determining the first deployment code corresponding to the target network layer based on the target configuration parameters and the network parameters, the reasoning part 703 is configured to:
encapsulate the target configuration parameters corresponding to the target network layer based on a preset code encapsulation rule, to determine the first deployment code corresponding to the target network layer.
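One way to picture "encapsulating the target configuration parameters under a preset code encapsulation rule" is a fixed code template (the rule) filled with the layer's configuration to yield the first deployment code. The template text and parameter names below are pure assumptions for illustration.

```python
# Hypothetical preset encapsulation rule: a kernel-call template whose
# placeholders are filled from the target layer's configuration parameters.
KERNEL_TEMPLATE = (
    "conv_kernel(input, output, tile_m={tile_m}, tile_n={tile_n}, "
    "algo=\"{algo}\");"
)

def encapsulate_first_deployment_code(target_config: dict) -> str:
    # substitute the configuration parameters into the template
    return KERNEL_TEMPLATE.format(**target_config)
```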
In some implementations, when generating the target deployment code corresponding to the target neural network based on the first deployment code and the second deployment code, the reasoning part 703 is configured to:
splice the first deployment code and the second deployment code to determine initial deployment code;
call a target interface function of the target deployment device to compile the initial deployment code and generate the target deployment code, wherein the target deployment code is code that runs on the target deployment device.
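The two steps above can be sketched as: splice the first and second deployment code into initial deployment code, then hand it to the target deployment device's interface function (modeled here as a caller-supplied `compile_fn`) to obtain code runnable on that device. Names are illustrative assumptions.

```python
from typing import Callable

def build_target_deployment_code(first_code: str, second_code: str,
                                 compile_fn: Callable[[str], bytes]) -> bytes:
    # splice the per-layer code fragments into the initial deployment code
    initial_code = "\n".join([first_code, second_code])
    # compile via the target deployment device's interface function
    return compile_fn(initial_code)
```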
In some implementations, before performing network reasoning based on the target neural network, the reasoning part 703 is further configured to:
receive deployment information corresponding to the target deployment code sent by the target deployment device, wherein the deployment information describes the deployment locations of the code corresponding to the network layers of the target neural network;
the parsing part 701, when parsing the target neural network to determine the network parameters respectively corresponding to the network layers of the target neural network, is configured to:
parse the target neural network to determine the network parameters respectively corresponding to the network layers of the target neural network and the hierarchical relationship between the network layers of the target neural network;
the reasoning part 703, when performing network reasoning based on the target neural network, is configured to:
perform neural network reasoning based on the deployment information and the hierarchical relationship.
In some implementations, when performing neural network reasoning based on the deployment information and the hierarchical relationship, the reasoning part 703 is configured to:
run the code corresponding to each piece of deployment information in turn according to the hierarchical relationship, so as to perform neural network reasoning.
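Inference driven by the deployment information and the hierarchical relationship can be sketched as visiting the layers in hierarchy order and running the deployed code located by each layer's deployment information, feeding each output to the next layer. All names below are assumptions for illustration.

```python
from typing import Callable, Dict, List

def run_inference(hierarchy: List[str],
                  deployment_info: Dict[str, object],
                  run_deployed_code: Callable[[object, object], object],
                  network_input: object) -> object:
    x = network_input
    for layer in hierarchy:  # follow the hierarchical relationship
        # deployment_info[layer] locates the deployed code for this layer
        x = run_deployed_code(deployment_info[layer], x)
    return x
```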
When deploying a target neural network, the neural network reasoning apparatus provided by the embodiments of the present disclosure automatically determines the target configuration parameters corresponding to the target network layer based on the network parameters corresponding to the target network layer and a predetermined correspondence between sample network layers and configuration parameters, and deploys the target neural network based on the target configuration parameters. This saves the time spent configuring parameters in the initialization stage of neural network deployment, thereby improving deployment efficiency. For descriptions of the processing flow of each part of the apparatus and the interaction flow between the parts, reference may be made to the relevant descriptions in the above method embodiments, which are not repeated here.
In the embodiments of the present disclosure and other embodiments, a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and may be modular or non-modular.
Based on the same technical concept, an embodiment of the present disclosure further provides a computer device. Referring to FIG. 8, which is a schematic structural diagram of a computer device 800 provided by an embodiment of the present disclosure, the computer device includes a processor 801, a memory 802, and a bus 803. The memory 802 is configured to store execution instructions and includes an internal memory 8021 and an external memory 8022. The internal memory 8021 is configured to temporarily store computation data in the processor 801 and data exchanged with the external memory 8022 such as a hard disk; the processor 801 exchanges data with the external memory 8022 through the internal memory 8021. When the computer device 800 runs, the processor 801 communicates with the memory 802 through the bus 803, causing the processor 801 to execute the following instructions:
acquire a target neural network to be deployed, and parse the target neural network to determine network parameters respectively corresponding to network layers of the target neural network;
determine, based on network parameters corresponding to a target network layer and a predetermined correspondence between sample network layers and configuration parameters, target configuration parameters corresponding to the target network layer; wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of an algorithm used when performing an operation corresponding to the sample network layer;
deploy the target neural network based on the target configuration parameters, and perform network reasoning based on the target neural network.
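The three instructions above form a pipeline: parse, look up the configuration, deploy, infer. The hypothetical sketch below wires them together with caller-supplied callables; none of these names come from the patent.

```python
from typing import Callable

def deploy_and_infer(network: object,
                     parse: Callable, lookup_config: Callable,
                     deploy: Callable, infer: Callable,
                     network_input: object) -> object:
    layer_params = parse(network)                 # per-layer network parameters
    target_config = lookup_config(layer_params)   # from the sample-layer mapping
    engine = deploy(network, target_config)       # deploy with the target config
    return infer(engine, network_input)           # perform network reasoning
```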
Here, the processor 801 may also be referred to as a CPU. The processor 801 may be an integrated circuit chip with signal processing capability. The processor 801 may also be a general-purpose processor, a DSP, an ASIC, an FPGA, a GPU, another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or any conventional processor. In addition, the processor 801 may be jointly implemented by integrated circuit chips.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the neural network reasoning method described in the above method embodiments are executed. The storage medium may be a volatile or a non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product carrying program code; the instructions included in the program code can be used to execute the steps of the neural network reasoning method described in the above method embodiments, for which reference may be made to the above method embodiments.
The computer program product may be implemented by hardware, software, or a combination thereof. In some implementations, the computer program product is embodied as a computer storage medium; in other implementations, it is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system and apparatus described above, reference may be made to the corresponding processes in the foregoing method embodiments. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are illustrative. For example, the division of units is a division by logical function, and there may be other divisions in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part contributing to the related art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field may, within the technical scope disclosed herein, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Industrial Applicability
Embodiments of the present disclosure provide a neural network reasoning method and apparatus, a computer device, a computer-readable storage medium, and a computer program product. The neural network reasoning method includes: acquiring a target neural network to be deployed, and parsing the target neural network to determine network parameters respectively corresponding to network layers of the target neural network; determining, based on network parameters corresponding to a target network layer and a predetermined correspondence between sample network layers and configuration parameters, target configuration parameters corresponding to the target network layer, wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of an algorithm used when performing an operation corresponding to the sample network layer; and deploying the target neural network based on the target configuration parameters, and performing network reasoning based on the target neural network. The above solution saves the time spent configuring parameters in the initialization stage of neural network deployment, thereby improving the deployment efficiency of the neural network.

Claims (12)

  1. A neural network reasoning method, comprising:
    acquiring a target neural network to be deployed, and parsing the target neural network to determine network parameters respectively corresponding to network layers of the target neural network;
    determining, based on network parameters corresponding to a target network layer and a predetermined correspondence between sample network layers and configuration parameters, target configuration parameters corresponding to the target network layer; wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of an algorithm used when performing an operation corresponding to the sample network layer;
    deploying the target neural network based on the target configuration parameters, and performing network reasoning based on the target neural network.
  2. The method according to claim 1, wherein the determining, based on the network parameters corresponding to the target network layer and the predetermined correspondence between sample network layers and configuration parameters, the target configuration parameters corresponding to the target network layer comprises:
    determining at least one candidate sample network layer corresponding to the target network layer based on a similarity between the network parameters corresponding to the target network layer and sample network parameters corresponding to the sample network layers;
    for any of the candidate sample network layers, deploying the target neural network based on the configuration parameters corresponding to the candidate sample network layer, and determining a running result of the target neural network in the deployment mode corresponding to the candidate sample network layer;
    determining a target candidate sample network layer based on the running results of the target neural network in the deployment modes corresponding to the candidate sample network layers, and taking the configuration parameters corresponding to the target candidate sample network layer as the target configuration parameters.
  3. The method according to claim 2, wherein the determining the at least one candidate sample network layer corresponding to the target network layer based on the similarity between the network parameters corresponding to the target network layer and the sample network parameters corresponding to the sample network layers comprises:
    determining the sample network layers whose similarity is greater than a preset similarity as initial sample network layers;
    determining the at least one candidate sample network layer from the initial sample network layers based on a configuration-information screening condition matching the network parameters corresponding to the target network layer and the configuration parameters corresponding to each initial sample network layer.
  4. The method according to any one of claims 1 to 3, wherein the deploying the target neural network based on the target configuration parameters comprises:
    determining, based on the target configuration parameters, first deployment code corresponding to the target network layer; and determining, based on the network parameters of the network layers of the target neural network other than the target network layer, second deployment code corresponding to those other network layers;
    generating, based on the first deployment code and the second deployment code, target deployment code corresponding to the target neural network, and adding the target deployment code to a target deployment device.
  5. The method according to claim 4, wherein the determining, based on the target configuration parameters, the first deployment code corresponding to the target network layer comprises:
    encapsulating the target configuration parameters corresponding to the target network layer based on a preset code encapsulation rule, to determine the first deployment code corresponding to the target network layer.
  6. The method according to claim 4, wherein the generating, based on the first deployment code and the second deployment code, the target deployment code corresponding to the target neural network comprises:
    splicing the first deployment code and the second deployment code to determine initial deployment code;
    calling a target interface function of the target deployment device to compile the initial deployment code and generate the target deployment code, wherein the target deployment code is code that runs on the target deployment device.
  7. The method according to any one of claims 4 to 6, wherein before the performing network reasoning based on the target neural network, the method further comprises:
    receiving deployment information corresponding to the target deployment code sent by the target deployment device, wherein the deployment information describes deployment locations of the code corresponding to the network layers of the target neural network;
    the parsing the target neural network to determine the network parameters respectively corresponding to the network layers of the target neural network comprises:
    parsing the target neural network to determine the network parameters respectively corresponding to the network layers of the target neural network and a hierarchical relationship between the network layers of the target neural network;
    the performing network reasoning based on the target neural network comprises:
    performing neural network reasoning based on the deployment information and the hierarchical relationship.
  8. The method according to claim 7, wherein the performing neural network reasoning based on the deployment information and the hierarchical relationship comprises:
    running the code corresponding to each piece of deployment information in turn according to the hierarchical relationship, so as to perform neural network reasoning.
  9. A neural network reasoning apparatus, comprising:
    a parsing part, configured to acquire a target neural network to be deployed, and parse the target neural network to determine network parameters respectively corresponding to network layers of the target neural network;
    a determining part, configured to determine, based on network parameters corresponding to a target network layer and a predetermined correspondence between sample network layers and configuration parameters, target configuration parameters corresponding to the target network layer; wherein the sample network layer is of the same type as the target network layer, and the configuration parameters corresponding to the sample network layer are configuration information of an algorithm used when performing an operation corresponding to the sample network layer;
    a reasoning part, configured to deploy the target neural network based on the target configuration parameters, and perform network reasoning based on the target neural network.
  10. A computer device, comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the neural network reasoning method according to any one of claims 1 to 8 are executed.
  11. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is run by a processor, the steps of the neural network reasoning method according to any one of claims 1 to 8 are executed.
  12. A computer program product comprising a computer program or instructions, wherein when the computer program or instructions run on an electronic device, the electronic device is caused to execute the steps of the neural network reasoning method according to any one of claims 1 to 8.
PCT/CN2022/090030 2021-12-24 2022-04-28 Neural network reasoning method and apparatus, and computer device, computer-readable storage medium and computer program product WO2023115776A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111595072.1A CN114398040A (en) 2021-12-24 2021-12-24 Neural network reasoning method, device, computer equipment and storage medium
CN202111595072.1 2021-12-24

Publications (1)

Publication Number Publication Date
WO2023115776A1 true WO2023115776A1 (en) 2023-06-29

Family

ID=81227241

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090030 WO2023115776A1 (en) 2021-12-24 2022-04-28 Neural network reasoning method and apparatus, and computer device, computer-readable storage medium and computer program product

Country Status (2)

Country Link
CN (1) CN114398040A (en)
WO (1) WO2023115776A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398040A (en) * 2021-12-24 2022-04-26 上海商汤科技开发有限公司 Neural network reasoning method, device, computer equipment and storage medium
CN116089095B (en) * 2023-02-28 2023-10-27 苏州亿铸智能科技有限公司 Deployment method for ReRAM neural network computing engine network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633785A (en) * 2018-06-21 2019-12-31 清华大学 Method and system for calculating convolutional neural network
CN111144561A (en) * 2018-11-05 2020-05-12 杭州海康威视数字技术股份有限公司 Neural network model determining method and device
CN111582454A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Method and device for generating neural network model
CN111985624A (en) * 2020-08-31 2020-11-24 商汤集团有限公司 Neural network training and deploying method, text translation method and related products
CN112947935A (en) * 2021-02-26 2021-06-11 上海商汤智能科技有限公司 Operation method and device, electronic device and storage medium
CN114398040A (en) * 2021-12-24 2022-04-26 上海商汤科技开发有限公司 Neural network reasoning method, device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN114398040A (en) 2022-04-26

Similar Documents

Publication Publication Date Title
WO2023115776A1 (en) Neural network reasoning method and apparatus, and computer device, computer-readable storage medium and computer program product
WO2021098509A1 (en) Neural network joint compilation method, apparatus and electronic device
US11741361B2 (en) Machine learning-based network model building method and apparatus
US20200225921A1 (en) Lookup table optimization for programming languages that target synchronous digital circuits
Gao et al. Android malware detection via graphlet sampling
JP7394211B2 (en) Methods, devices, equipment, and media for parallel execution of smart contracts
CN112101529B (en) Deployment method and architecture for neural network model reasoning cross-platform
US11775862B2 (en) Tracking provenance in data science scripts
CN114968612B (en) Data processing method, system and related equipment
US20220198266A1 (en) Using disentangled learning to train an interpretable deep learning model
WO2023082644A1 (en) Network model processing method and apparatus, and device, storage medium and computer program product
CN113449299A (en) Projected vector modification as suppression of machine learning model string fill
CN114201107A (en) Storage device, method for operating storage device, and electronic device
CN117034273A (en) Android malicious software detection method and system based on graph rolling network
CN113961919A (en) Malicious software detection method and device
CN113672985A (en) Machine learning algorithm script compiling method and compiler for privacy protection
CN113312618A (en) Program vulnerability detection method and device, electronic equipment and medium
CN113971224A (en) Image retrieval system, method and related equipment
CN113326523A (en) Privacy calculation method and device and electronic equipment
US20200342287A1 (en) Selective performance of deterministic computations for neural networks
CN111966383A (en) Quantitative analysis method, system and medium for operating system kernel compatibility
EP4414901A1 (en) Model weight acquisition method and related system
Belyaev et al. LuNA-ICLU compiler for automated generation of iterative fragmented programs
US20230104356A1 (en) Model driven sub-system for design and execution of experiments
CN116755714B (en) Method, device, equipment and storage medium for operating deep neural network model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22909113

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE