WO2020134828A1 - Neural network inference structure optimization method and device - Google Patents

Neural network inference structure optimization method and device (一种神经网络推理结构优化方法及装置)

Info

Publication number
WO2020134828A1
WO2020134828A1 PCT/CN2019/121520 CN2019121520W
Authority
WO
WIPO (PCT)
Prior art keywords
layer
network layer
network
merge
nth
Prior art date
Application number
PCT/CN2019/121520
Other languages
English (en)
French (fr)
Inventor
易立强
Original Assignee
深圳云天励飞技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Publication of WO2020134828A1 publication Critical patent/WO2020134828A1/zh



Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks

Definitions

  • This application relates to the field of machine learning technology, and in particular to a method and device for optimizing a neural network inference structure.
  • Batch normalization (BN) is an algorithm devised to overcome the difficulty of training caused by increasing neural network depth. It is a step that reduces internal covariate shift, which can lessen the dependence of gradients on the parameters or on the scale of their initial values, and has a beneficial effect on gradient flow through the network.
  • In addition, to accelerate the gradient-descent convergence of neural network training, a normalization operation is generally also applied to the input data.
  • Because BN or normalization processing is added, the neural network inference structure increases in both depth and amount of processing compared with before, and the processing delay grows, which has an adverse effect on the deployment efficiency of neural network model inference.
  • The embodiments of the present application provide a method and device for optimizing a neural network inference structure, which can reduce, to a certain extent, the computation and processing delay in neural network inference, so as to achieve the purpose of improving the inference efficiency of the neural network model.
  • A first aspect of the embodiments of the present application provides a neural network inference structure optimization method, including:
  • confirming that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer;
  • confirming whether the (N-1)th network layer satisfies a preset condition, wherein the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer, and the output of the (N-1)th network layer is connected only to the Nth network layer;
  • if the (N-1)th network layer satisfies the preset condition, calling a first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
  • A second aspect of the embodiments of the present application provides a neural network inference structure optimization method, including:
  • confirming that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer;
  • confirming whether the (N+1)th network layer is a convolutional layer or a fully connected layer;
  • if the (N+1)th network layer is a convolutional layer or a fully connected layer, calling a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
  • A third aspect of the embodiments of the present application provides a neural network inference structure optimization method, including:
  • when the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer:
  • calling a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining the first optimized network layer of the (M+1)th network layer, and calling a second preset algorithm to process the first optimized network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer;
  • or, calling the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining the second optimized network layer of the (M+1)th network layer, and calling the first preset algorithm to process the second optimized network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
  • A fourth aspect of the embodiments of the present application provides a neural network inference structure optimization device, including:
  • a normalization layer confirmation module, used to confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer;
  • a judgment module, used to confirm whether the (N-1)th network layer satisfies a preset condition, wherein the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer, and the output of the (N-1)th network layer is connected only to the Nth network layer;
  • a first merging module, used to call a first preset algorithm to merge the Nth network layer into the (N-1)th network layer when the (N-1)th network layer satisfies the preset condition.
  • A fifth aspect of the embodiments of the present application provides a neural network inference structure optimization device, including:
  • a normalization layer confirmation module, used to confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer;
  • a judgment module, used to confirm whether the (N+1)th network layer is a convolutional layer or a fully connected layer;
  • a first merging module, used to call a second preset algorithm to merge the Nth network layer into the (N+1)th network layer when the (N+1)th network layer is a convolutional layer or a fully connected layer.
  • A sixth aspect of the embodiments of the present application provides a neural network inference structure optimization device, including:
  • a first multi-layer merging module, used to, when the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer, call a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining the first optimized network layer of the (M+1)th network layer, and call a second preset algorithm to process the first optimized network layer of the (M+1)th network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer;
  • or, a second multi-layer merging module, used to, when the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer, call the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining the second optimized network layer of the (M+1)th network layer, and call the first preset algorithm to process the second optimized network layer of the (M+1)th network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
  • A seventh aspect of the embodiments of the present application provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, the method described above is implemented.
  • Through the embodiments of the present application, a normalization layer is identified in the neural network inference structure; when the layer preceding the normalization layer is confirmed to be a convolutional or fully connected layer whose output is connected only to the normalization layer, the first preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer; or, when the layer following the normalization layer is confirmed to be a convolutional or fully connected layer, the second preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer.
  • By merging qualifying normalization layers with their adjacent convolutional or fully connected layers, the neural network inference structure has relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and thereby improves the inference efficiency of the neural network model.
  • FIG. 1 is a schematic flowchart of a method for optimizing a neural network inference structure according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a method for optimizing a neural network inference structure provided by an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a method for optimizing a neural network inference structure provided by an embodiment of the present invention
  • FIG. 4 is a schematic flowchart of a method for optimizing a neural network inference structure provided by an embodiment of the present invention
  • FIG. 5 is a schematic flowchart of a method for optimizing a neural network inference structure provided by an embodiment of the present invention
  • FIG. 6 is a schematic flowchart of a method for optimizing a neural network inference structure provided by an embodiment of the present invention
  • FIG. 7 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a neural network inference structure optimization device provided by an embodiment of the present invention.
  • In general, a neural network inference structure includes several network layers, and these network layers include convolutional layers, normalization layers, fully connected layers, pooling layers, and activation layers.
  • Typically, the batch normalization operation is:
  • y = γ · (x − μ)/√(σ² + ε) + β;  (1)
  • In this scheme, equation (1) is transformed as follows:
  • y = a·x + c, where a = γ/√(σ² + ε) and c = β − γ·μ/√(σ² + ε) are defined as the equivalent coefficients of the BN transform;
  • x and y are the input and output of BN, respectively; γ and β are the scaling and shifting parameters of BN, respectively; μ and σ are the mean and standard deviation of the training samples, respectively, which can be computed as moving averages over batches; and ε is a given small constant.
  • Neural networks also generally perform a preprocessing normalization on the input data, which has likewise become an indispensable step for neural networks. The similar processing is as follows:
  • y = (x − μ)/σ;  (2)
  • In this scheme, equation (2) is transformed in the same way into y = a·x + c, where x and y are the input and output of the preprocessing normalization, respectively; μ and σ are the mean and variance of the training samples, respectively; and a = 1/σ, c = −μ/σ are the equivalent coefficients of the preprocessing normalization transform.
  • Without loss of generality, batch normalization and preprocessing normalization can therefore be unified as a single normalization operation of the form y = a·x + c.
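  • As a concrete illustration of this unification, the following sketch (Python/NumPy; function and variable names are illustrative, not taken from the patent) computes the per-channel equivalent coefficients (a, c) for both cases:

```python
import numpy as np

def bn_equivalent_coeffs(gamma, beta, mu, sigma, eps=1e-5):
    """Equivalent coefficients (a, c) of batch normalization, so that
    y = a*x + c reproduces y = gamma*(x - mu)/sqrt(sigma**2 + eps) + beta."""
    a = gamma / np.sqrt(sigma ** 2 + eps)
    c = beta - a * mu
    return a, c

def preprocess_equivalent_coeffs(mu, sigma):
    """Equivalent coefficients (a, c) of the preprocessing
    normalization y = (x - mu)/sigma."""
    return 1.0 / sigma, -mu / sigma
```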
  • FIG. 1 is a schematic flowchart of a method for optimizing a neural network inference structure according to an embodiment of the present application. As shown in FIG. 1, it may include steps 101-103, as follows:
  • 101. Confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer.
  • 102. Confirm whether the (N-1)th network layer satisfies a preset condition, namely that the (N-1)th network layer is a convolutional layer or a fully connected layer and its output is connected only to the Nth network layer.
  • That the output of the (N-1)th network layer is connected only to the Nth network layer means that the Nth network layer, when normalizing the output of the (N-1)th network layer, is not connected to any other network layer. This guarantees that merging the Nth network layer into the (N-1)th network layer does not affect the parameters of other network layers: if the output of the (N-1)th network layer were connected not only to the Nth network layer but also, in parallel, to a network layer N′, merging the Nth network layer into the (N-1)th network layer would break the equivalence of the parallel network layer N′'s computation before and after the merge.
  • 103. If the (N-1)th network layer satisfies the preset condition, call a first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
  • The first preset algorithm is:
  • W′_{i,j} = a_i · W_{i,j};
  • b′_i = a_i · b_i + c_i;
  • where W_{i,j} and b_i represent the weight coefficient between the jth channel input and the ith channel output of the (N-1)th network layer and the bias coefficient of its ith channel; a_i and c_i represent the equivalent coefficients of the ith channel of the Nth network layer, i.e., the normalization layer; and W′_{i,j} and b′_i represent, respectively, the first weight coefficient between the jth channel input and the ith channel output of the (N-1)th network layer and the first offset parameter of the ith channel of the (N-1)th network layer, obtained by merging the Nth network layer into the (N-1)th network layer according to the first preset algorithm.
  • The first preset algorithm is derived as follows. When the Nth network layer is a normalization layer, let x′_i and y_i denote the ith channel input and output of the Nth network layer; then:
  • y_i = a_i · x′_i + c_i = a_i · (Σ_{j=1..R} W_{i,j} · x_j + b_i) + c_i = Σ_{j=1..R} (a_i · W_{i,j}) · x_j + (a_i · b_i + c_i);
  • where x_j represents the jth channel input of the convolutional or fully connected layer and R is the number of channel inputs of the (N-1)th network layer.
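  • A minimal sketch of this first merge, assuming the (N-1)th layer is stored as a fully connected weight matrix W of shape (out_channels, in_channels) with bias b (for a convolutional layer, a_i would broadcast over the output-channel axis of the kernel in the same way; all names are illustrative):

```python
import numpy as np

def fold_norm_into_preceding_layer(W, b, a, c):
    """First preset algorithm: merge a following normalization layer
    (per-output-channel coefficients a_i, c_i) into the preceding
    convolutional/fully connected layer:
        y_i = a_i*(sum_j W_ij*x_j + b_i) + c_i
            = sum_j (a_i*W_ij)*x_j + (a_i*b_i + c_i)."""
    return a[:, None] * W, a * b + c

# Quick check of operational equivalence on random data:
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
a, c = rng.normal(size=4), rng.normal(size=4)
x = rng.normal(size=3)
Wm, bm = fold_norm_into_preceding_layer(W, b, a, c)
assert np.allclose(a * (W @ x + b) + c, Wm @ x + bm)
```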
  • Through this embodiment of the present application, a normalization layer is identified in the neural network inference structure; when the layer preceding the normalization layer is confirmed to be a convolutional or fully connected layer whose output is connected only to the normalization layer, the first preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. By merging qualifying normalization layers with their adjacent convolutional or fully connected layers, the neural network inference structure has relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and thereby improves the inference efficiency of the neural network model.
  • FIG. 2 is a schematic flowchart of a method for optimizing a neural network inference structure according to an embodiment of the present application. This embodiment further expands the embodiment shown in FIG. 1. As shown in FIG. 2, it may include steps 201-205, as follows:
  • 201. Confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer.
  • 202. Confirm whether the (N-1)th network layer satisfies the preset condition; 203. if it does, call the first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
  • 204. If the (N-1)th network layer does not satisfy the preset condition, confirm whether the (N+1)th network layer is a convolutional layer or a fully connected layer.
  • 205. If the (N+1)th network layer is a convolutional layer or a fully connected layer, call a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
  • When the (N+1)th network layer is a convolutional layer or a fully connected layer, the (N+1)th network layer is processed by calling the second preset algorithm so as to merge the Nth network layer into the (N+1)th network layer, wherein the second preset algorithm is:
  • W′_{i,j} = a_j · W_{i,j};
  • b′_i = b_i + Σ_{j=1..R} c_j · W_{i,j};
  • where a_j and c_j represent the equivalent coefficients of the jth channel of the Nth network layer, i.e., the normalization layer; W_{i,j} and b_i represent, respectively, the weight coefficient between the jth channel input and the ith channel output of the (N+1)th network layer and the bias coefficient of its ith channel; R is the number of channel inputs of the (N+1)th network layer; and W′_{i,j} and b′_i represent, respectively, the second weight coefficient between the jth channel input and the ith channel output of the (N+1)th network layer and the second offset parameter of the ith channel, obtained by merging the Nth network layer into the (N+1)th network layer according to the second preset algorithm.
  • The second preset algorithm is derived as follows. When the Nth network layer is a normalization layer, let x′_j and y_i denote the jth channel input and the ith channel output of the (N+1)th network layer (a convolutional or fully connected layer); then:
  • y_i = Σ_{j=1..R} W_{i,j} · x′_j + b_i = Σ_{j=1..R} W_{i,j} · (a_j · x_j + c_j) + b_i = Σ_{j=1..R} (a_j · W_{i,j}) · x_j + (b_i + Σ_{j=1..R} c_j · W_{i,j});
  • where x_j is the jth channel input of the Nth network layer.
  • After this merging, to guarantee that the neural network inference structure computes equivalently before and after the merge, when the (N+1)th network layer is a convolutional layer with a padding value, the padding value must be modified accordingly: if the padding value of the convolutional layer before conversion is p, the padding input value of the jth channel of the merged convolutional layer is adjusted to (p − c_j)/a_j, where a_j and c_j represent the equivalent coefficients of the jth channel of the normalization layer, i.e., the Nth network layer.
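  • A corresponding sketch of this second merge, under the same illustrative assumptions as above (W of shape (out_channels, in_channels), bias b); the padding rule only applies to convolutional layers, so it appears here as a comment:

```python
import numpy as np

def fold_norm_into_following_layer(W, b, a, c):
    """Second preset algorithm: merge a preceding normalization layer
    (per-input-channel coefficients a_j, c_j) into the following
    convolutional/fully connected layer:
        y_i = sum_j W_ij*(a_j*x_j + c_j) + b_i
            = sum_j (a_j*W_ij)*x_j + (b_i + sum_j c_j*W_ij).
    For a convolutional layer that padded its input with value p, the
    padding input value of channel j must become (p - c_j)/a_j, so that
    a padded position still contributes a_j*pad + c_j == p after the merge."""
    return W * a[None, :], b + W @ c
```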
  • Through this embodiment of the present application, when the layer preceding a normalization layer in the neural network inference structure does not satisfy the preset condition, whether the layer following the normalization layer is a convolutional or fully connected layer is judged; when it is, the second preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. By merging qualifying normalization layers with their adjacent convolutional or fully connected layers, the neural network inference structure has relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and thereby improves the inference efficiency of the neural network model.
  • FIG. 3 is a schematic flowchart of a method for optimizing a neural network inference structure according to an embodiment of the present application. As shown in FIG. 3, it may include steps 301-303, as follows:
  • 301. Confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer.
  • 302. Confirm whether the (N+1)th network layer is a convolutional layer or a fully connected layer.
  • 303. If the (N+1)th network layer is a convolutional layer or a fully connected layer, call a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
  • Further, after this merging, to guarantee that the neural network inference structure computes equivalently before and after the merge, when the (N+1)th network layer is a convolutional layer with a padding value, the padding value must be modified accordingly: if the padding value of the convolutional layer before conversion is p, the padding input value of the jth channel of the merged convolutional layer is adjusted to (p − c_j)/a_j, where a_j and c_j represent the equivalent coefficients of the jth channel of the normalization layer, i.e., the Nth network layer.
  • Through this embodiment of the present application, when the layer following a normalization layer in the neural network inference structure is confirmed to be a convolutional or fully connected layer, the second preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. By merging the normalization layer with its adjacent convolutional or fully connected layer, the neural network inference structure has relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and thereby improves the inference efficiency of the neural network model; at the same time, the padding values of layers that have them are adjusted accordingly, which guarantees the equivalence of the computation before and after the optimization of the neural network inference structure.
  • FIG. 4 is a schematic flowchart of a method for optimizing a neural network inference structure according to an embodiment of the present application. As shown in FIG. 4, it may include steps 401-405, as follows:
  • 401. Confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer.
  • 402. Confirm whether the (N+1)th network layer is a convolutional layer or a fully connected layer.
  • 403. If the (N+1)th network layer is a convolutional layer or a fully connected layer, call a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
  • 404. If the (N+1)th network layer is not a convolutional or fully connected layer, confirm whether the (N-1)th network layer satisfies a preset condition, where the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer.
  • 405. If the (N-1)th network layer satisfies the preset condition, call a first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
  • Through this embodiment of the present application, when the layer following a normalization layer in the neural network inference structure is found not to be a convolutional or fully connected layer, whether the layer preceding it satisfies the preset condition is judged; when the preceding layer satisfies the preset condition, the first preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. Merging qualifying normalization layers with their adjacent convolutional or fully connected layers leaves the neural network inference structure with fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and achieves efficient application; a sketch of this combined pass follows.
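  • One way to picture the combined flow is the following sketch (Python; the Layer records and the fold_* helpers are the illustrative ones from the earlier sketches, not part of the patent). It scans the structure once, trying the following layer first and falling back to the preceding layer, in the order of steps 401-405:

```python
def optimize_inference_structure(layers):
    """Sketch of the combined pass: fold every normalization layer into an
    adjacent convolutional/fully connected layer where the conditions hold.

    `layers` is a list of illustrative records with fields:
      kind   -- 'norm', 'conv', or 'fc'
      a, c   -- equivalent coefficients (normalization layers only)
      W, b   -- parameters (conv/fc layers only)
      fanout -- number of layers consuming this layer's output
    """
    optimized = []
    for i, layer in enumerate(layers):
        if layer.kind != 'norm':
            optimized.append(layer)
            continue
        nxt = layers[i + 1] if i + 1 < len(layers) else None
        prev = optimized[-1] if optimized else None
        if nxt is not None and nxt.kind in ('conv', 'fc'):
            # Steps 402-403: merge the norm layer into the following layer.
            nxt.W, nxt.b = fold_norm_into_following_layer(nxt.W, nxt.b, layer.a, layer.c)
        elif prev is not None and prev.kind in ('conv', 'fc') and prev.fanout == 1:
            # Steps 404-405: the preset condition holds (preceding conv/fc
            # whose output feeds only this norm layer), so merge backwards.
            prev.W, prev.b = fold_norm_into_preceding_layer(prev.W, prev.b, layer.a, layer.c)
        else:
            optimized.append(layer)  # no adjacent mergeable layer; keep as-is
    return optimized
```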
  • FIG. 5 is a schematic flowchart of a method for optimizing a neural network inference structure according to an embodiment of the present application. As shown in FIG. 5, this embodiment is the processing performed when three adjacent network layers in the neural network inference structure are, in order, a normalization layer, a convolutional or fully connected layer, and a normalization layer. It may include steps 501-503, as follows:
  • 501. Confirm that the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, that the (M+1)th network layer is a convolutional layer or a fully connected layer, and that the output of the (M+1)th network layer is connected only to the (M+2)th network layer.
  • 502. Call a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining the first optimized network layer of the (M+1)th network layer.
  • 503. Call a second preset algorithm to process the first optimized network layer of the (M+1)th network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer.
  • Further, after the Mth network layer is merged into the first optimized network layer, when the (M+1)th network layer is a convolutional layer with a padding value, the padding value must be modified accordingly: if the padding value of the convolutional layer before conversion is p, the padding input value of the jth channel of the merged convolutional layer is adjusted to (p − c_j)/a_j, where a_j and c_j represent the equivalent coefficients of the jth channel of the normalization layer, i.e., the Mth network layer.
  • Through this embodiment of the present application, when three adjacent network layers in the neural network inference structure are found to be, in order, a normalization layer, a convolutional or fully connected layer, and a normalization layer, and the output of the convolutional or fully connected layer is connected only to the latter normalization layer, the first preset algorithm is first called to merge the latter normalization layer into the convolutional or fully connected layer, and the second preset algorithm is then called to merge the former normalization layer into the layer obtained from that merge. Merging the three adjacent network layers into a single network layer leaves the neural network inference structure with fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and thereby improves the inference efficiency of the neural network model; a sketch of this collapse follows.
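  • Under the same illustrative assumptions as the earlier sketches, this three-layer collapse is simply the two folds applied in sequence:

```python
def merge_norm_conv_norm(front_norm, conv, back_norm):
    """Collapse [norm, conv/fc, norm] into one layer in the FIG. 5 order:
    first preset algorithm for the trailing norm, then second preset
    algorithm for the leading norm, applied to the already-merged layer."""
    W, b = fold_norm_into_preceding_layer(conv.W, conv.b, back_norm.a, back_norm.c)
    W, b = fold_norm_into_following_layer(W, b, front_norm.a, front_norm.c)
    return W, b
```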
  • FIG. 6 is a schematic flowchart of a method for optimizing a neural network inference structure according to an embodiment of the present application. As shown in FIG. 6, it may include steps 601-603, as follows:
  • 601. Confirm that the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, that the (M+1)th network layer is a convolutional layer or a fully connected layer, and that the output of the (M+1)th network layer is connected only to the (M+2)th network layer.
  • 602. Call the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining the second optimized network layer of the (M+1)th network layer. After this merge, when the (M+1)th network layer is a convolutional layer with a padding value, the padding value must be modified accordingly: if the padding value of the convolutional layer before conversion is p, the padding input value of the jth channel of the merged convolutional layer is adjusted to (p − c_j)/a_j, where a_j and c_j represent the equivalent coefficients of the jth channel of the normalization layer, i.e., the Mth network layer.
  • 603. Call the first preset algorithm to process the second optimized network layer of the (M+1)th network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
  • Through this embodiment of the present application, when three adjacent network layers in the neural network inference structure are found to be, in order, a normalization layer, a convolutional or fully connected layer, and a normalization layer, and the output of the convolutional or fully connected layer is connected only to the latter normalization layer, the second preset algorithm is first called to merge the former normalization layer into the convolutional or fully connected layer, and the first preset algorithm is then called to merge the latter normalization layer into the layer obtained from that merge. Merging the three adjacent network layers into a single network layer leaves the neural network inference structure with fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and thereby improves the inference efficiency of the neural network model.
  • FIG. 7 is a schematic structural diagram of a terminal provided by an embodiment of the present application. As shown in the figure, the terminal includes a processor, an input device, an output device, and a memory, which are connected to one another. The memory is used to store a computer program, and the computer program includes program instructions; the processor is configured to call the program instructions, and the program includes instructions for performing the following steps:
  • confirming that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer;
  • confirming whether the (N-1)th network layer satisfies a preset condition, wherein the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer, and the output of the (N-1)th network layer is connected only to the Nth network layer;
  • if the (N-1)th network layer satisfies the preset condition, calling a first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
  • Or: confirming that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer;
  • confirming whether the (N+1)th network layer is a convolutional layer or a fully connected layer;
  • if the (N+1)th network layer is a convolutional layer or a fully connected layer, calling a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
  • Or: when the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer:
  • calling a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining the first optimized network layer of the (M+1)th network layer, and calling a second preset algorithm to process the first optimized network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer;
  • or, calling the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining the second optimized network layer of the (M+1)th network layer, and calling the first preset algorithm to process the second optimized network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
  • Through the embodiments of the present application, a normalization layer is identified in the neural network inference structure; when the layer preceding the normalization layer is confirmed to be a convolutional or fully connected layer whose output is connected only to the normalization layer, the first preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer; or, when the layer following the normalization layer is confirmed to be a convolutional or fully connected layer, the second preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. Merging qualifying normalization layers with their adjacent convolutional or fully connected layers leaves the neural network inference structure with fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and achieves efficient application.
  • It can be understood that, to implement the above functions, the terminal includes corresponding hardware structures and/or software modules for performing each function.
  • Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments provided herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
  • The embodiments of the present application may divide the terminal into functional units according to the above method examples; for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
  • The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative and is merely a division of logical functions; there may be other division manners in actual implementation.
  • FIG. 8 is a schematic structural diagram of a neural network inference structure optimization device according to an embodiment of the present application.
  • The device includes a normalization layer confirmation module 801, a judgment module 802, and a first merging module 803. Specifically:
  • the normalization layer confirmation module 801 is used to confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer;
  • the judgment module 802 is used to confirm whether the (N-1)th network layer meets a preset condition, wherein the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer, and the output of the (N-1)th network layer is connected only to the Nth network layer;
  • the first merging module 803 is configured to call the first preset algorithm to merge the Nth network layer into the (N-1)th network layer when the (N-1)th network layer meets the preset condition.
  • a neural network inference structure optimization device including:
  • the normalization layer confirmation module is used to confirm that the Nth network layer in the neural network inference structure is the normalization layer, where N is a positive integer;
  • the judgment module is used to confirm whether the N+1th network layer is a convolutional layer or a fully connected layer;
  • the first merging module is used to call the second preset algorithm to merge the Nth network layer into the (N+1)th network layer when the (N+1)th network layer is a convolutional layer or a fully connected layer.
  • a neural network inference structure optimization device including:
  • a first multi-layer merging module, used to, when the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer, call the first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining the first optimized network layer of the (M+1)th network layer, and call the second preset algorithm to process the first optimized network layer of the (M+1)th network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer;
  • or, a second multi-layer merging module, used to, when the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer, call the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining the second optimized network layer of the (M+1)th network layer, and call the first preset algorithm to process the second optimized network layer of the (M+1)th network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
  • Through the embodiments of the present application, a normalization layer is identified in the neural network inference structure; when the layer preceding the normalization layer is confirmed to be a convolutional or fully connected layer whose output is connected only to the normalization layer, the first preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer; or, when the layer following the normalization layer is confirmed to be a convolutional or fully connected layer, the second preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. Merging qualifying normalization layers with their adjacent convolutional or fully connected layers leaves the neural network inference structure with fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and achieves efficient application.
  • An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps of any neural network inference structure optimization method described in the above method embodiments.
  • An embodiment of the present application further provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program causes a computer to perform some or all of the steps of any neural network inference structure optimization method described in the above method embodiments.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • The division of the units is merely a division of logical functions; there may be other division manners in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or software program modules.
  • If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory.
  • Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, and the computer software product is stored in a memory.
  • Several instructions are included to enable a computer device (which may be a personal computer, server, network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The foregoing memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
  • Those of ordinary skill in the art can understand that all or some of the steps in the various methods of the above embodiments may be completed by a program instructing related hardware; the program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Image Analysis (AREA)

Abstract

A neural network inference structure optimization method, comprising: when the Mth network layer and the (M+2)th network layer of a neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer (501), calling a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining a first optimized network layer of the (M+1)th network layer (502); and calling a second preset algorithm to process the first optimized network layer of the (M+1)th network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer (503). The method can minimize the computation and processing delay in neural network inference, so as to achieve the purpose of improving the inference efficiency of neural network models.

Description

Neural network inference structure optimization method and device
This application claims priority to the Chinese patent application No. 201811612053.3, entitled "Neural network inference structure optimization method and device" (一种神经网络推理结构优化方法及装置), filed with the Chinese Patent Office on December 27, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of machine learning technology, and in particular to a neural network inference structure optimization method and device.
Background
Batch normalization (BN) is an algorithm devised to overcome the difficulty of training caused by increasing neural network depth. It is a step that reduces internal covariate shift; it can lessen the dependence of gradients on the parameters or on the scale of their initial values, and has a beneficial effect on gradient flow through the network. In addition, to accelerate the gradient-descent convergence of neural network training, a normalization operation is generally also applied to the input data.
Because BN or normalization processing is added, the neural network inference structure increases in both processing depth and computation compared with before, and the processing delay grows, which adversely affects the deployment efficiency of neural network model inference.
Summary
Embodiments of this application provide a neural network inference structure optimization method and device, which can reduce, to a certain extent, the computation and processing delay in neural network inference, so as to achieve the purpose of improving the inference efficiency of neural network models.
A first aspect of the embodiments of this application provides a neural network inference structure optimization method, including:
confirming that the Nth network layer in a neural network inference structure is a normalization layer, where N is a positive integer;
confirming whether the (N-1)th network layer satisfies a preset condition, where the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer;
if the (N-1)th network layer satisfies the preset condition, calling a first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
A second aspect of the embodiments of this application provides a neural network inference structure optimization method, including:
confirming that the Nth network layer in a neural network inference structure is a normalization layer, where N is a positive integer;
confirming whether the (N+1)th network layer is a convolutional layer or a fully connected layer;
if the (N+1)th network layer is a convolutional layer or a fully connected layer, calling a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
A third aspect of the embodiments of this application provides a neural network inference structure optimization method, including:
when the Mth network layer and the (M+2)th network layer of a neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer:
calling a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining a first optimized network layer of the (M+1)th network layer;
calling a second preset algorithm to process the first optimized network layer of the (M+1)th network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer;
or, calling the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining a second optimized network layer of the (M+1)th network layer;
calling the first preset algorithm to process the second optimized network layer of the (M+1)th network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
A fourth aspect of the embodiments of this application provides a neural network inference structure optimization device, including:
a normalization layer confirmation module, used to confirm that the Nth network layer in a neural network inference structure is a normalization layer, where N is a positive integer;
a judgment module, used to confirm whether the (N-1)th network layer satisfies a preset condition, where the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer;
a first merging module, used to call a first preset algorithm to merge the Nth network layer into the (N-1)th network layer when the (N-1)th network layer satisfies the preset condition.
A fifth aspect of the embodiments of this application provides a neural network inference structure optimization device, including:
a normalization layer confirmation module, used to confirm that the Nth network layer in a neural network inference structure is a normalization layer, where N is a positive integer;
a judgment module, used to confirm whether the (N+1)th network layer is a convolutional layer or a fully connected layer;
a first merging module, used to call a second preset algorithm to merge the Nth network layer into the (N+1)th network layer when the (N+1)th network layer is a convolutional layer or a fully connected layer.
A sixth aspect of the embodiments of this application provides a neural network inference structure optimization device, including:
a first multi-layer merging module, used to, when the Mth network layer and the (M+2)th network layer of a neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer, call a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining a first optimized network layer of the (M+1)th network layer, and call a second preset algorithm to process the first optimized network layer of the (M+1)th network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer;
or, including a second multi-layer merging module, used to, when the Mth network layer and the (M+2)th network layer of a neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer, call the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining a second optimized network layer of the (M+1)th network layer, and call the first preset algorithm to process the second optimized network layer of the (M+1)th network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
A seventh aspect of the embodiments of this application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the method described above is implemented.
Implementing the embodiments of this application yields at least the following beneficial effects:
Through the embodiments of this application, a normalization layer is identified in the neural network inference structure; when the layer preceding the normalization layer is confirmed to be a convolutional or fully connected layer whose output is connected only to the normalization layer, a first preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer; or, when the layer following the normalization layer is confirmed to be a convolutional or fully connected layer, a second preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. Merging qualifying normalization layers with their adjacent convolutional or fully connected layers leaves the neural network inference structure with fewer layers and a shallower depth, which reduces the computation and processing delay in neural network inference and thus improves the inference efficiency of the neural network model.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings described below are merely some embodiments of this application, and those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a neural network inference structure optimization device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish different objects rather than to describe a particular order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
In general, a neural network inference structure includes several network layers, such as convolutional layers, normalization layers, fully connected layers, pooling layers, and activation layers.
The usual batch normalization operation is as follows:
y = γ · (x − μ)/√(σ² + ε) + β;  (1)
Preferably, in this scheme, equation (1) is transformed as follows:
y = a·x + c;
where
a = γ/√(σ² + ε), c = β − γ·μ/√(σ² + ε)
are defined as the equivalent coefficients of the BN transform.
Here, x and y are the input and output of BN, respectively; γ and β are the scaling and shifting parameters of BN, respectively; μ and σ are the mean and standard deviation of the training samples, respectively, which can be computed as moving averages over batches; and ε is a given small constant.
A neural network generally also performs a preprocessing normalization on the input data, which has likewise become an indispensable processing step for neural networks. The similar processing is as follows:
y = (x − μ)/σ;  (2)
In this scheme, equation (2) is transformed as follows:
y = a·x + c;
where x and y are the input and output of the preprocessing normalization, respectively; μ and σ are the mean and variance of the training samples, respectively; and a = 1/σ, c = −μ/σ are the equivalent coefficients of the preprocessing normalization transform.
Without loss of generality, batch normalization and preprocessing normalization can be unified as normalization processing.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of this application. As shown in FIG. 1, the method may include steps 101-103, as follows:
101. Confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer.
102. Confirm whether the (N-1)th network layer satisfies a preset condition, where the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer.
That the output of the (N-1)th network layer is connected only to the Nth network layer means that the Nth network layer, when normalizing the output of the (N-1)th network layer, is not connected to any other network layer. This guarantees that merging the Nth network layer into the (N-1)th network layer does not affect the parameters of other network layers: if the output of the (N-1)th network layer were connected not only to the Nth network layer but also, in parallel, to a network layer N′, merging the Nth network layer into the (N-1)th network layer would break the equivalence of the parallel network layer N′'s computation before and after the merge.
103. If the (N-1)th network layer satisfies the preset condition, call a first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
The first preset algorithm is:
W′_{i,j} = a_i · W_{i,j};
b′_i = a_i · b_i + c_i;
where W_{i,j} and b_i represent the weight coefficient between the jth channel input and the ith channel output of the (N-1)th network layer and the bias coefficient of its ith channel; a_i and c_i represent the equivalent coefficients of the ith channel of the Nth network layer, i.e., the normalization layer; and W′_{i,j} and b′_i represent, respectively, the first weight coefficient between the jth channel input and the ith channel output of the (N-1)th network layer and the first offset parameter of the ith channel of the (N-1)th network layer, obtained by merging the Nth network layer into the (N-1)th network layer according to the first preset algorithm.
The first preset algorithm is derived as follows. When the Nth network layer is a normalization layer, let x′_i and y_i denote the ith channel input and output of the Nth network layer; then:
y_i = a_i · x′_i + c_i = a_i · (Σ_{j=1..R} W_{i,j} · x_j + b_i) + c_i = Σ_{j=1..R} (a_i · W_{i,j}) · x_j + (a_i · b_i + c_i);
where x_j represents the jth channel input of the convolutional or fully connected layer and R is the number of channel inputs of the (N-1)th network layer.
Through this embodiment of this application, a normalization layer is identified in the neural network inference structure; when the layer preceding the normalization layer is confirmed to be a convolutional or fully connected layer whose output is connected only to the normalization layer, the first preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. Merging qualifying normalization layers with their adjacent convolutional or fully connected layers leaves the neural network inference structure with relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and thus improves the inference efficiency of the neural network model.
FIG. 2 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of this application. This embodiment further expands the embodiment shown in FIG. 1. As shown in FIG. 2, the method may include steps 201-205, as follows:
201. Confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer.
202. Confirm whether the (N-1)th network layer satisfies a preset condition, where the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer.
203. If the (N-1)th network layer satisfies the preset condition, call a first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
204. If the (N-1)th network layer does not satisfy the preset condition, confirm whether the (N+1)th network layer is a convolutional layer or a fully connected layer.
205. If the (N+1)th network layer is a convolutional layer or a fully connected layer, call a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
When the (N+1)th network layer is a convolutional layer or a fully connected layer, the (N+1)th network layer is processed by calling the second preset algorithm so as to merge the Nth network layer into the (N+1)th network layer, where the second preset algorithm is:
W′_{i,j} = a_j · W_{i,j};
b′_i = b_i + Σ_{j=1..R} c_j · W_{i,j};
where a_j and c_j represent the equivalent coefficients of the jth channel of the Nth network layer, i.e., the normalization layer; W_{i,j} and b_i represent, respectively, the weight coefficient between the jth channel input and the ith channel output of the (N+1)th network layer and the bias coefficient of its ith channel; R is the number of channel inputs of the (N+1)th network layer; and W′_{i,j} and b′_i represent, respectively, the second weight coefficient between the jth channel input and the ith channel output of the (N+1)th network layer and the second offset parameter of the ith channel, obtained by merging the Nth network layer into the (N+1)th network layer according to the second preset algorithm.
Specifically, the second preset algorithm is derived as follows. When the Nth network layer is a normalization layer, let x′_j and y_i denote the jth channel input and the ith channel output of the (N+1)th network layer (a convolutional or fully connected layer); then:
y_i = Σ_{j=1..R} W_{i,j} · x′_j + b_i = Σ_{j=1..R} W_{i,j} · (a_j · x_j + c_j) + b_i = Σ_{j=1..R} (a_j · W_{i,j}) · x_j + (b_i + Σ_{j=1..R} c_j · W_{i,j});
where x_j is the jth channel input of the Nth network layer.
After this merging, to guarantee that the neural network inference structure computes equivalently before and after the merge, when the (N+1)th network layer is a convolutional layer with a padding value, the padding value must be modified accordingly: if the padding value of the convolutional layer before conversion is p, the padding input value of the jth channel of the merged convolutional layer is adjusted to (p − c_j)/a_j, where a_j and c_j represent the equivalent coefficients of the jth channel of the normalization layer, i.e., the Nth network layer.
Through this embodiment of this application, a normalization layer is identified in the neural network inference structure; when the layer preceding it does not satisfy the preset condition, whether the layer following it is a convolutional or fully connected layer is judged; when it is, the second preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. Merging qualifying normalization layers with their adjacent convolutional or fully connected layers leaves the neural network inference structure with relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and thus improves the inference efficiency of the neural network model.
FIG. 3 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of this application. As shown in FIG. 3, the method may include steps 301-303, as follows:
301. Confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer.
302. Confirm whether the (N+1)th network layer is a convolutional layer or a fully connected layer.
303. If the (N+1)th network layer is a convolutional layer or a fully connected layer, call a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
Further, after this merging, to guarantee that the neural network inference structure computes equivalently before and after the merge, when the (N+1)th network layer is a convolutional layer with a padding value, the padding value must be modified accordingly: if the padding value of the convolutional layer before conversion is p, the padding input value of the jth channel of the merged convolutional layer is adjusted to (p − c_j)/a_j, where a_j and c_j represent the equivalent coefficients of the jth channel of the normalization layer, i.e., the Nth network layer.
Through this embodiment of this application, a normalization layer is identified in the neural network inference structure; when the layer following it is confirmed to be a convolutional or fully connected layer, the second preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. Merging the normalization layer with its adjacent convolutional or fully connected layer leaves the neural network inference structure with relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and thus improves the inference efficiency of the neural network model; at the same time, the padding values of layers that have them are adjusted accordingly, which guarantees the equivalence of the computation before and after the optimization of the neural network inference structure.
FIG. 4 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of this application. As shown in FIG. 4, the method may include steps 401-405, as follows:
401. Confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer.
402. Confirm whether the (N+1)th network layer is a convolutional layer or a fully connected layer.
403. If the (N+1)th network layer is a convolutional layer or a fully connected layer, call a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
404. If the (N+1)th network layer is not a convolutional or fully connected layer, confirm whether the (N-1)th network layer satisfies a preset condition, where the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer.
405. If the (N-1)th network layer satisfies the preset condition, call a first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
Through this embodiment of this application, a normalization layer is identified in the neural network inference structure; when the layer following it is confirmed not to be a convolutional or fully connected layer, whether the layer preceding it satisfies the preset condition is judged; when the preceding layer satisfies the preset condition, the first preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. Merging qualifying normalization layers with their adjacent convolutional or fully connected layers leaves the neural network inference structure with relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and achieves efficient application.
FIG. 5 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of this application. As shown in FIG. 5, this embodiment is the processing performed when three adjacent network layers in the neural network inference structure are, in order, a normalization layer, a convolutional or fully connected layer, and a normalization layer. It may include steps 501-503, as follows:
501. Confirm that the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, that the (M+1)th network layer is a convolutional layer or a fully connected layer, and that the output of the (M+1)th network layer is connected only to the (M+2)th network layer.
502. Call a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining the first optimized network layer of the (M+1)th network layer.
503. Call a second preset algorithm to process the first optimized network layer of the (M+1)th network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer.
That is, when the Mth network layer and the (M+2)th network layer are both normalization layers, the (M+1)th network layer is a convolutional or fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer, the first preset algorithm is first called to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer; the second preset algorithm is then called to process the first optimized network layer of the (M+1)th network layer obtained from that merge, so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer.
Further, after the Mth network layer is merged into the first optimized network layer of the (M+1)th network layer, when the (M+1)th network layer is a convolutional layer with a padding value, the padding value must be modified accordingly: if the padding value of the convolutional layer before conversion is p, the padding input value of the jth channel of the merged convolutional layer is adjusted to (p − c_j)/a_j, where a_j and c_j represent the equivalent coefficients of the jth channel of the normalization layer, i.e., the Mth network layer.
Through this embodiment of this application, when three adjacent network layers in the neural network inference structure are found to be, in order, a normalization layer, a convolutional or fully connected layer, and a normalization layer, and the output of the convolutional or fully connected layer is connected only to the latter normalization layer, the first preset algorithm is first called to merge the latter normalization layer into the convolutional or fully connected layer, and the second preset algorithm is then called to merge the former normalization layer into the layer obtained from that merge. Merging the three adjacent network layers into a single network layer leaves the neural network inference structure with relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and thus improves the inference efficiency of the neural network model.
FIG. 6 is a schematic flowchart of a neural network inference structure optimization method according to an embodiment of this application. As shown in FIG. 6, the method may include steps 601-603, as follows:
601. Confirm that the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, that the (M+1)th network layer is a convolutional layer or a fully connected layer, and that the output of the (M+1)th network layer is connected only to the (M+2)th network layer.
602. Call the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining the second optimized network layer of the (M+1)th network layer.
After the Mth network layer is merged into the (M+1)th network layer, when the (M+1)th network layer is a convolutional layer with a padding value, the padding value must be modified accordingly: if the padding value of the convolutional layer before conversion is p, the padding input value of the jth channel of the merged convolutional layer is adjusted to (p − c_j)/a_j, where a_j and c_j represent the equivalent coefficients of the jth channel of the normalization layer, i.e., the Mth network layer.
603. Call the first preset algorithm to process the second optimized network layer of the (M+1)th network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
Through this embodiment of this application, when three adjacent network layers in the neural network inference structure are found to be, in order, a normalization layer, a convolutional or fully connected layer, and a normalization layer, and the output of the convolutional or fully connected layer is connected only to the latter normalization layer, the second preset algorithm is first called to merge the former normalization layer into the convolutional or fully connected layer, and the first preset algorithm is then called to merge the latter normalization layer into the layer obtained from that merge. Merging the three adjacent network layers into a single network layer leaves the neural network inference structure with relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and thus improves the inference efficiency of the neural network model.
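A quick numerical check (Python/NumPy; a sketch with illustrative names, not part of the patent text) confirms that the two merge orders of FIG. 5 and FIG. 6 produce operationally equivalent layers:

```python
import numpy as np

rng = np.random.default_rng(1)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
a_f, c_f = rng.normal(size=3), rng.normal(size=3)  # leading norm (the Mth layer)
a_b, c_b = rng.normal(size=4), rng.normal(size=4)  # trailing norm (the (M+2)th layer)
x = rng.normal(size=3)

# Reference: norm -> fully connected -> norm, applied explicitly.
ref = a_b * (W @ (a_f * x + c_f) + b) + c_b

# FIG. 5 order: fold the trailing norm first, then the leading norm.
W5 = (a_b[:, None] * W) * a_f[None, :]
b5 = (a_b * b + c_b) + (a_b[:, None] * W) @ c_f

# FIG. 6 order: fold the leading norm first, then the trailing norm.
W6 = a_b[:, None] * (W * a_f[None, :])
b6 = a_b * (b + W @ c_f) + c_b

assert np.allclose(ref, W5 @ x + b5)
assert np.allclose(ref, W6 @ x + b6)
```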
Consistent with the above embodiments, referring to FIG. 7, FIG. 7 is a schematic structural diagram of a terminal according to an embodiment of this application. As shown in the figure, the terminal includes a processor, an input device, an output device, and a memory, which are connected to one another. The memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions; the program includes instructions for performing the following steps:
confirming that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer;
confirming whether the (N-1)th network layer satisfies a preset condition, where the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer;
if the (N-1)th network layer satisfies the preset condition, calling a first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
Or: confirming that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer;
confirming whether the (N+1)th network layer is a convolutional layer or a fully connected layer;
if the (N+1)th network layer is a convolutional layer or a fully connected layer, calling a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
Or: when the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer:
calling a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining the first optimized network layer of the (M+1)th network layer;
calling a second preset algorithm to process the first optimized network layer of the (M+1)th network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer;
or, calling the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining the second optimized network layer of the (M+1)th network layer;
calling the first preset algorithm to process the second optimized network layer of the (M+1)th network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
Through this embodiment of this application, a normalization layer is identified in the neural network inference structure; when the layer preceding the normalization layer is confirmed to be a convolutional or fully connected layer whose output is connected only to the normalization layer, the first preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer; or, when the layer following the normalization layer is confirmed to be a convolutional or fully connected layer, the second preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. Merging qualifying normalization layers with their adjacent convolutional or fully connected layers leaves the neural network inference structure with relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and achieves efficient application.
The solutions of the embodiments of this application have been introduced above mainly from the perspective of the method-side execution process. It can be understood that, to implement the above functions, the terminal includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments provided herein, this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
The embodiments of this application may divide the terminal into functional units according to the above method examples; for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of this application is illustrative and is merely a division of logical functions; there may be other division manners in actual implementation.
Consistent with the above, referring to FIG. 8, FIG. 8 is a schematic structural diagram of a neural network inference structure optimization device according to an embodiment of this application. The device includes a normalization layer confirmation module 801, a judgment module 802, and a first merging module 803. Specifically:
the normalization layer confirmation module 801 is used to confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer;
the judgment module 802 is used to confirm whether the (N-1)th network layer satisfies a preset condition, where the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer;
the first merging module 803 is used to call a first preset algorithm to merge the Nth network layer into the (N-1)th network layer when the (N-1)th network layer satisfies the preset condition.
As another embodiment, a neural network inference structure optimization device is further provided, including:
a normalization layer confirmation module, used to confirm that the Nth network layer in the neural network inference structure is a normalization layer, where N is a positive integer;
a judgment module, used to confirm whether the (N+1)th network layer is a convolutional layer or a fully connected layer;
a first merging module, used to call a second preset algorithm to merge the Nth network layer into the (N+1)th network layer when the (N+1)th network layer is a convolutional layer or a fully connected layer.
As another embodiment, a neural network inference structure optimization device is further provided, including:
a first multi-layer merging module, used to, when the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer, call a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining the first optimized network layer of the (M+1)th network layer, and call a second preset algorithm to process the first optimized network layer of the (M+1)th network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer;
or, including a second multi-layer merging module, used to, when the Mth network layer and the (M+2)th network layer of the neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer, call the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining the second optimized network layer of the (M+1)th network layer, and call the first preset algorithm to process the second optimized network layer of the (M+1)th network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
Through the embodiments of this application, a normalization layer is identified in the neural network inference structure; when the layer preceding the normalization layer is confirmed to be a convolutional or fully connected layer whose output is connected only to the normalization layer, the first preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer; or, when the layer following the normalization layer is confirmed to be a convolutional or fully connected layer, the second preset algorithm is called to merge the normalization layer into that convolutional or fully connected layer. Merging qualifying normalization layers with their adjacent convolutional or fully connected layers leaves the neural network inference structure with relatively fewer layers and a shallower depth, which relatively reduces the computation and processing delay in neural network inference and achieves efficient application.
An embodiment of this application further provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps of any neural network inference structure optimization method described in the above method embodiments.
An embodiment of this application further provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program causes a computer to perform some or all of the steps of any neural network inference structure optimization method described in the above method embodiments.
It should be noted that, for brevity, the foregoing method embodiments are all described as a series of action combinations; however, those skilled in the art should understand that this application is not limited by the described order of actions, because according to this application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of the units is merely a division of logical functions, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that all or some of the steps in the various methods of the above embodiments may be completed by a program instructing related hardware; the program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like.
The embodiments of this application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of this application; the descriptions of the above embodiments are only intended to help understand the method of this application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of this application. In summary, the content of this specification should not be construed as limiting this application.

Claims (10)

  1. A neural network inference structure optimization method, characterized by comprising:
    confirming that the Nth network layer in a neural network inference structure is a normalization layer, wherein N is a positive integer;
    confirming whether the (N-1)th network layer satisfies a preset condition, wherein the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer;
    if the (N-1)th network layer satisfies the preset condition, calling a first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
  2. The method according to claim 1, characterized by further comprising:
    if the (N-1)th network layer does not satisfy the preset condition, confirming whether the (N+1)th network layer is a convolutional layer or a fully connected layer;
    if the (N+1)th network layer is a convolutional layer or a fully connected layer, calling a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
  3. A neural network inference structure optimization method, characterized by comprising:
    confirming that the Nth network layer in a neural network inference structure is a normalization layer, wherein N is a positive integer;
    confirming whether the (N+1)th network layer is a convolutional layer or a fully connected layer;
    if the (N+1)th network layer is a convolutional layer or a fully connected layer, calling a second preset algorithm to process the (N+1)th network layer so as to merge the Nth network layer into the (N+1)th network layer.
  4. The method according to claim 3, characterized by further comprising:
    if the (N+1)th network layer is not a convolutional layer or a fully connected layer, confirming whether the (N-1)th network layer satisfies a preset condition, wherein the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer;
    if the (N-1)th network layer satisfies the preset condition, calling a first preset algorithm to process the (N-1)th network layer so as to merge the Nth network layer into the (N-1)th network layer.
  5. A neural network inference structure optimization method, characterized by comprising:
    when the Mth network layer and the (M+2)th network layer of a neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer:
    calling a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining a first optimized network layer of the (M+1)th network layer;
    calling a second preset algorithm to process the first optimized network layer of the (M+1)th network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer;
    or, calling the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining a second optimized network layer of the (M+1)th network layer;
    calling the first preset algorithm to process the second optimized network layer of the (M+1)th network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
  6. A neural network inference structure optimization device, characterized by comprising:
    a normalization layer confirmation module, used to confirm that the Nth network layer in a neural network inference structure is a normalization layer, wherein N is a positive integer;
    a judgment module, used to confirm whether the (N-1)th network layer satisfies a preset condition, wherein the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer;
    a first merging module, used to call a first preset algorithm to merge the Nth network layer into the (N-1)th network layer when the (N-1)th network layer satisfies the preset condition.
  7. The device according to claim 6, characterized by further comprising a second merging module, used to:
    when the (N-1)th network layer does not satisfy the preset condition and the (N+1)th network layer is confirmed to be a convolutional layer or a fully connected layer, call a second preset algorithm to merge the Nth network layer into the (N+1)th network layer.
  8. A neural network inference structure optimization device, characterized by comprising:
    a normalization layer confirmation module, used to confirm that the Nth network layer in a neural network inference structure is a normalization layer, wherein N is a positive integer;
    a judgment module, used to confirm whether the (N+1)th network layer is a convolutional layer or a fully connected layer;
    a first merging module, used to call a second preset algorithm to merge the Nth network layer into the (N+1)th network layer when the (N+1)th network layer is a convolutional layer or a fully connected layer.
  9. The device according to claim 8, characterized by further comprising a second merging module, used to:
    when the (N+1)th network layer is not a convolutional layer or a fully connected layer and the (N-1)th network layer is confirmed to satisfy a preset condition, wherein the preset condition is that the (N-1)th network layer is a convolutional layer or a fully connected layer and the output of the (N-1)th network layer is connected only to the Nth network layer, call a first preset algorithm to merge the Nth network layer into the (N-1)th network layer.
  10. A neural network inference structure optimization device, characterized by comprising:
    a first multi-layer merging module, used to, when the Mth network layer and the (M+2)th network layer of a neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer, call a first preset algorithm to process the (M+1)th network layer so as to merge the (M+2)th network layer into the (M+1)th network layer, obtaining a first optimized network layer of the (M+1)th network layer, and call a second preset algorithm to process the first optimized network layer of the (M+1)th network layer so as to merge the Mth network layer into the first optimized network layer of the (M+1)th network layer;
    or, comprising a second multi-layer merging module, used to, when the Mth network layer and the (M+2)th network layer of a neural network inference structure are both normalization layers, the (M+1)th network layer is a convolutional layer or a fully connected layer, and the output of the (M+1)th network layer is connected only to the (M+2)th network layer, call the second preset algorithm to process the (M+1)th network layer so as to merge the Mth network layer into the (M+1)th network layer, obtaining a second optimized network layer of the (M+1)th network layer, and call the first preset algorithm to process the second optimized network layer of the (M+1)th network layer so as to merge the (M+2)th network layer into the second optimized network layer of the (M+1)th network layer.
PCT/CN2019/121520 2018-12-27 2019-11-28 Neural network inference structure optimization method and device WO2020134828A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811612053.3A 2018-12-27 2018-12-27 Neural network inference structure optimization method and device
CN201811612053.3 2018-12-27

Publications (1)

Publication Number Publication Date
WO2020134828A1 (zh)

Family

ID=66078360

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121520 WO2020134828A1 (zh) 2018-12-27 2019-11-28 一种神经网络推理结构优化方法及装置

Country Status (2)

Country Link
CN (1) CN109635934A (zh)
WO (1) WO2020134828A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635934A (zh) 2018-12-27 2019-04-16 深圳云天励飞技术有限公司 Neural network inference structure optimization method and device
CN111582433B (zh) * 2020-04-30 2022-07-15 清华大学 Hardware-friendly automatic neural network architecture search method and device
CN112862100B (zh) * 2021-01-29 2022-02-08 网易有道信息技术(北京)有限公司 Method and device for optimizing neural network model inference
CN115841590B (zh) * 2022-11-16 2023-10-03 中国烟草总公司湖南省公司 Neural network inference optimization method, device, equipment, and readable storage medium
CN115906941B (zh) * 2022-11-16 2023-10-03 中国烟草总公司湖南省公司 Neural network adaptive exit method, device, equipment, and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009634A (zh) * 2017-12-21 2018-05-08 美的集团股份有限公司 Convolutional neural network optimization method and device, and computer storage medium
CN108304921A (zh) * 2018-02-09 2018-07-20 北京市商汤科技开发有限公司 Convolutional neural network training method, and image processing method and device
CN108537326A (zh) * 2017-03-06 2018-09-14 百度(美国)有限责任公司 Method, medium, and system for autonomous driving vehicles
CN109034371A (zh) * 2018-06-27 2018-12-18 北京文安智能技术股份有限公司 Method, device, and system for accelerating deep learning model inference
CN109635934A (zh) * 2018-12-27 2019-04-16 深圳云天励飞技术有限公司 Neural network inference structure optimization method and device


Also Published As

Publication number Publication date
CN109635934A (zh) 2019-04-16

Similar Documents

Publication Publication Date Title
WO2020134828A1 (zh) Neural network inference structure optimization method and device
CN112561078B Distributed model training method and related device
TWI785227B Method for pruning batch normalization layers in a deep neural network
CN108764317B Residual convolutional neural network image classification method based on multi-path feature weighting
EP4131020A1 Data processing method and device
CN108319599A Human-machine dialogue method and device
WO2019154411A1 Word vector updating method and device
TW201807621A Artificial neural network, artificial neuron, and control method of artificial neuron
CN106991999B Speech recognition method and device
WO2022111002A1 Method, device, and computer-readable storage medium for training a neural network
WO2018113790A1 Device and method for artificial neural network operations
JP3323894B2 Neural network learning method and device
CN111353534B Graph data category prediction method based on adaptive fractional-order gradients
CN109102067B Method for automatically adding and removing neural network nodes, computer device, and storage medium
WO2024012476A1 Model training method and related device
WO2024060839A9 Object operation method and device, computer device, and computer storage medium
WO2023231887A1 Tensor-based continual learning method and device
WO2023019996A1 Image feature fusion method and device, electronic device, and storage medium
WO2021244203A1 Parameter optimization method, electronic device, and storage medium
WO2020041934A1 Data processing device and data processing method
WO2019200548A1 Network model compiler and related products
TWI732467B Method of training a sparsely connected neural network
CN113554104B Image classification method based on a deep learning model
CN114926322A Image generation method and device, electronic device, and storage medium
TW202301130A Deep learning network device, memory access method therefor, and non-volatile storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19901576

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19901576

Country of ref document: EP

Kind code of ref document: A1