CN113762479A - Neural network optimization method and device - Google Patents
Classifications
- G06N3/045 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N5/04 — Computing arrangements using knowledge-based models; inference or reasoning models
Abstract
The invention discloses a neural network optimization method and device. The method comprises the following steps: performing model training based on a multi-branch fusible residual structure and extracting the trained model parameters; converting the trained fusible residual structure into a single-branch residual structure by using a fusion operator; and deploying the single-branch residual structure to a target device and executing the inference step of the target task. By designing a fusible residual module and structurally replacing the original residual module, the invention exploits the advantages of both multi-branch and single-branch structures, improves memory efficiency and parallelism when the deployed network runs, reduces network resource consumption, and accelerates network inference. Parameters are compressed by re-parameterization, which mitigates the accuracy loss caused by pruning parameters and connections.
Description
Technical Field
The embodiment of the invention relates to the technical field of neural networks, in particular to a neural network optimization method and device.
Background
In recent years, deep learning has developed rapidly and achieved excellent performance on many tasks, so it is increasingly applied in everyday life and industry. Deployment of deep neural network models is currently divided into online and offline modes. Offline deployment is used in most practical industrial production environments: it processes data locally without passing through a network, so security and real-time performance can be guaranteed. However, for embedded end-side devices with limited computing resources, the massive computational demands of deep neural networks are unacceptable; for battery-powered embedded mobile devices, heavy computation also quickly drains the limited battery capacity.
Conventional approaches to the deployment dilemma of deep neural networks on embedded devices have hit a bottleneck: simply increasing the DRAM capacity and CPU capability of embedded equipment cannot keep pace with the growth of neural networks. Moreover, many industrial scenarios impose strict volume and power-consumption limits on embedded devices, which poses a huge challenge to deploying neural networks on them. To satisfy the memory and power-consumption constraints of embedded devices, a feasible deployment scheme for limited embedded hardware resources was born: neural network model compression.
However, conventional neural network model compression methods prune redundant connections and parameters from the trained model to reduce the parameter count. Because these methods do not change the overall architecture of the network but merely cut off redundant connections and parameters, the model loses part of its accuracy. In addition, a traditional neural network architecture cannot simultaneously exploit the advantages of multi-branch and single-branch structures, so inference efficiency is low.
Disclosure of Invention
The invention provides a neural network optimization method and device that effectively reduce model parameters and improve the inference efficiency of a neural network.
In a first aspect, an embodiment of the present invention provides a neural network optimization method, including:
model training is carried out based on a multi-branch fusible residual structure, and trained model parameters are extracted;
performing structure conversion on the trained fusible residual structure by using a fusion operator to obtain a single-branch residual structure;
and deploying the single-branch residual structure to a target device and executing the inference step of the target task.
Optionally, the fusible residual structure is obtained from the residual structure by removing the ReLU layer between two consecutive convolution kernels.
Optionally, the convolution kernel structure in the fusible residual structure includes: a 1 by 1 convolution kernel, a 3 by 3 convolution kernel following the 1 by 1 convolution kernel, and a 1 by 1 convolution kernel following the 3 by 3 convolution kernel.
Optionally, performing structure transformation on the trained fusible residual structure by using a fusion operator, including:
traversing all fusible residual structures in the neural network;
and substituting the convolution kernel output in the fusible residual structure into the formula of the batch normalization layer to obtain a convolution kernel fused with the batch normalization layer.
Optionally, performing structure transformation on the trained fusible residual structure by using a fusion operator, including:
each convolution kernel in the fusible residual structure takes the output of the previous convolution kernel layer as input and feeds its own output to the next convolution kernel, so as to merge consecutive convolution kernels.
Optionally, performing structure transformation on the trained fusible residual structure by using a fusion operator, including:
for a fusible residual structure with downsampling, expanding the 1 by 1 convolution kernel on the direct connection into a 3 by 3 convolution kernel, and adding the expanded kernel, whose only nonzero entry is the center point, to the 3 by 3 convolution kernel to complete horizontal merging.
In a second aspect, an embodiment of the present invention further provides a neural network optimization apparatus, including:
the training module is used for carrying out model training based on the multi-branch fusible residual structure and extracting model parameters after training;
the fusion module is used for performing structure conversion on the trained fusible residual structure by using a fusion operator to obtain a single-branch residual structure;
and the deployment inference module is used for deploying the single-branch residual structure to target equipment and executing inference steps of a target task.
Aiming at the memory-inefficient, low-parallelism structure of multi-branch networks, the invention provides a fusible residual module. Using a re-parameterization technique for ResNet-like networks, the residual module is structurally replaced by the fusible residual module, and the residual structure is fused into a single convolution at deployment time. This avoids the extra memory consumption of a multi-branch structure, reduces network depth, improves the memory efficiency and parallelism of the deployed network, saves network resources, and accelerates network inference. In addition, several equivalent convolution structures and anisotropic convolution structures are provided to enhance the performance of the fusible residual module.
Drawings
Fig. 1 is a flowchart of a neural network optimization method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a fusible residual structure according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an equivalent expansion of a 1 by 1 convolution kernel according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a neural network optimization device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Examples
Fig. 1 is a flowchart of a neural network optimization method provided in an embodiment of the present invention, which specifically includes the following steps:
s110, model training is carried out based on the multi-branch fusible residual structure, and trained model parameters are extracted.
Referring to fig. 2, fig. 2 is a schematic diagram of a fusible residual structure according to an embodiment of the present invention. The fusible residual structure in this embodiment removes the ReLU layer between two consecutive convolutional layers, eliminating the nonlinear relationship between them so that they can be fused. Further, the fusible residual structure adopts a "131" layout: a 1 by 1 convolution kernel, a 3 by 3 convolution kernel following the 1 by 1 convolution kernel, and a 1 by 1 convolution kernel following the 3 by 3 convolution kernel.
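As a sketch, the fusible "131" block described above might look as follows in PyTorch; the class and argument names (`FusibleResidualBlock`, `hidden`) are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class FusibleResidualBlock(nn.Module):
    """Multi-branch '131' block: 1x1 -> 3x3 -> 1x1 with NO ReLU between
    the convolutions, so the branch stays a linear map and can be fused."""
    def __init__(self, channels, hidden):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, hidden, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(hidden)
        self.conv2 = nn.Conv2d(hidden, hidden, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(hidden)
        self.conv3 = nn.Conv2d(hidden, channels, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # No ReLU between conv1/conv2/conv3: the whole branch is linear.
        y = self.bn3(self.conv3(self.bn2(self.conv2(self.bn1(self.conv1(x))))))
        # The nonlinearity is applied only after the residual addition.
        return self.relu(x + y)
```

Widening `hidden` at the 3 by 3 convolution, as the embodiment suggests, would compensate for the accuracy lost by removing the inner ReLU layers.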
In this embodiment, the accuracy degradation caused by removing the ReLU layer is mitigated by widening the number of channels at the 3 by 3 convolution kernel.
And S120, performing structure conversion on the trained fusible residual structure by using a fusion operator to obtain a single-branch residual structure.
Specifically, structure transformation of the trained model parameters with the fusion operator comprises three parts: merging convolution kernels with batch normalization layers, merging consecutive convolution kernels, and horizontally merging convolution kernels.
(1) Convolution kernel and batch normalization layer merging
In this embodiment, the convolution kernel fused with the batch normalization layer is obtained by traversing all fusible residual structures in the neural network and substituting the convolution kernel output in each fusible residual structure into the formula of the batch normalization layer.
Specifically, the formula of the convolution kernel is:

Conv(X) = WX + b

where X is the input image matrix, W is the parameter matrix, and b is the bias matrix.

Substituting the output of the convolution kernel into the formula of the batch normalization layer gives:

BN(Conv(X)) = γ(WX + b − mean)/√(var + ε) + β

where mean and var are the mean and variance, respectively, recorded by the batch normalization layer for its input, γ and β are the scaling factor and bias in the normalization layer, and ε is a small constant for numerical stability.

Let:

W_fused = γW/√(var + ε)
B_fused = γ(b − mean)/√(var + ε) + β

where W_fused is the fused parameter matrix and B_fused is the fused bias matrix. This yields the expression of a convolution kernel fused with batch normalization:

Conv_fused(X) = BN(Conv(X)) = W_fused X + B_fused

where Conv_fused is the convolution kernel formed by fusing batch normalization and the convolution kernel, composed of W_fused and B_fused.
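The conv-BN folding derived above can be written directly over the tensors. This is a minimal sketch assuming standard PyTorch `Conv2d`/`BatchNorm2d` conventions; the function name is my own, not from the patent:

```python
import torch

def fuse_conv_bn(conv_w, conv_b, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    """Fold BatchNorm statistics into the preceding convolution:
    W_fused = gamma/sqrt(var+eps) * W
    B_fused = gamma/sqrt(var+eps) * (b - mean) + beta
    """
    scale = bn_gamma / torch.sqrt(bn_var + eps)       # per output channel
    w_fused = conv_w * scale.reshape(-1, 1, 1, 1)      # scale each filter
    b_fused = (conv_b - bn_mean) * scale + bn_beta
    return w_fused, b_fused
```

After fusion, running the single convolution with `(w_fused, b_fused)` reproduces `bn(conv(x))` in evaluation mode, so the separate normalization layer can be dropped at deployment.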
(2) Convolution kernel and convolution kernel merging
In this embodiment, after the batch normalization layers are fused into the convolution kernel layers, the convolution kernel layers in fig. 2 are directly connected: each convolution kernel layer takes the output of its previous convolution kernel layer as input and feeds its own output to the next convolution kernel layer, so that consecutive convolution kernels can be merged.
The specific expression is as follows:

Conv2(Conv1(X)) = W2(W1X + b1) + b2
= W2W1X + W2b1 + b2
= (W2W1)X + (W2b1 + b2)

Let:

W_fused = W2W1
b_fused = W2b1 + b2

This gives the equivalent expression for fusing two successive convolution kernels:

Conv_fused(X) = W_fused X + b_fused
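The merge of two consecutive convolution kernels can be illustrated for the simplest case of two 1 by 1 convolutions, where W2W1 is an ordinary matrix product over channels. The function name is illustrative, not from the patent:

```python
import torch
import torch.nn.functional as F

def fuse_two_1x1(w1, b1, w2, b2):
    """Merge conv2(conv1(x)) for two 1x1 convolutions:
    W_fused = W2 W1 (matrix product over channels), b_fused = W2 b1 + b2.
    Weight layout follows PyTorch: (out_channels, in_channels, 1, 1)."""
    m1 = w1.flatten(1)                 # (hidden, in)
    m2 = w2.flatten(1)                 # (out, hidden)
    w_fused = (m2 @ m1).unsqueeze(-1).unsqueeze(-1)  # back to (out, in, 1, 1)
    b_fused = m2 @ b1 + b2
    return w_fused, b_fused
```

The same channel-mixing idea extends to the 1x1/3x3 pairs of the "131" block, since the spatial structure of one of the two kernels is trivial.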
(3) Convolution kernel horizontal merging
For a fusible residual structure with downsampling, the 1 by 1 convolution kernel on the skip connection needs to be merged horizontally. Specifically, the 1 by 1 convolution kernel on the direct connection is first equivalently expanded into a 3 by 3 convolution kernel so that the kernel sizes match, as shown in fig. 3. A 1 by 1 convolution kernel can be viewed as a special case of a 3 by 3 convolution kernel: as shown in fig. 3, it is expanded to 3 by 3 by filling zeros around it. The parallel 3 by 3 convolution kernels can then be combined into a single 3 by 3 convolution kernel by adding the original 3 by 3 kernel to the expanded kernel, whose only nonzero entry is the center point.
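The zero-padding expansion and horizontal merge can be sketched as follows; the function names are illustrative, and the equivalence assumes the 3 by 3 branch uses padding=1 and both branches share the same stride:

```python
import torch
import torch.nn.functional as F

def expand_1x1_to_3x3(w1x1):
    """Zero-pad a 1x1 kernel into the centre of a 3x3 kernel.
    Pads the last two dims: (out, in, 1, 1) -> (out, in, 3, 3)."""
    return F.pad(w1x1, [1, 1, 1, 1])

def merge_horizontal(w3x3, b3x3, w1x1, b1x1):
    """Parallel branches fed the same input sum their outputs, so their
    kernels and biases simply add once the kernel sizes match."""
    return w3x3 + expand_1x1_to_3x3(w1x1), b3x3 + b1x1
```

The merged 3 by 3 convolution (run with padding=1) produces exactly the sum of the two original branch outputs, removing the skip branch at deployment time.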
S130, deploying the single-branch residual structure to target equipment and executing the inference step of a target task.
For example, a target task may be automatically assessing mineralized froth grade on an embedded device. For such scenarios, the accuracy of the fused ResNet network is retained during cloud training; at deployment the network is converted into a single-branch structure and deployed on the embedded device, which significantly increases inference speed and reduces single-inference latency.
The target task may also be defending against and detecting malicious traffic in a software-defined network. In this scenario, applying the fused ResNet network effectively improves its inference speed, thereby shortening the interval between network traffic scans and improving the overall security of the software-defined network.
Further, the embodiment of the present invention provides corresponding experimental verification results, as follows:
1. Experimental setup
Training is performed with PyTorch on the Cifar10 and Cifar100 datasets with simple data augmentation for 120 epochs; the learning rate follows a cosine annealing schedule with a 5-epoch warm-up, and the training batch size is 256. Testing also uses PyTorch as the software environment; the server graphics card is an NVIDIA V100, the embedded device is an NVIDIA TX2, and speed is measured in examples/second. In the experimental comparison, the proposed branch fusion method for the residual structure is applied to ResNet and compared with the original ResNet in terms of running speed, model accuracy, and memory consumption.
OS | Ubuntu 16.04 Xenial
CPU | 2 × Intel Xeon E5-2620 v4 @ 32×3 GHz
GPU | 2 × NVIDIA Tesla V100
RAM | 256 GB DDR4

TABLE 1 Training server configuration

The training server for the experiments in this embodiment is an Intel Xeon E5 server configured with 2 NVIDIA V100 graphics cards; its specific configuration is shown in table 1.
Table 2 NVIDIA TX2 configuration table

Testing is also performed on the embedded platform at deployment time, using an NVIDIA TX2 as the deployment environment; it carries a quad-core MPCore CPU and 8 GB of 256-bit LPDDR4 memory, and runs Ubuntu 18.04. The specific configuration is shown in table 2.
2. Results of the experiment
Model | V100 speed (FPS) | TX2 speed (FPS) | Deployed parameters (MB)
---|---|---|---
ResNet18 | 1644.34 | 159.54 | 45
ResNet18* | 3038.67 | 300.22 | 21
ResNet34 | 1641.48 | 158.51 | 84
ResNet34* | 3031.32 | 298.60 | 39
ResNet50 | 474.71 | 48.23 | 98
ResNet50* | 2054.89 | 189.00 | 40
ResNet101 | 277.84 | 28.86 | 171
ResNet101* | 1200.04 | 112.75 | 78
ResNet152 | 192.23 | 20.30 | 231
ResNet152* | 834.63 | 79.34 | 110

TABLE 3 Deployment speed comparison on V100 and TX2
Table 3 compares inference speed in actual deployment on the server side and the embedded side. In this test, branch-fused deployments of ResNet18, ResNet34, ResNet50, ResNet101 and ResNet152 (marked with *) are compared with the original models; the batch size during inference is 64. The speed-up ratio of the fusible residual module is about 1.84 relative to BasicBlock (shallow ResNet) and about 4 relative to Bottleneck (deep ResNet), and the parameter count is roughly half that of the original ResNet.
TABLE 4 Comparison of training results on CIFAR10 and CIFAR100

Table 4 shows the training results on Cifar10 and Cifar100. In this test, branch-fused deployments of ResNet18, ResNet34, ResNet50 and ResNet101 are compared with the original models, and a VGG network is added for comparison; the model performance lost by removing the nonlinear layer is recovered by attaching the fusible extension module. Models with the "-" suffix (e.g. the ResNet50-like variant) are networks generated by directly replacing the corresponding ResNet blocks with fusible residual modules: directly removing the nonlinear ReLU layer in the residual module reduces network accuracy by 1%-2% compared with the original network. Models with the "*" suffix add multi-path extension branches to the fusible residual module to improve model performance. Experiments show that with the fusible extension module, the fusible residual module in this embodiment is essentially consistent in accuracy with the original ResNet network.
3. Analysis of Experimental results
Since the points of concern differ between training and deployment, this embodiment uses the idea of re-parameterization to provide a fusible residual module for the residual structure, targeting hardware efficiency during network inference and optimizing the inference and memory efficiency of the deployed residual network model. Removing the nonlinear layer in the residual structure and fusing the multi-branch structure before deployment eliminates the branch structure and reduces the number of model layers, improving memory efficiency and operational efficiency at deployment time. The advantages and limitations of linear and multi-branch network structures are first discussed; then, by fine-tuning the ResNet network structure, training and deployment are decoupled: the multi-branch residual structure is used during training and converted into a linear structure for deployment, exploiting the advantages of both single-branch and multi-branch networks while avoiding their drawbacks. Compared with the original ResNet, the final model has comparable accuracy, half the parameters, and a speed-up ratio of 1.8-4.4.
With continued reference to fig. 4, fig. 4 is a diagram of a neural network optimization apparatus according to an embodiment of the present invention, where the apparatus includes:
a training module 210, configured to perform model training based on a multi-branch fusible residual structure, and extract model parameters after training;
the fusion module 220 is configured to perform structure transformation on the trained fusible residual structure by using a fusion operator to obtain a single-branch residual structure;
and a deployment inference module 230, configured to deploy the single-branch residual structure to a target device and perform inference steps of a target task.
Optionally, the fusible residual structure is obtained from the residual structure by removing the ReLU layer between two consecutive convolution kernels.
Optionally, the convolution kernel structure in the fusible residual structure includes: a 1 by 1 convolution kernel, a 3 by 3 convolution kernel following the 1 by 1 convolution kernel, and a 1 by 1 convolution kernel following the 3 by 3 convolution kernel.
Wherein the fusion module 220 is specifically configured to: traverse all fusible residual structures in the neural network;
and substitute the convolution kernel output in the fusible residual structure into the formula of the batch normalization layer to obtain a convolution kernel fused with the batch normalization layer.
Wherein the fusion module 220 is further configured to have each convolution kernel in the fusible residual structure take the output of the previous convolution kernel layer as input and feed its own output to the next convolution kernel, so as to merge consecutive convolution kernels.
Wherein the fusion module 220 is further configured to: for a fusible residual structure with downsampling, expand the 1 by 1 convolution kernel on the direct connection into a 3 by 3 convolution kernel;
and add the expanded 3 by 3 convolution kernel, whose only nonzero entry is the center point, to the 3 by 3 convolution kernel to complete horizontal merging.
The neural network optimization device provided by the embodiment of the invention can execute the neural network optimization method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executed method; details are not repeated here.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (7)
1. A neural network optimization method, comprising:
model training is carried out based on a multi-branch fusible residual structure, and trained model parameters are extracted;
performing structure conversion on the trained fusible residual structure by using a fusion operator to obtain a single-branch residual structure;
and deploying the single-branch residual structure to a target device and executing the inference step of the target task.
2. The method of claim 1, wherein the fusible residual structure is obtained from the residual structure by removing the ReLU layer between two consecutive convolution kernels.
3. The method of claim 1, wherein the convolution kernel structure in the fusible residual structure comprises: a 1 by 1 convolution kernel, a 3 by 3 convolution kernel following the 1 by 1 convolution kernel, and a 1 by 1 convolution kernel following the 3 by 3 convolution kernel.
4. The method of claim 1, wherein performing structure transformation on the trained fusible residual structure by using a fusion operator comprises:
traversing all fusible residual structures in the neural network;
and substituting the convolution kernel output in the fusible residual structure into the formula of the batch normalization layer to obtain a convolution kernel fused with the batch normalization layer.
5. The method of claim 1, wherein performing structure transformation on the trained fusible residual structure by using a fusion operator comprises:
each convolution kernel in the fusible residual structure taking the output of the previous convolution kernel layer as input and feeding its own output to the next convolution kernel, so as to merge consecutive convolution kernels.
6. The method of claim 2, wherein performing structure transformation on the trained fusible residual structure by using a fusion operator comprises:
for a fusible residual structure with downsampling, expanding the 1 by 1 convolution kernel on the direct connection into a 3 by 3 convolution kernel;
and adding the expanded 3 by 3 convolution kernel, whose only nonzero entry is the center point, to the 3 by 3 convolution kernel to complete horizontal merging.
7. An apparatus for neural network optimization, comprising:
the training module is used for carrying out model training based on the multi-branch fusible residual structure and extracting model parameters after training;
the fusion module is used for performing structure conversion on the trained fusible residual structure by using a fusion operator to obtain a single-branch residual structure;
and the deployment inference module is used for deploying the single-branch residual structure to target equipment and executing inference steps of a target task.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111060216.3A CN113762479A (en) | 2021-09-10 | 2021-09-10 | Neural network optimization method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113762479A true CN113762479A (en) | 2021-12-07 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115293337A (en) * | 2022-10-09 | 2022-11-04 | 深圳比特微电子科技有限公司 | Method and device for constructing neural network, computing equipment and storage medium |
CN115600653A (en) * | 2022-12-07 | 2023-01-13 | 荣耀终端有限公司(Cn) | Deployment method and device of neural network model |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190114511A1 (en) * | 2017-10-16 | 2019-04-18 | Illumina, Inc. | Deep Learning-Based Techniques for Training Deep Convolutional Neural Networks |
CN110929697A (en) * | 2019-12-17 | 2020-03-27 | 中国人民解放军海军航空大学 | Neural network target identification method and system based on residual error structure |
CN111242862A (en) * | 2020-01-09 | 2020-06-05 | 西安理工大学 | Multi-scale fusion parallel dense residual convolution neural network image denoising method |
CN111861870A (en) * | 2020-07-16 | 2020-10-30 | 南通大学 | End-to-end parallel generator network construction method for image translation |
US20210264278A1 (en) * | 2020-02-24 | 2021-08-26 | Adobe Inc. | Neural network architecture pruning |
Non-Patent Citations (1)
Title |
---|
魏书伟; 曾上游; 潘兵; 王新娇: "Design of a lightweight convolutional neural network based on diversified structures" (基于多样化结构的轻量型卷积神经网络设计), 现代电子技术 (Modern Electronics Technique), no. 12 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |