CN113762479A - Neural network optimization method and device - Google Patents


Info

Publication number
CN113762479A
Authority
CN
China
Prior art keywords
convolution kernel
fusible
residual
branch
residual error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111060216.3A
Other languages
Chinese (zh)
Inventor
徐友庆
高成
关晨
孟祥峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Park Sheng Intelligent Technology Co ltd
Original Assignee
Shenzhen Park Sheng Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Park Sheng Intelligent Technology Co ltd filed Critical Shenzhen Park Sheng Intelligent Technology Co ltd
Priority to CN202111060216.3A
Publication of CN113762479A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Abstract

The invention discloses a neural network optimization method and device. The method comprises the following steps: performing model training based on a multi-branch fusible residual structure and extracting the trained model parameters; performing structure conversion on the trained fusible residual structures by using a fusion operator to obtain a single-branch residual structure; and deploying the single-branch residual structure to a target device and executing the inference step of the target task. By designing a fusible residual module and structurally replacing the original residual module with it, the invention exploits the advantages of both multi-branch and single-branch structures, improves memory efficiency and parallelism when the deployed network runs, reduces network resource consumption, and accelerates network inference; parameter compression is achieved through re-parameterization, which alleviates the accuracy degradation caused by pruning parameters and connections.

Description

Neural network optimization method and device
Technical Field
Embodiments of the invention relate to the technical field of neural networks, and in particular to a neural network optimization method and device.
Background
In recent years, with the rapid development of deep learning, deep learning has achieved excellent performance on many tasks and is increasingly applied in everyday life and in industrial fields. At present, a deep neural network model can be deployed in an online mode or an offline mode. Offline deployment is used in most practical industrial production environments; it processes data locally without passing through a network, so security and real-time performance can be guaranteed. However, for embedded end-side devices with limited computational resources, the massive computational demands of deep neural networks are unacceptable. Moreover, for battery-powered embedded mobile devices, heavy computation quickly drains the limited battery capacity.
To resolve the deployment dilemma of deep neural networks on embedded devices, conventional approaches have run into bottlenecks. Simply increasing the DRAM capacity of embedded equipment or enhancing CPU computing power cannot keep pace with the development of neural networks. In many industrial scenarios there are also strict volume and power-consumption limits on embedded devices, which pose a huge challenge to deploying neural networks on them. To satisfy the memory and power-consumption constraints that embedded devices place on neural network deployment, a feasible deployment scheme for limited embedded hardware resources has emerged, namely neural network model compression.
However, conventional neural network model compression methods prune redundant connections and parameters from the trained network model to reduce the number of parameters. Because these compression methods do not change the overall architecture of the network and only cut off redundant connections and parameters, the model loses some accuracy; in addition, a traditional neural network architecture cannot simultaneously exploit the advantages of multi-branch and single-branch structures, so neural network inference efficiency is low.
Disclosure of Invention
The invention provides a neural network optimization method and device, which are used to effectively reduce model parameters and improve the inference efficiency of a neural network.
In a first aspect, an embodiment of the present invention provides a neural network optimization method, including:
performing model training based on a multi-branch fusible residual structure, and extracting the trained model parameters;
performing structure conversion on the trained fusible residual structures by using a fusion operator to obtain a single-branch residual structure;
and deploying the single-branch residual structure to a target device and executing the inference step of the target task.
Optionally, the fusible residual structure is obtained from the residual structure by removing the ReLU layer between two consecutive convolution kernels.
Optionally, the convolution kernel structure in the fusible residual structure includes: a 1 by 1 convolution kernel, a 3 by 3 convolution kernel following the 1 by 1 convolution kernel, and a 1 by 1 convolution kernel following the 3 by 3 convolution kernel.
Optionally, performing structure conversion on the trained fusible residual structure by using a fusion operator includes:
traversing all fusible residual structures in the neural network;
and substituting the output of the convolution kernel in the fusible residual structure into the formula of the batch normalization layer to obtain a convolution kernel fused with the batch normalization layer.
Optionally, performing structure conversion on the trained fusible residual structure by using a fusion operator includes:
each convolution kernel in the fusible residual structure taking the output of the previous convolution kernel layer as input and feeding its own output to the next convolution kernel, so as to merge consecutive convolution kernels.
Optionally, performing structure conversion on the trained fusible residual structure by using a fusion operator includes:
for a fusible residual structure with downsampling, expanding the 1 by 1 convolution kernel on the direct connection into a 3 by 3 convolution kernel;
and adding the 3 by 3 convolution kernel to the center of the expanded 3 by 3 convolution kernel to complete the horizontal merging.
In a second aspect, an embodiment of the present invention further provides a neural network optimization apparatus, including:
a training module, configured to perform model training based on the multi-branch fusible residual structure and extract the trained model parameters;
a fusion module, configured to perform structure conversion on the trained fusible residual structures by using a fusion operator to obtain a single-branch residual structure;
and a deployment and inference module, configured to deploy the single-branch residual structure to a target device and execute the inference steps of a target task.
Aiming at the memory-inefficient and low-parallelism structure of multi-branch networks, the invention provides a fusible residual module and adopts a re-parameterization technique. For ResNet-like networks, the residual module is structurally replaced by the fusible residual module, and the residual structure is fused into a single convolution at deployment time. This avoids the extra memory consumption caused by the multi-branch structure, reduces the network depth, improves memory efficiency and parallelism when the network is deployed, saves network resources, and accelerates network inference. In addition, several equivalent convolution structures and anisotropic convolution structures are provided to enhance the performance of the fusible residual module.
Drawings
Fig. 1 is a flowchart of a neural network optimization method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a fusible residual structure according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an equivalent expansion of a 1 by 1 convolution kernel according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a neural network optimization device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Examples
Fig. 1 is a flowchart of a neural network optimization method provided in an embodiment of the present invention, which specifically includes the following steps:
s110, model training is carried out based on the multi-branch fusible residual structure, and trained model parameters are extracted.
Referring to fig. 2, fig. 2 is a schematic diagram of a fusible residual structure according to an embodiment of the present invention. The fusible residual structure in this embodiment removes the ReLU layer between two consecutive convolutional layers, eliminating the nonlinear relationship between them so that they can be fused. Further, the fusible residual structure adopts a "131" structure, i.e., a 1 by 1 convolution kernel, a 3 by 3 convolution kernel following the 1 by 1 convolution kernel, and a 1 by 1 convolution kernel following the 3 by 3 convolution kernel.
In this embodiment, the number of channels of the 3 by 3 convolution kernel is widened to mitigate the accuracy degradation caused by removing the ReLU layer.
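As a concrete illustration, the following is a minimal PyTorch sketch of such a "131" fusible residual block under the assumptions just described: each convolution is followed by batch normalization, no ReLU is placed between the convolutions, a ReLU is applied only after the branch addition, and the 3 by 3 stage is widened by a configurable factor. The class name and the widen parameter are illustrative choices, not details fixed by this embodiment.

import torch.nn as nn

class FusibleResidualBlock(nn.Module):
    # "131" fusible residual block: 1x1 -> 3x3 -> 1x1, each followed by batch
    # normalization, with no ReLU between the convolutions so the chain stays
    # linear and fusible; a ReLU is applied only after the branch addition.
    def __init__(self, in_channels, out_channels, stride=1, widen=2):
        super().__init__()
        mid = out_channels * widen  # widened 3x3 stage to offset removing the ReLU layers
        self.conv1 = nn.Conv2d(in_channels, mid, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid)
        self.conv2 = nn.Conv2d(mid, mid, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid)
        self.conv3 = nn.Conv2d(mid, out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        # skip branch: identity, or a 1x1 convolution when downsampling or changing width
        if stride != 1 or in_channels != out_channels:
            self.skip = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels))
        else:
            self.skip = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.bn1(self.conv1(x))
        y = self.bn2(self.conv2(y))   # no ReLU between the convolutions
        y = self.bn3(self.conv3(y))
        return self.relu(y + self.skip(x))

During training the block behaves like an ordinary multi-branch residual block; the linearity between the three convolutions is what later allows them to collapse into a single convolution.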
S120, structure conversion is performed on the trained fusible residual structure by using a fusion operator to obtain a single-branch residual structure.
Specifically, the structure conversion of the trained model parameters with the fusion operator mainly comprises three parts: merging a convolution kernel with a batch normalization layer, merging a convolution kernel with a convolution kernel, and merging convolution kernels horizontally.
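Before detailing these three parts, the sketch below shows one way the conversion could be driven in PyTorch: the trained network is traversed and each fusible residual block is replaced in place by its single-branch equivalent. The fuse() method is an illustrative name standing for the three merging steps described next; it is not an interface defined by this embodiment.

import torch
import torch.nn as nn

@torch.no_grad()
def convert_to_single_branch(model: nn.Module) -> nn.Module:
    # Walk the trained network and replace every fusible residual block with its
    # single-branch equivalent. "fuse()" is an assumed, illustrative method name.
    for name, child in model.named_children():
        if hasattr(child, "fuse"):
            setattr(model, name, child.fuse())   # swap in the fused single-branch convolution
        else:
            convert_to_single_branch(child)      # recurse into nested containers
    return model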
(1) Merging a convolution kernel with a batch normalization layer
In this embodiment, all fusible residual structures in the neural network are traversed, and the output of each convolution kernel in a fusible residual structure is substituted into the formula of the batch normalization layer to obtain a convolution kernel fused with the batch normalization layer.
Specifically, the formula of the convolution kernel is:
Conv(X)=WX+b
where X is the input image matrix, W is the parameter matrix, and b is the bias matrix.
Substituting the output of the convolution kernel into the formula of the batch normalization layer gives the following expression:
BN(Conv(X)) = γ·(W·X + b - mean)/√(var + ε) + β
where mean and var are the running mean and variance tracked by the batch normalization layer for its input Conv(X), γ and β are the scaling factor and bias of the normalization layer, and ε is a small constant for numerical stability.
Let:
W_fused = γ·W/√(var + ε),  B_fused = γ·(b - mean)/√(var + ε) + β
where W_fused is the fused parameter matrix and B_fused is the fused bias matrix.
The following expression is then obtained, which is in fact the expression of a convolution kernel fused with batch normalization:
Conv_fused(X) = BN(Conv(X)) = W_fused·X + B_fused
where Conv_fused is the convolution kernel obtained by fusing the batch normalization layer into the convolution kernel, and is determined by W_fused and B_fused.
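For reference, the same fusion can be performed directly on PyTorch layer parameters. The sketch below folds a BatchNorm2d into the preceding Conv2d following the W_fused and B_fused formulas above; it assumes the normalization layer immediately follows the convolution and uses that layer's running statistics.

import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # Fold a BatchNorm2d into the preceding Conv2d, per the W_fused / B_fused formulas.
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)         # gamma / sqrt(var + eps)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))    # W_fused
    bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)    # B_fused
    return fused

After this step every branch of the fusible residual structure consists of convolutions only, so the remaining merges operate purely on convolution weights.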
(2) Merging a convolution kernel with a convolution kernel
In this embodiment, after the batch normalization layers have been fused into the convolution kernel layers, the convolution kernel layers in fig. 2 are directly connected: each convolution kernel layer takes the output of the previous convolution kernel layer as its input and feeds its own output to the next convolution kernel layer, which allows consecutive convolution kernels to be merged.
The specific expression is as follows:
Conv_2(Conv_1(X)) = W_2·(W_1·X + b_1) + b_2
                  = W_2·W_1·X + W_2·b_1 + b_2
                  = (W_2·W_1)·X + (W_2·b_1 + b_2)
Let:
W_fused = W_2·W_1,  b_fused = W_2·b_1 + b_2
The following expression is then obtained, which is in fact the equivalent expression of two consecutive convolution kernels fused into one:
Conv_fused(X) = W_fused·X + b_fused
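As an illustration of this step, the sketch below merges a k by k convolution followed by a 1 by 1 convolution into a single k by k convolution, i.e., the W_fused = W_2·W_1, b_fused = W_2·b_1 + b_2 case written over PyTorch tensors. It assumes stride 1 on the 1 by 1 convolution, no grouped convolution, and that batch normalization has already been folded into the biases; merging a 1 by 1 convolution that precedes a 3 by 3 convolution requires an analogous but separate transform and is not shown here.

import torch
import torch.nn as nn

@torch.no_grad()
def merge_kxk_then_1x1(conv_kxk: nn.Conv2d, conv_1x1: nn.Conv2d) -> nn.Conv2d:
    # Merge a k-by-k convolution followed by a 1-by-1 convolution into one k-by-k
    # convolution: W_fused is the channel-wise product W2*W1, b_fused = W2*b1 + b2.
    w1 = conv_kxk.weight                                   # (mid, in, k, k)
    w2 = conv_1x1.weight[:, :, 0, 0]                       # (out, mid)
    b1 = conv_kxk.bias if conv_kxk.bias is not None else torch.zeros(w1.shape[0])
    b2 = conv_1x1.bias if conv_1x1.bias is not None else torch.zeros(w2.shape[0])
    fused = nn.Conv2d(conv_kxk.in_channels, conv_1x1.out_channels, conv_kxk.kernel_size,
                      conv_kxk.stride, conv_kxk.padding, bias=True)
    fused.weight.copy_(torch.einsum('om,mikl->oikl', w2, w1))   # sum over the middle channels
    fused.bias.copy_(w2 @ b1 + b2)
    return fused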
(3) Merging convolution kernels horizontally
For a fusible residual structure with downsampling, the 1 by 1 convolution kernel on the skip connection needs to be merged horizontally with the main branch. Specifically, to merge horizontally, the 1 by 1 convolution kernel on the direct connection is first equivalently expanded into a 3 by 3 convolution kernel so that the sizes match, as shown in fig. 3. A 1 by 1 convolution kernel can be regarded as a special case of a 3 by 3 convolution kernel, i.e., it can be represented by a 3 by 3 convolution kernel: as shown in fig. 3, the 1 by 1 convolution kernel is expanded into a 3 by 3 convolution kernel by padding zeros around it, so that its only nonzero entry lies at the center point. The two parallel 3 by 3 convolution kernels can then be combined into one 3 by 3 convolution kernel by adding the main-branch 3 by 3 kernel and the expanded kernel element-wise.
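The sketch below carries out this horizontal merge in PyTorch: the 1 by 1 skip kernel is zero-padded to 3 by 3 and added to the main-branch 3 by 3 kernel, so the two parallel branches collapse into a single convolution. It assumes both branches receive the same input, have matching input and output channels and stride, and have had their batch normalization folded in beforehand.

import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def merge_parallel_branches(conv3x3: nn.Conv2d, conv1x1: nn.Conv2d) -> nn.Conv2d:
    # Zero-pad the 1x1 skip kernel to 3x3 and add it to the main-branch 3x3 kernel,
    # so the two parallel branches become a single convolution.
    fused = nn.Conv2d(conv3x3.in_channels, conv3x3.out_channels, kernel_size=3,
                      stride=conv3x3.stride, padding=1, bias=True)
    padded = F.pad(conv1x1.weight, [1, 1, 1, 1])           # (out, in, 1, 1) -> (out, in, 3, 3)
    fused.weight.copy_(conv3x3.weight + padded)            # 1x1 value lands at the 3x3 center
    b3 = conv3x3.bias if conv3x3.bias is not None else torch.zeros(conv3x3.out_channels)
    b1 = conv1x1.bias if conv1x1.bias is not None else torch.zeros(conv1x1.out_channels)
    fused.bias.copy_(b3 + b1)
    return fused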
S130, deploying the single-branch residual structure to a target device and executing the inference step of the target task.
For example, a target task may be to automatically assess mineral flotation froth grade on an embedded device. For this scenario, the accuracy of the fused ResNet network is retained during cloud training; at deployment time the network is converted into a single-branch structure and deployed on the embedded device, which significantly accelerates inference and reduces per-inference latency.
The target task may also be to guard against and detect malicious traffic in a software-defined network. For this scenario, applying the fused ResNet network effectively improves its inference speed, thereby shortening the interval between successive network traffic scans and improving the overall security of the software-defined network.
Further, the embodiment of the present invention also provides corresponding experimental verification results, as follows:
1. Experimental setup
Training was carried out in PyTorch on the Cifar10 and Cifar100 datasets with simple data augmentation for 120 epochs, using a cosine-annealing learning-rate schedule with a 5-epoch warm-up and a training batch size of 256. For testing, PyTorch was used as the software environment, the server graphics card was an NVIDIA V100, the embedded device was an NVIDIA TX2, and speed is reported in samples per second. In the comparison, the proposed branch-fusion method for the residual structure is applied to ResNet and compared with the original ResNet in terms of running speed, model accuracy, and memory consumption.
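As an aside, the learning-rate schedule just described (a 5-epoch warm-up into cosine annealing over 120 epochs) can be expressed in PyTorch roughly as follows; the optimizer choice, base learning rate and placeholder model are assumptions, not values reported by this embodiment.

import math
import torch
from torch.optim.lr_scheduler import LambdaLR

def warmup_cosine(epoch, warmup_epochs=5, total_epochs=120):
    # Linear warm-up for the first 5 epochs, then cosine annealing until epoch 120.
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

model = torch.nn.Linear(8, 8)                              # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)    # assumed base learning rate
scheduler = LambdaLR(optimizer, lr_lambda=warmup_cosine)   # call scheduler.step() once per epoch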
OS: Ubuntu 16.04 Xenial
CPU: 2× Intel Xeon E5-2620 v4 @ 32×3 GHz
GPU: 2× NVIDIA Tesla V100
RAM: 256 GB DDR4
Table 1 Training server configuration
The training server used for the experiments in this embodiment is an Intel Xeon E5 server equipped with two NVIDIA V100 graphics cards; its specific configuration is shown in Table 1.
Table 2 NVIDIA TX2 configuration (table contents not reproduced)
Testing at deployment time was also performed on an embedded platform, using an NVIDIA TX2 as the deployment environment. It carries a quad-core ARM Cortex-A57 MPCore CPU and 8 GB of 256-bit LPDDR4 memory, and runs Ubuntu 18.04. Its specific configuration is shown in Table 2.
2. Results of the experiment
Model        V100 speed (FPS)   TX2 speed (FPS)   Deployed parameters (MB)
ResNet18         1644.34            159.54              45
ResNet18*        3038.67            300.22              21
ResNet34         1641.48            158.51              84
ResNet34*        3031.32            298.60              39
ResNet50          474.71             48.23              98
ResNet50*        2054.89            189.00              40
ResNet101         277.84             28.86             171
ResNet101*       1200.04            112.75              78
ResNet152         192.23             20.30             231
ResNet152*        834.63             79.34             110
Table 3 Deployment speed comparison on V100 and TX2 (models marked with * are deployed with branch fusion)
Table 3 compares inference speeds in actual deployment on the server side and the embedded side. In this test, ResNet18, ResNet34, ResNet50, ResNet101 and ResNet152 deployed with branch fusion are compared with the original models, with a batch size of 64 during inference. The speed-up ratio of the fusible residual module relative to BasicBlock (shallow ResNet) is about 1.84, the speed-up ratio relative to Bottleneck (deep ResNet) is about 4, and the number of parameters is roughly half that of the original ResNet.
Table 4 Comparison of training results on CIFAR10 and CIFAR100 (table contents not reproduced)
Table 4 shows the training results on Cifar10 and Cifar100. In this test, ResNet18, ResNet34, ResNet50 and ResNet101 deployed with branch fusion are compared with the original models, with a VGG network added for comparison, and the performance lost by removing the nonlinear layer is recovered by attaching the fusible extension module. Models marked with "-" (for example, ResNet50-) are networks generated by directly replacing the corresponding residual modules of ResNet with fusible residual modules; it can be seen that directly removing the nonlinear ReLU layer from the residual module lowers network performance by 1%-2% compared with the original network, so the fused models add multi-path extension branches to the fusible residual module to improve performance. The experiments show that, with the fusible extension module, the fusible residual module of this embodiment is essentially as accurate as the original ResNet network.
3. Analysis of experimental results
Considering that a model has different priorities during training and during deployment, this embodiment draws on the idea of re-parameterization and proposes a fusible residual module for the residual structure, targeting hardware efficiency during network inference and optimizing the inference efficiency and memory efficiency of the residual network model at deployment time. By removing the nonlinear layer in the residual structure and fusing the multi-branch structure before deployment, the branch structure of the model is eliminated and the number of layers is reduced, which improves memory efficiency and running efficiency during deployment. The advantages and limitations of linear and multi-branch network structures are first discussed; then, by slightly adjusting the ResNet structure, training and deployment of the network are decoupled: the multi-branch residual network structure is used during training and converted into a linear network structure at deployment, exploiting the advantages of both single-branch and multi-branch networks while avoiding their drawbacks. Compared with the original ResNet network, the resulting model achieves equivalent accuracy with half the parameters and a speed-up ratio of 1.8-4.4.
With continued reference to fig. 4, fig. 4 shows a neural network optimization apparatus according to an embodiment of the present invention. The apparatus includes:
a training module 210, configured to perform model training based on a multi-branch fusible residual structure and extract the trained model parameters;
a fusion module 220, configured to perform structure conversion on the trained fusible residual structures by using a fusion operator to obtain a single-branch residual structure;
and a deployment and inference module 230, configured to deploy the single-branch residual structure to a target device and perform the inference steps of a target task.
Optionally, the fusible residual structure is obtained from the residual structure by removing the ReLU layer between two consecutive convolution kernels.
Optionally, the convolution kernel structure in the fusible residual structure includes: a 1 by 1 convolution kernel, a 3 by 3 convolution kernel following the 1 by 1 convolution kernel, and a 1 by 1 convolution kernel following the 3 by 3 convolution kernel.
The fusion module 220 is specifically configured to: traverse all fusible residual structures in the neural network;
and substitute the output of the convolution kernel in the fusible residual structure into the formula of the batch normalization layer to obtain a convolution kernel fused with the batch normalization layer.
The fusion module 220 is further configured such that each convolution kernel in the fusible residual structure takes the output of the previous convolution kernel layer as input and feeds its own output to the next convolution kernel, so as to merge consecutive convolution kernels.
The fusion module 220 is further configured to: for a fusible residual structure with downsampling, expand the 1 by 1 convolution kernel on the direct connection into a 3 by 3 convolution kernel;
and add the 3 by 3 convolution kernel to the center of the expanded 3 by 3 convolution kernel to complete the horizontal merging.
The neural network optimization apparatus provided by the embodiment of the present invention can execute the neural network optimization method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects of the method; details are not repeated here.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (7)

1. A neural network optimization method, comprising:
performing model training based on a multi-branch fusible residual structure, and extracting the trained model parameters;
performing structure conversion on the trained fusible residual structures by using a fusion operator to obtain a single-branch residual structure;
and deploying the single-branch residual structure to a target device and executing the inference step of the target task.
2. The method of claim 1, wherein the fusible residual structure is obtained from the residual structure by removing the ReLU layer between two successive convolution kernels.
3. The method of claim 1, wherein the convolution kernel structure in the fusible residual structure comprises: a 1 by 1 convolution kernel, a 3 by 3 convolution kernel following the 1 by 1 convolution kernel, and a 1 by 1 convolution kernel following the 3 by 3 convolution kernel.
4. The method of claim 1, wherein performing structure conversion on the trained fusible residual structure by using a fusion operator comprises:
traversing all fusible residual structures in the neural network;
and substituting the output of the convolution kernel in the fusible residual structure into the formula of the batch normalization layer to obtain a convolution kernel fused with the batch normalization layer.
5. The method of claim 1, wherein performing structure conversion on the trained fusible residual structure by using a fusion operator comprises:
each convolution kernel in the fusible residual structure taking the output of the previous convolution kernel layer as input and feeding its own output to the next convolution kernel, so as to merge consecutive convolution kernels.
6. The method of claim 2, wherein performing structure conversion on the trained fusible residual structure by using a fusion operator comprises:
for a fusible residual structure with downsampling, expanding the 1 by 1 convolution kernel on the direct connection into a 3 by 3 convolution kernel;
and adding the 3 by 3 convolution kernel to the center of the expanded 3 by 3 convolution kernel to complete the horizontal merging.
7. An apparatus for neural network optimization, comprising:
a training module, configured to perform model training based on a multi-branch fusible residual structure and extract the trained model parameters;
a fusion module, configured to perform structure conversion on the trained fusible residual structures by using a fusion operator to obtain a single-branch residual structure;
and a deployment and inference module, configured to deploy the single-branch residual structure to a target device and execute the inference steps of a target task.
CN202111060216.3A 2021-09-10 2021-09-10 Neural network optimization method and device Pending CN113762479A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111060216.3A CN113762479A (en) 2021-09-10 2021-09-10 Neural network optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111060216.3A CN113762479A (en) 2021-09-10 2021-09-10 Neural network optimization method and device

Publications (1)

Publication Number Publication Date
CN113762479A 2021-12-07

Family

ID=78794622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111060216.3A Pending CN113762479A (en) 2021-09-10 2021-09-10 Neural network optimization method and device

Country Status (1)

Country Link
CN (1) CN113762479A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190114511A1 (en) * 2017-10-16 2019-04-18 Illumina, Inc. Deep Learning-Based Techniques for Training Deep Convolutional Neural Networks
CN110929697A (en) * 2019-12-17 2020-03-27 中国人民解放军海军航空大学 Neural network target identification method and system based on residual error structure
CN111242862A (en) * 2020-01-09 2020-06-05 西安理工大学 Multi-scale fusion parallel dense residual convolution neural network image denoising method
US20210264278A1 (en) * 2020-02-24 2021-08-26 Adobe Inc. Neural network architecture pruning
CN111861870A (en) * 2020-07-16 2020-10-30 南通大学 End-to-end parallel generator network construction method for image translation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
魏书伟; 曾上游; 潘兵; 王新娇: "Design of a lightweight convolutional neural network based on diversified structures" (基于多样化结构的轻量型卷积神经网络设计), Modern Electronics Technique (现代电子技术), no. 12

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115293337A (en) * 2022-10-09 2022-11-04 深圳比特微电子科技有限公司 Method and device for constructing neural network, computing equipment and storage medium
CN115293337B (en) * 2022-10-09 2022-12-30 深圳比特微电子科技有限公司 Method and device for constructing neural network, computing equipment and storage medium
CN115600653A (en) * 2022-12-07 2023-01-13 Honor Device Co., Ltd. (CN) Deployment method and device of neural network model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination