CN108304921B

CN108304921B - Convolutional neural network training method and image processing method and device

Info

Publication number: CN108304921B
Application number: CN201810136118.5A
Authority: CN
Inventors: 程光亮; 石建萍
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2018-02-09
Filing date: 2018-02-09
Publication date: 2021-02-02
Anticipated expiration: 2038-02-09
Also published as: CN108304921A

Abstract

The invention discloses a training method for a convolutional neural network, an image processing method, an image processing device, a computer device, a computer-readable storage medium and a computer program. The training method comprises: deleting at least one nonlinear correction layer arranged after a linear structure layer in a first convolutional neural network to obtain a second convolutional neural network; and performing supervised training on the second convolutional neural network based on a training image and the labeling information of the training image. The image processing method comprises: merging at least one parallel branch and/or at least one serial branch in a trained second convolutional neural network, in which at least one nonlinear correction layer arranged after a linear structure layer has been deleted, to obtain a third convolutional neural network; inputting an image into the third convolutional neural network; and processing the image through the third convolutional neural network to obtain a processing result of the image.

Description

Convolutional neural network training method and image processing method and device

Technical Field

The present invention relates to the field of deep learning technologies, and in particular, to a training method for a convolutional neural network, an image processing method, an image processing apparatus, a computer device, a computer-readable storage medium, and a computer program.

Background

With the development of deep convolutional neural networks, the accuracy of each computer vision data set is constantly refreshed by deeper or wider network structures. However, deepening or widening of the network tends to increase the runtime of the network during training and testing. In recent years, in order to be able to run deep convolutional neural networks on low power, low computational resource platforms, more and more researchers have started to focus on networks that are lightweight and require less time for the training and testing process.

How to design some lightweight convolutional neural networks to achieve shorter training time or shorter testing time becomes an important direction for deep learning research.

Disclosure of Invention

The embodiment of the invention provides a training method of a convolutional neural network, an image processing method, an image processing device, computer equipment, a computer readable storage medium and a computer program.

The embodiment of the invention provides a training method of a convolutional neural network, which comprises the following steps:

deleting at least one nonlinear correction layer arranged behind the linear structure layer in the first convolutional neural network to obtain a second convolutional neural network;

and performing supervised training on the second convolutional neural network based on a training image and the labeling information of the training image.

In the above scheme, the first convolutional neural network includes a residual unit;

deleting at least one nonlinear correction layer arranged after the linear structure layer in the first convolutional neural network to obtain a second convolutional neural network comprises:

deleting some or all of the nonlinear correction layers arranged after the linear structure layers in the residual unit of the first convolutional neural network to obtain the second convolutional neural network.

In the above scheme, the linear structure layer includes: convolutional layers or batch normalization layers.

In the foregoing solution, the performing supervised training on the second convolutional neural network based on the training image and the labeled information of the training image includes:

initializing the training parameters of the second convolutional neural network, and iteratively performing the following steps until the processing result of the training image converges: inputting a training image into the second convolutional neural network, calculating a processing result of the training image based on the training parameters, and updating the training parameters; wherein convergence of the processing result means that the difference between the processing result of the training image and the labeling information of the training image is less than or equal to a target threshold.

The embodiment of the invention provides an image processing method, which comprises the following steps:

combining at least one parallel branch and/or at least one serial branch in the trained second convolutional neural network to obtain a third convolutional neural network; at least one nonlinear correction layer arranged behind the linear structural layer in the second convolutional neural network is deleted;

inputting an image into the third convolutional neural network;

and processing the image through the third convolutional neural network to obtain a processing result of the image.

In the above scheme, the serial branch comprises at least two serially connected linear structure layers;

performing merging processing on the serial branches, including: merging one linear structure layer in the serial branch into another linear structure layer.

In the above scheme, the at least two serially connected linear structure layers include: two convolution layers with different sizes of the convolution kernels connected in series;

performing merging processing on the serial branches, including: deleting one convolutional layer and merging the parameters of the deleted convolutional layer into another convolutional layer.

In the above scheme, the at least two serially connected linear structure layers include: a convolutional layer and a batch normalization layer connected in series;

performing merging processing on the serial branches, including: the batch normalization layer is deleted and the batch normalized scale and offset parameters are incorporated into the convolution kernel parameters of the convolutional layer.

In the above solution, the parallel branch includes: a parallel linear structure branch and a jump branch;

merging the parallel branches, including: the jump branch is deleted and 1 is added to the center position of the convolution kernel of the convolution layer in the linear structure branch.

In the above solution, the parallel branch includes: two parallel linear structure branches;

merging the parallel branches, including: deleting one of the parallel branches and adding the weight vector in the convolution kernel of the deleted one of the linear branches to the center position of the convolution kernel of the other linear branch.

In the foregoing solution, processing the image through the third convolutional neural network includes:

and carrying out object identification, classification processing, image segmentation processing or object detection/tracking processing on the image through the third convolutional neural network.

In the above scheme, some or all of the nonlinear correction layers arranged after the linear structure layers in the residual unit of the second convolutional neural network are deleted.

The embodiment of the invention provides a training device of a convolutional neural network, which comprises:

the deleting unit is used for deleting at least one nonlinear correction layer arranged behind the linear structure layer in the first convolutional neural network to obtain a second convolutional neural network;

and the training unit is used for carrying out supervision training on the second convolutional neural network based on a training image and the labeling information of the training image.

The deleting unit is used for deleting some or all of the nonlinear correction layers arranged after the linear structure layers in the residual unit of the first convolutional neural network to obtain the second convolutional neural network.

In the foregoing solution, the training unit is configured to initialize the training parameters of the second convolutional neural network, and to iteratively perform the following steps until the processing result of the training image converges: inputting a training image into the second convolutional neural network, calculating a processing result of the training image based on the training parameters, and updating the training parameters; wherein convergence of the processing result means that the difference between the processing result of the training image and the labeling information of the training image is less than or equal to a target threshold.

An embodiment of the present invention provides an image processing apparatus, including:

the merging unit is used for merging at least one serial branch and/or at least one parallel branch in the trained second convolutional neural network to obtain a third convolutional neural network; at least one nonlinear correction layer arranged behind the linear structural layer in the second convolutional neural network is deleted;

an image processing unit for inputting an image into the third convolutional neural network; and processing the image through the third convolutional neural network to obtain a processing result of the image.

the merging unit is used for merging one linear structural layer in the serial branches into another linear structural layer.

the merging unit is used for deleting one convolution layer and merging the parameters of the deleted convolution layer into another convolution layer.

and the merging unit is used for deleting the batch normalization layer and merging the batch normalization scale and the offset parameter into the convolution kernel parameter of the convolution layer.

the merging unit is configured to perform merging processing on the parallel branches, and includes: the jump branch is deleted and 1 is added to the center position of the convolution kernel of the convolution layer in the linear structure branch.

and the merging unit is used for deleting one linear structure branch in the parallel branches and adding the weight vector in the convolution kernel of the deleted linear structure branch to the central position of the convolution kernel of the other linear structure.

In the foregoing solution, the image processing unit is configured to perform object identification, classification processing, image segmentation processing, or object detection/tracking processing on the image through the third convolutional neural network.

The embodiment of the invention provides computer equipment, which comprises a memory and a processor, wherein computer executable instructions are stored on the memory, and when the processor runs the computer executable instructions on the memory, the training method or the image processing method of the convolutional neural network provided by the embodiment of the invention is realized.

Embodiments of the present invention provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements a training method or an image processing method of a convolutional neural network provided in an embodiment of the present invention.

Embodiments of the present invention provide a computer program, which includes a computer instruction, and when the computer instruction runs in a processor of a device, the training method or the image processing method of the convolutional neural network provided in an embodiment of the present invention is implemented.

In the technical scheme of the embodiment of the invention, at least one serial branch and/or at least one parallel branch in the trained second convolutional neural network can be combined to obtain a third convolutional neural network; at least one nonlinear correction layer arranged behind the linear structural layer in the second convolutional neural network is deleted; inputting an image into the third convolutional neural network; and processing the image through the third convolutional neural network to obtain a processing result of the image. Therefore, compared with the second convolutional neural network, the third convolutional neural network combines the serial branches and/or the parallel branches, and the test time of the convolutional neural network is greatly shortened on the premise of not influencing the test precision as much as possible.

By adopting the technical scheme of the embodiment of the invention, the compression and acceleration of the convolutional neural network can be realized, and the running time and the required video memory of the network can be greatly reduced.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.

The present application may be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:

FIG. 1 is a schematic flow chart of a training method of a convolutional neural network according to an embodiment of the present invention;

FIG. 2(a) is a schematic diagram of a first residual unit before simplification;

FIG. 2(b) is a schematic diagram of the first residual unit after simplification;

FIG. 2(c) is a schematic diagram of a second residual unit before simplification;

FIG. 2(d) is a schematic diagram of the second residual unit after simplification;

FIG. 3 is a flowchart illustrating an image processing method according to an embodiment of the present invention;

FIG. 4(a) is a schematic diagram of the skip module before simplification;

FIG. 4(b) is a schematic diagram of the projection module before it is simplified;

FIG. 4(c) is a schematic diagram of the skip module and the projection module after simplification;

FIG. 5(a) is a schematic structural diagram of a second convolutional neural network;

FIG. 5(b) is a schematic structural diagram of a third convolutional neural network;

FIG. 6 is a schematic structural diagram of a training apparatus for convolutional neural network according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of an exemplary embodiment of an image processing apparatus;

FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

So that the manner in which the features and aspects of the embodiments of the present invention can be understood in detail, a more particular description of the embodiments of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the application, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

Embodiments of the present application may be implemented in electronic devices such as terminal devices, computer systems, and servers, which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, and servers, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.

Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, and data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

Fig. 1 is a schematic flow chart of a training method of a convolutional neural network according to an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:

Step 101: deleting at least one nonlinear correction layer arranged after the linear structure layer in the first convolutional neural network to obtain a second convolutional neural network.

In the embodiment of the present invention, the first convolutional neural network may be any residual network, the first convolutional neural network includes a residual unit, and the second convolutional neural network is obtained by deleting a part or all of the nonlinear correction layer after the linear structure layer in the residual unit in the first convolutional neural network.

In an embodiment of the present invention, the linear structure layer includes: convolutional layers or batch normalization layers.

Referring to fig. 2(a), 2(b), 2(c) and 2(d): fig. 2(a) and fig. 2(c) show two classical residual units with two types of structure, namely a bottleneck structure formed by a 1 × 1 convolution, a 3 × 3 convolution and a 1 × 1 convolution (fig. 2(a)), and a structure consisting of two 3 × 3 convolutions connected in series (fig. 2(c)). Relative to fig. 2(a), fig. 2(b) deletes the ReLU layers (i.e., the nonlinear correction layers) that follow the BN layers (i.e., the batch normalization layers); relative to fig. 2(c), fig. 2(d) deletes the ReLU layer that follows the BN layer. It should be noted that the embodiment of the present invention does not delete the ReLU layer after the sum layer. In other words, the embodiment deletes only part of the ReLU layers in the residual unit, namely those arranged after a linear structure layer.
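The deletion rule can be sketched on a flat list of layer names. This is only an illustration: the layer orderings below are assumed from the textual description of fig. 2(a) and fig. 2(c), and the helper name `strip_relu_after_linear` is hypothetical.

```python
# Layer-name prefixes treated as linear structure layers in this sketch.
LINEAR_LAYERS = ("conv", "bn")


def strip_relu_after_linear(layers):
    """Return a copy of `layers` without any "relu" that directly follows a conv or BN layer."""
    kept = []
    for name in layers:
        follows_linear = bool(kept) and kept[-1].startswith(LINEAR_LAYERS)
        if name == "relu" and follows_linear:
            continue  # drop the ReLU placed after a linear structure layer
        kept.append(name)
    return kept


# Assumed layer order of the bottleneck unit and of the two-3x3 unit.
bottleneck = ["conv1x1", "bn", "relu", "conv3x3", "bn", "relu",
              "conv1x1", "bn", "sum", "relu"]
basic = ["conv3x3", "bn", "relu", "conv3x3", "bn", "sum", "relu"]

print(strip_relu_after_linear(bottleneck))
print(strip_relu_after_linear(basic))
```

In both cases the ReLU after the `sum` entry survives, because `sum` is not treated as a linear structure layer here.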

Step 102: performing supervised training on the second convolutional neural network based on a training image and the labeling information of the training image.

Optionally, the training parameters of the second convolutional neural network are initialized, and the following steps are performed iteratively until the processing result of the training image converges: inputting a training image into the second convolutional neural network, calculating a processing result of the training image based on the training parameters, and updating the training parameters. Convergence of the processing result means that the difference between the processing result of the training image and the labeling information of the training image is less than or equal to a target threshold.
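As a toy illustration of this iterate-until-converged procedure, the sketch below fits a small linear model by gradient descent. The linear model, the mean squared difference, the learning rate and the threshold value are all assumptions standing in for the second convolutional neural network and its difference measure, which the text does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.normal(size=(64, 3))      # stand-in "training images" (feature vectors)
true_w = np.array([1.0, -2.0, 0.5])
labels = images @ true_w               # stand-in "labeling information"

w = np.zeros(3)                        # initialize the training parameters
threshold = 1e-3                       # assumed target threshold
lr = 0.1                               # assumed learning rate

for step in range(1000):
    result = images @ w                            # processing result for the training data
    diff = np.mean((result - labels) ** 2)         # difference between result and labels
    if diff <= threshold:                          # converged: stop iterating
        break
    grad = 2.0 * images.T @ (result - labels) / len(images)
    w -= lr * grad                                 # update the training parameters

print(step, diff)
```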

In the technical scheme of the embodiment of the invention, at least one nonlinear correction layer arranged behind a linear structure layer is deleted in a first convolutional neural network to obtain a second convolutional neural network; and performing supervised training on the second convolutional neural network based on a training image and the labeling information of the training image. Therefore, compared with the first convolutional neural network, the second convolutional neural network deletes the nonlinear correction layer, and greatly shortens the training time of the convolutional neural network on the premise of not influencing the training precision.

In a relatively classical deep convolutional neural network, each convolutional layer (Convolution) is followed in series by a Batch Normalization (BN) layer and a nonlinear correction (ReLU) layer. The BN layer mainly serves to address the "internal covariate shift" phenomenon, and the ReLU layer, as the nonlinear correction layer, mainly serves to keep the positive values in its input and set the negative values to 0.

The residual network (ResNet) is a deep convolutional neural network composed of residual units. The applicant has found through experiments that removing the ReLU layers in the residual units does not degrade the classification performance of the network and in fact improves its processing performance.

Based on the above experimental observation, the embodiment of the present invention provides a novel convolutional neural network (i.e., the second convolutional neural network), obtained by structurally deleting, from an old convolutional neural network (i.e., the first convolutional neural network), the nonlinear correction layers (i.e., ReLU layers) arranged after linear structure layers. The novel convolutional neural network simplifies network training and can achieve a target task processing effect similar to, or even better than, that of a residual network.

During network training the training parameters are continuously optimized, that is, they change dynamically. During network testing, by contrast, the parameters are fixed at their optimized values. The trained convolutional neural network can therefore be further simplified, for example by merging the serial branches and/or parallel branches in the trained network, finally yielding a network structure consisting only of 3 × 3 convolutional layers and ReLU layers. This shortens the network run time and greatly reduces the required video memory. Experiments show that the convolutional neural network provided by the embodiment of the invention achieves a 2-4 times test speedup and requires only 1/4 of the memory of the original residual network, so that it can run on platforms with low power consumption and low computing resources while keeping the processing precision completely consistent with that of the old network.

A large number of experiments show that after some ReLU layers in the residual unit are removed, the task processing precision does not decrease but increases, which demonstrates that connecting a ReLU layer after every convolutional layer or batch normalization layer, as in a traditional residual network, is unnecessary. The embodiment of the invention therefore appropriately reduces the use of ReLU layers in the network, obtaining a network structure with similar or better precision on tasks such as classification, image semantic segmentation and target detection.

Fig. 3 is a schematic flowchart of an image processing method according to an embodiment of the present invention, and as shown in fig. 3, the method includes the following steps:

Step 301: merging at least one parallel branch and/or at least one serial branch in the trained second convolutional neural network to obtain a third convolutional neural network, wherein at least one nonlinear correction layer arranged after a linear structure layer in the second convolutional neural network has been deleted.

In the embodiment of the present invention, the trained second convolutional neural network may be, but is not limited to being, obtained by training with the method shown in fig. 1; the network may be trained in a supervised or an unsupervised manner.

In one embodiment, some or all of the nonlinear correction layers arranged after the linear structure layers in the residual unit of the second convolutional neural network are deleted.

In one embodiment, the linear structure layer is a network layer having a linear processing function, and may include, but is not limited to, convolutional layers and pooling layers.

In the embodiment of the present invention, merging at least one serial branch and/or at least one parallel branch in the trained second convolutional neural network converts each merged branch into an equivalent conventional basic network layer. For example, the merging process turns a serial branch into a single linear structure layer with processing capability equivalent to that of the serial branch; as another example, it turns a parallel branch into a single linear structure layer with processing capability equivalent to that of the parallel branch; and so on. In this way, the more complex parts of the second convolutional neural network are converted into traditional network layers with equivalent processing capability. While affecting the test precision as little as possible, this increases the test speed, shortens the test time, and reduces the memory and other resources required to run the convolutional neural network.

Optionally, the merging process includes one or more of the following processing modes:

Mode 1: where the serial branch comprises at least two linear structure layers connected in series, one linear structure layer in the serial branch is merged into another linear structure layer.

Here, the two concatenated linear structure layers may have the same structure (e.g., both convolution layers, and the convolution kernels of different convolution layers may be the same or different), or may have different structures (e.g., one convolution layer and the other BN layer).

For example: the at least two serially connected linear structure layers comprise: two concatenated convolutional layers with different convolutional kernel sizes, in which case one convolutional layer is deleted and the parameters of the deleted convolutional layer are merged into the other convolutional layer. Optionally, the convolution kernel parameters of the deleted convolution layer are merged at the center of another convolution layer (for example, merging is performed by parameter matrix multiplication).
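A minimal NumPy sketch of one reading of this merge follows. It assumes a 1 × 1 convolution followed by a 3 × 3 convolution, no bias terms, stride 1 and zero "same" padding, and it takes the "parameter matrix multiplication" remark literally by multiplying the two kernels along the channel dimension. The helper `conv2d` and all variable names are hypothetical.

```python
import numpy as np


def conv2d(x, w):
    """Naive stride-1 convolution (cross-correlation) with zero 'same' padding.

    x: (C_in, H, W) feature map; w: (C_out, C_in, K, K) kernel with odd K.
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dk in range(k):
        for ds in range(k):
            patch = xp[:, dk:dk + h, ds:ds + wd]
            out += np.einsum("oi,ihw->ohw", w[:, :, dk, ds], patch)
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 6))
w1 = rng.normal(size=(5, 4, 1, 1))   # first layer: 1x1 convolution, 4 -> 5 channels
w2 = rng.normal(size=(3, 5, 3, 3))   # second layer: 3x3 convolution, 5 -> 3 channels

# Merge: for each kernel position, multiply the 3x3 weights by the 1x1 channel matrix.
merged = np.einsum("omks,mi->oiks", w2, w1[:, :, 0, 0])

two_layers = conv2d(conv2d(x, w1), w2)
one_layer = conv2d(x, merged)
print(np.allclose(two_layers, one_layer))
```

The merged kernel has shape (3, 4, 3, 3), so the intermediate 5-channel feature map is never materialized at test time.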

For example: the at least two serially connected linear structure layers comprise a convolutional layer and a batch normalization layer connected in series. In this case, the batch normalization layer is deleted, and the scale and offset parameters of the batch normalization are merged into the convolution kernel parameters of the convolutional layer. Assume the output of the convolutional layer is $z = W x + b$, where W and b are the weight matrix and the offset of the convolutional layer, respectively. Merging the batch normalization scale and offset parameters into the convolution kernel parameters of the convolutional layer gives

$$\hat{z} = c\,(W x + b) + d = (cW)\,x + (c\,b + d)$$

where c is related to the batch normalization scale and d is related to the batch normalization offset parameter.
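The batch normalization merge can be checked numerically with a short NumPy sketch. It assumes the usual inference-time form of batch normalization, with a learned scale `gamma`, a learned shift `beta`, a stored mean and variance, and a small `eps`; none of these names come from the text. The factors `c` and `d` are then one concrete choice consistent with "related to the scale" and "related to the offset". The helpers `conv2d`, `batch_norm` and `fold_bn` are hypothetical.

```python
import numpy as np


def conv2d(x, w, b):
    """Naive stride-1 convolution with zero 'same' padding and per-channel bias b."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dk in range(k):
        for ds in range(k):
            out += np.einsum("oi,ihw->ohw", w[:, :, dk, ds], xp[:, dk:dk + h, ds:ds + wd])
    return out + b[:, None, None]


def batch_norm(z, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization with fixed statistics."""
    scale = gamma / np.sqrt(var + eps)
    return scale[:, None, None] * (z - mean[:, None, None]) + beta[:, None, None]


def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the preceding convolution: W' = c W, b' = c b + d."""
    c = gamma / np.sqrt(var + eps)   # per-output-channel scale factor
    d = beta - c * mean              # per-output-channel offset
    return w * c[:, None, None, None], c * b + d


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 6))
w, b = rng.normal(size=(3, 4, 3, 3)), rng.normal(size=3)
gamma, beta = rng.normal(size=3), rng.normal(size=3)
mean, var = rng.normal(size=3), rng.uniform(0.5, 2.0, size=3)

w_folded, b_folded = fold_bn(w, b, gamma, beta, mean, var)
with_bn = batch_norm(conv2d(x, w, b), gamma, beta, mean, var)
folded = conv2d(x, w_folded, b_folded)
print(np.allclose(with_bn, folded))
```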

Mode 2: where the parallel branch comprises a linear structure branch and a jump (skip) branch in parallel, the jump branch is deleted and 1 is added to the center position of the convolution kernel of the convolutional layer in the linear structure branch.

Assume that the weight matrix of the convolutional layer in the linear structure branch is $W_{i,j,k,s}$. Adding 1 to the center position of the convolution kernel of the convolutional layer in the linear structure branch gives

$$W'_{i,j,k,s} = \begin{cases} W_{i,j,k,s} + 1, & i = j \text{ and } (k, s) \text{ is the center position} \\ W_{i,j,k,s}, & \text{otherwise} \end{cases}$$

where i denotes the i-th weight filter, j denotes the j-th channel of the i-th weight filter, and (k, s) denotes the weight position.
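The following NumPy sketch checks that adding 1 to the kernel center of channel i in filter i reproduces "convolution plus identity skip". It assumes equal input and output channel counts, stride 1, zero "same" padding and no bias. The helpers `conv2d` and `merge_skip` are hypothetical names.

```python
import numpy as np


def conv2d(x, w):
    """Naive stride-1 convolution with zero 'same' padding; w: (C_out, C_in, K, K)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dk in range(k):
        for ds in range(k):
            out += np.einsum("oi,ihw->ohw", w[:, :, dk, ds], xp[:, dk:dk + h, ds:ds + wd])
    return out


def merge_skip(w):
    """Absorb an identity skip branch by adding 1 at the center of channel i of filter i."""
    c_out, c_in, k, _ = w.shape
    assert c_out == c_in, "an identity skip needs matching channel counts"
    merged = w.copy()
    idx = np.arange(c_out)
    merged[idx, idx, k // 2, k // 2] += 1.0
    return merged


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 6))
w = rng.normal(size=(4, 4, 3, 3))

with_skip = conv2d(x, w) + x            # parallel structure: 3x3 conv branch + skip branch
single_conv = conv2d(x, merge_skip(w))  # merged structure: one 3x3 conv
print(np.allclose(with_skip, single_conv))
```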

Mode 3: where the parallel branch comprises two linear structure branches in parallel, one of the linear structure branches is deleted, and the weight vector in the convolution kernel of the deleted branch is added to the center position of the convolution kernel of the other linear structure branch.

Adding the weight vector in the convolution kernel of the deleted linear structure branch to the center position of the convolution kernel of the convolutional layer in the remaining linear structure branch gives

$$W'_{i,j,k,s} = \begin{cases} W_{i,j,k,s} + \hat{W}_{i,j}, & (k, s) \text{ is the center position} \\ W_{i,j,k,s}, & \text{otherwise} \end{cases}$$

where i denotes the i-th weight filter, j denotes the j-th channel of the i-th weight filter, (k, s) denotes the weight position, and $\hat{W}$ denotes the weight matrix of the convolutional layer of the other (deleted) linear structure branch.
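A short NumPy sketch of this merge for a 3 × 3 branch in parallel with a 1 × 1 branch follows. It assumes that the two branch outputs are fused by element-wise addition and that there are no bias terms, with stride 1 and zero "same" padding. The helpers `conv2d` and `merge_parallel` are hypothetical names.

```python
import numpy as np


def conv2d(x, w):
    """Naive stride-1 convolution with zero 'same' padding; w: (C_out, C_in, K, K)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dk in range(k):
        for ds in range(k):
            out += np.einsum("oi,ihw->ohw", w[:, :, dk, ds], xp[:, dk:dk + h, ds:ds + wd])
    return out


def merge_parallel(w3, w1):
    """Add the 1x1 branch weights to the center position of the 3x3 branch kernel."""
    merged = w3.copy()
    center = w3.shape[2] // 2
    merged[:, :, center, center] += w1[:, :, 0, 0]
    return merged


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 6))
w3 = rng.normal(size=(5, 4, 3, 3))   # 3x3 branch
w1 = rng.normal(size=(5, 4, 1, 1))   # 1x1 branch (the one that is deleted)

two_branches = conv2d(x, w3) + conv2d(x, w1)
one_branch = conv2d(x, merge_parallel(w3, w1))
print(np.allclose(two_branches, one_branch))
```

Unlike the skip merge, every (i, j) pair receives an update here, because the 1 × 1 branch mixes all input channels.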

In an alternative implementation, the parallel branches included in the second convolutional neural network may include at least one of a projection module (projection-module) and a skip module (skip-module), such as the skip module shown in fig. 4(a) and the projection module shown in fig. 4(b). The skip module comprises a 3 × 3 convolutional layer, with the input and the output directly connected through a skip layer. The projection module applies a 3 × 3 convolutional layer and a 1 × 1 convolutional layer to the input feature map, respectively, and fuses the two results as the output of the module. The modules shown in fig. 4(a) and fig. 4(b) can be merged by the solution of the embodiment of the present invention to obtain the network structure module shown in fig. 4(c). This module resembles a conventional network structure, i.e., it contains no branch or residual structure, which greatly reduces network test time and the required video memory.

Step 302: inputting an image into the third convolutional neural network; and processing the image through the third convolutional neural network to obtain a processing result of the image.

In one embodiment, the image is subject to object recognition, classification processing, image segmentation processing, or object detection/tracking processing via the third convolutional neural network. Here, the object recognition of the image may be recognition of any type of object such as a person or an object.

Fig. 5(a) is a schematic structural diagram of a second convolutional neural network, which is obtained by deleting a ReLU layer following a convolutional layer or a BN layer in a first convolutional neural network, as shown in fig. 5(a), and includes a plurality of residual units, in each of which only the ReLU layer following a sum layer is reserved.

Fig. 5(b) is a schematic structural diagram of a third convolutional neural network, and as shown in fig. 5(b), the third convolutional neural network is obtained by combining serial branches and parallel branches of the second convolutional neural network, and the obtained third convolutional neural network has a simplified structure.

It should be understood that the number of layers of each type described in the embodiment of the present invention is not limited. The technical solution of the embodiment of the present invention is further described in detail below with reference to fig. 4(a), fig. 4(b), and fig. 4(c).

(1) Merging a BN layer into a convolutional layer preceding it

In general, a BN layer includes two steps, namely a normalization operation and a scaling/translation operation. During training, the parameters are updated by calculating the mean and variance of the training data in each iteration. During prediction, these statistics no longer change with the data, i.e., they are fixed, so the operation of the BN layer can be regarded as a linear operation. It is also known that the operation of a convolutional layer can be regarded as a linear operation. Since two linear operations can be combined into one linear operation, the BN layer can be merged into the convolutional layer in the test phase. This saves the running time and video memory that the network would otherwise need to perform the BN operations. For the skip module shown in fig. 4(a) and the projection module shown in fig. 4(b), the BN layers may therefore be deleted.
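The merging of a BN layer into the preceding convolutional layer can be checked numerically. The following is a minimal NumPy sketch under stated assumptions, not the implementation of the embodiment: the helper names `conv2d` and `fold_bn`, the tensor shapes, stride 1 with zero padding, and the value of `eps` are all chosen for illustration.

```python
import numpy as np

def conv2d(x, w, b=None):
    """Stride-1, zero-padded 'same' convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,) or None."""
    k = w.shape[-1]
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    y = np.zeros((w.shape[0], h, wd))
    for dy in range(k):
        for dx in range(k):
            # Contribution of kernel tap (dy, dx) over all channels.
            y += np.einsum("oi,ihw->ohw", w[:, :, dy, dx],
                           xp[:, dy:dy + h, dx:dx + wd])
    if b is not None:
        y += b[:, None, None]
    return y

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold fixed (test-time) BN statistics into conv weights and bias."""
    scale = gamma / np.sqrt(var + eps)          # one factor per output channel
    return w * scale[:, None, None, None], (b - mean) * scale + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 6))
w, b = rng.normal(size=(5, 4, 3, 3)), rng.normal(size=5)
gamma, beta = rng.normal(size=5), rng.normal(size=5)
mean, var = rng.normal(size=5), rng.uniform(0.5, 2.0, size=5)

# Reference path: convolution followed by BN with fixed statistics.
z = conv2d(x, w, b)
y_ref = (gamma[:, None, None] * (z - mean[:, None, None])
         / np.sqrt(var[:, None, None] + 1e-5) + beta[:, None, None])

# Merged path: a single convolution with the folded parameters.
w_f, b_f = fold_bn(w, b, gamma, beta, mean, var)
y_fold = conv2d(x, w_f, b_f)
print(np.allclose(y_ref, y_fold))
```

The folded layer has the same weight shape as the original convolution, so the BN layer costs nothing at test time.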

(2) Merging parallel branches

1) Parallel branch merging for skip modules

After the BN layer is merged into the convolution layer, the skip module is reduced to a skip operation and a 3 x 3 convolution operation. These are two parallel operations, so the corresponding network structure is a parallel branch structure, and the skip operation can be merged into the 3 x 3 convolution operation. The basic operation is as follows: 1 is added to the center position of the i-th channel of the i-th convolution kernel. As shown in fig. 4(c), the skip module can thus be merged into a basic unit of a conventional network. One branch and the information fusion operation between different branches are omitted, so that the network running time and the required video memory are greatly reduced.
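The skip-branch merge can be illustrated with a short NumPy sketch. It assumes equal input and output channel counts, stride 1, and zero padding; the helpers `conv2d` and `merge_skip` are hypothetical names used only for this example.

```python
import numpy as np

def conv2d(x, w, b=None):
    """Stride-1, zero-padded 'same' convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,) or None."""
    k = w.shape[-1]
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    y = np.zeros((w.shape[0], h, wd))
    for dy in range(k):
        for dx in range(k):
            y += np.einsum("oi,ihw->ohw", w[:, :, dy, dx],
                           xp[:, dy:dy + h, dx:dx + wd])
    if b is not None:
        y += b[:, None, None]
    return y

def merge_skip(w):
    """Add 1 to the center of the i-th channel of the i-th kernel."""
    c_out, c_in, k, _ = w.shape
    assert c_out == c_in, "an identity skip needs matching channel counts"
    merged = w.copy()
    idx = np.arange(c_out)
    merged[idx, idx, k // 2, k // 2] += 1.0
    return merged

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 6))
w, b = rng.normal(size=(4, 4, 3, 3)), rng.normal(size=4)

y_two_branches = conv2d(x, w, b) + x        # 3 x 3 branch plus skip branch
y_merged = conv2d(x, merge_skip(w), b)      # single 3 x 3 convolution
print(np.allclose(y_two_branches, y_merged))
```

The added 1 makes the center tap of kernel i copy input channel i to output channel i, which is exactly what the skip branch contributed.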

2) Parallel branch merging for projection modules

After the BN layer is merged into the convolution layer, the projection module is reduced to two convolution operations, namely a 1 × 1 convolution operation and a 3 × 3 convolution operation. These are two parallel operations, so the corresponding network structure is a parallel branch structure, and the 1 × 1 convolution operation can also be merged into the 3 × 3 convolution operation. The basic operation is as follows: the weight vector of the i-th 1 × 1 convolution kernel is added to the center position of the i-th 3 × 3 convolution kernel. As shown in fig. 4(c), the projection module can thus also be merged into a basic unit of a conventional network. One branch and the information fusion operation between different branches are omitted, thereby greatly reducing network running time and required video memory.
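A minimal NumPy sketch of the projection-module merge follows. It assumes that the two branches are fused by element-wise addition, that both use stride 1 and zero padding, and that each branch may carry a bias; the helpers `conv2d` and `merge_projection` are hypothetical names for illustration.

```python
import numpy as np

def conv2d(x, w, b=None):
    """Stride-1, zero-padded 'same' convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,) or None."""
    k = w.shape[-1]
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    y = np.zeros((w.shape[0], h, wd))
    for dy in range(k):
        for dx in range(k):
            y += np.einsum("oi,ihw->ohw", w[:, :, dy, dx],
                           xp[:, dy:dy + h, dx:dx + wd])
    if b is not None:
        y += b[:, None, None]
    return y

def merge_projection(w3, b3, w1, b1):
    """Add each 1x1 kernel's weight vector to the center of the matching 3x3 kernel."""
    merged = w3.copy()
    c = w3.shape[-1] // 2
    merged[:, :, c, c] += w1[:, :, 0, 0]
    return merged, b3 + b1                   # biases of summed branches add up

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 6))
w3, b3 = rng.normal(size=(5, 4, 3, 3)), rng.normal(size=5)
w1, b1 = rng.normal(size=(5, 4, 1, 1)), rng.normal(size=5)

y_two_branches = conv2d(x, w3, b3) + conv2d(x, w1, b1)
w_m, b_m = merge_projection(w3, b3, w1, b1)
y_merged = conv2d(x, w_m, b_m)
print(np.allclose(y_two_branches, y_merged))
```

Unlike the skip case, the two branches may have different input and output channel counts, as long as both branches agree on them.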

Through the above operations, the technical scheme of the embodiment of the invention has the following two advantages: (1) the network structure of the embodiment of the invention comprises skip modules and projection modules similar to those of a residual network; introducing these modules can overcome the vanishing gradient problem that arises when training a conventional neural network, and allows effective gradient backpropagation and network training; (2) the network structure of the embodiment of the invention adopts a residual-like structure during training and a structure similar to a conventional network during testing, thereby saving the running time and video memory required by the BN layers and the multi-branch structure and significantly increasing the network running speed.

The technical scheme of the embodiment of the invention has the following uses: 1) it can be used for research on network acceleration and memory saving; 2) it enables a deep convolutional neural network to run on platforms with low power consumption and limited computing resources, such as a Field Programmable Gate Array (FPGA), PX2, and TX1; 3) it is not limited to the image classification task and can also be extended to image semantic segmentation and target detection tasks; 4) the idea of network merging can be used for further research on network acceleration and model compression.

Fig. 6 is a schematic structural composition diagram of a training apparatus of a convolutional neural network according to an embodiment of the present invention, and as shown in fig. 6, the training apparatus of the convolutional neural network includes:

a deleting unit 601, configured to delete at least one nonlinear correction layer arranged behind the linear structure layer in the first convolutional neural network, so as to obtain a second convolutional neural network;

a training unit 602, configured to perform supervised training on the second convolutional neural network based on a training image and the labeling information of the training image.

In one embodiment, the first convolutional neural network comprises a residual unit;

the deleting unit 601 is configured to delete, in the first convolutional neural network, part or all of the nonlinear correction layers after the linear structure layer in the residual unit, so as to obtain a second convolutional neural network.

In one embodiment, the linear structure layer includes: convolutional layers or batch normalization layers.

In an embodiment, the training unit 602 is configured to initialize the training parameters of the second convolutional neural network and iteratively perform the following steps until the processing result of the training image converges: inputting a training image into the second convolutional neural network, calculating a processing result of the training image based on the training parameters, and updating the training parameters. Convergence of the processing result of the training image means that the difference between the processing result of the training image and the labeling information of the training image is less than or equal to a target threshold.
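The iteration and stopping criterion can be sketched as follows. This is a toy illustration only: a linear model stands in for the second convolutional neural network, mean squared error is assumed as the "difference" measure, plain gradient descent is assumed as the update rule, and all variable names and numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(64, 8))       # stand-in for training images
labels = features @ rng.normal(size=8)    # stand-in for labeling information

params = np.zeros(8)                      # initialize the training parameters
target_threshold = 1e-6                   # assumed target threshold
learning_rate = 0.1                       # assumed step size

for step in range(10_000):
    result = features @ params                      # processing result
    difference = np.mean((result - labels) ** 2)    # difference from labels
    if difference <= target_threshold:              # convergence criterion
        break
    grad = 2.0 * features.T @ (result - labels) / len(labels)
    params -= learning_rate * grad                  # update the parameters

print(step, difference)
```

In a real setting the forward pass, loss, and update would come from the network and optimizer actually used; only the loop structure and the threshold test mirror the description.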

It will be understood by those skilled in the art that the implementation functions of each unit in the training device of the convolutional neural network shown in fig. 6 can be understood by referring to the related description of the training method of the convolutional neural network. The functions of the units in the training apparatus of the convolutional neural network shown in fig. 6 can be realized by a program running on a processor, and can also be realized by a specific logic circuit.

Fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention, and as shown in fig. 7, the image processing apparatus includes:

a merging unit 701, configured to merge at least one parallel branch and/or at least one serial branch in the trained second convolutional neural network to obtain a third convolutional neural network; at least one nonlinear correction layer arranged behind the linear structural layer in the second convolutional neural network is deleted;

an image processing unit 702 for inputting an image into the third convolutional neural network; and processing the image through the third convolutional neural network to obtain a processing result of the image.

In one embodiment, the serial branch comprises at least two serially connected linear structure layers;

the merging unit 701 is configured to merge one linear structure layer in the serial branches into another linear structure layer.

In one embodiment, the at least two serially connected linear structure layers include: two convolution layers with different sizes of the convolution kernels connected in series;

the merging unit 701 is configured to delete one convolution layer and merge parameters of the deleted convolution layer into another convolution layer.
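One way to merge two serial convolution layers with different kernel sizes is sketched below for a specific case that is assumed here for illustration: a 1 x 1 convolution followed by a 3 x 3 convolution, both without bias, with stride 1 and zero padding. The helpers `conv2d` and `merge_1x1_then_kxk` are hypothetical names, and other kernel-size combinations are not covered by this sketch.

```python
import numpy as np

def conv2d(x, w, b=None):
    """Stride-1, zero-padded 'same' convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,) or None."""
    k = w.shape[-1]
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    y = np.zeros((w.shape[0], h, wd))
    for dy in range(k):
        for dx in range(k):
            y += np.einsum("oi,ihw->ohw", w[:, :, dy, dx],
                           xp[:, dy:dy + h, dx:dx + wd])
    if b is not None:
        y += b[:, None, None]
    return y

def merge_1x1_then_kxk(w1, wk):
    """Compose a bias-free 1x1 conv (C_mid, C_in, 1, 1) with a following
    k x k conv (C_out, C_mid, k, k) into one k x k conv (C_out, C_in, k, k)."""
    return np.einsum("omyx,mi->oiyx", wk, w1[:, :, 0, 0])

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 6))
w1 = rng.normal(size=(3, 4, 1, 1))        # first layer: 1 x 1 convolution
w3 = rng.normal(size=(5, 3, 3, 3))        # second layer: 3 x 3 convolution

y_serial = conv2d(conv2d(x, w1), w3)      # two layers applied in sequence
w_m = merge_1x1_then_kxk(w1, w3)
y_merged = conv2d(x, w_m)                 # one merged 3 x 3 convolution
print(np.allclose(y_serial, y_merged))
```

The bias-free assumption matters: a bias in the first layer would interact with the zero padding of the second layer at the image border, so the merged layer would no longer match exactly there.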

In one embodiment, the at least two serially connected linear structure layers include: a convolutional layer and a batch normalization layer connected in series;

the merging unit 701 is configured to delete the batch normalization layer and merge the batch normalization scale and the offset parameter into the convolution kernel parameter of the convolutional layer.

In one embodiment, the parallel branch comprises: a parallel linear structure branch and a jump branch;

the merging unit 701 is configured to merge the parallel branches, including: deleting the jump branch and adding 1 to the center position of the convolution kernel of the convolution layer in the linear structure branch.

In one embodiment, the parallel branch comprises: two parallel linear structure branches;

the merging unit 701 is configured to delete one of the linear structure branches and add the weight vector in the convolution kernel of the deleted linear structure branch to the center position of the convolution kernel of the other linear structure branch.

In an embodiment, the image processing unit 702 is configured to perform object recognition, classification processing, image segmentation processing, or object detection/tracking processing on the image via the third convolutional neural network.

It will be understood by those skilled in the art that the implementation functions of the units in the image processing apparatus shown in fig. 7 can be understood with reference to the related description of the aforementioned image processing method. The functions of the units in the image processing apparatus shown in fig. 7 may be realized by a program running on a processor, or may be realized by specific logic circuits.

The device according to the embodiment of the present invention may also be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as an independent product. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.

Accordingly, the embodiment of the present invention also provides a computer storage medium, in which computer executable instructions are stored, and when the computer executable instructions are executed by a processor, the computer executable instructions implement the above-mentioned method of the embodiment of the present invention.

Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 8, the computer device 100 may include one or more processors 1002 (only one is shown in the figure), a memory 1004 for storing data, and a transmission device 1006 for a communication function. The processor 1002 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA). It will be understood by those skilled in the art that the structure shown in fig. 8 is only an illustration and is not intended to limit the structure of the computer device. For example, computer device 100 may also include more or fewer components than shown in FIG. 8, or have a different configuration than shown in FIG. 8.

The memory 1004 can be used for storing software programs and modules of application software, such as program instructions/modules corresponding to the method in the embodiment of the present invention, and the processor 1002 executes various functional applications and data processing by running the software programs and modules stored in the memory 1004, so as to implement the method described above. The memory 1004 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1004 may further include memory located remotely from the processor 1002, which may be connected to the computer device 100 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 1006 is used for receiving or sending data via a network. Specific examples of such a network may include a wireless network provided by the communications provider of the computer device 100. In one example, the transmission device 1006 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 1006 can be a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.

The technical schemes described in the embodiments of the present invention can be combined arbitrarily without conflict.

In the embodiments provided in the present invention, it should be understood that the disclosed method and intelligent device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical, or in other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in the form of hardware, or in the form of hardware plus a software functional unit.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims

1. A method of training a convolutional neural network, the method comprising:

inputting the training image into a second convolutional neural network, wherein the second convolutional neural network is obtained by the following processing of the first convolutional neural network: deleting at least one nonlinear correction layer arranged behind a linear structure layer in a first convolutional neural network, and maintaining other structures in the first convolutional neural network unchanged;

processing the training image by a predetermined task through the second convolutional neural network to obtain a processing result; the predetermined task includes at least one of: image classification, image semantic segmentation and target detection;

updating the training parameters of the second convolutional neural network based on the processing result and the labeling information of the training image until the processing result of the training image is converged; the convergence of the processing result of the training image means that the difference value between the processing result of the training image and the labeling information of the training image is less than or equal to a target threshold value.

2. The method of claim 1, wherein the first convolutional neural network comprises a residual unit;

3. The method of claim 1 or 2, wherein the linear structure layer comprises: convolutional layers or batch normalization layers.

4. An image processing method, characterized in that the method comprises:

inputting the image into a third convolutional neural network, wherein the third convolutional neural network is obtained by the second convolutional neural network through the following processing: merging at least one parallel branch and/or at least one serial branch in the trained second convolutional neural network; at least one nonlinear correction layer arranged behind the linear structural layer in the second convolutional neural network is deleted;

processing the image through the third convolutional neural network to obtain a processing result of the image, wherein the predetermined task comprises at least one of the following steps: image classification, image semantic segmentation and target detection.

5. The method of claim 4, wherein the serial branch comprises at least two serially connected linear structure layers;

6. The method of claim 5, wherein the at least two serially connected linear structure layers comprise: two convolution layers with different sizes of the convolution kernels connected in series;

7. The method of claim 5, wherein the at least two serially connected linear structure layers comprise: a convolutional layer and a batch normalization layer connected in series;

8. The method according to any of claims 4 to 7, wherein the parallel branching comprises: a parallel linear structure branch and a jump branch;

9. The method according to any of claims 4-7, wherein the parallel branching comprises: two parallel linear structure branches;

10. The method of any of claims 4-7, wherein said performing a predetermined task on said image via said third convolutional neural network comprises:

11. The method according to any one of claims 4 to 7, wherein part or all of the nonlinear correction layer disposed behind the linear structure layer in the residual unit of the second convolutional neural network is deleted.

12. The method of any of claims 4-7, wherein the linear structural layer comprises: convolutional layers or batch normalization layers.

13. An apparatus for training a convolutional neural network, the apparatus comprising:

the training unit is used for inputting a training image into a second convolutional neural network, and the second convolutional neural network is obtained by the following processing of the first convolutional neural network: deleting at least one nonlinear correction layer arranged behind a linear structure layer in a first convolutional neural network, and maintaining other structures in the first convolutional neural network unchanged; processing the training image by a predetermined task through the second convolutional neural network to obtain a processing result; the predetermined task includes at least one of: image classification, image semantic segmentation and target detection; updating the training parameters of the second convolutional neural network based on the processing result and the labeling information of the training image until the processing result of the training image is converged; the convergence of the processing result of the training image means that the difference value between the processing result of the training image and the labeling information of the training image is less than or equal to a target threshold value.

14. The apparatus of claim 13, wherein the first convolutional neural network comprises a residual unit; the device further comprises:

15. The apparatus of claim 13 or 14, wherein the linear structure layer comprises: convolutional layers or batch normalization layers.

16. An image processing apparatus, characterized in that the apparatus comprises:

the image processing unit is used for inputting an image into a third convolutional neural network, and the third convolutional neural network is obtained by the following processing of the second convolutional neural network: merging at least one parallel branch and/or at least one serial branch in the trained second convolutional neural network; at least one nonlinear correction layer arranged behind the linear structural layer in the second convolutional neural network is deleted; processing the image through the third convolutional neural network to obtain a processing result of the image, wherein the predetermined task comprises at least one of the following steps: image classification, image semantic segmentation and target detection.

17. The apparatus of claim 16, wherein the serial branch comprises at least two serially connected linear structure layers; the device further comprises:

a merging unit for merging one linear structure layer in the serial branch into another linear structure layer.

18. The apparatus of claim 17, wherein the at least two serially connected linear structure layers comprise: two convolution layers with different sizes of the convolution kernels connected in series;

19. The apparatus of claim 17, wherein the at least two serially connected linear structure layers comprise: a convolutional layer and a batch normalization layer connected in series;

20. The apparatus according to any of claims 16 to 19, wherein the parallel branches comprise: a parallel linear structure branch and a jump branch; the device further comprises:

the merging unit is used for merging the parallel branches, and comprises: the jump branch is deleted and 1 is added to the center position of the convolution kernel of the convolution layer in the linear structure branch.

21. The apparatus according to any of claims 16 to 19, wherein the parallel branches comprise: two parallel linear structure branches; the device further comprises:

22. The apparatus according to any one of claims 16 to 19, wherein the image processing unit is configured to perform object recognition, classification processing, image segmentation processing, or object detection/tracking processing on the image via the third convolutional neural network.

23. The apparatus of any one of claims 16 to 19, wherein part or all of the nonlinear correction layer disposed behind the linear structure layer in the residual unit of the second convolutional neural network is deleted.

24. The apparatus of any one of claims 16 to 19, wherein the linear structural layer comprises: convolutional layers or batch normalization layers.

25. A computer device comprising a memory having computer-executable instructions stored thereon and a processor that, when executing the computer-executable instructions on the memory, performs the method steps of any one of claims 1 to 3 or the method steps of any one of claims 4 to 12.

26. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method of any one of claims 1 to 3, or the method of any one of claims 4 to 12.