CN110852425A - Optimization-based neural network processing method and device and electronic system - Google Patents

Optimization-based neural network processing method and device and electronic system

Info

Publication number
CN110852425A
Authority
CN
China
Prior art keywords
training
convolution
convolutional
loss
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911124704.9A
Other languages
Chinese (zh)
Inventor
李运
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201911124704.9A priority Critical patent/CN110852425A/en
Publication of CN110852425A publication Critical patent/CN110852425A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an optimization-based neural network processing method, device, and electronic system; wherein the method comprises the following steps: acquiring the convolution kernels used by each convolutional layer in each training of the convolutional neural network, and obtaining the cross entropy loss and delay loss corresponding to each training according to the convolution kernels used by each convolutional layer, wherein the cross entropy loss and delay loss correspond to the convolutional layer identification vector of that training; calculating the overall loss value corresponding to each training based on a preset loss function and the cross entropy loss and delay loss corresponding to the convolutional layer identification vector; and screening out the convolution kernels finally used by each convolutional layer according to the overall loss value corresponding to each training. In this method, the overall loss value of the convolutional neural network comprehensively considers the cross entropy loss and delay loss corresponding to each training, so that dual-objective optimization is performed on the accuracy and delay of the convolutional neural network, and a convolutional neural network that meets the delay requirement while maintaining high accuracy can be screened out.

Description

Optimization-based neural network processing method and device and electronic system
Technical Field
The invention relates to the technical field of convolutional neural networks, and in particular to an optimization-based neural network processing method, device, and electronic system.
Background
With the rapid development of convolutional neural networks, the parameters of convolutional neural network models have generally grown to millions or even billions, and the space they occupy exceeds the storage capacity of many current mobile terminals. As a result, convolutional neural network models place extremely high demands on computing and storage equipment, exceed the operational limits of many current mobile terminal devices, and their application on mobile terminal devices is therefore limited.
In the related art, network pruning is generally used to remove redundancy in the model without affecting the accuracy of the convolutional neural network model, so as to compress the parameter count and computation cost of the model. However, most network pruning methods only consider parameter count and computation cost, so the pruned convolutional neural network still suffers from high delay.
Disclosure of Invention
In view of the above, the present invention provides an optimization-based neural network processing method, device, and electronic system, so as to optimize a convolutional neural network and screen out a convolutional neural network that meets the delay requirement while maintaining high accuracy.
In a first aspect, an embodiment of the present invention provides an optimization-based neural network processing method, where the method includes: acquiring the convolution kernels used by each convolutional layer in each training of the convolutional neural network, and obtaining the cross entropy loss and delay loss corresponding to each training according to the convolution kernels used by each convolutional layer, wherein the cross entropy loss and delay loss correspond to the convolutional layer identification vector of that training, and each element in the convolutional layer identification vector is the number of convolution kernels used by the corresponding convolutional layer; calculating the overall loss value corresponding to each training based on a preset loss function and the cross entropy loss and delay loss corresponding to the convolutional layer identification vector; screening out the convolution kernels finally used by each convolutional layer according to the overall loss value corresponding to each training to obtain an optimized convolutional neural network; and inputting an image to be detected into the convolutional neural network for processing to obtain an image recognition result.
In a preferred embodiment of the present invention, the step of acquiring the convolution kernels used by each convolutional layer in each training of the convolutional neural network, and obtaining the cross entropy loss and delay loss corresponding to each training according to the convolution kernels used by each convolutional layer, wherein the cross entropy loss and delay loss correspond to the convolutional layer identification vector of that training, includes: acquiring the identifiers of the convolution kernels used by each convolutional layer in each training of the convolutional neural network; determining the cross entropy loss corresponding to each training based on the convolution kernels used by each convolutional layer in each training; determining the convolutional layer identification vector of each training based on the identifiers of the convolution kernels used by each convolutional layer in each training; and determining the delay loss corresponding to each training based on the convolutional layer identification vector of each training.
In a preferred embodiment of the present invention, the step of acquiring the identifiers of the convolution kernels used by each convolutional layer in each training of the convolutional neural network includes: in each training, traversing the convolutional layers of the convolutional neural network, taking each traversed convolutional layer as a target convolutional layer, and performing the following operations for each target convolutional layer: acquiring the weight parameters of each convolution kernel in the target convolutional layer for the current training; inputting the weight parameters of each convolution kernel into a preset convolution kernel screening module, wherein the convolution kernel screening module comprises a fully connected layer and a binarization activation function; scoring each convolution kernel through the fully connected layer, binarizing the score through the binarization activation function, and outputting the corresponding identifier of the convolution kernel, wherein the identifier of a convolution kernel is 0 or 1, and a convolution kernel labeled 1 is a convolution kernel used by the target convolutional layer.
In a preferred embodiment of the present invention, the step of determining the convolutional layer identification vector of each training based on the identifiers of the convolution kernels used by each convolutional layer in each training includes: performing the following steps for each training: counting the number of convolution kernels used by each convolutional layer in the current training based on the identifiers of the convolution kernels used by each convolutional layer in the current training; taking the number of convolution kernels used by each convolutional layer in the current training as the vector element corresponding to that convolutional layer in the current training; and, according to the arrangement order of the convolutional layers in the convolutional neural network, forming the vector elements corresponding to the convolutional layers in the current training into the convolutional layer identification vector of the current training.
In a preferred embodiment of the present invention, the step of determining the delay loss corresponding to each training based on the convolutional layer identification vector of each training includes: inputting the convolutional layer identification vector of each training into a pre-trained delay prediction module, and outputting the delay loss corresponding to each convolutional layer identification vector; the delay prediction module is trained based on a plurality of uniformly sampled convolutional layer identification vectors carrying delay loss labels.
In a preferred embodiment of the present invention, the step of calculating the overall loss value corresponding to each training based on the preset loss function and the cross entropy loss and delay loss corresponding to the convolutional layer identification vector includes: calculating the overall loss value corresponding to each training through the following loss function: Loss = CrossEntropyLoss + k*log(1+latency); wherein Loss is the overall loss value corresponding to each training; CrossEntropyLoss is the cross entropy loss corresponding to the convolutional layer identification vector; latency is the delay loss corresponding to the convolutional layer identification vector; and k is a preset delay weight.
In a preferred embodiment of the present invention, the step of screening out the convolution kernels finally used by each convolutional layer according to the overall loss value corresponding to each training includes: taking the convolution kernels used by each convolutional layer in the training corresponding to the converged overall loss value as the finally used convolution kernels.
In a second aspect, an embodiment of the present invention further provides an optimization-based neural network processing apparatus, where the apparatus includes: a cross entropy loss and delay loss acquisition module, configured to acquire the convolution kernels used by each convolutional layer in each training of the convolutional neural network and obtain the cross entropy loss and delay loss corresponding to each training according to the convolution kernels used by each convolutional layer, wherein the cross entropy loss and delay loss correspond to the convolutional layer identification vector of that training, and each element in the convolutional layer identification vector is the number of convolution kernels used by the corresponding convolutional layer; an overall loss value calculation module, configured to calculate the overall loss value corresponding to each training based on a preset loss function and the cross entropy loss and delay loss corresponding to the convolutional layer identification vector; a convolution kernel screening module, configured to screen out the convolution kernels finally used by each convolutional layer according to the overall loss value corresponding to each training, so as to obtain an optimized convolutional neural network; and an image recognition result determining module, configured to input an image to be detected into the convolutional neural network for processing to obtain an image recognition result.
In a third aspect, an embodiment of the present invention further provides an electronic system, where the electronic system includes: the device comprises data acquisition equipment, processing equipment and a storage device; the data acquisition equipment is used for acquiring the convolutional neural network; the storage means has stored thereon a computer program which, when run by a processing device, performs the optimization-based neural network processing method as described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processing device, performs the steps of the optimization-based neural network processing method described above.
The embodiment of the invention has the following beneficial effects:
the embodiments of the invention provide an optimization-based neural network processing method, device, and electronic system. In this method, the overall loss value of the convolutional neural network comprehensively considers the cross entropy loss and delay loss corresponding to each training, so that dual-objective optimization is performed on the accuracy and delay of the convolutional neural network, and a convolutional neural network that meets the delay requirement while maintaining high accuracy can be screened out.
Additional features and advantages of the disclosure will be set forth in the description which follows, or in part may be learned by the practice of the above-described techniques of the disclosure, or may be learned by practice of the disclosure.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of an electronic system according to an embodiment of the present invention;
FIG. 2 is a flowchart of an optimization-based neural network processing method according to an embodiment of the present invention;
FIG. 3 is a flow chart of another optimized neural network based processing method provided by an embodiment of the present invention;
fig. 4 is a block diagram of an architecture of a processing method based on an optimized neural network according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a delay prediction module according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a processing device based on an optimized neural network according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the problem that existing convolutional neural networks occupy too much storage space (for example, the convolutional neural network VGG16 contains about 140 million floating-point parameters and requires more than 500 MB of storage space), the embodiments of the present invention provide an optimization-based neural network processing method, device, and electronic system. The technology can be applied to various devices such as servers, computers, cameras, mobile phones, tablet computers, and vehicle central control devices, can be implemented with corresponding software and hardware, and is described in detail below.
To facilitate understanding of the present embodiment, the optimization-based neural network processing method disclosed in the embodiments of the present invention is first described in detail.
Example one:
first, an example electronic system 100 for implementing an optimized neural network-based processing method, apparatus, and electronic system of embodiments of the present invention is described with reference to fig. 1.
As shown in FIG. 1, an electronic system 100 includes one or more processing devices 102, one or more memory devices 104, an input device 106, an output device 108, and one or more data acquisition devices 110, which are interconnected via a bus system 112 and/or other type of connection mechanism (not shown). It should be noted that the components and structure of the electronic system 100 shown in fig. 1 are exemplary only, and not limiting, and that the electronic system may have other components and structures as desired.
The processing device 102 may be an intelligent terminal or a device containing a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, may process data for other components in the electronic system 100, and may control other components in the electronic system 100 to perform the functions of target object statistics.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processing device 102 to implement the client functionality (implemented by the processing device) of the embodiments of the invention described below and/or other desired functionality. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The data acquisition device 110 may acquire the convolutional neural network and store the acquired convolutional neural network in the storage 104 for use by other components.
For example, the devices implementing the optimization-based neural network processing method, apparatus and electronic system according to the embodiments of the present invention may be disposed integrally or separately; for example, the processing device 102, the storage device 104, the input device 106 and the output device 108 may be integrated into a whole, and the data acquisition device 110 may be disposed at a designated position where the convolutional neural network can be acquired. When the above devices in the electronic system are integrally provided, the electronic system may be implemented as an intelligent terminal such as a camera, a smart phone, a tablet computer, or a vehicle central control device.
Example two:
the embodiment provides a processing method based on an optimized neural network, which is executed by a processing device in the electronic system; the processing device may be any device or chip having data processing capabilities. Fig. 2 is a flowchart of an optimized neural network-based processing method, which includes the following steps:
step S202, obtaining a convolution kernel used by each convolution layer in each training of the convolution neural network, and obtaining cross entropy loss and delay loss corresponding to each training according to the convolution kernel used by each convolution layer, wherein the cross entropy loss and the delay loss correspond to the convolution layer identification vector of each training; each element in the convolutional layer identification vector is the number of convolutional cores used by the corresponding convolutional layer.
A convolutional neural network is a feedforward neural network that involves convolution computation and has a deep structure; it comprises at least one convolutional layer, each convolutional layer comprises at least one convolution kernel, and convolution computation is carried out through the convolution kernels. The convolutional neural network in this embodiment is the convolutional neural network to be optimized, where optimization refers to screening the convolution kernels in the convolutional neural network, retaining the important convolution kernels that can extract key information, and removing the unimportant convolution kernels, so as to compress the storage space occupied by the convolutional neural network while maintaining accuracy.
The training of the convolutional neural network is used to adjust the weight parameters of each convolution kernel, where a weight parameter is a parameter inside a convolution kernel that can be learned and updated during training of the convolutional neural network. During training of the convolutional neural network, whether each convolution kernel is used in a given training can be determined; the convolution kernels used in each training are given the same identifier, and the convolution kernels not used in that training are given a different identifier. The convolutional layer identification vector is used to represent the number of convolution kernels used by each convolutional layer of the convolutional neural network in each training.
The cross entropy loss is used to determine the accuracy of each training; cross entropy describes the distance between two probability distributions, and the smaller the cross entropy, the closer the two distributions. The delay loss is used to determine the delay length of each training. The convolutional layer identification vector of each training is associated with the corresponding cross entropy loss and delay loss.
Step S204, calculating the overall loss value corresponding to each training based on a preset loss function and the cross entropy loss and delay loss corresponding to the convolutional layer identification vector.
The loss function is a function that maps the value of a random event or its related random variable to a non-negative real number to represent the "risk" or "loss" of the random event, and the convolutional neural network can be solved and evaluated by minimizing the loss function. In this embodiment, the loss function comprehensively considers the cross entropy loss and the delay loss, so the output overall loss value reflects not only accuracy but also delay; inputting the cross entropy loss and delay loss corresponding to the convolutional layer identification vector of each training into the preset loss function yields the overall loss value corresponding to that training.
Step S206, screening out the convolution kernels finally used by each convolutional layer according to the overall loss value corresponding to each training, so as to obtain the optimized convolutional neural network.
An appropriate overall loss value is selected according to the user's needs, and the convolutional neural network is optimized accordingly: the convolution kernels of each convolutional layer in the convolutional neural network are screened, only the convolution kernels corresponding to the appropriate overall loss value are retained and taken as the convolution kernels finally used by each convolutional layer, and the optimized convolutional neural network is thereby obtained.
Step S208, inputting the image to be detected into the convolutional neural network for processing to obtain an image recognition result.
The image to be detected is an image on which convolution computation is to be performed, and may be, for example, a photograph or a video frame. Inputting the image to be detected into the optimized convolutional neural network performs image recognition and yields the image recognition result. The image recognition includes image classification, semantic segmentation, object detection, and the like.
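As a brief illustration, a minimal inference sketch is given below; it is not part of the patent text, and the PyTorch-style `optimized_network`, the preprocessing, and the image-classification use case are illustrative assumptions.

```python
# Minimal usage sketch (illustrative, not the patent's reference code):
# pass an image to be detected through the optimized convolutional neural network
# and read off the recognition result, here assumed to be a classification.
import torch


def recognize(optimized_network: torch.nn.Module, image: torch.Tensor) -> int:
    optimized_network.eval()
    with torch.no_grad():
        logits = optimized_network(image.unsqueeze(0))  # add a batch dimension
    return int(logits.argmax(dim=1))                    # predicted class index
```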
The embodiment of the invention provides an optimization-based neural network processing method, which first obtains the cross entropy loss and delay loss corresponding to the convolutional layer identification vector of each training of the convolutional neural network, determines the overall loss value corresponding to each training based on the cross entropy loss and delay loss, and screens out the convolution kernels finally used by each convolutional layer in the convolutional neural network according to the overall loss value. In this method, the overall loss value of the convolutional neural network comprehensively considers the cross entropy loss and delay loss corresponding to each training, so that dual-objective optimization is performed on the accuracy and delay of the convolutional neural network, and a convolutional neural network that meets the delay requirement while maintaining high accuracy can be screened out.
Example three:
this embodiment provides another optimization-based neural network processing method, implemented on the basis of the above embodiment; it focuses on a specific implementation of acquiring the convolution kernels used by each convolutional layer in each training of the convolutional neural network and obtaining the cross entropy loss and delay loss corresponding to each training according to the convolution kernels used by each convolutional layer, where the cross entropy loss and delay loss correspond to the convolutional layer identification vector of that training. As shown in fig. 3, a flowchart of another optimization-based neural network processing method, the method in this embodiment includes the following steps:
step S302, obtaining the mark of the convolution kernel used by each convolution layer in each training of the convolution neural network.
The convolution kernels used by each convolutional layer in each training are given the same identifier, and the convolution kernels not used are given a different identifier, which can be done through steps A1 to A4:
Step A1, in each training, traversing the convolutional layers of the convolutional neural network, taking each traversed convolutional layer as a target convolutional layer, and performing the following operations for each target convolutional layer:
a convolutional neural network includes at least one convolutional layer, each convolutional layer including at least one convolutional layer core. Thus, in each convolutional layer, the convolutional kernel used by that convolutional layer in each training can be labeled. That is, one convolutional layer may be selected as a target convolutional layer in each training, and the convolutional kernels in the target convolutional layer may be labeled.
Step A2, acquiring the weight parameters of each convolution kernel in the target convolutional layer for the current training.
First, the weight parameters of each convolution kernel in the target convolutional layer in the current training are obtained; the weight parameters are the parameters learned during the training of the convolutional neural network.
Referring to the block diagram of the architecture of the optimized neural network-based processing method shown in fig. 4, as shown in fig. 4, for each convolution kernel in convolution layer i, a corresponding weight parameter is extracted, and assuming that there are 5 convolution kernels in convolution layer i, 5 sets of weight parameters are extracted, and the weight parameter of each convolution kernel is expanded into a row vector.
Step A3, inputting the weight parameters of each convolution kernel into a preset convolution kernel screening module, wherein the convolution kernel screening module comprises a fully connected layer and a binarization activation function.
The extracted weight parameters of each convolution kernel are respectively input into the preset convolution kernel screening module. The convolution kernel screening module is used to determine, according to scores computed from the weight parameters, whether the corresponding convolution kernel is used in the current training. The convolution kernel screening module can be added directly into the convolutional neural network to be optimized and trained together with it. As shown in fig. 4, after the weight parameters are extracted, they are input into the fully connected layer of the convolution kernel screening module.
Step A4, scoring each convolution kernel through the fully connected layer, binarizing the score through the binarization activation function, and outputting the corresponding identifier of the convolution kernel; wherein the identifier of a convolution kernel is 0 or 1, and a convolution kernel labeled 1 is a convolution kernel used by the target convolutional layer.
The convolution kernel screening module comprises a fully connected layer and a binarization activation function; the module learns the weight parameters of the convolution kernels through the fully connected layer and automatically screens out the most effective convolution kernels, so that invalid or ineffective convolution kernels do not take effect. The fully connected layer maps the convolution kernel weight parameters to a binary gated output that switches each convolution kernel on or off.
First, the weight parameters are expanded into corresponding row vectors, the row vectors are scored through the fully connected layer, binarization is then performed through the binarization activation function, and the identifier of each convolution kernel is output after binarization. For example, scores greater than 0 are set to 1, and scores less than or equal to 0 are set to 0. Thus, a convolution kernel labeled 1 is a convolution kernel used by the target convolutional layer, and a convolution kernel labeled 0 is a convolution kernel not used by the target convolutional layer.
As shown in fig. 4, for the 5 weight parameters of convolutional layer i, 5 scores are obtained through the fully connected layer, and the 5 scores are binarized through the binarization activation function to obtain the identifiers 1, 0, 0, 1, and 1 in sequence; these 5 identifiers form the mask 10011, whose dimension is equal to the number of convolution kernels in the convolutional layer. This indicates that the 1st, 4th, and 5th convolution kernels of convolutional layer i are used in the current training, and the 2nd and 3rd convolution kernels of convolutional layer i are not used in the current training.
In this way, for all convolutional layers of the convolutional neural network, the weight parameters of each convolution kernel in each convolutional layer are extracted, the weight parameters of each convolution kernel are scored by the preset convolution kernel screening module, and the identifier of each convolution kernel is obtained by binarizing the score. The convolution kernel screening module in this way can learn the weight parameters of each convolution kernel and automatically screen out the most effective convolution kernels, so that invalid or ineffective convolution kernels do not take effect; moreover, the convolution kernel screening module can be added directly into the convolutional neural network to be optimized and trained together with it, so that its screening accuracy improves as the convolutional neural network is trained.
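The sketch below illustrates one way such a screening module could be implemented in PyTorch; it is an illustrative assumption rather than the patent's reference code, and the straight-through trick used to keep the binarization trainable is likewise an assumption not stated in the patent.

```python
# Illustrative sketch of the convolution kernel screening module: a fully connected
# layer scores each kernel from its flattened weight parameters, and a binarization
# step maps the scores to 0/1 identifiers (the mask). All names are assumptions.
import torch
import torch.nn as nn


class KernelScreening(nn.Module):
    def __init__(self, weights_per_kernel: int):
        super().__init__()
        self.fc = nn.Linear(weights_per_kernel, 1)  # one score per convolution kernel

    def forward(self, conv_weight: torch.Tensor) -> torch.Tensor:
        # conv_weight: (num_kernels, in_channels, kH, kW) of the target convolutional layer.
        rows = conv_weight.flatten(start_dim=1)      # expand each kernel's weights into a row vector
        scores = self.fc(rows).squeeze(-1)           # score each convolution kernel
        hard = (scores > 0).float()                  # binarization: >0 -> 1 (used), else 0 (unused)
        # Straight-through estimator (assumption) so gradients still reach the fc layer.
        return (hard - scores).detach() + scores     # e.g. tensor([1., 0., 0., 1., 1.])
```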
Step S304, determining the cross entropy loss corresponding to each training based on the convolution kernel used by each convolution layer in each training.
As shown in fig. 4, the cross entropy loss can be directly output by the convolutional neural network, and the cross entropy loss is used to evaluate the accuracy of the convolutional neural network. In general, the smaller the cross-entropy loss, the higher the accuracy of the convolutional neural network.
Step S306, determining the convolutional layer identification vector of each training based on the identifiers of the convolution kernels used by each convolutional layer in each training.
Based on the identifiers of the convolution kernels, which convolution kernels in the corresponding convolutional layer are used in the current training can be determined, so the number of convolution kernels used by each convolutional layer in each training can be obtained, and the convolutional layer identification vector of each training is thereby determined. Specifically, the convolutional layer identification vector of each training can be determined through steps B1 to B4:
step B1, for each training, the following steps are performed:
and step B2, counting the number of convolution kernels used by each convolution layer in the current training based on the identification of the convolution kernel used by each convolution layer in the current training.
The identifier of the convolution kernel indicates whether the convolution kernel is used in the current training, and the number of the identifiers of the convolution kernels used by each convolution layer is counted, so that the number of the convolution kernels used by each convolution layer in the current training can be determined.
It should be noted that, if 1 is used to identify the convolution kernels used by each convolutional layer and 0 is used to identify the convolution kernels not used, the identifier values of each convolutional layer can be directly summed to obtain the number of convolution kernels used by that convolutional layer in the current training. As shown in fig. 4, for convolutional layer i, the identifiers are 1, 0, 0, 1, and 1, and their sum is 3, i.e., the number of convolution kernels used by convolutional layer i; for convolutional layer i+1, summing its identifiers in the same way gives 4, i.e., the number of convolution kernels used by convolutional layer i+1.
Step B3, taking the number of convolution kernels used by each convolutional layer in the current training as the vector element corresponding to that convolutional layer in the current training.
The numbers of convolution kernels used by the convolutional layers form the convolutional layer identification vector, whose vector elements are the numbers of convolution kernels used by the corresponding convolutional layers. As shown in fig. 4, the vector elements of the convolutional layer identification vector in fig. 4 include 3, the number of convolution kernels used by convolutional layer i, and 4, the number of convolution kernels used by convolutional layer i+1.
Step B4, according to the arrangement order of the convolutional layers in the convolutional neural network, forming the vector elements corresponding to the convolutional layers in the current training into the convolutional layer identification vector of the current training.
The vector elements of the convolutional layer identification vector are ordered according to the arrangement order of the corresponding convolutional layers in the convolutional neural network, that is, the first vector element of the convolutional layer identification vector corresponds to the first convolutional layer in the convolutional neural network, the second vector element corresponds to the second convolutional layer, and so on. As shown in fig. 4, in the convolutional layer identification vector of fig. 4, the vector element 4 corresponding to convolutional layer i+1 is placed immediately after the vector element 3 corresponding to convolutional layer i.
In the method, the number of convolution kernels used by each convolution layer in current training is used as vector elements corresponding to the convolution layer in the current training, and the corresponding vector elements are arranged according to the arrangement sequence of the convolution layers in the convolution neural network to obtain convolution layer identification vectors, so that the convolution layer identification vectors and the convolution layers are in one-to-one correspondence, and the kth vector element of the convolution layer identification vector is the number of convolution kernels used by the kth convolution layer in the current training.
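A small sketch of this step is given below, assuming the per-layer 0/1 masks come from a screening module like the one sketched earlier; the concrete mask values are hypothetical, chosen only to reproduce the sums 3 and 4 from the Fig. 4 example.

```python
# Illustrative sketch: build the convolutional layer identification vector by summing
# each layer's 0/1 kernel identifiers and stacking the sums in layer order.
import torch


def layer_identification_vector(masks_per_layer: list) -> torch.Tensor:
    # Each mask holds the 0/1 identifiers of one convolutional layer's kernels;
    # its sum is the number of convolution kernels that layer uses in this training.
    return torch.stack([mask.sum() for mask in masks_per_layer])


# Hypothetical masks matching the running example: layer i uses 3 kernels, layer i+1 uses 4.
masks = [torch.tensor([1., 0., 0., 1., 1.]),
         torch.tensor([1., 1., 0., 1., 1.])]
print(layer_identification_vector(masks))  # tensor([3., 4.])
```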
Step S308, determining the delay loss corresponding to each training based on the convolutional layer identification vector of each training.
The delay loss is determined based on the convolutional layer identification vector, which records the number of convolution kernels used by each convolutional layer of the convolutional neural network in the current training. Therefore, a delay prediction module can be trained in advance, which outputs the corresponding delay loss for an input convolutional layer identification vector. Specifically: the convolutional layer identification vector of each training is input into the pre-trained delay prediction module, which outputs the delay loss corresponding to each convolutional layer identification vector; the delay prediction module is trained based on a plurality of uniformly sampled convolutional layer identification vectors carrying delay loss labels.
First, a large number of convolutional layer identification vectors are obtained by uniform sampling (they are randomly generated with the number of convolution kernels of each layer in the convolutional neural network to be optimized as the upper limit, and the vector length is equal to the number of convolutional layers of the convolutional neural network); an actual convolutional neural network is generated according to each convolutional layer identification vector, and the generated convolutional neural network is run on a GPU (Graphics Processing Unit) to obtain the actual network delay. The delay prediction module is then trained with the sampled convolutional layer identification vectors as input and the actual network delays as labels.
After the delay prediction module has been trained, it is added into the convolutional neural network to be optimized and its parameters are fixed, i.e., the parameters of the delay prediction module are not changed during the training of the convolutional neural network. The convolutional layer identification vector of each training is input into the pre-trained delay prediction module, which outputs the delay loss of that training. Referring to fig. 5, a schematic diagram of the delay prediction module: as shown in fig. 5, a convolutional layer identification vector [3, 4, 20, 16, …, 64] is input into the delay prediction module, which outputs the delay loss of the training; the delay prediction module may be a structure of 3 fully connected layers.
In the method, the delay loss corresponding to each convolutional layer identification vector is output through the pre-trained delay prediction module, and the delay loss of each training can be accurately predicted.
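A minimal sketch of such a delay prediction module is shown below, assuming a 3-layer fully connected network as in Fig. 5; the hidden width, optimizer, and training loop are assumptions, and `sampled_vectors` / `measured_latency` stand for the uniformly sampled identification vectors and the latencies measured on the GPU.

```python
# Illustrative sketch of the delay prediction module: a 3-layer fully connected network
# regressing latency from a convolutional layer identification vector, trained on
# uniformly sampled vectors labeled with measured latency, then frozen.
import torch
import torch.nn as nn


class DelayPredictor(nn.Module):
    def __init__(self, num_conv_layers: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_conv_layers, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, id_vector: torch.Tensor) -> torch.Tensor:
        return self.net(id_vector).squeeze(-1)


def train_delay_predictor(sampled_vectors, measured_latency, epochs=200):
    model = DelayPredictor(sampled_vectors.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(sampled_vectors), measured_latency)
        loss.backward()
        opt.step()
    for p in model.parameters():       # fix the parameters before adding the module
        p.requires_grad_(False)        # to the convolutional neural network
    return model
```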
Step S310, calculating the overall loss value corresponding to each training based on a preset loss function and the cross entropy loss and delay loss corresponding to the convolutional layer identification vector.
The loss function in this embodiment is based on a comprehensive consideration of accuracy and delay; the objective may be the highest accuracy under a given delay requirement, the smallest delay under a given accuracy requirement, or, when neither has a hard requirement, simply a comprehensive trade-off between accuracy and delay.
The overall loss value for each training can be calculated by the following loss function:
Loss=CrossEntropyLoss+k*log(1+latency);
wherein Loss is the overall loss value corresponding to each training; CrossEntropyLoss is the cross entropy loss corresponding to the convolutional layer identification vector; latency is the delay loss corresponding to the convolutional layer identification vector; and k is a preset delay weight.
The cross entropy loss is output by the convolutional neural network, the delay loss corresponding to the convolutional layer identification vector is output by the delay prediction module, and the overall loss value is equal to the weighted sum of the cross entropy loss and the delay loss; the delay weight k adjusts how much attention the convolutional neural network pays to delay during optimization. In general, the higher the attention to delay, i.e., the more important the delay requirement is in the optimization process, the larger the parameter k needs to be, so as to increase the weight of the delay loss in the overall loss value.
In this way, the overall loss value is equal to the weighted sum of the cross entropy loss and the delay loss, so accuracy and delay are considered comprehensively, and the delay weight k adjusts the degree of attention paid to delay by the convolutional neural network during optimization.
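A one-line sketch of this loss is given below; `cross_entropy_loss` and `predicted_latency` are assumed to come from the convolutional neural network and the frozen delay prediction module respectively, and the default value of k is purely illustrative.

```python
# Illustrative sketch of the overall loss: Loss = CrossEntropyLoss + k * log(1 + latency).
import torch


def overall_loss(cross_entropy_loss: torch.Tensor,
                 predicted_latency: torch.Tensor,
                 k: float = 0.1) -> torch.Tensor:
    # A larger k places more weight on the delay loss in the overall loss value.
    return cross_entropy_loss + k * torch.log(1.0 + predicted_latency)
```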
Step S312, screening out the convolution kernels finally used by each convolutional layer according to the overall loss value corresponding to each training, so as to obtain the optimized convolutional neural network.
The parameters of the convolutional neural network can be adjusted based on the overall loss value as required, so that the convolution kernels used in each training change; training is therefore performed multiple times, and the convolution kernels finally used by each convolutional layer are screened out by selecting appropriate convolutional neural network parameters according to the overall loss value of each training. In general, the condition for stopping training may be that the overall loss value converges, that the number of training iterations reaches a preset count threshold, or that the training duration reaches a preset time threshold.
After the training of the convolutional neural network is completed, if the overall loss value has converged, the convolution kernels finally used by each convolutional layer can be screened out as follows: the convolution kernels used by each convolutional layer in the training corresponding to the converged overall loss value are taken as the finally used convolution kernels.
If the overall loss value of training has converged, the convolution kernels used by the convolutional layers of the converged convolutional neural network can be taken as the final convolution kernels. For example, the convolution kernels used by each convolutional layer in the last training may be taken as the finally used convolution kernels; alternatively, the convolution kernels used by the convolutional layers in each training may be recorded in advance, and those used in the last training taken as the finally used convolution kernels.
In the method, the convolution kernels used by each convolution layer in the training corresponding to the converged overall loss value are used as the finally used convolution kernels, the convolution kernels are reserved, and the convolution kernels which are not used are removed, so that the optimization of the convolution neural network is completed.
After the training of the convolutional neural network is completed, if the overall loss value has not converged (i.e., the number of training iterations has reached a preset count threshold, or the training duration has reached a preset time threshold, etc.), the convolution kernels finally used by each convolutional layer can be screened out through steps C1 to C2:
and step C1, selecting the minimum overall loss value in the overall loss values corresponding to each training.
The convolutional neural network corresponding to the minimum overall loss value is the one that, across all trainings, best balances the user's requirements on delay and accuracy. Therefore, the minimum overall loss value is selected from the overall loss values corresponding to all trainings.
Step C2, taking the convolution kernels used by each convolutional layer in the training corresponding to the minimum overall loss value as the finally used convolution kernels.
The convolution kernels used by each convolutional layer in each training are recorded in advance, and the convolution kernels used by each convolutional layer in the training corresponding to the minimum overall loss value are taken as the convolution kernels of the optimized convolutional neural network, i.e., the finally used convolution kernels. Specifically, the number of each training and the identifiers of the convolution kernels in each training can be recorded in advance, which saves recording space; the convolution kernels used by each convolutional layer in the training corresponding to the minimum overall loss value are then determined based on the training number and the recorded identifiers.
In the method, the convolution kernel used by each convolution layer in the training corresponding to the minimum overall loss value is selected as the convolution kernel which is finally used, the convolution kernels are reserved, and the convolution kernels which are not used are removed, so that the optimization of the convolution neural network is completed.
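A small sketch of this selection step follows; the per-training history format (overall loss value plus the recorded 0/1 identifiers) is an assumption introduced only for illustration.

```python
# Illustrative sketch: pick the training with the minimum overall loss value and keep
# only the convolution kernels whose recorded identifier is 1.
def select_final_masks(history):
    # history: list of (overall_loss_value, {layer_name: list of 0/1 identifiers}) per training.
    _, best_masks = min(history, key=lambda item: item[0])
    return best_masks  # kernels identified by 1 are kept, those identified by 0 are removed
```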
Step S314, inputting the image to be detected into the convolutional neural network for processing to obtain an image recognition result.
The flow of the optimization-based neural network processing method provided by the embodiment of the invention can be executed through steps D1 to D5:
and D1, pre-training the convolutional neural network to be optimized and the convolutional kernel screening module.
The convolutional neural network and the convolution kernel screening module are trained together; pre-training refers to basic preliminary training of the convolutional neural network and the convolution kernel screening module, from which the number of convolutional layers and the number of convolution kernels contained in each convolutional layer are determined.
Step D2, automatically generating samples to train the delay prediction module alone.
The delay prediction module is trained independently, based on a plurality of uniformly sampled convolutional layer identification vectors carrying delay loss labels.
Step D3, adding the delay prediction module into the convolutional neural network and fixing the weights of the delay prediction module.
After the delay prediction module has been trained, it is added into the convolutional neural network and its parameters are fixed, i.e., the parameters of the delay prediction module are no longer updated.
Step D4, performing dual-objective optimization on the cross entropy loss and the delay loss.
The overall loss value is the weighted sum of the cross entropy loss and the delay loss, and the target of the dual-objective optimization may be the convolutional neural network with the minimum delay within a certain range of accuracy, the convolutional neural network with the highest accuracy within a certain range of delay, or the convolutional neural network that best balances accuracy and delay.
Step D5, selecting the corresponding convolution kernels according to the converged overall loss value to optimize the convolutional neural network.
The identifiers of the convolution kernels in the training corresponding to the converged overall loss value are determined, the convolution kernels used in that training are determined based on the identifiers, the used convolution kernels are retained, and the unused convolution kernels are removed, thereby completing the optimization of the convolutional neural network.
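Put together, the D1-D5 flow can be outlined as below; this is only a high-level sketch that reuses the helpers sketched earlier (`train_delay_predictor`, `layer_identification_vector`, `overall_loss`, `select_final_masks`), while `pretrain`, `current_masks`, `step_optimizer`, and `prune` are hypothetical placeholders for the corresponding steps, not functions defined in the patent.

```python
# High-level outline of steps D1-D5 (illustrative; helper functions are either the
# sketches given earlier or hypothetical placeholders, not the patent's reference code).
import torch.nn.functional as F


def optimize_network(network, screening_modules, train_data, latency_samples, k=0.1):
    pretrain(network, screening_modules, train_data)                    # D1: pre-train together
    predictor = train_delay_predictor(*latency_samples)                 # D2: train predictor alone
    # D3: the predictor's parameters were frozen inside train_delay_predictor.
    history = []
    for images, labels in train_data:                                   # D4: dual-objective training
        ce = F.cross_entropy(network(images), labels)
        id_vec = layer_identification_vector(current_masks(screening_modules))
        loss = overall_loss(ce, predictor(id_vec), k)
        loss.backward()
        step_optimizer(network, screening_modules)
        history.append((loss.item(), current_masks(screening_modules)))
    return prune(network, select_final_masks(history))                  # D5: keep selected kernels
```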
In this method, the weight parameters in the convolutional neural network are learned through the fully connected layer of the screening module, key information is extracted, and which convolution kernels to keep and the pruning rate of each layer are determined automatically; the delay prediction module is pre-trained to predict the computation time of a convolutional neural network of a specified structure on the hardware device; and the optimization of the convolutional neural network is formulated as a dual-objective optimization problem whose objectives are higher accuracy and smaller delay respectively, with an optimal balance between the two found through network training. Therefore, the method can achieve the lowest computation delay on the basis of maintaining accuracy.
Example four:
corresponding to the above method embodiment, refer to a schematic structural diagram of an optimized neural network-based processing apparatus shown in fig. 6, where the apparatus includes:
a cross entropy loss and delay loss acquisition module 61, configured to acquire the convolution kernels used by each convolutional layer in each training of the convolutional neural network, and obtain the cross entropy loss and delay loss corresponding to each training according to the convolution kernels used by each convolutional layer, wherein the cross entropy loss and delay loss correspond to the convolutional layer identification vector of that training; each element in the convolutional layer identification vector is the number of convolution kernels used by the corresponding convolutional layer;
an overall loss value calculation module 62, configured to calculate the overall loss value corresponding to each training based on a preset loss function and the cross entropy loss and delay loss corresponding to the convolutional layer identification vector;
a convolution kernel screening module 63, configured to screen out a convolution kernel that is finally used by each convolution layer according to an overall loss value corresponding to each training, so as to obtain an optimized convolution neural network;
and an image recognition result determining module 64, configured to input the image to be detected into the convolutional neural network for processing to obtain an image recognition result.
Further, the cross entropy loss and delay loss obtaining module is configured to: acquiring the identifier of a convolution kernel used by each convolution layer in each training of the convolution neural network; determining cross entropy loss corresponding to each training based on a convolution kernel used by each convolution layer in each training; determining a convolutional layer identification vector of each training based on the identification of the convolutional kernel used by each convolutional layer in each training; and determining the delay loss corresponding to each training based on the convolutional layer identification vector of each training.
Further, the cross entropy loss and delay loss obtaining module is configured to: in each training, traverse the convolutional layers of the convolutional neural network, take each traversed convolutional layer as a target convolutional layer, and perform the following operations for each target convolutional layer: acquire the weight parameters of each convolution kernel in the target convolutional layer for the current training; input the weight parameters of each convolution kernel into a preset convolution kernel screening module, wherein the convolution kernel screening module comprises a fully connected layer and a binarization activation function; score each convolution kernel through the fully connected layer, binarize the score through the binarization activation function, and output the corresponding identifier of the convolution kernel, wherein the identifier of a convolution kernel is 0 or 1, and a convolution kernel labeled 1 is a convolution kernel used by the target convolutional layer.
Further, the cross entropy loss and delay loss obtaining module is configured to: for each training the following steps are performed: counting the number of convolution kernels used by each convolution layer in current training based on the identification of the convolution kernel used by each convolution layer in the current training; taking the number of convolution kernels used by each convolution layer in current training as a vector element corresponding to the convolution layer in the current training; and according to the arrangement sequence of the convolutional layers in the convolutional neural network, forming the vector elements corresponding to the currently trained convolutional layer into the currently trained convolutional layer identification vector.
Further, the cross entropy loss and delay loss obtaining module is configured to: inputting the convolutional layer identification vector of each training to a pre-trained delay prediction module, and outputting the delay loss corresponding to each convolutional layer identification vector; the delay prediction module trains based on a plurality of uniformly sampled convolutional layer identification vectors carrying delay loss labels.
Further, the overall loss value calculating module is configured to: calculate the overall loss value corresponding to each training through the following loss function: Loss = CrossEntropyLoss + k*log(1+latency); wherein Loss is the overall loss value corresponding to each training; CrossEntropyLoss is the cross entropy loss corresponding to the convolutional layer identification vector; latency is the delay loss corresponding to the convolutional layer identification vector; and k is a preset delay weight.
Further, the convolution kernel screening module is configured to: take the convolution kernels used by each convolutional layer in the training corresponding to the converged overall loss value as the finally used convolution kernels.
The embodiment of the invention provides a processing device based on an optimized neural network, which first obtains the cross entropy loss and the delay loss corresponding to the convolutional layer identification vector of each training of the convolutional neural network, determines the overall loss value corresponding to each training based on the cross entropy loss and the delay loss, and screens out the finally used convolution kernel of each convolutional layer in the convolutional neural network according to the overall loss value. In the device, the overall loss value of the convolutional neural network is determined by jointly considering the cross entropy loss and the delay loss corresponding to each training, so that the accuracy and the delay of the convolutional neural network are optimized as two objectives, and a convolutional neural network that meets the delay requirement and has higher accuracy can be screened out.
Example five:
An embodiment of the present invention provides an electronic system, including: a data acquisition device, a processing device, and a storage device; the data acquisition device is used for acquiring the convolutional neural network; the storage device stores a computer program which, when executed by the processing device, performs the steps of the processing method based on the optimized neural network described above.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the electronic system described above may refer to the corresponding process in the foregoing method embodiments, and is not described herein again.
Embodiments of the present invention further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processing device, the steps of the processing method based on the optimized neural network described above are performed.
The computer program product of the processing method and apparatus based on the optimized neural network and of the electronic system provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the methods described in the foregoing method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and/or the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, e.g., as a fixed connection, a removable connection, or an integral connection; as a mechanical or an electrical connection; as a direct connection, an indirect connection through an intervening medium, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, easily conceive of changes to them, or make equivalent substitutions for some of their technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be construed as falling within it. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of processing based on an optimized neural network, the method comprising:
acquiring a convolution kernel used by each convolution layer in each training of the convolutional neural network, and acquiring cross entropy loss and delay loss corresponding to each training according to the convolution kernel used by each convolution layer, wherein the cross entropy loss and the delay loss correspond to the convolutional layer identification vector of each training; each element in the convolutional layer identification vector is the number of convolution kernels used by the corresponding convolutional layer;
calculating an overall loss value corresponding to each training based on a preset loss function and the cross entropy loss and the delay loss corresponding to the convolutional layer identification vector;
screening out the convolution kernel finally used by each convolution layer according to the overall loss value corresponding to each training to obtain an optimized convolutional neural network;
and inputting the image to be detected into the convolutional neural network for processing to obtain an image identification result.
2. The method of claim 1, wherein the step of acquiring the convolution kernel used by each convolutional layer in each training of the convolutional neural network, and acquiring the cross entropy loss and the delay loss corresponding to each training according to the convolution kernel used by each convolutional layer, wherein the cross entropy loss and the delay loss correspond to the convolutional layer identification vector of each training, comprises:
acquiring the identifiers of the convolution kernels used by each convolutional layer in each training of the convolutional neural network;
determining the cross entropy loss corresponding to each training based on the convolution kernels used by each convolutional layer in each training;
determining the convolutional layer identification vector of each training based on the identifiers of the convolution kernels used by each convolutional layer in each training;
and determining the delay loss corresponding to each training based on the convolutional layer identification vector of each training.
3. The method of claim 2, wherein the step of acquiring the identifiers of the convolution kernels used by each convolutional layer in each training of the convolutional neural network comprises:
in each training, traversing the convolutional layers of the convolutional neural network, taking each traversed convolutional layer as a target convolutional layer, and executing the following operations for each target convolutional layer:
acquiring the weight parameters of each convolution kernel in the target convolutional layer obtained in the current training;
inputting the weight parameters of each convolution kernel to a preset convolution kernel screening module, wherein the convolution kernel screening module comprises a fully connected layer and a binarization activation function;
scoring each convolution kernel through the fully connected layer, binarizing the score through the binarization activation function, and outputting the corresponding identifier of the convolution kernel; wherein the identifier of the convolution kernel is 0 or 1; the convolution kernel identified as 1 is the convolution kernel used by the target convolutional layer.
4. The method of claim 2, wherein the step of determining a convolutional layer identification vector for each training based on the identification of the convolutional kernel used by each convolutional layer in each training comprises:
for each training the following steps are performed:
counting the number of convolution kernels used by each convolutional layer in the current training based on the identifiers of the convolution kernels used by each convolutional layer in the current training;
taking the number of convolution kernels used by each convolutional layer in the current training as the vector element corresponding to that convolutional layer in the current training;
and, according to the arrangement order of the convolutional layers in the convolutional neural network, assembling the vector elements corresponding to the convolutional layers of the current training into the convolutional layer identification vector of the current training.
5. The method of claim 2, wherein the step of determining the delay loss corresponding to each training based on the convolutional layer identification vector for each training comprises:
inputting the convolutional layer identification vector of each training to a pre-trained delay prediction module, and outputting the delay loss corresponding to each convolutional layer identification vector; wherein the delay prediction module is trained based on a plurality of uniformly sampled convolutional layer identification vectors carrying delay loss labels.
6. The method of claim 1, wherein the step of calculating the overall loss value corresponding to each training based on the preset loss function and the cross entropy loss and the delay loss corresponding to the convolutional layer identification vector comprises:
calculating the integral loss value corresponding to each training through the following loss function:
Loss=CrossEntropyLoss+k*log(1+latency);
wherein Loss is the overall loss value corresponding to each training; CrossEntropyLoss is the cross entropy loss corresponding to the convolutional layer identification vector; latency is the delay loss corresponding to the convolutional layer identification vector; and k is a preset delay weight.
7. The method of claim 1, wherein the step of screening out the convolution kernel finally used by each convolution layer according to the overall loss value corresponding to each training comprises:
taking the convolution kernels used by each convolutional layer in the training corresponding to the converged overall loss value as the finally used convolution kernels.
8. An optimized neural network-based processing apparatus, the apparatus comprising:
the cross entropy loss and delay loss acquisition module is used for acquiring a convolution kernel used by each convolutional layer in each training of the convolutional neural network, and acquiring cross entropy loss and delay loss corresponding to each training according to the convolution kernel used by each convolutional layer, wherein the cross entropy loss and the delay loss correspond to the convolutional layer identification vector of each training; each element in the convolutional layer identification vector is the number of convolution kernels used by the corresponding convolutional layer;
the overall loss value calculation module is used for calculating an overall loss value corresponding to each training based on a preset loss function and the cross entropy loss and the delay loss corresponding to the convolutional layer identification vector;
the convolution kernel screening module is used for screening out the finally used convolution kernel of each convolutional layer according to the overall loss value corresponding to each training to obtain an optimized convolutional neural network;
and the image identification result determining module is used for inputting the image to be detected into the convolutional neural network for processing to obtain an image identification result.
9. An electronic system, characterized in that the electronic system comprises: a data acquisition device, a processing device, and a storage device;
the data acquisition device is used for acquiring the convolutional neural network;
the storage device has stored thereon a computer program which, when executed by the processing device, performs the optimized neural network-based processing method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processing device, carries out the steps of the optimized neural network-based processing method according to any one of claims 1 to 7.
CN201911124704.9A 2019-11-15 2019-11-15 Optimization-based neural network processing method and device and electronic system Pending CN110852425A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911124704.9A CN110852425A (en) 2019-11-15 2019-11-15 Optimization-based neural network processing method and device and electronic system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911124704.9A CN110852425A (en) 2019-11-15 2019-11-15 Optimization-based neural network processing method and device and electronic system

Publications (1)

Publication Number Publication Date
CN110852425A true CN110852425A (en) 2020-02-28

Family

ID=69601769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911124704.9A Pending CN110852425A (en) 2019-11-15 2019-11-15 Optimization-based neural network processing method and device and electronic system

Country Status (1)

Country Link
CN (1) CN110852425A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797972A (en) * 2020-05-27 2020-10-20 北京迈格威科技有限公司 Method, device and electronic system for processing data by using convolutional neural network
CN111797972B (en) * 2020-05-27 2024-09-06 北京迈格威科技有限公司 Method, device and electronic system for processing data by using convolutional neural network
CN111882035A (en) * 2020-07-21 2020-11-03 北京百度网讯科技有限公司 Super network searching method, device, equipment and medium based on convolution kernel
CN112132279A (en) * 2020-09-23 2020-12-25 平安科技(深圳)有限公司 Convolutional neural network model compression method, device, equipment and storage medium
CN112132279B (en) * 2020-09-23 2023-09-15 平安科技(深圳)有限公司 Convolutional neural network model compression method, device, equipment and storage medium
CN112580689A (en) * 2020-11-23 2021-03-30 北京迈格威科技有限公司 Training method and application method of neural network model, device and electronic equipment
WO2022121214A1 (en) * 2020-12-08 2022-06-16 广州汽车集团股份有限公司 Automatic driving loss evaluation method and device
CN112766307A (en) * 2020-12-25 2021-05-07 北京迈格威科技有限公司 Image processing method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200228)