CN110399918B - Target identification method and device

Target identification method and device

Info

Publication number
CN110399918B
CN110399918B (application CN201910671904.XA)
Authority
CN
China
Prior art keywords
neural network
network model
layer
preset neural
cutting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910671904.XA
Other languages
Chinese (zh)
Other versions
CN110399918A (en)
Inventor
陈海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenlan Robot Shanghai Co ltd
Original Assignee
Deep Blue Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deep Blue Technology Shanghai Co Ltd filed Critical Deep Blue Technology Shanghai Co Ltd
Priority to CN201910671904.XA priority Critical patent/CN110399918B/en
Publication of CN110399918A publication Critical patent/CN110399918A/en
Application granted granted Critical
Publication of CN110399918B publication Critical patent/CN110399918B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target identification method and device. The method comprises: acquiring image data for target identification; and inputting the image data into a neural network model for target identification to obtain an identification result. The neural network model is obtained by inputting training samples into a preset neural network model for training; during the training of the preset neural network model, it is determined whether the preset neural network model comprises a batch normalization (BN) layer; if so, the number of BN layers in the preset neural network model is pruned; otherwise, the weights of the network layers of the preset neural network model are pruned. By combining direct and indirect structured-sparsity methods in the preset neural network model, pruning both the structural weights and the BN layers, and performing target identification with the pruned neural network model, the method makes target identification faster and more efficient.

Description

Target identification method and device
Technical Field
The present invention relates to the field of target identification technologies, and in particular, to a target identification method and device.
Background
With the development of computer technology and neural network technology, neural network models are increasingly used for target identification. To achieve a good recognition effect, the network needs sufficient depth; for a specific problem, however, excessive depth increases the risk of overfitting and the difficulty of training, while contributing little to recognition performance in a specific scenario. The network is therefore sometimes pruned layer by layer. Network pruning removes redundant parts of a network by changing its structure, and can be divided into granularities such as layer-level pruning and connection-level pruning according to the object being pruned.
Connection-level pruning targets specific network connections or parameters, and usually yields a sparser network. It is the finest-grained and most controllable form of pruning, with the smallest impact on network performance. However, it destroys the regularity of the network: the pruned weight tensors become sparse, so sparse-tensor storage and computation rules must be used, which is unfavorable for parallelism.
Layer-level pruning targets entire network layers and is mainly suited to models with many layers; the pruned network becomes shallower. Removing several blocks of a deep residual network is, in effect, layer-level pruning. Neuron-level pruning targets individual neurons or filters, making the network "thinner". Connection-level pruning targets individual connection weights, making the network more "sparse".
In summary, when a neural network model is pruned for target identification, excessive pruning introduces new problems even as it yields benefits.
Disclosure of Invention
The invention provides a target identification method and device, specifically as follows:
According to a first aspect of the invention, there is provided a target identification method, the method comprising:
acquiring image data for target identification;
inputting the image data into a neural network model for target identification to obtain an identification result, wherein the neural network model is obtained by inputting training samples into a preset neural network model for training, and, during the training of the preset neural network model, it is determined whether the preset neural network model comprises a batch normalization (BN) layer; if so, the number of BN layers in the preset neural network model is pruned; otherwise, the weights of the network layers of the preset neural network model are pruned.
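For illustration only, a minimal sketch of this inference flow, assuming PyTorch and hypothetical file names (the disclosure prescribes neither a framework, an input size, nor a model format):

```python
# Sketch only: "pruned_recognizer.pt" and "frame.jpg" are hypothetical.
# A model pruned by the training procedure described below is loaded and
# applied to one image to obtain an identification result.
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # assumed input size
    transforms.ToTensor(),
])

model = torch.jit.load("pruned_recognizer.pt")  # hypothetical artifact
model.eval()

image = preprocess(Image.open("frame.jpg")).unsqueeze(0)  # acquire image data
with torch.no_grad():
    logits = model(image)  # neural network model for target identification
result = logits.argmax(dim=1).item()  # identification result (class index)
print(result)
```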
In a possible implementation, each time the preset neural network model has been trained with n training samples, if the current preset neural network model is determined to comprise a BN layer, the number of BN layers of the current preset neural network model is pruned; if it is determined not to comprise a BN layer, the weights of the network layers of the current preset neural network model are pruned; n is a preset positive integer smaller than the total number of training samples.
In a possible implementation, the weights of the network layers of the preset neural network model are pruned by adding a first penalty term to the loss function of the preset neural network model.
In a possible implementation, the first penalty term comprises a first adjustment coefficient, used to adjust the weight values of the i-th network layer in the preset neural network model, and a pruning weight value range;
for the i-th network layer of the preset neural network model, the weight values of the i-th network layer are adjusted according to the first adjustment coefficient in the first penalty term;
and the weights of the i-th network layer of the preset neural network model are pruned according to the pruning weight value range in the first penalty term.
In a possible implementation, the first adjustment coefficient in the first penalty term is dynamically adjusted so as to adjust the weight values of the i-th network layer in the preset neural network model, such that the number of weights of the i-th layer whose values fall within the pruning weight value range is maximized.
In one possible implementation, the first penalty term L1 is:
L1 = λ1 · ∑ |x_i|, where the sum runs over the N weights x_i of the i-th network layer that satisfy ε1_i < |x_i| ≤ C1_i;
wherein [ε1_i, C1_i] is the pruning weight value range of the i-th network layer, N is the number of weights of the i-th layer whose values fall within that range, and λ1 is the first adjustment coefficient.
In a possible implementation, the number of BN layers of the preset neural network model is pruned by adding a second penalty term for the BN layers of the preset neural network model.
In a possible implementation, the second penalty term comprises a second adjustment coefficient, used to adjust the output values of the BN layers in the preset neural network model, and a pruned-BN-layer output value range;
for the BN layers of the preset neural network model, the BN layer output values are adjusted according to the second adjustment coefficient in the second penalty term;
and BN layers of the preset neural network model are pruned according to the pruned-BN-layer output value range in the second penalty term.
In a possible implementation, the second adjustment coefficient in the second penalty term is dynamically adjusted so as to adjust the BN layer output values in the preset neural network model, such that the number of BN layers whose output values fall within the pruned-BN-layer output value range is maximized.
In a possible implementation, the second penalty term L2 is: L2 = λ2 · ∑k∈τ g(k);
where g(k) is a BN layer output value satisfying ε2 < |g(k)| ≤ C2, [ε2, C2] is the pruned-BN-layer output value range, and λ2 is the second adjustment coefficient.
According to a second aspect of the invention, there is provided a target identification device comprising a memory storing an executable program and a processor that implements the following process when the executable program is executed:
acquiring image data for target identification;
inputting the image data into a neural network model for target identification to obtain an identification result, wherein the neural network model is obtained by inputting training samples into a preset neural network model for training, and, during the training of the preset neural network model, it is determined whether the preset neural network model comprises a batch normalization (BN) layer; if so, the number of BN layers in the preset neural network model is pruned; otherwise, the weights of the network layers of the preset neural network model are pruned.
According to a third aspect of the invention, there is provided a computer storage medium storing a computer program which, when executed, implements the method described above.
Compared with the prior art, the target identification method and device provided by the invention have the following advantages and beneficial effects:
the preset neural network model combines direct and indirect structured-sparsity methods, pruning both the structural weights and the BN layers; this greatly increases the training speed for sparsification and allows a higher degree of pruning while keeping the network performance unchanged.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from them without inventive effort.
Fig. 1 is a schematic diagram of a target identification method according to an embodiment of the present invention;
fig. 2 is a graph of output values of a BN layer according to a first embodiment of the present invention;
fig. 3 is a schematic diagram of an apparatus for object recognition according to a second embodiment of the present invention;
fig. 4 is a schematic diagram of a target identification apparatus according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments herein without creative effort fall within the protection scope of the present invention.
The application scenarios described in the embodiments of the present invention are intended to explain the technical solutions of the embodiments more clearly and do not limit them; as a person skilled in the art will appreciate, the technical solutions provided herein are equally applicable to similar technical problems as new application scenarios emerge. In the description of the present invention, the term "plurality" means two or more unless otherwise specified.
Neural network pruning removes redundant parts of a neural network by changing its structure. According to the object being pruned, it can be divided into granularities such as layer-level pruning and connection-level pruning. Connection-level pruning destroys the regularity of the network: the pruned weight tensors become sparse, so sparse-tensor storage and computation rules are needed, which is unfavorable for parallel computation. When a neural network model is pruned for target identification, excessive pruning introduces new problems even as it yields benefits; a pruning approach that balances the effects of the two granularities is therefore needed to assist target identification and improve its effect.
Therefore, the embodiments of the present invention provide a target identification method and device.
With respect to the above scenario, the following describes an embodiment of the present invention in further detail with reference to the drawings of the specification.
Example one
The present embodiment provides a target identification method, as shown in fig. 1, which specifically comprises the following steps:
Step 101: acquiring image data for target identification;
Step 102: inputting the image data into a neural network model for target identification to obtain an identification result, wherein the neural network model is obtained by inputting training samples into a preset neural network model for training, and, during the training of the preset neural network model, it is determined whether the preset neural network model comprises a batch normalization (BN) layer; if so, the number of BN layers in the preset neural network model is pruned; otherwise, the weights of the network layers of the preset neural network model are pruned.
In the method, the preset neural network model combines direct and indirect structured-sparsity methods, pruning both the structural weights and the BN layers. This greatly increases the training speed for sparsification and allows a higher degree of pruning while keeping the network performance unchanged; performing target identification with the pruned neural network model makes target identification faster and more efficient.
As an optional implementation, each time the preset neural network model has been trained with n training samples, if the current preset neural network model is determined to comprise a BN layer, the number of BN layers of the current preset neural network model is pruned; if it is determined not to comprise a BN layer, the weights of the network layers of the current preset neural network model are pruned; n is a preset positive integer smaller than the total number of training samples.
In this embodiment, the specific value of n is not limited; n may be 1, 10, or any positive integer smaller than the total number of training samples. For example, in an implementation, the preset neural network model may be pruned once after being trained with 1 training sample, or once after being trained with 10 training samples.
Like the convolutional layer, the activation layer and the fully connected layer, the BN layer is a layer of the neural network model. For every layer other than the output layer, parameter updates in the preceding layers during training change the distribution of the input data seen by the following layers. The input data of each layer therefore needs to be normalized as it arrives, by BN-layer preprocessing; for example, the input data X3 of the third network layer is normalized to mean 0 and variance 1 before being fed into the third layer's computation, which addresses the problem of shifting data distributions.
Because the output data of a network layer is normalized before being sent to the next layer, the features learned by the current layer are affected, and the features learned by the original layer should be recoverable. Learnable reconstruction parameters γ and β are therefore introduced, so that the network can learn to recover the feature distribution the original network would have learned: y(k) = γ(k) · x(k) + β(k). Each neuron x(k) in the neural network model has such a pair of parameters γ(k), β(k).
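For illustration, in PyTorch's BatchNorm modules these learnable reconstruction parameters are exposed as the attributes weight (γ) and bias (β); a minimal sketch:

```python
# Each of the 16 channels gets its own (γ, β) pair; after normalizing the
# input to mean 0 and variance 1, the layer computes y = γ * x_hat + β.
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=16)
x = torch.randn(8, 16, 32, 32)         # e.g. the third layer's input X3
y = bn(x)                              # normalize, then rescale and shift
print(bn.weight.shape, bn.bias.shape)  # torch.Size([16]) torch.Size([16])
```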
If the current preset neural network model is determined to comprise a BN layer, the number of BN layers of the current preset neural network model is pruned. A specific implementation is as follows:
as an optional implementation manner, the number of layers of the BN layer of the preset neural network model is cut by adding a second penalty term to the BN layer of the preset neural network model.
The second penalty item comprises a second adjusting coefficient used for adjusting the output value of the BN layer in the preset neural network model and a cutting BN layer output value range;
adjusting the output value of the BN layer in the preset neural network model according to the second adjustment coefficient in the second punishment item for the BN layer in the preset neural network model;
and adjusting the output value of the BN layer in the preset neural network model by dynamically adjusting the second adjustment coefficient in the second penalty term, so that the output value of the BN layer meets the condition that the number of BN layers in the output value range of the cutting BN layer is the maximum.
And cutting the BN layer in the preset neural network model according to the output value range of the cutting BN layer in the second penalty item.
The second penalty term L2 is: L2 = λ2 · ∑k∈τ g(k);
where g(k) is a BN layer output value satisfying ε2 < |g(k)| ≤ C2, [ε2, C2] is the pruned-BN-layer output value range, and λ2 is the second adjustment coefficient.
In this embodiment, the specific value of λ2 is not limited; λ2 is generally a number smaller than 1. During training of the neural network model, a curve is plotted in which the output values of all BN layers are sorted from small to large, as shown in fig. 2, where curve S1 is the curve without the penalty term and curve S2 is the curve with the penalty term. In a specific implementation, the model is trained while λ2 is dynamically adjusted according to the actual training situation, so that, on the premise of preserving the training effect, the output values of as many BN layers as possible approach ε2.
In this embodiment, the values of ε2 and C2 are not limited; ε2 is usually 0. Assuming C2 is 0.2, BN layers whose output values are greater than 0 and not greater than 0.2 are pruned.
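A minimal sketch of this branch, assuming PyTorch and reading g(k) as the BN scale factor γ (an assumption, in the spirit of network-slimming-style approaches; the embodiment only calls g(k) the BN layer output value), with the illustrative values λ2 < 1, ε2 = 0 and C2 = 0.2 used above:

```python
# Sketch only: the second penalty term L2 = λ2 * Σ_{k∈τ} g(k), accumulated
# over BN values that satisfy ε2 < |g(k)| ≤ C2; BN layers whose values fall
# in this range become candidates for removal.
import torch
import torch.nn as nn

def bn_penalty(model: nn.Module, lam2=1e-3, eps2=0.0, c2=0.2):
    penalty = torch.zeros(())
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            g = m.weight.abs()             # γ, read here as g(k)
            mask = (g > eps2) & (g <= c2)  # ε2 < |g(k)| ≤ C2
            penalty = penalty + g[mask].sum()
    return lam2 * penalty                  # added to the training loss

# Usage during training (λ2 tuned dynamically as described above):
#   loss = task_loss + bn_penalty(model)
```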
A neural network model is formed by nodes in each layer connected by edges, each edge carrying a weight. Pruning the network-layer weights means that edges whose weights are small are considered unimportant and are removed.
If the current preset neural network model is determined not to comprise a BN layer, the weights of the network layers of the current preset neural network model are pruned. A specific implementation is as follows:
as an optional implementation, the weights of the network layer of the preset neural network model are clipped by adding a first penalty term to the loss function of the preset neural network model.
The first penalty item comprises a first adjusting coefficient and a clipping weight value range, wherein the first adjusting coefficient is used for adjusting the weight value of the ith network layer in the preset neural network model;
adjusting the weight value of the ith network layer in the preset neural network model according to the first adjustment coefficient in the first penalty item for the ith network layer in the preset neural network model;
and adjusting the weight value of the ith network layer in a preset neural network model by dynamically adjusting the first adjustment coefficient in the first penalty item, so that the number of the weights of the weight value in the ith network layer in a clipping weight value range is the largest.
And according to the clipping weight value range in the first penalty item, clipping the weight of the ith network layer in the preset neural network model.
As an optional implementation, the first penalty term L1 is:
L1 = λ1 · ∑ |x_i|, where the sum runs over the N weights x_i of the i-th network layer that satisfy ε1_i < |x_i| ≤ C1_i;
wherein [ε1_i, C1_i] is the pruning weight value range of the i-th network layer, N is the number of weights of the i-th layer whose values fall within that range, and λ1 is the first adjustment coefficient.
In this embodiment, the specific value of λ1 is not limited; λ1 is generally a number smaller than 1. In a specific implementation, the model is trained while λ1 is dynamically adjusted according to the actual training situation, so that, on the premise of preserving the training effect, the values of as many weights as possible approach ε1.
In this embodiment, the values of ε1 and C1 are not limited; ε1 is usually 0. Assuming C1 is 0.2, weights whose values are greater than 0 and not greater than 0.2 are pruned.
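A minimal sketch of this branch under the same assumptions (PyTorch; the illustrative values ε1 = 0 and C1 = 0.2 used above, and an illustrative λ1):

```python
# Sketch only: the first penalty term L1 = λ1 * Σ|x_i| over the weights of
# a layer that satisfy ε1 < |x_i| ≤ C1, plus the corresponding pruning step
# that zeroes (removes) those small-weight edges.
import torch
import torch.nn as nn

def weight_penalty(layer: nn.Module, lam1=1e-3, eps1=0.0, c1=0.2):
    w = layer.weight.abs()
    mask = (w > eps1) & (w <= c1)   # ε1_i < |x_i| ≤ C1_i
    return lam1 * w[mask].sum()     # added to the loss; pushes weights to ε1

def prune_small_weights(layer: nn.Module, eps1=0.0, c1=0.2):
    with torch.no_grad():
        w = layer.weight.abs()
        layer.weight[(w > eps1) & (w <= c1)] = 0.0  # remove unimportant edges
```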
Example two
Based on the same inventive concept, the present embodiment provides a target identification device. As shown in fig. 3, the device comprises a processor 301 and a memory 302, wherein the memory 302 stores an executable program, and the processor 301 implements the following process when the executable program is executed:
acquiring image data for target identification;
inputting the image data into a neural network model for target identification to obtain an identification result, wherein the neural network model is obtained by inputting training samples into a preset neural network model for training, and, during the training of the preset neural network model, it is determined whether the preset neural network model comprises a batch normalization (BN) layer; if so, the number of BN layers in the preset neural network model is pruned; otherwise, the weights of the network layers of the preset neural network model are pruned.
As an optional implementation, the processor 301 is specifically configured to:
after the preset neural network model has been trained with n training samples, if the current preset neural network model is determined to comprise a BN layer, prune the number of BN layers of the current preset neural network model; if it is determined not to comprise a BN layer, prune the weights of the network layers of the current preset neural network model; n is a preset positive integer smaller than the total number of training samples.
As an optional implementation, the processor 301 is specifically configured to prune the weights of the network layers of the current preset neural network model, including:
pruning the weights of the network layers of the preset neural network model by adding a first penalty term to the loss function of the preset neural network model.
As an optional implementation, the first penalty term comprises a first adjustment coefficient, used to adjust the weight values of the i-th network layer in the preset neural network model, and a pruning weight value range;
the processor 301 is specifically configured to:
adjust, for the i-th network layer of the preset neural network model, the weight values of the i-th network layer according to the first adjustment coefficient in the first penalty term;
and prune the weights of the i-th network layer of the preset neural network model according to the pruning weight value range in the first penalty term.
As an optional implementation, the processor 301 is specifically configured to:
adjust the weight values of the i-th network layer in the preset neural network model by dynamically adjusting the first adjustment coefficient in the first penalty term, such that the number of weights of the i-th layer whose values fall within the pruning weight value range is maximized.
As an optional implementation, the first penalty term L1 is:
L1 = λ1 · ∑ |x_i|, where the sum runs over the N weights x_i of the i-th network layer that satisfy ε1_i < |x_i| ≤ C1_i;
wherein [ε1_i, C1_i] is the pruning weight value range of the i-th network layer, N is the number of weights of the i-th layer whose values fall within that range, and λ1 is the first adjustment coefficient.
As an optional implementation, the processor 301 is specifically configured to prune the number of BN layers of the current preset neural network model, including:
pruning the number of BN layers of the preset neural network model by adding a second penalty term for the BN layers of the preset neural network model.
As an optional implementation, the second penalty term comprises a second adjustment coefficient, used to adjust the output values of the BN layers in the preset neural network model, and a pruned-BN-layer output value range;
the processor 301 is specifically configured to:
adjust, for the BN layers of the preset neural network model, the BN layer output values according to the second adjustment coefficient in the second penalty term;
and prune BN layers of the preset neural network model according to the pruned-BN-layer output value range in the second penalty term.
As an optional implementation, the processor 301 is specifically configured to:
adjust the BN layer output values in the preset neural network model by dynamically adjusting the second adjustment coefficient in the second penalty term, such that the number of BN layers whose output values fall within the pruned-BN-layer output value range is maximized.
As an optional implementation, the second penalty term L2 is: L2 = λ2 · ∑k∈τ g(k);
where g(k) is a BN layer output value satisfying ε2 < |g(k)| ≤ C2, [ε2, C2] is the pruned-BN-layer output value range, and λ2 is the second adjustment coefficient.
Based on the same inventive concept, this embodiment further provides a target identification apparatus. As shown in fig. 4, the apparatus comprises:
an image data acquisition unit 401, configured to acquire image data for target identification;
a target identification unit 402, configured to input the image data into a neural network model for target identification to obtain an identification result, wherein the neural network model is obtained by inputting training samples into a preset neural network model for training, and, during the training of the preset neural network model, it is determined whether the preset neural network model comprises a batch normalization (BN) layer; if so, the number of BN layers in the preset neural network model is pruned; otherwise, the weights of the network layers of the preset neural network model are pruned.
As an optional implementation, the target identification unit 402 is specifically configured to:
after the preset neural network model has been trained with n training samples, if the current preset neural network model is determined to comprise a BN layer, prune the number of BN layers of the current preset neural network model; if it is determined not to comprise a BN layer, prune the weights of the network layers of the current preset neural network model; n is a preset positive integer smaller than the total number of training samples.
As an optional implementation, the target identification unit 402 is specifically configured to prune the weights of the network layers of the current preset neural network model, including:
pruning the weights of the network layers of the preset neural network model by adding a first penalty term to the loss function of the preset neural network model.
As an optional implementation, the first penalty term comprises a first adjustment coefficient, used to adjust the weight values of the i-th network layer in the preset neural network model, and a pruning weight value range;
the target identification unit 402 is specifically configured to:
adjust, for the i-th network layer of the preset neural network model, the weight values of the i-th network layer according to the first adjustment coefficient in the first penalty term;
and prune the weights of the i-th network layer of the preset neural network model according to the pruning weight value range in the first penalty term.
As an optional implementation, the target identification unit 402 is specifically configured to:
adjust the weight values of the i-th network layer in the preset neural network model by dynamically adjusting the first adjustment coefficient in the first penalty term, such that the number of weights of the i-th layer whose values fall within the pruning weight value range is maximized.
As an optional implementation, the first penalty term L1 is:
L1 = λ1 · ∑ |x_i|, where the sum runs over the N weights x_i of the i-th network layer that satisfy ε1_i < |x_i| ≤ C1_i;
wherein [ε1_i, C1_i] is the pruning weight value range of the i-th network layer, N is the number of weights of the i-th layer whose values fall within that range, and λ1 is the first adjustment coefficient.
As an optional implementation, the target identification unit 402 is specifically configured to prune the number of BN layers of the current preset neural network model, including:
pruning the number of BN layers of the preset neural network model by adding a second penalty term for the BN layers of the preset neural network model.
As an optional implementation, the second penalty term comprises a second adjustment coefficient, used to adjust the output values of the BN layers in the preset neural network model, and a pruned-BN-layer output value range;
the target identification unit 402 is specifically configured to:
adjust, for the BN layers of the preset neural network model, the BN layer output values according to the second adjustment coefficient in the second penalty term;
and prune BN layers of the preset neural network model according to the pruned-BN-layer output value range in the second penalty term.
As an optional implementation, the target identification unit 402 is specifically configured to:
adjust the BN layer output values in the preset neural network model by dynamically adjusting the second adjustment coefficient in the second penalty term, such that the number of BN layers whose output values fall within the pruned-BN-layer output value range is maximized.
As an optional implementation, the second penalty term L2 is: L2 = λ2 · ∑k∈τ g(k);
where g(k) is a BN layer output value satisfying ε2 < |g(k)| ≤ C2, [ε2, C2] is the pruned-BN-layer output value range, and λ2 is the second adjustment coefficient.
Example three
The embodiment of the present invention further provides a computer-readable non-volatile storage medium comprising program code which, when run on a computing terminal, causes the computing terminal to execute the steps of the method according to embodiment one of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A target identification method, comprising:
acquiring image data for target identification;
inputting the image data into a neural network model for target identification to obtain an identification result, wherein the neural network model is obtained by inputting training samples into a preset neural network model for training, and, during the training of the preset neural network model, determining whether the preset neural network model comprises a batch normalization (BN) layer; if so, pruning the number of BN layers in the preset neural network model; otherwise, pruning the weights of the network layers of the preset neural network model;
wherein pruning the weights of the network layers of the preset neural network model comprises:
pruning the weights of the network layers of the preset neural network model by adding a first penalty term to the loss function of the preset neural network model;
wherein the first penalty term comprises a first adjustment coefficient, used to adjust the weight values of the i-th network layer in the preset neural network model, and a pruning weight value range;
for the i-th network layer of the preset neural network model, adjusting the weight values of the i-th network layer according to the first adjustment coefficient in the first penalty term;
and pruning the weights of the i-th network layer of the preset neural network model according to the pruning weight value range in the first penalty term;
and wherein pruning the number of BN layers of the preset neural network model comprises:
pruning the number of BN layers of the preset neural network model by adding a second penalty term for the BN layers of the preset neural network model;
wherein the second penalty term comprises a second adjustment coefficient, used to adjust the output values of the BN layers in the preset neural network model, and a pruned-BN-layer output value range;
for the BN layers of the preset neural network model, adjusting the BN layer output values according to the second adjustment coefficient in the second penalty term;
and pruning BN layers of the preset neural network model according to the pruned-BN-layer output value range in the second penalty term.
2. The method of claim 1,
each time the preset neural network model has been trained with n training samples, if the current preset neural network model is determined to comprise a BN layer, the number of BN layers of the current preset neural network model is pruned; if it is determined not to comprise a BN layer, the weights of the network layers of the current preset neural network model are pruned; and n is a preset positive integer smaller than the total number of training samples.
3. The method of claim 1,
the first adjustment coefficient in the first penalty term is dynamically adjusted so as to adjust the weight values of the i-th network layer in the preset neural network model, such that the number of weights of the i-th layer whose values fall within the pruning weight value range is maximized.
4. The method of claim 1,
the first penalty term L1 is:
L1 = λ1 · ∑ |x_i|, where the sum runs over the N weights x_i of the i-th network layer that satisfy ε1_i < |x_i| ≤ C1_i;
wherein [ε1_i, C1_i] is the pruning weight value range of the i-th network layer, N is the number of weights of the i-th layer whose values fall within that range, and λ1 is the first adjustment coefficient.
5. The method of claim 1,
the second adjustment coefficient in the second penalty term is dynamically adjusted so as to adjust the BN layer output values in the preset neural network model, such that the number of BN layers whose output values fall within the pruned-BN-layer output value range is maximized.
6. The method of claim 1,
the second penalty term L2 is: L2 = λ2 · ∑k∈τ g(k);
where k is a parameter used in the BN layer calculation to reconstruct the input data distribution from the normalized values, τ is the set of such k values, g(k) is a BN layer output value satisfying ε2 < |g(k)| ≤ C2, [ε2, C2] is the pruned-BN-layer output value range, and λ2 is the second adjustment coefficient.
7. A target identification device, comprising a processor and a memory, wherein the memory stores an executable program, and the processor implements the following when the executable program is executed:
acquiring image data for target identification;
inputting the image data into a neural network model for target identification to obtain an identification result, wherein the neural network model is obtained by inputting training samples into a preset neural network model for training, and, during the training of the preset neural network model, determining whether the preset neural network model comprises a batch normalization (BN) layer; if so, pruning the number of BN layers in the preset neural network model; otherwise, pruning the weights of the network layers of the preset neural network model;
wherein pruning the weights of the network layers of the preset neural network model comprises:
pruning the weights of the network layers of the preset neural network model by adding a first penalty term to the loss function of the preset neural network model;
wherein the first penalty term comprises a first adjustment coefficient, used to adjust the weight values of the i-th network layer in the preset neural network model, and a pruning weight value range;
for the i-th network layer of the preset neural network model, adjusting the weight values of the i-th network layer according to the first adjustment coefficient in the first penalty term;
and pruning the weights of the i-th network layer of the preset neural network model according to the pruning weight value range in the first penalty term;
and wherein pruning the number of BN layers of the preset neural network model comprises:
pruning the number of BN layers of the preset neural network model by adding a second penalty term for the BN layers of the preset neural network model;
wherein the second penalty term comprises a second adjustment coefficient, used to adjust the output values of the BN layers in the preset neural network model, and a pruned-BN-layer output value range;
for the BN layers of the preset neural network model, adjusting the BN layer output values according to the second adjustment coefficient in the second penalty term;
and pruning BN layers of the preset neural network model according to the pruned-BN-layer output value range in the second penalty term.
8. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN201910671904.XA 2019-07-24 2019-07-24 Target identification method and device Active CN110399918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910671904.XA CN110399918B (en) 2019-07-24 2019-07-24 Target identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910671904.XA CN110399918B (en) 2019-07-24 2019-07-24 Target identification method and device

Publications (2)

Publication Number Publication Date
CN110399918A CN110399918A (en) 2019-11-01
CN110399918B (en) 2021-11-19

Family

ID=68324992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910671904.XA Active CN110399918B (en) 2019-07-24 2019-07-24 Target identification method and device

Country Status (1)

Country Link
CN (1) CN110399918B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197666A (en) * 2018-01-30 2018-06-22 咪咕文化科技有限公司 A kind of processing method, device and the storage medium of image classification model
CN109409507A (en) * 2018-10-31 2019-03-01 上海鹰瞳医疗科技有限公司 Neural network construction method and equipment
CN109598340A (en) * 2018-11-15 2019-04-09 北京知道创宇信息技术有限公司 Method of cutting out, device and the storage medium of convolutional neural networks
CN109671020A (en) * 2018-12-17 2019-04-23 北京旷视科技有限公司 Image processing method, device, electronic equipment and computer storage medium


Also Published As

Publication number Publication date
CN110399918A (en) 2019-11-01

Similar Documents

Publication Publication Date Title
TWI794157B (en) Automatic multi-threshold feature filtering method and device
CN109271876B (en) Video motion detection method based on time evolution modeling and multi-example learning
CN110083728B (en) Method, device and system for optimizing automatic picture data cleaning quality
CN112101547B (en) Pruning method and device for network model, electronic equipment and storage medium
Lin et al. Fairgrape: Fairness-aware gradient pruning method for face attribute classification
CN111695624B (en) Updating method, device, equipment and storage medium of data enhancement strategy
CN112990420A (en) Pruning method for convolutional neural network model
CN107748898A (en) File classifying method, device, computing device and computer-readable storage medium
CN115511069A (en) Neural network training method, data processing method, device and storage medium
CN110929836A (en) Neural network training and image processing method and device, electronic device and medium
CN110929028A (en) Log classification method and device
CN112085157A (en) Prediction model establishing method and device based on neural network and tree model
CN112801906A (en) Cyclic iterative image denoising method based on cyclic neural network
CN109344968A (en) A kind of method and device of the hyper parameter processing of neural network
Pietron et al. Retrain or not retrain?-efficient pruning methods of deep cnn networks
CN113139570A (en) Dam safety monitoring data completion method based on optimal hybrid valuation
CN110399918B (en) Target identification method and device
CN111783936B (en) Convolutional neural network construction method, device, equipment and medium
CN115170902B (en) Training method of image processing model
CN116992941A (en) Convolutional neural network pruning method and device based on feature similarity and feature compensation
CN116048785A (en) Elastic resource allocation method based on supervised learning and reinforcement learning
CN115640893A (en) Industrial data prediction method and device of industrial chain and storage medium
CN112668639A (en) Model training method and device, server and storage medium
CN114626527B (en) Neural network pruning method and device based on sparse constraint retraining
CN111210009A (en) Information entropy-based multi-model adaptive deep neural network filter grafting method, device and system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240508

Address after: Room 6227, No. 999, Changning District, Shanghai 200050

Patentee after: Shenlan robot (Shanghai) Co.,Ltd.

Country or region after: China

Address before: Unit 1001, 369 Weining Road, Changning District, Shanghai, 200336 (9th floor of actual floor)

Patentee before: DEEPBLUE TECHNOLOGY (SHANGHAI) Co.,Ltd.

Country or region before: China