CN108197669B

CN108197669B - Feature training method and device of convolutional neural network

Info

Publication number: CN108197669B
Application number: CN201810096726.8A
Authority: CN
Inventors: 张默; 刘彬; 孙伯元
Original assignee: Beijing Moshanghua Technology Co ltd
Current assignee: Beijing Moshanghua Technology Co ltd
Priority date: 2018-01-31
Filing date: 2018-01-31
Publication date: 2021-04-30
Anticipated expiration: 2038-01-31
Also published as: CN108197669A

Abstract

The application discloses a feature training method and device of a convolutional neural network. The feature training method comprises the following steps: extracting a first characteristic picture; determining a feature map of the first feature picture, and acquiring a first feature according to the feature map; calculating a loss value of a loss function using the first feature as an input; and updating the convolutional neural network according to the loss value. The method and the device solve the technical problems that the loss objective function can not ensure that the intra-class distance is relatively closer and the inter-class distance is relatively farther.

Description

Feature training method and device of convolutional neural network

Technical Field

The application relates to the field of computers, in particular to a method and a device for training characteristics of a convolutional neural network.

Background

The convolutional neural network has good performance in the field of computer vision, and is particularly applied to the fields of object identification, object detection, object segmentation and the like. Training a convolutional neural network, and using a layer of convolutional layer and an active layer to be stacked, the strong visual representation capability can be realized, wherein the convolutional neural network structure is composed of two parts: convolutional network, objective loss function.

The inventors have found that there are some penalty functions in convolutional neural networks, which have the disadvantage that it is difficult to ensure that the distances within a class are closer and the distances between classes are further apart. If this premise is guaranteed, the features proposed by the trained network can be more representative. In addition, some loss functions ensure that the distance in the classes is closer, but do not ensure that the distance between the classes is farther, and simultaneously influence the accuracy of object recognition, so the method is widely used in the field of face classification. There are also some penalty functions that guarantee both closer intra-class distance and farther inter-class distance, however the problem is that the training process is difficult to converge if some noise is present in the training data itself.

Aiming at the problem that the loss objective function in the related technology can not ensure that the intra-class distance is relatively closer and the inter-class distance is relatively farther, an effective solution is not provided at present.

Disclosure of Invention

The present application mainly aims to provide a feature training method for a convolutional neural network to solve the problem.

In order to achieve the above object, according to one aspect of the present application, there is provided a feature training method of a convolutional neural network, including: extracting a first characteristic picture; determining a feature map of the first feature picture, and acquiring a first feature according to the feature map; calculating a loss value of a loss function using the first feature as an input; and updating the convolutional neural network according to the loss value; wherein the loss function is used to make the features trained in the updated convolutional neural network conform to a preset category.

Further, calculating the loss value of the loss function includes: configuring a first loss function, wherein the first loss function is used as a combined loss function of Softmax and cross entropy; configuring a second loss function, wherein the second loss function is used as an angle loss function.

Further, calculating the loss value of the loss function includes:

wherein the content of the first and second substances,

denotes y_iCorresponding weight, N represents the number of input pictures;

and calculating an average value obtained by adding all probabilities corresponding to the N input pictures through a loss function.

Calculating the loss value of the loss function includes:

wherein the content of the first and second substances,

denotes y_iCorresponding weight, N denotes the number of input pictures, y_iRepresenting the corresponding category of each input picture;

calculating N pictures by loss function

Average value of (a).

Further, updating the convolutional neural network according to the loss value further includes: inputting a second picture to be tested; obtaining a corresponding second characteristic through the convolutional neural network after the loss value is updated; calculating a loss value of a loss function using the second feature as an input; and determining the category of the object corresponding to the second picture.

Further, the loss function is used to make the features trained in the updated convolutional neural network conform to preset categories as follows: the intra-class distance of the feature; inter-class distance of features.

In order to achieve the above object, according to another aspect of the present application, there is provided a feature training apparatus of a convolutional neural network.

The feature training device of the convolutional neural network according to the present application includes: the extraction unit is used for extracting a first characteristic picture; the determining unit is used for determining a feature map of the first feature picture and acquiring a first feature according to the feature map; a loss function unit for calculating a loss value of a loss function using the first feature as an input; the reverse unit is used for updating the convolutional neural network according to the loss value; wherein the loss function is used to make the features trained in the updated convolutional neural network conform to a preset category.

Further, the loss function unit includes: a first loss function unit and a second loss function unit, wherein the first loss function unit is used for being a combined loss function of Softmax and cross entropy; and the second loss function unit is used as an angle loss function.

Further, the apparatus further comprises: the test unit is used for inputting a second picture to be tested; obtaining a corresponding second characteristic through the convolutional neural network after the loss value is updated; calculating a loss value of a loss function using the second feature as an input; and determining the category of the object corresponding to the second picture.

Further, the inverse unit is further configured to make the features trained in the updated convolutional neural network conform to a preset through a loss function: the features are closer in-class distance; the inter-class distance of features is further.

In the embodiment of the application, a mode of optimizing feature training in the convolutional neural network is adopted, the loss function is used for enabling the features trained in the updated convolutional neural network to accord with the preset categories, the purpose of training the recognition capability to be stronger is achieved, the technical effect of training the features with stronger recognition capability is achieved, and the technical problems that the loss objective function cannot guarantee that the intra-class distance is relatively closer and the inter-class distance is relatively farther are solved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:

FIG. 1 is a schematic diagram of a feature training method for a convolutional neural network according to a first embodiment of the present application;

FIG. 2 is a schematic diagram of a feature training method for a convolutional neural network according to a second embodiment of the present application;

FIG. 3 is a schematic diagram of a feature training method for a convolutional neural network according to a third embodiment of the present application; and

FIG. 4 is a schematic diagram of a feature training apparatus for convolutional neural networks according to a preferred embodiment of the present application.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Many loss functions have been proposed so far, the original Softmax combined with cross entropy, which has the disadvantage that it is difficult to ensure that the distance within a class is closer and the distance between classes is further, and if this premise is guaranteed, the feature proposed by the trained network can be more characteristic.

The Center-Loss is proposed later, which can ensure that the distance in the class is closer, but does not ensure that the distance between the classes is farther, and simultaneously, the Center-Loss can influence the accuracy rate of object identification, so that the Center-Loss is widely used in the field of face classification; L-Softmax was proposed later, which guarantees both closer intra-class distance and farther inter-class distance, but it has the problem that the training process is difficult to converge if some noise is present in the training data itself.

According to the method, the optimized feature training mode in the convolutional neural network is adopted, the loss function is used for enabling the features trained in the updated convolutional neural network to accord with the preset categories, the purpose of stronger recognition capability is achieved, and the technical effect of training the features with stronger recognition capability is achieved.

The method in the embodiment of the application uses an angle-based loss function, which is mainly used in a training process of object recognition based on a deep learning convolutional neural network, wherein the main functions are shown as follows: a. the characterization capability of the trained features is stronger, namely the intra-class distance is closer, and the inter-class distance is farther; b. and on the premise of ensuring the establishment of the a, ensuring the convergence of the neural network training process.

(4) The target loss function related to the method can be used for model training of tasks except for object recognition, including object detection, object segmentation and the like.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

As shown in fig. 1, the method includes steps S102 to S108 as follows:

step S102, extracting a first characteristic picture;

inputting N pictures, and then carrying out normalization preprocessing on the N pictures to enable all pixel values to be between [ -1,1 ]; and then input into a convolutional neural network.

Because the convolutional neural network structure comprises a plurality of convolutional layers, an activation layer is connected behind each convolutional layer in the convolutional neural network, and a corresponding characteristic diagram is obtained after each convolutional layer passes through each convolutional layer.

And inputting the normalized data into a convolutional neural network to obtain a corresponding characteristic diagram.

Step S104, determining a feature map of the first feature picture, and acquiring a first feature according to the feature map;

determining the feature map of the feature picture means obtaining the feature map of the feature picture according to the number of channels of the feature map and the length and width of the feature map.

For example, let the size of each feature map be c × h × w, where c is the number of channels of the feature map, and h and w are the length and width of the feature map, since N pictures are input, that is, N feature maps can be obtained finally.

Step S106, taking the first characteristic as input, and calculating a loss value of a loss function;

and taking the multiple feature graphs as input, and obtaining dimensional multi-features as input through a full connection layer in the convolutional neural network.

For example, N feature maps are used as input, and NxM dimensional features are obtained through a full connection layer. I.e. N features, corresponding to N pictures, each feature being M-dimensional.

The calculation of the loss value of the loss function is to calculate the loss value of the loss function using the NxM-dimensional features and the class label of the picture as input.

Step S108, updating the convolutional neural network according to the loss value;

the loss function is used for enabling the features trained in the updated convolutional neural network to accord with preset categories.

The feature matching the predetermined category may be to ensure that the distance between the same (intra-class) features is closer and the distance between the different (inter-class) features is farther.

Specifically, the loss values of the loss functions are calculated, wherein two loss functions are included, the first loss function is a combination of Softmax and cross entropy, and the second loss function is an angle loss function.

From the above description, it can be seen that the present invention achieves the following technical effects:

in the embodiment of the application, a mode of optimizing feature training in the convolutional neural network is adopted, the loss function is used for enabling the features trained in the updated convolutional neural network to accord with the preset categories, the purpose of training the recognition capability to be stronger is achieved, the technical effect of training the features with stronger recognition capability is achieved, and the technical problems that the loss objective function cannot guarantee that the intra-class distance is relatively closer and the inter-class distance is relatively farther are solved. In the method in the embodiment of the application, more hyper-parameters are not introduced during training, so that the cost of manual parameter adjustment is reduced, and the use amount of video memory and memory is not obviously increased during training.

In the test process in the embodiment of the application, the extracted picture features can be used in the fields of object identification, object retrieval and the like.

According to the embodiment of the present invention, as a preferable feature in the embodiment, as shown in fig. 2, the calculating of the loss value of the loss function includes:

step S202, configuring a first loss function,

the first loss function is used as a combined loss function of Softmax and cross entropy;

calculating the loss value of the loss function includes:

wherein the content of the first and second substances,

denotes y_iCorresponding weight, N represents the number of input pictures;

The loss function means

Wherein f is the first feature obtained,

is a weight vector corresponding to class i, so

Is a category y_iCorresponding weight vectors (in this application, M categories are provided, each input picture corresponding to a particular category y_i)，y_iAs the real category corresponding to the input picture.

By passing

Multiplying by f yields a fraction, and

wherein the expression of (1) represents that f is judged to be y_iThe probability over the category of (a).

Step S204, configuring a second loss function,

the second loss function is used as an angular loss function.

Calculating the loss value of the loss function includes:

wherein the content of the first and second substances,

calculating N pictures by loss function

Average value of (a).

Wherein f is the first characteristic obtained by the method,

Represents

Cosine of the angle of f

In the range of [ -1,1]In the above-mentioned manner,the closer to 1, the

The smaller the angle between the vector and the f-feature vector.

The loss function LossFunction calculates N pictures

Average value of (1) can be

The angle with f is as small as possible.

According to the embodiment of the present invention, as shown in fig. 3, as a preferable option in the embodiment, after updating the convolutional neural network according to the loss value, the method further includes:

step S302, inputting a second picture to be tested;

inputting pictures to be tested, wherein the number of the pictures can be N (N > ═ 1), and obtaining corresponding characteristics through the trained neural network.

Step S304, obtaining a corresponding second characteristic through the convolutional neural network after the loss value is updated;

since after the loss value is calculated in step S108, all parameters of the entire network are updated using back propagation. Therefore, the picture to be tested is input into the updated convolutional neural network to obtain the corresponding characteristic diagram.

Step S306, taking the second characteristic as input, and calculating a loss value of a loss function;

input loss function through combination of Softmax and cross entropy

Function of angular loss

A loss value of the loss function is calculated.

Step S308, determining the category of the object corresponding to the second picture.

And (3) in the testing stage, the characteristic passes through a Softmax layer to obtain the probability of all known classes (the probability is added to be 1), and the class with the highest probability is selected as the class of the object corresponding to the picture.

Preferably, in this embodiment, the loss function is used to make the features trained in the updated convolutional neural network conform to preset categories as follows: the intra-class distance of the feature; inter-class distance of features.

And the loss function is used for ensuring that the intra-class distance of the features is closer and the inter-class distance of the features is farther when the training features in the updated convolutional neural network accord with the preset.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.

According to an embodiment of the present invention, there is also provided an apparatus for implementing the above feature training method for a convolutional neural network, as shown in fig. 4, the apparatus includes: an extraction unit 10, configured to extract a first feature picture; a determining unit 20, configured to determine a feature map of the first feature picture, and obtain a first feature according to the feature map; a loss function unit 30 configured to calculate a loss value of a loss function using the first feature as an input; an inverse unit 40, configured to update the convolutional neural network according to the loss value; wherein the loss function is used to make the features trained in the updated convolutional neural network conform to a preset category.

In the extraction unit 10 of the embodiment of the application, all pixel values are between [ -1,1] by inputting N pictures and then performing normalization preprocessing on the N pictures; and then input into a convolutional neural network.

Determining the feature map of the feature picture in the determining unit 20 in the embodiment of the present application means obtaining the feature map of the feature picture according to the number of channels of the feature map and the length and width of the feature map.

In the loss function unit 30 of the embodiment of the present application, a plurality of feature maps are used as input, and a dimensional multi-feature is obtained as input through a full connection layer in a convolutional neural network.

The loss function in the inverse unit 40 of the embodiment of the present application is used to make the features trained in the updated convolutional neural network conform to the preset categories.

As a preference in the present embodiment, the loss function unit 30 includes: a first loss function unit and a second loss function unit, wherein the first loss function unit is used for being a combined loss function of Softmax and cross entropy; and the second loss function unit is used as an angle loss function.

In the first loss function unit, calculating the loss value of the loss function includes:

wherein the content of the first and second substances,

denotes y_iCorresponding weight, N represents the number of input pictures;

The loss function means

Wherein f is the first feature obtained,

is a weight vector corresponding to class i, so

By passing

Multiplying by f yields a fraction, and

Calculating the loss value of the loss function in the second loss function unit includes:

wherein the content of the first and second substances,

calculating N pictures by loss function

Average value of (a).

Wherein f is the first characteristic obtained by the method,

Represents

Cosine of the angle of f

In the range of [ -1,1]The closer to 1, the

The smaller the angle between the vector and the f-feature vector.

The loss function LossFunction calculates N pictures

Average value of (1) can be

The angle with f is as small as possible.

As a preference in the present embodiment, the present invention further includes: the test unit is used for inputting a second picture to be tested; obtaining a corresponding second characteristic through the convolutional neural network after the loss value is updated; calculating a loss value of a loss function using the second feature as an input; and determining the category of the object corresponding to the second picture.

The test unit of the embodiment of the application inputs the pictures to be tested, the number of the pictures can be N (N > ═ 1), and the corresponding features are obtained through the trained neural network.

Input loss function through combination of Softmax and cross entropy

Function of angular loss

A loss value of the loss function is calculated.

The device for implementing the feature training method of the convolutional neural network trains the features with stronger recognition capability, and ensures that the intra-class distance of the features is closer and the inter-class distance of the features is farther. The training of the features is mainly based on an angle optimization loss function, combined with a Softmax cross entropy loss function, and compared with the features obtained by the traditional method only using Softmax cross entropy, the recognition rate of the features obtained by training in the device of the embodiment of the application is improved by 1% on data sets such as Cifar10 and Cifar100, the recognition accuracy of the training model on the two data sets by the original method is respectively 92.5% and 69.24%, and the recognition accuracy in the device of the embodiment of the application is 93.7% and 72%.

Compared with L-Softmax, the method is easier to train, the L-Softmax method has strong constraint on the features, the advantage is that the features with higher recognition rate can be trained, but the problem that the training process is difficult to converge can be met.

Specifically, in the apparatus of the embodiment of the present application, the feature training method of the neural network is performed as follows:

the method mainly aims at object recognition based on the deep learning convolutional neural network, and comprises a training stage and a testing stage, wherein the method is mainly used for the training stage and helps to train a model with stronger recognition capability;

a training stage: taking the whole convolutional neural network as two parts, wherein the first part is used for extracting features, and the second part is used for calculating loss functions of the features and optimizing the loss functions;

s1 inputting N pictures, wherein N is the number of the input pictures in batch processing, and the N pictures are subjected to normalization preprocessing to enable all pixel values to be between [ -1,1 ].

S2, a convolutional neural network structure is adopted, the convolutional neural network structure is composed of a plurality of convolutional layers, an activation layer is connected behind each convolutional layer, a corresponding characteristic diagram can be obtained after each convolutional layer passes through, the specific number and structure of convolutional layers can be changed according to specific tasks, and only the output of the last convolutional neural network is needed;

s3, obtaining a final feature map, wherein the size of each feature map is cxhxw, c is the channel number of the feature map, and h and w are the length and width of the feature map, and because N pictures are input, N feature maps are obtained finally;

S4N feature graphs are used as input, and NxM dimensional features, namely N features, are obtained through a full connection layer and correspond to the N pictures, wherein each feature is M-dimensional;

s5, taking the last NxM feature and the class label of the picture as input, and calculating a loss value of a loss function, where the loss value includes two loss functions, a first loss function is a combination of Softmax and cross entropy, and a second loss function is an angle loss function, and the specific formula is as follows:

s6, after calculating the loss value, updating all parameters of the whole network by using back propagation;

testing phase

S1, inputting pictures to be tested, wherein the number of the pictures is N (N > -1), and obtaining corresponding characteristics through the trained neural network;

and S2, the characteristic is subjected to a Softmax layer to obtain the probability of all known classes (the probability is added to be 1), and the class with the highest probability is selected as the class of the object corresponding to the picture.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for training features of a convolutional neural network, comprising:

extracting a first characteristic picture;

determining a feature map of the first feature picture, and acquiring a first feature according to the feature map; calculating a loss value of a loss function using the first feature as an input, wherein calculating the loss value of the loss function comprises: configuring a first loss function, wherein the first loss function is used as a combined loss function of Softmax and cross entropy; configuring a second loss function, wherein the second loss function is used as an angle loss function; and

updating the convolutional neural network according to the loss value;

the loss function is used for enabling the features trained in the updated convolutional neural network to accord with preset categories;

obtaining the probability of the category of the first feature through a first loss function;

reducing the intra-class distance of the features through a second loss function, and increasing the inter-class distance of the features;

the extracting the first feature picture comprises:

inputting N pictures, performing normalization pretreatment on the N pictures, and inputting the normalized pictures into a convolutional neural network to obtain a corresponding characteristic diagram;

taking the first feature as an input, calculating a loss value of a loss function comprises:

n characteristic graphs are used as input, N multiplied by M dimensional characteristics are obtained through a full connection layer, and N characteristics correspond to N pictures and each characteristic is M-dimensional; calculating a loss value of a loss function by taking the characteristics of the dimension N multiplied by M and the class label of the picture as input;

wherein calculating the loss value of the loss function comprises: calculating a loss value of the first loss function and calculating a loss value of the second loss function;

calculating the loss value of the first loss function includes:

calculating an average value obtained by adding all probabilities corresponding to the N input pictures through a loss function;

the loss function means

Wherein f is the first feature obtained,

is a weight vector corresponding to class i, so

Is a category y_iCorresponding weight vector, M categories, each input picture corresponding to a specific category y_i，y_iAs the real category corresponding to the input picture;

by passing

Multiplying by f yields a fraction, and

wherein the expression of (1) represents that f is judged to be y_iThe probability over the category of (1);

calculating the loss value of the second loss function includes:

calculating N pictures by loss function

Average value of (d);

a loss function of

Wherein f is the first feature obtained,

is a category y_iCorresponding weight vector, M categories, each input picture corresponding to a specific category y_i，y_iAs the real category to which the input picture corresponds,

represents

Cosine of the angle of f

In the range of [ -1,1]The closer to 1, the

The smaller the angle between the vector and the f-feature vector.

2. The feature training method of claim 1, further comprising, after updating the convolutional neural network according to the loss value:

inputting a second picture to be tested;

obtaining a corresponding second characteristic through the convolutional neural network after the loss value is updated;

calculating a loss value of a loss function using the second feature as an input;

and determining the category of the object corresponding to the second picture.

3. The feature training method according to any one of claims 1-2, wherein the loss function is used to make the trained features in the updated convolutional neural network conform to preset categories as follows:

the features are closer in-class distance;

the inter-class distance of features is further.

4. A convolutional neural network feature training apparatus, comprising:

the extraction unit is used for extracting a first characteristic picture;

the determining unit is used for determining a feature map of the first feature picture and acquiring a first feature according to the feature map;

a loss function unit configured to calculate a loss value of a loss function using the first feature as an input, wherein the loss function unit includes: a first loss function unit and a second loss function unit,

the first loss function unit is used for being a combined loss function of Softmax and cross entropy;

the second loss function unit is used as an angle loss function;

the reverse unit is used for updating the convolutional neural network according to the loss value; the loss function is used for enabling the features trained in the updated convolutional neural network to accord with preset categories;

the extracting the first feature picture comprises:

taking N feature graphs as input, and obtaining N multiplied by M dimensional features, namely N features through a full connection layer, wherein each feature corresponds to N pictures and is M-dimensional; calculating a loss value of a loss function by taking the characteristics of the dimension N multiplied by M and the class label of the picture as input;

calculating the loss value of the first loss function includes:

the loss function means

Wherein f is the first feature obtained,

is a weight vector corresponding to class i, so

by passing

Multiplying by f yields a fraction, and

calculating the loss value of the second loss function includes:

calculating N pictures by loss function

Average value of (d);

a loss function of

Wherein f is the first feature obtained,

represents

Cosine of the angle of f

In the range of [ -1,1]The closer to 1, the

The smaller the angle between the vector and the f-feature vector.

5. The feature training device according to claim 4, further comprising: the test unit is used for inputting a second picture to be tested;

and determining the category of the object corresponding to the second picture.

6. The feature training apparatus according to claim 4, wherein the inverse unit is further configured to make the features trained in the updated convolutional neural network conform to a preset value through a loss function: the features are closer in-class distance; the inter-class distance of features is further.