CN111860054A

CN111860054A - Convolutional network training method and device

Info

Publication number: CN111860054A
Application number: CN201910348698.9A
Authority: CN
Inventors: 侯国梁
Original assignee: Potevio Information Technology Co Ltd
Current assignee: Potevio Information Technology Co Ltd
Priority date: 2019-04-28
Filing date: 2019-04-28
Publication date: 2020-10-30

Abstract

The embodiment of the invention discloses a convolutional network training method and a convolutional network training device. The method comprises the following steps: for each batch of mini-batch training pictures, after extracting the feature vector of each picture by using a deep convolutional network, calculating a corresponding loss value according to the feature vector, and determining a corresponding diffusivity loss value based on a diffusion principle; and correcting the loss value according to the diffusivity loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures. The invention can effectively solve the problem that the training performance is sharply reduced when the data scale is large.

Description

Convolutional network training method and device

Technical Field

The invention relates to the technical field of convolutional networks, in particular to a convolutional network training method and a convolutional network training device.

Background

In recent years, the convolution deep learning technology gradually plays an important role in various projects. In summary, in addition to reinforcement learning and multi-network models represented by generative countermeasure networks (GANs), simple end-to-end deep convolutional networks mainly solve three problems in the visual sense: classification problems, regression problems and feature-based distance similarity problems.

The distance similarity of the deep convolutional network is used for solving the problem of similarity between images, such as face recognition, image searching and the like, even the classification problem is indirectly achieved through a threshold value.

The existing network training method generally utilizes convolution to extract features, designs a bottleeck layer in a mode of constructing a positive sample pair and a negative sample pair, and realizes that the positive samples can be close to the whole network and the negative samples can be far away from the deep convolution network with certain similarity recognition capability through different distance algorithms.

In the process of implementing the invention, the inventor finds that: various algorithms of the current network training scheme mainly consider the realization of distance similarity distinction in the design process, but rarely consider the adoption of a mode of separating different classes as far as possible. In this way, although the existing network training scheme has a good data prediction effect on a certain scale, when the data is increased to a certain scale and the vectors are densely distributed in the dimensional space, the degree of distinction between the vectors is rapidly reduced due to the spatial saturation, so that the performance is rapidly reduced.

For example, assuming that the sample output is a two-dimensional vector (x1, x2), a block diagram of a normally trained distance estimation implementation is shown in fig. 1 by using the existing network training method. The mapping of the corresponding vector to a two-dimensional space is schematically illustrated in fig. 2. As can be seen from the above figures, although the trained deep convolutional network can correctly distinguish positive and negative samples and has a good effect on similar picture aggregation, in the inference process, the number of pictures increases, and although there is a distinction degree, it cannot make (gf), (ab), and (cd) reach the situation of zooming out as far as possible.

Disclosure of Invention

The embodiment of the invention provides a convolutional network training method and a convolutional network training device, which can solve the problem that the training performance is sharply reduced when the data scale is large.

The technical scheme of the embodiment of the invention is as follows:

a convolutional network training method, comprising:

for each batch of mini-batch training pictures, after extracting the feature vector of each picture by using a deep convolutional network, calculating a corresponding loss value according to the feature vector, and determining a corresponding diffusivity loss value based on a diffusion principle;

and correcting the loss value according to the diffusivity loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures.

In one embodiment, the determining the respective diffusivity loss values based on the diffusion principle includes:

for each class c corresponding to the current mini-batch training sample, according to the class c feature vector corresponding to the mini-batch, the mean value of the current class c feature vector is calculated

Updating, wherein c is the serial number of the class;

calculating a central matrix M, wherein each element of the central matrix M is M (c, j), and M (c, j) is the mean value of the characteristic vectors of the c-th class

And mean of class j feature vectors

Middle point of line segment

Wherein

j is the number of a class other than c;

calculating a distance matrix D, each element of D being D (c, j), D (c, j) being

Wherein

ComputingDiffusivity loss value loss_σWherein

Wherein beta is a proportional function, and the numeric value range of beta is a real number of (0-1); k is the number of classes; i is the sample number of the c-th class participating in training in the mini-batch at present; n is a radical of_CThe number of the feature vectors of the c-th class participating in training in the mini-batch is determined;

the feature vector is the ith feature vector of the c-th class in the mini-batch; II |₂Representing the 2 norm of the vector x therein, max () representing the operation that takes the larger value as the result is performed on all dimensions of the vector, respectively.

In one embodiment, the mean value of the current class c feature vector is used

The updating comprises the following steps:

if it is currently said

Is an initial value of 0, the value is added

Updating the mean value of the c-th class feature vector participating in training in the mini-batch at present;

if it is currently said

If not, according to

Is updated

Wherein the content of the first and second substances,

taking the mean value of the c-th class feature vector participating in training in the mini-batch at present, wherein alpha is a preset weight coefficient; alpha is more than or equal to 0 and less than or equal to 1; equation Right side

To mean the class c eigenvectors before update, left side of equation

Is the mean value of the updated class c feature vector.

In one embodiment, said modifying said loss value according to said diffusivity loss value comprises:

according to loss ═ loss + loss_σCorrecting the loss value loss to obtain a corrected loss value loss', wherein loss_σThe diffusivity loss value.

A convolutional network training apparatus, comprising:

the system comprises a first unit, a second unit and a third unit, wherein the first unit is used for extracting a feature vector of each picture by utilizing a deep convolutional network for each batch of mini-batch training pictures, calculating a corresponding loss value according to the feature vector and determining a corresponding diffusivity loss value based on a diffusion principle;

and the second unit is used for correcting the loss value according to the diffusivity loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures.

In an embodiment, the first unit is configured to, for each class c corresponding to the current training sample of the mini-batch, average the current class c feature vector according to the class c feature vector corresponding to the present mini-batch

Updating, wherein c is the serial number of the class; calculating a central matrix M, wherein each element of the central matrix M is M (c, j), and M (c, j) is the mean value of the characteristic vectors of the c-th class

And mean of class j feature vectors

Middle point of line segment

Wherein

j is the number of a class other than c; calculating a distance matrix D, each element of D being D (c, j), D (c, j) being

Wherein

Calculating the diffusivity loss value loss_σWherein

the feature vector is the ith feature vector of the c-th class in the mini-batch; II |₂Representing the 2 norm of the vector x therein, max () representing the larger value performed on all dimensions of the vector, respectivelyThe resulting operation.

In one embodiment, the first unit is configured to determine if the current state is the current state

Is an initial value of 0, the value is added

Updating the mean value of the c-th class feature vector participating in training in the mini-batch at present; if it is currently said

If not, according to

Is updated

Wherein the content of the first and second substances,

To mean the class c eigenvectors before update, left side of equation

Is the mean value of the updated class c feature vector.

In one embodiment, the second unit is configured to determine a loss + loss according to the loss ═ loss +_σCorrecting the loss value loss to obtain a corrected loss value loss', wherein loss_σThe diffusivity loss value.

A convolutional network training apparatus, comprising:

a memory; and a processor coupled to the memory, the processor configured to perform the method of any of the above based on instructions stored in the memory.

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements a method as claimed in any one of the above.

According to the technical scheme, in the embodiment of the invention, the method and the device for training the convolutional network, provided by the invention, obtain the diffusivity loss value based on the diffusion principle, correct the conventional loss value obtained by the existing method by using the diffusivity loss value, and then perform back propagation adjustment on the parameters of the deep convolutional network by using the corrected loss value. Therefore, the features output by the deep convolutional network of the same category can be effectively pulled away as far as possible, the larger the data volume is, the larger the distinguishable degree area provided by the features is, and the more effective similarity matching can be carried out, so that the problem that the training performance is sharply reduced when the data scale is large can be effectively solved.

Drawings

Fig. 1 is a schematic diagram of a normally trained distance estimation implementation when a sample is output as a two-dimensional vector (x1, x2) based on a conventional network training method.

Fig. 2 is a schematic diagram of the mapping of a two-dimensional vector corresponding to fig. 1 to a two-dimensional space.

FIG. 3 is a flow chart of a method according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of penalty degree of the loss function according to the embodiment of the present invention.

Fig. 5 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings.

For simplicity and clarity of description, the invention will be described below by describing several representative embodiments. Numerous details of the embodiments are set forth to provide an understanding of the principles of the invention. It will be apparent, however, that the invention may be practiced without these specific details. Some embodiments are not described in detail, but rather are merely provided as frameworks, in order to avoid unnecessarily obscuring aspects of the invention. Hereinafter, "including" means "including but not limited to", "according to … …" means "at least according to … …, but not limited to … … only". In view of the language convention of chinese, the following description, when it does not specifically state the number of a component, means that the component may be one or more, or may be understood as at least one.

Fig. 3 is a schematic flow chart of a method according to an embodiment of the present invention, and as shown in fig. 3, the convolutional network training method implemented by the embodiment mainly includes:

step 301, for each batch of small training pictures (mini-batch), after extracting the feature vector of each picture by using a deep convolutional network, calculating a corresponding loss value according to the feature vector, and determining a corresponding diffusivity loss value based on a diffusion principle.

It should be noted here that in the prior art, the corresponding loss value is calculated only from the feature vector, and is not corrected any more. The difference between the step and the existing method is that the diffusivity loss value needs to be determined based on the diffusion principle, so that the loss value obtained by the conventional method is further corrected in the subsequent step, the features output by the deep convolution network of the same category are pulled far as possible, the larger the data volume is, the larger the distinguishable region provided by the features is, the more effective similarity matching can be performed, and the problem that the training performance is sharply reduced when the data scale is larger can be effectively solved.

In this step, the corresponding loss value may be calculated according to the feature vector by using the existing method, so as to obtain the conventional loss value, which is not described herein again.

In this embodiment, in order to improve training efficiency and reduce the overhead of computational resources, the mini-batch is used as a basic processing unit for training, that is, a loss value and a diffusivity loss value are calculated for each mini-batch.

Preferably, the following method can be used to determine the corresponding diffusivity loss value based on the diffusion principle:

step x1, for each class c corresponding to the training sample of the mini-batch at present, according to the class c feature vector corresponding to the mini-batch at present, carrying out average value treatment on the current class c feature vector

Updating is performed, wherein c is the number of the class.

Preferably, the following method can be adopted to average the current class c feature vector

Updating:

if it is currently said

Is an initial value of 0, the value is added

if it is currently said

If not, according to

Is updated

Wherein the content of the first and second substances,

taking the mean value of the c-th class feature vector participating in training in the mini-batch at present, wherein alpha is a preset weight coefficient; alpha is more than or equal to 0 and less than or equal to 1; equation of

Middle, right side

Mean of the class c eigenvectors before update, left side

Is the mean value of the updated class c feature vector.

As can be seen from the above method, the updated

Is the modified mean of the past mini-batch vector weighted by alpha.

Step x2, calculating a central matrix M, wherein each element of the M is M (c, j), and M (c, j) is the mean value of the feature vectors of the c-th class

And mean of class j feature vectors

Middle point of line segment

Wherein

j is the number of a class other than c.

Step x3, calculating a distance matrix D, wherein each element of D is D (c, j), and D (c, j) is

Wherein

Step x4, calculating the diffusivity loss value loss_σWherein

In the above method, considering the limitations of operation speed and memory and video memory in engineering training, the mini-batch is generally used to update the whole, so step X1 aims to complete all the mean tables through multiple rounds of mini-batch. The midpoint matrix of the respective types of the center points with respect to each other is calculated by the step X3, and the distances between the respective types of the center points are calculated by the step X4.

And 302, correcting the loss value according to the diffusivity loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures.

Preferably, the loss value can be corrected according to the diffusivity loss value by adopting the following method:

according to loss ═ loss + loss_σCorrecting the loss value loss to obtain a corrected loss value loss', wherein the loss value loss is_σThe diffusivity loss value.

The back propagation adjustment in this step can be implemented by using the prior art, and the specific method is known by those skilled in the art and is not described herein again.

Therefore, the loss function algorithm provided by the invention can effectively expand the features output by different category depth convolution networks as much as possible, when the loss function algorithm is used for similar face recognition and a clear multi-category object is used for searching a picture, the same category can be furthest zoomed out, the larger the data volume is, the larger the distinguishable region provided by the features is, and the more effective similarity matching can be carried out.

The present application is described in further detail below.

Assume that the sample set is a tagged picture dataset X ═ { X0, X1, X2 …. Xc …. XK-1}, where X0, X1, X2 …. Xc are subsets of X, corresponding to respective classes 0, 1, 2 … K-1 of the classification tag, respectively, for a total of K classes. For example, the following steps are carried out: assume that a group of face training pictures total K persons, each face picture set is Xc, and any Xc contains Nc face photos of the same person with different angular poses.

Definition of

The feature vector of the ith picture in the c class. For example, the feature vector extracted by the deep convolution network is used for the ith face picture of the c-th person.

One is maintained during the training process

Vector tables and two-dimensional matrices of M and D. The following are recorded and maintained: the elements M (c, j) of M are vectors

Element D (c, j) of D is a scalar

Furthermore, it is possible to provide a liquid crystal display device,

both the two-dimensional matrices of table M and D are initialized to 0.

When a mini-batch outputs a feature vector, the following operations are performed:

step 1: for each class c: if class c is

Is the initial value 0 and the mini-batch contains class c, then

I is the mean value of the class feature vectors participating in training, i is the serial number of samples in class c, and the number is Nc; if class c is

If the value is not 0 and the mini-batch contains class c, then command

Wherein

Mean value of the class of feature vectors participating in training

c is the class, i is the sample number of the class.

Is the modified mean of the past mini-batch vector weighted by alpha.

Step 2: calculating a two-dimensional matrix M, each element of M being a vector M (c, j)

The physical meaning of the vector is a mean vector of feature vectors of two classes c and j

And

middle point of line segment

M is a symmetric matrix, each element is a line segment central point formed by mean value centers of c, j types, and the diagonal matrix is 0.

For example, when k is 3, i.e. a class 3 case, M is exemplarily as follows:

and step 3: calculating the element D (c, j) of D as

Is shown as

Is defined as the 2 norm of

For example, when k is 3, i.e. a 3-class case, D is exemplary as follows:

and 4, step 4: during the current mini-batch training process, the loss function loss is calculated_σWherein beta is a proportional function, and the numeric value range of beta is a real number of (0-1); k is the number of classes; i is the sample number of the c-th class participating in training in the mini-batch at present; n is a radical of_CThe number of the feature vectors of the c-th class participating in training in the mini-batch is determined;

the feature vector is the ith feature vector of the c-th class in the mini-batch; II |₂Representing the 2 norm of the vector x therein, max () representing the operation that takes the larger value as the result is performed on all dimensions of the vector, respectively. Wherein:

and 5: according to loss ═ loss + loss_σCorrecting the loss value loss to obtain a corrected loss value loss', wherein loss_σIs the diffusivity loss value. And then, according to the corrected result, performing back propagation adjustment on the parameters of the deep convolutional network to finish the training of the small batch of training pictures.

Based on FIG. 4, let

Is a component of L. F (i, j, c) was analyzed for specific significance.

To the center point of class c, j

The distance of (a) to (b),

is composed of

Distance to a certain hypersphere, the sphere

Is a core and is composed of all points with equal distance to the core, and the hyperplane contains

When in use

When in the hyperplane, the component is effective, otherwise 0, for eliminating the distance rulerInfluence of degree, will be to the effective distance of the hypersphere

Is divided by

And (6) carrying out normalization. The normalized result is then scaled down by the a coefficient in order to release

A nearby feature space.

It is clear that, as shown in figure 4,

closer to the cj center point

The greater F (i, j, c), the greater the penalty function penalty.

As shown in fig. 5, the convolutional network training device includes:

In an embodiment, the first unit is configured to, for each class c corresponding to the training sample of the present mini-batch, average the current class c feature vector according to the class c feature vector corresponding to the present mini-batch

And mean of class j feature vectors

Middle point of line segment

Wherein

Wherein

Calculating the diffusivity loss value loss_σWherein:

In one embodiment, the first sheetElement for if said current

Is an initial value of 0, the value is added

If not, according to

Is updated

Wherein the content of the first and second substances,

To mean the class c eigenvectors before update, left side of equation

Is the mean value of the updated class c feature vector.

The invention also provides an embodiment of a convolutional network training device, which comprises:

a memory; and a processor coupled to the memory, the processor configured to perform any of the method embodiments described above based on instructions stored in the memory.

Accordingly, the present invention further provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out any of the above-mentioned method embodiments.

It should be noted that not all steps and modules in the above flows and structures are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The division of each module is only for convenience of describing adopted functional division, and in actual implementation, one module may be divided into multiple modules, and the functions of multiple modules may also be implemented by the same module, and these modules may be located in the same device or in different devices.

The hardware modules in the various embodiments may be implemented mechanically or electronically. For example, a hardware module may include a specially designed permanent circuit or logic device (e.g., a special purpose processor such as an FPGA or ASIC) for performing specific operations. A hardware module may also include programmable logic devices or circuits (e.g., including a general-purpose processor or other programmable processor) that are temporarily configured by software to perform certain operations. The implementation of the hardware module in a mechanical manner, or in a dedicated permanent circuit, or in a temporarily configured circuit (e.g., configured by software), may be determined based on cost and time considerations.

The present invention also provides a machine-readable storage medium storing instructions for causing a machine to perform a method as described herein. Specifically, a system or an apparatus equipped with a storage medium on which a software program code that realizes the functions of any of the embodiments described above is stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program code stored in the storage medium. Further, part or all of the actual operations may be performed by an operating system or the like operating on the computer by instructions based on the program code. The functions of any of the above-described embodiments may also be implemented by writing the program code read out from the storage medium to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causing a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on the instructions of the program code.

Examples of the storage medium for supplying the program code include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs, DVD + RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or the cloud by a communication network.

"exemplary" means "serving as an example, instance, or illustration" herein, and any illustration, embodiment, or steps described as "exemplary" herein should not be construed as a preferred or advantageous alternative. For the sake of simplicity, the drawings are only schematic representations of the parts relevant to the invention, and do not represent the actual structure of the product. In addition, in order to make the drawings concise and understandable, components having the same structure or function in some of the drawings are only schematically illustrated or only labeled. In this document, "a" does not mean that the number of the relevant portions of the present invention is limited to "only one", and "a" does not mean that the number of the relevant portions of the present invention "more than one" is excluded. In this document, "upper", "lower", "front", "rear", "left", "right", "inner", "outer", and the like are used only to indicate relative positional relationships between relevant portions, and do not limit absolute positions of the relevant portions.

The above-listed detailed description is only a specific description of a possible embodiment of the present invention and is not intended to limit the scope of the present invention, and equivalent embodiments or modifications such as combinations, divisions or repetitions of the features without departing from the technical spirit of the present invention are included in the scope of the present invention.

Claims

1. A convolutional network training method, comprising:

2. The method of claim 1, wherein determining the respective diffusivity loss value based on a diffusion principle comprises:

for each class c corresponding to the current mini-batch training sample, according to the class c characteristic vector corresponding to the mini-batch, the mean value of the current class c characteristic vector is calculated

Updating, wherein c is the serial number of the class;

And mean of class j feature vectors

Middle point of line segment

Wherein

j is the number of a class other than c;

Wherein

Calculating the diffusivity loss value loss_σWherein

3. The method of claim 2, wherein the mean value of the class c current feature vector is determined

The updating comprises the following steps:

if it is currently said

Is an initial value of 0, the value is added

if it is currently said

If not, according to

Is updated

Wherein the content of the first and second substances,

To mean the class c eigenvectors before update, left side of equation

Is the mean value of the updated class c feature vector.

4. The method of any one of claims 1-3, wherein said modifying said loss value according to said diffusivity loss value comprises:

5. A convolutional network training apparatus, comprising:

6. The apparatus of claim 5,

the first unit is used for carrying out average value treatment on the current c-class characteristic vector according to the c-class characteristic vector corresponding to the mini-batch for each class c corresponding to the current training sample of the mini-batch

And mean of class j feature vectors

Middle point of line segment

Wherein

Wherein

Calculating the diffusivity loss value loss_σWherein

7. The apparatus of claim 6,

The first unit is used for judging if the current state is the current state

Is an initial value of 0, the value is added

If not, according to

Is updated

Wherein the content of the first and second substances,

To mean the class c eigenvectors before update, left side of equation

Is the mean value of the updated class c feature vector.

8. The apparatus according to any one of claims 5-7,

the second unit is used for obtaining the loss + loss according to the loss ═ loss +_σCorrecting the loss value loss to obtain a corrected loss value loss', wherein loss_σThe diffusivity loss value.

9. A convolutional network training apparatus, comprising:

a memory; and a processor coupled to the memory, the processor configured to perform the method of any of claims 1-4 based on instructions stored in the memory.

10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1-4.