CN111860054A - Convolutional network training method and device - Google Patents

Convolutional network training method and device Download PDF

Info

Publication number
CN111860054A
CN111860054A CN201910348698.9A CN201910348698A CN111860054A CN 111860054 A CN111860054 A CN 111860054A CN 201910348698 A CN201910348698 A CN 201910348698A CN 111860054 A CN111860054 A CN 111860054A
Authority
CN
China
Prior art keywords
class
loss
batch
training
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910348698.9A
Other languages
Chinese (zh)
Inventor
侯国梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Potevio Information Technology Co Ltd
Original Assignee
Potevio Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Potevio Information Technology Co Ltd filed Critical Potevio Information Technology Co Ltd
Priority to CN201910348698.9A priority Critical patent/CN111860054A/en
Publication of CN111860054A publication Critical patent/CN111860054A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a convolutional network training method and a convolutional network training device. The method comprises the following steps: for each batch of mini-batch training pictures, after extracting the feature vector of each picture by using a deep convolutional network, calculating a corresponding loss value according to the feature vector, and determining a corresponding diffusivity loss value based on a diffusion principle; and correcting the loss value according to the diffusivity loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures. The invention can effectively solve the problem that the training performance is sharply reduced when the data scale is large.

Description

Convolutional network training method and device
Technical Field
The invention relates to the technical field of convolutional networks, in particular to a convolutional network training method and a convolutional network training device.
Background
In recent years, the convolution deep learning technology gradually plays an important role in various projects. In summary, in addition to reinforcement learning and multi-network models represented by generative countermeasure networks (GANs), simple end-to-end deep convolutional networks mainly solve three problems in the visual sense: classification problems, regression problems and feature-based distance similarity problems.
The distance similarity of the deep convolutional network is used for solving the problem of similarity between images, such as face recognition, image searching and the like, even the classification problem is indirectly achieved through a threshold value.
The existing network training method generally utilizes convolution to extract features, designs a bottleeck layer in a mode of constructing a positive sample pair and a negative sample pair, and realizes that the positive samples can be close to the whole network and the negative samples can be far away from the deep convolution network with certain similarity recognition capability through different distance algorithms.
In the process of implementing the invention, the inventor finds that: various algorithms of the current network training scheme mainly consider the realization of distance similarity distinction in the design process, but rarely consider the adoption of a mode of separating different classes as far as possible. In this way, although the existing network training scheme has a good data prediction effect on a certain scale, when the data is increased to a certain scale and the vectors are densely distributed in the dimensional space, the degree of distinction between the vectors is rapidly reduced due to the spatial saturation, so that the performance is rapidly reduced.
For example, assuming that the sample output is a two-dimensional vector (x1, x2), a block diagram of a normally trained distance estimation implementation is shown in fig. 1 by using the existing network training method. The mapping of the corresponding vector to a two-dimensional space is schematically illustrated in fig. 2. As can be seen from the above figures, although the trained deep convolutional network can correctly distinguish positive and negative samples and has a good effect on similar picture aggregation, in the inference process, the number of pictures increases, and although there is a distinction degree, it cannot make (gf), (ab), and (cd) reach the situation of zooming out as far as possible.
Disclosure of Invention
The embodiment of the invention provides a convolutional network training method and a convolutional network training device, which can solve the problem that the training performance is sharply reduced when the data scale is large.
The technical scheme of the embodiment of the invention is as follows:
a convolutional network training method, comprising:
for each batch of mini-batch training pictures, after extracting the feature vector of each picture by using a deep convolutional network, calculating a corresponding loss value according to the feature vector, and determining a corresponding diffusivity loss value based on a diffusion principle;
and correcting the loss value according to the diffusivity loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures.
In one embodiment, the determining the respective diffusivity loss values based on the diffusion principle includes:
for each class c corresponding to the current mini-batch training sample, according to the class c feature vector corresponding to the mini-batch, the mean value of the current class c feature vector is calculated
Figure RE-RE-GDA0002158899740000021
Updating, wherein c is the serial number of the class;
calculating a central matrix M, wherein each element of the central matrix M is M (c, j), and M (c, j) is the mean value of the characteristic vectors of the c-th class
Figure RE-RE-GDA0002158899740000022
And mean of class j feature vectors
Figure RE-RE-GDA0002158899740000023
Middle point of line segment
Figure RE-RE-GDA0002158899740000024
Wherein
Figure RE-RE-GDA0002158899740000025
Figure RE-RE-GDA0002158899740000026
j is the number of a class other than c;
calculating a distance matrix D, each element of D being D (c, j), D (c, j) being
Figure RE-RE-GDA0002158899740000027
Wherein
Figure RE-RE-GDA0002158899740000028
ComputingDiffusivity loss value lossσWherein
Figure RE-RE-GDA0002158899740000031
Figure RE-RE-GDA0002158899740000032
Wherein beta is a proportional function, and the numeric value range of beta is a real number of (0-1); k is the number of classes; i is the sample number of the c-th class participating in training in the mini-batch at present; n is a radical ofCThe number of the feature vectors of the c-th class participating in training in the mini-batch is determined;
Figure RE-RE-GDA0002158899740000033
the feature vector is the ith feature vector of the c-th class in the mini-batch; II |2Representing the 2 norm of the vector x therein, max () representing the operation that takes the larger value as the result is performed on all dimensions of the vector, respectively.
In one embodiment, the mean value of the current class c feature vector is used
Figure RE-RE-GDA0002158899740000034
The updating comprises the following steps:
if it is currently said
Figure RE-RE-GDA0002158899740000035
Is an initial value of 0, the value is added
Figure RE-RE-GDA0002158899740000036
Updating the mean value of the c-th class feature vector participating in training in the mini-batch at present;
if it is currently said
Figure RE-RE-GDA0002158899740000037
If not, according to
Figure RE-RE-GDA0002158899740000038
Is updated
Figure RE-RE-GDA0002158899740000039
Wherein the content of the first and second substances,
Figure RE-RE-GDA00021588997400000310
taking the mean value of the c-th class feature vector participating in training in the mini-batch at present, wherein alpha is a preset weight coefficient; alpha is more than or equal to 0 and less than or equal to 1; equation Right side
Figure RE-RE-GDA00021588997400000311
To mean the class c eigenvectors before update, left side of equation
Figure RE-RE-GDA00021588997400000312
Is the mean value of the updated class c feature vector.
In one embodiment, said modifying said loss value according to said diffusivity loss value comprises:
according to loss ═ loss + lossσCorrecting the loss value loss to obtain a corrected loss value loss', wherein lossσThe diffusivity loss value.
A convolutional network training apparatus, comprising:
the system comprises a first unit, a second unit and a third unit, wherein the first unit is used for extracting a feature vector of each picture by utilizing a deep convolutional network for each batch of mini-batch training pictures, calculating a corresponding loss value according to the feature vector and determining a corresponding diffusivity loss value based on a diffusion principle;
and the second unit is used for correcting the loss value according to the diffusivity loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures.
In an embodiment, the first unit is configured to, for each class c corresponding to the current training sample of the mini-batch, average the current class c feature vector according to the class c feature vector corresponding to the present mini-batch
Figure RE-RE-GDA0002158899740000041
Updating, wherein c is the serial number of the class; calculating a central matrix M, wherein each element of the central matrix M is M (c, j), and M (c, j) is the mean value of the characteristic vectors of the c-th class
Figure RE-RE-GDA0002158899740000042
And mean of class j feature vectors
Figure RE-RE-GDA0002158899740000043
Middle point of line segment
Figure RE-RE-GDA0002158899740000044
Wherein
Figure RE-RE-GDA0002158899740000045
j is the number of a class other than c; calculating a distance matrix D, each element of D being D (c, j), D (c, j) being
Figure RE-RE-GDA0002158899740000046
Wherein
Figure RE-RE-GDA0002158899740000047
Calculating the diffusivity loss value lossσWherein
Figure DA00020432229547598
Figure RE-RE-GDA0002158899740000049
Wherein beta is a proportional function, and the numeric value range of beta is a real number of (0-1); k is the number of classes; i is the sample number of the c-th class participating in training in the mini-batch at present; n is a radical ofCThe number of the feature vectors of the c-th class participating in training in the mini-batch is determined;
Figure RE-RE-GDA00021588997400000410
the feature vector is the ith feature vector of the c-th class in the mini-batch; II |2Representing the 2 norm of the vector x therein, max () representing the larger value performed on all dimensions of the vector, respectivelyThe resulting operation.
In one embodiment, the first unit is configured to determine if the current state is the current state
Figure RE-RE-GDA00021588997400000411
Is an initial value of 0, the value is added
Figure RE-RE-GDA00021588997400000412
Updating the mean value of the c-th class feature vector participating in training in the mini-batch at present; if it is currently said
Figure RE-RE-GDA00021588997400000413
If not, according to
Figure RE-RE-GDA00021588997400000414
Is updated
Figure RE-RE-GDA00021588997400000415
Wherein the content of the first and second substances,
Figure RE-RE-GDA00021588997400000416
taking the mean value of the c-th class feature vector participating in training in the mini-batch at present, wherein alpha is a preset weight coefficient; alpha is more than or equal to 0 and less than or equal to 1; equation Right side
Figure RE-RE-GDA00021588997400000417
To mean the class c eigenvectors before update, left side of equation
Figure RE-RE-GDA00021588997400000418
Is the mean value of the updated class c feature vector.
In one embodiment, the second unit is configured to determine a loss + loss according to the loss ═ loss +σCorrecting the loss value loss to obtain a corrected loss value loss', wherein lossσThe diffusivity loss value.
A convolutional network training apparatus, comprising:
a memory; and a processor coupled to the memory, the processor configured to perform the method of any of the above based on instructions stored in the memory.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements a method as claimed in any one of the above.
According to the technical scheme, in the embodiment of the invention, the method and the device for training the convolutional network, provided by the invention, obtain the diffusivity loss value based on the diffusion principle, correct the conventional loss value obtained by the existing method by using the diffusivity loss value, and then perform back propagation adjustment on the parameters of the deep convolutional network by using the corrected loss value. Therefore, the features output by the deep convolutional network of the same category can be effectively pulled away as far as possible, the larger the data volume is, the larger the distinguishable degree area provided by the features is, and the more effective similarity matching can be carried out, so that the problem that the training performance is sharply reduced when the data scale is large can be effectively solved.
Drawings
Fig. 1 is a schematic diagram of a normally trained distance estimation implementation when a sample is output as a two-dimensional vector (x1, x2) based on a conventional network training method.
Fig. 2 is a schematic diagram of the mapping of a two-dimensional vector corresponding to fig. 1 to a two-dimensional space.
FIG. 3 is a flow chart of a method according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of penalty degree of the loss function according to the embodiment of the present invention.
Fig. 5 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings.
For simplicity and clarity of description, the invention will be described below by describing several representative embodiments. Numerous details of the embodiments are set forth to provide an understanding of the principles of the invention. It will be apparent, however, that the invention may be practiced without these specific details. Some embodiments are not described in detail, but rather are merely provided as frameworks, in order to avoid unnecessarily obscuring aspects of the invention. Hereinafter, "including" means "including but not limited to", "according to … …" means "at least according to … …, but not limited to … … only". In view of the language convention of chinese, the following description, when it does not specifically state the number of a component, means that the component may be one or more, or may be understood as at least one.
Fig. 3 is a schematic flow chart of a method according to an embodiment of the present invention, and as shown in fig. 3, the convolutional network training method implemented by the embodiment mainly includes:
step 301, for each batch of small training pictures (mini-batch), after extracting the feature vector of each picture by using a deep convolutional network, calculating a corresponding loss value according to the feature vector, and determining a corresponding diffusivity loss value based on a diffusion principle.
It should be noted here that in the prior art, the corresponding loss value is calculated only from the feature vector, and is not corrected any more. The difference between the step and the existing method is that the diffusivity loss value needs to be determined based on the diffusion principle, so that the loss value obtained by the conventional method is further corrected in the subsequent step, the features output by the deep convolution network of the same category are pulled far as possible, the larger the data volume is, the larger the distinguishable region provided by the features is, the more effective similarity matching can be performed, and the problem that the training performance is sharply reduced when the data scale is larger can be effectively solved.
In this step, the corresponding loss value may be calculated according to the feature vector by using the existing method, so as to obtain the conventional loss value, which is not described herein again.
In this embodiment, in order to improve training efficiency and reduce the overhead of computational resources, the mini-batch is used as a basic processing unit for training, that is, a loss value and a diffusivity loss value are calculated for each mini-batch.
Preferably, the following method can be used to determine the corresponding diffusivity loss value based on the diffusion principle:
step x1, for each class c corresponding to the training sample of the mini-batch at present, according to the class c feature vector corresponding to the mini-batch at present, carrying out average value treatment on the current class c feature vector
Figure RE-RE-GDA0002158899740000061
Updating is performed, wherein c is the number of the class.
Preferably, the following method can be adopted to average the current class c feature vector
Figure RE-RE-GDA0002158899740000062
Updating:
if it is currently said
Figure RE-RE-GDA0002158899740000071
Is an initial value of 0, the value is added
Figure RE-RE-GDA0002158899740000072
Updating the mean value of the c-th class feature vector participating in training in the mini-batch at present;
if it is currently said
Figure RE-RE-GDA0002158899740000073
If not, according to
Figure RE-RE-GDA0002158899740000074
Is updated
Figure RE-RE-GDA0002158899740000075
Wherein the content of the first and second substances,
Figure RE-RE-GDA0002158899740000076
taking the mean value of the c-th class feature vector participating in training in the mini-batch at present, wherein alpha is a preset weight coefficient; alpha is more than or equal to 0 and less than or equal to 1; equation of
Figure RE-RE-GDA0002158899740000077
Middle, right side
Figure RE-RE-GDA0002158899740000078
Mean of the class c eigenvectors before update, left side
Figure RE-RE-GDA0002158899740000079
Is the mean value of the updated class c feature vector.
As can be seen from the above method, the updated
Figure RE-RE-GDA00021588997400000710
Is the modified mean of the past mini-batch vector weighted by alpha.
Step x2, calculating a central matrix M, wherein each element of the M is M (c, j), and M (c, j) is the mean value of the feature vectors of the c-th class
Figure RE-RE-GDA00021588997400000711
And mean of class j feature vectors
Figure RE-RE-GDA00021588997400000712
Middle point of line segment
Figure RE-RE-GDA00021588997400000713
Wherein
Figure RE-RE-GDA00021588997400000714
j is the number of a class other than c.
Step x3, calculating a distance matrix D, wherein each element of D is D (c, j), and D (c, j) is
Figure RE-RE-GDA00021588997400000715
Wherein
Figure RE-RE-GDA00021588997400000716
Step x4, calculating the diffusivity loss value lossσWherein
Figure RE-RE-GDA00021588997400000717
Wherein beta is a proportional function, and the numeric value range of beta is a real number of (0-1); k is the number of classes; i is the sample number of the c-th class participating in training in the mini-batch at present; n is a radical ofCThe number of the feature vectors of the c-th class participating in training in the mini-batch is determined;
Figure RE-RE-GDA00021588997400000718
the feature vector is the ith feature vector of the c-th class in the mini-batch; II |2Representing the 2 norm of the vector x therein, max () representing the operation that takes the larger value as the result is performed on all dimensions of the vector, respectively.
In the above method, considering the limitations of operation speed and memory and video memory in engineering training, the mini-batch is generally used to update the whole, so step X1 aims to complete all the mean tables through multiple rounds of mini-batch. The midpoint matrix of the respective types of the center points with respect to each other is calculated by the step X3, and the distances between the respective types of the center points are calculated by the step X4.
And 302, correcting the loss value according to the diffusivity loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures.
Preferably, the loss value can be corrected according to the diffusivity loss value by adopting the following method:
according to loss ═ loss + lossσCorrecting the loss value loss to obtain a corrected loss value loss', wherein the loss value loss isσThe diffusivity loss value.
The back propagation adjustment in this step can be implemented by using the prior art, and the specific method is known by those skilled in the art and is not described herein again.
Therefore, the loss function algorithm provided by the invention can effectively expand the features output by different category depth convolution networks as much as possible, when the loss function algorithm is used for similar face recognition and a clear multi-category object is used for searching a picture, the same category can be furthest zoomed out, the larger the data volume is, the larger the distinguishable region provided by the features is, and the more effective similarity matching can be carried out.
The present application is described in further detail below.
Assume that the sample set is a tagged picture dataset X ═ { X0, X1, X2 …. Xc …. XK-1}, where X0, X1, X2 …. Xc are subsets of X, corresponding to respective classes 0, 1, 2 … K-1 of the classification tag, respectively, for a total of K classes. For example, the following steps are carried out: assume that a group of face training pictures total K persons, each face picture set is Xc, and any Xc contains Nc face photos of the same person with different angular poses.
Definition of
Figure RE-RE-GDA0002158899740000081
The feature vector of the ith picture in the c class. For example, the feature vector extracted by the deep convolution network is used for the ith face picture of the c-th person.
One is maintained during the training process
Figure RE-RE-GDA0002158899740000082
Vector tables and two-dimensional matrices of M and D. The following are recorded and maintained: the elements M (c, j) of M are vectors
Figure RE-RE-GDA0002158899740000083
Element D (c, j) of D is a scalar
Figure RE-RE-GDA0002158899740000084
Furthermore, it is possible to provide a liquid crystal display device,
Figure RE-RE-GDA0002158899740000085
both the two-dimensional matrices of table M and D are initialized to 0.
When a mini-batch outputs a feature vector, the following operations are performed:
step 1: for each class c: if class c is
Figure RE-RE-GDA0002158899740000086
Is the initial value 0 and the mini-batch contains class c, then
Figure RE-RE-GDA0002158899740000091
I is the mean value of the class feature vectors participating in training, i is the serial number of samples in class c, and the number is Nc; if class c is
Figure RE-RE-GDA0002158899740000092
If the value is not 0 and the mini-batch contains class c, then command
Figure RE-RE-GDA0002158899740000093
Wherein
Figure RE-RE-GDA0002158899740000094
Mean value of the class of feature vectors participating in training
Figure RE-RE-GDA0002158899740000095
c is the class, i is the sample number of the class.
Figure RE-RE-GDA0002158899740000096
Is the modified mean of the past mini-batch vector weighted by alpha.
Step 2: calculating a two-dimensional matrix M, each element of M being a vector M (c, j)
Figure RE-RE-GDA0002158899740000097
The physical meaning of the vector is a mean vector of feature vectors of two classes c and j
Figure RE-RE-GDA0002158899740000098
And
Figure RE-RE-GDA0002158899740000099
middle point of line segment
Figure RE-RE-GDA00021588997400000910
Figure RE-RE-GDA00021588997400000911
M is a symmetric matrix, each element is a line segment central point formed by mean value centers of c, j types, and the diagonal matrix is 0.
For example, when k is 3, i.e. a class 3 case, M is exemplarily as follows:
Figure RE-RE-GDA00021588997400000912
and step 3: calculating the element D (c, j) of D as
Figure RE-RE-GDA00021588997400000913
Figure RE-RE-GDA00021588997400000914
Is shown as
Figure RE-RE-GDA00021588997400000915
Is defined as the 2 norm of
Figure RE-RE-GDA00021588997400000916
For example, when k is 3, i.e. a 3-class case, D is exemplary as follows:
Figure RE-RE-GDA00021588997400000917
and 4, step 4: during the current mini-batch training process, the loss function loss is calculatedσWherein beta is a proportional function, and the numeric value range of beta is a real number of (0-1); k is the number of classes; i is the sample number of the c-th class participating in training in the mini-batch at present; n is a radical ofCThe number of the feature vectors of the c-th class participating in training in the mini-batch is determined;
Figure RE-RE-GDA00021588997400000918
the feature vector is the ith feature vector of the c-th class in the mini-batch; II |2Representing the 2 norm of the vector x therein, max () representing the operation that takes the larger value as the result is performed on all dimensions of the vector, respectively. Wherein:
Figure RE-RE-GDA00021588997400000919
and 5: according to loss ═ loss + lossσCorrecting the loss value loss to obtain a corrected loss value loss', wherein lossσIs the diffusivity loss value. And then, according to the corrected result, performing back propagation adjustment on the parameters of the deep convolutional network to finish the training of the small batch of training pictures.
Fig. 4 is a schematic diagram of penalty degree of the loss function according to the embodiment of the present invention.
Based on FIG. 4, let
Figure RE-RE-GDA0002158899740000101
Is a component of L. F (i, j, c) was analyzed for specific significance.
Figure RE-RE-GDA0002158899740000102
To the center point of class c, j
Figure RE-RE-GDA0002158899740000103
The distance of (a) to (b),
Figure RE-RE-GDA0002158899740000104
Figure RE-RE-GDA0002158899740000105
is composed of
Figure RE-RE-GDA0002158899740000106
Distance to a certain hypersphere, the sphere
Figure RE-RE-GDA0002158899740000107
Is a core and is composed of all points with equal distance to the core, and the hyperplane contains
Figure RE-RE-GDA0002158899740000108
When in use
Figure RE-RE-GDA0002158899740000109
When in the hyperplane, the component is effective, otherwise 0, for eliminating the distance rulerInfluence of degree, will be to the effective distance of the hypersphere
Figure RE-RE-GDA00021588997400001010
Is divided by
Figure RE-RE-GDA00021588997400001011
And (6) carrying out normalization. The normalized result is then scaled down by the a coefficient in order to release
Figure RE-RE-GDA00021588997400001012
A nearby feature space.
It is clear that, as shown in figure 4,
Figure RE-RE-GDA00021588997400001013
closer to the cj center point
Figure RE-RE-GDA00021588997400001014
The greater F (i, j, c), the greater the penalty function penalty.
Fig. 5 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
As shown in fig. 5, the convolutional network training device includes:
the system comprises a first unit, a second unit and a third unit, wherein the first unit is used for extracting a feature vector of each picture by utilizing a deep convolutional network for each batch of mini-batch training pictures, calculating a corresponding loss value according to the feature vector and determining a corresponding diffusivity loss value based on a diffusion principle;
and the second unit is used for correcting the loss value according to the diffusivity loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures.
In an embodiment, the first unit is configured to, for each class c corresponding to the training sample of the present mini-batch, average the current class c feature vector according to the class c feature vector corresponding to the present mini-batch
Figure RE-RE-GDA00021588997400001015
Updating, wherein c is the serial number of the class; calculating a central matrix M, wherein each element of the central matrix M is M (c, j), and M (c, j) is the mean value of the characteristic vectors of the c-th class
Figure RE-RE-GDA00021588997400001016
And mean of class j feature vectors
Figure RE-RE-GDA0002158899740000111
Middle point of line segment
Figure RE-RE-GDA0002158899740000112
Wherein
Figure RE-RE-GDA0002158899740000113
j is the number of a class other than c; calculating a distance matrix D, each element of D being D (c, j), D (c, j) being
Figure RE-RE-GDA0002158899740000114
Wherein
Figure RE-RE-GDA0002158899740000115
Calculating the diffusivity loss value lossσWherein:
Figure RE-RE-GDA0002158899740000116
wherein beta is a proportional function, and the numeric value range of beta is a real number of (0-1); k is the number of classes; i is the sample number of the c-th class participating in training in the mini-batch at present; n is a radical ofCThe number of the feature vectors of the c-th class participating in training in the mini-batch is determined;
Figure RE-RE-GDA0002158899740000117
the feature vector is the ith feature vector of the c-th class in the mini-batch; II |2Representing the 2 norm of the vector x therein, max () representing the operation that takes the larger value as the result is performed on all dimensions of the vector, respectively.
In one embodiment, the first sheetElement for if said current
Figure RE-RE-GDA0002158899740000118
Is an initial value of 0, the value is added
Figure RE-RE-GDA0002158899740000119
Updating the mean value of the c-th class feature vector participating in training in the mini-batch at present; if it is currently said
Figure RE-RE-GDA00021588997400001110
If not, according to
Figure RE-RE-GDA00021588997400001111
Is updated
Figure RE-RE-GDA00021588997400001112
Wherein the content of the first and second substances,
Figure RE-RE-GDA00021588997400001113
taking the mean value of the c-th class feature vector participating in training in the mini-batch at present, wherein alpha is a preset weight coefficient; alpha is more than or equal to 0 and less than or equal to 1; equation Right side
Figure RE-RE-GDA00021588997400001114
To mean the class c eigenvectors before update, left side of equation
Figure RE-RE-GDA00021588997400001115
Is the mean value of the updated class c feature vector.
The invention also provides an embodiment of a convolutional network training device, which comprises:
a memory; and a processor coupled to the memory, the processor configured to perform any of the method embodiments described above based on instructions stored in the memory.
Accordingly, the present invention further provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out any of the above-mentioned method embodiments.
It should be noted that not all steps and modules in the above flows and structures are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The division of each module is only for convenience of describing adopted functional division, and in actual implementation, one module may be divided into multiple modules, and the functions of multiple modules may also be implemented by the same module, and these modules may be located in the same device or in different devices.
The hardware modules in the various embodiments may be implemented mechanically or electronically. For example, a hardware module may include a specially designed permanent circuit or logic device (e.g., a special purpose processor such as an FPGA or ASIC) for performing specific operations. A hardware module may also include programmable logic devices or circuits (e.g., including a general-purpose processor or other programmable processor) that are temporarily configured by software to perform certain operations. The implementation of the hardware module in a mechanical manner, or in a dedicated permanent circuit, or in a temporarily configured circuit (e.g., configured by software), may be determined based on cost and time considerations.
The present invention also provides a machine-readable storage medium storing instructions for causing a machine to perform a method as described herein. Specifically, a system or an apparatus equipped with a storage medium on which a software program code that realizes the functions of any of the embodiments described above is stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program code stored in the storage medium. Further, part or all of the actual operations may be performed by an operating system or the like operating on the computer by instructions based on the program code. The functions of any of the above-described embodiments may also be implemented by writing the program code read out from the storage medium to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causing a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on the instructions of the program code.
Examples of the storage medium for supplying the program code include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs, DVD + RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or the cloud by a communication network.
"exemplary" means "serving as an example, instance, or illustration" herein, and any illustration, embodiment, or steps described as "exemplary" herein should not be construed as a preferred or advantageous alternative. For the sake of simplicity, the drawings are only schematic representations of the parts relevant to the invention, and do not represent the actual structure of the product. In addition, in order to make the drawings concise and understandable, components having the same structure or function in some of the drawings are only schematically illustrated or only labeled. In this document, "a" does not mean that the number of the relevant portions of the present invention is limited to "only one", and "a" does not mean that the number of the relevant portions of the present invention "more than one" is excluded. In this document, "upper", "lower", "front", "rear", "left", "right", "inner", "outer", and the like are used only to indicate relative positional relationships between relevant portions, and do not limit absolute positions of the relevant portions.
The above-listed detailed description is only a specific description of a possible embodiment of the present invention and is not intended to limit the scope of the present invention, and equivalent embodiments or modifications such as combinations, divisions or repetitions of the features without departing from the technical spirit of the present invention are included in the scope of the present invention.

Claims (10)

1. A convolutional network training method, comprising:
for each batch of mini-batch training pictures, after extracting the feature vector of each picture by using a deep convolutional network, calculating a corresponding loss value according to the feature vector, and determining a corresponding diffusivity loss value based on a diffusion principle;
and correcting the loss value according to the diffusivity loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures.
2. The method of claim 1, wherein determining the respective diffusivity loss value based on a diffusion principle comprises:
for each class c corresponding to the current mini-batch training sample, according to the class c characteristic vector corresponding to the mini-batch, the mean value of the current class c characteristic vector is calculated
Figure FDA0002043222940000011
Updating, wherein c is the serial number of the class;
calculating a central matrix M, wherein each element of the central matrix M is M (c, j), and M (c, j) is the mean value of the characteristic vectors of the c-th class
Figure FDA0002043222940000012
And mean of class j feature vectors
Figure FDA0002043222940000013
Middle point of line segment
Figure FDA0002043222940000014
Wherein
Figure FDA0002043222940000015
Figure FDA0002043222940000016
j is the number of a class other than c;
calculating a distance matrix D, each element of D being D (c, j), D (c, j) being
Figure FDA0002043222940000017
Wherein
Figure FDA0002043222940000018
Calculating the diffusivity loss value lossσWherein
Figure FDA0002043222940000019
Figure FDA00020432229400000110
Wherein beta is a proportional function, and the numeric value range of beta is a real number of (0-1); k is the number of classes; i is the sample number of the c-th class participating in training in the mini-batch at present; n is a radical ofCThe number of the feature vectors of the c-th class participating in training in the mini-batch is determined;
Figure FDA00020432229400000111
the feature vector is the ith feature vector of the c-th class in the mini-batch; II |2Representing the 2 norm of the vector x therein, max () representing the operation that takes the larger value as the result is performed on all dimensions of the vector, respectively.
3. The method of claim 2, wherein the mean value of the class c current feature vector is determined
Figure FDA0002043222940000021
The updating comprises the following steps:
if it is currently said
Figure FDA0002043222940000022
Is an initial value of 0, the value is added
Figure FDA0002043222940000023
Updating the mean value of the c-th class feature vector participating in training in the mini-batch at present;
if it is currently said
Figure FDA0002043222940000024
If not, according to
Figure FDA0002043222940000025
Is updated
Figure FDA0002043222940000026
Wherein the content of the first and second substances,
Figure FDA0002043222940000027
taking the mean value of the c-th class feature vector participating in training in the mini-batch at present, wherein alpha is a preset weight coefficient; alpha is more than or equal to 0 and less than or equal to 1; equation Right side
Figure FDA0002043222940000028
To mean the class c eigenvectors before update, left side of equation
Figure FDA0002043222940000029
Is the mean value of the updated class c feature vector.
4. The method of any one of claims 1-3, wherein said modifying said loss value according to said diffusivity loss value comprises:
according to loss ═ loss + lossσCorrecting the loss value loss to obtain a corrected loss value loss', wherein lossσThe diffusivity loss value.
5. A convolutional network training apparatus, comprising:
the system comprises a first unit, a second unit and a third unit, wherein the first unit is used for extracting a feature vector of each picture by utilizing a deep convolutional network for each batch of mini-batch training pictures, calculating a corresponding loss value according to the feature vector and determining a corresponding diffusivity loss value based on a diffusion principle;
and the second unit is used for correcting the loss value according to the diffusivity loss value, and performing back propagation adjustment on the parameters of the deep convolutional network according to the corrected result to finish the training of the small batch of training pictures.
6. The apparatus of claim 5,
the first unit is used for carrying out average value treatment on the current c-class characteristic vector according to the c-class characteristic vector corresponding to the mini-batch for each class c corresponding to the current training sample of the mini-batch
Figure FDA00020432229400000210
Updating, wherein c is the serial number of the class; calculating a central matrix M, wherein each element of the central matrix M is M (c, j), and M (c, j) is the mean value of the characteristic vectors of the c-th class
Figure FDA00020432229400000211
And mean of class j feature vectors
Figure FDA00020432229400000212
Middle point of line segment
Figure FDA00020432229400000213
Wherein
Figure FDA00020432229400000214
j is the number of a class other than c; calculating a distance matrix D, each element of D being D (c, j), D (c, j) being
Figure FDA00020432229400000215
Wherein
Figure FDA0002043222940000031
Calculating the diffusivity loss value lossσWherein
Figure FDA0002043222940000032
Wherein beta is a proportional function, and the numeric value range of beta is a real number of (0-1); k is the number of classes; i is the sample number of the c-th class participating in training in the mini-batch at present; n is a radical ofCThe number of the feature vectors of the c-th class participating in training in the mini-batch is determined;
Figure FDA0002043222940000033
the feature vector is the ith feature vector of the c-th class in the mini-batch; II |2Representing the 2 norm of the vector x therein, max () representing the operation that takes the larger value as the result is performed on all dimensions of the vector, respectively.
7. The apparatus of claim 6,
The first unit is used for judging if the current state is the current state
Figure FDA0002043222940000034
Is an initial value of 0, the value is added
Figure FDA0002043222940000035
Updating the mean value of the c-th class feature vector participating in training in the mini-batch at present; if it is currently said
Figure FDA0002043222940000036
If not, according to
Figure FDA0002043222940000037
Is updated
Figure FDA0002043222940000038
Wherein the content of the first and second substances,
Figure FDA0002043222940000039
taking the mean value of the c-th class feature vector participating in training in the mini-batch at present, wherein alpha is a preset weight coefficient; alpha is more than or equal to 0 and less than or equal to 1; equation Right side
Figure FDA00020432229400000310
To mean the class c eigenvectors before update, left side of equation
Figure FDA00020432229400000311
Is the mean value of the updated class c feature vector.
8. The apparatus according to any one of claims 5-7,
the second unit is used for obtaining the loss + loss according to the loss ═ loss +σCorrecting the loss value loss to obtain a corrected loss value loss', wherein lossσThe diffusivity loss value.
9. A convolutional network training apparatus, comprising:
a memory; and a processor coupled to the memory, the processor configured to perform the method of any of claims 1-4 based on instructions stored in the memory.
10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1-4.
CN201910348698.9A 2019-04-28 2019-04-28 Convolutional network training method and device Withdrawn CN111860054A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910348698.9A CN111860054A (en) 2019-04-28 2019-04-28 Convolutional network training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910348698.9A CN111860054A (en) 2019-04-28 2019-04-28 Convolutional network training method and device

Publications (1)

Publication Number Publication Date
CN111860054A true CN111860054A (en) 2020-10-30

Family

ID=72964917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910348698.9A Withdrawn CN111860054A (en) 2019-04-28 2019-04-28 Convolutional network training method and device

Country Status (1)

Country Link
CN (1) CN111860054A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529114A (en) * 2021-01-13 2021-03-19 北京云真信科技有限公司 Target information identification method based on GAN, electronic device and medium
CN116649159A (en) * 2023-08-01 2023-08-29 江苏慧岸信息科技有限公司 Edible fungus growth parameter optimizing system and method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529114A (en) * 2021-01-13 2021-03-19 北京云真信科技有限公司 Target information identification method based on GAN, electronic device and medium
CN116649159A (en) * 2023-08-01 2023-08-29 江苏慧岸信息科技有限公司 Edible fungus growth parameter optimizing system and method
CN116649159B (en) * 2023-08-01 2023-11-07 江苏慧岸信息科技有限公司 Edible fungus growth parameter optimizing system and method

Similar Documents

Publication Publication Date Title
KR102319177B1 (en) Method and apparatus, equipment, and storage medium for determining object pose in an image
CN110020592B (en) Object detection model training method, device, computer equipment and storage medium
CN111192292B (en) Target tracking method and related equipment based on attention mechanism and twin network
US11704817B2 (en) Method, apparatus, terminal, and storage medium for training model
CN109815826B (en) Method and device for generating face attribute model
US8582887B2 (en) Image processing system, learning device and method, and program
US20190325197A1 (en) Methods and apparatuses for searching for target person, devices, and media
CN112967341B (en) Indoor visual positioning method, system, equipment and storage medium based on live-action image
CN110765882B (en) Video tag determination method, device, server and storage medium
CN110648289B (en) Image noise adding processing method and device
CN111914908B (en) Image recognition model training method, image recognition method and related equipment
CN114861842B (en) Few-sample target detection method and device and electronic equipment
CN111860054A (en) Convolutional network training method and device
Su et al. Efficient and accurate face alignment by global regression and cascaded local refinement
CN111382791B (en) Deep learning task processing method, image recognition task processing method and device
CN111931767B (en) Multi-model target detection method, device and system based on picture informativeness and storage medium
CN111583146B (en) Face image deblurring method based on improved multi-scale circulation network
CN113469091A (en) Face recognition method, training method, electronic device and storage medium
CN113610016A (en) Training method, system, equipment and storage medium of video frame feature extraction model
CN113569809A (en) Image processing method, device and computer readable storage medium
CN112949571A (en) Method for identifying age, and training method and device of age identification model
CN115795355B (en) Classification model training method, device and equipment
CN114140802B (en) Text recognition method and device, electronic equipment and storage medium
Han et al. Effective search space reduction for human pose estimation with Viterbi recurrence algorithm
CN111191782A (en) Convolutional network training method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201030

WW01 Invention patent application withdrawn after publication