CN112132147A - Learning method based on quality node model - Google Patents

Learning method based on quality node model

Info

Publication number
CN112132147A
CN112132147A
Authority
CN
China
Prior art keywords
sample
neural network
class
labeled
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010818346.8A
Other languages
Chinese (zh)
Other versions
CN112132147B (en)
Inventor
周迪 (Zhou Di)
肖海林 (Xiao Hailin)
曹广 (Cao Guang)
张仲非 (Zhang Zhongfei)
刘鹏 (Liu Peng)
韦文生 (Wei Wensheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Uniview Technologies Co Ltd
Original Assignee
Zhejiang University ZJU
Zhejiang Uniview Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Uniview Technologies Co Ltd filed Critical Zhejiang University ZJU
Priority to CN202010818346.8A
Publication of CN112132147A
Application granted
Publication of CN112132147B
Active legal status
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a learning method based on a quality node model. Introducing a gravitational model built on nodes with mass makes the target loss function simple and interpretable to compute; the attraction exerted by each class serves as the classification score, so the class a sample belongs to can be determined accurately and recognition accuracy improves. A small-sample learning mechanism trains with only a small number of labeled samples, so massive sample sets need not be laboriously labeled and massive new samples need not be fed into the convolutional neural network, greatly reducing training time and achieving higher recognition accuracy for the same training workload.

Description

Learning method based on quality node model
Technical Field
The application belongs to the technical field of image recognition, and particularly relates to a learning method based on a quality node model, i.e., a gravitational model in which every sample is a node with mass; more specifically, it relates to an image recognition method based on mass nodes and small-sample (few-shot) learning.
Background
With the rapid development of deep learning in the image field, computer image recognition now approaches and even surpasses human performance. Machine vision in particular has achieved good results in character recognition, where it plays a vital role in improving efficiency at work and in daily life.
At present, the mainstream machine learning approach is to train a deep learning model on massive labeled data (millions of samples or more), continuously and iteratively optimizing the parameters of a neural network. The disadvantages of this approach are evident: massive sample data must be labeled, generalization is weak, and convergence is slow.
Disclosure of Invention
The application aims to provide a learning method based on a quality node model that greatly reduces training time and achieves higher recognition accuracy for the same training workload.
To achieve this purpose, the application adopts the following technical scheme:
a learning method based on a quality node model comprises the following steps:
step 1, randomly select W classes from all the character classes in a character picture library; from each of the W classes, extract several labeled samples to form a sample set S; and from the remaining labeled samples of each of the W classes, randomly extract one or more labeled samples to form a sample set Q, each labeled sample being treated as a point with mass;
step 2, input each labeled sample in the sample set S into a convolutional neural network, and take the feature vector the convolutional neural network outputs for each labeled sample as a representative vector;
step 3, input every labeled sample in the sample set Q into the convolutional neural network to obtain the feature vector the network outputs for each labeled sample, and compute the attraction between each labeled sample and each class according to the quality node model as follows:
F_{q,k} = \frac{1}{M_k^{total}} \sum_{i=1}^{N_k} \frac{M_{s,k,i} \, m_q}{d^2(Z_{s,k,i}, Z_q)}        (1)

where F_{q,k} is the attraction of class k on labeled sample q in the sample set Q; M_k^{total} is the sum of the masses of all labeled samples of class k in the sample set S; N_k is the number of labeled samples of class k in the sample set S; M_{s,k,i} is the mass of the i-th labeled sample of class k in the sample set S; m_q is the mass of labeled sample q in the sample set Q; Z_{s,k,i} is the representative vector of the i-th member of class k in the sample set S; Z_q is the feature vector of labeled sample q in the sample set Q; and d^2(\cdot,\cdot) is the square of the Euclidean distance;
step 4, compute a loss function from the attractions between all labeled samples in the sample set Q and each class, and optimize the loss function by stochastic gradient descent to obtain an optimized convolutional neural network;
and step 5, feed the character image to be classified into the optimized convolutional neural network to obtain its feature vector, compute from that feature vector the attraction between the character image and each candidate class, and output the class with the largest attraction as the classification prediction.
Several preferred options are provided below. They are not additional limitations on the general scheme above, but merely further additions or preferences; absent technical or logical contradiction, each option may be combined with the general scheme on its own or together with other options.
Preferably, treating each labeled sample as a point with mass comprises the following:

all labeled samples serve as points with mass; labeled samples in different fonts have different masses, and the mass is positively correlated with how standard the font is.
Preferably, the convolutional neural network comprises B concatenated convolution blocks, each consisting of a convolution layer with 3 × 3 convolution kernels, batch normalization, a ReLU activation function, and a max pooling layer with a 2 × 2 pooling kernel.
Preferably, step 4, in which a loss function is computed from the attractions between all labeled samples in the sample set Q and each class and optimized by stochastic gradient descent to obtain an optimized convolutional neural network, comprises:
if labeled sample q in the sample set Q belongs to class k, the total attraction of the W classes on labeled sample q is F_q^{total}:
F_q^{total} = F_{q,k} - \sum_{j \neq k} F_{q,j}        (2)

where F_{q,k} is the attraction of class k on labeled sample q in the sample set Q, a positive attraction; F_{q,j} is the attraction of class j on labeled sample q in the sample set Q, a negative attraction; and j ranges over the W classes other than class k;
the loss function is then computed as the negative mean total attraction over the sample set Q (the larger the total attraction, the better, so its negative is minimized):

J = -\frac{1}{D} \sum_{q=1}^{D} F_q^{total}        (3)

where D is the number of labeled samples in the sample set Q;
optimize the loss function by stochastic gradient descent to obtain optimized neural network parameters;
and load the optimized neural network parameters into the convolutional neural network and repeat steps 1 to 4, training the parameters continuously until the loss function is minimized; the resulting optimal neural network parameters determine the optimized convolutional neural network.
According to the learning method based on the quality node model, introducing a gravitational model built on nodes with mass makes the target loss function simple and interpretable to compute; the attraction exerted by each class serves as the classification score, so the class a sample belongs to can be determined accurately and recognition accuracy improves; and a small-sample learning mechanism trains with only a small number of labeled samples, so massive sample sets need not be laboriously labeled and massive new samples need not be fed into the convolutional neural network, greatly reducing training time and achieving higher recognition accuracy for the same training workload.
Drawings
Fig. 1 is a flowchart of a learning method based on a quality node model according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
This embodiment provides a learning method based on a quality node model, and specifically a method for recognizing and classifying images based on mass nodes and small-sample learning.
Training of the neural network can be completed with a relatively small number (a few thousand) of labeled samples, markedly shortening training time. The method can be used to train on and recognize any type of image, such as face or text photos. This embodiment describes the scheme using Chinese-character pictures as the example.
As shown in fig. 1, the learning method based on the quality node model of the present embodiment includes the following steps:
Step 1: randomly select W classes from all the character classes in a character picture library; from each of the W classes, extract several labeled samples to form a sample set S; and from the remaining labeled samples of each of the W classes, randomly extract one or more labeled samples to form a sample set Q. Each labeled sample is treated as a point with mass.
In this embodiment, photos of the same Chinese character form one class. A subset of the classes is drawn (called W-way, W being the number of classes), and from each of these a subset of samples (called N-shot, N being the number of samples per class) is taken as the class's representative group, every class contributing the same number of group members, as in the sketch below.
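As a concrete illustration, the episode sampling of step 1 could look like the following minimal Python sketch; the picture-library layout and the function names are assumptions made for illustration, not part of the patent.

import random

def sample_episode(library, W, N, n_query=1):
    """Draw one W-way N-shot episode from `library`, a dict mapping
    class label -> list of labeled sample pictures (hypothetical layout).

    Returns (S, Q): S holds N support samples per class; Q holds
    n_query query samples per class, drawn from the remainder."""
    classes = random.sample(sorted(library), W)    # step 1: pick W classes at random
    S, Q = [], []
    for k in classes:
        picks = random.sample(library[k], N + n_query)
        S += [(x, k) for x in picks[:N]]           # sample set S (support)
        Q += [(x, k) for x in picks[N:]]           # sample set Q, disjoint from S
    return S, Q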
Under the mass-node principle, the sample space is pictured as a universe: every sample in a group is a planet, and each planet's mass is set to 1. Any sample should be attracted as strongly as possible by the bodies of the group it belongs to, so the attraction of its own group is defined as positive and the attraction of every other group as negative, and the larger the signed sum, the better. This differs from the physical universe in two respects: the sum is neither required nor expected to be positive, and directional (vector) effects are ignored; only the sign is considered.
Because Chinese-character samples come in different fonts, and different fonts differ in how hard they are to analyze and recognize, in one embodiment, to further improve recognition accuracy, labeled samples in different fonts are additionally given different masses (every labeled sample still being treated as a point with mass), with the mass positively correlated with how standard the font is.
For example, regular-script print, cursive-script print, and handwriting are common character fonts whose degree of standardness decreases in that order, so regular-script print can be given the largest weight and handwriting the smallest: the mass of regular-script print is set to 3, that of cursive-script print to 2, and that of handwriting to 1, and these masses are recorded in each sample's label information.
Once masses are assigned by font, the more standard the writing, the stronger its attraction to the group it belongs to, and vice versa. The influence of the font on the recognition result is thus modeled explicitly, protecting the accuracy of recognition and classification. When different fonts carry different masses, the total masses of the representative groups are no longer guaranteed to be equal, so the total attraction produced by each group must be divided by that group's total mass to normalize it.
It should be understood that regular-script print, cursive-script print, and handwriting serve only as examples in this embodiment; the invention is not limited to these fonts, and in other embodiments corresponding masses may be set according to the font types actually present, for instance as in the sketch below.
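As a minimal sketch of this font-based mass assignment (the font keys and the label schema are assumptions made for illustration):

# Example masses from this embodiment: the more standard the font, the larger the mass.
FONT_MASS = {"regular_script_print": 3.0, "cursive_script_print": 2.0, "handwriting": 1.0}

def mass_of(sample):
    # Read a sample's mass from its label information (hypothetical schema).
    return FONT_MASS[sample["font"]]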
In addition, this embodiment sets the size of each text picture to 190 × 190 pixels, which facilitates the subsequent image processing.
Step 2: input each labeled sample in the sample set S into the convolutional neural network, and take the feature vector the network outputs for each labeled sample as a representative vector.
A convolutional neural network is a deep feedforward artificial neural network, and an existing convolutional network can be used to extract the feature vectors. To ensure effective feature-vector extraction, however, this embodiment provides the following preferred structure:
the network comprises B concatenated convolution blocks, each consisting of a convolution layer with 3 × 3 convolution kernels, batch normalization, a ReLU activation function, and a max pooling layer with a 2 × 2 pooling kernel. B is typically 4 to 8 and is preferably set to 6 in this embodiment.
That is, the convolutional neural network is preferably formed by connecting 6 convolutional blocks in series, and each convolutional block is defined as follows:
ConvBlock(in, out): Conv 3 × 3 (in channels → out channels) → BatchNorm → ReLU → MaxPool 2 × 2
the input of the first convolution block in the network is 1 channel and its output is 64 channels; the second through sixth convolution blocks each take 64 channels as input and produce 64 channels as output. The whole convolutional neural network is defined as follows:
Encoder: ConvBlock(1 → 64) → ConvBlock(64 → 64) → ConvBlock(64 → 64) → ConvBlock(64 → 64) → ConvBlock(64 → 64) → ConvBlock(64 → 64)
since each block applies an unpadded 3 × 3 convolution followed by 2 × 2 pooling, a 190 × 190 grayscale text image passes through the 6 convolution blocks and emerges as a 1 × 1 × 64 feature map (190 → 94 → 46 → 22 → 10 → 4 → 1 per spatial dimension). Flattening the 64 channels of the 1 × 1 map yields a 64-dimensional feature vector that fully expresses the character's feature details and thus facilitates classification and recognition.
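A minimal PyTorch sketch of this preferred encoder, assuming unpadded 3 × 3 convolutions (the names conv_block and encoder are illustrative, not from the patent):

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One convolution block: Conv 3x3 (no padding) -> BatchNorm -> ReLU -> MaxPool 2x2.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

# B = 6 blocks in series: 1 -> 64 channels, then five 64 -> 64 blocks.
encoder = nn.Sequential(conv_block(1, 64), *[conv_block(64, 64) for _ in range(5)])

x = torch.randn(8, 1, 190, 190)   # a batch of eight 190x190 grayscale pictures
z = encoder(x).flatten(1)         # -> shape (8, 64): one 64-dim feature vector each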
Step 3: input every labeled sample in the sample set Q into the convolutional neural network to obtain the feature vector the network outputs for each labeled sample, and compute the attraction between each labeled sample and each class according to the quality node (gravitational) model.

The attraction formula is as follows:
F_{q,k} = \frac{1}{M_k^{total}} \sum_{i=1}^{N_k} \frac{M_{s,k,i} \, m_q}{d^2(Z_{s,k,i}, Z_q)}        (1)

where F_{q,k} is the attraction of class k on labeled sample q in the sample set Q; M_k^{total} is the sum of the masses of all labeled samples of class k in the sample set S; N_k is the number of labeled samples of class k in the sample set S; M_{s,k,i} is the mass of the i-th labeled sample of class k in the sample set S; m_q is the mass of labeled sample q in the sample set Q; Z_{s,k,i} is the representative vector of the i-th member of class k in the sample set S; Z_q is the feature vector of labeled sample q in the sample set Q; and d^2(\cdot,\cdot) is the square of the Euclidean distance.
By this attraction rule, if labeled sample q belongs to class k, the attraction of class k on q is a positive attraction; if labeled sample q does not belong to class k, the attraction of class k on q is a negative attraction. For convenience of calculation, the values of both positive and negative attractions are positive numbers, and the class a sample belongs to should produce the largest attraction, as in the sketch below.
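A minimal NumPy sketch of formula (1) under the definitions above (the function name and array layout are assumptions):

import numpy as np

def attraction(z_q, m_q, Z_k, M_k):
    # Formula (1): attraction of class k on query sample q.
    #   z_q : (64,)      feature vector of the query sample
    #   m_q : scalar     mass of the query sample
    #   Z_k : (N_k, 64)  representative vectors of class k's support samples
    #   M_k : (N_k,)     masses of those support samples
    d2 = ((Z_k - z_q) ** 2).sum(axis=1)         # squared Euclidean distances
    return (M_k * m_q / d2).sum() / M_k.sum()   # normalized by the class's total mass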
Step 4: compute a loss function from the attractions between all labeled samples in the sample set Q and each class, and optimize the loss function by stochastic gradient descent to obtain an optimized convolutional neural network.
To compute the loss from the attractions, the total attraction on each labeled sample must be computed. Since the class of each labeled sample is known, it is also known which class exerts a positive attraction on it and which classes exert negative attractions, so the total attraction is computed as follows.
If labeled sample q in the sample set Q belongs to class k, the total attraction of the W classes on labeled sample q is F_q^{total}:
F_q^{total} = F_{q,k} - \sum_{j \neq k} F_{q,j}        (2)

where F_{q,k} is the attraction of class k on labeled sample q in the sample set Q, a positive attraction; F_{q,j} is the attraction of class j on labeled sample q in the sample set Q, a negative attraction; and j ranges over the W classes other than class k.
The loss function is then computed as the negative mean total attraction over the sample set Q (the larger the total attraction, the better, so its negative is minimized):

J = -\frac{1}{D} \sum_{q=1}^{D} F_q^{total}        (3)

where D is the number of labeled samples in the sample set Q.
The loss function is optimized by stochastic gradient descent to obtain optimized neural network parameters.
The optimized neural network parameters are loaded into the convolutional neural network and steps 1 to 4 are repeated, training the parameters continuously until the loss function is minimized; the resulting optimal neural network parameters determine the optimized convolutional neural network.
Stochastic gradient descent is a standard method for unconstrained optimization, so its execution steps are not detailed in this embodiment; a sketch of one training iteration is given below.
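For concreteness, here is a minimal PyTorch sketch of one training iteration over steps 1 to 4; it reuses the hypothetical encoder above, the synthetic tensors stand in for one sampled episode, and the whole loop is an illustration under those assumptions rather than the patented procedure itself:

import torch

def attraction_t(z_q, m_q, Z_k, M_k):
    # Differentiable torch version of formula (1).
    d2 = ((Z_k - z_q) ** 2).sum(dim=1)            # squared Euclidean distances
    return (M_k * m_q / d2).sum() / M_k.sum()     # normalized by total class mass

W, N, D = 5, 5, 5                                 # 5-way 5-shot, 5 query samples
optimizer = torch.optim.SGD(encoder.parameters(), lr=1e-3)

for episode in range(100):                        # repeat steps 1 to 4
    # Synthetic stand-ins for one episode; real code would draw labeled
    # 190x190 character pictures from the library, as in sample_episode above.
    support = torch.randn(W * N, 1, 190, 190)
    query = torch.randn(D, 1, 190, 190)
    labels = torch.randint(0, W, (D,))
    masses = torch.ones(W, N)                     # unit masses; font masses also work

    Z = encoder(support).flatten(1).view(W, N, -1)              # step 2
    loss = torch.zeros(())
    for z_q, k in zip(encoder(query).flatten(1), labels):       # step 3: formula (1)
        F = torch.stack([attraction_t(z_q, 1.0, Z[c], masses[c]) for c in range(W)])
        loss = loss - (F[k] - (F.sum() - F[k]))   # formula (2), accumulated
    loss = loss / D                               # formula (3): negative mean
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                              # step 4: SGD update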
Step 5: feed the character image to be classified into the optimized convolutional neural network to obtain its feature vector, compute from that feature vector the attraction between the character image and each candidate class, and output the class with the largest attraction as the classification prediction.
It is easy to understand that recognizing a text image to be classified means identifying which character the image shows, so "each class" in step 5 means all the classes in the given context. For example, if the text image to be classified shows a Chinese character from a certain book, the classes in step 5 are all the character classes in a Chinese dictionary, or all the character classes appearing in that book, among which the character to be classified is identified.
The attraction is computed with formula (1): the text image to be classified plays the role of labeled sample q in formula (1), each candidate class plays the role of a class in the sample set S, and the computation proceeds by analogy.
From the viewpoint of neural network operation, steps 1 to 4 amount to training and optimizing the network, while step 5 amounts to executing it. The execution can proceed as follows:
1) Compute a representative vector for each class: select several labeled sample pictures from each class and feed them into the optimized convolutional neural network, which outputs the feature vector of each labeled sample in each class as a representative vector.
2) Compute the feature vector of the sample to be classified: feed the text picture to be classified into the optimized convolutional neural network to obtain its feature vector.
3) Predict the class: take the text picture to be classified from step 2) as labeled sample q and one class from step 1) as class k, and substitute them into formula (1) to compute the attraction of class k on the picture. Compute the attraction of each remaining class from step 1) on the picture in the same way; once the attraction between the picture and every class has been obtained, output the class with the largest attraction as the prediction, as in the sketch below.
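A minimal sketch of this execution phase, reusing the hypothetical encoder and attraction_t pieces above (the class-store layout is an assumption):

import torch

@torch.no_grad()
def predict(image, class_store):
    # Return the predicted class for one 1x190x190 text picture.
    # class_store: dict mapping class label -> (Z_k, M_k), where Z_k holds the
    # class's representative vectors and M_k the corresponding masses.
    encoder.eval()                                              # use running BatchNorm stats
    z_q = encoder(image.unsqueeze(0)).flatten(1).squeeze(0)     # steps 1)-2) above
    scores = {k: attraction_t(z_q, 1.0, Z_k, M_k).item()        # formula (1) per class
              for k, (Z_k, M_k) in class_store.items()}
    return max(scores, key=scores.get)                          # largest attraction wins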
According to the learning method based on the quality node model described above, introducing a gravitational model built on nodes with mass makes the target loss function simple and interpretable to compute; the attraction exerted by each class serves as the classification score, so the class a sample belongs to can be determined accurately and recognition accuracy improves; and a small-sample learning mechanism trains with only a small number of labeled samples, so massive sample sets need not be laboriously labeled and massive new samples need not be fed into the convolutional neural network, greatly reducing training time and achieving higher recognition accuracy for the same training workload.
It should be understood that although the steps in the flowchart of fig. 1 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered and they may be performed in other orders. Moreover, at least some of the steps in fig. 1 may comprise multiple sub-steps or stages that need not be completed at the same moment; they may be executed at different times, and their order of execution need not be sequential, as they may be performed in turn or alternately with other steps or with at least some sub-steps or stages of other steps.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The embodiments above express only several implementations of the present application and are described in relative detail, but they are not therefore to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and all of these fall within its scope of protection. The protection scope of this patent shall therefore be subject to the appended claims.

Claims (4)

1. A learning method based on a quality node model, characterized in that it comprises the following steps:
step 1, randomly select W classes from all the character classes in a character picture library; from each of the W classes, extract several labeled samples to form a sample set S; and from the remaining labeled samples of each of the W classes, randomly extract one or more labeled samples to form a sample set Q, each labeled sample being treated as a point with mass;
step 2, input each labeled sample in the sample set S into a convolutional neural network, and take the feature vector the convolutional neural network outputs for each labeled sample as a representative vector;
step 3, input every labeled sample in the sample set Q into the convolutional neural network to obtain the feature vector the network outputs for each labeled sample, and compute the attraction between each labeled sample and each class according to the quality node model as follows:
F_{q,k} = \frac{1}{M_k^{total}} \sum_{i=1}^{N_k} \frac{M_{s,k,i} \, m_q}{d^2(Z_{s,k,i}, Z_q)}        (1)

where F_{q,k} is the attraction of class k on labeled sample q in the sample set Q; M_k^{total} is the sum of the masses of all labeled samples of class k in the sample set S; N_k is the number of labeled samples of class k in the sample set S; M_{s,k,i} is the mass of the i-th labeled sample of class k in the sample set S; m_q is the mass of labeled sample q in the sample set Q; Z_{s,k,i} is the representative vector of the i-th member of class k in the sample set S; Z_q is the feature vector of labeled sample q in the sample set Q; and d^2(\cdot,\cdot) is the square of the Euclidean distance;
step 4, compute a loss function from the attractions between all labeled samples in the sample set Q and each class, and optimize the loss function by stochastic gradient descent to obtain an optimized convolutional neural network;
and step 5, feed the character image to be classified into the optimized convolutional neural network to obtain its feature vector, compute from that feature vector the attraction between the character image and each candidate class, and output the class with the largest attraction as the classification prediction.
2. The quality node model-based learning method of claim 1, wherein treating each labeled sample as a point with mass comprises:
all labeled samples serve as points with mass; labeled samples in different fonts have different masses, and the mass is positively correlated with how standard the font is.
3. The quality node model-based learning method of claim 1, wherein the convolutional neural network comprises B concatenated convolution blocks, each consisting of a convolution layer with 3 × 3 convolution kernels, batch normalization, a ReLU activation function, and a max pooling layer with a 2 × 2 pooling kernel.
4. The learning method based on the quality node model according to claim 1, wherein step 4, in which a loss function is computed from the attractions between all labeled samples in the sample set Q and each class and optimized by stochastic gradient descent to obtain an optimized convolutional neural network, comprises:
if labeled sample q in the sample set Q belongs to class k, the total attraction of the W classes on labeled sample q is F_q^{total}:
F_q^{total} = F_{q,k} - \sum_{j \neq k} F_{q,j}        (2)

where F_{q,k} is the attraction of class k on labeled sample q in the sample set Q, a positive attraction; F_{q,j} is the attraction of class j on labeled sample q in the sample set Q, a negative attraction; and j ranges over the W classes other than class k;
the loss function is then computed as the negative mean total attraction over the sample set Q:

J = -\frac{1}{D} \sum_{q=1}^{D} F_q^{total}        (3)

where D is the number of labeled samples in the sample set Q;
optimize the loss function by stochastic gradient descent to obtain optimized neural network parameters;
and load the optimized neural network parameters into the convolutional neural network and repeat steps 1 to 4, training the parameters continuously until the loss function is minimized; the resulting optimal neural network parameters determine the optimized convolutional neural network.
CN202010818346.8A 2020-08-14 2020-08-14 Learning method based on quality node model Active CN112132147B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010818346.8A 2020-08-14 2020-08-14 Learning method based on quality node model (granted as CN112132147B)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010818346.8A 2020-08-14 2020-08-14 Learning method based on quality node model (granted as CN112132147B)

Publications (2)

Publication Number Publication Date
CN112132147A 2020-12-25
CN112132147B CN112132147B (en) 2022-04-19

Family

ID=73850831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010818346.8A Active CN112132147B (en) 2020-08-14 2020-08-14 Learning method based on quality node model

Country Status (1)

Country Link
CN (1) CN112132147B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090955A (en) * 2014-07-07 2014-10-08 iFLYTEK Co., Ltd. Automatic audio/video label labeling method and system
CN108647711A (en) * 2018-05-08 2018-10-12 Chongqing University of Posts and Telecommunications Multi-label classification method based on gravity model
CN109961089A (en) * 2019-02-26 2019-07-02 Sun Yat-sen University Small-sample and zero-sample image classification method based on metric learning and meta-learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090955A (en) * 2014-07-07 2014-10-08 iFLYTEK Co., Ltd. Automatic audio/video label labeling method and system
CN108647711A (en) * 2018-05-08 2018-10-12 Chongqing University of Posts and Telecommunications Multi-label classification method based on gravity model
CN109961089A (en) * 2019-02-26 2019-07-02 Sun Yat-sen University Small-sample and zero-sample image classification method based on metric learning and meta-learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO Xueyi et al., "Multimedia Information Retrieval Based on Multi-label Relations", Signal Processing and Big Data *

Also Published As

Publication number Publication date
CN112132147B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN108960409B (en) Method and device for generating annotation data and computer-readable storage medium
CN109241317B (en) Pedestrian Hash retrieval method based on measurement loss in deep learning network
CN107330127B (en) Similar text detection method based on text picture retrieval
US10867169B2 (en) Character recognition using hierarchical classification
WO2016037300A1 (en) Method and system for multi-class object detection
US20170076152A1 (en) Determining a text string based on visual features of a shred
CN113806746B (en) Malicious code detection method based on improved CNN (CNN) network
CN110705233B (en) Note generation method and device based on character recognition technology and computer equipment
CN110532911B (en) Covariance measurement driven small sample GIF short video emotion recognition method and system
Zu et al. Chinese character recognition with augmented character profile matching
Khayyat et al. A deep learning based prediction of arabic manuscripts handwriting style.
CN110347791B (en) Topic recommendation method based on multi-label classification convolutional neural network
CN115062186A (en) Video content retrieval method, device, equipment and storage medium
CN110163206B (en) License plate recognition method, system, storage medium and device
Bintoro et al. Lampung script recognition using convolutional neural network
CN111242114B (en) Character recognition method and device
CN112132147B (en) Learning method based on quality node model
Fernandez-Fernandez et al. Quick, stat!: A statistical analysis of the quick, draw! dataset
Basha et al. A novel approach for optical character recognition (OCR) of handwritten Telugu alphabets using convolutional neural networks
CN108345943B (en) Machine learning identification method based on embedded coding and contrast learning
CN115526310A (en) Network model quantification method, device and equipment
CN112132150A (en) Text string identification method and device and electronic equipment
CN112200216A (en) Chinese character recognition method, device, computer equipment and storage medium
CN112257677A (en) Method and device for processing deep learning task in big data cluster
Panda et al. Complex Odia Handwritten Character Recognition using Deep Learning Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant