CN111612076B

CN111612076B - Image fine recognition method based on DCL and Cascade

Info

Publication number: CN111612076B
Application number: CN202010444726.XA
Authority: CN
Inventors: 李旻; 沈华飞
Original assignee: Nanjing University Smartercity Program Design Co ltd
Current assignee: Nanjing University Smartercity Program Design Co ltd
Priority date: 2020-05-23
Filing date: 2020-05-23
Publication date: 2023-04-18
Anticipated expiration: 2040-05-23
Also published as: CN111612076A

Abstract

The invention discloses an image fine recognition method based on DCL and Cascade. Following the DCL (Destruction and Construction Learning) concept, the original image is divided into blocks and the blocks are shuffled before input, so that the structural information in the original image is destroyed. A Cascade classifier is then used to train a neural network to recognize images whose local regions have been shuffled, which forces the neural network to capture the key visual regions. By cascading weak classifiers into strong classifiers, the fineness and efficiency of image recognition are improved, and the image is refined down to texture information.

Description

Image fine recognition method based on DCL and Cascade

Technical Field

The invention relates to the technical field of artificial intelligence image recognition, in particular to a DCL and Cascade-based image fine recognition method.

Background

With the rapid progress of science and technology over the past decade, general object recognition has made steady progress, driven by large-scale labeled data and complex model design. However, recognizing fine-grained object categories (e.g., bird species, butterfly species, automobile models, SKU-level merchandise) remains a challenging task. Objects from slightly different categories look alike at a cursory glance, but they can be correctly identified from the details of distinctive local regions, so learning discriminative feature representations from these local regions plays a key role in fine image recognition. Existing fine recognition methods can be roughly divided into two categories, as follows:

(1) The first category locates discriminative local regions of the target and then classifies according to these regions. This two-step approach requires additional bounding-box annotations on the target or its parts, and such annotations are very costly;

(2) The second category tries to locate the discriminative regions automatically, in an unsupervised manner, through an attention mechanism, so no additional annotations are required. However, these approaches typically require additional network structures (e.g., attention modules) and thus introduce additional computational overhead in both the training and prediction phases.

Therefore, how to overcome the above problems, improving the accuracy of fine recognition while reducing the computational overhead and ensuring efficiency, is a problem that currently needs to be solved.

Disclosure of Invention

The invention aims to overcome the high annotation cost of existing fine recognition methods and the additional computational overhead they introduce in the training and prediction phases. In the image fine recognition method based on DCL and Cascade of the invention, a DCL (Destruction and Construction Learning) branch is introduced to learn discriminative regions automatically: the original image is shuffled block by block before input, which "destroys" the structural information in the image. The input image is first destroyed to emphasize discriminative local details, and the semantic correlation between the local regions is then modeled to reconstruct the image. On the one hand, DCL locates the discriminative regions automatically, so no additional annotation is needed during training; on the other hand, the DCL structure is used only in the training stage, so no computational overhead is introduced during prediction. In the shuffled image, irrelevant regions that are not important for fine recognition are ignored and the network is forced to classify the image based on discriminative local details; although recognition becomes more difficult, an expert can still easily find the differences. The neural network classifies and recognizes the destroyed images through cascaded classifiers (from weak to strong) that learn this expert knowledge. To prevent the network from learning the noise patterns introduced by destroying the global structure, an adversarial loss is used to suppress these noise patterns. The method therefore has good application prospects.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

An image fine recognition method based on DCL and Cascade comprises the following steps:

step (A), after an original image is generated, performing an initial analysis on the data of the original image, wherein the original image is a high-definition image;

step (B), based on the DCL image destruction algorithm, destroying the initially analyzed original image to emphasize the details of discriminative local regions, modeling the semantic correlation between the local regions to reconstruct the image, and making the network classify the reconstructed image based on the discriminative local details;

step (C), introducing a noise mode and a region reconstruction loss mode, so that the reconstructed image resolves the adversarial loss and the region alignment loss, and a fine recognition image is obtained;

step (D), on the basis of a neural network, classifying the destroyed and reconstructed image with classifiers of increasing strength (from weak to strong) through the cascaded knowledge classifiers of learning experts, to form the knowledge classifier;

step (E), comparing the fine recognition image obtained in step (C) with the knowledge classifier, and recognizing and judging it according to the accuracy required in advance.

In the above image fine recognition method based on DCL and Cascade, step (B), in which the initially analyzed original image is destroyed based on the DCL image destruction algorithm to emphasize the details of discriminative local regions, the semantic correlation between the local regions is modeled to reconstruct the image, and the network is made to classify the reconstructed image based on the discriminative local details, comprises the following steps:

(B1) Given an original image $I$, the image is first uniformly divided into $N \times N$ sub-regions, each denoted $R_{i,j}$, where $i$ and $j$ are the horizontal and vertical indexes respectively, with $1 \le i, j \le N$;

(B2) The sub-regions $R_{i,j}$ are shuffled within their 2D neighbourhoods. For the sub-regions in row $j$, a random vector $q_j$ of length $N$ is generated, whose $i$-th element is $q_{j,i} = i + r$, where $r \sim U(-k, k)$ is a random variable subject to a uniform distribution and $k$ is an adjustable parameter defining the neighbourhood range ($1 \le k < N$);

(B3) By sorting the random vector $q_j$, a new permutation $\sigma_j^{row}$ of the regions in the $j$-th row is obtained. This transforms the region coordinates in the original image from $(i, j)$ to $\sigma(i, j)$ to reconstruct the image.
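Steps (B1) to (B3) can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the patented implementation: the function names `neighbourhood_permutation` and `destroy_image` are hypothetical, indices are 0-based, the perturbation is drawn from a uniform distribution on [-k, k), the rows are shuffled first and the columns second, and image sizes not divisible by N are cropped.

```python
import numpy as np

def neighbourhood_permutation(n, k, rng):
    """Sort q_i = i + r, r ~ U(-k, k); position p receives the element with original index perm[p]."""
    q = np.arange(n) + rng.uniform(-k, k, size=n)
    return np.argsort(q)

def destroy_image(image, n, k, rng):
    """Shuffle the n x n grid of sub-regions of `image` within a neighbourhood controlled by k."""
    h, w = image.shape[:2]
    ph, pw = h // n, w // n                    # patch height and width (remainder pixels are cropped)
    grid = np.arange(n * n).reshape(n, n)      # grid[row, col] = index of the original region
    for row in range(n):                       # permute the regions inside each row
        grid[row, :] = grid[row, neighbourhood_permutation(n, k, rng)]
    for col in range(n):                       # then permute the regions inside each column
        grid[:, col] = grid[neighbourhood_permutation(n, k, rng), col]
    out = np.empty_like(image[: ph * n, : pw * n])
    for row in range(n):
        for col in range(n):
            src_r, src_c = divmod(int(grid[row, col]), n)
            out[row * ph:(row + 1) * ph, col * pw:(col + 1) * pw] = \
                image[src_r * ph:(src_r + 1) * ph, src_c * pw:(src_c + 1) * pw]
    return out, grid

rng = np.random.default_rng(0)
image = np.arange(64).reshape(8, 8)            # toy 8 x 8 "image"
destroyed, grid = destroy_image(image, n=4, k=1, rng=rng)
```

The returned `grid` records where each region came from, which is the information a region alignment loss would need as ground truth. A larger `k` lets regions travel further and therefore destroys more of the global structure.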

In the above image fine recognition method based on DCL and Cascade, step (C), in which a noise mode and a region reconstruction loss mode are introduced so that the reconstructed image resolves the adversarial loss and the region alignment loss and a fine recognition image is obtained, comprises the following steps:

(C1) Introducing a noise mode so that the reconstructed image resolves the adversarial loss:

For an original image $I$, let $F_m^k(I)$ denote the $k$-th feature map of the $m$-th layer. To visualize the features of the backbone classification network ResNet-50 both with and without the adversarial loss, the output features of the layer before the last fully-connected layer are taken for adversarial learning, and the response of the $k$-th convolution kernel of the $m$-th convolution layer to the real category $c$ is

$$r^k(I, c) = \bar{F}_m^k(I) \times \theta_{cls}^{[k,c]},$$

wherein $\bar{F}_m^k(I)$ is the value of the $k$-th feature map and $\theta_{cls}^{[k,c]}$ is the weight between the $k$-th feature map and the corresponding class $c$. That is, the response $r^k(I, c)$ equals the value of the feature map produced by the $k$-th convolution kernel multiplied by the fully-connected-layer weight corresponding to $c$; it measures whether the convolution kernel can map the input image to $c$, and the higher the response, the higher the confidence of the mapping.
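The response described in (C1) can be computed as in the following sketch. Two assumptions are made here: each feature map is reduced to a single value by spatial averaging (the text only says the feature map is multiplied by the fully-connected weight), and the function name `kernel_responses` and the array layout are hypothetical.

```python
import numpy as np

def kernel_responses(feature_maps, fc_weights, c):
    """
    feature_maps: array of shape (K, H, W), one map per convolution kernel.
    fc_weights:   array of shape (K, num_classes), weights of the last fully-connected layer.
    c:            index of the real category.
    Returns an array of K responses r^k(I, c).
    """
    pooled = feature_maps.mean(axis=(1, 2))   # assumed reduction: spatial average of each map
    return pooled * fc_weights[:, c]          # multiply by the weight linking kernel k to class c

# Toy example with K = 3 kernels and 2 classes.
maps = np.ones((3, 2, 2))
maps[1] *= 2.0
weights = np.array([[1.0, 0.0], [0.5, 1.0], [2.0, 3.0]])
responses = kernel_responses(maps, weights, c=1)
```

Comparing these responses for an image and its destroyed version is one way to inspect which kernels react to genuine local detail rather than to shuffling artefacts.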

(C2) Introducing a region reconstruction loss mode so that the reconstructed image resolves the region alignment loss:

(C21) Given an original image $I$ and the corresponding reconstructed image $\phi(I)$, the region $R_{i,j}$ at position $(i, j)$ in the original image $I$ corresponds to the region $R_{\sigma(i,j)}$ in the reconstructed image $\phi(I)$;

(C22) The region alignment network takes the output feature map $C(\cdot, \theta_{cls}^{[1,n]})$ of the $n$-th convolution layer of the classification network and applies a convolution to it to obtain an output with only two channels;

(C23) The output is passed through a ReLU (rectified linear unit) function and average pooling to obtain a feature map of size $2 \times N \times N$. The output of the region alignment network can be written as:

$$M(I) = h\left(C(I, \theta_{cls}^{[1,n]}), \theta_{loc}\right),$$

where the two channels of $M(I)$ correspond to the row and column position coordinates respectively, $h$ is the region alignment network, and $\theta_{loc}$ denotes its parameters. That is, each spatial position of the two-channel output feature map predicts one region position, with two values predicting the horizontal and vertical coordinates of the region, for a total of $N \times N$ sub-regions. The predicted position of the region $R_{\sigma(i,j)}$ in $\phi(I)$ is denoted $M_{\sigma(i,j)}(\phi(I))$, and that of the region $R_{i,j}$ in $I$ is denoted $M_{i,j}(I)$; the true value of both predicted positions is $(i, j)$;

(C24) The region alignment loss $L_{loc}$ is calculated, defined as the $L_1$ distance between the predicted coordinates and the original coordinates, with $\Gamma$ the training set:

$$L_{loc} = \sum_{I \in \Gamma} \sum_{i=1}^{N} \sum_{j=1}^{N} \left| M_{\sigma(i,j)}(\phi(I)) - \begin{bmatrix} i \\ j \end{bmatrix} \right|_1 + \left| M_{i,j}(I) - \begin{bmatrix} i \\ j \end{bmatrix} \right|_1;$$

(C25) According to the region alignment loss $L_{loc}$, the reconstructed image is made to resolve the region alignment loss.
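Steps (C22) to (C24) can be sketched for a single image as follows. This is a hedged illustration: the helper names `alignment_head` and `region_alignment_loss` are hypothetical, the convolution is assumed to be 1 × 1, coordinates are 0-based, and `sigma[i, j]` is assumed to hold the position that region (i, j) occupies in the destroyed image.

```python
import numpy as np

def alignment_head(features, weight, bias, n):
    """features: (C, H, W); weight: (2, C); bias: (2,). H and W must be divisible by n."""
    out = np.einsum("oc,chw->ohw", weight, features) + bias[:, None, None]  # 1x1 convolution
    out = np.maximum(out, 0.0)                                              # ReLU
    c, h, w = out.shape
    return out.reshape(c, n, h // n, n, w // n).mean(axis=(2, 4))           # average pool to (2, n, n)

def region_alignment_loss(pred_orig, pred_destroyed, sigma):
    """
    pred_orig:      (2, N, N) map M(I); pred_orig[:, i, j] is the prediction for region (i, j).
    pred_destroyed: (2, N, N) map M(phi(I)), indexed by position in the destroyed image.
    sigma:          (N, N, 2) integer array; sigma[i, j] is the new position of region (i, j).
    """
    n = pred_orig.shape[1]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            target = np.array([i, j], dtype=float)   # ground truth is the original coordinate
            si, sj = sigma[i, j]
            loss += np.abs(pred_destroyed[:, si, sj] - target).sum()
            loss += np.abs(pred_orig[:, i, j] - target).sum()
    return loss

# Perfect predictions on a 2 x 2 grid whose columns were swapped give zero loss.
coords = np.stack(np.meshgrid(np.arange(2), np.arange(2), indexing="ij")).astype(float)
sigma = np.array([[[i, 1 - j] for j in range(2)] for i in range(2)])
loss = region_alignment_loss(coords, coords[:, :, ::-1], sigma)
```

Because the target for both terms is the original coordinate, the loss only reaches zero when the network can tell where a shuffled patch came from, which is what ties the destroyed image back to the structure of the original.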

In the above image fine recognition method based on DCL and Cascade, step (D), in which the destroyed and reconstructed image is classified from weak to strong through the cascaded knowledge classifiers of learning experts on the basis of a neural network to form the knowledge classifier, comprises the following steps:

(D1) Assigning a weight to each sample in the training data;

(D2) Training a weak classifier on the training data and calculating its error rate, then training the weak classifier again on the same data set with the updated weights, wherein the weights of correctly classified samples are reduced and the weights of misclassified samples are increased, and the error rate is as follows:

weight value:

increasing the weight value:

reducing the weight value:

(D3) Calculating a vector D from the above parameters, repeating steps (D1) - (D2) to enter the next iteration, and continuing to train and adjust the weights until the training error rate is 0 or the target value is reached, thereby forming the strong classifier.
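The reweighting loop in (D1) to (D3) resembles textbook AdaBoost, and one round of it can be sketched as below. This is an assumption rather than the patent's exact formulas: the sketch uses the standard weighted error rate, the classifier weight alpha = 0.5 · ln((1 − error) / error), and renormalisation of the weight vector D so that it sums to 1. The function name `adaboost_round` is hypothetical.

```python
import numpy as np

def adaboost_round(D, correct):
    """
    D:       current sample weights (assumed to sum to 1).
    correct: boolean array, True where the weak classifier was right.
    Returns (alpha, new_D) following the standard AdaBoost update.
    """
    D = np.asarray(D, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    error = D[~correct].sum()                                   # weighted error rate
    alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-16))     # weight of this weak classifier
    # Correct samples are scaled by exp(-alpha), misclassified ones by exp(+alpha).
    new_D = D * np.where(correct, np.exp(-alpha), np.exp(alpha))
    return alpha, new_D / new_D.sum()                           # renormalise

# Four equally weighted samples, the last one misclassified.
alpha, D_next = adaboost_round([0.25, 0.25, 0.25, 0.25], [True, True, True, False])
```

After the update the misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right. Repeating the round until the training error reaches 0 or a target value, and combining the weak classifiers with their alpha weights, yields the strong classifier.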

In the above image fine recognition method based on DCL and Cascade, in step (E), when comparing with the knowledge classifier, the strong classification stage is not entered if the weak classification stage has not been executed.
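One possible reading of this gating rule is a conventional cascade in which a sample must go through the weak stage before the strong stage is evaluated, and rejection at an early stage ends processing. The sketch below follows that interpretation only; the function name `cascade_predict`, the score functions, and the thresholds are all hypothetical.

```python
def cascade_predict(sample, stages):
    """
    stages: list of (score_fn, threshold) pairs ordered from weak to strong.
    Returns True only if every stage accepts the sample.
    """
    for score_fn, threshold in stages:
        if score_fn(sample) < threshold:
            return False          # rejected here: later, stronger stages are never evaluated
    return True

# Record which stages actually run for a toy "sample" that is just a score.
calls = []

def weak_stage(x):
    calls.append("weak")
    return x

def strong_stage(x):
    calls.append("strong")
    return x

stages = [(weak_stage, 0.5), (strong_stage, 0.8)]
accepted = cascade_predict(0.2, stages)   # rejected by the weak stage, strong stage skipped
```

Ordering the stages from cheap to expensive means most easy rejections cost only the weak classifier, which is where the efficiency gain of a cascade comes from.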

The invention has the following beneficial effects. In the image fine recognition method based on DCL and Cascade, a DCL branch is introduced to learn discriminative regions automatically: the input image is first destroyed to emphasize discriminative local details, and the semantic correlation between local regions is then modeled to reconstruct the image. On the one hand, DCL locates the discriminative regions automatically, so no additional annotation is needed during training; on the other hand, the DCL structure is used only in the training phase, so no computational overhead is introduced during prediction. For the "destruction", a region confusion mechanism divides the input image into local patches and then shuffles them randomly. Local details play a more important role in fine recognition than the global structure, since images from different fine-grained classes often share the same global structure or shape and differ only in local details. Discarding the global structure while keeping the local details forces the network to focus on discriminative local regions. After shuffling, irrelevant regions that are not important for fine recognition are ignored, and the network is forced to classify the image based on discriminative local details. Although recognition becomes more difficult, an expert can still easily find the differences. The neural network classifies and recognizes the destroyed images through cascaded classifiers (from weak to strong) that learn this expert knowledge, and, to prevent the network from learning the noise patterns introduced by destroying the global structure, an adversarial loss is used to suppress those noise patterns. The method therefore has good application prospects.

Drawings

FIG. 1 is a flow chart of the image fine recognition method based on DCL and Cascade of the present invention;

FIG. 2 is a flow chart of the present invention for calculating the vector D.

Detailed Description

The invention will be further described with reference to the accompanying drawings.

As shown in fig. 1, the image fine recognition method based on DCL and Cascade of the present invention includes the following steps,

step (B), based on DCL image destruction algorithm, destroying the original image after initial analysis to emphasize the details of local regions with discriminant, modeling semantic correlation between the local regions to reconstruct the image, and making the network classify the reconstructed image based on the local details with discriminant, including the following steps,

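(B2) The sub-regions $R_{i,j}$ are shuffled within their 2D neighbourhoods. For the sub-regions in row $j$, a random vector $q_j$ of length $N$ is generated, whose $i$-th element is $q_{j,i} = i + r$, where $r \sim U(-k, k)$;

By sorting $q_j$, a new permutation $\sigma_j^{row}$ of the regions in row $j$ is obtained, and it can be verified that:

$$\forall i \in \{1, \ldots, N\}, \quad \left| \sigma_j^{row}(i) - i \right| < 2k;$$

similarly, a permutation $\sigma_i^{col}$ can be applied column-wise to rearrange the regions, and it can be verified that:

$$\forall j \in \{1, \ldots, N\}, \quad \left| \sigma_i^{col}(j) - j \right| < 2k;$$

thus, the region coordinates in the original image are converted from $(i, j)$ to $\sigma(i, j)$ to reconstruct the image:

$$\sigma(i, j) = \left( \sigma_j^{row}(i), \; \sigma_i^{col}(j) \right);$$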

This shuffling method destroys the global structure while ensuring that each local region moves randomly only within a neighbourhood of adjustable size. During training, the original image $I$, its destroyed version $\phi(I)$, and the one-vs-all label $l$ indicating the real fine-grained category of $I$ are combined into a triple $\langle I, \phi(I), l \rangle$. The classification network maps the input image to a probability distribution vector $C(I, \theta_{cls})$, where $\theta_{cls}$ denotes the learnable parameters of all layers in the classification network. The loss function $L_{cls}$ of the classification network can be written as:

$$L_{cls} = -\sum_{I \in \Gamma} l \cdot \log\left[ C(I)\, C(\phi(I)) \right],$$

where $\Gamma$ is the entire training set;
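A classification loss over the triple of image, destroyed image, and label can be sketched as follows. The sketch assumes the loss is the cross-entropy of the one-hot label against the element-wise product of the two probability vectors, as in the published DCL formulation; the function name `classification_loss` and the use of integer class indices in place of one-hot labels are illustrative choices.

```python
import numpy as np

def classification_loss(probs_orig, probs_destroyed, labels):
    """
    probs_orig:      (B, num_classes) class probabilities C(I) for the original images.
    probs_destroyed: (B, num_classes) class probabilities C(phi(I)) for the destroyed images.
    labels:          (B,) integer indices of the real fine-grained categories.
    """
    idx = np.arange(len(labels))
    # With a one-hot label only the true-class entry of the product contributes to the sum.
    p = probs_orig[idx, labels] * probs_destroyed[idx, labels]
    return -np.log(p).sum()

probs = np.array([[0.5, 0.5]])
loss = classification_loss(probs, probs, np.array([0]))
```

Because both factors must be large for the loss to be small, the network is penalised whenever it can classify the original image but not its shuffled counterpart.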

step (C), introducing a noise mode and a region reconstruction loss mode, so that the reconstructed image resolves the adversarial loss and the region alignment loss, and a fine recognition image is obtained:

For the original image $I$, the response of the $k$-th convolution kernel to the real category $c$ is $r^k(I, c)$, wherein $\theta_{cls}^{[k,c]}$ is the weight between the $k$-th feature map and the corresponding class $c$. That is, the response $r^k(I, c)$ equals the value of the feature map produced by the $k$-th convolution kernel multiplied by the fully-connected-layer weight corresponding to $c$; it measures whether the convolution kernel can map the input image to $c$, and the higher the response, the higher the confidence of the mapping. To minimize the loss, the network can learn neither the rough features of the overall contour nor the edge-type noise patterns alone; instead it learns the features common to both, thereby enhancing discriminative local details and filtering out irrelevant features;

(C2) Introducing a region reconstruction loss mode so that the reconstructed image resolves the region alignment loss:

(C21) Given an original image $I$ and the corresponding reconstructed image $\phi(I)$, the region $R_{i,j}$ at position $(i, j)$ in the original image $I$ corresponds to the region $R_{\sigma(i,j)}$ in the reconstructed image $\phi(I)$;

(C22) The region alignment network takes the output feature map of the $n$-th convolution layer of the classification network and applies a convolution to it to obtain an output with only two channels;

the two channels of $M(I)$ correspond to the row and column position coordinates respectively, $h$ is the region alignment network, and $\theta_{loc}$ denotes its parameters. That is, each spatial position of the two-channel output feature map predicts one region position, with two values predicting the horizontal and vertical coordinates of the region, for a total of $N \times N$ sub-regions. The predicted position of the region $R_{\sigma(i,j)}$ in $\phi(I)$ is denoted $M_{\sigma(i,j)}(\phi(I))$, and that of the region $R_{i,j}$ in $I$ is denoted $M_{i,j}(I)$; the true value of both predicted positions is $(i, j)$;

(C25) According to the region alignment loss $L_{loc}$, the reconstructed image is made to resolve the region alignment loss;

step (D) of classifying the intensity of the damaged reconstructed image from weak to strong by a knowledge classifier of a cascade learning expert based on a neural network to form the knowledge classifier, comprising the steps of,

(D1) Assigning a weight to each sample in the training data;

(D2) Training a weak classifier on the training data and calculating its error rate, then training the weak classifier again on the same data set with the updated weights, wherein the weights of correctly classified samples are reduced and the weights of misclassified samples are increased, and the error rate is as follows:

weight value:

increasing the weight value:

reducing the weight value:

(D3) According to the parameters, a vector D is obtained through calculation according to the process shown in FIG. 2, the steps (D1) - (D2) are repeated to enter the next iteration, and the training and the weight adjustment are repeated continuously until the training error rate is 0 or the target value is reached, so that a strong classifier is formed.

step (E), comparing the fine recognition image obtained in step (C) with the knowledge classifier, and recognizing and judging it according to the accuracy required in advance; in this comparison, the strong classification stage is not entered if the weak classification stage has not been executed.

In the image fine recognition method based on DCL and Cascade of the invention, DCL locates the discriminative regions automatically, so no additional annotation is needed during training. The DCL structure is used only in the training stage, so no computational overhead is introduced during prediction. Discarding the global structure while keeping the local details forces the network to focus on discriminative local regions; after the local regions are shuffled, irrelevant regions that are not important for fine recognition are ignored, and the network is forced to classify the image based on discriminative local details.

In summary, in the image fine recognition method based on DCL and Cascade of the present invention, a DCL branch is introduced to learn discriminative regions automatically: the input image is first destroyed to emphasize discriminative local details, and the semantic correlation between local regions is then modeled to reconstruct the image. On the one hand, DCL locates the discriminative regions automatically, so no additional annotation is needed during training; on the other hand, the DCL structure is used only in the training phase, so no computational overhead is introduced during prediction. For the "destruction", a region confusion mechanism divides the input image into local patches and then shuffles them randomly. Local details play a more important role in fine recognition than the global structure, since images from different fine-grained classes often share the same global structure or shape and differ only in local details. Discarding the global structure while keeping the local details forces the network to focus on discriminative local regions. After shuffling, irrelevant regions that are not important for fine recognition are ignored, and the network is forced to classify the image based on discriminative local details. Although recognition becomes more difficult, an expert can still easily find the differences. The neural network classifies and recognizes the destroyed images through cascaded classifiers (from weak to strong) that learn this expert knowledge, and, to prevent the network from learning the noise patterns introduced by destroying the global structure, an adversarial loss is used to suppress those noise patterns. The method therefore has good application prospects.

The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. An image fine recognition method based on DCL and Cascade, characterized by comprising the following steps:

based on a DCL image destruction algorithm, destroying an original image after initial analysis to emphasize details of discriminant local regions, modeling semantic correlation between the local regions to reconstruct the image, and enabling a network to classify the reconstructed image based on the discriminant local details;

2. The fine image recognition method based on DCL and Cascade according to claim 1, wherein: step (B), based on DCL image destruction algorithm, destroying the original image after initial analysis to emphasize the details of local regions with discriminant, modeling semantic correlation between the local regions to reconstruct the image, and making the network classify the reconstructed image based on the local details with discriminant, including the following steps,

(B1) Given an original image $I$, the image is first uniformly divided into $N \times N$ sub-regions, each denoted $R_{i,j}$, where $i$ and $j$ are the horizontal and vertical indexes respectively, with $1 \le i, j \le N$;

wherein $r$ is a random variable subject to a uniform distribution, $k$ is an adjustable parameter defining the neighbourhood range, and $1 \le k < N$;

3. The fine image recognition method based on DCL and Cascade according to claim 1, wherein: step (C), introducing a noise mode and a region reconstruction loss mode to enable the reconstructed image to solve the antagonism loss and the region alignment loss to obtain a fine identification image, comprising the following steps,

(C1) Introducing a noise mode so that the reconstructed image resolves the adversarial loss:

For the original image $I$, the response of the $k$-th convolution kernel to the real category $c$ is $r^k(I, c)$, wherein $\theta_{cls}^{[k,c]}$ is the weight between the $k$-th feature map and the corresponding class $c$. That is, the response $r^k(I, c)$ equals the value of the feature map produced by the $k$-th convolution kernel multiplied by the fully-connected-layer weight corresponding to $c$, so as to measure whether the convolution kernel can map the input image to $c$; the larger the response, the higher the confidence in the mapping.

4. The fine image recognition method based on DCL and Cascade according to claim 2, wherein: step (C), introducing a noise mode and a region reconstruction loss mode to enable the reconstructed image to solve the antagonism loss and the region alignment loss to obtain a fine identification image, comprising the following steps,

(C21) Given an original image $I$ and the corresponding reconstructed image $\phi(I)$, the region $R_{i,j}$ at position $(i, j)$ in the original image $I$ corresponds to the region $R_{\sigma(i,j)}$ in the reconstructed image $\phi(I)$;

(C22) The region alignment network takes the output feature map of the $n$-th convolution layer of the classification network and applies a $1 \times 1$ convolution to it to obtain an output with only two channels;

(C23) The output is passed through a ReLU (rectified linear unit) function and average pooling to obtain a feature map of size $2 \times N \times N$, and the output of the region alignment network is written as follows:

the two channels of $M(I)$ correspond to the row and column position coordinates respectively, $h$ is the region alignment network, and $\theta_{loc}$ denotes its parameters. That is, each spatial position of the two-channel output feature map predicts one region position, with two values predicting the horizontal and vertical coordinates of the region, for a total of $N \times N$ sub-regions. The predicted position of the region $R_{\sigma(i,j)}$ in $\phi(I)$ is denoted $M_{\sigma(i,j)}(\phi(I))$, and that of the region $R_{i,j}$ in $I$ is denoted $M_{i,j}(I)$; the true value of both predicted positions is $(i, j)$;

(C25) According to the region alignment loss $L_{loc}$, the reconstructed image is made to resolve the region alignment loss.

5. The DCL and Cascade-based image fine recognition method according to claim 1, wherein: step (D) of classifying the intensity of the damaged reconstructed image from weak to strong by a knowledge classifier of a cascade learning expert based on a neural network to form the knowledge classifier, comprising the steps of,

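(D1) Assigning a weight to each sample in the training data;

training the weak classifier again, wherein the weights of correctly classified samples are reduced and the weights of misclassified samples are increased;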

weight value:

increasing the weight value:

reducing the weight value:

6. The fine image recognition method based on DCL and Cascade according to claim 1, wherein: in step (E), when comparing with the knowledge classifier, the strong classification stage is not entered if the weak classification stage has not been executed.