CN109934293B - Image recognition method, device, medium and confusion perception convolutional neural network

Info

Publication number: CN109934293B
Application number: CN201910198639.8A
Authority: CN
Inventors: 钟宝江; 言俐光
Original assignee: Suzhou University
Current assignee: Suzhou University
Priority date: 2019-03-15
Filing date: 2019-03-15
Publication date: 2023-06-13
Anticipated expiration: 2039-03-15
Also published as: CN109934293A

Abstract

The embodiment of the invention discloses an image recognition method, an image recognition device, image recognition equipment, a computer-readable storage medium and a confusion perception convolutional neural network. The confusion perception convolutional neural network comprises a prediction classifier, a confusion perception model, a correction classifier group and a probability average layer. The prediction classifier is a conventional convolutional neural network classifier trained on a training sample set. The confusion perception model is constructed from a confusion matrix obtained by cross-validating the prediction classifier on the training sample set. Each correction classifier is obtained by training on confusable class sample data with fuzzy boundaries in the training sample set, with the confusion perception model acting as the decision system. The probability average layer outputs a classification result for the image to be recognized according to the class probability output by the prediction classifier and the class probability output by a target correction classifier, the target correction classifier being the correction classifier selected by the confusion perception model according to the prediction class of the prediction classifier. The method and the device help improve the accuracy of image recognition.

Description

Image recognition method, device, medium and confusion perception convolutional neural network

Technical Field

The embodiment of the invention relates to the technical field of image classification and recognition, and in particular to an image recognition method, an image recognition device, image recognition equipment, a computer-readable storage medium and a confusion perception convolutional neural network.

Background

With the rapid development of computer vision technology, the demand for image classification and recognition keeps growing. Before images are classified and recognized, preprocessing steps such as binarization and standardization are generally applied to the input images. Compared with the traditional machine learning approach of manual feature extraction followed by a separate classifier, using a convolutional neural network to extract image features automatically and classify them is more accurate and efficient.

Convolutional neural networks (CNNs) are a class of feedforward neural networks that involve convolution computations and have a deep structure. They achieve state-of-the-art performance in various computer vision tasks, such as image classification, object detection and semantic segmentation. Previous research has focused mainly on enhancing CNN components such as the pooling layer or the activation unit.

Most existing convolutional neural network classifiers for image classification employ a flat structure that treats all classes as independent and ignores their visual separability. In actual classification, some classes are more difficult to distinguish than others and therefore require a more specialized classifier. For example, in the CIFAR-10 dataset, "cat" is easily distinguished from "truck", but "cat" can be difficult to distinguish from "dog" because the boundary between some categories is fuzzy. The ResNet-18 classifier achieves 94.63% accuracy on this dataset; however, the misclassification ratio r between "cat" and "dog" reaches 21.04%, far higher than that of any other pair of categories.

Hierarchical classification in the broad sense refers to assigning an object to the class it belongs to within a large-scale class hierarchy. The object to be classified may be a text object, such as an entry of Baidu Baike, or a multimedia object, such as a video, an image or a piece of music. Hierarchical classification can be performed manually, automatically based on machine learning, or automatically with expert verification. For example, the related art describes a hierarchical classification problem using three attributes: the hierarchical structure of the categories, the number of categories of an instance, and the label path depth of an instance. In large-scale image recognition tasks, the relations between classes are not merely simple hierarchical relations. For example, the ImageNet Large Scale Visual Recognition Challenge contains images of 1000 classes, some of which are not strongly visually separable and are visually similar to one another. Hierarchical classification solves the classification problem by embedding classifiers into a class hierarchy of two or more levels, as shown in FIG. 1. The upper-level classifier produces coarse classification results and the lower-level classifier produces fine classification results. In hierarchical classification, the hierarchy may be predefined or learned by top-down or bottom-up methods.

Although existing hierarchical classification methods can improve classification accuracy to a certain extent, they suffer from error propagation: a classification error made by the upper-level classifier propagates to the lower-level classifier and ultimately causes an unavoidable misclassification. For example, if the picture of class "cat" in FIG. 1 is assigned to coarse class N in the first classification, the fine classifier cannot classify it correctly.

Disclosure of Invention

The embodiments of the disclosure provide an image recognition method, an image recognition device, image recognition equipment, a computer-readable storage medium and a confusion perception convolutional neural network, which remedy the defects of the related art and help improve the accuracy of image classification.

In order to solve the technical problems, the embodiment of the invention provides the following technical scheme:

the embodiment of the invention provides a confusion perception convolutional neural network, which comprises a prediction classifier, a confusion perception model, a plurality of correction classifiers and a probability average layer;

the prediction classifier is a convolutional neural network classifier trained by using a training sample set;

the confusion perception model is a model constructed based on a confusion matrix corresponding to the prediction classifier, and the confusion matrix is obtained by performing cross-validation on the training sample set;

each correction classifier is obtained by training on confusable class sample data with fuzzy boundaries in the training sample set, using the confusion perception model as a decision system;

the probability average layer is used for outputting a classification result of the image to be recognized according to the class probability output by the prediction classifier and the class probability output by the target correction classifier, and the target correction classifier is a correction classifier selected by the confusion perception model according to the prediction class of the prediction classifier.

Optionally, the confusion matrix is obtained by cross-validation on the training sample set, and the method comprises the following steps:

dividing the training sample set into a plurality of sub-training sets;

for each sub-training set, taking the current sub-training set as a verification set, taking the remaining sub-training sets as training sets to train the prediction classifier, and testing the verification set by using the trained prediction classifier to obtain misclassified images;

summarizing misclassified images obtained by taking each sub-training set as a verification set to construct the confusion matrix.

Optionally, the number of training sample images included in each sub-training set is the same.

Optionally, the probability average layer comprises a normalization module and a probability calculation module;

The normalization module is used for performing normalization processing on the class probability output by the prediction classifier and the class probability output by the target correction classifier;

and the probability calculation module determines, from the normalized class probabilities, the class to which the image to be identified belongs.

Optionally, the normalization module performs normalization processing on the class probability output by the prediction classifier and the class probability output by the target correction classifier according to the following formula:

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1, \ldots, K$$

where z is a K-dimensional vector and $\sigma(z)_j$ is the probability, in the interval (0, 1), to which its j-th element $z_j$ is mapped.

Optionally, the probability calculation module determines the category label to which the image to be identified belongs by using the following formula:

wherein X is the image to be identified, y is the class label of the image to be identified, $B_j$ is the class probability output by the prediction classifier, and the two remaining symbols of the formula denote, respectively, the class probability output by the target correction classifier and the class corresponding to the confusable class sample data.

Another aspect of the embodiment of the present invention provides an image recognition method, including:

inputting an image to be identified into a pre-constructed confusion perception convolutional neural network;

Invoking a prediction classifier of the confusion perception convolutional neural network to identify the image to be identified, so as to obtain a prediction category and a first category probability of the image to be identified;

invoking a confusion perception model of the confusion perception convolutional neural network to select a target correction classifier according to the prediction category;

identifying the image to be identified by using the target correction classifier to obtain a second class probability of the image to be identified;

and outputting a classification result of the image to be identified according to the first class probability and the second class probability by using a probability average layer of the confusion perception convolutional neural network.

The embodiment of the invention also provides an image recognition device, which comprises:

the image input module is used for inputting the image to be identified into a pre-constructed confusion perception convolutional neural network;

the image recognition module is used for calling a prediction classifier of the confusion perception convolutional neural network to recognize the image to be recognized, so as to obtain a prediction category and a first category probability of the image to be recognized; invoking a confusion perception model of the confusion perception convolutional neural network to select a target correction classifier according to the prediction category; identifying the image to be identified by using the target correction classifier to obtain a second class probability of the image to be identified; and outputting a classification result of the image to be identified according to the first class probability and the second class probability by using a probability average layer of the confusion perception convolutional neural network.

The embodiment of the invention also provides image recognition equipment, which comprises a processor, wherein the processor is used for realizing the steps of the image recognition method when executing the computer program stored in the memory.

The embodiment of the invention finally provides a computer readable storage medium, on which an image recognition program is stored, which when executed by a processor implements the steps of the image recognition method according to any one of the preceding claims.

The technical scheme provided by the application has the following advantages. A confusion perception convolutional neural network structure with a prediction-correction hierarchy is provided; a confusion matrix is obtained through cross-validation, and the confusion perception model constructed from the confusion matrix identifies the data classes with fuzzy boundaries. With the prediction-correction hierarchy, the CNN model is able to distinguish classes with fuzzy boundaries. On the one hand, the ability of the CNN model to handle fuzzy class boundaries is improved, which effectively improves image classification accuracy; on the other hand, the error propagation problem caused by hierarchical classification is avoided. Because a more accurately estimated confusion matrix is used, a larger improvement can be obtained on small-scale data sets.

In addition, the embodiment of the invention also provides a corresponding implementation method, a device, equipment and a computer readable storage medium for the image recognition method, so that the confusion perception convolutional neural network has more practicability, and the method, the device, the equipment and the computer readable storage medium have corresponding advantages.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the related art, the drawings that are required to be used in the embodiments or the description of the related art will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.

FIG. 1 is a schematic diagram of a two-level class hierarchy provided in an embodiment of the present invention;

FIG. 2 is a schematic diagram of a structural framework of one implementation of a confusion-aware convolutional neural network provided by an embodiment of the present invention;

FIG. 3 is a schematic flow chart of constructing a confusion matrix by cross-validation according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of image samples of the MNIST dataset provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of the MNIST ConvNet network structure according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a LeNet-5 network architecture according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a ResNet-18 network architecture according to an embodiment of the present invention;

FIG. 8 is a schematic flow chart of an image recognition method according to an embodiment of the present invention;

FIG. 9 is a flowchart of an exemplary image recognition method according to an embodiment of the present invention;

FIG. 10 is a block diagram of an embodiment of an image recognition apparatus according to the present invention.

Detailed Description

In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims of this application and in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.

A prior-art hierarchical deep convolutional neural network (Hierarchical Deep Convolutional Neural Network, HD-CNN) organizes coarse and fine classifiers in a coarse-to-fine class hierarchy, combines this with a fine-tuning training strategy, and uses a probabilistic averaging layer to compute a weighted average of the recognition results of the two levels of classifiers. It achieves a considerable accuracy improvement over the plain convolutional neural networks used on the ImageNet and CIFAR-100 datasets. However, because it obtains the confusion matrix from a randomly sampled validation set, it does not achieve good results on small-scale data sets. That is, this technique is suitable only for application scenarios with large-scale training data, and its improvement on small-scale data sets is not obvious.

In image classification, clear boundaries between classes favor accurate classification, but the boundaries between certain classes are more easily confused than those between other classes. As an advanced image recognition technology, convolutional neural networks have very strong classification performance; however, ordinary convolutional neural networks do not handle fuzzy boundaries well. For example, in the CIFAR-10 dataset, a convolutional neural network can easily distinguish "cat" from "truck" but has difficulty distinguishing "cat" from "dog". ResNet-18, a residual network (ResNet) that is among the best-performing convolutional neural networks in the current image recognition field, achieves 94.63% accuracy on the test set when trained on the CIFAR-10 dataset with data augmentation. Counting its misclassified images yields the confusion matrix shown in Table 1.

TABLE 1. Confusion matrix (diagonal elements set to 0) of the model trained on CIFAR-10 using ResNet-18

The confusion matrix is a special table used to visualize the performance of a classifier. Each row corresponds to a class predicted by the classifier, and each column to an actual class of the images. Let $a_{ij}$ be the number of misclassified pictures in the table, where i is the class predicted by the classifier for these pictures and j is their actual class. The confusion matrix F containing K categories can then be expressed as:

$$F = (a_{ij}), \quad i, j = 1, \ldots, K \tag{1}$$

The total number of misclassified pictures in Table 1 is 537, i.e., $\sum_{m \neq n} a_{mn} = 537$. The ratio $r_{ij}$ of the number $a_{ij}$ of pictures with prediction class i and actual class j to all misclassified pictures is calculated as:

$$r_{ij} = \frac{a_{ij}}{\sum_{m \neq n} a_{mn}} \tag{2}$$

The error ratio between two classes i and j, i.e., the ratio to all misclassified pictures of the pictures whose prediction class is i and actual class is j or whose prediction class is j and actual class is i, is $r = r_{ij} + r_{ji}$.

From the confusion matrix in Table 1, it is apparent that the number of misclassifications between "cat" and "dog" reaches 113, an error ratio of r = 21.04%, whereas r for any other pair of classes is at most 7.45%. It follows that although ResNet-18 has very strong classification capability, it does not work well for certain easily confused classes.
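The ratios defined above can be checked with a few lines of Python. The sketch below is illustrative only: the function name `pair_error_ratio` and the 3-class matrix are assumptions introduced here, and only the figures 113 and 537 are taken from the text.

```python
from typing import List


def pair_error_ratio(confusion: List[List[int]], i: int, j: int) -> float:
    """Return r = r_ij + r_ji for classes i and j.

    confusion[p][q] counts images predicted as class p whose actual class is q;
    diagonal entries (correct predictions) are ignored.
    """
    total_errors = sum(
        count
        for p, row in enumerate(confusion)
        for q, count in enumerate(row)
        if p != q
    )
    return (confusion[i][j] + confusion[j][i]) / total_errors


# Hypothetical 3-class matrix (cat, dog, truck); the values are made up.
toy = [
    [0, 6, 1],
    [4, 0, 2],
    [1, 2, 0],
]
toy_ratio = pair_error_ratio(toy, 0, 1)  # (6 + 4) / 16

# Figures quoted in the text: 113 cat/dog errors out of 537 errors in total.
quoted_ratio = round(113 / 537 * 100, 2)  # percentage, two decimals
```

With the quoted figures, 113 / 537 rounds to 21.04%, which agrees with the error ratio stated for "cat" and "dog".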

In view of this, to address the fact that the related art is unsuitable for small-scale application scenarios and performs poorly at recognizing confusable classes, the present application proposes a new convolutional neural network architecture, namely the confusion perception convolutional neural network. Its architecture appears similar to the conventional hierarchical classification shown in FIG. 1, but the two differ significantly in essence. The confusion-aware convolutional neural network does not follow the coarse-to-fine concept of categories; instead, it adopts a prediction-correction recognition strategy for hierarchical classification. This strategy is inspired by the predictor-corrector numerical methods that are often used to solve various mathematical and engineering problems.

Having described the technical solutions of embodiments of the present invention, various non-limiting implementations of the present application are described in detail below.

Referring first to FIG. 2, FIG. 2 is a schematic structural diagram of a confusion-aware convolutional neural network according to an embodiment of the present invention, and the embodiment of the present invention may include the following:

the confusion-aware convolutional neural network may include a prediction classifier 1, a confusion-aware model 2, a correction classifier set 3, and a probabilistic averaging layer 4.

The prediction classifier 1 is a flat classifier trained on a training set containing all classes. In the classification stage it generates the predicted class of the input image to be identified, but its prediction is usually not accurate enough and therefore needs to be corrected.

The prediction classifier 1 is a convolutional neural network classifier trained by using a training sample set, and the convolutional neural network classifier can be any conventional convolutional neural network structure in the prior art, which is not limited in this application. The predictive classifier 1 is trained with all images of all classes in the training sample set.

Let $a_{ij}$ be the number of misclassified images whose predicted class is i and whose actual class is j, and let $r_{ij}$ be the ratio of $a_{ij}$ to all misclassified images, defined as:

$$r_{ij} = \frac{a_{ij}}{\sum_{m \neq n} a_{mn}}$$

From the set $\{r_{ij}\}$, the easily confused categories can be identified by applying a threshold, for use in subsequently building the confusion perception model 2.

In practical application, the prediction classifier 1 first classifies the input image to be identified. This classification produces two outputs: the prediction class $c_{\text{predict}}$ of the image to be identified and a class probability vector. The class probability vector is the output of the last layer of the prediction classifier 1; the size of each component represents the confidence that the image to be recognized belongs to the corresponding class, and the largest component corresponds to the prediction class $c_{\text{predict}}$.

The confusion perception model 2 is the core component of the confusion perception convolutional neural network and is built from the estimated confusion matrix of the prediction classifier 1. It acts as a decision system in both the training stage (for training the correction classifiers) and the classification stage (for selecting the appropriate correction classifier). Its decision object is the choice of correction classifier, and its primary role is to perceive the confusable classes in the test set.

As an ordinary convolutional neural network classifier, the prediction classifier 1 has no strong ability to recognize confusable categories. Fuzzy boundaries often exist between these categories: for example, the boundaries between "cat", "dog" and "horse" in CIFAR-10 are not as sharp as those between the remaining categories, and conventional classifiers tend to misclassify images that lie in the overlapping regions of these boundaries. The confusion perception model established from the estimated confusion matrix of the prediction classifier 1, by contrast, can be used to perceive sample data of the confusable classes accurately.

Since the confusion matrix on the test set cannot be acquired directly to estimate the confusable classes, a confusion matrix on the training sample set can be acquired through cross-validation to approximate its distribution. Cross-validation, also known as rotation estimation, is a practical statistical method that splits the data samples into smaller subsets.

The confusion matrix may be obtained by cross-validation on a training sample set, and the method of generation may be as follows:

dividing the training sample set into a plurality of sub-training sets; for each sub-training set, taking the current sub-training set as a verification set, taking the remaining sub-training sets as training sets to train the prediction classifier, and testing the verification set with the trained prediction classifier to obtain misclassified images; and summarizing the misclassified images obtained with each sub-training set as the verification set to construct the confusion matrix. In a specific embodiment, each sub-training set may contain the same number of sample images, that is, the training sample set may be divided equally into a plurality of sub-training sets.

Referring to FIG. 3, the training sample set may be divided into 5 sub-training sets. Each time, one of them is selected as the verification set (Validation data) and the rest are used as the training set (Training data) to train the convolutional neural network classifier, so that the ratio of validation data to training data is 1:4. The verification set is then tested with the prediction classifier 1 to obtain its misclassified images. This operation is repeated until each of the 5 sub-training sets has been tested as the verification set, and finally all misclassified images are summarized to construct the confusion matrix (Confusion matrix).
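The 5-fold procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `cross_validated_confusion` and `train_nearest_centroid` are hypothetical, a one-dimensional nearest-centroid classifier stands in for the convolutional neural network, the folds are formed by taking every fifth sample, and the data are made up.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature, actual class); a stand-in for an image


def cross_validated_confusion(
    samples: Sequence[Sample],
    num_classes: int,
    train: Callable[[Sequence[Sample]], Callable[[float], int]],
    folds: int = 5,
) -> List[List[int]]:
    """Accumulate misclassifications over all folds.

    Entry [i][j] counts samples predicted as class i whose actual class is j.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for k in range(folds):
        # Samples whose index is congruent to k form the validation set of fold k.
        validation = [s for n, s in enumerate(samples) if n % folds == k]
        training = [s for n, s in enumerate(samples) if n % folds != k]
        classify = train(training)
        for feature, actual in validation:
            predicted = classify(feature)
            if predicted != actual:
                matrix[predicted][actual] += 1
    return matrix


def train_nearest_centroid(training: Sequence[Sample]) -> Callable[[float], int]:
    """Toy classifier standing in for the CNN: nearest class mean in 1-D."""
    sums: dict = {}
    for feature, label in training:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + feature, count + 1)
    centroids = {label: total / count for label, (total, count) in sums.items()}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


# Made-up data: classes 0 and 1 overlap around 1.0, class 2 is far away.
data = [(0.0, 0), (0.2, 0), (0.4, 0), (0.6, 0), (1.1, 0),
        (0.9, 1), (1.4, 1), (1.6, 1), (1.8, 1), (2.0, 1),
        (9.0, 2), (9.2, 2), (9.4, 2), (9.6, 2), (9.8, 2)]
cv_matrix = cross_validated_confusion(data, num_classes=3, train=train_nearest_centroid)
```

Every sample is validated exactly once, so the accumulated matrix uses the whole training set. In this toy run the only errors fall between the two overlapping classes 0 and 1, while class 2 is never confused.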

In the prior hierarchical classification structure, a random sampling method is adopted to pick a verification set from a training sample set to obtain a confusion matrix. In contrast to these approaches, cross-validation may maximize the use of training sets to obtain a more accurate confusion matrix.

The confusion perception model 2 serves as a decision system for training a set of correction classifiers dedicated to generating clear boundaries between each pair of fuzzy categories. That is, each correction classifier is trained on confusable class sample data with fuzzy boundaries in the training sample set, using the confusion perception model as the decision system. The correction classifiers form the correction part of the prediction-correction structure of the confusion-aware network and consist of a series of convolutional neural network classifiers based on the confusable classes, so the number of correction classifiers in the correction classifier set 3 is typically less than or equal to the total number of classes in the training sample set.

The confusable class sample data is the data in the training sample set that belongs to classes with fuzzy boundaries, such as all image data corresponding to cats and dogs. After the estimated confusion matrix of the prediction classifier 1 on the training sample set has been obtained, the misclassification counts in the confusion matrix may be ranked and the top 30% selected as confusable. Specifically, the threshold T is chosen such that 30% of the $a_{ij}$ satisfy $a_{ij} \geq T$ and the remaining 70% satisfy $a_{ij} < T$. For example, for the count $a_{ki}$ of images whose predicted class is the k-th class and whose actual class is the i-th class, if $a_{ki} \geq T$, then i can be regarded as a confusable category of k and used for the subsequent training of the correction classifier.
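One way to pick such a threshold is sketched below. The helper names `confusion_threshold` and `confusable_pairs`, the rounding rule (ceiling of 30% of the off-diagonal entries) and the 4-class matrix are assumptions made for illustration; the text only states the 30%/70% split.

```python
import math
from typing import List, Tuple


def confusion_threshold(confusion: List[List[int]], top_fraction: float = 0.3) -> int:
    """Return T so that about the top `top_fraction` of off-diagonal counts
    satisfy a_ij >= T (ties at T can push the share above that fraction)."""
    counts = sorted(
        (
            count
            for i, row in enumerate(confusion)
            for j, count in enumerate(row)
            if i != j
        ),
        reverse=True,
    )
    keep = max(1, math.ceil(top_fraction * len(counts)))
    return counts[keep - 1]


def confusable_pairs(confusion: List[List[int]], threshold: int) -> List[Tuple[int, int]]:
    """List (predicted, actual) pairs whose error count reaches the threshold."""
    return [
        (i, j)
        for i, row in enumerate(confusion)
        for j, count in enumerate(row)
        if i != j and count >= threshold
    ]


# Made-up matrix: rows are predicted classes, columns are actual classes.
toy = [
    [0, 30, 2, 1],
    [25, 0, 3, 1],
    [2, 4, 0, 12],
    [1, 2, 10, 0],
]
T = confusion_threshold(toy)
pairs = confusable_pairs(toy, T)
```

Here 12 off-diagonal entries give a cut after the 4 largest counts, so T equals 10 and the pairs between classes 0 and 1 and between classes 2 and 3 are flagged as confusable.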

In the training phase of the model, for each prediction class $C_k$ of the prediction classifier 1, its confusable categories are selected, for example $C_i$ and $C_j$ (i.e., $a_{ki} \geq T$ and $a_{kj} \geq T$). These three categories then form the confusion category set of $C_k$, and one correction classifier is trained separately on the classes in that set. At classification time, if the prediction classifier 1 predicts the input image to be recognized as class $C_k$, the confusion perception model selects the corresponding target correction classifier, which classifies the image again to obtain the class probability vector of the correction classifier. The final classification result depends on the combination of the outputs of the prediction classifier 1 and the target correction classifier, and the combination may take the form of the probabilistic averaging layer 4.
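The role of the confusion perception model as a lookup from predicted class to confusion category set can be sketched as follows. The function name `build_confusion_sets`, the made-up matrix and the choice to create no entry for a class without confusable partners are assumptions of this sketch, not details confirmed by the text.

```python
from typing import Dict, List, Optional, Set


def build_confusion_sets(confusion: List[List[int]], threshold: int) -> Dict[int, Set[int]]:
    """Map each predicted class k to {k} plus every class i with a_ki >= threshold.

    confusion[k][i] counts images predicted as class k whose actual class is i.
    Classes without any confusable partner get no entry in this sketch.
    """
    sets: Dict[int, Set[int]] = {}
    for k, row in enumerate(confusion):
        partners = {i for i, count in enumerate(row) if i != k and count >= threshold}
        if partners:
            sets[k] = {k} | partners
    return sets


def select_correction_classes(
    sets: Dict[int, Set[int]], predicted_class: int
) -> Optional[Set[int]]:
    """Return the classes handled by the target correction classifier, if any."""
    return sets.get(predicted_class)


# Made-up matrix for classes 0 = cat, 1 = dog, 2 = horse, 3 = truck.
toy = [
    [0, 30, 11, 1],
    [25, 0, 3, 1],
    [12, 4, 0, 2],
    [1, 2, 1, 0],
]
confusion_sets = build_confusion_sets(toy, threshold=10)
cat_set = select_correction_classes(confusion_sets, 0)
truck_set = select_correction_classes(confusion_sets, 3)
```

In this example an image predicted as "cat" is routed to a correction classifier trained on cat, dog and horse, whereas an image predicted as "truck" has no correction classifier, which is consistent with the number of correction classifiers being at most the number of classes.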

The probability average layer 4 is used for outputting a final classification result of the image to be identified according to the class probability output by the prediction classifier and the class probability output by the target correction classifier, and the target correction classifier is a correction classifier selected by the confusion perception model according to the prediction class of the prediction classifier.

The hierarchical structure cooperatively uses the prediction classifier and the correction classifier to classify, and the output of all the classifiers is utilized simultaneously, so that the problem of selecting the output of a single classifier can be avoided to a certain extent, error propagation is prevented, and the defect of the traditional hierarchical classification model is overcome.

Because the class probability vectors output by the prediction classifier 1 and the correction classifier group 3 are not on a common scale, the two probability vectors can be normalized to facilitate subsequent data processing. That is, the probability averaging layer may include a normalization module that normalizes the class probability output by the prediction classifier 1 and the class probability output by the target correction classifier; optionally, the softmax function may be used to process the probability vectors. The following formula can be used for this normalization:

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1, \ldots, K$$

With the outputs mapped into the interval (0, 1), the probability vector output by the prediction classifier 1 and the probability vector output by the target correction classifier can then be combined, for which the formula can be as follows:

where X is the image to be identified, y is the class label of the image to be identified, and $B_j$ is the class probability output by the prediction classifier, that is, the probability that the prediction classifier 1 assigns to the j-th class (class $C_j$). The two remaining symbols of the formula denote, respectively, the class probability output by the target correction classifier, that is, the probability that the correction classifier obtains for class $C_j$, and the class corresponding to the confusable class sample data.

The class with the highest probability $p(y = j \mid X)$ output by the probability averaging layer is the final classification result of the confusion-aware convolutional neural network.
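The normalization and combination steps can be illustrated with the sketch below. The softmax is the standard function; the combination rule, however, is an assumption of this sketch (mean of the two normalized probabilities on the confusable classes, prediction classifier value elsewhere) and should not be read as the formula of the embodiment. The names `average_probabilities` and the raw scores are likewise hypothetical.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_probabilities(
    prediction_scores: List[float],
    correction_scores: Dict[int, float],
) -> List[float]:
    """Assumed rule: average both probabilities on the confusable classes and
    keep the prediction classifier's probability on all other classes."""
    b = softmax(prediction_scores)
    classes = sorted(correction_scores)
    c = dict(zip(classes, softmax([correction_scores[j] for j in classes])))
    return [(b[j] + c[j]) / 2 if j in c else b[j] for j in range(len(b))]


# Made-up raw scores for classes 0 = cat, 1 = dog, 2 = truck.
prediction_scores = [2.0, 1.9, -1.0]
correction_scores = {0: 0.0, 1: 1.5}  # correction classifier covers cat and dog

first_guess = max(range(3), key=softmax(prediction_scores).__getitem__)
combined = average_probabilities(prediction_scores, correction_scores)
final_class = max(range(len(combined)), key=combined.__getitem__)
```

In this example the prediction classifier narrowly prefers class 0, but after averaging with the correction classifier the final class is 1. Under the assumed rule the combined values need not sum to 1; only their ranking is used.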

It should be noted that, the prediction classifier 1 in the present application does not necessarily need to use a certain fixed convolutional neural network classifier, and the network structures of the prediction classifier 1 and the correction classifier may be replaced by any existing convolutional neural network structure or other non-convolutional neural network classifiers.

The technical scheme provided by the embodiment of the invention proposes a confusion-aware convolutional neural network structure with a prediction-correction hierarchy. A confusion matrix is obtained through cross-validation, and the confusion perception model constructed from this confusion matrix identifies the data categories that have fuzzy boundaries. With the prediction-correction hierarchy, the CNN model is able to distinguish classes with fuzzy boundaries. On the one hand, this improves the ability of the CNN model to handle fuzzy category boundaries and effectively improves image classification accuracy. On the other hand, it avoids the error propagation problem caused by hierarchical classification, and because a more accurately estimated confusion matrix is used, the method can achieve a larger improvement on small-scale data sets.

To verify that the technical scheme provided by the application can improve the accuracy and precision of image classification, the application also provides a series of verification experiments. The confusion-aware CNN provided by the application is evaluated on the Mnist and CIFAR-10 data sets. The experiments are carried out with the deep learning framework PyTorch on a single NVIDIA Titan X graphics card, and the networks are trained by back propagation. Embodiments of the present invention may include the following:

PyTorch is a deep learning framework released in 2017 by Facebook AI Research (FAIR). Its predecessor is Torch, which originated at New York University in 2002. PyTorch has an advanced design: all modules above the Tensor level were rebuilt on the basis of Torch, and an advanced automatic differentiation system was added, making PyTorch the most popular dynamic-graph framework at the time. PyTorch leads most other deep learning frameworks in flexibility, speed and ease of use. Its design is simple, its source code is easy to understand, and its interfaces are flexible and easy to use, so researchers can readily implement their ideas. This flexibility does not come at the cost of running speed, which remains among the best of the deep learning frameworks. PyTorch provides complete documentation, has an active community, and receives long-term, stable updates with the support of FAIR. It was therefore selected to test the confusion-aware convolutional neural network presented herein.

The Mnist (Modified National Institute of Standards and Technology database) data set is a handwritten-digit data set comprising 70000 grayscale images of handwritten digits, of which 60000 are training data and the remaining 10000 are test data. Each Mnist image is a 28 × 28 handwritten digit, and there are 10 categories corresponding to the digits 0 to 9, as shown in fig. 4. The CIFAR-10 data set is a common computer vision data set comprising 60000 images of size 32 × 32 × 3 in 10 categories, of which 50000 images are used as the training set and the remaining 10000 as the test set.

On the Mnist data set, the reference classifier can be trained using the convolutional neural network Mnist ConvNet from the official PyTorch examples. As shown in fig. 5, the Mnist ConvNet model consists of two convolution layers and two fully connected layers. The prediction classifier is iterated for 500 batches on the training set with a learning rate of 0.01 and a momentum of 0.9. The training set is divided into six parts, each sub-training set having 10000 images, and the confusion matrix is obtained by 6-fold cross-validation. The correction classifiers are trained on the confusable classes selected from the confusion matrix. The confusion-aware CNN is then evaluated on the test set. Experiments show that the accuracy on the test set is improved to 99.31%, so the error rate of the confusion-aware CNN is about one quarter lower than that of the single CNN. The confusion-aware CNN has 238K parameters, 10 times as many as Mnist ConvNet, and it performs better than ResNet-32 with 460K parameters. The experimental results are shown in Table 2.

Table 2. The performance (without data enhancement) of different CNN models on Mnist.

As the above table shows, the accuracy on the test set is improved from 99.05% for the reference classification network Mnist ConvNet to 99.31% for the confusion-aware convolutional neural network (the technical scheme proposed in the application). The error rate of the confusion-aware convolutional neural network is therefore about one quarter lower than that of the reference network. In terms of parameter scale, the confusion-aware convolutional neural network has 238K parameters, about 10 times as many as the reference Mnist ConvNet; compared with ResNet-32 with 460K parameters, it achieves better performance with fewer parameters.
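The "one quarter lower" figure follows directly from the two accuracies quoted above. A short check, where the helper name `relative_error_reduction` is hypothetical:

```python
def relative_error_reduction(acc_before, acc_after):
    """Relative drop in error rate when accuracy (in percent) improves."""
    err_before = 100.0 - acc_before  # 0.95% for the reference network
    err_after = 100.0 - acc_after    # 0.69% for the confusion-aware CNN
    return (err_before - err_after) / err_before


reduction = relative_error_reduction(99.05, 99.31)
print(f"{reduction:.1%}")  # prints 27.4%
```

The error rate falls from 0.95% to 0.69%, a relative reduction of roughly 27%, which the text rounds to one quarter.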

On the CIFAR-10 data set, three different CNNs of increasing depth and total parameter count can be used as base models to test the performance of the confusion-aware CNN. The first network has the same structure as LeNet-5; the only modification is that the input image size is changed to 32 × 32 × 3. The second network increases the number of convolution kernels and hidden nodes on the basis of the first network. The structure of the first two networks is shown in fig. 6. For these two networks, the learning rate for classifier training is 0.1 and is reduced by a factor of 10 every 100 batches; they are iterated for 500 batches on the training set with a momentum of 0.9 and a weight decay of 0.0005. The last CNN is an 18-layer residual network, called ResNet-18. ResNet-18 includes 17 convolutional layers and one fully connected layer. Because a residual network is built from special residual blocks, it can reach depths of thousands of layers without the vanishing-gradient problem. The network architecture of ResNet-18 is shown in fig. 7. When the classifiers are trained with the residual network, each classifier is iterated for 200 batches, with the initial learning rate set to 0.01 and reduced by a factor of 10 every 50 batches.
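The two step schedules described above can be written as one small function. This is a sketch only: the helper `step_lr` is hypothetical, and the unit "batches" is taken from the text as given, without assuming whether it denotes single mini-batches or full passes over the training set.

```python
def step_lr(initial_lr, step, period, factor=10.0):
    """Learning rate at `step` when it is divided by `factor` every `period` steps."""
    return initial_lr / (factor ** (step // period))


# First two networks: start at 0.1, divide by 10 every 100 batches, 500 in total.
lenet_schedule = [step_lr(0.1, s, 100) for s in (0, 99, 100, 250, 499)]

# ResNet-18: start at 0.01, divide by 10 every 50 batches, 200 in total.
resnet_schedule = [step_lr(0.01, s, 50) for s in (0, 50, 199)]
```

Under these settings the first two networks end training at a learning rate of about 1e-5, and ResNet-18 ends at about 1e-5 as well.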

All three networks are trained with stochastic gradient descent, and additional data enhancement is used: the original images of the training set are randomly cropped and flipped before training to improve the performance of the classifiers. The training set of CIFAR-10 is divided into 5 sub-training sets of 10000 images each, and the confusion matrix is obtained by 5-fold cross-validation. The final results of the experiment are shown in Table 3.
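The random cropping and flipping mentioned above can be illustrated on a single-channel image stored as a list of rows. This is a sketch under assumptions that the text does not state: zero padding of 4 pixels before cropping, a horizontal flip, and a flip probability of 0.5. The function name `random_crop_and_flip` is hypothetical.

```python
import random


def random_crop_and_flip(image, pad=4, rng=random):
    """Return a randomly shifted and possibly mirrored copy of `image`.

    image: list of rows, each row a list of pixel values.
    pad:   zero padding added on every side before cropping (assumed value).
    """
    h, w = len(image), len(image[0])
    blank = [0] * (w + 2 * pad)
    padded = [list(blank) for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in image]
    padded += [list(blank) for _ in range(pad)]
    # Pick the top-left corner of an h x w window inside the padded image.
    top = rng.randint(0, 2 * pad)
    left = rng.randint(0, 2 * pad)
    crop = [row[left:left + w] for row in padded[top:top + h]]
    # Mirror left to right with probability 0.5 (assumed value).
    if rng.random() < 0.5:
        crop = [row[::-1] for row in crop]
    return crop


image = [[r * 32 + c for c in range(32)] for r in range(32)]
augmented = random_crop_and_flip(image, rng=random.Random(0))
```

The output always has the same size as the input, so augmented images can be fed to the classifier without any change to the network.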

Table 3. Accuracy comparison of three CNN networks and their corresponding confusion-aware CNNs

As shown in Table 3, the improvement in accuracy decreases from 4.07% to 0.21% as the complexity of the base model increases. Accordingly, the inventors consider that the simpler and less complex the reference classifier is, the more pronounced the improvement brought by the confusion-aware convolutional neural network.

Based on the above embodiments, the application further provides a method for image classification and identification. Referring to fig. 8 and fig. 9, fig. 8 is a flow chart of an image recognition method provided by the embodiment of the invention, and fig. 9 is a flow chart of an illustrative example. The embodiment of the invention may include the following:

S801: Input the image to be identified into a pre-constructed confusion-aware convolutional neural network.

The confusion-aware convolutional neural network may include a prediction classifier, a confusion-aware model, a plurality of correction classifiers, and a probabilistic averaging layer. The functional structure and implementation process of the confusion perception convolutional neural network can be referred to the description of the above embodiments.

S802: Call the prediction classifier of the confusion-aware convolutional neural network to identify the image to be identified, and obtain the prediction category and the first class probability of the image to be identified.

S803: Call the confusion perception model of the confusion-aware convolutional neural network to select a target correction classifier according to the prediction category.

S804: Identify the image to be identified by using the target correction classifier to obtain the second class probability of the image to be identified.

S805: Output a classification result of the image to be identified according to the first class probability and the second class probability by using the probability averaging layer of the confusion-aware convolutional neural network.

The class probability vectors obtained by the prediction classifier and the correction classifier are fused by the probability averaging layer, and the category corresponding to the largest component of the fused vector is selected as the final output of the network, namely the final class of the image to be recognized. As can be seen from the above, the embodiment of the invention overcomes the defects of the related technology and helps improve the accuracy of image classification.
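Steps S801 to S805 can be summarized in a short control-flow sketch. The four callables are hypothetical stand-ins for the trained components, and the element-wise mean used in the toy example is an assumed fusion rule, not the formula of the application.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def recognize(image, predictor, select_corrector, average):
    """Run one image through the prediction-correction hierarchy."""
    # S802: prediction classifier gives the first class probability.
    first_probs = predictor(image)
    predicted_class = argmax(first_probs)
    # S803: the confusion perception model picks the target correction classifier.
    corrector = select_corrector(predicted_class)
    # S804: target correction classifier gives the second class probability.
    second_probs = corrector(image)
    # S805: the probability averaging layer fuses both vectors.
    fused = average(first_probs, second_probs)
    return argmax(fused)


# Toy stand-ins: fixed probability vectors instead of trained networks.
def toy_predictor(image):
    return [0.50, 0.45, 0.05]


def toy_corrector(image):
    return [0.10, 0.90, 0.00]


def toy_average(a, b):
    return [(x + y) / 2.0 for x, y in zip(a, b)]


result = recognize("image", toy_predictor, lambda c: toy_corrector, toy_average)
```

Here the prediction classifier alone would output class 0, while the fused vector selects class 1.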

The embodiment of the invention also provides a corresponding implementation device for the image recognition method, so that the method has higher practicability. The image recognition apparatus according to the embodiment of the present invention will be described below, and the image recognition apparatus described below and the image recognition method described above may be referred to correspondingly.

Referring to fig. 10, fig. 10 is a block diagram of an image recognition apparatus according to an embodiment of the present invention, where the apparatus may include:

the image input module 1001 is configured to input an image to be identified into a pre-constructed confusion-aware convolutional neural network.

The image recognition module 1002 is configured to invoke a prediction classifier of the confusion perception convolutional neural network to recognize an image to be recognized, so as to obtain a prediction class and a first class probability of the image to be recognized; invoking a confusion perception model of a confusion perception convolutional neural network to select a target correction classifier according to the prediction category; identifying the image to be identified by using the target correction classifier to obtain a second class probability of the image to be identified; and outputting a classification result of the image to be identified according to the first class probability and the second class probability by using a probability average layer of the confusion perception convolutional neural network.

Optionally, in some implementations of the present embodiment, the image recognition module 1002 may be configured to: divide the training sample set into a plurality of sub-training sets; for each sub-training set, take the current sub-training set as the verification set, train a prediction classifier on the remaining sub-training sets, and test the verification set with the trained prediction classifier to obtain the misclassified images; and summarize the misclassified images obtained with each sub-training set as the verification set to construct a confusion matrix.
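The cross-validation procedure described above can be sketched as follows. The function and parameter names are hypothetical; `train_fn` stands for any routine that trains a prediction classifier and returns a callable mapping a sample to a predicted class. The sketch assumes that the number of samples is divisible by the number of sub-training sets, and it counts only misclassified samples, as the text describes.

```python
def cross_validated_confusion(samples, labels, num_classes, k, train_fn):
    """Build a confusion matrix of misclassifications by k-fold cross-validation.

    matrix[t][p] counts validation samples of true class t predicted as class p.
    """
    fold = len(samples) // k
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for i in range(k):
        lo, hi = i * fold, (i + 1) * fold
        # The current sub-training set is the verification set; the rest train.
        train_x = samples[:lo] + samples[hi:]
        train_y = labels[:lo] + labels[hi:]
        model = train_fn(train_x, train_y)
        for x, y in zip(samples[lo:hi], labels[lo:hi]):
            predicted = model(x)
            if predicted != y:
                matrix[y][predicted] += 1
    return matrix


# Toy stand-in for training: a fixed threshold rule that ignores its inputs.
def toy_train(train_x, train_y):
    return lambda x: 0 if x < 5 else 1


confusion = cross_validated_confusion(
    [1, 2, 6, 7, 4, 9], [0, 0, 1, 1, 1, 1], num_classes=2, k=3, train_fn=toy_train
)
```

In the toy data only the sample 4 (true class 1) is misclassified, so the single nonzero entry is `confusion[1][0]`. Large off-diagonal entries of such a matrix indicate the class pairs that the prediction classifier tends to confuse.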

In another embodiment, the image recognition module 1002 may further include, for example, a normalization module and a probability calculation module. The normalization module is used to normalize the class probability output by the prediction classifier and the class probability output by the target correction classifier. The probability calculation module is used to compute, from the normalized class probabilities, the class to which the image to be identified belongs.

In some implementations of the embodiments of the present invention, the normalization module may further normalize the class probability output by the prediction classifier and the class probability output by the target correction classifier using the following formula:

In other implementations of the embodiments of the present invention, the probability calculation module may further determine the category label to which the image to be identified belongs by using the following formula:

where the symbols of the formula include the class probability output by the target correction classifier and the categories corresponding to the confusable-category sample data.

The functions of each functional module of the image recognition device according to the embodiment of the present invention may be specifically implemented according to the method in the embodiment of the method, and the specific implementation process may refer to the related description of the embodiment of the method, which is not repeated herein.

From the above, the embodiment of the invention solves the defects existing in the related technology, and is beneficial to improving the accuracy of image classification.

The embodiment of the invention also provides image recognition equipment, comprising:

a memory for storing a computer program;

a processor configured to execute a computer program to implement the steps of the image recognition method according to any of the embodiments described above.

The embodiment of the invention also provides a computer readable storage medium storing an image recognition program which, when executed by a processor, performs the steps of the image recognition method according to any one of the embodiments above.

The functions of each functional module of the computer readable storage medium according to the embodiments of the present invention may be specifically implemented according to the method in the embodiments of the method, and the specific implementation process may refer to the relevant description of the embodiments of the method, which is not repeated herein.

In this specification, the embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The image recognition method, the device, the equipment, the computer readable storage medium and the confusion perception convolutional neural network provided by the invention are described in detail above. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims

1. An image recognition method, comprising:

outputting a classification result of the image to be identified according to the first class probability and the second class probability by utilizing a probability average layer of the confusion perception convolutional neural network;

the confusion perception convolutional neural network comprises a prediction classifier, a confusion perception model, a plurality of correction classifiers and a probability average layer;

The probability average layer is used for outputting a classification result of the image to be identified according to the class probability output by the prediction classifier and the class probability output by the target correction classifier, wherein the target correction classifier is a correction classifier selected by the confusion perception model according to the prediction class of the prediction classifier;

the outputting, by using the probability average layer of the confusion perception convolutional neural network, a classification result of the image to be identified according to the first class probability and the second class probability comprises:

and carrying out standardization processing on the class probability output by the prediction classifier and the class probability output by the target correction classifier by using the following formula:

where z is a K-dimensional vector, and σ(z)_j is the probability obtained by mapping its j-th element z_j into the interval (0, 1);

Determining the category label to which the image to be identified belongs by using the following formula:

where the symbols of the formula include the class probability output by said target correction classifier and the class corresponding to the confusable class sample data.

2. The image recognition method of claim 1, wherein obtaining the confusion matrix by cross-validation on the training sample set comprises:

Dividing the training sample set into a plurality of sub-training sets;

3. The image recognition method of claim 2, wherein the number of training sample images included in each sub-training set is the same.

4. A method of image recognition according to any one of claims 1 to 3, wherein the probability averaging layer comprises a normalization module and a probability calculation module;

5. An image recognition apparatus, comprising:

The image recognition module is used for calling a prediction classifier of the confusion perception convolutional neural network to recognize the image to be recognized, so as to obtain a prediction category and a first category probability of the image to be recognized; invoking a confusion perception model of the confusion perception convolutional neural network to select a target correction classifier according to the prediction category; identifying the image to be identified by using the target correction classifier to obtain a second class probability of the image to be identified; outputting a classification result of the image to be identified according to the first class probability and the second class probability by utilizing a probability average layer of the confusion perception convolutional neural network;

wherein the image recognition module is further configured to:

where the symbols of the formula include the class probability output by said target correction classifier and the class corresponding to the confusable class sample data.

6. An image recognition device comprising a processor for implementing the steps of the image recognition method according to any one of claims 1 to 4 when executing a computer program stored in a memory.

7. A computer-readable storage medium, on which an image recognition program is stored, which, when executed by a processor, implements the steps of the image recognition method according to any one of claims 1 to 4.