CN117197523A - Image classification method, apparatus, device, storage medium, and program product - Google Patents

Info

Publication number
CN117197523A
Authority
CN
China
Prior art keywords
image
sample
classification model
initial
pair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310786459.8A
Other languages
Chinese (zh)
Inventor
徐晓健 (Xu Xiaojian)
Current Assignee
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202310786459.8A
Publication of CN117197523A
Legal status: Pending

Abstract

The present application relates to an image classification method, apparatus, device, storage medium, and program product in the technical field of artificial intelligence. The method includes: acquiring a plurality of image pairs constructed from an image to be predicted and a plurality of base images of different categories; inputting each image pair into a preset image classification model to obtain a prediction probability for each image pair; and determining the category of the base image in the image pair with the highest prediction probability as the classification result of the image to be predicted. The prediction probability characterizes the similarity between the image to be predicted and the base image in the image pair; the image classification model is obtained by iteratively learning the feature information of a plurality of sample images of different categories. The method improves the classification accuracy of the image to be predicted.

Description

Image classification method, apparatus, device, storage medium, and program product
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image classification method, apparatus, device, storage medium, and program product.
Background
With the continuous development of artificial intelligence, neural network models are increasingly used.
Taking image classification as an example, an initial neural network model is sufficiently and effectively trained on a sample set to obtain an image classification model, which is then used to classify images of each category. Such classification is applied in fields such as face recognition, autonomous driving, and medical detection.
However, when classifying images, the related art suffers from inaccurate classification.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an image classification method, apparatus, device, storage medium, and program product that can improve the accuracy of image classification.
In a first aspect, the present application provides a method of classifying images, the method comprising:
acquiring a plurality of image pairs constructed from an image to be predicted and a plurality of base images of different categories;
inputting each image pair into a preset image classification model to obtain a prediction probability for each image pair; the prediction probability characterizes the similarity between the image to be predicted and the base image in the image pair; the image classification model is obtained by iteratively learning the feature information of a plurality of sample images of different categories;
and determining the category of the base image in the image pair with the highest prediction probability as the classification result of the image to be predicted.
In one embodiment, the training process of the image classification model includes:
acquiring a plurality of sample images of different categories;
determining a plurality of training sample groups according to the sample images of a plurality of different categories; each training sample group comprises a plurality of sample image pairs under all categories;
and training the initial image classification model through each training sample group until the initial image classification model converges to obtain an image classification model.
In one embodiment, determining a plurality of training sample sets from a plurality of different classes of sample images includes:
acquiring a base image under each category according to the sample image under each category;
respectively combining the base image and other sample images under each category to obtain a plurality of sample image pairs under each category;
and obtaining a plurality of training sample groups from the plurality of sample image pairs under each category, in such a manner that each training sample group is allocated at least one sample image pair from each category.
In one embodiment, acquiring the base image under each category includes:
for any one of the categories, any one of the sample images in the category is determined as the base image under the category.
In one embodiment, training the initial image classification model by each training sample set until the initial image classification model converges includes:
for any training sample group, inputting each sample image pair in the training sample group into the initial image classification model for training, and obtaining the loss function value of the initial image classification model trained on each sample image pair;
and updating the parameters of the initial image classification model according to the loss function values until the initial image classification model reaches a preset condition, at which point the initial image classification model is determined to have converged.
In one embodiment, each sample image pair includes a base image and a sample image; the initial image classification model includes an initial feature extraction network and an initial computing network; acquiring the loss function value of the initial image classification model trained on each sample image pair includes:
for any sample image pair, inputting the sample image pair into an initial feature extraction network to obtain feature vectors of a base image in the sample image pair and feature vectors of the sample image;
splicing the feature vector of the base image and the feature vector of the sample image to obtain a two-channel feature vector;
and inputting the two-channel feature vector into the initial computing network, and determining the loss function value of the initial image classification model trained on the sample image pair.
In one embodiment, the initial feature extraction network comprises a single channel feature network and a multi-scale feature network; inputting the sample image pair into an initial feature extraction network to obtain feature vectors of a base image in the sample image pair and feature vectors of the sample image, wherein the method comprises the following steps:
inputting the sample image pair into a single-channel feature network to obtain a feature image of a base image and a feature image of a sample image;
inputting the feature images of the base image and the feature images of the sample image into a multi-scale feature network to obtain a sub-feature image of the base image at a plurality of different scales and a sub-feature image of the sample image at a plurality of different scales;
splicing the multiple sub-feature images of the base image to obtain feature vectors of the base image; and splicing the plurality of sub-feature images of the sample image to obtain the feature vector of the sample image.
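The multi-scale splicing step can be illustrated with a toy sketch in plain Python. The average pooling and the choice of scales below are purely illustrative assumptions; the patent does not specify the pooling operation or the number of scales. The same function would be applied to the feature map of the base image and to that of the sample image separately.

```python
def multiscale_feature_vector(feature_map, scales=(1, 2)):
    """Average-pool a 2-D feature map into an s x s grid for each scale and
    concatenate ("splice") the pooled values into a single feature vector.
    Toy stand-in for the multi-scale feature network; scales and pooling
    are illustrative assumptions, not taken from the patent."""
    h, w = len(feature_map), len(feature_map[0])
    vector = []
    for s in scales:
        for i in range(s):
            for j in range(s):
                # one pooled value per grid cell of the s x s partition
                block = [feature_map[r][c]
                         for r in range(i * h // s, (i + 1) * h // s)
                         for c in range(j * w // s, (j + 1) * w // s)]
                vector.append(sum(block) / len(block))
    return vector
```

For a 2 x 2 map, scale 1 contributes one global average and scale 2 contributes the four cell values, giving a spliced vector of length five.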
In one embodiment, the initial computing network includes a convolutional layer and a fully-connected layer; inputting the two-channel feature vector into the computing network and determining the loss function value of the initial image classification model trained on the sample image pair includes:
inputting the two-channel feature vector into the convolutional layer to obtain a similarity matrix between the base image and the sample image;
inputting the similarity matrix into the fully-connected layer, and determining the similarity between the base image and the sample image;
and determining the similarity between the base image and the sample image as the loss function value.
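As a minimal sketch of the computing network's role, the function below uses an elementwise product as a crude stand-in for the similarity matrix produced by the convolutional layer, and a weighted sum plus sigmoid as a stand-in for the fully-connected layer. The parameters `weights` and `bias` are illustrative learned values, not anything specified in the patent.

```python
import math

def pair_similarity(base_vec, sample_vec, weights, bias=0.0):
    """Toy stand-in for the convolutional + fully-connected computing
    network: map a pair of feature vectors to a similarity in (0, 1)."""
    # elementwise product plays the role of the "similarity matrix"
    product = [a * b for a, b in zip(base_vec, sample_vec)]
    # weighted sum + sigmoid plays the role of the fully-connected layer
    score = sum(w * p for w, p in zip(weights, product)) + bias
    return 1.0 / (1.0 + math.exp(-score))
```

The sigmoid keeps the output in (0, 1), so it can be read directly as the similarity (and later as the prediction probability).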
In one embodiment, updating parameters of the initial image classification model based on the loss function values includes:
acquiring the image pair label of each sample image pair; the image pair label characterizes whether the base image and the sample image in the sample image pair belong to the same category;
and updating the parameters of the initial image classification model according to the loss function value of the initial image classification model trained on each sample image pair and the corresponding image pair label.
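The patent only states that the loss value and the pair label are used together; binary cross-entropy is one common concrete choice for combining a predicted similarity with a 0/1 same-category label, and is assumed here purely for illustration.

```python
import math

def pair_loss(predicted_similarity, pair_label):
    """Binary cross-entropy between the predicted similarity and the image
    pair label (1 if base and sample image share a category, else 0).
    BCE is an assumed choice; the patent does not name the loss."""
    eps = 1e-12  # guard against log(0)
    p = min(max(predicted_similarity, eps), 1.0 - eps)
    return -(pair_label * math.log(p) + (1 - pair_label) * math.log(1.0 - p))
```

The loss is small when a same-category pair scores near 1 (or a different-category pair near 0), and large otherwise, which is the gradient signal used to update the model parameters.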
In one embodiment, the initial image classification model reaching the preset condition includes the classification accuracy of the initial image classification model reaching a preset accuracy threshold.
In a second aspect, the present application also provides an image classification apparatus, including:
an image pair acquisition module for acquiring a plurality of image pairs constructed by an image to be predicted and a plurality of different categories of base images;
a prediction module for inputting each image pair into a preset image classification model to obtain a prediction probability for each image pair; the prediction probability characterizes the similarity between the image to be predicted and the base image in the image pair; the image classification model is obtained by iteratively learning the feature information of a plurality of sample images of different categories;
and a determining module for determining the category of the base image in the image pair with the highest prediction probability as the classification result of the image to be predicted.
In a third aspect, an embodiment of the present application provides a computer device including a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method provided by any of the embodiments of the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method provided by any of the embodiments of the first aspect described above.
In a fifth aspect, embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method provided by any of the embodiments of the first aspect described above.
The image classification method, apparatus, device, storage medium, and program product acquire a plurality of image pairs constructed from the image to be predicted and a plurality of base images of different categories, input each image pair into a preset image classification model to obtain a prediction probability for each image pair, and then determine the category of the base image in the image pair with the highest prediction probability as the classification result of the image to be predicted. The prediction probability characterizes the similarity between the image to be predicted and the base image in the image pair; the image classification model is obtained by iteratively learning the feature information of a plurality of sample images of different categories. Because the image classification model has iteratively learned the feature information of sample images of multiple different categories, it can accurately compute the prediction probability of each image pair formed by the image to be predicted and a base image of a given category, thereby improving the classification accuracy of the image to be predicted.
Drawings
FIG. 1 is a diagram of an application environment for an image classification method in one embodiment;
FIG. 2 is a flow chart of an image classification method according to an embodiment;
FIG. 3 is a flow chart of an image classification method according to another embodiment;
FIG. 4 is a flow chart of an image classification method according to another embodiment;
FIG. 5 is a flow chart of an image classification method according to another embodiment;
FIG. 6 is a flow chart of an image classification method according to another embodiment;
FIG. 7 is a flow chart of an image classification method according to another embodiment;
FIG. 8 is a flow chart of an image classification method according to another embodiment;
FIG. 9 is a flow chart of an image classification method according to another embodiment;
FIG. 10 is a flow chart of an image classification method according to another embodiment;
FIG. 11 is a flow chart of an image classification method according to another embodiment;
FIG. 12 is a flow chart of an image classification method according to another embodiment;
FIG. 13 is a block diagram showing the structure of an image classification apparatus in one embodiment.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In the related art, images are generally classified using an image classification model; however, when the amount of training image data is insufficient, the image classification model may overfit, resulting in inaccurate image classification.
Based on this, an embodiment of the present application provides an image classification method that classifies an image to be predicted through a pre-trained image classification model. Because the image classification model is obtained by iteratively learning the feature information of a plurality of sample images of different categories, the accuracy of the model is higher, and the classification result for the image to be predicted is therefore more accurate.
It should be noted that the image classification method of the embodiments of the present application may be applied to classification in any field, for example, social media, medical diagnosis, automated driving, the retail industry, and security monitoring. In social media, the image classification model can be applied to image annotation and face recognition, making recommendation and personalization on social media platforms more accurate. In medical diagnosis, the image classification model can be applied to medical image analysis, such as breast cancer classification and lung disease diagnosis. In automated driving, image classification models may be used to identify traffic signs and to detect and recognize vehicles, aiding automated driving systems in making decisions. In the retail industry, image classification models may be used for merchandise identification and classification, such as merchandise inventory management and merchandise flaw detection.
In one embodiment, a method of classifying images is provided, and the method is described as applied to the computer device in FIG. 1. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; wireless communication can be realized through Wi-Fi, a mobile cellular network, NFC (near field communication), or other technologies. The computer program is executed by the processor to implement an image classification method.
It will be appreciated by those skilled in the art that the architecture shown in fig. 1 is merely a block diagram of some of the architecture relevant to the embodiments of the present application and is not intended to limit the computer device to which the embodiments of the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, there is provided an image classification method, as shown in fig. 2, including the steps of:
s201, a plurality of image pairs constructed by the image to be predicted and a plurality of different types of base images are acquired.
Taking image classification of a face as an example, the face image classification may include age classification, gender classification, expression classification, emotion classification, and the like; for example, by way of example of age classification, categories may include infants, children, teenagers, adults, and the elderly; taking the expression classification as an example, the categories may include happiness, sadness, anger, and the like; taking emotion classification as an example, categories may include happy, hard, surprised, and the like.
The image to be predicted can be any image whose category is to be predicted, and may be sent to the computer device by another terminal device. For example, when an image of an object needs to be classified, the image may be captured by an image capturing apparatus and sent to the computer device as the image to be predicted; upon receiving it, the computer device classifies the image to be predicted.
Optionally, before classifying the image to be predicted, a plurality of base images of different categories in the image field corresponding to the image to be predicted are required to be acquired; the base images of all the categories represent the image attributes and the characteristics of the corresponding categories and are used for classifying the images to be predicted; the base images of each class can characterize typical features of the corresponding class and can be used as reference images in image classification.
Taking the age classification task as an example, the base image may include corresponding face images of infants, children, teenagers, adults, and elderly people.
The plurality of base images of different categories may be obtained in several ways: another terminal device may send the corresponding base images of different categories to the computer device together with the image to be predicted; alternatively, the base images may be acquired from a database in the computer device.
Based on the obtained image to be predicted and the plurality of base images of different categories, a plurality of image pairs can be constructed by pairing the image to be predicted with each base image; each image pair comprises the image to be predicted and one base image.
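In plain Python, step S201 amounts to pairing the image to be predicted with each category's base image. The dictionary layout below (category name mapped to base image) is an illustrative assumption, not a structure prescribed by the patent.

```python
def build_image_pairs(query_image, base_images):
    """Pair the image to be predicted with the base image of every category.

    `base_images` maps a category name to that category's base image
    (illustrative data layout). Returns a list of
    (category, (query_image, base_image)) tuples."""
    return [(category, (query_image, base))
            for category, base in base_images.items()]
```

Each resulting pair is what gets fed to the image classification model in step S202.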
S202, inputting each image pair into a preset image classification model respectively to obtain the prediction probability of each image pair; the prediction probability represents the similarity between the image to be predicted in the image pair and the base image; the image classification model is obtained by iteratively learning the characteristic information of a plurality of sample images of different categories.
Each image pair is input into the preset image classification model, which analyzes the pair and outputs its prediction probability; the prediction probability represents the degree of similarity between the image to be predicted and the base image in the image pair.
Alternatively, the image classification model may be a neural network model that is trained in advance based on a plurality of sample images of different classes, and is dedicated to classifying images to be predicted.
The similarity between the image to be predicted and the base image can be calculated by means of Euclidean distance, cosine similarity, manhattan distance, hamming distance and the like.
Taking the Euclidean distance as an example: for any image pair, the image classification model extracts the feature information of the image to be predicted and of the base image in the pair; the Euclidean distance between the two images is then computed from this feature information, the similarity between them is determined from that distance, and this similarity is taken as the prediction probability.
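A minimal sketch of this Euclidean-distance variant, assuming the feature information is an equal-length numeric vector per image; the `1 / (1 + d)` distance-to-similarity mapping is one common choice, as the patent does not specify the exact conversion.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def similarity(u, v):
    """Map a distance in [0, inf) to a similarity in (0, 1].

    The 1 / (1 + d) mapping is an assumed convention: identical
    vectors give similarity 1, and similarity falls toward 0 as
    the distance grows."""
    return 1.0 / (1.0 + euclidean_distance(u, v))
```

The resulting similarity can be used directly as the pair's prediction probability.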
And S203, determining the category of the base image in the image pair with the highest prediction probability as the classification result of the image to be predicted.
The category of the base image in the image pair with the highest prediction probability is determined as the classification result of the image to be predicted; that is, the classification result indicates which category the image to be predicted belongs to.
Continuing with the age classification task as an example, if the base image in the image pair with the highest prediction probability represents a child, the category of the image to be predicted is determined to be child.
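Step S203 is a simple arg-max over the per-pair probabilities. A sketch, assuming the probabilities have been collected into a dictionary keyed by category:

```python
def classify(pair_probabilities):
    """Return the category whose image pair has the highest prediction
    probability. `pair_probabilities` maps category -> probability
    (illustrative data layout)."""
    return max(pair_probabilities, key=pair_probabilities.get)
```

Called with the age-classification example's probabilities, the category of the best-matching base image is returned as the classification result.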
Optionally, to further improve the accuracy of image classification, each category may include a plurality of base images, so that each category corresponds to a plurality of image pairs. For any category, the average of the prediction probabilities of the image pairs corresponding to that category may be used as the prediction probability for that category; the category with the highest prediction probability is then determined as the classification result of the image to be predicted.
In the image classification method provided by the embodiment of the application, a plurality of image pairs constructed by the image to be predicted and a plurality of base images with different categories are obtained, each image pair is respectively input into a preset image classification model to obtain the prediction probability of each image pair, and then the category of the base image in the image pair with the highest prediction probability is determined as the classification result of the image to be predicted. The prediction probability represents the similarity between the image to be predicted in the image pair and the base image; the image classification model is obtained by iteratively learning the characteristic information of a plurality of sample images of different categories. In the method, the image classification model is obtained by iteratively learning the characteristic information of the sample images of a plurality of different categories, which is equivalent to the image classification model, so that the image classification model can accurately calculate the prediction probability of an image pair consisting of an image to be predicted and a plurality of base images of different categories, thereby improving the classification accuracy of the image to be predicted.
The above is an illustration of the application of the pre-image classification model, and the following is an illustration of how the image classification model is constructed, in one embodiment, as shown in fig. 3, the training process of the image classification model includes the steps of:
s301, acquiring sample images of various different categories.
A plurality of different categories of sample images may be acquired from the dataset, each category including a plurality of sample images.
Alternatively, the sample images of each category may be identified by a classification algorithm, or may be identified manually.
S302, determining a plurality of training sample groups according to sample images of a plurality of different categories; each training sample set includes a plurality of sample image pairs under all classes.
In order to ensure the diversity and effectiveness of the training images, when the initial image classification model is subjected to iterative training, each iterative training is performed through different training sample groups, and each training sample group comprises multiple sample image pairs under multiple categories.
The plurality of training sample groups may be determined through a preset construction model: the sample images of different categories are input into the construction model, which analyzes them to obtain the plurality of training sample groups.
Alternatively, if the initial image classification model is trained iteratively 100 times, 100 training sample sets may be determined.
In one embodiment, as shown in FIG. 4, determining a plurality of training sample sets from a plurality of different classes of sample images includes:
s401, acquiring a substrate image under each category according to the sample image under each category.
For any category, the base image under the category may be acquired by determining any sample image in the category as the base image, for example by randomly selecting one image from all sample images in the category.
Alternatively, an evaluation model may be used to evaluate all sample images under a category, and the sample image most representative of the category may be selected as the base image.
And S402, respectively combining the base image and other sample images under each category to obtain a plurality of sample image pairs under each category.
The base image under each category is combined with each of the other sample images to obtain a plurality of sample image pairs under the category; here, for each category, the other sample images are all sample images except that category's base image.
For example, category a, category B, and category C are included in the categories of the sample image; category a includes sample image A1, sample image A2, and sample image A3, category B includes sample image B1, sample image B2, and sample image B3, and category C includes sample image C1 and sample image C2.
For example, for class a, the base image of class a is A1, and A1 is combined with the other sample images, respectively, to obtain a plurality of sample image pairs under class a including (A1, A2), (A1, A3), (A1, B1), (A1, B2), (A1, B3), (A1, C1), and (A1, C2). For class B, the base image of class B is B1, and B1 is combined with other sample images, respectively, to obtain a plurality of sample image pairs under class B including (B1, A1), (B1, A2), (B1, A3), (B1, B2), (B1, B3), (B1, C1), and (B1, C2).
Sample image pairs under every category are obtained in the same manner.
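The pair construction described above (each category's base image paired with every other sample image, across all categories) can be sketched as follows; the dictionary layout is illustrative.

```python
def build_sample_pairs(samples_by_category, base_by_category):
    """For each category, pair its base image with every other sample image
    from all categories, mirroring the class-A/B/C example above."""
    # flatten all sample images across categories, preserving order
    all_samples = [s for samples in samples_by_category.values() for s in samples]
    return {category: [(base, other) for other in all_samples if other != base]
            for category, base in base_by_category.items()}
```

Run on the example above, category A's base image A1 yields exactly the seven pairs (A1, A2) through (A1, C2).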
S403, obtaining a plurality of training sample groups from the plurality of sample image pairs under each category, in such a manner that each training sample group is allocated at least one sample image pair from each category.
That is, the plurality of training sample groups are determined from the obtained sample image pairs such that each training sample group contains at least one sample image pair from each category.
In one embodiment, the sample image pairs under all categories in a training sample group may be obtained from the plurality of sample image pairs under each category according to a preset construction rule, where the construction rule may specify the number of sample image pairs from each category in each training sample group. Optionally, according to this preset number, the corresponding number of sample image pairs may be randomly drawn from the plurality of sample image pairs under each category to form the training sample group.
It should be noted that different training sample groups may or may not contain the same sample image pair.
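The group-construction rule above can be sketched by sampling without replacement within each category for every group. The parameter names are illustrative; the patent fixes only the constraint that each group holds at least one pair per category.

```python
import random

def build_training_groups(pairs_by_category, num_groups, pairs_per_category):
    """Build `num_groups` training sample groups by randomly drawing
    `pairs_per_category` sample image pairs from every category for each
    group, so every group contains pairs from all categories."""
    groups = []
    for _ in range(num_groups):
        group = []
        for category, pairs in pairs_by_category.items():
            # draw without replacement within a group; different groups
            # may still overlap, as noted above
            k = min(pairs_per_category, len(pairs))
            group.extend(random.sample(pairs, k))
        groups.append(group)
    return groups
```

One group would typically be built per training iteration, so `num_groups` matches the planned iteration count.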
In the embodiment of the present application, the base image under each category is acquired from the sample images under that category; the base image is combined with the other sample images to obtain a plurality of sample image pairs under each category; and a plurality of training sample groups are obtained from these sample image pairs, with each training sample group allocated at least one sample image pair from each category. Combining the base image of each category with the other sample images yields diverse sample image pairs, which makes the training sample groups diverse and improves the training effect. Moreover, because each training sample group contains at least one sample image pair from each category, sample image pairs of all categories participate in every round of training of the initial image classification model, which ensures the comprehensiveness of training and improves the reliability and accuracy of the image classification model.
S303, training the initial image classification model through each training sample group until the initial image classification model converges to obtain an image classification model.
Firstly, acquiring an initial image classification model, then training the initial image classification model through each training sample group until the initial image classification model converges, and determining the converged initial image classification model as a trained image classification model.
The initial image classification model may be any basic neural network model, including but not limited to a deep learning network model, a deep convolutional neural network model, or a residual neural network (Residual Neural Network, ResNet) model.
For any training sample group, training the initial image classification model may proceed as follows: the training sample group is input into the initial image classification model, the loss of that training sample group is calculated according to the loss function preset in the initial image classification model, and the parameters of the initial image classification model are then updated based on that loss.
Alternatively, the convergence condition of the initial image classification model may be that a preset number of iterations is reached, in which case the initial image classification model obtained after the last training round is determined as the image classification model.
In the image classification method provided by the embodiment of the application, a plurality of sample images of different categories are obtained; a plurality of training sample groups are determined from these sample images, each training sample group comprising a plurality of sample image pairs under all categories; and the initial image classification model is trained with each training sample group until it converges, yielding the image classification model. Training the initial image classification model with several different training sample groups improves the diversity of the training data; in addition, because each training sample group comprises a plurality of sample image pairs under all categories, every training pass uses pairs from all categories, which ensures the comprehensiveness of the training data in each round and improves the accuracy of the image classification model.
In one embodiment, as shown in fig. 5, training the initial image classification model by each training sample group until the initial image classification model converges includes:
S501, for any training sample group, each sample image pair in the training sample group is input into an initial image classification model for training, and a loss function value of the initial image classification model trained by each sample image pair is obtained.
The loss function value is an index measuring the difference between the predicted value and the actual value; by minimizing (or, depending on the formulation, maximizing) the loss function value during training, the parameters of the initial image classification model can be optimized so that the initial image classification model predicts the target value more accurately.
For any training sample group, each sample image pair in the training sample group is input into the initial image classification model, each sample image pair is analyzed through the initial image classification model, and the loss function value of the initial image classification model trained by each sample image pair is calculated through the preset loss function.
S502, updating parameters of the initial image classification model according to the loss function values until the initial image classification model reaches a preset condition, and determining convergence of the initial image classification model.
In one embodiment, the gradient is calculated from the loss function values for each training, then passed back to the various parameters of the initial image classification model using a back-propagation algorithm, and then the parameters of the initial image classification model are updated according to an optimization algorithm.
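A minimal PyTorch sketch of this update step follows. The stand-in model architecture, batch shapes, and the choice of SGD with binary cross-entropy are all assumptions for illustration; the patent does not fix these details.

```python
import torch

torch.manual_seed(0)

# Stand-in for the initial image classification model (architecture assumed).
model = torch.nn.Sequential(
    torch.nn.Linear(8, 4), torch.nn.ReLU(),
    torch.nn.Linear(4, 1), torch.nn.Sigmoid(),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

pair_features = torch.randn(16, 8)                  # one batch of pair features
pair_labels = torch.randint(0, 2, (16, 1)).float()  # toy labels

before = model[0].weight.detach().clone()
prediction = model(pair_features)
loss = torch.nn.functional.binary_cross_entropy(prediction, pair_labels)

optimizer.zero_grad()
loss.backward()      # propagate the loss back to every parameter
optimizer.step()     # update the parameters via the optimization algorithm
print(float(loss) > 0, not torch.equal(before, model[0].weight))
```

After `optimizer.step()`, the first layer's weights differ from their pre-update copy, showing that the loss gradient reached every parameter.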
In the iterative training process, the loss function value gradually decreases or increases as the initial image classification model is trained; when the initial image classification model converges, the loss function value stabilizes, indicating that the initial image classification model has achieved a good prediction effect.
Therefore, the initial image classification model reaching the preset condition may be that the loss function value of the initial image classification model is within a preset range.
In one embodiment, the initial image classification model reaching the preset condition may also include the classification result accuracy of the initial image classification model reaching a preset accuracy threshold.
Specifically, after each training round of the initial image classification model is completed, a plurality of preset test sample images can be predicted through the initial image classification model to obtain the predicted classification result of each test sample image, and the classification result accuracy of the initial image classification model is calculated from the predicted classification result of each test sample image and the corresponding real classification result; when the classification result accuracy of the initial image classification model is greater than or equal to the preset accuracy threshold, training of the initial image classification model is stopped, and the final initial image classification model is determined as the image classification model.
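The accuracy check described above can be sketched in a few lines. The function names and the 0.9 threshold are illustrative assumptions, not values fixed by the patent.

```python
def classification_accuracy(predicted, actual):
    """Fraction of test sample images whose predicted class matches the
    real class (names illustrative)."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def should_stop(predicted, actual, accuracy_threshold=0.9):
    # Stop training once accuracy reaches the preset accuracy threshold.
    return classification_accuracy(predicted, actual) >= accuracy_threshold

print(should_stop(["cat", "dog", "cat", "dog"],
                  ["cat", "dog", "cat", "cat"]))  # 3/4 = 0.75 < 0.9 -> False
```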
Optionally, the initial image classification model reaching the preset condition may also include both the initial image classification model reaching the preset number of iterations and the classification result accuracy of the initial image classification model reaching the preset accuracy threshold.
When training of the initial image classification model has reached the preset number of iterations and the classification result accuracy of the initial image classification model has also reached the accuracy threshold, the parameters of the initial image classification model corresponding to the maximum classification result accuracy within the preset iterations are determined as the parameters of the image classification model.
If the initial image classification model does not reach the preset condition, training the initial image classification model can be continued until the initial image classification model reaches the preset condition.
In the image classification method provided by the embodiment of the application, for any training sample group, each sample image pair in the training sample group is input into the initial image classification model for training, the loss function value of the initial image classification model trained by each sample image pair is obtained, and the parameters of the initial image classification model are updated according to each loss function value until the initial image classification model reaches the preset condition and is determined to have converged. Because the training sample group comprises a plurality of sample image pairs under all categories, the initial image classification model is trained on pairs from a plurality of categories, so its loss function value is more accurate, and updating the parameters with that loss function value improves the accuracy of the initial image classification model. In addition, a preset condition is set for the iterative training, and training of the initial image classification model stops once it reaches that condition, saving computing resources and training time.
Each sample image pair includes a base image and a sample image; in one embodiment, as shown in FIG. 6, the initial image classification model includes an initial feature extraction network and an initial computing network; acquiring a loss function value of each sample image pair training initial image classification model, comprising:
S601, for any one sample image pair, the sample image pair is input into the initial feature extraction network to obtain the feature vector of the base image and the feature vector of the sample image in the sample image pair.
The initial feature extraction network may be a convolutional neural network (Convolutional Neural Networks, CNN), among others.
The sample image pair is input into the initial feature extraction network, which extracts features of the base image and the sample image in the pair to obtain the feature vector of the base image and the feature vector of the sample image.
The initial feature extraction network comprises a single-channel feature network and a multi-scale feature network; in one embodiment, as shown in fig. 7, a sample image pair is input into an initial feature extraction network to obtain feature vectors of a base image in the sample image pair and feature vectors of the sample image, including the steps of:
S701, the sample image pair is input into the single-channel feature network to obtain the feature map of the base image and the feature map of the sample image.
The single-channel feature network extracts a feature map from an image, and the number of channels of the extracted feature map is 1. The single-channel feature network may be a CNN whose final convolution layer has a single convolution kernel (one output channel).
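A minimal sketch of such a single-channel feature network in PyTorch follows. The layer sizes, input resolution, and two-layer depth are assumptions; the only property taken from the text is that the final convolution has one output channel.

```python
import torch

torch.manual_seed(0)

# Single-channel feature network: a small CNN whose last convolution has
# one kernel, so the resulting feature map has exactly 1 channel.
single_channel_net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(8, 1, kernel_size=3, padding=1),  # 1 convolution kernel
)

base_image = torch.randn(1, 3, 32, 32)   # RGB base image (size assumed)
feature_map = single_channel_net(base_image)
print(feature_map.shape)  # torch.Size([1, 1, 32, 32])
```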
And inputting the sample image pair into a single-channel feature network, and analyzing the features of the base image and the sample image through the single-channel feature network to obtain a feature map of the base image and a feature map of the sample image.
S702, inputting the feature images of the base image and the feature images of the sample image into a multi-scale feature network to obtain a sub-feature image of the base image at a plurality of different scales and a sub-feature image of the sample image at a plurality of different scales.
The multi-scale feature network extracts sub-feature maps of a feature map at a plurality of different scales, and contains a plurality of convolution kernels of different sizes. The multi-scale feature network may also be a CNN, but one in which the convolution kernels have a plurality of different sizes.
The sub-feature maps of the base image and of the sample image at a plurality of different scales may be obtained as follows: the feature map of the base image is input into the multi-scale feature network, which performs feature extraction on it at a plurality of different scales to obtain the sub-feature maps of the base image at those scales; likewise, the feature map of the sample image is input into the multi-scale feature network to obtain the sub-feature maps of the sample image at a plurality of different scales.
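One possible multi-scale feature network is sketched below: parallel convolution branches with different kernel sizes, each producing one sub-feature map. The specific kernel sizes (3, 5, 7) and channel counts are assumptions for illustration.

```python
import torch

torch.manual_seed(0)

class MultiScaleFeatureNet(torch.nn.Module):
    """Extracts sub-feature maps at several scales using convolution
    kernels of different sizes (3x3, 5x5, 7x7 assumed here)."""
    def __init__(self):
        super().__init__()
        self.branches = torch.nn.ModuleList([
            torch.nn.Conv2d(1, 4, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        ])

    def forward(self, feature_map):
        # One sub-feature map per scale, spatially aligned by the padding.
        return [branch(feature_map) for branch in self.branches]

net = MultiScaleFeatureNet()
base_feature_map = torch.randn(1, 1, 32, 32)   # single-channel feature map
sub_feature_maps = net(base_feature_map)
print([t.shape for t in sub_feature_maps])
```

Each branch keeps the spatial size via `padding=k // 2`, so the three sub-feature maps stay aligned for the stitching step that follows.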
S703, splicing the plurality of sub-feature images of the base image to obtain feature vectors of the base image; and splicing the plurality of sub-feature images of the sample image to obtain the feature vector of the sample image.
Feature stitching fuses feature maps from different scales together, improving the model's perception of features at different scales.
Therefore, taking feature maps as the unit, the plurality of sub-feature maps of the base image are stitched to obtain the feature vector of the base image, and the plurality of sub-feature maps of the sample image are stitched to obtain the feature vector of the sample image.
The plurality of sub-feature maps at different scales may be stitched into a feature vector by first splicing them into one large feature map and then converting that feature map into the corresponding feature vector. For example, given two sub-feature maps at1 and at2, stitching them yields at3 = concat(at1, at2), and at3 is then converted into a feature vector. The feature vector of the base image and the feature vector of the sample image can both be obtained in this manner.
Alternatively, the channel dimensions of the sub-feature maps that need to be stitched are the same.
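The at1/at2 stitching example above can be sketched concretely: concatenate along the channel dimension (which both maps share, per the note above), then flatten into a vector. The shapes are illustrative assumptions.

```python
import torch

def stitch_to_vector(sub_feature_maps):
    """Splice same-channel-dimension sub-feature maps into one large
    feature map, then convert it into a feature vector."""
    stitched = torch.cat(sub_feature_maps, dim=1)  # e.g. at3 = concat(at1, at2)
    return stitched.flatten(start_dim=1)           # one vector per image

at1 = torch.randn(1, 4, 8, 8)   # two sub-feature maps with the same
at2 = torch.randn(1, 4, 8, 8)   # channel dimension
vector = stitch_to_vector([at1, at2])
print(vector.shape)  # torch.Size([1, 512]) = (4 + 4) channels * 8 * 8
```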
S602, the feature vector of the base image and the feature vector of the sample image are spliced to obtain a double-channel feature vector.
The feature vector of the base image and the feature vector of the sample image are spliced along the channel dimension to obtain a multi-channel feature vector, namely a two-channel feature vector.
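This channel-wise splice is a single concatenation; treating each feature vector as having one channel is an assumption made for the sketch.

```python
import torch

base_vector = torch.randn(1, 1, 512)     # (batch, channel=1, features)
sample_vector = torch.randn(1, 1, 512)

# Stack the two 1-channel feature vectors along the channel dimension,
# giving one 2-channel feature vector for the sample image pair.
pair_vector = torch.cat([base_vector, sample_vector], dim=1)
print(pair_vector.shape)  # torch.Size([1, 2, 512])
```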
S603, inputting the double-channel feature vector into an initial computing network, and determining a loss function value of the sample image on the trained initial image classification model.
And inputting the two-channel feature vectors into an initial computing network, performing correlation analysis on the two-channel feature vectors through the initial computing network, and determining the loss function value of the sample image on the trained initial image classification model. Alternatively, the loss function value may characterize the similarity of the base image and the sample image.
In one embodiment, as shown in fig. 8, in the case where the initial computing network includes a convolution layer and a full connection layer, the two-channel feature vector is input to the computing network, and a loss function value of the sample image to the trained initial image classification model is determined, including the steps of:
S801, the two-channel feature vector is input into the convolution layer to obtain a similarity matrix between the base image and the sample image.
Wherein a convolution layer in the computing network may be used to calculate a similarity matrix between the base image and the sample image.
Therefore, the two-channel feature vector of the base image and the sample image can be input into the convolution layer, which analyzes it and outputs the similarity matrix between the base image and the sample image.
S802, the similarity matrix is input into the fully connected layer, and the similarity between the base image and the sample image is determined.
The fully connected layer may be used to calculate the corresponding similarity of the similarity matrix. Therefore, the similarity matrix output by the convolution layer can be input to the full connection layer, and the similarity matrix is analyzed through the full connection layer, so that the similarity between the base image and the sample image is output.
The similarity between the base image and the sample image may be calculated according to the Euclidean distance between the feature vector of the base image and the feature vector of the sample image.
S803, the similarity between the base image and the sample image is determined as a loss function value.
The similarity between the base image and the sample image is determined as a loss function value of the corresponding sample image to the trained initial image classification model.
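The computing network of steps S801–S803 can be sketched as a 1-D convolution followed by a fully connected layer. The layer dimensions and the sigmoid squashing to (0, 1) are assumptions; the patent only fixes the convolution-then-fully-connected structure.

```python
import torch

torch.manual_seed(0)

class ComputingNetwork(torch.nn.Module):
    """Convolution layer produces a similarity matrix from the 2-channel
    feature vector; the fully connected layer maps it to one similarity."""
    def __init__(self, feature_len=512):
        super().__init__()
        self.conv = torch.nn.Conv1d(2, 1, kernel_size=3, padding=1)
        self.fc = torch.nn.Linear(feature_len, 1)

    def forward(self, pair_vector):
        similarity_matrix = self.conv(pair_vector)       # (batch, 1, L)
        return torch.sigmoid(self.fc(similarity_matrix.flatten(1)))

net = ComputingNetwork()
pair_vector = torch.randn(1, 2, 512)   # 2-channel feature vector of a pair
similarity = net(pair_vector)
print(similarity.shape)                # one similarity score per pair
```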
In the image classification method provided by the embodiment of the application, for any sample image pair, the pair is input into the initial feature extraction network to obtain the feature vector of the base image and the feature vector of the sample image, the two feature vectors are spliced into a two-channel feature vector, and the two-channel feature vector is input into the initial computing network to determine the loss function value of the sample image pair on the trained initial image classification model. Calculating the loss function value of each sample image pair through the preset initial feature extraction network and initial computing network gives the initial image classification model a targeted structure, ensuring the effectiveness and accuracy of the initial image classification model.
In one embodiment, as shown in fig. 9, updating parameters of the initial image classification model according to the loss function values includes the steps of:
S901, the image pair label of each sample image pair is obtained; the image pair label characterizes the class coincidence state of the base image and the sample image in the sample image pair.
The image pair labels are used for representing whether the categories of the base images in the corresponding sample image pairs are consistent with the categories of the sample images or not; for example, if the base image and the sample image belong to the same category, the image pair tag of the corresponding sample image pair may be set to 1, and if the base image and the sample image belong to different categories, the image pair tag of the corresponding sample image pair may be set to 0.
When each sample image pair is constructed, an image pair label of each sample image pair is determined according to the category of the base image in each sample image pair and the category of the sample image.
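The 1/0 labeling rule above reduces to a one-line comparison; the category strings below are illustrative.

```python
def image_pair_label(base_category, sample_category):
    """1 when the base image and the sample image belong to the same
    category, 0 when they belong to different categories."""
    return 1 if base_category == sample_category else 0

print(image_pair_label("cat", "cat"), image_pair_label("cat", "dog"))  # 1 0
```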
S902, updating parameters of the initial image classification model according to the loss function value of the initial image classification model trained by each sample image pair and the corresponding image pair label.
Because the loss function value characterizes the degree of similarity between the categories of the base image and the sample image in a pair as predicted by the initial image classification model, and the image pair label characterizes whether those categories actually coincide, the parameters of the initial image classification model can be updated according to the loss function value of each sample image pair and its corresponding image pair label until the initial image classification model reaches the preset condition, at which point it is determined as the image classification model.
Specifically, the parameters in the initial image classification model are updated through gradient descent with back propagation according to each loss function value and the corresponding image pair label; the embodiment of the application does not limit how the parameters are updated through the loss function values and the image pair labels, as long as the update brings the loss function value predicted by the initial image classification model closer to, or into agreement with, the corresponding image pair label.
Optionally, the updating of the parameters of the initial image classification model is essentially updating parameters in the initial feature extraction network and the initial computing network.
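A sketch of driving the predicted similarity toward the image pair label follows, using binary cross-entropy as the concrete loss; the toy linear model, learning rate, and feature shapes are all assumptions for illustration.

```python
import torch

torch.manual_seed(0)

# Toy pair features and their image pair labels (1 = same category).
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
pair_features = torch.randn(8, 4)
pair_labels = torch.tensor([[1.], [0.], [1.], [0.], [1.], [0.], [1.], [0.]])

losses = []
for _ in range(100):
    similarity = torch.sigmoid(model(pair_features))
    # The loss pushes each predicted similarity toward its pair label.
    loss = torch.nn.functional.binary_cross_entropy(similarity, pair_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(float(loss))

print(losses[-1] < losses[0])   # loss shrinks as predictions match labels
```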
In the image classification method provided by the embodiment of the application, the image pair label of each sample image pair is obtained, where the image pair label characterizes the class coincidence state of the base image and the sample image in the pair, and the parameters of the initial image classification model are updated according to the loss function value of the initial image classification model trained by each sample image pair and the corresponding image pair label. Because the image pair label characterizes whether the categories of the base image and the sample image coincide, and the loss function value characterizes the predicted degree of category similarity between them, updating the parameters of the initial image classification model according to the loss function values and the image pair labels makes the initial image classification model more accurate and reliable, improving the accuracy of image classification.
In one embodiment, the embodiment of the application also provides an image classification method, which comprises the steps of training an image classification model, and classifying images through the trained image classification model, wherein the initial image classification model comprises a single-channel feature network, a feature processing module and a computing network; as shown in fig. 10, this embodiment includes the steps of:
S1001, a plurality of types of sample images are acquired, each type of sample image including a plurality of images.
S1002, for any iterative training process of the initial image classification model, one sample image of each category is taken as the base image of that category; for the base image of any category, a plurality of sample images are randomly selected from the remaining sample images as target images, yielding a plurality of sample image pairs, and an image pair label is set according to the categories of the images in each pair; each image pair includes a base image and a target image.
S1003, for any sample image pair; inputting the sample image pair into a single-channel feature network in an initial image classification model to obtain a feature image of a base image and a feature image of a target image;
the number of channels of the feature map of the base image and of the feature map of the target image is 1; that is, the final convolution layer of the feature extraction network has a single convolution kernel.
S1004, inputting the feature images of the base image and the target image to a feature processing module, and extracting features of the feature images on different scales to obtain sub-feature images of the base image and the target image under a plurality of different scales; the feature processing module comprises a multi-scale feature network;
S1005, respectively fusing the sub-feature images of the base image and the target image under a plurality of different scales to obtain feature vectors of the base image and feature vectors of the target image.
S1006, splicing the feature vector of the base image and the feature vector of the target image according to the channels to obtain a 2-channel feature vector.
S1007, inputting the 2-channel feature vector into a computing network to obtain the similarity between the base image and the target image;
the computing network includes a convolution layer and a full connection layer: and inputting the 2-channel feature vector into a convolution layer, performing convolution calculation to obtain a similarity matrix between the base image and the target image, and then inputting the similarity matrix into a full-connection layer to obtain the similarity between the base image and the target image.
S1008, updating parameters in the initial image classification model according to the similarity score between the base image and the target image and the image pair label until the initial image classification model meets the preset condition, so as to obtain an image classification model;
the preset condition may be that the accuracy of the initial image classification model reaches an accuracy threshold.
S1009, the image to be predicted and one sample image of each category are respectively constructed into predicted image pairs, the predicted image pairs are respectively input into the image classification model to obtain the similarity of each predicted image pair, and the category corresponding to the predicted image pair with the highest similarity is determined as the image category of the image to be predicted.
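The classification step S1009 can be sketched as an argmax over per-category similarities. The toy scalar "images" and negative-absolute-difference similarity stand in for the model's learned similarity; both are assumptions.

```python
def classify(image_to_predict, base_images, similarity_fn):
    """Pair the image with one base image per category, score every pair,
    and return the category of the most similar base image."""
    scores = {category: similarity_fn(image_to_predict, base)
              for category, base in base_images.items()}
    return max(scores, key=scores.get)

# Toy example: images are scalars; similarity is negative absolute difference.
bases = {"cat": 1.0, "dog": 5.0, "car": 9.0}
result = classify(4.6, bases, lambda a, b: -abs(a - b))
print(result)  # dog (its base image, 5.0, is the most similar to 4.6)
```

In the real method, `similarity_fn` would be the trained image classification model applied to each predicted image pair.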
In one embodiment, there is further provided an image classification method including testing the trained image classification model; as shown in fig. 11, fig. 11 shows a flowchart of testing the image classification model. There are K categories, each corresponding to one base image: base image 1, base image 2, ..., base image K. The single-channel feature network in the image classification model computes the test feature map of the test image and the base feature map of each base image, the feature processing module computes the 2-channel feature vector of the test image with each base image, and the similarity between the test image and each base image is obtained from each 2-channel feature vector. The predicted category of the test image is determined from these similarities, and the accuracy of the image classification model is determined by comparing the predicted category of the test image with its real category.
Fig. 12 is a schematic view of the partial flow of an image through the feature processing module. As shown in fig. 12, after the feature map of an image is obtained through the single-channel feature network, the feature processing module performs multi-scale feature decomposition on the feature map to obtain a plurality of sub-feature maps at each scale, and these sub-feature maps are then spliced to obtain the feature vector corresponding to the feature map.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an image classification device for realizing the above related image classification method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the image classification device or devices provided below may be referred to the limitation of the image classification method hereinabove, and will not be repeated here.
In one embodiment, as shown in fig. 13, there is provided an image classification apparatus 1300 comprising: an image pair acquisition module 1301, a prediction module 1302, and a determination module 1303, wherein:
an image pair acquisition module 1301, configured to acquire a plurality of image pairs configured by an image to be predicted and a plurality of different types of base images;
the prediction module 1302 is configured to input each image pair into a preset image classification model, so as to obtain a prediction probability of each image pair; the prediction probability represents the similarity between the image to be predicted in the image pair and the base image; the image classification model is obtained by iteratively learning the characteristic information of a plurality of sample images with different categories;
the determining module 1303 is configured to determine a class of the base image in the image pair with the highest prediction probability as a classification result of the image to be predicted.
In one embodiment, the apparatus 1300 further comprises:
the sample acquisition module is used for acquiring a plurality of sample images of different categories;
the sample determining module is used for determining a plurality of training sample groups according to sample images of a plurality of different categories; each training sample group comprises a plurality of sample image pairs under all categories;
the training module is used for training the initial image classification model through each training sample group until the initial image classification model converges to obtain an image classification model.
In one embodiment, the sample determination module comprises:
a base acquisition unit for acquiring the base image under each category according to the sample images under that category;
the combination unit is used for respectively combining the base images under each category with other sample images to obtain a plurality of sample image pairs under each category;
the dividing unit is used for obtaining a plurality of training sample groups according to a plurality of sample image pairs under each category and according to a mode that at least one sample image pair in each category is correspondingly divided by each training sample group.
In one embodiment, the base acquisition unit includes:
a base acquisition subunit for determining any one sample image in any one category as the base image of that category.
In one embodiment, the training module includes:
the training unit is used for inputting each sample image pair in the training sample group into the initial image classification model for training for any training sample group, and obtaining the loss function value of the initial image classification model trained by each sample image pair;
and the updating unit is used for updating the parameters of the initial image classification model according to the loss function values until the initial image classification model reaches the preset condition, and determining convergence of the initial image classification model.
In one embodiment, each sample image pair includes a base image and a sample image; the initial image classification model comprises an initial feature extraction network and an initial calculation network; the training unit includes:
a first input subunit, configured to input, for any one sample image pair, the sample image pair into an initial feature extraction network, to obtain a feature vector of a base image in the sample image pair and a feature vector of the sample image;
the first splicing subunit is used for splicing the feature vector of the base image and the feature vector of the sample image to obtain a dual-channel feature vector;
and the second input subunit is used for inputting the double-channel feature vectors into the initial computing network and determining the loss function value of the sample image on the trained initial image classification model.
In one embodiment, the initial feature extraction network comprises a single channel feature network and a multi-scale feature network; the first input subunit includes:
the third input subunit is used for inputting the sample image pair into the single-channel feature network to obtain a feature image of the base image and a feature image of the sample image;
a fourth input subunit, configured to input a feature map of the base image and a feature map of the sample image into a multi-scale feature network, to obtain a sub-feature map of the base image at a plurality of different scales and a sub-feature map of the sample image at a plurality of different scales;
The second splicing subunit is used for splicing the plurality of sub-feature images of the base image to obtain feature vectors of the base image; and splicing the plurality of sub-feature images of the sample image to obtain the feature vector of the sample image.
In one embodiment, an initial computing network includes a convolutional layer and a fully-connected layer; the second input subunit includes:
a fifth input subunit, configured to input the dual-channel feature vector to the convolution layer, to obtain a similarity matrix between the base image and the sample image;
a sixth input subunit, configured to input the similarity matrix into the fully-connected layer to determine the similarity between the base image and the sample image;
and a determining subunit, configured to determine the similarity between the base image and the sample image as the loss function value.
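A minimal stand-in for the computing network described above: an outer product of the two channels plays the role of the convolution layer's similarity matrix, and a single weight vector followed by a sigmoid plays the fully-connected layer. The weights, shapes, and the outer-product substitution are all hypothetical; the patent itself uses learned convolution and fully-connected layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def similarity_score(pair_feat, fc_weights):
    """Form a similarity matrix from the two channels, then collapse it
    with a fully-connected layer and a sigmoid to a score in (0, 1)."""
    base_vec, sample_vec = pair_feat
    sim_matrix = np.outer(base_vec, sample_vec)      # stand-in for the conv output
    return sigmoid(sim_matrix.ravel() @ fc_weights)  # FC layer -> scalar score

pair = np.array([[0.1, 0.4, 0.3], [0.2, 0.5, 0.1]])  # dual-channel feature
w = np.ones(9)                                       # hypothetical FC weights
score = similarity_score(pair, w)
print(0.0 < score < 1.0)  # True
```

Mapping the score into (0, 1) lets it be read as the prediction probability that the two images belong to the same category.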
In one embodiment, the update unit includes:
a label obtaining subunit, configured to obtain an image pair label of each sample image pair, wherein the image pair label characterizes whether the base image and the sample image in the sample image pair belong to the same category;
and the updating subunit is used for updating parameters of the initial image classification model according to the loss function value of the initial image classification model trained by each sample image pair and the corresponding image pair label.
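One common way to use such pair labels, assuming a binary cross-entropy objective over the predicted similarity (the patent does not name the loss), is the gradient step sketched below; a single logistic layer stands in for the full model, and the feature vector and learning rate are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_update(weights, features, label, lr=0.1):
    """One gradient step on a binary cross-entropy loss: label 1 means the
    base and sample images share a category, 0 means they do not."""
    pred = sigmoid(features @ weights)
    grad = (pred - label) * features  # d(BCE)/d(weights) for a logistic layer
    return weights - lr * grad, pred

w = np.zeros(4)                        # hypothetical initial parameters
x = np.array([1.0, 0.5, -0.2, 0.3])    # hypothetical pair features
w, p0 = bce_update(w, x, label=1.0)
_, p1 = bce_update(w, x, label=1.0)
print(p1 > p0)  # True: the prediction moves toward the matching label
```

Repeating such steps over every sample image pair in a training sample group is what drives the model toward the preset convergence condition.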
In one embodiment, the initial image classification model reaching the preset condition includes the classification accuracy of the initial image classification model reaching a preset accuracy threshold.
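At inference time, the method picks the category of the base image from the image pair with the highest prediction probability. A sketch of that decision rule, with hypothetical probabilities and category names:

```python
import numpy as np

def classify(predict_probs, base_categories):
    """Return the category of the base image in the pair whose predicted
    similarity to the image to be predicted is highest."""
    return base_categories[int(np.argmax(predict_probs))]

probs = [0.12, 0.81, 0.34]     # hypothetical per-pair prediction probabilities
cats = ["cat", "dog", "bird"]  # hypothetical base-image categories
print(classify(probs, cats))   # dog
```

Because each pair is scored against a representative base image of its category, the argmax over pairs directly yields the classification result of the image to be predicted.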
The respective modules in the above-described image classification apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each of the above modules.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
The implementation principles and technical effects of the steps implemented by the processor in this embodiment of the application are similar to those of the image classification method described above, and are not repeated here.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
The implementation principles and technical effects of the steps implemented when the computer program is executed by the processor in this embodiment of the application are similar to those of the image classification method described above, and are not repeated here.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
The implementation principles and technical effects of the steps implemented when the computer program is executed by the processor in this embodiment of the application are similar to those of the image classification method described above, and are not repeated here.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) and dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or data processing logic units based on quantum computing.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing embodiments represent only a few implementations of the application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those of ordinary skill in the art without departing from the concept of the application, all of which fall within the protection scope of the application. Accordingly, the scope of protection of the application shall be subject to the appended claims.

Claims (14)

1. A method of classifying images, the method comprising:
acquiring a plurality of image pairs constructed from an image to be predicted and a plurality of base images of different categories;
inputting each image pair into a preset image classification model respectively to obtain the prediction probability of each image pair; wherein the prediction probability represents the similarity between the image to be predicted in the image pair and the base image, and the image classification model is obtained by iteratively learning the feature information of a plurality of sample images of different categories;
and determining the category of the base image in the image pair with the highest prediction probability as the classification result of the image to be predicted.
2. The method of claim 1, wherein the training process of the image classification model comprises:
acquiring a plurality of sample images of different categories;
determining a plurality of training sample groups according to the sample images of the different categories; each training sample group comprises a plurality of sample image pairs under all categories;
training the initial image classification model through each training sample group until the initial image classification model converges to obtain an image classification model.
3. The method of claim 2, wherein determining a plurality of training sample sets from the plurality of different classes of sample images comprises:
acquiring a base image under each category according to the sample image under each category;
respectively combining the base image and other sample images under each category to obtain a plurality of sample image pairs under each category;
and obtaining a plurality of training sample groups from the plurality of sample image pairs under each category, in such a manner that each training sample group is allocated at least one sample image pair from each category.
4. A method according to claim 3, wherein said acquiring a base image under each of said categories comprises:
for any one of the categories, determining any one of the sample images in the category as a base image in the category.
5. The method according to any one of claims 2-4, wherein training the initial image classification model through each of the training sample groups until the initial image classification model converges comprises:
for any training sample group, each sample image pair in the training sample group is input into the initial image classification model for training, and the loss function value of the initial image classification model trained by each sample image pair is obtained;
and updating parameters of the initial image classification model according to the loss function values until the initial image classification model reaches a preset condition, whereupon the initial image classification model is determined to have converged.
6. The method of claim 5, wherein each sample image pair comprises a base image and a sample image; the initial image classification model comprises an initial feature extraction network and an initial computing network; and the obtaining the loss function value of each sample image pair for training the initial image classification model comprises the following steps:
for any sample image pair, inputting the sample image pair into the initial feature extraction network to obtain the feature vector of the base image in the sample image pair and the feature vector of the sample image;
splicing the feature vector of the base image and the feature vector of the sample image to obtain a double-channel feature vector;
and inputting the dual-channel feature vector into the initial computing network, and determining the loss function value of the sample image pair for training the initial image classification model.
7. The method of claim 6, wherein the initial feature extraction network comprises a single channel feature network and a multi-scale feature network; the inputting the sample image pair into the initial feature extraction network, obtaining a feature vector of a base image in the sample image pair and a feature vector of a sample image, including:
inputting the sample image pair into the single-channel feature network to obtain a feature map of the base image and a feature map of the sample image;
inputting the feature map of the base image and the feature map of the sample image into the multi-scale feature network to obtain sub-feature maps of the base image at a plurality of different scales and sub-feature maps of the sample image at a plurality of different scales;
and splicing the plurality of sub-feature maps of the base image to obtain the feature vector of the base image, and splicing the plurality of sub-feature maps of the sample image to obtain the feature vector of the sample image.
8. The method of claim 6, wherein the initial computing network comprises a convolutional layer and a fully-connected layer; and the inputting the dual-channel feature vector into the initial computing network and determining the loss function value of the sample image pair for training the initial image classification model comprises:
inputting the dual-channel feature vector into the convolutional layer to obtain a similarity matrix between the base image and the sample image;
inputting the similarity matrix into the fully-connected layer, and determining the similarity between the base image and the sample image;
and determining the similarity between the base image and the sample image as the loss function value.
9. The method of claim 5, wherein said updating parameters of said initial image classification model based on each of said loss function values comprises:
acquiring an image pair label of each sample image pair; wherein the image pair label characterizes whether the base image and the sample image in the sample image pair belong to the same category;
And updating parameters of the initial image classification model according to the loss function value of the initial image classification model trained by each sample image pair and the corresponding image pair label.
10. The method of claim 5, wherein the initial image classification model reaching the preset condition comprises the classification accuracy of the initial image classification model reaching a preset accuracy threshold.
11. An image classification apparatus, the apparatus comprising:
an image pair acquisition module for acquiring a plurality of image pairs constructed by an image to be predicted and a plurality of different categories of base images;
the prediction module is used for inputting each image pair into a preset image classification model respectively to obtain the prediction probability of each image pair; wherein the prediction probability represents the similarity between the image to be predicted in the image pair and the base image, and the image classification model is obtained by iteratively learning the feature information of a plurality of sample images of different categories;
and the determining module is used for determining the category of the base image in the image pair with the highest prediction probability as the classification result of the image to be predicted.
12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 10 when the computer program is executed.
13. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 10.
14. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 10.
CN202310786459.8A 2023-06-29 2023-06-29 Image classification method, apparatus, device, storage medium, and program product Pending CN117197523A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310786459.8A CN117197523A (en) 2023-06-29 2023-06-29 Image classification method, apparatus, device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310786459.8A CN117197523A (en) 2023-06-29 2023-06-29 Image classification method, apparatus, device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN117197523A true CN117197523A (en) 2023-12-08

Family

ID=88983942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310786459.8A Pending CN117197523A (en) 2023-06-29 2023-06-29 Image classification method, apparatus, device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN117197523A (en)

Similar Documents

Publication Publication Date Title
WO2019015246A1 (en) Image feature acquisition
KR102537113B1 (en) Method for determining a confidence level of inference data produced by artificial neural network
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN113065525A (en) Age recognition model training method, face age recognition method and related device
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
CN116414815A (en) Data quality detection method, device, computer equipment and storage medium
CN114428860A (en) Pre-hospital emergency case text recognition method and device, terminal and storage medium
CN110717407A (en) Human face recognition method, device and storage medium based on lip language password
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN113935788B (en) Model evaluation method, device, equipment and computer readable storage medium
CN117197523A (en) Image classification method, apparatus, device, storage medium, and program product
CN115689981A (en) Lung image detection method and device based on information fusion and storage medium
CN115345248A (en) Deep learning-oriented data depolarization method and device
US20210406693A1 (en) Data sample analysis in a dataset for a machine learning model
CN113610080A (en) Cross-modal perception-based sensitive image identification method, device, equipment and medium
CN112069981A (en) Image classification method and device, electronic equipment and storage medium
CN116580025B (en) Defect filtering method, device, computer equipment and storage medium
CN115861720B (en) Small sample subclass image classification and identification method
CN117095429A (en) Face recognition method, device, apparatus, storage medium and program product
CN117216618A (en) Anti-fraud identification method, device, equipment and storage medium based on AI model
CN116702014A (en) Population identification method, device, terminal equipment and storage medium
CN114898339A (en) Training method, device, equipment and storage medium of driving behavior prediction model
CN117033625A (en) Gateway log classification method, device, equipment, medium and product
CN117437425A (en) Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium
CN117874530A (en) Challenge sample detection methods, apparatus, devices, media, and products

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination