CN113361636B - Image classification method, system, medium and electronic device - Google Patents

Image classification method, system, medium and electronic device

Info

Publication number
CN113361636B
CN113361636B
Authority
CN
China
Prior art keywords
feature map
image
score
norm
classified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110741804.7A
Other languages
Chinese (zh)
Other versions
CN113361636A (en)
Inventor
袭肖明
杨霄
聂秀山
宁阳
张光
尹义龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Shandong Jianzhu University
Shandong Qianfoshan Hospital
Original Assignee
Shandong University
Shandong Jianzhu University
Shandong Qianfoshan Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University, Shandong Jianzhu University, Shandong Qianfoshan Hospital
Priority to CN202110741804.7A
Publication of CN113361636A
Application granted granted Critical
Publication of CN113361636B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The present disclosure provides an image classification method, system, medium, and electronic device. The method acquires an image pair to be classified, preprocesses the acquired image pair, extracts features from the preprocessed image pair, and obtains a final classification result from the extracted features and a preset neural network model. The preset neural network model contains template image feature maps; the feature map of the image to be classified is connected in series with the template image feature map, the similarity score between the two is calculated, and the category with the highest score is taken as the final classification result. The method introduces a twin (Siamese) network structure to address the small-data problem, a modality correlation learning module to fully mine the correlation features among multiple modalities, and an attention fusion module to learn the attention of the two modalities' features and give higher attention to the correlated features, greatly improving the accuracy of the image classification result.

Description

Image classification method, system, medium and electronic device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image classification method, system, medium, and electronic device.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Image classification is an important research direction in computer vision and is widely applied in tasks such as image analysis, identity authentication, and disease diagnosis. Although existing methods achieve good performance on image classification tasks, for some tasks it is difficult to obtain satisfactory results using images of a single modality alone. For example, in computer-aided breast cancer diagnosis, molybdenum-target (mammography) images and ultrasound images each have their own advantages and shortcomings, and it is difficult to obtain an accurate diagnosis from a single-modality image alone. Fusing the information of multi-modal images can therefore overcome the limited expressive power of single-modality images and further improve classification performance.
Deep learning has been widely applied to image recognition tasks thanks to its powerful feature-learning capability. However, multi-modal image data are scarcer than single-modality data (for example, large quantities of multi-modal medical images are difficult to collect), and existing deep learning methods do not consider the information interaction between multi-modal images, which limits the improvement in classification performance.
Disclosure of Invention
To remedy the deficiencies of the prior art, the present disclosure provides an image classification method, system, medium, and electronic device. A twin network structure is introduced to address the small-data problem; a modality correlation learning module is introduced to fully mine the correlation features among multiple modalities; and an attention fusion module is introduced to learn the attention of the two modalities' features and give higher attention to the correlated features, greatly improving the accuracy of the image classification result.
To achieve this purpose, the present disclosure adopts the following technical scheme:
a first aspect of the present disclosure provides an image classification method.
An image classification method comprising the process of:
acquiring an image pair to be classified;
preprocessing the acquired image pair;
extracting the characteristics of the preprocessed image pair;
obtaining a final classification result according to the extracted features and a preset neural network model;
the preset neural network model comprises a template image feature map, the image feature map to be classified and the template image feature map are connected in series, the similarity score of the image feature map to be classified and the template image feature map is calculated, and the category with the highest score is used as a final classification result.
Further, the obtaining of the template image feature map comprises:
selecting two categories from a plurality of categories, each category having one image pair, the image pair consisting of images of two modalities;
performing feature extraction on the image pair by using a preset convolutional neural network to obtain a first feature map and a second feature map;
calculating a correlation score between the first feature map and the second feature map by using a cosine function to obtain a first score and a second score;
element-wise multiplying the first score with the first feature map and the second score with the second feature map, performing a convolution, and then adding the results to the first feature map and the second feature map respectively to obtain fused feature maps, which are taken as the template image feature map.
Further, calculating a correlation score between the first feature map and the second feature map using a cosine function, comprising:
normalizing the first picture feature and the second picture feature of the extracted picture pair to obtain f1_norm and f2_norm; performing channel rearrangement on f1_norm to obtain f1_norm_1, and taking the result of the point multiplication of f1_norm_1 and f2_norm as the first score; performing channel rearrangement on f2_norm to obtain f2_norm_1, and recording the result of the point multiplication of f2_norm_1 and f1_norm as the second score.
Further, calculating a similarity score between the image feature map to be classified and the template image feature map, including:
performing feature learning by using a multi-layer fully-connected network, then calculating the similarity between the image to be classified and the template image by using a softmax module, finally generating a scalar in the range of 0 to 1, and judging the similarity between the image feature map to be classified and the template image feature map according to the obtained scalar.
Further, feature extraction is performed on the preprocessed image by using a resnet34 network.
Further, in the training process of the preset neural network model, processing the data in the data set, including:
carrying out scale transformation on the existing data set by using the transforms in PyTorch, transforming the images to a uniform size, and then carrying out uniform normalization processing;
applying the same data enhancement to both images of an image pair in the original data set, and different data enhancements to different image pairs;
the labels of the labeled data set and of the unlabeled data set are unchanged after enhancement, and the amount of data in each class of the enhanced labeled data is kept consistent, or differs only within a preset range.
A second aspect of the present disclosure provides an image classification system.
An image classification system comprising:
a data acquisition module configured to: acquiring an image pair to be classified;
a pre-processing module configured to: preprocessing the acquired image pair;
a feature extraction module configured to: extracting the characteristics of the preprocessed image pair;
an image classification module configured to: obtaining a final classification result according to the extracted features and a preset neural network model;
the preset neural network model comprises a template image feature map, the image feature map to be classified is connected with the template image feature map in series, the similarity score of the image feature map to be classified and the template image feature map is calculated, and the category with the highest score is taken as the final classification result.
A third aspect of the present disclosure provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the steps in the image classification method according to the first aspect of the present disclosure.
A fourth aspect of the present disclosure provides an electronic device, comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the image classification method according to the first aspect of the present disclosure when executing the program.
Compared with the prior art, the beneficial effect of this disclosure is:
the methods, systems, media or electronic devices described in this disclosure solve the small sample problem with a twin network structure; and a multi-modal relevance learning module is introduced to learn the relevance among multi-modal images, so that the information interaction among the modalities is enhanced.
The method, system, medium, and electronic device described in this disclosure introduce an attention fusion module that gives higher attention to the robust modal correlation features, so that these features play a more important role in the final classification, improving the accuracy of the classification result.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure; they do not limit the disclosure.
Fig. 1 is a schematic flowchart of an image classification method provided in embodiment 1 of the present disclosure.
Fig. 2 is a schematic diagram of a network learning process provided in embodiment 1 of the present disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example 1:
as shown in fig. 1 and 2, embodiment 1 of the present disclosure provides an image classification method, including the following processes:
acquiring an image pair to be classified;
preprocessing the acquired image pair;
extracting the characteristics of the preprocessed image pair;
and obtaining a final classification result according to the extracted features and a preset neural network model.
Specifically, the method comprises the following steps:
S1: Data set preprocessing
The image sizes of the original data samples may be inconsistent, which is unfavorable for feature extraction and subsequent learning by the deep network model. Therefore, the existing dataset is first scale-transformed using the transforms in PyTorch, converted to a uniform size (e.g., 224 × 224), and then normalized.
Because the amount of data is small and large quantities of data are lacking, the same data enhancement is applied to both images of each pair in the original data set, while different data enhancements are applied to different image pairs; the main enhancements include random cropping, horizontal flipping, vertical flipping, random rotation, the addition of salt-and-pepper noise, and the like. The labels of the labeled data set and of the unlabeled data set are unchanged after enhancement. Note that the enhanced labeled data must remain class-balanced, i.e., each class must contain a substantially consistent amount of data.
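As an illustration only, the following is a minimal sketch of such a preprocessing and enhancement pipeline using torchvision; the crop strategy, rotation range, noise probability, and normalization statistics are assumptions (the patent specifies only a uniform size such as 224 × 224 and the kinds of enhancement), and in practice the same random draw would have to be shared by the two images of a pair.

```python
import torch
from torchvision import transforms

def salt_and_pepper(t: torch.Tensor, p: float = 0.02) -> torch.Tensor:
    """Salt-and-pepper noise; torchvision has no built-in, so this is custom."""
    mask = torch.rand_like(t)
    t = t.clone()
    t[mask < p / 2] = 0.0        # pepper
    t[mask > 1 - p / 2] = 1.0    # salt
    return t

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),            # scale transformation
    transforms.RandomCrop(224),               # random cropping to 224 x 224
    transforms.RandomHorizontalFlip(),        # horizontal flipping
    transforms.RandomVerticalFlip(),          # vertical flipping
    transforms.RandomRotation(degrees=15),    # random rotation (range assumed)
    transforms.ToTensor(),
    transforms.Lambda(salt_and_pepper),       # add salt-and-pepper noise
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # assumed ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])
```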
S2: input device
Two categories are selected from the plurality of categories, each with one image pair (an image pair consists of images of two modalities); these four images are used as templates, and one further image pair is selected as the test input to the model.
S3: data feature vector extraction
Image pairs are loaded in batches of batch_size and fed into the lower network of resnet34 (conv1, bn1, relu, maxpool, layer1, layer2, layer3). After this shallow convolutional neural network operation, the key feature information of the images has been extracted, and the resulting feature maps are fed into the custom convolutional network that follows.
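By way of a hedged sketch, the truncated backbone named above can be assembled from torchvision's resnet34 as follows; the batch size and input resolution are illustrative.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

# Keep exactly the stages named above: conv1, bn1, relu, maxpool, layer1-3.
backbone = resnet34(weights=None)      # torchvision >= 0.13 constructor
lower_net = nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1, backbone.layer2, backbone.layer3,
)

# In the twin arrangement both images of a pair pass through the same
# weight-shared lower network.
x1 = torch.randn(8, 3, 224, 224)       # batch_size = 8, modality 1
x2 = torch.randn(8, 3, 224, 224)       # modality 2
f1, f2 = lower_net(x1), lower_net(x2)  # each of shape (8, 256, 14, 14)
```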
S4: correlation learning module
Take a pair of data features (feature maps) p_i and q_j produced by the lower layers of resnet34, and use a cosine function to calculate the correlation score between them:

$$R(p_i, q_j) = \frac{p_i \cdot q_j}{\lVert p_i \rVert\,\lVert q_j \rVert}$$
Calculation: first normalize p_i and q_j to obtain f1_norm and f2_norm; perform channel rearrangement on f1_norm to obtain f1_norm_1, and record the result of the point multiplication of f1_norm_1 and f2_norm as R1 (correlation score 1). Similarly, obtain f2_norm_1 by channel rearrangement of f2_norm, and record the result of the point multiplication of f2_norm_1 and f1_norm as R2 (correlation score 2).
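A minimal sketch of this computation is given below; since the patent does not pin down the tensor layout, reading "channel rearrangement" as a flatten-and-transpose, so that the point multiplication runs over the channel axis, is an assumption.

```python
import torch
import torch.nn.functional as F

def correlation_scores(f1: torch.Tensor, f2: torch.Tensor):
    # Normalize each feature map along the channel axis: f1_norm, f2_norm.
    f1_norm = F.normalize(f1, p=2, dim=1)             # (B, C, H, W)
    f2_norm = F.normalize(f2, p=2, dim=1)
    # "Channel rearrangement": flatten the spatial axes and move channels
    # last, so the batched matrix product is a dot product over channels.
    f1_norm_1 = f1_norm.flatten(2).transpose(1, 2)    # (B, HW, C)
    f2_norm_1 = f2_norm.flatten(2).transpose(1, 2)    # (B, HW, C)
    r1 = torch.bmm(f1_norm_1, f2_norm.flatten(2))     # R1: (B, HW, HW)
    r2 = torch.bmm(f2_norm_1, f1_norm.flatten(2))     # R2: (B, HW, HW)
    return r1, r2
```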
S5: attention fusion module
Using the correlation scores, multiply R1 with f2 to obtain RA1 and R2 with f1 to obtain RA2; apply a 1×1 convolution to each to obtain RA1s and RA2s respectively; finally, add RA1s and RA2s to f1 and f2 respectively, giving the final fused feature maps, denoted RA1_frs and RA2_frs.
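Continuing the sketch above: one consistent reading treats the score matrices as attention weights over the opposite modality's features. The 1×1 convolutions and the residual additions follow the text; the attention-style matrix product and the channel width are assumptions.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, channels: int = 256):   # 256 = resnet34 layer3 width
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1)  # 1x1 conv
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f1, f2, r1, r2):
        b, c, h, w = f1.shape
        # RA1: f2 reweighted by correlation score R1; RA2: f1 by R2.
        ra1 = torch.bmm(f2.flatten(2), r1).view(b, c, h, w)
        ra2 = torch.bmm(f1.flatten(2), r2).view(b, c, h, w)
        ra1s, ra2s = self.conv1(ra1), self.conv2(ra2)  # RA1s, RA2s
        # Residual additions yield the fused maps RA1_frs and RA2_frs.
        return f1 + ra1s, f2 + ra2s
```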
S6: feature concatenation module for relational networks
The feature maps obtained after feature extraction are denoted f(x_i) and f(x_j); f(x_i) and f(x_j) are connected in series (concatenated) in preparation for calculating the similarity score.
S7: metric learning module
The concatenated template feature f(x_i) and query feature f(x_j) are fed into a metric learning module, which first performs further feature learning with a 3-layer fully-connected network, then calculates the similarity between the sample under test and the template through a softmax module, and finally produces a scalar in the range 0 to 1 that represents the similarity between x_i and x_j.
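A sketch combining S6 and S7 under stated assumptions: the fused maps are pooled to vectors before the series connection (the patent does not specify the pooling), the hidden widths are illustrative, and the "softmax module" is realized as a softmax over two outputs whose match probability is the 0-to-1 scalar.

```python
import torch
import torch.nn as nn

class MetricModule(nn.Module):
    def __init__(self, feat_dim: int = 256, hidden: int = 128):
        super().__init__()
        self.fc = nn.Sequential(                 # 3-layer fully-connected net
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, f_template, f_query):
        # Pool each fused feature map to a vector (assumed global average).
        t = f_template.mean(dim=(2, 3))          # f(x_i): (B, feat_dim)
        q = f_query.mean(dim=(2, 3))             # f(x_j)
        pair = torch.cat([t, q], dim=1)          # S6: series connection
        logits = self.fc(pair)                   # S7: further feature learning
        # Softmax module: probability that query matches template, in [0, 1].
        return logits.softmax(dim=1)[:, 1]
```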
S8: calculating mean square error loss
The Euclidean distance between the predicted data and the ground-truth data is calculated; the closer the prediction is to the ground truth, the smaller their mean square error. The category corresponding to the maximum score is the predicted category, and the mean square error loss is calculated between the currently output prediction and the target output:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where y_i is the ground-truth value and ŷ_i the predicted value.
S9: Network training
In the small-sample network, back-propagation training is carried out repeatedly using the mean square error (MSE) loss; the loss value gradually decreases as the number of training rounds increases, until a preset number of rounds is reached, and the network model with the minimum loss value is saved as the training result.
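A minimal training-loop sketch covering S8 and S9; `model` is assumed to wrap the modules sketched above and to return one similarity score per template category, `loader` is assumed to yield query pairs with one-hot targets, and the optimizer and learning rate are likewise assumptions.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()                     # mean square error loss (S8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer

best_loss = float("inf")
for epoch in range(num_epochs):              # preset number of training rounds
    running = 0.0
    for query, templates, target in loader:  # target: one-hot class labels
        scores = model(query, templates)     # (B, num_classes), each in [0, 1]
        loss = criterion(scores, target)     # MSE between prediction and truth
        optimizer.zero_grad()
        loss.backward()                      # repeated back-propagation (S9)
        optimizer.step()
        running += loss.item()
    if running < best_loss:                  # keep the minimum-loss model
        best_loss = running
        torch.save(model.state_dict(), "best_model.pt")
```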
S10: prediction phase
The image pair is input into the trained network model for prediction to obtain the corresponding category scores; the category corresponding to the maximum score is the prediction result.
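Prediction then reduces to an argmax over the category scores; `model`, `query`, and `templates` are the assumed names from the training sketch above.

```python
import torch

model.eval()
with torch.no_grad():
    scores = model(query, templates)   # one similarity score per category
    predicted = scores.argmax(dim=1)   # the maximum-score category wins
```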
The system inside the dashed box of fig. 1 is the module that performs the classification function: the feature-vector module uses resnet34, followed by the correlation learning module of S4 and the attention fusion module of S5 to obtain the scores; the network is then trained as in S9 to determine suitable network parameters, and the results are finally tested.
A user inputs the image data to be tested into the classification system; the system automatically performs the four stages of feature-vector extraction, correlation learning, attention fusion, and prediction-category calculation, and finally outputs the predicted category to the user.
Example 2:
an embodiment 2 of the present disclosure provides an image classification system, including:
a data acquisition module configured to: acquiring an image pair to be classified;
a pre-processing module configured to: preprocessing the acquired image pair;
a feature extraction module configured to: extracting the characteristics of the preprocessed image pair;
an image classification module configured to: obtaining a final classification result according to the extracted features and a preset neural network model;
the preset neural network model comprises a template image feature map, the image feature map to be classified is connected with the template image feature map in series, the similarity score of the image feature map to be classified and the template image feature map is calculated, and the category with the highest score is taken as the final classification result.
Example 3:
the embodiment 3 of the present disclosure provides a computer-readable storage medium on which a program is stored, which when executed by a processor, implements the steps in the image classification method according to the embodiment 1 of the present disclosure.
Example 4:
the embodiment 4 of the present disclosure provides an electronic device, which includes a memory, a processor, and a program stored in the memory and executable on the processor, and when the processor executes the program, the steps in the image classification method according to the embodiment 1 of the present disclosure are implemented.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (6)

1. An image classification method, characterized by: the method comprises the following steps:
acquiring an image pair to be classified;
preprocessing the acquired image pair;
extracting the characteristics of the preprocessed image pair;
obtaining a final classification result according to the extracted features and a preset neural network model;
the preset neural network model comprises a template image feature map, the image feature map to be classified is connected with the template image feature map in series, the similarity score of the image feature map to be classified and the template image feature map is calculated, and the category with the highest score is taken as a final classification result;
the acquisition of the template image feature map comprises the following steps:
selecting two categories from a plurality of categories, each category having one image pair, the image pair consisting of images of two modalities;
performing feature extraction on the image pair by using a preset convolutional neural network to obtain a first feature map and a second feature map;
calculating a correlation score between the first feature map and the second feature map by using a cosine function to obtain a first score and a second score;
element-wise multiplying the first score with the first feature map and the second score with the second feature map, and, after convolution, adding the results to the first feature map and the second feature map respectively to obtain fused feature maps, the fused feature maps being taken as the template image feature map;
calculating a correlation score between the first feature map and the second feature map using a cosine function, comprising:
normalizing the first picture feature and the second picture feature of the extracted picture pair to obtain f1_norm and f2_norm; performing channel rearrangement on f1_norm to obtain f1_norm_1, and taking the result of the point multiplication of f1_norm_1 and f2_norm as the first score; performing channel rearrangement on f2_norm to obtain f2_norm_1, and recording the result of the point multiplication of f2_norm_1 and f1_norm as the second score;
the calculating of the similarity score between the image feature map to be classified and the template image feature map comprises:
performing feature learning by using a multi-layer fully-connected network, then calculating the similarity between the image to be classified and the template image by using a softmax module, finally generating a scalar in the range of 0 to 1, and judging the similarity between the image feature map to be classified and the template image feature map according to the obtained scalar.
2. The image classification method according to claim 1, characterized in that:
performing feature extraction on the preprocessed image by using a resnet34 network.
3. The image classification method according to claim 1, characterized in that:
in the training process of the preset neural network model, processing data in the data set, including:
carrying out scale transformation on the existing data set by using the transforms in PyTorch, transforming the images to a uniform size, and then carrying out uniform normalization processing;
applying the same data enhancement to both images of an image pair in the original data set, and different data enhancements to different image pairs;
the labels of the labeled data set and of the unlabeled data set are unchanged after enhancement, and the amount of data in each class of the enhanced labeled data is kept consistent, or differs only within a preset range.
4. An image classification system characterized by: the method comprises the following steps:
a data acquisition module configured to: acquiring an image pair to be classified;
a pre-processing module configured to: preprocessing the acquired image pair;
a feature extraction module configured to: extracting the characteristics of the preprocessed image pair;
an image classification module configured to: obtaining a final classification result according to the extracted features and a preset neural network model;
the preset neural network model comprises a template image feature map, the image feature map to be classified is connected with the template image feature map in series, the similarity score of the image feature map to be classified and the template image feature map is calculated, and the category with the highest score is taken as a final classification result;
the acquisition of the template image feature map comprises the following steps:
randomly selecting two categories from a plurality of categories, each category having one image pair, the image pair consisting of images of two modalities;
performing feature extraction on the image pair by using a preset convolutional neural network to obtain a first feature map and a second feature map;
calculating a correlation score between the first feature map and the second feature map by using a cosine function to obtain a first score and a second score;
element-wise multiplying the first score with the first feature map and the second score with the second feature map, and, after convolution, adding the results to the first feature map and the second feature map respectively to obtain fused feature maps, the fused feature maps being taken as the template image feature map;
calculating a correlation score between the first feature map and the second feature map using a cosine function, comprising:
normalizing the first picture feature and the second picture feature of the extracted picture pair to obtain f1_norm and f2_norm; performing channel rearrangement on f1_norm to obtain f1_norm_1, and taking the result of the point multiplication of f1_norm_1 and f2_norm as the first score; performing channel rearrangement on f2_norm to obtain f2_norm_1, and recording the result of the point multiplication of f2_norm_1 and f1_norm as the second score;
the calculating of the similarity score between the image feature map to be classified and the template image feature map comprises:
performing feature learning by using a multi-layer fully-connected network, then calculating the similarity between the image to be classified and the template image by using a softmax module, finally generating a scalar in the range of 0 to 1, and judging the similarity between the image feature map to be classified and the template image feature map according to the obtained scalar.
5. A computer-readable storage medium, on which a program is stored which, when being executed by a processor, carries out the steps of the image classification method according to any one of claims 1 to 3.
6. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps in the image classification method according to any one of claims 1 to 3 when executing the program.
CN202110741804.7A 2021-06-30 2021-06-30 Image classification method, system, medium and electronic device Active CN113361636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110741804.7A CN113361636B (en) 2021-06-30 2021-06-30 Image classification method, system, medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110741804.7A CN113361636B (en) 2021-06-30 2021-06-30 Image classification method, system, medium and electronic device

Publications (2)

Publication Number Publication Date
CN113361636A CN113361636A (en) 2021-09-07
CN113361636B (en) 2022-09-20

Family

ID=77537733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110741804.7A Active CN113361636B (en) 2021-06-30 2021-06-30 Image classification method, system, medium and electronic device

Country Status (1)

Country Link
CN (1) CN113361636B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023044612A1 (en) * 2021-09-22 2023-03-30 深圳先进技术研究院 Image classification method and apparatus
CN114911963B (en) * 2022-05-12 2023-09-01 星环信息科技(上海)股份有限公司 Template picture classification method, device, equipment, storage medium and product
CN114638994B (en) * 2022-05-18 2022-08-19 山东建筑大学 Multi-modal image classification system and method based on attention multi-interaction network
CN116129200A (en) * 2023-04-17 2023-05-16 厦门大学 Bronchoscope image benign and malignant focus classification device based on deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340123A (en) * 2020-02-29 2020-06-26 韶鼎人工智能科技有限公司 Image score label prediction method based on deep convolutional neural network
CN111428807A (en) * 2020-04-03 2020-07-17 桂林电子科技大学 Image processing method and computer-readable storage medium
CN111767954A (en) * 2020-06-30 2020-10-13 苏州科达科技股份有限公司 Vehicle fine-grained identification model generation method, system, equipment and storage medium
CN112149728A (en) * 2020-09-22 2020-12-29 成都智遥云图信息技术有限公司 Rapid multi-modal image template matching method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108235770B (en) * 2017-12-29 2021-10-19 达闼机器人有限公司 Image identification method and cloud system
CN109726746B (en) * 2018-12-20 2021-02-26 浙江大华技术股份有限公司 Template matching method and device
CN111488475A (en) * 2019-01-29 2020-08-04 北京三星通信技术研究有限公司 Image retrieval method, image retrieval device, electronic equipment and computer-readable storage medium
CN112464787B (en) * 2020-11-25 2022-07-08 北京航空航天大学 Remote sensing image ship target fine-grained classification method based on spatial fusion attention

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340123A (en) * 2020-02-29 2020-06-26 韶鼎人工智能科技有限公司 Image score label prediction method based on deep convolutional neural network
CN111428807A (en) * 2020-04-03 2020-07-17 桂林电子科技大学 Image processing method and computer-readable storage medium
CN111767954A (en) * 2020-06-30 2020-10-13 苏州科达科技股份有限公司 Vehicle fine-grained identification model generation method, system, equipment and storage medium
CN112149728A (en) * 2020-09-22 2020-12-29 成都智遥云图信息技术有限公司 Rapid multi-modal image template matching method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Real-time video super-resolution using lightweight depthwise separable group convolutions with channel shuffling";Zhijiao Xiao et al.;《Journal of Visual Communication and Image Representation》;20210210;第1-9页 *
"基于多通道特征聚合的盲图像质量评价";张维夏 等;《华中科技大学学报(自然科学版)》;20180731;第46卷(第7期);第111-116页 *

Also Published As

Publication number Publication date
CN113361636A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN113361636B (en) Image classification method, system, medium and electronic device
CN110427867B (en) Facial expression recognition method and system based on residual attention mechanism
CN110659582A (en) Image conversion model training method, heterogeneous face recognition method, device and equipment
CN114398961B (en) Visual question-answering method based on multi-mode depth feature fusion and model thereof
Galteri et al. Spatio-temporal closed-loop object detection
CN106909938B (en) Visual angle independence behavior identification method based on deep learning network
Liu et al. Facial landmark machines: A backbone-branches architecture with progressive representation learning
CN111401105B (en) Video expression recognition method, device and equipment
CN112418166B (en) Emotion distribution learning method based on multi-mode information
Tang et al. Acoustic feature learning via deep variational canonical correlation analysis
CN112084895A (en) Pedestrian re-identification method based on deep learning
Kim et al. Self-supervised keypoint detection based on multi-layer random forest regressor
CN116564355A (en) Multi-mode emotion recognition method, system, equipment and medium based on self-attention mechanism fusion
Bu et al. Multimodal feature fusion for 3D shape recognition and retrieval
CN116561533B (en) Emotion evolution method and terminal for virtual avatar in educational element universe
He et al. Variable scale learning for visual object tracking
Jiang et al. Speech emotion recognition method based on improved long short-term memory networks
CN113538507B (en) Single-target tracking method based on full convolution network online training
CN114612535A (en) Image registration method, system, device and medium based on partial differential countermeasure learning
CN116152938A (en) Method, device and equipment for training identity recognition model and transferring electronic resources
De Giacomo et al. Guided sonar-to-satellite translation
Wu et al. Learning age semantic factor to enhance group-based representations for cross-age face recognition
Li et al. Foundation
Almutiry Efficient iris segmentation algorithm using deep learning techniques
CN117131214B (en) Zero sample sketch retrieval method and system based on feature distribution alignment and clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant