CN113191390A - Image classification model construction method, image classification method and storage medium - Google Patents
- Publication number
- CN113191390A
- Authority
- CN
- China
- Prior art keywords
- feature map
- convolution
- convolution unit
- pyramid
- ith
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/32—Indexing scheme for image data processing or generation, in general involving image mosaicing
Abstract
The invention discloses a method for constructing an image classification model, an image classification method, and a storage medium. The constructed image classification model comprises a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an nth pyramid convolution unit, a pooling layer and a fully-connected layer which are cascaded in sequence. The ith pyramid convolution unit first performs further feature extraction on the currently input feature map using n-i+1 convolution kernels of different scales, and then sequentially fuses the feature map extracted by each scale's convolution kernel with the fused feature map extracted at the previous scale, obtaining a fused feature map for each scale, namely a group of feature maps containing information at different scales. These feature maps containing different scale information are then fused with the currently input feature map to obtain an output feature map containing multi-scale information, where i = 1, 2, …, n. The invention makes full use of information at different scales and achieves higher image classification accuracy.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a construction method of an image classification model, an image classification method and a storage medium.
Background
Image classification techniques are at the core of computer vision and are widely applied in many fields, for example: face recognition and intelligent video analysis in the security field, traffic scene recognition in the transportation field, image retrieval in the internet field, and medical image analysis in the medical field. Taking medical images as an example, in clinical diagnosis doctors identify images collected by imaging devices (such as magnetic resonance imaging, ultrasonic imaging and optical coherence tomography) for the purpose of disease screening. However, the effectiveness of manual identification depends heavily on the clinical experience of the doctor, and the huge volume of medical data also affects diagnostic efficiency, so that fatigue easily leads to misdiagnosis or missed diagnosis. At present, automated computer-aided diagnosis technology is widely applied in the field of medical image recognition: the strong computing power of a computer is used to process and analyze images, providing information of reference value to the clinician and greatly reducing the clinician's workload.
In recent years, deep learning algorithms have received wide attention in the field of image classification. Compared with traditional machine learning algorithms that acquire hand-crafted features through shallow learning, deep learning methods combine multiple nonlinear shallow features and construct more abstract high-order features on that basis. Like the deep structure of the brain, in deep learning each input object is represented in a multi-layer abstract form, with each layer corresponding to a different region of the cerebral cortex. The advantage of deep learning is that its multi-level features are learned from raw data through a general-purpose learning procedure rather than designed by manual screening. Commonly used deep learning models include the deep Boltzmann machine, the deep belief network, the stacked autoencoder, the recurrent neural network and the convolutional neural network. Convolutional neural networks are widely used in image processing and perform well in many medical image recognition tasks. However, most existing network models use only a single convolution kernel when extracting image feature information; for images in which the target area varies greatly, it is difficult to completely capture feature information at different levels of detail.
Disclosure of Invention
In view of the above drawbacks or needs for improvement in the prior art, the present invention provides a method for constructing an image classification model, an image classification method, and a storage medium, so as to solve the technical problem in the prior art that the classification accuracy is low due to the fact that feature information of different scales is not fully utilized.
In order to achieve the above object, in a first aspect, the present invention provides a method for constructing an image classification model, including the following steps:
s1, building an image classification model; the image classification model comprises: a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an nth pyramid convolution unit, a pooling layer and a fully-connected layer which are cascaded in sequence; the convolution layer is used for extracting an initial feature map of an input image and outputting the initial feature map to the first pyramid convolution unit; the ith pyramid convolution unit is used for applying n-i+1 convolution kernels of different scales to respectively perform further feature extraction on the feature map currently input to the ith pyramid convolution unit, and then sequentially fusing the feature map extracted by each scale's convolution kernel with the fused feature map extracted at the previous scale to obtain a fused feature map for each scale's convolution kernel, namely a group of feature maps containing information at different scales; these feature maps containing different scale information are then fused with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information; wherein i = 1, 2, …, n, and each scale's convolution kernel is larger in scale than the convolution kernel of the previous stage;
and S2, inputting a training set collected according to a preset classification task into the image classification model for training to obtain a trained image classification model.
Further preferably, the input image is an image obtained by scaling the original sample image in the training set, so as to improve the calculation efficiency.
Further preferably, the feature map extracted by the convolution kernel of the block-th scale in the ith pyramid convolution unit is denoted F_i^block, block = 1, …, n-i+1, and the fused feature map extracted by the block-th scale convolution kernel is denoted M_i^block. For the ith pyramid convolution unit, when i = 1, 2, …, n-1, the feature map F_i^2 is fused with the feature map F_i^1 to obtain the fused feature map M_i^2 extracted by the second-scale convolution kernel; then, starting from block = 3, the feature map F_i^block is sequentially fused with the fused feature map M_i^{block-1} extracted by the (block-1)-th scale convolution kernel to obtain the fused feature map M_i^block extracted by the block-th scale convolution kernel. After the fused feature maps of the convolution kernels of all scales are extracted, all the fused feature maps M_i^block are spliced and then fused with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information, which is output to the (i+1)th pyramid convolution unit. When i = n, a convolution operation is performed on the feature map currently input to the ith pyramid convolution unit and the result is fused with that input feature map to obtain an output feature map containing multi-scale information, which is output to the pooling layer; after the pooling operation, the classification result is obtained through the fully-connected layer.
More preferably, in the ith pyramid convolution unit, the specific way of fusing a feature map A with a feature map B or a fused feature map B is as follows: a convolution operation is performed on A and the result is combined with B; the combination with B comprises a pixel-by-pixel addition, a splicing operation, or a splicing followed by a convolution operation.

The specific way of fusing the spliced fused feature maps M_i^block with the feature map currently input to the ith pyramid convolution unit is as follows: the fused feature maps M_i^block are spliced along the channel dimension, the number of feature channels of the spliced feature map is changed by convolution so that it is consistent with the number of channels of the feature map currently input to the ith pyramid convolution unit, and the result is then summed pixel by pixel with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information.
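By way of illustration only, the following is a minimal PyTorch sketch of one such pyramid convolution unit. The class name, the constant channel width, and the reading that a 3 × 3 cross-connection convolution is applied to each new branch before pixel-wise addition (following the "convolve A, then combine with B" rule above) are assumptions of this sketch, not code from the patent.

```python
import torch
import torch.nn as nn

class PyramidConvUnit(nn.Module):
    """Sketch of the i-th pyramid convolution unit: n-i+1 parallel branches with
    growing kernel sizes, cross-connection fusion, concatenation with a 1x1
    convolution, and a pixel-wise residual sum with the unit's input."""

    def __init__(self, channels, num_scales):
        super().__init__()
        # Branch b extracts F_i^{b+1} with a (2b+3)x(2b+3) kernel: 3x3, 5x5, ...
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=2 * b + 3, padding=b + 1)
            for b in range(num_scales)
        ])
        # 3x3 cross-connection convolutions applied to each new branch before it
        # is combined with the previous fused map: M_i^b = M_i^{b-1} + K3x3(F_i^b)
        self.cross = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_scales - 1)
        ])
        # 1x1 convolution restores the channel count after concatenation
        self.fuse = nn.Conv2d(channels * num_scales, channels, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]   # F_i^1 ... F_i^{n-i+1}
        fused = [feats[0]]                                # M_i^1 = F_i^1
        for b in range(1, len(feats)):
            fused.append(fused[b - 1] + self.cross[b - 1](feats[b]))
        out = self.fuse(torch.cat(fused, dim=1))          # splice + 1x1 conv
        return out + x                                    # pixel-wise residual sum


# Example: the first of n = 4 units would use 4 kernel scales (3x3 ... 9x9)
unit = PyramidConvUnit(channels=64, num_scales=4)
y = unit(torch.randn(1, 64, 56, 56))                      # -> (1, 64, 56, 56)
```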
Further preferably, the output end of the ith pyramid convolution unit is also connected to the input end of the ith pyramid convolution unit;
the ith pyramid convolution unit is also used for, before outputting the obtained output feature map containing multi-scale information to the next pyramid convolution unit or the pooling layer, inputting that feature map to the ith pyramid convolution unit again so as to further extract features from it; after this is repeated multiple times, the result is output to the next pyramid convolution unit or the pooling layer, so as to improve the robustness of the image classification model.
Further preferably, the image classification model further includes n mixed attention modules; when i = 1, 2, …, n-1, the ith mixed attention module is located between the ith pyramid convolution unit and the (i+1)th pyramid convolution unit; when i = n, the ith mixed attention module is located between the ith pyramid convolution unit and the pooling layer;

the mixed attention module comprises a spatial attention network and a channel attention network connected in series or in parallel, and is used for screening, in the spatial and channel dimensions, the output feature map containing multi-scale information provided by the pyramid convolution unit to obtain a feature map F_sa, thereby suppressing redundant background information and highlighting feature information that benefits the classification result.
Further preferably, the output end of the ith mixed attention module is also connected to the input end of the ith pyramid convolution unit;
the ith mixed attention module is also used for re-inputting the obtained feature map F_sa to the ith pyramid convolution unit so as to further extract features from F_sa; after this is repeated multiple times, the result is output to the next pyramid convolution unit or the pooling layer, so as to improve the robustness of the image classification model.
Further preferably, the channel attention network is configured to perform a global average pooling operation on the input feature map along the channel dimension to extract the global spatial information on each channel; channel weights for the global spatial information on all channels are then learned through a weight-shared one-dimensional convolution kernel, and the learned channel weights are applied to the corresponding channels of the input feature map so as to screen the feature information in the channel dimension;

the size k_1D of the convolution kernel in the channel attention network and the number C_1D of feature channels of the input feature map satisfy:

k_{1D} = \left| \frac{\log_2(C_{1D})}{\gamma} + \frac{b}{\gamma} \right|_{odd}

where γ and b are learning parameters, and |e|_odd denotes the odd number nearest to e.
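For concreteness, the following is a hedged sketch of such a channel attention branch: global average pooling per channel, a weight-shared one-dimensional convolution whose kernel size follows the formula above, and a Sigmoid gate applied back onto the channels. All identifiers are illustrative; γ = 2 and b = 1 follow the embodiment described later.

```python
import math
import torch
import torch.nn as nn

class ChannelAttention1D(nn.Module):
    """Channel attention with an adaptively sized, weight-shared 1-D conv."""

    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # k_1D = |log2(C)/gamma + b/gamma|, rounded here to the next odd number
        k = int(abs(math.log2(channels) / gamma + b / gamma))
        k = k if k % 2 == 1 else k + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, u):                          # u: (N, C, H, W)
        z = u.mean(dim=(2, 3))                     # global average pooling -> (N, C)
        w = self.conv(z.unsqueeze(1))              # weight-shared 1-D conv -> (N, 1, C)
        w = torch.sigmoid(w).squeeze(1)            # channel weights in (0, 1)
        return u * w.unsqueeze(-1).unsqueeze(-1)   # re-weight each channel of u
```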
In a second aspect, the present invention provides an image classification method, comprising: inputting the image to be classified into the image classification model constructed by the method for constructing an image classification model provided in the first aspect of the invention, to obtain a classification result.
In a third aspect, the present invention also provides a machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement any one of the image classification model construction methods described above and/or the image classification method described above.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
1. The invention provides a method for constructing an image classification model. The constructed model comprises a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an nth pyramid convolution unit, a pooling layer and a fully-connected layer which are cascaded in sequence. Within each pyramid convolution unit, the feature map extracted by each scale's convolution kernel is sequentially fused, through convolutional cross-connections, with the fused feature map extracted at the previous scale, yielding a fused feature map for each scale; the correlation among the feature maps is thereby further mined to obtain an output feature map containing multi-scale information, and information at different scales is fully utilized. By using this multi-scale scheme to extract image features of different granularity, the invention achieves higher image classification accuracy.

2. The image classification model constructed by this method further comprises a mixed attention module. The output feature map containing multi-scale information provided by the pyramid convolution unit is screened in the spatial and channel dimensions by a spatial attention network and a channel attention network, realizing adaptive calibration of channel features and spatial information and suppressing the redundant information introduced when feature maps of different scales are integrated; by effectively suppressing useless background information and highlighting key feature information, the accuracy of image classification is further improved.

3. In the image classification model constructed by this method, the output end of the ith mixed attention module is also connected to the input end of the ith pyramid convolution unit. A pyramid convolution unit together with the mixed attention module connected to its output end is called a mixed attention pyramid module; mixed attention pyramid modules composed of convolution kernels of different numbers and depths are cascaded together for image classification, which greatly improves the robustness of the model while also improving its accuracy.

4. In the image classification model constructed by this method, images can be scaled before being input into the model so as to improve computational efficiency.
Drawings
Fig. 1 is a schematic structural diagram of an image classification model provided in embodiment 1 of the present invention;
fig. 2 is a schematic structural diagram of an image classification model including a mixed attention module according to embodiment 1 of the present invention;
fig. 3 is a schematic structural diagram of an image classification model according to embodiment 1 of the present invention, which includes a mixed attention module, and an output end of the mixed attention module is further connected to an input end of a corresponding pyramid convolution unit;
fig. 4 is a schematic diagram illustrating a 3 × 3 convolution cross-connection manner in the pyramid convolution unit provided in embodiment 1 of the present invention;
FIG. 5 is a graph of the accuracy of the HapcNet and each comparative deep learning model on the anterior chamber angle validation set, as provided in Embodiment 1 of the present invention;
FIG. 6 is a confusion matrix of HapcNet and each comparative deep learning model on the anterior chamber angle test set according to embodiment 1 of the present invention; wherein, (a) is a confusion matrix of the deep learning model VGG-16 on the anterior chamber angle test set; (b) a confusion matrix of a deep learning model ResNet-50 on the anterior chamber corner test set; (c) a confusion matrix of a deep learning model DenseNet-121 on an anterior chamber corner test set; (d) a confusion matrix of a deep learning model MobileNet on an anterior chamber corner test set; (e) a confusion matrix of a deep learning model EfficientNet-B7 on the anterior chamber corner test set; (f) a confusion matrix of a deep learning model PyConvNet-50 on an anterior chamber corner test set; (g) the confusion matrix of the HapcNet provided by the invention on the anterior chamber corner test set.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Embodiment 1
A construction method of an image classification model comprises the following steps:
s1, building an image classification model; as shown in fig. 1, the image classification model includes: a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an nth pyramid convolution unit, a pooling layer and a fully-connected layer which are cascaded in sequence; the convolution layer is used for extracting an initial feature map of the input image and outputting it to the first pyramid convolution unit; the ith pyramid convolution unit applies n-i+1 convolution kernels of different scales to respectively perform further feature extraction on the feature map currently input to it, and then sequentially fuses the feature map extracted by each scale's convolution kernel with the fused feature map extracted at the previous scale, obtaining a fused feature map for each scale, namely a group of feature maps containing information at different scales; these feature maps are spliced and then fused with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information; wherein i = 1, 2, …, n, and each scale's convolution kernel is larger than the convolution kernel of the previous stage. Specifically, the feature map extracted by the convolution kernel of the block-th scale in the ith pyramid convolution unit is denoted F_i^block, block = 1, …, n-i+1, and the fused feature map extracted by the block-th scale convolution kernel is denoted M_i^block. For the ith pyramid convolution unit, when i = 1, 2, …, n-1, the feature map F_i^2 is fused with the feature map F_i^1 to obtain the fused feature map M_i^2 extracted by the second-scale convolution kernel; the feature map F_i^3 is fused with M_i^2 to obtain the fused feature map M_i^3 extracted by the third-scale convolution kernel; and so on. After the fused feature maps of all scales are extracted, all the fused feature maps M_i^block are spliced and then fused with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information, which is output to the (i+1)th pyramid convolution unit. When i = n, a convolution operation is performed on the feature map currently input to the ith pyramid convolution unit and the result is fused with that input feature map to obtain an output feature map containing multi-scale information, which is output to the pooling layer; after the pooling operation, the classification result is obtained through the fully-connected layer. Preferably, when i = 1, 2, …, n-1, in the above ith pyramid convolution unit, the specific way of fusing the feature map F_i^block with the fused feature map M_i^{block-1} extracted by the (block-1)-th scale convolution kernel is as follows: a convolution operation is performed on the feature map F_i^block and the result is combined with the fused feature map M_i^{block-1}, so as to fully mine the information among different feature maps and make the information more complete; the combination with the fused feature map M_i^{block-1} comprises a pixel-by-pixel addition, a splicing operation, or a splicing followed by a convolution operation.
It should be noted that the feature map F_i^2 is fused with the feature map F_i^1 in the same manner in which the feature map F_i^block is fused with the fused feature map M_i^{block-1} extracted by the (block-1)-th scale convolution kernel, which is not repeated here. Further, the specific way of fusing the spliced fused feature maps M_i^block with the feature map currently input to the ith pyramid convolution unit is as follows: the fused feature maps M_i^block are spliced along the channel dimension, the number of feature channels of the spliced feature map is changed by convolution so that it is consistent with the number of channels of the feature map currently input to the ith pyramid convolution unit, and the result is then summed pixel by pixel with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information.
And S2, inputting a training set collected according to a preset classification task into the image classification model for training to obtain a trained image classification model. Preferably, the input image may be an image obtained by scaling the original sample images in the training set, so as to improve computational efficiency and speed up training. In this embodiment, the cross-entropy loss is taken as the total loss function, specifically:

L = -\frac{1}{Num} \sum_{p=1}^{Num} \sum_{q=1}^{\eta} y_{p,q} \log(x_{p,q})

where η represents the number of output categories and Num is the batch size of the images in the training set; x_{p,q} is the predicted probability, generated by the softmax classification function, that the pth sample belongs to class q; and y_{p,q} is the corresponding label of the pth sample for class q.
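For reference, this loss maps directly onto PyTorch's built-in cross entropy; the variable names below are illustrative:

```python
import torch
import torch.nn as nn

# logits: (Num, eta) raw network outputs; labels: (Num,) integer class indices
logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
# CrossEntropyLoss applies softmax internally and averages over the batch,
# matching L = -(1/Num) * sum_p sum_q y_{p,q} * log(x_{p,q})
loss = nn.CrossEntropyLoss()(logits, labels)
```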
Preferably, in order to improve the robustness of the image classification model, the output end of the ith pyramid convolution unit is further connected to the input end of the ith pyramid convolution unit;
the ith pyramid convolution unit is also used for, before outputting the obtained output feature map containing multi-scale information to the next pyramid convolution unit or the pooling layer, inputting that feature map to the ith pyramid convolution unit again so as to further extract features from it; after this is repeated multiple times, the result is output to the next pyramid convolution unit or the pooling layer, so as to improve the robustness of the image classification model.
Preferably, as shown in fig. 2, in order to solve the problem of information redundancy generated in the feature fusion process, to further highlight useful information and suppress useless information, the image classification model further includes n mixed attention modules. When i = 1, 2, …, n-1, the ith mixed attention module is located between the ith pyramid convolution unit and the (i+1)th pyramid convolution unit; when i = n, the ith mixed attention module is located between the ith pyramid convolution unit and the pooling layer. The mixed attention module comprises a spatial attention network and a channel attention network connected in series or in parallel, and is used for screening, in the spatial and channel dimensions, the output feature map containing multi-scale information provided by the pyramid convolution unit to obtain a feature map F_sa, thereby suppressing redundant background information and highlighting feature information that benefits the classification result.
Taking a mixed attention module composed of a spatial attention network and a channel attention network which are cascaded as an example, the output feature map u containing multi-scale information output by the ith pyramid convolution unit is input into the mixed attention module, and the following operations are executed in the mixed attention module:
in the channel attention network, firstly, the global average pooling operation is carried out on the output feature graph u according to the channels to extract the global spatial information on each channel, wherein ucGlobal spatial information of the c-th channelHcaAnd WcaAre each ucHeight and width of (u)cA feature map corresponding to the c-th channel of the output feature map u, wherein z is a one-dimensional vector containing global space information of each channel; then, the channel weights of the global space information on each channel are respectively learned through a one-dimensional convolution kernel with shared weights, and the obtained weight isWherein δ (·) is a Sigmoid function; 1D _ Conv denotes the use of a size k1DPerforming one-dimensional convolution operation on the z by the convolution kernel; it should be noted that, in order to realize the adaptive selection of the size of the convolution kernel, the size k of the convolution kernel in the channel attention network1DAnd the number C of feature channels of the input feature map1DSatisfies the following conditions:where γ and b are learning parameters, which are set to 2 and 1, | e $ y in this embodiment, respectivelyoddRepresents the odd number nearest to e; the channel attention ensures that the classification result is improved, and meanwhile, the calculation amount and the parameter amount are reduced. Finally, the learned channel weights are respectively acted on the corresponding channels in the output characteristic diagram to obtain a channel attention weight characteristic diagram Fca(ii) a Specifically, FcaU · w. Further, the channel attention weight feature map FcaInput into a spatial attention network.
In the spatial attention network, average pooling and max pooling operations are performed on the feature map F_ca along its channel axis to quickly capture context information, generating two 2D maps F_avg and F_max of size 1 × H_sa × W_sa (H_sa and W_sa being the height and width of the feature map processed by the spatial attention network). Then, F_avg and F_max are spliced along the channel dimension to generate a two-channel feature map, and a convolution with a kernel of preset size is applied to it to generate the spatial attention weight feature map M(F_ca). Finally, the channel attention weight feature map F_ca and the spatial attention weight feature map M(F_ca) are multiplied pixel by pixel (i.e., dot multiplication) to obtain the feature map F_sa. The size of the preset convolution kernel is determined according to the size of the channel attention weight feature map F_ca, so that the two-channel feature map after the convolution operation keeps the same size as F_ca, enabling the subsequent dot-product operation.
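A sketch of this spatial attention step under the same caveats: channel-wise average and max pooling, splicing into a two-channel map, a single convolution (a 7 × 7 kernel is assumed here, since the text only says "preset size"), and a pixel-wise gate.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention: pool along the channel axis, convolve, gate."""

    def __init__(self, kernel_size=7):  # "preset size" in the text; 7 is an assumption
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, f_ca):                            # f_ca: (N, C, H_sa, W_sa)
        f_avg = f_ca.mean(dim=1, keepdim=True)          # (N, 1, H_sa, W_sa)
        f_max = f_ca.max(dim=1, keepdim=True).values    # (N, 1, H_sa, W_sa)
        m = torch.sigmoid(self.conv(torch.cat([f_avg, f_max], dim=1)))
        return f_ca * m                                 # F_sa = F_ca (.) M(F_ca)
```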
It should be noted that, in the present embodiment, the channel attention network and the spatial attention network are combined in a cascade manner to form the mixed attention module; alternatively, the two attention networks can also be reasonably combined in parallel or in other ways.
When the parallel connection mode is adopted, the channel attention network and the spatial attention network each process, according to the operations above, the output feature map u containing multi-scale information provided by the ith pyramid convolution unit, obtaining a channel attention weight feature map and a spatial attention weight feature map respectively; the two weight feature maps are then spliced along the channel dimension and a convolution operation is performed to obtain the feature map F_sa. It should be noted that the order of the channel-wise splicing and the convolution operation is not limited, as long as the output dimension is consistent with the dimension of the output feature map u.
Preferably, as shown in FIG. 3, the output end of the ith mixed attention module is further connected to the input end of the ith pyramid convolution unit; the ith mixed attention module is also used for re-inputting the obtained feature map F_sa to the ith pyramid convolution unit so as to further extract features from F_sa; after this is repeated multiple times, the result is output to the next pyramid convolution unit or the pooling layer, so as to improve the robustness of the image classification model.
It should be noted that a pyramid convolution unit and the mixed attention module connected to its output end may be referred to as a mixed attention pyramid module. In order to improve the robustness of the network model, mixed attention pyramid modules composed of convolution kernels of different numbers and depths are cascaded together to form the classification model of the invention, and the input image is processed by the different mixed attention pyramid modules, each repeated multiple times, to produce the final classification prediction. The number of repetitions of feature extraction in each mixed attention pyramid module, as well as the size and number of convolution kernels in each pyramid convolution unit, can be adjusted according to the actual task.
Further, taking the common ophthalmic disease glaucoma as an example, optical coherence tomography (OCT) is often used to help clinicians identify the type of a patient's anterior chamber angle (ACA), i.e. open angle, narrow angle or closed angle, because of its advantages of being non-invasive, comfortable, high-resolution and non-contact; however, due to individual differences, the region occupied by the anterior chamber angle fluctuates within a certain range in the OCT image. If the anterior chamber angle is small, a single convolution kernel can hardly capture the characteristic information of tiny details accurately; meanwhile, ignoring the information redundancy problem in the feature fusion process means that useful information cannot be highlighted and useless information cannot be suppressed, which ultimately affects accurate prediction of the anterior chamber angle type. The invention therefore provides an image classification model comprising several pyramid convolution units, which uses a multi-scale scheme to extract image features of different granularity. In each unit, the image is input into a pyramid convolution module consisting of convolution kernel filters of different sizes and depths, which extract information at different scales from the input. Then, through convolutional cross-connections, the feature map extracted by each scale's convolution kernel is sequentially fused with the fused feature map extracted at the previous scale, giving a fused feature map for each scale; the correlation among the feature maps is thus further mined to obtain output feature maps containing information at different scales, completing the feature extraction of all convolution kernel sizes. Next, the output feature maps containing different scale information are spliced together by a feature map combination operation, and the number of channels of the spliced feature map is changed by a 1 × 1 convolution. Finally, the combined feature map and the image input to the pyramid convolution module are summed pixel by pixel.
In order to better verify the accuracy of the classification model constructed by the invention, the angle-closure glaucoma dataset provided at the 2019 Medical Image Computing and Computer Assisted Intervention (MICCAI) conference is taken as the training data set. From it, 1341 images were randomly selected and cropped into 2682 anterior chamber angle images; the dataset provides two gold-standard labels, open-angle and angle-closure. On this basis, the angle-closure images are further divided into narrow-angle and closed-angle anterior chamber angles. In order to avoid the problem of difficult training convergence caused by unbalanced data distribution, the original data were augmented by translation and rotation, yielding 1536 open-angle, 1214 narrow-angle and 1458 closed-angle anterior chamber angle images; the final training, validation and test sets contain 3367, 419 and 422 images respectively.
In order to further demonstrate the advantages of the invention, the glaucoma anterior chamber angle dataset is used to evaluate the classification performance of the model constructed by the invention against current mainstream deep learning classification methods. The evaluation indices comprise the accuracy ACC, the average sensitivity, the average specificity and the average balanced accuracy, defined as follows:

ACC = \frac{\sum_{s=1}^{3} TP_s}{N_{test}}, \quad \overline{Sen} = \frac{1}{3} \sum_{s=1}^{3} \frac{TP_s}{TP_s + FN_s}, \quad \overline{Spe} = \frac{1}{3} \sum_{s=1}^{3} \frac{TN_s}{TN_s + FP_s}, \quad \overline{BACC} = \frac{\overline{Sen} + \overline{Spe}}{2}

where N_test is the number of images in the test set, and TP_s, TN_s, FP_s, FN_s (s ∈ {1,2,3}) denote the numbers of true positives, true negatives, false positives and false negatives, respectively, when the sth class is considered positive and the remaining classes negative.
It should be noted that, in this embodiment, the number n of pyramid convolution units is 4. The first pyramid convolution unit has n convolution kernels with scales 3 × 3, 5 × 5, …, (2n+1) × (2n+1); the second pyramid convolution unit has n-1 convolution kernels with scales 3 × 3, 5 × 5, …, (2n-1) × (2n-1); the (n-1)th pyramid convolution unit has 2 convolution kernels with scales 3 × 3 and 5 × 5; and the nth pyramid convolution unit has 1 convolution kernel with scale 3 × 3. In each pyramid convolution unit, the feature map extracted by each scale's convolution kernel is sequentially fused, through convolutional cross-connections, with the fused feature map extracted at the previous scale to obtain the fused feature map for each scale, thereby obtaining output feature maps containing information at different scales and completing the feature extraction of all convolution kernels in sequence. That is, the feature map F_i^2 extracted by the second-scale convolution kernel is fused with the feature map F_i^1 extracted by the first-scale convolution kernel to obtain the fused feature map M_i^2; the feature map F_i^3 extracted by the third-scale convolution kernel is fused with M_i^2 to obtain the fused feature map M_i^3; the feature map F_i^4 extracted by the fourth-scale convolution kernel is fused with M_i^3 to obtain the fused feature map M_i^4; and so on. It should be noted that the scale of the previous convolution kernel is smaller than that of the current convolution kernel. Specifically, in this embodiment, as shown in fig. 4, taking 3 convolution kernels in a pyramid convolution unit as an example, the operation is performed in a 3 × 3 convolution cross-connection manner, and the fused feature map extracted by the block-th scale convolution kernel is:

M_i^{block} = M_i^{block-1} \oplus K_{3 \times 3}(F_i^{block})

where K_{3×3} denotes a convolution with a kernel of size 3 × 3 and ⊕ the combination operation (a pixel-by-pixel addition in this embodiment). After the fused feature maps of all scales are extracted, the feature map combination operation splices the fused feature maps M_i^block together along the channel dimension, and a 1 × 1 convolution changes the number of spliced feature channels so that the result can be summed pixel by pixel with the feature map currently input to the ith pyramid convolution unit, giving an output feature map containing multi-scale information.
In this embodiment, the first mixed attention pyramid module, formed by the first pyramid convolution unit and the first mixed attention module, repeats its feature extraction 3 times; the second mixed attention pyramid module, formed by the second pyramid convolution unit and the second mixed attention module, repeats 4 times; the third repeats 6 times; and the fourth repeats 3 times.
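Putting the pieces together, the following is a hedged sketch of the overall backbone for this embodiment, reusing the illustrative `PyramidConvUnit`, `ChannelAttention1D` and `SpatialAttention` classes sketched above. The stem convolution, constant channel width, and the approximation of the feedback connection by stacking each stage `repeats` times are all assumptions of this sketch.

```python
import torch
import torch.nn as nn

class HapcNetSketch(nn.Module):
    """Illustrative assembly: stem convolution, four cascaded stages of
    (pyramid convolution unit + channel/spatial attention), global average
    pooling, and a fully-connected classifier."""

    def __init__(self, num_classes=3, channels=64):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=7, stride=2, padding=3)
        # (num_scales, repeats) per stage: 4 kernels x3, 3 x4, 2 x6, 1 x3
        cfg = [(4, 3), (3, 4), (2, 6), (1, 3)]
        blocks = []
        for num_scales, repeats in cfg:
            # The text re-feeds each module's output to its own input `repeats`
            # times; stacking `repeats` copies approximates that behaviour.
            for _ in range(repeats):
                blocks += [PyramidConvUnit(channels, num_scales),
                           ChannelAttention1D(channels),
                           SpatialAttention()]
        self.blocks = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = self.blocks(self.stem(x))
        return self.fc(self.pool(x).flatten(1))

model = HapcNetSketch()
logits = model(torch.randn(1, 3, 224, 224))  # -> (1, 3) class scores
```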
Table 1 compares the classification performance on the anterior chamber angle test set of the classification model constructed by the invention (referred to herein as HapcNet) and different mainstream networks (VGG-16, ResNet-50, DenseNet-121, MobileNet, EfficientNet-B7 and PyConvNet-50). EfficientNet-B7 is the B7 variant of EfficientNet, while the numbers in the other network names denote the number of layers; for example, VGG-16 denotes a 16-layer VGG network. As can be seen from Table 1, the algorithms with the most prominent classification performance are EfficientNet, PyConvNet and the HapcNet provided by the invention, which outperform the other four deep learning methods on most indices. Compared with the EfficientNet and PyConvNet methods, the HapcNet provided by the invention improves the ACC value by about 1.47% and 1.66% respectively. On the remaining averaged index, although the differences between the networks are not significant, VGG performs the worst, at 0.9933, while the HapcNet provided by the invention reaches 0.9998, the best classification performance among the compared networks.
TABLE 1
Furthermore, in order to display more intuitively the superiority of the method over other methods, experiments were carried out with the HapcNet provided by the invention and each comparative deep learning model. Fig. 5 shows the accuracy curves of HapcNet and each comparative deep learning model on the anterior chamber angle validation set, where the abscissa Epochs is the number of iterations and the ordinate Accuracy is the accuracy. Fig. 6 shows the confusion matrices of HapcNet and each comparative deep learning model on the anterior chamber angle test set, where "0", "1" and "2" represent the open angle, narrow angle and closed angle respectively: (a) is the confusion matrix of the deep learning model VGG-16 on the anterior chamber angle test set; (b) that of ResNet-50; (c) that of DenseNet-121; (d) that of MobileNet; (e) that of EfficientNet-B7; (f) that of PyConvNet-50; and (g) that of the HapcNet provided by the invention. From fig. 5 it can be seen that the HapcNet provided by the invention achieves better convergence accuracy than the comparative deep learning models while providing an extremely competitive convergence speed. As can be seen from the confusion matrices in fig. 6, HapcNet, EfficientNet-B7 and PyConvNet-50 achieve superior classification performance on the anterior chamber angle test set compared with the remaining mainstream networks. Specifically, for the open-angle anterior chamber angle, the HapcNet provided by the invention provides the second-best accuracy of 98.7%, with EfficientNet-B7 obtaining the best classification accuracy of 99.4%; for the narrow-angle anterior chamber angle, the HapcNet provided by the invention provides the best accuracy of 100%, while EfficientNet-B7 provides only the second-best accuracy; and for the closed-angle anterior chamber angle, the HapcNet provided by the invention still achieves the best classification accuracy. In conclusion, compared with other deep learning models, the HapcNet provided by the invention has clear advantages in classifying the anterior chamber angle dataset.
Embodiment 2
An image classification method, comprising: inputting the image to be classified into the image classification model constructed by the method for constructing an image classification model provided in Embodiment 1 to obtain a classification result. Preferably, before being input to the image classification model, the image to be classified is scaled to improve computational efficiency.
The related technical solutions are the same as those of Embodiment 1 and are not repeated here.
Embodiment 3
A machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of constructing an image classification model provided in Embodiment 1 and/or the image classification method provided in Embodiment 2.
The related technical features are the same as those of Embodiment 1 and Embodiment 2 and are not repeated here.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A construction method of an image classification model is characterized by comprising the following steps:
s1, building an image classification model; the image classification model comprises: a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an nth pyramid convolution unit, a pooling layer and a fully-connected layer which are cascaded in sequence; the convolution layer is used for extracting an initial feature map of an input image and outputting the initial feature map to the first pyramid convolution unit; the ith pyramid convolution unit is used for applying n-i+1 convolution kernels of different scales to respectively perform further feature extraction on the feature map currently input to the ith pyramid convolution unit, and then sequentially fusing the feature map extracted by each scale's convolution kernel with the fused feature map extracted at the previous scale to obtain a fused feature map for each scale's convolution kernel, namely a group of feature maps containing information at different scales; and fusing the feature maps containing different scale information with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information; wherein i = 1, 2, …, n, and each scale's convolution kernel is larger in scale than the convolution kernel of the previous stage;
and S2, inputting a training set collected according to a preset classification task into the image classification model for training to obtain a trained image classification model.
2. The method of constructing an image classification model according to claim 1, wherein the input image is an image obtained by scaling original sample images in the training set.
3. The method for constructing an image classification model according to claim 1, wherein the feature map extracted by the convolution kernel of the block-th scale in the ith pyramid convolution unit is denoted F_i^block, block = 1, …, n-i+1, and the fused feature map extracted by the block-th scale convolution kernel is denoted M_i^block; for the ith pyramid convolution unit, when i = 1, 2, …, n-1, the feature map F_i^2 is fused with the feature map F_i^1 to obtain the fused feature map M_i^2 extracted by the second-scale convolution kernel; starting from block = 3, the feature map F_i^block is sequentially fused with the fused feature map M_i^{block-1} extracted by the (block-1)-th scale convolution kernel to obtain the fused feature map M_i^block extracted by the block-th scale convolution kernel; after the fused feature maps of the convolution kernels of all scales are extracted, all the fused feature maps M_i^block are spliced and then fused with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information, which is output to the (i+1)th pyramid convolution unit; and when i = n, a convolution operation is performed on the feature map currently input to the ith pyramid convolution unit and the result is fused with that input feature map to obtain an output feature map containing multi-scale information, which is output to the pooling layer; after the pooling operation, the classification result is obtained through the fully-connected layer.
4. The method for constructing an image classification model according to claim 3, wherein in the ith pyramid convolution unit, the specific way of fusing a feature map A with a feature map B or a fused feature map B is as follows: performing a convolution operation on A and then combining the result with B; the combination with B comprises a pixel-by-pixel addition, a splicing operation, or a splicing followed by a convolution operation;
the pair of fused feature mapsThe specific way of fusing the feature map which is currently input to the ith pyramid convolution unit after the splicing operation is performed is as follows: combining each fused feature mapAnd splicing according to channels, changing the number of characteristic channels of the spliced characteristic diagram in a convolution mode to keep the characteristic channels consistent with the number of channels of the characteristic diagram currently input to the ith pyramid convolution unit, and then superposing and summing the characteristic channels and the characteristic diagram currently input to the ith pyramid convolution unit pixel by pixel to obtain an output characteristic diagram containing multi-scale information.
5. The method for constructing an image classification model according to any one of claims 1 to 4, wherein the output end of the ith pyramid convolution unit is further connected to the input end of the ith pyramid convolution unit;
the ith pyramid convolution unit is further used for inputting the output feature map containing the multi-scale information to the ith pyramid convolution unit again before outputting the output feature map containing the multi-scale information to the next pyramid convolution unit or the pooling layer so as to further extract features of the output feature map containing the multi-scale information; and after repeating for multiple times, outputting the result to the next pyramid convolution unit or the pooling layer.
6. The method for constructing an image classification model according to any one of claims 1 to 4, wherein the image classification model further comprises n mixed attention modules; when i = 1, 2, …, n-1, the ith mixed attention module is located between the ith pyramid convolution unit and the (i+1)th pyramid convolution unit; and when i = n, the ith mixed attention module is located between the ith pyramid convolution unit and the pooling layer;
each mixed attention module comprises a spatial attention network and a channel attention network connected in series or in parallel, and is used to screen, in the spatial and channel dimensions, the output feature map containing multi-scale information that is input from the pyramid convolution unit, obtaining a feature map F_sa and thereby suppressing redundant background information.
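A sketch of the series/parallel wiring in claim 6. The spatial attention branch shown is a common CBAM-style design and is an assumption, as the claim does not fix its internals; merging the parallel branches by addition is likewise assumed.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention (assumed design, not specified by the claim)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)             # channel-wise average map
        mx, _ = x.max(dim=1, keepdim=True)            # channel-wise max map
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w                                   # per-pixel screening

class MixedAttention(nn.Module):
    """Channel and spatial attention in series or in parallel (claim 6)."""
    def __init__(self, channel_att: nn.Module, spatial_att: nn.Module, parallel: bool = False):
        super().__init__()
        self.channel_att, self.spatial_att, self.parallel = channel_att, spatial_att, parallel

    def forward(self, x):
        if self.parallel:                              # parallel: merge the two branches
            return self.channel_att(x) + self.spatial_att(x)
        return self.spatial_att(self.channel_att(x))   # series: channel first, then spatial -> F_sa
```

Any module with a matching interface can be plugged in as channel_att; a sketch matching the channel attention network of claim 8 follows below.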
7. The method for constructing an image classification model according to claim 6, wherein the output end of the ith mixed attention module is further connected to the input end of the ith pyramid convolution unit;
the i-th mixed attention module is further used to re-input the feature map F_sa to the i-th pyramid convolution unit so that further features are extracted from F_sa; after this is repeated multiple times, the result is output to the next pyramid convolution unit or the pooling layer.
8. The method for constructing an image classification model according to claim 6, wherein the channel attention network performs a global average pooling operation on the input feature map channel by channel to extract the global spatial information of each channel; the channel weights of the global spatial information of all channels are then learned through a weight-shared one-dimensional convolution kernel, and the learned channel weights are applied to the corresponding channels of the input feature map so as to screen the feature information in the channel dimension.
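This description matches an ECA-style design: per-channel global average pooling followed by a weight-shared 1D convolution slid across the channel axis. A minimal sketch, with the kernel size as an assumption:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention per claim 8: per-channel GAP + weight-shared 1D convolution."""
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        # a single 1D kernel shared across all channel positions
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):                       # x: (N, C, H, W)
        gap = x.mean(dim=(2, 3))                # global average pooling -> (N, C)
        w = self.conv(gap.unsqueeze(1))         # 1D convolution over channels -> (N, 1, C)
        w = torch.sigmoid(w).squeeze(1)         # channel weights in (0, 1) -> (N, C)
        return x * w[:, :, None, None]          # weight each channel of the input
```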
9. An image classification method, comprising: inputting the image to be classified into the image classification model constructed by the image classification model construction method according to any one of claims 1 to 8, and obtaining a classification result.
10. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to carry out the method of constructing an image classification model according to any one of claims 1 to 8 and/or the method of image classification according to claim 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110356938.7A CN113191390B (en) | 2021-04-01 | 2021-04-01 | Image classification model construction method, image classification method and storage medium |
PCT/CN2021/086861 WO2022205502A1 (en) | 2021-04-01 | 2021-04-13 | Image classification model construction method, image classification method, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110356938.7A CN113191390B (en) | 2021-04-01 | 2021-04-01 | Image classification model construction method, image classification method and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113191390A (en) | 2021-07-30
CN113191390B (en) | 2022-06-14
Family
ID=76974445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110356938.7A Active CN113191390B (en) | 2021-04-01 | 2021-04-01 | Image classification model construction method, image classification method and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113191390B (en) |
WO (1) | WO2022205502A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116758029B (en) * | 2023-06-15 | 2024-07-26 | 广东灵顿智链信息技术有限公司 | Window cleaning machine movement control method and system based on machine vision |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190228268A1 (en) * | 2016-09-14 | 2019-07-25 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for cell image segmentation using multi-stage convolutional neural networks |
CN110188685B (en) * | 2019-05-30 | 2021-01-05 | 燕山大学 | Target counting method and system based on double-attention multi-scale cascade network |
CN110992361A (en) * | 2019-12-25 | 2020-04-10 | 创新奇智(成都)科技有限公司 | Engine fastener detection system and detection method based on cost balance |
CN111739075B (en) * | 2020-06-15 | 2024-02-06 | 大连理工大学 | Deep network lung texture recognition method combining multi-scale attention |
Application Events (2021)
- 2021-04-01: CN application CN202110356938.7A filed (published as CN113191390B; status: active)
- 2021-04-13: PCT application PCT/CN2021/086861 filed (published as WO2022205502A1; status: application filing)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200005151A1 (en) * | 2016-12-30 | 2020-01-02 | Nokia Technologies Oy | Artificial neural network |
CN110232394A (en) * | 2018-03-06 | 2019-09-13 | 华南理工大学 | A kind of multi-scale image semantic segmentation method |
CN109034210A (en) * | 2018-07-04 | 2018-12-18 | 国家新闻出版广电总局广播科学研究院 | Object detection method based on super Fusion Features Yu multi-Scale Pyramid network |
CN109598269A (en) * | 2018-11-14 | 2019-04-09 | 天津大学 | A kind of semantic segmentation method based on multiresolution input with pyramid expansion convolution |
CN111507408A (en) * | 2020-04-17 | 2020-08-07 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN112396645A (en) * | 2020-11-06 | 2021-02-23 | 华中科技大学 | Monocular image depth estimation method and system based on convolution residual learning |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
CN112418176A (en) * | 2020-12-09 | 2021-02-26 | 江西师范大学 | Remote sensing image semantic segmentation method based on pyramid pooling multilevel feature fusion network |
CN112287924A (en) * | 2020-12-24 | 2021-01-29 | 北京易真学思教育科技有限公司 | Text region detection method, text region detection device, electronic equipment and computer storage medium |
Non-Patent Citations (3)
Title |
---|
XINJIANG WANG et al.: "Scale-Equalizing Pyramid Convolution for Object Detection", Computer Vision and Pattern Recognition, 6 May 2020 (2020-05-06), pages 1-16 *
XUMING ZHANG et al.: "Spiking cortical model-based noise detector for switching-based filters", Journal of Electronic Imaging, 2 April 2012 (2012-04-02), pages 013020-1 *
吕朦: "Research on Image Classification Algorithms Based on Multi-Scale Convolutional Neural Networks" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology, 15 September 2019 (2019-09-15), pages 138-692 *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113762251A (en) * | 2021-08-17 | 2021-12-07 | 慧影医疗科技(北京)有限公司 | Target classification method and system based on attention mechanism |
CN113762251B (en) * | 2021-08-17 | 2024-05-10 | 慧影医疗科技(北京)股份有限公司 | Attention mechanism-based target classification method and system |
CN113963217A (en) * | 2021-11-16 | 2022-01-21 | 广东技术师范大学 | Anterior chamber angle image grading method integrating weak supervision metric learning |
CN113963217B (en) * | 2021-11-16 | 2024-10-01 | 广东技术师范大学 | Anterior chamber angle image grading method integrating weak supervision metric learning |
CN114821121A (en) * | 2022-05-09 | 2022-07-29 | 盐城工学院 | Image classification method based on RGB three-component grouping attention weighted fusion |
CN114841979A (en) * | 2022-05-18 | 2022-08-02 | 大连理工大学人工智能大连研究院 | Multi-scale attention-fused deep learning cancer molecular typing prediction method |
CN114841979B (en) * | 2022-05-18 | 2024-10-01 | 大连理工大学人工智能大连研究院 | Deep learning cancer molecular typing prediction method with multi-scale attention fusion |
CN115496808A (en) * | 2022-11-21 | 2022-12-20 | 中山大学中山眼科中心 | Corneal limbus positioning method and system |
CN115496808B (en) * | 2022-11-21 | 2023-03-24 | 中山大学中山眼科中心 | Corneal limbus positioning method and system |
CN117876797A (en) * | 2024-03-11 | 2024-04-12 | 中国地质大学(武汉) | Image multi-label classification method, device and storage medium |
CN117876797B (en) * | 2024-03-11 | 2024-06-04 | 中国地质大学(武汉) | Image multi-label classification method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2022205502A1 (en) | 2022-10-06 |
CN113191390B (en) | 2022-06-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113191390B (en) | Image classification model construction method, image classification method and storage medium | |
EP3921776B1 (en) | Method and system for classification and visualisation of 3d images | |
Verma et al. | Pneumonia classification using deep learning in healthcare | |
CN112308200A (en) | Neural network searching method and device | |
US11830187B2 (en) | Automatic condition diagnosis using a segmentation-guided framework | |
Narayanan et al. | Understanding deep neural network predictions for medical imaging applications | |
Olatunji et al. | Identification of erythemato-squamous skin diseases using extreme learning machine and artificial neural network | |
US11875898B2 (en) | Automatic condition diagnosis using an attention-guided framework | |
CN113706544A (en) | Medical image segmentation method based on complete attention convolution neural network | |
Yan et al. | Investigation of Customized Medical Decision Algorithms Utilizing Graph Neural Networks | |
CN114445356A (en) | Multi-resolution-based full-field pathological section image tumor rapid positioning method | |
Shamrat et al. | Analysing most efficient deep learning model to detect COVID-19 from computer tomography images | |
Dhawan et al. | Deep Learning Based Sugarcane Downy Mildew Disease Detection Using CNN-LSTM Ensemble Model for Severity Level Classification | |
CN116958535B (en) | Polyp segmentation system and method based on multi-scale residual error reasoning | |
Padmapriya et al. | Computer-Aided Diagnostic System for Brain Tumor Classification using Explainable AI | |
Zhou et al. | Balancing High-performance and Lightweight: HL-UNet for 3D Cardiac Medical Image Segmentation | |
Chen et al. | Cardiac motion scoring based on CNN with attention mechanism | |
Sineglazov et al. | Design of hybrid neural networks of the ensemble structure | |
Harshini et al. | Machine Learning Approach for Various Eye Diseases using Modified Voting Classifier Model | |
Veeranki et al. | Detection and classification of brain tumors using convolutional neural network | |
Parvathi et al. | Diabetic Retinopathy Detection Using Transfer Learning | |
Saednia et al. | An attention-guided deep neural network for annotating abnormalities in chest X-ray images: visualization of network decision basis | |
Shaik et al. | A Deep Learning Framework for Prognosis Patients with COVID-19 | |
Truong et al. | A Novel Approach of Using Neural Circuit Policies for COVID-19 Classification on CT-Images | |
Mahmud et al. | Automatic Diagnosis of Malaria from Thin Blood Smear Images using Deep Convolutional Neural Network with Multi-Resolution Feature Fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||