WO2022205502A1 - Method for constructing an image classification model, image classification method, and storage medium - Google Patents

Method for constructing an image classification model, image classification method, and storage medium

Info

Publication number
WO2022205502A1
WO2022205502A1 (PCT/CN2021/086861, CN2021086861W)
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
pyramid
convolution
convolution unit
image classification
Prior art date
Application number
PCT/CN2021/086861
Other languages
English (en)
French (fr)
Inventor
张旭明
周权
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Publication of WO2022205502A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/32Indexing scheme for image data processing or generation, in general involving image mosaicing

Definitions

  • The invention belongs to the technical field of image processing and, more particularly, relates to a method for constructing an image classification model, an image classification method, and a storage medium.
  • Image classification technology is at the core of computer vision and is widely applied in many fields, such as face recognition and intelligent video analysis in security, traffic scene recognition in transportation, image retrieval on the Internet, and medical image analysis in medicine.
  • Taking medical images as an example, in clinical diagnosis doctors can recognize images collected by imaging equipment (such as magnetic resonance imaging, ultrasound imaging, and optical tomography devices) for the purpose of disease screening.
  • However, the quality of manual recognition depends heavily on the doctor's clinical experience, and diagnostic efficiency is limited by the huge volume of medical data; misdiagnosis or missed diagnosis can easily result from doctor fatigue.
  • At present, automated computer-aided diagnosis technology is widely used in medical image recognition. It uses the powerful computing capability of computers to process and analyze images, providing clinicians with valuable reference information and greatly reducing their workload.
  • In recent years, deep learning algorithms have received extensive attention in image classification. Compared with traditional machine learning algorithms that obtain handcrafted features through shallow learning, deep learning methods combine multiple nonlinear shallow features and, on that basis, construct more abstract higher-order features. Like the deep structure of the brain, deep learning represents each input object at multiple levels of abstraction, each level corresponding to a different cortical area.
  • The advantage of deep learning is that its multi-level features are learned from raw data through a general learning procedure rather than being designed by manual screening.
  • Commonly used deep learning models include deep Boltzmann machines, deep belief networks, stacked autoencoders, recurrent neural networks, and convolutional neural networks.
  • Among them, convolutional neural networks are widely used in image processing and have achieved good results in many medical image recognition tasks.
  • However, most current network models use only a single convolution kernel when extracting image features, which makes it difficult to fully capture feature details of different sizes in images whose target regions vary considerably.
  • These networks also fail to make full use of feature information at different scales and do not resolve the information redundancy generated during feature fusion, so useful information cannot be highlighted and useless information cannot be suppressed, leading to low classification accuracy.
  • In view of the above defects or improvement needs of the prior art, the present invention provides a method for constructing an image classification model, an image classification method, and a storage medium, so as to solve the technical problem of low classification accuracy in the prior art caused by the failure to make full use of feature information at different scales.
  • In a first aspect, the present invention provides a method for constructing an image classification model, comprising the following steps:
  • S1. Build the image classification model. The model includes a convolutional layer, a first pyramid convolution unit, a second pyramid convolution unit, ..., an nth pyramid convolution unit, a pooling layer, and a fully connected layer, cascaded in sequence.
  • The first convolutional layer is used to extract an initial feature map from the input image and output it to the first pyramid convolution unit.
  • The i-th pyramid convolution unit uses n−i+1 convolution kernels of different scales to further extract features from the feature map currently input to it; the feature map extracted by the kernel of each scale is then fused, in order, with the fused feature map extracted by the preceding kernel, yielding the fused feature map of each scale, i.e., a set of feature maps containing information at different scales. These are fused with the feature map currently input to the i-th pyramid convolution unit to obtain an output feature map containing multi-scale information, where i = 1, 2, ..., n and each kernel's scale is larger than that of the preceding kernel.
  • Preferably, the input image is obtained by scaling the original sample images in the training set, which improves computational efficiency.
  • In the i-th pyramid convolution unit, feature map A is fused with feature map B (or fused feature map B) as follows: a convolution operation is applied to A and the result is combined with B; the combination may be a pixel-wise addition, a concatenation, or a concatenation followed by convolution.
  • The fused feature maps are concatenated and then fused with the feature map currently input to the i-th pyramid convolution unit as follows: the fused feature maps are concatenated channel-wise; a convolution changes the number of feature channels of the concatenated map so that it matches the channel count of the feature map currently input to the i-th pyramid convolution unit; and the result is summed pixel by pixel with that input feature map, giving the output feature map containing multi-scale information.
  • Preferably, the output end of the i-th pyramid convolution unit is also connected to its own input end.
  • Before outputting the resulting multi-scale output feature map to the next pyramid convolution unit or to the pooling layer, the i-th pyramid convolution unit re-inputs that feature map to itself to extract features further; after several repetitions, the result is output to the next pyramid convolution unit or the pooling layer, improving the robustness of the image classification model.
  • Preferably, the image classification model further includes n hybrid attention modules; for i = 1, 2, ..., n−1, the i-th hybrid attention module lies between the i-th and (i+1)-th pyramid convolution units, and for i = n it lies between the n-th pyramid convolution unit and the pooling layer. Each hybrid attention module includes a spatial attention network and a channel attention network, combined in cascade or in parallel, which screen the multi-scale output feature map received from the pyramid convolution unit along the spatial and channel dimensions to obtain a feature map F_sa, thereby suppressing redundant background information and highlighting feature information useful for the classification result.
  • The output end of the i-th hybrid attention module is also connected to the input end of the i-th pyramid convolution unit.
  • The i-th hybrid attention module also re-inputs the resulting feature map F_sa to the i-th pyramid convolution unit to extract features further; after several repetitions, the result is output to the next pyramid convolution unit or the pooling layer, improving the robustness of the image classification model.
  • The channel attention network performs a channel-wise global average pooling operation on the input feature map to extract the global spatial information of each channel.
  • The channel weights of the global spatial information are then learned by a weight-shared one-dimensional convolution kernel, and each learned weight is applied to the corresponding channel of the input feature map, screening the feature information along the channel dimension.
  • The kernel size k_1D of the one-dimensional convolution in the channel attention network and the number of feature channels C_1D of the input feature map satisfy k_1D = |log2(C_1D)/γ + b/γ|_odd, where γ and b are learning parameters and |e|_odd denotes the odd number nearest to e.
  • In a second aspect, the present invention provides an image classification method, comprising: inputting an image to be classified into an image classification model constructed by the construction method provided in the first aspect of the present invention, and obtaining a classification result.
  • In a third aspect, the present invention further provides a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement any of the image classification model construction methods described above and/or the image classification method described above.
  • 1. The present invention provides a method for constructing an image classification model. The constructed model comprises a convolutional layer, a first pyramid convolution unit, a second pyramid convolution unit, ..., an nth pyramid convolution unit, a pooling layer, and a fully connected layer, cascaded in sequence. In each pyramid convolution unit, the feature map extracted by the kernel of each scale is fused, via convolutional spanning connections, with the fused feature map extracted by the preceding kernel, yielding the fused feature map of each scale; this further mines the correlation between feature maps and produces an output feature map containing multi-scale information, making full use of the different scales of information across feature maps. By using this multi-scale scheme to extract image features of different granularities, the invention achieves high image classification accuracy.
  • 2. The constructed image classification model further includes hybrid attention modules. Based on a spatial attention network and a channel attention network, the multi-scale output feature map received from the pyramid convolution unit is screened along the spatial and channel dimensions, achieving adaptive calibration of channel features and spatial information and suppressing the redundant information introduced when feature maps of different scales are integrated; by effectively suppressing useless background information and highlighting key feature information, classification accuracy is further improved.
  • 3. In the constructed model, the output end of the i-th hybrid attention module is also connected to the input end of the i-th pyramid convolution unit; a pyramid convolution unit together with the hybrid attention module connected to its output is called a hybrid attention pyramid module. Cascading hybrid attention pyramid modules composed of convolution kernels of different numbers and depths for image classification improves not only the accuracy but also, greatly, the robustness of the model.
  • 4. In the constructed model, images can be scaled before being input to improve computational efficiency.
  • FIG. 1 is a schematic structural diagram of the image classification model provided in Embodiment 1 of the present invention;
  • FIG. 2 is a schematic structural diagram of the image classification model of Embodiment 1 after a hybrid attention module is included;
  • FIG. 3 is a schematic structural diagram of the image classification model of Embodiment 1 in which hybrid attention modules are included and the output end of each hybrid attention module is also connected to the input end of the corresponding pyramid convolution unit;
  • FIG. 4 is a schematic diagram of the 3×3 convolutional spanning connection in the pyramid convolution unit of Embodiment 1;
  • FIG. 5 shows the accuracy curves of HapcNet and each comparison deep learning model of Embodiment 1 on the anterior chamber angle validation set;
  • FIG. 6 shows the confusion matrices of HapcNet and each comparison deep learning model of Embodiment 1 on the anterior chamber angle test set, where (a)-(g) are the confusion matrices of VGG-16, ResNet-50, DenseNet-121, MobileNet, EfficientNet-B7, PyConvNet-50, and the HapcNet provided by the present invention, respectively.
  • Embodiment 1. A method for constructing an image classification model, comprising the following steps:
  • S1. Build the image classification model. As shown in FIG. 1, the model includes a convolutional layer, a first pyramid convolution unit, a second pyramid convolution unit, ..., an nth pyramid convolution unit, a pooling layer, and a fully connected layer, cascaded in sequence. The first convolutional layer extracts an initial feature map from the input image and outputs it to the first pyramid convolution unit. The i-th pyramid convolution unit uses n−i+1 convolution kernels of different scales to further extract features from the feature map currently input to it; the feature map extracted by the kernel of each scale is then fused, in order, with the fused feature map extracted by the preceding kernel, yielding the fused feature map of each scale (written here as F̃_i^block for the block-th scale, block = 1, ..., n−i+1), i.e., a set of feature maps containing information at different scales; these are concatenated and fused with the feature map currently input to the i-th pyramid convolution unit, giving an output feature map containing multi-scale information, where i = 1, 2, ..., n and each kernel's scale exceeds that of the preceding kernel.
  • Specifically, when i = 1, 2, ..., n−1, feature map F_i^2 is fused with feature map F_i^1 to obtain the fused feature map F̃_i^2 of the second scale; from block = 3 onwards, feature map F_i^block is fused with F̃_i^(block−1) to obtain F̃_i^block. The fusion applies a convolution operation to F_i^block and combines the result with F̃_i^(block−1), so as to fully mine the information between different feature maps and make the information more complete; the combination may be a pixel-wise addition, a concatenation, or a concatenation followed by convolution. Fusing F_i^2 with F_i^1 follows the same procedure and is not repeated here.
  • The fused feature maps F̃_i^2, ..., F̃_i^(n−i+1) are then concatenated channel-wise; a convolution changes the number of feature channels of the concatenated map to match the channel count of the feature map currently input to the i-th pyramid convolution unit; and the result is summed pixel by pixel with that input feature map, giving the output feature map containing multi-scale information, which is output to the (i+1)-th pyramid convolution unit. When i = n, a convolution operation is applied to the feature map currently input to the n-th unit and the result is fused with that same input; the output feature map is passed to the pooling layer and then through the fully connected layer to obtain the classification result.
  • S2. Input the training set collected according to the preset classification task into the image classification model for training, obtaining a trained image classification model.
  • In this embodiment, the input image may be an image obtained by scaling the original sample images in the training set, improving computational efficiency and speeding up training.
  • In this embodiment, the cross-entropy loss is used as the total loss function: Loss = −(1/Num) Σ_{p=1}^{Num} Σ_{q=1}^{η} y_{p,q} log(x_{p,q}), where η denotes the number of output classes, Num is the batch size of images in the training set, x_{p,q} is the predicted probability produced by the softmax classification function that the p-th sample belongs to the q-th class, and y_{p,q} is the corresponding label indicating whether the p-th sample is classified into the q-th class.
  • Preferably, to improve robustness, the output end of the i-th pyramid convolution unit is also connected to its own input end; before outputting the multi-scale output feature map to the next pyramid convolution unit or the pooling layer, the unit re-inputs that feature map to itself to extract features further, and after several repetitions outputs the result onward.
  • Preferably, to solve the information redundancy arising during feature fusion, the model further includes n hybrid attention modules; for i = 1, 2, ..., n−1 the i-th hybrid attention module lies between the i-th and (i+1)-th pyramid convolution units, and for i = n it lies between the n-th pyramid convolution unit and the pooling layer.
  • Each hybrid attention module includes a spatial attention network and a channel attention network, cascaded or in parallel, which screen the multi-scale output feature map received from the pyramid convolution unit along the spatial and channel dimensions to obtain a feature map F_sa, suppressing redundant background information and highlighting feature information useful for the classification result.
  • Taking a hybrid attention module formed by cascading the spatial and channel attention networks as an example, the multi-scale output feature map u produced by the i-th pyramid convolution unit is input into the hybrid attention module, where the following operations are performed.
  • In the channel attention network, a channel-wise global average pooling operation is first applied to u to extract the global spatial information of each channel: z_c = (1/(H_ca × W_ca)) Σ_{r=1}^{H_ca} Σ_{t=1}^{W_ca} u_c(r, t), where H_ca and W_ca are the height and width of u_c, u_c is the feature map of the c-th channel of u, and z is the one-dimensional vector collecting the global spatial information of all channels.
  • A weight-shared one-dimensional convolution kernel then learns the channel weights of the global spatial information, giving w = δ(1D_Conv(z)), where δ(·) is the Sigmoid function and 1D_Conv denotes a one-dimensional convolution on z with a kernel of size k_1D. To select the kernel size adaptively, k_1D and the number of feature channels C_1D of the input feature map satisfy k_1D = |log2(C_1D)/γ + b/γ|_odd, where γ and b are learning parameters, set to 2 and 1 respectively in this embodiment, and |e|_odd denotes the odd number nearest to e. Finally, each learned channel weight is applied to the corresponding channel of u, giving the channel attention weight feature map F_ca = u · w, which is input into the spatial attention network.
  • In the spatial attention network, average pooling and max pooling are applied to F_ca along its channel axis to quickly capture context information, generating two 2-D maps F_avg and F_max of size H_sa × W_sa (the height and width of the feature maps generated by the spatial attention network).
  • F_avg and F_max are concatenated channel-wise into a two-channel feature map, which is convolved with a kernel of preset size to generate the spatial attention weight feature map M(F_ca).
  • The feature map F_sa is then obtained by multiplying the channel attention weight feature map F_ca and the spatial attention weight feature map M(F_ca) element-wise (i.e., dot multiplication).
  • The preset kernel size is determined from the size of F_ca, and the convolution keeps the two-channel feature map the same size as F_ca so that the subsequent element-wise multiplication is possible.
  • In this embodiment the channel attention network and the spatial attention network are combined in cascade to form the hybrid attention module; the two attention networks may also be reasonably combined in parallel or in other ways.
  • When combined in parallel, the channel attention network and the spatial attention network each process the multi-scale output feature map u from the i-th pyramid convolution unit as described above, yielding the channel attention weight feature map and the spatial attention weight feature map respectively; the two weight feature maps are then concatenated channel-wise and a convolution operation produces the feature map F_sa. The order of channel-wise concatenation and convolution is not restricted, provided the output dimensions match those of u.
  • Preferably, the output end of the i-th hybrid attention module is also connected to the input end of the i-th pyramid convolution unit; the module re-inputs the resulting feature map F_sa to the i-th pyramid convolution unit to extract features further, and after several repetitions outputs the result to the next pyramid convolution unit or the pooling layer, improving the robustness of the image classification model.
  • A pyramid convolution unit together with the hybrid attention module connected to its output may be called a hybrid attention pyramid module. To improve robustness, hybrid attention pyramid modules composed of convolution kernels of different numbers and depths are cascaded to form the classification model of the present invention; the input image is processed by the different hybrid attention pyramid modules, each repeated several times, producing the final classification prediction. The number of feature-extraction repetitions in each hybrid attention pyramid module and the size and number of convolution kernels in each pyramid convolution unit can be adjusted to the actual task.
  • Taking glaucoma, a common ophthalmic disease, as an example: optical coherence tomography (OCT), being non-invasive, comfortable, high-resolution, and non-contact, is often used to help clinicians identify a patient's anterior chamber angle (ACA) type, i.e., open, narrow, or closed; however, the image region occupied by the ACA fluctuates across individuals. When the ACA is small, a single convolution kernel can hardly capture the feature information of fine details accurately, and ignoring the information redundancy in feature fusion prevents useful information from being highlighted and useless information from being suppressed, ultimately impairing accurate prediction of the ACA type.
  • The present invention provides an image classification model comprising a plurality of pyramid convolution units, which uses a multi-scale scheme to extract image features of different granularities.
  • In this module, the image is input into a pyramid convolution module composed of convolution kernel filters of different sizes and depths, and information at different scales is extracted from the input image.
  • Then, through convolutional spanning connections, the feature map extracted by the kernel of each scale is fused in turn with the fused feature map extracted by the preceding kernel, yielding the fused feature map of each scale and hence output feature maps containing information at different scales; this completes the feature extraction for all kernel sizes.
  • Next, a feature-map combination operation concatenates the output feature maps containing the different scales of information, and a 1×1 convolution changes the number of feature channels after concatenation.
  • Finally, the combined feature map is summed pixel by pixel with the feature map input to the pyramid convolution module.
  • To validate the accuracy of the constructed classification model, the angle-closure glaucoma dataset provided at the 2019 MICCAI (Medical Image Computing and Computer Assisted Intervention) international conference was used as the training data: 1341 images were randomly selected and cropped into 2682 anterior chamber angle images. The dataset provides two gold-standard labels, open-angle and angle-closure; on this basis, the angle-closure cases were further divided into narrow-angle and closed-angle.
  • To avoid convergence difficulties caused by imbalanced data distribution, the original data were translated and rotated through data augmentation, giving 1536 open-angle, 1214 narrow-angle, and 1458 closed-angle images; the final training, validation, and test sets contain 3367, 419, and 422 images, respectively.
  • To further demonstrate the advantages of the invention, the glaucoma ACA dataset above was used to compare the classification performance of the model constructed by the present invention with current mainstream deep learning classifiers. The evaluation metrics are the accuracy ACC, the average sensitivity Sen_avg, the average specificity Spe_avg, and the average balanced accuracy BACC_avg, defined one class versus the rest: ACC = (Σ_{s=1}^{3} TP_s) / N_test, Sen_avg = (1/3) Σ_{s=1}^{3} TP_s/(TP_s + FN_s), Spe_avg = (1/3) Σ_{s=1}^{3} TN_s/(TN_s + FP_s), and BACC_avg = (Sen_avg + Spe_avg)/2, where N_test is the number of images in the test set and TP_s, TN_s, FP_s, FN_s (s ∈ {1,2,3}) are the numbers of true positives, true negatives, false positives, and false negatives when class s is treated as positive and the remaining classes as negative.
  • In this embodiment, the number n of pyramid convolution units is 4. The first pyramid convolution unit has n kernels, with scales 3×3, 5×5, ..., (2n+1)×(2n+1); the second has n−1 kernels, with scales 3×3, 5×5, ..., (2n−1)×(2n−1); the (n−1)-th has 2 kernels, with scales 3×3 and 5×5; and the n-th has a single 3×3 kernel.
  • In each pyramid convolution unit, the feature map extracted by the kernel of each scale is fused, via convolutional spanning connections, with the fused feature map extracted by the preceding kernel, yielding the fused feature map of each scale and hence output feature maps containing information at different scales, and completing feature extraction for all kernels in turn. That is, the feature map F_i^2 extracted by the second-scale kernel is fused with the feature map F_i^1 extracted by the first-scale kernel to give the fused feature map F̃_i^2; the feature map F_i^3 extracted by the third-scale kernel is fused with F̃_i^2 to give F̃_i^3; the feature map F_i^4 extracted by the fourth-scale kernel is fused with F̃_i^3 to give F̃_i^4; and so on.
  • Note that the scale of the preceding kernel is smaller than that of the current kernel.
  • Taking a pyramid convolution unit with three kernels as an example, the operation uses 3×3 convolutional spanning connections, and the fused feature map extracted by the block-th-scale kernel is F̃_i^block = K_3×3 * F_i^block + F̃_i^(block−1), where K_3×3 is a convolution kernel of size 3×3 and * denotes convolution.
  • After the fused feature maps of all scales have been extracted, the feature-map combination operation concatenates F̃_i^2, ..., F̃_i^(n−i+1) channel-wise, and a 1×1 convolution changes the number of channels so that the result can be summed pixel by pixel with the feature map currently input to the i-th pyramid convolution unit, yielding the output feature map containing multi-scale information.
  • In this embodiment, the first hybrid attention pyramid module (the first pyramid convolution unit plus the first hybrid attention module) repeats feature extraction 3 times; the second repeats 4 times; the third repeats 6 times; and the fourth repeats 3 times.
  • Table 1 compares the classification performance of the model provided by the present invention (denoted HapcNet) with different mainstream networks (VGG-16, ResNet-50, DenseNet-121, MobileNet, EfficientNet-B7, and PyConvNet-50) on the anterior chamber angle test set.
  • Here EfficientNet-B7 is the B7 variant of EfficientNet, and the numbers in the other network names indicate depth; e.g., VGG-16 is a 16-layer VGG network. Table 1 shows that the algorithms with the most prominent classification performance are EfficientNet, PyConvNet, and the HapcNet provided by the present invention, which outperform the other four deep learning methods on most metrics.
  • Compared with EfficientNet and PyConvNet, HapcNet improves the ACC value by about 1.47% and 1.66%, respectively. On average specificity the networks differ little, but VGG performs worst at 0.9933, whereas HapcNet reaches 0.9998, the best classification performance among the compared networks.
  • Further, to show the superiority of the invention more intuitively, experiments were conducted with the HapcNet provided by the present invention and each comparison deep learning model.
  • FIG. 5 shows the accuracy curves on the anterior chamber angle validation set, with the abscissa Epochs (number of iterations) and the ordinate Accuracy.
  • FIG. 6 shows the confusion matrices on the anterior chamber angle test set, where "0", "1", and "2" denote open, narrow, and closed angles, respectively; (a)-(g) in FIG. 6 are the confusion matrices of VGG-16, ResNet-50, DenseNet-121, MobileNet, EfficientNet-B7, PyConvNet-50, and the HapcNet provided by the present invention, respectively.
  • FIG. 5 shows that HapcNet achieves better convergence accuracy than the comparison deep learning models while offering a highly competitive convergence speed. The confusion matrices in FIG. 6 show that HapcNet, EfficientNet-B7, and PyConvNet-50 achieve better classification performance on the anterior chamber angle test set than the remaining mainstream networks.
  • Specifically, for the open angle, HapcNet provides the second-best accuracy of 98.7% and EfficientNet-B7 the best, 99.4%; for the narrow angle, HapcNet provides the best accuracy of 100% while EfficientNet-B7 is only second best; and for the closed angle, HapcNet again achieves the best classification accuracy.
  • In summary, compared with other deep learning models, the HapcNet provided by the present invention has advantages in classifying the anterior chamber angle dataset.
  • Embodiment 2. An image classification method, comprising: inputting an image to be classified into an image classification model constructed by the construction method provided in Embodiment 1, and obtaining a classification result.
  • Preferably, the image to be classified is first scaled to improve computational efficiency.
  • Embodiment 3. A machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the image classification model construction method provided in Embodiment 1 and/or the image classification method provided in Embodiment 2.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A method for constructing an image classification model, an image classification method, and a storage medium. The constructed image classification model comprises a convolutional layer, a first pyramid convolution unit, a second pyramid convolution unit, ..., an nth pyramid convolution unit, a pooling layer, and a fully connected layer, cascaded in sequence. The i-th pyramid convolution unit uses n−i+1 convolution kernels of different scales to further extract features from the currently input feature map; the feature map extracted by the kernel of each scale is then fused, in order, with the fused feature map extracted by the preceding kernel, yielding the fused feature map of each scale, i.e., a set of feature maps containing information at different scales; these are fused with the currently input feature map to obtain an output feature map containing multi-scale information, i = 1, 2, ..., n. The method makes full use of information at different scales and achieves high image classification accuracy.

Description

Method for constructing an image classification model, image classification method, and storage medium
[Technical Field]
The present invention belongs to the technical field of image processing and, more particularly, relates to a method for constructing an image classification model, an image classification method, and a storage medium.
[Background Art]
Image classification technology is at the core of computer vision and is widely applied in many fields, such as face recognition and intelligent video analysis in security, traffic scene recognition in transportation, image retrieval on the Internet, and medical image analysis in medicine. Taking medical images as an example, in clinical diagnosis doctors can recognize images collected by imaging equipment (such as magnetic resonance imaging, ultrasound imaging, and optical tomography devices) for the purpose of disease screening. However, the quality of manual recognition depends heavily on the doctor's clinical experience, and diagnostic efficiency is limited by the huge volume of medical data; misdiagnosis or missed diagnosis can easily result from doctor fatigue. At present, automated computer-aided diagnosis technology is widely used in medical image recognition: it uses the powerful computing capability of computers to process and analyze images, provides clinicians with valuable reference information, and greatly reduces their workload.
In recent years, deep learning algorithms have received extensive attention in image classification. Compared with traditional machine learning algorithms that obtain handcrafted features through shallow learning, deep learning methods combine multiple nonlinear shallow features and, on that basis, construct more abstract higher-order features. Like the deep structure of the brain, deep learning represents each input object at multiple levels of abstraction, each level corresponding to a different cortical area. The advantage of deep learning is that its multi-level features are learned from raw data through a general learning procedure rather than being designed by manual screening. Commonly used deep learning models include deep Boltzmann machines, deep belief networks, stacked autoencoders, recurrent neural networks, and convolutional neural networks; among them, convolutional neural networks are widely used in image processing and have achieved good results in many medical image recognition tasks. However, most current network models use only a single convolution kernel when extracting image features, which makes it difficult to fully capture feature details of different sizes in images whose target regions vary considerably; these networks also fail to make full use of feature information at different scales and do not resolve the information redundancy generated during feature fusion, so useful information cannot be highlighted and useless information cannot be suppressed, leading to low classification accuracy.
[Summary of the Invention]
In view of the above defects or improvement needs of the prior art, the present invention provides a method for constructing an image classification model, an image classification method, and a storage medium, so as to solve the technical problem of low classification accuracy in the prior art caused by the failure to make full use of feature information at different scales.
To achieve the above object, in a first aspect the present invention provides a method for constructing an image classification model, comprising the following steps:
S1. Build the image classification model. The model includes a convolutional layer, a first pyramid convolution unit, a second pyramid convolution unit, ..., an nth pyramid convolution unit, a pooling layer, and a fully connected layer, cascaded in sequence. The first convolutional layer extracts an initial feature map from the input image and outputs it to the first pyramid convolution unit. The i-th pyramid convolution unit uses n−i+1 convolution kernels of different scales to further extract features from the feature map currently input to it; the feature map extracted by the kernel of each scale is then fused, in order, with the fused feature map extracted by the preceding kernel, yielding the fused feature map of each scale, i.e., a set of feature maps containing information at different scales; these are fused with the feature map currently input to the i-th pyramid convolution unit, giving an output feature map containing multi-scale information, where i = 1, 2, ..., n and each kernel's scale is larger than that of the preceding kernel.
S2. Input the training set collected according to the preset classification task into the image classification model for training, obtaining a trained image classification model.
Further preferably, the input image is obtained by scaling the original sample images in the training set, which improves computational efficiency.
Further preferably, denote by F_i^block the feature map extracted in the i-th pyramid convolution unit by the kernel of the block-th scale, block = 1, ..., n−i+1, and by F̃_i^block the corresponding fused feature map. For the i-th pyramid convolution unit, when i = 1, 2, ..., n−1, feature map F_i^2 is fused with feature map F_i^1 to obtain the fused feature map F̃_i^2 extracted by the second-scale kernel; from block = 3 onwards, feature map F_i^block is fused with the fused feature map F̃_i^(block−1) extracted by the (block−1)-th-scale kernel, giving the fused feature map F̃_i^block extracted by the block-th-scale kernel. After the fused feature maps of all scales have been extracted, the fused feature maps F̃_i^2, ..., F̃_i^(n−i+1) are concatenated and then fused with the feature map currently input to the i-th pyramid convolution unit, giving the output feature map containing multi-scale information, which is output to the (i+1)-th pyramid convolution unit. When i = n, a convolution operation is applied to the feature map currently input to the n-th pyramid convolution unit and the result is fused with that same input, giving the output feature map containing multi-scale information, which is output to the pooling layer; after pooling, the fully connected layer produces the classification result.
Further preferably, in the i-th pyramid convolution unit, feature map A is fused with feature map B (or fused feature map B) as follows: a convolution operation is applied to A and the result is combined with B; the combination may be a pixel-wise addition, a concatenation, or a concatenation followed by convolution.
The fused feature maps F̃_i^2, ..., F̃_i^(n−i+1) are concatenated and then fused with the feature map currently input to the i-th pyramid convolution unit as follows: the fused feature maps are concatenated channel-wise; a convolution changes the number of feature channels of the concatenated map so that it matches the channel count of the feature map currently input to the i-th pyramid convolution unit; and the result is summed pixel by pixel with that input feature map, giving the output feature map containing multi-scale information.
Further preferably, the output end of the i-th pyramid convolution unit is also connected to its own input end.
Before outputting the resulting multi-scale output feature map to the next pyramid convolution unit or to the pooling layer, the i-th pyramid convolution unit re-inputs that feature map to itself to extract features further; after several repetitions, the result is output to the next pyramid convolution unit or the pooling layer, improving the robustness of the image classification model.
Further preferably, the image classification model further includes n hybrid attention modules. For i = 1, 2, ..., n−1, the i-th hybrid attention module lies between the i-th and (i+1)-th pyramid convolution units; for i = n, it lies between the n-th pyramid convolution unit and the pooling layer.
Each hybrid attention module includes a spatial attention network and a channel attention network, cascaded or in parallel, which screen the multi-scale output feature map received from the pyramid convolution unit along the spatial and channel dimensions to obtain a feature map F_sa, suppressing redundant background information and highlighting feature information useful for the classification result.
Further preferably, the output end of the i-th hybrid attention module is also connected to the input end of the i-th pyramid convolution unit.
The i-th hybrid attention module also re-inputs the resulting feature map F_sa to the i-th pyramid convolution unit to extract features further; after several repetitions, the result is output to the next pyramid convolution unit or the pooling layer, improving the robustness of the image classification model.
Further preferably, the channel attention network performs a channel-wise global average pooling operation on the input feature map to extract the global spatial information of each channel; a weight-shared one-dimensional convolution kernel then learns the channel weight of the global spatial information of each channel, and each learned weight is applied to the corresponding channel of the input feature map, screening the feature information along the channel dimension.
The kernel size k_1D of the one-dimensional convolution in the channel attention network and the number of feature channels C_1D of the input feature map satisfy k_1D = |log2(C_1D)/γ + b/γ|_odd, where γ and b are learning parameters and |e|_odd denotes the odd number nearest to e.
In a second aspect, the present invention provides an image classification method, comprising: inputting an image to be classified into an image classification model constructed by the construction method provided in the first aspect of the present invention, and obtaining a classification result.
In a third aspect, the present invention further provides a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement any of the image classification model construction methods described above and/or the image classification method described above.
In general, the above technical solutions conceived by the present invention can achieve the following beneficial effects:
1. The present invention provides a method for constructing an image classification model. The constructed model comprises a convolutional layer, a first pyramid convolution unit, a second pyramid convolution unit, ..., an nth pyramid convolution unit, a pooling layer, and a fully connected layer, cascaded in sequence. In each pyramid convolution unit, the feature map extracted by the kernel of each scale is fused, via convolutional spanning connections, with the fused feature map extracted by the preceding kernel, yielding the fused feature map of each scale; this further mines the correlation between feature maps and produces an output feature map containing multi-scale information, making full use of the different scales of information across feature maps. By using this multi-scale scheme to extract image features of different granularities, the invention achieves high image classification accuracy.
2. The image classification model constructed by the provided method further includes hybrid attention modules. Based on a spatial attention network and a channel attention network, the multi-scale output feature map received from the pyramid convolution unit is screened along the spatial and channel dimensions, achieving adaptive calibration of channel features and spatial information and suppressing the redundant information introduced when feature maps of different scales are integrated; by effectively suppressing useless background information and highlighting key feature information, classification accuracy is further improved.
3. In the constructed model, the output end of the i-th hybrid attention module is also connected to the input end of the i-th pyramid convolution unit; a pyramid convolution unit together with the hybrid attention module connected to its output is called a hybrid attention pyramid module. Cascading hybrid attention pyramid modules composed of convolution kernels of different numbers and depths for image classification improves not only the accuracy but also, greatly, the robustness of the model.
4. In the constructed model, images can be scaled before being input to improve computational efficiency.
[Brief Description of the Drawings]
FIG. 1 is a schematic structural diagram of the image classification model provided in Embodiment 1 of the present invention;
FIG. 2 is a schematic structural diagram of the image classification model of Embodiment 1 after a hybrid attention module is included;
FIG. 3 is a schematic structural diagram of the image classification model of Embodiment 1 in which hybrid attention modules are included and the output end of each hybrid attention module is also connected to the input end of the corresponding pyramid convolution unit;
FIG. 4 is a schematic diagram of the 3×3 convolutional spanning connection in the pyramid convolution unit of Embodiment 1;
FIG. 5 shows the accuracy curves of HapcNet and each comparison deep learning model of Embodiment 1 on the anterior chamber angle validation set;
FIG. 6 shows the confusion matrices of HapcNet and each comparison deep learning model of Embodiment 1 on the anterior chamber angle test set, where (a)-(g) are the confusion matrices of VGG-16, ResNet-50, DenseNet-121, MobileNet, EfficientNet-B7, PyConvNet-50, and the HapcNet provided by the present invention, respectively.
[Detailed Description]
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments described below can be combined with each other as long as they do not conflict.
Embodiment 1
A method for constructing an image classification model, comprising the following steps:
S1. Build the image classification model. As shown in FIG. 1, the model includes a convolutional layer, a first pyramid convolution unit, a second pyramid convolution unit, ..., an nth pyramid convolution unit, a pooling layer, and a fully connected layer, cascaded in sequence. The first convolutional layer extracts an initial feature map from the input image and outputs it to the first pyramid convolution unit. The i-th pyramid convolution unit uses n−i+1 convolution kernels of different scales to further extract features from the feature map currently input to it; the feature map extracted by the kernel of each scale is then fused, in order, with the fused feature map extracted by the preceding kernel, yielding the fused feature map of each scale, i.e., a set of feature maps containing information at different scales; these are concatenated and fused with the feature map currently input to the i-th pyramid convolution unit, giving an output feature map containing multi-scale information, where i = 1, 2, ..., n and each kernel's scale is larger than that of the preceding kernel. Specifically, denote by F_i^block the feature map extracted in the i-th pyramid convolution unit by the kernel of the block-th scale, block = 1, ..., n−i+1, and by F̃_i^block the corresponding fused feature map. When i = 1, 2, ..., n−1, feature map F_i^2 is fused with feature map F_i^1 to obtain the fused feature map F̃_i^2 extracted by the second-scale kernel; feature map F_i^3 is fused with F̃_i^2 to obtain the fused feature map F̃_i^3 extracted by the third-scale kernel; and so on. After the fused feature maps of all scales have been extracted, the fused feature maps F̃_i^2, ..., F̃_i^(n−i+1) are concatenated and fused with the feature map currently input to the i-th pyramid convolution unit, giving the output feature map containing multi-scale information, which is output to the (i+1)-th pyramid convolution unit. When i = n, a convolution operation is applied to the feature map currently input to the n-th pyramid convolution unit and the result is fused with that same input, giving the output feature map containing multi-scale information, which is output to the pooling layer; after pooling, the fully connected layer produces the classification result. Preferably, when i = 1, 2, ..., n−1, feature map F_i^block is fused with the fused feature map F̃_i^(block−1) in the i-th pyramid convolution unit by applying a convolution operation to F_i^block and combining the result with F̃_i^(block−1), so as to fully mine the information between different feature maps and make the information more complete; the combination may be a pixel-wise addition, a concatenation, or a concatenation followed by convolution. Fusing feature map F_i^2 with feature map F_i^1 follows the same procedure as fusing F_i^block with F̃_i^(block−1) and is not repeated here. Further, the fused feature maps F̃_i^2, ..., F̃_i^(n−i+1) are concatenated and then fused with the feature map currently input to the i-th pyramid convolution unit as follows: the fused feature maps are concatenated channel-wise; a convolution changes the number of feature channels of the concatenated map to match the channel count of the feature map currently input to the i-th pyramid convolution unit; and the result is summed pixel by pixel with that input feature map, giving the output feature map containing multi-scale information.
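For illustration only, the following is a minimal PyTorch sketch of one pyramid convolution unit as described above. It assumes pixel-wise addition as the combination operation, a 3×3 convolution as the spanning connection (F̃_i^block = K_3×3 * F_i^block + F̃_i^(block−1)), and concatenation of all per-scale maps including the first; class names, channel widths, and padding choices are illustrative, not fixed by the patent.

```python
import torch
import torch.nn as nn

class PyramidConvUnit(nn.Module):
    """Sketch of the i-th pyramid convolution unit: num_scales kernels of
    scale 3x3, 5x5, ..., sequential cross-scale fusion via a 3x3 spanning
    convolution, channel-wise concatenation, 1x1 channel alignment, and a
    pixel-wise residual sum with the unit's input."""

    def __init__(self, channels: int, num_scales: int):
        super().__init__()
        # Kernels of scale 3x3, 5x5, ...; padding keeps the spatial size fixed.
        self.scale_convs = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=2 * s + 3, padding=s + 1)
            for s in range(num_scales)
        ])
        # Spanning connections: one 3x3 convolution per fusion step.
        self.span_convs = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_scales - 1)
        ])
        # 1x1 convolution restores the channel count after concatenation.
        self.fuse = nn.Conv2d(channels * num_scales, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [conv(x) for conv in self.scale_convs]  # F_i^1 ... F_i^{n-i+1}
        fused = [feats[0]]                              # F~_i^1 = F_i^1
        for b in range(1, len(feats)):
            # F~_i^b = K_3x3 * F_i^b + F~_i^{b-1} (addition assumed)
            fused.append(self.span_convs[b - 1](feats[b]) + fused[b - 1])
        out = self.fuse(torch.cat(fused, dim=1))        # concat + 1x1 conv
        return out + x                                  # pixel-wise sum with input
```

A unit matching the embodiment's first stage could then be created as `PyramidConvUnit(channels=64, num_scales=4)` and applied to a `(B, 64, H, W)` tensor.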
S2. Input the training set collected according to the preset classification task into the image classification model for training, obtaining a trained image classification model. Preferably, the input image may be an image obtained by scaling the original sample images in the training set, improving computational efficiency and speeding up training. In this embodiment the cross-entropy loss is used as the total loss function: Loss = −(1/Num) Σ_{p=1}^{Num} Σ_{q=1}^{η} y_{p,q} log(x_{p,q}), where η denotes the number of output classes, Num is the batch size of images in the training set, x_{p,q} is the predicted probability produced by the softmax classification function that the p-th sample belongs to the q-th class, and y_{p,q} is the corresponding label indicating whether the p-th sample is classified into the q-th class.
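As a concreteness check of the loss above, here is a small sketch assuming the loss is averaged over the Num images in the batch; in practice `torch.nn.CrossEntropyLoss` applied to raw logits computes the same quantity.

```python
import torch
import torch.nn.functional as F

def total_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """logits: (Num, eta) raw scores; labels: (Num,) class indices.
    Loss = -(1/Num) * sum_p sum_q y_{p,q} * log(x_{p,q})."""
    x = torch.softmax(logits, dim=1)                            # x_{p,q}
    y = F.one_hot(labels, num_classes=logits.shape[1]).float()  # y_{p,q}
    return -(y * torch.log(x + 1e-12)).sum(dim=1).mean()
```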
Preferably, to improve the robustness of the image classification model, the output end of the i-th pyramid convolution unit is also connected to its own input end: before outputting the resulting multi-scale output feature map to the next pyramid convolution unit or the pooling layer, the unit re-inputs that feature map to itself to extract features further, and after several repetitions outputs the result onward.
Preferably, as shown in FIG. 2, to solve the information redundancy generated during feature fusion and to further highlight useful information while suppressing useless information, the image classification model further includes n hybrid attention modules: for i = 1, 2, ..., n−1 the i-th hybrid attention module lies between the i-th and (i+1)-th pyramid convolution units, and for i = n it lies between the n-th pyramid convolution unit and the pooling layer. Each hybrid attention module includes a spatial attention network and a channel attention network, cascaded or in parallel, which screen the multi-scale output feature map received from the pyramid convolution unit along the spatial and channel dimensions to obtain a feature map F_sa, suppressing redundant background information and highlighting feature information useful for the classification result.
Taking a hybrid attention module formed by cascading the spatial attention network and the channel attention network as an example, the multi-scale output feature map u produced by the i-th pyramid convolution unit is input into the hybrid attention module, where the following operations are performed:
In the channel attention network, a channel-wise global average pooling operation is first applied to the output feature map u to extract the global spatial information of each channel: z_c = (1/(H_ca × W_ca)) Σ_{r=1}^{H_ca} Σ_{t=1}^{W_ca} u_c(r, t), where H_ca and W_ca are the height and width of u_c, u_c is the feature map of the c-th channel of u, and z is the one-dimensional vector collecting the global spatial information of all channels. A weight-shared one-dimensional convolution kernel then learns the channel weights of the global spatial information of each channel, giving w = δ(1D_Conv(z)), where δ(·) is the Sigmoid function and 1D_Conv denotes a one-dimensional convolution on z with a kernel of size k_1D. To select the kernel size adaptively, k_1D and the number of feature channels C_1D of the input feature map satisfy k_1D = |log2(C_1D)/γ + b/γ|_odd, where γ and b are learning parameters, set to 2 and 1 respectively in this embodiment, and |e|_odd denotes the odd number nearest to e; this channel attention improves the classification result while reducing computation and parameter count. Finally, each learned channel weight is applied to the corresponding channel of the output feature map, giving the channel attention weight feature map F_ca; specifically, F_ca = u · w. F_ca is then input into the spatial attention network.
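A minimal PyTorch sketch of this channel attention network follows (an ECA-style design). One simplification to flag: the text calls γ and b learning parameters, while the sketch fixes them to the embodiment's values of 2 and 1 and computes k_1D once at construction time.

```python
import math
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global average pooling per channel, a weight-shared 1-D convolution,
    a Sigmoid, and channel-wise reweighting: F_ca = u . w."""

    def __init__(self, channels: int, gamma: float = 2.0, b: float = 1.0):
        super().__init__()
        # k_1D = |log2(C_1D)/gamma + b/gamma|_odd, e.g. C_1D=256 -> 4.5 -> k=5.
        k = int(abs(math.log2(channels) / gamma + b / gamma))
        k = k if k % 2 == 1 else k + 1  # snap to an odd kernel size
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # z_c: global average over the H x W plane of each channel of u.
        z = u.mean(dim=(2, 3))                                   # (B, C)
        w = torch.sigmoid(self.conv(z.unsqueeze(1))).squeeze(1)  # (B, C)
        return u * w[:, :, None, None]                           # F_ca = u . w
```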
In the spatial attention network, average pooling and max pooling are applied to F_ca along its channel axis to quickly capture context information, generating two 2-D maps F_avg and F_max of size H_sa × W_sa (the height and width of the feature maps generated by the spatial attention network). F_avg and F_max are then concatenated channel-wise into a two-channel feature map, which is convolved with a kernel of preset size to generate the spatial attention weight feature map M(F_ca). Finally, the feature map F_sa is obtained by multiplying the channel attention weight feature map F_ca and the spatial attention weight feature map M(F_ca) element-wise (i.e., dot multiplication). The preset kernel size is determined from the size of F_ca, and the convolution keeps the two-channel feature map the same size as F_ca so that the subsequent element-wise multiplication is possible.
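A matching sketch of the spatial attention network (a CBAM-style formulation). The 7×7 kernel and the Sigmoid gating are assumptions: the patent only states that the kernel size is preset and chosen so that M(F_ca) matches the size of F_ca.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-axis average and max pooling, channel-wise concatenation,
    one convolution, and a gate: F_sa = F_ca . M(F_ca)."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # 'same' padding keeps M(F_ca) the same height/width as F_ca.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f_ca: torch.Tensor) -> torch.Tensor:
        f_avg = f_ca.mean(dim=1, keepdim=True)        # F_avg: (B, 1, H, W)
        f_max = f_ca.max(dim=1, keepdim=True).values  # F_max: (B, 1, H, W)
        m = torch.sigmoid(self.conv(torch.cat([f_avg, f_max], dim=1)))
        return f_ca * m                               # pixel-wise (dot) product
```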
It should be noted that this embodiment combines the channel attention network and the spatial attention network in cascade to form the hybrid attention module; the two attention networks may also be reasonably combined in parallel or in other ways.
When the parallel combination is used, the channel attention network and the spatial attention network each process the multi-scale output feature map u from the i-th pyramid convolution unit according to the operations above, yielding the channel attention weight feature map and the spatial attention weight feature map respectively; the two weight feature maps are then concatenated channel-wise and a convolution operation produces the feature map F_sa. The order of the channel-wise concatenation and the convolution operation is not restricted, provided the output dimensions match those of u.
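Under the parallel combination just described, one possible arrangement (reusing the two sketch classes above, and assuming the concatenated branches are reduced back to u's channel count by a 1×1 convolution) is:

```python
import torch
import torch.nn as nn

class ParallelHybridAttention(nn.Module):
    """Parallel combination: the channel and spatial attention networks each
    process u; their outputs are concatenated channel-wise and convolved."""

    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()
        # 1x1 convolution keeps the output dimensions consistent with u.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        branch_ca = self.ca(u)  # channel-weighted feature map
        branch_sa = self.sa(u)  # spatially weighted feature map
        return self.fuse(torch.cat([branch_ca, branch_sa], dim=1))  # F_sa
```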
Preferably, as shown in FIG. 3, the output end of the i-th hybrid attention module is also connected to the input end of the i-th pyramid convolution unit; the i-th hybrid attention module re-inputs the resulting feature map F_sa to the i-th pyramid convolution unit to extract features further, and after several repetitions outputs the result to the next pyramid convolution unit or the pooling layer, improving the robustness of the image classification model.
It should be noted that a pyramid convolution unit together with the hybrid attention module connected to its output may be called a hybrid attention pyramid module. To improve the robustness of the network model, hybrid attention pyramid modules composed of convolution kernels of different numbers and depths are cascaded to form the classification model of the present invention; the input image is processed by the different hybrid attention pyramid modules, each repeated several times, producing the final classification prediction. The number of feature-extraction repetitions in each hybrid attention pyramid module and the size and number of convolution kernels in each pyramid convolution unit can be adjusted according to the actual task.
Further, taking glaucoma, a common ophthalmic disease, as an example: optical coherence tomography (OCT), being non-invasive, comfortable, high-resolution, and non-contact, is often used to help clinicians identify a patient's anterior chamber angle (ACA) type, i.e., open, narrow, or closed; however, the image region occupied by the ACA fluctuates across individuals. When the ACA is small, a single convolution kernel can hardly capture the feature information of fine details accurately, and ignoring the information redundancy in feature fusion prevents useful information from being highlighted and useless information from being suppressed, ultimately impairing accurate prediction of the ACA type. The present invention provides an image classification model comprising a plurality of pyramid convolution units, which uses a multi-scale scheme to extract image features of different granularities. In this module, the image is input into a pyramid convolution module composed of convolution kernel filters of different sizes and depths, and information at different scales is extracted from the input image. Then, through convolutional spanning connections, the feature map extracted by the kernel of each scale is fused in turn with the fused feature map extracted by the preceding kernel, yielding the fused feature map of each scale, further mining the correlation between feature maps and producing output feature maps containing information at different scales; this completes the feature extraction for all kernel sizes. Next, a feature-map combination operation concatenates the output feature maps containing the different scales of information, and a 1×1 convolution changes the number of feature channels after concatenation. Finally, the combined feature map is summed pixel by pixel with the image input to the pyramid convolution module.
To better validate the accuracy of the classification model constructed by the present invention, the angle-closure glaucoma dataset provided at the 2019 MICCAI (Medical Image Computing and Computer Assisted Intervention) international conference was used as the training data: 1341 images were randomly selected and cropped into 2682 anterior chamber angle images. The dataset provides two gold-standard labels, open-angle and angle-closure; on this basis, the angle-closure cases were further divided into narrow-angle and closed-angle. To avoid convergence difficulties caused by imbalanced data distribution, the original data were translated and rotated through data augmentation, giving 1536 open-angle, 1214 narrow-angle, and 1458 closed-angle images; the final training, validation, and test sets contain 3367, 419, and 422 images, respectively.
To further demonstrate the advantages of the present invention, the glaucoma ACA dataset above was used to evaluate the classification performance of the model constructed by the present invention against current mainstream deep learning classification methods. The evaluation metrics are the accuracy ACC, the average sensitivity Sen_avg, the average specificity Spe_avg, and the average balanced accuracy BACC_avg, defined one class versus the rest as follows:
ACC = (Σ_{s=1}^{3} TP_s) / N_test,
Sen_avg = (1/3) Σ_{s=1}^{3} TP_s / (TP_s + FN_s),
Spe_avg = (1/3) Σ_{s=1}^{3} TN_s / (TN_s + FP_s),
BACC_avg = (Sen_avg + Spe_avg) / 2,
where N_test is the number of images in the test set, and TP_s, TN_s, FP_s, FN_s (s ∈ {1,2,3}) denote the numbers of true positives, true negatives, false positives, and false negatives when class s is treated as positive and the remaining classes as negative.
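Since the exact formulas are rendered as images in the source, the sketch below assumes the standard one-vs-rest definitions named above (accuracy, macro-averaged sensitivity and specificity, and their balanced-accuracy mean) computed from a confusion matrix:

```python
import numpy as np

def evaluation_metrics(conf: np.ndarray):
    """conf: S x S confusion matrix, rows = true class, columns = predicted.
    Returns (ACC, mean sensitivity, mean specificity, mean balanced accuracy)
    under one-vs-rest definitions of TP_s, TN_s, FP_s, FN_s."""
    n_test = conf.sum()
    acc = np.trace(conf) / n_test
    sens, spec = [], []
    for s in range(conf.shape[0]):
        tp = conf[s, s]
        fn = conf[s, :].sum() - tp   # true class s, predicted otherwise
        fp = conf[:, s].sum() - tp   # predicted s, true class otherwise
        tn = n_test - tp - fn - fp
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
    sen_avg, spe_avg = float(np.mean(sens)), float(np.mean(spec))
    return acc, sen_avg, spe_avg, (sen_avg + spe_avg) / 2
```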
It should be noted that in this embodiment the number n of pyramid convolution units is 4. The first pyramid convolution unit has n kernels, with scales 3×3, 5×5, ..., (2n+1)×(2n+1); the second has n−1 kernels, with scales 3×3, 5×5, ..., (2n−1)×(2n−1); the (n−1)-th has 2 kernels, with scales 3×3 and 5×5; and the n-th has a single 3×3 kernel. In each pyramid convolution unit, the feature map extracted by the kernel of each scale is fused, via convolutional spanning connections, with the fused feature map extracted by the preceding kernel, yielding the fused feature map of each scale and hence output feature maps containing information at different scales, and completing feature extraction for all kernels in turn. That is, the feature map F_i^2 extracted by the second-scale kernel is fused with the feature map F_i^1 extracted by the first-scale kernel to give the fused feature map F̃_i^2; the feature map F_i^3 extracted by the third-scale kernel is fused with F̃_i^2 to give F̃_i^3; the feature map F_i^4 extracted by the fourth-scale kernel is fused with F̃_i^3 to give F̃_i^4; and so on. Note that the scale of the preceding kernel is smaller than that of the current kernel. Specifically, in this embodiment, as shown in FIG. 4 and taking a pyramid convolution unit with three kernels as an example, the operation uses 3×3 convolutional spanning connections, and the fused feature map extracted by the block-th-scale kernel is F̃_i^block = K_3×3 * F_i^block + F̃_i^(block−1), where K_3×3 is a convolution kernel of size 3×3 and * denotes convolution. After the fused feature maps of all scales have been extracted, the feature-map combination operation concatenates the output F̃_i^2, ..., F̃_i^(n−i+1) channel-wise, and a 1×1 convolution changes the number of channels after concatenation so that the result can be summed pixel by pixel with the feature map currently input to the i-th pyramid convolution unit, giving the output feature map containing multi-scale information. In this embodiment, the first hybrid attention pyramid module, formed by the first pyramid convolution unit and the first hybrid attention module, repeats feature extraction 3 times; the second repeats 4 times; the third repeats 6 times; and the fourth repeats 3 times.
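Putting the pieces together, a hybrid attention pyramid module and a cascade following the embodiment's configuration (kernel counts 4, 3, 2, 1 and repetition counts 3, 4, 6, 3) might be sketched as below, reusing the classes from the earlier sketches. The stem convolution, a constant channel width, and the final pooling and linear head are assumptions; the patent does not specify them.

```python
import torch.nn as nn

class HybridAttentionPyramidModule(nn.Module):
    """One pyramid convolution unit followed by cascaded channel and spatial
    attention, with the attention output fed back to the unit `repeats` times."""

    def __init__(self, channels: int, num_scales: int, repeats: int):
        super().__init__()
        self.unit = PyramidConvUnit(channels, num_scales)
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()
        self.repeats = repeats

    def forward(self, x):
        for _ in range(self.repeats):
            x = self.sa(self.ca(self.unit(x)))  # F_sa re-enters the unit
        return x

def build_hapcnet_sketch(channels: int = 64, num_classes: int = 3) -> nn.Module:
    """Embodiment-style cascade: n = 4 units, kernel counts (4, 3, 2, 1),
    repetitions (3, 4, 6, 3), then pooling and a fully connected layer."""
    return nn.Sequential(
        nn.Conv2d(3, channels, kernel_size=7, stride=2, padding=3),
        HybridAttentionPyramidModule(channels, num_scales=4, repeats=3),
        HybridAttentionPyramidModule(channels, num_scales=3, repeats=4),
        HybridAttentionPyramidModule(channels, num_scales=2, repeats=6),
        HybridAttentionPyramidModule(channels, num_scales=1, repeats=3),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(channels, num_classes),
    )
```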
Table 1 compares the classification performance on the anterior chamber angle test set of the classification model constructed by the present invention (denoted HapcNet) with different mainstream networks (VGG-16, ResNet-50, DenseNet-121, MobileNet, EfficientNet-B7, and PyConvNet-50). Here EfficientNet-B7 is the B7 variant of EfficientNet, and the numbers in the other network names indicate depth; e.g., VGG-16 is a 16-layer VGG network. Table 1 shows that the algorithms with the most prominent classification performance are EfficientNet, PyConvNet, and the HapcNet provided by the present invention, which outperform the other four deep learning methods on most metrics. Compared with EfficientNet and PyConvNet, HapcNet improves the ACC value by about 1.47% and 1.66%, respectively. On average specificity the networks differ little, but VGG performs worst at 0.9933, whereas HapcNet reaches 0.9998, the best classification performance among the compared networks.
Table 1 (the table itself is rendered as an image in the original publication)
Further, to show the superiority of the present invention over the other methods more intuitively, experiments were conducted with the HapcNet provided by the present invention and each comparison deep learning model. FIG. 5 shows the accuracy curves of HapcNet and the comparison models on the anterior chamber angle validation set, with the abscissa Epochs (number of iterations) and the ordinate Accuracy. FIG. 6 shows the confusion matrices on the anterior chamber angle test set, where "0", "1", and "2" denote open, narrow, and closed angles, respectively, and (a)-(g) are the confusion matrices of VGG-16, ResNet-50, DenseNet-121, MobileNet, EfficientNet-B7, PyConvNet-50, and the HapcNet provided by the present invention. FIG. 5 shows that HapcNet achieves better convergence accuracy than the comparison deep learning models while offering a highly competitive convergence speed. The confusion matrices in FIG. 6 show that HapcNet, EfficientNet-B7, and PyConvNet-50 achieve better classification performance on the anterior chamber angle test set than the remaining mainstream networks. Specifically, for the open angle, HapcNet provides the second-best accuracy of 98.7% and EfficientNet-B7 the best, 99.4%; for the narrow angle, HapcNet provides the best accuracy of 100% while EfficientNet-B7 is only second best; and for the closed angle, HapcNet again achieves the best classification accuracy. In summary, compared with other deep learning models, the HapcNet provided by the present invention has advantages in classifying the anterior chamber angle dataset.
Embodiment 2
An image classification method, comprising: inputting an image to be classified into an image classification model constructed by the image classification model construction method provided in Embodiment 1, and obtaining a classification result. Preferably, before the image to be classified is input into the image classification model, it is scaled to improve computational efficiency.
The related technical solutions are the same as in Embodiment 1 and are not repeated here.
Embodiment 3
A machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the image classification model construction method provided in Embodiment 1 and/or the image classification method provided in Embodiment 2.
The related technical features are the same as in Embodiments 1 and 2 and are not repeated here.
Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

  1. A method for constructing an image classification model, characterized by comprising the following steps:
    S1. Build the image classification model. The image classification model includes a convolutional layer, a first pyramid convolution unit, a second pyramid convolution unit, ..., an nth pyramid convolution unit, a pooling layer, and a fully connected layer, cascaded in sequence; the first convolutional layer is used to extract an initial feature map from the input image and output it to the first pyramid convolution unit; the i-th pyramid convolution unit is used to employ n−i+1 convolution kernels of different scales to further extract features from the feature map currently input to the i-th pyramid convolution unit, and then to fuse, in order, the feature map extracted by the kernel of each scale with the fused feature map extracted by the preceding kernel, yielding the fused feature map extracted by the kernel of each scale, i.e., a set of feature maps containing information at different scales; the feature maps containing information at different scales are fused with the feature map currently input to the i-th pyramid convolution unit, giving an output feature map containing multi-scale information; where i = 1, 2, ..., n, and the scale of each kernel is larger than that of the preceding kernel;
    S2. Input the training set collected according to the preset classification task into the image classification model for training, obtaining a trained image classification model.
  2. The method for constructing an image classification model according to claim 1, characterized in that the input image is an image obtained by scaling the original sample images in the training set.
  3. The method for constructing an image classification model according to claim 1, characterized in that the feature map extracted in the i-th pyramid convolution unit by the convolution kernel of the block-th scale is denoted F_i^block, block = 1, ..., n−i+1, and the corresponding fused feature map F̃_i^block; for the i-th pyramid convolution unit, when i = 1, 2, ..., n−1, feature map F_i^2 is fused with feature map F_i^1 to obtain the fused feature map F̃_i^2 extracted by the second-scale kernel; from block = 3 onwards, feature map F_i^block is fused in turn with the fused feature map F̃_i^(block−1) extracted by the (block−1)-th-scale kernel, giving the fused feature map F̃_i^block extracted by the block-th-scale kernel; after the fused feature maps of all scales have been extracted, the fused feature maps F̃_i^2, ..., F̃_i^(n−i+1) are concatenated and then fused with the feature map currently input to the i-th pyramid convolution unit, giving the output feature map containing multi-scale information, which is output to the (i+1)-th pyramid convolution unit; when i = n, a convolution operation is applied to the feature map currently input to the i-th pyramid convolution unit and the result is fused with the feature map currently input to the i-th pyramid convolution unit, giving the output feature map containing multi-scale information, which is output to the pooling layer for pooling and then passed through the fully connected layer to obtain the classification result.
  4. The method for constructing an image classification model according to claim 3, characterized in that in the i-th pyramid convolution unit, feature map A is fused with feature map B or fused feature map B as follows: a convolution operation is applied to A and the result is combined with B; the combination includes a pixel-wise addition, a concatenation, or a concatenation followed by convolution;
    the fused feature maps F̃_i^2, ..., F̃_i^(n−i+1) are concatenated and then fused with the feature map currently input to the i-th pyramid convolution unit as follows: the fused feature maps are concatenated channel-wise; a convolution changes the number of feature channels of the concatenated feature map to match the channel count of the feature map currently input to the i-th pyramid convolution unit; and the result is summed pixel by pixel with the feature map currently input to the i-th pyramid convolution unit, giving the output feature map containing multi-scale information.
  5. The method for constructing an image classification model according to any one of claims 1-4, characterized in that the output end of the i-th pyramid convolution unit is also connected to the input end of the i-th pyramid convolution unit;
    the i-th pyramid convolution unit is further used, before outputting the output feature map containing multi-scale information to the next pyramid convolution unit or the pooling layer, to re-input that output feature map to the i-th pyramid convolution unit so as to further extract features from it; after several repetitions, the result is output to the next pyramid convolution unit or the pooling layer.
  6. The method for constructing an image classification model according to any one of claims 1-4, characterized in that the image classification model further includes n hybrid attention modules; when i = 1, 2, ..., n−1, the i-th hybrid attention module lies between the i-th pyramid convolution unit and the (i+1)-th pyramid convolution unit; when i = n, the i-th hybrid attention module lies between the i-th pyramid convolution unit and the pooling layer;
    the hybrid attention module includes a spatial attention network and a channel attention network, cascaded or in parallel, which screen the output feature map containing multi-scale information received from the pyramid convolution unit along the spatial and channel dimensions to obtain a feature map F_sa, thereby suppressing redundant background information.
  7. The method for constructing an image classification model according to claim 6, characterized in that the output end of the i-th hybrid attention module is also connected to the input end of the i-th pyramid convolution unit;
    the i-th hybrid attention module is further used to re-input the feature map F_sa to the i-th pyramid convolution unit so as to further extract features from F_sa; after several repetitions, the result is output to the next pyramid convolution unit or the pooling layer.
  8. The method for constructing an image classification model according to claim 6, characterized in that the channel attention network is used to perform a channel-wise global average pooling operation on the input feature map to extract the global spatial information of each channel; the channel weights of the global spatial information of each channel are then learned by a weight-shared one-dimensional convolution kernel, and each learned channel weight is applied to the corresponding channel of the input feature map, so as to screen the feature information along the channel dimension;
    the kernel size k_1D of the convolution kernel in the channel attention network and the number of feature channels C_1D of the input feature map satisfy k_1D = |log2(C_1D)/γ + b/γ|_odd, where γ and b are learning parameters and |e|_odd denotes the odd number nearest to e.
  9. An image classification method, characterized by comprising: inputting an image to be classified into an image classification model constructed by the image classification model construction method according to any one of claims 1-8, and obtaining a classification result.
  10. A machine-readable storage medium, characterized in that the machine-readable storage medium stores machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the image classification model construction method according to any one of claims 1-8 and/or the image classification method according to claim 9.
PCT/CN2021/086861 2021-04-01 2021-04-13 Method for constructing an image classification model, image classification method, and storage medium WO2022205502A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110356938.7A CN113191390B (zh) 2021-04-01 2021-04-01 Method for constructing an image classification model, image classification method, and storage medium
CN202110356938.7 2021-04-01

Publications (1)

Publication Number Publication Date
WO2022205502A1 true WO2022205502A1 (zh) 2022-10-06

Family

ID=76974445

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/086861 WO2022205502A1 (zh) 2021-04-01 2021-04-13 一种图像分类模型的构建方法、图像分类方法及存储介质

Country Status (2)

Country Link
CN (1) CN113191390B (zh)
WO (1) WO2022205502A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762251B (zh) * 2021-08-17 2024-05-10 慧影医疗科技(北京)股份有限公司 Attention-mechanism-based target classification method and system
CN114821121B (zh) * 2022-05-09 2023-02-03 盐城工学院 Image classification method based on grouped attention-weighted fusion of RGB components


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018120013A1 (en) * 2016-12-30 2018-07-05 Nokia Technologies Oy Artificial neural network
CN110232394B (zh) * 2018-03-06 2021-08-10 华南理工大学 Multi-scale image semantic segmentation method
CN109034210B (zh) * 2018-07-04 2021-10-12 国家新闻出版广电总局广播科学研究院 Target detection method based on hyper-feature fusion and a multi-scale pyramid network
CN109598269A (zh) * 2018-11-14 2019-04-09 天津大学 Semantic segmentation method based on multi-resolution input and pyramid dilated convolution
CN111507408B (zh) * 2020-04-17 2022-11-04 深圳市商汤科技有限公司 Image processing method and apparatus, electronic device, and storage medium
CN112396645B (zh) * 2020-11-06 2022-05-31 华中科技大学 Monocular image depth estimation method and system based on convolutional residual learning
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field
CN112287924B (zh) * 2020-12-24 2021-03-16 北京易真学思教育科技有限公司 Text region detection method and apparatus, electronic device, and computer storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190228268A1 (en) * 2016-09-14 2019-07-25 Konica Minolta Laboratory U.S.A., Inc. Method and system for cell image segmentation using multi-stage convolutional neural networks
CN110188685A (zh) * 2019-05-30 2019-08-30 燕山大学 Target counting method and system based on a dual-attention multi-scale cascade network
CN110992361A (zh) * 2019-12-25 2020-04-10 创新奇智(成都)科技有限公司 Cost-balance-based engine fastener detection system and detection method
CN111739075A (zh) * 2020-06-15 2020-10-02 大连理工大学 Deep-network lung texture recognition method combining multi-scale attention
CN112418176A (zh) * 2020-12-09 2021-02-26 江西师范大学 Remote sensing image semantic segmentation method based on a pyramid-pooling multi-level feature fusion network

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115496808A (zh) * 2022-11-21 2022-12-20 中山大学中山眼科中心 Corneal limbus positioning method and system
CN115496808B (zh) * 2022-11-21 2023-03-24 中山大学中山眼科中心 Corneal limbus positioning method and system
CN116758029A (zh) * 2023-06-15 2023-09-15 东莞市商斯迈智能科技有限公司 Machine-vision-based movement control method and system for a window-cleaning machine

Also Published As

Publication number Publication date
CN113191390A (zh) 2021-07-30
CN113191390B (zh) 2022-06-14

Similar Documents

Publication Publication Date Title
WO2022205502A1 (zh) Method for constructing an image classification model, image classification method, and storage medium
CN110236543B (zh) 基于深度学习的阿尔茨海默病多分类诊断系统
Elmuogy et al. An efficient technique for CT scan images classification of COVID-19
Narayanan et al. Understanding deep neural network predictions for medical imaging applications
CN113065588A (zh) 基于双线性注意力网络的医学影像数据分类方法及系统
Ding et al. FTransCNN: Fusing Transformer and a CNN based on fuzzy logic for uncertain medical image segmentation
CN111260639A (zh) 多视角信息协作的乳腺良恶性肿瘤分类方法
Tambe et al. Towards designing an automated classification of lymphoma subtypes using deep neural networks
Tursynova et al. Brain Stroke Lesion Segmentation Using Computed Tomography Images based on Modified U-Net Model with ResNet Blocks.
Xu et al. Guided multi-scale refinement network for camouflaged object detection
Feng et al. Trusted multi-scale classification framework for whole slide image
Shi et al. Combined channel and spatial attention for YOLOv5 during target detection
Peng et al. A multi-task network for cardiac magnetic resonance image segmentation and classification
Derwin et al. Hybrid multi-kernel SVM algorithm for detection of microaneurysm in color fundus images
Nur et al. Using fused Contourlet transform and neural features to spot COVID19 infections in CT scan images
Wan et al. C2BNet: A Deep Learning Architecture with Coupled Composite Backbone for Parasitic EGG Detection in Microscopic Images
Dhawan et al. Deep Learning Based Sugarcane Downy Mildew Disease Detection Using CNN-LSTM Ensemble Model for Severity Level Classification
Yan et al. Two and multiple categorization of breast pathological images by transfer learning
Albelaihi et al. DeepDiabetic: An Identification System of Diabetic Eye Diseases Using Deep Neural Networks
Chu Machine learning for automation of Chromosome based Genetic Diagnostics
CN116958535B (zh) 一种基于多尺度残差推理的息肉分割系统及方法
Zhang et al. CTransNet: Convolutional Neural Network Combined with Transformer for Medical Image Segmentation
Kassim et al. A cell augmentation tool for blood smear analysis
Parvathi et al. Diabetic Retinopathy Detection Using Transfer Learning
Gandhi et al. A vision transformer approach for classification an a small-sized medical image dataset

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21934156

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21934156

Country of ref document: EP

Kind code of ref document: A1