CN113191390A - Image classification model construction method, image classification method and storage medium - Google Patents
- Publication number
- CN113191390A
- Authority
- CN
- China
- Prior art keywords
- feature map
- convolution
- convolution unit
- pyramid
- ith
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/32—Indexing scheme for image data processing or generation, in general involving image mosaicing
Abstract
The invention discloses a method for constructing an image classification model, an image classification method, and a storage medium. The constructed image classification model comprises a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an nth pyramid convolution unit, a pooling layer and a fully-connected layer which are cascaded in sequence. The ith pyramid convolution unit first performs further feature extraction on the currently input feature map using n-i+1 convolution kernels of different scales, and then sequentially fuses the feature map extracted by each scale's convolution kernel with the fused feature map extracted at the previous scale, obtaining a fused feature map for each scale, namely a group of feature maps containing information at different scales. These feature maps containing different scale information are then fused with the currently input feature map to obtain an output feature map containing multi-scale information, where i = 1, 2, …, n. The invention makes full use of information at different scales and achieves higher image classification accuracy.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a construction method of an image classification model, an image classification method and a storage medium.
Background
Image classification techniques are at the core of computer vision and are widely applied in many fields, for example: face recognition and intelligent video analysis in the security field, traffic scene recognition in the transportation field, image retrieval in the internet field, and medical image analysis in the medical field. Taking medical images as an example, in clinical diagnosis doctors identify images collected by imaging devices (such as magnetic resonance imaging, ultrasonic imaging and optical coherence tomography) for the purpose of disease screening. However, the effectiveness of manual identification depends heavily on the clinical experience of the doctor, and the huge volume of medical data also affects diagnostic efficiency, so that fatigue easily leads to misdiagnosis or missed diagnosis. At present, automated computer-aided diagnosis technology is widely applied in the field of medical image recognition: the strong computing power of a computer is used to process and analyze images, providing information of reference value to the clinician and greatly reducing the clinician's workload.
In recent years, deep learning algorithms have received wide attention in the field of image classification. Compared with traditional machine learning algorithms that acquire hand-crafted features through shallow learning, deep learning methods combine multiple nonlinear shallow features and construct more abstract high-order features on that basis. Like the deep structure of the brain, in deep learning each input object is represented in a multi-layer abstract form, with each layer corresponding to a different region of the cerebral cortex. The advantage of deep learning is that its multi-level features are learned from raw data through a general-purpose learning procedure rather than designed by manual screening. Commonly used deep learning models include the deep Boltzmann machine, the deep belief network, the stacked autoencoder, the recurrent neural network and the convolutional neural network. Convolutional neural networks are widely used in image processing and perform well in many medical image recognition tasks. However, most existing network models use only a single convolution kernel when extracting image feature information; for images in which the target area varies greatly, it is difficult to completely capture feature information at different levels of detail.
Disclosure of Invention
In view of the above drawbacks or needs for improvement in the prior art, the present invention provides a method for constructing an image classification model, an image classification method, and a storage medium, so as to solve the technical problem in the prior art that the classification accuracy is low due to the fact that feature information of different scales is not fully utilized.
In order to achieve the above object, in a first aspect, the present invention provides a method for constructing an image classification model, including the following steps:
s1, building an image classification model; the image classification model comprises: a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an nth pyramid convolution unit, a pooling layer and a fully-connected layer which are cascaded in sequence; the convolution layer is used for extracting an initial feature map of an input image and outputting the initial feature map to the first pyramid convolution unit; the ith pyramid convolution unit is used for applying n-i+1 convolution kernels of different scales to respectively perform further feature extraction on the feature map currently input to the ith pyramid convolution unit, and then sequentially fusing the feature map extracted by each scale's convolution kernel with the fused feature map extracted at the previous scale to obtain a fused feature map for each scale's convolution kernel, namely a group of feature maps containing information at different scales; these feature maps containing different scale information are then fused with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information; wherein i = 1, 2, …, n, and each scale's convolution kernel is larger in scale than the convolution kernel of the previous stage;
and S2, inputting a training set collected according to a preset classification task into the image classification model for training to obtain a trained image classification model.
Further preferably, the input image is an image obtained by scaling the original sample image in the training set, so as to improve the calculation efficiency.
Further preferably, the feature map extracted by the convolution kernel of the block-th scale in the ith pyramid convolution unit is denoted F_i^block, block = 1, …, n-i+1, and the fused feature map extracted by the block-th scale convolution kernel is denoted M_i^block. For the ith pyramid convolution unit, when i = 1, 2, …, n-1, the feature map F_i^2 is fused with the feature map F_i^1 to obtain the fused feature map M_i^2 extracted by the second-scale convolution kernel; then, starting from block = 3, the feature map F_i^block is sequentially fused with the fused feature map M_i^{block-1} extracted by the (block-1)-th scale convolution kernel to obtain the fused feature map M_i^block extracted by the block-th scale convolution kernel. After the fused feature maps of the convolution kernels of all scales are extracted, all the fused feature maps M_i^block are spliced and then fused with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information, which is output to the (i+1)th pyramid convolution unit. When i = n, a convolution operation is performed on the feature map currently input to the ith pyramid convolution unit and the result is fused with that input feature map to obtain an output feature map containing multi-scale information, which is output to the pooling layer; after the pooling operation, the classification result is obtained through the fully-connected layer.
More preferably, in the ith pyramid convolution unit, the specific way of fusing a feature map A with a feature map B or a fused feature map B is as follows: a convolution operation is performed on A and the result is combined with B; the combination with B comprises a pixel-by-pixel addition, a splicing operation, or a splicing followed by a convolution operation.

The specific way of fusing the spliced fused feature maps M_i^block with the feature map currently input to the ith pyramid convolution unit is as follows: the fused feature maps M_i^block are spliced along the channel dimension, the number of feature channels of the spliced feature map is changed by convolution so that it is consistent with the number of channels of the feature map currently input to the ith pyramid convolution unit, and the result is then summed pixel by pixel with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information.
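By way of illustration only, the following is a minimal PyTorch sketch of one such pyramid convolution unit. The class name, the constant channel width, and the reading that a 3 × 3 cross-connection convolution is applied to each new branch before pixel-wise addition (following the "convolve A, then combine with B" rule above) are assumptions of this sketch, not code from the patent.

```python
import torch
import torch.nn as nn

class PyramidConvUnit(nn.Module):
    """Sketch of the i-th pyramid convolution unit: n-i+1 parallel branches with
    growing kernel sizes, cross-connection fusion, concatenation with a 1x1
    convolution, and a pixel-wise residual sum with the unit's input."""

    def __init__(self, channels, num_scales):
        super().__init__()
        # Branch b extracts F_i^{b+1} with a (2b+3)x(2b+3) kernel: 3x3, 5x5, ...
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=2 * b + 3, padding=b + 1)
            for b in range(num_scales)
        ])
        # 3x3 cross-connection convolutions applied to each new branch before it
        # is combined with the previous fused map: M_i^b = M_i^{b-1} + K3x3(F_i^b)
        self.cross = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_scales - 1)
        ])
        # 1x1 convolution restores the channel count after concatenation
        self.fuse = nn.Conv2d(channels * num_scales, channels, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]   # F_i^1 ... F_i^{n-i+1}
        fused = [feats[0]]                                # M_i^1 = F_i^1
        for b in range(1, len(feats)):
            fused.append(fused[b - 1] + self.cross[b - 1](feats[b]))
        out = self.fuse(torch.cat(fused, dim=1))          # splice + 1x1 conv
        return out + x                                    # pixel-wise residual sum


# Example: the first of n = 4 units would use 4 kernel scales (3x3 ... 9x9)
unit = PyramidConvUnit(channels=64, num_scales=4)
y = unit(torch.randn(1, 64, 56, 56))                      # -> (1, 64, 56, 56)
```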
Further preferably, the output end of the ith pyramid convolution unit is also connected to the input end of the ith pyramid convolution unit;
the ith pyramid convolution unit is also used for, before outputting the obtained output feature map containing multi-scale information to the next pyramid convolution unit or the pooling layer, inputting that feature map to the ith pyramid convolution unit again so as to further extract features from it; after this is repeated multiple times, the result is output to the next pyramid convolution unit or the pooling layer, so as to improve the robustness of the image classification model.
Further preferably, the image classification model further includes n mixed attention modules; when i = 1, 2, …, n-1, the ith mixed attention module is located between the ith pyramid convolution unit and the (i+1)th pyramid convolution unit; when i = n, the ith mixed attention module is located between the ith pyramid convolution unit and the pooling layer;

the mixed attention module comprises a spatial attention network and a channel attention network connected in series or in parallel, and is used for screening, in the spatial and channel dimensions, the output feature map containing multi-scale information provided by the pyramid convolution unit to obtain a feature map F_sa, thereby suppressing redundant background information and highlighting feature information that benefits the classification result.
Further preferably, the output end of the ith mixed attention module is also connected to the input end of the ith pyramid convolution unit;
the ith mixed attention module is also used for re-inputting the obtained feature map F_sa to the ith pyramid convolution unit so as to further extract features from F_sa; after this is repeated multiple times, the result is output to the next pyramid convolution unit or the pooling layer, so as to improve the robustness of the image classification model.
Further preferably, the channel attention network is configured to perform a global average pooling operation on the input feature map along the channel dimension to extract the global spatial information on each channel; channel weights for the global spatial information on all channels are then learned through a weight-shared one-dimensional convolution kernel, and the learned channel weights are applied to the corresponding channels of the input feature map so as to screen the feature information in the channel dimension;

the size k_1D of the convolution kernel in the channel attention network and the number C_1D of feature channels of the input feature map satisfy:

k_{1D} = \left| \frac{\log_2(C_{1D})}{\gamma} + \frac{b}{\gamma} \right|_{odd}

where γ and b are learning parameters, and |e|_odd denotes the odd number nearest to e.
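For concreteness, the following is a hedged sketch of such a channel attention branch: global average pooling per channel, a weight-shared one-dimensional convolution whose kernel size follows the formula above, and a Sigmoid gate applied back onto the channels. All identifiers are illustrative; γ = 2 and b = 1 follow the embodiment described later.

```python
import math
import torch
import torch.nn as nn

class ChannelAttention1D(nn.Module):
    """Channel attention with an adaptively sized, weight-shared 1-D conv."""

    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # k_1D = |log2(C)/gamma + b/gamma|, rounded here to the next odd number
        k = int(abs(math.log2(channels) / gamma + b / gamma))
        k = k if k % 2 == 1 else k + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, u):                          # u: (N, C, H, W)
        z = u.mean(dim=(2, 3))                     # global average pooling -> (N, C)
        w = self.conv(z.unsqueeze(1))              # weight-shared 1-D conv -> (N, 1, C)
        w = torch.sigmoid(w).squeeze(1)            # channel weights in (0, 1)
        return u * w.unsqueeze(-1).unsqueeze(-1)   # re-weight each channel of u
```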
In a second aspect, the present invention provides an image classification method, comprising: inputting the image to be classified into the image classification model constructed by the method for constructing an image classification model provided in the first aspect of the invention, to obtain a classification result.
In a third aspect, the present invention also provides a machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement any one of the image classification model construction methods described above and/or the image classification method described above.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
1. The invention provides a method for constructing an image classification model. The constructed model comprises a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an nth pyramid convolution unit, a pooling layer and a fully-connected layer which are cascaded in sequence. Within each pyramid convolution unit, the feature map extracted by each scale's convolution kernel is sequentially fused, through convolutional cross-connections, with the fused feature map extracted at the previous scale, yielding a fused feature map for each scale; the correlation among the feature maps is thereby further mined to obtain an output feature map containing multi-scale information, and information at different scales is fully utilized. By using this multi-scale scheme to extract image features of different granularity, the invention achieves higher image classification accuracy.

2. The image classification model constructed by this method further comprises a mixed attention module. The output feature map containing multi-scale information provided by the pyramid convolution unit is screened in the spatial and channel dimensions by a spatial attention network and a channel attention network, realizing adaptive calibration of channel features and spatial information and suppressing the redundant information introduced when feature maps of different scales are integrated; by effectively suppressing useless background information and highlighting key feature information, the accuracy of image classification is further improved.

3. In the image classification model constructed by this method, the output end of the ith mixed attention module is also connected to the input end of the ith pyramid convolution unit. A pyramid convolution unit together with the mixed attention module connected to its output end is called a mixed attention pyramid module; mixed attention pyramid modules composed of convolution kernels of different numbers and depths are cascaded together for image classification, which greatly improves the robustness of the model while also improving its accuracy.

4. In the image classification model constructed by this method, images can be scaled before being input into the model so as to improve computational efficiency.
Drawings
Fig. 1 is a schematic structural diagram of an image classification model provided in embodiment 1 of the present invention;
fig. 2 is a schematic structural diagram of an image classification model including a mixed attention module according to embodiment 1 of the present invention;
fig. 3 is a schematic structural diagram of an image classification model according to embodiment 1 of the present invention, which includes a mixed attention module, and an output end of the mixed attention module is further connected to an input end of a corresponding pyramid convolution unit;
fig. 4 is a schematic diagram illustrating a 3 × 3 convolution cross-connection manner in the pyramid convolution unit provided in embodiment 1 of the present invention;
FIG. 5 is a graph of the accuracy of the HapcNet and each comparative deep learning model on the anterior chamber angle validation set, as provided in Embodiment 1 of the present invention;
FIG. 6 is a confusion matrix of HapcNet and each comparative deep learning model on the anterior chamber angle test set according to embodiment 1 of the present invention; wherein, (a) is a confusion matrix of the deep learning model VGG-16 on the anterior chamber angle test set; (b) a confusion matrix of a deep learning model ResNet-50 on the anterior chamber corner test set; (c) a confusion matrix of a deep learning model DenseNet-121 on an anterior chamber corner test set; (d) a confusion matrix of a deep learning model MobileNet on an anterior chamber corner test set; (e) a confusion matrix of a deep learning model EfficientNet-B7 on the anterior chamber corner test set; (f) a confusion matrix of a deep learning model PyConvNet-50 on an anterior chamber corner test set; (g) the confusion matrix of the HapcNet provided by the invention on the anterior chamber corner test set.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Embodiment 1
A construction method of an image classification model comprises the following steps:
s1, building an image classification model; as shown in fig. 1, the image classification model includes: a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an nth pyramid convolution unit, a pooling layer and a fully-connected layer which are cascaded in sequence; the convolution layer is used for extracting an initial feature map of the input image and outputting it to the first pyramid convolution unit; the ith pyramid convolution unit applies n-i+1 convolution kernels of different scales to respectively perform further feature extraction on the feature map currently input to it, and then sequentially fuses the feature map extracted by each scale's convolution kernel with the fused feature map extracted at the previous scale, obtaining a fused feature map for each scale, namely a group of feature maps containing information at different scales; these feature maps are spliced and then fused with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information; wherein i = 1, 2, …, n, and each scale's convolution kernel is larger than the convolution kernel of the previous stage. Specifically, the feature map extracted by the convolution kernel of the block-th scale in the ith pyramid convolution unit is denoted F_i^block, block = 1, …, n-i+1, and the fused feature map extracted by the block-th scale convolution kernel is denoted M_i^block. For the ith pyramid convolution unit, when i = 1, 2, …, n-1, the feature map F_i^2 is fused with the feature map F_i^1 to obtain the fused feature map M_i^2 extracted by the second-scale convolution kernel; the feature map F_i^3 is fused with M_i^2 to obtain the fused feature map M_i^3 extracted by the third-scale convolution kernel; and so on. After the fused feature maps of all scales are extracted, all the fused feature maps M_i^block are spliced and then fused with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information, which is output to the (i+1)th pyramid convolution unit. When i = n, a convolution operation is performed on the feature map currently input to the ith pyramid convolution unit and the result is fused with that input feature map to obtain an output feature map containing multi-scale information, which is output to the pooling layer; after the pooling operation, the classification result is obtained through the fully-connected layer. Preferably, when i = 1, 2, …, n-1, in the above ith pyramid convolution unit, the specific way of fusing the feature map F_i^block with the fused feature map M_i^{block-1} extracted by the (block-1)-th scale convolution kernel is as follows: a convolution operation is performed on the feature map F_i^block and the result is combined with the fused feature map M_i^{block-1}, so as to fully mine the information among different feature maps and make the information more complete; the combination with the fused feature map M_i^{block-1} comprises a pixel-by-pixel addition, a splicing operation, or a splicing followed by a convolution operation.
It should be noted that the feature map F_i^2 is fused with the feature map F_i^1 in the same manner in which the feature map F_i^block is fused with the fused feature map M_i^{block-1} extracted by the (block-1)-th scale convolution kernel, which is not repeated here. Further, the specific way of fusing the spliced fused feature maps M_i^block with the feature map currently input to the ith pyramid convolution unit is as follows: the fused feature maps M_i^block are spliced along the channel dimension, the number of feature channels of the spliced feature map is changed by convolution so that it is consistent with the number of channels of the feature map currently input to the ith pyramid convolution unit, and the result is then summed pixel by pixel with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information.
And S2, inputting a training set collected according to a preset classification task into the image classification model for training to obtain a trained image classification model. Preferably, the input image may be an image obtained by scaling the original sample images in the training set, so as to improve computational efficiency and speed up training. In this embodiment, the cross-entropy loss is taken as the total loss function, specifically:

L = -\frac{1}{Num} \sum_{p=1}^{Num} \sum_{q=1}^{\eta} y_{p,q} \log(x_{p,q})

where η represents the number of output categories and Num is the batch size of the images in the training set; x_{p,q} is the predicted probability, generated by the softmax classification function, that the pth sample belongs to class q; and y_{p,q} is the corresponding label of the pth sample for class q.
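For reference, this loss maps directly onto PyTorch's built-in cross entropy; the variable names below are illustrative:

```python
import torch
import torch.nn as nn

# logits: (Num, eta) raw network outputs; labels: (Num,) integer class indices
logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
# CrossEntropyLoss applies softmax internally and averages over the batch,
# matching L = -(1/Num) * sum_p sum_q y_{p,q} * log(x_{p,q})
loss = nn.CrossEntropyLoss()(logits, labels)
```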
Preferably, in order to improve the robustness of the image classification model, the output end of the ith pyramid convolution unit is further connected to the input end of the ith pyramid convolution unit;
the ith pyramid convolution unit is also used for, before outputting the obtained output feature map containing multi-scale information to the next pyramid convolution unit or the pooling layer, inputting that feature map to the ith pyramid convolution unit again so as to further extract features from it; after this is repeated multiple times, the result is output to the next pyramid convolution unit or the pooling layer, so as to improve the robustness of the image classification model.
Preferably, as shown in fig. 2, in order to solve the problem of information redundancy generated in the feature fusion process, to further highlight useful information and suppress useless information, the image classification model further includes n mixed attention modules. When i = 1, 2, …, n-1, the ith mixed attention module is located between the ith pyramid convolution unit and the (i+1)th pyramid convolution unit; when i = n, the ith mixed attention module is located between the ith pyramid convolution unit and the pooling layer. The mixed attention module comprises a spatial attention network and a channel attention network connected in series or in parallel, and is used for screening, in the spatial and channel dimensions, the output feature map containing multi-scale information provided by the pyramid convolution unit to obtain a feature map F_sa, thereby suppressing redundant background information and highlighting feature information that benefits the classification result.
Taking a mixed attention module composed of a spatial attention network and a channel attention network which are cascaded as an example, the output feature map u containing multi-scale information output by the ith pyramid convolution unit is input into the mixed attention module, and the following operations are executed in the mixed attention module:
in the channel attention network, firstly, the global average pooling operation is carried out on the output feature graph u according to the channels to extract the global spatial information on each channel, wherein ucGlobal spatial information of the c-th channelHcaAnd WcaAre each ucHeight and width of (u)cA feature map corresponding to the c-th channel of the output feature map u, wherein z is a one-dimensional vector containing global space information of each channel; then, the channel weights of the global space information on each channel are respectively learned through a one-dimensional convolution kernel with shared weights, and the obtained weight isWherein δ (·) is a Sigmoid function; 1D _ Conv denotes the use of a size k1DPerforming one-dimensional convolution operation on the z by the convolution kernel; it should be noted that, in order to realize the adaptive selection of the size of the convolution kernel, the size k of the convolution kernel in the channel attention network1DAnd the number C of feature channels of the input feature map1DSatisfies the following conditions:where γ and b are learning parameters, which are set to 2 and 1, | e $ y in this embodiment, respectivelyoddRepresents the odd number nearest to e; the channel attention ensures that the classification result is improved, and meanwhile, the calculation amount and the parameter amount are reduced. Finally, the learned channel weights are respectively acted on the corresponding channels in the output characteristic diagram to obtain a channel attention weight characteristic diagram Fca(ii) a Specifically, FcaU · w. Further, the channel attention weight feature map FcaInput into a spatial attention network.
In the spatial attention network, average pooling and max pooling operations are performed on the feature map F_ca along its channel axis to quickly capture context information, generating two 2D maps F_avg and F_max of size 1 × H_sa × W_sa (H_sa and W_sa being the height and width of the feature map processed by the spatial attention network). Then, F_avg and F_max are spliced along the channel dimension to generate a two-channel feature map, and a convolution with a kernel of preset size is applied to it to generate the spatial attention weight feature map M(F_ca). Finally, the channel attention weight feature map F_ca and the spatial attention weight feature map M(F_ca) are multiplied pixel by pixel (i.e., dot multiplication) to obtain the feature map F_sa. The size of the preset convolution kernel is determined according to the size of the channel attention weight feature map F_ca, so that the two-channel feature map after the convolution operation keeps the same size as F_ca, enabling the subsequent dot-product operation.
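A sketch of this spatial attention step under the same caveats: channel-wise average and max pooling, splicing into a two-channel map, a single convolution (a 7 × 7 kernel is assumed here, since the text only says "preset size"), and a pixel-wise gate.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention: pool along the channel axis, convolve, gate."""

    def __init__(self, kernel_size=7):  # "preset size" in the text; 7 is an assumption
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, f_ca):                            # f_ca: (N, C, H_sa, W_sa)
        f_avg = f_ca.mean(dim=1, keepdim=True)          # (N, 1, H_sa, W_sa)
        f_max = f_ca.max(dim=1, keepdim=True).values    # (N, 1, H_sa, W_sa)
        m = torch.sigmoid(self.conv(torch.cat([f_avg, f_max], dim=1)))
        return f_ca * m                                 # F_sa = F_ca (.) M(F_ca)
```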
It should be noted that, in the present embodiment, the channel attention network and the spatial attention network are combined in a cascade manner to form the mixed attention module; alternatively, the two attention networks can also be reasonably combined in parallel or in other ways.
When the parallel connection mode is adopted, the channel attention network and the spatial attention network each process, according to the operations above, the output feature map u containing multi-scale information provided by the ith pyramid convolution unit, obtaining a channel attention weight feature map and a spatial attention weight feature map respectively; the two weight feature maps are then spliced along the channel dimension and a convolution operation is performed to obtain the feature map F_sa. It should be noted that the order of the channel-wise splicing and the convolution operation is not limited, as long as the output dimension is consistent with the dimension of the output feature map u.
Preferably, as shown in FIG. 3, the output end of the ith mixed attention module is further connected to the input end of the ith pyramid convolution unit; the ith mixed attention module is also used for re-inputting the obtained feature map F_sa to the ith pyramid convolution unit so as to further extract features from F_sa; after this is repeated multiple times, the result is output to the next pyramid convolution unit or the pooling layer, so as to improve the robustness of the image classification model.
It should be noted that a pyramid convolution unit and the mixed attention module connected to its output end may be referred to as a mixed attention pyramid module. In order to improve the robustness of the network model, mixed attention pyramid modules composed of convolution kernels of different numbers and depths are cascaded together to form the classification model of the invention, and the input image is processed by the different mixed attention pyramid modules, each repeated multiple times, to produce the final classification prediction. The number of repetitions of feature extraction in each mixed attention pyramid module, as well as the size and number of convolution kernels in each pyramid convolution unit, can be adjusted according to the actual task.
Further, taking the common ophthalmic disease glaucoma as an example, optical coherence tomography (OCT) is often used to help clinicians identify the type of a patient's anterior chamber angle (ACA), i.e. open angle, narrow angle or closed angle, because of its advantages of being non-invasive, comfortable, high-resolution and non-contact; however, due to individual differences, the region occupied by the anterior chamber angle fluctuates within a certain range in the OCT image. If the anterior chamber angle is small, a single convolution kernel can hardly capture the characteristic information of tiny details accurately; meanwhile, ignoring the information redundancy problem in the feature fusion process means that useful information cannot be highlighted and useless information cannot be suppressed, which ultimately affects accurate prediction of the anterior chamber angle type. The invention therefore provides an image classification model comprising several pyramid convolution units, which uses a multi-scale scheme to extract image features of different granularity. In each unit, the image is input into a pyramid convolution module consisting of convolution kernel filters of different sizes and depths, which extract information at different scales from the input. Then, through convolutional cross-connections, the feature map extracted by each scale's convolution kernel is sequentially fused with the fused feature map extracted at the previous scale, giving a fused feature map for each scale; the correlation among the feature maps is thus further mined to obtain output feature maps containing information at different scales, completing the feature extraction of all convolution kernel sizes. Next, the output feature maps containing different scale information are spliced together by a feature map combination operation, and the number of channels of the spliced feature map is changed by a 1 × 1 convolution. Finally, the combined feature map and the image input to the pyramid convolution module are summed pixel by pixel.
In order to better verify the accuracy of the classification model constructed by the invention, the angle-closure glaucoma dataset provided at the 2019 Medical Image Computing and Computer Assisted Intervention (MICCAI) conference is taken as the training data set. From it, 1341 images were randomly selected and cropped into 2682 anterior chamber angle images; the dataset provides two gold-standard labels, open-angle and angle-closure. On this basis, the angle-closure images are further divided into narrow-angle and closed-angle anterior chamber angles. In order to avoid the problem of difficult training convergence caused by unbalanced data distribution, the original data were augmented by translation and rotation, yielding 1536 open-angle, 1214 narrow-angle and 1458 closed-angle anterior chamber angle images; the final training, validation and test sets contain 3367, 419 and 422 images respectively.
In order to further demonstrate the advantages of the invention, the glaucoma anterior chamber angle dataset is used to evaluate the classification performance of the model constructed by the invention against current mainstream deep learning classification methods. The evaluation indices comprise the accuracy ACC, the average sensitivity, the average specificity and the average balanced accuracy, defined as follows:

ACC = \frac{\sum_{s=1}^{3} TP_s}{N_{test}}, \quad \overline{Sen} = \frac{1}{3} \sum_{s=1}^{3} \frac{TP_s}{TP_s + FN_s}, \quad \overline{Spe} = \frac{1}{3} \sum_{s=1}^{3} \frac{TN_s}{TN_s + FP_s}, \quad \overline{BACC} = \frac{\overline{Sen} + \overline{Spe}}{2}

where N_test is the number of images in the test set, and TP_s, TN_s, FP_s, FN_s (s ∈ {1,2,3}) denote the numbers of true positives, true negatives, false positives and false negatives, respectively, when the sth class is considered positive and the remaining classes negative.
It should be noted that, in this embodiment, the number n of pyramid convolution units is 4. The first pyramid convolution unit has n convolution kernels with scales 3 × 3, 5 × 5, …, (2n+1) × (2n+1); the second pyramid convolution unit has n-1 convolution kernels with scales 3 × 3, 5 × 5, …, (2n-1) × (2n-1); the (n-1)th pyramid convolution unit has 2 convolution kernels with scales 3 × 3 and 5 × 5; and the nth pyramid convolution unit has 1 convolution kernel with scale 3 × 3. In each pyramid convolution unit, the feature map extracted by each scale's convolution kernel is sequentially fused, through convolutional cross-connections, with the fused feature map extracted at the previous scale to obtain the fused feature map for each scale, thereby obtaining output feature maps containing information at different scales and completing the feature extraction of all convolution kernels in sequence. That is, the feature map F_i^2 extracted by the second-scale convolution kernel is fused with the feature map F_i^1 extracted by the first-scale convolution kernel to obtain the fused feature map M_i^2; the feature map F_i^3 extracted by the third-scale convolution kernel is fused with M_i^2 to obtain the fused feature map M_i^3; the feature map F_i^4 extracted by the fourth-scale convolution kernel is fused with M_i^3 to obtain the fused feature map M_i^4; and so on. It should be noted that the scale of the previous convolution kernel is smaller than that of the current convolution kernel. Specifically, in this embodiment, as shown in fig. 4, taking 3 convolution kernels in a pyramid convolution unit as an example, the operation is performed in a 3 × 3 convolution cross-connection manner, and the fused feature map extracted by the block-th scale convolution kernel is:

M_i^{block} = M_i^{block-1} \oplus K_{3 \times 3}(F_i^{block})

where K_{3×3} denotes a convolution with a kernel of size 3 × 3 and ⊕ the combination operation (a pixel-by-pixel addition in this embodiment). After the fused feature maps of all scales are extracted, the feature map combination operation splices the fused feature maps M_i^block together along the channel dimension, and a 1 × 1 convolution changes the number of spliced feature channels so that the result can be summed pixel by pixel with the feature map currently input to the ith pyramid convolution unit, giving an output feature map containing multi-scale information.
In this embodiment, the first mixed attention pyramid module, formed by the first pyramid convolution unit and the first mixed attention module, repeats its feature extraction 3 times; the second mixed attention pyramid module, formed by the second pyramid convolution unit and the second mixed attention module, repeats 4 times; the third repeats 6 times; and the fourth repeats 3 times.
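Putting the pieces together, the following is a hedged sketch of the overall backbone for this embodiment, reusing the illustrative `PyramidConvUnit`, `ChannelAttention1D` and `SpatialAttention` classes sketched above. The stem convolution, constant channel width, and the approximation of the feedback connection by stacking each stage `repeats` times are all assumptions of this sketch.

```python
import torch
import torch.nn as nn

class HapcNetSketch(nn.Module):
    """Illustrative assembly: stem convolution, four cascaded stages of
    (pyramid convolution unit + channel/spatial attention), global average
    pooling, and a fully-connected classifier."""

    def __init__(self, num_classes=3, channels=64):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=7, stride=2, padding=3)
        # (num_scales, repeats) per stage: 4 kernels x3, 3 x4, 2 x6, 1 x3
        cfg = [(4, 3), (3, 4), (2, 6), (1, 3)]
        blocks = []
        for num_scales, repeats in cfg:
            # The text re-feeds each module's output to its own input `repeats`
            # times; stacking `repeats` copies approximates that behaviour.
            for _ in range(repeats):
                blocks += [PyramidConvUnit(channels, num_scales),
                           ChannelAttention1D(channels),
                           SpatialAttention()]
        self.blocks = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = self.blocks(self.stem(x))
        return self.fc(self.pool(x).flatten(1))

model = HapcNetSketch()
logits = model(torch.randn(1, 3, 224, 224))  # -> (1, 3) class scores
```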
Table 1 compares the classification performance on the anterior chamber angle test set of the classification model constructed by the invention (referred to herein as HapcNet) and different mainstream networks (VGG-16, ResNet-50, DenseNet-121, MobileNet, EfficientNet-B7 and PyConvNet-50). EfficientNet-B7 is the B7 variant of EfficientNet, while the numbers in the other network names denote the number of layers; for example, VGG-16 denotes a 16-layer VGG network. As can be seen from Table 1, the algorithms with the most prominent classification performance are EfficientNet, PyConvNet and the HapcNet provided by the invention, which outperform the other four deep learning methods on most indices. Compared with the EfficientNet and PyConvNet methods, the HapcNet provided by the invention improves the ACC value by about 1.47% and 1.66% respectively. On the remaining averaged index, although the differences between the networks are not significant, VGG performs the worst, at 0.9933, while the HapcNet provided by the invention reaches 0.9998, the best classification performance among the compared networks.
TABLE 1
Furthermore, in order to display more intuitively the superiority of the method over other methods, experiments were carried out with the HapcNet provided by the invention and each comparative deep learning model. Fig. 5 shows the accuracy curves of HapcNet and each comparative deep learning model on the anterior chamber angle validation set, where the abscissa Epochs is the number of iterations and the ordinate Accuracy is the accuracy. Fig. 6 shows the confusion matrices of HapcNet and each comparative deep learning model on the anterior chamber angle test set, where "0", "1" and "2" represent the open angle, narrow angle and closed angle respectively: (a) is the confusion matrix of the deep learning model VGG-16 on the anterior chamber angle test set; (b) that of ResNet-50; (c) that of DenseNet-121; (d) that of MobileNet; (e) that of EfficientNet-B7; (f) that of PyConvNet-50; and (g) that of the HapcNet provided by the invention. From fig. 5 it can be seen that the HapcNet provided by the invention achieves better convergence accuracy than the comparative deep learning models while providing an extremely competitive convergence speed. As can be seen from the confusion matrices in fig. 6, HapcNet, EfficientNet-B7 and PyConvNet-50 achieve superior classification performance on the anterior chamber angle test set compared with the remaining mainstream networks. Specifically, for the open-angle anterior chamber angle, the HapcNet provided by the invention provides the second-best accuracy of 98.7%, with EfficientNet-B7 obtaining the best classification accuracy of 99.4%; for the narrow-angle anterior chamber angle, the HapcNet provided by the invention provides the best accuracy of 100%, while EfficientNet-B7 provides only the second-best accuracy; and for the closed-angle anterior chamber angle, the HapcNet provided by the invention still achieves the best classification accuracy. In conclusion, compared with other deep learning models, the HapcNet provided by the invention has clear advantages in classifying the anterior chamber angle dataset.
Embodiment 2
An image classification method, comprising: inputting the image to be classified into the image classification model constructed by the method for constructing an image classification model provided in Embodiment 1 to obtain a classification result. Preferably, before being input to the image classification model, the image to be classified is scaled to improve computational efficiency.
The related technical solutions are the same as those of Embodiment 1 and are not repeated here.
Embodiment 3
A machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of constructing an image classification model provided in Embodiment 1 and/or the image classification method provided in Embodiment 2.
The related technical features are the same as those of Embodiment 1 and Embodiment 2 and are not repeated here.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A construction method of an image classification model is characterized by comprising the following steps:
s1, building an image classification model; the image classification model comprises: a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an nth pyramid convolution unit, a pooling layer and a fully-connected layer which are cascaded in sequence; the convolution layer is used for extracting an initial feature map of an input image and outputting the initial feature map to the first pyramid convolution unit; the ith pyramid convolution unit is used for applying n-i+1 convolution kernels of different scales to respectively perform further feature extraction on the feature map currently input to the ith pyramid convolution unit, and then sequentially fusing the feature map extracted by each scale's convolution kernel with the fused feature map extracted at the previous scale to obtain a fused feature map for each scale's convolution kernel, namely a group of feature maps containing information at different scales; and fusing the feature maps containing different scale information with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information; wherein i = 1, 2, …, n, and each scale's convolution kernel is larger in scale than the convolution kernel of the previous stage;
and S2, inputting a training set collected according to a preset classification task into the image classification model for training to obtain a trained image classification model.
2. The method of constructing an image classification model according to claim 1, wherein the input image is an image obtained by scaling original sample images in the training set.
3. The method for constructing an image classification model according to claim 1, wherein the feature map extracted by the convolution kernel of the block-th scale in the ith pyramid convolution unit is denoted F_i^block, block = 1, …, n-i+1, and the fused feature map extracted by the block-th scale convolution kernel is denoted M_i^block; for the ith pyramid convolution unit, when i = 1, 2, …, n-1, the feature map F_i^2 is fused with the feature map F_i^1 to obtain the fused feature map M_i^2 extracted by the second-scale convolution kernel; starting from block = 3, the feature map F_i^block is sequentially fused with the fused feature map M_i^{block-1} extracted by the (block-1)-th scale convolution kernel to obtain the fused feature map M_i^block extracted by the block-th scale convolution kernel; after the fused feature maps of the convolution kernels of all scales are extracted, all the fused feature maps M_i^block are spliced and then fused with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information, which is output to the (i+1)th pyramid convolution unit; and when i = n, a convolution operation is performed on the feature map currently input to the ith pyramid convolution unit and the result is fused with that input feature map to obtain an output feature map containing multi-scale information, which is output to the pooling layer; after the pooling operation, the classification result is obtained through the fully-connected layer.
4. The method for constructing an image classification model according to claim 3, wherein in the ith pyramid convolution unit, the specific way of fusing a feature map A with a feature map B or a fused feature map B is as follows: performing a convolution operation on A and then combining the result with B; the combination with B comprises a pixel-by-pixel addition, a splicing operation, or a splicing followed by a convolution operation;
the pair of fused feature mapsThe specific way of fusing the feature map which is currently input to the ith pyramid convolution unit after the splicing operation is performed is as follows: combining each fused feature mapAnd splicing according to channels, changing the number of characteristic channels of the spliced characteristic diagram in a convolution mode to keep the characteristic channels consistent with the number of channels of the characteristic diagram currently input to the ith pyramid convolution unit, and then superposing and summing the characteristic channels and the characteristic diagram currently input to the ith pyramid convolution unit pixel by pixel to obtain an output characteristic diagram containing multi-scale information.
5. The method for constructing an image classification model according to any one of claims 1 to 4, wherein the output end of the ith pyramid convolution unit is further connected to the input end of the ith pyramid convolution unit;
the ith pyramid convolution unit is further used for inputting the output feature map containing the multi-scale information to the ith pyramid convolution unit again before outputting the output feature map containing the multi-scale information to the next pyramid convolution unit or the pooling layer so as to further extract features of the output feature map containing the multi-scale information; and after repeating for multiple times, outputting the result to the next pyramid convolution unit or the pooling layer.
6. The method for constructing an image classification model according to any one of claims 1 to 4, wherein the image classification model further comprises n mixed attention modules; when i = 1, 2, …, n-1, the ith mixed attention module is located between the ith pyramid convolution unit and the (i+1)th pyramid convolution unit; and when i = n, the ith mixed attention module is located between the ith pyramid convolution unit and the pooling layer;
each mixed attention module comprises a spatial attention network and a channel attention network connected in series or in parallel, and is used to screen, in the spatial and channel dimensions, the output feature map containing multi-scale information that is input from the pyramid convolution unit, obtaining a feature map F_sa and thereby suppressing redundant background information.
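A sketch of the series/parallel wiring in claim 6. The spatial attention branch shown is a common CBAM-style design and is an assumption, as the claim does not fix its internals; merging the parallel branches by addition is likewise assumed.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention (assumed design, not specified by the claim)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)             # channel-wise average map
        mx, _ = x.max(dim=1, keepdim=True)            # channel-wise max map
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w                                   # per-pixel screening

class MixedAttention(nn.Module):
    """Channel and spatial attention in series or in parallel (claim 6)."""
    def __init__(self, channel_att: nn.Module, spatial_att: nn.Module, parallel: bool = False):
        super().__init__()
        self.channel_att, self.spatial_att, self.parallel = channel_att, spatial_att, parallel

    def forward(self, x):
        if self.parallel:                              # parallel: merge the two branches
            return self.channel_att(x) + self.spatial_att(x)
        return self.spatial_att(self.channel_att(x))   # series: channel first, then spatial -> F_sa
```

Any module with a matching interface can be plugged in as channel_att; a sketch matching the channel attention network of claim 8 follows below.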
7. The method for constructing an image classification model according to claim 6, wherein the output end of the ith mixed attention module is further connected to the input end of the ith pyramid convolution unit;
the i-th mixed attention module is further used to re-input the feature map F_sa to the i-th pyramid convolution unit so that further features are extracted from F_sa; after this is repeated multiple times, the result is output to the next pyramid convolution unit or the pooling layer.
8. The method for constructing an image classification model according to claim 6, wherein the channel attention network performs a global average pooling operation on the input feature map channel by channel to extract the global spatial information of each channel; the channel weights of the global spatial information of all channels are then learned through a weight-shared one-dimensional convolution kernel, and the learned channel weights are applied to the corresponding channels of the input feature map so as to screen the feature information in the channel dimension.
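This description matches an ECA-style design: per-channel global average pooling followed by a weight-shared 1D convolution slid across the channel axis. A minimal sketch, with the kernel size as an assumption:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention per claim 8: per-channel GAP + weight-shared 1D convolution."""
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        # a single 1D kernel shared across all channel positions
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):                       # x: (N, C, H, W)
        gap = x.mean(dim=(2, 3))                # global average pooling -> (N, C)
        w = self.conv(gap.unsqueeze(1))         # 1D convolution over channels -> (N, 1, C)
        w = torch.sigmoid(w).squeeze(1)         # channel weights in (0, 1) -> (N, C)
        return x * w[:, :, None, None]          # weight each channel of the input
```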
9. An image classification method, comprising: inputting the image to be classified into the image classification model constructed by the image classification model construction method according to any one of claims 1 to 8, and obtaining a classification result.
10. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to carry out the method of constructing an image classification model according to any one of claims 1 to 8 and/or the method of image classification according to claim 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110356938.7A CN113191390B (en) | 2021-04-01 | 2021-04-01 | Image classification model construction method, image classification method and storage medium |
PCT/CN2021/086861 WO2022205502A1 (en) | 2021-04-01 | 2021-04-13 | Image classification model construction method, image classification method, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110356938.7A CN113191390B (en) | 2021-04-01 | 2021-04-01 | Image classification model construction method, image classification method and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113191390A (en) | 2021-07-30
CN113191390B (en) | 2022-06-14
Family
ID=76974445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110356938.7A Active CN113191390B (en) | 2021-04-01 | 2021-04-01 | Image classification model construction method, image classification method and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113191390B (en) |
WO (1) | WO2022205502A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116758029B (en) * | 2023-06-15 | 2024-07-26 | 广东灵顿智链信息技术有限公司 | Window cleaning machine movement control method and system based on machine vision |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190228268A1 (en) * | 2016-09-14 | 2019-07-25 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for cell image segmentation using multi-stage convolutional neural networks |
CN110188685B (en) * | 2019-05-30 | 2021-01-05 | 燕山大学 | Target counting method and system based on double-attention multi-scale cascade network |
CN110992361A (en) * | 2019-12-25 | 2020-04-10 | 创新奇智(成都)科技有限公司 | Engine fastener detection system and detection method based on cost balance |
CN111739075B (en) * | 2020-06-15 | 2024-02-06 | 大连理工大学 | Deep network lung texture recognition method combining multi-scale attention |
Application Events (2021)
- 2021-04-01: CN application CN202110356938.7A filed (published as CN113191390B; status: active)
- 2021-04-13: PCT application PCT/CN2021/086861 filed (published as WO2022205502A1; status: application filing)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200005151A1 (en) * | 2016-12-30 | 2020-01-02 | Nokia Technologies Oy | Artificial neural network |
CN110232394A (en) * | 2018-03-06 | 2019-09-13 | 华南理工大学 | A kind of multi-scale image semantic segmentation method |
CN109034210A (en) * | 2018-07-04 | 2018-12-18 | 国家新闻出版广电总局广播科学研究院 | Object detection method based on super Fusion Features Yu multi-Scale Pyramid network |
CN109598269A (en) * | 2018-11-14 | 2019-04-09 | 天津大学 | A kind of semantic segmentation method based on multiresolution input with pyramid expansion convolution |
CN111507408A (en) * | 2020-04-17 | 2020-08-07 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN112396645A (en) * | 2020-11-06 | 2021-02-23 | 华中科技大学 | Monocular image depth estimation method and system based on convolution residual learning |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
CN112418176A (en) * | 2020-12-09 | 2021-02-26 | 江西师范大学 | Remote sensing image semantic segmentation method based on pyramid pooling multilevel feature fusion network |
CN112287924A (en) * | 2020-12-24 | 2021-01-29 | 北京易真学思教育科技有限公司 | Text region detection method, text region detection device, electronic equipment and computer storage medium |
Non-Patent Citations (3)
Title |
---|
XINJIANG WANG et al.: "Scale-Equalizing Pyramid Convolution for Object Detection", Computer Vision and Pattern Recognition, 6 May 2020 (2020-05-06), pages 1-16 *
XUMING ZHANG et al.: "Spiking cortical model-based noise detector for switching-based filters", Journal of Electronic Imaging, 2 April 2012 (2012-04-02), pages 013020-1 *
吕朦: "Research on Image Classification Algorithms Based on Multi-Scale Convolutional Neural Networks" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology, 15 September 2019 (2019-09-15), pages 138-692 *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113762251A (en) * | 2021-08-17 | 2021-12-07 | 慧影医疗科技(北京)有限公司 | Target classification method and system based on attention mechanism |
CN113762251B (en) * | 2021-08-17 | 2024-05-10 | 慧影医疗科技(北京)股份有限公司 | Attention mechanism-based target classification method and system |
CN113963217A (en) * | 2021-11-16 | 2022-01-21 | 广东技术师范大学 | Anterior chamber angle image grading method integrating weak supervision metric learning |
CN113963217B (en) * | 2021-11-16 | 2024-10-01 | 广东技术师范大学 | Anterior chamber angle image grading method integrating weak supervision metric learning |
CN114821121A (en) * | 2022-05-09 | 2022-07-29 | 盐城工学院 | Image classification method based on RGB three-component grouping attention weighted fusion |
CN114841979A (en) * | 2022-05-18 | 2022-08-02 | 大连理工大学人工智能大连研究院 | Multi-scale attention-fused deep learning cancer molecular typing prediction method |
CN114841979B (en) * | 2022-05-18 | 2024-10-01 | 大连理工大学人工智能大连研究院 | Deep learning cancer molecular typing prediction method with multi-scale attention fusion |
CN115496808A (en) * | 2022-11-21 | 2022-12-20 | 中山大学中山眼科中心 | Corneal limbus positioning method and system |
CN115496808B (en) * | 2022-11-21 | 2023-03-24 | 中山大学中山眼科中心 | Corneal limbus positioning method and system |
CN117876797A (en) * | 2024-03-11 | 2024-04-12 | 中国地质大学(武汉) | Image multi-label classification method, device and storage medium |
CN117876797B (en) * | 2024-03-11 | 2024-06-04 | 中国地质大学(武汉) | Image multi-label classification method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2022205502A1 (en) | 2022-10-06 |
CN113191390B (en) | 2022-06-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113191390B (en) | Image classification model construction method, image classification method and storage medium | |
EP3921776B1 (en) | Method and system for classification and visualisation of 3d images | |
Verma et al. | Pneumonia classification using deep learning in healthcare | |
CN112308200A (en) | Neural network searching method and device | |
US11830187B2 (en) | Automatic condition diagnosis using a segmentation-guided framework | |
Narayanan et al. | Understanding deep neural network predictions for medical imaging applications | |
Olatunji et al. | Identification of erythemato-squamous skin diseases using extreme learning machine and artificial neural network | |
US11875898B2 (en) | Automatic condition diagnosis using an attention-guided framework | |
CN113706544A (en) | Medical image segmentation method based on complete attention convolution neural network | |
Yan et al. | Investigation of Customized Medical Decision Algorithms Utilizing Graph Neural Networks | |
CN114445356A (en) | Multi-resolution-based full-field pathological section image tumor rapid positioning method | |
Shamrat et al. | Analysing most efficient deep learning model to detect COVID-19 from computer tomography images | |
Dhawan et al. | Deep Learning Based Sugarcane Downy Mildew Disease Detection Using CNN-LSTM Ensemble Model for Severity Level Classification | |
CN116958535B (en) | Polyp segmentation system and method based on multi-scale residual error reasoning | |
Padmapriya et al. | Computer-Aided Diagnostic System for Brain Tumor Classification using Explainable AI | |
Zhou et al. | Balancing High-performance and Lightweight: HL-UNet for 3D Cardiac Medical Image Segmentation | |
Chen et al. | Cardiac motion scoring based on CNN with attention mechanism | |
Sineglazov et al. | Design of hybrid neural networks of the ensemble structure | |
Harshini et al. | Machine Learning Approach for Various Eye Diseases using Modified Voting Classifier Model | |
Veeranki et al. | Detection and classification of brain tumors using convolutional neural network | |
Parvathi et al. | Diabetic Retinopathy Detection Using Transfer Learning | |
Saednia et al. | An attention-guided deep neural network for annotating abnormalities in chest X-ray images: visualization of network decision basis | |
Shaik et al. | A Deep Learning Framework for Prognosis Patients with COVID-19 | |
Truong et al. | A Novel Approach of Using Neural Circuit Policies for COVID-19 Classification on CT-Images | |
Mahmud et al. | Automatic Diagnosis of Malaria from Thin Blood Smear Images using Deep Convolutional Neural Network with Multi-Resolution Feature Fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||