CN113191390A - Image classification model construction method, image classification method and storage medium - Google Patents

Image classification model construction method, image classification method and storage medium

Info

Publication number
CN113191390A
CN113191390A
Authority
CN
China
Prior art keywords
feature map
convolution
convolution unit
pyramid
ith
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110356938.7A
Other languages
Chinese (zh)
Other versions
CN113191390B (en)
Inventor
张旭明
周权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202110356938.7A priority Critical patent/CN113191390B/en
Priority to PCT/CN2021/086861 priority patent/WO2022205502A1/en
Publication of CN113191390A publication Critical patent/CN113191390A/en
Application granted granted Critical
Publication of CN113191390B publication Critical patent/CN113191390B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/32 Indexing scheme for image data processing or generation, in general involving image mosaicing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for constructing an image classification model, an image classification method, and a storage medium. The constructed image classification model comprises a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an n-th pyramid convolution unit, a pooling layer, and a fully connected layer which are sequentially cascaded. The i-th pyramid convolution unit performs further feature extraction on the currently input feature map using n-i+1 convolution kernels of different scales, and then sequentially fuses the feature map extracted by each scale's convolution kernel with the fused feature map extracted by the preceding kernel, obtaining the fused feature maps extracted by the convolution kernels of each scale, i.e., a group of feature maps containing information of different scales; these feature maps containing different scale information are then fused with the currently input feature map to obtain an output feature map containing multi-scale information, where i = 1, 2, …, n. The invention makes full use of information at different scales and achieves higher image classification accuracy.

Description

Image classification model construction method, image classification method and storage medium
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a construction method of an image classification model, an image classification method and a storage medium.
Background
Image classification techniques are at the core of computer vision and are widely applied in many fields, for example: face recognition and intelligent video analysis in the security field, traffic scene recognition in the traffic field, image retrieval in the internet field, and medical image analysis in the medical field. Taking medical images as an example, in clinical diagnosis doctors identify images collected by imaging devices (such as magnetic resonance imaging, ultrasonic imaging, and optical tomography) for the purpose of disease screening. However, the quality of manual identification depends heavily on the clinical experience of the doctor, and the huge volume of medical data also affects diagnostic efficiency, so that overfatigue easily leads to misdiagnosis or missed diagnosis. At present, automated computer-aided diagnosis technology is widely applied in the field of medical image recognition; it uses the strong computing power of computers to process and analyze images, providing information of reference value for clinicians and greatly reducing their workload.
In recent years, deep learning algorithms have gained wide attention in the field of image classification. Compared with traditional machine learning algorithms that acquire manual features based on shallow learning, deep learning combines multiple nonlinear shallow features and constructs more abstract high-order features on this basis. Like the deep structure of the brain, in deep learning each input object is represented in a multi-layer abstract form, with each layer corresponding to a different cortical region. The advantage of deep learning is that its multi-level features are learned from raw data through a general-purpose learning procedure rather than designed by manual screening. Commonly used deep learning models include the deep Boltzmann machine, the deep belief network, the stacked autoencoder, the recurrent neural network, and the convolutional neural network. Convolutional neural networks are widely used in image processing and perform well in many medical image recognition tasks. However, most existing network models use only a single convolution kernel when extracting image feature information; for images in which the target area varies greatly, it is difficult for such models to completely capture feature information at different detail sizes.
Disclosure of Invention
In view of the above drawbacks of or needs for improvement in the prior art, the present invention provides a method for constructing an image classification model, an image classification method, and a storage medium, so as to solve the technical problem that the classification accuracy of the prior art is low because feature information at different scales is not fully utilized.
In order to achieve the above object, in a first aspect, the present invention provides a method for constructing an image classification model, including the following steps:
S1, building an image classification model; the image classification model comprises: a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an n-th pyramid convolution unit, a pooling layer, and a fully connected layer which are sequentially cascaded; the first convolution layer is used for extracting an initial feature map of an input image and outputting it to the first pyramid convolution unit; the i-th pyramid convolution unit is used for performing further feature extraction on the feature map currently input to it using n-i+1 convolution kernels of different scales, and then sequentially fusing the feature map extracted by each scale's convolution kernel with the fused feature map extracted by the preceding kernel to obtain the fused feature maps extracted by the convolution kernels of each scale, i.e., a group of feature maps containing information of different scales; the feature maps containing different scale information are then fused with the feature map currently input to the i-th pyramid convolution unit to obtain an output feature map containing multi-scale information; where i = 1, 2, …, n, and each convolution kernel has a larger scale than the one preceding it;
S2, inputting a training set collected according to a preset classification task into the image classification model for training, to obtain a trained image classification model.
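For concreteness, the cascade described in S1 can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the patent's reference implementation: the names HapcNet, PyramidConvUnit, width, and num_classes are assumptions, and the PyramidConvUnit class referenced here is sketched after the fusion description below.

```python
import torch
import torch.nn as nn

class HapcNet(nn.Module):
    """Sketch of the cascade in S1: stem conv -> n pyramid conv units -> pooling -> FC."""
    def __init__(self, n=4, in_ch=3, width=64, num_classes=3):
        super().__init__()
        # First convolution layer: extracts the initial feature map.
        self.stem = nn.Conv2d(in_ch, width, kernel_size=7, stride=2, padding=3)
        # The i-th pyramid convolution unit uses n-i+1 kernels of different scales.
        # For simplicity the channel count is kept constant across units.
        self.units = nn.ModuleList(
            PyramidConvUnit(width, num_scales=n - i + 1) for i in range(1, n + 1)
        )
        self.pool = nn.AdaptiveAvgPool2d(1)       # pooling layer
        self.fc = nn.Linear(width, num_classes)   # fully connected layer

    def forward(self, x):
        x = self.stem(x)
        for unit in self.units:
            x = unit(x)   # output feature map containing multi-scale information
        x = self.pool(x).flatten(1)
        return self.fc(x)
```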
Further preferably, the input image is an image obtained by scaling the original sample image in the training set, so as to improve the calculation efficiency.
Further preferably, the feature map extracted by the convolution kernel of the block-th scale in the i-th pyramid convolution unit is denoted F_i^block, block = 1, …, n-i+1. For the i-th pyramid convolution unit, when i = 1, 2, …, n-1, the feature map F_i^2 is fused with the feature map F_i^1 to obtain the fused feature map M_i^2 extracted by the second-scale convolution kernel; starting from block = 3, the feature map F_i^block is fused with the fused feature map M_i^{block-1} extracted by the (block-1)-th-scale convolution kernel to obtain the fused feature map M_i^block extracted by the block-th-scale convolution kernel. After the fused feature maps of all scales are extracted, all the fused feature maps M_i^block are concatenated and then fused with the feature map currently input to the i-th pyramid convolution unit to obtain an output feature map containing multi-scale information, which is output to the (i+1)-th pyramid convolution unit. When i = n, a convolution operation is performed on the feature map currently input to the i-th pyramid convolution unit and the result is fused with that input feature map to obtain the output feature map containing multi-scale information, which is output to the pooling layer; after the pooling operation, the classification result is obtained through the fully connected layer.
More preferably, in the i-th pyramid convolution unit, the specific way of fusing a feature map A with a feature map B or a fused feature map B is as follows: a convolution operation is performed on A and the result is combined with B; the combination with B comprises a pixel-by-pixel addition operation, or a concatenation operation, or a concatenation followed by a convolution operation.

The specific way of fusing the concatenation of the fused feature maps M_i^block with the feature map currently input to the i-th pyramid convolution unit is as follows: the fused feature maps M_i^block are concatenated along channels, the number of feature channels of the concatenated feature map is changed by convolution so that it is consistent with the number of channels of the feature map currently input to the i-th pyramid convolution unit, and the result is then summed pixel by pixel with the feature map currently input to the i-th pyramid convolution unit to obtain the output feature map containing multi-scale information.
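The fusion chain just described can be sketched as follows. This is a minimal PyTorch sketch, assuming pixel-wise addition as the combination and assuming the cross-connection convolution is a 3×3 convolution applied to each scale's feature map before it joins the previous fused map; class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class PyramidConvUnit(nn.Module):
    """One pyramid convolution unit with num_scales kernels: 3x3, 5x5, 7x7, ..."""
    def __init__(self, channels, num_scales):
        super().__init__()
        # Branch convolutions of increasing scale: (2k+1) x (2k+1), k = 1..num_scales.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=2 * k + 1, padding=k)
            for k in range(1, num_scales + 1)
        )
        # 3x3 cross-connection convolutions used in each fusion step.
        self.cross = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_scales - 1)
        )
        # 1x1 convolution restores the channel count after concatenation.
        self.reduce = nn.Conv2d(channels * num_scales, channels, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]   # F_i^1 ... F_i^{n-i+1}
        fused = [feats[0]]                                # M_i^1 = F_i^1
        for k in range(1, len(feats)):
            # Fuse F_i^{k+1} with M_i^k; pixel-wise addition is the option chosen here.
            fused.append(fused[-1] + self.cross[k - 1](feats[k]))
        out = self.reduce(torch.cat(fused, dim=1))        # concat + 1x1 convolution
        return out + x                                    # pixel-wise sum with input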
Further preferably, the output end of the i-th pyramid convolution unit is also connected to its own input end;

the i-th pyramid convolution unit is also used for re-inputting the obtained output feature map containing multi-scale information to itself, before outputting it to the next pyramid convolution unit or the pooling layer, so as to further extract features from the currently obtained output feature map; after several repetitions, the result is output to the next pyramid convolution unit or the pooling layer, which improves the robustness of the image classification model. A minimal sketch of this feedback is given below.
Further preferably, the image classification model further includes n mixed attention modules; when i = 1, 2, …, n-1, the i-th mixed attention module is located between the i-th pyramid convolution unit and the (i+1)-th pyramid convolution unit; when i = n, the i-th mixed attention module is located between the i-th pyramid convolution unit and the pooling layer;

the mixed attention module comprises a spatial attention network and a channel attention network connected in series or in parallel, and is used for screening, in the spatial and channel dimensions, the output feature map containing multi-scale information input by the pyramid convolution unit to obtain a feature map F_sa, thereby suppressing redundant background information and highlighting feature information beneficial to the classification result.
Further preferably, the output end of the i-th mixed attention module is also connected to the input end of the i-th pyramid convolution unit;

the i-th mixed attention module is also used for re-inputting the obtained feature map F_sa to the i-th pyramid convolution unit so that features are further extracted from F_sa; after several repetitions, the result is output to the next pyramid convolution unit or the pooling layer, which improves the robustness of the image classification model.
Further preferably, the channel attention network is configured to perform a global average pooling operation on the input feature map channel by channel to extract the global spatial information on each channel; the channel weights of the global spatial information on all channels are then learned through a weight-shared one-dimensional convolution kernel, and the learned channel weights are applied to the corresponding channels of the input feature map so as to screen the feature information in the channel dimension;

the size k_1D of the convolution kernel in the channel attention network and the number C_1D of feature channels of the input feature map satisfy:

    k_1D = | log2(C_1D)/γ + b/γ |_odd

where γ and b are learning parameters and |e|_odd denotes the odd number nearest to e.
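The adaptive kernel-size rule can be evaluated directly; a small sketch follows, using γ = 2 and b = 1 as in the embodiment described later (the function name is an assumption):

```python
import math

def adaptive_kernel_size(c_1d: int, gamma: float = 2, b: float = 1) -> int:
    """k_1D = | log2(C_1D)/gamma + b/gamma |_odd (odd number nearest to the value)."""
    e = abs(math.log2(c_1d) / gamma + b / gamma)
    return 2 * round((e - 1) / 2) + 1  # nearest odd integer

# e.g. adaptive_kernel_size(64) == 3, adaptive_kernel_size(256) == 5
```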
In a second aspect, the present invention provides an image classification method, comprising: inputting the image to be classified into an image classification model constructed by the construction method provided in the first aspect of the invention, to obtain a classification result.
In a third aspect, the present invention also provides a machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement any one of the image classification model construction methods described above and/or the image classification method described above.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
1. The invention provides a method for constructing an image classification model. The constructed model comprises a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an n-th pyramid convolution unit, a pooling layer, and a fully connected layer which are sequentially cascaded. The pyramid convolution unit sequentially fuses the feature map extracted by each scale's convolution kernel with the fused feature map extracted by the preceding kernel through convolution cross-connections, obtaining the fused feature maps extracted by the convolution kernels of each scale; the correlation among feature maps is thus further mined to obtain an output feature map containing multi-scale information, and the different scale information among the feature maps is fully utilized. By extracting image features of different granularity with this multi-scale scheme, the invention achieves higher image classification accuracy.
2. The image classification model constructed by the method of the present invention further comprises a mixed attention module. The output feature map containing multi-scale information produced by the pyramid convolution unit is screened in the spatial and channel dimensions by the spatial attention network and the channel attention network, realizing adaptive calibration of channel features and spatial information and suppressing the redundant information introduced when feature maps of different scales are integrated. By effectively suppressing useless background information and highlighting key feature information, the accuracy of image classification is further improved.
3. In the image classification model constructed by the method of the present invention, the output end of the i-th mixed attention module is also connected to the input end of the i-th pyramid convolution unit. A pyramid convolution unit together with the mixed attention module connected to its output is called a mixed attention pyramid module; mixed attention pyramid modules composed of convolution kernels of different numbers and depths are cascaded together for image classification, which greatly improves the robustness of the model while also improving its accuracy.
4. In the image classification model constructed by the method of the present invention, the image can be scaled before being input into the model so as to improve the computational efficiency.
Drawings
Fig. 1 is a schematic structural diagram of an image classification model provided in embodiment 1 of the present invention;
fig. 2 is a schematic structural diagram of an image classification model including a mixed attention module according to embodiment 1 of the present invention;
fig. 3 is a schematic structural diagram of an image classification model according to embodiment 1 of the present invention, which includes a mixed attention module, and an output end of the mixed attention module is further connected to an input end of a corresponding pyramid convolution unit;
fig. 4 is a schematic diagram illustrating a 3 × 3 convolution cross-connection manner in the pyramid convolution unit provided in embodiment 1 of the present invention;
FIG. 5 is a graph of the accuracy of HapcNet and each comparative deep learning model on the anterior chamber angle validation set, as provided in Embodiment 1 of the present invention;
FIG. 6 shows the confusion matrices of HapcNet and each comparative deep learning model on the anterior chamber angle test set according to Embodiment 1 of the present invention; wherein (a) is the confusion matrix of the deep learning model VGG-16 on the anterior chamber angle test set; (b) is that of the deep learning model ResNet-50; (c) is that of the deep learning model DenseNet-121; (d) is that of the deep learning model MobileNet; (e) is that of the deep learning model EfficientNet-B7; (f) is that of the deep learning model PyConvNet-50; and (g) is that of the HapcNet provided by the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Embodiment 1
A construction method of an image classification model comprises the following steps:
s1, building an image classification model; as shown in fig. 1, the image classification model includes: the convolution layer, the first pyramid convolution unit, the second pyramid convolution unit, the nth pyramid convolution unit, the pooling layer and the full-connection layer are sequentially cascaded; the first convolution layer is used for extracting an initial feature map of an input image and outputting the initial feature map to the first pyramid convolution unit; the ith pyramid convolution unit is used for adopting convolution kernels with different scales of n-i +1 in number to respectively perform further feature extraction on the feature map currently input to the ith pyramid convolution unit, and then sequentially fusing the feature map extracted by each scale convolution kernel with the fusion feature map extracted by the previous stage convolution kernel to obtain fusion feature maps extracted by each scale convolution kernel, namely a group of feature maps containing information with different scales; splicing the feature maps containing different scale information, and then fusing the feature maps with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information; wherein i is 1,2, …, n; for each scale convolution kernel, its scale is larger than that of its previous stage convolution kernel. Specifically, a feature map extracted by adopting a convolution kernel of a first scale in an ith pyramid convolution unit is marked as Fi blockBlock 1, …, n-i + 1; for the ith pyramid convolution unit, when i is 1,2, …, n-1, the feature map F is processedi 2And characteristic diagram Fi 1Fusing to obtain a fusion characteristic diagram extracted by a second scale convolution kernel
Figure BDA0003003705090000081
Will feature chart Fi 3Fused feature map extracted with second scale convolution kernel
Figure BDA0003003705090000082
And the fusion characteristic graph extracted by the third scale convolution kernel is obtained by fusion
Figure BDA0003003705090000083
By analogy, after the fused feature maps of the convolution kernels of all scales are extracted, all the fused feature maps are subjected to extraction
Figure BDA0003003705090000084
After splicing operation is carried out, the feature map is fused with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information, and the output feature map is output to the (i +1) th pyramid convolution unit; and when i is equal to n, performing convolution operation on the feature map currently input to the ith pyramid convolution unit, fusing the feature map currently input to the ith pyramid convolution unit with the feature map currently input to the ith pyramid convolution unit to obtain an output feature map containing multi-scale information, outputting the output feature map to the pooling layer, performing pooling operation, and then obtaining a classification result through the full-connection layer. Preferably, when i is 1,2, …, n-1, in the above-mentioned ith pyramid convolution unit, the feature map F is processedi blockFusion characteristic graph extracted from convolution kernel with first block-1 scale
Figure BDA0003003705090000085
The specific way of fusion is as follows: for feature map Fi block-1After convolution operation, the feature map is fused
Figure BDA0003003705090000086
Combining to fully mine information among different characteristic graphs, so that the information is more complete; wherein, the feature map is fused with
Figure BDA0003003705090000087
The combination mode comprises pixel-by-pixel superposition operation or splicing operation or convolution operation after splicing. It should be noted that the feature map Fi 2And characteristic diagram Fi 1The mode of fusion and the feature map Fi blockFusion characteristic graph extracted from convolution kernel with first block-1 scale
Figure BDA0003003705090000088
The fusion is performed in the same manner, which is not described herein. Further, for each fused feature map
Figure BDA0003003705090000089
The specific way of fusing the feature map which is currently input to the ith pyramid convolution unit after the splicing operation is performed is as follows: combining each fused feature map
Figure BDA00030037050900000810
And splicing according to channels, changing the number of characteristic channels of the spliced characteristic diagram in a convolution mode to keep the characteristic channels consistent with the number of channels of the characteristic diagram currently input to the ith pyramid convolution unit, and then superposing and summing the characteristic channels and the characteristic diagram currently input to the ith pyramid convolution unit pixel by pixel to obtain an output characteristic diagram containing multi-scale information.
S2, inputting a training set collected according to a preset classification task into the image classification model for training, to obtain a trained image classification model. Preferably, the input image may be an image obtained by scaling the original sample images in the training set, so as to improve computational efficiency and speed up training. In this embodiment, the cross-entropy loss is taken as the total loss function, specifically:

    L = -(1/Num) · Σ_{p=1}^{Num} Σ_{q=1}^{η} y_{p,q} · log(x_{p,q})

where η is the number of output categories, Num is the batch size of the images in the training set, x_{p,q} is the predicted probability, generated by the softmax classification function, that the p-th sample belongs to class q, and y_{p,q} is the corresponding label of the p-th sample for class q.
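A brief sketch of one training step under this loss; PyTorch's nn.CrossEntropyLoss applies the softmax internally and averages over the batch, which matches the formula above for one-hot labels y_{p,q}. The model, loader, and optimizer names are placeholders.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cuda"):
    criterion = nn.CrossEntropyLoss()  # cross entropy over eta classes, averaged over Num
    model.train()
    for images, labels in loader:      # labels: class indices (one-hot y_{p,q} implied)
        images, labels = images.to(device), labels.to(device)
        logits = model(images)         # x_{p,q} = softmax(logits) inside the loss
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```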
Preferably, in order to improve the robustness of the image classification model, the output end of the i-th pyramid convolution unit is further connected to its own input end; the i-th pyramid convolution unit is also used for re-inputting the obtained output feature map containing multi-scale information to itself, before outputting it to the next pyramid convolution unit or the pooling layer, so as to further extract features; after several repetitions, the result is output to the next pyramid convolution unit or the pooling layer.
Preferably, as shown in fig. 2, in order to solve the problem of information redundancy generated in the feature fusion process, further highlight useful information, and suppress useless information, the image classification model further includes n mixed attention modules; when i = 1, 2, …, n-1, the i-th mixed attention module is located between the i-th pyramid convolution unit and the (i+1)-th pyramid convolution unit; when i = n, the i-th mixed attention module is located between the i-th pyramid convolution unit and the pooling layer. The mixed attention module comprises a spatial attention network and a channel attention network connected in series or in parallel, and is used for screening, in the spatial and channel dimensions, the output feature map containing multi-scale information input by the pyramid convolution unit to obtain a feature map F_sa, thereby suppressing redundant background information and highlighting feature information beneficial to the classification result.
Taking a mixed attention module composed of a cascaded spatial attention network and channel attention network as an example, the output feature map u containing multi-scale information output by the i-th pyramid convolution unit is input into the mixed attention module, in which the following operations are performed:

In the channel attention network, a global average pooling operation is first performed on the output feature map u channel by channel to extract the global spatial information on each channel, where the global spatial information of the c-th channel is

    z_c = (1/(H_ca × W_ca)) Σ_{i=1}^{H_ca} Σ_{j=1}^{W_ca} u_c(i, j)

H_ca and W_ca are the height and width of u_c, u_c is the feature map corresponding to the c-th channel of the output feature map u, and z is the one-dimensional vector containing the global spatial information of all channels. Then, the channel weights of the global spatial information on the channels are learned through a weight-shared one-dimensional convolution kernel, the obtained weight being

    w = δ(1D_Conv(z))

where δ(·) is the Sigmoid function and 1D_Conv denotes a one-dimensional convolution operation on z with a kernel of size k_1D. It should be noted that, to realize adaptive selection of the convolution kernel size, the size k_1D of the convolution kernel in the channel attention network and the number C_1D of feature channels of the input feature map satisfy:

    k_1D = | log2(C_1D)/γ + b/γ |_odd

where γ and b are learning parameters, set to 2 and 1 respectively in this embodiment, and |e|_odd denotes the odd number nearest to e; the channel attention reduces the computation and parameter count while improving the classification result. Finally, the learned channel weights are applied to the corresponding channels of the output feature map to obtain the channel attention weight feature map F_ca; specifically, F_ca = u · w. The channel attention weight feature map F_ca is then input into the spatial attention network.
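A minimal sketch of the channel attention network as described, with the one-dimensional convolution shared across channels; k_1d would be chosen by the adaptive rule above, and the class name is an assumption.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """GAP per channel -> weight-shared 1D conv -> sigmoid -> reweight channels."""
    def __init__(self, k_1d):
        super().__init__()
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k_1d, padding=k_1d // 2, bias=False)
        self.sigmoid = nn.Sigmoid()   # delta(.)

    def forward(self, u):                       # u: (B, C, H, W)
        z = u.mean(dim=(2, 3))                  # global average pooling: (B, C)
        w = self.conv1d(z.unsqueeze(1))         # 1D conv across channels: (B, 1, C)
        w = self.sigmoid(w).squeeze(1)          # channel weights w: (B, C)
        return u * w.view(u.size(0), -1, 1, 1)  # F_ca = u · w, applied per channel
```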
In the spatial attention network, average pooling and maximum pooling operations are performed on the feature map F_ca along its channel axis to quickly capture context information, generating two 2D maps F_avg ∈ R^{1×H_sa×W_sa} and F_max ∈ R^{1×H_sa×W_sa}, where H_sa and W_sa are the height and width of the feature maps generated by the spatial attention network. F_avg and F_max are then concatenated along channels to generate a two-channel feature map, and a convolution with a kernel of preset size is applied to it to generate the spatial attention weight feature map M(F_ca) ∈ R^{1×H_sa×W_sa}. Finally, the channel attention weight feature map F_ca and the spatial attention weight feature map M(F_ca) are multiplied pixel by pixel (i.e., dot multiplication) to obtain the feature map F_sa. The preset kernel size is chosen such that, after the convolution of the two-channel feature map, the size of the resulting map is consistent with the size of the channel attention weight feature map F_ca, so that the subsequent dot-product operation can be realized.
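A matching sketch of the spatial attention network; the sigmoid on the spatial map is an assumption (the text above specifies only the convolution), and the default kernel size of 7 is likewise illustrative.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-wise avg/max pooling -> concat -> conv -> spatial weight map."""
    def __init__(self, kernel_size=7):             # preset size; padding keeps H, W
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()                # assumed normalization of the map

    def forward(self, f_ca):                       # f_ca: (B, C, H, W)
        f_avg = f_ca.mean(dim=1, keepdim=True)     # F_avg: (B, 1, H, W)
        f_max = f_ca.max(dim=1, keepdim=True).values  # F_max: (B, 1, H, W)
        m = self.sigmoid(self.conv(torch.cat([f_avg, f_max], dim=1)))  # M(F_ca)
        return f_ca * m                            # F_sa = F_ca · M(F_ca), pixel-wise
```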
It should be noted that, in the present embodiment, the channel attention network and the spatial attention network are combined in cascade to form the hybrid attention module; the two attention networks can also be reasonably combined in parallel or in other ways.

When the parallel mode is adopted, the channel attention network and the spatial attention network each process, according to the operations above, the output feature map u containing multi-scale information input by the i-th pyramid convolution unit, yielding a channel attention weight feature map and a spatial attention weight feature map respectively; the two weight feature maps are then concatenated along channels and a convolution operation is applied to obtain the feature map F_sa. The order of the channel-wise concatenation and the convolution operation is not limited, as long as the output dimension is consistent with the dimension of the output feature map u.
Preferably, as shown in FIG. 3, the output end of the i-th hybrid attention module is further connected to the input end of the i-th pyramid convolution unit; the i-th mixed attention module is also used for re-inputting the obtained feature map F_sa to the i-th pyramid convolution unit so that features are further extracted from F_sa; after several repetitions, the result is output to the next pyramid convolution unit or the pooling layer, so as to improve the robustness of the image classification model.
It should be noted that a pyramid convolution unit and the mixed attention module connected to its output may be referred to as a mixed attention pyramid module. In order to improve the robustness of the network model, mixed attention pyramid modules composed of convolution kernels of different numbers and depths are cascaded together to form the classification model of the invention, and the input image is processed by the different mixed attention pyramid modules, each repeated multiple times, so as to generate the final classification prediction. The number of repetitions of feature extraction in each mixed attention pyramid module, and the size and number of convolution kernels in each pyramid convolution unit, can be adjusted according to the actual task.
Further, taking the common ophthalmic disease glaucoma as an example, optical coherence tomography (OCT) is often used to help clinicians identify the type of a patient's anterior chamber angle (ACA), i.e., open angle, narrow angle, or closed angle, because of its advantages of being non-invasive, comfortable, high-resolution, and non-contact; however, owing to individual differences, the region occupied by the anterior chamber angle in an OCT image fluctuates within a certain range. If the anterior chamber angle is small, a single convolution kernel can hardly capture the characteristic information of tiny details accurately; meanwhile, if the information redundancy arising in the feature fusion process is ignored, useful information cannot be highlighted and useless information cannot be suppressed, which ultimately affects the accurate prediction of the anterior chamber angle type. The image classification model provided by the invention therefore comprises a plurality of pyramid convolution units, which extract image features of different granularity with a multi-scale scheme. In each module, the image is input into a pyramid convolution module composed of convolution kernel filters of different sizes and depths, and information of different scales is extracted from the input image. Then, the feature maps extracted by the convolution kernels of all scales are sequentially fused, through convolution cross-connections, with the fused feature map extracted by the preceding kernel to obtain the fused feature maps extracted by the convolution kernels of each scale; the correlation among feature maps is further mined to obtain output feature maps containing information of different scales, thereby completing the feature extraction of the convolution kernels of all sizes. Next, a feature-map combination operation concatenates the output feature maps containing different scale information, and a 1×1 convolution changes the number of channels of the concatenated features. Finally, the combined feature map is summed pixel by pixel with the image input into the pyramid convolution module.
In order to better verify the accuracy of the classification model constructed by the invention, the angle-closure glaucoma dataset provided at the 2019 international conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) is taken as the training data source; 1341 images are randomly selected from it and cropped into 2682 anterior chamber angle images. The dataset provides two gold-standard labels, open-angle anterior chamber angle and angle-closure anterior chamber angle; on this basis, the angle-closure class is further divided into narrow-angle and closed-angle anterior chamber angles. To avoid the training convergence difficulties caused by unbalanced data distribution, the original data are translated and rotated through data augmentation, yielding 1536 open-angle, 1214 narrow-angle, and 1458 closed-angle anterior chamber angle images; the final training, validation, and test sets contain 3367, 419, and 422 images, respectively.
In order to further demonstrate the advantages of the invention, the glaucoma anterior chamber angle dataset is used to evaluate the classification performance of the model constructed by the invention against current mainstream deep learning classification methods. The evaluation indices include the accuracy ACC, the average sensitivity SEN, the average specificity SPE, and the average balanced accuracy BACC, defined as follows:

    ACC = (Σ_{s=1}^{3} TP_s) / N_test

    SEN = (1/3) Σ_{s=1}^{3} TP_s / (TP_s + FN_s)

    SPE = (1/3) Σ_{s=1}^{3} TN_s / (TN_s + FP_s)

    BACC = (1/3) Σ_{s=1}^{3} (1/2) [ TP_s/(TP_s + FN_s) + TN_s/(TN_s + FP_s) ]

where N_test is the number of images in the test set, and TP_s, TN_s, FP_s, FN_s (s ∈ {1,2,3}) denote the numbers of true positives, true negatives, false positives, and false negatives, respectively, when the s-th class is considered positive and the remaining classes negative.
It should be noted that, in this embodiment, the number n of pyramid convolution units is 4. The first pyramid convolution unit has n convolution kernels, with scales 3×3, 5×5, …, (2n+1)×(2n+1); the second pyramid convolution unit has n-1 convolution kernels, with scales 3×3, 5×5, …, (2n-1)×(2n-1); the (n-1)-th pyramid convolution unit has 2 convolution kernels, with scales 3×3 and 5×5; and the n-th pyramid convolution unit has 1 convolution kernel, with scale 3×3. In each pyramid convolution unit, the feature map extracted by each scale's convolution kernel is sequentially fused, through convolution cross-connections, with the fused feature map extracted by the preceding kernel to obtain the fused feature maps extracted by the convolution kernels of each scale, thereby obtaining output feature maps containing information of different scales and completing the feature extraction of all convolution kernels in turn. That is, the feature map F_i^2 extracted by the second-scale convolution kernel is fused with the feature map F_i^1 extracted by the first-scale kernel to obtain the fused feature map M_i^2; the feature map F_i^3 extracted by the third-scale kernel is fused with M_i^2 to obtain the fused feature map M_i^3; the feature map F_i^4 extracted by the fourth-scale kernel is fused with M_i^3 to obtain the fused feature map M_i^4; and so on. Note that the scale of the preceding convolution kernel is smaller than that of the current one. Specifically, in this embodiment, as shown in fig. 4, taking 3 convolution kernels in a pyramid convolution unit as an example and operating in the 3×3 convolution cross-connection manner, the fused feature map extracted by the block-th scale convolution kernel is (taking pixel-by-pixel addition as the combination):

    M_i^block = M_i^{block-1} ⊕ K_{3×3} ⊗ F_i^block, with M_i^1 = F_i^1

where K_{3×3} is a convolution kernel of size 3×3, ⊗ denotes the convolution operation, and ⊕ denotes pixel-by-pixel addition. After the fused feature maps of all scales are extracted, the feature-map combination operation concatenates the fused feature maps M_i^1, …, M_i^{n-i+1} along channels, and a 1×1 convolution changes the number of channels of the concatenated features so that they can be summed pixel by pixel with the feature map currently input to the i-th pyramid convolution unit, giving the output feature map containing multi-scale information. In this embodiment, the number of repetitions of feature extraction is 3 for the first mixed attention pyramid module, composed of the first pyramid convolution unit and the first mixed attention module; 4 for the second mixed attention pyramid module, composed of the second pyramid convolution unit and the second mixed attention module; 6 for the third; and 3 for the fourth. A hypothetical assembly is sketched below.
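Under the assumptions of the earlier sketches (PyramidConvUnit and RepeatedUnit), the embodiment's configuration could be assembled as follows; the attention modules are omitted for brevity and all names are illustrative.

```python
# Hypothetical assembly of the embodiment: the i-th unit uses 4, 3, 2, 1 kernel
# scales and is repeated 3, 4, 6, 3 times, reusing the sketches given earlier.
scales = [4, 3, 2, 1]
repeats = [3, 4, 6, 3]
units = [
    RepeatedUnit(PyramidConvUnit(channels=64, num_scales=s), r)
    for s, r in zip(scales, repeats)
]
```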
Table 1 compares the classification performance of the classification model constructed by the present invention (referred to herein as HapcNet) with that of different mainstream networks (VGG-16, ResNet-50, DenseNet-121, MobileNet, EfficientNet-B7, and PyConvNet-50) on the anterior chamber angle test set. EfficientNet-B7 is the B7 variant of EfficientNet, and the numbers in the other network names indicate the number of layers; for example, VGG-16 denotes a 16-layer VGG network. As can be seen from Table 1, the algorithms with the most prominent classification performance are EfficientNet, PyConvNet, and the proposed HapcNet, which outperform the other four deep learning methods on most indices. Compared with the EfficientNet and PyConvNet methods, the proposed HapcNet improves the ACC value by about 1.47% and 1.66%, respectively. Although the differences between the networks on the averaged indices are not significant, VGG performs the worst, with a value of 0.9933, while the proposed HapcNet reaches 0.9998, obtaining the best classification performance among the compared networks.
TABLE 1: Classification performance of HapcNet and the compared networks on the anterior chamber angle test set (table image not reproduced here).
Furthermore, in order to display more intuitively the superiority of the method over the others, experiments were conducted with the proposed HapcNet and each comparative deep learning model. Fig. 5 shows the accuracy curves of HapcNet and each comparative deep learning model on the anterior chamber angle validation set, where the abscissa Epochs is the number of iterations and the ordinate Accuracy is the accuracy. Fig. 6 shows the confusion matrices of HapcNet and each comparative deep learning model on the anterior chamber angle test set, where "0", "1", and "2" represent the open angle, narrow angle, and closed angle, respectively; in fig. 6, (a) is the confusion matrix of the deep learning model VGG-16 on the anterior chamber angle test set, (b) that of ResNet-50, (c) that of DenseNet-121, (d) that of MobileNet, (e) that of EfficientNet-B7, (f) that of PyConvNet-50, and (g) that of the HapcNet provided by the invention. From fig. 5 it can be seen that the proposed HapcNet achieves better convergence accuracy than the compared deep learning models and provides an extremely competitive convergence speed. As can be seen from the confusion matrices in fig. 6, HapcNet, EfficientNet-B7, and PyConvNet-50 achieve superior classification performance on the anterior chamber angle test dataset compared with the remaining mainstream networks. Specifically, for the open-angle anterior chamber angle, the proposed HapcNet provides the second-best accuracy of 98.7%, while EfficientNet-B7 obtains the best classification accuracy of 99.4%; for the narrow-angle anterior chamber angle, the proposed HapcNet provides the best accuracy of 100%, while EfficientNet-B7 provides only the second-best accuracy; for the closed-angle anterior chamber angle, the proposed HapcNet still achieves the best classification accuracy. In conclusion, compared with the other deep learning models, the proposed HapcNet has advantages in the classification of the anterior chamber angle dataset.
Embodiment 2
An image classification method, comprising: inputting the image to be classified into the image classification model constructed by the construction method of the image classification model provided in Embodiment 1, to obtain a classification result. Preferably, before the image to be classified is input to the image classification model, it is scaled to improve the computational efficiency.
The related technical solutions are the same as those in Embodiment 1 and are not repeated here.
Embodiment 3
A machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the method of constructing an image classification model provided in Embodiment 1 and/or the image classification method provided in Embodiment 2.
The related technical features are the same as those of Embodiment 1 and Embodiment 2 and are not repeated here.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A construction method of an image classification model is characterized by comprising the following steps:
S1, building an image classification model; the image classification model comprises: a convolution layer, a first pyramid convolution unit, a second pyramid convolution unit, …, an n-th pyramid convolution unit, a pooling layer, and a fully connected layer which are sequentially cascaded; the first convolution layer is used for extracting an initial feature map of an input image and outputting it to the first pyramid convolution unit; the i-th pyramid convolution unit is used for performing further feature extraction on the feature map currently input to it using n-i+1 convolution kernels of different scales, and then sequentially fusing the feature map extracted by each scale's convolution kernel with the fused feature map extracted by the preceding kernel to obtain the fused feature maps extracted by the convolution kernels of each scale, i.e., a group of feature maps containing different scale information; the feature maps containing different scale information are fused with the feature map currently input to the i-th pyramid convolution unit to obtain an output feature map containing multi-scale information; where i = 1, 2, …, n, and each convolution kernel has a larger scale than the one preceding it;
and S2, inputting a training set collected according to a preset classification task into the image classification model for training to obtain a trained image classification model.
2. The method of constructing an image classification model according to claim 1, wherein the input image is an image obtained by scaling original sample images in the training set.
3. The method for constructing an image classification model according to claim 1, wherein the feature map extracted by the convolution kernel of the block-th scale in the i-th pyramid convolution unit is denoted F_i^block, block = 1, …, n-i+1; for the i-th pyramid convolution unit, when i = 1, 2, …, n-1, the feature map F_i^2 is fused with the feature map F_i^1 to obtain the fused feature map M_i^2 extracted by the second-scale convolution kernel; starting from block = 3, the feature map F_i^block is sequentially fused with the fused feature map M_i^{block-1} extracted by the (block-1)-th-scale convolution kernel to obtain the fused feature map M_i^block extracted by the block-th-scale convolution kernel; after the fused feature maps of all scales are extracted, all the fused feature maps M_i^block are concatenated and then fused with the feature map currently input to the i-th pyramid convolution unit to obtain an output feature map containing multi-scale information, which is output to the (i+1)-th pyramid convolution unit; when i = n, a convolution operation is performed on the feature map currently input to the i-th pyramid convolution unit and the result is fused with that input feature map to obtain the output feature map containing multi-scale information, which is output to the pooling layer; after the pooling operation, the classification result is obtained through the fully connected layer.
4. The method for constructing an image classification model according to claim 3, wherein, in the i-th pyramid convolution unit, the specific way of fusing a feature map A with a feature map B or a fused feature map B is as follows: a convolution operation is performed on A and the result is combined with B; the combination with B comprises a pixel-by-pixel addition operation, or a concatenation operation, or a concatenation followed by a convolution operation;

the specific way of fusing the concatenation of the fused feature maps M_i^block with the feature map currently input to the i-th pyramid convolution unit is as follows: the fused feature maps M_i^block are concatenated along channels, the number of feature channels of the concatenated feature map is changed by convolution so that it is consistent with the number of channels of the feature map currently input to the i-th pyramid convolution unit, and the result is then summed pixel by pixel with the feature map currently input to the i-th pyramid convolution unit to obtain the output feature map containing multi-scale information.
5. The method for constructing an image classification model according to any one of claims 1 to 4, wherein the output end of the ith pyramid convolution unit is further connected to the input end of the ith pyramid convolution unit;
the ith pyramid convolution unit is further used for inputting the output feature map containing the multi-scale information to the ith pyramid convolution unit again before outputting the output feature map containing the multi-scale information to the next pyramid convolution unit or the pooling layer so as to further extract features of the output feature map containing the multi-scale information; and after repeating for multiple times, outputting the result to the next pyramid convolution unit or the pooling layer.
6. The method for constructing an image classification model according to any one of claims 1 to 4, wherein the image classification model further comprises n mixed attention modules; when i = 1, 2, …, n-1, the i-th mixed attention module is located between the i-th pyramid convolution unit and the (i+1)-th pyramid convolution unit; when i = n, the i-th mixed attention module is located between the i-th pyramid convolution unit and the pooling layer;

the mixed attention module comprises a spatial attention network and a channel attention network connected in series or in parallel, and is used for screening, in the spatial and channel dimensions, the output feature map containing multi-scale information input by the pyramid convolution unit to obtain a feature map F_sa, thereby suppressing redundant background information.
7. The method for constructing an image classification model according to claim 6, wherein the output end of the ith mixed attention module is further connected to the input end of the ith pyramid convolution unit;
the ith mixed attention module is also used for converting the feature map FsaRe-inputting the feature map to the ith pyramid convolution unit to obtain the feature map FsaFurther extracting features; and after repeating for multiple times, outputting the result to the next pyramid convolution unit or the pooling layer.
8. The method for constructing an image classification model according to claim 6, wherein the channel attention network is used to perform a global average pooling operation on the input feature map channel by channel to extract the global spatial information on each channel; channel weights for the global spatial information on all channels are then learned through a weight-shared one-dimensional convolution kernel, and the learned channel weights are applied to the corresponding channels of the input feature map so as to screen feature information in the channel dimension;
the size k_{1D} of the convolution kernel in the channel attention network and the number C_{1D} of feature channels of the input feature map satisfy:

$$k_{1D} = \left| \frac{\log_2(C_{1D})}{\gamma} + \frac{b}{\gamma} \right|_{odd}$$

wherein γ and b are learning parameters, and |e|_{odd} denotes the odd number nearest to e.
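Claim 8's rule makes the 1D kernel size a function of the channel count, in the spirit of ECA-style channel attention; a sketch under that reading, with γ = 2 and b = 1 as assumed values:

```python
# ECA-style reading of claim 8; gamma = 2 and b = 1 are assumed values.
import math
import torch
import torch.nn as nn

def nearest_odd(e):
    """|e|_odd: the odd number nearest to e (at least 1)."""
    lo = math.floor(e)
    if lo % 2 == 0:
        lo -= 1
    hi = lo + 2
    return max(1, lo if e - lo <= hi - e else hi)

class ChannelAttention1D(nn.Module):
    def __init__(self, channels, gamma=2.0, b=1.0):
        super().__init__()
        k = nearest_odd(math.log2(channels) / gamma + b / gamma)
        # Weight-shared 1D convolution over the per-channel descriptor.
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                    # x: (N, C, H, W)
        g = x.mean(dim=(2, 3))               # global average pooling per channel
        w = torch.sigmoid(self.conv(g.unsqueeze(1))).squeeze(1)
        return x * w.view(w.size(0), w.size(1), 1, 1)   # screen channel features
```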
9. An image classification method, comprising: inputting an image to be classified into an image classification model constructed by the image classification model construction method according to any one of claims 1 to 8 to obtain a classification result.
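Inference per claim 9 is then a single forward pass; a usage sketch reusing the classes assumed after claim 3 (input shape and class count are placeholders):

```python
# Usage sketch (assumed shapes and class count), reusing the assumed
# PyramidClassifier from the sketch after claim 3.
import torch

model = PyramidClassifier(channels=64, num_classes=10)
model.eval()
with torch.no_grad():
    features = torch.randn(1, 64, 32, 32)    # image to be classified (as a feature map)
    result = model(features).argmax(dim=1)   # classification result
```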
10. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to carry out the method of constructing an image classification model according to any one of claims 1 to 8 and/or the method of image classification according to claim 9.
CN202110356938.7A 2021-04-01 2021-04-01 Image classification model construction method, image classification method and storage medium Active CN113191390B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110356938.7A CN113191390B (en) 2021-04-01 2021-04-01 Image classification model construction method, image classification method and storage medium
PCT/CN2021/086861 WO2022205502A1 (en) 2021-04-01 2021-04-13 Image classification model construction method, image classification method, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110356938.7A CN113191390B (en) 2021-04-01 2021-04-01 Image classification model construction method, image classification method and storage medium

Publications (2)

Publication Number Publication Date
CN113191390A true CN113191390A (en) 2021-07-30
CN113191390B CN113191390B (en) 2022-06-14

Family

ID=76974445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110356938.7A Active CN113191390B (en) 2021-04-01 2021-04-01 Image classification model construction method, image classification method and storage medium

Country Status (2)

Country Link
CN (1) CN113191390B (en)
WO (1) WO2022205502A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758029B * 2023-06-15 2024-07-26 Guangdong Lingdun Zhilian Information Technology Co., Ltd. Window cleaning machine movement control method and system based on machine vision

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190228268A1 (en) * 2016-09-14 2019-07-25 Konica Minolta Laboratory U.S.A., Inc. Method and system for cell image segmentation using multi-stage convolutional neural networks
CN110188685B * 2019-05-30 2021-01-05 Yanshan University Target counting method and system based on dual-attention multi-scale cascade network
CN110992361A * 2019-12-25 2020-04-10 AInnovation (Chengdu) Technology Co., Ltd. Engine fastener detection system and detection method based on cost balance
CN111739075B * 2020-06-15 2024-02-06 Dalian University of Technology Deep network lung texture recognition method combining multi-scale attention

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200005151A1 (en) * 2016-12-30 2020-01-02 Nokia Technologies Oy Artificial neural network
CN110232394A * 2018-03-06 2019-09-13 South China University of Technology A multi-scale image semantic segmentation method
CN109034210A * 2018-07-04 2018-12-18 Academy of Broadcasting Science, SAPPRFT Object detection method based on hyper-feature fusion and a multi-scale pyramid network
CN109598269A * 2018-11-14 2019-04-09 Tianjin University A semantic segmentation method based on multi-resolution input and pyramid dilated convolution
CN111507408A * 2020-04-17 2020-08-07 Shenzhen SenseTime Technology Co., Ltd. Image processing method and device, electronic equipment and storage medium
CN112396645A * 2020-11-06 2021-02-23 Huazhong University of Science and Technology Monocular image depth estimation method and system based on convolution residual learning
AU2020103901A4 * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field
CN112418176A * 2020-12-09 2021-02-26 Jiangxi Normal University Remote sensing image semantic segmentation method based on a pyramid pooling multi-level feature fusion network
CN112287924A * 2020-12-24 2021-01-29 Beijing Yizhen Xuesi Education Technology Co., Ltd. Text region detection method and device, electronic equipment and computer storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XINJIANG WANG et al.: "Scale-Equalizing Pyramid Convolution for Object Detection", Computer Vision and Pattern Recognition, 6 May 2020 (2020-05-06), pages 1-16 *
XUMING ZHANG et al.: "Spiking cortical model-based noise detector for switching-based filters", Journal of Electronic Imaging, 2 April 2012 (2012-04-02), pages 013020-1 *
LYU Meng: "Research on Image Classification Algorithms Based on Multi-Scale Convolutional Neural Networks", China Excellent Master's Theses Full-text Database, Information Science and Technology, 15 September 2019 (2019-09-15), pages 138-692 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762251A * 2021-08-17 2021-12-07 Huiying Medical Technology (Beijing) Co., Ltd. Target classification method and system based on attention mechanism
CN113762251B * 2021-08-17 2024-05-10 Huiying Medical Technology (Beijing) Co., Ltd. Attention mechanism-based target classification method and system
CN113963217A * 2021-11-16 2022-01-21 Guangdong Polytechnic Normal University Anterior chamber angle image grading method integrating weakly supervised metric learning
CN113963217B * 2021-11-16 2024-10-01 Guangdong Polytechnic Normal University Anterior chamber angle image grading method integrating weakly supervised metric learning
CN114821121A * 2022-05-09 2022-07-29 Yancheng Institute of Technology Image classification method based on RGB three-component grouping attention weighted fusion
CN114841979A * 2022-05-18 2022-08-02 Artificial Intelligence Research Institute of Dalian University of Technology Deep learning cancer molecular typing prediction method with multi-scale attention fusion
CN114841979B * 2022-05-18 2024-10-01 Artificial Intelligence Research Institute of Dalian University of Technology Deep learning cancer molecular typing prediction method with multi-scale attention fusion
CN115496808A * 2022-11-21 2022-12-20 Zhongshan Ophthalmic Center, Sun Yat-sen University Corneal limbus positioning method and system
CN115496808B * 2022-11-21 2023-03-24 Zhongshan Ophthalmic Center, Sun Yat-sen University Corneal limbus positioning method and system
CN117876797A * 2024-03-11 2024-04-12 China University of Geosciences (Wuhan) Image multi-label classification method, device and storage medium
CN117876797B * 2024-03-11 2024-06-04 China University of Geosciences (Wuhan) Image multi-label classification method, device and storage medium

Also Published As

Publication number Publication date
WO2022205502A1 (en) 2022-10-06
CN113191390B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN113191390B (en) Image classification model construction method, image classification method and storage medium
EP3921776B1 (en) Method and system for classification and visualisation of 3d images
Verma et al. Pneumonia classification using deep learning in healthcare
CN112308200A (en) Neural network searching method and device
US11830187B2 (en) Automatic condition diagnosis using a segmentation-guided framework
Narayanan et al. Understanding deep neural network predictions for medical imaging applications
Olatunji et al. Identification of erythemato-squamous skin diseases using extreme learning machine and artificial neural network
US11875898B2 (en) Automatic condition diagnosis using an attention-guided framework
CN113706544A (en) Medical image segmentation method based on complete attention convolution neural network
Yan et al. Investigation of Customized Medical Decision Algorithms Utilizing Graph Neural Networks
CN114445356A (en) Multi-resolution-based full-field pathological section image tumor rapid positioning method
Shamrat et al. Analysing most efficient deep learning model to detect COVID-19 from computer tomography images
Dhawan et al. Deep Learning Based Sugarcane Downy Mildew Disease Detection Using CNN-LSTM Ensemble Model for Severity Level Classification
CN116958535B (en) Polyp segmentation system and method based on multi-scale residual error reasoning
Padmapriya et al. Computer-Aided Diagnostic System for Brain Tumor Classification using Explainable AI
Zhou et al. Balancing High-performance and Lightweight: HL-UNet for 3D Cardiac Medical Image Segmentation
Chen et al. Cardiac motion scoring based on CNN with attention mechanism
Sineglazov et al. Design of hybrid neural networks of the ensemble structure
Harshini et al. Machine Learning Approach for Various Eye Diseases using Modified Voting Classifier Model
Veeranki et al. Detection and classification of brain tumors using convolutional neural network
Parvathi et al. Diabetic Retinopathy Detection Using Transfer Learning
Saednia et al. An attention-guided deep neural network for annotating abnormalities in chest X-ray images: visualization of network decision basis
Shaik et al. A Deep Learning Framework for Prognosis Patients with COVID-19
Truong et al. A Novel Approach of Using Neural Circuit Policies for COVID-19 Classification on CT-Images
Mahmud et al. Automatic Diagnosis of Malaria from Thin Blood Smear Images using Deep Convolutional Neural Network with Multi-Resolution Feature Fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant