CN115546466A - A weakly supervised image object localization method based on multi-scale salient feature fusion - Google Patents
A weakly supervised image object localization method based on multi-scale salient feature fusion
- Publication number
- CN115546466A (application CN202211201019.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- pyramid
- layer
- network
- cam
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/245—Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects, using rules for classification or partitioning the feature space
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical Field

The present invention relates to a weakly supervised image object localization method based on multi-scale salient feature fusion, and belongs to the field of computer vision.

Background Art

Localization and segmentation of regions of interest (Region of Interest, ROI) in images is a classic problem in computer vision research, and ROI localization and segmentation for natural images has already made great progress. However, for non-natural images in certain specific domains (for example, medical images and pollen grain images), the ROIs are smaller than those in natural images, so ROI localization and segmentation methods designed for natural images are not fully applicable to such images. Research on small-object localization and segmentation for domain-specific images is therefore of great significance.

Current mainstream deep-learning-based methods for small-object localization and segmentation fall into two categories: fully supervised learning and weakly supervised learning. Z. Ning et al. [1] proposed using the saliency maps of the foreground and background of breast ultrasound images to guide a main network and an auxiliary network to learn foreground and background saliency representations respectively, finally fusing the two sets of features to enhance the morphology-learning ability of the segmentation network. However, such fully supervised deep learning methods generally require large labeled datasets, and obtaining pixel-level image labels is tedious and time-consuming; datasets carrying only category information are comparatively easy to obtain, so many works rely only on weak supervision such as image-level labels to achieve object localization and segmentation. In weakly supervised learning, however, the class activation map (Class Activation Map, CAM) obtained from the classification network covers only the most discriminative part of the image and cannot indicate the complete target region, i.e., the localization accuracy of the CAM is low (insufficient activation). To address this, Li Y et al. [2] first used prior knowledge of breast anatomy to constrain the classification network's search space for breast lesion tissue, and then used a level-set algorithm to refine the CAM. But this ignores an important fact: for objects of different scales, the discriminative regions captured by the classification network are not consistent.

To solve the two problems of tedious ROI annotation for small-object images and insufficient CAM activation, the present invention focuses on optimizing the class activation maps output by the classification network under weak supervision. The present invention involves information fusion at two levels: (1) since the lowest-level feature maps of a convolutional neural network carry weak semantic information but strong positional information, they can be fused with the highest-level feature maps to obtain the final feature map of the classification network; (2) since the classification network has different sensitivities to ROIs of different scales, the class activation maps it produces also differ, so fusing the complementary object information in different activation maps can complete the localization of the target region in the image and thereby produce more accurate pseudo-labels for the segmentation task.
References:

[1] Z. Ning, S. Zhong, Q. Feng, W. Chen and Y. Zhang, "SMU-Net: Saliency-Guided Morphology-Aware U-Net for Breast Lesion Segmentation in Ultrasound Image," IEEE Transactions on Medical Imaging, vol. 41, no. 2, pp. 476-490, Feb. 2022, doi: 10.1109/TMI.2021.3116087.

[2] Li Y, Liu Y, Huang L, Wang Z, Luo J. Deep weakly-supervised breast tumor segmentation in ultrasound images with explicit anatomical constraints. Med Image Anal. 2022 Feb;76:102315. doi: 10.1016/j.media.2021.102315. Epub 2021 Nov 28. PMID: 34902792.
Summary of the Invention

Existing fully supervised research on small-object localization and segmentation requires tedious annotation, while existing weakly supervised single-scale research suffers from insufficient CAM activation. To address these problems, the present invention designs a weakly supervised image object localization method based on multi-scale salient feature fusion. Specifically, images at three different scales are obtained by constructing an image pyramid, multi-scale CAMs of the same image are derived from them and fused, and the fused CAM is finally used as weak supervision to train a segmentation network.

The weakly supervised image object localization method based on multi-scale salient feature fusion of the present invention consists of five stages. The first stage is image preprocessing, which mainly unifies the resolution of the images in the dataset. The second stage is image pyramid construction; it comprises three parts: down-sampling the input image (as the source image) to construct the top layer of the image pyramid, up-sampling it to construct the bottom layer, and determining the final number of pyramid levels. The third stage is the acquisition of classifier feature maps; a classifier is first trained for the images at each pyramid level, and then, for the classifier at any level, the feature map of its highest layer is concatenated with that of its lowest layer to obtain a fused feature map. The fourth stage is multi-scale CAM fusion; the multi-scale CAMs of the same image are first obtained as weighted sums of each network's feature maps, then all CAMs are aligned and finally fused to obtain the final CAM of the source image. The fifth stage is target-region prediction; the fused CAM is first converted into a pseudo binary label, the pseudo-label is then used to train a segmentation network, and finally the segmentation network predicts the target region.

The specific scheme of the present invention is shown in Figure 2.

Step 1: Image preprocessing

The purpose of image preprocessing is to unify the size of all images in the dataset. The data considered by the present invention are mainly small-object images, such as public breast image datasets and pollen image datasets. If the image resolutions in the dataset are not uniform, the feature maps produced by the subsequent classification network will also differ in size, while the parameters of the fully connected layer of the classification network cannot adapt to feature maps of different sizes; all input images must therefore be fixed to a uniform size.

Step 2: Image pyramid construction

This step takes an image in the dataset as the source image and constructs a Gaussian pyramid to obtain three scale transformations of the input image. To obtain both more global and more fine-grained information than the original image, the Gaussian pyramid constructed by the present invention adopts a mixed structure of down-sampling and up-sampling.

Step 2.1 Construction of the top layer of the image pyramid: taking the input image as the source image, first apply Gaussian smoothing with a 5*5 Gaussian kernel template, then down-sample the smoothed image by removing the even-numbered rows and columns of the image matrix, finally obtaining an image 1/4 the size of the input image, which serves as the top layer of the image pyramid.

Step 2.2 Construction of the bottom layer of the image pyramid: taking the input image as the source image, first enlarge the image to twice its original size in each direction, filling the newly added rows and columns with the value 0; then multiply the 5*5 Gaussian kernel template by 4 and convolve it with the enlarged image to obtain approximate values for the newly added pixels. The result is an image 4 times the size of the input image, which serves as the bottom layer of the image pyramid.

Step 2.3 Determination of the image pyramid levels: numbers are assigned to the images on the different levels of the image pyramid, with level numbering starting from 0; as the level number increases, the image resolution decreases accordingly. The image pyramid constructed by the present invention has 3 levels; the original image is the second layer, with a corresponding level number of 1.
Step 3: Acquisition of classifier feature maps

In this step, a classifier is trained for each of the three image scales in the image pyramid, so as to obtain class activation maps of the same image at three different scales.

Step 3.1 Classification network training: the present invention selects the classic ResNet50 as the classification network, used to determine the category to which the input image belongs. Since the image pyramid contains images at three different scales, a classifier must ultimately be trained for each of the three differently scaled image datasets.

Step 3.2 Fusion of high- and low-level feature maps: in each classification network, the shallow layers have small receptive fields and extract low-level geometric information such as texture and edges, while the deep layers have large receptive fields and extract more global, deeper semantic information. The present invention therefore aligns and concatenates the highest-level features with the lowest-level features of each classification network, prompting the network to strengthen the low-level features of small target objects and yielding the network's final fused feature map.

Step 4: Multi-scale CAM fusion

This step obtains the CAMs of the three classification networks, aligns them, and then fuses them, finally yielding the fused CAM corresponding to the image.

Step 4.1 Acquisition of the classification network CAM: the final fused feature map obtained in step 3.2 is multiplied by the weight matrix of the fully connected layer of the classification network to obtain the CAM. Since the present invention uses three classification networks, three CAMs at different scales are ultimately obtained for each source image, forming a CAM pyramid.

Step 4.2 Alignment of multiple CAMs: the CAMs at different scales are aligned to the size of the source image to facilitate the subsequent fusion operation.

Step 4.3 Fusion of multiple CAMs: for any pixel of the fused CAM, the present invention adopts the following decision mechanism: if at least two independent CAMs have an activation value for some category at that point greater than or equal to the threshold, the pixel is considered to belong to that category. If the pixel is not assigned to any category after this decision, it is ignored; if the pixel is assigned to multiple categories, it is assigned to the category with the largest average activation value over the three independent CAMs at that point.
Step 5: ROI prediction

In this step, the fused CAM obtained in step 4.3 is first converted into a pseudo-label, a localization and segmentation network for the image ROI is then trained on the pseudo-label, and finally the network is used to predict the ROI.

Step 5.1 Conversion of the fused CAM into a pseudo-label: the fused CAM is converted into a pseudo binary mask used for segmentation network training. The present invention adopts the following rule: if a pixel of the fused CAM belongs to the non-target class, its value is set to 0; otherwise it is set to 1.

Step 5.2 Segmentation network training and prediction: the image segmentation network is trained on the pseudo binary labels obtained in step 5.1; the segmentation network architecture selected by the present invention is U-Net. Finally, the trained network is used to perform ROI segmentation prediction on the test set.

Compared with the prior art, the beneficial effects of the present invention are as follows:

1. The weakly supervised image object localization method based on multi-scale salient feature fusion adopted by the present invention effectively avoids the pixel-level ROI annotation required under fully supervised learning, which greatly reduces the data-annotation workload.

2. The method concatenates the highest-level and lowest-level feature maps obtained by each classification network, which strengthens the network's learning of the low-level features of small target objects, so that the network attends to more features of small objects.

3. The method constructs a mixed down-sampling/up-sampling pyramid structure from the source image, thereby simultaneously obtaining features that are both more global and more fine-grained than those of the original image; fusing these features yields a more complete CAM.
Brief Description of the Drawings

Figure 1 is a schematic diagram of the image pyramid constructed by the present invention.

Figure 2 is the overall flowchart of the method proposed by the present invention.

Detailed Description

Embodiments of the present invention are described in detail below with reference to Figure 2 of the accompanying drawings:
The weakly supervised image object localization method based on multi-scale salient feature fusion of the present invention consists of five stages. The first stage is image preprocessing, which mainly unifies the resolution of the dataset. The second stage is image pyramid construction; it comprises three parts: down-sampling the input image (as the source image) to construct the top layer of the image pyramid, up-sampling it to construct the bottom layer, and determining the final number of pyramid levels. The third stage is the acquisition of classifier feature maps; a classifier is first trained for the images at each pyramid level, and then, for the classifier at any level, the feature map of its highest layer is concatenated with that of its lowest layer to obtain a fused feature map. The fourth stage is multi-scale CAM fusion; the multi-scale CAMs of the same image are first obtained as weighted sums of each network's feature maps, then all CAMs are aligned and finally fused to obtain the final CAM of the source image. The fifth stage is ROI prediction; the fused CAM is first converted into a pseudo binary label, the pseudo-label is then used to train a segmentation network, and finally the segmentation network predicts the ROI.

Specifically, the method comprises the following steps:

Step 1: Image preprocessing

The purpose of image preprocessing is to unify the size of all images in the dataset. The data considered by the present invention are mainly small-object images, such as public breast image datasets and pollen image datasets. If the image resolutions in the dataset are not uniform, the feature maps produced by the last convolutional layer of the subsequent classification network will differ in size, whereas the parameter dimensions connecting the fully connected layer to the preceding layer are fixed in advance, i.e., the fully connected layer cannot adapt to different feature-map sizes; all input images must therefore be fixed to one size. To minimize the loss of image information and to facilitate the convolution operations in the subsequent classification network, we set the size of all images to 512*512.

Step 2: Image pyramid construction

This step constructs a Gaussian pyramid to obtain three scale transformations of the input image. To obtain both more global and more fine-grained information than the original image, the Gaussian pyramid constructed by the present invention adopts a mixed structure of down-sampling and up-sampling. Specifically, the construction process comprises two parts: first, the width and height of the input image are each down-sampled to 50% of the original via the Gaussian pyramid, yielding a 256*256 image as the top layer of the pyramid; second, the width and height of the input image are each up-sampled to 200% of the original via the Gaussian pyramid, yielding a 1024*1024 image as the bottom layer of the pyramid.
Step 2.1 Construction of the top layer of the image pyramid: for a given 512*512 original image, we down-sample it to construct the top layer of the Gaussian pyramid from an image 1/4 the size of the original, with a corresponding resolution of 256*256. The specific process is given by formula (1): first, Gaussian smoothing is applied to the 512*512 original image; unlike simple smoothing, Gaussian smoothing assigns higher weights to pixels near the center point when computing the weighted average of the surrounding pixels. The smoothed image is then down-sampled by removing the even-numbered rows and columns of the image matrix, yielding a 256*256 image.

$$G_{l}(x,y)=\sum_{m=-2}^{2}\sum_{n=-2}^{2}W(m,n)\,G_{l-1}(2x+m,\,2y+n)\qquad(1\le l\le L,\ 0\le x\le R_{l},\ 0\le y\le C_{l})\qquad(1)$$

where G_l is the image on level l of the Gaussian pyramid (Gaussian pyramid levels are numbered from 0), L is the level number of the top of the Gaussian pyramid, R_l and C_l are the numbers of rows and columns of the level-l image, and W(m,n) is the entry in row m, column n of the Gaussian filter template, generally of size 5*5. The present invention uses the two-dimensional separable 5*5 Gaussian kernel widely used in the unsharp-masking algorithm to smooth the original image; its values are given in (2).

$$W=\frac{1}{256}\begin{bmatrix}1&4&6&4&1\\4&16&24&16&4\\6&24&36&24&6\\4&16&24&16&4\\1&4&6&4&1\end{bmatrix}\qquad(2)$$

Step 2.2 Construction of the bottom layer of the image pyramid: for a given 512*512 original image, we up-sample it to construct the lowest layer of the Gaussian pyramid from an image 4 times the size of the original, with a corresponding resolution of 1024*1024. The specific process is as follows: first, the image is enlarged to twice its size in each direction, with the newly added rows and columns filled with the value 0; then the Gaussian kernel used for down-sampling is multiplied by 4 and convolved with the enlarged image to obtain approximate values for the newly added pixels, finally yielding a 1024*1024 image.

Step 2.3 Determination of the image pyramid levels: after the image pyramid has been constructed, the level l in the Gaussian pyramid corresponding to an image of resolution w*h is determined by formula (3).

$$l=l_{0}+\log_{2}\frac{512}{w}\qquad(3)$$

where l_0 is the level of the 512*512 original image in the image pyramid; since the three image scales in the Gaussian pyramid are 1024, 512 and 256, the level corresponding to the original image is l_0 = 1. From formula (3), a 1024*1024 image corresponds to level 0, i.e., the lowest level of the Gaussian pyramid, and a 256*256 image corresponds to level 2, i.e., the top level of the Gaussian pyramid.
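The patent provides no reference code for steps 1-2; the following is a minimal illustrative sketch using OpenCV, whose pyrDown/pyrUp routines apply the Gaussian smoothing plus row/column removal and the zero-insertion plus 4*kernel convolution described above (file paths and function names are examples, not taken from the patent).

```python
import cv2
import numpy as np

def build_mixed_pyramid(image: np.ndarray) -> dict:
    """Build the 3-level mixed Gaussian pyramid of step 2 (sketch).

    Returned keys are the pyramid level numbers of step 2.3:
      0 -> 1024*1024 bottom layer (up-sampled),
      1 -> 512*512 original image,
      2 -> 256*256 top layer (down-sampled).
    """
    # Step 1: unify the resolution of all dataset images to 512*512.
    src = cv2.resize(image, (512, 512), interpolation=cv2.INTER_LINEAR)

    # Step 2.1: Gaussian smoothing + removal of even rows/columns.
    top = cv2.pyrDown(src)      # 256*256
    # Step 2.2: zero insertion + convolution with the kernel multiplied by 4.
    bottom = cv2.pyrUp(src)     # 1024*1024

    return {0: bottom, 1: src, 2: top}

if __name__ == "__main__":
    img = cv2.imread("example_breast_ultrasound.png")   # illustrative path
    for level, im in build_mixed_pyramid(img).items():
        print(level, im.shape)
```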
Step 3: Acquisition of classifier feature maps

In this step, a classifier is trained for each of the three image scales in the image pyramid, so as to obtain class activation maps of the same image at three different scales.

Step 3.1 Classification network training: the classification network is used to determine the category to which the input image belongs. The classification network selected by the present invention is ResNet50, which contains 49 convolutional layers and 1 fully connected layer; each residual block contains three convolutional layers, and the residual structure connects the input directly to later layers, avoiding loss of information. The present invention trains one classifier for each of the three image resolutions in the image pyramid, 256*256, 512*512 and 1024*1024, denoted R1, R2 and R3 respectively.
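The patent names ResNet50 but gives no training code; a minimal sketch of how the three classifiers R1, R2 and R3 might be built and trained with PyTorch/torchvision follows. The number of classes, optimizer, learning rate and epoch count are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torchvision import models

def make_classifier(num_classes: int) -> nn.Module:
    # ResNet50 backbone; only image-level class labels are used (weak supervision).
    net = models.resnet50(weights=None)
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net

# One classifier per pyramid scale: R1 (256), R2 (512), R3 (1024).
classifiers = {s: make_classifier(num_classes=2) for s in (256, 512, 1024)}

def train_classifier(model: nn.Module, loader, epochs: int = 30, device: str = "cuda"):
    """loader yields (image, class_label) pairs at one fixed pyramid scale."""
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```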
Step 3.2 Fusion of high- and low-level feature maps: for each classification network, the shallow feature maps have small receptive fields and extract local, generic features such as texture and edges, i.e., low-level geometric information; as the network deepens, the high-level feature maps have larger receptive fields and extract deeper, more global features, i.e., high-level semantic information. The present invention therefore concatenates the highest-level features with the lowest-level features of each classification network as the network's final output feature map, prompting the network to strengthen the low-level features of small target objects. For classifier R1, let the highest-level feature map produced by its network be f_high^1 and the lowest-level feature map be f_low^1; then the final feature map f^1 of classifier R1 is obtained by formula (4).

$$f^{1}=f_{\mathrm{low}}^{1}\oplus \mathrm{UP}\!\left(f_{\mathrm{high}}^{1}\right)\qquad(4)$$

where UP is the up-sampling operation, i.e., the highest-level feature map is up-sampled to the same size as the lowest-level feature map for ease of subsequent processing, and ⊕ is the lateral-connection operation on feature maps, i.e., element-wise addition. Similarly, the final feature maps f^2 and f^3 of classifiers R2 and R3 are obtained by formulas (5) and (6).

$$f^{2}=f_{\mathrm{low}}^{2}\oplus \mathrm{UP}\!\left(f_{\mathrm{high}}^{2}\right)\qquad(5)$$

$$f^{3}=f_{\mathrm{low}}^{3}\oplus \mathrm{UP}\!\left(f_{\mathrm{high}}^{3}\right)\qquad(6)$$

where f_high^2 and f_low^2 are the highest- and lowest-level feature maps produced by classifier R2, f_high^3 and f_low^3 are the highest- and lowest-level feature maps produced by classifier R3, UP is the up-sampling operation, and ⊕ is the lateral-connection operation on feature maps, i.e., element-wise addition.
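A sketch of the high/low-level feature-map fusion of formulas (4)-(6) in PyTorch. Element-wise addition requires matching channel counts, which the patent does not specify; the 1*1 projection used here is an assumption added for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighLowFusion(nn.Module):
    """f = f_low (+) UP(f_high), as in formulas (4)-(6) (sketch)."""

    def __init__(self, high_channels: int, low_channels: int):
        super().__init__()
        # Assumed 1x1 projection so the two maps have the same channel count.
        self.proj = nn.Conv2d(high_channels, low_channels, kernel_size=1)

    def forward(self, f_high: torch.Tensor, f_low: torch.Tensor) -> torch.Tensor:
        # UP: up-sample the highest-level map to the spatial size of the lowest-level map.
        up = F.interpolate(self.proj(f_high), size=f_low.shape[-2:],
                           mode="bilinear", align_corners=False)
        return f_low + up   # lateral connection: element-wise addition
```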
Step 4: Multi-scale CAM fusion

This step obtains the CAMs of the three classification networks, aligns them, and then fuses them; the final output is the fused CAM corresponding to the image.

Step 4.1 Acquisition of the classification network CAM: for classifier R1, the activation value M_1^c(x,y) of a spatial pixel u(x,y) of the 256*256 image with respect to class c is obtained by formula (7).

$$M_{1}^{c}(x,y)=\sum_{i=1}^{K} w_{i}^{c}\,f_{i}^{1}(x,y)\qquad(7)$$

where i is the channel index of the last convolutional layer of the classification network, K is the number of channels of the last convolutional layer, w_i^c is the weight corresponding to class c for channel i, and f_i^1(x,y) is the value at position (x,y) in channel i of the final fused feature map of classifier R1. Similarly, for classifiers R2 and R3, the activation values of pixel u(x,y) with respect to class c are obtained by formulas (8) and (9) respectively.

$$M_{2}^{c}(x,y)=\sum_{i=1}^{K} w_{i}^{c}\,f_{i}^{2}(x,y)\qquad(8)$$

$$M_{3}^{c}(x,y)=\sum_{i=1}^{K} w_{i}^{c}\,f_{i}^{3}(x,y)\qquad(9)$$

where i is the channel index of the last convolutional layer of the classification network, K is the number of channels of the last convolutional layer, w_i^c is the weight corresponding to class c for channel i, and f_i^2(x,y) and f_i^3(x,y) are the values at position (x,y) in channel i of the final fused feature maps of classifiers R2 and R3 respectively.
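Formulas (7)-(9) amount to a weighted sum over the channels of the fused feature map, using the class weights of the fully connected layer; a short sketch follows (tensor shapes are illustrative, not prescribed by the patent).

```python
import torch

def class_activation_map(fused_feat: torch.Tensor, fc_weight: torch.Tensor, c: int) -> torch.Tensor:
    """Compute M_r^c(x, y) = sum_i w_i^c * f_i(x, y) for one classifier.

    fused_feat: (K, H, W) final fused feature map f^r.
    fc_weight:  (num_classes, K) weight matrix of the fully connected layer.
    Returns an (H, W) class activation map for class c.
    """
    return torch.einsum("k,khw->hw", fc_weight[c], fused_feat)
```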
Step 4.2 Alignment of multiple CAMs: since the inputs to classifiers R1, R2 and R3 form a three-level image pyramid, the sizes of the three resulting class activation maps likewise form an activation-map pyramid. To fuse the three CAMs at different scales they must be aligned; the present invention sets all CAMs to the same size as the original input image, i.e., 512*512 resolution.

Step 4.3 Fusion of multiple CAMs: the three aligned CAMs are fused into the final CAM. For a pixel u(x,y) of the fused class activation map M_agg, the fusion mechanism of the present invention is as follows: if at least two independent activation maps have an activation value for class c at that point greater than or equal to a threshold θ (θ ∈ [0.5, 0.7]), the pixel of M_agg is considered to belong to class c. If the pixel is not assigned to any class after the fusion mechanism, the pixel is ignored; if the pixel is assigned to multiple classes, its final class cla(x,y) is determined according to formula (10).

$$cla(x,y)=\operatorname{index}\!\left(\max\!\left[\frac{1}{P}\sum_{j=1}^{P}M_{j}^{0}(x,y),\ \dots,\ \frac{1}{P}\sum_{j=1}^{P}M_{j}^{c}(x,y),\ \dots,\ \frac{1}{P}\sum_{j=1}^{P}M_{j}^{N}(x,y)\right]\right)\qquad(10)$$

where j is the pyramid level index, P is the total number of pyramid levels (here P = 3), N is the number of classes into which the dataset is divided (excluding the background class), and M_j^c(x,y) is the activation value of pixel u(x,y) with respect to class c in the activation map obtained at pyramid level j. (1/P)Σ_j M_j^0(x,y) is the average activation value of pixel u(x,y) over the P activation maps with respect to the background class (class number 0), (1/P)Σ_j M_j^c(x,y) is its average activation value with respect to class c, and (1/P)Σ_j M_j^N(x,y) is its average activation value with respect to class N. index denotes the indexing operation, i.e., taking the index corresponding to the maximum value in the array, which here also indicates the class of the pixel; for example, if the 0th average activation value in the array is the largest, the extracted index is 0, meaning the pixel belongs to class 0.
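A sketch of the fusion mechanism of step 4.3 and formula (10), assuming the three CAMs have already been resized to 512*512 (step 4.2) and normalized to [0, 1]; the ignore value 255 is an illustrative convention not specified by the patent.

```python
import numpy as np

def fuse_cams(cams: np.ndarray, theta: float = 0.6, ignore_index: int = 255) -> np.ndarray:
    """cams: (P, N+1, H, W) aligned CAMs over background (0) and N target classes.

    Returns an (H, W) class map cla(x, y); unassigned pixels get ignore_index.
    """
    votes = (cams >= theta).sum(axis=0)        # (N+1, H, W): CAMs voting for each class
    candidate = votes >= 2                     # at least two independent CAMs agree
    mean_act = cams.mean(axis=0)               # (N+1, H, W): average activation per class

    cla = np.full(cams.shape[2:], ignore_index, dtype=np.int64)
    assigned = candidate.any(axis=0)           # pixels assigned to at least one class
    # Formula (10): among the assigned classes, keep the one with the largest mean activation.
    ranked = np.where(candidate, mean_act, -np.inf)
    cla[assigned] = ranked.argmax(axis=0)[assigned]
    return cla
```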
Step 5: ROI prediction

In this step, the fused CAM obtained in step 4.3 is first converted into a pseudo-label, a segmentation network is then trained on the pseudo-label, and finally the network is used to predict the ROI.

Step 5.1 Conversion of the fused CAM into a pseudo-label: the fused CAM is converted into a pseudo binary mask M_mask used for segmentation network training. The value of a pixel u(x,y) of the pseudo binary mask M_mask is determined by formula (11).

$$M_{\mathrm{mask}}(x,y)=\begin{cases}0,& cla(x,y)=0\\1,& \text{otherwise}\end{cases}\qquad(11)$$

where cla(x,y) is the class to which pixel u(x,y) belongs, and cla(x,y) = 0 indicates that the pixel belongs to the non-target class.
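Formula (11) as a short sketch, continuing from the class map produced by the fusion step above; pixels that were ignored by the fusion mechanism are treated as background here, a simplification not dictated by the patent.

```python
import numpy as np

def to_pseudo_mask(cla: np.ndarray, ignore_index: int = 255) -> np.ndarray:
    """Pseudo binary mask: 0 for the non-target (background) class, 1 otherwise."""
    return np.where((cla == 0) | (cla == ignore_index), 0, 1).astype(np.uint8)
```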
Step 5.2 Segmentation network training and prediction: the image segmentation network is trained on the pseudo binary labels obtained in step 5.1; the segmentation network architecture selected by the present invention is U-Net. Finally, the trained network is used to perform ROI segmentation prediction on the test set.
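A minimal sketch of step 5.2: training a U-Net on the pseudo binary masks and predicting ROIs on the test set. The `UNet` implementation and the data loader are assumed to be provided elsewhere, and the loss and optimizer choices are illustrative rather than taken from the patent.

```python
import torch
import torch.nn as nn

def train_segmentation(unet: nn.Module, loader, epochs: int = 20, device: str = "cuda"):
    """loader yields (image, pseudo_mask) pairs built in step 5.1 (masks are 0/1)."""
    unet.to(device).train()
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(unet.parameters(), lr=1e-4)
    for _ in range(epochs):
        for images, masks in loader:
            images = images.to(device)
            masks = masks.float().unsqueeze(1).to(device)   # (B, 1, H, W)
            optimizer.zero_grad()
            loss = criterion(unet(images), masks)
            loss.backward()
            optimizer.step()

@torch.no_grad()
def predict_roi(unet: nn.Module, image: torch.Tensor, device: str = "cuda") -> torch.Tensor:
    """Returns the predicted binary ROI mask for a single (C, H, W) test image."""
    unet.to(device).eval()
    logits = unet(image.unsqueeze(0).to(device))
    return (torch.sigmoid(logits) > 0.5).squeeze(0)
```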
The present invention is mainly directed at small-object image data, such as medical images with small lesion regions and pollen image data. By fusing the multi-scale salient features obtained from the image pyramid, the positional and contour information of small target objects can be reinforced, thereby improving the performance of small-object localization and segmentation tasks under weakly supervised settings. The steps described in the specific examples of the present invention may be modified without the system architecture departing from the basic spirit of the present invention. Therefore, the present embodiments are to be regarded as illustrative rather than restrictive, and the scope of the invention is defined by the appended claims rather than by the foregoing description.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211201019.3A CN115546466A (en) | 2022-09-28 | 2022-09-28 | A weakly supervised image object localization method based on multi-scale salient feature fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211201019.3A CN115546466A (en) | 2022-09-28 | 2022-09-28 | A weakly supervised image object localization method based on multi-scale salient feature fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115546466A true CN115546466A (en) | 2022-12-30 |
Family
ID=84731704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211201019.3A Pending CN115546466A (en) | 2022-09-28 | 2022-09-28 | A weakly supervised image object localization method based on multi-scale salient feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115546466A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116665095A (en) * | 2023-05-18 | 2023-08-29 | 中国科学院空间应用工程与技术中心 | Method and system for detecting motion ship, storage medium and electronic equipment |
CN116665095B (en) * | 2023-05-18 | 2023-12-22 | 中国科学院空间应用工程与技术中心 | Method and system for detecting motion ship, storage medium and electronic equipment |
CN117079103A (en) * | 2023-10-16 | 2023-11-17 | 暨南大学 | Pseudo tag generation method and system for neural network training |
CN117079103B (en) * | 2023-10-16 | 2024-01-02 | 暨南大学 | A pseudo-label generation method and system for neural network training |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Cao et al. | A novel attention-guided convolutional network for the detection of abnormal cervical cells in cervical cancer screening | |
CN111027547B (en) | Automatic detection method for multi-scale polymorphic target in two-dimensional image | |
CN111784671B (en) | Pathological image lesion area detection method based on multi-scale deep learning | |
CN111445478B (en) | An automatic detection system and method for intracranial aneurysm area for CTA images | |
WO2024108522A1 (en) | Multi-modal brain tumor image segmentation method based on self-supervised learning | |
CN105551036B (en) | A kind of training method and device of deep learning network | |
CN111931684A (en) | A weak and small target detection method based on discriminative features of video satellite data | |
Han et al. | Automated pathogenesis-based diagnosis of lumbar neural foraminal stenosis via deep multiscale multitask learning | |
CN109035263A (en) | Brain tumor image automatic segmentation method based on convolutional neural networks | |
CN108335303B (en) | A multi-scale palm bone segmentation method applied to palm X-ray films | |
CN104484886B (en) | A kind of dividing method and device of MR images | |
CN112348082B (en) | Deep learning model construction method, image processing method and readable storage medium | |
CN115136189A (en) | Automated detection of tumors based on image processing | |
CN114332572B (en) | Method for extracting breast lesion ultrasonic image multi-scale fusion characteristic parameters based on saliency map-guided hierarchical dense characteristic fusion network | |
CN110648331B (en) | Detection method for medical image segmentation, medical image segmentation method and device | |
CN115546466A (en) | A weakly supervised image object localization method based on multi-scale salient feature fusion | |
CN114600155A (en) | Weakly supervised multitask learning for cell detection and segmentation | |
CN112348059A (en) | Deep learning-based method and system for classifying multiple dyeing pathological images | |
CN115170568B (en) | Automatic segmentation method and system for rectal cancer image and chemoradiotherapy response prediction system | |
CN116309431A (en) | Visual interpretation method based on medical image | |
Sreelekshmi et al. | SwinCNN: An Integrated Swin Trasformer and CNN for improved breast cancer grade classification | |
CN113379691B (en) | A deep learning segmentation method of breast lesions based on prior guidance | |
CN118116576B (en) | Intelligent case analysis method and system based on deep learning | |
Kalyani et al. | Deep learning-based detection and classification of adenocarcinoma cell nuclei | |
CN112614092A (en) | Spine detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |