CN111340807A: Method, system, electronic device and storage medium for extracting core data of lesion location
- Publication number: CN111340807A
- Application number: CN202010413451.3A
- Authority: CN (China)
- Prior art keywords: image, data, core, information entropy, core data
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0012: Biomedical image inspection (under G06T7/00 Image analysis, G06T7/0002 Inspection of images, e.g. flaw detection)
- G06T7/10: Segmentation; edge detection (under G06T7/00 Image analysis)
- G16H30/40: ICT specially adapted for processing medical images, e.g. editing (under G16H30/00 ICT specially adapted for the handling or processing of medical images)
Abstract
Description
Technical Field
The invention relates to the field of smart healthcare, and in particular to a method, a system, an electronic device and a storage medium for extracting core data for lesion localization based on active learning.
Background Art
In recent years, artificial intelligence has matured in both theory and technology and brought many conveniences to daily life; the development of smart healthcare in particular has been rapid. For example, a deep-learning-based algorithm proposed by Google can interpret signs of diabetic retinopathy, and Ni achieved high accuracy in abdominal organ segmentation with deep learning. Such techniques can assist doctors in identifying diseases, greatly reducing their workload and helping them make more accurate diagnoses. These studies demonstrate the effectiveness of deep learning for medical image analysis, but most current research on intelligent diagnosis focuses on disease identification. In medicine, however, lesion location information helps doctors make better diagnoses, and for the treatment of most diseases, obtaining the position of the lesion is indispensable. At present, lesion localization relies mainly on the doctor's judgment, which not only greatly increases the doctor's workload but is also error-prone when the doctor is fatigued, delaying treatment. Locating lesions with deep-learning-based object detection can therefore make smart healthcare more accurate and comprehensive, and better assist doctors in their work.
In essence, deep learning lets a deep network automatically extract features from large amounts of data; data underpins network learning, and its quality and quantity determine network performance. Moreover, image recognition and object detection in the medical field largely rely on fully supervised learning, which requires a large amount of strongly labeled data. In the era of medical big data there is an abundance of medical imaging data, such as X-ray films and CT images, but it includes inferior images suffering from low resolution, heavy noise and similar problems. Besides quality, another important problem is that most of these images are unlabeled: while many medical images carry a disease label, annotations of lesion locations are almost nonexistent, so the development of object detection for medical images is constrained. Object detection is nevertheless an indispensable part of smart medical assistance technology, so overcoming this obstacle is very important. The most direct remedy for both problems is to label all the unlabeled data and train the network on it. However, lesion annotation requires professional medical knowledge and skills and is time-consuming, labor-intensive and expensive, so for massive amounts of unannotated medical imaging data this approach is hard to implement, and poor-quality images may degrade network training. In short, training an object detection model for lesion localization with deep learning requires a large amount of labeled data, whereas existing data vary in quality and are mostly unlabeled; having professionals select and label the data increases doctors' workload and is time-consuming and costly.
For situations in deep learning where strong annotations are unavailable because labeling is too expensive, two mainstream weakly supervised approaches have been proposed: semi-supervised learning and active learning. Semi-supervised learning focuses on learning from cheap, easy-to-produce annotations: labeling is performed automatically or semi-automatically by computer, without human experts. Although this lowers the annotation cost, the labeling results depend heavily on a model trained from the initially labeled subset, so their accuracy cannot be guaranteed. Active learning instead focuses on reducing the number of samples that must be labeled: a query function selects the most valuable unlabeled data for expert annotation, and the target model is trained on a smaller labeled core dataset. Expert participation removes the over-reliance on a baseline model, makes the results more stable, and suits the medical field better. Most active learning research, however, targets image classification; the few studies aimed at object detection require the initial dataset to contain some annotated data and have not been applied to medicine. In short, existing active learning methods for object detection all require partially annotated initial datasets and many rounds of expert interaction, which does not match the realities of the medical field. There is as yet no suitable method for medicine that reduces the training data while preserving model accuracy, localizes lesions, and assists doctors' diagnosis.
Summary of the Invention
The technical problem addressed by the present invention is, in view of the deficiencies of the prior art, to provide a method, a system, an electronic device and a storage medium for extracting core data for lesion localization that extract core data from medical image data carrying no lesion annotations at all, thereby solving the difficulty of lesion localization in smart healthcare caused by large volumes of unlabeled data of uneven quality.
To solve the above technical problem, the technical solution adopted by the present invention is a method for extracting core data for lesion localization, comprising the following steps:
S1. For each image in the medical image dataset, compute and fuse the image's information entropy, contrast value and Inception Score to obtain the image's coreness;
S2. Sort all images in the medical image dataset in descending order of coreness and extract the top k images as core data;
S3. Optimize the information entropy calculation using the previous batch of core data together with pathology-free medical data;
S4. Repeat steps S1 to S3 until a suitable amount of core data has been extracted.
The method of the invention needs no large volume of lesion-annotated data and thus overcomes the prior-art problems of uneven data quality, scarce lesion annotations, and troublesome, expensive labeling; it can extract core data from medical image data initially carrying no lesion annotation whatsoever. To weigh each image's importance under the different evaluation metrics while preserving the characteristics of the raw values, the invention designs a mean fusion algorithm that computes the mean of each evaluation metric; the coreness of image i in the medical image dataset is calculated as:
coreness_i = E_i / ( (1/r) Σ_{j=1}^{r} E_j ) + C_i / ( (1/r) Σ_{j=1}^{r} C_j ) + IS_i / ( (1/r) Σ_{j=1}^{r} IS_j )
where r is the number of images under each evaluation metric; E_i, C_i and IS_i denote the information entropy, contrast value and Inception Score of image i, with i ranging over 1, 2, …, r; and E_j, C_j and IS_j denote the information entropy, contrast value and Inception Score of image j, with j ranging over 1, 2, …, r.
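The mean fusion above can be sketched in a few lines of numpy. This is an illustrative sketch only: the function name `coreness`, the equal weighting of the three metrics, and the mean-normalization form are assumptions consistent with the description, not the patent's literal implementation.

```python
import numpy as np

def coreness(entropy, contrast, inception):
    """Fuse three per-image metrics into one coreness score.

    Each metric is divided by its mean over all r images, so the three
    quantities are brought to a comparable scale before being summed.
    """
    e = np.asarray(entropy, dtype=float)
    c = np.asarray(contrast, dtype=float)
    s = np.asarray(inception, dtype=float)
    return e / e.mean() + c / c.mean() + s / s.mean()

scores = coreness([1.0, 2.0, 3.0], [10.0, 10.0, 10.0], [2.0, 4.0, 6.0])
```

Because each term is normalized by its mean, the scores sum to 3r over the whole batch, so images are ranked purely by their relative standing under each metric.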
The information entropy optimization proceeds as follows: the classification model is fine-tuned with the previous batch of core data and pathology-free medical data; that is, the parameter weights of the model's front layers are frozen, the extracted core data and pathology-free medical data are used to train and adjust the weights of the model's last layer, and the fine-tuned model replaces the original classification model, thereby optimizing the information entropy.
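The freeze-and-fine-tune scheme described here (front layers fixed, only the last layer trained) might look as follows in PyTorch. The tiny stand-in network and all sizes are illustrative; in the patent's setting the model would be an ImageNet-pretrained Inception v3.

```python
import torch
import torch.nn as nn

# Stand-in classifier; the real model would be a pretrained Inception v3.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),   # "front layers": frozen feature extractor
    nn.Linear(32, 2),               # "last layer": the only part fine-tuned
)

# Freeze everything, then re-enable gradients for the last layer only.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

# Only the still-trainable parameters are handed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)
```

A training loop over the new batch of core data plus pathology-free images would then update only the final layer, after which the fine-tuned model replaces the original one in the selection module.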
In step S1, the information entropy of image i is obtained by passing image i through a classification model with pre-trained weights; the calculation is as follows:
H_i = − Σ_{c=1}^{N} p_{i,c} log p_{i,c}
where N is the number of output classes of the classification model, i indexes the i-th image of the positive sample set in the unlabeled dataset, and p_{i,c} is the confidence with which the classification model predicts that image i belongs to class c.
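The quantity above is the Shannon entropy of the classifier's confidence vector. A minimal numpy sketch (the classification model itself is replaced here by a given confidence vector, and the function name is illustrative):

```python
import numpy as np

def information_entropy(confidences):
    """Shannon entropy of a classifier's per-class confidence vector.

    A near-uniform vector (the model is unsure) yields high entropy,
    so the image is considered more valuable for labeling.
    """
    p = np.asarray(confidences, dtype=float)
    p = p / p.sum()              # normalize defensively
    p = np.clip(p, 1e-12, 1.0)   # avoid log(0)
    return float(-(p * np.log(p)).sum())

h_uniform = information_entropy([0.25, 0.25, 0.25, 0.25])  # maximal for N = 4
h_sure = information_entropy([1.0, 0.0, 0.0, 0.0])         # near zero
```

For N classes the entropy peaks at log N on a uniform vector and approaches zero when the model is certain, which is why high-entropy images are selected first.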
In step S1, before calculating the contrast value of image i, image i is converted into a gray-level co-occurrence matrix (GLCM). GLCM-based contrast represents the texture features of an image more accurately: the deeper the texture grooves, the larger the contrast and the clearer the image, which benefits the model's feature learning.
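A minimal sketch of GLCM-based contrast for a grayscale image, using a single horizontal offset. Names are illustrative; in practice a library such as scikit-image provides `graycomatrix`/`graycoprops` for this.

```python
import numpy as np

def glcm_contrast(img, levels=256):
    """Contrast of the gray-level co-occurrence matrix for offset (0, 1).

    contrast = sum over (a, b) of P(a, b) * (a - b)^2, where P is the
    normalized count of horizontally adjacent pixel pairs (a, b).
    """
    img = np.asarray(img)
    glcm = np.zeros((levels, levels), dtype=float)
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        glcm[a, b] += 1.0
    glcm /= glcm.sum()
    a_idx, b_idx = np.indices(glcm.shape)
    return float((glcm * (a_idx - b_idx) ** 2).sum())

flat = glcm_contrast(np.zeros((4, 4), dtype=int))     # constant image
striped = glcm_contrast(np.tile([0, 1], (4, 2)))      # alternating 0/1 stripes
```

A constant image has zero contrast, while a sharply striped one scores higher, matching the "deeper grooves, larger contrast" intuition in the text.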
In step S1, the Inception Score of image i is calculated as follows: image i is cut into n×n sub-blocks; the Inception Score of each sub-block is computed; and the scores of all sub-blocks of image i are synthesized into the Inception Score of image i. This yields an accurate per-image Inception Score.
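The per-block score can be sketched from the standard Inception Score definition, IS = exp(E_x KL(p(y|x) || p(y))). This sketch assumes a classifier has already produced a class-probability row for each sub-block; all names are illustrative.

```python
import numpy as np

def inception_score(probs):
    """Inception Score from per-sub-block class probabilities.

    probs: array of shape (num_blocks, num_classes); each row is p(y|x)
    predicted by a classifier for one sub-block of the image.
    IS = exp( mean over blocks of KL(p(y|x) || p(y)) ).
    """
    p_yx = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    p_y = p_yx.mean(axis=0, keepdims=True)   # marginal class distribution
    kl = (p_yx * (np.log(p_yx) - np.log(p_y))).sum(axis=1)
    return float(np.exp(kl.mean()))

identical = inception_score([[0.7, 0.3]] * 4)          # KL = 0, score 1.0
diverse = inception_score([[0.9, 0.1], [0.1, 0.9]])    # confident and diverse
```

Identical predictions for every block give the minimum score of 1, while confident, varied predictions push the score above 1, reflecting the clarity-and-diversity reading given in the text.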
Corresponding to the above method, the present invention also provides an active-learning-based system for extracting core data for lesion localization, comprising:
an information entropy calculation module for computing the information entropy of every image in the medical image dataset;
a contrast value calculation module for computing the contrast value of every image in the medical image dataset;
an Inception Score calculation module for computing the Inception Score of every image in the medical image dataset;
a fusion module for computing each image's coreness from its information entropy, contrast value and Inception Score;
a sorting module for sorting all images in the medical image dataset in descending order of coreness and extracting the top k images as a batch of core data; and
an optimization loop module for optimizing the information entropy calculation module and repeatedly extracting core data until a suitable amount has been extracted.
The information entropy calculation module learns from and computes on image i using a classification model with pre-trained weights to obtain the information entropy of image i. The contrast value calculation module of the present invention comprises:
a transformation unit for performing the GLCM transformation on the images in the medical image dataset; and
a calculation unit for computing the contrast value of each transformed image.
The Inception Score calculation module of the present invention comprises:
a cutting unit for cutting each image in the medical image dataset into n×n sub-blocks;
an Inception Score calculation unit for computing the Inception Score of each sub-block; and
a synthesis unit for synthesizing the Inception Scores of the n×n sub-blocks into the Inception Score of the image.
The optimization loop module of the present invention comprises:
an optimization unit for fine-tuning the classification model in the information entropy calculation module with transfer learning, i.e., freezing the parameter weights of the network's front layers and training only the weights of the last layers on the extracted data, then replacing the original model in the selection module with the fine-tuned model; and
a loop unit for repeatedly executing the operations of the information entropy calculation module, the contrast value calculation module, the Inception Score calculation module, the fusion module, the sorting module and the optimization unit until a suitable amount of core data has been extracted.
As a further inventive concept, the present invention also provides an electronic device for extracting core data for lesion localization, comprising a processor configured to execute the above method.
Preferably, to facilitate data acquisition, the electronic device of the present invention further comprises a data acquisition module for acquiring medical images and transmitting the lesion images to the processor.
As a further inventive concept, the present invention also provides a computer storage medium storing a program for executing the above method.
Compared with the prior art, the present invention has the following beneficial effects:
1. The invention overcomes the current situation in smart healthcare in which training a lesion localization model depends on large volumes of lesion-annotated data while available data vary in quality, lesion annotations are scarce, and labeling is troublesome and expensive; it extracts core data from medical image data initially carrying no lesion annotation. While reducing doctors' labeling burden and the number of interaction rounds, the core data extracted by the invention can be used to train an object detection model that localizes lesions effectively, assisting diagnosis, relieving doctors' workload, and advancing object detection in smart healthcare.
2. While continuously extracting core data, the invention continuously optimizes its extraction mechanism, so its extraction performance keeps improving. Experiments show that the invention is highly practical, relieves a large data-labeling burden, and effectively assists doctors' diagnosis.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is an architecture diagram of the present invention;
Fig. 3 shows the Inception v3 network structure used in the selection module of the present invention, where (a) is module one of Inception v3, (b) is module two, (c) is module three, and (d) is the network structure used for Inception v3 model training;
Fig. 4 is a schematic structural diagram of the contrast value calculation module in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the Inception Score calculation module in an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the optimization loop module in an embodiment of the present invention.
Detailed Description
The present invention adopts the idea of active learning and designs a core data extraction method that can extract core data from medical image data carrying no lesion annotation, for use in training a lesion localization object detection model, thereby removing the obstacle to lesion localization in smart healthcare posed by large volumes of unlabeled data of uneven quality.
The development of lesion localization in smart healthcare relies on deep-learning-based object detection, which needs a large amount of annotated data. In medical big data, image quality varies and some images suffer from noise and other defects; moreover, most images carry no lesion location annotation, and annotating them requires medical expertise, making acquisition time-consuming, laborious and expensive. The invention extracts core data from lesion-unannotated data through its designed selection module (i.e., the extraction system of the invention). To extract core data accurately, the selection module evaluates each image with three image metrics. Considering the quality of the image itself, an image quality metric such as gray-level co-occurrence matrix (GLCM) contrast is computed. Considering model training, the image's information entropy is computed. Considering both the training of the object detection model and the image dataset as a whole, the Inception Score is introduced: a metric often used to judge images generated by GANs, evaluating the clarity and diversity of a set of images, where a larger value indicates clearer and more diverse images. The invention designs an effective fusion algorithm to fuse these into an image coreness. To weigh each image's importance under the different metrics while preserving the characteristics of the raw values, the mean fusion algorithm computes the mean of each metric, normalizes the raw values by the mean, and finally fuses them into the image coreness according to the following formula:
coreness_i = E_i / ( (1/r) Σ_{j=1}^{r} E_j ) + C_i / ( (1/r) Σ_{j=1}^{r} C_j ) + IS_i / ( (1/r) Σ_{j=1}^{r} IS_j )
where r is the number of images under each evaluation metric; E_i, C_i and IS_i denote the information entropy, contrast value and Inception Score of image i, with i ranging over 1, 2, …, r; and E_j, C_j and IS_j denote the same quantities for image j, with j ranging over 1, 2, …, r.
The method of the invention comprises three main stages: in the first stage, the selection module extracts core data from the unlabeled dataset; in the second stage, the extracted core data are used to optimize the selection module; in the third stage, the previous two stages are repeated until a suitable amount of core data is obtained, which is finally handed to human experts to annotate the specific pathological locations.
As shown in Fig. 1, the invention extracts core data according to coreness; the specific steps of the above process are as follows:
Step 1: First perform the GLCM transformation on the image and compute its contrast value. This value reflects the clarity of the image and the depth of its texture grooves: the deeper the grooves, the larger the contrast and the clearer the image.
Step 2: Obtain the image's information entropy by passing it through a classification model with pre-trained weights, for example Inception v3 with ImageNet pre-trained weights.
Step 3: Cut the image into n×n blocks (for example, n = 5), compute the Inception Score of each block, and finally synthesize the scores of all blocks into the image's Inception Score.
Step 4: Fuse the values of the three image metrics with the designed fusion algorithm to obtain a single coreness that is representative of the image's comprehensive evaluation.
Step 5: Sort the images in descending order of coreness and extract the top k images as a batch of core data.
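The descending sort and top-k extraction of step 5 can be sketched with numpy (function name and data are illustrative):

```python
import numpy as np

def top_k_core(coreness, k):
    """Return indices of the k images with the highest coreness, best first."""
    order = np.argsort(coreness)[::-1]   # descending by coreness
    return order[:k].tolist()

batch = top_k_core(np.array([0.2, 0.9, 0.5, 0.7]), k=2)
```

The returned indices identify which images from the unlabeled pool form the current batch of core data to be sent for expert annotation.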
To make the extraction of core data more precise, the invention extracts in batches and continuously optimizes the selection module with the extracted core data (i.e., optimizes the information entropy computed in step 2, which is to say the information entropy calculation module), as follows:
Step 1: Use the previous batch of core data together with pathology-free normal images from the unlabeled pool to fine-tune the pre-weighted Inception v3 in the selection module via transfer learning (for example, for 1000 iterations); that is, keep the initial weights of the network's front layers unchanged and train only the weights of the last layer on these data.
Step 2: Replace the original model in the selection module with the fine-tuned model.
The above two stages are repeated until a suitable amount of core data has been selected. Finally, the data are handed to medical experts, who annotate these core medical images with specific lesion location information. The core data can then be used to train an object detection model for locating lesions, yielding an excellent lesion localization model.
The architecture of the invention, shown in Fig. 2, consists of four parts: (1) an unlabeled dataset pool, containing a large number of low-cost, easily obtained pathological images without lesion location annotations and a small number of pathology-free normal medical images; (2) the selection module, comprising the information entropy calculation module, the contrast value calculation module, the Inception Score calculation module, the fusion module and the sorting module; (3) expert annotation, in which medical experts annotate lesion locations on the finally selected core dataset; and (4) an iteratively updated (annotated) core dataset pool, obtained once the entire core dataset has been annotated by the experts.
As can be seen from Figure 2, the extraction system of this embodiment of the present invention includes the following modules:
an information entropy calculation module, for calculating the information entropy of all images in the medical image dataset;
a contrast value calculation module, for calculating a contrast value based on the gray-level co-occurrence matrix (GLCM) for all images in the medical image dataset;
an inception score calculation module, for calculating the inception score of all images in the medical image dataset;
a fusion module, for calculating the core degree of each image from its information entropy, contrast value and inception score;
a sorting module, for sorting all images in the medical image dataset in descending order of core degree and extracting the top k images as a batch of core data; and
an optimization loop module, for optimizing the information entropy calculation module and repeating the batch extraction process until an appropriate amount of core data has been extracted.
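The fusion module's exact combination rule is not disclosed in the text above; the sketch below assumes min-max normalization of each metric over the dataset followed by a weighted sum with equal weights. Both the normalization and the weights are assumptions for illustration.

```python
def normalize(values):
    """Min-max normalize a list of scores to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def core_degrees(entropies, contrasts, inception_scores, weights=(1.0, 1.0, 1.0)):
    """Fuse the three per-image metrics into one core degree per image.

    An equally weighted sum of normalized scores is assumed here; the patent
    does not specify the fusion formula.
    """
    e = normalize(entropies)
    c = normalize(contrasts)
    s = normalize(inception_scores)
    we, wc, ws = weights
    return [we * ei + wc * ci + ws * si for ei, ci, si in zip(e, c, s)]

# Three hypothetical images: entropy, GLCM contrast, inception score
scores = core_degrees([0.2, 0.9, 0.5], [10.0, 30.0, 20.0], [2.0, 8.0, 5.0])
ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(ranked)  # image 1 has the highest core degree
```

The sorting module then simply takes the first k indices of `ranked` as the next batch of core data.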
The information entropy calculation module learns from each image i using a classification model with pre-trained weights and thereby obtains the information entropy of image i.
As shown in Figure 3, the classification model in the information entropy calculation module is critical. Because Inception v3 performs well in image classification, the present invention uses Inception v3 to learn from the images and compute their information entropy. The Inception v3 model has 46 layers; an input image passes through convolution (Conv), pooling (Pool) and fully connected (FC) layers to obtain classification confidences. Inception v3 also contains three types of Inception modules: modules built from two consecutive 3×3 convolution kernels; modules that decompose an n×n convolution into consecutive n×1 and 1×n kernels; and modules that decompose an n×n convolution into parallel n×1 and 1×n kernels.
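Given the classification confidences from the model's final layer, the information entropy itself is a direct computation. The sketch below assumes a standard Shannon entropy over the softmax output, which is the usual uncertainty measure in active-learning selection of this kind:

```python
import math

def information_entropy(confidences):
    """Shannon entropy of a classifier's softmax output for one image.

    High entropy means the model is uncertain about the image, which the
    selection module treats as a sign that the image is informative.
    """
    return -sum(p * math.log(p) for p in confidences if p > 0)

# An image the model finds ambiguous vs. one it classifies confidently
uncertain = information_entropy([0.25, 0.25, 0.25, 0.25])
confident = information_entropy([0.97, 0.01, 0.01, 0.01])
print(round(uncertain, 4))  # ln(4) ≈ 1.3863, the maximum for 4 classes
```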
As shown in Figure 4, the contrast value calculation module includes:
a transformation unit, for performing GLCM transformation on the images in the medical image dataset; and
a calculation unit, for calculating the contrast value of the transformed images.
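A minimal sketch of the two units follows. It assumes a horizontal (0, 1) pixel offset for the co-occurrence counts (the offset is not specified above) and the standard GLCM contrast definition, sum over i, j of P(i, j)·(i − j)².

```python
import numpy as np

def glcm(image, levels):
    """Transformation unit: gray-level co-occurrence matrix for horizontally
    adjacent pixels (offset (0, 1)), normalized to probabilities."""
    m = np.zeros((levels, levels), dtype=float)
    for row in image:
        for a, b in zip(row[:-1], row[1:]):
            m[a, b] += 1
    return m / m.sum()

def glcm_contrast(image, levels=8):
    """Calculation unit: contrast = sum_{i,j} P(i, j) * (i - j)^2."""
    p = glcm(image, levels)
    i, j = np.indices(p.shape)
    return float((p * (i - j) ** 2).sum())

flat = [[3, 3, 3], [3, 3, 3]]      # uniform image: zero contrast
stripes = [[0, 7, 0], [7, 0, 7]]   # alternating extremes: high contrast
print(glcm_contrast(flat), glcm_contrast(stripes))
```

A production implementation would quantize real gray levels into `levels` bins and typically average over several offsets, as in `skimage.feature.graycomatrix`.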
As shown in Figure 5, the inception score calculation module includes:
a cutting unit, for cutting each image in the medical image dataset into n×n sub-blocks;
an inception score calculation unit, for calculating the inception score of each sub-block; and
a synthesis unit, for combining the inception scores of the n×n sub-blocks into the inception score of the image.
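The per-block score can follow the standard inception score definition, IS = exp(E[KL(p(y|x) ‖ p(y))]), computed over the softmax outputs within a block. How the synthesis unit combines the n×n block scores is not specified above, so the sketch assumes a simple mean; the sample probabilities are made-up data.

```python
import numpy as np

def inception_score(probs):
    """IS = exp( mean_x KL( p(y|x) || p(y) ) ) over a set of softmax outputs."""
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0)  # p(y), averaged over the samples
    kl = (probs * (np.log(probs + 1e-12) - np.log(marginal + 1e-12))).sum(axis=1)
    return float(np.exp(kl.mean()))

def image_inception_score(block_probs):
    """Synthesis unit (assumed): mean of the sub-block scores."""
    return float(np.mean([inception_score(p) for p in block_probs]))

# Two sub-blocks, each with softmax outputs over 3 classes (made-up data)
blocks = [
    [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]],    # confident and diverse: high IS
    [[0.34, 0.33, 0.33], [0.33, 0.34, 0.33]],  # near-uniform outputs: IS near 1
]
score = image_inception_score(blocks)
```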
As shown in Figure 6, the optimization loop module includes:
an optimization unit, which fine-tunes the classification model in the information entropy calculation module using the previous batch of core data extracted by the sorting module together with pathology-free medical data; that is, it fixes the parameter weights of the earlier layers of the classification model, trains only the weights of the last layer on these data, and then replaces the original classification model with the fine-tuned one; and
a loop unit, which repeatedly executes the operations of the information entropy calculation module, the contrast value calculation module, the inception score calculation module, the fusion module, the sorting module and the optimization loop module until an appropriate amount of core data has been extracted.
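The optimization unit's freeze-and-fine-tune step can be illustrated on a toy model in which a fixed random feature extractor stands in for Inception v3's pretrained earlier layers (purely an assumption for illustration) and only a final logistic layer is trained; the frozen weights never change during fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Earlier layers": a fixed random feature extractor, frozen during training.
W_frozen = rng.normal(size=(4, 8))

def features(x):
    return np.tanh(x @ W_frozen)

# Trainable "last layer": a logistic regression over the frozen features.
w = np.zeros(8)

def fine_tune_last_layer(xs, ys, lr=0.5, epochs=200):
    """Gradient steps on the last-layer weights only; W_frozen is untouched."""
    global w
    phi = features(xs)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-phi @ w))
        w -= lr * phi.T @ (p - ys) / len(ys)

xs = rng.normal(size=(32, 4))
ys = (xs[:, 0] > 0).astype(float)   # toy labels for the new batch
before = W_frozen.copy()
fine_tune_last_layer(xs, ys)
acc = ((1.0 / (1.0 + np.exp(-features(xs) @ w)) > 0.5) == ys).mean()
```

In a deep-learning framework the same effect is obtained by disabling gradients on the earlier layers (e.g. `requires_grad = False` in PyTorch) and passing only the last layer's parameters to the optimizer.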
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010413451.3A CN111340807B (en) | 2020-05-15 | 2020-05-15 | Method, system, electronic device and storage medium for extracting core data of lesion location |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111340807A (en) | 2020-06-26 |
CN111340807B (en) | 2020-09-11 |