WO2021217854A1 - False positive filtering method, device, equipment, and storage medium - Google Patents

False positive filtering method, device, equipment, and storage medium

Info

Publication number
WO2021217854A1
Authority
WO
WIPO (PCT)
Prior art keywords
area
normal
distance
false positive
normal area
Prior art date
Application number
PCT/CN2020/098974
Other languages
English (en)
French (fr)
Inventor
陈凯星
周鑫
吕传峰
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021217854A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a false positive filtering method, device, equipment, and computer-readable storage medium.
  • the existing methods for suppressing false positives are mainly divided into two categories: rule-based methods or network-based learning methods. These two types of methods have their own advantages and disadvantages: the rule-based method requires artificial induction of effective rules, which is highly pertinent and interpretable, but cannot be automatically summarized according to changes in data.
  • the network-based learning method can be self-summarized based on the data, and it is a simple and effective solution when the training data is complete.
  • the inventor found that in the application of lesion detection, the shape, gray scale, and texture of true positives (lesions) are ever-changing, and false positives are constantly changing according to network input results. Therefore, a complete training set cannot be obtained for true and false positives. Especially in the case where the image properties of the training sample and the test sample are different, there will be deviations in distinguishing between true and false positives.
  • the doctor judges whether the suspected area is a true positive based on the similarity between the suspected area and the normal area of the same sequence (case) or single (slice) image. For example, if a suspected area is very similar to normal brain parenchyma, then this area is probably a false positive. On this basis, an effective technique for distinguishing true and false positives can be proposed by simulating the above-mentioned comparative image reading ideas of doctors.
  • the present application provides a false positive filtering method, device, equipment and storage medium to solve at least one of the above technical problems.
  • this application proposes a false positive filtering method, which includes the steps:
  • Positioning module: used to process the image to be detected through the deep neural network model, and locate the normal area and the suspected area;
  • Measurement module: used to calculate the intra-class distance between the initial normal areas and the inter-class distance between the suspected area and the initial normal areas by using a similarity measurement algorithm;
  • Calculation module: used to calculate the probability that the suspected area is a normal area according to the intra-class distance and the inter-class distance;
  • the present application also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the following steps when executing the computer program:
  • Figure 1 is a schematic diagram of an optional hardware architecture of the computer equipment of the present application.
  • FIG. 2 is a schematic diagram of modules of the first embodiment of the false positive filtering device of the present application.
  • FIG. 5 is a detailed flowchart of step S402 in FIG. 4;
  • FIG. 6 is a schematic flowchart of a second embodiment of the false positive filtering method of the present application.
  • the computer device 2 may be a server, or a terminal device that performs lesion detection, or the like.
  • the server may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server, and may be an independent server or a server cluster composed of multiple servers.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, etc.
  • the memory 11 may be an internal storage unit of the computer device 2, for example, a hard disk or a memory of the computer device 2.
  • the memory 11 may also be an external storage device of the computer device 2, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 2.
  • the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 12 is generally used to control the overall operation of the computer device 2.
  • the processor 12 is used to run the program code or processing data stored in the memory 11, for example, to run the false positive filtering program 100.
  • the network interface 13 may include a wireless network interface or a wired network interface, and the network interface 13 is generally used to establish a communication connection between the computer device 2 and other electronic devices.
  • FIG. 2 is a block diagram of the first embodiment of the false positive filtering device 200 of the present application.
  • the false positive filtering device 200 includes a series of computer program instructions stored in the memory 11. When the computer program instructions are executed by the processor 12, the false positive filtering operations of the various embodiments of the present application can be implemented. In some embodiments, the false positive filtering device 200 may be divided into one or more modules based on the specific operations implemented by the various parts of the computer program instructions. For example, in FIG. 2, the false positive filtering device 200 can be divided into a positioning module 201, a determination module 202, a measurement module 203, a calculation module 204, and a filtering module 205, where:
  • the positioning module 201 is used to process the image to be detected through the deep neural network model, and locate the normal area and the suspected area.
  • the image to be detected may be a case or slice image to be subjected to false positive filtering (to distinguish between true and false positive).
  • case can be understood as the meaning of a sequence, a sequence of images obtained in one inspection is a case image; a slice image can be understood as a single image, and a case image is composed of multiple slice images.
  • any one or more commonly used deep neural network frameworks can be selected for learning and training of normal regions or suspected regions, so that the trained model can output information of normal regions and suspected regions according to the input data. Then use the trained deep neural network model (lesion detection network model) to process the case or slice image, and the normal area and the suspected area can be located from the model output. For each case or slice image, one or more normal areas and one or more suspected areas can be located.
  • the learning process of a deep neural network is a process of directed cognition: the information it learns is limited to the samples it is given, and, unlike a human, it cannot rule out interference caused by noise or differences in image characteristics by comparing and analyzing the common characteristics of the suspected area and the normal area in the current sample.
  • therefore, this embodiment needs to further determine the initial normal area from the model output result.
  • the initial normal area can be determined in the following manner (that is, the preset rule is):
  • the measurement module 203 is used to calculate the intra-class distance between the initial normal areas and the inter-class distance between the suspected area and the initial normal areas by using a similarity measurement algorithm.
  • the intra-class distance refers to calculating the distance between every two initial normal areas separately.
  • the inter-class distance refers to separately calculating the distance between each suspected area and each initial normal area.
  • the calculation module 204 is configured to calculate the probability that the suspected area is a normal area according to the calculated intra-class distances and class distances.
  • the 3σ principle first assumes that a set of test data contains only random errors, processes the data to obtain the standard deviation, and determines an interval with a certain probability. Any error outside this interval is considered a gross error rather than a random error, and the data containing the gross error should be eliminated.
  • the 3σ principle is the most commonly used and simplest criterion for judging gross errors. It is generally applied when the number of measurements is sufficiently large (n ≥ 30). In this embodiment, a sufficient number of case or slice images are tested, an appropriate interval is taken, and data whose error falls outside this interval is regarded as a false positive. Selecting an appropriate threshold by testing a large number of samples means testing enough data, analyzing the results, and choosing a threshold that suppresses false positives well.
  • the comparison module 206 is used to select the best similarity measurement algorithm by comparing multiple candidate similarity measurement algorithms.
  • the best similarity measurement method is determined by comparing the performance of various similarity measurement methods on the initial normal areas, and selecting the method that gives the initial normal areas the smallest intra-class distance at the low-dimensional or high-dimensional feature level as the subsequent calculation method for distinguishing true and false positives.
  • the alternative similarity measurement algorithms include Euclidean distance, Manhattan distance, and cosine similarity. According to the selected features, these three alternative algorithms are each used to calculate the intra-class distance of the initial normal areas, and the algorithm with the smallest intra-class distance is selected as the best similarity measurement algorithm. Subsequently, the best similarity measurement algorithm is used to calculate the intra-class distance between the initial normal areas and the inter-class distance between the suspected area and the initial normal areas.
  • the cosine similarity measurement algorithm uses the cosine of the angle between two vectors in the vector space as a measure of the difference between two individuals.
  • the calculation formula is: $\cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|}$,
  • where A and B represent feature vectors.
  • the best similarity measurement algorithm can be used for subsequent calculations; if the deep neural network model changes, the best similarity measurement algorithm needs to be selected again.
  • the measurement module 203 uses the best similarity measurement algorithm selected by the comparison module 206 to calculate the intra-class distance between the initial normal areas and the inter-class distance between the suspected area and the initial normal areas.
  • the false positive filtering device can compare the performance of a variety of alternative similarity measurement algorithms on the initial normal areas, and select the similarity measurement method that gives the initial normal areas the smallest intra-class distance at the low-dimensional or high-dimensional feature level as the subsequent calculation method for distinguishing true and false positives. This makes the similarity measurement between normal areas and suspected areas in the same case or slice image more effective, which improves the accuracy of the subsequent false positive judgment and optimizes the filtering effect.
  • this application also proposes a false positive filtering method.
  • FIG. 4 is a schematic flowchart of the first embodiment of the false positive filtering method of the present application.
  • the execution order of the steps in the flowchart shown in FIG. 4 can be changed, and some steps can be omitted.
  • the method includes:
  • a sequence of images can be generated for each inspection.
  • Step S402 Use a preset rule to determine an initial normal area from the normal area output by the model.
  • step S402 specifically includes:
  • step S4020 the gray value of each normal region output by the model is calculated respectively.
  • Step S4024 Calculate the difference value between the gray value of each normal area and the gray average value.
  • Step S4026 selecting a number of the normal regions with a smaller difference value as the initial normal regions.
  • Step S404: the similarity measurement algorithm is used to calculate the intra-class distance between the initial normal areas and the inter-class distance between the suspected area and the initial normal areas.
  • a similarity measurement algorithm that is preset or selected by the user is used to calculate the distance between every two initial normal areas in the same case or slice image, and the set of these distances (intra-class distances) is represented by the symbol P; the distance between each initial normal area and each suspected area in the same case or slice image is calculated, and the set of these distances (inter-class distances) is represented by the symbol Q.
  • the mean μ and the standard deviation σ of the distances (intra-class distances) in the set P are calculated. Then, with μ and σ as the mean and standard deviation of the Gaussian function, each distance (inter-class distance) in the set Q is substituted as x into the Gaussian function formula to obtain the probability p(x) that the suspected area is a normal area. The smaller the probability p(x), the less likely it is that the suspected area belongs to the normal area.
  • step S408 the false positive area is filtered out according to the calculated probability and the selected threshold.
  • a threshold is selected to filter the false positive area, so as to achieve the effect of suppressing the false positive.
  • the false positive area is the suspected area that is finally judged to be a normal area, that is, the probability of the area being a normal area is greater than or equal to the threshold.
  • the false positive area can be screened by selecting an appropriate threshold according to the 3σ principle of the Gaussian function or by testing on a large number of samples.
  • the false positive filtering method provided in this embodiment can use the intra-class and inter-class differences of the data itself to suppress false positives by comparing the similarity between the normal area and the suspected area in the same case or slice image. This not only avoids the performance instability caused by data differences, but also makes use of normal area information, which can supplement and optimize the network learning method.
  • this embodiment can be generalized to images with different characteristics, thereby reducing the difficulty of data collection.
  • this embodiment can be connected to any lesion detection network model as a simple supplement to the output result of the network model, so it has the advantages of universality and plug-and-play.
  • step S500 the image to be detected is processed through the deep neural network model to locate the normal area and the suspected area.
  • Step S502 using a preset rule to determine an initial normal area from the normal area output by the model.
  • the preset rule may be based on the difference between the gray value of each normal region and the average value thereof, selecting several normal regions with smaller differences as the initial normal region. For the specific process of this step, refer to FIG. 5 and related descriptions, which will not be repeated here.
  • step S504 the best similarity measurement algorithm is selected by comparing multiple candidate similarity measurement algorithms.
  • the best similarity measurement method is determined by comparing the performance of various similarity measurement methods on the initial normal areas, and selecting the method that gives the initial normal areas the smallest intra-class distance at the low-dimensional or high-dimensional feature level as the subsequent calculation method for distinguishing true and false positives.
  • the alternative similarity measurement algorithms include Euclidean distance, Manhattan distance, and cosine similarity. According to the selected features, these three alternative algorithms are each used to calculate the intra-class distance of the initial normal areas, and the algorithm with the smallest intra-class distance is selected as the best similarity measurement algorithm. Subsequently, the best similarity measurement algorithm is used to calculate the intra-class distance between the initial normal areas and the inter-class distance between the suspected area and the initial normal areas.
  • the Euclidean distance is $d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$, where k represents the dimension of the feature, and $x_i$ and $y_i$ respectively represent the corresponding elements in the two feature vectors.
  • the Manhattan distance is $d(x, y) = \sum_{i=1}^{k} |x_i - y_i|$, where k represents the dimension of the feature, and $x_i$ and $y_i$ respectively represent the corresponding elements in the two feature vectors.
  • the cosine similarity measurement algorithm uses the cosine of the angle between two vectors in the vector space as a measure of the difference between two individuals.
  • the calculation formula is: $\cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|}$,
  • where A and B represent feature vectors.
  • Step S506: use the best similarity measurement algorithm to calculate the intra-class distance between the initial normal areas and the inter-class distance between the suspected area and the initial normal areas.
  • the selected best similarity measurement algorithm is used to calculate the distance between every two initial normal areas in the same case or slice image, and the set of these distances (intra-class distances) is represented by the symbol P; the distance between each initial normal area and each suspected area in the same case or slice image is calculated, and the set of these distances (inter-class distances) is represented by the symbol Q.
  • Step S508 Calculate the probability that the suspected area is a normal area according to the calculated intra-class distance and the calculated inter-class distance.
  • step S510 the false positive area is filtered out according to the calculated probability and the selected threshold.
  • a threshold is selected to filter the false positive area, so as to achieve the effect of suppressing the false positive.
  • the false positive area is the suspected area that is finally judged to be a normal area, that is, the probability of the area being a normal area exceeds (greater than or equal to) the threshold.
  • the false positive area can be screened by selecting an appropriate threshold according to the 3σ principle of the Gaussian function or by testing on a large number of samples.
  • the computer-readable storage medium may be non-volatile or volatile, and the computer-readable storage medium stores a false positive filtering program that can be executed by at least one processor, so that the at least one processor executes the steps of the false positive filtering method as described above.
  • the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

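The application relates to the field of artificial intelligence technology and discloses a false positive filtering method. The method includes: processing an image to be detected through a deep neural network model to locate normal areas and suspected areas (S400); determining initial normal areas from the normal areas output by the model using a preset rule (S402); using a similarity measurement algorithm to calculate the intra-class distance between the initial normal areas and the inter-class distance between the suspected areas and the initial normal areas (S404); calculating the probability that a suspected area is a normal area according to the intra-class distance and the inter-class distance (S406); and filtering out false positive areas according to the calculated probability and a selected threshold (S408). A device, equipment, and a storage medium are also provided. By using the intra-class and inter-class differences of the data itself and comparing the similarity between normal areas and suspected areas in the same image to be detected, the method can distinguish true and false positive areas and thereby supplement and optimize the network learning approach.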

Description

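False positive filtering method, device, equipment, and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on April 30, 2020, with application number CN202010369986.5 and titled "False positive filtering method, electronic device, and computer-readable storage medium", the entire contents of which are incorporated into this application by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a false positive filtering method, device, equipment, and computer-readable storage medium.
Background
With the emergence of high-performance computing and the rapid development of information computing, the research and application of artificial intelligence technology have become a hot topic in academia and industry, and the technology has been applied successfully in fields such as healthcare. In healthcare, more than 90% of medical data comes from medical imaging, so artificial intelligence has many possible applications in medical imaging, such as image analysis and lesion examination, disease prevention and treatment, and intelligent treatment planning and prediction. Technical progress inevitably runs into pain points, and false positives are a common pain point of artificial-intelligence lesion detection.
Existing methods for suppressing false positives fall into two main categories: rule-based methods and network-learning-based methods. Each has advantages and disadvantages. Rule-based methods require effective rules to be summarized manually; they are highly targeted and interpretable, but cannot adapt automatically as the data changes. Network-learning-based methods can generalize from the data on their own, and are a simple and effective solution when the training data is complete. The inventors found that in lesion detection, the shape, gray level, and texture of true positives (lesions) vary endlessly, while false positives keep changing with the network input results. As a result, a complete training set cannot be obtained for either true or false positives. In particular, when the image properties of the training samples and the test samples differ, the distinction between true and false positives becomes biased.
In actual image reading, doctors can still quickly identify false positives even when image characteristics change. The reason is that a doctor judges whether a suspected area is a true positive based on its similarity to normal areas in the same sequence (case) or single image (slice). For example, if a suspected area closely resembles normal brain parenchyma, it is very likely a false positive. On this basis, an effective technique for distinguishing true and false positives can be built by imitating this comparative reading approach.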
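Summary
This application provides a false positive filtering method, device, equipment, and storage medium to solve at least one of the above technical problems.
First, to achieve the above objective, this application proposes a false positive filtering method, which includes the steps of:
processing the image to be detected through a deep neural network model, and locating normal areas and suspected areas;
determining initial normal areas from the normal areas output by the model using a preset rule;
using a similarity measurement algorithm to calculate the intra-class distance between the initial normal areas and the inter-class distance between the suspected areas and the initial normal areas;
calculating the probability that a suspected area is a normal area according to the intra-class distance and the inter-class distance; and
filtering out false positive areas according to the calculated probability and a selected threshold.
To solve the above problem, this application also provides a false positive filtering device, which includes:
a positioning module, used to process the image to be detected through a deep neural network model, and locate normal areas and suspected areas;
a determination module, used to determine initial normal areas from the normal areas output by the model using a preset rule;
a measurement module, used to calculate, with a similarity measurement algorithm, the intra-class distance between the initial normal areas and the inter-class distance between the suspected areas and the initial normal areas;
a calculation module, used to calculate the probability that a suspected area is a normal area according to the intra-class distance and the inter-class distance; and
a filtering module, used to filter out false positive areas according to the calculated probability and a selected threshold.
To solve the above problem, this application also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program:
processing the image to be detected through a deep neural network model, and locating normal areas and suspected areas;
determining initial normal areas from the normal areas output by the model using a preset rule;
using a similarity measurement algorithm to calculate the intra-class distance between the initial normal areas and the inter-class distance between the suspected areas and the initial normal areas;
calculating the probability that a suspected area is a normal area according to the intra-class distance and the inter-class distance; and
filtering out false positive areas according to the calculated probability and a selected threshold.
To solve the above problem, this application also provides a computer-readable storage medium on which a computer program is stored, where the computer program implements the following steps when executed by a processor:
processing the image to be detected through a deep neural network model, and locating normal areas and suspected areas;
determining initial normal areas from the normal areas output by the model using a preset rule;
using a similarity measurement algorithm to calculate the intra-class distance between the initial normal areas and the inter-class distance between the suspected areas and the initial normal areas;
calculating the probability that a suspected area is a normal area according to the intra-class distance and the inter-class distance; and
filtering out false positive areas according to the calculated probability and a selected threshold.
Compared with the prior art, the false positive filtering method, device, equipment, and computer-readable storage medium proposed in this application imitate the way doctors read images comparatively: they use the intra-class and inter-class differences of the data itself, comparing the similarity between normal areas and suspected areas in the same image to be detected, to distinguish true and false positive areas. This supplements and optimizes the network learning approach and effectively improves the success rate and generalization of true and false positive classification.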
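Brief Description of the Drawings
FIG. 1 is a schematic diagram of an optional hardware architecture of the computer device of this application;
FIG. 2 is a module diagram of the first embodiment of the false positive filtering device of this application;
FIG. 3 is a module diagram of the second embodiment of the false positive filtering device of this application;
FIG. 4 is a schematic flowchart of the first embodiment of the false positive filtering method of this application;
FIG. 5 is a detailed flowchart of step S402 in FIG. 4;
FIG. 6 is a schematic flowchart of the second embodiment of the false positive filtering method of this application;
The realization of the objectives, functional features, and advantages of this application will be further described with reference to the embodiments and the accompanying drawings.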
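Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
It should be noted that descriptions involving "first", "second", and the like in this application are for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with one another, but only on the basis that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and is outside the scope of protection claimed by this application.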
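Referring to FIG. 1, it is a schematic diagram of an optional hardware architecture of the computer device 2 of this application.
In this embodiment, the computer device 2 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that can be communicatively connected to one another through a system bus. Note that FIG. 1 only shows the computer device 2 with components 11-13; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
The computer device 2 may be a server, a terminal device that performs lesion detection, or the like. The server may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server, and may be an independent server or a server cluster composed of multiple servers.
The memory 11 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and the like. In some embodiments, the memory 11 may be an internal storage unit of the computer device 2, such as its hard disk or internal memory. In other embodiments, the memory 11 may be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the computer device 2. Of course, the memory 11 may also include both an internal storage unit and an external storage device of the computer device 2. In this embodiment, the memory 11 is generally used to store the operating system and the various application software installed on the computer device 2, such as the program code of the false positive filtering program 100. In addition, the memory 11 may be used to temporarily store various data that has been output or is to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is generally used to control the overall operation of the computer device 2. In this embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example, to run the false positive filtering program 100.
The network interface 13 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 2 and other electronic devices.
So far, the hardware structure and functions of the relevant equipment of this application have been described in detail. The embodiments of this application are presented below on the basis of the above description.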
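First, this application proposes a false positive filtering device 200.
Referring to FIG. 2, it is a module diagram of the first embodiment of the false positive filtering device 200 of this application.
In this embodiment, the false positive filtering device 200 includes a series of computer program instructions stored in the memory 11. When the computer program instructions are executed by the processor 12, the false positive filtering operations of the embodiments of this application can be implemented. In some embodiments, the false positive filtering device 200 may be divided into one or more modules based on the specific operations implemented by the various parts of the computer program instructions. For example, in FIG. 2, the false positive filtering device 200 can be divided into a positioning module 201, a determination module 202, a measurement module 203, a calculation module 204, and a filtering module 205, where:
The positioning module 201 is used to process the image to be detected through a deep neural network model, and locate normal areas and suspected areas.
Specifically, for 3D imaging modalities such as CT (Computed Tomography), MRI (Magnetic Resonance Imaging), and PET (Positron Emission Computed Tomography), each examination produces a sequence of images. The image to be detected may be a case or slice image on which false positive filtering (distinguishing true and false positives) is to be performed. Here, "case" means a sequence: the sequence of images obtained in one examination is a case image. A slice image is a single image, and a case image consists of multiple slice images.
In this embodiment, any one or more commonly used deep neural network frameworks can be selected to learn normal areas or suspected areas, so that the trained model can output normal area and suspected area information for the input data. The trained deep neural network model (lesion detection network model) is then used to process the case or slice image, and the normal areas and suspected areas can be located from the model output. For each case or slice image, one or more normal areas and one or more suspected areas can be located.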
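The determining module 202 is configured to determine initial normal areas from the normal areas output by the model using a preset rule.
Specifically, the learning process of a deep neural network is a process of directed cognition: the information it learns is limited to the samples it is given, and, unlike a human, it cannot eliminate interference caused by noise or by differences in image characteristics by comparing the characteristics shared by the suspected areas and the normal areas of the current sample. This embodiment therefore further determines initial normal areas from the model output.
The initial normal areas may be determined as follows (that is, the preset rule is):
(1) calculate the gray value of each normal area output by the model;
(2) calculate the mean gray value of all the normal areas;
(3) compare the gray value of each normal area with the mean gray value, and select several areas whose gray values differ least from the mean as the initial normal areas.
In other words, the difference between the gray value of each normal area and the mean gray value is calculated, the differences are sorted in ascending order, and the normal areas corresponding to the first M differences (M is a positive integer, for example 3) are selected as the initial normal areas. Alternatively, after the differences are calculated, the normal areas whose difference is smaller than a preset threshold may be selected as the initial normal areas.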
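As an illustration only, the preset rule above might be sketched in Python as follows. The function name `select_initial_normal_areas`, its parameters, and the assumption that each normal area is summarized by a single gray value (for example, its mean intensity) are hypothetical and are not taken from the application.

```python
from statistics import mean
from typing import List, Optional, Sequence


def select_initial_normal_areas(
    gray_values: Sequence[float],
    m: int = 3,
    max_diff: Optional[float] = None,
) -> List[int]:
    """Return indices of the normal areas whose gray value is closest to the mean.

    gray_values: one gray value per normal area (assumed representation).
    m: number of areas to keep under the "first M" rule.
    max_diff: if given, use the threshold rule instead and keep every area
              whose difference from the mean is smaller than this value.
    """
    if not gray_values:
        return []
    gray_mean = mean(gray_values)
    diffs = [abs(g - gray_mean) for g in gray_values]
    # Sort area indices by their difference from the mean, ascending.
    order = sorted(range(len(gray_values)), key=lambda i: diffs[i])
    if max_diff is not None:
        return [i for i in order if diffs[i] < max_diff]
    return order[:m]


# Five normal areas; the fourth one (140.0) is an outlier.
gray_values = [100.0, 104.0, 98.0, 140.0, 101.0]
top_three = select_initial_normal_areas(gray_values, m=3)
below_five = select_initial_normal_areas(gray_values, max_diff=5.0)
```

Both selection variants described above are covered: passing `m` applies the "first M" rule, and passing `max_diff` applies the preset-threshold rule.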
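The measurement module 203 is configured to use a similarity measurement algorithm to calculate the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas.
Specifically, a similarity measure compares how similar two things are, generally by calculating the distance between their features. Features can be divided into low-dimensional and high-dimensional features. Commonly used low-dimensional features include color (gray level), texture, size, and shape, and the similarity measurement algorithms commonly used with them include Euclidean distance, Manhattan distance, and cosine similarity. High-dimensional features are generally obtained by convolution, and the similarity measurement algorithms commonly used with them include structural similarity and block matching. A small distance means high similarity; conversely, a large distance means low similarity.
The intra-class distance refers to the distance calculated between every two initial normal areas. The inter-class distance refers to the distance calculated between each suspected area and each initial normal area.
In this embodiment, a preset or user-selected similarity measurement algorithm is used to calculate the distance between every two initial normal areas in the same case or slice image; the set of these distances (intra-class distances) is denoted P. The same algorithm is used to calculate the distance between each initial normal area and each suspected area in the same case or slice image; the set of these distances (inter-class distances) is denoted Q.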
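The construction of the sets P and Q can be sketched as follows. This is a minimal sketch under stated assumptions: each area is represented by a feature vector (a hypothetical representation), Euclidean distance stands in for whichever similarity measurement algorithm is chosen, and the names `intra_and_inter_distances`, `normal_feats`, and `suspect_feats` are illustrative only.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def intra_and_inter_distances(
    normal_feats: List[Vector],
    suspect_feats: List[Vector],
    dist: Callable[[Vector, Vector], float] = euclidean,
) -> Tuple[List[float], List[List[float]]]:
    """Compute P (intra-class) and Q (inter-class) distances.

    P holds one distance per unordered pair of initial normal areas.
    Q holds, for each suspected area, its distance to every initial normal area.
    """
    p = [dist(a, b) for a, b in combinations(normal_feats, 2)]
    q = [[dist(s, n) for n in normal_feats] for s in suspect_feats]
    return p, q


normal_feats = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
suspect_feats = [(0.0, 5.0)]
intra, inter = intra_and_inter_distances(normal_feats, suspect_feats)
```

Keeping Q grouped per suspected area, rather than as one flat set, is a design choice of this sketch; it makes it straightforward to score each suspected area separately in the next step.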
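The calculation module 204 is configured to calculate, from the calculated intra-class and inter-class distances, the probability that the suspected area is a normal area.
Specifically, the mean μ and standard deviation σ of the distances in the set P (intra-class distances) are calculated. Then μ and σ are used as the mean and standard deviation of the following Gaussian function, and each distance in the set Q (inter-class distance) is substituted as x to obtain the probability p(x) that the suspected area is a normal area. The smaller p(x) is, the less likely the suspected area is to be a normal area.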
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
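The calculation can be sketched as follows, assuming the Gaussian function is the standard normal density with mean μ and standard deviation σ. Two further choices here are assumptions of this sketch and are not stated in the application: σ is computed as a population standard deviation, and the Gaussian values obtained for one suspected area against each initial normal area are combined by averaging. Note also that the Gaussian value is a density, so it serves as a relative score and is not bounded by 1.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def gaussian(x: float, mu: float, sigma: float) -> float:
    """Value of the Gaussian density with mean mu and standard deviation sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def normal_area_score(p: Sequence[float], q_row: Sequence[float]) -> float:
    """Score one suspected area from its inter-class distances.

    p: intra-class distances between initial normal areas (set P).
    q_row: distances from one suspected area to each initial normal area.
    """
    mu = mean(p)
    sigma = pstdev(p)  # assumption: population standard deviation
    if sigma == 0.0:
        raise ValueError("intra-class distances have zero spread; sigma must be positive")
    # Assumption: average the Gaussian values over all initial normal areas.
    return mean(gaussian(x, mu, sigma) for x in q_row)


intra = [4.0, 5.0, 6.0]
near_score = normal_area_score(intra, [5.0, 5.5])   # distances typical of normal areas
far_score = normal_area_score(intra, [9.0, 10.0])   # distances far from the normal range
```

A suspected area whose distances to the initial normal areas resemble the distances among the normal areas themselves receives the higher score.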
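The filtering module 205 is configured to filter out false positive areas according to the calculated probability and a selected threshold.
Specifically, after the probability that the suspected area is a normal area has been calculated, a threshold is selected to filter false positive areas, thereby suppressing false positives. A false positive area is a suspected area that is finally judged to be a normal area, that is, an area whose probability of being a normal area is greater than or equal to the threshold. In this embodiment, a suitable threshold for screening false positive areas may be selected according to the 3σ rule of the Gaussian function or by testing on a large number of samples.
The 3σ rule works as follows: a set of measured data is first assumed to contain only random errors; the standard deviation is computed from the data, and an interval is determined according to a given probability. Any error beyond this interval is regarded as a gross error rather than a random error, and data containing such a gross error should be discarded. The 3σ rule is the most commonly used and simplest criterion for identifying gross errors, and it generally applies when the number of measurements is sufficiently large (n ≥ 30). In this embodiment, a sufficiently large number of case or slice images are tested, a suitable interval is chosen, and results whose error falls outside this interval are treated as false positives. Selecting a suitable threshold by testing on a large number of samples means testing enough data, analyzing the results, and choosing a threshold that suppresses false positives well.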
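One possible reading of the threshold step is sketched below. It assumes that the threshold is the Gaussian value at the edge of the interval μ ± 3σ, so that a suspected area whose distance lies within three standard deviations of the mean intra-class distance scores at or above the threshold and is treated as a false positive. This mapping from the 3σ rule to a threshold, and the function names, are assumptions of the sketch; the application leaves the exact choice open.

```python
import math
from typing import List, Sequence, Tuple


def gaussian(x: float, mu: float, sigma: float) -> float:
    """Value of the Gaussian density with mean mu and standard deviation sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def three_sigma_threshold(mu: float, sigma: float) -> float:
    """Gaussian value at distance 3*sigma from the mean (assumed threshold choice)."""
    return gaussian(mu + 3.0 * sigma, mu, sigma)


def split_false_positives(scores: Sequence[float], threshold: float) -> Tuple[List[int], List[int]]:
    """Return (false_positive_indices, retained_indices).

    A suspected area is a false positive when its score is greater than
    or equal to the threshold, as stated in the text above.
    """
    false_pos = [i for i, s in enumerate(scores) if s >= threshold]
    retained = [i for i, s in enumerate(scores) if s < threshold]
    return false_pos, retained


mu, sigma = 5.0, 1.0
threshold = three_sigma_threshold(mu, sigma)
# First suspected area lies 0.5 sigma from the mean, the second 4.5 sigma.
scores = [gaussian(5.5, mu, sigma), gaussian(9.5, mu, sigma)]
false_pos, retained = split_false_positives(scores, threshold)
```

When the threshold is instead chosen empirically from a large number of test samples, only `three_sigma_threshold` would be replaced; the comparison in `split_false_positives` stays the same.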
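Medical images are produced by a variety of imaging methods, and differences in imaging equipment, equipment manufacturers, and environments all lead to differences in image characteristics. Data collection often cannot cover all data characteristics, so a trained network is prone to detection errors when it encounters images with different characteristics. This embodiment imitates the way a doctor reads images: it compares the similarity between suspected areas and normal areas at the case or slice level to screen targets. This process is more reasonable and better grounded, and can effectively improve the success rate and generalization of classifying true and false positive samples, thereby suppressing false positives more effectively.
The false positive filtering apparatus provided in this embodiment uses the intra-class and inter-class differences of the data itself and suppresses false positives by comparing the similarity between normal areas and suspected areas in the same case or slice image. This not only avoids the performance instability caused by data differences but also makes use of normal-area information, which supplements and optimizes the way the network learns. In addition, compared with strategies that suppress false positives through deep learning on collected samples, this embodiment generalizes to images with different characteristics, which reduces the difficulty of data collection. Furthermore, this embodiment can be attached after any lesion detection network model as a simple supplement to the model's output, so it is universal and plug-and-play.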
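To illustrate the plug-and-play arrangement of modules 201 to 205, the following sketch wires five interchangeable callables into one post-processing stage. The class name `FalsePositiveFilter`, the field names, the representation of an area as a feature vector, and the stub functions in the usage lines are all hypothetical; a real detection model and real feature extraction would take their place.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Area = Sequence[float]  # assumed representation: one feature vector per area


@dataclass
class FalsePositiveFilter:
    locate: Callable[[Any], Tuple[List[Area], List[Area]]]      # positioning module 201
    determine: Callable[[List[Area]], List[Area]]               # determining module 202
    measure: Callable[[List[Area], List[Area]],
                      Tuple[List[float], List[List[float]]]]    # measurement module 203
    score: Callable[[List[float], List[float]], float]          # calculation module 204
    threshold: float                                            # used by filtering module 205

    def run(self, image: Any) -> List[Area]:
        """Return the suspected areas that remain after false positive filtering."""
        normal, suspects = self.locate(image)
        initial = self.determine(normal)
        p, q = self.measure(initial, suspects)
        scores = [self.score(p, row) for row in q]
        # Areas scoring at or above the threshold are judged normal and dropped.
        return [s for s, sc in zip(suspects, scores) if sc < self.threshold]


# Usage with trivial stand-in stubs (illustrative values only).
pipeline = FalsePositiveFilter(
    locate=lambda image: ([[0.0], [1.0]], [[0.5], [9.0]]),
    determine=lambda normal: normal,
    measure=lambda initial, suspects: (
        [1.0],
        [[abs(s[0] - n[0]) for n in initial] for s in suspects],
    ),
    score=lambda p, row: 1.0 / (1.0 + sum(row) / len(row)),
    threshold=0.5,
)
kept = pipeline.run(image=None)
```

Because each stage is passed in as a callable, the detection model behind `locate` can be swapped without touching the filtering logic.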
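Refer to FIG. 3, which is a module diagram of the second embodiment of the false positive filtering apparatus 200 of this application. In this embodiment, in addition to the positioning module 201, determining module 202, measurement module 203, calculation module 204, and filtering module 205 of the first embodiment, the false positive filtering apparatus 200 further includes a comparison module 206.
The comparison module 206 is configured to select the best similarity measurement algorithm by comparing multiple candidate similarity measurement algorithms.
In this embodiment, the best similarity measurement method is determined by comparing the performance of the candidate methods on the initial normal area structures: the method that yields the smallest intra-class distance for the initial normal area structures at the low-dimensional or high-dimensional feature level is selected as the calculation method subsequently used to distinguish true positives from false positives. At an early stage, all commonly used features may be used to calculate distances in order to determine which features are effective for distinguishing true and false positives; at a later stage, only these effective features are used.
For example, four low-dimensional features (color, texture, size, and shape) are selected as the basis for judgment, and the candidate similarity measurement algorithms are Euclidean distance, Manhattan distance, and cosine similarity. Based on the selected features, each of the three candidate algorithms is used to calculate the intra-class distance of the initial normal areas, and the algorithm with the smallest intra-class distance is selected as the best similarity measurement algorithm. The best similarity measurement algorithm is then used to calculate the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas. Here: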
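(1) Euclidean distance:
$$D = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$$
where k is the dimension of the features, and x_i and y_i are the corresponding elements of the two feature vectors.
(2) Manhattan distance:
$$D = \sum_{i=1}^{k} |x_i - y_i|$$
where k is the dimension of the features, and x_i and y_i are the corresponding elements of the two feature vectors.
(3) The cosine similarity measurement algorithm uses the cosine of the angle between two vectors in a vector space to measure the difference between two individuals:
$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|}$$
where A and B are feature vectors.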
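The selection of the best similarity measurement algorithm can be sketched as follows. The sketch makes two assumptions that the application does not state: cosine similarity is converted into a distance as 1 − cos θ so that it can be compared with the other two measures, and "smallest intra-class distance" is taken to mean the smallest mean pairwise distance. The names `pick_best_metric` and `METRICS` are illustrative.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity (assumed conversion); undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "cosine": cosine_distance,
}


def pick_best_metric(normal_feats: List[Vector]) -> str:
    """Name of the metric with the smallest mean intra-class distance.

    Requires at least two initial normal areas.
    """
    def mean_intra(dist: Callable[[Vector, Vector], float]) -> float:
        return mean(dist(a, b) for a, b in combinations(normal_feats, 2))

    return min(METRICS, key=lambda name: mean_intra(METRICS[name]))


# Feature vectors that differ in magnitude but point in the same direction.
normal_feats = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
best = pick_best_metric(normal_feats)
```

The three measures have different scales, so comparing their raw values favors measures with a small numeric range; normalizing the features or the distances before comparison may be worth considering in practice.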
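It is worth noting that, as long as the deep neural network model remains unchanged, once the comparison module 206 has selected the best similarity measurement algorithm, that algorithm can be used for all subsequent calculations. If the deep neural network model changes, the best similarity measurement algorithm needs to be selected again.
In this embodiment, the measurement module 203 uses the best similarity measurement algorithm selected by the comparison module 206 to calculate the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas.
The false positive filtering apparatus provided in this embodiment compares the performance of multiple candidate similarity measurement algorithms on the initial normal area structures and selects the method that yields the smallest intra-class distance for the initial normal area structures at the low-dimensional or high-dimensional feature level as the calculation method subsequently used to distinguish true positives from false positives. This makes the similarity measurement between normal areas and suspected areas in the same case or slice image more effective, which improves the accuracy of the subsequent judgment of false positive areas and optimizes the filtering effect.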
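In addition, this application also proposes a false positive filtering method.
Refer to FIG. 4, which is a schematic flowchart of the first embodiment of the false positive filtering method of this application. In this embodiment, the execution order of the steps in the flowchart shown in FIG. 4 may be changed and some steps may be omitted, depending on requirements. The method includes:
Step S400: process the image to be detected through a deep neural network model, and locate normal areas and suspected areas.
Specifically, for 3D imaging modalities such as CT, MRI, and PET, each examination produces a sequence of images. The image to be detected may be a case image or a slice image on which false positive filtering (distinguishing true positives from false positives) is to be performed. Here, a case can be understood as a sequence: the sequence of images obtained in one examination is one case image. A slice image can be understood as a single image; one case image consists of multiple slice images.
In this embodiment, any one or more commonly used deep neural network frameworks may be selected to learn normal areas and suspected areas, so that the trained model outputs normal area and suspected area information for the input data. The trained deep neural network model (a lesion detection network model) is then used to process the case or slice image, and the normal areas and suspected areas are located from the model output. For each case or slice image, one or more normal areas and one or more suspected areas may be located.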
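Step S402: determine initial normal areas from the normal areas output by the model using a preset rule.
Specifically, the learning process of a deep neural network is a process of directed cognition: the information it learns is limited to the samples it is given, and, unlike a human, it cannot eliminate interference caused by noise or by differences in image characteristics by comparing the characteristics shared by the suspected areas and the normal areas of the current sample. This embodiment therefore further determines initial normal areas from the model output.
In this embodiment, the preset rule may be to select, according to the difference between the gray value of each normal area and the mean of those gray values, several normal areas with smaller differences as the initial normal areas.
Referring further to FIG. 5, step S402 specifically includes:
Step S4020: calculate the gray value of each normal area output by the model.
Step S4022: calculate the mean gray value of all the normal areas.
That is, the gray values of the normal areas calculated in the previous step are averaged.
Step S4024: calculate the difference between the gray value of each normal area and the mean gray value.
Step S4026: select several normal areas with smaller differences as the initial normal areas.
Here, the normal areas may be sorted in ascending order of their differences, and the normal areas corresponding to the first M differences (M is a positive integer, for example 3) are selected as the initial normal areas. Alternatively, after the differences are calculated, the normal areas whose difference is smaller than a preset threshold may be selected as the initial normal areas.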
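Returning to FIG. 4, step S404: use a similarity measurement algorithm to calculate the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas.
Specifically, a similarity measure compares how similar two things are, generally by calculating the distance between their features. Features can be divided into low-dimensional and high-dimensional features. Commonly used low-dimensional features include color (gray level), texture, size, and shape, and the similarity measurement algorithms commonly used with them include Euclidean distance, Manhattan distance, and cosine similarity. High-dimensional features are generally obtained by convolution, and the similarity measurement algorithms commonly used with them include structural similarity and block matching. A small distance means high similarity; conversely, a large distance means low similarity.
The intra-class distance refers to the distance calculated between every two initial normal areas. The inter-class distance refers to the distance calculated between each suspected area and each initial normal area.
In this embodiment, a preset or user-selected similarity measurement algorithm is used to calculate the distance between every two initial normal areas in the same case or slice image; the set of these distances (intra-class distances) is denoted P. The same algorithm is used to calculate the distance between each initial normal area and each suspected area in the same case or slice image; the set of these distances (inter-class distances) is denoted Q.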
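Step S406: calculate, from the calculated intra-class and inter-class distances, the probability that the suspected area is a normal area.
Specifically, the mean μ and standard deviation σ of the distances in the set P (intra-class distances) are calculated. Then μ and σ are used as the mean and standard deviation of the following Gaussian function, and each distance in the set Q (inter-class distance) is substituted as x to obtain the probability p(x) that the suspected area is a normal area. The smaller p(x) is, the less likely the suspected area is to be a normal area.
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$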
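Step S408: filter out false positive areas according to the calculated probability and a selected threshold.
Specifically, after the probability that the suspected area is a normal area has been calculated, a threshold is selected to filter false positive areas, thereby suppressing false positives. A false positive area is a suspected area that is finally judged to be a normal area, that is, an area whose probability of being a normal area is greater than or equal to the threshold. In this embodiment, a suitable threshold for screening false positive areas may be selected according to the 3σ rule of the Gaussian function or by testing on a large number of samples.
The 3σ rule works as follows: a set of measured data is first assumed to contain only random errors; the standard deviation is computed from the data, and an interval is determined according to a given probability. Any error beyond this interval is regarded as a gross error rather than a random error, and data containing such a gross error should be discarded. The 3σ rule is the most commonly used and simplest criterion for identifying gross errors, and it generally applies when the number of measurements is sufficiently large (n ≥ 30). In this embodiment, a sufficiently large number of case or slice images are tested, a suitable interval is chosen, and results whose error falls outside this interval are treated as false positives. Selecting a suitable threshold by testing on a large number of samples means testing enough data, analyzing the results, and choosing a threshold that suppresses false positives well.
Medical images are produced by a variety of imaging methods, and differences in imaging equipment, equipment manufacturers, and environments all lead to differences in image characteristics. Data collection often cannot cover all data characteristics, so a trained network is prone to detection errors when it encounters images with different characteristics. This embodiment imitates the way a doctor reads images: it compares the similarity between suspected areas and normal areas at the case or slice level to screen targets. This process is more reasonable and better grounded, and can effectively improve the success rate and generalization of classifying true and false positive samples, thereby suppressing false positives more effectively.
The false positive filtering method provided in this embodiment uses the intra-class and inter-class differences of the data itself and suppresses false positives by comparing the similarity between normal areas and suspected areas in the same case or slice image. This not only avoids the performance instability caused by data differences but also makes use of normal-area information, which supplements and optimizes the way the network learns. In addition, compared with strategies that suppress false positives through deep learning on collected samples, this embodiment generalizes to images with different characteristics, which reduces the difficulty of data collection. Furthermore, this embodiment can be attached after any lesion detection network model as a simple supplement to the model's output, so it is universal and plug-and-play.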
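FIG. 6 is a schematic flowchart of the second embodiment of the false positive filtering method of this application. In this embodiment, the false positive filtering method further includes step S504 in addition to the steps of the first embodiment.
The method includes the following steps:
Step S500: process the image to be detected through a deep neural network model, and locate normal areas and suspected areas.
Specifically, for 3D imaging modalities such as CT, MRI, and PET, each examination produces a sequence of images. The image to be detected may be a case image or a slice image on which false positive filtering (distinguishing true positives from false positives) is to be performed. Here, a case can be understood as a sequence: the sequence of images obtained in one examination is one case image. A slice image can be understood as a single image; one case image consists of multiple slice images.
In this embodiment, any one or more commonly used deep neural network frameworks may be selected to learn normal areas and suspected areas, so that the trained model outputs normal area and suspected area information for the input data. The trained deep neural network model (a lesion detection network model) is then used to process the case or slice image, and the normal areas and suspected areas are located from the model output. For each case or slice image, one or more normal areas and one or more suspected areas may be located.
Step S502: determine initial normal areas from the normal areas output by the model using a preset rule.
Specifically, the learning process of a deep neural network is a process of directed cognition: the information it learns is limited to the samples it is given, and, unlike a human, it cannot eliminate interference caused by noise or by differences in image characteristics by comparing the characteristics shared by the suspected areas and the normal areas of the current sample. This embodiment therefore further determines initial normal areas from the model output.
In this embodiment, the preset rule may be to select, according to the difference between the gray value of each normal area and the mean of those gray values, several normal areas with smaller differences as the initial normal areas. For the specific process of this step, refer to FIG. 5 and the related description; details are not repeated here.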
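Step S504: select the best similarity measurement algorithm by comparing multiple candidate similarity measurement algorithms.
Specifically, a similarity measure compares how similar two things are, generally by calculating the distance between their features. Features can be divided into low-dimensional and high-dimensional features. Commonly used low-dimensional features include color (gray level), texture, size, and shape, and the similarity measurement algorithms commonly used with them include Euclidean distance, Manhattan distance, and cosine similarity. High-dimensional features are generally obtained by convolution, and the similarity measurement algorithms commonly used with them include structural similarity and block matching. A small distance means high similarity; conversely, a large distance means low similarity.
In this embodiment, the best similarity measurement method is determined by comparing the performance of the candidate methods on the initial normal area structures: the method that yields the smallest intra-class distance for the initial normal area structures at the low-dimensional or high-dimensional feature level is selected as the calculation method subsequently used to distinguish true positives from false positives. At an early stage, all commonly used features may be used to calculate distances in order to determine which features are effective for distinguishing true and false positives; at a later stage, only these effective features are used.
For example, four low-dimensional features (color, texture, size, and shape) are selected as the basis for judgment, and the candidate similarity measurement algorithms are Euclidean distance, Manhattan distance, and cosine similarity. Based on the selected features, each of the three candidate algorithms is used to calculate the intra-class distance of the initial normal areas, and the algorithm with the smallest intra-class distance is selected as the best similarity measurement algorithm. The best similarity measurement algorithm is then used to calculate the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas.
The intra-class distance refers to the distance calculated between every two initial normal areas. The inter-class distance refers to the distance calculated between each suspected area and each initial normal area.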
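Here:
(1) Euclidean distance:
$$D = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$$
where k is the dimension of the features, and x_i and y_i are the corresponding elements of the two feature vectors.
(2) Manhattan distance:
$$D = \sum_{i=1}^{k} |x_i - y_i|$$
where k is the dimension of the features, and x_i and y_i are the corresponding elements of the two feature vectors.
(3) The cosine similarity measurement algorithm uses the cosine of the angle between two vectors in a vector space to measure the difference between two individuals:
$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|}$$
where A and B are feature vectors.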
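It is worth noting that, as long as the deep neural network model remains unchanged, once the best similarity measurement algorithm has been selected in this step, that algorithm can be used for all subsequent calculations. If the deep neural network model changes, the best similarity measurement algorithm needs to be selected again.
In this embodiment, the best similarity measurement algorithm selected in this step is subsequently used to calculate the intra-class and inter-class distances.
Step S506: use the best similarity measurement algorithm to calculate the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas.
In this embodiment, the selected best similarity measurement algorithm is used to calculate the distance between every two initial normal areas in the same case or slice image; the set of these distances (intra-class distances) is denoted P. The same algorithm is used to calculate the distance between each initial normal area and each suspected area in the same case or slice image; the set of these distances (inter-class distances) is denoted Q.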
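Step S508: calculate, from the calculated intra-class and inter-class distances, the probability that the suspected area is a normal area.
Specifically, the mean μ and standard deviation σ of the distances in the set P (intra-class distances) are calculated. Then μ and σ are used as the mean and standard deviation of the following Gaussian function, and each distance in the set Q (inter-class distance) is substituted as x to obtain the probability p(x) that the suspected area is a normal area. The smaller p(x) is, the less likely the suspected area is to be a normal area.
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Step S510: filter out false positive areas according to the calculated probability and a selected threshold.
Specifically, after the probability that the suspected area is a normal area has been calculated, a threshold is selected to filter false positive areas, thereby suppressing false positives. A false positive area is a suspected area that is finally judged to be a normal area, that is, an area whose probability of being a normal area is greater than or equal to the threshold. In this embodiment, a suitable threshold for screening false positive areas may be selected according to the 3σ rule of the Gaussian function or by testing on a large number of samples.
The false positive filtering method provided in this embodiment imitates the way a doctor compares images when reading them, and provides a technique for distinguishing true and false positives by using the normal area structures in the same case or slice image as background, which effectively improves the success rate and generalization of true and false positive classification. In addition, it compares the performance of multiple candidate similarity measurement algorithms on the initial normal area structures and selects the method that yields the smallest intra-class distance for the initial normal area structures at the low-dimensional or high-dimensional feature level as the calculation method subsequently used to distinguish true positives from false positives. This makes the similarity measurement between normal areas and suspected areas in the same case or slice image more effective, which improves the accuracy of the subsequent judgment of false positive areas and optimizes the filtering effect.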
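This application also provides another implementation, namely a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores a false positive filtering program that can be executed by at least one processor, so that the at least one processor performs the steps of the false positive filtering method described above.
The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit the patent scope of this application. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.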

Claims (20)

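  1. A false positive filtering method, wherein the method comprises the steps of:
    processing an image to be detected through a deep neural network model, and locating normal areas and suspected areas;
    determining initial normal areas from the normal areas output by the model using a preset rule;
    using a similarity measurement algorithm to calculate the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas;
    calculating, according to the intra-class distances and the inter-class distances, the probability that a suspected area is a normal area; and
    filtering out false positive areas according to the calculated probability and a selected threshold.
  2. The false positive filtering method according to claim 1, wherein, before the intra-class distances and the inter-class distances are calculated, the method further comprises the step of:
    comparing multiple candidate similarity measurement algorithms, selecting the similarity measurement method that minimizes the intra-class distance of the initial normal areas as the best similarity measurement algorithm, and using the best similarity measurement algorithm in the step of calculating the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas.
  3. The false positive filtering method according to claim 1 or 2, wherein the image to be detected is a sequence of images or a single image on which false positive filtering is to be performed.
  4. The false positive filtering method according to claim 3, wherein the intra-class distance is the distance between every two initial normal areas in the same sequence of images or single image, and the inter-class distance is the distance between each suspected area and each initial normal area in the same sequence of images or single image.
  5. The false positive filtering method according to claim 1 or 2, wherein the step of determining initial normal areas from the normal areas output by the model using a preset rule comprises:
    calculating the gray value of each normal area output by the model;
    calculating the mean gray value of all the normal areas;
    calculating the difference between the gray value of each normal area and the mean gray value;
    selecting several normal areas with smaller differences as the initial normal areas.
  6. The false positive filtering method according to claim 5, wherein the step of selecting several normal areas with smaller differences as the initial normal areas comprises:
    sorting the normal areas in ascending order of their differences, and selecting the normal areas corresponding to the first M differences as the initial normal areas, where M is a positive integer; or
    selecting the normal areas whose difference is smaller than a preset threshold as the initial normal areas.
  7. The false positive filtering method according to claim 1 or 2, wherein the step of filtering out false positive areas according to the calculated probability and a selected threshold comprises:
    selecting the threshold according to the 3σ rule of the Gaussian function, and determining a suspected area whose probability is greater than or equal to the threshold as a false positive area.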
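  8. A false positive filtering apparatus, wherein the apparatus comprises:
    a positioning module, configured to process an image to be detected through a deep neural network model and locate normal areas and suspected areas;
    a determining module, configured to determine initial normal areas from the normal areas output by the model using a preset rule;
    a measurement module, configured to use a similarity measurement algorithm to calculate the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas;
    a calculation module, configured to calculate, according to the intra-class distances and the inter-class distances, the probability that a suspected area is a normal area; and
    a filtering module, configured to filter out false positive areas according to the calculated probability and a selected threshold.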
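  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and running on the processor, wherein the processor implements the following steps when executing the computer program:
    processing an image to be detected through a deep neural network model, and locating normal areas and suspected areas;
    determining initial normal areas from the normal areas output by the model using a preset rule;
    using a similarity measurement algorithm to calculate the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas;
    calculating, according to the intra-class distances and the inter-class distances, the probability that a suspected area is a normal area; and
    filtering out false positive areas according to the calculated probability and a selected threshold.
  10. The computer device according to claim 9, wherein, before the intra-class distances and the inter-class distances are calculated, the processor further implements the following step when executing the computer program:
    comparing multiple candidate similarity measurement algorithms, selecting the similarity measurement method that minimizes the intra-class distance of the initial normal areas as the best similarity measurement algorithm, and using the best similarity measurement algorithm in the step of calculating the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas.
  11. The computer device according to claim 9 or 10, wherein the image to be detected is a sequence of images or a single image on which false positive filtering is to be performed.
  12. The computer device according to claim 11, wherein the intra-class distance is the distance between every two initial normal areas in the same sequence of images or single image, and the inter-class distance is the distance between each suspected area and each initial normal area in the same sequence of images or single image.
  13. The computer device according to claim 9 or 10, wherein the step of determining initial normal areas from the normal areas output by the model using a preset rule comprises:
    calculating the gray value of each normal area output by the model;
    calculating the mean gray value of all the normal areas;
    calculating the difference between the gray value of each normal area and the mean gray value;
    selecting several normal areas with smaller differences as the initial normal areas.
  14. The computer device according to claim 13, wherein the step of selecting several normal areas with smaller differences as the initial normal areas comprises:
    sorting the normal areas in ascending order of their differences, and selecting the normal areas corresponding to the first M differences as the initial normal areas, where M is a positive integer; or
    selecting the normal areas whose difference is smaller than a preset threshold as the initial normal areas.
  15. The computer device according to claim 9, wherein the step of filtering out false positive areas according to the calculated probability and a selected threshold comprises:
    selecting the threshold according to the 3σ rule of the Gaussian function, and determining a suspected area whose probability is greater than or equal to the threshold as a false positive area.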
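  16. A computer-readable storage medium storing a computer program, wherein the computer program implements the following steps when executed by a processor:
    processing an image to be detected through a deep neural network model, and locating normal areas and suspected areas;
    determining initial normal areas from the normal areas output by the model using a preset rule;
    using a similarity measurement algorithm to calculate the intra-class distances between the initial normal areas and the inter-class distances between the suspected areas and the initial normal areas;
    calculating, according to the intra-class distances and the inter-class distances, the probability that a suspected area is a normal area; and
    filtering out false positive areas according to the calculated probability and a selected threshold.
  17. The computer-readable storage medium according to claim 16, wherein the image to be detected is a sequence of images or a single image on which false positive filtering is to be performed.
  18. The computer-readable storage medium according to claim 17, wherein the intra-class distance is the distance between every two initial normal areas in the same sequence of images or single image, and the inter-class distance is the distance between each suspected area and each initial normal area in the same sequence of images or single image.
  19. The computer-readable storage medium according to claim 16, wherein the step of determining initial normal areas from the normal areas output by the model using a preset rule comprises:
    calculating the gray value of each normal area output by the model;
    calculating the mean gray value of all the normal areas;
    calculating the difference between the gray value of each normal area and the mean gray value;
    selecting several normal areas with smaller differences as the initial normal areas.
  20. The computer-readable storage medium according to claim 16, wherein the step of filtering out false positive areas according to the calculated probability and a selected threshold comprises:
    selecting the threshold according to the 3σ rule of the Gaussian function, and determining a suspected area whose probability is greater than or equal to the threshold as a false positive area.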
PCT/CN2020/098974 2020-04-30 2020-06-29 False positive filtering method, apparatus, device, and storage medium WO2021217854A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010369986.5 2020-04-30
CN202010369986.5A CN111652277A (zh) 2020-04-30 2020-04-30 False positive filtering method, electronic apparatus, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021217854A1 true WO2021217854A1 (zh) 2021-11-04

Family

ID=72346648

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/098974 WO2021217854A1 (zh) 2020-04-30 2020-06-29 False positive filtering method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN111652277A (zh)
WO (1) WO2021217854A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081957A (zh) * 2022-08-18 2022-09-20 山东超华环保智能装备有限公司 Hazardous waste management platform for temporary storage and monitoring of hazardous waste

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569742B (zh) * 2021-07-29 2023-04-07 西南交通大学 Broadband electromagnetic interference source identification method based on a convolutional neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
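CN101556650A (zh) * 2009-04-01 2009-10-14 东北大学 Distributed adaptive computer detection method and system for pulmonary nodules
CN107958453A (zh) * 2017-12-01 2018-04-24 深圳蓝韵医学影像有限公司 Method and apparatus for detecting lesion areas in breast images, and computer storage medium
CN109635846A (zh) * 2018-11-16 2019-04-16 哈尔滨工业大学(深圳) Multi-class medical image judgment method and system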
US20190130562A1 (en) * 2017-11-02 2019-05-02 Siemens Healthcare Gmbh 3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes

Also Published As

Publication number Publication date
CN111652277A (zh) 2020-09-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20933631

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20933631

Country of ref document: EP

Kind code of ref document: A1