WO2022193628A1 - Colon lesion intelligent recognition method and system based on unsupervised transfer picture classification, and medium

Info

Publication number: WO2022193628A1
Application number: PCT/CN2021/123276
Authority: WO
Inventors: 吴庆耀; 吴家驹; 刘飞
Original assignee: 华南理工大学
Priority date: 2021-03-15
Filing date: 2021-10-12
Publication date: 2022-09-22
Also published as: CN113160135A

Abstract

An intelligent recognition method and system for colon lesions based on unsupervised transfer picture classification, and a medium. A model comprising two sub-network modules with the same structure, a difficulty quantification module, a domain alignment module, a noise adaptability module, and a diversity module is constructed to recognize colon lesions. The method needs no annotated colon microscopic image samples and is highly robust to incorrect annotation. It thereby overcomes a defect of existing colon lesion recognition techniques, whose cost is very high because they depend heavily on the number and annotation quality of colon microscopic images, which are difficult to obtain. Because it is based on unsupervised transfer learning, the method is low in cost, highly robust, and highly flexible.

Description

Intelligent identification method, system and medium of colon lesions based on unsupervised transfer image classification

Technical field

The invention belongs to the technical field of unsupervised transfer learning and intelligent medical picture classification, and in particular relates to a colon lesion intelligent identification method, system and medium based on unsupervised transfer picture classification.

Background

In recent years, artificial intelligence and related industries have developed rapidly and have become a focus of attention for academia, industry and governments around the world. The State Council issued the "New Generation Artificial Intelligence Development Plan", highlighting the national strategic position of artificial intelligence research and industry. In the field of intelligent identification of colon lesions, colon microscopic image samples are not easy to obtain, and labeling them is very difficult: it requires manual labeling by professional and experienced doctors, and labeling errors are inevitable. Existing methods rely heavily on high-quality annotated colon microscopic image samples, which makes them costly and difficult to apply in real-world medical settings. Therefore, how to reduce the dependence on annotated colon microscopic images is an urgent problem to be solved in the intelligent identification of colon lesions.

Summary of the invention

The main purpose of the present invention is to overcome the deficiencies of the prior art, and to provide a method, system and medium for intelligent identification of colon lesions based on unsupervised transfer picture classification.

In order to achieve the above object, the present invention adopts the following technical solutions:

The present invention provides an intelligent identification method for colon lesions based on unsupervised transfer picture classification, comprising the following steps:

Define the categories of colon microscopic images in the target domain; collect and process digital slice images of the colon in the source domain so that their annotations are consistent with the categories of colon microscopic images in the target domain;

Build an intelligent identification model of colon lesions, including: two sub-network modules with the same structure, difficulty quantification module, domain alignment module, noise adaptability module and diversity module;

Train the intelligent recognition model for colon lesions using the processed source-domain digital slice images of the colon as samples, specifically as follows:

Input the sample into the two sub-network modules to obtain the classification prediction result and feature vector of the sample;

Input the classification prediction result of the sample into the difficulty quantification module to obtain the difficulty coefficient of the sample;

The domain alignment module, the noise adaptability module and the diversity module are used to construct the final loss function of the intelligent identification model of colon lesions, wherein the domain alignment module constructs the domain alignment loss from the feature vector and difficulty coefficient of the sample; the noise adaptability module processes the prediction results by modeling the manual-labeling error probability and constructs the classification loss; the diversity module uses the KL divergence to measure the similarity between the two sub-network modules and constructs the diversity loss; the final loss function is used to iteratively optimize the intelligent identification model of colon lesions;

Model deployment and prediction: input the colon microscopic images of the target domain into the trained intelligent recognition model for colon lesions, and predict whether a lesion is present according to the model output.

As a preferred technical solution, the defined categories of colon microscopic images in the target domain include: normal, adenoma, adenocarcinoma and mucinous adenocarcinoma.

As a preferred technical solution, in the training process, let the i-th training sample be x_i;

The sample passes through the feature extractors of the two sub-network modules to obtain the feature vectors P_τ(x_i), where τ ∈ {1, 2} indexes the two sub-networks; each feature vector P_τ(x_i) then passes through the classifier of the corresponding sub-network module to obtain the classification prediction result.

As a preferred technical solution, the difficulty quantification module adopts a quantification formula to obtain the difficulty coefficient λ(x_i) of the training sample x_i, specifically as follows:

where

is the classification prediction result of the two sub-network modules for the i-th sample.

As a preferred technical solution, the domain alignment module applies a re-weighting method to the alignment loss to obtain the domain alignment loss, specifically as follows:

where d_τ(·) is the domain alignment module's predicted probability of whether a sample comes from the source domain or the target domain, S is the source-domain data set, T is the target-domain data set, n_s is the number of samples in the source domain, and n_t is the number of samples in the target domain.

As a preferred technical solution, processing the prediction results by modeling the manual-labeling error probability is specifically as follows: in the training stage, when the model prediction is correct but the label is wrong, the prediction result is converted so that it agrees with the label; in the prediction stage, the unconverted prediction result is used. The model of the manual-labeling error probability is as follows:

where {w_km, b_km} are the parameters of the model, and f is the model's prediction result for the sample;

The classification loss is specifically as follows:

where

is the model of the manual-labeling error probability, and γ is a hyperparameter that controls the weight of the sample.

As a preferred technical solution, the diversity loss is specifically as follows:

where D_KL is the KL divergence.

As a preferred technical solution, the training process uses gradient descent for iterative optimization; the final loss function is a weighted combination of the domain alignment loss, the classification loss and the diversity loss, with the specific formula as follows:

L = max(-αL^d) + L^c - ηL^div,

where α is the weight of the domain alignment loss, and η is the weight of the diversity loss.

The present invention also provides an intelligent identification system for colon lesions based on unsupervised transfer image classification, which applies the above-mentioned intelligent identification method for colon lesions based on unsupervised transfer image classification and includes a preprocessing module, a model building module, a model training module and a model prediction module;

The preprocessing module is used to define the categories of colon microscopic images in the target domain, and to collect and process digital slice images of the colon in the source domain so that their annotations are consistent with the categories of colon microscopic images in the target domain;

The model building module builds an intelligent identification model of colon lesions, including: two sub-network modules with the same structure, difficulty quantification module, domain alignment module, noise adaptability module and diversity module;

The model training module uses the processed source-domain digital slice images of the colon as samples to train the intelligent recognition model for colon lesions, following the training steps of the method described above;

The model prediction module deploys the model and makes predictions: it inputs the colon microscopic images of the target domain into the trained intelligent recognition model for colon lesions and predicts whether a lesion is present according to the model output.

The present invention also provides a storage medium storing a program; when the program is executed by a processor, the above-mentioned method for intelligent identification of colon lesions based on unsupervised transfer picture classification is implemented.

Compared with the prior art, the present invention has the following advantages and beneficial effects:

(1) The intelligent identification method for colon lesions proposed by the present invention requires no labeled colon microscopic image samples and is highly robust to wrong labels. It trains the model on labeled digital slice images of the colon, which are easy to obtain, and uses the trained model to make predictions on colon microscopic images. This overcomes defects of existing intelligent identification techniques for colon lesions, which rely on the quantity and annotation quality of hard-to-obtain colon microscopic images, are therefore very costly, and suffer a large drop in performance when the annotations contain errors.

(2) The intelligent identification method of colon lesions proposed by the present invention is based on unsupervised transfer learning, and has low cost, strong robustness and high flexibility.

Description of the drawings

FIG. 1 is a schematic diagram of the overall flow of an intelligent identification method for colon lesions based on unsupervised transfer picture classification according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a training process of an intelligent identification model for colon lesions according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of overcoming the influence of incorrect labeling according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of predicting lesions in colon microscopic images according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an intelligent identification system for colon lesions based on unsupervised transfer picture classification according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.

Detailed description

In order to make those skilled in the art better understand the solutions of the present application, the following will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative efforts shall fall within the protection scope of this application.

Example 1

As shown in FIG. 1 , this embodiment provides an intelligent identification method for colon lesions based on unsupervised transfer image classification, which includes the following steps:

S1. Define the categories of colon microscopic images in the target domain; collect and process digital slice images of the colon in the source domain so that their labels are consistent with the categories of colon microscopic images in the target domain;

More specifically, in step S1, the defined categories of colon microscopic images in the target domain include "normal", "adenoma", "adenocarcinoma" and "mucinous adenocarcinoma".
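As an illustration of step S1, the following minimal sketch maps source-domain annotations onto the four target-domain categories. The raw label strings in `RAW_TO_TARGET` and the helper name `harmonize_labels` are hypothetical and chosen only for this example; the text does not specify how source annotations are named or processed.

```python
# Target-domain categories named in step S1.
TARGET_CATEGORIES = ["normal", "adenoma", "adenocarcinoma", "mucinous adenocarcinoma"]

# Hypothetical raw source-domain annotations mapped to the target categories.
# These raw label strings are illustrative assumptions, not taken from the text.
RAW_TO_TARGET = {
    "benign tissue": "normal",
    "tubular adenoma": "adenoma",
    "adenocarcinoma": "adenocarcinoma",
    "mucinous carcinoma": "mucinous adenocarcinoma",
}


def harmonize_labels(samples):
    """Keep only samples whose raw label maps to a target category and
    return (image_id, class_index) pairs."""
    harmonized = []
    for image_id, raw_label in samples:
        target = RAW_TO_TARGET.get(raw_label.strip().lower())
        if target is None:
            continue  # drop annotations outside the target label set
        harmonized.append((image_id, TARGET_CATEGORIES.index(target)))
    return harmonized
```

Dropping unmapped samples is one possible design choice; a real pipeline might instead route them to manual review.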

S2. Build an intelligent identification model for colon lesions, including: two sub-network modules with the same structure, a difficulty quantification module, a domain alignment module, a noise adaptability module, and a diversity module;

More specifically, in step S2, each sub-network module includes a feature extractor and a classifier; the difficulty quantification module is used to obtain the difficulty coefficient; the domain alignment module, the noise adaptability module and the diversity module are used to construct the final loss function of the model; the domain alignment module acts as the discriminator and the two sub-network modules act as the generator, so that these two parts form a generative adversarial network.

S3. Train the intelligent recognition model for colon lesions using the processed source-domain digital slice images of the colon as samples, as shown in FIG. 2;

More specifically, in step S3, let the i-th training sample be x_i;

S3.1. Pass the sample through the feature extractors of the two sub-network modules to obtain the feature vectors P_τ(x_i), where τ ∈ {1, 2} indexes the two sub-networks; each feature vector P_τ(x_i) then passes through the classifier of the corresponding sub-network module to obtain the classification prediction result.
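Step S3.1 can be pictured with a toy pair of sub-networks that share a structure but not their weights. This is only a sketch under stated assumptions: the single dense layer with tanh, the layer sizes, the random initialization and the class name `SubNetwork` are all illustrative, since the text does not specify the backbone.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SubNetwork:
    """Toy stand-in for one sub-network: feature extractor plus classifier."""

    def __init__(self, in_dim, feat_dim, num_classes, seed):
        rng = random.Random(seed)
        self.w_feat = random_matrix(rng, feat_dim, in_dim)
        self.b_feat = [0.0] * feat_dim
        self.w_cls = random_matrix(rng, num_classes, feat_dim)
        self.b_cls = [0.0] * num_classes

    def forward(self, x):
        # Feature vector P_tau(x_i): one dense layer with tanh (assumed form).
        features = [math.tanh(v) for v in linear(self.w_feat, self.b_feat, x)]
        # Classification prediction: softmax over the four categories.
        probs = softmax(linear(self.w_cls, self.b_cls, features))
        return features, probs


# Same structure, different initialization (tau = 1, 2).
subnets = [SubNetwork(in_dim=8, feat_dim=4, num_classes=4, seed=tau) for tau in (1, 2)]
x_i = [0.1 * k for k in range(8)]
outputs = [net.forward(x_i) for net in subnets]
```

Each entry of `outputs` holds the feature vector and the class probabilities of one sub-network for the same input sample.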

S3.2. Input the classification prediction result of the sample into the difficulty quantification module, and use the quantification formula proposed by the present invention to obtain the difficulty coefficient λ(x_i) of the training sample x_i, specifically as follows:

where

is the classification prediction result of the two sub-network modules for the i-th sample;
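The quantification formula itself is not reproduced in this text, so the sketch below does not implement it. Purely as an illustrative stand-in, it scores difficulty as the normalized entropy of the averaged predictions of the two sub-networks, which at least has the property the step describes: the coefficient is computed from the two classification predictions. The function name `difficulty_coefficient` is hypothetical.

```python
import math


def difficulty_coefficient(probs_1, probs_2):
    """Stand-in difficulty score in [0, 1] (an assumption, not the formula of
    the invention): normalized entropy of the mean of the two predictions."""
    mean = [(a + b) / 2.0 for a, b in zip(probs_1, probs_2)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return entropy / math.log(len(mean))


# A confidently classified sample versus a maximally uncertain one.
easy = difficulty_coefficient([0.97, 0.01, 0.01, 0.01], [0.97, 0.01, 0.01, 0.01])
hard = difficulty_coefficient([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25])
```

Under this stand-in, `hard` is larger than `easy`, so uncertain samples would receive a larger coefficient.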

S3.3. Input the feature vector and difficulty coefficient of the sample into the generative adversarial network of the domain alignment module, apply the re-weighting method proposed by the present invention to the alignment loss, construct the domain alignment loss, and align the feature spaces of the two domains, as follows:

where d_τ(·) is the domain alignment module's predicted probability of whether a sample comes from the source domain or the target domain, S is the source-domain data set, T is the target-domain data set, n_s is the number of samples in the source domain, and n_t is the number of samples in the target domain.
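Since the loss formula is not reproduced here, the following is only a sketch of one common way to re-weight an adversarial alignment loss: a binary cross-entropy over source and target samples in which each term is scaled by a per-sample weight derived from the difficulty coefficient. Reading d_τ(·) as the probability of "source", and the way the weights enter the sum, are assumptions; `domain_alignment_loss` is a hypothetical name.

```python
import math


def domain_alignment_loss(d_source, d_target, lam_source, lam_target):
    """Difficulty-weighted binary cross-entropy for a domain discriminator.

    d_source / d_target: discriminator outputs in (0, 1), read here as the
    probability that a sample comes from the source domain (assumption).
    lam_source / lam_target: per-sample weights derived from the difficulty
    coefficient (how the weights enter the loss is an assumption).
    """
    eps = 1e-12  # guards against log(0)
    n_s, n_t = len(d_source), len(d_target)
    src = sum(w * math.log(d + eps) for w, d in zip(lam_source, d_source)) / n_s
    tgt = sum(w * math.log(1.0 - d + eps) for w, d in zip(lam_target, d_target)) / n_t
    return -(src + tgt)


# A discriminator that separates the domains well versus one that cannot.
confident = domain_alignment_loss([0.9, 0.8], [0.1, 0.2], [1.0, 1.0], [1.0, 1.0])
confused = domain_alignment_loss([0.5, 0.5], [0.5, 0.5], [1.0, 1.0], [1.0, 1.0])
```

The loss is low when the discriminator tells the domains apart and high when the features are aligned well enough to confuse it.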

S3.4. The noise adaptability module processes the prediction result by modeling the manual-labeling error probability, as proposed by the present invention and shown in FIG. 3, and constructs the classification loss;

Processing the prediction results by modeling the manual-labeling error probability reduces the harm caused by manual labeling errors. Specifically, in the training stage, when the model prediction is correct but the label is wrong, the prediction result is converted so that it agrees with the label; in the prediction stage, the unconverted prediction result is used. The model of the manual-labeling error probability is as follows:

The classification loss is specifically as follows:

where

is the model of the manual-labeling error probability, and γ is a hyperparameter that controls the weight of the sample.
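The two formulas of step S3.4 are not reproduced in this text. The sketch below therefore shows one plausible reading rather than the method itself: a noise-adaptation layer in which parameters w_km and b_km define, for each true class k, a softmax over observed labels m, and a focal-style loss in which γ down-weights well-fit samples. Both forms, and the function names, are assumptions.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_label_distribution(f, w, b):
    """Convert the clean prediction f into a distribution over observed
    (possibly wrong) annotations. w[k][m] is a weight vector and b[k][m] a
    bias; row k is a softmax over m, read as P(annotation = m | class = k, f)."""
    num_classes = len(f)
    noisy = [0.0] * num_classes
    for k in range(num_classes):
        logits = [sum(wi * fi for wi, fi in zip(w[k][m], f)) + b[k][m]
                  for m in range(num_classes)]
        row = softmax(logits)
        for m in range(num_classes):
            noisy[m] += f[k] * row[m]
    return noisy


def classification_loss(f, label, w, b, gamma=2.0):
    """Focal-style weighted cross-entropy on the converted prediction
    (assumed form); gamma controls how strongly easy samples are down-weighted."""
    p = noisy_label_distribution(f, w, b)[label]
    return -((1.0 - p) ** gamma) * math.log(p + 1e-12)


K = 4
w = [[[0.0] * K for _ in range(K)] for _ in range(K)]
# Large diagonal bias: annotations are assumed to be mostly correct.
b = [[4.0 if k == m else 0.0 for m in range(K)] for k in range(K)]
f = [0.7, 0.1, 0.1, 0.1]
converted = noisy_label_distribution(f, w, b)
```

During training the loss is computed on `converted`; at prediction time the unconverted `f` would be used directly, matching the description above.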

S3.5. The diversity module uses the KL divergence to measure the similarity between the two sub-network modules, so as to preserve the benefit of combining the two sub-networks, and constructs the diversity loss, as follows:

where D_KL is the KL divergence.
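The diversity loss formula is not reproduced here, but the KL divergence it relies on is standard. The following minimal sketch compares the class probabilities of the two sub-networks; summing the divergence in both directions is an assumption made for symmetry, and `diversity_loss` is a hypothetical name.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)


def diversity_loss(probs_1, probs_2):
    """Symmetrized KL divergence between the two sub-network predictions.
    The symmetrization is an assumption; the text only names KL divergence."""
    return kl_divergence(probs_1, probs_2) + kl_divergence(probs_2, probs_1)


same = diversity_loss([0.7, 0.1, 0.1, 0.1], [0.7, 0.1, 0.1, 0.1])
different = diversity_loss([0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1])
```

Identical predictions give a loss of zero, and the loss grows as the two sub-networks disagree.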

S3.6. Use the final loss function to iteratively optimize the intelligent recognition model for colon lesions, with gradient descent as the optimization method during training. The final loss function is a weighted combination of the domain alignment loss, the classification loss and the diversity loss, as follows:

L = max(-αL^d) + L^c - ηL^div,
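The weighted combination can be written out in a few lines. In this sketch the max term is read as the adversarial step: the domain alignment module minimizes L^d, while the two sub-networks minimize the remaining expression for a fixed discriminator. That reading, the default values of α and η, and the function names are assumptions.

```python
def final_loss(l_domain, l_cls, l_div, alpha=1.0, eta=0.1):
    """Objective minimized by the two sub-networks for a fixed discriminator:
    -alpha * L^d + L^c - eta * L^div. The max in the formula is read here as
    the discriminator's own update, which minimizes L^d (an interpretation)."""
    return -alpha * l_domain + l_cls - eta * l_div


def discriminator_objective(l_domain):
    """Objective minimized by the domain alignment module (assumed adversary)."""
    return l_domain


total = final_loss(l_domain=1.2, l_cls=0.5, l_div=2.0, alpha=0.5, eta=0.1)
```

With these example values the three terms contribute -0.6, 0.5 and -0.2, so a larger diversity loss or a more confused discriminator lowers the sub-networks' objective.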

S4. Model deployment and prediction: as shown in FIG. 4, input the colon microscopic images of the target domain into the trained intelligent recognition model for colon lesions, and predict whether a lesion is present according to the model output.
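Step S4 can be sketched as follows, assuming the two sub-network outputs are combined by simple averaging and that every category other than "normal" counts as a lesion. Neither rule is stated in the text; both are illustrative assumptions, as is the name `predict_lesion`.

```python
CATEGORIES = ["normal", "adenoma", "adenocarcinoma", "mucinous adenocarcinoma"]


def predict_lesion(probs_1, probs_2):
    """Average the two sub-network predictions (averaging is an assumption)
    and report the predicted category and whether it is a lesion."""
    mean = [(a + b) / 2.0 for a, b in zip(probs_1, probs_2)]
    category = CATEGORIES[mean.index(max(mean))]
    return category, category != "normal"


result = predict_lesion([0.2, 0.5, 0.2, 0.1], [0.1, 0.6, 0.2, 0.1])
```

Here `result` pairs the predicted category with a boolean lesion flag for one target-domain image.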

As shown in FIG. 5, this embodiment provides an intelligent identification system for colon lesions based on unsupervised transfer image classification, including a preprocessing module, a model building module, a model training module and a model prediction module;

The preprocessing module is used to define the categories of colon microscopic images in the target domain, and to collect and process digital slice images of the colon in the source domain so that their labels are consistent with the categories of colon microscopic images in the target domain;

The model building module constructs an intelligent identification model of colon lesions, including: two sub-network modules with the same structure, a difficulty quantification module, a domain alignment module, a noise adaptability module and a diversity module;

The model training module uses the processed source-domain digital slice images of the colon as samples to train the intelligent recognition model for colon lesions, following steps S3.1 to S3.6 above;

The model prediction module is used for deploying the model and making predictions, inputting the colon microscopic image of the target domain into the trained colon lesion intelligent identification model, and predicting whether a lesion occurs according to the model output result.

It should be noted that the system provided in this embodiment is described using only the above division of functional modules as an example. In practical applications, the above functions may be assigned to different functional modules as required, that is, the internal structure may be divided into different functional modules to complete all or part of the functions described above. The system applies the intelligent identification method for colon lesions based on unsupervised transfer image classification of the above embodiment.

As shown in FIG. 6, the present embodiment also provides a storage medium storing a program. When the program is executed by a processor, the method for intelligently identifying colon lesions based on unsupervised transfer picture classification is implemented, specifically:

S1. Define the categories of colon microscopic images in the target domain; collect and process digital slice images of the colon in the source domain so that their labels are consistent with the categories of colon microscopic images in the target domain;

S2. Build an intelligent recognition model for colon lesions, including: two sub-network modules with the same structure, a difficulty quantification module, a domain alignment module, a noise adaptability module, and a diversity module;

S3. Train the intelligent recognition model for colon lesions using the processed source-domain digital slice images of the colon as samples, specifically:

S3.1. Input the sample into the two sub-network modules to obtain the classification prediction result and feature vector of the sample;

S3.2. Input the classification prediction result of the sample into the difficulty quantification module to obtain the difficulty coefficient of the sample;

S3.3. The domain alignment module, the noise adaptability module and the diversity module are used to construct the final loss function of the intelligent identification model of colon lesions, wherein the domain alignment module constructs the domain alignment loss from the feature vector and difficulty coefficient of the sample; the noise adaptability module processes the prediction results by modeling the manual-labeling error probability and constructs the classification loss; the diversity module uses the KL divergence to measure the similarity between the two sub-network modules and constructs the diversity loss; the final loss function is used to iteratively optimize the intelligent identification model of colon lesions;

S4. Model deployment and prediction: input the colon microscopic images of the target domain into the trained intelligent recognition model for colon lesions, and predict whether a lesion is present according to the model output.

It should be understood that various parts of this application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), etc.

The above-mentioned embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other changes, modifications, substitutions, combinations and simplifications shall be regarded as equivalent replacements, all of which are included in the protection scope of the present invention.

Claims

1. An intelligent identification method for colon lesions based on unsupervised transfer image classification, characterized in that it comprises the following steps:

Define the categories of colon microscopic images in the target domain; collect and process digital slice images of the colon in the source domain so that their annotations are consistent with the categories of colon microscopic images in the target domain;

Build an intelligent identification model of colon lesions, including: two sub-network modules with the same structure, difficulty quantification module, domain alignment module, noise adaptability module and diversity module;

Train the intelligent recognition model for colon lesions using the processed source-domain digital slice images of the colon as samples, specifically as follows:

Input the sample into the two sub-network modules to obtain the classification prediction result and feature vector of the sample;

Input the classification prediction result of the sample into the difficulty quantification module to obtain the difficulty coefficient of the sample;

The domain alignment module, the noise adaptability module and the diversity module are used to construct the final loss function of the intelligent identification model of colon lesions, wherein the domain alignment module constructs the domain alignment loss from the feature vector and difficulty coefficient of the sample; the noise adaptability module processes the prediction results by modeling the manual-labeling error probability and constructs the classification loss; the diversity module uses the KL divergence to measure the similarity between the two sub-network modules and constructs the diversity loss; the final loss function is used to iteratively optimize the intelligent identification model of colon lesions;

Model deployment and prediction: input the colon microscopic images of the target domain into the trained intelligent recognition model for colon lesions, and predict whether a lesion is present according to the model output.

2. The method for intelligent identification of colon lesions based on unsupervised transfer image classification according to claim 1, wherein the defined categories of colon microscopic images in the target domain include: normal, adenoma, adenocarcinoma and mucinous adenocarcinoma.

3. The method for intelligent identification of colon lesions based on unsupervised transfer picture classification according to claim 1, characterized in that, in the training process, the i-th training sample is denoted x_i;

The sample passes through the feature extractors of the two sub-network modules to obtain the feature vectors P_τ(x_i), where τ ∈ {1, 2} indexes the two sub-networks; each feature vector P_τ(x_i) then passes through the classifier of the corresponding sub-network module to obtain the classification prediction result.

4. The method for intelligent identification of colon lesions based on unsupervised transfer picture classification according to claim 3, characterized in that the difficulty quantification module adopts a quantification formula to obtain the difficulty coefficient λ(x_i) of the training sample x_i, specifically as follows:

where
is the classification prediction result of the two sub-network modules for the i-th sample.

5. The method for intelligent identification of colon lesions based on unsupervised transfer picture classification according to claim 4, wherein the domain alignment module applies a re-weighting method to the alignment loss to obtain the domain alignment loss, specifically as follows:

where d_τ(·) is the domain alignment module's predicted probability of whether a sample comes from the source domain or the target domain, S is the source-domain data set, T is the target-domain data set, n_s is the number of samples in the source domain, and n_t is the number of samples in the target domain.

6. The method for intelligent identification of colon lesions based on unsupervised transfer picture classification according to claim 5, characterized in that processing the prediction results by modeling the manual-labeling error probability is specifically as follows: in the training stage, when the model prediction is correct but the label is wrong, the prediction result is converted so that it agrees with the label; in the prediction stage, the unconverted prediction result is used. The model of the manual-labeling error probability is as follows:

where {w_km, b_km} are the parameters of the model, and f is the model's prediction result for the sample;

The classification loss is specifically as follows:

where
is the model of the manual-labeling error probability, and γ is a hyperparameter that controls the weight of the sample.

7. The method for intelligent identification of colon lesions based on unsupervised transfer picture classification according to claim 6, wherein the diversity loss is specifically as follows:

where D_KL is the KL divergence.

8. The method for intelligent identification of colon lesions based on unsupervised transfer picture classification according to claim 7, wherein the training process uses gradient descent for iterative optimization; the final loss function is a weighted combination of the domain alignment loss, the classification loss and the diversity loss, as follows:

L = max(-αL^d) + L^c - ηL^div,

where α is the weight of the domain alignment loss, and η is the weight of the diversity loss.

9. An intelligent identification system for colon lesions based on unsupervised transfer picture classification, characterized in that it applies the intelligent identification method for colon lesions based on unsupervised transfer picture classification according to any one of claims 1-8 and comprises a preprocessing module, a model building module, a model training module and a model prediction module;

The preprocessing module is used to define the categories of colon microscopic images in the target domain, and to collect and process digital slice images of the colon in the source domain so that their labels are consistent with the categories of colon microscopic images in the target domain;

The model building module constructs an intelligent identification model of colon lesions, including: two sub-network modules with the same structure, a difficulty quantification module, a domain alignment module, a noise adaptability module and a diversity module;

The model training module utilizes the processed source domain colon digital slice image as a sample to train a colon lesion intelligent recognition model, specifically:

Input the sample into the two sub-network modules to obtain the classification prediction result and feature vector of the sample;

Input the classification prediction result of the sample into the difficulty quantification module to obtain the difficulty coefficient of the sample;

The domain alignment module, the noise adaptability module and the diversity module are used to construct the final loss function of the intelligent identification model of colon lesions, wherein the domain alignment module constructs the domain alignment loss from the feature vector and difficulty coefficient of the sample; the noise adaptability module processes the prediction results by modeling the manual-labeling error probability and constructs the classification loss; the diversity module uses the KL divergence to measure the similarity between the two sub-network modules and constructs the diversity loss; the final loss function is used to iteratively optimize the intelligent identification model of colon lesions;

The model prediction module is used for deploying the model and making predictions: it inputs the colon microscopic images of the target domain into the trained intelligent recognition model for colon lesions and predicts whether a lesion is present according to the model output.

10. A storage medium storing a program, characterized in that, when the program is executed by a processor, the method for intelligently identifying colon lesions based on unsupervised transfer picture classification according to any one of claims 1-8 is implemented.