CN114863124A - Model training method, polyp detection method, corresponding apparatus, medium, and device - Google Patents


Info

Publication number
CN114863124A
Authority
CN
China
Prior art keywords
polyp
loss
feature
target
branch
Legal status
Pending
Application number
CN202210583592.9A
Other languages
Chinese (zh)
Inventor
刘威
刘腾营
边成
张志诚
李永会
Current Assignee
Xiaohe Medical Instrument Hainan Co ltd
Original Assignee
Xiaohe Medical Instrument Hainan Co ltd
Application filed by Xiaohe Medical Instrument Hainan Co ltd
Priority to CN202210583592.9A
Publication of CN114863124A
Legal status: Pending (current)


Classifications

    • G06V10/40 Extraction of image or video features
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T7/0012 Biomedical image inspection
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30032 Colon polyp

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a model training method, a polyp detection method, and a corresponding apparatus, medium, and device. The training method includes: acquiring a polyp detection model; inputting a training image into the polyp detection model and obtaining a plurality of feature maps of different scales through a feature extraction network; detecting the position of a polyp target in the training image from the plurality of feature maps through a polyp detection branch, and calculating a detection loss; extracting positive sample features and negative sample features from the plurality of feature maps through a contrast learning branch, and calculating a contrast loss from the distance between the positive sample features and the positive sample target feature and the distance between the negative sample features and the negative sample target feature; and obtaining a joint loss from the detection loss and the contrast loss, and updating the polyp detection branch and the contrast learning branch according to the joint loss. The resulting model discriminates well between polyps and suspected targets of similar shape, and can accurately locate polyp targets in images across different application scenarios.

Description

Model training method, polyp detection method, corresponding apparatus, medium, and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a model training method, a polyp detection method, a corresponding apparatus, a medium, and a device.
Background
Colonoscopy can be used for colon screening and polyp detection. At present, polyps are mainly found by endoscopists visually inspecting endoscopic images; in the related art, polyp targets in endoscopic images can also be detected by a deep-learning-based polyp detection model.
Deep-learning-based polyp detection models can perform well in a single scenario, but existing models have the following problems during colonoscopy. First, the intestinal environment is complex: there are many suspected targets with polyp-like shapes (such as feces and mucosal bulges), while polyp targets appear only rarely over the whole procedure, so existing models have a high false detection rate that needlessly interferes with the physician's examination and diagnosis. Second, imaging resolution varies between acquisition devices and polyp distributions differ considerably among patients in different regions, so existing models generalize poorly and have a high miss rate in new application scenarios.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a model training method, including:
acquiring a polyp detection model, wherein the polyp detection model comprises a feature extraction network, a polyp detection branch and a contrast learning branch;
inputting a training image into the polyp detection model, and extracting features from the training image through the feature extraction network to obtain a plurality of feature maps with different scales;
detecting the positions of polyp targets in the training image according to the plurality of feature maps through the polyp detection branch, and calculating detection loss according to the detection result;
extracting positive sample features corresponding to polyp targets and negative sample features corresponding to non-polyp targets from the plurality of feature maps through the contrast learning branch, and calculating contrast loss according to the distance between the positive sample features and the positive sample target features and the distance between the negative sample features and the negative sample target features;
obtaining a joint loss from the detection loss and the contrast loss, and updating the polyp detection branch and the contrast learning branch according to the joint loss.
In a second aspect, the present disclosure provides a polyp detection method comprising:
acquiring an image to be detected;
inputting the image to be detected into a target polyp detection model to obtain the position of a polyp target in the image to be detected; the target polyp detection model is obtained by training a polyp detection model based on the method of the first aspect, and includes a feature extraction network and a polyp detection branch.
In a third aspect, the present disclosure provides a model training apparatus, comprising:
a model acquisition module, configured to acquire a polyp detection model, wherein the polyp detection model comprises a feature extraction network, a polyp detection branch and a contrast learning branch;
a feature extraction module, configured to input a training image into the polyp detection model and extract features from the training image through the feature extraction network to obtain a plurality of feature maps of different scales;
a polyp detection module for detecting the position of a polyp target in the training image according to the plurality of feature maps through the polyp detection branch, and calculating a detection loss according to a detection result;
a contrast learning module for extracting a positive sample feature corresponding to a polyp target and a negative sample feature corresponding to a non-polyp target from the plurality of feature maps through the contrast learning branch, and calculating a contrast loss according to a distance between the positive sample feature and the positive sample target feature and a distance between the negative sample feature and the negative sample target feature;
a joint learning module, configured to obtain a joint loss according to the detection loss and the contrast loss and update the polyp detection branch and the contrast learning branch according to the joint loss.
In a fourth aspect, the present disclosure provides a polyp detection apparatus comprising:
an image acquisition module, configured to acquire an image to be detected;
a polyp detection module, configured to input the image to be detected into a target polyp detection model, and obtain a position of a polyp target in the image to be detected; the target polyp detection model is obtained by training a polyp detection model based on the method of the first aspect, and includes a feature extraction network and a polyp detection branch.
In a fifth aspect, the present disclosure provides a computer readable storage medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first or second aspect.
In a sixth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the method of the first or second aspect.
In this scheme, multi-task joint learning of the polyp detection branch and the contrast learning branch improves the generalization performance of the polyp detection model. Specifically, the parameters of the polyp detection branch are updated based on both its own detection loss and the contrast loss of the contrast learning branch. This makes the polyp detection branch more robust across scenarios and improves its ability to discriminate positive from negative samples, so that in different application scenarios, such as different acquisition devices or patients from different regions, positive and negative samples are still discriminated accurately and the position of a polyp target in an image is obtained more accurately.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
FIG. 1 illustrates a flow chart of a model training method provided by an exemplary embodiment;
FIG. 2 shows a schematic diagram of a polyp detection model in an exemplary embodiment;
FIG. 3 shows a flowchart illustrating a detailed implementation of step S104 in an exemplary embodiment;
FIG. 4 shows yet another schematic diagram of a polyp detection model in an exemplary embodiment;
FIG. 5 illustrates a flow chart of a polyp detection method provided by an exemplary embodiment;
FIG. 6 illustrates a block diagram of a model training apparatus provided in an exemplary embodiment;
FIG. 7 shows a block diagram of a polyp detection apparatus provided by an exemplary embodiment;
FIG. 8 illustrates a block diagram of an electronic device provided by an exemplary embodiment.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a/one" and "a plurality of" mentioned in this disclosure are illustrative rather than limiting, and those skilled in the art should understand them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
It is understood that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, usage scenarios, etc. of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly inform the user that the requested operation will require acquiring and using the user's personal information. The user can thus decide, based on the prompt information, whether to provide personal information to the software or hardware (such as an electronic device, application program, server, or storage medium) that performs the operations of the disclosed technical solution.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user in, for example, a pop-up window, in which the prompt information may be presented as text. The pop-up window may also carry a selection control with which the user chooses to "agree" or "disagree" to provide personal information to the electronic device.
It is understood that the above notification and user authorization process is only illustrative and not limiting, and other ways of satisfying relevant laws and regulations may be applied to the implementation of the present disclosure.
Meanwhile, it is understood that the data involved in the present technical solution (including but not limited to the data itself, the acquisition or use of the data) should comply with the requirements of the corresponding laws and regulations and the related regulations.
Regarding deep-learning-based polyp detection models: with the rapid development of deep learning in computer vision, the related art has introduced leading general-purpose object detection algorithms into the polyp detection task. These methods typically rely on pre-training on a large-scale general visual dataset such as ImageNet, followed by fine-tuning on a benchmark dataset of endoscopic images for polyp detection. Although they perform well on such benchmarks, the benchmark datasets are usually small, so the resulting polyp detection models risk overfitting and generalize poorly when deployed in actual colonoscopy procedures. Moreover, in a real colonoscopy the intestinal environment is very complex: suspected targets such as feces and mucosal bulges easily interfere with the model, and factors such as differing acquisition devices and differing polyp distributions among patients in different regions place higher demands on the model's generalization capability.
Therefore, the embodiment of the disclosure provides a model training method to improve the generalization capability of a polyp detection model in a complex intestinal environment. Fig. 1 is a flowchart illustrating a model training method provided in an exemplary embodiment, and as shown in fig. 1, the method includes:
S101, a polyp detection model is obtained, and the polyp detection model comprises a feature extraction network, a polyp detection branch and a contrast learning branch.
FIG. 2 shows a schematic diagram of a polyp detection model in an exemplary embodiment. Referring to FIG. 2, the model includes a feature extraction network, a polyp detection branch, and a contrast learning branch. The feature extraction network extracts features from an input image and outputs a plurality of feature maps of different scales. Illustratively, in FIG. 2 the feature extraction network includes a base network and a Feature Pyramid Network (FPN): the image is first passed through the base network, which extracts features bottom-up; the high-level feature maps are then fused into the shallower feature maps extracted by the base network through a top-down pathway and lateral connections; finally, multiple feature maps of different scales are output, for example feature maps P1, P2, and P3, downsampled by factors of 16, 32, and 64 relative to the input image, respectively. The polyp detection branch and the contrast learning branch are each connected to these feature maps: the polyp detection branch performs the polyp detection task, the contrast learning branch performs contrast learning on positive and negative sample features, and multi-task joint learning of the two branches improves the generalization performance of the polyp detection model.
The polyp detection branch detects the position of a polyp target from the plurality of feature maps of different scales output by the feature extraction network and obtains a detection result, namely a bounding box representing the position of the polyp target. In the training stage, the polyp detection branch also calculates its detection loss from the detected polyp target position and the polyp target label of the input image.
The contrast learning branch stores the positive sample target feature and the negative sample target feature, samples positive sample features corresponding to polyp targets and negative sample features corresponding to non-polyp targets from the plurality of feature maps of different scales output by the feature extraction network, and calculates its contrast loss from the distance between each sampled positive sample feature and the positive sample target feature and the distance between each sampled negative sample feature and the negative sample target feature.
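The multi-scale structure described above can be illustrated with a small sketch. It assumes a square input, the 16/32/64 strides from the text, and the 512-channel width quoted later for P1 applied to every level (an assumption). A real FPN also has 1×1 lateral convolutions and 3×3 smoothing convolutions; those are omitted here, so each map is a single-channel grid and the lateral connection is a plain element-wise sum. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # one channel of a feature map, indexed [row][col]


def pyramid_shapes(input_size: int, strides: Sequence[int] = (16, 32, 64),
                   channels: int = 512) -> List[Tuple[int, int, int]]:
    """(H, W, C) of each pyramid level for a square input."""
    return [(input_size // s, input_size // s, channels) for s in strides]


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: every cell becomes a 2x2 block."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized grids (the lateral merge)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(laterals: List[Grid]) -> List[Grid]:
    """FPN-style top-down pathway.

    laterals[0] is the finest level (stride 16) and laterals[-1] the coarsest
    (stride 64). Starting from the coarsest level, each fused map is upsampled
    and added to the next finer lateral map. The output keeps the same
    fine-to-coarse order (P1, P2, P3).
    """
    fused = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        fused.append(add(lateral, upsample2x(fused[-1])))
    return list(reversed(fused))


print(pyramid_shapes(512))  # [(32, 32, 512), (16, 16, 512), (8, 8, 512)]
```

With a 512 × 512 training image, P1 is 32 × 32. This agrees with the 32 × 32 × 512 size of P1 given in the sampling step.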
S102, inputting the training image into a polyp detection model, extracting features from the training image through a feature extraction network, and obtaining a plurality of feature maps with different scales.
Before this step, a training set including a plurality of training images is obtained. In an exemplary embodiment, to ensure data diversity, the training set is built by extracting frames from endoscope videos collected in hospitals with different acquisition devices. Frames are extracted at 5 FPS, blurred images are filtered out, and the remaining images are then annotated by physicians.
Note that in general object detection tasks, targets appear frequently in images. In real endoscope videos, by contrast, polyp targets appear infrequently, the intestinal environment is complex, and there are many suspected targets with polyp-like shapes (such as feces and mucosal bulges). To reduce the false detection rate during colonoscopy, a large number of images containing no polyp target are added to the training set.
Optionally, to further increase data diversity, each training image in the training set undergoes at least one of the following data augmentations: random cropping, random vertical flipping, random horizontal flipping, and random color transformation. Finally, the image is scaled to a preset size (e.g., 512 × 512).
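The frame-extraction step can be sketched as follows. The text fixes only the 5 FPS rate. It does not say how blurred frames are detected, so the variance-of-Laplacian score used here is a common substitute heuristic and not necessarily the authors' filter. The threshold of 100 is a hypothetical value that would need tuning. Frames are modelled as grayscale nested lists, and all names are illustrative.

```python
from typing import List


def frame_indices(video_fps: float, n_frames: int, target_fps: float = 5.0) -> List[int]:
    """Indices of the frames kept when sub-sampling a video to about target_fps."""
    step = video_fps / target_fps
    indices: List[int] = []
    i = 0
    while int(i * step) < n_frames:
        indices.append(int(i * step))
        i += 1
    return indices


def laplacian_variance(gray: List[List[float]]) -> float:
    """Variance of a 4-neighbour Laplacian over interior pixels.

    Sharp frames have strong local intensity changes and score high; blurred
    frames score low. A substitute heuristic, not taken from the text.
    """
    h, w = len(gray), len(gray[0])
    lap = [gray[r - 1][c] + gray[r + 1][c] + gray[r][c - 1] + gray[r][c + 1] - 4 * gray[r][c]
           for r in range(1, h - 1) for c in range(1, w - 1)]
    mean = sum(lap) / len(lap)
    return sum((v - mean) ** 2 for v in lap) / len(lap)


def is_blurred(gray: List[List[float]], threshold: float = 100.0) -> bool:
    """Hypothetical threshold; it would have to be tuned on real endoscope frames."""
    return laplacian_variance(gray) < threshold
```

For a 25 FPS video, `frame_indices(25, n)` keeps every fifth frame.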
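The augmentation step might look like the sketch below. It assumes images stored as nested lists of RGB tuples and draws a random non-empty subset of the four augmentations, to honour "at least one of". The crop fraction (0.8) and colour-jitter strength (0.2) are hypothetical values, and nearest-neighbour resizing is used only for simplicity. In a real detection pipeline, the polyp bounding boxes would have to go through the same crop, flips, and resize; that is not shown here.

```python
import random
from typing import List, Optional, Tuple

Pixel = Tuple[float, float, float]  # (r, g, b) in [0, 255]
Image = List[List[Pixel]]           # indexed [row][col]


def random_crop(img: Image, rng: random.Random, min_frac: float = 0.8) -> Image:
    """Keep a random window covering at least min_frac of each side."""
    h, w = len(img), len(img[0])
    ch = max(1, int(h * rng.uniform(min_frac, 1.0)))
    cw = max(1, int(w * rng.uniform(min_frac, 1.0)))
    top, left = rng.randint(0, h - ch), rng.randint(0, w - cw)
    return [row[left:left + cw] for row in img[top:top + ch]]


def flip_up_down(img: Image, rng: Optional[random.Random] = None) -> Image:
    return img[::-1]


def flip_left_right(img: Image, rng: Optional[random.Random] = None) -> Image:
    return [row[::-1] for row in img]


def color_jitter(img: Image, rng: random.Random, strength: float = 0.2) -> Image:
    """Scale each colour channel by a random gain and clip to [0, 255]."""
    gains = [rng.uniform(1 - strength, 1 + strength) for _ in range(3)]
    return [[tuple(min(255.0, max(0.0, v * g)) for v, g in zip(px, gains)) for px in row]
            for row in img]


def resize_nearest(img: Image, size: int) -> Image:
    """Nearest-neighbour resize to size x size."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def augment(img: Image, rng: random.Random, out_size: int = 512) -> Image:
    ops = [random_crop, flip_up_down, flip_left_right, color_jitter]
    chosen = rng.sample(ops, rng.randint(1, len(ops)))  # at least one augmentation
    for op in ops:  # apply the chosen ones in a fixed order
        if op in chosen:
            img = op(img, rng)
    return resize_nearest(img, out_size)
```

Passing a seeded `random.Random` makes an augmentation run reproducible, which helps when debugging label alignment.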
S103, the positions of the polyp targets in the input training image are detected from the plurality of feature maps by the polyp detection branch, and the detection loss is calculated from the detection result.
S104, extracting positive sample features corresponding to polyp targets and negative sample features corresponding to non-polyp targets from the plurality of feature maps through the contrast learning branch, and calculating a contrast loss according to the distance between the positive sample features and the positive sample target feature and the distance between the negative sample features and the negative sample target feature.
S105, obtaining a joint loss according to the detection loss and the contrast loss, and updating the polyp detection branch and the contrast learning branch according to the joint loss.
Specifically, a weighted sum of the detection loss and the contrast loss is calculated as a joint loss of the polyp detection model, and the polyp detection branch and the contrast learning branch are updated based on the joint loss.
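The weighted sum in S105 can be written as below. This section does not give the weights, so $\lambda_{det}$ and $\lambda_{con}$ are placeholder symbols introduced only for illustration:

```latex
\mathcal{L}_{joint} = \lambda_{det}\,\mathcal{L}_{det} + \lambda_{con}\,\mathcal{L}_{con}
```

Setting both weights to 1 reduces the joint loss to a plain sum. Assuming the feature extraction network is updated together with the polyp detection branch, the gradient of $\mathcal{L}_{joint}$ reaching it contains both terms. This is how the contrast loss can influence the features the detection branch uses.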
In the contrast learning branch, a positive sample target feature representing positive-sample information and a negative sample target feature representing negative-sample information must first be generated. Both are randomly initialized but learnable, and are used for contrast learning against the positive and negative sample features of the input image. Each time the contrast learning branch is updated in a training iteration, the positive and negative sample target features are updated with it. They therefore adapt according to the joint loss as training of the polyp detection model proceeds.
During training, the positive sample target feature moves closer to the positive sample features and the negative sample target feature moves closer to the negative sample features. As a result, the distance between the learned positive and negative sample target features keeps increasing.
Because the parameters of the polyp detection branch are updated based on both the detection loss of the polyp detection branch and the contrast loss of the contrast learning branch, the polyp detection branch becomes more robust across scenarios and better at discriminating positive from negative samples. Positive and negative samples are therefore discriminated accurately even in different application scenarios, such as different acquisition devices or patients from different regions, and the position of a polyp target in an image is obtained more accurately.
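A simplified sketch of why a learnable target feature "approaches" the samples of its class. It considers only an intra-class squared Euclidean term, $\frac{1}{N}\sum_i \lVert t - f_i \rVert^2$, and takes plain gradient steps on the target vector $t$. Both the loss form and the learning rate are assumptions. In the described method, the update is driven by the full joint loss through backpropagation.

```python
from typing import List

Vector = List[float]


def mean_vector(features: List[Vector]) -> Vector:
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]


def sgd_step_towards(target: Vector, features: List[Vector], lr: float = 0.1) -> Vector:
    """One gradient step on L(t) = mean_i ||t - f_i||^2 with respect to t.

    dL/dt = 2 * (t - mean(f)), so each step moves t towards the feature mean.
    """
    mu = mean_vector(features)
    return [t - lr * 2.0 * (t - m) for t, m in zip(target, mu)]
```

With `lr = 0.1`, each step shrinks the gap between $t$ and the feature mean by a factor of 0.8. A randomly initialized positive target therefore drifts towards the positive features, and the negative target drifts towards the negative features.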
To obtain positive and negative sample features for contrast learning, the contrast learning branch samples features from the three levels of feature maps output by the feature extraction network. FIG. 3 is a flowchart illustrating a specific implementation of step S104 in an exemplary embodiment. Referring to FIG. 3, step S104 includes:
S201, determining a label for each pixel position in each feature map according to the polyp target label of the training image.
Each feature map is obtained by extracting features from the input training image, so pixel positions on a feature map correspond to pixel positions on the training image. Whether each pixel position on each feature map corresponds to a polyp target can therefore be determined from the polyp target label of the training image. Here, the polyp target label is a bounding box enclosing the polyp target. A label $y_i$, $y_i \in \{0, 1\}$, is then determined for each pixel position according to whether it corresponds to a polyp target, where 0 represents a positive sample and 1 represents a negative sample. Specifically, pixel positions in each feature map that correspond to a polyp target are labeled as positive samples, and pixel positions that correspond to non-polyp targets are labeled as negative samples.
S202, sampling, according to the label of each pixel position in each feature map, a plurality of positive sample features corresponding to positive samples and a plurality of negative sample features corresponding to negative samples from the plurality of feature maps.
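The text states that feature-map positions map back to the training image, but does not give the exact rule. A common choice, assumed here, is to project each cell's centre to `((c + 0.5) * stride, (r + 0.5) * stride)` and mark the cell positive if that point lies inside any polyp box. The `(x1, y1, x2, y2)` box format and the function name are hypothetical. The label values follow the convention stated in the text (0 = positive, 1 = negative).

```python
from typing import List, Tuple

POS, NEG = 0, 1  # convention stated in the text: 0 = positive, 1 = negative
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in training-image pixels


def pixel_labels(boxes: List[Box], map_h: int, map_w: int, stride: int) -> List[List[int]]:
    """Label each feature-map cell by whether its centre, projected back to the
    training image, falls inside any polyp bounding box."""
    labels: List[List[int]] = []
    for r in range(map_h):
        cy = (r + 0.5) * stride
        row: List[int] = []
        for c in range(map_w):
            cx = (c + 0.5) * stride
            inside = any(x1 <= cx < x2 and y1 <= cy < y2 for x1, y1, x2, y2 in boxes)
            row.append(POS if inside else NEG)
        labels.append(row)
    return labels
```

Consider a stride-16 map of a 64 × 64 image with a polyp box covering its top-left 32 × 32 pixels. In that case, the four top-left cells are labeled 0 and all other cells are labeled 1.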
To prevent the negative samples from dominating training, the number of negative sample features drawn is set according to the number of positive sample features.
First, according to the label of each pixel position, all positive sample features are sampled from the positive-sample positions in the plurality of feature maps; each pixel position yields one sample feature. For example, if feature map P1 has size 32 × 32 × 512, sampling at one pixel position of P1 yields a 1 × 1 × 512 sample feature. This produces $N_p$ positive sample features in total. Then, $k \times N_p$ negative sample features are randomly sampled from the negative-sample positions in the plurality of feature maps, where $k$ is a preset coefficient; for example, with $k = 3$, $3N_p$ negative sample features are obtained. Thus, when few positive sample features are sampled, correspondingly few negative sample features are sampled, and when many positive sample features are sampled, correspondingly more negative sample features are sampled.
The value of $k$ should not be too large; for example, $k$ may take a value in $[1, 5]$.
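A sketch of the sampling rule in S202, assuming every scale shares the same channel dimension (as in the 512-channel P1 example) and that feature maps are plain nested lists. The function `sample_features`, its arguments, and the fixed `seed` are illustrative assumptions; the point is that all positives are kept and the negative count is tied to $N_p$ through $k$.

```python
import random
from typing import List, Sequence, Tuple

Feature = List[float]


def sample_features(fmaps: Sequence[Sequence[Sequence[Feature]]],
                    labels: Sequence[Sequence[Sequence[bool]]],
                    k: int = 3, seed: int = 0) -> Tuple[List[Feature], List[Feature]]:
    """Keep every positive feature vector and draw k * N_p negatives at random.

    fmaps[s][r][c] is the C-dim vector at row r, column c of scale s;
    labels[s][r][c] is True where that position is a positive sample.
    """
    pos: List[Feature] = []
    neg: List[Feature] = []
    for fmap, lab in zip(fmaps, labels):          # every scale contributes
        for r, row in enumerate(fmap):
            for c, vec in enumerate(row):         # one 1 x 1 x C feature per position
                (pos if lab[r][c] else neg).append(vec)
    n_neg = min(k * len(pos), len(neg))           # cannot draw more than exist
    return pos, random.Random(seed).sample(neg, n_neg)


# Two scales, 2 x 2 and 1 x 1, with 1-dim features; one positive position.
fmaps = [[[[1.0], [2.0]], [[3.0], [4.0]]], [[[5.0]]]]
labels = [[[True, False], [False, False]], [[False]]]
pos, neg = sample_features(fmaps, labels, k=3)
print(len(pos), len(neg))  # 1 3
```

The `min` guard is an added assumption for images with very few negative positions; the text does not say how that case is handled.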
S203: calculate the contrast loss according to the distance between each positive sample feature and the positive sample target feature, and the distance between each negative sample feature and the negative sample target feature.
Notably, in the above scheme the positive and negative sample features come from pixel positions on feature maps of different scales output by the feature extraction network, which ensures that they carry multi-scale information.
Optionally, in this step, each positive sample feature, each negative sample feature, the positive sample target feature, and the negative sample target feature are first L2-normalized. An intra-class loss is then calculated from the distance between each normalized positive sample feature and the positive sample target feature, and the distance between each normalized negative sample feature and the negative sample target feature. An inter-class loss is calculated from the distance between each normalized positive sample feature and the negative sample target feature, and the distance between each normalized negative sample feature and the positive sample target feature. Finally, the contrast loss is calculated from the intra-class loss and the inter-class loss, so the contrast loss consists of an intra-class term and an inter-class term.
In the above calculation, the distance between two features can be measured by cosine similarity.
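A minimal sketch of the L2 normalization and cosine similarity used as the distance measure, in plain Python. The helper names `l2_normalize` and `cosine` are hypothetical, and the small `eps` floor guarding against zero vectors is an added assumption.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale v to unit L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); for unit vectors this is just the dot product."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


print(l2_normalize([3.0, 4.0]))        # [0.6, 0.8]
print(cosine([1.0, 1.0], [2.0, 2.0]))  # ~1.0 (same direction)
```

Because every feature is normalized first, cosine similarity ignores feature magnitude and compares direction only.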
Based on the above embodiment, the intra-class loss, the inter-class loss, and the contrast loss can be calculated using the following formulas:
[Equation image: formula for the intra-class loss $L_{pos}$]
[Equation image: formula for the inter-class loss $L_{neg}$]
$$L_{con} = L_{pos} + L_{neg}$$
where $L_{pos}$ is the intra-class loss, $L_{neg}$ is the inter-class loss, and $L_{con}$ is the contrast loss; $\cos(\cdot)$ denotes cosine similarity, with $\cos(x_1, x_2) = \frac{x_1 \cdot x_2}{\lVert x_1 \rVert \, \lVert x_2 \rVert}$ giving the cosine similarity between $x_1$ and $x_2$; $\max(\cdot)$ takes the maximum, so $\max(z_1, z_2)$ is the larger of $z_1$ and $z_2$; $N_p$ is the number of sampled positive sample features and $N_n$ is the number of sampled negative sample features; $p_1$ is the positive sample target feature and $p_0$ is the negative sample target feature; and the two remaining symbols (rendered as images in the original) denote the normalized $i$-th positive sample feature and the normalized $i$-th negative sample feature.
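Because the $L_{pos}$ and $L_{neg}$ formulas survive only as images, their exact expressions cannot be recovered here. The sketch below uses an ASSUMED form that matches the textual description: the intra-class term pulls each normalized feature toward its own target feature via $1 - \cos$, and the inter-class term penalizes similarity to the other class's target feature via $\max(\cos, 0)$, each averaged over $N_p$ or $N_n$. It illustrates the structure $L_{con} = L_{pos} + L_{neg}$ only and should not be read as the disclosed formula.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _unit(v: Vec) -> List[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0  # L2 normalization
    return [x / n for x in v]


def _cos(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(_unit(a), _unit(b)))


def contrast_loss(pos_feats: List[Vec], neg_feats: List[Vec],
                  p1: Vec, p0: Vec) -> float:
    """ASSUMED form of L_con = L_pos + L_neg (the disclosed formulas are images).

    L_pos: mean(1 - cos) of each sample to its own class target feature.
    L_neg: mean(max(cos, 0)) of each sample to the other class target feature.
    """
    n_p, n_n = len(pos_feats), len(neg_feats)
    if n_p == 0 or n_n == 0:
        return 0.0  # assumption: no contrast term when nothing was sampled
    l_pos = (sum(1.0 - _cos(f, p1) for f in pos_feats) / n_p
             + sum(1.0 - _cos(f, p0) for f in neg_feats) / n_n)
    l_neg = (sum(max(_cos(f, p0), 0.0) for f in pos_feats) / n_p
             + sum(max(_cos(f, p1), 0.0) for f in neg_feats) / n_n)
    return l_pos + l_neg


p1, p0 = [1.0, 0.0], [0.0, 1.0]
print(contrast_loss([[2.0, 0.0]], [[0.0, 3.0]], p1, p0))  # 0.0: perfectly separated
print(contrast_loss([[0.0, 1.0]], [[1.0, 0.0]], p1, p0))  # 4.0: classes swapped
```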
As the above process shows, the positive and negative sample features are extracted from corresponding positions on multi-scale feature maps according to the pixel-position labels, so the multi-scale information of polyp targets is fully exploited and the polyp detection branch adapts better to polyp targets of different scales.
Optionally, to further improve the model's ability to distinguish real polyp targets from suspected polyp targets and thereby further reduce false detections, the polyp detection model may also include an image classification branch. Fig. 4 shows another schematic diagram of the polyp detection model in an exemplary embodiment. In Fig. 4, an image classification branch is attached to the top-level feature map output by the feature extraction network and performs a global image classification task: identifying whether the image contains a polyp target.
Specifically, the image classification branch takes the top-level feature map from the plurality of feature maps of different scales output by the feature extraction network, classifies whether a polyp target is present in the input image based on the global image features in that map, and calculates the classification loss of the image classification branch.
The image classification branch comprises an average pooling layer and a fully connected layer. The top-level feature map is fed into the average pooling layer, which performs global average pooling to produce a one-dimensional feature vector; this vector is then fed into the fully connected layer, which outputs a classification prediction value $p$. The label $y$ of the input image is obtained, and $p$ and $y$ are substituted into the classification loss function to obtain the classification loss.
The classification loss function may adopt the following function:
$$L_{cls} = -y\log(p) - (1-y)\log(1-p)$$
where $y \in \{0, 1\}$: the label $y$ is 1 if the input image contains a polyp target, and 0 otherwise.
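The classification branch can be sketched as global average pooling, a single-output fully connected layer, a sigmoid, and the binary cross-entropy $L_{cls}$ above. The sigmoid output, the function names `classify_top_map` and `cls_loss`, and the probability clamp `eps` are assumptions; the text only states that the fully connected layer outputs a prediction value $p$.

```python
import math
from typing import Sequence


def classify_top_map(top_map: Sequence[Sequence[Sequence[float]]],
                     weights: Sequence[float], bias: float) -> float:
    """Global average pooling -> one-output fully connected layer -> sigmoid.

    top_map[r][c] is the C-dim feature vector at row r, column c.
    """
    h, w, ch = len(top_map), len(top_map[0]), len(top_map[0][0])
    pooled = [sum(top_map[r][c][k] for r in range(h) for c in range(w)) / (h * w)
              for k in range(ch)]                      # one value per channel
    logit = sum(wk * xk for wk, xk in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))              # predicted p in (0, 1)


def cls_loss(p: float, y: int, eps: float = 1e-7) -> float:
    """L_cls = -y log(p) - (1 - y) log(1 - p), with p clamped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -y * math.log(p) - (1 - y) * math.log(1.0 - p)


p = classify_top_map([[[1.0, 0.0], [3.0, 0.0]]], weights=[1.0, 0.0], bias=0.0)
print(round(p, 4), round(cls_loss(p, 1), 4))  # 0.8808 0.1269
```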
Therefore, in step S105, a joint loss is obtained from the detection loss, the contrast loss, and the classification loss, and the polyp detection branch, the contrast learning branch, and the image classification branch are updated from the joint loss.
Specifically, the joint loss is:
$$L_{total} = \lambda_0 L_{det} + \lambda_1 L_{con} + \lambda_2 L_{cls}$$
where $L_{total}$ is the joint loss, $L_{det}$ is the detection loss of the polyp detection branch, $L_{con}$ is the contrast loss of the contrast learning branch, $L_{cls}$ is the classification loss of the image classification branch, and $\lambda_0$, $\lambda_1$, and $\lambda_2$ are the weight coefficients of the three branch losses, used to keep training stable. In one example, $\lambda_0$, $\lambda_1$, and $\lambda_2$ are all set to 1.0.
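The joint loss is a plain weighted sum; the one-line sketch below uses the example weights of 1.0 as defaults. Leaving `l_cls` at its default of 0 corresponds to the variant without an image classification branch, in which the joint loss is formed from the detection and contrast losses only. The function name `joint_loss` is hypothetical.

```python
def joint_loss(l_det: float, l_con: float, l_cls: float = 0.0,
               lam0: float = 1.0, lam1: float = 1.0, lam2: float = 1.0) -> float:
    """L_total = lam0 * L_det + lam1 * L_con + lam2 * L_cls (weights default to 1.0)."""
    return lam0 * l_det + lam1 * l_con + lam2 * l_cls


print(joint_loss(0.5, 0.25, 0.125))  # 0.875
print(joint_loss(0.5, 0.25))          # 0.75 (no classification branch)
```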
The generalization performance of the polyp detection model can be further improved through the multi-task joint learning of the polyp detection branch, the contrast learning branch and the image classification branch.
Notably, the image classification branch and the contrast learning branch are used only during training, to improve the generalization of the polyp detection model. In the actual detection stage both branches can be discarded entirely, so they add neither weight parameters nor inference time to the polyp detection model.
After training, the final polyp detection model is obtained, and the target polyp detection model is formed from its feature extraction network and polyp detection branch. The target polyp detection model therefore comprises the feature extraction network and the polyp detection branch, and its polyp detection branch has been trained with the detection loss, the contrast loss, and the classification loss. As a result, the polyp detection branch can reliably distinguish suspected targets whose shapes resemble polyps, and can accurately locate polyp targets in images across different application scenarios, such as different acquisition devices and patients in different regions.
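Since only the feature extraction network and the polyp detection branch are kept, exporting the target model amounts to filtering the trained parameters. The sketch assumes a flat name-to-parameter mapping (such as a PyTorch-style state dict) whose keys carry the hypothetical prefixes `backbone.` and `det_head.`; the real prefixes depend on how the model is defined.

```python
from typing import Any, Dict

# Hypothetical name prefixes for the parts kept at inference time.
KEEP_PREFIXES = ("backbone.", "det_head.")


def export_detector_weights(state: Dict[str, Any]) -> Dict[str, Any]:
    """Drop the contrast learning and image classification branch parameters."""
    return {name: t for name, t in state.items() if name.startswith(KEEP_PREFIXES)}


trained = {
    "backbone.layer1.weight": 0,
    "det_head.box.weight": 0,
    "con_branch.p1": 0,          # positive sample target feature (training only)
    "cls_branch.fc.weight": 0,
}
print(sorted(export_detector_weights(trained)))
# ['backbone.layer1.weight', 'det_head.box.weight']
```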
Accordingly, the present disclosure provides a polyp detection method for detecting the position of a polyp target in an image. Fig. 5 is a flowchart of a polyp detection method according to an exemplary embodiment. Referring to Fig. 5, the method includes:
S301: acquire an image to be detected.
S302: input the image to be detected into a target polyp detection model to obtain the position of a polyp target in the image to be detected. The target polyp detection model is obtained by the model training method provided by the present disclosure.
The target polyp detection model includes the feature extraction network and the polyp detection branch. The image to be detected is preprocessed and then fed into the feature extraction network; preprocessing includes, but is not limited to, resizing the image to a preset size such as 512 × 512. The feature extraction network extracts features from the input image and outputs a plurality of feature maps of different scales, and the polyp detection branch detects the position of the polyp target from these feature maps and outputs a target box representing the position of the polyp target in the image to be detected.
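If preprocessing is a plain resize to 512 × 512 (an assumption; the text lists resizing only as one example and does not say whether aspect ratio is preserved), a box predicted on the resized image has to be scaled back to the original resolution. A minimal sketch with the hypothetical helper `box_to_original`:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_to_original(box: Box, orig_w: int, orig_h: int, size: int = 512) -> Box:
    """Map a box predicted on the size x size input back to the original image.

    Assumes preprocessing was a plain resize (no padding or aspect-ratio preservation).
    """
    x1, y1, x2, y2 = box
    return (x1 * orig_w / size, y1 * orig_h / size,
            x2 * orig_w / size, y2 * orig_h / size)


print(box_to_original((128, 128, 256, 256), orig_w=1024, orig_h=768))
# (256.0, 192.0, 512.0, 384.0)
```

With letterbox-style padding instead, the padding offset would also have to be subtracted before rescaling.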
Because the polyp detection branch of the target polyp detection model is trained with the detection loss, the contrast loss, and the classification loss, it discriminates well against suspected targets whose shapes resemble polyps, and it can accurately output a target box representing the position of the polyp target in the image to be detected across different application scenarios, such as different acquisition devices and patients in different regions.
Fig. 6 is a block diagram of a model training apparatus according to an exemplary embodiment, and referring to fig. 6, the model training apparatus 400 includes:
a model obtaining module 401, configured to obtain a polyp detection model, where the polyp detection model includes a feature extraction network, a polyp detection branch, and a contrast learning branch;
a feature extraction module 402, configured to input a training image into the polyp detection model, extract features from the training image through the feature extraction network, and obtain multiple feature maps of different scales;
a polyp detection module 403, configured to detect, through the polyp detection branch, a position of a polyp target in the training image according to the plurality of feature maps, and calculate a detection loss according to a detection result;
a contrast learning module 404, configured to extract, through the contrast learning branch, a positive sample feature corresponding to a polyp target and a negative sample feature corresponding to a non-polyp target from the plurality of feature maps, and calculate a contrast loss according to a distance between the positive sample feature and the positive sample target feature and a distance between the negative sample feature and the negative sample target feature;
a joint learning module 405, configured to obtain a joint loss according to the detection loss and the contrast loss, and update the polyp detection branch and the contrast learning branch according to the joint loss.
Optionally, the polyp detection model further comprises an image classification branch, the apparatus 400 further comprising:
an image classification module, configured to classify, through the image classification branch, whether the training image contains a polyp target according to the top-level feature map among the plurality of feature maps, and to calculate a classification loss according to the classification result.
Wherein the joint learning module 405 is configured to obtain a joint loss according to the detection loss, the contrast loss, and the classification loss, and update the polyp detection branch, the contrast learning branch, and the image classification branch according to the joint loss.
Optionally, the contrast learning module 404 includes:
a label determining module, configured to determine a label of each pixel position in each feature map according to a polyp target label of the training image; wherein labels of pixel positions corresponding to polyp targets in the feature map are positive samples, and labels of pixel positions corresponding to non-polyp targets are negative samples;
a feature sampling module, configured to sample, from the plurality of feature maps, a plurality of positive sample features corresponding to the positive samples and a plurality of negative sample features corresponding to the negative samples according to the label of each pixel position in each feature map.
Optionally, the feature sampling module is configured to:
sampling, according to the label of each pixel position in each feature map, all positive sample features corresponding to positive samples from the plurality of feature maps, and randomly sampling $k \times N_p$ negative sample features corresponding to negative samples, where $N_p$ is the number of sampled positive sample features and $k$ is a preset coefficient.
Optionally, the contrast learning module 404 is configured to:
performing L2 normalization on each positive sample feature, each negative sample feature, the positive sample target feature, and the negative sample target feature;
calculating the intra-class loss according to the distance between each normalized positive sample feature and the positive sample target feature, and the distance between each normalized negative sample feature and the negative sample target feature;
calculating the inter-class loss according to the distance between each normalized positive sample feature and the negative sample target feature, and the distance between each normalized negative sample feature and the positive sample target feature;
and calculating the contrast loss according to the intra-class loss and the inter-class loss.
Optionally, the intra-class loss, the inter-class loss, and the contrast loss are calculated by the following formulas:
[Equation image: formula for the intra-class loss $L_{pos}$]
[Equation image: formula for the inter-class loss $L_{neg}$]
$$L_{con} = L_{pos} + L_{neg}$$
where $L_{pos}$ is the intra-class loss, $L_{neg}$ is the inter-class loss, $L_{con}$ is the contrast loss, $\cos(\cdot)$ denotes cosine similarity, $\max(\cdot)$ takes the maximum value, $N_p$ is the number of sampled positive sample features, $N_n$ is the number of sampled negative sample features, $p_1$ is the positive sample target feature, $p_0$ is the negative sample target feature, and the two remaining symbols (rendered as images in the original) denote the normalized $i$-th positive sample feature and the normalized $i$-th negative sample feature.
Optionally, the joint learning module 405 is configured to:
updating the polyp detection branch, and the positive sample target feature and the negative sample target feature in the contrast learning branch, according to the joint loss.
Fig. 7 shows a block diagram of a polyp detection apparatus provided by an exemplary embodiment, and referring to fig. 7, the polyp detection apparatus 500 includes:
an image obtaining module 501, configured to obtain an image to be detected;
a polyp detection module 502, configured to input the image to be detected into a target polyp detection model to obtain the position of a polyp target in the image to be detected, where the target polyp detection model is obtained by training a polyp detection model with the model training method of the present disclosure and comprises a feature extraction network and a polyp detection branch.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
In an exemplary embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processing device, implements the model training method or the polyp detection method of the present disclosure.
In an exemplary embodiment, there is provided an electronic device including:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the model training method or the polyp detection method of the present disclosure.
Referring now to FIG. 8, shown is a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in Fig. 8, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 8 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a polyp detection model, where the polyp detection model comprises a feature extraction network, a polyp detection branch, and a contrast learning branch; input a training image into the polyp detection model and extract features from the training image through the feature extraction network to obtain a plurality of feature maps of different scales; detect, through the polyp detection branch, the position of a polyp target in the training image according to the plurality of feature maps, and calculate a detection loss according to the detection result; extract, through the contrast learning branch, positive sample features corresponding to polyp targets and negative sample features corresponding to non-polyp targets from the plurality of feature maps, and calculate a contrast loss according to the distance between the positive sample features and the positive sample target feature and the distance between the negative sample features and the negative sample target feature; and obtain a joint loss from the detection loss and the contrast loss, and update the polyp detection branch and the contrast learning branch according to the joint loss.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire an image to be detected; and input the image to be detected into a target polyp detection model to obtain the position of a polyp target in the image to be detected, where the target polyp detection model is obtained by training a polyp detection model with the model training method of the present disclosure and comprises a feature extraction network and a polyp detection branch.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a module does not, in some cases, constitute a limitation on the module itself; for example, the model acquisition module may also be described as a "module for acquiring a polyp detection model".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides, in accordance with one or more embodiments of the present disclosure, a model training method, comprising:
acquiring a polyp detection model, wherein the polyp detection model comprises a feature extraction network, a polyp detection branch and a contrast learning branch;
inputting a training image into the polyp detection model, and extracting features from the training image through the feature extraction network to obtain a plurality of feature maps with different scales;
detecting the positions of polyp targets in the training image according to the plurality of feature maps through the polyp detection branch, and calculating detection loss according to the detection result;
extracting positive sample features corresponding to polyp targets and negative sample features corresponding to non-polyp targets from the plurality of feature maps through the contrast learning branch, and calculating contrast loss according to the distance between the positive sample features and the positive sample target features and the distance between the negative sample features and the negative sample target features;
obtaining a joint loss from the detection loss and the contrast loss, and updating the polyp detection branch and the contrast learning branch according to the joint loss.
Example 2 provides the method of example 1, the polyp detection model further including an image classification branch, and after extracting features from the training image through the feature extraction network to obtain a plurality of feature maps at different scales, the method further including:
classifying whether the training image contains polyp targets or not according to a top layer feature map in the plurality of feature maps through the image classification branch, and calculating classification loss according to a classification result;
said obtaining a joint loss from said detection loss and said contrast loss, updating said polyp detection branch and said contrast learning branch from said joint loss, comprising:
obtaining a joint loss from the detection loss, the contrast loss, and the classification loss, and updating the polyp detection branch, the contrast learning branch, and the image classification branch according to the joint loss.
Example 3 provides the method of example 1 or 2, the extracting positive sample features corresponding to polyp targets and negative sample features corresponding to non-polyp targets from the plurality of feature maps, comprising:
determining a label of each pixel position in each feature map according to the polyp target label of the training image; wherein labels of pixel positions corresponding to polyp targets in the feature map are positive samples, and labels of pixel positions corresponding to non-polyp targets are negative samples;
sampling, from the plurality of feature maps, a plurality of positive sample features corresponding to positive samples and a plurality of negative sample features corresponding to negative samples according to the label of each pixel position in each feature map.
Example 4 provides the method of example 3, the sampling, from the plurality of feature maps, a plurality of positive sample features corresponding to positive samples and a plurality of negative sample features corresponding to negative samples according to a label of each pixel location in each of the feature maps, including:
sampling, according to the label of each pixel position in each feature map, all positive sample features corresponding to positive samples from the plurality of feature maps, and randomly sampling $k \times N_p$ negative sample features corresponding to negative samples, where $N_p$ is the number of sampled positive sample features and $k$ is a preset coefficient.
Example 5 provides the method of example 3, the calculating a contrast loss according to a distance between the positive sample feature and a positive sample target feature, and a distance between the negative sample feature and a negative sample target feature, comprising:
performing L2 normalization on each positive sample feature, each negative sample feature, the positive sample target feature, and the negative sample target feature;
calculating an intra-class loss according to the distance between each normalized positive sample feature and the normalized positive sample target feature and the distance between each normalized negative sample feature and the normalized negative sample target feature;
calculating an inter-class loss according to the distance between each normalized positive sample feature and the normalized negative sample target feature and the distance between each normalized negative sample feature and the normalized positive sample target feature;
and calculating the contrast loss according to the intra-class loss and the inter-class loss.
Example 6 provides the method of Example 5, wherein the intra-class loss, the inter-class loss, and the contrast loss are calculated by the following formulas:
[Formula image not reproduced: intra-class loss $L_{pos}$]
[Formula image not reproduced: inter-class loss $L_{neg}$]
$L_{con} = L_{pos} + L_{neg}$
where $L_{pos}$ is the intra-class loss, $L_{neg}$ is the inter-class loss, $L_{con}$ is the contrast loss, $\cos(\cdot)$ denotes cosine similarity, $\max(\cdot)$ denotes taking the maximum value, $N_p$ is the number of sampled positive sample features, $N_n$ is the number of sampled negative sample features, $p_1$ is the positive sample target feature, $p_0$ is the negative sample target feature, and the two remaining symbols (rendered only as images in the source) denote the normalized $i$-th positive sample feature and the normalized $i$-th negative sample feature, respectively.
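The exact $L_{pos}$ and $L_{neg}$ formulas are available only as images, so the sketch below is an assumed instance consistent with the listed symbols, not the patent's formula. It assumes $L_{pos}$ averages $1-\cos$ between each feature and its own class target, and $L_{neg}$ averages $\max(0, \cos)$ between each feature and the opposite class target; $L_{con} = L_{pos} + L_{neg}$ as stated. Because all vectors are L2-normalized first, cosine similarity reduces to a dot product.

```python
import numpy as np

def l2n(x, eps=1e-12):
    """L2-normalize along the last axis."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def contrast_loss(pos, neg, p1, p0):
    """pos: (N_p, C) positive features; neg: (N_n, C) negative features;
    p1 / p0: (C,) positive / negative target features. Returns (L_con, L_pos, L_neg)."""
    pos, neg, p1, p0 = l2n(pos), l2n(neg), l2n(p1), l2n(p0)
    # Intra-class (assumed form): pull each feature toward its own target.
    l_pos = (1.0 - pos @ p1).mean() + (1.0 - neg @ p0).mean()
    # Inter-class (assumed form): penalize positive similarity to the other target.
    l_neg = np.maximum(0.0, pos @ p0).mean() + np.maximum(0.0, neg @ p1).mean()
    return l_pos + l_neg, l_pos, l_neg
```

Averaging each term over $N_p$ or $N_n$ keeps polyp and background samples equally weighted regardless of how many of each were drawn.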
Example 7 provides the method of Example 1, wherein updating the polyp detection branch and the contrast learning branch according to the joint loss comprises:
updating the polyp detection branch, and the positive sample target feature and the negative sample target feature in the contrast learning branch, according to the joint loss.
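Example 7 treats the target features $p_1$ and $p_0$ as learnable parameters. The NumPy sketch below shows the idea for $p_1$ alone, using an assumed intra-class term $\text{mean}(1 - \cos(f_i, p_1))$ and the analytic gradient of cosine similarity. In an autograd framework one would instead register both targets as learnable parameters (for example `torch.nn.Parameter` in PyTorch) and let the joint loss update them together with the detection branch. `update_target`, `lr` and `steps` are hypothetical.

```python
import numpy as np

def mean_cos(feats, t):
    """Mean cosine similarity between each row of feats and target t."""
    return float(np.mean(feats @ t / (np.linalg.norm(feats, axis=1) * np.linalg.norm(t))))

def grad_mean_cos(feats, t):
    """Gradient of mean_cos with respect to t: d cos(f, t)/dt = f/(|f||t|) - cos * t/|t|^2."""
    fn = np.linalg.norm(feats, axis=1)              # (N,)
    tn = np.linalg.norm(t)
    cos = feats @ t / (fn * tn)                     # (N,)
    g = feats / (fn[:, None] * tn) - cos[:, None] * t / tn**2
    return g.mean(axis=0)

def update_target(feats, t, lr=0.5, steps=50):
    """Gradient descent on loss = mean(1 - cos), i.e. ascent on mean cosine."""
    for _ in range(steps):
        t = t + lr * grad_mean_cos(feats, t)        # -d(loss)/dt = +d(mean_cos)/dt
    return t
```

The target drifts toward the mean direction of its class's features, which is what lets it act as a class prototype for the contrast loss.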
Example 8 provides a polyp detection method, according to one or more embodiments of the present disclosure, including:
acquiring an image to be detected;
inputting the image to be detected into a target polyp detection model to obtain the position of a polyp target in the image to be detected; wherein the target polyp detection model is obtained by training a polyp detection model with the method of any one of Examples 1 to 7, and the target polyp detection model includes a feature extraction network and a polyp detection branch.
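At inference only the feature extraction network and the polyp detection branch are used; the contrast learning branch (and, in Example 2, the image classification branch) serve only during training. A minimal sketch of that split, with hypothetical module names and boxes assumed as the output format for polyp positions:

```python
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]
INFERENCE_PARTS = ("feature_extraction", "polyp_detection")  # hypothetical module names

def export_inference_model(trained: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the sub-networks needed at inference; training-only branches are dropped."""
    return {name: trained[name] for name in INFERENCE_PARTS}

def detect(model: Dict[str, Callable], image: Any) -> List[Box]:
    fmaps = model["feature_extraction"](image)   # multi-scale feature maps
    return model["polyp_detection"](fmaps)       # polyp positions (boxes assumed)
```

Dropping the auxiliary branches means the contrast learning adds no inference-time cost.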
Example 9 provides, in accordance with one or more embodiments of the present disclosure, a model training apparatus, comprising:
the model acquisition module is used for acquiring a polyp detection model, the polyp detection model comprising a feature extraction network, a polyp detection branch, and a contrast learning branch;
the feature extraction module is used for inputting a training image into the polyp detection model and extracting features from the training image through the feature extraction network to obtain a plurality of feature maps at different scales;
a polyp detection module for detecting the position of a polyp target in the training image according to the plurality of feature maps through the polyp detection branch, and calculating a detection loss according to a detection result;
a contrast learning module for extracting a positive sample feature corresponding to a polyp target and a negative sample feature corresponding to a non-polyp target from the plurality of feature maps through the contrast learning branch, and calculating a contrast loss according to a distance between the positive sample feature and the positive sample target feature and a distance between the negative sample feature and the negative sample target feature;
and the joint learning module is used for obtaining a joint loss according to the detection loss and the contrast loss and updating the polyp detection branch and the contrast learning branch according to the joint loss.
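How the apparatus modules cooperate in one training step can be sketched as plain function composition. Each module is passed in as a hypothetical callable; the contrast weight `w_con` is an assumption, since the source only says the joint loss is obtained from the detection loss and the contrast loss.

```python
from typing import Any, Callable, Sequence

def train_step(image: Any, target: Any,
               extract: Callable[[Any], Sequence[Any]],          # feature extraction module
               detect_loss: Callable[[Sequence[Any], Any], float],  # polyp detection module
               contrast_loss: Callable[[Sequence[Any], Any], float],  # contrast learning module
               update: Callable[[float], None],                  # joint learning module's update
               w_con: float = 1.0) -> float:
    fmaps = extract(image)                   # feature maps at several scales
    l_det = detect_loss(fmaps, target)       # detection loss from predicted polyp positions
    l_con = contrast_loss(fmaps, target)     # contrast loss from sampled features
    l_joint = l_det + w_con * l_con          # joint loss (weighting assumed)
    update(l_joint)                          # update both branches from the joint loss
    return l_joint
```

Both losses are computed from the same feature maps, so the contrast term shapes the shared features that the detection branch consumes.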
Example 10 provides a polyp detection apparatus, according to one or more embodiments of the present disclosure, comprising:
the image acquisition module is used for acquiring an image to be detected;
a polyp detection module, configured to input the image to be detected into a target polyp detection model and obtain the position of a polyp target in the image to be detected; wherein the target polyp detection model is obtained by training a polyp detection model with the method of any one of Examples 1 to 7, and the target polyp detection model includes a feature extraction network and a polyp detection branch.
Example 11 provides a computer-readable storage medium having stored thereon a computer program that, when executed by a processing apparatus, implements the method of any of examples 1-8, in accordance with one or more embodiments of the present disclosure.
Example 12 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing said computer program in said storage means to implement the method of any of examples 1-8.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure; for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (12)

1. A method of model training, comprising:
acquiring a polyp detection model, wherein the polyp detection model comprises a feature extraction network, a polyp detection branch and a contrast learning branch;
inputting a training image into the polyp detection model, and extracting features from the training image through the feature extraction network to obtain a plurality of feature maps with different scales;
detecting the positions of polyp targets in the training image according to the plurality of feature maps through the polyp detection branch, and calculating detection loss according to the detection result;
extracting positive sample features corresponding to polyp targets and negative sample features corresponding to non-polyp targets from the plurality of feature maps through the contrast learning branch, and calculating contrast loss according to the distance between the positive sample features and the positive sample target features and the distance between the negative sample features and the negative sample target features;
obtaining a joint loss from the detection loss and the contrast loss, and updating the polyp detection branch and the contrast learning branch according to the joint loss.
2. The method of claim 1, wherein the polyp detection model further comprises an image classification branch, and after extracting features from the training image through the feature extraction network to obtain a plurality of feature maps at different scales, the method further comprises:
classifying whether the training image contains polyp targets or not according to a top layer feature map in the plurality of feature maps through the image classification branch, and calculating classification loss according to a classification result;
said obtaining a joint loss from said detection loss and said contrast loss, updating said polyp detection branch and said contrast learning branch from said joint loss, comprising:
obtaining a joint loss from the detection loss, the contrast loss, and the classification loss, and updating the polyp detection branch, the contrast learning branch, and the image classification branch according to the joint loss.
3. The method of claim 1 or 2, wherein said extracting positive sample features corresponding to polyp targets and negative sample features corresponding to non-polyp targets from said plurality of feature maps comprises:
determining a label of each pixel position in each feature map according to the polyp target label of the training image; wherein labels of pixel positions corresponding to polyp targets in the feature map are positive samples, and labels of pixel positions corresponding to non-polyp targets are negative samples;
and sampling, from the plurality of feature maps and according to the label of each pixel position in each feature map, a plurality of positive sample features corresponding to positive samples and a plurality of negative sample features corresponding to negative samples.
4. The method of claim 3, wherein sampling a plurality of positive sample features corresponding to positive samples and a plurality of negative sample features corresponding to negative samples from the plurality of feature maps based on the label of each pixel position in each of the feature maps comprises:
sampling, according to the label of each pixel position in each feature map, all positive sample features corresponding to positive samples from the plurality of feature maps, and randomly sampling $k \times N_p$ negative sample features corresponding to negative samples; wherein $N_p$ is the number of sampled positive sample features and $k$ is a preset coefficient.
5. The method of claim 3, wherein calculating the contrast loss based on the distance between the positive sample features and positive sample target features and the distance between the negative sample features and negative sample target features comprises:
performing L2 normalization on each positive sample feature, each negative sample feature, the positive sample target feature, and the negative sample target feature;
calculating an intra-class loss according to the distance between each normalized positive sample feature and the normalized positive sample target feature and the distance between each normalized negative sample feature and the normalized negative sample target feature;
calculating an inter-class loss according to the distance between each normalized positive sample feature and the normalized negative sample target feature and the distance between each normalized negative sample feature and the normalized positive sample target feature;
and calculating the contrast loss according to the intra-class loss and the inter-class loss.
6. The method of claim 5, wherein the intra-class loss, the inter-class loss, and the contrast loss are calculated by the following equations:
[Formula image not reproduced: intra-class loss $L_{pos}$]
[Formula image not reproduced: inter-class loss $L_{neg}$]
$L_{con} = L_{pos} + L_{neg}$
where $L_{pos}$ is the intra-class loss, $L_{neg}$ is the inter-class loss, $L_{con}$ is the contrast loss, $\cos(\cdot)$ denotes cosine similarity, $\max(\cdot)$ denotes taking the maximum value, $N_p$ is the number of sampled positive sample features, $N_n$ is the number of sampled negative sample features, $p_1$ is the positive sample target feature, $p_0$ is the negative sample target feature, and the two remaining symbols (rendered only as images in the source) denote the normalized $i$-th positive sample feature and the normalized $i$-th negative sample feature, respectively.
7. The method of claim 1, wherein said updating said polyp detection branch and said contrast learning branch according to said joint loss comprises:
updating the polyp detection branch and the positive sample target feature and the negative sample target feature in the contrast learning branch according to the joint loss.
8. A method of polyp detection, comprising:
acquiring an image to be detected;
inputting the image to be detected into a target polyp detection model to obtain the position of a polyp target in the image to be detected; wherein the target polyp detection model is obtained by training a polyp detection model with the method of any one of claims 1-7, the target polyp detection model comprising a feature extraction network and a polyp detection branch.
9. A model training apparatus, comprising:
the model acquisition module is used for acquiring a polyp detection model, and the polyp detection model comprises a feature extraction network, a polyp detection branch and a contrast learning branch;
the feature extraction module is used for inputting a training image into the polyp detection model and extracting features from the training image through the feature extraction network to obtain a plurality of feature maps at different scales;
a polyp detection module for detecting the position of a polyp target in the training image according to the plurality of feature maps through the polyp detection branch, and calculating a detection loss according to a detection result;
a contrast learning module for extracting a positive sample feature corresponding to a polyp target and a negative sample feature corresponding to a non-polyp target from the plurality of feature maps through the contrast learning branch, and calculating a contrast loss according to a distance between the positive sample feature and the positive sample target feature and a distance between the negative sample feature and the negative sample target feature;
and the joint learning module is used for obtaining a joint loss according to the detection loss and the contrast loss and updating the polyp detection branch and the contrast learning branch according to the joint loss.
10. A polyp detection device, comprising:
the image acquisition module is used for acquiring an image to be detected;
a polyp detection module, configured to input the image to be detected into a target polyp detection model and obtain the position of a polyp target in the image to be detected; wherein the target polyp detection model is obtained by training a polyp detection model with the method of any one of claims 1-7, the target polyp detection model comprising a feature extraction network and a polyp detection branch.
11. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by processing means, carries out the steps of the method according to any one of claims 1 to 8.
12. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 8.
CN202210583592.9A 2022-05-25 2022-05-25 Model training method, polyp detection method, corresponding apparatus, medium, and device Pending CN114863124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210583592.9A CN114863124A (en) 2022-05-25 2022-05-25 Model training method, polyp detection method, corresponding apparatus, medium, and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210583592.9A CN114863124A (en) 2022-05-25 2022-05-25 Model training method, polyp detection method, corresponding apparatus, medium, and device

Publications (1)

Publication Number Publication Date
CN114863124A 2022-08-05

Family

ID=82640681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210583592.9A Pending CN114863124A (en) 2022-05-25 2022-05-25 Model training method, polyp detection method, corresponding apparatus, medium, and device

Country Status (1)

Country Link
CN (1) CN114863124A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116168053A (en) * 2023-02-28 2023-05-26 抖音视界有限公司 Polyp segmentation model training method, polyp segmentation method and related device
CN116168053B (en) * 2023-02-28 2024-02-02 抖音视界有限公司 Polyp segmentation model training method, polyp segmentation method and related device

Similar Documents

Publication Publication Date Title
US20210158533A1 (en) Image processing method and apparatus, and storage medium
CN110348543B (en) Fundus image recognition method and device, computer equipment and storage medium
CN111739035B (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN111325726A (en) Model training method, image processing method, device, equipment and storage medium
WO2020224479A1 (en) Method and apparatus for acquiring positions of target, and computer device and storage medium
CN111091166B (en) Image processing model training method, image processing device, and storage medium
CN113470029B (en) Training method and device, image processing method, electronic device and storage medium
CN110399847B (en) Key frame extraction method and device and electronic equipment
CN114820584B (en) Lung focus positioner
CN114332554A (en) Training method of image segmentation model, image segmentation method, device and equipment
CN115082490B (en) Abnormity prediction method, and abnormity prediction model training method, device and equipment
CN112712036A (en) Traffic sign recognition method and device, electronic equipment and computer storage medium
CN115131281A (en) Method, device and equipment for training change detection model and detecting image change
CN114863124A (en) Model training method, polyp detection method, corresponding apparatus, medium, and device
CN114332033A (en) Endoscope image processing method, apparatus, medium, and device based on artificial intelligence
CN112037305B (en) Method, device and storage medium for reconstructing tree-like organization in image
CN114283299A (en) Image clustering method and device, computer equipment and storage medium
CN116168053B (en) Polyp segmentation model training method, polyp segmentation method and related device
WO2023185497A1 (en) Tissue image recognition method and apparatus, and readable medium and electronic device
CN112884702A (en) Polyp identification system and method based on endoscope image
CN115375657A (en) Method for training polyp detection model, detection method, device, medium, and apparatus
CN114937178B (en) Multi-modality-based image classification method and device, readable medium and electronic equipment
CN111310595A (en) Method and apparatus for generating information
CN115375656A (en) Training method, segmentation method, device, medium, and apparatus for polyp segmentation model
CN116704593A (en) Predictive model training method, apparatus, electronic device, and computer-readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination