CN117437580B - Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium - Google Patents
Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium
- Publication number: CN117437580B
- Application number: CN202311754713.2A
- Authority: CN (China)
- Prior art keywords: frame, prediction, frames, target, real
- Prior art date: 2023-12-20
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06V20/40 — Image or video recognition: scenes; scene-specific elements in video content
- G06N3/084 — Neural-network learning methods: backpropagation, e.g. using gradient descent
- G06T7/70 — Image analysis: determining position or orientation of objects or cameras
- G06V10/25 — Image preprocessing: determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/774 — Recognition using machine learning: generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82 — Recognition using machine learning: neural networks
- G06T2207/10016 — Image acquisition modality: video; image sequence
- G06T2207/10068 — Image acquisition modality: endoscopic image
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
- G06T2207/30096 — Subject of image: tumor; lesion
- Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention belongs to the application of artificial intelligence in the medical field and in particular provides a digestive tract tumor recognition method. Training sample images from a sample set are input into a digestive tract tumor recognition model to obtain prediction boxes and the tumor prediction probability corresponding to each prediction box, and a loss is computed for each prediction box from its position, its tumor prediction probability, and the position of the real box. When the number of real boxes is 1 and the number of target prediction boxes is greater than 1, the target prediction box corresponding to the real box is determined from the other frames around the training image in the video, and a correspondence between the real box and that target prediction box is established; a target prediction box is the prediction box with the minimum loss among all prediction boxes. The loss of the training sample image is computed from this correspondence, and the model is trained by backpropagation. During endoscopy, frames are extracted from the endoscope video and identified by the trained model. The invention improves both the accuracy of the model and the training speed.
Description
Technical Field
The invention relates to the application of artificial intelligence in medicine, and in particular to a digestive tract tumor identification method, a digestive tract tumor identification system, and a digestive tract tumor identification medium.
Background
Digestive tract tumors include esophageal cancer, gastric cancer, colorectal cancer, small intestine cancer, anal cancer, and others. Risk for most of these is driven mainly by dietary habits and chronic infections (such as Helicobacter pylori infection), while colorectal cancer is more closely tied to lifestyle factors such as diet and lack of physical activity. The complexity and diversity of these tumor types push the medical community to keep seeking more efficient and more accurate diagnostic methods. Traditional approaches such as endoscopy, radiological imaging, and biomarker detection, although widely used clinically, have significant limitations, particularly in early tumor identification and precise typing. In the early stages, a digestive tract tumor may present no obvious symptoms or easily distinguishable features, which makes early diagnosis difficult.
With the development of artificial intelligence, the technology shows great potential in medicine, especially in the diagnosis of digestive tract tumors. However, artificial intelligence also faces several challenges in digestive tract tumor recognition, the most significant being the impact of tumor heterogeneity on the generalization ability of the model. Even tumors of the same type can exhibit different biological characteristics and clinical manifestations in different patients, owing to genetic differences, environmental factors, or properties of the tumor itself. For example, the tumors of two gastric cancer patients may differ markedly in size, morphology, growth rate, and cellular composition; these differences directly affect the recognition and analysis capability of the model and demand higher accuracy. Improving recognition accuracy is therefore the key problem for artificial intelligence in digestive tract tumor identification.
Disclosure of Invention
To improve the accuracy and training speed of digestive tract tumor recognition, the invention provides a digestive tract tumor recognition method comprising the following steps:
obtaining an endoscope video of a digestive tract examination and labeling images in the video to obtain a sample set; inputting a training sample image from the sample set into a digestive tract tumor recognition model to obtain prediction boxes and the tumor prediction probability corresponding to each prediction box; and computing, for each prediction box, a loss from the position of the prediction box, its tumor prediction probability, and the position of the real box;
when the number of real boxes is 1: if the number of target prediction boxes is 1, establishing a correspondence between the real box and the target prediction box; if the number of target prediction boxes is greater than 1, determining the target prediction box corresponding to the real box from the other frames around the training sample image in the video and establishing the correspondence between the real box and that target prediction box; when the number of real boxes is greater than 1, constructing a cost matrix and determining the correspondence between real boxes and prediction boxes with the Hungarian algorithm; a target prediction box is the prediction box with the minimum loss among all prediction boxes;
computing the loss of the training sample image based on the correspondence and training the model by backpropagation; during endoscopy for digestive tract tumors, extracting frames from the endoscope video and identifying them with the trained model.
Preferably, determining the target prediction box corresponding to the real box from the other frames around the training sample image in the video specifically comprises:
obtaining the tumor prediction probability corresponding to each target prediction box;
determining target frames according to the position of the training sample image in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability over all of its prediction boxes;
computing the average of the maximum tumor prediction probabilities over all target frames, and taking the target prediction box whose tumor prediction probability is closest to that average as the prediction box corresponding to the real box.
Preferably, determining the target frames according to the position of the training sample image in the video specifically comprises:
obtaining the position of the training sample image in the video and taking the N frames before that position and the N frames after it, where N is a positive integer greater than 2;
extracting several sub-region images from the training sample image according to the target prediction boxes and the real box, and selecting target frames from the 2N frames according to those sub-region images.
Preferably, selecting target frames from the 2N frames according to the sub-region images specifically comprises:
searching each of the 2N frames for the region most similar to the sub-region image corresponding to the real box, and determining the target sub-region images corresponding to the several sub-region images from the positional relations among the sub-region images and the position of the most similar region within each frame;
establishing a correspondence between each sub-region image and its target sub-region image, computing the similarity between each sub-region image and its corresponding target sub-region image, computing the distance between the target sub-region image corresponding to the real box and the real box, and selecting at least one frame from the 2N frames whose similarity and distance satisfy preset conditions.
Preferably, computing, for each prediction box, the loss from the position of the prediction box, its tumor prediction probability, and the position of the real box specifically comprises:
obtaining a probability loss from the tumor prediction probability of the prediction box; obtaining a bounding-box loss from the IoU or GIoU between the prediction box and the real box; and summing the probability loss and the bounding-box loss to obtain the loss of the prediction box.
In addition, the invention provides a digestive tract tumor recognition system comprising the following modules:
a prediction-box loss acquisition module, configured to obtain an endoscope video of a digestive tract examination, label images in the video to obtain a sample set, input a training sample image from the sample set into the digestive tract tumor recognition model to obtain prediction boxes and the tumor prediction probability corresponding to each prediction box, and compute, for each prediction box, a loss from the position of the prediction box, its tumor prediction probability, and the position of the real box;
a target prediction box calculation module, configured to: when the number of real boxes is 1 and the number of target prediction boxes is 1, establish a correspondence between the real box and the target prediction box; when the number of target prediction boxes is greater than 1, determine the target prediction box corresponding to the real box from the other frames around the training sample image in the video and establish the correspondence; and when the number of real boxes is greater than 1, construct a cost matrix and determine the correspondence between real boxes and prediction boxes with the Hungarian algorithm; a target prediction box is the prediction box with the minimum loss among all prediction boxes;
a tumor recognition module, configured to compute the loss of the training sample image based on the correspondence and train the model by backpropagation; during endoscopy for digestive tract tumors, frames are extracted from the endoscope video and identified by the trained model.
Preferably, determining the target prediction box corresponding to the real box from the other frames around the training sample image in the video specifically comprises:
obtaining the tumor prediction probability corresponding to each target prediction box;
determining target frames according to the position of the training sample image in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability over all of its prediction boxes;
computing the average of the maximum tumor prediction probabilities over all target frames, and taking the target prediction box whose tumor prediction probability is closest to that average as the prediction box corresponding to the real box.
Preferably, determining the target frames according to the position of the training sample image in the video specifically comprises:
obtaining the position of the training sample image in the video and taking the N frames before that position and the N frames after it, where N is a positive integer greater than 2;
extracting several sub-region images from the training sample image according to the target prediction boxes and the real box, and selecting target frames from the 2N frames according to those sub-region images.
Preferably, selecting target frames from the 2N frames according to the sub-region images specifically comprises:
searching each of the 2N frames for the region most similar to the sub-region image corresponding to the real box, and determining the target sub-region images corresponding to the several sub-region images from the positional relations among the sub-region images and the position of the most similar region within each frame;
establishing a correspondence between each sub-region image and its target sub-region image, computing the similarity between each sub-region image and its corresponding target sub-region image, computing the distance between the target sub-region image corresponding to the real box and the real box, and selecting at least one frame from the 2N frames whose similarity and distance satisfy preset conditions.
Preferably, computing, for each prediction box, the loss from the position of the prediction box, its tumor prediction probability, and the position of the real box specifically comprises:
obtaining a probability loss from the tumor prediction probability of the prediction box; obtaining a bounding-box loss from the IoU or GIoU between the prediction box and the real box; and summing the probability loss and the bounding-box loss to obtain the loss of the prediction box.
Finally, the invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described above.
When training a digestive tract tumor recognition model, limited samples impair both the convergence and the accuracy of the model. The invention therefore computes, for each prediction box, a loss from the position of the prediction box, its tumor prediction probability, and the position of the real box. When the number of real boxes is 1 and the number of target prediction boxes is 1, it establishes the correspondence between the real box and the target prediction box directly; when the number of target prediction boxes is greater than 1, it determines the target prediction box corresponding to the real box from the other frames around the training sample image in the video and establishes that correspondence. It then computes the loss of the training sample image from the correspondence and trains the model by backpropagation. This makes the computed loss more accurate while also reducing the amount of computation and increasing training speed.
Drawings
To describe the embodiments of the invention or the prior art more clearly, the drawings needed in that description are briefly introduced below. The drawings described below show some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of the first embodiment;
FIG. 2 is a flowchart of step S2;
FIG. 3 is a schematic diagram of a real box and target prediction boxes;
FIG. 4 is a schematic diagram of the real box and target prediction boxes within a frame image;
FIG. 5 is a structural diagram of the second embodiment.
Detailed Description
In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between them. Moreover, the terms "comprises", "comprising", or any variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus comprising a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of further identical elements in the process, method, article, or apparatus that comprises it.
The embodiments of the invention are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of the invention.
In a first embodiment, the invention provides a digestive tract tumor identification method. As shown in FIG. 1, the method comprises the following steps:
S1: obtaining an endoscope video of a digestive tract examination and labeling images in the video to obtain a sample set; inputting a training sample image from the sample set into a digestive tract tumor recognition model to obtain prediction boxes and the tumor prediction probability corresponding to each prediction box; and computing, for each prediction box, a loss from the position of the prediction box, its tumor prediction probability, and the position of the real box.
when the endoscope is used for checking the digestive tract, a doctor usually judges whether a lesion or a tumor exists according to experience, the doctor usually depends on the experience of the doctor, and omission can occur. When training a digestive tract tumor recognition model, obtaining an endoscope video of digestive tract examination, and then labeling images in the video to obtain a sample set, wherein the video is composed of a plurality of frames, each frame is an image, and when collecting a sample, a clear image of a tumor is preferentially selected as a labeling object to obtain a first set. And marking the image with clear tumor, and marking the image with blurred tumor to obtain a second set.
The first set is used preferentially as training samples when training the digestive tract tumor recognition model. The model is preferably a DETR model, which outputs multiple predictions at once: a set of prediction boxes (predicted boxes), each paired with a prediction result. For example, if prediction box 1 has probability 0.1 and DETR outputs 10 prediction boxes per pass, there are 10 tumor probabilities together with 10 box positions. A position may be expressed as (x_min, y_min, x_max, y_max) or (x_center, y_center, width, height); appending the tumor prediction probability gives (x_center, y_center, width, height, pr), where x_center and y_center are the center coordinates of the prediction box, width and height are its width and height, and pr is its tumor prediction probability.
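For concreteness, the following minimal Python sketch (illustrative only, not code from the patent; the class and function names are assumptions) captures the prediction format just described and the conversion between the two coordinate conventions:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    """One DETR-style output: a box plus its tumor prediction probability."""
    x_center: float
    y_center: float
    width: float
    height: float
    pr: float  # tumor prediction probability for this box

def to_corners(p: Prediction):
    """Convert (x_center, y_center, width, height) into (x_min, y_min, x_max, y_max)."""
    return (p.x_center - p.width / 2, p.y_center - p.height / 2,
            p.x_center + p.width / 2, p.y_center + p.height / 2)
```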
Next, the loss of every prediction box against every real box (also called a ground-truth box or label box) is computed. Suppose there are 8 prediction boxes and 2 real boxes: the loss of prediction box 1 against real box 1 is computed, then the loss of prediction box 1 against real box 2, and so on until the last pair. Computing, for each prediction box, the loss from the position of the prediction box, its tumor prediction probability, and the position of the real box specifically comprises:
obtaining a probability loss (also called a classification loss) from the tumor prediction probability of the prediction box; obtaining a bounding-box loss (also called a regression loss) from the IoU or GIoU between the prediction box and the real box; and summing the probability loss and the bounding-box loss to obtain the loss of the prediction box. Table 1 below shows an example:
TABLE 1
In Table 1, pbox denotes a prediction box and tbox a real box; the loss of a pbox against a tbox is written in the form a+b, where a is the probability loss and b is the bounding-box loss. F denotes the loss of predicting no tumor; a box predicted as containing no tumor also has no bounding-box loss. The probability loss may be computed as 1 - (tumor prediction probability), or alternatively as 1 - log(tumor prediction probability); the invention does not limit the specific form. Taking the loss 0.2+0.1 of prediction box 1 against real box 1 as an example, 0.2 is the loss of prediction box 1 being predicted as a tumor and 0.1 is the loss derived from the IoU or GIoU between prediction box 1 and real box 1.
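The per-pair losses of Table 1 can be reproduced with a short sketch. This assumes the 1 - pr form of the probability loss and the standard 1 - GIoU form of the bounding-box loss; both choices and the function names are illustrative, since the patent leaves the exact forms open:

```python
def iou_and_giou(a, b):
    """IoU and GIoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # GIoU subtracts the fraction of the smallest enclosing box not covered by the union
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (c_area - union) / c_area if c_area > 0 else iou
    return iou, giou

def pairwise_losses(pred_boxes, pred_probs, real_boxes):
    """losses[i][j] = loss of prediction box i against real box j (the a+b of Table 1)."""
    losses = [[0.0] * len(real_boxes) for _ in pred_boxes]
    for i, (pb, pr) in enumerate(zip(pred_boxes, pred_probs)):
        for j, tb in enumerate(real_boxes):
            prob_loss = 1.0 - pr               # probability (classification) loss
            _, giou = iou_and_giou(pb, tb)
            box_loss = 1.0 - giou              # bounding-box (regression) loss
            losses[i][j] = prob_loss + box_loss
    return losses
```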
S2: when the number of real boxes is 1, if the number of target prediction boxes is 1, establishing a correspondence between the real box and the target prediction box; if the number of target prediction boxes is greater than 1, determining the target prediction box corresponding to the real box from the other frames around the training sample image in the video and establishing the correspondence between the real box and that target prediction box; when the number of real boxes is greater than 1, constructing a cost matrix and determining the correspondence between real boxes and prediction boxes with the Hungarian algorithm; a target prediction box is the prediction box with the minimum loss among all prediction boxes. The flow of S2 is shown in FIG. 2.
Each real box should correspond to one prediction box. In the DETR model, a cost matrix is computed and the Hungarian algorithm is used to find, from that matrix, the prediction box corresponding to each real box, after which the loss is computed. The Hungarian algorithm solves a combinatorial assignment problem and its procedure is relatively expensive; in digestive tract tumor recognition, however, images usually contain few tumors, so this full machinery is often unnecessary.
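When there are several real boxes, the cost-matrix matching described above is exactly a linear assignment problem, which scipy.optimize.linear_sum_assignment solves; a sketch under the assumption that the per-pair losses from the previous sketch serve as the cost matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(cost_matrix):
    """cost_matrix[i][j]: loss of prediction box i against real box j.
    Returns (prediction index, real-box index) pairs minimising the total loss;
    a rectangular matrix (more predictions than real boxes) is handled directly."""
    pred_idx, real_idx = linear_sum_assignment(np.asarray(cost_matrix))
    return list(zip(pred_idx.tolist(), real_idx.tolist()))
```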
In the model training of the invention, the number of real boxes in a sample is checked first. If there is exactly one real box, the prediction box with the minimum loss against it is selected and the correspondence between the two is established. Two or more prediction boxes may attain that minimum, i.e. their losses against the real box are equal and minimal, as shown in Table 2:
TABLE 2
In Table 2, prediction box 1 and prediction box 8 have the same loss against real box 1. Because matching prediction box 1 to real box 1 and matching prediction box 8 to real box 1 yield different total losses for this training step, it must be decided further whether real box 1 should be matched with prediction box 1 or with prediction box 8.
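Detecting such a tie is simple; the sketch below (illustrative, with a tolerance eps that the patent does not specify) returns every prediction box attaining the minimum loss against the single real box, so more than one returned index signals the ambiguous case:

```python
def target_prediction_boxes(losses, eps=1e-9):
    """losses[i]: loss of prediction box i against the single real box.
    Returns the indices of all boxes whose loss equals the minimum."""
    m = min(losses)
    return [i for i, l in enumerate(losses) if abs(l - m) <= eps]
```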
In a specific embodiment, determining the target prediction box corresponding to the real box from the other frames around the training sample image in the video specifically comprises:
obtaining the tumor prediction probability corresponding to each target prediction box;
each target prediction box carries its position together with its prediction probability, for example (x_center, y_center, width, height, pr), where pr is the tumor prediction probability;
determining target frames according to the position of the training sample image in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability over all of its prediction boxes;
computing the average of the maximum tumor prediction probabilities over all target frames, and taking the target prediction box whose tumor prediction probability is closest to that average as the prediction box corresponding to the real box.
In the video, the frames around the training sample image are similar to it; in general the tumor marked by the real box also appears in those frames, though its position shifts as the endoscope moves. The determined target frames are therefore fed into the model, the maximum tumor prediction probability among all prediction boxes of each target frame is taken, and the average of those maxima over all target frames is computed. This average is used to judge how reliable the probability loss of each target prediction box is.
For example, suppose the target prediction boxes are pbox1 and pbox8, with tumor prediction probabilities 0.8 and 0.9 respectively, and that there are 3 target frames. Passing each target frame through the model yields 8 prediction boxes per frame, hence 8 tumor prediction probabilities, of which the maximum is taken. If the maximum tumor prediction probabilities of the 3 target frames are 0.8, 0.6, and 0.9, their average is 0.77, which is closest to 0.8, so pbox1 is finally taken as the prediction box for the real box.
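The tie-break just walked through can be written as follows; a sketch assuming the candidate probabilities and the per-target-frame maxima have already been collected:

```python
def break_tie(candidate_probs, frame_max_probs):
    """candidate_probs: tumor prediction probabilities of the tied target
    prediction boxes, e.g. [0.8, 0.9]; frame_max_probs: the maximum tumor
    prediction probability on each target frame, e.g. [0.8, 0.6, 0.9].
    Returns the index of the candidate closest to the mean of the maxima
    (mean 0.77 -> candidate 0.8 in the example above)."""
    mean = sum(frame_max_probs) / len(frame_max_probs)
    return min(range(len(candidate_probs)),
               key=lambda i: abs(candidate_probs[i] - mean))
```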
In an optional embodiment, determining the target frames according to the position of the training sample image in the video specifically comprises:
obtaining the position of the training sample image in the video and taking the N frames before that position and the N frames after it, where N is a positive integer greater than 2;
extracting several sub-region images from the training sample image according to the target prediction boxes and the real box, and selecting target frames from the 2N frames according to those sub-region images.
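A minimal sketch of the frame-window step follows; it assumes the training sample image lies at least N frames from both ends of the video, a boundary case the patent does not address:

```python
def frame_window(frames, idx, n):
    """Return the N frames before and the N frames after position idx,
    i.e. the 2N candidate frames described above."""
    assert n > 2, "the patent requires N to be a positive integer greater than 2"
    assert n <= idx < len(frames) - n, "assumed: sample not too close to either end"
    return frames[idx - n:idx] + frames[idx + 1:idx + 1 + n]
```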
When selecting target frames it must be ensured that the image region of the real box in the training sample actually appears in the target frame; otherwise the average computed above is unreliable and the final effect degrades. The invention therefore selects target frames from the frames around the training sample image and further screens the 2N frames using the target prediction boxes and the real box. In an optional embodiment, selecting target frames from the 2N frames according to the sub-region images specifically comprises:
searching each of the 2N frames for the region most similar to the sub-region image corresponding to the real box, and determining the target sub-region images corresponding to the several sub-region images from the positional relations among the sub-region images and the position of the most similar region within each frame;
the sub-region images are obtained by cropping regions of the corresponding size and position from the training sample image, and the sub-region images for the target prediction boxes are obtained in the same way.
Each of the 2N frames is then searched for the region most similar to the sub-region image corresponding to the real box, yielding one matched region per frame whose position within that frame is recorded. From the positional relations of the sub-region images within the training sample image and the position of the matched region in each frame, several target sub-region images are determined. Suppose there are 1 real box and 2 target prediction boxes, related as shown in FIG. 3; once the region most similar to the real box's sub-image has been located in a frame, the 3 target sub-region images can be determined from the triangle the boxes form, as shown in FIG. 4.
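The similarity search described here is a standard template-matching task. The patent does not name a similarity measure, so the sketch below is one plausible realization using OpenCV's normalised cross-correlation; the function name is illustrative:

```python
import cv2

def best_match(frame, template):
    """Find the region of `frame` most similar to `template` (the sub-image
    cropped by the real box); returns the region corners and its score."""
    res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)
    x, y = max_loc
    h, w = template.shape[:2]
    return (x, y, x + w, y + h), max_val
```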
establishing a correspondence between each sub-region image and its target sub-region image, computing the similarity between each sub-region image and its corresponding target sub-region image, computing the distance between the target sub-region image corresponding to the real box and the real box, and selecting at least one frame from the 2N frames whose similarity and distance satisfy preset conditions.
The training sample image contains one real box and several target prediction boxes, and in each candidate frame a target sub-region image is determined for the real box and for each target prediction box; this establishes the correspondence between the sub-region images in the training sample image and the target sub-region images in the frame. The similarity between each sub-region image and its corresponding target sub-region image is then computed, together with the distance between the target sub-region image corresponding to the real box and the real box in the training sample image. High similarity indicates that the real box and the target prediction boxes are present in the frame; otherwise the frame is discarded. At the same time, a certain difference between the frame and the training sample image is required, which makes the computed average more reliable.
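The final screening could then look like the sketch below. The threshold values and the band-shaped distance condition are assumptions; the patent only requires that the similarity and distance "satisfy preset conditions", i.e. similarity high enough that the tumor is present and displacement large enough that the frame differs from the sample:

```python
def select_target_frames(candidates, sim_min=0.8, dist_min=5.0, dist_max=50.0):
    """candidates: (frame, mean_similarity, displacement) triples, where
    mean_similarity averages the sub-region/target-sub-region similarities
    and displacement is the pixel distance between the real box and its
    matched region. Keeps frames that are similar enough yet shifted."""
    return [f for f, sim, d in candidates
            if sim >= sim_min and dist_min <= d <= dist_max]
```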
In digestive tract tumor recognition, for the unusual case of more than one real box in a training sample image, the correspondence between real boxes and prediction boxes is still determined by the default mechanism of the DETR model, for simplicity of implementation.
S3: computing the loss of the training sample image based on the correspondence and training the model by backpropagation; during endoscopy for digestive tract tumors, extracting frames from the endoscope video and identifying them with the trained model.
After the correspondence between prediction boxes and real boxes is established, the loss of the training sample image is computed; in an alternative embodiment, this uses the loss computation of the DETR model. Training the digestive tract tumor recognition model in this way improves the precision and accuracy of the model and reduces the computation during training, in particular the cost of the Hungarian algorithm.
During endoscopy for digestive tract tumors, the video captured by the endoscope is streamed to a server or host, frames are extracted from it, and the trained model performs recognition. In an alternative embodiment, frames are extracted at equal time intervals, for example one frame per second; alternatively only I-frames and/or P-frames of the video are extracted. The invention does not limit the specific extraction scheme. Depending on the training samples, the model may be dedicated to detecting gastric cancer, esophageal cancer, and so on; the method can also be applied to other diseases examined by endoscope.
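As an illustration of the equal-interval option, the sketch below samples one frame per second from the endoscope video with OpenCV; the interval is the example value given above and the function name is illustrative:

```python
import cv2

def sample_frames(video_path, interval_s=1.0):
    """Grab one frame every `interval_s` seconds from the endoscope video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if FPS metadata is missing
    step = max(1, int(round(fps * interval_s)))
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:
            frames.append(frame)
        i += 1
    cap.release()
    return frames
```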
In a second embodiment, the invention further provides a digestive tract tumor recognition system. As shown in FIG. 5, the system comprises the following modules:
a prediction-box loss acquisition module, configured to obtain an endoscope video of a digestive tract examination, label images in the video to obtain a sample set, input a training sample image from the sample set into the digestive tract tumor recognition model to obtain prediction boxes and the tumor prediction probability corresponding to each prediction box, and compute, for each prediction box, a loss from the position of the prediction box, its tumor prediction probability, and the position of the real box;
a target prediction box calculation module, configured to: when the number of real boxes is 1 and the number of target prediction boxes is 1, establish a correspondence between the real box and the target prediction box; when the number of target prediction boxes is greater than 1, determine the target prediction box corresponding to the real box from the other frames around the training sample image in the video and establish the correspondence; and when the number of real boxes is greater than 1, construct a cost matrix and determine the correspondence between real boxes and prediction boxes with the Hungarian algorithm; a target prediction box is the prediction box with the minimum loss among all prediction boxes;
a tumor recognition module, configured to compute the loss of the training sample image based on the correspondence and train the model by backpropagation; during endoscopy for digestive tract tumors, frames are extracted from the endoscope video and identified by the trained model.
Preferably, determining the target prediction box corresponding to the real box from the other frames around the training sample image in the video specifically comprises:
obtaining the tumor prediction probability corresponding to each target prediction box;
determining target frames according to the position of the training sample image in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability over all of its prediction boxes;
computing the average of the maximum tumor prediction probabilities over all target frames, and taking the target prediction box whose tumor prediction probability is closest to that average as the prediction box corresponding to the real box.
Preferably, determining the target frames according to the position of the training sample image in the video specifically comprises:
obtaining the position of the training sample image in the video and taking the N frames before that position and the N frames after it, where N is a positive integer greater than 2;
extracting several sub-region images from the training sample image according to the target prediction boxes and the real box, and selecting target frames from the 2N frames according to those sub-region images.
Preferably, selecting target frames from the 2N frames according to the sub-region images specifically comprises:
searching each of the 2N frames for the region most similar to the sub-region image corresponding to the real box, and determining the target sub-region images corresponding to the several sub-region images from the positional relations among the sub-region images and the position of the most similar region within each frame;
establishing a correspondence between each sub-region image and its target sub-region image, computing the similarity between each sub-region image and its corresponding target sub-region image, computing the distance between the target sub-region image corresponding to the real box and the real box, and selecting at least one frame from the 2N frames whose similarity and distance satisfy preset conditions.
Preferably, computing, for each prediction box, the loss from the position of the prediction box, its tumor prediction probability, and the position of the real box specifically comprises:
obtaining a probability loss from the tumor prediction probability of the prediction box; obtaining a bounding-box loss from the IoU or GIoU between the prediction box and the real box; and summing the probability loss and the bounding-box loss to obtain the loss of the prediction box.
In a third embodiment, the invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of the first embodiment.
From the description of the embodiments above, it will be clear to those skilled in the art that the embodiments may be implemented on a general-purpose hardware platform or by a combination of hardware and software. On this understanding, the essence of the solution, or the part contributing over the prior art, may be embodied as a computer program product on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
Finally, it should be noted that the above embodiments only illustrate, and do not limit, the technical solution of the invention. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the described solutions may still be modified, or some technical features replaced by equivalents, without departing from the spirit and scope of the embodiments of the invention.
Claims (8)
1. A digestive tract tumor identification method, comprising the following steps:
obtaining an endoscope video of a digestive tract examination and labeling images in the video to obtain a sample set; inputting a training sample image from the sample set into a digestive tract tumor recognition model to obtain prediction boxes and the tumor prediction probability corresponding to each prediction box; and computing, for each prediction box, a loss from the position of the prediction box, its tumor prediction probability, and the position of the real box;
when the number of real boxes is 1: if the number of target prediction boxes is 1, establishing a correspondence between the real box and the target prediction box; if the number of target prediction boxes is greater than 1, determining the target prediction box corresponding to the real box from the other frames around the training sample image in the video and establishing the correspondence between the real box and that target prediction box; when the number of real boxes is greater than 1, constructing a cost matrix and determining the correspondence between real boxes and prediction boxes with the Hungarian algorithm; a target prediction box being the prediction box with the minimum loss among all prediction boxes;
computing the loss of the training sample image based on the correspondence and training the model by backpropagation; during endoscopy for digestive tract tumors, extracting frames from the endoscope video and identifying them with the trained model;
wherein determining the target prediction box corresponding to the real box from the other frames around the training sample image in the video specifically comprises:
obtaining the tumor prediction probability corresponding to each target prediction box;
determining target frames according to the position of the training sample image in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability over all of its prediction boxes;
computing the average of the maximum tumor prediction probabilities over all target frames, and taking the target prediction box whose tumor prediction probability is closest to that average as the prediction box corresponding to the real box.
2. The method according to claim 1, wherein determining the target frames according to the position of the training sample image in the video specifically comprises:
obtaining the position of the training sample image in the video and taking the N frames before that position and the N frames after it, where N is a positive integer greater than 2;
extracting several sub-region images from the training sample image according to the target prediction boxes and the real box, and selecting target frames from the 2N frames according to those sub-region images.
3. The method according to claim 2, wherein selecting target frames from the 2N frames according to the sub-region images specifically comprises:
searching each of the 2N frames for the region most similar to the sub-region image corresponding to the real box, and determining the target sub-region images corresponding to the several sub-region images from the positional relations among the sub-region images and the position of the most similar region within each frame;
establishing a correspondence between each sub-region image and its target sub-region image, computing the similarity between each sub-region image and its corresponding target sub-region image, computing the distance between the target sub-region image corresponding to the real box and the real box, and selecting at least one frame from the 2N frames whose similarity and distance satisfy preset conditions.
4. The method according to claim 1, wherein computing, for each prediction box, the loss from the position of the prediction box, its tumor prediction probability, and the position of the real box specifically comprises:
obtaining a probability loss from the tumor prediction probability of the prediction box; obtaining a bounding-box loss from the IoU or GIoU between the prediction box and the real box; and summing the probability loss and the bounding-box loss to obtain the loss of the prediction box.
5. A digestive tract tumor recognition system, comprising the following modules:
a prediction-box loss acquisition module, configured to obtain an endoscope video of a digestive tract examination, label images in the video to obtain a sample set, input a training sample image from the sample set into the digestive tract tumor recognition model to obtain prediction boxes and the tumor prediction probability corresponding to each prediction box, and compute, for each prediction box, a loss from the position of the prediction box, its tumor prediction probability, and the position of the real box;
a target prediction box calculation module, configured to: when the number of real boxes is 1 and the number of target prediction boxes is 1, establish a correspondence between the real box and the target prediction box; when the number of target prediction boxes is greater than 1, determine the target prediction box corresponding to the real box from the other frames around the training sample image in the video and establish the correspondence; and when the number of real boxes is greater than 1, construct a cost matrix and determine the correspondence between real boxes and prediction boxes with the Hungarian algorithm; a target prediction box being the prediction box with the minimum loss among all prediction boxes;
a tumor recognition module, configured to compute the loss of the training sample image based on the correspondence and train the model by backpropagation, wherein during endoscopy for digestive tract tumors, frames are extracted from the endoscope video and identified by the trained model;
wherein determining the target prediction box corresponding to the real box from the other frames around the training sample image in the video specifically comprises:
obtaining the tumor prediction probability corresponding to each target prediction box;
determining target frames according to the position of the training sample image in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability over all of its prediction boxes;
computing the average of the maximum tumor prediction probabilities over all target frames, and taking the target prediction box whose tumor prediction probability is closest to that average as the prediction box corresponding to the real box.
6. The system according to claim 5, wherein determining the target frames according to the position of the training sample image in the video specifically comprises:
obtaining the position of the training sample image in the video and taking the N frames before that position and the N frames after it, where N is a positive integer greater than 2;
extracting several sub-region images from the training sample image according to the target prediction boxes and the real box, and selecting target frames from the 2N frames according to those sub-region images.
7. The system according to claim 6, wherein selecting target frames from the 2N frames according to the sub-region images specifically comprises:
searching each of the 2N frames for the region most similar to the sub-region image corresponding to the real box, and determining the target sub-region images corresponding to the several sub-region images from the positional relations among the sub-region images and the position of the most similar region within each frame;
establishing a correspondence between each sub-region image and its target sub-region image, computing the similarity between each sub-region image and its corresponding target sub-region image, computing the distance between the target sub-region image corresponding to the real box and the real box, and selecting at least one frame from the 2N frames whose similarity and distance satisfy preset conditions.
8. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311754713.2A CN117437580B (en) | 2023-12-20 | 2023-12-20 | Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311754713.2A CN117437580B (en) | 2023-12-20 | 2023-12-20 | Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117437580A CN117437580A (en) | 2024-01-23 |
CN117437580B true CN117437580B (en) | 2024-03-22 |
Family
ID=89550186
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311754713.2A Active CN117437580B (en) | 2023-12-20 | 2023-12-20 | Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117437580B (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7429715B2 * | 2020-01-09 | 2024-02-08 | Olympus Corporation | Image processing system, endoscope system, image processing system operating method and program |
- 2023-12-20: application CN202311754713.2A filed in China; granted as patent CN117437580B (status: active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114120071A (en) * | 2021-12-09 | 2022-03-01 | 北京车网科技发展有限公司 | Detection method of image with object labeling frame |
CN114140651A (en) * | 2021-12-09 | 2022-03-04 | 深圳市资福医疗技术有限公司 | Stomach focus recognition model training method and stomach focus recognition method |
CN114565762A (en) * | 2022-02-28 | 2022-05-31 | 西安电子科技大学 | Weakly supervised liver tumor segmentation based on ROI and split fusion strategy |
CN116309536A (en) * | 2023-04-23 | 2023-06-23 | 西安理工大学 | Pavement crack detection method and storage medium |
CN117173182A (en) * | 2023-11-03 | 2023-12-05 | 厦门微亚智能科技股份有限公司 | Defect detection method, system, equipment and medium based on coding and decoding network |
Non-Patent Citations (2)
Title |
---|
Wang Wentao et al., "Transfer learning for fluence map prediction in adrenal stereotactic body radiation therapy", Physics in Medicine and Biology, 2021-12-31, pp. 1-7 *
Xiao Yufeng, "Research on a DETR-based method for detecting hyperparathyroidism in ultrasound", Wanfang, 2023-09-27, pp. 1-61 *
Also Published As
Publication number | Publication date |
---|---|
CN117437580A (en) | 2024-01-23 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |