CN117437580A - Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium


Info

Publication number: CN117437580A
Application number: CN202311754713.2A
Authority: CN (China)
Prior art keywords: frame, prediction, frames, target, real
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN117437580B (en)
Inventor: 郑中文 (Zheng Zhongwen)
Current Assignee: Guangdong General Hospital (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Guangdong General Hospital
Priority and filing date: 2023-12-20 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Application filed by Guangdong General Hospital
Publication of CN117437580A: 2024-01-23
Grant and publication of CN117437580B: 2024-03-22


Classifications

    • G06V 20/40 — Scenes; scene-specific elements in video content
    • G06N 3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06T 7/70 — Image analysis; determining position or orientation of objects or cameras
    • G06V 10/25 — Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/774 — Pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06T 2207/10016 — Image acquisition modality: video; image sequence
    • G06T 2207/10068 — Image acquisition modality: endoscopic image
    • G06T 2207/20081 — Special algorithmic details: training; learning
    • G06T 2207/20084 — Special algorithmic details: artificial neural networks [ANN]
    • G06T 2207/30096 — Biomedical image processing: tumor; lesion
    • Y02A 90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of artificial intelligence applications in medicine, and in particular provides a digestive tract tumor recognition method. Training sample images in a sample set are input into a digestive tract tumor recognition model to obtain prediction frames and the tumor prediction probability corresponding to each prediction frame, and a loss is calculated for each prediction frame from the position of the prediction frame, its tumor prediction probability, and the position of the real frame. When the number of real frames is 1 and the number of target prediction frames is greater than 1, the target prediction frame corresponding to the real frame is determined from the other frames around the training image in the video, and a correspondence between the real frame and that target prediction frame is established; a target prediction frame is a prediction frame with the minimum loss among all prediction frames. The loss corresponding to the training sample image is then calculated based on this correspondence, and the model is trained by back-propagation. During examination, frames are extracted from the endoscope video and identified by the trained model. The invention improves both the accuracy of the model and the training speed.

Description

Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium
Technical Field
The invention relates to the field of artificial intelligence applications in medicine, in particular to a digestive tract tumor identification method, a digestive tract tumor identification system and a digestive tract tumor identification medium.
Background
Digestive tract tumors include esophageal cancer, gastric cancer, colorectal cancer, small intestine cancer, anal cancer and others. Their risk is driven mainly by dietary habits and chronic infections (such as Helicobacter pylori infection), while colorectal cancer is more closely related to lifestyle factors (such as poor diet and insufficient physical activity). The complexity and diversity of these tumor types require the medical community to continually seek more efficient and more accurate diagnostic methods. Traditional diagnostic methods such as endoscopy, radiological imaging and biomarker detection, while widely used clinically, still have significant limitations, for example in early tumor identification and precise typing. Especially in the early stages, a digestive tract tumor may present no obvious symptoms or easily distinguishable features, which increases the difficulty of early diagnosis.
With the development of artificial intelligence, the technology has shown great potential in the medical field, especially in the diagnosis of digestive tract tumors. However, artificial intelligence still faces several challenges in digestive tract tumor recognition, the most significant of which is the impact of tumor heterogeneity on the generalization ability of models. In digestive tract tumors, even tumors of the same type may exhibit different biological characteristics and clinical manifestations in different patients. Such differences may result from genetic differences, environmental factors, or the biological properties of the tumor itself. For example, the tumors of two gastric cancer patients may differ significantly in size, morphology, growth rate and cell composition; these differences directly affect the recognition and analysis capability of an artificial intelligence model and demand higher accuracy from it. How to improve the accuracy of digestive tract tumor identification has therefore become key to applying artificial intelligence to this task.
Disclosure of Invention
In order to improve the accuracy and training speed of digestive tract tumor recognition, the invention provides a digestive tract tumor recognition method, which comprises the following steps:
obtaining an endoscope video of digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into a digestive tract tumor recognition model to obtain a prediction frame and a tumor prediction probability corresponding to the prediction frame, and calculating losses according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain losses corresponding to each prediction frame;
when the number of the real frames is 1, if the number of the target predicted frames is 1, establishing a corresponding relation between the real frames and the target predicted frames; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames;
calculating the corresponding loss of the training sample image based on the corresponding relation, and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
Preferably, the determining a target prediction frame corresponding to the real frame according to other frames around the training sample image in the video specifically includes:
obtaining a tumor prediction probability corresponding to each target prediction frame;
determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
Preferably, the determining the target frame according to the position of the training sample image in the video specifically includes:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
Preferably, the target frame is obtained from the 2N frame image according to the sub-region image, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
Preferably, the calculating the loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame, and the position of the real frame obtains the loss corresponding to each prediction frame, specifically:
obtaining probability loss according to the tumor prediction probability corresponding to the prediction frame; obtaining a boundary frame loss based on the IOU or GIOU of the prediction frame and the real frame; and summing the probability loss and the boundary box loss to obtain the loss corresponding to the prediction box.
In addition, the invention also provides a digestive tract tumor recognition system, which comprises the following modules:
the prediction frame loss acquisition module is used for acquiring an endoscopic video of the digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into the digestive tract tumor recognition model to obtain a prediction frame and a tumor prediction probability corresponding to the prediction frame, and calculating loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain loss corresponding to each prediction frame;
the target prediction frame calculation module is used for establishing a corresponding relation between the real frames and the target prediction frames if the number of the real frames is 1 and the number of the target prediction frames is 1; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames;
the tumor recognition module is used for calculating the loss corresponding to the training sample image based on the corresponding relation and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
Preferably, the determining a target prediction frame corresponding to the real frame according to other frames around the training sample image in the video specifically includes:
obtaining a tumor prediction probability corresponding to each target prediction frame;
determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
Preferably, the determining the target frame according to the position of the training sample image in the video specifically includes:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
Preferably, the target frame is obtained from the 2N frame image according to the sub-region image, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
Preferably, the calculating the loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame, and the position of the real frame obtains the loss corresponding to each prediction frame, specifically:
obtaining probability loss according to the tumor prediction probability corresponding to the prediction frame; obtaining a boundary frame loss based on the IOU or GIOU of the prediction frame and the real frame; and summing the probability loss and the boundary box loss to obtain the loss corresponding to the prediction box.
Finally, the invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described above.
In the training of the digestive tract tumor recognition model, the limited number of samples weakens the convergence and accuracy of the model. Based on this, the invention computes the loss corresponding to each prediction frame from the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame, and the position of the real frame. When the number of real frames is 1 and the number of target prediction frames is 1, a correspondence between the real frame and the target prediction frame is established; if the number of target prediction frames is greater than 1, the target prediction frame corresponding to the real frame is determined from the other frames around the training sample image in the video, and the correspondence is established accordingly. The loss corresponding to the training sample image is then calculated based on this correspondence, and the model is trained by back-propagation. The invention not only computes the loss more accurately, but also reduces the amount of computation and improves the training speed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a first embodiment;
FIG. 2 is a flowchart of step S2;
FIG. 3 is a schematic diagram of a real frame and a target prediction frame;
FIG. 4 is a schematic diagram of a real frame and a target prediction frame in a frame image;
fig. 5 is a structural diagram of the second embodiment.
Detailed Description
In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In a first embodiment, the present invention provides a method for identifying digestive tract tumors; as shown in fig. 1, the method comprises the following steps:
s1, obtaining an endoscope video of digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into a digestive tract tumor recognition model to obtain a prediction frame and tumor prediction probability corresponding to the prediction frame, and calculating loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain loss corresponding to each prediction frame;
when the endoscope is used for checking the digestive tract, a doctor usually judges whether a lesion or a tumor exists according to experience, the doctor usually depends on the experience of the doctor, and omission can occur. When training a digestive tract tumor recognition model, obtaining an endoscope video of digestive tract examination, and then labeling images in the video to obtain a sample set, wherein the video is composed of a plurality of frames, each frame is an image, and when collecting a sample, a clear image of a tumor is preferentially selected as a labeling object to obtain a first set. And marking the image with clear tumor, and marking the image with blurred tumor to obtain a second set.
The first set is preferentially used as training samples when training the digestive tract tumor recognition model. The digestive tract tumor recognition model is preferably a DETR model. The DETR model outputs multiple prediction results and multiple prediction frames (predicted boxes) at the same time, each prediction frame corresponding to one prediction result; for example, the prediction probability corresponding to prediction frame 1 may be 0.1. If DETR outputs 10 prediction frames at a time, the 10 prediction frames correspond to 10 tumor probabilities, and each prediction frame also has position information; an exemplary position is expressed as (x_min, y_min, x_max, y_max) or (x_center, y_center, width, height). With the tumor prediction probability appended to the position, a prediction may be expressed as (x_center, y_center, width, height, pr), where x_center and y_center are the center coordinates of the prediction frame, width and height are its width and height, and pr is the tumor prediction probability corresponding to the prediction frame.
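As a minimal illustration of this output format (a sketch, not code from the patent; the array contents and the helper name are assumptions), the per-image output might be handled as follows:

```python
import numpy as np

# Hypothetical DETR-style output for one image: each row is one
# prediction frame as (x_center, y_center, width, height, pr),
# where pr is the tumor prediction probability for that frame.
predictions = np.array([
    [0.52, 0.40, 0.18, 0.22, 0.10],  # prediction frame 1, pr = 0.1
    [0.55, 0.43, 0.20, 0.25, 0.85],  # prediction frame 2
    # ... further prediction frames up to the fixed number (e.g. 10)
])

def to_corners(box):
    """Convert (x_center, y_center, width, height) to (x_min, y_min, x_max, y_max)."""
    xc, yc, w, h = box[:4]
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)
```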
Then the loss of each prediction frame with respect to each real frame (also called a ground-truth box or label box) is calculated. Assuming there are 8 prediction frames and 2 real frames, the loss of the 1st prediction frame against the 1st real frame is calculated, then the loss of the 1st prediction frame against the 2nd real frame, and so on until the last pair. The loss corresponding to each prediction frame is obtained by calculating the loss from the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame, and the position of the real frame, specifically:
obtaining probability loss (also called classification loss) according to the tumor prediction probability corresponding to the prediction frame; obtaining a bounding box penalty (also known as a regression penalty) based on the IOU or GIOU of the predicted and real boxes; and summing the probability loss and the boundary box loss to obtain the loss corresponding to the prediction box. Table 1 below shows one example:
TABLE 1
In Table 1, pbox denotes a prediction frame and tbox denotes a real frame; the loss of a pbox against a tbox is written in the form a+b, where a is the probability loss and b is the bounding-box loss; f denotes the loss when no tumor is predicted, and a no-tumor prediction also has no bounding-box loss. The probability loss may be obtained from the tumor prediction probability p as 1 − p; a logarithmic form such as −log(p) may also be adopted, and the invention is not particularly limited in this respect. Taking the loss 0.2+0.1 of prediction frame 1 against real frame 1 as an example, 0.2 is the loss of prediction frame 1 being predicted as a tumor, and 0.1 is the loss corresponding to the IOU or GIOU between prediction frame 1 and real frame 1.
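The per-pair loss just described can be sketched as follows (continuing the previous sketch and reusing its to_corners helper; the 1 − pr probability loss and the GIoU-based box loss follow the text, but the variable names and the real_boxes input are assumptions):

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest box enclosing both a and b.
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    return iou - (enclose - union) / enclose if enclose > 0 else iou

def pair_loss(pred, real):
    """Loss of one prediction frame (x_c, y_c, w, h, pr) against one real frame."""
    prob_loss = 1.0 - pred[4]                                   # probability loss: 1 - pr
    box_loss = 1.0 - giou(to_corners(pred), to_corners(real))   # bounding-box loss
    return prob_loss + box_loss

# Pairwise loss table analogous to Table 1: rows are prediction frames,
# columns are real frames (real_boxes assumed given in the same format).
loss_matrix = [[pair_loss(p, t) for t in real_boxes] for p in predictions]
```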
S2, when the number of the real frames is 1, if the number of the target predicted frames is 1, establishing a corresponding relation between the real frames and the target predicted frames; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames. The flow chart of S2 is shown in fig. 2.
Each real frame should correspond to one prediction frame. In the DETR model, a cost matrix is computed, the Hungarian algorithm is used to find the prediction frame corresponding to each real frame according to the cost matrix, and the loss is then calculated. The Hungarian algorithm is a combinatorial optimization procedure and is relatively complex; in digestive tract tumor recognition, however, the number of tumors in an image is small, so the full complexity of the Hungarian algorithm is often unnecessary.
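For the multi-real-frame case, the standard Hungarian matching over the cost matrix can be done with SciPy; a sketch assuming the loss_matrix from the previous sketch serves as the cost matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.asarray(loss_matrix)                    # shape (num_prediction_frames, num_real_frames)
pred_idx, real_idx = linear_sum_assignment(cost)  # Hungarian algorithm
matches = list(zip(pred_idx, real_idx))           # (prediction frame, real frame) pairs
```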
In the model training of the invention, the number of real frames in a sample is judged first. If there is only one real frame, the prediction frame with the minimum loss against that real frame is selected, and a correspondence between the two is established. However, two or more prediction frames may share the minimal loss, i.e., their losses against the real frame are equal and minimal, as shown in Table 2:
TABLE 2
In Table 2, the losses of prediction frame 1 and prediction frame 8 with respect to real frame 1 are the same. Since matching prediction frame 1 to real frame 1 and matching prediction frame 8 to real frame 1 lead to different final losses for this training step, it is necessary to further determine whether to match prediction frame 1 or prediction frame 8 with real frame 1.
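Detecting such a tie is straightforward; the sketch below (names assumed, reusing the cost matrix from the previous sketch) returns every prediction frame whose loss against the single real frame is minimal, so more than one index signals the ambiguity described above:

```python
import numpy as np

def tied_min_loss_frames(cost, real_j=0, tol=1e-9):
    """Indices of all prediction frames whose loss against real frame
    real_j equals the minimum (within a small tolerance)."""
    col = np.asarray(cost)[:, real_j]
    return np.flatnonzero(np.isclose(col, col.min(), atol=tol))

candidates = tied_min_loss_frames(cost)
if len(candidates) == 1:
    target = candidates[0]   # unique target prediction frame
else:
    # Tie: disambiguate using the surrounding frames, as described below.
    pass
```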
In a specific embodiment, the determining a target prediction frame corresponding to a real frame according to other frames around the training sample image in the video specifically includes:
obtaining a tumor prediction probability corresponding to each target prediction frame;
each target prediction frame corresponds to a position and a prediction probability of the target prediction frame, for example, (x_center, y_center, width, height, pr), where pr is a tumor prediction probability.
Determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
In the video, the frames around the training sample image are similar to the training sample image; in general, the tumor marked by the real frame also appears in the surrounding frames, except that its position changes due to movement of the endoscope and the like. Based on this, the determined target frames are input into the model, the maximum tumor prediction probability among all prediction frames of each target frame is obtained, and the average of these maxima over all target frames is calculated; through this average, the reliability of the probability loss corresponding to each target prediction frame can be judged.
For example, suppose there are two target prediction frames, pbox1 and pbox8, with corresponding tumor prediction probabilities 0.8 and 0.9. Suppose 3 target frames are passed through the model, and each target frame yields 8 prediction frames, i.e. 8 tumor prediction probabilities, of which the maximum is taken. Assume the maximum tumor prediction probabilities corresponding to the 3 target frames are 0.8, 0.6 and 0.9 respectively; their average is about 0.77, and 0.8 is closest to it, so pbox1 is finally used as the prediction frame for the real frame.
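A sketch of this disambiguation step, mirroring the worked example (the model callable, assumed to return the per-prediction-frame tumor probabilities of a frame image, is a placeholder):

```python
import numpy as np

def pick_candidate(candidate_prs, target_frames, model):
    """candidate_prs: tumor probabilities of the tied target prediction
    frames, e.g. [0.8, 0.9]; target_frames: the frames selected around
    the training sample image. Returns the index of the kept candidate."""
    max_prs = [float(np.max(model(f))) for f in target_frames]  # e.g. [0.8, 0.6, 0.9]
    avg = float(np.mean(max_prs))                               # e.g. ~0.77
    # Keep the candidate whose probability is closest to the average.
    return int(np.argmin([abs(pr - avg) for pr in candidate_prs]))

# With candidate_prs = [0.8, 0.9] and maxima averaging ~0.77,
# index 0 (pbox1) is chosen, matching the example above.
```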
In an optional embodiment, the determining the target frame according to the position of the training sample image in the video specifically is:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
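The first of these two steps amounts to slicing the 2N neighbouring frames out of the decoded video; a minimal sketch (function and variable names are assumptions):

```python
def neighbour_frames(frames, idx, n):
    """The N frames before and N frames after position idx in the
    video, clipped at the sequence boundaries; N is assumed > 2."""
    assert n > 2
    before = frames[max(0, idx - n):idx]
    after = frames[idx + 1:idx + 1 + n]
    return before + after   # up to 2N candidate frame images
```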
When selecting target frames, it must be ensured that the image region of the real frame in the training sample also exists in the target frame; otherwise the reliability of the average obtained above is very low and the final effect is reduced. The invention therefore selects target frames from the frames around the training sample image, and further screens the 2N frame images according to the target prediction frames and the real frame. In an optional embodiment, the target frame is obtained from the 2N frame images according to the sub-region images, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
the sub-region images are obtained by capturing images of the size from the position in the target sample image, and the sub-region images corresponding to the target prediction frames are obtained in the same sub-region image obtaining mode.
Then, in each of the 2N frame images, the region with the highest similarity to the sub-region image corresponding to the real frame is searched for, so that one such region is obtained in each frame image, and its position in that frame image is determined. The target sub-region images are then determined from the positional relation of the sub-region images in the training sample image and the position of that region in the frame image. Assuming there are 1 real frame and 2 target prediction frames, the relation of the three is shown in fig. 3. After the region most similar to the real frame is determined in a frame image, the 3 target sub-region images can be determined according to the triangle, as shown in fig. 4.
Establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
A real frame and multiple target prediction frames exist in the training sample image, and target sub-region images corresponding to the real frame and to the target prediction frames are determined in each frame image, so a correspondence between the sub-region images in the training sample image and the target sub-region images in the frame image is established. The similarity between each sub-region image and its corresponding target sub-region image is then calculated, and the distance between the target sub-region image corresponding to the real frame and the real frame in the training sample image is calculated. High similarity indicates that the contents of the real frame and the target prediction frames also exist in the frame image; otherwise the frame image is discarded. Meanwhile, requiring a certain distance guarantees some difference between the frame image and the training sample image, making the calculated average more reliable.
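One way to realize the similarity search and the screening condition is normalized template matching, e.g. with OpenCV; a sketch under the assumption that similarity is measured by normalized cross-correlation and that the thresholds are free parameters (the patent does not fix concrete values):

```python
import cv2
import numpy as np

def best_match(frame_img, patch):
    """Top-left corner and score of the region of frame_img most
    similar to patch (normalized cross-correlation)."""
    res = cv2.matchTemplate(frame_img, patch, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(res)
    return top_left, score

def screen_frames(neighbours, real_patch, real_xy, sim_thresh=0.8, min_dist=5.0):
    """Keep neighbouring frame images whose best match to the real-frame
    sub-region is similar enough (tumor still visible) yet displaced from
    the real frame's top-left position real_xy (the frame differs from
    the training sample). Thresholds are illustrative assumptions."""
    kept = []
    for f in neighbours:
        (x, y), score = best_match(f, real_patch)
        dist = float(np.hypot(x - real_xy[0], y - real_xy[1]))
        if score >= sim_thresh and dist >= min_dist:
            kept.append(f)
    return kept
```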
In digestive tract tumor recognition, for the less common case where a training sample image contains more than one real frame, the correspondence between real frames and prediction frames is still determined in the default manner of the DETR model, for simplicity of implementation.
S3, calculating the loss corresponding to the training sample image based on the corresponding relation, and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
After the correspondence between prediction frames and real frames is established, the loss corresponding to the training sample image is calculated; in an alternative embodiment, the loss corresponding to the training sample image is computed in the same way as in the DETR model. Training the digestive tract tumor recognition model in this way improves the accuracy of the model and reduces the amount of computation in training, particularly that of the Hungarian algorithm.
In endoscopy of digestive tract tumors, the video captured by the endoscope is synchronously transmitted to a server or host, frames are extracted from the endoscope video, and the trained model is used for identification. In an alternative embodiment, frames are extracted from the endoscope video at equal time intervals, for example one frame every 1 s; alternatively only I-frames and/or P-frames of the video are extracted, and the invention does not limit the specific extraction mode. Depending on the training samples, the model of the invention may be dedicated to detecting gastric cancer, esophageal cancer, or the like; of course, the method can also be applied to other diseases examined by endoscope.
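A sketch of the equal-interval frame extraction (OpenCV-based; the 1 s interval follows the example above, the rest is an assumed minimal implementation):

```python
import cv2

def sample_frames(video_path, interval_s=1.0):
    """Extract one frame per interval_s seconds from the endoscope video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0       # fall back if FPS metadata is missing
    step = max(1, int(round(fps * interval_s)))
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:
            frames.append(frame)                  # pass these to the trained model
        i += 1
    cap.release()
    return frames
```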
In a second embodiment, the present invention further provides a digestive tract tumor recognition system, as shown in fig. 5, the system includes the following modules:
the prediction frame loss acquisition module is used for acquiring an endoscopic video of the digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into the digestive tract tumor recognition model to obtain a prediction frame and a tumor prediction probability corresponding to the prediction frame, and calculating loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain loss corresponding to each prediction frame;
the target prediction frame calculation module is used for establishing a corresponding relation between the real frames and the target prediction frames if the number of the real frames is 1 and the number of the target prediction frames is 1; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames;
the tumor recognition module is used for calculating the loss corresponding to the training sample image based on the corresponding relation and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
Preferably, the determining a target prediction frame corresponding to the real frame according to other frames around the training sample image in the video specifically includes:
obtaining a tumor prediction probability corresponding to each target prediction frame;
determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
Preferably, the determining the target frame according to the position of the training sample image in the video specifically includes:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
Preferably, the target frame is obtained from the 2N frame image according to the sub-region image, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
Preferably, the calculating the loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame, and the position of the real frame obtains the loss corresponding to each prediction frame, specifically:
obtaining probability loss according to the tumor prediction probability corresponding to the prediction frame; obtaining a boundary frame loss based on the IOU or GIOU of the prediction frame and the real frame; and summing the probability loss and the boundary box loss to obtain the loss corresponding to the prediction box.
In a third embodiment, the present invention further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the method according to the first embodiment.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by a combination of hardware and software. Based on such understanding, the foregoing technical solutions, in essence or in the portions contributing to the art, may be embodied in the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A digestive tract tumor identification method, the method comprising the following steps:
obtaining an endoscope video of digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into a digestive tract tumor recognition model to obtain a prediction frame and a tumor prediction probability corresponding to the prediction frame, and calculating losses according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain losses corresponding to each prediction frame;
when the number of the real frames is 1, if the number of the target predicted frames is 1, establishing a corresponding relation between the real frames and the target predicted frames; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames;
calculating the corresponding loss of the training sample image based on the corresponding relation, and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
2. The method according to claim 1, wherein the determining a target prediction frame corresponding to a real frame according to other frames around the training sample image in the video is specifically:
obtaining a tumor prediction probability corresponding to each target prediction frame;
determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
3. The method according to claim 2, wherein the determining a target frame from the position of the training sample image in the video is in particular:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
4. A method according to claim 3, wherein the target frame is acquired from the 2N frame image according to the sub-region image, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
5. The method of claim 1, wherein the calculating the loss according to the position of the predicted frame, the tumor prediction probability corresponding to the predicted frame, and the position of the real frame obtains the loss corresponding to each predicted frame, specifically:
obtaining probability loss according to the tumor prediction probability corresponding to the prediction frame; obtaining a boundary frame loss based on the IOU or GIOU of the prediction frame and the real frame; and summing the probability loss and the boundary box loss to obtain the loss corresponding to the prediction box.
6. A digestive tract tumor recognition system, the system comprising the following modules:
the prediction frame loss acquisition module is used for acquiring an endoscopic video of the digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into the digestive tract tumor recognition model to obtain a prediction frame and a tumor prediction probability corresponding to the prediction frame, and calculating loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain loss corresponding to each prediction frame;
the target prediction frame calculation module is used for establishing a corresponding relation between the real frames and the target prediction frames if the number of the real frames is 1 and the number of the target prediction frames is 1; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames;
the tumor recognition module is used for calculating the loss corresponding to the training sample image based on the corresponding relation and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
7. The system according to claim 6, wherein the determining a target prediction frame corresponding to a real frame according to other frames around the training sample image in the video is specifically:
obtaining a tumor prediction probability corresponding to each target prediction frame;
determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
8. The system according to claim 7, wherein the determining the target frame according to the position of the training sample image in the video is specifically:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
9. The system according to claim 8, wherein the target frame is acquired from the 2N frame image according to the sub-region image, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the method according to any of claims 1-5.
CN202311754713.2A (priority date 2023-12-20; filing date 2023-12-20) — Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium — Active; granted as CN117437580B (en)

Priority Applications (1)

CN202311754713.2A — priority date 2023-12-20; filing date 2023-12-20 — Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium

Publications (2)

CN117437580A — published 2024-01-23
CN117437580B — published 2024-03-22

Family

ID=89550186

Family Applications (1)

CN202311754713.2A — priority date 2023-12-20; filing date 2023-12-20 — Active; granted as CN117437580B (en) — Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium

Country Status (1)

CN — CN117437580B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220351483A1 (en) * 2020-01-09 2022-11-03 Olympus Corporation Image processing system, endoscope system, image processing method, and storage medium
CN114120071A (en) * 2021-12-09 2022-03-01 北京车网科技发展有限公司 Detection method of image with object labeling frame
CN114140651A (en) * 2021-12-09 2022-03-04 深圳市资福医疗技术有限公司 Stomach focus recognition model training method and stomach focus recognition method
CN114565762A (en) * 2022-02-28 2022-05-31 西安电子科技大学 Weakly supervised liver tumor segmentation based on ROI and split fusion strategy
CN116309536A (en) * 2023-04-23 2023-06-23 西安理工大学 Pavement crack detection method and storage medium
CN117173182A (en) * 2023-11-03 2023-12-05 厦门微亚智能科技股份有限公司 Defect detection method, system, equipment and medium based on coding and decoding network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG WENTAO ET AL.: "Transfer learning for fluence map prediction in adrenal stereotactic body radiation therapy", Physics in Medicine and Biology, 31 December 2021, pages 1-7 *
肖宇峰 (XIAO Yufeng): "基于DETR的超声甲状旁腺亢进检测方法研究" [Research on a DETR-based method for detecting hyperparathyroidism in ultrasound], Wanfang (万方), 27 September 2023, pages 1-61 *

Also Published As

CN117437580B (en) — 2024-03-22

Similar Documents

Publication Publication Date Title
Farhat et al. Deep learning applications in pulmonary medical imaging: recent updates and insights on COVID-19
KR102210806B1 (en) Apparatus and method for diagnosing gastric lesion using deep learning of endoscopic images
US10482313B2 (en) Method and system for classification of endoscopic images using deep decision networks
CN113379693B (en) Capsule endoscope key focus image detection method based on video abstraction technology
Guo et al. Semi-supervised WCE image classification with adaptive aggregated attention
Srinidhi et al. Automated method for retinal artery/vein separation via graph search metaheuristic approach
Gridach PyDiNet: Pyramid dilated network for medical image segmentation
CN111968091B (en) Method for detecting and classifying lesion areas in clinical image
CN102065744A (en) Image processing device, image processing program, and image processing method
CN112466466B (en) Digestive tract auxiliary detection method and device based on deep learning and computing equipment
CN114581375A (en) Method, device and storage medium for automatically detecting focus of wireless capsule endoscope
Seok et al. The semantic segmentation approach for normal and pathologic tympanic membrane using deep learning
US11935239B2 (en) Control method, apparatus and program for system for determining lesion obtained via real-time image
CN111738992A (en) Lung focus region extraction method and device, electronic equipment and storage medium
Yue et al. Benchmarking polyp segmentation methods in narrow-band imaging colonoscopy images
CN111401102A (en) Deep learning model training method and device, electronic equipment and storage medium
CN117437580B (en) Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium
Arnold et al. Indistinct frame detection in colonoscopy videos
CN114037686B (en) Children intussusception automatic check out system based on degree of depth learning
CN112885435B (en) Method, device and system for determining image target area
Gatoula et al. Enhanced CNN-based gaze estimation on wireless capsule endoscopy images
CN114271763A (en) Mask RCNN-based gastric cancer early identification method, system and device
Yan et al. Unsupervised body part regression using convolutional neural network with self-organization
Cai et al. An improved automatic system for aiding the detection of colon polyps using deep learning
KR102502418B1 (en) Medical image processing apparatus and method using neural network

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant