CN117437580A - Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium


Info

Publication number: CN117437580A
Application number: CN202311754713.2A
Authority: CN (China)
Prior art keywords: frame, prediction, frames, target, real
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN117437580B (en)
Inventor: 郑中文 (Zheng Zhongwen)
Current Assignee: Guangdong General Hospital (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Guangdong General Hospital
Priority and filing date: 2023-12-20 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Application filed by Guangdong General Hospital
Publication of CN117437580A: 2024-01-23
Grant and publication of CN117437580B: 2024-03-22


Classifications

    • G06V 20/40 — Scenes; scene-specific elements in video content
    • G06N 3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06T 7/70 — Image analysis; determining position or orientation of objects or cameras
    • G06V 10/25 — Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/774 — Pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06T 2207/10016 — Image acquisition modality: video; image sequence
    • G06T 2207/10068 — Image acquisition modality: endoscopic image
    • G06T 2207/20081 — Special algorithmic details: training; learning
    • G06T 2207/20084 — Special algorithmic details: artificial neural networks [ANN]
    • G06T 2207/30096 — Biomedical image processing: tumor; lesion
    • Y02A 90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of artificial intelligence applications in medicine, and in particular provides a digestive tract tumor recognition method. Training sample images in a sample set are input into a digestive tract tumor recognition model to obtain prediction frames and the tumor prediction probability corresponding to each prediction frame, and a loss is calculated for each prediction frame from the position of the prediction frame, its tumor prediction probability, and the position of the real frame. When the number of real frames is 1 and the number of target prediction frames is greater than 1, the target prediction frame corresponding to the real frame is determined from the other frames around the training image in the video, and a correspondence between the real frame and that target prediction frame is established; a target prediction frame is a prediction frame with the minimum loss among all prediction frames. The loss corresponding to the training sample image is then calculated based on this correspondence, and the model is trained by back-propagation. During examination, frames are extracted from the endoscope video and identified by the trained model. The invention improves both the accuracy of the model and the training speed.

Description

Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium
Technical Field
The invention relates to the field of artificial intelligence applications in medicine, in particular to a digestive tract tumor identification method, a digestive tract tumor identification system and a digestive tract tumor identification medium.
Background
Digestive tract tumors include esophageal cancer, gastric cancer, colorectal cancer, small intestine cancer, anal cancer and others. Their risk is driven mainly by dietary habits and chronic infections (such as Helicobacter pylori infection), while colorectal cancer is more closely related to lifestyle factors (such as poor diet and insufficient physical activity). The complexity and diversity of these tumor types require the medical community to continually seek more efficient and more accurate diagnostic methods. Traditional diagnostic methods such as endoscopy, radiological imaging and biomarker detection, while widely used clinically, still have significant limitations, for example in early tumor identification and precise typing. Especially in the early stages, a digestive tract tumor may present no obvious symptoms or easily distinguishable features, which increases the difficulty of early diagnosis.
With the development of artificial intelligence, the technology has shown great potential in the medical field, especially in the diagnosis of digestive tract tumors. However, artificial intelligence still faces several challenges in digestive tract tumor recognition, the most significant of which is the impact of tumor heterogeneity on the generalization ability of models. In digestive tract tumors, even tumors of the same type may exhibit different biological characteristics and clinical manifestations in different patients. Such differences may result from genetic differences, environmental factors, or the biological properties of the tumor itself. For example, the tumors of two gastric cancer patients may differ significantly in size, morphology, growth rate and cell composition; these differences directly affect the recognition and analysis capability of an artificial intelligence model and demand higher accuracy from it. How to improve the accuracy of digestive tract tumor identification has therefore become key to applying artificial intelligence to this task.
Disclosure of Invention
In order to improve the accuracy and training speed of digestive tract tumor recognition, the invention provides a digestive tract tumor recognition method, which comprises the following steps:
obtaining an endoscope video of digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into a digestive tract tumor recognition model to obtain a prediction frame and a tumor prediction probability corresponding to the prediction frame, and calculating losses according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain losses corresponding to each prediction frame;
when the number of the real frames is 1, if the number of the target predicted frames is 1, establishing a corresponding relation between the real frames and the target predicted frames; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames;
calculating the corresponding loss of the training sample image based on the corresponding relation, and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
Preferably, the determining a target prediction frame corresponding to the real frame according to other frames around the training sample image in the video specifically includes:
obtaining a tumor prediction probability corresponding to each target prediction frame;
determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
Preferably, the determining the target frame according to the position of the training sample image in the video specifically includes:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
Preferably, the target frame is obtained from the 2N frame image according to the sub-region image, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
Preferably, the calculating the loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame, and the position of the real frame obtains the loss corresponding to each prediction frame, specifically:
obtaining probability loss according to the tumor prediction probability corresponding to the prediction frame; obtaining a boundary frame loss based on the IOU or GIOU of the prediction frame and the real frame; and summing the probability loss and the boundary box loss to obtain the loss corresponding to the prediction box.
In addition, the invention also provides a digestive tract tumor recognition system, which comprises the following modules:
the prediction frame loss acquisition module is used for acquiring an endoscopic video of the digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into the digestive tract tumor recognition model to obtain a prediction frame and a tumor prediction probability corresponding to the prediction frame, and calculating loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain loss corresponding to each prediction frame;
the target prediction frame calculation module is used for establishing a corresponding relation between the real frames and the target prediction frames if the number of the real frames is 1 and the number of the target prediction frames is 1; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames;
the tumor recognition module is used for calculating the loss corresponding to the training sample image based on the corresponding relation and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
Preferably, the determining a target prediction frame corresponding to the real frame according to other frames around the training sample image in the video specifically includes:
obtaining a tumor prediction probability corresponding to each target prediction frame;
determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
Preferably, the determining the target frame according to the position of the training sample image in the video specifically includes:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
Preferably, the target frame is obtained from the 2N frame image according to the sub-region image, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
Preferably, the calculating the loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame, and the position of the real frame obtains the loss corresponding to each prediction frame, specifically:
obtaining probability loss according to the tumor prediction probability corresponding to the prediction frame; obtaining a boundary frame loss based on the IOU or GIOU of the prediction frame and the real frame; and summing the probability loss and the boundary box loss to obtain the loss corresponding to the prediction box.
Finally, the invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described above.
In the training of the digestive tract tumor recognition model, the limited number of samples weakens the convergence and accuracy of the model. Based on this, the invention computes the loss corresponding to each prediction frame from the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame, and the position of the real frame. When the number of real frames is 1 and the number of target prediction frames is 1, a correspondence between the real frame and the target prediction frame is established; if the number of target prediction frames is greater than 1, the target prediction frame corresponding to the real frame is determined from the other frames around the training sample image in the video, and the correspondence is established accordingly. The loss corresponding to the training sample image is then calculated based on this correspondence, and the model is trained by back-propagation. The invention not only computes the loss more accurately, but also reduces the amount of computation and improves the training speed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a first embodiment;
FIG. 2 is a flowchart of step S2;
FIG. 3 is a schematic diagram of a real frame and a target prediction frame;
FIG. 4 is a schematic diagram of a real frame and a target prediction frame in a frame image;
fig. 5 is a structural diagram of the second embodiment.
Detailed Description
In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In a first embodiment, the present invention provides a method for identifying digestive tract tumors; as shown in fig. 1, the method comprises the following steps:
s1, obtaining an endoscope video of digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into a digestive tract tumor recognition model to obtain a prediction frame and tumor prediction probability corresponding to the prediction frame, and calculating loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain loss corresponding to each prediction frame;
when the endoscope is used for checking the digestive tract, a doctor usually judges whether a lesion or a tumor exists according to experience, the doctor usually depends on the experience of the doctor, and omission can occur. When training a digestive tract tumor recognition model, obtaining an endoscope video of digestive tract examination, and then labeling images in the video to obtain a sample set, wherein the video is composed of a plurality of frames, each frame is an image, and when collecting a sample, a clear image of a tumor is preferentially selected as a labeling object to obtain a first set. And marking the image with clear tumor, and marking the image with blurred tumor to obtain a second set.
The first set is preferentially used as training samples when training the digestive tract tumor recognition model. The digestive tract tumor recognition model is preferably a DETR model. The DETR model outputs multiple prediction results and multiple prediction frames (predicted boxes) at the same time, each prediction frame corresponding to one prediction result; for example, the prediction probability corresponding to prediction frame 1 may be 0.1. If DETR outputs 10 prediction frames at a time, the 10 prediction frames correspond to 10 tumor probabilities, and each prediction frame also has position information; an exemplary position is expressed as (x_min, y_min, x_max, y_max) or (x_center, y_center, width, height). With the tumor prediction probability appended to the position, a prediction may be expressed as (x_center, y_center, width, height, pr), where x_center and y_center are the center coordinates of the prediction frame, width and height are its width and height, and pr is the tumor prediction probability corresponding to the prediction frame.
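As a minimal illustration of this output format (a sketch, not code from the patent; the array contents and the helper name are assumptions), the per-image output might be handled as follows:

```python
import numpy as np

# Hypothetical DETR-style output for one image: each row is one
# prediction frame as (x_center, y_center, width, height, pr),
# where pr is the tumor prediction probability for that frame.
predictions = np.array([
    [0.52, 0.40, 0.18, 0.22, 0.10],  # prediction frame 1, pr = 0.1
    [0.55, 0.43, 0.20, 0.25, 0.85],  # prediction frame 2
    # ... further prediction frames up to the fixed number (e.g. 10)
])

def to_corners(box):
    """Convert (x_center, y_center, width, height) to (x_min, y_min, x_max, y_max)."""
    xc, yc, w, h = box[:4]
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)
```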
Then the loss of each prediction frame with respect to each real frame (also called a ground-truth box or label box) is calculated. Assuming there are 8 prediction frames and 2 real frames, the loss of the 1st prediction frame against the 1st real frame is calculated, then the loss of the 1st prediction frame against the 2nd real frame, and so on until the last pair. The loss corresponding to each prediction frame is obtained by calculating the loss from the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame, and the position of the real frame, specifically:
obtaining probability loss (also called classification loss) according to the tumor prediction probability corresponding to the prediction frame; obtaining a bounding box penalty (also known as a regression penalty) based on the IOU or GIOU of the predicted and real boxes; and summing the probability loss and the boundary box loss to obtain the loss corresponding to the prediction box. Table 1 below shows one example:
TABLE 1
In Table 1, pbox denotes a prediction frame and tbox denotes a real frame; the loss of a pbox against a tbox is written in the form a+b, where a is the probability loss and b is the bounding-box loss; f denotes the loss when no tumor is predicted, and a no-tumor prediction also has no bounding-box loss. The probability loss may be obtained from the tumor prediction probability p as 1 − p; a logarithmic form such as −log(p) may also be adopted, and the invention is not particularly limited in this respect. Taking the loss 0.2+0.1 of prediction frame 1 against real frame 1 as an example, 0.2 is the loss of prediction frame 1 being predicted as a tumor, and 0.1 is the loss corresponding to the IOU or GIOU between prediction frame 1 and real frame 1.
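The per-pair loss just described can be sketched as follows (continuing the previous sketch and reusing its to_corners helper; the 1 − pr probability loss and the GIoU-based box loss follow the text, but the variable names and the real_boxes input are assumptions):

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest box enclosing both a and b.
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    return iou - (enclose - union) / enclose if enclose > 0 else iou

def pair_loss(pred, real):
    """Loss of one prediction frame (x_c, y_c, w, h, pr) against one real frame."""
    prob_loss = 1.0 - pred[4]                                   # probability loss: 1 - pr
    box_loss = 1.0 - giou(to_corners(pred), to_corners(real))   # bounding-box loss
    return prob_loss + box_loss

# Pairwise loss table analogous to Table 1: rows are prediction frames,
# columns are real frames (real_boxes assumed given in the same format).
loss_matrix = [[pair_loss(p, t) for t in real_boxes] for p in predictions]
```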
S2, when the number of the real frames is 1, if the number of the target predicted frames is 1, establishing a corresponding relation between the real frames and the target predicted frames; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames. The flow chart of S2 is shown in fig. 2.
Each real frame should correspond to one prediction frame. In the DETR model, a cost matrix is computed, the Hungarian algorithm is used to find the prediction frame corresponding to each real frame according to the cost matrix, and the loss is then calculated. The Hungarian algorithm is a combinatorial optimization procedure and is relatively complex; in digestive tract tumor recognition, however, the number of tumors in an image is small, so the full complexity of the Hungarian algorithm is often unnecessary.
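For the multi-real-frame case, the standard Hungarian matching over the cost matrix can be done with SciPy; a sketch assuming the loss_matrix from the previous sketch serves as the cost matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.asarray(loss_matrix)                    # shape (num_prediction_frames, num_real_frames)
pred_idx, real_idx = linear_sum_assignment(cost)  # Hungarian algorithm
matches = list(zip(pred_idx, real_idx))           # (prediction frame, real frame) pairs
```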
In the model training of the invention, the number of real frames in a sample is judged first. If there is only one real frame, the prediction frame with the minimum loss against that real frame is selected, and a correspondence between the two is established. However, two or more prediction frames may share the minimal loss, i.e., their losses against the real frame are equal and minimal, as shown in Table 2:
TABLE 2
In Table 2, the losses of prediction frame 1 and prediction frame 8 with respect to real frame 1 are the same. Since matching prediction frame 1 to real frame 1 and matching prediction frame 8 to real frame 1 lead to different final losses for this training step, it is necessary to further determine whether to match prediction frame 1 or prediction frame 8 with real frame 1.
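Detecting such a tie is straightforward; the sketch below (names assumed, reusing the cost matrix from the previous sketch) returns every prediction frame whose loss against the single real frame is minimal, so more than one index signals the ambiguity described above:

```python
import numpy as np

def tied_min_loss_frames(cost, real_j=0, tol=1e-9):
    """Indices of all prediction frames whose loss against real frame
    real_j equals the minimum (within a small tolerance)."""
    col = np.asarray(cost)[:, real_j]
    return np.flatnonzero(np.isclose(col, col.min(), atol=tol))

candidates = tied_min_loss_frames(cost)
if len(candidates) == 1:
    target = candidates[0]   # unique target prediction frame
else:
    # Tie: disambiguate using the surrounding frames, as described below.
    pass
```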
In a specific embodiment, the determining a target prediction frame corresponding to a real frame according to other frames around the training sample image in the video specifically includes:
obtaining a tumor prediction probability corresponding to each target prediction frame;
each target prediction frame corresponds to a position and a prediction probability of the target prediction frame, for example, (x_center, y_center, width, height, pr), where pr is a tumor prediction probability.
Determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
In the video, the frames around the training sample image are similar to the training sample image; in general, the tumor marked by the real frame also appears in the surrounding frames, except that its position changes due to movement of the endoscope and the like. Based on this, the determined target frames are input into the model, the maximum tumor prediction probability among all prediction frames of each target frame is obtained, and the average of these maxima over all target frames is calculated; through this average, the reliability of the probability loss corresponding to each target prediction frame can be judged.
For example, suppose there are two target prediction frames, pbox1 and pbox8, with corresponding tumor prediction probabilities 0.8 and 0.9. Suppose 3 target frames are passed through the model, and each target frame yields 8 prediction frames, i.e. 8 tumor prediction probabilities, of which the maximum is taken. Assume the maximum tumor prediction probabilities corresponding to the 3 target frames are 0.8, 0.6 and 0.9 respectively; their average is about 0.77, and 0.8 is closest to it, so pbox1 is finally used as the prediction frame for the real frame.
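A sketch of this disambiguation step, mirroring the worked example (the model callable, assumed to return the per-prediction-frame tumor probabilities of a frame image, is a placeholder):

```python
import numpy as np

def pick_candidate(candidate_prs, target_frames, model):
    """candidate_prs: tumor probabilities of the tied target prediction
    frames, e.g. [0.8, 0.9]; target_frames: the frames selected around
    the training sample image. Returns the index of the kept candidate."""
    max_prs = [float(np.max(model(f))) for f in target_frames]  # e.g. [0.8, 0.6, 0.9]
    avg = float(np.mean(max_prs))                               # e.g. ~0.77
    # Keep the candidate whose probability is closest to the average.
    return int(np.argmin([abs(pr - avg) for pr in candidate_prs]))

# With candidate_prs = [0.8, 0.9] and maxima averaging ~0.77,
# index 0 (pbox1) is chosen, matching the example above.
```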
In an optional embodiment, the determining the target frame according to the position of the training sample image in the video specifically is:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
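The first of these two steps amounts to slicing the 2N neighbouring frames out of the decoded video; a minimal sketch (function and variable names are assumptions):

```python
def neighbour_frames(frames, idx, n):
    """The N frames before and N frames after position idx in the
    video, clipped at the sequence boundaries; N is assumed > 2."""
    assert n > 2
    before = frames[max(0, idx - n):idx]
    after = frames[idx + 1:idx + 1 + n]
    return before + after   # up to 2N candidate frame images
```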
When selecting target frames, it must be ensured that the image region of the real frame in the training sample also exists in the target frame; otherwise the reliability of the average obtained above is very low and the final effect is reduced. The invention therefore selects target frames from the frames around the training sample image, and further screens the 2N frame images according to the target prediction frames and the real frame. In an optional embodiment, the target frame is obtained from the 2N frame images according to the sub-region images, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
the sub-region images are obtained by capturing images of the size from the position in the target sample image, and the sub-region images corresponding to the target prediction frames are obtained in the same sub-region image obtaining mode.
Then, in each of the 2N frame images, the region with the highest similarity to the sub-region image corresponding to the real frame is searched for, so that one such region is obtained in each frame image, and its position in that frame image is determined. The target sub-region images are then determined from the positional relation of the sub-region images in the training sample image and the position of that region in the frame image. Assuming there are 1 real frame and 2 target prediction frames, the relation of the three is shown in fig. 3. After the region most similar to the real frame is determined in a frame image, the 3 target sub-region images can be determined according to the triangle, as shown in fig. 4.
Establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
A real frame and multiple target prediction frames exist in the training sample image, and target sub-region images corresponding to the real frame and to the target prediction frames are determined in each frame image, so a correspondence between the sub-region images in the training sample image and the target sub-region images in the frame image is established. The similarity between each sub-region image and its corresponding target sub-region image is then calculated, and the distance between the target sub-region image corresponding to the real frame and the real frame in the training sample image is calculated. High similarity indicates that the contents of the real frame and the target prediction frames also exist in the frame image; otherwise the frame image is discarded. Meanwhile, requiring a certain distance guarantees some difference between the frame image and the training sample image, making the calculated average more reliable.
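One way to realize the similarity search and the screening condition is normalized template matching, e.g. with OpenCV; a sketch under the assumption that similarity is measured by normalized cross-correlation and that the thresholds are free parameters (the patent does not fix concrete values):

```python
import cv2
import numpy as np

def best_match(frame_img, patch):
    """Top-left corner and score of the region of frame_img most
    similar to patch (normalized cross-correlation)."""
    res = cv2.matchTemplate(frame_img, patch, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(res)
    return top_left, score

def screen_frames(neighbours, real_patch, real_xy, sim_thresh=0.8, min_dist=5.0):
    """Keep neighbouring frame images whose best match to the real-frame
    sub-region is similar enough (tumor still visible) yet displaced from
    the real frame's top-left position real_xy (the frame differs from
    the training sample). Thresholds are illustrative assumptions."""
    kept = []
    for f in neighbours:
        (x, y), score = best_match(f, real_patch)
        dist = float(np.hypot(x - real_xy[0], y - real_xy[1]))
        if score >= sim_thresh and dist >= min_dist:
            kept.append(f)
    return kept
```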
In digestive tract tumor recognition, for the less common case where a training sample image contains more than one real frame, the correspondence between real frames and prediction frames is still determined in the default manner of the DETR model, for simplicity of implementation.
S3, calculating the loss corresponding to the training sample image based on the corresponding relation, and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
After the correspondence between prediction frames and real frames is established, the loss corresponding to the training sample image is calculated; in an alternative embodiment, the loss corresponding to the training sample image is computed in the same way as in the DETR model. Training the digestive tract tumor recognition model in this way improves the accuracy of the model and reduces the amount of computation in training, particularly that of the Hungarian algorithm.
In endoscopy of digestive tract tumors, the video captured by the endoscope is synchronously transmitted to a server or host, frames are extracted from the endoscope video, and the trained model is used for identification. In an alternative embodiment, frames are extracted from the endoscope video at equal time intervals, for example one frame every 1 s; alternatively only I-frames and/or P-frames of the video are extracted, and the invention does not limit the specific extraction mode. Depending on the training samples, the model of the invention may be dedicated to detecting gastric cancer, esophageal cancer, or the like; of course, the method can also be applied to other diseases examined by endoscope.
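A sketch of the equal-interval frame extraction (OpenCV-based; the 1 s interval follows the example above, the rest is an assumed minimal implementation):

```python
import cv2

def sample_frames(video_path, interval_s=1.0):
    """Extract one frame per interval_s seconds from the endoscope video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0       # fall back if FPS metadata is missing
    step = max(1, int(round(fps * interval_s)))
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:
            frames.append(frame)                  # pass these to the trained model
        i += 1
    cap.release()
    return frames
```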
In a second embodiment, the present invention further provides a digestive tract tumor recognition system, as shown in fig. 5, the system includes the following modules:
the prediction frame loss acquisition module is used for acquiring an endoscopic video of the digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into the digestive tract tumor recognition model to obtain a prediction frame and a tumor prediction probability corresponding to the prediction frame, and calculating loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain loss corresponding to each prediction frame;
the target prediction frame calculation module is used for establishing a corresponding relation between the real frames and the target prediction frames if the number of the real frames is 1 and the number of the target prediction frames is 1; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames;
the tumor recognition module is used for calculating the loss corresponding to the training sample image based on the corresponding relation and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
Preferably, the determining a target prediction frame corresponding to the real frame according to other frames around the training sample image in the video specifically includes:
obtaining a tumor prediction probability corresponding to each target prediction frame;
determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
Preferably, the determining the target frame according to the position of the training sample image in the video specifically includes:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
Preferably, the target frame is obtained from the 2N frame image according to the sub-region image, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
Preferably, the calculating the loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame, and the position of the real frame obtains the loss corresponding to each prediction frame, specifically:
obtaining probability loss according to the tumor prediction probability corresponding to the prediction frame; obtaining a boundary frame loss based on the IOU or GIOU of the prediction frame and the real frame; and summing the probability loss and the boundary box loss to obtain the loss corresponding to the prediction box.
In a third embodiment, the present invention further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the method according to the first embodiment.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by a combination of hardware and software. Based on such understanding, the foregoing technical solutions, in essence or in the portions contributing to the art, may be embodied in the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A digestive tract tumor identification method, the method comprising the following steps:
obtaining an endoscope video of digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into a digestive tract tumor recognition model to obtain a prediction frame and a tumor prediction probability corresponding to the prediction frame, and calculating losses according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain losses corresponding to each prediction frame;
when the number of the real frames is 1, if the number of the target predicted frames is 1, establishing a corresponding relation between the real frames and the target predicted frames; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames;
calculating the corresponding loss of the training sample image based on the corresponding relation, and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
2. The method according to claim 1, wherein the determining a target prediction frame corresponding to a real frame according to other frames around the training sample image in the video is specifically:
obtaining a tumor prediction probability corresponding to each target prediction frame;
determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
3. The method according to claim 2, wherein the determining a target frame from the position of the training sample image in the video is in particular:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
4. A method according to claim 3, wherein the target frame is acquired from the 2N frame image according to the sub-region image, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
5. The method of claim 1, wherein the calculating the loss according to the position of the predicted frame, the tumor prediction probability corresponding to the predicted frame, and the position of the real frame obtains the loss corresponding to each predicted frame, specifically:
obtaining probability loss according to the tumor prediction probability corresponding to the prediction frame; obtaining a boundary frame loss based on the IOU or GIOU of the prediction frame and the real frame; and summing the probability loss and the boundary box loss to obtain the loss corresponding to the prediction box.
6. A digestive tract tumor recognition system, the system comprising the following modules:
the prediction frame loss acquisition module is used for acquiring an endoscopic video of the digestive tract examination, marking images in the video to obtain a sample set, inputting training sample images in the sample set into the digestive tract tumor recognition model to obtain a prediction frame and a tumor prediction probability corresponding to the prediction frame, and calculating loss according to the position of the prediction frame, the tumor prediction probability corresponding to the prediction frame and the position of a real frame to obtain loss corresponding to each prediction frame;
the target prediction frame calculation module is used for establishing a corresponding relation between the real frames and the target prediction frames if the number of the real frames is 1 and the number of the target prediction frames is 1; if the number of the target prediction frames is greater than 1, determining target prediction frames corresponding to the real frames according to other frames around the training sample image in the video, and establishing a corresponding relation between the real frames and the target prediction frames; when the number of the real frames is larger than 1, constructing a cost matrix and determining the corresponding relation between the real frames and the predicted frames by adopting the Hungarian algorithm; the target prediction frame is the prediction frame with the minimum loss in all the prediction frames;
the tumor recognition module is used for calculating the loss corresponding to the training sample image based on the corresponding relation and training the model by adopting back propagation; in endoscopy of a tumor of the digestive tract, frames are extracted from the video of the endoscope and identified by the trained model.
7. The system according to claim 6, wherein the determining a target prediction frame corresponding to a real frame according to other frames around the training sample image in the video is specifically:
obtaining a tumor prediction probability corresponding to each target prediction frame;
determining target frames according to the positions of the training sample images in the video, and inputting each target frame into the digestive tract tumor recognition model to obtain the maximum tumor prediction probability corresponding to all prediction frames;
and calculating the average value of the maximum tumor prediction probabilities of all target frames, and taking the target prediction frame with the tumor prediction probability closest to the average value as a prediction frame corresponding to a real frame.
8. The system according to claim 7, wherein the determining the target frame according to the position of the training sample image in the video is specifically:
acquiring the position of the training sample image in the video, and determining N frames of images before the position and N frames of images after the position; wherein N is a positive integer greater than 2;
and acquiring a plurality of sub-region images from the training sample image according to the target prediction frame and the real frame, and acquiring a target frame from the 2N frame image according to the sub-region images.
9. The system according to claim 8, wherein the target frame is acquired from the 2N frame image according to the sub-region image, specifically:
searching a region with highest similarity of the sub-region images corresponding to the real frames in each of the 2N frame images, and determining target sub-region images corresponding to the plurality of sub-region images based on the position relation among the plurality of sub-region images and the position of the region with highest similarity in the 2N frame images;
establishing a corresponding relation between the sub-region image and the target sub-region image, calculating the similarity between the sub-region image and the corresponding target sub-region image, and calculating the distance between the target sub-region image corresponding to the real frame and the real frame, selecting at least one image from the 2N frame images, wherein the similarity and the distance corresponding to the selected image meet the preset condition.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the method according to any of claims 1-5.
CN202311754713.2A (priority date 2023-12-20; filing date 2023-12-20) — Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium — Active; granted as CN117437580B (en)

Priority Applications (1)

CN202311754713.2A — priority date 2023-12-20; filing date 2023-12-20 — Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium

Publications (2)

CN117437580A — published 2024-01-23
CN117437580B — published 2024-03-22

Family

ID=89550186

Family Applications (1)

CN202311754713.2A — priority date 2023-12-20; filing date 2023-12-20 — Active; granted as CN117437580B (en) — Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium

Country Status (1)

CN — CN117437580B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220351483A1 (en) * 2020-01-09 2022-11-03 Olympus Corporation Image processing system, endoscope system, image processing method, and storage medium
CN114120071A (en) * 2021-12-09 2022-03-01 北京车网科技发展有限公司 Detection method of image with object labeling frame
CN114140651A (en) * 2021-12-09 2022-03-04 深圳市资福医疗技术有限公司 Stomach focus recognition model training method and stomach focus recognition method
CN114565762A (en) * 2022-02-28 2022-05-31 西安电子科技大学 Weakly supervised liver tumor segmentation based on ROI and split fusion strategy
CN116309536A (en) * 2023-04-23 2023-06-23 西安理工大学 Pavement crack detection method and storage medium
CN117173182A (en) * 2023-11-03 2023-12-05 厦门微亚智能科技股份有限公司 Defect detection method, system, equipment and medium based on coding and decoding network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG WENTAO ET AL.: "Transfer learning for fluence map prediction in adrenal stereotactic body radiation therapy", Physics in Medicine and Biology, 31 December 2021, pages 1-7 *
肖宇峰 (XIAO Yufeng): "基于DETR的超声甲状旁腺亢进检测方法研究" [Research on a DETR-based method for detecting hyperparathyroidism in ultrasound], Wanfang (万方), 27 September 2023, pages 1-61 *

Also Published As

CN117437580B (en) — 2024-03-22

Similar Documents

Publication Publication Date Title
Farhat et al. Deep learning applications in pulmonary medical imaging: recent updates and insights on COVID-19
KR102210806B1 (en) Apparatus and method for diagnosing gastric lesion using deep learning of endoscopic images
US10482313B2 (en) Method and system for classification of endoscopic images using deep decision networks
CN113379693B (en) Capsule endoscope key focus image detection method based on video abstraction technology
Guo et al. Semi-supervised WCE image classification with adaptive aggregated attention
Srinidhi et al. Automated method for retinal artery/vein separation via graph search metaheuristic approach
Gridach PyDiNet: Pyramid dilated network for medical image segmentation
CN111968091B (en) Method for detecting and classifying lesion areas in clinical image
CN102065744A (en) Image processing device, image processing program, and image processing method
CN112466466B (en) Digestive tract auxiliary detection method and device based on deep learning and computing equipment
CN114581375A (en) Method, device and storage medium for automatically detecting focus of wireless capsule endoscope
Seok et al. The semantic segmentation approach for normal and pathologic tympanic membrane using deep learning
US11935239B2 (en) Control method, apparatus and program for system for determining lesion obtained via real-time image
CN111738992A (en) Lung focus region extraction method and device, electronic equipment and storage medium
Yue et al. Benchmarking polyp segmentation methods in narrow-band imaging colonoscopy images
CN111401102A (en) Deep learning model training method and device, electronic equipment and storage medium
CN117437580B (en) Digestive tract tumor recognition method, digestive tract tumor recognition system and digestive tract tumor recognition medium
Arnold et al. Indistinct frame detection in colonoscopy videos
CN114037686B (en) Children intussusception automatic check out system based on degree of depth learning
CN112885435B (en) Method, device and system for determining image target area
Gatoula et al. Enhanced CNN-based gaze estimation on wireless capsule endoscopy images
CN114271763A (en) Mask RCNN-based gastric cancer early identification method, system and device
Yan et al. Unsupervised body part regression using convolutional neural network with self-organization
Cai et al. An improved automatic system for aiding the detection of colon polyps using deep learning
KR102502418B1 (en) Medical image processing apparatus and method using neural network

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant