CN114332033A - Endoscope image processing method, apparatus, medium, and device based on artificial intelligence

Info

Publication number
CN114332033A
Authority
CN
China
Prior art keywords
image
classification
inspection
target
image classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111653381.XA
Other languages
Chinese (zh)
Inventor
边成
杨志雄
石小周
赵家英
李剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaohe Medical Instrument Hainan Co ltd
Original Assignee
Xiaohe Medical Instrument Hainan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaohe Medical Instrument Hainan Co ltd filed Critical Xiaohe Medical Instrument Hainan Co ltd
Priority to CN202111653381.XA priority Critical patent/CN114332033A/en
Publication of CN114332033A publication Critical patent/CN114332033A/en
Priority to PCT/CN2022/139016 priority patent/WO2023125008A1/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Abstract

The present disclosure relates to an artificial intelligence based endoscope image processing method, apparatus, medium, and device. The method comprises: acquiring an inspection image of an endoscope; extracting a depth image corresponding to the inspection image according to a depth map model; determining an image classification corresponding to the inspection image according to the depth image, the inspection image, and an image classification model, wherein the image classification is used to represent the blind area proportion of the tissue corresponding to the inspection image; and, at intervals of a target time period, determining a target image classification corresponding to the endoscope operation within the target time period according to the image classifications corresponding to a plurality of inspection images received within the target time period, and outputting the target image classification. In this way, the doctor can accurately know the examination range of the endoscope during the endoscope operation, the risk of missed examination is reduced to a certain extent, the endoscopic examination result is guaranteed, and the user experience is improved.

Description

Endoscope image processing method, apparatus, medium, and device based on artificial intelligence
Technical Field
The present disclosure relates to the field of image processing, and in particular, to an endoscope image processing method, apparatus, medium, and device based on artificial intelligence.
Background
Endoscopes are widely used for colon screening and polyp detection, and the examination coverage of the region inside the human body during endoscopy directly affects the final examination result.
The internal tissues examined by an endoscope are usually soft tissues. The intestinal tract and similar organs creep while the doctor moves the endoscope, and the doctor may also flush water, release a loop, and so on during the procedure, so the doctor cannot clearly know the examination range while moving the endoscope. Meanwhile, because of intestinal peristalsis, folds, and similar problems, part of the intestinal mucosa may not appear in the doctor's field of view during the examination, so the doctor misses it and cannot obtain an accurate endoscopic examination result.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides an artificial intelligence based endoscopic image processing method, the method comprising:
acquiring an inspection image of an endoscope;
extracting a depth image corresponding to the inspection image according to a depth map model, wherein the depth image is used for representing the structural information of the tissue corresponding to the inspection image;
determining an image classification corresponding to the inspection image according to the depth image, the inspection image and an image classification model, wherein the image classification is used for representing a blind area proportion of a tissue corresponding to the inspection image;
and, at intervals of a target time period, determining a target image classification corresponding to the endoscope operation within the target time period according to the image classifications corresponding to a plurality of inspection images received within the target time period, and outputting the target image classification.
In a second aspect, the present disclosure provides an artificial intelligence based endoscopic image processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring an examination image of the endoscope;
the extraction module is used for extracting a depth image corresponding to the inspection image according to a depth map model, wherein the depth image is used for representing the structural information of the tissue corresponding to the inspection image;
a first determining module, configured to determine, according to the depth image, the inspection image, and an image classification model, an image classification corresponding to the inspection image, where the image classification is used to represent a blind area proportion of a tissue corresponding to the inspection image;
and the second determining module is configured to, at intervals of a target time period, determine the target image classification corresponding to the endoscope operation within the target time period according to the image classifications corresponding to the plurality of inspection images received within the target time period, and output the target image classification.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method of the first aspect.
Through the above technical solution, the structural information of the tissue corresponding to the inspection image can be obtained by extracting the depth image of the inspection image, and the blind area proportion corresponding to the invisible part of the inspection image is predicted by combining the inspection image, which is visible in the doctor's field of view, with the depth image, which contains the structural information of the examined tissue. Because the structural information is taken into account, the determined image classification is less affected by the dynamically changing environment inside the human body, and the influence of the texture, color, and the like of the internal tissue surface on the prediction result is avoided, which further improves the accuracy of the image classification. In addition, the target image classification corresponding to the endoscope operation within the target time period is determined by combining the image classifications of multiple frames of inspection images, which effectively avoids the influence of the estimation error of a single frame on the final result and further improves the accuracy of the target image classification. As a result, the doctor can accurately know the examination range of the endoscope during the endoscope operation, the risk of missed examination is reduced to a certain extent, the endoscopic examination result is guaranteed, and the user experience is improved.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flow diagram of an artificial intelligence based endoscopic image processing method provided in accordance with an embodiment of the present disclosure;
FIG. 2 is a schematic view of an intestine reconstructed based on a three-dimensional reconstruction approach;
FIG. 3 is a schematic diagram of a depth map model provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a finite state machine provided in accordance with one embodiment of the present disclosure;
FIG. 5 is a block diagram of an artificial intelligence based endoscopic image processing apparatus provided in accordance with an embodiment of the present disclosure;
FIG. 6 shows a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flowchart of an artificial intelligence based endoscopic image processing method according to an embodiment of the present disclosure, and as shown in fig. 1, the method may include:
in step 11, an examination image of the endoscope is acquired. The image shot in real time in the endoscope withdrawing process can be used as the examination image, so that the related operation in the endoscope withdrawing process can be monitored based on the examination image.
In step 12, a depth image corresponding to the inspection image is extracted according to the depth map model, wherein the depth image is used for representing the structural information of the tissue corresponding to the inspection image.
Wherein, the depth image can be used to reflect the geometric shape of the visible surface in the examination image without regard to the influence of texture, color, etc. in the examination image, i.e. the structural information of the corresponding human tissue inside in the examination image can be obtained by extracting the depth image corresponding to the examination image.
However, as described in the background, problems such as intestinal peristalsis and folds make it difficult to accurately monitor the examination range of an endoscopy. In this embodiment, by extracting the depth image corresponding to the examination image, the internal structural information of the corresponding human tissue, such as the internal structure of the intestinal tract, can be obtained. Although the doctor's field of view is easily blocked by intestinal peristalsis, folds, intestinal bends, and the like, the structure of the intestinal tract itself is not changed by them; therefore, by acquiring the depth image, this embodiment avoids the influence of other information in the inspection image on the subsequent image classification decision.
In step 13, an image classification corresponding to the inspection image is determined according to the depth image, the inspection image and the image classification model, wherein the image classification is used for representing the blind area proportion of the tissue corresponding to the inspection image.
The examination image may contain image information of the tissue that actually appears in the doctor's field of view, and the depth image may contain the structural information of that tissue, so the image classification model can make a prediction from the structural information and the visible image information and thereby determine the blind area proportion of the current tissue. The blind area proportion is understood as the proportion of the blind areas (i.e., the parts that cannot be observed in the endoscope's field of view during the operation) in the entire interior surface area of the tissue. FIG. 2 is a schematic diagram of an intestinal mucosa obtained by three-dimensional reconstruction from endoscope images; the intestinal tract approximates a tubular structure, and because of the limited endoscopic field of view, hollow positions such as W1, W2, W3, and W4 in FIG. 2 can appear when the intestinal tract is reconstructed from endoscope images. These hollow positions do not appear in any examination image, i.e., they are the invisible part of the endoscopy, and the doctor cannot observe those mucosal regions during the examination; if too much of the mucosa is invisible, a missed examination is likely to occur. The blind area proportion can therefore represent the proportion of the mucosa that is invisible in the examination images relative to the entire mucosal area of the tissue, and it can be used to indicate how much of the current tissue remains unseen, thereby characterizing the comprehensiveness of the endoscopy.
In step 14, at intervals of a target time period, a target image classification corresponding to the endoscope operation within the target time period is determined according to the image classifications corresponding to the plurality of examination images received within the target time period, and the target image classification is output.
The target time period can be set according to the actual application scenario: if the real-time requirement of the endoscopy is high, the target time period can be set shorter; if the real-time requirement is low, it can be set longer. In this method, classification is performed on detection images acquired in real time during the endoscope operation; to avoid the estimation error of a single frame, the overall operation state within the target time period can be determined from the image classifications corresponding to multiple frames of inspection images within that period, thereby ensuring the accuracy and comprehensiveness of the determined target image classification.
In this way, the structural information of the tissue corresponding to the examination image can be obtained by extracting the depth image of the examination image, and the blind area proportion corresponding to the invisible part of the examination image is predicted by combining the examination image, which is visible in the doctor's field of view, with the depth image, which contains the structural information of the examined tissue. Because the structural information is taken into account, the determined image classification is less affected by the dynamically changing environment inside the human body, and the influence of the texture, color, and the like of the internal tissue surface on the prediction result is avoided, which further improves the accuracy of the image classification. In addition, the target image classification corresponding to the endoscope operation within the target time period is determined by combining the image classifications of multiple frames of inspection images, which effectively avoids the influence of the estimation error of a single frame on the final result and further improves the accuracy of the target image classification. As a result, the doctor can accurately know the examination range of the endoscope during the endoscope operation, the risk of missed examination is reduced to a certain extent, the endoscopic examination result is guaranteed, and the user experience is improved.
In one possible embodiment, an exemplary implementation of the method for acquiring an examination image of an endoscope is as follows, which may include:
acquiring an endoscope image shot by the endoscope in the process of withdrawing the endoscope;
and performing binary classification on the endoscope images according to an inspection image determination model, and determining the endoscope images whose corresponding classification is the normal classification as the inspection images, wherein the training samples of the inspection image determination model comprise positive samples of images corresponding to the normal classification and negative samples of images corresponding to one or more abnormal classifications.
In an actual application scenario, during a doctor's enteroscopy, abnormal image frames such as blurred or overexposed frames may appear in the acquired examination images for reasons such as flushing or too fast a withdrawal speed, and it is difficult to obtain an accurate result when classifying such frames, which in turn affects the determination of the final target image classification. Therefore, in the embodiment of the present disclosure, frames may be extracted in advance from video captured by the endoscope to obtain a plurality of image frames. A doctor can then label them by category: for example, no-signal frames, in-vitro frames, bubble frames, screen-overlook frames, overexposed frames, blurred frames, and color-cast frames can be labeled as abnormal classifications, and clear frames can be labeled as the normal classification, so that training samples containing positive samples and negative samples are obtained.
Then, a neural network may be trained based on the training samples to obtain an inspection image determination model, for example, the neural network may be a resnet50 network, and a training process thereof may adopt a training manner commonly used in the art, which is not described herein again.
Therefore, in this embodiment, after an endoscope image captured by the endoscope is obtained, it may be input into the trained inspection image determination model. If the model outputs the normal classification, the endoscope image can be used as an inspection image for subsequent processing; if the model outputs an abnormal classification, the endoscope image itself has a quality problem and can be discarded directly. This not only avoids wasting subsequent analysis on invalid data, but also avoids the influence of low-quality images on the final target image classification, ensures the accuracy of the determined target image classification, and allows accurate data prompts for the user.
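As an illustration of this filtering step, the sketch below assumes a ResNet-50 backbone with a two-class output head; the function name is_normal_frame, the preprocessing values, and the convention that index 0 denotes the normal classification are illustrative assumptions rather than details taken from this disclosure.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Binary "inspection image determination model": normal vs. abnormal frames.
filter_model = models.resnet50(num_classes=2)
filter_model.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),   # illustrative input size
    T.ToTensor(),
])

def is_normal_frame(frame) -> bool:
    """Return True if the endoscope frame is classified as normal and should be
    kept as an inspection image; abnormal frames are discarded."""
    x = preprocess(frame).unsqueeze(0)           # [1, 3, H, W]
    with torch.no_grad():
        logits = filter_model(x)                 # [1, 2]
    return logits.argmax(dim=1).item() == 0      # assume index 0 = normal class
```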
In a possible embodiment, the depth map model includes a plurality of feature extraction submodels connected in series, and an exemplary implementation manner of extracting a depth image corresponding to the inspection image according to the depth map model includes:
and performing down-sampling on the inspection image to obtain a down-sampled image corresponding to the inspection image.
For example, the inspection image may be downsampled by a convolution layer, e.g., to 1/2 of its original resolution, to obtain the downsampled image. The downsampled image retains the original features of the inspection image while reducing, to some extent, the amount of computation required for image recognition and feature extraction, thereby improving the processing efficiency of the inspection image.
Then, inputting a target image into the feature extraction submodel to obtain a feature map output by the feature extraction submodel, wherein if the feature extraction submodel is a first feature extraction submodel, the target image is the downsampled image, and if the feature extraction submodel is not the first feature extraction submodel, the target image is an image obtained by fusing the downsampled image and the feature map output by a previous feature extraction submodel of the feature extraction submodel;
and performing deconvolution operation on the feature map output by the last feature extraction sub-model to obtain the depth image.
Fig. 3 is a schematic structural diagram of a depth map model provided according to an embodiment of the present disclosure. As shown in fig. 3, the depth map model may include 3 feature extraction submodels M1, M2, and M3, and then the inspection image I may be downsampled to obtain a downsampled image, and a depth image may be obtained based on the downsampled image and the depth map model.
For example, if the feature extraction sub-model is the first feature extraction sub-model, that is, the feature extraction sub-model is the sub-model M1, the down-sampled image may be input into the sub-model M1, and the sub-model M1 may obtain a feature map corresponding to the down-sampled image through the processes of down-sampling and up-sampling. Then, the feature map output by the submodel M1 and the down-sampled image may be fused to obtain a fused image, and the fused image may be used as an input image of the submodel M2. Likewise, the submodel M2 may obtain the feature map corresponding to the fused image through the processes of down-sampling and up-sampling. Furthermore, the feature map output by the submodel M2 and the downsampled image are fused to obtain a fused image, and the fused image is used as an input image of the submodel M3, and the submodel M3 can obtain the feature map corresponding to the fused image through the processes of downsampling and upsampling.
Since the submodel M3 is the last feature extraction submodel, the feature map output by the submodel M3 may be deconvoluted to obtain the depth image, so that the obtained depth image has the same resolution as the inspection image, i.e., a depth image corresponding to the original image size is obtained.
Therefore, according to the above technical solution, when determining the depth image corresponding to the inspection image, downsampling the inspection image reduces the amount of data the depth map model must compute, and the repeated downsampling and upsampling through the plurality of feature extraction submodels continuously mixes the image information of the inspection image, effectively enlarging the receptive field of the network in the depth map model, so that the model can attend both to the global structural information of the inspection image and to its local detail information. This ensures the comprehensiveness and effectiveness of the image feature extraction and improves the accuracy and effectiveness of the determined depth image. Moreover, the input of every submodel except the first feature extraction submodel is the fusion of the downsampled image and the feature map output by the previous feature extraction submodel, so the input of each feature extraction submodel contains feature information of the original image, which avoids blurred edges in the depth image and further improves its accuracy.
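The forward pass described above (initial downsampling, three cascaded feature extraction submodels with fusion of the downsampled image, and a final deconvolution) can be sketched roughly as follows. The layer choices, channel counts, and the internal structure of each submodel are illustrative assumptions; the disclosure only specifies the overall cascade and fusion scheme.

```python
import torch
import torch.nn as nn

class FeatureSubmodel(nn.Module):
    """Illustrative encoder-decoder block standing in for one feature extraction
    submodel: it downsamples, upsamples, and emits a single-channel map at the
    resolution of its input (the disclosure only specifies down- and up-sampling)."""
    def __init__(self, in_ch, mid_ch=32):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(mid_ch, 1, 4, stride=2, padding=1)

    def forward(self, x):
        return self.up(self.down(x))

class DepthMapModel(nn.Module):
    def __init__(self, feat_ch=32):
        super().__init__()
        # Initial downsampling of the 3-channel inspection image to 1/2 resolution.
        self.downsample = nn.Conv2d(3, feat_ch, 3, stride=2, padding=1)
        self.m1 = FeatureSubmodel(feat_ch)
        # M2 and M3 take the fusion (channel concatenation) of the downsampled
        # image features and the previous submodel's output map.
        self.m2 = FeatureSubmodel(feat_ch + 1)
        self.m3 = FeatureSubmodel(feat_ch + 1)
        # The final deconvolution restores the original resolution (1-channel depth).
        self.deconv = nn.ConvTranspose2d(1, 1, 4, stride=2, padding=1)

    def forward(self, image):
        d = self.downsample(image)             # features of the downsampled image
        f1 = self.m1(d)                        # output of submodel M1
        f2 = self.m2(torch.cat([d, f1], 1))    # fuse with the downsampled image
        f3 = self.m3(torch.cat([d, f2], 1))
        depth = self.deconv(f3)                # depth image at the original resolution
        return depth, (f1, f2)                 # keep f1, f2 for intermediate supervision
```

Note how the downsampled features d are concatenated with each previous submodel's output before M2 and M3, mirroring the fusion step of FIG. 3.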
In a possible embodiment, the depth map model comprises a plurality of feature extraction submodels connected in series, and the target loss of the depth map model in the training process is obtained by:
and performing down-sampling on the true-value depth image corresponding to the training image input into the depth map model to obtain true-value feature images respectively corresponding to each feature extraction sub-model, wherein the resolution of the feature map output by each feature extraction sub-model is the same as that of the true-value feature image corresponding to the feature extraction sub-model.
For example, the depth map model may be trained on classic depth estimation datasets such as KITTI and NYU, and the training samples of the depth map model may include training images determined from these datasets and the true-value depth images corresponding to the training images. As an example, the depth map model includes a plurality of feature extraction submodels connected in series; to improve training efficiency, the resolution of the intermediate feature maps output by the feature extraction submodels is usually smaller than that of the original training image. Therefore, in this embodiment, the true-value depth image of the input training image may be downsampled to the resolution of the feature map output by each feature extraction submodel, so as to obtain a true-value feature image with the same resolution as that submodel's output feature map, which is then used to evaluate the accuracy of the feature extraction submodel against its output feature map.
And aiming at each feature extraction submodel except the last feature extraction submodel, determining the intermediate loss corresponding to the feature extraction submodel according to the feature graph output by the feature extraction submodel and the truth-value feature image corresponding to the feature extraction submodel.
The intermediate loss corresponding to the feature extraction submodel can be determined through the following formula:
(The formula is given as an equation image, Figure BDA0003447172950000091, in the original publication.)
where Li(di, di*) denotes the intermediate loss corresponding to the i-th feature extraction submodel;
di denotes the feature map output by the i-th feature extraction submodel;
di* denotes the true-value feature image corresponding to the i-th feature extraction submodel;
and N denotes the number of pixel points in the feature map.
Then, according to the depth image output by the depth map model and the true value depth image, determining the prediction loss of the depth map model; determining a sum of each of the intermediate losses and the predicted loss as a target loss for the depth map model.
For example, as in the depth map model shown in fig. 3, the corresponding intermediate loss may be calculated for the intermediate feature extraction submodels M1 and M2, respectively. As an example, if the resolution of the feature map output by the submodel M1 is 1/2 of the resolution of the input training image, the true-value depth image corresponding to the training image may be down-sampled to 1/2 resolution to obtain the true-value feature image. The target loss of the depth map model as shown in fig. 3 is then expressed as follows:
Ld = L1(d1, d1*) + L2(d2, d2*) + L3(D, d*)
where D denotes the depth image output by the depth map model;
and d* denotes the true-value depth image.
Then, a new training image and a true-value depth image corresponding to the training image can be obtained from the endoscope data set endo-slam, and the depth map model is further subjected to local feature optimization based on the new training image, so that the accuracy of the depth map model is further improved. The loss calculation in the tuning process is the same as the above process, and is not described herein again.
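A rough sketch of this target loss is given below, matching the cascaded model sketch above in which the non-final submodels emit single-channel maps at reduced resolution. Because the exact per-pixel loss appears only as an equation image in the original publication, a mean absolute error over the N pixels is used here purely as an illustrative stand-in.

```python
import torch
import torch.nn.functional as F

def depth_target_loss(pred_depth, intermediate_maps, gt_depth):
    """Target loss sketch: sum of the intermediate losses on the non-final
    submodel outputs plus the prediction loss on the final depth map, i.e.
    Ld = L1(d1, d1*) + L2(d2, d2*) + L3(D, d*)."""
    total = torch.zeros((), device=gt_depth.device)
    for feat in intermediate_maps:               # d1, d2 (single-channel maps)
        # Downsample the true-value depth image to this feature map's resolution
        # to obtain the corresponding true-value feature image di*.
        gt_small = F.interpolate(gt_depth, size=feat.shape[-2:],
                                 mode='bilinear', align_corners=False)
        total = total + F.l1_loss(feat, gt_small)    # intermediate loss Li(di, di*)
    total = total + F.l1_loss(pred_depth, gt_depth)  # prediction loss L3(D, d*)
    return total
```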
Therefore, with this technical solution, an intermediate supervision loss can be added to the output of each intermediate feature extraction submodel, so that each intermediate submodel receives an intermediate supervision signal. This improves the accuracy of gradient back-propagation through the intermediate layers during training, avoids the loss of gradient information caused by an overly deep depth map network, and improves the training efficiency and accuracy of the depth map model.
In one possible embodiment, the image classification model is determined by:
and acquiring a historical examination image corresponding to the endoscope operation.
As an example, endoscope images (e.g., gastroscope images, colonoscope images) captured during endoscopies of multiple users in real scenarios may be acquired and preprocessed to obtain historical images that facilitate the subsequent training procedure. The preprocessing may include standardized cropping, e.g., normalizing resolution and size so that the historical images have a uniform size. The preprocessing may also include deleting endoscope images of abnormal classifications, such as overexposed or unclear images, to avoid the influence of such images on learning the classification characteristics; the manner of determining abnormally classified endoscope images is described in detail above.
As another example, the historical examination images include historical images determined based on endoscope images captured by the endoscope, and enhanced images obtained by applying data enhancement to the historical images, where the data enhancement includes one or more of: random flipping, random affine transformation (RandomAffine), and color jitter (ColorJitter). The historical image may be an image obtained by preprocessing an endoscope image in the manner described above. In this embodiment, since the number of endoscope images is usually small, more images can be constructed on the basis of the historical images, i.e., the historical images can be augmented by data enhancement, which effectively increases the diversity and richness of the training samples of the image classification model and helps ensure the stability and accuracy of the trained image classification model.
And then extracting a depth image corresponding to the historical inspection image according to the depth map model, and fusing the depth image corresponding to the historical inspection image and the historical inspection image to obtain a training image.
For example, for each historical inspection image, the historical inspection image is input into a depth map model, so that the depth image corresponding to the historical inspection image can be obtained based on the depth map model. The manner of extracting the depth image corresponding to the historical inspection image according to the depth map model is the same as the manner of extracting the depth image corresponding to the inspection image, and is not described herein again. Then, the historical inspection image and the depth image corresponding to the historical inspection image may be fused, for example, the historical inspection image and the depth image may be fused by concat, and the fused image may be used as a training image to train an image classification model.
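A minimal sketch of the concat fusion described above, assuming channel-wise concatenation of an RGB inspection image and a single-channel depth image; shapes are illustrative.

```python
import torch

def fuse_for_classification(inspection_image, depth_image):
    """Channel-wise concat of the RGB inspection image ([B, 3, H, W]) with its
    depth image ([B, 1, H, W]), giving the 4-channel classification input."""
    return torch.cat([inspection_image, depth_image], dim=1)   # [B, 4, H, W]
```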
And taking the training image as the input of a preset classification model, taking the label classification corresponding to the historical inspection image as the target output of the preset classification model, and training the preset classification model to obtain the image classification model.
For each historical inspection image, the corresponding image classification can be labeled by an experienced endoscopist, i.e., the annotation classification corresponding to that historical inspection image. Taking enteroscopy as an example, the blind area proportion can represent the ratio of the intestinal region that has not appeared in the field of view to the whole intestinal region; if the blind area proportion is too high, the examined region is small, that part carries a risk of missed examination, and the doctor needs to re-examine the intestinal tract. The blind area proportion is a continuous value between 0 and 1, and to make labeling convenient for the doctor, the continuous value can be converted into a classification label, for example using the following correspondence:
(The correspondence table is given as an image, Figure BDA0003447172950000111, in the original publication.)
for example, the training image is input into a preset classification model, which may be implemented based on a Resnet50 network, and an output vector g of the training image is obtained by connecting a Global pooling layer (Global average potential) after a last convolutional layer in the network, and then a probability that the output vector g corresponds to each image classification may be obtained through a full connection layer, so as to perform cross entropy loss calculation based on the probability:
Figure BDA0003447172950000121
wherein Lc (p, q) is used to represent the loss of the image classification model;
c is used for representing the category number of the image classification;
pifor indicating that the annotation class corresponds to the representation of the ith image class, if the annotation class is the same as the ith image class, the annotation class is associated with the representation of the ith image class1, if the label classification is different from the ith image classification, the label classification is 0;
qirepresenting the probability that the training image corresponds to the ith image class.
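A hedged sketch of this classification stage follows: a Resnet50 whose first convolution is widened to accept the 4-channel fused input, followed by the network's built-in global average pooling and fully connected layer producing one logit per class, trained with the cross-entropy loss above. The class count C = 5 and the layer-surgery details are illustrative assumptions.

```python
import torch.nn as nn
import torchvision.models as models

C = 5  # assumed number of blind-area classes (illustrative)

# Resnet50 already ends with global average pooling and a fully connected layer;
# num_classes=C makes that layer output one logit per image classification.
classifier = models.resnet50(num_classes=C)
# Widen the first convolution so the network accepts the 4-channel fused input
# (RGB inspection image + depth image); this layer surgery is an assumption.
classifier.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)

# nn.CrossEntropyLoss applies log-softmax internally, i.e. it computes
# Lc(p, q) = -sum_i pi*log(qi) with pi the one-hot annotation classification.
criterion = nn.CrossEntropyLoss()

def training_step(fused_image, label_index):
    """fused_image: [B, 4, H, W]; label_index: [B] integer class labels."""
    logits = classifier(fused_image)     # [B, C]
    return criterion(logits, label_index)
```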
In this embodiment, when the image classification model is trained, the depth image corresponding to the inspection image is fused into the corresponding training image, so during training the image classification model can learn the relationship between the image information of the tissue surface, the structural information of the tissue, and the image classification in the endoscopy process. This makes the prediction of the image classification model more accurate and its reference features more comprehensive, improving the training efficiency and accuracy of the image classification model.
In one possible embodiment, the step of determining the target image classification corresponding to the endoscopic operation in the target period according to the image classification corresponding to the plurality of examination images received in the target period may include:
if the continuous accumulated quantity of the inspection images corresponding to the lowest level of image classification in the target time period exceeds a preset threshold value, taking the lowest level of image classification as the target image classification;
and if the continuous accumulative quantity of the inspection images corresponding to the image classification of the lowest level in the target time period does not exceed the preset threshold, determining the target image classification according to the total accumulative quantity of the inspection images under each image classification in the target time period.
In an actual application scenario, if the level corresponding to the image classification determined during the doctor's endoscopy is relatively low, the visible range of the examined region appearing in the doctor's field of view is relatively small. Taking intestinal examination as an example, if the level of the intestinal image classification determined during the doctor's enteroscopy is relatively low, the ratio of the examined intestinal region in the field of view of the examination image to the whole intestinal region is relatively low; that is, a relatively large part of the intestinal environment has not appeared in the doctor's field of view, the doctor cannot have viewed that part of the intestinal environment, and a lesion is therefore likely to be missed. Moreover, endoscopy is a dynamic process, and to avoid the estimation error of a single inspection image frame, the examination state of the endoscope can be accurately classified by combining the image classifications of multiple frames of inspection images.
As described above, when the level corresponding to the image classification is low, it indicates that the ratio of the intestinal tract examination region to the entire intestinal tract region appearing in the visual field of the doctor is low, and the examination range of the doctor is insufficient. Therefore, the method and the device can perform priority identification on the image classification of the lower level so as to find problems in time, improve the real-time performance of identification and reduce the data processing amount corresponding to the identification.
For example, the respective inspection images may be classified and accumulated according to the image classification corresponding to the respective inspection images received in the target time period, that is, the total accumulated number of the inspection images under the respective image classifications is determined.
While accumulating the numbers of inspection images within a target time period, if the continuous accumulated number of inspection images corresponding to the lowest-level image classification within the target time period exceeds a preset threshold, the lowest-level image classification is taken as the target image classification. The lowest-level image classification means that the blind area proportion is too high; if multiple consecutive frames of inspection images correspond to the lowest-level classification, the blind area proportion of the overall examination within the target time period is too high, and the lowest-level image classification can be directly determined as the target image classification corresponding to the endoscope operation within the target time period. In this way, operational gaps during the endoscopy can be found in time, providing reliable and real-time data support for subsequently prompting the doctor.
If the continuous accumulated number of the inspection images corresponding to the lowest-level image classification in the target time period does not exceed the preset threshold, it indicates that the variability of the proportion of the blind areas in the overall inspection process in the target time period is large, and at this time, the situation in the target time period can be comprehensively analyzed by further combining with the image classification corresponding to the overall inspection image in the target time period.
Therefore, with the above technical solution, by continuously accumulating the inspection images according to their image classifications within the target time period, endoscope operations with an excessively high blind area proportion can be identified in time, the amount of data processing is reduced, and the real-time performance of determining the image classification within the target time period is improved, so that the image classification can be monitored in real time during the endoscope operation and the user can adjust the operation in time accordingly. Monitoring the endoscope operation in real time avoids missed examinations to a certain extent and provides data support for the comprehensiveness of the endoscopy.
In a possible embodiment, an exemplary implementation manner of determining the target image classification according to the total accumulated number of inspection images under each image classification in the target period is as follows, and the step may include:
determining the size relation between a target ratio corresponding to the candidate image classification and a grade threshold corresponding to the candidate image classification, wherein the target ratio is the ratio of the total accumulated number of the inspection images under the candidate image classification to the total number of targets, the total number of the targets is the sum of the number of the inspection images in the target time period, and the candidate image classification is initially the image classification with the lowest grade.
For example, if the image classification includes 5 levels, the levels from low to high are: very poor (A1), poor (A2), medium (A3), good (A4), and best (A5); for convenience, the description below refers to them directly as A1-A5. After the total accumulated number of inspection images under each image classification within the target time period is determined, the target ratio corresponding to each image classification, i.e., the proportion of the inspection images under that classification among all inspection images in the target time period, can be further determined.
When the image classification is comprehensively analyzed, the image classification can be progressively analyzed according to the sequence of the corresponding grades from low to high, and a target ratio under a very poor grade, namely a target ratio under the classification A1, is obtained first, so that the size relationship between the target ratio Q1 and the grade threshold a1 corresponding to the classification A1 can be determined. Each image classification corresponds to a level threshold, and the level thresholds corresponding to different image classifications may be the same or different, which is not limited in this disclosure.
And if the target ratio corresponding to the candidate image classification is greater than or equal to the grade threshold corresponding to the candidate image classification, taking the candidate image classification as the target image classification.
For example, if the target ratio Q1 corresponding to classification A1 is greater than or equal to the level threshold a1 corresponding to classification A1, the proportion of inspection images under classification A1 within the target time period is high, and classification A1 can be used to characterize the overall examination condition of the target time period; therefore the candidate image classification A1 is taken as the target image classification, i.e., the target image classification corresponding to the endoscope operation within the target time period is very poor.
If the target ratio corresponding to the candidate image classification is smaller than the grade threshold corresponding to the candidate image classification, acquiring the next image classification of the candidate image classification according to the sequence from low grade to high grade corresponding to the image classification;
if the next image classification is not the highest grade, taking the next image classification as a new candidate image classification, and re-executing the step of determining the size relation between the target ratio corresponding to the candidate image classification and the grade threshold corresponding to the candidate image classification; and if the next image classification is the highest grade, determining the next image classification as the target image classification.
For example, if the target ratio Q1 corresponding to classification A1 is smaller than the level threshold a1 corresponding to classification A1, the proportion of inspection images under classification A1 within the target time period is low and classification A1 is not suitable for characterizing the overall examination condition of the target time period, so the next image classification, i.e., classification A2, is considered. Since classification A2 is the poor level and its corresponding level is not the highest, the same determination process as for classification A1 is performed: if the target ratio Q2 corresponding to classification A2 is greater than or equal to the level threshold a2 corresponding to classification A2, the proportion of inspection images under classification A2 within the target time period is high and classification A2 can be used to characterize the overall examination condition of the target time period. If the target ratio Q2 corresponding to classification A2 is smaller than the level threshold a2, the next image classification, i.e., classification A3, is obtained.
After the target image classification corresponding to the endoscope operation is determined, subsequent judgment of other levels is not needed, so that the data calculation amount is saved. If, in the above example, it is further determined that the target ratio Q4 corresponding to the classification a4 is smaller than the corresponding class threshold a4, at this time, the next image is classified as the classification a5, and the class corresponding to the classification a5 is the highest class, at this time, the classification a5 may be directly determined as the target image classification, that is, the target image corresponding to the endoscopic operation in the target period is classified as the best.
Illustratively, the above determination process may be implemented by means of a finite state machine. As shown in FIG. 4, when there are 5 image classifications, the determination is performed by state transition (a code sketch of this state machine is given after the steps below):
counting according to the image classification list corresponding to each inspection image in the target time period:
step 1: if the image classification of the inspection images satisfying the M (i.e., the preset threshold) frames is extreme a1, or if the target ratio Q1 corresponding to the classification a1 is greater than or equal to the level threshold a1 corresponding to the classification a1, jumping to the state Y1 and exiting, i.e., the target image corresponding to the endoscopic operation in the target period is classified as extreme; otherwise, entering the step 2;
step 2: if the target ratio Q2 corresponding to the classification A2 is greater than or equal to the level threshold a2 corresponding to the classification A2, jumping to the state Y2 and exiting, namely, classifying the target image corresponding to the endoscope operation in the target period as a difference; otherwise, entering the step 3;
and 3, step 3: if the target ratio Q3 corresponding to the classification A3 is greater than or equal to the level threshold A3 corresponding to the classification A3, jumping to the state Y3 and exiting, namely, classifying the target image corresponding to the endoscope operation in the target period as medium; otherwise, entering the step 4;
and 4, step 4: if the target ratio Q4 corresponding to the classification a4 is greater than or equal to the level threshold a4 corresponding to the classification a4, jumping to the state Y4 and exiting, that is, the target image corresponding to the endoscopic operation in the target period is classified as good; otherwise, entering the step 5;
and 5, step 5: and (4) jumping to a state Y5, namely classifying the target image corresponding to the endoscope operation in the target time interval as the best, and ending the state transition.
As described above, in an actual application scenario, when the level corresponding to the image classification is low, the examination result of the user using the endoscope is greatly affected. Therefore, by the technical scheme, the image classification of the endoscope operation in the target time period can be identified according to the sequence of the grades corresponding to the image classification from low to high, so that the image classification identification is matched with the practical application scene of the endoscope, the usability and the effectiveness of the determined image classification are improved, and reliable data reference is provided for the accurate and reasonable use of the endoscope by a user.
In one possible embodiment, the method may further comprise:
and outputting prompt information under the condition that the grade corresponding to the target image classification is lower than a preset grade or is the same as the preset grade, wherein the prompt information is used for indicating that the missing detection risk exists.
The preset level can be set according to the actual application scenario. The determined target image classification can be displayed in real time in the display interface used to show the endoscope images, so that the user is prompted in real time. In a scenario that only requires rough screening, the preset level can be the poor level; when the determined classification is very poor or poor, prompt information can further be output, for example displayed in the display interface, such as "high risk of missed examination", "please re-examine", or "please perform the withdrawal process again". The prompt information can be displayed directly, given as a voice prompt, or shown in a pop-up window, so that the doctor learns in time that the mucosal coverage of the examined region during withdrawal is insufficient and a missed examination is likely, and can accordingly adjust the orientation of the endoscope or perform the withdrawal process again. In this way, the endoscope operation can be monitored in real time while the doctor withdraws the endoscope, and the doctor can be prompted in time when the examination range is insufficient, providing reliable guidance for a comprehensive and effective examination, reducing the missed-examination rate to a certain extent, and improving convenience and the user experience.
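A minimal sketch of this prompt rule, assuming the classification levels are given in order from lowest to highest; the message text and the function name are illustrative.

```python
def maybe_prompt(target_classification, preset_level, levels=("A1", "A2", "A3", "A4", "A5")):
    """Return a prompt message when the target image classification is at or
    below the preset level (levels ordered from lowest to highest), else None."""
    if levels.index(target_classification) <= levels.index(preset_level):
        return "High risk of missed examination; please re-examine or perform the withdrawal process again."
    return None
```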
The present disclosure also provides an artificial intelligence based endoscopic image processing apparatus, as shown in fig. 5, the apparatus 10 including:
an acquisition module 100 for acquiring an examination image of an endoscope;
an extracting module 200, configured to extract a depth image corresponding to the inspection image according to a depth map model, where the depth image is used to represent structural information of a tissue corresponding to the inspection image;
a first determining module 300, configured to determine, according to the depth image, the inspection image, and an image classification model, an image classification corresponding to the inspection image, where the image classification is used to represent a blind area proportion of a tissue corresponding to the inspection image;
a second determining module 400, configured to determine, at an interval of a target time period, a target image classification corresponding to an endoscopic operation in the target time period according to image classifications corresponding to a plurality of inspection images received in the target time period, and output the target image classification.
Optionally, the depth map model comprises a plurality of feature extraction submodels connected in series, and the extraction module comprises:
the down-sampling sub-module is used for down-sampling the inspection image to obtain a down-sampled image corresponding to the inspection image;
the first processing submodule is used for inputting a target image into the feature extraction submodel and obtaining a feature map output by the feature extraction submodel, wherein if the feature extraction submodel is a first feature extraction submodel, the target image is the downsampling image, and if the feature extraction submodel is not the first feature extraction submodel, the target image is an image obtained by fusing the downsampling image and the feature map output by the feature extraction submodel which is arranged before the feature extraction submodel;
and the second processing submodule is used for carrying out deconvolution operation on the feature map output by the last feature extraction submodel to obtain the depth image.
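A minimal PyTorch-style sketch of this serial structure is given below; the number of submodels, their layer composition, channel concatenation as the fusion operation, and the single stride-2 deconvolution are all assumptions made only for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractionSubmodel(nn.Module):
    """One submodel in the series; halving the resolution per stage is assumed."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class DepthMapModel(nn.Module):
    def __init__(self, channels=(16, 32, 64)):
        super().__init__()
        self.submodels = nn.ModuleList()
        in_ch = 3                                   # first submodel sees the down-sampled image
        for out_ch in channels:
            self.submodels.append(FeatureExtractionSubmodel(in_ch, out_ch))
            in_ch = out_ch + 3                      # later submodels see features fused with the image
        # a single stride-2 deconvolution; a real model may need further upsampling
        self.deconv = nn.ConvTranspose2d(channels[-1], 1, kernel_size=4, stride=2, padding=1)

    def forward(self, inspection_image):
        downsampled = F.interpolate(inspection_image, scale_factor=0.5)   # down-sampled image
        target, feature_maps = downsampled, []
        for submodel in self.submodels:
            feature_map = submodel(target)
            feature_maps.append(feature_map)
            # fuse the down-sampled image (resized to match) with this feature map
            resized = F.interpolate(downsampled, size=feature_map.shape[-2:])
            target = torch.cat([resized, feature_map], dim=1)
        depth_image = self.deconv(feature_maps[-1])  # deconvolution on the last feature map
        return depth_image, feature_maps
```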
Optionally, the depth map model includes a plurality of feature extraction submodels connected in series, and a target loss of the depth map model in the training process is obtained by:
down-sampling a true value depth image corresponding to a training image input into the depth map model to obtain a true value feature image corresponding to each feature extraction sub-model, wherein the resolution of the feature map output by each feature extraction sub-model is the same as the resolution of the true value feature image corresponding to the feature extraction sub-model;
for each feature extraction submodel except the last feature extraction submodel, determining the intermediate loss corresponding to the feature extraction submodel according to the feature map output by the feature extraction submodel and the true-value feature image corresponding to the feature extraction submodel;
determining the prediction loss of the depth map model according to the depth image output by the depth map model and the true value depth image;
determining a sum of each of the intermediate losses and the predicted loss as a target loss for the depth map model.
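The following sketch illustrates this loss construction, taking the feature maps returned by the depth map model sketch above as input; the use of L1 distance, the reduction of each feature map to one channel before comparison, and resizing the true-value depth image to each required resolution are assumptions not specified in the text:

```python
import torch
import torch.nn.functional as F

def depth_map_target_loss(predicted_depth, feature_maps, true_depth):
    """predicted_depth, true_depth: (B, 1, H, W) tensors; feature_maps: outputs
    of the serial feature extraction submodels, first submodel first."""
    # prediction loss between the output depth image and the true-value depth image
    resized_truth = F.interpolate(true_depth, size=predicted_depth.shape[-2:])
    total_loss = F.l1_loss(predicted_depth, resized_truth)
    # intermediate loss for every submodel except the last
    for feature_map in feature_maps[:-1]:
        true_feature = F.interpolate(true_depth, size=feature_map.shape[-2:])
        projected = feature_map.mean(dim=1, keepdim=True)   # assumed 1-channel projection
        total_loss = total_loss + F.l1_loss(projected, true_feature)
    return total_loss
```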
Optionally, the image classification model is determined by:
acquiring a historical examination image corresponding to endoscope operation;
extracting a depth image corresponding to the historical inspection image according to the depth map model, and fusing the depth image corresponding to the historical inspection image with the historical inspection image to obtain a training image;
and taking the training image as the input of a preset classification model, taking the label classification corresponding to the historical inspection image as the target output of the preset classification model, and training the preset classification model to obtain the image classification model.
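A possible training step consistent with this description is sketched below; the classifier architecture, the number of classes, the optimizer, and channel concatenation as the fusion operation are assumptions:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 4   # assumed number of image classifications

preset_classification_model = nn.Sequential(
    nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, NUM_CLASSES),
)
optimizer = torch.optim.Adam(preset_classification_model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(historical_image, depth_image, label_classification):
    """historical_image: (B, 3, H, W); depth_image: (B, 1, H, W);
    label_classification: (B,) long tensor of class indices."""
    training_image = torch.cat([historical_image, depth_image], dim=1)  # fusion by concatenation
    logits = preset_classification_model(training_image)
    loss = criterion(logits, label_classification)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```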
Optionally, the historical examination image comprises a historical image determined based on an endoscope image captured by the endoscope, and an enhanced image obtained by performing data enhancement on the historical image, wherein the data enhancement comprises one or more of the following: random flipping, random affine transformation, and color perturbation.
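One plausible realization of these three data-enhancement operations, using torchvision transforms with assumed parameter values, is shown below:

```python
from torchvision import transforms

# random flipping, random affine transformation, and color perturbation;
# all parameter values here are assumed.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomAffine(degrees=15, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.02),
])
# enhanced_image = augment(historical_image)   # historical_image: PIL image or tensor
```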
Optionally, the second determining module includes:
the first determining submodule is used for taking the image classification at the lowest level as the target image classification if the continuous accumulated quantity of the inspection images corresponding to the image classification at the lowest level in the target time period exceeds a preset threshold value;
and the second determining submodule is used for determining the target image classification according to the total accumulated quantity of the inspection images under each image classification in the target time period if the continuous accumulated quantity of the inspection images under the image classification corresponding to the lowest level in the target time period does not exceed the preset threshold.
Optionally, the second determining sub-module includes:
a third determining submodule, configured to determine a size relationship between a target ratio corresponding to a candidate image classification and a level threshold corresponding to the candidate image classification, where the target ratio is a ratio of a total cumulative number of inspection images under the candidate image classification to a target total number, the target total number is a sum of numbers of the inspection images in the target time period, and the candidate image classification is initially an image classification with a lowest level;
a fourth determining sub-module, configured to, if the target ratio corresponding to the candidate image classification is greater than or equal to the level threshold corresponding to the candidate image classification, take the candidate image classification as the target image classification;
a fifth determining submodule, configured to, if the target ratio corresponding to the candidate image classification is smaller than the level threshold corresponding to the candidate image classification, obtain a next image classification of the candidate image classification according to a sequence from low to high of the levels corresponding to the image classifications; if the next image classification is not the highest grade, taking the next image classification as a new candidate image classification, and triggering a third determination submodule to determine the size relation between the target ratio corresponding to the candidate image classification and the grade threshold corresponding to the candidate image classification; and if the next image classification is the highest grade, determining the next image classification as the target image classification.
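The decision procedure implemented by these submodules can be summarized as in the sketch below, where the classification names, the threshold values, and the reading of the continuous accumulated quantity as the longest consecutive run are assumptions:

```python
from itertools import groupby
from typing import Dict, List

# Assumed classifications from lowest to highest level, and assumed thresholds.
LEVELS: List[str] = ["extremely poor", "poor", "fair", "good"]
LEVEL_THRESHOLDS: Dict[str, float] = {"extremely poor": 0.2, "poor": 0.3, "fair": 0.4}
CONTINUOUS_THRESHOLD = 10

def target_image_classification(period_classifications: List[str]) -> str:
    """period_classifications: one classification per inspection image received
    in the target time period, in chronological order."""
    if not period_classifications:
        return LEVELS[-1]            # no images in the period; treated as best case here
    lowest = LEVELS[0]
    # longest consecutive run of the lowest-level classification
    # (assumed reading of the "continuous accumulated quantity")
    longest_run = max((len(list(group)) for level, group in groupby(period_classifications)
                       if level == lowest), default=0)
    if longest_run > CONTINUOUS_THRESHOLD:
        return lowest
    total = len(period_classifications)
    for level in LEVELS:                         # from lowest to highest level
        if level == LEVELS[-1]:                  # the highest level is returned directly
            return level
        ratio = period_classifications.count(level) / total
        if ratio >= LEVEL_THRESHOLDS[level]:
            return level
```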
Optionally, the obtaining module includes:
the acquisition submodule is used for acquiring an endoscope image shot by the endoscope in the process of withdrawing the endoscope;
and the sixth determining submodule is used for performing binary classification on the endoscope images according to an inspection image determination model and determining the images classified into the normal classification among the endoscope images as the inspection images, wherein the training samples of the inspection image determination model comprise positive samples of images corresponding to the normal classification and negative samples of images corresponding to one or more abnormal classifications.
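A sketch of this filtering step follows; the model architecture and the convention that class index 0 denotes the normal classification are assumptions:

```python
import torch
import torch.nn as nn

# Assumed binary classifier standing in for the inspection image determination
# model; class index 0 is taken to mean the normal classification.
inspection_image_determination_model = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),
)

def select_inspection_images(endoscope_frames):
    """Keep only frames classified as normal; endoscope_frames is an iterable of
    (1, 3, H, W) tensors captured during withdrawal of the endoscope."""
    inspection_images = []
    with torch.no_grad():
        for frame in endoscope_frames:
            predicted_class = inspection_image_determination_model(frame).argmax(dim=1).item()
            if predicted_class == 0:
                inspection_images.append(frame)
    return inspection_images
```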
Optionally, the apparatus further comprises:
and the output module is used for outputting prompt information under the condition that the grade corresponding to the target image classification is lower than a preset grade or is the same as the preset grade, wherein the prompt information is used for indicating that the missing detection risk exists.
Referring now to FIG. 6, a block diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire an inspection image of an endoscope; extract a depth image corresponding to the inspection image according to a depth map model, wherein the depth image is used for representing the structural information of the tissue corresponding to the inspection image; determine an image classification corresponding to the inspection image according to the depth image, the inspection image and an image classification model, wherein the image classification is used for representing a blind area proportion of the tissue corresponding to the inspection image; and, at intervals of a target time period, determine a target image classification corresponding to the endoscope operation within the target time period according to the image classifications corresponding to a plurality of inspection images received within the target time period, and output the target image classification.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of the module does not in some cases constitute a limitation of the module itself, and for example, the acquisition module may also be described as a "module that acquires an examination image of an endoscope".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides an artificial intelligence based endoscopic image processing method according to one or more embodiments of the present disclosure, wherein the method comprises:
acquiring an inspection image of an endoscope;
extracting a depth image corresponding to the inspection image according to a depth map model, wherein the depth image is used for representing the structural information of the tissue corresponding to the inspection image;
determining an image classification corresponding to the inspection image according to the depth image, the inspection image and an image classification model, wherein the image classification is used for representing a blind area proportion of a tissue corresponding to the inspection image;
and, at intervals of a target time period, determining a target image classification corresponding to the endoscope operation within the target time period according to the image classifications corresponding to a plurality of inspection images received within the target time period, and outputting the target image classification.
Example 2 provides the method of example 1, wherein the depth map model includes a plurality of feature extraction submodels connected in series, and the extracting a depth image corresponding to the inspection image according to the depth map model includes:
performing down-sampling on the inspection image to obtain a down-sampled image corresponding to the inspection image;
inputting a target image into the feature extraction submodel to obtain a feature map output by the feature extraction submodel, wherein if the feature extraction submodel is a first feature extraction submodel, the target image is the downsampled image, and if the feature extraction submodel is not the first feature extraction submodel, the target image is an image obtained by fusing the downsampled image and the feature map output by a previous feature extraction submodel of the feature extraction submodel;
and performing deconvolution operation on the feature map output by the last feature extraction sub-model to obtain the depth image.
Example 3 provides the method of example 1, wherein the depth map model comprises a plurality of feature extraction submodels connected in series, and a target loss of the depth map model in a training process is obtained by:
down-sampling a true value depth image corresponding to a training image input into the depth map model to obtain a true value feature image corresponding to each feature extraction sub-model, wherein the resolution of the feature map output by each feature extraction sub-model is the same as the resolution of the true value feature image corresponding to the feature extraction sub-model;
for each feature extraction submodel except the last feature extraction submodel, determining the intermediate loss corresponding to the feature extraction submodel according to the feature map output by the feature extraction submodel and the true-value feature image corresponding to the feature extraction submodel;
determining the prediction loss of the depth map model according to the depth image output by the depth map model and the true value depth image;
determining a sum of each of the intermediate losses and the predicted loss as a target loss for the depth map model.
Example 4 provides the method of example 1, wherein the image classification model is determined by:
acquiring a historical examination image corresponding to endoscope operation;
extracting a depth image corresponding to the historical inspection image according to the depth map model, and fusing the depth image corresponding to the historical inspection image with the historical inspection image to obtain a training image;
and taking the training image as the input of a preset classification model, taking the label classification corresponding to the historical inspection image as the target output of the preset classification model, and training the preset classification model to obtain the image classification model.
Example 5 provides the method of example 4, wherein the historical examination images include historical images determined based on endoscope images captured by the endoscope, and enhanced images obtained by performing data enhancement on the historical images, the data enhancement including one or more of the following: random flipping, random affine transformation, and color perturbation.
Example 6 provides the method of example 1, wherein the determining, at intervals of a target time period, a target image classification corresponding to the endoscope operation within the target time period according to the image classifications corresponding to a plurality of inspection images received within the target time period comprises:
if the continuous accumulated quantity of the inspection images corresponding to the lowest level of image classification in the target time period exceeds a preset threshold value, taking the lowest level of image classification as the target image classification;
and if the continuous accumulative quantity of the inspection images corresponding to the image classification of the lowest level in the target time period does not exceed the preset threshold, determining the target image classification according to the total accumulative quantity of the inspection images under each image classification in the target time period.
Example 7 provides the method of example 6, wherein the determining the target image classification according to the total accumulated number of inspection images under each image classification within the target period includes:
determining a size relation between a target ratio corresponding to a candidate image classification and a grade threshold corresponding to the candidate image classification, wherein the target ratio is the ratio of the total accumulated number of the inspection images under the candidate image classification to the total number of targets, the total number of the targets is the sum of the number of the inspection images in the target time period, and the candidate image classification is initially the image classification with the lowest grade;
if the target ratio corresponding to the candidate image classification is larger than or equal to the grade threshold corresponding to the candidate image classification, taking the candidate image classification as the target image classification;
if the target ratio corresponding to the candidate image classification is smaller than the grade threshold corresponding to the candidate image classification, acquiring the next image classification of the candidate image classification according to the sequence from low grade to high grade corresponding to the image classification;
if the next image classification is not the highest grade, taking the next image classification as a new candidate image classification, and re-executing the step of determining the size relation between the target ratio corresponding to the candidate image classification and the grade threshold corresponding to the candidate image classification; and if the next image classification is the highest grade, determining the next image classification as the target image classification.
Example 8 provides the method of example 1, wherein the acquiring an inspection image of an endoscope, according to one or more embodiments of the present disclosure, includes:
acquiring an endoscope image shot by the endoscope in the process of withdrawing the endoscope;
and performing binary classification on the endoscope images according to an inspection image determination model, and determining the images classified into the normal classification among the endoscope images as the inspection images, wherein a training sample of the inspection image determination model comprises a positive sample of an image corresponding to the normal classification and a negative sample of an image corresponding to one or more abnormal classifications.
Example 9 provides the method of any of examples 1-8, wherein the method further comprises, in accordance with one or more embodiments of the present disclosure:
and outputting prompt information under the condition that the grade corresponding to the target image classification is lower than a preset grade or is the same as the preset grade, wherein the prompt information is used for indicating that the missing detection risk exists.
Example 10 provides an artificial intelligence-based endoscopic image processing apparatus according to one or more embodiments of the present disclosure, wherein the apparatus comprises:
the acquisition module is used for acquiring an examination image of the endoscope;
the extraction module is used for extracting a depth image corresponding to the inspection image according to a depth map model, wherein the depth image is used for representing the structural information of the tissue corresponding to the inspection image;
a first determining module, configured to determine, according to the depth image, the inspection image, and an image classification model, an image classification corresponding to the inspection image, where the image classification is used to represent a blind area proportion of a tissue corresponding to the inspection image;
and the second determining module is used for determining the target image classification corresponding to the endoscope operation in the target time interval according to the image classification corresponding to the plurality of inspection images received in the target time interval at intervals of the target time interval, and outputting the target image classification.
Example 11 provides a computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, performs the steps of the method of any of examples 1-9, in accordance with one or more embodiments of the present disclosure.
Example 12 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method of any of examples 1-9.
The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims (12)

1. An artificial intelligence based endoscopic image processing method, the method comprising:
acquiring an inspection image of an endoscope;
extracting a depth image corresponding to the inspection image according to a depth map model, wherein the depth image is used for representing the structural information of the tissue corresponding to the inspection image;
determining an image classification corresponding to the inspection image according to the depth image, the inspection image and an image classification model, wherein the image classification is used for representing a blind area proportion of a tissue corresponding to the inspection image;
and, at intervals of a target time period, determining a target image classification corresponding to the endoscope operation within the target time period according to the image classifications corresponding to a plurality of inspection images received within the target time period, and outputting the target image classification.
2. The method of claim 1, wherein the depth map model comprises a plurality of feature extraction submodels connected in series, and the extracting the depth image corresponding to the inspection image according to the depth map model comprises:
performing down-sampling on the inspection image to obtain a down-sampled image corresponding to the inspection image;
inputting a target image into the feature extraction submodel to obtain a feature map output by the feature extraction submodel, wherein if the feature extraction submodel is a first feature extraction submodel, the target image is the downsampled image, and if the feature extraction submodel is not the first feature extraction submodel, the target image is an image obtained by fusing the downsampled image and the feature map output by a previous feature extraction submodel of the feature extraction submodel;
and performing deconvolution operation on the feature map output by the last feature extraction sub-model to obtain the depth image.
3. The method of claim 1, wherein the depth map model comprises a plurality of feature extraction submodels connected in series, and the target loss of the depth map model in the training process is obtained by:
down-sampling a true value depth image corresponding to a training image input into the depth map model to obtain a true value feature image corresponding to each feature extraction sub-model, wherein the resolution of the feature map output by each feature extraction sub-model is the same as the resolution of the true value feature image corresponding to the feature extraction sub-model;
for each feature extraction submodel except the last feature extraction submodel, determining the intermediate loss corresponding to the feature extraction submodel according to the feature map output by the feature extraction submodel and the true-value feature image corresponding to the feature extraction submodel;
determining the prediction loss of the depth map model according to the depth image output by the depth map model and the true value depth image;
determining a sum of each of the intermediate losses and the predicted loss as a target loss for the depth map model.
4. The method of claim 1, wherein the image classification model is determined by:
acquiring a historical examination image corresponding to endoscope operation;
extracting a depth image corresponding to the historical inspection image according to the depth map model, and fusing the depth image corresponding to the historical inspection image with the historical inspection image to obtain a training image;
and taking the training image as the input of a preset classification model, taking the label classification corresponding to the historical inspection image as the target output of the preset classification model, and training the preset classification model to obtain the image classification model.
5. The method of claim 4, wherein the historical examination images comprise historical images determined based on endoscope images captured by the endoscope, and enhanced images obtained by performing data enhancement on the historical images, the data enhancement comprising one or more of the following: random flipping, random affine transformation, and color perturbation.
6. The method of claim 1, wherein the determining, at intervals of the target time period, a target image classification corresponding to the endoscope operation within the target time period according to the image classifications corresponding to the plurality of inspection images received within the target time period comprises:
if the continuous accumulated quantity of the inspection images corresponding to the lowest level of image classification in the target time period exceeds a preset threshold value, taking the lowest level of image classification as the target image classification;
and if the continuous accumulative quantity of the inspection images corresponding to the image classification of the lowest level in the target time period does not exceed the preset threshold, determining the target image classification according to the total accumulative quantity of the inspection images under each image classification in the target time period.
7. The method of claim 6, wherein determining the target image classification based on the total cumulative number of inspection images under each image classification within the target time period comprises:
determining a size relation between a target ratio corresponding to a candidate image classification and a grade threshold corresponding to the candidate image classification, wherein the target ratio is the ratio of the total accumulated number of the inspection images under the candidate image classification to the total number of targets, the total number of the targets is the sum of the number of the inspection images in the target time period, and the candidate image classification is initially the image classification with the lowest grade;
if the target ratio corresponding to the candidate image classification is larger than or equal to the grade threshold corresponding to the candidate image classification, taking the candidate image classification as the target image classification;
if the target ratio corresponding to the candidate image classification is smaller than the grade threshold corresponding to the candidate image classification, acquiring the next image classification of the candidate image classification according to the sequence from low grade to high grade corresponding to the image classification;
if the next image classification is not the highest grade, taking the next image classification as a new candidate image classification, and re-executing the step of determining the size relation between the target ratio corresponding to the candidate image classification and the grade threshold corresponding to the candidate image classification; and if the next image classification is the highest grade, determining the next image classification as the target image classification.
8. The method of claim 1, wherein the acquiring an inspection image of an endoscope comprises:
acquiring an endoscope image shot by the endoscope in the process of withdrawing the endoscope;
and performing binary classification on the endoscope images according to an inspection image determination model, and determining the images classified into the normal classification among the endoscope images as the inspection images, wherein a training sample of the inspection image determination model comprises a positive sample of an image corresponding to the normal classification and a negative sample of an image corresponding to one or more abnormal classifications.
9. The method according to any one of claims 1-8, further comprising:
and outputting prompt information under the condition that the grade corresponding to the target image classification is lower than a preset grade or is the same as the preset grade, wherein the prompt information is used for indicating that the missing detection risk exists.
10. An artificial intelligence based endoscopic image processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring an examination image of the endoscope;
the extraction module is used for extracting a depth image corresponding to the inspection image according to a depth map model, wherein the depth image is used for representing the structural information of the tissue corresponding to the inspection image;
a first determining module, configured to determine, according to the depth image, the inspection image, and an image classification model, an image classification corresponding to the inspection image, where the image classification is used to represent a blind area proportion of a tissue corresponding to the inspection image;
and the second determining module is used for determining the target image classification corresponding to the endoscope operation in the target time interval according to the image classification corresponding to the plurality of inspection images received in the target time interval at intervals of the target time interval, and outputting the target image classification.
11. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1-9.
12. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 9.
CN202111653381.XA 2021-12-30 2021-12-30 Endoscope image processing method, apparatus, medium, and device based on artificial intelligence Pending CN114332033A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111653381.XA CN114332033A (en) 2021-12-30 2021-12-30 Endoscope image processing method, apparatus, medium, and device based on artificial intelligence
PCT/CN2022/139016 WO2023125008A1 (en) 2021-12-30 2022-12-14 Artificial intelligence-based endoscope image processing method and apparatus, medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111653381.XA CN114332033A (en) 2021-12-30 2021-12-30 Endoscope image processing method, apparatus, medium, and device based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN114332033A true CN114332033A (en) 2022-04-12

Family

ID=81019697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111653381.XA Pending CN114332033A (en) 2021-12-30 2021-12-30 Endoscope image processing method, apparatus, medium, and device based on artificial intelligence

Country Status (2)

Country Link
CN (1) CN114332033A (en)
WO (1) WO2023125008A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023125008A1 (en) * 2021-12-30 2023-07-06 小荷医疗器械(海南)有限公司 Artificial intelligence-based endoscope image processing method and apparatus, medium and device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958147B (en) * 2023-09-21 2023-12-22 青岛美迪康数字工程有限公司 Target area determining method, device and equipment based on depth image characteristics

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102013206911A1 (en) * 2013-04-17 2014-10-23 Siemens Aktiengesellschaft Method and apparatus for the stereoscopic display of image data
CN111062981B (en) * 2019-12-13 2023-05-05 腾讯科技(深圳)有限公司 Image processing method, device and storage medium
US11918178B2 (en) * 2020-03-06 2024-03-05 Verily Life Sciences Llc Detecting deficient coverage in gastroenterological procedures
CN114332033A (en) * 2021-12-30 2022-04-12 小荷医疗器械(海南)有限公司 Endoscope image processing method, apparatus, medium, and device based on artificial intelligence


Also Published As

Publication number Publication date
WO2023125008A1 (en) 2023-07-06

Similar Documents

Publication Publication Date Title
US20210158533A1 (en) Image processing method and apparatus, and storage medium
WO2023125008A1 (en) Artificial intelligence-based endoscope image processing method and apparatus, medium and device
CN110599421B (en) Model training method, video fuzzy frame conversion method, device and storage medium
CN111275721B (en) Image segmentation method and device, electronic equipment and storage medium
CN113487608B (en) Endoscope image detection method, endoscope image detection device, storage medium, and electronic apparatus
CN113469295B (en) Training method for generating model, polyp recognition method, device, medium, and apparatus
CN111144271B (en) Method and system for automatically identifying biopsy parts and biopsy quantity under endoscope
WO2023029741A1 (en) Tissue cavity locating method and apparatus for endoscope, medium and device
CN113470031B (en) Polyp classification method, model training method and related device
CN113470029B (en) Training method and device, image processing method, electronic device and storage medium
CN113470030B (en) Method and device for determining cleanliness of tissue cavity, readable medium and electronic equipment
CN109977832B (en) Image processing method, device and storage medium
CN114782388A (en) Endoscope advance and retreat time determining method and device based on image recognition
CN113496512B (en) Tissue cavity positioning method, device, medium and equipment for endoscope
WO2023138619A1 (en) Endoscope image processing method and apparatus, readable medium, and electronic device
CN114240867A (en) Training method of endoscope image recognition model, endoscope image recognition method and device
CN111311609B (en) Image segmentation method and device, electronic equipment and storage medium
CN110349108B (en) Method, apparatus, electronic device, and storage medium for processing image
WO2023165332A1 (en) Tissue cavity positioning method, apparatus, readable medium, and electronic device
CN114937178B (en) Multi-modality-based image classification method and device, readable medium and electronic equipment
CN114863124A (en) Model training method, polyp detection method, corresponding apparatus, medium, and device
CN114419400A (en) Training method, recognition method, device, medium and equipment of image recognition model
CN113470026B (en) Polyp recognition method, device, medium, and apparatus
CN116228715B (en) Training method of polyp detection model, polyp detection method and related device
CN114782390B (en) Determination method of detection model, polyp detection method, polyp detection device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination