WO2023030373A1 - Tissue cavity positioning method, apparatus, readable medium and electronic device - Google Patents

Tissue cavity positioning method, apparatus, readable medium and electronic device

Info

Publication number: WO2023030373A1
Application number: PCT/CN2022/116108
Authority: WIPO (PCT)
Prior art keywords: image, tissue, sample, sub-image, tissue image
Other languages: English (en), French (fr)
Inventors: 边成, 李永会, 赵家英, 石小周
Original assignee: 北京字节跳动网络技术有限公司
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2023030373A1

Classifications

    • G06T 7/0012 Biomedical image inspection (image analysis; inspection of images, e.g. flaw detection)
    • G06F 18/24 Classification techniques (pattern recognition; analysing)
    • G06N 3/045 Combinations of networks (neural network architectures)
    • G06N 3/08 Learning methods (neural networks)
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 2207/10068 Endoscopic image (image acquisition modality)
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30028 Colon; Small intestine (biomedical image processing)
    • G06T 2207/30092 Stomach; Gastric (biomedical image processing)

Definitions

  • The present disclosure relates to the technical field of image processing, and in particular to a method and apparatus for positioning a tissue cavity, a readable medium and an electronic device.
  • The endoscope is equipped with an optical lens, an image sensor, a light source and other components, and can enter the human body for inspection, allowing doctors to directly observe internal conditions; it has been widely used in the medical field. Since endoscopy is an invasive inspection method, an inexperienced operator or a poor viewing angle during insertion may make advancing the endoscope slow or even impossible, and may damage tissue and mucous membranes, causing the user pain and other problems. Therefore, it is necessary to accurately identify the position of the tissue cavity in the tissue images collected by the endoscope, so as to ensure that the endoscope is advanced safely and effectively.
  • the endoscope may be, for example, a colonoscope, a gastroscope, or the like.
  • For colonoscopy, it is necessary to identify the position of the intestinal lumen in the intestinal images collected by the colonoscope; for gastroscopy, it is necessary to identify the position of the esophageal or gastric lumen in the esophagus or stomach images collected by the gastroscope.
  • the position of the tissue cavity can be determined by performing image segmentation, position estimation and other processing on the tissue image.
  • However, a tissue cavity may not exist in the tissue image, resulting in low recognition accuracy or even recognition failure.
  • In a first aspect, the present disclosure provides a method for positioning a tissue cavity, the method comprising: acquiring a tissue image collected by an endoscope at the current moment; and classifying the tissue image using a pre-trained classification model to determine a target type of the tissue image;
  • if the target type indicates that there is a tissue cavity in the tissue image, determining the position of the tissue cavity in the tissue image according to a pre-trained first positioning model and the tissue image;
  • if the target type indicates that there is no tissue cavity in the tissue image, determining the position of the tissue cavity in the tissue image according to a pre-trained second positioning model, the tissue image and a historical tissue image, the historical tissue image being an image collected by the endoscope before the current moment.
  • In a second aspect, the present disclosure provides a device for positioning a tissue cavity, the device comprising:
  • An acquisition module configured to acquire tissue images collected by the endoscope at the current moment
  • a classification module configured to classify the tissue image using a pre-trained classification model to determine the target type of the tissue image
  • the first positioning module is configured to determine the position of the tissue cavity in the tissue image according to the pre-trained first positioning model and the tissue image if the target type indicates that there is a tissue cavity in the tissue image;
  • a second positioning module configured to, if the target type indicates that there is no tissue cavity in the tissue image, determine the position of the tissue cavity in the tissue image according to a pre-trained second positioning model, the tissue image and a historical tissue image, the historical tissue image being an image collected by the endoscope before the current moment.
  • In a third aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processing device, implements the steps of the method described in the first aspect of the present disclosure.
  • In a fourth aspect, the present disclosure provides an electronic device, including: a storage device on which a computer program is stored; and a processing device configured to execute the computer program in the storage device to implement the steps of the method described in the first aspect of the present disclosure.
  • Through the above technical solution, the present disclosure first acquires the tissue image collected by the endoscope at the current moment, and then uses the classification model to classify the tissue image to determine its target type. If the target type indicates that there is a tissue cavity in the tissue image, the position of the tissue cavity in the tissue image is determined according to the first positioning model and the tissue image; if the target type indicates that there is no tissue cavity in the tissue image, the position of the tissue cavity in the tissue image is determined according to the second positioning model, the tissue image, and the historical tissue images collected by the endoscope before the current moment. The present disclosure first classifies the tissue image and then, according to the classification result, selects different positioning methods to locate the position of the tissue cavity in the tissue image, which can improve the success rate and accuracy of the positioning.
  • Fig. 1 is a flow chart of a method for locating a tissue cavity according to an exemplary embodiment
  • Fig. 2 is a flowchart of another method for positioning a tissue cavity according to an exemplary embodiment
  • Fig. 3 is a flow chart of another method for positioning a tissue cavity according to an exemplary embodiment
  • Fig. 4 is a schematic diagram of a classification model shown according to an exemplary embodiment
  • Fig. 5 is a flow chart of another method for positioning a tissue cavity according to an exemplary embodiment
  • Fig. 6 is a schematic diagram of a first positioning model according to an exemplary embodiment
  • Fig. 7 is a flowchart of another method for positioning a tissue cavity according to an exemplary embodiment
  • Fig. 8 is a schematic diagram showing a second positioning model according to an exemplary embodiment
  • Fig. 9 is a flow chart of training a classification model according to an exemplary embodiment
  • Fig. 10 is a flow chart of training a first positioning model according to an exemplary embodiment
  • Fig. 11 is a flow chart of training a second positioning model according to an exemplary embodiment
  • Fig. 12 is a block diagram of a positioning device for a tissue cavity according to an exemplary embodiment
  • Fig. 13 is a block diagram of another device for positioning a tissue cavity according to an exemplary embodiment
  • Fig. 14 is a block diagram of another device for positioning a tissue cavity according to an exemplary embodiment
  • Fig. 15 is a block diagram of another tissue cavity positioning device according to an exemplary embodiment
  • Fig. 16 is a block diagram of another device for positioning a tissue cavity according to an exemplary embodiment
  • Fig. 17 is a block diagram of an electronic device according to an exemplary embodiment.
  • the term “comprise” and its variations are open-ended, ie “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • Fig. 1 is a flow chart of a method for locating a tissue cavity according to an exemplary embodiment. As shown in Fig. 1 , the method includes the following steps:
  • Step 101 acquire the tissue image collected by the endoscope at the current moment.
  • Step 102 using a pre-trained classification model to classify the tissue image, so as to determine the target type of the tissue image.
  • the endoscope will be inserted into the tissue, and images in the tissue will be continuously collected according to a preset collection period.
  • the tissue image collected by the endoscope can be input into the pre-trained classification model, so that the classification model can classify the tissue image, and the output of the classification model is the target type of the tissue image.
  • The target type may include a first type and a second type, the first type being used to indicate that there is a tissue cavity in the tissue image and the second type being used to indicate that there is no tissue cavity in the tissue image; the target type may also include a third type, which indicates that the quality of the tissue image is too low.
  • The classification model is used to identify the type of the input image, and can be trained according to a large number of pre-collected training images and the type label corresponding to each training image, where the type label is used to indicate the true type of the training image.
  • The classification model can be, for example, a CNN (Convolutional Neural Network) or an LSTM (Long Short-Term Memory network), or it can be an Encoder in a Transformer (such as a Vision Transformer), which is not specifically limited in the present disclosure.
  • Step 103 if the target type indicates that there is a tissue cavity in the tissue image, determine the position of the tissue cavity in the tissue image according to the pre-trained first positioning model and the tissue image.
  • Step 104 if the target type indicates that there is no tissue cavity in the tissue image, determine the position of the tissue cavity in the tissue image according to the pre-trained second positioning model, the tissue image and the historical tissue image, the historical tissue image being an image collected by the endoscope before the current moment.
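  • As an illustration, the dispatch logic of steps 102 to 104 can be sketched as follows; the model objects and type constants are hypothetical placeholders, not names from the present disclosure:

```python
# Minimal sketch of steps 101-104: classify the current frame, then pick a
# positioning model. classify_model, locator_a and locator_b are illustrative
# callables standing in for the classification and positioning models.
HAS_CAVITY, NO_CAVITY, LOW_QUALITY = 0, 1, 2

def locate_cavity(frame, history, classify_model, locator_a, locator_b):
    target_type = classify_model(frame)          # step 102
    if target_type == HAS_CAVITY:                # step 103
        return locator_a(frame)
    if target_type == NO_CAVITY:                 # step 104
        return locator_b(frame, history)
    return None  # low-quality frame: fall back to history (see step 105)
```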
  • If the target type indicates that there is a tissue cavity in the tissue image, the tissue image can be input into the pre-trained first positioning model, and the first positioning model locates the tissue cavity in the tissue image; the output of the first positioning model is the position of the tissue cavity in the tissue image, which can be understood as the coordinates of the tissue cavity in the tissue image.
  • the first positioning model is used to locate the position of the tissue cavity in the input image, and the first positioning model can be trained according to a large number of pre-collected training images and the real position of the tissue cavity in each training image.
  • the first positioning model may be, for example, CNN or LSTM, or an Encoder in a Transformer (such as a Vision Transformer), which is not specifically limited in the present disclosure.
  • If the target type indicates that there is no tissue cavity in the tissue image, the tissue image and the historical tissue images collected by the endoscope before the current moment can be input together into the pre-trained second positioning model, which locates the tissue cavity in the tissue image; the output of the second positioning model is the position of the tissue cavity in the tissue image, which can be understood as the coordinates of the tissue cavity in the tissue image.
  • The second positioning model is used to locate the position of the tissue cavity in a set of input images. A large number of training images can be collected in advance and divided into multiple training image groups according to acquisition time, and the second positioning model can then be trained according to each training image group and the real position of the tissue cavity in each training image group.
  • the second positioning model may be, for example, CNN or LSTM, or an Encoder in a Transformer (such as a Vision Transformer), which is not specifically limited in the present disclosure.
  • the historical tissue image may be one frame or multiple frames, and the historical tissue image may be determined according to a preset number of frames, or may be determined according to a preset time window.
  • For example, the 4 frames of images continuously collected by the endoscope before the current moment can be used as the historical tissue images, or the images continuously collected by the endoscope within 5 s before the current moment can be used as the historical tissue images (if the collection period is 1 s, the historical tissue images then include 5 frames).
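  • A minimal sketch of the two selection strategies, assuming frames are buffered with their timestamps (all names illustrative):

```python
from collections import deque

# Sketch of the two ways of selecting historical frames described above:
# a fixed frame count or a fixed time window.
FRAME_COUNT = 4          # "4 frames before the current moment"
TIME_WINDOW_S = 5.0      # "images collected within 5 s"

frames = deque()         # items: (timestamp, image)

def push(timestamp, image):
    frames.append((timestamp, image))

def history_by_count():
    return [img for _, img in list(frames)[-FRAME_COUNT:]]

def history_by_window(now):
    return [img for t, img in frames if now - t <= TIME_WINDOW_S]
```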
  • The endoscope described in the embodiments of the present disclosure can be, for example, a colonoscope or a gastroscope. If the endoscope is a colonoscope, the above-mentioned tissue image is an intestinal image and the tissue cavity is the intestinal lumen. If the endoscope is a gastroscope, the above-mentioned tissue image may be an image of the esophagus, stomach or duodenum, and correspondingly the tissue cavity may be the cavity of the esophagus, the stomach or the duodenum. In the present disclosure, the endoscope can also be used to collect images of other tissues with cavities to locate the positions of those cavities, which is not specifically limited in the present disclosure.
  • Through the above technical solution, the present disclosure first obtains the tissue image collected by the endoscope at the current moment, and then uses the classification model to classify the tissue image to determine its target type. If the target type indicates that there is a tissue cavity in the tissue image, the position of the tissue cavity in the tissue image is determined according to the first positioning model and the tissue image; if the target type indicates that there is no tissue cavity in the tissue image, the position of the tissue cavity in the tissue image is determined according to the second positioning model, the tissue image, and the historical tissue images collected by the endoscope before the current moment. The present disclosure first classifies the tissue image and then, according to the classification result, selects different positioning methods to locate the position of the tissue cavity in the tissue image, which can improve the success rate and accuracy of the positioning.
  • Fig. 2 is a flow chart of another method for positioning a tissue cavity according to an exemplary embodiment. As shown in Fig. 2, the method may further include:
  • Step 105 if the target type indicates that the quality of the tissue image does not meet a preset condition, determine the advancing direction of the endoscope according to the historical tissue images, so as to control the endoscope to advance in that direction.
  • Step 106 when the position of the tissue cavity in the tissue image is obtained, determine the advancing direction of the endoscope according to the position of the tissue cavity in the tissue image, so as to control the endoscope to advance in that direction.
  • If the target type indicates that the quality of the tissue image does not meet the preset condition, that is, the target type is the third type, the quality of the tissue image is poor and it contains too little effective information to locate the tissue cavity from it.
  • In this case, the advancing direction of the endoscope can be determined from the historical tissue images; for example, the advancing direction is determined as facing the position of the tissue cavity in the historical tissue images, and the endoscope is then advanced in that direction.
  • Taking a colonoscope as an example, the preset condition may include at least one of the following: the colonoscope is not blocked when the intestinal image is collected, the distance between the colonoscope and the intestinal wall is greater than a preset threshold, the exposure of the intestinal image is less than a preset exposure threshold, the blurriness of the intestinal image is less than a preset blurriness threshold, no intestinal adhesion occurs in the intestinal image, and the like.
  • Conversely, if the intestinal tract is covered by sewage, the colonoscope is too close to the intestinal wall, the intestinal image is overexposed or too blurred, or adhesions occur in the intestinal tract, the quality of the intestinal image does not meet the preset condition.
  • Further, when the position of the tissue cavity in the tissue image is obtained, the advancing direction of the endoscope can be determined according to that position, so as to control the endoscope to advance in that direction.
  • For example, the advancing direction may be determined as facing the position of the tissue cavity in the tissue image, and the endoscope is then advanced in that direction.
  • The advancing distance of the endoscope can also be controlled according to how the position of the tissue cavity in the tissue image was obtained.
  • For example, the advancing distance can be divided into three grades from short to long, the first grade being the shortest and the third grade the longest.
  • If the advancing direction is determined from the historical tissue images (step 105), the advancing distance can be of the first grade; if the position of the tissue cavity in the tissue image is obtained according to step 104, the advancing distance can be of the second grade; and if the position is obtained according to step 103, the advancing distance can be of the third grade.
  • Fig. 3 is a flow chart of another method for locating a tissue cavity according to an exemplary embodiment.
  • The structure of the classification model may be as shown in (a) of Fig. 4, where the tissue image is taken as an intestinal image as an example.
  • The classification model may include an encoder and a classification layer, and may also include a linear projection layer.
  • The encoder can be the Encoder in a Vision Transformer, the classification layer can be an MLP head (Multilayer Perceptron head), and the linear projection layer (Linear Projection) can be understood as a fully connected layer.
  • step 102 may include:
  • Step 1021 perform preprocessing on the tissue image, and divide the preprocessed tissue image into multiple sub-images of equal size.
  • Specifically, the tissue image is first preprocessed to augment the data included in the tissue image.
  • The preprocessing may include random affine transformation; random brightness, contrast, saturation and hue adjustment; size transformation (Resize) and other processing, and the resulting preprocessed tissue image may be an image of a specified size (for example, 224×224).
  • The preprocessed tissue image can then be divided into multiple sub-images of equal size (which may be denoted as patches) according to a specified patch size. For example, if the preprocessed tissue image is 224×224 and the patch size is 16×16, the preprocessed tissue image can be divided into 196 sub-images.
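  • A minimal sketch of this patch split, assuming a 224×224 RGB image and 16×16 patches (names illustrative):

```python
import numpy as np

# Split a 224x224 RGB image into 16x16 patches: a 14x14 grid, i.e. 196
# sub-images, matching the example above. Pure NumPy, no framework needed.
def split_into_patches(image: np.ndarray, patch: int = 16) -> np.ndarray:
    h, w, c = image.shape            # e.g. (224, 224, 3)
    gh, gw = h // patch, w // patch  # 14 x 14 grid
    patches = (image
               .reshape(gh, patch, gw, patch, c)
               .transpose(0, 2, 1, 3, 4)         # (gh, gw, patch, patch, c)
               .reshape(gh * gw, patch, patch, c))
    return patches                   # (196, 16, 16, 3)

patches = split_into_patches(np.zeros((224, 224, 3), dtype=np.float32))
assert patches.shape == (196, 16, 16, 3)
```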
  • Step 1022 Determine the token corresponding to each sub-image according to the image vector corresponding to each sub-image and the position vector corresponding to the sub-image.
  • the position vector is used to indicate the position of the sub-image in the preprocessed tissue image.
  • For example, the linear projection layer can be used to first flatten each sub-image into a one-dimensional vector, and then apply a linear transformation to that vector (which can be understood as passing it through a fully connected layer) to reduce its dimensionality, obtaining the image vector corresponding to the sub-image (which may be denoted as the patch embedding); the image vector can represent the sub-image.
  • the 9 blocks output by Linear Projection are image vectors.
  • a position vector (may be expressed as position embedding) for indicating the position of the sub-image in the preprocessed tissue image may also be generated, where the size of the position embedding is the same as that of the patch embedding.
  • the 9 blocks identified by numbers 1-9 are the position embeddings corresponding to each sub-image.
  • the position embedding can be randomly generated, and the encoder can learn the representation of the position of the corresponding sub-image in the tissue image.
  • Then, from the image vector and the position vector of each sub-image, the token corresponding to the sub-image can be generated; for example, the token may be obtained by concatenating (concat) the image vector and the position vector of the sub-image.
  • In addition, a classification token can be randomly generated. For example, an image vector (i.e., the block identified by the symbol "#") and a position vector (i.e., the block identified by the number 0) can be randomly generated and concatenated to serve as the classification token.
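  • A minimal sketch of this token construction, concatenating patch and position embeddings as described above (all names and sizes illustrative; in practice the projection and the position embeddings are learned rather than fixed random values):

```python
import numpy as np

# Sketch of step 1022 plus the classification token, for the 224x224 / 16x16
# example: 196 patches, flattened length 16*16*3 = 768.
rng = np.random.default_rng(0)
D = 128                                   # embedding size (illustrative)
W = rng.normal(size=(768, D)) * 0.02      # linear projection (trainable)

def make_tokens(patches: np.ndarray) -> np.ndarray:
    flat = patches.reshape(len(patches), -1)      # (196, 768)
    patch_emb = flat @ W                          # (196, D) patch embeddings
    pos_emb = rng.normal(size=patch_emb.shape)    # (196, D), learned in practice
    tokens = np.concatenate([patch_emb, pos_emb], axis=-1)  # concat, (196, 2D)
    cls_token = rng.normal(size=(1, 2 * D))       # random classification token
    return np.concatenate([cls_token, tokens], axis=0)      # (197, 2D)
```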
  • Step 1023 Input the token corresponding to each sub-image and the randomly generated classification token into the encoder to obtain a local encoding vector corresponding to each sub-image and a global encoding vector corresponding to the tissue image.
  • Step 1024 input the global encoding vector and multiple local encoding vectors into the classification layer, so as to obtain the target type output by the classification layer.
  • Specifically, the token corresponding to each sub-image and the classification token can be input into the encoder; the encoder generates the local encoding vector corresponding to each sub-image from the token of that sub-image, and also generates the global encoding vector corresponding to the tissue image from the tokens of all sub-images.
  • the local encoding vector can be understood as a vector learned by the encoder and can represent the corresponding sub-image
  • the global encoding vector can be understood as the vector learned by the encoder and can represent the entire tissue image.
  • Multiple encoders may be included in the classification model; the token corresponding to each sub-image and the classification token can be input into each encoder, and each encoder outputs the local encoding vector corresponding to each sub-image and the global encoding vector corresponding to the tissue image.
  • the structure of each encoder can be shown in (b) in Figure 4. The patch embedding and position embedding are spliced and input into the encoder.
  • The encoder includes Multi-Head Attention, Norm & Add and Position-wise FFN structures. Multi-Head Attention splits the token (patch embedding + position embedding) into h groups, feeds them into h attention structures respectively, concatenates the results, and normalizes them with Norm & Add. Since a residual structure is also added to the encoder, the result after the Norm & Add processing is added to the result before processing and then enters the Position-wise FFN; its output again passes through Norm & Add and is added to the residual data to obtain the encoder's output, namely the global encoding vector and the multiple local encoding vectors.
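  • A minimal sketch of one such encoder block as a standard Transformer encoder layer (hyperparameters illustrative):

```python
import torch
import torch.nn as nn

# Multi-head attention with residual + Norm & Add, then a position-wise
# feed-forward network with residual + Norm & Add, as described above.
class EncoderBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, 197, dim) -- classification token then patch tokens
        attn_out, _ = self.attn(tokens, tokens, tokens)
        x = self.norm1(tokens + attn_out)      # Norm & Add (residual)
        return self.norm2(x + self.ffn(x))     # Position-wise FFN + Norm & Add

x = torch.randn(1, 197, 256)
y = EncoderBlock()(x)   # global vector: y[:, 0]; local vectors: y[:, 1:]
```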
  • the global encoding vector and multiple local encoding vectors are input into the classification layer, and the output of the classification layer is the target type.
  • the global encoding vector and multiple local encoding vectors output by each encoder can be input into the classification layer, and the classification layer outputs the target type.
  • Specifically, the global encoding vector and the multiple local encoding vectors can be concatenated to obtain a comprehensive encoding vector, and the comprehensive encoding vector is input into the classification layer; the classification layer determines, from the comprehensive encoding vector, the probability that the tissue image matches each type, and finally the type with the highest matching probability is taken as the target type.
  • Since the input of the classification layer includes both the global encoding vector and each local encoding vector, the features of the entire tissue image and of each sub-image are integrated; that is, both global and local information are considered, which can effectively improve the classification accuracy of the classification model.
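  • A minimal sketch of this classification layer, concatenating the global and local encoding vectors before an MLP head that scores the three types (all sizes illustrative):

```python
import torch
import torch.nn as nn

NUM_TYPES = 3           # has cavity / no cavity / low quality
DIM, PATCHES = 256, 196

# MLP head over the concatenation of the global vector and all local vectors.
head = nn.Sequential(
    nn.Linear(DIM * (PATCHES + 1), 512), nn.GELU(), nn.Linear(512, NUM_TYPES))

encoded = torch.randn(1, PATCHES + 1, DIM)     # encoder output
combined = encoded.flatten(1)                  # concat global + local vectors
target_type = head(combined).argmax(dim=-1)    # type with highest probability
```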
  • Fig. 5 is a flow chart of another method for positioning a tissue cavity according to an exemplary embodiment.
  • The structure of the first positioning model may be as shown in (a) of Fig. 6, where the tissue image is taken as an intestinal image as an example.
  • The first positioning model includes multiple first encoders, a point regression sub-model and a heat map sub-model, and may also include a linear projection layer.
  • The first encoder can be the Encoder in a Vision Transformer, and the linear projection layer can be understood as a fully connected layer.
  • the implementation of step 103 may include:
  • Step 1031 perform preprocessing on the tissue image, and divide the preprocessed tissue image into multiple sub-images of equal size.
  • Step 1032 according to the image vector corresponding to each sub-image and the position vector corresponding to the sub-image, determine the token corresponding to the sub-image, and the position vector is used to indicate the position of the sub-image in the preprocessed tissue image.
  • Step 1033 input the token corresponding to each sub-image and a randomly generated first positioning token into each first encoder, so as to obtain, as the output of the first encoder, the local encoding vector corresponding to each sub-image and the global encoding vector corresponding to the tissue image.
  • a first positioning token may also be randomly generated.
  • an image vector and a position vector may be randomly generated and concatenated to serve as the first positioning token.
  • Specifically, the token corresponding to each sub-image and the first positioning token can be input into each first encoder; each first encoder generates the local encoding vector corresponding to each sub-image from the token of that sub-image, and also generates the global encoding vector corresponding to the tissue image from the tokens of all sub-images.
  • the local coding vector can be understood as a vector learned by the first encoder and can represent the corresponding sub-image
  • the global coding vector can be understood as a vector learned by the first encoder and can represent the whole tissue image.
  • the structure of the first encoder is the same as the structure of the encoder shown in (b) in FIG. 4 , and will not be repeated here.
  • Step 1034 input the global encoding vector output by each first encoder into the point regression sub-model to obtain the regression coordinates output by the point regression sub-model.
  • the global encoding vector output by each first encoder may be input into the point regression sub-model, so as to obtain the regression coordinates output by the point regression sub-model.
  • Specifically, the structure of the point regression sub-model can be as shown in (a) of Fig. 6, where a global encoding vector is denoted as a regression token. From the multiple regression tokens, a set of x coordinates (the x list) and a set of y coordinates (the y list) are obtained; the x list is input into an MLP to obtain a single x coordinate, the y list is input into an MLP to obtain a single y coordinate, and finally the x coordinate and the y coordinate are combined into the regression coordinates (i.e., regression). The regression coordinates can be understood as the position coordinates of the tissue cavity in the tissue image as determined by the point regression sub-model.
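  • A minimal sketch of the point regression sub-model; how each regression token yields an (x, y) pair is not spelled out above, so a linear layer is assumed here (all names and sizes illustrative):

```python
import torch
import torch.nn as nn

# N global encoding vectors (regression tokens) yield an x list and a y list,
# each reduced by a small MLP to a single coordinate.
N_ENCODERS, DIM = 4, 256

to_xy = nn.Linear(DIM, 2)                       # assumed: one (x, y) per token
mlp_x = nn.Sequential(nn.Linear(N_ENCODERS, 16), nn.ReLU(), nn.Linear(16, 1))
mlp_y = nn.Sequential(nn.Linear(N_ENCODERS, 16), nn.ReLU(), nn.Linear(16, 1))

reg_tokens = torch.randn(N_ENCODERS, DIM)       # one global vector per encoder
xy = to_xy(reg_tokens)                          # (N, 2)
x_list, y_list = xy[:, 0], xy[:, 1]             # x list and y list
regression = torch.stack([mlp_x(x_list)[0], mlp_y(y_list)[0]])  # (x, y)
```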
  • Step 1035 input the output of each first encoder into the heat map sub-model to obtain a heat map output by the heat map sub-model.
  • Specifically, the local encoding vectors corresponding to each sub-image output by each first encoder can be input into the heat map sub-model, so as to obtain the heat map (Heatmap) output by the heat map sub-model.
  • the structure of the heat map sub-model can be shown in (a) in Figure 6, where the local encoding vector is represented as an image token.
  • First, the two-dimensional image token corresponding to each sub-image output by each first encoder is transformed into three-dimensional space through a reshape operation.
  • For example, if the size of the tissue image is 224×224 and the size of each sub-image is 16×16, the tissue image is divided into 14×14 sub-images, and the reshape operation transforms the image token corresponding to each sub-image into a vector of size 16×16×3.
  • Then, the 16×16×3 vectors are encoded to 512 dimensions through a Linear Projection operation, obtaining 196 512-dimensional vectors.
  • Afterwards, a reshape operation converts the 196 512-dimensional vectors into a three-dimensional vector of shape (14, 14, 512), and a 1×1 convolution further transforms the feature channels, turning the (14, 14, 512)-dimensional vector into a (14, 14, 1)-dimensional vector.
  • The feature map corresponding to the local encoding vectors of each first encoder is then obtained through bilinear upsampling (Upsample); the dimension of each feature map is (224, 224, 1).
  • Finally, the feature maps corresponding to the first encoders are spliced to obtain a (224, 224, N)-dimensional vector, where N is the number of first encoders, and two convolution layers transform the (224, 224, N)-dimensional vector into a (224, 224, 1)-dimensional vector, which is the heat map output by the heat map sub-model.
  • the coordinates of the point with the highest brightness in the heat map are the position coordinates of the tissue cavity in the tissue image determined by the heat map sub-model, namely the heat coordinates mentioned later.
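  • A minimal sketch of this pipeline for N first encoders, assuming each image token reshapes to 16×16×3 = 768 values (layer sizes illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Per-encoder tokens -> 512-d -> (14,14,512) -> 1x1 conv -> bilinear upsample
# to 224x224 -> splice the N maps -> two convolutions -> heat map.
N, GRID = 4, 14

proj = nn.Linear(16 * 16 * 3, 512)              # "Linear Projection" to 512-d
squeeze = nn.Conv2d(512, 1, kernel_size=1)      # 1x1 conv over channels
fuse = nn.Sequential(                           # two convolution layers
    nn.Conv2d(N, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1))

def heatmap(per_encoder_tokens):                # list of N (196, 768) tensors
    maps = []
    for tok in per_encoder_tokens:
        x = proj(tok)                           # (196, 512)
        x = x.t().reshape(1, 512, GRID, GRID)   # (1, 512, 14, 14)
        x = squeeze(x)                          # (1, 1, 14, 14)
        x = F.interpolate(x, size=(224, 224), mode="bilinear",
                          align_corners=False)  # (1, 1, 224, 224)
        maps.append(x)
    return fuse(torch.cat(maps, dim=1))         # (1, 1, 224, 224) heat map

h = heatmap([torch.randn(196, 768) for _ in range(N)])
```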
  • Step 1036 determine the position of the tissue cavity in the tissue image according to the regression coordinates and the thermal coordinates, where the thermal coordinates are the coordinates of the point with the highest brightness in the thermal map.
  • the two coordinates can be mutually verified to determine the position of the tissue cavity in the tissue image.
  • the position of the tissue cavity may be marked in the tissue image as an output of the method for locating the tissue cavity provided in the present disclosure, as shown in (b) of FIG. 6 .
  • In a possible implementation, step 1036 may be: if the distance between the regression coordinates and the thermal coordinates is less than a preset distance threshold, determining the coordinates of the tissue cavity in the tissue image according to the regression coordinates and the thermal coordinates.
  • Correspondingly, the method may also include: if the distance between the regression coordinates and the thermal coordinates is greater than or equal to the distance threshold, determining the advancing direction of the endoscope according to the historical tissue images, so as to control the endoscope to advance in that direction.
  • Specifically, the distance between the regression coordinates and the thermal coordinates can first be determined, which can be understood as the difference between the two coordinates, and then compared with a preset distance threshold. If the distance between the regression coordinates and the thermal coordinates is less than the preset distance threshold, it indicates that the confidence of the regression coordinates and the thermal coordinates is high, and the coordinates of the tissue cavity in the tissue image can then be determined according to the regression coordinates and the thermal coordinates.
  • For example, either the regression coordinates or the thermal coordinates can be used as the coordinates of the tissue cavity in the tissue image; or the midpoint of the line connecting the regression coordinates and the thermal coordinates can be used; or both the regression coordinates and the thermal coordinates can be used as the coordinates of the tissue cavity, that is, both are output at the same time.
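  • A minimal sketch of this mutual verification, taking the midpoint as the combined output (the threshold value is illustrative, not from the present disclosure):

```python
import math

# Accept the prediction only when the regression coordinates and the thermal
# coordinates agree within a threshold; otherwise signal low confidence.
DIST_THRESHOLD = 20.0   # pixels; illustrative value

def verify(regression, thermal):
    dist = math.dist(regression, thermal)
    if dist < DIST_THRESHOLD:
        return ((regression[0] + thermal[0]) / 2,
                (regression[1] + thermal[1]) / 2)
    return None   # low confidence: discard frame, steer from history instead
```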
  • If the distance between the regression coordinates and the thermal coordinates is greater than or equal to the preset distance threshold, the confidence of the two coordinates is low; the tissue image can be discarded, and the advancing direction of the endoscope can be determined from the historical tissue images so as to control the endoscope to advance in that direction. Specifically, the advancing direction can be determined as facing the position of the tissue cavity in the historical tissue images, and the endoscope is then advanced in that direction.
  • Fig. 7 is a flow chart of another method for positioning a tissue cavity according to an exemplary embodiment.
  • The structure of the second positioning model may be as shown in Fig. 8, where the tissue image is taken as an intestinal image as an example. The second positioning model includes multiple second encoders, a point regression sub-model and a heat map sub-model, and may also include a linear projection layer.
  • the second encoder can be the Encoder in the Vision Transformer, and the linear projection layer can be understood as a fully connected layer.
  • step 104 may include:
  • Step 1041 preprocess the tissue image and the historical tissue images, divide the preprocessed tissue image into multiple sub-images of equal size, and divide each preprocessed historical tissue image into multiple historical sub-images equal in size to, and corresponding in position to, the sub-images.
  • Step 1042 take each sub-image and the historical sub-images corresponding to it in position as an image group.
  • the way of preprocessing and dividing the tissue image in step 1041 and the way of preprocessing and dividing the historical tissue image are the same as the way in step 1021, and will not be repeated here.
  • In this way, multiple historical sub-images having the same size as, and positions corresponding to, the sub-images are obtained.
  • For example, if the preprocessed tissue image and a historical tissue image are both 224×224 and are divided according to 16×16, 196 sub-images and 196 historical sub-images are obtained; each historical sub-image corresponds to one sub-image, and the position of a historical sub-image in the historical tissue image is the same as the position of the corresponding sub-image in the tissue image.
  • If there are multiple historical tissue images, each historical tissue image is divided into multiple historical sub-images, and each sub-image corresponds to one historical sub-image in each of the historical tissue images.
  • The sub-image and the historical sub-images corresponding to it in position can then be taken as an image group.
  • For example, if the preprocessed tissue image and 5 preprocessed historical tissue images are all 224×224 and are divided according to 16×16, 196 sub-images are obtained, and each preprocessed historical tissue image is divided into 196 historical sub-images (5×196 historical sub-images in total); each sub-image then corresponds to 5 historical sub-images, and the sub-image together with its 5 corresponding historical sub-images can be taken as an image group.
  • Step 1043 Determine the token corresponding to the image group according to the image vector corresponding to each image group and the position vector corresponding to the image group, and the position vector is used to indicate the corresponding position of the image group.
  • Specifically, the images in each image group may first be spliced along the channel dimension to obtain a spliced image. The linear projection layer is then used to flatten the spliced image into a one-dimensional vector and apply a linear transformation to it (which can be understood as passing it through a fully connected layer), reducing its dimensionality to obtain the image vector corresponding to the image group (which may be denoted as the patch embedding); the image vector can represent the image group.
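  • A minimal sketch of this channel-wise splicing and projection for one image group of a sub-image plus 5 historical sub-images (all sizes illustrative):

```python
import torch
import torch.nn as nn

# A 16x16x3 sub-image and its 5 historical sub-images are spliced along the
# channel dimension (16x16x18), flattened, and linearly projected.
PATCH, HIST, DIM = 16, 5, 256

proj = nn.Linear(PATCH * PATCH * 3 * (HIST + 1), DIM)

def group_embedding(sub_image, historical_subs):
    # sub_image: (16, 16, 3); historical_subs: list of 5 tensors (16, 16, 3)
    spliced = torch.cat([sub_image, *historical_subs], dim=-1)  # (16, 16, 18)
    return proj(spliced.flatten())                              # (DIM,)

emb = group_embedding(torch.randn(PATCH, PATCH, 3),
                      [torch.randn(PATCH, PATCH, 3) for _ in range(HIST)])
```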
  • a position vector (may be expressed as position embedding) for indicating the position corresponding to the image group in the preprocessed tissue image may also be generated, where the size of the position embedding is the same as that of the patch embedding.
  • the position embedding can be randomly generated, and the second encoder can learn the representation of the corresponding position of the image group in the tissue image.
  • Then, from the image vector and the position vector of each image group, the token corresponding to the image group can be generated; for example, the token may be obtained by concatenating the image vector and the position vector of the image group.
  • a second positioning token may be randomly generated.
  • an image vector and a position vector may be randomly generated and concatenated to serve as the second positioning token.
  • Step 1044 input the token corresponding to each image group and a randomly generated second positioning token into each second encoder, so as to obtain the local encoding vector corresponding to each image group and the global encoding vector corresponding to the total image group output by the second encoder, where the total image group includes the tissue image and the historical tissue images.
  • Specifically, the token corresponding to each image group and the second positioning token can be input into each second encoder; each second encoder generates the local encoding vector corresponding to each image group from the token of that image group, and also generates the global encoding vector corresponding to the total image group from the tokens of all image groups.
  • The local encoding vector can be understood as a vector learned by the second encoder that can represent the corresponding image group, and the global encoding vector can be understood as a vector learned by the second encoder that can represent the entire total image group.
  • The total image group can be understood as the collection of the tissue image and the multiple historical tissue images, that is, all images input into the second positioning model.
  • the structure of the second encoder is the same as that of the encoder shown in (b) in FIG. 4 , and will not be repeated here.
  • Step 1045 input the global encoding vector output by each second encoder into the point regression sub-model to obtain the regression coordinates output by the point regression sub-model.
  • Step 1046 Input the local encoding vectors output by each second encoder and corresponding to each image group into the heat map sub-model to obtain a heat map output by the heat map sub-model.
  • Step 1047 Determine the position of the tissue cavity in the tissue image according to the regression coordinates and the thermal coordinates, where the thermal coordinates are the coordinates of the point with the highest brightness in the thermal image.
  • It should be noted that the structures of the point regression sub-model and the heat map sub-model in the second positioning model are the same as those in the first positioning model, the way of obtaining the regression coordinates and thermal coordinates is the same, and the way of determining the position of the tissue cavity in the tissue image from the regression coordinates and thermal coordinates is also the same, so they will not be repeated here.
  • In a possible implementation, step 1047 may be: if the distance between the regression coordinates and the thermal coordinates is less than the preset distance threshold, determining the coordinates of the tissue cavity in the tissue image according to the regression coordinates and the thermal coordinates.
  • Correspondingly, the method may also include: if the distance between the regression coordinates and the thermal coordinates is greater than or equal to the distance threshold, determining the advancing direction of the endoscope according to the historical tissue images, so as to control the endoscope to advance in that direction.
  • Fig. 9 is a flow chart showing a training classification model according to an exemplary embodiment. As shown in Fig. 9, the classification model is obtained by training in the following manner:
  • Step A obtain a first sample input set and a first sample output set, where the first sample input set includes multiple first sample inputs, each first sample input including a sample tissue image, and the first sample output set includes a first sample output corresponding to each first sample input, each first sample output including the true type of the corresponding sample tissue image.
  • Step B use the first sample input set as the input of the classification model and the first sample output set as the output of the classification model, so as to train the classification model.
  • Specifically, the first sample input set includes multiple first sample inputs, and each first sample input may be a sample tissue image; the sample tissue image may be, for example, a tissue image collected during a previous endoscopic examination.
  • The first sample output set includes a first sample output corresponding to each first sample input, and each first sample output includes the true type of the corresponding sample tissue image. The true type may include the first type and the second type, the first type indicating that there is a tissue cavity in the sample tissue image and the second type indicating that there is no tissue cavity in the sample tissue image; the true type may also include the third type, which indicates that the quality of the sample tissue image is too low.
  • During training, the first sample input set can be used as the input of the classification model and the first sample output set as its target output, so that when the first sample input set is input, the output of the classification model matches the first sample output set.
  • For example, the cross-entropy loss can be determined from the output of the classification model and the first sample output set and used as the loss function of the classification model; with the goal of reducing the loss function, the back-propagation algorithm is used to correct the parameters of the neurons in the classification model, such as the weights (Weight) and biases (Bias) of the neurons. The above steps are repeated until the loss function satisfies a preset condition, for example, until the loss function is smaller than a preset loss threshold, thereby completing the training of the classification model.
  • For example, the initial learning rate for training the classification model can be set to 2e-4, the batch size to 64, the optimizer to Adam, the number of epochs to 100, and the size of the sample tissue images to 224×224.
  • The loss function of the classification model can be as shown in Formula 1 (the cross-entropy loss function):

    $$L_{class} = -\sum_{i=1}^{c} y_i \log \hat{y}_i \qquad \text{(Formula 1)}$$

  • where $L_{class}$ represents the loss function of the classification model, $\hat{y}_i$ represents the output of the classification model (which can be understood as the matching probability between the sample tissue image and the i-th type), $y_i$ represents the matching probability between the true type of the sample tissue image and the i-th type, and $c$ represents the number of types.
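  • A minimal sketch of this training setup with the hyperparameters listed above, assuming `model` is the classification model and `loader` yields batches of (sample tissue image, true type) pairs:

```python
import torch
import torch.nn as nn

# Adam, lr 2e-4, 100 epochs, cross-entropy loss (Formula 1), back-propagation.
def train_classifier(model, loader, epochs: int = 100):
    opt = torch.optim.Adam(model.parameters(), lr=2e-4)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, true_types in loader:      # images: (64, 3, 224, 224)
            loss = ce(model(images), true_types)
            opt.zero_grad()
            loss.backward()                    # back-propagation
            opt.step()
```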
  • Fig. 10 is a flow chart of training a first positioning model according to an exemplary embodiment. As shown in Fig. 10, the first positioning model is obtained by training in the following manner:
  • Step C obtain a second sample input set and a second sample output set, where the second sample input set includes multiple second sample inputs, each second sample input including a sample tissue image, and the second sample output set includes a second sample output corresponding to each second sample input, each second sample output including the real position of the tissue cavity in the corresponding sample tissue image.
  • Step D use the second sample input set as the input of the first positioning model and the second sample output set as the output of the first positioning model, so as to train the first positioning model.
  • The loss of the first positioning model is determined from a regression loss and a heat map loss; the regression loss is determined from the output of the point regression sub-model and the second sample output set, and the heat map loss is determined from the output of the heat map sub-model and the second sample output set.
  • the second sample input set includes a plurality of second sample inputs, and each second sample input may be a sample tissue image, and the sample tissue image may be, for example, a tissue image collected during an endoscopic examination.
  • the second sample output set includes a second sample output corresponding to each second sample input, and each second sample output includes the real position of the tissue cavity in the corresponding sample tissue image.
  • During training, the second sample input set can be used as the input of the first positioning model and the second sample output set as its target output, so that when the second sample input set is input, the output of the first positioning model matches the second sample output set.
  • the loss function of the first positioning model can be determined according to the output of the first positioning model and the second sample output set, and the parameters of the neurons in the first positioning model can be corrected by using the backpropagation algorithm with the goal of reducing the loss function , the parameters of the neuron can be, for example, the weight and bias of the neuron.
  • the above steps are repeated until the loss function satisfies a preset condition, for example, the loss function is smaller than a preset loss threshold, so as to achieve the purpose of training the first positioning model.
  • Specifically, the output of the first positioning model includes the regression coordinates output by the point regression sub-model and the heat map output by the heat map sub-model. The regression coordinates can be compared with the second sample output set to determine the regression loss, the thermal coordinates included in the heat map can be compared with the second sample output set to determine the heat map loss, and finally the loss of the first positioning model is determined jointly from the regression loss and the heat map loss.
  • the thermal coordinates are the coordinates of the point with the highest brightness in the thermal map.
  • For example, the initial learning rate for training the first positioning model can be set to 2e-4, the batch size to 64, the optimizer to Adam, the number of epochs to 100, and the size of the sample tissue images to 224×224; correspondingly, the size of the heat map output by the heat map sub-model is 224×224.
  • For example, the regression loss can be determined by Formula 2 (a squared-error form consistent with the definitions below):

    $$L_r = (x - \hat{x})^2 + (y - \hat{y})^2 \qquad \text{(Formula 2)}$$

  • where $L_r$ represents the regression loss, $x$ and $y$ represent the coordinate values of the real position of the tissue cavity in the sample tissue image on the X-axis and Y-axis, and $\hat{x}$ and $\hat{y}$ represent the regression coordinates output by the point regression sub-model.
  • Considering that the second sample output is the real position of the tissue cavity in the corresponding sample tissue image, it is usually the coordinates of a single point; that is, the label of this point is 1 and the labels of all other points in the sample tissue image are 0. When training the first positioning model, the second sample output then contains too little information, which can easily cause the heat map sub-model to output an all-zero heat map, in turn making the heat map loss very small and making it impossible to train the first positioning model.
  • each second sample output in the second sample output set can be converted into a Gaussian map, and the Gaussian map can be used to determine the heat map loss.
  • The size of the Gaussian map is the same as the size of the sample tissue image, and each pixel corresponds to a label. For example, if the position coordinates of the tissue cavity in the sample tissue image included in the second sample output are $(x_l, y_l)$, then the label corresponding to the point with coordinates $(x_l, y_l)$ in the Gaussian map is 1, and the labels corresponding to the other points in the Gaussian map are values between (0, 1), which can be determined by Formula 3:
    $$label_{x,y} = \exp\left(-\frac{(x - x_l)^2 + (y - y_l)^2}{2\sigma^2}\right) \qquad \text{(Formula 3)}$$

  • where $label_{x,y}$ represents the label corresponding to the point with coordinates $(x, y)$ in the Gaussian map, $\sigma$ represents a hyperparameter, $x_l$ represents the coordinate of the position of the tissue cavity in the sample tissue image on the X-axis, and $y_l$ represents the coordinate of that position on the Y-axis.
  • In this way, the second sample output is converted into a Gaussian map in which every point has a non-zero label, which effectively increases the amount of information it contains. Determining the heat map loss from the Gaussian map therefore avoids the problem that the heat map loss is too small to train the first positioning model.
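  • A minimal sketch of this Gaussian map construction following Formula 3 (the value of $\sigma$ is illustrative):

```python
import numpy as np

# The cavity position gets label 1 and every other pixel a label in (0, 1)
# that decays with squared distance, per Formula 3.
def gaussian_map(x_l: int, y_l: int, size: int = 224,
                 sigma: float = 8.0) -> np.ndarray:
    ys, xs = np.mgrid[0:size, 0:size]
    d2 = (xs - x_l) ** 2 + (ys - y_l) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))   # equals 1 at (x_l, y_l)

q = gaussian_map(100, 120)
assert q[120, 100] == 1.0                      # row = y, column = x
```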
  • For example, the heat map loss can be determined by Formula 4 (a mean-squared-error form consistent with the definitions below):

    $$L_h = \frac{1}{H W} \sum_{h=1}^{H} \sum_{w=1}^{W} \left(p_{h,w} - q_{h,w}\right)^2 \qquad \text{(Formula 4)}$$

  • where $L_h$ represents the heat map loss, $H$ represents the height of the heat map, $W$ represents the width of the heat map, $p_{h,w}$ represents the output of the first positioning model (which can be understood as the brightness of the point with coordinates $(h, w)$ in the heat map), and $q_{h,w}$ represents the label corresponding to the point with coordinates $(h, w)$ in the Gaussian map.
  • Further, the loss of the first positioning model can be determined by Formula 5:

    $$L_{loc} = \lambda L_r + L_h \qquad \text{(Formula 5)}$$

  • where $L_{loc}$ represents the loss of the first positioning model and $\lambda$ represents a weight parameter corresponding to the regression loss, which may be set to 1, for example.
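  • A minimal sketch of the combined loss of Formulas 2, 4 and 5, under the squared-error forms reconstructed above:

```python
import torch

# lam is the regression weight (the lambda of Formula 5), e.g. 1.0.
def localization_loss(pred_xy, true_xy, pred_heatmap, gauss_map, lam=1.0):
    l_r = ((pred_xy - true_xy) ** 2).sum()              # Formula 2
    l_h = ((pred_heatmap - gauss_map) ** 2).mean()      # Formula 4
    return lam * l_r + l_h                              # Formula 5

loss = localization_loss(torch.tensor([100.0, 120.0]),
                         torch.tensor([103.0, 118.0]),
                         torch.rand(224, 224), torch.rand(224, 224))
```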
  • Fig. 11 is a flow chart of training a second positioning model according to an exemplary embodiment. As shown in Fig. 11, the second positioning model is obtained by training in the following manner:
  • Step E obtain a third sample input set and a third sample output set, where the third sample input set includes multiple third sample inputs, each third sample input including a sample tissue image and sample historical tissue images, and the third sample output set includes a third sample output corresponding to each third sample input, each third sample output including the real position of the tissue cavity in the corresponding sample tissue image; the sample historical tissue images are images collected by the endoscope before the sample tissue image was collected.
  • Step F use the third sample input set as the input of the second positioning model and the third sample output set as the output of the second positioning model, so as to train the second positioning model.
  • The loss of the second positioning model is determined from a regression loss and a heat map loss; the regression loss is determined from the output of the point regression sub-model and the third sample output set, and the heat map loss is determined from the output of the heat map sub-model and the third sample output set.
  • Specifically, the third sample input set includes multiple third sample inputs, and each third sample input may be a sample tissue image together with the sample historical tissue images corresponding to it. The sample tissue image may be, for example, a tissue image collected during a previous endoscopic examination, and the sample historical tissue images are images collected by the endoscope before the sample tissue image was collected; there may be one or more sample historical tissue images.
  • the third sample output set includes a third sample output corresponding to each third sample input, and each third sample output includes the real position of the tissue cavity in the corresponding sample tissue image.
  • During training, the third sample input set can be used as the input of the second positioning model and the third sample output set as its target output, so that when the third sample input set is input, the output of the second positioning model matches the third sample output set.
  • the loss function of the second positioning model can be determined according to the output of the second positioning model and the third sample output set, and the parameters of the neurons in the second positioning model can be corrected by using the back propagation algorithm with the goal of reducing the loss function , the parameters of the neuron can be, for example, the weight and bias of the neuron.
  • the above steps are repeated until the loss function satisfies the preset condition, for example, the loss function is smaller than the preset loss threshold, so as to achieve the purpose of training the second positioning model.
  • Specifically, the output of the second positioning model includes the regression coordinates output by the point regression sub-model and the heat map output by the heat map sub-model. The regression coordinates can be compared with the third sample output set to determine the regression loss, the thermal coordinates included in the heat map can be compared with the third sample output set to determine the heat map loss, and finally the loss of the second positioning model is determined jointly from the regression loss and the heat map loss.
  • the thermal coordinates are the coordinates of the point with the highest brightness in the thermal map.
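  • The description elsewhere combines the two terms as L_loc = L_h + λ·L_r (with λ set to 1, for example) and builds the heat map target by converting the true position into a Gaussian map, so that every pixel carries a non-zero label. The sketch below follows that structure; the squared-error regression loss and the pixel-wise cross-entropy heat map loss are assumptions, since the exact formulas are reproduced only as images in the source:

```python
import torch
import torch.nn.functional as F

def thermal_coordinates(heatmap: torch.Tensor) -> torch.Tensor:
    """(x, y) of the brightest point in each heat map of shape (B, H, W)."""
    b, _, w = heatmap.shape
    flat_idx = heatmap.view(b, -1).argmax(dim=1)
    return torch.stack([flat_idx % w, flat_idx // w], dim=1).float()

def gaussian_target(true_pos: torch.Tensor, size, sigma: float = 2.0) -> torch.Tensor:
    """Gaussian map peaking (label 1) at the true cavity position."""
    h, w = size
    ys = torch.arange(h).view(1, h, 1).float()
    xs = torch.arange(w).view(1, 1, w).float()
    x0 = true_pos[:, 0].view(-1, 1, 1)
    y0 = true_pos[:, 1].view(-1, 1, 1)
    return torch.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))

def positioning_loss(coords, heatmap, true_pos, lam: float = 1.0):
    regression_loss = F.mse_loss(coords, true_pos)            # assumed squared error
    target = gaussian_target(true_pos, heatmap.shape[-2:])    # Gaussian around the true point
    heatmap_loss = F.binary_cross_entropy(heatmap.clamp(0, 1), target)
    return heatmap_loss + lam * regression_loss               # L_loc = L_h + lambda * L_r
```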
  • for example, the initial learning rate for training the second positioning model can be set to 2e-4, the batch size to 64, the optimizer to Adam, and the number of epochs to 100. The size of the sample tissue images can be set to 224×224, and correspondingly the size of the heat map output by the heat map sub-model is 224×224.
  • the method of determining the regression loss, the heat map loss and the loss of the second positioning model is the same as that of the first positioning model, and will not be repeated here.
  • the present disclosure first obtains the tissue image collected by the endoscope at the current moment, and then uses the classification model to classify the tissue image to determine the target type of the tissue image. If the target type indicates that there is a tissue cavity in the tissue image, determine the position of the tissue cavity in the tissue image according to the first positioning model and the tissue image; if the target type indicates that there is no tissue cavity in the tissue image, The position of the tissue cavity in the tissue image is determined according to the second positioning model, the tissue image, and the historical tissue image collected by the endoscope before the current moment. The disclosure first classifies tissue images, and selects different positioning methods to locate the positions of tissue cavities in the tissue images according to different classification results, which can improve the success rate and accuracy of positioning.
  • Fig. 12 is a block diagram of a device for positioning a tissue cavity according to an exemplary embodiment. As shown in Fig. 12, the device 200 may include:
  • the acquisition module 201 is configured to acquire tissue images collected by the endoscope at the current moment.
  • the classification module 202 is configured to classify the tissue image using a pre-trained classification model, so as to determine the target type of the tissue image.
  • the first positioning module 203 is configured to determine the position of the tissue cavity in the tissue image according to the pre-trained first positioning model and the tissue image if the target type indicates that there is a tissue cavity in the tissue image.
  • the second positioning module 204 is configured to determine the position of the tissue cavity in the tissue image according to the pre-trained second positioning model, the tissue image and the historical tissue image if the target type indicates that there is no tissue cavity in the tissue image, where the historical tissue image is an image collected by the endoscope before the current moment.
  • Fig. 13 is a block diagram of another tissue cavity positioning device according to an exemplary embodiment. As shown in Fig. 13, the device may further include:
  • the determination module 205 is configured to determine the insertion direction of the endoscope according to the historical tissue image if the target type indicates that the quality of the tissue image does not meet the preset condition, so as to control the endoscope to advance in the insertion direction.
  • the determination module 205 is further configured to determine the insertion direction according to the position of the tissue cavity in the tissue image when that position has been obtained, so as to control the endoscope to advance in the insertion direction. A sketch of this overall decision flow follows.
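  • A minimal sketch of the decision flow implemented by the modules above; the target-type labels and the two direction helpers are hypothetical names, since the text specifies only the three target types and the fallback behaviour:

```python
def locate_and_advance(tissue_image, history_images,
                       classifier, first_model, second_model):
    """Classify the current frame first, then pick the positioning path by type."""
    target_type = classifier(tissue_image)
    if target_type == "LOW_QUALITY":
        # Quality below the preset condition: fall back to the history images.
        return direction_from_history(history_images)    # hypothetical helper
    if target_type == "CAVITY_PRESENT":
        position = first_model(tissue_image)              # current frame is enough
    else:  # "CAVITY_ABSENT": combine current and historical frames
        position = second_model(tissue_image, history_images)
    return direction_toward(position)                     # hypothetical helper
```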
  • Fig. 14 is a block diagram of another tissue cavity positioning device according to an exemplary embodiment.
  • the classification model includes: an encoder and a classification layer, and the classification module 202 may include:
  • the first preprocessing sub-module 2021 is configured to preprocess the tissue image, and divide the preprocessed tissue image into multiple sub-images of equal size.
  • the first determination sub-module 2022 is configured to determine the token corresponding to each sub-image according to the image vector corresponding to the sub-image and the position vector corresponding to the sub-image, where the position vector is used to indicate the position of the sub-image in the preprocessed tissue image.
  • the first encoding sub-module 2023 is configured to input the token corresponding to each sub-image and the randomly generated classification token into the encoder to obtain a local encoding vector corresponding to each sub-image and a global encoding vector corresponding to the tissue image.
  • the classification sub-module 2024 is configured to input the global encoding vector and multiple local encoding vectors into the classification layer, so as to obtain the target type output by the classification layer.
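  • Putting sub-modules 2021-2024 together, a compact sketch of such a classifier is shown below, assuming 224×224 inputs, 16×16 sub-images and three target types; the embedding width, encoder depth and head layout are illustrative assumptions. Tokens are formed by concatenating each image vector with its position vector, and the classification layer receives the global and all local encoding vectors, as described above:

```python
import torch
import torch.nn as nn

class CavityClassifier(nn.Module):
    def __init__(self, img=224, patch=16, dim=512, num_types=3, depth=6):
        super().__init__()
        self.patch = patch
        n = (img // patch) ** 2                              # 196 sub-images
        self.proj = nn.Linear(patch * patch * 3, dim)        # linear projection layer
        self.pos = nn.Parameter(torch.randn(1, n + 1, dim))  # position vectors
        self.cls = nn.Parameter(torch.randn(1, 1, dim))      # random classification token
        layer = nn.TransformerEncoderLayer(d_model=2 * dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear((n + 1) * 2 * dim, num_types)  # classification layer

    def forward(self, x):                                    # x: (B, 3, 224, 224)
        b = x.shape[0]
        # Divide the preprocessed image into equal-sized sub-images and flatten them.
        p = x.unfold(2, self.patch, self.patch).unfold(3, self.patch, self.patch)
        p = p.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, self.patch * self.patch * 3)
        emb = torch.cat([self.cls.expand(b, -1, -1), self.proj(p)], dim=1)
        # Token = image vector concatenated with its position vector.
        tokens = torch.cat([emb, self.pos.expand(b, -1, -1)], dim=-1)
        enc = self.encoder(tokens)            # global (index 0) and local encoding vectors
        return self.head(enc.reshape(b, -1))  # type scores from global + local vectors
```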
  • Fig. 15 is a block diagram of another tissue cavity positioning device according to an exemplary embodiment.
  • the first positioning model includes: a plurality of first encoders, a point regression sub-model and a heat map sub-model
  • the first positioning module 203 may include:
  • the second preprocessing module 2031 is configured to preprocess the tissue image, and divide the preprocessed tissue image into multiple sub-images of equal size.
  • the second determination sub-module 2032 is configured to determine the token corresponding to each sub-image according to the image vector corresponding to the sub-image and the position vector corresponding to the sub-image, where the position vector is used to indicate the position of the sub-image in the preprocessed tissue image.
  • the second encoding sub-module 2033 is configured to input the token corresponding to each sub-image, together with the randomly generated first positioning token, into each first encoder, so as to obtain, from each first encoder, the local encoding vector corresponding to each sub-image and the global encoding vector corresponding to the tissue image.
  • the first regression sub-module 2034 is configured to input the global encoding vector output by each first encoder into the point regression sub-model to obtain regression coordinates output by the point regression sub-model.
  • the first heat map sub-module 2035 is configured to input the local encoding vectors output by each first encoder and corresponding to each sub-image into the heat map sub-model, so as to obtain a heat map output by the heat map sub-model.
  • the first output sub-module 2036 is configured to determine the position of the tissue cavity in the tissue image according to the regression coordinates and the thermal coordinates, where the thermal coordinates are the coordinates of the point with the highest brightness in the heat map.
  • the first output submodule 2036 can be used to:
  • if the distance between the regression coordinates and the thermal coordinates is less than a preset distance threshold, the coordinates of the tissue cavity in the tissue image are determined according to the regression coordinates and the thermal coordinates; a sketch of this check follows.
  • the determination module 205 is also used to determine the insertion direction of the endoscope according to the historical tissue image if the distance between the regression coordinates and the thermal coordinates is greater than or equal to the distance threshold, so as to control the endoscope to advance in the insertion direction.
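  • A minimal sketch of this consistency check; the description suggests a threshold on the order of 0.2 times the image side (45 pixels for a 224×224 image), and averaging the two coordinates is only one of the reporting options mentioned (either coordinate, or both, may also be output):

```python
import torch

def fuse_positions(regression_xy: torch.Tensor, thermal_xy: torch.Tensor,
                   distance_threshold: float = 45.0):
    """Return the cavity position when both heads agree, otherwise None."""
    distance = torch.linalg.norm(regression_xy - thermal_xy)
    if distance < distance_threshold:
        return (regression_xy + thermal_xy) / 2  # both predictions are trusted
    # Low confidence: the caller falls back to the historical tissue images
    # to determine the insertion direction.
    return None
```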
  • Fig. 16 is a block diagram of another tissue cavity positioning device according to an exemplary embodiment.
  • the second positioning model includes: a plurality of second encoders, a point regression sub-model and a heat map sub-model, and the second positioning module 204 may include:
  • the third preprocessing sub-module 2041 is configured to preprocess the tissue image and the historical tissue images, divide the preprocessed tissue image into multiple sub-images of equal size, and divide each preprocessed historical tissue image into multiple historical sub-images equal in size and corresponding in position to the sub-images; a sub-image and the historical sub-images at the corresponding position are regarded as one image group.
  • the third determination sub-module 2042 is configured to determine the token corresponding to the image group according to the image vector corresponding to each image group and the position vector corresponding to the image group, and the position vector is used to indicate the corresponding position of the image group.
  • the third encoding sub-module 2043 is configured to input the token corresponding to each image group, together with the randomly generated second positioning token, into each second encoder, so as to obtain, from each second encoder, the local encoding vector corresponding to each image group and the global encoding vector corresponding to the total image group, where the total image group includes the tissue image and the historical tissue images.
  • the second regression sub-module 2044 is configured to input the global encoding vector output by each second encoder into the point regression sub-model to obtain regression coordinates output by the point regression sub-model.
  • the second heat map sub-module 2045 is configured to input the local encoding vectors output by each second encoder and corresponding to each image group into the heat map sub-model, so as to obtain a heat map output by the heat map sub-model.
  • the second output sub-module 2046 is configured to determine the position of the tissue cavity in the tissue image according to the regression coordinates and the thermal coordinates, where the thermal coordinates are the coordinates of the point with the highest brightness in the heat map.
  • the second output submodule 2046 can be used for:
  • if the distance between the regression coordinates and the thermal coordinates is less than a preset distance threshold, the coordinates of the tissue cavity in the tissue image are determined according to the regression coordinates and the thermal coordinates.
  • the determination module 205 is also used to determine the insertion direction of the endoscope according to the historical tissue image if the distance between the regression coordinates and the thermal coordinates is greater than or equal to the distance threshold, so as to control the endoscope to advance in the insertion direction. A sketch of the image-group construction used by sub-module 2041 follows.
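  • The description states that each image group is concatenated along the channel axis before the linear projection; the sketch below builds the groups that way, with one current frame plus, for example, five historical frames (tensor shapes are illustrative):

```python
import torch

def build_image_groups(tissue_img: torch.Tensor,
                       history_imgs: list,
                       patch: int = 16) -> torch.Tensor:
    """Split the current and historical images into position-aligned sub-images
    and stack position-matched patches along the channel axis as image groups."""
    stacked = torch.cat([tissue_img] + list(history_imgs), dim=0)  # (C_total, H, W)
    c, h, w = stacked.shape
    patches = stacked.unfold(1, patch, patch).unfold(2, patch, patch)
    # (C_total, H/p, W/p, p, p) -> one flattened vector per spatial position
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch * patch)

# Example: a 224x224 RGB frame plus five historical frames yields 196 groups,
# each a vector of length 18 * 16 * 16, ready for the linear projection layer.
groups = build_image_groups(torch.rand(3, 224, 224),
                            [torch.rand(3, 224, 224) for _ in range(5)])
```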
  • the classification model is trained by:
  • Step A: obtain a first sample input set and a first sample output set.
  • the first sample input set includes a plurality of first sample inputs, each of which includes a sample tissue image; the first sample output set includes a first sample output corresponding to each first sample input, and each first sample output includes the true type of the corresponding sample tissue image.
  • Step B: use the first sample input set as the input of the classification model and the first sample output set as the output of the classification model, so as to train the classification model.
  • the first positioning model is trained in the following manner:
  • Step C: obtain a second sample input set and a second sample output set.
  • the second sample input set includes a plurality of second sample inputs, each of which includes a sample tissue image; the second sample output set includes a second sample output corresponding to each second sample input, and each second sample output includes the real position of the tissue cavity in the corresponding sample tissue image.
  • Step D: use the second sample input set as the input of the first positioning model and the second sample output set as the output of the first positioning model, so as to train the first positioning model.
  • the loss of the first positioning model is determined according to the regression loss and the heat map loss: the regression loss is determined according to the output of the point regression sub-model and the second sample output set, and the heat map loss is determined according to the output of the heat map sub-model and the second sample output set.
  • the second positioning model is obtained through training in the following manner:
  • Step E: obtain a third sample input set and a third sample output set.
  • the third sample input set includes a plurality of third sample inputs, each of which includes a sample tissue image and a sample historical tissue image; the third sample output set includes a third sample output corresponding to each third sample input, and each third sample output includes the real position of the tissue cavity in the corresponding sample tissue image, where the sample historical tissue image is an image collected by the endoscope before the sample tissue image was collected.
  • Step F: use the third sample input set as the input of the second positioning model and the third sample output set as the output of the second positioning model, so as to train the second positioning model.
  • the loss of the second positioning model is determined according to the regression loss and the heat map loss: the regression loss is determined according to the output of the point regression sub-model and the third sample output set, and the heat map loss is determined according to the output of the heat map sub-model and the third sample output set. A sketch of how such sample sets can be assembled follows.
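  • A minimal sketch of assembling the third sample input and output sets from a previously recorded examination, assuming five historical frames per sample as in the example above; the field names and storage format are illustrative:

```python
import torch
from torch.utils.data import Dataset

class ThirdSampleSet(Dataset):
    """Each input pairs a sample tissue image with the frames captured just
    before it; each output is the real cavity position in the sample image."""
    def __init__(self, frames, positions, n_history=5):
        self.frames = frames          # chronologically ordered image tensors
        self.positions = positions    # true (x, y) cavity position per frame
        self.n = n_history

    def __len__(self):
        return len(self.frames) - self.n

    def __getitem__(self, i):
        history = torch.stack(self.frames[i:i + self.n])   # sample historical images
        sample = self.frames[i + self.n]                    # the sample tissue image
        target = torch.tensor(self.positions[i + self.n], dtype=torch.float)
        return sample, history, target
```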
  • the present disclosure first obtains the tissue image collected by the endoscope at the current moment, and then uses the classification model to classify the tissue image to determine the target type of the tissue image. If the target type indicates that there is a tissue cavity in the tissue image, determine the position of the tissue cavity in the tissue image according to the first positioning model and the tissue image; if the target type indicates that there is no tissue cavity in the tissue image, The position of the tissue cavity in the tissue image is determined according to the second positioning model, the tissue image, and the historical tissue image collected by the endoscope before the current moment. The disclosure first classifies tissue images, and selects different positioning methods to locate the positions of tissue cavities in the tissue images according to different classification results, which can improve the success rate and accuracy of positioning.
  • Referring to FIG. 17, it shows a schematic structural diagram of an electronic device 300 (for example, the execution subject in the above embodiments, which may be a terminal device or a server) suitable for implementing the embodiments of the present disclosure.
  • the terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 17 is only an example, and should not impose any limitation on the functions and scope of application of the embodiments of the present disclosure.
  • the electronic device 300 may include a processing device (such as a central processing unit or a graphics processing unit) 301, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303.
  • the RAM 303 also stores various programs and data necessary for the operation of the electronic device 300.
  • the processing device 301, ROM 302, and RAM 303 are connected to each other through a bus 304.
  • An input/output (I/O) interface 305 is also connected to the bus 304 .
  • the following devices can be connected to the I/O interface 305: an input device 306 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 307 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 308 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 309.
  • the communication device 309 may allow the electronic device 300 to perform wireless or wired communication with other devices to exchange data. While FIG. 17 shows the electronic device 300 with various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided.
  • the processes described above with reference to the flowcharts can be implemented as computer software programs.
  • the embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer readable medium, the computer program including program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via the communication device 309, or installed from the storage device 308, or installed from the ROM 302.
  • when the computer program is executed by the processing device 301, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • in some implementations, the terminal device and the server may communicate using any currently known or future-developed network protocol, such as HTTP (Hyper Text Transfer Protocol), and may be interconnected with digital data communication (e.g., a communication network) in any form or medium.
  • examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: obtain the tissue image collected by the endoscope at the current moment; classify the tissue image using the pre-trained classification model to determine the target type of the tissue image; if the target type indicates that there is a tissue cavity in the tissue image, determine the position of the tissue cavity in the tissue image according to the pre-trained first positioning model and the tissue image; and if the target type indicates that there is no tissue cavity in the tissue image, determine the position of the tissue cavity in the tissue image according to the pre-trained second positioning model, the tissue image and the historical tissue image, the historical tissue image being an image collected by the endoscope before the current moment.
  • Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages, such as Java, Smalltalk and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware. In some cases, the name of a module does not constitute a limitation on the module itself; for example, the obtaining module may also be described as a "module for obtaining tissue images".
  • the functions described above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • Example 1 provides a method for locating a tissue cavity, including: acquiring a tissue image collected by an endoscope at the current moment; classifying the tissue image using a pre-trained classification model to determine the target type of the tissue image; if the target type indicates that there is a tissue cavity in the tissue image, determining the position of the tissue cavity in the tissue image according to a pre-trained first positioning model and the tissue image; and if the target type indicates that there is no tissue cavity in the tissue image, determining the position of the tissue cavity in the tissue image according to a pre-trained second positioning model, the tissue image and the historical tissue image, where the historical tissue image is an image collected by the endoscope before the current moment.
  • Example 2 provides the method of Example 1, the method further including: if the target type indicates that the quality of the tissue image does not meet a preset condition, determining the insertion direction of the endoscope according to the historical tissue image, so as to control the endoscope to advance in the insertion direction; and, in the case where the position of the tissue cavity in the tissue image is obtained, determining the insertion direction according to the position of the tissue cavity in the tissue image, so as to control the endoscope to advance in the insertion direction.
  • Example 3 provides the method of Example 1 or Example 2, where the classification model includes an encoder and a classification layer, and classifying the tissue image using the pre-trained classification model to determine the target type of the tissue image includes: preprocessing the tissue image and dividing the preprocessed tissue image into a plurality of sub-images of equal size; determining the token corresponding to each sub-image according to the image vector corresponding to the sub-image and the position vector corresponding to the sub-image, the position vector being used to indicate the position of the sub-image in the preprocessed tissue image; inputting the token corresponding to each sub-image, together with a randomly generated classification token, into the encoder to obtain the local encoding vector corresponding to each sub-image and the global encoding vector corresponding to the tissue image; and inputting the global encoding vector and the plurality of local encoding vectors into the classification layer to obtain the target type output by the classification layer.
  • Example 4 provides the method of Example 1 or Example 2, where the first positioning model includes a plurality of first encoders, a point regression sub-model and a heat map sub-model, and determining the position of the tissue cavity in the tissue image according to the pre-trained first positioning model and the tissue image includes: preprocessing the tissue image and dividing the preprocessed tissue image into a plurality of sub-images of equal size; determining the token corresponding to each sub-image according to the image vector corresponding to the sub-image and the position vector corresponding to the sub-image, the position vector being used to indicate the position of the sub-image in the preprocessed tissue image; inputting the token corresponding to each sub-image, together with a randomly generated first positioning token, into each first encoder to obtain, from each first encoder, the local encoding vector corresponding to each sub-image and the global encoding vector corresponding to the tissue image; inputting the global encoding vector output by each first encoder into the point regression sub-model to obtain the regression coordinates output by the point regression sub-model; inputting the local encoding vectors output by each first encoder and corresponding to each sub-image into the heat map sub-model to obtain the heat map output by the heat map sub-model; and determining the position of the tissue cavity in the tissue image according to the regression coordinates and the thermal coordinates, the thermal coordinates being the coordinates of the point with the highest brightness in the heat map.
  • Example 5 provides the method of Example 4, where determining the position of the tissue cavity in the tissue image according to the regression coordinates and the thermal coordinates includes: if the distance between the regression coordinates and the thermal coordinates is less than a preset distance threshold, determining the coordinates of the tissue cavity in the tissue image according to the regression coordinates and the thermal coordinates; and the method further includes: if the distance between the regression coordinates and the thermal coordinates is greater than or equal to the distance threshold, determining the insertion direction of the endoscope according to the historical tissue image, so as to control the endoscope to advance in the insertion direction.
  • Example 6 provides the method of Example 1 or Example 2, where the second positioning model includes a plurality of second encoders, a point regression sub-model and a heat map sub-model, and determining the position of the tissue cavity in the tissue image according to the pre-trained second positioning model, the tissue image and the historical tissue image includes: preprocessing the tissue image and the historical tissue image, dividing the preprocessed tissue image into a plurality of sub-images of equal size, and dividing the preprocessed historical tissue image into a plurality of historical sub-images equal in size and corresponding in position to the sub-images; regarding a sub-image and the historical sub-image at the corresponding position as an image group; determining the token corresponding to each image group according to the image vector corresponding to the image group and the position vector corresponding to the image group, the position vector being used to indicate the position corresponding to the image group; inputting the token corresponding to each image group, together with a randomly generated second positioning token, into each second encoder to obtain, from each second encoder, the local encoding vector corresponding to each image group and the global encoding vector corresponding to the total image group, the total image group including the tissue image and the historical tissue image; inputting the global encoding vector output by each second encoder into the point regression sub-model to obtain the regression coordinates output by the point regression sub-model; inputting the local encoding vectors output by each second encoder and corresponding to each image group into the heat map sub-model to obtain the heat map output by the heat map sub-model; and determining the position of the tissue cavity in the tissue image according to the regression coordinates and the thermal coordinates, the thermal coordinates being the coordinates of the point with the highest brightness in the heat map.
  • Example 7 provides the method of Example 1, where the classification model is obtained by training in the following manner: obtaining a first sample input set and a first sample output set, the first sample input set including a plurality of first sample inputs, each first sample input including a sample tissue image, and the first sample output set including a first sample output corresponding to each first sample input, each first sample output including the true type of the corresponding sample tissue image; and using the first sample input set as the input of the classification model and the first sample output set as the output of the classification model, so as to train the classification model.
  • Example 8 provides the method of Example 4, where the first positioning model is obtained by training in the following manner: obtaining a second sample input set and a second sample output set, the second sample input set including a plurality of second sample inputs, each second sample input including a sample tissue image, and the second sample output set including a second sample output corresponding to each second sample input, each second sample output including the real position of the tissue cavity in the corresponding sample tissue image; and using the second sample input set as the input of the first positioning model and the second sample output set as the output of the first positioning model, so as to train the first positioning model; the loss of the first positioning model is determined according to the regression loss and the heat map loss, the regression loss is determined according to the output of the point regression sub-model and the second sample output set, and the heat map loss is determined according to the output of the heat map sub-model and the second sample output set.
  • Example 9 provides the method of Example 6, where the second positioning model is obtained by training in the following manner: obtaining a third sample input set and a third sample output set, the third sample input set including a plurality of third sample inputs, each third sample input including a sample tissue image and a sample historical tissue image, and the third sample output set including a third sample output corresponding to each third sample input, each third sample output including the real position of the tissue cavity in the corresponding sample tissue image, the sample historical tissue image being an image collected by the endoscope before the sample tissue image was collected; and using the third sample input set as the input of the second positioning model and the third sample output set as the output of the second positioning model, so as to train the second positioning model; the loss of the second positioning model is determined according to the regression loss and the heat map loss, the regression loss is determined according to the output of the point regression sub-model and the third sample output set, and the heat map loss is determined according to the output of the heat map sub-model and the third sample output set.
  • Example 10 provides a device for locating a tissue cavity, including: an acquisition module, configured to acquire the tissue image collected by an endoscope at the current moment; a classification module, configured to classify the tissue image using a pre-trained classification model to determine the target type of the tissue image; a first positioning module, configured to determine the position of the tissue cavity in the tissue image according to a pre-trained first positioning model and the tissue image if the target type indicates that there is a tissue cavity in the tissue image; and a second positioning module, configured to determine the position of the tissue cavity in the tissue image according to a pre-trained second positioning model, the tissue image and the historical tissue image if the target type indicates that there is no tissue cavity in the tissue image, the historical tissue image being an image collected by the endoscope before the current moment.
  • Example 11 provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processing device, implements the steps of the methods described in Examples 1 to 9.
  • Example 12 provides an electronic device, including: a storage device, on which a computer program is stored; and a processing device, configured to execute the computer program in the storage device, so as to implement the steps of the methods described in Examples 1 to 9.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Endoscopes (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a method and device for locating a tissue cavity, a readable medium and an electronic device, and relates to the technical field of image processing. The method includes: acquiring the tissue image collected by an endoscope at the current moment; classifying the tissue image using a pre-trained classification model to determine the target type of the tissue image; if the target type indicates that a tissue cavity exists in the tissue image, determining the position of the tissue cavity in the tissue image according to a pre-trained first positioning model and the tissue image; and if the target type indicates that no tissue cavity exists in the tissue image, determining the position of the tissue cavity in the tissue image according to a pre-trained second positioning model, the tissue image and historical tissue images, the historical tissue images being images collected by the endoscope before the current moment. The present disclosure first classifies the tissue image and then, according to the classification result, selects a positioning method to locate the tissue cavity in the tissue image, which improves the success rate and accuracy of positioning.

Description

组织腔体的定位方法、装置、可读介质和电子设备 (Method and device for locating a tissue cavity, readable medium and electronic device)
Cross-Reference to Related Application
This application is based on, and claims priority to, Chinese patent application No. 202111040354.5, filed on September 6, 2021 and entitled "组织腔体的定位方法、装置、可读介质和电子设备" (Method and device for locating a tissue cavity, readable medium and electronic device), the entire contents of which are incorporated herein by reference.
图11是根据一示例性实施例示出的一种训练第二定位模型的流程图,如图11所示,第二定位模型是通过以下方式训练得到的:
步骤E,获取第三样本输入集和第三样本输出集,第三样本输入集包括:多个第三样本输入, 每个第三样本输入包括样本组织图像和样本历史组织图像,第三样本输出集中包括与每个第三样本输入对应的第三样本输出,每个第三样本输出包括对应的样本组织图像中组织腔体的真实位置,样本历史组织图像为内窥镜在采集样本组织图像之前采集的图像。
步骤F,将第三样本输入集作为第二定位模型的输入,将第三样本输出集作为第二定位模型的输出,以训练第二定位模型。
其中,第二定位模型的损失,根据回归损失和热力图损失确定,回归损失根据点回归子模型的输出与第三样本输出集确定,热力图损失根据热力图子模型的输出与第三样本输出集确定。
举例来说,在对第二定位模型进行训练时,需要先获取用于训练第二定位模型的第三样本输入集和第三样本输出集。第三样本输入集中包括了多个第三样本输入,每个第三样本输入可以为一个样本组织图像和该样本组织图像对应的样本历史组织图像。其中,样本组织图像例如可以是之前执行内窥镜检查时采集到的组织图像,样本历史组织图像为内窥镜在采集样本组织图像之前采集的图像,样本历史组织图像可以是一个也可以是多个。第三样本输出集中包括了与每个第三样本输入对应的第三样本输出,每个第三样本输出包括对应的样本组织图像中组织腔体的真实位置。
在对第二定位模型训练时,可以将第三样本输入集作为第二定位模型的输入,然后再将第三样本输出集作为第二定位模型的输出,来训练第二定位模型,使得在输入第三样本输入集时,第二定位模型的输出,能够和第三样本输出集匹配。例如,可以根据第二定位模型的输出,与第三样本输出集确定第二定位模型的损失函数,以降低损失函数为目标,利用反向传播算法来修正第二定位模型中的神经元的参数,神经元的参数例如可以是神经元的权重和偏置量。重复上述步骤,直至损失函数满足预设条件,例如损失函数小于预设的损失阈值,以达到训练第二定位模型的目的。
具体的,在输入第三样本输入集时,第二定位模型的输出包括点回归子模型输出的回归坐标,和热力图子模型输出的热力图,可以分别将回归坐标与第三样本输出集进行比较,以确定回归损失,将热力图中包括的热力坐标与第三样本输出集进行比较,以确定热力图损失,最后根据回归损失和热力图损失,共同确定第二定位模型的损失。其中,热力坐标为热力图中亮度最大的点的坐标。
例如,训练第二定位模型的初始学习率可以设置为:2e-4,Batch size可以设置为:64,优化器可以选择:Adam,Epoch可以设置为:100,样本组织图像的大小可以为:224×224,相应的,热力图子模型输出的热力图的大小为:224×224。进一步的,确定回归损失、热力图损失和第二定位模型的损失的方式,与第一定位模型中的确定方式相同,此处不再赘述。
综上所述,本公开首先获取当前时刻内窥镜采集的组织图像,之后利用分类模型对组织图像进行分类,以确定组织图像的目标类型。在目标类型指示组织图像中存在组织腔体的情况下,根据第一定位模型和组织图像,确定组织图像中组织腔体的位置,在目标类型指示组织图像中不存在组织腔体的情况下,根据第二定位模型、组织图像和当前时刻之前内窥镜采集的历史组织图像,确定组织图像中组织腔体的位置。本公开首先对组织图像进行分类,并根据不同的分类结果,选择不同的定位方式来定位组织图像中组织腔体的位置,能够提高定位的成功率和准确度。
图12是根据一示例性实施例示出的一种组织腔体的定位装置的框图,如图12所示,该装置200可以包括:
获取模块201,用于获取当前时刻内窥镜采集的组织图像。
分类模块202,用于利用预先训练的分类模型对组织图像进行分类,以确定组织图像的目标类 型。
第一定位模块203,用于若目标类型指示组织图像中存在组织腔体,根据预先训练的第一定位模型和组织图像,确定组织图像中组织腔体的位置。
第二定位模块204,用于若目标类型指示组织图像中不存在组织腔体,根据预先训练的第二定位模型、组织图像和历史组织图像,确定组织图像中组织腔体的位置,历史组织图像为当前时刻之前内窥镜采集的图像。
图13是根据一示例性实施例示出的另一种组织腔体的定位装置的框图,如图13所示,该装置还可以包括:
确定模块205,用于若目标类型指示组织图像的质量不满足预设条件,根据历史组织图像,确定内窥镜的进镜方向,以控制内窥镜按照进镜方向进镜。
确定模块205,还用于在获得组织图像中组织腔体的位置的情况下,根据组织图像中组织腔体的位置,确定进镜方向,以控制内窥镜按照进镜方向进镜。
图14是根据一示例性实施例示出的另一种组织腔体的定位装置的框图,如图14所示,分类模型包括:编码器和分类层,分类模块202的可以包括:
第一预处理子模块2021,用于对组织图像进行预处理,并将预处理后的组织图像划分为大小相等的多个子图像。
第一确定子模块2022,用于根据每个子图像对应的图像向量,和该子图像对应的位置向量,确定该子图像对应的令牌,位置向量用于指示该子图像在预处理后的组织图像中的位置。
第一编码子模块2023,用于将每个子图像对应的令牌,和随机生成的分类令牌输入编码器,以得到每个子图像对应的局部编码向量,和组织图像对应的全局编码向量。
分类子模块2024,用于将全局编码向量和多个局部编码向量输入分类层,以得到分类层输出的目标类型。
图15是根据一示例性实施例示出的另一种组织腔体的定位装置的框图,如图15所示,第一定位模型包括:多个第一编码器、点回归子模型和热力图子模型,第一定位模块203可以包括:
第二预处理模块2031,用于对组织图像进行预处理,并将预处理后的组织图像划分为大小相等的多个子图像。
第二确定子模块2032,用于根据每个子图像对应的图像向量,和该子图像对应的位置向量,确定该子图像对应的令牌,位置向量用于指示该子图像在预处理后的组织图像中的位置。
第二编码子模块2033,用于将每个子图像对应的令牌,和随机生成的第一定位令牌输入每个第一编码器,以得到该第一编码器输出的,每个子图像对应的局部编码向量,和组织图像对应的全局编码向量。
第一回归子模块2034,用于将每个第一编码器输出的全局编码向量输入点回归子模型,以得到点回归子模型输出的回归坐标。
第一热力图子模块2035,用于将每个第一编码器输出的,每个子图像对应的局部编码向量输入热力图子模型,以得到热力图子模型输出的热力图。
第一输出子模块2036,用于根据回归坐标和热力坐标,确定组织图像中组织腔体的位置,热力坐标为热力图中亮度最大的点的坐标。
在一种实现方式中,第一输出子模块2036可以用于:
若回归坐标与热力坐标之间的距离小于预设的距离阈值,根据回归坐标和热力坐标,确定组织图像中组织腔体的坐标。
相应的,确定模块205,还用于若回归坐标与热力坐标之间的距离大于或等于距离阈值,根据历史组织图像,确定内窥镜的进镜方向,以控制内窥镜按照进镜方向进镜。
图16是根据一示例性实施例示出的另一种组织腔体的定位装置的框图,如图16所示,第二定位模型包括:多个第二编码器、点回归子模型和热力图子模型,第二定位模块204可以包括:
第三预处理模块2041,用于对组织图像、历史组织图像进行预处理,并将预处理后的组织图像划分为大小相等的多个子图像,将预处理后的历史组织图像划分为与每个子图像大小相等、位置对应的多个历史子图像。将位置对应的子图像和历史子图像作为一个图像组。
第三确定子模块2042,用于根据每个图像组对应的图像向量,和该图像组对应的位置向量,确定该图像组对应的令牌,位置向量用于指示该图像组对应的位置。
第三编码子模块2043,用于将每个图像组对应的令牌,和随机生成的第二定位令牌输入每个第二编码器,以得到该第二编码器输出的,每个图像组对应的局部编码向量,和总图像组对应的全局编码向量,总图像组包括组织图像和历史组织图像。
第二回归子模块2044,用于将每个第二编码器输出的全局编码向量输入点回归子模型,以得到点回归子模型输出的回归坐标。
第二热力图子模块2045,用于将每个第二编码器输出的,每个图像组对应的局部编码向量输入热力图子模型,以得到热力图子模型输出的热力图。
第二输出子模块2046,用于根据回归坐标和热力坐标,确定组织图像中组织腔体的位置,热力坐标为热力图中亮度最大的点的坐标。
在另一种实现方式中,第二输出子模块2046可以用于:
若回归坐标与热力坐标之间的距离小于预设的距离阈值,根据回归坐标和热力坐标,确定组织图像中组织腔体的坐标。
相应的,确定模块205,还用于若回归坐标与热力坐标之间的距离大于或等于距离阈值,根据历史组织图像,确定内窥镜的进镜方向,以控制内窥镜按照进镜方向进镜。
In one application scenario, the classification model is trained in the following manner:
Step A: obtain a first sample input set and a first sample output set. The first sample input set includes a plurality of first sample inputs, each comprising a sample tissue image; the first sample output set includes a first sample output corresponding to each first sample input, each comprising the true type of the corresponding sample tissue image.
Step B: use the first sample input set as the input of the classification model and the first sample output set as the output of the classification model, so as to train the classification model.
In another application scenario, the first positioning model is trained in the following manner:
Step C: obtain a second sample input set and a second sample output set. The second sample input set includes a plurality of second sample inputs, each comprising a sample tissue image; the second sample output set includes a second sample output corresponding to each second sample input, each comprising the true position of the tissue cavity in the corresponding sample tissue image.
Step D: use the second sample input set as the input of the first positioning model and the second sample output set as the output of the first positioning model, so as to train the first positioning model.
Here, the loss of the first positioning model is determined from a regression loss and a heatmap loss; the regression loss is determined from the output of the point regression sub-model and the second sample output set, and the heatmap loss from the output of the heatmap sub-model and the second sample output set.
In yet another application scenario, the second positioning model is trained in the following manner:
Step E: obtain a third sample input set and a third sample output set. The third sample input set includes a plurality of third sample inputs, each comprising a sample tissue image and sample historical tissue images; the third sample output set includes a third sample output corresponding to each third sample input, each comprising the true position of the tissue cavity in the corresponding sample tissue image, the sample historical tissue images being images collected by the endoscope before the sample tissue image.
Step F: use the third sample input set as the input of the second positioning model and the third sample output set as the output of the second positioning model, so as to train the second positioning model.
Here, the loss of the second positioning model is determined from a regression loss and a heatmap loss; the regression loss is determined from the output of the point regression sub-model and the third sample output set, and the heatmap loss from the output of the heatmap sub-model and the third sample output set.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method, and will not be elaborated here.
In summary, the present disclosure first obtains the tissue image collected by the endoscope at the current moment, and then classifies the tissue image with a classification model to determine the target type of the tissue image. If the target type indicates that a tissue cavity is present in the tissue image, the position of the tissue cavity in the tissue image is determined from the first positioning model and the tissue image; if the target type indicates that no tissue cavity is present, the position is determined from the second positioning model, the tissue image and the historical tissue images collected by the endoscope before the current moment. By first classifying the tissue image and then selecting a different positioning strategy according to the classification result, the present disclosure improves both the success rate and the accuracy of the positioning.
Referring now to Fig. 17, a schematic structural diagram is shown of an electronic device 300 (for example, the execution subject in the above embodiments, which may be a terminal device or a server) suitable for implementing embodiments of the present disclosure. Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (e.g. vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 17 is merely an example and should not impose any limitation on the functions or scope of use of embodiments of the present disclosure.
As shown in Fig. 17, the electronic device 300 may include a processing apparatus (e.g. a central processing unit, a graphics processing unit, etc.) 301, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage apparatus 308 into a random access memory (RAM) 303. Various programs and data required for the operation of the electronic device 300 are also stored in the RAM 303. The processing apparatus 301, the ROM 302 and the RAM 303 are connected to one another via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the following apparatuses may be connected to the I/O interface 305: input apparatuses 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output apparatuses 307 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; storage apparatuses 308 including, for example, magnetic tapes, hard disks, etc.; and a communication apparatus 309. The communication apparatus 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 17 shows an electronic device 300 with various apparatuses, it should be understood that it is not required to implement or possess all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or possessed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication apparatus 309, or installed from the storage apparatus 308, or installed from the ROM 302. When the computer program is executed by the processing apparatus 301, the above functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by, or in combination with, an instruction execution system, apparatus or device. The program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
In some embodiments, terminal devices and servers may communicate using any currently known or future-developed network protocol such as HTTP (Hyper Text Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g. a communication network). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g. the Internet) and peer-to-peer networks (e.g. ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
The above computer-readable medium may be contained in the above electronic device, or it may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a tissue image collected by an endoscope at the current moment; classify the tissue image with a pre-trained classification model to determine the target type of the tissue image; if the target type indicates that a tissue cavity is present in the tissue image, determine the position of the tissue cavity in the tissue image according to a pre-trained first positioning model and the tissue image; and, if the target type indicates that no tissue cavity is present in the tissue image, determine the position of the tissue cavity in the tissue image according to a pre-trained second positioning model, the tissue image and historical tissue images, the historical tissue images being images collected by the endoscope before the current moment.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment or part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a module does not, in some cases, constitute a limitation on the module itself; for example, the acquisition module may also be described as "a module for obtaining a tissue image".
The functions described herein above may be executed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on a chip (SOC), complex programmable logic devices (CPLD), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in combination with, an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, Example 1 provides a method for positioning a tissue cavity, comprising: obtaining a tissue image collected by an endoscope at the current moment; classifying the tissue image with a pre-trained classification model to determine the target type of the tissue image; if the target type indicates that a tissue cavity is present in the tissue image, determining the position of the tissue cavity in the tissue image according to a pre-trained first positioning model and the tissue image; and, if the target type indicates that no tissue cavity is present in the tissue image, determining the position of the tissue cavity in the tissue image according to a pre-trained second positioning model, the tissue image and historical tissue images, the historical tissue images being images collected by the endoscope before the current moment.
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, the method further comprising: if the target type indicates that the quality of the tissue image does not satisfy a preset condition, determining the advancing direction of the endoscope according to the historical tissue images, so as to control the endoscope to advance in that direction; and, once the position of the tissue cavity in the tissue image has been obtained, determining the advancing direction according to that position, so as to control the endoscope to advance in that direction.
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 1 or Example 2, wherein the classification model comprises an encoder and a classification layer, and classifying the tissue image with the pre-trained classification model to determine the target type of the tissue image comprises: preprocessing the tissue image and dividing the preprocessed tissue image into a plurality of equally sized sub-images; determining, for each sub-image, the token corresponding to that sub-image from its image vector and its position vector, the position vector indicating the position of the sub-image within the preprocessed tissue image; inputting the token of each sub-image, together with a randomly generated classification token, into the encoder, to obtain a local encoding vector for each sub-image and a global encoding vector for the tissue image; and inputting the global encoding vector and the plurality of local encoding vectors into the classification layer, to obtain the target type output by the classification layer.
According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 1 or Example 2, wherein the first positioning model comprises a plurality of first encoders, a point regression sub-model and a heatmap sub-model, and determining the position of the tissue cavity in the tissue image according to the pre-trained first positioning model and the tissue image comprises: preprocessing the tissue image and dividing the preprocessed tissue image into a plurality of equally sized sub-images; determining, for each sub-image, the token corresponding to that sub-image from its image vector and its position vector, the position vector indicating the position of the sub-image within the preprocessed tissue image; inputting the token of each sub-image, together with a randomly generated first positioning token, into each first encoder, to obtain, from that first encoder, a local encoding vector for each sub-image and a global encoding vector for the tissue image; inputting the global encoding vector output by each first encoder into the point regression sub-model, to obtain the regression coordinates output by the point regression sub-model; inputting the local encoding vectors of the sub-images output by each first encoder into the heatmap sub-model, to obtain the heatmap output by the heatmap sub-model; and determining the position of the tissue cavity in the tissue image from the regression coordinates and the heatmap coordinates, the heatmap coordinates being the coordinates of the brightest point in the heatmap.
According to one or more embodiments of the present disclosure, Example 5 provides the method of Example 4, wherein determining the position of the tissue cavity in the tissue image from the regression coordinates and the heatmap coordinates comprises: determining the coordinates of the tissue cavity in the tissue image from the regression coordinates and the heatmap coordinates if the distance between them is smaller than a preset distance threshold; the method further comprising: if the distance between the regression coordinates and the heatmap coordinates is greater than or equal to the distance threshold, determining the advancing direction of the endoscope according to the historical tissue images, so as to control the endoscope to advance in that direction.
According to one or more embodiments of the present disclosure, Example 6 provides the method of Example 1 or Example 2, wherein the second positioning model comprises a plurality of second encoders, a point regression sub-model and a heatmap sub-model, and determining the position of the tissue cavity in the tissue image according to the pre-trained second positioning model, the tissue image and the historical tissue images comprises: preprocessing the tissue image and the historical tissue images, dividing the preprocessed tissue image into a plurality of equally sized sub-images, and dividing the preprocessed historical tissue images into a plurality of historical sub-images equal in size to, and corresponding in position to, each sub-image; taking each sub-image together with the historical sub-images at the corresponding position as one image group; determining, for each image group, the token corresponding to that image group from its image vector and its position vector, the position vector indicating the position to which the image group corresponds; inputting the token of each image group, together with a randomly generated second positioning token, into each second encoder, to obtain, from that second encoder, a local encoding vector for each image group and a global encoding vector for the overall image group, the overall image group comprising the tissue image and the historical tissue images; inputting the global encoding vector output by each second encoder into the point regression sub-model, to obtain the regression coordinates output by the point regression sub-model; inputting the local encoding vectors of the image groups output by each second encoder into the heatmap sub-model, to obtain the heatmap output by the heatmap sub-model; and determining the position of the tissue cavity in the tissue image from the regression coordinates and the heatmap coordinates, the heatmap coordinates being the coordinates of the brightest point in the heatmap.
According to one or more embodiments of the present disclosure, Example 7 provides the method of Example 1, wherein the classification model is trained in the following manner: obtaining a first sample input set and a first sample output set, the first sample input set including a plurality of first sample inputs each comprising a sample tissue image, and the first sample output set including a first sample output corresponding to each first sample input, each comprising the true type of the corresponding sample tissue image; and using the first sample input set as the input of the classification model and the first sample output set as the output of the classification model, so as to train the classification model.
According to one or more embodiments of the present disclosure, Example 8 provides the method of Example 4, wherein the first positioning model is trained in the following manner: obtaining a second sample input set and a second sample output set, the second sample input set including a plurality of second sample inputs each comprising a sample tissue image, and the second sample output set including a second sample output corresponding to each second sample input, each comprising the true position of the tissue cavity in the corresponding sample tissue image; and using the second sample input set as the input of the first positioning model and the second sample output set as the output of the first positioning model, so as to train the first positioning model; the loss of the first positioning model being determined from a regression loss and a heatmap loss, the regression loss being determined from the output of the point regression sub-model and the second sample output set, and the heatmap loss from the output of the heatmap sub-model and the second sample output set.
According to one or more embodiments of the present disclosure, Example 9 provides the method of Example 6, wherein the second positioning model is trained in the following manner: obtaining a third sample input set and a third sample output set, the third sample input set including a plurality of third sample inputs each comprising a sample tissue image and sample historical tissue images, and the third sample output set including a third sample output corresponding to each third sample input, each comprising the true position of the tissue cavity in the corresponding sample tissue image, the sample historical tissue images being images collected by the endoscope before the sample tissue image; and using the third sample input set as the input of the second positioning model and the third sample output set as the output of the second positioning model, so as to train the second positioning model; the loss of the second positioning model being determined from a regression loss and a heatmap loss, the regression loss being determined from the output of the point regression sub-model and the third sample output set, and the heatmap loss from the output of the heatmap sub-model and the third sample output set.
According to one or more embodiments of the present disclosure, Example 10 provides an apparatus for positioning a tissue cavity, comprising: an acquisition module configured to obtain a tissue image collected by an endoscope at the current moment; a classification module configured to classify the tissue image with a pre-trained classification model to determine the target type of the tissue image; a first positioning module configured to determine, if the target type indicates that a tissue cavity is present in the tissue image, the position of the tissue cavity in the tissue image according to a pre-trained first positioning model and the tissue image; and a second positioning module configured to determine, if the target type indicates that no tissue cavity is present in the tissue image, the position of the tissue cavity in the tissue image according to a pre-trained second positioning model, the tissue image and historical tissue images, the historical tissue images being images collected by the endoscope before the current moment.
According to one or more embodiments of the present disclosure, Example 11 provides a computer-readable medium on which a computer program is stored, the program implementing the steps of the methods of Examples 1 to 9 when executed by a processing apparatus.
According to one or more embodiments of the present disclosure, Example 12 provides an electronic device comprising: a storage apparatus on which a computer program is stored; and a processing apparatus configured to execute the computer program in the storage apparatus so as to implement the steps of the methods of Examples 1 to 9.
The above description is merely a preferred embodiment of the present disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the disclosure involved herein is not limited to technical solutions formed by the specific combination of the above technical features, but should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example technical solutions formed by the mutual replacement of the above features with the technical features disclosed in (but not limited to) the present disclosure that have similar functions.
In addition, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. On the contrary, the specific features and actions described above are merely example forms of implementing the claims.

Claims (12)

  1. A method for positioning a tissue cavity, the method comprising:
    obtaining a tissue image collected by an endoscope at the current moment;
    classifying the tissue image with a pre-trained classification model to determine a target type of the tissue image;
    if the target type indicates that a tissue cavity is present in the tissue image, determining a position of the tissue cavity in the tissue image according to a pre-trained first positioning model and the tissue image; and
    if the target type indicates that no tissue cavity is present in the tissue image, determining the position of the tissue cavity in the tissue image according to a pre-trained second positioning model, the tissue image and historical tissue images, the historical tissue images being images collected by the endoscope before the current moment.
  2. The method according to claim 1, wherein the method further comprises:
    if the target type indicates that the quality of the tissue image does not satisfy a preset condition, determining an advancing direction of the endoscope according to the historical tissue images, so as to control the endoscope to advance in the advancing direction; and
    when the position of the tissue cavity in the tissue image has been obtained, determining the advancing direction according to the position of the tissue cavity in the tissue image, so as to control the endoscope to advance in the advancing direction.
  3. The method according to claim 1 or 2, wherein the classification model comprises an encoder and a classification layer, and classifying the tissue image with the pre-trained classification model to determine the target type of the tissue image comprises:
    preprocessing the tissue image, and dividing the preprocessed tissue image into a plurality of equally sized sub-images;
    determining, from the image vector corresponding to each sub-image and the position vector corresponding to the sub-image, the token corresponding to the sub-image, the position vector indicating the position of the sub-image within the preprocessed tissue image;
    inputting the token corresponding to each sub-image, together with a randomly generated classification token, into the encoder, to obtain a local encoding vector corresponding to each sub-image and a global encoding vector corresponding to the tissue image; and
    inputting the global encoding vector and the plurality of local encoding vectors into the classification layer, to obtain the target type output by the classification layer.
  4. The method according to claim 1 or 2, wherein the first positioning model comprises a plurality of first encoders, a point regression sub-model and a heatmap sub-model, and determining the position of the tissue cavity in the tissue image according to the pre-trained first positioning model and the tissue image comprises:
    preprocessing the tissue image, and dividing the preprocessed tissue image into a plurality of equally sized sub-images;
    determining, from the image vector corresponding to each sub-image and the position vector corresponding to the sub-image, the token corresponding to the sub-image, the position vector indicating the position of the sub-image within the preprocessed tissue image;
    inputting the token corresponding to each sub-image, together with a randomly generated first positioning token, into each first encoder, to obtain, as output by the first encoder, a local encoding vector corresponding to each sub-image and a global encoding vector corresponding to the tissue image;
    inputting the global encoding vector output by each first encoder into the point regression sub-model, to obtain regression coordinates output by the point regression sub-model;
    inputting the local encoding vectors, corresponding to the sub-images, output by each first encoder into the heatmap sub-model, to obtain a heatmap output by the heatmap sub-model; and
    determining the position of the tissue cavity in the tissue image from the regression coordinates and heatmap coordinates, the heatmap coordinates being the coordinates of the brightest point in the heatmap.
  5. The method according to claim 4, wherein determining the position of the tissue cavity in the tissue image from the regression coordinates and the heatmap coordinates comprises:
    if the distance between the regression coordinates and the heatmap coordinates is smaller than a preset distance threshold, determining the coordinates of the tissue cavity in the tissue image from the regression coordinates and the heatmap coordinates;
    the method further comprising:
    if the distance between the regression coordinates and the heatmap coordinates is greater than or equal to the distance threshold, determining the advancing direction of the endoscope according to the historical tissue images, so as to control the endoscope to advance in the advancing direction.
  6. The method according to claim 1 or 2, wherein the second positioning model comprises a plurality of second encoders, a point regression sub-model and a heatmap sub-model, and determining the position of the tissue cavity in the tissue image according to the pre-trained second positioning model, the tissue image and the historical tissue images comprises:
    preprocessing the tissue image and the historical tissue images, dividing the preprocessed tissue image into a plurality of equally sized sub-images, and dividing the preprocessed historical tissue images into a plurality of historical sub-images equal in size to, and corresponding in position to, each sub-image;
    taking the sub-image and the historical sub-images corresponding in position as one image group;
    determining, from the image vector corresponding to each image group and the position vector corresponding to the image group, the token corresponding to the image group, the position vector indicating the position to which the image group corresponds;
    inputting the token corresponding to each image group, together with a randomly generated second positioning token, into each second encoder, to obtain, as output by the second encoder, a local encoding vector corresponding to each image group and a global encoding vector corresponding to an overall image group, the overall image group comprising the tissue image and the historical tissue images;
    inputting the global encoding vector output by each second encoder into the point regression sub-model, to obtain regression coordinates output by the point regression sub-model;
    inputting the local encoding vectors, corresponding to the image groups, output by each second encoder into the heatmap sub-model, to obtain a heatmap output by the heatmap sub-model; and
    determining the position of the tissue cavity in the tissue image from the regression coordinates and heatmap coordinates, the heatmap coordinates being the coordinates of the brightest point in the heatmap.
  7. The method according to claim 1, wherein the classification model is trained in the following manner:
    obtaining a first sample input set and a first sample output set, the first sample input set including a plurality of first sample inputs each comprising a sample tissue image, and the first sample output set including a first sample output corresponding to each first sample input, each first sample output comprising the true type of the corresponding sample tissue image; and
    using the first sample input set as the input of the classification model and the first sample output set as the output of the classification model, so as to train the classification model.
  8. The method according to claim 4, wherein the first positioning model is trained in the following manner:
    obtaining a second sample input set and a second sample output set, the second sample input set including a plurality of second sample inputs each comprising a sample tissue image, and the second sample output set including a second sample output corresponding to each second sample input, each second sample output comprising the true position of the tissue cavity in the corresponding sample tissue image;
    using the second sample input set as the input of the first positioning model and the second sample output set as the output of the first positioning model, so as to train the first positioning model;
    the loss of the first positioning model being determined from a regression loss and a heatmap loss, the regression loss being determined from the output of the point regression sub-model and the second sample output set, and the heatmap loss being determined from the output of the heatmap sub-model and the second sample output set.
  9. The method according to claim 6, wherein the second positioning model is trained in the following manner:
    obtaining a third sample input set and a third sample output set, the third sample input set including a plurality of third sample inputs each comprising a sample tissue image and sample historical tissue images, and the third sample output set including a third sample output corresponding to each third sample input, each third sample output comprising the true position of the tissue cavity in the corresponding sample tissue image, the sample historical tissue images being images collected by the endoscope before the sample tissue image;
    using the third sample input set as the input of the second positioning model and the third sample output set as the output of the second positioning model, so as to train the second positioning model;
    the loss of the second positioning model being determined from a regression loss and a heatmap loss, the regression loss being determined from the output of the point regression sub-model and the third sample output set, and the heatmap loss being determined from the output of the heatmap sub-model and the third sample output set.
  10. An apparatus for positioning a tissue cavity, the apparatus comprising:
    an acquisition module configured to obtain a tissue image collected by an endoscope at the current moment;
    a classification module configured to classify the tissue image with a pre-trained classification model to determine a target type of the tissue image;
    a first positioning module configured to determine, if the target type indicates that a tissue cavity is present in the tissue image, a position of the tissue cavity in the tissue image according to a pre-trained first positioning model and the tissue image; and
    a second positioning module configured to determine, if the target type indicates that no tissue cavity is present in the tissue image, the position of the tissue cavity in the tissue image according to a pre-trained second positioning model, the tissue image and historical tissue images, the historical tissue images being images collected by the endoscope before the current moment.
  11. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1-9.
  12. An electronic device, comprising:
    a storage apparatus on which a computer program is stored; and
    a processing apparatus configured to execute the computer program in the storage apparatus, so as to implement the steps of the method according to any one of claims 1-9.
PCT/CN2022/116108 2021-09-06 2022-08-31 Method and apparatus for positioning tissue cavity, readable medium, and electronic device WO2023030373A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111040354.5A CN113487609B (zh) 2021-09-06 2021-09-06 Method and apparatus for positioning tissue cavity, readable medium, and electronic device
CN202111040354.5 2021-09-06

Publications (1)

Publication Number Publication Date
WO2023030373A1 true WO2023030373A1 (zh) 2023-03-09

Family

ID=77947360

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/116108 WO2023030373A1 (zh) 2021-09-06 2022-08-31 组织腔体的定位方法、装置、可读介质和电子设备

Country Status (2)

Country Link
CN (1) CN113487609B (zh)
WO (1) WO2023030373A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113487609B (zh) * 2021-09-06 2021-12-07 北京字节跳动网络技术有限公司 Method and apparatus for positioning tissue cavity, readable medium, and electronic device
CN113658178B (zh) * 2021-10-14 2022-01-25 北京字节跳动网络技术有限公司 Tissue image recognition method and apparatus, readable medium, and electronic device
CN113743544A (zh) * 2021-11-05 2021-12-03 中科智为科技(天津)有限公司 Cross-modal neural network construction method, pedestrian retrieval method, and system
CN114332080B (zh) * 2022-03-04 2022-05-27 北京字节跳动网络技术有限公司 Method and apparatus for positioning tissue cavity, readable medium, and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016163609A (ja) * 2015-03-06 2016-09-08 富士フイルム株式会社 Branch structure determination apparatus, method, and program
CN110136106A (zh) * 2019-05-06 2019-08-16 腾讯科技(深圳)有限公司 Recognition method, system and device for medical endoscope image, and endoscopic imaging system
CN110335313A (zh) * 2019-06-17 2019-10-15 腾讯科技(深圳)有限公司 Audio collection device positioning method and apparatus, and speaker recognition method and system
CN110742690A (zh) * 2019-09-12 2020-02-04 东南大学苏州医疗器械研究院 Method for configuring an endoscope, and terminal device
CN113470030A (zh) * 2021-09-03 2021-10-01 北京字节跳动网络技术有限公司 Method and apparatus for determining tissue cavity cleanliness, readable medium, and electronic device
CN113487609A (zh) * 2021-09-06 2021-10-08 北京字节跳动网络技术有限公司 Method and apparatus for positioning tissue cavity, readable medium, and electronic device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6956853B2 (ja) * 2018-03-30 2021-11-02 オリンパス株式会社 Diagnosis support apparatus, diagnosis support program, and diagnosis support method
CN110051434A (zh) * 2019-04-25 2019-07-26 厦门强本科技有限公司 Surgical navigation method combining AR with endoscope, and terminal
CN113143168A (zh) * 2020-01-07 2021-07-23 日本电气株式会社 Medical assistance operation method, apparatus, device, and computer storage medium
CN111666998B (zh) * 2020-06-03 2022-04-22 电子科技大学 Intelligent endoscope intubation decision method based on target point detection
CN111862090B (zh) * 2020-08-05 2023-10-10 武汉楚精灵医疗科技有限公司 Artificial-intelligence-based method and system for preoperative management of esophageal cancer
CN111986196B (zh) * 2020-09-08 2022-07-12 贵州工程应用技术学院 Automatic monitoring method and system for retention of digestive tract capsule endoscope
CN112785549B (zh) * 2020-12-29 2024-03-01 成都微识医疗设备有限公司 Colonoscopy quality assessment method and apparatus based on image recognition, and storage medium
CN112466466B (zh) * 2021-01-27 2021-05-18 萱闱(北京)生物科技有限公司 Deep-learning-based auxiliary detection method, apparatus and computing device for digestive tract


Also Published As

Publication number Publication date
CN113487609B (zh) 2021-12-07
CN113487609A (zh) 2021-10-08

Similar Documents

Publication Publication Date Title
WO2023030373A1 (zh) Method and apparatus for positioning tissue cavity, readable medium, and electronic device
WO2023030370A1 (zh) Endoscope image detection method and apparatus, storage medium, and electronic device
WO2023030523A1 (zh) Tissue cavity positioning method and apparatus for endoscope, medium, and device
CN113658178B (zh) Tissue image recognition method and apparatus, readable medium, and electronic device
US11417014B2 (en) Method and apparatus for constructing map
WO2023029741A1 (zh) Tissue cavity positioning method and apparatus for endoscope, medium, and device
WO2023030097A1 (zh) Method and apparatus for determining tissue cavity cleanliness, readable medium, and electronic device
CN113470029B (zh) Training method and apparatus, image processing method, electronic device, and storage medium
WO2022012179A1 (zh) Method and apparatus for generating feature extraction network, device, and computer-readable medium
WO2023030298A1 (zh) Polyp typing method, model training method, and related apparatus
WO2023030427A1 (zh) Training method for generative model, polyp recognition method, apparatus, medium, and device
WO2023124877A1 (zh) Endoscope image processing method and apparatus, readable medium, and electronic device
CN111967515A (zh) Image information extraction method, training method and apparatus, medium, and electronic device
CN114429458A (zh) Endoscope image processing method and apparatus, readable medium, and electronic device
WO2023165332A1 (zh) Method and apparatus for positioning tissue cavity, readable medium, and electronic device
WO2023185497A1 (zh) Tissue image recognition method and apparatus, readable medium, and electronic device
WO2023030426A1 (zh) Polyp recognition method, apparatus, medium, and device
WO2023185516A1 (zh) Training method for image recognition model, recognition method, apparatus, medium, and device
WO2023130925A1 (zh) Font recognition method and apparatus, readable medium, and electronic device
CN114937178A (zh) Multimodal image classification method and apparatus, readable medium, and electronic device
CN114863124A (zh) Model training method, polyp detection method, corresponding apparatus, medium, and device
CN114511887B (zh) Tissue image recognition method and apparatus, readable medium, and electronic device
CN114565586B (zh) Training method for polyp segmentation model, polyp segmentation method, and related apparatus
CN114049417B (zh) Virtual character image generation method and apparatus, readable medium, and electronic device
CN114841970B (zh) Examination image recognition method and apparatus, readable medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22863508

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE