WO2023029741A1 - Tissue cavity positioning method, apparatus, medium and device for endoscope - Google Patents

Tissue cavity positioning method, apparatus, medium and device for endoscope

Info

Publication number
WO2023029741A1
WO2023029741A1 · PCT/CN2022/104089 · CN2022104089W
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
cavity
target
image sequence
Prior art date
Application number
PCT/CN2022/104089
Other languages
English (en)
French (fr)
Inventor
石小周
边成
赵家英
杨志雄
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2023029741A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/0002 - Inspection of images, e.g. flaw detection
    • G06T 7/0012 - Biomedical image inspection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10068 - Endoscopic image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning

Definitions

  • The present disclosure relates to the field of image processing, and in particular to a tissue cavity positioning method, apparatus, medium and device for an endoscope.
  • Endoscopic examination, such as colonoscopy, is generally divided into two stages: insertion and withdrawal. Withdrawal is the stage in which the doctor examines the patient's condition, whereas insertion usually costs the doctor more effort and time, and blind insertion may damage the intestinal mucosa and cause perforation.
  • In the related art, automated navigation can be used to shorten insertion time and reduce the doctor's workload. However, many complicated situations may arise during insertion, such as occlusion by debris, peristalsis of the intestine, and anatomical differences between patients. When the intestinal lumen is not visible, the doctor usually needs to intervene in the control of the automated device: the colonoscope is manually withdrawn a certain distance under the doctor's control and then advanced manually.
  • In a first aspect, the present disclosure provides a tissue cavity positioning method for an endoscope, the method comprising:
  • receiving a cavity image sequence to be identified, wherein the cavity image sequence contains multiple consecutive images, and the last image in the cavity image sequence is acquired by the endoscope at its current position;
  • determining, according to the cavity image sequence and a key point recognition model, a target direction point of the tissue cavity corresponding to the cavity image sequence relative to the last image, wherein the target direction point is used to indicate the next target movement direction of the endoscope at its current position;
  • wherein the key point recognition model includes a convolutional sub-network, a temporal recurrent sub-network and a decoding sub-network; the convolutional sub-network is used to obtain spatial features of the cavity image sequence, the temporal recurrent sub-network is used to obtain temporal features of the cavity image sequence, and the decoding sub-network is used to decode based on the spatial features and the temporal features to obtain the target direction point.
  • In a second aspect, the present disclosure provides a tissue cavity positioning apparatus for an endoscope, the apparatus comprising:
  • a receiving module configured to receive a cavity image sequence to be identified, wherein the cavity image sequence contains multiple consecutive images, and the last image in the cavity image sequence is acquired by the endoscope at its current position;
  • a first determination module configured to determine, according to the cavity image sequence and a key point recognition model, a target direction point of the tissue cavity corresponding to the cavity image sequence relative to the last image, wherein the target direction point is used to indicate the next target movement direction of the endoscope at its current position;
  • wherein the key point recognition model includes a convolutional sub-network, a temporal recurrent sub-network and a decoding sub-network; the convolutional sub-network is used to obtain spatial features of the cavity image sequence, the temporal recurrent sub-network is used to obtain temporal features of the cavity image sequence, and the decoding sub-network is used to decode based on the spatial features and the temporal features to obtain the target direction point.
  • In a third aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the method of the first aspect are implemented.
  • In a fourth aspect, an electronic device is provided, including:
  • a storage device on which a computer program is stored; and
  • a processing device configured to execute the computer program in the storage device to implement the steps of the method of the first aspect.
  • Through the above technical solution, multiple historical cavity images can be combined to predict the target direction point of the tissue cavity at the current moment, and during direction prediction based on the key point recognition model, the spatial features and temporal features contained in the multiple cavity images can be used at the same time. On the one hand, this effectively improves the accuracy of the predicted target direction point and provides data support for automatic insertion navigation of the endoscope; on the other hand, it makes the method suitable for more complex in-vivo environments and broadens the scope of application of the tissue cavity positioning method.
  • In addition, the movement direction toward the tissue cavity can be predicted from the cavity image sequence, so the method can also be applied to scenes in which no cavity center point is recognized in the cavity image, without manual operation by the user, raising the automation level of endoscope insertion and improving the user experience.
  • FIG. 1 is a flowchart of a tissue cavity positioning method for an endoscope provided according to an implementation of the present disclosure;
  • FIG. 2 is a schematic structural diagram of a key point recognition model provided according to an implementation of the present disclosure;
  • FIG. 3 is a flowchart of training a key point recognition model provided according to an implementation of the present disclosure;
  • FIG. 4 is a schematic diagram of a standard ConvLSTM network;
  • FIG. 5 is a block diagram of a tissue cavity positioning apparatus for an endoscope provided according to an implementation of the present disclosure;
  • FIG. 6 is a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure.
  • the term “comprise” and its variations are open-ended, ie “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • Referring to FIG. 1, which is a flowchart of a tissue cavity positioning method for an endoscope provided according to an implementation of the present disclosure, the method includes the following steps.
  • In step 11, a cavity image sequence to be identified is received, wherein the cavity image sequence contains multiple consecutive images, and the last image in the cavity image sequence is acquired by the endoscope at its current position.
  • In medical endoscopic image recognition, the endoscope captures a video stream inside a living body, such as a human body.
  • During insertion, that is, the process in which the endoscope enters a target position in the human body through a lumen communicating with the outside or through a closed body cavity, images are captured so that the current position of the endoscope can be determined from the captured images or video, providing navigation for the insertion process.
  • For example, the lumen communicating with the outside may be the digestive tract, the respiratory tract, etc.,
  • and the closed body cavity may be a cavity such as the thoracic cavity or the abdominal cavity into which the endoscope can be fed through an incision.
  • In this embodiment, images in the video stream captured while the endoscope moves can be sampled to obtain the cavity image sequence. The movement direction at the current moment can therefore be predicted from the latest N images obtained by the endoscope, improving the accuracy of the obtained movement direction.
  • In step 12, according to the cavity image sequence and a key point recognition model, the target direction point of the tissue cavity corresponding to the cavity image sequence relative to the last image is determined, wherein the target direction point is used to indicate the next target movement direction of the endoscope at its current position.
  • The tissue cavity corresponding to the cavity image sequence is the tissue cavity shown in the images of the cavity image sequence.
  • For example, the tissue cavity can be the intestinal lumen, the gastric cavity, etc. Taking the intestinal lumen as an example, after the endoscope enters the intestine it can capture images at its position to obtain a cavity image sequence, and the corresponding tissue cavity is then the intestinal lumen.
  • Taking colonoscopy as an example, automatic navigation of the colonoscope mainly determines the intestinal lumen from the cavity images, so that the colonoscope can move in the direction of the lumen and reach the ileocecal area to complete insertion.
  • However, the complex intestinal environment, such as peristalsis and the different appearance of different intestinal segments, as well as occlusion by debris, excessive curvature of the intestine, adhesion of the intestinal wall, and the camera being too close to the intestinal wall, may cause the intestinal lumen to be invisible in the currently captured cavity image, so that the direction in which the colonoscope should move cannot be determined.
  • Therefore, in the embodiments of the present disclosure, the target direction point of the tissue cavity relative to the last image is a point indicating the direction of the position of the tissue cavity. If the tissue cavity is recognized in the cavity image sequence, the target direction point can be the center point of the tissue cavity, that is, the center of the spatial cross-section enclosed by the inner wall of the tissue cavity. If the tissue cavity is not recognized in the cavity image sequence, the target direction point is the predicted relative position of the center point of the tissue cavity with respect to the last cavity image, indicating that the endoscope should be offset toward the target direction point, thereby providing direction guidance for the advancement of the endoscope.
  • As shown in FIG. 2, the key point recognition model includes a convolutional sub-network 101, a temporal recurrent sub-network 102 and a decoding sub-network 103. The convolutional sub-network 101 is used to obtain the spatial features of the cavity image sequence Im, the temporal recurrent sub-network 102 is used to obtain the temporal features of the cavity image sequence, and the decoding sub-network 103 is used to decode based on the spatial features and the temporal features to obtain the target direction point.
  • Through the above technical solution, multiple historical cavity images can be combined to predict the target direction point of the tissue cavity at the current moment, and during direction prediction based on the key point recognition model, the spatial features and temporal features contained in the multiple cavity images can be used at the same time. On the one hand, this effectively improves the accuracy of the predicted target direction point and provides data support for automatic insertion navigation of the endoscope; on the other hand, it makes the method suitable for more complex in-vivo environments and broadens the scope of application of the tissue cavity positioning method.
  • In addition, the movement direction toward the tissue cavity can be predicted from the cavity image sequence, so the method can also be applied to scenes in which no cavity center point is recognized in the cavity image, without manual operation by the user, raising the automation level of endoscope insertion and improving the user experience.
  • In a possible embodiment, the method may further include: sending the target direction point to a driving device of the endoscope, so that the endoscope moves toward the target direction point; and returning to the step of receiving a cavity image sequence to be identified, until the endoscope reaches a target position point.
  • The driving device of the endoscope is used to control the movement of the endoscope; a driving device commonly used in the field may be adopted, which is not limited in the present disclosure.
  • After the target direction point is determined, the endoscope can be controlled to offset toward the target direction point, so that the endoscope advances.
  • Afterwards, cavity images can be acquired again while the endoscope moves and combined with historical cavity images to obtain the cavity image sequence corresponding to the current position after the movement, and the target movement direction of the endoscope is further determined through steps 11 and 12 above.
  • For example, the target position point may be determined according to the site to be examined. In intestinal examination, for instance, the target position point may be the ileocecal area, so that when it is determined based on the cavity image sequence that the target position point has been reached, the movement is ended and automatic insertion of the endoscope is completed.
  • In this way, automatic insertion navigation of the endoscope can be realized based on the target direction point and the driving device, which effectively lowers the skill and experience required of the examiner to perform the insertion, makes the endoscope easier to use, and improves the user experience (a sketch of this control loop is given below).
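A minimal sketch of the closed navigation loop just described: keep the last N cavity images, predict a target direction point with the key point recognition model, hand it to the drive unit, and stop once the target position (for example, the ileocecal area) is reached. The functions `capture_image`, `drive_towards` and `reached_target` are hypothetical interfaces standing in for the endoscope hardware and detection logic, not APIs named in the patent.

```python
from collections import deque


def navigate(model, capture_image, drive_towards, reached_target, n_frames=5):
    """Hedged sketch of the automatic insertion loop, not the patented implementation."""
    history = deque(maxlen=n_frames)          # rolling cavity image sequence
    while True:
        history.append(capture_image())       # last image = current position
        if len(history) < n_frames:
            continue                          # wait until a full sequence is available
        if reached_target():                  # e.g. ileocecal area detected
            break                             # end the movement operation
        direction_point = model(list(history))  # key point recognition model
        drive_towards(direction_point)        # offset the scope toward the point
```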
  • In a possible embodiment, the key point recognition model may be trained in the following manner, which, as shown in FIG. 3, may include the following steps.
  • In step 21, multiple sets of training samples are obtained, wherein each set of training samples includes a training image sequence and a label image corresponding to the training image sequence.
  • The number of training images contained in the training image sequence can be set according to the actual usage scenario; for example, the training image sequence may contain 5 training images, that is, the position of the tissue cavity in the current state can be predicted based on the latest 5 training images.
  • The label image corresponding to the training image sequence is used to indicate the position of the direction point of the cavity in the last image, as predicted based on the multiple images.
  • In step 22, the target input image sequence is input into the convolutional sub-network to obtain a spatial feature image corresponding to the target input image sequence, and the target input image sequence is input into the temporal recurrent sub-network to obtain a temporal feature image corresponding to the target input image sequence,
  • wherein the target input image sequence includes the training image sequence.
  • In this step, a training sample can be obtained, and the training image sequence in the training sample is input into the convolutional sub-network, so that features are extracted from the training image sequence by the convolutional sub-network.
  • For example, the convolutional sub-network may adopt a ResNet18 structure with the fully connected layer and the pooling layer removed.
  • For example, the input of the convolutional sub-network may be the result of stacking the training images of the training image sequence in the channel dimension.
  • If the training images are RGB images,
  • each training image can be represented as a 3-channel image, so the input of the convolutional sub-network is an image with 3N channels, where N is the number of training images contained in the training image sequence.
  • The training image sequence is input into the convolutional sub-network in this way, so that features can be extracted from the N training images simultaneously in the convolutional sub-network.
  • In each layer of the convolutional sub-network, feature fusion is performed across the N training images to obtain the spatial feature image output by the convolutional sub-network (a sketch of such a sub-network is given below).
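A hedged sketch of the spatial sub-network described above: a ResNet18 backbone whose first convolution accepts N RGB frames stacked along the channel dimension (3N channels), with the average pooling and fully connected head removed. The use of torchvision and the exact layer wiring are assumptions for illustration, not the patent's exact implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


class SpatialSubnet(nn.Module):
    def __init__(self, num_frames: int = 5):
        super().__init__()
        backbone = resnet18()  # randomly initialised ResNet18
        # Replace the stem so it accepts 3*N input channels instead of 3.
        backbone.conv1 = nn.Conv2d(3 * num_frames, 64, kernel_size=7,
                                   stride=2, padding=3, bias=False)
        # Keep everything up to (and excluding) avgpool / fc.
        self.features = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4,
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, N, 3, H, W) -> stack the frames on the channel axis.
        b, n, c, h, w = frames.shape
        x = frames.reshape(b, n * c, h, w)
        return self.features(x)  # spatial feature map, shape (batch, 512, H/32, W/32)


# Example: a sequence of 5 RGB frames at 256x256.
feats = SpatialSubnet(num_frames=5)(torch.randn(2, 5, 3, 256, 256))
print(feats.shape)  # torch.Size([2, 512, 8, 8])
```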
  • Meanwhile, the training image sequence in the training sample can also be input into the temporal recurrent sub-network, so that features are extracted from the training image sequence by the temporal recurrent sub-network.
  • For example, the temporal recurrent sub-network can be an LSTM (Long Short-Term Memory) network.
  • In the temporal recurrent sub-network, only one training image is processed at a time, in sequence order: features are first extracted from the earliest training image to obtain a feature map, and feature extraction is then performed based on that feature map and the next training image to obtain the next feature map; that is, the network processes only one training image at a time.
  • When the current training image is processed, the processing is based on the feature maps of the historical training images together with the current training image, so that during feature extraction the features of later training images carry larger weights and the extracted features better match the features at the current moment.
  • FIG. 4 is a schematic diagram of a standard ConvLSTM network,
  • where X_t represents the input at time t,
  • h_(t-1) represents the hidden-unit input at time t-1,
  • C_(t-1) represents the input of the main-line (cell) memory of the network,
  • f_t represents the output of the forget gate,
  • i_t represents the output of the input gate,
  • g_t represents the supply to the main-line memory,
  • and o_t represents the output of the output gate.
  • In this example, 3×3 convolutions can be used uniformly in the LSTM, with padding 1 and stride 1.
  • The fusion of h_(t-1) and X_t yields f_t, which controls how much of the history C_(t-1) is forgotten; i_t weights g_t to determine how much information is taken from the unit input; and o_t determines the information obtained from the main-line memory that is output as the hidden state h_t of the unit.
  • The calculation is as follows, where φ denotes tanh, σ denotes the sigmoid function, W denotes the corresponding convolution weights in the network, ε denotes the corresponding bias (translation) term, and ⊙ denotes element-wise multiplication of matrices:
    g_t = φ(W_xg * X_t + W_hg * h_(t-1) + ε_g)
    i_t = σ(W_xi * X_t + W_hi * h_(t-1) + ε_i)
    f_t = σ(W_xf * X_t + W_hf * h_(t-1) + ε_f)
    o_t = σ(W_xo * X_t + W_ho * h_(t-1) + ε_o)
    C_t = f_t ⊙ C_(t-1) + i_t ⊙ g_t
    h_t = o_t ⊙ φ(C_t)
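A hedged sketch of a single ConvLSTM cell matching the gate equations above (3×3 convolutions, padding 1, stride 1). Producing all four gates from one convolution is a common implementation shortcut and an assumption here, not necessarily how the patent parameterises the weights.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    def __init__(self, in_channels: int, hidden_channels: int):
        super().__init__()
        # One convolution produces all four gates (i, f, g, o) at once.
        self.gates = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                               kernel_size=3, padding=1, stride=1)

    def forward(self, x_t, h_prev, c_prev):
        stacked = torch.cat([x_t, h_prev], dim=1)   # fuse X_t and h_(t-1)
        i, f, g, o = torch.chunk(self.gates(stacked), 4, dim=1)
        i = torch.sigmoid(i)      # input gate  i_t
        f = torch.sigmoid(f)      # forget gate f_t, controls forgetting of C_(t-1)
        g = torch.tanh(g)         # candidate   g_t, supply to the main-line memory
        o = torch.sigmoid(o)      # output gate o_t
        c_t = f * c_prev + i * g          # C_t = f_t ⊙ C_(t-1) + i_t ⊙ g_t
        h_t = o * torch.tanh(c_t)         # h_t = o_t ⊙ tanh(C_t)
        return h_t, c_t


# Processing a 5-frame sequence one image at a time, as the text describes.
cell = ConvLSTMCell(in_channels=3, hidden_channels=16)
h = torch.zeros(1, 16, 64, 64)
c = torch.zeros(1, 16, 64, 64)
for t in range(5):
    h, c = cell(torch.randn(1, 3, 64, 64), h, c)
```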
  • In step 23, the spatial feature image and the temporal feature image are fused to obtain a fused feature image.
  • For example, the fused feature image can be obtained by splicing the spatial feature image and the temporal feature image via a concatenate operation.
  • In step 24, the fused feature image is input into the decoding sub-network to obtain a direction feature image.
  • In a possible embodiment, the decoding sub-network may be implemented by multiple decoding layers, each containing a convolution block, a self-attention module and an upsampling module.
  • As an example, the fused feature map is input into the self-attention module and transformed by three 1×1 convolution kernels f(x), g(x) and h(x).
  • The feature map M1 produced by f(x) is transposed to obtain M1', which is multiplied (matrix multiplication) with the feature map M2 produced by g(x) to obtain a feature correlation representation; the correlation representation is then mapped by softmax into probabilities between 0 and 1 to obtain a probability matrix P; finally, matrix multiplication of P with the feature map M3 produced by h(x) yields the feature map S output by the self-attention module.
  • The feature map S is then convolved by the convolution block ConvBlock to change its number of channels, and the resulting feature map is input into the upsampling module Upsample for upsampling,
  • which produces the output feature map U.
  • The next decoding layer is then processed based on the feature map U in the same way as described above, which will not be repeated here.
  • After the output of the last decoding layer, a feature map with the same size as the original image is obtained, namely the direction feature image (a sketch of one such decoding layer is given below).
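A hedged sketch of one decoding layer as described: a self-attention module built from three 1×1 convolutions f(x), g(x) and h(x), a convolution block that changes the channel count, and an upsampling module. The reduced channel width for f and g, the bilinear mode and the BatchNorm/ReLU choice are assumptions for illustration.

```python
import torch
import torch.nn as nn


class DecodeLayer(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.f = nn.Conv2d(in_ch, in_ch // 8, kernel_size=1)   # f(x)
        self.g = nn.Conv2d(in_ch, in_ch // 8, kernel_size=1)   # g(x)
        self.h = nn.Conv2d(in_ch, in_ch, kernel_size=1)        # h(x)
        self.conv_block = nn.Sequential(                        # ConvBlock: change channels
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear",
                                    align_corners=False)        # Upsample
    def forward(self, x):
        b, c, hh, ww = x.shape
        m1 = self.f(x).flatten(2)                  # M1: (b, c/8, HW)
        m2 = self.g(x).flatten(2)                  # M2: (b, c/8, HW)
        m3 = self.h(x).flatten(2)                  # M3: (b, c,   HW)
        # correlation of transposed M1 with M2, mapped by softmax to probabilities P
        p = torch.softmax(torch.bmm(m1.transpose(1, 2), m2), dim=-1)  # (b, HW, HW)
        s = torch.bmm(m3, p).view(b, c, hh, ww)    # attention output S
        return self.upsample(self.conv_block(s))   # output feature map U


u = DecodeLayer(in_ch=64, out_ch=32)(torch.randn(1, 64, 16, 16))
print(u.shape)  # torch.Size([1, 32, 32, 32])
```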
  • In step 25, the target loss of the key point recognition model is determined according to the direction feature image and the label image corresponding to the target input image sequence.
  • For example, the mean squared error (MSE) between the direction feature image and the label image corresponding to the input training image sequence can be calculated to obtain the target loss.
  • The calculation of the mean squared error is a common method in the art and will not be repeated here.
  • In step 26, if an update condition is satisfied, the parameters of the key point recognition model are updated according to the target loss.
  • As an example, the update condition may be that the target loss is greater than a preset loss threshold, which indicates that the recognition accuracy of the key point recognition model is still insufficient.
  • As another example, the update condition may be that the number of iterations is less than a preset iteration threshold, in which case the key point recognition model is considered to have been trained for too few iterations and its recognition accuracy to be insufficient.
  • Accordingly, when the update condition is satisfied, the parameters of the key point recognition model can be updated according to the target loss.
  • The parameters may be updated based on the determined target loss using an updating method commonly used in the field, which will not be repeated here.
  • When the update condition is not satisfied, the recognition accuracy of the key point recognition model can be considered to meet the training requirement, and the training process can be stopped to obtain the trained key point recognition model (a minimal sketch of this training loop is given below).
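A minimal sketch of the update loop described above: parameters are updated while an update condition holds (here, target loss above a threshold, one of the example conditions in the text), and training stops otherwise. The threshold, iteration cap and optimizer are placeholders, not values from the patent.

```python
import torch


def train(model, samples, compute_target_loss, loss_threshold=1e-3, max_iters=10_000):
    """samples yields (image_sequence, label_image); compute_target_loss returns a scalar tensor."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer
    for step, (image_seq, label_img) in enumerate(samples):
        loss = compute_target_loss(model, image_seq, label_img)
        if loss.item() <= loss_threshold or step >= max_iters:
            break                      # update condition no longer satisfied: stop training
        optimizer.zero_grad()
        loss.backward()                # update parameters according to the target loss
        optimizer.step()
    return model
```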
  • In this way, the key point recognition model can be trained on training image sequences, so that it can combine the spatial features of multiple training images and, at the same time, make predictions from the temporal relationship between those images, improving the recognition accuracy of the key point recognition model and making the tissue cavity positioning method applicable to more complex and wider application scenarios.
  • Moreover, during training, features can be extracted in temporal order, so that feature extraction from the serialized data better matches human subjective cognition and the user's own recognition experience, which further guarantees, to a certain extent, the accuracy of the predicted direction point and provides data support for accurate navigation of the endoscope's movement.
  • Accordingly, in step 12, determining the target direction point of the tissue cavity corresponding to the cavity image sequence relative to the last image according to the cavity image sequence and the key point recognition model may include:
  • In this way, the target direction point corresponding to the cavity image sequence can be determined quickly and accurately based on the features output by the key point recognition model, providing guidance on the movement direction for the automatic navigation of the endoscope.
  • In a possible embodiment, the target input image sequence further includes a processed image sequence,
  • where the processed image sequence is an image sequence obtained by preprocessing the training image sequence,
  • and the label image corresponding to the processed image sequence is an image obtained by applying the same preprocessing to the label image corresponding to the training image sequence.
  • For example, the preprocessing may be data augmentation, such as color, brightness, chroma and saturation transformations, and affine transformations.
  • The training images may also be standardized before augmentation, that is, resized to a preset size, to facilitate uniform processing of the training images.
  • In this way, the training images in the training image sequence can be preprocessed to transform the training image sequence into a processed image sequence, increasing the diversity of the training samples, which can effectively improve the generalization of the trained key point recognition model and make the tissue cavity positioning method applicable to more complex and wider application scenarios.
  • In order to keep the training image sequence and the label image consistent, the label image can be transformed with the same preprocessing to obtain the label image corresponding to the processed image sequence, and the prediction error of the output image corresponding to the processed image sequence is then computed against this processed label image. This further increases the diversity of the training images, improves the training efficiency of the key point recognition model to a certain extent, improves the stability of the model, and provides accurate data support for endoscope navigation (a sketch of such paired augmentation is given below).
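A hedged sketch of paired augmentation: the same random affine transform is applied to every frame of a training sequence and to its label image, while a colour/brightness/saturation jitter touches only the frames, roughly as the text describes. The torchvision calls and the parameter ranges are assumptions for illustration.

```python
import random
import torchvision.transforms.functional as TF
from torchvision import transforms


def augment(frames, label):
    """frames: list of (3, H, W) float tensors; label: (1, H, W) tensor."""
    angle = random.uniform(-15, 15)                       # shared geometric parameters
    translate = [random.randint(-10, 10), random.randint(-10, 10)]
    scale = random.uniform(0.9, 1.1)
    jitter = transforms.ColorJitter(brightness=0.2, contrast=0.2,
                                    saturation=0.2, hue=0.05)
    out_frames = []
    for img in frames:
        img = jitter(img)  # colour/brightness/chroma/saturation change (frames only)
        img = TF.affine(img, angle=angle, translate=translate,
                        scale=scale, shear=[0.0, 0.0])
        out_frames.append(img)
    # the label gets exactly the same geometric transform so it stays consistent
    label = TF.affine(label, angle=angle, translate=translate,
                      scale=scale, shear=[0.0, 0.0])
    return out_frames, label
```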
  • In a possible embodiment, an exemplary implementation of determining the target loss of the key point recognition model is as follows, and this step may include:
  • converting the label image into a Gaussian feature map according to the positions of each point in the label image and of the marked direction point in the label image, wherein the marked direction point in the label image is the direction point of the tissue cavity annotated for the training image sequence.
  • The label image can thus be processed so that it is converted into a Gaussian feature map through the positional relationship between each point in the label image and the marked direction point, wherein the farther a point in the label image is from the marked direction point, the smaller the Gaussian feature value of that point.
  • For example, the label image may be converted into a Gaussian feature map according to the position of each point in the label image and the marked direction point by a formula in which:
  • y'(x, y; x_l, y_l, σ) represents the feature value at coordinate (x, y) in the Gaussian feature map;
  • and σ represents the hyperparameter of the Gaussian transformation, whose value can be set based on the actual application scenario and is not limited in this disclosure.
  • In this way, each point in the label image other than the marked direction point can also be characterized by a feature value, providing data support for the subsequent accurate calculation of the target loss of the model (a sketch of this conversion is given below).
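A hedged sketch of converting a marked direction point (x_l, y_l) into a Gaussian feature map in which a pixel's value decays with its distance from the point. The exact formula is not reproduced in this extract, so a standard 2-D Gaussian, exp(-((x-x_l)^2 + (y-y_l)^2) / (2σ^2)), is assumed here.

```python
import numpy as np


def gaussian_feature_map(height, width, x_l, y_l, sigma=10.0):
    """Feature value y'(x, y; x_l, y_l, sigma) for every pixel of the label image."""
    ys, xs = np.mgrid[0:height, 0:width]
    d2 = (xs - x_l) ** 2 + (ys - y_l) ** 2        # squared distance to the marked point
    return np.exp(-d2 / (2.0 * sigma ** 2))       # farther points get smaller values


heatmap = gaussian_feature_map(256, 256, x_l=180, y_l=90)
print(heatmap.max(), heatmap[90, 180])  # peak value 1.0 at the marked direction point
```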
  • The target loss is then determined according to the direction feature image and the Gaussian feature map.
  • For example, a mean squared error may be calculated between the direction feature image and the Gaussian feature map to obtain the target loss.
  • In this way, when determining the target loss, the label image can be converted into a Gaussian map for the calculation, which guarantees the accuracy of the determined target loss, thereby ensuring the accuracy of the parameter adjustment of the key point recognition model and improving the efficiency of model training. At the same time, it improves the accuracy of the direction points predicted by the trained key point recognition model for the cavity image sequence to be identified, providing decision data for endoscope navigation.
  • In a possible embodiment, the decoding sub-network includes multiple layers of feature decoding networks, and the feature maps output by the different layers have different sizes.
  • An exemplary implementation of determining the target loss of the key point recognition model according to the direction feature image and the label image corresponding to the target input image sequence is as follows, and this step may include:
  • for each layer of the feature decoding network, standardizing the size of the feature map output by that layer or of the label image, so as to obtain a target feature map and a target label image of the same size corresponding to that layer.
  • For example, the feature map output by each layer can be standardized, that is, resized to the same size as the label image; the resized feature map of each layer is then used as the target feature map corresponding to that layer, and the label image is used as the target label image.
  • Alternatively, the label image can be standardized: for each layer of the feature decoding network, the label image is resized to the same size as the feature map output by that layer; the resized label image is then used as the target label image corresponding to that layer, and the feature map output by that layer is used as the target feature map.
  • For all layers of the feature decoding network, the object of the standardization is the same, that is, either every layer standardizes the label image or every layer standardizes its feature map.
  • For each layer of the feature decoding network, the loss corresponding to that layer is then determined according to the target feature map and the target label image corresponding to that layer.
  • The loss is calculated in a manner similar to that described above and will not be repeated here. In this way, attention can be paid to the accuracy of the target direction point predicted by each layer of the decoding sub-network during decoding, so as to improve the accuracy of the finally determined target direction point.
  • The target loss of the key point recognition model is then determined according to the losses corresponding to all layers of the feature decoding network.
  • For example, the sum of the losses corresponding to the layers may be taken as the target loss, or the target loss may be determined from the average of those losses, which can be set according to the actual usage scenario.
  • In this way, a loss can be calculated for the feature map output by each layer of the feature decoding network in the decoding sub-network, and the target loss of the key point recognition model can be determined by combining the losses of all layers.
  • On the one hand, predictions at multiple scales improve the accuracy of the determined target loss and thus the efficiency and accuracy of parameter adjustment based on the target loss, improving the training efficiency of the key point recognition model. On the other hand, this improves the prediction accuracy of each layer of the feature decoding network, avoids to a certain extent the accumulation of decoding errors across the multiple decoding layers, further improves the recognition accuracy of the key point recognition model, and ensures accurate endoscope navigation (a sketch of this multi-scale loss is given below).
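A hedged sketch of the multi-scale supervision described above: for every decoding layer, the Gaussian label map is resized to that layer's resolution (resizing the feature map to the label size would work equally well) and an MSE loss is computed; the per-layer losses are then summed. The bilinear interpolation mode and the choice of summing rather than averaging are assumptions.

```python
import torch
import torch.nn.functional as F


def multi_scale_loss(layer_outputs, gaussian_label):
    """layer_outputs: list of (B, 1, h_i, w_i) maps, one per decoding layer.
    gaussian_label: (B, 1, H, W) Gaussian feature map built from the label image."""
    total = torch.zeros((), device=gaussian_label.device)
    for feat in layer_outputs:
        target = F.interpolate(gaussian_label, size=feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        total = total + F.mse_loss(feat, target)   # loss corresponding to this layer
    return total                                   # target loss combining all layers
```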
  • The present disclosure also provides a tissue cavity positioning apparatus for an endoscope. As shown in FIG. 5, the apparatus 50 includes:
  • a receiving module 51 configured to receive a cavity image sequence to be identified, wherein the cavity image sequence contains multiple consecutive images, and the last image in the cavity image sequence is acquired by the endoscope at its current position;
  • a first determination module 52 configured to determine, according to the cavity image sequence and a key point recognition model, a target direction point of the tissue cavity corresponding to the cavity image sequence relative to the last image, wherein the target direction point is used to indicate the next target movement direction of the endoscope at its current position;
  • wherein the key point recognition model includes a convolutional sub-network, a temporal recurrent sub-network and a decoding sub-network;
  • the convolutional sub-network is used to obtain spatial features of the cavity image sequence;
  • the temporal recurrent sub-network is used to obtain temporal features of the cavity image sequence;
  • and the decoding sub-network is used to decode based on the spatial features and the temporal features to obtain the target direction point.
  • In a possible embodiment, the key point recognition model is trained by a training device, and the training device includes:
  • an acquisition module configured to acquire multiple sets of training samples, wherein each set of training samples includes a training image sequence and a label image corresponding to the training image sequence;
  • a first processing module configured to input a target input image sequence into the convolutional sub-network to obtain a spatial feature image corresponding to the target input image sequence, and to input the target input image sequence into the temporal recurrent sub-network to obtain a temporal feature image corresponding to the target input image sequence, wherein the target input image sequence includes the training image sequence;
  • a fusion module configured to fuse the spatial feature image and the temporal feature image to obtain a fused feature image;
  • a second processing module configured to input the fused feature image into the decoding sub-network to obtain a direction feature image;
  • a second determination module configured to determine the target loss of the key point recognition model according to the direction feature image and the label image corresponding to the target input image sequence;
  • and an update module configured to update the parameters of the key point recognition model according to the target loss when an update condition is met.
  • In a possible embodiment, the target input image sequence further includes a processed image sequence,
  • the processed image sequence is an image sequence obtained by preprocessing the training image sequence,
  • and the label image corresponding to the processed image sequence is an image obtained by applying the same preprocessing to the label image corresponding to the training image sequence.
  • the second determination module includes:
  • a conversion submodule configured to convert the label image into a Gaussian feature map according to the positions of each point in the label image and the marked direction point in the label image;
  • a first determining submodule configured to determine the target loss according to the directional feature image and the Gaussian feature map.
  • wherein the label image is converted into a Gaussian feature map according to the position of each point in the label image and the marked direction point in the label image by a formula in which:
  • y'(x, y; x_l, y_l, σ) represents the feature value at coordinate (x, y) in the Gaussian feature map;
  • and σ represents the hyperparameter of the Gaussian transformation.
  • In a possible embodiment, the decoding sub-network includes multiple layers of feature decoding networks, and the feature maps output by the different layers have different sizes;
  • the second determination module includes:
  • a processing sub-module configured to, for each layer of the feature decoding network, standardize the size of the feature map output by that layer or of the label image, so as to obtain a target feature map and a target label image of the same size corresponding to that layer;
  • a second determination sub-module configured to, for each layer of the feature decoding network, determine the loss corresponding to that layer according to the target feature map and the target label image corresponding to that layer;
  • and a third determination sub-module configured to determine the target loss of the key point recognition model according to the losses corresponding to the layers of the feature decoding network.
  • In a possible embodiment, the apparatus further includes:
  • a sending module configured to send the target direction point to the driving device of the endoscope so that the endoscope moves toward the target direction point, and to trigger the receiving module to receive a cavity image sequence to be identified, until the endoscope reaches the target position point.
  • Referring now to FIG. 6, it shows a schematic structural diagram of an electronic device 600 suitable for implementing an embodiment of the present disclosure.
  • Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • The electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • As shown in FIG. 6, the electronic device 600 may include a processing device (such as a central processing unit or a graphics processing unit) 601, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored.
  • the processing device 601, ROM 602, and RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604 .
  • Generally, the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 608 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 609.
  • the communication means 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data. While FIG. 6 shows electronic device 600 having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602.
  • the processing device 601 When the computer program is executed by the processing device 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • In some embodiments, the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium.
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet) and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: receive a cavity image sequence to be identified, wherein the cavity image sequence contains multiple consecutive images, and the last image in the cavity image sequence is acquired by the endoscope at its current position; and determine, according to the cavity image sequence and a key point recognition model, a target direction point of the tissue cavity corresponding to the cavity image sequence relative to the last image, wherein the target direction point is used to indicate the next target movement direction of the endoscope at its current position;
  • wherein the key point recognition model includes a convolutional sub-network, a temporal recurrent sub-network and a decoding sub-network; the convolutional sub-network is used to obtain spatial features of the cavity image sequence, the temporal recurrent sub-network is used to obtain temporal features of the cavity image sequence, and the decoding sub-network is used to decode based on the spatial features and the temporal features to obtain the target direction point.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, connected via the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more logical functions for implementing specified executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by a dedicated hardware-based system that performs the specified functions or operations , or may be implemented by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the module does not constitute a limitation of the module itself under certain circumstances, for example, the receiving module may also be described as "a module that receives a cavity image sequence to be identified".
  • Exemplary types of hardware logic components that may be used include, without limitation: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), and so forth.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • Example 1 provides a tissue cavity positioning method for an endoscope, wherein the method includes:
  • receiving a cavity image sequence to be identified, wherein the cavity image sequence contains multiple consecutive images, and the last image in the cavity image sequence is acquired by the endoscope at its current position;
  • determining, according to the cavity image sequence and a key point recognition model, a target direction point of the tissue cavity corresponding to the cavity image sequence relative to the last image, wherein the target direction point is used to indicate the next target movement direction of the endoscope at its current position;
  • wherein the key point recognition model includes a convolutional sub-network, a temporal recurrent sub-network and a decoding sub-network;
  • the convolutional sub-network is used to obtain spatial features of the cavity image sequence; the temporal recurrent sub-network is used to obtain temporal features of the cavity image sequence;
  • and the decoding sub-network is used to decode based on the spatial features and the temporal features to obtain the target direction point.
  • Example 2 provides the method of Example 1, wherein the key point recognition model is trained in the following manner:
  • obtaining multiple sets of training samples, wherein each set of training samples includes a training image sequence and a label image corresponding to the training image sequence;
  • inputting a target input image sequence into the convolutional sub-network to obtain a spatial feature image corresponding to the target input image sequence, and inputting the target input image sequence into the temporal recurrent sub-network to obtain a temporal feature image corresponding to the target input image sequence, wherein the target input image sequence includes the training image sequence;
  • fusing the spatial feature image and the temporal feature image to obtain a fused feature image;
  • inputting the fused feature image into the decoding sub-network to obtain a direction feature image;
  • determining the target loss of the key point recognition model according to the direction feature image and the label image corresponding to the target input image sequence;
  • and, when an update condition is satisfied, updating the parameters of the key point recognition model according to the target loss.
  • Example 3 provides the method of Example 2, wherein the target input image sequence further includes a processed image sequence, and the processed image sequence is an image sequence obtained by preprocessing the training image sequence,
  • and the label image corresponding to the processed image sequence is an image obtained by applying the same preprocessing to the label image corresponding to the training image sequence.
  • Example 4 provides the method of Example 2, wherein determining the target loss of the key point recognition model according to the direction feature image and the label image corresponding to the target input image sequence includes:
  • converting the label image into a Gaussian feature map according to the positions of each point in the label image and the marked direction point in the label image, wherein the marked direction point in the label image is the direction point of the tissue cavity annotated for the training image sequence;
  • and determining the target loss according to the direction feature image and the Gaussian feature map.
  • Example 5 provides the method of Example 4, wherein the label image is converted into a Gaussian feature map according to the position of each point in the label image and the marked direction point in the label image by a formula in which:
  • y'(x, y; x_l, y_l, σ) represents the feature value at coordinate (x, y) in the Gaussian feature map;
  • and σ represents the hyperparameter of the Gaussian transformation.
  • Example 6 provides the method of Example 2, wherein the decoding sub-network includes multiple layers of feature decoding networks, and the feature maps output by the different layers have different sizes;
  • determining the target loss of the key point recognition model according to the direction feature image and the label image corresponding to the target input image sequence includes:
  • for each layer of the feature decoding network, standardizing the size of the feature map output by that layer or of the label image, so as to obtain a target feature map and a target label image of the same size corresponding to that layer;
  • for each layer of the feature decoding network, determining the loss corresponding to that layer according to the target feature map and the target label image corresponding to that layer;
  • and determining the target loss of the key point recognition model according to the losses corresponding to the layers of the feature decoding network.
  • Example 7 provides the method of Example 1, wherein the method further includes:
  • sending the target direction point to a driving device of the endoscope, so that the endoscope moves toward the target direction point; and returning to the step of receiving a cavity image sequence to be identified, until the endoscope reaches a target position point.
  • Example 8 provides a tissue cavity positioning device for an endoscope, wherein the device includes:
  • a receiving module configured to receive a cavity image sequence to be identified, wherein the cavity image sequence contains multiple consecutive images, and the last image in the cavity image sequence is acquired by the endoscope at its current position;
  • a first determination module configured to determine, according to the cavity image sequence and a key point recognition model, a target direction point of the tissue cavity corresponding to the cavity image sequence relative to the last image, wherein the target direction point is used to indicate the next target movement direction of the endoscope at its current position;
  • wherein the key point recognition model includes a convolutional sub-network, a temporal recurrent sub-network and a decoding sub-network;
  • the convolutional sub-network is used to obtain spatial features of the cavity image sequence;
  • the temporal recurrent sub-network is used to obtain temporal features of the cavity image sequence;
  • and the decoding sub-network is used to decode based on the spatial features and the temporal features to obtain the target direction point.
  • Example 9 provides a computer-readable medium on which a computer program is stored, wherein, when the program is executed by a processing device, the steps of the method described in any one of Examples 1-7 are implemented.
  • Example 10 provides an electronic device, including:
  • a storage device on which a computer program is stored; and
  • a processing device configured to execute the computer program in the storage device, so as to implement the steps of the method in any one of Examples 1-7.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Endoscopes (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a tissue cavity positioning method, apparatus, medium and device for an endoscope. The method includes: receiving a cavity image sequence to be identified, wherein the cavity image sequence contains multiple consecutive images and the last image in the cavity image sequence is acquired by the endoscope at its current position; and determining, according to the cavity image sequence and a key point recognition model, a target direction point of the tissue cavity corresponding to the cavity image sequence relative to the last image. The key point recognition model includes a convolutional sub-network, a temporal recurrent sub-network and a decoding sub-network; the convolutional sub-network is used to obtain spatial features of the cavity image sequence, the temporal recurrent sub-network is used to obtain temporal features of the cavity image sequence, and the decoding sub-network is used to decode based on the spatial features and the temporal features to obtain the target direction point. The direction of the tissue cavity can thus be predicted, providing data support for endoscope insertion navigation.

Description

Tissue cavity positioning method, apparatus, medium and device for endoscope
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on and claims priority to Chinese patent application No. 202111033760.9, filed on September 3, 2021 and entitled "Tissue cavity positioning method, apparatus, medium and device for endoscope", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the field of image processing, and in particular to a tissue cavity positioning method, apparatus, medium and device for an endoscope.
BACKGROUND
In recent years, with the emergence of deep learning, artificial intelligence has developed rapidly. In many fields, artificial intelligence can take over human work, such as performing repetitive and tedious tasks, which greatly reduces the burden of human work.
Endoscopic examination, such as colonoscopy, is usually divided into two stages: insertion and withdrawal. Withdrawal is the stage in which the doctor examines the condition, but insertion often costs the doctor more effort and time, and blind insertion may damage the intestinal mucosa and cause perforation. In the related art, automated navigation can be used to shorten insertion time and reduce the doctor's workload. However, many complicated situations may arise during insertion, such as occlusion by debris, intestinal peristalsis, and differences between the intestines of different people. When the intestinal lumen is not visible, the doctor usually needs to take part in controlling the automated device: the colonoscope is withdrawn a certain distance under the doctor's manual control and then advanced manually.
SUMMARY
This summary is provided to introduce, in a brief form, concepts that are described in detail in the detailed description that follows. This summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, the present disclosure provides a tissue cavity positioning method for an endoscope, the method comprising:
receiving a cavity image sequence to be identified, wherein the cavity image sequence contains multiple consecutive images, and the last image in the cavity image sequence is acquired by the endoscope at its current position;
determining, according to the cavity image sequence and a key point recognition model, a target direction point of the tissue cavity corresponding to the cavity image sequence relative to the last image, wherein the target direction point is used to indicate the next target movement direction of the endoscope at its current position;
wherein the key point recognition model includes a convolutional sub-network, a temporal recurrent sub-network and a decoding sub-network; the convolutional sub-network is used to obtain spatial features of the cavity image sequence, the temporal recurrent sub-network is used to obtain temporal features of the cavity image sequence, and the decoding sub-network is used to decode based on the spatial features and the temporal features to obtain the target direction point.
In a second aspect, the present disclosure provides a tissue cavity positioning apparatus for an endoscope, the apparatus comprising:
a receiving module configured to receive a cavity image sequence to be identified, wherein the cavity image sequence contains multiple consecutive images, and the last image in the cavity image sequence is acquired by the endoscope at its current position;
a first determination module configured to determine, according to the cavity image sequence and a key point recognition model, a target direction point of the tissue cavity corresponding to the cavity image sequence relative to the last image, wherein the target direction point is used to indicate the next target movement direction of the endoscope at its current position;
wherein the key point recognition model includes a convolutional sub-network, a temporal recurrent sub-network and a decoding sub-network; the convolutional sub-network is used to obtain spatial features of the cavity image sequence, the temporal recurrent sub-network is used to obtain temporal features of the cavity image sequence, and the decoding sub-network is used to decode based on the spatial features and the temporal features to obtain the target direction point.
In a third aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the method of the first aspect are implemented.
In a fourth aspect, an electronic device is provided, including:
a storage device on which a computer program is stored; and
a processing device configured to execute the computer program in the storage device to implement the steps of the method of the first aspect.
Through the above technical solution, multiple historical cavity images can be combined to predict the target direction point of the tissue cavity at the current moment, and during direction prediction based on the key point recognition model, the spatial features and temporal features contained in the multiple cavity images can be used at the same time. On the one hand, this effectively improves the accuracy of the predicted target direction point and provides data support for automatic insertion navigation of the endoscope; on the other hand, it makes the method applicable to more complex in-vivo environments and broadens the scope of application of the tissue cavity positioning method. Moreover, the movement direction toward the tissue cavity can be predicted from the cavity image sequence, so the method can be applied to scenes in which no cavity center point is recognized in the cavity image, without manual operation by the user, raising the automation level of endoscope insertion and improving the user experience.
Other features and advantages of the present disclosure will be described in detail in the detailed description that follows.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flowchart of a tissue cavity positioning method for an endoscope provided according to an implementation of the present disclosure;
FIG. 2 is a schematic structural diagram of a key point recognition model provided according to an implementation of the present disclosure;
FIG. 3 is a flowchart of training a key point recognition model provided according to an implementation of the present disclosure;
FIG. 4 is a schematic diagram of a standard ConvLSTM network;
FIG. 5 is a block diagram of a tissue cavity positioning apparatus for an endoscope provided according to an implementation of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure.
DETAILED DESCRIPTION
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method implementations of the present disclosure may be performed in a different order and/or in parallel. In addition, the method implementations may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "comprise" and its variations as used herein are open-ended, i.e. "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules or units, and are not used to limit the order or interdependence of the functions performed by these devices, modules or units.
It should be noted that the modifiers "a/an" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple devices in the implementations of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
图1所示,为根据本公开的实现方式提供的用于内窥镜的组织腔体定位方法的流程图,如图1所示,所述方法包括:
在步骤11中,接收待识别的腔体图像序列,其中,所述腔体图像序列包含有多张连续的图像,所述腔体图像序列中的最后一张图像为内窥镜在其当前所处位置获得的。
其中,在医疗内窥镜图像识别中,内窥镜在生物体,例如人体内部拍摄医疗内窥镜视频流。示例性的,在内窥镜的进镜过程即从人体与外界相通的腔道或者密闭体腔进入人体的目标位置的过程中进行图像拍摄,从而可以基于拍摄的图像或视频确定其当前所处的位置,以为其进镜过程提供导航。示例地,与外界相通的腔道可以是消化道、呼吸道等,密闭体腔可以是胸腔、腹腔等可以通过切口来送入内窥镜的腔体。
在该实施例中,可以对内窥镜移动过程中拍摄的视频流中的图像进行采样,从而可以获得该腔体图像序列,因此,可以基于内窥镜获得的最近的N张图像,预测N时刻下的移动方向,提高获得的移动方向的准确性。
在步骤12中,根据腔体图像序列和关键点识别模型,确定腔体图像序列对应的组织腔体相对于所述最后一张图像的目标方向点,其中,所述目标方向点用于指示所述内窥镜在其当前所处位置的下一目标移动方向。
其中,所述腔体图像序列对应的组织腔体即为该腔体图像序列中的显示图像应对应的组织腔体。示例地,该组织腔体可以为肠腔、胃腔等,以肠腔为例,在内窥镜进入肠腔后,其可以在其所述位置拍摄图像从而获得腔体图像序列,则其应对应的组织腔体即为肠腔。
以肠镜为例,肠镜的自动化导航主要是基于腔体图像确定出肠道中的肠腔,以使肠镜按照肠腔的方向进行移动,到达回盲区域完成进镜。而由于肠道的环境复杂,例如肠道的蠕动、不同肠段的不同外观等,以及肠道中污物的遮挡、肠道弯曲过大、肠壁黏连、镜头离肠壁过近等可能会导致当前拍摄到的腔体图像中看不到肠腔,从而无法确定肠镜的移动位置。因此,在本公开实施例中,组织腔体相对于所述最后一张图像的目标方向点为用于表示该组织腔体所处位置的方向的点,即若腔体图像序列中识别出该组织腔体,则该目标方向点可以是该组织腔体的中心点,即组织腔体内壁所包围空间截面的中心,若腔体图像序列未识别出组织腔体,则该目标方向点为预测出的组织腔体的中心点相对于最后一张腔体图像的相对位置点,表示内窥镜应该朝该目标方向点的方向偏移,以为内窥镜的前进提供方向引导。
其中,如图2所示,所述关键点识别模型包括卷积子网络101、时间循环子网络102和解码子网络103,所述卷积子网络101用于获取所述腔体图像序列Im的空间特征,所述时间循环子网络102用于获取所述腔体图像序列的时间特征,所述解码子网络103用于基于所述空间特征和所述时间特征进行解码,以获得所述目标方向点。
由此,通过上述技术方案,可以结合多张历史腔体图像对当前时刻下的组织腔体的目标方向点进行预测,并且在基于关键点识别模型进行方向预测的过程中,可以同时利用多张腔体图像所包含的空间特征以及时间特征,一方面可以有效提高预测出的目标方向点的准确性,为内窥镜的自动进镜导航提供数据支持;另一方面可以使得该方法适用于更加复杂的体内环境,提高该组织腔体定位方法的适用范围。并且,通过上述技术方案,可以基于该腔体图像序列对组织腔体的移动方向进行预测,从而可以应用于从腔体图像中未识别出腔体中心点的场景,无需用户手动操作,提高内窥镜进镜的自动化水平,提升用户使用体验。
在一种可能的实施例中,所述方法还可以包括:
向所述内窥镜的驱动装置发送所述目标方向点,以使所述内窥镜向所述目标方向点移动;
并重新返回执行所述接收待识别的腔体图像序列的步骤,直至所述内窥镜到达目标位置点。
其中,所述内窥镜的驱动装置用于控制所述内窥镜移动,可以采用本领域中常用的驱动装置,本公开对此不进行限定。在确定出目标方向点后,则可以控制该内窥镜向该目标方向点偏移,以使得内窥镜实现进镜移动。之后,则可以在内窥镜移动的过程中再次获取腔体图像,并结合历史腔体图像获得内窥镜移动后的当前位置对应的腔体图像序列,并通过上述步骤11和步骤12,进一步确定内窥镜的目标移动方向。
示例地,该目标位置点可以为根据检测部位确定出的目标位置点,如进行肠道检测时,该目标位置点可以为肠道中的回盲区域的位置点,从而在基于腔体图像序列确定到达该目标位置点时结束移动操作,实现内窥镜的自动进镜操作。
由此,通过上述技术方案,可以基于该目标方向点和驱动装置实现内窥镜的自动进镜导航,从而可以有效降低内窥镜进镜操作对检测人员的技术和经验要求,便于检测人员使用,提升用户使用体验。
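下面给出一个自动进镜导航主循环的简化示意。其中drive_to、reached_target、heatmap_argmax等函数名均为示意性假设，并非本公开限定的接口，仅用于说明“预测目标方向点、驱动移动、重新采集”的循环过程：

```python
def auto_navigate(image_stream, model, drive_to, reached_target, heatmap_argmax):
    """示意性的自动进镜主循环：预测目标方向点 -> 驱动移动 -> 重新采集。"""
    for sequence in image_stream:               # 每个sequence为长度N的腔体图像序列
        heatmap = model(sequence)               # 关键点识别模型输出方向特征图像
        target_point = heatmap_argmax(heatmap)  # 取特征值最大的点作为目标方向点
        drive_to(target_point)                  # 将目标方向点发送给内窥镜驱动装置
        if reached_target():                    # 到达目标位置点（如回盲区域）则结束
            break
```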
为了使本领域技术人员能够更加清楚地理解本公开实施例提供的技术方案，下面对上述步骤及相关内容进行较为详细的说明。
在一种可能的实施例中,所述关键点识别模型可以通过如下方式进行训练,如图3所示,其可以包括以下步骤:
在步骤21中,获取多组训练样本,其中,每组训练样本中包含训练图像序列以及与所述训练图像序列对应的标签图像。
其中，训练图像序列中包含的训练图像的数量可以根据实际使用场景进行限定，例如，该训练图像序列中可以包含5张训练图像，即可以基于最近的5张训练图像预测当前状态下的组织腔体的位置。其中，该训练图像序列对应的标签图像用于指示基于该多张图像预测出的最后一张图像中的腔体的方向点的位置。
在步骤22中,将目标输入图像序列输入卷积子网络,获得目标输入图像序列对应的空间特征图像,并将目标输入图像序列输入时间循环子网络,获得目标输入图像序列对应的时间特征图像,其中,所述目标输入图像序列包含所述训练图像序列。
其中,在该步骤中可以获取训练样本,将该训练样本中的训练图像序列输入卷积子网络,以通过该卷积子网络对该训练图像序列进行特征提取。示例地,该卷积子网络可以采用删除全连接层和池化层的Resnet18网络结构。
示例地，该卷积子网络的输入可以为该训练图像序列中各个训练图像在通道维度上叠加的结果，如训练图像为RGB图像，则该训练图像可以表示为3通道的图像，因此，该卷积子网络的输入为3N通道维度的图像，其中N为训练图像序列中包含的训练图像的数量。之后，将该训练图像序列通过上述方式输入该卷积子网络，从而可以在该卷积子网络中同时对该N张训练图像进行特征提取。其中，在该卷积子网络中的每一层网络中都会基于N张训练图像进行特征融合处理，以获得该卷积子网络输出的空间特征图像。
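按上述描述，将N张RGB图像在通道维叠加后送入卷积子网络的一个示意性PyTorch片段如下。此处假设采用torchvision提供的ResNet18并去掉全连接层与池化层，首层卷积通道数改为3N；具体结构与通道数均为示意性假设：

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def build_conv_subnet(n_images=5):
    """构建输入为3N通道的卷积子网络（基于去掉全连接层和池化层的ResNet18）。"""
    backbone = resnet18(weights=None)
    # 首层卷积改为接收 3N 通道的叠加输入
    backbone.conv1 = nn.Conv2d(3 * n_images, 64, kernel_size=7, stride=2, padding=3, bias=False)
    # 去掉最大池化、全局平均池化与全连接层，仅保留卷积特征提取部分
    modules = [backbone.conv1, backbone.bn1, backbone.relu,
               backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4]
    return nn.Sequential(*modules)

# 用法示意：5张RGB图像在通道维度叠加为 (B, 15, H, W)
x = torch.randn(2, 15, 224, 224)
spatial_feature = build_conv_subnet(5)(x)   # 输出空间特征图
```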
并且，同时可以将该训练样本中的训练图像序列输入时间循环子网络，以通过该时间循环子网络对该训练图像序列进行特征提取。示例地，该时间循环子网络可以是LSTM(Long Short-Term Memory,长短期记忆)网络。在该时间循环子网络中，基于先后次序的关系每次只处理一张训练图像：先针对该训练图像序列中对应时间最早的训练图像进行特征提取，获得特征图，之后基于该特征图与下一训练图像进行特征提取，进而获得下一特征图；即在该网络中每次只处理一张训练图像，在处理当前训练图像时，是基于历史的训练图像的特征图和当前训练图像进行处理，由此可以使得在图像特征提取的过程中，时间越晚的训练图像的特征权重越大，使得提取出的特征与当前时刻的特征更加匹配。
如图4所示，为标准的ConvLSTM网络的示意图。其中X_t代表t时刻的输入，h_{t-1}代表t-1时刻的隐藏单元的输入，C_{t-1}代表该网络的主线记忆的输入，f_t代表遗忘门输出，i_t代表输入门输出，g_t代表对主线记忆的补给，o_t代表输出门的输出。在该示例中，LSTM中可以统一使用3×3的卷积，padding为1，步长stride为1。输入通过h_{t-1}和X_t的融合得到f_t，控制历史C_{t-1}的遗忘程度，通过i_t为g_t加权来确定从该单元输入获得的信息量，通过o_t来得到从主线记忆中获得的信息作为该单元的输出h_t。其中φ代表tanh，σ代表Sigmoid，W代表网络中对应的卷积权重，ε代表网络平移量，*为卷积运算，⊙为矩阵对应元素相乘，对应的计算公式如下：
g_t = φ(W_xg * X_t + W_hg * h_{t-1} + ε_g)
i_t = σ(W_xi * X_t + W_hi * h_{t-1} + ε_i)
f_t = σ(W_xf * X_t + W_hf * h_{t-1} + ε_f)
o_t = σ(W_xo * X_t + W_ho * h_{t-1} + ε_o)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ g_t
h_t = o_t ⊙ φ(C_t)
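上述ConvLSTM单元计算过程的一个示意性PyTorch实现如下（3×3卷积、padding为1、stride为1，与上文设置一致；此处将四组门的卷积合并为一次卷积计算，偏置对应公式中的ε，仅为在该假设下的草图）：

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """按上文公式实现的单步ConvLSTM单元。"""
    def __init__(self, in_ch, hidden_ch):
        super().__init__()
        # 将 i、f、g、o 四组卷积合并为一次卷积计算，偏置对应公式中的 ε
        self.conv = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                              kernel_size=3, padding=1, stride=1, bias=True)
        self.hidden_ch = hidden_ch

    def forward(self, x_t, h_prev, c_prev):
        z = self.conv(torch.cat([x_t, h_prev], dim=1))
        i, f, g, o = torch.chunk(z, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # 输入门/遗忘门/输出门
        g = torch.tanh(g)                        # 对主线记忆的补给
        c_t = f * c_prev + i * g                 # C_t = f_t ⊙ C_{t-1} + i_t ⊙ g_t
        h_t = o * torch.tanh(c_t)                # h_t = o_t ⊙ φ(C_t)
        return h_t, c_t
```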
在步骤23中,对空间特征图像和时间特征图像进行融合,获得融合特征图像。
其中可以通过concatenate函数对空间特征图像和时间特征图像进行特征拼接,从而获得该融合特征图像。
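特征拼接融合的一个最简示意如下（假设空间特征图与时间特征图的空间尺寸一致，仅在通道维拼接，形状均为假设值）：

```python
import torch

# 假设空间特征图与时间特征图形状均为 (B, C, H, W)
spatial_feature = torch.randn(2, 256, 14, 14)
temporal_feature = torch.randn(2, 256, 14, 14)
fused_feature = torch.cat([spatial_feature, temporal_feature], dim=1)  # 在通道维度拼接，得到融合特征图像
```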
在步骤24中,将融合特征图像输入解码子网络,获得方向特征图像。
在一种可能的实施例中,所述解码子网络可以通过包含卷积块、自注意力模块和上采样模块的多个解码层实现。作为示例,向自注意力模块中输入该融合特征图,通过3个1*1的卷积核f(x)、g(x)、h(x)进行变换。其中经过f(x)的特征图M1通过转置得到特征图M1’,和通过g(x)的特征图M2进行矩阵乘法计算获得特征相关性表示,之后可以基于softmax将特征相关性表示映射为0~1的概率形式得出概率矩阵P,最后将该概率矩阵P和经过h(x)的特征图M3进行矩阵乘法获得自注意力模块输出的特征图S。
之后将该特征图S通过卷积块ConvBlock进行卷积操作，以改变该特征图S的通道数，并将进行卷积操作后获得的特征图输入到上采样模块Upsample中进行上采样。在对输入的特征图上采样后，得到输出的特征图U。之后，基于该特征图U进行下一解码层的处理，其计算方式与上文所述类似，在此不再赘述。经过最后一层解码层的输出，获得与原始图像尺寸相同的特征图，即该方向特征图像。
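上述“自注意力 + 卷积块 + 上采样”单个解码层的一个示意性PyTorch草图如下。其中通道数、归一化方式、上采样倍数以及矩阵乘法的具体顺序均为假设，可能与本公开的实际实现有所差异：

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """基于1×1卷积 f(x)/g(x)/h(x) 的自注意力示意实现。"""
    def __init__(self, ch):
        super().__init__()
        self.f = nn.Conv2d(ch, ch, 1)
        self.g = nn.Conv2d(ch, ch, 1)
        self.h = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        b, c, hh, ww = x.shape
        m1 = self.f(x).reshape(b, c, hh * ww)                # 经过f(x)的特征图M1
        m2 = self.g(x).reshape(b, c, hh * ww)                # 经过g(x)的特征图M2
        m3 = self.h(x).reshape(b, c, hh * ww)                # 经过h(x)的特征图M3
        p = torch.softmax(m1.transpose(1, 2) @ m2, dim=-1)   # 特征相关性经softmax得到概率矩阵P
        s = (m3 @ p.transpose(1, 2)).reshape(b, c, hh, ww)   # 概率矩阵与M3相乘得到输出特征图S
        return s

class DecodeLayer(nn.Module):
    """单个解码层：自注意力 -> 卷积块改变通道数 -> 上采样。"""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.attn = SelfAttention(in_ch)
        self.conv_block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

    def forward(self, x):
        return self.upsample(self.conv_block(self.attn(x)))
```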
在步骤25中,根据方向特征图像和目标输入图像序列对应的标签图像,确定关键点识别模型的目标损失。
其中,可以基于该方向特征图像和输入的训练图像序列对应的标签图像,计算均方误差(MSE)以获得该目标损失。其中,该均方误差的计算方式为本领域中的常用方式,在此不再赘述。
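该均方误差损失的计算可简单示意如下（direction_map、label_map分别表示方向特征图像与其对应的标签图，名称仅为示意）：

```python
import torch
import torch.nn.functional as F

def heatmap_mse_loss(direction_map: torch.Tensor, label_map: torch.Tensor) -> torch.Tensor:
    """方向特征图像与标签图（或其高斯特征图）之间的均方误差。"""
    return F.mse_loss(direction_map, label_map)
```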
在步骤26中,在满足更新条件的情况下,根据所述目标损失对所述关键点识别模型的参数进行更新。
作为示例,该更新条件可以为目标损失大于预设的损失阈值,此时表示关键点识别模型的识别准确性不足。作为另一示例,该更新条件可以是迭代次数小于预设的次数阈值,此时认为关键点识别模型迭代次数较少,其识别准确性不足。
相应地,在满足更新条件的情况下,可以根据该目标损失对关键点识别模型的参数进行更新。其中,基于确定出的目标损失对参数进行更新的方式可以采用本领域中常用的更新方式,在此不再赘述。
在不满足该更新条件的情况下,则可以认为该关键点识别模型的识别精确性达到训练要求,此时可以停止训练过程,获得训练完成的关键点识别模型。
由此，通过上述技术方案，可以基于训练图像序列对关键点识别模型进行训练，从而使得在该关键点识别模型中可以结合多张训练图像对应的空间特征，同时还可以结合该多张训练图像基于时间序列的关系进行预测，提高该关键点识别模型的识别准确性，使得该组织腔体定位方法可以适用于更复杂更广泛的应用场景。同时在该训练过程中，可以基于时间次序进行特征提取，使得该序列化数据特征提取更加符合人类主观认知，贴合用户本身的识别经验，从而可以在一定程度上进一步保证预测出的方向点的准确性，为内窥镜的移动进行准确导航提供数据支持。
相应地,在步骤12中,根据腔体图像序列和关键点识别模型,确定腔体图像序列对应的组织腔体相对于所述最后一张图像的目标方向点,可以包括:
将所述腔体图像序列输入所述关键点识别模型,获得所述关键点识别模型输出的方向特征图像,并将所述方向特征图像中对应特征值最大的点确定为所述目标方向点。
由此,可以基于该关键点识别模型输出的特征快速且准确地确定出腔体图像序列对应的目标方向点,以为内窥镜的自动导航提供移动方向的引导。
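将方向特征图像中特征值最大的点转换为目标方向点坐标，可示意如下（假设方向特征图像形状为 (1, 1, H, W)）：

```python
import torch

def heatmap_argmax(direction_map: torch.Tensor):
    """取方向特征图像中特征值最大的位置作为目标方向点。"""
    _, _, h, w = direction_map.shape
    idx = torch.argmax(direction_map.reshape(-1))
    y, x = divmod(int(idx), w)
    return x, y   # 返回目标方向点在图像中的 (x, y) 坐标
```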
在一种可能的实施例中,所述目标输入图像序列还包括处理图像序列,所述处理图像序列为基于所述训练图像序列进行预处理所得的图像序列,所述处理图像序列对应的标签图像为对所述训练图像序列对应的标签图像进行相同的预处理所得的图像。
示例地，该预处理方式可以是数据增强，例如可以是颜色、亮度、色度、饱和度变换以及仿射变换等。
作为示例，为了提高图像处理的准确性，可以在进行数据增强之前对训练图像进行标准化处理，即将该训练图像缩放至预设尺寸，以便于后续对训练图像进行统一处理。
相应地，在该实施例中可以通过对训练图像序列中的训练图像进行预处理，从而对该训练图像序列进行变换，获得处理图像序列，以增加训练样本的多样性，可以有效提高训练所得的关键点识别模型的泛化性，使得该组织腔体定位方法可以适用于更复杂更广泛的应用场景。在本公开实施例中，为了保证训练图像序列和标签图像的一致性，可以基于相同的预处理方式对标签图像进行变换，从而获得处理图像序列对应的标签图像，进而基于该处理后所得的标签图像对处理图像序列对应的输出图像进行预测误差的识别，从而可以进一步提高训练图像的多样性，在一定程度上提高关键点识别模型的训练效率，提高该关键点识别模型的稳定性，为内窥镜的进镜导航提供准确的数据支持。
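对训练图像序列与其标签图像施加同一预处理的一个示意性片段如下，此处以统一缩放与随机水平翻转为例，具体增强方式、尺寸均为假设，并非本公开限定的预处理组合：

```python
import random
import cv2
import numpy as np

def preprocess_pair(images, label, size=(256, 256)):
    """对训练图像序列与标签图像施加相同的标准化与数据增强。"""
    images = [cv2.resize(img, size) for img in images]        # 标准化至预设尺寸
    label = cv2.resize(label, size)
    if random.random() < 0.5:                                  # 以同一随机结果同时翻转图像与标签
        images = [np.ascontiguousarray(img[:, ::-1]) for img in images]
        label = np.ascontiguousarray(label[:, ::-1])
    return images, label
```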
在一种可能的实施例中,根据方向特征图像和目标输入图像序列对应的标签图像,确定关键点识别模型的目标损失的示例性实现方式如下,该步骤可以包括:
根据所述标签图像中各点与所述标签图像中的标注方向点的位置,将所述标签图像转换为高斯特征图,其中,所述标签图像中的标注方向点即该训练图像序列中的组织腔体的方向点。
其中，该标签图像中的标注方向点的数量为一个，其余位置的特征值为0，若解码子网络输出的方向特征图像为全0的图像，则会使得该方向特征图像与标签图像之间的目标损失较小，不便于模型的参数更新。因此，在本公开实施例中，可以对标签图像进行处理，通过该标签图像中各点与所述标签图像中的标注方向点的位置的关系，将该标签图像转换为高斯特征图，其中，标签图像中的点与标注方向点越远，则该点的高斯特征值越小。
示例地,通过以下公式根据所述标签图像中各点与所述标签图像中的标注方向点的位置,将所述标签图像转换为高斯特征图:
Figure PCTCN2022104089-appb-000001
y′(x, y; x_l, y_l, α)用于表示高斯特征图中(x, y)坐标的特征值；
(x, y)用于表示所述标签图像中的元素坐标值；
(x_l, y_l)用于表示所述标签图像中标注方向点的坐标值；
α用于表示高斯变换的超参数，其中该超参数的取值可以基于实际应用场景进行设置，本公开对此不进行限定。
由此,可以使得标签图像中的非标注方向点的各点也可以通过特征值进行表征,为后续准确计算模型预测的目标损失提供数据支持。
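若采用常见的二维高斯形式 y′ = exp(−((x−x_l)²+(y−y_l)²)/α²) 作为上述变换的一种可能实现（具体公式以本公开的图示公式为准，此处仅为示意性假设），其生成过程可示意如下：

```python
import numpy as np

def label_to_gaussian(h, w, xl, yl, alpha=10.0):
    """将仅含单个标注方向点的标签图转换为高斯特征图（公式形式为示意性假设）。"""
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    dist2 = (xs - xl) ** 2 + (ys - yl) ** 2
    return np.exp(-dist2 / (alpha ** 2))   # 距标注方向点越远，特征值越小
```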
根据所述方向特征图像和所述高斯特征图,确定所述目标损失。
示例地,可以基于该方向特征图像和该高斯特征图计算均方误差(MSE)以获得该目标损失。
由此,通过上述技术方案,在确定目标损失时,可以通过将标签图像转化成高斯图像进行计算,从而可以保证确定出的目标损失的准确性,以保证对关键点识别模型的参数调整的准确性,提高该模型训练的效率,同时可以提高基于训练完成的关键点识别模型对待识别的腔体图像序列进行方向点预测的准确性,为内窥镜的进镜导航提供决策数据。
可选地,所述解码子网络包含多层特征解码网络,每层特征解码网络输出的特征图尺寸不同;
所述根据所述方向特征图像和所述目标输入图像序列对应的标签图像,确定所述关键点识别模型的目标损失的示例性实现方式如下,该步骤可以包括:
针对每层特征解码网络,对该层特征解码网络输出的特征图或该标签图像进行标准化处理,以获得该层特征解码网络对应的尺寸相同的目标特征图和目标标签图像。
其中，在对输入的图像序列进行特征提取编码的过程中，通常是采用增加通道数并降低特征图的宽和高的方式进行编码的，因此，在基于该多层特征解码网络进行解码的过程中，通常是采用降低通道数并增加特征图的宽和高的方式进行解码的，以使得最终输出的特征图与原始输入图像的尺寸相同。
作为示例，可以对每层特征解码网络输出的特征图进行标准化处理，以将每层输出的特征图标准化至与标签图像的尺寸相同的特征图，则可以将每层进行标准化处理后所得的特征图作为该层对应的目标特征图，并将该标签图像确定为目标标签图像。
作为另一示例，可以对标签图像进行标准化处理。如，针对每层特征解码网络，将该标签图像标准化为与该层特征解码网络输出的特征图尺寸相同的标签图像，则可以将每层进行标准化处理后所得的标签图像作为该层对应的目标标签图像，并将该层输出的特征图确定为目标特征图。
其中，需要说明的是，各层特征解码网络进行标准化处理的对象相同，即要么每层均对标签图像进行标准化处理，要么每层均对特征图进行标准化处理。
针对每层特征解码网络,根据该层特征解码网络对应的目标特征图与所述目标标签图像,确定该层特征解码网络对应的损失。其中,计算损失的方式与上文所述损失计算方式类似,在此不再赘述。由此,可以在解码的过程中关注该解码子网络中每一层预测的组织腔体的目标方向点的准确性,以提高最终确定出的目标方向点的准确性。
根据每层特征解码网络对应的损失确定所述关键点识别模型的目标损失。
其中,可以将每层特征解码网络对应的损失之和确定为目标损失,也可以将每层特征解码网络对应的损失的平均值确定出该目标损失,可以根据实际使用场景进行设置。
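将各层特征解码网络的输出分别与标签对应的高斯特征图计算损失并汇总的一个示意性片段如下。此处假设各层输出均已映射为单通道热力图，并以“将每层输出插值到标签尺寸后求和”为例，求和或取均值均属于可选的汇总方式：

```python
import torch
import torch.nn.functional as F

def multi_scale_loss(decoder_outputs, gaussian_target):
    """decoder_outputs为各层特征解码网络输出的特征图列表，尺寸各不相同。"""
    total = 0.0
    for feat in decoder_outputs:
        # 将该层输出标准化（插值）到与标签图相同的尺寸后计算均方误差
        feat_resized = F.interpolate(feat, size=gaussian_target.shape[-2:],
                                     mode='bilinear', align_corners=False)
        total = total + F.mse_loss(feat_resized, gaussian_target)
    return total   # 也可除以层数取平均作为目标损失
```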
通过上述技术方案,可以对该解码子网络中的每一层特征解码网络输出的特征图进行损失计算,从而可以结合每层对应的损失确定该关键点识别模型的目标损失,一方面可以基于多尺度的预测提高确定出的目标损失的准确性,另一方面可以提高基于该目标损失进行模型参数调整的效率和准确性,从而提高关键点识别模型的训练效率。并且可以针对解码子网络中的每一层特征解码网络的预测准确性进行提高,在一定程度上避免多层解码特征网络对应的解码误差累计,进一步提高该关键点识别模型的识别准确率,保证内窥镜的进镜导航。
本公开还提供一种用于内窥镜的组织腔体定位装置,如图5所示,所述装置50包括:
接收模块51,用于接收待识别的腔体图像序列,其中,所述腔体图像序列包含有多张连续的图像,所述腔体图像序列中的最后一张图像为内窥镜在其当前所处位置获得的;
第一确定模块52,用于根据所述腔体图像序列和关键点识别模型,确定所述腔体图像序列对应的组织腔体相对于所述最后一张图像的目标方向点,其中,所述目标方向点用于指示所述内窥镜在其当前所处位置的下一目标移动方向;
其中,所述关键点识别模型包括卷积子网络、时间循环子网络和解码子网络,所述卷积子网络用于获取所述腔体图像序列的空间特征,所述时间循环子网络用于获取所述腔体图像序列的时间特征,所述解码子网络用于基于所述空间特征和所述时间特征进行解码,以获得所述目标方向点。
可选地,所述关键点识别模型通过训练装置进行训练,所述训练装置包括:
获取模块,用于获取多组训练样本,其中,每组训练样本中包含训练图像序列以及与所述训练图像序列对应的标签图像;
第一处理模块,用于将目标输入图像序列输入所述卷积子网络,获得所述目标输入图像序列对应的空间特征图像,并将所述目标输入图像序列输入所述时间循环子网络,获得所述目标输入图像序列对应的时间特征图像,其中,所述目标输入图像序列包含所述训练图像序列;
融合模块,用于对所述空间特征图像和所述时间特征图像进行融合,获得融合特征图像;
第二处理模块,用于将所述融合特征图像输入所述解码子网络,获得方向特征图像;
第二确定模块，用于根据所述方向特征图像和所述目标输入图像序列对应的标签图像，确定所述关键点识别模型的目标损失；
更新模块,用于在满足更新条件的情况下,根据所述目标损失对所述关键点识别模型的参数进行更新。
可选地,所述目标输入图像序列还包括处理图像序列,所述处理图像序列为基于所述训练图像序列进行预处理所得的图像序列,所述处理图像序列对应的标签图像为对所述训练图像序列对应的标签图像进行相同的预处理所得的图像。
可选地,所述第二确定模块包括:
转换子模块,用于根据所述标签图像中各点与所述标签图像中的标注方向点的位置,将所述标签图像转换为高斯特征图;
第一确定子模块,用于根据所述方向特征图像和所述高斯特征图,确定所述目标损失。
可选地,通过以下公式根据所述标签图像中各点与所述标签图像中的标注方向点的位置,将所述标签图像转换为高斯特征图:
Figure PCTCN2022104089-appb-000002
y′(x, y; x_l, y_l, α)用于表示高斯特征图中(x, y)坐标的特征值；
(x, y)用于表示所述标签图像中的元素坐标值；
(x_l, y_l)用于表示所述标签图像中标注方向点的坐标值；
α用于表示高斯变换的超参数。
可选地,所述解码子网络包含多层特征解码网络,每层特征解码网络输出的特征图尺寸不同;
所述第二确定模块包括:
处理子模块,用于针对每层特征解码网络,对该层特征解码网络输出的特征图或该标签图像进行标准化处理,以获得该层特征解码网络对应的尺寸相同的目标特征图和目标标签图像;
第二确定子模块,用于针对每层特征解码网络,根据该层特征解码网络对应的目标特征图与所述目标标签图像,确定该层特征解码网络对应的损失;
第三确定子模块,用于根据每层特征解码网络对应的损失确定所述关键点识别模型的目标损失。
可选地,所述装置还包括:
发送模块,用于向所述内窥镜的驱动装置发送所述目标方向点,以使所述内窥镜向所述目标方向点移动,并触发所述接收模块接收待识别的腔体图像序列,直至所述内窥镜到达目标位置点。
下面参考图6,其示出了适于用来实现本公开实施例的电子设备600的结构示意图。本公开实施例中的终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图6示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图6所示,电子设备600可以包括处理装置(例如中央处理器、图形处理器等)601,其可以根据存储在只读存储器(ROM)602中的程序或者从存储装置608加载到随机访问存储器(RAM)603中的程序而执行各种适当的动作和处理。在RAM 603中,还存储有电子设备600操作所需的各种程序和数据。处理装置601、ROM 602以及RAM 603通过总线604彼此相连。输入/输出(I/O)接口605也连接至总线604。
通常,以下装置可以连接至I/O接口605:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置606;包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置607;包括例如磁带、硬盘等的存储装置608;以及通信装置609。通信装置609可以允许电子设备600与其他设备进行无线或有线通信以交换数据。虽然图6示出了具有各 种装置的电子设备600,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置609从网络上被下载和安装,或者从存储装置608被安装,或者从ROM 602被安装。在该计算机程序被处理装置601执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任 意合适的组合。
在一些实施方式中,客户端、服务器可以利用诸如HTTP(HyperText Transfer Protocol,超文本传输协议)之类的任何当前已知或未来研发的网络协议进行通信,并且可以与任意形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(“LAN”),广域网(“WAN”),网际网(例如,互联网)以及端对端网络(例如,ad hoc端对端网络),以及任何当前已知或未来研发的网络。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:接收待识别的腔体图像序列,其中,所述腔体图像序列包含有多张连续的图像,所述腔体图像序列中的最后一张图像为内窥镜在其当前所处位置获得的;根据所述腔体图像序列和关键点识别模型,确定所述腔体图像序列对应的组织腔体相对于所述最后一张图像的目标方向点,其中,所述目标方向点用于指示所述内窥镜在其当前所处位置的下一目标移动方向;其中,所述关键点识别模型包括卷积子网络、时间循环子网络和解码子网络,所述卷积子网络用于获取所述腔体图像序列的空间特征,所述时间循环子网络用于获取所述腔体图像序列的时间特征,所述解码子网络用于基于所述空间特征和所述时间特征进行解码,以获得所述目标方向点。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括但不限于面向对象的程序设计语言-诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言——诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上 执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)——连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的模块可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,模块的名称在某种情况下并不构成对该模块本身的限定,例如,接收模块还可以被描述为“接收待识别的腔体图像序列的模块”。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可 读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
根据本公开的一个或多个实施例,示例1提供了一种用于内窥镜的组织腔体定位方法,其中,所述方法包括:
接收待识别的腔体图像序列,其中,所述腔体图像序列包含有多张连续的图像,所述腔体图像序列中的最后一张图像为内窥镜在其当前所处位置获得的;
根据所述腔体图像序列和关键点识别模型,确定所述腔体图像序列对应的组织腔体相对于所述最后一张图像的目标方向点,其中,所述目标方向点用于指示所述内窥镜在其当前所处位置的下一目标移动方向;
其中,所述关键点识别模型包括卷积子网络、时间循环子网络和解码子网络,所述卷积子网络用于获取所述腔体图像序列的空间特征,所述时间循环子网络用于获取所述腔体图像序列的时间特征,所述解码子网络用于基于所述空间特征和所述时间特征进行解码,以获得所述目标方向点。
根据本公开的一个或多个实施例,示例2提供了示例1的方法,其中,所述关键点识别模型通过如下方式进行训练:
获取多组训练样本,其中,每组训练样本中包含训练图像序列以及与所述训练图像序列对应的标签图像;
将目标输入图像序列输入所述卷积子网络,获得所述目标输入图像序列对应的空间特征图像,并将所述目标输入图像序列输入所述时间循环子网络,获得所述目标输入图像序列对应的时间特征图像,其中,所述目标输入图像序列包含所述训练图像序列;
对所述空间特征图像和所述时间特征图像进行融合,获得融合特征图像;
将所述融合特征图像输入所述解码子网络,获得方向特征图像;
根据所述方向特征图像和所述目标输入图像序列对应的标签图像,确定所述关键点识别模型的目标损失;
在满足更新条件的情况下,根据所述目标损失对所述关键点识别模型的参数进行更新。
根据本公开的一个或多个实施例,示例3提供了示例2的方法,其中,所述目标输入图像序列还包括处理图像序列,所述处理图像序列为基于所述训练图像序列进行预处理所得的图像序列,所述处理图像序列对应的标签图像为对所述训练图像序列对应的标签图像进行相同的预处理所得的图像。
根据本公开的一个或多个实施例,示例4提供了示例2的方法,其中,所述根据所述方向特征图像和所述目标输入图像序列对应的标签图像,确定所述关键点识别模型的目标损失,包括:
根据所述标签图像中各点与所述标签图像中的标注方向点的位置,将所述标签图像转换为高斯特征图;
根据所述方向特征图像和所述高斯特征图,确定所述目标损失。
根据本公开的一个或多个实施例,示例5提供了示例4的方法,其中,通过以下公式根据所述标签图像中各点与所述标签图像中的标注方向点的位置,将所述标签图像转换为高斯特征图:
Figure PCTCN2022104089-appb-000003
y′(x, y; x_l, y_l, α)用于表示高斯特征图中(x, y)坐标的特征值；
(x, y)用于表示所述标签图像中的元素坐标值；
(x_l, y_l)用于表示所述标签图像中标注方向点的坐标值；
α用于表示高斯变换的超参数。
根据本公开的一个或多个实施例，示例6提供了示例2的方法，其中，所述解码子网络包含多层特征解码网络，每层特征解码网络输出的特征图尺寸不同；
所述根据所述方向特征图像和所述目标输入图像序列对应的标签图像,确定所述关键点识别模型的目标损失,包括:
针对每层特征解码网络,对该层特征解码网络输出的特征图或该标签图像进行标准化处理,以获得该层特征解码网络对应的尺寸相同的目标特征图和目标标签图像;
针对每层特征解码网络,根据该层特征解码网络对应的目标特征图与所述目标标签图像,确定该层特征解码网络对应的损失;
根据每层特征解码网络对应的损失确定所述关键点识别模型的目标损失。
根据本公开的一个或多个实施例,示例7提供了示例1的方法,其中,所述方法还包括:
向所述内窥镜的驱动装置发送所述目标方向点,以使所述内窥镜向所述目标方向点移动;
并重新返回执行所述接收待识别的腔体图像序列的步骤,直至所述内窥镜到达目标位置点。
根据本公开的一个或多个实施例,示例8提供了一种用于内窥镜的组织腔体定位装置,其中,所述装置包括:
接收模块,用于接收待识别的腔体图像序列,其中,所述腔体图像序列包含有多张连续的图像,所述腔体图像序列中的最后一张图像为内窥镜在其当前所处位置获得的;
第一确定模块,用于根据所述腔体图像序列和关键点识别模型,确定腔体图像序列对应的组织腔体相对于所述最后一张图像的目标方向点,其中,所述目标方向点用于指示所述内窥镜在其当前所处位置的下一目标移动方向;
其中，所述关键点识别模型包括卷积子网络、时间循环子网络和解码子网络，所述卷积子网络用于获取所述腔体图像序列的空间特征，所述时间循环子网络用于获取所述腔体图像序列的时间特征，所述解码子网络用于基于所述空间特征和所述时间特征进行解码，以获得所述目标方向点。
根据本公开的一个或多个实施例,示例9提供了一种计算机可读介质,其上存储有计算机程序,其中,该程序被处理装置执行时实现示例1-7中任一示例所述方法的步骤。
根据本公开的一个或多个实施例,示例10提供了一种电子设备,其中,包括:
存储装置,其上存储有计算机程序;
处理装置,用于执行所述存储装置中的所述计算机程序,以实现示例1-7中任一示例所述方法的步骤。
以上描述仅为本公开的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解,本公开中所涉及的公开范围,并不限于上述技术特征的特定组合而成的技术方案,同时也应涵盖在不脱离上述公开构思的情况下,由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本公开中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。
此外,虽然采用特定次序描绘了各操作,但是这不应当理解为要求这些操作以所示出的特定次序或以顺序次序执行来执行。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本公开的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实施例中。相反地,在单个实施例的上下文中描述的各种特征也可以单独地或以任何合适的子组合的方式实现在多个实施例中。
尽管已经采用特定于结构特征和/或方法逻辑动作的语言描述了本主题,但是应当理解所附权利要求书中所限定的主题未必局限于上面描述的特 定特征或动作。相反,上面所描述的特定特征和动作仅仅是实现权利要求书的示例形式。关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。

Claims (10)

  1. 一种用于内窥镜的组织腔体定位方法,其特征在于,所述方法包括:
    接收待识别的腔体图像序列,其中,所述腔体图像序列包含有多张连续的图像,所述腔体图像序列中的最后一张图像为内窥镜在其当前所处位置获得的;
    根据所述腔体图像序列和关键点识别模型,确定所述腔体图像序列对应的组织腔体相对于所述最后一张图像的目标方向点,其中,所述目标方向点用于指示所述内窥镜在其当前所处位置的下一目标移动方向;
    其中,所述关键点识别模型包括卷积子网络、时间循环子网络和解码子网络,所述卷积子网络用于获取所述腔体图像序列的空间特征,所述时间循环子网络用于获取所述腔体图像序列的时间特征,所述解码子网络用于基于所述空间特征和所述时间特征进行解码,以获得所述目标方向点。
  2. 根据权利要求1所述的方法,其特征在于,所述关键点识别模型通过如下方式进行训练:
    获取多组训练样本,其中,每组训练样本中包含训练图像序列以及与所述训练图像序列对应的标签图像;
    将目标输入图像序列输入所述卷积子网络,获得所述目标输入图像序列对应的空间特征图像,并将所述目标输入图像序列输入所述时间循环子网络,获得所述目标输入图像序列对应的时间特征图像,其中,所述目标输入图像序列包含所述训练图像序列;
    对所述空间特征图像和所述时间特征图像进行融合,获得融合特征图像;
    将所述融合特征图像输入所述解码子网络,获得方向特征图像;
    根据所述方向特征图像和所述目标输入图像序列对应的标签图像,确定所述关键点识别模型的目标损失;
    在满足更新条件的情况下,根据所述目标损失对所述关键点识别模型的参数进行更新。
  3. 根据权利要求2所述的方法,其特征在于,所述目标输入图像序列还包括处理图像序列,所述处理图像序列为基于所述训练图像序列进行预处理所得的图像序列,所述处理图像序列对应的标签图像为对所述训练图像序列对应的标签图像进行相同的预处理所得的图像。
  4. 根据权利要求2所述的方法,其特征在于,所述根据所述方向特征图像和所述目标输入图像序列对应的标签图像,确定所述关键点识别模型的目标损失,包括:
    根据所述标签图像中各点与所述标签图像中的标注方向点的位置,将所述标签图像转换为高斯特征图;
    根据所述方向特征图像和所述高斯特征图,确定所述目标损失。
  5. 根据权利要求4所述的方法,其特征在于,通过以下公式根据所述标签图像中各点与所述标签图像中的标注方向点的位置,将所述标签图像转换为高斯特征图:
    Figure PCTCN2022104089-appb-100001
    y′(x, y; x_l, y_l, α)用于表示高斯特征图中(x, y)坐标的特征值；
    (x, y)用于表示所述标签图像中的元素坐标值；
    (x_l, y_l)用于表示所述标签图像中标注方向点的坐标值；
    α用于表示高斯变换的超参数。
  6. 根据权利要求2所述的方法,其特征在于,所述解码子网络包含多层特征解码网络,每层特征解码网络输出的特征图尺寸不同;
    所述根据所述方向特征图像和所述目标输入图像序列对应的标签图像,确定所述关键点识别模型的目标损失,包括:
    针对每层特征解码网络,对该层特征解码网络输出的特征图或该标签图像进行标准化处理,以获得该层特征解码网络对应的尺寸相同的目标特征图和目标标签图像;
    针对每层特征解码网络,根据该层特征解码网络对应的目标特征图与所述目标标签图像,确定该层特征解码网络对应的损失;
    根据每层特征解码网络对应的损失确定所述关键点识别模型的目标损失。
  7. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    向所述内窥镜的驱动装置发送所述目标方向点,以使所述内窥镜向所述目标方向点移动;
    并重新返回执行所述接收待识别的腔体图像序列的步骤,直至所述内窥镜到达目标位置点。
  8. 一种用于内窥镜的组织腔体定位装置,其特征在于,所述装置包括:
    接收模块,用于接收待识别的腔体图像序列,其中,所述腔体图像序列包含有多张连续的图像,所述腔体图像序列中的最后一张图像为内窥镜在其当前所处位置获得的;
    第一确定模块，用于根据所述腔体图像序列和关键点识别模型，确定腔体图像序列对应的组织腔体相对于所述最后一张图像的目标方向点，其中，所述目标方向点用于指示所述内窥镜在其当前所处位置的下一目标移动方向；
    其中,所述关键点识别模型包括卷积子网络、时间循环子网络和解码子网络,所述卷积子网络用于获取所述腔体图像序列的空间特征,所述时间循环子网络用于获取所述腔体图像序列的时间特征,所述解码子网络用于基于所述空间特征和所述时间特征进行解码,以获得所述目标方向点。
  9. 一种计算机可读介质,其上存储有计算机程序,其特征在于,该程序被处理装置执行时实现权利要求1-7中任一项所述方法的步骤。
  10. 一种电子设备,其特征在于,包括:
    存储装置,其上存储有计算机程序;
    处理装置,用于执行所述存储装置中的所述计算机程序,以实现权利要求1-7中任一项所述方法的步骤。
PCT/CN2022/104089 2021-09-03 2022-07-06 用于内窥镜的组织腔体定位方法、装置、介质及设备 WO2023029741A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111033760.9A CN113487605B (zh) 2021-09-03 2021-09-03 用于内窥镜的组织腔体定位方法、装置、介质及设备
CN202111033760.9 2021-09-03

Publications (1)

Publication Number Publication Date
WO2023029741A1 true WO2023029741A1 (zh) 2023-03-09

Family

ID=77947180

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104089 WO2023029741A1 (zh) 2021-09-03 2022-07-06 用于内窥镜的组织腔体定位方法、装置、介质及设备

Country Status (2)

Country Link
CN (1) CN113487605B (zh)
WO (1) WO2023029741A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113487605B (zh) * 2021-09-03 2021-11-19 北京字节跳动网络技术有限公司 用于内窥镜的组织腔体定位方法、装置、介质及设备
CN113705546A (zh) * 2021-10-28 2021-11-26 武汉楚精灵医疗科技有限公司 干扰类别识别模型训练方法、识别方法、装置及电子设备
CN114332019B (zh) * 2021-12-29 2023-07-04 小荷医疗器械(海南)有限公司 内窥镜图像检测辅助系统、方法、介质和电子设备
CN114429458A (zh) * 2022-01-21 2022-05-03 小荷医疗器械(海南)有限公司 内窥镜图像的处理方法、装置、可读介质和电子设备
CN114332080B (zh) * 2022-03-04 2022-05-27 北京字节跳动网络技术有限公司 组织腔体的定位方法、装置、可读介质和电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200070062A (ko) * 2018-12-07 2020-06-17 주식회사 포인바이오닉스 인공신경망을 이용하여 캡슐형 내시경 영상에서 병변을 감지하는 시스템 및 방법
CN111666998A (zh) * 2020-06-03 2020-09-15 电子科技大学 一种基于目标点检测的内窥镜智能插管决策方法
CN111915573A (zh) * 2020-07-14 2020-11-10 武汉楚精灵医疗科技有限公司 一种基于时序特征学习的消化内镜下病灶跟踪方法
CN112348125A (zh) * 2021-01-06 2021-02-09 安翰科技(武汉)股份有限公司 基于深度学习的胶囊内窥镜影像识别方法、设备及介质
CN112766416A (zh) * 2021-02-10 2021-05-07 中国科学院深圳先进技术研究院 一种消化内镜导航方法和系统
CN113487605A (zh) * 2021-09-03 2021-10-08 北京字节跳动网络技术有限公司 用于内窥镜的组织腔体定位方法、装置、介质及设备

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112609A (zh) * 2021-03-15 2021-07-13 同济大学 一种面向肺部活检支气管镜的导航方法和系统

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200070062A (ko) * 2018-12-07 2020-06-17 주식회사 포인바이오닉스 인공신경망을 이용하여 캡슐형 내시경 영상에서 병변을 감지하는 시스템 및 방법
CN111666998A (zh) * 2020-06-03 2020-09-15 电子科技大学 一种基于目标点检测的内窥镜智能插管决策方法
CN111915573A (zh) * 2020-07-14 2020-11-10 武汉楚精灵医疗科技有限公司 一种基于时序特征学习的消化内镜下病灶跟踪方法
CN112348125A (zh) * 2021-01-06 2021-02-09 安翰科技(武汉)股份有限公司 基于深度学习的胶囊内窥镜影像识别方法、设备及介质
CN112766416A (zh) * 2021-02-10 2021-05-07 中国科学院深圳先进技术研究院 一种消化内镜导航方法和系统
CN113487605A (zh) * 2021-09-03 2021-10-08 北京字节跳动网络技术有限公司 用于内窥镜的组织腔体定位方法、装置、介质及设备

Also Published As

Publication number Publication date
CN113487605B (zh) 2021-11-19
CN113487605A (zh) 2021-10-08

Similar Documents

Publication Publication Date Title
WO2023029741A1 (zh) 用于内窥镜的组织腔体定位方法、装置、介质及设备
WO2023030523A1 (zh) 用于内窥镜的组织腔体定位方法、装置、介质及设备
CN113496489B (zh) 内窥镜图像分类模型的训练方法、图像分类方法和装置
CN113487608B (zh) 内窥镜图像检测方法、装置、存储介质及电子设备
CN114332019B (zh) 内窥镜图像检测辅助系统、方法、介质和电子设备
CN113487609B (zh) 组织腔体的定位方法、装置、可读介质和电子设备
CN113470029B (zh) 训练方法及装置、图像处理方法、电子设备和存储介质
US11417014B2 (en) Method and apparatus for constructing map
WO2023061080A1 (zh) 组织图像的识别方法、装置、可读介质和电子设备
CN113469295B (zh) 生成模型的训练方法、息肉识别方法、装置、介质及设备
CN113470030B (zh) 组织腔清洁度的确定方法、装置、可读介质和电子设备
WO2023207564A1 (zh) 基于图像识别的内窥镜进退镜时间确定方法及装置
WO2023138619A1 (zh) 内窥镜图像的处理方法、装置、可读介质和电子设备
CN114332033A (zh) 基于人工智能的内窥镜图像处理方法、装置、介质及设备
WO2023165332A1 (zh) 组织腔体的定位方法、装置、可读介质和电子设备
CN114240867A (zh) 内窥镜图像识别模型的训练方法、内窥镜图像识别方法及装置
CN114937178B (zh) 基于多模态的图像分类方法、装置、可读介质和电子设备
CN116704593A (zh) 预测模型训练方法、装置、电子设备和计算机可读介质
CN114565586B (zh) 息肉分割模型的训练方法、息肉分割方法及相关装置
CN114596312B (zh) 一种视频处理方法和装置
CN114782390A (zh) 检测模型的确定方法、息肉检测方法、装置、介质及设备
CN116524582A (zh) 手势姿态识别方法、装置、电子设备、介质及程序产品
KR20220111526A (ko) 실시간 생체 이미지 인식 방법 및 장치
CN115944261A (zh) 基于三维地图的人体腔道导航方法和系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22862882

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE