CN113496512B - Tissue cavity positioning method, device, medium and equipment for endoscope


Info

Publication number
CN113496512B
Authority
CN
China
Prior art keywords
image
network
sub
loss
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111039749.3A
Other languages
Chinese (zh)
Other versions
CN113496512A (en)
Inventor
石小周
赵家英
李永会
杨延展
边成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202111039749.3A
Publication of CN113496512A
Application granted
Publication of CN113496512B
Priority to PCT/CN2022/117068 (WO2023030523A1)
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 1/00 Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
    • A61B 1/04 Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor combined with photographic or television appliances
    • A61B 1/042 Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor combined with photographic or television appliances characterised by a proximal camera, e.g. a CCD camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10068 Endoscopic image

Abstract

The present disclosure relates to a tissue cavity localization method, apparatus, medium, and device for an endoscope. The method comprises: receiving a cavity image to be identified, the cavity image being acquired by an endoscope at its current position; and obtaining, according to a keypoint identification model and the cavity image, a center point of the tissue cavity corresponding to the cavity image, the center point representing the next target movement position for the endoscope. The keypoint identification model comprises a student sub-network, a teacher sub-network, and a discrimination sub-network; the student sub-network and the teacher sub-network have the same network structure; the teacher sub-network is used to determine the predicted annotation features of the training images for the student sub-network; and during training of the keypoint identification model, the weight of the prediction loss of the student sub-network is determined based on the predicted annotation features of the teacher sub-network and on the discrimination sub-network. The real-time performance and accuracy of tissue cavity positioning can thereby be improved.

Description

Tissue cavity positioning method, device, medium and equipment for endoscope
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a tissue cavity positioning method, apparatus, medium, and device for an endoscope.
Background
In recent years, with the emergence of deep learning, artificial intelligence technology has developed rapidly. In many fields, artificial intelligence can take over human work, such as repetitive and tedious tasks, greatly reducing the human workload.
Endoscopy, such as enteroscopy, is generally divided into two stages: endoscope advancing and endoscope withdrawing. Withdrawal is the stage in which the physician examines the disease condition, yet the physician must spend considerable energy and time on advancing. In the related art, automatic navigation can save endoscope-advancing time and reduce the physician's workload. Typically, the advancing direction is determined by combining pose estimation with depth estimation, or by image algorithms based on threshold segmentation and the like. However, the advancing process involves many complex situations, such as occlusion by debris, intestinal peristalsis, and anatomical differences between individuals; such schemes struggle to adapt to complex and variable intestinal environments, and the accuracy and precision of advancing navigation are insufficient.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a method of tissue cavity localization for an endoscope, the method comprising:
receiving a cavity image to be identified, wherein the cavity image is acquired by an endoscope at its current position;
obtaining a center point of the tissue cavity corresponding to the cavity image according to a keypoint identification model and the cavity image, wherein the center point is used to represent the next target movement position for the endoscope;
wherein the keypoint identification model comprises a student sub-network, a teacher sub-network, and a discrimination sub-network; the student sub-network and the teacher sub-network have the same network structure; the teacher sub-network is used to determine the predicted annotation features of the training images for the student sub-network; and during training of the keypoint identification model, the weight of the prediction loss of the student sub-network is determined based on the predicted annotation features of the teacher sub-network and on the discrimination sub-network.
In a second aspect, the present disclosure provides a tissue cavity positioning device for an endoscope, the device comprising:
a receiving module, configured to receive a cavity image to be identified, wherein the cavity image is acquired by the endoscope at its current position;
a first determination module, configured to obtain a center point of the tissue cavity corresponding to the cavity image according to a keypoint identification model and the cavity image, wherein the center point is used to represent the next target movement position for the endoscope;
wherein the keypoint identification model comprises a student sub-network, a teacher sub-network, and a discrimination sub-network; the student sub-network and the teacher sub-network have the same network structure; the teacher sub-network is used to determine the predicted annotation features of the training images for the student sub-network; and during training of the keypoint identification model, the weight of the prediction loss of the student sub-network is determined based on the predicted annotation features of the teacher sub-network and on the discrimination sub-network.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method of the first aspect.
According to the above technical solution, the center point of the tissue cavity in the cavity image is determined based on the cavity image acquired by the endoscope at its current position and the keypoint identification model, and the next movement position of the endoscope is determined based on the center point. The tissue cavity positioning method can therefore adapt to complex cavity environments, effectively improve the accuracy and real-time performance of tissue cavity positioning, and provide accurate data support for automatic endoscope-advancing navigation. Moreover, unlabeled training images can be incorporated into training of the keypoint identification model, reducing the workload of data annotation, improving the generalization of the model, and broadening the applicability of the method. In addition, in the keypoint identification model, the weight of the student sub-network's prediction loss can be determined dynamically based on the predicted annotation features of the teacher sub-network, further improving the accuracy of loss determination during training, the training efficiency and recognition accuracy of the keypoint model, and the accuracy of the determined tissue cavity center point, thereby reducing the skill and experience the endoscope demands of its user and improving the user experience.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flow chart of a method of tissue cavity localization for an endoscope provided in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow diagram of the training of a keypoint recognition model provided in accordance with one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a structure of a keypoint identification model;
FIG. 4 is a schematic diagram of a two-layer feature processing network comprised by a student subnetwork;
FIG. 5 is a schematic illustration of an intestinal lumen location determined by the tissue cavity positioning method for an endoscope provided by the present disclosure;
FIG. 6 is a schematic illustration of another intestinal lumen location determined by the tissue cavity positioning method for an endoscope provided by the present disclosure;
FIG. 7 is a block diagram of a tissue cavity positioning device for an endoscope provided in accordance with an embodiment of the present disclosure;
FIG. 8 shows a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flow chart of a method for positioning a tissue cavity for an endoscope according to an embodiment of the present disclosure, which may include, as shown in fig. 1:
in step 11, a lumen image to be identified is received, the lumen image being acquired by the endoscope at its location.
In medical endoscope image recognition, the endoscope captures a medical endoscope video stream inside a living body, for example a human body. Illustratively, during endoscope advancing, that is, while the endoscope travels to the target position in the human body through a body passage communicating with the outside or through a closed body cavity, images are captured, so that the current position of the endoscope can be determined based on the captured images or video to provide navigation for the advancing process. Illustratively, the passage communicating with the outside may be the alimentary canal, the respiratory tract, or the like, and the closed body cavity may be the thoracic cavity, the abdominal cavity, or the like, into which the endoscope may be introduced through an incision.
In step 12, according to the key point identification model and the cavity image, a central point of the tissue cavity corresponding to the cavity image is obtained, where the central point is used to represent a next target movement position of the endoscope at the position of the central point.
The tissue cavity corresponding to the cavity image is the tissue cavity shown in the cavity image. The tissue cavity may be an intestinal cavity, a gastric cavity, or the like. For example, after the endoscope enters the intestinal cavity, it may capture an image at its current position to obtain a cavity image; the center point of the intestinal cavity is the center of the cross-section of the space enclosed by the intestinal wall, and when the endoscope moves forward along the center point of the intestinal cavity, automatic navigation is achieved.
The keypoint identification model comprises a student sub-network, a teacher sub-network, and a discrimination sub-network; the student sub-network and the teacher sub-network have the same network structure; the teacher sub-network is used to determine the predicted annotation features of the training images for the student sub-network; and during training of the keypoint identification model, the weight of the prediction loss of the student sub-network is determined based on the predicted annotation features of the teacher sub-network and on the discrimination sub-network.
In this embodiment, the student sub-network performs keypoint identification on an input image, while the teacher sub-network predicts the annotation features of the input training image, which are used to evaluate the accuracy of the student sub-network's prediction. The model can therefore be trained by semi-supervised learning, that is, trained with unlabeled training images as well. During training of the keypoint identification model, the weight of the prediction loss of the student sub-network is determined based on the prediction result of the teacher sub-network and on the discrimination sub-network, so that this weight can be determined dynamically from the teacher sub-network's predicted annotation features, and the subsequent parameter adjustment of the student sub-network matches its prediction capability.
Therefore, according to the above technical solution, the center point of the tissue cavity in the cavity image is determined based on the cavity image acquired by the endoscope at its current position and the keypoint identification model, and the next movement position of the endoscope is determined based on the center point, so the tissue cavity positioning method can adapt to complex cavity environments, effectively improve the accuracy and real-time performance of tissue cavity positioning, and provide accurate data support for automatic endoscope-advancing navigation. Moreover, unlabeled training images can be incorporated into training of the keypoint identification model, reducing the workload of data annotation, improving the generalization of the model, and broadening the applicability of the method. In addition, in the keypoint identification model, the weight of the student sub-network's prediction loss can be determined dynamically based on the predicted annotation features of the teacher sub-network, further improving the accuracy of loss determination during training, the training efficiency and recognition accuracy of the keypoint model, and the accuracy of the determined tissue cavity center point, thereby reducing the skill and experience the endoscope demands of its user and improving the user experience.
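To make the three-sub-network arrangement concrete, the following is a minimal sketch of how the components could be wired together, assuming a PyTorch implementation (the disclosure does not name a framework; HourglassNet, its layer sizes, and the discriminator layout below are hypothetical stand-ins, not the patented architecture):

```python
import copy
import torch
import torch.nn as nn

class HourglassNet(nn.Module):
    """Stand-in for the keypoint network; the hourglass backbone is sketched later."""
    def __init__(self):
        super().__init__()
        self.body = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # placeholder backbone

    def forward(self, x):            # x: (B, 3, H, W) cavity image
        return self.body(x)          # (B, 1, H, W) heatmap of center-point probabilities

student = HourglassNet()
teacher = copy.deepcopy(student)     # same structure; updated by moving average, not gradients
for p in teacher.parameters():
    p.requires_grad_(False)

# Discrimination sub-network: maps an (image, heatmap) pair, concatenated on the channel
# axis, to a raw score; a sigmoid over this score serves as the confidence used to
# weight the student's prediction loss.
discriminator = nn.Sequential(
    nn.Conv2d(4, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
)
```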
In a possible embodiment, the keypoint identification model may be trained by the following method, as shown in fig. 2, which may include the following steps:
in step 21, a plurality of sets of training samples are obtained, where the plurality of sets of training samples include a first training sample and a second training sample, the first training sample includes a training image, the second training sample includes a training image and a key point feature corresponding to the training image, and the key point feature is used to identify a central point of a tissue cavity corresponding to the training image.
During training, part of the training images may be annotated by a professional physician based on experience. For example, for convenience of annotation, the physician may circle the position of the center point of the tissue cavity in the training image; the center of the annotated circle is then the center-point position, that is, the keypoint feature, and these training samples are the second training samples. Meanwhile, to further improve the generalization of the model, in the embodiment of the present disclosure the training samples may include multiple groups of unlabeled training samples, that is, the first training samples, so that the trained keypoint identification model can be applied to more complex application scenarios.
In step 22, the training image is preprocessed in different preprocessing manners to obtain a first processed image and a second processed image corresponding to the training image.
For example, the preprocessing may be data enhancement, such as non-affine transformations of color, luminance, chrominance, or saturation, which ensure that positions are not deformed.
As an example, to improve the accuracy of image processing, the training image may be standardized to a preset size before data enhancement.
Illustratively, in this embodiment, the same training image may be standardized to a preset size, such as 256 × 256, and then preprocessed in two different ways selected from the data-enhancement methods. Illustratively, the first processed image may be obtained by color transformation and the second processed image by luminance transformation. The preset size and the choice of preprocessing methods can be set according to the actual usage scenario, which the present disclosure does not limit.
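As an illustration of this preprocessing step, the two views of one training image could be produced as below (a sketch using torchvision; the jitter parameters and helper names are assumptions, not values fixed by the disclosure):

```python
from PIL import Image
from torchvision import transforms

resize = transforms.Resize((256, 256))        # standardize to the preset size
to_tensor = transforms.ToTensor()

# Two non-affine photometric enhancements, so keypoint positions are not deformed.
color_jitter = transforms.ColorJitter(saturation=0.4, hue=0.1)   # e.g. color transform
brightness_jitter = transforms.ColorJitter(brightness=0.4)       # e.g. luminance transform

def make_views(img: Image.Image):
    base = resize(img)
    first_processed = to_tensor(color_jitter(base))        # fed to the student sub-network
    second_processed = to_tensor(brightness_jitter(base))  # fed to the teacher sub-network
    return first_processed, second_processed
```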
In step 23, a first predictive image is obtained from the first processed image and the student subnetwork, and a second predictive image is obtained from the second processed image and the teacher subnetwork.
Illustratively, the first processed image can be input into the student sub-network to obtain the first predicted image, and the second processed image can be input into the teacher sub-network to obtain the second predicted image. For the student sub-network, the feature map it outputs can be normalized to the same size as the training image to obtain a predicted feature map, and the point with the largest element value in the predicted feature map is identified as the predicted center point, yielding the first predicted image. The predicted feature map may be a heat map in which each feature value represents the probability that the corresponding point is the center point. Since, as described above, the structures of the student sub-network and the teacher sub-network are the same, the second predicted image is obtained in a similar manner for the teacher sub-network, which is not repeated here.
In step 24, the prediction loss of the student sub-network and the weight corresponding to the prediction loss are determined based on the first prediction image, the second prediction image, and the discrimination sub-network.
The first predicted image represents the prediction result of the student sub-network, and the second predicted image represents the prediction result of the teacher sub-network. In the embodiment of the present disclosure, since the training samples include unlabeled training images, for these training images the second predicted image output by the teacher sub-network can be used as the annotation feature of the training image. Thus, in embodiments of the present disclosure, the prediction loss of the student sub-network may be determined based on the first and second predicted images. In addition, considering that the second predicted image is itself a predicted annotation feature, which may differ from the true annotation feature, a weight corresponding to the prediction loss may be determined at the same time, so that the prediction accuracy of the teacher sub-network is taken into account when determining the prediction loss.
In step 25, a target loss corresponding to the student subnetwork is determined based on the predicted loss and the weight of the predicted loss.
For example, if the second predicted image differs greatly from the true annotation feature, the weight is small, and this portion of the prediction loss is correspondingly reduced in the target loss; that is, the product of the prediction loss and its weight can be taken as the target prediction loss of the student sub-network, from which the target loss of the student sub-network is determined.
In a possible embodiment, in the case that the training sample is a first training sample, that is, an unlabeled training image, an exemplary implementation of determining the target loss of the student sub-network based on the prediction loss and its weight is as follows; this step may include:
determining the generation loss of the student sub-network according to the first predicted image and the preset label feature corresponding to the first predicted image.
In the embodiment of the present disclosure, the student sub-network may serve as the generator and the discrimination sub-network as the discriminator, together forming a GAN (Generative Adversarial Network) model. The generator generates pseudo data in order to confuse the discriminator's true-or-false judgment, while the discriminator distinguishes samples of the pseudo data set from samples of the real data set. In the embodiment of the disclosure, the first predicted image output by the student sub-network can be used as pseudo data to determine the generation loss $L_G$ of the student sub-network, which can, for example, be realized with a softplus loss:

$$L_G = \mathrm{softplus}\bigl(-D(x_f)\bigr) = \log\bigl(1 + e^{-D(x_f)}\bigr)$$

where $x_f$ denotes the combined sample of the training image and the first predicted image, and $D(\cdot)$ denotes the output of the discrimination sub-network.
Then, the generation loss and the prediction loss are weighted and summed based on their respective weights, and the result is determined as the target loss.
For example, the weight of the generation loss may be preset as a hyper-parameter, for example 0.01. In this way, the target loss for an unlabeled training image can be determined by weighted summation.
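Putting the pieces for an unlabeled sample together, a sketch of the target-loss computation might look as follows, assuming the PyTorch stand-ins introduced earlier (the function name and the use of a sigmoid to read the discriminator's score as a confidence are assumptions):

```python
import torch
import torch.nn.functional as F

def unlabeled_target_loss(image, student_heatmap, teacher_heatmap, discriminator):
    # Generation loss L_G: the student acts as the GAN generator, and its prediction,
    # concatenated with the input image, is the pseudo (fake) combined sample x_f.
    x_f = torch.cat([image, student_heatmap], dim=1)
    loss_g = F.softplus(-discriminator(x_f)).mean()

    # Prediction loss L_c: MSE between the student's and the teacher's predictions.
    loss_c = F.mse_loss(student_heatmap, teacher_heatmap)

    # Its weight: the discriminator's confidence that the teacher's prediction is a
    # credible annotation of the training image.
    with torch.no_grad():
        x_t = torch.cat([image, teacher_heatmap], dim=1)
        weight_c = torch.sigmoid(discriminator(x_t)).mean()

    return weight_c * loss_c + 0.01 * loss_g   # 0.01: the example hyper-parameter weight
```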
Therefore, through the above technical solution, when determining the target loss of the student sub-network, both the prediction loss of the student sub-network and the generation loss determined with the discrimination sub-network can be considered, so that the target loss is determined comprehensively from multiple aspects, providing accurate and reliable data support for the subsequent parameter adjustment of the student sub-network.
In a possible embodiment, in the case that the training sample is a second training sample, that is, a sample annotated with keypoint features, another exemplary implementation of determining the target loss of the student sub-network based on the prediction loss and its weight may include:
determining the generation loss of the student sub-network according to the first predicted image and the preset label feature corresponding to the first predicted image, where the manner of determining the generation loss is described in detail above and is not repeated here;
calculating the mean square error between the first predicted image and the keypoint feature corresponding to the training image to obtain the output loss. In this embodiment, the training sample includes the keypoint feature corresponding to the training image, that is, the accurately annotated center-point feature, so the difference between the predicted output of the student sub-network and its true output can be further determined by calculating this mean square error, ensuring the accuracy of the update;
and performing weighted summation of the generation loss and the prediction loss based on their respective weights, and determining the sum of the result and the output loss as the target loss.
Illustratively, the target loss $L$ may be determined by the following formula:

$$L = L_{mse} + \lambda_c L_c + \lambda_G L_G$$

where $L_{mse}$ denotes the output loss; $\lambda_c$ denotes the weight of the prediction loss; $L_c$ denotes the prediction loss; $\lambda_G$ denotes the weight corresponding to the generation loss; and $L_G$ denotes the generation loss.
Therefore, through the above technical solution, when determining the target loss of the student sub-network, the prediction loss of the student sub-network, the generation loss determined with the discrimination sub-network, and the difference between the student sub-network's output and the true output can all be considered, so that the target loss is determined comprehensively from multiple aspects. Moreover, when determining the target loss, the proportions of the generation loss and the prediction loss within it can be adjusted through their corresponding weights, further improving the accuracy of the target-loss calculation, providing accurate and reliable data support for the subsequent parameter adjustment of the student sub-network, and thereby ensuring the accuracy of the tissue cavity center point determined by the keypoint identification model.
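For a labeled sample, the same sketch extends naturally: the output loss against the annotated heatmap is added to the weighted prediction and generation losses. This builds on the hypothetical unlabeled_target_loss above:

```python
import torch.nn.functional as F

def labeled_target_loss(image, student_heatmap, teacher_heatmap, gt_heatmap, discriminator):
    # L = L_mse + lambda_c * L_c + lambda_G * L_G; the last two terms are computed
    # exactly as in the unlabeled case sketched earlier.
    loss_mse = F.mse_loss(student_heatmap, gt_heatmap)   # output loss vs. the annotation
    return loss_mse + unlabeled_target_loss(image, student_heatmap,
                                            teacher_heatmap, discriminator)
```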
Turning back to fig. 2, in step 26, in the case where the update condition is satisfied, the parameters of the student sub-network and the teacher sub-network are updated in accordance with the target loss.
As an example, the update condition may be that the target loss is greater than a preset loss threshold, indicating that the recognition accuracy of the keypoint identification model is still insufficient. As another example, the update condition may be that the number of iterations is less than a preset threshold, the identification accuracy being considered insufficient while the number of iterations is still small.
Accordingly, in the case where the update condition is satisfied, the parameters of the student sub-networks and the teacher sub-networks can be updated according to the target loss. The method for updating the parameter based on the determined target loss may adopt an updating method commonly used in the art, and is not described herein again.
When the update condition is not satisfied, the identification accuracy of the keypoint identification model can be considered to meet the training requirement; at this point, training can be stopped, yielding the trained keypoint identification model.
In step 27, the loss of the discrimination sub-network is determined, and the parameters of the discrimination sub-network are updated based on the loss of the discrimination sub-network. Wherein the discrimination sub-network may be a VGG network in which the classification network is replaced with a confidence regression network.
For example, after the student sub-network is updated, the discrimination sub-network can be updated accordingly to keep it matched with the student sub-network. When determining the loss of the discrimination sub-network, a portion of the data may be sampled from the second training samples, and the loss $L_D$ of the discrimination sub-network may be determined as follows:

$$L_D = \mathrm{softplus}\bigl(-D(x_r)\bigr) + \mathrm{softplus}\bigl(D(x_f)\bigr)$$

where $x_r$ denotes the combined sample of a training image annotated with a keypoint feature and that keypoint feature, $x_f$ denotes the combined sample of the training image and the first predicted image, and $D(\cdot)$ denotes the output of the discrimination sub-network. Likewise, after the loss of the discrimination sub-network is determined, its parameters may be updated by a parameter updating method commonly used in the art, which is not repeated here.
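Under the same assumptions, the discrimination sub-network's loss could be computed as below; detaching the student's heatmap keeps the generator fixed during the discriminator step (a standard GAN practice, not something the text specifies):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(labeled_image, gt_heatmap, image, student_heatmap, discriminator):
    # L_D = softplus(-D(x_r)) + softplus(D(x_f))
    x_r = torch.cat([labeled_image, gt_heatmap], dim=1)        # real: annotated pair
    x_f = torch.cat([image, student_heatmap.detach()], dim=1)  # fake: student prediction
    return F.softplus(-discriminator(x_r)).mean() + F.softplus(discriminator(x_f)).mean()
```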
Therefore, through the above technical solution, the training samples of the keypoint identification model can include both annotated and unannotated samples, which effectively improves the generalization of the trained model and allows the tissue cavity positioning method to suit more complex and broader application scenarios. Meanwhile, during training, the inputs to the student sub-network and the teacher sub-network are differently transformed versions of the same training image, which further increases the diversity of the training images, improves the training efficiency of the keypoint identification model to a certain extent, improves its stability, and provides accurate data support for endoscope-advancing navigation.
In one possible embodiment, an exemplary implementation of determining the prediction loss and the weight corresponding to the prediction loss of the student subnetwork from the first prediction image, the second prediction image and the discrimination subnetwork in step 24 is as follows, which may include:
and calculating the mean square error between the first prediction image and the second prediction image to obtain the prediction loss.
Illustratively, the prediction loss $L_c$ may be determined by the following formula:

$$L_c = \frac{1}{H \times W}\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(S(h,w) - T(h,w)\bigr)^2$$

where the first predicted image and the second predicted image have the same size, $H$ denotes the height of the predicted images and $W$ their width, $S(h,w)$ denotes the element value of the first predicted image at $(h,w)$, and $T(h,w)$ denotes the element value of the second predicted image at $(h,w)$.
The second predicted image and the training image are concatenated to obtain a discrimination image.
The confidence corresponding to the second predicted image is then obtained from the discrimination image and the discrimination sub-network, and this confidence is determined as the weight of the prediction loss.
Illustratively, the second predicted image and the training image may be concatenated by a concatenate function to obtain the discrimination image. The discrimination sub-network determines the confidence that the second predicted image is the annotation image of the training image: a high confidence indicates that the second predicted image is credible as the annotation image of the training image, while a low confidence indicates that its credibility as the annotation image is insufficient.
Thus, in the present disclosure, with the confidence used as the weight of the prediction loss: when the second predicted image of the teacher sub-network is credible as the annotation image of the training image, the weight of the prediction loss is large, that is, the influence of the prediction loss on the model can be fully considered; when the credibility of the second predicted image as the annotation image of the training image is insufficient, the weight is small, that is, the prediction loss may stem from a prediction error of either the student sub-network or the teacher sub-network, and the determined prediction loss has less influence on the model.
Therefore, through the above technical solution, the weight of the student sub-network's prediction loss can be adjusted adaptively according to the credibility of the teacher sub-network's predicted image, effectively preventing errors introduced by the teacher sub-network's prediction mistakes from affecting the model update; this adaptive weighting of the prediction loss makes training of the keypoint identification model more stable and efficient.
In a possible embodiment, the feature processing networks in the student sub-network and the teacher sub-network have multiple layers with the same number of layers. Fig. 3 is a schematic structural diagram of the keypoint identification model; in fig. 3, an hourglass structure may be used to construct the feature processing networks. Illustratively, as shown in fig. 3, in the student sub-network, encoder E1 and decoder D1 form one feature processing network layer N1, and encoder E2 and decoder D2 form another layer N2; correspondingly, the teacher sub-network also includes two feature processing network layers N3 and N4, where N3 comprises encoder E3 and decoder D3, and N4 comprises encoder E4 and decoder D4.
Accordingly, calculating a mean square error between the first predicted image and the second predicted image, an exemplary implementation of obtaining a prediction loss may include:
and calculating the mean square error according to the feature prediction images output by the feature processing networks corresponding to the same layer in the student sub-networks and the teacher sub-networks, wherein the first prediction image is the feature prediction image output by the feature processing network of the last layer in the student sub-networks, and the second prediction image is the feature prediction image output by the feature processing network of the last layer in the teacher sub-networks.
As shown in fig. 4, the feature prediction image is output in each layer of feature processing network, which is a schematic structural diagram of two layers of feature processing networks included in a student subnetwork. Illustratively, the size of the input first processed image is a preset size 256 × 256: (1) the first processed image can be scaled to 128 x 128 using a 7 x 7 convolution with a step size of 2 and padding of 3; (2) the first processed image is then scaled to 64 x 64, with 256 channels based on ResBlock and maxporoling. As shown above, a hourglass structure can be formed as a layer of feature processing network, which can be split into two parts, namely an encoder and a decoder, and the decoding process is performed at the decoder, and each layer is subjected to upsampling and then added with the feature map of the corresponding scale of the encoder to obtain the feature prediction image corresponding to the layer of feature processing network. The method for setting the hourglass structure comprises the following steps of setting a hourglass structure, setting specific parameters, and setting the parameters according to actual use scenes. Therefore, the feature information under multiple scales can be fused in the feature prediction image, so that the prediction result is more accurate.
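The following is a highly simplified sketch of one such hourglass feature processing layer, assuming PyTorch; the depths, channel widths, and use of nearest-neighbor upsampling are illustrative guesses, not the patented design:

```python
import torch.nn as nn

class HourglassLayer(nn.Module):
    """One encoder/decoder feature processing layer (depths and widths are guesses)."""
    def __init__(self, ch=256):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        self.head = nn.Conv2d(ch, 1, kernel_size=1)   # 1-channel feature prediction image

    def forward(self, x):            # x: (B, 256, 64, 64) encoder input
        e1 = self.down1(x)           # encoder features at 1/2 scale
        e2 = self.down2(e1)          # encoder features at 1/4 scale
        d1 = self.up(e2) + e1        # decode: upsample, add matching-scale encoder map
        d0 = self.up(d1) + x         # fuse multi-scale feature information
        return d0, self.head(d0)     # features for the next layer, and its prediction
```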
As shown in fig. 3, a mean square error, denoted MSE1, can be calculated from the feature prediction images output by feature processing networks N1 and N3, and another, denoted MSE2, from those output by N2 and N4. The average of the per-layer mean square errors is then determined as the prediction loss; that is, the prediction loss may be determined as the average of MSE1 and MSE2.
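A sketch of this per-layer averaging, with hypothetical names (student_maps and teacher_maps would hold the outputs of N1/N2 and N3/N4 respectively):

```python
import torch
import torch.nn.functional as F

def multilayer_prediction_loss(student_maps, teacher_maps):
    # Per-layer MSE between corresponding feature prediction images,
    # averaged over layers: Lc = mean(MSE1, MSE2, ...).
    losses = [F.mse_loss(s, t) for s, t in zip(student_maps, teacher_maps)]
    return torch.stack(losses).mean()
```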
Therefore, according to the technical scheme, the student sub-networks and the teacher sub-networks can predict through the multi-layer feature processing network, and therefore the accuracy of the obtained first predicted image and the second predicted image can be improved. In addition, in the technical scheme, when the prediction loss of the student sub-network is determined, the loss can be calculated according to the feature prediction image output by each layer of feature processing network, so that the prediction loss of the student sub-network can be determined by combining the loss corresponding to each layer of feature processing network, the prediction loss can accurately represent the prediction deviation in the prediction process of the student sub-network, and accurate data support is provided for the subsequent parameter adjustment.
In one possible embodiment, the feature processing networks in the student sub-networks and the teacher sub-networks are multiple layers and have the same number of layers, and a schematic structural diagram is shown in fig. 3.
Accordingly, the calculating a mean square error between the first prediction image and the corresponding keypoint features of the training image to obtain an output loss may include:
and processing the key point features corresponding to the training images into processing training images with the same size as the feature predicted images aiming at the feature predicted images output by each layer of feature processing network in the student sub-network.
In this embodiment, in order to ensure the accuracy of data processing, the key point features corresponding to the training images may be processed into processing training images having the same size as the feature prediction images, because the parameter setting of the feature processing network may cause the feature prediction images output by the feature processing network to have different sizes from the training images. Illustratively, the size of the feature prediction image is 64 × 64, and the size of the keypoint feature corresponding to the training image is 256 × 256, then the processed training image with the size of 64 × 64 can be obtained by down-sampling the keypoint feature.
And calculating the mean square error between the characteristic prediction image output by each layer of characteristic processing network and the processing training image corresponding to the characteristic prediction image, and determining the mean value of the mean square errors corresponding to each layer of characteristic processing network as the output loss.
Illustratively, the output loss $L_{mse}$ may be determined as follows:

$$L_{mse} = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{H \times W}\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(y_n(h,w) - f_n(h,w)\bigr)^2$$

where $H$ denotes the height of the feature prediction image and $W$ its width; $y_n(h,w)$ denotes the element value at $(h,w)$ in the processed training image corresponding to the $n$-th feature processing network layer; $f_n(h,w)$ denotes the element value at $(h,w)$ in the feature prediction image corresponding to the $n$-th feature processing network layer; and $N$ denotes the number of feature processing network layers.
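A sketch of this output-loss computation under the same PyTorch assumptions; the bilinear resizing of the annotation is one plausible reading of the downsampling the text describes:

```python
import torch
import torch.nn.functional as F

def multilayer_output_loss(student_maps, keypoint_heatmap):
    # keypoint_heatmap: (B, 1, 256, 256) annotated center-point heatmap.
    losses = []
    for pred in student_maps:
        # Resize the annotation to this layer's output size, e.g. 256x256 -> 64x64.
        target = F.interpolate(keypoint_heatmap, size=pred.shape[-2:],
                               mode='bilinear', align_corners=False)
        losses.append(F.mse_loss(pred, target))
    return torch.stack(losses).mean()
```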
Therefore, according to the technical scheme, when the output loss of the student sub-network is determined, the loss can be calculated according to the feature prediction image output by each layer of feature processing network, so that the output loss of the student sub-network can be determined by combining the loss corresponding to each layer of feature processing network, the output loss can accurately represent the deviation between the feature prediction image output by each layer of feature processing network of the student sub-network and the annotation image, and accurate data support is provided for subsequent parameter adjustment.
In one possible embodiment, the exemplary implementation of updating the parameters of the student and teacher subnetworks according to the target loss is as follows, including:
updating parameters of the student sub-network according to the target loss. For example, after the target loss is determined, the parameters of the student sub-networks can be updated and adjusted in a gradient descent method according to the target loss so as to further improve the prediction accuracy of the student sub-networks.
Then, the parameters of the teacher sub-network may be updated according to the updated parameters of the student sub-network by the following formula:

$$\theta_T' = \alpha\,\theta_T + (1 - \alpha)\,\theta_S'$$

where $\theta_T'$ denotes the parameter value of the teacher sub-network after updating; $\theta_T$ denotes the current parameter value of the teacher sub-network; $\theta_S'$ denotes the parameter value of the student sub-network after updating; and $\alpha$ denotes the update rate.
In the embodiment of the present disclosure, the teacher sub-network provides the student sub-network with the predicted annotation features of the training images. Therefore, when updating the parameters of the teacher sub-network, they are not updated directly from the determined target loss; instead, after the parameters of the student sub-network are updated, the parameters of the teacher sub-network are further updated based on the updated student parameters.
For example, when the teacher sub-network is updated in the above manner, each parameter is adjusted based on its current value in the teacher sub-network and its updated value in the student sub-network; that is, each updated parameter value incorporates both the original characteristics of the teacher sub-network and the characteristics of the corresponding parameter in the student sub-network, ensuring the accuracy and stability of the teacher sub-network.
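This exponential-moving-average update could be sketched as follows (the value of the update rate alpha is an assumption; the disclosure does not fix it):

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    # theta_T' = alpha * theta_T + (1 - alpha) * theta_S'
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)
```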
In one possible embodiment, an exemplary implementation manner of obtaining a central point of a tissue cavity corresponding to a cavity image according to the key point identification model and the cavity image may include:
and inputting the cavity image into a student sub-network of the key point identification model to obtain a characteristic image output by the student sub-network.
As described above, during training of the keypoint identification model, the student sub-network, the teacher sub-network, and the discrimination sub-network are trained jointly so that the parameters of the student sub-network are adjusted to enable accurate prediction. Therefore, in this embodiment, the cavity image can be input directly into the student sub-network of the keypoint identification model, and keypoint identification can be performed directly by the student sub-network; prediction does not need to involve the teacher sub-network, which reduces the amount of computation.
And carrying out standardization processing on the characteristic image to obtain a processing characteristic image with the same size as the cavity image.
As described above, the feature image output by the student sub-network usually differs in size from the cavity image. To ensure the accuracy of center-point identification, in the embodiment of the present disclosure the feature image may be standardized; for example, when the feature image is smaller than the cavity image, it may be converted by upsampling into a processed feature image of the same size as the cavity image.
And then, determining the point with the maximum characteristic value in the processed characteristic image as a key point, and determining the point corresponding to the key point in the cavity image as the central point.
The processed feature image may be a heatmap, in which the feature value of each point is a value in the interval [0, 1] indicating the probability that the target point is located there; that is, the heatmap represents a probability distribution whose values satisfy a two-dimensional Gaussian distribution. Therefore, in this embodiment, the point with the largest feature value in the processed feature image can be directly determined as the keypoint, that is, the point most likely to be the center point. A center point in the cavity image can then be determined from the keypoint's location in the processed feature image. Since the processed feature image and the cavity image are the same size, the keypoint's position can be taken directly as the center-point position in the cavity image, locating the target for the endoscope's forward navigation.
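An inference-time sketch combining these steps, under the same PyTorch assumptions (locate_center is a hypothetical name):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def locate_center(cavity_image, student):
    # cavity_image: (1, 3, H, W); only the student sub-network is used at inference time.
    heatmap = student(cavity_image)                      # (1, 1, h, w) probability map
    heatmap = F.interpolate(heatmap, size=cavity_image.shape[-2:],
                            mode='bilinear', align_corners=False)
    width = cavity_image.shape[-1]
    idx = heatmap.flatten().argmax().item()
    return idx // width, idx % width                     # (row, col) of the center point
```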
Therefore, according to the technical scheme, when the central point is determined based on the key point identification model, the image identification and prediction can be directly performed based on the student sub-network in the key point identification model, so that the identification efficiency can be improved to a certain extent, the complexity of key point identification can be reduced, the application range of the tissue cavity positioning method for the endoscope is widened, the real-time performance and the accuracy of automatic navigation of the endoscope are improved, and the use experience of a user is improved.
Fig. 5 and 6 are schematic diagrams of intestinal lumen center points determined by the tissue cavity positioning method for an endoscope provided by the present disclosure. In fig. 5, point A is the annotated point and point A' is the center point of the intestinal lumen determined by the method of the present disclosure, whose accurate positioning of the intestinal lumen provides data support for automatic endoscope navigation. Similarly, in fig. 6, point B is the annotated point and point B' is the center point determined by the method of the present disclosure. These experimental images show that the method of the present disclosure can accurately determine the position of the tissue cavity from images obtained during endoscope advancing, improving the accuracy of tissue cavity positioning.
The present disclosure also provides a tissue cavity locating device for an endoscope, as shown in fig. 7, the device 10 comprising:
a receiving module 100, configured to receive a cavity image to be identified, wherein the cavity image is acquired by the endoscope at its current position;
a first determining module 200, configured to obtain a center point of the tissue cavity corresponding to the cavity image according to the keypoint identification model and the cavity image, wherein the center point is used to represent the next target movement position for the endoscope;
wherein the keypoint identification model comprises a student sub-network, a teacher sub-network, and a discrimination sub-network; the student sub-network and the teacher sub-network have the same network structure; the teacher sub-network is used to determine the predicted annotation features of the training images for the student sub-network; and during training of the keypoint identification model, the weight of the prediction loss of the student sub-network is determined based on the predicted annotation features of the teacher sub-network and on the discrimination sub-network.
Optionally, the keypoint recognition model is trained by a training device, and the training device includes:
the acquisition module is used for acquiring a plurality of groups of training samples, wherein the plurality of groups of training samples comprise a first training sample and a second training sample, the first training sample comprises a training image, the second training sample comprises a training image and key point features corresponding to the training image, and the key point features are used for identifying a central point of a tissue cavity corresponding to the training image;
the first processing module is used for preprocessing the training image by adopting different preprocessing modes to obtain a first processing image and a second processing image corresponding to the training image;
a second processing module for obtaining a first predicted image based on the first processed image and the student sub-network, and obtaining a second predicted image based on the second processed image and the teacher sub-network;
a second determining module, configured to determine, according to the first predicted image, the second predicted image, and the discrimination sub-network, a prediction loss of the student sub-network and a weight corresponding to the prediction loss;
a third determining module, configured to determine a target loss corresponding to the student subnetwork based on the predicted loss and the weight of the predicted loss;
the first updating module is used for updating the parameters of the student sub-networks and the teacher sub-networks according to the target loss under the condition that an updating condition is met;
and the second updating module is used for determining the loss of the discrimination sub-network and updating the parameters of the discrimination sub-network according to the loss of the discrimination sub-network.
Optionally, the second determining module includes:
the first calculation sub-module is used for calculating the mean square error between the first prediction image and the second prediction image to obtain the prediction loss;
the splicing sub-module is used for splicing the second prediction image and the training image to obtain a judgment image;
and the first determining sub-module is used for acquiring the confidence corresponding to the second prediction image according to the judgment image and the judgment sub-network, and determining the confidence as the weight corresponding to the prediction loss.
Optionally, the feature processing networks in the student sub-network and the teacher sub-network are multi-layered and have the same number of layers;
the first calculation sub-module includes:
a second calculation sub-module, configured to calculate a mean square error between the feature prediction images output by same-layer feature processing networks in the student sub-network and the teacher sub-network, where the first prediction image is the feature prediction image output by the last-layer feature processing network in the student sub-network, and the second prediction image is the feature prediction image output by the last-layer feature processing network in the teacher sub-network;
and a second determining sub-module, configured to determine the average of the per-layer mean square errors as the prediction loss.
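Under this multi-layer reading, the prediction loss is an average of per-layer MSEs. A sketch, assuming each sub-network exposes its intermediate feature prediction images as a list ordered from first to last layer (an assumption about the interface, not stated in this disclosure):

```python
import torch.nn.functional as F

def layerwise_prediction_loss(student_feats, teacher_feats):
    # One MSE per layer between same-layer feature prediction images;
    # the mean over layers is the prediction loss.
    assert len(student_feats) == len(teacher_feats)
    per_layer = [F.mse_loss(s, t.detach())
                 for s, t in zip(student_feats, teacher_feats)]
    return sum(per_layer) / len(per_layer)
```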
Optionally, in a case that the training sample is a first training sample, the third determining module includes:
a third determining sub-module, configured to determine the generation loss of the student sub-network according to the first prediction image and the preset label features corresponding to the first prediction image;
and a fourth determining sub-module, configured to perform a weighted summation of the generation loss and the prediction loss based on their respective weights, and to determine the result as the target loss.
Optionally, in a case that the training sample is a second training sample, the third determining module includes:
a third determining sub-module, configured to determine the generation loss of the student sub-network according to the first prediction image and the preset label features corresponding to the first prediction image;
a third calculation sub-module, configured to calculate the mean square error between the first prediction image and the key point features corresponding to the training image to obtain an output loss;
and a fifth determining sub-module, configured to perform a weighted summation of the generation loss and the prediction loss based on their respective weights, and to determine the sum of that result and the output loss as the target loss.
Optionally, the feature processing networks in the student sub-network and the teacher sub-network are multi-layered and have the same number of layers;
the third calculation sub-module includes:
a first processing sub-module, configured to process, for the feature prediction image output by each layer of the feature processing network in the student sub-network, the key point features corresponding to the training image into a processed training image of the same size as that feature prediction image;
and a fourth calculation sub-module, configured to calculate the mean square error between the feature prediction image output by each layer of the feature processing network and its corresponding processed training image, and to determine the average of the per-layer mean square errors as the output loss.
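A sketch of this output loss, assuming the key point features are represented as a heatmap tensor and that bilinear interpolation is an acceptable way to produce the size-matched processed training images (the disclosure does not fix the resizing method):

```python
import torch.nn.functional as F

def output_loss(student_feats, keypoint_heatmap):
    # For each layer's feature prediction image, resize the key point
    # heatmap to the same spatial size, take the MSE, and average over layers.
    per_layer = []
    for feat in student_feats:
        target = F.interpolate(keypoint_heatmap, size=feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        per_layer.append(F.mse_loss(feat, target))
    return sum(per_layer) / len(per_layer)
```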
Optionally, the first updating module includes:
a first updating sub-module, configured to update the parameters of the student sub-network according to the target loss;
and a second updating sub-module, configured to update the parameters of the teacher sub-network according to the updated parameters of the student sub-network by using the following formula:
θ_t′ = α·θ_t + (1 − α)·θ_s

where θ_t′ denotes the updated parameter values of the teacher sub-network, θ_t denotes the current parameter values of the teacher sub-network, θ_s denotes the updated parameter values of the student sub-network, and α denotes the update rate.
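This is the exponential-moving-average (EMA) update familiar from mean-teacher training; placing α on the current teacher parameters follows that convention and is assumed here, as is the illustrative value of α. A PyTorch sketch:

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, alpha=0.99):
    # theta_t' = alpha * theta_t + (1 - alpha) * theta_s, applied
    # parameter-by-parameter; both sub-networks share one structure.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```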
Optionally, the first determining module includes:
an input sub-module, configured to input the cavity image into the student sub-network of the key point identification model to obtain the feature image output by the student sub-network;
a second processing sub-module, configured to perform normalization processing on the feature image to obtain a processed feature image of the same size as the cavity image;
and a sixth determining sub-module, configured to determine the point with the maximum feature value in the processed feature image as the key point, and to determine the point in the cavity image corresponding to that key point as the central point.
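At inference time only the student sub-network is needed. A sketch, assuming the feature image is a single-channel heatmap and that the normalization step includes resizing back to the cavity image's resolution (assumptions; the disclosure leaves the exact processing open):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def locate_center_point(cavity_image, student):
    feature = student(cavity_image)  # shape [N, 1, h, w] assumed
    # Second processing sub-module: bring the feature image to the same
    # spatial size as the cavity image.
    feature = F.interpolate(feature, size=cavity_image.shape[-2:],
                            mode="bilinear", align_corners=False)
    # Sixth determining sub-module: the arg-max point of the processed
    # feature image is the key point / tissue-cavity central point.
    n, _, h, w = feature.shape
    flat_idx = feature.view(n, -1).argmax(dim=1)
    rows = torch.div(flat_idx, w, rounding_mode="floor")
    cols = flat_idx % w
    return torch.stack([rows, cols], dim=1)  # (row, col) per image
```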
Referring now to FIG. 8, shown is a schematic diagram of an electronic device 600 suitable for implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), or a vehicle terminal (e.g., a car navigation terminal), and stationary terminals such as a digital TV or a desktop computer. The electronic device shown in FIG. 8 is only an example and should not limit the functions or the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 8 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a cavity image to be identified, where the cavity image is acquired by the endoscope at its current position; and obtain a central point of a tissue cavity corresponding to the cavity image according to the key point identification model and the cavity image, where the central point represents the next target position to which the endoscope should move from its current position; the key point identification model comprises a student sub-network, a teacher sub-network, and a discrimination sub-network; the student sub-network and the teacher sub-network have the same network structure, the teacher sub-network is used to determine the predicted annotation features corresponding to the training images used by the student sub-network, and during training of the key point identification model the weight of the prediction loss corresponding to the student sub-network is determined based on the predicted annotation features of the teacher sub-network and on the discrimination sub-network.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a module does not, in some cases, limit the module itself; for example, the receiving module may also be described as "a module that receives a cavity image to be identified".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides a method of tissue cavity localization for an endoscope, wherein the method comprises:
receiving a cavity image to be identified, where the cavity image is acquired by the endoscope at its current position;
obtaining a central point of a tissue cavity corresponding to the cavity image according to the key point identification model and the cavity image, where the central point represents the next target position to which the endoscope should move from its current position;
the key point identification model comprises a student sub-network, a teacher sub-network, and a discrimination sub-network; the student sub-network and the teacher sub-network have the same network structure, the teacher sub-network is used to determine the predicted annotation features corresponding to the training images used by the student sub-network, and during training of the key point identification model the weight of the prediction loss corresponding to the student sub-network is determined based on the predicted annotation features of the teacher sub-network and on the discrimination sub-network.
Example 2 provides the method of example 1, wherein the keypoint recognition model is trained by:
acquiring a plurality of groups of training samples, wherein the plurality of groups of training samples comprise a first training sample and a second training sample, the first training sample comprises a training image, the second training sample comprises a training image and key point features corresponding to the training image, and the key point features are used for identifying a central point of a tissue cavity corresponding to the training image;
preprocessing the training image by adopting different preprocessing modes to obtain a first processed image and a second processed image corresponding to the training image;
obtaining a first prediction image from the first processed image and the student sub-network, and a second prediction image from the second processed image and the teacher sub-network;
determining a prediction loss of the student sub-network and a weight corresponding to the prediction loss according to the first prediction image, the second prediction image and the discrimination sub-network;
determining a target loss corresponding to the student subnetwork based on the predicted loss and the weight of the predicted loss;
updating the parameters of the student sub-network and the teacher sub-network according to the target loss when an update condition is met;
determining the loss of the discrimination sub-network, and updating the parameters of the discrimination sub-network according to the loss of the discrimination sub-network.
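The disclosure does not spell out how the discrimination sub-network's own loss is determined. One common choice, offered here strictly as an assumption, is a GAN-style binary cross-entropy for labelled (second) training samples: images spliced with real key point features count as "real", and images spliced with teacher predictions count as "fake".

```python
import torch
import torch.nn.functional as F

def discriminator_loss(discriminator, train_image, teacher_pred, keypoint_heatmap):
    # Hypothetical: the disclosure only says the discriminator's loss is
    # determined and its parameters updated; this real/fake pairing is assumed.
    fake = torch.cat([teacher_pred.detach(), train_image], dim=1)
    real = torch.cat([keypoint_heatmap, train_image], dim=1)
    fake_logits = discriminator(fake)
    real_logits = discriminator(real)
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
            F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
```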
Example 3 provides the method of example 2, wherein the determining prediction losses of the student subnetworks and weights corresponding to the prediction losses from the first prediction image, the second prediction image, and the discrimination subnetwork comprises:
calculating the mean square error between the first prediction image and the second prediction image to obtain the prediction loss;
concatenating the second prediction image with the training image to obtain a discrimination image;
and obtaining, according to the discrimination image and the discrimination sub-network, the confidence corresponding to the second prediction image, and determining that confidence as the weight of the prediction loss.
Example 4 provides the method of example 3, wherein the feature processing networks in the student sub-network and the teacher sub-network are multi-layered and have the same number of layers;
the calculating a mean square error between the first predicted image and the second predicted image to obtain a prediction loss comprises:
calculating a mean square error between the feature prediction images output by same-layer feature processing networks in the student sub-network and the teacher sub-network, wherein the first prediction image is the feature prediction image output by the last-layer feature processing network in the student sub-network, and the second prediction image is the feature prediction image output by the last-layer feature processing network in the teacher sub-network;
and determining the average of the per-layer mean square errors as the prediction loss.
Example 5 provides the method of example 2, wherein, in a case where the training sample is a first training sample, the determining the target loss corresponding to the student subnetwork based on the predicted loss and the weight of the predicted loss includes:
determining the generation loss of the student sub-network according to the first prediction image and the preset label features corresponding to the first prediction image;
and performing a weighted summation of the generation loss and the prediction loss based on their respective weights, and determining the result as the target loss.
Example 6 provides the method of example 2, wherein, in a case where the training sample is a second training sample, the determining the target loss corresponding to the student subnetwork based on the predicted loss and the weight of the predicted loss includes:
determining the generation loss of the student sub-network according to the first prediction image and the preset label features corresponding to the first prediction image;
calculating the mean square error between the first prediction image and the key point features corresponding to the training image to obtain an output loss;
and performing a weighted summation of the generation loss and the prediction loss based on their respective weights, and determining the sum of that result and the output loss as the target loss.
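Examples 5 and 6 differ only in whether the sample carries key point labels. A small sketch combining both cases; `gen_weight` is an illustrative hyper-parameter, since the disclosure does not fix the weight assigned to the generation loss:

```python
def target_loss(generation_loss, prediction_loss, prediction_weight,
                output_loss=None, gen_weight=1.0):
    # Weighted sum of the generation loss and the discriminator-weighted
    # prediction loss; labelled (second) samples additionally add the
    # output loss computed against the key point features.
    loss = gen_weight * generation_loss + prediction_weight * prediction_loss
    if output_loss is not None:
        loss = loss + output_loss
    return loss
```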
Example 7 provides the method of example 6, wherein the feature processing networks in the student sub-network and the teacher sub-network are multi-layered and have the same number of layers;
the calculating a mean square error between the first prediction image and the corresponding key point features of the training image to obtain an output loss includes:
for the feature prediction image output by each layer of the feature processing network in the student sub-network, processing the key point features corresponding to the training image into a processed training image of the same size as that feature prediction image;
and calculating the mean square error between the feature prediction image output by each layer of the feature processing network and its corresponding processed training image, and determining the average of the per-layer mean square errors as the output loss.
Example 8 provides the method of example 2, wherein the updating parameters of the student and teacher subnetworks according to the target loss comprises:
updating parameters of the student sub-network according to the target loss;
updating the parameters of the teacher sub-network according to the updated parameters of the student sub-network by using the following formula:
θ_t′ = α·θ_t + (1 − α)·θ_s

where θ_t′ denotes the updated parameter values of the teacher sub-network, θ_t denotes the current parameter values of the teacher sub-network, θ_s denotes the updated parameter values of the student sub-network, and α denotes the update rate.
Example 9 provides the method of example 1, wherein the obtaining a central point of a tissue cavity corresponding to the cavity image according to a key point identification model and the cavity image includes:
inputting the cavity image into a student sub-network of the key point identification model to obtain a characteristic image output by the student sub-network;
performing normalization processing on the feature image to obtain a processed feature image of the same size as the cavity image;
and determining the point with the maximum feature value in the processed feature image as a key point, and determining the point in the cavity image corresponding to that key point as the central point.
Example 10 provides a tissue cavity positioning device for an endoscope, wherein the device comprises:
the receiving module is used to receive a cavity image to be identified, where the cavity image is acquired by the endoscope at its current position;
the first determining module is used to obtain a central point of a tissue cavity corresponding to the cavity image according to a key point identification model and the cavity image, where the central point represents the next target position to which the endoscope should move from its current position;
the key point identification model comprises a student sub-network, a teacher sub-network, and a discrimination sub-network; the student sub-network and the teacher sub-network have the same network structure, the teacher sub-network is used to determine the predicted annotation features corresponding to the training images used by the student sub-network, and during training of the key point identification model the weight of the prediction loss corresponding to the student sub-network is determined based on the predicted annotation features of the teacher sub-network and on the discrimination sub-network.
Example 11 provides a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processing device, implements the steps of the method of any of examples 1-9, in accordance with one or more embodiments of the present disclosure.
Example 12 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing said computer program in said storage means to carry out the steps of the method of any of examples 1-9.
The foregoing description is merely an explanation of the preferred embodiments of the present disclosure and of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims (11)

1. A method of tissue cavity localization for an endoscope, the method comprising:
receiving a cavity image to be identified, wherein the cavity image is acquired by the endoscope at its current position;
obtaining a central point of a tissue cavity corresponding to the cavity image according to the key point identification model and the cavity image, wherein the central point represents the next target position to which the endoscope should move from its current position;
wherein the key point identification model comprises a student sub-network, a teacher sub-network, and a discrimination sub-network; the student sub-network and the teacher sub-network have the same network structure, the teacher sub-network is used to determine the predicted annotation features corresponding to the training images used by the student sub-network, and during training of the key point identification model the weight of the prediction loss corresponding to the student sub-network is determined based on the predicted annotation features of the teacher sub-network and on the discrimination sub-network;
the key point identification model is trained in the following way:
acquiring a plurality of groups of training samples, wherein the plurality of groups of training samples comprise a first training sample and a second training sample, the first training sample comprises a training image, the second training sample comprises a training image and key point features corresponding to the training image, and the key point features are used for identifying a central point of a tissue cavity corresponding to the training image;
preprocessing the training image by adopting different preprocessing modes to obtain a first processed image and a second processed image corresponding to the training image;
obtaining a first prediction image from the first processed image and the student sub-network, and a second prediction image from the second processed image and the teacher sub-network;
determining a prediction loss of the student sub-network and a weight corresponding to the prediction loss according to the first prediction image, the second prediction image and the discrimination sub-network;
determining a target loss corresponding to the student subnetwork based on the predicted loss and the weight of the predicted loss;
updating the parameters of the student sub-network and the teacher sub-network according to the target loss when an update condition is met;
determining the loss of the discrimination sub-network, and updating the parameters of the discrimination sub-network according to the loss of the discrimination sub-network.
2. The method of claim 1, wherein said determining a prediction loss for the student subnetwork and a weight corresponding to the prediction loss based on the first predicted image, the second predicted image, and the discrimination subnetwork comprises:
calculating the mean square error between the first prediction image and the second prediction image to obtain the prediction loss;
concatenating the second prediction image with the training image to obtain a discrimination image;
and obtaining, according to the discrimination image and the discrimination sub-network, the confidence corresponding to the second prediction image, and determining that confidence as the weight of the prediction loss.
3. The method of claim 2, wherein the feature processing networks in the student sub-network and the teacher sub-network are multi-layered and have the same number of layers;
the calculating a mean square error between the first predicted image and the second predicted image to obtain a prediction loss comprises:
calculating a mean square error between the feature prediction images output by same-layer feature processing networks in the student sub-network and the teacher sub-network, wherein the first prediction image is the feature prediction image output by the last-layer feature processing network in the student sub-network, and the second prediction image is the feature prediction image output by the last-layer feature processing network in the teacher sub-network;
and determining the average of the per-layer mean square errors as the prediction loss.
4. The method of claim 1, wherein, in the case that the training sample is a first training sample, the determining the target loss corresponding to the student subnetwork based on the predicted loss and the weight of the predicted loss comprises:
determining the generation loss of the student sub-network according to the first prediction image and the preset label features corresponding to the first prediction image;
and performing a weighted summation of the generation loss and the prediction loss based on their respective weights, and determining the result as the target loss.
5. The method of claim 1, wherein, in the case that the training sample is a second training sample, the determining the target loss corresponding to the student subnetwork based on the predicted loss and the weight of the predicted loss comprises:
determining the generation loss of the student sub-network according to the first prediction image and the preset label features corresponding to the first prediction image;
calculating the mean square error between the first prediction image and the key point features corresponding to the training image to obtain an output loss;
and performing a weighted summation of the generation loss and the prediction loss based on their respective weights, and determining the sum of that result and the output loss as the target loss.
6. The method of claim 5, wherein the feature processing networks in the student sub-network and the teacher sub-network are multi-layered and have the same number of layers;
the calculating a mean square error between the first prediction image and the corresponding key point features of the training image to obtain an output loss includes:
for the feature prediction image output by each layer of the feature processing network in the student sub-network, processing the key point features corresponding to the training image into a processed training image of the same size as that feature prediction image;
and calculating the mean square error between the feature prediction image output by each layer of the feature processing network and its corresponding processed training image, and determining the average of the per-layer mean square errors as the output loss.
7. The method of claim 1, wherein said updating parameters of said student and teacher subnetworks based on said target loss comprises:
updating parameters of the student sub-network according to the target loss;
updating the parameters of the teacher sub-network according to the updated parameters of the student sub-network by the following formula:
θ_t′ = α·θ_t + (1 − α)·θ_s

where θ_t′ denotes the updated parameter values of the teacher sub-network, θ_t denotes the current parameter values of the teacher sub-network, θ_s denotes the updated parameter values of the student sub-network, and α denotes the update rate.
8. The method of claim 1, wherein obtaining the central point of the tissue cavity corresponding to the cavity image according to the key point identification model and the cavity image comprises:
inputting the cavity image into a student sub-network of the key point identification model to obtain a characteristic image output by the student sub-network;
performing normalization processing on the feature image to obtain a processed feature image of the same size as the cavity image;
and determining the point with the maximum feature value in the processed feature image as a key point, and determining the point in the cavity image corresponding to that key point as the central point.
9. A tissue cavity positioning device for an endoscope, the device comprising:
a receiving module, configured to receive a cavity image to be identified, wherein the cavity image is acquired by the endoscope at its current position;
a first determining module, configured to obtain a central point of a tissue cavity corresponding to the cavity image according to a key point identification model and the cavity image, wherein the central point represents the next target position to which the endoscope should move from its current position;
wherein the key point identification model comprises a student sub-network, a teacher sub-network, and a discrimination sub-network; the student sub-network and the teacher sub-network have the same network structure, the teacher sub-network is used to determine the predicted annotation features corresponding to the training images used by the student sub-network, and during training of the key point identification model the weight of the prediction loss corresponding to the student sub-network is determined based on the predicted annotation features of the teacher sub-network and on the discrimination sub-network;
the key point identification model is trained by a training device, the training device comprising:
an acquisition module, configured to acquire a plurality of groups of training samples, wherein the plurality of groups of training samples comprise a first training sample and a second training sample, the first training sample comprises a training image, the second training sample comprises a training image and key point features corresponding to the training image, and the key point features identify a central point of a tissue cavity corresponding to the training image;
a first processing module, configured to preprocess the training image in different preprocessing modes to obtain a first processed image and a second processed image corresponding to the training image;
a second processing module, configured to obtain a first prediction image based on the first processed image and the student sub-network, and a second prediction image based on the second processed image and the teacher sub-network;
a second determining module, configured to determine, according to the first prediction image, the second prediction image, and the discrimination sub-network, a prediction loss of the student sub-network and a weight corresponding to the prediction loss;
a third determining module, configured to determine a target loss corresponding to the student sub-network based on the prediction loss and the weight of the prediction loss;
a first updating module, configured to update the parameters of the student sub-network and the teacher sub-network according to the target loss when an update condition is met;
and a second updating module, configured to determine the loss of the discrimination sub-network and update the parameters of the discrimination sub-network according to that loss.
10. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1 to 8.
11. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 8.
CN202111039749.3A 2021-09-06 2021-09-06 Tissue cavity positioning method, device, medium and equipment for endoscope Active CN113496512B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111039749.3A CN113496512B (en) 2021-09-06 2021-09-06 Tissue cavity positioning method, device, medium and equipment for endoscope
PCT/CN2022/117068 WO2023030523A1 (en) 2021-09-06 2022-09-05 Tissue cavity positioning method and apparatus for endoscope, medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111039749.3A CN113496512B (en) 2021-09-06 2021-09-06 Tissue cavity positioning method, device, medium and equipment for endoscope

Publications (2)

Publication Number Publication Date
CN113496512A CN113496512A (en) 2021-10-12
CN113496512B true CN113496512B (en) 2021-12-17

Family

ID=77997070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111039749.3A Active CN113496512B (en) 2021-09-06 2021-09-06 Tissue cavity positioning method, device, medium and equipment for endoscope

Country Status (2)

Country Link
CN (1) CN113496512B (en)
WO (1) WO2023030523A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496512B (en) * 2021-09-06 2021-12-17 北京字节跳动网络技术有限公司 Tissue cavity positioning method, device, medium and equipment for endoscope
CN114332080B (en) * 2022-03-04 2022-05-27 北京字节跳动网络技术有限公司 Tissue cavity positioning method and device, readable medium and electronic equipment
CN114708436B (en) * 2022-06-02 2022-09-02 深圳比特微电子科技有限公司 Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium
CN117671012A (en) * 2024-01-31 2024-03-08 临沂大学 Method, device and equipment for calculating absolute and relative pose of endoscope in operation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070063A (en) * 2019-04-29 2019-07-30 北京字节跳动网络技术有限公司 Action identification method, device and the electronic equipment of target object
CN110742690A (en) * 2019-09-12 2020-02-04 东南大学苏州医疗器械研究院 Method for configuring endoscope and terminal equipment
CN111915573A (en) * 2020-07-14 2020-11-10 武汉楚精灵医疗科技有限公司 Digestive endoscopy focus tracking method based on time sequence feature learning
CN112257815A (en) * 2020-12-03 2021-01-22 北京沃东天骏信息技术有限公司 Model generation method, target detection method, device, electronic device, and medium
CN112294238A (en) * 2019-08-01 2021-02-02 深圳硅基智控科技有限公司 Method for controlling movement of capsule endoscope in tissue cavity
CN112990298A (en) * 2021-03-11 2021-06-18 北京中科虹霸科技有限公司 Key point detection model training method, key point detection method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9526587B2 (en) * 2008-12-31 2016-12-27 Intuitive Surgical Operations, Inc. Fiducial marker design and detection for locating surgical instrument in images
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN111639744B (en) * 2020-04-15 2023-09-22 北京迈格威科技有限公司 Training method and device for student model and electronic equipment
CN113496512B (en) * 2021-09-06 2021-12-17 北京字节跳动网络技术有限公司 Tissue cavity positioning method, device, medium and equipment for endoscope

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070063A (en) * 2019-04-29 2019-07-30 北京字节跳动网络技术有限公司 Action identification method, device and the electronic equipment of target object
CN112294238A (en) * 2019-08-01 2021-02-02 深圳硅基智控科技有限公司 Method for controlling movement of capsule endoscope in tissue cavity
CN110742690A (en) * 2019-09-12 2020-02-04 东南大学苏州医疗器械研究院 Method for configuring endoscope and terminal equipment
CN111915573A (en) * 2020-07-14 2020-11-10 武汉楚精灵医疗科技有限公司 Digestive endoscopy focus tracking method based on time sequence feature learning
CN112257815A (en) * 2020-12-03 2021-01-22 北京沃东天骏信息技术有限公司 Model generation method, target detection method, device, electronic device, and medium
CN112990298A (en) * 2021-03-11 2021-06-18 北京中科虹霸科技有限公司 Key point detection model training method, key point detection method and device

Also Published As

Publication number Publication date
CN113496512A (en) 2021-10-12
WO2023030523A1 (en) 2023-03-09

Similar Documents

Publication Publication Date Title
CN113496512B (en) Tissue cavity positioning method, device, medium and equipment for endoscope
US20210158533A1 (en) Image processing method and apparatus, and storage medium
CN113487605B (en) Tissue cavity positioning method, device, medium and equipment for endoscope
EP3605394A1 (en) Method and apparatus for recognizing body movement
CN113487608B (en) Endoscope image detection method, endoscope image detection device, storage medium, and electronic apparatus
CN112767329B (en) Image processing method and device and electronic equipment
CN113487609B (en) Tissue cavity positioning method and device, readable medium and electronic equipment
US11417014B2 (en) Method and apparatus for constructing map
CN112541928A (en) Network training method and device, image segmentation method and device and electronic equipment
CN113470029B (en) Training method and device, image processing method, electronic device and storage medium
CN110009059B (en) Method and apparatus for generating a model
CN113469295B (en) Training method for generating model, polyp recognition method, device, medium, and apparatus
CN114820584B (en) Lung focus positioner
CN111967515A (en) Image information extraction method, training method and device, medium and electronic equipment
CN114240867A (en) Training method of endoscope image recognition model, endoscope image recognition method and device
WO2023165332A1 (en) Tissue cavity positioning method, apparatus, readable medium, and electronic device
CN114937178B (en) Multi-modality-based image classification method and device, readable medium and electronic equipment
CN115830001A (en) Intestinal image processing method and device, storage medium and electronic equipment
CN115375657A (en) Method for training polyp detection model, detection method, device, medium, and apparatus
CN115375655A (en) Training method, detection method, device, medium and equipment of polyp detection model
CN113240796B (en) Visual task processing method and device, computer readable medium and electronic equipment
US11967134B2 (en) Method and device for identifying video
CN117115139A (en) Endoscope video detection method and device, readable medium and electronic equipment
CN116704593A (en) Predictive model training method, apparatus, electronic device, and computer-readable medium
CN114627311A (en) Object attribute information generation method, object attribute information generation device, electronic device, and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20211012

Assignee: Xiaohe medical instrument (Hainan) Co.,Ltd.

Assignor: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

Contract record no.: X2021990000694

Denomination of invention: Tissue cavity positioning method, device, medium and equipment for endoscope

License type: Common License

Record date: 20211117

GR01 Patent grant