CN116051486A

CN116051486A - Training method of endoscope image recognition model, image recognition method and device

Info

Publication number: CN116051486A
Application number: CN202211713114.1A
Authority: CN
Inventors: 潘俊文; 王杰祥; 赵家英; 李永会
Original assignee: Douyin Vision Co Ltd
Current assignee: Douyin Vision Co Ltd
Priority date: 2022-12-29
Filing date: 2022-12-29
Publication date: 2023-05-02
Anticipated expiration: 2042-12-29
Also published as: CN116051486B

Abstract

The disclosure relates to a training method, an image recognition method and an image recognition device for an endoscope image recognition model, which aims to reduce the problem that a large number of samples are dominant in training and improve the generalization performance and the robustness of the endoscope image recognition model. The training method comprises the following steps: acquiring a sample endoscope image set; inputting the sample image into an endoscope image recognition model aiming at each sample image to obtain a corresponding prediction blind returning result, determining a sample influence value of the sample image on the endoscope image recognition model according to the prediction blind returning result and the sample blind returning result corresponding to the sample image, and determining a sample weight of the sample image according to the sample influence value, wherein the sample influence value is inversely related to the sample weight; determining a target loss function value according to a sample blind back result, a prediction blind back result and a sample weight corresponding to each sample image; and adjusting parameters of the endoscope image recognition model according to the objective loss function value.

Description

Training method of endoscope image recognition model, image recognition method and device

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to a training method of an endoscope image recognition model, an image recognition method and an image recognition device.

Background

The endoscope can reach the ileocecal section using an electronic enteroscope to view the colonic lesions from the mucosal side. Therefore, identification of the ileocecum is critical during endoscopy.

With the continuous development of deep learning technology, deep learning algorithms are gradually applied to the task of endoscope recognition. In general, deep learning algorithms assume that the ratio of the number of samples of different classes is balanced. However, in practical applications, the sample number imbalance is normal. For example, during a enteroscopy procedure, the electron enteroscope is not advanced after reaching the ileocecum. Thus, in the ileocecal identification sample set, only a very small portion of the frames in the enteroscopy video contain ileocecal imaging, and the non-ileocecal imaging is far more than the ileocecal imaging. Such class imbalance may result in an algorithm that is biased towards learning classes with a greater number of samples, while learning effects may be poor for classes with a smaller number of samples, thereby affecting the blind-back recognition accuracy of the model.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In a first aspect, the present disclosure provides a training method of an endoscopic image recognition model for recognizing a ileocecum, the method comprising:

obtaining a sample endoscope image set, wherein the sample endoscope image set comprises a back-blind type sample image with a back-blind part and a non-back-blind type sample image without a back-blind part, and each sample image in the sample endoscope image set is marked with a sample back-blind result for representing whether the back-blind part exists or not;

inputting the sample image into the endoscope image recognition model for each sample image to obtain a corresponding prediction blind back result, determining a sample influence value of the sample image on the endoscope image recognition model according to the prediction blind back result and the sample blind back result corresponding to the sample image, and determining a sample weight of the sample image according to the sample influence value, wherein the sample influence value is inversely related to the sample weight;

determining a target loss function value according to the sample blind back result, the prediction blind back result and the sample weight corresponding to each sample image;

and adjusting parameters of the endoscope image recognition model according to the objective loss function value.

In a second aspect, the present disclosure provides an endoscopic image recognition method, the method comprising:

acquiring an endoscope image to be identified;

and inputting the endoscope image into an endoscope image recognition model to obtain a blind back recognition result corresponding to the endoscope image, wherein the endoscope image recognition model is obtained through the training method of the endoscope image recognition model in the first aspect.

In a third aspect, the present disclosure provides a training apparatus for an endoscopic image recognition model for recognizing a ileocecum, the apparatus comprising:

a first acquisition module for acquiring a sample endoscopic image set, wherein the sample endoscopic image set comprises a back-blind type sample image with a back-blind part and a non-back-blind type sample image without a back-blind part, and each sample image in the sample endoscopic image set is marked with a sample back-blind result for representing whether the back-blind part exists or not;

the first training module is used for inputting the sample image into the endoscope image recognition model for each sample image to obtain a corresponding prediction blind returning result, determining a sample influence value of the sample image on the endoscope image recognition model according to the prediction blind returning result and the sample blind returning result corresponding to the sample image, and determining a sample weight of the sample image according to the sample influence value;

The second training module is used for determining a target loss function value according to the sample blind returning result, the prediction blind returning result and the sample weight corresponding to each sample image;

and the third training module is used for adjusting parameters of the endoscope image recognition model according to the objective loss function value.

In a fourth aspect, the present disclosure provides an endoscopic image recognition apparatus, the apparatus comprising:

the second acquisition module is used for acquiring an endoscope image to be identified;

the recognition module is used for inputting the endoscope image into an endoscope image recognition model to obtain a blind recognition result corresponding to the endoscope image, wherein the endoscope image recognition model is obtained through the training method of the endoscope image recognition model in the first aspect.

In a fifth aspect, the present disclosure provides a non-transitory computer readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the method of the first or second aspect.

In a sixth aspect, the present disclosure provides an electronic device, comprising:

a storage device having a computer program stored thereon;

processing means for executing said computer program in said storage means to carry out the steps of the method described in the first or second aspect.

According to the technical scheme, in the training process of the endoscope image recognition model, the sample influence value of the sample image on the endoscope image recognition model can be determined according to the predicted blindness result and the sample blindness result corresponding to the sample image, and the sample weight of the sample image is determined according to the sample influence value. The sample influence value is inversely related to the sample weight, so that a sample with large influence on the endoscope image recognition model has smaller sample weight, and a sample with small influence on the endoscope image recognition model has larger sample weight, thereby better adapting to category unbalanced training, reducing the problem that a large number of samples are dominant in training, and improving the generalization performance and the robustness of the endoscope image recognition model.

Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale. In the drawings:

FIG. 1 is a flowchart illustrating a training method of an endoscopic image recognition model according to an exemplary embodiment of the present disclosure;

FIG. 2 is a process schematic diagram of a training method for an endoscopic image recognition model, according to another exemplary embodiment of the present disclosure;

FIG. 3 is a flowchart illustrating an endoscopic image recognition method according to an exemplary embodiment of the present disclosure;

FIG. 4 is a block diagram of a training device for an endoscopic image recognition model, according to an exemplary embodiment of the present disclosure;

FIG. 5 is a block diagram of an endoscopic image recognition device, according to an exemplary embodiment of the present disclosure;

fig. 6 is a block diagram of an electronic device, according to an exemplary embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure have been shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but are provided to provide a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.

It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that references to "one", "a plurality" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that "one or more" is intended to be understood as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

It will be appreciated that prior to using the technical solutions disclosed in the embodiments of the present disclosure, the user should be informed and authorized of the type, usage range, usage scenario, etc. of the personal information related to the present disclosure in an appropriate manner according to the relevant legal regulations.

For example, in response to receiving an active request from a user, a prompt is sent to the user to explicitly prompt the user that the operation it is requesting to perform will require personal information to be obtained and used with the user. Thus, the user can autonomously select whether to provide personal information to software or hardware such as an electronic device, an application program, a server or a storage medium for executing the operation of the technical scheme of the present disclosure according to the prompt information.

As an alternative but non-limiting implementation, in response to receiving an active request from a user, the manner in which the prompt information is sent to the user may be, for example, a popup, in which the prompt information may be presented in a text manner. In addition, a selection control for the user to select to provide personal information to the electronic device in a 'consent' or 'disagreement' manner can be carried in the popup window.

It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.

Meanwhile, it can be understood that the data (including but not limited to the data itself, the acquisition or the use of the data) related to the technical scheme should conform to the requirements of the corresponding laws and regulations and related regulations.

Technical terms that may appear in the embodiments of the present disclosure will be explained first.

Long tail (long-tail): in statistics, the long tail of a distribution is the portion of the distribution that is far from the "head" of the distribution and occurs many times. In machine learning, long tail distribution refers to a serious class imbalance problem.

Class imbalance (class imbalance): the data volume of different categories in the training set varies greatly.

Sample effect (Sample Impact): a single sample brings about changes and effects on the current model decisions.

Difficult sample (hard sample): the model identifies samples that are not good, such as a low confidence of the predictions, or a large loss of samples (loss).

Decision boundary or decision plane (decision boundary, or decision surface): is a hypersurface in the statistical classification problem, and divides the vector space into two sets which respectively correspond to two classifications.

With the continuous development of deep learning technology, deep learning algorithms are gradually applied to the task of endoscope recognition. The related art manner of endoscopic image recognition is based mostly on deep neural networks, and usually uses an off-the-shelf convolutional neural network, such as ResNet, VGG (Visual Geometry GroupNetwork), inceptionv3, and the like. These algorithms require that the proportions of the sample sizes of the different classes in the training set be balanced. However, in practical applications, the sample number imbalance is normal. For example, during a enteroscopy procedure, the electron enteroscope is not advanced after reaching the ileocecum. Thus, in the ileocecal identification sample set, only a very small portion of the frames in the enteroscopy video contain ileocecal imaging, and the non-ileocecal imaging is far more than the ileocecal imaging. Therefore, the deep learning model in the related art is difficult to obtain good performance in the blind back recognition task under long tail conditions, and the model inevitably tends to predict the blind back image as a non-blind back class in the test. In this case, the decision boundary of the model shifts to the ileocecal class region, and thus ideal generalization performance cannot be obtained.

Aiming at the problem of unbalanced categories, the related technology also enables the sample size among different categories to be relatively balanced through a resampling mode of resampling data, and classical methods comprise a random balanced sampling method, an SMOTE (Synthetic Minority Oversampling Technique, a synthetic minority class oversampling technology) algorithm, a Borderline-SMOTE algorithm, informed Undersampling and the like. Among them, random balanced sampling is relatively simple but has many problems. For example, randomly undersampling samples of the head class (i.e., a greater number of samples) may result in information loss, while randomly oversampling samples of the tail class (i.e., a lesser number of samples) may in turn tend to result in an overfitting. The SMOTE method realizes class balance by generating samples of tail classes, but still has sample overlapping, and can not completely solve the problem of over fitting.

In addition, the related art solves the problem of class imbalance by weighting the head class and the tail class with different class weights, respectively, for example, by weighting the head class and the tail class with the sample size or the inverse square root of the sample size. Although the method can effectively improve the performance of the tail class, the same weight is given to all samples of the same class, and the difference of the influence degree of different samples on model decision is ignored.

In addition, the related art also attempts to assign different weights to each sample, e.g., assigning weights according to the magnitude of confidence, can reduce the weight of well-identified samples and assign more weight to difficult samples that produce higher errors. In the back-blind identification problem, we expect images of the back-blind class to be given more weight to alleviate the imbalance problem. However, there are also a large number of difficult samples in the non-ileocecal category, especially those that are easily confusing with ileocecal, such as images containing larger polyps. Thus, under the extremely unbalanced back-blind recognition problem, the difficult samples are still dominated by the non-back-blind images, and the model is extremely easy to over-fit to these large sample sizes of difficult samples. In this case, the decision boundary will be overly dependent on a large number of non-blind images, and the decision boundary will be offset towards the area of the blind class, affecting the accuracy of the blind identification. Whereas for the re-weighting method based on the influence function, the sample weights are highly dependent on a set of unbiased validation sets. In actual blind-back recognition tasks, it is difficult to obtain large-scale unbiased verification sets, and weights obtained from small-scale "unbiased" verification sets tend to be unreliable.

In view of this, the embodiments of the present disclosure provide a training method for an endoscopic image recognition model, so as to determine a sample weight of each sample image according to a sample influence value, reduce a problem that a large number of samples occupy a dominant role in training, and improve generalization performance and robustness of the endoscopic image recognition model.

FIG. 1 is a flowchart illustrating a method of training an endoscopic image recognition model according to an exemplary embodiment of the present disclosure. Referring to fig. 1, the endoscopic image recognition model is used for recognizing a ileocecum, and includes the steps of:

step 101, a sample endoscopic image set is acquired.

Step 102, inputting a sample image into an endoscope image recognition model for each sample image to obtain a corresponding prediction blind back result, determining a sample influence value of the sample image on the endoscope image recognition model according to the prediction blind back result and the sample blind back result corresponding to the sample image, and determining a sample weight of the sample image according to the sample influence value. Wherein the sample impact value is inversely related to the sample weight.

And step 103, determining a target loss function value according to the sample blind back result, the prediction blind back result and the sample weight corresponding to each sample image.

And step 104, adjusting parameters of the endoscope image recognition model according to the objective loss function value.

Wherein the sample endoscopic image set comprises a back-blind type sample image with a back-blind portion and a non-back-blind type sample image without a back-blind portion, and each sample image in the sample endoscopic image set is labeled with a sample back-blind result for characterizing whether the back-blind portion is present. For example, a sample blind back result with a blind back portion may be labeled with 1, and a sample blind back result without a blind back portion may be labeled with 0, which is not limited by the embodiments of the present disclosure.

For each sample image in the sample endoscope image set, the sample image can be input into an endoscope image recognition model to obtain a corresponding prediction blind back result, wherein the prediction blind back result is a result of whether the sample image predicted by the endoscope image recognition model has a blind back part or not. And then, according to the predicted blindness result and the sample blindness result corresponding to the sample image, determining a sample influence value of the sample image on the endoscope image recognition model.

In some embodiments, the endoscopic image recognition model is configured to recognize the blindness portion by the classifier, and accordingly, determining the sample impact value of the sample image on the endoscopic image recognition model according to the predicted blindness result and the sample blindness result corresponding to the sample image may be: firstly, according to a predicted blindness result and a sample blindness result corresponding to a sample image, determining a loss function value, determining a gradient tensor of the loss function value on classifier parameters, and then determining a norm of the gradient tensor as a sample influence value of the sample image on an endoscope image recognition model.

Wherein the norms of the gradient tensors may be F (Frobenius) norms, or may be 1 norms, 2 norms, etc., which are not limited by the disclosed embodiments.

For example, note the sample endoscopic image set d= { x ⁽ⁱ⁾ ,y ⁽ⁱ⁾ I 1.ltoreq.i.ltoreq.N }, where x is ⁽ⁱ⁾ Is the ith sample image, y ⁽ⁱ⁾ ∈R ^C×1 Is the marked sample blind back result, C is the category number, and N is the image number of the sample endoscope image set. The feature extraction network is denoted as f _θ The network parameter is theta, and the extracted characteristic is marked as z ⁽ⁱ⁾ ＝f _θ (x ⁽ⁱ⁾ )∈R ^K×1 Where K is the feature dimension. The softmax activated full-connection layer is used as a classifier, and the parameter is w= [ w ] ₁ ,w ₂ ,…,w _C ]∈R ^K×C . Let σ be the softmax operation, the predicted classification probability is

Wherein w is ^T Representing the transpose of the parameter w. Thus, the sample effect of the sample image on the endoscope image recognition model is the effect of the ith sample image on the decision layer parameter w.

It should be appreciated that since the number of non-blind back images predominates in difficult samples, embodiments of the present disclosure use gradient information to focus on the impact of a particular sample on blind back recognition decisions to learn a blind back recognition model with better generalization performance. In particular, in embodiments of the present disclosure, the sample impact value of the ith sample image may be the gradient tensor of its loss versus classifier parameter F norm of (c): i (I, w) = ||Δ _w H(y ⁽ⁱ⁾ ,σ(w ^T z ⁽ⁱ⁾ ))|| _F . Wherein I (I, w) represents the sample influence value of the I-th sample image on the classifier parameter w, Δ _w Represents the gradient of the classifier parameter w, H represents the cross entropy loss function (cross entropy loss), Δ _w H∈R ^K×C 。

It should be further appreciated that the automatic derivative engine of the deep learning framework of the related art calculates the loss function value and gradient in batches (batch), that is, the gradient of the classifier parameter w calculated in the back propagation of the related art is the average of the gradients over M samples within a batch, that is

Accordingly, the present disclosure provides a way to determine a single sample gradient tensor.

Taking the softmax classifier using cross entropy loss function as an example, note

Because of

According to the chain law, it is possible to obtain:

wherein H is ⁽ⁱ⁾ Representing the cross entropy loss function value corresponding to the ith sample image,/->

Predictive outcome indicating that the ith sample image belongs to the c-th class,/>

Sample results of the ith sample image belonging to the c-th category, w _c,k Parameters representing the kth dimension characteristic of the classifier corresponding to the c th class,/for the classifier>

Representing the kth dimension feature corresponding to the ith sample image.

Thus, the gradient tensor of the i-th sample image can be expressed as:

that is, it is possible to first determine the predicted blindness return result corresponding to the ith sample image +. >

And sample blindness results->

The loss function value is determined. Then, a gradient tensor of the loss function value to the classifier parameter is determined. Finally, F-norm of the gradient tensor is delta _w H ⁽ⁱ⁾ || _F A sample impact value for the i-th sample image on the endoscope image recognition model is determined.

After obtaining the sample influence value of the sample image on the endoscope image recognition model, the sample weight of the sample image can be determined according to the sample influence value.

In some embodiments, the inverse of the sample impact value may be determined and multiplied by a first preset hyper-parameter to obtain a sample weight for the sample image. The first preset super parameter may be set to a suitable value according to an actual situation, which is not limited in the embodiment of the present disclosure.

For example, the sample weight of the sample image may be determined from the sample influence value according to the following calculation formula:

wherein lambda is _i Sample weight, alpha, representing the ith sample image ₁ Representing a first preset super parameter.

It should be appreciated that in training an endoscopic image recognition model for back-blind recognition, a large number of non-back-blind sample points located near complex decision boundaries support the decision boundaries. If those non-blind samples located near the decision boundary are removed, the decision boundary will become smoother, which may lead to better generalization performance. Therefore, the embodiment of the disclosure introduces sample influence to estimate the change of each sample image to the blind recognition decision, and the influence of the training data with large sample influence on the blind recognition decision is larger, so that the weight of the samples is reduced, and the model is enabled to pay attention to other samples better.

Therefore, the sample influence value is inversely related to the sample weight, so that the sample with large influence on the endoscope image recognition model has smaller sample weight, and the sample with small influence on the endoscope image recognition model has larger sample weight, thereby better adapting to the class unbalance training, reducing the problem that a large number of samples take the dominant role in the training, and improving the generalization performance and the robustness of the endoscope image recognition model.

In some embodiments, the number of images in the sample endoscopic image set may be divided by the number of images in the same class of sample images in the sample endoscopic image set to obtain a sample proportion of the sample images, and the difference value obtained by subtracting the predictive blindness result from 1 is determined as the difficulty in identifying the sample images by the endoscopic image identification model. Accordingly, determining the sample weight of the sample image according to the sample influence value may be: multiplying the reciprocal of the sample influence value by a second preset super parameter and then multiplying the reciprocal of the sample influence value by a first preset power of the sample proportion to obtain a sample weight of the sample image; or multiplying the reciprocal of the sample influence value by a second preset hyper-parameter and then multiplying the reciprocal of the sample influence value by a second preset power of the recognition difficulty to obtain a sample weight of the sample image; or multiplying the reciprocal of the sample influence value by a second preset super parameter and then multiplying the reciprocal by the first preset power of the sample proportion and the second preset power of the recognition difficulty to obtain the sample weight of the sample image.

The second preset super parameter, the first preset secondary and the second preset secondary may be used to control the influence degree of each part of numerical values on the sample weight, and may be set according to practical situations, which is not limited in the embodiments of the present disclosure. It should be further understood that the endoscopic image recognition model is equivalent to a classification model, and the output predictive blindness result may be used to classify whether the endoscopic image includes a blindness portion, and specifically may include a predictive probability that the endoscopic image belongs to a certain class (for example, includes the blindness portion as one class and does not include the blindness portion as another class). Therefore, in the embodiment of the disclosure, the difference value obtained by subtracting the predicted blindness result from 1 can be determined as the difficulty in identifying the sample image by the endoscope image identification model.

For example, the sample weight of the sample image may be obtained as follows:

wherein alpha is ₂ Representing a second preset super parameter, N _c Representing the number of images of the same category of the i-th sample image in the set of sample endoscopic images, β representing a first preset power.

For another example, the sample weight of the sample image may be obtained as follows:

wherein γ represents a second preset power.

Wherein the larger the sample proportion, the smaller the corresponding sample weight, and the smaller the sample proportion, the larger the corresponding sample weight, so that the samples of the head class (i.e., the more samples) can be prevented from being dominant in training. Prediction result

The higher the value of (C) is, the better the model can identify the sample, and the characteristics of the sample do not need to be learned excessively, so that the reduction can be realizedThe corresponding sample weight is small. The sample influence value is inversely related to the sample weight, so that the sample with large influence on the endoscope image recognition model has smaller sample weight, and the sample with small influence on the endoscope image recognition model has larger sample weight.

Therefore, in a training scene of blind back recognition, the sample weight of the ith sample is determined by combining the recognition difficulty of the ith sample, the sample proportion of blind back and non-blind back categories and the influence of the ith sample on the blind back recognition model, so that the method can better adapt to category unbalanced training, reduce the problem that a large number of samples are dominant in training, and improve the generalization performance and the robustness of the endoscope image recognition model.

After the sample weight of each sample image is obtained, the objective loss function value can be determined according to the sample blind back result, the prediction blind back result and the sample weight corresponding to each sample image.

In some embodiments, the loss function value may be determined according to a sample blind back result and a prediction blind back result corresponding to each sample image, and the loss function value of each sample image is multiplied by a corresponding sample weight and summed to obtain the target loss function value. For example, the objective loss function value L may be obtained as follows:

of course, when the calculation formulas of the sample influence values are different, the calculation formulas of the objective loss function values L may be correspondingly different, and in practical application, the sample influence values may be determined by any of the above manners, so as to obtain the corresponding objective loss function values, which is not limited in the embodiments of the present disclosure.

It should be appreciated that in the rebalancing problem of long-tail datasets, balancing of categories may compromise the model's learning of a generic feature representation, where a better feature representation can be learned using the original distribution of picture features. Therefore, the embodiment of the disclosure also provides a decoupled two-stage training mode, and the feature representation learning and the blind recognition learning can be divided into two training periods.

In some embodiments, an unlabeled pre-trained endoscopic image set may be acquired, then an unsupervised training is performed on the feature extraction network through the pre-trained endoscopic image set, and parameters of a backbone network of the trained feature extraction network are migrated to a backbone network of the endoscopic image recognition model to obtain a pre-trained endoscopic image recognition model. Accordingly, adjusting parameters of the endoscope image recognition model according to the objective loss function value may be: and adjusting parameters of the pre-training endoscope image recognition model according to the objective loss function value.

In practical applications, the number of blind images in the blind recognition dataset is rare, and if the model is trained using the classification loss in the first training period, the model is difficult to learn the distinguishing characteristics. In other words, the model easily predicts all samples as non-blind-back classes, resulting in a pattern Collapse (Collapse) of the training, so that meaningful characterization cannot be learned. Therefore, unlike the supervised pretraining method in the prior art, the embodiment of the disclosure introduces an unsupervised contrast pretraining scheme, and uses example-level contrast learning through the unlabeled pretrained endoscopic image set, so that the universal distinguishing features are effectively learned. For example, a Siamese self-supervised representation learning scheme may be employed, requiring no negative samples and additional memory space (memory bank) to train.

Thereafter, referring to fig. 2, parameters of the backbone network of the trained feature extraction network may be migrated to the backbone network of the endoscopic image recognition model to obtain a pre-trained endoscopic image recognition model. That is, the backbone network parameters obtained from the first stage training are used to initialize the second stage training. In the second stage, classification can be performed by a linear classifier and parameters of the model are adjusted by objective loss function values. The determination of the objective loss function value may be referred to above, and will not be described herein.

Thus, the endoscopic image representation learning and blind recognition learning processes can be decoupled. In the first stage, the model is trained by a self-supervision representation learning method to obtain a feature extraction network with high discrimination performance. In the second stage, the model is finely adjusted under the blind identification task by combining the sample influence-based re-weighting method, so that the endoscope image characteristic representation learning under the severe unbalanced scene of the sample category is realized, the accuracy of blind identification learning is improved, and the mode collapse problem in the category loss training during category unbalance is reduced.

In some embodiments, the endoscopic image recognition model is used to identify the ileocecum through the classifier, and since the backbone network of the endoscopic image recognition model is already pre-trained in the first stage, the learning rate of the backbone network in the endoscopic image recognition model may be set to be less than the learning rate of the classifier in the second stage. For example, the learning rate of the backbone network in the endoscopic image recognition model can be set to be one tenth of that of the classifier. Thus, the training efficiency of the second stage can be improved.

By means of any training mode, a robust recognition model can be learned from a blind recognition data set with severely unbalanced category distribution. The loss function can be modulated by combining three factors of sample influence, class balance and difficult sample mining, so that class imbalance training is better adapted, and samples of the head class are prevented from occupying the dominant part in the training. In addition, the self-supervision representation pre-training method can be introduced into a sample class imbalance blind recognition training task, and the mode collapse problem in class imbalance class loss training is reduced through a decoupling two-stage blind recognition training method.

Based on the same conception, the present disclosure also provides an endoscopic image recognition method. Referring to fig. 3, the endoscopic image recognition method includes:

step 301, obtaining an endoscope image to be identified;

step 302, inputting the endoscope image into an endoscope image recognition model to obtain a blind back recognition result corresponding to the endoscope image. Wherein the endoscope image recognition model is obtained by the training method of any one of the endoscope image recognition models.

For example, the acquisition of the endoscopic image may be acquisition from an endoscopic apparatus. In particular embodiments, the method for recognizing an endoscopic image provided by the present disclosure may be applied to a control unit of an endoscopic apparatus, where the control unit may perform the method for recognizing an endoscopic image provided by the present disclosure after obtaining the endoscopic image collected by an image collection unit of the endoscopic apparatus, so as to determine a blind back recognition result corresponding to the endoscopic image through a trained endoscopic image recognition model. Alternatively, the method for recognizing an endoscopic image provided by the present disclosure may be applied to a medical system including an endoscopic apparatus, and a control device in the medical system may communicate with the endoscopic apparatus in a wired or wireless manner, so that an endoscopic image may be acquired from the endoscopic apparatus, and the method for recognizing an endoscopic image provided by the present disclosure may be performed, so that a blindness recognition result corresponding to the endoscopic image is determined through a trained endoscopic image recognition model.

Therefore, the problem that samples with a large number of reduced sample influence values occupy the main part in the training process of the endoscope image recognition model is solved, and the generalization performance and the robustness of the endoscope image recognition model are improved, so that in the model application stage, the input endoscope image is subjected to blind-back recognition through the endoscope image recognition model, and a more accurate blind-back recognition result can be obtained.

Based on the same conception, the embodiment of the disclosure also provides a training device of an endoscope image recognition model, wherein the endoscope image recognition model is used for recognizing the ileocecum part, and the device can be part or all of the electronic equipment through software, hardware or a combination of the two modes. Referring to fig. 4, an endoscope image recognition model training apparatus 400 includes:

a first acquisition module 401, configured to acquire a sample endoscopic image set, where the sample endoscopic image set includes a blindness-back type sample image with a blindness-back portion and a non-blindness-back type sample image without a blindness-back portion, and each sample image in the sample endoscopic image set is labeled with a sample blindness-back result for characterizing whether the blindness-back portion exists;

a first training module 402, configured to input, for each of the sample images, the sample image into the endoscope image recognition model to obtain a corresponding predictive blindness-back result, determine a sample influence value of the sample image on the endoscope image recognition model according to the predictive blindness-back result and the sample blindness-back result corresponding to the sample image, and determine a sample weight of the sample image according to the sample influence value;

A second training module 403, configured to determine an objective loss function value according to the sample blind-back result, the predicted blind-back result, and the sample weight corresponding to each sample image;

a third training module 404, configured to adjust parameters of the endoscopic image recognition model according to the objective loss function value.

Optionally, the endoscope image recognition model is used to recognize the blindness by a classifier, and the first training module 402 is configured to:

determining a loss function value according to the prediction blind back result and the sample blind back result corresponding to the sample image, and determining a gradient tensor of the loss function value on the classifier parameter;

determining a norm of the gradient tensor as a sample influence value of the sample image on the endoscope image recognition model.

Optionally, the first training module 402 is configured to:

and determining the reciprocal of the sample influence value, and multiplying the reciprocal by a first preset super-parameter to obtain the sample weight of the sample image.

Optionally, the apparatus 400 further includes:

a fourth training module, configured to divide the number of images in the sample endoscopic image set by the number of images in the same category of the sample images in the sample endoscopic image set to obtain a sample proportion of the sample images, and determine a difference value obtained by subtracting the predictive blindness result from 1 as a difficulty in recognition of the sample images by the endoscopic image recognition model;

The first training module 402 is configured to:

multiplying the reciprocal of the sample influence value by a second preset super parameter and then multiplying the reciprocal of the sample influence value by a first preset power of the sample proportion to obtain a sample weight of the sample image; or alternatively

Multiplying the reciprocal of the sample influence value by the second preset super parameter and then multiplying the reciprocal of the sample influence value by the second preset power of the recognition difficulty to obtain the sample weight of the sample image; or alternatively

Multiplying the reciprocal of the sample influence value by the second preset hyper-parameter and then multiplying the reciprocal of the sample influence value by the first preset power of the sample proportion and the second preset power of the recognition difficulty to obtain the sample weight of the sample image.

Optionally, the second training module 403 is configured to:

and determining a loss function value according to the sample blind back result and the prediction blind back result corresponding to each sample image, multiplying the loss function value of each sample image by the corresponding sample weight, and then summing to obtain a target loss function value.

Optionally, the apparatus 400 further comprises a pre-training module for:

acquiring an unlabeled pre-trained endoscope image set;

performing unsupervised training on a feature extraction network through the pre-training endoscope image set, and migrating parameters of a main network of the feature extraction network after training to the main network of the endoscope image recognition model to obtain a pre-training endoscope image recognition model;

The third training module 404 is configured to:

and adjusting parameters of the pre-trained endoscope image recognition model according to the objective loss function value.

Optionally, the endoscope image recognition model is used for recognizing the blindness part through a classifier, and the learning rate of the backbone network in the endoscope image recognition model is smaller than that of the classifier.

Based on the same conception, the present disclosure also provides an endoscopic image recognition apparatus that may be part or all of an endoscopic device or other electronic device by means of software, hardware, or a combination of both. Referring to fig. 5, an endoscopic image recognition apparatus 500 includes:

a second acquisition module 501 for acquiring an endoscope image to be identified;

the recognition module 502 is configured to input the endoscopic image into an endoscopic image recognition model, and obtain a blind recognition result corresponding to the endoscopic image, where the endoscopic image recognition model is obtained by using any one of the above-mentioned training methods of the endoscopic image recognition model.

Based on the same conception, the present disclosure also provides a non-transitory computer readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the training method or the endoscope image recognition method of any one of the above-described endoscope image recognition models.

Based on the same concept, the present disclosure also provides an electronic device, comprising:

a storage device having a computer program stored thereon;

and the processing device is used for executing the computer program in the storage device to realize the training method of any endoscope image recognition model or the steps of the endoscope image recognition method.

Referring now to fig. 6, a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 6 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.

As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 601.

It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some implementations, communications may be made using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol ), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the internet (e.g., the internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtaining a sample endoscope image set, wherein the sample endoscope image set comprises a back-blind type sample image with a back-blind part and a non-back-blind type sample image without a back-blind part, and each sample image in the sample endoscope image set is marked with a sample back-blind result for representing whether the back-blind part exists or not; inputting the sample image into the endoscope image recognition model for each sample image to obtain a corresponding prediction blind back result, determining a sample influence value of the sample image on the endoscope image recognition model according to the prediction blind back result and the sample blind back result corresponding to the sample image, and determining a sample weight of the sample image according to the sample influence value, wherein the sample influence value is inversely related to the sample weight; determining a target loss function value according to the sample blind back result, the prediction blind back result and the sample weight corresponding to each sample image; and adjusting parameters of the endoscope image recognition model according to the objective loss function value.

Alternatively, the computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquiring an endoscope image to be identified; and inputting the endoscope image into an endoscope image recognition model to obtain a blind recognition result corresponding to the endoscope image, wherein the endoscope image recognition model is obtained by the training method of any one of the endoscope image recognition models.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented in software or hardware. The name of a module does not in some cases define the module itself.

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing description is only of the preferred embodiments of the present disclosure and description of the principles of the technology being employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in this disclosure is not limited to the specific combinations of features described above, but also covers other embodiments which may be formed by any combination of features described above or equivalents thereof without departing from the spirit of the disclosure. Such as those described above, are mutually substituted with the technical features having similar functions disclosed in the present disclosure (but not limited thereto).

Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims. The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

Claims

1. A training method of an endoscopic image recognition model for recognizing a ileocecal section, the method comprising:

2. The method according to claim 1, wherein the endoscope image recognition model is used for recognizing a blindness back portion by a classifier, and the determining a sample influence value of the sample image on the endoscope image recognition model according to the predicted blindness back result and the sample blindness back result corresponding to the sample image includes:

3. The method according to claim 1 or 2, wherein said determining a sample weight of the sample image from the sample influence value comprises:

4. The method according to claim 1 or 2, characterized in that the method further comprises:

dividing the number of images of the sample endoscope image set by the number of images of the same category of the sample images in the sample endoscope image set to obtain a sample proportion of the sample images, and determining a difference value obtained by subtracting the predicted blindness result from 1 as the recognition difficulty of the endoscope image recognition model on the sample images;

the determining the sample weight of the sample image according to the sample influence value comprises the following steps:

5. The method according to claim 1 or 2, wherein said determining an objective loss function value from the sample blindness result, the prediction blindness result and the sample weight for each of the sample images comprises:

6. The method according to claim 1 or 2, characterized in that the method further comprises:

acquiring an unlabeled pre-trained endoscope image set;

adjusting parameters of the endoscope image recognition model according to the objective loss function value, including:

7. The method of claim 6, wherein the endoscopic image recognition model is used to identify a blindness via a classifier, wherein a learning rate of the backbone network in the endoscopic image recognition model is less than a learning rate of the classifier.

8. An endoscopic image recognition method, the method comprising:

acquiring an endoscope image to be identified;

inputting the endoscope image into an endoscope image recognition model to obtain a blind recognition result corresponding to the endoscope image, wherein the endoscope image recognition model is obtained through the training method of the endoscope image recognition model according to any one of claims 1-7.

9. A training device for an endoscopic image recognition model for recognizing a ileocecal section, the device comprising:

10. An endoscopic image recognition device, the device comprising:

the recognition module is used for inputting the endoscope image into an endoscope image recognition model to obtain a blind recognition result corresponding to the endoscope image, wherein the endoscope image recognition model is obtained through the training method of the endoscope image recognition model according to any one of claims 1-7.

11. A non-transitory computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processing device, implements the steps of the method according to any one of claims 1-8.

12. An electronic device, comprising:

a storage device having a computer program stored thereon;

processing means for executing said computer program in said storage means to carry out the steps of the method according to any one of claims 1-8.