CN114445670A - Training method, apparatus, and device for an image processing model, and storage medium - Google Patents


Info

Publication number
CN114445670A
Authority
CN
China
Prior art keywords
image
domain
student
target
teacher
Prior art date
Legal status
Granted
Application number
CN202210373873.1A
Other languages
Chinese (zh)
Other versions
CN114445670B (en)
Inventor
周彦宁
肖凯文
叶虎
蔡德
马兆轩
韩骁
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210373873.1A
Publication of CN114445670A
Application granted
Publication of CN114445670B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method, apparatus, device, and storage medium for an image processing model, relating to the technical field of artificial intelligence. The method comprises the following steps: obtaining a student prediction result for a labeled source domain image through a student detection network; obtaining a student prediction result and a teacher prediction result for an unlabeled target domain image through the student detection network and a teacher detection network, respectively; obtaining a domain classification result for the source domain image and a domain classification result for the target domain image through a domain classification network; and training the image processing model according to the student prediction result of the source domain image, the student prediction result and teacher prediction result of the target domain image, and the domain classification results. Because the image processing model is trained on both labeled source domain images and unlabeled target domain images, the recognition accuracy of the image processing model can be ensured through a large number of unlabeled target domain images even when the number of labeled source domain images is limited.

Description

Training method, apparatus, and device for an image processing model, and storage medium
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a training method, a training device, training equipment and a storage medium for an image processing model.
Background
Currently, artificial intelligence techniques have begun to be applied to cytological diagnosis to assist pathologists in diagnosis. Among them, the identification of diseased cells has an important influence on cytological diagnosis.
In the related art, a model capable of identifying diseased cells is obtained by training the model on labeled images (such as images with diseased-cell labels). For example, the model is trained based on labeled images corresponding to WSIs (Whole Slide Images). However, due to the complexity of cell morphology and the specificity of the images, the labels must be repeatedly reviewed and confirmed by professional, experienced pathologists, so labeled images are very limited. With limited labeled images, the accuracy of the model is also limited, and its recognition accuracy is not high.
Disclosure of Invention
The embodiment of the application provides a training method, a training device, equipment and a storage medium for an image processing model, which can improve the robustness and the generalization of the image processing model and improve the recognition accuracy of the image processing model.
According to an aspect of an embodiment of the present application, there is provided a training method of an image processing model, the image processing model including a student detection network, a teacher detection network, and a domain classification network; the method comprises the following steps:
processing the labeled source domain image through the student detection network to obtain a student prediction result of the source domain image, wherein the student prediction result of the source domain image is used for indicating the student prediction position and the student prediction category of a target object in the source domain image;
respectively processing the target domain image without the label through the student detection network and the teacher detection network to obtain a student prediction result and a teacher prediction result of the target domain image; the student prediction result of the target domain image is used for indicating the student prediction position and the student prediction category of the target object in the target domain image, and the teacher prediction result of the target domain image is used for indicating the teacher prediction position and the teacher prediction category of the target object in the target domain image;
respectively judging the feature information of the source domain image and the feature information of the target domain image obtained by the student detection network through the domain classification network to obtain a domain classification result of the source domain image and a domain classification result of the target domain image;
and training the image processing model according to the student prediction result of the source domain image, the student prediction result and teacher prediction result of the target domain image and the domain classification result.
According to an aspect of an embodiment of the present application, there is provided a training apparatus for an image processing model, the image processing model including a student detection network, a teacher detection network, and a domain classification network; the device comprises:
the source domain image processing module is used for processing the source domain image with the label through the student detection network to obtain a student prediction result of the source domain image, and the student prediction result of the source domain image is used for indicating the student prediction position and the student prediction category of the target object in the source domain image;
the target domain image processing module is used for respectively processing the target domain images without labels through the student detection network and the teacher detection network to obtain student prediction results and teacher prediction results of the target domain images; the student prediction result of the target domain image is used for indicating the student prediction position and the student prediction category of the target object in the target domain image, and the teacher prediction result of the target domain image is used for indicating the teacher prediction position and the teacher prediction category of the target object in the target domain image;
a classification result obtaining module, configured to respectively judge, through the domain classification network, the feature information of the source domain image and the feature information of the target domain image obtained by the student detection network, so as to obtain a domain classification result of the source domain image and a domain classification result of the target domain image;
and the processing model training module is used for training the image processing model according to the student prediction result of the source domain image, the student prediction result and teacher prediction result of the target domain image and the domain classification result.
According to an aspect of embodiments of the present application, there is provided a computer device comprising a processor and a memory, the memory having stored therein a computer program, the computer program being loaded and executed by the processor to implement the above-mentioned training method of an image processing model.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium, in which a computer program is stored, the computer program being loaded and executed by a processor to implement the above-mentioned training method for an image processing model.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the training method of the image processing model.
The technical scheme provided by the embodiment of the application at least comprises the following beneficial effects.
Because the image processing model can be trained on labeled source domain images together with unlabeled target domain images, an image processing model with high recognition accuracy can be obtained from a large number of unlabeled target domain images even when the number of labeled source domain images is limited. Compared with the related art, which relies only on a small number of labeled source domain images, the technical scheme provided by the embodiment of the application can effectively improve the recognition accuracy of the image processing model. Meanwhile, since only a small number of labeled source domain images are needed, the training cost of the image processing model can be effectively reduced, and the labeled source domain images can be used more efficiently.
In addition, the image processing model is trained based on the labeled source domain image and the unlabeled target domain image, so that the trained image processing model can process both the source domain image and the target domain image, and the generalization of the image processing model is further improved. Meanwhile, images of a plurality of domains are used for training the image processing model, so that the network characteristics corresponding to the image processing model have domain invariance, and the robustness of the image processing model in a target domain is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
FIG. 1 is a schematic illustration of an environment for implementing an embodiment provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a domain classifier provided by one embodiment of the present application;
FIG. 3 is a flow chart of a method for training an image processing model provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of an image processing model provided by one embodiment of the present application;
FIG. 5 is a schematic diagram of a method for training an image processing model according to another embodiment of the present application;
FIG. 6 is a block diagram of an apparatus for training an image processing model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) technology is a science that studies how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to identify and measure targets and to perform further image processing, so that the processed image is more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D (three-dimensional) technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and demonstration learning.
The technical scheme provided by the embodiment of the application relates to technologies such as artificial intelligence machine learning and computer vision, the computer vision technology is used for extracting the features of the image, the machine learning technology is used for training an image processing model (such as a student detection network, a teacher detection network and a domain classification network in the image processing model), and then the trained teacher detection network is used for realizing the task of identifying the image.
According to the method provided by the embodiment of the application, each step may be executed by a computer device, which refers to an electronic device with data computation, processing, and storage capabilities. The computer device may be a terminal such as a PC (Personal Computer), a tablet computer, a smartphone, a wearable device, or a smart robot; or it may be a server. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing cloud computing services.
The technical scheme provided by the embodiment of the application is suitable for any task in which a large number of labeled images is difficult to acquire, such as cytological diagnosis tasks, lesion cell identification tasks, medical image segmentation tasks, cell segmentation tasks, and the like. The following describes the training method of an image processing model provided by the embodiment of the present application in detail.
Refer to fig. 1, which illustrates a schematic diagram of an environment for implementing an embodiment of the present application. The implementation environment may include a model training device 10 and a model using device 20.
The model training device 10 may be an electronic device such as a PC, a tablet computer, a server, a smart robot, or some other electronic device with relatively high computing power. The model training device 10 is used to train the image processing model 30.
In the embodiment of the present application, the image processing model 30 is a neural network model for image processing. Illustratively, the image processing model 30 is a neural network model for identifying diseased cells. For example, the image processing model 30 may detect the location and class of diseased cells from the cell slice images to be identified. Alternatively, the model training apparatus 10 may train the image processing model 30 in a machine learning manner so that it has better performance.
The trained image processing model 30 can be deployed in the model using apparatus 20 for use to provide the recognition result of the image. The model using device 20 may be a terminal device such as a mobile phone, a computer, a smart television, a multimedia playing device, a wearable device, a medical device, or a server, which is not limited in this application.
In some embodiments, as shown in FIG. 1, the image processing model 30 may include a student detection network 31, a teacher detection network 32, and a domain classification network 33.
The student detection network 31 may be a neural network that may be used to process the image to obtain a predicted location and a predicted category of the target object in the image. Illustratively, the student detection network 31 may be a single stage object detection network, such as RetinaNet or the like. The student detection network 31 may include a backbone network for feature extraction, a first sub-network for category classification, and a second sub-network for target location regression.
The backbone network may be constructed by combining ResNet-50 with an FPN (Feature Pyramid Network), so that multi-scale feature fusion can be performed through the FPN along a top-down path with lateral connections, enhancing the feature extraction capability of the convolutional network. The first sub-network identifies the category of the target object based on the feature maps extracted by the backbone network, yielding the predicted category of the target object. The second sub-network identifies the position of the target object in the image based on the feature maps extracted by the backbone network, yielding the predicted position of the target object. Alternatively, the backbone network may be replaced by other network structures, such as ResNet-101, EfficientNet, or Swin Transformer, which is not limited in the embodiments of the present application.
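For orientation, the sketch below builds a comparable detector with torchvision's off-the-shelf RetinaNet (ResNet-50 + FPN backbone); the category count and the torchvision (>= 0.13) API are illustrative assumptions, not details taken from this description.

```python
import torch
import torchvision

# Hypothetical category count; the text does not fix this number.
student = torchvision.models.detection.retinanet_resnet50_fpn(
    weights=None, weights_backbone=None, num_classes=4)

# The ResNet-50 + FPN backbone yields the multi-scale feature maps
# (a P3-P7 style pyramid) that the domain classification network consumes.
x = torch.randn(1, 3, 720, 1280)    # one 1280 x 720 image, as sized below
feature_maps = student.backbone(x)  # OrderedDict of pyramid levels
```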
In some possible embodiments, the student detection network 31 may also be replaced by other forms of detection models, such as Faster R-CNN, and the like, which is not limited in the embodiments of the present application.
The teacher detection network 32 may be a detection network having the same network structure as the student detection network 31 but different network parameters, and may also be used to process the image to obtain the predicted position and the predicted category of the target object in the image. Therefore, consistency constraint can be formed between the student detection network 31 and the teacher detection network 32, and optimization of the student detection network 31 and the teacher detection network 32 under the consistency constraint is further achieved based on images.
Alternatively, the prediction result of the teacher detection network 32 may be used as the pseudo label information of the student detection network 31 to supervise the training of the student detection network 31. Meanwhile, the teacher detection network 32 may be optimized based on the network parameters of the student detection network 31, so as to obtain the teacher detection network 32 with higher identification accuracy. For example, an exponentially weighted average algorithm may be employed to optimize the network parameters of the teacher detection network 32 based on the network parameters of the student detection network 31.
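A minimal sketch of such an exponentially weighted average update is shown below; the decay value 0.999 is a common choice and an assumption here.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    # teacher_param <- decay * teacher_param + (1 - decay) * student_param
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)
    # Buffers (e.g., normalization statistics) can simply be copied over.
    for t_b, s_b in zip(teacher.buffers(), student.buffers()):
        t_b.copy_(s_b)
```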
Referring to fig. 2, the domain classification network 33 may include a gradient reversal layer and a domain classifier. The domain classifier may consist of four blocks, each composed of a convolution (conv), Group Normalization (GN), and a ReLU activation function, followed by one 1 × 1 convolution. The output of the domain classifier is normalized into a class prediction probability distribution by a Softmax function. Alternatively, the domain classification network 33 may be a multi-scale domain classifier connected to the FPN in the student detection network 31; the domain classification network 33 takes the output of the FPN (e.g., the feature maps corresponding to the P3-P7 layers in fig. 2, i.e., the multi-scale feature maps) as input to perform domain judgment on the feature maps of the images at a plurality of different scales. In the embodiment of the present application, the domain classification network 33 may judge images of one or more domains. Taking cytological diagnosis as an example, each domain may correspond to a different imaging device, a different hospital, or a different cell slice production process, which is not limited in the embodiments of the present application.
Alternatively, the domain classification network 33 may be replaced with a single-layer-feature domain classification network. The number of blocks in the domain classification network can be adjusted adaptively, and the domain classification network may correspond to multiple domain classes, which is not limited in the embodiment of the present application.
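A minimal sketch of the domain classifier just described (four conv-GN-ReLU blocks followed by one 1 × 1 convolution) is given below; the 3 × 3 kernel size, the 256-channel width, and the 32 GN groups are assumptions.

```python
import torch.nn as nn

def make_domain_classifier(in_channels=256, num_domains=2, width=256):
    layers = []
    c = in_channels
    for _ in range(4):  # four conv-GN-ReLU blocks
        layers += [nn.Conv2d(c, width, kernel_size=3, padding=1),
                   nn.GroupNorm(32, width),
                   nn.ReLU(inplace=True)]
        c = width
    # Final 1x1 convolution producing per-location domain logits;
    # Softmax turns these into the class prediction probability distribution.
    layers.append(nn.Conv2d(c, num_domains, kernel_size=1))
    return nn.Sequential(*layers)
```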
In the present embodiment, the inputs to the image processing model 30 include a labeled source domain image and an unlabeled target domain image. The model training device 10 trains the image processing model 30 based on the output results of the student detection network 31 for the source domain images and the target domain images, the output results of the teacher detection network 32 for the target domain images, and the output results of the domain classification network 33 for the source domain images and the target domain images, and obtains the trained image processing model 30. Alternatively, after the image processing model 30 is trained, a target detection model for detecting the position and the category of the target object from the image to be recognized may be constructed based on the structure and the parameters of the teacher detection network 32. The object detection model is deployed in the model using apparatus 20 for use, i.e., the image recognition task can be performed by the model using apparatus 20.
The source domain image is an image corresponding to a source domain, and the target domain image is an image corresponding to a target domain; there are often large differences between images of different domains. For example, in a diseased-cell identification scenario where domains are divided by hospital, the source domain images may be labeled cell slice images from one hospital, and the target domain images may be cell slice images from any other hospital. Alternatively, the input to the image processing model 30 may include images corresponding to one or more target domains. The number of domain classes may equal 1 (corresponding to the source domain) plus the number of target domains.
In the following, the technical solution of the present application will be described by way of method examples.
Referring to fig. 3, a flowchart of a training method of an image processing model according to an embodiment of the present application is shown. The execution subject of each step of the method can be the model training device introduced above. The method can include the following steps (301-304).
Step 301, processing the labeled source domain image through a student detection network to obtain a student prediction result of the source domain image, where the student prediction result of the source domain image is used to indicate a student prediction position and a student prediction category of a target object in the source domain image.
The student prediction position refers to the position of a target object predicted by the student detection network in an image, and the student prediction type refers to the type of the target object predicted by the student detection network. The target object in the source domain image may refer to any object in the source domain image. In the embodiment of the present application, the target object may refer to an object of a category specified by a designer, such as a diseased cell in a cell slice image. The above-mentioned tag can be used to indicate the real position of the target object in the source domain image, the real category of the target object, and the domain to which the source domain image belongs.
Illustratively, the labeled data domain is the source domain D_S = {X_S, Y_S}, where X_S denotes the labeled image set from the source domain (e.g., a cell slice image set); the size of the source domain images may be 1280 × 720. Y_S denotes the label set corresponding to the images; it includes the real position (x1, x2, y1, y2) and the real category of the target object in each source domain image, where (x1, y1) and (x1, y2) are the top-left and bottom-left vertex coordinates, respectively, of the circumscribed rectangle of the target object.
The student detection network and the source domain image are the same as those described in the above embodiments, and the content not described in the embodiment of the present application may refer to the above embodiments, which are not described herein again.
Illustratively, referring to fig. 4, the image processing model 400 includes a student detection network 401, a teacher detection network 402, and a domain classification network 403. Feature extraction is performed on the source domain image through a student detection network 401 to obtain a multi-scale feature map set corresponding to the source domain image, the multi-scale feature map set comprises feature maps of different scales, and the student detection network 401 predicts the position and the type of a target object in the source domain image based on the feature maps of different scales to obtain a student predicted position 1 and a student predicted type 1 of the target object in the source domain image.
Step 302, respectively processing the target domain image without the label through a student detection network and a teacher detection network to obtain a student prediction result and a teacher prediction result of the target domain image; the student prediction result of the target domain image is used for indicating the student prediction position and the student prediction category of the target object in the target domain image, and the teacher prediction result of the target domain image is used for indicating the teacher prediction position and the teacher prediction category of the target object in the target domain image.
The target object in the target domain image and the target object in the source domain image may refer to the same class of objects. And the target object corresponding to the student prediction result of the target domain image is the same as the target object corresponding to the teacher prediction result of the target domain image. The teacher predicted location may refer to a location of a target object in an image that the teacher detected the network predicted, and the teacher predicted category may refer to a category of the target object that the teacher detected the network predicted.
Alternatively, in the case where there are a plurality of target domains, they may be written as n target domains D_T1, D_T2, …, D_Tn, where D_Tn denotes the image set corresponding to the n-th target domain; the size of the target domain images may also be 1280 × 720.
The teacher detection network and the target domain image are the same as those described in the above embodiments, and the content that is not described in the embodiments of the present application may refer to the above embodiments, which are not described herein again.
Exemplarily, referring to fig. 4, feature extraction is performed on the target domain image through the student detection network 401 to obtain a multi-scale feature map set corresponding to the target domain image; the student detection network 401 predicts the position and category of the target object in the target domain image based on this feature map set, and the student predicted position 2 and student predicted category 2 of the target object in the target domain image are obtained after non-maximum suppression. Likewise, feature extraction is performed on the target domain image through the teacher detection network 402 to obtain a multi-scale feature map set corresponding to the target domain image; the teacher detection network 402 predicts the position and category of the target object in the target domain image based on this feature map set, and the teacher predicted position and teacher predicted category of the target object in the target domain image are obtained after non-maximum suppression.
Optionally, before processing the target domain image, data enhancement may be performed on the target domain image, and specific contents thereof may be as follows: respectively processing the target domain image by adopting a first data enhancement mode and a second data enhancement mode to obtain a first processed target domain image and a second processed target domain image; the enhancement intensity of the first data enhancement mode is greater than that of the second data enhancement mode; processing the first processed target domain image through a student detection network to obtain a student prediction result of the target domain image; and processing the second processed target domain image through a teacher detection network to obtain a teacher prediction result of the target domain image.
Among them, data enhancement is a method of modifying an image to a certain extent to obtain a new image, such as cropping, flipping, color changes, scaling, shifting, or adding Gaussian noise. Depending on the magnitude of the modification, data enhancement can be divided into weak data enhancement and strong data enhancement.
In the embodiment of the present application, the first data enhancement mode is strong data enhancement, and the second data enhancement mode is weak data enhancement. Processing the weakly enhanced target domain image through the teacher detection network can effectively improve the recognition accuracy of the teacher detection network, while processing the strongly enhanced target domain image through the student detection network can effectively improve the robustness of the student detection network, and in turn the robustness of the teacher detection network.
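The concrete transform lists below are a hedged sketch: the text only fixes that the teacher branch sees weakly enhanced images and the student branch strongly enhanced ones, so the specific operations (photometric only, to keep the teacher's pseudo boxes valid for the student without coordinate remapping) are assumptions.

```python
import torchvision.transforms as T

weak_aug = T.Compose([
    T.ColorJitter(brightness=0.1, contrast=0.1),          # mild photometric change
])
strong_aug = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=5),                        # adds blur noise
])
# Teacher input: weak_aug(image); student input: strong_aug(image).
# Geometric operations (crop, flip, shift) would additionally require
# transforming the pseudo-label boxes consistently.
```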
And step 303, respectively judging the feature information of the source domain image and the feature information of the target domain image obtained by the student detection network through the domain classification network to obtain a domain classification result of the source domain image and a domain classification result of the target domain image.
A domain classification network may be used to determine the domain of an image, for example, whether an image to be identified comes from the source domain or a target domain. Optionally, the domain classification network may include a gradient reversal layer and a domain classifier. The domain classifier is configured to judge the domains of the source domain image and the target domain image based on the feature information of the source domain image and the feature information of the target domain image obtained by the student detection network, so as to obtain the domain classification result of the source domain image and the domain classification result of the target domain image. The gradient reversal layer is used to propagate the reversed gradient optimization direction of the domain classification network back to the detection network (student detection network + teacher detection network) when the domain classification network and the detection network are trained jointly. The domain classification result indicates the domain to which an image belongs.
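A standard implementation of such a gradient reversal layer is sketched below (identity in the forward pass, negated gradient in the backward pass); the scaling factor lambd is a common extension and an assumption here.

```python
import torch

class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)          # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and optionally scale) the gradient flowing back into the FPN.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradientReversal.apply(x, lambd)
```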
The feature information refers to the multi-scale feature set extracted by the backbone network in the student detection network; that is, the feature information includes a plurality of feature maps at different scales. Optionally, the domain classification results may also be obtained as follows: judging, through the domain classification network, a plurality of feature maps at different scales of the source domain image obtained by the student detection network, to obtain a plurality of domain classification results of the source domain image; and judging, through the domain classification network, a plurality of feature maps at different scales of the target domain image obtained by the student detection network, to obtain a plurality of domain classification results of the target domain image.
For example, referring to fig. 4, feature extraction is performed on the source domain image and the target domain image through the student detection network 401, yielding feature maps at 5 different scales for the source domain image and feature maps at 5 different scales for the target domain image. The domain classifier in the domain classification network 403 judges these 10 feature maps respectively, obtaining 10 domain classification results.
And step 304, training the image processing model according to the student prediction result of the source domain image, the student prediction result and teacher prediction result of the target domain image and the domain classification result.
In the embodiment of the application, the image processing model is trained by combining adversarial feature learning and consistency constraints, so that the trained image processing model can accurately identify images of different domains. Illustratively, a supervised learning method, an adversarial feature learning method, and a cross-domain semi-supervised learning method can be adopted to jointly optimize the teacher detection network, the student detection network, and the domain classification network, so that training of the image processing model can be completed based on a small number of labeled source domain images plus target domain images, ensuring the accuracy, generalization, and robustness of the image processing model while reducing its training cost. Optionally, step 304 may include the following sub-steps.
Step 304a, determining a target classification loss and a position regression loss according to the student prediction result of the source domain image and the label information of the source domain image; the target classification loss is used for measuring the class prediction accuracy of the student detection network on the source domain image by taking the class label information of the source domain image as a reference, and the position regression loss is used for measuring the position prediction accuracy of the student detection network on the source domain image by taking the position label information of the source domain image as a reference.
The label information of the source domain image is used to indicate the true position (corresponding to the position label information) and the true category (corresponding to the category label information) of the target object. The position regression loss may be determined based on the difference between the student predicted position and the true position in the source domain image, and the target classification loss may be determined based on the difference between the student predicted category and the true category.
For example, referring to fig. 4, the position regression loss, denoted L_reg, is calculated based on the difference between student predicted position 1 and the real position using a smooth-L1 loss function (the smoothed L1-norm loss, i.e., smoothed minimum absolute deviation). The target classification loss, denoted L_cls, is calculated based on the difference between student prediction category 1 and the real category using a focal loss method. The focal loss method can be used to address the imbalance between positive and negative samples, which helps further improve the recognition accuracy of the image processing model.
Step 304b, determining a comparison loss according to the student prediction result and the teacher prediction result of the target domain image; the comparison loss is used for measuring the class and position prediction accuracy of the student detection network on the target domain image by taking the teacher prediction result of the target domain image as pseudo label information.
The pseudo label information is used instead of real label information as supervision information. Since the quality of the pseudo label information greatly affects the training effect of the model, in the embodiment of the present application, sample screening needs to be performed on the teacher prediction results for the target domain images. For example, target domain images whose teacher prediction results are greater than or equal to a probability threshold may be selected from the plurality of target domain images; the student prediction results and teacher prediction results of the selected target domain images are then used to determine the comparison loss.
For example, the teacher prediction category indicates the category of the target object; when the prediction probability value corresponding to the teacher prediction category is greater than or equal to the probability threshold, it may be determined that the target object very likely belongs to that category. Therefore, target domain images whose teacher-prediction-category probability values are greater than or equal to the probability threshold can be retained, while those below the threshold are removed, and the comparison loss is determined based on the student prediction results and teacher prediction results of the retained target domain images.
Alternatively, the probability threshold may be set and adjusted according to actual use requirements, e.g., 95%, 96%, or 97%. Screening the target domain images by such a high probability threshold on the prediction probability value corresponding to the teacher prediction category filters out less accurate prediction results, thereby improving the recognition accuracy of the model.
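The screening step can be sketched as follows; the tensor layout of the teacher outputs is an assumption, and the 0.95 threshold mirrors the 95% example above.

```python
def filter_pseudo_labels(boxes, scores, labels, threshold=0.95):
    # Keep only teacher detections whose class probability reaches the
    # threshold; these survive as pseudo labels for the student branch.
    keep = scores >= threshold
    return boxes[keep], scores[keep], labels[keep]
```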
In one example, an SFL (Soft Focal Loss) algorithm may be used to calculate the comparison loss based on the student prediction result and the teacher prediction result of the target domain image. Illustratively, the comparison loss may be determined according to the probability gap between the student prediction result and the teacher prediction result of the target domain image, together with a weight parameter that is dynamically adjusted according to the teacher prediction result of the target domain image.
This comparison loss can be expressed in terms of the following quantities: L_con denotes the comparison loss; target indicates whether a target object is present, with target = 1 meaning a target object is present in the target domain image and target = 0 meaning it is not; p_t is the prediction probability value corresponding to the teacher prediction category for the target domain image; p_s is the prediction probability value corresponding to the student prediction category for the target domain image; |p_t - p_s| is the probability gap between the student prediction result and the teacher prediction result of the target domain image; and w is the weight parameter, balanced by the weights α and 1 - α (e.g., α and 1 - α may take the values 0.75 and 0.25, respectively).
For example, referring to fig. 4, the SFL (Soft Focal Loss) algorithm calculates the comparison loss L_con based on the prediction probability value corresponding to the teacher prediction category and the prediction probability value corresponding to student prediction category 2 of the target domain image.
Step 304c, determining domain classification loss according to the domain classification result and the domain classification label; wherein, the domain classification loss is used for measuring the domain classification accuracy of the domain classification network.
The domain classification label is used to indicate the true domain of the image. Optionally, a cross-entropy loss function may be adopted to calculate the domain classification loss, denoted L_dom, based on the domain classification results and the domain classification labels.
For example, referring to fig. 4, in the case where the input of the image processing model includes images of only one target domain, the domain classification network 403 outputs 10 domain classification results, where 5 correspond to the source domain image and the remaining 5 correspond to the target domain image. A cross-entropy loss function is used to calculate 10 sub-domain classification losses based on the 10 domain classification results and their corresponding domain classification labels, and the 10 sub-domain classification losses are summed to obtain the domain classification loss L_dom.
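A sketch of this summed cross-entropy computation is shown below; broadcasting the image-level domain label over all spatial positions is an assumption consistent with the per-location domain logits described earlier.

```python
import torch
import torch.nn.functional as F

def domain_classification_loss(logit_maps, domain_labels):
    # logit_maps: one (N, num_domains, H, W) map per feature scale, e.g. the
    # 10 maps of the example above; domain_labels: the true domain index of
    # the images behind each map.
    total = 0.0
    for logits, label in zip(logit_maps, domain_labels):
        target = torch.full(logits.shape[0:1] + logits.shape[2:], label,
                            dtype=torch.long, device=logits.device)
        total = total + F.cross_entropy(logits, target)  # one sub-domain loss
    return total
```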
And step 304d, training the image processing model according to the target classification loss, the position regression loss, the comparison loss and the domain classification loss.
In the embodiment of the application, the image processing model can be subjected to two-stage iterative training.
The first stage is as follows: and training the student detection network by using the source domain image with the label, and simultaneously performing combined optimization training on the domain classification network and the student detection network by using the source domain image with the label and the target domain image without the label. Illustratively, based on the target classification loss, the positional regression loss, and the domain classification loss, a total loss corresponding to the first stage is determined, which may be represented as follows:
L_stage1 = L_cls + L_reg + L_dom
at this time, supervised learning may be performed based on the labeled source domain image, while domain-adapted learning may be performed based on the source domain image and the target domain image. Optionally, in the training process of the first stage, a parameter adjustment gradient of the domain classification network may be determined according to the total loss corresponding to the first stage, and the parameter of the domain classification network may be adjusted based on the parameter adjustment gradient of the domain classification network. And determining the parameter adjustment Gradient of the student detection network by adopting a GRL (Gradient reverse Layer) according to the parameter adjustment Gradient of the domain classification network, and adjusting the parameter of the student detection network based on the parameter adjustment Gradient of the student detection network. After multiple rounds of iterative training, the first stage training of the image processing model is completed.
And a second stage: training the student detection network with the labeled source domain images, jointly optimizing the domain classification network and the student detection network with the labeled source domain images and the unlabeled target domain images, and training the student detection network and the teacher detection network under a consistency constraint based on the unlabeled target domain images. Illustratively, the total loss of the image processing model (i.e., the total loss of the second stage) may be determined from the target classification loss, the position regression loss, the comparison loss, and the domain classification loss, and may be expressed as follows:

L_stage2 = L_cls + L_reg + L_con + L_dom
at this time, supervised learning may be performed based on the labeled source domain images, semi-supervised learning may be performed based on the labeled source domain images and the unlabeled target domain images, and domain-adapted learning may be performed based on the source domain images and the target domain images. And after multiple rounds of iterative training, finishing the second-stage training of the image processing model to obtain the trained image processing model.
Optionally, in the training process of the second stage of training, a parameter adjustment gradient of the domain classification network may be determined according to the total loss, and the parameter of the domain classification network is adjusted based on the parameter adjustment gradient of the domain classification network.
Determining the parameter adjustment gradient of the student detection network through the gradient reversal layer (GRL) according to the parameter adjustment gradient of the domain classification network, and adjusting the parameters of the student detection network based on this gradient; finally, the parameters of the teacher detection network are determined from the parameters of the student detection network using an exponentially weighted average algorithm.
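Putting the two stages together, the sketch below shows the assumed loss composition and one optimization step; the unweighted sums and the reuse of ema_update and grad_reverse from the sketches above are assumptions.

```python
def total_loss(l_cls, l_reg, l_dom, l_con=None):
    # stage 1: L = L_cls + L_reg + L_dom          (l_con is None)
    # stage 2: L = L_cls + L_reg + L_con + L_dom
    loss = l_cls + l_reg + l_dom
    return loss if l_con is None else loss + l_con

def optimization_step(loss, optimizer, teacher=None, student=None):
    optimizer.zero_grad()
    loss.backward()      # the GRL reverses the domain gradient automatically
    optimizer.step()     # updates student detection network + domain classifier
    if teacher is not None:
        ema_update(teacher, student)   # stage 2: refresh the teacher by EMA
```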
In the embodiment of the present application, the domain classifier is required to distinguish the domain of the feature maps as well as possible, while the FPN is required to make the domain classification network unable to distinguish the domain of the feature maps; that is, the domain invariance of the features is enhanced through adversarial feature learning, thereby improving the robustness of the image processing model. Determining the parameters of the teacher detection network with an exponentially weighted average algorithm can further improve the recognition accuracy of the teacher detection network.
Alternatively, after the image processing model is trained, a target detection model for detecting the position and the category of a target object from an image to be recognized may be constructed based on the structure and the parameters of the teacher detection network.
In one example, the target detection model may be used to detect the position and category of diseased cells from a cell slice image to be identified. Taking the diseased-cell identification scenario as an example: after the WSI corresponding to a cell slice is obtained, a segmentation method (such as the maximum inter-class variance method, i.e., Otsu's method) can be adopted to extract the foreground region of the WSI, yielding the image to be identified. The image to be identified is then divided into a plurality of view-level images based on grid segmentation. The target detection model identifies each view-level image, producing positive prediction probability values for the prediction categories of each view-level image. If a target view-level image contains target cells whose positive prediction probability value is greater than a set threshold (e.g., 0.5), the target cells are determined to be diseased cells and the target view-level image is correspondingly determined to be positive, so the cell slice including the diseased cells is positive. If no cells with a positive prediction probability value greater than the set threshold (e.g., 0.5) exist in any of the view-level images, the cell slice can be judged negative.
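The slide-level decision flow can be sketched as below; the Otsu-based foreground mask, the tile size, and the `detector` callable (returning per-detection positive probabilities for one tile) are assumptions.

```python
import cv2
import numpy as np

def slide_is_positive(wsi_rgb, detector, tile=1280, threshold=0.5):
    gray = cv2.cvtColor(wsi_rgb, cv2.COLOR_RGB2GRAY)
    # maximum inter-class variance (Otsu) foreground extraction
    _, fg = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    h, w = gray.shape
    for y in range(0, h, tile):                 # grid segmentation into
        for x in range(0, w, tile):             # view-level images
            if not fg[y:y + tile, x:x + tile].any():
                continue                        # skip pure-background tiles
            scores = detector(wsi_rgb[y:y + tile, x:x + tile])
            if (np.asarray(scores) > threshold).any():
                return True                     # diseased cells found: positive
    return False                                # no tile exceeds threshold: negative
```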
Optionally, the above diseased cells may be divided into different grades; that is, diseased cells with different lesion degrees can also be identified using the technical scheme provided in the embodiments of the present application. Taking cervical cells as an example, the positive categories may include ASCUS (Atypical Squamous Cells of Undetermined Significance), LSIL (Low-grade Squamous Intraepithelial Lesion), HSIL (High-grade Squamous Intraepithelial Lesion), and the like. The technical scheme provided by the embodiment of the application can be used to identify these positive categories.
To sum up, the technical scheme provided by the embodiment of the application can complete the training of the image processing model with labeled source domain images and unlabeled target domain images, so that an image processing model with high recognition accuracy can be obtained from a large number of unlabeled target domain images even when the number of labeled source domain images is limited. Meanwhile, since only a small number of labeled source domain images are needed, the training cost of the image processing model can be effectively reduced, and the labeled source domain images can be used more efficiently.
In addition, the image processing model is trained based on the labeled source domain image and the unlabeled target domain image, so that the trained image processing model can process both the source domain image and the target domain image, and the generalization of the image processing model is further improved. Meanwhile, images of a plurality of domains are used for training the image processing model, so that the network characteristics corresponding to the image processing model have domain invariance, and the robustness of the image processing model in a target domain is improved.
In addition, in the embodiment of the present application, the domain classifier is required to distinguish the domain of the feature maps as well as possible, while the FPN is required to make the domain classification network unable to distinguish the domain of the feature maps; that is, the domain invariance of the features is enhanced through adversarial feature learning, thereby improving the robustness of the image processing model. Meanwhile, determining the parameters of the teacher detection network with an exponentially weighted average algorithm can further improve the recognition accuracy of the teacher detection network.
In an exemplary embodiment, taking training of an image processing model based on a pathological image as an example, the method for training an image processing model provided in the embodiment of the present application is described, which may include the following.
Acquiring a source domain pathological image and a target domain pathological image. A pathological image refers to an image (e.g., a WSI) generated based on a slice of human tissue (e.g., lesioned tissue or cells), and can reveal the cause and severity of a disease. Illustratively, the pathological image may correspond to cancer, inflammation, tumor, and the like. For example, the pathological image may be an image corresponding to a cell slice, such as a cell slice corresponding to cervical cancer, gastric cancer, or intestinal cancer.
In the embodiment of the present application, the source domain pathological image refers to a labeled pathological image, and the target domain pathological image refers to an unlabeled pathological image. For example, the source domain pathology image may refer to a labeled cervical cancer pathology image, and the target domain pathology image may refer to an unlabeled cervical cancer pathology image. The source domain pathology image and the target domain pathology image may be selected from different hospitals, facilities, production procedures, and the like. Alternatively, the target domain pathology image may include a plurality of sub-target domain images from different target domains (e.g., different hospitals).
Optionally, the image processing model includes a student detection network, a teacher detection network and a domain classification network, which are the same as those described in the above embodiments and are not described herein again. The parameter adjustment process of the image processing model may include the following.
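Before detailing these steps, the relationship among the three networks can be illustrated with a minimal structural sketch in PyTorch. This is not the patent's actual code: the class names, the build_detector factory, and the FPN channel width are assumptions made for illustration.

```python
import copy
import torch.nn as nn

class DomainClassifier(nn.Module):
    """Judges whether a feature map comes from the source or the target domain."""
    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(256, 1),  # one logit for binary domain classification
        )

    def forward(self, feature_map):
        return self.net(feature_map)

class ImageProcessingModel(nn.Module):
    def __init__(self, build_detector, fpn_channels: int = 256):
        super().__init__()
        self.student = build_detector()             # updated by gradients
        self.teacher = copy.deepcopy(self.student)  # updated only by EMA of the student
        for p in self.teacher.parameters():
            p.requires_grad_(False)
        self.domain_classifier = DomainClassifier(fpn_channels)
```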
The source domain pathological image is processed through the student detection network to obtain a student prediction position and a student prediction category indicating the lesion object in the source domain pathological image. A lesion object is an object in a pathological image from which a disease can be revealed, such as a diseased cell or diseased tissue.
The position regression loss of the image processing model is calculated with a smooth-L1 loss function, based on the student prediction position and the true position of the lesion object given in the label information of the source domain pathological image. The target classification loss of the image processing model is calculated with a focal loss, based on the student prediction category and the true category of the lesion object given in the label information of the source domain pathological image.
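A minimal sketch of these two supervised losses, assuming matched prediction/ground-truth box pairs and one-hot category targets; the tensor shapes and the focal-loss hyperparameters (alpha, gamma) are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def position_regression_loss(pred_boxes, gt_boxes):
    # pred_boxes, gt_boxes: (N, 4) matched predicted / ground-truth boxes
    return F.smooth_l1_loss(pred_boxes, gt_boxes)

def target_classification_loss(logits, targets, alpha=0.25, gamma=2.0):
    # logits: (N, C) class logits; targets: (N, C) one-hot labels (float)
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)        # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()  # focal loss: down-weight easy examples
```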
Weak data enhancement is performed on the target domain pathological image to obtain a weakly enhanced target domain pathological image; feature extraction is performed on the weakly enhanced image through the teacher detection network to obtain a corresponding multi-scale feature map, and the lesion object is identified by the teacher detection network based on this multi-scale feature map, so as to obtain a teacher prediction position and a teacher prediction category indicating the lesion object in the weakly enhanced target domain pathological image.

Strong data enhancement is performed on the target domain pathological image to obtain a strongly enhanced target domain pathological image; feature extraction is performed on the strongly enhanced image through the student detection network to obtain a corresponding multi-scale feature map, and the lesion object is identified by the student detection network based on this multi-scale feature map, so as to obtain a student prediction position and a student prediction category indicating the lesion object in the strongly enhanced target domain pathological image.
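A sketch of this asymmetric augmentation step is given below. The patent fixes only the relative enhancement strength, so the specific torchvision transforms chosen here are assumptions, and the detector call signature is simplified for illustration.

```python
import torch
import torchvision.transforms as T

# Weak enhancement for the teacher, strong enhancement for the student.
weak_aug = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
])
strong_aug = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=5),
])

def target_domain_forward(model, target_image):
    with torch.no_grad():  # the teacher only produces pseudo labels, no gradient
        teacher_pred = model.teacher(weak_aug(target_image))
    student_pred = model.student(strong_aug(target_image))
    return student_pred, teacher_pred
```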
The prediction results of the student detection network and the teacher detection network are then screened based on the prediction probability values corresponding to the teacher prediction categories. For example, only the teacher prediction results whose prediction probability value is greater than or equal to the probability threshold corresponding to the teacher prediction category, together with the student prediction results corresponding to those teacher prediction results, are retained.
The comparison loss of the image processing model is calculated with an SFL algorithm, based on the prediction probability values corresponding to the screened student prediction categories and the prediction probability values corresponding to the screened teacher prediction categories.
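The abbreviation "SFL" is not expanded in this application, so the sketch below substitutes a generic confidence-weighted loss in its place: teacher predictions below the probability threshold are discarded together with their student counterparts, and the retained student/teacher probability differences are weighted by a factor derived from the teacher's confidence, consistent with the dynamically adjusted weight parameters described later. The threshold and exponent values are illustrative assumptions.

```python
import torch

def filter_by_teacher_confidence(student_probs, teacher_probs, threshold=0.7):
    # Keep only predictions whose highest teacher class probability clears the threshold.
    keep = teacher_probs.max(dim=-1).values >= threshold
    return student_probs[keep], teacher_probs[keep]

def comparison_loss(student_probs, teacher_probs, gamma=2.0):
    # Squared probability difference between student and teacher, weighted per
    # prediction by the teacher's confidence, so the weight adjusts dynamically
    # with the teacher prediction result.
    diff = (student_probs - teacher_probs).pow(2).sum(dim=-1)
    weight = teacher_probs.max(dim=-1).values.pow(gamma).detach()
    return (weight * diff).mean()
```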
The multi-scale feature maps of the source domain pathological image and of the target domain pathological image obtained by the student detection network are then separately judged by the domain classification network, so as to obtain the domain classification result of the source domain pathological image and the domain classification result of the target domain pathological image.
The domain classification loss of the image processing model is calculated with a cross-entropy loss function, based on the difference between the domain classification result of the source domain pathological image and the true domain of the source domain pathological image, and the difference between the domain classification result of the target domain pathological image and the true domain of the target domain pathological image.
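A sketch of this step, reusing the domain classifier from the structural sketch above; the convention source = 0 / target = 1 and the per-level averaging are assumptions:

```python
import torch
import torch.nn.functional as F

def domain_classification_loss(domain_classifier, source_feats, target_feats):
    # source_feats / target_feats: lists of student feature maps, one per FPN level
    loss = 0.0
    for feat_s, feat_t in zip(source_feats, target_feats):
        logit_s = domain_classifier(feat_s)
        logit_t = domain_classifier(feat_t)
        loss += F.binary_cross_entropy_with_logits(logit_s, torch.zeros_like(logit_s))
        loss += F.binary_cross_entropy_with_logits(logit_t, torch.ones_like(logit_t))
    return loss / (2 * len(source_feats))
```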
A first-stage training is performed on the image processing model based on the target classification loss, the position regression loss, and the domain classification loss, so as to obtain an image processing model that has completed the first stage. During the first stage, a parameter adjustment gradient of the domain classification network may be determined according to the total loss corresponding to the first stage (i.e., the sum of the target classification loss, the position regression loss, and the domain classification loss), and the parameters of the domain classification network are adjusted based on this gradient. The parameter adjustment gradient of the student detection network is then determined from the parameter adjustment gradient of the domain classification network through a gradient reversal layer (GRL), and the parameters of the student detection network are adjusted based on that gradient.
A second-stage training is then performed on the image processing model obtained from the first stage, based on the target classification loss, the position regression loss, the comparison loss, and the domain classification loss, so as to obtain the trained image processing model. During the second stage, a parameter adjustment gradient of the domain classification network may be determined according to the total loss (i.e., the sum of the target classification loss, the position regression loss, the comparison loss, and the domain classification loss), and the parameters of the domain classification network are adjusted based on this gradient. The parameter adjustment gradient of the student detection network is determined from the parameter adjustment gradient of the domain classification network through the gradient reversal layer (GRL), and the parameters of the student detection network are adjusted accordingly; finally, the parameters of the teacher detection network are determined from the parameters of the student detection network with the exponential weighted average algorithm.
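The two mechanisms that close this training loop, the gradient reversal layer and the exponential weighted average update of the teacher, can be sketched as follows; the momentum value is an illustrative assumption.

```python
import torch

class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Negated (scaled) gradient in the backward pass: the student learns
        # features that the domain classifier cannot separate.
        return -ctx.lambd * grad_output, None

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # Teacher parameters follow an exponential weighted average of the student's.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
```

Under this sketch, the student's feature maps would pass through GradientReversal.apply before entering the domain classifier, so that minimizing the domain classification loss simultaneously trains the classifier to separate the domains and trains the student's features to resist that separation; ema_update would then be called once per iteration, after the student's optimizer step.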
The trained image processing model can be used to identify lesion objects in pathological images, so as to assist pathologists in pathological diagnosis. Optionally, the lesion objects in a pathological image may be identified by the teacher detection network in the trained image processing model.
In summary, the image processing model obtained by the technical solution provided in the embodiments of the present application can accurately identify lesion objects (e.g., diseased cells) in pathological images from different domains, assisting pathologists in reading pathological images; this reduces the pathologists' reading burden and improves the efficiency of pathological diagnosis. It also avoids misreadings caused by the subjectivity of manual reading, differences in diagnostic experience, emotion, fatigue, and the variation among pathological images from different domains, thereby improving the accuracy of pathological diagnosis.
In addition, compared with manual labeling, which suffers from low efficiency, high cost, and the risk of mislabeling, labeling pathological images with the image processing model obtained by the technical solution provided in the embodiments of the present application can effectively improve the labeling efficiency and accuracy of pathological images while reducing the labeling cost.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 6, a block diagram of an apparatus for training an image processing model according to an embodiment of the present application is shown. The device can be used for realizing the training method of the image processing model. The apparatus 600 may include: a source domain image processing module 601, a target domain image processing module 602, a classification result obtaining module 603 and a processing model training module 604.
The source domain image processing module 601 is configured to process a source domain image with a tag through the student detection network to obtain a student prediction result of the source domain image, where the student prediction result of the source domain image is used to indicate a student prediction position and a student prediction category of a target object in the source domain image.
A target domain image processing module 602, configured to process an unlabeled target domain image through the student detection network and the teacher detection network, respectively, to obtain a student prediction result and a teacher prediction result of the target domain image; the student prediction result of the target domain image is used for indicating the student prediction position and the student prediction category of the target object in the target domain image, and the teacher prediction result of the target domain image is used for indicating the teacher prediction position and the teacher prediction category of the target object in the target domain image.
A classification result obtaining module 603, configured to respectively determine, through the domain classification network, the feature information of the source domain image and the feature information of the target domain image obtained by the student detection network, so as to obtain a domain classification result of the source domain image and a domain classification result of the target domain image.
And the processing model training module 604 is configured to train the image processing model according to the student prediction result of the source domain image, the student prediction result and the teacher prediction result of the target domain image, and the domain classification result.
In an exemplary embodiment, the process model training module 604 is configured to:
determining target classification loss and position regression loss according to the student prediction result of the source domain image and the label information of the source domain image; the target classification loss is used for measuring the class prediction accuracy of the student detection network on the source domain image by taking the class label information of the source domain image as a reference, and the position regression loss is used for measuring the position prediction accuracy of the student detection network on the source domain image by taking the position label information of the source domain image as a reference;
determining comparative loss according to the student prediction result and the teacher prediction result of the target domain image; the comparison loss is used for measuring the class and position prediction accuracy of the student detection network on the target domain image by taking a teacher prediction result of the target domain image as pseudo label information;
determining domain classification loss according to the domain classification result and the domain classification label; wherein the domain classification loss is used to measure the domain classification accuracy of the domain classification network;
and training the image processing model according to the target classification loss, the position regression loss, the comparison loss and the domain classification loss.
In an exemplary embodiment, the process model training module 604 is further configured to:
determining a total loss of the image processing model according to the target classification loss, the positional regression loss, the comparison loss, and the domain classification loss;
determining a parameter adjustment gradient of the domain classification network according to the total loss, and adjusting the parameters of the domain classification network based on the parameter adjustment gradient of the domain classification network;
determining the parameter adjustment gradient of the student detection network by adopting a gradient reversal layer GRL according to the parameter adjustment gradient of the domain classification network, and adjusting the parameters of the student detection network based on the parameter adjustment gradient of the student detection network;
and determining the parameters of the teacher detection network by adopting an exponential weighted average algorithm according to the parameters of the student detection network.
In an exemplary embodiment, the process model training module 604 is further configured to:
determining the comparison loss according to the probability difference between the student prediction result and the teacher prediction result of the target domain image and the weight parameter; and the weight parameters are dynamically adjusted according to the teacher prediction result of the target domain image.
In an exemplary embodiment, the process model training module 604 is further configured to:
selecting a target domain image of which the teacher prediction result is greater than or equal to a probability threshold from the plurality of target domain images; wherein the student prediction result and the teacher prediction result of the selected target domain image are used for determining the comparison loss.
In an exemplary embodiment, the target domain image processing module 602 is configured to:
respectively processing the target domain image by adopting a first data enhancement mode and a second data enhancement mode to obtain a first processed target domain image and a second processed target domain image; the enhancement intensity of the first data enhancement mode is greater than that of the second data enhancement mode;
processing the first processed target domain image through the student detection network to obtain a student prediction result of the target domain image;
and processing the second processed target domain image through the teacher detection network to obtain a teacher prediction result of the target domain image.
In one exemplary embodiment, the feature information includes a plurality of feature maps of different scales; the classification result obtaining module 603 is configured to:
respectively judging a plurality of feature maps of different scales of the source domain image obtained by the student detection network through the domain classification network to obtain a plurality of domain classification results of the source domain image;
and respectively judging a plurality of feature maps with different scales of the target domain image obtained by the student detection network through the domain classification network to obtain a plurality of domain classification results of the target domain image.
In an exemplary embodiment, the apparatus 600 further comprises a target model building module (not shown in FIG. 6).
And the target model building module is used for building a target detection model based on the structure and parameters of the teacher detection network after the image processing model completes training, and the target detection model is used for detecting the position and the category of the target object from the image to be recognized.
In an exemplary embodiment, the target detection model is used to detect the location and class of diseased cells from the cell slice images to be identified.
In summary, the technical solution provided by the embodiments of the present application can complete the training of an image processing model using labeled source domain images together with unlabeled target domain images, so that even when the number of labeled source domain images is limited, an image processing model with high recognition accuracy can still be obtained by exploiting a large number of unlabeled target domain images. Moreover, because only a small number of labeled source domain images are required, the training cost of the image processing model is effectively reduced, and the labeled source domain images are used more efficiently.
In addition, because the image processing model is trained on both labeled source domain images and unlabeled target domain images, the trained model can process images from either domain, which improves its generalization. Training with images from multiple domains also gives the network features of the image processing model domain invariance, improving the robustness of the image processing model in the target domain.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Referring to FIG. 7, a schematic structural diagram of a computer device according to an embodiment of the present application is shown. The computer device may be any electronic device with data computing, processing, and storage capabilities, and can be implemented as the model training device 10 and/or the model using device 20 in the implementation environment of the embodiment shown in FIG. 1. Specifically, the following may be included.
The computer device 700 includes a central processing unit 701 (e.g., a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an FPGA (Field Programmable Gate Array)), a system memory 704 including a RAM (Random-Access Memory) 702 and a ROM (Read-Only Memory) 703, and a system bus 705 connecting the system memory 704 and the central processing unit 701. The computer device 700 also includes a basic input/output system (I/O system) 706 that facilitates information transfer between components within the computer device, and a mass storage device 707 for storing an operating system 713, application programs 714, and other program modules 715.
In some embodiments, the basic input/output system 706 includes a display 708 for displaying information and an input device 709, such as a mouse, keyboard, etc., for a user to input information. Wherein the display 708 and input device 709 are connected to the central processing unit 701 through an input output controller 710 coupled to the system bus 705. The basic input/output system 706 may also include an input/output controller 710 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 710 may also provide output to a display screen, a printer, or other type of output device.
The mass storage device 707 is connected to the central processing unit 701 through a mass storage controller (not shown) connected to the system bus 705. The mass storage device 707 and its associated computer-readable media provide non-volatile storage for the computer device 700. That is, the mass storage device 707 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state storage technology, CD-ROM, DVD (Digital Video Disc) or other optical storage, and magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 704 and the mass storage device 707 described above may be collectively referred to as memory.
According to the embodiments of the present application, the computer device 700 may also be operated through a remote computer connected via a network such as the Internet. That is, the computer device 700 may be connected to the network 712 through a network interface unit 711 connected to the system bus 705, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 711.
The memory further stores a computer program that is configured to be executed by one or more processors so as to implement the above-described training method of the image processing model.
In an exemplary embodiment, a computer-readable storage medium is also provided, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the above-mentioned training method of the image processing model.
Optionally, the computer-readable storage medium may include: a ROM (Read-Only Memory), a RAM (Random-Access Memory), an SSD (Solid State Drive), an optical disc, or the like. The random-access memory may include a ReRAM (Resistive Random-Access Memory) and a DRAM (Dynamic Random-Access Memory).
In an exemplary embodiment, a computer program product or a computer program is also provided, which comprises computer instructions, which are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and executes the computer instructions to cause the computer device to execute the training method of the image processing model.
It should be noted that the information (including but not limited to subject device information, subject personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.), and signals referred to in this application are all authorized by the subjects or fully authorized by all parties, and the collection, use, and processing of the relevant data comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the labeled source domain images, the unlabeled target domain images, and the like referred to in this application are all obtained with sufficient authorization.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In addition, the step numbers described herein only exemplarily show one possible execution sequence among the steps, and in some other embodiments, the steps may also be executed out of the numbering sequence, for example, two steps with different numbers are executed simultaneously, or two steps with different numbers are executed in a reverse order to the order shown in the figure, which is not limited by the embodiment of the present application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. The training method of the image processing model is characterized in that the image processing model comprises a student detection network, a teacher detection network and a domain classification network; the method comprises the following steps:
processing the labeled source domain image through the student detection network to obtain a student prediction result of the source domain image, wherein the student prediction result of the source domain image is used for indicating the student prediction position and the student prediction category of a target object in the source domain image;
respectively processing the target domain image without the label through the student detection network and the teacher detection network to obtain a student prediction result and a teacher prediction result of the target domain image; the student prediction result of the target domain image is used for indicating the student prediction position and the student prediction category of the target object in the target domain image, and the teacher prediction result of the target domain image is used for indicating the teacher prediction position and the teacher prediction category of the target object in the target domain image;
respectively judging the feature information of the source domain image and the feature information of the target domain image obtained by the student detection network through the domain classification network to obtain a domain classification result of the source domain image and a domain classification result of the target domain image;
and training the image processing model according to the student prediction result of the source domain image, the student prediction result and teacher prediction result of the target domain image and the domain classification result.
2. The method of claim 1, wherein the training the image processing model based on the student prediction results of the source domain images, the student prediction results and teacher prediction results of the target domain images, and the domain classification results comprises:
determining target classification loss and position regression loss according to the student prediction result of the source domain image and the label information of the source domain image; the target classification loss is used for measuring the class prediction accuracy of the student detection network on the source domain image by taking the class label information of the source domain image as a reference, and the position regression loss is used for measuring the position prediction accuracy of the student detection network on the source domain image by taking the position label information of the source domain image as a reference;
determining comparative loss according to the student prediction result and the teacher prediction result of the target domain image; the comparison loss is used for measuring the class and position prediction accuracy of the student detection network on the target domain image by taking a teacher prediction result of the target domain image as pseudo label information;
determining domain classification loss according to the domain classification result and the domain classification label; wherein the domain classification loss is used to measure the domain classification accuracy of the domain classification network;
and training the image processing model according to the target classification loss, the position regression loss, the comparison loss and the domain classification loss.
3. The method of claim 2, wherein the training the image processing model based on the target classification penalty, the positional regression penalty, the comparison penalty, and the domain classification penalty comprises:
determining a total loss of the image processing model according to the target classification loss, the positional regression loss, the comparison loss, and the domain classification loss;
determining a parameter adjustment gradient of the domain classification network according to the total loss, and adjusting the parameters of the domain classification network based on the parameter adjustment gradient of the domain classification network;
determining the parameter adjustment gradient of the student detection network by adopting a gradient reversal layer GRL according to the parameter adjustment gradient of the domain classification network, and adjusting the parameters of the student detection network based on the parameter adjustment gradient of the student detection network;
and determining the parameters of the teacher detection network by adopting an exponential weighted average algorithm according to the parameters of the student detection network.
4. The method of claim 2, wherein determining a comparative loss based on the student and teacher predictions of the target domain image comprises:
determining the comparison loss according to the probability difference between the student prediction result and the teacher prediction result of the target domain image and the weight parameter; and the weight parameters are dynamically adjusted according to the teacher prediction result of the target domain image.
5. The method of claim 2, further comprising:
selecting a target domain image of which the teacher prediction result is greater than or equal to a probability threshold from the plurality of target domain images;
wherein the student prediction result and the teacher prediction result of the selected target domain image are used for determining the comparison loss.
6. The method of claim 1, wherein the processing the unlabeled target domain image through the student detection network and the teacher detection network to obtain the student prediction result and the teacher prediction result of the target domain image comprises:
respectively processing the target domain image by adopting a first data enhancement mode and a second data enhancement mode to obtain a first processed target domain image and a second processed target domain image; the enhancement intensity of the first data enhancement mode is greater than that of the second data enhancement mode;
processing the first processed target domain image through the student detection network to obtain a student prediction result of the target domain image;
and processing the second processed target domain image through the teacher detection network to obtain a teacher prediction result of the target domain image.
7. The method of claim 1, wherein the feature information comprises a plurality of feature maps of different scales;
the step of respectively judging the feature information of the source domain image and the feature information of the target domain image obtained by the student detection network through the domain classification network to obtain the domain classification result of the source domain image and the domain classification result of the target domain image includes:
respectively judging a plurality of feature maps of different scales of the source domain image obtained by the student detection network through the domain classification network to obtain a plurality of domain classification results of the source domain image;
and respectively judging a plurality of feature maps with different scales of the target domain image obtained by the student detection network through the domain classification network to obtain a plurality of domain classification results of the target domain image.
8. The method of claim 1, further comprising:
after the image processing model is trained, a target detection model is constructed based on the structure and parameters of the teacher detection network, and the target detection model is used for detecting the position and the category of the target object from the image to be recognized.
9. The method of claim 8, wherein the target detection model is used to detect the location and type of diseased cells from the cell slice images to be identified.
10. The training device of the image processing model is characterized in that the image processing model comprises a student detection network, a teacher detection network and a domain classification network; the device comprises:
the source domain image processing module is used for processing the source domain image with the label through the student detection network to obtain a student prediction result of the source domain image, and the student prediction result of the source domain image is used for indicating the student prediction position and the student prediction category of the target object in the source domain image;
the target domain image processing module is used for respectively processing the target domain images without labels through the student detection network and the teacher detection network to obtain student prediction results and teacher prediction results of the target domain images; the student prediction result of the target domain image is used for indicating the student prediction position and the student prediction category of the target object in the target domain image, and the teacher prediction result of the target domain image is used for indicating the teacher prediction position and the teacher prediction category of the target object in the target domain image;
a classification result obtaining module, configured to respectively judge, through the domain classification network, the feature information of the source domain image and the feature information of the target domain image obtained by the student detection network, so as to obtain a domain classification result of the source domain image and a domain classification result of the target domain image;
and the processing model training module is used for training the image processing model according to the student prediction result of the source domain image, the student prediction result and teacher prediction result of the target domain image and the domain classification result.
11. A computer device, characterized in that the computer device comprises a processor and a memory, in which a computer program is stored, which computer program is loaded and executed by the processor to implement the method of training an image processing model according to any of claims 1 to 9.
12. A computer-readable storage medium, in which a computer program is stored which is loaded and executed by a processor to implement the method of training an image processing model according to any one of claims 1 to 9.
13. A computer program product, characterized in that it comprises computer instructions stored in a computer-readable storage medium, from which a processor reads and executes said computer instructions to implement the method of training an image processing model according to any one of claims 1 to 9.
CN202210373873.1A 2022-04-11 2022-04-11 Training method, device and equipment of image processing model and storage medium Active CN114445670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210373873.1A CN114445670B (en) 2022-04-11 2022-04-11 Training method, device and equipment of image processing model and storage medium


Publications (2)

Publication Number Publication Date
CN114445670A true CN114445670A (en) 2022-05-06
CN114445670B CN114445670B (en) 2022-07-12

Family

ID=81360526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210373873.1A Active CN114445670B (en) 2022-04-11 2022-04-11 Training method, device and equipment of image processing model and storage medium

Country Status (1)

Country Link
CN (1) CN114445670B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019032202A1 (en) * 2017-08-11 2019-02-14 Microsoft Technology Licensing, Llc Domain adaptation in speech recognition via teacher-student learning
CN109447149A (en) * 2018-10-25 2019-03-08 腾讯科技(深圳)有限公司 A kind of training method of detection model, device and terminal device
CN109472360A (en) * 2018-10-30 2019-03-15 北京地平线机器人技术研发有限公司 Update method, updating device and the electronic equipment of neural network
CN110414631A (en) * 2019-01-29 2019-11-05 腾讯科技(深圳)有限公司 Lesion detection method, the method and device of model training based on medical image
CN110399856A (en) * 2019-07-31 2019-11-01 上海商汤临港智能科技有限公司 Feature extraction network training method, image processing method, device and its equipment
US20210241108A1 (en) * 2019-09-13 2021-08-05 Latent AI, Inc. Generating and executing context-specific neural network models based on target runtime parameters
CN111930981A (en) * 2020-08-10 2020-11-13 金陵科技学院 Data processing method for sketch retrieval
CN114139676A (en) * 2020-09-02 2022-03-04 富士通株式会社 Training method of domain adaptive neural network
CN112052818A (en) * 2020-09-15 2020-12-08 浙江智慧视频安防创新中心有限公司 Unsupervised domain adaptive pedestrian detection method, unsupervised domain adaptive pedestrian detection system and storage medium
CN112861616A (en) * 2020-12-31 2021-05-28 电子科技大学 Passive field self-adaptive target detection method
CN113610126A (en) * 2021-07-23 2021-11-05 武汉工程大学 Label-free knowledge distillation method based on multi-target detection model and storage medium
CN113936275A (en) * 2021-10-14 2022-01-14 上海交通大学 Unsupervised domain adaptive semantic segmentation method based on region feature alignment

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082690A (en) * 2022-07-12 2022-09-20 北京百度网讯科技有限公司 Target recognition method, target recognition model training method and device
CN115082690B (en) * 2022-07-12 2023-03-28 北京百度网讯科技有限公司 Target recognition method, target recognition model training method and device
WO2024016949A1 (en) * 2022-07-20 2024-01-25 马上消费金融股份有限公司 Label generation method and apparatus, image classification model method and apparatus, and image classification method and apparatus
CN115170887A (en) * 2022-07-22 2022-10-11 北京百度网讯科技有限公司 Target detection model training method, target detection method and device thereof
WO2024066927A1 (en) * 2022-09-30 2024-04-04 腾讯科技(深圳)有限公司 Training method and apparatus for image classification model, and device
CN117011575A (en) * 2022-10-27 2023-11-07 腾讯科技(深圳)有限公司 Training method and related device for small sample target detection model
CN117011575B (en) * 2022-10-27 2024-07-19 腾讯科技(深圳)有限公司 Training method and related device for small sample target detection model
CN116312860A (en) * 2023-03-24 2023-06-23 江南大学 Agricultural product soluble solid matter prediction method based on supervised transfer learning
CN116312860B (en) * 2023-03-24 2023-09-12 江南大学 Agricultural product soluble solid matter prediction method based on supervised transfer learning

Also Published As

Publication number Publication date
CN114445670B (en) 2022-07-12

Similar Documents

Publication Publication Date Title
CN114445670B (en) Training method, device and equipment of image processing model and storage medium
Gecer et al. Detection and classification of cancer in whole slide breast histopathology images using deep convolutional networks
CN110929622B (en) Video classification method, model training method, device, equipment and storage medium
Xie et al. Microscopy cell counting and detection with fully convolutional regression networks
CN110852316B (en) Image tampering detection and positioning method adopting convolution network with dense structure
CN110853022B (en) Pathological section image processing method, device and system and storage medium
CN111754396B (en) Face image processing method, device, computer equipment and storage medium
CN113139628B (en) Sample image identification method, device and equipment and readable storage medium
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
CN109389129A (en) A kind of image processing method, electronic equipment and storage medium
CN111738231A (en) Target object detection method and device, computer equipment and storage medium
CN110689025A (en) Image recognition method, device and system, and endoscope image recognition method and device
CN113177559B (en) Image recognition method, system, equipment and medium combining breadth and dense convolutional neural network
CN111368672A (en) Construction method and device for genetic disease facial recognition model
CN112257665A (en) Image content recognition method, image recognition model training method, and medium
An et al. Medical image segmentation algorithm based on multilayer boundary perception-self attention deep learning model
TW202013311A (en) Image processing method, electronic device, and storage medium
WO2021164280A1 (en) Three-dimensional edge detection method and apparatus, storage medium and computer device
CN113822851A (en) Image segmentation method, device, equipment and storage medium
CN114330499A (en) Method, device, equipment, storage medium and program product for training classification model
CN113706562B (en) Image segmentation method, device and system and cell segmentation method
CN113096080A (en) Image analysis method and system
CN113781387A (en) Model training method, image processing method, device, equipment and storage medium
CN115409804A (en) Method for identifying and marking focus region of mammary gland magnetic resonance image and predicting curative effect
Tang et al. SDRNet: An end-to-end shadow detection and removal network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40071448

Country of ref document: HK