CN115760831A - Training method and system of image processing model - Google Patents


Info

Publication number
CN115760831A
Authority
CN
China
Prior art keywords
sample
image processing
processing model
augmented
augmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211520483.9A
Other languages
Chinese (zh)
Inventor
苏立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan United Imaging Healthcare Co Ltd
Original Assignee
Wuhan United Imaging Healthcare Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan United Imaging Healthcare Co Ltd filed Critical Wuhan United Imaging Healthcare Co Ltd
Priority to CN202211520483.9A
Publication of CN115760831A
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiments of this specification disclose a training method and system for an image processing model. The method comprises: obtaining labeled samples and unlabeled samples; performing a first augmentation process and a second augmentation process on the unlabeled sample respectively to determine a first augmented sample and a second augmented sample, wherein the second augmentation process changes the characteristics of the unlabeled sample to a greater extent than the first augmentation process; obtaining a target sample based on the first augmented sample and the labeled sample; and training an image processing model using the target sample, wherein the loss function of the image processing model comprises a first loss function term constructed based on the prediction result of the image processing model trained in the previous round on the second augmented sample. The trained image processing model is used to process medical images.

Description

Training method and system of image processing model
Technical Field
The present disclosure relates to the field of image diagnosis, and in particular, to a method and a system for training an image processing model.
Background
A medical image is an image of the internal tissues and organs of a human body or an experimental subject, obtained non-invasively by means of a medium that interacts with the body, and is used to assist doctors in diagnosing diseases.
Intensive diagnostic work consumes a great deal of a doctor's energy, making it difficult for the doctor to maintain a good working state and possibly even leading to wrong judgments about a patient's condition.
Therefore, this specification proposes a training method and system for an image processing model, so as to assist doctors in diagnosing medical images through artificial intelligence techniques.
Disclosure of Invention
One aspect of the embodiments of this specification provides a method of training an image processing model, the method comprising: obtaining a labeled sample and an unlabeled sample; performing a first augmentation process and a second augmentation process on the unlabeled sample respectively to determine a first augmented sample and a second augmented sample, wherein the second augmentation process changes the characteristics of the unlabeled sample to a greater extent than the first augmentation process; obtaining a target sample based on the first augmented sample and the labeled sample; and training an image processing model using the target sample, wherein the loss function of the image processing model comprises a first loss function term constructed based on the prediction result of the image processing model trained in the previous round on the second augmented sample; the trained image processing model is used to process medical images.
Another aspect of the embodiments of this specification provides a training system for an image processing model. The system comprises: a sample acquisition module for acquiring a labeled sample and an unlabeled sample; an augmentation processing module for performing a first augmentation process and a second augmentation process on the unlabeled sample respectively and determining a first augmented sample and a second augmented sample, wherein the first and second augmentation processes change the characteristics of the unlabeled sample to different extents; a target sample acquisition module for obtaining a target sample based on the first augmented sample and the labeled sample; and a model training module for training an image processing model using the target sample, wherein the loss function of the image processing model comprises a first loss function term constructed based on the prediction result of the image processing model trained in the previous round on the second augmented sample; the trained image processing model is used to process medical images.
Another aspect of embodiments of the present specification provides an image processing apparatus comprising at least one storage medium for storing computer instructions and at least one processor; the at least one processor is configured to execute the computer instructions to implement a training method for an image processing model.
Another aspect of the embodiments of this specification provides a computer-readable storage medium storing computer instructions; when a computer reads the computer instructions in the storage medium, the computer executes the training method of the image processing model.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a schematic diagram of an exemplary application scenario of a training system for an image processing model in accordance with some embodiments of the present description;
FIG. 2 is an exemplary flow diagram of a method of training an image processing model according to some embodiments shown herein;
FIG. 3 is an exemplary diagram of a training process for an image processing model, shown in accordance with some embodiments of the present description;
FIG. 4 is an exemplary diagram illustrating the determination of a consistency regularization term in accordance with some embodiments of the present description;
FIG. 5 is an exemplary block diagram of a training system for an image processing model in accordance with certain embodiments of the present description;
FIG. 6 is a classification confusion matrix obtained after training the same model using supervised learning and semi-supervised learning algorithms according to some embodiments of the present description;
FIG. 7 is a ROC curve obtained after training the same model using supervised learning and semi-supervised learning algorithms in accordance with some embodiments of the present description;
fig. 8 is a classification thermodynamic diagram of exemplary supervised learning and semi-supervised learning in accordance with certain embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or stated otherwise, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies of different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the singular forms "a," "an," and "the" may also include plural forms unless the context clearly dictates otherwise. In general, the terms "comprise" and "include" merely indicate that explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flow charts are used in this specification to illustrate operations performed by a system according to embodiments of this specification. It should be understood that the operations are not necessarily performed exactly in the order shown. Rather, the steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to the flows, or one or more steps may be removed from them.
Intensive diagnostic work consumes a great deal of a doctor's energy, making it difficult for the doctor to maintain a good working state and possibly leading to wrong judgments about a patient's condition. Moreover, the pathology of some diseases is not easily seen in medical images, and a doctor needs a high level of medical expertise and abundant clinical experience to make a correct diagnosis. Once misdiagnosis occurs, the patient may miss the optimal treatment opportunity, resulting in deterioration of the condition. Therefore, using artificial intelligence techniques to assist doctors in diagnosing medical images, automatically detecting suspected lesion areas while judging the condition, can greatly relieve the pressure that massive and complex image data places on doctors, and can greatly reduce misdiagnosis and missed diagnosis of diseases that are difficult to detect.
Medical imaging techniques, such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), and X-ray imaging, have developed rapidly over the past few decades and are used to examine regions that could otherwise not be examined without surgery. Diagnosis and interpretation of these medical images typically rely on the experience and judgment of the physician. In view of differences in pathology and of misdiagnosis caused by physician fatigue, researchers began to introduce computer technology to aid diagnosis.
Traditional medical image processing methods mainly include statistical-pattern medical image classification and fuzzy-pattern medical image classification. Commonly used methods for statistical-pattern medical image classification include Bayes classifiers, support vector machines, and the like; however, task-related features are mainly selected by human experts according to their knowledge of the target field, and feature selection is difficult for non-experts. Meanwhile, the performance of such methods is limited by the shortage of features and by the difficulty classifiers have in processing large amounts of medical image data. Commonly used methods for fuzzy-pattern image classification include fuzzy support vector machines, fuzzy neural networks, and the like, but they still depend heavily on expert experience. Deep learning builds multi-layer networks that derive high-level features from low-level features such as edges and lines, so that the network can learn to extract features from raw data that are otherwise difficult to extract directly, overcoming the difficulties of manual feature selection and training in traditional medical image processing.
Machine learning has been a research hotspot in the field of artificial intelligence since the 1980s. Artificial neural networks, a major branch of machine learning, abstractly model the transfer of signals in the human brain and simulate the mechanisms by which the brain interprets data at different degrees and levels, and have become one of the current hot research directions in machine learning. In recent years, continued research on artificial neural networks has successfully solved problems that traditional models find difficult to handle in fields such as pattern recognition, computer vision, and natural language processing, and their good performance has attracted the attention of researchers in various fields. More and more artificial intelligence techniques based on deep learning are being applied to the research and application of medical images to assist researchers and human doctors in diagnosis.
In most cases, the number of unlabeled samples is much larger than the number of labeled samples, since labeling samples requires a great deal of time and effort from professionals, while unlabeled samples are easier to collect. Supervised learning, however, cannot use unlabeled samples, and its performance largely depends on the number of labeled samples. Unsupervised learning, on the other hand, evaluates the relevance of features using the data distribution among unlabeled samples, and may fail to select the correct features. Therefore, when there are few labeled samples, how to improve the performance of machine learning by using unlabeled samples has become an urgent problem to be solved.
Semi-supervised learning can make full use of a large number of unlabeled samples to improve learning performance under the guidance of a small number of sample labels, avoiding the waste of data resources, while also addressing the weak generalization of supervised learning when labeled samples are few and the inaccuracy of unsupervised learning when sample label guidance is lacking. Semi-supervised learning is suitable for various task scenarios with limited labeled data, especially in the medical field where label annotation is expensive.
Therefore, this specification proposes a training method for an image processing model based on semi-supervised learning. By allowing the model to utilize unlabeled samples, it greatly alleviates the requirement for labeled data, which is important for scenarios with small amounts of labeled data, and at the same time the trained model has high accuracy and robustness.
FIG. 1 is a schematic diagram of an exemplary application scenario of a training system for an image processing model according to some embodiments of the present description.
As shown in fig. 1, the training system 100 for an image processing model may include a medical imaging device 110, a network 120, a terminal 130, a processing device 140, and a storage device 150.
The medical imaging device 110 may be used to image a target object to produce an image. The medical imaging device 110 may include a CT (Computed Tomography) imaging device, a PET (Positron Emission Tomography) imaging device, an MRI (Magnetic Resonance Imaging) imaging device, a SPECT (Single-Photon Emission Computed Tomography) imaging device, a PET-CT imaging device, a PET-MRI imaging device, and the like. In some embodiments, the medical imaging device 110 may also include other medical devices such as X-ray imaging equipment, thermal imaging devices, medical optical devices, and the like, or any combination thereof.
Network 120 may include any suitable network capable of facilitating information and/or data exchange for training system 100 of the image processing model. In some embodiments, one or more components of the training system 100 of the image processing model (e.g., the medical imaging device 110, the terminal 130, the processing device 140, the storage device 150, etc.) may exchange information and/or data with one or more components of the training system 100 of the image processing model via the network 120. For example, processing device 140 may acquire image data obtained from a scan from medical imaging device 110 via network 120 (e.g., the image data may be used as a labeled or unlabeled sample for model training). In some embodiments, network 120 may include one or more network access points. For example, the network 120 may include wired and/or wireless network access points, such as base stations and/or internet exchange points, through which one or more components of the training system 100 of the image processing model may connect to the network 120 to exchange data and/or information.
The terminal 130 may include a mobile device 131, a tablet computer 132, a notebook computer 133, and the like, or any combination thereof. In some embodiments, the terminal 130 may interact with other components in the training system 100 of the image processing model over a network. For example, the terminal 130 may send one or more control instructions to the medical imaging device 110 to control the medical imaging device 110 to scan the target object according to the instructions to acquire the medical image. In some embodiments, the terminal 130 may be part of the processing device 140. In some embodiments, the terminal 130 may be integrated with the processing device 140 as an operating console for the medical imaging device 110. For example, a user/operator (e.g., a doctor) of the training system 100 of the image processing model may control the operation of the medical imaging device 110 through the console, such as scanning a target object, controlling the movement of a scanning table, and the like.
The processing device 140 may process data and/or information obtained from the medical imaging device 110, the terminal 130, and/or the storage device 150. In some embodiments, the processing device 140 may be a single server or a group of servers. The server group may be centralized or distributed. In some embodiments, the processing device 140 may be local or remote. For example, the processing device 140 may access information and/or data from the medical imaging device 110, the terminal 130, and/or the storage device 150 via the network 120. As another example, the processing device 140 may be directly connected to the medical imaging device 110, the terminal 130, and/or the storage device 150 to access information and/or data. In some embodiments, the processing device 140 may be implemented on a cloud platform. For example, the cloud platform may include one or a combination of private cloud, public cloud, hybrid cloud, community cloud, distributed cloud, cross-cloud, multi-cloud, and the like.
The storage device 150 may store data (e.g., scan data of a target object, training samples of a model, model parameters of an image processing model, etc.), instructions, and/or any other information. In some embodiments, the storage device 150 may store data obtained from the medical imaging device 110, the terminal 130, and/or the processing device 140, e.g., the storage device 150 may store treatment plans, labeled specimens, unlabeled specimens, and the like obtained from the medical imaging device 110. In some embodiments, storage device 150 may store data and/or instructions that processing device 140 may execute or use to perform the example methods described herein. In some embodiments, the storage device 150 may be implemented by a cloud platform as described herein.
In some embodiments, the storage device 150 may be connected to the network 120 to enable communication with one or more components (e.g., the processing device 140, the terminal 130, etc.) in the training system 100 of the image processing model. One or more components in the training system 100 of the image processing model may read data or instructions in the storage device 150 through the network 120. In some embodiments, the storage device 150 may be part of the processing device 140 or may be separate and connected directly or indirectly to the processing device.
FIG. 2 is an exemplary flow diagram of a method of training an image processing model according to some embodiments shown in the present description. In some embodiments, flow 200 may be performed by a processing device. For example, the process 200 may be stored in a storage device (e.g., an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 200. The flow 200 may include the following operations.
Step 202, obtaining a labeled sample and an unlabeled sample. In some embodiments, step 202 may be performed by the sample acquisition module 510.
Samples are the medical images used to train machine learning models. A labeled sample is a sample in which a target area is labeled in the image. The target area may be a lesion area in the image, or any other region of interest. An unlabeled sample is a sample in which no target area is marked in the image. In some embodiments, there may be a plurality of labeled samples and unlabeled samples. In some embodiments, because labels are difficult to annotate, the number of labeled samples may be smaller than the number of unlabeled samples.
In some embodiments, the labeling of labeled samples can be performed manually, such as by a human expert marking the target area in an unlabeled image.
In some embodiments, the labeled and unlabeled sample images may be medical images such as CT images, MRI images, and the like.
In some embodiments, the processing device may obtain the tagged and untagged exemplars by reading from a storage device, database, or imaging device (e.g., medical imaging device 110).
Step 204, performing a first augmentation process and a second augmentation process on the unlabeled sample respectively, and determining a first augmented sample and a second augmented sample. In some embodiments, step 204 may be performed by the augmentation processing module 520.
Augmentation processing refers to changing elements of a sample image in some manner to obtain a new sample image, for example, adjusting pixel positions in the sample image, adjusting pixel values, and the like. Through augmentation processing, new samples usable for model training can be obtained, the number of samples can be expanded, and the problems that training sample data is difficult to obtain and the number of samples is small can be alleviated.
The first augmentation process may refer to adjusting pixel positions in a single sample image. For example, the first augmentation process may include random translation, rotation, color transformation, cropping, and the like. Cropping is the random cropping of a smaller image from the original image, such as a 224-by-224 image from a 256-by-256 image. In some embodiments, the classification of the resulting sample image is not affected after the first augmentation process.
The second augmentation process may refer to generating a new sample image using one or more sample images. For example, the second augmentation process may include CutOut, CTAugment, and the like. Taking CutOut as an example, CutOut randomly removes a portion of the area in the sample image and replaces the pixel values of that portion with 0.
The processing device can obtain a first augmented sample after performing the first augmentation process on the unlabeled sample, and can obtain a second augmented sample after performing the second augmentation process on the unlabeled sample. Because the first augmentation process and the second augmentation process operate in different ways, the second augmentation process may change the characteristics of the unlabeled sample to a greater extent than the first augmentation process.
In some embodiments, the processing device may perform both the first augmentation process and the second augmentation process on each unlabeled sample, so as to expand the number of samples as much as possible and alleviate the difficulty of obtaining training samples.
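To make the two augmentation branches concrete, the following is a minimal illustrative sketch in Python/PyTorch. It assumes image tensors in CHW layout, uses torchvision transforms for the first (weak) augmentation and a hand-written CutOut for the second (strong) augmentation; the function names, crop size, and cutout size are illustrative assumptions rather than values prescribed by this specification.

```python
# Illustrative sketch only: augmentation choices and parameters are assumptions.
import torch
import torchvision.transforms as T

# First (weak) augmentation: small positional adjustments of a single image,
# e.g. random translation/rotation and random cropping (224x224 out of 256x256).
first_augment = T.Compose([
    T.RandomAffine(degrees=10, translate=(0.05, 0.05)),
    T.RandomCrop(224),
])

def cutout(img: torch.Tensor, size: int = 32) -> torch.Tensor:
    """Second (strong) augmentation, CutOut: zero out a random square region."""
    _, h, w = img.shape
    out = img.clone()
    cy, cx = int(torch.randint(0, h, (1,))), int(torch.randint(0, w, (1,)))
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out[:, y0:y1, x0:x1] = 0.0            # replace the removed region with 0
    return out

def augment_unlabeled(u: torch.Tensor):
    """Return the first and second augmented versions of one unlabeled image."""
    return first_augment(u), cutout(u)
```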
Step 206, obtaining a target sample based on the first augmented sample and the labeled sample. In some embodiments, step 206 may be performed by the target sample acquisition module 530.
The target sample is the sample image obtained for training the image processing model to be trained. In some embodiments, the processing device may process the first augmented sample to obtain a label corresponding to it (hereinafter referred to as a pseudo label), and obtain the target sample based on the first augmented sample for which the label has been obtained and the labeled sample.
In some embodiments, the manner in which the label of the first augmented sample is obtained may be as shown in the examples below.
The processing device can input the first augmented sample into an image processing model trained in the previous round, and determine a prediction result of the first augmented sample; based on a prediction of the first augmented sample (which may also be referred to as a first prediction), a pseudo label corresponding to the first augmented sample is determined.
The image processing model trained in the previous round refers to a model that has already undergone a certain amount of training using sample images. For example, inputting all sample images into the image processing model to be trained once and adjusting the parameters accordingly based on the model's prediction results may be regarded as one round of training. In some embodiments, if the current round is the first round of training, the image processing model of the previous round may be the initial model, whose parameters are the initialization parameters.
The prediction result refers to the target area in the first augmented sample that is output after the first augmented sample is input to and processed by the image processing model trained in the previous round.
A pseudo label is a label of the target area in a sample image that is derived from the prediction result of the image processing model rather than being a true label (i.e., a manually annotated label). In some embodiments, the processing device may directly use the prediction result of the image processing model as the pseudo label, or the processing device may use the prediction result as the pseudo label after some processing (for example, shifting it by a distance of several pixel units, etc.).
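The following is a hedged sketch of pseudo-label generation with the previous round's model; it assumes a PyTorch classification model whose softmax output is used directly as a soft pseudo label, which is one of the options described above (names are illustrative).

```python
# Sketch under assumptions: the previous-round model is a PyTorch classifier
# and its class probabilities are used directly as the pseudo label.
import torch

@torch.no_grad()
def make_pseudo_labels(prev_model: torch.nn.Module,
                       first_augmented: torch.Tensor) -> torch.Tensor:
    """Predict the first augmented samples with the previous-round model and
    return the predictions as (soft) pseudo labels."""
    prev_model.eval()
    logits = prev_model(first_augmented)      # [N, num_classes]
    return torch.softmax(logits, dim=1)       # pseudo labels q
```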
After the pseudo label corresponding to the first augmented sample is determined, the processing device may perform mixing processing on the first augmented sample corresponding to the pseudo label and the labeled sample, and determine the target sample.
Mixing processing means randomly mixing the first augmented samples and the labeled samples together and performing a data enhancement operation on the mixed sample images. For example, a plurality of first augmented samples and a plurality of labeled samples may be put together and randomly shuffled, and then a data enhancement CAMMix operation may be performed on the mixed sample images.
In some embodiments, the procedure of the CAMMix operation may be as shown in the following example.
The processing device may randomly select two images from the mixed sample images, e.g., a first augmented sample and a labeled sample, calculate a class activation map for the first augmented sample, and then determine a region of interest in the first augmented sample based on the class activation map. The region of interest may be determined from the contribution of each pixel in the first augmented sample to the auxiliary medical diagnosis, as indicated by the class activation map; for example, the region with the highest contribution may be used as the region of interest. The processing device may then paste the region of interest to the corresponding location in the labeled sample image to obtain the target image. In some embodiments, the paste location of the region of interest is the same as its location in the first augmented image.
The label of the target image may be obtained based on the pseudo label of the first augmented image and the real label of the labeled image. For example, the processing device may calculate the label of the target image according to the following formula (1).
y = \lambda y_1 + (1 - \lambda) y_2    (1)
where y denotes the label of the target image, y_1 denotes the pseudo label corresponding to the first augmented image, y_2 denotes the true label of the labeled image, and \lambda is a weight coefficient that can be set as needed; for example, \lambda can be 0.1, 0.2, 0.5, 0.8, etc.
It should be noted that the two sample images selected by the CAMMix operation may be arbitrary, for example, two first augmented images, two tagged images, etc., and the above examples are for illustrative purposes only.
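As an illustration of the CAMMix operation and the label mixing in formula (1), the following is a minimal sketch. It assumes the class activation map has already been computed and upsampled to image resolution, and it approximates the region of interest with a square window around the CAM maximum; the window size and weight lambda are illustrative assumptions, not values from this specification.

```python
# Illustrative CAMMix sketch; `cam` is assumed to be an [H, W] class activation
# map at image resolution, and the square window is an assumed approximation
# of the highest-contribution region.
import torch

def cammix(aug_img, aug_label, labeled_img, true_label, cam, win: int = 64, lam: float = 0.5):
    """Paste the most-contributing region of `aug_img` (per its CAM) onto the
    same location in `labeled_img`, and mix the labels as in formula (1)."""
    _, h, w = aug_img.shape
    idx = int(torch.argmax(cam))
    cy, cx = idx // cam.shape[1], idx % cam.shape[1]
    y0, y1 = max(0, cy - win // 2), min(h, cy + win // 2)
    x0, x1 = max(0, cx - win // 2), min(w, cx + win // 2)

    mixed = labeled_img.clone()
    mixed[:, y0:y1, x0:x1] = aug_img[:, y0:y1, x0:x1]          # same position in both images
    mixed_label = lam * aug_label + (1.0 - lam) * true_label   # formula (1)
    return mixed, mixed_label
```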
In some embodiments, the processing device may further perform a first augmentation process on the labeled sample to obtain a third augmented sample; and mixing the first augmented sample and the third augmented sample corresponding to the pseudo label to determine the target sample.
The third augmented sample is a sample image obtained by performing the first augmentation process on the labeled sample. The way of mixing the first augmented sample and the third augmented sample to determine the target sample is similar to that described above and is not repeated here.
Step 208, training an image processing model using the target sample. In some embodiments, step 208 may be performed by model training module 540.
In some embodiments, the processing device may input the target samples into the current round of image processing model to be trained, and adjust the model parameters according to the prediction results of the image processing model.
In some embodiments, the image processing model may be a neural network model, a support vector machine, or the like. The processing device can input the target sample into the image processing model, and the image processing model automatically adjusts the model parameters according to the predicted result, so as to complete the model training of the current stage (using the target image for training). For example, this stage of training of the image processing model may be considered as a black box, and the process is automatically performed by the image processing model.
Then, the processing device can further train or test the image processing model by using the labeled sample image and the unlabeled sample image, so that the accuracy and the robustness of the model can be improved as much as possible under the condition of improving the utilization rate of the training sample.
In some embodiments, the processing device may adjust parameters of the image processing model based on a loss function. The loss function of the image processing model may include a first loss function term constructed based on the prediction results (which may be referred to as second prediction results) of the second augmented sample from the image processing model trained in the previous round. For example, the first loss function term may be constructed based on a prediction value of the unlabeled sample by the image processing model of the current round (for example, a model trained by using the target sample) and a prediction result obtained by processing the second augmented sample by the image processing model of the previous round.
In some embodiments, the first loss function term may further include a function term constructed based on a prediction value of the image processing model of the current round on the unlabeled samples and a prediction result (first prediction result) obtained by processing the first augmented samples by the image processing model of the previous round.
The first loss function term may reflect the difference between the predicted value of the unlabeled sample and its corresponding sample label (i.e., the pseudo label, or the model's prediction result for the first augmented sample), as well as the difference between the predicted value of the unlabeled sample and the prediction result for the second augmented sample. The value of the first loss function term measures the difference between the image processing model's predicted value for the unlabeled sample and the predicted values for the augmented samples, and adjusting the model parameters according to this difference can improve model performance.
In some embodiments, the first loss function term may be a minimum mean square error loss function. Illustratively, the first loss function term may be as shown in the following equation (2).
L_{unlabel} = \frac{1}{L\,|U'|} \sum_{u' \in U'} \left( \left\| q - p_{model}(y \mid u';\theta) \right\|_2^2 + \left\| q' - p_{model}(y \mid u';\theta) \right\|_2^2 \right)    (2)
where L_{unlabel} denotes the value of the first loss function term; L denotes the number of categories of the sample image (for example, a CT sample image may be divided into negative and positive, e.g., corresponding to negative and positive for the novel coronavirus, in which case L may be 2); u' denotes an unlabeled sample and U' the set of unlabeled samples; q' denotes the model's prediction result for the second augmented sample; p_{model}(y \mid u';\theta) denotes the model's predicted value for the unlabeled sample; and q denotes the model's prediction result for the first augmented sample.
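A hedged sketch of this unlabeled-sample loss term is given below; the normalization in (2) is reconstructed from the surrounding description, and the PyTorch mean-squared-error reduction used here averages over both the batch and the L classes, which is assumed to correspond to it.

```python
# Sketch under assumptions: p_u are the current model's class probabilities for
# the unlabeled samples; q_first / q_second are the previous round's predictions
# on the first / second augmented samples (used as pseudo targets).
import torch
import torch.nn.functional as F

def unlabeled_loss(p_u: torch.Tensor, q_first: torch.Tensor, q_second: torch.Tensor) -> torch.Tensor:
    # reduction="mean" averages over batch and classes, i.e. the 1/(L|U'|) factor in (2)
    return F.mse_loss(p_u, q_first) + F.mse_loss(p_u, q_second)
```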
In some embodiments, to further improve the accuracy and robustness of the trained model (unless otherwise stated, "the model" in this specification generally refers to the image processing model), the loss function may further include a second loss function term. The second loss function term may be constructed based on the prediction result obtained by processing the labeled sample with the image processing model (the model trained using the target sample) and the true label corresponding to the labeled sample. The second loss function term may reflect the difference between the predicted value of the labeled sample and its corresponding true label.
In some embodiments, the second loss function term can be a cross entropy loss function. Illustratively, the second loss function term may be as shown in the following equation (3).
L_{label} = \frac{1}{|X'|} \sum_{(x',\,p) \in X'} H\left( p,\; p_{model}(y \mid x';\theta) \right)    (3)
where L_{label} denotes the value of the second loss function term; x' denotes a labeled sample and X' the set of labeled samples; H denotes the cross entropy loss function; p denotes the label corresponding to the labeled sample; and p_{model}(y \mid x';\theta) denotes the model's predicted value for the labeled sample.
In some embodiments, to improve the accuracy and robustness of the trained model, the loss function of the model may also include a consistency regularization term. The consistency regularization term may be constructed based on the global features of the sample images (e.g., labeled samples and unlabeled samples) before the augmentation process and of their corresponding target samples. The corresponding target sample is the target image obtained by the CAMMix operation after the sample image has been augmented. For example, after the augmentation process is performed on an unlabeled sample, the sample image obtained by the augmentation process and a labeled sample are used to perform the CAMMix operation to obtain the target image. For another example, after the augmentation process is performed on a labeled sample, the sample image obtained by the augmentation process and the image obtained by augmenting an unlabeled sample and assigning it a pseudo label are used to perform the CAMMix operation to obtain the target image. Illustratively, the consistency regularization term of the loss function may be as shown in equation (4) below.
L_2 = \sum_i \left\| F_i - F'_i \right\|_2^2    (4)
where L_2 denotes the value of the consistency regularization term, F_i denotes the global feature of the i-th sample image before the augmentation process, and F'_i denotes the global feature of the target sample corresponding to the i-th sample image before the augmentation process.
For more explanation of the consistency regularization term, reference may be made to the description of fig. 4.
In conjunction with the above description, the overall loss function can be shown as equation (5) below.
L = \xi L_{unlabel} + L_{label} + L_2    (5)
Where ξ is a hyperparameter used to control the weight of the unlabeled sample loss function term, its value can be set manually, e.g., 100, 150, etc.
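A minimal sketch of combining the three terms of equation (5) is given below; it assumes a recent PyTorch version in which cross entropy accepts soft (probability) targets, and the term names refer to the other sketches in this document rather than to any prescribed implementation.

```python
# Sketch under assumptions: `logits_x` are current-model outputs for the mixed
# target samples with soft labels `y_mix`; l_unlabel and l_consistency come from
# the other sketches; xi is the manually set weight (e.g. 100).
import torch.nn.functional as F

def overall_loss(logits_x, y_mix, l_unlabel, l_consistency, xi: float = 100.0):
    l_label = F.cross_entropy(logits_x, y_mix)        # (3): cross entropy with probability targets
    return xi * l_unlabel + l_label + l_consistency   # (5)
```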
In some embodiments, the trained image processing model may be used to process medical images. For example, a trained image processing model may be used to process CT images of the lungs. Specifically, the lung CT image may be input into a trained image processing model, and the lesion region in the lung CT image may be output by the image processing model. For example, the image processing model may mark the lesion area with a box in the output image, and the doctor may make a diagnosis on the lung CT image by combining the marking of the image processing model and his expert experience.
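As a hedged usage sketch (variable names and the two-class assumption are illustrative, not prescribed by this specification), a trained classification model could be applied to a preprocessed lung CT slice as follows; highlighting of the lesion region, as described above, could additionally be produced with a CAM-style visualization such as the one sketched later in this description.

```python
# Usage sketch: `model` is a trained image processing model and `ct_slice` is a
# preprocessed lung CT slice tensor of shape [1, C, H, W]; both are assumed to exist.
import torch

model.eval()
with torch.no_grad():
    probs = torch.softmax(model(ct_slice), dim=1)
print(f"predicted probability of the positive (lesion) class: {probs[0, 1].item():.3f}")
```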
In some embodiments of this specification, an image processing model is trained based on semi-supervised learning. Unlabeled samples are augmented to obtain a first augmented sample and a second augmented sample; the first augmented sample is used to obtain target samples, which are used for model training and expand the number of samples, while the second augmented sample is predicted with the image processing model of the previous round, and a loss function is constructed from that prediction result and the current round's predicted value for the corresponding unlabeled sample, improving the accuracy and robustness of the model. Meanwhile, the semi-supervised learning approach greatly alleviates the need for labeled data, which is significant for scenarios with small amounts of labeled data.
In this specification, the prediction results of classification algorithms using semi-supervised learning and supervised learning are compared, taking as an example the use of an image processing model to classify lung CT images and judge novel coronavirus infection status. Specifically, comparative experiments were conducted using three network model structures: DenseNet-121, DenseNet-169, and EfficientNet-b0. Fig. 6 and Fig. 7 show, respectively, the classification confusion matrix and the ROC curve obtained by training the same model with supervised and semi-supervised learning algorithms. As is apparent from Fig. 6, the network model trained with semi-supervised learning can effectively reduce misjudged cases compared with supervised learning. In Fig. 7, the area between the ROC (receiver operating characteristic) curve and the coordinate axis after semi-supervised learning is obviously larger than that of supervised learning, indicating that, with the same classification model, semi-supervised learning can effectively improve the performance of the network model by introducing a large number of unlabeled samples for training.
Since a deep learning network model is a black box, its decision-making process lacks transparency, so doctors pay particular attention to the interpretability of the model, which is also important for diagnosing lesions in lung CT images. To better understand the model's decisions, the lesion areas attended to by the model are visualized using CAM (class activation mapping). CAM computes the weight of each feature map as the global average of its gradient, then performs a weighted sum of all feature maps according to the weights of the corresponding category to obtain the final thermodynamic diagram, highlighting the important regions closely related to the prediction result.
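The CAM computation described above (gradient-averaged feature-map weights followed by a weighted sum) could be sketched as follows; this assumes a PyTorch model and that `feature_layer` is the last convolutional block of the backbone (for a torchvision DenseNet, for example, something like `model.features[-1]`), both of which are assumptions for illustration.

```python
# Grad-CAM-style sketch: the weight of each feature map is the global average of
# its gradient, and the weighted sum of feature maps gives the heat map.
import torch

def grad_cam(model, image, target_class, feature_layer):
    feats, grads = [], []
    h1 = feature_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = feature_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    logits = model(image.unsqueeze(0))
    model.zero_grad()
    logits[0, target_class].backward()
    h1.remove(); h2.remove()

    fmap, grad = feats[0][0], grads[0][0]                  # [C, h, w]
    weights = grad.mean(dim=(1, 2))                        # global average of the gradients
    cam = torch.relu((weights[:, None, None] * fmap).sum(dim=0))
    return cam / (cam.max() + 1e-8)                        # normalized heat map
```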
Fig. 8 is a thermodynamic diagram of exemplary supervised and semi-supervised classification results according to some embodiments of this specification. The first column in Fig. 8 shows the original lung CT image, and the second and third columns show the results of DenseNet-121 trained with supervised learning, where the second column is the CAM learned by the baseline and the third column superimposes the baseline's CAM on the original image to show the activation region. The colors from deep red to deep blue (not apparent here because black-and-white images are used) correspond to class saliency values of the pixels from large to small. The fourth and fifth columns show the results of DenseNet-121 trained with the semi-supervised learning algorithm. The CAM image is an intuitive explanation of the lung CT lesions predicted by the network; the comparison shows that the model obtained with the training method of the image processing model provided in this specification can capture almost all salient lesion areas of the lung, which increases the interpretability of the model's diagnostic results.
FIG. 3 is an exemplary diagram of a training process for an image processing model according to some embodiments shown in the present description.
In some embodiments, the training flow of the image processing model may be as shown in fig. 3.
302, performing a first augmentation process on the labeled sample to obtain a third augmented sample;
304, performing a first augmentation process on the unlabeled sample to obtain a first augmented sample;
306, predicting the first augmented sample through the image processing model obtained in the previous round of training to obtain a first prediction result, and determining a pseudo label based on the first prediction result;
308, mixing the labeled sample 301 and/or the third augmented sample with the first augmented sample with the determined pseudo label to obtain a target sample;
310, performing model training by using the target sample;
312, performing a second augmentation process on the unlabeled sample to obtain a second augmented sample;
314, predicting the second augmented sample with the image processing model trained in the previous round to obtain a second prediction result;
and 316, calculating a loss function. The loss function may include a first loss function term, a second loss function term, and a consistency regularization term.
It should be noted that the above flows are only for illustrative purposes, and the execution order of the steps of the flows is not limited in this specification, for example, 312 and 314 may be performed before, after or simultaneously with 310. For more description of the above processes, reference may be made to the related description of fig. 2.
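Putting steps 302-316 together, the following is a condensed, hedged sketch of one training iteration. It reuses the helper functions from the earlier sketches (first_augment, cutout, make_pseudo_labels, cammix, grad_cam, unlabeled_loss, overall_loss) and the consistency_term sketched below in the discussion of Fig. 4; the feature extractor, layer choice, CAMMix pairing of only the first samples of the batch, and one-hot float labels are all illustrative assumptions, not the specification's prescribed implementation.

```python
# Condensed illustration of one training iteration (steps 302-316); all helper
# functions and the batch pairing are assumptions carried over from the sketches.
import torch
import torch.nn.functional as F

def train_step(model, prev_model, feature_model, feature_layer, x, y, u, optimizer, xi=100.0):
    """x: labeled images, y: one-hot float labels, u: unlabeled images."""
    x_aug = torch.stack([first_augment(img) for img in x])   # 302: third augmented samples
    u1 = torch.stack([first_augment(img) for img in u])      # 304: first augmented samples
    u2 = torch.stack([cutout(img) for img in u])             # 312: second augmented samples

    q1 = make_pseudo_labels(prev_model, u1)                  # 306: first prediction -> pseudo labels
    q2 = make_pseudo_labels(prev_model, u2)                  # 314: second prediction result

    # 308: CAMMix one pseudo-labelled sample with one augmented labeled sample
    cam = grad_cam(prev_model, u1[0], int(q1[0].argmax()), feature_layer)
    cam = F.interpolate(cam[None, None], size=u1.shape[-2:], mode="bilinear")[0, 0]
    x_mix, y_mix = cammix(u1[0], q1[0], x_aug[0], y[0], cam)

    # 310 / 316: train on the target sample with the combined loss (2) + (3) + (4)
    p_u = torch.softmax(model(u), dim=1)
    loss = overall_loss(model(x_mix[None]), y_mix[None],
                        unlabeled_loss(p_u, q1, q2),
                        consistency_term(feature_model, u[:1], x_mix[None]),
                        xi)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return float(loss)
```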
FIG. 4 is an exemplary diagram illustrating the determination of a consistency regularization term according to some embodiments of the present description.
As shown in fig. 4, in some embodiments, first, the processing device may perform a CAMMix operation on any one of the original unlabeled exemplar images and one of the labeled exemplar images to obtain its corresponding target exemplar. For a detailed description of obtaining the target sample, reference may be made to the description of fig. 2.
The processing device may input the original image 402 (unlabeled exemplars) into the feature extraction model 404 and the blended image 410 (target image) into the feature extraction model 412. The feature extraction model 404 and the feature extraction model 412 may be the same model or different models (e.g., the same type of model with the same parameters). In some embodiments, the feature extraction model used may be an image processing model obtained by last-stage training, or may be another model that can perform feature extraction, which is not limited in this specification.
After the feature extraction model 404 performs feature extraction on the input original image 402, a feature map 406 is output; the feature extraction model 412 performs feature extraction on the input mixed image 410, and outputs a feature map 414.
The feature map 406 is pooled to obtain a global feature 408 (corresponding to F_i above), and the feature map 414 is pooled to obtain a global feature 416 (corresponding to F'_i above).
A consistency regularization term 418 of the loss function (e.g., the consistency regularization term L_2 above) may be constructed based on the global feature 408 and the global feature 416.
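The following is a hedged sketch of this consistency term; it assumes `feature_model` returns a convolutional feature map (for a torchvision DenseNet, for example, the `features` sub-module), that the pooling is global average pooling, and that the distance penalty is a mean squared error, all of which are assumptions consistent with, but not dictated by, equation (4).

```python
# Sketch under assumptions: feature_model(image) -> [N, C, h, w] feature map.
import torch
import torch.nn.functional as F

def consistency_term(feature_model, original: torch.Tensor, mixed: torch.Tensor) -> torch.Tensor:
    """Pool the feature maps of the original image and of its mixed (target)
    image into global feature vectors and penalize their distance, as in (4)."""
    f_orig = F.adaptive_avg_pool2d(feature_model(original), 1).flatten(1)   # global feature F_i
    f_mix = F.adaptive_avg_pool2d(feature_model(mixed), 1).flatten(1)       # global feature F'_i
    return F.mse_loss(f_mix, f_orig)
```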
It should be noted that the foregoing descriptions are only for purposes of illustration and description and are not intended to limit the scope of the present disclosure. Various modifications and changes to the procedures described herein will be apparent to those skilled in the art in light of the disclosure. However, such modifications and variations are still within the scope of the present specification. For example, the present specification may be directed to variations on the process steps, such as the addition of pre-processing steps and storage steps.
FIG. 5 is an exemplary block diagram of a training system for an image processing model in accordance with some embodiments of the present description. As shown in fig. 5, the system 500 may include a sample acquisition module 510, an augmentation processing module 520, a target sample acquisition module 530, and a model training module 540.
The sample acquisition module 510 can be used to acquire labeled samples and unlabeled samples.
The augmentation processing module 520 may be configured to perform a first augmentation process and a second augmentation process on the unlabeled sample, respectively, to determine a first augmented sample and a second augmented sample; wherein the second augmentation process changes the characteristic of the unlabeled exemplar to a greater extent than the first augmentation process.
The target sample acquisition module 530 may be configured to acquire a target sample based on the first augmented sample and the labeled sample.
The model training module 540 may be configured to train an image processing model using the target samples; wherein the loss function of the image processing model comprises a first loss function term constructed based on the prediction result of the image processing model trained in the previous round on the second augmented sample.
It should be appreciated that the system and its modules illustrated in FIG. 5 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, for example such code provided on a carrier medium such as a diskette, CD-or DVD-ROM, programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, etc., or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also by software executed by various types of processors, for example, or by a combination of hardware circuits and software (e.g., firmware).
It should be noted that the above description of the training system of the image processing model and the modules thereof is only for convenience of description, and the description is not limited to the scope of the embodiments. It will be appreciated by those skilled in the art that, given the teachings of the system, any combination of modules or sub-system may be configured to interface with other modules without departing from such teachings. For example, in some embodiments, the sample acquiring module 510, the augmentation processing module 520, the target sample acquiring module 530 and the model training module 540 may be different modules in one system, or may be one module to implement the functions of two or more modules. For example, each module may share one memory module, and each module may have its own memory module. Such variations are within the scope of the present disclosure.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, though not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, certain features, structures, or characteristics may be combined as suitable in one or more embodiments of the specification.
Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.) or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present description may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service such as software as a service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While certain presently contemplated useful embodiments of the invention have been discussed in the foregoing disclosure by way of various examples, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein described. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of the embodiments of this specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more embodiments. This method of disclosure, however, is not to be interpreted as implying that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, an embodiment may have fewer than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit-preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
For each patent, patent application, patent application publication, and other material cited in this specification, such as articles, books, specifications, publications, and documents, the entire contents are hereby incorporated by reference into this specification, except for application history documents that are inconsistent with or conflict with the contents of this specification, and except for documents that limit the broadest scope of the claims of this specification (whether currently or later appended to this specification). It should be noted that if the description, definition, and/or use of terms in the materials accompanying this specification is inconsistent with or conflicts with what is stated in this specification, the description, definition, and/or use of terms in this specification shall prevail.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of this specification. Other variations are also possible within the scope of this specification. Thus, by way of example and not limitation, alternative configurations of the embodiments of this specification can be considered consistent with its teachings. Accordingly, the embodiments of this specification are not limited to those explicitly described and depicted herein.

Claims (10)

1. A method of training an image processing model, the method comprising:
obtaining a labeled sample and an unlabeled sample;
performing a first augmentation process and a second augmentation process on the unlabeled sample, respectively, and determining a first augmented sample and a second augmented sample; wherein the second augmentation process changes the characteristics of the unlabeled sample to a greater extent than the first augmentation process;
obtaining a target sample based on the first augmented sample and the labeled sample;
training an image processing model using the target sample; wherein the loss function of the image processing model comprises a first loss function term constructed based on the prediction result of the image processing model trained in the previous round on the second augmented sample; the trained image processing model is used for processing medical images.
2. The method of claim 1, wherein the first loss function term reflects a difference between a predicted value of the unlabeled sample and a sample label corresponding to the first augmented sample, and a difference between the predicted value of the unlabeled sample and the prediction result of the second augmented sample.
3. The method of claim 1, wherein the loss function further comprises a second loss function term reflecting a difference between predicted values of the labeled samples and their corresponding true labels.
4. The method of claim 3, wherein the first loss function term is a minimum mean square error loss function and the second loss function term is a cross entropy loss function.
5. The method of claim 1, wherein the loss function further comprises a consistency regularization term constructed based on global features of the sample images before the augmentation process and global features of their corresponding target samples.
6. The method of claim 1, wherein said obtaining a target sample based on the first augmented sample and the labeled sample comprises:
inputting the first augmented sample into the image processing model trained in the previous round, and determining a prediction result of the first augmented sample;
determining a pseudo label corresponding to the first augmented sample based on the prediction result of the first augmented sample;
and mixing the first augmented sample corresponding to the pseudo label with the labeled sample to determine the target sample.
7. The method of claim 6, further comprising:
performing the first augmentation process on the labeled sample to obtain a third augmented sample;
and mixing the first augmented sample corresponding to the pseudo label with the third augmented sample to determine the target sample.
8. A training system for an image processing model, the system comprising:
the sample acquisition module is used for acquiring a labeled sample and an unlabeled sample;
the augmentation processing module is used for performing a first augmentation process and a second augmentation process on the unlabeled sample, respectively, and determining a first augmented sample and a second augmented sample; wherein the first and second augmentation processes change the characteristics of the unlabeled sample to different degrees;
the target sample acquisition module is used for acquiring a target sample based on the first augmented sample and the labeled sample;
the model training module is used for training an image processing model by using the target sample; wherein the loss function of the image processing model comprises a first loss function term constructed based on the prediction result of the image processing model trained in the previous round on the second augmented sample; the trained image processing model is used for processing medical images.
9. An apparatus for training an image processing model, comprising at least one storage medium and at least one processor, wherein the at least one storage medium stores computer instructions, and the at least one processor is configured to execute the computer instructions to implement the method of training an image processing model according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when read by a computer, implement the method of training an image processing model according to any one of claims 1 to 7.
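
The following Python sketch illustrates one plausible reading of the training step recited in claims 1 to 7; it is not the patented implementation. All identifiers (training_step, weak_augment, strong_augment, unsup_weight, and so on) are hypothetical placeholders, the mixing of pseudo-labeled and labeled samples is shown as simple concatenation because the claims do not specify a mixing operation, and the consistency regularization term of claim 5 is omitted.

# Hypothetical sketch of one training step following one reading of claims 1-7.
# Every identifier below is an illustrative placeholder, not a name from the patent.
import torch
import torch.nn.functional as F


def training_step(model, prev_model, optimizer,
                  labeled_x, labeled_y, unlabeled_x,
                  weak_augment, strong_augment, unsup_weight=1.0):
    """One update of the image processing model on a labeled/unlabeled batch."""
    # First (weaker) and second (stronger) augmentation of the unlabeled samples.
    x_weak = weak_augment(unlabeled_x)      # first augmented sample
    x_strong = strong_augment(unlabeled_x)  # second augmented sample

    with torch.no_grad():
        # Pseudo labels for the first augmented samples come from the image
        # processing model trained in the previous round (claim 6).
        pseudo_labels = prev_model(x_weak).argmax(dim=1)
        # Prediction of the previous-round model on the second augmented
        # samples, used in the first loss function term (claim 1).
        prev_strong_probs = torch.softmax(prev_model(x_strong), dim=1)

    # Target sample: mix the pseudo-labeled first augmented samples with the
    # labeled samples, here simply by concatenation; claim 7 additionally
    # applies the first augmentation process to the labeled samples.
    labeled_x_aug = weak_augment(labeled_x)  # third augmented sample (claim 7)
    target_x = torch.cat([x_weak, labeled_x_aug], dim=0)

    logits = model(target_x)
    n_unlabeled = x_weak.shape[0]
    cur_probs = torch.softmax(logits[:n_unlabeled], dim=1)
    pseudo_onehot = F.one_hot(pseudo_labels, num_classes=cur_probs.shape[1]).float()

    # First loss function term (claims 2 and 4): mean squared error between the
    # prediction for the unlabeled part and (a) the pseudo label of the first
    # augmented sample and (b) the previous-round prediction on the second
    # augmented sample.
    unsup_loss = F.mse_loss(cur_probs, pseudo_onehot) + F.mse_loss(cur_probs, prev_strong_probs)

    # Second loss function term (claims 3 and 4): cross entropy between the
    # prediction for the labeled part and the true labels.
    sup_loss = F.cross_entropy(logits[n_unlabeled:], labeled_y)

    loss = sup_loss + unsup_weight * unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Under this reading, the image processing model trained in the previous round acts as a teacher: its prediction on the first augmented sample supplies the pseudo label, its prediction on the second (more strongly) augmented sample drives the first loss function term as a consistency constraint, and the cross entropy on the labeled samples supplies the second loss function term.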
CN202211520483.9A 2022-11-30 2022-11-30 Training method and system of image processing model Pending CN115760831A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211520483.9A CN115760831A (en) 2022-11-30 2022-11-30 Training method and system of image processing model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211520483.9A CN115760831A (en) 2022-11-30 2022-11-30 Training method and system of image processing model

Publications (1)

Publication Number Publication Date
CN115760831A true CN115760831A (en) 2023-03-07

Family

ID=85341232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211520483.9A Pending CN115760831A (en) 2022-11-30 2022-11-30 Training method and system of image processing model

Country Status (1)

Country Link
CN (1) CN115760831A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188276A (en) * 2023-05-04 2023-05-30 深圳赛陆医疗科技有限公司 Image processing method, image processing apparatus, and storage medium for gene samples


Similar Documents

Publication Publication Date Title
Chen et al. Cephalometric landmark detection by attentive feature pyramid fusion and regression-voting
Kim et al. Comparison of shallow and deep learning methods on classifying the regional pattern of diffuse lung disease
Zhuang et al. An Effective WSSENet-Based Similarity Retrieval Method of Large Lung CT Image Databases.
Hou et al. Explainable DCNN based chest X-ray image analysis and classification for COVID-19 pneumonia detection
Pan et al. Characterization multimodal connectivity of brain network by hypergraph GAN for Alzheimer’s disease analysis
CN111429407B (en) Chest X-ray disease detection device and method based on double-channel separation network
Pan et al. Mitosis detection techniques in H&E stained breast cancer pathological images: A comprehensive review
Huang et al. Applying deep learning in recognizing the femoral nerve block region on ultrasound images
Han et al. Automatic recognition of 3D GGO CT imaging signs through the fusion of hybrid resampling and layer-wise fine-tuning CNNs
Ma et al. Implementation of computer vision technology based on artificial intelligence for medical image analysis
Wang et al. Medical matting: a new perspective on medical segmentation with uncertainty
Lu et al. Learning to segment anatomical structures accurately from one exemplar
CN115661282A (en) Artifact identification method and device and computer readable storage medium
CN115760831A (en) Training method and system of image processing model
Song et al. Classifying tongue images using deep transfer learning
Wang et al. Explainable multitask Shapley explanation networks for real-time polyp diagnosis in videos
Chang et al. Cascading affine and B-spline registration method for large deformation registration of lung X-rays
CN113610746A (en) Image processing method and device, computer equipment and storage medium
Guo et al. Zero shot augmentation learning in internet of biometric things for health signal processing
CN116759076A (en) Unsupervised disease diagnosis method and system based on medical image
Lee et al. Improved classification of brain-tumor mri images through data augmentation and filter application
Liu et al. FEU-Net: Glomeruli region segmentation network based on pseudo-labelling and channel attention mechanisms
He et al. Deep learning features for lung adenocarcinoma classification with tissue pathology images
Singh et al. Preprocessing of Medical Images using Deep Learning: A Comprehensive Review
Xu et al. A Tuberculosis Detection Method Using Attention and Sparse R-CNN.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination