EP3982292A1 - Method for training image recognition model, and method and apparatus for image recognition - Google Patents
- Publication number
- EP3982292A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- image
- predicted probability
- loss function
- recognition model
- image recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/7753—Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/40—ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Definitions
- This application relates to the field of artificial intelligence (AI), and in particular, to an image processing technology.
- medical staff may analyze an illness of a patient by using a medical image. To help the medical staff diagnose the illness more quickly and more accurately, the medical image may be recognized by using an automatic diagnostic device.
- the medical images need to be labeled by the medical staff, that is, the medical staff make a judgment on each medical image according to clinical experience, for example, whether a disease exists in the medical image and where a lesion is positioned in the medical image.
- the embodiments of this application provide a method and an apparatus for training an image recognition model and an image recognition method and apparatus, which can train a model by using a labeled medical image for different tasks and an unlabeled medical image together.
- the labeled image and the unlabeled image are effectively used, so that a requirement for image labeling is reduced and a data volume for training is increased, thereby improving a model prediction effect while saving labeling resources.
- a first aspect of this application provides a method for training an image recognition model, including:
- a second aspect of this application provides an image recognition method, including:
- a third aspect of this application provides an apparatus for training an image recognition model, including:
- a fourth aspect of this application provides an image recognition apparatus, including:
- a fifth aspect of this application provides an electronic device, including: a memory, a transceiver, a processor, and a bus system,
- a sixth aspect of this application provides an endoscope medical diagnosis system, including: a probe, a circuit, a processor, and a display,
- a seventh aspect of this application provides a computer-readable storage medium, storing instructions, the instructions, when run on a computer, causing the computer to perform the method in the first aspect or the second aspect.
- An eighth aspect of this application provides a computer program product, including instructions, the instructions, when run on a computer, causing the computer to perform the method in the first aspect or the second aspect.
- the embodiments of this application provide a method for training an image recognition model.
- Image sets to be trained are obtained first. A first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability are then obtained based on the image sets to be trained by using an image recognition model to be trained. Subsequently, a target loss function is determined according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability. Finally, the image recognition model to be trained is trained based on the target loss function, to obtain an image recognition model.
- a model can be trained by using a labeled medical image for different tasks and an unlabeled medical image together.
- the labeled image and the unlabeled image are effectively used, so that a requirement for image labeling is reduced and a data volume for training is increased, thereby improving a model prediction effect while saving labeling resources.
- the embodiments of this application provide a method and an apparatus for training an image recognition model and an image recognition method and apparatus. A model is trained by using a labeled medical image for different tasks and an unlabeled medical image together, and the labeled image and the unlabeled image are effectively used, so that a requirement for image labeling is reduced and a data volume for training is increased, thereby improving a model prediction effect while saving labeling resources.
- the method for training an image recognition model and the image recognition method provided by this application are applicable to the medical field of artificial intelligence (AI), and are particularly applicable to the field of medical image recognition based on a computer vision (CV) technology.
- the most common medical images in the medical field include, but are not limited to, an endoscope image, an angiography image, an angiocardiographic image, a computerized tomography (CT) image, a B-mode ultrasound image, and a pathology image.
- the medical image can directly reflect a lesion occurring inside a tissue, and is an important basis for a doctor to perform disease diagnosis, and even a final basis of diagnosis of some diseases.
- a cancer diagnosis result is determined by observing a radiographic image of a lesion, which includes observing whether there is a shadow, a plaque, or vasodilation.
- an endoscope image may be recognized, and the recognition is applied to automatic diagnosis of the endoscope image to assist a doctor in improving diagnosis efficiency and accuracy. On this basis, available data of other forms is further used to assist model training and improve model accuracy.
- the medical image is an important information entry for the doctor to learn an illness of a patient.
- although high-quality medical imaging devices have become popular, accurate interpretation of the medical image often requires the doctor to have a professional knowledge background and long-term accumulated experience.
- the population is large, the burden on the medical system is heavy, and the quantity of experienced doctors is insufficient. Such doctors are mainly concentrated in large-scale grade-A tertiary hospitals in first-tier cities, resulting in scarcity of medical resources.
- unlabeled data (that is, semi-supervised learning)
- labeled data (that is, multi-task learning, MTL)
- FIG. 1 is a schematic architectural diagram of an image recognition system according to an embodiment of this application.
- the image recognition system may include a medical device, and the medical device may be specifically an endoscope device, an electron microscope, or the like. After acquiring a medical image to be recognized, the medical device may recognize the medical image according to a task type by using a trained image recognition model.
- recognition may be performed according to different parts (for example, esophagus, stomach, duodenum, and colorectum), or recognition may be performed according to different target tasks (for example, distinguishing benign or malignant, distinguishing parts, or distinguishing whether a picture is qualified), and a visualization result may be obtained finally, to provide a doctor with a focus region.
- the medical device may send the medical image to a terminal device, and the terminal device may recognize the medical image by using the trained image recognition model, to obtain the visualization result, which provides a doctor with a focus region and is displayed on an interface.
- the medical device may send the medical image to a server, and the server recognizes the medical image by using the trained image recognition model.
- the server may feed the result back to the terminal device or the medical device, and the terminal device or the medical device performs displaying.
- the terminal device includes, but is not limited to, a tablet computer, a notebook computer, a palmtop computer, a mobile phone, a speech interaction device, and a personal computer (PC), and is not limited herein.
- FIG. 2 is an overall schematic structural diagram of training an image recognition model according to an embodiment of this application.
- the image recognition model in this application may adopt a deep learning model structure, for example, a residual network (ResNet) structure or a dense convolutional network structure.
- data augmentation and data preprocessing may be performed on training data, and an end-to-end method based on stochastic gradient descent is adopted for training. Alternate training of each task may be selected.
- For alternate training, labeled data of a target task, auxiliary task data in the MTL, and unlabeled data in the semi-supervised learning are inputted sequentially, and a corresponding optimizer is invoked to reduce a corresponding loss value, so as to update parameters of an overlapped part and unique parameters of the target task.
- Alternatively, hybrid training may be selected.
- For hybrid training, an optimizer is invoked after the corresponding loss values are added, thereby reducing a total loss value.
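The difference between the two schedules can be illustrated with a minimal sketch. It is an illustration only: the quadratic toy losses, the single shared parameter `w`, the learning rate, and the function names `alternate_step` and `hybrid_step` are assumptions made for this example, and plain gradient descent stands in for whatever optimizer is actually used.

```python
# Toy setting: each data source pulls one shared parameter w toward its own
# target. Targets, learning rate, and function names are hypothetical.
TARGETS = {"target_task": 1.0, "auxiliary_task": 2.0, "unlabeled": 3.0}
LR = 0.1


def grad(w, target):
    """Gradient of the toy loss (w - target) ** 2 with respect to w."""
    return 2.0 * (w - target)


def alternate_step(w):
    """Alternate training: one update per data source, applied in sequence."""
    for target in TARGETS.values():
        w = w - LR * grad(w, target)
    return w


def hybrid_step(w):
    """Hybrid training: add the losses first, then apply a single update."""
    total_grad = sum(grad(w, target) for target in TARGETS.values())
    return w - LR * total_grad


w_alt = w_hyb = 0.0
for _ in range(200):
    w_alt = alternate_step(w_alt)
    w_hyb = hybrid_step(w_hyb)
```

In this toy setting the hybrid schedule settles at the minimizer of the summed loss, while the alternate schedule settles at a nearby point that depends on the order in which the data sources are visited.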
- an online inference part shown in FIG. 2 may be used for prediction, and the online inference part includes data preprocessing, a network model, and a fully connected layer. In an actual application, the online inference part may further include another network layer. This is merely an example and is not to be understood as a limitation on this application.
- an embodiment of a method for training an image recognition model in this embodiment of this application includes the following steps: 101.
- an apparatus for training an image recognition model obtains image sets to be trained. It may be understood that the apparatus for training an image recognition model may be deployed on the terminal device or may be deployed on the server. Because a data volume for training is usually relatively large, model training may be performed by using the server. However, this is not to be understood as a limitation of this application.
- the image sets to be trained include at least a first image set, a second image set, and a third image set, and each of the first image set, the second image set, and the third image set belongs to a training sample.
- the first image set includes at least one first image (which may be represented as $x_0$)
- the second image set includes at least one second image (which may be represented as $x_{UL}$) and at least one perturbed image (which may be represented as $x_{pert}$)
- the third image set includes at least one third image (which may be represented as $x_1$).
- the first image is a labeled image that carries labeled information and corresponds to a first task
- the second image is an unlabeled image that does not carry the labeled information and corresponds to the first task
- the third image is a labeled image that carries the labeled information and corresponds to a second task.
- the first task and the second task are different tasks.
- the perturbed image is obtained by performing random scrambling on the second image, and a size of the perturbed image is usually the same as a size of the second image.
- the random scrambling includes, but is not limited to, flipping, rotation, and translation. It may be understood that two times of random scrambling may be performed on one second image, that is, one second image may correspond to two perturbed images.
- the perturbed image is usually generated during training.
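As a hedged illustration of random scrambling, the sketch below applies a random flip, rotation, and translation with NumPy while keeping the image size unchanged. The function name `random_perturb`, the `max_shift` parameter, the square image shape, and the use of circular shifting for translation are assumptions of this example and are not specified in the text.

```python
import numpy as np


def random_perturb(image, rng, max_shift=4):
    """Return a randomly flipped, rotated, and translated copy of `image`.

    Assumes a square H x W (x C) array so that 90-degree rotations keep the
    shape. Translation is circular (np.roll), which is one possible choice.
    """
    out = image
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)  # horizontal flip
    # Rotate by 0, 90, 180, or 270 degrees in the image plane.
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))  # translation
    return out.copy()


rng = np.random.default_rng(0)
x_ul = rng.random((32, 32, 3))        # stand-in for one unlabeled second image
x_pert_a = random_perturb(x_ul, rng)  # first random scrambling
x_pert_b = random_perturb(x_ul, rng)  # second random scrambling
```

Because the perturbation is cheap, it can be applied on the fly during training rather than stored in advance.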
- two training processes which are respectively semi-supervised learning and multi-task learning (MTL).
- the first image set and the second image set are used for the semi-supervised learning
- the second predicted probability and the third predicted probability are output results of the semi-supervised learning
- the third image set is used for the MTL
- the fourth predicted probability is an output result of the MTL.
- the semi-supervised learning assists training by using unlabeled data of the same task to improve a model effect.
- the significance of labeling is to determine whether a result of prediction of a current model is correct, so as to serve as an indication for evaluating quality of the model. That is, a target loss function is set, a more accurate current image recognition model to be trained indicates a smaller value of the target loss function, and a model training process is an optimization process of causing the target loss function to obtain a minimum value.
- quality of a model may be evaluated by using a cross entropy loss function.
- the quality of the model cannot be evaluated by using a label. Therefore, the same picture may be inputted into a network after two times of random disturbance, and a difference between two prediction results is determined by using a consistency constraint loss function.
- the model training is to reduce the difference between the two prediction results.
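A consistency constraint can be sketched as a distance between two predicted probability distributions for the same picture. The exact form of the consistency constraint loss function is not given at this point in the text, so the mean squared difference used below is an assumption, as are the names `softmax` and `consistency_loss` and the example logits.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax that turns logits into predicted probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def consistency_loss(p_a, p_b):
    """Assumed form: mean squared difference between two prediction arrays."""
    return float(np.mean((p_a - p_b) ** 2))


# Hypothetical logits for one picture after two different random disturbances.
logits_a = np.array([[2.0, 0.5, -1.0]])
logits_b = np.array([[1.5, 0.7, -0.8]])
p_a, p_b = softmax(logits_a), softmax(logits_b)

loss_different = consistency_loss(p_a, p_b)  # positive when predictions differ
loss_identical = consistency_loss(p_a, p_a)  # zero when predictions agree
```

No label appears anywhere in this computation, which is why the term can be evaluated on unlabeled images.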
- the MTL assists training by using a labeled data set in another related task, to improve the model effect.
- a model is independently trained for each task, but in an MTL method, a plurality of related tasks may be trained at the same time by using one network model. Some parameters of the network model are shared by the tasks, and some other parameters of the network model are unique to each task.
- the apparatus for training an image recognition model determines a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, the target loss function including at least a first loss function, a second loss function, and a third loss function, the first loss function being determined according to the first predicted probability, the second loss function being determined according to the second predicted probability and the third predicted probability, and the third loss function being determined according to the fourth predicted probability.
- the apparatus for training an image recognition model determines a first loss function according to the first predicted probability and labeled information corresponding to the first image set, the first predicted probability being a predicted value, and the labeled information corresponding to the first image set being a real value, and calculates the first loss function based on the predicted value and the real value.
- the apparatus for training an image recognition model determines a second loss function according to the second predicted probability and the third predicted probability, both the second predicted probability and the third predicted probability being predicted values.
- the apparatus for training an image recognition model determines a third loss function according to the fourth predicted probability and labeled information corresponding to the third image set, the fourth predicted probability being a predicted value, and the labeled information corresponding to the third image set being a real value, and calculates the third loss function based on the predicted value and the real value.
- a target loss function may be obtained according to the first loss function, the second loss function, and the third loss function.
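One way to combine the three losses is sketched below. The text only states that the target loss function is obtained from the three loss functions, so the weighted sum, the weights `w_con` and `w_mtl`, the mean squared consistency term, and all example values are assumptions of this sketch.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Mean cross entropy between predicted probabilities and integer labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def consistency(p_a, p_b):
    """Assumed consistency term: mean squared difference of two predictions."""
    return float(np.mean((p_a - p_b) ** 2))


def target_loss(p1, y_first, p2, p3, p4, y_third, w_con=1.0, w_mtl=1.0):
    """Weighted sum of the three losses; the weights are hypothetical."""
    first = cross_entropy(p1, y_first)   # labeled images of the first task
    second = consistency(p2, p3)         # unlabeled images, two perturbations
    third = cross_entropy(p4, y_third)   # labeled images of the second task
    return first + w_con * second + w_mtl * third


# Hypothetical predicted probabilities and labels.
p1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
y_first = np.array([0, 1])
p2 = np.array([[0.6, 0.3, 0.1]])
p3 = np.array([[0.5, 0.4, 0.1]])
p4 = np.array([[0.9, 0.1]])
y_third = np.array([0])

total = target_loss(p1, y_first, p2, p3, p4, y_third)
```

Only the first and third terms need real values (labels); the second term compares two predicted values with each other.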
- when the target loss function converges, it indicates that training of the image recognition model to be trained is completed.
- in this case, the image recognition model to be trained becomes the image recognition model. It may be understood that in an actual application, the target loss function may also be considered to have converged when a quantity of times of training reaches a threshold.
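The two stopping conditions can be sketched as a small helper. The function name `should_stop`, the window size, the tolerance, and the step budget are hypothetical values chosen for illustration, and the plateau test is only one possible definition of convergence.

```python
def should_stop(loss_history, step, max_steps=10000, window=5, tol=1e-4):
    """Stop when the step budget is reached or the loss has stopped changing.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    if step >= max_steps:
        return True  # quantity of times of training reached the threshold
    if len(loss_history) < window + 1:
        return False  # not enough history to judge convergence
    recent = loss_history[-(window + 1):]
    return max(recent) - min(recent) < tol


flat = [0.5000, 0.50001, 0.50002, 0.50001, 0.5000, 0.50001]
falling = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3]
```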
- the obtaining a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained may include:
- the apparatus for training an image recognition model inputs a second image set into the image recognition model to be trained.
- the second image set includes a second image and a perturbed image. It is assumed that first random scrambling is performed on a second image A to obtain a perturbed image A, and second random scrambling is performed on the second image A to obtain a perturbed image B. Therefore, the apparatus for training an image recognition model first inputs the second image A and the perturbed image A into the image recognition model to be trained, and the image recognition model to be trained outputs a second predicted probability.
- the apparatus for training an image recognition model inputs the second image A and the perturbed image B into the image recognition model to be trained, the image recognition model to be trained outputs a third predicted probability, and two predicted probabilities are obtained respectively through two predictions.
- two times of random scrambling may be performed on each second image.
- FIG. 4 is a schematic diagram of an embodiment of performing training based on an unlabeled sample according to an embodiment of this application.
- a sample set of the second image includes at least one second image.
- First random scrambling is first performed on each second image in the sample set of the second image, to obtain a sample set of the perturbed image A.
- second random scrambling is performed on each second image in the sample set of the second image, to obtain a sample set of the perturbed image B.
- Both the sample set of the second image and the sample set of the perturbed image A are inputted into the image recognition model to be trained, to obtain a second predicted probability corresponding to each sample.
- Both the sample set of the second image and the sample set of the perturbed image B are inputted into the image recognition model to be trained, to obtain a third predicted probability corresponding to each sample.
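The batch flow for unlabeled samples can be sketched as follows. This is a simplification under stated assumptions: a random linear layer with softmax stands in for the image recognition model to be trained, only the perturbed copies are passed through it, and the names `predict` and `scramble`, the image size, and the class count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 4
# Toy stand-in for the model to be trained: a single random linear layer.
weights = rng.normal(scale=0.01, size=(16 * 16, NUM_CLASSES))


def predict(batch):
    """Flatten each image, apply the linear layer, then a row-wise softmax."""
    logits = batch.reshape(len(batch), -1) @ weights
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def scramble(batch):
    """Apply an independent random flip and 90-degree rotation to each image."""
    out = []
    for img in batch:
        if rng.random() < 0.5:
            img = np.fliplr(img)
        out.append(np.rot90(img, k=int(rng.integers(0, 4))))
    return np.stack(out)


second_images = rng.random((8, 16, 16))  # sample set of unlabeled second images
perturbed_a = scramble(second_images)    # sample set of perturbed image A
perturbed_b = scramble(second_images)    # sample set of perturbed image B
pred_a = predict(perturbed_a)            # one predicted probability per sample
pred_b = predict(perturbed_b)            # a second predicted probability per sample
```

The two prediction arrays have one row per sample, so a consistency term can compare them row by row.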
- the apparatus for training an image recognition model further inputs a first image set into the image recognition model to be trained.
- the first image set includes a first image
- the first image is a labeled image.
- the apparatus for training an image recognition model further inputs a third image set into the image recognition model to be trained.
- the third image set includes a third image
- the third image is similar to the first image and is also a labeled image. The difference is that the first image set in which the first image is located and the third image set in which the third image is located correspond to different learning tasks.
- the first image set is labeled for a lesion positioning task, that is, content labeled in the first image is a position of a lesion, for example, the lesion is in the esophagus, stomach, duodenum, colorectum, or the like.
- the third image set is labeled for a tumor property task, that is, content labeled in the third image is a tumor property such as a malignant tumor or a benign tumor. It may be understood that in an actual application, other different tasks may be further set according to a requirement. This is merely an example and is not to be understood as a limitation on this application.
- FIG. 5 is a schematic diagram of an embodiment based on MTL according to an embodiment of this application.
- the MTL assists training by using another related labeled data set, to improve a model effect.
- a model is independently trained for each task, but in an MTL method, a plurality of related tasks may be trained at the same time by using one network model. Some parameters of the network model are shared by the tasks, and some other parameters of the network model are unique to each task.
- prediction results under four different tasks are outputted by using the image recognition model to be trained, parameters are shared among different tasks, and all data sets of all tasks are used, so that a data volume for training is increased.
- the MTL has a plurality of forms, including, but not limited to, joint learning, learning to learn, and learning with an auxiliary task. Generally, optimizing a plurality of loss functions is equivalent to performing the MTL. Even if only one loss function is optimized, an original task model may be improved by using an auxiliary task.
- the MTL provided in this application may be implemented based on parameter hard sharing, or may be implemented based on parameter soft sharing.
- the parameter hard sharing is typically implemented by sharing a hidden layer between all tasks while preserving output layers of several specific tasks. In the parameter soft sharing, each task has a separate model, and each model includes a respective parameter.
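As a rough illustration of parameter hard sharing, the following minimal Python sketch shares one hidden layer between two tasks and keeps a separate output layer for each task. The layer sizes, weight values, and names (`SHARED_W`, `TASK_HEADS`, `forward`) are hypothetical and chosen only for illustration; they are not taken from this application.

```python
import math

def linear(weights, bias, x):
    """Multiply a weight matrix (list of rows) by an input vector and add a bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(z):
    shift = max(z)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical shared hidden layer (3 inputs -> 2 hidden units), used by every task.
SHARED_W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
SHARED_B = [0.0, 0.1]

# Hypothetical task-specific output layers: parameters unique to each task.
TASK_HEADS = {
    "lesion_position": ([[0.2, -0.1], [0.4, 0.3], [-0.3, 0.6], [0.1, 0.1], [0.0, -0.2]], [0.0] * 5),
    "tumor_property": ([[0.7, -0.4], [-0.2, 0.5], [0.1, 0.1]], [0.0] * 3),
}

def forward(x, task):
    hidden = relu(linear(SHARED_W, SHARED_B, x))  # shared parameters
    head_w, head_b = TASK_HEADS[task]             # task-specific parameters
    return softmax(linear(head_w, head_b, hidden))

features = [1.0, 0.5, -1.0]
p_position = forward(features, "lesion_position")  # 5 class probabilities
p_property = forward(features, "tumor_property")   # 3 class probabilities
```

Both calls pass through the same hidden layer, so a gradient from either task would update the shared weights, while each output layer is only affected by its own task.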
- a method for obtaining the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability is provided.
- the second predicted probability and the third predicted probability are obtained based on the second image set by using the semi-supervised learning
- the fourth predicted probability is obtained based on the third image set by using the MTL.
- a plurality of related tasks can be further trained at the same time by using one image recognition model, some parameters of the image recognition model are shared by various tasks, and some other parameters are unique to each task.
- Shared parameters use all data sets of all tasks, so that a data volume for training is increased, and meanwhile unique noise of each training set is canceled, thereby improving a generalization ability of the model, and reducing overfitting of the model.
- An independent output layer may select a most relevant feature for a task from a shared part, and learn a unique classification boundary of each task, so that the model has sufficient flexibility, and can obtain relatively high accuracy for an image recognition task.
- the obtaining the first predicted probability based on the first image set by using the image recognition model to be trained may include:
- one first image in the first image set is used as an example for description below. It may be understood that other first images in the first image set are also processed in a similar manner, and details are not described herein again.
- the first image is represented as x_0
- labeled information of the first image is y_0
- the labeled information is used for representing a classification label under a classification task
- the classification task is a lesion positioning task
- the classification label may be different parts.
- a label 1 represents an esophagus part
- a label 2 represents a stomach
- a label 3 represents a duodenal part
- a label 4 represents a colorectal part
- a label 5 represents no type.
- the classification task is a task of distinguishing tumor properties
- the classification label may be different degrees of tumor progression.
- a label 1 represents a benign tumor
- a label 2 represents a malignant tumor
- a label 3 represents no tumor.
- the classification task is a task of distinguishing qualified conditions of a picture
- the classification label may be different picture qualification conditions.
- a label 1 represents that the picture is qualified
- a label 2 represents that the picture is not qualified.
- a first predicted value is outputted after the first image x_0, which belongs to the first task, passes through a fully connected (FC) layer, the first predicted value being represented as z_0, and the first predicted probability p_0 of the first image is obtained after the first predicted value z_0 passes through a softmax layer, that is, after normalization processing is performed.
- the last layer of the image recognition model to be trained may be the FC layer+the softmax layer.
- the FC layer multiplies a weight matrix by an input vector and then adds a bias, mapping N real numbers into K scores, and the softmax layer maps the K scores into K probabilities within a range (0, 1) and ensures that a sum of the K probabilities is 1.
- the method for generating the first predicted probability is provided, that is, first, a first predicted value of the first image is obtained by using an FC layer included in the image recognition model to be trained, and then normalization processing is performed on the first predicted value of the first image, to obtain the first predicted probability of the first image.
- a prediction class of a sample can be reflected more intuitively, thereby improving the accuracy of training sample classification and improving the model training efficiency and accuracy.
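The FC layer followed by softmax normalization can be sketched as follows. This is a minimal pure-Python illustration; the feature vector, weight matrix, and bias are made-up values, and the function names are hypothetical.

```python
import math

def fully_connected(weights, bias, x):
    """Map N inputs to K scores: multiply the K x N weight matrix by x, then add the bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    """Normalize K scores into K probabilities in (0, 1) that sum to 1."""
    shift = max(z)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical feature vector (N = 4) and FC parameters for K = 3 classes.
x0_features = [0.2, -1.0, 0.5, 1.5]
W = [[0.1, 0.0, -0.3, 0.2],
     [0.4, 0.1, 0.0, -0.1],
     [-0.2, 0.3, 0.5, 0.6]]
b = [0.0, 0.1, -0.1]

z0 = fully_connected(W, b, x0_features)  # first predicted value
p0 = softmax(z0)                         # first predicted probability
predicted_label = p0.index(max(p0)) + 1  # labels are numbered from 1 in the text
```

With these illustrative values the third score is the largest, so the prediction label is the label 3.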
- the obtaining the second predicted probability and the third predicted probability based on the second image set by using the image recognition model to be trained may include:
- the apparatus for training an image recognition model obtains at least one second image, the second image herein being an unlabeled image. Subsequently, random scrambling is performed twice on each second image. A first perturbed image set is obtained after the first random scrambling, the first perturbed image set including at least one first perturbed image, that is, each first perturbed image corresponds to a second image.
- a second perturbed image set is obtained after second random scrambling, the second perturbed image set including at least one second perturbed image, that is, each second perturbed image corresponds to a second image, and a quantity of second perturbed images being usually the same as a quantity of first perturbed images.
- the at least one second image and the first perturbed image set are inputted into the image recognition model to be trained, to obtain the second predicted probability.
- 1000 second images and 1000 first perturbed images may be inputted into the image recognition model to be trained, or 100 second images and 100 first perturbed images may be inputted into the image recognition model to be trained.
- a quantity of second images is not limited herein.
- the at least one second image and the second perturbed image set are inputted into the image recognition model to be trained, to obtain the third predicted probability.
- the second predicted probability may be the same as or different from the third predicted probability.
- a result outputted by the image recognition model to be trained may be a predicted value, and a predicted probability may be obtained after normalization processing is performed on the predicted value.
- Data augmentation needs to be performed on the second image during random scrambling, and in addition to performing flipping, rotation, and translation on the second image, a direction, a position, a proportion, a brightness, or the like of the second image may be changed.
- a random factor such as a random dropout may be added to the image recognition model to be trained.
- the dropout is a method for optimizing an artificial neural network with a deep structure, in which some weights or outputs of a hidden layer are randomly set to zero during learning, to reduce interdependence between nodes, thereby achieving regularization of the neural network.
- when the perturbation is random noise, the random scrambling process may be referred to as a Pi-model.
- when the perturbation is an adversarial perturbation, the random scrambling process may be referred to as virtual adversarial training (VAT).
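The random dropout mentioned above can be sketched in a few lines of Python. The scaling of surviving values by 1 / (1 - rate), often called inverted dropout, is an assumption of this sketch and is not stated in this application; the function name and values are hypothetical.

```python
import random

def dropout(values, rate, rng):
    """Randomly set each value to zero with probability `rate` (training time only).

    Surviving values are scaled by 1 / (1 - rate); this scaling is an
    assumption of the sketch, not something described in the text.
    """
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
hidden = [0.3, 1.2, -0.7, 0.9, 0.5, -0.1]
dropped = dropout(hidden, rate=0.5, rng=rng)
```

Because a different subset of outputs is zeroed on each call, two forward passes over the same image generally yield slightly different predictions, which is the source of randomness that the consistency-based training relies on.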
- the data processing manner based on semi-supervised learning is provided, that is, two times of random scrambling are performed on a second image, to obtain a first perturbed image and a second perturbed image, and then the second image and each of the first perturbed image and the second perturbed image form two training samples to be inputted into a model, to obtain two predicted probabilities.
- random scrambling is performed on an unlabeled image, to obtain images with different perturbation degrees as samples for model training, and manual intervention is not required during random scrambling, thereby improving the model training efficiency.
- randomized processing can improve the generalization ability of the model, thereby improving a model training effect.
- the semi-supervised learning avoids waste of data and resources, and resolves problems that a generalization ability of a model of fully supervised learning is not strong and a model of unsupervised learning is inaccurate.
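The following minimal Python sketch illustrates one possible reading of the steps above, in the style of a Pi-model: an unlabeled image is randomly scrambled twice, and each perturbed copy is passed through the model to obtain a predicted probability. `toy_model` is a hypothetical stand-in for the image recognition model to be trained, the perturbation (random flip plus Gaussian noise) is only one of the options listed above, and feeding only the perturbed copies to the model is a simplifying assumption.

```python
import math
import random

def softmax(z):
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def random_scramble(image, rng, noise_std=0.05):
    """Return a perturbed copy: random horizontal flip plus additive Gaussian noise."""
    flip = rng.random() < 0.5
    rows = [list(reversed(row)) if flip else list(row) for row in image]
    return [[pixel + rng.gauss(0.0, noise_std) for pixel in row] for row in rows]

def toy_model(image):
    """Hypothetical stand-in for the image recognition model to be trained."""
    flat = [pixel for row in image for pixel in row]
    mean = sum(flat) / len(flat)
    contrast = max(flat) - min(flat)
    return softmax([mean, contrast, 1.0 - mean])

rng = random.Random(42)  # fixed seed so the sketch is reproducible
second_image = [[0.1, 0.4, 0.8], [0.3, 0.5, 0.9]]  # unlabeled image

first_perturbed = random_scramble(second_image, rng)   # first random scrambling
second_perturbed = random_scramble(second_image, rng)  # second random scrambling

second_predicted_probability = toy_model(first_perturbed)
third_predicted_probability = toy_model(second_perturbed)
```

No label is consulted anywhere in this sketch, which is why unlabeled images can contribute to training.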
- the obtaining the fourth predicted probability based on the third image set by using the image recognition model to be trained may include:
- the method for generating the fourth predicted probability is described.
- one third image in the third image set is used as an example for description below. It may be understood that other third images in the third image set are also processed in a similar manner, and details are not described herein again.
- the third image is represented as x_1
- labeled information of the third image is y_1.
- the labeled information is used for representing a classification label under a classification task, for example, the classification task is a lesion positioning task, and the classification label may be different parts.
- a label 1 represents an esophagus part
- a label 2 represents a stomach
- a label 3 represents a duodenal part
- a label 4 represents a colorectal part
- a label 5 represents no type.
- the classification task is a task of distinguishing tumor properties
- the classification label may be different degrees of tumor progression.
- a label 1 represents a benign tumor
- a label 2 represents a malignant tumor
- a label 3 represents no tumor.
- the classification task is a task of distinguishing qualified conditions of a picture
- the classification label may be different picture qualification conditions.
- a label 1 represents that the picture is qualified
- a label 2 represents that the picture is not qualified.
- the labeled information of the third image belongs to the second task
- the labeled information of the first image belongs to the first task, and the two tasks are different.
- a second predicted value is outputted after the third image x_1 belonging to the second task passes through the FC layer, the second predicted value being represented as z_1, and the fourth predicted probability p_1 of the third image is obtained after the second predicted value z_1 passes through the softmax layer, that is, normalization processing is implemented.
- the last layer of the image recognition model to be trained may be the FC layer+the softmax layer.
- the FC layer multiplies a weight matrix by an input vector and then adds a bias, mapping N real numbers into K scores, and the softmax layer maps the K scores into K probabilities within a range (0, 1) and ensures that a sum of the K probabilities is 1.
- the method for generating the fourth predicted probability is provided, that is, first, a second predicted value of the third image is obtained by using an FC layer included in the image recognition model to be trained, and then normalization processing is performed on the second predicted value of the third image, to obtain the fourth predicted probability of the third image.
- a prediction class of a sample can be reflected more intuitively, thereby improving the accuracy of training sample classification and improving the model training efficiency and accuracy.
- the determining a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability may include:
- the apparatus for training an image recognition model calculates the first loss function L_CE according to the first predicted probability and labeled information corresponding to the first image set.
- the apparatus for training an image recognition model calculates the second loss function L_Con according to at least one second predicted probability and at least one third predicted probability.
- the apparatus for training an image recognition model calculates the third loss function L_MTL according to the fourth predicted probability and labeled information corresponding to the third image set.
- the target loss function further includes an entropy loss function L_Ent and a regularization loss function L_Reg.
- Minimizing the entropy loss function allows the model to predict a specific class for a particular task with more certainty, rather than considering several classes all possible, entropy representing an expectation of an amount of information over the classes.
- the regularization loss function may be added to the target loss function. It may be understood that the regularization loss function includes, but is not limited to, an L1 regularization loss function and an L2 regularization loss function.
- the regularization loss function may be considered as a penalty term of the target loss function.
- Each item may have different weights (that is, a weight may be a constant value or dynamically changed) when being superimposed. Generally, the weights need to be adjusted according to different tasks and different data sets.
- the specific content of the target loss function is provided, that is, the target loss function includes the first loss function, the second loss function, the third loss function, the entropy loss function, and the regularization loss function.
- the model is trained in different dimensions by using loss functions of different types, thereby improving the model training accuracy.
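The superposition of the five loss terms can be sketched as a weighted sum. This is a minimal Python illustration under stated assumptions: the exact formula is not reproduced in this text, so the weighted-sum form, the weight values, the use of an L2 penalty for the regularization term, and all function names are hypothetical.

```python
import math

def entropy_loss(probabilities, eps=1e-12):
    """Mean entropy of predicted distributions; lower means more confident predictions."""
    per_sample = [-sum(p * math.log(p + eps) for p in dist) for dist in probabilities]
    return sum(per_sample) / len(per_sample)

def l2_regularization(parameters):
    """L2 penalty: sum of squared parameters (an L1 penalty would sum absolute values)."""
    return sum(w * w for w in parameters)

def target_loss(l_ce, l_con, l_mtl, l_ent, l_reg, weights):
    """Weighted superposition of the five loss terms."""
    return (weights["ce"] * l_ce + weights["con"] * l_con + weights["mtl"] * l_mtl
            + weights["ent"] * l_ent + weights["reg"] * l_reg)

# Hypothetical weights; in practice they are tuned per task and per data set.
weights = {"ce": 1.0, "con": 0.5, "mtl": 1.0, "ent": 0.1, "reg": 0.01}

l_ent = entropy_loss([[0.1, 0.2, 0.7], [0.2, 0.8]])
l_reg = l2_regularization([0.5, -0.2, 0.1])
total = target_loss(l_ce=0.357, l_con=0.02, l_mtl=1.609,
                    l_ent=l_ent, l_reg=l_reg, weights=weights)
```

A weight may be held constant or changed during training, for example by ramping up the consistency weight over the first epochs; such a schedule is a common practice and not something this text specifies.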
- the first predicted probability is a probability generated for a task of labeling a degree of tumor degradation
- a label 1 represents a benign tumor
- a label 2 represents a malignant tumor
- a label 3 represents no tumor.
- a first predicted probability of a first image is (0.1, 0.2, 0.7), that is, a prediction label of the first image is obtained as the label 3, which is a label of no tumor.
- Labeling processing has been performed on the first image, so that labeled information y_0, that is, a real label, may be obtained.
- assuming that the real label is the label 3, the probability distribution corresponding to the label 3 is (0, 0, 1).
- a distance between distributions of two probabilities is described by using the cross entropy loss function, and a smaller cross entropy indicates that the two probabilities are closer.
- An objective of the model training is to make the distributions of a predicted probability and a real probability closer.
- the calculation manner of the first loss function is provided.
- a specific implementation basis is provided for generation of the first loss function, thereby improving the feasibility and operability of the model training.
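The cross entropy for the example above can be checked with a short Python sketch, using the predicted probability (0.1, 0.2, 0.7) and a real label of 3, that is, the one-hot distribution (0, 0, 1). The standard cross entropy form is assumed here because the formula itself is not reproduced in this text; the function name is hypothetical.

```python
import math

def cross_entropy(predicted, true_label, eps=1e-12):
    """Cross entropy between a predicted distribution and a one-hot real label (1-based)."""
    one_hot = [1.0 if i == true_label - 1 else 0.0 for i in range(len(predicted))]
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, predicted))

p0 = [0.1, 0.2, 0.7]
loss_label3 = cross_entropy(p0, true_label=3)  # equals -ln(0.7), about 0.357
loss_label1 = cross_entropy(p0, true_label=1)  # equals -ln(0.1), about 2.303
```

The loss is small when the real label is the one the model favors (the label 3) and large when the real label is one the model considers unlikely (the label 1), which matches the statement that a smaller cross entropy indicates closer distributions.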
- the calculating the second loss function according to the second predicted probability and the third predicted probability may include:
- the apparatus for training an image recognition model may calculate a second loss function according to a second predicted probability and a third predicted probability that are obtained through prediction.
- the second loss function may be a mean-square error (MSE) loss function or may be a Kullback-Leibler (KL) divergence loss function. It may be understood that in an actual application, the second loss function may alternatively be a loss function of another type, and the MSE loss function and the KL divergence loss function are used as examples herein for description.
- the second loss function is the MSE loss function
- the second loss function is the KL divergence loss function
- the second predicted probability and the third predicted probability may be outputted in the same training. Therefore, the second predicted probability may be alternatively represented as p_0, and p_r represents the third predicted probability. Similarly, the third predicted probability p_r is obtained after normalization processing is performed on a predicted value z_r. Alternatively, the second predicted probability and the third predicted probability may be outputted in different times of training.
- the second loss function may be specifically a consistency loss function, and a smaller second loss function indicates that results of two predictions are closer, that is, an effect of model training is better, and minimizing the second loss function allows two predicted values to be consistent.
- the calculation manner of the second loss function is provided.
- a specific implementation basis is provided for generation of the second loss function, thereby improving the feasibility and operability of the model training.
- an appropriate second loss function may be further selected for calculation according to a requirement, thereby improving the flexibility of the solution.
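Both options for the second loss function can be sketched in Python. The formulas are not reproduced in this text, so the standard definitions are assumed: MSE averaged over classes, and KL divergence from p_0 to p_r. The example probabilities and function names are hypothetical.

```python
import math

def mse_loss(p, q):
    """Mean-square error between two predicted distributions, averaged over classes."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

def kl_divergence(p, q, eps=1e-12):
    """KL divergence KL(p || q); zero when the two distributions are identical."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))

p_0 = [0.1, 0.2, 0.7]  # second predicted probability
p_r = [0.2, 0.2, 0.6]  # third predicted probability

l_con_mse = mse_loss(p_0, p_r)
l_con_kl = kl_divergence(p_0, p_r)
```

Either value shrinks toward zero as the two predictions agree, so minimizing it pushes the model toward consistent outputs under perturbation. Note that KL divergence is asymmetric, so swapping the arguments generally gives a different value, whereas MSE is symmetric.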
- the fourth predicted probability is a probability generated for a task of labeling a qualified condition of a picture
- a label 1 represents that a picture is qualified
- a label 2 represents that a picture is not qualified.
- a fourth predicted probability of a third image is (0.2, 0.8), that is, a prediction label of the third image is obtained as the label 2, which is a label in which the picture is not qualified.
- Labeling processing has been performed on the third image, so that labeled information y_1, that is, a real label, may be obtained. Assuming that the real label is the label 1, the probability distribution corresponding to the label 1 is (1, 0).
- a distance between distributions of two probabilities is described by using the cross entropy loss function, and a smaller cross entropy indicates that the two probabilities are closer.
- An objective of the model training is to expect that distributions of a predicted probability and a real probability are closer.
- the calculation manner of the third loss function is provided.
- a specific implementation basis is provided for generation of the third loss function, thereby improving the feasibility and operability of the model training.
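For the example above, the third loss function can be worked out by hand. The standard cross entropy form is assumed here, since the formula itself is not reproduced in this text; the index k over classes is introduced only for this illustration.

```latex
L_{MTL} = -\sum_{k} y_{1,k} \,\ln p_{1,k}
        = -\left( 1 \cdot \ln 0.2 + 0 \cdot \ln 0.8 \right)
        = -\ln 0.2 \approx 1.609
```

Had the model instead predicted (0.8, 0.2) for the same real label, the loss would be -ln 0.8, approximately 0.223, which illustrates that the loss decreases as the predicted distribution approaches the real one.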
- an embodiment of an image recognition method in the embodiments of this application includes the following steps.
- an image recognition apparatus obtains an image to be recognized.
- the image to be recognized may be an endoscope image or may be a medical image of another type. This is not limited herein.
- the image recognition apparatus may be deployed in the server or may be deployed in the terminal device.
- an example in which the image recognition apparatus is deployed in the terminal device is used for description, but is not to be understood as a limitation to this application.
- the image recognition apparatus inputs the image to be recognized into the image recognition model described in the foregoing embodiments, and the image recognition model outputs a corresponding image recognition result.
- the image recognition apparatus may display the image recognition result.
- FIG. 7 is a schematic diagram of an interface of displaying an image recognition result according to an embodiment of this application.
- an inputted medical image is used as an example, and a doctor may select a corresponding task according to a requirement. Assuming that a task A, that is, a task of positioning a lesion part, is selected, a corresponding result is outputted based on the task A selected by the doctor, for example, a positioned lesion part is "stomach".
- assuming that a task B, that is, a task of detecting a property of a tumor, is selected, a corresponding result is outputted based on the task B selected by the doctor, for example, a property of a tumor is detected as "benign".
- assuming that a task C, that is, a task of determining a qualified condition of an image, is selected, a corresponding result is outputted based on the task C selected by the doctor, for example, a qualified condition of an image is "qualified".
- the image recognition method is provided, that is, an image to be recognized is obtained first, then the image to be recognized is inputted into a trained image recognition model, the image recognition model outputs an image recognition result, and finally the image recognition result is displayed.
- a recognition result under a corresponding task may be displayed according to a requirement, to assist a doctor in diagnosis, thereby more effectively helping the doctor reduce misdiagnosis and missed diagnosis, especially a doctor who lacks relevant clinical experience.
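The task selection and result display described above can be sketched as follows. This is a minimal Python illustration: the per-task probabilities stand in for the output of a trained image recognition model, and the names `TASK_LABELS`, `recognize`, and `model_output` are hypothetical.

```python
# Label names per task, following the examples given in the text.
TASK_LABELS = {
    "A": ("lesion position", ["esophagus", "stomach", "duodenum", "colorectum", "no type"]),
    "B": ("tumor property", ["benign", "malignant", "no tumor"]),
    "C": ("image qualification", ["qualified", "not qualified"]),
}

def recognize(task_probabilities, selected_task):
    """Return the task name and the most probable label for the task the doctor selected."""
    task_name, labels = TASK_LABELS[selected_task]
    probs = task_probabilities[selected_task]
    best = max(range(len(probs)), key=probs.__getitem__)
    return task_name, labels[best]

# Hypothetical per-task outputs of a trained model for one image to be recognized.
model_output = {
    "A": [0.05, 0.80, 0.05, 0.05, 0.05],
    "B": [0.70, 0.20, 0.10],
    "C": [0.90, 0.10],
}

result_a = recognize(model_output, "A")
result_b = recognize(model_output, "B")
result_c = recognize(model_output, "C")
```

With these illustrative probabilities, the three results correspond to the "stomach", "benign", and "qualified" examples given above.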
- FIG. 8 is a schematic diagram of an embodiment of an apparatus for training an image recognition model according to an embodiment of this application.
- An apparatus 30 for training an image recognition model includes:
- the embodiments of this application provide an apparatus for training an image recognition model.
- Image sets to be trained are obtained first, then a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability are obtained based on the image sets to be trained by using an image recognition model to be trained, subsequently, a target loss function is determined according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, and finally the image recognition model to be trained is trained based on the target loss function, to obtain an image recognition model.
- a model is trained by using a labeled medical image for different tasks and an unlabeled medical image together. The labeled image and the unlabeled image are effectively used, so that a requirement for image labeling is reduced and a data volume for training is increased, thereby improving a model prediction effect while saving labeling resources.
- the obtaining module 301 is further configured to:
- a method for obtaining the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability is provided.
- the second predicted probability and the third predicted probability are obtained based on the second image set by using the semi-supervised learning
- the fourth predicted probability is obtained based on the third image set by using the MTL.
- a plurality of related tasks can be further trained at the same time by using one image recognition model, some parameters of the image recognition model are shared by various tasks, and some other parameters are unique to each task.
- Shared parameters use all data sets of all tasks, so that a data volume for training is increased, and meanwhile unique noise of each training set is canceled, thereby improving a generalization ability of the model, and reducing overfitting of the model.
- An independent output layer may select a most relevant feature for a task from a shared part, and learn a unique classification boundary of each task, so that the model has sufficient flexibility, and can obtain relatively high accuracy for an image recognition task.
- a method for generating the first predicted probability is provided, that is, first, a first predicted value of the first image is obtained by using an FC layer included in the image recognition model to be trained, and then normalization processing is performed on the first predicted value of the first image, to obtain the first predicted probability of the first image.
- a prediction class of a sample can be reflected more intuitively, thereby improving the accuracy of training sample classification and improving the model training efficiency and accuracy.
- the obtaining module 301 is further configured to:
- the data processing manner based on semi-supervised learning is provided, that is, two times of random scrambling are performed on a second image, to obtain a first perturbed image and a second perturbed image, and then the second image and each of the first perturbed image and the second perturbed image form two training samples to be inputted into a model, to obtain two predicted probabilities.
- random scrambling can be effectively performed on an unlabeled image, to obtain images with different perturbed degrees as samples for model training, and manual intervention is not required during random scrambling, thereby improving the model training efficiency.
- randomized processing can improve a generalization ability of a model, thereby improving a model training effect.
- the semi-supervised learning avoids waste of data and resources, and resolves problems that a generalization ability of a model of fully supervised learning is not strong and a model of unsupervised learning is inaccurate.
- the obtaining module 301 is further configured to:
- a method for generating the fourth predicted probability is provided, that is, first, a second predicted value of the third image is obtained by using the FC layer included in the image recognition model to be trained, and then normalization processing is performed on the second predicted value of the third image, to obtain the fourth predicted probability of the third image.
- a prediction class of a sample can be reflected more intuitively, thereby improving the accuracy of training sample classification and improving the model training efficiency and accuracy.
- the determining module 302 is further configured to:
- the specific content of the target loss function is provided, that is, the target loss function includes the first loss function, the second loss function, the third loss function, the entropy loss function, and the regularization loss function.
- the model is trained in different dimensions by using loss functions of different types, thereby improving the model training accuracy.
- the calculation manner of the first loss function is provided.
- a specific implementation basis is provided for generation of the first loss function, thereby improving the feasibility and operability of the model training.
- the determining module 302 is further configured to:
- the calculation manner of the second loss function is provided.
- a specific implementation basis is provided for generation of the second loss function, thereby improving the feasibility and operability of the model training.
- an appropriate second loss function may be further selected for calculation according to a requirement, thereby improving the flexibility of the solution.
- the calculation manner of the third loss function is provided.
- a specific implementation basis is provided for generation of the third loss function, thereby improving the feasibility and operability of the model training.
- FIG. 9 is a schematic diagram of an embodiment of an image recognition apparatus according to an embodiment of this application, and an image recognition apparatus 40 includes:
- an image recognition apparatus is provided, that is, an image to be recognized is obtained first, subsequently, the image to be recognized is inputted into a trained image recognition model, the image recognition model outputs an image recognition result, and finally the image recognition result is displayed.
- a recognition result under a corresponding task may be displayed according to a requirement, to assist a doctor in diagnosis, thereby more effectively helping the doctor reduce misdiagnosis and missed diagnosis, especially a doctor who lacks relevant clinical experience.
- the apparatus for training an image recognition model and the image recognition apparatus provided in this application may be deployed in an electronic device, and the electronic device may be a server or may be a terminal device.
- FIG. 10 is a schematic structural diagram of a server according to an embodiment of this application.
- the server 500 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 522 (for example, one or more processors) and a memory 532, and one or more storage media 530 (for example, one or more mass storage devices) that store application programs 542 or data 544.
- the memory 532 and the storage media 530 may be temporary storage or persistent storage.
- a program stored in the storage media 530 may include one or more modules (which are not marked in the figure), and each module may include a series of instruction operations on the server.
- the CPU 522 may be configured to communicate with the storage medium 530 to perform the series of instruction operations in the storage medium 530 on the server 500.
- the server 500 may further include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, and/or one or more operating systems 541 such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
- the steps performed by the server in the foregoing embodiment may be based on the structure of the server shown in FIG. 10 .
- the CPU 522 included in the server further has the following functions:
- the terminal device may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS), an on-board computer, or the like, and the terminal device being a mobile phone is used as an example.
- FIG. 11 is a block diagram of a structure of a part of a mobile phone related to a terminal device according to an embodiment of this application.
- the mobile phone includes components such as: a radio frequency (RF) circuit 610, a memory 620, an input unit 630, a display unit 640, a sensor 650, an audio circuit 660, a wireless fidelity (WiFi) module 670, a processor 680, and a power supply 690.
- the input unit 630 may include a touch panel 631 and another input device 632
- the display unit 640 may include a display panel 641
- the audio circuit 660 is connected to a loudspeaker 661 and a microphone 662.
- FIG. 11 does not constitute a limitation to the mobile phone, and the mobile phone may include more components or fewer components than those shown in the figure, or some components may be combined, or a different component deployment may be used.
- the memory 620 may be configured to store a software program and module.
- the processor 680 runs the software program and module stored in the memory 620, to implement various functional applications and data processing of the mobile phone.
- the memory 620 may mainly include a program storage area and a data storage area.
- the program storage area may store an operating system, an application program required by at least one function (such as a sound playback function and an image display function), and the like.
- the data storage area may store data (for example, audio data and an address book) created according to the use of the mobile phone, and the like.
- the processor 680 is a control center of the mobile phone, and is connected to various parts of the entire mobile phone by using various interfaces and lines. By running or executing a software program and/or module stored in the memory 620, and invoking data stored in the memory 620, the processor executes various functions of the mobile phone and performs data processing, thereby monitoring the entire mobile phone.
- the processor 680 included in the terminal device further has the following functions:
- FIG. 12 is a structural diagram of an endoscope medical diagnosis system 70 according to an implementation of this application.
- the endoscope medical diagnosis system 70 in this implementation is a system for supporting an endoscope service.
- the endoscope medical diagnosis system 70 has a probe 701, a processor 702, a display 703, a circuit 704, and an interface 705.
- the endoscope medical diagnosis system 70 and a terminal device 80 can work cooperatively.
- the probe 701 may be specifically an endoscope probe, and may be inserted into the esophagus, the gastrointestinal tract, the bronchus, or the like for real-time scanning imaging. A doctor can clearly identify a tumor growth level and a depth of invasion by using the endoscope probe.
- the endoscope probe may be further applied to organ imaging in the vicinity of the intestinal tract, and plays a role in lesion diagnosis of pancreas, bile duct, and gall bladder.
- the processor 702 is configured to recognize an endoscope image captured by the probe 701 and generate a recognition result.
- the display 703 displays a lesion recognition result according to an image signal inputted by the processor 702, the lesion recognition result being specifically an image result, and may also display, in real time, an image captured by the probe 701.
- the circuit 704 is configured to be connected to modules in the endoscope medical diagnosis system 70 and provide an electrical signal, to enable normal operation inside the endoscope medical diagnosis system 70 and enable the endoscope medical diagnosis system to establish a communication connection with the terminal device 80.
- the endoscope medical diagnosis system 70 may directly recognize and process an acquired endoscope image, or send an acquired endoscope image to the terminal device 80 by using the interface 705, and the terminal device 80 recognizes and processes the endoscope image.
- the terminal device 80 can make an electronic medical record and a prescription or directly print an electronic medical record and a prescription, or the like based on a lesion recognition result sent by the endoscope medical diagnosis system 70.
- the processor 702 included in the endoscope medical diagnosis system further has the following functions:
- the processor 702 included in the endoscope medical diagnosis system is further configured to perform the following steps:
Abstract
Description
- This application claims priority to Chinese Patent Application No. 201910989262.8, entitled "METHOD AND APPARATUS FOR TRAINING IMAGE RECOGNITION MODEL, AND IMAGE RECOGNITION METHOD AND APPARATUS", filed with the China National Intellectual Property Administration on October 17, 2019.
- This application relates to the field of artificial intelligence (AI), and in particular, to an image processing technology.
- As the population continuously increases, the burden on medical systems is increasing, and the requirement for medical resources is also increasing. In an actual application, medical staff may analyze an illness of a patient by using a medical image. To help the medical staff diagnose the illness more quickly and more accurately, the medical image may be recognized by using an automatic diagnostic device.
- Currently, to implement automatic diagnosis, a large quantity of medical images are often required to train an image recognition model. The medical images need to be labeled by the medical staff, that is, the medical staff make a judgment on each medical image according to clinical experience, for example, whether a disease exists in the medical image, and a position of a lesion in the medical image.
- However, as medical images continuously accumulate and lesions become increasingly complex, labeling becomes increasingly difficult, and labeling resources that can be used for training the image recognition model are limited. Because of the limited labeling resources, only a small part of labeled medical images can be used in a model training process. In addition, model training usually needs to be implemented in combination with a specific task, and for different tasks, a training set corresponding to the task needs to be adopted. As a result, the labeled medical images cannot be effectively used and data of a training set of some tasks is insufficient, resulting in relatively low accuracy of model prediction.
- The embodiments of this application provide a method and an apparatus for training an image recognition model and an image recognition method and apparatus, which can train a model by using a labeled medical image for different tasks and an unlabeled medical image together. The labeled image and the unlabeled image are effectively used, so that a requirement for image labeling is reduced and a data volume for training is increased, thereby improving a model prediction effect while saving labeling resources.
- In view of this, a first aspect of this application provides a method for training an image recognition model, including:
- obtaining image sets to be trained, the image sets to be trained including at least a first image set, a second image set, and a third image set, the first image set including at least one first image, the second image set including at least one second image and at least one perturbed image, the third image set including at least one third image, the first image being a labeled image corresponding to a first task, the second image being an unlabeled image corresponding to the first task, the third image being a labeled image corresponding to a second task, the first task and the second task being different tasks;
- obtaining a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained, the first predicted probability being a predicted result outputted based on the first image set, the second predicted probability and the third predicted probability being predicted results outputted based on the second image set, and the fourth predicted probability being a predicted result outputted based on the third image set;
- determining a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, the target loss function including at least a first loss function determined according to the first predicted probability, a second loss function determined according to the second predicted probability and the third predicted probability, and a third loss function determined according to the fourth predicted probability; and
- training the image recognition model to be trained based on the target loss function, to obtain an image recognition model.
- A second aspect of this application provides an image recognition method, including:
- obtaining an image to be recognized;
- obtaining an image recognition result corresponding to the image to be recognized by using an image recognition model, the image recognition model being the image recognition model trained according to the method in the first aspect; and
- displaying the image recognition result.
- A third aspect of this application provides an apparatus for training an image recognition model, including:
- an obtaining module, configured to obtain image sets to be trained, the image sets to be trained including at least a first image set, a second image set, and a third image set, the first image set including at least one first image, the second image set including at least one second image and at least one perturbed image, the third image set including at least one third image, the first image being a labeled image corresponding to a first task, the second image being an unlabeled image corresponding to the first task, the third image being a labeled image corresponding to a second task, the first task and the second task being different tasks;
- the obtaining module, further configured to obtain a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained, the first predicted probability being a predicted result outputted based on the first image set, the second predicted probability and the third predicted probability being predicted results outputted based on the second image set, and the fourth predicted probability being a predicted result outputted based on the third image set;
- a determining module, configured to determine a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, the target loss function including at least a first loss function determined according to the first predicted probability, a second loss function determined according to the second predicted probability and the third predicted probability, and a third loss function determined according to the fourth predicted probability; and
- a training module, configured to train the image recognition model to be trained according to the target loss function determined by the determining module, to obtain an image recognition model.
- A fourth aspect of this application provides an image recognition apparatus, including:
- an obtaining module, configured to obtain an image to be recognized;
- the obtaining module, further configured to obtain an image recognition result corresponding to the image to be recognized by using an image recognition model, the image recognition model being the image recognition model trained according to the method in the first aspect; and
- a displaying module, configured to display the image recognition result.
- A fifth aspect of this application provides an electronic device, including: a memory, a transceiver, a processor, and a bus system,
- the memory being configured to store a program;
- the processor being configured to execute the program in the memory, to perform the method in the first aspect or the second aspect; and
- the bus system being configured to connect the memory and the processor to enable communication between the memory and the processor.
- A sixth aspect of this application provides an endoscope medical diagnosis system, including: a probe, a circuit, a processor, and a display,
- the circuit being configured to excite the probe to obtain an image to be recognized;
- the processor being configured to obtain an image recognition result corresponding to the image to be recognized by using an image recognition model, the image recognition model being the image recognition model trained according to the method in the first aspect; and
- the display being configured to display the image recognition result.
- A seventh aspect of this application provides a computer-readable storage medium, storing instructions, the instructions, when run on a computer, causing the computer to perform the method in the first aspect or the second aspect.
- An eighth aspect of this application provides a computer program product, including instructions, the instructions, when run on a computer, causing the computer to perform the method in the first aspect or the second aspect.
- It can be seen from the foregoing technical solutions that the embodiments of this application have the following advantages:
- The embodiments of this application provide a method for training an image recognition model. Image sets to be trained are obtained first, then a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability are obtained based on the image sets to be trained by using an image recognition model to be trained, subsequently, a target loss function is determined according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, and finally the image recognition model to be trained is trained based on the target loss function, to obtain an image recognition model. In this way, a model can be trained by using a labeled medical image for different tasks and an unlabeled medical image together. The labeled image and the unlabeled image are effectively used, so that a requirement for image labeling is reduced and a data volume for training is increased, thereby improving a model prediction effect while saving labeling resources.
- FIG. 1 is a schematic architectural diagram of an image recognition system according to an embodiment of this application.
- FIG. 2 is an overall schematic structural diagram of training an image recognition model according to an embodiment of this application.
- FIG. 3 is a schematic diagram of an embodiment of a method for training an image recognition model according to an embodiment of this application.
- FIG. 4 is a schematic diagram of an embodiment based on semi-supervised learning according to an embodiment of this application.
- FIG. 5 is a schematic diagram of an embodiment based on multi-task learning according to an embodiment of this application.
- FIG. 6 is a schematic diagram of an embodiment of an image recognition method according to an embodiment of this application.
- FIG. 7 is a schematic diagram of an interface of displaying an image recognition result according to an embodiment of this application.
- FIG. 8 is a schematic diagram of an embodiment of an apparatus for training an image recognition model according to an embodiment of this application.
- FIG. 9 is a schematic diagram of an embodiment of an image recognition apparatus according to an embodiment of this application.
- FIG. 10 is a schematic structural diagram of a server according to an embodiment of this application.
- FIG. 11 is a schematic structural diagram of a terminal device according to an embodiment of this application.
- FIG. 12 is a schematic structural diagram of an endoscope medical diagnosis system according to an embodiment of this application.
- The embodiments of this application provide a method and an apparatus for training an image recognition model and an image recognition method and apparatus. A model is trained by using a labeled medical image for different tasks and an unlabeled medical image together, and the labeled image and the unlabeled image are effectively used, so that a requirement for image labeling is reduced and a data volume for training is increased, thereby improving a model prediction effect while saving labeling resources.
- It is to be understood that the method for training an image recognition model and the image recognition method provided by this application are applicable to the medical field of artificial intelligence (AI), and are particularly applicable to the field of medical image recognition based on a computer vision (CV) technology.
- The most common medical images in the medical field include, but are not limited to, an endoscope image, an angiography image, an angiocardiographic image, a computerized tomography (CT) image, a B-mode ultrasound image, and a pathology image. The medical image can directly reflect a lesion occurring inside a tissue, and is therefore an important basis for a doctor to perform disease diagnosis, and even a final basis of diagnosis of some diseases. For example, in diagnosis of cancer, a cancer diagnosis result is determined by observing a radiographic image of a lesion, which includes observing whether there is a shadow, a plaque, or vasodilation. In this application, an endoscope image may be recognized, and the recognition is applied to automatic diagnosis of the endoscope image to assist a doctor in improving diagnosis efficiency and accuracy. On this basis, available data of other forms is further used to assist model training to improve model accuracy.
- The medical image is an important information entry for the doctor to learn an illness of a patient. Although high-quality medical imaging devices have become popular, accurate interpretation of the medical image often requires the doctor to have a professional knowledge background and long-term accumulated experience. The population is large, the burden on the medical system is heavy, and the quantity of experienced doctors is insufficient and is mainly concentrated in large-scale grade-A tertiary hospitals in first-tier cities, resulting in scarcity of medical resources. According to the method provided by this application, in addition to labeled data of a target task, unlabeled data of the target task (that is, semi-supervised learning) and labeled data of another related task (that is, multi-task learning, MTL) can be further used, so that information in existing data of various types is used to the maximum extent to assist the model training and improve a model effect.
- For ease of understanding, this application provides an image recognition method, and the method is applicable to an image recognition system shown in FIG. 1. FIG. 1 is a schematic architectural diagram of an image recognition system according to an embodiment of this application. As shown in the figure, the image recognition system may include a medical device, and the medical device may be specifically an endoscope device, an electron microscope, or the like. After acquiring a medical image to be recognized, the medical device may recognize the medical image according to a task type by using a trained image recognition model. Automatic diagnosis of an endoscope image is used as an example: recognition may be performed according to different parts (for example, esophagus, stomach, duodenum, and colorectum), or recognition may be performed according to different target tasks (for example, distinguishing benign or malignant, distinguishing parts, or distinguishing whether a picture is qualified), and a visualization result may be obtained finally, to provide a doctor with a focus region.
- Optionally, after acquiring the medical image to be recognized, the medical device may send the medical image to a terminal device, and the terminal device may recognize the medical image by using the trained image recognition model, to obtain the visualization result, which provides a doctor with a focus region and is displayed on an interface.
- Optionally, after acquiring the medical image to be recognized, the medical device may send the medical image to a server, and the server recognizes the medical image by using the trained image recognition model. After obtaining a recognition result, the server may feed the result back to the terminal device or the medical device, and the terminal device or the medical device performs displaying.
- The terminal device includes, but is not limited to, a tablet computer, a notebook computer, a palmtop computer, a mobile phone, a speech interaction device, and a personal computer (PC), and is not limited herein.
- The image recognition model used in this application may be trained by using an architecture shown in FIG. 2. FIG. 2 is an overall schematic structural diagram of training an image recognition model according to an embodiment of this application. As shown in the figure, the image recognition model in this application may adopt a deep learning model structure, for example, a residual network (ResNet) structure or a dense convolutional network structure. During training, data augmentation and data preprocessing may be performed on training data, and an end-to-end method based on stochastic gradient descent is adopted for training. Alternate training of each task may be selected. For alternate training, labeled data of a target task, auxiliary task data in the MTL, and unlabeled data in the semi-supervised learning are inputted sequentially, and a corresponding optimizer is invoked to reduce a corresponding loss value, so as to update parameters of an overlapped part and unique parameters of the target task. Hybrid training may be alternatively selected. For hybrid training, the labeled data of the target task, the auxiliary task data in the MTL, and the unlabeled data in the semi-supervised learning are mixed and inputted each time, and an optimizer is invoked after corresponding loss values are added, thereby reducing a total loss value.
- After an image recognition model is obtained through training, an online inference part shown in FIG. 2 may be used for prediction, and the online inference part includes data preprocessing, a network model, and a fully connected layer. In an actual application, the online inference part may further include another network layer. This is merely an example and is not to be understood as a limitation on this application.
- Referring to FIG. 3, an embodiment of a method for training an image recognition model in this embodiment of this application includes the following steps:
- 101. Obtain image sets to be trained, the image sets to be trained including at least a first image set, a second image set, and a third image set, the first image set including at least one first image, the second image set including at least one second image and at least one perturbed image, the third image set including at least one third image, the first image being a labeled image corresponding to a first task, the second image being an unlabeled image corresponding to the first task, the third image being a labeled image corresponding to a second task, the first task and the second task being different tasks.
- In this embodiment, an apparatus for training an image recognition model obtains image sets to be trained. It may be understood that the apparatus for training an image recognition model may be deployed on the terminal device or may be deployed on the server. Because a data volume for training is usually relatively large, model training may be performed by using the server. However, this is not to be understood as a limitation of this application.
- The image sets to be trained include at least a first image set, a second image set, and a third image set, and each of the first image set, the second image set, and the third image set belongs to a training sample. The first image set includes at least one first image (which may be represented as $x_0$), the second image set includes at least one second image (which may be represented as $x_{UL}$) and at least one perturbed image (which may be represented as $x_{pert}$), and the third image set includes at least one third image (which may be represented as $x_1$). The first image is a labeled image that carries labeled information and corresponds to a first task, the second image is an unlabeled image that does not carry the labeled information and corresponds to the first task, and the third image is a labeled image that carries the labeled information and corresponds to a second task. The first task and the second task are different tasks. The perturbed image is obtained by performing random scrambling on the second image, and a size of the perturbed image is usually the same as a size of the second image. The random scrambling includes, but is not limited to, flipping, rotation, and translation. It may be understood that random scrambling may be performed twice on one second image, that is, one second image may correspond to two perturbed images. In addition, the perturbed image is usually generated during training.
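- The random scrambling described above can be illustrated with a minimal sketch. It assumes that an image is a 2D list of pixel values, that rotation is by 180 degrees, and that translation wraps around the border so that the size is preserved. The function names and these choices are illustrative assumptions, not details taken from this application.

```python
import random

def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate_180(img):
    """Rotate by 180 degrees, which keeps height and width unchanged."""
    return [row[::-1] for row in img[::-1]]

def translate(img, dx, dy):
    """Shift by (dx, dy) pixels with wrap-around so the size is preserved."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]

def random_perturb(img, rng):
    """Apply one randomly chosen perturbation to an unlabeled image."""
    choice = rng.choice(["flip", "rotate", "translate"])
    if choice == "flip":
        return flip_horizontal(img)
    if choice == "rotate":
        return rotate_180(img)
    h, w = len(img), len(img[0])
    return translate(img, rng.randrange(w), rng.randrange(h))

rng = random.Random(0)                 # fixed seed only for reproducibility
x_ul = [[1, 2, 3], [4, 5, 6]]          # toy unlabeled second image
x_pert_a = random_perturb(x_ul, rng)   # first random scrambling
x_pert_b = random_perturb(x_ul, rng)   # second random scrambling
```

- Both perturbed images keep the 2 x 3 size of the toy second image and contain the same pixel values in a different arrangement.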
- 102. Obtain a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained, the first predicted probability being a predicted result outputted based on the first image set, the second predicted probability and the third predicted probability being predicted results outputted based on the second image set, and the fourth predicted probability being a predicted result outputted based on the third image set.
- In this embodiment, two training processes, which are respectively semi-supervised learning and multi-task learning (MTL), are adopted. The first image set and the second image set are used for the semi-supervised learning, the second predicted probability and the third predicted probability are output results of the semi-supervised learning, the third image set is used for the MTL, and the fourth predicted probability is an output result of the MTL.
- The semi-supervised learning assists training by using unlabeled data of the same task to improve a model effect. The significance of labeling is to determine whether a result of prediction of a current model is correct, so as to serve as an indication for evaluating quality of the model. That is, a target loss function is set, a more accurate current image recognition model to be trained indicates a smaller value of the target loss function, and a model training process is an optimization process of causing the target loss function to obtain a minimum value. For labeled image data, quality of a model may be evaluated by using a cross entropy loss function. However, for unlabeled image data, the quality of the model cannot be evaluated by using a label. Therefore, the same picture may be inputted into a network after two times of random disturbance, and a difference between two prediction results is determined by using a consistency constraint loss function. The model training is to reduce the difference between the two prediction results.
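- The two kinds of loss mentioned above can be sketched as follows. The cross entropy form is the standard one. The consistency constraint is written here as a mean squared difference between two probability vectors, which is an assumption made for illustration, because this passage does not fix its exact form. All function names and numeric values are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Loss for a labeled image: negative log-probability of the true class."""
    return -math.log(probs[label])

def consistency_loss(probs_a, probs_b):
    """Loss for an unlabeled image: mean squared difference between the
    predictions for two random perturbations (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)

# Labeled image: the label says which class is correct.
labeled_probs = softmax([2.0, 0.5, 0.1])
print(cross_entropy(labeled_probs, 0))

# Unlabeled image: only the agreement of two predictions can be measured.
pred_a = softmax([1.0, 0.2, 0.1])
pred_b = softmax([0.9, 0.3, 0.1])
print(consistency_loss(pred_a, pred_b))   # positive, predictions differ
print(consistency_loss(pred_a, pred_a))   # 0.0, predictions are identical
```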
- The MTL assists training by using a labeled data set in another related task, to improve the model effect. In a conventional machine learning method, a model is independently trained for each task, but in an MTL method, a plurality of related tasks may be trained at the same time by using one network model. Some parameters of the network model are shared by the tasks, and some other parameters of the network model are unique to each task.
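- The split between shared parameters and task-specific parameters can be sketched with a toy network. The class name, the layer sizes, the weights, and the task names `lesion_position` and `tumor_property` are assumptions chosen for illustration. An actual model would use a deep backbone such as the ResNet structure mentioned earlier.

```python
def linear(weights, inputs):
    """Dense layer without bias: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def relu(values):
    return [max(0.0, v) for v in values]

class MultiTaskModel:
    def __init__(self, shared_weights, head_weights):
        self.shared_weights = shared_weights   # parameters shared by all tasks
        self.head_weights = head_weights       # dict: task name -> unique parameters

    def forward(self, inputs, task):
        # Every task passes through the same shared layer first.
        features = relu(linear(self.shared_weights, inputs))
        # Only the head selected by the task name differs.
        return linear(self.head_weights[task], features)

model = MultiTaskModel(
    shared_weights=[[0.5, -0.2], [0.1, 0.3]],
    head_weights={
        # Four outputs, e.g. esophagus, stomach, duodenum, colorectum.
        "lesion_position": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.1]],
        # Two outputs, e.g. benign and malignant.
        "tumor_property": [[1.0, -1.0], [-1.0, 1.0]],
    },
)
image_features = [1.0, 2.0]   # stand-in for a preprocessed image
position_scores = model.forward(image_features, "lesion_position")
property_scores = model.forward(image_features, "tumor_property")
```

- Updating `shared_weights` while training on either task changes the features seen by both heads, which is how labeled data of a related task can assist the target task.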
- 103. Determine a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, the target loss function including at least a first loss function, a second loss function, and a third loss function, the first loss function being determined according to the first predicted probability, the second loss function being determined according to the second predicted probability and the third predicted probability, and the third loss function being determined according to the fourth predicted probability.
- In this embodiment, the apparatus for training an image recognition model determines a first loss function according to the first predicted probability and labeled information corresponding to the first image set, the first predicted probability being a predicted value, and the labeled information corresponding to the first image set being a real value, and calculates the first loss function based on the predicted value and the real value. The apparatus for training an image recognition model determines a second loss function according to the second predicted probability and the third predicted probability, both the second predicted probability and the third predicted probability being predicted values. The apparatus for training an image recognition model determines a third loss function according to the fourth predicted probability and labeled information corresponding to the third image set, the fourth predicted probability being a predicted value, and the labeled information corresponding to the third image set being a real value, and calculates the third loss function based on the predicted value and the real value. A target loss function may be obtained according to the first loss function, the second loss function, and the third loss function.
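- One simple way to combine the three loss functions, shown here only as an assumed form, is a weighted sum. The symbols $L_1$, $L_2$, and $L_3$, the weights $\lambda_2$ and $\lambda_3$, and the label symbols $y_0$ and $y_1$ are notation introduced for illustration. Here $p_1$ to $p_4$ denote the first to fourth predicted probabilities, and $y_0$ and $y_1$ denote the labeled information of the first image set and the third image set.

```latex
L_{\text{target}} = L_1(p_1, y_0) + \lambda_2 \, L_2(p_2, p_3) + \lambda_3 \, L_3(p_4, y_1)
```

- With $\lambda_2 = \lambda_3 = 1$, this reduces to a plain sum of the three loss values, which matches the hybrid training described for FIG. 2, in which the loss values are added before the optimizer is invoked.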
- 104. Train the image recognition model to be trained based on the target loss function, to obtain an image recognition model.
- In this embodiment, when the target loss function converges, it indicates that training of the image recognition model to be trained is completed. In this case, the image recognition model to be trained is an image recognition model. It may be understood that in an actual application, it may be also considered that the target loss function has converged when a quantity of times of training reaches a threshold.
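- Step 104, together with the alternate and hybrid training modes described for FIG. 2, might be sketched as follows. The three quadratic functions are toy stand-ins for the first, third, and second loss functions, and the parameter names, learning rate, tolerance, and step threshold are all assumptions. Training stops either when the total loss stops changing or when the quantity of training steps reaches the threshold.

```python
def loss_target(p):
    """Toy stand-in for the labeled target-task loss."""
    r = p["shared"] + p["target_head"] - 3.0
    return r * r, {"shared": 2 * r, "target_head": 2 * r}

def loss_auxiliary(p):
    """Toy stand-in for the labeled auxiliary-task (MTL) loss."""
    r = p["shared"] + p["aux_head"] - 1.0
    return r * r, {"shared": 2 * r, "aux_head": 2 * r}

def loss_consistency(p):
    """Toy stand-in for the unlabeled consistency loss."""
    r = p["shared"]
    return r * r, {"shared": 2 * r}

# Order follows the text: target task, auxiliary task, unlabeled data.
LOSSES = [loss_target, loss_auxiliary, loss_consistency]

def sgd_step(params, grads, lr):
    for name, g in grads.items():
        params[name] -= lr * g

def total_loss(params):
    return sum(fn(params)[0] for fn in LOSSES)

def train(mode, lr=0.1, tol=1e-10, max_steps=1000):
    params = {"shared": 0.0, "target_head": 0.0, "aux_head": 0.0}
    previous = total_loss(params)
    for step in range(1, max_steps + 1):
        if mode == "alternate":
            # One optimizer step per loss, taken in turn.
            for fn in LOSSES:
                sgd_step(params, fn(params)[1], lr)
        else:
            # Hybrid: add the gradients of all losses, then take one step.
            summed = {}
            for fn in LOSSES:
                for name, g in fn(params)[1].items():
                    summed[name] = summed.get(name, 0.0) + g
            sgd_step(params, summed, lr)
        current = total_loss(params)
        if abs(previous - current) < tol:   # treated as converged
            break
        previous = current
    return params, current, step

for mode in ("alternate", "hybrid"):
    params, final_loss, steps = train(mode)
    print(mode, steps, final_loss)
```

- In both modes the shared parameter receives gradients from all three losses, while each head parameter is updated only by the loss of its own task.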
- The embodiments of this application provide a method for training an image recognition model. Image sets to be trained are obtained first, then a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability are obtained based on the image sets to be trained by using an image recognition model to be trained, subsequently, a target loss function is determined according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, and finally the image recognition model to be trained is trained based on the target loss function, to obtain an image recognition model. In this way, a model can be trained by using a labeled medical image for different tasks and an unlabeled medical image together. The labeled image and the unlabeled image are effectively used, so that a requirement for image labeling is reduced and a data volume for training is increased, thereby improving a model prediction effect while saving labeling resources.
- Optionally, based on the embodiment corresponding to FIG. 3, in a first optional embodiment of the method for training an image recognition model according to the embodiments of this application, the obtaining a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained may include:
- obtaining the first predicted probability based on the first image set by using the image recognition model to be trained;
- obtaining the second predicted probability and the third predicted probability based on the second image set by using the image recognition model to be trained; and
- obtaining the fourth predicted probability based on the third image set by using the image recognition model to be trained.
- In this embodiment, the apparatus for training an image recognition model inputs a second image set into the image recognition model to be trained. Specifically, the second image set includes a second image and a perturbed image. It is assumed that first random scrambling is performed on a second image A to obtain a perturbed image A, and second random scrambling is performed on the second image A to obtain a perturbed image B. Therefore, the apparatus for training an image recognition model first inputs the second image A and the perturbed image A into the image recognition model to be trained, and the image recognition model to be trained outputs a second predicted probability. Subsequently, the apparatus for training an image recognition model inputs the second image A and the perturbed image B into the image recognition model to be trained, the image recognition model to be trained outputs a third predicted probability, and two predicted probabilities are obtained respectively through two predictions. In an actual application, two times of random scrambling may be performed on each second image.
- For ease of understanding, FIG. 4 is a schematic diagram of an embodiment of performing training based on an unlabeled sample according to an embodiment of this application. As shown in the figure, a sample set of the second image includes at least one second image. First random scrambling is performed on each second image in the sample set of the second image, to obtain a sample set of the perturbed image A. Subsequently, second random scrambling is performed on each second image in the sample set of the second image, to obtain a sample set of the perturbed image B. Both the sample set of the second image and the sample set of the perturbed image A are inputted into the image recognition model to be trained, to obtain a second predicted probability corresponding to each sample. Both the sample set of the second image and the sample set of the perturbed image B are inputted into the image recognition model to be trained, to obtain a third predicted probability corresponding to each sample.
- In this embodiment, the apparatus for training an image recognition model further inputs a first image set into the image recognition model to be trained. Specifically, the first image set includes a first image, and the first image is a labeled image. Similarly, the apparatus for training an image recognition model further inputs a third image set into the image recognition model to be trained. Specifically, the third image set includes a third image, and the third image is similar to the first image and is also a labeled image. The difference is that the first image set in which the first image is located and the third image set in which the third image is located correspond to different learning tasks. For example, the first image set is labeled for a lesion positioning task, that is, content labeled in the first image is a position of a lesion, for example, the lesion is in the esophagus, stomach, duodenum, colorectum, or the like. The third image set, however, is labeled for a tumor property task, that is, content labeled in the third image is a tumor property such as a malignant tumor or a benign tumor. It may be understood that in an actual application, other different tasks may be further set according to a requirement. This is merely an example and is not to be understood as a limitation on this application.
- For ease of description, FIG. 5 is a schematic diagram of an embodiment based on MTL according to an embodiment of this application. As shown in the figure, the MTL assists training by using another related labeled data set, to improve a model effect. In a conventional machine learning method, a model is independently trained for each task, but in an MTL method, a plurality of related tasks may be trained at the same time by using one network model. Some parameters of the network model are shared by the tasks, and some other parameters of the network model are unique to each task. As shown in FIG. 5, for inputted training data, prediction results under four different tasks are outputted by using the image recognition model to be trained, parameters are shared among different tasks, and all data sets of all tasks are used, so that a data volume for training is increased.
- The MTL has a plurality of forms, including, but not limited to, joint learning, learning to learn, and learning with an auxiliary task. Generally, optimizing a plurality of loss functions is equivalent to performing the MTL. Even if only one loss function is optimized, an original task model may be improved by using an auxiliary task. The MTL provided in this application may be implemented based on hard parameter sharing, or may be implemented based on soft parameter sharing. Hard parameter sharing is typically implemented by sharing a hidden layer between all tasks while preserving several task-specific output layers. In soft parameter sharing, each task has a separate model, and each model includes its own parameters.
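- The hard parameter sharing described above can be sketched as one shared hidden layer feeding several task-specific output layers. This is a minimal illustration, not the network of this application: the layer sizes, the task names, the random initialization, and the helper names (`linear`, `init_layer`, `predict`) are all assumptions introduced for the example.

```python
import math
import random

def linear(x, weights, bias):
    """Compute W x + b for one input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng, n_out, n_in):
    """Hypothetical initializer: small random weights, zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
shared = init_layer(rng, 4, 3)  # hidden layer shared by every task
heads = {                       # one output layer per task (names are assumptions)
    "lesion_position": init_layer(rng, 5, 4),
    "tumor_property": init_layer(rng, 3, 4),
    "image_quality": init_layer(rng, 2, 4),
}

def predict(x, task):
    hidden = relu(linear(x, *shared))             # shared parameters
    return softmax(linear(hidden, *heads[task]))  # task-specific parameters

features = [0.2, -0.1, 0.7]  # stand-in for features extracted from an image
outputs = {task: predict(features, task) for task in heads}
```

Every task reuses `shared`, so a gradient step for any task would update it, while each entry of `heads` is touched only by its own task.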
- Secondly, in this embodiment of this application, a method for obtaining the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability is provided. The second predicted probability and the third predicted probability are obtained based on the second image set by using the semi-supervised learning, and the fourth predicted probability is obtained based on the third image set by using the MTL. In the foregoing manner, training is effectively performed by using unlabeled data, to improve a model effect, and a requirement for labeled data is reduced while a better effect is obtained, thereby reducing product development costs and accelerating a product development cycle. In addition, a plurality of related tasks can be further trained at the same time by using one image recognition model, some parameters of the image recognition model are shared by various tasks, and some other parameters are unique to each task. Shared parameters use all data sets of all tasks, so that a data volume for training is increased, and meanwhile unique noise of each training set is canceled, thereby improving a generalization ability of the model, and reducing overfitting of the model. An independent output layer may select a most relevant feature for a task from a shared part, and learn a unique classification boundary of each task, so that the model has sufficient flexibility, and can obtain relatively high accuracy for an image recognition task.
- Optionally, based on the embodiment corresponding to FIG. 3, in a second optional embodiment of the method for training an image recognition model according to the embodiments of this application, the obtaining the first predicted probability based on the first image set by using the image recognition model to be trained may include:
- obtaining a first predicted value based on the at least one first image by using a fully connected layer included in the image recognition model to be trained; and
- performing normalization processing on the first predicted value, to obtain the first predicted probability.
- In this embodiment, the method for generating the first predicted probability is described. For ease of description, one first image in the first image set is used as an example for description below. It may be understood that other first images in the first image set are also processed in a similar manner, and details are not described herein again.
- Specifically, it is assumed that the first image is represented as $x_0$, and labeled information of the first image is $y_0$. The labeled information is used for representing a classification label under a classification task. For example, the classification task is a lesion positioning task, and the classification label may be different parts: a label 1 represents an esophagus part, a label 2 represents a stomach part, a label 3 represents a duodenal part, a label 4 represents a colorectal part, and a label 5 represents no type. In another example, the classification task is a task of distinguishing tumor properties, and the classification label may be different degrees of tumor progression: a label 1 represents a benign tumor, a label 2 represents a malignant tumor, and a label 3 represents no tumor. In another example, the classification task is a task of distinguishing qualified conditions of a picture, and the classification label may be different picture qualification conditions: a label 1 represents that the picture is qualified, and a label 2 represents that the picture is not qualified.
- A first predicted value is outputted after the first image $x_0$ belonging to the first task passes through a fully connected (FC) layer, the first predicted value being represented as $z_0$, and the first predicted probability $p_0$ of the first image is obtained after the first predicted value $z_0$ passes through a softmax layer, that is, normalization processing is implemented. The first predicted probability is obtained through calculation in the following manner:
$$p_0[i] = \frac{\exp(z_0[i])}{\sum_{k=1}^{K} \exp(z_0[k])}$$
where $p_0[i]$ represents the predicted probability of the i-th class, and $K$ represents the total quantity of classes.
- The last layer of the image recognition model to be trained may be the FC layer followed by the softmax layer. The FC layer multiplies a weight matrix by an input vector and then adds a bias, mapping N real numbers into K scores, and the softmax layer maps the K scores into K probabilities within the range (0, 1) and ensures that the sum of the K probabilities is 1.
- Secondly, in this embodiment of this application, the method for generating the first predicted probability is provided, that is, first, a first predicted value of the first image is obtained by using an FC layer included in the image recognition model to be trained, and then normalization processing is performed on the first predicted value of the first image, to obtain the first predicted probability of the first image. In the foregoing manner, after normalization processing is performed on a predicted value, a prediction class of a sample can be reflected more intuitively, thereby improving the accuracy of training sample classification and improving the model training efficiency and accuracy.
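- As a minimal sketch of the FC layer followed by softmax normalization, the example below maps N = 2 inputs to K = 3 scores and then to K probabilities. The weight matrix, bias, and input values are hypothetical toy numbers chosen for illustration only.

```python
import math

def fully_connected(x, weights, bias):
    """Map an N-dimensional input to K raw scores: z = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    """Normalize K raw scores into K probabilities that sum to 1."""
    m = max(z)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy values: N = 2 input features, K = 3 classes.
x0 = [1.0, 2.0]
W = [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.8]]
b = [0.0, 0.1, -0.1]

z0 = fully_connected(x0, W, b)  # first predicted value (raw scores)
p0 = softmax(z0)                # first predicted probability
predicted_label = p0.index(max(p0)) + 1  # labels are numbered from 1 in the text
```

Softmax preserves the ordering of the scores, so the class with the largest score also has the largest probability.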
- Optionally, based on the embodiment corresponding to FIG. 3, in a third optional embodiment of the method for training an image recognition model according to the embodiments of this application, the obtaining the second predicted probability and the third predicted probability based on the second image set by using the image recognition model to be trained may include:
- generating a first perturbed image set according to the at least one second image, the first perturbed image set including at least one first perturbed image, the first perturbed image having a correspondence with the second image, and the first perturbed image belonging to the perturbed image;
- generating a second perturbed image set according to the at least one second image, the second perturbed image set including at least one second perturbed image, the second perturbed image having a correspondence with the second image, and the second perturbed image belonging to the perturbed image;
- obtaining the second predicted probability based on the at least one second image and the first perturbed image set by using the image recognition model to be trained; and
- obtaining the third predicted probability based on the at least one second image and the second perturbed image set by using the image recognition model to be trained.
- In this embodiment, a data processing manner based on semi-supervised learning is described. First, the apparatus for training an image recognition model obtains at least one second image, the second image herein being an unlabeled image. Subsequently, random scrambling is performed twice on each second image. A first perturbed image set is obtained after the first random scrambling, the first perturbed image set including at least one first perturbed image, each first perturbed image corresponding to a second image. Similarly, a second perturbed image set is obtained after the second random scrambling, the second perturbed image set including at least one second perturbed image, each second perturbed image corresponding to a second image, and a quantity of second perturbed images being usually the same as a quantity of first perturbed images. The at least one second image and the first perturbed image set are inputted into the image recognition model to be trained, to obtain the second predicted probability. For example, 1000 second images and 1000 first perturbed images may be inputted into the image recognition model to be trained, or 100 second images and 100 first perturbed images may be inputted into the image recognition model to be trained. The quantity of second images is not limited herein. Similarly, the at least one second image and the second perturbed image set are inputted into the image recognition model to be trained, to obtain the third predicted probability. The second predicted probability may be the same as or different from the third predicted probability.
- It may be understood that in an actual application, a result outputted by the image recognition model to be trained may be a predicted value, and a predicted probability may be obtained after normalization processing is performed on the predicted value.
- Data augmentation is performed on the second image during random scrambling. In addition to flipping, rotating, and translating the second image, a direction, a position, a proportion, a brightness, or the like of the second image may be changed. A random factor such as random dropout may also be added to the image recognition model to be trained. Dropout is a method for optimizing a deep artificial neural network in which some weights or outputs of a hidden layer are randomly set to zero during learning, to reduce interdependence between nodes, thereby regularizing the neural network. If the perturbation is random noise, the random scrambling process may be referred to as a Pi-model. If the perturbation is an adversarial perturbation, the random scrambling process may be referred to as virtual adversarial training (VAT).
- Secondly, in this embodiment of this application, the data processing manner based on semi-supervised learning is provided, that is, random scrambling is performed twice on a second image, to obtain a first perturbed image and a second perturbed image, and then the second image is paired with each of the first perturbed image and the second perturbed image to form two training samples to be inputted into a model, to obtain two predicted probabilities. In the foregoing manner, random scrambling is performed on an unlabeled image, to obtain images with different perturbation degrees as samples for model training, and manual intervention is not required during random scrambling, thereby improving the model training efficiency. In addition, randomized processing can improve the generalization ability of the model, thereby improving a model training effect. The semi-supervised learning avoids waste of data and resources, and resolves the problems that a model of fully supervised learning has a weak generalization ability and a model of unsupervised learning is inaccurate.
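- The two rounds of random scrambling can be sketched as follows. This is an illustration under stated assumptions: the image is a small grayscale grid of floats in [0, 1], and the particular perturbations (horizontal flip, brightness gain, Gaussian noise) and their parameter ranges are hypothetical choices, not values taken from this application.

```python
import random

def perturb(image, rng, noise_std=0.05):
    """Return a randomly perturbed copy of a grayscale image (rows of floats in [0, 1])."""
    rows = [row[:] for row in image]          # copy so the original stays untouched
    if rng.random() < 0.5:                    # random horizontal flip
        rows = [row[::-1] for row in rows]
    gain = rng.uniform(0.8, 1.2)              # random brightness change
    return [
        [min(1.0, max(0.0, gain * px + rng.gauss(0.0, noise_std))) for px in row]
        for row in rows
    ]

# One unlabeled second image and its two independently perturbed versions.
second_image = [[0.1, 0.4, 0.9], [0.3, 0.5, 0.7]]
perturbed_a = perturb(second_image, random.Random(1))  # first random scrambling
perturbed_b = perturb(second_image, random.Random(2))  # second random scrambling
```

Each perturbed image would then be passed through the model to produce one of the two predicted probabilities that are later compared.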
- Optionally, based on the embodiment corresponding to FIG. 3, in a fourth optional embodiment of the method for training an image recognition model according to the embodiments of this application, the obtaining the fourth predicted probability based on the third image set by using the image recognition model to be trained may include:
- obtaining a fourth predicted value based on the at least one third image by using an FC layer included in the image recognition model to be trained; and
- performing normalization processing on the fourth predicted value, to obtain the fourth predicted probability.
- In this embodiment, the method for generating the fourth predicted probability is described. For ease of description, one third image in the third image set is used as an example for description below. It may be understood that other third images in the third image set are also processed in a similar manner, and details are not described herein again.
- Specifically, it is assumed that the third image is represented as $x_1$, and labeled information of the third image is $y_1$. The labeled information is used for representing a classification label under a classification task. For example, the classification task is a lesion positioning task, and the classification label may be different parts: a label 1 represents an esophagus part, a label 2 represents a stomach part, a label 3 represents a duodenal part, a label 4 represents a colorectal part, and a label 5 represents no type. In another example, the classification task is a task of distinguishing tumor properties, and the classification label may be different degrees of tumor progression: a label 1 represents a benign tumor, a label 2 represents a malignant tumor, and a label 3 represents no tumor. In another example, the classification task is a task of distinguishing qualified conditions of a picture, and the classification label may be different picture qualification conditions: a label 1 represents that the picture is qualified, and a label 2 represents that the picture is not qualified. The labeled information of the third image belongs to the second task, the labeled information of the first image belongs to the first task, and the two tasks are different.
- A fourth predicted value is outputted after the third image $x_1$ belonging to the second task passes through the FC layer, the fourth predicted value being represented as $z_1$, and the fourth predicted probability $p_1$ of the third image is obtained after the fourth predicted value $z_1$ passes through the softmax layer, that is, normalization processing is implemented. The fourth predicted probability is obtained through calculation in the following manner:
$$p_1[i] = \frac{\exp(z_1[i])}{\sum_{k=1}^{K} \exp(z_1[k])}$$
where $p_1[i]$ represents the predicted probability of the i-th class, and $K$ represents the total quantity of classes under the second task.
- The last layer of the image recognition model to be trained may be the FC layer followed by the softmax layer. The FC layer multiplies a weight matrix by an input vector and then adds a bias, mapping N real numbers into K scores, and the softmax layer maps the K scores into K probabilities within the range (0, 1) and ensures that the sum of the K probabilities is 1.
- Secondly, in this embodiment of this application, the method for generating the fourth predicted probability is provided, that is, first, a fourth predicted value of the third image is obtained by using an FC layer included in the image recognition model to be trained, and then normalization processing is performed on the fourth predicted value of the third image, to obtain the fourth predicted probability of the third image. In the foregoing manner, after normalization processing is performed on a predicted value, a prediction class of a sample can be reflected more intuitively, thereby improving the accuracy of training sample classification and improving the model training efficiency and accuracy.
- Optionally, based on the embodiment corresponding to FIG. 3, in a fifth optional embodiment of the method for training an image recognition model according to the embodiments of this application, the determining a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability may include:
- calculating the first loss function according to the first predicted probability and labeled information corresponding to the first image set;
- calculating the second loss function according to the second predicted probability and the third predicted probability;
- calculating the third loss function according to the fourth predicted probability and labeled information corresponding to the third image set;
- obtaining an entropy loss function and a regularization loss function; and
- obtaining the target loss function through calculation according to the first loss function, the second loss function, the third loss function, the entropy loss function, and the regularization loss function.
- In this embodiment, specific content of the target loss function is described. The apparatus for training an image recognition model calculates the first loss function $L_{CE}$ according to the first predicted probability and labeled information corresponding to the first image set. The apparatus for training an image recognition model calculates the second loss function $L_{Con}$ according to at least one second predicted probability and at least one third predicted probability. The apparatus for training an image recognition model calculates the third loss function $L_{MTL}$ according to the fourth predicted probability and labeled information corresponding to the third image set. In addition, the target loss function further includes an entropy loss function $L_{Ent}$ and a regularization loss function $L_{Reg}$.
- The entropy loss function LEnt and the regularization loss function L Reg are described below.
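- Minimizing the entropy loss function makes the model predict a specific class for a particular task with greater certainty, rather than considering several classes to be all possible. Entropy represents the expected amount of information over the classes, and the entropy loss function is calculated in the following manner:
$$L_{Ent} = -\sum_{k=1}^{K} p[k] \log p[k]$$
where $p[k]$ represents the predicted probability of the k-th class, and $K$ represents the total quantity of classes.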
- To avoid overfitting of the model and improve the generalization ability of the model, the regularization loss function may be added to the target loss function. It may be understood that the regularization loss function includes, but is not limited to, an L1 regularization loss function and an L2 regularization loss function. The regularization loss function may be considered as a penalty term of the target loss function.
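- The two auxiliary terms can be illustrated with a short sketch. It assumes the standard Shannon entropy of a predicted distribution and a plain L2 penalty (sum of squared weights); the function names, the small `eps` guard against log(0), and the sample values are assumptions made for the example.

```python
import math

def entropy_loss(p, eps=1e-12):
    """Shannon entropy of a predicted distribution; lower means a more confident prediction."""
    return -sum(q * math.log(q + eps) for q in p)

def l2_regularization(weights):
    """L2 penalty: the sum of squared model weights."""
    return sum(w * w for w in weights)

confident = [0.98, 0.01, 0.01]  # the model commits to one class
uncertain = [1 / 3, 1 / 3, 1 / 3]  # the model considers every class equally likely

low = entropy_loss(confident)
high = entropy_loss(uncertain)   # equals log(3), the maximum for three classes
penalty = l2_regularization([1.0, -2.0])
```

Because the uniform prediction has the highest entropy, minimizing this term pushes the model toward committing to a single class.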
- Based on the above description, the target loss function in this application may be represented as:
- Secondly, in this embodiment of this application, the specific content of the target loss function is provided, that is, the target loss function includes the first loss function, the second loss function, the third loss function, the entropy loss function, and the regularization loss function. In the foregoing manner, the model is trained in different dimensions by using loss functions of different types, thereby improving the model training accuracy.
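- One common way to combine the five terms is a weighted sum. The sketch below assumes that form; the weights $w_0$ to $w_4$ are hypothetical non-negative hyperparameters introduced for illustration and are not defined in the text above.

```latex
L_{total} = w_0 L_{CE} + w_1 L_{MTL} + w_2 L_{Con} + w_3 L_{Ent} + w_4 L_{Reg}
```

Under this assumption, setting a weight to zero removes the corresponding term, which makes it straightforward to test the contribution of each loss separately.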
- Optionally, based on the embodiment corresponding to FIG. 3, in a sixth optional embodiment of the method for training an image recognition model according to the embodiments of this application, the calculating the first loss function according to the first predicted probability and labeled information corresponding to the first image set may include:
- calculating the first loss function in the following manner:
$$L_{CE}(p_0, y_0) = -\log\left(p_0[y_0]\right)$$
where $L_{CE}$ represents the first loss function, $p_0$ represents the first predicted probability, and $y_0$ represents the labeled information corresponding to the first image set.
- In this embodiment, a calculation manner of the first loss function is described. The apparatus for training an image recognition model may calculate a first loss function according to a first predicted probability obtained through prediction and real labeled information corresponding to the first image set, the first loss function being a cross entropy loss function. It may be understood that in an actual application, the first loss function may alternatively be a loss function of another type, and the cross entropy loss function is used as an example herein for description.
- For example, a label 1 represents a benign tumor, a label 2 represents a malignant tumor, and a label 3 represents no tumor. It is assumed that a first predicted probability of a first image is (0.1, 0.2, 0.7), that is, a prediction label of the first image is the label 3, which is the label of no tumor. Labeling processing has been performed on the first image, so that labeled information $y_0$, that is, a real label, may be obtained. If the real label is the label 3, the probability distribution corresponding to the label 3 is (0, 0, 1). The cross entropy loss function describes the distance between two probability distributions, and a smaller cross entropy indicates that the two distributions are closer. An objective of the model training is to bring the distribution of the predicted probability closer to the distribution of the real probability.
- Secondly, in this embodiment of this application, the calculation manner of the first loss function is provided. In the foregoing manner, a specific implementation basis is provided for generation of the first loss function, thereby improving the feasibility and operability of the model training.
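- The tumor property example above can be worked through numerically. The sketch assumes the standard cross entropy with a natural logarithm and a one-hot real label; the function names are hypothetical.

```python
import math

def cross_entropy(p, label):
    """Cross entropy for a hard label; `label` is 1-based, as in the text."""
    return -math.log(p[label - 1])

def cross_entropy_one_hot(p, y):
    """Same quantity written against a one-hot real distribution y."""
    return -sum(t * math.log(q) for q, t in zip(p, y) if t > 0)

p0 = [0.1, 0.2, 0.7]                  # first predicted probability from the example
loss_label_3 = cross_entropy(p0, 3)   # real label 3 (no tumor): -log(0.7)
loss_label_1 = cross_entropy(p0, 1)   # if the real label were 1: -log(0.1)
loss_one_hot = cross_entropy_one_hot(p0, [0, 0, 1])
```

The loss is about 0.357 when the real label matches the most probable class and about 2.303 when the real label is the least probable class, which is why minimizing it pulls the predicted distribution toward the real one.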
- Optionally, based on the embodiment corresponding to FIG. 3, in a seventh optional embodiment of the method for training an image recognition model according to the embodiments of this application, the calculating the second loss function according to the second predicted probability and the third predicted probability may include:
- calculating the second loss function in the following manner:
$$L_{Con} = \frac{1}{K} \sum_{k=1}^{K} \left(p_s[k] - p_r[k]\right)^2$$
- or calculating the second loss function in the following manner:
$$L_{Con} = \sum_{k=1}^{K} p_s[k] \log \frac{p_s[k]}{p_r[k]}$$
where $L_{Con}$ represents the second loss function, $K$ represents the total quantity of classes, $p_s$ represents the second predicted probability, and $p_r$ represents the third predicted probability.
- In this embodiment, a calculation manner of the second loss function is described. The apparatus for training an image recognition model may calculate a second loss function according to a second predicted probability and a third predicted probability that are obtained through prediction. The second loss function may be a mean-square error (MSE) loss function or may be a Kullback-Leibler (KL) divergence loss function, corresponding to the first and second manners above respectively. It may be understood that in an actual application, the second loss function may alternatively be a loss function of another type, and the MSE loss function and the KL divergence loss function are used as examples herein for description.
- A calculation manner of the second predicted probability $p_s$ is as follows:
$$p_s[i] = \frac{\exp(z_s[i])}{\sum_{k=1}^{K} \exp(z_s[k])}$$
- A calculation manner of the third predicted probability $p_r$ is as follows:
$$p_r[i] = \frac{\exp(z_r[i])}{\sum_{k=1}^{K} \exp(z_r[k])}$$
where $z_s$ and $z_r$ represent the predicted values from which $p_s$ and $p_r$ are obtained through normalization processing.
- It may be understood that the second predicted probability and the third predicted probability may be outputted in the same training. Therefore, the second predicted probability may alternatively be represented as $p_0$, and $p_r$ represents the third predicted probability. Similarly, the third predicted probability $p_r$ is obtained after normalization processing is performed on a predicted value $z_r$. The second predicted probability and the third predicted probability may alternatively be outputted in different training iterations. The second loss function may specifically be a consistency loss function. A smaller second loss function indicates that the results of the two predictions are closer, that is, an effect of model training is better, and minimizing the second loss function drives the two predictions to be consistent.
- Secondly, in this embodiment of this application, the calculation manner of the second loss function is provided. In the foregoing manner, a specific implementation basis is provided for generation of the second loss function, thereby improving the feasibility and operability of the model training. In addition, an appropriate second loss function may be further selected for calculation according to a requirement, thereby improving the flexibility of the solution.
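- The two candidate consistency losses can be compared on a pair of predictions with a short sketch. It assumes the textbook definitions of mean-square error and KL divergence; the direction of the KL divergence (second prediction relative to third), the `eps` guard, and the sample probabilities are assumptions made for illustration.

```python
import math

def mse_loss(ps, pr):
    """Mean-square error between two predicted distributions."""
    return sum((a - b) ** 2 for a, b in zip(ps, pr)) / len(ps)

def kl_loss(ps, pr, eps=1e-12):
    """KL divergence of ps relative to pr; eps guards against log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(ps, pr))

ps = [0.7, 0.2, 0.1]  # prediction for the first perturbed version (hypothetical values)
pr = [0.6, 0.3, 0.1]  # prediction for the second perturbed version (hypothetical values)

mse_value = mse_loss(ps, pr)
kl_value = kl_loss(ps, pr)
```

Both losses are zero when the two predictions are identical and grow as they diverge. MSE is symmetric in its arguments, whereas KL divergence is not, which is one practical consideration when choosing between them.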
- Optionally, based on the embodiment corresponding to FIG. 3, in an eighth optional embodiment of the method for training an image recognition model according to the embodiments of this application, the calculating the third loss function according to the fourth predicted probability and labeled information corresponding to the third image set includes:
- calculating the third loss function in the following manner:
$$L_{MTL}(p_1, y_1) = -\log\left(p_1[y_1]\right)$$
where $L_{MTL}$ represents the third loss function, $p_1$ represents the fourth predicted probability, and $y_1$ represents the labeled information corresponding to the third image set.
- In this embodiment, a calculation manner of the third loss function is described. The apparatus for training an image recognition model may calculate a third loss function according to a fourth predicted probability obtained through prediction and real labeled information corresponding to the third image set, the third loss function being a cross entropy loss function. It may be understood that in an actual application, the third loss function may alternatively be a loss function of another type, and the cross entropy loss function is used as an example herein for description.
- For example, a label 1 represents that a picture is qualified, and a label 2 represents that a picture is not qualified. It is assumed that a fourth predicted probability of a third image is (0.2, 0.8), that is, a prediction label of the third image is the label 2, which is the label indicating that the picture is not qualified. Labeling processing has been performed on the third image, so that labeled information $y_1$, that is, a real label, may be obtained. If the real label is the label 1, the probability distribution corresponding to the label 1 is (1, 0). The cross entropy loss function describes the distance between two probability distributions, and a smaller cross entropy indicates that the two distributions are closer. An objective of the model training is to bring the distribution of the predicted probability closer to the distribution of the real probability.
- Secondly, in this embodiment of this application, the calculation manner of the third loss function is provided. In the foregoing manner, a specific implementation basis is provided for generation of the third loss function, thereby improving the feasibility and operability of the model training.
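- For the picture qualification example above, the loss can be worked out by hand. The calculation assumes the standard cross entropy with a natural logarithm, with the fourth predicted probability $p_1 = (0.2, 0.8)$ and the real label $y_1$ being the label 1.

```latex
L_{MTL} = -\log\left(p_1[y_1]\right) = -\log(0.2) \approx 1.609
```

Had the real label been the label 2, the same prediction would give $-\log(0.8) \approx 0.223$. The much larger loss in the mismatched case is what drives the model to correct the wrong prediction.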
- With reference to the foregoing description, this application further provides an image recognition method. Referring to FIG. 6, an embodiment of the image recognition method in the embodiments of this application includes the following steps.
- 201. Obtain an image to be recognized.
- In this embodiment, an image recognition apparatus obtains an image to be recognized. The image to be recognized may be an endoscope image or may be a medical image of another type. This is not limited herein. The image recognition apparatus may be deployed in the server or may be deployed in the terminal device. Herein, an example in which the image recognition apparatus is deployed in the terminal device is used for description, but is not to be understood as a limitation to this application.
- 202. Obtain an image recognition result corresponding to the image to be recognized by using an image recognition model, the image recognition model being the image recognition model according to the foregoing embodiments.
- In this embodiment, the image recognition apparatus inputs the image to be recognized into the image recognition model described in the foregoing embodiments, and the image recognition model outputs a corresponding image recognition result.
- 203. Display the image recognition result.
- In this embodiment, the image recognition apparatus may display the image recognition result. For ease of understanding, FIG. 7 is a schematic diagram of an interface of displaying an image recognition result according to an embodiment of this application. As shown in the figure, an inputted medical image is used as an example, and a doctor may select a corresponding task according to a requirement. If a task A, that is, a task of positioning a lesion part, is selected, a corresponding result is outputted based on the task A selected by the doctor, for example, a positioned lesion part is "stomach". If a task B, that is, a task of detecting a property of a tumor, is selected, a corresponding result is outputted based on the task B selected by the doctor, for example, a property of a tumor is detected as "benign". If a task C, that is, a task of determining a qualified condition of an image, is selected, a corresponding result is outputted based on the task C selected by the doctor, for example, a qualified condition of an image is "qualified".
- In this embodiment of this application, the image recognition method is provided, that is, an image to be recognized is obtained first, then the image to be recognized is inputted into a trained image recognition model, the image recognition model outputs an image recognition result, and finally the image recognition result is displayed. In the foregoing manner, when automatic diagnosis is performed by using the image recognition model provided in this application, a recognition result under a corresponding task may be displayed according to a requirement, to assist a doctor in diagnosis, thereby more effectively helping the doctor reduce misdiagnosis and missed diagnosis, especially for a doctor who lacks relevant clinical experience.
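- The task selection step can be sketched as a lookup from the selected task to its label set, followed by picking the most probable class. The task keys, label strings, and the `recognition_result` helper are hypothetical names for illustration; the probabilities stand in for model outputs.

```python
# Hypothetical label tables for the three tasks described above.
TASK_LABELS = {
    "A": ("lesion position", ["esophagus", "stomach", "duodenum", "colorectum", "none"]),
    "B": ("tumor property", ["benign", "malignant", "no tumor"]),
    "C": ("image quality", ["qualified", "not qualified"]),
}

def recognition_result(task, probabilities):
    """Turn the predicted probabilities for the selected task into a display string."""
    name, labels = TASK_LABELS[task]
    if len(probabilities) != len(labels):
        raise ValueError("probability vector does not match the task's label count")
    best = max(range(len(labels)), key=probabilities.__getitem__)  # index of the top class
    return f"{name}: {labels[best]}"

result_a = recognition_result("A", [0.05, 0.80, 0.05, 0.05, 0.05])
result_b = recognition_result("B", [0.90, 0.05, 0.05])
result_c = recognition_result("C", [0.95, 0.05])
```

Keeping the label tables separate from the model output makes it easy to add a further task without touching the display logic.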
- The apparatus for training an image recognition model in this application is described in detail below.
FIG. 8 is a schematic diagram of an embodiment of an apparatus for training an image recognition model according to an embodiment of this application. An apparatus 30 for training an image recognition model includes:
- an obtaining module 301, configured to obtain image sets to be trained, the image sets to be trained including at least a first image set, a second image set, and a third image set, the first image set including at least one first image, the second image set including at least one second image and at least one perturbed image, the third image set including at least one third image, the first image being a labeled image corresponding to a first task, the second image being an unlabeled image corresponding to the first task, the third image being a labeled image corresponding to a second task, the first task and the second task being different tasks;
- the obtaining module 301, further configured to obtain a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained, the first predicted probability being a predicted result outputted based on the first image set, the second predicted probability and the third predicted probability being predicted results outputted based on the second image set, and the fourth predicted probability being a predicted result outputted based on the third image set;
- a determining module 302, configured to determine a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability that are obtained by the obtaining module 301, the target loss function including at least a first loss function, a second loss function, and a third loss function, the first loss function being determined according to the first predicted probability, the second loss function being determined according to the second predicted probability and the third predicted probability, and the third loss function being determined according to the fourth predicted probability; and
- a training module 303, configured to train the image recognition model to be trained according to the target loss function determined by the determining module 302, to obtain an image recognition model.
- The embodiments of this application provide an apparatus for training an image recognition model. Image sets to be trained are obtained first, then a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability are obtained based on the image sets to be trained by using an image recognition model to be trained, subsequently, a target loss function is determined according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, and finally the image recognition model to be trained is trained based on the target loss function, to obtain an image recognition model. In the foregoing manner, a model is trained by using a labeled medical image for different tasks and an unlabeled medical image together. The labeled image and the unlabeled image are effectively used, so that a requirement for image labeling is reduced and a data volume for training is increased, thereby improving a model prediction effect while saving labeling resources.
- Optionally, based on the embodiment corresponding to FIG. 8, in another embodiment of the apparatus 30 for training an image recognition model in this embodiment of this application, the obtaining module 301 is further configured to:
- obtain the first predicted probability based on the first image set by using the image recognition model to be trained;
- obtain the second predicted probability and the third predicted probability based on the second image set by using the image recognition model to be trained; and
- obtain the fourth predicted probability based on the third image set by using the image recognition model to be trained.
- Secondly, in this embodiment of this application, a method for obtaining the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability is provided. The second predicted probability and the third predicted probability are obtained based on the second image set by using semi-supervised learning, and the fourth predicted probability is obtained based on the third image set by using multi-task learning (MTL). In the foregoing manner, training effectively uses unlabeled data to improve the model effect, and the requirement for labeled data is reduced while a better effect is obtained, thereby reducing product development costs and shortening the product development cycle. In addition, a plurality of related tasks can be trained at the same time by using one image recognition model: some parameters of the image recognition model are shared by all tasks, and other parameters are unique to each task. The shared parameters are trained on the data sets of all tasks, so that the data volume for training is increased and the noise unique to each training set is canceled out, thereby improving the generalization ability of the model and reducing overfitting. An independent output layer may select the most relevant features for a task from the shared part and learn a classification boundary unique to that task, so that the model has sufficient flexibility and can obtain relatively high accuracy for an image recognition task.
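The division into shared parameters and task-specific output layers can be sketched as follows. This is a toy illustration under stated assumptions, not the network of this application: the class name `MultiTaskModel`, the single shared linear layer, the layer sizes, and the task names are all hypothetical, and a real model would use a convolutional backbone.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (one row per output) by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class MultiTaskModel:
    """One shared feature layer plus one independent output layer per task."""

    def __init__(self, in_dim: int, hidden_dim: int,
                 classes_per_task: Dict[str, int], seed: int = 0) -> None:
        rng = random.Random(seed)
        # Parameters shared by all tasks (trained on the data of every task).
        self.shared = random_matrix(hidden_dim, in_dim, rng)
        # Parameters unique to each task (one output layer per task).
        self.heads = {task: random_matrix(n, hidden_dim, rng)
                      for task, n in classes_per_task.items()}

    def forward(self, x: Vector, task: str) -> Vector:
        features = relu(linear(self.shared, x))
        return linear(self.heads[task], features)  # unnormalized predicted values


model = MultiTaskModel(in_dim=4, hidden_dim=3,
                       classes_per_task={"first_task": 2, "second_task": 3})
x = [0.5, -0.2, 0.1, 0.9]
print(len(model.forward(x, "first_task")), len(model.forward(x, "second_task")))
```

Both tasks pass through the same `shared` weights, so every training image contributes to them, while each entry in `heads` is updated only by images of its own task.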
- Optionally, based on the embodiment corresponding to FIG. 8, in another embodiment of the apparatus 30 for training an image recognition model in this embodiment of this application, the obtaining module 301 is further configured to:
- obtain a first predicted value based on the at least one first image by using an FC layer included in the image recognition model to be trained; and
- perform normalization processing on the first predicted value, to obtain the first predicted probability.
- Secondly, in this embodiment of this application, a method for generating the first predicted probability is provided, that is, first, a first predicted value of the first image is obtained by using an FC layer included in the image recognition model to be trained, and then normalization processing is performed on the first predicted value of the first image, to obtain the first predicted probability of the first image. In the foregoing manner, after normalization processing is performed on a predicted value, a prediction class of a sample can be reflected more intuitively, thereby improving the accuracy of training sample classification and improving the model training efficiency and accuracy.
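The step from a predicted value to a predicted probability can be sketched as follows, assuming that the normalization processing is a softmax over the outputs of the FC layer. That choice is an assumption of this sketch, and the function name and example values are illustrative.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Normalize predicted values into probabilities that sum to 1."""
    largest = max(values)  # subtracted for numerical stability
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical FC-layer output for one first image with three classes.
first_predicted_value = [2.0, 1.0, 0.1]
first_predicted_probability = softmax(first_predicted_value)
print(first_predicted_probability)
```

After normalization, the class with the largest predicted value also has the largest probability, which is why the predicted class can be read directly from the output.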
- Optionally, based on the embodiment corresponding to FIG. 8, in another embodiment of the apparatus 30 for training an image recognition model in this embodiment of this application, the obtaining module 301 is further configured to:
- generate a first perturbed image set according to the at least one second image, the first perturbed image set including at least one first perturbed image, the first perturbed image having a correspondence with the second image, and the first perturbed image belonging to the perturbed image;
- generate a second perturbed image set according to the at least one second image, the second perturbed image set including at least one second perturbed image, the second perturbed image having a correspondence with the second image, and the second perturbed image belonging to the perturbed image;
- obtain the second predicted probability based on the at least one second image and the first perturbed image set by using the image recognition model to be trained; and
- obtain the third predicted probability based on the at least one second image and the second perturbed image set by using the image recognition model to be trained.
- Secondly, in this embodiment of this application, the data processing manner based on semi-supervised learning is provided, that is, random scrambling is performed twice on a second image, to obtain a first perturbed image and a second perturbed image, and the second image is then paired with each of the first perturbed image and the second perturbed image to form two training samples that are inputted into the model, to obtain two predicted probabilities. In the foregoing manner, random scrambling can be effectively performed on an unlabeled image, to obtain images with different degrees of perturbation as samples for model training, and no manual intervention is required during random scrambling, thereby improving the model training efficiency. In addition, randomized processing can improve the generalization ability of a model, thereby improving the model training effect. Semi-supervised learning avoids wasting data and resources, and addresses the problems that a model trained with fully supervised learning generalizes poorly and a model trained with unsupervised learning is inaccurate.
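The two rounds of random scrambling can be sketched as follows. The specific perturbation used here (a random horizontal flip plus small uniform pixel noise) is an assumption chosen for illustration, as are the function name `perturb`, the noise amplitude, and the toy 2x3 image.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale pixels in [0, 1], one inner list per row


def perturb(image: Image, rng: random.Random, noise: float = 0.05) -> Image:
    """Return a randomly perturbed copy: optional horizontal flip plus pixel noise."""
    flip = rng.random() < 0.5
    result = []
    for row in image:
        source = row[::-1] if flip else row
        result.append([min(1.0, max(0.0, p + rng.uniform(-noise, noise)))
                       for p in source])
    return result


rng = random.Random(0)
second_image = [[0.1, 0.5, 0.9], [0.3, 0.6, 0.2]]  # one unlabeled image

# Two independent perturbations of the same unlabeled image.
first_perturbed = perturb(second_image, rng)
second_perturbed = perturb(second_image, rng)

# Each pair forms one training sample for the model to be trained.
sample_for_second_probability = (second_image, first_perturbed)
sample_for_third_probability = (second_image, second_perturbed)
```

No label is needed at any point, which is what allows unlabeled images to contribute to training.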
- Optionally, based on the embodiment corresponding to FIG. 8, in another embodiment of the apparatus 30 for training an image recognition model in this embodiment of this application, the obtaining module 301 is further configured to:
- obtain a fourth predicted value based on the at least one third image by using a fully connected layer included in the image recognition model to be trained; and
- perform normalization processing on the fourth predicted value, to obtain the fourth predicted probability.
- Secondly, in this embodiment of this application, a method for generating the fourth predicted probability is provided, that is, first, a fourth predicted value of the third image is obtained by using the FC layer included in the image recognition model to be trained, and then normalization processing is performed on the fourth predicted value of the third image, to obtain the fourth predicted probability of the third image. In the foregoing manner, after normalization processing is performed on a predicted value, a prediction class of a sample can be reflected more intuitively, thereby improving the accuracy of training sample classification and improving the model training efficiency and accuracy.
- Optionally, based on the embodiment corresponding to FIG. 8, in another embodiment of the apparatus 30 for training an image recognition model in this embodiment of this application, the determining module 302 is further configured to:
- calculate the first loss function according to the first predicted probability and labeled information corresponding to the first image set;
- calculate the second loss function according to the second predicted probability and the third predicted probability;
- calculate the third loss function according to the fourth predicted probability and labeled information corresponding to the third image set;
- obtain an entropy loss function and a regularization loss function; and
- obtain the target loss function through calculation according to the first loss function, the second loss function, the third loss function, the entropy loss function, and the regularization loss function.
- Secondly, in this embodiment of this application, the specific content of the target loss function is provided, that is, the target loss function includes the first loss function, the second loss function, the third loss function, the entropy loss function, and the regularization loss function. In the foregoing manner, the model is trained in different dimensions by using loss functions of different types, thereby improving the model training accuracy.
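The combination of the five loss terms can be sketched as follows. The text states only that the target loss function is obtained through calculation according to the five terms, so the weighted sum below, the default weights of 1.0, and the particular forms of the entropy loss (entropy of a predicted distribution) and the regularization loss (an L2 penalty on the parameters) are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def entropy_loss(probabilities: List[float], eps: float = 1e-12) -> float:
    """Entropy of one predicted distribution; lower means a more confident prediction."""
    return -sum(p * math.log(p + eps) for p in probabilities)


def l2_regularization(parameters: List[float], coefficient: float) -> float:
    """Assumed regularization term: coefficient times the sum of squared parameters."""
    return coefficient * sum(w * w for w in parameters)


def target_loss(first: float, second: float, third: float,
                entropy: float, regularization: float,
                weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0, 1.0)) -> float:
    """Assumed combination: weighted sum of the five loss terms."""
    terms = (first, second, third, entropy, regularization)
    return sum(w * t for w, t in zip(weights, terms))


# Illustrative values only; real values come from the current training batch.
total = target_loss(first=1.0, second=0.5, third=2.0,
                    entropy=0.25, regularization=0.25)
print(total)
```

Exposing the weights makes it possible to balance the supervised terms against the consistency, entropy, and regularization terms, if such balancing is desired.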
- Optionally, based on the embodiment corresponding to FIG. 8, in another embodiment of the apparatus 30 for training an image recognition model in this embodiment of this application, the determining module 302 is further configured to calculate the first loss function in the following manner:
- Secondly, in this embodiment of this application, the calculation manner of the first loss function is provided. In the foregoing manner, a specific implementation basis is provided for generation of the first loss function, thereby improving the feasibility and operability of the model training.
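The formula for the first loss function is not reproduced in this text. As an illustration of how a loss can be computed from a predicted probability and labeled information, the following sketch assumes a cross-entropy loss, a common choice for labeled classification data. The function names and example values are hypothetical.

```python
import math
from typing import List


def cross_entropy(probabilities: List[float], label: int, eps: float = 1e-12) -> float:
    """Negative log of the probability assigned to the labeled class."""
    return -math.log(max(probabilities[label], eps))


def mean_cross_entropy(batch: List[List[float]], labels: List[int]) -> float:
    """Average cross-entropy over a batch of labeled first images."""
    losses = [cross_entropy(p, y) for p, y in zip(batch, labels)]
    return sum(losses) / len(losses)


# Two labeled images with their first predicted probabilities.
first_predicted_probabilities = [[0.5, 0.5], [1.0, 0.0]]
first_labels = [0, 0]
print(mean_cross_entropy(first_predicted_probabilities, first_labels))
```

The loss is zero when the labeled class receives probability 1 and grows as that probability decreases.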
- Optionally, based on the embodiment corresponding to FIG. 8, in another embodiment of the apparatus 30 for training an image recognition model in this embodiment of this application, the determining module 302 is further configured to:
- calculate the second loss function in the following manner:
- or calculate the second loss function in the following manner:
- Secondly, in this embodiment of this application, the calculation manner of the second loss function is provided. In the foregoing manner, a specific implementation basis is provided for generation of the second loss function, thereby improving the feasibility and operability of the model training. In addition, an appropriate second loss function may be further selected for calculation according to a requirement, thereby improving the flexibility of the solution.
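The two formulas for the second loss function are not reproduced in this text. As an illustration of two interchangeable ways to compare the second predicted probability with the third predicted probability, the following sketch assumes a mean squared error and a Kullback-Leibler divergence, both common consistency measures. These choices, the function names, and the example values are assumptions of this sketch.

```python
import math
from typing import List


def mse_consistency(p: List[float], q: List[float]) -> float:
    """Mean squared difference between two predicted distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def kl_consistency(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL divergence from q to p; eps avoids taking the log of zero."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


second_predicted_probability = [0.7, 0.3]  # from the first perturbed sample
third_predicted_probability = [0.6, 0.4]   # from the second perturbed sample

print(mse_consistency(second_predicted_probability, third_predicted_probability))
print(kl_consistency(second_predicted_probability, third_predicted_probability))
```

Both measures are zero when the two predictions agree and positive otherwise, so either one penalizes a model whose output changes under perturbation of the same unlabeled image.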
- Optionally, based on the embodiment corresponding to FIG. 8, in another embodiment of the apparatus 30 for training an image recognition model in this embodiment of this application, the determining module 302 is further configured to:
- calculate the third loss function in the following manner:
- Secondly, in this embodiment of this application, the calculation manner of the third loss function is provided. In the foregoing manner, a specific implementation basis is provided for generation of the third loss function, thereby improving the feasibility and operability of the model training.
- The image recognition apparatus in this application is described below in detail.
FIG. 9 is a schematic diagram of an embodiment of an image recognition apparatus according to an embodiment of this application. An image recognition apparatus 40 includes:
- an obtaining module 401, configured to obtain an image to be recognized;
- the obtaining module 401, further configured to obtain an image recognition result corresponding to the image to be recognized by using an image recognition model, the image recognition model being the image recognition model provided by the embodiments corresponding to FIG. 3; and
- a displaying module 402, configured to display the image recognition result obtained by the obtaining module 401.
- In this embodiment of this application, an image recognition apparatus is provided, that is, an image to be recognized is obtained first, the image to be recognized is then inputted into a trained image recognition model, the image recognition model outputs an image recognition result, and finally the image recognition result is displayed. In the foregoing manner, when automatic diagnosis is performed by using the image recognition model provided in this application, a recognition result under a corresponding task may be displayed according to a requirement, to assist a doctor in diagnosis, thereby more effectively helping the doctor reduce misdiagnosis and missed diagnosis, especially a doctor who lacks relevant clinical experience.
- The apparatus for training an image recognition model and the image recognition apparatus provided in this application may be deployed in an electronic device, and the electronic device may be a server or may be a terminal device.
- FIG. 10 is a schematic structural diagram of a server according to an embodiment of this application. The server 500 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 522 (for example, one or more processors) and a memory 532, and one or more storage media 530 (for example, one or more mass storage devices) that store application programs 542 or data 544. The memory 532 and the storage media 530 may be temporary storage or persistent storage. A program stored in the storage media 530 may include one or more modules (which are not marked in the figure), and each module may include a series of instruction operations on the server. Still further, the CPU 522 may be configured to communicate with the storage medium 530 to perform the series of instruction operations in the storage medium 530 on the server 500.
- The server 500 may further include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, and/or one or more operating systems 541 such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
- The steps performed by the server in the foregoing embodiment may be based on the structure of the server shown in FIG. 10.
- In this embodiment of this application, the CPU 522 included in the server further has the following functions:
- obtaining image sets to be trained, the image sets to be trained including at least a first image set, a second image set, and a third image set, the first image set including at least one first image, the second image set including at least one second image and at least one perturbed image, the third image set including at least one third image, the first image being a labeled image corresponding to a first task, the second image being an unlabeled image corresponding to the first task, the third image being a labeled image corresponding to a second task, the first task and the second task being different tasks;
- obtaining a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained, the first predicted probability being a predicted result outputted based on the first image set, the second predicted probability and the third predicted probability being predicted results outputted based on the second image set, and the fourth predicted probability being a predicted result outputted based on the third image set;
- determining a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, the target loss function including at least a first loss function, a second loss function, and a third loss function, the first loss function being determined according to the first predicted probability, the second loss function being determined according to the second predicted probability and the third predicted probability, and the third loss function being determined according to the fourth predicted probability; and
- training the image recognition model to be trained based on the target loss function, to obtain an image recognition model.
- In this embodiment of this application, the CPU 522 included in the server further has the following functions:
- obtaining an image to be recognized;
- obtaining an image recognition result corresponding to the image to be recognized by using an image recognition model, the image recognition model being the image recognition model according to the embodiments corresponding to FIG. 3; and
- displaying the image recognition result.
- This embodiment of this application further provides another apparatus for training an image recognition model and another image recognition apparatus shown in FIG. 11. For ease of description, only parts related to this embodiment of this application are shown. For specific technical details that are not disclosed, reference is made to the method part of the embodiments of this application. The terminal device may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS), an on-board computer, or the like, and the terminal device being a mobile phone is used as an example.
- FIG. 11 is a block diagram of a structure of a part of a mobile phone related to a terminal device according to an embodiment of this application. Referring to FIG. 11, the mobile phone includes components such as: a radio frequency (RF) circuit 610, a memory 620, an input unit 630, a display unit 640, a sensor 650, an audio circuit 660, a wireless fidelity (WiFi) module 670, a processor 680, and a power supply 690. The input unit 630 may include a touch panel 631 and another input device 632, the display unit 640 may include a display panel 641, and the audio circuit 660 is connected to a loudspeaker 661 and a microphone 662. A person skilled in the art can understand that the structure of the mobile phone shown in FIG. 11 does not constitute a limitation to the mobile phone, and the mobile phone may include more components or fewer components than those shown in the figure, or some components may be combined, or a different component deployment may be used.
- The memory 620 may be configured to store a software program and module. The processor 680 runs the software program and module stored in the memory 620, to implement various functional applications and data processing of the mobile phone. The memory 620 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function and an image display function), and the like. The data storage area may store data (for example, audio data and an address book) created according to the use of the mobile phone, and the like.
- The processor 680 is a control center of the mobile phone, and is connected to various parts of the entire mobile phone by using various interfaces and lines. By running or executing a software program and/or module stored in the memory 620, and invoking data stored in the memory 620, the processor executes various functions of the mobile phone and performs data processing, thereby monitoring the entire mobile phone.
- In this embodiment of this application, the processor 680 included in the terminal device further has the following functions:
- obtaining image sets to be trained, the image sets to be trained including at least a first image set, a second image set, and a third image set, the first image set including at least one first image, the second image set including at least one second image and at least one perturbed image, the third image set including at least one third image, the first image being a labeled image corresponding to a first task, the second image being an unlabeled image corresponding to the first task, the third image being a labeled image corresponding to a second task, the first task and the second task being different tasks;
- obtaining a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained, the first predicted probability being a predicted result outputted based on the first image set, the second predicted probability and the third predicted probability being predicted results outputted based on the second image set, and the fourth predicted probability being a predicted result outputted based on the third image set;
- determining a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, the target loss function including at least a first loss function, a second loss function, and a third loss function, the first loss function being determined according to the first predicted probability, the second loss function being determined according to the second predicted probability and the third predicted probability, and the third loss function being determined according to the fourth predicted probability; and
- training the image recognition model to be trained based on the target loss function, to obtain an image recognition model.
- In this embodiment of this application, the processor 680 included in the terminal device further has the following functions:
- obtaining an image to be recognized;
- obtaining an image recognition result corresponding to the image to be recognized by using an image recognition model, the image recognition model being the image recognition model according to the embodiments corresponding to FIG. 3; and
- displaying the image recognition result.
- FIG. 12 is a structural diagram of an endoscope medical diagnosis system 70 according to an implementation of this application. The endoscope medical diagnosis system 70 in this implementation is a system for supporting an endoscope service. The endoscope medical diagnosis system 70 has a probe 701, a processor 702, a display 703, a circuit 704, and an interface 705. The endoscope medical diagnosis system 70 and a terminal device 80 can work cooperatively. The probe 701 may be specifically an endoscope probe, and may be inserted into the esophagus, the gastrointestinal tract, the bronchi, or the like for real-time scanning imaging. A doctor can clearly identify a tumor growth level and a depth of invasion by using the endoscope probe. In addition, the endoscope probe may be further applied to imaging of organs in the vicinity of the intestinal tract, and plays a role in lesion diagnosis of the pancreas, bile duct, and gall bladder.
- The processor 702 is configured to recognize an endoscope image captured by the probe 701 and generate a recognition result. The display 703 displays a lesion recognition result according to an image signal inputted by the processor 702, the lesion recognition result being specifically an image result, and may also display, in real time, an image captured by the probe 701. The circuit 704 is configured to connect the modules in the endoscope medical diagnosis system 70 and provide electrical signals, to enable normal operation inside the endoscope medical diagnosis system 70 and enable the endoscope medical diagnosis system 70 to establish a communication connection with the terminal device 80.
- The endoscope medical diagnosis system 70 may directly recognize and process an acquired endoscope image, or send an acquired endoscope image to the terminal device 80 by using the interface 705, and the terminal device 80 recognizes and processes the endoscope image. The terminal device 80 can create an electronic medical record and a prescription, directly print an electronic medical record and a prescription, or the like, based on a lesion recognition result sent by the endoscope medical diagnosis system 70.
- In this embodiment of this application, the processor 702 included in the endoscope medical diagnosis system further has the following functions:
- obtaining image sets to be trained, the image sets to be trained including at least a first image set, a second image set, and a third image set, the first image set including at least one first image, the second image set including at least one second image and at least one perturbed image, the third image set including at least one third image, the first image being a labeled image corresponding to a first task, the second image being an unlabeled image corresponding to the first task, the third image being a labeled image corresponding to a second task, the first task and the second task being different tasks;
- obtaining a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained, the first predicted probability being a predicted result outputted based on the first image set, the second predicted probability and the third predicted probability being predicted results outputted based on the second image set, and the fourth predicted probability being a predicted result outputted based on the third image set;
- determining a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, the target loss function including at least a first loss function, a second loss function, and a third loss function, the first loss function being determined according to the first predicted probability, the second loss function being determined according to the second predicted probability and the third predicted probability, and the third loss function being determined according to the fourth predicted probability; and
- training the image recognition model to be trained based on the target loss function, to obtain an image recognition model.
- Optionally, the processor 702 included in the endoscope medical diagnosis system is further configured to perform the following steps:
- obtaining the first predicted probability based on the first image set by using the image recognition model to be trained;
- obtaining the second predicted probability and the third predicted probability based on the second image set by using the image recognition model to be trained; and
- obtaining the fourth predicted probability based on the third image set by using the image recognition model to be trained.
- Optionally, the processor 702 included in the endoscope medical diagnosis system is further configured to perform the following steps:
- obtaining a first predicted value based on the at least one first image by using an FC layer included in the image recognition model to be trained; and
- performing normalization processing on the first predicted value, to obtain the first predicted probability.
- Optionally, the processor 702 included in the endoscope medical diagnosis system is further configured to perform the following steps:
- generating a first perturbed image set according to the at least one second image, the first perturbed image set including at least one first perturbed image, the first perturbed image having a correspondence with the second image, and the first perturbed image belonging to the perturbed image;
- generating a second perturbed image set according to the at least one second image, the second perturbed image set including at least one second perturbed image, the second perturbed image having a correspondence with the second image, and the second perturbed image belonging to the perturbed image;
- obtaining the second predicted probability based on the at least one second image and the first perturbed image set by using the image recognition model to be trained; and
- obtaining the third predicted probability based on the at least one second image and the second perturbed image set by using the image recognition model to be trained.
- Optionally, the processor 702 included in the endoscope medical diagnosis system is further configured to perform the following steps:
- obtaining a fourth predicted value based on the at least one third image by using an FC layer included in the image recognition model to be trained; and
- performing normalization processing on the fourth predicted value, to obtain the fourth predicted probability.
- Optionally, the processor 702 included in the endoscope medical diagnosis system is further configured to perform the following steps:
- calculating the first loss function according to the first predicted probability and labeled information corresponding to the first image set;
- calculating the second loss function according to the second predicted probability and the third predicted probability;
- calculating the third loss function according to the fourth predicted probability and labeled information corresponding to the third image set;
- obtaining an entropy loss function and a regularization loss function; and
- obtaining the target loss function through calculation according to the first loss function, the second loss function, the third loss function, the entropy loss function, and the regularization loss function.
- In this embodiment of this application, the processor 702 included in the endoscope medical diagnosis system further has the following functions:
- obtaining an image to be recognized;
- obtaining an image recognition result corresponding to the image to be recognized by using an image recognition model, the image recognition model being the image recognition model according to the embodiments corresponding to FIG. 3; and
- displaying the image recognition result.
- A person skilled in the art can clearly understand that for convenience and conciseness of description, for specific working processes of the foregoing systems, apparatuses and units, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described herein again.
Claims (16)
- A method for training an image recognition model, executable by an electronic device, the method comprising:obtaining image sets to be trained, the image sets to be trained comprising at least a first image set, a second image set, and a third image set, the first image set comprising at least one first image, the second image set comprising at least one second image and at least one perturbed image, the third image set comprising at least one third image, the first image being a labeled image corresponding to a first task, the second image being an unlabeled image corresponding to the first task, the third image being a labeled image corresponding to a second task, the first task and the second task being different tasks;obtaining a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained, the first predicted probability being a predicted result outputted based on the first image set, the second predicted probability and the third predicted probability being predicted results outputted based on the second image set, and the fourth predicted probability being a predicted result outputted based on the third image set;determining a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, the target loss function comprising at least a first loss function determined according to the first predicted probability, a second loss function determined according to the second predicted probability and the third predicted probability, and a third loss function determined according to the fourth predicted probability; andtraining the image recognition model to be trained based on the target loss function, to obtain an image recognition model.
- The method according to claim 1, wherein the obtaining a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained comprises: obtaining the first predicted probability based on the first image set by using the image recognition model to be trained; obtaining the second predicted probability and the third predicted probability based on the second image set by using the image recognition model to be trained; and obtaining the fourth predicted probability based on the third image set by using the image recognition model to be trained.
- The method according to claim 2, wherein the obtaining the first predicted probability based on the first image set by using the image recognition model to be trained comprises: obtaining a first predicted value based on the at least one first image by using a fully connected layer comprised in the image recognition model to be trained; and performing normalization processing on the first predicted value, to obtain the first predicted probability.
- The method according to claim 2, wherein the second image set is generated by: generating a first perturbed image set according to the at least one second image, the first perturbed image set comprising at least one first perturbed image, the first perturbed image having a correspondence with the second image, and the first perturbed image belonging to the perturbed image; and generating a second perturbed image set according to the at least one second image, the second perturbed image set comprising at least one second perturbed image, the second perturbed image having a correspondence with the second image, and the second perturbed image belonging to the perturbed image; and wherein the obtaining the second predicted probability and the third predicted probability based on the second image set by using the image recognition model to be trained comprises: obtaining the second predicted probability based on the at least one second image and the first perturbed image set by using the image recognition model to be trained; and obtaining the third predicted probability based on the at least one second image and the second perturbed image set by using the image recognition model to be trained.
- The method according to claim 2, wherein the obtaining the fourth predicted probability based on the third image set by using the image recognition model to be trained comprises: obtaining a fourth predicted value based on the at least one third image by using a fully connected layer comprised in the image recognition model to be trained; and performing normalization processing on the fourth predicted value, to obtain the fourth predicted probability.
- The method according to claim 1, wherein the determining a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability comprises: calculating the first loss function according to the first predicted probability and labeled information corresponding to the first image set; calculating the second loss function according to the second predicted probability and the third predicted probability; calculating the third loss function according to the fourth predicted probability and labeled information corresponding to the third image set; obtaining an entropy loss function and a regularization loss function; and obtaining the target loss function through calculation according to the first loss function, the second loss function, the third loss function, the entropy loss function, and the regularization loss function.
- The method according to claim 6, wherein the calculating the first loss function according to the first predicted probability and labeled information corresponding to the first image set comprises: calculating the first loss function by: [formula not reproduced]
- The method according to claim 6, wherein the calculating the second loss function according to the second predicted probability and the third predicted probability comprises: [formula not reproduced]
- The method according to claim 6, wherein the calculating the third loss function according to the fourth predicted probability and labeled information corresponding to the third image set comprises: calculating the third loss function by: [formula not reproduced]
- An image recognition method, executable by an electronic device, the method comprising: obtaining an image to be recognized; obtaining an image recognition result corresponding to the image to be recognized by using an image recognition model, the image recognition model being the image recognition model trained according to any one of claims 1 to 9; and displaying the image recognition result.
- An apparatus for training an image recognition model, comprising: an obtaining module, configured to obtain image sets to be trained, the image sets to be trained comprising at least a first image set, a second image set, and a third image set, the first image set comprising at least one first image, the second image set comprising at least one second image and at least one perturbed image, the third image set comprising at least one third image, the first image being a labeled image corresponding to a first task, the second image being an unlabeled image corresponding to the first task, the third image being a labeled image corresponding to a second task, the first task and the second task being different tasks; the obtaining module, further configured to obtain a first predicted probability, a second predicted probability, a third predicted probability, and a fourth predicted probability based on the image sets to be trained by using an image recognition model to be trained, the first predicted probability being a predicted result outputted based on the first image set, the second predicted probability and the third predicted probability being predicted results outputted based on the second image set, and the fourth predicted probability being a predicted result outputted based on the third image set; a determining module, configured to determine a target loss function according to the first predicted probability, the second predicted probability, the third predicted probability, and the fourth predicted probability, the target loss function comprising at least a first loss function determined according to the first predicted probability, a second loss function determined according to the second predicted probability and the third predicted probability, and a third loss function determined according to the fourth predicted probability; and a training module, configured to train the image recognition model to be trained according to the target loss function determined by the determining module, to obtain an image recognition model.
- An image recognition apparatus, comprising: an obtaining module, configured to obtain an image to be recognized; the obtaining module, further configured to obtain an image recognition result corresponding to the image to be recognized by using an image recognition model, the image recognition model being the image recognition model trained according to any one of claims 1 to 9; and a displaying module, configured to display the image recognition result obtained by the obtaining module.
- An electronic device, comprising: a memory, a transceiver, a processor, and a bus system, the memory being configured to store a program; the processor being configured to execute the program in the memory, to perform the method according to any one of claims 1 to 9, or to perform the method according to claim 10; and the bus system being configured to connect the memory and the processor, to cause the memory to communicate with the processor.
- An endoscope medical diagnosis system, comprising: a probe, a circuit, a processor, and a display; the circuit being configured to excite the probe to obtain an image to be recognized; the processor being configured to obtain an image recognition result corresponding to the image to be recognized by using an image recognition model, the image recognition model being the image recognition model according to any one of claims 1 to 9; and the display being configured to display the image recognition result.
- A computer-readable storage medium, comprising instructions, the instructions, when run on a computer, causing the computer to perform the method according to any one of claims 1 to 9 or to perform the method according to claim 10.
- A computer program product, comprising instructions, the instructions, when run on a computer, causing the computer to perform the method according to any one of claims 1 to 9 or to perform the method according to claim 10.
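The loss structure recited in claims 1, 3, 5 and 6 can be illustrated with a short sketch. The loss formulas of claims 7 to 9 are not reproduced in this text, so every concrete form below is an assumption made for illustration only: cross-entropy for the first and third loss functions, mean squared error for the second (consistency) loss function, Shannon entropy for the entropy loss function, an L2 penalty for the regularization loss function, and a weighted sum with hypothetical weights `w_con`, `w_ent` and `w_reg` for the target loss function. The function names are likewise hypothetical, and softmax is assumed as the normalization step.

```python
import math

def softmax(logits):
    """Normalize raw fully connected outputs into a probability vector."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label, eps=1e-12):
    """Supervised loss for one labeled image (assumed form: cross-entropy)."""
    return -math.log(probs[label] + eps)

def consistency(p, q):
    """Loss between two predictions for one unlabeled image (assumed form: MSE)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

def entropy(probs, eps=1e-12):
    """Shannon entropy of one prediction (assumed form of the entropy loss)."""
    return -sum(p * math.log(p + eps) for p in probs)

def l2_penalty(weights):
    """Assumed regularization loss: sum of squared model weights."""
    return sum(w * w for w in weights)

def mean(values):
    return sum(values) / len(values)

def target_loss(p1, y1, p2, p3, p4, y4, weights,
                w_con=1.0, w_ent=0.1, w_reg=1e-4):
    """Combine the five terms as a weighted sum (hypothetical weights)."""
    loss_first = mean([cross_entropy(p, y) for p, y in zip(p1, y1)])
    loss_second = mean([consistency(a, b) for a, b in zip(p2, p3)])
    loss_third = mean([cross_entropy(p, y) for p, y in zip(p4, y4)])
    loss_entropy = mean([entropy(p) for p in p2 + p3])
    return (loss_first + loss_third + w_con * loss_second
            + w_ent * loss_entropy + w_reg * l2_penalty(weights))

# Toy usage with made-up logits; one image per set.
p1 = [softmax([2.0, 0.5])]        # labeled image, first task
p2 = [softmax([1.0, 1.2])]        # unlabeled image, first perturbation
p3 = [softmax([0.9, 1.4])]        # unlabeled image, second perturbation
p4 = [softmax([0.1, 0.3, 2.2])]   # labeled image, second task
loss = target_loss(p1, [0], p2, p3, p4, [2], weights=[0.5, -0.25])
```

In this sketch the two tasks may have different numbers of classes, which is why `p4` has three entries while the first-task predictions have two; in a real model the two tasks would typically share a backbone and use separate output layers, but that detail is not fixed by the claims.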
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910989262.8A CN110738263B (en) | 2019-10-17 | 2019-10-17 | Image recognition model training method, image recognition method and image recognition device |
PCT/CN2020/116998 WO2021073380A1 (en) | 2019-10-17 | 2020-09-23 | Method for training image recognition model, and method and apparatus for image recognition |
Publications (3)
Publication Number | Publication Date |
---|---|
EP3982292A1 (en) | 2022-04-13 |
EP3982292A4 (en) | 2022-08-17 |
EP3982292B1 (en) | 2023-08-09 |
Family
ID=69270074
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20877797.9A Active EP3982292B1 (en) | 2019-10-17 | 2020-09-23 | Method for training image recognition model, and method and apparatus for image recognition |
Country Status (5)
Country | Link |
---|---|
US (2) | US11960571B2 (en) |
EP (1) | EP3982292B1 (en) |
JP (1) | JP7355924B2 (en) |
CN (1) | CN110738263B (en) |
WO (1) | WO2021073380A1 (en) |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738263B (en) | 2019-10-17 | 2020-12-29 | 腾讯科技(深圳)有限公司 | Image recognition model training method, image recognition method and image recognition device |
US11929060B2 (en) * | 2020-03-04 | 2024-03-12 | Google Llc | Consistency prediction on streaming sequence models |
CN111401445B (en) * | 2020-03-16 | 2023-03-10 | 腾讯科技(深圳)有限公司 | Training method of image recognition model, and image recognition method and device |
CN113469205B (en) * | 2020-03-31 | 2023-01-17 | 阿里巴巴集团控股有限公司 | Data processing method and system, network model and training method thereof, and electronic device |
CN111523597B (en) * | 2020-04-23 | 2023-08-25 | 北京百度网讯科技有限公司 | Target recognition model training method, device, equipment and storage medium |
CN111582342B (en) * | 2020-04-29 | 2022-08-26 | 腾讯科技(深圳)有限公司 | Image identification method, device, equipment and readable storage medium |
CN111598169B (en) * | 2020-05-18 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Model training method, game testing method, simulation operation method and simulation operation device |
CN111738365B (en) * | 2020-08-06 | 2020-12-18 | 腾讯科技(深圳)有限公司 | Image classification model training method and device, computer equipment and storage medium |
CN111898696B (en) * | 2020-08-10 | 2023-10-27 | 腾讯云计算(长沙)有限责任公司 | Pseudo tag and tag prediction model generation method, device, medium and equipment |
CN112199479B (en) * | 2020-09-15 | 2024-08-02 | 北京捷通华声科技股份有限公司 | Method, device, equipment and storage medium for optimizing language semantic understanding model |
CN112001366B (en) * | 2020-09-25 | 2024-07-12 | 珠海微渐安防科技有限公司 | Model training method, face recognition device, equipment and medium |
CN112562069B (en) * | 2020-12-24 | 2023-10-27 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for constructing three-dimensional model |
CN112579808B (en) * | 2020-12-29 | 2023-07-18 | 上海赛图默飞医疗科技有限公司 | Data annotation processing method, device and system |
CN113255427B (en) * | 2021-02-09 | 2022-05-27 | 阿里巴巴集团控股有限公司 | Data processing method and device, electronic equipment and storage medium |
CN113011490B (en) * | 2021-03-16 | 2024-03-08 | 北京百度网讯科技有限公司 | Model training method and device and electronic equipment |
CN113255445A (en) * | 2021-04-20 | 2021-08-13 | 杭州飞步科技有限公司 | Multitask model training and image processing method, device, equipment and storage medium |
CN113762585B (en) * | 2021-05-17 | 2023-08-01 | 腾讯科技(深圳)有限公司 | Data processing method, account type identification method and device |
CN113642671B (en) * | 2021-08-27 | 2024-03-05 | 京东科技信息技术有限公司 | Semi-supervised meta learning method and device based on task distribution change |
CN114332558A (en) * | 2021-12-15 | 2022-04-12 | 厦门市美亚柏科信息股份有限公司 | Training method and device for multitask neural network, computing equipment and storage medium |
CN114359904B (en) * | 2022-01-06 | 2023-04-07 | 北京百度网讯科技有限公司 | Image recognition method, image recognition device, electronic equipment and storage medium |
CN114548366A (en) * | 2022-01-12 | 2022-05-27 | 湖南大学 | Deep network model construction method and system based on difficult sample mining |
CN115471717B (en) * | 2022-09-20 | 2023-06-20 | 北京百度网讯科技有限公司 | Semi-supervised training and classifying method device, equipment, medium and product of model |
CN116051486B (en) * | 2022-12-29 | 2024-07-02 | 抖音视界有限公司 | Training method of endoscope image recognition model, image recognition method and device |
CN116403074B (en) * | 2023-04-03 | 2024-05-14 | 上海锡鼎智能科技有限公司 | Semi-automatic image labeling method and device based on active labeling |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015066297A1 (en) * | 2013-10-30 | 2015-05-07 | Worcester Polytechnic Institute | System and method for assessing wound |
CN107122375B (en) * | 2016-12-12 | 2020-11-06 | 南京理工大学 | Image subject identification method based on image features |
US10592779B2 (en) * | 2017-12-21 | 2020-03-17 | International Business Machines Corporation | Generative adversarial network medical image generation for training of a classifier |
CN108986067B (en) * | 2018-05-25 | 2020-08-14 | 上海交通大学 | Cross-modality-based pulmonary nodule detection method |
CN108830300A (en) * | 2018-05-28 | 2018-11-16 | 深圳市唯特视科技有限公司 | A kind of object transmission method based on mixing supervisory detection |
US11487997B2 (en) * | 2018-10-04 | 2022-11-01 | Visa International Service Association | Method, system, and computer program product for local approximation of a predictive model |
CN110163234B (en) | 2018-10-10 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Model training method and device and storage medium |
CN109447065B (en) * | 2018-10-16 | 2020-10-16 | 杭州依图医疗技术有限公司 | Method and device for identifying mammary gland image |
CN110276741B (en) * | 2019-03-08 | 2022-12-16 | 腾讯科技(深圳)有限公司 | Method and device for nodule detection and model training thereof and electronic equipment |
CN109949309B (en) * | 2019-03-18 | 2022-02-11 | 安徽紫薇帝星数字科技有限公司 | Liver CT image segmentation method based on deep learning |
CN110009623B (en) * | 2019-04-10 | 2021-05-11 | 腾讯医疗健康(深圳)有限公司 | Image recognition model training and image recognition method, device and system |
CN110738263B (en) * | 2019-10-17 | 2020-12-29 | 腾讯科技(深圳)有限公司 | Image recognition model training method, image recognition method and image recognition device |
CN110909780B (en) | 2019-11-14 | 2020-11-03 | 腾讯科技(深圳)有限公司 | Image recognition model training and image recognition method, device and system |
2019
- 2019-10-17 CN CN201910989262.8A, patent CN110738263B (Active)
2020
- 2020-09-23 WO PCT/CN2020/116998, patent WO2021073380A1 (status unknown)
- 2020-09-23 EP EP20877797.9A, patent EP3982292B1 (Active)
- 2020-09-23 JP JP2022515569A, patent JP7355924B2 (Active)
2021
- 2021-10-29 US US17/515,312, patent US11960571B2 (Active)
2024
- 2024-02-12 US US18/438,595, patent US20240184854A1 (Pending)
Also Published As
Publication number | Publication date |
---|---|
US20220051059A1 (en) | 2022-02-17 |
CN110738263A (en) | 2020-01-31 |
WO2021073380A1 (en) | 2021-04-22 |
JP7355924B2 (en) | 2023-10-03 |
CN110738263B (en) | 2020-12-29 |
EP3982292A4 (en) | 2022-08-17 |
JP2022547184A (en) | 2022-11-10 |
EP3982292B1 (en) | 2023-08-09 |
US11960571B2 (en) | 2024-04-16 |
US20240184854A1 (en) | 2024-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3982292B1 (en) | Method for training image recognition model, and method and apparatus for image recognition | |
US11813113B2 (en) | Automated extraction of echocardiograph measurements from medical images | |
WO2019150813A1 (en) | Data processing device and method, recognition device, learning data storage device, machine learning device, and program | |
WO2020190851A1 (en) | An explainable ai (xai) platform for computational pathology | |
CN109191451B (en) | Abnormality detection method, apparatus, device, and medium | |
JP2021516090A (en) | Methods and equipment for annotating ultrasonography | |
CN111091127A (en) | Image detection method, network model training method and related device | |
US11682135B2 (en) | Systems and methods for detecting and correcting orientation of a medical image | |
CN113689355B (en) | Image processing method, image processing device, storage medium and computer equipment | |
Hooper et al. | Evaluating semi-supervision methods for medical image segmentation: applications in cardiac magnetic resonance imaging | |
JP2002163635A (en) | System and method for supporting diagnosis of pervasive hepatic disease by utilizing hierarchical neural network on basis of feature amount provided from ultrasonic image of diagnostic part | |
WO2023060735A1 (en) | Image generation model training and image generation methods, apparatus, device, and medium | |
US20230099284A1 (en) | System and method for prognosis management based on medical information of patient | |
US20240358354A1 (en) | Device and method for guiding in ultrasound assessment of an organ | |
CN117994620A (en) | Fusion method, device and equipment of CT image and MR image and storage medium | |
CN118506992A (en) | Ultrasonic image multitasking method, system, device and medium containing prompt | |
Zhu et al. | Evaluating the Role of Large Language Models Detection: A Comparative Analysis of Noninvasive Testing Methods and AI-Generated Diagnoses | |
CN116978549A (en) | Organ disease prediction method, device, equipment and storage medium | |
Dong | Deep Learning Classification of Spinal Osteoporotic Compression Fractures on Radiographs | |
CN117648976A (en) | Answer generation method, device, equipment and storage medium based on medical image | |
KR20240043488A (en) | A multimodal deep learning model for predicting future visual field in glaucoma patients | |
CN118039086A (en) | Medical image automatic interpretation system based on large language model | |
WO2024200344A1 (en) | Imaging protocol repository for automatic protocol update recommendations and a vendor-agnostic system and method for raising protocol-related alerts | |
JP2024007851A (en) | Information processing device, information processing method, and program | |
CN117153408A (en) | Method and device for predicting nodule growth trend, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20220110 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
RAP3 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED |
|
A4 | Supplementary search report drawn up and despatched |
Effective date: 20220714 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G06V 10/82 20220101ALI20220708BHEP Ipc: G16H 50/70 20180101ALI20220708BHEP Ipc: G16H 30/40 20180101ALI20220708BHEP Ipc: G06V 10/774 20220101ALI20220708BHEP Ipc: G06N 3/08 20060101ALI20220708BHEP Ipc: G16H 50/20 20180101ALI20220708BHEP Ipc: G06K 9/62 20220101AFI20220708BHEP |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R079 Ref document number: 602020015632 Country of ref document: DE Free format text: PREVIOUS MAIN CLASS: G06K0009620000 Ipc: G06N0003088000 Ref country code: DE Ref legal event code: R079 Free format text: PREVIOUS MAIN CLASS: G06K0009620000 Ipc: G06N0003088000 |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G06V 10/774 20220101ALI20230416BHEP Ipc: G16H 50/70 20180101ALI20230416BHEP Ipc: G16H 50/20 20180101ALI20230416BHEP Ipc: G16H 30/40 20180101ALI20230416BHEP Ipc: G06V 10/82 20220101ALI20230416BHEP Ipc: G06N 3/088 20230101AFI20230416BHEP |
|
INTG | Intention to grant announced |
Effective date: 20230509 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602020015632 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG9D |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: MP Effective date: 20230809 |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK05 Ref document number: 1598352 Country of ref document: AT Kind code of ref document: T Effective date: 20230809 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20231110 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20231209 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20231211 Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20231109 Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20231209 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20231110 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602020015632 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230923 |
|
REG | Reference to a national code |
Ref country code: BE Ref legal event code: MM Effective date: 20230930 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230923 Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: MM4A |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230923 |
|
26N | No opposition filed |
Effective date: 20240513 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230930 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230923 Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230930 Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230809 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230930 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20240919 Year of fee payment: 5 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20240923 Year of fee payment: 5 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20240917 Year of fee payment: 5 |