WO2023000872A1 - Supervised learning method, apparatus, device and storage medium for image features - Google Patents

Supervised learning method, apparatus, device and storage medium for image features

Info

Publication number
WO2023000872A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
feature extraction
features
enhanced
Prior art date
Application number
PCT/CN2022/098805
Other languages
English (en)
French (fr)
Inventor
文庆福
杜悦熙
杨森
杨鹏
张军
韩骁
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to EP22845039.1A priority Critical patent/EP4375857A1/en
Publication of WO2023000872A1 publication Critical patent/WO2023000872A1/zh
Priority to US18/127,657 priority patent/US20230237771A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10056Microscopic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30024Cell structures in vitro; Tissue sections in vitro
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • G06V2201/031Recognition of patterns in medical or anatomical images of internal organs

Definitions

  • The embodiments of the present application relate to the field of artificial intelligence, and in particular to a method, apparatus, device, and storage medium for supervised learning of image features.
  • In the field of computer vision (CV), the quality of image feature extraction directly affects the final image processing results.
  • In the related art, feature extraction is performed on medical images by training a feature extraction model, and subsequent image processing procedures are then performed based on the extracted image features.
  • In a related model training method, when supervised learning is used for model training, the label information of sample medical images is usually used as the supervision signal.
  • Embodiments of the present application provide a supervised learning method, apparatus, device, and storage medium for image features, which can realize self-supervised learning of image features without manual labeling, thereby improving model training efficiency. The technical solution is as follows:
  • An embodiment of the present application provides a method for supervised learning of image features, the method being executed by a computer device and including the steps described below, in which, based on the determined model loss, the feature extraction model is trained.
  • an embodiment of the present application provides a device for supervised learning of image features, the device comprising:
  • a data enhancement module configured to perform data enhancement on the original medical image to obtain a first enhanced image and a second enhanced image, and the first enhanced image and the second enhanced image are positive samples of each other;
  • a feature extraction module configured to perform feature extraction on the first enhanced image and the second enhanced image through a feature extraction model to obtain the first image feature of the first enhanced image and the second image feature of the second enhanced image;
  • a loss determination module configured to determine the model loss of the feature extraction model based on the first image feature, the second image feature, and negative sample image features, where the negative sample image features are image features corresponding to other original medical images;
  • the first training module is configured to train the feature extraction model based on the model loss.
  • An embodiment of the present application provides a computer device, the computer device including a processor and a memory, where at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the supervised learning method for image features described in the above aspects.
  • An embodiment of the present application provides a computer-readable storage medium, where at least one instruction is stored in the readable storage medium, and the at least one instruction is loaded and executed by a processor to implement the above-mentioned supervised learning method for image features.
  • an embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image feature supervised learning method provided by the above aspect.
  • In the embodiments of the present application, the first enhanced image and the second enhanced image, which are mutually positive samples, are obtained by data enhancement of the original medical image, and feature extraction is performed through the feature extraction model to obtain the first image feature and the second image feature. Other original medical images different from the original medical image are taken as negative samples, the model loss of the feature extraction model is determined based on the first image feature, the second image feature, and the negative sample image features, and the model loss is finally used to train the feature extraction model.
  • The feature extraction model thus learns the image features of medical images through self-supervised learning, without manual labeling of medical images, which reduces the cost of manual labeling in the model training process and improves the training efficiency of the feature extraction model.
  • Fig. 1 is a schematic diagram of the supervised learning method for image features shown in an exemplary embodiment of the present application;
  • Fig. 2 is a schematic diagram of the implementation of a medical image classification scenario shown in an exemplary embodiment of the present application;
  • Fig. 3 is a schematic diagram of the implementation of a medical image retrieval scenario shown in an exemplary embodiment of the present application;
  • Fig. 4 is a flowchart of the supervised learning method for image features provided by an exemplary embodiment of the present application;
  • Fig. 5 shows medical images that are mutually positive samples in an exemplary embodiment;
  • Fig. 6 is a flowchart of a supervised learning method for image features provided by another exemplary embodiment of the present application;
  • Fig. 7 is a schematic diagram of the implementation of the image feature self-supervised learning process shown in an exemplary embodiment of the present application;
  • Fig. 8 is a schematic diagram of a multi-global-descriptor network shown in an exemplary embodiment of the present application;
  • Fig. 9 is a flowchart of a model loss determination process shown in an exemplary embodiment of the present application;
  • Fig. 10 is a flowchart of a supervised learning method for image features provided by another exemplary embodiment of the present application;
  • Fig. 11 is a schematic diagram of valid samples and invalid samples shown in an exemplary embodiment of the present application;
  • Fig. 12 is a schematic diagram of the implementation of the process of weighted summation of multi-magnification image features shown in an exemplary embodiment of the present application;
  • Fig. 13 is a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application;
  • Fig. 14 is a structural block diagram of an apparatus for supervised learning of image features provided by an exemplary embodiment of the present application.
  • Computer vision is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers instead of human eyes to identify, track, and measure targets, and to perform further graphics processing so that the result becomes an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies, trying to build artificial intelligence systems that can obtain information from images or multidimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, smart transportation and other technologies, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
  • Image feature extraction is an important part of realizing specific functions, and the quality of the extracted image features directly affects the quality of the realized function. For example, in image recognition, high-quality extracted image features help to improve the accuracy of subsequent recognition; in image retrieval, they help to improve the comprehensiveness of retrieval results and reduce the probability of retrieving irrelevant results.
  • In the related art, a supervised model training method is usually used to train the feature extraction model, so that the trained model can be used for image feature extraction. Before performing supervised model training, a large number of sample images containing label information must be prepared in advance, so that the subsequent model training can be supervised by the label information.
  • For example, for an image classification task, the sample images used for model training need to contain class labels, while for an image segmentation task, the sample images need to contain object segmentation information.
  • manual labeling of sample images takes a lot of time, and the cost of labeling is high, resulting in low training efficiency of feature extraction models.
  • the embodiment of the present application provides a supervised learning method for image features.
  • As shown in Fig. 1, the computer device uses data augmentation to obtain, based on the original medical image 11, a first enhanced image 12 and a second enhanced image 13 that are mutually positive samples, and uses other original medical images 14 different from the original medical image 11 as negative samples. The model loss 18 is then determined based on the image features of the first enhanced image 12, the second enhanced image 13, and the negative samples (namely the first image feature 15, the second image feature 16, and the negative sample image feature 17), and the model loss 18 is used to train the feature extraction model 19.
  • In this way, the computer device can realize self-supervised feature learning with only original medical images, which helps to reduce the cost of sample preparation in the early stage of model training and improve the efficiency of model training.
  • the feature extraction model trained by the scheme provided in the embodiment of the present application can be used to extract image features of medical images, and the extracted image features can be used for tasks such as medical image classification and similar medical image retrieval.
  • As shown in Fig. 2, the image feature 23 of the medical image 21 to be classified is obtained, and the image feature 23 is input into the pre-trained classifier 24, which performs image classification according to the image feature 23 and finally outputs the classification label 25 corresponding to the medical image 21 to be classified.
  • In the medical image retrieval scenario shown in Fig. 3, in the offline data processing stage, the computer device first segments the whole slide image (Whole Slide Image, WSI) 301 to obtain small-sized medical images 302, uses the pre-trained feature extraction model 303 to perform feature extraction on each medical image 302 to obtain the image features 304 of each medical image 302, and constructs a medical image feature database based on the image features 304.
  • In the online retrieval stage, the user selects a retrieval region of the WSI 305 to be retrieved to obtain the medical image 306 to be retrieved, and the pre-trained feature extraction model 303 performs feature extraction on the medical image 306 to obtain the image feature 307 to be retrieved. The image feature 307 is then matched against the image features 304 in the medical image feature database, and the medical images 302 whose feature matching degree is higher than a threshold are determined as similar images 308.
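  • As an illustration of the online matching step, the following sketch scores a query feature against the feature database using cosine similarity and keeps matches above a threshold; the similarity metric, the function name, and the threshold value are assumptions for illustration, since the patent does not fix a specific matching-degree measure.

```python
import torch
import torch.nn.functional as F

def retrieve_similar(query_feat: torch.Tensor,
                     db_feats: torch.Tensor,
                     threshold: float = 0.8) -> torch.Tensor:
    """Return indices of database images whose cosine similarity to the
    query feature exceeds the threshold (the matching-degree threshold).

    query_feat: (D,) feature 307 of the image to be retrieved.
    db_feats:   (N, D) image features 304 in the feature database.
    """
    query = F.normalize(query_feat.unsqueeze(0), dim=1)  # (1, D)
    db = F.normalize(db_feats, dim=1)                    # (N, D)
    sims = (db @ query.t()).squeeze(1)                   # (N,) cosine similarities
    return torch.nonzero(sims > threshold).flatten()     # indices of similar images
```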
  • The supervised learning method for image features provided by the embodiments of the present application can be executed by a computer device used for feature extraction model training, such as a personal computer, workstation, physical server, or cloud server.
  • The following embodiments are described by taking the method executed by a computer device as an example.
  • Fig. 4 shows a flowchart of a method for supervised learning of image features provided by an exemplary embodiment of the present application. This embodiment is described by taking the method executed by a computer device as an example, and the method includes the following steps.
  • Step 401: perform data enhancement on the original medical image to obtain a first enhanced image and a second enhanced image, where the first enhanced image and the second enhanced image are mutually positive samples.
  • The goal of using self-supervised learning for feature extraction model training is to reduce the distance between similar medical images in the feature encoding space and increase the distance between dissimilar images in that space, so that the model gains the ability to distinguish image similarity. How to determine the similarity between input images during training and use it to correctly guide the model therefore becomes the key to self-supervised learning.
  • In the embodiment of the present application, by performing data enhancement on the same original medical image, a first enhanced image and a second enhanced image that are similar but not identical are obtained; the image features of the first enhanced image and the second enhanced image are highly similar but not completely consistent.
  • In some embodiments, the computer device can perform data enhancement on the original medical image in the color dimension (because the medical image is a stained microscopic tissue section sample, there may be differences in the degree of staining) and the direction dimension (because the tissue section may lie at any angle under the microscope, the medical image is not sensitive to the display direction).
  • Color enhancement is used to change the brightness of the image, so as to enhance robustness to variations in the color gamut.
  • Direction enhancement is used to change the angle or orientation of the image, thereby reducing the sensitivity to the image display direction.
  • In some embodiments, the method for the computer device to perform color enhancement on an image can be expressed as $I_c \leftarrow a_c \cdot I_c + b_c$, where $I_c$ represents the brightness of each pixel in the original medical image, and $a_c$ and $b_c$ are adjustment coefficients sampled from a preset value range.
  • When the computer device performs direction enhancement on the image, it may perform random angle rotation, random flip mirroring, and other processing on the original medical image, which is not limited in this embodiment.
  • In some embodiments, the computer device performs color enhancement and direction enhancement on the original medical image based on a first enhancement parameter to obtain the first enhanced image, and performs color enhancement and direction enhancement on the original medical image based on a second enhancement parameter to obtain the second enhanced image, where the first enhancement parameter and the second enhancement parameter are different.
  • the computer device may also perform data enhancement on the image from other dimensions, which is not limited in this embodiment of the present application.
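  • A minimal sketch of the two-view data enhancement described above, assuming PyTorch tensors; the coefficient ranges for a_c and b_c and the particular direction operations are illustrative assumptions, not values from the patent.

```python
import random
import torch

def augment(image: torch.Tensor) -> torch.Tensor:
    """Color enhancement I_c <- a_c * I_c + b_c followed by direction
    enhancement (random rotation / mirroring).
    image: (C, H, W) tensor with values in [0, 1].
    The coefficient ranges below are illustrative, not from the patent."""
    a_c = random.uniform(0.9, 1.1)           # hypothetical range for a_c
    b_c = random.uniform(-0.1, 0.1)          # hypothetical range for b_c
    image = (a_c * image + b_c).clamp(0.0, 1.0)

    k = random.randint(0, 3)                 # random multiple-of-90-degree rotation
    image = torch.rot90(image, k, dims=(1, 2))
    if random.random() < 0.5:                # random horizontal mirror
        image = torch.flip(image, dims=(2,))
    return image

# Two independent draws of the enhancement parameters yield the two views:
# first_enhanced, second_enhanced = augment(original), augment(original)
```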
  • In some embodiments, when the distance between two medical images is smaller than a distance threshold, the computer device determines that the two images are mutually positive samples, further increasing the number of positive samples.
  • the distance threshold is related to the resolution of the medical image, for example, under 10 times magnification, the distance threshold is 100 pixels.
  • As shown in Fig. 5, the computer device determines that, in the same WSI, the first medical image 51 of the first region and the second medical image 52 of the second region are mutually positive samples.
  • Step 402: perform feature extraction on the first enhanced image and the second enhanced image through the feature extraction model to obtain the first image features of the first enhanced image and the second image features of the second enhanced image.
  • the computer device inputs the first enhanced image and the second enhanced image into the feature extraction model, and the feature extraction model performs feature extraction to obtain the first image feature and the second image feature.
  • In some embodiments, the first image feature and the second image feature are represented by a feature map.
  • In some embodiments, the feature extraction model can be a model based on a residual network (ResNet), ResNeXt, or a vision transformer (Vision Transformer, ViT), which is not limited in this embodiment.
  • Step 403: determine the model loss of the feature extraction model based on the first image feature, the second image feature, and the negative sample image features, where the negative sample image features are image features corresponding to other original medical images.
  • In addition to using the first enhanced image and the second enhanced image that are mutually positive samples, the computer device also needs to introduce negative samples that are not similar to the first enhanced image and the second enhanced image, so that the feature extraction model can learn the difference in image features between dissimilar images.
  • In some embodiments, the computer device takes other original medical images different from the current original medical image as negative samples of the current original medical image, and then uses those other original medical images, or the enhanced images corresponding to them, as negative samples of the first enhanced image and the second enhanced image.
  • If other original medical images are used as negative samples of the first enhanced image and the second enhanced image, the negative sample image features are image features extracted from those original medical images; if the enhanced images corresponding to other original medical images are used as negative samples, the negative sample image features are image features extracted from those enhanced images.
  • The enhanced images corresponding to other original medical images can also be generated using the color enhancement and direction enhancement methods described above, which is not limited in this application.
  • the current original medical image and other original medical images are different images belonging to the same training batch, and the other original medical images undergo data enhancement and feature extraction before the current original medical image.
  • In some embodiments, the computer device determines the model loss of the feature extraction model based on the feature difference between the first image feature and the second image feature, and the feature difference between the first image feature (or the second image feature) and the negative sample image features. The feature difference between image features may be represented by a feature distance, such as the Euclidean distance, Manhattan distance, or cosine distance, which is not limited in this embodiment.
  • Step 404: train the feature extraction model based on the model loss.
  • In some embodiments, the computer device takes minimizing the model loss as the training goal, that is, reducing the feature difference between the first image feature and the second image feature while expanding the feature difference between the first image feature (or the second image feature) and the negative sample image features, and trains the feature extraction model until the training completion condition is met.
  • the training completion condition includes at least one of loss convergence or reaching the number of training rounds.
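  • The overall procedure of steps 401-404 can be sketched as the loop below, assuming an `augment` function like the one sketched earlier, a contrastive `loss_fn`, and a `queue` of negative sample image features; all names, the queue length, and the epoch count are placeholders rather than the patent's implementation.

```python
import torch

def train(loader, model, queue, optimizer, loss_fn, epochs: int = 100):
    """Self-supervised loop: two enhanced views per original medical image,
    a contrastive model loss against queued negatives, gradient updates."""
    for _ in range(epochs):
        for batch in loader:                               # original medical images
            v1 = torch.stack([augment(x) for x in batch])  # first enhanced images
            v2 = torch.stack([augment(x) for x in batch])  # second enhanced images
            f1, f2 = model(v1), model(v2)                  # first / second image features
            loss = loss_fn(f1, f2, queue)                  # model loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # keep the queue filled with the most recently extracted features
            queue = torch.cat([queue, f1.detach()])[-4096:]
```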
  • To sum up, in the embodiment of the present application, the first enhanced image and the second enhanced image, which are mutually positive samples, are obtained by data enhancement of the original medical image, and feature extraction is performed through the feature extraction model to obtain the first image feature and the second image feature. Other original medical images different from the original medical image are then used as negative samples, the model loss of the feature extraction model is determined based on the first image feature, the second image feature, and the negative sample image features, and the model loss is finally used to train the feature extraction model. In the whole process, self-supervised learning enables the feature extraction model to learn the image features of medical images without manual labeling, which reduces the cost of manual labeling in model training and improves the training efficiency of the feature extraction model.
  • In the embodiment of the present application, the feature extraction model includes two feature extraction branches, so that different feature extraction branches are used to perform feature extraction on different enhanced images, where different branches use feature extraction networks with different parameters (that is, the weights of the feature extraction networks are not shared).
  • the following uses an exemplary embodiment for description.
  • Fig. 6 shows a flowchart of a method for supervised learning of image features provided by another exemplary embodiment of the present application. This embodiment is described by taking the method executed by a computer device as an example, and the method includes the following steps.
  • Step 601: perform data enhancement on the original medical image to obtain a first enhanced image and a second enhanced image, where the first enhanced image and the second enhanced image are mutually positive samples.
  • As shown in Fig. 7, after data enhancement is performed on the original medical image, a first enhanced image 702 and a second enhanced image 703 are respectively obtained.
  • Step 602: perform feature extraction on the first enhanced image through a first feature extraction branch to obtain the first image features, where the first feature extraction branch includes a first feature extraction network.
  • The first enhanced image is input into the first feature extraction branch, and feature extraction is performed through the first feature extraction network of the branch to obtain the first image features.
  • In general, after feature extraction, the computer device performs pooling processing on the extracted image features. Common pooling methods include max pooling, average pooling, and so on, where max pooling focuses on the maximum value in the pooling region and average pooling focuses on the average value in the pooling region. Therefore, in order to improve the feature expression of image features, in a possible implementation, the feature extraction network is followed by a multiple global descriptor (Multiple Global Descriptor, MGD) network, and the MGD network is used to aggregate and output image features under different descriptors (corresponding to different pooling methods).
  • the computer device inputs the first enhanced image into the first feature extraction network to obtain the first intermediate image features output by the network.
  • the computer device inputs the first enhanced image 702 into the first feature extraction branch, and the first feature extraction network 704 therein performs feature extraction to obtain the first intermediate image features.
  • In some embodiments, the MGD network is composed of at least two pooling layers, where different pooling layers correspond to different pooling processing methods.
  • After the feature extraction is completed, the computer device performs pooling processing on the first intermediate image features through the at least two pooling layers to obtain at least two kinds of first global descriptors.
  • In some embodiments, the pooling layers may include at least two of a global average pooling (Global Average Pooling, GAP) layer, a global max pooling (Global Maximum Pooling, GMP) layer, and a general average pooling (General Average Pooling, GeAP) layer.
  • the computer device may also use other pooling methods to perform pooling processing on the intermediate image features, which is not limited in this embodiment.
  • As shown in Fig. 8, a GAP layer 801, a GMP layer 802, and a GeAP layer 803 are set in the MGD network. The intermediate image features 804 output by the feature extraction network are respectively input into the GAP layer 801, the GMP layer 802, and the GeAP layer 803, and three kinds of global descriptors 805 after different pooling processes are obtained.
  • The dimension of the intermediate image features is (N, C, H, W), where N is the number of enhanced images, C is the number of channels, H is the feature map height, and W is the feature map width.
  • In some embodiments, each pooling layer is followed by a multilayer perceptron (Multilayer Perceptron, MLP). The computer device further processes the first global descriptors obtained after pooling through the MLPs, cascades the at least two processed first global descriptors, and finally inputs the cascaded first global descriptor into an MLP to obtain the first image feature of the first enhanced image.
  • As shown in Fig. 8, the computer device inputs the global descriptors 805 into the MLPs 806, cascades the output results of each MLP 806 to obtain the cascaded global descriptor 807, and finally processes the cascaded global descriptor 807 through the MLP 808 to obtain the first image feature 809.
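  • A sketch of one plausible MGD head matching Fig. 8, assuming linear layers as the MLPs and a learnable-exponent generalized mean as the GeAP layer; the dimensions and the exponent initialization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GeAP(nn.Module):
    """Generalized average pooling with a learnable exponent p, one plausible
    reading of the patent's 'general average pooling' layer."""
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (N, C, H, W)
        x = x.clamp(min=self.eps).pow(self.p)
        return x.mean(dim=(2, 3)).pow(1.0 / self.p)        # (N, C)

class MGDNetwork(nn.Module):
    """Pool the intermediate features three ways (GAP, GMP, GeAP), project
    each descriptor with an MLP, cascade them, and project again."""
    def __init__(self, channels: int, dim: int = 128):
        super().__init__()
        self.geap = GeAP()
        self.mlps = nn.ModuleList(nn.Linear(channels, dim) for _ in range(3))
        self.head = nn.Linear(3 * dim, dim)   # MLP on the cascaded descriptor

    def forward(self, feat: torch.Tensor) -> torch.Tensor:  # (N, C, H, W)
        gap = feat.mean(dim=(2, 3))           # global average pooling
        gmp = feat.amax(dim=(2, 3))           # global max pooling
        geap = self.geap(feat)                # generalized average pooling
        descs = [mlp(d) for mlp, d in zip(self.mlps, (gap, gmp, geap))]
        return self.head(torch.cat(descs, dim=1))           # (N, dim) image feature
```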
  • Step 603: perform feature extraction on the second enhanced image through a second feature extraction branch to obtain the second image features, where the second feature extraction branch includes a second feature extraction network.
  • In some embodiments, the second feature extraction branch includes a second feature extraction network and an MGD network, where the weights of the second feature extraction network and the first feature extraction network are not shared, and the MGD networks in the two feature extraction branches are consistent.
  • this step may include the following steps:
  • the computer device inputs the second enhanced image into the second feature extraction network to obtain the second intermediate image features output by the network.
  • After the feature extraction is completed, the computer device performs pooling processing on the second intermediate image features through the at least two pooling layers to obtain at least two kinds of second global descriptors.
  • The computer device further processes the second global descriptors obtained after pooling through the MLPs, cascades the at least two processed second global descriptors, and finally inputs the cascaded second global descriptor into an MLP to obtain the second image feature of the second enhanced image.
  • For the feature extraction process using the second feature extraction branch, reference may be made to step 602, which is not repeated in this embodiment.
  • As shown in Fig. 7, the computer device performs feature extraction on the first enhanced image 702 through the first feature extraction network 704 and the MGD network 705 in the first feature extraction branch to obtain the first image features, and performs feature extraction on the second enhanced image 703 through the second feature extraction network 706 and the MGD network 705 in the second feature extraction branch to obtain the second image features.
  • Step 604: determine the model loss of the feature extraction model based on the first image features, the second image features, and the negative sample image features.
  • In the embodiment of the present application, the model loss of the feature extraction model includes a distance loss, which is determined by the positive sample feature distance and the negative sample feature distance. The positive sample feature distance is the feature distance between the first image feature and the second image feature, and the negative sample feature distance is the feature distance between the first image feature (or the second image feature) and the negative sample image features. The positive sample feature distance is positively correlated with the distance loss, while the negative sample feature distance is negatively correlated with the distance loss.
  • Although model training based on a model loss that includes the distance loss can reduce the feature distance between similar images and expand the feature distance between dissimilar images, the number of positive samples is too small: each original medical image is treated as an independent class, and training based on the distance loss alone increases the distance between all classes. Only increasing the distance between samples causes learning difficulties and can even introduce the problem of false negative samples.
  • Therefore, in the embodiment of the present application, the model loss of the feature extraction model includes not only the distance loss but also a clustering loss, so as to produce better cohesion among similar images.
  • In some embodiments, the computer device clusters the first image features corresponding to the original medical images in the current training batch to obtain k first cluster centroids, where k is an integer greater than or equal to 2; clusters the second image features corresponding to the original medical images in the current training batch to obtain k second cluster centroids; and determines the clustering loss based on the distances between the first image features and the k second cluster centroids, and the distances between the second image features and the k first cluster centroids.
  • The computer device can use clustering algorithms such as k-means clustering or mean-shift clustering to determine the cluster centroids, and a cluster centroid can be represented by the average feature of the image features in the same cluster, which is not limited in this embodiment.
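  • For the centroid computation, a plain k-means sketch over the batch features is shown below; the random initialization and fixed iteration count are simplifications, and the patent equally allows other algorithms such as mean-shift clustering.

```python
import torch

def kmeans_centroids(feats: torch.Tensor, k: int, iters: int = 10) -> torch.Tensor:
    """Plain k-means over the batch features; each centroid is the average
    feature of the image features assigned to that cluster.
    feats: (N, D) first (or second) image features of the training batch."""
    centroids = feats[torch.randperm(feats.size(0))[:k]].clone()  # random init
    for _ in range(iters):
        dists = torch.cdist(feats, centroids)     # (N, k) pairwise distances
        assign = dists.argmin(dim=1)              # nearest centroid per feature
        for j in range(k):
            members = feats[assign == j]
            if members.numel() > 0:
                centroids[j] = members.mean(dim=0)
    return centroids
```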
  • Since there is a certain adversarial relationship between the distance loss and the clustering loss, directly using the first image features and the second image features for clustering may cause learning difficulties in the subsequent training process. In some embodiments, the computer device therefore generates the first target feature and the second target feature corresponding to the original medical image based on the first image feature, and generates the third target feature and the fourth target feature corresponding to the original medical image based on the second image feature, thus using different target features to determine the distance loss and the clustering loss.
  • each feature extraction branch further includes a first MLP and a second MLP.
  • The computer device inputs the first image feature into the first MLP and the second MLP respectively to obtain the first target feature and the second target feature, and inputs the second image feature into the first MLP and the second MLP respectively to obtain the third target feature and the fourth target feature.
  • the computer device processes the first image feature through the first MLP 707, outputs the first target feature 709, processes the first image feature through the second MLP 708, and outputs the second target feature 710.
  • the computer device processes the second image feature through the first MLP 707, outputs the third target feature 711, processes the second image feature through the second MLP 708, and outputs the fourth target feature 712.
  • the process of determining the model loss may include the following sub-steps.
  • Step 604A: determine the distance loss based on the feature distance between the first image feature and the second image feature, and the feature distance between the first image feature and the negative sample image features.
  • The computer device uses the target features output by the same MLP to determine the distance loss. In some embodiments, the computer device determines the distance loss based on the feature distance between the first target feature and the third target feature, and the feature distance between the first target feature (or the third target feature) and the negative sample image features.
  • In some embodiments, the computer device calculates the distance loss through infoNCE, and the distance loss can be expressed as:

$$\mathcal{L}_{dist} = -\log\frac{\exp(f(x_1)\cdot f(x_2)/t)}{\exp(f(x_1)\cdot f(x_2)/t)+\sum_{i=1}^{l}\exp(f(x_1)\cdot m_i/t)}$$

  • where l is the number of negative sample image features, f(x_1) represents the first target feature, f(x_2) represents the third target feature, m_i is the i-th negative sample image feature, and t is a hyperparameter controlling the smoothness of the loss function.
  • the computer device calculates a distance loss 714 based on the first target feature 709 , the third target feature 711 and the negative sample image feature 713 .
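  • An infoNCE distance loss consistent with the formula above can be sketched as follows; the L2 normalization and the temperature value t = 0.07 are common defaults assumed here, not values stated in the patent.

```python
import torch
import torch.nn.functional as F

def distance_loss(f1: torch.Tensor, f2: torch.Tensor,
                  negatives: torch.Tensor, t: float = 0.07) -> torch.Tensor:
    """infoNCE distance loss; the cross-entropy over [positive, negatives]
    logits equals the -log ratio in the formula above.
    f1: (N, D) first target features; f2: (N, D) third target features;
    negatives: (L, D) negative sample image features; t: temperature."""
    f1, f2 = F.normalize(f1, dim=1), F.normalize(f2, dim=1)
    negatives = F.normalize(negatives, dim=1)
    pos = (f1 * f2).sum(dim=1, keepdim=True) / t       # (N, 1) positive logit
    neg = f1 @ negatives.t() / t                       # (N, L) negative logits
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(f1.size(0), dtype=torch.long, device=f1.device)
    return F.cross_entropy(logits, labels)             # positive is class 0
```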
  • Step 604B: respectively cluster the first image features and the second image features corresponding to the original medical images in the current training batch, and determine the clustering loss based on the clustering results.
  • determining the clustering loss may include the following steps:
  • In some embodiments, the computer device clusters the second target features corresponding to the original medical images in the current training batch to obtain k clusters, and determines the third cluster centroids based on the second target features in each cluster. As shown in Fig. 7, the computer device clusters the second target features 710 corresponding to the N original medical images to obtain k third cluster centroids 715. Similarly, the computer device clusters the fourth target features corresponding to the original medical images in the current training batch to obtain k clusters, and determines the fourth cluster centroids based on the fourth target features in each cluster; as shown in Fig. 7, clustering the fourth target features 712 corresponding to the N original medical images yields k fourth cluster centroids 716.
  • In some embodiments, the clustering loss includes the infoNCE loss between the cluster centroids corresponding to the first enhanced image and the target features corresponding to the second enhanced image, and the infoNCE loss between the cluster centroids corresponding to the second enhanced image and the target features corresponding to the first enhanced image (that is, a symmetric loss).
  • When determining the clustering loss, the computer device takes the cluster centroid of the cluster to which a target feature belongs as the positive sample and the centroids of the other clusters as negative samples, determines the distances between the target feature and the cluster centroids, and thereby obtains the clustering loss.
  • The clustering loss can be expressed as:

$$\mathcal{L}_{cluster} = -\log\frac{\exp(f(x_1)\cdot C(x_2)_j/t)}{\sum_{i=1}^{k}\exp(f(x_1)\cdot C(x_2)_i/t)}$$

  • where k is the number of fourth cluster centroids, f(x_1) represents the first target feature, C(x_2) represents the fourth cluster centroids, f(x_1)·C(x_2)_j indicates that the first target feature belongs to the cluster corresponding to the j-th fourth cluster centroid, and t is a hyperparameter controlling the smoothness of the loss function; a symmetric term is computed between the third target features and the third cluster centroids.
  • the computer device calculates a clustering loss 717 based on the first target feature 709 , the third target feature 711 , the third cluster centroid 715 and the fourth cluster centroid 716 .
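  • One direction of the symmetric clustering loss can be sketched as below; the assignment indices and the normalization are assumptions, and the symmetric total mirrors Fig. 7 by pairing the first target features with the fourth cluster centroids and the third target features with the third cluster centroids.

```python
import torch
import torch.nn.functional as F

def clustering_loss(feats: torch.Tensor, centroids: torch.Tensor,
                    assign: torch.Tensor, t: float = 0.07) -> torch.Tensor:
    """One direction of the clustering loss: pull each target feature toward
    the centroid of the cluster it belongs to, against the other k-1 centroids.
    feats: (N, D) target features; centroids: (k, D) cluster centroids;
    assign: (N,) index j of the centroid each feature belongs to."""
    feats = F.normalize(feats, dim=1)
    centroids = F.normalize(centroids, dim=1)
    logits = feats @ centroids.t() / t     # (N, k) feature-to-centroid similarity
    return F.cross_entropy(logits, assign)

# Symmetric total, mirroring Fig. 7:
# loss = clustering_loss(first_target, fourth_centroids, assign_1) \
#      + clustering_loss(third_target, third_centroids, assign_2)
```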
  • Step 604C: determine the model loss according to the distance loss and the clustering loss.
  • the computer device determines the sum of distance loss 714 and clustering loss 717 as model loss 718 .
  • In other possible implementations, the distance loss and the clustering loss can also be weighted and summed to obtain the model loss, so as to flexibly adjust the respective weights of the distance loss and the clustering loss.
  • In some embodiments, the computer device updates the negative sample image features based on the first image feature and the second image feature, ensuring that the negative sample image feature queue contains the image features of the most recently input original images.
  • the computer device updates the negative sample image feature 713 based on the first target feature 709 and the third target feature 711 .
  • Step 605: train the first feature extraction network through the backpropagation algorithm based on the model loss.
  • In the embodiment of the present application, the network parameters of the first feature extraction network participate in gradient backpropagation, while the network parameters of the second feature extraction network do not; instead, they are updated based on the network parameters of the first feature extraction network. Therefore, when performing model training based on the model loss, the computer device adjusts the network parameters of the first feature extraction network through the backpropagation algorithm, completing one round of training of the feature extraction network.
  • the computer device updates the parameters of the first feature extraction network 704 based on the model loss 718 .
  • Step 606: update the network parameters of the second feature extraction network based on the network parameters of the trained first feature extraction network.
  • After completing the training of the first feature extraction network, the computer device further updates the network parameters of the second feature extraction network according to the network parameters of the trained first feature extraction network.
  • In some embodiments, the computer device may update the network parameters of the second feature extraction network based on the network parameters of the first feature extraction network in a sliding-average manner, where the sliding-average process can be expressed as:

$$\theta_B \leftarrow m\cdot\theta_B + (1-m)\cdot\theta_A$$

  • where θ_B is the network parameter of the second feature extraction network, θ_A is the network parameter of the first feature extraction network, and m is the control parameter.
  • the computer device updates the network parameters of the second feature extraction network 706 through a sliding average based on the updated network parameters of the first feature extraction network 704 .
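  • The sliding-average update of θ_B from θ_A can be sketched as follows; m = 0.999 is an illustrative value for the control parameter, not one stated in the patent.

```python
import torch

@torch.no_grad()
def momentum_update(net_a: torch.nn.Module, net_b: torch.nn.Module,
                    m: float = 0.999) -> None:
    """Sliding-average update of the second feature extraction network (B)
    from the first (A): theta_B <- m * theta_B + (1 - m) * theta_A.
    m = 0.999 is an illustrative value for the control parameter."""
    for p_a, p_b in zip(net_a.parameters(), net_b.parameters()):
        p_b.data.mul_(m).add_(p_a.data, alpha=1.0 - m)
```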
  • To sum up, the computer device determines the distance loss based on the feature distance between the positive sample image features, and the feature distance between the positive sample image features and the negative sample image features, so that during training the feature extraction network can learn both the feature similarity between similar images and the feature difference between dissimilar images. Clustering the image features and determining the clustering loss based on the distances between the image features and the cluster centroids helps to improve the cohesion among similar images, thereby improving the feature extraction quality of the trained feature extraction network.
  • In addition, the MGD network is used to aggregate and represent multiple global descriptors, which improves the feature expression of image features and helps to improve the quality of subsequent training.
  • Moreover, two MLPs are used to process the image features so that two target features are obtained for the same enhanced image, and the target features are then used for clustering and for determining the clustering loss. This avoids the training difficulty that would arise from directly using the image features to determine the clustering loss, given the adversarial relationship between the clustering loss and the distance loss.
  • the method further includes the following steps:
  • Step 4001: segment the WSI at the target magnification to obtain segmented images.
  • The image size of each segmented image is the same and conforms to the input image size of the feature extraction model. Since the feature extraction model is used to perform feature extraction on images at the target magnification, the computer device segments the WSI at the target magnification to obtain the segmented images.
  • Step 4002: screen the segmented images based on the amount of image information to obtain the original medical images.
  • Since segmented images with little image information (such as mostly blank regions) are of limited value for training, the computer device needs to filter the segmented images according to the amount of image information: segmented images with a low amount of image information are filtered out, and the original medical images are finally obtained.
  • As shown in Fig. 11, the first segmented image 1101 is an invalid sample to be filtered out, while the second segmented image 1102 is a valid sample to be retained.
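  • A sketch of steps 4001-4002, assuming the WSI is available as an array at the target magnification and using pixel standard deviation as the image-information measure; the metric, tile size, and threshold are assumptions, as the patent does not fix them.

```python
import numpy as np

def tile_and_filter(wsi: np.ndarray, tile: int = 224, min_std: float = 10.0):
    """Segment a WSI into fixed-size tiles and keep only tiles with enough
    image information; pixel standard deviation stands in for the
    information measure, which the patent does not fix.
    wsi: (H, W, 3) array at the target magnification."""
    h, w = wsi.shape[:2]
    kept = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            patch = wsi[y:y + tile, x:x + tile]
            if patch.std() >= min_std:       # near-blank tiles are filtered out
                kept.append(patch)
    return kept                              # the original medical images
```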
  • the computer device trains a magnification prediction model based on original medical images of different magnifications, and the magnification prediction model is used to predict the magnification of the input image.
  • In some embodiments, the computer device trains the magnification prediction model with the magnifications corresponding to the original medical images as supervision, and the trained model outputs the probability of each magnification. For example, when the magnifications of the medical images include 10x, 20x, and 40x, an output of (0.01, 0.95, 0.04) means that the probability that the input medical image is at 10x magnification is 0.01, the probability that it is at 20x is 0.95, and the probability that it is at 40x is 0.04.
  • In some embodiments, the prediction results of the magnification prediction model are used for feature fusion of the image features extracted by different feature extraction models: feature extraction models corresponding to different magnifications are used to extract features from the medical image, and the extracted image features are then fused based on the predicted probabilities (for example, by a weighted summation of the features), so that subsequent processing can be performed on the fused image features.
  • As shown in Fig. 12, the computer device performs feature extraction on a medical image 1201 through a first feature extraction model 1202, a second feature extraction model 1203, and a third feature extraction model 1204 (corresponding to different magnifications), and predicts the magnification of the medical image 1201 through the magnification prediction model 1205, so that, based on the magnification prediction result, the image features output by the three feature extraction models are weighted and summed to obtain the target image feature 1206.
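  • The weighted summation of Fig. 12 can be sketched as follows; the function names and the stacking of per-model features are illustrative assumptions.

```python
import torch

def fused_feature(image: torch.Tensor, extractors, mag_predictor) -> torch.Tensor:
    """Weight the features from magnification-specific extraction models by
    the predicted magnification probabilities, mirroring Fig. 12.
    extractors: list of feature extraction models (e.g. for 10x, 20x, 40x);
    mag_predictor: returns one probability per magnification."""
    probs = mag_predictor(image)                          # (M,) probabilities
    feats = torch.stack([f(image) for f in extractors])   # (M, D) per-model features
    return (probs.unsqueeze(1) * feats).sum(dim=0)        # (D,) target image feature
```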
  • As shown in Fig. 13, the computer device 1300 includes a central processing unit (Central Processing Unit, CPU) 1301, a system memory 1304 including a random access memory 1302 and a read-only memory 1303, and a system bus 1305 connecting the system memory 1304 and the central processing unit 1301.
  • The computer device 1300 also includes a basic input/output system (I/O system) 1306 that helps to transmit information between the various components in the computer, and a mass storage device 1307 used to store an operating system 1313, an application program 1314, and other program modules 1315.
  • the basic input/output system 1306 includes a display 1308 for displaying information and input devices 1309 such as a mouse and a keyboard for users to input information. Both the display 1308 and the input device 1309 are connected to the central processing unit 1301 through the input and output controller 1310 connected to the system bus 1305 .
  • the basic input/output system 1306 may also include an input-output controller 1310 for receiving and processing input from a keyboard, mouse, or electronic stylus and other devices. Similarly, input output controller 1310 also provides output to a display screen, printer, or other type of output device.
  • the mass storage device 1307 is connected to the central processing unit 1301 through a mass storage controller (not shown) connected to the system bus 1305 .
  • the mass storage device 1307 and its associated computer-readable media provide non-volatile storage for the computer device 1300 . That is, the mass storage device 1307 may include a computer-readable medium (not shown) such as a hard disk or drive.
  • Computer-readable media may comprise computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include random access memory (RAM), read-only memory (ROM), flash memory or other solid-state storage technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices.
  • the computer storage medium is not limited to the above-mentioned ones.
  • the above-mentioned system memory 1304 and mass storage device 1307 may be collectively referred to as memory.
  • One or more programs are stored in the memory and configured to be executed by one or more central processing units 1301; the one or more programs contain instructions for implementing the above methods, and the central processing unit 1301 executes the one or more programs to implement the methods provided by the above method embodiments.
  • According to various embodiments of the present application, the computer device 1300 may also be connected, through a network such as the Internet, to a remote computer on the network for operation. That is, the computer device 1300 can be connected to the network 1312 through the network interface unit 1311 connected to the system bus 1305, or the network interface unit 1311 can be used to connect to other types of networks or remote computer systems (not shown).
  • The memory also includes one or more programs, which are stored in the memory and include instructions for performing the steps executed by the computer device in the method provided by the embodiments of the present application.
  • Fig. 14 is a structural block diagram of a device for supervised learning of image features provided by an exemplary embodiment of the present application, the device comprising:
  • a data enhancement module 1401 configured to perform data enhancement on the original medical image to obtain a first enhanced image and a second enhanced image, and the first enhanced image and the second enhanced image are positive samples of each other;
  • a feature extraction module 1402 configured to perform feature extraction on the first enhanced image and the second enhanced image through a feature extraction model to obtain the first image features of the first enhanced image and the second image features of the second enhanced image;
  • a loss determination module 1403 configured to determine the model loss of the feature extraction model based on the first image features, the second image features, and the negative sample image features, where the negative sample image features are image features corresponding to other original medical images;
  • the first training module 1404 is configured to train the feature extraction model based on the model loss.
  • the feature extraction model includes a first feature extraction branch and a second feature extraction branch, and the first feature extraction branch and the second feature extraction branch use feature extraction networks with different parameters;
  • the feature extraction module 1402 includes:
  • a first extraction unit configured to perform feature extraction on the first enhanced image through the first feature extraction branch to obtain the first image features
  • the second extraction unit is configured to perform feature extraction on the second enhanced image through the second feature extraction branch to obtain the second image features.
  • In some embodiments, the first feature extraction branch includes a first feature extraction network and a multiple global descriptor network;
  • the second feature extraction branch includes a second feature extraction network and the multiple global descriptor network;
  • the multiple global descriptor network is used to aggregate and output image features under different descriptors;
  • the first extraction unit is specifically used for:
  • the second extraction unit is specifically used for:
  • At least two kinds of the second global descriptors are cascaded through the multi-global descriptor network, and the second image features are generated based on the cascaded second global descriptors.
  • the first training module 1404 includes:
  • a first training unit configured to train the first feature extraction network through a backpropagation algorithm based on the model loss
  • the second training unit is configured to update the network parameters of the second feature extraction network based on the network parameters of the first feature extraction network after training.
  • the loss determination module 1403 includes:
  • a first loss determination unit configured to determine a distance loss based on the feature distance between the first image feature and the second image feature, and the feature distance between the first image feature and the negative sample image features;
  • a second loss determination unit configured to respectively cluster the first image features and the second image features corresponding to the original medical images in the current training batch, and determine a clustering loss based on the clustering results;
  • a total loss determining unit configured to determine the model loss according to the distance loss and the clustering loss.
  • the second loss determination unit is configured to: cluster the first image features corresponding to the original medical images in the current training batch to obtain k first cluster centroids, k being an integer greater than or equal to 2; cluster the second image features corresponding to the original medical images in the current training batch to obtain k second cluster centroids; and determine the clustering loss based on distances between the first image features and the k second cluster centroids, and distances between the second image features and the k first cluster centroids.
  • the device also includes:
  • a first generating module configured to generate a first target feature and a second target feature corresponding to the original medical image based on the first image feature
  • a second generating module configured to generate a third target feature and a fourth target feature corresponding to the original medical image based on the second image feature;
  • the first loss determination unit is specifically configured to determine the distance loss based on a feature distance between the first target features and the third target features, and a feature distance between the first target features and the negative sample image features;
  • the second loss determination unit is specifically configured to: cluster the second target features corresponding to the original medical images in the current training batch to obtain k third cluster centroids; cluster the fourth target features corresponding to the original medical images in the current training batch to obtain k fourth cluster centroids; and determine the clustering loss based on distances between the first target features and the k fourth cluster centroids, and distances between the third target features and the k third cluster centroids;
  • the first generation module is configured to input the first image features into a first multilayer perceptron (MLP) and a second MLP respectively, to obtain the first target features and the second target features;
  • the second generation module is configured to input the second image features into the first MLP and the second MLP respectively, to obtain the third target features and the fourth target features.
  • the data enhancement module 1401 includes:
  • a first enhancement unit configured to perform color enhancement and direction enhancement on the original medical image based on a first enhancement parameter to obtain the first enhanced image
  • the second enhancement unit is configured to perform color enhancement and direction enhancement on the original medical image by using a second enhancement parameter to obtain the second enhanced image, and the first enhancement parameter is different from the second enhancement parameter.
  • the device further includes:
  • An updating module configured to update the negative sample image features based on the first image features and the second image features.
  • the feature extraction model is used to perform feature extraction on an image at a target magnification
  • the devices include:
  • a splitting module configured to split the whole-slide image (WSI) at the target magnification to obtain split images;
  • a screening module configured to screen the split images based on the amount of image information to obtain the original medical images.
  • the device also includes:
  • the second training module is configured to train a magnification prediction model based on original medical images at different magnifications, the magnification prediction model being used to predict the magnification of an input image, and the prediction results of the magnification prediction model being used to perform feature fusion on image features extracted by different feature extraction models.
  • In the embodiments of the present application, the first enhanced image and the second enhanced image, which are positive samples of each other, are obtained by performing data enhancement on the original medical image, and feature extraction is performed through the feature extraction model to obtain the first image features and the second image features. Other original medical images different from the original medical image are then used as negative samples, the model loss of the feature extraction model is determined based on the first image features, the second image features and the negative sample image features, and the feature extraction model is finally trained with the model loss. Throughout this process, self-supervised learning enables the feature extraction model to learn the image features of medical images without manual annotation of medical images, which reduces the manual annotation cost in model training and improves the training efficiency of the feature extraction model.
  • The computer device determines the distance loss based on the feature distance between positive sample image features, and the feature distance between positive sample image features and negative sample image features, so that during training the feature extraction network can learn the similarity of features between similar images and the difference of features between dissimilar images. Meanwhile, clustering the image features and determining the clustering loss based on the distances between the image features and the cluster centroids helps improve the cohesion between similar images, thereby improving the feature extraction quality of the trained feature extraction network.
  • The MGD network is used to aggregate multiple global descriptors into a single representation, which improves the feature expression of the image features and helps improve subsequent training quality.
  • Two MLPs are used to process the image features to obtain two target features for the same enhanced image, and the target features are then used for clustering and for determining the clustering loss. This avoids the training difficulty that arises when the clustering loss is determined directly from the image features, owing to the adversarial relationship between the clustering loss and the distance loss.
  • The device provided by the above embodiment is illustrated only with the division of the above functional modules as an example.
  • In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
  • The device provided by the above embodiment belongs to the same concept as the method embodiments; its implementation process is detailed in the method embodiments and will not be repeated here.
  • The embodiments of the present application also provide a computer-readable storage medium, in which at least one instruction is stored; the at least one instruction is loaded and executed by a processor to implement the supervised learning method for image features described in any of the above embodiments.
  • the computer-readable storage medium may include: ROM, RAM, solid state drives (SSD, Solid State Drives) or optical discs, etc.
  • RAM may include resistive random access memory (ReRAM, Resistance Random Access Memory) and dynamic random access memory (DRAM, Dynamic Random Access Memory).
  • An embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image feature supervised learning method described in the above embodiments.
  • Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)

Abstract

A supervised learning method and apparatus for image features, a device, and a storage medium, relating to the field of artificial intelligence. The method comprises: performing data enhancement on an original medical image to obtain a first enhanced image and a second enhanced image, the first enhanced image and the second enhanced image being positive samples of each other (401); performing feature extraction on the first enhanced image and the second enhanced image through a feature extraction model, to obtain first image features of the first enhanced image and second image features of the second enhanced image (402); determining a model loss of the feature extraction model based on the first image features, the second image features and negative sample image features, the negative sample image features being image features corresponding to other original medical images (403); and training the feature extraction model based on the model loss (404). Self-supervised learning enables the feature extraction model to learn the image features of medical images without manual image annotation, improving model training efficiency.

Description

Supervised learning method and apparatus for image features, device, and storage medium
This application claims priority to Chinese Patent Application No. 202110831737.8, entitled "Supervised learning method and apparatus for image features, device, and storage medium", filed on July 22, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of artificial intelligence, and in particular to a supervised learning method and apparatus for image features, a device, and a storage medium.
Background
As an important branch of artificial intelligence, computer vision (CV) technology is widely applied in medical image processing scenarios such as medical image recognition (identifying the categories of tissues and organs in medical images), medical image retrieval (retrieving similar medical images from a database), and medical image segmentation (segmenting tissue structures in medical images).
Image feature extraction, as an important link in image processing, directly affects the final image processing result. In the related art, a feature extraction model is trained to extract features from medical images, and subsequent image processing flows are then executed based on the extracted image features. In one model training approach, when supervised learning is used for model training, annotation information of sample medical images is usually used as the supervision.
However, since model training requires a large number of samples, and manually annotating sample medical images takes a great deal of time, model training efficiency is low.
Summary
The embodiments of this application provide a supervised learning method and apparatus for image features, a device, and a storage medium, which can realize self-supervised learning of image features without manual annotation, thereby improving model training efficiency. The technical solutions are as follows:
In one aspect, an embodiment of this application provides a supervised learning method for image features, the method being executed by a computer device and comprising:
performing data enhancement on an original medical image to obtain a first enhanced image and a second enhanced image, the first enhanced image and the second enhanced image being positive samples of each other;
performing feature extraction on the first enhanced image and the second enhanced image through a feature extraction model, to obtain first image features of the first enhanced image and second image features of the second enhanced image;
determining a model loss of the feature extraction model based on the first image features, the second image features and negative sample image features, the negative sample image features being image features corresponding to other original medical images;
training the feature extraction model based on the model loss.
In another aspect, an embodiment of this application provides a supervised learning apparatus for image features, the apparatus comprising:
a data enhancement module, configured to perform data enhancement on an original medical image to obtain a first enhanced image and a second enhanced image, the first enhanced image and the second enhanced image being positive samples of each other;
a feature extraction module, configured to perform feature extraction on the first enhanced image and the second enhanced image through a feature extraction model, to obtain first image features of the first enhanced image and second image features of the second enhanced image;
a loss determination module, configured to determine a model loss of the feature extraction model based on the first image features, the second image features and negative sample image features, the negative sample image features being image features corresponding to other original medical images;
a first training module, configured to train the feature extraction model based on the model loss.
In another aspect, an embodiment of this application provides a computer device comprising a processor and a memory, the memory storing at least one instruction, the at least one instruction being loaded and executed by the processor to implement the supervised learning method for image features described in the above aspect.
In another aspect, an embodiment of this application provides a computer-readable storage medium storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the supervised learning method for image features described in the above aspect.
In another aspect, an embodiment of this application provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the supervised learning method for image features provided in the above aspect.
In the embodiments of this application, data enhancement is performed on an original medical image to obtain a first enhanced image and a second enhanced image that are positive samples of each other, and feature extraction is performed through a feature extraction model to obtain first image features and second image features. Other original medical images different from the original medical image are used as negative samples, the model loss of the feature extraction model is determined based on the first image features, the second image features and the negative sample image features, and the feature extraction model is finally trained with the model loss. Throughout this process, self-supervised learning enables the feature extraction model to learn the image features of medical images without manual annotation of medical images, which reduces the manual annotation cost in model training and improves the training efficiency of the feature extraction model.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the principle of the supervised learning method for image features shown in an exemplary embodiment of this application;
Fig. 2 is a schematic diagram of the implementation of a medical image classification scenario shown in an exemplary embodiment of this application;
Fig. 3 is a schematic diagram of the implementation of a medical image retrieval scenario shown in an exemplary embodiment of this application;
Fig. 4 is a flowchart of the supervised learning method for image features provided by an exemplary embodiment of this application;
Fig. 5 shows medical images that are positive samples of each other according to an exemplary embodiment;
Fig. 6 is a flowchart of the supervised learning method for image features provided by another exemplary embodiment of this application;
Fig. 7 is a schematic diagram of the implementation of a self-supervised image feature learning process shown in an exemplary embodiment of this application;
Fig. 8 is a schematic diagram of a multiple global descriptor network shown in an exemplary embodiment of this application;
Fig. 9 is a flowchart of a model loss determination process shown in an exemplary embodiment of this application;
Fig. 10 is a flowchart of the supervised learning method for image features provided by another exemplary embodiment of this application;
Fig. 11 is a schematic diagram of valid samples and invalid samples shown in an exemplary embodiment of this application;
Fig. 12 is a schematic diagram of the implementation of a weighted summation process for multiple image features shown in an exemplary embodiment of this application;
Fig. 13 is a schematic structural diagram of a computer device provided by an exemplary embodiment of this application;
Fig. 14 is a structural block diagram of a supervised learning apparatus for image features provided by an exemplary embodiment of this application.
Detailed Description
Computer vision is a science that studies how to make machines "see"; more specifically, it refers to machine vision in which cameras and computers replace human eyes to identify, track and measure targets, with further graphics processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems capable of obtaining information from images or multidimensional data. Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
As an important link in implementing specific functions, image feature extraction directly affects the quality with which those functions are implemented. For example, in image recognition, high-quality extracted image features help improve the accuracy of subsequent recognition; in image retrieval, high-quality extracted image features help improve the comprehensiveness of the retrieval results and reduce the probability of retrieving irrelevant results.
In the related art, a supervised model training method is usually used to train the feature extraction model, and the trained model is then used for image feature extraction. Before supervised model training, a large number of sample images containing annotation information need to be prepared in advance so that training can be carried out with the annotation information as supervision. For example, when the feature extraction model is combined with a classification model to implement image classification, the sample images used for training need to contain category labels; when the feature extraction model is combined with a segmentation model to implement image segmentation, the sample images need to contain object segmentation information. However, manually annotating sample images takes a great deal of time and the annotation cost is high, resulting in low training efficiency of the feature extraction model.
In order to reduce the dependence on manual annotation while ensuring feature extraction quality, and thereby improve model training efficiency, the embodiments of this application provide a supervised learning method for image features. As shown in Fig. 1, the computer device uses data enhancement technology to obtain, based on an original medical image 11, a first enhanced image 12 and a second enhanced image 13 that are positive samples of each other, and uses other original medical images 14 different from the original medical image 11 as negative samples. A model loss 18 is then determined based on the image features of the first enhanced image 12, the second enhanced image 13 and the negative samples (including first image features 15, second image features 16 and negative sample image features 17), and the feature extraction model 19 is trained with the model loss 18. Throughout the model training process, without any manual annotation, the computer device only needs the original medical images to achieve self-supervised feature learning, which helps reduce the sample preparation cost before training and improve training efficiency.
The feature extraction model trained with the solution provided by the embodiments of this application can be used to extract image features of medical images, and the extracted image features can be used for tasks such as medical image classification and similar medical image retrieval.
In one possible application scenario, as shown in Fig. 2, after a medical image to be classified 21 is input into a pre-trained feature extraction model 22, image features 23 of the medical image to be classified 21 are obtained; the image features 23 are input into a pre-trained classifier 24, which performs image classification according to the image features 23 and finally outputs a classification label 25 corresponding to the medical image to be classified 21.
In another possible application scenario, as shown in Fig. 3, in the offline data processing stage the computer device first divides a whole-slide image (WSI) 301 to obtain small-sized medical images 302, uses a pre-trained feature extraction model 303 to extract features from each medical image 302 to obtain its image features 304, and builds a medical image feature database based on the image features 304. In the online retrieval stage, the user selects a retrieval region of a WSI to be retrieved 305 to obtain a medical image to be retrieved 306; the pre-trained feature extraction model 303 extracts features from the medical image to be retrieved 306 to obtain image features to be retrieved 307, which are matched against the image features 304 in the medical image feature database, and medical images 302 whose feature matching degree is higher than a threshold are determined as similar images 308.
It should be noted that the above application scenarios are only illustrative. The feature extraction model trained with the solution provided by the embodiments of this application can also be used in other scenarios that make use of image features, such as dividing abnormal tissue regions in medical images, which is not limited by the embodiments of this application.
In addition, the supervised learning method for image features provided by the embodiments of this application can be applied to a computer device used for feature extraction model training; the computer device may be a personal computer, a workstation, a physical server, a cloud server, or the like. For convenience of description, the following embodiments are described with the method being executed by a computer device as an example.
Fig. 4 shows a flowchart of the supervised learning method for image features provided by an exemplary embodiment of this application. This embodiment is described with the method being executed by a computer device as an example; the method includes the following steps.
Step 401: perform data enhancement on an original medical image to obtain a first enhanced image and a second enhanced image, the first enhanced image and the second enhanced image being positive samples of each other.
The goal of training a feature extraction model in a self-supervised manner is to reduce the distance between similar medical images in the feature encoding space and increase the distance between dissimilar images, giving the model the ability to discriminate image similarity. Therefore, how to judge the similarity between input images during training and guide the model correctly becomes the key to self-supervised learning. In the embodiments of this application, data enhancement of different degrees or kinds is performed on the original medical image to obtain a first enhanced image and a second enhanced image that are similar but not identical; correspondingly, the image features of the first enhanced image and the second enhanced image are highly similar but not completely consistent.
In one possible implementation, taking the characteristics of medical images into account, the computer device can enhance the data in two aspects: color (because medical images are stained microscopic tissue slice samples, the degree of staining may vary) and direction (because a tissue slice may lie at any angle under the microscope, medical images are not sensitive to display direction). Color enhancement is used to change the brightness of the image, enhancing its robustness in the color domain. Direction enhancement is used to change the angle or orientation of the image, reducing the sensitivity to its display direction.
In some embodiments, the method by which the computer device performs color enhancement on an image can be described as: I_c ← a_c · I_c + b_c, where I_c denotes the lightness of each pixel in the original medical image, and a_c and b_c are adjustment coefficients, each sampled from a preset value range.
When performing direction enhancement on an image, the computer device may rotate the original medical image by a random angle, randomly flip or mirror it, and so on, which is not limited in this embodiment.
In some embodiments, the computer device performs color enhancement and direction enhancement on the original medical image based on first enhancement parameters to obtain the first enhanced image, and performs color enhancement and direction enhancement on the original medical image with second enhancement parameters to obtain the second enhanced image, the first enhancement parameters being different from the second enhancement parameters.
For example, in the first enhancement parameters, a_c = 0.9, b_c = -5 and the rotation angle is +25°; in the second enhancement parameters, a_c = 1.05, b_c = +5 and the rotation angle is -25°.
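To make the two-view enhancement concrete, the following is a minimal sketch in Python/PyTorch; the function name `enhance`, the pixel value range, and the random horizontal flip are illustrative assumptions rather than details fixed by the embodiment:

```python
import random
import torch
import torchvision.transforms.functional as TF

def enhance(image: torch.Tensor, a_c: float, b_c: float, angle: float) -> torch.Tensor:
    """Color enhancement I_c <- a_c * I_c + b_c, then direction enhancement.

    `image` is a (C, H, W) tensor with pixel values in [0, 255].
    """
    # Color enhancement: scale and shift the lightness of every pixel.
    out = (image * a_c + b_c).clamp(0.0, 255.0)
    # Direction enhancement: rotate by the given angle, then randomly mirror.
    out = TF.rotate(out, angle)
    if random.random() < 0.5:
        out = TF.hflip(out)
    return out

# Two views of one original image, produced with the two parameter sets from
# the example above, form a positive pair.
original = torch.rand(3, 224, 224) * 255
view1 = enhance(original, a_c=0.9, b_c=-5.0, angle=25.0)
view2 = enhance(original, a_c=1.05, b_c=5.0, angle=-25.0)
```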
Of course, in addition to the above dimensions, the computer device may also enhance the image in other dimensions, which is not limited by the embodiments of this application.
In addition, considering that medical images have strong distance correlation at the physical scale, i.e., medical images that are physically close to each other are similar, in other possible implementations the computer device determines that two medical images whose distance is smaller than a distance threshold are positive samples of each other, further increasing the number of positive samples. The distance threshold is related to the resolution of the medical images; for example, at 10x magnification the distance threshold is 100 pixels.
Schematically, as shown in Fig. 5, the computer device determines that, within the same WSI, a first medical image 51 of a first region and a second medical image 52 of a second region are positive samples of each other.
Step 402: perform feature extraction on the first enhanced image and the second enhanced image through a feature extraction model, to obtain first image features of the first enhanced image and second image features of the second enhanced image.
Further, the computer device inputs the first enhanced image and the second enhanced image into the feature extraction model respectively, and the feature extraction model performs feature extraction to obtain the first image features and the second image features. The first image features and the second image features are represented as feature maps, and the feature extraction model may use a residual network (ResNet), ResNeXt or a Vision Transformer (ViT) as its backbone network; the embodiments of this application do not limit the backbone network used by the feature extraction model.
Step 403: determine a model loss of the feature extraction model based on the first image features, the second image features and negative sample image features, the negative sample image features being image features corresponding to other original medical images.
During model training, in addition to the first enhanced image and the second enhanced image that are positive samples of each other, the computer device also needs to introduce negative samples dissimilar to the first enhanced image and the second enhanced image, so that the feature extraction model can learn the differences between the image features of dissimilar images.
In one possible implementation, the computer device uses other original medical images different from the current original medical image as negative samples of the current original medical image, and then uses the other original medical images, or the enhanced images corresponding to the other original medical images, as negative samples of the first enhanced image and the second enhanced image.
In some embodiments, if the other original medical images are used as negative samples of the first enhanced image and the second enhanced image, the negative sample image features are image features extracted from the other original medical images. If the enhanced images corresponding to the other original medical images are used as negative samples, the negative sample image features are image features extracted from those enhanced images. In addition, the enhanced images corresponding to the other original medical images may likewise be generated by the color enhancement, direction enhancement and other methods described above, which is not limited by this application.
In some embodiments, the current original medical image and the other original medical images are different images belonging to the same training batch, and the other original medical images have undergone data enhancement and feature extraction before the current original medical image.
In some embodiments, the computer device determines the loss of the feature extraction model based on the feature difference between the first image features and the second image features, and the feature difference between the first image features (or the second image features) and the negative sample image features. The feature difference between image features may be expressed as a feature distance, which may be a Euclidean distance, Manhattan distance, cosine distance, or the like, which is not limited in this embodiment.
Step 404: train the feature extraction model based on the model loss.
Further, the computer device trains the feature extraction model with the training goal of minimizing the model loss, i.e., reducing the feature difference between the first image features and the second image features while enlarging the feature difference between the first image features (or the second image features) and the negative sample image features, until a training completion condition is met. The training completion condition includes at least one of loss convergence or reaching a set number of training rounds.
To sum up, in the embodiments of this application, data enhancement is performed on the original medical image to obtain a first enhanced image and a second enhanced image that are positive samples of each other, and feature extraction is performed through the feature extraction model to obtain first image features and second image features. Other original medical images different from the original medical image are used as negative samples, the model loss of the feature extraction model is determined based on the first image features, the second image features and the negative sample image features, and the feature extraction model is finally trained with the model loss. Throughout this process, self-supervised learning enables the feature extraction model to learn the image features of medical images without manual annotation of medical images, which reduces the manual annotation cost in model training and improves the training efficiency of the feature extraction model.
In one possible implementation, in order to avoid the collapse solution that would result from using the same feature extraction network on the first enhanced image and the second enhanced image and outputting identical feature extraction results, the feature extraction model in the embodiments of this application includes two feature extraction branches, so that different branches extract features from different enhanced images; the different branches use feature extraction networks with different parameters (i.e., the weights of the feature extraction networks are not shared). This is described below with an exemplary embodiment.
Fig. 6 shows a flowchart of the supervised learning method for image features provided by another exemplary embodiment of this application. This embodiment is described with the method being executed by a computer device as an example; the method includes the following steps.
Step 601: perform data enhancement on an original medical image to obtain a first enhanced image and a second enhanced image, the first enhanced image and the second enhanced image being positive samples of each other.
For the implementation of this step, refer to step 401 above; details are not repeated here.
Schematically, as shown in Fig. 7, after performing data enhancement on the original image 701, the computer device obtains a first enhanced image 702 and a second enhanced image 703.
Step 602: perform feature extraction on the first enhanced image through a first feature extraction branch to obtain the first image features, the first feature extraction branch including a first feature extraction network.
In one possible implementation, the first enhanced image input into the first feature extraction branch undergoes feature extraction by the first feature extraction network of that branch, obtaining the first image features.
Usually, after extracting features from an enhanced image, the computer device pools the extracted image features in order to reduce the feature dimension and thus the computation in subsequent inference; common pooling methods include max pooling, average pooling, and so on. However, different pooling methods attend to different things: for example, max pooling attends to the maximum value in the pooling region, while average pooling attends to the average value. Therefore, to improve the feature expression of the image features, in one possible implementation a Multiple Global Descriptor (MGD) network is connected after the feature extraction network; the MGD network is used to aggregate and output image features under different descriptors (corresponding to different pooling methods). This step may include the following steps:
1. Perform feature extraction on the first enhanced image through the first feature extraction network to obtain first intermediate image features.
The computer device inputs the first enhanced image into the first feature extraction network and obtains the first intermediate image features output by the network.
Schematically, as shown in Fig. 7, the computer device inputs the first enhanced image 702 into the first feature extraction branch, where the first feature extraction network 704 performs feature extraction to obtain the first intermediate image features.
2. Perform at least two kinds of pooling on the first intermediate image features through the MGD network to obtain at least two kinds of first global descriptors.
In some embodiments, the MGD network consists of at least two pooling layers, different pooling layers corresponding to different pooling methods. After feature extraction is completed, the computer device pools the first intermediate image features through the at least two pooling layers respectively, obtaining at least two kinds of first global descriptors.
In some embodiments, the pooling layers may include at least two of a Global Average Pooling (GAP) layer, a Global Maximum Pooling (GMP) layer and a General Average Pooling (GeAP) layer. Of course, the computer device may also pool the intermediate image features in other ways, which is not limited in this embodiment.
Schematically, as shown in Fig. 8, the MGD network is provided with a GAP layer 801, a GMP layer 802 and a GeAP layer 803. The intermediate image features 804 output by the feature extraction network are input into the GAP layer 801, the GMP layer 802 and the GeAP layer 803 respectively, yielding three kinds of global descriptors 805 after the different pooling operations. The dimensions of the intermediate image features are (N, C, H, W) and the dimensions of a global descriptor are (N, C, 1, 1), where N is the number of enhanced images, C is the number of channels, H is the feature map height, and W is the feature map width.
3. Cascade the at least two kinds of first global descriptors through the MGD network, and generate the first image features based on the cascaded first global descriptors.
In some embodiments, a multilayer perceptron (MLP) is connected after each pooling layer. The computer device further processes each pooled first global descriptor through its MLP, cascades the at least two processed first global descriptors, and finally inputs the cascaded first global descriptors into an MLP to obtain the first image features of the first enhanced image.
Schematically, as shown in Fig. 8, the computer device inputs the global descriptors 805 into the MLPs 806, cascades the outputs of the MLPs 806 to obtain cascaded global descriptors 807, and finally processes the cascaded global descriptors 807 through the MLP 808 to obtain the first image features 809.
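The following is a minimal PyTorch sketch of an MGD head as described above; the layer sizes, the single-linear-layer branch MLPs, and the generalized-mean implementation of the GeAP branch are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MGDHead(nn.Module):
    """Aggregates GAP, GMP and GeAP descriptors of a (N, C, H, W) feature map."""

    def __init__(self, channels: int, out_dim: int, p: float = 3.0):
        super().__init__()
        self.p = p  # exponent of the generalized-mean (GeAP) branch
        # One small MLP per descriptor, plus one MLP after the cascade.
        self.branch_mlps = nn.ModuleList(nn.Linear(channels, channels) for _ in range(3))
        self.out_mlp = nn.Linear(3 * channels, out_dim)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        gap = fmap.mean(dim=(2, 3))   # (N, C) global average pooling
        gmp = fmap.amax(dim=(2, 3))   # (N, C) global max pooling
        geap = fmap.clamp(min=1e-6).pow(self.p).mean(dim=(2, 3)).pow(1.0 / self.p)
        descriptors = [mlp(d) for mlp, d in zip(self.branch_mlps, (gap, gmp, geap))]
        cascaded = torch.cat(descriptors, dim=1)   # cascade the three descriptors
        return self.out_mlp(cascaded)              # final image features

# Usage: fmap would come from the backbone, e.g. a ResNet's last feature map.
features = MGDHead(channels=2048, out_dim=256)(torch.rand(8, 2048, 7, 7))
```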
Step 603: perform feature extraction on the second enhanced image through a second feature extraction branch to obtain the second image features, the second feature extraction branch including a second feature extraction network.
Similarly to the first feature extraction branch, the second feature extraction branch includes a second feature extraction network and the MGD network; the weights of the second feature extraction network are not shared with the first feature extraction network, while the MGD networks in the two branches are identical. In one possible implementation, this step may include the following steps:
1. Perform feature extraction on the second enhanced image through the second feature extraction network to obtain second intermediate image features.
The computer device inputs the second enhanced image into the second feature extraction network and obtains the second intermediate image features output by the network.
2. Perform at least two kinds of pooling on the second intermediate image features through the MGD network to obtain at least two kinds of global descriptors.
After feature extraction is completed, the computer device pools the second intermediate image features through the at least two pooling layers respectively, obtaining at least two kinds of second global descriptors.
3. Cascade the at least two kinds of second global descriptors through the MGD network, and generate the second image features based on the cascaded second global descriptors.
The computer device further processes each pooled second global descriptor through its MLP, cascades the at least two processed second global descriptors, and finally inputs the cascaded second global descriptors into an MLP to obtain the second image features of the second enhanced image.
For the feature extraction process using the second feature extraction branch, refer to step 602; details are not repeated here.
Schematically, as shown in Fig. 7, the computer device performs feature extraction on the first enhanced image 702 through the first feature extraction network 704 and the MGD network 705 in the first feature extraction branch to obtain the first image features, and performs feature extraction on the second enhanced image 703 through the second feature extraction network 706 and the MGD network 705 in the second feature extraction branch to obtain the second image features.
Step 604: determine the model loss of the feature extraction model based on the first image features, the second image features and the negative sample image features.
Since the feature extraction goal of the feature extraction model is to reduce the feature distance between similar images and enlarge the feature distance between dissimilar images, in this embodiment the model loss includes a distance loss determined from the positive sample feature distance and the negative sample feature distance. The positive sample feature distance is the feature distance between the first image features and the second image features; the negative sample feature distance is the feature distance between the first image features (or the second image features) and the negative sample image features. The positive sample feature distance is positively correlated with the distance loss, and the negative sample feature distance is negatively correlated with the distance loss.
Although training with a model loss containing the distance loss can reduce the feature distance between similar images and enlarge the feature distance between dissimilar images, the number of positive samples is too small: each original medical image is regarded as an independent class, and training with the distance loss enlarges the distances between all classes. Merely enlarging the distance between samples, however, can make learning difficult and even introduce false-negative samples. To avoid these problems, in the embodiments of this application the model loss includes a clustering loss in addition to the distance loss, producing better cohesion among similar images.
In one way of determining the clustering loss, the computer device clusters the first image features corresponding to the original medical images in the current training batch to obtain k first cluster centroids, k being an integer greater than or equal to 2; clusters the second image features corresponding to the original medical images in the current training batch to obtain k second cluster centroids; and determines the clustering loss based on the distances between the first image features and the k second cluster centroids, and the distances between the second image features and the k first cluster centroids.
The computer device may use clustering algorithms such as K-Means or mean-shift clustering to determine the cluster centroids, and a cluster centroid may be represented as the average feature of the image features in the same cluster, which is not limited in this embodiment.
However, since there is a certain adversarial relationship between the distance loss and the clustering loss, directly clustering the first image features and the second image features may cause learning difficulties in subsequent training. To avoid this, in another possible implementation the computer device generates first target features and second target features corresponding to the original medical image based on the first image features, and generates third target features and fourth target features corresponding to the original medical image based on the second image features, so that different target features are used to determine the distance loss and the clustering loss.
In some embodiments, each feature extraction branch further includes a first MLP and a second MLP. The computer device inputs the first image features into the first MLP and the second MLP respectively to obtain the first target features and the second target features, and inputs the second image features into the first MLP and the second MLP respectively to obtain the third target features and the fourth target features.
Schematically, as shown in Fig. 7, the computer device processes the first image features through the first MLP 707 to output first target features 709, and processes the first image features through the second MLP 708 to output second target features 710. The computer device processes the second image features through the first MLP 707 to output third target features 711, and processes the second image features through the second MLP 708 to output fourth target features 712.
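A minimal sketch of the two MLP heads follows; the two-layer structure and the hidden width are illustrative assumptions, as the embodiment does not fix the MLP architecture:

```python
import torch
import torch.nn as nn

def make_mlp(dim: int, hidden: int = 512) -> nn.Sequential:
    # A simple two-layer MLP head; depth and width are illustrative.
    return nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(inplace=True), nn.Linear(hidden, dim))

dim = 256
mlp1, mlp2 = make_mlp(dim), make_mlp(dim)    # first MLP 707 and second MLP 708

first_image_features = torch.rand(8, dim)    # output of the first branch
second_image_features = torch.rand(8, dim)   # output of the second branch

# The distance loss is computed on the first-MLP outputs and the clustering
# loss on the second-MLP outputs, decoupling the two objectives.
t1, t2 = mlp1(first_image_features), mlp2(first_image_features)    # first / second target features
t3, t4 = mlp1(second_image_features), mlp2(second_image_features)  # third / fourth target features
```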
In one possible implementation, as shown in Fig. 9, the process of determining the model loss may include the following sub-steps.
Step 604A: determine the distance loss based on the feature distance between the first image features and the second image features, and the feature distance between the first image features and the negative sample image features.
In one possible implementation, the computer device determines the distance loss using the target features output by the same MLP. In some embodiments, the computer device determines the distance loss based on the feature distance between the first target features and the third target features, and the feature distance between the first target features (or the second target features) and the negative sample image features.
In some embodiments, the computer device maintains a negative sample image feature queue containing the image features of the l most recently input original images; the queue can be expressed as M = {m_0, m_1, ..., m_{l-1}}. Since each original image appears only once per training round, the image features in the negative sample feature queue all come from data enhancements of different input images.
In some embodiments, the computer device computes the distance loss via infoNCE; the distance loss can be expressed as:

L_dist = -log [ exp(f(x_1)·f(x_2)/t) / ( exp(f(x_1)·f(x_2)/t) + Σ_{i=0}^{l-1} exp(f(x_1)·m_i/t) ) ]

where l is the number of negative sample image features, f(x_1) denotes the first target features, f(x_2) denotes the third target features, m_i is the i-th negative sample image feature, and t is a hyperparameter controlling the smoothness of the loss function.
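A sketch of this infoNCE computation is given below; the L2 normalization of features and the dot-product similarity are common conventions assumed for illustration, not details stated by the embodiment:

```python
import torch
import torch.nn.functional as F

def info_nce(query: torch.Tensor, positive: torch.Tensor,
             negatives: torch.Tensor, t: float = 0.07) -> torch.Tensor:
    """query/positive: (N, D); negatives: (l, D) queue M = {m_0..m_{l-1}}."""
    query = F.normalize(query, dim=1)
    positive = F.normalize(positive, dim=1)
    negatives = F.normalize(negatives, dim=1)
    pos_logit = (query * positive).sum(dim=1, keepdim=True) / t   # (N, 1)
    neg_logits = query @ negatives.T / t                          # (N, l)
    logits = torch.cat([pos_logit, neg_logits], dim=1)
    # The positive sits at index 0, so the loss is cross-entropy against label 0.
    labels = torch.zeros(len(query), dtype=torch.long)
    return F.cross_entropy(logits, labels)

# Usage with the first target features, third target features and the queue:
loss = info_nce(torch.rand(8, 256), torch.rand(8, 256), torch.rand(4096, 256))
```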
Schematically, as shown in Fig. 7, the computer device computes the distance loss 714 based on the first target features 709, the third target features 711 and the negative sample image features 713.
Step 604B: respectively cluster the first image features and the second image features corresponding to the original medical images in the current training batch, and determine the clustering loss based on the clustering results.
To avoid learning difficulties, when clustering image features the computer device clusters the target features output by the other MLP and determines the clustering loss based on the clustering results. In one possible implementation, determining the clustering loss may include the following steps:
1. Cluster the second target features corresponding to the original medical images in the current training batch to obtain k third cluster centroids.
In some embodiments, the computer device clusters the second target features corresponding to the original medical images in the current training batch into k clusters, and determines the third cluster centroids based on the second target features in each cluster.
Schematically, as shown in Fig. 7, the computer device clusters the second target features 710 corresponding to the N original medical images to obtain k third cluster centroids 715.
2. Cluster the fourth target features corresponding to the original medical images in the current training batch to obtain k fourth cluster centroids.
In some embodiments, the computer device clusters the fourth target features corresponding to the original medical images in the current training batch into k clusters, and determines the fourth cluster centroids based on the fourth target features in each cluster.
Schematically, as shown in Fig. 7, the computer device clusters the fourth target features 712 corresponding to the N original medical images to obtain k fourth cluster centroids 716.
3. Determine the clustering loss based on the distances between the first target features and the k fourth cluster centroids, and the distances between the third target features and the k third cluster centroids.
In some embodiments, the clustering loss includes the infoNCE between the cluster centroids corresponding to the first enhanced image and the target features corresponding to the second enhanced image, and the infoNCE between the cluster centroids corresponding to the second enhanced image and the target features corresponding to the first enhanced image (i.e., a symmetric loss). The computer device takes the centroid of the cluster to which a target feature belongs as the positive sample and the centroids of the other clusters as negative samples, determines the distances between the target feature and the cluster centroids, and thereby obtains the clustering loss.
Schematically, the clustering loss can be expressed as:

L_cluster = -log [ exp(f(x_1)·C(x_2)_j/t) / Σ_{i=0}^{k-1} exp(f(x_1)·C(x_2)_i/t) ], where f(x_1) ∈ C(x_2)_j

where k is the number of fourth cluster centroids, f(x_1) denotes the first target features, C(x_2) denotes the fourth cluster centroids, f(x_1) ∈ C(x_2)_j indicates that the first target features belong to the cluster corresponding to the j-th fourth cluster centroid, and t is a hyperparameter controlling the smoothness of the loss function.
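Putting the pieces together, the sketch below computes one direction of the symmetric clustering loss with K-Means centroids; the use of scikit-learn's KMeans, the normalization, and the cluster-assignment rule via `predict` are illustrative assumptions:

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def cluster_loss_one_way(targets: torch.Tensor, other_targets: torch.Tensor,
                         k: int = 8, t: float = 0.07) -> torch.Tensor:
    """infoNCE between `targets` and the K-Means centroids of `other_targets`.

    targets / other_targets: (N, D) target features from the two MLP heads.
    """
    targets = F.normalize(targets, dim=1)
    km = KMeans(n_clusters=k, n_init=10).fit(other_targets.detach().cpu().numpy())
    centroids = F.normalize(torch.as_tensor(km.cluster_centers_, dtype=torch.float32), dim=1)
    logits = targets @ centroids.T / t                      # (N, k)
    # Each feature's positive is the centroid of the cluster it falls into.
    labels = torch.as_tensor(km.predict(targets.detach().cpu().numpy())).long()
    return F.cross_entropy(logits, labels)

# Symmetric clustering loss over the two views (t1..t4 as in the sketch above):
t1, t3 = torch.rand(64, 256), torch.rand(64, 256)   # first / third target features
t2, t4 = torch.rand(64, 256), torch.rand(64, 256)   # second / fourth target features
loss = cluster_loss_one_way(t1, t4) + cluster_loss_one_way(t3, t2)
```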
Schematically, as shown in Fig. 7, the computer device computes the clustering loss 717 based on the first target features 709, the third target features 711, the third cluster centroids 715 and the fourth cluster centroids 716.
Step 604C: determine the model loss according to the distance loss and the clustering loss.
Schematically, as shown in Fig. 7, the computer device determines the sum of the distance loss 714 and the clustering loss 717 as the model loss 718. In some embodiments, a weighted sum of the distance loss and the clustering loss may also be computed as the model loss, so that the respective weights of the two losses can be flexibly adjusted.
It should be noted that after completing the above training flow, the computer device updates the negative sample image features based on the first image features and the second image features, ensuring that the negative sample feature queue contains the image features of the l most recently input original images. Schematically, as shown in Fig. 7, the computer device updates the negative sample image features 713 based on the first target features 709 and the third target features 711.
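A minimal sketch of such a first-in-first-out feature queue is shown below; the fixed capacity and the per-feature enqueue order are illustrative choices:

```python
import collections
import torch

class NegativeQueue:
    """FIFO queue M = {m_0, ..., m_{l-1}} of recent image features."""

    def __init__(self, capacity: int):
        self.buffer = collections.deque(maxlen=capacity)  # oldest features fall out

    def update(self, features: torch.Tensor) -> None:
        # Enqueue each feature of the finished batch, detached from the graph.
        for f in features.detach():
            self.buffer.append(f)

    def tensor(self) -> torch.Tensor:
        return torch.stack(list(self.buffer))  # (current_size, D)

queue = NegativeQueue(capacity=4096)
queue.update(torch.rand(64, 256))   # e.g. the batch's first and third target features
negatives = queue.tensor()          # used as m_i in the next distance-loss step
```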
Step 605: train the first feature extraction network through a backpropagation algorithm based on the model loss.
In one possible implementation, during model training the network parameters of the first feature extraction network participate in gradient backpropagation, while the network parameters of the second feature extraction network do not; the latter are instead obtained by updating from the network parameters of the first feature extraction network. Therefore, when training based on the model loss, the computer device adjusts the network parameters of the first feature extraction network through the backpropagation algorithm, completing one round of training of the feature extraction network.
Schematically, as shown in Fig. 7, the computer device updates the parameters of the first feature extraction network 704 based on the model loss 718.
Step 606: update the network parameters of the second feature extraction network based on the network parameters of the trained first feature extraction network.
After completing the training of the first feature extraction network, the computer device further updates the network parameters of the second feature extraction network according to the network parameters of the trained first feature extraction network. In one possible implementation, the computer device may use a moving average to update the network parameters of the second feature extraction network based on those of the first, where the moving-average process can be expressed as:

θ_B ← m·θ_B + (1-m)·θ_A

where θ_B are the network parameters of the second feature extraction network, θ_A are the network parameters of the first feature extraction network, and m is a control parameter.
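A sketch of this momentum (moving-average) update follows; the value m = 0.99 and the shared initialization are illustrative assumptions:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def momentum_update(net_a: nn.Module, net_b: nn.Module, m: float = 0.99) -> None:
    """theta_B <- m * theta_B + (1 - m) * theta_A; net_b receives no gradients."""
    for p_a, p_b in zip(net_a.parameters(), net_b.parameters()):
        p_b.mul_(m).add_(p_a, alpha=1.0 - m)

net_a = nn.Linear(256, 256)                 # first (backprop-trained) network
net_b = nn.Linear(256, 256)                 # second (momentum) network
net_b.load_state_dict(net_a.state_dict())   # common initialization
momentum_update(net_a, net_b)               # called after each backprop step on net_a
```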
Schematically, as shown in Fig. 7, the computer device updates the network parameters of the second feature extraction network 706 by moving average, based on the updated network parameters of the first feature extraction network 704.
In this embodiment, the computer device determines the distance loss based on the feature distance between positive sample image features and the feature distance between positive sample image features and negative sample image features, so that during training the feature extraction network can learn the similarity of features between similar images and the difference of features between dissimilar images. Meanwhile, by clustering the image features and determining the clustering loss based on the distances between the image features and the cluster centroids, the cohesion between similar images is improved, thereby improving the feature extraction quality of the trained feature extraction network.
In addition, in this embodiment, by providing the MGD network and using it to aggregate multiple global descriptors into a single representation, the feature expression of the image features is improved, which helps improve subsequent training quality.
Moreover, in this embodiment, two MLPs process the image features to obtain two target features for the same enhanced image, and the target features are then used for clustering and for determining the clustering loss, avoiding the training difficulty that arises when the clustering loss is determined directly from the image features, owing to the adversarial relationship between the clustering loss and the distance loss.
At different microscope magnifications, the same medical image often contains different semantic information, leading to very different similarities. Therefore, when training the feature extraction model, the computer device needs to train on medical images at the same magnification; correspondingly, the trained feature extraction model is used to extract features from images at the target magnification. In one possible implementation, building on Fig. 4, as shown in Fig. 10, before performing data enhancement on the original medical image the method further includes the following steps:
Step 4001: split the WSI at the target magnification to obtain split images.
Usually, medical images come in whole-slide image (WSI) format, which contains the same picture at different resolutions. Since the amount of data in a WSI is too large, the WSI first needs to be split at the different microscope magnifications (e.g. 10x, 20x, 40x) to obtain a number of split images. The split images all have the same size, conforming to the image input size of the feature extraction model.
In some embodiments, when a feature extraction model corresponding to a target magnification needs to be trained, the computer device splits the WSI at that target magnification to obtain the split images.
Step 4002: screen the split images based on the amount of image information to obtain the original medical images.
Split images located at the edges contain little image information, which is not conducive to subsequent model training. Therefore, after splitting, the computer device also screens the split images according to the amount of image information, filtering out split images with little image data, to finally obtain the original medical images.
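A minimal sketch of this tile-and-filter preprocessing is given below; the use of OpenSlide as the WSI reader and a simple foreground-ratio criterion as the measure of image information are illustrative assumptions, since the embodiment does not prescribe how the amount of information is measured:

```python
import numpy as np
import openslide  # assumed WSI reader; any tiling library would do

def tile_and_filter(wsi_path: str, level: int, tile: int = 224,
                    min_foreground: float = 0.5) -> list:
    """Split one WSI pyramid level into tiles and keep information-rich ones."""
    slide = openslide.OpenSlide(wsi_path)
    width, height = slide.level_dimensions[level]
    ds = slide.level_downsamples[level]  # read_region expects level-0 coordinates
    kept = []
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            region = slide.read_region((int(x * ds), int(y * ds)), level, (tile, tile))
            pixels = np.asarray(region.convert("RGB"))
            # Treat non-white pixels as stained tissue (foreground).
            foreground = (pixels.mean(axis=2) < 220).mean()
            if foreground >= min_foreground:  # discard mostly-blank edge tiles
                kept.append(pixels)
    return kept

original_images = tile_and_filter("slide.svs", level=1)
```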
Schematically, as shown in Fig. 11, the first split image 1101 is an invalid sample to be filtered out, while the second split image 1102 is a valid sample to be retained.
In addition, in practical applications, if the magnification of a medical image is unknown, the image features extracted from it by the trained feature extraction network may be inaccurate. To improve feature extraction quality in this case, in one possible implementation the computer device trains a magnification prediction model based on original medical images at different magnifications; the magnification prediction model is used to predict the magnification of an input image.
In some embodiments, the computer device trains the magnification prediction model with the magnification corresponding to each original medical image as supervision, and the trained model outputs a probability for each magnification. For example, when the magnifications of the medical images include 10x, 20x and 40x, if the magnification prediction model outputs the results 0.01, 0.95, 0.04, this means the probability that the input medical image's magnification is 10x is 0.01, the probability that it is 20x is 0.95, and the probability that it is 40x is 0.04.
In application, the prediction results of the magnification prediction model are used to perform feature fusion on the image features extracted by different feature extraction models. In one possible implementation, after obtaining the magnification probabilities of a medical image through the magnification prediction model, the computer device extracts features from the medical image through the feature extraction models corresponding to the different magnifications, and then performs feature fusion (such as a weighted sum of features) on the image features extracted by the different feature extraction models based on the predicted probabilities, so that subsequent processing can be based on the fused image features.
Schematically, as shown in Fig. 12, the computer device extracts features from the medical image 1201 through a first feature extraction model 1202, a second feature extraction model 1203 and a third feature extraction model 1204 (corresponding to different magnifications), predicts the magnification of the medical image 1201 through the magnification prediction model 1205, and performs a weighted sum of the image features output by the three feature extraction models based on the magnification prediction results, obtaining target image features 1206.
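A sketch of this probability-weighted fusion follows; the feature dimension and the placeholder inputs stand in for the outputs of the trained models:

```python
import torch

def fuse_features(image_features: list, probabilities: torch.Tensor) -> torch.Tensor:
    """Weighted sum of per-magnification features, weighted by predicted probabilities.

    image_features: list of (D,) tensors, one per magnification-specific extractor.
    probabilities:  (num_magnifications,) tensor from the magnification prediction model.
    """
    stacked = torch.stack(image_features)          # (num_magnifications, D)
    return (probabilities.unsqueeze(1) * stacked).sum(dim=0)

# Example with the probabilities from the text above (10x, 20x, 40x):
features = [torch.rand(256) for _ in range(3)]     # outputs of models 1202-1204
probs = torch.tensor([0.01, 0.95, 0.04])           # output of prediction model 1205
target = fuse_features(features, probs)            # target image features 1206
```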
Referring to Fig. 13, which shows a schematic structural diagram of a computer device provided by an exemplary embodiment of this application. Specifically, the computer device 1300 includes a central processing unit (CPU) 1301, a system memory 1304 including a random access memory 1302 and a read-only memory 1303, and a system bus 1305 connecting the system memory 1304 and the central processing unit 1301. The computer device 1300 also includes a basic input/output (I/O) system 1306 that helps transfer information between the components in the computer, and a mass storage device 1307 for storing an operating system 1313, application programs 1314 and other program modules 1315.
The basic input/output system 1306 includes a display 1308 for displaying information and an input device 1309 such as a mouse or keyboard for user input. The display 1308 and the input device 1309 are both connected to the central processing unit 1301 through an input/output controller 1310 connected to the system bus 1305. The basic input/output system 1306 may also include the input/output controller 1310 for receiving and processing input from a number of other devices such as a keyboard, mouse or electronic stylus. Similarly, the input/output controller 1310 also provides output to a display screen, printer or other type of output device.
The mass storage device 1307 is connected to the central processing unit 1301 through a mass storage controller (not shown) connected to the system bus 1305. The mass storage device 1307 and its associated computer-readable media provide non-volatile storage for the computer device 1300. That is, the mass storage device 1307 may include a computer-readable medium (not shown) such as a hard disk or a drive.
Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include random access memory (RAM), read-only memory (ROM), flash memory or other solid-state storage technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, cassettes, magnetic tape, disk storage or other magnetic storage devices. Of course, those skilled in the art will know that the computer storage media are not limited to the above. The system memory 1304 and the mass storage device 1307 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by one or more central processing units 1301; the one or more programs contain instructions for implementing the above methods, and the central processing unit 1301 executes the one or more programs to implement the methods provided by the method embodiments above.
According to various embodiments of this application, the computer device 1300 may also run as a remote computer connected to a network such as the Internet. That is, the computer device 1300 may be connected to the network 1312 through the network interface unit 1311 connected to the system bus 1305; in other words, the network interface unit 1311 may also be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs stored in the memory; the one or more programs contain instructions for performing the steps executed by the computer device in the methods provided by the embodiments of this application.
Fig. 14 is a structural block diagram of a supervised learning apparatus for image features provided by an exemplary embodiment of this application; the apparatus includes:
a data enhancement module 1401, configured to perform data enhancement on an original medical image to obtain a first enhanced image and a second enhanced image, the first enhanced image and the second enhanced image being positive samples of each other;
a feature extraction module 1402, configured to perform feature extraction on the first enhanced image and the second enhanced image through a feature extraction model, to obtain first image features of the first enhanced image and second image features of the second enhanced image;
a loss determination module 1403, configured to determine a model loss of the feature extraction model based on the first image features, the second image features and negative sample image features, the negative sample image features being image features corresponding to other original medical images;
a first training module 1404, configured to train the feature extraction model based on the model loss.
In some embodiments, the feature extraction model includes a first feature extraction branch and a second feature extraction branch, the two branches using feature extraction networks with different parameters;
the feature extraction module 1402 includes:
a first extraction unit, configured to perform feature extraction on the first enhanced image through the first feature extraction branch to obtain the first image features;
a second extraction unit, configured to perform feature extraction on the second enhanced image through the second feature extraction branch to obtain the second image features.
In some embodiments, the first feature extraction branch includes a first feature extraction network and an MGD network, the second feature extraction branch includes a second feature extraction network and the MGD network, and the MGD network is used to aggregate and output image features under different descriptors;
the first extraction unit is specifically configured to:
perform feature extraction on the first enhanced image through the first feature extraction network to obtain first intermediate image features;
perform at least two kinds of pooling on the first intermediate image features through the MGD network to obtain at least two kinds of first global descriptors;
cascade the at least two kinds of first global descriptors through the MGD network, and generate the first image features based on the cascaded first global descriptors;
the second extraction unit is specifically configured to:
perform feature extraction on the second enhanced image through the second feature extraction network to obtain second intermediate image features;
perform at least two kinds of pooling on the second intermediate image features through the MGD network to obtain at least two kinds of global descriptors;
cascade the at least two kinds of second global descriptors through the MGD network, and generate the second image features based on the cascaded second global descriptors.
In some embodiments, the first training module 1404 includes:
a first training unit, configured to train the first feature extraction network through a backpropagation algorithm based on the model loss;
a second training unit, configured to update the network parameters of the second feature extraction network based on the network parameters of the trained first feature extraction network.
In some embodiments, the loss determination module 1403 includes:
a first loss determination unit, configured to determine a distance loss based on a feature distance between the first image features and the second image features, and a feature distance between the first image features and the negative sample image features;
a second loss determination unit, configured to respectively cluster the first image features and the second image features corresponding to the original medical images in the current training batch, and determine a clustering loss based on the clustering results;
a total loss determination unit, configured to determine the model loss according to the distance loss and the clustering loss.
In some embodiments, the second loss determination unit is configured to:
cluster the first image features corresponding to the original medical images in the current training batch to obtain k first cluster centroids, k being an integer greater than or equal to 2;
cluster the second image features corresponding to the original medical images in the current training batch to obtain k second cluster centroids;
determine the clustering loss based on distances between the first image features and the k second cluster centroids, and distances between the second image features and the k first cluster centroids.
In some embodiments, the apparatus further includes:
a first generation module, configured to generate first target features and second target features corresponding to the original medical image based on the first image features;
a second generation module, configured to generate third target features and fourth target features corresponding to the original medical image based on the second image features;
the first loss determination unit is specifically configured to:
determine the distance loss based on a feature distance between the first target features and the third target features, and a feature distance between the first target features and the negative sample image features;
the second loss determination unit is specifically configured to:
cluster the second target features corresponding to the original medical images in the current training batch to obtain k third cluster centroids;
cluster the fourth target features corresponding to the original medical images in the current training batch to obtain k fourth cluster centroids;
determine the clustering loss based on distances between the first target features and the k fourth cluster centroids, and distances between the third target features and the k third cluster centroids.
In some embodiments, the first generation module is configured to:
input the first image features into a first multilayer perceptron (MLP) and a second MLP respectively, to obtain the first target features and the second target features;
In some embodiments, the second generation module is configured to:
input the second image features into the first MLP and the second MLP respectively, to obtain the third target features and the fourth target features.
In some embodiments, the data enhancement module 1401 includes:
a first enhancement unit, configured to perform color enhancement and direction enhancement on the original medical image based on first enhancement parameters to obtain the first enhanced image;
a second enhancement unit, configured to perform color enhancement and direction enhancement on the original medical image with second enhancement parameters to obtain the second enhanced image, the first enhancement parameters being different from the second enhancement parameters.
In some embodiments, the apparatus further includes:
an update module, configured to update the negative sample image features based on the first image features and the second image features.
In some embodiments, the feature extraction model is used to perform feature extraction on images at a target magnification;
the apparatus includes:
a splitting module, configured to split the whole-slide image (WSI) at the target magnification to obtain split images;
a screening module, configured to screen the split images based on the amount of image information to obtain the original medical images.
In some embodiments, the apparatus further includes:
a second training module, configured to train a magnification prediction model based on original medical images at different magnifications, the magnification prediction model being used to predict the magnification of an input image, and the prediction results of the magnification prediction model being used to perform feature fusion on image features extracted by different feature extraction models.
To sum up, in the embodiments of this application, data enhancement is performed on the original medical image to obtain a first enhanced image and a second enhanced image that are positive samples of each other, and feature extraction is performed through the feature extraction model to obtain first image features and second image features. Other original medical images different from the original medical image are used as negative samples, the model loss of the feature extraction model is determined based on the first image features, the second image features and the negative sample image features, and the feature extraction model is finally trained with the model loss. Throughout this process, self-supervised learning enables the feature extraction model to learn the image features of medical images without manual annotation of medical images, which reduces the manual annotation cost in model training and improves the training efficiency of the feature extraction model.
In this embodiment, the computer device determines the distance loss based on the feature distance between positive sample image features and the feature distance between positive sample image features and negative sample image features, so that during training the feature extraction network can learn the similarity of features between similar images and the difference of features between dissimilar images. Meanwhile, by clustering the image features and determining the clustering loss based on the distances between the image features and the cluster centroids, the cohesion between similar images is improved, thereby improving the feature extraction quality of the trained feature extraction network.
In addition, in this embodiment, by providing the MGD network and using it to aggregate multiple global descriptors into a single representation, the feature expression of the image features is improved, which helps improve subsequent training quality.
Moreover, in this embodiment, two MLPs process the image features to obtain two target features for the same enhanced image, and the target features are then used for clustering and for determining the clustering loss, avoiding the training difficulty that arises when the clustering loss is determined directly from the image features, owing to the adversarial relationship between the clustering loss and the distance loss.
It should be noted that the apparatus provided by the above embodiment is illustrated only with the division of the above functional modules as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus provided by the above embodiment belongs to the same concept as the method embodiments; its implementation process is detailed in the method embodiments and is not repeated here.
The embodiments of this application also provide a computer-readable storage medium storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the supervised learning method for image features described in any of the above embodiments.
For example, the computer-readable storage medium may include ROM, RAM, solid state drives (SSD), optical discs, and the like. The RAM may include resistive random access memory (ReRAM) and dynamic random access memory (DRAM).
The embodiments of this application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the supervised learning method for image features described in the above embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only exemplary embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall be included within the scope of protection of this application.

Claims (16)

  1. A supervised learning method for image features, the method being executed by a computer device, the method comprising:
    performing data enhancement on an original medical image to obtain a first enhanced image and a second enhanced image, the first enhanced image and the second enhanced image being positive samples of each other;
    performing feature extraction on the first enhanced image and the second enhanced image through a feature extraction model, to obtain first image features of the first enhanced image and second image features of the second enhanced image;
    determining a model loss of the feature extraction model based on the first image features, the second image features and negative sample image features, the negative sample image features being image features corresponding to other original medical images;
    training the feature extraction model based on the model loss.
  2. The method according to claim 1, wherein the feature extraction model includes a first feature extraction branch and a second feature extraction branch, the first feature extraction branch and the second feature extraction branch using feature extraction networks with different parameters;
    the performing feature extraction on the first enhanced image and the second enhanced image through a feature extraction model, to obtain first image features of the first enhanced image and second image features of the second enhanced image, comprises:
    performing feature extraction on the first enhanced image through the first feature extraction branch to obtain the first image features;
    performing feature extraction on the second enhanced image through the second feature extraction branch to obtain the second image features.
  3. The method according to claim 2, wherein the first feature extraction branch includes a first feature extraction network and a multiple global descriptor (MGD) network, the second feature extraction branch includes a second feature extraction network and the MGD network, and the MGD network is used to aggregate and output image features under different descriptors;
    the performing feature extraction on the first enhanced image through the first feature extraction branch to obtain the first image features comprises:
    performing feature extraction on the first enhanced image through the first feature extraction network to obtain first intermediate image features;
    performing at least two kinds of pooling on the first intermediate image features through the MGD network to obtain at least two kinds of first global descriptors;
    cascading the at least two kinds of first global descriptors through the MGD network, and generating the first image features based on the cascaded first global descriptors;
    the performing feature extraction on the second enhanced image through the second feature extraction branch to obtain the second image features comprises:
    performing feature extraction on the second enhanced image through the second feature extraction network to obtain second intermediate image features;
    performing at least two kinds of pooling on the second intermediate image features through the MGD network to obtain at least two kinds of global descriptors;
    cascading the at least two kinds of second global descriptors through the MGD network, and generating the second image features based on the cascaded second global descriptors.
  4. The method according to claim 3, wherein the training the feature extraction model based on the model loss comprises:
    training the first feature extraction network through a backpropagation algorithm based on the model loss;
    updating the network parameters of the second feature extraction network based on the network parameters of the trained first feature extraction network.
  5. The method according to any one of claims 1 to 4, wherein the determining a model loss of the feature extraction model based on the first image features, the second image features and negative sample image features comprises:
    determining a distance loss based on a feature distance between the first image features and the second image features, and a feature distance between the first image features and the negative sample image features;
    respectively clustering the first image features and the second image features corresponding to the original medical images in the current training batch, and determining a clustering loss based on the clustering results;
    determining the model loss according to the distance loss and the clustering loss.
  6. The method according to claim 5, wherein the respectively clustering the first image features and the second image features corresponding to the original medical images in the current training batch and determining a clustering loss based on the clustering results comprises:
    clustering the first image features corresponding to the original medical images in the current training batch to obtain k first cluster centroids, k being an integer greater than or equal to 2;
    clustering the second image features corresponding to the original medical images in the current training batch to obtain k second cluster centroids;
    determining the clustering loss based on distances between the first image features and the k second cluster centroids, and distances between the second image features and the k first cluster centroids.
  7. The method according to claim 5, wherein the method further comprises:
    generating first target features and second target features corresponding to the original medical image based on the first image features;
    generating third target features and fourth target features corresponding to the original medical image based on the second image features;
    the determining a distance loss based on a feature distance between the first image features and the second image features, and a feature distance between the first image features and the negative sample image features, comprises:
    determining the distance loss based on a feature distance between the first target features and the third target features, and a feature distance between the first target features and the negative sample image features;
    the respectively clustering the first image features and the second image features corresponding to the original medical images in the current training batch and determining a clustering loss based on the clustering results comprises:
    clustering the second target features corresponding to the original medical images in the current training batch to obtain k third cluster centroids;
    clustering the fourth target features corresponding to the original medical images in the current training batch to obtain k fourth cluster centroids;
    determining the clustering loss based on distances between the first target features and the k fourth cluster centroids, and distances between the third target features and the k third cluster centroids.
  8. The method according to claim 7, wherein the generating first target features and second target features corresponding to the original medical image based on the first image features comprises:
    inputting the first image features into a first multilayer perceptron (MLP) and a second MLP respectively, to obtain the first target features and the second target features;
    the generating third target features and fourth target features corresponding to the original medical image based on the second image features comprises:
    inputting the second image features into the first MLP and the second MLP respectively, to obtain the third target features and the fourth target features.
  9. The method according to any one of claims 1 to 4, wherein the performing data enhancement on an original medical image to obtain a first enhanced image and a second enhanced image comprises:
    performing color enhancement and direction enhancement on the original medical image based on first enhancement parameters to obtain the first enhanced image;
    performing color enhancement and direction enhancement on the original medical image with second enhancement parameters to obtain the second enhanced image, the first enhancement parameters being different from the second enhancement parameters.
  10. The method according to any one of claims 1 to 4, wherein after the training the feature extraction model based on the model loss, the method further comprises:
    updating the negative sample image features based on the first image features and the second image features.
  11. The method according to any one of claims 1 to 4, wherein the feature extraction model is used to perform feature extraction on images at a target magnification;
    before the performing data enhancement on an original medical image to obtain a first enhanced image and a second enhanced image, the method comprises:
    splitting the whole-slide image (WSI) at the target magnification to obtain split images;
    screening the split images based on the amount of image information to obtain the original medical images.
  12. The method according to claim 11, wherein the method further comprises:
    training a magnification prediction model based on original medical images at different magnifications, the magnification prediction model being used to predict the magnification of an input image, and the prediction results of the magnification prediction model being used to perform feature fusion on image features extracted by different feature extraction models.
  13. A supervised learning apparatus for image features, the apparatus comprising:
    a data enhancement module, configured to perform data enhancement on an original medical image to obtain a first enhanced image and a second enhanced image, the first enhanced image and the second enhanced image being positive samples of each other;
    a feature extraction module, configured to perform feature extraction on the first enhanced image and the second enhanced image through a feature extraction model, to obtain first image features of the first enhanced image and second image features of the second enhanced image;
    a loss determination module, configured to determine a model loss of the feature extraction model based on the first image features, the second image features and negative sample image features, the negative sample image features being image features corresponding to other original medical images;
    a first training module, configured to train the feature extraction model based on the model loss.
  14. A computer device, comprising a processor and a memory, the memory storing at least one instruction, the at least one instruction being loaded and executed by the processor to implement the supervised learning method for image features according to any one of claims 1 to 12.
  15. A computer-readable storage medium, storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the supervised learning method for image features according to any one of claims 1 to 12.
  16. A computer program product, comprising computer instructions stored in a computer-readable storage medium, a processor reading the computer instructions from the computer-readable storage medium and executing them to implement the supervised learning method for image features according to any one of claims 1 to 12.
PCT/CN2022/098805 2021-07-22 2022-06-15 Supervised learning method and apparatus for image features, device, and storage medium WO2023000872A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22845039.1A EP4375857A1 (en) 2021-07-22 2022-06-15 Supervised learning method and apparatus for image features, device, and storage medium
US18/127,657 US20230237771A1 (en) 2021-07-22 2023-03-29 Self-supervised learning method and apparatus for image features, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110831737.8 2021-07-22
CN202110831737.8A CN113822325A (zh) 2021-07-22 2021-07-22 Supervised learning method and apparatus for image features, device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/127,657 Continuation US20230237771A1 (en) 2021-07-22 2023-03-29 Self-supervised learning method and apparatus for image features, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023000872A1 (zh)

Family

ID=78912759

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/098805 WO2023000872A1 (zh) 2021-07-22 2022-06-15 图像特征的监督学习方法、装置、设备及存储介质

Country Status (4)

Country Link
US (1) US20230237771A1 (zh)
EP (1) EP4375857A1 (zh)
CN (1) CN113822325A (zh)
WO (1) WO2023000872A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822325A (zh) 2021-07-22 2021-12-21 Tencent Technology (Shenzhen) Co., Ltd. Supervised learning method and apparatus for image features, device, and storage medium
CN115115855A (zh) * 2022-05-16 2022-09-27 Tencent Technology (Shenzhen) Co., Ltd. Image encoder training method, apparatus, device, and medium
CN115115856A (zh) * 2022-05-16 2022-09-27 Tencent Technology (Shenzhen) Co., Ltd. Image encoder training method, apparatus, device, and medium
CN114863543B (zh) * 2022-07-11 2022-09-06 Huashi Technology (Shenzhen) Co., Ltd. Face recognition method and system with feature updating
CN115187787B (zh) * 2022-09-09 2023-01-31 Tsinghua University Method and apparatus for local manifold augmentation for self-supervised multi-view representation learning
CN116741372B (zh) * 2023-07-12 2024-01-23 Northeastern University Auxiliary diagnosis system and apparatus based on dual-branch representation consistency loss

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858563A (zh) * 2019-02-22 2019-06-07 Tsinghua University Self-supervised representation learning method and apparatus based on transformation recognition
WO2021059388A1 (ja) * 2019-09-25 2021-04-01 Nippon Telegraph and Telephone Corporation Learning device, image processing device, learning method, and learning program
CN112766406A (zh) * 2021-01-29 2021-05-07 Beijing Dajia Internet Information Technology Co., Ltd. Article image processing method and apparatus, computer device, and storage medium
CN112507990A (zh) * 2021-02-04 2021-03-16 Beijing Minglue Software System Co., Ltd. Video spatiotemporal feature learning and extraction method, apparatus, device, and storage medium
CN113065533A (zh) * 2021-06-01 2021-07-02 Beijing Dajia Internet Information Technology Co., Ltd. Feature extraction model generation method and apparatus, electronic device, and storage medium
CN113822325A (zh) * 2021-07-22 2021-12-21 Tencent Technology (Shenzhen) Co., Ltd. Supervised learning method and apparatus for image features, device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JEAN-BASTIEN GRILL; FLORIAN STRUB; FLORENT ALTCHÉ; CORENTIN TALLEC; PIERRE H. RICHEMOND; ELENA BUCHATSKAYA; CARL DOERSCH ET AL.: "Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 14 June 2020 (2020-06-14), XP081700812 *

Also Published As

Publication number Publication date
US20230237771A1 (en) 2023-07-27
EP4375857A1 (en) 2024-05-29
CN113822325A (zh) 2021-12-21

Similar Documents

Publication Publication Date Title
WO2023000872A1 (zh) Supervised learning method and apparatus for image features, device, and storage medium
US11810377B2 (en) Point cloud segmentation method, computer-readable storage medium, and computer device
EP3989119A1 (en) Detection model training method and apparatus, computer device, and storage medium
Kao et al. Visual aesthetic quality assessment with a regression model
Hasnat et al. Deepvisage: Making face recognition simple yet with powerful generalization skills
CN111967379B (zh) 一种基于rgb视频和骨架序列的人体行为识别方法
CN111582409A (zh) 图像标签分类网络的训练方法、图像标签分类方法及设备
CN111476806B (zh) 图像处理方法、装置、计算机设备和存储介质
WO2023273668A1 (zh) 图像分类方法、装置、设备、存储介质及程序产品
CN112036514B (zh) 一种图像分类方法、装置、服务器及计算机可读存储介质
CN116580257A (zh) 特征融合模型训练及样本检索方法、装置和计算机设备
CN114330499A (zh) 分类模型的训练方法、装置、设备、存储介质及程序产品
Soumya et al. Emotion recognition from partially occluded facial images using prototypical networks
Namazi et al. Automatic detection of surgical phases in laparoscopic videos
CN112232147B (zh) 用于人脸模型超参数自适应获取的方法、装置和系统
CN113762041A (zh) 视频分类方法、装置、计算机设备和存储介质
Ullah et al. Weakly-supervised action localization based on seed superpixels
Chaturvedi et al. Landmark calibration for facial expressions and fish classification
CN111651626B (zh) 图像分类方法、装置及可读存储介质
CN114492640A (zh) 基于域自适应的模型训练方法、目标比对方法和相关装置
Guzzi et al. Distillation of a CNN for a high accuracy mobile face recognition system
WO2024016691A1 (zh) 一种图像检索方法、模型训练方法、装置及存储介质
Mantri et al. An intelligent surgical video retrieval for computer vision enhancement in medical diagnosis using deep learning techniques
Wu et al. Multimodal learning with only image data: A deep unsupervised model for street view image retrieval by fusing visual and scene text features of images
Liu et al. Safety helmet wearing correctly detection based on capsule network

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2022845039

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2022845039

Country of ref document: EP

Effective date: 20240220