CN116740714B - Intelligent self-labeling method and device for hip joint diseases based on unsupervised learning - Google Patents

Intelligent self-labeling method and device for hip joint diseases based on unsupervised learning

Info

Publication number
CN116740714B
CN116740714B (application CN202310693176.9A)
Authority
CN
China
Prior art keywords
model
feature extraction
sample image
hip joint
image set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310693176.9A
Other languages
Chinese (zh)
Other versions
CN116740714A (en)
Inventor
张逸凌 (Zhang Yiling)
刘星宇 (Liu Xingyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Longwood Valley Medtech Co Ltd
Original Assignee
Longwood Valley Medtech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Longwood Valley Medtech Co Ltd filed Critical Longwood Valley Medtech Co Ltd
Priority to CN202310693176.9A
Publication of CN116740714A
Application granted
Publication of CN116740714B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763 Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G16H30/40 ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 Recognition of patterns in medical or anatomical images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The application provides an intelligent self-labeling method and device for hip joint diseases based on unsupervised learning, wherein the method comprises the following steps: inputting an image set to be annotated into a joint detection model to obtain a hip joint image set output by the joint detection model; inputting the hip joint image set into a feature extraction model to obtain an image feature information set; inputting the image feature information set into a classification model to obtain and label the category of each piece of image feature information; the joint detection model and the feature extraction model are obtained by coarse training based on the first sample image set and fine-tuning based on the second sample image set, and the classification model is obtained by fine-tuning based on the second sample image set. In the method, automatic labeling of the disease type of each image to be labeled is achieved by combining the joint detection model, the feature extraction model and the classification model.

Description

Intelligent self-labeling method and device for hip joint diseases based on unsupervised learning
Technical Field
The application relates to the technical field of medical equipment, in particular to an intelligent self-labeling method and device for hip joint diseases based on unsupervised learning.
Background
The current approach to discovering hip joint diseases relies mainly on X-ray: a patient is X-rayed and an imaging physician reads the films to give a diagnostic opinion. However, with the aging of China's population and the shortage of imaging physicians, manual diagnosis of hip joint diseases increasingly struggles to meet demand.
At present, more and more researchers have begun to diagnose hip joint diseases using deep learning methods, but deep learning training can only achieve the expected effect on well-labeled data, and the existing manual labeling approach struggles to meet the requirements on labeling quality and labeling speed.
Disclosure of Invention
The application aims to solve the problem that the current labeling approach cannot meet the requirements that deep learning training places on the labeling of hip joint diseases.
To solve the above problems, a first aspect of the present application provides an intelligent self-labeling method for hip joint diseases based on unsupervised learning, including:
inputting an image set to be annotated into a joint detection model to obtain a hip joint image set output by the joint detection model;
inputting the hip joint image set into a feature extraction model to obtain an image feature information set, wherein the feature extraction model is constructed from a Swin Transformer model;
inputting the image feature information set into a classification model to obtain and label the category of each piece of image feature information;
the joint detection model and the feature extraction model are obtained after coarse training based on the first sample image set and fine tuning based on the second sample image set, the classification model is obtained after fine tuning based on the second sample image set, the first sample image set is provided with marked hip joint area information and mask information, and the second sample image set is provided with marked disease type information.
Further, before the image set to be annotated is input into the joint detection model to obtain the hip joint image set output by the joint detection model, the method further comprises the steps of:
acquiring the first sample image set, wherein the first sample image set comprises a plurality of first sample images, and each first sample image has a marked hip joint region and a MASK corresponding to the hip joint region;
training the joint detection model according to the first sample image set to obtain a rough-trained joint detection model;
cropping the annotated hip joint region from each first sample image in the first sample image set;
Performing rough training on the feature extraction model according to the intercepted hip joint region and the corresponding MASK to obtain a feature extraction model after rough training;
and performing synchronous fine adjustment on the classification model, the joint detection model after rough training and the feature extraction model according to the second sample image set to obtain the joint detection model after fine adjustment, the feature extraction model and the classification model.
Further, the classification model is a K-means model, and the step of performing synchronous fine-tuning on the classification model, the coarsely trained joint detection model, and the feature extraction model according to the second sample image set to obtain the fine-tuned joint detection model, feature extraction model, and classification model includes:
acquiring the second sample image set, wherein the second sample image set comprises a plurality of second sample images, and each second sample image has a marked disease type;
counting disease types marked by the second sample image set, and determining the K value of the classification model according to the disease types;
inputting the second sample image set into the joint detection model after coarse training to obtain a second hip joint image set corresponding to the second sample image set;
Inputting the second hip joint image set into the feature extraction model after rough training to obtain a second image feature information set corresponding to the second hip joint image set;
inputting the second image feature information set into the classification model, and classifying the second image feature information into K categories based on the K value;
determining the corresponding relation between K categories and marked disease categories;
calculating an overall loss function according to the category corresponding to the second sample image and the marked disease category;
and iterating the initial clustering center of the classification model, the joint detection model and the feature extraction model according to the overall loss function until the overall loss function converges.
Further, the classification model is a K-means model, and the step of performing synchronous fine adjustment on the classification model, the joint detection model after coarse training, and the feature extraction model according to the second sample image set to obtain the joint detection model, the feature extraction model, and the classification model after fine adjustment further includes:
and storing the initial clustering center after the last iteration and the corresponding relation between the initial clustering center and the disease types, and taking the initial clustering center as the initial clustering center of the classification model after fine adjustment.
Further, the determining the correspondence between the K categories and the noted disease categories includes:
counting, for each of the K categories, the number of second sample images marked with each disease type and the proportion of the category that each disease occupies;
ranking, for each disease type, its counts across the K categories and its within-category disease proportions;
if both the count ranking and the disease proportion ranking of a disease type are first in one category, establishing a correspondence between that disease type and that category;
if the number of a disease type in one category is greater than a first threshold proportion of the total number of that disease type, establishing a correspondence between that disease type and that category, the first threshold proportion being greater than 50%;
if the disease proportion of a disease type in one category is greater than a second threshold proportion, establishing a correspondence between that disease type and that category, the second threshold proportion being greater than 50%;
in the case that the K categories have not all established one-to-one correspondences with different disease types, re-executing the step of inputting the second image feature information set into the classification model and classifying the second image feature information into K categories based on the K value;
And under the condition that the K categories respectively establish one-to-one correspondence with different disease categories, confirming the correspondence.
Further, the overall loss function is calculated as follows:

$L = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - y_i\right|$

where $x_i$ is the category value corresponding to the $i$-th second sample image, $y_i$ is the disease type value marked on that second sample image, $i$ is the index of the second sample image, and $n$ is the total number of second sample images.
Further, the feature extraction model comprises a linear embedding layer, a first remodelling layer, an even number of Swin Transformer modules and a second remodelling layer;
the linear embedding layer changes the dimension of the input image data;
the first remodelling layer changes the channel number of the image data;
the Swin Transformer modules perform feature extraction on the image data to obtain feature information;
and the second remodelling layer rearranges the output of the feature extraction model to obtain the image feature information.
The second aspect of the application provides an intelligent self-labeling device for hip joint diseases based on unsupervised learning, which comprises:
the joint detection module is used for inputting the image set to be marked into a joint detection model to obtain a hip joint image set output by the joint detection model;
The feature extraction module is used for inputting the hip joint image set into a feature extraction model to obtain an image feature information set, and the feature extraction model is constructed from a Swin Transformer model;
the category labeling module is used for inputting the image feature information set into a classification model to obtain and label the category of each piece of image feature information;
the joint detection model and the feature extraction model are obtained after coarse training based on the first sample image set and fine tuning based on the second sample image set, the classification model is obtained after fine tuning based on the second sample image set, the first sample image set is provided with marked hip joint area information and mask information, and the second sample image set is provided with marked disease type information.
A third aspect of the present application provides an electronic device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor, coupled to the memory, is configured to execute the program for:
inputting an image set to be annotated into a joint detection model to obtain a hip joint image set output by the joint detection model;
inputting the hip joint image set into a feature extraction model to obtain an image feature information set, wherein the feature extraction model is constructed from a Swin Transformer model;
inputting the image feature information set into a classification model to obtain and label the category of each piece of image feature information;
the joint detection model and the feature extraction model are obtained after coarse training based on the first sample image set and fine tuning based on the second sample image set, the classification model is obtained after fine tuning based on the second sample image set, the first sample image set is provided with marked hip joint area information and mask information, and the second sample image set is provided with marked disease type information.
A fourth aspect of the present application provides a computer readable storage medium having stored thereon a computer program for execution by a processor to implement the above-described method for intelligent self-labeling of hip joint diseases based on unsupervised learning.
In the method, the automatic labeling of the disease types of the image to be labeled is achieved by using the joint detection model, the feature extraction model and the classification model which are combined with each other. On one hand, the speed and the number of disease type labeling are greatly improved through batched labeling, and on the other hand, labeling rules are unified through batched labeling, so that the situation that the same image is labeled differently through different people is avoided.
In the application, the feature extraction model is coarsely trained based on the first sample image set, so that the feature extraction model 'understands' the first sample image set, and the substantial correspondence between image feature information output by the feature extraction model and the first sample image set is improved.
In the method, the joint detection model and the feature extraction model are respectively subjected to rough training through the same first sample image set, and the coherence performance and compatibility between the joint detection model and the feature extraction model of the rough training are improved through a mode of multiplexing training samples.
Drawings
FIG. 1 is a flow chart of an intelligent self-labeling method for hip joint diseases according to an embodiment of the present application;
FIG. 2 is a diagram of a model architecture for a method for intelligent self-labeling of hip joint diseases according to an embodiment of the present application;
FIG. 3 is a flow chart of a model training of an intelligent self-labeling method for hip joint diseases according to an embodiment of the present application;
FIG. 4 is a flow chart of intelligent self-labeling method model fine tuning of a hip joint disease according to an embodiment of the present application;
fig. 5 is a block diagram of consecutive Swin Transformer modules for a method of intelligent self-labeling of hip joint diseases according to an embodiment of the present application;
FIG. 6 is a block diagram of a device for intelligent self-labeling of hip joint diseases according to an embodiment of the present application;
Fig. 7 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the above objects, features and advantages of the present application more comprehensible, embodiments accompanied with figures are described in detail below. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
Hip joint diseases cause great distress to patients. Before a hip joint disease fully forms, morphological changes usually occur at the hip joint; if these changes can be detected early as warning signals of the disease, and prevention and intervention carried out promptly, patients benefit greatly.
The current approach to discovering hip joint diseases relies mainly on X-ray: a patient is X-rayed and an imaging physician reads the films to give a diagnostic opinion. However, with the aging of China's population and the shortage of imaging physicians, imaging physicians work under high-intensity pressure for long periods. This state harms the health of imaging physicians on the one hand; on the other hand, working while fatigued also makes erroneous diagnoses more likely.
Based on the above, more and more researchers begin to diagnose the hip joint diseases by using a deep learning method, but the premise that the deep training can achieve the expected effect is good labeling, but the existing manual labeling mode is difficult to meet the requirements on labeling quality and labeling speed.
Aiming at the above problems, the application provides a novel intelligent self-labeling scheme for hip joint diseases based on unsupervised learning, which links a Swin Transformer model with a classification model to carry out intelligent self-labeling of hip joint diseases and solves the problem that the current labeling approach cannot meet the requirements of deep learning training on hip joint disease labeling.
For ease of understanding, the following terms that may be used are explained herein:
End-to-end learning (End-to-End Learning): also known as end-to-end training, refers to optimizing the overall task objective directly, without splitting the learning process into separate or staged training.
MSA module: a standard multi-head self-attention module;
W-MSA: a multi-headed self-attention module based on non-overlapping local windows;
SW-MSA: a multi-headed self-attention module based on a shift window.
K-means: the K-means algorithm is a clustering algorithm and an unsupervised algorithm, i.e., no labels are needed when training the model; its main purpose is to divide the sample data into K classes through iterative looping.
YOLO model: (You Only Look Once) YOLO is a target detection algorithm that performs the task of object detection and classification simultaneously in a single neural network.
Retina model: a one-stage dense prediction model, formed as a whole from a backbone and two sub-networks that execute specific tasks.
The embodiment of the application provides an intelligent self-labeling method for hip joint diseases based on unsupervised learning; the specific scheme of the method is shown in figs. 1-7. The method can be executed by an intelligent self-labeling device for hip joint diseases based on unsupervised learning, which can be integrated in electronic equipment such as a computer, a server, a server cluster, a data center and the like. Referring to figs. 1 and 2, fig. 1 shows a flowchart of an intelligent self-labeling method for hip joint diseases based on unsupervised learning according to an embodiment of the present application; the method comprises the following steps:
S201, inputting an image set to be annotated into a joint detection model to obtain a hip joint image set output by the joint detection model;
in the application, the image set to be marked is a set formed by a plurality of images to be marked; in the step, each image to be annotated in the image set to be annotated is independently input into a joint detection model to obtain corresponding hip joint images, and the hip joint images form a hip joint image set.
In the application, the image to be marked is a medical image containing a hip joint, and the state of the hip joint can be judged based on the medical image. Medical images include, for example: an X-ray film, a computed tomography (CT) image, a magnetic resonance imaging (MRI) image, or a composite of several of the above images, or the like.
In the application, the number of images to be marked in the image set to be marked is the same as the number of hip joint images in the hip joint image set, and the two are in one-to-one correspondence.
S202, inputting the hip joint image set into a feature extraction model to obtain an image feature information set, wherein the feature extraction model is constructed from a Swin Transformer model;
in the step, each hip joint image in the hip joint image set is independently input into a feature extraction model to obtain corresponding image feature information, and the image feature information forms an image feature information set.
In the application, the number of the hip joint images in the hip joint image set is the same as the number of the image feature information in the image feature information set, and the hip joint images and the image feature information in the image feature information set have a one-to-one correspondence.
It should be noted that, in the present application, the joint detection model, the feature extraction model and the classification model after pre-training/coarse training form an integrated model, wherein each image to be annotated in the image set to be annotated is input into the joint detection model individually and each hip joint image in the hip joint image set is input into the feature extraction model individually, but all image feature information in the image feature information set is input into the classification model simultaneously for classification.
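To make the data flow above concrete, the following is a minimal sketch of the inference pipeline in Python. The `joint_detector`, `feature_extractor`, and `classifier` objects are assumed placeholders standing in for the trained models described in this application, not the actual implementation.

```python
import torch

@torch.no_grad()
def self_label(images, joint_detector, feature_extractor, classifier):
    """images: list of (1, H, W) tensors forming the image set to be annotated."""
    features = []
    for img in images:
        # Each image passes through detection and feature extraction individually.
        hip_region = joint_detector(img.unsqueeze(0))   # crop of the hip joint area
        feat = feature_extractor(hip_region)            # e.g. a 100-d feature vector
        features.append(feat.squeeze(0))
    # All feature vectors are then clustered together in a single call.
    feature_set = torch.stack(features)                 # (N, 100)
    return classifier(feature_set)                      # one category label per image
```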
S203, inputting the image feature information set into a classification model to obtain and label the category of each piece of image feature information;
the joint detection model and the feature extraction model are obtained after coarse training based on the first sample image set and fine tuning based on the second sample image set, the classification model is obtained after fine tuning based on the second sample image set, the first sample image set is provided with marked hip joint area information and mask information, and the second sample image set is provided with marked disease type information.
In the application, the classification model is an unsupervised learning model, so that classification can be performed based on the characteristics of the image, and manual interference is avoided.
In the application, the first sample image set is a set formed by a plurality of first sample images; the second sample image set is a set formed by a plurality of second sample images; the first and second sample images are used to indicate the hip joint region of an arbitrary subject. A subject here is a patient, and the first and second sample images may be medical images obtained by scanning the patient.
In the application, the first sample image, the second sample image and the image to be marked are images of the same type, which may specifically be whole-body images, half-body images, local human body images and the like, as long as they include the hip joint; the difference is that the first sample image carries a circled hip joint region and the mask information corresponding to that region, so it serves as a sample for coarse training of the joint detection model and the feature extraction model, while the second sample image carries corresponding disease type information, so it serves as a sample for fine-tuning the joint detection model, the feature extraction model and the classification model.
In the application, the feature extraction model is coarsely trained based on the first sample image set, so that the feature extraction model 'understands' the first sample image set, and the substantial correspondence between image feature information output by the feature extraction model and the first sample image set is improved.
In manual labeling of X-ray images, the same condition may look different across images while different conditions may look alike, and the levels of labeling personnel vary, so labeling rules are difficult to unify; the same image easily ends up with different labels, which seriously disturbs the accuracy of deep learning.
In the method, the automatic labeling of the disease types of the image to be labeled is achieved by using the joint detection model, the feature extraction model and the classification model which are combined with each other. On one hand, the speed and the number of disease type labeling are greatly improved through batched labeling, and on the other hand, labeling rules are unified through batched labeling, so that the situation that the same image is labeled differently through different people is avoided.
In the method, the joint detection model, the feature extraction model and the classification model, which are combined with one another, are set up as an integrated model for synchronous fine-tuning, realizing end-to-end training and avoiding the added complexity and compatibility problems caused by training the models separately; the model complexity is reduced, the model inference time is also reduced, and intelligent labeling can be performed automatically.
Referring to fig. 3, in one embodiment, the step S201, before inputting the image set to be annotated into the joint detection model to obtain the hip joint image set output by the joint detection model, further includes:
S101, acquiring the first sample image set, wherein the first sample image set comprises a plurality of first sample images, and each first sample image has a marked hip joint region and a MASK corresponding to the hip joint region;
the first sample image and the marked hip joint region are used for training a joint detection model, and the marked hip joint region is a supervision signal.
The hip joint region marked on the first sample image and the MASK corresponding to the hip joint region are used for training the feature extraction model; the hip joint region and its corresponding MASK serve both as input data and as supervision signals.
It should be noted that the training data of the feature extraction model only needs to be a hip joint image and its corresponding MASK, and is not limited to hip joint regions cropped from the first sample images; in the present application, the hip joint regions cropped from the first sample images are used so as to multiplex samples, improving the coordination of training and reducing the amount of sample data to be collected.
S102, training the joint detection model according to the first sample image set to obtain a roughly trained joint detection model;
S103, cropping the annotated hip joint region from each first sample image in the first sample image set;
s104, performing rough training on the feature extraction model according to the intercepted hip joint region and the corresponding MASK to obtain a feature extraction model after rough training;
in the application, the first sample images are multiplexed and used directly as samples for independent coarse training of the joint detection model and the feature extraction model, which avoids the error propagation, and hence error growth, that linked coarse training would cause.
S105, performing synchronous fine adjustment on the classification model, the joint detection model after rough training and the feature extraction model according to the second sample image set to obtain the joint detection model after fine adjustment, the feature extraction model and the classification model.
In the present application, the classification model, the joint detection model and the feature extraction model undergo synchronous fine-tuning, that is, they are trained directly as one combined model; since the joint detection model and the feature extraction model have already been coarsely trained, only fine-tuning is required. In the fine-tuning process, the second sample images in the second sample image set serve as input data, and the marked disease type information serves as the supervision signal of the integrated model.
In the method, the joint detection model and the feature extraction model are respectively subjected to rough training through the same first sample image set, and the coherence performance and compatibility between the joint detection model and the feature extraction model of the rough training are improved through a mode of multiplexing training samples.
In the method, linked fine-tuning after independent coarse training limits the range of error propagation while keeping the labeling of hip joint images end-to-end for training and inference, ensuring smooth convergence of the overall loss.
In one embodiment, the step S104 of performing rough training on the feature extraction model according to the intercepted hip joint region and the corresponding MASK to obtain a feature extraction model after rough training includes:
patch segmentation is carried out on the hip joint region;
determining a MASK sample image according to the segmented hip joint region and the MASK;
inputting the mask sample image into the feature extraction model to obtain mask image codes;
inputting the segmented hip joint region into the feature extraction model to obtain a sample image code;
using the mask image code as the query vector, the corresponding sample image code as the positive sample vector, and the sample image codes of the other groups as negative sample vectors, calculating the overall loss of the feature extraction model;
and adjusting the feature extraction model according to the overall loss until the overall loss converges.
In this application, the hip joint region (hip joint image) can be divided into a sequence of fixed-size Patches, which is modeled with multi-head attention.
In the present application, the image may instead be divided into Patches in a deformable manner. In this way, semantics can be better preserved within a Patch (e.g., with a deformable Patch module (DPT)), reducing the semantic corruption caused by fixed Patch partitioning.
Note that the MASK is applied on a per-Patch basis, so whether the hip joint region is divided into fixed-size or variable-size Patches, the divided hip joint region must still correspond to the MASK.
In the present application, the MASK is used to block a part of the Patch in the segmented hip joint region, so as to form a zero-input Patch.
The MASK sample image is the image obtained by masking the segmented hip joint region according to the MASK; a masked Patch in this image keeps its position, and only its pixel values are adjusted.
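As an illustration of this masking step, the sketch below splits a hip joint image into fixed-size Patches and zeroes a random subset while keeping every Patch in place; the patch size and masking ratio are illustrative assumptions, not values fixed by this application.

```python
import torch

def mask_patches(image, patch_size=8, mask_ratio=0.6):
    """image: (C, H, W) tensor; returns the masked image and the Patch-level mask."""
    C, H, W = image.shape
    gh, gw = H // patch_size, W // patch_size
    mask = torch.rand(gh, gw) < mask_ratio              # True = Patch is masked
    # Expand the Patch-grid mask to pixel resolution.
    pixel_mask = mask.repeat_interleave(patch_size, 0).repeat_interleave(patch_size, 1)
    masked = image.clone()
    masked[:, pixel_mask] = 0.0                         # Patch kept in place, pixel values zeroed
    return masked, mask
```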
An image is a natural signal with a great amount of spatial redundancy, very different from human-generated language, which is a signal of high semantic content and information density; directly encoding an image with a Swin Transformer-based feature extraction model therefore makes it difficult to clearly acquire accurate semantic information from the image.
In the method, the MASK masks part of the content in the image before training, so the feature extraction model must extract semantic information from the remaining image to fill in the masked information, achieving the effect that the feature extraction model 'understands' the image.
In one embodiment, the MASK differs for each group of hip joint regions. Perturbing the MASK in this way raises the difficulty of coarse training and prevents the semantic information extracted by the feature extraction model from becoming highly correlated with the MASK itself, achieving a better training effect.
It should be noted that an image has a large amount of spatial redundancy; based on this property, lost blocks can be recovered from neighboring blocks, but such recovery relies on the neighbors without understanding the semantic information of the image, which leaves the trained feature extraction model with a poorer understanding of parts, objects, and scenes.
In one embodiment, the MASK is randomly generated, thereby increasing the difficulty of image restoration, so that the feature extraction model cannot be directly restored based on adjacent blocks, and thereby improving the advanced understanding of the feature extraction model on the image.
In one embodiment, the masking rate of the MASK is greater than 50%.
Wherein, the masking rate refers to the proportion of the masked Patch blocks in the MASK occupying the total Patch blocks of the image. The masking rate of MASK is greater than 50%, i.e. more than half of the hip joint region after segmentation is masked.
In this way, the high masking rate prevents the feature extraction model from restoring the image directly from adjacent blocks, improving its high-level understanding of the image. In addition, the high masking rate optimizes the model's understanding of the image on the one hand; on the other hand, during training the feature extraction model does not need to extract features for the masked Patches, so the data to be processed is greatly reduced (only the unmasked part needs processing), the coarse-training time can be cut in half or more, and memory consumption drops, allowing the approach to be applied or scaled to large models.
The mask image code and the sample image code are codes of the same format.
In one embodiment, the sample image code contains image information for all patches of the hip region and the mask image code contains image information corresponding to all patches of the hip region.
The sample image code contains the image information of all Patches of the hip joint region, i.e., it is obtained by inputting the complete segmented hip joint region into the feature extraction model; the mask image code contains image information corresponding to all Patches of the hip joint region, i.e., it is obtained by combining the segmented hip joint region with the MASK and inputting the result into the feature extraction model, where the masked Patches are retained rather than deleted.
In the application, the masked Patch is reserved, so that the extracted features have corresponding relations with the masked image and the unmasked image, and the situation that the extracted features and the masked image do not correspond due to the fact that the masked Patch is deleted is avoided.
In one embodiment, the overall loss of the feature extraction model is calculated as:

$\mathcal{L} = -\log \frac{\exp\left(\mathrm{sim}(q,k)/\tau\right)}{\exp\left(\mathrm{sim}(q,k)/\tau\right) + \sum_{i}\exp\left(\mathrm{sim}(q,k_i)/\tau\right)}$

where $q$ is the query vector, $k$ is the positive sample vector, $k_i$ is a negative sample vector, $\mathrm{sim}(q,k)$ is the similarity between the query vector and the positive sample vector, $\mathrm{sim}(q,k_i)$ is the similarity between the query vector and a negative sample vector, and $\tau$ is a temperature parameter.
The calculation of the similarity and the determination of the temperature parameter may be performed according to actual situations, which are not described herein.
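For reference, a minimal implementation of this loss (the standard InfoNCE form matching the variables above) might look as follows, assuming cosine similarity for sim(·,·), a choice the application leaves open:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q, k_pos, k_negs, tau=0.07):
    """q, k_pos: (D,) vectors; k_negs: (M, D) negative sample vectors; tau: temperature."""
    sim_pos = F.cosine_similarity(q, k_pos, dim=0) / tau
    sim_negs = F.cosine_similarity(q.unsqueeze(0), k_negs, dim=1) / tau
    logits = torch.cat([sim_pos.unsqueeze(0), sim_negs])    # positive sample first
    # -log( exp(sim(q,k)/tau) / (exp(sim(q,k)/tau) + sum_i exp(sim(q,k_i)/tau)) )
    return -F.log_softmax(logits, dim=0)[0]
```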
In the application, the true hip joint region is taken as a positive sample, the output of the feature extraction model on the masked hip joint region is taken as a query sample, the other groups of the hip joint regions (the other groups of the masked hip joint regions can also be included) are taken as negative samples, and the distance between the query sample and the positive sample is shortened through training, and the distance between the query sample and the negative sample is lengthened.
In the method, the feature extraction model is subjected to self-supervision learning through coarse training, so that the feature extraction model can understand hip joint image information; then, fine tuning is carried out on the rough training feature extraction model, the classification model and the joint detection model, so that model integration is realized; on one hand, the time occupation and the resource occupation of model training can be saved, and on the other hand, the understanding capability and the resolving power of the feature extraction model on the hip joint image information can be improved, so that a better training effect is achieved.
In the method, random MASK sampling at a high masking rate largely eliminates redundancy, creating a task that cannot easily be solved by extrapolating from visible neighboring Patches, and thereby achieving a better training effect for the feature extraction model.
Referring to fig. 4, in an embodiment, the classification model is a K-means model, and the step S105 of performing fine tuning on the classification model, the joint detection model after coarse training, and the feature extraction model according to the second sample image set to obtain the fine-tuned joint detection model, the feature extraction model, and the classification model includes:
s501, acquiring the second sample image set, wherein the second sample image set comprises a plurality of second sample images, and each second sample image has a marked disease type;
it should be noted that, in order to achieve a better fine-tuning effect, the second sample images should cover all kinds of hip joint diseases, and the number of second sample images for each disease should exceed a certain amount.
S502, counting disease types marked by the second sample image set, and determining a K value of the classification model according to the disease types;
in the application, the disease types marked on the second sample images are all of the same classification level, i.e., any two disease types are mutually independent: they neither overlap nor include one another.
An overlapping relation means that the same sample image could be diagnosed as either disease A or disease B; an inclusion relation means that disease A is a parent category comprising disease B and disease C.
In the step, the number of disease types is used as the K value of the classification model, so that a one-to-one correspondence is achieved.
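As a trivial illustration of this step (function name assumed), the K value falls directly out of the annotations:

```python
def k_from_annotations(disease_labels):
    # K equals the number of distinct disease types marked on the second sample images.
    return len(set(disease_labels))
```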
S503, inputting the second sample image set into the joint detection model after rough training to obtain a second hip joint image set corresponding to the second sample image set;
inputting each second sample image into a joint detection model after coarse training to obtain a corresponding second hip joint image; these second hip images constitute a second set of hip images.
S504, inputting the second hip joint image set into the feature extraction model after rough training to obtain a second image feature information set corresponding to the second hip joint image set;
and inputting each second hip joint image into the feature extraction model after rough training to obtain corresponding second image feature information, wherein the second image feature information forms a second image feature information set.
It should be noted that the images are classified by the classification model, and the images themselves are highly redundant, so the dimensionality of the encoded information converted from an image is too high; dimensionality reduction must be performed before input into the classification model. Dimensionality reduction is typically achieved with the PCA (principal component analysis) algorithm, which converts high-dimensional features to low-dimensional features. In practice, however, the correspondence between the PCA-reduced encoded information and the image is very poor, which disturbs the classification effect of the classification model.
In the method, a feature extraction model is used instead: dimensionality reduction happens while the image features are extracted, preserving the high correspondence between encoded information and images and improving the classification accuracy of the classification model.
In the application, the second hip joint image is processed through the feature extraction model; specifically, the dimensionality reduction can be performed by the second remodelling layer, or the encoded information output by the feature extraction model can be reduced in dimensionality by an added fully connected layer before being input into the classification model.
In a specific implementation, the second sample images and the first sample images may be 512×512 pictures; the joint detection model crops a 64×64 hip joint image, and the feature extraction model (with the fully connected layer) extracts 100×1-dimensional image feature information to be used as the input data for classification by the classification model.
S505, inputting the second image characteristic information set into the classification model, and classifying the second image characteristic information into K categories based on the K value;
in the application, the classification model is a K-MEANS classification model, and after the K value is determined, the second image characteristic information is classified into K categories through the classification model.
S506, determining the corresponding relation between the K categories and the marked disease categories;
K disease types are marked, and the second image feature information set is divided into K categories by the classification model; in this step, the one-to-one correspondence between the K disease types and the K categories is determined.
S507, calculating an overall loss function according to the category corresponding to the second sample image and the marked disease category;
In one embodiment, the overall loss function is calculated as:

$L = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - y_i\right|$

where $x_i$ is the category value corresponding to the $i$-th second sample image, $y_i$ is the disease type value marked on that second sample image, $i$ is the index of the second sample image, and $n$ is the total number of second sample images.
This loss function uses the absolute value of the difference between the predicted value and the actual value as the loss. It measures the absolute difference, so even small deviations affect the loss, yet large deviations are not amplified as they would be by a squared loss, which makes the loss function relatively insensitive to outliers.
In the application, this insensitivity to outliers reduces the influence of abnormal second sample images on the classification process.
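Under the assumption that the category values are encoded numerically, this overall loss is a mean absolute error and reduces to a one-liner:

```python
import torch

def overall_loss(x, y):
    """x: predicted category values (n,); y: labeled disease type values (n,)."""
    return torch.mean(torch.abs(x - y))
```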
S508, iterating the initial clustering center of the classification model, the joint detection model and the feature extraction model according to the overall loss function until the overall loss function converges.
In one embodiment, the step S105 performs fine adjustment on the classification model, the joint detection model after coarse training, and the feature extraction model according to the second sample image set to obtain the fine-adjusted joint detection model, the feature extraction model, and the classification model, and further includes:
and storing the initial clustering center after the last iteration and the corresponding relation between the initial clustering center and the disease types, and taking the initial clustering center as the initial clustering center of the classification model after fine adjustment.
It should be noted that the classification model is an unsupervised learning model that obtains a locally optimal solution iteratively; however, it is sensitive to the initial cluster centers, i.e., when the classification model classifies the same second sample image set, choosing different initial cluster centers yields different classification results.
In the application, the initial cluster centers after the last iteration are saved as the initial cluster centers of the fine-tuned classification model; that is, in actual use (step S203), the classification model classifies using the retained initial cluster centers rather than randomly selected ones.
It should be noted that, in the process of classifying the image feature information set, the classification model still iterates from the initial cluster centers, but the cluster centers produced during iteration are no longer the initial cluster centers.
In the application, the correspondence between the initial cluster centers and the disease types is saved, and the correspondence between the K resulting categories and the disease types is carried through the iteration from the initial cluster centers to the final cluster centers; thus, while the classification model divides the image feature information set into K categories, the disease type of each category is determined at the same time, i.e., the category of each piece of image feature information is obtained and labeled (with a disease type).
In the method, the classification model, the joint detection model and the feature extraction model are set up as an integrated model, the disease categories of the second sample images serve as supervision during fine-tuning, and the initial cluster centers of the last iteration are retained; fixing the initial cluster centers preserves the fine-tuning effect, reduces the integrated model's sensitivity to initial cluster centers, and avoids the same image receiving different classification results.
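A sketch of reusing the saved cluster centers at labeling time, with scikit-learn's KMeans standing in for the classification model; `saved_centers` is assumed to be the (K, D) array retained from the last fine-tuning iteration, and `features` a (N, D) array of image feature information.

```python
from sklearn.cluster import KMeans

def classify_with_saved_centers(features, saved_centers):
    """Returns one category index per sample."""
    k = saved_centers.shape[0]
    # init with the stored centers and a single run, so results are repeatable;
    # the centers are still refined by the usual K-means iterations (see note above).
    km = KMeans(n_clusters=k, init=saved_centers, n_init=1)
    return km.fit_predict(features)
```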
In one embodiment, the step S506 of determining the correspondence between K categories and the noted disease categories includes:
counting, for each of the K categories, the number of second sample images marked with each disease type and the proportion of the category that each disease occupies;
ranking, for each disease type, its counts across the K categories and its within-category disease proportions;
if both the count ranking and the disease proportion ranking of a disease type are first in one category, establishing a correspondence between that disease type and that category;
if the number of a disease type in one category is greater than a first threshold proportion of the total number of that disease type, establishing a correspondence between that disease type and that category, the first threshold proportion being greater than 50%;
if the disease proportion of a disease type in one category is greater than a second threshold proportion, establishing a correspondence between that disease type and that category, the second threshold proportion being greater than 50%;
in the case that the K categories have not all established one-to-one correspondences with different disease types, re-executing step S505, inputting the second image feature information set into the classification model and classifying the second image feature information into K categories based on the K value;
and under the condition that the K categories respectively establish one-to-one correspondence with different disease categories, confirming the correspondence.
In the application, counting the number of each marked disease type among the second sample images of the K categories and the disease proportion occupied within each category means counting, for each category, the number of samples of each disease type and each disease type's share of that category.
In the present application, the number of one disease type in one category being greater than the first threshold proportion of the total number of that disease type (the first threshold proportion being greater than 50%) means that most instances of that disease were assigned to that category, so a correspondence is established.
In the present application, the disease proportion of one disease type in one category being greater than the second threshold proportion (the second threshold proportion being greater than 50%) means that most of the second sample images in that category are labeled with that disease type, so a correspondence is established.
For example, suppose category K contains 100 second sample images, marked with 5 disease types: disease A, disease B, disease C, disease D and disease E; the counts of these disease types within category K are 30, 29, 21, 15 and 5, and their disease proportions within category K are accordingly 30%, 29%, 21%, 15% and 5%. After ranking disease B's counts and disease proportions across the K categories, it is found that disease B totals 32 samples, of which 29 fall in category K (proportion 29%) and 3 in category J (proportion 4%); disease B therefore establishes a correspondence with category K. After ranking disease A's counts and proportions, it is found that disease A totals 300 samples, of which 30 fall in category K (proportion 30%) and 270 in category G (90% of all disease A, greater than the first threshold proportion; proportion 25%); disease A therefore corresponds to category G. After ranking disease E's counts and proportions, it is found that disease E totals 30 samples, of which 5 fall in category K (proportion 5%) and 12 in category F with a disease proportion of 89%; disease E therefore corresponds to category F.
In this way, the correspondence between categories and marked disease types can be established by the computer itself, achieving the effect of intelligent self-labeling.
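The three matching rules can be sketched as follows; `labels[i]` is the category index of sample i, `diseases[i]` its marked disease type, and the 60% thresholds are illustrative values satisfying the greater-than-50% constraints.

```python
from collections import Counter

def match_clusters(labels, diseases, k, thr_total=0.6, thr_within=0.6):
    totals = Counter(diseases)                  # total samples per disease type
    size = Counter(labels)                      # samples per category
    count = {c: Counter() for c in range(k)}    # count[c][d]: samples of disease d in category c
    for c, d in zip(labels, diseases):
        count[c][d] += 1
    mapping = {}
    for d in totals:
        # Rank disease d's count and within-category proportion across the K categories.
        by_count = max(range(k), key=lambda c: count[c][d])
        by_share = max(range(k), key=lambda c: count[c][d] / max(size[c], 1))
        if by_count == by_share:                                        # rule 1: first in both rankings
            mapping[by_count] = d
        elif count[by_count][d] > thr_total * totals[d]:                # rule 2: most of d in one category
            mapping[by_count] = d
        elif count[by_share][d] / max(size[by_share], 1) > thr_within:  # rule 3: d dominates a category
            mapping[by_share] = d
    # One-to-one check; if it fails, the clustering of step S505 is re-run.
    ok = len(mapping) == k and len(set(mapping.values())) == k
    return mapping, ok
```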
In an embodiment, training the joint detection model according to the first sample image set yields a coarsely trained joint detection model; the joint detection model may be a YOLO detection model or a Retina detection model, used to detect the hip joint region in a picture.
The coarse training of the YOLO model is driven by back-propagating its loss function which, for each bounding box, comprises a position error, a confidence error and a classification error. Back-propagation updates the parameters of the neural network model, improving the accuracy of target detection.
The coarse training of the Retina model is driven by back-propagating a focal loss function. The Retina model mainly addresses foreground/background class imbalance: the focal loss down-weights the loss of easy samples, so that the network concentrates on training with hard samples.
In the present application, the coarse training of the YOLO model and of the Retina model can follow the existing YOLO and Retina models; their specific structures and coarse-training procedures are not repeated here.
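For concreteness, a minimal PyTorch sketch of the focal loss just described follows; the alpha and gamma defaults are the values commonly used in RetinaNet-style detectors, and the function is a generic formulation, not the application's specific training code.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy samples so the network
    focuses on hard ones, mitigating foreground/background imbalance.

    logits:  raw classification scores (any shape).
    targets: same shape; 1.0 for foreground, 0.0 for background.
    """
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)        # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()  # (1 - p_t)^gamma focusing term
```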
In one embodiment, as shown in connection with FIG. 2, the feature extraction model includes a linear embedding layer, a first remodelling layer, an even number of Swin Transformer modules, and a second remodelling layer;
the linear embedding layer changes the dimension of the input image data;
the first remodelling layer changes the channel number of the image data;
the Swin Transformer modules perform feature extraction on the image data to obtain feature information;
and the second remodelling layer rearranges the output of the feature extraction model to obtain the image feature information.
In this application, the linear embedding layer is the Linear Embedding layer in FIG. 2; it converts the dimension of the input vector into a preset dimension value, namely a dimension value that the Swin Transformer modules can receive. The first remodelling layer is the Reshape layer in front of the Swin Transformer modules in FIG. 2; it transforms a given matrix into a matrix of a specific shape while keeping the number of elements unchanged, readjusting the number of rows, the number of columns and the dimensions of the matrix, and in this application it adjusts the channel number of the input vector to a preset channel number, namely the channel number that the Swin Transformer modules can receive. The Swin Transformer modules are the Swin Transformer Block ×n layers in FIG. 2; they extract features from the input image data to obtain feature information/feature maps. The second remodelling layer is the Reshape layer after the Swin Transformer modules in FIG. 2; it reshapes the output of the feature extraction model to obtain the image feature information, in the format that can be input into the GPT model.
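To make the layer ordering concrete, the following PyTorch sketch mirrors the pipeline described above; the patch dimension, embedding dimension and feature-map size are illustrative assumptions, and swin_blocks is a placeholder for the even number of Swin Transformer blocks (a sketch of one block is given below).

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the layer order in FIG. 2; all sizes are assumptions."""
    def __init__(self, patch_dim=48, embed_dim=96, feat_hw=(56, 56),
                 swin_blocks=None):
        super().__init__()
        self.h, self.w = feat_hw
        self.embed = nn.Linear(patch_dim, embed_dim)  # Linear Embedding layer
        self.blocks = swin_blocks or nn.Identity()    # Swin Transformer Block xN

    def forward(self, x):            # x: (B, num_patches, patch_dim)
        x = self.embed(x)            # dimension -> a value the blocks accept
        # First Reshape: in a full implementation this adjusts channels/layout
        # for the blocks; here the toy input is already in (B, L, C) form.
        x = self.blocks(x)
        b, l, c = x.shape            # assumes num_patches == h * w
        # Second Reshape: rearrange the output into the format the
        # downstream model consumes.
        return x.reshape(b, self.h, self.w, c)
```

Once the SwinBlock sketched below is defined, a caller could pass, for example, swin_blocks=nn.Sequential(*[SwinBlock(96, 3) for _ in range(4)]), keeping the block count even as required.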
In the present application, the number of Swin Transformer modules is even, and the modules are used in consecutive pairs: the two modules of a pair have similar but not identical structures (as described below), and every pair has the same structure as every other pair.
As shown in connection with fig. 5, in one embodiment, the Swin Transformer module comprises an MLP module and a shifted-window-based MSA module, with a normalization layer placed before each of the MSA module and the MLP module, and a residual connection placed after each of them.
In fig. 5, two consecutive Swin Transformer Blocks are shown; their structures are similar, but their MSA modules differ: the former block uses a W-MSA module and the latter uses an SW-MSA module, with the rest remaining the same.
The MLP module is a 2-layer MLP with a GELU nonlinearity sandwiched between the two layers, and the LayerNorm (LN) layer in the figure is the normalization layer.
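The two-sub-layer structure of fig. 5 can be sketched as a pre-norm block as follows; for brevity, ordinary multi-head attention stands in for the windowed (S)W-MSA (the window partition and shift are sketched after the next paragraphs), so this is an illustrative assumption, not the application's exact module.

```python
import torch
import torch.nn as nn

class SwinBlock(nn.Module):
    """LN -> (S)W-MSA -> residual, then LN -> 2-layer MLP (GELU) -> residual."""
    def __init__(self, dim, num_heads, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)            # normalization before MSA
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)            # normalization before MLP
        self.mlp = nn.Sequential(                 # 2-layer MLP, GELU between
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x):                         # x: (B, L, C)
        h = self.norm1(x)
        h, _ = self.attn(h, h, h)                 # W-MSA / SW-MSA slot
        x = x + h                                 # residual connection after MSA
        x = x + self.mlp(self.norm2(x))           # residual connection after MLP
        return x
```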
In the Swin Transformer module, the W-MSA module computes self-attention within non-overlapping local windows instead of globally; the image is evenly partitioned into non-overlapping windows, and because the W-MSA module has linear complexity, its computational cost is greatly reduced compared with the MSA module, whose complexity is quadratic.
While the window-based self-attention module (W-MSA) reduces the computational complexity from quadratic to linear, the lack of connections across windows limits its modeling capability; cross-window connections are therefore introduced through shifted windows while the computational efficiency of non-overlapping windows is retained, by alternating between two window-partition configurations in consecutive Swin Transformer Blocks.
In this application, the feature map is cyclically shifted to the left and up; after such a shift, a batched window may be composed of sub-windows that are not adjacent in the feature map, so a masking mechanism limits the (Masked MSA) self-attention computation to each sub-window. Specifically, self-attention is computed as usual, and a mask operation then sets the unwanted attention weights to 0, confining the self-attention computation within each sub-window.
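The cyclic shift and the masking mechanism can be illustrated in a few lines of PyTorch; the helper names are assumptions, and the construction of the mask itself (assigning sub-window ids and comparing them) is omitted for brevity.

```python
import torch

def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into non-overlapping ws x ws windows,
    returning (num_windows * B, ws * ws, C)."""
    b, h, w, c = x.shape
    x = x.view(b, h // ws, ws, w // ws, ws, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, c)

def shifted_window_partition(x, ws):
    """Cyclically shift the map left and up by ws // 2 before partitioning,
    so each new window mixes tokens from formerly separate windows."""
    shift = ws // 2
    return window_partition(torch.roll(x, shifts=(-shift, -shift), dims=(1, 2)), ws)

def masked_softmax(scores, mask):
    """scores: (num_windows, L, L) raw attention scores; mask: 0 where two
    tokens belong to the same sub-window, a large negative value (e.g. -100)
    where they do not. Adding the mask before softmax drives the unwanted
    attention weights to (effectively) 0, confining attention to sub-windows."""
    return torch.softmax(scores + mask, dim=-1)
```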
The embodiment of the application provides an intelligent self-labeling device for hip joint diseases based on unsupervised learning, which is used for executing the intelligent self-labeling method for hip joint diseases based on unsupervised learning described in the application, and the intelligent self-labeling device for hip joint diseases based on unsupervised learning is described in detail below.
As shown in fig. 6, the intelligent self-labeling device for hip joint diseases based on unsupervised learning comprises:
The joint detection module 101 is used for inputting the image set to be annotated into the joint detection model to obtain a hip joint image set output by the joint detection model;
the feature extraction module 102 is used for inputting the hip joint image set into a feature extraction model to obtain an image feature information set, wherein the feature extraction model is constructed by a Swin Transformer model;
the category labeling module 103 is used for inputting the image characteristic information set into a classification model, obtaining and labeling the category of each image characteristic information;
the joint detection model and the feature extraction model are obtained after coarse training based on the first sample image set and fine tuning based on the second sample image set, the classification model is obtained after fine tuning based on the second sample image set, the first sample image set is provided with marked hip joint area information and mask information, and the second sample image set is provided with marked disease type information.
The intelligent self-labeling device for hip joint diseases based on unsupervised learning provided by the embodiment of the present application corresponds to the intelligent self-labeling method for hip joint diseases based on unsupervised learning provided by the embodiment of the present application; the specific contents of the device therefore correspond to those of the method and can be found in the description of the method, and they are not repeated here.
Based on the same inventive concept, the intelligent self-labeling device for hip joint diseases based on unsupervised learning provided by the embodiment of the present application has the same beneficial effects as the intelligent self-labeling method for hip joint diseases based on unsupervised learning provided by the embodiment of the present application.
The internal functions and structures of the intelligent self-labeling device for hip joint diseases based on the unsupervised learning are described above, and as shown in fig. 7, in practice, the intelligent self-labeling device for hip joint diseases based on the unsupervised learning may be implemented as an electronic device, including: memory 301 and processor 303.
The memory 301 may be configured to store a program.
In addition, the memory 301 may also be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and the like.
The memory 301 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
A processor 303 coupled to the memory 301 for executing programs in the memory 301 for:
inputting an image set to be annotated into a joint detection model to obtain a hip joint image set output by the joint detection model;
inputting the hip joint image set into a feature extraction model to obtain an image feature information set, wherein the feature extraction model is constructed by a Swin Transformer model;
inputting the image characteristic information set into a classification model to obtain the category of each image characteristic information and labeling;
the joint detection model and the feature extraction model are obtained after coarse training based on the first sample image set and fine tuning based on the second sample image set, the classification model is obtained after fine tuning based on the second sample image set, the first sample image set is provided with marked hip joint area information and mask information, and the second sample image set is provided with marked disease type information.
In this application, the processor is further specifically configured to execute all the processes and steps of the above-mentioned intelligent self-labeling method for hip joint diseases based on unsupervised learning, and specific content may refer to the record in the intelligent self-labeling method for hip joint diseases, which is not described in detail in this application.
In this application, only some components are schematically shown in fig. 7, which does not mean that the electronic device only includes the components shown in fig. 7.
Based on the same inventive concept as the intelligent self-labeling method for hip joint diseases based on unsupervised learning provided by the embodiment of the present application, the electronic device provided by this embodiment has the same beneficial effects as the method adopted, operated or realized by the application program it stores.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
In a computer-readable medium, the memory may include volatile memory such as Random Access Memory (RAM) and/or nonvolatile memory such as Read-Only Memory (ROM) or Flash memory (Flash RAM), among others. Memory is an example of a computer-readable medium.
The present application further provides a computer readable storage medium corresponding to the intelligent self-labeling method for hip joint diseases based on the unsupervised learning provided in the foregoing embodiment, where a computer program (i.e. a program product) is stored thereon, and when the computer program is executed by a processor, the intelligent self-labeling method for hip joint diseases based on the unsupervised learning provided in any of the foregoing embodiments is executed.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
Based on the same inventive concept, the computer readable storage medium provided by the above embodiment of the present application has the same beneficial effects as the intelligent self-labeling method for hip joint diseases based on unsupervised learning provided by the embodiments of the present application, since that method is what the stored application program executes.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the present application may be practiced without these specific details. In some instances, well-known structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (9)

1. An intelligent self-labeling method for hip joint diseases based on unsupervised learning is characterized by comprising the following steps:
inputting an image set to be annotated into a joint detection model to obtain a hip joint image set output by the joint detection model;
inputting the hip joint image set into a feature extraction model to obtain an image feature information set, wherein the feature extraction model is constructed by a Swin Transformer model;
inputting the image characteristic information set into a classification model to obtain the category of each image characteristic information and labeling;
the joint detection model and the feature extraction model are obtained by performing coarse training based on a first sample image set and performing fine adjustment based on a second sample image set, the classification model is obtained by performing fine adjustment based on the second sample image set, the first sample image set is provided with marked hip joint area information and mask information, and the second sample image set is provided with marked disease type information;
Before the image set to be annotated is input into the joint detection model to obtain the hip joint image set output by the joint detection model, the method further comprises the following steps:
acquiring the first sample image set, wherein the first sample image set comprises a plurality of first sample images, and each first sample image has a marked hip joint region and a MASK corresponding to the hip joint region;
training the joint detection model according to the first sample image set to obtain a rough-trained joint detection model;
intercepting the hip joint region annotated in the first sample images from the first sample image set;
performing rough training on the feature extraction model according to the intercepted hip joint region and the corresponding MASK to obtain a feature extraction model after rough training;
performing synchronous fine adjustment on the classification model, the joint detection model after coarse training and the feature extraction model according to the second sample image set to obtain the joint detection model after fine adjustment, the feature extraction model and the classification model;
performing rough training on the feature extraction model according to the intercepted hip joint region and the corresponding MASK to obtain a feature extraction model after rough training, wherein the method comprises the following steps:
Patch segmentation is carried out on the hip joint region;
determining a MASK sample image according to the segmented hip joint region and the MASK;
inputting the mask sample image into the feature extraction model to obtain mask image codes;
inputting the segmented hip joint region into the feature extraction model to obtain a sample image code;
taking the mask image code as a query vector, the corresponding sample image code as a positive sample vector and the remaining sample image codes as negative sample vectors, calculating the overall loss of the feature extraction model;
and adjusting the feature extraction model according to the overall loss until the overall loss converges.
2. The intelligent self-labeling method of claim 1, wherein the classification model is a K-means model, the performing synchronous fine adjustment on the classification model, the joint detection model after coarse training, and the feature extraction model according to the second sample image set, to obtain the fine-adjusted joint detection model, the feature extraction model, and the classification model, includes:
acquiring the second sample image set, wherein the second sample image set comprises a plurality of second sample images, and each second sample image has a marked disease type;
Counting disease types marked by the second sample image set, and determining the K value of the classification model according to the disease types;
inputting the second sample image set into the joint detection model after coarse training to obtain a second hip joint image set corresponding to the second sample image set;
inputting the second hip joint image set into the feature extraction model after rough training to obtain a second image feature information set corresponding to the second hip joint image set;
inputting the second image feature information set into the classification model, and classifying the second image feature information into K categories based on the K value;
determining the corresponding relation between K categories and marked disease categories;
calculating an overall loss function according to the category corresponding to each second sample image and its marked disease category;
and iterating the initial clustering center of the classification model, the joint detection model and the feature extraction model according to the overall loss function until the overall loss function converges.
3. The intelligent self-labeling method of hip joint diseases according to claim 2, wherein the classification model is a K-means model, the performing synchronous fine adjustment on the classification model, the joint detection model after coarse training, and the feature extraction model according to the second sample image set, to obtain the joint detection model after fine adjustment, the feature extraction model, and the classification model, further comprising:
And storing the initial clustering center after the last iteration and the corresponding relation between the initial clustering center and the disease types, and taking the initial clustering center as the initial clustering center of the classification model after fine adjustment.
4. The intelligent self-labeling method for hip joint diseases according to claim 2, wherein determining the correspondence between K categories and labeled disease categories comprises:
counting the number of disease types marked by the first sample images of K categories and the proportion of the diseases occupied in the categories respectively;
the number of the same disease category in K categories and the occupied disease proportion are respectively ordered;
if the number ordering and the disease proportion ordering of one disease category in one category are both first, establishing a corresponding relation between the disease category and the category;
if the number of one disease category in one category is greater than a first threshold proportion of the total number of that disease category, establishing a corresponding relation between the disease category and the category, wherein the first threshold proportion is greater than 50%;
if the disease proportion of one disease type in one type is greater than a second threshold proportion, establishing a corresponding relation between the disease type and the type, wherein the second threshold proportion is greater than 50%;
re-executing the step of inputting the second image feature information set into the classification model and classifying the second image feature information into K categories based on the K value, under the condition that the K categories do not all establish one-to-one correspondence with different disease categories;
and under the condition that the K categories respectively establish one-to-one correspondence with different disease categories, confirming the correspondence.
5. The intelligent self-labeling method for hip joint diseases according to claim 2, wherein the calculation formula of the overall loss function is:
wherein x_i is the class value corresponding to the i-th second sample image, y_i is the marked disease type value of the i-th second sample image, i is the serial number of the second sample image, and n is the total number of the second sample images.
6. The method for intelligent self-labeling of hip joint diseases according to any one of claims 1-2, wherein the feature extraction model comprises a linear embedding layer, a first remodelling layer, an even number of Swin Transformer modules, and a second remodelling layer;
the linear embedding layer changes the dimension of the input image data;
the first remodelling layer changes the channel number of the image data;
the Swin Transformer modules perform feature extraction on the image data to obtain feature information;
And the second remodelling layer rearranges the output of the feature extraction model to obtain the image feature information.
7. An intelligent self-labeling device for hip joint diseases based on unsupervised learning, which is characterized by comprising:
the joint detection module is used for inputting the image set to be marked into a joint detection model to obtain a hip joint image set output by the joint detection model;
the feature extraction module is used for inputting the hip joint image set into a feature extraction model to obtain an image feature information set, and the feature extraction model is constructed by a Swin Transformer model;
the category labeling module is used for inputting the image characteristic information set into a classification model to obtain and label the category of each image characteristic information;
the joint detection model and the feature extraction model are obtained by performing coarse training based on a first sample image set and performing fine adjustment based on a second sample image set, the classification model is obtained by performing fine adjustment based on the second sample image set, the first sample image set is provided with marked hip joint area information and mask information, and the second sample image set is provided with marked disease type information;
Before the image set to be annotated is input into the joint detection model to obtain the hip joint image set output by the joint detection model, the method further comprises the following steps:
acquiring the first sample image set, wherein the first sample image set comprises a plurality of first sample images, and each first sample image has a marked hip joint region and a MASK corresponding to the hip joint region;
training the joint detection model according to the first sample image set to obtain a rough-trained joint detection model;
intercepting the hip joint region annotated in the first sample images from the first sample image set;
performing rough training on the feature extraction model according to the intercepted hip joint region and the corresponding MASK to obtain a feature extraction model after rough training;
performing synchronous fine adjustment on the classification model, the joint detection model after coarse training and the feature extraction model according to the second sample image set to obtain the joint detection model after fine adjustment, the feature extraction model and the classification model;
performing rough training on the feature extraction model according to the intercepted hip joint region and the corresponding MASK to obtain a feature extraction model after rough training, wherein the method comprises the following steps:
Patch segmentation is carried out on the hip joint region;
determining a MASK sample image according to the segmented hip joint region and the MASK;
inputting the mask sample image into the feature extraction model to obtain mask image codes;
inputting the segmented hip joint region into the feature extraction model to obtain a sample image code;
taking the mask image code as a query vector, the corresponding sample image code as a positive sample vector and the remaining sample image codes as negative sample vectors, calculating the overall loss of the feature extraction model;
and adjusting the feature extraction model according to the overall loss until the overall loss converges.
8. An electronic device, comprising: a memory and a processor;
the memory is used for storing programs;
the processor, coupled to the memory, is configured to execute the program for:
inputting an image set to be annotated into a joint detection model to obtain a hip joint image set output by the joint detection model;
inputting the hip joint image set into a feature extraction model to obtain an image feature information set, wherein the feature extraction model is constructed by a Swin Transformer model;
Inputting the image characteristic information set into a classification model to obtain the category of each image characteristic information and labeling;
the joint detection model and the feature extraction model are obtained by performing coarse training based on a first sample image set and performing fine adjustment based on a second sample image set, the classification model is obtained by performing fine adjustment based on the second sample image set, the first sample image set is provided with marked hip joint area information and mask information, and the second sample image set is provided with marked disease type information;
before the image set to be annotated is input into the joint detection model to obtain the hip joint image set output by the joint detection model, the method further comprises the following steps:
acquiring the first sample image set, wherein the first sample image set comprises a plurality of first sample images, and each first sample image has a marked hip joint region and a MASK corresponding to the hip joint region;
training the joint detection model according to the first sample image set to obtain a rough-trained joint detection model;
intercepting the hip joint region annotated in the first sample images from the first sample image set;
Performing rough training on the feature extraction model according to the intercepted hip joint region and the corresponding MASK to obtain a feature extraction model after rough training;
performing synchronous fine adjustment on the classification model, the joint detection model after coarse training and the feature extraction model according to the second sample image set to obtain the joint detection model after fine adjustment, the feature extraction model and the classification model;
performing rough training on the feature extraction model according to the intercepted hip joint region and the corresponding MASK to obtain a feature extraction model after rough training, wherein the method comprises the following steps:
patch segmentation is carried out on the hip joint region;
determining a MASK sample image according to the segmented hip joint region and the MASK;
inputting the mask sample image into the feature extraction model to obtain mask image codes;
inputting the segmented hip joint region into the feature extraction model to obtain a sample image code;
taking the mask image code as a query vector, the corresponding sample image code as a positive sample vector and the remaining sample image codes as negative sample vectors, calculating the overall loss of the feature extraction model;
and adjusting the feature extraction model according to the overall loss until the overall loss converges.
9. A computer readable storage medium having stored thereon a computer program, wherein the program is executed by a processor to implement the unsupervised learning based intelligent self-labeling method for hip joint diseases according to any one of claims 1 to 6.
CN202310693176.9A 2023-06-12 2023-06-12 Intelligent self-labeling method and device for hip joint diseases based on unsupervised learning Active CN116740714B (en)
