CN116738297A - Diabetes typing method and system based on depth self-coding - Google Patents

Diabetes typing method and system based on depth self-coding

Info

Publication number
CN116738297A
Authority
CN
China
Prior art keywords
data
diabetes
model
clinical
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311022792.8A
Other languages
Chinese (zh)
Other versions
CN116738297B (en)
Inventor
王伟好
肖佩
潘琦
陈子豪
李影
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qs Medical Technology Co ltd
Original Assignee
Beijing Qs Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qs Medical Technology Co ltd filed Critical Beijing Qs Medical Technology Co ltd
Priority to CN202311022792.8A priority Critical patent/CN116738297B/en
Publication of CN116738297A publication Critical patent/CN116738297A/en
Application granted granted Critical
Publication of CN116738297B publication Critical patent/CN116738297B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a diabetes typing method and system based on depth self-coding. The diabetes typing method comprises the following steps: extracting clinical data samples from a diabetes clinical database as training data and verification data; constructing a diabetes typing model based on depth self-coding, and training the diabetes typing model with the training data to obtain a trained diabetes typing model, wherein a Kmeans clustering module is embedded in some of the depth self-encoders of the diabetes typing model; and verifying the trained diabetes typing model with the verification data, and determining whether the diabetes typing model needs to be adjusted based on the verification result, to obtain a final diabetes typing model. The diabetes typing system comprises modules corresponding to the steps of the diabetes typing method.

Description

Diabetes typing method and system based on depth self-coding
Technical Field
The invention provides a diabetes typing method and system based on depth self-coding, and belongs to the technical field of deep learning model establishment.
Background
The current general method of classifying diabetes mellitus divides it into type 1 and type 2 diabetes, with roughly 90% of cases belonging to type 2. Type 2 diabetes, however, presents differently across individuals in etiology, clinical manifestations, prognosis and other respects; it is highly heterogeneous and leads to different clinical outcomes. The current typing method therefore cannot meet the requirements of clinical work and does not support individualized, precise treatment of diabetic patients. Against this background, a disease typing model designed for the diabetic population is needed.
Traditional machine learning clustering methods have difficulty accurately evaluating the similarity between samples, and struggle to effectively cluster high-dimensional data with sparse distribution and unclear cluster structure. Moreover, if a neural network is used only as a feature extractor, the clustering objective is not explicitly incorporated into the learning process, so the learned deep neural network does not necessarily output dimension-reduced data that is suitable for clustering.
Disclosure of Invention
The invention provides a diabetes typing method and a diabetes typing system based on depth self-coding, which solve the problem that existing diabetes typing models cannot effectively cluster high-dimensional data with sparse distribution and unclear cluster structure. The following technical scheme is adopted:
a depth self-encoding based diabetes typing method, the diabetes typing method comprising:
extracting clinical data samples from a diabetes clinical database as training data and verification data;
constructing a diabetes typing model based on depth self-coding, and training the diabetes typing model with the training data to obtain a trained diabetes typing model; wherein a Kmeans clustering module is embedded in some of the depth self-encoders of the diabetes typing model;
and verifying the trained diabetes typing model with the verification data, and determining whether the diabetes typing model needs to be adjusted based on the verification result, to obtain a final diabetes typing model.
Further, extracting clinical data samples from the diabetes clinical database as training data and validation data, comprising:
extracting clinical data samples from the diabetes clinical database;
performing data preprocessing on the clinical data sample to obtain a preprocessed clinical data sample;
dividing the preprocessed clinical data samples according to the data proportion of preset training data and verification data, and obtaining the training data and the verification data corresponding to the data proportion.
Further, performing data preprocessing on the clinical data sample to obtain a preprocessed clinical data sample, including:
removing null values from the clinical data samples to obtain clinical sample data without null values;
removing outliers that fall outside N standard deviations from the null-free clinical sample data, to obtain outlier-free clinical sample data;
and performing continuous variable normalization and categorical variable encoding on the outlier-free clinical sample data to obtain the preprocessed clinical data sample.
Further, removing outliers outside N standard deviations from the null-free clinical sample data to obtain outlier-free clinical sample data comprises:
carrying out average value calculation and standard deviation calculation on the clinical sample data to obtain an average value and a standard deviation corresponding to the clinical sample data;
determining a threshold coefficient N for outliers using the mean and standard deviation corresponding to the clinical sample data, and determining the range of outliers through the threshold coefficient N, wherein the threshold coefficient N and the range of outliers are obtained by the following formula:
where N represents the threshold coefficient; X_p represents the mean of the clinical sample data; X_c represents the standard deviation of the clinical sample data; P represents the percentile point and takes a value in the range 0.71 to 0.74; λ represents the adjustment coefficient, with λ = -(1-P) when X_c - (1+P)^2·X_p > 0 and λ = 1 when X_c - (1+P)^2·X_p < 0; ΔP represents the first adjustment factor; X_ymax and X_ymin represent the upper and lower limits of the range of outliers;
traversing each data point in the data set and judging whether it falls outside the range of outliers;
when a clinical sample data point falls outside the range of outliers, treating that data point as an outlier;
and acquiring a substitute value for the outlier according to the relation between the outlier and the range of outliers, replacing the outlier at its corresponding position with the substitute value, and deleting the outlier.
Further, the substitute value is obtained by the following formula:
where X_t represents the substitute value corresponding to the outlier; X_p represents the mean of the clinical sample data; X represents the value of the original data point of the clinical sample data; X_c represents the standard deviation of the clinical sample data; P represents the percentile point and takes a value in the range 0.71 to 0.74; X_ymax and X_ymin represent the upper and lower limits of the range of outliers.
Further, performing continuous variable normalization and categorical variable encoding on the outlier-free clinical sample data to obtain the preprocessed clinical data sample comprises:
setting a scaling strategy for continuous variables, wherein the scaling strategy corresponds to the following formula:
where X_s represents the value of a data point of the scaled clinical sample data; X represents the value of the original data point of the clinical sample data; X_min represents the minimum data value in the raw data set of the clinical sample data; X_max represents the maximum data value in the raw data set of the clinical sample data; X_rmin and X_rmax represent the preset lower and upper limits of the scaled data used in variable scaling of the clinical sample data;
scaling and normalizing continuous variables to be normalized in the clinical sample data according to the scaling strategy of the continuous variables to generate continuous variable normalized data information;
and determining the categorical variables that require encoding in the continuous-variable-normalized data information, and performing categorical variable encoding on that data information according to the characteristics of the categorical variables, to obtain sample data after categorical encoding conversion, wherein the sample data after categorical encoding conversion is the preprocessed clinical data sample.
Further, constructing a diabetes typing model based on depth self-coding and training the diabetes typing model with the training data to obtain a trained diabetes typing model comprises the following steps:
constructing a diabetes typing model based on depth self-coding;
training the depth self-encoder by using training data to obtain a trained depth self-encoder;
performing joint loss optimization, by way of KL divergence, between M of the trained depth self-encoders and the Kmeans clustering module, to form depth self-encoders with Kmeans clustering; wherein the diabetes typing model with the Kmeans-clustered depth self-encoders is the trained diabetes typing model, and the specific value of the number M of depth self-encoders is obtained by the following formula:
where M represents the number of depth self-encoders combined with the Kmeans clustering module and is rounded down; when the formula yields M = 0, M is set to 1, and when the formula yields M > M_0, M is set to M_0 - 1; A_0 represents the number of outlier data points in the clinical sample data; A represents the total number of clinical sample data points; M_0 represents the total number of depth self-encoders in the depth self-coding diabetes typing model; ΔM represents the second adjustment factor.
Further, verifying the trained diabetes typing model with the verification data, and determining whether the diabetes typing model needs to be adjusted based on the verification result to obtain the final diabetes typing model, comprises the following steps:
inputting the verification data into the trained diabetes typing model to obtain a clustering index radar chart after diabetes typing;
comparing the index data represented in the clustering index radar chart with the characteristics of each type of diabetes in the verification data;
when the comparison result shows that the diabetes typing model conforms to the characteristic distribution rule range of the verification data, judging that the currently trained diabetes typing model is the final diabetes typing model;
when the comparison result shows that the diabetes typing model does not conform to the characteristic distribution rule range of the verification data, adjusting the outlier threshold coefficient N and the number of encoders M with the first adjustment factor and the second adjustment factor respectively, and re-acquiring the trained diabetes typing model with the adjusted outlier threshold coefficient N and number of encoders M, until the verification result of the trained diabetes typing model conforms to the characteristic distribution rule range of the verification data.
Further, the first adjustment factor and the second adjustment factor are obtained by the following formula:
where ΔP represents the first adjustment factor; ΔM represents the second adjustment factor; K represents the number of data points that do not conform to the characteristic distribution rule range of the verification data; X_mi represents the i-th data value that does not conform to the characteristic distribution rule range of the verification data; X_si represents the scaled data value corresponding to the i-th data point that does not conform to the characteristic distribution rule range of the verification data; X_h represents the data value of the data point within the characteristic distribution rule range nearest to the i-th non-conforming data point; X_p represents the mean of the clinical sample data; X_c represents the standard deviation of the clinical sample data; X_c1 represents the standard deviation corresponding to the verification data.
A depth self-encoding based diabetes typing system, the diabetes typing system comprising:
the data extraction module is used for extracting clinical data samples from the diabetes clinical database as training data and verification data;
the model construction and training module is used for constructing a diabetes typing model based on depth self-coding, and training the diabetes typing model with the training data to obtain a trained diabetes typing model; wherein a Kmeans clustering module is embedded in some of the depth self-encoders of the diabetes typing model;
and the verification adjustment module is used for verifying the trained diabetes typing model with the verification data, and determining whether the diabetes typing model needs to be adjusted based on the verification result, to obtain the final diabetes typing model.
The invention has the beneficial effects that:
According to the diabetes typing method and system based on depth self-coding, a clustering objective is added to the optimization process: the pre-trained encoder part of the self-encoder is taken out and jointly optimized with the Kmeans clustering module through a KL-divergence loss, so that high-dimensional data with sparse distribution and unclear cluster structure can be clustered effectively.
Drawings
FIG. 1 is a flow chart of a method for typing diabetes mellitus according to the present invention;
FIG. 2 is a system block diagram of a diabetes typing system according to the present invention;
FIG. 3 is a schematic diagram of the addition of the Kmeans clustering module to the diabetes typing model according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The embodiment of the invention provides a diabetes typing method based on depth self-coding, as shown in figure 1, comprising the following steps:
S1, extracting clinical data samples from a diabetes clinical database as training data and verification data;
S2, constructing a diabetes typing model based on depth self-coding, and training the diabetes typing model with the training data to obtain a trained diabetes typing model; wherein a Kmeans clustering module is embedded in some of the depth self-encoders of the diabetes typing model. The principle of the Kmeans clustering module is shown in FIG. 3, where DEC denotes the deep self-encoding clustering algorithm, encoder denotes the encoder, and decoder denotes the decoder;
and S3, verifying the trained diabetes typing model with the verification data, and determining whether the diabetes typing model needs to be adjusted based on the verification result, to obtain the final diabetes typing model.
The working principle of the technical scheme is as follows: s1, extracting clinical data samples from a diabetes clinical database as training data and verification data: in this step, a number of clinical data samples are obtained from the diabetes clinical database. These data samples contain clinical features and signatures associated with diabetes. Training data is used to build the model and validation data is used to evaluate the performance of the model.
S2, constructing a diabetes typing model based on depth self-coding, and training the diabetes typing model with the training data to obtain a trained diabetes typing model:
in this step, a diabetes typing model is constructed using a depth self-encoding model. Depth self-coding is an unsupervised learning method that encodes and decodes input data through a multi-layer neural network to extract high-level feature representations of the data. By training the model on the training data, an optimized diabetes typing model can be obtained that is able to extract and represent meaningful features of the input data.
Wherein a neural network model with multiple depth self-encoders may be employed, the structure of which may be, but is not limited to, the following network model structure:
The following is the constituent structure of a stacked self-encoder neural network model for diabetes typing:
input Layer (Input Layer): clinical characteristics of a diabetic patient are received as input.
Encoder (Encoder): a stack of multiple self-encoders, each responsible for learning a different level of abstract feature representation of the input data. Each self-encoder comprises the following two parts:
a. An encoder section: comprising one or more hidden layers and an activation function, compresses the input data into a lower dimensional coded representation.
b. A decoder section: symmetrical to the encoder section, containing one or more hidden layers and activation functions, maps the encoded representation back to the original input dimension.
Decoder (Decoder): the output of the decoder section of the last self-encoder serves as the output of the entire model.
Output Layer (Output Layer): a layer consisting of one or more neurons outputs probability distributions belonging to different diabetes types.
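As an illustration of the stacked structure described above, the sketch below builds a chain of self-encoder blocks followed by a softmax output head in PyTorch. The layer widths, activation functions, and the number of diabetes subtypes are not specified by the patent and are assumed here purely for demonstration.

```python
import torch.nn as nn

class SelfEncoderBlock(nn.Module):
    """One self-encoder: the encoder section compresses the clinical features,
    the decoder section maps the code back to the original input dimension."""
    def __init__(self, in_dim: int, code_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, code_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))

    def forward(self, x):
        code = self.encoder(x)
        return code, self.decoder(code)

class DiabetesTypingNet(nn.Module):
    """Stacked self-encoders plus an output layer that emits a probability
    distribution over diabetes subtypes (widths and subtype count are assumed)."""
    def __init__(self, in_dim: int, code_dims=(32, 16, 8), n_types: int = 4):
        super().__init__()
        dims = (in_dim, *code_dims)
        self.blocks = nn.ModuleList(
            SelfEncoderBlock(dims[i], dims[i + 1]) for i in range(len(code_dims)))
        self.output = nn.Sequential(nn.Linear(code_dims[-1], n_types),
                                    nn.Softmax(dim=-1))

    def forward(self, x):
        for block in self.blocks:
            x, _ = block(x)          # feed each block's code into the next block
        return self.output(x)        # probabilities over the assumed subtypes
```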
S3, verifying the trained diabetes typing model with the verification data, and determining whether the diabetes typing model needs to be adjusted based on the verification result, to obtain the final diabetes typing model:
in this step, the trained diabetes typing model is evaluated and validated using the verification data set. By inputting the verification data into the trained model, the model's predictions for new samples can be obtained, and the performance and accuracy of the model can be evaluated based on the verification results. If the verification results do not meet the requirements, the diabetes typing model can be adjusted and optimized, for example by adjusting the hyperparameters of the model or increasing the amount of training data, to obtain the final diabetes typing model.
The technical scheme has the effects that: according to the diabetes typing method based on depth self-coding, a clustering objective is added to the optimization process: the pre-trained encoder part of the self-encoder is taken out and jointly optimized with the Kmeans clustering module through a KL-divergence loss, so that high-dimensional data with sparse distribution and unclear cluster structure can be clustered effectively. Meanwhile, the final diabetes typing model obtained by this method ensures that the data output by the depth self-encoders is dimension-reduced data suitable for clustering.
In one embodiment of the invention, extracting clinical data samples from a diabetes clinical database as training data and validation data includes:
s101, extracting clinical data samples from the diabetes clinical database;
s102, carrying out data preprocessing on the clinical data sample to obtain a preprocessed clinical data sample;
s103, dividing the preprocessed clinical data sample according to the data proportion of preset training data and verification data, and obtaining the training data and the verification data corresponding to the data proportion.
Wherein, carry on the data preprocessing to the said clinical data sample, obtain the clinical data sample after preprocessing, including:
s1021, removing null values from the clinical data samples to obtain clinical sample data without null values;
s1022, removing outliers outside N standard deviations from the null-free clinical sample data to obtain outlier-free clinical sample data;
s1023, performing continuous variable normalization and categorical variable encoding on the outlier-free clinical sample data to obtain the preprocessed clinical data sample.
The working principle of the technical scheme is as follows: clinical data samples of diabetics are obtained from the database. Data preprocessing is performed on the data samples. Data preprocessing is to clean and prepare the data for subsequent analysis and modeling. In this step, the following sub-steps are performed:
Null values are removed from the clinical data samples in order to handle missing values in the clinical data; missing values may affect the results of subsequent analysis and modeling, so samples containing null values need to be processed or removed.
The clinical sample data without null values is stripped of outliers outside of N standard deviations, with the aim of detecting and removing outliers in the data. Outliers may be due to measurement errors or other anomalies, which if left untreated, may adversely affect modeling and analysis. Where N represents a threshold, which may be a multiple of the standard deviation, as the case may be.
Continuous variable normalization and categorical variable encoding are performed on the outlier-removed clinical sample data in order to properly process different types of features so that they are comparable and usable. Continuous variable normalization scales continuous variables with different ranges into the same range; common methods include min-max scaling and Z-score normalization. Categorical variable encoding converts categorical variables into numerical representations; common methods are one-hot encoding and label encoding.
The preprocessed clinical data samples are then divided into training and verification data sets according to the preset proportion of training data to verification data. The training data is used for training the model, and the verification data is used for evaluating model performance and making adjustments.
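A minimal sketch of this preprocessing-and-split step is given below, assuming pandas/NumPy, plain min-max normalization, one-hot encoding, and an 80/20 train/verification split; these concrete choices are illustrative rather than values prescribed by the patent.

```python
import numpy as np
import pandas as pd

def preprocess_and_split(df: pd.DataFrame, continuous_cols, categorical_cols,
                         train_ratio: float = 0.8, seed: int = 42):
    """Drop null values, min-max scale continuous columns, one-hot encode
    categorical columns, then split by a preset train/verification ratio."""
    df = df.dropna().copy()                               # remove null values

    # Continuous-variable normalization (plain min-max to [0, 1] here)
    for col in continuous_cols:
        lo, hi = df[col].min(), df[col].max()
        df[col] = (df[col] - lo) / (hi - lo) if hi > lo else 0.0

    # Categorical-variable encoding (one-hot)
    df = pd.get_dummies(df, columns=list(categorical_cols))

    # Split according to the preset training/verification proportion
    idx = np.random.default_rng(seed).permutation(len(df))
    n_train = int(train_ratio * len(df))
    return df.iloc[idx[:n_train]], df.iloc[idx[n_train:]]
```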
The technical scheme has the effects that: the above-described technical solution of the present embodiment provides clean data samples and data sets for training and verification by performing data preprocessing and partitioning on diabetes clinical data. The technical scheme of the embodiment is beneficial to reducing noise and abnormal values in the data and converting the data into a form suitable for modeling, so that the accuracy and stability of the model are improved.
In one embodiment of the invention, removing outliers outside N standard deviations from the null-free clinical sample data to obtain outlier-free clinical sample data comprises:
step 1, carrying out average value calculation and standard deviation calculation on the clinical sample data to obtain an average value and a standard deviation corresponding to the clinical sample data;
step 2, determining a threshold coefficient N for outliers using the mean and standard deviation corresponding to the clinical sample data, and determining the range of outliers through the threshold coefficient N, wherein the threshold coefficient N and the range of outliers are obtained by the following formula:
where N represents the threshold coefficient; X_p represents the mean of the clinical sample data; X_c represents the standard deviation of the clinical sample data; P represents the percentile point and takes a value in the range 0.71 to 0.74; λ represents the adjustment coefficient, with λ = -(1-P) when X_c - (1+P)^2·X_p > 0 and λ = 1 when X_c - (1+P)^2·X_p < 0; ΔP represents the first adjustment factor; X_ymax and X_ymin represent the upper and lower limits of the range of outliers;
step 3, traversing each data point in the data set and judging whether it falls outside the range of outliers;
step 4, when a clinical sample data point falls outside the range of outliers, treating that data point as an outlier;
and step 5, acquiring a substitute value for the outlier according to the relation between the outlier and the range of outliers, replacing the outlier at its corresponding position with the substitute value, and deleting the outlier.
Wherein the substitute value is obtained by the following formula:
where X_t represents the substitute value corresponding to the outlier; X_p represents the mean of the clinical sample data; X represents the value of the original data point of the clinical sample data; X_c represents the standard deviation of the clinical sample data; P represents the percentile point and takes a value in the range 0.71 to 0.74; X_ymax and X_ymin represent the upper and lower limits of the range of outliers.
The working principle of the technical scheme is as follows: the mean and standard deviation of the clinical sample data are calculated to obtain the mean and standard deviation corresponding to the clinical sample data. The mean calculation averages the data samples, and the standard deviation calculation measures the degree of dispersion of the data samples.
And determining a threshold coefficient N of the abnormal value by using the average value and the standard deviation corresponding to the clinical sample data, and determining the range of the abnormal value through the threshold coefficient N. The threshold coefficient N is a critical range for determining outliers from the mean and standard deviation, typically by multiplying N by the standard deviation.
Each data point in the dataset is traversed and a determination is made as to whether it is outside the range of outliers. For each data point, a determination is made as to whether it belongs to an outlier by comparison to a threshold range of outliers.
When the clinical sample data exceeds the range of outliers, the clinical sample data that exceeds the range of outliers is marked as outliers. This step marks data points that exceed the outlier range as outliers for subsequent processing.
A substitute value for the outlier is acquired according to the relation between the outlier and the range of outliers, and the outlier at its corresponding position is replaced with the substitute value. In this step, different strategies may be adopted to replace outliers, such as using the mean, the median or other statistics as replacement values, as the case requires. The outlier is then deleted from the data set to ensure the accuracy and consistency of the data.
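The detect-and-replace step can be sketched as below with a plain mean ± N·std rule and boundary substitution; it does not reproduce the patent's own threshold-coefficient and substitute-value formulas, which additionally involve the percentile point P and the adjustment factors.

```python
import numpy as np

def replace_outliers(x: np.ndarray, n: float = 3.0) -> np.ndarray:
    """Flag points outside mean +/- n*std and substitute the nearest range
    boundary for them (a simple stand-in for the patent's substitute-value formula)."""
    mean, std = x.mean(), x.std()
    lower, upper = mean - n * std, mean + n * std    # range of non-outlier values
    cleaned = x.copy()
    cleaned[x < lower] = lower                       # substitute low outliers
    cleaned[x > upper] = upper                       # substitute high outliers
    return cleaned
```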
The technical scheme has the effects that: according to the technical scheme, the range of abnormal values is determined by calculating the average value and the standard deviation of clinical sample data, and data points out of the range are marked as the abnormal values. Then, a substitute value for the outlier is obtained from the relationship between the outlier and the range, and the substitute value is substituted for the outlier. This effectively handles outliers in the clinical sample data to ensure the quality and reliability of the data.
Meanwhile, the threshold coefficient N obtained by the formula effectively improves the accuracy of setting the range of outliers and the match between the threshold coefficient N and the clinical sample data. This prevents reduced outlier-screening sensitivity and reduced outlier accuracy, which would in turn reduce the precision and accuracy of the subsequently trained typing model; it also prevents the threshold coefficient N from being too small, which would make outlier screening overly sensitive and cause valid training data to be removed by mistake. With the above formula, the threshold coefficient N of outliers can be determined from the mean and standard deviation of the clinical sample data together with the other parameters and adjustment factors, and the range of outliers can then be determined. This helps identify and process outliers in clinical data and improves the accuracy and reliability of data screening.
On the other hand, the substitute value obtained through the formula is set by combining the mean and standard deviation of the clinical sample data, the percentile point, and the upper and lower limits of the outlier range, together with the actual distribution of each outlier's data value. This effectively improves the rationality and accuracy of setting the substitute value, reduces the risk that the substitute value is itself anomalous, and thereby improves the accuracy of subsequent model training.
In one embodiment of the present invention, performing continuous variable normalization and categorical variable encoding on the outlier-free clinical sample data to obtain the preprocessed clinical data sample comprises:
setting a scaling strategy for continuous variables, wherein the scaling strategy corresponds to the following formula:
where X_s represents the value of a data point of the scaled clinical sample data; X represents the value of the original data point of the clinical sample data; X_min represents the minimum data value in the raw data set of the clinical sample data; X_max represents the maximum data value in the raw data set of the clinical sample data; X_rmin and X_rmax represent the preset lower and upper limits of the scaled data used in variable scaling of the clinical sample data;
secondly, scaling and normalizing continuous variables to be normalized in the clinical sample data according to the scaling strategy of the continuous variables to generate continuous variable normalized data information;
and thirdly, determining the categorical variables that require encoding in the continuous-variable-normalized data information, and performing categorical variable encoding on that data information according to the characteristics of the categorical variables, to obtain sample data after categorical encoding conversion, wherein the sample data after categorical encoding conversion is the preprocessed clinical data sample.
The working principle of the technical scheme is as follows: the scaling strategy for the continuous variable is set, in which step the scaling strategy to be adopted for the continuous variable needs to be determined, for example using, but not limited to, min-max scaling, normalization, etc.
And scaling and normalizing the continuous variable which needs to be normalized in the clinical sample data according to the scaling strategy of the continuous variable, and generating data information after continuous variable normalization. And according to the selected scaling strategy, carrying out corresponding processing on the continuous variable to ensure that the value of the continuous variable is within a certain range or meets specific distribution characteristics.
The categorical variables that require encoding in the continuous-variable-normalized data information are then determined. According to the characteristics of the categorical variables, it is decided which variables need to be encoded, for example converting categorical variables into numerical representations with methods such as one-hot encoding or label encoding.
Finally, pre-processed clinical data samples can be obtained by continuous variable normalization and classification variable encoding processes, wherein the continuous variable has been subjected to a scaled normalization process and the classification variable has been converted into a numerical representation.
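A small sketch of scaling a continuous variable into a preset range [X_rmin, X_rmax], in the spirit of the variables named above, is shown below; the exact scaling formula used by the patent is not reproduced here.

```python
import numpy as np

def scale_to_range(x: np.ndarray, r_min: float, r_max: float) -> np.ndarray:
    """Min-max scale raw values into the preset target range [r_min, r_max]."""
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:                    # degenerate column: map to the lower bound
        return np.full_like(x, r_min, dtype=float)
    return r_min + (x - x_min) / (x_max - x_min) * (r_max - r_min)
```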
The technical scheme has the effects that: by the preprocessing, dimensional differences among data can be eliminated, the training effect of the model is improved, and different types of characteristics can be ensured to be correctly input into the model. Meanwhile, the clinical sample data is preprocessed by setting a scaling strategy of the continuous variable, scaling and normalizing the continuous variable and encoding the classified variable. Therefore, the comparability of data and the training effect of the model can be improved, and a better data base is provided for the subsequent establishment of the diabetes type parting model.
On the other hand, the scaled values obtained through the scaling strategy keep the data scaling reasonable and preserve, to the greatest extent, the distribution characteristics of the data within a given range. This further improves the quality of the subsequent training data and the effectiveness of verification with the verification data, and prevents unreasonable scaling from degrading the quality of the training and verification data, which would otherwise lower the accuracy of early model training and of subsequent model verification.
In one embodiment of the invention, constructing a diabetes typing model based on depth self-coding and training the diabetes typing model with the training data to obtain a trained diabetes typing model comprises the following steps:
s201, constructing a diabetes typing model based on depth self-coding;
s202, training a depth self-encoder by using training data to obtain a trained depth self-encoder;
s203, performing joint loss optimization, by way of KL divergence, between M of the trained depth self-encoders and the Kmeans clustering module, to form depth self-encoders with Kmeans clustering; wherein the diabetes typing model with the Kmeans-clustered depth self-encoders is the trained diabetes typing model, and the specific value of the number M of depth self-encoders is obtained by the following formula:
where M represents the number of depth self-encoders combined with the Kmeans clustering module and is rounded down; when the formula yields M = 0, M is set to 1, and when the formula yields M > M_0, M is set to M_0 - 1; A_0 represents the number of outlier data points in the clinical sample data; A represents the total number of clinical sample data points; M_0 represents the total number of depth self-encoders in the depth self-coding diabetes typing model; ΔM represents the second adjustment factor.
The working principle of the technical scheme is as follows: and constructing a diabetes typing model based on depth self-coding. The depth self-encoder is a neural network model, and consists of an encoder and a decoder, and is used for learning the compact representation and reconstruction capability of input data.
Training the depth self-encoder by using training data to obtain the trained depth self-encoder. In this step, the training data is used to train the depth self-encoder, optimizing the model parameters by minimizing the reconstruction error, enabling it to reconstruct the input data better.
And carrying out joint loss optimization on M depth self-encoders in the trained depth self-encoders and a Kmeans clustering module in a KL divergence mode to form the depth self-encoder with Kmeans clustering. In this step, the trained depth self-encoder is combined with the Kmeans clustering module, and the model is optimized by minimizing the KL divergence, so that the encoded representation can be better matched with the Kmeans clustering result.
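This joint optimization can be sketched in the style of deep embedded clustering (DEC): encoded points are softly assigned to cluster centers (typically a learnable parameter initialized from a Kmeans run on the pretrained encoder output), a sharpened target distribution is formed, and the KL divergence between the two is minimized together with the reconstruction loss. The Student-t assignment, the target distribution, and the loss weight gamma below are standard DEC choices assumed for illustration, not details taken from the patent.

```python
import torch
import torch.nn.functional as F

def soft_assign(z: torch.Tensor, centers: torch.Tensor, alpha: float = 1.0):
    """Student-t soft assignment of encoded points z (B x d) to cluster centers (K x d)."""
    dist_sq = torch.cdist(z, centers) ** 2
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q: torch.Tensor):
    """Sharpened target distribution P used as the KL target in DEC."""
    weight = q ** 2 / q.sum(dim=0)
    return weight / weight.sum(dim=1, keepdim=True)

def joint_loss(x, recon, z, centers, gamma: float = 0.1):
    """Reconstruction loss plus KL(P || Q) clustering loss, optimized jointly."""
    q = soft_assign(z, centers)
    p = target_distribution(q).detach()               # target held fixed within the step
    kl = F.kl_div(q.log(), p, reduction="batchmean")  # KL(P || Q)
    mse = F.mse_loss(recon, x)
    return mse + gamma * kl
```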
The technical scheme has the effects that: by the technical scheme, the depth self-encoder with Kmeans clustering is constructed, the model can encode and decode input data, and the data are grouped and classified by a clustering method. The depth self-encoder learns the feature representation of the data, and the Kmeans clustering module classifies the data by a clustering algorithm. Through joint optimization, the model can better type the diabetes data, so that the purpose of diabetes typing is achieved.
Meanwhile, according to this technical scheme, the clustering objective is added to the optimization process: the pre-trained encoder part of the self-encoder is taken out and jointly optimized with the Kmeans clustering module through a KL-divergence loss, so that high-dimensional data with sparse distribution and unclear cluster structure can be clustered effectively. Moreover, the final diabetes typing model obtained with the technical scheme of this embodiment ensures that the data output by the depth self-encoders is dimension-reduced data suitable for clustering.
The technical scheme provided by the embodiment constructs a model for diabetes typing by combining a depth self-encoder and a Kmeans clustering module and adopting a combined loss optimization mode. The model can automatically learn the characteristic representation of the data and cluster, and provides an effective method for diabetes typing.
On the other hand, the adjustment factor in the above formula for the number M of depth self-encoders is used to determine how many depth self-encoders the diabetes typing model contains; it adjusts the number of depth self-encoders according to the number of outliers and the total number of samples, so as to accommodate the characteristics and complexity of the data. The number of clustering-equipped depth self-encoders obtained by the formula also reflects the actual situation of the sample data, which effectively improves the rationality of that number: it prevents too many clustering-equipped depth self-encoders, which would reduce response speed, and too few, which would degrade the clustering effect.
In one embodiment of the present invention, verifying the trained diabetes typing model with the verification data, and determining whether the diabetes typing model needs to be adjusted based on the verification result to obtain the final diabetes typing model, comprises:
s301, inputting the verification data into a trained diabetes typing model to obtain a clustering index radar chart after diabetes typing;
S302, comparing index data represented in the clustering index radar graph with characteristics of each type of diabetes in verification data;
s303, when the comparison result shows that the diabetes typing model conforms to the characteristic distribution rule range of the verification data, judging that the currently trained diabetes typing model is the final diabetes typing model;
s304, when the comparison result shows that the diabetes typing model does not conform to the characteristic distribution rule range of the verification data, adjusting the outlier threshold coefficient N and the number of encoders M with the first adjustment factor and the second adjustment factor respectively, and re-acquiring the trained diabetes typing model with the adjusted outlier threshold coefficient N and number of encoders M, until the verification result of the trained diabetes typing model conforms to the characteristic distribution rule range of the verification data.
The first adjustment factor and the second adjustment factor are obtained through the following formula:
where ΔP represents the first adjustment factor; ΔM represents the second adjustment factor; K represents the number of data points that do not conform to the characteristic distribution rule range of the verification data; X_mi represents the i-th data value that does not conform to the characteristic distribution rule range of the verification data; X_si represents the scaled data value corresponding to the i-th data point that does not conform to the characteristic distribution rule range of the verification data; X_h represents the data value of the data point within the characteristic distribution rule range nearest to the i-th non-conforming data point; X_p represents the mean of the clinical sample data; X_c represents the standard deviation of the clinical sample data; X_c1 represents the standard deviation corresponding to the verification data.
The working principle of the technical scheme is as follows: inputting the verification data into the trained diabetes typing model to obtain a clustering index radar chart after diabetes typing. The cluster index radar chart is used for representing the distribution situation of different indexes on different diabetes types.
The index data in the clustered index radar map is compared with the characteristics of each type of diabetes in the validation data. By comparison, whether the trained diabetes parting model accords with the characteristic distribution rule range of the verification data can be evaluated.
If the comparison result shows that the diabetes parting model accords with the characteristic distribution rule range of the verification data, the current trained diabetes parting model is judged to be the final diabetes parting model.
If the comparison result shows that the diabetes typing model does not conform to the characteristic distribution rule range of the verification data, adjustment is needed. The outlier threshold coefficient N and the number of encoders M are adjusted with the first adjustment factor and the second adjustment factor respectively, and the diabetes typing model is retrained with the adjusted threshold coefficient N and number of encoders M, until the verification result conforms to the characteristic distribution rule range of the verification data.
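The validate-adjust-retrain loop can be sketched as follows. The helpers train_typing_model and validate_typing_model and the report fields are hypothetical placeholders; the actual adjustment factors ΔP and ΔM come from the patent's formulas, which are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class ValidationReport:
    within_expected_distribution: bool   # does the radar chart match the verification data?
    delta_p: float                       # first adjustment factor (ΔP)
    delta_m: int                         # second adjustment factor (ΔM)

def fit_until_valid(train_data, val_data, n: float = 3.0, m: int = 2,
                    max_rounds: int = 10):
    """Train, validate against the expected feature distribution, and nudge the
    outlier threshold coefficient N and encoder count M until validation passes."""
    model = None
    for _ in range(max_rounds):
        model = train_typing_model(train_data, threshold_n=n, n_encoders=m)  # hypothetical
        report = validate_typing_model(model, val_data)                      # hypothetical
        if report.within_expected_distribution:
            break
        n += report.delta_p   # first adjustment factor nudges the threshold coefficient N
        m += report.delta_m   # second adjustment factor nudges the number of encoders M
    return model
```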
The technical scheme has the effects that: by continuously adjusting the outlier threshold coefficient N and the number of encoders M, the trained diabetes typing model gradually approaches the characteristic distribution rule range of the verification data. Through iterative adjustment, a diabetes typing model conforming to the characteristics of the verification data is finally obtained, improving the accuracy and adaptability of the model.
Meanwhile, the verification data is compared with the trained model, and the threshold coefficient of the abnormal value and the number of encoders are continuously adjusted, so that the diabetes typing model conforming to the characteristic distribution rule of the verification data is finally obtained. The fitting capacity and accuracy of the model are effectively improved, so that the model can be better applied to actual diabetes typing tasks.
The embodiment of the invention provides a diabetes typing system based on depth self-coding, as shown in fig. 2, comprising:
the data extraction module is used for extracting clinical data samples from the diabetes clinical database as training data and verification data;
the model construction and training module is used for constructing a diabetes typing model based on depth self-coding, and training the diabetes typing model with the training data to obtain a trained diabetes typing model; wherein a Kmeans clustering module is embedded in some of the depth self-encoders of the diabetes typing model;
and the verification adjustment module is used for verifying the trained diabetes typing model with the verification data, and determining whether the diabetes typing model needs to be adjusted based on the verification result, to obtain the final diabetes typing model.
The working principle of the technical scheme is as follows: firstly, extracting clinical data samples from a diabetes clinical database as training data and verification data through a data extraction module;
then, a diabetes typing model based on depth self-coding is constructed by the model construction and training module, and the diabetes typing model is trained with the training data to obtain a trained diabetes typing model; wherein a Kmeans clustering module is embedded in some of the depth self-encoders of the diabetes typing model;
and finally, the trained diabetes typing model is verified with the verification data by the verification adjustment module, and whether the diabetes typing model needs to be adjusted is determined based on the verification result, to obtain the final diabetes typing model.
The technical scheme has the effects that: according to the diabetes typing system based on depth self-coding, a clustering objective is added to the optimization process: the pre-trained encoder part of the self-encoder is taken out and jointly optimized with the Kmeans clustering module through a KL-divergence loss, so that high-dimensional data with sparse distribution and unclear cluster structure can be clustered effectively. Meanwhile, the final diabetes typing model obtained by the diabetes typing system ensures that the data output by the depth self-encoders is dimension-reduced data suitable for clustering.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A diabetes typing method based on depth self-coding, the diabetes typing method comprising:
extracting clinical data samples from a diabetes clinical database as training data and verification data;
constructing a diabetes typing model based on depth self-coding, and training the diabetes typing model with the training data to obtain a trained diabetes typing model; wherein a Kmeans clustering module is embedded in some of the depth self-encoders of the diabetes typing model;
and verifying the trained diabetes typing model with the verification data, and determining whether the diabetes typing model needs to be adjusted based on the verification result, to obtain a final diabetes typing model.
2. The method of claim 1, wherein extracting clinical data samples from a diabetes clinical database as training data and validation data comprises:
Extracting clinical data samples from the diabetes clinical database;
performing data preprocessing on the clinical data sample to obtain a preprocessed clinical data sample;
dividing the preprocessed clinical data samples according to the data proportion of preset training data and verification data, and obtaining the training data and the verification data corresponding to the data proportion.
3. The method of claim 2, wherein the data preprocessing is performed on the clinical data samples to obtain preprocessed clinical data samples, comprising:
removing null values from the clinical data samples to obtain clinical sample data without null values;
removing outliers that fall outside N standard deviations from the null-free clinical sample data, to obtain outlier-free clinical sample data;
and performing continuous variable normalization and categorical variable encoding on the outlier-free clinical sample data to obtain the preprocessed clinical data sample.
4. The method of claim 3, wherein removing outliers outside N standard deviations from the null-free clinical sample data to obtain outlier-free clinical sample data comprises:
Carrying out average value calculation and standard deviation calculation on the clinical sample data to obtain an average value and a standard deviation corresponding to the clinical sample data;
determining a threshold coefficient N for outliers using the mean and standard deviation corresponding to the clinical sample data, and determining the range of outliers through the threshold coefficient N, wherein the threshold coefficient N and the range of outliers are obtained by the following formula:
where N represents the threshold coefficient; X_p represents the mean of the clinical sample data; X_c represents the standard deviation of the clinical sample data; P represents the percentile point and takes a value in the range 0.71 to 0.74; λ represents the adjustment coefficient, with λ = -(1-P) when X_c - (1+P)^2·X_p > 0 and λ = 1 when X_c - (1+P)^2·X_p < 0; ΔP represents the first adjustment factor; X_ymax and X_ymin represent the upper and lower limits of the range of outliers;
traversing each data point in the data set and judging whether it falls outside the range of outliers;
when a clinical sample data point falls outside the range of outliers, treating that data point as an outlier;
and acquiring a substitute value for the outlier according to the relation between the outlier and the range of outliers, replacing the outlier at its corresponding position with the substitute value, and deleting the outlier.
5. The method of claim 4, wherein the substitute value is obtained by the following formula:
wherein X_t represents the substitute value corresponding to the abnormal value; X_p represents the mean value of the clinical sample data; X represents the numerical value corresponding to the original data point of the clinical sample data; X_c represents the standard deviation of the clinical sample data; P represents the percentile point, and the value range of P is 0.71 to 0.74; X_ymax and X_ymin represent the upper limit value and the lower limit value of the range of abnormal values.
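Claims 4 and 5 determine a range of abnormal values from the mean, standard deviation, and threshold coefficient N, then substitute out-of-range values; the exact formulas appear only in the original drawings and are not reproduced here. The sketch below assumes the familiar mean ± N·σ range and uses the violated bound as the substitute value (winsorisation), which is an assumption standing in for the claimed formulas.

import numpy as np
import pandas as pd

def replace_abnormal_values(samples: pd.DataFrame, n: float = 3.0) -> pd.DataFrame:
    """Replace values outside [mean - n*std, mean + n*std] in each numeric column.

    Assumption: the substitute value is the nearest bound of the abnormal-value
    range; the patent defines its own substitute-value formula in the drawings.
    """
    cleaned = samples.copy()
    for col in cleaned.select_dtypes(include=[np.number]).columns:
        x_p = cleaned[col].mean()                      # mean of the clinical sample data
        x_c = cleaned[col].std()                       # standard deviation of the clinical sample data
        x_ymin, x_ymax = x_p - n * x_c, x_p + n * x_c  # assumed range of abnormal values
        cleaned[col] = cleaned[col].clip(lower=x_ymin, upper=x_ymax)
    return cleaned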
6. A method of typing diabetes according to claim 3, wherein carrying out continuous variable normalization and categorical variable encoding processing on the clinical sample data without abnormal values to obtain the preprocessed clinical data samples comprises:
setting a scaling strategy for continuous variables, wherein the scaling strategy corresponds to the following formula:
wherein X_s represents the numerical value corresponding to a data point of the scaled clinical sample data; X represents the numerical value corresponding to the original data point of the clinical sample data; X_min represents the minimum data value in the raw data set of the clinical sample data; X_max represents the maximum data value in the raw data set of the clinical sample data; X_rmin and X_rmax represent the preset lower limit value and upper limit value of the scaled data used in the variable scaling of the clinical sample data;
scaling and normalizing the continuous variables to be normalized in the clinical sample data according to the scaling strategy for continuous variables, to generate continuous variable normalized data information;
and determining the categorical variables that need encoding processing in the continuous variable normalized data information, and carrying out categorical variable encoding processing on the continuous variable normalized data information according to the characteristics of the categorical variables, to obtain sample data after categorical encoding conversion, wherein the sample data after categorical encoding conversion are the preprocessed clinical data samples.
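The scaling strategy of claim 6 maps each continuous variable from its observed [X_min, X_max] onto a preset [X_rmin, X_rmax]; the sketch below implements that standard min-max rescaling and one-hot encodes the categorical variables. The target range of [0, 1] and the choice of one-hot encoding are illustrative assumptions.

import pandas as pd

def scale_and_encode(samples: pd.DataFrame, continuous_cols, categorical_cols,
                     x_rmin: float = 0.0, x_rmax: float = 1.0) -> pd.DataFrame:
    """Min-max scale continuous variables to [x_rmin, x_rmax] and one-hot encode categorical ones."""
    out = samples.copy()
    for col in continuous_cols:
        x_min, x_max = out[col].min(), out[col].max()
        # X_s = X_rmin + (X - X_min) / (X_max - X_min) * (X_rmax - X_rmin)
        out[col] = x_rmin + (out[col] - x_min) / (x_max - x_min) * (x_rmax - x_rmin)
    return pd.get_dummies(out, columns=list(categorical_cols))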
7. The method of claim 1, wherein constructing a depth self-coding based diabetes typing model and training the diabetes typing model with the training data to obtain a trained diabetes typing model comprises:
constructing a diabetes typing model based on depth self-coding;
training the depth self-encoder by using training data to obtain a trained depth self-encoder;
embedding a K-means clustering module into M of the trained depth self-encoders, and performing joint loss optimization by means of KL divergence, to form depth self-encoders with K-means clustering; wherein the diabetes typing model with the K-means clustered depth self-encoders is the trained diabetes typing model, and the specific value of M is obtained by the following formula:
wherein M represents the number of depth self-encoders joined with the K-means clustering module, and M is rounded down; when the calculated value of M is 0, M is set to 1; when the calculated value of M is greater than M_0, M is set to M_0 - 1; A_0 represents the number of abnormal-value data in the clinical sample data; A represents the total number of sample data in the clinical sample data; M_0 represents the total number of depth self-encoders in the depth self-coding diabetes typing model; ΔM represents the second adjustment factor.
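Claim 7 embeds a K-means clustering module into M of the trained depth self-encoders and optimises a joint loss through KL divergence; the formula for M is given only in the original drawings. Below is a minimal PyTorch sketch in the style of deep embedded clustering (DEC): K-means on the latent codes initialises cluster centres, which are then refined together with the encoder using the KL divergence between soft assignments and a sharpened target distribution, plus a reconstruction term. Network sizes, the cluster count, and the loss weighting gamma are illustrative assumptions, not the claimed configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

class DeepSelfEncoder(nn.Module):
    """Plain fully connected autoencoder used as one depth self-encoder."""
    def __init__(self, in_dim: int, latent_dim: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, in_dim))
    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def soft_assign(z, centers, alpha: float = 1.0):
    """Student's t soft assignment of latent codes to cluster centres (DEC style)."""
    dist2 = torch.cdist(z, centers) ** 2
    q = (1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Sharpened target distribution used in the KL term."""
    w = q ** 2 / q.sum(dim=0)
    return w / w.sum(dim=1, keepdim=True)

def train_clustered_autoencoder(x, n_clusters: int = 4, epochs: int = 100, gamma: float = 0.1):
    """Jointly optimise reconstruction and KL clustering losses on data tensor x."""
    model = DeepSelfEncoder(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # K-means on the initial latent codes initialises the embedded clustering module.
    with torch.no_grad():
        z0, _ = model(x)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(z0.numpy())
    centers = nn.Parameter(torch.tensor(km.cluster_centers_, dtype=torch.float32))
    opt.add_param_group({"params": [centers]})
    for _ in range(epochs):
        z, x_hat = model(x)
        q = soft_assign(z, centers)
        p = target_distribution(q).detach()
        loss = F.mse_loss(x_hat, x) + gamma * F.kl_div(q.log(), p, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model, centers

In the claimed method only M of the M_0 self-encoders carry this clustering module, with M derived from the proportion of abnormal values in the clinical sample data; that selection logic is omitted from the sketch.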
8. The method of claim 1, wherein verifying the trained diabetes typing model using the verification data and determining whether the diabetes typing model needs to be adjusted based on the verification result to obtain a final diabetes typing model comprises:
inputting the verification data into the trained diabetes typing model to obtain a clustering index radar chart after diabetes typing;
comparing the index data represented in the clustering index radar chart with the characteristics of each type of diabetes in the verification data;
when the comparison result shows that the diabetes typing model accords with the characteristic distribution rule range of the verification data, judging that the currently trained diabetes typing model is the final diabetes typing model;
when the comparison result shows that the diabetes typing model does not accord with the characteristic distribution rule range of the verification data, adjusting the threshold coefficient N of the abnormal values and the number M of encoders by using the first adjustment factor and the second adjustment factor, respectively; and re-acquiring the trained diabetes typing model by using the adjusted threshold coefficient N of the abnormal values and the adjusted number M of encoders, until the verification result of the trained diabetes typing model accords with the characteristic distribution rule range of the verification data.
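Claim 8 turns the clustering result on the verification data into a clustering index radar chart and compares it with the known characteristics of each diabetes type. A rough sketch of producing such a chart is given below, assuming the indices are per-cluster mean values of a handful of clinical features; the feature handling and the matplotlib layout are assumptions, and the comparison against the characteristic distribution rule range is not implemented here.

import numpy as np
import matplotlib.pyplot as plt

def clustering_index_radar(features: np.ndarray, labels: np.ndarray, feature_names):
    """Draw one radar polygon per cluster from the per-cluster mean of each clinical index."""
    angles = np.linspace(0, 2 * np.pi, len(feature_names), endpoint=False).tolist()
    angles += angles[:1]                                   # repeat first angle to close the polygon
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    for cluster in np.unique(labels):
        means = features[labels == cluster].mean(axis=0).tolist()
        means += means[:1]
        ax.plot(angles, means, label=f"cluster {cluster}")
        ax.fill(angles, means, alpha=0.1)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(feature_names)
    ax.legend(loc="upper right")
    return fig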
9. The method of claim 8, wherein the first and second adjustment factors are obtained by the following formula:
wherein ΔP represents the first adjustment factor; ΔM represents the second adjustment factor; K represents the number of data that do not accord with the characteristic distribution rule range of the verification data; X_mi represents the i-th data value that does not accord with the characteristic distribution rule range of the verification data; X_si represents the scaled data value corresponding to the i-th data value that does not accord with the characteristic distribution rule range of the verification data; X_h represents the data value of the data point within the characteristic distribution rule range nearest to the i-th data value that does not accord with the characteristic distribution rule range of the verification data; X_p represents the mean value of the clinical sample data; X_c represents the standard deviation of the clinical sample data; X_c1 represents the standard deviation corresponding to the verification data.
10. A depth self-encoding based diabetes typing system, the diabetes typing system comprising:
the data extraction module is used for extracting clinical data samples from the diabetes clinical database as training data and verification data;
the model construction and training module is used for constructing a diabetes typing model based on depth self-coding, and training the diabetes typing model by using the training data to obtain a trained diabetes typing model; wherein a K-means clustering module is embedded in part of the depth self-encoders of the diabetes typing model;
and the verification adjustment module is used for verifying the trained diabetes typing model by using the verification data, determining whether the diabetes typing model needs to be adjusted based on the verification result, and obtaining a final diabetes typing model.
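A minimal skeleton of the three modules named in claim 10; the class and method names are illustrative assumptions, and the bodies delegate to the kind of routines sketched under the method claims above (split_clinical_samples and train_clustered_autoencoder), so this is a wiring sketch rather than the claimed system.

import pandas as pd
import torch

class DataExtractionModule:
    """Extracts clinical data samples from the diabetes clinical database and splits them."""
    def __init__(self, database):
        self.database = database  # assumed to expose load_clinical_samples() -> pd.DataFrame
    def extract(self):
        samples = self.database.load_clinical_samples()
        return split_clinical_samples(samples)  # sketch shown after claim 2

class ModelConstructionTrainingModule:
    """Constructs the depth self-coding diabetes typing model and trains it on the training data."""
    def train(self, training_data: pd.DataFrame):
        x = torch.tensor(training_data.to_numpy(dtype="float32"))
        return train_clustered_autoencoder(x)  # sketch shown after claim 7

class ValidationAdjustmentModule:
    """Verifies the trained model on the verification data and flags whether adjustment is needed."""
    def validate(self, model, centers, verification_data: pd.DataFrame):
        # Comparison of the clustering index radar chart against each type's
        # characteristics (claim 8) would be implemented here; omitted in this sketch.
        raise NotImplementedError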
CN202311022792.8A 2023-08-15 2023-08-15 Diabetes typing method and system based on depth self-coding Active CN116738297B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311022792.8A CN116738297B (en) 2023-08-15 2023-08-15 Diabetes typing method and system based on depth self-coding

Publications (2)

Publication Number Publication Date
CN116738297A true CN116738297A (en) 2023-09-12
CN116738297B CN116738297B (en) 2023-11-21

Family

ID=87904777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311022792.8A Active CN116738297B (en) 2023-08-15 2023-08-15 Diabetes typing method and system based on depth self-coding

Country Status (1)

Country Link
CN (1) CN116738297B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020082732A1 (en) * 2018-10-26 2020-04-30 平安科技(深圳)有限公司 Automatic picture classification method, device, and computer readable storage medium
CN111178427A (en) * 2019-12-27 2020-05-19 杭州电子科技大学 Depth self-coding embedded clustering method based on Sliced-Wasserstein distance
CN111696660A (en) * 2020-05-13 2020-09-22 平安科技(深圳)有限公司 Artificial intelligence-based patient grouping method, device, equipment and storage medium
CN112884010A (en) * 2021-01-25 2021-06-01 浙江师范大学 Multi-mode self-adaptive fusion depth clustering model and method based on self-encoder
CN114023449A (en) * 2021-11-05 2022-02-08 中山大学 Diabetes risk early warning method and system based on depth self-encoder
CN116563587A (en) * 2023-04-25 2023-08-08 杭州电子科技大学 Method and system for embedded clustering of depth of graph convolution structure based on slimed-Wasserstein distance

Also Published As

Publication number Publication date
CN116738297B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
CN109639739B (en) Abnormal flow detection method based on automatic encoder network
CN109086805B (en) Clustering method based on deep neural network and pairwise constraints
CN111785329A (en) Single-cell RNA sequencing clustering method based on confrontation automatic encoder
CN113052271B (en) Biological fermentation data prediction method based on deep neural network
CN111161814A (en) DRGs automatic grouping method based on convolutional neural network
CN109740254B (en) Ship diesel engine abrasive particle type identification method based on information fusion
CN117349782B (en) Intelligent data early warning decision tree analysis method and system
CN114548199A (en) Multi-sensor data fusion method based on deep migration network
CN117131022B (en) Heterogeneous data migration method of electric power information system
CN114880538A (en) Attribute graph community detection method based on self-supervision
CN114139624A (en) Method for mining time series data similarity information based on integrated model
CN117095754B (en) Method for classifying proteins by machine learning
CN112464281B (en) Network information analysis method based on privacy grouping and emotion recognition
CN116738297B (en) Diabetes typing method and system based on depth self-coding
CN111863153A (en) Method for predicting total amount of suspended solids in wastewater based on data mining
CN112712855A (en) Joint training-based clustering method for gene microarray containing deletion value
CN116776270A (en) Method and system for detecting micro-service performance abnormality based on transducer
CN116318773A (en) Countermeasure training type unsupervised intrusion detection system and method based on AE model optimization
CN114864004A (en) Deletion mark filling method based on sliding window sparse convolution denoising self-encoder
CN114792026A (en) Method and system for predicting residual life of aircraft engine equipment
CN111882441A (en) User prediction interpretation Treeshap method based on financial product recommendation scene
CN115831339B (en) Medical system risk management and control pre-prediction method and system based on deep learning
CN112070023B (en) Neighborhood prior embedded type collaborative representation mode identification method
CN113485863B (en) Method for generating heterogeneous imbalance fault samples based on improved generation of countermeasure network
CN116777292A (en) Defect rate index correction method based on multi-batch small sample space product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant