CN111488924A - Multivariate time sequence data clustering method - Google Patents
- Publication number
- CN111488924A (application CN202010265442.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- clustering
- value
- model
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The invention discloses a multivariate time series data clustering method, which comprises: performing normalization preprocessing on multivariate time series data; constructing a sparse autoencoder, a deep learning unsupervised learning model, and performing feature extraction on the multivariate time series data to construct a new feature sequence; acquiring the clustering K value for the new feature sequences of the sample data; calculating the distance between the new feature sequences of different sample data based on the Euclidean distance; clustering the set of new feature sequences of the sample data; and analyzing potential patterns of the multivariate time series data according to the clustering result. By combining the sparse autoencoder model with the clustering method, the invention improves the efficiency of processing large-scale data; the sparse autoencoder model improves the extraction of new feature sequences from the multivariate time series data, and the multivariate distance calculation model built on the Euclidean distance enables clustering of the multivariate time series data.
Description
Technical Field
The invention relates to the field of data clustering, in particular to a multivariate time series data clustering method.
Background
With the rapid development of the Internet of Things, research based on time series data is widely applied in fields such as finance and medical treatment. Clustering is an effective method for analyzing time series data: mining the potential patterns of a time series reveals its characteristics, and application problems involving time series data can then be studied on that basis.
At present, time series clustering methods mainly fall into the following categories: (1) Partition-based time series clustering. The number of data categories and the initial cluster center points are first determined, and the sample points are then assigned to categories by calculating the distance between each sample point and each cluster center point, until convergence. (2) Density-based time series clustering. The category radius and the number of samples within a category are first determined, and clustering then proceeds as long as the density of neighboring regions exceeds a set threshold. (3) Hierarchy-based time series clustering. This can be divided into a top-down mode and a bottom-up mode: the former takes all samples as the root node and splits recursively until single-sample classes appear; the latter starts from single samples and merges until a stop condition is met. These methods usually cannot mine the inherent characteristics of time series data accurately and comprehensively, and research on mining and analyzing the potential patterns of multivariate time series data in particular remains relatively limited. A multivariate time series data clustering method is therefore proposed here.
Disclosure of Invention
The invention aims to provide a multivariate time series clustering method that combines an unsupervised learning model, the sparse autoencoder, with a traditional clustering method, K-means. A sparse autoencoder model is constructed with the deep learning LSTM model as its basic unit and is used to extract a new feature sequence from each single-variable time series. A multivariate distance calculation method based on the Euclidean distance is constructed to calculate the distances between the multivariate time series data of different samples, and the new feature sequences of all samples are then clustered with the K-means method, so that the potential patterns of the multivariate time series data can be mined effectively from the clustering result.
To achieve this purpose, the invention adopts the following technical scheme.
The invention comprises the following steps:
S10, preprocessing the multivariate time series data, wherein the preprocessing comprises validation and normalization of the data;
S20, constructing a sparse autoencoder, a deep learning unsupervised learning model, with the deep learning LSTM model as its basic unit, and performing feature extraction on the multivariate time series data to construct a new feature sequence;
S30, acquiring the clustering K value for the new feature sequences of the sample data;
S40, calculating the distance between the new feature sequences of different sample data based on the Euclidean distance;
S50, clustering the set of new feature sequences of the sample data;
S60, analyzing the potential patterns of the multivariate time series data according to the clustering result: averaging the data of all sample points in each category to obtain the mean value at each time point of the multivariate time series, thereby obtaining a new multivariate mean time series per category.
Further, the autoencoder model comprises an encoder and a decoder: the input data are encoded to extract new feature values, which are then decoded to obtain an output, and making the output equal to the input is the model optimization target. The sparse autoencoder model adds a sparse term to the optimization function to constrain the model parameters and thereby optimize the model. The training process is as follows:
(1) taking the data points of the time series one by one as the input of the LSTM unit in the encoder, and taking the sequence obtained after the last data point has been input as the new feature sequence of the sample data;
(2) taking the new feature sequence of the sample data as the input of the decoder, and training the sparse autoencoder model with a mean square error function plus a sparse term as the optimization function. The calculation formula of the optimization function is as follows:
Further, obtaining the clustering K value of the multivariate time series data specifically comprises the following steps:
(1) selecting a specific K value within the value range (1-100), randomly generating, according to the uniform distribution principle, the same number of samples as the original samples within the three-dimensional region in which the samples lie, and clustering them with the K-means method to obtain W_k, with the calculation formula as follows:
(2) obtaining the value s_k, where n takes the value 100, with the calculation formula as follows:
where W_kn denotes W_k for a specific value of n.
(3) repeating the preceding steps for all K values in the value range and selecting the K value at which W_k drops fastest as the optimal number of clusters, with the calculation formula as follows:
Further, the validation comprises deleting data in which the proportion of missing values is greater than 80%, and the normalization comprises mapping the data of each medication type of a case into the interval (0,1) using min-max normalization, with the specific formula as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the data for that medication type.
Further, clustering the new feature sequences of all sample data specifically comprises the following steps:
(1) first, randomly dividing all sample points into K categories according to the K value;
(2) calculating a new center point for each category, and reassigning all sample points according to the distance between each sample point and each category center point;
(3) repeating the second step until the division into K categories no longer changes.
The silhouette coefficient (SC) is used as the evaluation standard of clustering performance on multivariate time series data to assess the effectiveness of the proposed method:
$$SC = \frac{1}{N}\sum_{i=1}^{N}\frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
where N is the number of samples, a(i) represents the average distance from sample i to the other samples in the same cluster, and b(i) represents the average distance from sample i to all samples in other clusters.
Compared with the prior art, the invention has the following beneficial effects:
By combining a new deep learning method with a traditional clustering method, the multivariate time series data clustering method provided by the invention makes full use of the high-performance feature extraction of the sparse autoencoder, an unsupervised deep learning model, on large-scale data, and of the excellent sequence memory of the deep learning LSTM model on time series data. It effectively addresses the difficulty the traditional K-means clustering method has with large-scale data, and thus better mines and analyzes the potential patterns of multivariate time series data.
Drawings
FIG. 1 is a flow chart of a multivariate time series data clustering method;
FIG. 2 is a block diagram of the sparse autoencoder model;
FIG. 3 shows the performance evaluation results of the multivariate time series data clustering method;
FIG. 4 shows the potential pattern mining results for the multivariate time series data.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
As shown in FIG. 1, the method is applicable to clustering multivariate time series data. Multivariate time series data acquired by a medical institution, namely patient medication data covering multiple drug types, are first preprocessed; a new feature sequence of the data is then constructed through feature extraction; next, the clustering K value and the distance metric between new feature sequences are obtained; finally, the new feature sequences are clustered with the K-means method, and the potential patterns of the original data are analyzed according to the clustering result.
Step S10: Multivariate time series data preprocessing
The medical institution acquires multiple types of patient medication data, which are validated and normalized to construct the data set for clustering. Taking the Medicare data set as an example, which contains 90-day medication data for about 320,000 cases and two types of drugs (referred to simply as drug A and drug B), the following preprocessing is performed:
Validation. Case data in which the proportion of missing values is greater than 80% are deleted, reducing the number of cases from about 320,000 to about 310,000.
Normalization. The data of each medication type of a case are mapped into the interval (0,1) using min-max normalization, with the specific formula as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the data for that medication type.
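The two preprocessing operations can be illustrated with a minimal Python sketch; it is not the patent's implementation. It assumes that missing values are represented as `None` and that the minimum and maximum are taken over all cases of one medication type. The function names `drop_sparse_cases` and `normalize_drug` are hypothetical.

```python
def drop_sparse_cases(cases, max_missing=0.8):
    """Validation: drop cases whose share of missing values exceeds max_missing."""
    return [
        series for series in cases
        if sum(v is None for v in series) / len(series) <= max_missing
    ]


def normalize_drug(cases):
    """Min-max normalization over all observed values of one medication type."""
    observed = [v for series in cases for v in series if v is not None]
    lo, hi = min(observed), max(observed)
    span = (hi - lo) or 1.0  # avoid division by zero for constant data
    return [
        [None if v is None else (v - lo) / span for v in series]
        for series in cases
    ]


# Toy data (assumed): three cases, five time points each, for a single drug.
raw = [
    [1.0, None, 3.0, 5.0, None],
    [None, None, None, None, None],  # 100% missing: removed
    [None, None, None, None, 2.0],   # 80% missing: kept (not greater than 80%)
]
clean = normalize_drug(drop_sparse_cases(raw))
```

How the remaining missing values are treated afterwards is not stated in the text; the sketch simply leaves them as `None`.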
step S20: construction of new feature sequences by feature extraction of multivariate time series data
In the step, a deep learning unsupervised learning model sparse self-encoder is constructed by taking a deep learning long-short term memory (L ong-term memory, L STM) model as a basic unit, and the multivariate time series data is subjected to feature extraction to construct a new feature sequence of the sample data.
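LSTM is a special deep learning RNN model that can solve the vanishing-gradient problem arising when training on long time series, and it has better sequence memory performance than an ordinary RNN model. An LSTM contains 3 gates: (1) the update gate, which controls the input; (2) the forget gate, which controls the degree to which existing content is forgotten; and (3) the output gate, which controls the output. The update formulas of the model are as follows:
$$a^{\langle t\rangle} = o * \tanh(c^{\langle t\rangle})$$
$$f = \sigma(W_f[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_f)$$
$$u = \sigma(W_u[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_u)$$
$$o = \sigma(W_o[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_o)$$
Here $\sigma$ is the sigmoid function, $x^{\langle t\rangle}$ is the input at time step t, $a^{\langle t\rangle}$ is the hidden state, $c^{\langle t\rangle}$ is the cell state, and f, u and o are the forget, update and output gates.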
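To make the gate equations concrete, here is a small pure-Python sketch of one LSTM step and of an encoder pass that feeds the data points one by one. The candidate-memory and cell-update lines (`c_tilde`, `c_t`) follow the standard LSTM formulation and are an assumption, since the text lists only the gates and the output. The names `lstm_step`, `encode` and `params` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, a_prev, c_prev, params):
    """One LSTM step; params maps 'f', 'u', 'o', 'c' to (W, b) pairs."""
    concat = a_prev + x_t  # [a<t-1>, x<t>]
    f = [sigmoid(z) for z in affine(*params["f"], concat)]  # forget gate
    u = [sigmoid(z) for z in affine(*params["u"], concat)]  # update gate
    o = [sigmoid(z) for z in affine(*params["o"], concat)]  # output gate
    # Assumed standard candidate memory and cell update (not given in the text).
    c_tilde = [math.tanh(z) for z in affine(*params["c"], concat)]
    c_t = [ui * ci + fi * cp for ui, ci, fi, cp in zip(u, c_tilde, f, c_prev)]
    a_t = [oi * math.tanh(ci) for oi, ci in zip(o, c_t)]  # a<t> = o * tanh(c<t>)
    return a_t, c_t


def encode(series, params, hidden_size):
    """Feed the data points one by one; return the state after the last point."""
    a, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in series:
        a, c = lstm_step([x], a, c, params)
    return a
```

With a hidden size of 50, `encode` would turn a 90-point series into a 50-dimensional vector, which matches the dimensions quoted for the case study; the trained weights themselves are of course not available here.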
The sparse autoencoder model comprises an encoder part and a decoder part. As shown in FIG. 2, part A is the encoder of the sparse autoencoder model and part B is its decoder.
The autoencoder first encodes the input data to extract new feature values and then decodes them to obtain an output; making the output equal to the input is the model optimization target. The sparse autoencoder model adds a sparse term to the optimization function of the autoencoder to constrain the model parameters and thereby optimize the model. The training process is as follows:
(1) taking the data points of the time series one by one as the input of the LSTM unit in the encoder, and taking the sequence obtained after the last data point has been input as the new feature sequence of the sample data;
(2) taking the new feature sequence of the time series data as the input of the decoder, and training the sparse autoencoder model with a mean square error function plus a sparse term as the optimization function. The calculation formula of the optimization function is as follows:
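The shape of such an optimization function can be sketched as follows. The mean square error part follows the text; the exact form of the sparse term is not recoverable from it, so an L1 penalty with weight `lam` is assumed purely for illustration. `sparse_mse_loss` and the argument `penalized` (the values being constrained, for example weights or encoded features) are hypothetical names.

```python
def sparse_mse_loss(x, x_hat, penalized, lam=1e-3):
    """Mean square reconstruction error plus an assumed L1 sparse term."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    sparse_term = sum(abs(p) for p in penalized)
    return mse + lam * sparse_term


# Reconstruction [1, 4] of input [1, 2], with two penalized values.
loss = sparse_mse_loss([1.0, 2.0], [1.0, 4.0], [0.5, -0.5], lam=0.1)
```

A larger `lam` pushes more of the penalized values toward zero at the cost of a higher reconstruction error, which is the trade-off a sparse term is meant to control.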
In this case, each case's 90-day medication data for drug A and drug B are each converted into a 50-dimensional new feature sequence through feature extraction.
Step S30: Acquiring the clustering K value for the set of new feature sequences of the sample data
In this step, the Gap statistic method is used to acquire the clustering K value of the multivariate time series data. The specific process is as follows:
(1) setting the value range of K;
(2) selecting a specific K value within the value range, randomly generating, according to the uniform distribution principle, the same number of samples as the original samples within the three-dimensional region in which the samples lie, and clustering them with K-means to obtain W_k, with the calculation formula as follows:
(3) repeating the second step 2-5 times to obtain the value s_k, with the calculation formula as follows:
(4) repeating the second and third steps for all K values in the value range and selecting the K value at which W_k drops fastest as the optimal number of clusters, with the calculation formula as follows:
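The quantities in this procedure can be sketched in Python using the commonly published form of the Gap statistic, which is an assumption here because the patent's own formulas are not reproduced in this text: Gap(k) is the mean of log W_kn over the n reference data sets minus log W_k, s_k is the standard deviation of log W_kn scaled by sqrt(1 + 1/n), and the chosen K is the smallest k with Gap(k) >= Gap(k+1) - s_(k+1). The selection rule described in the text ("fastest drop of W_k") may differ in detail. `gap_and_s` and `choose_k` are hypothetical names, and the W values are assumed to come from a separate K-means run.

```python
import math


def gap_and_s(w_k, w_k_refs):
    """Return (Gap(k), s_k) from the observed W_k and the reference values W_kn."""
    n = len(w_k_refs)
    logs = [math.log(w) for w in w_k_refs]
    mean_log = sum(logs) / n
    gap = mean_log - math.log(w_k)
    sd = math.sqrt(sum((v - mean_log) ** 2 for v in logs) / n)
    return gap, sd * math.sqrt(1 + 1 / n)


def choose_k(k_values, gaps, s_values):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); else the k with the largest gap."""
    for i in range(len(k_values) - 1):
        if gaps[i] >= gaps[i + 1] - s_values[i + 1]:
            return k_values[i]
    return k_values[max(range(len(gaps)), key=gaps.__getitem__)]
```

For example, `choose_k([1, 2, 3], [0.1, 0.5, 0.4], [0.0, 0.05, 0.05])` returns 2, because the gap stops increasing meaningfully after K = 2.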
In this case, based on the above Gap statistic calculation, 4 is selected as the clustering K value, as shown in FIG. 3.
Step S40: Calculating the distance between the new feature sequences of different sample data
In this step, a multivariate distance calculation model is constructed from the Euclidean distance, and the distance between the new feature sequences of different sample data is calculated, with the calculation formula as follows:
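One plausible reading of a Euclidean-based multivariate distance is sketched below. It assumes each sample is a list of per-variable feature sequences (for example, one 50-dimensional vector per drug) and that squared differences are summed over all variables before taking the square root; the patent's exact formula is not reproduced in this text, so this combination rule is an assumption, and `multivariate_distance` is a hypothetical name.

```python
import math


def multivariate_distance(sample_x, sample_y):
    """Euclidean distance over all variables' feature sequences (assumed form)."""
    total = 0.0
    for feat_x, feat_y in zip(sample_x, sample_y):  # one pair per variable
        total += sum((a - b) ** 2 for a, b in zip(feat_x, feat_y))
    return math.sqrt(total)


# Two samples with two variables and two features each.
d = multivariate_distance([[0.0, 0.0], [0.0, 0.0]], [[3.0, 0.0], [0.0, 4.0]])
```

Under this assumption the result equals the ordinary Euclidean distance between the concatenated feature vectors, so it can be used directly with K-means.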
Step S50: Clustering the set of new feature sequences of the sample data with the K-means method
In this step, the new feature sequences of all sample data are clustered. The specific process is as follows:
(1) first, randomly dividing all sample points into K categories according to the K value;
(2) calculating a new center point for each category, and reassigning all sample points according to the distance between each sample point and each category center point;
(3) repeating the second step until the division into K categories no longer changes.
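The three steps above can be sketched in pure Python as follows. This is an illustrative implementation of the described procedure under stated assumptions, not the patent's code: the stopping rule is taken to be "no assignment changes", each sample is flattened to a single feature vector, and `kmeans`, `max_iter` and `seed` are hypothetical names.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Step (1): random division into K categories (round-robin labels, shuffled,
    # so that every category starts non-empty).
    labels = [i % k for i in range(len(points))]
    random.Random(seed).shuffle(labels)
    centers = [None] * k
    for _ in range(max_iter):
        # Step (2), first half: recompute the center point of each category.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied category keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Step (2), second half: reassign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Step (3): stop once the division no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers
```

Because the initial division is random, the label numbers are arbitrary; only the grouping of points is meaningful.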
The silhouette coefficient (SC) is used as the evaluation standard of clustering performance on multivariate time series data to assess the effectiveness of the proposed method:
$$SC = \frac{1}{N}\sum_{i=1}^{N}\frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
where N is the number of samples, a(i) represents the average distance from sample i to the other samples in the same cluster, and b(i) represents the average distance from sample i to all samples in other clusters.
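A small Python sketch of the silhouette coefficient follows. It uses the usual per-sample score s(i) = (b(i) - a(i)) / max(a(i), b(i)) and, as is conventional, takes b(i) as the mean distance to the nearest other cluster; with two clusters this coincides with the description above. Scoring singleton clusters as 0 is also a convention assumed here, and `silhouette_coefficient` is a hypothetical name.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_coefficient(points, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all samples."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Mean distance from sample i to each cluster, excluding i itself.
        mean_dist = {}
        for c in clusters:
            d = [euclidean(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else None
        a = mean_dist[labels[i]]
        if a is None:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        b = min(v for c, v in mean_dist.items() if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


sc = silhouette_coefficient([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])
```

Values close to 1 indicate compact, well-separated clusters, which is how the SC values in Table 1 should be read.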
As shown in Table 1, the SC value of the proposed method is higher than that of the other existing methods under every distance metric, and it is highest (0.88), giving the best clustering performance, when the Euclidean distance is used as the clustering metric.
TABLE 1 Clustering performance (SC) of different clustering methods under different distance metrics
Distance metric | Hierarchical clustering | k-means | bi-kmeans | k-medoids | The invention |
---|---|---|---|---|---|
Euclidean | 0.65 | 0.56 | 0.69 | 0.63 | 0.88 |
Pearson | 0.41 | 0.49 | 0.65 | 0.59 | 0.72 |
LCSS | 0.55 | 0.52 | 0.67 | 0.53 | 0.70 |
DTW | 0.63 | 0.54 | 0.61 | 0.47 | 0.67 |
EDR | 0.57 | 0.58 | 0.59 | 0.51 | 0.66 |
Step S60: Analyzing the potential patterns of the multivariate time series data according to the clustering result
The data of all sample points in each category are averaged to obtain the mean value at each time point of the multivariate time series, giving a new multivariate mean time series per category; the potential patterns of the multivariate time series data are then studied on the basis of these mean time series.
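The per-category averaging can be sketched as follows. The data layout `samples[i][v][t]` (sample i, variable v, time point t) and the name `cluster_mean_series` are assumptions made for illustration.

```python
def cluster_mean_series(samples, labels):
    """Average samples[i][v][t] over the members of each category."""
    grouped = {}
    for sample, label in zip(samples, labels):
        grouped.setdefault(label, []).append(sample)
    means = {}
    for label, members in grouped.items():
        n_vars = len(members[0])
        means[label] = [
            # Mean at each time point of variable v across the category members.
            [sum(vals) / len(members) for vals in zip(*(m[v] for m in members))]
            for v in range(n_vars)
        ]
    return means


# Three samples, two variables (e.g. two drugs), two time points each.
samples = [
    [[0.0, 2.0], [1.0, 1.0]],
    [[2.0, 4.0], [3.0, 1.0]],
    [[10.0, 10.0], [0.0, 0.0]],
]
means = cluster_mean_series(samples, [0, 0, 1])
```

Each entry of `means` is one multivariate mean time series, which is the object plotted per category when the patterns are inspected.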
According to the above analysis method, the potential medication patterns for the two drugs in this case are shown in FIG. 4 and can be divided into 4 types: (1) Type A, i.e. ultra-low-dose administration: the dosage of both drugs is about 0, and these cases account for 32.3% of all cases; (2) Type B, i.e. low-dose administration: the dosage of OPI is less than 30% and the dosage of BZD is less than 2%, and these cases account for 57.5% of all cases; (3) Type C, i.e. low-dose BZD with ultra-high-dose OPI administration: the dosage of OPI lies in the interval (30, 50) and the dosage of BZD in the interval (13, 19), and these cases account for 5.0% of all cases; (4) Type D, i.e. high-dose administration: the dosage of OPI is more than 220% and the dosage of BZD is more than 5%, and these cases account for 5.2% of all cases.
The above describes only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.
Claims (5)
1. A multivariate time series data clustering method, characterized by comprising the following steps:
S10, preprocessing the multivariate time series data, wherein the preprocessing comprises validation and normalization of the data;
S20, constructing a sparse autoencoder, a deep learning unsupervised learning model, with the deep learning LSTM model as its basic unit, and performing feature extraction on the multivariate time series data to construct a new feature sequence;
S30, acquiring the clustering K value for the new feature sequences of the sample data;
S40, calculating the distance between the new feature sequences of different sample data based on the Euclidean distance;
S50, clustering the set of new feature sequences of the sample data;
S60, analyzing the potential patterns of the multivariate time series data according to the clustering result: averaging the data of all sample points in each category to obtain the mean value at each time point of the multivariate time series, thereby obtaining a new multivariate mean time series per category.
2. The method of claim 1, wherein the autoencoder model comprises an encoder and a decoder: the input data are encoded to extract new feature values, which are then decoded to obtain an output, and making the output equal to the input is the model optimization target. The sparse autoencoder model adds a sparse term to the optimization function to constrain the model parameters and thereby optimize the model. The training process is as follows:
(1) taking the data points of the time series one by one as the input of the LSTM unit in the encoder, and taking the sequence obtained after the last data point has been input as the new feature sequence of the sample data;
(2) taking the new feature sequence of the sample data as the input of the decoder, and training the sparse autoencoder model with a mean square error function plus a sparse term as the optimization function. The calculation formula of the optimization function is as follows:
3. The method of claim 1, wherein the clustering K value of the multivariate time series data is obtained by the following steps:
(1) selecting a specific K value within the value range (1-100), randomly generating, according to the uniform distribution principle, the same number of samples as the original samples within the three-dimensional region in which the samples lie, and clustering them with the K-means method to obtain W_k, with the calculation formula as follows:
(2) obtaining the value s_k, where n takes the value 100, with the calculation formula as follows:
where W_kn denotes W_k for a specific value of n.
(3) repeating the preceding steps for all K values in the value range and selecting the K value at which W_k drops fastest as the optimal number of clusters, with the calculation formula as follows:
4. The multivariate time series data clustering method of claim 1, wherein the validation comprises deleting data in which the proportion of missing values is greater than 80%, and the normalization comprises mapping the data of each medication type of a case into the interval (0,1) using min-max normalization, with the specific formula as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the data for that medication type.
5. The method of claim 1, wherein the new feature sequences of all sample data are clustered by the following steps:
(1) first, randomly dividing all sample points into K categories according to the K value;
(2) calculating a new center point for each category, and reassigning all sample points according to the distance between each sample point and each category center point;
(3) repeating the second step until the division into K categories no longer changes.
The silhouette coefficient (SC) is used as the evaluation standard of clustering performance on multivariate time series data to assess the effectiveness of the proposed method:
$$SC = \frac{1}{N}\sum_{i=1}^{N}\frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
where N is the number of samples, a(i) represents the average distance from sample i to the other samples in the same cluster, and b(i) represents the average distance from sample i to all samples in other clusters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010265442.4A CN111488924B (en) | 2020-04-07 | 2020-04-07 | Multivariable time sequence data clustering method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010265442.4A CN111488924B (en) | 2020-04-07 | 2020-04-07 | Multivariable time sequence data clustering method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111488924A (en) | 2020-08-04 |
CN111488924B (en) | 2024-04-26 |
Family
ID=71811758
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010265442.4A Active CN111488924B (en) | 2020-04-07 | 2020-04-07 | Multivariable time sequence data clustering method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111488924B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3188111A1 (en) * | 2015-12-28 | 2017-07-05 | Deutsche Telekom AG | A method for extracting latent context patterns from sensors |
CN109472321A (en) * | 2018-12-03 | 2019-03-15 | 北京工业大学 | A kind of prediction towards time series type surface water quality big data and assessment models construction method |
CN109636061A (en) * | 2018-12-25 | 2019-04-16 | 深圳市南山区人民医院 | Training method, device, equipment and the storage medium of medical insurance Fraud Prediction network |
CN109919189A (en) * | 2019-01-29 | 2019-06-21 | 华南理工大学 | A kind of depth K mean cluster method towards time series data |
CN110070145A (en) * | 2019-04-30 | 2019-07-30 | 天津开发区精诺瀚海数据科技有限公司 | LSTM wheel hub single-item energy consumption prediction based on increment cluster |
CN110459292A (en) * | 2019-07-02 | 2019-11-15 | 南京邮电大学 | A kind of risk management stage division based on cluster and PNN |
Non-Patent Citations (1)
Title |
---|
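张潇龙; 齐林海: "Research on distribution network station-area classification combining sparse denoising autoencoders and clustering algorithms" (融合稀疏降噪自编码与聚类算法的配电网台区分类研究), 电力信息与通信技术, no. 12, pages 15 - 23 * |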
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112345261A (en) * | 2020-10-29 | 2021-02-09 | 南京航空航天大学 | Aero-engine pumping system abnormity detection method based on improved DBSCAN algorithm |
CN112345261B (en) * | 2020-10-29 | 2022-05-03 | 南京航空航天大学 | Aero-engine pumping system abnormity detection method based on improved DBSCAN algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN111488924B (en) | 2024-04-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||