CN117349630A - Method and system for biochemical data analysis - Google Patents

Publication number
CN117349630A
Authority
CN
China
Prior art keywords
data
biochemical detection
detection parameter
biochemical
point
Legal status
Granted
Application number
CN202311641939.1A
Other languages
Chinese (zh)
Other versions
CN117349630B (en)
Inventor
周静茹
曹志勇
Current Assignee
Xingtai Medical College
Original Assignee
Xingtai Medical College

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G06F18/24 Classification techniques

Abstract

The invention relates to the technical field of digital data processing, and provides a method and a system for biochemical data analysis, wherein the method comprises the following steps: acquiring a time sequence of biochemical detection parameters; acquiring a neighbor data set and local density according to the time sequence of the biochemical detection parameter, acquiring a local density change sequence according to the neighbor data set and the local density, and acquiring a structure change index according to the local density change sequence; acquiring a candidate representative point set according to the structure change index, acquiring a distance distribution difference degree and a distance distribution sequence according to the candidate representative point set, acquiring a target representative point according to the distance distribution sequence, and acquiring a clustering result of the biochemical detection parameter based on the target representative point by using a CURE clustering algorithm; and obtaining an analysis result of the biochemical detection parameters according to the clustering result of the biochemical detection parameters. The invention avoids the phenomenon that the representative points in the CURE clustering algorithm are intensively distributed, and improves the accuracy of the clustering result of the biochemical detection parameters.

Description

Method and system for biochemical data analysis
Technical Field
The invention relates to the technical field of digital data processing, in particular to a method and a system for biochemical data analysis.
Background
Biochemical data generally refers to data reflecting metabolism, physiological function and disease state in a living body. Biochemical data analysis is widely used in the medical field: by analyzing biochemical index data from the human body, the metabolic condition of the body can be understood and the likelihood of developing certain diseases can be judged. Common biochemical data include biochemical detection parameters of human urine, and urine biochemical analysis is used to study the influence of kidney disease on the human body. At present, because biochemical data analysis is complex, the quality of the analysis is poor and it cannot provide sufficiently accurate scientific support for medical treatment.
To determine the level of an index in a patient, changes in the indexes of patients suffering from kidney disease have mainly been studied by statistical methods, but this approach is time-consuming, labor-intensive and prone to error. With the development of digital data processing, once the biochemical data from urine tests are available, the condition of the indexes in urine can be obtained rapidly by cluster analysis. For example, the CURE hierarchical clustering algorithm can be used to cluster complex biochemical data. However, because the detection indexes differ between patients with different degrees of kidney disease, the choice of cluster representative points strongly affects the result of the cluster analysis, and the accuracy of the cluster analysis tends to be poor.
Disclosure of Invention
The invention provides a method and a system for biochemical data analysis, which aim to solve the problem of poor accuracy of cluster analysis, and the adopted technical scheme is as follows:
in a first aspect, one embodiment of the present invention is a method for biochemical data analysis, the method comprising the steps of:
acquiring a time sequence of biochemical detection parameters;
acquiring a local density and a neighbor data set of each data point in the time sequence of each biochemical detection parameter according to the time sequence of each biochemical detection parameter, and acquiring a local density change sequence of each data point in the time sequence of each biochemical detection parameter according to the local density and the neighbor data set of each data point in the time sequence of each biochemical detection parameter; acquiring a density difference index of each data point in the time sequence of each biochemical detection parameter according to the local density change sequence of each data point in the time sequence of each biochemical detection parameter; obtaining the structural change index of each data point in the time sequence of each biochemical detection parameter according to the density difference index of each data point in the time sequence of each biochemical detection parameter;
acquiring a candidate representative point set of each biochemical detection parameter according to the structural change index of the data points in the time sequence of each biochemical detection parameter; obtaining the distance distribution difference degree of each candidate representative point in the candidate representative point set of each biochemical detection parameter according to the candidate representative point set of each biochemical detection parameter; acquiring target representative points of each biochemical detection parameter according to the distance distribution difference degree of the candidate representative points in the candidate representative point set of each biochemical detection parameter, and acquiring a clustering result of each biochemical detection parameter based on the target representative points of each biochemical detection parameter by adopting a CURE clustering algorithm;
and acquiring an abnormal cluster of each biochemical detection parameter according to the clustering result of each biochemical detection parameter, and acquiring an analysis result of the biochemical detection parameter according to the abnormal cluster of the biochemical detection parameter.
Preferably, the method for obtaining the local density and the neighbor data set of each data point in the time sequence of each biochemical detection parameter according to the time sequence of each biochemical detection parameter, and obtaining the local density change sequence of each data point in the time sequence of each biochemical detection parameter according to the local density and the neighbor data set of each data point in the time sequence of each biochemical detection parameter comprises the following steps:
for the time sequence of each biochemical detection parameter, taking a set formed by all data points in the time sequence of the biochemical detection parameter as input of a DPC density peak clustering algorithm, and taking output of the DPC density peak clustering algorithm as local density of each data point in the time sequence of the biochemical detection parameter;
for each data point in the time sequence of each biochemical detection parameter, taking the data point as a central data point, and taking a set formed by all data points within a preset cut-off distance range of the central data point as a neighbor data set of the central data point;
for the time sequence of each biochemical detection parameter, a sequence formed by the local densities of all data points in the neighbor data set of each data point according to the ascending order of the numerical value is used as the local density change sequence of each data point.
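The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: each reading is treated as a scalar, the local density uses the cutoff kernel commonly associated with density peak clustering (count of other points closer than the cutoff), and the centre point is excluded from its own neighbor set. The function names and the sample values are hypothetical.

```python
from typing import List


def local_densities(values: List[float], cutoff: float) -> List[int]:
    """Cutoff-kernel density: count of other points strictly closer than `cutoff`."""
    return [
        sum(1 for k, w in enumerate(values) if k != j and abs(v - w) < cutoff)
        for j, v in enumerate(values)
    ]


def neighbor_sets(values: List[float], cutoff: float) -> List[List[int]]:
    """Indices of the points within `cutoff` of each centre point (centre excluded; an assumption)."""
    return [
        [k for k, w in enumerate(values) if k != j and abs(v - w) < cutoff]
        for j, v in enumerate(values)
    ]


def density_change_sequences(values: List[float], cutoff: float) -> List[List[int]]:
    """For each point, the local densities of its neighbours sorted in ascending order."""
    rho = local_densities(values, cutoff)
    return [sorted(rho[k] for k in nbrs) for nbrs in neighbor_sets(values, cutoff)]


series = [1.0, 1.1, 1.2, 5.0]  # made-up readings of one parameter
print(density_change_sequences(series, cutoff=0.15))  # [[2], [1, 1], [2], []]
```

An isolated reading such as 5.0 has an empty neighbor set and therefore an empty local density change sequence; how such points should be handled is not stated in the text.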
Preferably, the method for obtaining the density difference index of each data point in the time sequence of each biochemical detection parameter according to the local density change sequence of each data point in the time sequence of each biochemical detection parameter comprises the following steps:
in the formula (not reproduced in this text), the density difference index of the jth data point in the time series of the ith biochemical detection parameter is defined from the following quantities: an exponential function with the natural constant as its base; the number of data points in the neighbor data set of the jth data point in the time series of the ith biochemical detection parameter; a distance function between sequences; the local density change sequence of the jth data point in the time series of the ith biochemical detection parameter; the local density change sequence of the cth data point in the neighbor data set of that data point; and the maximum and minimum values of the data in the local density change sequence of the jth data point.
Preferably, the method for obtaining the structural change index of each data point in the time sequence of each biochemical detection parameter according to the density difference index of each data point in the time sequence of each biochemical detection parameter comprises the following steps:
acquiring the local data proximity of each data point in the time sequence of each biochemical detection parameter according to the neighbor data set of each data point in the time sequence of each biochemical detection parameter;
for each data point in the time sequence of each biochemical detection parameter, taking the value of the exponential function with the natural constant as base and the negative of the local data proximity of the data point as exponent as a first product factor, and taking the product of the first product factor and the density difference index of the data point as the structural change index of the data point.
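The verbal definition of the structural change index can be written compactly. The symbols $S_{i,j}$, $P_{i,j}$ and $D_{i,j}$ are hypothetical names introduced here for illustration, because the original formula is not preserved in this text; they stand for the structural change index, the local data proximity and the density difference index of the jth data point in the time sequence of the ith biochemical detection parameter.

```latex
S_{i,j} = e^{-P_{i,j}} \cdot D_{i,j}
```

Because $e^{-P_{i,j}}$ decreases as $P_{i,j}$ grows, a smaller local data proximity or a larger density difference index gives a larger structural change index.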
Preferably, the method for obtaining the local data proximity of each data point in the time sequence of each biochemical detection parameter according to the neighbor data set of each data point in the time sequence of each biochemical detection parameter comprises the following steps:
in the formula (not reproduced in this text), the local data proximity of the jth data point in the time series of the ith biochemical detection parameter is defined from the following quantities: the coefficient of variation of the data in the neighbor data set of the jth data point in the time series of the ith biochemical detection parameter; the number of data points in that neighbor data set; and the local densities of the dth and bth data points in that neighbor data set.
Preferably, the method for obtaining the candidate representative point set of each biochemical detection parameter according to the structural change index of the data points in the time sequence of each biochemical detection parameter comprises the following steps:
taking a data set consisting of structural change indexes of all data points in the time sequence of each biochemical detection parameter as a structural data set of each biochemical detection parameter, taking all data in the structural data set of each biochemical detection parameter as the input of a k-means clustering algorithm, and taking the output of the k-means clustering algorithm as the clustering result of the structural data set of each biochemical detection parameter;
taking each cluster in the clustering result of the structural data set of each biochemical detection parameter as each data distribution category, acquiring the average value of all data in each data distribution category, and taking the average value as the average level of each data distribution category;
for each biochemical detection parameter, taking the data point closest to the average level of the data distribution category in each data distribution category as each candidate representative point, and taking a set formed by all candidate representative points as a candidate representative point set of the biochemical detection parameter.
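The candidate selection described above can be sketched as follows. This is an illustration only: it uses a plain Lloyd-style k-means on scalar structural change indexes with a deterministic initialisation chosen for reproducibility (an assumption, not something the text specifies), and the function names and sample values are hypothetical. The detailed description presets k to 30; the toy example uses k = 2.

```python
from typing import List


def kmeans_1d(data: List[float], k: int, max_iter: int = 100) -> List[int]:
    """Plain k-means on scalars; returns one cluster label per item."""
    ordered = sorted(data)
    n = len(ordered)
    # Deterministic start: k values spread evenly over the sorted data (an assumption).
    centers = [ordered[(2 * c + 1) * n // (2 * k)] for c in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def candidate_representatives(structure_index: List[float], k: int) -> List[int]:
    """Index of the point closest to the mean of each non-empty cluster."""
    labels = kmeans_1d(structure_index, k)
    candidates = []
    for c in range(k):
        members = [j for j, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        mean = sum(structure_index[j] for j in members) / len(members)
        candidates.append(min(members, key=lambda j: abs(structure_index[j] - mean)))
    return candidates


indexes = [0.1, 0.2, 0.15, 5.0, 5.2, 5.1]  # made-up structural change indexes
print(candidate_representatives(indexes, k=2))  # [2, 5]
```

Each returned value is the position of a data point in the time sequence, so the candidate set can be mapped back to the original readings.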
Preferably, the specific method for obtaining the distance distribution difference degree of each candidate representative point in the candidate representative point set of each biochemical detection parameter according to the candidate representative point set of each biochemical detection parameter comprises the following steps:
in the formula (not reproduced in this text), the distance distribution difference degree of the gth candidate representative point in the candidate representative point set of the ith biochemical detection parameter is defined from the following quantities: the number of candidate representative points in the candidate representative point set of the ith biochemical detection parameter; the Euclidean distance function; and the positions in the data space of the gth and hth candidate representative points in the candidate representative point set of the ith biochemical detection parameter.
Preferably, the method for obtaining the target representative point of each biochemical detection parameter according to the distance distribution difference degree of the candidate representative point in the candidate representative point set of each biochemical detection parameter and obtaining the clustering result of each biochemical detection parameter based on the target representative point of each biochemical detection parameter by adopting the CURE clustering algorithm comprises the following steps:
for each candidate representative point set of the biochemical detection parameters, taking a sequence formed by the distance distribution difference degrees of all candidate representative points in the candidate representative point set according to the ascending order of the numerical values as a distance distribution sequence of the biochemical detection parameters, and taking all candidate representative points corresponding to a preset number of distance distribution difference degrees at the tail end of the distance distribution sequence of the biochemical detection parameters as target representative points of the biochemical detection parameters;
for each biochemical detection parameter, taking all data in the time sequence of the biochemical detection parameter as input of a CURE clustering algorithm, taking a target representative point of the biochemical detection parameter as a representative point selected when all data points in the time sequence of the biochemical detection parameter are clustered, and taking output of the CURE clustering algorithm as a clustering result of the biochemical detection parameter.
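The target selection step (sort the distance distribution difference degrees in ascending order and keep the candidates at the tail of the sequence) can be sketched as below. The sketch takes the difference degrees as given, since their formula is not preserved in this text; the function name, the sample values and the preset count are hypothetical. The subsequent CURE clustering itself is not shown.

```python
from typing import List


def select_target_representatives(
    candidates: List[int], distance_diff: List[float], preset_count: int
) -> List[int]:
    """Keep the candidates with the `preset_count` largest difference degrees."""
    # Positions of the candidates ordered by ascending difference degree.
    order = sorted(range(len(candidates)), key=lambda g: distance_diff[g])
    tail = order[-preset_count:] if preset_count > 0 else []
    return [candidates[g] for g in tail]


candidates = [10, 20, 30, 40]      # made-up data point indexes
degrees = [0.3, 0.9, 0.1, 0.5]     # made-up distance distribution difference degrees
print(select_target_representatives(candidates, degrees, preset_count=2))  # [40, 20]
```

Taking the tail of the ascending sequence favors the candidates with the largest difference degrees, which matches the stated goal of avoiding representative points that are concentrated in one region.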
Preferably, the method for obtaining the abnormal cluster of each biochemical detection parameter according to the clustering result of each biochemical detection parameter and obtaining the analysis result of the biochemical detection parameter according to the abnormal cluster of the biochemical detection parameter comprises the following steps:
for the clustering result of each biochemical detection parameter, calculating the element mean value of all elements in each clustering cluster in the clustering result, and obtaining a clustering cluster corresponding to the maximum element mean value and the minimum element mean value;
the biochemical detection parameters comprise urine specific gravity, urine β2-microglobulin, urine N-acetyl-β-D-glucosaminidase (NAG) and urine cystatin C, wherein the cluster corresponding to the smallest element mean in the clustering result of the urine specific gravity is used as a first abnormal cluster, and the clusters corresponding to the largest element mean in the clustering results of the time sequences of the urine β2-microglobulin, the urine N-acetyl-β-D-glucosaminidase and the urine cystatin C are used as a second abnormal cluster, a third abnormal cluster and a fourth abnormal cluster, respectively;
taking each element in the abnormal cluster of the biochemical detection parameters as each abnormal element, wherein each abnormal element represents the content of the biochemical detection parameters in urine of each patient, and taking the abnormal cluster of the biochemical detection parameters as an analysis result of the biochemical detection parameters.
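The abnormal cluster rule above reduces to picking the cluster with the smallest element mean for urine specific gravity and the cluster with the largest element mean for the other three parameters. A minimal sketch, with a hypothetical function name and made-up cluster contents:

```python
from typing import List


def abnormal_cluster(clusters: List[List[float]], low_is_abnormal: bool) -> List[float]:
    """Return the cluster with the smallest (or largest) element mean."""
    means = [sum(c) / len(c) for c in clusters]
    pick = min if low_is_abnormal else max
    return clusters[pick(range(len(clusters)), key=lambda c: means[c])]


# Made-up clustering result for urine specific gravity: lowest mean is abnormal.
gravity_clusters = [[1.005, 1.008], [1.015, 1.020], [1.030, 1.032]]
print(abnormal_cluster(gravity_clusters, low_is_abnormal=True))  # [1.005, 1.008]

# Made-up clustering result for another parameter: highest mean is abnormal.
other_clusters = [[0.1, 0.2], [0.9, 1.1]]
print(abnormal_cluster(other_clusters, low_is_abnormal=False))  # [0.9, 1.1]
```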
In a second aspect, an embodiment of the present invention further provides a system for biochemical data analysis, including a memory, a processor, and a computer program stored in the memory and running on the processor, where the processor implements the steps of any one of the methods described above when executing the computer program.
The beneficial effects of the invention are as follows: the local density of the time-series data of each biochemical detection parameter is obtained with a density peak clustering algorithm; a structural change index is obtained from the way the local density varies within the local neighborhood of each data point; candidate representative points are obtained from the structural change index with a k-means clustering algorithm; the distance distribution difference degree and the distance distribution sequence are obtained from the Euclidean distances between the candidate representative points; and the representative points are selected adaptively according to the distance distribution sequence. Because the representative points are selected adaptively from both the structural change in the local neighborhood of each data point and the Euclidean distances between candidates, the selected representative points are not concentrated in one region when the biochemical detection parameters are clustered, which improves the accuracy of the clustering result of the biochemical detection parameters.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings can be obtained according to these drawings without inventive faculty for a person skilled in the art.
FIG. 1 is a flow chart of a method for biochemical data analysis according to an embodiment of the present invention;
FIG. 2 is a flowchart showing an embodiment of a method for biochemical data analysis according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a flowchart of a method for biochemical data analysis according to an embodiment of the present invention is shown, the method includes the following steps:
step S001, obtaining a time sequence of biochemical detection parameters.
The invention mainly analyzes the urine biochemical indexes of patients to obtain a cluster analysis result for each biochemical detection parameter. Measurement data of the biochemical detection parameters in the urine of 500 recent patients are collected from the biochemical data platform of a hospital. The biochemical detection parameters comprise urine specific gravity, urine β2-microglobulin, urine N-acetyl-β-D-glucosaminidase (NAG) and urine cystatin C. For each biochemical detection parameter, the sequence formed by the biochemical detection parameter data in ascending order of patient visit time is taken as the time series of that biochemical detection parameter.
At this point, the time series of each biochemical detection parameter has been obtained.
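Step S001 amounts to ordering each parameter's readings by visit time. A minimal sketch, assuming each record is a dictionary with a hypothetical `visit_time` field and one field per parameter; the field names and values are made up for illustration:

```python
from datetime import datetime
from typing import Dict, List


def build_time_series(records: List[Dict], parameter: str) -> List[float]:
    """Readings of one parameter, ordered by ascending visit time."""
    ordered = sorted(records, key=lambda r: r["visit_time"])
    return [r[parameter] for r in ordered]


records = [  # made-up records
    {"visit_time": datetime(2023, 5, 3), "urine_specific_gravity": 1.012},
    {"visit_time": datetime(2023, 5, 1), "urine_specific_gravity": 1.020},
    {"visit_time": datetime(2023, 5, 2), "urine_specific_gravity": 1.008},
]
print(build_time_series(records, "urine_specific_gravity"))  # [1.02, 1.008, 1.012]
```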
Step S002, obtaining a neighbor data set and local density according to the time sequence of the biochemical detection parameters, obtaining a local density change sequence and a density difference index according to the neighbor data set and the local density, and obtaining a structure change index according to the density difference index.
The traditional CURE clustering algorithm considers only the distances between data when selecting representative points. Some of the representative points selected in this way are not representative of parts of the data, that is, they do not reflect the properties of the data well, so the accuracy of the resulting clustering is poor. The way representative points are selected therefore needs to be improved in order to obtain accurate clustering results and improve the accuracy of biochemical data analysis. A flow chart of an embodiment of the present invention is shown in fig. 2.
Based on the above analysis, different degrees of kidney damage produce different data conditions, so the distribution of the biochemical data must be analyzed in order to obtain representative points accurately. The time series of each biochemical detection parameter is written as:
$X_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,n}\}$
where $X_i$ denotes the time series of the ith biochemical detection parameter, and $x_{i,1}$ and $x_{i,n}$ denote the contents of the ith biochemical detection parameter in the urine of the 1st and nth patients, respectively.
For the time series of each biochemical detection parameter, the DPC density peak clustering algorithm is used. The preset cutoff distance is chosen so that the average number of data points lying within the cutoff distance of each data point accounts for a preset proportion of all data points in the data set. The set formed by all time-series data of each biochemical detection parameter is used as the input of the DPC density peak clustering algorithm, and the output of the algorithm is used as the local density of each data point in the time series of each biochemical detection parameter. The DPC density peak clustering algorithm is a known technique and is not described further here.
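One common way to choose such a cutoff distance is to sort all pairwise distances and take the value at a given fraction of that list, so that roughly that fraction of all pairs lies within the cutoff. The sketch below follows that heuristic as an assumption; it is not confirmed to be the procedure used here. The proportion is deliberately left as a required argument because its value is not preserved in this text, and the function name and sample values are hypothetical.

```python
from itertools import combinations
from typing import List


def choose_cutoff(values: List[float], fraction: float) -> float:
    """Distance below which roughly `fraction` of all point pairs fall."""
    dists = sorted(abs(a - b) for a, b in combinations(values, 2))
    if not dists:
        raise ValueError("need at least two data points")
    # Position of the chosen quantile in the sorted distance list, clamped to valid range.
    pos = min(len(dists) - 1, max(0, round(fraction * len(dists)) - 1))
    return dists[pos]


values = [0.0, 1.0, 3.0, 6.0]  # made-up readings
print(choose_cutoff(values, fraction=0.5))  # 3.0
```

In the example the six pairwise distances are 1, 2, 3, 3, 5 and 6, so a fraction of 0.5 selects the third smallest distance.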
The change of the local density of the data point neighbor region can reflect the data distribution structure in the data point neighbor region to a certain extent, and in order to select an effective representative point by using the CURE clustering algorithm, the spatial distribution of the data needs to be analyzed.
Specifically, for each time series of biochemical detection parameters, each data point is taken as each center data point, and a set of data points within a truncated distance range of each center data point is taken as a neighbor data set of each center data point. Further, a sequence in which the local densities of all data points in the neighbor data set of each data point are formed in order of ascending numerical value is taken as a local density change sequence of each data point.
Calculating a density difference index for each data point in the time series of each biochemical detection parameter:
In the formula (not reproduced in this text), the density difference index of the jth data point in the time series of the ith biochemical detection parameter is defined from the following quantities: an exponential function with the natural constant as its base; the number of data points in the neighbor data set of the jth data point in the time series of the ith biochemical detection parameter; a distance function between sequences; the local density change sequence of the jth data point in the time series of the ith biochemical detection parameter; the local density change sequence of the cth data point in the neighbor data set of that data point; and the maximum and minimum values of the data in the local density change sequence of the jth data point.
The larger the distance between the local density change sequence of the jth data point in the time series of the ith biochemical detection parameter and the local density change sequence of the cth data point in its neighbor data set, and the larger the difference between the maximum and minimum values of the data in the local density change sequence of the jth data point, the lower the similarity between the local density change sequences and the larger the local density change. That is, the larger the density difference in the local region of the data point, the larger the density difference index.
Further, the structural change index of each data point in the time series of each biochemical detection parameter is calculated:
In the formulas (not reproduced in this text), the local data proximity of the jth data point in the time series of the ith biochemical detection parameter is defined from the following quantities: the coefficient of variation of the data in the neighbor data set of the jth data point; the number of data points in that neighbor data set; and the local densities of the dth and bth data points in that neighbor data set. The structural change index of the jth data point in the time series of the ith biochemical detection parameter is defined from its local data proximity, its density difference index, and an exponential function with the natural constant as its base.
The larger the differences between the local densities of the dth and bth data points in the neighbor data set of the jth data point in the time series of the ith biochemical detection parameter, and the larger the coefficient of variation of the data in that neighbor data set, the larger the density variation in the neighborhood of the data point, the larger the variation in its local spatial distribution, and the smaller the local data proximity. In addition, the smaller the local data proximity of the jth data point and the larger its density difference index, the larger the data change in the local region of the data point, that is, the larger the change in data structure, and the larger the structural change index.
Thus, the structural change index of each data point in the time sequence of each biochemical detection parameter is obtained.
Step S003, a candidate representative point set is obtained according to the structure change index, a distance distribution sequence is obtained according to the candidate representative point set, a target representative point is obtained according to the distance distribution sequence, and a clustering result of biochemical detection parameters is obtained based on the target representative point by using a CURE clustering algorithm.
The structure change index reflects the discrete condition of the local spatial distribution of the data to a certain extent, and the representative points selected in the CURE clustering algorithm need to have the representativeness of the data characteristics, so that the consideration of the data distribution characteristics of the data point neighbor areas is beneficial to selecting effective representative points, and a better clustering result is obtained.
Further, a data set composed of the structural change indexes of all data points in the time series of each biochemical detection parameter is taken as a structural data set of each biochemical detection parameter. In order to select a proper representative point, using a k-means clustering algorithm, taking all data in the structural data set of each biochemical detection parameter as the input of the k-means clustering algorithm, presetting the empirical value of the classification parameter k to be 30, measuring the distance to be Euclidean distance, and taking the output of the k-means clustering algorithm as the clustering result of the structural data set of each biochemical detection parameter. And for the clustering result of the structural data set of each biochemical detection parameter, acquiring 30 clustering clusters in the clustering result, taking each clustering cluster as each data distribution category, calculating the data average value of each data distribution category, and taking the data average value of each data distribution category as the average level of each data distribution category.
Based on the above analysis, in order to obtain effective representative points for the data of each biochemical detection parameter, the representative points are selected with the data structure information taken into account. Specifically, for each biochemical detection parameter, the data point whose element value is closest to the average level of its data distribution category is taken as the candidate representative point of that category, and the set of all candidate representative points is taken as the candidate representative point set of the biochemical detection parameter. The different candidate representative points in this set can, to some extent, represent data points with different data distribution characteristics.
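The selection of candidate representative points described above may be illustrated as follows. This is a sketch under stated assumptions: the function name `candidate_representatives` and the input layout (one list of data point indices per data distribution category) are hypothetical and chosen here only for illustration.

```python
from statistics import mean


def candidate_representatives(indices_by_category, structure_index):
    """Return one candidate representative point (as a data point index) per
    data distribution category: the point whose structural change index is
    closest to the category's average level.

    indices_by_category: list of lists of data point indices (assumed layout).
    structure_index: structural change index of every data point.
    """
    candidates = []
    for members in indices_by_category:
        # Average level of the category = mean of its structural change indices.
        level = mean(structure_index[j] for j in members)
        # Candidate = member whose index value lies closest to that level.
        candidates.append(min(members, key=lambda j: abs(structure_index[j] - level)))
    return candidates
```

With 30 data distribution categories, as in this embodiment, the returned list contains 30 candidate representative points.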
Further, the selected representative points should not be concentrated in one region, because concentrated representative points do not adequately represent the entire data set. Therefore, taking into account the Euclidean distances between candidate representative points, the distance distribution difference degree of each candidate representative point in the candidate representative point set of each biochemical detection parameter is calculated:
where $D_{i,g}$ denotes the distance distribution difference degree of the $g$-th candidate representative point in the candidate representative point set of the $i$-th biochemical detection parameter; $N_i$ denotes the number of candidate representative points in the candidate representative point set of the $i$-th biochemical detection parameter; $\operatorname{dist}(\cdot,\cdot)$ denotes the Euclidean distance function; and $P_{i,g}$ and $P_{i,h}$ denote the positions in the data space of the $g$-th and $h$-th candidate representative points, respectively, in the candidate representative point set of the $i$-th biochemical detection parameter.
The smaller the Euclidean distance in the data space between the g-th and h-th candidate representative points in the candidate representative point set of the i-th biochemical detection parameter, the closer the g-th candidate representative point lies to the remaining candidate representative points, the more likely the representative points are to be concentrated, and the smaller the distance distribution difference degree.
Further, for the candidate representative point set of each biochemical detection parameter, the sequence formed by arranging the distance distribution difference degrees of all candidate representative points in ascending order is taken as the distance distribution sequence.
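The computation of the distance distribution difference degree and of the distance distribution sequence may be sketched as follows. Note the assumption: the difference degree is taken here as the mean Euclidean distance from a candidate to all other candidates, which agrees with the verbal description (smaller distances give a smaller degree) but is not confirmed as the exact formula of the invention. Because dividing by a different constant does not change the ordering, the resulting sequence order does not depend on the normalization. The function names are hypothetical.

```python
import math


def distance_distribution_difference(points):
    """ASSUMED form: mean Euclidean distance from each candidate
    representative point to every other candidate representative point.

    points: list of coordinate tuples (positions in the data space).
    """
    n = len(points)
    return [
        sum(math.dist(p, q) for h, q in enumerate(points) if h != g) / (n - 1)
        for g, p in enumerate(points)
    ]


def target_representatives(points, n_targets):
    """Sort candidates by ascending difference degree (the distance
    distribution sequence) and keep the last n_targets of them."""
    degrees = distance_distribution_difference(points)
    order = sorted(range(len(points)), key=lambda g: degrees[g])
    return order[-n_targets:]
```

In this embodiment the candidate set contains 30 points and `n_targets` would be 20, so the 20 candidates that lie farthest, on average, from the others are retained.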
For the time series data of each biochemical detection parameter, all of the time series data are input to the CURE clustering algorithm, with the number of clusters preset to 15, the number of representative points preset to 20, and the shrink (contraction) factor set to an empirical value of 0.9; the output of the CURE clustering algorithm is taken as the clustering result of the time series data of that parameter. Note that, because the number of representative points in the CURE clustering algorithm is preset to 20, the target representative points are the candidate representative points corresponding to the last 20 elements of the distance distribution sequence of each parameter, that is, the 20 candidates with the largest distance distribution difference degrees. Thus, when the time series data of each biochemical detection parameter are clustered, 20 distinct target representative points are obtained.
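For orientation, the role of the shrink (contraction) factor can be illustrated with the shrinking step of the standard CURE algorithm, in which each representative point is moved toward the cluster centroid by a fraction alpha. The sketch below assumes that the factor of 0.9 used here corresponds to that alpha; the function name `shrink_representatives` is hypothetical, and the sketch covers only this single step, not the full CURE merging procedure.

```python
def shrink_representatives(rep_points, cluster_points, alpha=0.9):
    """Move each representative point toward the cluster centroid by the
    fraction alpha: p' = p + alpha * (centroid - p).

    ASSUMPTION: alpha plays the role of the shrink factor of standard CURE.
    rep_points, cluster_points: lists of coordinate tuples of equal dimension.
    """
    dims = len(cluster_points[0])
    count = len(cluster_points)
    # Centroid = coordinate-wise mean of all points in the cluster.
    centroid = [sum(p[d] for p in cluster_points) / count for d in range(dims)]
    return [
        tuple(p[d] + alpha * (centroid[d] - p[d]) for d in range(dims))
        for p in rep_points
    ]
```

A factor close to 1 pulls the representative points strongly toward the centroid, which damps the influence of outlying points on the distances between clusters.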
In the traditional CURE clustering algorithm, representative points are selected based on distance alone, which easily leads to representative points that are concentrated in one region and therefore represent the biochemical detection parameter data set poorly. The present method takes into account the structural change of the local neighborhood of each data point and combines it with the distances between data points to obtain more representative points, and thus a more accurate clustering result.
At this point, the clustering result of each biochemical detection parameter has been obtained.
Step S004: perform anomaly analysis according to the clustering result of the biochemical detection parameters to obtain an analysis result of the biochemical detection parameters.
Clustering results are obtained for urine specific gravity, urinary microglobulin, urinary N-acetyl-β-D-glucosaminidase (NAG), and urinary cystatin C, respectively. For each biochemical detection parameter, the element mean of each cluster in the clustering result is calculated, and the cluster with the maximum element mean and the cluster with the minimum element mean are obtained.
Medical research shows that a high content of urinary microglobulin, urinary N-acetyl-β-D-glucosaminidase, or urinary cystatin C in the human body is abnormal and indicates that kidney disease is likely; a low urine specific gravity is likewise abnormal and indicates that kidney disease is more likely.
Therefore, the cluster with the minimum element mean in the clustering result of the urine specific gravity time series data is taken as the first abnormal cluster, and the clusters with the maximum element mean in the clustering results of the time series data of urinary microglobulin, urinary N-acetyl-β-D-glucosaminidase, and urinary cystatin C are taken as the second, third, and fourth abnormal clusters, respectively. The abnormal clusters reflect, to some extent, abnormal kidney conditions in patients: the patient corresponding to each element in an abnormal cluster is more likely to suffer from kidney disease. The abnormal cluster of each biochemical detection parameter is therefore taken as the analysis result of that parameter.
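The selection of abnormal clusters described above may be sketched as follows. The function name `abnormal_cluster` and the flag `low_is_abnormal` are hypothetical names introduced for illustration; clusters are assumed to be given as lists of measured values.

```python
from statistics import mean


def abnormal_cluster(clusters, low_is_abnormal):
    """Return the index of the abnormal cluster of one biochemical
    detection parameter.

    clusters: list of clusters, each a list of measured values.
    low_is_abnormal: True for urine specific gravity (minimum element mean
    is abnormal), False for the other three parameters (maximum element
    mean is abnormal).
    """
    means = [mean(c) for c in clusters]
    pick = min if low_is_abnormal else max
    return pick(range(len(clusters)), key=lambda c: means[c])
```

For example, the function would be called with `low_is_abnormal=True` on the clustering result of urine specific gravity and with `low_is_abnormal=False` on the clustering results of the other three parameters.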
Based on the same inventive concept as the method, an embodiment of the present invention further provides a system for biochemical data analysis. After the clustering result of the biochemical detection parameters is obtained, it is transmitted to an anomaly analysis module, which obtains the abnormal cluster in the clustering result of each biochemical detection parameter by the above method and takes that abnormal cluster as the analysis result of the biochemical detection parameter.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The above describes only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent substitution, or improvement made within the principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for biochemical data analysis, the method comprising the steps of:
acquiring a time sequence of biochemical detection parameters;
acquiring a local density and a neighbor data set of each data point in the time sequence of each biochemical detection parameter according to the time sequence of each biochemical detection parameter, and acquiring a local density change sequence of each data point in the time sequence of each biochemical detection parameter according to the local density and the neighbor data set of each data point in the time sequence of each biochemical detection parameter; acquiring a density difference index of each data point in the time sequence of each biochemical detection parameter according to the local density change sequence of each data point in the time sequence of each biochemical detection parameter; obtaining the structural change index of each data point in the time sequence of each biochemical detection parameter according to the density difference index of each data point in the time sequence of each biochemical detection parameter;
acquiring a candidate representative point set of each biochemical detection parameter according to the structural change index of the data points in the time sequence of each biochemical detection parameter; obtaining the distance distribution difference degree of each candidate representative point in the candidate representative point set of each biochemical detection parameter according to the candidate representative point set of each biochemical detection parameter; acquiring target representative points of each biochemical detection parameter according to the distance distribution difference degree of the candidate representative points in the candidate representative point set of each biochemical detection parameter, and acquiring a clustering result of each biochemical detection parameter based on the target representative points of each biochemical detection parameter by adopting a CURE clustering algorithm;
and acquiring an abnormal cluster of each biochemical detection parameter according to the clustering result of each biochemical detection parameter, and acquiring an analysis result of the biochemical detection parameter according to the abnormal cluster of the biochemical detection parameter.
2. The method for analyzing biochemical data according to claim 1, wherein the method for obtaining the local density and the neighbor data set of each data point in the time sequence of each biochemical detection parameter according to the time sequence of each biochemical detection parameter, and obtaining the local density change sequence of each data point in the time sequence of each biochemical detection parameter according to the local density and the neighbor data set of each data point in the time sequence of each biochemical detection parameter comprises:
for the time sequence of each biochemical detection parameter, taking a set formed by all data points in the time sequence of the biochemical detection parameter as input of a DPC density peak clustering algorithm, and taking output of the DPC density peak clustering algorithm as local density of each data point in the time sequence of the biochemical detection parameter;
for each data point in the time sequence of each biochemical detection parameter, taking the data point as a central data point, and taking a set formed by all data points within a preset cut-off distance range of the central data point as a neighbor data set of the central data point;
for the time sequence of each biochemical detection parameter, a sequence formed by the local densities of all data points in the neighbor data set of each data point according to the ascending order of the numerical value is used as the local density change sequence of each data point.
3. The method for analyzing biochemical data according to claim 1, wherein the method for obtaining the density difference index of each data point in the time series of each biochemical detection parameter from the local density variation sequence of each data point in the time series of each biochemical detection parameter comprises:
where $M_{i,j}$ denotes the density difference index of the $j$-th data point in the time series of the $i$-th biochemical detection parameter; $\exp(\cdot)$ denotes the exponential function with the natural constant as its base; $n_{i,j}$ denotes the number of data in the neighbor data set of the $j$-th data point in the time series of the $i$-th biochemical detection parameter; $d(\cdot,\cdot)$ denotes the distance function; $Q_{i,j}$ denotes the local density change sequence of the $j$-th data point in the time series of the $i$-th biochemical detection parameter; $Q_{i,j,c}$ denotes the local density change sequence of the $c$-th data point in the neighbor data set of the $j$-th data point in the time series of the $i$-th biochemical detection parameter; and $Q_{i,j}^{\max}$ and $Q_{i,j}^{\min}$ denote the maximum value and the minimum value, respectively, of the data in the local density change sequence of the $j$-th data point in the time series of the $i$-th biochemical detection parameter.
4. The method for analyzing biochemical data according to claim 1, wherein the method for obtaining the structural change index of each data point in the time series of each biochemical test parameter according to the density difference index of each data point in the time series of each biochemical test parameter comprises:
acquiring the local data proximity of each data point in the time sequence of each biochemical detection parameter according to the neighbor data set of each data point in the time sequence of each biochemical detection parameter;
for each data point in the time sequence of each biochemical detection parameter, taking the result of a negative exponential mapping, with the natural constant as the base and the negative of the local data proximity of the data point as the exponent, as a first product factor, and taking the product of the first product factor and the density difference index of the data point as the structural change index of the data point.
5. The method for analyzing biochemical data according to claim 4, wherein the method for obtaining the local data proximity of each data point in the time series of each biochemical detection parameter from the neighboring data set of each data point in the time series of each biochemical detection parameter comprises:
where $L_{i,j}$ denotes the local data proximity of the $j$-th data point in the time series of the $i$-th biochemical detection parameter; $CV_{i,j}$ denotes the coefficient of variation of the data in the neighbor data set of the $j$-th data point in the time series of the $i$-th biochemical detection parameter; $n_{i,j}$ denotes the number of data in the neighbor data set of the $j$-th data point in the time series of the $i$-th biochemical detection parameter; and $\rho_{i,j,d}$ and $\rho_{i,j,b}$ denote the local densities of the $d$-th and $b$-th data points, respectively, in the neighbor data set of the $j$-th data point in the time series of the $i$-th biochemical detection parameter.
6. The method for analyzing biochemical data according to claim 1, wherein the method for obtaining the candidate representative point set of each biochemical detection parameter according to the structural change index of the data points in the time series of each biochemical detection parameter is as follows:
taking a data set consisting of structural change indexes of all data points in the time sequence of each biochemical detection parameter as a structural data set of each biochemical detection parameter, taking all data in the structural data set of each biochemical detection parameter as the input of a k-means clustering algorithm, and taking the output of the k-means clustering algorithm as the clustering result of the structural data set of each biochemical detection parameter;
taking each cluster in the clustering result of the structural data set of each biochemical detection parameter as each data distribution category, acquiring the average value of all data in each data distribution category, and taking the average value as the average level of each data distribution category;
for each biochemical detection parameter, taking the data point closest to the average level of the data distribution category in each data distribution category as each candidate representative point, and taking a set formed by all candidate representative points as a candidate representative point set of the biochemical detection parameter.
7. The method for analyzing biochemical data according to claim 1, wherein the specific method for obtaining the distance distribution difference degree of each candidate representative point in the candidate representative point set of each biochemical detection parameter according to the candidate representative point set of each biochemical detection parameter is as follows:
where $D_{i,g}$ denotes the distance distribution difference degree of the $g$-th candidate representative point in the candidate representative point set of the $i$-th biochemical detection parameter; $N_i$ denotes the number of candidate representative points in the candidate representative point set of the $i$-th biochemical detection parameter; $\operatorname{dist}(\cdot,\cdot)$ denotes the Euclidean distance function; and $P_{i,g}$ and $P_{i,h}$ denote the positions in the data space of the $g$-th and $h$-th candidate representative points, respectively, in the candidate representative point set of the $i$-th biochemical detection parameter.
8. The method for analyzing biochemical data according to claim 1, wherein the method for obtaining the target representative point of each biochemical detection parameter according to the difference of the distance distribution of the candidate representative points in the candidate representative point set of each biochemical detection parameter, and obtaining the clustering result of each biochemical detection parameter based on the target representative point of each biochemical detection parameter by using the CURE clustering algorithm comprises the following steps:
for each candidate representative point set of the biochemical detection parameters, taking a sequence formed by the distance distribution difference degrees of all candidate representative points in the candidate representative point set according to the ascending order of the numerical values as a distance distribution sequence of the biochemical detection parameters, and taking all candidate representative points corresponding to a preset number of distance distribution difference degrees at the tail end of the distance distribution sequence of the biochemical detection parameters as target representative points of the biochemical detection parameters;
for each biochemical detection parameter, taking all data in the time sequence of the biochemical detection parameter as input of a CURE clustering algorithm, taking a target representative point of the biochemical detection parameter as a representative point selected when all data points in the time sequence of the biochemical detection parameter are clustered, and taking output of the CURE clustering algorithm as a clustering result of the biochemical detection parameter.
9. The method for analyzing biochemical data according to claim 1, wherein the method for acquiring the abnormal cluster of each biochemical detection parameter according to the clustering result of each biochemical detection parameter and acquiring the analysis result of the biochemical detection parameter according to the abnormal cluster of the biochemical detection parameter comprises the steps of:
for the clustering result of each biochemical detection parameter, calculating the element mean of all elements in each cluster of the clustering result, and obtaining the cluster with the maximum element mean and the cluster with the minimum element mean;
the biochemical detection parameters comprise urine specific gravity, urinary microglobulin, urinary N-acetyl-β-D-glucosaminidase, and urinary cystatin C; taking the cluster with the minimum element mean in the clustering result of urine specific gravity as a first abnormal cluster, and taking the clusters with the maximum element mean in the clustering results of the time sequences of urinary microglobulin, urinary N-acetyl-β-D-glucosaminidase, and urinary cystatin C as a second abnormal cluster, a third abnormal cluster, and a fourth abnormal cluster, respectively;
taking each element in the abnormal cluster of the biochemical detection parameters as each abnormal element, wherein each abnormal element represents the content of the biochemical detection parameters in urine of each patient, and taking the abnormal cluster of the biochemical detection parameters as an analysis result of the biochemical detection parameters.
10. A system for biochemical data analysis comprising a memory, a processor and a computer program stored in the memory and running on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-9 when executing the computer program.
CN202311641939.1A 2023-12-04 2023-12-04 Method and system for biochemical data analysis Active CN117349630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311641939.1A CN117349630B (en) 2023-12-04 2023-12-04 Method and system for biochemical data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311641939.1A CN117349630B (en) 2023-12-04 2023-12-04 Method and system for biochemical data analysis

Publications (2)

Publication Number Publication Date
CN117349630A true CN117349630A (en) 2024-01-05
CN117349630B CN117349630B (en) 2024-02-23

Family

ID=89356049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311641939.1A Active CN117349630B (en) 2023-12-04 2023-12-04 Method and system for biochemical data analysis

Country Status (1)

Country Link
CN (1) CN117349630B (en)

Also Published As

Publication number Publication date
CN117349630B (en) 2024-02-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant