CN110097920B - Metabonomics data missing value filling method based on neighbor stability - Google Patents
- Publication number: CN110097920B
- Authority: CN (China)
- Prior art keywords: sample, metabolite, content, samples, deletion
- Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N27/00—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
- G01N27/62—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Abstract
The invention provides a neighbor-stability-based method for filling missing values in metabolomics data and belongs to the technical field of metabolomics data analysis. The core of the method is to measure, for each sample containing a missing metabolite value, how stable the content of the corresponding metabolite is across that sample's k nearest neighbor samples, and then to fill different types of missing values with different strategies based on the stable nearest neighbors. The method fills metabolomics data containing missing values well and is significant for subsequent data analysis, metabolic marker selection, and the like.
Description
Technical Field
The invention belongs to the technical field of metabolomics data analysis and relates to a neighbor-stability-based method for filling missing values in metabolomics data. The method takes into account the missingness type of each missing metabolite value, the similarity between samples, and the stability of the neighbor samples.
Background
Metabolomics searches for metabolites associated with physiological and pathological changes by systematically studying, qualitatively and quantitatively, the small-molecule metabolites in organisms. Methods for the qualitative and quantitative determination of metabolites include mass spectrometry and nuclear magnetic resonance spectroscopy. Metabolomics data obtained by mass spectrometry usually contain many missing values. These missing values have two main sources. First, random errors introduced during data acquisition or instrument operation cause the content of certain metabolites in a sample to go undetected; this type of missingness is called random missing. Second, the content of a metabolite in a sample is below the detection limit of the mass spectrometer and is therefore not detected; this type of missingness is called non-random missing. For example, the concentration of bile acid metabolites in humans varies widely, and because of the instrument's detection limit, a bile acid metabolite may be a missing value in many samples of the resulting metabolomics data. However, conventional data analysis methods are suitable only for complete data matrices without missing values. If the metabolites or samples containing missing values are simply deleted, much valuable information is lost. Filling the missing data with a simple and efficient method is therefore an important task in metabolomics data analysis and is significant for subsequent data analysis, metabolic marker selection, and the like.
Some methods for handling missing values in metabolomics data fill the missing value of a metabolite with zero, with the minimum content of that metabolite, with half of the minimum value, with the median, and so on. These methods are simple but tend to have a large impact on subsequent data analysis. The k-nearest-neighbor missing value filling algorithm is a common method for handling missing values in metabolomics data. It assumes that the greater the similarity between samples, the smaller the deviation between the contents of their metabolites. If the content of metabolite m is missing in sample s, the algorithm finds the k samples nearest to s according to a similarity measure (a neighbor whose content of metabolite m is itself missing is replaced by the next-nearest neighbor) and then fills the missing content of metabolite m in sample s with a weighted average of the content of metabolite m over those k nearest neighbors. The k-nearest-neighbor algorithm handles randomly missing data in metabolomics data reasonably well, but its filling performance is still not ideal.
The present invention provides a neighbor-stability-based method for filling missing values in metabolomics data. The method determines the k nearest neighbor samples of each sample containing a missing metabolite value according to the Euclidean distances between samples, evaluates the stability of those nearest neighbors, and fills the different types of missing values with corresponding strategies based on the stable nearest neighbors.
Disclosure of Invention
The object of the present invention is to fill missing values in metabolomics data. The core of the method is to measure, for each sample containing a missing metabolite value, the stability of the content of the corresponding metabolite across that sample's k nearest neighbor samples, and to fill different types of missing values with different strategies based on the stable nearest neighbors.
To achieve the above object, the invention adopts the following technical solution:
A neighbor-stability-based method for filling missing values in metabolomics data comprises the following steps:
Metabolic components in a biological sample are detected by mass spectrometry to obtain spectral data of the metabolic components; the spectral data are analyzed using preprocessing operations such as peak identification, peak matching, and normalization; the content of each metabolite in the sample is determined; and the metabolomics data are thereby obtained.
Let $n$ denote the number of samples in the metabolomics data and $p$ the number of metabolites per sample, and let $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, $1 \le i \le n$, be the vector of the contents of the $p$ metabolites in the $i$-th sample. If the content of metabolite $m$ ($1 \le m \le p$) is missing in sample $x_i$, that is, $x_{im}$ is a missing value, then $x_{im}$ is filled by the following steps:
(1) Calculate the Euclidean distance $d(x_i, x_j)$ between sample $x_i$ and each other sample $x_j$ ($1 \le i \ne j \le n$). The formula is as follows:

$$d(x_i, x_j) = \sqrt{\frac{\sum_{l=1}^{p} o_{il}\, o_{jl}\, (x_{il} - x_{jl})^2}{\sum_{l=1}^{p} o_{il}\, o_{jl}}} \tag{1}$$

where $o_{il}$ indicates whether the content of the $l$-th metabolite ($1 \le l \le p$) of sample $x_i$ is missing: $o_{il} = 0$ when the content of the $l$-th metabolite of $x_i$ is missing, and $o_{il} = 1$ otherwise. $\sum_{l=1}^{p} o_{il}\, o_{jl}$ is the number of metabolites whose content is missing in neither $x_i$ nor $x_j$. The smaller the distance $d(x_i, x_j)$, the higher the similarity between $x_i$ and $x_j$. The $k$ samples most similar to $x_i$ according to this distance form the sample set $S_k$;
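The distance and neighbor selection of step (1) can be sketched as below. This is a minimal illustration, not the patent's implementation: it assumes formula (1) averages the squared differences over the metabolites observed in both samples, represents a missing value as `None`, and treats a pair with no co-observed metabolite as infinitely far apart. The function names `masked_distance` and `k_nearest` are hypothetical.

```python
import math

def masked_distance(xi, xj):
    """Distance over the metabolites observed in both samples.

    Missing values are represented as None (an assumption of this sketch).
    """
    shared = [(a, b) for a, b in zip(xi, xj) if a is not None and b is not None]
    if not shared:
        # No co-observed metabolite: treat the pair as infinitely far apart (assumption).
        return math.inf
    return math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))

def k_nearest(data, i, k):
    """Indices of the k samples closest to sample i (the set S_k)."""
    others = [j for j in range(len(data)) if j != i]
    return sorted(others, key=lambda j: masked_distance(data[i], data[j]))[:k]
```

Dividing by the number of co-observed metabolites keeps distances comparable between pairs that share different numbers of observed values.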
(2) Determine the missingness type of the metabolite.
Calculate the Pearson correlation coefficients between metabolite $m$ and the other metabolites. Take the metabolite aux_m that is most strongly correlated with $m$ as the reference metabolite of $m$. The missingness type of metabolite $m$ is then determined from the content distribution of the reference metabolite aux_m, as follows:
Let $S_{miss} = \{x_j \mid x_{jm} \text{ is a missing value},\ 1 \le j \le n\}$ denote the set of samples in the metabolomics data in which metabolite $m$ is a missing value. Let $S_{obs} = \{x_j \mid x_{jm} \text{ is not a missing value},\ 1 \le j \le n\}$ denote the set of samples in which metabolite $m$ is not a missing value. Calculate the average content of the reference metabolite aux_m over $S_{miss}$ and over $S_{obs}$, denoted $\mu_{miss}$ and $\mu_{obs}$ respectively. When metabolite $m$ is positively correlated with aux_m and $\mu_{miss} < \mu_{obs}$, the missingness type of $m$ is non-random missing and the method proceeds to step (3); otherwise the missingness type of $m$ is random missing and the method proceeds to step (4). When metabolite $m$ is negatively correlated with aux_m and $\mu_{miss} > \mu_{obs}$, the missingness type of $m$ is non-random missing and the method proceeds to step (3); otherwise the missingness type of $m$ is random missing and the method proceeds to step (4).
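The decision rule of step (2) can be sketched as follows. This is an illustrative sketch under stated assumptions: missing values are `None`, the correlation with each candidate reference metabolite is computed only over samples where both values are observed, and samples in which the reference metabolite itself is missing are skipped when averaging. The names `pearson` and `missingness_type` are hypothetical.

```python
import math
from statistics import mean

def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mu_u, mu_v = mean(u), mean(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)

def missingness_type(data, m):
    """Classify the missing values of metabolite column m as 'random' or 'non-random'."""
    best_col, best_r = None, 0.0
    for c in range(len(data[0])):
        if c == m:
            continue
        # Correlate only over samples where both metabolites are observed (assumption).
        pairs = [(row[m], row[c]) for row in data
                 if row[m] is not None and row[c] is not None]
        if len(pairs) < 2:
            continue
        r = pearson([a for a, _ in pairs], [b for _, b in pairs])
        if abs(r) > abs(best_r):
            best_col, best_r = c, r
    if best_col is None:
        raise ValueError("no usable reference metabolite")
    # Average content of the reference metabolite over S_miss and S_obs.
    aux_miss = [row[best_col] for row in data
                if row[m] is None and row[best_col] is not None]
    aux_obs = [row[best_col] for row in data
               if row[m] is not None and row[best_col] is not None]
    mu_miss, mu_obs = mean(aux_miss), mean(aux_obs)
    if best_r > 0:
        non_random = mu_miss < mu_obs   # low reference content suggests low m
    else:
        non_random = mu_miss > mu_obs   # high reference content suggests low m
    return "non-random" if non_random else "random"
```

The intuition is that a value lost to the detection limit should be small, so a positively correlated reference metabolite should also be low in the samples where $m$ is missing.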
(3) Handling of the non-random missing type.
When the content of metabolite $m$ is missing in a sample of $S_k$, the missing content of $m$ in that sample is temporarily filled with the minimum content of metabolite $m$ over all samples in the metabolomics data. This step accounts for the fact that metabolomics data contain non-random missing values. Non-random missing values arise because the metabolite content is below the detection limit of the instrument and is therefore not detected. Temporarily filling the missing $m$ values of the neighbor samples with the minimum content of metabolite $m$ better matches the characteristics of non-random missing data.
(4) Handling of the random missing type.
When the content of metabolite $m$ is missing in a sample of $S_k$, the missing content of $m$ in that sample is temporarily filled with the average content of metabolite $m$ over the samples of $S_k$ in which it is not missing. When the content of metabolite $m$ is missing in every sample of $S_k$, so that no such average exists, the missing content of $m$ in the samples of $S_k$ is temporarily filled with the minimum content of metabolite $m$ over the remaining samples in the metabolomics data.
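The two temporary-filling strategies of steps (3) and (4) can be sketched in a single helper. This is a minimal sketch under assumptions: missing values are `None`, the fallback in step (4) is read as applying when every neighbor lacks metabolite $m$, and the function name `temp_fill_neighbors` and its parameters are hypothetical.

```python
def temp_fill_neighbors(data, neighbors, m, kind):
    """Return the metabolite-m contents of the neighbors, with gaps temporarily filled.

    data      -- list of samples; missing values are None (assumption of this sketch)
    neighbors -- indices of the samples in S_k
    kind      -- 'random' or 'non-random', the missingness type of metabolite m
    """
    observed_all = [row[m] for row in data if row[m] is not None]
    observed_nb = [data[j][m] for j in neighbors if data[j][m] is not None]
    if kind == "random" and observed_nb:
        # Step (4): mean of the neighbors in which m is observed.
        fill = sum(observed_nb) / len(observed_nb)
    else:
        # Step (3), or step (4) when no neighbor has m observed:
        # minimum observed content of m in the data set.
        fill = min(observed_all)
    return [data[j][m] if data[j][m] is not None else fill for j in neighbors]
```

Only the neighbors are filled here; the target value $x_{im}$ itself is computed later from the stable neighbors.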
(5) Determine the stable neighbor samples.
The stable neighbor samples in $S_k$ are determined from the degree of fluctuation of the content of metabolite $m$ over the samples in $S_k$. Calculate the mean $\mu$ and standard deviation $\sigma$ of the content of metabolite $m$ over the samples in $S_k$. When the content of metabolite $m$ of a sample in $S_k$ lies outside the range $[\mu - \sigma, \mu + \sigma]$, that sample is deleted from $S_k$, which yields the stable neighbor sample set $S'_k$. Because the metabolite content should differ little between neighboring samples, removing the samples outside $[\mu - \sigma, \mu + \sigma]$ reduces the influence of outliers and keeps the computed fill value stable and reliable.
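The filtering of step (5) can be sketched as below. This is an illustrative sketch; the function name `stable_neighbors` is hypothetical, and the use of the sample standard deviation (divisor $k - 1$, as computed by `statistics.stdev`) is an assumption, chosen because it reproduces $\sigma = 2$ for the values {3, 9, 5, 7, 6, 6} in the worked example of the Detailed Description.

```python
from statistics import mean, stdev

def stable_neighbors(neighbors, values):
    """Keep the neighbors whose metabolite-m content lies within [mu - sigma, mu + sigma].

    neighbors -- identifiers of the samples in S_k
    values    -- their (temporarily filled) contents of metabolite m, same order
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumption, see lead-in)
    return [(s, v) for s, v in zip(neighbors, values)
            if mu - sigma <= v <= mu + sigma]
```

Samples exactly on the interval boundary are kept, since the text removes only samples outside the range.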
(6) Compute the weighted average of the content of metabolite $m$ over the samples in $S'_k$. The value $x_{im}$ computed by formula (3) fills the missing content of metabolite $m$ in sample $x_i$. The formulas are as follows:

$$w(x_i, s_j) = \frac{1 / d(x_i, s_j)}{\sum_{l=1}^{k'} 1 / d(x_i, s_l)} \tag{2}$$

$$x_{im} = \sum_{l=1}^{k'} w(x_i, s_l)\, s_{lm} \tag{3}$$

where $k' = |S'_k|$ is the number of samples in the set $S'_k$; $s_j$ and $s_l$ ($1 \le j, l \le k'$) are samples in $S'_k$; $w(x_i, s_j)$ is the weight of sample $s_j$ when computing $x_{im}$; $d(x_i, s_j)$ is the Euclidean distance between $x_i$ and $s_j$ calculated by formula (1); and $s_{lm}$ is the content of metabolite $m$ of sample $s_l$. The content of $m$ in each neighbor sample is weighted according to that neighbor's distance from $x_i$. The smaller the distance between a sample in $S'_k$ and $x_i$, the larger the weight of its content of metabolite $m$ and the greater its contribution to the computed $x_{im}$.
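The weighting of step (6) can be sketched as follows. The weight formula used here, inverse distance normalized to sum to one, is a reconstruction rather than a quotation: it is assumed because it reproduces the weights 0.29, 0.25, 0.25, 0.21 reported in the worked example. The function names are hypothetical, and all distances are assumed to be strictly positive.

```python
def inverse_distance_weights(distances):
    """Normalized inverse-distance weights; assumes every distance is > 0."""
    inv = [1.0 / d for d in distances]
    total = sum(inv)
    return [v / total for v in inv]

def weighted_fill(distances, contents):
    """Fill value: weighted average of the stable neighbors' metabolite-m contents."""
    weights = inverse_distance_weights(distances)
    return sum(w * c for w, c in zip(weights, contents))
```

A neighbor at distance zero would need special handling (for example, using its value directly), which this sketch does not attempt.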
The beneficial effects of the invention are as follows:
The method is used to fill missing data in metabolomics. It takes the missingness type of each metabolite into account and fills missing values with different strategies according to that type; at the same time, it screens the neighbor samples and filters out unstable ones. The method fills metabolomics data containing missing values well and is significant for subsequent data analysis, metabolic marker selection, and the like.
Detailed Description
An embodiment of the method is described below on simulated data in conjunction with the technical solution. The simulated data serve only to illustrate the invention and make it easier to understand; they do not limit the invention.
Table 1 shows the simulated data. $x_i$ denotes the $i$-th sample; the data contain 10 samples; $m_1$ to $m_5$ denote the 5 metabolites in the data; and NaN denotes a missing value.
Table 1: Simulated data
The data in Table 1 contain 4 missing values, namely $x_{13}$, $x_{52}$, $x_{84}$, and $x_{93}$. The procedure is described below using $x_{13}$ as an example.
(1) Using formula (1), calculate the distance $d$ between sample $x_1$ and each other sample: $d(x_1, x_2) = 1.94$, $d(x_1, x_3) = 1.73$, $d(x_1, x_4) = 3.39$, $d(x_1, x_5) = 3.46$, $d(x_1, x_6) = 4.12$, $d(x_1, x_7) = 2.29$, $d(x_1, x_8) = 2.71$, $d(x_1, x_9) = 2.74$, $d(x_1, x_{10}) = 3.16$. With $k = 6$, the set of the 6 samples most similar to $x_1$ is $S_k = \{x_3, x_2, x_7, x_8, x_9, x_{10}\}$.
(2) Determine the missingness type of metabolite $m_3$. Calculate the Pearson correlation coefficients between $m_3$ and $m_1$, $m_2$, $m_4$, $m_5$. The calculation shows that $m_4$ has the strongest correlation with $m_3$ and that the correlation is positive, so $m_4$ is selected as the reference metabolite of $m_3$. The set of samples in which metabolite $m_3$ is missing is $S_{miss} = \{x_1, x_9\}$, and the set of samples in which $m_3$ is not missing is $S_{obs} = \{x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_{10}\}$. The mean of the reference metabolite $m_4$ over $S_{miss}$ is $\mu_{miss} = 7$, and its mean over $S_{obs}$ is $\mu_{obs} = 4.86$. Since $\mu_{miss} \ge \mu_{obs}$, the missingness type is random missing, and the method proceeds to step (4).
(3) Among the 6 nearest neighbors of $x_1$, metabolite $m_3$ of sample $x_9$ is a missing value, so $x_{93}$ is temporarily filled with 6, the average of metabolite $m_3$ over $x_3$, $x_2$, $x_7$, $x_8$, and $x_{10}$.
(4) The $m_3$ values of the samples in $S_k$ are {3, 9, 5, 7, 6, 6}, with mean $\mu = 6$ and standard deviation $\sigma = 2$ (computed with divisor $k - 1$), so the stability interval is [4, 8]. The values $x_{33}$ and $x_{23}$ lie outside the stability interval, so samples $x_3$ and $x_2$ are deleted from $S_k$, giving $S'_k = \{x_7, x_8, x_9, x_{10}\}$.
(5) Calculate the weights of the samples in $S'_k$. From formula (2), the weights of the samples in $S'_k$ are $w(x_1, x_7) = 0.29$, $w(x_1, x_8) = 0.25$, $w(x_1, x_9) = 0.25$, $w(x_1, x_{10}) = 0.21$. Using formula (3), the weighted average is $x_{13} = w(x_1, x_7) \cdot x_{73} + w(x_1, x_8) \cdot x_{83} + w(x_1, x_9) \cdot x_{93} + w(x_1, x_{10}) \cdot x_{10,3} = 5.95$. The value 5.95 is then used as the fill value for the missing value $x_{13}$.
The missing values $x_{52}$, $x_{84}$, and $x_{93}$ are each filled using steps (1) to (6) in the same way.
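The neighbor selection in this step can be checked directly from the listed distances. The snippet below is a small sketch that only sorts the distances quoted above; the variable names are illustrative.

```python
# Distances from x1 to the other samples, as listed in the worked example.
distances = {
    "x2": 1.94, "x3": 1.73, "x4": 3.39, "x5": 3.46, "x6": 4.12,
    "x7": 2.29, "x8": 2.71, "x9": 2.74, "x10": 3.16,
}
k = 6

# S_k: the k samples with the smallest distance to x1, nearest first.
S_k = sorted(distances, key=distances.get)[:k]
print(S_k)  # ['x3', 'x2', 'x7', 'x8', 'x9', 'x10']
```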
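The numbers in the last two steps of the example can be reproduced with a short script. This is a sketch under two assumptions that happen to match the reported figures: the standard deviation is the sample standard deviation, and the weights are normalized inverse distances. The $m_3$ values and distances are those quoted in the example, with $x_{93}$ already temporarily filled with 6.

```python
from statistics import mean, stdev

# m3 contents of the samples in S_k (x9 holds its temporary fill value 6).
m3 = {"x3": 3, "x2": 9, "x7": 5, "x8": 7, "x9": 6, "x10": 6}
# Distances from x1, as listed in step (1) of the example.
dist = {"x3": 1.73, "x2": 1.94, "x7": 2.29, "x8": 2.71, "x9": 2.74, "x10": 3.16}

values = list(m3.values())
mu, sigma = mean(values), stdev(values)   # 6 and 2.0, giving the interval [4, 8]

# Stable neighbors: samples whose m3 content lies inside [mu - sigma, mu + sigma].
stable = [s for s, v in m3.items() if mu - sigma <= v <= mu + sigma]

# Normalized inverse-distance weights (assumed form of formula (2)).
inv = {s: 1.0 / dist[s] for s in stable}
total = sum(inv.values())
weights = {s: inv[s] / total for s in stable}

# Weighted average of the stable neighbors' m3 contents (formula (3)).
x13 = sum(weights[s] * m3[s] for s in stable)

print(stable)                                            # ['x7', 'x8', 'x9', 'x10']
print({s: round(w, 2) for s, w in weights.items()})      # 0.29, 0.25, 0.25, 0.21
print(round(x13, 2))                                     # 5.95
```

Multiplying the rounded weights by hand gives 5.96; the reported 5.95 follows from the unrounded weights.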
Claims (1)
1. A neighbor-stability-based method for filling missing values in metabolomics data, characterized by comprising the following steps:
detecting metabolic components in a biological sample by mass spectrometry to obtain spectral data of the metabolic components, analyzing the spectral data using the preprocessing operations of peak identification, peak matching, and normalization, determining the content of each metabolite in the sample, and thereby obtaining the metabolomics data;
$n$ denotes the number of samples in the metabolomics data, $p$ denotes the number of metabolites per sample, and $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, $1 \le i \le n$, denotes the vector of the contents of the $p$ metabolites in the $i$-th sample; when the content of metabolite $m$, $1 \le m \le p$, is missing in sample $x_i$ of the metabolomics data, that is, $x_{im}$ is a missing value, the missing value $x_{im}$ is filled by the following steps:
(1) calculating the Euclidean distance $d(x_i, x_j)$ between sample $x_i$ and each other sample $x_j$, $1 \le i \ne j \le n$, by the following formula:

$$d(x_i, x_j) = \sqrt{\frac{\sum_{l=1}^{p} o_{il}\, o_{jl}\, (x_{il} - x_{jl})^2}{\sum_{l=1}^{p} o_{il}\, o_{jl}}} \tag{1}$$

where $o_{il}$ indicates whether the content of the $l$-th metabolite of sample $x_i$ is missing, $1 \le l \le p$: $o_{il} = 0$ when the content of the $l$-th metabolite of $x_i$ is missing, and $o_{il} = 1$ otherwise; $\sum_{l=1}^{p} o_{il}\, o_{jl}$ is the number of metabolites whose content is missing in neither $x_i$ nor $x_j$; the smaller the distance $d(x_i, x_j)$, the higher the similarity between $x_i$ and $x_j$; the $k$ samples most similar to $x_i$ according to this distance form the sample set $S_k$;
(2) determining the missingness type of the metabolite
calculating the Pearson correlation coefficients between metabolite $m$ and the other metabolites; taking the metabolite aux_m that is most strongly correlated with $m$ as the reference metabolite of $m$; and determining the missingness type of metabolite $m$ from the content distribution of the reference metabolite aux_m, as follows:
let $S_{miss} = \{x_j \mid x_{jm} \text{ is a missing value},\ 1 \le j \le n\}$ denote the set of samples in the metabolomics data in which metabolite $m$ is a missing value; let $S_{obs} = \{x_j \mid x_{jm} \text{ is not a missing value},\ 1 \le j \le n\}$ denote the set of samples in which metabolite $m$ is not a missing value; calculate the average content of the reference metabolite aux_m over $S_{miss}$ and over $S_{obs}$, denoted $\mu_{miss}$ and $\mu_{obs}$ respectively; when metabolite $m$ is positively correlated with aux_m and $\mu_{miss} \le \mu_{obs}$, the missingness type of $m$ is non-random missing, and the method proceeds to step (3); when metabolite $m$ is positively correlated with aux_m and $\mu_{miss} > \mu_{obs}$, the missingness type of $m$ is random missing, and the method proceeds to step (4); when metabolite $m$ is negatively correlated with aux_m and $\mu_{miss} > \mu_{obs}$, the missingness type of $m$ is non-random missing, and the method proceeds to step (3); when metabolite $m$ is negatively correlated with aux_m and $\mu_{miss} \le \mu_{obs}$, the missingness type of $m$ is random missing, and the method proceeds to step (4);
(3) handling of the non-random missing type
when the content of metabolite $m$ is missing in a sample of $S_k$, temporarily filling the missing content of $m$ in that sample with the minimum content of metabolite $m$ over all samples in the metabolomics data; proceeding to step (5);
(4) handling of the random missing type
when the content of metabolite $m$ is missing in a sample of $S_k$, temporarily filling the missing content of $m$ in that sample with the average content of metabolite $m$ over the samples of $S_k$ in which it is not missing; when the content of metabolite $m$ is missing in every sample of $S_k$, temporarily filling the missing content of $m$ in the samples of $S_k$ with the minimum content of metabolite $m$ over the remaining samples in the metabolomics data; proceeding to step (5);
(5) determining the stable neighbor samples
determining the stable neighbor samples in $S_k$ from the degree of fluctuation of the content of metabolite $m$ over the samples in $S_k$; calculating the mean $\mu$ and standard deviation $\sigma$ of the content of metabolite $m$ over the samples in $S_k$; when the content of metabolite $m$ of a sample in $S_k$ lies outside the range $[\mu - \sigma, \mu + \sigma]$, deleting that sample from $S_k$, which yields the stable neighbor sample set $S'_k$;
(6) computing the weighted average of the content of metabolite $m$ over the samples in $S'_k$; the value $x_{im}$ computed by formula (3) fills the missing content of metabolite $m$ in sample $x_i$; the formulas are as follows:

$$w(x_i, s_j) = \frac{1 / d(x_i, s_j)}{\sum_{l=1}^{k'} 1 / d(x_i, s_l)} \tag{2}$$

$$x_{im} = \sum_{l=1}^{k'} w(x_i, s_l)\, s_{lm} \tag{3}$$

where $k' = |S'_k|$ is the number of samples in the set $S'_k$; $s_j$ and $s_l$ ($1 \le j, l \le k'$) are samples in $S'_k$; $w(x_i, s_j)$ is the weight of sample $s_j$ when computing $x_{im}$; $d(x_i, s_j)$ is the Euclidean distance between $x_i$ and $s_j$ calculated by formula (1); $s_{lm}$ is the content of metabolite $m$ of sample $s_l$; the content of $m$ in each neighbor sample is weighted according to that neighbor's distance from $x_i$; the smaller the distance between a sample in $S'_k$ and $x_i$, the larger the weight of its content of metabolite $m$ and the greater its contribution to the computed $x_{im}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910284004.XA CN110097920B (en) | 2019-04-10 | 2019-04-10 | Metabonomics data missing value filling method based on neighbor stability |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110097920A CN110097920A (en) | 2019-08-06 |
CN110097920B true CN110097920B (en) | 2022-09-20 |
Family
ID=67444595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910284004.XA Expired - Fee Related CN110097920B (en) | 2019-04-10 | 2019-04-10 | Metabonomics data missing value filling method based on neighbor stability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110097920B (en) |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220920 |