CN110097920B - Metabonomics data missing value filling method based on neighbor stability - Google Patents

Info

Publication number
CN110097920B
CN110097920B CN201910284004.XA CN201910284004A CN110097920B CN 110097920 B CN110097920 B CN 110097920B CN 201910284004 A CN201910284004 A CN 201910284004A CN 110097920 B CN110097920 B CN 110097920B
Authority
CN
China
Prior art keywords
sample
metabolite
content
samples
deletion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910284004.XA
Other languages
Chinese (zh)
Other versions
CN110097920A (en)
Inventor
罗霄
李超
林晓惠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201910284004.XA
Publication of CN110097920A
Application granted
Publication of CN110097920B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Physiology (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Electrochemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention provides a metabonomics data missing value filling method based on neighbor stability, and belongs to the technical field of metabonomics data analysis. The core of the method is to measure how stable the content of the corresponding metabolite is across the k nearest neighbor samples of a sample that contains a missing metabolite and then, based on the stable nearest neighbor samples, to fill different types of missing values with different strategies. The method fills metabonomics data containing missing values effectively, which is important for subsequent data analysis, metabolic marker selection and the like.

Description

Metabonomics data missing value filling method based on neighbor stability
Technical Field
The invention belongs to the technical field of metabonomics data analysis and relates to a metabonomics data missing value filling method based on neighbor stability, that is, a filling method that takes into account the missing type of each metabolite's missing values, the similarity between samples, and the stability of the neighbor samples.
Background
Metabolomics searches for metabolites associated with physiological and pathological changes through systematic qualitative and quantitative study of the molecular metabolites in organisms. Methods for the qualitative and quantitative determination of metabolites include mass spectrometry and nuclear magnetic resonance spectroscopy. Metabolomics data obtained by mass spectrometry usually contain many missing values. These missing values arise mainly in two ways. First, random errors introduced during data acquisition or instrument operation cause the content of certain metabolites in a sample to go undetected; this type of missing data is called random missing. Second, the content of a metabolite in a sample is below the detection limit of the mass spectrometer and is therefore not detected; this type of missing data is called non-random missing. For example, the concentration of the metabolite bile acid varies widely among humans, and because of the instrument's detection limit, bile acid may be a missing value in many samples of the resulting metabolomics data. Conventional data analysis methods, however, are only suited to complete data matrices without missing values. If the metabolites or samples that contain missing values are simply deleted, much valuable information is lost. Filling the missing data with a simple and efficient method is therefore an important task in metabonomics data analysis, and is important for subsequent data analysis, metabolic marker selection and the like.
Some methods for handling missing values in metabolomics data fill a metabolite's missing values with zero, with the minimum content of that metabolite, with half of the minimum, or with the median. These methods are simple but tend to have a large impact on subsequent data analysis. The k-nearest-neighbor missing value filling algorithm is a common method for processing missing values in metabonomics data. It assumes that the more similar two samples are, the smaller the deviation between their metabolite contents. If the content of metabolite m of sample s is missing, the k-nearest-neighbor algorithm finds the k nearest neighbor samples of s according to a similarity measure (a neighbor whose content of metabolite m is also missing is replaced by the next nearest neighbor), and then fills the missing content of metabolite m of sample s with a weighted average of the content of metabolite m over those k nearest neighbor samples. The k-nearest-neighbor algorithm handles random missing data in metabonomics data fairly well, but its filling performance on non-random missing data is not satisfactory enough.
The present invention provides a metabonomics data missing value filling method based on neighbor stability. The method determines the k nearest neighbor samples of a sample containing a missing metabolite according to the Euclidean distances between samples, evaluates the stability of those nearest neighbor samples, and fills different types of missing values with corresponding strategies based on the stable nearest neighbor samples.
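As a rough illustration of the simple constant-fill strategies mentioned above, the following Python sketch fills one metabolite's column with a single value. The function name `simple_fill`, the strategy labels, and the use of `None` to mark a missing value are assumptions made for this example and do not come from the patent.

```python
import statistics

def simple_fill(column, strategy):
    """Fill missing entries (None) of one metabolite column with a constant.

    strategy is one of "zero", "min", "half_min", "median" (hypothetical labels).
    Assumes the column has at least one observed value.
    """
    observed = [v for v in column if v is not None]
    fills = {
        "zero": 0.0,
        "min": min(observed),
        "half_min": min(observed) / 2,
        "median": statistics.median(observed),
    }
    value = fills[strategy]
    return [value if v is None else v for v in column]
```

Every missing entry of the column receives the same value, which is why such fills can distort downstream statistics when many values are missing.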
Disclosure of Invention
The object of the present invention is to fill the missing values in metabolomics data. The core of the method is to measure how stable the content of the corresponding metabolite is across the k nearest neighbor samples of a sample that contains a missing metabolite and then, based on the stable nearest neighbor samples, to fill different types of missing values with different strategies.
In order to achieve the above object, the technical solution adopted by the present invention is as follows:
A metabonomics data missing value filling method based on neighbor stability comprises the following steps:
Metabolic components in a biological sample are detected by mass spectrometry to obtain spectral data of the metabolic components; the spectral data are analyzed with preprocessing operations such as peak identification, peak matching and normalization to determine the content of the metabolites in the sample, yielding the metabonomics data.
Let n denote the number of samples in the metabonomics data and p the number of metabolites in each sample. $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ is the vector of the contents of the p metabolites in the i-th sample, $1 \le i \le n$. When the content of metabolite m of sample $x_i$ is missing ($x_{im}$ is a missing value), $1 \le m \le p$, the missing value $x_{im}$ is filled by the following steps:
(1) Calculate the Euclidean distance $d(x_i, x_j)$ between sample $x_i$ and sample $x_j$ ($1 \le i \ne j \le n$) using formula (1):
[Formula (1): image not reproduced in this text]
where $o_{il}$ indicates whether the content of the l-th metabolite ($1 \le l \le p$) of sample $x_i$ is missing: $o_{il} = 0$ when the content of the l-th metabolite of sample $x_i$ is missing, and $o_{il} = 1$ otherwise.
The term $\sum_{l=1}^{p} o_{il}\, o_{jl}$ is the number of metabolites whose content is missing in neither sample $x_i$ nor sample $x_j$. The smaller the distance $d(x_i, x_j)$, the higher the similarity between $x_i$ and $x_j$. The k samples most similar to sample $x_i$, as determined by the Euclidean distance, form the sample set $S_k$.
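A minimal sketch of step (1) in Python, under stated assumptions: missing values are represented as `None`, and the squared differences are averaged over the metabolites observed in both samples before taking the square root. That normalization is an assumption, since formula (1) itself is not available in this text; the function names `masked_distance` and `k_nearest` are likewise hypothetical.

```python
import math

def masked_distance(a, b):
    """Euclidean distance using only metabolites observed in both samples.

    ASSUMPTION: the sum of squared differences is divided by the number of
    co-observed metabolites (the count sum_l o_il * o_jl); the patent's exact
    normalization may differ.
    """
    shared = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    if not shared:
        return math.inf  # no metabolite observed in both: treat as maximally distant
    return math.sqrt(sum((u - v) ** 2 for u, v in shared) / len(shared))

def k_nearest(data, i, k):
    """Indices of the k samples closest to sample i, nearest first (the set S_k)."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: masked_distance(data[i], data[j]))
    return others[:k]
```

Restricting the sum to co-observed metabolites lets samples with different missing patterns still be compared, and dividing by the count keeps distances comparable across pairs that share different numbers of metabolites.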
(2) Determine the missing type of the metabolite.
Calculate the Pearson correlation coefficients between metabolite m and the other metabolites. Take the metabolite aux_m that has the strongest correlation with m as the reference metabolite of m. The missing type of metabolite m is then determined from the content distribution of the reference metabolite aux_m, as follows:
Let $S_{miss} = \{x_j \mid x_{jm}\ \text{is a missing value},\ 1 \le j \le n\}$ denote the set of samples in the metabonomics data for which metabolite m is a missing value, and let $S_{obs} = \{x_j \mid x_{jm}\ \text{is not a missing value},\ 1 \le j \le n\}$ denote the set of samples for which metabolite m is not a missing value. Compute the average content of the reference metabolite aux_m over $S_{miss}$ and over $S_{obs}$, denoted $\mu_{miss}$ and $\mu_{obs}$ respectively. When metabolite m is positively correlated with aux_m: if $\mu_{miss} < \mu_{obs}$, the missing type of m is non-random missing and the method proceeds to step (3); otherwise the missing type of m is random missing and the method proceeds to step (4). When metabolite m is negatively correlated with aux_m: if $\mu_{miss} > \mu_{obs}$, the missing type of m is non-random missing and the method proceeds to step (3); otherwise the missing type of m is random missing and the method proceeds to step (4).
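The decision rule of step (2) can be sketched in Python as follows. This is an illustration only: `None` marks a missing value, "strongest correlation" is assumed to mean the largest absolute Pearson coefficient, and the function names are hypothetical. The sketch assumes that at least one sample is missing m while having the reference metabolite observed, and that no column is constant (otherwise the correlation is undefined).

```python
import math

def pearson(xs, ys):
    """Pearson correlation over positions where both values are observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy)

def missing_type(data, m):
    """Classify the missingness of metabolite column m as 'non-random' or 'random'."""
    p = len(data[0])
    column = lambda c: [row[c] for row in data]
    # Reference metabolite aux_m: strongest (absolute) correlation with m.
    corr = {c: pearson(column(m), column(c)) for c in range(p) if c != m}
    aux = max(corr, key=lambda c: abs(corr[c]))
    miss = [r[aux] for r in data if r[m] is None and r[aux] is not None]
    obs = [r[aux] for r in data if r[m] is not None and r[aux] is not None]
    mu_miss = sum(miss) / len(miss)
    mu_obs = sum(obs) / len(obs)
    if corr[aux] > 0:
        return "non-random" if mu_miss < mu_obs else "random"
    # Negative correlation (a coefficient of exactly 0 also lands here by assumption).
    return "non-random" if mu_miss > mu_obs else "random"
```

The intuition is that if m goes missing mainly where its strongly correlated partner indicates a low level of m, the values were probably below the detection limit rather than lost at random.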
(3) Handling of the non-random missing type.
When the content of metabolite m is missing for a sample in $S_k$, the missing value of m of that sample is temporarily filled with the minimum content of metabolite m over all samples in the metabonomics data. This step takes into account that the metabonomics data contain non-random missing data. Non-random missing values arise because the metabolite content was below the detection limit of the instrument and was therefore not detected. Temporarily filling the missing m values of the neighbor samples with the minimum content of metabolite m thus better matches the characteristics of non-random missing data.
(4) Handling of the random missing type.
When the content of metabolite m is missing for some samples in $S_k$, the missing content values of m of those samples are temporarily filled with the average content of metabolite m over the samples in $S_k$ whose content of m is not missing. When the content of metabolite m is missing for all samples in $S_k$, the missing content values of m are temporarily filled with the minimum content of metabolite m over all the remaining samples in the metabonomics data.
(5) Determine the stable neighbor samples.
The stable neighbor samples in $S_k$ are determined according to the degree of fluctuation of the content of metabolite m across the samples in $S_k$. Compute the mean $\mu$ and standard deviation $\sigma$ of the metabolite m content of the samples in $S_k$. When the content of metabolite m of a sample in $S_k$ lies outside the range $[\mu - \sigma, \mu + \sigma]$, that sample is deleted from $S_k$; the remaining samples form the stable neighbor sample set $S'_k$. Because the metabolite content differs little between neighboring samples, removing the samples outside $[\mu - \sigma, \mu + \sigma]$ reduces the influence of outliers and keeps the computed fill value stable and reliable.
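A minimal Python sketch of the stability screen in step (5). It uses the sample standard deviation (divisor n - 1), an assumption chosen because it gives σ = 2 for the values {3, 9, 5, 7, 6, 6} used in the worked example of the Detailed Description; the function name is hypothetical.

```python
import statistics

def stable_neighbors(neighbors, values):
    """Keep the neighbors whose metabolite-m content lies within [mu - sigma, mu + sigma].

    neighbors : labels or indices of the samples in S_k
    values    : their metabolite-m contents, after temporary filling
    Requires at least two values (sample standard deviation).
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation, divisor n - 1
    return [(j, v) for j, v in zip(neighbors, values)
            if mu - sigma <= v <= mu + sigma]

# Values from the worked example: mean 6, sigma 2, interval [4, 8].
kept = stable_neighbors(["x3", "x2", "x7", "x8", "x9", "x10"], [3, 9, 5, 7, 6, 6])
print(kept)  # [('x7', 5), ('x8', 7), ('x9', 6), ('x10', 6)]
```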
(6) Compute the weighted average of the metabolite m content of the samples in $S'_k$. The value $x_{im}$ computed with formula (3) fills the missing content of metabolite m of sample $x_i$. The formulas are as follows:
[Formula (2): image not reproduced in this text]
[Formula (3): image not reproduced in this text]
where $k' = |S'_k|$ is the number of samples in the sample set $S'_k$; $s_j, s_l$ ($1 \le j, l \le k'$) are samples in $S'_k$; $w(x_i, s_j)$ is the weight of sample $s_j$ when computing $x_{im}$; $d(x_i, s_j)$ is the Euclidean distance between sample $x_i$ and $s_j$ computed by formula (1); and $s_{lm}$ is the content of metabolite m of sample $s_l$. The content of m of each neighbor sample is weighted according to that neighbor's distance from sample $x_i$: the smaller the distance between a sample in $S'_k$ and sample $x_i$, the larger the weight of its metabolite m content and the larger its share in the computed $x_{im}$.
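A minimal Python sketch of step (6), assuming normalized inverse-distance weights, $w(x_i, s_j) = \frac{1/d(x_i, s_j)}{\sum_{l=1}^{k'} 1/d(x_i, s_l)}$, and the fill value $x_{im} = \sum_{l=1}^{k'} w(x_i, s_l)\, s_{lm}$. This form is an assumption: it is consistent with the symbol definitions above and reproduces the rounded weights and the fill value quoted in the worked example of the Detailed Description, but the formulas themselves are not available in this text. The function names are hypothetical, and all distances are assumed to be strictly positive.

```python
def inverse_distance_weights(distances):
    """Weights proportional to 1/d, normalized to sum to 1 (assumed form of formula (2))."""
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]

def fill_value(distances, values):
    """Weighted average of the neighbors' metabolite-m contents (assumed form of formula (3))."""
    weights = inverse_distance_weights(distances)
    return sum(w * v for w, v in zip(weights, values))

# Distances and m3 contents of x7, x8, x9, x10 from the worked example.
d = [2.29, 2.71, 2.74, 3.16]
print([round(w, 2) for w in inverse_distance_weights(d)])  # [0.29, 0.25, 0.25, 0.21]
print(round(fill_value(d, [5, 7, 6, 6]), 2))               # 5.95
```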
The beneficial effects of the invention are as follows:
The method fills missing metabonomics data while taking the missing type of each metabolite into account, applying a different filling strategy to each missing type; at the same time it screens the neighbor samples and filters out the unstable ones. The method fills metabonomics data containing missing values effectively, which is important for subsequent data analysis, metabolic marker selection and the like.
Detailed Description
An embodiment of the method is described below on simulated data in conjunction with the technical solution. The simulated data serve only to illustrate the invention for ease of understanding and do not limit it.
Table 1 shows the simulated data: $x_i$ denotes the i-th sample, the data contain 10 samples, $m_1$ to $m_5$ denote the 5 metabolites in the data, and NaN denotes a missing value.
Table 1: Simulated data
[Table 1: image not reproduced in this text]
The data in Table 1 contain 4 missing values: $x_{13}$, $x_{52}$, $x_{84}$ and $x_{93}$. The procedure is described below using $x_{13}$ as an example.
(1) Using formula (1), calculate the distance d between sample $x_1$ and each of the other samples: $d(x_1, x_2) = 1.94$, $d(x_1, x_3) = 1.73$, $d(x_1, x_4) = 3.39$, $d(x_1, x_5) = 3.46$, $d(x_1, x_6) = 4.12$, $d(x_1, x_7) = 2.29$, $d(x_1, x_8) = 2.71$, $d(x_1, x_9) = 2.74$, $d(x_1, x_{10}) = 3.16$. With k = 6, the set of the 6 samples most similar to $x_1$ is $S_k = \{x_3, x_2, x_7, x_8, x_9, x_{10}\}$.
(2) Determine the missing type of metabolite $m_3$. Calculate the Pearson correlation coefficients between $m_3$ and $m_1$, $m_2$, $m_4$, $m_5$. The calculation shows that $m_4$ has the strongest correlation with $m_3$ and that the correlation is positive, so $m_4$ is selected as the reference metabolite of $m_3$. The set of samples for which metabolite $m_3$ is missing is $S_{miss} = \{x_1, x_9\}$, and the set of samples for which $m_3$ is not missing is $S_{obs} = \{x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_{10}\}$. The mean $\mu_{miss}$ of the reference metabolite $m_4$ over $S_{miss}$ is 7, and its mean $\mu_{obs}$ over $S_{obs}$ is 4.86. Since $\mu_{miss} \ge \mu_{obs}$, the missing type is random missing and the method proceeds to step (4).
(3) Among the 6 nearest neighbor samples of $x_1$, metabolite $m_3$ of sample $x_9$ is a missing value, so the average of metabolite $m_3$ over $x_3$, $x_2$, $x_7$, $x_8$, $x_{10}$, which is 6, is used to temporarily fill $x_{93}$.
(4) The $m_3$ values of the samples in $S_k$ are {3, 9, 5, 7, 6, 6}; their mean $\mu$ is 6 and their standard deviation $\sigma$ is 2, so the stability interval is [4, 8]. The values $x_{33}$ and $x_{23}$ lie outside the stability interval, so samples $x_3$ and $x_2$ are deleted from $S_k$, giving $S'_k = \{x_7, x_8, x_9, x_{10}\}$.
(5) Calculate the weights of the samples in $S'_k$. By formula (2), the weights of the samples in $S'_k$ are: $w(x_1, x_7) = 0.29$, $w(x_1, x_8) = 0.25$, $w(x_1, x_9) = 0.25$, $w(x_1, x_{10}) = 0.21$. Using formula (3), the weighted average is $x_{13} = w(x_1, x_7) \cdot x_{73} + w(x_1, x_8) \cdot x_{83} + w(x_1, x_9) \cdot x_{93} + w(x_1, x_{10}) \cdot x_{10,3} = 5.95$. The value 5.95 is then used as the estimated fill value for the missing value $x_{13}$.
The missing values $x_{52}$, $x_{84}$ and $x_{93}$ are each filled by applying steps (1) to (6) in the same way.

Claims (1)

1. A metabonomics data missing value filling method based on neighbor stability, characterized by comprising the following steps:
detecting metabolic components in a biological sample by mass spectrometry to obtain spectral data of the metabolic components, analyzing the spectral data with the preprocessing operations of peak identification, peak matching and normalization, determining the content of the metabolites in the sample, and obtaining the metabonomics data;
n denotes the number of samples in the metabonomics data, p denotes the number of metabolites in the samples, and $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ is the vector of the contents of the p metabolites in the i-th sample, $1 \le i \le n$; when the content of metabolite m of sample $x_i$ in the metabonomics data is missing, i.e. $x_{im}$ is a missing value, $1 \le m \le p$, the missing value $x_{im}$ is filled by the following steps:
(1) calculating the Euclidean distance $d(x_i, x_j)$ between sample $x_i$ and each other sample $x_j$, $1 \le i \ne j \le n$, by the following formula:
[Formula (1): image not reproduced in this text]
wherein $o_{il}$ indicates whether the content of the l-th metabolite of sample $x_i$ is missing, $1 \le l \le p$; when the content of the l-th metabolite of sample $x_i$ is missing, $o_{il} = 0$, otherwise $o_{il} = 1$;
$\sum_{l=1}^{p} o_{il}\, o_{jl}$
denotes the number of metabolites whose content is missing in neither sample $x_i$ nor sample $x_j$; the smaller the distance $d(x_i, x_j)$, the higher the similarity between $x_i$ and $x_j$; the k samples most similar to sample $x_i$, as determined by the Euclidean distance, form the sample set $S_k$;
(2) determining the missing type of the metabolite
calculating the Pearson correlation coefficients between metabolite m and the other metabolites; taking the metabolite aux_m that has the strongest correlation with m as the reference metabolite of m; determining the missing type of metabolite m from the content distribution of the reference metabolite aux_m, as follows:
let $S_{miss} = \{x_j \mid x_{jm}\ \text{is a missing value},\ 1 \le j \le n\}$ denote the set of samples in the metabonomics data for which metabolite m is a missing value; let $S_{obs} = \{x_j \mid x_{jm}\ \text{is not a missing value},\ 1 \le j \le n\}$ denote the set of samples in the metabonomics data for which metabolite m is not a missing value; computing the average content of the reference metabolite aux_m over $S_{miss}$ and over $S_{obs}$, denoted $\mu_{miss}$ and $\mu_{obs}$ respectively; when metabolite m is positively correlated with aux_m and $\mu_{miss} \le \mu_{obs}$, the missing type of m is non-random missing, and the method proceeds to step (3); when metabolite m is positively correlated with aux_m and $\mu_{miss} > \mu_{obs}$, the missing type of m is random missing, and the method proceeds to step (4); when metabolite m is negatively correlated with aux_m and $\mu_{miss} > \mu_{obs}$, the missing type of m is non-random missing, and the method proceeds to step (3); when metabolite m is negatively correlated with aux_m and $\mu_{miss} \le \mu_{obs}$, the missing type of m is random missing, and the method proceeds to step (4);
(3) handling of the non-random missing type
when the content of metabolite m is missing for a sample in $S_k$, temporarily filling the missing content value of m of that sample with the minimum content value of metabolite m over all samples in the metabonomics data; proceeding to step (5);
(4) handling of the random missing type
when the content of metabolite m is missing for some samples in $S_k$, temporarily filling the missing content values of m of those samples with the average content of metabolite m over the samples in $S_k$ whose content of m is not missing; when the content of metabolite m is missing for all samples in $S_k$, temporarily filling the missing content values of m with the minimum content value of metabolite m over all the remaining samples in the metabonomics data; proceeding to step (5);
(5) determining the stable neighbor samples
determining the stable neighbor samples in $S_k$ according to the degree of fluctuation of the content of metabolite m across the samples in $S_k$; computing the mean $\mu$ and standard deviation $\sigma$ of the metabolite m content of the samples in $S_k$; when the content of metabolite m of a sample in $S_k$ lies outside the range $[\mu - \sigma, \mu + \sigma]$, deleting that sample from $S_k$, the remaining samples forming the stable neighbor sample set $S'_k$;
(6) computing the weighted average of the metabolite m content of the samples in $S'_k$; the value $x_{im}$ computed with formula (3) fills the missing content of metabolite m of sample $x_i$, the formulas being as follows:
[Formula (2): image not reproduced in this text]
[Formula (3): image not reproduced in this text]
wherein $k' = |S'_k|$ is the number of samples in the sample set $S'_k$; $s_j, s_l$ ($1 \le j, l \le k'$) are samples in $S'_k$; $w(x_i, s_j)$ is the weight of sample $s_j$ when computing $x_{im}$; $d(x_i, s_j)$ is the Euclidean distance between sample $x_i$ and $s_j$ computed by formula (1); $s_{lm}$ is the content of metabolite m of sample $s_l$; the content of m of each neighbor sample is weighted according to that neighbor's distance from sample $x_i$; the smaller the distance between a sample in $S'_k$ and sample $x_i$, the larger the weight of its metabolite m content and the larger its share in the computed $x_{im}$.
CN201910284004.XA 2019-04-10 2019-04-10 Metabonomics data missing value filling method based on neighbor stability Expired - Fee Related CN110097920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910284004.XA CN110097920B (en) 2019-04-10 2019-04-10 Metabonomics data missing value filling method based on neighbor stability

Publications (2)

Publication Number Publication Date
CN110097920A CN110097920A (en) 2019-08-06
CN110097920B true CN110097920B (en) 2022-09-20

Family

ID=67444595

Country Status (1)

Country Link
CN (1) CN110097920B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737463B (en) * 2020-06-04 2024-02-09 江苏名通信息科技有限公司 Big data missing value filling method, device and computer readable memory
CN111859275B (en) * 2020-07-20 2022-08-12 厦门大学 Mass spectrum data missing value filling method and system based on non-negative matrix factorization
CN113485986A (en) * 2021-06-25 2021-10-08 国网江苏省电力有限公司信息通信分公司 Electric power data restoration method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177088A (en) * 2013-03-08 2013-06-26 北京理工大学 Biomedicine missing data compensation method
WO2015004502A1 (en) * 2013-07-09 2015-01-15 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Method for imputing corrupted data based on localizing anomalous parts
CN104298893A (en) * 2014-09-30 2015-01-21 西南交通大学 Imputation method of genetic expression deletion data
CN105424827A (en) * 2015-11-07 2016-03-23 大连理工大学 Screening and calibrating method of metabolomic data random errors
CN106407464A (en) * 2016-10-12 2017-02-15 南京航空航天大学 KNN-based improved missing data filling algorithm
CN106777938A (en) * 2016-12-06 2017-05-31 合肥工业大学 A kind of microarray missing value estimation method based on adaptive weighting
CN107193876A (en) * 2017-04-21 2017-09-22 美林数据技术股份有限公司 A kind of missing data complementing method based on arest neighbors KNN algorithms
CN108256538A (en) * 2016-12-28 2018-07-06 北京酷我科技有限公司 A kind of subscriber data Forecasting Methodology and system
CN108563770A (en) * 2018-04-20 2018-09-21 南京邮电大学 A kind of KPI and various dimensions network data cleaning method based on scene
CN109472343A (en) * 2018-10-16 2019-03-15 上海电机学院 A kind of improvement sample data missing values based on GKNN fill up algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
The Nearest Neighbor Algorithm of Filling Missing Data Based on Cluster Analysis;Chi Zhang等;《Applied Mechanics and Materials》;20130831;第347卷;2324-2328 *
基于相关性组合变量的色谱数据分析方法;林晓惠等;《第21届全国色谱学术报告会及仪器展览会会议论文集》;20170519;121-122 *
多组学联合缺失数据填补方法的评价;董学思等;《中国卫生统计》;20170825;第34卷(第04期);558-561+566 *
质谱代谢组学数据预处理方法研究;刘月程等;《化学分析计量》;20180920;第27卷(第05期);105-109 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220920