WO2018133553A1 - Method for establishing a quantitative reference range for the urinary proteome of a healthy subject and for acquiring a disease-associated urinary protein marker - Google Patents

Method for establishing a quantitative reference range for the urinary proteome of a healthy subject and for acquiring a disease-associated urinary protein marker

Info

Publication number
WO2018133553A1
Authority
WO
WIPO (PCT)
Prior art keywords
urine
protein
data
proteome
urinary
Application number
PCT/CN2017/113550
Other languages
English (en)
Chinese (zh)
Inventor
秦钧
冷文川
甄蓓
倪晓天
路天元
汪宜
王广舜
孙长青
钟博文
Original Assignee
北京蛋白质组研究中心 (Beijing Proteome Research Center)
Priority claimed from CN201710048188.0A (see CN108334752B)
Priority claimed from CN201710051714.9A (see CN108334747B)
Application filed by 北京蛋白质组研究中心 (Beijing Proteome Research Center)
Publication of WO2018133553A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Definitions

  • The invention belongs to the field of biomarker data in medical biology. It relates to a method for establishing a quantitative reference range for the healthy human urinary proteome. The range is built from a human urinary proteome dataset that covers physiological fluctuations and intra- and inter-individual differences, and the method yields a healthy human urinary proteome database. The invention further relates to a method for obtaining urine protein markers of a disease (i.e., outlier urine proteins). This is done by screening patients' urinary proteomes against the quantitative reference range of the healthy human urinary proteome. In particular, it relates to establishing disease-related outlier urinary protein libraries, exemplified by a tumor-associated outlier urinary protein library.
  • Apart from blood, urine is the body fluid most commonly sampled in clinical testing.
  • In routine urinalysis, indicators such as bilirubin, glucose, ketone bodies, protein and blood cells are used to diagnose many diseases or to monitor their treatment.
  • In view of the value of urine testing in health and medicine, scientists worldwide have been using proteomics to search urine for new protein markers for disease diagnosis, prognosis and monitoring of treatment efficacy.
  • The current process of finding new biomarkers in urine usually has two stages: discovery and validation.
    • In the discovery stage, proteomics is used to test samples from several to dozens of cases (usually ≤50) of the target disease group and of a control group. Proteins that differ significantly between the two groups become candidate biomarkers and enter the validation stage.
    • In the validation stage, the candidate biomarkers are tested on large independent sample sets.
  • In practice, candidate markers found with the small sample sizes of the discovery stage are often proteins that vary between individuals. They do not truly reflect the difference between the disease and control states.
  • This is the main reason why proteomics has not yet delivered new urine protein markers that have been successfully applied. It is therefore necessary to establish a quantitative reference range for the human urinary proteome that covers intra- and inter-individual differences and physiological fluctuations. A method for obtaining tumor urinary protein markers built on this range can then effectively overcome the interference that these fluctuations and differences cause in the urinary proteome.
  • An object of the present invention is to provide a method for establishing a quantitative reference range for the healthy human urinary proteome, and further to provide a healthy human urinary proteome database. The database comprises:
    • a healthy human urinary proteome dataset that covers intra- and inter-individual differences and physiological fluctuations;
    • the healthy human urine protein species determined from that dataset;
    • the calculated quantitative reference range of the healthy human urinary proteome.
  • the method for establishing a quantitative reference range for a healthy human urine proteome proposed by the present invention comprises the following steps:
  • Each collected urine sample is processed into a urine protein sample;
  • Search and quantification: database searching, peptide quantification and protein assembly are performed on the mass spectrometry data of each urine protein sample. This identifies the proteins in each sample and quantifies each of them, forming that sample's urinary proteome data;
  • Sub-datasets are defined by population and sampling time span, including:
    • an intra-individual urinary proteome sub-dataset (BCM), formed from the urinary proteome data of all urine protein samples collected from the same individual over different sampling time spans;
    • an inter-individual urinary proteome sub-dataset (BPRC), formed from the urinary proteome data of urine protein samples collected once or a few times from each of many people;
  • With the parametric method, the upper and lower limits that cover the target percentage of the population are calculated from the statistical parameters of the data (mean and standard deviation). For example, the mean plus or minus 2 standard deviations covers 95% of individuals.
  • With the non-parametric method, the upper and lower limits of the reference range are set by the percentile method so that they actually cover the target percentage of individuals. For example, the 2.5th and 97.5th percentiles cover 95% of individuals.
  • Sub-datasets are defined by population and sampling time span:
    • Sub-datasets formed from urine samples collected many times from a small number of people are used to evaluate physiological fluctuations and intra-individual differences of the urinary proteome under repeated sampling.
    • Sub-datasets formed from urine samples collected once or a few times from most people are used to evaluate physiological fluctuations and inter-individual differences.
    • The male and female urinary proteome sub-datasets can be used to evaluate physiological fluctuations and inter-individual differences within each sex.
  • The evaluation proceeds in two steps. First, the coefficient of variation (CV) of each eligible protein is calculated in the corresponding sub-dataset or total dataset. Second, the distribution of CVs of the eligible proteins in each sub-dataset or total dataset is displayed as a box plot. Together these assess the physiological fluctuations and the intra- and inter-individual differences of the corresponding urinary proteome.
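The parametric and non-parametric ways of setting limits can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation:

- the function names are hypothetical;
- the parametric method uses the sample standard deviation;
- percentiles use linear interpolation between sorted values (NumPy's default convention), because the text does not specify an interpolation rule.

Under a normal distribution, exactly 95% coverage corresponds to mean ± 1.96 SD; ± 2 SD is the common rounded rule.

```python
import math
import statistics


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def parametric_range(values, k=2.0):
    """Parametric limits: mean -/+ k sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - k * sd, mean + k * sd


def nonparametric_range(values, lower_q=2.5, upper_q=97.5):
    """Non-parametric limits: the chosen lower and upper percentiles."""
    return percentile(values, lower_q), percentile(values, upper_q)


# Hypothetical quantities of one protein across healthy urine samples.
quantities = [2.1, 2.4, 2.2, 3.0, 2.8, 2.5, 2.6, 2.3, 2.9, 2.7]
print(parametric_range(quantities))
print(nonparametric_range(quantities))
```

The non-parametric form makes no distributional assumption. This suits protein abundances, which are often skewed.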
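A minimal sketch of the CV evaluation, assuming each (sub-)dataset is held as a mapping from protein to its quantitative values in the samples where it was quantified. Further assumptions:

- The names `protein_cvs` and `box_summary` are hypothetical.
- The sample standard deviation is used.
- `box_summary` returns the minimum, quartiles and maximum that a simple box plot draws. Plotting libraries often place whiskers at 1.5 × IQR instead.

```python
import statistics


def protein_cvs(dataset, min_samples=3):
    """CV (SD / mean) for each protein quantified in at least min_samples samples.

    dataset maps protein -> list of quantities from the samples in which it was quantified.
    """
    cvs = {}
    for protein, values in dataset.items():
        if len(values) < max(min_samples, 2):  # stdev needs at least two values
            continue
        mean = statistics.mean(values)
        if mean == 0:
            continue  # CV is undefined for a zero mean
        cvs[protein] = statistics.stdev(values) / mean
    return cvs


def box_summary(values):
    """Minimum, first quartile, median, third quartile and maximum of a list of CVs."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical sub-dataset: quantities of three proteins from repeated samplings.
sub_dataset = {"P1": [10.0, 11.0, 9.5], "P2": [1.0, 2.0, 3.0, 2.5], "P3": [5.0, 6.0]}
cvs = protein_cvs(sub_dataset)
print(cvs)  # P3 is skipped: quantified in only two samples
print(box_summary(list(cvs.values())))
```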
  • Step 2) obtains the urine protein sample by a method based on ultracentrifugation and reduction:
    • The precipitate obtained by centrifuging the urine sample is resuspended in resuspension buffer (50 mM Tris, 250 mM sucrose, pH 8.5).
    • Dithiothreitol is added and the sample is heated, which removes most of the urinary protein in the sample.
    • The sample is washed with washing buffer (10 mM triethanolamine, 100 mM sodium chloride, pH 7.4) and centrifuged again. The resulting precipitate is the urine protein sample of that urine sample.
  • Step 3) separates the urine protein sample by SDS polyacrylamide gel electrophoresis (SDS-PAGE). The gel is cut into 6 bands for in-gel digestion, and the digests are combined into a 2-fraction peptide sample representing the urinary proteome.
  • The two-fraction peptide samples are analyzed by LC-MS/MS, giving the mass spectrometry data of each urine sample's urine protein sample. The database search analyzes the data produced by the mass spectrometer, identifies the proteins they contain and yields MS1-level quantitative results for all peptides. This produces the proteome data of each urine protein sample.
  • Each sub-dataset sampled over 24 hours or 3 consecutive days includes 3-5 urinary proteomes. The coefficient of variation is calculated for the proteins with quantitative data in 3-5 of these urine samples.
  • Each sub-dataset with a sampling time span greater than 2 months includes 6-62 urinary proteomes. The coefficient of variation is calculated for proteins with quantitative data in:
    • at least 3 samples, for sub-datasets of ≤30 urinary proteomes;
    • at least 10% of the urine samples, for sub-datasets of >30 urinary proteomes.
  • The resulting distribution of coefficients of variation of all eligible proteins in each sub-dataset is displayed as a box plot.
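The eligibility rule for sub-datasets spanning more than 2 months could be expressed as below:

- ≤30 proteomes: at least 3 quantified samples;
- >30 proteomes: at least 10% of the samples.

Rounding 10% up to a whole number of samples is an assumption, and the helper names are hypothetical.

```python
def min_quantified_samples(n_proteomes):
    """Threshold for long-span sub-datasets: 3 samples if <=30 proteomes, else 10% rounded up."""
    if n_proteomes <= 30:
        return 3
    return -(-n_proteomes // 10)  # integer ceiling of n / 10, avoiding float rounding


def eligible_proteins(dataset, n_proteomes):
    """Proteins quantified in enough samples for their CV to be computed."""
    threshold = min_quantified_samples(n_proteomes)
    return sorted(p for p, values in dataset.items() if len(values) >= threshold)


print(min_quantified_samples(6), min_quantified_samples(62))  # prints: 3 7
```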
  • The total dataset and its sex-specific sub-datasets are used to evaluate physiological fluctuations and inter-individual differences in the healthy human urinary proteome. For proteins with quantitative data in more than 10% of the urine samples of a dataset or sub-dataset, the CV of their quantitative data is calculated. Box plots then show the CV distribution of all eligible proteins in each dataset and sub-dataset.
  • The present invention further provides a healthy human urinary proteome database comprising:
    • the sub-datasets and total dataset described above;
    • the healthy human urine protein species determined from the dataset;
    • the calculated quantitative reference range of the healthy human urinary proteome;
  • the quantitative reference range for the healthy human urine proteome includes 2025 urine proteins and their values listed in Table 7-1 or Table 7-2.
  • the present invention also provides a method of establishing a quantitative reference range for healthy human urine protein, comprising the following steps:
  • Sample preparation: each collected urine sample is processed into a urine protein sample;
  • Detection: the urine protein samples are analyzed to obtain their protein detection data;
  • The protein detection data are grouped by protein. For each protein, an upper limit and a lower limit are selected from its detection values, forming that protein's quantitative reference range. Together, the ranges of all proteins form the quantitative reference range for healthy human urine protein.
  • a healthy human urine protein quantitative reference range established based on healthy human urine protein data is within the scope of the present disclosure.
  • the human urine protein data includes, but is not limited to, protein species and individual protein content.
  • Another object of the present invention is to provide a method of obtaining a urine protein marker associated with a disease.
  • The disease may be of any kind. The following description takes tumors as an example, and statements about tumors apply equally to other diseases.
  • The method proposed by the present invention for obtaining a disease-related urine protein marker (taking tumors as an example) comprises the following steps:
  • (1) The healthy human urinary proteome dataset A is randomly divided into three sub-datasets: A1, A2 and A3. The quantitative reference range of the healthy human urinary proteome is determined from sub-dataset A1 by the non-parametric percentile method. The 99.5th-percentile quantitative value of each urine protein in the dataset is taken as the upper limit of its reference range;
  • (2) A training sub-dataset B1 is formed from the tumor patients' urinary proteome dataset B. Each urinary proteome in it is screened against the upper limits established in (1). A protein that exceeds its upper limit in at least two samples is included in the candidate tumor-associated outlier urinary protein library. Screening all the training data yields the candidate library C1;
  • Each urinary proteome yields a sample-specific outlier urinary protein pool C2. The proteins in each sample's pool C2 are compared with those in the candidate tumor-associated outlier urinary protein library C1 from (2), to determine which proteins the two pools share and how many. The more proteins they share, the closer the sample is to a tumor patient sample;
  • The hypergeometric test is used to calculate the p-value of the protein overlap between the two libraries C1 and C2.
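The split of dataset A and the per-protein upper limits of step (1) might look like this in Python. The sub-dataset sizes 350/100/47 come from the example later in the description. The following are assumptions:

- the fixed random seed;
- the helper names;
- the use of only the values actually quantified for each protein;
- linear interpolation in the percentile helper, because the text does not state an interpolation rule.

```python
import math
import random


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def split_dataset(sample_ids, sizes=(350, 100, 47), seed=0):
    """Randomly partition sample IDs into sub-datasets A1, A2, A3 of the given sizes."""
    ids = list(sample_ids)
    if sum(sizes) != len(ids):
        raise ValueError("sizes must add up to the number of samples")
    random.Random(seed).shuffle(ids)
    parts, start = [], 0
    for n in sizes:
        parts.append(ids[start:start + n])
        start += n
    return parts


def upper_limits(a1_quantities, q=99.5):
    """Per-protein upper limit: the q-th percentile of its quantified values in A1."""
    return {p: percentile(v, q) for p, v in a1_quantities.items() if v}


a1, a2, a3 = split_dataset(range(497))
print(len(a1), len(a2), len(a3))  # prints: 350 100 47
```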
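Step (2) can be sketched as a count of how many training proteomes exceed each protein's upper limit. Assumptions:

- Each proteome is a `{protein: quantity}` dict.
- Proteins with no healthy upper limit are skipped; the text does not settle how to treat them.
- `sample_outliers` is a hypothetical reading of how a sample-specific outlier pool (C2) is formed: the proteins above their upper limits in that one sample.

```python
from collections import Counter


def sample_outliers(proteome, limits):
    """Proteins whose quantity exceeds their healthy upper limit in one sample (a C2-style pool)."""
    return {p for p, value in proteome.items() if p in limits and value > limits[p]}


def candidate_outlier_library(training, limits, min_samples=2):
    """Proteins above their upper limit in at least min_samples training proteomes (C1)."""
    counts = Counter()
    for proteome in training:
        counts.update(sample_outliers(proteome, limits))
    return {p for p, n in counts.items() if n >= min_samples}


# Hypothetical upper limits and training proteomes.
limits = {"A": 10.0, "B": 5.0}
training = [{"A": 11.0, "B": 1.0}, {"A": 12.0, "B": 6.0}, {"B": 4.0, "C": 100.0}]
print(candidate_outlier_library(training, limits))  # prints: {'A'}
```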
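The overlap p-value between C1 and a sample's C2 can be computed as the hypergeometric upper tail P(X ≥ k), where X is the number of shared proteins expected by chance. Assumptions:

- The one-sided upper tail is the usual choice for enrichment.
- The background universe size N is not given in the text. Using, for example, the number of proteins that have a reference range is an illustrative assumption.
- Only `math.comb` (Python 3.8+) is used, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, universe, library, drawn):
    """P(X >= k), X ~ Hypergeometric: `drawn` proteins taken from `universe`,
    of which `library` belong to the outlier library."""
    if not (0 <= library <= universe and 0 <= drawn <= universe):
        raise ValueError("invalid set sizes")
    tail = sum(
        comb(library, i) * comb(universe - library, drawn - i)
        for i in range(max(k, 0), min(library, drawn) + 1)
    )
    return tail / comb(universe, drawn)


def overlap_pvalue(c1, c2, universe):
    """p-value for the overlap between library C1 and one sample's outlier pool C2."""
    return hypergeom_sf(len(c1 & c2), universe, len(c1), len(c2))
```

A small p-value means the sample shares more outlier proteins with C1 than chance would predict.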
  • An ROC curve is used to evaluate the candidate library generated in (2).
  • The resulting tumor-associated outlier urinary protein library C, and the outlier proteins it contains, constitute the tumor urine protein markers.
  • the above method also includes the step of verifying the established tumor-associated outlier urine protein library C:
  • The cutoff value Pc determined in (4) above is used to judge whether each urinary proteome belongs to a healthy person or a tumor patient. The false-positive and false-negative rates then give the sensitivity and specificity with which the tumor-associated outlier urinary protein library distinguishes healthy persons from tumor patients.
  • In step (1), the quantitative reference range of the healthy human urinary proteome is calculated by the non-parametric method from the data of sub-dataset A1. The upper and lower limits are chosen by the percentile method so that they actually cover the target percentage of individuals (e.g., the 2.5th and 97.5th percentiles cover 95% of individuals).
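Fixing specificity at 95% amounts to choosing Pc from the p-values of healthy validation samples so that no more than 5% of them fall below it. The sketch below does this, then reports sensitivity and specificity on labelled p-values.

Two points are assumptions. Treating p = Pc as healthy is one, since the text defines only p > Pc and p < Pc. The function names are the other. This is not a full ROC analysis, which would sweep every candidate cutoff.

```python
import math


def cutoff_at_specificity(healthy_pvalues, specificity=0.95):
    """Pc such that at most (1 - specificity) of healthy p-values fall strictly below it."""
    if not healthy_pvalues or not 0 < specificity <= 1:
        raise ValueError("need healthy p-values and 0 < specificity <= 1")
    h = sorted(healthy_pvalues)
    allowed_false_positives = math.floor(round((1 - specificity) * len(h), 9))
    return h[allowed_false_positives]


def classify(p, pc):
    """p < Pc -> tumor; otherwise healthy (p == Pc counted as healthy by assumption)."""
    return "tumor" if p < pc else "healthy"


def sensitivity_specificity(healthy_pvalues, tumor_pvalues, pc):
    """Fraction of tumor samples called tumor, and of healthy samples called healthy."""
    spec = sum(classify(p, pc) == "healthy" for p in healthy_pvalues) / len(healthy_pvalues)
    sens = sum(classify(p, pc) == "tumor" for p in tumor_pvalues) / len(tumor_pvalues)
    return sens, spec
```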
  • the process of establishing the urine proteome data set B of the tumor patient in the step (2) comprises:
  • Each collected urine sample is processed into a urine protein sample;
  • Search and quantification: database searching, peptide quantification and protein assembly are performed on the mass spectrometry data of each urine protein sample. This identifies the proteins in each sample and quantifies each of them, forming that sample's urinary proteome data;
  • The tumor-associated outlier urine protein library obtained by the above method also falls within the present invention. So do the outlier urine proteins it contains, i.e., the tumor urine protein markers.
  • the tumor-associated outlier urine protein library includes 509 urine proteins, specifically A1BG, A2M, ABCB7, ABCD4, ABCE1, ABHD11, ABHD12, ABHD14B, ACADM, ACADSB, ACE2, ACO2, ACOT9, ACSL3, ACSM2A, ACSM2B, ACTR1B, ADD1, AGT, AHNAK2, AHSG, ALDH1L2, ALDH3A1, ALDH3A2, ALDH3B1, ALDH4A1, ALDOC, AMY2A, AMY2B, ANGPTL6, ANK1, ANPEP, ANXA1, ANXA10, ANXA2, ANXA3, ANXA4, ANXA5, ANXA6, APMAP, APOB, APP, AQP7, ARFIP1, ARG1, ARHGAP1, ARL13B, ARL6IP5, ARL8A, ARMC9, ARRDC1, ASNA1, ASPH, ATP13A3, ATP2A
  • Still another object of the present invention is to use the tumor-associated outlier urine protein library to judge the tumor relevance of a urine sample to be tested:
    • The proteomic data of the test sample are obtained.
    • The hypergeometric test gives the p-value of the protein overlap between the test sample and the tumor-associated outlier protein library.
    • The cutoff Pc is determined at 95% specificity.
  • When the p-value is greater than Pc, the test sample is judged to be from a healthy person; when it is less than Pc, it is judged to be from a tumor patient.
  • Other disease-related outlier urine protein libraries can likewise be used to judge the disease relevance of a urine sample to be tested.
  • The process is as follows:
    • The proteomic data of the test urine sample are obtained.
    • The hypergeometric test gives the p-value of the protein overlap between the test sample and the disease-related outlier urine protein library.
    • The cutoff Pc is determined at 95% specificity.
    • When the p-value is greater than Pc, the test sample is judged to be from a healthy person; when it is less than Pc, it is judged to be from a patient with the disease.
  • Large-scale human urinary proteome data are collected to build a urinary proteome dataset that covers intra- and inter-individual differences and physiological fluctuations. The quantitative reference range of the urinary proteome is then established from this dataset.
  • Patients' urinary proteome data (taking tumors as an example) are screened against this reference range. This screening effectively excludes, during the discovery of urinary protein biomarkers, interference from physiological fluctuations and from proteins that differ between individuals.
  • Figure 1 shows the coefficient-of-variation distributions that characterize the range of physiological fluctuation of the healthy human urinary proteome over 24 hours and over 3 consecutive days.
  • The 24-hour data come from 2 volunteers (U001 and U002), and the 3-consecutive-day data come from 16 volunteers (U001-U005, U007-U017).
  • the vertical axis is the coefficient of variation, and the horizontal axis is the different sub-data sets of different individuals.
  • Figure 2 shows the coefficient-of-variation distributions that characterize physiological fluctuation of the healthy human urinary proteome over sampling time spans greater than 60 days.
  • the sampling time span of the other 14 volunteers was 61-314 days.
  • the vertical axis is the coefficient of variation, and the horizontal axis is the sub-data set of different individuals.
  • Figure 3 is a graph showing the relationship between the number of samples and the physiological fluctuations in the healthy human urine proteome.
  • A is the relationship between the number of samples of the volunteer U001 and the physiological fluctuations of the urine proteome.
  • B is the relationship between the number of samples of the volunteer U002 and the physiological fluctuations of the urine proteome;
  • The vertical axis is the coefficient of variation, and the horizontal axis is the number of samples in the sub-dataset.
  • Figure 4 is a graph showing the coefficient of variation of the range of physiological fluctuations between individuals in the healthy human urine proteome.
  • Vertical axis: coefficient of variation
  • Horizontal axis: BCM, BPRC, A1, Female and Male are sub-datasets
  • A is the total data set
  • the numbers in parentheses are the median coefficient of variation in the distribution of proteome variation coefficients in each data set.
  • Figure 5 is a total ion chromatogram from liquid chromatography-tandem mass spectrometry (LC-MS/MS) of a urine protein sample (the two-fraction peptide sample) of volunteer U001.
  • The vertical axis is the signal intensity and the horizontal axis is the retention time.
  • Figure 6 is a flow chart showing the process of establishing a tumor-associated outlier urine protein library
  • Panel A: the training dataset and the generation of the candidate tumor-associated outlier protein library
  • Panel B: the generation of the validation dataset and the evaluation of the candidate tumor-associated outlier protein library
  • Panel C: the generation of the test datasets and the testing of the final tumor-associated outlier protein library.
  • One aspect of the present invention provides a method for establishing a quantitative reference range for the healthy human urinary proteome, and further provides a healthy human urinary proteome database. Another aspect provides a method for obtaining disease-related urine protein markers and, taking tumors as an example, proposes a tumor-associated outlier urinary protein library.
  • The invention uses the quantitative reference range of the healthy human urinary proteome to screen patients' urinary proteome data for outliers. The work proceeds through three stages: discovery, validation and testing. The urinary proteome data of healthy people and of patients are randomly divided into training, validation and test sub-datasets and analyzed separately. This determines the outlier urinary protein library associated with the disease (here, tumors).
  • the concept of a proteome refers to a collection of all kinds of proteins in a cell, within a tissue, within a body fluid, or within an individual.
  • the urinary proteome refers to all of the different kinds of proteins included in each urine sample.
  • Washing buffer (10 mM triethanolamine, 100 mM sodium chloride, pH 7.4) was added to a volume of 400 μl, and the sample was centrifuged at 100,000 × g for 20 minutes. The supernatant was discarded, leaving the precipitate.
  • This precipitate was used as a urine protein sample of the urine sample.
  • Each urine protein sample prepared by the above ultracentrifugation method was dissolved in 60 μl of 1% sodium dodecyl sulfate buffer (1% SDS, 50 mM Tris, pH 8.5), and 30 μl was separated by polyacrylamide gel electrophoresis (SDS-PAGE). The gel was cut into 6 bands for in-gel digestion, and the digests were combined into a 2-fraction peptide sample representing the urinary proteome. The 2-fraction peptide samples were analyzed by LC-MS/MS to obtain the urine protein sample data of each urine sample (mass spectrometry data; see Figure 5). The specific procedure was as follows:
  • The peptide sample obtained after digestion was dissolved in 20 μl of loading buffer (5% methanol, 0.1% formic acid), and 5 μl was loaded. Data were collected on a Thermo Scientific nanoflow liquid chromatography-tandem high-resolution mass spectrometry system (EASY-nLC 1000 with Q Exactive HF).
  • The nanoflow LC loading (trap) column had an inner diameter of 100 μm. It was packed with Dr. Maisch GmbH C18 material (particle size 3 μm, pore size 120 Å) to a bed length of 2 cm.
  • The nanoflow LC separation column had an inner diameter of 150 μm. It was packed with Dr. Maisch GmbH C18 material (particle size 1.9 μm, pore size 120 Å) to a bed length of 12 cm.
  • Mobile phase A was 0.1% formic acid
  • mobile phase B was acetonitrile and 0.1% formic acid.
  • the peptide separation elution gradient was as follows: 0-69 minutes for 5%-31% mobile phase B and 70-75 minutes for 95% mobile phase B.
  • the mass spectrometry data was collected by Data Dependent Acquisition.
  • The Q Exactive HF MS1 parameters were: resolution 120,000, scan range 300-1400 m/z, AGC target 3E+6 and maximum ion injection time 80 ms.
  • For MS2, precursor peptides were isolated and fragmented in order of decreasing MS1 signal intensity (Top 20 mode). The MS2 parameters were: resolution 15,000, isolation window 3 m/z, AGC target 2E+4, maximum ion injection time 20 ms and HCD normalized collision energy 27%. Acquisition used a 12 s dynamic exclusion.
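For reference, the chromatography and DDA settings above can be collected in a plain Python structure. The key names are hypothetical. The linear ramp from 31% to 95% B between 69 and 70 minutes is an assumption, because the text lists only the 0-69 and 70-75 minute segments.

```python
# (time in minutes, % mobile phase B) breakpoints; %B is interpolated linearly between them.
GRADIENT = [(0, 5), (69, 31), (70, 95), (75, 95)]

# Acquisition settings transcribed from the text; key names are illustrative only.
DDA_SETTINGS = {
    "ms1": {"resolution": 120_000, "scan_range_mz": (300, 1400),
            "agc_target": 3e6, "max_injection_ms": 80},
    "ms2": {"top_n": 20, "resolution": 15_000, "isolation_window_mz": 3.0,
            "agc_target": 2e4, "max_injection_ms": 20, "hcd_nce_percent": 27},
    "dynamic_exclusion_s": 12,
}


def percent_b(t_min):
    """Percentage of mobile phase B at time t_min (minutes) along the gradient."""
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the 0-75 min gradient")
```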
  • Mass spectrometry data from each urine protein sample was searched using bioinformatics tools and methods.
  • the purpose of the database search is to analyze the data produced by the mass spectrometry and determine the proteins contained in the data produced by the mass spectrometry.
  • The process works as follows:
    • The MS2 spectrum of each precursor ion in the mass spectrometry data is analyzed within a given mass-deviation range.
    • The intensity distribution of the fragment ions is compared with the theoretical distribution.
    • Each precursor is scored on the fragment ions that fall within the mass tolerance, which yields identifications of the precursor ions (short peptides).
  • The identified short peptides are matched against a database of known protein amino acid sequences to determine which proteins they come from. This gives the protein identification results.
  • the specific process and parameters used are as follows:
  • The mass spectral data were searched against a peptide sequence database with Proteome Discoverer V2.0 using the Mascot 2.3 search engine.
  • The database search parameters were set as follows:
    • "Protein Database": the human protein sequence database, namely the human protein reference sequence database of the National Center for Biotechnology Information (NCBI);
    • "Enzyme Name": Trypsin;
    • "Maximum Missed Cleavage": 2 (at most 2 missed cleavage sites allowed);
    • "Instrument": Default;
    • "Taxonomy": All Entries;
    • "Precursor Mass Tolerance": 20 ppm;
    • "Fragment Mass Tolerance": 50 mmu;
    • "Use Average Precursor Mass": False;
    • "From Quan Method": None;
    • "Show All Modifications": False;
    • "Dynamic Modification": the common modifications Acetyl (Protein N-term), DeStreak (C), Oxidation (M) and Carbamidomethyl (C).
  • The peptide-level false-positive identification rate was kept below 1%.
  • The MS1 spectra in the raw data were quantified using the peptide-spectrum match information from the database search, giving MS1-level quantitative results for all peptides.
  • The batch calculation used the existing software "protein abundance quantification based on peptide cross-regression of high-resolution mass spectrometry data" (PQPCR) V1.0. It is registered as computer software with the National Copyright Administration of the People's Republic of China:
    • Software Copyright Registration Certificate No. 0451332;
    • registration number 2012SR083269;
    • registration date September 4, 2012;
    • copyright owner: Beijing Proteome Research Center.
  • The quantified peptides were assembled into their corresponding proteins according to the protein amino acid sequences in the database, giving the urinary proteome data of each urine protein sample.
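The 1% peptide-level false-positive threshold is commonly enforced with a target-decoy strategy. The text does not say how it was applied, so the following is only an illustrative sketch of that general technique, not the Proteome Discoverer procedure. Assumptions:

- Each peptide-spectrum match (PSM) is a `(score, is_decoy)` pair, and higher scores are better.
- The estimated FDR at a cutoff is decoys / targets at or above it.
- Tied scores are not treated specially.

```python
def filter_at_fdr(psms, max_fdr=0.01):
    """Keep target PSMs above the loosest score cutoff whose estimated FDR
    (decoys / targets at or above the cutoff) is <= max_fdr.

    psms: iterable of (score, is_decoy); higher scores are better.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    keep = 0  # number of top-ranked PSMs accepted
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            keep = i
    return [psm for psm in ranked[:keep] if not psm[1]]
```

Taking the longest accepted prefix is equivalent to accepting every PSM whose q-value is at most `max_fdr`.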
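The assembly step can be illustrated with a deliberately simplified sketch. Each quantified peptide is mapped to the proteins whose sequence contains it. A protein's abundance is the sum of the MS1 intensities of the peptides unique to it.

This is a swapped-in stand-in, not the PQPCR cross-regression algorithm cited above. Skipping shared peptides and summing intensities are assumptions, and the names are hypothetical.

```python
from collections import defaultdict


def assemble_proteins(peptide_intensity, protein_sequences):
    """peptide_intensity: {peptide sequence: MS1 intensity};
    protein_sequences: {protein id: amino-acid sequence}.
    Returns {protein id: summed intensity of peptides unique to that protein}."""
    totals = defaultdict(float)
    for peptide, intensity in peptide_intensity.items():
        hits = [pid for pid, seq in protein_sequences.items() if peptide in seq]
        if len(hits) == 1:  # shared (ambiguous) peptides are skipped in this sketch
            totals[hits[0]] += intensity
    return dict(totals)


# Hypothetical sequences and peptide intensities.
proteins = {"P1": "MKTAYIAKQR", "P2": "GGSAYIAK"}
peptides = {"TAYIAK": 100.0, "AYIAK": 50.0, "GGSAYIAK": 30.0}
print(assemble_proteins(peptides, proteins))  # AYIAK is shared, so it is not counted
```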
  • the concept of the urinary proteome refers to all the different kinds of proteins included in each urine sample, and all the proteins identified in one urine sample are called a urinary proteome.
  • Each healthy human urine proteome data obtained by the above method analysis was sequentially combined to obtain healthy human proteome data set A (integrated Table 4 and Table 5, containing data sets of 497 urine proteomes of 167 healthy persons),
  • Each tumor urinary proteome data obtained was combined to obtain the patient's tumor urinary proteome data set B (as shown in Table 8-2, including 17 solid tumors - bladder cancer 17 cases, breast cancer) 4 cases, 25 cases of cervical cancer, 22 cases of colorectal cancer, 14 cases of esophageal cancer, 47 cases of gastric cancer and 25 cases of lung cancer, 154 urine proteome data sets).
  • the data in the Healthy Human Urine Proteome Dataset A is used to assess the individual and individual of the healthy human urine proteome. Physiological fluctuations and differences between the organisms and establish a quantitative reference range for healthy human urine proteome.
  • the data in data set A can be divided into different sub-data sets for the purpose of assessing physiological fluctuations and differences in different types of urinary proteomes. For example, data used to assess intra-individual differences in an individual can constitute a sub-data set (see Table 3); the data in this sub-data set can also be subdivided into corresponding sub-data sets based on the sampling time span. To assess physiological fluctuations and differences in urinary proteome at different time spans within healthy individuals. In addition, sub-data sets can be created based on factors such as gender.
  • the data system of the data set or sub-dataset is used to evaluate the intra- and inter-individual differences and physiological fluctuations of the healthy human urine proteome, and the quantitative reference range of the healthy human urine proteome is calculated by the percentile method. (See Table 6).
  • the data in the tumor urinary proteome dataset B were randomly divided into training, validation, and test sub-data sets for the detection, validation, and testing of the ability of healthy and tumor patients to be diagnosed.
  • the physiologic fluctuations and differences in the urinary proteome of healthy individuals in three different sampling time spans were evaluated by determining the quantitative data of each protein in the corresponding sub-dataset.
  • the distribution range of the coefficient of variation (the standard deviation of protein quantitative data/the mean of protein quantitation data).
  • the sub-dataset sampled every 24 hours or 3 consecutive days includes 3-5 urine proteomic data, and for those proteins with quantitative data in 3-5 urine samples, calculate the coefficient of variation and finally obtain each sub- The data set all met the distribution of the coefficient of variation of the required protein and was presented in a box-plot.
  • the sub-dataset with a sampling time span of more than 2 months includes 6-62 urinary proteome data for those at least 3 ( ⁇ 30 urinary proteome subdata sets) or 10% urine samples (>30)
  • the protein with quantitative data in the sub-dataset of the urinary proteome calculates the coefficient of variation, and finally obtains the distribution range of the coefficient of variation of all the required proteins in each sub-data set, and displays it in a box-plot.
  • Data set A, comprising 497 urinary proteomes from 167 healthy individuals, and its gender subsets were used to assess physiological fluctuations and differences between the urinary proteomes of healthy individuals. For proteins with quantitative data in more than 10% of the urine samples of each data set or sub-dataset, the coefficient of variation of their quantitative data was calculated. Box plots were used to display the distribution of coefficients of variation for all eligible proteins in each data set and sub-dataset.
  • The established data set A, containing 497 urine proteomes from 167 healthy people, can cover the physiological fluctuations and inter-individual differences of the healthy human urine proteome.
  • For each protein in data set A, the percentile method is applied to its quantitative data across the 497 urine samples. The quantitative values at selected percentiles are taken as the protein's quantitative reference range in the urine proteome of the healthy population. For example, the quantitative values at the 2.5th and 97.5th percentiles of a protein cover its quantitative fluctuation range in 95% of the 497 urine samples.
  • The quantitative reference ranges of all proteins in data set A can be used to exclude interference from physiological fluctuations or inter-individual differences when developing urine protein biomarkers. They can also help identify outliers that fall outside the quantitative reference range when urine proteome information is used for health management.
  • The Healthy Human Urine Proteome Data Set A (497 urinary proteomes from 167 healthy individuals) was randomly divided into 3 sub-datasets:
  • the first sub-dataset, A1, includes 350 urine proteome data sets from healthy individuals and is used to establish the quantitative reference range of the healthy human urine proteome (by the percentile method);
  • the second sub-dataset, A2, includes 100 urine proteome data sets from healthy individuals and is used to validate the ability of the screened tumor-associated outliers to distinguish healthy people from tumor patients;
  • the third sub-dataset, A3, includes 47 urine proteome data sets from healthy individuals and is used for the final independent test of the validated tumor-associated outlier urinary protein pool's ability to distinguish healthy people from tumor patients.
  • Test sub-dataset A3 takes no part in the discovery or validation of tumor-associated outliers. This keeps the assessment of the final tumor-associated outlier urinary protein pool's ability to distinguish healthy people from cancer patients independent.
  • The tumor patients' urinary proteome data set B was likewise randomly divided, according to the number of cases of each of the 7 tumor types, into training sub-dataset B1, validation sub-dataset B2, and test sub-dataset B3. Together with the corresponding healthy human urine proteome sub-datasets (A1-A3), these were used to establish the tumor-associated outlier urinary protein pool.
  • Sub-datasets B1, B2, and B3 include urine proteome data from 45, 61, and 48 tumor patients, respectively.
  • Test sub-dataset B3 likewise takes no part in the discovery or validation of tumor-associated outliers. This keeps the assessment of the final tumor-associated outlier urinary protein pool's ability to distinguish healthy people from tumor patients independent.
  • The quantitative reference range referred to in (1) is established from sub-dataset A1, which includes 350 urinary proteome data sets.
  • Each urinary proteome in training sub-dataset B1 (urinary proteome data from 45 tumor patients) is screened against the upper limit of the reference range established in (1). If a protein exceeds the upper limit in at least two samples, it is added to the candidate tumor-associated outlier urinary protein pool. Once all training data have been screened, the candidate tumor-associated outlier urine protein pool C1 has been generated.
  • Each urine proteome of validation sub-datasets A2 and B2 is likewise screened against the upper limit, so that each urine proteome yields its own sample-specific outlier urine protein pool C2.
  • The proteins in each sample-specific outlier pool C2 are compared with those in the candidate tumor-associated outlier pool C1 generated in (2) to count how many proteins the two pools share. The more proteins C2 shares with C1, the closer the sample is to a tumor patient's sample.
  • A hypergeometric test (calculation method shown in Table 9) is used to calculate the p value of the overlap of shared proteins between the two pools.
  • Healthy validation sub-dataset A2 and tumor validation sub-dataset B2 yielded a total of 161 hypergeometric test p values. A receiver operating characteristic (ROC) curve was drawn from these p values to examine how well the candidate pool C1 generated in (2) distinguishes the urinary proteomes of healthy people and tumor patients in sub-datasets A2 and B2.
  • The vertical axis of the ROC curve runs from 0 to 1 (unitless) and measures the sensitivity with which the urine proteomes of healthy people and tumor patients are distinguished.
  • Depending on the desired sensitivity or specificity, a corresponding hypergeometric test p value can be chosen as the cutoff value (Pc) for distinguishing healthy people from tumor patients. In this application, Pc is set at a specificity of 95%.
  • Step (3) above uses the urinary proteome data of 106 tumor patients. These are the data remaining after 48 of the 154 tumor urinary proteomes in data set B were randomly selected to form test sub-dataset B3. From these remaining data, training sub-dataset B1 (45 tumor urine proteomes) and the corresponding validation sub-dataset B2 (61 tumor urine proteomes) are randomly generated.
  • A total of 20 random samplings were performed on the urinary proteome data of the 106 tumor patients, yielding 20 pairs of training and validation sub-datasets (20 B1/B2 pairs).
  • The analysis of (3) above was repeated for each B1/B2 pair, yielding 20 candidate tumor-associated outlier urinary protein pools C1 and 20 ROC curves. The candidate pool C1 with the largest area under the ROC curve (0.957) was taken as the final tumor-associated outlier urinary protein pool C (509 tumor-associated outliers, see Table 10). The Pc value at 95% specificity was 1.78 × 10⁻⁸.
  • Using test sub-dataset A3 (47 urinary proteome data sets from healthy people) and test sub-dataset B3 (48 urine proteome data sets from tumor patients), the final tumor-associated outlier urinary protein pool C was tested by the method of (3) above. This gives a hypergeometric test p value for the urine proteome of each healthy person and tumor patient.
  • Each p value is compared with the cutoff value Pc determined in (4) above to classify each urinary proteome as belonging to a healthy person or a tumor patient. The sensitivity and specificity of the tumor-associated outlier urinary protein pool for distinguishing healthy people from cancer patients are then determined from the false positive and false negative rates.
  • Example 1: Establishing a data set for assessing physiological fluctuations and differences within individuals in the healthy human urinary proteome, and assessing within-individual physiological fluctuations of the urinary proteome
  • The process of creating the data set includes:
  • each collected urine sample is prepared into a urine protein sample (a peptide sample containing two components) according to the method described above;
  • urinary proteome data: for example, the urine proteome data of U001-1 (the four urine samples collected from volunteer U001 within 24 hours) are shown in Table 2, which covers quantitative data for 1615 proteins across these 4 samples; for reasons of space, only part of the protein data is shown;
  • the urinary proteome data sets were combined in sequence according to the method described in item four above, giving within-individual urinary proteome data sets with different sampling time spans for each of the 17 healthy volunteers. Taking volunteer U001 as an example, the within-individual urinary proteome sub-dataset is shown in Table 3. It contains quantitative data for 3264 proteins from the 62 samples collected from this volunteer over 314 days; for reasons of space, only part of the protein data is shown;
  • the data set of this example includes short-term (within 24 hours, or over 3 consecutive days) and long-term (more than 60 days) sampling data from 17 volunteers; the total sampling time span per volunteer ranges from 5 to 314 days.
  • Sub-dataset BCM is divided into sub-datasets for each individual (see Table 3). Each of these is further divided according to whether sampling was continuous within 24 hours or over 3 consecutive days.
  • These sub-datasets can be used to assess the range of, and differences in, physiological fluctuations of the urinary proteome in healthy individuals within 24 hours, over 3 consecutive days, and over more than 60 days.
  • The results are shown in Figures 1 and 2 (horizontal axis: the sub-datasets of the different individuals; vertical axis: coefficient of variation), in which:
  • Figure 1 shows the within-individual 24-hour physiological fluctuation data of the urinary proteome for volunteers U001 and U002, a total of four 24-hour sub-datasets. Each sub-dataset contains 3-5 urine proteome data sets. For example, Table 2 shows the data of the 4 urine samples collected from volunteer U001 within 24 hours; each urine sample yields one proteome data set, and these are combined into one 24-hour sub-dataset.
  • In each sub-dataset, the proteins with quantitative data in all urine samples are selected, and the coefficient of variation of their quantitative data (standard deviation / mean) is calculated. The range of coefficients of variation of all eligible proteins in the sub-dataset is displayed as a box plot, representing the within-individual physiological fluctuation range of the urine proteome over 24 hours.
  • The median coefficient of variation of the 24-hour physiological fluctuations in the four sub-datasets ranged from 0.29 to 0.33. The coefficient of variation of the most variable protein was 2.0 (see U11-2 in the left part of Figure 1).
  • The 3-consecutive-day physiological fluctuation data of the urinary proteome came from 35 sub-datasets of 16 volunteers (U001-U005, U007-U017). Each sub-dataset contained 3 urinary proteome data sets, one from urine sampled each morning.
  • The distribution of coefficients of variation of the urinary proteome in each sub-dataset was obtained to represent the within-individual physiological fluctuation range over 3 consecutive days (see the right part of Figure 1).
  • The median coefficient of variation of the 3-day physiological fluctuations of the urinary proteome was 0.23-0.5, slightly higher than the quantitative fluctuation of the urinary proteome within 24 hours.
  • The within-individual physiological fluctuation data of the urinary proteome over more than 60 days came from 17 sub-datasets of 17 volunteers. Each sub-dataset contained 6-62 urine proteome data sets, with sampling time spans of 61-314 days.
  • For sub-datasets with fewer than 30 urinary proteome data sets, a protein's coefficient of variation is calculated when the protein has quantitative information in at least 3 urine samples. A protein that cannot be detected in at least 3 urine samples is not considered a common protein of the healthy human urinary proteome, so its physiological fluctuation is not evaluated. For sub-datasets with 30 or more urinary proteome data sets, the coefficient of variation is calculated when the protein has quantitative information in at least 10% of the urine samples. A protein that cannot be detected in at least 10% of urine samples is likewise not considered a common protein and is not evaluated.
  • The physiological fluctuation range of the urine proteome in each sub-dataset is expressed as the distribution range of the coefficients of variation of all eligible proteins (see Figure 2).
  • The median coefficient of variation of long-term within-individual physiological fluctuations of the urinary proteome was 0.45-0.87 (see Figure 2). This is significantly higher than the within-individual fluctuations over 24 hours and over 3 consecutive days.
  • The data in Figure 2 also show no linear relationship between the within-individual physiological fluctuations of the urinary proteome and the sampling time span. This indicates that within-individual fluctuations do not grow indefinitely over time but stay within a limited, stable range. It is therefore feasible to establish an individual's quantitative urine proteome reference range from the physiological fluctuation range of that person's urinary proteome.
  • This example also uses the two largest individual urinary proteome sub-datasets (62 and 51 urinary proteome data sets, respectively) to determine the minimum number of different samples needed to cover the stable within-individual physiological fluctuation range of the urinary proteome. In each sub-dataset, only proteins with quantitative information in at least 10% of the urine samples were included in the analysis. Using random resampling, 3-25 urinary proteome data sets were drawn from each sub-dataset to form sub-datasets with sample sizes of 3-25.
  • Example 2: Establishing a data set for assessing physiological fluctuations and differences between individuals in the healthy human urinary proteome, and assessing inter-individual physiological fluctuations of the urinary proteome
  • The healthy human urine proteome data were collected as in Example 1.
  • This example collected a sub-dataset BPRC consisting of 178 urine proteome data from 150 volunteers (see Table 5).
  • Sub-datasets BPRC and BCM were combined to obtain data set A, which contains 497 urinary proteome data sets from 167 healthy volunteers (Tables 4 and 5 combined; omitted here).
  • Data set A can also be divided by volunteer gender into male and female urine proteome sub-datasets, or into other sub-datasets.
  • Sub-dataset BCM (319 urine proteome data sets from 17 healthy volunteers) can be used to assess physiological fluctuations and differences in the urinary proteomes of a small number of repeatedly sampled individuals. Sub-dataset BPRC (178 urinary proteome data sets from 150 healthy volunteers) can be used to assess inter-individual physiological fluctuations and differences in a large number of individuals sampled once or only a few times. The Male sub-dataset (343 urinary proteome data sets from 98 healthy volunteers) and the Female sub-dataset (154 urine proteome data sets from 69 healthy volunteers) can be used to assess inter-individual physiological fluctuations and differences in the urinary proteome by sex.
  • Data set A (497 healthy human urine proteome data sets) and sub-dataset A1 (350 healthy human urine proteome data sets) can both be used to establish the quantitative reference range of the healthy human urine proteome.
  • Figure 4 shows that the inter-individual physiological fluctuation range of the urinary proteome is very similar across the 6 sub-datasets, with median coefficients of variation between 1.01 and 1.19. This also indicates that data set A or sub-dataset A1 essentially covers the physiological fluctuations and differences among the urinary proteomes of healthy individuals.
  • The range of inter-individual physiological fluctuations is significantly higher than the range of intra-individual fluctuations (Figures 4, 2, and 1).
  • Example 3: Establishing a quantitative reference range for the healthy human urine proteome and a healthy human urine proteome database
  • The Healthy Human Urine Proteome Total Data Set A (Tables 4 and 5 combined; 497 urinary proteomes from 167 healthy individuals) was randomly divided into three sub-datasets. The first sub-dataset, A1, includes 350 healthy human urine proteome data sets; the second, A2, includes 100; and the third, A3, includes 47.
  • The quantitative reference range of the healthy human urine proteome is established separately from the data of total data set A and from the data of sub-dataset A1.
  • Methods for establishing a quantitative reference range fall into two types: parametric and nonparametric.
  • When the parametric method is used to establish the quantitative reference range, the data must follow a normal distribution. The upper and lower limits of the population reference range covering the target percentage can then be calculated from the statistical parameters of the data (mean and standard deviation); for example, the mean plus or minus 2 standard deviations covers approximately 95% of individuals.
  • The parametric method cannot be used when it is unclear whether the data follow a normal distribution.
  • The nonparametric method makes no assumption about the statistical distribution of the data. The upper and lower limits of the reference range are set by the percentile method so that they actually cover the target percentage of individuals; for example, the 2.5th and 97.5th percentiles cover 95% of individuals.
  • This example uses the nonparametric method to establish the quantitative reference range of the healthy human urine proteome. The results are shown in Tables 7-1 and 7-2.
  • Taking the healthy human urine protein DYNC1H1 as an example (data set A): the quantitative values at the 2.5th and 97.5th percentiles (0.024-11.344) cover the quantitative fluctuation range of the protein in 95% of the 497 urine samples. The quantitative values at the 5th and 95th percentiles (0.918-8.964) cover its quantitative fluctuation range in 90% of the 497 urine samples.
  • The quantitative value at the 99.5th percentile is taken as the upper limit of the quantitative reference range.
  • Taking DYNC1H1 as an example again (sub-dataset A1): the quantitative values at the 2.5th and 97.5th percentiles (0.044-10.962) cover the quantitative fluctuation range of the protein in 95% of the 350 urine samples. The quantitative value at the 99.5th percentile (19.279) is the upper limit of its quantitative reference range.
  • The reference range established from data set A can generally be used. Where a study must proceed in two stages, discovery and validation, the range established from A1 can be used instead.
  • The Healthy Human Urine Proteome Database is built on the above quantitative reference ranges. It includes the identified sub-datasets (e.g., Tables 1-5), total data set A (e.g., Table 6), the healthy human urine protein species determined from total data set A or sub-dataset A1, and the calculated quantitative reference ranges of the healthy human urine proteome (e.g., Table 7-1 or Table 7-2).
  • Example 4: Establishing urinary proteome data set B of tumor patients and the tumor-associated outlier urinary protein pool C
  • The urinary proteome data set of tumor patients was established by the same process as in Example 1.
  • This example collected 154 urinary proteome data sets from 154 patients with 7 solid tumor types to establish urinary proteome data set B for tumor patients (see Table 8-2): 17 cases of bladder cancer, 4 of breast cancer, 25 of cervical cancer, 22 of colorectal cancer, 14 of esophageal cancer, 47 of gastric cancer, and 25 of lung cancer. The tumor-associated outlier urinary protein pool C was established from the Healthy Human Urine Proteome Total Data Set A of Example 2 (Tables 4 and 5 combined; 497 urine proteome data sets from 167 people) together with data set B of this example.
  • The specific process is as follows:
  • The Healthy Human Urine Proteome Data Set A (497 urinary proteomes from 167 healthy individuals) was randomly divided into three sub-datasets:
  • the first sub-dataset, A1, includes 350 healthy human urine proteome data sets and is used to establish the quantitative reference range of the healthy human urine proteome (by the percentile method);
  • the second sub-dataset, A2, includes 100 urine proteome data sets from healthy individuals and is used to validate the ability of the screened tumor-associated outlier urine proteins to distinguish healthy people from tumor patients;
  • the third sub-dataset, A3, includes 47 healthy human urine proteome data sets and is used for the final independent test of the validated tumor-associated outlier urinary protein pool's ability to distinguish healthy people from tumor patients.
  • The tumor patients' urinary proteome data set B was likewise randomly divided, according to the number of cases of each of the 7 tumor types, into training sub-dataset B1, validation sub-dataset B2, and test sub-dataset B3 (see Table 8-1). Together with the corresponding healthy human urine proteome sub-datasets (A1-A3), these were used to establish the tumor-associated outlier urinary protein pool.
  • Sub-datasets B1, B2, and B3 include urine proteome data from 45, 61, and 48 tumor patients, respectively.
  • Test sub-dataset B3 takes no part in the discovery or validation of tumor-associated outliers. This keeps the assessment of the final tumor-associated outlier urinary protein pool's ability to distinguish healthy people from tumor patients independent.
  • A quantitative reference range for the healthy human urine proteome was established from the first healthy human urinary proteome sub-dataset, A1, using the method of Example 3.
  • For each urine protein, the 99.5th-percentile value of its quantitative data across the 350 urine proteomes of sub-dataset A1 is taken as the upper limit of its quantitative reference range;
  • Each urine proteome in training sub-dataset B1 (urine proteome data from 45 tumor patients) is screened against the upper limit of the reference range established in (1). Any protein that exceeds the upper limit in at least two samples is added to the candidate tumor-associated outlier urinary protein pool.
  • Once all training data have been screened, the candidate tumor-associated outlier urine protein pool C1 has been generated.
  • Healthy validation sub-dataset A2 and tumor validation sub-dataset B2 yielded a total of 161 hypergeometric test p values. A receiver operating characteristic (ROC) curve was drawn from these p values to examine how well the candidate pool C1 generated in (2) distinguishes the urinary proteomes of healthy people and tumor patients in sub-datasets A2 and B2.
  • The vertical axis of the ROC curve runs from 0 to 1 (unitless) and measures the sensitivity with which the urine proteomes of healthy people and tumor patients are distinguished; the closer to 1, the higher the sensitivity. The horizontal axis is the false positive rate, also scaled 0-1.
  • Depending on the desired sensitivity or specificity, a corresponding hypergeometric test p value can be chosen as the cutoff value (Pc) for distinguishing healthy people from tumor patients. In this application, Pc is set at a specificity of 95%.
  • C1: the tumor-associated outlier protein pool; the number of proteins it contains is m;
  • T: the proteins detected in the urine proteomes of all healthy people and tumor patients; the number of proteins it contains is 15447;
  • C1 ∩ C2: the intersection of C1 and C2; the number of proteins it contains is q.
  • Step (3) above uses the urinary proteome data of 106 tumor patients. These are the data remaining after 48 of the 154 tumor urinary proteomes in data set B were randomly selected to form test sub-dataset B3. From these remaining data, training sub-dataset B1 (45 tumor urine proteomes) and the corresponding validation sub-dataset B2 (61 tumor urine proteomes) are randomly generated.
  • One hundred random samplings were performed on the urinary proteome data of the 106 tumor patients, yielding 100 pairs of training and validation sub-datasets (100 B1/B2 pairs).
  • The analysis of (3) above was repeated for each B1/B2 pair, yielding 100 candidate tumor-associated outlier urine protein pools C1 and 100 ROC curves. The candidate pool C1 with the largest area under the ROC curve (0.957) was taken as the final tumor-associated outlier urinary protein pool C (509 tumor-associated outliers, see Table 10). The Pc value at 95% specificity was 1.78 × 10⁻⁸.
  • The final tumor-associated outlier urinary protein pool C obtained in (4) above was tested for its ability to distinguish healthy people from tumor patients. This used test sub-datasets A3 and B3 (urinary proteome data of 47 healthy people and 48 tumor patients), which are completely independent in that they never took part in training or validation.
  • The method is the same as in (3) above: the hypergeometric test p value of each healthy person's and tumor patient's urine proteome is obtained and compared with the cutoff value Pc determined in (4) above to classify each urine proteome as belonging to a healthy person or a tumor patient. The sensitivity and specificity of the tumor-associated outlier urinary protein pool for distinguishing healthy people from cancer patients are then determined from the false positive and false negative rates.
  • The number in parentheses after each cancer type in the first row is the number of tumor urine samples of that type.
  • The numbers in the table give the number of times the corresponding protein is an outlier in samples of the corresponding tumor type.
  • The identified 509 tumor-associated outliers are: A1BG, A2M, ABCB7, ABCD4, ABCE1, ABHD11, ABHD12, ABHD14B, ACADM, ACADSB, ACE2, ACO2, ACOT9, ACSL3, ACSM2A, ACSM2B, ACTR1B, ADD1, AGT, AHNAK2, AHSG, ALDH1L2, ALDH3A1, ALDH3A2, ALDH3B1, ALDH4A1, ALDOC, AMY2A, AMY2B, ANGPTL6, ANK1, ANPEP, ANXA1, ANXA10, ANXA2, ANXA3, ANXA4, ANXA5, ANXA6, APMAP, APOB, APP, AQP7, ARFIP1, ARG1, ARHGAP1, ARL13B, ARL6IP5, ARL8A, ARMC9, ARRDC1, ASNA1, ASPH, ATP13A3, ATP2A2, ATP6
  • The 509 outliers in the cancer outlier protein pool (C) identified in this example are tumor-specific proteins. They can be used as tumor markers in the research and development of services, kits, or other products for early detection or monitoring of various cancers based on urine protein detection.
  • The disease type targeted by the urine samples can be changed, so the method can be used to develop services and products (such as protein markers of specific diseases) for classifying different diseases and conditions. These are not enumerated here, but similar variations made by those skilled in the art with reference to this embodiment also fall within the present disclosure.
  • The invention provides a method for establishing a quantitative reference range for the healthy human urine proteome and a healthy human urine proteome database. It further provides a method for obtaining disease-related urine protein markers and a tumor-associated outlier urine protein pool. These better exclude interference from physiological fluctuations and inter-individual differential proteins during the discovery of urinary protein biomarkers, provide a basis for clinical testing and scientific experiments, and are suitable for industrial application.
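The coefficient-of-variation screening described for the 24-hour, 3-day, and long-term sub-datasets can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's code. The per-sample dictionary layout, the use of the sample standard deviation (n - 1), and rounding the 10% threshold up are all assumptions the text does not specify. The function names are hypothetical.

```python
import statistics
from typing import Dict, List

# Assumed layout: one dict per urine sample, mapping protein name -> quantitative
# value; a protein missing from a dict was not quantified in that sample.
Sample = Dict[str, float]


def long_term_min_count(n_samples: int) -> int:
    """Eligibility threshold for sub-datasets spanning more than 2 months:
    at least 3 samples if there are fewer than 30 proteomes, otherwise at
    least 10% of the samples (rounded up here, which is an assumption)."""
    return 3 if n_samples < 30 else (n_samples + 9) // 10


def protein_cvs(samples: List[Sample], min_count: int) -> Dict[str, float]:
    """Return CV = SD / mean for every protein quantified in >= min_count samples.

    Uses the sample standard deviation (n - 1), which is an assumption."""
    values: Dict[str, List[float]] = {}
    for sample in samples:
        for protein, value in sample.items():
            values.setdefault(protein, []).append(value)

    cvs: Dict[str, float] = {}
    for protein, vals in values.items():
        # statistics.stdev needs at least two values, whatever min_count is.
        if len(vals) >= max(min_count, 2):
            mean = statistics.fmean(vals)
            if mean > 0:
                cvs[protein] = statistics.stdev(vals) / mean
    return cvs


if __name__ == "__main__":
    # 3-day sub-dataset: a protein must be quantified in all three samples.
    days = [{"A": 1.0, "B": 2.0}, {"A": 2.0, "B": 2.0}, {"A": 3.0}]
    cvs = protein_cvs(days, min_count=len(days))
    print(cvs)                              # {'A': 0.5}
    print(statistics.median(cvs.values()))  # median CV, the box-plot centre
```

For a long-term sub-dataset of 62 proteomes, `protein_cvs(samples, long_term_min_count(62))` would require each protein to be quantified in at least 7 samples. The resulting CV values are what a box plot per sub-dataset would summarise.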
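The random-resampling analysis of the two largest individual sub-datasets (62 and 51 proteomes) asks how many samples are needed before the observed fluctuation range stabilises. It could be sketched like this, under several assumptions:

- proteins are pre-filtered to those quantified in at least 10% of the full sub-dataset, as the text states;
- fluctuation in each random subset is summarised by its median CV;
- each subset size is drawn a fixed, hypothetical number of times (`repeats`);
- the function names are illustrative only.

```python
import random
import statistics
from collections import Counter
from typing import Dict, List, Optional, Set

Sample = Dict[str, float]  # protein name -> quantitative value


def median_cv(samples: List[Sample], proteins: Set[str]) -> Optional[float]:
    """Median of SD/mean over the given proteins that have >= 2 values."""
    cvs = []
    for protein in proteins:
        vals = [s[protein] for s in samples if protein in s]
        if len(vals) >= 2:
            mean = statistics.fmean(vals)
            if mean > 0:
                cvs.append(statistics.stdev(vals) / mean)
    return statistics.median(cvs) if cvs else None


def resampling_curve(samples: List[Sample], sizes=range(3, 26),
                     repeats: int = 20, seed: int = 0) -> Dict[int, Optional[float]]:
    """For each subset size k, draw `repeats` random subsets of k samples
    (without replacement) and report the median of their median CVs."""
    n = len(samples)
    min_count = (n + 9) // 10  # quantified in at least 10% of the sub-dataset
    counts = Counter(p for s in samples for p in s)
    eligible = {p for p, c in counts.items() if c >= min_count}

    rng = random.Random(seed)
    curve: Dict[int, Optional[float]] = {}
    for k in sizes:
        if k > n:
            break
        medians = [m for m in (median_cv(rng.sample(samples, k), eligible)
                               for _ in range(repeats)) if m is not None]
        curve[k] = statistics.median(medians) if medians else None
    return curve
```

Plotting `curve` against the subset size and looking for where it levels off is one way to read off a minimum sample count. The text's own result for this analysis is not reproduced here.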
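The nonparametric (percentile) reference range used in Example 3 can be sketched per protein as below. For contrast, the sketch also includes the parametric mean ± 2 SD range, which the text says is only valid for normally distributed data. Assumptions:

- percentiles use linear interpolation between order statistics (NumPy's default convention); the text only says "percentile method";
- only samples in which a protein was quantified enter its range;
- the function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, float]  # protein name -> quantitative value


def percentile(values: Sequence[float], pct: float) -> float:
    """Percentile by linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    pos = (len(xs) - 1) * pct / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def reference_ranges(samples: List[Sample], lower: float = 2.5,
                     upper: float = 97.5, limit: float = 99.5
                     ) -> Dict[str, Tuple[float, float, float]]:
    """Nonparametric range per protein: (lower pct, upper pct, 99.5th-pct limit).

    Only samples in which the protein was quantified are used (an assumption)."""
    values: Dict[str, List[float]] = {}
    for sample in samples:
        for protein, value in sample.items():
            values.setdefault(protein, []).append(value)
    return {p: (percentile(v, lower), percentile(v, upper), percentile(v, limit))
            for p, v in values.items()}


def parametric_range(values: Sequence[float], k: float = 2.0) -> Tuple[float, float]:
    """Mean +/- k*SD; only meaningful if the data are normally distributed."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - k * sd, mean + k * sd
```

Applied to data set A, the first two entries of each tuple correspond to ranges like the DYNC1H1 2.5th-97.5th percentile range reported above. Applied to sub-dataset A1, the third entry is the upper limit used later for outlier screening.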
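The random division of data set A into A1 (350), A2 (100), and A3 (47) is a non-overlapping random partition. A minimal sketch follows; the helper name and the seeded shuffle are illustrative choices, not taken from the text.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def random_partition(items: Sequence[T], sizes: Sequence[int],
                     seed: Optional[int] = None) -> List[List[T]]:
    """Shuffle `items` and cut them into consecutive, non-overlapping parts
    of the requested sizes (the sizes must add up to len(items))."""
    if sum(sizes) != len(items):
        raise ValueError("sizes must sum to the number of items")
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    parts: List[List[T]] = []
    start = 0
    for size in sizes:
        parts.append(shuffled[start:start + size])
        start += size
    return parts


# Data set A: 497 healthy proteome indices -> A1 (350), A2 (100), A3 (47).
a1, a2, a3 = random_partition(range(497), [350, 100, 47], seed=1)
```

For data set B, the text splits within each of the 7 tumor types using the per-type counts of Table 8-1, which are not reproduced here. Calling the same function once per tumor type with those counts, then concatenating the parts, would give the 45/61/48 split.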
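The two outlier pools described in the process can be sketched as set operations over per-sample dictionaries. The candidate pool C1 collects proteins that exceed the A1 upper limit in at least two B1 training samples. The sample-specific pool C2 collects the proteins that exceed the upper limit in a single sample. The text does not say how a protein with no A1 upper limit is handled; skipping it below is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set

Sample = Dict[str, float]  # protein name -> quantitative value


def sample_outliers(sample: Sample, upper_limits: Dict[str, float]) -> Set[str]:
    """Sample-specific outlier pool C2: proteins above their upper limit.

    Proteins with no upper limit (never quantified in A1) are skipped; how the
    text treats them is not stated, so this is an assumption."""
    return {p for p, v in sample.items()
            if p in upper_limits and v > upper_limits[p]}


def candidate_pool(training: Iterable[Sample], upper_limits: Dict[str, float],
                   min_samples: int = 2) -> Set[str]:
    """Candidate pool C1: proteins that are outliers in >= min_samples
    training (B1) samples."""
    counts: Counter = Counter()
    for sample in training:
        counts.update(sample_outliers(sample, upper_limits))
    return {p for p, c in counts.items() if c >= min_samples}
```

Here `upper_limits` would hold each protein's 99.5th-percentile value from A1. "Exceeds" is read as strictly greater than the limit.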
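Table 9 and the hypergeometric formula it refers to are not reproduced in this text. Only the definitions of T (15447 proteins), m = |C1|, and q = |C1 ∩ C2| survive. The formula below assumes the usual one-sided (over-representation) form of the test. It writes n for the number of proteins in C2 (a symbol introduced here, not taken from the text) and N for the size of T.

```latex
p = P(X \ge q) = \sum_{i=q}^{\min(m,\,n)}
    \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}},
\qquad N = |T| = 15447,\; m = |C_1|,\; n = |C_2|,\; q = |C_1 \cap C_2|
```

A small p means that C2 overlaps C1 more than chance would predict. This matches the statement that the more proteins C2 shares with C1, the closer the sample is to a tumor patient's sample.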
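The overlap p value between a sample's C2 and the candidate pool C1 can be computed exactly with the standard upper-tail hypergeometric test. The text does not give the formula, so the upper-tail form is an assumption. The sketch uses Python's exact integer `math.comb` (Python 3.8+). Its `total` default is the 15447 proteins in T, and it assumes C1 and C2 are subsets of T; the function names are hypothetical.

```python
import math
from typing import Set


def hypergeom_upper_p(N: int, m: int, n: int, q: int) -> float:
    """P(X >= q) for X ~ Hypergeometric(population N, m marked items, n draws).

    All terms are exact integers; the final int / int division is correctly
    rounded to a float by Python."""
    if not (0 <= m <= N and 0 <= n <= N):
        raise ValueError("need 0 <= m, n <= N")
    upper = min(m, n)
    numerator = sum(math.comb(m, i) * math.comb(N - m, n - i)
                    for i in range(max(q, 0), upper + 1))
    return numerator / math.comb(N, n)


def overlap_p_value(c1: Set[str], c2: Set[str], total: int = 15447) -> float:
    """p value for the overlap between candidate pool C1 and a sample's C2."""
    return hypergeom_upper_p(total, len(c1), len(c2), len(c1 & c2))
```

Exact integer arithmetic avoids the precision loss that summing tiny floating-point terms would cause. That matters because the reported cutoff is on the order of 10⁻⁸.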
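Choosing the cutoff Pc at 95% specificity, and then scoring sensitivity and specificity on the independent test sets, could look like the sketch below. A sample is called "tumor" when its p value is below Pc. The rule for picking Pc (the k-th smallest healthy p value, so that at most 5% of healthy samples fall below it) is an assumption; the text only states that Pc is set at 95% specificity. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cutoff_at_specificity(healthy_p: Sequence[float],
                          specificity: float = 0.95) -> float:
    """Choose Pc so that at most (1 - specificity) of healthy samples have p < Pc.

    Pc is the k-th smallest healthy p value with k = floor((1 - specificity) * n);
    the small epsilon guards against floating-point error in that product."""
    ps = sorted(healthy_p)
    if not ps:
        raise ValueError("need at least one healthy p value")
    k = math.floor((1.0 - specificity) * len(ps) + 1e-9)
    return ps[min(k, len(ps) - 1)]


def sensitivity_specificity(healthy_p: Sequence[float], tumor_p: Sequence[float],
                            pc: float) -> Tuple[float, float]:
    """Sensitivity: share of tumor samples with p < Pc.
    Specificity: share of healthy samples with p >= Pc."""
    sensitivity = sum(p < pc for p in tumor_p) / len(tumor_p)
    specificity = sum(p >= pc for p in healthy_p) / len(healthy_p)
    return sensitivity, specificity
```

In the described workflow, Pc is fixed on the validation p values (A2 and B2). The resulting Pc (1.78 × 10⁻⁸ for the final pool) is then applied unchanged to test sets A3 and B3.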
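The ROC-based selection of the final pool can be sketched in two steps. First, compute the area under the ROC curve from the 161 validation p values (100 from A2, 61 from B2), treating a smaller p as more tumor-like. Second, keep the candidate pool C1 with the largest area across the random B1/B2 splits. The AUC is computed here through its standard equivalence with the Mann-Whitney probability; this is a substitute for drawing the curve explicitly, and the function names are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def auc_lower_p_is_tumor(healthy_p: Sequence[float], tumor_p: Sequence[float]) -> float:
    """ROC AUC when a smaller p value means 'more tumor-like'.

    Equals the probability that a random tumor sample has a smaller p than a
    random healthy sample, with ties counted as one half (Mann-Whitney form)."""
    wins = 0.0
    for t in tumor_p:
        for h in healthy_p:
            if t < h:
                wins += 1.0
            elif t == h:
                wins += 0.5
    return wins / (len(tumor_p) * len(healthy_p))


def select_final_pool(results: Iterable[Tuple[Set[str], float]]) -> Tuple[Set[str], float]:
    """Given (candidate pool C1, AUC) for each random B1/B2 split, keep the
    pool with the largest AUC."""
    return max(results, key=lambda item: item[1])
```

The pairwise loop is quadratic, but with 100 healthy and 61 tumor validation samples that is only 6100 comparisons per split.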

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Biochemistry (AREA)
  • Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to a method for establishing a quantitative reference range for the urinary proteome of healthy subjects and for acquiring disease-associated urinary protein markers. The method comprises: acquiring a quantitative reference range for the urinary proteome of healthy subjects on the basis of an established urinary proteome data set of healthy subjects; screening outlier proteins in a urinary proteome data set of patients with a given disease by a hypergeometric distribution test, to serve as disease-associated urinary protein markers; and, taking tumors as an example, establishing a tumor-associated outlier urinary protein library. The method eliminates interference from physiological fluctuations and inter-individual differential proteins in the screening process for urinary protein biomarkers.
PCT/CN2017/113550 2017-01-20 2017-11-29 Method for establishing a quantitative reference range for the urinary proteome of a healthy subject and for acquiring a disease-associated urinary protein marker WO2018133553A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
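CN201710048188.0A CN108334752B (zh) 2017-01-20 2017-01-20 Method for establishing a quantitative reference range for the healthy human urine proteome, and healthy human urine proteome database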
CN201710048188.0 2017-01-20
CN201710051714.9 2017-01-20
CN201710051714.9A CN108334747B (zh) 2017-01-20 2017-01-20 Method for obtaining tumor urine protein markers, and the resulting tumor-associated outlier urine protein library

Publications (1)

Publication Number Publication Date
WO2018133553A1 true WO2018133553A1 (fr) 2018-07-26

Family

ID=62907724

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/113550 WO2018133553A1 (fr) 2017-01-20 2017-11-29 Method for establishing a quantitative reference range for the urinary proteome of a healthy subject and for acquiring a disease-associated urinary protein marker

Country Status (1)

Country Link
WO (1) WO2018133553A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
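CN101067833A (zh) * 2007-05-09 2007-11-07 冯连元 Method for uniformly standardizing the normal-range reference values and actual measured values of various clinical test or laboratory results
CN103884806A (zh) * 2012-12-21 2014-06-25 中国科学院大连化学物理研究所 Label-free proteome quantification method combining tandem mass spectrometry and machine learning algorithms
WO2016083832A1 (fr) * 2014-11-28 2016-06-02 The University Of Birmingham Bladder cancer prognosis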

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AN, LONGFEI ET AL.: "Application of Proteomics to Screen Urinary Biomarkers in Women with Cervical Cancer", CHINA BIOTECHNOLOGY, vol. 36, no. 9, 31 December 2016 (2016-12-31), pages 1 - 10 *
CHEN, BIN ET AL.: "Non-official translation: Lesson 11: How to Determine Reference Value Ranges", CHINESE JOURNAL OF PREVENTIVE MEDICINE, vol. 36, no. 5, 30 September 2002 (2002-09-30), pages 355 - 357 *
CUI, YOUHONG ET AL.: "Non-official translation: Quantitative Electrophoresis Analysis and Reference Values of Healthy Human Urine Protein Components", CHINESE JOURNAL OF NEPHROLOGY, 31 December 1994 (1994-12-31) *
ZHAO, XUHONG ET AL.: "Urinary Proteomic Analysis for Diagnosis and Differential Diagnosis of Prostatic Hyperplasia", JOURNAL OF CAPITAL MEDICAL UNIVERSITY, vol. 30, 30 June 2009 (2009-06-30), pages 277 - 281 *

Legal Events

Code 121: EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 17892995; country of ref document: EP; kind code of ref document: A1

Code NENP: Non-entry into the national phase
Ref country code: DE

Code 122: EP: PCT application non-entry in European phase
Ref document number: 17892995; country of ref document: EP; kind code of ref document: A1