CN111944902A - Early prediction method of renal papillary cell carcinoma based on lincRNA expression profile combination characteristics - Google Patents
- Publication number
- CN111944902A (application CN202010775535.1A)
- Authority
- CN
- China
- Prior art keywords
- lincrna
- prediction
- expression
- sample
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/112—Disease subtyping, staging or classification
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/178—Oligonucleotides characterized by their use miRNA, siRNA or ncRNA
Abstract
The invention discloses a characteristic lincRNA expression profile combination and an early prediction method for renal papillary cell carcinoma, wherein the nucleotide sequences of the lincRNAs are shown in SEQ ID NO.1 to SEQ ID NO.18. The method comprises the following steps: obtaining characteristic lincRNAs that are stably and differentially expressed in patients with early-stage renal papillary cell carcinoma; selecting the characteristic lincRNA expression data and standardizing the data of each sample; constructing an early prediction model from the standardized data using a support vector machine; and performing early prediction based on the patient's characteristic lincRNA expression levels. The invention achieves high precision and accuracy (the area under the ROC curve, AUC, is 0.988). The probability of early-stage renal papillary cell carcinoma can be calculated from the relative expression levels of only these 18 lincRNAs and can serve as a reference for early prediction and treatment of renal papillary cell carcinoma.
Description
Technical Field
The invention belongs to the technical field of biotechnology and medicine, and particularly relates to an early prediction method of renal papillary cell carcinoma based on lincRNA expression profile combination characteristics.
Background
Renal papillary cell carcinoma accounts for approximately 15% of renal cell carcinomas. The age of onset is generally 50-70 years, and the male-to-female prevalence ratio is 2-3.9:1. The prognosis of renal papillary cell carcinoma is relatively good, and the 5-year survival rate can exceed 80%. Global Burden of Disease (GBD) data show that more than 2.1 million people worldwide had renal cancer in 2017, about 270,000 of them in China. About 140,000 people worldwide died of kidney cancer in 2017, accounting for 0.25% of all deaths. In China, about 17,000 people died of kidney cancer in 2017, accounting for 0.16% of all deaths. Statistics show that the worldwide prevalence and mortality of kidney cancer rose continuously from 1990 to 2017. In China, the prevalence and mortality of kidney cancer have been relatively stable over the last decade.
A support vector machine (SVM) is a generalized linear classifier that performs binary classification of data by supervised learning; its decision boundary is the maximum-margin hyperplane fitted to the training samples. The SVM model represents instances as points in space, mapped so that instances of the separate classes are divided by as wide a margin as possible. New instances are then mapped into the same space, and their class is predicted from the side of the margin on which they fall. When the training data are linearly separable, the SVM learns a classifier by hard-margin maximization. When the training data are not linearly separable, the SVM learns a classifier using the kernel trick and soft-margin maximization. SVMs perform well on medium-sized data sets whose features have similar meaning and are also suitable for small data sets. In general, SVMs predict well on data sets with fewer than 10,000 samples. SVMs are widely used in disease diagnosis, tumor classification, tumor gene recognition, and similar tasks.
Early diagnosis of tumors has long been a difficult problem in medicine. Most existing early diagnosis methods observe the expression level of a single marker or a single class of markers, and rarely achieve an ideal diagnostic effect. Because the expression distributions of these markers in tumor patients and the normal population partially overlap, it is difficult to define a cut-off value that cleanly separates tumor patients from the normal population. Therefore, combining the expression signatures of multiple markers may be an effective approach to early tumor diagnosis. Long intergenic non-coding RNA (lincRNA) is a class of non-coding single-stranded RNA molecules longer than 200 nucleotides that are located in intergenic non-coding sequences. lincRNA has no coding potential and is not conserved between species. Research shows that lincRNA participates in regulating the expression of multiple genes, is relatively stably expressed in the human body, and is easy to detect. Because the expression distributions of individual lincRNA molecules in tumor patients and the normal population overlap, it is difficult to define a critical value for early diagnosis from a single lincRNA.
Therefore, there is a need to develop a more stable prediction model that combines the expression profiles of multiple differentially expressed lincRNAs to aid the early prediction of renal papillary cell carcinoma.
Disclosure of Invention
In view of the above, the invention provides an early renal papillary cell carcinoma prediction method based on lincRNA expression profile combination characteristics, which can accurately predict the stage I/II renal papillary cell carcinoma.
In order to solve the technical problems, the invention discloses a characteristic lincRNA expression profile combination, which comprises AC026401.3, AC040977.1, AC091563.1, AC104825.1, AF127577.4, ATP2B1-AS1, CRNDE, DNAJC3-DT, LINC00476, LINC01503, LINC02381, LINC02604, LINC02615, MIR210HG, MIR22HG, MIR4435-2HG, SEPT7-AS1 and U91328.1, and the nucleotide sequence of the characteristic lincRNA expression profile combination is shown in SEQ ID NO.1-SEQ ID NO. 18.
The invention also discloses a renal papillary cell carcinoma early-stage prediction method based on lincRNA expression profile combination characteristics, which comprises the following steps:
step 1, obtaining characteristic lincRNA stably and differentially expressed by patients in the early stage of renal papillary cell carcinoma;
step 2, selecting characteristic lincRNA expression data, and carrying out data standardization on each sample;
step 3, constructing an early prediction model for the standardized data by using a support vector machine;
step 4, carrying out early prediction according to the characteristic lincRNA expression level of the patient;
the method is used for purposes other than disease diagnosis and treatment.
Optionally, obtaining the characteristic lincRNA stably and differentially expressed by patients in the early stage of renal papillary cell carcinoma in step 1 specifically comprises:
step 1.1, downloading tumor tissue and para-carcinoma tissue transcriptome data and clinical data of renal papillary cell carcinoma patients from the Genomic Data Commons Data Portal database to obtain the read counts (sequencing read values) of the tumor tissue gene expression profiles of the patients, and carrying out logarithmic conversion;
step 1.2, selecting lincRNA whose read counts are greater than or equal to 10 in all samples and taking the logarithm of the read counts of all selected lincRNA; setting the total number of samples as n, the total number of screened lincRNA as m, v as the read counts of a lincRNA, and u as the expression value after taking the logarithm, then:
$$u_{ij} = \log_2 v_{ij}, \quad i \in (1, n),\ j \in (1, m) \tag{1}$$
wherein i is the sample number, j is the lincRNA number, $u_{ij}$ is the log-transformed expression value of the jth lincRNA in the ith sample, and $v_{ij}$ is the read counts value of the jth lincRNA in the ith sample;
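As an illustration of steps 1.1 and 1.2, the abundance filter and equation (1) can be sketched with NumPy. The count matrix below is invented for the example, and the names `counts`, `keep`, `v` and `u` are hypothetical; the source does not state which software was used.

```python
import numpy as np

# Hypothetical read-count matrix: rows are samples (i), columns are lincRNAs (j).
counts = np.array([
    [16,   8, 1024],
    [32,  64,  512],
    [64, 128,  256],
])

# Step 1.2: keep lincRNAs whose read counts are >= 10 in every sample.
keep = (counts >= 10).all(axis=0)
v = counts[:, keep]

# Equation (1): u_ij = log2(v_ij).
u = np.log2(v)

print(keep)
print(u)
```

The second lincRNA is dropped because one sample has only 8 reads; requiring counts of at least 10 also guarantees that the logarithm is never taken of zero.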
step 1.3, selecting renal papillary cell carcinoma patients with disease stages of I and II, recording the patients as renal papillary cell carcinoma early-stage patients, and recording the total number of the renal papillary cell carcinoma early-stage patients as n';
step 1.4, selecting lincRNA whose coefficient of variation is smaller than 0.2 in both tumor and normal samples; setting μ as the expression mean of a lincRNA in all samples and σ as the standard deviation, the coefficient of variation is calculated as:
$$c_{vj} = \frac{\sigma_j}{\mu_j} \tag{2}$$
wherein j is the lincRNA number, $c_v$ is the coefficient of variation, $c_{vj}$ is the coefficient of variation of the jth lincRNA, $\sigma_j$ is the standard deviation of the jth lincRNA, and $\mu_j$ is the expression mean of the jth lincRNA; setting $m_1$ as the total number of stably expressed lincRNA, then:
$$m_1 = m\{c_{vj} < 0.2\}, \quad j \in (1, m) \tag{3}$$
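The stability filter of step 1.4 can be sketched as follows. This is a minimal example under two assumptions that the source does not settle: the coefficient of variation is computed separately within the tumor group and the normal group, and the sample standard deviation (`ddof=1`) is used. The matrices and the function name `coefficient_of_variation` are hypothetical.

```python
import numpy as np

def coefficient_of_variation(u):
    """Return sigma_j / mu_j for each lincRNA (column) of a samples x lincRNAs matrix."""
    # ddof=1 (sample standard deviation) is an assumption, not stated in the source.
    return u.std(axis=0, ddof=1) / u.mean(axis=0)

# Hypothetical log2 expression values: rows are samples, columns are lincRNAs.
tumor = np.array([[10.0, 2.0], [11.0, 6.0], [9.0, 10.0]])
normal = np.array([[8.0, 5.0], [8.5, 5.5], [7.5, 4.5]])

# A lincRNA counts as stably expressed only if its CV is below 0.2 in both groups.
stable = (coefficient_of_variation(tumor) < 0.2) & (coefficient_of_variation(normal) < 0.2)

print(stable)
print(int(stable.sum()))  # m1, the number of stably expressed lincRNAs
```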
step 1.5, selecting lincRNA that is differentially expressed between tumor samples and normal samples; the log-transformed expression values are used to calculate the log fold change f of each lincRNA between tumor and normal samples:
$$f_j = \mu_{1j} - \mu_{2j} \tag{4}$$
wherein j is the lincRNA number, $f_j$ is the fold change of the jth lincRNA, $\mu_{1j}$ is the expression mean of the jth lincRNA in tumor samples, and $\mu_{2j}$ is the expression mean of the jth lincRNA in normal samples;
the expression difference of each lincRNA between tumor and normal samples is then compared using an independent-samples t-test, wherein $n_1$ is the number of tumor samples, $n_2$ is the number of normal samples, $\mu_1$ is the expression mean of the lincRNA in tumor samples, $\mu_2$ is the expression mean of the lincRNA in normal samples, $\sigma_1^2$ is the variance of the lincRNA in tumor samples, and $\sigma_2^2$ is the variance of the lincRNA in normal samples;
the p values obtained from all t-tests are corrected using the false discovery rate (FDR); setting q as the FDR-corrected value and r as the rank of a p value among the $m_1$ lincRNA, then:
$$q_j = \frac{p_j \cdot m_1}{r_j} \tag{6}$$
wherein j is the lincRNA number, $q_j$ is the FDR-corrected value of the jth lincRNA, $p_j$ is the p value from the t-test of the jth lincRNA, and $r_j$ is the rank of the p value of the jth lincRNA among the $m_1$ lincRNA;
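The FDR correction described above, with q computed from the p value, the number of tested lincRNA $m_1$ and the ascending rank r, can be sketched in plain Python. The function name `fdr_q_values` and the p values are hypothetical. The sketch applies the stated relation $q_j = p_j \cdot m_1 / r_j$ directly; the usual Benjamini-Hochberg procedure additionally enforces that q values do not decrease with rank, which is left out here because the source does not mention it.

```python
def fdr_q_values(p_values):
    """Return q_j = p_j * m1 / r_j, where r_j is the 1-based ascending rank of p_j."""
    m1 = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m1), key=lambda j: p_values[j])
    q = [0.0] * m1
    for rank, j in enumerate(order, start=1):
        q[j] = p_values[j] * m1 / rank
    return q

p = [0.01, 0.04, 0.03, 0.20]  # hypothetical t-test p values
q = fdr_q_values(p)
print(q)
```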
finally, lincRNA with an absolute fold change f greater than 1 and an FDR-corrected q value less than or equal to 0.05 are selected and recorded as characteristic lincRNA; setting the total number of characteristic lincRNA as $m_2$, then:
$$m_2 = m_1\{|f_j| \ge 1,\ q_j \le 0.05\}, \quad j \in (1, m_1) \tag{7}$$
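The final selection in equation (7) can be sketched as a boolean mask. The example assumes that the log fold change is the difference between the tumor and normal means of the log2 expression values; the data, the q values and all variable names are invented for illustration.

```python
import numpy as np

# Hypothetical log2 expression values: rows are samples, columns are lincRNAs.
tumor = np.array([[9.0, 5.0, 7.5], [11.0, 5.4, 8.5]])
normal = np.array([[7.0, 5.1, 7.0], [8.0, 5.1, 8.0]])
q = np.array([0.01, 0.60, 0.03])  # hypothetical FDR-corrected values

# Assumed form of the log fold change: difference of the group means of log2 values.
f = tumor.mean(axis=0) - normal.mean(axis=0)

# Equation (7): keep lincRNAs with |f_j| >= 1 and q_j <= 0.05.
selected = (np.abs(f) >= 1) & (q <= 0.05)

print(f)
print(selected)
print(int(selected.sum()))  # m2, the number of characteristic lincRNAs
```

The third lincRNA has a small q value but a fold change of only 0.5, so it is rejected; both conditions must hold.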
Optionally, selecting the characteristic lincRNA expression data and standardizing the data of each sample in step 2 specifically comprises:
$$u'_{ij} = \frac{u_{ij} - \mu_i}{\sigma_i} \tag{8}$$
wherein i is the sample number and j is the characteristic lincRNA number; $\mu_i$ is the mean expression of all characteristic lincRNA in the ith sample, $\sigma_i$ is the standard deviation of all characteristic lincRNA in the ith sample, $u_{ij}$ is the log-transformed characteristic lincRNA expression value, and $u'_{ij}$ is the standardized lincRNA value.
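The per-sample standardization of step 2 can be sketched as a row-wise z-score. Note that the mean and standard deviation are taken across the characteristic lincRNAs of one sample, not across samples. The use of the sample standard deviation (`ddof=1`) and the name `standardize_per_sample` are assumptions.

```python
import numpy as np

def standardize_per_sample(u):
    """Z-score each row (sample) across its characteristic lincRNA values."""
    mu = u.mean(axis=1, keepdims=True)
    sigma = u.std(axis=1, ddof=1, keepdims=True)  # ddof=1 is an assumption
    return (u - mu) / sigma

# Two hypothetical samples with the same relative pattern on different scales,
# as might come from two different measurement platforms.
u = np.array([
    [4.0, 6.0, 8.0],
    [10.0, 10.5, 11.0],
])
z = standardize_per_sample(u)
print(z)
```

Both rows standardize to the same values, which illustrates why a model trained on standardized values is less sensitive to platform-dependent shifts and scalings of the raw expression values.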
Optionally, the constructing an early prediction model for the normalized data by using the support vector machine in step 3 specifically includes:
step 3.1, grouping all samples: 80% of all samples are assigned to the training set plus validation set, and the remaining 20% are assigned to the test set; the training set and validation set are used for 5-fold cross-validation, namely, this 80% is divided into 5 equal groups, each group in turn serves as the validation set while the other 4 groups serve as the training set; for given parameters, the training set is used to construct the model and the validation set is used to check the accuracy of the model;
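The grouping in step 3.1 can be sketched in plain Python. The function `split_samples`, the random seed and the sample identifiers are hypothetical, and the sketch shuffles without stratifying by class, which the source neither requires nor excludes.

```python
import random

def split_samples(sample_ids, test_fraction=0.2, n_folds=5, seed=0):
    """Hold out a test set, then build 5-fold (training, validation) splits from the rest."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    test, dev = ids[:n_test], ids[n_test:]
    # Partition the remaining 80% into n_folds groups of (nearly) equal size.
    folds = [dev[k::n_folds] for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        validation = folds[k]
        training = [s for i, fold in enumerate(folds) if i != k for s in fold]
        splits.append((training, validation))
    return test, splits

test_ids, cv_splits = split_samples(range(100))
print(len(test_ids), [(len(tr), len(va)) for tr, va in cv_splits])
```

With 100 samples this yields 20 test samples and five splits of 64 training and 16 validation samples; every non-test sample is used for validation exactly once.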
step 3.2, optimal parameter screening, wherein the parameter gamma in the SVM controls the width of a Gaussian kernel, and C is a regularization parameter and limits the importance of each point; the parameter grid is set as:
gamma=[0.001,0.01,0.1,1,10,100] (9)
C=[0.001,0.01,0.1,1,10,100] (10)
in the cross-validation, a model is constructed for each combination of the parameters gamma and C in turn, and the validation set is then used to check the accuracy of the model; for each parameter combination, each of the 5 rounds of 5-fold cross-validation yields 1 accuracy, giving 5 accuracies in total, and the parameter combination with the highest average accuracy over the 5 rounds is selected as the optimal parameters;
step 3.3, constructing a model using the optimal parameters and the data of the training set and validation set, and finally evaluating the model on the test set, wherein the evaluation indexes comprise accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic (ROC) curve (AUC); in the test set, the number of samples that are actually tumor and predicted as tumor is defined as true positive (TP), actually normal but predicted as tumor as false positive (FP), actually tumor but predicted as normal as false negative (FN), and actually normal and predicted as normal as true negative (TN); the evaluation indexes are calculated as:
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
the accuracy, precision, recall, specificity, F1 score and AUC return values between 0 and 1; the higher the accuracy, the higher the overall prediction efficiency of the model; a higher precision indicates fewer type I errors; a higher recall indicates fewer type II errors; a high specificity indicates that few negative examples are mixed into the samples predicted as positive; the F1 score is a comprehensive index, namely the harmonic mean of precision and recall; MCC is the correlation coefficient between the observed and predicted binary classifications and returns a value between -1 and 1, where 1 represents perfect prediction, 0 represents prediction no better than random, and -1 represents complete disagreement between prediction and observation; a higher AUC indicates a higher probability that the classifier ranks a positive example above a negative example; therefore, the closer these indexes are to 1, the better the overall prediction effect of the model;
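The confusion-matrix indexes named in step 3.3 follow their standard definitions and can be sketched as below. The function name `evaluate` and the counts are hypothetical and are not results from the source; AUC is omitted because it needs ranked prediction scores rather than the four counts, and the sketch does not guard against empty classes (zero denominators).

```python
from math import sqrt

def evaluate(tp, fp, fn, tn):
    """Compute standard binary-classification indexes from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }

# Hypothetical test-set counts, for illustration only.
scores = evaluate(tp=45, fp=1, fn=2, tn=12)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With unbalanced classes such as these, MCC comes out noticeably lower than accuracy, which is why it is useful to report alongside the other indexes.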
step 3.4, if the evaluation indexes are all greater than 0.9, the model is considered to have a good prediction effect, and all data and the optimal parameter combination are then used to construct the final prediction model.
Optionally, the optimal parameters are: gamma is 0.01 and C is 100.
Optionally, performing early prediction according to the characteristic lincRNA expression level of the patient in step 4 comprises:
step 4.1, standardizing the characteristic lincRNA expression data of the prediction sample; setting u as the characteristic lincRNA expression value of the prediction sample, μ as the mean of the characteristic lincRNA expression of the prediction sample, and σ as the standard deviation of the characteristic lincRNA of the prediction sample, then:
$$u'_j = \frac{u_j - \mu}{\sigma}$$
wherein j is the characteristic lincRNA number and $u'_j$ is the standardized lincRNA value;
step 4.2, substituting the standardized lincRNA values of the prediction sample into the final prediction model for prediction; a prediction of 1 indicates renal papillary cell carcinoma, and a prediction of 0 indicates normal.
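Step 4 can be sketched end to end, assuming scikit-learn's `SVC` as the support vector machine (the source does not name a library). The training data are synthetic stand-ins for 18 log2 expression values per sample, not the patent's data, and `pattern`, `standardize` and `new_sample` are hypothetical names. The values gamma = 0.01 and C = 100 are the optimal parameters reported in the text.

```python
import numpy as np
from sklearn.svm import SVC

def standardize(u):
    """Z-score the characteristic lincRNA values of each sample (last axis)."""
    u = np.asarray(u, dtype=float)
    return (u - u.mean(axis=-1, keepdims=True)) / u.std(axis=-1, ddof=1, keepdims=True)

rng = np.random.default_rng(0)
pattern = np.r_[np.ones(9), -np.ones(9)]  # hypothetical tumor signature over 18 lincRNAs

# Synthetic training data: 40 normal (label 0) and 40 tumor (label 1) samples.
normal = rng.normal(8.0, 0.3, size=(40, 18))
tumor = rng.normal(8.0, 0.3, size=(40, 18)) + 1.5 * pattern
X = standardize(np.vstack([normal, tumor]))
y = np.r_[np.zeros(40, dtype=int), np.ones(40, dtype=int)]

# Final model with a Gaussian (RBF) kernel and the reported optimal parameters.
model = SVC(kernel="rbf", gamma=0.01, C=100).fit(X, y)

# Step 4.1: standardize the new sample; step 4.2: predict (1 = carcinoma, 0 = normal).
new_sample = 8.0 + 1.5 * pattern  # hypothetical patient profile
prediction = int(model.predict(standardize(new_sample).reshape(1, -1))[0])
print(prediction)
```

The new sample must be standardized with its own mean and standard deviation, exactly as the training samples were, before it is passed to the model.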
Compared with the prior art, the invention can obtain the following technical effects:
1) The invention predicts quickly: the prediction model constructed by the invention can rapidly predict large numbers of samples, and predicting 100 samples takes only a few seconds.
2) The invention has high accuracy: the accuracy and precision of the constructed prediction model both exceed 90%, and the area under the ROC curve (AUC) reaches 0.988.
3) The influence of platform heterogeneity is small: lincRNA expression values measured on different analysis platforms differ considerably, but the invention predicts from standardized characteristic lincRNA expression values and is therefore less affected by platform heterogeneity.
Of course, it is not necessary for any one product in which the invention is practiced to achieve all of the above-described technical effects simultaneously.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of data screening and model building according to the present invention;
FIG. 2 is a cross-validation parameter optimization process for a support vector machine model according to the present invention;
FIG. 3 is a diagram of a test set evaluation index for a support vector machine model according to the present invention;
FIG. 4 is a support vector machine model test set ROC curve of the present invention.
Detailed Description
The embodiments of the invention are described in detail below with reference to the accompanying drawings, so that the way in which the invention applies technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented.
The invention discloses an early prediction method of renal papillary cell carcinoma based on lincRNA expression profile combination characteristics, which comprises the following steps:
step 1.1, downloading tumor tissue and para-carcinoma tissue transcriptome data and clinical data of renal papillary cell carcinoma patients from the Genomic Data Commons Data Portal database, obtaining the sequencing read counts of the tumor tissue gene expression profiles of the patients, and carrying out logarithmic conversion;
step 1.2, selecting lincRNA with a certain expression abundance, namely lincRNA whose read counts are greater than or equal to 10 in all samples. Taking the logarithm of the read counts of all selected lincRNA, setting the total number of samples as n, the total number of screened lincRNA as m, v as the read counts of a lincRNA, and u as the expression value after taking the logarithm, then:
$$u_{ij} = \log_2 v_{ij}, \quad i \in (1, n),\ j \in (1, m) \tag{1}$$
wherein i is the sample number, j is the lincRNA number, $u_{ij}$ is the log-transformed expression value of the jth lincRNA in the ith sample, and $v_{ij}$ is the read counts value of the jth lincRNA in the ith sample.
Step 1.3, selecting renal papillary cell carcinoma patients with disease stages of I and II, recording the patients as renal papillary cell carcinoma early-stage patients, and recording the total number of the renal papillary cell carcinoma early-stage patients as n';
step 1.4, selecting lincRNA stably expressed in tumor samples and normal samples, namely lincRNA whose coefficient of variation is smaller than 0.2 in both tumor samples and normal samples; setting μ as the expression mean of a lincRNA in all samples and σ as the standard deviation, the coefficient of variation is calculated as:
$$c_{vj} = \frac{\sigma_j}{\mu_j} \tag{2}$$
wherein j is the lincRNA number, $c_v$ is the coefficient of variation, $c_{vj}$ is the coefficient of variation of the jth lincRNA, $\sigma_j$ is the standard deviation of the jth lincRNA, and $\mu_j$ is the expression mean of the jth lincRNA; setting $m_1$ as the total number of stably expressed lincRNA, then:
$$m_1 = m\{c_{vj} < 0.2\}, \quad j \in (1, m) \tag{3}$$
step 1.5, selecting lincRNA that is differentially expressed between tumor samples and normal samples. The log-transformed expression values are used to calculate the log fold change f of each lincRNA between tumor and normal samples:
$$f_j = \mu_{1j} - \mu_{2j} \tag{4}$$
wherein j is the lincRNA number, $f_j$ is the fold change of the jth lincRNA, $\mu_{1j}$ is the expression mean of the jth lincRNA in tumor samples, and $\mu_{2j}$ is the expression mean of the jth lincRNA in normal samples.
The expression difference of each lincRNA between tumor and normal samples is then compared using an independent-samples t-test, wherein $n_1$ is the number of tumor samples, $n_2$ is the number of normal samples, $\mu_1$ is the expression mean of the lincRNA in tumor samples, $\mu_2$ is the expression mean of the lincRNA in normal samples, $\sigma_1^2$ is the variance of the lincRNA in tumor samples, and $\sigma_2^2$ is the variance of the lincRNA in normal samples.
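The test statistic can be sketched from the quantities listed above (group sizes, means and variances). The exact form of the t-test used in the source is not recoverable from the text, so the unequal-variance (Welch) form below is an assumption, and the function name `t_statistic` and the values are hypothetical. Obtaining a p value additionally requires the t distribution, for example via `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(tumor, normal):
    """Two-sample t statistic in the unequal-variance (Welch) form; an assumed choice."""
    n1, n2 = len(tumor), len(normal)
    mu1, mu2 = mean(tumor), mean(normal)
    var1, var2 = variance(tumor), variance(normal)  # sample variances (n - 1 denominator)
    return (mu1 - mu2) / sqrt(var1 / n1 + var2 / n2)

# Hypothetical log2 expression values of one lincRNA in each group.
t = t_statistic([10.0, 11.0, 9.0, 10.0], [8.0, 8.5, 7.5, 8.0])
print(round(t, 4))
```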
The p values obtained from all t-tests are corrected using the false discovery rate (FDR); setting q as the FDR-corrected value and r as the rank of a p value among the $m_1$ lincRNA, then:
$$q_j = \frac{p_j \cdot m_1}{r_j} \tag{6}$$
wherein j is the lincRNA number, $q_j$ is the FDR-corrected value of the jth lincRNA, $p_j$ is the p value from the t-test of the jth lincRNA, and $r_j$ is the rank of the p value of the jth lincRNA among the $m_1$ lincRNA.
Finally, lincRNA with the absolute value of the fold change f larger than 1 and the q value smaller than or equal to 0.05 after FDR correction is selected and marked as characteristic lincRNA, and the total number of the characteristic lincRNA is set as m2Then, there are:
m2=m1{|fj|≥1,qj≤0.05},j∈(1,m1) (7)
Through this screening, 18 characteristic lincRNAs of renal papillary cell carcinoma are finally obtained, as shown in Table 1. The nucleotide probe sequences of the 18 characteristic lincRNAs are shown in Table 2.
TABLE 1. Characteristic lincRNAs of renal papillary cell carcinoma
TABLE 2. Nucleotide probe sequences of the characteristic lincRNAs of renal papillary cell carcinoma
Step 2: select the characteristic lincRNA expression data and standardize the data of each sample, specifically:
where i is the sample number and j is the characteristic lincRNA number; μi is the mean expression of all characteristic lincRNAs in the ith sample, σi is the standard deviation of all characteristic lincRNAs in the ith sample, uij is the log-transformed expression value of the characteristic lincRNA, and uij' is the normalized lincRNA value.
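The per-sample standardization can be sketched as a z-score computed across the characteristic lincRNAs of a single sample. This assumes the usual form (value minus mean, divided by standard deviation) and the sample standard deviation; the function name and the toy values are hypothetical.

```python
from statistics import mean, stdev


def standardize_sample(expression):
    """Z-score the characteristic lincRNA values of one sample."""
    mu = mean(expression)
    sigma = stdev(expression)
    return [(u - mu) / sigma for u in expression]


# Hypothetical log2 expression values of three lincRNAs in one sample.
z = standardize_sample([2.0, 4.0, 6.0])
```

Because the mean and standard deviation come from the sample itself rather than from the cohort, a new sample can be standardized on its own, which is what step 4.1 relies on.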
Step 3: construct an early prediction model from the standardized data using a support vector machine, specifically:
Step 3.1: group all samples. 80% of the samples are assigned to the training set plus validation set, and the remaining 20% to the test set. The training set and validation set are used for 5-fold cross-validation: they are divided into 5 equal groups, each group serves in turn as the validation set, and the other 4 groups serve as the training set. For given parameters, the training set is used to construct the model and the validation set is used to check the accuracy of the model, as detailed in FIG. 1.
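The grouping in step 3.1 can be sketched in plain Python. The random shuffle, the fixed seed and the function names are assumptions made for illustration; the text does not say whether the split is stratified by class.

```python
import random


def split_samples(sample_ids, test_fraction=0.2, n_folds=5, seed=0):
    """Return (test_set, folds): a held-out test set and 5 CV groups."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    test_set, development = ids[:n_test], ids[n_test:]
    # Deal the remaining samples into n_folds groups of near-equal size.
    folds = [development[k::n_folds] for k in range(n_folds)]
    return test_set, folds


def cross_validation_rounds(folds):
    """Yield (training, validation), using each group once for validation."""
    for k, validation in enumerate(folds):
        training = [s for i, fold in enumerate(folds) if i != k for s in fold]
        yield training, validation


# Hypothetical cohort of 100 sample indices.
test_set, folds = split_samples(range(100))
rounds = list(cross_validation_rounds(folds))
```

With 100 samples this gives a test set of 20 and five rounds of 64 training and 16 validation samples each.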
Step 3.2: screen for the optimal parameters. In the SVM, the parameter gamma controls the width of the Gaussian kernel, and C is a regularization parameter that limits the importance of each point. The parameter grid is set as:
gamma=[0.001,0.01,0.1,1,10,100] (9)
C=[0.001,0.01,0.1,1,10,100] (10)
In cross-validation, a model is constructed with each pairwise combination of the parameters gamma and C in turn, and the validation set is then used to check the model accuracy. For each parameter combination, each round of the 5-fold cross-validation yields one accuracy, so the 5 rounds yield 5 accuracies. The parameter combination with the highest average accuracy over the 5 rounds is selected as the optimal parameters.
FIG. 2 shows the cross-validation parameter optimization process. The cross-validation accuracy of the model is highest, at 1.000, when gamma is 0.01 and C is 100. The optimal parameters of the model are therefore gamma = 0.01 and C = 100.
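The parameter screening of step 3.2 amounts to an exhaustive search over the 6 × 6 grid. The sketch below shows only the search loop; `fit_and_score` is a hypothetical caller-supplied function that would train an RBF-kernel SVM on the training group and return its accuracy on the validation group. The stand-in scorer is not a real model and its peak is arbitrary.

```python
from itertools import product
from statistics import mean

GAMMAS = [0.001, 0.01, 0.1, 1, 10, 100]
CS = [0.001, 0.01, 0.1, 1, 10, 100]


def grid_search(rounds, fit_and_score):
    """Return (best (gamma, C), best mean accuracy) over the grid.

    rounds is a list of (training, validation) pairs, one per CV round.
    """
    best_params, best_score = None, -1.0
    for gamma, c in product(GAMMAS, CS):
        scores = [fit_and_score(tr, va, gamma, c) for tr, va in rounds]
        average = mean(scores)
        if average > best_score:
            best_params, best_score = (gamma, c), average
    return best_params, best_score


# Stand-in scorer so the sketch runs without a model. It is NOT an SVM;
# it simply returns a higher accuracy at one arbitrary grid point.
def toy_scorer(training, validation, gamma, c):
    return 0.9 if (gamma, c) == (0.1, 10) else 0.5


dummy_rounds = [([], [])] * 5
best_params, best_score = grid_search(dummy_rounds, toy_scorer)
```

Ties are resolved in favor of the first combination encountered, which is a design choice of this sketch rather than something the text prescribes.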
Step 3.3: construct a model using the optimal parameters and the data of the training set and validation set, and finally evaluate the model using the test set. The evaluation indices include accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic (ROC) curve (AUC). In the test set, samples that are actually tumor and predicted as tumor are counted as true positives (TP), samples that are actually normal but predicted as tumor as false positives (FP), samples that are actually tumor but predicted as normal as false negatives (FN), and samples that are actually normal and predicted as normal as true negatives (TN). The evaluation indices are calculated as:
Among these evaluation indices, accuracy, precision, recall, specificity, F1 score and AUC take values between (0, 1). Higher accuracy indicates higher overall prediction performance of the model; higher precision indicates fewer type I errors; higher recall indicates fewer type II errors; higher specificity indicates that fewer negative examples are mixed into the samples predicted as positive; the F1 score is a composite index, the harmonic mean of precision and recall; MCC is the correlation coefficient between the observed and predicted binary classifications and takes values between (-1, 1), where 1 represents perfect prediction, 0 represents prediction no better than random, and -1 represents complete disagreement between prediction and observation; a higher AUC indicates a higher probability that the classifier ranks a positive example above a negative example. Therefore, the closer these indices are to 1, the better the overall prediction performance of the model.
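The count-based indices can be computed from the four confusion-matrix entries using their standard definitions, as in the sketch below. The function name and the counts are hypothetical and are not results from the test set; AUC is omitted because it requires the classifier's continuous scores rather than counts.

```python
from math import sqrt


def evaluate(tp, fp, fn, tn):
    """Return the count-based evaluation indices as a dictionary."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Hypothetical confusion-matrix counts.
metrics = evaluate(tp=45, fp=5, fn=5, tn=45)
```

The sketch does not guard against zero denominators, which occur when a class is never predicted or never present.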
Step 3.4: if all evaluation indices are greater than 0.9, the model has a good prediction effect, and the final prediction model is then constructed from all the data using the optimal parameter combination.
FIG. 3 shows the accuracy, precision, recall, specificity, F1 score and MCC among the above evaluation indices; all 6 indices are greater than 0.91. FIG. 4 shows the ROC curve and the AUC, which is 0.988 in the test set. These evaluation indices show that the model has a good prediction effect. The final prediction model is therefore constructed from all the data using the optimal parameter combination.
Step 4: perform early prediction according to the expression levels of the patient's characteristic lincRNAs, specifically:
Step 4.1: standardize the characteristic lincRNA expression data of the prediction sample. Let u be the characteristic lincRNA expression value of the prediction sample, μ the mean expression of the characteristic lincRNAs of the prediction sample, and σ the standard deviation of the characteristic lincRNAs of the prediction sample; the formula is:
where j is the characteristic lincRNA number and uj' is the normalized expression value of the jth characteristic lincRNA.
Ten samples were randomly selected for prediction and were excluded when the final prediction model was constructed. The numbers of these 10 samples and their normalized characteristic lincRNA values are shown in Table 3.
TABLE 3. Sample numbers and normalized characteristic lincRNA values of the 10 samples
Step 4.2: substitute the normalized lincRNA values of the prediction sample into the final prediction model for prediction. A prediction of 1 indicates renal papillary cell carcinoma, and a prediction of 0 indicates normal.
The sample numbers, corresponding TCGA numbers, actual states and predicted results of the 10 samples are shown in Table 4. The predicted results of all 10 samples agree with the actual states, which shows that the invention can accurately predict renal papillary cell carcinoma at an early stage.
TABLE 4. Sample numbers, corresponding TCGA numbers, actual states and predicted states of the 10 samples
In conclusion, the characteristic lincRNA expression profile combination of the invention has high prediction accuracy and can effectively predict renal papillary cell carcinoma at an early stage. In addition, the method is platform-independent and can make predictions on data from various sources.
While the foregoing description shows and describes several preferred embodiments of the invention, it is to be understood, as noted above, that the invention is not limited to the forms disclosed herein, but is not to be construed as excluding other embodiments and is capable of use in various other combinations, modifications, and environments and is capable of changes within the scope of the inventive concept as expressed herein, commensurate with the above teachings, or the skill or knowledge of the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
SEQUENCE LISTING
<110> Chinese academy of sciences
<120> a characteristic lincRNA expression profile combination characteristic and early prediction method of renal papillary cell carcinoma
<130> 2019
<160> 18
<170> PatentIn version 3.3
<210> 1
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 1
aacggggttt caccatgttg gccatgctgg 30
<210> 2
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 2
ggcctgagcc cccgggttgg agcgaacata 30
<210> 3
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 3
cactggctta aaaaaatttt ttatagcatc 30
<210> 4
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 4
tcagtgcaag ttcatgaagt gaaagcaaat 30
<210> 5
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 5
attgatgggc atttcagttg tttccagttt 30
<210> 6
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 6
ccggcgctga ggtgcacggc gagaagcccg 30
<210> 7
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 7
atgcaaagaa catggaaaaa tcaaagtgct 30
<210> 8
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 8
tcccctcgat tcttccccag acaaacccgg 30
<210> 9
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 9
ttgcaatcac actgtgagaa actctaccct 30
<210> 10
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 10
aaatgcccac gataaacaaa taataaatag 30
<210> 11
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 11
gcaagtatac aaatttattg aaaaggaaga 30
<210> 12
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 12
atggcctctg cctcctcaca gtggaccccc 30
<210> 13
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 13
gtgaacctag ctcagaagtt tgcaccatga 30
<210> 14
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 14
cattctcaga gcacaaagac cccatgatct 30
<210> 15
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 15
ataagcagcc tcaaggacca agaaccatct 30
<210> 16
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 16
cactgggtcc tgagtctctt gttctggaag 30
<210> 17
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 17
ccgacatccc ctccccccct ccgcgaccag 30
<210> 18
<211> 30
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 18
ctggagcagg gcgggctgca cgactcgcga 30
Claims (7)
1. A combination of characteristic lincRNA expression profiles comprising AC026401.3, AC040977.1, AC091563.1, AC104825.1, AF127577.4, ATP2B1-AS1, CRNDE, DNAJC3-DT, LINC00476, LINC01503, LINC02381, LINC02604, LINC02615, MIR210HG, MIR22HG, MIR4435-2HG, SEPT7-AS1 and U91328.1, the nucleotide sequences of which are set forth in SEQ ID No.1-SEQ ID No. 18.
2. A method for early prediction of renal papillary cell carcinoma based on a combination of lincRNA expression profile features, comprising the steps of:
step 1, obtaining characteristic lincRNA stably and differentially expressed by a patient with renal papillary cell carcinoma at an early stage;
step 2, selecting characteristic lincRNA expression data, and carrying out data standardization on each sample;
step 3, constructing an early prediction model for the standardized data by using a support vector machine;
step 4, carrying out early prediction according to the expression level of the characteristic lincRNA of the patient;
the method is used for purposes other than disease diagnosis and treatment.
3. The prediction method according to claim 2, wherein the characteristic lincRNA stably differentially expressed in the patient with early renal papillary cell carcinoma obtained in step 1 is specifically:
step 1.1, downloading transcriptome data and clinical data of tumor tissue and para-carcinoma tissue of renal papillary cell carcinoma patients from the Genomic Data Commons Data Portal database to obtain the read counts, namely the sequencing read values, of the gene expression profiles of the tumor tissue of the renal papillary cell carcinoma patients, and carrying out logarithmic conversion;
step 1.2, selecting the lincRNAs whose read counts are greater than or equal to 10 in all samples, and taking the logarithm of the read counts of all the lincRNAs; setting the total number of samples as n, the total number of screened lincRNAs as m, v as the read counts of a lincRNA, and u as the expression value after taking the logarithm, then:
uij=log2vij,i∈(1,n),j∈(1,m) (1)
wherein i is the sample number, j is the lincRNA number, uij is the expression value after taking the logarithm for the jth lincRNA in the ith sample, and vij is the read counts of the jth lincRNA in the ith sample;
step 1.3, selecting renal papillary cell carcinoma patients with disease stages of I and II, recording the patients as renal papillary cell carcinoma early-stage patients, and recording the total number of the renal papillary cell carcinoma early-stage patients as n';
step 1.4, selecting the lincRNAs whose coefficient of variation is smaller than 0.2 in both the tumor samples and the normal samples; setting μ as the expression mean value of a lincRNA in all samples and σ as the standard deviation, the coefficient of variation is calculated by the formula:
wherein j is the lincRNA number, cv is the coefficient of variation, cvj is the coefficient of variation of the jth lincRNA, σj is the standard deviation of the jth lincRNA, and μj is the expression mean value of the jth lincRNA; setting m1 as the total number of stably expressed lincRNAs, then:
step 1.5, selecting the lincRNAs that are differentially expressed between the tumor samples and the normal samples; the expression values after taking the logarithm are used to calculate the log fold change f of each lincRNA between the tumor samples and the normal samples, by the formula:
wherein j is the lincRNA number, fj is the fold change of the jth lincRNA, μ1j is the expression mean value of the jth lincRNA in the tumor samples, and μ2j is the expression mean value of the jth lincRNA in the normal samples;
the expression of each lincRNA in the tumor samples and the normal samples is then compared using an independent-samples t-test, formulated as:
wherein n1 is the number of tumor samples, n2 is the number of normal samples, μ1 is the expression mean value of the lincRNA in the tumor samples, μ2 is the expression mean value of the lincRNA in the normal samples, and the two remaining terms are the variance of the lincRNA in the tumor samples and the variance of the lincRNA in the normal samples;
correcting the p-values obtained from all t-tests by using the false discovery rate (FDR), wherein q is the FDR-corrected value and r is the rank of the p-value among the m1 lincRNAs:
wherein j is the lincRNA number, qj is the FDR-corrected value of the jth lincRNA, pj is the p-value of the jth lincRNA from the t-test, and rj is the rank of the p-value of the jth lincRNA among the m1 lincRNAs;
finally, selecting the lincRNAs whose absolute fold change f is greater than 1 and whose FDR-corrected q value is less than or equal to 0.05, and recording them as characteristic lincRNAs; setting the total number of characteristic lincRNAs as m2, then:
m2=m1{|fj|≥1,ηj<0.05},j∈(1,m1) (7)。
4. The prediction method according to claim 2, wherein selecting the characteristic lincRNA expression data and carrying out data standardization on each sample in step 2 is specifically:
wherein i is the sample number and j is the characteristic lincRNA number; μi is the expression mean value of all characteristic lincRNAs in the ith sample, σi is the standard deviation of all characteristic lincRNAs in the ith sample, uij is the characteristic lincRNA expression value after taking the logarithm, and uij' is the normalized lincRNA value.
5. The prediction method according to claim 2, wherein the constructing of the early prediction model for the normalized data by using the support vector machine in the step 3 is specifically:
step 3.1, grouping all samples: 80% of the samples are assigned to the training set and validation set, and the remaining 20% to the test set; the training set and validation set are used for 5-fold cross-validation, namely, they are divided into 5 equal groups, each group is taken in turn as the validation set and the remaining 4 groups as the training set; for given parameters, the training set is used for constructing the model and the validation set is used for checking the accuracy of the model;
step 3.2, optimal parameter screening, wherein the parameter gamma in the SVM controls the width of a Gaussian kernel, and C is a regularization parameter and limits the importance of each point; the parameter grid is set as:
gamma=[0.001,0.01,0.1,1,10,100] (9)
C=[0.001,0.01,0.1,1,10,100] (10)
in the cross-validation, a model is constructed with each pairwise combination of the parameters gamma and C in turn, and the validation set is then used for checking the accuracy of the model; for each parameter combination, each round of the 5-fold cross-validation yields 1 accuracy, so that the 5 rounds yield 5 accuracies, and the parameter combination with the highest average accuracy over the 5 rounds is selected as the optimal parameters;
step 3.3, constructing a model by using the optimal parameters and the data of the training set and validation set, and finally evaluating the model by using the test set, wherein the evaluation indices comprise accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic (ROC) curve (AUC); in the test set, samples that are actually tumor and predicted as tumor are counted as true positives (TP), samples that are actually normal but predicted as tumor as false positives (FP), samples that are actually tumor but predicted as normal as false negatives (FN), and samples that are actually normal and predicted as normal as true negatives (TN); the evaluation indices are calculated by the formulas:
among the above evaluation indices, accuracy, precision, recall, specificity, F1 score and AUC take values between (0, 1); higher accuracy indicates higher overall prediction performance of the model; higher precision indicates fewer type I errors; higher recall indicates fewer type II errors; higher specificity indicates that fewer negative examples are mixed into the samples predicted as positive; the F1 score is a composite index, the harmonic mean of precision and recall; MCC is the correlation coefficient between the observed and predicted binary classifications and takes values between (-1, 1), where 1 represents perfect prediction, 0 represents prediction no better than random, and -1 represents complete disagreement between prediction and observation; a higher AUC indicates a higher probability that the classifier ranks a positive example above a negative example, so that the closer these indices are to 1, the better the overall prediction performance of the model;
step 3.4, if all evaluation indices are greater than 0.9, the model has a good prediction effect, and the final prediction model is then constructed from all the data using the optimal parameter combination.
6. The prediction method according to claim 5, wherein the optimal parameters are: gamma is 0.01 and C is 100.
7. The prediction method according to claim 2, wherein carrying out early prediction according to the expression level of the characteristic lincRNA of the patient in step 4 specifically comprises:
step 4.1, standardizing the characteristic lincRNA expression data of the prediction sample; setting u as the characteristic lincRNA expression value of the prediction sample, μ as the expression mean value of the characteristic lincRNAs of the prediction sample, and σ as the standard deviation of the characteristic lincRNAs of the prediction sample, the formula is:
wherein j is the characteristic lincRNA number, and uj' is the normalized lincRNA value;
step 4.2, substituting the normalized lincRNA values of the prediction sample into the final prediction model for prediction; a prediction of 1 indicates renal papillary cell carcinoma, and a prediction of 0 indicates normal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010775535.1A CN111944902A (en) | 2020-08-04 | 2020-08-04 | Early prediction method of renal papillary cell carcinoma based on lincRNA expression profile combination characteristics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111944902A (en) | 2020-11-17 |
Family
ID=73339526
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010775535.1A Pending CN111944902A (en) | 2020-08-04 | 2020-08-04 | Early prediction method of renal papillary cell carcinoma based on lincRNA expression profile combination characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111944902A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011057304A2 (en) * | 2009-11-09 | 2011-05-12 | Yale University | Microrna signatures differentiating uterine and ovarian papillary serous tumors |
CN109055562A (en) * | 2018-10-29 | 2018-12-21 | 深圳市颐康生物科技有限公司 | A kind of biomarker, predict clear-cell carcinoma recurrence and mortality risk method |
CN110273003A (en) * | 2019-07-26 | 2019-09-24 | 安徽医科大学第一附属医院 | A kind of Papillary Renal Cell Carcinoma patient prognosis recurrence detects the foundation of mark tool and its risk evaluation model |
Non-Patent Citations (2)
Title |
---|
HONGHAO CAO ET AL: "A Glycolysis-Based Long Non-coding RNA Signature Accurately Predicts Prognosis in Renal Carcinoma Patients", FRONTIERS IN GENETICS, pages 1 - 11 * |
KATIA TODOERTI ET AL: "DIS3 mutations in multiple myeloma impact the transcriptional signature and clinical outcome", PLASMA CELL DISORDERS, pages 1 - 178 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112410291A (en) * | 2020-11-24 | 2021-02-26 | 艾冬梅 | Dental pulp stem cell odontoblast differentiation inducer |
CN114203254A (en) * | 2021-12-02 | 2022-03-18 | 杭州艾沐蒽生物科技有限公司 | Method for analyzing TCR related to immune characteristics based on artificial intelligence |
CN114203254B (en) * | 2021-12-02 | 2023-05-23 | 杭州艾沐蒽生物科技有限公司 | Method for analyzing immune characteristic related TCR based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Country or region after: China; Address after: 528000 No. 18, No. 1, Jiangwan, Guangdong, Foshan; Applicant after: Foshan University; Address before: 528000 No. 18, No. 1, Jiangwan, Guangdong, Foshan; Applicant before: FOSHAN University; Country or region before: China |