CN112760375A - Characteristic miRNA expression profile combination and endometrial cancer early-stage prediction method - Google Patents



Publication number
CN112760375A
Authority
CN
China
Legal status
Pending
Application number
CN202010775103.0A
Other languages
Chinese (zh)
Inventor
孙婷婷
刘大海
李文兴
亓飞
刘蕾娜
Current Assignee
Foshan University
Original Assignee
Foshan University
Application filed by Foshan University
Priority to CN202010775103.0A
Publication of CN112760375A


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C12Q2600/158 Expression markers
    • C12Q2600/178 Oligonucleotides characterized by their use miRNA, siRNA or ncRNA

Abstract

The invention discloses a characteristic miRNA expression profile combination and an early-stage endometrial cancer prediction method, in which the nucleotide probe sequences of the miRNAs are shown in SEQ ID NO. 1-12. The method for assessing early-stage endometrial cancer risk based on the combined miRNA expression profile features achieves high precision and accuracy (area under the ROC curve, AUC = 0.994). Only the relative expression levels of the 12 miRNAs are required: early-stage endometrial cancer status is then predicted by a support vector machine model, and the result can serve as a reference for early-stage endometrial cancer prediction.

Description

Characteristic miRNA expression profile combination and endometrial cancer early-stage prediction method
Technical Field
The invention belongs to the technical field of biotechnology and medicine, and particularly relates to a characteristic miRNA expression profile combination and an early endometrial cancer prediction method.
Background
Endometrial cancer (endometrial carcinoma) is an epithelial malignancy arising in the endometrium, occurring mostly in perimenopausal and postmenopausal women. It is one of the most common tumors of the female reproductive system; its incidence is closely related to lifestyle and varies greatly between regions. Its most common symptom is postmenopausal or perimenopausal bleeding, and it is difficult to diagnose early. Global Burden of Disease (GBD) data show that in 2017 more than 3 million people worldwide were living with uterine cancer, about 530,000 of them in China. In 2017 about 85,000 people died of uterine cancer worldwide, accounting for 0.15% of all deaths; in China there were about 12,000 such deaths, accounting for 0.12% of all deaths. Statistics show that the global prevalence and mortality of uterine cancer increased continuously from 1990 to 2017. In China, the prevalence of endometrial cancer follows the global trend, while its mortality has remained relatively stable.
A support vector machine (SVM) is a generalized linear classifier that performs binary classification of data by supervised learning; its decision boundary is the maximum-margin hyperplane solved for the training samples. The SVM model represents instances as points in space, mapped so that instances of the different classes are separated by as wide a gap as possible. New instances are then mapped into the same space, and their class is predicted according to which side of the gap they fall on. When the training data are linearly separable, the SVM is learned by hard-margin maximization; when they are not linearly separable, it is learned by soft-margin maximization combined with the kernel trick. SVMs perform well on medium-sized datasets whose features have similar meanings and are also suitable for small datasets; in general, they predict well on datasets with fewer than 10,000 samples. SVMs are widely used in disease diagnosis, tumor classification, tumor gene identification, and similar tasks.
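To make the soft-margin idea above concrete, the following is a minimal Python sketch of a linear soft-margin SVM trained with Pegasos-style stochastic subgradient descent on the hinge loss. It is an illustration only, not the model of this invention (which uses a Gaussian-kernel SVM, see step 3); the function names, the regularization weight `lam`, and the omission of a bias term are assumptions made for brevity.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Soft-margin linear SVM fitted by Pegasos-style stochastic subgradient
    descent on  lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>).

    X: list of feature vectors; y: labels in {-1, +1}.
    For brevity the separating hyperplane passes through the origin
    (no bias term); appending a constant 1.0 feature is a common workaround.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wk * xk for wk, xk in zip(w, X[i]))
            # Subgradient step for the L2 (margin-widening) term.
            w = [(1.0 - eta * lam) * wk for wk in w]
            if margin < 1.0:
                # Point is inside the margin or misclassified: hinge loss active.
                w = [wk + eta * y[i] * xk for wk, xk in zip(w, X[i])]
    return w


def predict(w, x):
    """Class = side of the hyperplane <w, x> = 0 on which x falls."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) >= 0 else -1
```

On linearly separable toy data, `predict` recovers the training labels; library solvers such as scikit-learn's `sklearn.svm.SVC` solve the same optimization exactly and add kernels.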
Early diagnosis of tumors has long been a difficult problem in medicine. Most existing early-diagnosis methods observe the expression level of a single marker or a single class of markers, and they have difficulty achieving the desired diagnostic performance. Because the expression distributions of these markers in tumor patients and in the normal population partially overlap, it is difficult to define a cut-off value that cleanly separates tumor patients from normal individuals. Combining the expression signatures of multiple markers may therefore be an effective approach to early tumor diagnosis. MicroRNAs (miRNAs) are endogenously encoded, non-coding, single-stranded RNA molecules about 21-25 nucleotides long that regulate gene expression in a variety of ways. miRNA expression in the human body is relatively stable and easy to detect. However, because the expression distributions of individual miRNAs overlap between tumor patients and the normal population, it is difficult to define a single-miRNA cut-off value for early diagnosis.
Therefore, there is a need for a prediction model built on a more stable combination of differentially expressed miRNA signatures to support the early prediction of endometrial cancer.
Disclosure of Invention
In view of the above, the present invention provides a characteristic miRNA expression profile combination and an early-stage endometrial cancer prediction method that can accurately predict stage I/II endometrial cancer.
To solve this technical problem, the invention discloses a characteristic miRNA expression profile combination comprising hsa-let-7a-1, hsa-let-7a-2, hsa-let-7a-3, hsa-let-7b, hsa-mir-10b, hsa-mir-126, hsa-mir-1307, hsa-mir-22, hsa-mir-23b, hsa-mir-28, hsa-mir-30a and hsa-mir-30e, whose nucleotide probe sequences are shown in SEQ ID NO. 1-12.
The invention also discloses an endometrial cancer early prediction method based on the characteristic miRNA expression profile combination, which comprises the following steps:
step 1, obtain the characteristic miRNAs that are stably and differentially expressed in early-stage endometrial cancer patients;
step 2, select the characteristic miRNA expression data and normalize each sample;
step 3, construct an early prediction model on the normalized data using a support vector machine;
step 4, perform early prediction from the patient's characteristic miRNA expression levels;
the method is intended for purposes other than disease diagnosis and treatment.
Optionally, obtaining in step 1 the characteristic miRNAs that are stably and differentially expressed in early-stage endometrial cancer patients comprises:
step 1.1, download the transcriptome data and clinical data of the tumor tissues and para-carcinoma tissues of endometrial cancer patients from the Genomic Data Commons (GDC) Data Portal, obtain the gene expression profile read counts (i.e., sequencing read values) of the patients' tumor tissues, and log-transform them;
step 1.2, select miRNAs with sufficient expression abundance, i.e., miRNAs whose read counts are greater than or equal to 10 in all samples, and log-transform the read counts of all retained miRNAs. Let $n$ be the total number of samples, $m$ the total number of retained miRNAs, $v$ the read count of a miRNA, and $u$ its log-transformed expression value; then:
$$u_{ij} = \log_2 v_{ij}, \quad i \in (1, n),\ j \in (1, m) \tag{1}$$
where $i$ is the sample index, $j$ is the miRNA index, $u_{ij}$ is the log-transformed expression value of the $j$-th miRNA in the $i$-th sample, and $v_{ij}$ is the corresponding read count;
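Steps 1.1-1.2 can be sketched in Python as below, assuming the read counts have already been loaded into a dictionary that maps each miRNA name to its per-sample counts; that data layout and the function name are hypothetical.

```python
import math


def filter_and_log2(counts, min_count=10):
    """counts: dict miRNA name -> list of read counts (one per sample).

    Keep only miRNAs whose read count is >= min_count in every sample
    (step 1.2) and return their log2-transformed values u_ij = log2(v_ij),
    as in formula (1).
    """
    return {
        name: [math.log2(v) for v in values]
        for name, values in counts.items()
        if all(v >= min_count for v in values)
    }
```

Because every retained count is at least 10, the logarithm is always defined and positive, which also keeps the coefficient of variation in step 1.4 well defined.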
step 1.3, select patients with stage I or stage II endometrial cancer and designate them as early-stage endometrial cancer patients; their total number is denoted $n'$;
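Step 1.3 amounts to filtering the clinical records by stage. A minimal sketch, assuming (hypothetically) that each patient's stage is available as a text label such as "Stage IA" or "Stage IIIC"; the actual clinical field names and label format are not given in this document.

```python
import re

# Stage I or II, optionally followed by a sub-stage letter (IA, IB, IIA, ...),
# but not III or IV. The "Stage <roman numeral>" label format is an assumption.
EARLY_STAGE = re.compile(r"^Stage (I|II)(?![IV])")


def is_early_stage(stage_label):
    """True for stage I/II labels, False for stage III/IV labels."""
    return EARLY_STAGE.match(stage_label.strip()) is not None


def early_stage_patients(clinical):
    """clinical: dict patient_id -> stage label; returns the n' early-stage ids."""
    return [pid for pid, stage in clinical.items() if is_early_stage(stage)]
```

The negative lookahead is what stops "Stage III..." and "Stage IV..." from matching the shorter "I" or "II" prefix.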
step 1.4, select miRNAs that are stably expressed in both tumor and normal samples, i.e., miRNAs whose coefficient of variation is less than 0.1 in the tumor samples and in the normal samples. Let $\mu$ be the mean expression of a miRNA across the samples and $\sigma$ its standard deviation; the coefficient of variation is calculated as:
$$cv_j = \frac{\sigma_j}{\mu_j} \tag{2}$$
where $j$ is the miRNA index, $cv$ is the coefficient of variation, $cv_j$ is the coefficient of variation of the $j$-th miRNA, $\sigma_j$ is the standard deviation of the $j$-th miRNA, and $\mu_j$ is the mean expression of the $j$-th miRNA. Letting $m_1$ be the total number of stably expressed miRNAs, we have:
$$m_1 = m\{cv_j < 0.1\}, \quad j \in (1, m) \tag{3}$$
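A sketch of the stability filter of step 1.4 (formulas (2)-(3)), assuming tumor and normal log2 values are held in two dictionaries keyed by miRNA name. It reads the condition as "cv < 0.1 within the tumor group and within the normal group"; the text can also be read as a single CV over all samples pooled. Using the sample standard deviation (`statistics.stdev`) is likewise an assumption.

```python
import statistics


def stable_mirnas(tumor, normal, max_cv=0.1):
    """tumor, normal: dict miRNA name -> list of log2 expression values.

    Keep miRNAs whose coefficient of variation cv = sigma / mu (formula (2))
    is below max_cv in BOTH groups; the result has m1 entries (formula (3)).
    """
    def cv(values):
        return statistics.stdev(values) / statistics.mean(values)

    return [name for name in tumor
            if cv(tumor[name]) < max_cv and cv(normal[name]) < max_cv]
```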
step 1.5, select miRNAs that are differentially expressed between tumor and normal samples. Using the log-transformed expression values, compute the (log) fold change $f$ of each miRNA between tumor and normal samples:
$$f_j = \mu_{1j} - \mu_{2j} \tag{4}$$
where $j$ is the miRNA index, $f_j$ is the fold change of the $j$-th miRNA, $\mu_{1j}$ is its mean expression in tumor samples, and $\mu_{2j}$ is its mean expression in normal samples;
then compare the expression of each miRNA between tumor and normal samples with an independent-samples t-test:
$$t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{5}$$
where $n_1$ is the number of tumor samples, $n_2$ the number of normal samples, $\mu_1$ the mean expression of the miRNA in tumor samples, $\mu_2$ its mean expression in normal samples, $s_1^2$ its variance in tumor samples, and $s_2^2$ its variance in normal samples;
the p-values of all t-tests are corrected for the false discovery rate (FDR). Let $q$ be the FDR-corrected value and $r$ the rank of a p-value when the p-values of the $m_1$ miRNAs are sorted in ascending order:
$$q_j = \frac{p_j \, m_1}{r_j} \tag{6}$$
where $j$ is the miRNA index, $q_j$ is the FDR-corrected value of the $j$-th miRNA, $p_j$ is its t-test p-value, and $r_j$ is the rank of $p_j$ among the $m_1$ miRNAs;
finally, select the miRNAs whose absolute fold change $|f|$ is greater than or equal to 1 and whose FDR-corrected $q$ is less than or equal to 0.05 as the characteristic miRNAs; letting $m_2$ be the total number of characteristic miRNAs, we have:
$$m_2 = m_1\{|f_j| \ge 1,\ q_j \le 0.05\}, \quad j \in (1, m_1) \tag{7}$$
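Step 1.5 can be sketched in Python with one function per formula. The sketch follows this reading of formulas (4)-(7): the log fold change is taken as the difference of group means on log2 data, and the t statistic is written in its unequal-variance (Welch) form; both are assumptions, since the original formula images are not reproduced. The standard library has no t distribution, so converting the statistic into a p-value is left to a statistics package, for example SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`, which returns both. The textbook Benjamini-Hochberg procedure additionally makes the q-values monotone and caps them at 1; formula (6) as stated does not, and neither does this sketch.

```python
import math
import statistics


def log_fold_change(tumor, normal):
    """Formula (4) as read here: difference of group means on log2 data."""
    return statistics.mean(tumor) - statistics.mean(normal)


def welch_t(tumor, normal):
    """Independent two-sample t statistic, unequal-variance (Welch) form."""
    n1, n2 = len(tumor), len(normal)
    s1, s2 = statistics.variance(tumor), statistics.variance(normal)
    diff = statistics.mean(tumor) - statistics.mean(normal)
    return diff / math.sqrt(s1 / n1 + s2 / n2)


def bh_qvalues(pvalues):
    """Formula (6): q_j = p_j * m1 / r_j, r_j = ascending rank of p_j."""
    m1 = len(pvalues)
    order = sorted(range(m1), key=lambda j: pvalues[j])
    q = [0.0] * m1
    for rank, j in enumerate(order, start=1):
        q[j] = pvalues[j] * m1 / rank
    return q


def select_characteristic(names, fold_changes, qvalues):
    """Formula (7): keep miRNAs with |f_j| >= 1 and q_j <= 0.05."""
    return [name for name, f, q in zip(names, fold_changes, qvalues)
            if abs(f) >= 1 and q <= 0.05]
```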
Optionally, in step 2 the selected characteristic miRNA expression data are normalized within each sample, using the formula:
$$u'_{ij} = \frac{u_{ij} - \mu_i}{\sigma_i} \tag{8}$$
where $i$ is the sample index, $j$ is the characteristic miRNA index, $\mu_i$ and $\sigma_i$ are the mean and standard deviation of all characteristic miRNA expression values of the $i$-th sample, $u_{ij}$ is the log-transformed characteristic miRNA expression value, and $u'_{ij}$ is the normalized miRNA value.
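Formula (8) is a per-sample z-score: each sample is centered and scaled by its own statistics, not by per-miRNA statistics across samples. The sketch below assumes the sample standard deviation; the document does not say whether the sample or the population form is used. The same operation is applied to a new sample in step 4.1.

```python
import statistics


def normalize_sample(values):
    """Formula (8): z-score one sample's characteristic-miRNA log2 values
    using that sample's own mean and standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [(v - mu) / sigma for v in values]
```

Because only within-sample statistics are used, a constant shift or scale affecting every miRNA of one sample (for example, a platform offset) cancels out, which is the basis of the platform-heterogeneity claim below.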
Optionally, step 3 (constructing an early prediction model on the normalized data with a support vector machine) specifically includes:
step 3.1, sample grouping: 80% of all samples form the training + validation set and the remaining 20% form the test set. The training + validation set is used for 5-fold cross-validation: it is divided into 5 equal folds, each fold in turn serves as the validation set, and the other 4 folds serve as the training set. For given parameters, the training set is used to build the model and the validation set to check its accuracy;
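The grouping of step 3.1 can be sketched as follows. The random shuffling, the fixed seed and the round-robin fold assignment are assumptions; the document does not say whether the split is stratified by tumor/normal class, which would matter if the classes were very unbalanced.

```python
import random


def split_and_folds(n_samples, test_frac=0.2, k=5, seed=0):
    """Shuffle sample indices, hold out test_frac of them as the test set,
    and split the rest (training + validation) into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_frac)
    test, rest = idx[:n_test], idx[n_test:]
    folds = [rest[f::k] for f in range(k)]  # round-robin assignment
    return test, folds
```

In fold round `f`, `folds[f]` is the validation set and the union of the other four folds is the training set.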
step 3.2, optimal parameter screening: in the SVM, the parameter gamma controls the width of the Gaussian kernel, and C is a regularization parameter that limits the importance of each point. The parameter grid is set as:
$$\gamma = [0.001, 0.01, 0.1, 1, 10, 100] \tag{9}$$
$$C = [0.001, 0.01, 0.1, 1, 10, 100] \tag{10}$$
during cross-validation, a model is built with each combination of gamma and C in turn, and its accuracy is then checked on the validation set. For each parameter combination, each of the 5 validation folds yields one accuracy, giving 5 accuracies in total; the combination with the highest mean accuracy over the 5 folds is selected as the optimal parameters;
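The grid search of step 3.2 can be sketched without committing to a particular SVM library by passing the training routine in as a callback. `fit_and_score` is hypothetical: it is assumed to train a Gaussian-kernel SVM on the training indices with the given gamma and C and return its accuracy on the validation indices. (In scikit-learn's parameterization the RBF kernel is K(x, z) = exp(-gamma * ||x - z||^2), so a larger gamma gives a narrower kernel.)

```python
import itertools
import statistics

GAMMAS = [0.001, 0.01, 0.1, 1, 10, 100]  # grid (9)
CS = [0.001, 0.01, 0.1, 1, 10, 100]      # grid (10)


def grid_search(folds, fit_and_score, gammas=GAMMAS, cs=CS):
    """folds: list of k index lists (step 3.1).
    fit_and_score(train_idx, val_idx, gamma, C) -> validation accuracy
    (hypothetical callback).
    Returns (best_gamma, best_C, best_mean_accuracy)."""
    best = None
    for gamma, c in itertools.product(gammas, cs):
        accs = []
        for f, val_idx in enumerate(folds):
            # All folds except the f-th form the training set.
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            accs.append(fit_and_score(train_idx, val_idx, gamma, c))
        mean_acc = statistics.mean(accs)  # mean of the 5 fold accuracies
        if best is None or mean_acc > best[2]:
            best = (gamma, c, mean_acc)
    return best
```

With scikit-learn the same search is usually written as `GridSearchCV(SVC(kernel="rbf"), {"gamma": GAMMAS, "C": CS}, cv=5)`; the callback form above only makes the loop structure explicit.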
step 3.3, build a model with the optimal parameters on the training + validation data, and finally evaluate it on the test set. The evaluation indices are accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic (ROC) curve (AUC). In the test set, tumor samples predicted as tumor are counted as true positives (TP), normal samples predicted as tumor as false positives (FP), tumor samples predicted as normal as false negatives (FN), and normal samples predicted as normal as true negatives (TN). The evaluation indices are calculated as:
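$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
$$\mathrm{AUC} = \int_0^1 \mathrm{TPR}\, d(\mathrm{FPR}), \quad \mathrm{TPR} = \mathrm{Recall},\ \mathrm{FPR} = 1 - \mathrm{Specificity}$$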
among these indices, accuracy, precision, recall, specificity, F1 score and AUC take values in [0, 1]. A higher accuracy indicates better overall predictive performance; a higher precision indicates fewer type I errors (false positives among the samples predicted positive); a higher recall indicates fewer type II errors (missed positives); a higher specificity indicates that fewer negative samples are misclassified as positive; the F1 score is a composite index, the harmonic mean of precision and recall; MCC is the correlation coefficient between the observed and predicted binary classifications and takes values in [-1, 1], where 1 represents perfect prediction, 0 no better than random prediction, and -1 complete disagreement between prediction and observation; a higher AUC indicates a higher probability that the classifier ranks a randomly chosen positive sample above a randomly chosen negative one. Therefore, the closer these indices are to 1, the better the overall predictive performance of the model;
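The evaluation indices of step 3.3 can be computed directly from the confusion-matrix counts. A minimal sketch; the function names are illustrative, and the AUC is computed from classifier scores (for example, SVM decision values) as the fraction of positive/negative pairs ranked correctly, which equals the area under the ROC curve. Zero denominators (for example, no predicted positives) are not handled.

```python
import math


def evaluate(tp, fp, fn, tn):
    """Confusion-matrix metrics of step 3.3 (tumor = positive class)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "mcc": mcc}


def auc(scores, labels):
    """Probability that a random positive (label 1) scores above a random
    negative (label 0); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0])` ranks 3 of the 4 positive/negative pairs correctly and so gives 0.75.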
step 3.4, if all evaluation indices are greater than 0.9, the model is considered to predict well, and the final prediction model is then built on all the data with the optimal parameter combination.
Optionally, the early prediction in step 4 based on the patient's characteristic miRNA expression levels is specifically:
step 4.1, normalize the characteristic miRNA expression data of the sample to be predicted. Let $u$ be the characteristic miRNA expression value of the sample, $\mu$ the mean of its characteristic miRNA expression values, and $\sigma$ their standard deviation; the formula is:
$$u'_j = \frac{u_j - \mu}{\sigma}$$
where $j$ is the characteristic miRNA index and $u'_j$ is the normalized miRNA value;
step 4.2, input the normalized characteristic miRNA values of the sample into the final prediction model; a prediction of 1 indicates endometrial cancer, and a prediction of 0 indicates normal.
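Step 4 can be sketched as below, assuming the trained Gaussian-kernel SVM is available as its support vectors, their dual coefficients ($\alpha_i y_i$) and its intercept, which is how scikit-learn's `SVC` exposes a fitted binary model (`support_vectors_`, `dual_coef_`, `intercept_`). The function and argument names are hypothetical, as is the use of the sample standard deviation in the normalization of step 4.1.

```python
import math
import statistics


def predict_sample(raw_log2_values, support_vectors, dual_coefs, intercept, gamma):
    """Step 4 sketch.

    4.1: z-score the sample's characteristic-miRNA log2 values with the
         sample's own mean and standard deviation.
    4.2: evaluate the RBF-SVM decision function
         f(x) = sum_i a_i * exp(-gamma * ||sv_i - x||^2) + b,  a_i = alpha_i * y_i
         and return 1 (endometrial cancer) if f(x) > 0, else 0 (normal).
    """
    mu = statistics.mean(raw_log2_values)
    sigma = statistics.stdev(raw_log2_values)
    x = [(v - mu) / sigma for v in raw_log2_values]
    score = intercept
    for sv, a in zip(support_vectors, dual_coefs):
        sq_dist = sum((p - q) ** 2 for p, q in zip(sv, x))
        score += a * math.exp(-gamma * sq_dist)
    return 1 if score > 0 else 0
```

With a single fitted model, prediction costs one kernel evaluation per support vector, which is consistent with the claim that 100 samples can be predicted in seconds.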
Compared with the prior art, the invention can obtain the following technical effects:
1) Fast prediction: the prediction model constructed by the invention can rapidly predict large numbers of samples; predicting 100 samples takes only a few seconds.
2) High accuracy: the prediction model constructed by the invention achieves prediction accuracy and precision both above 90%, and the area under the ROC curve (AUC) reaches 0.994.
3) Little influence from platform heterogeneity: miRNA expression values measured on different analysis platforms differ considerably, but because normalized characteristic miRNA expression values are used for prediction, the influence of platform heterogeneity is small.
Of course, it is not necessary for any one product in which the invention is practiced to achieve all of the above-described technical effects simultaneously.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of data screening and model building according to the present invention;
FIG. 2 shows the cross-validation parameter optimization process of the support vector machine model according to the present invention;
FIG. 3 shows the test-set evaluation indices of the support vector machine model according to the present invention;
FIG. 4 shows the test-set ROC curve of the support vector machine model according to the present invention.
Detailed Description
The following embodiments are described in detail with reference to the accompanying drawings, so that how to implement the technical features of the present invention to solve the technical problems and achieve the technical effects can be fully understood and implemented.
The invention discloses an early endometrial cancer prediction method based on a characteristic miRNA expression profile combination, which is intended for purposes other than disease diagnosis and treatment and comprises the following steps:
Step 1, obtain the characteristic miRNAs that are stably and differentially expressed in early-stage endometrial cancer patients, specifically:
Step 1.1, download the transcriptome data and clinical data of the tumor tissues and para-carcinoma tissues of endometrial cancer patients from the Genomic Data Commons (GDC) Data Portal, obtain the gene expression profile read counts (i.e., sequencing read values) of the patients' tumor tissues, and log-transform them.
Step 1.2, select miRNAs with sufficient expression abundance, i.e., miRNAs whose read counts are greater than or equal to 10 in all samples, and log-transform the read counts of all retained miRNAs. Let $n$ be the total number of samples, $m$ the total number of retained miRNAs, $v$ the read count of a miRNA, and $u$ its log-transformed expression value; then:
$$u_{ij} = \log_2 v_{ij}, \quad i \in (1, n),\ j \in (1, m) \tag{1}$$
where $i$ is the sample index, $j$ is the miRNA index, $u_{ij}$ is the log-transformed expression value of the $j$-th miRNA in the $i$-th sample, and $v_{ij}$ is the corresponding read count.
Step 1.3, select patients with stage I or stage II endometrial cancer and designate them as early-stage endometrial cancer patients; their total number is denoted $n'$.
Step 1.4, select miRNAs that are stably expressed in both tumor and normal samples, i.e., miRNAs whose coefficient of variation is less than 0.1 in the tumor samples and in the normal samples. Let $\mu$ be the mean expression of a miRNA across the samples and $\sigma$ its standard deviation; the coefficient of variation is calculated as:
$$cv_j = \frac{\sigma_j}{\mu_j} \tag{2}$$
where $j$ is the miRNA index, $cv$ is the coefficient of variation, $cv_j$ is the coefficient of variation of the $j$-th miRNA, $\sigma_j$ is the standard deviation of the $j$-th miRNA, and $\mu_j$ is the mean expression of the $j$-th miRNA. Letting $m_1$ be the total number of stably expressed miRNAs, we have:
$$m_1 = m\{cv_j < 0.1\}, \quad j \in (1, m) \tag{3}$$
Step 1.5, select miRNAs that are differentially expressed between tumor and normal samples. Using the log-transformed expression values, compute the log fold change $f$ of each miRNA between tumor and normal samples:
$$f_j = \mu_{1j} - \mu_{2j} \tag{4}$$
where $j$ is the miRNA index, $f_j$ is the fold change of the $j$-th miRNA, $\mu_{1j}$ is its mean expression in tumor samples, and $\mu_{2j}$ is its mean expression in normal samples.
Then compare the expression of each miRNA between tumor and normal samples with an independent-samples t-test:
$$t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{5}$$
where $n_1$ is the number of tumor samples, $n_2$ the number of normal samples, $\mu_1$ the mean expression of the miRNA in tumor samples, $\mu_2$ its mean expression in normal samples, $s_1^2$ its variance in tumor samples, and $s_2^2$ its variance in normal samples.
Correct the p-values of all t-tests for the false discovery rate (FDR). Let $q$ be the FDR-corrected value and $r$ the rank of a p-value when the p-values of the $m_1$ miRNAs are sorted in ascending order:
$$q_j = \frac{p_j \, m_1}{r_j} \tag{6}$$
where $j$ is the miRNA index, $q_j$ is the FDR-corrected value of the $j$-th miRNA, $p_j$ is its t-test p-value, and $r_j$ is the rank of $p_j$ among the $m_1$ miRNAs.
Finally, select the miRNAs whose absolute fold change $|f|$ is greater than or equal to 1 and whose FDR-corrected $q$ is less than or equal to 0.05 as the characteristic miRNAs; letting $m_2$ be the total number of characteristic miRNAs, we have:
$$m_2 = m_1\{|f_j| \ge 1,\ q_j \le 0.05\}, \quad j \in (1, m_1) \tag{7}$$
Step 2, select the characteristic miRNA expression data and normalize each sample, using the formula:
$$u'_{ij} = \frac{u_{ij} - \mu_i}{\sigma_i} \tag{8}$$
where $i$ is the sample index, $j$ is the characteristic miRNA index, $\mu_i$ and $\sigma_i$ are the mean and standard deviation of all characteristic miRNA expression values of the $i$-th sample, $u_{ij}$ is the log-transformed characteristic miRNA expression value, and $u'_{ij}$ is the normalized miRNA value.
Step 3, construct an early prediction model on the normalized data using a support vector machine, specifically:
Step 3.1, sample grouping. 80% of all samples form the training + validation set, and the remaining 20% form the test set. The training + validation set is used for 5-fold cross-validation: it is divided into 5 equal folds, each fold in turn serves as the validation set, and the other 4 folds serve as the training set. For given parameters, the training set is used to build the model and the validation set to check its accuracy.
Step 3.2, optimal parameter screening. In the SVM, the parameter gamma controls the width of the Gaussian kernel, and C is a regularization parameter that limits the importance of each point. The parameter grid is set as:
$$\gamma = [0.001, 0.01, 0.1, 1, 10, 100] \tag{9}$$
$$C = [0.001, 0.01, 0.1, 1, 10, 100] \tag{10}$$
In cross-validation, a model is built with each combination of gamma and C in turn, and its accuracy is then checked on the validation set. For each parameter combination, each of the 5 validation folds yields one accuracy, giving 5 accuracies in total. The combination with the highest mean accuracy over the 5 folds is selected as the optimal parameters.
Step 3.3, build a model with the optimal parameters on the training + validation data, and finally evaluate it on the test set. The evaluation indices are accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic (ROC) curve (AUC). In the test set, tumor samples predicted as tumor are counted as true positives (TP), normal samples predicted as tumor as false positives (FP), tumor samples predicted as normal as false negatives (FN), and normal samples predicted as normal as true negatives (TN). The evaluation indices are calculated as:
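$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
$$\mathrm{AUC} = \int_0^1 \mathrm{TPR}\, d(\mathrm{FPR}), \quad \mathrm{TPR} = \mathrm{Recall},\ \mathrm{FPR} = 1 - \mathrm{Specificity}$$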
Among these indices, accuracy, precision, recall, specificity, F1 score and AUC take values in [0, 1]. A higher accuracy indicates better overall predictive performance; a higher precision indicates fewer type I errors (false positives among the samples predicted positive); a higher recall indicates fewer type II errors (missed positives); a higher specificity indicates that fewer negative samples are misclassified as positive; the F1 score is a composite index, the harmonic mean of precision and recall; MCC is the correlation coefficient between the observed and predicted binary classifications and takes values in [-1, 1], where 1 represents perfect prediction, 0 no better than random prediction, and -1 complete disagreement between prediction and observation; a higher AUC indicates a higher probability that the classifier ranks a randomly chosen positive sample above a randomly chosen negative one. Therefore, the closer these indices are to 1, the better the overall predictive performance of the model.
Step 3.4: if all evaluation indices exceed 0.9, the model is considered to have good predictive performance, and the final prediction model is then built on all the data with the optimal parameter combination.
Step 4: perform early prediction according to the expression levels of the patient's characteristic miRNAs, specifically comprising the following steps:
Step 4.1: normalize the characteristic miRNA expression data of the sample to be predicted. Let u be a characteristic miRNA expression value of the prediction sample, μ the mean of the characteristic miRNA expression values of the prediction sample, and σ their standard deviation; then:
$$u_j' = \frac{u_j - \mu}{\sigma} \tag{18}$$
where j is the characteristic miRNA index and $u_j'$ is the normalized miRNA value.
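Step 4.1 scales each prediction sample by its own mean and standard deviation across the characteristic miRNAs, the same per-sample normalization as formula (8). A minimal sketch; the text does not say whether σ divides by n or n - 1, so the population form `statistics.pstdev` is an assumption, and `normalize_sample` is an illustrative name.

```python
from statistics import mean, pstdev
from typing import List


def normalize_sample(values: List[float]) -> List[float]:
    """Z-score one sample's characteristic miRNA values against that
    sample's own mean and standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD (divide by n)
    return [(u - mu) / sigma for u in values]
```

The resulting vector of normalized values would then be passed to the trained classifier of step 4.2, where an output of 1 indicates endometrial cancer. Because each sample is scaled only by its own statistics, no reference cohort is needed at prediction time.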
Step 4.2: feed the normalized characteristic miRNA values of the prediction sample into the final prediction model. A prediction of 1 indicates endometrial cancer and a prediction of 0 indicates normal.
Example 1
An early endometrial cancer prediction method based on characteristic miRNA expression profile combination comprises the following steps:
Step 1: obtain the miRNAs that are both stably and differentially expressed in early-stage endometrial cancer patients (the characteristic miRNAs); the workflow is shown in Figure 1.
Step 1.1: download the transcriptome data and clinical data of tumor tissues and adjacent (para-carcinoma) tissues of endometrial cancer patients from the Genomic Data Commons (GDC) Data Portal, obtain the read counts of the tumor-tissue expression profiles of the endometrial cancer patients, and log-transform them.
Step 1.2: select miRNAs with sufficient expression abundance, i.e., miRNAs whose read counts are at least 10 in every sample; see formula (1).
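A minimal sketch of the step 1.2 abundance filter followed by the log2 transform of formula (1). The input layout (a dict from miRNA name to per-sample read counts) and the function name `filter_and_log` are illustrative assumptions, not part of the original method.

```python
import math
from typing import Dict, List


def filter_and_log(counts: Dict[str, List[int]],
                   min_count: int = 10) -> Dict[str, List[float]]:
    """Keep miRNAs whose read count is >= min_count in every sample,
    then log2-transform their counts (u_ij = log2 v_ij)."""
    return {
        name: [math.log2(v) for v in reads]
        for name, reads in counts.items()
        if all(v >= min_count for v in reads)
    }
```

Because every retained count is at least 10, the logarithm is always defined and the transformed values are all above 3.3.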
Step 1.3: select endometrial cancer patients with stage I or stage II disease and designate them as early-stage endometrial cancer patients.
Step 1.4: select the miRNAs that are stably expressed in the tumor and normal samples, i.e., miRNAs whose coefficient of variation is less than 0.1 in the tumor and normal samples; see formulas (2)-(3).
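The stability filter of step 1.4 can be sketched as follows. Claim 3 defines μ and σ over all samples, so this sketch pools tumor and normal samples; the use of the sample standard deviation (`statistics.stdev`) and the name `stable_mirnas` are assumptions.

```python
from statistics import mean, stdev
from typing import Dict, List


def stable_mirnas(log_expr: Dict[str, List[float]],
                  max_cv: float = 0.1) -> List[str]:
    """Return miRNAs whose coefficient of variation (sigma_j / mu_j)
    across all samples is below max_cv."""
    stable = []
    for name, values in log_expr.items():
        cv = stdev(values) / mean(values)  # assumption: sample SD (n - 1)
        if cv < max_cv:
            stable.append(name)
    return stable
```

After the read-count filter of step 1.2, all log2 values are positive, so the mean in the denominator cannot be zero.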
Step 1.5: select the miRNAs that are differentially expressed between tumor and normal samples (see formulas (4)-(7)); these are recorded as the characteristic miRNAs.
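The step 1.5 selection can be sketched as below, under two assumptions: formula (4) is taken as the difference between the tumor and normal means of the log2 values (a log2 fold change), and formula (6) is the plain Benjamini-Hochberg form q_j = p_j · m1 / r_j. The t-test p-values are taken as input; they could come from, for example, `scipy.stats.ttest_ind`, whose `equal_var` setting would have to match formula (5). Library routines such as statsmodels' `multipletests(..., method="fdr_bh")` additionally enforce monotonic q-values, which this sketch omits. All function names are illustrative.

```python
from statistics import mean
from typing import List, Sequence


def log_fold_change(tumor: Sequence[float], normal: Sequence[float]) -> float:
    # Assumption: on log2 data, formula (4) is the difference of group means.
    return mean(tumor) - mean(normal)


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """q_j = p_j * m1 / r_j, where r_j is the rank of p_j (1 = smallest)."""
    m1 = len(pvalues)
    order = sorted(range(m1), key=lambda j: pvalues[j])
    q = [0.0] * m1
    for rank, j in enumerate(order, start=1):
        q[j] = pvalues[j] * m1 / rank
    return q


def select_characteristic(names: Sequence[str], fold_changes: Sequence[float],
                          qvalues: Sequence[float], min_abs_fc: float = 1.0,
                          max_q: float = 0.05) -> List[str]:
    """Formula (7): keep miRNAs with |f_j| >= 1 and q_j <= 0.05."""
    return [name for name, f, q in zip(names, fold_changes, qvalues)
            if abs(f) >= min_abs_fc and q <= max_q]
```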
After this screening, 12 characteristic miRNAs of endometrial cancer were obtained, as listed in Table 1. The nucleotide probe sequences of the 12 characteristic miRNAs are given in Table 2.
Table 1. Characteristic miRNAs of endometrial cancer (table image not reproduced)
Table 2. Nucleotide probe sequences of the characteristic miRNAs of endometrial cancer (table image not reproduced; the sequences appear in the Sequence Listing as SEQ ID NO: 1-12)
Step 2: normalize the data of each sample as given in formula (8).
Step 3: build an early diagnosis model on the normalized data using a support vector machine (SVM).
Step 3.1: group all samples. 80% of all samples form the training + validation set and the remaining 20% form the test set. The training + validation set is used for 5-fold cross-validation: it is divided into 5 equal groups, and each group serves in turn as the validation set while the other 4 groups serve as the training set. For given parameters, the training set is used to build the model and the validation set is used to check its accuracy. See Figure 1 for details.
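The sample grouping of step 3.1 can be sketched with the standard library alone. The seeded plain shuffle is an assumption: the text does not say whether the split is stratified by tumor/normal status (scikit-learn's `StratifiedKFold` would be a common alternative), and the function names are illustrative.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def split_samples(sample_ids: Sequence[str], test_fraction: float = 0.2,
                  n_folds: int = 5, seed: int = 0
                  ) -> Tuple[List[str], List[str], List[List[str]]]:
    """Return (training + validation ids, test ids, folds)."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)        # assumption: plain random shuffle
    n_test = round(len(ids) * test_fraction)
    test, dev = ids[:n_test], ids[n_test:]  # 20% test, 80% training + validation
    folds = [dev[k::n_folds] for k in range(n_folds)]  # 5 (near-)equal groups
    return dev, test, folds


def cv_rounds(folds: List[List[str]]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (training, validation) pairs; each fold validates exactly once."""
    for k, validation in enumerate(folds):
        training = [s for i, fold in enumerate(folds) if i != k for s in fold]
        yield training, validation
```

With 100 samples this yields 20 test samples and five folds of 16, so each cross-validation round trains on 64 samples and validates on 16.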
Step 3.2: screen for the optimal parameters. The SVM parameter grid is set by formulas (9)-(10). In cross-validation, a model is built for each combination of the parameters gamma and C in turn, and its accuracy is checked on the validation set. For each parameter combination, each of the 5 cross-validation rounds yields one accuracy, giving 5 accuracies in total. The parameter combination with the highest mean accuracy over the 5 rounds is selected as optimal. Figure 2 shows the cross-validation parameter optimization process: the cross-validation accuracy is highest, at 0.985, when gamma = 1 and C = 10. The optimal parameters of the model are therefore gamma = 1 and C = 10.
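Claim 5 states that gamma controls the width of the Gaussian kernel and that C is a regularization parameter. Assuming the standard RBF form used by libsvm and scikit-learn's `SVC`, K(x, y) = exp(-gamma · ||x - y||²), the role of the selected gamma can be illustrated as follows (the function name is illustrative):

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

A larger gamma makes the kernel decay faster with distance, so each training sample influences a narrower neighborhood. C does not enter the kernel; it sets how heavily margin violations are penalized during training.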
Step 3.3: build the model with the optimal parameters on the training + validation data, and finally evaluate it on the test set. The evaluation indices are accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic (ROC) curve (AUC); see formulas (11)-(17).
Step 3.4: Figure 3 shows accuracy, precision, recall, specificity, F1 score and MCC among the above evaluation indices; all six exceed 0.92. Figure 4 shows the ROC curve and AUC; the AUC on the test set is 0.994. These indices show that the model predicts well, so the final prediction model is built on all the data with the optimal parameter combination.
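The AUC reported in step 3.4 can be computed without plotting the ROC curve, using its equivalent rank (Mann-Whitney) interpretation: the probability that a randomly chosen tumor sample receives a higher score than a randomly chosen normal sample, with ties counted as one half. The `scores` input (for example, SVM decision values on the test set) and the function name are assumptions for illustration.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative),
    with ties counted as 1/2. Labels: 1 = tumor, 0 = normal."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is negligible at test-set sizes like the one described here.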
Step 4: perform early prediction according to the expression levels of the patient's characteristic miRNAs:
Step 4.1: normalize the characteristic miRNA expression data of the prediction samples as given in formula (18). In this example, 10 samples were randomly selected for prediction and excluded when the final prediction model was constructed. The numbers of the 10 selected samples and their normalized characteristic miRNA values are shown in Table 3.
Table 3. Numbers of the 10 samples and their normalized characteristic miRNA values (table image not reproduced)
Step 4.2: feed the normalized characteristic miRNA values of the prediction samples into the final prediction model. A prediction of 1 indicates endometrial cancer and a prediction of 0 indicates normal. The numbers of the 10 samples, the corresponding TCGA IDs, the actual status and the predicted results are shown in Table 4. The predictions for all 10 samples matched the actual status, which shows that the invention can accurately predict endometrial cancer at an early stage.
Table 4. Numbers of the 10 samples, corresponding TCGA IDs, actual status and predicted status (table image not reproduced)
In conclusion, the characteristic miRNA expression profile combination achieves high prediction accuracy and can effectively predict endometrial cancer at an early stage. In addition, the method is platform-independent and can be applied to data from various sources.
While the foregoing description shows and describes several preferred embodiments of the invention, it is to be understood that the invention is not limited to the forms disclosed herein and is not to be construed as excluding other embodiments; it is capable of use in various other combinations, modifications and environments, and of changes within the scope of the inventive concept expressed herein, commensurate with the above teachings or the skill or knowledge of the relevant art. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
SEQUENCE LISTING
<110> Foshan University
<120> characteristic miRNA expression profile combination and early prediction method of endometrial cancer
<130> 2020
<160> 12
<170> PatentIn version 3.3
<210> 1
<211> 16
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 1
tgggatgagg tagtag 16
<210> 2
<211> 15
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 2
aggttgaggt agtag 15
<210> 3
<211> 17
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 3
gggtgaggta gtaggtt 17
<210> 4
<211> 16
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 4
gggaaggcag taggtt 16
<210> 5
<211> 19
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 5
attcccctag aatcgaatc 19
<210> 6
<211> 18
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 6
cgcattatta ctcacggt 18
<210> 7
<211> 13
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 7
cacgaccgac gcc 13
<210> 8
<211> 19
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 8
acagttcttc aactggcag 19
<210> 9
<211> 17
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 9
ggtaatccct ggcaatg 17
<210> 10
<211> 14
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 10
tccaggagct caca 14
<210> 11
<211> 17
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 11
gctgcaaaca tccgact 17
<210> 12
<211> 18
<212> DNA
<213> Artificial sequence (Artificial sequence)
<400> 12
gctgtaaaca tccgactg 18

Claims (6)

1. A characteristic miRNA expression profile combination, characterized in that it comprises hsa-let-7a-1, hsa-let-7a-2, hsa-let-7a-3, hsa-let-7b, hsa-mir-10b, hsa-mir-126, hsa-mir-1307, hsa-mir-22, hsa-mir-23b, hsa-mir-28, hsa-mir-30a and hsa-mir-30e, wherein the nucleotide probe sequences of the characteristic miRNA expression profile combination are shown in SEQ ID NO: 1-12.
2. A method for the early prediction of endometrial cancer based on the characteristic miRNA expression profile combination of claim 1, comprising the following steps:
step 1, obtaining the characteristic miRNAs stably and differentially expressed in early-stage endometrial cancer patients;
step 2, selecting the characteristic miRNA expression data and normalizing the data of each sample;
step 3, constructing an early prediction model on the normalized data using a support vector machine;
step 4, performing early prediction according to the expression levels of the patient's characteristic miRNAs;
wherein the method is not intended for the diagnosis or treatment of disease.
3. The early prediction method of claim 2, wherein obtaining, in step 1, the characteristic miRNAs stably and differentially expressed in early-stage endometrial cancer patients specifically comprises:
step 1.1, downloading transcriptome data and clinical data of tumor tissues and adjacent (para-carcinoma) tissues of endometrial cancer patients from the Genomic Data Commons (GDC) Data Portal, obtaining the read counts (sequencing read values) of the tumor-tissue expression profiles of the endometrial cancer patients, and log-transforming them;
step 1.2, selecting miRNAs with sufficient expression abundance, i.e., miRNAs whose read counts are at least 10 in all samples, and taking the logarithm of the read counts of all such miRNAs; with n the total number of samples, m the total number of screened miRNAs, v the miRNA read count and u the log-transformed expression value:
$$u_{ij} = \log_2 v_{ij}, \quad i \in (1, n),\ j \in (1, m) \tag{1}$$
where i is the sample index, j is the miRNA index, $u_{ij}$ is the log-transformed expression value of the j-th miRNA in the i-th sample, and $v_{ij}$ is the read count of the j-th miRNA in the i-th sample;
step 1.3, selecting endometrial cancer patients with stage I or stage II disease and recording them as early-stage endometrial cancer patients, whose total number is denoted n';
step 1.4, selecting the miRNAs stably expressed in the tumor and normal samples, i.e., miRNAs whose coefficient of variation is less than 0.1 in the tumor and normal samples; with μ the mean expression of a miRNA in all samples and σ its standard deviation, the coefficient of variation is calculated as:
$$cv_j = \frac{\sigma_j}{\mu_j}, \quad j \in (1, m) \tag{2}$$
where j is the miRNA index, $cv_j$ is the coefficient of variation of the j-th miRNA, $\sigma_j$ is its standard deviation and $\mu_j$ is its mean expression; letting $m_1$ be the total number of stably expressed miRNAs:
$$m_1 = m\{cv_j < 0.1\}, \quad j \in (1, m) \tag{3}$$
step 1.5, selecting the miRNAs differentially expressed between tumor and normal samples: the fold change f of each miRNA between tumor and normal samples is calculated from the log-transformed expression values as follows:
$$f_j = \mu_{1j} - \mu_{2j}, \quad j \in (1, m_1) \tag{4}$$
where j is the miRNA index, $f_j$ is the fold change of the j-th miRNA, $\mu_{1j}$ is the mean expression of the j-th miRNA in the tumor samples, and $\mu_{2j}$ is its mean expression in the normal samples;
the expression of each miRNA in the tumor and normal samples is then compared with an independent-samples t-test, whose statistic is:
(formula (5), the independent-samples t statistic, is available only as an image in the source publication)
where $n_1$ is the number of tumor samples, $n_2$ is the number of normal samples, $\mu_1$ is the mean expression of the miRNA in the tumor samples, $\mu_2$ is its mean expression in the normal samples, and the two variance terms of formula (5) are the variance of the miRNA in the tumor samples and its variance in the normal samples, respectively;
the p-values obtained from all t-tests are corrected using the false discovery rate (FDR), where q is the FDR-corrected value and r is the rank of a p-value among the $m_1$ miRNAs:
$$q_j = \frac{p_j \, m_1}{r_j}, \quad j \in (1, m_1) \tag{6}$$
where j is the miRNA index, $q_j$ is the FDR-corrected value of the j-th miRNA, $p_j$ is the t-test p-value of the j-th miRNA, and $r_j$ is the rank of $p_j$ among the $m_1$ miRNAs;
finally, selecting the miRNAs whose fold change satisfies |f| ≥ 1 and whose FDR-corrected q-value is at most 0.05, recording them as characteristic miRNAs, and denoting their total number $m_2$:
$$m_2 = m_1\{|f_j| \ge 1,\ q_j \le 0.05\}, \quad j \in (1, m_1) \tag{7}$$
4. The early prediction method of claim 2, wherein in step 2 the characteristic miRNA expression data are selected and each sample is normalized according to the formula:
$$u_{ij}' = \frac{u_{ij} - \mu_i}{\sigma_i} \tag{8}$$
where i is the sample index, j is the characteristic miRNA index, $\mu_i$ is the mean of all characteristic miRNA expression values of the i-th sample, $\sigma_i$ is the standard deviation of all characteristic miRNA expression values of the i-th sample, $u_{ij}$ is the log-transformed characteristic miRNA expression value, and $u_{ij}'$ is the normalized value.
5. The early prediction method according to claim 2, wherein constructing, in step 3, the early prediction model on the normalized data using a support vector machine comprises:
step 3.1, grouping all samples: 80% of all samples form the training and validation set and the remaining 20% form the test set; the training and validation set is used for 5-fold cross-validation, i.e., it is divided into 5 equal groups, each group serving in turn as the validation set while the other 4 groups serve as the training set; for given parameters, the training set is used to build the model and the validation set is used to check its accuracy;
step 3.2, screening the optimal parameters, where the SVM parameter gamma controls the width of the Gaussian kernel and C is a regularization parameter that limits the importance of each point; the parameter grid is set as:
gamma=[0.001,0.01,0.1,1,10,100] (9)
C = [grid values given only as an image in the source publication] (10)
in the cross-validation, a model is built for each combination of the parameters gamma and C in turn, and its accuracy is checked on the validation set; for each parameter combination, each of the 5 cross-validation rounds yields one accuracy, giving 5 accuracies in total; the parameter combination with the highest mean accuracy over the 5 rounds is selected as the optimal parameters;
step 3.3, building the model with the optimal parameters on the training and validation data, and finally evaluating it on the test set, wherein the evaluation indices comprise accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC) and the area under the receiver operating characteristic (ROC) curve (AUC); on the test set, tumor samples predicted as tumor are counted as true positives (TP), normal samples predicted as tumor as false positives (FP), tumor samples predicted as normal as false negatives (FN), and normal samples predicted as normal as true negatives (TN); the evaluation indices are calculated as:
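$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{11}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{13}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{14}$$
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{15}$$
$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{16}$$
$$\text{AUC} = \int_0^1 \text{TPR} \, d(\text{FPR}) \tag{17}$$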
among the above evaluation indices, accuracy, precision, recall, specificity, the F1 score and AUC take values in [0, 1]; higher accuracy indicates better overall predictive performance; higher precision indicates fewer type I errors (false positives); higher recall indicates fewer type II errors (false negatives); higher specificity indicates that fewer normal (negative) samples are wrongly predicted as tumor (positive); the F1 score is a composite index, the harmonic mean of precision and recall; MCC is the correlation coefficient between the observed and predicted binary classifications and takes values in [-1, 1], where 1 represents perfect prediction, 0 represents prediction no better than random, and -1 represents complete disagreement between prediction and observation; a higher AUC indicates a higher probability that the classifier ranks a randomly chosen positive sample above a randomly chosen negative sample; therefore, the closer these indices are to 1, the better the overall predictive performance of the model;
step 3.4, if all evaluation indices exceed 0.9, the model is considered to have good predictive performance, and the final prediction model is then built on all the data with the optimal parameter combination.
6. The early prediction method according to claim 2, wherein the early prediction in step 4 according to the expression levels of the patient's characteristic miRNAs specifically comprises:
step 4.1, normalizing the characteristic miRNA expression data of the prediction sample, with u a characteristic miRNA expression value of the prediction sample, μ the mean of the characteristic miRNA expression values of the prediction sample, and σ their standard deviation, according to the formula:
$$u_j' = \frac{u_j - \mu}{\sigma} \tag{18}$$
where j is the characteristic miRNA index and $u_j'$ is the normalized miRNA value;
step 4.2, feeding the normalized miRNA values of the prediction sample into the final prediction model, wherein a prediction of 1 indicates endometrial cancer and a prediction of 0 indicates normal.
CN202010775103.0A 2020-08-04 2020-08-04 Characteristic miRNA expression profile combination and endometrial cancer early-stage prediction method Pending CN112760375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010775103.0A CN112760375A (en) 2020-08-04 2020-08-04 Characteristic miRNA expression profile combination and endometrial cancer early-stage prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010775103.0A CN112760375A (en) 2020-08-04 2020-08-04 Characteristic miRNA expression profile combination and endometrial cancer early-stage prediction method

Publications (1)

Publication Number Publication Date
CN112760375A (en) 2021-05-07

Family

ID=75693032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010775103.0A Pending CN112760375A (en) 2020-08-04 2020-08-04 Characteristic miRNA expression profile combination and endometrial cancer early-stage prediction method

Country Status (1)

Country Link
CN (1) CN112760375A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008073921A2 (en) * 2006-12-08 2008-06-19 Asuragen, Inc. Mir-126 regulated genes and pathways as targets for therapeutic intervention
CN101622350A (en) * 2006-12-08 2010-01-06 Asuragen, Inc. miR-126 regulated genes and pathways as targets for therapeutic intervention
CN101675165A (en) * 2006-12-08 2010-03-17 Asuragen, Inc. The function of LET-7 Microrna and target
CN103476947A (en) * 2011-03-02 2013-12-25 格路福生物制药公司 Enhanced biodistribution of oligomers
WO2014114802A1 (en) * 2013-01-25 2014-07-31 Charité - Universitätsmedizin Berlin Non-invasive prenatal genetic diagnostic methods

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANNA TORRES et al.: "Diagnostic and prognostic significance of miRNA signatures in tissues and plasma of endometrioid endometrial carcinoma patients", IJC, pages 1633-1645 *
TANG Kai et al.: "Research progress of miRNA in the pathogenesis of endometrial cancer", Journal of Modern Oncology (现代肿瘤医学), pages 831-833 *
Examiner: "Search results for sequences 1-12", Retrieved from the Internet <URL:blast.ncbi.nlm.nih.gov/Blast.cgi;mirbase.org> *

Similar Documents

Publication Publication Date Title
CN111748632A (en) Characteristic lincRNA expression profile combination and liver cancer early prediction method
CN112020565A (en) Quality control template for ensuring validity of sequencing-based assays
EP2438187A1 (en) miRNA FINGERPRINT IN THE DIAGNOSIS OF MULTIPLE SCLEROSIS
CN111748633A (en) Characteristic miRNA expression profile combination and head and neck squamous cell carcinoma early prediction method
CN110305964A (en) A kind of foundation of patients with prostate cancer prognosis recurrence risk profile mark tool and its risk evaluation model
CN109830264B (en) Method for classifying tumor patients based on methylation sites
CN114203256B (en) MIBC typing and prognosis prediction model construction method based on microbial abundance
CN111733251A (en) Characteristic miRNA expression profile combination and early prediction method of renal clear cell carcinoma
CN111944900A (en) Characteristic lincRNA expression profile combination and early endometrial cancer prediction method
CN111748634A (en) Characteristic lincRNA expression profile combination and early prediction method of colon cancer
CN111763738A (en) Characteristic mRNA expression profile combination and liver cancer early prediction method
CN111944902A (en) Early prediction method of renal papillary cell carcinoma based on lincRNA expression profile combination characteristics
AU2020215312A1 (en) Method of predicting survival rates for cancer patients
Sarmah et al. A simple Affymetrix ratio-transformation method yields comparable expression level quantifications with cDNA data
CN113517073A (en) Method and system for predicting survival rate after lung cancer surgery
CN111733252A (en) Characteristic miRNA expression profile combination and early gastric cancer prediction method
CN111808965A (en) Characteristic lincRNA expression profile combination and early prediction method of renal clear cell carcinoma
CN111793692A (en) Characteristic miRNA expression profile combination and lung squamous carcinoma early prediction method
CN107075586B (en) Glycosyltransferase gene expression profiling for identifying multiple cancer types and subtypes
CN106415563A (en) Systems and methods for predicting a smoking status of an individual
CN116312800A (en) Lung cancer characteristic identification method, device and storage medium based on circulating RNA whole transcriptome sequencing in blood plasma
CN111850124A (en) Characteristic lincRNA expression profile combination and lung squamous carcinoma early prediction method
CN114107515B (en) Early gastric cancer prognosis differential gene and recurrence prediction model
CN112760375A (en) Characteristic miRNA expression profile combination and endometrial cancer early-stage prediction method
CN116364179A (en) Colorectal cancer prognosis marker screening system and method and colorectal cancer prognosis risk assessment system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination