CN117594133A - Screening method of biomarker for distinguishing uterine lesion type and application thereof - Google Patents


Info

Publication number
CN117594133A
Authority
CN
China
Prior art keywords
uterine
biomarker
gene set
genes
candidate characteristic
Legal status
Withdrawn
Application number
CN202410081934.6A
Other languages
Chinese (zh)
Inventor
季序我
赵义
李哲
Current Assignee
Beijing Pukang Ruiren Medical Laboratory Co ltd
Predatum Biomedicine Suzhou Co ltd
Precision Scientific Technology Beijing Co ltd
Original Assignee
Beijing Pukang Ruiren Medical Laboratory Co ltd
Predatum Biomedicine Suzhou Co ltd
Precision Scientific Technology Beijing Co ltd
Application filed by Beijing Pukang Ruiren Medical Laboratory Co ltd, Predatum Biomedicine Suzhou Co ltd, Precision Scientific Technology Beijing Co ltd
Priority to CN202410081934.6A
Publication of CN117594133A
Status: Withdrawn (current legal status)


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30 Unsupervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20 Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation

Abstract

The invention discloses a method for screening biomarkers that discriminate uterine lesion categories, and its application, in the technical field of biomarkers. The screening method comprises the following steps: first, quantifying the expression value of each gene in platelets; then performing unsupervised clustering and screening for a candidate characteristic gene set based on the concordance between the clustering results and the disease labels; plotting ROC curves, calculating AUC values, and screening a new candidate characteristic gene set according to the AUC values; and finally, training a machine learning model with the expression values of all genes in the new candidate characteristic gene set and the disease labels to obtain a trained model and its corresponding gene combination, which is taken as the biomarker for discriminating uterine lesion types. The invention enables noninvasive detection in clinical practice; it uses a small number of biomarkers that nevertheless carry comprehensive information, which avoids high detection costs, and the biomarkers can be detected by qPCR.

Description

Screening method of biomarker for distinguishing uterine lesion type and application thereof
Technical Field
The invention relates to the technical field of biomarkers, in particular to a screening method of biomarkers for distinguishing uterine lesion categories and application thereof.
Background
Uterine fibroids (hysteromyoma) and uterine sarcoma are two types of uterine lesions whose surgical management and prognosis differ greatly. Uterine fibroids are generally treated by myomectomy, have a good prognosis, and mainly require postoperative observation; uterine sarcoma is generally treated by hysterectomy, has a poor prognosis, and also requires postoperative drug therapy. Accurate discrimination of the lesion type before surgery is therefore particularly important. In current clinical practice, however, accurate discrimination faces two difficulties: first, sensitivity for uterine sarcoma is low, and misjudging a uterine sarcoma as a fibroid exposes the patient to a greater survival risk; second, specificity for uterine fibroids is low, and misjudging a fibroid as a uterine sarcoma leads to hysterectomy and overtreatment, causing the patient to lose fertility and increasing the economic burden. Relevant biomarkers therefore need to be screened to improve the accuracy of discriminating uterine lesions.
Some molecular features have been proposed to distinguish uterine fibroids from uterine sarcomas, but they suffer from three drawbacks. First, some are single proteins, such as CA125 and LDH, and discriminating fibroids from sarcomas solely by the level of a single protein in the patient's blood has low accuracy. Second, some are derived by genomic or transcriptomic sequencing of fibroid and sarcoma lesion tissue and comparison between the two, including differing genomic mutations, gene copy number variations, and differentially expressed genes; because corresponding differences are not observed in the patients' blood, these features cannot be used for noninvasive clinical diagnosis. Third, the clinical performance of the molecular features currently used to distinguish uterine fibroids from uterine sarcomas has not been validated in independent clinical cohorts, so their level of evidence is insufficient and their reliability is questionable.
Disclosure of Invention
To address these problems in the prior art, the invention provides the following technical solution.
In a first aspect, the present invention provides a method for screening biomarkers that discriminate uterine lesion categories, comprising:
for uterine fibroid and uterine sarcoma patients, quantifying the expression value of each gene in platelets from the platelet transcriptome sequencing data;
performing unsupervised clustering based on the expression values of genes in platelets, and screening for a candidate characteristic gene set based on the concordance between the clustering results and the disease labels, where the disease label is either uterine fibroid or uterine sarcoma;
for each candidate characteristic gene in the candidate characteristic gene set, plotting an ROC curve from the distribution of its expression values across all uterine fibroid and uterine sarcoma patients together with the disease labels, calculating the AUC value, and screening a new candidate characteristic gene set according to the AUC values;
and training a machine learning model with the expression values of all genes in the new candidate characteristic gene set and the disease labels to obtain a trained machine learning model and its corresponding gene combination, which is taken as the biomarker for discriminating uterine lesion types.
Preferably, the platelet transcriptome sequencing data are obtained as follows:
collecting peripheral blood plasma samples from uterine fibroid and uterine sarcoma patients, and isolating platelets from the plasma by gradient centrifugation;
extracting RNA from the isolated platelets and reverse-transcribing it into complementary DNA (cDNA);
constructing a sequencing library from the cDNA and performing transcriptome sequencing to obtain the platelet transcriptome sequencing data.
Preferably, performing unsupervised clustering based on the expression values of genes in platelets, and screening for the candidate characteristic gene set based on the concordance between the clustering results and the disease labels, comprises:
step one, performing differential analysis of gene expression values in platelets between uterine fibroid and uterine sarcoma patients, and ranking the genes from the largest to the smallest difference to obtain an initial candidate characteristic gene set;
step two, removing the genes with the smallest expression differences from the initial candidate characteristic gene set to obtain an updated initial candidate characteristic gene set;
step three, performing unsupervised clustering of the uterine fibroid and uterine sarcoma patients based on the expression values of all genes in the updated initial candidate characteristic gene set, and calculating the concordance between the clustering results and the disease labels;
step four, iteratively repeating steps two and three, with the updated initial candidate characteristic gene set from each iteration serving as the initial candidate characteristic gene set of the next, until the concordance between the clustering results and the disease labels no longer increases; the updated initial candidate characteristic gene set from the previous iteration is then taken as the candidate characteristic gene set.
Preferably, screening the new candidate characteristic gene set according to the AUC values comprises: selecting the candidate characteristic genes whose AUC values exceed a threshold to form the new candidate characteristic gene set.
Preferably, training the machine learning model with the expression values of all genes in the new candidate characteristic gene set and the disease labels, to obtain the trained machine learning model and its corresponding gene combination, comprises:
the first step, removing any 5 genes from the current new candidate characteristic gene set, training an SVM model with the expression values of the remaining genes and the patients' disease labels, and recording the accuracy of the SVM model; this is repeated until every combination of 5 genes in the new candidate characteristic gene set has been removed;
the second step, selecting, from all SVM models trained in the first step, the gene set of the model with the highest accuracy as the updated candidate characteristic gene set;
the third step, iterating the first and second steps until the accuracy no longer increases, then taking the candidate characteristic gene set from the previous iteration as the final biomarker for the differential diagnosis of uterine fibroids and uterine sarcoma, and the SVM model from the previous iteration as the machine learning model for that differential diagnosis.
Preferably, the screening method provided by the invention further comprises validating the biomarkers.
In a second aspect, the present invention provides a biomarker for discriminating uterine lesion categories, obtained by the screening method of the first aspect.
Preferably, the biomarkers for discriminating uterine lesion categories provided by the invention include EZH2, COPG1, SUMO3, CLIP1, GSR, SLA2 and TREML1.
Preferably, uterine lesion types are discriminated as follows: the content or expression level of the biomarkers is measured in a patient with a uterine lesion and used as input to the trained machine learning model, which outputs a discrimination probability; if the probability is greater than 0.5, the lesion is judged to be uterine sarcoma, and if it is less than 0.5, the lesion is judged to be a uterine fibroid.
In a third aspect, the present invention provides the use of the biomarker of the second aspect, or of a detection reagent for that biomarker, in the manufacture of a product for discriminating uterine lesion categories.
Preferably, the method for discriminating uterine lesion types comprises: measuring the content or expression level of the biomarkers in a patient with a uterine lesion and using it as input to the trained machine learning model, which outputs a discrimination probability; if the probability is greater than 0.5, the lesion is judged to be uterine sarcoma, and if it is less than 0.5, the lesion is judged to be a uterine fibroid.
Preferably, the detection reagent for the biomarker comprises a reagent for detecting the content or expression level of the biomarker; and/or the product comprises a reagent, kit, test strip, gene chip, protein chip, high-throughput sequencing platform, or proteomic analysis product.
In a fourth aspect, the invention provides a product for distinguishing uterine lesion categories, which comprises the biomarker for distinguishing uterine lesion categories according to the second aspect.
The beneficial effects of the invention are as follows: the screening method and its application provided by the invention use platelet transcriptome sequencing data, combined with unsupervised clustering and machine learning, to screen a set of biomarkers capable of distinguishing uterine lesions (uterine fibroids and uterine sarcoma) and to obtain a machine learning model that integrates comprehensive information for discriminating the two lesion types; its discriminative performance can be validated in an independent clinical cohort. Because the invention measures gene expression in the patient's platelets, it enables noninvasive detection in clinical practice. In addition, the number of screened biomarkers is small yet carries comprehensive information, which avoids high detection costs, and the biomarkers can be detected by qPCR (real-time fluorescence quantitative polymerase chain reaction).
Drawings
Fig. 1 is a flow chart of the method for screening biomarkers that discriminate uterine lesion types according to the invention.
Detailed Description
To aid understanding of the above technical solutions, they are described below with reference to the drawings and specific embodiments.
As shown in Fig. 1, an embodiment of the present invention provides a method for screening biomarkers that discriminate uterine lesion types, comprising:
S101, for uterine fibroid and uterine sarcoma patients, quantifying the expression value of each gene in platelets from the platelet transcriptome sequencing data;
S102, performing unsupervised clustering based on the expression values of genes in platelets, and screening for a candidate characteristic gene set based on the concordance between the clustering results and the disease labels, where the disease label is either uterine fibroid or uterine sarcoma;
S103, for each candidate characteristic gene in the candidate characteristic gene set, plotting an ROC (receiver operating characteristic) curve and calculating the AUC (area under the ROC curve) from the distribution of its expression values across all uterine fibroid and uterine sarcoma patients together with the disease labels, and screening a new candidate characteristic gene set according to the AUC values;
S104, training a machine learning model with the expression values of all genes in the new candidate characteristic gene set and the disease labels to obtain a trained machine learning model and its corresponding gene combination, which is taken as the biomarker for discriminating uterine lesion types.
In step S101, the platelet transcriptome sequencing data may be obtained as follows:
collecting peripheral blood plasma samples from uterine fibroid and uterine sarcoma patients, and isolating platelets from the plasma by gradient centrifugation;
extracting RNA from the isolated platelets and reverse-transcribing it into complementary DNA (cDNA);
constructing a sequencing library from the cDNA and performing transcriptome sequencing to obtain the platelet transcriptome sequencing data.
The reads from transcriptome sequencing are then aligned to the human reference genome and, using the annotation of human gene structures, the number of reads falling at exon/intron junctions is counted for each gene and taken as that gene's expression value in the platelet sample.
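The counting rule above can be sketched in pure Python. This is an illustration only, assuming reads have already been aligned and are given as half-open genomic intervals, and that the gene annotation has been reduced to a list of exon/intron boundary coordinates per gene; the function name `count_junction_reads` and these input formats are hypothetical, and a production pipeline would work from alignment files and a full annotation instead.

```python
from collections import defaultdict

def count_junction_reads(reads, junctions):
    """Count, per gene, the reads that span at least one exon/intron boundary.

    reads: list of (chrom, start, end) aligned intervals, half-open [start, end).
    junctions: list of (gene, chrom, boundary); the boundary lies between
        base `boundary - 1` and base `boundary` (hypothetical format).
    Returns dict gene -> read count; each read counts at most once per gene.
    """
    by_chrom = defaultdict(list)
    for gene, chrom, boundary in junctions:
        by_chrom[chrom].append((gene, boundary))

    counts = defaultdict(int)
    for chrom, start, end in reads:
        hit_genes = set()
        for gene, boundary in by_chrom.get(chrom, []):
            # The read spans the boundary if it covers both flanking bases.
            if start <= boundary - 1 and end > boundary:
                hit_genes.add(gene)
        for gene in hit_genes:
            counts[gene] += 1
    return dict(counts)
```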
Step S102 is performed as follows:
Step one, performing differential analysis of gene expression values in platelets between uterine fibroid and uterine sarcoma patients, and ranking the genes from the largest to the smallest difference to obtain an initial candidate characteristic gene set. Specifically, the Wilcoxon rank-sum test can be used to compare each gene's expression values between the two patient groups, yielding a p-value (tail probability) that indicates the degree of differential expression. A smaller p-value indicates that the gene's expression differs more between uterine fibroid and uterine sarcoma patients; a larger p-value indicates a smaller difference. All genes can therefore be ranked by ascending p-value. The initial candidate characteristic gene set obtained in step one contains the genes together with each gene's expression values in platelets and its p-value.
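A minimal sketch of the ranking in step one, assuming the expression values are held in a plain dictionary (gene to per-patient values) with labels coded 0 for uterine fibroid and 1 for uterine sarcoma. The rank-sum p-value here uses the large-sample normal approximation without tie or continuity correction, so it will differ slightly from exact or corrected implementations; the function names are hypothetical.

```python
import math

def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank shared by tied values
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # rank sum of the first group (indices 0..n1-1)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

def rank_genes(expr, labels):
    """Return [(gene, p_value)] sorted by ascending p (most differential first).

    expr: dict gene -> list of expression values, one per patient.
    labels: list of 0 (uterine fibroid) / 1 (uterine sarcoma), same patient order.
    """
    results = []
    for gene, values in expr.items():
        fibroid = [v for v, lab in zip(values, labels) if lab == 0]
        sarcoma = [v for v, lab in zip(values, labels) if lab == 1]
        results.append((gene, rank_sum_pvalue(fibroid, sarcoma)))
    return sorted(results, key=lambda item: item[1])
```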
Step two, removing the genes with the smallest expression differences from the initial candidate characteristic gene set to obtain an updated initial candidate characteristic gene set. In this embodiment, the 10 genes with the largest p-values (that is, the least differentially expressed genes) can be removed from the initial candidate characteristic gene set to obtain the updated set.
Step three, performing unsupervised clustering of the uterine fibroid and uterine sarcoma patients based on the expression values of all genes in the updated initial candidate characteristic gene set, and calculating the concordance between the clustering results and the disease labels. Specifically, k-means clustering with the number of clusters set to 2 (k = 2) may be used to cluster the patients, and Fisher's exact test may be used to test the concordance between each patient's k-means cluster assignment and disease label, yielding a concordance p-value. A smaller concordance p-value indicates better agreement between the k-means clusters and the disease labels; a larger one indicates poorer agreement.
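Step three can be sketched as follows, assuming each patient is represented by the list of expression values of the genes currently in the set. The k-means routine is a bare Lloyd's iteration with a deterministic initialisation chosen here for reproducibility (the source does not state how clusters are initialised), and the Fisher's exact test sums all 2x2 tables with the same margins that are no more probable than the observed one. All names are hypothetical.

```python
import math

def kmeans_2(points, n_iter=100):
    """Minimal Lloyd's k-means with k = 2; returns a cluster id (0 or 1) per patient."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Deterministic initialisation (an assumption): the first patient and the
    # patient farthest from it serve as the two starting centres.
    centres = [list(points[0])]
    centres.append(list(max(points, key=lambda p: dist2(p, centres[0]))))
    assign = None
    for _ in range(n_iter):
        new_assign = [0 if dist2(p, centres[0]) <= dist2(p, centres[1]) else 1
                      for p in points]
        if new_assign == assign:
            break  # assignments are stable: converged
        assign = new_assign
        for cid in (0, 1):
            members = [p for p, a in zip(points, assign) if a == cid]
            if members:  # keep the old centre if a cluster becomes empty
                centres[cid] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Hypergeometric probability of every feasible top-left cell value.
    probs = {x: math.comb(row1, x) * math.comb(n - row1, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the tables that are no more probable than the observed one.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))

def cluster_label_concordance(points, labels):
    """k-means (k = 2) on patients, then Fisher's exact test against the labels.

    labels: 0 = uterine fibroid, 1 = uterine sarcoma. Smaller p = better concordance.
    """
    clusters = kmeans_2(points)
    a = sum(1 for k, y in zip(clusters, labels) if k == 0 and y == 0)
    b = sum(1 for k, y in zip(clusters, labels) if k == 0 and y == 1)
    c = sum(1 for k, y in zip(clusters, labels) if k == 1 and y == 0)
    d = sum(1 for k, y in zip(clusters, labels) if k == 1 and y == 1)
    return fisher_exact_two_sided(a, b, c, d)
```

Because the two k-means clusters carry no inherent label, swapping them only swaps the rows of the table, which leaves the two-sided Fisher p-value unchanged.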
Step four, iteratively repeating steps two and three, with the updated initial candidate characteristic gene set from each iteration serving as the initial candidate characteristic gene set of the next, until the concordance between the clustering results and the disease labels no longer increases (specifically, until the concordance p-value no longer decreases); the updated initial candidate characteristic gene set from the previous iteration is then taken as the candidate characteristic gene set.
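The elimination loop of steps two to four might look like the sketch below. It is written against a hypothetical `concordance_fn` callable (for example, k-means with k = 2 followed by Fisher's exact test) so that the loop logic stands alone; the step of 10 genes follows the embodiment, and the function name and stopping details are assumptions.

```python
def eliminate_by_concordance(ranked_genes, concordance_fn, step=10):
    """Backward elimination driven by cluster/label concordance.

    ranked_genes: gene names sorted from most to least differential (smallest p first).
    concordance_fn: hypothetical callable(gene_list) -> concordance p-value,
        where smaller means better agreement between clusters and labels.
    Returns (gene_list, p_value) from the last round that still improved.
    """
    if len(ranked_genes) <= step:
        raise ValueError("need more genes than the removal step")
    # First update: drop the `step` least differential genes (largest p-values).
    current = list(ranked_genes)[:-step]
    best_p = concordance_fn(current)
    while len(current) > step:
        candidate = current[:-step]
        p = concordance_fn(candidate)
        if p >= best_p:  # concordance no longer increases: keep the previous set
            break
        current, best_p = candidate, p
    return current, best_p
```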
In step S103, for each candidate characteristic gene retained in step S102, an ROC curve is plotted and an AUC value is calculated from the distribution of that gene's expression values across all patients and the patients' disease labels (uterine fibroid or uterine sarcoma). A larger AUC value indicates that the gene's expression value better discriminates uterine fibroid from uterine sarcoma patients. Candidate characteristic genes whose AUC values exceed a threshold are selected to form a new candidate characteristic gene set; specifically, a threshold of 0.7 can be used, so that all genes with an AUC greater than 0.7 form the new candidate characteristic gene set.
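The per-gene AUC filter of step S103 can be sketched using the fact that the empirical ROC AUC equals the probability that a randomly chosen uterine sarcoma patient has a higher expression value than a randomly chosen uterine fibroid patient, with ties counted as one half. This sketch assumes that higher expression points towards uterine sarcoma; the source does not say how genes that are down-regulated in sarcoma (AUC well below 0.5) are handled. Function names are hypothetical.

```python
def roc_auc(scores, labels):
    """Empirical ROC AUC: P(sarcoma score > fibroid score), ties count 0.5.

    labels: 1 = uterine sarcoma (positive), 0 = uterine fibroid (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def filter_genes_by_auc(expr, labels, threshold=0.7):
    """Keep genes whose single-gene AUC is strictly greater than the threshold."""
    selected = {}
    for gene, values in expr.items():
        auc = roc_auc(values, labels)
        if auc > threshold:
            selected[gene] = auc
    return selected
```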
Step S104 may be performed as follows:
First step, removing any 5 genes from the current new candidate characteristic gene set, training an SVM (support vector machine) model with the expression values of the remaining genes and the patients' disease labels, and recording the accuracy of the SVM model; this is repeated until every combination of 5 genes in the new candidate characteristic gene set has been removed. In each training run, each patient's expression values for the remaining candidate characteristic genes serve as the model input, and the patient's disease label serves as the reference for assessing the accuracy of the model output. The support vector machine computes a probability from each patient's input: if the probability is greater than 0.5, the model classifies the patient as having uterine sarcoma; if it is less than 0.5, the model classifies the patient as having a uterine fibroid. A probability, and hence a model-predicted disease label (uterine sarcoma or uterine fibroid), is obtained for every patient in this way, and the accuracy of the model is then assessed against the patients' actual disease labels.
In the invention, accuracy can be calculated as: accuracy = (true positives + true negatives) / (true positives + true negatives + false positives + false negatives), where positive refers to uterine sarcoma and negative refers to uterine fibroid. A true positive is a sample that the machine learning model predicts as uterine sarcoma and whose true label is uterine sarcoma; a true negative is a sample predicted as uterine fibroid whose true label is uterine fibroid; a false positive is a sample predicted as uterine sarcoma whose true label is uterine fibroid; and a false negative is a sample predicted as uterine fibroid whose true label is uterine sarcoma.
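The decision rule and the accuracy formula above translate directly into code. In this sketch, label 1 stands for uterine sarcoma (positive) and 0 for uterine fibroid (negative); the source does not specify how a probability of exactly 0.5 is classified, so mapping it to uterine fibroid here is an assumption, as are the function names.

```python
def predict_labels(probabilities, cutoff=0.5):
    """Probability > cutoff -> 1 (uterine sarcoma); otherwise 0 (uterine fibroid).

    Treating a probability exactly equal to the cutoff as 0 is an assumption.
    """
    return [1 if p > cutoff else 0 for p in probabilities]

def accuracy(predicted, actual):
    """(TP + TN) / (TP + TN + FP + FN), with 1 = uterine sarcoma (positive)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return (tp + tn) / (tp + tn + fp + fn)

preds = predict_labels([0.9, 0.2, 0.6, 0.4])  # [1, 0, 1, 0]
print(accuracy(preds, [1, 0, 0, 1]))  # 0.5 (one each of TP, TN, FP, FN)
```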
Second step, selecting, from all SVM models trained in the first step, the gene set of the model with the highest accuracy as the updated candidate characteristic gene set. Here, the "gene set corresponding to an SVM model" means the input genes used to train that model. Each SVM model is trained on a different input gene set, namely a different set of candidate characteristic genes remaining after 5 genes have been removed. The gene set is the model's training input and is distinct from the patients' disease labels, which serve only as the reference for judging the accuracy of the model output.
Third step, iterating the first and second steps until the accuracy no longer increases; the candidate characteristic gene set from the previous iteration is taken as the final biomarker set for the differential diagnosis of uterine fibroids and uterine sarcoma, and the SVM model from the previous iteration is taken as the machine learning model for that differential diagnosis.
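The first to third steps amount to a backward elimination that tries every 5-gene removal in each round. The sketch below keeps the loop independent of any particular classifier by taking a hypothetical `score_fn(genes)` that returns the accuracy of a model trained on those genes; in practice this could wrap, for example, scikit-learn's `SVC(probability=True)`, but the source states neither the SVM settings nor whether accuracy is measured on the training data or by cross-validation. Tie-breaking between equally accurate gene sets is likewise an assumption.

```python
from itertools import combinations

def best_removal(genes, score_fn, k=5):
    """Try every way of removing k genes; return the remaining set with the best score."""
    best_genes, best_score = None, float("-inf")
    for removed in combinations(genes, k):
        dropped = set(removed)
        remaining = [g for g in genes if g not in dropped]
        score = score_fn(remaining)  # e.g. accuracy of an SVM trained on `remaining`
        if score > best_score:  # ties keep the first combination found (assumption)
            best_genes, best_score = remaining, score
    return best_genes, best_score

def backward_eliminate(genes, score_fn, k=5):
    """Repeat best_removal until accuracy stops increasing; return the previous round."""
    genes = list(genes)
    if len(genes) <= k:
        raise ValueError("need more than k genes")
    current, current_score = best_removal(genes, score_fn, k)
    while len(current) > k:
        candidate, score = best_removal(current, score_fn, k)
        if score <= current_score:  # accuracy no longer increases: stop
            break
        current, current_score = candidate, score
    return current, current_score
```

The number of model fits per round grows as C(n, 5): for a 30-gene set, a single round already requires 142,506 fits.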
The screening method provided by the invention yielded 7 biomarkers, EZH2, COPG1, SUMO3, CLIP1, GSR, SLA2 and TREML1; uterine fibroids and uterine sarcoma can be differentially diagnosed based on the expression values of these 7 biomarkers in platelets.
In a preferred embodiment, the screening method may further comprise validating the biomarkers; specifically, the performance of the biomarkers and the machine learning model is validated in an independent clinical cohort. First, peripheral blood plasma samples were collected from additional uterine fibroid and uterine sarcoma patients, independent of the samples used for feature gene screening and model training; platelets were isolated from the plasma by gradient centrifugation, and gene expression values in the platelet samples were measured as described in S101. Second, the expression values of the 7 biomarkers screened in S104 were extracted for each sample. Third, the SVM model constructed in S104 for the differential diagnosis of uterine fibroids and uterine sarcoma was applied to the expression values of the 7 biomarkers in each sample to classify each patient as having a uterine fibroid or uterine sarcoma. Fourth, these results were compared with each patient's actual disease label, and the AUC value of the machine learning model was calculated to measure the effectiveness of the screened biomarkers and the constructed model. Fifth, the AUC value calculated in the fourth step was 0.85, indicating that the biomarkers and the machine learning model can effectively discriminate uterine fibroids from uterine sarcoma.
The biomarker, or a detection reagent for the biomarker, can be used to prepare products for discriminating uterine lesion types. Discrimination can be performed as follows: the content or expression level of the biomarkers is measured in a patient with a uterine lesion and used as input to the trained machine learning model, which outputs a discrimination probability; if the probability is greater than 0.5, the lesion is judged to be uterine sarcoma, and if it is less than 0.5, the lesion is judged to be a uterine fibroid. The detection reagent may include a reagent for detecting the content or expression level of the biomarker; and/or the product comprises a reagent, kit, test strip, gene chip, protein chip, high-throughput sequencing platform, or proteomic analysis product.
The invention also provides a product for distinguishing uterine lesion types, and the product comprises the biomarker provided by the invention or a detection reagent of the biomarker.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (13)

1. A screening method for biomarkers for discriminating uterine lesion categories, comprising:
for uterine fibroid and uterine sarcoma patients, quantifying the expression value of each gene in platelets from the platelet transcriptome sequencing data;
performing unsupervised clustering based on the expression values of genes in platelets, and screening for a candidate characteristic gene set based on the concordance between the clustering results and the disease labels, where the disease label is either uterine fibroid or uterine sarcoma;
for each candidate characteristic gene in the candidate characteristic gene set, plotting an ROC curve from the distribution of its expression values across all uterine fibroid and uterine sarcoma patients together with the disease labels, calculating the AUC value, and screening a new candidate characteristic gene set according to the AUC values;
and training a machine learning model with the expression values of all genes in the new candidate characteristic gene set and the disease labels to obtain a trained machine learning model and its corresponding gene combination, which is taken as the biomarker for discriminating uterine lesion types.
2. The method of claim 1, wherein the transcriptome sequencing data of platelets is obtained by:
collecting peripheral blood plasma samples from uterine fibroid and uterine sarcoma patients, and isolating platelets from the plasma by gradient centrifugation;
extracting RNA from the isolated platelets and reverse-transcribing it into complementary DNA (cDNA);
constructing a sequencing library from the cDNA and performing transcriptome sequencing to obtain the platelet transcriptome sequencing data.
3. The method for screening biomarkers for discriminating uterine lesion categories according to claim 1, wherein performing unsupervised clustering based on the expression values of genes in platelets and screening a candidate characteristic gene set based on the consistency between the unsupervised clustering results and the disease labels comprises:
step one, performing differential analysis on the platelet gene expression values of patients with uterine fibroids and patients with uterine sarcoma, and ranking the genes in descending order of difference to obtain an initial candidate characteristic gene set;
step two, removing the gene with the smallest expression difference from the initial candidate characteristic gene set to obtain an updated initial candidate characteristic gene set;
step three, performing unsupervised clustering of the uterine fibroid and uterine sarcoma patients based on the expression values of all genes in the updated initial candidate characteristic gene set, and calculating the consistency between the clustering result and the disease labels; and
step four, iteratively repeating steps two and three, with the updated initial candidate characteristic gene set from each iteration serving as the initial candidate characteristic gene set for the next iteration, until the consistency between the clustering result and the disease labels no longer increases, and then taking the updated initial candidate characteristic gene set from the previous iteration as the candidate characteristic gene set.
4. The method of claim 1, wherein screening the new candidate characteristic gene set according to the AUC values comprises: selecting the candidate characteristic genes whose AUC values are greater than a threshold to form the new candidate characteristic gene set.
5. The method for screening biomarkers for discriminating uterine lesion categories according to claim 1, wherein training a machine learning model using the expression values of all genes in the new candidate characteristic gene set and the disease labels to obtain a trained machine learning model and its corresponding gene combination comprises:
first step, removing 5 genes at random from the current new candidate characteristic gene set, training an SVM model with the expression values of the remaining genes and the patients' disease labels, and recording the accuracy of the SVM model; repeating this operation until every combination of 5 genes in the new candidate characteristic gene set has been removed;
second step, selecting, among all SVM models trained in the first step, the gene set corresponding to the SVM model with the highest accuracy as the updated candidate characteristic gene set; and
third step, iterating the first and second steps until the accuracy no longer increases, then taking the candidate characteristic gene set from the previous iteration as the finally screened biomarker for the differential diagnosis of uterine fibroids and uterine sarcoma, and the SVM model from the previous iteration as the machine learning model for that differential diagnosis.
6. The method for screening biomarkers for discriminating uterine lesion categories according to claim 1, further comprising the step of validating the biomarkers.
7. A biomarker for discriminating uterine lesion categories, characterized in that it is obtained by the method for screening biomarkers for discriminating uterine lesion categories according to any one of claims 1-6.
8. The biomarker for discriminating uterine lesion categories according to claim 7, comprising EZH2, COPG1, SUMO3, CLIP1, GSR, SLA2 and TREML1.
9. The biomarker for discriminating uterine lesion categories according to claim 7, characterized in that uterine lesion categories are discriminated by the following method: detecting the content or expression level of the biomarker in patients with different categories of uterine lesion, and using the content or expression level of the biomarker as the input of a trained machine learning model whose output is a discrimination probability; if the output discrimination probability is greater than 0.5, the uterine lesion is judged to be uterine sarcoma; if it is less than 0.5, the uterine lesion is judged to be a uterine fibroid.
10. Use of a biomarker, or of a detection reagent for the biomarker, in the manufacture of a product for discriminating uterine lesion categories, wherein the biomarker is the biomarker according to claim 7.
11. The use of claim 10, wherein uterine lesion categories are discriminated by the following method: detecting the content or expression level of the biomarker in patients with different categories of uterine lesion, and using the content or expression level of the biomarker as the input of a trained machine learning model whose output is a discrimination probability; if the output discrimination probability is greater than 0.5, the uterine lesion is judged to be uterine sarcoma; if it is less than 0.5, the uterine lesion is judged to be a uterine fibroid.
12. The use of claim 10, wherein the detection reagent for the biomarker comprises a reagent for detecting the expression level or content of the biomarker; and/or the product comprises a reagent, a kit, a test strip, a gene chip, a protein chip, a high-throughput sequencing platform or a proteomic analysis product.
13. A product for discriminating uterine lesion categories, characterized in that the product comprises the biomarker for discriminating uterine lesion categories according to claim 7.
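The iterative elimination in claim 3 can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the applicants' implementation. The claim does not name the differential-analysis statistic, the clustering algorithm or the consistency measure. The sketch therefore assumes three things: the absolute difference of group means, a deterministic 2-means clustering, and agreement with the labels under the better of the two cluster-to-label mappings. It also assumes one gene is removed per round. All function names are hypothetical. Labels are coded 0 for uterine fibroid and 1 for uterine sarcoma.

```python
# Hypothetical sketch of the claim 3 loop (assumed statistic, clustering and score).


def mean_difference(values, labels):
    """Absolute difference of group means (assumed stand-in for 'differential analysis').
    Assumes both label groups are non-empty."""
    g0 = [v for v, y in zip(values, labels) if y == 0]
    g1 = [v for v, y in zip(values, labels) if y == 1]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans2(points, max_iter=100):
    """Deterministic 2-means: seed with the first point and the point farthest from it."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centroids = [list(c0), list(c1)]
    assign = None
    for _ in range(max_iter):
        new_assign = [0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
                      for p in points]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def consistency(assign, labels):
    """Fraction of patients whose cluster matches their label, under the better mapping."""
    agree = sum(a == y for a, y in zip(assign, labels))
    return max(agree, len(labels) - agree) / len(labels)


def select_by_clustering(expr, labels):
    """expr maps gene name -> list of expression values, one per patient."""
    # Step one: rank genes by differential expression, largest first.
    current = sorted(expr, key=lambda g: mean_difference(expr[g], labels), reverse=True)
    n = len(labels)
    prev_set, prev_score = None, None
    while len(current) > 1:
        # Step two: drop the gene with the smallest difference (last in the ranking).
        updated = current[:-1]
        # Step three: cluster patients on the remaining genes and score label agreement.
        points = [[expr[g][i] for g in updated] for i in range(n)]
        score = consistency(kmeans2(points), labels)
        # Step four: stop once agreement no longer increases; keep the previous set.
        if prev_score is not None and score <= prev_score:
            return prev_set
        prev_set, prev_score = updated, score
        current = updated
    return prev_set if prev_set is not None else current
```

The stopping rule, which returns the set from the round before agreement stops rising, is the part taken directly from the claim. A real pipeline would likely substitute a proper differential-expression test and a library clustering routine for the stand-ins above.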
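The AUC screen in claims 1 and 4 scores each gene on its own. The sketch below computes the ROC AUC through the equivalent Mann-Whitney pairwise count, in which ties count one half. Uterine sarcoma (label 1) is the positive class, which matches claim 9, where probabilities above 0.5 indicate sarcoma. The threshold of 0.7 is an assumed placeholder because the claims do not state a value. The function names are hypothetical.

```python
def auc(values, labels):
    """ROC AUC via the Mann-Whitney statistic; label 1 (uterine sarcoma) is positive."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # positive ranked above negative
            elif p == q:
                wins += 0.5  # ties count one half
    return wins / (len(pos) * len(neg))


def screen_by_auc(expr, labels, threshold=0.7):
    """Keep genes whose AUC exceeds the threshold (0.7 is an assumed placeholder)."""
    return [g for g in expr if auc(expr[g], labels) > threshold]
```

Under this literal reading of claim 4, a gene expressed lower in sarcoma has an AUC well below 0.5 and is dropped. Scoring `max(auc, 1 - auc)` is one common direction-agnostic alternative, but the claim does not say which reading is intended.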
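Claim 5 describes a backward elimination. In each round it removes every possible combination of 5 genes, retrains on the remainder, and keeps the best-scoring remainder. The sketch below reproduces that loop with `itertools.combinations`, with three assumptions. First, to stay dependency-free it swaps in a leave-one-out nearest-centroid classifier for the SVM named in the claim. Second, it returns only the gene set and its accuracy rather than a trained model. Third, leave-one-out evaluation is an assumption because the claim does not say how accuracy is measured. Function names are hypothetical.

```python
from itertools import combinations


def loo_centroid_accuracy(expr, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for the SVM).
    Assumes each class has at least two patients."""
    n = len(labels)
    correct = 0
    for i in range(n):
        x = [expr[g][i] for g in genes]
        dists = {}
        for cls in (0, 1):
            idx = [j for j in range(n) if j != i and labels[j] == cls]
            centroid = [sum(expr[g][j] for j in idx) / len(idx) for g in genes]
            dists[cls] = sum((a - b) ** 2 for a, b in zip(x, centroid))
        pred = 0 if dists[0] <= dists[1] else 1
        correct += int(pred == labels[i])
    return correct / n


def eliminate_combinations(expr, labels, genes, k=5, score=loo_centroid_accuracy):
    """Return (gene set, accuracy) from the round before accuracy stopped increasing."""
    current = list(genes)
    prev_set, prev_acc = None, None
    while len(current) > k:
        # First step: try every way of removing k genes and score what remains.
        candidates = [[g for g in current if g not in drop]
                      for drop in combinations(current, k)]
        scored = [(score(expr, labels, c), c) for c in candidates]
        # Second step: keep the best-scoring remainder (first one wins ties).
        best_acc, best_set = max(scored, key=lambda t: t[0])
        # Third step: stop when accuracy no longer increases; keep the previous round.
        if prev_acc is not None and best_acc <= prev_acc:
            return prev_set, prev_acc
        prev_set, prev_acc = best_set, best_acc
        current = best_set
    if prev_set is None:  # too few genes to remove k of them even once
        return current, score(expr, labels, current)
    return prev_set, prev_acc
```

Each round costs C(n, 5) model fits for n genes, and this grows quickly: C(30, 5) = 142,506. The size of the AUC-filtered set from claim 4 therefore largely determines the cost of this step.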
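The decision rule in claims 9 and 11 is a fixed 0.5 cut-off on the model's discrimination probability. Below is a minimal sketch with two assumptions. The claims leave the case of exactly 0.5 undefined, so returning "indeterminate" there is assumed. The function name is also hypothetical.

```python
def classify_lesion(probability):
    """Map the model's discrimination probability to a uterine lesion category."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability > 0.5:
        return "uterine sarcoma"
    if probability < 0.5:
        return "uterine fibroid"
    # Exactly 0.5 is not covered by the claims; flagging it is an assumption.
    return "indeterminate"
```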
CN202410081934.6A 2024-01-19 2024-01-19 Screening method of biomarker for distinguishing uterine lesion type and application thereof Withdrawn CN117594133A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410081934.6A CN117594133A (en) 2024-01-19 2024-01-19 Screening method of biomarker for distinguishing uterine lesion type and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410081934.6A CN117594133A (en) 2024-01-19 2024-01-19 Screening method of biomarker for distinguishing uterine lesion type and application thereof

Publications (1)

Publication Number Publication Date
CN117594133A true CN117594133A (en) 2024-02-23

Family

ID=89913798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410081934.6A Withdrawn CN117594133A (en) 2024-01-19 2024-01-19 Screening method of biomarker for distinguishing uterine lesion type and application thereof

Country Status (1)

Country Link
CN (1) CN117594133A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005076005A2 (en) * 2004-01-30 2005-08-18 Medizinische Universität Wien A method for classifying a tumor cell sample based upon differential expression of at least two genes
CN103025890A (en) * 2010-04-06 2013-04-03 卡里斯生命科学卢森堡控股 Circulating biomarkers for disease
CN103409501A (en) * 2013-05-10 2013-11-27 新疆医科大学 Method for screening candidate plasma protein markers by cervical carcinoma specificity difference expression
US20180258499A1 (en) * 2013-05-28 2018-09-13 Beijing Normal University Neuroglioma molecular subtyping gene group and use thereof
CN109680060A (en) * 2017-10-17 2019-04-26 华东师范大学 Methylate marker and its application in diagnosing tumor, classification
CN110444248A (en) * 2019-07-22 2019-11-12 山东大学 Cancer Biology molecular marker screening technique and system based on network topology parameters
CN111739581A (en) * 2020-06-12 2020-10-02 大连理工大学 Comprehensive screening method for genome variables
CN112397153A (en) * 2020-11-18 2021-02-23 河南科技大学第一附属医院 Method for screening biomarker for predicting esophageal squamous cell carcinoma prognosis
CN112927757A (en) * 2021-02-24 2021-06-08 河南大学 Gastric cancer biomarker identification method based on gene expression and DNA methylation data
CN113862351A (en) * 2020-06-30 2021-12-31 清华大学 Kit and method for identifying extracellular RNA biomarkers in body fluid sample
CN114582425A (en) * 2022-03-14 2022-06-03 上海交通大学医学院附属仁济医院 NMIBC prognosis prediction molecular marker, screening method and modeling method
CN115287347A (en) * 2022-08-01 2022-11-04 华中农业大学 Asymptomatic mitral valve myxomatosis-like lesion biomarker for dogs and application thereof

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (Application publication date: 20240223)