CN110570951A - Method for constructing classification model of new auxiliary chemotherapy curative effect of breast cancer - Google Patents


Info

Publication number
CN110570951A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910841963.7A
Other languages
Chinese (zh)
Inventor
胥顺
李坤
杨学习
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiongji Bioinformatics Technology Co Ltd
Original Assignee
Guangzhou Xiongji Bioinformatics Technology Co Ltd

Classifications

    • G06F18/23: Pattern recognition; Analysing; Clustering techniques
    • G06F18/24: Pattern recognition; Analysing; Classification techniques
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B25/20: ICT specially adapted for hybridisation, gene or protein expression; Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
    • G16H70/00: ICT specially adapted for the handling or processing of medical references
    • G16H70/40: ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage


Abstract

The invention discloses a method for constructing a classification model of the efficacy of neoadjuvant chemotherapy for breast cancer. The model is obtained by acquiring plasma cfDNA sequencing data from patients with known treatment outcomes, analyzing the coverage of transcription start site (TSS) regions, determining coverage differences, and then performing cluster analysis. The model constructed by this method can predict the efficacy of neoadjuvant chemotherapy in breast cancer patients, inform the choice of treatment strategy, and guide neoadjuvant chemotherapy. The data required by the model come from peripheral blood, so the test is non-invasive. It avoids the drawbacks of the invasive needle biopsy required by existing approaches to predicting neoadjuvant chemotherapy efficacy in breast cancer, including poor clinical experience, inability to address tumor heterogeneity, contraindications to biopsy, and the risk of tumor seeding and metastasis.

Description

Method for constructing a classification model of the efficacy of neoadjuvant chemotherapy for breast cancer
Technical Field
The invention relates to a method and a kit for constructing a classification model, and in particular to a method for constructing a classification model of the efficacy of neoadjuvant chemotherapy for breast cancer using transcription start site (TSS) regions in plasma cell-free DNA (cfDNA).
Background
Neoadjuvant chemotherapy (NACT) is currently used in patients with locally advanced breast cancer, or in patients who have indications for adjuvant chemotherapy and a strong wish for breast conservation. It has four main advantages: first, it can make unresectable tumors resectable; second, it increases the chance of breast conservation; third, for patients with locally advanced disease and a risk of distant metastasis, preoperative systemic treatment is expected to improve survival; fourth, tumor shrinkage and imaging changes can be observed, allowing rapid assessment of clinical efficacy. Neoadjuvant chemotherapy can therefore markedly reduce tumor size before surgery, lower the clinical stage, reduce axillary lymph node involvement, avoid total mastectomy, and help improve patients' quality of life.
The response to neoadjuvant chemotherapy is usually evaluated by pathologic complete response (pCR), meaning that at the time of definitive surgery no invasive breast cancer tissue remains in either the primary tumor bed or the regional lymph nodes. Achieving pCR is closely associated with improved overall survival and disease-free survival. Response rates to neoadjuvant chemotherapy are high in HER2-positive and triple-negative breast cancer: the proportion of HER2-positive patients reaching pCR can be 30-60% or more, and that of triple-negative patients 20-30%. Among ER-positive patients, the pCR rate of Luminal A patients is only about 5%, and pCR has little relation to long-term prognosis. Luminal B patients account for the majority of breast cancer patients in China; after neoadjuvant chemotherapy, more than half achieve a marked reduction in tumor volume and a markedly improved prognosis even though a few invasive cells remain. For Luminal B disease, neoadjuvant chemotherapy can therefore aim at tumor shrinkage and downstaging rather than at pCR. According to the Response Evaluation Criteria in Solid Tumors (RECIST 1.0), efficacy is divided into four categories: CR (complete response: disappearance of all lesions, maintained for 4 weeks), PR (partial response: reduction of at least 30%, maintained for 4 weeks), SD (stable disease: the criteria for neither PR nor PD are met), and PD (progressive disease: increase of at least 20%, with no CR, PR, or SD documented before the increase).
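The four RECIST categories quoted above can be illustrated with a small sketch. It is a simplification under stated assumptions: a single follow-up measurement of summed lesion diameter is compared with baseline, the 4-week confirmation requirement is ignored, and the function name `recist_category` is hypothetical rather than taken from the patent.

```python
def recist_category(baseline_mm: float, followup_mm: float) -> str:
    """Assign CR/PR/SD/PD from the relative change in summed lesion diameter.

    Assumption: one follow-up value is compared with baseline; the
    4-week confirmation described in the text is not modeled.
    """
    if baseline_mm <= 0:
        raise ValueError("baseline diameter must be positive")
    if followup_mm == 0:
        return "CR"  # all lesions have disappeared
    change = (followup_mm - baseline_mm) / baseline_mm
    if change <= -0.30:
        return "PR"  # reduction of at least 30%
    if change >= 0.20:
        return "PD"  # increase of at least 20%
    return "SD"      # neither PR nor PD


print(recist_category(50.0, 30.0))
```

In the examples of this document, only the PR and SD categories (alongside pCR and npCR) are used to label patients.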
Although considerable progress has been made in neoadjuvant chemotherapy over the last 30 years, 20-30% of patients respond poorly to neoadjuvant regimens, which leads to delayed treatment, missed optimal treatment windows, and overtreatment. Formulating an individualized breast cancer treatment plan based on predictive markers of sensitivity is therefore very important and avoids treating blindly. Conventional schemes for predicting the benefit of adjuvant chemotherapy rely mainly on the 21-gene recurrence risk assessment, intended for early-stage breast cancer patients who are ER positive without lymph node metastasis. In 2019, the NCCN, ASCO, and CSCO breast cancer guidelines added 70-gene expression profiling, intended mainly for ER- or PR-positive, HER2-negative breast cancer patients with 1-3 metastatic axillary lymph nodes. These tests are meaningful for treatment decisions in early-stage breast cancer and for assessing the 5-year risk of distant metastasis in early-stage invasive breast cancer, so that some patients can be spared adjuvant chemotherapy without compromising long-term outcome. At present there is no standard for predicting the efficacy of neoadjuvant chemotherapy for breast cancer. One possible approach is to perform the 21-gene recurrence risk assessment or 70-gene expression profiling on a needle biopsy specimen of cancer tissue taken before neoadjuvant chemotherapy, and to use the result to predict efficacy and select a regimen. However, tumor heterogeneity and the small amount of tissue in a biopsy specimen limit its clinical application.
In the last few years, liquid biopsy has been increasingly used for early cancer diagnosis, prognosis prediction, recurrence risk assessment, and serial sampling to monitor disease progression and treatment response. Compared with conventional tissue biopsy, liquid biopsy is non-invasive, less affected by tumor heterogeneity, and allows serial sampling. The markers currently used to monitor breast cancer are the tumor antigens CA 15-3 and CA 27-29, but their sensitivity is estimated at only 60-70%. In addition, the detection rates of circulating tumor DNA (ctDNA), circulating tumor cells (CTCs), and exosomes in patient plasma are strongly associated with prognosis. Mutations in genes such as FGFR, ErbB2 (HER2), ErbB3 (HER3), PI3K, and BRCA1/2 detected in ctDNA provide a reference for treatment strategies in metastatic breast cancer.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a method for constructing a classification model of the efficacy of neoadjuvant chemotherapy for breast cancer.
The technical scheme adopted by the invention is as follows:
The inventors have found that cell-free DNA (cfDNA) is derived mainly from apoptotic cells, and that the cfDNA that persists is the portion protected from degradation by nucleosomes. The nucleosome footprint on the genome is closely related to the level of gene transcription, and nucleosome occupancy is markedly reduced in the region near the transcription start sites (TSSs) of expressed genes. The invention therefore uses whole-genome sequencing of plasma cfDNA to analyze differences in TSS region coverage and constructs a classification model for predicting the efficacy of neoadjuvant chemotherapy for breast cancer, providing a broader and more reliable data source for breast cancer treatment.
A method for constructing a classification model of the efficacy of neoadjuvant chemotherapy for breast cancer comprises the following steps:
Acquisition of cfDNA reads:
Obtaining plasma cfDNA sequencing data from patients of different breast cancer subtypes with known chemotherapy outcomes, yielding the cfDNA reads;
TSS region coverage analysis:
Extracting the genomic coordinates of the regions upstream and downstream of each gene's transcription start site (TSS), and defining them as the original promoter region;
Aligning the cfDNA reads to the human reference genome hg19, removing duplicate sequences from the alignment data, counting the reads in the original promoter regions of all protein-coding genes, and then normalizing the TSS region coverage;
Coverage difference analysis:
Using the Kruskal-Wallis nonparametric one-way analysis of variance to screen for gene promoter regions whose TSS region coverage differs between groups, correcting the raw P values, retaining the gene promoter regions with FDR values less than 0.1, and thereby determining the differential TSS regions;
Cluster analysis:
Standardizing the TSS region coverage data across samples and performing hierarchical clustering based on the correlation of promoter coverage between samples; according to the coverage differences in the original promoter regions and the patients' known treatment outcomes, the samples are clustered into a pCR group and an npCR group, or a PR group and an SD group, to obtain the classification model of the efficacy of neoadjuvant chemotherapy for breast cancer.
In some examples of the method, the original promoter region is the region 1 kb upstream and downstream of each gene's transcription start site (TSS).
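As a minimal sketch of how such a window might be derived from a gene annotation: the code below assumes UCSC-style coordinates (0-based transcript start, end-exclusive transcript end) and takes the TSS as the transcript start on the plus strand and the last transcribed base on the minus strand. The `Gene` record and `tss_window` function are hypothetical names introduced for illustration; the patent does not specify an implementation.

```python
from typing import NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    strand: str    # "+" or "-"
    tx_start: int  # 0-based, inclusive (UCSC-style assumption)
    tx_end: int    # end-exclusive


def tss_window(gene: Gene, flank: int = 1000) -> Tuple[str, int, int]:
    """Return (chrom, start, end) covering `flank` bp on each side of the TSS.

    The returned interval is 0-based and end-exclusive, clipped at 0.
    """
    tss = gene.tx_start if gene.strand == "+" else gene.tx_end - 1
    start = max(0, tss - flank)
    end = tss + flank + 1
    return gene.chrom, start, end


print(tss_window(Gene("GENE_A", "chr1", "+", 5000, 9000)))
```

Clipping at the chromosome end is omitted here; a fuller version would also need chromosome lengths for hg19.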
In some examples of the method, the amount of cfDNA sequencing data per sample is not less than 6M.
In some examples of the method, the breast cancer subtypes comprise Luminal B, HER2 positive, and triple negative.
In some examples of the method, the drugs used in the chemotherapy comprise at least one of epirubicin, cyclophosphamide, paclitaxel, carboplatin, gemcitabine, and capecitabine.
In some examples of the method, the cluster analysis data are visualized.
In some examples of the method, the software used for visualization is Treeview software or an R language heatmap function.
In some examples of the method, the normalization used in the TSS region coverage analysis is selected from RPKM normalization, FPKM normalization, or the following calculation:
Normalized value = (number of reads in the region) / (total number of aligned reads).
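The two normalizations named above can be sketched as follows. The simple ratio follows the formula in the text; the RPKM expression is the commonly used definition (reads per kilobase of region per million mapped reads) and is stated here as an assumption, since the source does not spell it out. Function names are hypothetical.

```python
def fraction_of_aligned(region_reads: int, total_aligned: int) -> float:
    """Reads in the region divided by the total number of aligned reads."""
    return region_reads / total_aligned


def rpkm(region_reads: int, region_length_bp: int, total_aligned: int) -> float:
    """Reads per kilobase of region per million aligned reads."""
    return region_reads * 1e9 / (region_length_bp * total_aligned)


# 20 reads in a 2 kb promoter window, 10 million aligned reads in the sample
print(fraction_of_aligned(20, 10_000_000))
print(rpkm(20, 2000, 10_000_000))
```

When every window has the same length, the two measures differ only by a constant factor, so they rank promoters identically within a sample.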
In some examples of the method, the raw P values are corrected using the Holm method when the coverage difference analysis is performed.
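A minimal sketch of the Holm step-down adjustment is given below. Holm's procedure is usually described as controlling the family-wise error rate; the source nonetheless refers to the corrected values as FDR values, and this sketch simply returns the adjusted P values without taking a position on the terminology. The function name `holm_adjust` is hypothetical.

```python
from typing import List, Sequence


def holm_adjust(pvalues: Sequence[float]) -> List[float]:
    """Return Holm-adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest P value is multiplied by m, the next by m - 1, and so on.
        value = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


adjusted = holm_adjust([0.01, 0.04, 0.03])
selected = [i for i, p in enumerate(adjusted) if p < 0.1]
print(adjusted, selected)
```

The last two lines apply the 0.1 threshold from the text to the adjusted values.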
In some examples of the method, the gene promoter coverage data are standardized across samples using the Z-score method, calculated as: standardized value $z = \frac{x - \mu}{\sigma}$, where $x$ is the original value, $\mu$ is the sample mean, and $\sigma$ is the sample standard deviation.
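The Z-score standardization can be sketched with the Python standard library. The sketch assumes the transformation is applied to one promoter's coverage values across samples and uses the sample standard deviation (n - 1 denominator); the source does not state either choice explicitly, and the function name `zscore` is hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize values to zero mean and unit sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        # A constant profile carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


print(zscore([1.0, 2.0, 3.0]))
```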
The invention has the beneficial effects that:
The model constructed by this method can predict the efficacy of neoadjuvant chemotherapy for a breast cancer patient from that patient's plasma cfDNA sequence, inform the choice of breast cancer treatment strategy, and guide neoadjuvant chemotherapy.
The data required by the model come from peripheral blood, so the test is non-invasive. It avoids the drawbacks of the invasive needle biopsy required by existing approaches to predicting neoadjuvant chemotherapy efficacy in breast cancer, including poor clinical experience, inability to address tumor heterogeneity, contraindications to biopsy, and the risk of tumor seeding and metastasis.
Drawings
FIG. 1. Clustering of differential TSS region coverage in pre-chemotherapy plasma cfDNA of the breast cancer PR and SD groups treated with the epirubicin, cyclophosphamide, and paclitaxel combination regimen.
FIG. 2. Clustering of differential TSS region coverage in pre-chemotherapy plasma cfDNA of the breast cancer pCR and npCR groups treated with the epirubicin, cyclophosphamide, and paclitaxel combination regimen.
FIG. 3. Clustering of differential TSS region coverage in pre-chemotherapy plasma cfDNA of the breast cancer pCR and npCR groups treated with the epirubicin, cyclophosphamide, paclitaxel, and carboplatin combination regimen.
Detailed Description
A method for constructing a classification model of the efficacy of neoadjuvant chemotherapy for breast cancer comprises the following steps:
Acquisition of cfDNA reads:
Obtaining plasma cfDNA sequencing data from patients of different breast cancer subtypes with known chemotherapy outcomes, yielding the cfDNA reads;
TSS region coverage analysis:
Extracting the genomic coordinates of the regions upstream and downstream of each gene's transcription start site (TSS), and defining them as the original promoter region;
Aligning the cfDNA reads to the human reference genome hg19, removing duplicate sequences from the alignment data, counting the reads in the original promoter regions of all protein-coding genes, and then normalizing the TSS region coverage;
Coverage difference analysis:
Using the Kruskal-Wallis nonparametric one-way analysis of variance to screen for gene promoter regions whose TSS region coverage differs between groups, correcting the raw P values, retaining the gene promoter regions with FDR values less than 0.1, and thereby determining the differential TSS regions;
Cluster analysis:
Standardizing the TSS region coverage data across samples and performing hierarchical clustering based on the correlation of promoter coverage between samples; according to the coverage differences in the original promoter regions and the patients' known treatment outcomes, the samples are clustered into a pCR group and an npCR group, or a PR group and an SD group, to obtain the classification model of the efficacy of neoadjuvant chemotherapy for breast cancer.
Based on this classification model, the category of a patient to be tested can be determined by analyzing that patient's plasma cfDNA sequencing data, so that a corresponding neoadjuvant chemotherapy strategy can be formulated.
In some examples of the method, the original promoter region is the region 1 kb upstream and downstream of each gene's transcription start site (TSS). This avoids omissions and improves the accuracy of the data.
In some examples of the method, the amount of cfDNA sequencing data per sample is not less than 6M. This helps exclude invalid or unreliable data.
In some examples of the method, the breast cancer subtypes comprise Luminal B, HER2 positive, and triple negative.
In some examples of the method, the drugs used in the chemotherapy comprise at least one of epirubicin, cyclophosphamide, paclitaxel, carboplatin, gemcitabine, and capecitabine. These are drugs commonly used in neoadjuvant chemotherapy for breast cancer.
In some examples of the method, the cluster analysis data are visualized. This displays the results more intuitively.
In some examples of the method, visualization uses a common tool such as Treeview software or an R language heatmap function.
In some examples of the method, the normalization used in the TSS region coverage analysis is selected from RPKM normalization, FPKM normalization, or the following calculation:
Normalized value = (number of reads in the region) / (total number of aligned reads).
Normalizing the data reduces the influence of differences between samples on the accuracy of the result.
In some examples of the method, the raw P values are corrected using the Holm method when the coverage difference analysis is performed.
In some examples of the method, the gene promoter coverage data are standardized across samples using the Z-score method, calculated as: standardized value $z = \frac{x - \mu}{\sigma}$, where $x$ is the original value, $\mu$ is the sample mean, and $\sigma$ is the sample standard deviation. Specifically, the standardization may be performed with conventional software such as Cluster.
The technical scheme of the invention is further explained below with reference to the examples. The examples are intended to illustrate the technical solutions of the present invention and should not be construed as limiting its scope.
The detection reagents used in the following examples include plasma cfDNA extraction reagents, library construction reagents, sequencing chips, and the like. The library construction kit comprises end repair reagents, adapter ligation reagents, and PCR amplification reagents, all of which are conventional reagents in the field and commercially available.
Example 1. Predicting the efficacy of breast cancer neoadjuvant chemotherapy with the epirubicin, cyclophosphamide, and paclitaxel combination regimen
Step 1: Sample collection: Peripheral blood was collected from 26 breast cancer patients before neoadjuvant chemotherapy with the epirubicin, cyclophosphamide, and paclitaxel combination regimen. Treatment efficacy was evaluated after 8 cycles of chemotherapy. According to the RECIST 1.0 criteria, 18 patients achieved PR and 8 achieved SD; according to the pCR criterion, 7 patients achieved pCR and 19 did not (npCR). Details are shown in Table 1.
Step 2: Plasma separation: The collected peripheral blood was centrifuged at 4 ℃ and 1600 g for 10 min; the supernatant was then centrifuged at 4 ℃ and 16000 g for 10 min, taking care not to aspirate leukocytes.
Step 3: Plasma cfDNA extraction: Plasma cfDNA was extracted with a plasma cell-free DNA extraction kit (MagaBio) strictly according to the manufacturer's instructions, and the concentration of the extracted DNA was measured with a Qubit 3.0 (Life Technologies).
Step 4: Library construction and quantification: The DNA was end-repaired using the Ion Plus Fragment Library Kit (Life Technologies) and purified with magnetic beads; sequencing adapters were then ligated to both ends and the product was amplified to obtain a sequencing library, whose concentration was determined with the Qubit 3.0.
Step 5: High-throughput sequencing: Ten libraries were pooled in equal amounts, and template preparation and enrichment were performed with the Ion PI™ Hi-Q™ OT2 200 Kit (Life Technologies). Sequencing was performed with the Ion PI™ Hi-Q™ Sequencing Kit (Life Technologies) on an Ion Torrent Proton™ sequencer, using the preset program with a flow number of 500; the data volume of each sample was greater than 6M.
Step 6: Bioinformatics analysis
(1) TSSs regional coverage analysis
Information on human protein-coding genes was downloaded from the RefSeq database at UCSC, and the genomic coordinates of the regions 1 kb upstream and downstream of each gene's transcription start site (TSS) were extracted from the database annotation. The raw sequencing reads were aligned to the human reference genome hg19 using Bowtie, duplicate sequences were removed from the alignment data, the reads in the original promoter regions of all protein-coding genes were counted, and the data were normalized by the RPKM method.
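The de-duplication and counting step can be sketched in plain Python, starting from reads that have already been aligned (the alignment itself, done with Bowtie in the source, is not reproduced). Two assumptions are made for illustration: reads with the same chromosome, start position, and strand are treated as duplicates, and a read is counted for a window when its start position falls inside it. The function name and input layout are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Read = Tuple[str, int, str]     # (chrom, 0-based start, strand)
Window = Tuple[str, int, int]   # (chrom, start, end), end-exclusive


def count_reads_in_windows(reads: Iterable[Read],
                           windows: Dict[str, Window]) -> Dict[str, int]:
    """Count de-duplicated read starts falling inside each promoter window."""
    unique = set(reads)  # drop reads identical in chrom, start, and strand
    starts = defaultdict(list)
    for chrom, start, _strand in unique:
        starts[chrom].append(start)
    for positions in starts.values():
        positions.sort()
    counts = {}
    for name, (chrom, w_start, w_end) in windows.items():
        positions = starts.get(chrom, [])
        # Binary search gives the number of starts in [w_start, w_end).
        counts[name] = bisect_left(positions, w_end) - bisect_left(positions, w_start)
    return counts


reads = [("chr1", 4100, "+"), ("chr1", 4100, "+"), ("chr1", 4100, "-"),
         ("chr1", 5999, "+"), ("chr1", 6001, "+"), ("chr2", 4500, "+")]
windows = {"GENE_A": ("chr1", 4000, 6001), "GENE_B": ("chr2", 0, 1000)}
print(count_reads_in_windows(reads, windows))
```

The resulting counts would then be passed to the RPKM normalization described in the text.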
(2) Coverage difference analysis
Using the Kruskal-Wallis nonparametric one-way analysis of variance, samples were grouped as PR versus SD and as pCR versus npCR, and coverage values were compared between groups. The raw P values were corrected by the Holm method to obtain FDR values, and TSS regions with FDR values less than 0.1 were retained.
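For one promoter, the Kruskal-Wallis test compares the ranks of coverage values between groups. The sketch below is a standard-library illustration, not the software used in the source: it computes the H statistic with the usual tie correction and obtains the P value from the chi-square approximation with (number of groups - 1) degrees of freedom. The function names and the example values are hypothetical.

```python
import math
from itertools import chain


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        term = total = math.exp(-half)
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for d in range(3, df + 1, 2):
        total += term
        term *= x / d
    return total


def kruskal_wallis(*groups):
    """Return (H, P value) for two or more groups of coverage values."""
    values = sorted(chain.from_iterable(groups))
    n = len(values)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_sums = (sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * sum(rank_sums) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:  # all values identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical normalized coverage of one promoter in two response groups
h_stat, p_value = kruskal_wallis([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(h_stat, p_value)
```

Running this test once per promoter yields the raw P values that are then corrected as described in the text.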
(3) Cluster analysis
The gene promoter coverage data were standardized across samples using Cluster software, hierarchical clustering was performed based on the correlation of promoter coverage between samples, and the results were visualized with the pheatmap package in R.
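Correlation-based hierarchical clustering of samples can be sketched as follows. This is an illustration rather than a reimplementation of Cluster or pheatmap: it assumes a distance of 1 minus the Pearson correlation and average linkage, neither of which is specified in the source, and stops merging when two clusters remain. Sample names and values are hypothetical, and profiles are assumed not to be constant.

```python
import math
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cluster_samples(profiles, n_clusters=2):
    """Agglomerative clustering with average linkage on 1 - correlation."""
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        return mean(dist[a, b] for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Hypothetical standardized coverage of four differential promoters per sample
profiles = {"S1": [1, 2, 3, 4], "S2": [2, 4, 6, 9],
            "S3": [4, 3, 2, 1], "S4": [9, 6, 4, 2]}
print(cluster_samples(profiles))
```

Comparing the resulting clusters with the known outcome labels is what the text describes as the basis of the classification model.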
The results are shown in FIGS. 1 and 2. Plasma cell-free DNA from the 26 breast cancer patients treated with the epirubicin, cyclophosphamide, and paclitaxel combination regimen was sequenced at high throughput, TSS region coverage was analyzed, and response to the regimen was evaluated by the PR and SD criteria. The promoter coverage of 227 genes differed between the PR group and the SD group (Table 2): coverage of 45 genes was higher in the SD group, and coverage of 182 genes was lower in the SD group. Unsupervised clustering on the differential promoter coverage showed that these coverage differences separate the PR and SD patients into two clusters with markedly different coverage patterns (FIG. 1).
When response to the epirubicin, cyclophosphamide, and paclitaxel combination regimen was instead evaluated by pCR versus npCR, the promoter coverage of 398 genes differed between the pCR group and the npCR group (Table 3): coverage of 73 genes was higher in the pCR group, and coverage of 325 genes was lower in the pCR group. Unsupervised clustering on the differential promoter coverage showed that these coverage differences separate the pCR and npCR patients into two clusters with markedly different coverage patterns (FIG. 2).
Example 2. Predicting the efficacy of breast cancer neoadjuvant chemotherapy with the epirubicin, cyclophosphamide, paclitaxel, and carboplatin combination regimen
Step 1: Peripheral blood was collected from 8 breast cancer patients before neoadjuvant chemotherapy with the epirubicin, cyclophosphamide, paclitaxel, and carboplatin combination regimen. Treatment efficacy was evaluated after 8 cycles of chemotherapy: 3 patients achieved pCR and 5 did not (npCR). Details are shown in Table 1.
Steps 2-6: Same as steps 2-6 of Example 1, except that samples were grouped only as pCR versus npCR.
As shown in FIG. 3, plasma cell-free DNA from the 8 breast cancer patients treated with the epirubicin, cyclophosphamide, paclitaxel, and carboplatin combination regimen was sequenced at high throughput and TSS region coverage was analyzed. The promoter coverage of 555 genes differed between the pCR group and the npCR group (Table 4): coverage of 228 genes was higher in the pCR group, and coverage of 327 genes was lower in the pCR group. Unsupervised clustering on the differential promoter coverage showed that these coverage differences separate the pCR and npCR patients into two clusters with markedly different coverage patterns (FIG. 3).
Table 1. Information on the 34 breast cancer patients
Table 2. Differential TSS regions and related genes in pre-chemotherapy plasma cfDNA of the breast cancer PR and SD groups (epirubicin, cyclophosphamide, and paclitaxel combination regimen)
Table 3. Differential TSS regions and related genes in pre-chemotherapy plasma cfDNA of the breast cancer pCR and npCR groups (epirubicin, cyclophosphamide, and paclitaxel combination regimen)
Table 4. Differential TSS regions and related genes in pre-chemotherapy plasma cfDNA of the breast cancer pCR and npCR groups (epirubicin, cyclophosphamide, paclitaxel, and carboplatin combination regimen)

Claims (10)

1. a method for constructing a classification model of the neoadjuvant chemotherapy curative effect of breast cancer comprises the following steps:
acquisition of reads of cfDNA:
Obtaining plasma cfDNA sequencing data of different breast cancer typing patients with known chemotherapy results, and obtaining reads data of the cfDNA;
TSSs regional coverage analysis:
Extracting genome position information of upstream and downstream regions of a gene transcription initiation site TSSs, and defining the genome position information as an original promoter region;
comparing reads of the cfDNA to hg19 of a human reference genome, removing a repeated sequence in comparison data, counting the reads of original promoter regions of all protein coding genes, and then carrying out standardization processing on the coverage of TSSs regions;
Coverage difference analysis:
Screening out gene promoter regions with different TSSs (specific sequences of genes) region coverage in each group by using a Kruskal-Wallis nonparametric single-factor variance analysis method, correcting an original P value, screening out the gene promoter regions with FDR (fully drawn Ribose) values less than 0.1, and finally determining the different TSSs regions;
Clustering analysis:
Carrying out sample-to-sample standardization on TSSs regional coverage data, and carrying out cluster analysis by adopting a hierarchical clustering method according to the correlation of promoter coverage among samples; and (3) clustering sample types into a pCR group and an npCR group or a PR group and an SD group according to the coverage difference of the original promoter region and the known curative effect condition of the patient to obtain a new auxiliary chemotherapy curative effect classification model of the breast cancer.
2. the method of claim 1, wherein: the original promoter region is a region 1Kb upstream and downstream of the gene transcription initiation sites TSSs.
3. The method of claim 1, wherein: the cfDNA sequencing data amount of each sample is not less than 6M.
4. The method of claim 1, wherein: the breast cancer typing includes luminal B, Her-2 positive and triple negative.
5. The method of claim 1, wherein: the chemotherapy drugs include at least one of epirubicin, cyclophosphamide, paclitaxel, carboplatin, gemcitabine, and capecitabine.
6. the method of claim 1, wherein: and carrying out visual processing on the clustering analysis data.
7. the method of claim 6, wherein: the software for visualization processing is Treeview software or an R language heatmap function.
8. The method of claim 1, wherein: in the TSSs regional coverage analysis, the normalization method is selected from the RPKM normalization method and the FPKM normalization method, or the normalized value is calculated as follows:
Normalized value = number of reads in the region / total number of aligned reads.
9. The method of claim 1, wherein: in the coverage difference analysis, the original P values are corrected by the Holm method.
10. The method of claim 1, wherein: in the clustering analysis, the gene promoter coverage data are standardized across samples by the Z-score standardization method, calculated as: standardized value $z = (x - \mu) / \sigma$, where x is the original value, μ is the sample mean, and σ is the sample standard deviation.
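The analysis steps recited in claims 1 and 8 to 10 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names, the regions-by-samples matrix layout, the use of NumPy and SciPy, the average-linkage choice, and the per-region (across-samples) reading of the Z-score are all assumptions. Alignment to hg19, duplicate removal and read counting are assumed to have been done upstream.

```python
import numpy as np
from scipy.stats import kruskal
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster


def normalize_counts(counts, total_aligned):
    """Claim 8 formula: reads in the region / total aligned reads.

    counts: (regions x samples) read counts; total_aligned: one value per sample.
    """
    counts = np.asarray(counts, dtype=float)
    total_aligned = np.asarray(total_aligned, dtype=float)
    return counts / total_aligned[np.newaxis, :]


def holm_adjust(pvalues):
    """Holm step-down adjustment of raw P values (claim 9)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Multiply the k-th smallest P value by (m - k), k = 0 .. m-1.
    scaled = (m - np.arange(m)) * p[order]
    # Enforce monotonicity and cap at 1.
    adjusted_sorted = np.minimum(1.0, np.maximum.accumulate(scaled))
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


def differential_regions(coverage, labels, threshold=0.1):
    """Kruskal-Wallis test per region, then keep adjusted P < threshold."""
    coverage = np.asarray(coverage, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    raw_p = np.array([
        kruskal(*(row[labels == g] for g in groups)).pvalue
        for row in coverage
    ])
    return np.flatnonzero(holm_adjust(raw_p) < threshold)


def zscore_rows(coverage):
    """Z-score each region across samples: (x - mean) / standard deviation.

    Regions with zero spread would divide by zero and should be filtered first.
    """
    coverage = np.asarray(coverage, dtype=float)
    mu = coverage.mean(axis=1, keepdims=True)
    sigma = coverage.std(axis=1, ddof=1, keepdims=True)
    return (coverage - mu) / sigma


def cluster_samples(coverage, n_clusters=2):
    """Hierarchical clustering of samples on correlation distance."""
    coverage = np.asarray(coverage, dtype=float)
    dist = pdist(coverage.T, metric="correlation")  # 1 - Pearson r per sample pair
    tree = linkage(dist, method="average")          # average linkage is an assumption
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

In this sketch, the rows returned by `differential_regions` would be passed through `zscore_rows` and then `cluster_samples`, and the resulting cluster labels compared with the known pCR/npCR (or PR/SD) status of each patient.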

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910841963.7A CN110570951A (en) 2019-09-06 2019-09-06 Method for constructing classification model of new auxiliary chemotherapy curative effect of breast cancer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910841963.7A CN110570951A (en) 2019-09-06 2019-09-06 Method for constructing classification model of new auxiliary chemotherapy curative effect of breast cancer

Publications (1)

Publication Number Publication Date
CN110570951A true CN110570951A (en) 2019-12-13

Family

ID=68778195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910841963.7A Pending CN110570951A (en) 2019-09-06 2019-09-06 Method for constructing classification model of new auxiliary chemotherapy curative effect of breast cancer

Country Status (1)

Country Link
CN (1) CN110570951A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113702637A (en) * 2021-08-09 2021-11-26 西北大学 Lectin test carrier, kit and prediction model for predicting neoadjuvant chemotherapy curative effect of breast cancer
CN116083578A (en) * 2022-12-15 2023-05-09 华中科技大学同济医学院附属同济医院 System and method for predicting cervical cancer newly assisted chemotherapy effect or recurrent high-risk classification

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109439753A (en) * 2018-11-28 2019-03-08 四川大学华西医院 Detect application and the construction method of patient with breast cancer's NAC outcome prediction model of the reagent of gene expression dose
CN109680049A (en) * 2018-12-03 2019-04-26 东南大学 A kind of method and its application based on the dissociative DNA in blood high-flux sequence analysis affiliated individual physiological state of cfDNA
CN110106244A (en) * 2019-06-06 2019-08-09 广州市雄基生物信息技术有限公司 A kind of noninvasive molecule parting kit of breast cancer and method

Similar Documents

Publication Publication Date Title
CN113257350B (en) ctDNA mutation degree analysis method and device based on liquid biopsy and ctDNA performance analysis device
CN106156543B (en) A kind of tumour ctDNA information statistical method
CN109830264B (en) Method for classifying tumor patients based on methylation sites
CN111863250B (en) Combined diagnosis model and system for early breast cancer
CN108588230A (en) A kind of marker and its screening technique for breast cancer diagnosis
Aghamaleki et al. Application of an artificial neural network in the diagnosis of chronic lymphocytic leukemia
CN113838533A (en) Cancer detection model and construction method and kit thereof
CN110570951A (en) Method for constructing classification model of new auxiliary chemotherapy curative effect of breast cancer
CN114203256B (en) MIBC typing and prognosis prediction model construction method based on microbial abundance
Zeng et al. Cell-free DNA from bronchoalveolar lavage fluid (BALF): a new liquid biopsy medium for identifying lung cancer
CN108048460A (en) A kind of New molecular marker and its application in preparing for the kit of head and neck cancer diagnosis and prognosis
CN110004229A (en) Application of the polygenes as EGFR monoclonal antibody class Drug-resistant marker
CN111763740B (en) System for predicting treatment effect and prognosis of neoadjuvant radiotherapy and chemotherapy of esophageal squamous carcinoma patient based on lncRNA molecular model
EP3347492B1 (en) Methods for diagnosis of cancer
WO2018049506A1 (en) Mirna prostate cancer marker
CN115820860A (en) Method for screening non-small cell lung cancer marker based on methylation difference of enhancer, marker and application thereof
CN117316278A (en) Cancer noninvasive early screening method and system based on cfDNA fragment length distribution characteristics
CN113470754A (en) Gene marker for tumor prognosis evaluation, evaluation product and application
CN112037851A (en) Application of autophagy-related gene in kit and system for colorectal cancer prognosis
KR102491322B1 (en) Preparation Method Using Multi-Feature Prediction Model for Cancer Diagnosis
CN110890128A (en) Grading model for detecting benign and malignant degree of skin tumor and application thereof
CN116287252B (en) Application of long-chain non-coding RNA APCDD1L-DT in preparation of pancreatic cancer detection products
Yang et al. Advanced Glycation End Products’ Receptor DNA Methylation Associated with Immune Infiltration and Prognosis of Lung Adenocarcinoma and Lung Squamous Cell Carcinoma
US20240038335A1 (en) Systems and methods for detecting disease subtypes
CN102766678B (en) Detection method, probe set, and diagnostic kit for predicting postoperation recurrence-free survival of colorectal cancer via gene counting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination