CN114999568A - Calculation method of telomere allele imbalance TAI - Google Patents

Calculation method of telomere allele imbalance TAI Download PDF

Info

Publication number
CN114999568A
CN114999568A CN202210674201.4A CN202210674201A CN114999568A CN 114999568 A CN114999568 A CN 114999568A CN 202210674201 A CN202210674201 A CN 202210674201A CN 114999568 A CN114999568 A CN 114999568A
Authority
CN
China
Prior art keywords
afr
hrd
tai
score
window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210674201.4A
Other languages
Chinese (zh)
Other versions
CN114999568B (en
Inventor
楼峰
孙宏
刘凯
张萌萌
郭璟
曹善柏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Xiangxin Biotechnology Co ltd
Tianjin Xiangxin Medical Instrument Co ltd
Beijing Xiangxin Biotechnology Co ltd
Original Assignee
Tianjin Xiangxin Biotechnology Co ltd
Tianjin Xiangxin Medical Instrument Co ltd
Beijing Xiangxin Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Xiangxin Biotechnology Co ltd, Tianjin Xiangxin Medical Instrument Co ltd, Beijing Xiangxin Biotechnology Co ltd filed Critical Tianjin Xiangxin Biotechnology Co ltd
Priority to CN202210674201.4A priority Critical patent/CN114999568B/en
Publication of CN114999568A publication Critical patent/CN114999568A/en
Application granted granted Critical
Publication of CN114999568B publication Critical patent/CN114999568B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10Sequence alignment; Homology search

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The application belongs to the technical field of gene detection, and particularly discloses a calculation method of telomere allele imbalance TAI, which can be used for evaluating HRD score. The application has at least one of the following beneficial effects: the method for evaluating the HRD score based on the low-depth WGS provided by the application is used for analyzing on the basis of data formed by sequencing of the low-depth WGS, so that the cost is greatly reduced, and the method is beneficial to large-scale application.

Description

Calculation method of telomere allele imbalance TAI
Technical Field
The application belongs to the technical field of gene detection, and particularly relates to a calculation method of telomere allele imbalance TAI.
Background
DNA double strand breaks (double strand breaks) are a type of DNA damage, which can cause chromosome breakage and rearrangement in severe cases, and because there is no complementary strand to repair, DNA sequences are difficult to recover, resulting in loss of genetic information, and such DNA double strand breaks require homologous recombination to repair. When HRD occurs in the absence of the repair ability of homologous recombination, the genome loses stability, and DNA damage is likely to accumulate in the case of unstable genome, resulting in a vicious circle and cancer. HRD has important guiding significance for the use of platinum or PARP inhibitors.
HRD is generally caused by genetic variation or apparent variation in homologous recombination repair pathways including genes such as BRCA1/2, Rad52/Rad22, PALB2, RAD51 family, BRIP1/BACH1, ATM and CHEK 2. Studies have shown that women with the BRCA1 mutation have a 50-85% and 15-45% probability of acquiring breast and ovarian cancer. In breast cancer, the genetic BRCA1/2 variation accounts for approximately 7%, while in triple negative breast cancers 11% -15% can be achieved. In familial and sporadic breast cancer patients, an estimated 40% belong to the homologous recombination defect. Although current major focus is on the treatment of HRD in breast cancer, HRD is also an important indicator in other cancer species.
Currently, there are two HRD detection methods:
the HR gene chip comprises homologous recombination pathway genes when the chip is designed, utilizes a target capture technology and a second-generation sequencing technology to obtain sequencing data of the homologous recombination pathway genes and detect SNV, Indel and large array of all the genes, has the defect that HRD can be estimated excessively, and has certain limitation because SNP sites on the chip are fixed and only can detect variation of specific sites when the chip is detected.
Whole Genome Sequencing (WGS), sequencing the whole genome, detecting chromosomal structural variation: HRD score was calculated including heterozygous deletion-LOH, telomere site imbalance-TAI and large degree of genomic instability-LST. The method has the advantages of high accuracy; the disadvantage is the relatively high cost.
Disclosure of Invention
In order to reduce cost under the condition of ensuring sensitivity and accuracy, the method for calculating the unbalanced TAI of the telomere allele is provided, the HRD score is evaluated on the basis of data obtained by low-depth WGS sequencing, the cost is reduced, and the method is more suitable for large-scale clinical application.
The application is realized by the following scheme:
the application provides a method for evaluating HRD score based on low depth WGS, comprising the following steps: processing low-depth WGS offline data of a sample to be detected; and any one or more steps selected from the following steps:
the method comprises the following steps: establishing a calculation method of genome heterozygosity loss LOH to obtain HRD-LOH score;
step two: establishing a calculation method of telomere allele imbalance TAI to obtain HRD-TAI score; and the combination of (a) and (b),
step three: and establishing a calculation method of large-fragment migration LST to obtain the HRD-LST score.
The method for evaluating the HRD score is established on the basis of low-depth WGS off-line data, so that the cost of Whole Genome Sequencing (WGS) is greatly reduced, compared with HR gene chip detection, the method has the advantages that the detection site is more flexible, the result of detecting a sample to be detected is more accurate, and the real condition of the sample to be detected is met.
In an embodiment of the present application, the processing low-depth WGS machine data of a sample to be measured specifically includes:
s1-1: comparing the off-line data with a reference genome of a human whole genome to obtain a first comparison file;
s1-2: removing repeated reads in the first comparison file to obtain a second comparison file;
s1-3: the human whole genome was divided into windows of 100 kbps in size.
In the application, the whole genome is divided into different windows according to the sequence and the size of 100Kbp, so that the analysis and the processing of subsequent data are facilitated.
In an embodiment of the present application, the processing the low-depth WGS machine data of the sample to be tested further includes:
s1-4: taking reads in the second comparison file as a basic unit, counting the number of reads in each window, taking the number of reads as the reads count of the window, and recording as RC i I is the order of windows divided in the whole genome in the order of arrangement, i is 1,2, 3.;
s1-5: counting GC base content of each window, merging adjacent windows with the same GC content into one group, and marking the j group as W j The number of windows in the jth group is represented as M j And the kth window contained in the jth group is denoted as W kj J and k are 1,2 and 3 respectively;
s1-6: calculate each W j Is noted as RC j Average RC over the sample, denoted RC p By the following formula to RC i And (3) performing correction:
NRCi=RCi×RCp/RCj
i=M 1 +M 2 +M 3 ...+M (j-1) +k;
s1-7: processing the low depth WGS off-set data of the N healthy samples according to the steps S1-1, S1-2 and S1-3, calculating the median value RC of each window in the N healthy samples, and marking as RC y Constructing baseline as RC of the window, wherein N is more than or equal to 30, and y is 1,2 and 3;
s1-8: traversing the windows of the samples to be detected and the windows of the healthy samples, and dividing NRCi of each window sample to be detected by RC in corresponding baseline y Obtaining DR;
s1-9: and (2) segmenting the DR based on a cyclic binary segmentation algorithm (CBS) and recording as DR segments, wherein DR values in the same DR segment are relatively close, average DR values of two adjacent DR segments are obviously different, and each DR segment at least comprises 10 windows.
In the application, each DR segment is set to contain at least 10 windows, wherein 10 windows can ensure that segments with the length of more than 1M are reserved in each DR segment, so as to shield interference signals to the greatest extent possible.
In an embodiment of the present application, the processing the low-depth WGS machine data of the sample to be tested further includes: s1-10: counting the median of DR in each DR segment, taking the median as the DR value of the DR segment, and recording as DR q Calculating the copy number of the DR fragment, and recording as C q The calculation formula is as follows:
Cq=interger(DRq×2)。
in the application, the internal cause of cancer occurrence can be preliminarily known by calculating the Cq value, and the medicine can be taken according to the symptoms on the cytology level, so that the remission rate of the cancer is greatly improved. If the Cq value is not equal to 2, it indicates that a variation in gene copy number has occurred. A Cq value greater than 2 indicates an increase in the gene (gain), and a value less than 2 indicates a deletion of the gene (loss). If some of the genes responsible for cell proliferation produce gain or tumor suppressor genes lose, it is possible to trigger unlimited proliferation of cells, leading to the development of cancer. Therefore, the occurrence of cancer can be preliminarily judged from the Cq value.
In one embodiment of the present application, the calculation method for establishing genomic heterozygosity loss LOH specifically comprises:
s2-1: selecting SNP loci with higher heterozygosity probability by using genome planning data of thousands of people;
s2-2: counting the frequency of each site allelic base on the SNP site on a sample to be detected, and if a plurality of allelic bases exist, selecting two allelic bases with the highest frequency; if there is only one allelic base, the second allelic base is given a default frequency of 0;
s2-3: counting the average frequency of the second allelic base of the SNP locus in each window to serve as AF (alloele frequency) of the window, and generating a new AF sequence; if AF is greater than 0, adjust AF to 0.5;
s2-4: connecting the same and adjacent windows of the AF in the step S2-3 to obtain a larger AF fragment;
s2-5: selecting C q If the length of the AF fragment is more than or equal to 1 and is less than the length of the whole chromosome where the AF fragment is located and is more than 15Mb, the AF fragment is marked as an LOH event;
s2-6: and recording LOH events in the sample to be detected as HRD-LOH score.
In one embodiment of the present application, the SNP sites with higher heterozygous probability means that the heterozygous probability is greater than 0.2, and the sites are approximately uniformly distributed on the genome, and the total number of the SNP sites is about 110000.
In a specific embodiment of the present application, the calculation method for establishing a large segment migration LST specifically includes:
s4-1: removing fragments of which the DR fragments are less than 3Mb from the DR fragments obtained in the step S1-9;
s4-2: taking a single chromosome as an analysis target, sequentially comparing the DR segment with the chromosome, and comparing adjacent C on the chromosome q The same DR segments are merged into one large segment, denoted as DR d Analyzing and processing all chromosomes in sequence;
s4-3: for DR d Making statistics if DR is formed d The length of two adjacent DR segments is more than 10Mb, and the middle interval is less than 3Mb, then the two adjacent DR segments are marked as an LST event;
s4-4: and recording the LST event in the sample to be detected as HRD-LST score.
In a specific embodiment of the present application, the calculation method for establishing the telomere allele imbalance TAI specifically includes:
s3-1: selecting SNP loci with higher heterozygosity probability by using genome planning data of thousands of people;
s3-2: counting variation frequency of allelic base of each site on the SNP site to obtain two frequencies with the highest variation frequency, namely a first allelic base frequency AF1 and a second allelic base frequency AF2, and calculating an AFR (equivalent frequency ratio) value of each site according to the following formula; if a certain locus has no variation, the AFR value is 0, and the locus is removed;
Figure BDA0003695804000000041
s3-3: calculating the average AFR of each window as the AFR of the window, and recording as the AFR p If the AFR values of a certain window are all 0, the AFR of the window p Is 0;
s3-4: subjecting AFR to p Combining adjacent windows less than 0.5 to obtain AFR p Combining the adjacent windows larger than 0.5 to respectively generate AFR fragments;
s3-5: if a certain AFR fragment contains telomeres, the length is more than 11Mb, and the AFR fragment contains telomeres p If the number is less than 0.5, the event is marked as a TAI event;
s3-6: and recording the TAI event in the sample to be detected as HRD-TAI score.
In an embodiment of the present application, the processing the low-depth WGS machine data of the sample to be tested further includes S1-0: preprocessing machine unloading data: and removing the joints on the reads from the data of the off-line machine.
In an embodiment of the present application, a file format of the low-depth WGS offline data is fastq format.
In one embodiment of the present application, the off-line data removes the joints on the reads through fastp software.
In a specific embodiment of the present application, the format of the first comparison file is a bam format.
In one embodiment of the present application, the off-set data is aligned to a reference genome of the human whole genome by bwa software.
In a specific embodiment of the present application, the first comparison file is obtained by removing duplicate reads through the picard software.
In a specific embodiment of the present application, the first comparison file is subjected to base quality value correction before removing duplicate reads.
In a specific embodiment of the present application, the first comparison file is base quality corrected by GATK software.
In a specific embodiment of the present application, the HRD score is HRD-LOH score + HRD-TAI score + HRD-LST score.
In a specific embodiment of the present application, the cutoff value for HRD negative or positive is HRD score 42.
In a specific embodiment of the present application, the low depth WGS is a WGS sequencing result of 10 or more layers. Preferably, the low depth WGS is a 10-tier WGS sequencing result.
Another aspect of the present application provides an apparatus for enabling low depth WGS-based assessment of HRD score, comprising: a data processing module: for processing low depth WGS off-line data; and one or more statistical modules selected from:
HRD-LOH score statistics Module: used for judging and counting the HRD-LOH score;
HRD-TAI score statistics Module: used for judging and counting the HRD-TAI score; and
HRD-LST score statistics Module: used to judge and count the HRD-LST score.
The method provided by the application has at least one of the following beneficial effects:
the method for evaluating the HRD score based on the low-depth WGS provided by the application is used for analyzing on the basis of data formed by sequencing of the low-depth WGS, so that the cost is greatly reduced, and the method is beneficial to large-scale application.
Drawings
Fig. 1 is a schematic flow chart of a method for evaluating HRD score based on low depth WGS provided in an embodiment of the present application.
Figure 2 is a graph of survival analysis of different HRD score patients provided in the examples of the present application.
FIG. 3 is a graph of HRD score vs. BRCA1/2 deleterious mutations provided in the examples of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
The technical solutions of the present application will be described clearly and completely in conjunction with the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. The examples, in which specific conditions are not specified, were conducted under conventional conditions or conditions recommended by the manufacturer. The reagents or instruments used are not indicated by the manufacturer, and are all conventional products available commercially.
Abbreviations and key terms in this application are defined as follows:
HR: homologous Recombination
HRD: homologous Recombination repair of defects
LOH: loss of heterozygosity, Loss of Hererozygosity
TAI: telomeric annular interference, Telomeric site imbalance
LST: large Genomic Instability of Large-Scale Genomic Instrument
HRD-LOH score: number of LOH events greater than 15Mb and less than chromosome length
HRD-TAI score: the sites extending to the ends of the chromosome are not balanced and the number of events with a region length of more than 11Mb
HRD-LST score: the number of events in which two adjacent segments of the chromosome are longer than 10Mb and the distance between the two segments is less than 3Mb
In the application, the sample to be tested is derived from a tumor patient sample received by a company, and the health sample is derived from a contribution sample of employees of the company collected by the company. The whole genome was derived from the public database NCBI, version hg 19. Thousand human genome project data are derived from the public database https:// www.internationalgenome.org/data.
As shown in fig. 1, the procedure for evaluating HRD score based on low depth WGS for the present application. In FIG. 1, HRD score was evaluated as the sum of HRD-LOH score, HRD-TAI score and HRD-LST score. In the practical application process, diseases related to heterozygote deletion or medication guidance and the like can be preliminarily judged according to the HRD-LOH score or combined with other indexes such as BRCA1/2 mutation and the like, and the HRD score can also be preliminarily judged; the HRD score can be preliminarily judged according to the HRD-TAI score or combined with other indexes to preliminarily judge diseases related to the imbalance of the telomere sites or guide medication and the like, and can also be preliminarily judged; and preliminarily judging diseases or medication guidance and the like related to large-scale genome instability according to the HRD-LST score or by combining other indexes, and also preliminarily judging the HRD score.
Example 1 Pre-treatment of Low depth WGS data processing
1. Removing joints on reads from low-depth WGS offline data through fastp software;
2. comparing the processed off-line data with a reference genome of a human whole genome through bwa software to obtain a first comparison file in a bam format;
3. correcting the base quality value of the first comparison file through GATK software;
4. and removing repeated reads from the corrected first comparison file through picard software to obtain a second comparison file which does not contain the repeated reads, wherein the file format is a bam format.
5. The human whole genome was divided into windows of 100Kbp in rank order.
Example 2 construction of DR fragment
1) Taking reads in the second comparison file in the embodiment 1 as a basic unit, counting the number of reads falling in each window in the embodiment 1, and taking the number as the reads count of the window and recording as RC i I is the order of windows divided in the whole genome in the order of arrangement, and i is 1,2,3.
2) Counting the GC base content of each window, combining adjacent windows with the same GC content into one group, and marking the jth group as W j The number of windows in the jth group is represented as M j And the kth window in the jth group is denoted as W kj J and k are 1,2 and 3 respectively;
3) calculating W j Is a median value RC of j And the sample to be testedAverage RC of the whole body, denoted as RC p By the following formula to RC i And (3) performing correction:
NRCi=RCi×RCp/RCj
i=M 1 +M 2 +M 3 ...+M (j-1) +k;
4) processing 30 healthy people low-depth WGS data according to the method in the steps 1), 2) and 3), and calculating the median RC of each window in a plurality of samples, and recording the RC as RC y Constructing baseline as RC of the window; n is more than or equal to 30, and y is 1,2,3.
5) Taking NRC of each window to be tested sample i Divided by RC in the corresponding baseline y Obtaining DR (depthratio);
6) based on a cyclic binary segmentation algorithm (CBS: circular segmentation) is used for segmenting DR (segments), which are marked as DR segments, the DR values in the same DR segment are relatively close, the average DR values of two adjacent DR segments are obviously different, and each DR segment at least comprises 10 windows.
Example 3 calculation of copy number
Counting the median of DR in each DR segment, taking the median as the DR value of the DR segment, and recording as DR q Calculating the copy number of the DR fragment, and recording as C q The calculation formula is as follows:
Cq=interger(DRq×2)。
in this example, the intrinsic cause of cancer in a patient can be preliminarily understood by calculation of Cq (copy number).
Example 4HRD-LOH score statistics
1. Using thousands of genome planning data, selecting SNP sites with higher heterozygous probability, wherein the sites are approximately uniformly distributed in the genome and are about 110000;
2. counting the frequency of each site allelic base on the SNP site on a sample to be detected, and if a plurality of allelic bases exist, selecting two of the allelic bases with the highest frequency; if there is only one allelic base, the second allelic base is given a default frequency of 0;
3. counting the average frequency of the second allelic base of the SNP locus in each window in example 1 as AF (alloele frequency) of the window, and generating a new AF sequence; if AF is greater than 0, adjust AF to 0.5;
4. connecting adjacent windows with the same AF in the step 3 to obtain a larger AF fragment;
5. selecting AF fragments of which Cq is more than or equal to 1 and AF is equal to 0 in example 3, and recording as an LOH event if the length of the fragments is more than 15Mb and less than the length of the whole chromosome in which the fragments are located;
6. and recording LOH events in the sample to be detected as HRD-LOH score.
Example 5 statistics of HRD-TAI score
1. Selecting SNP loci with higher heterozygosity probability by using genome planning data of thousands of people;
2. counting variation frequency of allelic base of each site on the SNP site to obtain two frequencies with the highest variation frequency, namely a first allelic base frequency AF1 and a second allelic base frequency AF2, and calculating an AFR value of each site according to the following formula; if the site has no variation, the AFR value is 0, and the site is removed;
Figure BDA0003695804000000081
3. the average AFR for each window in example 1 was calculated as the AFR for that window and noted as AFR p If the AFR values of a certain window are all 0, the AFR of the window p Is 0;
4. subjecting AFR to p Combining adjacent windows less than 0.5 to obtain AFR p Combining adjacent windows larger than 0.5 to respectively generate AFR fragments;
5. if a certain AFR fragment contains telomeres, the length is more than 11Mb, and the AFR p If the number is less than 0.5, the event is marked as a TAI event;
6. and recording the TAI event in the sample to be detected as HRD-TAI score.
Example 6 statistics of HRD-LST score
1. Removing fragments with the DR fragments smaller than 3Mb from the DR fragments constructed in the embodiment 2;
2. taking a single chromosome as an analysis target, sequentially comparing the DR segment with the chromosome, and obtaining the chromosome phaseAdjacent C q The same DR segments are merged into one large segment, denoted as DR d Analyzing and processing all chromosomes in sequence;
3. for DR d Making statistics if DR is formed d The length of two adjacent DR segments is more than 10Mb, and the middle interval is less than 3Mb, then the two adjacent DR segments are marked as an LST event;
4. and recording the LST event in the sample to be detected as HRD-LST score.
Example 7 statistics of HRD score
HRD score=HRD-LOH score+HRD-TAI score+HRD-LST score。
In this example, the patient is sensitive to platinum group drugs and PARP inhibitors when the cutoff value of HRD is HRD score ═ 42, i.e., when HRD score is greater than 42.
Example 8 determination of the number of layers for Low depth WGS sequencing
WGS sequencing was performed using 10 samples to obtain 50-fold data, and then 50X, 30X, 20X, 10X, 5X were randomly truncated, HRD-LOH score, HRD-TAI score, and HRD-LST score were detected using the methods in examples 1-7, and the final HRD score was calculated, thereby performing minimum sequencing quantity evaluation. The experimental results are shown in Table 1.
TABLE 1 correlation of different depth sequencing
Sample(s) 50X 30X 20X 10X 5X
S1 62 62 67 59 70
S2 51 51 46 42 68
S3 69 71 66 66 53
S4 30 30 34 39 46
S5 3 3 2 11 15
S6 73 70 70 70 55
S7 55 55 50 62 44
S8 57 54 57 57 39
S9 46 44 50 50 51
S10 27 27 19 18 20
Correlation of 1 0.997 0.980 0.952 0.753
As can be seen from Table 1, with the data volume of 50X as a reference, when the data volume of 30X, 20X, 10X is used, the correlation coefficient of the obtained HRD-score with the data volume of 50X is more than 95%, and the lower the number of layers, the worse the correlation coefficient, therefore, the 10X with more than 95% correlation is used as the lowest layer number for low depth WGS sequencing in the examples of the present application.
Application example 1
A sample from a breast cancer patient is analyzed using the methods identified in examples 1-8 of the present application.
Breast cancer patients: name wu-za, gender of the female, age 46, clinical symptoms left invasive carcinoma of the breast, no other history.
The patient was judged to be HRD positive if the patient had an HRD-LOH score of 1, an HRD-TAI score of 4, an HRD-LST score of 44, a total HRD score of 49, and a cutoff value as determined by the methods described in examples 1-8 herein. The patient is judged to have good response to platinum chemotherapeutic drugs or PARP inhibitors and the like, and the result is consistent with the correlation of HRD-high and platinum drug treatment sensitivity reported in the prior literature. Clinically, patients were given a platinum chemotherapeutic with Progression Free Survival (PFS) of 13 months, demonstrating that HRD score can be calculated using the methods identified in examples 1-6 herein and clinical medication can be guided by HRD score.
Application example 2
43 breast cancer samples and 72 ovarian cancer samples (115 samples) are selected for low-depth whole genome sequencing, and the 115 samples have prognosis information of platinum chemotherapy (namely, the 115 samples are all treated by the platinum chemotherapy). LOH, TAI, LST were calculated using the methods described in examples 1-8 of the present application to obtain HRD score.
Based on the HRD score values, 115 samples were divided into two groups, HRD-High (positive) and HRD-Low (negative), with cutoff value 42 as the critical point. Among them, 40 cases were given in the HRD-High group, and 75 cases were given in the HRD-low group. Survival analysis was performed on the HRD-High and HRD-Low groups in combination with clinical PFS (progress Free survival), and the experimental results are shown in FIG. 2.
As can be seen from FIG. 2, the overall survival time of the HRD-High group (40 cases) is significantly longer than that of the HRD-Low group (75 cases), indicating that the HRD-High group is sensitive to platinum-based drug treatment, consistent with the fact.
The harmful variation of SNV and Indel on BRCA1/2 gene of 115 samples is detected respectively. The result is: among 43 breast cancer samples, 13 BRCA1/2 mutant samples and 30 BRCA1/2 wild type samples; among the 72 ovarian cancer samples, 27 BRCA1/2 mutant samples and 45 BRCA1/2 wild-type samples were subjected to BRCA1/2-HRD distribution, and the results are shown in FIG. 3.
As can be seen from FIG. 3, in the samples with BRCA1/2 mutation, most of the HRD values are high, and the HRD values are positive, and the patients in the category are sensitive to platinum drugs; while a small proportion of samples in the wild type BRCA1/2 were HRD-High, it can be seen from FIG. 2 that even in the wild type BRCA1/2, if the HRD score is High and the samples are HRD-positive, the patient is sensitive to platinum drugs, which is consistent with the fact. Therefore, the HRD score was calculated using the methods provided in the examples of the present application, and clinical medication could be guided to some extent by the numerical values of the HRD score.
The present embodiment is only for explaining the present application, and it is not limited to the present application, and those skilled in the art can make modifications of the present embodiment without inventive contribution as needed after reading the present specification, but all of them are protected by patent law within the scope of the claims of the present application.

Claims (3)

1. A method for calculating the telomere allele imbalance TAI is characterized by comprising the following steps:
s1-1: comparing the low-depth WGS offline data of the sample to be detected with a reference genome of a whole genome to obtain a first comparison file;
s1-2: removing repeated reads in the first comparison file to obtain a second comparison file;
s1-3: dividing the whole genome into windows with the size of 100Kbp according to the arrangement sequence;
s3-1: selecting SNP loci with higher heterozygosity probability by using genome planning data of thousands of people;
s3-2: counting the variation frequency of the allelic base of each site on the SNP site of the second comparison file to obtain two frequencies with the highest variation frequency, namely a first allelic base frequency AF1 and a second allelic base frequency AF2, and calculating the AFR value of each site according to the following formula; if a certain locus has no variation, the AFR value is 0, and the locus is removed;
Figure FDA0003695803990000011
s3-3: calculating the average AFR of each window, taking the average AFR as the AFR of the window, and recording the average AFR as the AFR p If the AFR values of a certain window are all 0, the AFR of the window p Is 0;
s3-4: subjecting AFR to p Combining adjacent windows less than 0.5 to obtain AFR p Combining adjacent windows larger than 0.5 to respectively generate AFR fragments;
s3-5: if a certain AFR fragment contains telomeres, the length is more than 11Mb, and the AFR fragment contains telomeres p If the number is less than 0.5, the event is marked as a TAI event;
s3-6: and recording the TAI event in the sample to be detected as HRD-TAI score.
2. The method for calculating telomere allele imbalance TAI according to claim 1, wherein the method for processing low depth WGS dector data further comprises S1-0: preprocessing machine unloading data: and removing the joints on the reads from the data of the off-line machine.
3. The method of claim 2 wherein the low depth WGS is a WGS sequencing result of 10 layers or more.
CN202210674201.4A 2021-06-28 2021-06-28 Calculation method of telomere allele imbalance TAI Active CN114999568B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210674201.4A CN114999568B (en) 2021-06-28 2021-06-28 Calculation method of telomere allele imbalance TAI

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110716079.8A CN113257346B (en) 2021-06-28 2021-06-28 Method for evaluating HRD score based on low-depth WGS
CN202210674201.4A CN114999568B (en) 2021-06-28 2021-06-28 Calculation method of telomere allele imbalance TAI

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202110716079.8A Division CN113257346B (en) 2021-06-28 2021-06-28 Method for evaluating HRD score based on low-depth WGS

Publications (2)

Publication Number Publication Date
CN114999568A true CN114999568A (en) 2022-09-02
CN114999568B CN114999568B (en) 2023-04-18

Family

ID=77189984

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202210674201.4A Active CN114999568B (en) 2021-06-28 2021-06-28 Calculation method of telomere allele imbalance TAI
CN202111154180.5A Active CN113948151B (en) 2021-06-28 2021-06-28 Processing method of low-depth WGS (WGS) offline data
CN202110716079.8A Active CN113257346B (en) 2021-06-28 2021-06-28 Method for evaluating HRD score based on low-depth WGS

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN202111154180.5A Active CN113948151B (en) 2021-06-28 2021-06-28 Processing method of low-depth WGS (WGS) offline data
CN202110716079.8A Active CN113257346B (en) 2021-06-28 2021-06-28 Method for evaluating HRD score based on low-depth WGS

Country Status (1)

Country Link
CN (3) CN114999568B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067909B (en) * 2021-11-23 2022-08-30 北京吉因加医学检验实验室有限公司 Method, device and storage medium for correcting homologous recombination defect score
CN114242170B (en) * 2021-12-21 2023-05-09 深圳吉因加医学检验实验室 Method and device for evaluating homologous recombination repair defects and storage medium
CN114300053A (en) * 2021-12-29 2022-04-08 苏州绘真医学检验有限公司 Homologous recombination defective gene analysis method
CN115862733B (en) * 2023-02-27 2023-06-02 广州嘉检医学检测有限公司 Method for detecting heterozygosity deficiency based on mid-depth whole genome second generation sequencing
CN115985399B (en) * 2023-03-20 2023-07-04 广州迈景基因医学科技有限公司 HRD panel site selection optimization method and system for high-throughput sequencing

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170260588A1 (en) * 2013-12-09 2017-09-14 Institut Curie Methods for detecting inactivation of the homologous recombination pathway (brca1/2) in human tumors
US20170283879A1 (en) * 2014-08-15 2017-10-05 Myriad Genetics, Inc. Methods and materials for assessing homologous recombination deficiency
CN107287285A (en) * 2017-03-28 2017-10-24 上海至本生物科技有限公司 It is a kind of to predict the method that homologous recombination absent assignment and patient respond to treatment of cancer
US20180002379A1 (en) * 2015-01-21 2018-01-04 Sangamo Therapeutics, Inc. Methods and compositions for identification of highly specific nucleases
CN111676277A (en) * 2020-08-12 2020-09-18 臻和(北京)生物科技有限公司 Method and kit for determining unstable genome based on second-generation sequencing technology
CN111883211A (en) * 2020-08-07 2020-11-03 张哲� Gene scar for representing HRD homologous recombination repair defect and identification method
CN112164420A (en) * 2020-09-07 2021-01-01 厦门艾德生物医药科技股份有限公司 Method for establishing genome scar model
CN112397145A (en) * 2020-11-19 2021-02-23 河南省肿瘤医院 HRD (high resolution display) score calculation method based on chip detection
CN112410423A (en) * 2020-11-03 2021-02-26 南京世和基因生物技术股份有限公司 Marker for deletion of homologous recombination, detection method and detection system
CN112669906A (en) * 2020-11-25 2021-04-16 深圳华大基因股份有限公司 Detection method, device, terminal device and computer-readable storage medium for measuring genome instability
CN112802548A (en) * 2021-01-07 2021-05-14 深圳吉因加医学检验实验室 Method for predicting allele-specific copy number variation of single-sample whole genome
CN112820351A (en) * 2021-03-01 2021-05-18 江苏医联生物科技有限公司 Method for detecting mutation and HRD (high resolution contrast) score guiding medication of tumor patient
CN112980834A (en) * 2021-04-22 2021-06-18 菁良基因科技(深圳)有限公司 Homologous recombination defect repair reference product and preparation method and kit thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105986011B (en) * 2015-01-30 2019-10-15 深圳华大基因研究院 A kind of detection method of loss of heterozygosity
CN112226495B (en) * 2020-12-18 2021-03-16 北京迈基诺基因科技股份有限公司 Method for detecting DNA homologous recombination abnormality and application thereof

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170260588A1 (en) * 2013-12-09 2017-09-14 Institut Curie Methods for detecting inactivation of the homologous recombination pathway (brca1/2) in human tumors
US20170283879A1 (en) * 2014-08-15 2017-10-05 Myriad Genetics, Inc. Methods and materials for assessing homologous recombination deficiency
US20180002379A1 (en) * 2015-01-21 2018-01-04 Sangamo Therapeutics, Inc. Methods and compositions for identification of highly specific nucleases
CN107287285A (en) * 2017-03-28 2017-10-24 上海至本生物科技有限公司 It is a kind of to predict the method that homologous recombination absent assignment and patient respond to treatment of cancer
CN111883211A (en) * 2020-08-07 2020-11-03 张哲� Gene scar for representing HRD homologous recombination repair defect and identification method
CN111676277A (en) * 2020-08-12 2020-09-18 臻和(北京)生物科技有限公司 Method and kit for determining unstable genome based on second-generation sequencing technology
CN112164420A (en) * 2020-09-07 2021-01-01 厦门艾德生物医药科技股份有限公司 Method for establishing genome scar model
CN112410423A (en) * 2020-11-03 2021-02-26 南京世和基因生物技术股份有限公司 Marker for deletion of homologous recombination, detection method and detection system
CN112397145A (en) * 2020-11-19 2021-02-23 河南省肿瘤医院 HRD (high resolution display) score calculation method based on chip detection
CN112669906A (en) * 2020-11-25 2021-04-16 深圳华大基因股份有限公司 Detection method, device, terminal device and computer-readable storage medium for measuring genome instability
CN112802548A (en) * 2021-01-07 2021-05-14 深圳吉因加医学检验实验室 Method for predicting allele-specific copy number variation of single-sample whole genome
CN112820351A (en) * 2021-03-01 2021-05-18 江苏医联生物科技有限公司 Method for detecting mutation and HRD (high resolution contrast) score guiding medication of tumor patient
CN112980834A (en) * 2021-04-22 2021-06-18 菁良基因科技(深圳)有限公司 Homologous recombination defect repair reference product and preparation method and kit thereof

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JAMES R. WAGNER 等: "Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human" *
LUISA MATOS DO CANTON 等: "Increased Levels of Genomic Instability and Mutations in Homologous Recombination Genes in Locally Advanced Rectal Carcinomas" *
ZOEYER: "HRD评分体系(1)" *
刘蕾: "遗传因素与乳腺癌特征及预后相关性研究" *
许燕梅: "DNA传感器高性能电化学检测大肠杆菌基因序列的研究" *

Also Published As

Publication number Publication date
CN113948151A (en) 2022-01-18
CN113257346A (en) 2021-08-13
CN114999568B (en) 2023-04-18
CN113948151B (en) 2022-07-05
CN113257346B (en) 2021-10-19

Similar Documents

Publication Publication Date Title
CN114999568B (en) Calculation method of telomere allele imbalance TAI
CN109880910A (en) A kind of detection site combination, detection method, detection kit and the system of Tumor mutations load
CN112226495B (en) Method for detecting DNA homologous recombination abnormality and application thereof
CN110444255B (en) Biological information quality control method and device based on second-generation sequencing and storage medium
CN103667438B (en) Method for screening HRDs disease-causing mutation and gene chip hybridization probe designing method involved in same
CN107423578B (en) Device for detecting somatic cell mutation
CN110648721B (en) Method and device for detecting copy number variation by aiming at exon capture technology
CN108256292B (en) Copy number variation detection device
EP4372751A1 (en) Cancer detection model and construction method therefor, and reagent kit
CN111968701A (en) Method and device for detecting somatic copy number variation of designated genome region
CN112088220A (en) Surrogate markers and methods for tumor mutation burden determination
Zhao et al. Association between the IL-10-1082G/A, IL-10-592A/C, and IL-10-819G/A polymorphisms and atopic dermatitis susceptibility: a meta-analysis
CN111951893A (en) Method for constructing tumor mutation load TMB panel and using method thereof
CN112201306A (en) True and false gene mutation analysis method based on high-throughput sequencing and application
CN113724781B (en) Method and apparatus for detecting homozygous deletions
CN112037863B (en) Early NSCLC prognosis prediction system
CN112786103A (en) Method and device for analyzing feasibility of target sequencing Panel for estimating tumor mutation load
CN116741272A (en) Ovarian cancer HRD typing system and method based on genome mutation characteristics and gene set expression characteristics
CN103710340B (en) A kind of I type USHER syndrome associated gene mutation and apply the deaf Molecular Etiology diagnostic reagent of this mutator gene
CN105838720A (en) PTPRQ gene mutant and application thereof
CN111944905B (en) Human gene combination and application thereof in preparation of kit for evaluating sensitivity of cervical cancer newly assisted chemotherapy drugs
CN114410772A (en) Susceptibility gene of chronic obstructive pulmonary acute exacerbation and application thereof in predicting susceptibility to chronic obstructive pulmonary acute exacerbation
CN104099338A (en) MYO15A gene mutant and application thereof
CN106906220A (en) A kind of COL4A5 genes of mutation and its application
CN115074439B (en) Group of NK/T cell lymphoma prognosis related genes, genome prognosis model and application thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant