CN113257346A - Method for evaluating HRD score based on low-depth WGS - Google Patents

Method for evaluating HRD score based on low-depth WGS Download PDF

Info

Publication number
CN113257346A
CN113257346A CN202110716079.8A CN202110716079A CN113257346A CN 113257346 A CN113257346 A CN 113257346A CN 202110716079 A CN202110716079 A CN 202110716079A CN 113257346 A CN113257346 A CN 113257346A
Authority
CN
China
Prior art keywords
hrd
score
window
afr
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110716079.8A
Other languages
Chinese (zh)
Other versions
CN113257346B (en
Inventor
楼峰
刘凯
张萌萌
郭璟
孙宏
曹善柏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiangxin Biotechnology Co ltd
Tianjin Xiangxin Biotechnology Co ltd
Tianjin Xiangxin Medical Instrument Co ltd
Zhejiang Cancer Hospital
Original Assignee
Tianjin Xiangxin Biotechnology Co ltd
Tianjin Xiangxin Medical Instrument Co ltd
Beijing Xiangxin Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Xiangxin Biotechnology Co ltd, Tianjin Xiangxin Medical Instrument Co ltd, Beijing Xiangxin Biotechnology Co ltd filed Critical Tianjin Xiangxin Biotechnology Co ltd
Priority to CN202210674201.4A priority Critical patent/CN114999568B/en
Priority to CN202110716079.8A priority patent/CN113257346B/en
Priority to CN202111154180.5A priority patent/CN113948151B/en
Publication of CN113257346A publication Critical patent/CN113257346A/en
Application granted granted Critical
Publication of CN113257346B publication Critical patent/CN113257346B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10Sequence alignment; Homology search

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The application belongs to the technical field of gene detection, and particularly discloses a method for evaluating HRD score based on low-depth WGS, which comprises the following steps: processing low-depth WGS offline data of a sample to be detected; and any one or more of the following three steps: the method comprises the following steps: establishing a calculation method of genome heterozygosity loss LOH to obtain HRD-LOH score; step two: establishing a calculation method of telomere allele imbalance TAI to obtain HRD-TAI score; and, step three: and establishing a calculation method of large-fragment migration LST to obtain the HRD-LST score. The application has at least one of the following beneficial effects: the method for evaluating the HRD score based on the low-depth WGS provided by the application is used for analyzing on the basis of data formed by sequencing of the low-depth WGS, so that the cost is greatly reduced, and the method is beneficial to large-scale application.

Description

Method for evaluating HRD score based on low-depth WGS
Technical Field
The application belongs to the technical field of gene detection, and particularly relates to a method for evaluating HRD score based on low-depth WGS.
Background
DNA double strand breaks (double strand breaks) are a type of DNA damage, which can cause chromosome breakage and rearrangement in severe cases, and because there is no complementary strand to repair, DNA sequences are difficult to recover, resulting in loss of genetic information, and such DNA double strand breaks require homologous recombination to repair. When HRD occurs in the absence of the repair ability of homologous recombination, the genome loses stability, and DNA damage is likely to accumulate in the case of unstable genome, resulting in a vicious circle and cancer. HRD has important guiding significance for the use of platinum or PARP inhibitors.
HRD is generally caused by genetic variation or apparent variation in homologous recombination repair pathways including genes such as BRCA1/2, Rad52/Rad22, PALB2, RAD51 family, BRIP1/BACH1, ATM and CHEK 2. Studies have shown that women with the BRCA1 mutation have a 50-85% and 15-45% probability of acquiring breast and ovarian cancer. In breast cancer, the genetic BRCA1/2 variation accounts for approximately 7%, while in triple negative breast cancers 11% -15% can be achieved. In familial and sporadic breast cancer patients, an estimated 40% belong to the homologous recombination defect. Although current major focus is on the treatment of HRD in breast cancer, HRD is also an important indicator in other cancer species.
At present, the HRD detection methods include the following two methods:
HR gene chip, chip design contains homologous recombination pathway gene, utilizes target capture technology and second generation sequencing technology, obtains the sequencing data of homologous recombination pathway gene, detects SNV, Indel and large array of all genes, the disadvantage is that HRD may be overestimated, and based on chip detection, the SNP site on the chip is fixed, only can detect the variation of specific site, has certain limitation.
Whole Genome Sequencing (WGS), sequencing the whole genome, detecting chromosomal structural variation: HRD score was calculated including loss of heterozygosity-LOH, telomere site imbalance-TAI and large degree of genomic instability-LST. The method has the advantages of high accuracy; the disadvantage is the relatively high cost.
Disclosure of Invention
In order to reduce the cost under the condition of ensuring the sensitivity and the accuracy, the application provides the method for evaluating the HRD score based on the low-depth WGS, the method evaluates the HRD score based on the data obtained by the low-depth WGS sequencing, the cost is reduced, and the method is more suitable for large-scale clinical application.
The application is realized by the following scheme:
the application provides a method for evaluating HRD score based on low depth WGS, comprising the following steps: processing low-depth WGS offline data of a sample to be detected; and any one or more steps selected from the following steps:
the method comprises the following steps: establishing a calculation method of genome heterozygosity loss LOH to obtain HRD-LOH score;
step two: establishing a calculation method of telomere allele imbalance TAI to obtain HRD-TAI score; and the combination of (a) and (b),
step three: and establishing a calculation method of large-fragment migration LST to obtain the HRD-LST score.
The method for evaluating the HRD score is established on the basis of low-depth WGS off-line data, so that the cost of Whole Genome Sequencing (WGS) is greatly reduced, compared with HR gene chip detection, the method has the advantages that the detection site is more flexible, the result of detecting a sample to be detected is more accurate, and the real condition of the sample to be detected is met.
In an embodiment of the present application, the processing low-depth WGS machine data of a sample to be measured specifically includes:
s1-1: comparing the off-line data with a reference genome of a human whole genome to obtain a first comparison file;
s1-2: removing repeated reads in the first comparison file to obtain a second comparison file;
s1-3: the human whole genome was divided into windows of 100 kbps in size.
In the application, the whole genome is divided into different windows according to the sequence and the size of 100Kbp, so that the analysis and the processing of subsequent data are facilitated.
In an embodiment of the present application, the processing the low-depth WGS machine data of the sample to be tested further includes:
s1-4: taking reads in the second comparison file as a basic unit, counting the number of reads in each window, taking the number of reads as the reads count of the window, and recording as RCiI is the order of windows divided in the whole genome according to the arrangement order, and i is 1,2 and 3;
s1-5: counting the GC base content of each window, combining adjacent windows with the same GC content into one group, and marking the jth group as WjThe number of windows in the jth group is represented as MjAnd the kth window contained in the jth group is denoted as WkjJ and k are 1,2 and 3 respectively;
s1-6: calculate each WjMedian value of (2), denoted as RCjAverage RC over the sample, denoted RCpBy the following formula to RCiAnd (3) performing correction:
Figure 252255DEST_PATH_IMAGE001
i=M1+M2+M3...+M(j-1)+k;
s1-7: processing the low depth WGS off-set data of the N healthy samples according to the steps S1-1, S1-2 and S1-3, calculating the median value RC of each window in the N healthy samples, and marking as RCyConstructing baseline as RC of the window, wherein N is more than or equal to 30, and y is 1,2 and 3;
s1-8: traversing windows of the sample to be detected and windows of the healthy sample, and taking every timeDividing NRCi of each window sample to be tested by RC in corresponding baselineyObtaining DR;
s1-9: and (2) segmenting the DR based on a cyclic binary segmentation algorithm (CBS) and recording as DR segments, wherein DR values in the same DR segment are relatively close, average DR values of two adjacent DR segments are obviously different, and each DR segment at least comprises 10 windows.
In the application, each DR segment is set to contain at least 10 windows, wherein 10 windows can ensure that segments with the length of more than 1M are reserved in each DR segment, so as to shield interference signals to the greatest extent possible.
In an embodiment of the present application, the processing the low-depth WGS machine data of the sample to be tested further includes:
s1-10: counting the median of DR in each DR segment, taking the median as the DR value of the DR segment, and recording as DRqCalculating the copy number of the DR fragment, and recording as CqThe calculation formula is as follows:
Figure 497291DEST_PATH_IMAGE002
in the application, the internal cause of cancer occurrence can be preliminarily known by calculating the Cq value, and the medicine can be taken according to the symptoms on the cytology level, so that the remission rate of the cancer is greatly improved. If the Cq value is not equal to 2, it indicates that a variation in gene copy number has occurred. A Cq value greater than 2 indicates an increase in the gene (gain), and a value less than 2 indicates a deletion of the gene (loss). If some of the genes responsible for cell proliferation produce gain or tumor suppressor genes lose, it is possible to trigger unlimited proliferation of cells, leading to the development of cancer. Therefore, the occurrence of cancer can be preliminarily judged from the Cq value.
In one embodiment of the present application, the calculation method for establishing genomic heterozygosity loss LOH specifically comprises:
s2-1: selecting SNP loci with higher heterozygosity probability by using genome planning data of thousands of people;
s2-2: counting the frequency of each site allelic base on the SNP site on a sample to be detected, and if a plurality of allelic bases exist, selecting two of the allelic bases with the highest frequency; if there is only one allelic base, the second allelic base is given a default frequency of 0;
s2-3: counting the average frequency of the second allelic base of the SNP locus in each window to serve as AF (alloele frequency) of the window, and generating a new AF sequence; if AF is greater than 0, adjust AF to 0.5;
s2-4: connecting the same AF in the step S2-3 and adjacent windows to obtain a larger AF fragment;
s2-5: selecting CqIf the length of the AF fragment is more than or equal to 1 and is less than the length of the whole chromosome where the AF fragment is located and is more than 15Mb, the AF fragment is marked as an LOH event;
s2-6: and recording LOH events in the sample to be detected as HRD-LOH score.
In one embodiment of the present application, the SNP sites with higher heterozygous probability means that the heterozygous probability is greater than 0.2, and the sites are approximately uniformly distributed on the genome, and the total number of the SNP sites is about 110000.
In a specific embodiment of the present application, the calculation method for establishing a large segment migration LST specifically includes:
s4-1: removing fragments of which the DR fragments are less than 3Mb from the DR fragments obtained in the step S1-9;
s4-2: taking a single chromosome as an analysis target, sequentially comparing the DR segment with the chromosome, and comparing adjacent C on the chromosomeqThe same DR segments are merged into one large segment, denoted as DRdAnalyzing and processing all chromosomes in sequence;
s4-3: for DRdMaking statistics if DR is formeddThe length of two adjacent DR segments is more than 10Mb, and the middle interval is less than 3Mb, then the two adjacent DR segments are marked as an LST event;
s4-4: and recording the LST event in the sample to be detected as HRD-LST score.
In a specific embodiment of the present application, the calculation method for establishing the telomere allele imbalance TAI specifically includes:
s3-1: selecting SNP loci with higher heterozygosity probability by using genome planning data of thousands of people;
s3-2: counting variation frequency of allelic base of each site on the SNP site to obtain two frequencies with the highest variation frequency, namely a first allelic base frequency AF1 and a second allelic base frequency AF2, and calculating an AFR (equivalent frequency ratio) value of each site according to the following formula; if a certain locus has no variation, the AFR value is 0, and the locus is removed;
Figure 607854DEST_PATH_IMAGE003
s3-3: calculating the average AFR of each window as the AFR of the window, and recording as the AFRpIf the AFR values of a certain window are all 0, the AFR of the windowpIs 0;
s3-4: subjecting AFR topCombining adjacent windows less than 0.5 to obtain AFRpCombining adjacent windows larger than 0.5 to respectively generate AFR fragments;
s3-5: if a certain AFR fragment contains telomeres, the length is more than 11Mb, and the AFR fragment contains telomerespIf the number is less than 0.5, the event is marked as a TAI event;
s3-6: and recording the TAI event in the sample to be detected as HRD-TAI score.
In an embodiment of the present application, the processing the low-depth WGS machine data of the sample to be tested further includes S1-0: preprocessing machine unloading data: and removing the joints on the reads from the data of the off-line machine.
In an embodiment of the present application, a file format of the low-depth WGS offline data is fastq format.
In one embodiment of the present application, the off-line data removes the joints on the reads through fastp software.
In a specific embodiment of the present application, the format of the first comparison file is a bam format.
In one embodiment of the present application, the off-set data is aligned to a reference genome of the human whole genome by bwa software.
In a specific embodiment of the present application, the first comparison file is obtained by removing duplicate reads through the picard software.
In a specific embodiment of the present application, the first comparison file is subjected to base quality value correction before removing duplicate reads.
In a specific embodiment of the present application, the first comparison file is base quality corrected by GATK software.
In a specific embodiment of the present application, the HRD score = HRD-LOH score + HRD-TAI score + HRD-LST score.
In a specific embodiment of the present application, the cutoff value for HRD negative or positive is HRD score = 42.
In a specific embodiment of the present application, the low depth WGS is a WGS sequencing result of 10 or more layers. Preferably, the low depth WGS is a 10-tier WGS sequencing result.
Another aspect of the present application provides an apparatus for enabling low depth WGS-based assessment of HRD score, comprising:
a data processing module: for processing low depth WGS off-line data; and one or more statistical modules selected from:
HRD-LOH score statistics Module: used for judging and counting the HRD-LOH score;
HRD-TAI score statistics Module: used for judging and counting the HRD-TAI score; and
HRD-LST score statistics Module: used to judge and count the HRD-LST score.
The method provided by the application has at least one of the following beneficial effects:
the method for evaluating the HRD score based on the low-depth WGS provided by the application is used for analyzing on the basis of data formed by sequencing of the low-depth WGS, so that the cost is greatly reduced, and the method is beneficial to large-scale application.
Drawings
Fig. 1 is a schematic flow chart of a method for evaluating HRD score based on low depth WGS provided in an embodiment of the present application.
Figure 2 is a graph of survival analysis of different HRD score patients provided in the examples of the present application.
FIG. 3 is a graph of HRD score vs. BRCA1/2 deleterious mutations provided in the examples of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
The technical solutions of the present application will be described clearly and completely in conjunction with the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. The examples, in which specific conditions are not specified, were conducted under conventional conditions or conditions recommended by the manufacturer. The reagents or instruments used are not indicated by the manufacturer, and are all conventional products available commercially.
Abbreviations and key terms in this application are defined as follows:
HR: homologous Recombination
HRD: homologous Recombination repair of defects
LOH: loss of heterozygosity, Loss of Hererozygosity
TAI: telomeric annular interference, Telomeric site imbalance
LST: large Genomic Instability of Large-Scale Genomic Instrument
HRD-LOH score: number of LOH events greater than 15Mb and less than chromosome length
HRD-TAI score: the sites extending to the ends of the chromosome are not balanced and the number of events with a region length of more than 11Mb
HRD-LST score: the number of events in which two adjacent segments of the chromosome are longer than 10Mb and the distance between the two segments is less than 3Mb
In the application, the sample to be tested is derived from a tumor patient sample received by a company, and the health sample is derived from a contribution sample of employees of the company collected by the company. The whole genome was derived from the public database NCBI, version hg 19. Thousand human genome project data are derived from the public database https:// www.internationalgenome.org/data.
As shown in fig. 1, the procedure for evaluating HRD score based on low depth WGS for the present application. In FIG. 1, HRD score was evaluated as the sum of HRD-LOH score, HRD-TAI score and HRD-LST score. In the practical application process, diseases related to heterozygote deletion or medication guidance and the like can be preliminarily judged according to the HRD-LOH score or combined with other indexes such as BRCA1/2 mutation and the like, and the HRD score can also be preliminarily judged; the HRD score can be preliminarily judged according to the HRD-TAI score or combined with other indexes to preliminarily judge diseases related to the imbalance of the telomere sites or guide medication and the like, and can also be preliminarily judged; and primarily judging diseases or medication guidance and the like related to the large degree of genome instability according to the HRD-LST score or combining other indexes, and also primarily judging the HRD score.
Example 1 Pre-processing of Low depth WGS data processing
1. Removing joints on reads from low-depth WGS offline data through fastp software;
2. comparing the processed off-line data with a reference genome of a human whole genome through bwa software to obtain a first comparison file in a bam format;
3. correcting the base quality value of the first comparison file through GATK software;
4. and removing repeated reads from the corrected first comparison file through picard software to obtain a second comparison file which does not contain the repeated reads, wherein the file format is a bam format.
5. The human whole genome was divided into windows of 100Kbp in rank order.
Example 2 construction of DR fragment
1) Taking reads in the second alignment file in the embodiment 1 as a basic unit, counting the number of reads falling in each window in the embodiment 1, and recording the number as a reads count of the window as RCiI is the order of windows divided in the whole genome in the order of arrangement, and i is 1,2,3.
2) Each time of statisticsThe GC base contents of the individual windows are combined into one group, and the j-th group is denoted as WjThe number of windows in the jth group is represented as MjAnd the kth window contained in the jth group is denoted as WkjJ and k are 1,2 and 3 respectively;
3) calculating WjIs a median value RC ofjThe average RC of the whole sample to be measured is recorded as RCpBy the following formula to RCiAnd (3) performing correction:
Figure 330960DEST_PATH_IMAGE001
i=M1+M2+M3...+M(j-1)+k;
4) processing 30 healthy people low-depth WGS data according to the method in the steps 1), 2) and 3), and calculating the median RC of each window in a plurality of samples, and recording the RC as RCyConstructing baseline as RC of the window; n is more than or equal to 30, and y is 1,2,3.
5) Taking NRC of each window to be tested sampleiDivided by RC in the corresponding baselineyObtaining DR (depthratio);
6) based on a cyclic binary segmentation algorithm (CBS: circular segmentation) is used for segmenting DR (segments), which are marked as DR segments, the DR values in the same DR segment are relatively close, the average DR values of two adjacent DR segments are obviously different, and each DR segment at least comprises 10 windows.
Example 3 calculation of copy number
Counting the median of DR in each DR segment, taking the median as the DR value of the DR segment, and recording as DRqCalculating the copy number of the DR fragment, and recording as CqThe calculation formula is as follows:
Figure 473228DEST_PATH_IMAGE002
in this example, the intrinsic cause of cancer in a patient can be preliminarily understood by calculation of Cq (copy number).
Example 4 HRD-LOH score statistics
1. Using thousands of genome planning data, selecting SNP sites with higher heterozygous probability, wherein the sites are approximately uniformly distributed in the genome and are about 110000;
2. counting the frequency of each site allelic base on the SNP site on a sample to be detected, and if a plurality of allelic bases exist, selecting two of the allelic bases with the highest frequency; if there is only one allelic base, the second allelic base is given a default frequency of 0;
3. counting the average frequency of the second allelic base of the SNP locus in each window in example 1 as AF (alloele frequency) of the window, and generating a new AF sequence; if AF is greater than 0, adjust AF to 0.5;
4. connecting adjacent windows with the same AF in the step 3 to obtain a larger AF fragment;
5. selecting AF fragments of which Cq is more than or equal to 1 and AF is equal to 0 in example 3, and recording as an LOH event if the length of the fragments is more than 15Mb and less than the length of the whole chromosome in which the fragments are located;
6. and recording LOH events in the sample to be detected as HRD-LOH score.
Example 5 statistics of HRD-TAI score
1. Selecting SNP loci with higher heterozygosity probability by using genome planning data of thousands of people;
2. counting variation frequency of allelic base of each site on the SNP site to obtain two frequencies with the highest variation frequency, namely a first allelic base frequency AF1 and a second allelic base frequency AF2, and calculating an AFR value of each site according to the following formula; if a certain locus has no variation, the AFR value is 0, and the locus is removed;
Figure 889166DEST_PATH_IMAGE003
3. the average AFR for each window in example 1 was calculated as the AFR for that window and reported as the AFRpIf the AFR values of a certain window are all 0, the AFR of the windowpIs 0;
4. subjecting AFR topProximity wi of less than 0.5Window merging, AFRpCombining adjacent windows larger than 0.5 to respectively generate AFR fragments;
5. if a certain AFR fragment contains telomeres, the length is more than 11Mb, and the AFR fragment contains telomerespIf the number is less than 0.5, the event is marked as a TAI event;
6. and recording the TAI event in the sample to be detected as HRD-TAI score.
Example 6 statistics of HRD-LST score
1. Removing fragments with the DR fragment less than 3Mb from the DR fragments constructed in the example 2;
2. taking a single chromosome as an analysis target, sequentially comparing the DR segment with the chromosome, and comparing adjacent C on the chromosomeqThe same DR segments are merged into one large segment, denoted as DRdAnalyzing and processing all chromosomes in sequence;
3. for DRdMaking statistics if DR is formeddThe length of two adjacent DR segments is more than 10Mb, and the middle interval is less than 3Mb, then the two adjacent DR segments are marked as an LST event;
4. and recording the LST event in the sample to be detected as HRD-LST score.
Example 7 statistics of HRD score
HRD score = HRD-LOH score + HRD-TAI score + HRD-LST score。
In this example, a cutoff value of HRD is HRD score =42, i.e., greater than 42, the patient is sensitive to platinum group drugs and PARP inhibitors.
Example 8 determination of the number of layers for Low depth WGS sequencing
WGS sequencing was performed using 10 samples to obtain 50-fold data, then 50X, 30X, 20X, 10X, 5X were randomly truncated, HRD-LOH score, HRD-TAI score, and HRD-LST score were detected using the methods in examples 1-7, and the final HRD score was calculated, thereby performing minimum sequencing quantity evaluation. The experimental results are shown in Table 1.
TABLE 1 correlation of different depth sequencing
Figure 749674DEST_PATH_IMAGE004
As can be seen from table 1, with the data volume of 50X as a reference, when the data volume of 30X, 20X, 10X is used, the correlation coefficient between the obtained HRD-score and the data volume of 50X is over 95%, and the lower the number of layers, the worse the correlation coefficient, so in the present embodiment, 10X with more than 95% correlation is used as the lowest number of layers for low depth WGS sequencing.
Application example 1
A sample from a breast cancer patient is analyzed using the methods identified in examples 1-8 of the present application.
Breast cancer patients: name wu-za, gender of the female, age 46, clinical symptoms left invasive carcinoma of the breast, no other history.
The patient was judged to be HRD positive if the patient had an HRD-LOH score of 1, an HRD-TAI score of 4, an HRD-LST score of 44, a total HRD score of 49, and a cutoff value as determined by the methods described in examples 1-8 herein. The patient is judged to have good response to platinum chemotherapeutic drugs or PARP inhibitors and the like, and the result is consistent with the correlation of HRD-high and platinum drug treatment sensitivity reported in the prior literature. Clinically, patients were given a platinum chemotherapeutic with Progression Free Survival (PFS) of 13 months, demonstrating that HRD score can be calculated using the methods identified in examples 1-6 herein and clinical medication can be guided by HRD score.
Application example 2
43 breast cancer samples and 72 ovarian cancer samples (115 samples) are selected for low-depth whole genome sequencing, and the 115 samples have prognosis information of platinum chemotherapy (namely, the 115 samples are all treated by the platinum chemotherapy). LOH, TAI, LST were calculated using the methods described in examples 1-8 of the present application to obtain HRD score.
Based on the HRD score values, 115 samples were divided into two groups, HRD-High (positive) and HRD-Low (negative), with cutoff =42 as the critical point. Among them, 40 cases of HRD-High group and 75 cases of HRD-low group were used. Survival analysis was performed on the HRD-High and HRD-Low groups in combination with clinical PFS (progress Free survival), and the experimental results are shown in FIG. 2.
As can be seen from FIG. 2, the overall survival time of the HRD-High group (40 cases) is significantly longer than that of the HRD-Low group (75 cases), indicating that the HRD-High group is sensitive to platinum-based drug treatment, consistent with the fact.
The harmful variation of SNV and Indel on BRCA1/2 gene of 115 samples is detected respectively. The result is: among the 43 breast cancer samples, 13 BRCA1/2 mutant samples and 30 BRCA1/2 wild type samples; among the 72 ovarian cancer samples, 27 BRCA1/2 mutant samples and 45 BRCA1/2 wild-type samples were subjected to BRCA1/2-HRD distribution, and the results are shown in FIG. 3.
As can be seen from FIG. 3, in the samples with BRCA1/2 mutation, most of HRD values are high and are HRD positive, and patients in the category are sensitive to platinum drugs; while a small proportion of samples in the wild type BRCA1/2 were HRD-High, it can be seen from FIG. 2 that even in the wild type BRCA1/2, if the HRD score is High and the samples are HRD-positive, the patient is sensitive to platinum drugs, which is consistent with the fact. Therefore, the HRD score was calculated using the methods provided in the examples of the present application, and clinical medication could be guided to some extent by the numerical values of the HRD score.
The present embodiment is only for explaining the present application, and it is not limited to the present application, and those skilled in the art can make modifications of the present embodiment without inventive contribution as needed after reading the present specification, but all of them are protected by patent law within the scope of the claims of the present application.

Claims (10)

1. A method for evaluating HRD score based on low depth WGS, comprising the steps of,
processing low-depth WGS offline data of a sample to be detected; and any one or more steps selected from the following three steps:
the method comprises the following steps: establishing a calculation method of genome heterozygosity loss LOH to obtain HRD-LOH score;
step two: establishing a calculation method of telomere allele imbalance TAI to obtain HRD-TAI score; and the combination of (a) and (b),
step three: and establishing a calculation method of large-fragment migration LST to obtain the HRD-LST score.
2. The method of claim 1, wherein the processing low depth WGS machine data of the sample to be tested specifically comprises:
s1-1: comparing the offline data with a reference genome of a whole genome to obtain a first comparison file;
s1-2: removing repeated reads in the first comparison file to obtain a second comparison file;
s1-3: the whole genome was divided into windows of 100Kbp in order of arrangement.
3. The method of claim 2, wherein processing the low depth WGS machine data of the sample to be tested further comprises:
s1-4: taking reads in the second comparison file as a basic unit, counting the number of reads in each window, taking the number of reads as the reads count of the window, and recording the number as RCiI is the order of windows divided in the whole genome according to the arrangement order, and i is 1,2 and 3;
s1-5: counting the GC base content of each window, combining adjacent windows with the same GC content into one group, and marking the jth group as WjThe number of windows in the jth group is represented as MjAnd the kth window contained in the jth group is denoted as WkjJ and k are 1,2 and 3 respectively;
s1-6: calculating WjIs a median value RC ofjThe average RC of the whole sample to be measured is recorded as RCpBy the following formula to RCiAnd (3) performing correction:
Figure 105526DEST_PATH_IMAGE001
i=M1+M2+M3...+M(j-1)+k;
s1-7: processing the low depth WGS off-set data of the N healthy samples according to the steps S1-4, S1-5 and S1-6, calculating the median value RC of each window in the N healthy samples, and marking as RCyAs aThe RC of the window constructs baseline; n is more than or equal to 30, and y is 1,2, 3;
s1-8: taking NRC of each window to be tested sampleiDivided by RC in the corresponding baselineyObtaining DR;
s1-9: and segmenting the DR based on a cyclic binary segmentation algorithm, and marking as DR segments, wherein DR values in the same DR segment are relatively close, the average DR values of two adjacent DR segments are obviously different, and each DR segment at least comprises 10 windows.
4. The method of claim 3, wherein processing the low depth WGS trip data for the sample to be tested further comprises:
s1-10: counting the median of DR in each DR segment, taking the median as the DR value of the DR segment, and recording as DRqCalculating the copy number of the DR fragment, and recording as CqThe calculation formula is as follows:
Figure 370810DEST_PATH_IMAGE002
5. the method of claim 4, wherein the calculation method for establishing genomic loss of heterozygosity LOH specifically comprises:
s2-1: selecting SNP loci with higher heterozygosity probability by using genome planning data of thousands of people;
s2-2: counting the frequency of each site allelic base on the SNP site on a sample to be detected, and if a plurality of allelic bases exist, selecting two of the allelic bases with the highest frequency; if there is only one allelic base, the second allelic base is given a default frequency of 0;
s2-3: counting the average number of the second allelic base frequency of the SNP locus in each window to serve as AF of the window, and generating a new AF number sequence; if AF is greater than 0, adjust AF to 0.5;
s2-4: connecting the same AF in the step S2-3 and adjacent windows to obtain a larger AF fragment;
s2-5: selecting Cq1 or more and AF equal toAn AF fragment of 0, which is marked as an LOH event if the length of the fragment is greater than 15Mb and less than the length of the whole chromosome in which the fragment is located;
s2-6: and recording LOH events in the sample to be detected as HRD-LOH score.
6. The method according to claim 3, wherein the calculation method for establishing large segment migration LST specifically comprises:
s4-1: removing fragments of which the DR fragments are less than 3Mb from the DR fragments obtained in the step S1-9;
s4-2: taking a single chromosome as an analysis target, sequentially comparing the DR segment with the chromosome, and comparing adjacent C on the chromosomeqThe same DR segments are merged into one large segment, denoted as DRdAnalyzing and processing all chromosomes in sequence;
s4-3: for DRdMaking statistics if DR is formeddThe length of two adjacent DR segments is more than 10Mb, and the middle interval is less than 3Mb, then the two adjacent DR segments are marked as an LST event;
s4-4: and recording the LST event in the sample to be detected as HRD-LST score.
7. The method according to claim 2, wherein the calculation method for establishing Telomere Allele Imbalance (TAI) specifically comprises:
s3-1: selecting SNP loci with higher heterozygosity probability by using genome planning data of thousands of people;
s3-2: counting the variation frequency of the allelic base of each site on the SNP site of the second comparison file to obtain two frequencies with the highest variation frequency, namely a first allelic base frequency AF1 and a second allelic base frequency AF2, and calculating the AFR value of each site according to the following formula; if a certain locus has no variation, the AFR value is 0, and the locus is removed;
Figure 692070DEST_PATH_IMAGE003
s3-3: calculate each instituteThe average AFR of the window is designated as AFR as the AFR of the windowpIf the AFR values of a certain window are all 0, the AFR of the windowpIs 0;
s3-4: subjecting AFR topCombining adjacent windows less than 0.5 to obtain AFRpCombining adjacent windows larger than 0.5 to respectively generate AFR fragments;
s3-5: if a certain AFR fragment contains telomeres, the length is more than 11Mb, and the AFR fragment contains telomerespIf the number is less than 0.5, the event is marked as a TAI event;
s3-6: and recording the TAI event in the sample to be detected as HRD-TAI score.
8. The method of claim 2, wherein the processing low depth WGS machine data of the sample to be tested further comprises S1-0: preprocessing machine unloading data: and removing the joints on the reads from the data of the off-line machine.
9. The method of claim 1, wherein the HRD score = HRD-LOH score + HRD-TAI score + HRD-LST score.
10. The method of claim 1 wherein the low depth WGS is a WGS sequencing result of 10 layers or more.
CN202110716079.8A 2021-06-28 2021-06-28 Method for evaluating HRD score based on low-depth WGS Active CN113257346B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202210674201.4A CN114999568B (en) 2021-06-28 2021-06-28 Calculation method of telomere allele imbalance TAI
CN202110716079.8A CN113257346B (en) 2021-06-28 2021-06-28 Method for evaluating HRD score based on low-depth WGS
CN202111154180.5A CN113948151B (en) 2021-06-28 2021-06-28 Processing method of low-depth WGS (WGS) offline data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110716079.8A CN113257346B (en) 2021-06-28 2021-06-28 Method for evaluating HRD score based on low-depth WGS

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN202210674201.4A Division CN114999568B (en) 2021-06-28 2021-06-28 Calculation method of telomere allele imbalance TAI
CN202111154180.5A Division CN113948151B (en) 2021-06-28 2021-06-28 Processing method of low-depth WGS (WGS) offline data

Publications (2)

Publication Number Publication Date
CN113257346A true CN113257346A (en) 2021-08-13
CN113257346B CN113257346B (en) 2021-10-19

Family

ID=77189984

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202110716079.8A Active CN113257346B (en) 2021-06-28 2021-06-28 Method for evaluating HRD score based on low-depth WGS
CN202111154180.5A Active CN113948151B (en) 2021-06-28 2021-06-28 Processing method of low-depth WGS (WGS) offline data
CN202210674201.4A Active CN114999568B (en) 2021-06-28 2021-06-28 Calculation method of telomere allele imbalance TAI

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN202111154180.5A Active CN113948151B (en) 2021-06-28 2021-06-28 Processing method of low-depth WGS (WGS) offline data
CN202210674201.4A Active CN114999568B (en) 2021-06-28 2021-06-28 Calculation method of telomere allele imbalance TAI

Country Status (1)

Country Link
CN (3) CN113257346B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067909A (en) * 2021-11-23 2022-02-18 深圳基因家科技有限公司 Method, device and storage medium for correcting homologous recombination defect score
CN114242170A (en) * 2021-12-21 2022-03-25 深圳吉因加医学检验实验室 Method and device for evaluating homologous recombination repair defects and storage medium
CN114300053A (en) * 2021-12-29 2022-04-08 苏州绘真医学检验有限公司 Homologous recombination defective gene analysis method
CN115862733A (en) * 2023-02-27 2023-03-28 广州嘉检医学检测有限公司 Method for detecting heterozygosity loss based on medium-depth whole genome next generation sequencing
CN115985399A (en) * 2023-03-20 2023-04-18 广州迈景基因医学科技有限公司 HRD panel site selection optimization method and system for high-throughput sequencing

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111883211A (en) * 2020-08-07 2020-11-03 张哲� Gene scar for representing HRD homologous recombination repair defect and identification method
CN112410423A (en) * 2020-11-03 2021-02-26 南京世和基因生物技术股份有限公司 Marker for deletion of homologous recombination, detection method and detection system
CN112802548A (en) * 2021-01-07 2021-05-14 深圳吉因加医学检验实验室 Method for predicting allele-specific copy number variation of single-sample whole genome

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3080292B1 (en) * 2013-12-09 2022-02-02 Institut Curie Methods for detecting inactivation of the homologous recombination pathway (brca1/2) in human tumors
EP3180447B1 (en) * 2014-08-15 2020-03-11 Myriad Genetics, Inc. Methods and materials for assessing homologous recombination deficiency
WO2016118726A2 (en) * 2015-01-21 2016-07-28 Sangamo Biosciences, Inc. Methods and compositions for identification of highly specific nucleases
CN105986011B (en) * 2015-01-30 2019-10-15 深圳华大基因研究院 A kind of detection method of loss of heterozygosity
CN107287285A (en) * 2017-03-28 2017-10-24 上海至本生物科技有限公司 It is a kind of to predict the method that homologous recombination absent assignment and patient respond to treatment of cancer
CN111676277B (en) * 2020-08-12 2020-12-15 臻和(北京)生物科技有限公司 Method and kit for determining unstable genome based on second-generation sequencing technology
CN112164420B (en) * 2020-09-07 2021-07-20 厦门艾德生物医药科技股份有限公司 Method for establishing genome scar model
CN112397145A (en) * 2020-11-19 2021-02-23 河南省肿瘤医院 HRD (high resolution display) score calculation method based on chip detection
CN112669906B (en) * 2020-11-25 2021-09-28 深圳华大基因股份有限公司 Detection method, device, terminal device and computer-readable storage medium for measuring genome instability
CN112226495B (en) * 2020-12-18 2021-03-16 北京迈基诺基因科技股份有限公司 Method for detecting DNA homologous recombination abnormality and application thereof
CN112820351A (en) * 2021-03-01 2021-05-18 江苏医联生物科技有限公司 Method for detecting mutation and HRD (high resolution contrast) score guiding medication of tumor patient
CN112980834B (en) * 2021-04-22 2021-08-17 菁良基因科技(深圳)有限公司 Homologous recombination defect repair reference product and preparation method and kit thereof

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111883211A (en) * 2020-08-07 2020-11-03 张哲� Gene scar for representing HRD homologous recombination repair defect and identification method
CN112410423A (en) * 2020-11-03 2021-02-26 南京世和基因生物技术股份有限公司 Marker for deletion of homologous recombination, detection method and detection system
CN112802548A (en) * 2021-01-07 2021-05-14 深圳吉因加医学检验实验室 Method for predicting allele-specific copy number variation of single-sample whole genome

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
EECKHOUTTE A.等: "ShallowHRD: detection of homologous recombination deficiency from shallow whole genome sequencing", 《BIOINFORMATICS》 *
POORNIMA VIJAYAN 等: "A brief overview of homologous recombination deficiency testing in cancers for the "Next-Generation" pathologist", 《JOURNAL OF PATHOLOGY OF NEPAL》 *
XAVIER M.DE LUCA等: "Using whole-genome sequencing data to derive the homologous recombination deficiency scores", 《NPJ BREAST CANCER》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067909A (en) * 2021-11-23 2022-02-18 深圳基因家科技有限公司 Method, device and storage medium for correcting homologous recombination defect score
CN114067909B (en) * 2021-11-23 2022-08-30 北京吉因加医学检验实验室有限公司 Method, device and storage medium for correcting homologous recombination defect score
CN114242170A (en) * 2021-12-21 2022-03-25 深圳吉因加医学检验实验室 Method and device for evaluating homologous recombination repair defects and storage medium
CN114300053A (en) * 2021-12-29 2022-04-08 苏州绘真医学检验有限公司 Homologous recombination defective gene analysis method
CN115862733A (en) * 2023-02-27 2023-03-28 广州嘉检医学检测有限公司 Method for detecting heterozygosity loss based on medium-depth whole genome next generation sequencing
CN115985399A (en) * 2023-03-20 2023-04-18 广州迈景基因医学科技有限公司 HRD panel site selection optimization method and system for high-throughput sequencing

Also Published As

Publication number Publication date
CN113948151A (en) 2022-01-18
CN113257346B (en) 2021-10-19
CN114999568A (en) 2022-09-02
CN113948151B (en) 2022-07-05
CN114999568B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN113257346B (en) Method for evaluating HRD score based on low-depth WGS
CN109880910A (en) A kind of detection site combination, detection method, detection kit and the system of Tumor mutations load
CN112226495B (en) Method for detecting DNA homologous recombination abnormality and application thereof
CN107423578B (en) Device for detecting somatic cell mutation
CN103667438B (en) Method for screening HRDs disease-causing mutation and gene chip hybridization probe designing method involved in same
CN111968701B (en) Method and device for detecting somatic copy number variation of designated genome region
CN108256292A (en) A kind of copy number variation detection device
CN107480470A (en) Known the variation method for detecting and device examined based on Bayes and Poisson distribution
KR101936933B1 (en) Methods for detecting nucleic acid sequence variations and a device for detecting nucleic acid sequence variations using the same
CN112088220A (en) Surrogate markers and methods for tumor mutation burden determination
Zhao et al. Association between the IL-10-1082G/A, IL-10-592A/C, and IL-10-819G/A polymorphisms and atopic dermatitis susceptibility: a meta-analysis
CN113724781B (en) Method and apparatus for detecting homozygous deletions
Eaaswarkhanth et al. Genome-wide selection scan in an Arabian Peninsula population identifies a TNKS haplotype linked to metabolic traits and hypertension
CN106906220A (en) A kind of COL4A5 genes of mutation and its application
CN112037863B (en) Early NSCLC prognosis prediction system
CN111951893A (en) Method for constructing tumor mutation load TMB panel and using method thereof
CN103710340B (en) A kind of I type USHER syndrome associated gene mutation and apply the deaf Molecular Etiology diagnostic reagent of this mutator gene
CN114694752B (en) Method, computing device and medium for predicting homologous recombination repair defects
CN105838720A (en) PTPRQ gene mutant and application thereof
CN114410772A (en) Susceptibility gene of chronic obstructive pulmonary acute exacerbation and application thereof in predicting susceptibility to chronic obstructive pulmonary acute exacerbation
CN114067908A (en) Method, device and storage medium for evaluating single-sample homologous recombination defects
Quinodoz et al. Detection of elusive DNA copy-number variations in hereditary disease and cancer through the use of noncoding and off-target sequencing reads
CN115074439B (en) Group of NK/T cell lymphoma prognosis related genes, genome prognosis model and application thereof
CN111383713A (en) ctDNA detection and analysis device and method
Gillmor Deconvolution of Genetic Heterogeneity in Glioblastoma

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Lou Feng

Inventor after: Wang Xiaojia

Inventor after: Liu Kai

Inventor after: Cao Wenming

Inventor after: Zhang Mengmeng

Inventor after: Guo Jing

Inventor after: Sun Hong

Inventor after: Cao Shanbai

Inventor before: Lou Feng

Inventor before: Liu Kai

Inventor before: Zhang Mengmeng

Inventor before: Guo Jing

Inventor before: Sun Hong

Inventor before: Cao Shanbai

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230605

Address after: 102600 1601, floor 16, building 5, yard 18, Kechuang 13th Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Patentee after: Beijing Xiangxin Biotechnology Co.,Ltd.

Patentee after: Tianjin Xiangxin Biotechnology Co.,Ltd.

Patentee after: Tianjin Xiangxin Medical Instrument Co.,Ltd.

Patentee after: ZHEJIANG CANCER Hospital

Address before: 102600 1601, floor 16, building 5, yard 18, Kechuang 13th Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Patentee before: Beijing Xiangxin Biotechnology Co.,Ltd.

Patentee before: Tianjin Xiangxin Biotechnology Co.,Ltd.

Patentee before: Tianjin Xiangxin Medical Instrument Co.,Ltd.