CN113257346A

CN113257346A - Method for evaluating HRD score based on low-depth WGS

Info

Publication number: CN113257346A
Application number: CN202110716079.8A
Authority: CN
Inventors: 楼峰; 刘凯; 张萌萌; 郭璟; 孙宏; 曹善柏
Original assignee: Tianjin Xiangxin Biotechnology Co ltd; Tianjin Xiangxin Medical Instrument Co ltd; Beijing Xiangxin Biotechnology Co ltd
Current assignee: Beijing Xiangxin Biotechnology Co ltd; Tianjin Xiangxin Biotechnology Co ltd; Tianjin Xiangxin Medical Instrument Co ltd; Zhejiang Cancer Hospital
Priority date: 2021-06-28
Filing date: 2021-06-28
Publication date: 2021-08-13
Anticipated expiration: 2041-06-28
Also published as: CN113948151A; CN113257346B; CN114999568A; CN113948151B; CN114999568B

Abstract

The application belongs to the technical field of gene detection and discloses a method for evaluating an HRD score based on low-depth WGS. The method comprises processing the low-depth WGS raw sequencing data of a sample to be tested, followed by any one or more of three steps. Step one establishes a method for calculating genomic loss of heterozygosity (LOH) to obtain an HRD-LOH score. Step two establishes a method for calculating telomeric allelic imbalance (TAI) to obtain an HRD-TAI score. Step three establishes a method for calculating large-scale state transitions (LST) to obtain an HRD-LST score. The application has at least the following beneficial effect: because the analysis is based on data generated by low-depth WGS, the cost is greatly reduced, which favors large-scale application.

Description

Method for evaluating HRD score based on low-depth WGS

Technical Field

The application belongs to the technical field of gene detection, and particularly relates to a method for evaluating HRD score based on low-depth WGS.

Background

A DNA double-strand break is a type of DNA damage that, in severe cases, can cause chromosome breakage and rearrangement. Because no intact complementary strand is available as a template, the DNA sequence is difficult to restore and genetic information may be lost, so such breaks require homologous recombination for repair. When homologous recombination repair capacity is lost, that is, when homologous recombination deficiency (HRD) occurs, the genome becomes unstable. DNA damage then tends to accumulate, and the resulting vicious circle can lead to cancer. HRD status has important guiding significance for the use of platinum drugs or PARP inhibitors.

HRD is generally caused by genetic or epigenetic alterations in genes of the homologous recombination repair pathway, such as BRCA1/2, Rad52/Rad22, PALB2, the RAD51 family, BRIP1/BACH1, ATM and CHEK2. Studies have shown that women carrying a BRCA1 mutation have a 50-85% probability of developing breast cancer and a 15-45% probability of developing ovarian cancer. Germline BRCA1/2 variants account for approximately 7% of breast cancers and 11%-15% of triple-negative breast cancers. An estimated 40% of familial and sporadic breast cancer patients have homologous recombination deficiency. Although current attention focuses mainly on HRD in breast cancer, HRD is also an important indicator in other cancer types.

At present, HRD is detected mainly by the following two methods:

HR gene chip: the chip is designed to contain homologous recombination pathway genes. Target capture and next-generation sequencing yield sequencing data for these genes, from which SNVs, indels and large rearrangements of all the genes are detected. The drawbacks are that HRD may be overestimated and that the SNP sites on the chip are fixed. As a result, only variation at those specific sites can be detected, which is a limitation.

Whole-genome sequencing (WGS): the whole genome is sequenced and chromosomal structural variation is detected. The HRD score is calculated from loss of heterozygosity (LOH), telomeric allelic imbalance (TAI) and large-scale state transitions (LST). This method has high accuracy but relatively high cost.

Disclosure of Invention

To reduce cost while maintaining sensitivity and accuracy, the application provides a method for evaluating the HRD score based on low-depth WGS. Because the HRD score is evaluated from data obtained by low-depth WGS, the cost is reduced and the method is better suited to large-scale clinical application.

The application is implemented by the following scheme:

The application provides a method for evaluating an HRD score based on low-depth WGS, comprising: processing the low-depth WGS raw sequencing data of a sample to be tested; and any one or more steps selected from the following:

step one: establishing a method for calculating genomic loss of heterozygosity (LOH) to obtain an HRD-LOH score;

step two: establishing a method for calculating telomeric allelic imbalance (TAI) to obtain an HRD-TAI score; and

step three: establishing a method for calculating large-scale state transitions (LST) to obtain an HRD-LST score.

Because the method for evaluating the HRD score is built on low-depth WGS raw data, the cost relative to standard whole-genome sequencing (WGS) is greatly reduced. Compared with HR gene chip detection, the detection sites are more flexible, and the result for the sample to be tested is more accurate and better reflects its actual condition.

In an embodiment of the present application, processing the low-depth WGS raw sequencing data of the sample to be tested specifically comprises:

S1-1: aligning the raw data to a human whole-genome reference to obtain a first alignment file;

S1-2: removing duplicate reads from the first alignment file to obtain a second alignment file;

S1-3: dividing the human whole genome into windows of 100 kbp.

In the application, the whole genome is divided, in sequence order, into consecutive 100 kbp windows, which facilitates subsequent data analysis and processing.
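A minimal sketch of step S1-3, assuming the chromosome lengths are taken from the reference genome (for example from its FASTA index). The function name `make_windows` is hypothetical. Windows are half-open intervals listed in genome order, so a window's list position plus one is the index i used in step S1-4.

```python
from typing import Dict, List, Tuple

WINDOW_SIZE = 100_000  # 100 kbp, as specified in step S1-3


def make_windows(chrom_lengths: Dict[str, int],
                 size: int = WINDOW_SIZE) -> List[Tuple[str, int, int]]:
    """Tile each chromosome into consecutive half-open windows [start, end).

    Chromosomes are processed in the order given; the last window of each
    chromosome is truncated at the chromosome end.
    """
    windows: List[Tuple[str, int, int]] = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, size):
            windows.append((chrom, start, min(start + size, length)))
    return windows


if __name__ == "__main__":
    # Toy lengths, not real chromosome sizes.
    for w in make_windows({"chr1": 250_000, "chr2": 100_000}):
        print(w)
```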

In an embodiment of the present application, processing the low-depth WGS raw sequencing data of the sample to be tested further comprises:

S1-4: taking the reads in the second alignment file as the basic unit, counting the number of reads in each window as the read count (RC) of that window, denoted RC_i, where i = 1, 2, 3, ... is the index of the window in genome order;

S1-5: computing the GC base content of each window and combining adjacent windows with the same GC content into one group, the jth group being denoted W_j, the number of windows in the jth group M_j, and the kth window in the jth group W_kj, where j, k = 1, 2, 3, ...;

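S1-6: calculating the median RC of each group W_j, denoted RC_j, and the average RC of the whole sample, denoted RC_p, and correcting RC_i with the following formula to obtain the corrected read count NRC_i, in which the kth window of group j is window

$i = M_1 + M_2 + M_3 + \cdots + M_{j-1} + k$;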

S1-7: processing the low-depth WGS raw data of N healthy samples according to steps S1-1, S1-2 and S1-3, calculating the median RC of each window across the N healthy samples, denoted RC_y, and using it as the baseline RC of that window, where N ≥ 30 and y = 1, 2, 3, ...;

S1-8: traversing the windows of the sample to be tested and of the baseline, and dividing the NRC_i of each window of the sample to be tested by the corresponding baseline value RC_y to obtain the depth ratio DR;

S1-9: segmenting the DR values with the circular binary segmentation (CBS) algorithm into DR segments, such that DR values within a segment are similar, the mean DR values of adjacent segments differ significantly, and each DR segment contains at least 10 windows.

In the application, each DR segment must contain at least 10 windows. With 100 kbp windows, this guarantees that every retained segment is at least 1 Mb long, which suppresses noise signals as far as possible.

S1-10: taking the median DR within each DR segment as the DR value of that segment, denoted DR_q, and calculating the copy number of the segment, denoted C_q, with the following formula:


In the application, calculating C_q gives a preliminary view of the underlying cause of the cancer, which can inform targeted medication at the cellular level and thereby improve the remission rate. A C_q value other than 2 indicates a copy number variation: C_q greater than 2 indicates a gain, and C_q less than 2 indicates a loss (deletion). A gain of genes that drive cell proliferation, or a loss of tumor suppressor genes, can trigger unlimited cell proliferation and lead to cancer. The C_q value therefore allows a preliminary judgment of cancer development.
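Steps S1-5 to S1-8 and S1-10 can be sketched as follows. The correction and copy-number formulas are not reproduced in this text, so the code assumes two common choices and labels them as such: the median-ratio GC correction NRC_i = RC_i × RC_p / RC_j, and C_q = round(2 × DR_q) for a diploid healthy baseline. Binning GC content to whole percentages, so that "the same GC content" can be tested, is also an assumption, and all function names are hypothetical.

```python
from itertools import groupby
from statistics import mean, median
from typing import List, Sequence


def gc_correct(rc: Sequence[int], gc: Sequence[float]) -> List[float]:
    """S1-5/S1-6: NRC_i = RC_i * RC_p / RC_j (ASSUMED formula).

    Runs of adjacent windows with the same GC content form group W_j; RC_j is
    the median RC of that run and RC_p the mean RC of the whole sample.
    ASSUMPTION: "same GC content" means the same whole-number GC percentage.
    """
    rc_p = mean(rc)
    keys = [round(g * 100) for g in gc]
    nrc: List[float] = []
    pos = 0
    for _, run in groupby(keys):
        m_j = len(list(run))              # M_j, number of windows in group j
        group = rc[pos:pos + m_j]         # windows i = M_1 + ... + M_(j-1) + k
        rc_j = median(group)
        nrc.extend(r * rc_p / rc_j if rc_j > 0 else 0.0 for r in group)
        pos += m_j
    return nrc


def build_baseline(healthy_nrc: Sequence[Sequence[float]]) -> List[float]:
    """S1-7: per-window median of the corrected counts of N >= 30 healthy samples (RC_y)."""
    if len(healthy_nrc) < 30:
        raise ValueError("the baseline requires N >= 30 healthy samples")
    return [median(column) for column in zip(*healthy_nrc)]


def depth_ratio(nrc: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """S1-8: DR = NRC_i / RC_y; windows with an empty baseline get DR = 0 (assumption)."""
    return [n / b if b > 0 else 0.0 for n, b in zip(nrc, baseline)]


def copy_number(segment_dr: Sequence[float]) -> int:
    """S1-10: C_q from DR_q, the median DR of one DR segment.

    ASSUMPTION: C_q = round(2 * DR_q), i.e. the healthy baseline is diploid.
    Python's round() rounds halves to even.
    """
    return round(2 * median(segment_dr))
```

Because Python's `round()` rounds halves to even, a DR_q of exactly 1.25 would give C_q = 2; another rounding rule may be intended by the method.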

In one embodiment of the present application, the method for calculating genomic loss of heterozygosity (LOH) specifically comprises:

S2-1: selecting SNP sites with a high probability of heterozygosity using 1000 Genomes Project data;

S2-2: counting the frequency of each allelic base at these SNP sites in the sample to be tested; if several allelic bases are present, selecting the two most frequent; if only one allelic base is present, assigning the second allelic base a default frequency of 0;

S2-3: taking the mean second-allele frequency of the SNP sites in each window as the allele frequency (AF) of that window, generating a new AF sequence; if AF is greater than 0, setting AF to 0.5;

S2-4: merging adjacent windows with the same AF from step S2-3 into larger AF fragments;

S2-5: selecting AF fragments with C_q ≥ 1 and AF equal to 0; a fragment longer than 15 Mb but shorter than the whole chromosome on which it lies is recorded as an LOH event;

S2-6: counting the LOH events in the sample to be tested; the count is the HRD-LOH score.

In one embodiment of the present application, SNP sites with a high probability of heterozygosity are those whose heterozygosity probability is greater than 0.2. These sites are approximately uniformly distributed across the genome and number about 110,000 in total.
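The LOH steps S2-3 to S2-5 for a single chromosome might look like the sketch below. It assumes that each 100 kbp window already carries the mean second-allele frequency of its selected SNP sites and the C_q of the DR segment covering it, and that a fragment passes the C_q ≥ 1 filter only if all of its windows do. The `Window` type and `count_loh` are hypothetical names.

```python
from typing import List, NamedTuple

LOH_MIN_LEN = 15_000_000  # an LOH fragment must be longer than 15 Mb


class Window(NamedTuple):
    start: int      # window start (bp, 0-based)
    end: int        # window end (bp, exclusive)
    mean_af: float  # mean second-allele frequency of the selected SNPs in the window
    c_q: int        # copy number of the DR segment covering the window (assumed input)


def count_loh(windows: List[Window], chrom_length: int) -> int:
    """Count LOH events on one chromosome (steps S2-3 to S2-5); windows in genome order."""
    # S2-3: any window with AF > 0 is treated as heterozygous and set to 0.5.
    af = [0.5 if w.mean_af > 0 else 0.0 for w in windows]
    events, i = 0, 0
    while i < len(windows):
        # S2-4: extend the fragment across adjacent windows with the same AF.
        j = i
        while j + 1 < len(windows) and af[j + 1] == af[i]:
            j += 1
        length = windows[j].end - windows[i].start
        min_cn = min(w.c_q for w in windows[i:j + 1])
        # S2-5: AF == 0, C_q >= 1 throughout, longer than 15 Mb, shorter than the chromosome.
        if af[i] == 0.0 and min_cn >= 1 and LOH_MIN_LEN < length < chrom_length:
            events += 1
        i = j + 1
    return events
```

Summing `count_loh` over all chromosomes gives the HRD-LOH score of step S2-6. A whole-chromosome homozygous region is deliberately not counted, matching the "shorter than the whole chromosome" condition.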

In a specific embodiment of the present application, the method for calculating large-scale state transitions (LST) specifically comprises:

S4-1: removing DR segments shorter than 3 Mb from the DR segments obtained in step S1-9;

S4-2: taking one chromosome at a time as the analysis target, traversing its DR segments in order and merging adjacent DR segments with the same C_q into one larger segment, denoted DR_d, until all chromosomes have been processed;

S4-3: examining the DR_d segments; if two adjacent DR_d segments are each longer than 10 Mb and the gap between them is less than 3 Mb, they are recorded as one LST event;

S4-4: counting the LST events in the sample to be tested; the count is the HRD-LST score.
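A sketch of steps S4-1 to S4-3 on one chromosome, assuming DR segments are given as (start, end, C_q) tuples in base-pair coordinates. The names are hypothetical. Because segments shorter than 3 Mb are dropped before merging, a short segment with a different C_q does not break an otherwise continuous DR_d segment.

```python
from typing import List, Tuple

Segment = Tuple[int, int, int]  # (start, end, C_q) of one DR segment on one chromosome

MIN_SEG = 3_000_000        # S4-1: drop segments shorter than 3 Mb
MIN_LST_SIDE = 10_000_000  # S4-3: both sides must be longer than 10 Mb
MAX_GAP = 3_000_000        # S4-3: gap between the two sides must be below 3 Mb


def count_lst(segments: List[Segment]) -> int:
    """Count LST events on one chromosome."""
    # S4-1: keep only segments of at least 3 Mb, ordered by start position.
    kept = [s for s in sorted(segments) if s[1] - s[0] >= MIN_SEG]
    # S4-2: merge neighbouring kept segments with the same C_q into DR_d segments.
    merged: List[List[int]] = []
    for start, end, cn in kept:
        if merged and merged[-1][2] == cn:
            merged[-1][1] = end
        else:
            merged.append([start, end, cn])
    # S4-3: count adjacent DR_d pairs both longer than 10 Mb and less than 3 Mb apart.
    events = 0
    for (s1, e1, _), (s2, e2, _) in zip(merged, merged[1:]):
        if e1 - s1 > MIN_LST_SIDE and e2 - s2 > MIN_LST_SIDE and s2 - e1 < MAX_GAP:
            events += 1
    return events
```

Summing `count_lst` over all chromosomes gives the HRD-LST score of step S4-4.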

In a specific embodiment of the present application, the method for calculating telomeric allelic imbalance (TAI) specifically comprises:

S3-1: selecting SNP sites with a high probability of heterozygosity using 1000 Genomes Project data;

S3-2: counting the frequencies of the allelic bases at each of these SNP sites and taking the two highest, the first allele frequency AF1 and the second allele frequency AF2, then calculating the allele frequency ratio (AFR) of each site with the following formula; a site without variation has an AFR of 0 and is removed;

S3-3: taking the mean AFR of each window as the AFR of that window, denoted AFR_p; if all AFR values in a window are 0, AFR_p of the window is 0;

S3-4: merging adjacent windows with AFR_p less than 0.5, and separately merging adjacent windows with AFR_p greater than 0.5, to generate AFR fragments;

S3-5: recording as a TAI event any AFR fragment that extends to a telomere, is longer than 11 Mb and has AFR_p less than 0.5;

S3-6: counting the TAI events in the sample to be tested; the count is the HRD-TAI score.
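The TAI steps S3-2 to S3-5 are sketched below. Because the AFR formula is not reproduced in this text, the sketch assumes AFR = AF2 / AF1, which is close to 1 at a balanced heterozygous site and falls toward 0 under imbalance. It also assumes that a fragment "extends to a telomere" when it includes the first or last window of the chromosome, and that windows with AFR_p of exactly 0.5 count as balanced. All names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple

TAI_MIN_LEN = 11_000_000  # a TAI fragment must be longer than 11 Mb
AFR_CUT = 0.5


def site_afr(af1: float, af2: float) -> float:
    """ASSUMPTION: AFR = AF2 / AF1; 0 when the site shows no second allele."""
    return af2 / af1 if af1 > 0 and af2 > 0 else 0.0


def window_afr(sites: Sequence[Tuple[float, float]]) -> float:
    """S3-3: AFR_p of a window = mean AFR of its informative sites, or 0 if none remain."""
    afrs = [a for a in (site_afr(f1, f2) for f1, f2 in sites) if a > 0]
    return mean(afrs) if afrs else 0.0


def count_tai(window_bounds: List[Tuple[int, int]], afr_p: List[float]) -> int:
    """S3-4/S3-5: count TAI events on one chromosome; windows given in genome order."""
    low = [v < AFR_CUT for v in afr_p]  # exactly 0.5 is grouped as balanced (assumption)
    n, events, i = len(afr_p), 0, 0
    while i < n:
        # Merge the run of adjacent windows on the same side of the 0.5 cut.
        j = i
        while j + 1 < n and low[j + 1] == low[i]:
            j += 1
        touches_telomere = i == 0 or j == n - 1
        length = window_bounds[j][1] - window_bounds[i][0]
        if low[i] and touches_telomere and length > TAI_MIN_LEN:
            events += 1
        i = j + 1
    return events
```

Summing `count_tai` over all chromosomes gives the HRD-TAI score of step S3-6.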

In an embodiment of the present application, processing the low-depth WGS raw sequencing data of the sample to be tested further comprises S1-0: preprocessing the raw data by removing adapters from the reads.

In an embodiment of the present application, the low-depth WGS raw data are in FASTQ format.

In one embodiment of the present application, adapters are removed from the reads with the fastp software.

In a specific embodiment of the present application, the first alignment file is in BAM format.

In one embodiment of the present application, the raw data are aligned to the human whole-genome reference with the bwa software.

In a specific embodiment of the present application, duplicate reads are removed from the first alignment file with the Picard software to obtain the second alignment file.

In a specific embodiment of the present application, base quality score recalibration is performed on the first alignment file before duplicate reads are removed.

In a specific embodiment of the present application, base quality recalibration of the first alignment file is performed with the GATK software.

In a specific embodiment of the present application, the HRD score = HRD-LOH score + HRD-TAI score + HRD-LST score.

In a specific embodiment of the present application, the cutoff between HRD negative and HRD positive is an HRD score of 42.
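The combined score and the cutoff can be expressed directly. The function names are hypothetical, and classifying a score of exactly 42 as negative follows the "greater than 42" wording of Example 7 rather than a rule stated in this paragraph.

```python
HRD_CUTOFF = 42  # cutoff value given in the text


def hrd_score(loh: int, tai: int, lst: int) -> int:
    """HRD score = HRD-LOH score + HRD-TAI score + HRD-LST score."""
    return loh + tai + lst


def hrd_status(score: int, cutoff: int = HRD_CUTOFF) -> str:
    """'positive' when the score is greater than the cutoff (assumed boundary handling)."""
    return "positive" if score > cutoff else "negative"
```

With the component scores of Application example 1 (1, 4 and 44), `hrd_score` returns 49 and `hrd_status` returns "positive".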

In a specific embodiment of the present application, the low-depth WGS is WGS sequenced at a depth of 10× or more. Preferably, the low-depth WGS is 10× WGS.

Another aspect of the present application provides an apparatus for evaluating an HRD score based on low-depth WGS, comprising:

a data processing module for processing low-depth WGS raw data; and one or more statistics modules selected from:

an HRD-LOH score statistics module for identifying LOH events and counting the HRD-LOH score;

an HRD-TAI score statistics module for identifying TAI events and counting the HRD-TAI score; and

an HRD-LST score statistics module for identifying LST events and counting the HRD-LST score.

The method provided by the application has at least the following beneficial effect:

because the method analyzes data generated by low-depth WGS sequencing, the cost is greatly reduced, which favors large-scale application.

Drawings

Fig. 1 is a schematic flow chart of the method for evaluating the HRD score based on low-depth WGS provided in an embodiment of the present application.

Fig. 2 shows the survival analysis of patients with different HRD scores provided in the examples of the present application.

Fig. 3 shows HRD scores by BRCA1/2 deleterious mutation status provided in the examples of the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

The technical solutions of the present application are described clearly and completely below in conjunction with its embodiments. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present application. Examples for which specific conditions are not stated were conducted under conventional conditions or conditions recommended by the manufacturer. Reagents or instruments for which the manufacturer is not indicated are conventional, commercially available products.

Abbreviations and key terms in this application are defined as follows:

HR: Homologous Recombination

HRD: Homologous Recombination Deficiency

LOH: Loss of Heterozygosity

TAI: Telomeric Allelic Imbalance

LST: Large-scale State Transitions (large-scale genomic instability)

HRD-LOH score: the number of LOH events longer than 15 Mb and shorter than the whole chromosome

HRD-TAI score: the number of allelic imbalance regions that extend to a chromosome end (telomere) and are longer than 11 Mb

HRD-LST score: the number of events in which two adjacent chromosome segments are each longer than 10 Mb and the gap between them is less than 3 Mb

In the application, the samples to be tested were tumor patient samples received by the company, and the healthy samples were donated by company employees. The whole-genome reference was obtained from the public NCBI database, version hg19. The 1000 Genomes Project data were obtained from the public database https://www.internationalgenome.org/data.

Fig. 1 shows the procedure of the present application for evaluating the HRD score based on low-depth WGS; in Fig. 1, the HRD score is the sum of the HRD-LOH, HRD-TAI and HRD-LST scores. In practice, each component score can also be used on its own or combined with other indicators such as BRCA1/2 mutation status. The HRD-LOH score supports a preliminary judgment of diseases related to loss of heterozygosity or of medication guidance. The HRD-TAI score does the same for diseases related to telomeric allelic imbalance, and the HRD-LST score for diseases related to large-scale genomic instability. Each component can also support a preliminary judgment of the HRD score.

Example 1 Preprocessing of low-depth WGS data

1. Adapters are removed from the reads of the low-depth WGS raw data with fastp;

2. the processed raw data are aligned to the human whole-genome reference with bwa to obtain a first alignment file in BAM format;

3. base quality scores in the first alignment file are recalibrated with GATK;

4. duplicate reads are removed from the recalibrated first alignment file with Picard to obtain a second alignment file, in BAM format, that contains no duplicate reads;

5. the human whole genome is divided, in sequence order, into windows of 100 kbp.
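The preprocessing steps of Example 1 can be written as command lines. The Python sketch below only builds the argument lists and prints them. The file names, read group, and known-sites VCF (required by GATK BaseRecalibrator) are assumptions, as is the samtools sort/index step, which GATK needs but the text does not mention. The reference is assumed to be already indexed for bwa, samtools and GATK. The order of steps 3 and 4 follows the text.

```python
import shlex
from typing import List


def preprocessing_commands(sample: str, r1: str, r2: str,
                           ref: str, known_sites: str) -> List[List[str]]:
    """Example 1 as argument lists, in the order given in the text (not executed).

    Hypothetical: all file names, the read group, the known-sites VCF and the
    samtools sort/index step.
    """
    fq1, fq2 = f"{sample}.clean_R1.fq.gz", f"{sample}.clean_R2.fq.gz"
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    table, bqsr = f"{sample}.recal.table", f"{sample}.bqsr.bam"
    return [
        # 1. remove adapters from paired-end reads with fastp
        ["fastp", "-i", r1, "-I", r2, "-o", fq1, "-O", fq2, "--detect_adapter_for_pe"],
        # 2. align with bwa mem; GATK requires a read group (literal \t is expanded by bwa)
        ["bwa", "mem", "-R", f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA",
         "-o", sam, ref, fq1, fq2],
        # coordinate-sort and index the alignment (first alignment file)
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # 3. base quality score recalibration with GATK
        ["gatk", "BaseRecalibrator", "-I", bam, "-R", ref,
         "--known-sites", known_sites, "-O", table],
        ["gatk", "ApplyBQSR", "-I", bam, "-R", ref, "--bqsr-recal-file", table, "-O", bqsr],
        # 4. remove (not just flag) duplicate reads with Picard: second alignment file
        ["picard", "MarkDuplicates", "-I", bqsr, "-O", f"{sample}.dedup.bam",
         "-M", f"{sample}.dup_metrics.txt", "--REMOVE_DUPLICATES", "true"],
    ]


if __name__ == "__main__":
    for cmd in preprocessing_commands("S1", "S1_R1.fq.gz", "S1_R2.fq.gz",
                                      "hg19.fa", "dbsnp.vcf.gz"):
        # To execute instead of printing: subprocess.run(cmd, check=True)
        print(shlex.join(cmd))
```

Keeping each step as an argument list avoids shell quoting problems with the tab-containing read group string.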

Example 2 Construction of DR segments

1) Taking the reads in the second alignment file of Example 1 as the basic unit, the number of reads falling in each window of Example 1 is counted as the read count of that window, denoted RC_i, where i = 1, 2, 3, ... is the index of the window in genome order.

2) The GC base content of each window is computed, and adjacent windows with the same GC content are combined into one group; the jth group is denoted W_j, the number of windows in it M_j, and the kth window in it W_kj, where j, k = 1, 2, 3, ...;

3) the median RC of W_j, denoted RC_j, and the average RC of the whole sample to be tested, denoted RC_p, are calculated, and RC_i is corrected with the following formula to obtain NRC_i, in which the kth window of group j is window

$i = M_1 + M_2 + M_3 + \cdots + M_{j-1} + k$;

4) the low-depth WGS data of 30 healthy individuals are processed according to steps 1), 2) and 3), and the median RC of each window across these samples, denoted RC_y, is used as the baseline RC of that window, where N ≥ 30 and y = 1, 2, 3, ...;

5) the NRC_i of each window of the sample to be tested is divided by the corresponding baseline RC_y to obtain the depth ratio (DR);

6) the DR values are segmented with circular binary segmentation (CBS) into DR segments, such that DR values within a segment are similar, the mean DR values of adjacent segments differ significantly, and each DR segment contains at least 10 windows.

Example 3 calculation of copy number

The median DR within each DR segment is taken as the DR value of that segment, denoted DR_q, and the copy number of the segment, denoted C_q, is calculated with the following formula:


In this example, calculating C_q (the copy number) gives a preliminary understanding of the underlying cause of the patient's cancer.

Example 4 Statistics of the HRD-LOH score

1. Using 1000 Genomes Project data, SNP sites with a high probability of heterozygosity are selected; these sites are approximately uniformly distributed across the genome and number about 110,000;

2. the frequency of each allelic base at these SNP sites in the sample to be tested is counted; if several allelic bases are present, the two most frequent are selected; if only one allelic base is present, the second allelic base is assigned a default frequency of 0;

3. the mean second-allele frequency of the SNP sites in each window of Example 1 is taken as the allele frequency (AF) of that window, generating a new AF sequence; if AF is greater than 0, AF is set to 0.5;

4. adjacent windows with the same AF from step 3 are merged into larger AF fragments;

5. AF fragments with C_q (from Example 3) ≥ 1 and AF equal to 0 are selected; a fragment longer than 15 Mb but shorter than the whole chromosome on which it lies is recorded as an LOH event;

6. the LOH events in the sample to be tested are counted as the HRD-LOH score.

Example 5 Statistics of the HRD-TAI score

1. SNP sites with a high probability of heterozygosity are selected using 1000 Genomes Project data;

2. the frequencies of the allelic bases at each SNP site are counted and the two highest, the first allele frequency AF1 and the second allele frequency AF2, are taken; the AFR of each site is calculated with the following formula; a site without variation has an AFR of 0 and is removed;

3. the mean AFR of each window of Example 1 is taken as the AFR of that window, denoted AFR_p; if all AFR values in a window are 0, AFR_p of the window is 0;

4. adjacent windows with AFR_p less than 0.5 are merged, and adjacent windows with AFR_p greater than 0.5 are merged separately, generating AFR fragments;

5. an AFR fragment that extends to a telomere, is longer than 11 Mb and has AFR_p less than 0.5 is recorded as a TAI event;

6. the TAI events in the sample to be tested are counted as the HRD-TAI score.

Example 6 Statistics of the HRD-LST score

1. DR segments shorter than 3 Mb are removed from the DR segments constructed in Example 2;

2. taking one chromosome at a time as the analysis target, its DR segments are traversed in order and adjacent DR segments with the same C_q are merged into one larger segment, denoted DR_d, until all chromosomes have been processed;

3. the DR_d segments are examined; if two adjacent DR_d segments are each longer than 10 Mb and the gap between them is less than 3 Mb, they are recorded as one LST event;

4. the LST events in the sample to be tested are counted as the HRD-LST score.

Example 7 Statistics of the HRD score

HRD score = HRD-LOH score + HRD-TAI score + HRD-LST score.

In this example, the HRD cutoff is an HRD score of 42: a patient with an HRD score greater than 42 is considered sensitive to platinum drugs and PARP inhibitors.

Example 8 Determination of the sequencing depth for low-depth WGS

Ten samples were sequenced by WGS to obtain 50× data, from which data sets of 50×, 30×, 20×, 10× and 5× were obtained by random subsampling. The HRD-LOH, HRD-TAI and HRD-LST scores were determined with the methods of Examples 1-7, and the final HRD score was calculated to evaluate the minimum amount of sequencing. The results are shown in Table 1.

Table 1 Correlation of HRD scores at different sequencing depths

As Table 1 shows, taking the 50× data as the reference, the HRD scores obtained from 30×, 20× and 10× data correlate with those from 50× data with a correlation coefficient above 95%, and the correlation decreases as depth decreases. In this embodiment, 10×, the lowest depth with a correlation above 95%, is therefore used as the minimum depth for low-depth WGS.

Application example 1

A sample from a breast cancer patient was analyzed with the methods of Examples 1-8 of the present application.

Breast cancer patient: name Wu-za, female, aged 46; clinical diagnosis: invasive carcinoma of the left breast; no other medical history.

The patient had an HRD-LOH score of 1, an HRD-TAI score of 4 and an HRD-LST score of 44, giving a total HRD score of 49. This is above the cutoff determined by the methods of Examples 1-8, so the patient was judged HRD positive and expected to respond well to platinum chemotherapy drugs, PARP inhibitors and similar agents. This is consistent with the association between high HRD and sensitivity to platinum treatment reported in the literature. Clinically, the patient received platinum chemotherapy and had a progression-free survival (PFS) of 13 months, demonstrating that the HRD score can be calculated with the methods of Examples 1-8 and used to guide clinical medication.

Application example 2

Low-depth whole-genome sequencing was performed on 115 samples (43 breast cancer and 72 ovarian cancer), all of which had prognostic information for platinum chemotherapy; that is, all 115 patients had been treated with platinum chemotherapy. LOH, TAI and LST were calculated with the methods of Examples 1-8 of the present application to obtain the HRD score.

Using a cutoff of 42, the 115 samples were divided by HRD score into an HRD-High (positive) group of 40 cases and an HRD-Low (negative) group of 75 cases. Survival analysis of the two groups was performed using clinical progression-free survival (PFS), and the results are shown in FIG. 2.

As FIG. 2 shows, the survival time (PFS) of the HRD-High group (40 cases) was significantly longer than that of the HRD-Low group (75 cases), indicating that the HRD-High group is sensitive to platinum treatment, consistent with the observed clinical outcomes.
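The text does not name the survival method. A common choice is the Kaplan-Meier estimator, sketched below together with the split of samples at the cutoff of 42 (scores greater than 42 treated as HRD-High, an assumption consistent with Example 7). A significance test between the two curves, such as the log-rank test, is not shown, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_by_hrd(scores: Sequence[int], cutoff: int = 42) -> Tuple[List[int], List[int]]:
    """Indices of HRD-High (score > cutoff) and HRD-Low samples."""
    high = [i for i, s in enumerate(scores) if s > cutoff]
    low = [i for i, s in enumerate(scores) if s <= cutoff]
    return high, low


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier curve as (time, S(t)) at each distinct time with an observed event.

    events[i] is True if progression was observed at times[i], False if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = removed = 0
        # Tied times: count events and everyone leaving the risk set at t.
        while i < len(data) and data[i][0] == t:
            observed += int(data[i][1])
            removed += 1
            i += 1
        if observed:
            surv *= 1 - observed / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve
```

Running `kaplan_meier` separately on the PFS of the HRD-High and HRD-Low indices yields the two curves compared in such an analysis.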

Deleterious SNVs and indels in the BRCA1/2 genes were detected in the 115 samples. Among the 43 breast cancer samples, 13 were BRCA1/2 mutant and 30 BRCA1/2 wild type; among the 72 ovarian cancer samples, 27 were BRCA1/2 mutant and 45 BRCA1/2 wild type. The distribution of HRD scores by BRCA1/2 status is shown in FIG. 3.

As FIG. 3 shows, most BRCA1/2-mutant samples have high HRD scores and are HRD positive, and such patients are sensitive to platinum drugs. A small proportion of BRCA1/2 wild-type samples are also HRD-High; as FIG. 2 indicates, even BRCA1/2 wild-type patients with a high, HRD-positive score are sensitive to platinum drugs, consistent with the observed outcomes. Therefore, the HRD score calculated with the methods provided in the examples of the present application can guide clinical medication to some extent.

The present embodiments only explain the present application and do not limit it. After reading this specification, those skilled in the art may modify the embodiments as needed without inventive contribution, and all such modifications are protected by patent law within the scope of the claims of the present application.

Claims

1. A method for evaluating an HRD score based on low-depth WGS, comprising the following steps:

processing the low-depth WGS raw sequencing data of a sample to be tested; and any one or more steps selected from the following three steps:

2. The method of claim 1, wherein processing the low-depth WGS raw sequencing data of the sample to be tested specifically comprises:

S1-1: aligning the raw data to a whole-genome reference to obtain a first alignment file;

S1-3: dividing the whole genome, in sequence order, into windows of 100 kbp.

3. The method of claim 2, wherein processing the low-depth WGS raw sequencing data of the sample to be tested further comprises:

S1-4: taking the reads in the second alignment file as the basic unit, counting the number of reads in each window as the read count of that window, denoted RC_i, where i = 1, 2, 3, ... is the index of the window in genome order;

S1-6: calculating the median RC of W_j, denoted RC_j, and the average RC of the whole sample to be tested, denoted RC_p, and correcting RC_i with the following formula to obtain NRC_i, in which the kth window of group j is window

$i = M_1 + M_2 + M_3 + \cdots + M_{j-1} + k$;

S1-7: processing the low-depth WGS raw data of N healthy samples according to steps S1-4, S1-5 and S1-6, calculating the median RC of each window across the N healthy samples, denoted RC_y, and using it as the baseline RC of that window, where N ≥ 30 and y = 1, 2, 3, ...;

S1-8: dividing the NRC_i of each window of the sample to be tested by the corresponding baseline RC_y to obtain DR;

S1-9: segmenting the DR values with the circular binary segmentation algorithm into DR segments, such that DR values within a segment are similar, the mean DR values of adjacent segments differ significantly, and each DR segment contains at least 10 windows.

4. The method of claim 3, wherein processing the low-depth WGS raw sequencing data of the sample to be tested further comprises:


5. The method of claim 4, wherein the method for calculating genomic loss of heterozygosity (LOH) specifically comprises:

S2-3: taking the mean second-allele frequency of the SNP sites in each window as the AF of that window, generating a new AF sequence; if AF is greater than 0, setting AF to 0.5;

S2-5: selecting AF fragments with C_q ≥ 1 and AF equal to 0; a fragment longer than 15 Mb but shorter than the whole chromosome on which it lies is recorded as an LOH event;

S2-6: counting the LOH events in the sample to be tested as the HRD-LOH score.

6. The method according to claim 3, wherein the method for calculating large-scale state transitions (LST) specifically comprises:

7. The method according to claim 2, wherein the method for calculating telomeric allelic imbalance (TAI) specifically comprises:

S3-2: counting the frequencies of the allelic bases at each SNP site in the second alignment file and taking the two highest, the first allele frequency AF1 and the second allele frequency AF2, then calculating the AFR of each site with the following formula; a site without variation has an AFR of 0 and is removed;

S3-3: taking the mean AFR of each window as the AFR of that window, denoted AFR_p; if all AFR values in a window are 0, AFR_p of the window is 0;

8. The method of claim 2, wherein processing the low-depth WGS raw sequencing data of the sample to be tested further comprises S1-0: preprocessing the raw data by removing adapters from the reads.

9. The method of claim 1, wherein the HRD score = HRD-LOH score + HRD-TAI score + HRD-LST score.

10. The method of claim 1, wherein the low-depth WGS is WGS sequenced at a depth of 10× or more.