CN114457143B

CN114457143B - Method for constructing CNV detection library and CNV detection method

Info

Publication number: CN114457143B
Application number: CN202210258037.9A
Authority: CN
Inventors: 郑贝贝; 夏琴; 赵丁丁; 冒燕; 孔令印; 梁波
Original assignee: Suzhou Basecare Medical Device Co ltd
Current assignee: Suzhou Basecare Medical Device Co ltd
Filing date: 2022-03-16
Publication date: 2024-07-09
Anticipated expiration: 2042-03-16

Abstract

The invention discloses a method for constructing a CNV detection library and a CNV detection method. The library construction method comprises: digesting sample genomic DNA with a double endonuclease combination to obtain a digestion product; ligating the digestion product to sequencing adapters to obtain a ligation product; and amplifying the ligation product by PCR to obtain a sequencing library, wherein the double endonuclease combination is a combination of any two of MboI, MseI, BfaI and ApoI-HF. The library construction method enables high-depth sequencing of a representative portion of the whole genome with a small data volume (and hence low sequencing cost), and yields a sufficient number of densely and uniformly distributed SNP sites for CNV analysis. A combined CNV analysis strategy enables detection of multiple CNV types, including chromosomal aneuploidy, duplication, deletion, haploidy, triploidy, polyploidy, LOH, UPD and the like.

Description

Method for constructing CNV detection library and CNV detection method

Technical Field

The invention belongs to the technical field of genetic detection, and relates to a method for constructing a CNV detection library and a CNV detection method.

Background

Reduced-representation genome sequencing is a next-generation sequencing approach that uses restriction digestion, sequence capture or other means to reduce the complexity of a species' genome and sequence only specific regions, thereby obtaining partial genome sequence information. Reduced-representation techniques mainly comprise restriction-enzyme-based sequencing techniques and low-coverage genotyping techniques. Restriction-enzyme-based techniques include reduced-representation library (RRL) sequencing, complexity reduction of polymorphic sequences (CRoPS) sequencing, and restriction-site-associated DNA sequencing (RAD-seq); low-coverage genotyping techniques include genotyping-by-sequencing (GBS) and multiplexed shotgun genotyping (MSG). The most widely used is RAD-seq, which cuts the genome with a restriction enzyme to generate fragments of a certain size, constructs a sequencing library, and performs high-throughput sequencing of the resulting RAD tags. Because RAD tags are short DNA fragments adjacent to specific restriction sites distributed across the whole genome, they represent the sequence features of the entire genome, and sequencing them yields thousands of single nucleotide polymorphism (SNP) markers in most organisms.

Chromosomal abnormalities are an important cause of abnormal embryonic development and spontaneous abortion, and some fetuses that are delivered normally have obvious congenital birth defects, such as intellectual disability, growth and developmental retardation, and multiple organ malformations of unknown cause. Chromosomal abnormalities include numerical and structural abnormalities. Copy number variation (CNV) generally refers to a decrease or increase in the copy number of large genomic segments longer than 1 kb; it mainly presents as submicroscopic duplications and deletions and is an important component of genomic structural variation. The mechanisms that generate CNVs include: (1) non-allelic homologous recombination, which occurs mainly in meiosis and can lead to inversion, duplication and deletion; (2) non-homologous end joining, which may produce some structurally simple copy number variations; and (3) fork stalling and template switching (FoSTeS), a mechanism based on erroneous DNA replication that may produce structurally complex copy number variations. Abnormal CNVs can cause monogenic and rare diseases with Mendelian inheritance and are also associated with complex human diseases.

CNV detection types include aneuploidy, duplication, deletion, mosaicism, loss of heterozygosity (LOH), uniparental disomy (UPD), haploidy, triploidy, polyploidy and the like. Chromosomal aneuploidy, as observed in embryos during in vitro fertilization-embryo transfer (IVF-ET), is an increase or decrease of one or more chromosomes relative to the normal diploid complement. Duplication is the addition of an identical segment to a chromosome, resulting in variation. Deletion is the loss of a segment of a chromosome. Mosaicism (chimerism) refers genetically to an individual composed of a mixture of cells with different genetic characteristics. Loss of heterozygosity (LOH) refers to the situation in which, at a given locus on a pair of homologous chromosomes, one allele carries a (deleterious) mutation and the other is normal, and the corresponding sequence on the normal chromosome is then deleted or mutated for some reason, rendering the locus hemizygous or homozygous. Uniparental disomy (UPD) refers to the replacement of a chromosomal segment from one parent by the homologous segment from the other parent, or to both homologous chromosomes of an individual originating from the same parent. A haploid somatic cell contains one set of chromosomes, and a triploid cell contains three complete sets of chromosomes.

Current techniques for detecting CNV include G-banding karyotype analysis, fluorescence in situ hybridization (FISH), microarray hybridization, and the rapidly developing low-depth whole-genome next-generation sequencing, among others. These techniques provide more accurate, specific and rapid methods for preimplantation testing of embryos, thereby further improving the pregnancy success rate and reducing the incidence of birth defects.

Karyotype analysis is the "gold standard" of cytogenetic diagnosis and can detect numerical chromosome abnormalities and structural abnormalities such as deletions, duplications and inversions of fragments larger than 5 Mb to 10 Mb, but it cannot cover the whole genome, and the origin of an abnormal chromosome fragment is often difficult to determine.

FISH hybridizes fluorescently labeled probes in situ to nucleic acid sequences of the sample under test, and chromosomal abnormalities are then identified by observing the fluorescence signals under a fluorescence microscope. However, overlapping signals readily cause false positives, signal interpretation depends heavily on personal experience, and the number of probes is limited, usually to 5. Although multicolor FISH has made it possible to examine 15 chromosomes, all 46 chromosomes cannot be analyzed.

Chromosome Microarray Analysis (CMA) techniques can provide high resolution and genome-wide detection of chromosome imbalance changes, pinpointing abnormally altered fragments, but CMA chips are costly and do not detect probe-uncovered chromosome segments.

Genomic copy number variation sequencing (CNV-Seq) is a chromosome analysis technique that detects genomic copy number variation by whole-genome sequencing on a high-throughput sequencing platform. It can detect the chromosomal disease types covered by CMA platforms and can also find chromosomal microdeletions and microduplications not covered by array probes, at low cost and with good reproducibility. However, CNV-Seq can only detect conventional deletions and duplications and cannot detect copy number abnormalities such as haploidy, triploidy and LOH.

High-depth whole genome sequencing can detect various types of CNV abnormalities, but has high detection cost and is difficult to clinically apply and popularize.

In summary, how to provide a method with low cost, high accuracy and capability of detecting various types of CNV abnormalities is one of the problems in the field of genetic analysis.

Disclosure of Invention

In view of the shortcomings of the prior art and practical needs, the invention provides a method for constructing a CNV detection library and a CNV detection method. A library construction workflow suited to CNV detection was designed around the relevant technical indicators of CNV detection. With a small data volume (and hence low sequencing cost), it sequences a representative portion of the whole genome at high depth and yields a sufficient number of densely and uniformly distributed SNP sites for CNV analysis, so that CNV detection is low in cost, high in accuracy and able to detect multiple types of CNV abnormality.

In order to achieve the above purpose, the invention adopts the following technical scheme:

In a first aspect, the present invention provides a method of constructing a CNV detection library, the method comprising:

digesting sample genomic DNA with a double endonuclease combination to obtain a digestion product; ligating the digestion product to sequencing adapters to obtain a ligation product; and amplifying the ligation product by PCR to obtain a sequencing library, wherein the double endonuclease combination is a combination of any two of MboI, MseI, BfaI and ApoI-HF.

The invention develops a library construction workflow suited to CNV detection, addressing the detection requirements of multiple CNV types and the uniformity and reproducibility that CNV analysis demands. The double endonuclease combination is chosen so that its recognition sites cover the whole human genome stably and uniformly. The genome is fragmented with this combination, and DNA within a fixed fragment-size range is captured to construct the library. As a result, a representative portion of the whole genome can be sequenced at high depth with a small data volume (and hence low sequencing cost), yielding a sufficient number of densely and uniformly distributed SNP sites for CNV analysis.

Preferably, the double endonuclease combination is a combination of MseI and MboI.

In the invention, the recognition sites must be densely distributed, give high coverage and be uniformly distributed. Libraries constructed with the double endonuclease combination meet the concentration quality-control requirement of more than 6 ng/μL. When the double endonuclease combination is MseI and MboI, CNV fluctuation is smaller, the GC distribution is flatter, coverage is higher, and more SNPs remain after filtering.
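The effect of a double digest on fragment sizes can be explored in silico. The following is a minimal sketch, not the patent's procedure: it assumes the published recognition sequences MboI (^GATC) and MseI (T^TAA), treats digestion as complete, and ignores the length added by adapters. The function names are hypothetical.

```python
import re

# Recognition site and top-strand cut offset within the site.
# Assumed from the enzymes' published specificities: MboI ^GATC, MseI T^TAA.
ENZYMES = {
    "MboI": ("GATC", 0),
    "MseI": ("TTAA", 1),
}

def cut_positions(seq, enzymes=ENZYMES):
    """Return the sorted top-strand cut coordinates for all enzymes."""
    cuts = set()
    for site, offset in enzymes.values():
        # A lookahead pattern also finds overlapping sites.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)

def fragment_lengths(seq, enzymes=ENZYMES):
    """Lengths of the fragments left by a complete double digest."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

def in_size_window(lengths, low=200, high=400):
    """Fragments that fall inside a size-selection window (bp)."""
    return [n for n in lengths if low <= n <= high]
```

Running `fragment_lengths` over a reference chromosome and tallying `in_size_window` would give a rough idea of how much of the genome a given enzyme pair retains; the real yield also depends on adapter length and bead selection behavior.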

Preferably, the method further comprises the step of screening the ligation products.

Preferably, the screening for ligation products comprises:

obtaining fragments of 200 bp to 400 bp from the ligation product by magnetic bead size selection, including but not limited to 201 bp, 202 bp, 203 bp, 205 bp, 210 bp, 220 bp, 230 bp, 250 bp, 260 bp, 280 bp, 300 bp, 320 bp, 350 bp, 360 bp, 370 bp, 380 bp or 390 bp.

Preferably, the method further comprises the step of purifying the library.

Preferably, the purification library comprises:

obtaining fragments of 200 bp to 500 bp from the PCR amplification product by magnetic bead size selection, including but not limited to 201 bp, 202 bp, 203 bp, 205 bp, 210 bp, 220 bp, 230 bp, 250 bp, 260 bp, 280 bp, 300 bp, 320 bp, 350 bp, 360 bp, 370 bp, 380 bp, 450 bp, 460 bp, 470 bp, 480 bp or 490 bp.

In the present invention, the sequencing adapter sequences and PCR amplification primers include, but are not limited to, adapter sequences and primers suitable for Thermo Fisher, Illumina or BGI sequencing platforms.

Preferably, the sequencing adapters comprise a first adapter and a second adapter.

Preferably, the nucleic acid sequences of the first adapter are shown in SEQ ID NO: 1 and SEQ ID NO: 2, and the nucleic acid sequences of the second adapter are shown in SEQ ID NO: 3 and SEQ ID NO: 4.

Preferably, the PCR amplified primers comprise high throughput sequencing platform universal primers.

Preferably, the high throughput sequencing platform universal primers comprise an upstream primer and a downstream primer.

Preferably, the sequence of the upstream primer is shown as SEQ ID NO. 5, and the sequence of the downstream primer is shown as SEQ ID NO. 6.

SEQ ID NO: 1: 5'-GAACGACATGGCTACGATCCGACTTTTAA-3'.

SEQ ID NO: 2: 5'-AAGTCGGATCGTAGCCATGTCGTTC-3'.

SEQ ID NO: 3: 5'-GATCAAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA-3'.

SEQ ID NO: 4: 5'-TTGTCTTCCTAAGACCGCTTGGCCTCCGACTT-3'.

SEQ ID NO: 5: 5'-GAACGACATGGCTACGA-3'.

SEQ ID NO: 6: 5'-TGTGAGCCAAGGAGTTG(barcode)TTGTCTTCCTAAGACCGC-3'.

Wherein, barcode is a sequencing tag sequence.
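The relationship between the listed oligonucleotides can be checked with a few lines of code. The sketch below only tests sequence complementarity and containment using the sequences given above; how the strands are actually annealed and used is not stated in the text, so the interpretation of the unpaired ends is left open. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

SEQ1 = "GAACGACATGGCTACGATCCGACTTTTAA"
SEQ2 = "AAGTCGGATCGTAGCCATGTCGTTC"
SEQ3 = "GATCAAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA"
SEQ4 = "TTGTCTTCCTAAGACCGCTTGGCCTCCGACTT"
SEQ5 = "GAACGACATGGCTACGA"
SEQ6_3PRIME = "TTGTCTTCCTAAGACCGC"  # part of SEQ ID NO: 6 after the barcode

def duplex_report():
    """Summarize how the adapter strands and primers relate to each other."""
    return {
        # Does strand 2 pair with the 5' part of strand 1?
        "adapter1_pairs": SEQ1.startswith(revcomp(SEQ2)),
        "adapter1_unpaired": SEQ1[len(SEQ2):],
        # Does strand 4 pair with the 3' part of strand 3?
        "adapter2_pairs": SEQ3.endswith(revcomp(SEQ4)),
        "adapter2_unpaired": SEQ3[:len(SEQ3) - len(SEQ4)],
        # Do the primers match the adapter sequences?
        "upstream_primer_in_adapter1": SEQ1.startswith(SEQ5),
        "downstream_primer_in_adapter2": SEQ4.startswith(SEQ6_3PRIME),
    }
```

Under these checks, the unpaired bases are TTAA on the first adapter and GATC on the second, which coincide with the MseI and MboI recognition sequences respectively.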

Preferably, the method further comprises the step of determining the concentration of the sequencing library.

As a preferred technical scheme, the method for constructing the CNV detection library comprises the following steps:

(1) Digesting the sample genomic DNA with a double endonuclease combination to obtain a digestion product;

(2) Ligating the digestion product to sequencing adapters to obtain a ligation product;

(3) Selecting fragments of 200 bp to 400 bp from the ligation product by magnetic bead size selection;

(4) Amplifying the fragments selected in step (3) by PCR using high-throughput sequencing platform universal primers;

(5) Selecting fragments of 200 bp to 500 bp from the PCR amplification product by magnetic bead size selection to obtain a sequencing library;

(6) Determining the concentration of the sequencing library.

In a second aspect, the present invention provides a method for detecting CNV for non-disease diagnosis purposes, the method comprising:

constructing a CNV detection library by the method of the first aspect, then sequencing the library and performing CNV detection.

In the invention, the conventional sequencing technology is suitable for the technical scheme of the invention and can be selected according to requirements.

In one embodiment, next-generation sequencing is used. In next-generation sequencing, each base has a corresponding quality value (Q) that measures sequencing accuracy; Q20 and Q30 denote the percentage of bases with a quality value of at least 20 and at least 30, respectively. Better analysis results are obtained when Q20 of the sequencing data is greater than 95% and Q30 is greater than 87%.
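As an illustration of the Q20/Q30 criterion, the sketch below computes both fractions from FASTQ quality strings. It assumes Phred+33 encoding (the text does not specify the encoding), and the function names are hypothetical.

```python
def q_fraction(quality_strings, threshold, offset=33):
    """Fraction of bases whose Phred quality is >= threshold.

    Assumes Phred+33 ASCII encoding of the quality strings.
    """
    total = passed = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passed += 1
    return passed / total if total else 0.0

def passes_qc(quality_strings, q20_min=0.95, q30_min=0.87):
    """Apply the thresholds from the text: Q20 > 95% and Q30 > 87%."""
    return (q_fraction(quality_strings, 20) > q20_min
            and q_fraction(quality_strings, 30) > q30_min)
```

In practice these figures are usually read from the sequencer's or a QC tool's report; the sketch only makes the definition concrete.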

Preferably, the CNV detection includes both conventional CNV detection and special type CNV detection.

Preferably, the conventional CNV detection includes duplication and deletion detection.

Preferably, the duplication and deletion detection includes counting the number of sequencing data reads in a window.

Preferably, the duplication and deletion detection includes counting the number of sequencing reads in each window and plotting a CNV graph, in which the zero line of the ordinate represents a chromosome copy number of 2, line a represents a copy number of 2.8, line b a copy number of 2.2, line c a copy number of 1.8, and line d a copy number of 1.2. A copy number between 1.8 and 2.2 is normal; a region beyond line a, i.e. a copy number greater than 2.8, is a duplication; a region beyond line d, i.e. a copy number less than 1.2, is a deletion; a region between line a and line b is a mosaic duplication; and a region between line c and line d is a mosaic deletion.
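The threshold lines a to d translate directly into a classification rule. The following sketch encodes them; how values falling exactly on a line are handled is not stated in the text, so the boundary treatment here is an assumption, and the labels "mosaic duplication" and "mosaic deletion" stand for the regions between lines a and b and between lines c and d.

```python
def classify_copy_number(cn):
    """Classify an estimated chromosome copy number using lines a-d.

    Lines: a = 2.8, b = 2.2, c = 1.8, d = 1.2.
    Boundary handling (values exactly on a line) is an assumption.
    """
    if cn > 2.8:
        return "duplication"
    if cn > 2.2:
        return "mosaic duplication"
    if cn >= 1.8:
        return "normal"
    if cn >= 1.2:
        return "mosaic deletion"
    return "deletion"
```

For example, a window averaging a copy number of 2.5 would be labeled a mosaic duplication, while 3.0 would be labeled a duplication.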

Preferably, the specific type of CNV assay comprises haploid, diploid, triploid and LOH assays.

Preferably, the haploid, diploid, triploid and LOH detection comprises analysis of B allele frequencies at SNP sites.

Preferably, the haploid, diploid, triploid and LOH detection comprises analyzing the B allele frequency (BAF) of the SNP sites and drawing a BAF scatter plot and a BAF density distribution plot, the scatter plot being reflected in the density plot. A haploid shows a distinct peak near 1 (0.8-1.2); a diploid shows a single, normally distributed peak near 0.5 (0.42, 0.58); a triploid shows distinct peaks near 0.33 (0.25, 0.41) and 0.67 (0.59, 0.75); LOH also shows a single peak near 0.5 (0.42, 0.58), but its BAF scatter plot shows a region that is clearly almost devoid of points, so it can be identified unambiguously.

The invention normalizes the reduced-representation sequencing data with a dedicated data analysis strategy, whose principle is shown in FIG. 1, and thereby detects and analyzes multiple CNV types, specifically:

(1) CNV analysis of whole-genome deletion, duplication and mosaicism is performed on the sequencing data. The principle is to count the reads in each window to determine duplication and deletion: regions where the test sample has more sequences than the control sample tend toward duplication, and regions where it has fewer tend toward deletion. The analysis principle is shown in FIG. 1;

The reads obtained by DNA sequencing are aligned to the human reference genome hg19 to determine the exact chromosomal position of each read. Low-quality, multi-mapped and imperfectly matched reads are then removed to ensure the accuracy of the sequencing data and the uniqueness of each read's position. After alignment, each chromosome is divided into 20 kb windows and the number of uniquely mapped reads in each window is counted. GC content bias is corrected and windows are merged to obtain normalized read counts, which are compared with a reference database to calculate the LogRR value of each window. The LogRR value reflects the difference between the sample and the reference data for that window, i.e. the CNV status of the window. An increase or decrease in reads across 5 consecutive windows is called, so that deletions and duplications of 100 kb or more can be detected. As shown in FIG. 1, the zero line of the ordinate represents a chromosome copy number of 2, line a represents a copy number of 2.8, line b a copy number of 2.2, line c a copy number of 1.8, and line d a copy number of 1.2. A copy number between 1.8 and 2.2 is normal; a region beyond line a, i.e. a copy number greater than 2.8, is a duplication; a region beyond line d, i.e. a copy number less than 1.2, is a deletion; a region between line a and line b is a mosaic duplication; and a region between line c and line d is a mosaic deletion. For example, ellipse A lies in the duplication interval, ellipse B lies in the mosaic duplication interval, and the other chromosomal regions lie in the normal interval;
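The windowed read-depth procedure can be sketched as follows. This is a simplified illustration under stated assumptions: LogRR is taken to be a base-2 log ratio of sample to reference counts, copy number is estimated as 2 x 2^LogRR (consistent with the zero line corresponding to a copy number of 2), and GC correction, window merging and mosaic calls are omitted. All function names are hypothetical.

```python
import math

WINDOW = 20_000  # 20 kb windows, as in the text

def window_counts(read_starts, chrom_length, window=WINDOW):
    """Count uniquely mapped read start positions per fixed-size window."""
    counts = [0] * math.ceil(chrom_length / window)
    for pos in read_starts:
        counts[pos // window] += 1
    return counts

def log_rr(sample, reference):
    """Per-window log2 ratio (assumed definition of LogRR).

    Windows with a zero count on either side are returned as None.
    """
    return [math.log2(s / r) if s > 0 and r > 0 else None
            for s, r in zip(sample, reference)]

def call_runs(logrrs, min_windows=5):
    """Return (start, end_exclusive, label) for runs of at least
    min_windows consecutive windows beyond line a (2.8) or line d (1.2)."""
    def state(value):
        if value is None:
            return None
        copy_number = 2 * 2 ** value  # assumed conversion from LogRR
        if copy_number > 2.8:
            return "duplication"
        if copy_number < 1.2:
            return "deletion"
        return None

    calls, start, current = [], 0, None
    # A trailing None acts as a sentinel that flushes the last run.
    for i, value in enumerate(list(logrrs) + [None]):
        s = state(value)
        if s != current:
            if current is not None and i - start >= min_windows:
                calls.append((start, i, current))
            start, current = i, s
    return calls
```

With 20 kb windows and `min_windows=5`, the smallest callable event spans 100 kb, matching the resolution stated in the text.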

(2) Performing CNV analysis of whole genome haploids, triploids, polyploids and LOH on the sequencing data;

A single nucleotide polymorphism (SNP) is a polymorphism at a site in the chromosomal DNA sequence caused by a single nucleotide change. The B allele frequency (BAF) can be used for CNV analysis of haploidy, triploidy, polyploidy, LOH and the like, on the following principle. For a haploid, one allele is missing and there are no heterozygous SNPs, so BAF is 0 or 1 (reflecting the two possibilities A and B). For a diploid, each autosome has two copies, so BAF in diploid cells is 0, 0.5 or 1 (reflecting the three possibilities AA, AB and BB). For a triploid, BAF is 0, 0.33, 0.67 or 1.0 (reflecting the four possibilities AAA, AAB, ABB and BBB). For LOH, heterozygosity is lost in part of the genome, so BAF in the LOH region is 0 or 1; outside the LOH region, however, the normal diploid regions still have BAF of 0, 0.5 or 1 and make up a large proportion of the genome, so the BAF density distribution of LOH is consistent with that of a normal diploid and the two cannot be distinguished from the density plot alone. Combined analysis with the BAF scatter plot reveals that the LOH region has almost no scattered points. Because of sequencing variability, the BAF of each SNP site is not exactly 0, 0.33, 0.5, 0.67 or 1, so the "0.33 interval" is defined as [0.25, 0.41], the "0.5 interval" as [0.42, 0.58], and the "0.67 interval" as [0.59, 0.75];

After sequencing, the reads are first aligned to the human reference genome hg19, and each chromosome is divided into 20 kb windows. Variants are called and SNPs are filtered to retain SNP sites with high GQ values. The B allele frequency (BAF) of each SNP site is calculated, and a BAF scatter plot and a BAF density distribution plot are drawn from the BAF values using different algorithms. Haploidy, triploidy, polyploidy and LOH are then analyzed by combining the CNV graph, the BAF scatter plot and the BAF density distribution plot. A peak in the BAF density distribution plot indicates that many SNPs have that value, and the BAF scatter plot is reflected in the density plot through data processing. A haploid shows a distinct peak near 1; a diploid shows a single, normally distributed peak near 0.5; a triploid shows distinct peaks near 0.33 and 0.67; LOH also shows a single peak near 0.5, but its BAF scatter plot shows a region that is clearly almost devoid of points, so it can be identified unambiguously.
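The BAF intervals defined above can be turned into a simple binning rule. The sketch below is illustrative only: the intervals come from the text, but the `ploidy_hint` heuristic, its `homozygous_margin` and `min_fraction` parameters, and all function names are assumptions introduced here. It does not attempt LOH detection, which in the text relies on inspecting the positional BAF scatter plot.

```python
from collections import Counter

# Intervals as defined in the text.
INTERVALS = {
    "0.33": (0.25, 0.41),
    "0.5": (0.42, 0.58),
    "0.67": (0.59, 0.75),
}

def baf(a_reads, b_reads):
    """B allele frequency: B-allele reads over total reads at a SNP."""
    total = a_reads + b_reads
    return b_reads / total if total else None

def bin_baf(value):
    """Assign a BAF value to one of the named intervals, else 'other'."""
    for name, (low, high) in INTERVALS.items():
        if low <= value <= high:
            return name
    return "other"

def ploidy_hint(bafs, homozygous_margin=0.1, min_fraction=0.5):
    """Crude genome-wide hint from heterozygous-range BAF values.

    homozygous_margin and min_fraction are hypothetical parameters.
    """
    het = [b for b in bafs if homozygous_margin < b < 1 - homozygous_margin]
    if not het:
        return "haploid-like (no heterozygous SNPs)"
    counts = Counter(bin_baf(b) for b in het)
    if counts["0.5"] / len(het) >= min_fraction:
        return "diploid-like"
    if (counts["0.33"] + counts["0.67"]) / len(het) >= min_fraction:
        return "triploid-like"
    return "undetermined"
```

A density plot over the same BAF values conveys the same information visually; the binning merely makes the interval definitions explicit.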

The selection criteria for SNPs were:

(1) The GC content of the SNP region is 25% to 65%, preferably 35% to 55%;

(2) When detecting CNV types such as haploidy, triploidy, polyploidy and LOH, the total sequencing data volume (total bases) per sample needs to reach 5 G; with paired-end PE100 sequencing, that is, 25 M paired-end reads of raw output per sample, the analysis requirement is met;

(3) The method captures about 600,000 SNPs, of which about 17% reach a sequencing depth of 3X and about 12% reach 5X. SNP sites with a sequencing depth below 3X give data of low reliability and are not recommended for use; SNP sites with a sequencing depth of 3X to 5X give moderately reliable data and can be used for analysis; SNP sites with a sequencing depth above 5X give highly reliable data and accurate analysis results. Some SNP sites reach a sequencing depth of up to 30X, as shown in FIG. 2. SNP sites with a sequencing depth above 5X are therefore preferred for BAF analysis, with SNP sites above 3X used as a supplement;

(4) When detecting CNV types such as haploidy, triploidy, polyploidy and LOH, a region must contain at least 3 usable SNPs, preferably 5 or more; the more usable SNPs, the more accurate the CNV call.
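The selection criteria above can be expressed as a small filter. This is a sketch under assumptions: SNP records are represented as dictionaries with hypothetical keys `depth` and `gc`, and a depth of exactly 5X is placed in the high tier, a boundary the text leaves open.

```python
def depth_tier(depth):
    """Reliability tier by sequencing depth.

    Below 3X: low (not recommended); 3X up to 5X: moderate;
    5X and above: high. The handling of exactly 5X is an assumption.
    """
    if depth < 3:
        return "low"
    if depth < 5:
        return "moderate"
    return "high"

def gc_ok(gc_percent, preferred=False):
    """GC content check: 25-65% in general, 35-55% preferred."""
    low, high = (35, 55) if preferred else (25, 65)
    return low <= gc_percent <= high

def region_callable(snps, min_snps=3):
    """A region needs at least min_snps usable SNPs (3; preferably 5)."""
    usable = [s for s in snps
              if depth_tier(s["depth"]) != "low" and gc_ok(s["gc"])]
    return len(usable) >= min_snps

def total_bases(read_pairs, read_length=100):
    """Total bases for paired-end sequencing (two reads per pair)."""
    return read_pairs * 2 * read_length
```

The data-volume requirement is consistent with the read count given: 25 M read pairs at PE100 give 25,000,000 x 2 x 100 = 5,000,000,000 bases, i.e. 5 G.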

The method for constructing the CNV detection library and the CNV detection method are not used directly for diagnosis, that is, they do not directly yield a diagnostic result. They serve as an auxiliary means of analyzing in vitro samples to obtain intermediate result information, or are used for scientific research and other purposes, such as analyzing animal genomes.

Compared with the prior art, the invention has the following beneficial effects:

(1) The method for constructing the CNV detection library captures genome-wide SNPs for CNV analysis efficiently and specifically, increasing the range of CNV types detected and the accuracy of detection;

(2) Compared with CNV-Seq, the CNV detection method detects not only the aneuploidy, duplication and deletion that CNV-Seq can detect, but also haploidy, triploidy, polyploidy, LOH and the like, which CNV-Seq cannot detect;

(3) Compared with whole-genome sequencing, the CNV detection method sequences a representative portion of the whole genome at high depth while markedly reducing the sequencing cost (by 80%).

Drawings

FIG. 1 is a schematic diagram of a CNV detection system according to the present invention;

FIG. 2 is a graph showing the CNV detection result in example 2 of the present invention;

FIG. 3 is a BAF scatter plot of example 2 of the present invention;

FIG. 4 is a density distribution diagram of BAF in example 2 of the present invention;

FIG. 5 is a graph showing the CNV detection result in example 3 of the present invention;

FIG. 6 is a BAF scatter plot of example 3 of the present invention;

FIG. 7 is a density distribution diagram of BAF in example 3 of the present invention;

FIG. 8 is a graph showing the CNV detection result in example 4 of the present invention;

FIG. 9 is a BAF scatter plot of example 4 of the present invention;

FIG. 10 is a density distribution diagram of BAF in example 4 of the present invention;

FIG. 11 is a BAF scatter plot in example 5 of the present invention;

FIG. 12 is a density distribution diagram of BAF in example 5 of the present invention.

Detailed Description

The technical means adopted by the invention and the effects thereof are further described below with reference to the examples and the attached drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof.

Where specific techniques or conditions are not indicated in the examples, they follow those described in the literature in this field or the product specifications. Reagents and instruments for which no manufacturer is indicated are conventional products commercially available through regular channels.

Example 1

The embodiment constructs a CNV detection library, comprising the following steps:

(1) Sample genomic DNA was extracted with a Tiangen blood/cell/tissue genomic DNA extraction kit. After the DNA passed quality inspection, the restriction digestion was set up: 200 ng of DNA was brought to 17 μL with nuclease-free water, mixed, centrifuged and kept on ice, and the digestion mixture Mix1 was prepared according to Table 1;

TABLE 1

Component	Volume (μL)
Endonuclease buffer (New England Biolabs, cat# B7204)	2
MseI (New England Biolabs, cat# R0525)	0.5
MboI (New England Biolabs, cat# R0147)	0.5

The digestion mixture Mix1 was added to the 17 μL DNA sample, mixed by pipetting and briefly centrifuged; the tube was then immediately placed in a PCR instrument and the following program was run: 37 °C for 20 min, 65 °C for 20 min, hold at 4 °C;

(2) Adapters were added to the ends of the DNA fragments; the adapter mixture Mix2 was prepared according to Table 2;

TABLE 2

5 μL of the adapter mixture Mix2 was added to the DNA digested in step (1), vortexed and briefly centrifuged; the tube was then immediately placed in a PCR instrument and the following program was run: 60 °C for 10 min, hold at 4 °C;

(3) Adapter ligation: the ligase mixture Mix3 was prepared according to Table 3;

TABLE 3

Component	Volume (μL)
Ligase buffer (New England Biolabs, cat# B0202)	3
T4 DNA ligase (New England Biolabs, cat# M0202)	2

The ligase mixture Mix3 was added to the adapter-mixed DNA above, vortexed and briefly centrifuged; the tube was then immediately placed in a PCR instrument and the following program was run: 22 °C for 25 min, 65 °C for 10 min, hold at 4 °C. When the program reached 4 °C, the tube was taken out, briefly centrifuged and placed on ice;

(4) The ligation product was transferred to a 1.5 mL centrifuge tube and brought to 100 μL with 70 μL of nuclease-free water. 60 μL of DNA purification magnetic beads was added and mixed, and the tube was left at room temperature for 5 minutes and then placed on a magnetic rack for 4 minutes until the liquid was clear. The supernatant was transferred to a new 1.5 mL centrifuge tube, 18 μL of DNA purification magnetic beads was added and mixed, and the tube was left at room temperature for 5 minutes and then placed on the magnetic rack until the liquid was clear. The supernatant was removed, the beads were washed with 200 μL of 80% ethanol and dried at room temperature, the DNA was eluted with 22 μL of Low TE, and 20 μL of the solution was transferred to a new 0.2 mL PCR tube;

(5) Library amplification: the PCR reaction mixture Mix4 was prepared according to Table 4;

TABLE 4

To the 20 μL of size-selected sample, 28 μL of the PCR reaction mixture Mix4 was added, followed by specific primer X (2 μL, GENEWIZ). The mixture was vortexed and briefly centrifuged, giving a total volume of 50 μL, and the PCR tube was then placed in a PCR instrument and reacted under the following conditions: 98 °C for 45 seconds; 6 cycles of (98 °C for 15 seconds, 55 °C for 30 seconds, 72 °C for 30 seconds); 72 °C for 1 minute; hold at 4 °C;

(6) Library purification: after the reaction, the tube was centrifuged and the amplification product was transferred to a 1.5 mL centrifuge tube. 50 μL of DNA purification magnetic beads was added and mixed, and the tube was left at room temperature for 5 minutes and then placed on a magnetic rack for 4 minutes until the liquid was clear. The supernatant was discarded and the beads were washed twice with 200 μL of 80% ethanol and dried at 25 °C. The beads were resuspended in 25 μL of Low TE, left at room temperature for 5 minutes and placed on the magnetic rack until the liquid was clear, and 25 μL of the solution was transferred to a new 1.5 mL centrifuge tube;

(7) Library quantification was performed using a Qubit fluorometer.
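The volumes in Example 1 can be cross-checked with simple bookkeeping. The sketch below uses only the volumes stated in the steps and tables above. Reading the 60 μL and 18 μL bead additions as a two-sided size selection, and expressing them as bead-to-sample ratios, is an interpretation introduced here rather than something the text states; the function names are hypothetical.

```python
# Volumes in microlitres, taken from Example 1.
DNA = 17
MIX1 = 2 + 0.5 + 0.5   # Table 1: buffer + MseI + MboI
MIX2 = 5               # adapter mixture added in step (2)
MIX3 = 3 + 2           # Table 3: ligase buffer + T4 DNA ligase
WATER = 70             # nuclease-free water added in step (4)

def ligation_volume():
    """Reaction volume after digestion, adapter addition and ligation."""
    return DNA + MIX1 + MIX2 + MIX3

def bead_ratios(sample_volume, first_beads=60, second_beads=18):
    """Bead-to-sample ratios for the first and the cumulative addition."""
    first = first_beads / sample_volume
    cumulative = (first_beads + second_beads) / sample_volume
    return first, cumulative

def pcr_volume(template=20, mix4=28, primer=2):
    """Total PCR volume in step (5)."""
    return template + mix4 + primer
```

The numbers are self-consistent: the ligation reaction is 30 μL, adding 70 μL of water gives the stated 100 μL, the bead additions correspond to ratios of 0.6 and 0.78 relative to that volume, and the PCR totals the stated 50 μL.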

Example 2

In this example, a library was constructed as described in Example 1 and sequenced on a next-generation sequencing platform, and haploidy was detected from the sequencing data.

The counselee was a 32-year-old woman with a spontaneous abortion and no prior genetic testing. A 5 mL peripheral blood sample was collected into an EDTA anticoagulant blood collection tube, and DNA was extracted with a Tiangen whole blood extraction kit; the sample number was JX1. The experiment was carried out with the library construction method described in Example 1, and the results are shown in FIGS. 2 to 4. The CNV graph shows a CNV result of normal female, and the BAF density distribution plot shows a distinct peak near 1; the result was therefore judged to be haploid.

Example 3

This example detects triploidy.

The counselee was a 33-year-old woman with a spontaneous abortion and no prior genetic testing. A 5 mL peripheral blood sample was collected and stored in an EDTA anticoagulant blood collection tube, and DNA was extracted with a Tiangen whole blood extraction kit. The sample, numbered Y2155, was then processed by the library construction method described in Example 1. The analysis results are shown in Figs. 5-7: the CNV plot shows a CNV result corresponding to a normal female, while the BAF density distribution shows two peaks near 0.33 and 0.66, so the sample was judged to be triploid.
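The reasoning used in Examples 2 and 3 (no heterozygous peak for a haploid, one peak near 0.5 for a diploid, two peaks near 0.33 and 0.66 for a triploid) can be sketched in code. The sketch below is an illustrative assumption, not the analysis pipeline of the invention: the thresholds `het_range`, `min_het_fraction` and `tol` and both function names are hypothetical.

```python
def baf(ref_count, alt_count):
    """B allele frequency of one SNP site from reference and alternate read counts."""
    return alt_count / (ref_count + alt_count)


def classify_ploidy(baf_values, het_range=(0.15, 0.85),
                    min_het_fraction=0.05, tol=0.08):
    """Classify a sample by where its heterozygous BAF values cluster.

    All thresholds are illustrative assumptions.
    """
    if not baf_values:
        raise ValueError("no SNP sites supplied")
    # Sites whose BAF lies well away from 0 and 1 are treated as heterozygous.
    het = [b for b in baf_values if het_range[0] < b < het_range[1]]
    if len(het) < min_het_fraction * len(baf_values):
        return "haploid"  # BAF mass sits only near 0 and 1
    near_half = sum(abs(b - 0.5) < tol for b in het)
    near_thirds = sum(abs(b - 1 / 3) < tol or abs(b - 2 / 3) < tol for b in het)
    return "diploid" if near_half >= near_thirds else "triploid"


example = [0.0] * 10 + [1.0] * 10 + [0.33] * 20 + [0.66] * 20
print(classify_ploidy(example))
```

A genome-wide absence of heterozygous sites can have causes other than haploidy, so a real classifier would need additional evidence; the sketch only mirrors the peak-position argument of the examples.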

Example 4

This example detects loss of heterozygosity (LOH).

The counselee was a 34-year-old woman with a spontaneous abortion and no prior genetic testing. A 5 mL peripheral blood sample was collected and stored in an EDTA anticoagulant blood collection tube, and DNA was extracted with a Tiangen whole blood extraction kit. The sample, numbered YY, was then processed by the library construction method described in Example 1. The analysis results are shown in Figs. 8-10. The CNV result was del(14)(q11.2).seq[GRCh37/hg19](19200001-20640000)×1. The BAF density distribution has one peak near 0.5, which indicates a diploid; in the BAF scatter plot, however, part of the region indicated by the arrow clearly contains no scatter points, so the result can be judged to be LOH.
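The observation that a region of the BAF scatter plot lacks points can be expressed as a window scan. The following sketch is an assumption made for illustration: the window size, the heterozygous BAF range, the minimum SNP count and the function name `loh_windows` are all hypothetical and are not taken from the method described here.

```python
def loh_windows(positions, baf_values, window=1_000_000,
                het_range=(0.2, 0.8), min_snps=5):
    """Return (start, end) windows that hold enough SNPs but no heterozygous BAF value."""
    counts = {}  # window index -> (total SNPs, heterozygous SNPs)
    for pos, b in zip(positions, baf_values):
        idx = pos // window
        total, het = counts.get(idx, (0, 0))
        is_het = het_range[0] < b < het_range[1]
        counts[idx] = (total + 1, het + int(is_het))
    return [(idx * window, (idx + 1) * window)
            for idx, (total, het) in sorted(counts.items())
            if total >= min_snps and het == 0]


positions = [100, 200, 300, 400, 500,
             1_000_100, 1_000_200, 1_000_300, 1_000_400, 1_000_500,
             2_000_100, 2_000_200]
bafs = [0.0, 1.0, 0.5, 0.0, 1.0,
        0.0, 1.0, 0.0, 1.0, 1.0,
        0.0, 0.0]
print(loh_windows(positions, bafs))
```

A single-copy deletion also removes heterozygous points, so such a scan would be read together with the read-count result, as in this example where the deletion call and the BAF scatter plot are combined.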

Example 5

This example compares the detection method of the present invention with whole-genome sequencing (WGS) detection.

Sample V2308 (triploid) was tested by both methods, as shown in Figs. 11 and 12. Low-throughput WGS reported the sample as 46,XY, whereas the detection method of the present invention identified it as triploid. In the BAF (B allele frequency) scatter plots obtained by the two methods (Fig. 11), the method of the present invention yielded significantly more SNP points than low-throughput WGS, so the present invention is superior to low-throughput WGS for triploid detection. The BAF density distributions were also calculated (Fig. 12). Theoretically, the BAF peaks of a triploid lie near 0.33 and 0.66, and the peak of a diploid lies near 0.5. For this sample, the method of the present invention showed two peaks near 0.33 and 0.66, whereas the low-throughput WGS analysis showed a distinct peak near 0.5 and several lower peaks near 0.33, 0.4, 0.66 and 0.7; the present invention therefore gave the correct result.
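The theoretical peak positions quoted above follow from counting allele copies. As a brief worked statement, assuming a site with $n$ chromosome copies of which $k$ carry the B allele (the symbols $n$ and $k$ are introduced here for illustration only):

```latex
\mathrm{BAF} = \frac{k}{n}, \qquad k \in \{0, 1, \dots, n\}
\\[4pt]
n = 1:\ \{0,\ 1\} \qquad
n = 2:\ \{0,\ \tfrac{1}{2},\ 1\} \qquad
n = 3:\ \{0,\ \tfrac{1}{3},\ \tfrac{2}{3},\ 1\}
```

Heterozygous sites correspond to $0 < k < n$, which gives no intermediate peak for a haploid, one peak at 0.5 for a diploid, and peaks at about 0.33 and 0.66 for a triploid.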

In summary, the present invention provides a library construction method suitable for CNV detection. With a small amount of data (low sequencing cost), the method sequences a representative part of the whole genome at high depth and obtains a sufficient number of densely and uniformly distributed SNP sites for CNV analysis. Deletion, duplication and mosaic CNVs are analyzed by counting the sequencing reads in each window, and BAF scatter plots and density distribution plots are obtained by analyzing the B allele frequency of each SNP site and counting the SNPs in different BAF value intervals, so that haploids, triploids, polyploids, LOH and the like can also be analyzed.
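The read-counting part of this strategy can be illustrated as follows. This is a minimal sketch under stated assumptions: comparing against a reference (control) sample, normalizing by the median ratio, and the names `window_counts` and `copy_number_estimates` are all hypothetical choices and are not the algorithm disclosed here.

```python
from collections import Counter
from statistics import median


def window_counts(read_positions, window=1_000_000):
    """Count reads per fixed-size window, keyed by window index."""
    return Counter(pos // window for pos in read_positions)


def copy_number_estimates(sample_counts, reference_counts, ploidy=2):
    """Estimate copy number per window from sample/reference read-count ratios.

    Assumption: most windows are copy-neutral, so the median ratio
    corresponds to the baseline ploidy.
    """
    ratios = {w: sample_counts.get(w, 0) / ref
              for w, ref in reference_counts.items() if ref > 0}
    baseline = median(ratios.values())
    return {w: ploidy * r / baseline for w, r in ratios.items()}


reference = {0: 100, 1: 100, 2: 100, 3: 100, 4: 100}
sample = {0: 50, 1: 50, 2: 25, 3: 50, 4: 50}
print(copy_number_estimates(sample, reference))
```

Because the ratios are normalized to the sample's own median, a uniform change in ploidy leaves every window at the baseline value, which is consistent with Examples 2 and 3, where the read-count result appeared normal and the BAF distribution revealed the ploidy.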

The applicant states that the detailed method of the present invention is illustrated by the above examples, but the present invention is not limited to the detailed method described above, i.e. it does not mean that the present invention must be practiced in dependence upon the detailed method described above. It should be apparent to those skilled in the art that any modification of the present invention, equivalent substitution of raw materials for the product of the present invention, addition of auxiliary components, selection of specific modes, etc., falls within the scope of the present invention and the scope of disclosure.

Sequence listing

<110> Suzhou Basecare Medical Device Co., Ltd.

<120> A method of constructing CNV detection library and CNV detection method

<130> 2022-03-15

<160> 6

<170> PatentIn version 3.3

<210> 1

<211> 29

<212> DNA

<213> Artificial sequence

<400> 1

gaacgacatg gctacgatcc gacttttaa 29

<210> 2

<211> 25

<212> DNA

<213> Artificial sequence

<400> 2

aagtcggatc gtagccatgt cgttc 25

<210> 3

<211> 36

<212> DNA

<213> Artificial sequence

<400> 3

gatcaagtcg gaggccaagc ggtcttagga agacaa 36

<210> 4

<211> 32

<212> DNA

<213> Artificial sequence

<400> 4

ttgtcttcct aagaccgctt ggcctccgac tt 32

<210> 5

<211> 17

<212> DNA

<213> Artificial sequence

<400> 5

gaacgacatg gctacga 17

<210> 6

<211> 35

<212> DNA

<213> Artificial sequence

<400> 6

tgtgagccaa ggagttgttg tcttcctaag accgc 35
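The relationship between the paired strands in the listing (SEQ ID NO. 1 with 2, and SEQ ID NO. 3 with 4) can be checked computationally. The sketch below is for illustration only; the helper names are hypothetical, and it assumes that each pair anneals as a duplex in which the shorter strand pairs fully with one end of the longer strand.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

SEQ = {
    1: "gaacgacatggctacgatccgacttttaa",
    2: "aagtcggatcgtagccatgtcgttc",
    3: "gatcaagtcggaggccaagcggtcttaggaagacaa",
    4: "ttgtcttcctaagaccgcttggcctccgactt",
}


def reverse_complement(seq):
    """Reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def unpaired_overhang(long_strand, short_strand):
    """Single-stranded part of long_strand after annealing, or None if the strands do not pair."""
    paired = reverse_complement(short_strand)
    if long_strand.startswith(paired):
        return long_strand[len(paired):]   # overhang at the 3' end
    if long_strand.endswith(paired):
        return long_strand[:-len(paired)]  # overhang at the 5' end
    return None


print(unpaired_overhang(SEQ[1], SEQ[2]))  # ttaa
print(unpaired_overhang(SEQ[3], SEQ[4]))  # gatc
```

The two unpaired stretches coincide with the recognition sequences of MseI (TTAA) and MboI (GATC), the enzyme pair named in claim 1.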

Claims

1. A method for detecting CNV for non-disease diagnosis purposes, comprising:

performing CNV detection by using a CNV detection library;

wherein the construction method of the CNV detection library comprises the following steps: digesting genomic DNA of a human sample with a double endonuclease combination to obtain a digestion product, and ligating the digestion product to a sequencing adapter to obtain a ligation product; and performing PCR amplification on the ligation product to obtain a sequencing library;

the double endonuclease combination is a combination of MseI and MboI;

the CNV detection comprises counting the number of sequencing reads in a window;

the CNV detection further comprises analyzing B allele frequencies at SNP sites.

2. The CNV detection method according to claim 1, further comprising the step of screening the ligation products;

the screening of the ligation products comprises:

screening the ligation products by a magnetic bead separation method to obtain fragments of 200 bp-400 bp.

3. The CNV detection method according to claim 1, further comprising the step of purifying the library;

the purifying of the library comprises:

screening the PCR amplification products by a magnetic bead sorting method to obtain fragments of 200 bp-500 bp.

4. The CNV detection method according to claim 1, wherein the sequencing adapter comprises a first adapter and a second adapter;

the nucleic acid sequences of the first adapter are shown in SEQ ID NO. 1 and SEQ ID NO. 2, and the nucleic acid sequences of the second adapter are shown in SEQ ID NO. 3 and SEQ ID NO. 4.

5. The CNV detection method according to claim 1, wherein the primers for the PCR amplification comprise high-throughput sequencing platform universal primers.

6. The CNV detection method according to claim 5, wherein the high throughput sequencing platform universal primers comprise an upstream primer and a downstream primer.

7. The method for detecting CNV according to claim 6, wherein the sequence of the upstream primer is shown in SEQ ID NO. 5 and the sequence of the downstream primer is shown in SEQ ID NO. 6.

8. The CNV detection method according to claim 1, further comprising the step of determining the concentration of the sequencing library.

9. The CNV detection method according to any one of claims 1 to 8, comprising the steps of:

(3) Selecting fragments of 200 bp-400 bp from the ligation product by a magnetic bead separation method;

(6) Determining the concentration of the sequencing library.