CN112877403B

CN112877403B - Method for constructing sequencing library of target sequence

Info

Publication number: CN112877403B
Application number: CN201911295838.7A
Authority: CN
Inventors: 王寅; 李林蔚; 柳焱; 孙福明; 王柯; 张媛媛; 茹兰兰
Original assignee: Fujian Herui Gene Technology Co ltd
Current assignee: Fujian Herui Gene Technology Co ltd
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2023-11-03
Anticipated expiration: 2039-11-29
Also published as: CN112877403A; CN110872610A; CN110872610B

Abstract

The present application relates to high throughput nucleic acid sequencing, and more particularly, to methods of constructing sequencing libraries of target sequences and corresponding kits.

Description

Method for constructing sequencing library of target sequence

The present application is a divisional application of the invention patent application No. 201911207127X, filed on November 29, 2019, and entitled "Method for constructing sequencing library of target sequences".

Technical Field

Background

By efficiently enriching target sequences to construct a sequencing library aiming at the target sequences, the sequencing cost can be effectively reduced, and the sequencing depth can be improved. For applications that typically require high depth sequencing, such as somatic mutation detection, target sequence enrichment performance (e.g., capture efficiency) is a major factor in determining the sensitivity and specificity of the method.

The currently common target sequence sequencing method is mainly liquid-phase hybridization capture based on nucleic acid probes. The prior art method may comprise the following main steps: 1) end repair of the fragmented DNA is performed, and the fragmented DNA is ligated to a linker (comprising a sequencer bridging anchor sequence (e.g., the P5/P7 sequences suitable for the Illumina platform), an index sequence, a unique molecular identifier (UMI) sequence (also called a molecular barcode, MBC), etc.) to introduce the UMI, the index sequence and a primer amplification region, and a first round of amplification is performed with primers corresponding to the primer amplification region, thereby constructing a complete whole genome library; 2) the complete whole genome library is hybridized and captured using nucleic acid probes and then subjected to a second round of amplification with the same primers to enrich for the target sequence to be detected, thereby generating a sequencing library for high throughput sequencing of the target sequence.

Alternatively, the prior art method may comprise the following main steps: 1) performing end repair on the fragmented DNA and ligating the fragmented DNA to a truncated adaptor carrying a UMI to introduce the UMI; 2) performing a first round of amplification with primers carrying a sequencer bridging anchor sequence and an index sequence to introduce the sequencer bridging anchor sequence and the index sequence, thereby constructing a complete whole genome library; and 3) hybridizing and capturing the complete whole genome library using nucleic acid probes, followed by a second round of amplification to enrich for the target sequence to be detected, thereby generating a sequencing library for high throughput sequencing of the target sequence. This process is shown, for example, in FIG. 1.

However, such a method has the following drawback: the total length of the adaptors at both ends of the members of the complete whole genome sequencing library (e.g., the length from P5/P7 to UMI shown in FIG. 1) is typically about 60-80 nucleotides. Library members carrying such long adaptors bind to each other (or "overlap") during capture to form complexes, which significantly reduces the proportion of target sequences and thus the capture efficiency.

To solve the problem of reduced capture efficiency due to linker overlap of library members in conventional target sequence sequencing methods, it is common to add a linker blocking agent at the probe hybridization stage during capture, which inhibits linker overlap by employing specially designed and modified DNA that can bind efficiently to the linker sequence (e.g., as shown in fig. 2). However, since the linker blocking agent requires special design and modification and a relatively high concentration to achieve the blocking effect, a relatively high detection cost is generated.

Thus, there is a need in the art for new library construction methods for target sequence sequencing with high capture efficiency and low detection cost.

Disclosure of Invention

The present invention meets the above-described need by providing a novel method of constructing a sequencing library of target sequences. More specifically, the inventors have found that a sequencing library useful for high throughput sequencing of the target sequence can be constructed by first constructing an intermediate library of a specific architecture from nucleic acid fragments, then capturing intermediate library members carrying a target sequence from the intermediate library, and then completing and amplifying the captured intermediate library members with primers. Compared to prior art methods, the method of the present invention maintains or improves capture efficiency without using the linker blocking agent that those methods require, thereby reducing detection costs.

In one aspect, the invention provides a method of constructing a sequencing library of a target sequence comprising:

(a) Adding adaptors to the 5 'and 3' ends of the nucleic acid fragments to obtain adaptor-added nucleic acid fragments; optionally, the nucleic acid fragment is obtained by fragmenting a nucleic acid;

The linker comprises, in sequence in the 5'-3' direction, a common sequence, optionally a single molecule barcode, and optionally a spacer sequence;

(b)

(i) The adaptor-added nucleic acid fragment extends outward from the 5' end and the 3' end of the nucleic acid fragment by nucleotide sequences that are each independently 16-40 nucleotides in length relative to the nucleic acid fragment, and the adaptor-added nucleic acid fragment is used directly for capture without amplification; or

(ii) Amplifying the adaptor-added nucleic acid fragment using a first amplification upstream primer and a first amplification downstream primer to obtain a first amplicon that extends outward from the 5' end and the 3' end of the nucleic acid fragment by nucleotide sequences that are each independently 16-40 nucleotides in length relative to the nucleic acid fragment, wherein a portion or all of the first amplification upstream primer is sufficiently complementary to a portion or all of the common sequence or its complement to effect amplification of the adaptor-added nucleic acid fragment, and a portion or all of the first amplification downstream primer is sufficiently complementary to a portion or all of the common sequence or its complement to effect amplification of the adaptor-added nucleic acid fragment;

(c) Capturing a target nucleic acid fragment from the adaptor-added nucleic acid fragment or the first amplicon without the use of an adaptor blocking agent;

(d) Amplifying the captured target nucleic acid fragment using a second amplification upstream primer and a second amplification downstream primer to obtain a second amplicon as a member of a sequencing library, thereby constructing the sequencing library,

the second amplification upstream primer comprises in sequence in the 5'-3' direction an upstream sequencer bridging anchor sequence, optionally an index sequence, and an upstream sequencing sequence,

the second amplification downstream primer comprises, in sequence in the 5'-3' direction, a downstream sequencer bridging anchor sequence, an optional index sequence, and a downstream sequencing sequence,

the sequence of the common sequence or its complement and the sequence of the first amplification upstream primer are each part or all of the sequence of the upstream sequencing sequence,

the sequence of the common sequence or its complement and the sequence of the first amplification downstream primer are each part or all of the sequence of the downstream sequencing sequence.
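The relationship between the second amplification primer and the shorter sequences used earlier in the method can be sketched as follows. This is only an illustrative model: every sequence is a made-up placeholder, and the helper names (`build_second_primer`, `is_part_or_all`) are hypothetical. It treats "part or all of" as a substring test and also accepts the reverse complement.

```python
# All sequences below are hypothetical placeholders, not sequences from the source.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_second_primer(anchor: str, sequencing: str, index: str = "") -> str:
    """Join the regions in 5'-3' order: anchor, optional index, sequencing sequence."""
    return anchor + index + sequencing


def is_part_or_all(candidate: str, sequencing: str) -> bool:
    """True if the candidate, or its reverse complement, lies within the sequencing sequence."""
    return candidate in sequencing or reverse_complement(candidate) in sequencing


# Placeholder upstream sequencing sequence (21 nt).
upstream_sequencing = "GATTACAGATTACAGATTACA"
# Assumed for illustration: the common sequence and the first amplification
# primer are the 3'-terminal 13 nt and 16 nt of the sequencing sequence.
common_sequence = upstream_sequencing[-13:]
first_upstream_primer = upstream_sequencing[-16:]

second_upstream_primer = build_second_primer(
    anchor="CCCCAAAACCCC",      # placeholder for a sequencer bridging anchor sequence
    sequencing=upstream_sequencing,
    index="TTAGGC",             # placeholder index sequence
)

print(is_part_or_all(common_sequence, upstream_sequencing))
print(is_part_or_all(first_upstream_primer, upstream_sequencing))
print(len(second_upstream_primer))
```

Under these assumptions the short common sequence carried through capture is contained in the longer sequencing sequence, so the full-length functional regions are only introduced by the second amplification.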

In another aspect, the invention provides a kit for constructing a sequencing library of a target sequence, comprising:

(a) A linker capable of being added to the 5 'and 3' ends of the nucleic acid fragment to obtain a nucleic acid fragment to which the linker is added; optionally, the nucleic acid fragment is obtained by fragmenting a nucleic acid;

(b)

(ii) A first amplification upstream primer and a first amplification downstream primer, the first amplification upstream primer and the first amplification downstream primer being capable of amplifying the adaptor-added nucleic acid fragment to obtain a first amplicon, the first amplicon extending outward from the 5' end and the 3' end of the nucleic acid fragment by nucleotide sequences that are each independently 16-40 nucleotides in length relative to the nucleic acid fragment, a portion or all of the first amplification upstream primer being sufficiently complementary to a portion or all of the common sequence or its complement to effect amplification of the adaptor-added nucleic acid fragment, and a portion or all of the first amplification downstream primer being sufficiently complementary to a portion or all of the common sequence or its complement to effect amplification of the adaptor-added nucleic acid fragment;

(c) A capture reagent capable of capturing a target nucleic acid fragment from the adaptor-added nucleic acid fragment or the first amplicon without the use of an adaptor blocking reagent;

(d) A second amplification upstream primer and a second amplification downstream primer, the second amplification upstream primer and the second amplification downstream primer being capable of being used to amplify the captured target nucleic acid fragment to obtain a second amplicon as a member of a sequencing library, thereby constructing the sequencing library,

the sequence of the common sequence or its complement and the sequence of the first amplification downstream primer are each part or all of the sequence of the downstream sequencing sequence;

wherein the kit does not include a linker blocking agent.

In a further aspect, the invention provides the use of a reagent for preparing a kit for constructing a sequencing library of a target sequence, the reagent comprising:

(b)

wherein the agent does not include a linker blocking agent.

Drawings

FIG. 1 is a schematic diagram of a prior art target sequence sequencing method.

FIG. 2 is a schematic representation of the prior art adding a linker blocking agent during the hybridization phase to inhibit linker overlap.

FIG. 3 is a comparison of library construction with the linker of the invention without blocking agent, library construction with the IDT commercial long linker control with blocking agent, and library construction with the IDT commercial long linker control without blocking agent.

FIG. 4 is a comparison of library construction with linkers of the invention of various lengths without blocking agent, library construction with the long linker control without blocking agent, and library construction with the IDT commercial long linker control with blocking agent (CK).

FIG. 5 is a comparison, verified in clinical applications, of library construction with the linker of the invention without blocking agent and library construction with the IDT commercial linker with blocking agent.

FIG. 6 is a comparison of library constructions with linkers of the invention of the same length but different nucleotide composition, without blocking agent.

Detailed Description

These and other aspects, features and advantages will become apparent to those of ordinary skill in the art from a reading of the following detailed description and the appended claims. For the avoidance of doubt, any feature of one aspect of the present invention may be used in any other aspect of the present invention. The word "comprising" is intended to mean "including", but not necessarily "consisting of". In other words, the listed steps or options need not be exhaustive. It should be noted that the examples given in the following description are intended to clarify the invention and are not intended to limit the invention to these examples per se. Similarly, all percentages are weight/weight percentages unless otherwise indicated. Unless otherwise explicitly indicated in the working and comparative examples, all numbers in this description indicating amounts of material or conditions of reaction, physical properties of materials and/or use are to be understood as modified by the word "about". A numerical range expressed in the form "x to y" is understood to include x and y. When a plurality of preferred ranges are described in the form "x to y" for a particular feature, it is to be understood that all ranges combining the different endpoints are also contemplated. In other words, any particular upper value may be combined with any particular lower value within any range of specified values. Finally, references to an element by the indefinite article "a/an" do not exclude the possibility that more than one element is present, unless the context clearly requires that there be only one element. Thus, the indefinite article "a/an" generally means "at least one".

Where features relating to particular aspects of the invention (e.g. methods of the invention) are disclosed, such disclosure is also deemed applicable to any other aspect of the invention (e.g. kits and uses of the invention) mutatis mutandis.


More specifically, the invention provides a method of constructing a sequencing library of a target sequence. The method comprises the following steps.

First, adaptors are added to the 5' end and the 3' end of the nucleic acid fragment to obtain an adaptor-added nucleic acid fragment.

In some embodiments, the nucleic acid fragment may be obtained by fragmenting a nucleic acid. Enzymes useful for fragmentation of nucleic acids are known in the art. The nucleic acid may be free nucleic acid, e.g., from a bodily fluid such as blood, lymph, joint synovial fluid, cerebrospinal fluid, and the like. The nucleic acid may also be genomic nucleic acid extracted, for example by lysing the cells, from cells derived from tissue, for example healthy tissue or diseased tissue such as a tumour. For example, the nucleic acid may be genomic DNA, mitochondrial DNA, long fragment PCR products, long fragment chromatin co-immunoprecipitated DNA, RNA reverse transcription product cDNA, or circulating tumor DNA (ctDNA). Enzymes useful for lysing cells are known in the art and may be proteinase K or other proteases or mixtures thereof. The length of the nucleic acid fragment may be in a range having endpoints selected from the group consisting of: 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, and 600 nucleotides. For example, in some embodiments, the nucleic acid fragment may be in the range of 100-600 nucleotides in length.

As used herein, "adaptor" refers to a sequence that is single-stranded, double-stranded or partially double-stranded (e.g., Y-type) and is added to a nucleic acid fragment to be sequenced, e.g., by ligation, such as T-A ligation, or by insertion, such as by transposase transposition, for subsequent library construction. Typically, the preparation of a library requires the addition of adaptors to the ends of the nucleic acid fragments so that the nucleic acid fragments can be sequenced by a sequencer. When adaptors are added to the nucleic acid fragments, adaptors are typically added randomly to the 5' and/or 3' ends of the nucleic acid fragments.

The linker may be a single-stranded nucleic acid molecule having a functional end, which may be added in a manner that varies depending on the characteristics of the nucleic acid fragment, including 1) ligating the linker to a nucleic acid fragment having a pre-existing overhang, e.g., ligating a linker having a T-overhang at the 3' end to a nucleic acid fragment having an A-overhang at the 3' end through complementarity of the T-A overhangs; and 2) ligating the adaptor to a blunt-ended double-stranded nucleic acid fragment or a single-stranded nucleic acid fragment by a specific functional modification. The complementary strand of a single-stranded adaptor ligated to the nucleic acid fragment may be filled in by complementary synthesis, thereby yielding the corresponding double-stranded adaptor.

The adaptor may also be a double stranded nucleic acid molecule, e.g.a double stranded nucleic acid molecule having functional ends formed by annealing or the like of two single stranded nucleic acid molecules which may be fully or partially complementary to each other. The double-stranded adaptor may have a single-stranded portion, such as a Y-adaptor formed by annealing two partially complementary single-stranded nucleic acid molecules, having one double-stranded portion and two single-stranded portions.

The adaptor may also be a nucleic acid molecule which is single-stranded in an initial state, in which a special base such as uracil is introduced into the single strand to form a closed structure such as a hairpin, and after both ends of the single strand are added to one end of the double-stranded nucleic acid fragment, the special base is degraded by an enzyme such as glycosylase to open the closed structure, thereby being double-stranded in a final state.

In some embodiments, the linker may be a single-stranded linker or a double-stranded linker. In some embodiments, the double-stranded adaptor may be a double-stranded Y-adaptor.

In some embodiments, the length of the linker may be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides. As used herein, the "length" of a linker refers to the length of the single strand when the linker is single-stranded, and to the length of the longer of the two strands when the linker is (ultimately) double-stranded. For example, in some embodiments, the linker may be a double-stranded linker in which one strand is 16, 20, 24, 31, 34, 37, or 40 nucleotides in length and the other strand is 15, 19, 23, 30, 33, 36, or 39 nucleotides in length; in that case the linker is considered to be 16, 20, 24, 31, 34, 37, or 40 nucleotides in length.
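The length convention just defined can be written as a small helper. This is a minimal sketch; the function name `linker_length` and the placeholder strands are illustrative assumptions, not part of the source.

```python
from typing import Sequence


def linker_length(strands: Sequence[str]) -> int:
    """Length of a linker under the convention above.

    A single-stranded linker is passed as one strand and a double-stranded
    linker as two strands; the length is that of the longer strand.
    """
    if len(strands) not in (1, 2):
        raise ValueError("a linker has one or two strands")
    return max(len(s) for s in strands)


# Placeholder strands: "N" stands for any nucleotide.
long_strand = "N" * 20
short_strand = "N" * 19

print(linker_length([long_strand, short_strand]))  # double-stranded: longer strand counts
print(linker_length([long_strand]))                # single-stranded
```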

In some embodiments, the length of the linker may be in a range having endpoints selected from the group consisting of: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, and 50 nucleotides. For example, in some embodiments, the length of the linker may be in the range of 16-40 nucleotides, 16-37 nucleotides, 16-34 nucleotides, 20-37 nucleotides, 20-34 nucleotides, 20-31 nucleotides, 20-24 nucleotides, or 24-31 nucleotides. For example, in some embodiments, the length of the linker may be 16, 20, 24, 31, 34, 37, or 40 nucleotides. In some embodiments, the linker may be a double-stranded linker, and the length of one strand of the linker may differ from the length of the other strand of the linker by 1, 2, 3, 4, or 5 nucleotides, preferably 1 nucleotide. In some embodiments, the linker may be a double-stranded linker, and the two strands of the linker may differ in length by the length of the spacer sequence carried by one strand, such as T, i.e., by 1 nucleotide. For example, in some embodiments, the linker may be a double-stranded linker, wherein one strand may be 16, 20, 24, 31, 34, 37, or 40 nucleotides in length, and the other strand may be 15, 19, 23, 30, 33, 36, or 39 nucleotides in length.

In some embodiments, the linker may comprise, or consist of, in sequence in the 5'-3' direction, a common sequence, optionally a single molecule barcode, and optionally a spacer sequence. In some preferred embodiments, the linker may be free of the index sequence and the sequencer bridging anchor sequence.

As used herein, "common sequence" refers to a sequence on the linker to which primers used in a subsequent step bind specifically to effect amplification (e.g., PCR).

As used herein, "single molecule barcode (single molecular barcode, SMB)" refers to a unique nucleotide sequence that can be located on a linker to add to a nucleic acid fragment to be sequenced, thereby uniquely labeling the nucleic acid fragment. One nucleic acid fragment may, after addition of a linker, carry one or two or more single molecule barcodes, which may be the same or different.

In some embodiments, there may be multiple nucleic acid fragments from a single sample or multiple nucleic acid fragments from multiple samples, with corresponding generation of multiple adaptor-added nucleic acid fragments. Such multiple adaptor-added nucleic acid fragments may be mixed together for subsequent processing. The multiple nucleic acid fragments can be distinguished from each other by the addition of a single molecule barcode.

After sequencing data are obtained, the corresponding nucleic acid fragment, or the sample source, can be uniquely identified by identifying the single molecule barcode. For example, in processing nucleic acid samples from two patients, the nucleic acid samples are labeled separately with two sets of single molecule barcodes and then mixed to construct a sequencing library, a single high throughput sequencing run is performed, and after sequencing data are obtained, the two nucleic acid samples are distinguished by identifying the two sets of single molecule barcodes. In some embodiments, the single molecule barcode sequences may be divided into single molecule barcode groups according to the base uniformity principle, with any single molecule barcode in each group being different from any single molecule barcode in any other group. Single molecule barcodes are sometimes also referred to in the art as unique molecular identifiers (UMI) or molecular barcodes (MBC).
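The two-patient example can be sketched as a lookup from barcode to sample. This is an illustration under stated assumptions: the barcode is taken to occupy the first bases of each read, the barcode sets and reads are made up, and `assign_samples` is a hypothetical helper name. The check that no barcode is shared between groups mirrors the requirement that any barcode in one group differs from any barcode in another.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def assign_samples(
    reads: Iterable[str],
    barcode_groups: Dict[str, Set[str]],
    barcode_len: int,
) -> Dict[str, List[str]]:
    """Group reads by the sample whose barcode set contains the read's barcode.

    Assumption for illustration: the barcode is the first `barcode_len`
    bases of the read. Reads with an unknown barcode go to "unassigned".
    """
    lookup: Dict[str, str] = {}
    for sample, barcodes in barcode_groups.items():
        for barcode in barcodes:
            if barcode in lookup:
                raise ValueError(f"barcode {barcode} is shared between groups")
            lookup[barcode] = sample

    by_sample: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        sample = lookup.get(read[:barcode_len], "unassigned")
        by_sample[sample].append(read)
    return dict(by_sample)


# Made-up barcode groups and reads.
groups = {
    "patient_1": {"ACGTAC", "TGCATG"},
    "patient_2": {"GGATCC"},
}
reads = ["ACGTACTTTTGGGG", "GGATCCTAAACG", "TGCATGCCCCAAAA", "AAAAAATTTTTT"]

result = assign_samples(reads, groups, barcode_len=6)
for sample, sample_reads in sorted(result.items()):
    print(sample, len(sample_reads))
```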

As used herein, a "spacer" sequence refers to a sequence between two nucleotide functional regions. In some embodiments, a spacer sequence may be present. In some embodiments, the spacer sequence may not be present. The spacer sequence may be 1, 2, 3, 4, 5, 6, 7 or 8 nucleotides in length. For example, the spacer sequence may be T. In some embodiments, spacer sequences are used for the linkage between the linker and the nucleic acid fragment, e.g., T-A linkage.

As used herein, an "index (index) sequence" is a functional sequence known in the art for use in second generation sequencing, often to rely on its sequence to identify the source of a sample. For example, in processing nucleic acid samples from two patients, the nucleic acid samples are labeled with two index sequences, respectively, and then mixed to construct a sequencing library, one high throughput sequencing is performed, and after sequencing data is obtained, the two nucleic acid samples are distinguished by identifying the two index sequences.

As used herein, a "sequencer bridging anchor sequence" is a functional sequence known in the art for use in second generation sequencing that anchors to the surface of a flow cell of a sequencer for bridge amplification. For example, for the commonly used Illumina platform, the sequencer bridging anchor sequences are commonly referred to as the P5/P7 sequences, the specific sequences of which are well known in the art. The method of the invention can be applied to various second-generation high-throughput sequencing platforms, such as the HiSeq/MiSeq/MiniSeq/MySeq/NovaSeq sequencing platforms from Illumina, the PGM/Proton sequencing platforms from Thermo Fisher, etc.

In some embodiments, the length of the single molecule barcode may be sufficient to achieve unique identification of the nucleic acid fragment. In some embodiments, the length of the single molecule barcode may be within a range having endpoints selected from the group consisting of: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14 nucleotides. For example, in some embodiments, the length of a single molecule barcode may be in the range of 2-8 nucleotides, 2-7 nucleotides, 6-8 nucleotides, or 6-7 nucleotides, such as 2, 6, 7, or 8 nucleotides.
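To give a sense of how barcode length relates to "sufficient to achieve unique identification", the following sketch computes the theoretical upper bound on distinct barcodes of a given length. It assumes all four bases are allowed at every position, which a practical barcode set (for example one filtered for base uniformity) would not reach; the function names are illustrative.

```python
def barcode_space(length: int) -> int:
    """Upper bound on distinct barcodes of the given length (4 bases per position)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length


def paired_barcode_space(length: int) -> int:
    """Upper bound when a fragment carries one barcode at each end."""
    return barcode_space(length) ** 2


for n in (2, 6, 7, 8):
    print(n, barcode_space(n), paired_barcode_space(n))
```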

In some embodiments, the length of the common sequence may be sufficient to effect subsequent amplification. In some embodiments, the length of the common sequence may be within a range having endpoints selected from the group consisting of: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, and 40 nucleotides. For example, in some embodiments, the length of the common sequence may be in the range of 13-31 nucleotides, 13-28 nucleotides, 13-25 nucleotides, 13-23 nucleotides, 13-17 nucleotides, or 17-23 nucleotides, e.g., 13, 17, 23, 25, 28, or 31 nucleotides.

The lengths of the single molecule barcode (if any) and the common sequence in a linker of a given length can be tailored for various purposes, provided that each is long enough to perform its function. For example, a linker of 20 nucleotides may have a single molecule barcode of 6 nucleotides, a common sequence of 13 nucleotides, and a spacer sequence of 1 nucleotide (e.g., T); alternatively, when fewer nucleic acid fragments are to be measured, the linker may have a single molecule barcode of 4 nucleotides, a common sequence of 15 nucleotides, and a spacer sequence of 1 nucleotide (e.g., T).
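The two 20-nucleotide layouts described above can be checked with simple arithmetic. The class name `LinkerLayout` and its fields are illustrative assumptions; only the numbers come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkerLayout:
    """Region lengths of a linker in 5'-3' order: common sequence, barcode, spacer."""
    common_len: int
    barcode_len: int = 0   # optional single molecule barcode
    spacer_len: int = 0    # optional spacer sequence

    @property
    def total(self) -> int:
        """Total linker length in nucleotides."""
        return self.common_len + self.barcode_len + self.spacer_len


# The two layouts given in the text for a 20-nucleotide linker.
layout_a = LinkerLayout(common_len=13, barcode_len=6, spacer_len=1)
layout_b = LinkerLayout(common_len=15, barcode_len=4, spacer_len=1)

print(layout_a.total, layout_b.total)
```

Shortening the barcode by two nucleotides frees two nucleotides for the common sequence while the linker length stays the same.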

Next, after the adaptor-added nucleic acid fragment is prepared, preparations are made for capturing the target nucleic acid fragment. Because the samples from which the nucleic acid fragments to be sequenced are obtained vary, the preparation work varies accordingly.

In some cases, the concentration of nucleic acid fragments in the sample is relatively high, so the adaptor-added nucleic acid fragments can be used directly for capture without amplification. In such a case, the adaptor-added nucleic acid fragments used directly for capture extend outward from the 5' end and the 3' end of the nucleic acid fragments by nucleotide sequences that are each independently 16 to 40 nucleotides in length relative to the nucleic acid fragments. If there are multiple nucleic acid fragments, there are correspondingly multiple adaptor-added nucleic acid fragments. The one or more adaptor-added nucleic acid fragments serve as an intermediate library or pre-library. One or more pre-libraries prepared separately from one or more samples may be mixed together for capture or may be captured separately from each other.

In other cases, the concentration of nucleic acid fragments in the sample is low, and the adaptor-added nucleic acid fragments therefore need to be amplified so that subsequent capture can be performed effectively. In such a case, the adaptor-added nucleic acid fragment is amplified using the first amplification upstream primer and the first amplification downstream primer to obtain first amplicons that extend outward from the 5' end and the 3' end of the nucleic acid fragment by nucleotide sequences that are each independently 16 to 40 nucleotides in length relative to the nucleic acid fragment. If multiple nucleic acid fragments are present, then multiple first amplicons are correspondingly present. The one or more first amplicons serve as an intermediate library or pre-library. One or more pre-libraries prepared separately from one or more samples may be mixed together for capture or may be captured separately from each other.

In some embodiments, the nucleotide sequences by which the adaptor-added nucleic acid fragment or the first amplicon extends outward from the 5' end and the 3' end of the nucleic acid fragment, relative to the nucleic acid fragment, may each independently be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length. In some embodiments, the lengths of these nucleotide sequences may each independently be within a range having endpoints selected from the group consisting of: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, and 50 nucleotides. For example, in some embodiments, the lengths of these nucleotide sequences may each independently be in the range of 16-40 nucleotides, 16-37 nucleotides, 16-34 nucleotides, 20-37 nucleotides, 20-34 nucleotides, 20-31 nucleotides, 20-24 nucleotides, or 24-31 nucleotides. For example, in some embodiments, the lengths of these nucleotide sequences may each independently be 16, 20, 24, 31, 34, 37, or 40 nucleotides.
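The "each independently 16-40 nucleotides" condition on the two flanking extensions can be expressed as a simple check. This is a sketch; the function name `extensions_in_range` and the default bounds as parameters are illustrative choices.

```python
def extensions_in_range(ext_5p: int, ext_3p: int, low: int = 16, high: int = 40) -> bool:
    """True if both flanking extensions fall within [low, high] nucleotides.

    The two ends are checked independently, so they need not be equal.
    """
    return low <= ext_5p <= high and low <= ext_3p <= high


print(extensions_in_range(20, 20))  # both ends 20 nt
print(extensions_in_range(16, 40))  # different lengths, both at the range limits
print(extensions_in_range(15, 20))  # 5' extension too short
print(extensions_in_range(20, 41))  # 3' extension too long
```

Narrower preferred ranges from the text, such as 20-31 nucleotides, can be checked by passing `low=20, high=31`.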

Part or all of the common sequence derived from the adaptor, or of its complementary sequence, contained in the adaptor-added nucleic acid fragment may constitute the binding site for the first amplification upstream/downstream primer pair used to amplify the adaptor-added nucleic acid fragment.

In some embodiments, part or all of the sequence of the first amplification upstream primer may be sufficiently complementary to part or all of the common sequence or its complement to effect amplification of the adaptor-added nucleic acid fragment. In some embodiments, part or all of the sequence of the first amplification downstream primer may be sufficiently complementary to part or all of the common sequence or its complement to effect amplification of the adaptor-added nucleic acid fragment.

In some embodiments, the first amplification upstream primer and the first amplification downstream primer may be of sufficient length to bind to the common sequence or its complement and amplify the adaptor-added nucleic acid fragment. In some embodiments, the length of the first amplification upstream primer and the first amplification downstream primer may each independently be within a range having endpoints selected from the group consisting of: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, and 40 nucleotides. In some embodiments, the first amplification upstream primer and the first amplification downstream primer can each independently be 14-28 nucleotides, 14-26 nucleotides, 16-24 nucleotides, or 16-22 nucleotides, e.g., 16 nucleotides or 22 nucleotides in length.

In some embodiments, the first amplification upstream primer may be free of the index sequence and the sequencer bridging anchor sequence. In some embodiments, the first amplification downstream primer may be free of the index sequence and the sequencer bridging anchor sequence.

In some embodiments, the 5' end and/or the 3' end of the first amplification upstream primer may be flush or non-flush with the 5' end and/or the 3' end of the adaptor. In some embodiments, the 5' end and/or the 3' end of the first amplification downstream primer may be flush or non-flush with the 5' end and/or the 3' end of the adaptor. "Flush" means that the two sequences are aligned at the ends, with neither having protruding nucleotides. "Non-flush" means that the two sequences are not aligned at the ends, with one of them having overhanging nucleotides.

In some embodiments, the 5' end and/or 3' end of the first amplification upstream primer may differ from the 5' end and/or 3' end of the adaptor by 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides. In some embodiments, the 5' end and/or 3' end of the first amplification downstream primer may differ from the 5' end and/or 3' end of the adaptor by 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides. This means that the two sequences are not aligned at the ends, one of them having an overhang of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides.
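As a hedged illustration of the flush and non-flush cases, the sketch below treats the adaptor as a coordinate axis and reports by how many nucleotides a primer's footprint differs from each adaptor end. The function names, the 0-based `primer_start` coordinate, and the example lengths are assumptions made for illustration only; they are not taken from the sequence listing.

```python
from typing import Tuple


def end_offsets(adaptor_len: int, primer_start: int, primer_len: int) -> Tuple[int, int]:
    """Offsets (in nucleotides) between the primer footprint and the two adaptor ends.

    primer_start is a hypothetical 0-based position on the adaptor where the
    primer footprint begins. An offset of 0 means the ends are flush; a positive
    offset means the adaptor protrudes; a negative offset means the primer does.
    """
    left = primer_start
    right = adaptor_len - (primer_start + primer_len)
    return left, right


def is_flush(offset: int) -> bool:
    """True when the two ends are aligned with no protruding nucleotides."""
    return offset == 0


# Hypothetical example: a 22-nt primer covering the last 22 positions of a 31-nt adaptor.
left, right = end_offsets(adaptor_len=31, primer_start=9, primer_len=22)
print(left, right, is_flush(left), is_flush(right))
```

In this made-up case one end is non-flush by 9 nucleotides, which lies within the 1-10 nucleotide difference described above, and the other end is flush.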

The amplification may be PCR amplification, which is well known in the art. PCR amplification may be performed using one or more thermostable polymerases. The thermostable polymerase may be selected from: LA-Taq, rTaq, Phusion, Deep Vent (exo-), Gold 360, Platinum Taq, KAPA 2G Robust, and Q5 polymerase.

Third, the target nucleic acid fragment is captured from the adaptor-added nucleic acid fragment or the first amplicon without the use of an adaptor blocking reagent.

As previously mentioned, prior art methods typically first construct a complete whole genome library, i.e., a library whose members already contain the functional region sequences required for high throughput sequencing on a sequencer (e.g., sequencer bridging anchor sequences, index sequences, unique molecular identifiers, etc.), and then hybridize and capture this complete whole genome library using nucleic acid probes to construct a capture library for high throughput sequencing. However, because the adaptor ligated to the nucleic acid fragment to be sequenced must carry these functional region sequences, the adaptor is generally long. As a result, library members overlap with one another and capture efficiency is reduced, so that relatively expensive adaptor blocking agents are required to reduce the overlap of library members and improve capture efficiency.

In contrast, the method of the invention is to construct an intermediate library from nucleic acid fragments and then capture intermediate library members carrying the target sequence (i.e., target nucleic acid fragments) from the intermediate library. Members of the intermediate library may be free of functional region sequences required for sequencing, e.g., may be free of sequencer bridging anchor sequences and/or index sequences, and thus may be of shorter length. Thus, little or no overlap occurs when capturing intermediate library members, thereby eliminating the need for linker blocking agents of the prior art.

Techniques for capturing target nucleic acid sequences are well known in the art and may be performed, for example, using nucleic acid probes.

Finally, the intermediate library members are complemented and amplified using primers to construct a sequencing library for high throughput sequencing on a sequencer. The target nucleic acid fragment is amplified using the second amplification upstream primer and the second amplification downstream primer to obtain a second amplicon as a member of the sequencing library, thereby constructing the sequencing library. The second amplicon has the functional region sequence required for high throughput sequencing of the sequencer.

In some embodiments, the second amplification upstream primer may comprise, in sequence in the 5'-3' direction, an upstream sequencer bridging anchor sequence, an optional index sequence, and an upstream sequencing sequence; or consist of them.

In some embodiments, the second amplification downstream primer may comprise, in sequence in the 5'-3' direction, a downstream sequencer bridging anchor sequence, an optional index sequence, and a downstream sequencing sequence; or consist of them.
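The 5'-3' arrangement of the second amplification primers can be sketched as a simple concatenation. This is a minimal illustration under stated assumptions: `build_second_primer` is a hypothetical helper, and the anchor, index, and sequencing sequences below are short placeholders, not real platform sequences or sequences from the sequence listing.

```python
from typing import Dict, Optional


def build_second_primer(anchor: str, sequencing_seq: str, index: Optional[str] = None) -> str:
    """Concatenate, in the 5'-3' direction: bridging anchor, optional index, sequencing sequence."""
    return anchor + (index or "") + sequencing_seq


# Placeholder sequences for illustration only.
ANCHOR_UP = "AAAACCCC"
SEQ_UP = "GGGGTTTT"

# Different intermediate libraries may receive primers carrying different index sequences.
indexes: Dict[str, str] = {"library_1": "ACGT", "library_2": "TGCA"}
primers = {name: build_second_primer(ANCHOR_UP, SEQ_UP, idx) for name, idx in indexes.items()}

print(build_second_primer(ANCHOR_UP, SEQ_UP))  # primer without an index sequence
print(primers["library_1"])
```

Leaving `index` unset corresponds to the embodiments in which the index sequence is absent.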

The method of the invention can be applied to various second-generation high-throughput sequencing platforms, such as the HiSeq/MiSeq/MiniSeq/MySeq/NovaSeq sequencing platforms from Illumina, the PGM/Proton sequencing platforms from Thermo Fisher, etc. Thus, sequencer bridging anchor sequences suitable for the various platforms may be employed. For example, the P5/P7 sequences suitable for the Illumina platform may be employed.

In some embodiments, the index sequence may or may not be present. This may be determined based on the type and/or number of samples.

In some embodiments, the index sequences may be the same or different. This may be determined based on the type and/or number of samples.

In some embodiments, the amplification may be performed for different intermediate libraries using a second amplification upstream primer and/or a second amplification downstream primer that contains different index sequences.

In some embodiments, the sequence of the public sequence or its complement and the sequence of the first amplification upstream primer may each be part or all of the sequence of the upstream sequencing sequence. In other words, the sequence of the common sequence or its complement and the sequence of the first amplification upstream primer may each be identical to part or all of the upstream sequencing sequence.

In some embodiments, the sequence of the public sequence or its complement and the sequence of the first amplification downstream primer may each be part or all of the sequence of the downstream sequencing sequence. In other words, the sequence of the common sequence or its complement and the sequence of the first amplification downstream primer may each be identical to part or all of the downstream sequencing sequence.

Compared with prior art methods, the methods of the present invention successfully construct sequencing libraries with maintained or improved capture efficiency without using the linker blocking agents that prior art methods require, thereby reducing detection costs.

As used herein, the term "capture efficiency" is intended to encompass a number of aspects of the evaluation of capture, including, but not limited to, the following parameters:

(a) On-target rate (target rate): the ratio of the number of bases mapped within the target region to the total number of bases in the raw valid sequencing output data. A higher value indicates more efficient capture of the target sequences and more usable data for analysis.

(b) Coverage: the ratio of the number of bases in the target region with a sequencing depth greater than 0X to the number of bases in the preset target region. A higher value indicates more complete capture across the target region and more valid data for analysis.

(c) Redundancy: the ratio of the number of duplicated molecules to the total number of molecules, where duplicates are multiple copies, in the sequencing data, of molecules with identical start points, end points, and sequences. For a unit mass of DNA, the lower the on-target rate, the lower the proportion of on-target molecules, and the more easily they are depleted or lost from the sequencing data; in other words, the molecular diversity of the on-target portion decreases, which affects redundancy.

(d) Stability: the standard deviation between replicates, used to measure the degree of dispersion. Stability is expressed herein as a positive error bar, with a shorter positive error bar indicating better stability.
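The four parameters above can be written as simple ratios and a standard deviation. The following is a sketch under stated assumptions: the function names are hypothetical, the input counts are made-up numbers rather than data from the examples, and the use of the sample standard deviation (rather than the population form) is an assumption, since the text does not specify which is meant.

```python
from statistics import stdev
from typing import Sequence


def on_target_rate(on_target_bases: int, total_bases: int) -> float:
    """(a) Bases mapped within the target region / all bases in the raw valid data."""
    return on_target_bases / total_bases


def coverage(covered_bases: int, target_size: int) -> float:
    """(b) Target bases with sequencing depth > 0X / bases in the preset target region."""
    return covered_bases / target_size


def redundancy(duplicate_molecules: int, total_molecules: int) -> float:
    """(c) Duplicated molecules / total molecules."""
    return duplicate_molecules / total_molecules


def stability(replicates: Sequence[float]) -> float:
    """(d) Standard deviation between replicates (sample SD, by assumption)."""
    return stdev(replicates)


# Hypothetical numbers for two replicates.
rates = [on_target_rate(600, 1000), on_target_rate(620, 1000)]
print(rates, stability(rates))
print(coverage(399_000, 400_000), redundancy(25, 100))
```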

(b)

wherein the kit does not include a linker blocking agent.

(b)

wherein the agent does not include a linker blocking agent.

The preferred aspects in the context of the method according to the invention also apply mutatis mutandis to the preferred aspects in the context of the kit and use according to the invention.

The invention provides the following items:

item 1. A method of constructing a sequencing library of a target sequence, comprising:

(b)

The method of item 1, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other for a nucleotide sequence of 16-37 nucleotides in length.

The method of item 1, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other by a nucleotide sequence of 16-34 nucleotides in length.

The method of item 1, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other for a nucleotide sequence of 20-37 nucleotides in length.

The method of item 1, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other for a nucleotide sequence of 20-34 nucleotides in length.

The method of item 1, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other for a nucleotide sequence of 20-31 nucleotides in length.

The method of item 1, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently of each other, for a nucleotide sequence of 24 nucleotides in length, as compared to the nucleic acid fragment.

Item 8. The method of any one of items 1-7, wherein the single molecule barcode is 2-12 nucleotides in length.

The method of any one of clauses 1-7, wherein the public sequence is 13-31 nucleotides in length.

The method of any one of clauses 1-7, wherein the linker is a single-stranded linker or a double-stranded linker; optionally, the double-stranded adaptor is a double-stranded Y-adaptor.

The method of any one of clauses 1-7, wherein the 5 'end and/or the 3' end of the first amplification upstream primer is flush or not flush with the 5 'end and/or the 3' end of the adaptor; and/or

Wherein the 5 'end and/or 3' end of the first amplification downstream primer is flush or not flush with the 5 'end and/or 3' end of the adaptor.

Item 12. The method of any one of items 1-7, wherein:

(1) The nucleic acid fragments are from a single sample; or alternatively

(2) The nucleic acid fragments are from a plurality of samples, and the linker comprises the single molecule barcode.

The method of any one of clauses 1-7, wherein the linker is free of an index sequence and a sequencer bridging anchor sequence.

The method of any one of clauses 1-7, wherein the first amplification upstream primer and the first amplification downstream primer are each independently free of an index sequence and a sequencer bridging anchor sequence.

Item 15. A kit for constructing a sequencing library of target sequences, comprising:

(b)

wherein the kit does not include a linker blocking agent.

The kit of item 15, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other by a nucleotide sequence of 16-37 nucleotides.

The kit of item 17, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other by a nucleotide sequence of 16-34 nucleotides.

The kit of item 15, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other for a nucleotide sequence of 20-37 nucleotides.

The kit of item 19, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other, by a nucleotide sequence of 20-34 nucleotides.

The kit of item 20, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other, by a nucleotide sequence of 20-31 nucleotides.

The kit of item 15, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently of each other, for a nucleotide sequence of 24 nucleotides in length, as compared to the nucleic acid fragment.

The kit of any one of items 15-21, wherein the single molecule barcode is 2-12 nucleotides in length.

The kit of any one of items 15-21, wherein the common sequence is 13-31 nucleotides in length.

The kit of any one of clauses 15-21, wherein the linker is a single-stranded linker or a double-stranded linker; optionally, the double-stranded adaptor is a double-stranded Y-adaptor.

The kit of any one of clauses 15-21, wherein the 5 'end and/or the 3' end of the first amplification upstream primer is flush or not flush with the 5 'end and/or the 3' end of the adaptor; and/or

The kit of any one of items 15-21, wherein:

(1) The nucleic acid fragments are from a single sample; or alternatively

The kit of any one of clauses 15-21, wherein the linker is free of an index sequence and a sequencer bridging anchor sequence.

The kit of any one of clauses 15-21, wherein the first amplification upstream primer and the first amplification downstream primer are each independently free of an index sequence and a sequencer bridging anchor sequence.

Use of a reagent for preparing a kit for constructing a sequencing library of a target sequence, the reagent comprising:

(b)

wherein the agent does not include a linker blocking agent.

The use of item 29, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other for a nucleotide sequence of 16-37 nucleotides in length.

The use of item 29, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other for a nucleotide sequence of 16-34 nucleotides in length.

The use of item 29, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other for a nucleotide sequence of 20-37 nucleotides in length.

The use of item 29, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other for a nucleotide sequence of 20-34 nucleotides in length.

The use of item 29, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly from the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently from each other for a nucleotide sequence of 20-31 nucleotides in length.

The use of item 29, wherein the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5 'end and the 3' end of the nucleic acid fragment, respectively, independently of each other, for a nucleotide sequence of 24 nucleotides in length, as compared to the nucleic acid fragment.

The use of any one of clauses 29-35, wherein the single molecule barcode is 2-12 nucleotides in length.

The use of any one of clauses 29-35, wherein the public sequence is 13-31 nucleotides in length.

The use of any one of clauses 29-35, wherein the linker is a single-stranded linker or a double-stranded linker; optionally, the double-stranded adaptor is a double-stranded Y-adaptor.

The use of any one of clauses 29-35, wherein the 5 'end and/or the 3' end of the first amplification upstream primer is flush or not flush with the 5 'end and/or the 3' end of the adaptor; and/or

The use according to any one of items 29-35, wherein:

(1) The nucleic acid fragments are from a single sample; or alternatively

The use of any one of clauses 29-35, wherein the linker is free of an index sequence and a sequencer bridging anchor sequence.

The use of any one of clauses 29-35, wherein the first amplification upstream primer and the first amplification downstream primer are each independently free of an index sequence and a sequencer bridging anchor sequence.

The invention will now be illustrated by the following non-limiting examples. Unless otherwise indicated, experimental data are averages of duplicate replicates.

Examples

Method

(1) DNA fragmentation and end repair

a) A repair program was set up on a PCR instrument (model T100, Bio-Rad) according to Table 1, with a heated-lid temperature of 75 ℃ and a reaction volume of 25 μL; the program was started and then paused.

b) Using the 5X WGS Fragmentation Mix from Qiagen (cat. No. Y9410L), 100 ng of sample DNA was placed in a PCR tube, Buffer EB (Qiagen, cat. No. 19086) was added to bring the volume to 17.5 μL according to the volume of the sample DNA, and the components were then added in the order listed in Table 2. The tube was flicked to mix, centrifuged for 5 seconds, flicked again to remove bubbles, centrifuged for another 5 seconds, and placed in the paused PCR instrument, and the program of Table 1 was run to prepare end-repaired sample DNA fragments.

TABLE 1

Reaction temperature	Reaction time
32℃	22 min (100 ng)
65℃	30 min
4℃	Pause

TABLE 2

(2) Linker ligation

a) The ligation program was set up on the PCR instrument according to Table 3, with a reaction volume of 50 μL; the program was started and then paused.

TABLE 3

Reaction temperature	Reaction time
20℃	15 min
4℃	Pause

b) Using the WGS Ligase from Qiagen (cat. No. L6030-W-L), the components were added in the order listed in Table 4, mixed thoroughly, centrifuged for 5 seconds, flicked to remove air bubbles, centrifuged for another 5 seconds, and placed in the paused PCR instrument, and the program of Table 3 was run to prepare linker-ligated DNA fragments.

TABLE 4

Component	Volume added
End-repaired sample DNA fragment	25 μL
Buffer EB	9 μL
5x WGS Ligase buffer	10 μL
Linker (10 μM)	3 μL
WGS Ligase	5 μL
Final volume of reaction	50 μL

c) Note: a linker carrying a different single molecule barcode is used for each sample, so that the same linker is not used more than once within one batch of operations.

(3) Ligation product purification

a) Purification was performed using purification magnetic beads. Before purification, the beads were removed from the 4 ℃ refrigerator, mixed by inversion, and equilibrated at room temperature for 30 minutes. 50 μL of the ligation product was mixed with 100 μL (2X) of magnetic beads in a 1.5 ml EP tube by gentle flicking or shaking, briefly centrifuged for 3 seconds, and allowed to stand at room temperature for 5-10 minutes.

b) The EP tube was placed on a magnetic rack and adsorbed for 5 minutes, and the supernatant was pipetted off, taking care not to aspirate the beads.

c) With the tube on the magnetic rack, 200 μL of wash solution was slowly added along the side of the EP tube away from the magnetic beads, the rack was gently rocked without dispersing the beads, and the supernatant was pipetted off, taking care not to aspirate the beads.

d) Step c) was repeated.

e) The EP tube was removed from the magnetic rack, centrifuged briefly for 3 seconds, and placed on the magnetic rack again; all remaining supernatant was pipetted off with a 10 μL pipette, and the tube was left open at room temperature for 5 minutes.

f) The EP tube was removed from the magnetic rack, 25 μL of DNase/RNase-Free deionized water (Tiangen, cat. No. RT121) was added, and the tube was gently flicked to mix the eluent with the beads and allowed to stand at room temperature for 10 minutes.

g) The EP tube was placed on the magnetic rack and adsorbed for 1 minute, and 23 μL of the supernatant, which was the purified ligation product, was transferred to a new PCR tube.
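The bead volume in step a) follows from the bead-to-sample ratio. The helper below is a minimal sketch; `bead_volume_ul` is a hypothetical name, and the only inputs are the sample volume and the ratio stated in the protocol.

```python
def bead_volume_ul(sample_volume_ul: float, ratio: float) -> float:
    """Volume of magnetic beads (in microliters) for a given bead-to-sample ratio."""
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return sample_volume_ul * ratio


# 2X clean-up of the 50 uL ligation product, as in step a).
print(bead_volume_ul(50, 2.0))
```

The same calculation applies to the 1.5X ratio used when purifying the first round amplification product.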

(4) First round amplification

a) The purified ligation product was placed back on ice. The PCR program was set up on the PCR instrument according to Table 5, with a heated-lid temperature of 105 ℃ and a reaction volume of 50 μL; the program was started and then paused:

TABLE 5

b) Using the KAPA HiFi HotStart ReadyMix from Roche-KAPA (cat. No. KK2602), the components were added in the order listed in Table 6, mixed thoroughly, centrifuged for 5 seconds, flicked to remove air bubbles, centrifuged for another 5 seconds, and placed in the paused PCR instrument, and the program of Table 5 was run to prepare the first round amplification product (i.e., the first amplicon).

TABLE 6

(5) Purification and concentration determination of first round amplification products

a) Purification was performed using purification magnetic beads. Before purification, the beads were removed from the 4 ℃ refrigerator, mixed by inversion, and equilibrated at room temperature for 30 minutes. 50 μL of the first round amplification product (first amplicon) was mixed with 75 μL (1.5X) of magnetic beads in a 1.5 ml EP tube by gentle flicking or shaking, briefly centrifuged for 3 seconds, and left at room temperature for 5-10 minutes.

d) Step c) was repeated.

f) The EP tube was removed from the magnetic rack, 35 μL of DNase/RNase-Free deionized water (Tiangen, cat. No. RT121) was added, and the tube was gently flicked to mix the eluent with the magnetic beads and allowed to stand at room temperature for 10 minutes.

g) The EP tube was placed on the magnetic rack and adsorbed for 1 minute, and 33 μL of the supernatant, which was the purified pre-library product, was transferred to a new EP tube.

h) A microliter-scale aliquot of the supernatant was taken, and the first amplicon concentration in the supernatant was measured using the Qubit dsDNA HS Assay Kit (Invitrogen, cat. No. Q32851) and a Qubit™ 4 Fluorometer (Invitrogen, cat. No. Q33226).

i) Fragment quality testing was performed on the first amplicon using a fragment quality testing device, such as an Agilent 4200 TapeStation fragment analyzer. The length of the first amplicon equals the length of the sample fragment plus the length of the linkers at both ends. The first amplicon may be used immediately in the next experiment or stored at -20 ± 5 ℃.
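The length relationship in step i) can be checked with a short calculation. This is an illustrative sketch: the function names are hypothetical, the 200-nucleotide fragment length is a made-up value, and the 31-nucleotide extension at each end is used only as an example of a value inside the 16-37 nucleotide range recited in the items.

```python
def first_amplicon_length(fragment_len: int, ext_5p: int, ext_3p: int) -> int:
    """Expected first amplicon length: sample fragment plus the extension at each end."""
    return fragment_len + ext_5p + ext_3p


def extension_in_range(ext_len: int, low: int = 16, high: int = 37) -> bool:
    """Check one end's extension against an inclusive nucleotide range."""
    return low <= ext_len <= high


# Hypothetical 200-nt fragment with a 31-nt extension at each end.
print(first_amplicon_length(200, 31, 31), extension_in_range(31))
```

Comparing the expected length with the peak reported by the fragment analysis instrument is one way to confirm that the intermediate library carries only the short linker sequences.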

(6) Hybridization of biotin-containing probes

a) The program of Table 7 was set up on the PCR instrument, started, and then paused:

TABLE 7

b) Based on the first amplicon concentration of each sample, the volume corresponding to 500 ng was calculated for each sample, and that amount of first round amplification product was added to a new PCR tube A. First amplicons from 1-8 samples carrying different molecular barcodes were placed in one PCR tube, which was labeled with a marker pen. Using the xGen Hybridization and Wash Kit, Box 1, Box 2, from IDT (cat. No. 1080584), the components were added in the order listed in Table 8, centrifuged for 5 seconds, flicked to remove air bubbles, and centrifuged for another 5 seconds. The PCR tube was then opened and placed in a vacuum concentrator, and a 45 ℃ concentration program was run until the contents were dry, with no liquid remaining.

TABLE 8

c) While the library was being concentrated, the hybridization solution (xGen Hybridization and Wash Kit, Box 1, Box 2, from IDT (cat. No. 1080584)) and the designed biotin-containing probes were thawed on ice. The components were added to PCR tube B in the order listed in Table 9, gently vortexed, centrifuged for 5 seconds, gently flicked to remove air bubbles, and centrifuged for another 5 seconds.

TABLE 9

d) 17 μl of the mixture in PCR tube B was transferred to PCR tube A, which had been concentrated to dryness (taking care that the pipette tip did not touch the bottom of the tube; the liquid may be dispensed halfway up the tube wall). The tube was left at room temperature for 5 minutes, mixed by gentle vortexing or flicking, centrifuged for 5 seconds, cleared of bubbles, centrifuged for another 5 seconds, and placed in the PCR instrument. The hybridization program set according to Table 7 was run, and the hybridization reaction proceeded overnight so that the biotin-containing probes hybridized to the target molecules.
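The input calculation in step b) can be sketched as follows. This is a minimal example under stated assumptions: the function names, the example concentration, and the example barcodes are hypothetical; only the 500 ng input and the limit of 1-8 samples with different molecular barcodes per tube come from the protocol.

```python
from typing import Sequence


def input_volume_ul(concentration_ng_per_ul: float, target_mass_ng: float = 500.0) -> float:
    """Volume of first amplicon (in microliters) that contains the target mass."""
    if concentration_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_mass_ng / concentration_ng_per_ul


def can_pool(barcodes: Sequence[str], max_samples: int = 8) -> bool:
    """True if 1-8 samples are pooled and all carry different molecular barcodes."""
    return 1 <= len(barcodes) <= max_samples and len(set(barcodes)) == len(barcodes)


# Hypothetical sample measured at 20 ng/uL.
print(input_volume_ul(20.0))
print(can_pool(["AAC", "GTT", "CCA"]), can_pool(["AAC", "AAC"]))
```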

(7) Capturing target molecules hybridized to biotin-containing probes using streptavidin magnetic beads

a) Buffer preparation. One hour before the end of the hybridization reaction, the capture magnetic beads (Dynabeads Streptavidin M270) were taken out of the xGen Hybridization and Wash Kit, Box 1, Box 2, from IDT (cat. No. 1080584) and equilibrated at room temperature for 30 minutes, and the other components were thawed at room temperature. The wash solutions were diluted to 1x working solutions according to Table 10 (the amounts given are for one sample; for more samples, a mix may be prepared allowing for actual losses). 100 μl of the prepared 1x Wash Buffer I and the 1x Stringent Wash Buffer were aliquoted into three 0.2 ml PCR tubes (100 μl per tube). The aliquoted 1x Wash Buffer I and 1x Stringent Wash Buffer were then equilibrated at about 65 ℃ for 15 minutes, and the remaining Wash Buffer I, Wash Buffer II, and Wash Buffer III were kept at room temperature until use.

TABLE 10

b) The PCR program was set up on the PCR instrument, with a heated-lid temperature of 70 ℃ and a reaction volume of 40 μL; the program was started and then paused:

TABLE 11

Reaction temperature	Reaction time
65℃	45 min
65℃	Pause

c) The equilibrated Dynabeads Streptavidin M270 were vortexed for 15 seconds, and 50 μl per sample was added to a PCR tube. 100 μl of 1x Bead Wash Buffer (xGen Hybridization and Wash Kit, Box 1, Box 2, from IDT (cat. No. 1080584)) was added to each PCR tube and mixed well by vortexing; the tube was placed on the magnetic rack for 1 minute until the solution was clear, and the supernatant was removed. After the supernatant was discarded for the last time, the tube was centrifuged for 5 seconds and placed on the magnetic rack for 1 minute, and the residual liquid was removed. 17 μl of bead resuspension buffer (xGen Hybridization and Wash Kit, Box 1, Box 2, from IDT (cat. No. 1080584)) was added per sample (the bead resuspension buffer was prepared according to Table 9, with the biotin-containing probes among the components replaced by DNase/RNase-Free deionized water (Tiangen, cat. No. RT121)). The tube was centrifuged for 5 seconds, mixed by gentle vortexing, cleared of air bubbles, and centrifuged for another 5 seconds. The program of Table 11 was run on the PCR instrument to bind the streptavidin magnetic beads to the target molecules hybridized to the biotin-containing probes, and the next reaction was then started promptly.

d) The lid of the PCR instrument and the cap of the PCR tube were opened without stopping the program. The resuspended magnetic beads, preheated to 65 ℃, were pipetted up and down 5 times and then quickly added to the PCR tube; the mixture was pipetted until homogeneous and briefly centrifuged. The program of Table 11 was run, and every 15 minutes the mixture was pipetted up and down 10 times with a 20 μl low-retention tip, or gently vortexed, to keep the magnetic beads in suspension, followed each time by a brief centrifugation to remove bubbles.

e) Washing at 65 ℃. After the 45-minute incubation, 150 μl of 1x Wash Buffer I (xGen Hybridization and Wash Kit, Box 1, Box 2, from IDT (cat. No. 1080584)) preheated to 65 ℃ was added to the PCR tube and pipetted up and down 10 times with a 70 μl low-retention tip; the tube was placed on the magnetic rack for 1 minute until the solution was clear, and the supernatant was removed. 150 μl of 1x Stringent Wash Buffer (xGen Hybridization and Wash Kit, Box 1, Box 2, from IDT (cat. No. 1080584)) preheated to 65 ℃ was quickly added and pipetted up and down 10 times with a 50 μl low-retention tip; the tube was placed in a 65 ℃ constant-temperature metal bath (model H203-100C, manufactured by biological technology of honeysuckle, beijing Co., ltd.) and incubated at 65 ℃ for 5 minutes, then placed on the magnetic rack for 1 minute until the solution was clear, and the supernatant was removed. This step was repeated once.

f) Washing at room temperature. The PCR tube was taken off the magnetic rack, 150 μl of room-temperature 1x Wash Buffer I (xGen Hybridization and Wash Kit, Box 1, Box 2, from IDT (cat. No. 1080584)) was added, and the tube was vortexed for 30 seconds, rested for 30 seconds, vortexed for another 30 seconds, centrifuged, and placed on the magnetic rack for 1 minute until the solution was clear, and the supernatant was removed. The same procedure was then carried out with 150 μl of room-temperature 1x Wash Buffer II, and then with 150 μl of room-temperature 1x Wash Buffer III (xGen Hybridization and Wash Kit, Box 1, Box 2, from IDT (cat. No. 1080584)); after the Wash Buffer III supernatant was removed, the tube was centrifuged again and placed on the magnetic rack to thoroughly remove the residual wash solution. The PCR tube was then taken off the magnetic rack, 23 μl of DNase/RNase-Free deionized water was added, and the contents were mixed by pipetting and centrifuged for later use. At this point the PCR tube contains both the magnetic beads and the suspension; the beads must not be removed with the magnetic rack, and the next reaction is carried out with the beads present.

(8) Second round amplification and purification after capture

a) The amplification program was set up on the PCR instrument, with a heated-lid temperature of 105 ℃ and a reaction volume of 55 μl; the program was started and then paused. The number of cycles may be adjusted for different biotin-containing probes:

TABLE 12

b) The components were added in the order listed in Table 13, mixed thoroughly, centrifuged for 5 seconds, flicked to remove air bubbles, centrifuged for another 5 seconds, and placed in the paused PCR instrument, and the program of Table 12 was run to prepare the second round amplification product.

TABLE 13

c) Purification of the amplification product: the second round amplification product was purified using AMPure XP purification magnetic beads. Before purification, the beads were removed from the 4 ℃ refrigerator, mixed by inversion, and equilibrated at room temperature for 30 minutes. 50 μL of the amplification product and 75 μL of magnetic beads were mixed in a PCR tube by gentle flicking or shaking, briefly centrifuged for 3 seconds, and left at room temperature for 8 minutes. The tube was placed on the magnetic rack and adsorbed for 5 minutes, and the supernatant was pipetted off, taking care not to aspirate the beads. With the tube on the magnetic rack, 150 μL of 80% ethanol was slowly added along the side away from the magnetic beads, the rack was gently rocked without dispersing the beads, and the supernatant was pipetted off, taking care not to aspirate the beads. The wash with 150 μL of 80% ethanol was repeated once. The tube was removed from the magnetic rack, centrifuged briefly for 3 seconds, and placed on the magnetic rack again; all remaining supernatant was pipetted off with a 10 μL pipette, and the tube was left open at room temperature for 5 minutes. The tube was removed from the magnetic rack, 40 μL of Buffer EB (Qiagen, cat. No. 19086) was added, and the tube was vortexed thoroughly for 1 minute, gently centrifuged, and left at room temperature for 8 minutes. The tube was placed on the magnetic rack and adsorbed for 1 minute, and 58 μL of the supernatant was transferred to a new 1.5 ml EP tube to obtain the sequencing library.

(9) Quality control and sequencing of the sequencing library: the sequencing library was quantified by real-time fluorescent quantitative PCR; a sample with a concentration of ≥5 nM can proceed to on-machine sequencing. If the next experiment cannot be performed immediately, the sequencing library can be stored at -20 ± 5 ℃, and on-machine sequencing should be arranged within one month. The fragment length of the sequencing library equals the length of the sample fragment plus the length of the full-length sequencing sequences at both ends. The sequencing library was then sequenced on a sequencing platform.
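
The quality-control rule in step (9) can be sketched as follows. This is an illustrative Python sketch, not the applicant's software: the threshold (≥5 nM) and the fragment-length relation come from the text above, whereas the mass-to-molarity conversion (about 660 g/mol per base pair of double-stranded DNA) is a standard approximation added here as an assumption, and all numeric inputs in the usage lines are hypothetical.

```python
MIN_LIBRARY_CONC_NM = 5.0  # threshold stated in step (9)


def passes_qc(conc_nm: float) -> bool:
    """True if the qPCR-measured library concentration is at least 5 nM."""
    return conc_nm >= MIN_LIBRARY_CONC_NM


def library_fragment_length(insert_bp: int, upstream_bp: int, downstream_bp: int) -> int:
    """Sample fragment length plus the full-length sequencing sequences at both ends."""
    return insert_bp + upstream_bp + downstream_bp


def ng_per_ul_to_nm(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA mass concentration to nM, assuming ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_length_bp)


# Hypothetical values for illustration only (not taken from the patent).
length_bp = library_fragment_length(insert_bp=200, upstream_bp=70, downstream_bp=66)
conc_nm = ng_per_ul_to_nm(conc_ng_per_ul=2.0, mean_length_bp=length_bp)
print(length_bp, round(conc_nm, 2), passes_qc(conc_nm))
```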

(10) The sequencing data were analyzed for metrics such as the coverage of the target sequence at a given sequencing depth and the mid-target rate.
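
The two metrics named in step (10) can be illustrated with a small Python sketch. The definitions used here are assumptions, since the text does not spell them out: the mid-target rate is taken as the fraction of aligned reads that overlap any target interval, and coverage at a given depth as the fraction of target bases covered by at least that many reads. Intervals are half-open `(start, end)` positions on a single reference, target intervals are assumed not to overlap each other, and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one reference sequence


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def mid_target_rate(reads: List[Interval], targets: List[Interval]) -> float:
    """Fraction of reads overlapping any target interval (assumed definition)."""
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if any(overlaps(r, t) for t in targets))
    return hits / len(reads)


def coverage_at_depth(reads: List[Interval], targets: List[Interval], min_depth: int) -> float:
    """Fraction of target bases covered by at least min_depth reads."""
    total = 0
    covered = 0
    for t_start, t_end in targets:
        for pos in range(t_start, t_end):
            depth = sum(1 for r_start, r_end in reads if r_start <= pos < r_end)
            total += 1
            if depth >= min_depth:
                covered += 1
    return covered / total if total else 0.0


# Toy data (hypothetical coordinates, for illustration only).
reads = [(0, 10), (5, 15), (100, 110), (12, 20)]
targets = [(8, 18)]
print(mid_target_rate(reads, targets))       # 3 of 4 reads overlap the target
print(coverage_at_depth(reads, targets, 2))  # 5 of 10 target bases have depth >= 2
```

The per-base loop is quadratic and is meant only to make the definitions explicit; a real analysis would work from alignment files with dedicated tools.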

Example 1: comparison among library construction with the linker of the invention without blocking agent (C), with the IDT commercial long linker control plus blocking agent (K), with the IDT commercial linker control plus blocking agent (M), with the IDT commercial long linker control without blocking agent (J1), and with the IDT commercial linker control without blocking agent (J2)

Sample: a commercial tumor mutation standard (cyanine gene, tumor SNV 5% gDNA standard, cat. no. gwagtm 1003), diluted to a given mutation ratio.

Target: 400K, comprising 86 tumor-related genes (a pool of 120-nt probes with 5'-end biotin modification, synthesized by IDT).

Linker: (1) the linker of the invention (31 nucleotides) (example C; for the sequence, see the sequence listing); (2) a commercial long linker control purchased from IDT (xGen Dual index with UMI Adapter; IDT does not disclose the sequence; the same linker was used in comparative examples K and J1); and (3) a commercial linker control purchased from IDT (Duplex Seq Adapter; IDT does not disclose the sequence; the same linker was used in comparative examples M and J2).

First round amplification: for the linker of the invention, an intermediate library (or pre-library) was prepared. For the commercial long linker control purchased from IDT, a complete whole genome library was constructed. The corresponding primer sequences are shown in the sequence listing.

Capturing: no blocking agent was added in example C or in comparative examples J1 and J2. A blocking agent (xGen Universal Blockers-TS Mix, IDT, cat. no. 1075475) was added in comparative examples K and M.

Second round of amplification: a sequencing library (also called the final library) was prepared; the corresponding primer sequences are shown in the sequence listing. Note that for the IDT commercial linker comparative examples (M and J2), the second round of amplification, like the first round, was performed before capture, so that a complete whole genome library was obtained before capture.

Results:

Sequencing was performed as described in the methods section above, and the results are shown in FIG. 3. With the method of the invention without blocking agent (C, 82%), the mid-target rate was 7 percentage points higher than with the comparative method using the IDT commercial long linker control with blocking agent (K, 75%); 52 percentage points higher than with the IDT commercial long linker control without blocking agent (J1, 30%); 8 percentage points higher than with the IDT commercial linker control with blocking agent (M, 74%); and 52 percentage points higher than with the IDT commercial linker control without blocking agent (J2, 30%). The library construction method according to the present invention thus eliminates the need for the blocking agent used in prior art methods, thereby reducing detection costs, and even further increases the mid-target rate.
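
The differences quoted above are absolute differences between mid-target rates (percentage points), not relative changes. A minimal Python sketch of the arithmetic, using only the rates reported for example 1, is given below; the helper name `gain_over` is illustrative.

```python
# Mid-target rates (%) reported for example 1 (FIG. 3).
rates = {"C": 82, "K": 75, "J1": 30, "M": 74, "J2": 30}


def gain_over(control: str, reference: str = "C") -> int:
    """Mid-target rate of the reference minus that of the control, in percentage points."""
    return rates[reference] - rates[control]


for control in ("K", "J1", "M", "J2"):
    print(f"C vs {control}: +{gain_over(control)} percentage points")
```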

Example 2: comparison among library construction with linkers of the invention of various lengths without blocking agent, with a long linker control without blocking agent, and with the IDT commercial long linker control plus blocking agent (CK).

Sample: the same as in example 1.

Target: the same as in example 1.

Linker: (1) linkers of the invention of various lengths (16, 20, 24, 31, 34, 37, and 40 nucleotides); (2) a long linker control (62 nucleotides) designed by the inventors for the Illumina platform; and (3) a commercial long linker control purchased from IDT (the same as in example 1) (CK). The linker sequences are shown in the sequence listing.

First round amplification: for the linkers of the invention of various lengths, intermediate libraries (or pre-libraries) were prepared. For both the self-designed long linker (62 nucleotides) and the commercial long linker purchased from IDT, a complete whole genome library was constructed. The corresponding primer sequences are shown in the sequence listing.

Capturing: no linker blocking agent was added in the examples using the linkers of the invention of various lengths; no linker blocking agent was added in the comparative example using the 62-nucleotide long linker control; a linker blocking agent was added in the comparative example (CK) using the commercial linker control purchased from IDT.

Second round of amplification: a sequencing library (also called the final library) was prepared; the corresponding primer sequences are shown in the sequence listing.

Results:

Sequencing was performed as described in the methods section above, and the results are shown in FIG. 4. Without any linker blocking agent, the linkers of the present invention with lengths of 16-40 nucleotides achieved mid-target rates (65.5-84.9%) comparable to that of the comparative example using the IDT commercial long linker with a linker blocking agent (CK, 74.6%), and even higher mid-target rates (81.4-84.9%) at lengths of 20-31 nucleotides, consistent with example 1. When the linker length reached 62 nucleotides, the mid-target rate was low (48.9%) if no linker blocking agent was added. These results indicate that the mid-target rate of the library construction method according to the present invention is determined by the length by which the capture object (e.g., the adaptor-added nucleic acid fragment or the first amplicon used directly for capture) extends outward at the 5' and 3' ends relative to the nucleic acid fragment. The linkers of the present invention of various lengths also verify that single molecule barcodes of various lengths can be used successfully.
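
The length dependence summarized above can be restated as a small lookup. The Python sketch below encodes only the ranges reported for example 2 (per-length values appear in FIG. 4 and are not reproduced here); the function name `reported_rate_range` is hypothetical, and lengths that were not tested return `None` rather than an interpolated value.

```python
from typing import Optional, Tuple

TESTED_LENGTHS_NT = (16, 20, 24, 31, 34, 37, 40)  # linkers of the invention
CK_RATE = 74.6  # IDT commercial long linker with blocking agent (%)
LONG_CONTROL_NT = 62
LONG_CONTROL_RATE = 48.9  # 62-nt control without blocking agent (%)


def reported_rate_range(length_nt: int) -> Optional[Tuple[float, float]]:
    """Reported mid-target range (%) without blocking agent, or None if untested."""
    if length_nt == LONG_CONTROL_NT:
        return (LONG_CONTROL_RATE, LONG_CONTROL_RATE)
    if length_nt not in TESTED_LENGTHS_NT:
        return None
    if 20 <= length_nt <= 31:
        return (81.4, 84.9)  # sub-range reported for 20-31 nt
    return (65.5, 84.9)      # overall range reported for 16-40 nt


for length in (16, 24, 40, 62, 50):
    rng = reported_rate_range(length)
    if rng is None:
        print(f"{length} nt: not tested in example 2")
    else:
        verdict = "above CK" if rng[0] > CK_RATE else "not consistently above CK"
        print(f"{length} nt: {rng[0]}-{rng[1]}% ({verdict})")
```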

Example 3: verification in clinical application, comparing library construction with the linker of the invention without blocking agent against library construction with the IDT commercial linker plus blocking agent.

Sample: fresh clinical tumor tissue samples (12 cases) from a jointly established laboratory.

Target: 39M whole exome, from the IDT xGen Exome Research Panel v1.0, cat. no. 1056115.

Linker: (1) the linker of the invention (31 nucleotides); and (2) a commercial long linker control purchased from IDT (the same as in example 1). The linker sequences are shown in the sequence listing.

Capturing: no linker blocking agent was added in the example using the linker of the invention; a linker blocking agent was added in the comparative example using the commercial linker control purchased from IDT.

Results:

Sequencing was performed as described in the methods section above, and the results are shown in FIG. 5. When applied to clinical samples, the method of the invention, compared with the prior art, eliminates the need for a linker blocking agent, increases the mid-target rate by 3.54%, reduces the redundancy by 2.88%, gives substantially the same coverage, and shows better stability over 12 repetitions.

Example 4: comparison between library constructions without blocking agent using linkers of the invention of the same length but different nucleotide composition.

Sample: the same as in example 1.

Target: the same as in example 1.

Linker: two linkers of the invention, C (the same as in example 1) and D, of the same length (31 nucleotides) but different nucleotide composition. The linker sequences are shown in the sequence listing.

First round amplification: an intermediate library (or pre-library) was prepared. The corresponding primer sequences are shown in the sequence listing.

Capturing: no linker blocking agent was added.

Results:

Sequencing was performed as described in the methods section above, and the results are shown in FIG. 6. The linkers of the invention with the same length but different nucleotide compositions achieved very similar mid-target rates (82.3% vs. 82.7%), indicating that the nucleotide composition of the linker does not substantially affect the mid-target rate.

Example 5: economic analysis of detection methods

The library construction costs for the clinical samples in example 3 were used for an economic analysis of the detection method. When the 12 samples were hybridized with 4 samples per capture library, the cost was reduced by 14.72%. When the 12 samples were hybridized with 1 sample per capture library, the cost was reduced by 15.96%.

Table 14: main reagents for library construction in the prior art (IDT method)

^a : Singapore dollars; ^b : US dollars.

Table 15: main reagents for library construction

^a : Singapore dollars; ^b : US dollars.

Table 16: cost comparison
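
The percentage reductions reported in this example follow from a simple relative-cost formula. The Python sketch below shows that formula only; the function name `percent_cost_reduction` and the two cost figures in the usage line are hypothetical placeholders and are not the values of Tables 14-16.

```python
def percent_cost_reduction(prior_art_cost: float, invention_cost: float) -> float:
    """Relative cost reduction (%) of the invention versus the prior art method."""
    if prior_art_cost <= 0:
        raise ValueError("prior art cost must be positive")
    return (prior_art_cost - invention_cost) / prior_art_cost * 100.0


# Hypothetical total costs for 12 samples (placeholders, not data from the tables).
prior_art_total = 200.0
invention_total = 170.0
print(f"{percent_cost_reduction(prior_art_total, invention_total):.2f}% lower cost")
```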

The foregoing description of the examples and embodiments should be taken as illustrating, rather than limiting, the invention as defined by the claims. As will be readily appreciated, many variations and combinations of the above features may be utilized without departing from the present invention as set forth in the claims. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such variations are intended to be included within the scope of the following claims.

Sequence listing

N represents a single molecule barcode sequence and N represents an index sequence.

Claims

1. A method of constructing a sequencing library of a target sequence, comprising:

(a) Adding adaptors to the 5' and 3' ends of the nucleic acid fragments to obtain adaptor-added nucleic acid fragments; wherein the nucleic acid fragment is obtained by fragmenting a nucleic acid;

(b)

The second amplification downstream primer comprises, in sequence in the 5'-3' direction, a downstream sequencer bridging anchor sequence, optionally an index sequence, and a downstream sequencing sequence,

wherein the 5' end and/or 3' end of the first amplification upstream primer is flush or not flush with the 5' end and/or 3' end of the adaptor; and/or

2. The method of claim 1, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 16-37 nucleotides in length.

3. The method of claim 1, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 16-34 nucleotides in length.

4. The method of claim 1, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 20-37 nucleotides in length.

5. The method of claim 1, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 20-34 nucleotides in length.

6. The method of any one of claims 1-5, wherein the single molecule barcode is 2-12 nucleotides in length.

7. The method of any one of claims 1-5, wherein the common sequence is 13-31 nucleotides in length.

8. The method of any one of claims 1-5, wherein the linker is a single- or double-stranded linker; wherein the double-stranded linker is a double-stranded Y-type linker.

9. The method of any one of claims 1-5, wherein:

(1) The nucleic acid fragments are from a single sample; or alternatively

10. The method of any one of claims 1-5, wherein the linker is free of an index sequence and a sequencer bridging anchor sequence.

11. The method of any one of claims 1-5, wherein the first amplification upstream primer and the first amplification downstream primer are each independently free of an index sequence and a sequencer bridging anchor sequence.

12. A kit for constructing a sequencing library of a target sequence, comprising:

(a) A linker capable of being added to the 5' and 3' ends of the nucleic acid fragment to obtain a nucleic acid fragment to which the linker is added; wherein the nucleic acid fragment is obtained by fragmenting a nucleic acid;

(b)

wherein the kit does not include a linker blocking agent;

13. The kit of claim 12, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 16-37 nucleotides in length.

14. The kit of claim 12, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 16-34 nucleotides in length.

15. The kit of claim 12, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 20-37 nucleotides in length.

16. The kit of claim 12, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 20-34 nucleotides in length.

17. The kit of claim 12, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 20-31 nucleotides in length.

18. The kit of claim 12, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 24 nucleotides in length.

19. The kit of any one of claims 12-18, wherein the single molecule barcode is 2-12 nucleotides in length.

20. The kit of any one of claims 12-18, wherein the common sequence is 13-31 nucleotides in length.

21. The kit of any one of claims 12-18, wherein the linker is a single- or double-stranded linker; wherein the double-stranded linker is a double-stranded Y-type linker.

22. The kit of any one of claims 12-18, wherein:

(1) The nucleic acid fragments are from a single sample; or alternatively

23. The kit of any one of claims 12-18, wherein the linker is free of an index sequence and a sequencer bridging anchor sequence.

24. The kit of any one of claims 12-18, wherein the first amplification upstream primer and the first amplification downstream primer are each independently free of an index sequence and a sequencer bridging anchor sequence.

25. Use of a reagent for preparing a kit for constructing a sequencing library of a target sequence, the reagent comprising:

(b)

wherein the agent does not include a linker blocking agent;

26. The use of claim 25, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 16-37 nucleotides in length.

27. The use of claim 25, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 16-34 nucleotides in length.

28. The use of claim 25, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 20-37 nucleotides in length.

29. The use of claim 25, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 20-34 nucleotides in length.

30. The use of claim 25, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 20-31 nucleotides in length.

31. The use of claim 25, wherein, compared with the nucleic acid fragment, the adaptor-added nucleic acid fragment or the first amplicon extends outwardly at the 5' end and the 3' end, each independently, by a nucleotide sequence of 24 nucleotides in length.

32. The use of any one of claims 25-31, wherein the single molecule barcode is 2-12 nucleotides in length.

33. The use of any one of claims 25-31, wherein the common sequence is 13-31 nucleotides in length.

34. The use of any one of claims 25-31, wherein the linker is a single- or double-stranded linker; wherein the double-stranded linker is a double-stranded Y-type linker.

35. The use of any one of claims 25-31, wherein:

(1) The nucleic acid fragments are from a single sample; or alternatively

36. The use of any one of claims 25-31, wherein the linker is free of an index sequence and a sequencer bridging anchor sequence.

37. The use of any one of claims 25-31, wherein the first amplification upstream primer and the first amplification downstream primer are each independently free of an index sequence and a sequencer bridging anchor sequence.