WO2024124379A1 - Method for synchronous sequencing of multiple template nucleic acids and application thereof - Google Patents

Method for synchronous sequencing of multiple template nucleic acids and application thereof

Info

Publication number
WO2024124379A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
templates
primers
round
present
Prior art date
Application number
PCT/CN2022/138468
Other languages
English (en)
French (fr)
Inventor
张元念
龚梅花
张颖华
欧阳凯
丁娅
徐崇钧
Original Assignee
深圳华大智造科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大智造科技股份有限公司
Priority to PCT/CN2022/138468
Publication of WO2024124379A1

Definitions

  • the present invention relates to the field of nucleic acid sequencing technology and bioinformatics, and in particular to a method for synchronous sequencing of multiple template nucleic acids and its application.
  • PE/MP (paired-end/mate-paired) sequencing, also called bidirectional sequencing, reads the sequences at both ends of a long fragment. The two end sequences form a "pair", and the distance between them is the length of the inserted fragment, which can be used for sequence assembly and alignment. For duplications, deletions, and insertions of gene fragments, this approach is more accurate and covers more of the genome. Paired-end and mate-paired sequencing differ in how the library is constructed. At present, paired-end sequencing has become the mainstream: it increases the sequencing length and also provides a new method for structural variation analysis.
  • however, paired-end sequencing still requires sequencing the two strands of DNA one after the other, which limits the sequencing throughput and keeps sequencing costs high.
  • the inventors use amplification technology to generate sequencing chips that carry multiple template nucleic acid molecules at the same time, such as first-strand and second-strand templates (the sense and antisense strands of DNA). By adjusting the copy number difference of the multiple templates or the concentration of their sequencing primers, different templates produce different signal levels in the same round of sequencing reaction, so that the detected bases can be distinguished and assigned to their templates. Crosstalk correction parameters and/or phase shift correction parameters are used to correct the base reading results during sequencing, so that simultaneous sequencing of multiple template nucleic acids can be realized.
  • This method can greatly save sequencing time and cost, improve sequencing throughput, and is suitable for wide application.
  • the present invention proposes a sequencing method.
  • the method includes: providing multiple composite template sample points, wherein multiple sequencing templates are arranged in each composite template sample point and are hybridized with their corresponding sequencing primers; performing, based on the sequencing primers, multiple rounds of sequencing reactions on the multiple sequencing templates hybridized with the sequencing primers, wherein in each round of sequencing reactions the signal intensities generated by the multiple sequencing templates are different; and, for each round of the multiple rounds of sequencing reactions, classifying the signals of the sequencing channels among the multiple sequencing templates based on the difference in the signal intensities.
  • the above method can efficiently perform simultaneous sequencing of multiple templates, greatly save sequencing time and cost, improve sequencing throughput, and is suitable for wide application.
  • the above sequencing method may further include at least one of the following additional technical features:
  • the signal intensities generated by the multiple sequencing templates differ according to a known relationship.
  • the present invention controls the signal intensity differences generated by the sequencing templates so that the signals of the sequencing channels can be classified among the multiple sequencing templates; those skilled in the art can set the fold difference in signal intensity between the sequencing templates as needed.
  • the known relationship is determined by the following method: controlling the copy number difference of the multiple sequencing templates or controlling the sequencing primer concentration difference of the multiple sequencing templates.
  • controlling the copy number difference of the multiple sequencing templates is achieved by controlling the concentration of different template primers or the time of polymerization extension reaction during the construction of the sequencing library.
  • the multiple sequencing templates are located at different positions of the same nucleic acid molecule.
  • the multiple sequencing templates are located on different nucleic acid molecules.
  • the multiple sequencing templates in the chip are located at different positions of the same single-stranded DNA molecule.
  • the multiple sequencing templates in the chip are located on different chains of multiple single-stranded DNA molecular complexes.
  • the multiple sequencing templates include: a single-stranded DNA molecule formed by rolling circle amplification through extension of a rolling circle amplification primer, and a DNA molecule complex obtained by performing multiple displacement amplification on the single-stranded DNA molecule through extension of a multiple displacement amplification primer.
  • the primers for rolling circle amplification are fixed on a solid support or are free in a solution.
  • the primers for the multiple displacement amplification are fixed on a solid support or free in a solution.
  • the rolling circle amplification and the multiple displacement amplification are performed simultaneously in the same reaction system.
  • the rolling circle amplification reaction is performed first, and then the multiple displacement primers are hybridized to perform a multiple displacement reaction.
  • the signals of the sequencing channels are classified among the multiple sequencing templates to determine the nucleotide sequences of the multiple sequencing templates at each composite template sample point.
  • the plurality of sequencing templates include DNA clusters obtained by PCR amplification.
  • the signal generated by each channel is intensity-corrected and then classified among the multiple sequencing templates.
  • the correction parameters used in the intensity correction include at least one of a crosstalk correction parameter and a phase shift correction parameter.
  • the correction parameter is determined by the following steps:
  • the crosstalk correction parameter of the given base channel is determined based on the base calling results of the multiple crosstalk correction parameter reference sample points, and the phase shift correction parameter of the given base channel is determined based on the base calling results of the multiple phase shift correction parameter reference sample points, wherein the base calling results include the signal intensity values of each base channel in each round of sequencing reaction.
  • the high-confidence composite sample point is the composite sample point in which the base calling result is only one base in a given round of sequencing reactions in the multiple rounds of sequencing reactions.
  • the crosstalk correction parameter reference sample point is the composite sample point that satisfies the following condition: in the given round of sequencing reaction, the base calling result of the composite sample point is a single base that differs from the given base.
  • the phase shift correction parameter reference sample point is the composite sample point that satisfies the following conditions: (a) in the given round of sequencing reaction, the base calling result of the composite sample point is a single base that differs from the given base; and (b) in at least one of the round before or the round after the given round of sequencing reaction, the base calling result of the composite sample point is only the given base.
  • if, in the round before the given round, the base calling result of the composite sample point is only the given base, the phase shift correction parameter reference sample point is a lagging phase shift correction parameter reference sample point;
  • if, in the round after the given round, the base calling result of the composite sample point is only the given base, the phase shift correction parameter reference sample point is an advanced phase shift correction parameter reference sample point.
  • the crosstalk correction parameter is obtained by training the following formula using the signal intensity value of each base channel in the crosstalk correction parameter reference sample point:
  • B1, B2, B3 and B4 represent one of the base A channel, the base T channel, the base G channel and the base C channel, respectively, where B1 represents a given base channel;
  • N represents the number of the given round;
  • yi (B1, N) represents the signal intensity value of the given base channel B1 in the given round N;
  • Xi (B2, N), Xi (B3, N) and Xi (B4, N) respectively represent the signal intensity values of base channels B2, B3 and B4 in the given round N;
  • β0, β1, β2 and β3 represent the crosstalk correction parameters for the given base channel, and ε represents the error parameter.
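One linear form consistent with the symbol definitions above and with training by a multiple linear regression (MLR) model is sketched below. It is an assumption reconstructed from those definitions, not a quotation of the published formula; the Greek letter names β and ε and the reading of i as the index of a crosstalk correction parameter reference sample point are likewise assumed.

```latex
y_i^{(B_1,N)} = \beta_0 + \beta_1 X_i^{(B_2,N)} + \beta_2 X_i^{(B_3,N)} + \beta_3 X_i^{(B_4,N)} + \varepsilon
```

Under this reading, β1, β2 and β3 measure how much of the signal in channels B2, B3 and B4 leaks into the given channel B1, and β0 is a constant background term.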
  • the phase shift correction parameter further includes at least one of a lagging phase shift correction parameter and an advancing phase shift correction parameter
  • the phase shift correction parameter is obtained by training the following formula using the signal intensity value of each base channel in the phase shift correction parameter reference sample point:
  • B1 represents a given base channel
  • M represents the number of the given round
  • M+1 represents the number of the round after the given round
  • M-1 represents the number of the round before the given round.
  • β01 and β4 represent the lagging phase shift correction parameters for a given base channel
  • β02 and β5 represent the advanced phase shift correction parameters for a given base channel.
  • the formula is trained using an MLR model.
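A pair of linear forms consistent with the symbol definitions above (B1, M, M+1, M-1 and the lagging and advanced parameters) is sketched below. This is an assumption drawn from those definitions rather than the published formula; the symbols β and ε, the index i over reference sample points, and the split into two separate regressions are all assumed.

```latex
\begin{aligned}
y_i^{(B_1,M)} &= \beta_{01} + \beta_4 \, X_i^{(B_1,M-1)} + \varepsilon \quad \text{(lagging)} \\
y_i^{(B_1,M)} &= \beta_{02} + \beta_5 \, X_i^{(B_1,M+1)} + \varepsilon \quad \text{(advanced)}
\end{aligned}
```

In this reading, β4 is the share of the previous round's B1 signal that carries over into round M, and β5 is the share of the next round's B1 signal that appears early in round M.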
  • the present invention provides a sequencing system.
  • the system comprises:
  • a chip wherein the chip has a plurality of composite template sample points, and a plurality of sequencing templates are arranged in the composite template sample points;
  • a detection device used for hybridizing the multiple sequencing templates with their corresponding sequencing primers, and based on the sequencing primers, synchronously performing multiple rounds of sequencing reactions on the multiple sequencing templates hybridized with the sequencing primers, and in each round of sequencing reactions, the signal intensities generated by the multiple sequencing templates are different;
  • the analysis device is used for classifying the signals of the sequencing channels among the multiple sequencing templates based on the difference of the signal intensities for each round of the multiple sequencing reactions.
  • the above sequencing system may further include at least one of the following additional technical features:
  • the multiple sequencing templates in the chip are located at different positions of the same nucleic acid molecule.
  • the multiple sequencing templates in the chip are located on different nucleic acid molecules.
  • the multiple sequencing templates in the chip are located at different positions of the same single-stranded DNA molecule.
  • the multiple sequencing templates in the chip are located on different chains of multiple single-stranded DNA molecular complexes.
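  • the multiple sequencing templates include: a single-stranded DNA molecule formed by rolling circle amplification through extension of a rolling circle amplification primer; and
  • a DNA molecule obtained by performing multiple displacement amplification on the single-stranded DNA molecule through extension of the multiple displacement amplification primer.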
  • the plurality of sequencing templates include DNA clusters obtained by PCR amplification.
  • the signal intensities generated by the multiple sequencing templates in the sequencing device have known relationship differences.
  • the known relationship is determined by the following method: controlling the copy number difference of the multiple sequencing templates or controlling the sequencing primer concentration difference of the multiple sequencing templates.
  • controlling the copy number difference of the multiple sequencing templates is achieved by controlling the difference in the concentration of primers for different templates or the time of polymerization extension reaction during the construction of the sequencing library.
  • the rolling circle amplification primer is fixed on the chip or is free in a solution on the surface of the chip.
  • the multiple displacement amplification primers are fixed on the chip or are free in a solution on the surface of the chip.
  • the rolling circle amplification and multiple displacement amplification are performed simultaneously on the chip.
  • the rolling circle amplification reaction is first performed on the chip, and then the multiple displacement primers are hybridized to perform the multiple displacement reaction.
  • the sequencing strand can be eluted to restore the sequencing template to a single-stranded state and perform a repeated sequencing reaction.
  • step-by-step sequencing can also be performed, wherein the second strand can be sequenced first and then the first strand, or the first strand can be sequenced first and then the second strand.
  • the analysis device further comprises an intensity correction module for performing intensity correction on the signal generated by each channel before classifying the signal of each sequencing channel among the multiple sequencing templates in each round of sequencing reaction.
  • the analysis device further comprises at least one of a crosstalk correction parameter acquisition module and a phase shift correction parameter acquisition module, wherein:
  • the crosstalk correction parameter acquisition module is used to determine the crosstalk correction parameter of a given base channel based on the base call results of each base channel in each round of sequencing reaction at multiple crosstalk correction parameter reference sample points,
  • the phase shift correction parameter acquisition module is used to determine the phase shift correction parameter of a given base channel based on the base call results of each base channel in each round of sequencing reaction at multiple phase shift correction parameter reference sample points,
  • the base calling result includes the signal intensity value of each base channel in each round of sequencing reaction.
  • the analysis device may further include: a high-confidence composite sample point determination module, configured to determine a plurality of high-confidence composite sample points from the plurality of composite template sample points based on the base calling result.
  • the analysis device may further include: a crosstalk correction parameter reference sample point determination module, configured to determine, for a given base channel, a plurality of crosstalk correction parameter reference sample points from the plurality of high-confidence composite sample points.
  • the high-confidence composite sample point is the composite sample point whose base calling result is only one base in a given round of sequencing reaction in each round of sequencing reaction.
  • the crosstalk correction parameter reference sample point is the composite sample point that satisfies the following condition: in the given round of sequencing reaction, the base calling result of the composite sample point is a single base that differs from the given base.
  • the analysis device may further include a phase shift correction parameter reference sample point determination module, which is used to determine multiple phase shift correction parameter reference sample points from the multiple high-confidence composite sample points for a given base channel.
  • the present invention provides a computer device.
  • the computer device includes a memory, a controller and a processor;
  • the memory is used for storing a program;
  • the controller is used for executing the program in the memory to control the sequencing reaction;
  • the processor is used for executing the program stored in the memory to implement the sequencing method described in the first aspect.
  • the present invention provides a computer-readable storage medium.
  • the storage medium stores a program, and the program can be executed by a processor to implement the sequencing method described in the first aspect.
  • FIG. 1 shows a schematic flow chart of a method for simultaneous sequencing of multiple template nucleic acids according to an embodiment of the present invention;
  • FIG. 2 shows a schematic flow chart of a method for simultaneous sequencing of multiple template nucleic acids according to an embodiment of the present invention;
  • FIG. 3 shows a schematic flow chart of a method for simultaneous sequencing of multiple template nucleic acids according to another embodiment of the present invention;
  • FIG. 4 shows a schematic flow chart of a method for simultaneous sequencing of multiple template nucleic acids according to an embodiment of the present invention;
  • FIG. 5 shows a structural device diagram of a system for simultaneous sequencing of multiple template nucleic acids according to an embodiment of the present invention;
  • FIG. 6 shows another structural device diagram of a system for simultaneous sequencing of multiple template nucleic acids according to an embodiment of the present invention;
  • FIG. 7 shows another structural device diagram of a system for simultaneous sequencing of multiple template nucleic acids according to an embodiment of the present invention;
  • FIG. 8 shows another structural device diagram of a system for simultaneous sequencing of multiple template nucleic acids according to an embodiment of the present invention;
  • FIG. 9 shows a result diagram of a sequencing experiment scheme of an embodiment of the present invention: simultaneous sequencing of the first strand and the second strand;
  • FIG. 10 shows the result of sequencing the second strand first and the first strand later in sequencing experiment scheme 1 of an embodiment of the present invention;
  • FIG. 11 shows the result of simultaneous sequencing of the first strand and the second strand in sequencing experiment scheme 2 of an embodiment of the present invention;
  • FIG. 12 shows how the raw intensity changes with the number of cycles in the simultaneous sequencing of the first strand and the second strand in sequencing experiment scheme 3 of an embodiment of the present invention;
  • FIG. 13A shows scheme 1, in which the signal difference is achieved by controlling the copy number of sequencing templates, using the bridge amplification method of sequencing experiment scheme 4 of an embodiment of the present invention;
  • FIG. 13B shows scheme 2, in which the signal difference is achieved by controlling the concentration of sequencing primers, using the bridge amplification method of sequencing experiment scheme 4 of an embodiment of the present invention;
  • FIG. 14 shows the results of simultaneous sequencing of the first and second strands, with the signal difference achieved by primer dilution, in sequencing experiment scheme 4 of an embodiment of the present invention.
  • the terms “first” and “second” are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, features defined as “first” and “second” may explicitly or implicitly include at least one of those features. In the description of the present invention, “plurality” means at least two, such as two, three, etc., unless otherwise clearly and specifically defined.
  • the endpoints of the ranges and any values disclosed herein are not limited to the precise ranges or values, and these ranges or values should be understood to include values close to them.
  • for numerical ranges, the endpoint values of each range, the endpoint values of each range and individual point values, and the individual point values themselves can be combined with each other to obtain one or more new numerical ranges, which should be regarded as specifically disclosed herein.
  • paired-end sequencing refers to sequencing the 5’ and 3’ ends of the same DNA molecule separately.
  • the present invention provides a sequencing method.
  • the method comprises:
  • in nucleic acid sequencing, multiple composite template sample points are present at the same time, and multiple sequencing templates are arranged in each composite template sample point. Before multiple rounds of sequencing reactions are carried out simultaneously on the multiple sequencing templates, the multiple sequencing templates are hybridized with their corresponding sequencing primers.
  • the present invention does not strictly limit the number of sequencing templates for simultaneous sequencing. Usually, a single composite sample point needs to have at least two templates, namely the nucleic acid to be tested and the label sequence, wherein the label sequence plays a role of label identification. When there is a double label sequence, three templates are required.
  • there is no strict limitation on the positions of the multiple sequencing templates: they can be located at different positions of the same nucleic acid molecule, on different nucleic acid molecules, at different positions of the same single-stranded DNA molecule (such as a DNA nanoball or a linear DNA single-stranded molecule), or on different chains of a complex of multiple single-stranded DNA molecules, etc., and can be flexibly selected according to actual conditions.
  • the term "composite template" as used in the present disclosure refers to a sequencing template containing at least one previously amplified template. The amplification method is not strictly limited and may be rolling circle amplification, bridge amplification, etc. That is, the composite template may be a single-stranded DNA molecule (such as a DNA nanoball) formed by rolling circle amplification through extension of a rolling circle amplification primer, together with a DNA molecule obtained by multiple displacement amplification of that single-stranded DNA molecule through extension of a multiple displacement amplification primer; it may also be a DNA cluster obtained by PCR amplification performed on the chip surface.
  • the primers for rolling circle amplification are fixed on a solid support or a chip.
  • the primers for rolling circle amplification are free in the solution on the surface of the chip.
  • the primers for the multiple displacement amplification are fixed on a solid support or a chip.
  • the primers for the multiple displacement amplification are free in the solution on the chip surface.
  • the rolling circle amplification and multiple displacement amplification are performed simultaneously in the same reaction system.
  • the rolling circle amplification reaction is performed first, and then the multiple displacement primers are hybridized to perform a multiple displacement reaction.
  • the signals of the sequencing channels are classified among the multiple sequencing templates to determine the nucleotide sequences of the multiple sequencing templates at each composite template sample point.
  • the signal intensities generated by the multiple sequencing templates have known differences.
  • the inventors can distinguish the sequenced bases and assign them to the first strand or the second strand by controlling the signal quantities generated by the multiple sequencing templates.
  • the signal intensity difference is determined by the following methods: controlling the copy number difference of different sequencing templates or controlling the concentration difference of sequencing primers of multiple sequencing templates to control the signal quantity relationship generated by different sequencing templates.
  • the copy number difference of the sequencing template can be controlled by controlling the time of the polymerization extension reaction in the process of constructing the sequencing library, or the copy number difference of the sequencing template can be controlled by controlling the concentration difference of different sequencing template primers.
  • the following two methods can be used to differentiate the signal quantities generated by different sequencing templates. Method (1): the amplification primers on the chip surface are modified so that one primer carries no cleavable group while the other primer is doped with a certain proportion of primers carrying a cleavable group (for example, cleavable by an enzyme or by light); after amplification is completed, the copies of the sequencing template that contain the cleavable group are removed, thereby controlling the signal quantity relationship between the two strands. Method (2): during sequencing, the concentration ratio of the two sequencing primers is controlled, or the number of primers able to undergo the polymerization extension reaction is controlled by adding a blocking group to the 3' end of the sequencing primer.
  • the 3' end of the amplification primer is blocked by the blocking group so that it cannot be extended under the action of the polymerase.
  • the blocked amplification primer is a primer with a phosphorylated 3' end, in the presence of which the nucleotide chain cannot be synthesized, and after dephosphorylation, the blocked amplification primer can be extended under the action of a polymerase.
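For method (1), the proportion of cleavable primers sets the copy-number ratio between the two strands. The sketch below is a rough illustration only: it assumes both primers amplify with equal efficiency and that cleavage removes every copy carrying the cleavable group, neither of which is stated in the text. Under those idealisations a cleavable fraction f leaves 1 - f of the weaker template, so a target fold difference k needs f = 1 - 1/k. The function name is hypothetical.

```python
def cleavable_fraction_for_ratio(fold_difference: float) -> float:
    """Fraction of cleavable primers needed for a target strong:weak copy ratio.

    Idealised assumptions (not from the source): equal amplification
    efficiency for both primers and complete removal of cleavable copies.
    """
    if fold_difference < 1.0:
        raise ValueError("fold_difference must be >= 1")
    return 1.0 - 1.0 / fold_difference


# Print the doping fraction for a few target fold differences.
for k in (2, 4, 5):
    print(k, cleavable_fraction_for_ratio(k))
```

In practice the fraction would have to be calibrated against measured signal intensities, since real amplification and cleavage are not ideal.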
  • the signal intensities generated by the multiple sequencing templates differ by at least two times.
  • a person skilled in the art can set a suitable signal intensity difference multiple as needed, which may be 2 times, 3 times, 4 times, 5 times, 7 times, 9 times, 11 times, 15 times, 20 times, 25 times, 30 times, etc.
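The effect of the fold difference can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the signals of the two templates add linearly in a channel, so that a channel shows one of four totals: neither template, weak only, strong only, or both. The function names and the unit scaling are hypothetical.

```python
def channel_levels(fold_difference: float, weak_unit: float = 1.0) -> list:
    """Totals one channel can show at a two-template sample point.

    Order before sorting: neither template, weak only, strong only, both.
    Linear addition of the two signals is an illustrative assumption.
    """
    strong = fold_difference * weak_unit
    return sorted([0.0, weak_unit, strong, weak_unit + strong])


def min_separation(fold_difference: float, weak_unit: float = 1.0) -> float:
    """Smallest gap between adjacent levels; 0 means two states coincide."""
    levels = channel_levels(fold_difference, weak_unit)
    return min(b - a for a, b in zip(levels, levels[1:]))


for k in (1.0, 1.5, 2.0, 5.0):
    print(k, channel_levels(k), min_separation(k))
```

With no difference (fold difference 1) the weak-only and strong-only states coincide, so a base cannot be assigned to a template. Under this simple model the separation grows until the fold difference reaches 2 and is then limited by the weak template's own signal.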
  • S30: classifying the signal of each sequencing channel among the multiple templates.
  • the signals of the sequencing channels are classified among the multiple sequencing templates based on the differences in the signal intensities.
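A minimal sketch of this classification for two templates at one sample point in one round is given below. It assumes the channel intensities have already been corrected and scaled so that the weaker template contributes about `weak_unit` and the stronger one a known multiple of it; the function name, the scaling and the 0.5 threshold are hypothetical choices, not values from the text.

```python
from typing import Dict, Tuple


def classify_two_templates(intensities: Dict[str, float],
                           weak_unit: float = 1.0) -> Tuple[str, str]:
    """Return (base of the strong template, base of the weak template).

    The brightest channel is assigned to the strong template. If a second
    channel exceeds half of the weak template's expected signal, it is
    assigned to the weak template; otherwise both templates are taken to
    share the brightest base.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, _), (second_base, second_value) = ranked[0], ranked[1]
    if second_value >= 0.5 * weak_unit:
        return top_base, second_base
    return top_base, top_base


# Two different bases: A at about 2 units (strong), G at about 1 unit (weak).
print(classify_two_templates({"A": 2.1, "C": 0.05, "G": 0.95, "T": 0.02}))
# Same base on both templates: one channel at about 3 units.
print(classify_two_templates({"A": 0.03, "C": 3.05, "G": 0.10, "T": 0.00}))
```

A production base caller would typically use calibrated per-channel thresholds or a statistical model instead of a fixed cut-off.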
  • signal crosstalk between channels is also referred to as "Crosstalk".
  • Crosstalk leads to inaccurate detection results for each base channel.
  • This deviation has a particularly significant impact on synchronous sequencing: the two bases at a sample point can no longer be distinguished accurately, which may make the sequencing results unusable. Crosstalk correction of the signal intensity of each base channel in the image obtained from the sequencing reaction is therefore required.
  • the above-mentioned multiple channels can be four channels, three channels or two channels.
  • phase shift is also called "Phasing".
  • the signal generated by each channel is intensity corrected and then classified among the multiple sequencing templates, wherein the correction parameters used in the intensity correction include at least one of a crosstalk correction parameter and a phase shift correction parameter to remove noise and realize the interpretation of bases in synchronous sequencing.
  • the step of determining the correction parameters includes: based on the base calling results, determining a plurality of high-confidence composite sample points from the plurality of composite template sample points; for the channel of a given base, determining a plurality of crosstalk correction parameter reference sample points and a plurality of phase shift correction parameter reference sample points from the plurality of high-confidence composite sample points; and for the channel of the given base, determining the crosstalk correction parameters of the given base channel based on the base calling results of the plurality of crosstalk correction parameter reference sample points, and determining the phase shift correction parameters of the given base channel based on the base calling results of the plurality of phase shift correction parameter reference sample points, wherein the base calling results include the signal intensity values of each base channel in each round of sequencing reaction.
  • the high-confidence composite sample point is the composite sample point in which the base calling result is only one base in a given round of sequencing reactions in the multiple rounds of sequencing reactions.
  • the "high-confidence composite sample points" are a collection of multiple high-confidence sample points, which are mainly sample points with little signal interference. Since there is strong optical crosstalk between the A base and the T base, and between the C base and the G base, a sample point whose base calling result in a given round of sequencing reaction is only one base, such as AA, TT, GG or CC, can be used as a high-confidence sample point. In this way, the accuracy of the determined correction parameters can be guaranteed.
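The selection rule can be sketched as a simple filter. The data layout (a mapping from a sample point identifier to its two-letter call in the given round) and the function name are assumptions made for illustration.

```python
from typing import Dict, List


def high_confidence_points(calls: Dict[str, str]) -> List[str]:
    """Keep sample points whose call in the given round is a single base.

    `calls` maps a sample point id to the bases called for its templates in
    one round, for example "AA" (both templates read A) or "AG".
    """
    return [spot for spot, call in calls.items() if len(set(call)) == 1]


round_calls = {"s1": "AA", "s2": "AG", "s3": "CC", "s4": "TG"}
print(high_confidence_points(round_calls))
```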
  • the crosstalk correction parameter reference sample point includes the composite sample point that satisfies the following condition: in the given round of sequencing reaction, the base calling result of the composite sample point is a single base that differs from the given base.
  • the crosstalk correction parameter is obtained by training the following formula using the signal intensity value of each base channel in the crosstalk correction parameter reference sample point:
  • B1, B2, B3 and B4 represent one of the base A channel, the base T channel, the base G channel and the base C channel, respectively, where B1 represents a given base channel;
  • N represents the number of the given round;
  • yi (B1, N) represents the signal intensity value of the given base channel B1 in the given round N;
  • Xi (B2, N), Xi (B3, N) and Xi (B4, N) respectively represent the signal intensity values of base channels B2, B3 and B4 in the given round N;
  • β0, β1, β2 and β3 represent the crosstalk correction parameters for the given base channel, and ε represents an error parameter.
  • the formula is trained using an MLR model.
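The MLR training step can be sketched with ordinary least squares. The example assumes the linear form y = b0 + b1·X(B2) + b2·X(B3) + b3·X(B4) + error, which matches the symbol definitions above but is not quoted from the source. The function names, the demo data and the idea of subtracting the predicted leak to correct a signal are all hypothetical. The solver uses the normal equations and assumes the reference points are not degenerate.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with intercept; returns [b0, b1, ..., bp]."""
    rows = [[1.0, *r] for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(aug[k][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for k in range(p):
            if k != c:
                f = aug[k][c] / aug[c][c]
                aug[k] = [a - f * b for a, b in zip(aug[k], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def remove_crosstalk(y: float, others: Sequence[float], beta: Sequence[float]) -> float:
    """Subtract the predicted leak of channels B2..B4 from the B1 signal."""
    b0, b1, b2, b3 = beta
    return y - (b0 + b1 * others[0] + b2 * others[1] + b3 * others[2])


# Synthetic reference points: (X_B2, X_B3, X_B4) and the dark B1 channel signal.
X_ref = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (2, 1, 3)]
y_ref = [0.1 + 0.3 * a + 0.05 * b + 0.02 * c for a, b, c in X_ref]
beta = fit_mlr(X_ref, y_ref)
print([round(v, 6) for v in beta])
```

The reference points are those where the given channel should be dark, so whatever signal it shows can be attributed to the other three channels plus background.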
  • the phase shift correction parameter reference sample point is the composite sample point that satisfies the following conditions: (a) in the given round of sequencing reaction, the base calling result of the composite sample point is a single base that differs from the given base; and (b) in at least one of the round before or the round after the given round of sequencing reaction, the base calling result of the composite sample point is only the given base.
  • AA, TT, GG, and CC in the previous round or the subsequent round can be used as the phase shift correction parameter reference sample points.
  • in (b), if the base call result of the composite sample point is only the given base in the round before the given round of sequencing reaction, the phase shift correction parameter reference sample point is a lagging phase shift correction parameter reference sample point; or, in (b), if the base call result of the composite sample point is only the given base in the round after the given round of sequencing reaction, the phase shift correction parameter reference sample point is an advanced phase shift correction parameter reference sample point.
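The selection rules for lagging and advanced reference sample points can be sketched as follows. This is a minimal illustration that assumes each composite sample point's call in a round is stored as a two-letter string such as "AA" or "AC" (one letter per template); that representation and the function names are hypothetical.

```python
from typing import List


def is_high_confidence(call: str) -> bool:
    """A composite call is high-confidence when both templates read the same base."""
    return len(call) == 2 and call[0] == call[1]


def phase_reference_kinds(calls: List[str], round_idx: int, base: str) -> List[str]:
    """Return which kinds of phase shift reference ('lag', 'advance') a point is.

    calls     -- per-round composite calls for one sample point
    round_idx -- index of the given round
    base      -- the given base whose channel is being corrected
    """
    current = calls[round_idx]
    # Condition (a): the given round is a single base different from the given base.
    if not is_high_confidence(current) or current[0] == base:
        return []
    target = base * 2
    kinds = []
    # Condition (b), previous round is the given base: lagging reference.
    if round_idx > 0 and calls[round_idx - 1] == target:
        kinds.append("lag")
    # Condition (b), next round is the given base: advanced reference.
    if round_idx + 1 < len(calls) and calls[round_idx + 1] == target:
        kinds.append("advance")
    return kinds


print(phase_reference_kinds(["CC", "AA", "GG"], 1, "C"))
print(phase_reference_kinds(["GG", "TT", "CC"], 1, "C"))
```

A point can qualify as both kinds when the rounds before and after the given round are both the given base, which is why the sketch returns a list.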
  • the phase shift correction parameter further includes at least one of a lagging phase shift correction parameter and an advancing phase shift correction parameter
  • the phase shift correction parameter is obtained by training the following formula using the signal intensity value of each base channel in the phase shift correction parameter reference sample point: $y_i^{(B1,M)} = \beta_{01} + \beta_4 X_i^{(B1,M-1)}$ or $y_i^{(B1,M)} = \beta_{02} + \beta_5 X_i^{(B1,M+1)}$
  • B1 represents a given base channel
  • M represents the number of the given round
  • M+1 represents the number of the round after the given round
  • M-1 represents the number of the round before the given round.
  • $\beta_{01}$ and $\beta_4$ represent the lagging phase shift correction parameters for a given base channel
  • $\beta_{02}$ and $\beta_5$ represent the advanced phase shift correction parameters for a given base channel.
  • an MLR model is used to train the above formula to obtain the crosstalk correction parameter and the phase shift correction parameter.
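Each phase shift formula has a single predictor, so the fit reduces to a simple linear regression of the given channel's signal in round M on its signal in round M-1 (lagging) or M+1 (advanced). The sketch below uses synthetic values; the function name `fit_line` and the numbers are illustrative assumptions.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


# Lagging case: given-channel intensity in round M-1 versus the residual
# given-channel signal in round M at lagging reference sample points.
prev_round = [800.0, 950.0, 600.0, 1020.0, 720.0]
curr_from_lag = [29.0, 33.5, 23.0, 35.6, 26.6]  # synthetic: 5 + 0.03 * prev_round
beta01, beta4 = fit_line(prev_round, curr_from_lag)

# Advanced case: given-channel intensity in round M+1 versus round M
# at advanced reference sample points.
next_round = [780.0, 900.0, 640.0, 1000.0]
curr_from_advance = [19.6, 22.0, 16.8, 24.0]  # synthetic: 4 + 0.02 * next_round
beta02, beta5 = fit_line(next_round, curr_from_advance)

print(round(beta01, 4), round(beta4, 4), round(beta02, 4), round(beta5, 4))
```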
  • the signal of the C channel in the non-luminous state is caused by crosstalk from the other channels and by lag/runon (lagging/advance) from the previous and next rounds. Therefore, select the points where the current round is called GG, AA or TT, and calculate the Phasing coefficients of each channel to C.
  • the calculation of the Phasing coefficient involves the signal values of the previous and next rounds, and avoids signal interference from the current round.
  • to calculate the signal advance, select the points where the current round N is pre-called AA or TT, the previous round N-1 is called AA or TT, and the next round N+1 is called CC, and calculate the lag/runon coefficient on the C channel.
  • the calculation of other channels is similar.
  • the sequencing method comprises:
  • Step a obtaining a single-stranded circular nucleic acid
  • Step b using single-stranded circular nucleic acid as a template to perform rolling circle amplification (RCA) or reverse transcription reaction to form DNA nanoballs (DNBs).
  • Step c Mix the DNB and buffer and load them onto the sequencing slide; then hybridize the multiple displacement amplification primer (MDA primer).
  • Step d Pump in a specific buffer to allow the DNBs on the chip to continue rolling circle amplification, extending the 3' end of the template strand.
  • the extended template is then used as the first-strand sequencing template.
  • Step e Use the MDA primer hybridized in step c to perform multiple displacement amplification to generate a double-stranded template.
  • the first-strand and second-strand sequencing templates are thus generated, and templates with different copy numbers can be obtained by controlling the reaction time of rolling circle amplification.
  • these templates produce the signal difference between the first and second strands.
  • Step f Hybridize the first-strand and second-strand primers simultaneously
  • Step g Perform simultaneous sequencing of the first and second strands
  • the hybridized first-strand and second-strand primers can be eluted to restore the single-stranded first-strand and second-strand templates, so that sequencing can be repeated: either the second strand first and then the first strand, or the first strand first and then the second strand, to achieve repeated cycle sequencing.
  • the method includes:
  • Step a Obtain single-stranded circular DNA
  • Step b Mix the single-stranded circular DNA and buffer, load them onto a sequencing slide, fix the rolling circle amplification primers on the chip, and perform rolling circle amplification (RCA) using the single-stranded circular DNA as a template to form a single-stranded DNA molecule as a single-strand sequencing template.
  • Step c hybridization of multiple displacement amplification primers (MDA primers), which are free in the solution on the chip surface.
  • Step d MLG generates a strand of sequencing template.
  • Step e Hybridize a first-strand sequencing primer with a blocked 3' end to the first-strand template.
  • Step f Use the MDA primer hybridized in step c to perform multiple displacement amplification to generate a double-stranded template.
  • Step g Deblock the first-strand sequencing primer.
  • Step h Hybridize the second-strand primer.
  • Step i Perform simultaneous sequencing of the first and second strands.
  • the method includes:
  • Step a Using single-stranded circular DNA as template,
  • Step b Mix the single-stranded circular DNA and buffer, load it onto a sequencing slide, and perform rolling circle amplification (RCA) to form DNA nanoballs (DNBs).
  • Step c hybridizing a multiple displacement amplification primer (MDA primer), wherein the multiple displacement primer is fixed on a sequencing chip,
  • Step d Use the MDA primer hybridized in step c to perform multiple displacement amplification to generate a double-stranded template.
  • Step e Load a specific buffer to allow the DNB on the chip to continue rolling circle amplification, extending the 3' end of the template strand.
  • the extended template is then used as the first-strand sequencing template.
  • the first-strand and second-strand sequencing templates are thus generated.
  • these templates produce the signal difference between the first and second strands.
  • Step f Hybridize the first-strand and second-strand primers simultaneously
  • Step g Perform simultaneous sequencing of the first and second strands.
  • for paired-end sequencing, the second strand can be sequenced first and then the first strand, or the first strand first and then the second strand.
  • bridge PCR amplification can also be performed to generate DNA clusters.
  • the method includes:
  • Step a Using double-stranded DNA library as template,
  • Step b Perform bridge PCR amplification on the sequencing chip to generate DNA clusters.
  • Step c Use irradiation or an enzyme digestion reaction to remove part of the template from each DNA cluster, forming single-stranded first and second DNA strands.
  • Step d hybridizing the first-strand and second-strand primers simultaneously, wherein the concentration ratio of the first-strand and second-strand primers has a multiple difference, for example: 1:2, 1:3, 1:4, etc.
  • Step e Perform simultaneous sequencing of the first and second strands.
  • the method includes:
  • Step a Using double-stranded DNA library as template,
  • Step b Perform bridge PCR amplification on the sequencing chip to generate DNA clusters.
  • Step c Irradiation, enzyme digestion or other reactions are used to remove the second strand from each double-stranded DNA to form a single-stranded DNA strand with the same number of strands.
  • Step d simultaneously hybridizing the insert sequencing primer and the tag sequence sequencing primer, wherein the concentration ratio of the insert sequencing primer and the tag sequence sequencing primer has a multiple difference, for example: 1:2, 1:3, 1:4, etc.
  • Step e Perform simultaneous sequencing of the insert fragment sequencing primer and the tag sequence sequencing primer.
  • the present invention provides a sequencing system, as shown in FIG. 5, the system comprising:
  • Chip 100 is a sequencing chip having multiple composite template sample points, wherein multiple sequencing templates are arranged in the composite template sample points;
  • the detection device 200 is used to hybridize the multiple sequencing templates with their corresponding sequencing primers, and based on the sequencing primers, synchronously perform multiple rounds of sequencing reactions on the multiple sequencing templates hybridized with the sequencing primers, and in each round of sequencing reactions, the signal intensities generated by the multiple sequencing templates are different;
  • the analysis device 300 is used to classify the signals of the sequencing channels among the multiple sequencing templates based on the differences in the signal intensities for each round of the multiple sequencing reactions.
  • the multiple sequencing templates in the chip are located at different positions of the same nucleic acid molecule, on different nucleic acid molecules, at different positions of the same single-stranded DNA molecule (such as DNA nanoball), or on different chains of multiple single-stranded DNA molecule complexes, etc., which can be flexibly selected according to actual conditions.
  • the multiple sequencing templates may be single-stranded DNA molecules formed by rolling circle amplification by extending rolling circle amplification primers, DNA molecules obtained by extending multiple displacement amplification primers to perform multiple displacement amplification on the single-stranded DNA molecules, or DNA clusters obtained by PCR amplification.
  • the primers for rolling circle amplification are fixed on a solid support or free in a solution
  • the primers for multiple displacement amplification are fixed on a solid support or free in a solution
  • the rolling circle amplification and multiple displacement amplification may be performed simultaneously in the same reaction system, or the rolling circle amplification reaction may be performed first, and then the multiple displacement primers may be hybridized to perform multiple displacement reactions.
  • the signals of the sequencing channels are classified among the multiple sequencing templates to determine the nucleotide sequences of the multiple sequencing templates at each composite template sample point.
  • the signal intensities generated by the multiple sequencing templates in the sequencing device have known relationship differences.
  • the relationship between the signal quantities generated by the multiple sequencing templates is set by controlling the difference in the copy numbers of the multiple templates or controlling the difference in the concentrations of the sequencing primers of the multiple templates.
  • the known relationship is determined by the following method: controlling the copy number difference of the multiple sequencing templates or controlling the sequencing primer concentration difference of the multiple sequencing templates.
  • controlling the copy number difference of the multiple sequencing templates is achieved by controlling the time of the polymerization extension reaction during the construction of the sequencing library.
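How a known intensity relationship lets the channel signals be assigned to templates can be sketched as follows. This is a minimal illustration for two templates, assuming the intensities have already been corrected and that the stronger template gives roughly twice the signal of the weaker one; the function name, the `floor` threshold and the example values are hypothetical, not the patent's algorithm.

```python
from typing import Dict, Tuple


def split_composite_call(intensity: Dict[str, float], floor: float = 0.2) -> Tuple[str, str]:
    """Return (stronger-template base, weaker-template base) for one point in one round.

    intensity -- corrected signal per base channel, e.g. {"A": 2.0, "C": 1.0, ...}
    floor     -- minimum share of the total signal for a channel to count as lit
    """
    total = sum(intensity.values())
    if total <= 0:
        raise ValueError("no signal at this sample point")
    shares = {base: value / total for base, value in intensity.items()}
    lit = sorted((b for b in shares if shares[b] >= floor),
                 key=lambda b: shares[b], reverse=True)
    if len(lit) == 1:
        # One lit channel: both templates incorporated the same base.
        return lit[0], lit[0]
    if len(lit) == 2:
        # Two lit channels: the brighter one belongs to the stronger template.
        return lit[0], lit[1]
    raise ValueError("ambiguous signal; more than two channels are lit")


print(split_composite_call({"A": 2.0, "C": 1.05, "G": 0.05, "T": 0.02}))
print(split_composite_call({"A": 0.03, "C": 0.04, "G": 2.9, "T": 0.05}))
```

A larger fold difference between templates widens the gap between the two lit channels, which is one reason to set the relationship in advance.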
  • the analysis device includes an intensity correction module 310 for performing intensity correction on the signal generated by each channel before classifying the signal of each sequencing channel among the multiple sequencing templates in each round of sequencing reaction.
  • the analysis device further includes at least one of a crosstalk correction parameter acquisition module 320 and a phase shift correction parameter acquisition module 330.
  • the crosstalk correction parameter acquisition module 320 is used to determine the crosstalk correction parameter of a given base channel based on the base call results of each base channel in each round of sequencing reaction at multiple crosstalk correction parameter reference sample points,
  • the phase shift correction parameter acquisition module 330 is used to determine the phase shift correction parameter of a given base channel based on the base call results of each base channel in each round of sequencing reaction at multiple phase shift correction parameter reference sample points.
  • the base calling result includes the signal intensity value of each base channel in each round of sequencing reaction.
  • the analysis device may further include a high-confidence composite sample point determination module 340, which is used to determine a plurality of high-confidence composite sample points from the plurality of composite template sample points based on the base call results, wherein the high-confidence composite sample points are composite sample points whose base call results are only one base in a given round of sequencing reactions in each round of sequencing reactions;
  • the crosstalk correction parameter reference sample point determination module 350 is used to determine multiple crosstalk correction parameter reference sample points from the multiple high-confidence composite sample points for a given base channel.
  • the crosstalk correction parameter reference sample point is the composite sample point that satisfies the following conditions: in the given round of sequencing reaction, the base call result of the composite sample point is only one base different from the given base.
  • the phase shift correction parameter reference sample point determination module 360 is used to determine multiple phase shift correction parameter reference sample points from the multiple high-confidence composite sample points for a given base channel.
  • the phase shift correction parameter reference sample point is the composite sample point that satisfies the following conditions: (a) in the given round of sequencing reaction, the base call result of the composite sample point is only one base different from the given base; and (b) in at least one of the front round or the rear round of the given round of sequencing reaction, the base call result of the composite sample point is only one of the given bases.
  • phase shift correction parameter reference sample point is a lagging phase shift correction parameter reference sample point
  • the phase shift correction parameter reference sample point is an advanced phase shift correction parameter reference sample point
  • the crosstalk correction parameter acquisition module 320 obtains the crosstalk correction parameter by training the following formula using the signal intensity values of each base channel in the crosstalk correction parameter reference sample point: $y_i^{(B1,N)} = \beta_0 + \beta_1 X_i^{(B2,N)} + \beta_2 X_i^{(B3,N)} + \beta_3 X_i^{(B4,N)} + \epsilon$
  • B1, B2, B3 and B4 each represent one of the base A channel, the base T channel, the base G channel and the base C channel, where B1 represents the given base channel;
  • N represents the number of the given round
  • $y_i^{(B1,N)}$ represents the signal intensity value of the given base channel in the given round N
  • $X_i^{(B2,N)}$, $X_i^{(B3,N)}$ and $X_i^{(B4,N)}$ respectively represent the signal intensity values of base channels B2, B3 and B4 in the given round N,
  • $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ represent the crosstalk correction parameters for the given base channel, and $\epsilon$ represents the error parameter.
  • the phase shift correction parameter acquisition module 330 obtains the phase shift correction parameter by training the following formula using the signal intensity value of each base channel in the phase shift correction parameter reference sample point: $y_i^{(B1,M)} = \beta_{01} + \beta_4 X_i^{(B1,M-1)}$ or $y_i^{(B1,M)} = \beta_{02} + \beta_5 X_i^{(B1,M+1)}$
  • B1 represents a given base channel
  • M represents the number of the given round
  • M+1 represents the number of the round after the given round
  • M-1 represents the number of the round before the given round.
  • $\beta_{01}$ and $\beta_4$ represent the lagging phase shift correction parameters for a given base channel
  • $\beta_{02}$ and $\beta_5$ represent the advanced phase shift correction parameters for a given base channel.
  • the present invention proposes a computer device, the computer device comprising a memory, a controller and a processor; the memory comprises a device for storing programs; the processor comprises a device for executing the program stored in the memory to implement the method described in the first aspect.
  • the computer device according to some specific embodiments of the present invention can effectively implement the method for synchronously sequencing multiple template nucleic acids, perform synchronous sequencing of multiple template nucleic acids, effectively improve sequencing throughput, and greatly save sequencing time and cost.
  • the electronic device can be any intelligent terminal including a tablet computer, a computing cluster, a sequencer, a car computer, etc.
  • the term "memory" used in the present disclosure refers to any computer program product, apparatus, and/or device (e.g., disk, optical disk, memory, programmable logic device (PLD)) for providing machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals.
  • the memory can be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory can store an operating system and other application programs.
  • the relevant program code is stored in the memory, and the processor is called to execute the training method or gene sequencing method of the gene sequencing model of the embodiment of the present application.
  • a crosstalk correction parameter acquisition module and a phase shift correction parameter acquisition module in the memory.
  • the computer device may also include an input/output interface, a communication interface, and a bus.
  • the input/output interface is used to realize information input and output;
  • the communication interface is used to realize communication interaction between the device and other devices, and communication can be realized through wired methods (such as USB, network cable, etc.) or wireless methods (such as mobile networks, WIFI, Bluetooth, etc.);
  • the bus is used to transmit information between various components of the device (such as processors, memory, input/output interfaces and communication interfaces).
  • the present invention provides a computer-readable storage medium, wherein a program is stored in the storage medium, and the program can be executed by a processor to implement the method described in the first aspect.
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server or a data center that includes one or more available media integrated.
  • the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a high-density digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a solid state drive (solid state disk, SSD)), etc.
  • MGISEQ-2000 sequencer MGISEQ-2000 sequencing reagent slide (715nm), mini DNB loading device, PCR instrument, 8-tube PCR, a set of pipettes, high-speed centrifuge, mini centrifuge, vortex mixer.
  • Reagents The main reagents used in this application are shown in Table 1.
  • First-strand sequencing primer IP1-x2 powder with linker | Bioengineering
    Second-strand sequencing primer IP3 | MGI
    DNBReadBuffer (REB) | MGI
    EDTA | -
    Formamide | -
  • Reagent name | Volume | Final concentration
    100 µM first-strand sequencing primer IP1-x1 master solution with linker | 50 µL | 0.5 µM
    100 µM first-strand sequencing primer IP1-x2 master solution with linker | 50 µL | 0.5 µM
    10X Phi29 Buffer | 1 mL | 1X
    Ultra-pure water | 8.9 mL | ---
    Total | 10 mL | ---
  • Reagent name | Volume | Final concentration
    100 µM first-strand barcoding primer BP1 stock solution | 100 µL | 1 µM
    100 µM second-strand barcoding primer BP2 stock solution | 100 µL | 1 µM
    5X SSC buffer | 9.8 mL | ---
    Total | 10 mL | ---
  • Reagent name | Volume | Final concentration
    100 µM first-strand sequencing primer IP1-x1-OP master solution with linker | 50 µL | 0.5 µM
    100 µM first-strand sequencing primer IP1-x2-OP master solution with linker | 50 µL | 0.5 µM
    10X Phi29 Buffer | 1 mL | 1X
    Ultra-pure water | 8.9 mL | ---
    Total | 10 mL | ---
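The final concentrations listed in the primer mix tables follow from the usual dilution relation C1·V1 = C2·V2. The short check below reproduces them from the stock concentrations and volumes given in the tables; the function and variable names are illustrative.

```python
def final_concentration_um(stock_um: float, stock_volume_ul: float,
                           total_volume_ml: float) -> float:
    """Final concentration in µM after diluting a stock into the total volume."""
    total_volume_ul = total_volume_ml * 1000.0
    return stock_um * stock_volume_ul / total_volume_ul


# Sequencing primer mix: 50 µL of 100 µM stock in 10 mL total.
sequencing_primer_um = final_concentration_um(100.0, 50.0, 10.0)
# Barcode primer mix: 100 µL of 100 µM stock in 10 mL total.
barcode_primer_um = final_concentration_um(100.0, 100.0, 10.0)

# The component volumes of the sequencing primer mix should add up to 10 mL:
# two 50 µL primer stocks, 1 mL of 10X buffer and 8.9 mL of water.
sequencing_mix_total_ml = (50.0 + 50.0) / 1000.0 + 1.0 + 8.9

print(sequencing_primer_um, barcode_primer_um, round(sequencing_mix_total_ml, 6))
```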
  • the base reading results of synchronous sequencing need to be corrected.
  • the AA, TT, CC, and GG base combinations with high credibility are screened out.
  • the crosstalk coefficient and phasing coefficient are calculated by the MLR method, and then applied to the signal correction of the entire cycle.
  • the calculation of the phasing coefficient involves the brightness values of the previous and next cycles. Therefore, in MLR, the influencing factors are the signals of the four channels of the current cycle and the brightness values of the related signals of the previous and next cycles.
  • the crosstalk coefficient is calculated as follows: $y_i^{(B1,N)} = \beta_0 + \beta_1 X_i^{(B2,N)} + \beta_2 X_i^{(B3,N)} + \beta_3 X_i^{(B4,N)} + \epsilon$
  • B1, B2, B3 and B4 each represent one of the base A channel, the base T channel, the base G channel and the base C channel, where B1 represents the given base channel;
  • N represents the number of the given round
  • $y_i^{(B1,N)}$ represents the signal intensity value of the given base channel in the given round N
  • $X_i^{(B2,N)}$, $X_i^{(B3,N)}$ and $X_i^{(B4,N)}$ respectively represent the signal intensity values of base channels B2, B3 and B4 in the given round N,
  • $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ represent the crosstalk correction parameters for the given base channel, and $\epsilon$ represents the error parameter.
  • the Phasing coefficient involves the signal values of the previous and next cycles, and is calculated as follows: $y_i^{(B1,M)} = \beta_{01} + \beta_4 X_i^{(B1,M-1)}$ or $y_i^{(B1,M)} = \beta_{02} + \beta_5 X_i^{(B1,M+1)}$
  • B1 represents a given base channel
  • M represents the number of the given round
  • M+1 represents the number of the round after the given round
  • M-1 represents the number of the round before the given round.
  • $\beta_{01}$ and $\beta_4$ represent the lagging phase shift correction parameters for a given base channel
  • $\beta_{02}$ and $\beta_5$ represent the advanced phase shift correction parameters for a given base channel.
  • Figure 9 plots the raw intensity against cycle number for simultaneous sequencing of the first and second strands in multiple replicate wells, where cycles 1 to 50 show the results of simultaneous sequencing of the first and second strands, and cycles 51 to 60 show the results of simultaneous sequencing of barcode 1 and barcode 2.
  • the algorithm results are shown in Table 8.
  • the synchronous sequencing results were split and aligned: for the largest signal, the amount of mappable data was 1,139,578, the mapping rate was 80.93%, and the error rate was 0.75%.
  • for the second largest signal, the amount of mappable data was 213,312, the mapping rate was 15.15%, and the error rate was 3.42%.
  • the overall error rate of the synchronous sequencing results of this experiment was 2%.
  • Figure 10 shows the results of step-by-step sequencing of the first and second strands of the template of this scheme, where cycles 1 to 50 show the second-strand sequencing signal, cycles 51 to 100 show the first-strand signal, and cycles 101 to 110 show the barcode signal.
  • in this scheme, a first-strand sequencing primer with a phosphorylated 3' end is hybridized first and MDA is performed; the blocking primer hybridized to the first strand is then dephosphorylated, and the second-strand primer is hybridized for simultaneous sequencing of the first and second strands (50 bp); finally, the first- and second-strand barcode primers (Barcode Primer Mix) are hybridized for barcode sequencing (10 bp).
  • the correction method of the base calling results of synchronous sequencing is the same as that of experimental scheme 1.
  • the experimental results are shown in Figure 11, where cycles 1 to 50 show the results of simultaneous sequencing of the first and second strands, and cycles 51 to 60 show the results of simultaneous sequencing of barcode 1 and barcode 2.
  • the algorithm results are shown in Table 9.
  • the synchronous sequencing results were split and aligned: for the largest signal, the amount of mappable data was 981,505, the mapping rate was 69.71%, and the error rate was 1.64%.
  • for the second largest signal, the amount of mappable data was 171,611, the mapping rate was 12.18%, and the error rate was 4.67%.
  • the overall error rate of the synchronous sequencing results of this experiment was 3.02%.
  • the results of the synchronous sequencing algorithm are shown in Table 10.
  • the synchronous sequencing results were split and aligned.
  • for the largest signal, the amount of mappable data was 937,468, the mapping rate was 66.58%, and the error rate was 1.39%.
  • for the second largest signal, the amount of mappable data was 500,782, the mapping rate was 35.56%, and the error rate was 2.91%.
  • the overall error rate of the synchronous sequencing results of this experiment was 2.5%.
  • Figure 12 shows the change of raw intensity of the synchronous test with the number of cycles.
  • the signal difference between the first and second strands can also be achieved by controlling the copy number of the template or the concentration of the sequencing primer for simultaneous sequencing.
  • the scheme for controlling the copy number of the sequencing template to achieve the signal difference is shown in FIG. 13A
  • the scheme for controlling the concentration of the sequencing primer to achieve the signal difference is shown in FIG. 13B.
  • the 3' end of the sequencing primer is blocked by the blocking group and cannot be extended under the action of the polymerase.
  • the control of the concentration of the sequencing primer is used as an example for verification, and the specific experimental operation is as follows.
  • a first-strand sequencing primer with a 3'-end blocking group, a conventional first-strand sequencing primer and a second-strand sequencing primer are mixed in a ratio of 1:1:2 to obtain a sequencing primer mixture.
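The intended effect of the 1:1:2 mixture can be illustrated with a small calculation. It rests on simplifying assumptions that are not stated in the source: both strands are present in equal numbers, hybridization is saturating, and blocked and conventional first-strand primers compete equally for the same sites. Under those assumptions only the unblocked share of the hybridized primers on each strand can be extended and give signal.

```python
from fractions import Fraction


def extendable_fraction(blocked_parts: int, unblocked_parts: int) -> Fraction:
    """Share of hybridized primers on one strand that can be extended."""
    return Fraction(unblocked_parts, blocked_parts + unblocked_parts)


# Mixture from the text: blocked first-strand : conventional first-strand : second-strand = 1:1:2
first_strand = extendable_fraction(blocked_parts=1, unblocked_parts=1)
second_strand = extendable_fraction(blocked_parts=0, unblocked_parts=2)

# Expected first-strand to second-strand signal ratio under the stated assumptions.
signal_ratio = first_strand / second_strand
print(first_strand, second_strand, signal_ratio)
```

Under these assumptions the first strand yields about half the signal of the second strand, which is the kind of multiple difference the scheme relies on.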
  • the DNA clusters generated by bridge PCR amplification are then hybridized with the above mixed sequencing primers for simultaneous sequencing.
  • FIG. 14 shows the change of raw intensity with cycle number obtained by synchronous sequencing using the scheme of controlling the concentration of sequencing primers to achieve the signal difference.
  • the correction method of the base call results of synchronous sequencing is the same as that of experimental scheme 1.
  • the results of the synchronous sequencing algorithm are shown in Table 11.
  • the synchronous sequencing results were split and aligned.
  • for the largest signal, the amount of mappable data was 941,401, with a mapping rate of 66.86% and an error rate of 1.41%.
  • for the second largest signal, the amount of mappable data was 513,416, with a mapping rate of 36.46% and an error rate of 4.36%.
  • the overall error rate of the synchronous sequencing results of this experiment was 2.92%.
  • first and second are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, the features defined as “first” and “second” may explicitly or implicitly include at least one of the features. In the description of the present invention, the meaning of “plurality” is at least two, such as two, three, etc., unless otherwise clearly and specifically defined.

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

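The present invention provides a method for synchronous sequencing of multiple template nucleic acids and its application. The method comprises: providing a plurality of composite template sample points, in which multiple sequencing templates are arranged; hybridizing the multiple sequencing templates with their corresponding sequencing primers; performing, based on the sequencing primers, multiple rounds of sequencing reactions on the multiple sequencing templates hybridized with the sequencing primers, wherein in each round of sequencing reaction the signal intensities generated by the multiple sequencing templates differ; and, for each round of the multiple rounds of sequencing reactions, classifying the signals of the sequencing channels among the multiple sequencing templates based on the differences in signal intensity.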

Description

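A method for synchronous sequencing of multiple template nucleic acids and its application
Priority information
Technical field
The present invention relates to the field of nucleic acid sequencing technology and bioinformatics, and in particular to a method for synchronous sequencing of multiple template nucleic acids and its application.
Background
High-throughput sequencing currently mainly comprises single-end sequencing and paired-end sequencing (paired-end/mate-paired, PE/MP). PE/MP sequencing, also called bidirectional sequencing, reads the sequences at both ends of a long fragment; the two end sequences form a "pair", and the distance between them is the insert length. It supports sequence assembly and alignment, is more accurate for duplications, deletions and insertions of gene fragments, and gives broader coverage of the genome. Paired-end and mate-paired differ in the library construction method. Paired-end sequencing has now become mainstream; it increases the sequencing length and also offers a new approach to structural variation analysis.
For paired-end sequencing, different sequencing platforms currently use different schemes, but essentially all of them first sequence one strand of the DNA (the first strand) and, once that is complete, sequence its complementary strand (the second strand). For example, some existing paired-end sequencing technologies mainly use bridge amplification: the DNA is first fragmented, and adapters containing sequencing primer binding sites are added to both ends. After the first round of first-strand sequencing is complete, the template strand of the first round is removed, and a paired-end module guides the complementary strand to regenerate and amplify at the original position to reach the template amount required for the second round, in which the complementary second strand is sequenced by synthesis.
The problem with current paired-end sequencing methods is that the two strands of DNA must be sequenced one after the other, which reduces sequencing throughput and increases sequencing cost. More flexible and efficient sequencing methods therefore still need to be developed.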
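Summary of the invention
The present invention was completed based on the following findings of the inventors:
At present, paired-end sequencing still requires the two strands of DNA to be sequenced one after the other, which limits sequencing throughput and keeps sequencing costs high. In the present application, the inventors use amplification techniques to generate a sequencing chip that carries multiple template nucleic acid molecules at the same time, for example first-strand and second-strand templates (the sense and antisense strands of DNA), and adjust the copy number difference of the multiple templates or the concentrations of their sequencing primers to produce a signal difference between the templates within the same round of sequencing reaction. The detected bases can thereby be distinguished and assigned to the different templates, and crosstalk correction parameters and/or phase shift correction parameters are used during sequencing to correct the base call results. This enables simultaneous sequencing of multiple template nucleic acids; the method greatly reduces sequencing time and cost, increases sequencing throughput, and is suitable for wide application.
Accordingly, in a first aspect, the present invention provides a sequencing method. According to an embodiment of the present invention, the method comprises: providing a plurality of composite template sample points, in which multiple sequencing templates are arranged, and hybridizing the multiple sequencing templates with their corresponding sequencing primers; performing, based on the sequencing primers, multiple rounds of sequencing reactions on the multiple sequencing templates hybridized with the sequencing primers, wherein in each round of sequencing reaction the signal intensities generated by the multiple sequencing templates differ; and, for each round of the multiple rounds of sequencing reactions, classifying the signals of the sequencing channels among the multiple sequencing templates based on the differences in signal intensity.
According to an embodiment of the present invention, the above method can efficiently perform synchronous sequencing of multiple templates, greatly reduces sequencing time and cost, increases sequencing throughput, and is suitable for wide application.
According to an embodiment of the present invention, the above sequencing method may further include at least one of the following additional technical features:
According to an embodiment of the present invention, the signal intensities generated by the multiple sequencing templates differ by a known relationship. The present invention controls the differences in the signal intensities generated by the sequencing templates so that the signals of the sequencing channels can be classified among the multiple sequencing templates; those skilled in the art can set the fold difference in the signal intensities generated by the sequencing templates as needed.
According to an embodiment of the present invention, the known relationship is determined by the following method: controlling the copy number difference of the multiple sequencing templates or controlling the sequencing primer concentration difference of the multiple sequencing templates.
According to an embodiment of the present invention, controlling the copy number difference of the multiple sequencing templates is achieved by controlling the concentrations of the primers for the different templates, or the time of the polymerization extension reaction, during construction of the sequencing library.
According to an embodiment of the present invention, the signal intensities generated by the multiple sequencing templates differ by at least two-fold.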
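According to an embodiment of the present invention, the multiple sequencing templates are located at different positions of the same nucleic acid molecule.
According to an embodiment of the present invention, the multiple sequencing templates are located on different nucleic acid molecules.
According to an embodiment of the present invention, the multiple sequencing templates in the chip are located at different positions of the same single-stranded DNA molecule.
According to an embodiment of the present invention, the multiple sequencing templates in the chip are located on different strands of a complex of multiple single-stranded DNA molecules.
According to an embodiment of the present invention, the multiple sequencing templates include: single-stranded DNA molecules formed by rolling circle amplification through extension of rolling circle amplification primers, and DNA molecule complexes obtained by multiple displacement amplification of the single-stranded DNA molecules through extension of multiple displacement amplification primers.
According to an embodiment of the present invention, the primers for rolling circle amplification are fixed on a solid support or free in solution.
According to an embodiment of the present invention, the primers for multiple displacement amplification are fixed on a solid support or free in solution.
According to an embodiment of the present invention, the rolling circle amplification and the multiple displacement amplification are performed simultaneously in the same reaction system.
According to an embodiment of the present invention, the rolling circle amplification reaction is performed first, and the multiple displacement primers are then hybridized to perform the multiple displacement reaction.
According to an embodiment of the present invention, the signal intensities generated by the multiple sequencing templates at each composite template sample point differ; based on the difference in signal intensity in each round of the multiple rounds of sequencing reactions, the signals of the sequencing channels are classified among the multiple sequencing templates to determine the nucleotide sequences of the multiple sequencing templates at each composite template sample point.
According to an embodiment of the present invention, the multiple sequencing templates include DNA clusters obtained by PCR amplification.
According to an embodiment of the present invention, in each round of sequencing reaction, the signals generated by each channel are intensity-corrected and then classified among the multiple sequencing templates.
According to an embodiment of the present invention, the correction parameters used for the intensity correction include at least one of a crosstalk correction parameter and a phase shift correction parameter.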
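According to an embodiment of the present invention, the correction parameters are determined by the following steps:
determining a plurality of high-confidence composite sample points among the plurality of composite template sample points based on the base call results;
for the channel of a given base, determining a plurality of crosstalk correction parameter reference sample points and a plurality of phase shift correction parameter reference sample points from the plurality of high-confidence composite sample points; and
for the channel of the given base, determining the crosstalk correction parameter of the given base channel based on the base call results of the plurality of crosstalk correction parameter reference sample points, and determining the phase shift correction parameter of the given base channel based on the base call results of the plurality of phase shift correction parameter reference sample points, wherein the base call results include the signal intensity value of each base channel in each round of sequencing reaction.
According to an embodiment of the present invention, the high-confidence composite sample point is a composite sample point whose base call result is only one base in a given round of the multiple rounds of sequencing reactions.
According to an embodiment of the present invention, for a given base channel, the crosstalk correction parameter reference sample point is a composite sample point that satisfies the following condition: in the given round of sequencing reaction, the base call result of the composite sample point is only one base, which is different from the given base.
According to an embodiment of the present invention, for a given base channel, the phase shift correction parameter reference sample point is a composite sample point that satisfies the following conditions:
(a) in the given round of sequencing reaction, the base call result of the composite sample point is only one base, which is different from the given base; and
(b) in at least one of the round before or the round after the given round of sequencing reaction, the base call result of the composite sample point is only the given base.
According to an embodiment of the present invention, in (b), if the base call result of the composite sample point is only the given base in the round before the given round of sequencing reaction, the phase shift correction parameter reference sample point is a lagging phase shift correction parameter reference sample point,
or
in (b), if the base call result of the composite sample point is only the given base in the round after the given round of sequencing reaction, the phase shift correction parameter reference sample point is an advanced phase shift correction parameter reference sample point.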
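According to an embodiment of the present invention, the crosstalk correction parameter is obtained by training the following formula using the signal intensity values of each base channel in the crosstalk correction parameter reference sample points:
$$y_i^{(B1,N)} = \beta_0 + \beta_1 X_i^{(B2,N)} + \beta_2 X_i^{(B3,N)} + \beta_3 X_i^{(B4,N)} + \epsilon,$$
where
B1, B2, B3 and B4 each represent one of the base A channel, the base T channel, the base G channel and the base C channel, where B1 represents the given base channel;
N represents the number of the given round;
$y_i^{(B1,N)}$ represents the signal intensity value of the given base channel in the given round N;
$X_i^{(B2,N)}$, $X_i^{(B3,N)}$ and $X_i^{(B4,N)}$ respectively represent the signal intensity values of base channels B2, B3 and B4 in the given round N,
$\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ represent the crosstalk correction parameters for the given base channel, and $\epsilon$ represents the error parameter.
According to an embodiment of the present invention, the phase shift correction parameter further includes at least one of a lagging phase shift correction parameter and an advanced phase shift correction parameter, and the phase shift correction parameter is obtained by training the following formula using the signal intensity values of each base channel in the phase shift correction parameter reference sample points:
$$y_i^{(B1,M)} = \beta_{01} + \beta_4 X_i^{(B1,M-1)}$$
or
$$y_i^{(B1,M)} = \beta_{02} + \beta_5 X_i^{(B1,M+1)}$$
where B1 represents the given base channel, M represents the number of the given round, M+1 represents the number of the round after the given round, and M-1 represents the number of the round before the given round,
$\beta_{01}$ and $\beta_4$ represent the lagging phase shift correction parameters for the given base channel,
$\beta_{02}$ and $\beta_5$ represent the advanced phase shift correction parameters for the given base channel.
According to an embodiment of the present invention, the formula is trained using an MLR model.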
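In a second aspect, the invention provides a sequencing system. According to an embodiment of the invention, the system includes:
a chip having a plurality of composite-template sample points, each composite-template sample point carrying multiple sequencing templates;
a detection device, configured to hybridize the multiple sequencing templates with their corresponding sequencing primers and, based on the sequencing primers, to carry out multiple rounds of sequencing reactions synchronously on the primer-hybridized templates, wherein in each round the multiple sequencing templates produce signals of different intensity;
an analysis device, configured, for each of the multiple rounds of sequencing reactions, to assign the signals of the sequencing channels among the multiple sequencing templates on the basis of the intensity difference.
According to embodiments of the invention, the sequencing system may further include at least one of the following additional technical features:
According to an embodiment of the invention, the multiple sequencing templates on the chip are located at different positions of the same nucleic acid molecule.
According to an embodiment of the invention, the multiple sequencing templates on the chip are located on different nucleic acid molecules.
According to an embodiment of the invention, the multiple sequencing templates on the chip are located at different positions of the same single-stranded DNA molecule.
According to an embodiment of the invention, the multiple sequencing templates on the chip are located on different strands of a complex of several single-stranded DNA molecules.
According to an embodiment of the invention, the multiple sequencing templates include:
1) a single-stranded DNA molecule formed by RCA through extension of an RCA primer, and
2) a DNA molecule obtained by MDA of the single-stranded DNA molecule through extension of an MDA primer.
According to an embodiment of the invention, the multiple sequencing templates include DNA clusters obtained by PCR amplification.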
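According to an embodiment of the invention, the signal intensities produced by the multiple sequencing templates in the sequencing device differ in a known relationship.
According to an embodiment of the invention, the known relationship is set by controlling the difference in copy number between the multiple sequencing templates, or by controlling the difference in sequencing primer concentration between the multiple sequencing templates.
According to an embodiment of the invention, the copy-number difference is controlled through the difference in concentration of the primers for the different templates during sequencing library construction, or through the duration of the polymerase extension reaction.
According to an embodiment of the invention, the signal intensities produced by the multiple sequencing templates differ by at least a factor of two.
According to an embodiment of the invention, the RCA primer is immobilized on the chip or free in the solution on the chip surface.
According to an embodiment of the invention, the MDA primer is immobilized on the chip or free in the solution on the chip surface.
According to an embodiment of the invention, the RCA and the MDA are carried out simultaneously on the chip.
According to an embodiment of the invention, the RCA reaction is carried out on the chip first, after which the MDA primer is hybridized and the MDA reaction is carried out.
According to an embodiment of the invention, once the synchronous sequencing signals have been acquired, the sequenced strands can be washed off so that the sequencing templates return to the single-stranded state and the sequencing reaction can be repeated.
According to an embodiment of the invention, once a sequencing chip carrying composite sample points has been obtained, stepwise sequencing is also possible: either the second strand is sequenced first and then the first strand, or the first strand first and then the second strand.
According to an embodiment of the invention, the analysis device further includes an intensity correction module, configured, in each round of sequencing reaction, to intensity-correct the signals produced by each channel before the signals of the sequencing channels are assigned among the multiple sequencing templates.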
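According to an embodiment of the invention, the analysis device further includes at least one of a crosstalk correction parameter acquisition module and a phasing correction parameter acquisition module, wherein
the crosstalk correction parameter acquisition module is configured, for a given base channel, to determine the crosstalk correction parameter of that channel from the base-calling results, for each base channel in each round of sequencing reaction, of a plurality of crosstalk-correction reference sample points,
the phasing correction parameter acquisition module is configured, for a given base channel, to determine the phasing correction parameter of that channel from the base-calling results, for each base channel in each round of sequencing reaction, of a plurality of phasing-correction reference sample points,
and the base-calling results include the signal intensity value of each base channel in each round of sequencing reaction.
According to an embodiment of the invention, the analysis device may further include a high-confidence composite sample point determination module, configured to determine a plurality of high-confidence composite sample points among the plurality of composite-template sample points on the basis of the base-calling results.
According to an embodiment of the invention, the analysis device may further include a crosstalk-correction reference sample point determination module, configured, for a given base channel, to determine a plurality of crosstalk-correction reference sample points from the plurality of high-confidence composite sample points.
According to an embodiment of the invention, a high-confidence composite sample point is a composite sample point whose base-calling result, in a given round of the sequencing reactions, contains only one kind of base.
According to an embodiment of the invention, for a given base channel, a crosstalk-correction reference sample point is a composite sample point that satisfies the following condition: in the given round of sequencing reaction, its base-calling result contains only one kind of base, and that base differs from the given base.
According to an embodiment of the invention, the analysis device may further include a phasing-correction reference sample point determination module, configured, for a given base channel, to determine a plurality of phasing-correction reference sample points from the plurality of high-confidence composite sample points.
In a third aspect, the invention provides a computer device. According to an embodiment of the invention, the computer device includes a memory, a controller and a processor; the memory is used to store a program; the controller controls the sequencing reaction by executing the program in the memory; and the processor implements the sequencing method of the first aspect by executing the program stored in the memory.
In a fourth aspect, the invention provides a computer-readable storage medium. According to an embodiment of the invention, the storage medium stores a program that can be executed by a processor to implement the sequencing method of the first aspect.
It should be understood that, within the scope of the invention, the technical features described above and those described in detail below (for example in the examples) may be combined with one another to form new or preferred technical solutions. For brevity, these combinations are not enumerated here.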
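Brief Description of the Drawings
Fig. 1 is a schematic flow chart of the method for synchronous sequencing of multi-template nucleic acids according to an embodiment of the invention;
Fig. 2 is a schematic flow chart of the method for synchronous sequencing of multi-template nucleic acids according to one embodiment of the invention;
Fig. 3 is a schematic flow chart of the method for synchronous sequencing of multi-template nucleic acids according to another embodiment of the invention;
Fig. 4 is a schematic flow chart of the method for synchronous sequencing of multi-template nucleic acids according to yet another embodiment of the invention;
Fig. 5 is a structural diagram of the system for synchronous sequencing of multiple template nucleic acids according to an embodiment of the invention;
Fig. 6 is a further structural diagram of the system for synchronous sequencing of multiple template nucleic acids according to an embodiment of the invention;
Fig. 7 is a further structural diagram of the system for synchronous sequencing of multiple template nucleic acids according to an embodiment of the invention;
Fig. 8 is a further structural diagram of the system for synchronous sequencing of multiple template nucleic acids according to an embodiment of the invention;
Fig. 9 shows the results of simultaneous first-strand and second-strand sequencing in experimental protocol 1 according to an embodiment of the invention;
Fig. 10 shows the results of stepwise sequencing (second strand first, then first strand) in experimental protocol 1 according to an embodiment of the invention;
Fig. 11 shows the results of simultaneous first-strand and second-strand sequencing in experimental protocol 2 according to an embodiment of the invention;
Fig. 12 shows raw intensity as a function of cycle number during simultaneous first-strand and second-strand sequencing in experimental protocol 3 according to an embodiment of the invention;
Fig. 13A shows scheme 1 of experimental protocol 4 according to an embodiment of the invention, in which bridge amplification achieves a signal difference by controlling the copy number of the sequencing templates;
Fig. 13B shows scheme 2 of experimental protocol 4 according to an embodiment of the invention, in which bridge amplification achieves a signal difference by controlling the concentration of the sequencing primers; and
Fig. 14 shows the results of synchronous first-strand and second-strand sequencing in experimental protocol 4 according to an embodiment of the invention, with the signal difference achieved by primer dilution.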
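Detailed Description
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings. The embodiments described with reference to the drawings are exemplary; they are intended to explain the invention and are not to be construed as limiting it.
In addition, the terms "first" and "second" are used for description only and are not to be understood as indicating or implying relative importance or implicitly specifying the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the invention, "a plurality of" means at least two, for example two or three, unless specifically defined otherwise.
The endpoints of the ranges and any values disclosed herein are not limited to the exact range or value; these ranges or values are to be understood as encompassing values close to them. For numerical ranges, the endpoints of each range, the endpoints of each range and individual point values, and individual point values may be combined with one another to give one or more new numerical ranges, which are to be regarded as specifically disclosed herein.
The term "paired-end mode sequencing" refers to sequencing the 5' end and the 3' end of the same DNA molecule separately.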
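Synchronous sequencing method
In one aspect, the invention provides a sequencing method. In some specific embodiments, with reference to Fig. 1, the method includes:
S10: hybridizing the sequencing templates with the sequencing primers
During nucleic acid sequencing, a plurality of composite-template sample points are present at the same time, each carrying multiple sequencing templates. Before the multiple rounds of sequencing reactions are carried out synchronously on these templates, the templates are hybridized with their corresponding sequencing primers. The invention places no strict limit on the number of templates sequenced synchronously. A single composite sample point normally carries at least two templates, namely the nucleic acid to be sequenced and a tag sequence, the tag sequence serving for labeling and identification; where dual tag sequences are used, three templates are needed.
The invention likewise places no strict limit on where the multiple sequencing templates are located. They may be at different positions of the same nucleic acid molecule, on different nucleic acid molecules, at different positions of the same single-stranded DNA molecule (such as a DNA nanoball or a linear single-stranded DNA molecule), or on different strands of a complex of several single-stranded DNA molecules, chosen flexibly according to the actual situation.
The term "composite template" as used in this disclosure refers to a template containing at least one sequencing template obtained by prior amplification. The amplification method is not strictly limited and may be rolling circle amplification, bridge amplification and so on. That is, the templates may be a single-stranded DNA molecule (such as a DNA nanoball) formed by RCA through extension of an RCA primer together with a DNA molecule obtained by MDA of that single-stranded DNA molecule through extension of an MDA primer, or they may be DNA clusters obtained by PCR amplification on the chip surface.
According to some specific embodiments of the invention, the RCA primer is immobilized on a solid support or on the chip.
According to some specific embodiments of the invention, the RCA primer is free in the solution on the chip surface.
According to some specific embodiments of the invention, the MDA primer is immobilized on a solid support or on the chip.
According to some specific embodiments of the invention, the MDA primer is free in the solution on the chip surface.
According to some specific embodiments of the invention, the RCA and the MDA are carried out simultaneously in the same reaction system.
According to some specific embodiments of the invention, the RCA reaction is carried out first, after which the MDA primer is hybridized and the MDA reaction is carried out.
According to some specific embodiments of the invention, the multiple sequencing templates at each composite-template sample point produce signals of different intensity; based on the intensity difference in each of the multiple rounds of sequencing reactions, the signals of the sequencing channels are assigned among the multiple sequencing templates, thereby determining the nucleotide sequence of each sequencing template at each composite-template sample point.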
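S20: carrying out multiple rounds of sequencing reactions on the sequencing templates
In this step, after the multiple sequencing templates have been hybridized with their corresponding sequencing primers, multiple rounds of sequencing reactions are carried out on the templates based on the sequencing primers, and in each round the multiple sequencing templates produce signals of different intensity.
According to some specific embodiments of the invention, the signal intensities produced by the multiple sequencing templates differ in a known relationship. In this application, by controlling the amount of signal produced by each sequencing template, the inventors can distinguish the sequenced bases and assign them to the first strand or the second strand.
In some specific sequencing processes of the invention, the signal intensity difference is set in one of the following ways: controlling the difference in copy number between the sequencing templates, or controlling the difference in sequencing primer concentration between the templates, so as to control the relative amount of signal produced by each kind of template. Specifically, the copy-number difference can be controlled through the duration of the polymerase extension reaction during sequencing library construction, or through the difference in concentration of the primers for the different templates. For bridge amplification, the amounts of signal from the different sequencing templates can be differentiated in either of two ways. Method (1): modify the amplification primers grafted on the chip surface so that one amplification primer carries no cleavable group while the other is doped with a certain proportion of primer carrying a cleavable group (for example enzymatically cleavable or photocleavable); after amplification, the template copies carrying the cleavable group are removed, which controls the signal ratio of the two strands. Method (2): during sequencing, control the concentration ratio of the two sequencing primers, or add a blocking group to the 3' end of a sequencing primer to control the number of primers able to undergo polymerase extension. Specifically, the 3' end of the amplification primer is blocked by a blocking group and therefore cannot be extended by the polymerase. In one specific embodiment of the invention, the blocked amplification primer is a 3'-phosphorylated primer: no nucleotide chain can be synthesized while the block is present, and after dephosphorylation the blocked primer can be extended by the polymerase. Thus, in each round of sequencing reaction, under the principle of synchronous sequencing, bases from two sources emit light at each composite sample point with different signal intensities, which makes the signals easier to distinguish and improves the quality of synchronous sequencing.
According to some specific embodiments of the invention, the signal intensities produced by the multiple sequencing templates differ by at least a factor of two. A person skilled in the art can set a suitable fold difference as needed, for example 2-fold, 3-fold, 4-fold, 5-fold, 7-fold, 9-fold, 11-fold, 15-fold, 20-fold, 25-fold or 30-fold.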
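S30: assigning the signals of each sequencing channel among the multiple templates
In this step, for each of the multiple rounds of sequencing reactions, the signals of the sequencing channels are assigned among the multiple sequencing templates on the basis of the signal intensity difference.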
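The assignment in S30 can be illustrated with a minimal sketch for the two-template case. It is not the patented algorithm: the function name `assign_bases`, the channel order A, T, G, C and the threshold `min_second_fraction` are assumptions introduced here, and the input is assumed to be already intensity-corrected.

```python
BASES = "ATGC"  # assumed channel order: A, T, G, C


def assign_bases(intensities, min_second_fraction=0.2):
    """Split one spot's four corrected channel intensities between two templates.

    Returns (base_of_stronger_template, base_of_weaker_template).
    The threshold is a hypothetical tuning value, not taken from the source.
    """
    # Channel indices sorted from brightest to dimmest.
    order = sorted(range(4), key=lambda c: intensities[c], reverse=True)
    top, second = order[0], order[1]
    # If the second-brightest channel is negligible next to the brightest,
    # both templates incorporated the same base in this cycle.
    if intensities[second] < min_second_fraction * intensities[top]:
        return BASES[top], BASES[top]
    # Otherwise the brighter channel belongs to the higher-signal template.
    return BASES[top], BASES[second]


same = assign_bases([3.0, 0.1, 0.05, 0.0])   # strong and weak template both read A
mixed = assign_bases([2.0, 0.0, 1.0, 0.1])   # strong reads A, weak reads G
```

With a two-fold signal ratio, two different bases appear as channels at roughly 2:1, while a shared base appears as a single channel carrying the summed signal, which is why a known intensity relationship makes the split possible.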
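According to some specific embodiments of the invention, the inventors found that the channels interfere with one another during sequencing: for example, there is optical signal crosstalk between the A and T base channels, or between the C and G base channels (herein, 串扰 is also referred to as "Crosstalk"). This makes the readings of the individual base channels inaccurate. The bias is especially damaging for synchronous sequencing, because pairs of bases can no longer be distinguished reliably and the sequencing result may become unusable. The signal intensities of the base channels in the images obtained from the sequencing reaction therefore need crosstalk correction. The channels in question may be four channels, three channels or two channels.
The inventors also found that the preceding or following round of sequencing causes lagging or run-on signal interference in the current round, so the signals of the current round also need phasing correction using the signals of the preceding or following round (herein, 相移 is also referred to as "Phasing"). Correcting the signal data of each base channel in this way makes the data more faithful and removes noise, so that corrected base-combination calls can be obtained in synchronous sequencing.
Accordingly, in each round of sequencing reaction, the signals produced by each channel are intensity-corrected before being assigned among the multiple sequencing templates, where the correction parameters used for the intensity correction include at least one of a crosstalk correction parameter and a phasing correction parameter, so as to remove noise and enable base calling in synchronous sequencing.
According to some specific embodiments of the invention, determining the correction parameters includes: determining a plurality of high-confidence composite sample points among the plurality of composite-template sample points on the basis of the base-calling results; for the channel of a given base, determining, from the plurality of high-confidence composite sample points, a plurality of crosstalk-correction reference sample points and a plurality of phasing-correction reference sample points; and, for the channel of the given base, determining the crosstalk correction parameter of the given base channel from the base-calling results of the crosstalk-correction reference sample points and the phasing correction parameter of the given base channel from the base-calling results of the phasing-correction reference sample points, wherein the base-calling results include the signal intensity value of each base channel in each round of sequencing reaction.
According to some specific embodiments of the invention, a high-confidence composite sample point is a composite sample point whose base-calling result, in a given round of the multiple rounds of sequencing reactions, contains only one kind of base.
Herein, the "high-confidence composite samples" are a set of high-confidence samples, a high-confidence sample being one that is little affected by signal interference. Because there is substantial optical crosstalk between the A and T bases and between the C and G bases, sample points whose base-calling result in a given round contains only one kind of base, for example AA, TT, GG or CC, can serve as high-confidence sample points. This safeguards the accuracy of the correction parameters.
According to some specific embodiments of the invention, for a given base channel, the crosstalk-correction reference sample points include composite sample points that satisfy the following condition: in the given round of sequencing reaction, the base-calling result of the composite sample point contains only one kind of base, and that base differs from the given base. This prevents the substantial optical crosstalk between the A and T bases and between the C and G bases from adversely affecting the determination of the crosstalk correction parameters.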
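The two selection rules above can be sketched as follows. The representation of a pre-call as a two-letter string (stronger template first) and both function names are assumptions made for illustration only.

```python
def is_high_confidence(call):
    """A pre-call such as 'AA' is high-confidence when both templates read the same base."""
    return len(call) == 2 and call[0] == call[1]


def crosstalk_reference_points(calls, given_base):
    """Indices of spots usable to fit crosstalk into the channel of `given_base`.

    `calls` holds one two-letter pre-call per spot for a single cycle.
    A spot qualifies when its call is a single base that differs from `given_base`.
    """
    return [i for i, call in enumerate(calls)
            if is_high_confidence(call) and call[0] != given_base]


calls = ["AA", "AC", "CC", "GG", "TT", "CA"]
refs_for_c = crosstalk_reference_points(calls, "C")
```

For the C channel, only the AA, GG and TT spots survive: mixed calls are dropped as low-confidence, and CC is dropped because the C channel itself is emitting there.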
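According to some specific embodiments of the invention, the crosstalk correction parameters are obtained by training the following formula on the signal intensity values of each base channel at the crosstalk-correction reference sample points:
$$y_i^{(B_1,N)} = \beta_0 + \beta_1 X_i^{(B_2,N)} + \beta_2 X_i^{(B_3,N)} + \beta_3 X_i^{(B_4,N)} + \epsilon,$$
where
$B_1$, $B_2$, $B_3$ and $B_4$ each denote one of the base A, base T, base G and base C channels, with $B_1$ denoting the given base channel;
$N$ denotes the number of the given round;
$y_i^{(B_1,N)}$ denotes the signal intensity value of the given base channel in round $N$;
$X_i^{(B_2,N)}$, $X_i^{(B_3,N)}$ and $X_i^{(B_4,N)}$ denote the signal intensity values of the other base channels $B_2$, $B_3$ and $B_4$ in round $N$;
$\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ denote the crosstalk correction parameters for the given base channel, and $\epsilon$ denotes the error term. The formula is trained with an MLR (multiple linear regression) model.
To compute the noise influence factors of each channel, points must first be selected. Take the C channel as an example. The signal in the C channel when it is not emitting is caused by crosstalk from the other channels and by phasing from the preceding and following rounds. Points identified as GG, AA or TT in the current round are therefore selected, and the crosstalk coefficients of each channel on C are computed from them.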
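A minimal sketch of fitting the crosstalk formula by ordinary least squares is given below. The source only says an MLR model is used; solving the normal equations by Gaussian elimination, the function names, and the channel order A, T, G, C are all assumptions made here for illustration.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (enough varied reference points).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_crosstalk(rows, given):
    """Fit y = b0 + b1*X2 + b2*X3 + b3*X4 for one channel.

    rows:  [A, T, G, C] intensities of the reference spots in cycle N.
    given: index of the given channel (B1); the other three are B2..B4.
    Returns [beta0, beta1, beta2, beta3].
    """
    others = [c for c in range(4) if c != given]
    design = [[1.0] + [row[c] for c in others] for row in rows]
    y = [row[given] for row in rows]
    k = 4
    # Normal equations: (D^T D) beta = D^T y
    dtd = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    dty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear(dtd, dty)


# Synthetic reference spots: C-channel leakage built from known coefficients.
rows = [[a, t, g, 5 + 0.1 * a + 0.02 * t + 0.3 * g]
        for a in (1.0, 4.0, 7.0) for t in (2.0, 5.0) for g in (0.0, 3.0, 9.0)]
beta = fit_crosstalk(rows, given=3)
```

On this noise-free synthetic data the fit recovers the coefficients used to build it; real intensities would leave a residual that corresponds to the error term in the formula.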
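According to an embodiment of the invention, for a given base channel, a phasing-correction reference sample point is a composite sample point that satisfies the following conditions: (a) in the given round of sequencing reaction, the base-calling result of the composite sample point contains only one kind of base, and that base differs from the given base; and (b) in at least one of the round before and the round after the given round, the base-calling result of the composite sample point contains only the given base. For example, points called AA, TT, GG or CC in the preceding or following round can be used as phasing-correction reference sample points.
According to an embodiment of the invention, if in (b) the base-calling result of the composite sample point contains only the given base in the round before the given round, the phasing-correction reference sample point is a lagging phasing-correction reference sample point; or, if in (b) the base-calling result contains only the given base in the round after the given round, the phasing-correction reference sample point is a run-on (leading) phasing-correction reference sample point.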
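Conditions (a) and (b) can be sketched as a filter over the pre-calls of three consecutive rounds. This sketch follows the stricter C-channel example given in the description (a lagging point must not also show the given base in the following round, and vice versa); that exclusivity, the two-letter call strings and the function name are assumptions made here.

```python
def phasing_reference_points(prev_calls, cur_calls, next_calls, base):
    """Return (lagging_indices, runon_indices) for the channel of `base`.

    Each argument is a list of two-letter pre-calls, one per spot,
    for rounds M-1, M and M+1 respectively.
    """
    def homo(call):
        return len(call) == 2 and call[0] == call[1]

    target = base * 2  # e.g. 'CC' for the C channel
    lagging, runon = [], []
    for i, (p, c, n) in enumerate(zip(prev_calls, cur_calls, next_calls)):
        # Condition (a): current round is a single base other than `base`.
        if not homo(c) or c[0] == base:
            continue
        # Condition (b), lagging: given base in round M-1 only.
        if p == target and homo(n) and n != target:
            lagging.append(i)
        # Condition (b), run-on: given base in round M+1 only.
        elif n == target and homo(p) and p != target:
            runon.append(i)
    return lagging, runon


prev_calls = ["CC", "AA", "CC", "CC"]
cur_calls = ["AA", "TT", "CC", "TT"]
next_calls = ["TT", "CC", "AA", "CC"]
lag_idx, runon_idx = phasing_reference_points(prev_calls, cur_calls, next_calls, "C")
```

Spot 3 is deliberately rejected: with CC on both sides of the current round, lagging and run-on leakage into the C channel could not be separated.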
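According to an embodiment of the invention, the phasing correction parameters further include at least one of a lagging phasing correction parameter and a run-on (leading) phasing correction parameter, and are obtained by training the following formulas on the signal intensity values of each base channel at the phasing-correction reference sample points:
$$y_i^{(B_1,M)} = \beta_{01} + \beta_4 X_i^{(B_1,M-1)}$$
or
$$y_i^{(B_1,M)} = \beta_{02} + \beta_5 X_i^{(B_1,M+1)}$$
where $B_1$ denotes the given base channel, $M$ the number of the given round, $M+1$ the number of the following round, and $M-1$ the number of the preceding round;
$\beta_{01}$ and $\beta_4$ denote the lagging phasing correction parameters for the given base channel;
$\beta_{02}$ and $\beta_5$ denote the run-on phasing correction parameters for the given base channel.
According to some specific embodiments of the invention, the above formulas are trained with an MLR model to obtain the crosstalk correction parameters and the phasing correction parameters.
Take the C channel as an example. As noted above, the signal in the C channel when it is not emitting is caused by crosstalk from the other channels and by lag/run-on from the preceding and following rounds. Points identified as GG, AA or TT in the current round are therefore selected to compute the phasing coefficients for C. Because the phasing coefficients involve the signal values of the preceding and following rounds, the points are chosen so as to avoid interference from the current round's own signal. To compute signal lagging, select points that are called AA or TT in the current round N, CC in the preceding round N-1, and AA or TT in the following round N+1, and compute the lagging coefficient on the C channel. Likewise, to compute signal run-on, select points that are pre-called AA or TT in the current round N, AA or TT in the preceding round N-1, and CC in the following round N+1, and compute the run-on coefficient on the C channel. The other channels are handled in the same way.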
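Each phasing formula has a single predictor, so the fit reduces to simple linear regression. The closed-form sketch below is one way to obtain the intercept and slope; the function name and the synthetic intensities are assumptions, and how the fitted coefficients are then applied to correct a whole cycle is not specified here.

```python
def fit_phasing(neighbor_signal, current_signal):
    """Least-squares fit of y = b0 + b1 * x.

    neighbor_signal: intensity of the given channel in round M-1 (lagging)
                     or round M+1 (run-on) at the reference spots.
    current_signal:  intensity of the same channel in round M at those spots.
    Returns (intercept, slope), i.e. (beta01, beta4) or (beta02, beta5).
    """
    n = len(neighbor_signal)
    mean_x = sum(neighbor_signal) / n
    mean_y = sum(current_signal) / n
    sxx = sum((x - mean_x) ** 2 for x in neighbor_signal)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(neighbor_signal, current_signal))
    slope = sxy / sxx  # requires at least two distinct x values
    return mean_y - slope * mean_x, slope


# Synthetic lagging reference spots: 10% of the previous round's C signal
# plus a constant background of 2 shows up in the current round.
prev_c = [100.0, 200.0, 300.0, 400.0]
cur_c = [12.0, 22.0, 32.0, 42.0]
beta01, beta4 = fit_phasing(prev_c, cur_c)
```

The same function serves the run-on case by passing the round M+1 intensities as `neighbor_signal`.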
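In some preferred specific embodiments of the invention, with reference to Fig. 2, the sequencing method includes:
Step a: obtain single-stranded circular nucleic acid,
Step b: use the single-stranded circular nucleic acid as template for rolling circle amplification (RCA) or a reverse transcription reaction to form DNA nanoballs (DNBs),
Step c: mix the DNBs with buffer and load them onto the sequencing slide; then hybridize the multiple displacement amplification primer (MDA primer),
Step d: pump in a specific buffer so that the DNBs on the chip continue rolling circle amplification, extending the 3' end of the template strand; the extended template later serves as the first-strand sequencing template,
Step e: use the MDA primer hybridized in step c to carry out multiple displacement amplification and generate the second-strand template,
At this point the first-strand and second-strand sequencing templates have been generated. Copy numbers that differ are obtained by controlling the RCA reaction time, so the templates give a signal difference between the first and second strands,
Step f: hybridize the first-strand and second-strand primers at the same time,
Step g: sequence the first and second strands simultaneously,
Wash off the hybridized first-strand and second-strand primers to restore the single-stranded first-strand and second-strand templates, which can then be sequenced again, either second strand first and then first strand, or first strand first and then second strand. Sequencing can thus be repeated cyclically.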
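In some preferred specific embodiments of the invention, with reference to Fig. 3, the method includes:
Step a: obtain single-stranded circular DNA,
Step b: mix the single-stranded circular DNA with buffer and load it onto the sequencing slide; with the RCA primer immobilized on the chip, carry out rolling circle amplification (RCA) using the single-stranded circular DNA as template to form a single-stranded DNA molecule that serves as the first-strand sequencing template,
Step c: hybridize the multiple displacement amplification primer (MDA primer), the MDA primer being free in the solution on the chip surface,
Step d: MLG generates the first-strand sequencing template,
Step e: hybridize a 3'-blocked first-strand sequencing primer to the first-strand template,
Step f: use the MDA primer hybridized in step c to carry out multiple displacement amplification and generate the second-strand template,
Step g: unblock the first-strand sequencing primer,
Step h: hybridize the second-strand primer,
Step i: sequence the first and second strands simultaneously.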
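In some preferred specific embodiments of the invention, with reference to Fig. 4, the method includes:
Step a: use single-stranded circular DNA as template,
Step b: mix the single-stranded circular DNA with buffer, load it onto the sequencing slide, and carry out rolling circle amplification (RCA) to form DNA nanoballs (DNBs),
Step c: hybridize the multiple displacement amplification primer (MDA primer), the MDA primer being immobilized on the sequencing chip,
Step d: use the MDA primer hybridized in step c to carry out multiple displacement amplification and generate the second-strand template,
Step e: load a specific buffer so that the DNBs on the chip continue rolling circle amplification, extending the 3' end of the template strand; the extended template later serves as the first-strand sequencing template,
At this point the first-strand and second-strand sequencing templates have been generated. By controlling the copy number produced by rolling circle amplification, the templates give a signal difference between the first and second strands,
Step f: hybridize the first-strand and second-strand primers at the same time,
Step g: sequence the first and second strands simultaneously.
For paired-end sequencing, the templates can be used either second strand first and then first strand, or first strand first and then second strand.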
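In addition, bridge PCR amplification can be used to generate DNA clusters. In some preferred specific embodiments of the invention, with reference to Fig. 13A, the method includes:
Step a: use a double-stranded DNA library as template,
Step b: carry out bridge PCR amplification on the sequencing chip to generate DNA clusters,
Step c: by light exposure or an enzymatic cleavage reaction, remove part of the templates from each DNA cluster, forming single-stranded DNA first strands and second strands,
Step d: hybridize the first-strand and second-strand primers at the same time, the concentrations of the first-strand and second-strand primers differing by a fold ratio, for example 1:2, 1:3 or 1:4,
Step e: sequence the first and second strands simultaneously.
Further, in some preferred specific embodiments of the invention, with reference to Fig. 13B, the method includes:
Step a: use a double-stranded DNA library as template,
Step b: carry out bridge PCR amplification on the sequencing chip to generate DNA clusters,
Step c: by light exposure, enzymatic cleavage or another reaction, remove the second strand from each DNA duplex, forming an equal number of single-stranded DNA first strands,
Step d: hybridize the insert sequencing primer and the tag-sequence sequencing primer at the same time, their concentrations differing by a fold ratio, for example 1:2, 1:3 or 1:4,
Step e: sequence from the insert sequencing primer and the tag-sequence sequencing primer simultaneously.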
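Sequencing system and apparatus
In another aspect, the invention provides a sequencing system. As shown in Fig. 5, the system includes:
a chip 100, being a sequencing chip having a plurality of composite-template sample points, each composite-template sample point carrying multiple sequencing templates;
a detection device 200, configured to hybridize the multiple sequencing templates with their corresponding sequencing primers and, based on the sequencing primers, to carry out multiple rounds of sequencing reactions synchronously on the primer-hybridized templates, wherein in each round the multiple sequencing templates produce signals of different intensity;
an analysis device 300, configured, for each of the multiple rounds of sequencing reactions, to assign the signals of the sequencing channels among the multiple sequencing templates on the basis of the intensity difference.
According to some specific embodiments of the invention, the multiple sequencing templates on the chip are located at different positions of the same nucleic acid molecule, on different nucleic acid molecules, at different positions of the same single-stranded DNA molecule (such as a DNA nanoball), or on different strands of a complex of several single-stranded DNA molecules, chosen flexibly according to the actual situation.
As described above, the multiple sequencing templates may be a single-stranded DNA molecule formed by RCA through extension of an RCA primer together with a DNA molecule obtained by MDA of that single-stranded DNA molecule through extension of an MDA primer, or may be DNA clusters obtained by PCR amplification. The RCA primer is immobilized on a solid support or free in solution, and the MDA primer is immobilized on a solid support or free in solution. The RCA and the MDA may be carried out simultaneously in the same reaction system, or the RCA reaction may be carried out first, after which the MDA primer is hybridized and the MDA reaction is carried out.
According to some specific embodiments of the invention, the multiple sequencing templates at each composite-template sample point produce signals of different intensity; based on the intensity difference in each of the multiple rounds of sequencing reactions, the signals of the sequencing channels are assigned among the multiple sequencing templates, thereby determining the nucleotide sequence of each sequencing template at each composite-template sample point.
According to some specific embodiments of the invention, the signal intensities produced by the multiple sequencing templates in the sequencing device differ in a known relationship.
According to some specific embodiments of the invention, the relative amounts of signal produced by the multiple sequencing templates are set by controlling the difference in copy number between the templates or the difference in sequencing primer concentration between the templates.
According to some specific embodiments of the invention, the known relationship is set by controlling the difference in copy number between the multiple sequencing templates, or by controlling the difference in sequencing primer concentration between the multiple sequencing templates.
According to some specific embodiments of the invention, the copy-number difference between the multiple sequencing templates is controlled through the duration of the polymerase extension reaction during sequencing library construction.
According to some specific embodiments of the invention, the signal intensities produced by the multiple sequencing templates differ by at least a factor of two.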
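With reference to Fig. 6, according to an embodiment of the invention, the analysis device includes an intensity correction module 310, configured, in each round of sequencing reaction, to intensity-correct the signals produced by each channel before the signals of the sequencing channels are assigned among the multiple sequencing templates.
With reference to Fig. 7, according to some specific embodiments of the invention, the analysis device further includes at least one of a crosstalk correction parameter acquisition module 320 and a phasing correction parameter acquisition module 330,
the crosstalk correction parameter acquisition module 320 being configured, for a given base channel, to determine the crosstalk correction parameter of that channel from the base-calling results, for each base channel in each round of sequencing reaction, of a plurality of crosstalk-correction reference sample points,
the phasing correction parameter acquisition module 330 being configured, for a given base channel, to determine the phasing correction parameter of that channel from the base-calling results, for each base channel in each round of sequencing reaction, of a plurality of phasing-correction reference sample points,
wherein the base-calling results include the signal intensity value of each base channel in each round of sequencing reaction.
With reference to Fig. 8, according to some specific embodiments of the invention, the analysis device may further include a high-confidence composite sample point determination module 340, configured to determine a plurality of high-confidence composite sample points among the plurality of composite-template sample points on the basis of the base-calling results, a high-confidence composite sample point being a composite sample point whose base-calling result, in a given round of the sequencing reactions, contains only one kind of base;
a crosstalk-correction reference sample point determination module 350, configured, for a given base channel, to determine a plurality of crosstalk-correction reference sample points from the plurality of high-confidence composite sample points. In some specific embodiments, for a given base channel, a crosstalk-correction reference sample point is a composite sample point that satisfies the following condition: in the given round of sequencing reaction, its base-calling result contains only one kind of base, and that base differs from the given base.
a phasing-correction reference sample point determination module 360, configured, for a given base channel, to determine a plurality of phasing-correction reference sample points from the plurality of high-confidence composite sample points. According to some specific embodiments of the invention, for a given base channel, a phasing-correction reference sample point is a composite sample point that satisfies the following conditions: (a) in the given round of sequencing reaction, the base-calling result of the composite sample point contains only one kind of base, and that base differs from the given base; and (b) in at least one of the round before and the round after the given round, the base-calling result of the composite sample point contains only the given base. If in (b) the base-calling result contains only the given base in the round before the given round, the point is a lagging phasing-correction reference sample point; if in (b) the base-calling result contains only the given base in the round after the given round, the point is a run-on (leading) phasing-correction reference sample point.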
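According to some specific embodiments of the invention, the crosstalk correction parameter acquisition module 320 obtains the crosstalk correction parameters by training the following formula on the signal intensity values of each base channel at the crosstalk-correction reference sample points:
$$y_i^{(B_1,N)} = \beta_0 + \beta_1 X_i^{(B_2,N)} + \beta_2 X_i^{(B_3,N)} + \beta_3 X_i^{(B_4,N)} + \epsilon,$$
where
$B_1$, $B_2$, $B_3$ and $B_4$ each denote one of the base A, base T, base G and base C channels, with $B_1$ denoting the given base channel;
$N$ denotes the number of the given round;
$y_i^{(B_1,N)}$ denotes the signal intensity value of the given base channel in round $N$;
$X_i^{(B_2,N)}$, $X_i^{(B_3,N)}$ and $X_i^{(B_4,N)}$ denote the signal intensity values of the other base channels $B_2$, $B_3$ and $B_4$ in round $N$;
$\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ denote the crosstalk correction parameters for the given base channel, and $\epsilon$ denotes the error term.
According to some specific embodiments of the invention, the phasing correction parameter acquisition module 330 obtains the phasing correction parameters by training the following formulas on the signal intensity values of each base channel at the phasing-correction reference sample points:
$$y_i^{(B_1,M)} = \beta_{01} + \beta_4 X_i^{(B_1,M-1)}$$
or
$$y_i^{(B_1,M)} = \beta_{02} + \beta_5 X_i^{(B_1,M+1)}$$
where $B_1$ denotes the given base channel, $M$ the number of the given round, $M+1$ the number of the following round, and $M-1$ the number of the preceding round;
$\beta_{01}$ and $\beta_4$ denote the lagging phasing correction parameters for the given base channel;
$\beta_{02}$ and $\beta_5$ denote the run-on phasing correction parameters for the given base channel.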
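Computer products
In a further aspect, the invention provides a computer device that includes a memory, a controller and a processor; the memory is used to store a program, and the processor implements the method of the first aspect by executing the program stored in the memory. The computer device according to some specific embodiments of the invention can effectively carry out the method for synchronous sequencing of multi-template nucleic acids, sequencing multiple template nucleic acids synchronously, effectively raising sequencing throughput and greatly reducing sequencing time and cost. Specifically, the electronic device may be any intelligent terminal, including a tablet computer, a computing cluster, a sequencer or a vehicle-mounted computer.
The term "memory" as used in this disclosure refers to any computer program product, apparatus and/or device (for example a magnetic disk, optical disk, memory or programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The memory may be implemented as read-only memory (ROM), a static storage device, a dynamic storage device, random access memory (RAM) or the like. The memory may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is kept in the memory and is called by the processor to execute the training method of the gene sequencing model or the gene sequencing method of the embodiments of this application. Specifically, the memory stores the crosstalk correction parameter acquisition module and the phasing correction parameter acquisition module.
In some embodiments, the computer device may further include an input/output interface, a communication interface and a bus. The input/output interface handles information input and output. The communication interface handles communication between this device and other devices, either wired (for example USB or network cable) or wireless (for example mobile network, WIFI or Bluetooth). The bus carries information between the components of the device (for example the processor, memory, input/output interface and communication interface).
It should be noted that the features and advantages described above for the synchronous sequencing method and for the method of correcting the base-calling results of synchronous sequencing apply equally to the synchronous sequencing system, the electronic device and the computer-readable storage medium, and are not repeated here.
In yet another aspect, the invention provides a computer-readable storage medium storing a program that can be executed by a processor to implement the method of the first aspect. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example a floppy disk, hard disk or magnetic tape), an optical medium (for example a digital video disc (DVD)), or a semiconductor medium (for example a solid state disk (SSD)).
The invention is described below with reference to specific examples. These examples are descriptive only and do not limit the invention in any way. Where no specific technique or condition is stated in an example, the techniques or conditions described in the literature in the field, or the product instructions, are followed. Reagents or instruments for which no manufacturer is stated are conventional, commercially available products.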
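Example 1
1. Equipment:
MGISEQ-2000 sequencer, MGISEQ-2000 sequencing reagent slide (715nm), mini DNB loader, PCR thermocycler, 8-tube PCR strips, a set of pipettes, high-speed centrifuge, mini centrifuge, vortex mixer.
2. Reagents: the main reagents used in this application are listed in Table 1.
Table 1:
Reagent | Brand
DNA nanoball preparation buffer | MGI
PNK enzyme (polynucleotide kinase) | MGI
T4 PNK 10X reaction buffer | MGI
MGISEQ-2000RS high-throughput sequencing reagents | MGI
DNA nanoball preparation enzyme mix I | MGI
DNA nanoball preparation enzyme mix II | MGI
LTE buffer | MGI
DNA nanoball stop buffer | MGI
DNA nanoball loading buffer IV | MGI
5X SSC buffer | MGI
E. coli standard library | MGI
10X Phi29 buffer | BGI Research (华大研究院)
First-strand sequencing primer with linker, IP1-x1, powder | Sangon (生工)
First-strand sequencing primer with linker, IP1-x2, powder | Sangon (生工)
Second-strand sequencing primer IP3 | MGI
DNBReadBuffer (REB) | MGI
EDTA | -
Formamide | -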
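3. Primer sequences:
First-strand sequencing primer with linker, IP1-x1:
Figure PCTCN2022138468-appb-000001
First-strand sequencing primer with linker, IP1-x2:
Figure PCTCN2022138468-appb-000002
First-strand sequencing primer with linker and 3' block, IP1-x1-OP:
Figure PCTCN2022138468-appb-000003
First-strand sequencing primer with linker and 3' block, IP1-x2-OP:
Figure PCTCN2022138468-appb-000004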
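4. Reagent preparation:
1) Dissolving the primers:
Centrifuge the 1.5 mL tubes containing the primer powder at maximum speed for 5 minutes in an Eppendorf high-speed centrifuge (5415D); following the instructions on the primer label, dissolve each primer in ultrapure water to give a 100 μM stock solution;
2) The 1 μM working solution of first-strand sequencing primer with linker, IP1-xlinker, is prepared as shown in Table 2.
Table 2:
Reagent | Volume | Final concentration
100 μM stock of first-strand sequencing primer with linker, IP1-x1 | 50 μL | 0.5 μM
100 μM stock of first-strand sequencing primer with linker, IP1-x2 | 50 μL | 0.5 μM
10X Phi29 buffer | 1 mL | 1X
Ultrapure water | 8.9 mL | ---
Total | 10 mL | ---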
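The final concentrations in Table 2 follow from the dilution relation C1·V1 = C2·V2. The short check below reproduces them; the helper name is an assumption introduced here, and the volumes are those listed in the table.

```python
def final_concentration_uM(stock_uM, stock_volume_uL, total_volume_mL):
    """Dilution relation C1*V1 = C2*V2, with the total volume converted from mL to uL."""
    return stock_uM * stock_volume_uL / (total_volume_mL * 1000.0)


# Table 2 components as (name, volume in uL).
table2 = [
    ("IP1-x1 stock, 100 uM", 50.0),
    ("IP1-x2 stock, 100 uM", 50.0),
    ("10X Phi29 buffer", 1000.0),
    ("ultrapure water", 8900.0),
]
total_uL = sum(volume for _, volume in table2)

per_primer_uM = final_concentration_uM(100.0, 50.0, 10.0)
combined_uM = 2 * per_primer_uM  # IP1-x1 plus IP1-x2
```

Each primer ends up at 0.5 μM, so the two together give the 1 μM working solution named in the heading of item 2).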
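3) DNB loading buffer V is prepared as shown in Table 3.
Table 3:
Reagent | Volume | Final concentration
DNA nanoball loading buffer IV | 100 μL | ---
EDTA (0.5 M) | 17 μL | ---
Total | 117 μL | ---
4) The PNK enzyme reagent is prepared as shown in Table 4.
Table 4:
Reagent | Final concentration
10U T4 PNK (BGI) | 0.1U
T4 PNK 10X reaction buffer (pH 5.9) | 1X
5) The mixed working solution of first-strand and second-strand sequencing primers, Insert Primer Mix, is prepared as shown in Table 5.
Table 5:
Reagent | Volume | Final concentration
100 μM stock of first-strand sequencing primer with linker, IP1-x1 | 50 μL | 0.5 μM
100 μM stock of first-strand sequencing primer with linker, IP1-x2 | 50 μL | 0.5 μM
1.0 μM second-strand sequencing primer IP3 | 9.9 mL | 1.0 μM
Total | 10 mL | ---
6) The mixed working solution of first-strand and second-strand barcode primers, Barcode Primer Mix, is prepared as shown in Table 6.
Table 6:
Reagent | Volume | Final concentration
100 μM stock of first-strand barcode primer BP1 | 100 μL | 1 μM
100 μM stock of second-strand barcode primer BP2 | 100 μL | 1 μM
5X SSC buffer | 9.8 mL | ---
Total | 10 mL | ---
7) The 1 μM working solution of 3'-phosphorylated first-strand sequencing primer with linker, IP1-xlinker-OP, is prepared as shown in Table 7.
Table 7:
Reagent | Volume | Final concentration
100 μM stock of first-strand sequencing primer with linker, IP1-x1-OP | 50 μL | 0.5 μM
100 μM stock of first-strand sequencing primer with linker, IP1-x2-OP | 50 μL | 0.5 μM
10X Phi29 buffer | 1 mL | 1X
Ultrapure water | 8.9 mL | ---
Total | 10 mL | ---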
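5. Sequencing and analysis procedure:
5.1 Preparing the DNBs
Prepare DNA nanoballs from the E. coli library following the MGISEQ-2000RS High-throughput Sequencing Set user manual, with the volume of EII adjusted to 1.6 μL and the volume of stop buffer halved, that is, 10 μL of stop buffer added.
5.2 Loading the DNBs
Prepare one MGISEQ-2000 sequencing reagent slide (715nm). Mix the DNBs with DNB loading buffer V at a volume ratio of 2:1 until homogeneous, then load the mixture onto the MGISEQ-2000 sequencing reagent slide (715nm) with the mini DNB loader.
5.3 Sequencing steps
Experimental protocol 1:
Prepare a sequencing kit following the MGISEQ-2000RS High-throughput Sequencing Set user manual. Replace the reagent in well 13 with the 1 μM working solution of first-strand sequencing primer with xlinker, IP1-xlinker, described above; replace the reagent in well 6 with DNB loading buffer IV; replace the reagent in well 7 with 1X phi29 buffer; replace the reagent in well 11 with REB reagent; replace well 3 with the first/second-strand sequencing primer mix, Insert Primer mix; and replace well 8 with the first/second-strand barcode primer mix, Barcode Primer mix.
Following the MGISEQ-2000RS High-throughput Sequencing Set user manual, place the sequencing kit and the chip on the MGI2000-RS sequencer, select the corresponding script, set SE50+10, and sequence. To demonstrate that the scheme is feasible, in this run the mixed first/second-strand sequencing primers (Insert Primer Mix) were hybridized and synchronous sequencing was carried out (50 bp); the first/second-strand barcode primers (Barcode Primer Mix) were then hybridized and barcode sequencing was carried out (10 bp).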
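The base-calling results of synchronous sequencing need to be corrected. A pre-call of the synchronous sequencing data is used to select high-confidence AA, TT, CC and GG base combinations. The crosstalk and phasing coefficients are computed by MLR and then applied to correct the signals of the whole cycle. Computing the phasing coefficients involves the brightness values of the preceding and following cycles, so in the MLR the influence factors are the signals of the four channels in the current cycle together with the relevant signal brightness values of the preceding and following cycles. To compute the noise influence factors of each channel, points must first be selected. The crosstalk coefficients of each channel on each base are computed with the following formula:
$$y_i^{(B_1,N)} = \beta_0 + \beta_1 X_i^{(B_2,N)} + \beta_2 X_i^{(B_3,N)} + \beta_3 X_i^{(B_4,N)} + \epsilon,$$
where
$B_1$, $B_2$, $B_3$ and $B_4$ each denote one of the base A, base T, base G and base C channels, with $B_1$ denoting the given base channel;
$N$ denotes the number of the given round;
$y_i^{(B_1,N)}$ denotes the signal intensity value of the given base channel in round $N$;
$X_i^{(B_2,N)}$, $X_i^{(B_3,N)}$ and $X_i^{(B_4,N)}$ denote the signal intensity values of the other base channels $B_2$, $B_3$ and $B_4$ in round $N$;
$\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ denote the crosstalk correction parameters for the given base channel, and $\epsilon$ denotes the error term.
Computing the phasing coefficients involves the signal values of the preceding and following cycles, using the following formulas:
$$y_i^{(B_1,M)} = \beta_{01} + \beta_4 X_i^{(B_1,M-1)}$$
or
$$y_i^{(B_1,M)} = \beta_{02} + \beta_5 X_i^{(B_1,M+1)}$$
where $B_1$ denotes the given base channel, $M$ the number of the given round, $M+1$ the number of the following round, and $M-1$ the number of the preceding round;
$\beta_{01}$ and $\beta_4$ denote the lagging phasing correction parameters for the given base channel;
$\beta_{02}$ and $\beta_5$ denote the run-on phasing correction parameters for the given base channel.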
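After sequencing and correction as described above on multiple replicate wells, the final sequencing results are shown in Fig. 9 and Table 8. Fig. 9 plots raw intensity against cycle number for synchronous first-strand and second-strand sequencing of the replicate wells: cycles 1-50 are the simultaneous sequencing of the first and second strands, and cycles 51-60 are the simultaneous sequencing of barcode 1 and barcode 2. The algorithm results are shown in Table 8. The synchronous results were split and aligned. For the strongest signal, the number of mappable reads was 1139578, the mapping rate 80.93% and the error rate 0.75%. For the second-strongest signal, the number of mappable reads was 213312, the mapping rate 15.15% and the error rate 3.42%. The overall error rate of synchronous sequencing in this experiment was 2%.
The signal ratio of the first and second strands with this template scheme can be read off directly from stepwise sequencing. Fig. 10 shows the stepwise sequencing of the first and second strands with this template scheme: cycles 1-50 show the second-strand sequencing signal, cycles 51-100 the first-strand signal, and cycles 101-110 the barcode signal.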
Table 8:
Metric | Value
basecall version | litecall
Rmax mappingNum | 1139578
Rmax mappingRate(%) | 80.93
Rmax errorRate(%) | 0.75
Rsec mappingNum | 213312
Rsec mappingRate(%) | 15.15
Rsec errorRate(%) | 3.42
Overall Error(%) | 2.00
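As a reading aid for Table 8, the sketch below divides each mapped-read count by its mapping rate to see what total read count each pair implies. It is only an arithmetic cross-check on the tabulated numbers, under the assumption that both mapping rates are expressed against the same total; the function name is introduced here.

```python
def implied_total(mapping_num, mapping_rate_percent):
    """Total read count implied by a mapped-read count and a mapping rate in percent."""
    return mapping_num / (mapping_rate_percent / 100.0)


rmax_total = implied_total(1139578, 80.93)  # strongest signal (Rmax)
rsec_total = implied_total(213312, 15.15)   # second-strongest signal (Rsec)
relative_gap = abs(rmax_total - rsec_total) / rsec_total
```

Both pairs point to a total of roughly 1.4 million reads, which is consistent with the two mapping rates sharing one denominator. The table does not state how the overall error rate was derived from the two per-signal error rates.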
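Experimental protocol 2:
Prepare a sequencing kit following the MGISEQ-2000RS High-throughput Sequencing Set user manual. Replace the reagent in well 13 with the 1 μM working solution of 3'-phosphorylated first-strand sequencing primer with linker (xlinker-OP), IP1-xlinker-OP, described above; replace the reagent in well 6 with DNB loading buffer IV; replace the reagent in well 7 with 1X phi29 buffer; replace the reagent in well 11 with REB reagent; and replace well 4 with the PNK enzyme reagent. Replace well 3 with the first/second-strand sequencing primer mix, Insert Primer mix, and well 8 with the first/second-strand barcode primer mix, Barcode Primer mix.
Following the MGISEQ-2000RS High-throughput Sequencing Set user manual, place the sequencing kit and the chip on the MGI2000-RS sequencer, select the corresponding script, set SE50+10, and sequence. To demonstrate that the scheme is feasible, in this run the 3'-phosphorylated first-strand sequencing primer was hybridized first, MDA was then carried out, the blocked primer hybridized to the first strand was dephosphorylated, the second-strand primer was hybridized, and the first and second strands were sequenced simultaneously (50 bp); the first/second-strand barcode primers (Barcode Primer Mix) were then hybridized and barcode sequencing was carried out (10 bp).
The base-calling results of synchronous sequencing were corrected as in experimental protocol 1. The results are shown in Fig. 11: cycles 1-50 are the simultaneous sequencing of the first and second strands, and cycles 51-60 are the simultaneous sequencing of barcode 1 and barcode 2. The algorithm results are shown in Table 9. The synchronous results were split and aligned. For the strongest signal, the number of mappable reads was 981505, the mapping rate 69.71% and the error rate 1.64%. For the second-strongest signal, the number of mappable reads was 171611, the mapping rate 12.18% and the error rate 4.67%. The overall error rate of synchronous sequencing in this experiment was 3.02%.
Table 9:
Metric | Value
basecall version | litecall
Rmax mappingNum | 981505
Rmax mappingRate(%) | 69.71
Rmax errorRate(%) | 1.64
Rsec mappingNum | 171611
Rsec mappingRate(%) | 12.18
Rsec errorRate(%) | 4.67
Overall Error(%) | 3.02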
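Experimental protocol 3:
Prepare a sequencing kit following the MGISEQ-2000RS High-throughput Sequencing Set user manual. Replace the reagent in well 13 with the 1 μM working solution of first-strand sequencing primer with xlinker, IP1-xlinker, described above; replace the reagent in well 6 with DNB loading buffer IV; replace the reagent in well 7 with 1X phi29 buffer; replace the reagent in well 11 with REB reagent; and remove the reagents from wells 4 and 5. Replace well 3 with the first/second-strand sequencing primer mix, Insert Primer mix.
Following the MGISEQ-2000RS High-throughput Sequencing Set user manual, place the sequencing kit and the chip on the MGI2000-RS sequencer, select the corresponding script, set SE50, and sequence. The base-calling results of synchronous sequencing were corrected as in experimental protocol 1.
The algorithm results of synchronous sequencing are shown in Table 10. The synchronous results were split and aligned. For the strongest signal, the number of mappable reads was 937468, the mapping rate 66.58% and the error rate 1.39%. For the second-strongest signal, the number of mappable reads was 500782, the mapping rate 35.56% and the error rate 2.91%. The overall error rate of synchronous sequencing in this experiment was 2.5%. Fig. 12 shows raw intensity as a function of cycle number during synchronous sequencing.
Table 10:
Metric | Value
basecall version | litecall
Rmax mappingNum | 937468
Rmax mappingRate(%) | 66.58
Rmax errorRate(%) | 1.39
Rsec mappingNum | 500782
Rsec mappingRate(%) | 35.56
Rsec errorRate(%) | 2.91
Overall Error(%) | 2.5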
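Experimental protocol 4:
A signal difference between the first and second strands for synchronous sequencing can also be achieved by controlling the copy number of the templates or the concentration of the sequencing primers. Taking DNA clusters produced by bridge PCR amplification as an example, the scheme that achieves the signal difference by controlling the copy number of the sequencing templates is shown in Fig. 13A, and the scheme that achieves it by controlling the concentration of the sequencing primers is shown in Fig. 13B, in which the 3' end of a sequencing primer is blocked by a blocking group and therefore cannot be extended by the polymerase. This example validates the approach using control of sequencing primer concentration; the experimental procedure is as follows.
The 3'-blocked first-strand sequencing primer, the regular first-strand sequencing primer and the second-strand sequencing primer are mixed at a ratio of 1:1:2 to give the sequencing primer mix. After DNA clusters have been produced by bridge PCR amplification, the mixed sequencing primers are hybridized and synchronous sequencing is carried out.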
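The 1:1:2 mix can be related to an expected signal ratio with a simple back-of-envelope model. The model is an assumption made here for illustration, not something stated in the source: both strands are taken to be present in equal copy number, primers are taken to saturate their sites, and blocked and unblocked primers are taken to hybridize equally well, so the signal of a strand scales with the fraction of its primers that can be extended.

```python
from fractions import Fraction


def extendable_fraction(active_parts, blocked_parts=0):
    """Fraction of a strand's primer sites occupied by extendable (unblocked) primer."""
    return Fraction(active_parts, active_parts + blocked_parts)


def strand_signal_ratio(s1_active, s1_blocked, s2_active, s2_blocked=0):
    """Expected first-strand : second-strand signal ratio under the stated assumptions."""
    return (extendable_fraction(s1_active, s1_blocked)
            / extendable_fraction(s2_active, s2_blocked))


# Blocked first-strand : regular first-strand : second-strand primer = 1 : 1 : 2
ratio = strand_signal_ratio(s1_active=1, s1_blocked=1, s2_active=2)
```

Under this model the first strand would emit about half the signal of the second strand, in line with the two-fold minimum difference described earlier; the measured intensities in Fig. 14 remain the reference for the actual ratio.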
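The results are shown in Fig. 14 and Table 11. Fig. 14 plots raw intensity against cycle number for synchronous sequencing in which the signal difference is achieved by controlling the concentration of the sequencing primers. The base-calling results of synchronous sequencing were corrected as in experimental protocol 1.
The algorithm results of synchronous sequencing are shown in Table 11. The synchronous results were split and aligned. For the strongest signal, the number of mappable reads was 941401, the mapping rate 66.86% and the error rate 1.41%. For the second-strongest signal, the number of mappable reads was 513416, the mapping rate 36.46% and the error rate 4.36%. The overall error rate of synchronous sequencing in this experiment was 2.92%.
Table 11: Algorithm results of synchronous sequencing with the signal difference achieved by primer dilution
Metric | Value
basecall version | litecall
Rmax mappingNum | 941401
Rmax mappingRate(%) | 66.86
Rmax errorRate(%) | 1.41
Rsec mappingNum | 513416
Rsec mappingRate(%) | 36.46
Rsec errorRate(%) | 4.36
Overall Error(%) | 2.92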
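In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, a person skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the invention have been shown and described above, it is to be understood that they are exemplary and are not to be construed as limiting the invention; a person of ordinary skill in the art may change, modify, substitute and vary the above embodiments within the scope of the invention.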

Claims (17)

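  1. A sequencing method, characterized by comprising:
    providing a plurality of composite-template sample points, each composite-template sample point carrying multiple sequencing templates;
    hybridizing the multiple sequencing templates with their corresponding sequencing primers;
    based on the sequencing primers, carrying out multiple rounds of sequencing reactions on the primer-hybridized sequencing templates, wherein in each round the multiple sequencing templates produce signals of different intensity; and
    for each of the multiple rounds of sequencing reactions, assigning the signals of the sequencing channels among the multiple sequencing templates on the basis of the intensity difference.
  2. The sequencing method according to claim 1, characterized in that the signal intensities produced by the multiple sequencing templates differ in a known relationship;
    optionally, the known relationship is set by controlling the difference in copy number between the multiple sequencing templates or the difference in sequencing primer concentration between the multiple sequencing templates;
    optionally, the copy-number difference is controlled through the concentration of the primers for the different templates during sequencing library construction, or through the duration of the polymerase extension reaction;
    optionally, the signal intensities produced by the multiple sequencing templates differ by at least a factor of two.
  3. The sequencing method according to claim 1 or 2, characterized in that the multiple sequencing templates are located at different positions of the same nucleic acid molecule;
    optionally, the multiple sequencing templates are located on different nucleic acid molecules.
  4. The sequencing method according to claim 3, characterized in that the multiple sequencing templates include:
    a single-stranded DNA molecule formed by rolling circle amplification through extension of a rolling circle amplification primer, and a DNA molecule complex obtained by multiple displacement amplification of the single-stranded DNA molecule through extension of a multiple displacement amplification primer;
    optionally, the multiple sequencing templates include DNA clusters obtained by PCR amplification.
  5. The sequencing method according to claim 4, characterized in that the rolling circle amplification primer is immobilized on a solid support or free in solution;
    optionally, the multiple displacement amplification primer is immobilized on a solid support or free in solution.
  6. The sequencing method according to claim 4, characterized in that the rolling circle amplification and the multiple displacement amplification are carried out simultaneously in the same reaction system.
  7. The sequencing method according to claim 4, characterized in that the rolling circle amplification reaction is carried out first, after which the multiple displacement primer is hybridized and the multiple displacement reaction is carried out.
  8. The sequencing method according to claim 1, characterized in that the multiple sequencing templates at each composite-template sample point produce signals of different intensity, and, based on the intensity difference in each of the multiple rounds of sequencing reactions, the signals of the sequencing channels are assigned among the multiple sequencing templates, thereby determining the nucleotide sequence of each sequencing template at each composite-template sample point.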
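  9. A sequencing system, characterized by comprising:
    a chip having a plurality of composite-template sample points, each composite-template sample point carrying multiple sequencing templates;
    a detection device, configured to hybridize the multiple sequencing templates with their corresponding sequencing primers and, based on the sequencing primers, to carry out multiple rounds of sequencing reactions synchronously on the primer-hybridized templates, wherein in each round the multiple sequencing templates produce signals of different intensity;
    an analysis device, configured, for each of the multiple rounds of sequencing reactions, to assign the signals of the sequencing channels among the multiple sequencing templates on the basis of the intensity difference.
  10. The system according to claim 9, characterized in that the multiple sequencing templates on the chip are located at different positions of the same nucleic acid molecule;
    optionally, the multiple sequencing templates on the chip are located on different nucleic acid molecules;
    optionally, the multiple sequencing templates include:
    1) a single-stranded DNA molecule formed by extension of a rolling circle amplification primer, and
    2) a DNA molecule obtained by multiple displacement amplification of the single-stranded DNA molecule through extension of a multiple displacement amplification primer;
    optionally, the multiple sequencing templates include DNA clusters obtained by PCR amplification.
  11. The system according to claim 9, characterized in that the signal intensities produced by the multiple sequencing templates in the sequencing device differ in a known relationship;
    optionally, the known relationship is set by controlling the difference in copy number between the multiple sequencing templates or the difference in sequencing primer concentration between the multiple sequencing templates;
    optionally, the copy-number difference is controlled through the difference in concentration of the primers for the different templates during sequencing library construction, or through the duration of the polymerase extension reaction;
    optionally, the signal intensities produced by the multiple sequencing templates differ by at least a factor of two.
  12. The sequencing system according to claim 10, characterized in that the rolling circle amplification primer is immobilized on the chip or free in solution;
    optionally, the multiple displacement amplification primer is immobilized on the chip or free in solution.
  13. The sequencing system according to claim 10, characterized in that the rolling circle amplification and the multiple displacement amplification are carried out simultaneously on the chip.
  14. The sequencing system according to claim 10, characterized in that the rolling circle amplification reaction is carried out on the chip first, after which the multiple displacement primer is hybridized and the multiple displacement reaction is carried out.
  15. The sequencing system according to claim 9, characterized in that the multiple sequencing templates at each composite-template sample point on the chip produce signals of different intensity, and, based on the analysis device's analysis of the intensity difference in each of the multiple rounds of sequencing reactions, the signals of the sequencing channels are assigned among the multiple sequencing templates, thereby determining the nucleotide sequence of each sequencing template at each composite-template sample point.
  16. A sequencing device, characterized by comprising a memory and a processor;
    the memory being used to store a program;
    the processor implementing the sequencing method of any one of claims 1 to 8 by executing the program stored in the memory.
  17. A computer-readable storage medium, characterized in that the storage medium stores a program that can be executed by a processor to implement the sequencing method of any one of claims 1 to 8.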
Application PCT/CN2022/138468 | Priority date 2022-12-12 | Filing date 2022-12-12 | Title: 一种多模板核酸同步测序的方法及其应用 (Method for synchronous sequencing of multiple template nucleic acids and its application) | Publication WO2024124379A1 (zh), published 2024-06-20
