CN114107290A - Sequencing joint and sequencing analysis system thereof - Google Patents

Info

Publication number
CN114107290A
Authority
CN
China
Prior art keywords
sequencing
sequence
internal index
internal
sequences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111374708.XA
Other languages
Chinese (zh)
Inventor
欧阳川
王珺
周逸文
王江浩
刘紫丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Jieyi Biotechnology Co ltd
Original Assignee
Hangzhou Jieyi Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Jieyi Biotechnology Co ltd filed Critical Hangzhou Jieyi Biotechnology Co ltd
Priority to CN202111374708.XA priority Critical patent/CN114107290A/en
Priority to PCT/CN2022/071549 priority patent/WO2023087527A1/en
Publication of CN114107290A publication Critical patent/CN114107290A/en
Pending legal-status Critical Current

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/11DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869Methods for sequencing
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30Detection of binding sites or motifs

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Wood Science & Technology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Immunology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Plant Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the technical field of molecular biology, in particular to nucleic acid sequencing, and specifically to a sequencing adaptor and a sequencing analysis system using it. The sequencing adaptor has a partially complementary, paired Y-shaped structure. One strand comprises, in order from 5' to 3': an internal Index sequence, an Index1 sequencing primer binding region sequence, an Index1 sequence, and a P7 sequence that binds the chip probe. The other strand comprises, in order from 5' to 3': a P5 sequence that binds the chip probe, a Read1 sequencing primer binding region sequence, the internal Index sequence, and a T base overhang. The internal Index regions of the two strands are fully complementary, and the sequencing primer binding region sequences are partially complementary. When adaptors carrying internal Index sequences of different lengths and manually balanced base proportions are used in combination, multiple samples can be sequenced and analyzed while maintaining high sequencing quality and throughput, which greatly shortens the turnaround time and improves the timeliness of detection.

Description

Sequencing joint and sequencing analysis system thereof
Technical Field
The invention relates to the technical field of molecular biology, in particular to nucleic acid sequencing, and specifically to a sequencing adaptor and a sequencing analysis system using it.
Background
Since the first clinical case in which leptospirosis was definitively diagnosed by metagenomic next-generation sequencing (mNGS) was published in the New England Journal of Medicine in 2014, mNGS has made considerable progress in identifying new pathogens and diagnosing rare but important pathogens, and its use in critical and severe infections has gained clinical acceptance. In pathogen mNGS, nucleic acid is extracted from a sample taken from the suspected site of infection, and the nucleic acid fragments are ligated to DNA adaptors that can hybridize to the sequencing chip. The adaptor contains a tag sequence (Index) that distinguishes different samples. The library is sequenced on a high-throughput sequencer, and the reads are compared against a database containing a wide range of pathogens so that the pathogen can be identified quickly. Because the Index tags distinguish the samples, many samples can be sequenced in parallel in a single run, which makes full use of the sequencing throughput and reduces cost.
A conventional TruSeq sequencing adaptor is shown in FIG. 1a. It has a T base overhang at its end, which pairs with the A base overhang added to the end of the target fragment, enabling T-A ligation. The last base of the Read1 sequencing primer is a T, so sequencing starts directly at the insert and the T base is not read. After Read1 is finished, the sequencing primer is replaced with the Index sequencing primer to read the tag sequence. For pathogen sequencing, Read1 typically takes about 500 minutes and the Index1 read about 50 minutes. In other words, only after about 550 minutes (roughly 9 hours), when the entire run is complete, does the sequencer have the full sequences and the information needed to assign each one to a specific sample.
In summary, adding the library preparation time (4 hours) to the sequencing time (9-10 hours), about 14 hours elapse between the start of sample preparation and the point at which analysis of each sample can begin. For a sequencer with throughput similar to the Illumina NextSeq, each run produces approximately 20 G of data, which takes about an hour to analyze. At least 15 hours are therefore needed from the initial sample to the result; the approximate workflow is shown in FIG. 1b. The timeliness of detection is poor and urgently needs improvement.
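The turnaround arithmetic above can be tallied in a few lines. This is only a restatement of the approximate figures quoted in the text; the variable names are illustrative and not part of the original.

```python
# Conventional workflow timing, using the approximate figures quoted above.
READ1_MIN = 500       # Read1 sequencing, minutes
INDEX1_MIN = 50       # Index1 read, minutes
LIBRARY_PREP_H = 4    # library preparation, hours
ANALYSIS_H = 1        # analysis of roughly 20 G of data, hours

sequencing_h = (READ1_MIN + INDEX1_MIN) / 60
before_analysis_h = LIBRARY_PREP_H + sequencing_h
total_h = before_analysis_h + ANALYSIS_H

print(f"sequencing: {sequencing_h:.1f} h")                        # sequencing: 9.2 h
print(f"sample to start of analysis: {before_analysis_h:.1f} h")  # 13.2 h
print(f"sample to result: {total_h:.1f} h")                       # 14.2 h
```

With the upper end of the 9-10 hour sequencing range, the same sums give the 14 and 15 hours stated in the text.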
Disclosure of Invention
The invention aims to provide a sequencing adaptor that enables sequencing and analysis of multiple samples while maintaining high sequencing quality and throughput, greatly shortening the turnaround time (TAT) and improving the timeliness of detection.
Analysis of the existing adaptor-based workflow shows that two bottlenecks must be addressed to improve timeliness: 1. sequencing is slow and accounts for 50% of the total TAT; 2. analysis takes one hour and cannot start until sequencing is complete, because the data can be split by sample and aligned only after the Index tag of each sample has been read, that is, after at least 14 hours.
In order to achieve the purpose, the invention adopts the following technical scheme:
a sequencing adaptor (shown in FIG. 2) with a partially complementary, paired Y-shaped structure, wherein one strand comprises, in order from 5' to 3': an internal Index sequence, an Index1 sequencing primer binding region sequence, an Index1 sequence, and a P7 sequence that binds the chip probe; the other strand comprises, in order from 5' to 3': a P5 sequence that binds the chip probe, a Read1 sequencing primer binding region sequence, the internal Index sequence, and a T base overhang. The internal Index regions of the two strands are fully complementary, and the sequencing primer binding region sequences are partially complementary. An Index2 sequence may also be added between the P5 sequence and the Read1 sequencing primer binding region sequence.
Because the new adaptor places the internal Index sequence downstream of the Read1 sequencing primer binding region, every read contains a T base (from the T-A junction) at a fixed position in the sequencing result. As shown in FIG. 3, when only adaptors with an 8 bp internal Index are used, the per-cycle base proportions show a very high proportion of T at the ninth cycle. The fluorescence signal of that single base is then too strong while the other bases give essentially no signal, the balance among the four bases A/T/C/G is broken, and the sequencer has more difficulty calling bases. The analysis software judges the base quality at that position to be poor, a large proportion of reads fail quality control, and the effective data output drops sharply. For a second-generation sequencer the first few bases are especially important because they are used to locate the clusters, so if the whole chip shows the same base in any one of the first ten cycles, both the quality and the quantity of the sequencing data are greatly reduced.
To solve this problem, the invention further optimizes the design of the internal Index sequences used to distinguish samples. Internal Index sequences are designed in two to four or more lengths. The length difference between adjacent lengths may be one base, two bases, or more, but one base is preferred in order to save sequencing cost and reduce the time spent sequencing the internal Index. In use, adaptors with internal Index sequences of different lengths must be combined so that the T bases of the T-A junction do not all fall in the same sequencing cycle. In addition, the internal Index sequences used together should be chosen so that the base proportions at each position of the sequencing cycles are substantially balanced, which improves the quality of the first 10 bases as far as possible.
FIG. 4 shows the per-cycle base proportions when adaptors with internal Index sequences of three lengths (6 bp, 7 bp and 8 bp) are used. Mixing adaptors with different Index lengths staggers the cycles in which the junction T is read. As FIG. 4 shows, three cycles have a slightly elevated proportion of T, but the T bases are no longer concentrated in a single cycle, so a high-quality sequencing result is obtained after optimization.
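The staggering effect can be illustrated with a small calculation. The sketch below is not from the patent: it assumes that each read starts with the internal Index, that the junction T of a k-bp Index is therefore read in cycle k + 1, and that all adaptors in the pool are present in equal amounts. The function name is hypothetical.

```python
from collections import Counter

def junction_t_fraction(index_lengths):
    """Fraction of reads whose T-A junction T falls in each cycle.

    index_lengths: one entry per adaptor in the pool (equal amounts assumed).
    The junction T of a k-bp internal Index is read in cycle k + 1.
    """
    counts = Counter(length + 1 for length in index_lengths)
    total = len(index_lengths)
    return {cycle: counts[cycle] / total for cycle in sorted(counts)}

single = junction_t_fraction([8, 8, 8])   # only 8 bp Indexes
mixed = junction_t_fraction([6, 7, 8])    # 6/7/8 bp Indexes in equal parts

print(single)  # {9: 1.0}
print(mixed)   # one third of the reads in each of cycles 7, 8 and 9
```

Only the forced junction T is counted here. Index and insert bases read in the same cycle may also be T, which is consistent with the slightly elevated T proportion described for FIG. 4.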
It is recommended to combine internal Index sequences of at least two and up to four or more lengths to label and sequence multiple samples, and to use the adaptors of each internal Index length in balanced proportions. To save sequencing cost and reduce the time spent sequencing the internal Index, the optimal choice is a combination of three internal Index lengths (6 bp, 7 bp and 8 bp), with the adaptors of each length making up about one third of the total; alternatively, a combination of four lengths (6 bp, 7 bp, 8 bp and 9 bp) is used, with the adaptors of each length making up about one quarter of the total.
For example, when internal Index sequences of two lengths are combined, one 6 bases long and one 7 bases long, the two types of sample are mixed at 50% each. At the seventh cycle, 50% of the reads then show T (the T-A junction of the 6-base internal Index), and the remaining 50% show the seventh base of the 7-base internal Index (which must not be designed as T). At the eighth cycle, 50% of the reads show T (the T-A junction of the 7-base internal Index), and the other 50% show the first base of the insert. From the ninth base onward, all reads are in the insert. With three or four different internal Index lengths, the base proportions are distributed more evenly over each cycle. For example, internal Index sequences of three lengths (6, 7 and 8 bases) are combined at 1/3 each, or internal Index sequences of four lengths (6, 7, 8 and 9 bases) are combined at 1/4 each.
The length difference between adjacent internal Index lengths may be one base, two bases or more, but one base is preferred, for example a combination of 6, 7 and 8 bases.
It is further preferred that all internal Index sequences used together achieve a substantial balance of base proportions in each sequencing cycle. In general, when the number of libraries (or of Indexes used) in one sequencing run is 4 or more, the proportion of each of the four bases A, T, C and G in each sequencing cycle of the internal Index sequence is suitably kept between 8% and 50%, and optimally between 12.5% and 37.5%.
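A balance check of this kind could be sketched as follows. This is an illustration under stated assumptions, not the patent's procedure: only Index bases are counted (the junction T and insert bases are ignored), every Index is weighted equally, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

def base_fractions(indexes, position):
    """Fractions of A/C/G/T at a 0-based position, over Indexes long enough."""
    bases = [seq[position] for seq in indexes if len(seq) > position]
    counts = Counter(bases)
    return {b: counts[b] / len(bases) for b in "ACGT"}

def is_balanced(indexes, low=0.08, high=0.50):
    """True if every base stays within [low, high] at every Index position."""
    longest = max(len(seq) for seq in indexes)
    for pos in range(longest):
        fractions = base_fractions(indexes, pos)
        if any(not (low <= f <= high) for f in fractions.values()):
            return False
    return True

# Hypothetical 6 bp Indexes, not taken from the patent.
toy = ["ACGTAC", "CGTACG", "GTACGT", "TACGTA"]
print(is_balanced(toy))                  # True: each base is 25% at each position
print(is_balanced(toy, 0.125, 0.375))    # True for the stricter range as well
print(is_balanced(["ACGTAC", "ACGTCA", "ACGTGT", "ACTGAG"]))  # False: first base is always A
```

The default bounds are the 8% to 50% range from the text; the second call uses the 12.5% to 37.5% range.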
In addition to the above requirements, all internal Index sequences used should also satisfy: (1) the minimum Hamming distance between any two internal Index sequences is 3; (2) Index sequences containing three or more identical consecutive bases are excluded; (3) the first two bases of the internal Index are not "GG". In general, the longer the Index sequence, the more distinct Indexes can be formed from the four bases. To design enough Indexes for multi-sample sequencing while keeping the minimum Hamming distance between any two Index sequences at 3 or more, the internal Index is preferably 6 bases or longer.
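The three rules above lend themselves to a simple automated check. The sketch below is a minimal illustration with hypothetical function names and sequences. One assumption should be noted: Hamming distance is defined only for sequences of equal length, and the text does not say how Indexes of different lengths are compared, so only equal-length pairs are checked here.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def has_homopolymer(seq, run=3):
    """True if seq contains `run` or more identical consecutive bases."""
    return any(len(set(seq[i:i + run])) == 1 for i in range(len(seq) - run + 1))

def check_indexes(indexes, min_distance=3):
    """Return a list of rule violations (empty if every rule is met)."""
    problems = []
    for seq in indexes:
        if has_homopolymer(seq):
            problems.append(f"{seq}: run of 3 or more identical bases")
        if seq.startswith("GG"):
            problems.append(f"{seq}: starts with GG")
    for a, b in combinations(indexes, 2):
        # Assumption: only equal-length pairs are compared.
        if len(a) == len(b) and hamming(a, b) < min_distance:
            problems.append(f"{a}/{b}: Hamming distance {hamming(a, b)}")
    return problems

# Hypothetical sequences, not taken from the patent.
print(check_indexes(["ACGTAC", "TGCATG"]))   # []
for problem in check_indexes(["GGATCA", "ACCCTG", "ACGTAC", "ACGTAG"]):
    print(problem)
```

The second call reports one violation of each rule: a leading "GG", a run of three C bases, and a pair at Hamming distance 1.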
Because the order in which the sequence is generated has changed, the internal Index is read within the first few cycles after sequencing starts, and each sample can be distinguished from then on. The sequences of a specific sample can therefore be analyzed without waiting for the whole run (9-10 hours) to finish. Moreover, as more cycles are completed and the reads grow longer, the invention allows real-time analysis, producing alignment and analysis results for reads of increasing length as sequencing proceeds.
Another objective of the invention is to provide a sequencing analysis system (see FIG. 2b) that, based on the adaptor structure described above, performs sequencing and analysis and obtains sequence alignment and analysis results in real time. The system offers real-time, cycle-by-cycle analysis, short analysis time and high accuracy.
The sequencing analysis system of the invention comprises:
1. A sequencing monitoring module: used for monitoring the sequencing progress in real time and triggering analysis tasks.
The sequencing monitoring module scans the sequencing output directory at regular intervals to monitor the progress of the run. Once the reads reach a sufficient length (at least 22 bp), the monitoring program sends a signal that triggers the subsequent analysis steps. As sequencing continues, the extended reads are analyzed in real time, and the next analysis starts as soon as the previous one has finished.
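The polling logic of such a monitoring module might look like the sketch below. It is a hypothetical illustration, not the patent's implementation: it assumes that the sequencer writes one subdirectory per completed cycle named `C<cycle>.1`, which may not match the output layout of a given instrument, and all function names are invented for the example.

```python
import re
from pathlib import Path

CYCLE_DIR = re.compile(r"^C(\d+)\.1$")  # assumed directory naming, e.g. "C22.1"
MIN_CYCLES = 22                          # shortest read length analysed (from the text)

def completed_cycles(lane_dir):
    """Highest cycle number for which a cycle directory exists (0 if none)."""
    highest = 0
    for entry in Path(lane_dir).iterdir():
        match = CYCLE_DIR.match(entry.name)
        if entry.is_dir() and match:
            highest = max(highest, int(match.group(1)))
    return highest

def next_analysis_cycle(lane_dir, last_analysed=0):
    """Return the cycle to analyse now, or None if nothing new is ready."""
    done = completed_cycles(lane_dir)
    if done >= MIN_CYCLES and done > last_analysed:
        return done
    return None
```

A scheduler would call `next_analysis_cycle` at a fixed interval, pass the returned cycle number to the downstream modules, and remember it as `last_analysed` so that the next analysis starts only when longer reads are available. A production version would also need to confirm that the files of a cycle are completely written before using them.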
2. A data generation module: used for converting the BCL files generated by sequencing into fastq files and filtering out low-quality sequences, and for splitting the sequence data into the corresponding samples using an analysis program written for the specially designed adaptors.
The data generation module converts the BCL files generated by sequencing into fastq files, performs quality control on the sequencing data, and removes low-quality reads and reads containing adaptor sequence, so that the data entering the downstream analysis are of reliable quality. Because the specially designed adaptors distinguish the samples during sequencing, they are also suited to the rapid analysis workflow: a dedicated analysis program splits the sequence data into the corresponding samples.
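Splitting reads by a variable-length internal Index could be done along the following lines. The patent does not disclose its splitting program, so this is only a sketch under stated assumptions: each read starts with the internal Index, the Index is immediately followed by the T of the T-A junction, and only exact matches are accepted. The Index sequences and sample names are hypothetical.

```python
def assign_sample(read, index_to_sample):
    """Return (sample, insert) for a read, or (None, None) if no Index matches.

    Assumes the read starts with the internal Index, immediately followed by
    the T of the T-A junction; only exact matches are accepted.
    """
    for length in sorted({len(i) for i in index_to_sample}, reverse=True):
        prefix = read[:length]
        if prefix in index_to_sample and read[length:length + 1] == "T":
            return index_to_sample[prefix], read[length + 1:]
    return None, None

# Hypothetical Indexes and sample names, not taken from the patent.
indexes = {"ACGTAC": "sample_A", "CATGCAG": "sample_B", "TGACTGCA": "sample_C"}
print(assign_sample("ACGTACTGGCCAATT", indexes))   # ('sample_A', 'GGCCAATT')
print(assign_sample("CATGCAGTAACCGG", indexes))    # ('sample_B', 'AACCGG')
print(assign_sample("GGGGGGGGGGGG", indexes))      # (None, None)
```

Requiring the junction T after the Index is one way to tell a 6 bp Index from a longer one that shares its first six bases, which is why longer Indexes are designed without a T at the positions where shorter ones read their junction.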
3. A data filtering module: used for removing human sequences from the sequences that pass quality control.
The data filtering module aligns the quality-controlled sequences to a human genome database using fast alignment software and removes the sequences that align. The unaligned sequences are output as non-human data.
4. A data analysis module: used for aligning the non-human sequences to a pathogenic microorganism genome database.
The data analysis module aligns the non-human data to the pathogenic microorganism genome database to obtain the microbial alignment results. For a sequence with multiple alignment results, the system selects the alignments whose scores fall within the scoring interval [L, U] and takes the lowest common ancestor (LCA) of the taxa to which those reference sequences belong as the final assignment of the sequence. The bounds of the scoring interval are determined as follows:
[Formulas for the bounds L and U: provided as images in the original publication, not reproduced here]
According to the accompanying definitions, the symbols in the formulas denote the theoretical highest alignment score, the theoretical lowest alignment score, the highest alignment score obtained for the sequence, and a scoring interval range parameter whose default value is 20. When the alignment results are analyzed, the system also records, for each sequence, whether it aligns to a single species and whether it is completely aligned.
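The LCA step can be sketched as below. Because the formulas for L and U are not reproduced, the bounds are simply passed in as arguments. The taxonomy is a toy parent map with placeholder names, and the function names are hypothetical; none of this is taken from the patent.

```python
def lineage(taxon, parent):
    """Path from a taxon up to the root, inclusive."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def lowest_common_ancestor(taxa, parent):
    """Deepest taxon shared by the lineages of all given taxa (single root assumed)."""
    taxa = list(taxa)
    common = lineage(taxa[0], parent)
    for t in taxa[1:]:
        other = set(lineage(t, parent))
        common = [node for node in common if node in other]
    return common[0]

def classify(hits, lower, upper, parent):
    """hits: list of (taxon, score). Keep scores in [lower, upper], return their LCA."""
    kept = [taxon for taxon, score in hits if lower <= score <= upper]
    return lowest_common_ancestor(kept, parent) if kept else None

# Toy taxonomy with placeholder names: child -> parent.
parent = {"species_1a": "genus_1", "species_1b": "genus_1",
          "species_2a": "genus_2", "genus_1": "root", "genus_2": "root"}
hits = [("species_1a", 60), ("species_1b", 55), ("species_2a", 30)]
print(classify(hits, 50, 60, parent))   # genus_1
print(classify(hits, 20, 60, parent))   # root
print(classify(hits, 58, 60, parent))   # species_1a
```

A narrower scoring interval keeps fewer competing hits and therefore yields a more specific assignment, as the three calls show.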
5. A report generation module: used for compiling statistics on the alignment results and outputting an analysis report.
Based on the alignment results, the report generation module counts the number of sequences detected for each taxon. For a taxon that contains lower-level nodes, it counts the sequences assigned to the taxon itself, the sequences assigned to the taxon together with all of its child nodes, and the uniquely aligned and completely aligned sequences of each taxon.
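The two per-taxon counts described above (reads on the taxon itself, and reads on the taxon plus all of its child nodes) can be computed as in the following sketch. The taxonomy, the read assignments and the function name are hypothetical, and the counts of uniquely and completely aligned sequences are left out for brevity.

```python
from collections import Counter

def taxon_counts(assignments, parent):
    """assignments: one taxon per classified read.

    Returns (direct, cumulative): reads assigned exactly to each taxon, and
    reads assigned to the taxon or any of its descendants.
    """
    direct = Counter(assignments)
    cumulative = Counter()
    for taxon, n in direct.items():
        node = taxon
        while True:
            cumulative[node] += n      # credit the taxon and every ancestor
            if node not in parent:
                break
            node = parent[node]
    return direct, cumulative

# Toy taxonomy with placeholder names: child -> parent.
parent = {"species_1a": "genus_1", "species_2a": "genus_2",
          "genus_1": "root", "genus_2": "root"}
reads = ["species_1a"] * 3 + ["genus_1"] * 2 + ["species_2a"]
direct, cumulative = taxon_counts(reads, parent)
print(direct["genus_1"], cumulative["genus_1"])   # 2 5
print(cumulative["root"])                          # 6
```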
Compared with nucleic acid sequencing in the prior art, the above technical scheme has the following advantages:
1. The internal Index is located between the sequencing primer and the insert, so the Index is read first. During rapid analysis, sequences from different samples can therefore be separated early in the run, without waiting for sequencing to finish.
2. The Index is used in at least two different lengths (preferably three: 6, 7 and 8 bp). Index sequences of different lengths prevent the junction T from being read in the same cycle for every cluster, which would otherwise lower the sequencing quality.
3. The base proportions at each position of the Index are required to be evenly distributed.
4. Once the sequencer has produced 22 bp of sequence, the analysis software starts analyzing pathogen information and repeats the analysis as further cycles are completed, achieving real-time NGS analysis.
5. Whereas results were previously available only after the instrument had run for at least 11 hours, combining Index adaptors of different lengths with the real-time analysis method makes the basic microbial composition of the sample known about 5 hours after sequencing starts, achieving rapid next-generation sequencing (NGS) analysis.
Drawings
FIG. 1a is a schematic diagram of a conventional sequencing linker structure and a sequencing process in the prior art;
FIG. 1b is a graph showing the time consumption of each process of a conventional sequencing adapter system according to the prior art;
FIG. 2 is a schematic diagram of the sequencing adaptor structure and sequencing process of the present invention;
FIG. 2b is a schematic flow diagram of a sequencing analysis system using the sequencing adapter of the present invention;
FIG. 3 shows the base ratio at each cycle when sequencing was performed using an internal Index sequence linker of 8bp in length alone;
FIG. 4 shows the base ratios at each cycle number when sequencing was performed using three internal Index sequence linkers of 6bp, 7bp and 8bp length;
FIG. 5 is a detailed sequence structure of the sequencing adapter used in example 1;
FIG. 6 is a comparison of base ratios at each cycle number for sequencing using the sequencing adaptors of the invention and conventional Illumina TruSeq adaptors of example 1;
FIG. 7 is a graph comparing sequencing quality and final library data volume when sequencing using the sequencing adapters of the present invention and a conventional Illumina TruSeq adapter of example 1;
FIG. 8 shows the number of Legionella pneumophila sequences detected in each analysis cycle in example 2;
FIG. 9 shows the number of Citrobacter koseri sequences detected in each analysis cycle in example 2.
Detailed Description
It should be noted that the following embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that: the technical solutions described in the foregoing embodiments can be modified, or some technical features can be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Example 1
The sequencing adaptor of this embodiment has a partially complementary, paired Y-shaped structure. One strand comprises, in order from 5' to 3': an internal Index sequence, an Index1 sequencing primer binding region sequence, an Index1 sequence, and a P7 sequence that binds the chip probe. The other strand comprises, in order from 5' to 3': a P5 sequence that binds the chip probe, a Read1 sequencing primer binding region sequence, the internal Index sequence, and a T base overhang. The internal Index regions of the two strands are fully complementary, and the sequencing primer binding region sequences are partially complementary. The structure is shown in FIG. 5. Internal Index sequences of three lengths are used: 6 bp, 7 bp and 8 bp.
In this example, 48 internal Index sequences were designed and divided into 16 groups, each group containing internal Index sequences 6 bp, 7 bp and 8 bp long.
The internal Index sequences meet the following requirements: (1) the minimum Hamming distance between any two internal Index sequences is 3; (2) Index sequences containing three or more identical consecutive bases are excluded; (3) the first two bases of the internal Index are not "GG"; (4) the 7th base of a 7 bp Index and the 7th base of an 8 bp Index are not T, and the 8th base of an 8 bp Index is not T; (5) the base proportions at each sequencing position of the Indexes within a combination are adjusted manually to achieve relative balance.
The specific sequence and design are as follows:
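[Table of the internal Index sequences and their design: provided as an image in the original publication, not reproduced here]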
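Requirement (4) can be checked mechanically: a longer Index must not carry a T in a cycle where a shorter Index reads its junction T. The sketch below generalizes this to any pool, assuming the Index lengths in the pool are consecutive (as in the 6/7/8 bp design), so that every position beyond the shortest length coincides with a junction cycle. The function name and sequences are hypothetical.

```python
def junction_cycle_conflicts(indexes):
    """Indexes that carry a T where a shorter Index reads its junction T.

    Assumes consecutive Index lengths in the pool, so every position beyond
    the shortest length coincides with the junction cycle of a shorter Index.
    """
    shortest = min(len(seq) for seq in indexes)
    # 0-based positions shortest .. len(seq)-1 are cycles shortest+1 .. len(seq).
    return [seq for seq in indexes if "T" in seq[shortest:]]

# Hypothetical Indexes, not taken from the patent.
print(junction_cycle_conflicts(["ACGTAC", "CATGCAG", "TGACTGCA"]))  # []
print(junction_cycle_conflicts(["ACGTAC", "CATGCAT", "TGACTGTA"]))  # ['CATGCAT', 'TGACTGTA']
```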
A total of 153 libraries were constructed with each of the internal Index adaptors described above and the conventional Illumina TruSeq adaptors, and then sequenced in batches. The internal Index adaptor libraries were sequenced in 8 runs of about 18-20 libraries each, with the adaptors of each internal Index length making up about one third of the adaptors used in a run. The TruSeq adaptor libraries were sequenced in 5 runs of about 30-31 libraries each. The sequencing quality of the two adaptor types was compared; the results are shown in FIG. 6 (per-cycle base proportions for the two adaptor types) and FIG. 7 (sequencing quality and final library data volume for the two adaptor types).
As shown in FIG. 6, the optimized internal Index adaptors gave a more balanced base proportion. Only at cycle 9 was the proportion of T slightly higher than with the TruSeq adaptor, and this had no effect on sequencing quality.
As shown in FIG. 7, the optimized internal Index adaptors gave a high percentage of qualified clusters and high Q30 scores, and these quality metrics did not differ significantly from those of the TruSeq adaptors. Data from the optimized internal Index adaptors can be split using the internal Index alone or using the internal Index together with Index1 (dual-Index splitting); in either case the final amount of data per library did not differ significantly from that obtained with TruSeq.
Example 2
To evaluate the analytical performance of the system, two clinically positive samples were analyzed with the sequencing analysis system of the invention. The clinical result for sample 1 was Legionella pneumophila infection, and the clinical result for sample 2 was Citrobacter koseri infection. The analysis times and test results of the two samples are shown in Table 1. The number of Legionella pneumophila sequences detected in each analysis cycle is shown in FIG. 8, and the number of Citrobacter koseri sequences detected in each analysis cycle is shown in FIG. 9.
TABLE 1 clinical sample assay time statistics
[Table 1: provided as an image in the original publication, not reproduced here]
The results show that the system already detects the positive pathogen sensitively in the first rapid-analysis report, at a read length of 22 bp. As sequencing progresses, the number of detected pathogen sequences increases slowly and stabilizes after several cycles. For samples positive for pathogen infection, the system can therefore detect the pathogen at a very early stage and give a reliable analysis result.

Claims (10)

1. A sequencing adaptor, characterized by having a partially complementary paired Y-shaped structure, wherein one strand comprises, in order from 5 'to 3': internal Index sequence, Index1 sequencing primer binding region sequence, Index1 sequence, P7 sequence bound to chip probe; the other strand, from 5 'to 3', comprises in sequence: the sequence of P5 bound to the chip probe, the sequence of the primer binding region for Read1 sequencing, the internal Index sequence and the T base overhang.
2. A sequencing adaptor according to claim 1, wherein, in use, multiple sample labelling and sequencing is performed using a combination of internal Index sequence adaptors of different lengths.
3. A sequencing adaptor according to claim 2, wherein the difference in length between adjacent long and short internal Index sequences is one base.
4. A sequencing adaptor according to claim 2, wherein in use, the combination of two to four lengths of internal Index sequence adaptors is used to perform labelling and sequencing of multiple samples.
5. A sequencing adapter according to claim 1, wherein all internal Index sequences used, when combined, achieve substantial base balance between internal Index sequences in each cycle of sequencing.
6. The sequencing adapter of claim 5, wherein when the number of internal indexes used in one sequencing is 4 or more, the proportion of ATCG four bases in each sequencing cycle of the internal Index sequence is suitably controlled to be 8% to 50% respectively.
7. The sequencing adapter of claim 6, wherein when the number of internal indexes used in one sequencing is 4 or more, the proportion of ATCG four bases in each sequencing cycle of the internal Index sequence is respectively controlled to be 12.5% -37.5% optimally.
8. A sequencing adaptor according to claim 1, wherein all internal Index sequences used are such that: (1) the minimum Hamming distance of any two internal Index sequences is 3; (2) excluding Index sequences containing three or more identical contiguous bases; (3) the first two bases of the internal Index sequence should not be "GG".
9. The sequencing adapter of claim 1, wherein an Index2 sequence is added between the sequence of P5 bound by the chip probe and the sequence of the binding region of the Read1 sequencing primer.
10. A sequencing analysis system based on the sequencing adaptor, characterized by comprising:
a sequencing monitoring module: used for monitoring the sequencing progress in real time and triggering analysis tasks;
a data generation module: used for converting the BCL files generated by sequencing into fastq files and filtering out low-quality sequences, and for splitting the sequence data into the corresponding samples using an analysis program written for the specially designed adaptors;
a data filtering module: used for removing human sequences from the sequences that pass quality control;
a data analysis module: used for aligning the non-human sequences to a pathogenic microorganism genome database;
a report generation module: used for compiling statistics on the alignment results and outputting an analysis report.
CN202111374708.XA 2021-11-19 2021-11-19 Sequencing joint and sequencing analysis system thereof Pending CN114107290A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111374708.XA CN114107290A (en) 2021-11-19 2021-11-19 Sequencing joint and sequencing analysis system thereof
PCT/CN2022/071549 WO2023087527A1 (en) 2021-11-19 2022-01-12 Sequencing adapter and sequencing analysis system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111374708.XA CN114107290A (en) 2021-11-19 2021-11-19 Sequencing joint and sequencing analysis system thereof

Publications (1)

Publication Number Publication Date
CN114107290A (en) 2022-03-01

Family

ID=80396782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111374708.XA Pending CN114107290A (en) 2021-11-19 2021-11-19 Sequencing joint and sequencing analysis system thereof

Country Status (2)

Country Link
CN (1) CN114107290A (en)
WO (1) WO2023087527A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024138666A1 (en) * 2022-12-30 2024-07-04 深圳华大生命科学研究院 Quality testing method and apparatus for spatiotemporal omics positioning chip, and device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130281306A1 (en) * 2010-10-26 2013-10-24 Roberto Rigatti Sequencing methods
CN108893466A (en) * 2018-06-04 2018-11-27 苏州人人基因科技有限公司 Sequencing adapter, sequencing adapter set and method for detecting ultra-low-frequency mutations
CN109439729A (en) * 2018-12-27 2019-03-08 上海鲸舟基因科技有限公司 Adapter, adapter mixture and corresponding method for detecting low-frequency variants
WO2019055715A1 (en) * 2017-09-15 2019-03-21 Illumina, Inc. Universal short adapters with variable length non-random unique molecular identifiers

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011125833A1 (en) * 2010-03-31 2011-10-13 学校法人 慶應義塾 Constitution of tool for analyzing biomolecular interaction and analysis method using same
GB201804641D0 (en) * 2018-03-22 2018-05-09 Inivata Ltd Methods of sequencing nucleic acids and error correction of sequence reads
CN108949941A (en) * 2018-06-25 2018-12-07 北京莲和医学检验所有限公司 Low-frequency mutation detection method, kit and device
CN109680054A (en) * 2019-01-15 2019-04-26 北京中源维康基因科技有限公司 A kind of detection method of low frequency DNA mutation
CN112795990B (en) * 2019-11-14 2024-03-22 广州华大基因医学检验所有限公司 Flexible and changeable multi-tag secondary sequencing library joint capable of reducing pollution and PCR bias
CN112626189A (en) * 2020-04-24 2021-04-09 北京吉因加医学检验实验室有限公司 Short joint, double-index joint primer and double-index library construction system of gene sequencer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
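Hao Xinlei (郝昕蕾): "Application of next-generation sequencing in the detection of pathogenic microorganisms in patients with exogenous endophthalmitis", Recent Advances in Ophthalmology (《眼科新进展》), vol. 41, no. 08, pages 750 - 754 *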

Also Published As

Publication number Publication date
WO2023087527A1 (en) 2023-05-25

Similar Documents

Publication Publication Date Title
CN110349629B (en) Analysis method for detecting microorganisms by using metagenome or metatranscriptome
WO2014023167A1 (en) METHOD AND SYSTEM FOR DETECTING α-GLOBIN GENE COPY NUMBER
CN110211633B (en) Detection method for MGMT gene promoter methylation, processing method for sequencing data and processing device
CN111808854B (en) Balanced joint with molecular bar code and method for quickly constructing transcriptome library
CN114657238B (en) Medlar 40K liquid phase chip and application
CN112251422B (en) Transposase complex containing unique molecular tag sequence and application thereof
CN110021352A (en) Reference-based plant miRNA data analysis method based on the miRBase database
CN114107290A (en) Sequencing joint and sequencing analysis system thereof
AU2022298428A1 (en) Gene sequencing analysis method and apparatus, and storage medium and computer device
CN115433768B (en) IGH hypermutation detection method and system based on NGS amplicon sequencing technology
CN111676276A (en) Method for rapidly and accurately determining gene editing mutation condition and application thereof
CN113463202A (en) Novel RNA high-throughput sequencing method, primer group and kit and application thereof
CN113373524A (en) ctDNA sequencing tag joint, library, detection method and kit
CN113046835A (en) Sequencing library construction method for detecting lentivirus insertion site and lentivirus insertion site detection method
CN112795654A (en) Method and kit for organism fusion gene detection and fusion abundance quantification
CN109859797A (en) Reference-free miRNA data analysis method based on the miRBase database
WO2023082305A1 (en) Library construction element compatible with double sequencing platforms, kit, and library construction method
CN115101128A (en) Method for evaluating off-target risk of hybridization capture probe
CN106520958B (en) Method for developing microsatellite marker locus and method for detecting length of microsatellite marker in microsatellite marker locus
Forsberg et al. CLC Bio Integrated Platform for Handling and Analysis of Tag Sequencing Data
CN115948607B (en) Method and kit for simultaneously detecting multiple pathogen genes
CN114277096B (en) Method and kit for identifying thalassemia alpha anti4.2 heterozygotes and HK alpha heterozygotes
CN108388771B (en) Automatic biodiversity analysis method
CN116064818A (en) Primer group, method and system for detecting IGH gene rearrangement and hypermutation
CN116065240A (en) Method and kit for constructing RNA sequencing library in high throughput

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination