WO2016090585A1

WO2016090585A1 - Sequencing data processing apparatus and method

Info

Publication number: WO2016090585A1
Application number: PCT/CN2014/093516
Authority: WO
Inventors: 刘兴民; 刘敬一; 刘耿; 赵鑫; 杨明; 侯勇; 吴逵; 李波
Original assignee: 深圳华大基因研究院
Priority date: 2014-12-10
Filing date: 2014-12-10
Publication date: 2016-06-16
Also published as: CN107077538A; CN107077538B

Abstract

Disclosed are a sequencing data processing apparatus, system and method, a computer-readable storage medium, and a method and apparatus for detecting SNP (Single Nucleotide Polymorphism), the sequencing data processing apparatus comprising: a data receiving unit (10) for receiving the sequencing data, wherein the sequencing data comprises a plurality of read pairs with each pair being composed of two reads originating from two locations of one chromosome segment respectively, and each read comprises a gap; a processor (20) for executing a data processing program, wherein executing the data processing program comprises comparing the sequencing data with a reference sequence to obtain a comparison result, and eliminating the gap of each read in the comparison result to obtain a universal comparison result; and at least one storage unit (30) for data storage, wherein the data processing program is comprised.

Description

Sequencing data processing device and method

Technical field

The present invention relates to the field of bioinformatics. Specifically, the present invention relates to a sequencing data processing apparatus and method, and more particularly to a sequencing data processing apparatus, a sequencing data processing system, a sequencing data processing method, a computer-readable storage medium, a method for detecting SNPs, and an SNP detection apparatus.

Background art

cfDNA (cell-free DNA), which is present in serum, plasma and other body fluids, is an effective biomarker that can be applied to the detection of a variety of mutations, such as those associated with cancer, fetal chromosomal variation and other genetic mutations. Because quantitative analysis techniques with high sensitivity and accuracy were lacking, previous studies focused on a small number of known disease-related genes and conditions, such as the GNAQ gene in melanoma (Metz, Claudia H. D., et al. "Ultradeep sequencing detects GNAQ and GNA11 mutations in cell-free DNA from plasma of patients with uveal melanoma." Cancer Medicine 2.2 (2013): 208-215.) and trisomy 21 (Liao, Gary J. W., et al. "Noninvasive prenatal diagnosis of fetal trisomy 21 by allelic ratio analysis using targeted massively parallel sequencing of maternal plasma DNA." PLoS One 7.5 (2012): e38154.).

The emergence of the next-generation sequencing technologies 454 (Roche), Solexa (Illumina) and SOLiD (ABI) led to a rapid increase in sequencing throughput and a sharp drop in sequencing cost, which opened new approaches to cfDNA detection. Massively parallel sequencing (MPS) is the most widely used cfDNA detection technology. It is applied in molecular diagnosis based on plasma DNA, detection of fetal chromosomal aneuploidy, whole-genome karyotyping, and even fetal whole-genome sequencing.

Single nucleotide polymorphism (SNP) refers to variation of a single nucleotide in the genome (including transitions, transversions, deletions and insertions). The genetic markers formed in this way are numerous and highly polymorphic. SNPs may cause a variety of human diseases, such as cancer, infectious diseases (AIDS, leprosy, hepatitis, etc.), autoimmune diseases, neuropsychiatric diseases, sickle cell anemia, beta thalassemia and cystic fibrosis [Ingram, V. M. "A Specific Chemical Difference Between the Globins of Normal Human and Sickle-Cell" Nature 178 (1956).]; SNPs related to disease may become the main gene targets for drug therapy [Fareed, Mohd, and Mohammad Afzal. "Single nucleotide polymorphism in genome-wide association of human population: A tool for broad spectrum" Egyptian Journal of Medical Human Genetics 14.2 (2013): 123-134.]; the metabolism of certain drugs is closely related to SNPs [Yanase, Kae, et al. "Functional SNPs of the breast cancer resistance protein-therapeutic effects and inhibitor development." Cancer Letters 234.1 (2006): 73-80.]; and SNPs that have no effect on phenotype are important in genome-wide association studies (GWAS) because they are stably inherited across generations [Thomas, Philippe E., et al. "Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers." BMC Bioinformatics 12. Suppl 4 (2011): S4.]. SNPs are therefore called third-generation genetic markers and have been studied extensively.

Summary of the invention

The present invention aims to solve at least one of the above technical problems, or at least to provide a useful commercial alternative.

According to a first aspect of the present invention, there is provided a sequencing data processing apparatus comprising: a data receiving unit for receiving sequencing data, wherein the sequencing data comprises a plurality of read pairs, each read pair consists of two reads originating from two positions of one chromosome fragment, the two reads of each read pair originate from the positive strand and the negative strand of the chromosome fragment respectively, or both originate from the positive strand or both from the negative strand of the chromosome fragment, each read contains a gap, and the two reads of a read pair are defined as the left arm and the right arm respectively; a processor for executing a data processing program, wherein executing the data processing program comprises aligning the sequencing data with a reference sequence to obtain an alignment result, and eliminating the gap of each read in the alignment result to obtain a universal alignment result, the alignment result comprising alignment results of a plurality of the read pairs and/or alignment results of a plurality of the left arms and of a plurality of the right arms; and at least one storage unit for storing data, including the data processing program.

The read pairs originating from two positions of a chromosome fragment can be obtained by constructing a paired-end library or a mate-pair library and sequencing the constructed library. In one embodiment of the present invention, the read pairs are obtained using the library construction method and sequencing platform of Complete Genomics (CG), where the distance between the two reads of a pair is controlled by the read length and by the distance between the recognition site and the cleavage site of the enzyme.

The CG platform constructs a multi-linker paired-end library by enzymatic digestion and sequences the resulting circular library with its combinatorial probe-anchor ligation (cPAL) sequencing technology, reading the bases on both sides of each linker. The two parts of a linker are attached after restriction enzyme digestion. Each enzyme has a preferred cutting distance, but in practice it often cuts one position more or one position less than that distance, so the reads often carry a gap, commonly of +1 or -1. In addition or alternatively, if the same enzyme is used for several digestions during library construction, the cutting position shifts easily, and this shift also leaves gaps in the reads. For example, when a multi-linker circular library is constructed, the Alu enzyme is used for two digestions to attach different parts of the linkers, and reading the bases adjacent to the linkers yields reads with a gap of +3/-3. The gap size in the present invention may also be zero.

Taking the current two-linker (2-AD) sequencing library of the CG platform as an example, the 2-AD sequencing output has a total length of 60 bp and can be divided into two mate-paired read pairs. Within each read pair, the reads have a small gap at the 10 bp position and an invalid sequencing site N at the 20 bp position, and the distance between the two reads of a pair is generally less than 2000 bp. Among the multiple reads obtained from a multi-linker library, any one read can form a read pair with any other read.

The terms "positive strand" and "negative strand" as used herein refer to the two complementary strands that make up a chromosome fragment and are relative: if one strand is called the positive strand, its complementary strand is the negative strand. In an embodiment of the present invention, the strand that matches the reference sequence is called the positive strand and the other strand is called the negative strand.

In the present invention, the alignment can be performed with known alignment software such as SOAP or BWA, or with TeraMap, the alignment software of the CG platform. In one embodiment of the invention, the alignment is performed with TeraMap and the resulting alignment result is in TeraMap format.

In one embodiment of the present invention, eliminating the gap of each read in the alignment result means that, for a read with a negative gap, the negative gap is removed, that is, the overlapping bases are removed, and, for a read with a positive gap, the positive gap is filled with as many N as the gap size, where N is A, T, C or G. For example, a read with a negative gap of -2 nt can be divided at the gap into two parts whose ends overlap by 2 nt. If the two parts of the read are ATCGCTTAAG and AGTACGATTC, removing the overlap caused by the negative gap gives the read ATCGCTTAAGTACGATTC.
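The gap elimination described above can be illustrated with a minimal Python sketch. The function name `eliminate_gap` and the use of the literal character "N" as the filler for positive gaps are assumptions made for illustration; the text only states that a negative gap is removed by dropping the overlapping bases and that a positive gap is filled with N.

```python
def eliminate_gap(first: str, second: str, gap: int, fill: str = "N") -> str:
    """Join the two parts of a gapped read into one contiguous read.

    gap < 0: the last -gap bases of `first` overlap the first -gap bases
             of `second`, so the overlapping bases are kept only once.
    gap > 0: `gap` filler characters are placed between the two parts.
    gap == 0: the two parts are simply concatenated.
    """
    if gap < 0:
        # Drop the overlapping bases from the start of the second part.
        return first + second[-gap:]
    return first + fill * gap + second


# Example from the description: a -2 nt gap between the two parts.
print(eliminate_gap("ATCGCTTAAG", "AGTACGATTC", -2))  # ATCGCTTAAGTACGATTC
```

A read treated this way no longer carries a platform-specific gap and can be represented as an ordinary contiguous sequence.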

In one embodiment of the invention, the alignment comprises: aligning the left arm and the right arm of each read pair to the reference sequence separately to obtain a first-level left alignment result and a first-level right alignment result; using one of the first-level left alignment result and the first-level right alignment result as a baseline and aligning the other arm against it to obtain a second-level left alignment result and a second-level right alignment result; and, based on the second-level left alignment result and the second-level right alignment result, obtaining alignment results of a plurality of the read pairs, or alignment results of a plurality of the left arms and of a plurality of the right arms. The read alignment result is thus obtained after two rounds of alignment. In one embodiment of the present invention, the first round is a global alignment against the reference sequence, and the second round, in which the alignment result of the left arm (or right arm) serves as the baseline for aligning the right arm (or left arm), is a local alignment. Two reads taken from the second-level left alignment result and the second-level right alignment result respectively that align to the same chromosome at a distance matching the expected distance can then be paired into a read pair, giving the read-pair alignment result.
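The final pairing step can be sketched as follows. This is only an illustration of the pairing criterion (same chromosome, distance within the expected range), not of the alignment software itself. The `Hit` record, the function name `pair_hits` and the default limit of 2000 bp (borrowed from the 2-AD example in the description) are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    """Hypothetical alignment hit of one arm on the reference."""
    chrom: str
    pos: int      # leftmost aligned position on the chromosome
    strand: str   # "+" or "-"


def pair_hits(left_hits: List[Hit], right_hits: List[Hit],
              max_distance: int = 2000) -> List[Tuple[Hit, Hit]]:
    """Pair left-arm and right-arm hits that lie on the same chromosome
    and whose distance does not exceed the expected maximum."""
    pairs = []
    for left in left_hits:
        for right in right_hits:
            same_chrom = left.chrom == right.chrom
            if same_chrom and abs(right.pos - left.pos) <= max_distance:
                pairs.append((left, right))
    return pairs


left = [Hit("chr1", 1000, "+"), Hit("chr2", 500, "+")]
right = [Hit("chr1", 1140, "-"), Hit("chr1", 9000, "-"), Hit("chr3", 520, "-")]
print(pair_hits(left, right))
```

Only the chr1 hits at positions 1000 and 1140 satisfy both conditions in this toy input.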

In an embodiment of the invention, the alignment comprises: setting the size of the gap to several values and aligning each left arm or each right arm with the reference sequence multiple times to obtain an optimal alignment result. For example, the gap of each left arm or each right arm is set to -3 nt, -2 nt, -1 nt, 0 nt, 1 nt, 2 nt, 3 nt, 4 nt, 5 nt, 6 nt and 7 nt in turn to obtain a corresponding plurality of reads; these reads are each aligned with the reference sequence, and the best-aligned sequence is taken as the left arm/right arm. Which alignment is optimal may be judged by the default evaluation of the alignment software used.
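A minimal sketch of this gap enumeration is given below. Real alignment software searches the whole reference and applies its own scoring. Here, purely as an assumption for illustration, the candidate read is compared with the reference at a known start position and the number of mismatches serves as the score, with N treated as matching any base. All function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def join_with_gap(first: str, second: str, gap: int) -> str:
    """Build the candidate read for one assumed gap size."""
    if gap < 0:
        return first + second[-gap:]      # drop overlapping bases
    return first + "N" * gap + second     # fill a positive gap with N


def count_mismatches(read: str, reference: str, start: int) -> Optional[int]:
    """Count mismatches at a fixed start; None if the read does not fit."""
    window = reference[start:start + len(read)]
    if len(window) < len(read):
        return None
    return sum(1 for r, w in zip(read, window) if r != "N" and r != w)


def best_gap(first: str, second: str, reference: str, start: int,
             gaps: Iterable[int] = range(-3, 8)
             ) -> Optional[Tuple[int, str, int]]:
    """Try every gap size from -3 nt to 7 nt and keep the best candidate.

    Returns (gap, candidate read, mismatches), or None if nothing fits.
    """
    best = None
    for gap in gaps:
        read = join_with_gap(first, second, gap)
        mismatches = count_mismatches(read, reference, start)
        if mismatches is None:
            continue
        if best is None or mismatches < best[2]:
            best = (gap, read, mismatches)
    return best


reference = "GATTACAGGCTTAACCGGTTAC"
print(best_gap("GATTACAG", "TTAACCGG", reference, 0))
# (2, 'GATTACAGNNTTAACCGG', 0)
```

In this toy input the two parts of the read are separated by 2 nt on the reference, so the candidate built with a gap of +2 is the only one that aligns without mismatches.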

In an embodiment of the present invention, executing the data processing program further comprises, before eliminating the gap of each read in the alignment result, extracting the unique alignment result from the alignment result to replace the alignment result. The unique alignment result comprises a plurality of read pairs that align uniquely to the reference sequence, where the two reads of each pair align to the same chromosome of the reference sequence and the distance between the two reads of each pair matches the expected distance between the two positions of the chromosome fragment from which they originate.
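The extraction of unique alignments can be sketched as a simple filter. The record layout `PairAlignment`, the function name `extract_unique` and the 2000 bp distance limit are assumptions for illustration; the actual fields depend on the output of the alignment software used.

```python
from collections import Counter
from typing import List, NamedTuple


class PairAlignment(NamedTuple):
    """Hypothetical record: one reported alignment of one read pair."""
    pair_id: str
    left_chrom: str
    left_pos: int
    right_chrom: str
    right_pos: int


def extract_unique(alignments: List[PairAlignment],
                   max_distance: int = 2000) -> List[PairAlignment]:
    """Keep read pairs that are reported exactly once, whose two arms lie
    on the same chromosome, and whose distance is within the expected range."""
    hits_per_pair = Counter(a.pair_id for a in alignments)
    return [
        a for a in alignments
        if hits_per_pair[a.pair_id] == 1
        and a.left_chrom == a.right_chrom
        and abs(a.right_pos - a.left_pos) <= max_distance
    ]


records = [
    PairAlignment("p1", "chr1", 100, "chr1", 240),     # kept
    PairAlignment("p2", "chr1", 100, "chr1", 240),     # aligned twice
    PairAlignment("p2", "chr5", 700, "chr5", 840),
    PairAlignment("p3", "chr1", 100, "chr2", 240),     # different chromosomes
    PairAlignment("p4", "chr1", 100, "chr1", 90000),   # arms too far apart
]
print([a.pair_id for a in extract_unique(records)])  # ['p1']
```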

In one embodiment of the invention, executing the data processing program further comprises a correction in which the two reads of each read pair in the unique alignment result are both made to align to the positive strand of the same chromosome of the reference sequence. For example, for a read pair whose reads align to the positive strand and the negative strand of a chromosome respectively, the read aligned to the negative strand is replaced by its reverse complement; this replacement is the correction.
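The strand correction amounts to taking the reverse complement of any read aligned to the negative strand. A minimal sketch, with the function names and the "+"/"-" strand encoding as assumptions:

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_positive_strand(seq: str, strand: str) -> str:
    """Express a read on the positive strand of the reference."""
    return seq if strand == "+" else reverse_complement(seq)


print(to_positive_strand("ATCG", "-"))  # CGAT
print(to_positive_strand("ATCG", "+"))  # ATCG
```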

In an embodiment of the invention, executing the data processing program further comprises data format conversion, which includes converting the format of the alignment result or of the unique alignment result. In one implementation of the present invention, the universal alignment result is required to be in SAM or BAM format to facilitate subsequent analysis based on the alignment result or the unique alignment result. SAM and BAM are common alignment formats; SAM is a text format and BAM is its compressed binary form. Depending on the alignment software used, the format of the output alignment result or unique alignment result may not be accepted by existing downstream data processing or analysis software. For example, an alignment result in the aforementioned TeraMap format does not meet the input format requirements of most existing variant detection software, such as SOAPsnp, GATK or SOAPindel. Converting the data format yields a universal alignment result in a common data format, which facilitates further analysis and processing of the data.
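As an illustration of the target format, the sketch below writes one gap-free read as a single SAM alignment line with the eleven mandatory SAM fields. The input arguments are hypothetical, since the TeraMap format is not described here, and the simplifications (a pure match CIGAR, no mate information, no base qualities, mapping quality 255 meaning "not available") are assumptions.

```python
def to_sam_line(name: str, chrom: str, pos: int, strand: str,
                seq: str, mapq: int = 255) -> str:
    """Format one aligned, gap-free read as a minimal SAM record.

    `seq` is assumed to be expressed on the positive strand already,
    as SAM expects; `pos` is the 1-based leftmost aligned position.
    """
    flag = 16 if strand == "-" else 0   # 16 = read aligned to reverse strand
    cigar = f"{len(seq)}M"              # simplification: all bases aligned
    fields = [
        name,        # QNAME
        str(flag),   # FLAG
        chrom,       # RNAME
        str(pos),    # POS
        str(mapq),   # MAPQ
        cigar,       # CIGAR
        "*",         # RNEXT (mate reference not recorded in this sketch)
        "0",         # PNEXT
        "0",         # TLEN
        seq,         # SEQ
        "*",         # QUAL (base qualities not recorded in this sketch)
    ]
    return "\t".join(fields)


print(to_sam_line("read1", "chr1", 100, "+", "ATCGCTTAAGTACGATTC"))
```

A complete SAM file additionally needs header lines describing the reference sequences, and conversion to BAM is normally left to existing tools.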

According to a second aspect of the present invention, there is provided a sequencing data processing system comprising a host and a display, the system further comprising the sequencing data processing apparatus according to the first aspect of the present invention or any of its embodiments. The foregoing description of the advantages and technical features of the sequencing data processing apparatus applies equally to the system of the present invention and is not repeated here.

According to a third aspect of the present invention, there is provided a method for processing sequencing data, the method comprising the steps of: acquiring sequencing data, wherein the sequencing data comprises a plurality of read pairs, each read pair consists of two reads originating from two positions of one chromosome fragment, the two reads of each read pair originate from the positive strand and the negative strand of the chromosome fragment respectively, or both originate from the positive strand or both from the negative strand of the chromosome fragment, each read contains a gap, and the two reads of a read pair are defined as the left arm and the right arm respectively; aligning the sequencing data with a reference sequence to obtain an alignment result, the alignment result comprising alignment results of a plurality of the read pairs and/or alignment results of a plurality of the left arms and of a plurality of the right arms; and eliminating the gap of each read in the alignment result to obtain a universal alignment result. For the way the read pairs are acquired, the gaps contained in the reads, the alignment, the elimination of the gaps, the alignment result and the universal alignment result, reference may be made to the description of the corresponding technical features of the apparatus of the foregoing aspect of the present invention or any of its embodiments.

For example, in the same way, the read pairs originating from two positions of a chromosome fragment can be obtained by constructing a paired-end library or a mate-pair library and sequencing the constructed library. In one embodiment of the present invention, the read pairs are obtained using the library construction method and sequencing platform of Complete Genomics (CG), where the distance between the two reads of a pair is controlled by the read length and by the distance between the recognition site and the cleavage site of the enzyme.

The CG platform constructs a multi-linker paired-end library by enzymatic digestion and sequences the resulting circular library with its combinatorial probe-anchor ligation (cPAL) sequencing technology, reading the bases on both sides of each linker. The two parts of a linker are attached after restriction enzyme digestion. Each enzyme has a preferred cutting distance, but in practice it often cuts one position more or one position less than that distance, so the reads often carry a gap, commonly of +1 or -1. In addition or alternatively, if the same enzyme is used for several digestions during library construction, the cutting position shifts easily, and this shift also leaves gaps in the reads. For example, when a multi-linker circular library is constructed, the Alu enzyme is used for two digestions to attach different parts of the linkers, and reading the bases adjacent to the linkers yields reads with a gap of +3/-3. The gap size in the present invention may also be zero. Among the multiple reads obtained from a multi-linker library, any one read can form a read pair with any other read.

The terms "positive strand" and "negative strand" as used herein refer to the two complementary strands that make up a chromosome fragment and are relative: if one strand is called the positive strand, its complementary strand is the negative strand. In an embodiment of the present invention, the strand that matches the reference sequence is called the positive strand and the other strand is called the negative strand.

In the present invention, the alignment can be performed with known alignment software such as SOAP or BWA, or with TeraMap, the alignment software of the CG platform. In one embodiment of the invention, the alignment is performed with TeraMap and the resulting alignment result is in TeraMap format.

In one embodiment of the present invention, eliminating the gap of each read in the alignment result means that, for a read with a negative gap, the negative gap is removed, that is, the overlapping bases are removed, and, for a read with a positive gap, the positive gap is filled with as many N as the gap size, where N is A, T, C or G. For example, a read with a negative gap of -2 nt can be divided at the gap into two parts whose ends overlap by 2 nt. If the two parts of the read are ATCGCTTAAG and AGTACGATTC, removing the overlap caused by the negative gap gives the read ATCGCTTAAGTACGATTC.

In one embodiment of the invention, acquiring the sequencing data comprises constructing a sequencing library, the sequencing library being a single-stranded circular DNA library composed of one strand of the chromosome fragment and at least one predetermined DNA sequence. The single-stranded circular library can be constructed by a known library construction method. For example, a single-linker circular double-stranded library is obtained by following the construction of the SOLiD paired-end library of Life Technologies, and a single-stranded circular library is then obtained from the double-stranded library. In one embodiment of the invention, the single-stranded circular library is constructed using the CG library construction technique; for the library construction, reference may be made to US7897344, which yields a multi-linker single-stranded circular library.

In one embodiment of the invention, the two reads of each read pair originate from the two ends of the chromosome fragment. Following an improved CG library construction technique, the two parts of one linker are ligated to the two ends of a chromosome fragment, and the product is separated into single strands and circularized to obtain a 1-linker single-stranded circular library. The 1-linker single-stranded circular library consists of one strand of the chromosome fragment and a predetermined DNA sequence connecting the two ends of that strand. The library is amplified by rolling-circle amplification to form DNA nanoballs (DNBs) and sequenced on the CG platform using its high-density DNA nanochip technology, in which the DNA nanoballs are embedded on a chip and the sequence is read with the non-consecutive, unchained combinatorial probe-anchor ligation (cPAL) technology. For embedding DNBs on the chip and for the cPAL technology, reference may be made to US8278039B2 and US8518640B2 respectively. The predetermined DNA sequence is a known sequence and is the aforementioned linker or a concatenation of linkers.

The improved CG library construction method for constructing a 1-linker circular single-stranded library comprises the steps of: (1) extracting the nucleic acid to be tested; (2) phosphorylating the ends of the nucleic acid to obtain a terminal phosphorylation product; (3) end-repairing the terminal phosphorylation product to obtain an end-repaired product; (4) ligating a first sequence and a second sequence to the two ends of the end-repaired product respectively to obtain a ligation product; (5) performing nick translation and amplification on the ligation product using a third sequence to obtain an amplification product, the third sequence being a primer pair in which at least one primer carries a biotin label; (6) separating single strands from the amplification product by means of the biotin label to obtain a single-stranded product; and (7) circularizing the single-stranded product with a fourth sequence to obtain the sequencing library; wherein the fourth sequence is capable of joining one end of the first sequence and one end of the second sequence, and the other end of the first sequence and/or the second sequence is a dideoxynucleotide. The fourth sequence is capable of joining the first sequence and the second sequence into one linker. The nick translation eliminates the nick caused by the dideoxynucleotide at the other end of the first sequence and/or the second sequence ligated to the end-repaired product. Because at least one primer carries a biotin label, at least one strand of the amplification product carries a biotin label, which makes it easier to isolate the single-stranded product subsequently.

In one embodiment of the present invention, the improved CG library construction method for constructing a 1-linker circular single-stranded library comprises the steps of: (1) extracting the nucleic acid to be tested; (2) end-repairing the nucleic acid to obtain an end-repaired product; (3) phosphorylating the ends of the end-repaired product to obtain a terminal phosphorylation product; (4) ligating a first sequence and a second sequence to the two ends of the terminal phosphorylation product respectively to obtain a ligation product; (5) performing nick translation and amplification on the ligation product using a third sequence to obtain an amplification product, the third sequence being a primer pair in which at least one primer carries a biotin label; (6) separating single strands from the amplification product by means of the biotin label to obtain a single-stranded product; and (7) circularizing the single-stranded product with a fourth sequence to obtain the sequencing library; wherein the fourth sequence is capable of joining one end of the first sequence and one end of the second sequence, and the other end of the first sequence and/or the second sequence is a dideoxynucleotide.

The order of the end repair and terminal phosphorylation steps is not limited. End repair produces blunt-ended nucleic acid fragments to which other nucleotides or sequences can be attached. Terminal phosphorylation reduces the ligation of sample nucleic acid fragments to one another, so that a library meeting the library requirements can be constructed even from samples with a low nucleic acid content. The single-linker circular single-stranded library is shown in Figure 1. The constructed single-linker circular single-stranded library (1-AD) is sequenced on the machine. The read pair output by 1-AD sequencing has a total length of about 30 bp, one read being 12 bp and the other 19 bp, and the median distance on the genome between the two reads of a pair is about 140 bp. Single-linker library construction requires only a small amount of input material, which suits samples with a low cfDNA content, and it has the advantages of a short construction time and a low construction cost.

In one embodiment of the invention, the alignment in the method of the invention comprises: aligning the left arm and the right arm of each read pair to the reference sequence separately to obtain a first-level left alignment result and a first-level right alignment result; using one of the first-level left alignment result and the first-level right alignment result as a baseline and aligning the other arm against it to obtain a second-level left alignment result and a second-level right alignment result; and, based on the second-level left alignment result and the second-level right alignment result, obtaining alignment results of a plurality of the read pairs, or alignment results of a plurality of the left arms and of a plurality of the right arms. The read alignment result is thus obtained after two rounds of alignment. In one embodiment of the present invention, the first round is a global alignment against the reference sequence, and the second round, in which the alignment result of the left arm (or right arm) serves as the baseline for aligning the right arm (or left arm), is a local alignment. Two reads taken from the second-level left alignment result and the second-level right alignment result respectively that align to the same chromosome at a distance matching the expected distance can then be paired into a read pair, giving the read-pair alignment result.

In one embodiment of the invention, the alignment comprises setting the size of the gap to several values and aligning each left arm or each right arm with the reference sequence multiple times to obtain an optimal alignment result. For example, the gap of each left arm or each right arm is set to -3 nt, -2 nt, -1 nt, 0 nt, 1 nt, 2 nt, 3 nt, 4 nt, 5 nt, 6 nt and 7 nt in turn to obtain a corresponding plurality of reads; these reads are each aligned with the reference sequence, and the best-aligned sequence is taken as the left arm/right arm. Which alignment is optimal may be judged by the default evaluation of the alignment software used.

In one embodiment of the invention, the method further comprises data format conversion, which includes converting the format of the alignment result or of the unique alignment result. In one implementation of the present invention, the universal alignment result is required to be in SAM or BAM format to facilitate subsequent analysis based on the alignment result or the unique alignment result. SAM and BAM are common alignment formats; SAM is a text format and BAM is its compressed binary form. Depending on the alignment software used, the format of the output alignment result or unique alignment result may not be accepted by existing downstream data processing or analysis software. For example, an alignment result in the aforementioned TeraMap format does not meet the input format requirements of most existing variant detection software, such as SOAPsnp, GATK or SOAPindel. Converting the data format yields a universal alignment result in a common data format, which facilitates further analysis and processing of the data.

According to a fourth aspect of the present invention, there is provided a computer-readable storage medium for storing a program for execution by a computer, wherein executing the program comprises performing the sequencing data processing method of the foregoing aspect of the invention or any of its embodiments. The foregoing description of the advantages and technical features of the sequencing data processing method of the present invention also applies to the computer-readable storage medium and is not repeated here. The storage medium may include a read-only memory, a random access memory, a magnetic disk, an optical disk, and the like.

According to a fifth aspect of the invention, there is provided a method for detecting a single nucleotide polymorphism (SNP), the method comprising: A. obtaining nucleic acid from a sample to be tested; B. sequencing at least a portion of the nucleic acid to obtain sequencing data; C. processing the sequencing data to obtain a universal alignment result; and D. detecting SNPs based on the universal alignment result; wherein step C is performed using the sequencing data processing apparatus and/or method of any of the foregoing aspects of the invention or any of their embodiments. The above description of the advantages and technical features of the sequencing data processing apparatus and/or method of the present invention also applies to the SNP detection method of this aspect of the present invention and is not repeated here.

In one embodiment of the invention, the step B comprises performing a sequencing library construction on at least a portion of the nucleic acid to obtain a sequencing library, the sequencing library being a single-stranded circular DNA library, the single-stranded circular DNA library The construct comprises: terminal phosphorylating the nucleic acid to obtain a terminal phosphorylation product; terminally repairing the terminal phosphorylation product to obtain an end repair a multiplex product; a first sequence and a second sequence are ligated to both ends of the terminal repair product to obtain a first ligation product; and the ligation product is subjected to nick translation and amplification using a third sequence to obtain an amplification product, The third sequence is a pair of primer pairs, at least one primer of the primer pair carrying a biotin label; the amplification product is subjected to single-strand separation using the biotin label to obtain a single-stranded product; a single-stranded product, wherein the sequencing library is obtained; wherein the fourth sequence is capable of joining one end of the first sequence and one end of the second sequence, the first sequence and/or the second sequence One end is a dideoxynucleotide. In another embodiment of the invention, end repair is performed followed by terminal phosphorylation. End repair is to obtain a blunt-ended nucleic acid fragment that enables attachment of other nucleotides or sequences. Terminal phosphorylation is to reduce the interconnection of sample nucleic acid fragments, so that samples with low nucleic acid content can also be constructed in a library and meet the requirements of the library. 
The fourth sequence is capable of joining the first sequence and the second sequence to form a single linker. Nick translation eliminates the nick caused by the dideoxynucleotide at the end of the first sequence and/or the second sequence ligated to the end repair product. At least one primer carries a biotin label so that at least one strand of the amplification product is biotin-labeled, which makes the subsequent biotin-based separation of the single-stranded product straightforward. The single-linker circular single-stranded library is shown in Figure 1. Single-linker library construction requires only a small amount of starting material, which suits samples with low cfDNA content, and it also has the advantages of short construction time and low cost. In one embodiment of the invention, the constructed library is sequenced using combinatorial probe-anchor ligation sequencing, for example on a CG sequencing platform, which avoids the accumulation of sequencing errors; its accuracy is higher than that of sequencing-by-synthesis and sequencing-by-ligation methods, reaching up to 99.999%. Obtaining the sequencing data by single-linker sequencing on the CG platform is also cheaper and faster.

SNP detection based on the universal alignment result can use currently known SNP detection methods and/or software, such as SOAP2, samtools, GATK and the like. In one embodiment of the present invention, the open source software samtools is used to preprocess the universal alignment results (bam files), including sorting and removing duplicate reads generated by PCR during library construction; samtools mpileup and/or the open source software GATK is then used to obtain the SNP results, which are converted to vcf format using the open source software bcftools. Compared with other tools, samtools is simple to operate, its output format is widely used, and multi-threading can be used to improve efficiency when processing large data sets.
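The preprocessing and calling steps described above could be scripted roughly as follows. This is a minimal sketch, not the inventors' pipeline. It assumes a recent samtools/bcftools 1.x installation, in which the pileup step that feeds variant calling is provided by `bcftools mpileup` (older releases used `samtools mpileup`, as named in the text). The function name `snp_calling_commands` and all file names are hypothetical, and the commands are only assembled, not executed.

```python
def snp_calling_commands(bam, reference, prefix):
    """Build the command lines for sort -> remove duplicates -> pileup -> call.

    Assumption: samtools/bcftools 1.x option names; check the manuals of the
    installed version before running anything.
    """
    sorted_bam = f"{prefix}.sorted.bam"
    dedup_bam = f"{prefix}.rmdup.bam"
    bcf = f"{prefix}.bcf"
    vcf = f"{prefix}.vcf"
    return [
        # Sort alignments by coordinate.
        ["samtools", "sort", "-o", sorted_bam, bam],
        # Remove PCR duplicates introduced during library construction.
        ["samtools", "rmdup", sorted_bam, dedup_bam],
        # Compute genotype likelihoods against the reference (BCF output).
        ["bcftools", "mpileup", "-f", reference, "-O", "b", "-o", bcf, dedup_bam],
        # Call variants only and write them in VCF format.
        ["bcftools", "call", "-m", "-v", "-O", "v", "-o", vcf, bcf],
    ]


if __name__ == "__main__":
    for cmd in snp_calling_commands("sample.bam", "ref.fa", "sample"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the option names have been checked against the installed tool versions.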

According to a sixth aspect of the present invention, the present invention provides a SNP detecting apparatus for performing all or part of the steps of the SNP detection method of one aspect of the present invention, the apparatus comprising: a nucleic acid acquiring device for acquiring nucleic acid of a sample to be tested; a sequencing device for sequencing at least a portion of the nucleic acid from the nucleic acid acquiring device to obtain sequencing data, the sequencing data comprising a plurality of read pairs, each read pair consisting of two reads that come from two positions of one chromosome fragment, the two reads of each read pair coming from the positive strand and the negative strand of the chromosome fragment respectively, or both coming from the positive strand or both from the negative strand of the chromosome fragment, each read containing a gap, and the two reads of a read pair being defined as a left arm and a right arm respectively; a data processing device for processing the sequencing data from the sequencing device to obtain a universal alignment result; and a detecting device for detecting SNPs based on the universal alignment result from the data processing device; wherein the data processing device comprises: a data receiving unit for receiving the sequencing data from the sequencing device; a processor for executing a data processing program, wherein executing the data processing program comprises aligning the sequencing data with a reference sequence to obtain an alignment result and eliminating the gap of each read in the alignment result to obtain the universal alignment result, the alignment result comprising alignment results of a plurality of the read pairs and/or alignment results of a plurality of the left arms and alignment results of a plurality of the right arms; and at least one storage unit for storing data, including the data processing program. The foregoing description of the advantages and technical features of the SNP detection method in one aspect of the present invention or any of its specific embodiments also applies to the SNP detecting apparatus of this aspect of the present invention and is not repeated here. Those skilled in the art will understand that all or some of the units of the apparatus optionally and detachably include one or more sub-units for performing or implementing the various embodiments of the SNP detection method of the present invention described above.

The data processing apparatus, system and/or method of the invention includes the development of the TeraMap2Sam conversion software, which accurately handles the gaps in the sequencing data and converts the TeraMap alignment results of the CG platform into the general SAM format, so that samtools, GATK and many other well-established open source tools can be used directly for subsequent mutation detection, widening the choice of downstream analyses. The SNP detection method and/or device of the present invention, which uses samtools software for SNP analysis, is simple, general, fast and highly reliable.
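To illustrate what the target of such a conversion looks like, the sketch below formats one gap-eliminated read as a SAM alignment line with the eleven mandatory SAM fields. It does not reproduce TeraMap2Sam: the TeraMap input format is not described here, the function name `sam_record` is hypothetical, and the CIGAR string is simplified to a full-length match on the assumption that the gap has already been eliminated from the read.

```python
def sam_record(name, flag, chrom, pos, mapq, seq, qual="*",
               mate_chrom="*", mate_pos=0, tlen=0):
    """Format one alignment as a tab-separated SAM line.

    Field order follows the SAM specification:
    QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL.
    POS is 1-based; "*" marks an unavailable RNEXT or QUAL value.
    """
    cigar = f"{len(seq)}M"  # assumption: gap-free read, all bases aligned
    fields = [name, flag, chrom, pos, mapq, cigar,
              mate_chrom, mate_pos, tlen, seq, qual]
    return "\t".join(str(field) for field in fields)


if __name__ == "__main__":
    print(sam_record("read1", 0, "chr1", 100, 60, "ATCGCTTAAGTACGATTC"))
```

Writing the records in this general format is what allows downstream tools that read SAM/BAM to be used without knowledge of the platform-specific alignment format.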

DRAWINGS

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the description of the embodiments in conjunction with the following drawings, in which:

Figure 1 is a schematic view showing the structure of a single-linker circular single-stranded library in one embodiment of the present invention;

Figure 2 is a schematic structural diagram of a sequencing data processing apparatus in an embodiment of the present invention;

Figure 3 is a schematic structural diagram of a sequencing data processing system in an embodiment of the present invention;

Figure 4 is a flow chart of a method for processing sequencing data in an embodiment of the present invention;

Figure 5 is a flow chart showing a method of processing sequencing data in an embodiment of the present invention;

Figure 6 is a flow chart of a SNP detecting method in one embodiment of the present invention;

Figure 7 is a schematic structural diagram of a SNP detecting apparatus in an embodiment of the present invention;

Figure 8 is a flow diagram showing the construction and sequencing of a single-linker library in one embodiment of the present invention;

Figure 9 is a flow chart of SNP detection in an embodiment of the present invention;

Figure 10 is a flow chart of the algorithm of the TeraMap2Sam software in one embodiment of the present invention.

Detailed description

The embodiments of the present invention are described in detail below, and examples of the embodiments are illustrated in the drawings, in which the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are illustrative only and are not to be construed as limiting the invention. It should be noted that the terms "first", "second", "third", "fourth" or "first grade", "secondary grade" and the like are used herein merely for convenience of description; they are not to be understood as indicating or implying relative importance or a sequential relationship. In the description of the present invention, "a plurality" means two or more unless otherwise stated.

Figure 2 is a schematic structural diagram of one embodiment of the sequencing data processing apparatus of the present invention. The sequencing data processing apparatus 100 includes a data receiving unit 10, a processor 20 and a storage unit 30; the processor 20 is connected to the data receiving unit 10 and the storage unit 30, and the storage unit 30 is connected to the data receiving unit 10. The data receiving unit 10 is configured to receive sequencing data comprising a plurality of read pairs. Each read pair consists of two reads that come from two positions of one chromosome fragment; the two reads of each pair come from the positive strand and the negative strand of the chromosome fragment respectively, or both come from the positive strand or both from the negative strand. Each read contains a gap, and the two reads of a read pair are defined as the left arm and the right arm respectively. Read pairs from two positions of a chromosome fragment can be obtained by constructing a paired-end library or a mate-pair library and sequencing the constructed library. In one embodiment of the present invention, the read pairs are obtained using the library construction method and sequencing platform of Complete Genomics (CG). The distance between the two reads of a pair is controlled by the read length and by the distance between the recognition site and the cleavage site of the enzyme. The CG platform constructs a multi-linker paired-end library by enzymatic digestion, and the constructed circular library is sequenced by its combinatorial probe-anchor ligation (cPAL) technique, which reads the bases on both sides of each linker. 
Because the paired-end library is constructed by enzymatic digestion, which places the two fragments on either side of a linker, and each enzyme has a preferred cutting distance, the actual cut often falls one position further or one position nearer than the preferred distance, so the reads often carry a gap, typically +1 or -1. In addition, if the same enzyme is used for several digestions during library construction, the cut position tends to vary from one digestion to the next, which also introduces gaps into the resulting reads; for example, when constructing a multi-linker circular library, the Alu enzyme is used for two digestions to join different portions of the linkers, and reading the bases adjacent to these linkers produces reads with gaps of +3/-3. The size of the gap in the present invention may also be zero. Taking the current two-linker (2-AD) sequencing library of the CG platform as an example, the 2-AD sequencing output has a total length of 60 bp, which can be divided into two reads in a mate-pair relationship; each read has a small gap at 10 bp and an invalid sequencing site N at the 20 bp position, and the distance between the two reads of a pair is generally less than 2000 bp. Among the multiple reads from a multi-linker library, any one read can form a read pair with any other read. The terms "positive strand" and "negative strand" as used herein refer to the two complementary strands that make up a chromosome fragment and are relative: if one strand is called the positive strand, its complementary strand is called the negative strand. In one embodiment of the present invention, the strand that matches the reference sequence is referred to as the positive strand and the other strand as the negative strand.

The processor 20 is configured to execute a data processing program. Executing the data processing program comprises aligning the sequencing data with a reference sequence to obtain an alignment result, and eliminating the gap of each read in the alignment result to obtain a universal alignment result; the alignment result comprises alignment results of a plurality of the read pairs and/or alignment results of a plurality of the left arms and alignment results of a plurality of the right arms. The alignment can be performed with known alignment software such as SOAP or BWA, or with TeraMap, the alignment software of the CG platform. In one embodiment of the invention, the alignment is performed using TeraMap, and the resulting alignment result is in TeraMap format. In one embodiment of the present invention, eliminating the gap of each read in the alignment result means: for a read with a negative gap, removing the negative gap, that is, removing the overlapping bases; for a read with a positive gap, filling the positive gap with the corresponding number of N, where N is A, T, C or G; and leaving a read with a gap of 0 unprocessed. For example, a read with a negative gap of -2 nt can be divided at the gap into two parts whose ends overlap by 2 nt. If the two parts of the read are ATCGCTTAAG and AGTACGATTC, eliminating the negative gap, that is, the overlapping AG, gives the read ATCGCTTAAGTACGATTC.
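The gap-elimination rule described above can be sketched in a few lines. This is an illustrative sketch only: the function name `eliminate_gap` and its interface are hypothetical, the two parts of a read are assumed to be given as separate strings together with the signed gap size, and a mismatch in the overlapping bases is simply reported as an error.

```python
def eliminate_gap(left, right, gap):
    """Join the two parts of a read that are separated by a signed gap.

    gap < 0: the parts overlap by -gap bases; the duplicated bases are removed.
    gap > 0: the parts are separated by gap unknown bases; they are filled with N.
    gap == 0: the parts are simply concatenated.
    """
    if gap < 0:
        overlap = -gap
        if left[-overlap:] != right[:overlap]:
            raise ValueError("the two parts do not overlap as expected")
        return left + right[overlap:]
    return left + "N" * gap + right


if __name__ == "__main__":
    # Worked example from the text: a -2 nt gap with the overlapping bases AG.
    print(eliminate_gap("ATCGCTTAAG", "AGTACGATTC", -2))
```

For the example in the text, the two 10 nt parts share the bases AG, so the joined read is 18 nt long.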

The storage unit 30 is used for storing data. The above-described data processing program is stored in the storage unit 30, and the intermediate data or results produced while the data receiving unit 10 and the processor 20 process the sequencing data are also stored there.

Figure 3 is a schematic structural diagram of one embodiment of the sequencing data processing system of the present invention. The sequencing data processing system 1000 includes a sequencing data processing device 100, a host 200, and a display device 300. The host 200 can be an audio/video/signal source device, such as a computer host or mainframe, for transmitting the display data required by the display device 300. The host 200 includes at least one interface electrically connected to the sequencing data processing device 100. The sequencing data processing device 100 receives the sequencing data output from the host 200, processes the sequencing data, and then outputs the processed data or results to the display device 300.

Figure 4 is a flow chart of one embodiment of the sequencing data processing method of the present invention. The sequencing data processing method comprises the steps of: S1, acquiring sequencing data, the sequencing data comprising a plurality of read pairs, each read pair consisting of two reads that come from two positions of one chromosome fragment, the two reads of each pair coming from the positive strand and the negative strand of the chromosome fragment respectively, or both coming from the positive strand or both from the negative strand of the chromosome fragment, each read containing a gap, and the two reads of a read pair being defined as a left arm and a right arm respectively; S2, aligning the sequencing data with a reference sequence to obtain an alignment result, the alignment result comprising alignment results of a plurality of the read pairs and/or alignment results of a plurality of the left arms and alignment results of a plurality of the right arms; S3, eliminating the gap of each read in the alignment result to obtain a universal alignment result. For the way the read pairs are obtained, the gaps contained in the reads, the alignment, the elimination of the gaps, the alignment result and the universal alignment result, reference may be made to the description of the corresponding technical features of the sequencing data processing apparatus in one aspect or any embodiment of the present invention above. For example, in the same way, read pairs from two positions of a chromosome fragment can be obtained by constructing a paired-end library or a mate-pair library and sequencing the constructed library. 
In one embodiment of the present invention, the read pairs are obtained using the library construction method and sequencing platform of Complete Genomics (CG), and the distance between the two reads of a pair is controlled by the read length and by the distance between the recognition site and the cleavage site of the enzyme. The CG platform constructs a multi-linker paired-end library by enzymatic digestion, and the constructed circular library is sequenced by its combinatorial probe-anchor ligation (cPAL) technique, which reads the bases on both sides of each linker. Because the paired-end library is constructed by enzymatic digestion, which places the two fragments on either side of a linker, and each enzyme has a preferred cutting distance, the actual cut often falls one position further or one position nearer than the preferred distance, so the reads often carry a gap, typically +1 or -1. In addition, if the same enzyme is used for several digestions during library construction, the cut position tends to vary from one digestion to the next, which also introduces gaps into the resulting reads; for example, when constructing a multi-linker circular library, the Alu enzyme is used for two digestions to join different portions of the linkers, and reading the bases next to these linkers produces reads with gaps of +3/-3. The size of the gap in the present invention may also be zero. Among the multiple reads from a multi-linker library, any one read can form a read pair with any other read. The terms "positive strand" and "negative strand" as used herein refer to the two complementary strands that make up a chromosome fragment and are relative: if one strand is called the positive strand, its complementary strand is called the negative strand. 
Here, the strand that matches the reference sequence is referred to as the positive strand and the other strand as the negative strand. The alignment can be performed with known alignment software such as SOAP or BWA, or with TeraMap, the alignment software of the CG platform. In one embodiment of the invention, the alignment is performed using TeraMap, and the resulting alignment result is in TeraMap format. In one embodiment of the present invention, eliminating the gap of each read in the alignment result means: for a read with a negative gap, removing the negative gap, that is, removing the overlapping bases; for a read with a positive gap, filling the positive gap with the corresponding number of N, where N is A, T, C or G; and leaving a read with a gap of 0 unprocessed. For example, a read with a negative gap of -2 nt can be divided at the gap into two parts whose ends overlap by 2 nt. If the two parts of the read are ATCGCTTAAG and AGTACGATTC, eliminating the negative gap, that is, the overlapping AG, gives the read ATCGCTTAAGTACGATTC.

Figure 5 is a flow chart of one embodiment of the sequencing data processing method of the present invention. The sequencing data processing method comprises: S10, acquiring sequencing data, the sequencing data comprising a plurality of read pairs, each read pair consisting of two reads that come from two positions of one chromosome fragment, the two reads of each pair coming from the positive strand and the negative strand of the chromosome fragment respectively, or both coming from the positive strand or both from the negative strand of the chromosome fragment, each read containing a gap, and the two reads of a read pair being defined as a left arm and a right arm respectively; S20, aligning the sequencing data with a reference sequence to obtain an alignment result, the alignment result comprising alignment results of a plurality of the read pairs and/or alignment results of a plurality of the left arms and alignment results of a plurality of the right arms; S30, extracting the unique alignment result from the alignment result to replace the alignment result, the unique alignment result comprising a plurality of read pairs that align uniquely to the reference sequence, the two reads of each such read pair aligning to the same chromosome of the reference sequence at a distance consistent with the expected distance between the two positions of the chromosome fragment from which they derive; S40, correcting the unique alignment result so that both reads of each read pair in it are aligned to the positive strand of the same chromosome of the reference sequence. 
For example, for a read pair whose two reads align to the positive strand and the negative strand of the chromosome respectively, the read aligned to the negative strand is replaced with its reverse complement, thereby achieving the correction; S50, eliminating the gap of each read in the unique alignment result to obtain a universal alignment result.
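Steps S30 and S40 can be illustrated with the following sketch. It is a simplified model under stated assumptions: the `Arm` record, the field `n_hits` (number of reference locations an arm aligned to) and the functions `keep_pair` and `to_plus_strand` are hypothetical names introduced for illustration, and the default distance limit of 2000 bp is taken from the typical distance between the two reads of a pair mentioned earlier in the description.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Arm:
    chrom: str   # reference chromosome the arm aligned to
    pos: int     # 1-based alignment position
    strand: str  # "+" or "-"
    seq: str     # read sequence
    n_hits: int  # hypothetical field: number of alignment locations


def keep_pair(left, right, max_distance=2000):
    """S30: keep pairs that align uniquely, on one chromosome, at a plausible distance."""
    return (left.n_hits == 1 and right.n_hits == 1
            and left.chrom == right.chrom
            and abs(right.pos - left.pos) <= max_distance)


def to_plus_strand(arm):
    """S40: replace an arm aligned to the negative strand with its reverse complement."""
    if arm.strand == "-":
        return Arm(arm.chrom, arm.pos, "+", reverse_complement(arm.seq), arm.n_hits)
    return arm


if __name__ == "__main__":
    left = Arm("chr1", 100, "+", "ACGT", 1)
    right = Arm("chr1", 600, "-", "AACC", 1)
    if keep_pair(left, right):
        print(to_plus_strand(left).seq, to_plus_strand(right).seq)
```

After this correction every retained arm is expressed on the positive strand, so gap elimination (S50) and SAM output can treat all reads uniformly.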

Figure 6 is a flow chart of one embodiment of the SNP detection method of the present invention. The SNP detection method comprises the steps of: S11, acquiring nucleic acid of a sample to be tested; S12, sequencing the nucleic acid to obtain sequencing data; S13, processing the sequencing data to obtain a universal alignment result; S14, detecting SNPs based on the universal alignment result; wherein S13 is performed using the sequencing data processing device and/or sequencing data processing method of one aspect of the invention or any of its embodiments. SNP detection based on the universal alignment result can use currently known SNP detection methods and/or software tools, such as SOAP2, GATK, samtools and the like.

Figure 7 is a schematic structural diagram of one embodiment of the SNP detecting apparatus of the present invention. The apparatus 2000 includes: a nucleic acid acquiring device 200 for acquiring nucleic acid of a sample to be tested; a sequencing device 400 for sequencing nucleic acid from the nucleic acid acquiring device to obtain sequencing data, the sequencing data comprising a plurality of read pairs, each read pair consisting of two reads that come from two positions of one chromosome fragment, the two reads of each pair coming from the positive strand and the negative strand of the chromosome fragment respectively, or both coming from the positive strand or both from the negative strand of the chromosome fragment, each read containing a gap, and the two reads of a read pair being defined as a left arm and a right arm respectively; a data processing device 600 for processing the sequencing data from the sequencing device to obtain a universal alignment result; and a detecting device 800 for detecting SNPs based on the universal alignment result from the data processing device 600; wherein the data processing device 600 includes: a data receiving unit 610 for receiving the sequencing data from the sequencing device; a processor 630 for executing a data processing program, wherein executing the data processing program comprises aligning the sequencing data with a reference sequence to obtain an alignment result and eliminating the gap of each read in the alignment result to obtain the universal alignment result, the alignment result comprising alignment results of a plurality of the read pairs and/or alignment results of a plurality of the left arms and alignment results of a plurality of the right arms; and at least one storage unit 650 for storing data, including the data processing program. 
The foregoing description of the advantages and technical features of the SNP detection method in one aspect of the present invention or any of its specific embodiments also applies to the SNP detecting apparatus of this aspect of the present invention and is not repeated here. Those skilled in the art will understand that all or some of the units of the apparatus optionally and detachably include one or more sub-units for performing or implementing the various embodiments of the SNP detection method of the present invention described above.

The following examples merely illustrate preferred embodiments of the invention. Where specific methods or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature of the field (for example, J. Sambrook et al., translated by Huang Peitang et al., "Molecular Cloning: A Laboratory Manual", third edition, Science Press) or the product specifications. Any reagents or instruments for which no manufacturer is indicated are commercially available conventional products.

Embodiment 1

The peripheral blood plasma of lung cancer patients was taken as the test object. The samples were from Southwest Hospital and tested as follows:

(1) Library establishment and sequencing

The library construction and sequencing process is shown in Figure 8. The sequences given below are written 5' to 3' from left to right. Groups enclosed in "/" in a sequence are terminal modification groups: "phos" indicates phosphorylation, "dd" indicates dideoxy, and "bio" indicates biotin.

1. Extraction of cfDNA (using SnoMag Circulating DNA Kit):

1) Transfer 200 ul of plasma to a 1.5 ml EP tube and add 600 ul of buffer LSB.

2) Add 20 ul of NanoMag Circulating Beads and incubate at room temperature for 10 min, mixing once every 2-3 min.

3) Place the EP tube on a magnetic stand for 1 min and discard the supernatant.

4) Remove the EP tube, add 150 ul of Buffer WA and mix.

5) Place the EP tube on a magnetic stand for 1 min and discard the supernatant.

6) Remove the EP tube, add 150 ul of 75% ethanol and mix.

7) Place the EP tube on a magnetic stand for 1 min and discard the supernatant.

8) Repeat steps 6) and 7).

9) Dry the magnetic beads for 5 min at room temperature.

10) Add 32 ul of elution buffer to mix the magnetic beads and let stand for 5 min at room temperature.

11) Place the EP tube on a magnetic stand for 1 min and transfer the supernatant to a new 1.5 ml EP tube.

2. Construction of the library:

1) rSAP dephosphorylation

cfDNA	30 ul
10x NEBuffer 2	3.5 ul
rSAP (1 U/ul)	1.5 ul
Total	35 ul

Reaction conditions:

2) T4 DNA Polymerase end fill

Reaction conditions:

12 °C	20 min
4 °C	Hold

The above reaction product was purified by 60 ul of Ampure XP beads and eluted with 22 ul of Elution buffer.

3) The first sequence and the second sequence are respectively ligated to both ends of the end-filled DNA fragment

Reaction conditions:

20 °C	15 min
4 °C	Hold

The above reaction product was purified with 40 ul of Ampure XP beads and eluted with 22 ul of Elution buffer.

The two strands of the first sequence are: TTGGCCTCCGACT/3-ddT/(SEQ ID NO: 1),

/5phos/AAGTCGGAGGCCAAGCGGTCGT/ddC/(SEQ ID NO: 2).

The two strands of the second sequence are: /5Phos/GTCTCCAGTCGAAGCCCGACG/3ddC/(SEQ ID NO: 3), GCTTCGACTGGAGA/3ddC/(SEQ ID NO: 4).

4) Nick Translation

The upstream primer of the third sequence is /5-bio/TCCTAAGACCGCTTGGCCTCCGACT (SEQ ID NO: 5).

The downstream primer of the third sequence is 5Phos/AGACAAGCTCxxxxxxxxxxGATCGGGCTTCGACTGGAGAC (SEQ ID NO: 6), where the "x" in the middle is a variable tag sequence region that can be replaced by N, N being A, T, C or G. When no other sample libraries are pooled and only one sample library is loaded on the sequencer, no tag sequence is required, that is, the downstream primer of the third sequence can be 5Phos/AGACAAGCTCGATCGGGCTTCGACTGGAGAC (SEQ ID NO: 7).

In this example, because the sample is tumor cell-free nucleic acid, the content of the target nucleic acid (ctDNA) in the mixed nucleic acid is low. If several such sample libraries were pooled and sequenced together, the mixed data would have to be split and assigned to the respective samples, and part of the data would be lost. In addition, the reads of the constructed single-linker circular library are relatively short, so accurate detection requires deep sequencing to obtain a relatively large amount of data. Preferably, therefore, a single sample library is sequenced per run.
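The relationship between the tagged and untagged downstream primers can be sketched as follows. The flanking sequences are taken from SEQ ID NO: 6 and SEQ ID NO: 7; the function name `downstream_primer` and the example tag are hypothetical, and the 5' phosphate modification is not represented in the returned string.

```python
PRIMER_5P = "AGACAAGCTC"             # bases upstream of the tag region
PRIMER_3P = "GATCGGGCTTCGACTGGAGAC"  # bases downstream of the tag region
TAG_LENGTH = 10                      # number of "x" positions in SEQ ID NO: 6


def downstream_primer(tag=""):
    """Return the downstream primer, with an optional 10 nt sample tag inserted."""
    if tag and (len(tag) != TAG_LENGTH or set(tag) - set("ACGT")):
        raise ValueError("tag must be exactly 10 bases drawn from A, C, G, T")
    return PRIMER_5P + tag + PRIMER_3P


if __name__ == "__main__":
    print(downstream_primer())              # untagged primer, as in SEQ ID NO: 7
    print(downstream_primer("ACGTACGTAC"))  # hypothetical tag for a pooled library
```

With an empty tag the function returns the 31 nt sequence used when a single sample library is sequenced on its own.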

Reaction conditions:

60 °C	5 min
37 °C	0.1 °C/sec-hold

Add 8 ul of the following nick translation mixture to the above reaction.

Reaction conditions:

37 °C	20 min
4 °C	Hold

The above reaction product was purified by 40 ul of Ampure XP beads and eluted with 37.4 ul of Elution buffer.

5) PCR with Pfx

Reaction conditions:

The above reaction product was purified by 50 ul of Ampure XP beads and eluted with 22 ul of Elution buffer.

6) Qubit quantification

The PCR product was subjected to concentration determination using a Qubit dsDNA HS assay kit.

7) Strand Separation

a) Multiple libraries were mixed to give a total of about 160 ng of DNA, and the sample was brought to a total volume of 60 ul with 1x TE.

b) Prepare the following reagents in advance: 4X BBB, Streptavidin Beads, 0.3M MOPS acid, 0.5% Tween 20, 1X BBB/Tween Mix, 1X BWB/Tween Mix, 0.1M NaOH. Of these, 1X BWB/Tween Mix, 0.1M NaOH and Streptavidin Beads are prepared immediately before use.

c) Prepare the following four reagents 15 minutes in advance:

0.5% Tween20, 1X BBB/Tween Mix, 1X BWB/Tween Mix, 0.1M NaOH.

0.5% Tween20 is prepared as described above, and the other three reagents are prepared as follows:

d) 1X BBB/Tween Mix

1X BBB	30 ul
0.5% Tween20	0.3 ul
Total	30.3 ul

e) 1X BWB/Tween Mix

1X BWB	2000 ul
0.5% Tween20	20 ul
Total	2020 ul

f) 0.1M NaOH

0.5M NaOH	15.6 ul
Water	62.40 ul
Total	78.0 ul
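As a quick check of the dilution in the table above, the final concentration follows from C1·V1 = C2·V2. The short sketch below only restates the tabulated volumes; the helper name `diluted_concentration` is hypothetical.

```python
import math


def diluted_concentration(stock_conc, stock_vol, total_vol):
    """Concentration after dilution, from C1 * V1 = C2 * V2."""
    return stock_conc * stock_vol / total_vol


stock_ul = 15.6   # volume of 0.5 M NaOH
water_ul = 62.40  # volume of water
total_ul = stock_ul + water_ul
final_molar = diluted_concentration(0.5, stock_ul, total_ul)

if __name__ == "__main__":
    print(f"total volume: {total_ul:.1f} ul, final NaOH: {final_molar:.3f} M")
    assert math.isclose(final_molar, 0.1)
```

The volumes in the table therefore give 78.0 ul of 0.1 M NaOH, enough for three 26 ul elutions.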

g) Streptavidin Beads washing method is as follows:

· Take 30 ul of Streptavidin Beads per sample, add 3-5 volumes of 1X BBB, mix, and place on a magnetic stand until the beads are collected. Change the orientation of the non-stick tube so that the beads move back and forth through the 1X BBB wash, then discard the supernatant. Repeat this wash once.

· Remove the non-stick tube, resuspend the beads in 1 volume (30 ul) of 1X BBB/Tween Mix, mix and let stand at room temperature.

h) Add 20 ul of 4X BBB to the 60 ul PCR product mixture, then transfer it to the non-stick tube containing the beads resuspended in 30 ul of 1X BBB/Tween Mix. Incubate the 110 ul mixture at room temperature for 15-20 min, mixing once midway.

i) Place the above non-stick tube on a magnetic stand for 3-5 min, discard the supernatant, and wash the beads twice with 1 ml of 1X BWB/Tween Mix, using the same procedure as for washing the Streptavidin Beads.

j) Add 26 ul of 0.1M NaOH to the above beads, mix by pipetting and let stand for 10 min, then place on a magnetic stand for 3-5 min and transfer the supernatant to a new 1.5 ml EP tube.

k) Add 13ul of 0.3M MOPS to the above 1.5ml EP tube and mix for later use.

l) The product of this step can be stored frozen at -20 °C.

8) Splint Circularization

a) Add 10ul of 20uM fourth sequence to the 39ul sample obtained in the previous step. The fourth sequence is

TCGAGCTTGTCTTCCTAAGACCGC (SEQ ID NO: 8);

b) Prepare the ligase reaction mixture 5 minutes in advance, prepared as follows:

Water	4.2 ul
10x TA Buffer (LK1)	6 ul
100mM ATP	0.6 ul
600U/ul Ligase	0.2 ul
Total	11 ul

c) Shake the ligase reaction mixture to mix it thoroughly and centrifuge. Add 11 ul of the ligase reaction mixture to the EP tube to which the primer reaction mixture has been added, shake for 10 s, and centrifuge briefly.

d) Incubate in a PCR machine for 1.5 h at 37 °C.

e) After the reaction is complete, take out 5 ul of the sample for examination by electrophoresis on a 6% denaturing gel; the remaining volume of about 55 ul goes on to the next enzyme reaction.
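The volumes quoted in steps 7 j) to 8 e) can be cross-checked with a few lines of arithmetic. The following is only a bookkeeping sketch; the variable names are illustrative and not part of the protocol.

```python
# Volumes in microliters, taken from steps 7 j), 7 k) and 8 a) to 8 e).
naoh_eluate_ul = 26.0       # 0.1 M NaOH supernatant from the beads
mops_ul = 13.0              # 0.3 M MOPS added for neutralization
fourth_sequence_ul = 10.0   # 20 uM fourth sequence (SEQ ID NO: 8)
qc_aliquot_ul = 5.0         # removed for denaturing-gel electrophoresis

# Components of the ligase reaction mixture (step 8 b).
ligase_mix = {
    "water": 4.2,
    "10x TA Buffer (LK1)": 6.0,
    "100 mM ATP": 0.6,
    "600 U/ul ligase": 0.2,
}
ligase_mix_ul = round(sum(ligase_mix.values()), 1)

sample_ul = naoh_eluate_ul + mops_ul                      # sample entering step 8
reaction_ul = sample_ul + fourth_sequence_ul + ligase_mix_ul
remaining_ul = reaction_ul - qc_aliquot_ul                # passed to the Exo digestion

print(sample_ul, ligase_mix_ul, reaction_ul, remaining_ul)
```

The computed values agree with the 39 ul sample, the 11 ul ligase mixture and the roughly 55 ul carried forward that are stated in the text.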

9) Enzymatic digestion (Exo I and III)

a) Prepare the primer reaction mixture about 5 minutes in advance, as follows:

10x TA Buffer (LK1)	1ul
20U/ul Exo I	3ul
200/ul Exo III	1ul
Total	5ul

b) Shake the mixture to mix it thoroughly and centrifuge briefly, then add 5 ul of the reaction mixture to the 55 ul sample obtained in the previous step;

c) Shake for 10 s to mix, centrifuge briefly, and incubate in a PCR machine at 37 °C for 30 min.

d) After the 30 min enzyme digestion is complete, add 2.5 ul of 500 mM EDTA to the sample to terminate the enzyme reaction.

e) Purify the above sample with PEG32 beads/Tween 20 as follows:

Transfer 59 ul of the product of the above step to a 1.5 ml non-stick tube, add 78 ul of PEG32 beads/Tween 20 (PEG32 beads : Tween 20 = 100:1), and bind at room temperature for 15 min, mixing once by pipetting during the binding;

f) Place the non-stick tube on the magnetic stand for 3-5 min, discard the supernatant, and wash twice with 700 ul of 75% ethanol. During each wash, turn the non-stick tube back and forth so that the beads move through the ethanol 2-3 times;

g) After drying at room temperature, dissolve in 27 ul of TE/Tween 20 (TE : Tween 20 = 500:1) for 15 min, mixing once during this time;

h) Transfer the supernatant to a new 1.5 ml EP tube and quantify the final product with the Qubit™ ssDNA Assay Kit. Prepare the dye working solution at a buffer-to-dye ratio of 199:1, then vortex and centrifuge to mix. For the two standards, add 10 ul of each standard to 190 ul of working solution, then vortex and centrifuge. For the sample, add 2 ul of sample to 198 ul of working solution, vortex and centrifuge, then quantify on the Qubit instrument.

i) Normalization of concentration

According to the concentration measured by single-strand quantification, adjust the starting amount of sample used for DNB preparation to 35.3-53 ng. Transfer the corresponding volume of sample (<60 ul) to a Biorad PCR plate and bring the total volume to no more than 120 ul with 1X TE.

The final concentration is 5.625-7.5 fmol/ul, the volume is 120 ul, and the total amount is 35.3-53 ng. DNB preparation for 1-adapter sequencing requires 120 fmol, that is, 16 ul at 7.5 fmol/ul. Therefore, the library needs to be diluted to 7.5 fmol/ul.
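The normalization figures above follow from amount = concentration × volume. The sketch below reproduces them; the helper names and the 15 fmol/ul stock used in the dilution example are hypothetical and serve only as illustration.

```python
def volume_for_amount(amount_fmol: float, conc_fmol_per_ul: float) -> float:
    """Volume (ul) of library that contains the requested molar amount."""
    return amount_fmol / conc_fmol_per_ul


def dilution_volumes(stock_conc: float, target_conc: float, final_volume_ul: float):
    """Apply C1*V1 = C2*V2 and return (stock volume, diluent volume) in ul."""
    if stock_conc < target_conc:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_conc * final_volume_ul / stock_conc
    return stock_ul, final_volume_ul - stock_ul


# 120 fmol at 7.5 fmol/ul corresponds to the 16 ul stated in the text.
dnb_input_ul = volume_for_amount(120.0, 7.5)

# Total molar amount in 120 ul at the two ends of the concentration range.
total_low_fmol = 5.625 * 120.0
total_high_fmol = 7.5 * 120.0

# Hypothetical example: diluting a 15 fmol/ul stock to 7.5 fmol/ul in 120 ul.
stock_ul, te_ul = dilution_volumes(15.0, 7.5, 120.0)

print(dnb_input_ul, total_low_fmol, total_high_fmol, stock_ul, te_ul)
```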

a) CG 1-Adapter sequencing

Sequencing is performed using the standardized process of the CG platform. DNA nanochips are a high-throughput sequencing technology pioneered by CG. In this example, sequencing the improved single-adapter library is less expensive and faster than other sequencing protocols, and integrated quality control ensures the sequencing quality.

Embodiment 2

The sequencer output data of Embodiment 1 is processed; FIG. 9 is a schematic flowchart of the SNP detection. Using the sequencing data processing method and/or SNP detection method of the present invention together with CG platform sequencing technology, ultra-low-input cfDNA enrichment, library construction, sequencing and data analysis can be performed. In this example, owing to the particular sequencing principle of CG, the reads are short and contain re-sequenced bases and small gaps at specific positions, so it is difficult to align or analyze the sequencing results directly with ordinary alignment software. Because of this special read structure, the CG platform's proprietary TeraMap is used for alignment. It works as follows. First, the two ends of each read pair (LeftArm, RightArm) are aligned separately, and TeraMap tries a variety of gap values when processing the reads in order to obtain more alignment results. Then, the alignment result of each end is taken as a reference and the other end is aligned locally (for example, for 4-AD the local alignment range is 0 to 700 bp). If both ends align well to the same chromosome and the insert size meets expectations (for example, for 4-AD the distance between the two reads of a read pair is 0-700 bp), only the best alignment result is output; otherwise, multiple alignment results for both ends are output.

TeraMap is alignment software for the CG sequencing platform that aligns CG-specific sequences to the reference genome. Its output consists of three parts, briefly described as follows. The first part is the first line, which holds the read sequence information in three columns: the read number, the left arm sequence and the right arm sequence. The second part is the second and third lines, which hold the alignment description, a brief summary of the alignment of the left arm and the right arm in the format "field name=value". The third part runs from the fourth line to the start of the next read's sequence information, that is, the fourth and fifth lines, and holds the details of the read alignment results.

First part:

Column number	Field	Type	Description
1	QNAME	String	Reference sequence number
2	POS	Integer	Position aligned to on the reference sequence
3	SEQ	String	Sequence information of the aligned fragment

the second part:

the third part:

Because the gaps in TeraMap alignment results make downstream analysis impossible, the TeraMap2Sam software was developed according to the method of the present invention. It removes the gaps in the TeraMap alignment result and converts the result into SAM (Sequence Alignment/Map) format. The main process of TeraMap2Sam can be divided into three steps; the algorithm flowchart is shown in FIG. 10.

Step 1: Extract the unique alignment results. Whether an alignment is unique is determined from matchCount in the TeraMap output; in addition, the insert length must meet the requirement and the reads at the two ends must align to the same reference sequence.

Step 2: Remove the gaps. The gap positions in each read are determined from the gaps field, and the read sequence is corrected accordingly.

Step 3: Calculate FLAG. The FLAG value in the SAM file is calculated from the alignment directions of the paired-end reads to describe the alignment.
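The three steps can be illustrated with a minimal Python sketch. It is not the TeraMap2Sam implementation: the `ArmHit` record, the way gaps are stored between sub-reads, the 0-700 bp insert window taken from the 4-AD example, and the choice to treat a negative gap as an overlap whose repeated bases are dropped are all assumptions made for illustration. The FLAG bit values are those defined by the SAM format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArmHit:
    """One aligned arm; a hypothetical stand-in for a TeraMap record."""
    chrom: str
    pos: int             # 1-based leftmost reference position
    reverse: bool        # True if aligned to the negative strand
    match_count: int     # plays the role of TeraMap's matchCount
    segments: List[str]  # consecutive sub-reads of the arm
    gaps: List[int]      # gaps[i] lies between segments[i] and segments[i + 1]


def is_unique_pair(left: ArmHit, right: ArmHit, max_insert: int = 700) -> bool:
    """Step 1: both arms unique, on one reference, insert size as expected."""
    return (left.match_count == 1 and right.match_count == 1
            and left.chrom == right.chrom
            and abs(right.pos - left.pos) <= max_insert)


def remove_gaps(segments: List[str], gaps: List[int]) -> str:
    """Step 2: pad positive gaps with 'N' and trim the overlap of negative gaps."""
    if len(gaps) != len(segments) - 1:
        raise ValueError("need exactly one gap value between adjacent segments")
    seq = segments[0]
    for gap, seg in zip(gaps, segments[1:]):
        if gap >= 0:
            seq += "N" * gap + seg   # unknown bases inside the gap
        else:
            seq += seg[-gap:]        # drop the bases that were read twice
    return seq


def sam_flag(reverse: bool, mate_reverse: bool, first_in_pair: bool,
             proper_pair: bool = True) -> int:
    """Step 3: combine the SAM FLAG bits for one read of a pair."""
    flag = 0x1                               # read is paired
    if proper_pair:
        flag |= 0x2                          # both reads aligned as a proper pair
    if reverse:
        flag |= 0x10                         # this read is on the reverse strand
    if mate_reverse:
        flag |= 0x20                         # the mate is on the reverse strand
    flag |= 0x40 if first_in_pair else 0x80  # first or second read of the pair
    return flag


left = ArmHit("chr1", 1000, False, 1, ["ACGTA", "TACGGTTAAC", "GGAATTCCGG"], [-2, 3])
right = ArmHit("chr1", 1300, True, 1, ["TTGCA", "CATTGGCCAA", "CCTTAAGGCC"], [-2, 0])

if is_unique_pair(left, right):
    left_seq = remove_gaps(left.segments, left.gaps)
    left_flag = sam_flag(left.reverse, right.reverse, first_in_pair=True)
    right_flag = sam_flag(right.reverse, left.reverse, first_in_pair=False)
    print(left_seq, left_flag, right_flag)
```

Keeping the three steps as separate functions mirrors the flow described above: a pair that fails the Step 1 filter never reaches gap removal or FLAG calculation.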

SAM is a more general format for storing alignment information. Each line records the alignment of one read and consists mainly of eleven fields; further optional fields can be appended to carry more information, for example XT:A:U, which indicates that the read is uniquely aligned. A brief description is as follows:

Column number	Field	Type	Description
1	QNAME	String	Number of the aligned read
2	FLAG	Integer	Sum of the identifiers used to describe the alignment
3	RNAME	String	Reference sequence number
4	POS	Integer	Position of the alignment
5	MAPQ	Integer	Alignment quality value
6	CIGAR	String	Brief alignment information expression
7	RNEXT	String	Number of the reference sequence to which the next read is aligned
8	PNEXT	Integer	Position to which the next read is aligned
9	TLEN	Integer	Aligned read length
10	SEQ	String	Sequence information of the read
11	QUAL	String	Sequence quality information
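As an illustration of the eleven-field layout, the sketch below assembles one tab-separated SAM line with an optional XT:A:U tag. The `SamRecord` type, the helper name and every field value in the example are made up for demonstration and are not taken from the data of this embodiment.

```python
from typing import NamedTuple, Sequence


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields, in column order."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def format_sam_line(rec: SamRecord, tags: Sequence[str] = ()) -> str:
    """Join the mandatory fields and any optional tags with tabs."""
    fields = [str(value) for value in rec]
    return "\t".join(fields + list(tags))


seq = "ACGTACGGTTAACNNNGGAATTCCGG"
record = SamRecord(
    qname="read_1", flag=99, rname="chr1", pos=1000, mapq=60,
    cigar=f"{len(seq)}M", rnext="=", pnext=1300, tlen=326,
    seq=seq, qual="I" * len(seq),
)
sam_line = format_sam_line(record, tags=["XT:A:U"])
print(sam_line)
```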

In actual use, the binary compressed format (BAM) is mainly used in order to save storage resources. In addition, CG has developed assembly software tailored to its read structure, which reassembles the reads; follow-up work is performed after assembly is completed.

Because of a limitation of CG single-adapter reads, namely that the reads are short (12 bp), the original CG mutation detection tools are no longer applicable, or give poor results, in some special data processing. In response, we first developed a tool that converts TeraMap's alignment results into the general SAM/BAM format, a commonly used alignment format for high-throughput sequencing, and then used the BAM data to detect SNP mutations. SNP detection can use known open-source software such as SOAP2, samtools and GATK. In this example, the open-source software samtools is used to preprocess the universal alignment result (a BAM file, the binary form of SAM), including sorting and removing duplicate reads generated by library PCR; samtools mpileup and/or the open-source software GATK is then used to obtain the SNP results, which are converted to VCF format using the open-source software bcftools. Compared with other tools, samtools is easy to operate and its output format is widely used; when processing large data sets, multi-threading can be used to improve efficiency, and it is fast and reliable.

Compared with the traditional method, the CG single-adapter sequencing method enables ultra-low-input library construction and sequencing: only 1-10 ng of nucleic acid is needed for library construction, 2-5 ml of peripheral blood is required, and the standardized CG process is simple and fast. Once the TeraMap alignment result has been converted to SAM format, it is more versatile than the closed-source TeraMap format and can be processed with software such as samtools. In this example, after sequencing is completed, the raw reads are obtained with makeADF, an integrated tool of the CG platform, and TeraMap is then used to align the sequenced reads to the reference sequence. The alignment results are converted to the general SAM format using TeraMap2Sam, and SNP detection is then performed using software such as samtools. The results are shown in Table 1.

Table 1

BAM (sorted and deduplicated)	198Gb/4h20min
SNP (vcf)	3.1Gb/2h38min
Sequencing depth	85.62X
Coverage at 1X depth or above	91.66%
Coverage at 5X depth or above	89.97%
Coverage at 10X depth or above	87.85%
CDS region coverage at 1X depth or above	99.67%
CDS region coverage at 5X depth or above	98.37%
CDS region coverage at 10X depth or above	96.04%
Total number of tumor somatic SNPs detected	26533

Claims

A sequencing data processing device, characterized by comprising:

a data receiving unit, configured to receive sequencing data, the sequencing data comprising a plurality of read pairs, each read pair consisting of two reads that come respectively from two positions of one chromosome fragment, wherein the two reads of each read pair come respectively from the positive strand and the negative strand of the chromosome fragment, or both reads of each read pair come from the positive strand of the chromosome fragment or from the negative strand of the chromosome fragment, each read contains a gap, and the two reads of a read pair are defined as the left arm and the right arm, respectively;

a processor for executing a data processing program, wherein executing the data processing program comprises: aligning the sequencing data with a reference sequence to obtain an alignment result, and eliminating the gap of each read in the alignment result to obtain a universal alignment result, the alignment result comprising alignment results of a plurality of the read pairs, and/or the alignment result comprising alignment results of a plurality of the left arms and alignment results of a plurality of the right arms; and

at least one storage unit for storing data, the data including the data processing program.
The device of claim 1, wherein the aligning comprises:

aligning the left arm and the right arm of each read pair with the reference sequence separately, to obtain a first-level left alignment result and a first-level right alignment result;

taking one of the first-level left alignment result and the first-level right alignment result as a reference and aligning the other arm, to obtain a second-level left alignment result and a second-level right alignment result; and

obtaining, based on the second-level left alignment result and the second-level right alignment result, the alignment results of the plurality of read pairs, or the alignment results of the plurality of left arms and the alignment results of the plurality of right arms.
The device of claim 2, wherein the aligning comprises setting the gaps so that each left arm or each right arm is aligned with the reference sequence a plurality of times.
The device of claim 3, wherein aligning each left arm or each right arm with the reference sequence a plurality of times comprises setting the gap of each left arm or each right arm to -3 nt, -2 nt, -1 nt, 0 nt, 1 nt, 2 nt, 3 nt, 4 nt, 5 nt, 6 nt and 7 nt respectively, to obtain a corresponding plurality of reads, and aligning each of the corresponding plurality of reads with the reference sequence.
The device of any of claims 1-4, wherein the format of the alignment result is TeraMap.
The device of any of claims 1-5, wherein executing the data processing program further comprises: before eliminating the gap of each read in the alignment result, extracting the unique alignment result from the alignment result to replace the alignment result, wherein the unique alignment result comprises a plurality of read pairs that are uniquely aligned to the reference sequence, both reads of each such read pair are aligned to the same chromosome of the reference sequence, and the distance between the two reads of each such read pair corresponds to the distance between the two positions of the chromosome fragment.
The device of claim 6, wherein executing the data processing program further comprises correcting the unique alignment result so that each read pair in the unique alignment result is aligned to the positive strand of the same chromosome of the reference sequence.
The device of claim 6 or 7, wherein executing the data processing program further comprises performing a data format conversion, the data format conversion comprising converting the format of the alignment result or of the unique alignment result.
The device of any of claims 1-8, wherein eliminating the gap of each read in the alignment result or in the unique alignment result comprises:

if the read contains a positive gap, filling the positive gap with N over its full size;

if the read contains a negative gap, removing the negative gap; wherein

N is A, T, C or G.
The apparatus of any of claims 1-9, wherein the format of the universal alignment result is SAM or BAM.
A sequencing data processing system comprising a host and a display device, characterized in that the system further comprises the sequencing data processing device of any of claims 1-10.
A sequencing data processing method, comprising the following steps:

obtaining sequencing data, the sequencing data comprising a plurality of read pairs, each read pair consisting of two reads that come respectively from two positions of one chromosome fragment, wherein the two reads of each read pair come respectively from the positive strand and the negative strand of the chromosome fragment, or both reads of each read pair come from the positive strand of the chromosome fragment or from the negative strand of the chromosome fragment, each read contains a gap, and the two reads of a read pair are defined as the left arm and the right arm, respectively;

aligning the sequencing data with a reference sequence to obtain an alignment result, the alignment result comprising alignment results of a plurality of the read pairs, and/or the alignment result comprising alignment results of a plurality of the left arms and alignment results of a plurality of the right arms; and

eliminating the gap of each read in the alignment result to obtain a universal alignment result.
The method of claim 12, wherein obtaining the sequencing data comprises constructing a sequencing library, the sequencing library being a single-stranded circular DNA library composed of one strand of the chromosome fragment and at least one predetermined DNA sequence.
The method of claim 12, wherein the two reads of each read pair come from the two ends of the chromosome fragment.
The method of claim 14, wherein obtaining the sequencing data comprises constructing a sequencing library, the sequencing library being a single-stranded circular DNA library composed of one strand of the chromosome fragment and a predetermined DNA sequence linked to both ends of that strand.
The method of claim 15 wherein constructing said sequencing library comprises

(1) extracting a nucleic acid to be tested;

(2) terminal phosphorylating the nucleic acid to obtain a terminal phosphorylated product;

(3) end-repairing the terminal phosphorylation product to obtain a terminal repair product;

(4) connecting the first sequence and the second sequence to both ends of the terminal repair product to obtain a first ligation product;

(5) performing nick translation and amplification of the first ligation product using a third sequence to obtain an amplification product, the third sequence being a primer pair, at least one primer of the primer pair carrying a biotin label;

(6) performing single-strand separation of the amplification product using the biotin label to obtain a single-stranded product;

(7) cyclizing the single-stranded product with a fourth sequence to obtain the sequencing library;

The fourth sequence is capable of joining one end of the first sequence to one end of the second sequence, and the other end of the first sequence and/or the second sequence is a dideoxynucleotide.
The method of claim 15 wherein constructing said sequencing library comprises

(1) extracting a nucleic acid to be tested;

(2) end-repairing the nucleic acid to obtain a terminal repair product;

(3) terminal phosphorylating the terminal repair product to obtain a terminal phosphorylation product;

(4) connecting the first sequence and the second sequence to both ends of the terminal phosphorylation product to obtain a first ligation product;

(5) performing nick translation and amplification of the first ligation product using a third sequence to obtain an amplification product, the third sequence being a primer pair, at least one primer of the primer pair carrying a biotin label;

(6) performing single-strand separation of the amplification product using the biotin label to obtain a single-stranded product;

(7) cyclizing the single-stranded product with a fourth sequence to obtain the sequencing library;

The fourth sequence is capable of joining one end of the first sequence to one end of the second sequence, and the other end of the first sequence and/or the second sequence is a dideoxynucleotide.
The method of any of claims 12-17, wherein the aligning comprises:

aligning the left arm and the right arm of each read pair with the reference sequence separately, to obtain a first-level left alignment result and a first-level right alignment result;

taking one of the first-level left alignment result and the first-level right alignment result as a reference and aligning the other arm, to obtain a second-level left alignment result and a second-level right alignment result; and

obtaining, based on the second-level left alignment result and the second-level right alignment result, the alignment results of the plurality of read pairs, or the alignment results of the plurality of left arms and the alignment results of the plurality of right arms.
The method of any of claims 12-18, wherein the aligning comprises setting the gaps so that each left arm or each right arm is aligned with the reference sequence a plurality of times.
The method of claim 19, wherein aligning each left arm or each right arm with the reference sequence a plurality of times comprises setting the gap of each left arm or each right arm to -3 nt, -2 nt, -1 nt, 0 nt, 1 nt, 2 nt, 3 nt, 4 nt, 5 nt, 6 nt and 7 nt respectively, to obtain a corresponding plurality of reads, and aligning each of the corresponding plurality of reads with the reference sequence.
The method of any of claims 12-20, wherein the format of the alignment result is TeraMap.
The method of any of claims 12-21, characterized in that, before the gap of each read in the alignment result is eliminated, the unique alignment result is extracted from the alignment result to replace the alignment result, wherein the unique alignment result comprises a plurality of read pairs that are uniquely aligned to the reference sequence, both reads of each such read pair are aligned to the same chromosome of the reference sequence, and the distance between the two reads of each such read pair corresponds to the size of the chromosome fragment.
The method of claim 22, wherein the unique alignment result is corrected so that each read pair in the unique alignment result is aligned to the positive strand of the same chromosome of the reference sequence.
The method of claim 22 or 23, wherein obtaining the universal alignment result further comprises performing a data format conversion on the comparison result or the unique alignment result.
The method of any of claims 12-24, characterized in that eliminating the gap of each read in the alignment result or in the unique alignment result comprises:

if the read contains a positive gap, filling the positive gap with N over its full size;

if the read contains a negative gap, removing the negative gap; wherein

N is A, T, C or G.
The method of any of claims 12-25, wherein the format of the universal alignment result is SAM or BAM.
A computer readable storage medium for storing a program for execution by a computer, the execution of the program comprising performing the method of any of claims 12-26.
A method for detecting a SNP, characterized by comprising:

A. obtaining nucleic acid of the sample to be tested;

B. sequencing at least a portion of the nucleic acid to obtain sequencing data;

C. processing the sequencing data to obtain a universal alignment result;

D. detecting the SNP based on the universal alignment result; wherein

step C is carried out using the sequencing data processing device of any of claims 1-10.
The method of claim 28, wherein the step B comprises performing a sequencing library construction on at least a portion of the nucleic acid to obtain a sequencing library, the sequencing library being a single-stranded circular DNA library.
The method of claim 29, wherein the sequencing library construction comprises:

terminal phosphorylating the nucleic acid to obtain a terminal phosphorylation product;

end-repairing the terminal phosphorylation product to obtain a terminal repair product;

connecting the first sequence and the second sequence to both ends of the terminal repair product to obtain a first ligation product;

performing nick translation and amplification of the first ligation product using a third sequence to obtain an amplification product, wherein the third sequence is a primer pair, at least one primer of the primer pair carrying a biotin label;

performing single-strand separation of the amplification product using the biotin label to obtain a single-stranded product;

cyclizing the single-stranded product using a fourth sequence to obtain the sequencing library, wherein

the fourth sequence is capable of joining one end of the first sequence to one end of the second sequence, and the other end of the first sequence and/or the second sequence is a dideoxynucleotide.
The method of claim 29, wherein the sequencing library construction comprises:

end-repairing the nucleic acid to obtain a terminal repair product;

terminal phosphorylating the terminal repair product to obtain a terminal phosphorylation product;

connecting the first sequence and the second sequence to both ends of the terminal phosphorylation product to obtain a first ligation product;

performing nick translation and amplification of the first ligation product using a third sequence to obtain an amplification product, wherein the third sequence is a primer pair, at least one primer of the primer pair carrying a biotin label;

performing single-strand separation of the amplification product using the biotin label to obtain a single-stranded product;

cyclizing the single-stranded product using a fourth sequence to obtain the sequencing library, wherein

the fourth sequence is capable of joining one end of the first sequence to one end of the second sequence, and the other end of the first sequence and/or the second sequence is a dideoxynucleotide.
The method of any of claims 28-31, wherein said sequencing is performed using a combinatorial probe anchor ligation sequencing technique.
A SNP detecting device, characterized by comprising:

a nucleic acid acquisition device for acquiring nucleic acid of the sample to be tested;

a sequencing device for sequencing at least a portion of the nucleic acid from the nucleic acid acquisition device to obtain sequencing data, the sequencing data comprising a plurality of read pairs, each read pair consisting of two reads that come respectively from two positions of one chromosome fragment, wherein the two reads of each read pair come respectively from the positive strand and the negative strand of the chromosome fragment, or both reads of each read pair come from the positive strand of the chromosome fragment or from the negative strand of the chromosome fragment, each read contains a gap, and the two reads of a read pair are defined as the left arm and the right arm, respectively;

a data processing device for processing the sequencing data from the sequencing device to obtain a universal alignment result;

a detecting device for detecting the SNP based on the universal alignment result from the data processing device; wherein

the data processing device includes

a data receiving unit, configured to receive the sequencing data from the sequencing device,

a processor for executing a data processing program, wherein executing the data processing program comprises: aligning the sequencing data from the data receiving unit with a reference sequence to obtain an alignment result, and eliminating the gap of each read in the alignment result to obtain a universal alignment result, the alignment result comprising alignment results of a plurality of the read pairs, and/or the alignment result comprising alignment results of a plurality of the left arms and alignment results of a plurality of the right arms, and

at least one storage unit for storing data, the data including the data processing program.