CN109712672B - Method, device, storage medium and processor for detecting gene rearrangement - Google Patents



Publication number
CN109712672B
Authority
CN
China
Prior art keywords
sequence
breakpoint
candidate
sequences
breakpoints
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811643484.6A
Other languages
Chinese (zh)
Other versions
CN109712672A (en)
Inventor
王彬安
刘洋洋
李富威
王建伟
伍启熹
刘倩
刘珂弟
唐宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Usci Medical Laboratory Co ltd
Original Assignee
Beijing Usci Medical Laboratory Co ltd
Application filed by Beijing Usci Medical Laboratory Co ltd
Priority to CN201811643484.6A
Publication of CN109712672A
Application granted
Publication of CN109712672B

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method, a device, a storage medium and a processor for detecting gene rearrangement. The method comprises: obtaining the sequences to be aligned of a sample to be tested; aligning the sequences to be aligned with a reference genome to obtain abnormally aligned sequences, which comprise sequences with an abnormal alignment position, sequences with an abnormal alignment direction, and sequences that do not align to the reference genome; determining the positions of candidate breakpoints according to the alignment position and the alignment direction of the abnormally aligned sequences on the reference genome; and assembling those sequences, among the sequences to be aligned, that support the positions of the candidate breakpoints, and retaining the breakpoints in the assembly result that are consistent with the sequence information at the positions of the candidate breakpoints as the breakpoints of the gene rearrangement. The application addresses the difficulty in the prior art of detecting the breakpoint position of a gene rearrangement.

Description

Method, device, storage medium and processor for detecting gene rearrangement
Technical Field
The invention relates to the field of gene variation detection, in particular to a method, a device, a storage medium and a processor for detecting gene rearrangement.
Background
The prior art generally uses nested RT-PCR to detect gene rearrangement, as follows: based on the known sequence of the target gene, specific primers and probes are prepared and the gene rearrangement is detected. A nested PCR reaction comprises two rounds of PCR amplification, which reduces the chance of amplifying multiple target sites (because few sequences are complementary to both sets of primers) and increases the detection sensitivity; and because two pairs of PCR primers must both match the detection template, the reliability of detection is improved. Since the second set of primers lies within the first-round PCR product, and the probability that a non-target fragment contains the binding sites of both primer sets is very small, the second set of primers cannot amplify non-target fragments. Nested PCR amplification therefore ensures that the second-round PCR product has little or no contamination from the non-specific amplification caused by insufficient primer-pair specificity.
However, detecting gene rearrangement by nested RT-PCR has the following disadvantages: 1) the structure of the gene rearrangement cannot be accurately determined; 2) unknown rearrangements cannot be detected because of the limitation imposed by the primers and probes; 3) detailed sequence information for the breakpoint junction region of the rearranged gene is not available.
Therefore, there is a need for improvements to existing detection methods.
Disclosure of Invention
The invention mainly aims to provide a method, a device, a storage medium and a processor for detecting gene rearrangement, so as to solve the problem that the breakpoint position of gene rearrangement is difficult to detect in the prior art.
In order to achieve the above object, according to one aspect of the present invention, there is provided a method for detecting gene rearrangement, the method comprising: obtaining the sequences to be aligned of a sample to be tested; aligning the sequences to be aligned with a reference genome to obtain abnormally aligned sequences, which comprise sequences with an abnormal alignment position, sequences with an abnormal alignment direction, and sequences that do not align to the reference genome; determining the positions of candidate breakpoints according to the alignment position and the alignment direction of the abnormally aligned sequences on the reference genome; and assembling those sequences, among the sequences to be aligned, that support the positions of the candidate breakpoints, and retaining the breakpoints in the assembly result that are consistent with the sequence information at the positions of the candidate breakpoints as the breakpoints of the gene rearrangement.
Further, determining the position of the candidate breakpoint according to the alignment position and the alignment direction of the abnormal alignment sequence on the reference genome comprises: and performing sequence segmentation on the abnormal comparison sequence, comparing the abnormal comparison sequence with the reference genome, and determining the position of the candidate breakpoint according to the comparison position and the comparison direction of the segmented abnormal comparison sequence on the reference genome.
Further, determining the position of the candidate breakpoint according to the alignment position and the alignment direction of the abnormally aligned sequence on the reference genome comprises: segmenting the abnormally aligned sequence and aligning the segments to the reference genome to obtain a sequence that spans both sides of a potential breakpoint and has a first length, which is recorded as a first marker sequence, and a sequence that spans both sides of the potential breakpoint and has a length smaller than a second length, which is recorded as a second marker sequence; simulating a breakpoint reference sequence in which the gene rearrangement has occurred according to the position of the potential breakpoint on the first marker sequence; aligning the sequences to be aligned with the breakpoint reference sequence, marking the sequences that align to the breakpoint reference sequence and cross the breakpoint on it, and recording them as breakpoint candidate sequences supporting the breakpoint; and determining the position of the breakpoint on the breakpoint candidate sequence as the position of the candidate breakpoint.
Further, determining the position of the breakpoint on the breakpoint candidate sequence as the position of the candidate breakpoint comprises: correcting the breakpoint candidate sequence according to the sequencing quality and the number of the support sequences to obtain a corrected candidate breakpoint sequence; and determining the positions of the breakpoints on the corrected candidate breakpoint sequence as the positions of the candidate breakpoints.
Further, determining the position of the breakpoint on the breakpoint candidate sequence as the position of the candidate breakpoint comprises: filtering false positive breakpoint sequences in the breakpoint candidate sequences according to a first marker sequence and a second marker sequence which support breakpoints on the breakpoint reference sequence and paired sequences which support breakpoints on the cross breakpoint reference sequence in the sequences to be compared to obtain filtered candidate breakpoint sequences; and determining the positions of the breakpoints on the filtered candidate breakpoint sequence as the positions of the candidate breakpoints.
Further, assembling with sequences that support positions of candidate breakpoints in the sequences to be aligned comprises: and assembling according to the first marker sequence and the second marker sequence which support the breakpoint on the breakpoint reference sequence and the paired sequences which support the breakpoint on the cross breakpoint reference sequence in the sequences to be compared, and keeping the breakpoint which is consistent with the sequence information of the position of the candidate breakpoint in the assembly result and marking as the breakpoint of the gene rearrangement.
Further, obtaining the sequences to be aligned of the sample to be tested comprises: constructing a sequencing library of a sample to be detected; performing high-throughput sequencing on the sequencing library to obtain sequencing data; and preprocessing the sequencing data to obtain a sequence to be compared of the sample to be detected.
Further, the sequencing library is a hybrid capture library, preferably obtained by capture probes of SEQ ID NO:1 to SEQ ID NO: 36.
Further, after obtaining the breakpoint of the gene rearrangement, the method further comprises a step of quantifying the rearranged gene, the quantifying step comprising: counting the sequence number of breakpoints supporting gene rearrangement in the sequences to be compared according to the sequence information of the breakpoints of the gene rearrangement, and recording the sequence number as a marker sequence number; and dividing the marker sequence number by the sequence number of the internal reference gene to obtain a ratio which is the expression abundance of the rearranged gene relative to the internal reference gene.
In order to achieve the above object, according to a second aspect of the present invention, there is provided an apparatus for detecting gene rearrangement, the apparatus being used for storing or operating modules, or the modules being components of the apparatus; wherein, the module is a software module, the software module is one or more, and the software module is used for executing any one of the methods for detecting gene rearrangement.
According to a third aspect of the present invention, there is provided a storage medium comprising a stored program, wherein the program performs any one of the above-described methods of detecting gene rearrangement.
According to a fourth aspect of the present invention, there is provided a processor for running a program, wherein the program when running performs any of the above-described methods for detecting gene rearrangement.
By applying the technical scheme of the invention, the position of a gene rearrangement is detected from high-throughput sequencing data: the positions of candidate rearrangement breakpoints are determined from those sequences to be aligned that align abnormally to the reference genome, and the reliable candidate breakpoint positions are then verified using sequences assembled from the sequences to be aligned. The breakpoint position of the gene rearrangement can thus be detected accurately and, correspondingly, the sequence information at the breakpoint position can also be known accurately, which provides a basis for further verifying the breakpoint position by conventional PCR. The method of the application can therefore detect both known and unknown rearrangements and accurately determine the specific position where the rearrangement occurs together with the corresponding sequence information. The method works directly on NGS sequencing data, relies on statistics and algorithms, and adds no additional experimental detection cost. In addition, the method has high detection accuracy and low cost, and is suitable for detecting structural rearrangements of low-abundance genes.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram showing a simple flow chart for detecting the breakpoint position of gene rearrangement in a preferred embodiment according to the present invention;
FIG. 2 is a schematic view showing a detailed flow chart for detecting the breakpoint position of gene rearrangement in another preferred embodiment according to the present invention; and
FIGS. 3 and 4 show the results of verifying, by first-generation sequencing of PCR products, the breakpoint positions detected by the method of example 1 according to the present invention, wherein FIG. 3 shows the sequencing result for the forward primer and FIG. 4 shows the sequencing result for the reverse primer.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail with reference to examples.
As mentioned in the background, when detecting a rearranged gene the prior art can only determine that a rearrangement has occurred and cannot accurately determine where it occurred. To improve this situation, an exemplary embodiment of the present application provides a method for detecting gene rearrangement, the method comprising: obtaining the sequences to be aligned of a sample to be tested; aligning the sequences to be aligned with a reference genome to obtain abnormally aligned sequences, which comprise sequences with an abnormal alignment position, sequences with an abnormal alignment direction, and sequences that do not align to the reference genome; determining the positions of candidate breakpoints according to the alignment position and the alignment direction of the abnormally aligned sequences on the reference genome; and assembling those sequences, among the sequences to be aligned, that support the positions of the candidate breakpoints, and retaining the breakpoints in the assembly result that are consistent with the sequence information at the positions of the candidate breakpoints as the breakpoints of the gene rearrangement.
In this method for detecting gene rearrangement, the position of a gene rearrangement is detected from high-throughput sequencing data: the positions of candidate rearrangement breakpoints are determined from those sequences to be aligned that align abnormally to the reference genome, and the reliable candidate breakpoint positions are then verified using sequences assembled from the sequences to be aligned. The breakpoint position of the gene rearrangement can thus be detected accurately and, correspondingly, the sequence information at the breakpoint position can also be known accurately, which provides a basis for further verifying the breakpoint position by conventional PCR. The method of the application can therefore detect both known and unknown rearrangements and accurately determine the specific position where the rearrangement occurs together with the corresponding sequence information. The method works directly on NGS sequencing data, relies on statistics and algorithms, and adds no additional experimental detection cost. In addition, the method has high detection accuracy and low cost, and is suitable for detecting structural rearrangements of low-abundance genes.
It should be noted that the sequences to be aligned of the sample to be tested may be obtained by processing the raw sequencing data of the sample, or may be existing sequences that are ready for alignment. The method additionally verifies the positions of the candidate breakpoints against the assembled sequences, so that the breakpoint positions are more accurate.
Among the sequences to be aligned, some align normally to the reference genome, while others cannot be aligned directly to the reference genome because of the gene rearrangement; the latter are called abnormally aligned sequences. Abnormally aligned sequences include sequences with an abnormal alignment position (such as sequences from forward tandem repeats), sequences with an abnormal alignment direction (such as sequences from inverted tandem repeats), and sequences that do not align to the reference genome (such as sequences containing insertions or deletions). The potential breakpoint position can be determined by existing methods from the alignment position and the alignment direction of the abnormally aligned sequences on the reference genome (for example, when abnormally aligned sequences align to the same chromosome but an inversion has occurred between them, the potential breakpoint position can be determined from the abnormal alignment direction).
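The three categories of abnormally aligned sequences described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions and not the implementation used in the application: the `PairedAlignment` record, the `classify` function and the `MAX_INSERT` threshold are hypothetical names, and the rule that normally aligned mates lie on opposite strands within a bounded distance is a simplifying assumption about paired-end data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PairedAlignment:
    """Alignment summary for one read pair (hypothetical record, not a SAM parser)."""
    chrom1: Optional[str]   # None means the read did not align
    pos1: Optional[int]
    strand1: Optional[str]  # "+" or "-"
    chrom2: Optional[str]
    pos2: Optional[int]
    strand2: Optional[str]


# Assumed upper bound on the distance between normally aligned mates, in bp.
MAX_INSERT = 1000


def classify(pair: PairedAlignment) -> str:
    """Assign a read pair to one of the categories named in the text."""
    if pair.chrom1 is None or pair.chrom2 is None:
        return "unaligned"            # sequence not aligned to the reference genome
    if pair.chrom1 != pair.chrom2 or abs(pair.pos1 - pair.pos2) > MAX_INSERT:
        return "abnormal_position"    # mates on different chromosomes or too far apart
    if pair.strand1 == pair.strand2:
        return "abnormal_direction"   # mates on the same strand (assumed abnormal)
    return "normal"
```

Pairs classified as anything other than `"normal"` would be the ones passed on to candidate breakpoint determination.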
In certain preferred embodiments, determining the location of the candidate breakpoint based on the aligned position and the aligned orientation of the aberrantly aligned sequence on the reference genome comprises: and performing sequence segmentation on the abnormal comparison sequence, comparing the abnormal comparison sequence with the reference genome, and determining the position of the candidate breakpoint according to the comparison position and the comparison direction of the segmented abnormal comparison sequence on the reference genome.
Specifically, existing alignment software that can align segmented sequences includes bwa, hisat2 and STAR. These tools use a more relaxed alignment strategy to align each segment to its possible position on the reference genome, so that the final alignment position and alignment direction can be determined.
In some more preferred embodiments, determining the position of the candidate breakpoint according to the alignment position and the alignment direction of the abnormally aligned sequence on the reference genome comprises: segmenting the abnormally aligned sequence and aligning the segments to the reference genome to obtain a sequence that spans both sides of a potential breakpoint and has a first length, which is recorded as a first marker sequence, and a sequence that spans both sides of the potential breakpoint and has a length smaller than a second length, which is recorded as a second marker sequence; simulating a breakpoint reference sequence in which the gene rearrangement has occurred according to the position of the potential breakpoint on the first marker sequence; aligning the sequences to be aligned with the breakpoint reference sequence, marking the sequences that align to the breakpoint reference sequence and cross the breakpoint on it, and recording them as breakpoint candidate sequences supporting the breakpoint; and determining the position of the breakpoint on the breakpoint candidate sequence as the position of the candidate breakpoint.
Paired-end sequencing data contain reads in both directions. Taking a single read, if the read is cut into two or three segments and each segment is aligned to the reference genome, the segments may align to different positions and in different directions, and the potential breakpoint position of the gene rearrangement can be inferred from where the read was cut. Distinguishing first marker sequences from second marker sequences, and simulating a breakpoint reference sequence for re-alignment, helps to recover more sequences that potentially cross the breakpoint as well as normally aligned paired sequences that span the breakpoint. The sequences that support the breakpoint position on the breakpoint reference sequence then serve as the candidate breakpoint sequences, so the accuracy of the screened candidate breakpoints is relatively high. The first length, by which the first marker sequence spans the two sides of the potential breakpoint, can reasonably be set to 20-25 bp depending on the sequencing read length. A sequence with a length smaller than the second length is used as a second marker sequence, and the second length can reasonably be set to 10-20 bp depending on the sequencing read length.
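The idea of segmenting a read and placing each segment on the reference can be sketched as follows. This is a toy illustration only: it uses exact string matching in place of a real aligner such as bwa, hisat2 or STAR, and the function names `split_read` and `locate` as well as the dictionary layout of the reference are assumptions made for the example.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def split_read(read: str, parts: int = 2) -> list:
    """Cut a read into `parts` consecutive segments; the last one takes the remainder."""
    size = len(read) // parts
    segments = [read[i * size:(i + 1) * size] for i in range(parts - 1)]
    segments.append(read[(parts - 1) * size:])
    return segments


def locate(segment: str, ref: dict):
    """Return (chromosome, 0-based start, strand) of the first exact match, else None.

    Exact matching stands in for a real, more tolerant aligner.
    """
    for chrom, seq in ref.items():
        i = seq.find(segment)
        if i >= 0:
            return (chrom, i, "+")
        i = seq.find(revcomp(segment))
        if i >= 0:
            return (chrom, i, "-")
    return None


# Usage: a read whose two halves come from different toy chromosomes.
ref = {"chrA": "ACGTTGCAAGGCTTAACCGG", "chrB": "TTTTCCCCGGATCATCTCTC"}
read = "GCTTAACCGG" + "TTTTCCCCGG"
placements = [locate(s, ref) for s in split_read(read, parts=2)]
```

When the segments of one read land on different chromosomes or strands, as in this usage example, the cut position marks a potential breakpoint.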
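Simulating a breakpoint reference sequence and re-aligning reads against it can be sketched as below. This is a hedged illustration, not the procedure of the application: the functions `simulate_breakpoint_reference` and `crosses_breakpoint`, the 1-based coordinate convention, the `flank` parameter and the use of exact substring matching instead of a real aligner are all assumptions. The `min_anchor` parameter plays the role of the length by which a read must extend on both sides of the breakpoint.

```python
def simulate_breakpoint_reference(ref: dict, left: tuple, right: tuple, flank: int = 50):
    """Join the sequence left of one breakpoint side to the sequence right of the other.

    left  = (chromosome, last retained base, 1-based)
    right = (chromosome, first retained base, 1-based)
    Returns the simulated sequence and the index at which the breakpoint lies in it.
    """
    l_chrom, l_pos = left
    r_chrom, r_pos = right
    left_seq = ref[l_chrom][max(0, l_pos - flank):l_pos]
    right_seq = ref[r_chrom][r_pos - 1:r_pos - 1 + flank]
    return left_seq + right_seq, len(left_seq)


def crosses_breakpoint(read: str, bp_ref: str, bp_index: int, min_anchor: int = 20) -> bool:
    """True if the read matches bp_ref exactly and covers at least `min_anchor`
    bases on each side of the breakpoint."""
    start = bp_ref.find(read)
    if start < 0:
        return False
    end = start + len(read)
    return bp_index - start >= min_anchor and end - bp_index >= min_anchor


# Usage with toy sequences (real flanks and anchors would be much longer).
ref = {"chrA": "ACGTTGCAAGGCTTAACCGG", "chrB": "TTTTCCCCGGATCATCTCTC"}
bp_ref, bp_index = simulate_breakpoint_reference(ref, ("chrA", 20), ("chrB", 1), flank=10)
supporting = crosses_breakpoint("AACCGGTTTTCC", bp_ref, bp_index, min_anchor=5)
```

Reads for which `crosses_breakpoint` holds correspond to the breakpoint candidate sequences that support the breakpoint.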
In order to further improve the accuracy of the breakpoint position, the candidate breakpoints can be further corrected and false positive filtered according to the sequencing depth and the sequencing strategy of the sequencing data of the sample to be detected, so that the breakpoint position with higher authenticity is reserved.
In some preferred embodiments, determining the position of the breakpoint on the breakpoint candidate sequence as the position of the candidate breakpoint comprises: correcting the breakpoint candidate sequence according to the sequencing quality and the number of the support sequences to obtain a corrected candidate breakpoint sequence; and determining the positions of the breakpoints on the corrected candidate breakpoint sequence as the positions of the candidate breakpoints.
Specifically, for example, when the average sequencing depth reaches 1000× and the sequences crossing the breakpoint reach more than 2% of the average depth, that is, more than 20×, breakpoint base correction can be performed: the breakpoint is corrected using the alignment position relationship at the breakpoint on the simulated reference sequence and the quality of the aligned bases. Breakpoints supported by fewer than 20× breakpoint-crossing sequences usually have a high false-positive rate and are removed.
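The depth rule in this example (2% of a 1000× average depth gives a 20× threshold) can be written out as a short sketch. The function names are hypothetical, and the quality-weighted consensus shown here is only one plausible way of correcting bases from base qualities; the source does not specify the correction algorithm.

```python
from collections import defaultdict


def can_correct(crossing_reads: int, mean_depth: int, percent: int = 2) -> bool:
    """True if the breakpoint-crossing reads exceed `percent` % of the mean depth."""
    return crossing_reads > mean_depth * percent / 100


def quality_weighted_consensus(reads: list) -> str:
    """Pick, at each position, the base with the largest summed base quality.

    `reads` is a list of (sequence, qualities) pairs that are assumed to be
    already aligned to the same window and to have equal length.
    """
    length = len(reads[0][0])
    consensus = []
    for i in range(length):
        score = defaultdict(int)
        for seq, quals in reads:
            score[seq[i]] += quals[i]
        consensus.append(max(score, key=score.get))
    return "".join(consensus)
```

With a mean depth of 1000, `can_correct(25, 1000)` holds and `can_correct(15, 1000)` does not, matching the 20× threshold in the text.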
In some preferred embodiments, determining the position of the breakpoint on the breakpoint candidate sequence as the position of the candidate breakpoint comprises: filtering false positive breakpoint sequences in the breakpoint candidate sequences according to a first marker sequence and a second marker sequence which support breakpoints on the breakpoint reference sequence and paired sequences which support breakpoints on the cross breakpoint reference sequence in the sequences to be compared to obtain filtered candidate breakpoint sequences; and determining the positions of the breakpoints on the filtered candidate breakpoint sequence as the positions of the candidate breakpoints.
Specifically, for example, breakpoints are retained for which the number of first marker sequences is more than 10 and the number of paired sequences, among the sequences to be aligned, that span and support the breakpoint on the breakpoint reference sequence is more than 50. Of course, these values are only exemplary and can be adjusted for different sequencing samples.
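The exemplary thresholds above translate into a simple filter, sketched here under assumptions: the `BreakpointSupport` record and `filter_false_positives` function are hypothetical names, the default thresholds are the example values from the text, and the second marker count is carried along without a threshold because the text gives none for it.

```python
from dataclasses import dataclass


@dataclass
class BreakpointSupport:
    """Support counts for one candidate breakpoint (hypothetical record)."""
    name: str
    first_marker: int    # number of first marker sequences supporting the breakpoint
    second_marker: int   # number of second marker sequences supporting the breakpoint
    paired: int          # number of paired sequences spanning the breakpoint


def filter_false_positives(candidates: list, min_first_marker: int = 10,
                           min_paired: int = 50) -> list:
    """Keep candidates with more than the given numbers of supporting sequences."""
    return [
        c for c in candidates
        if c.first_marker > min_first_marker and c.paired > min_paired
    ]
```

Both thresholds are parameters so that they can be tuned per sample, as the text recommends.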
In certain preferred embodiments, assembling with sequences that support the position of a candidate breakpoint in the sequences to be aligned comprises: and assembling according to the first marker sequence and the second marker sequence which support the breakpoint on the breakpoint reference sequence and the paired sequences which support the breakpoint on the cross breakpoint reference sequence in the sequences to be compared, and keeping the breakpoint which is consistent with the sequence information of the position of the candidate breakpoint in the assembly result and marking as the breakpoint of the gene rearrangement.
By using the first marker sequence, the second marker sequence and the pair sequence supporting the breakpoint, sequence assembly is performed, and the candidate breakpoint position is verified again by the assembled sequence formed by de novo assembly, so that the finally determined breakpoint position of the gene rearrangement is more accurate.
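Verification by assembly can be illustrated with a toy greedy overlap assembler. This is a sketch under assumptions and not the assembler used in the application: real de novo assembly tolerates sequencing errors and handles both strands, whereas this example merges error-free reads by exact suffix-prefix overlap. The function names and the `min_overlap` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list, min_overlap: int = 5) -> list:
    """Repeatedly merge the pair of sequences with the largest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def verify_breakpoint(reads: list, junction: str, min_overlap: int = 5) -> bool:
    """True if some assembled contig contains the candidate junction sequence."""
    return any(junction in contig for contig in greedy_assemble(reads, min_overlap))


# Usage: three overlapping toy reads that together span a junction.
reads = ["GCTTAACCGGTT", "AACCGGTTTTCC", "GGTTTTCCCCGG"]
confirmed = verify_breakpoint(reads, "CCGGTTTT")
```

A candidate breakpoint would be retained only when the junction sequence predicted for it appears in the assembled result.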
As mentioned above, the sequences to be aligned of the sample to be tested in the present application may be existing sequences that can be used directly for alignment, or may be obtained by processing the raw data produced by sequencing. In some preferred embodiments, obtaining the sequences to be aligned of the sample to be tested comprises: constructing a sequencing library of the sample to be tested; performing high-throughput sequencing on the sequencing library to obtain sequencing data; and preprocessing the sequencing data to obtain the sequences to be aligned of the sample to be tested.
In certain preferred embodiments, the sequencing library is a hybrid capture library, preferably obtained by capture probes of SEQ ID NO:1 through SEQ ID NO: 36. The hybridization capture library can be used for detecting gene rearrangement aiming at sequencing data of a target gene. The capture probes of SEQ ID NO. 1 to SEQ ID NO. 36 can capture the whole exon sequence of the MLL gene, and thus can be used for detecting the exon rearrangement position of the gene and the corresponding sequence information thereof.
The method can accurately detect the position of the breakpoint of rearrangement of the target gene, and can also detect the expression quantity of the detected variant gene by using the sequence to be compared of the sample to be detected according to different research purposes. In certain preferred embodiments, after obtaining the breakpoint of the gene rearrangement, the above method further comprises a step of quantifying the rearranged gene, the step of quantifying comprising: counting the sequence number of breakpoints supporting gene rearrangement in the sequences to be compared according to the sequence information of the breakpoints of the gene rearrangement, and recording the sequence number as a marker sequence number; and dividing the marker sequence number by the sequence number of the internal reference gene to obtain a ratio which is the expression abundance of the rearranged gene relative to the internal reference gene. By detecting the expression level of a certain rearranged gene, it is possible to reflect the expression of the gene under a specific condition or in a specific treatment state, and further, by detecting the expression level of the gene under a series of different conditions or different states, it is possible to reflect the difference in expression. The above-mentioned reference gene can be appropriately selected according to actual needs, for example, when the gene to be detected is the MLL gene, the ABL1 gene can be usually selected as the reference gene.
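The quantification step reduces to a ratio of two read counts, which the following sketch makes explicit. The function name and the error handling are assumptions added for the example; the counts themselves would come from the sequences to be aligned.

```python
def relative_abundance(marker_count: int, reference_count: int) -> float:
    """Expression abundance of the rearranged gene relative to the internal reference gene.

    marker_count:    number of sequences supporting the rearrangement breakpoint
    reference_count: number of sequences from the internal reference gene
                     (for example ABL1 when the gene to be detected is MLL)
    """
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    return marker_count / reference_count


# Usage: 50 breakpoint-supporting sequences against 200 reference-gene sequences.
ratio = relative_abundance(50, 200)
```

Comparing this ratio across conditions or treatment states would reflect the expression differences mentioned above.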
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for causing a computing device to execute the methods according to the embodiments of the present invention or a processor to execute the methods according to the embodiments of the present invention.
In a second exemplary embodiment of the present application, there is provided an apparatus for detecting gene rearrangement, the apparatus being used for storing or operating modules, or modules being part of the apparatus; the module is a software module, the number of the software modules is one or more, and the software module is used for executing any one of the methods. The device not only can more accurately detect the breakpoint position of gene rearrangement, but also can obtain the sequence information corresponding to the breakpoint position, so that the relative expression quantity of the gene can be conveniently detected according to the sequence information, the practicability and the application range are wider, and any variant gene with the gene rearrangement phenomenon can be detected by adopting the device.
Preferably, the above apparatus comprises: the device comprises an acquisition module, a comparison module, a candidate module and an assembly determination module, wherein the acquisition module is used for acquiring a sequence to be compared of a sample to be tested; the comparison module is used for comparing the sequence to be compared with the reference genome to obtain an abnormal comparison sequence, wherein the abnormal comparison sequence comprises a sequence with abnormal comparison position, a sequence with abnormal comparison direction and a sequence which is not compared with the reference genome; the candidate module is used for determining the position of a candidate breakpoint according to the comparison position and the comparison direction of the abnormal comparison sequence on the reference genome; and the assembly determining module is used for assembling by using the sequence supporting the position of the candidate breakpoint in the sequence to be compared, and keeping the breakpoint in the assembly result, which is consistent with the sequence information of the position of the candidate breakpoint, as the breakpoint of the gene rearrangement.
In a preferred embodiment, the candidate modules include: the system comprises a segmentation and comparison module and a candidate determination module, wherein the segmentation and comparison module is used for performing sequence segmentation on an abnormal comparison sequence and then comparing the abnormal comparison sequence with a reference genome, and the candidate determination module is used for determining the position of a candidate breakpoint according to the comparison position and the comparison direction of the segmented abnormal comparison sequence on the reference genome.
In a preferred embodiment, the candidate modules include: the system comprises a segmentation marking module, a simulation module, a comparison marking module and a candidate breakpoint module, wherein the segmentation marking module is used for segmenting the sequence of an abnormal comparison sequence and then comparing the sequence with a reference genome to obtain a sequence which can simultaneously span two sides of a potential breakpoint and has a first length, and the sequence is marked as a first marking sequence, can simultaneously span two sides of the potential breakpoint and has a length smaller than a second length and is used as a second marking sequence; the simulation module is used for simulating a breakpoint reference sequence in which gene rearrangement occurs according to the position of a potential breakpoint on the first marker sequence; the comparison marking module is used for comparing the sequence to be compared with the breakpoint reference sequence, marking the sequence which can be compared with the upper breakpoint reference sequence and cross the breakpoint on the breakpoint reference sequence, and recording the sequence as a breakpoint candidate sequence supporting the breakpoint; the candidate breakpoint module is configured to determine a position of a breakpoint on the breakpoint candidate sequence as a position of a candidate breakpoint.
In a preferred embodiment, the candidate breakpoint module includes a breakpoint correction module and a correction determination module, wherein the breakpoint correction module is used for correcting the breakpoint candidate sequences according to the sequencing quality and the number of supporting sequences to obtain corrected candidate breakpoint sequences, and the correction determination module is used for determining the positions of the breakpoints on the corrected candidate breakpoint sequences as the positions of the candidate breakpoints.
In a preferred embodiment, the candidate breakpoint module includes a breakpoint filtering module and a filtering determination module, wherein the breakpoint filtering module is used for filtering false positive breakpoint sequences out of the breakpoint candidate sequences, according to the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence and the paired sequences, among the sequences to be compared, that support crossing the breakpoint on the breakpoint reference sequence, to obtain filtered candidate breakpoint sequences; and the filtering determination module is used for determining the positions of the breakpoints on the filtered candidate breakpoint sequences as the positions of the candidate breakpoints.
In a preferred embodiment, the assembly determination module includes an assembly submodule and a retention module, wherein the assembly submodule is used for assembling the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence together with the paired sequences, among the sequences to be compared, that support crossing the breakpoint on the breakpoint reference sequence, and the retention module is used for retaining the breakpoints in the assembly result that are consistent with the sequence information at the positions of the candidate breakpoints and recording them as the breakpoints of the gene rearrangement.
In a preferred embodiment, the acquisition module includes a construction module, a sequencing module and a preprocessing module, wherein the construction module is used for constructing a sequencing library of the sample to be tested; the sequencing module is used for performing high-throughput sequencing on the sequencing library to obtain sequencing data; and the preprocessing module is used for preprocessing the sequencing data to obtain the sequences to be compared of the sample to be tested.
In a preferred embodiment, the sequencing library is a hybrid capture library, preferably obtained by using the capture probes of SEQ ID NO:1 to SEQ ID NO: 36.
In a preferred embodiment, the apparatus further comprises a quantification module for quantifying the rearranged gene, the quantification module including a statistics module and an expression quantity calculation module, wherein the statistics module is used for counting, according to the sequence information of the breakpoint of the gene rearrangement, the number of sequences among the sequences to be compared that support the breakpoint of the gene rearrangement, which is recorded as the marker sequence number; and the expression quantity calculation module is used for dividing the marker sequence number by the sequence number of the internal reference gene, the resulting ratio being the expression abundance of the rearranged gene relative to the internal reference gene.
In a third exemplary embodiment of the present application, a storage medium is provided, which includes a stored program, wherein the program performs any one of the above-described methods of detecting gene rearrangement.
In a fourth exemplary embodiment of the present application, a processor for executing a program is provided, wherein the program is executed to perform any one of the above-mentioned methods for detecting gene rearrangement.
The storage medium, the processor and the apparatus allow a computer to execute the method for detecting gene rearrangement and to output the corresponding detection results. These products detect gene rearrangement without adding any additional experimental or sequencing cost, and offer low detection cost and high accuracy.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computing device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The advantageous effects of the present application will be further described with reference to specific examples.
Example 1: Method for detecting MLL-PTD gene rearrangement
1. Sample and data
1) Bone marrow or peripheral blood is drawn from the patient and stored in a collection tube.
2) Nucleic acid is extracted from the sample, and the remaining sample is stored at -80 °C.
3) A sequencing library is constructed and the target region is enriched by hybrid capture (the hybrid capture probes target the MLL gene, one for each of its 36 exons; the specific sequences are shown in Table 1 below).
4) The captured library is sequenced on a sequencer.
Table 1:
[Table 1 is provided as images in the original publication: Figures BDA0001931567130000081 to BDA0001931567130000131.]
2. Pre-processing of sequencing data
1) Data quality control
Low-quality sequences are removed: sequences containing more than 5 N bases are discarded, as are sequences in which the average sequencing quality over 40 consecutive nucleotides is below Q20.
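The quality-control rule above can be sketched in a few lines of Python. This is only an illustrative reading of the rule, not the implementation used in this example: the function name `passes_quality_control`, its parameters, and the handling of reads shorter than 40 nucleotides are assumptions, and the second filter is interpreted as "reject the read if any window of 40 consecutive bases has a mean Phred quality below 20".

```python
def passes_quality_control(bases, qualities, max_n=5, window=40, min_mean_q=20):
    """Return True if a read survives both filters described in the text.

    bases     -- read sequence as a string
    qualities -- list of per-base Phred scores, same length as `bases`
    """
    # Filter 1: discard reads containing more than `max_n` ambiguous bases.
    if bases.upper().count("N") > max_n:
        return False
    # Assumption: a read shorter than one window is judged on its overall mean.
    if len(qualities) < window:
        return sum(qualities) >= min_mean_q * len(qualities)
    # Filter 2: slide a window along the read and reject the read as soon as
    # any window has a mean quality below `min_mean_q`.
    threshold = min_mean_q * window
    window_sum = sum(qualities[:window])
    if window_sum < threshold:
        return False
    for i in range(window, len(qualities)):
        window_sum += qualities[i] - qualities[i - window]
        if window_sum < threshold:
            return False
    return True
```

The running sum keeps the check linear in read length, which matters little for a single read but adds up over a whole sequencing run.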
2) Alignment of MLL Gene sequences
The high-quality sequences that passed quality control were aligned to the reference sequence with hisat2 for further analysis.
3. MLL-PTD recognition
1) Principle and theoretical basis:
MLL-PTD alters the MLL gene (36 exons in total) at the molecular level, which manifests as a change in the order in which the exons are joined; the rearrangement typically occurs between exon2 and exon11.
2) MLL-PTD breakpoint recognition:
First, the paired sequences are aligned, and for sequence pairs whose alignment positions are abnormal relative to each other, the structural variation present between the two sequences of the pair is searched for. At the same time, the abnormally aligned sequences are segmented, the segments are aligned to their possible positions with a looser alignment method, the final alignment position and alignment direction are determined, and the breakpoint position is calculated from the alignment positions of the segments. As shown in fig. 1, among the sequences crossing the breakpoint, those that span a first length on both sides of the breakpoint are taken as first marker sequences, and those that span a second length are taken as second marker sequences. A breakpoint reference sequence in which the PTD has occurred is then simulated from the breakpoint positions on the marker sequences, the sequences are re-aligned to it, and only candidate breakpoints that align well and are supported by sequences crossing the breakpoint are retained.
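The last step of the paragraph above (simulating a breakpoint reference and keeping only reads that cross the breakpoint) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the functions, the `flank` and `min_anchor` parameters, and the toy sequences are not taken from this application, and exact string matching merely stands in for a real aligner.

```python
def simulate_breakpoint_reference(upstream, downstream, flank=30):
    """Join the end of the upstream segment to the start of the downstream one.

    Returns the simulated reference and the 0-based breakpoint position in it.
    """
    left = upstream[-flank:]
    right = downstream[:flank]
    return left + right, len(left)


def spans_breakpoint(read_start, read_length, breakpoint, min_anchor):
    """True if the read covers at least `min_anchor` bases on both sides."""
    left_anchor = breakpoint - read_start
    right_anchor = read_start + read_length - breakpoint
    return min(left_anchor, right_anchor) >= min_anchor


def find_supporting_reads(reads, reference, breakpoint, min_anchor=10):
    """Collect reads that match the simulated reference and cross the breakpoint.

    Assumption: exact substring search replaces the re-alignment step.
    """
    supporting = []
    for read in reads:
        start = reference.find(read)
        if start != -1 and spans_breakpoint(start, len(read), breakpoint, min_anchor):
            supporting.append(read)
    return supporting


# Toy usage with made-up sequences.
reference, breakpoint = simulate_breakpoint_reference(
    "ACGTACGTAAGGCCTTAGCA", "TTGACCGATGCATGCAAGTC", flank=15
)
reads = ["CCTTAGCATTGACCGA", "CGTAAGGCCTT", "AGCATT"]
print(find_supporting_reads(reads, reference, breakpoint, min_anchor=5))
```

Requiring a minimum anchor on both sides is one simple way to express the idea of a read that "spans" the breakpoint; the first and second lengths mentioned in the text would play the role of different `min_anchor` values.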
3) Breakpoint correction
Because the sequences flanking a breakpoint can be similar, and mutations or sequencing errors may be present, the breakpoint is corrected according to the alignment score, the sequencing quality and the number of supporting sequences, as shown in fig. 2, and the best predicted breakpoint sequence is given as the candidate breakpoint.
4) Multifactorial filtration of false positives
For the candidate breakpoints, false positives are further filtered out according to the first marker sequences and second marker sequences that support the breakpoint and the paired sequences that support crossing the breakpoint. Then, as shown in fig. 2, all sequences supporting the breakpoint are assembled, and only the breakpoints for which the assembly result is consistent with the breakpoint sequence information are retained. Reliable MLL-PTD structure information is thereby obtained.
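The filtering and assembly check described above might look roughly as follows in Python. This is a sketch under stated assumptions, not the filter used in this example: the text does not give thresholds or say how the three kinds of evidence are combined, so the `min_first`, `min_second` and `min_pairs` parameters, the dictionary keys, and the rule "all three must be met" are hypothetical. The assembly itself is assumed to have been done elsewhere; `contigs` is simply its output.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]


def confirmed_by_assembly(junction, contigs):
    """True if the junction sequence occurs in any contig, on either strand."""
    forward = junction.upper()
    reverse = reverse_complement(forward)
    return any(forward in c.upper() or reverse in c.upper() for c in contigs)


def filter_candidates(candidates, contigs, min_first=1, min_second=1, min_pairs=1):
    """Keep candidates with enough support whose junction the assembly confirms.

    Each candidate is a dict with the hypothetical keys 'junction',
    'first_markers', 'second_markers' and 'spanning_pairs'.
    """
    kept = []
    for cand in candidates:
        enough_support = (
            cand["first_markers"] >= min_first
            and cand["second_markers"] >= min_second
            and cand["spanning_pairs"] >= min_pairs
        )
        if enough_support and confirmed_by_assembly(cand["junction"], contigs):
            kept.append(cand)
    return kept
```

Checking both strands matters because an assembler may report a contig in either orientation.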
4. MLL-PTD quantification
The marker sequence number of the MLL-PTD is divided by the sequencing depth of the internal reference gene ABL1 to obtain the abundance of the MLL-PTD relative to the ABL1 gene.
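The ratio just described is a single division, shown here as a small Python sketch. The function name and the numbers in the usage line are hypothetical and are not taken from the samples in this example.

```python
def relative_abundance(marker_sequence_number, reference_depth):
    """Abundance of the rearrangement relative to the internal reference gene.

    marker_sequence_number -- number of sequences supporting the breakpoint
    reference_depth        -- sequencing depth of the reference gene (e.g. ABL1)
    """
    if reference_depth <= 0:
        raise ValueError("reference depth must be positive")
    return marker_sequence_number / reference_depth


# Hypothetical values: 50 supporting sequences against a reference depth of 400.
print(f"{relative_abundance(50, 400):.2%}")
```

Guarding against a zero depth avoids a division error when the reference gene is not covered in a sample.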
Specifically, 122 samples were tested according to the method of the present application shown in FIG. 2, and 10 samples were detected to have MLL-PTD variation, and the results are reported in tables 2 and 3 below.
Table 2:
[Table 2 is provided as an image in the original publication: Figure BDA0001931567130000141.]
Table 3:
Sample No. SEQ ID NO: Fusion sequence
A 37 AGAGGTCTCTGATGAGTCACTTTCTTGACC@cttttcttttggtttttgttttacagggat
B 38 AGAGGTCTCTGATGAGTCACTTTCTTGACC@cttttcttttggtttttgttttacagggat
C 39 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
D 40 CATCTTCTGAGCCAGCAATTGATGACTTGT@cttttcttttggtttttgttttacagggat
E 41 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
F 42 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
G 43 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttaaagtccactctgatcctgtggactcc
H 44 ATCTGAGCCAAAACCTAAGAATTGCTCATC@ctgattctggtggtggaggctgctttttct
I 45 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
J 46 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
K 47 CATCTTCTGAGCCAGCAATTGATGACTTGT@cttttcttttggtttttgttttacagggat
The fusion sequences in Table 3 are given as reverse complements. For example, for sample A (exon8->exon4), the lower-case letters represent the sequence of exon8 and the upper-case letters the sequence of exon4.
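To read a fusion sequence from Table 3 in the forward orientation, each side of the "@" is reverse-complemented and the two sides are swapped. The Python sketch below does this for sample A (SEQ ID NO: 37). The helper names are hypothetical; the two 30-nt comparison strings are copied from the end of SEQ ID NO: 8 and the start of SEQ ID NO: 4 in the sequence listing, which this sketch assumes correspond to exon8 and exon4.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_forward(fusion):
    """Convert 'LEFT@right' on the reverse strand to the forward orientation."""
    left, right = fusion.split("@")
    return reverse_complement(right) + "@" + reverse_complement(left)


sample_a = "AGAGGTCTCTGATGAGTCACTTTCTTGACC@cttttcttttggtttttgttttacagggat"
forward = to_forward(sample_a)
upstream, downstream = forward.split("@")

exon8_end = "atccctgtaaaacaaaaaccaaaagaaaag"    # last 30 nt of SEQ ID NO: 8
exon4_start = "ggtcaagaaagtgactcatcagagacctct"  # first 30 nt of SEQ ID NO: 4

print(forward)
print(upstream == exon8_end, downstream.lower() == exon4_start)
```

In the forward orientation the end of exon8 is followed directly by the start of exon4, which is the exon8->exon4 junction named in the text.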
2. Sample C was selected for Sanger sequencing verification of the detected MLL-PTD breakpoint structure.
The information for the sample verified by PCR is shown in Table 4 below.
Table 4:
Sample No. MLL-PTD structure Exon A Exon B Marker sequence number Ratio
C exon8->exon2 exon8 exon2 231 17.12%
The sequence information obtained by verification is as follows:
ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
(i.e., SEQ ID NO: 39).
3. A breakpoint template sequence is generated according to the breakpoint position, primers are designed 300 bp upstream and downstream of the breakpoint, and PCR amplification is performed.
4. The PCR product is of the expected size and gives a single bright band, and the chromatogram obtained by Sanger sequencing of the PCR product is clean.
5. The breakpoint structure can be found in the Sanger sequencing results, and the bases before and after the breakpoint are completely consistent with the breakpoint sequence identified by the method of the present application (the sequencing result of the forward primer is shown in fig. 3, and the reverse-complement sequencing result is shown in fig. 4).
From the above description, it can be seen that the above-described embodiments of the present invention achieve the following technical effects: the method can not only detect the known or unknown rearrangement phenomenon, but also accurately detect the specific position where the rearrangement occurs and the corresponding sequence information. The method has wide application range and is suitable for detecting all genes with rearrangement phenomena.
This method directly utilizes NGS sequencing data, is based on statistical and algorithmic development, and does not add any additional experimental detection cost. In addition, the method has high detection accuracy and low cost, and is suitable for structural rearrangement detection of low-abundance genes.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially implemented or the portions contributing to the prior art may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method of the embodiments or some portions of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Sequence listing
<110> Beijing Usci Medical Laboratory Co., Ltd.
<120> method, apparatus, storage medium, and processor for detecting gene rearrangement.
<130> PN102308YXYX
<160> 47
<170> SIPOSequenceListing 1.0
<210> 1
<211> 455
<212> DNA
<213> Homo sapiens
<400> 1
ctgcttcact tcacggggcg aacatggcgc acagctgtcg gtggcgcttc cccgcccgac 60
ccgggaccac cgggggcggc ggcggcgggg ggcgccgggg cctagggggc gccccgcggc 120
aacgcgtccc ggccctgctg cttccccccg ggcccccggt cggcggtggc ggccccgggg 180
cgcccccctc ccccccggct gtggcggccg cggcggcggc ggcgggaagc agcggggctg 240
gggttccagg gggagcggcc gccgcctcag cagcctcctc gtcgtccgcc tcgtcttcgt 300
cttcgtcatc gtcctcagcc tcttcagggc cggccctgct ccgggtgggc ccgggcttcg 360
acgcggcgct gcaggtctcg gccgccatcg gcaccaacct gcgccggttc cgggccgtgt 420
ttggggagag cggcggggga ggcggcagcg gagag 455
<210> 2
<211> 70
<212> DNA
<213> Homo sapiens
<400> 2
gatgagcaat tcttaggttt tggctcagat gaagaagtca gagtgcgaag tcccacaagg 60
tctccttcag 70
<210> 3
<211> 2654
<212> DNA
<213> Homo sapiens
<400> 3
ttaaaactag tcctcgaaaa cctcgtggga gacctagaag tggctctgac cgaaattcag 60
ctatcctctc agatccatct gtgttttccc ctctaaataa atcagagacc aaatctggag 120
ataagatcaa gaagaaagat tctaaaagta tagaaaagaa gagaggaaga cctcccacct 180
tccctggagt aaaaatcaaa ataacacatg gaaaggacat ttcagagtta ccaaagggaa 240
acaaagaaga tagcctgaaa aaaattaaaa ggacaccttc tgctacgttt cagcaagcca 300
caaagattaa aaaattaaga gcaggtaaac tctctcctct caagtctaag tttaagacag 360
ggaagcttca aataggaagg aagggggtac aaattgtacg acggagagga aggcctccat 420
caacagaaag gataaagacc ccttcgggtc tcctcattaa ttctgaactg gaaaagcccc 480
agaaagtccg gaaagacaag gaaggaacac ctccacttac aaaagaagat aagacagttg 540
tcagacaaag ccctcgaagg attaagccag ttaggattat tccttcttca aaaaggacag 600
atgcaaccat tgctaagcaa ctcttacaga gggcaaaaaa gggggctcaa aagaaaattg 660
aaaaagaagc agctcagctg cagggaagaa aggtgaagac acaggtcaaa aatattcgac 720
agttcatcat gcctgttgtc agtgctatct cctcgcggat cattaagacc cctcggcggt 780
ttatagagga tgaggattat gaccctccaa ttaaaattgc ccgattagag tctacaccga 840
atagtagatt cagtgccccg tcctgtggat cttctgaaaa atcaagtgca gcttctcagc 900
actcctctca aatgtcttca gactcctctc gatctagtag ccccagtgtt gatacctcca 960
cagactctca ggcttctgag gagattcagg tacttcctga ggagcggagc gatacccctg 1020
aagttcatcc tccactgccc atttcccagt ccccagaaaa tgagagtaat gataggagaa 1080
gcagaaggta ttcagtgtcg gagagaagtt ttggatctag aacgacgaaa aaattatcaa 1140
ctctacaaag tgccccccag cagcagacct cctcgtctcc acctccacct ctgctgactc 1200
caccgccacc actgcagcca gcctccagta tctctgacca cacaccttgg cttatgcctc 1260
caacaatccc cttagcatca ccatttttgc ctgcttccac tgctcctatg caagggaagc 1320
gaaaatctat tttgcgagaa ccgacattta ggtggacttc tttaaagcat tctaggtcag 1380
agccacaata cttttcctca gcaaagtatg ccaaagaagg tcttattcgc aaaccaatat 1440
ttgataattt ccgaccccct ccactaactc ccgaggacgt tggctttgca tctggttttt 1500
ctgcatctgg taccgctgct tcagcccgat tgttttcgcc actccattct ggaacaaggt 1560
ttgatatgca caaaaggagc cctcttctga gagctccaag atttactcca agtgaggctc 1620
actctagaat atttgagtct gtaaccttgc ctagtaatcg aacttctgct ggaacatctt 1680
cttcaggagt atccaataga aaaaggaaaa gaaaagtgtt tagtcctatt cgatctgaac 1740
caagatctcc ttctcactcc atgaggacaa gaagtggaag gcttagtagt tctgagctct 1800
cacctctcac ccccccgtct tctgtctctt cctcgttaag catttctgtt agtcctcttg 1860
ccactagtgc cttaaaccca acttttactt ttccttctca ttccctgact cagtctgggg 1920
aatctgcaga gaaaaatcag agaccaagga agcagactag tgctccggca gagccatttt 1980
catcaagtag tcctactcct ctcttccctt ggtttacccc aggctctcag actgaaagag 2040
ggagaaataa agacaaggcc cccgaggagc tgtccaaaga tcgagatgct gacaagagcg 2100
tggagaagga caagagtaga gagagagacc gggagagaga aaaggagaat aagcgggagt 2160
caaggaaaga gaaaaggaaa aagggatcag aaattcagag tagttctgct ttgtatcctg 2220
tgggtagggt ttccaaagag aaggttgttg gtgaagatgt tgccacttca tcttctgcca 2280
aaaaagcaac agggcggaag aagtcttcat cacatgattc tgggactgat attacttctg 2340
tgactcttgg ggatacaaca gctgtcaaaa ccaaaatact tataaagaaa gggagaggaa 2400
atctggaaaa aaccaacttg gacctcggcc caactgcccc atccctggag aaggagaaaa 2460
ccctctgcct ttccactcct tcatctagca ctgttaaaca ttccacttcc tccataggct 2520
ccatgttggc tcaggcagac aagcttccaa tgactgacaa gagggttgcc agcctcctaa 2580
aaaaggccaa agctcagctc tgcaagattg agaagagtaa gagtcttaaa caaaccgacc 2640
agcccaaagc acag 2654
<210> 4
<211> 178
<212> DNA
<213> Homo sapiens
<400> 4
ggtcaagaaa gtgactcatc agagacctct gtgcgaggac cccggattaa acatgtctgc 60
agaagagcag ctgttgccct tggccgaaaa cgagctgtgt ttcctgatga catgcccacc 120
ctgagtgcct taccatggga agaacgagaa aagattttgt cttccatggg gaatgatg 178
<210> 5
<211> 235
<212> DNA
<213> Homo sapiens
<400> 5
acaagtcatc aattgctggc tcagaagatg ctgaacctct tgctccaccc atcaaaccaa 60
ttaaacctgt cactagaaac aaggcacccc aggaacctcc agtaaagaaa ggacgtcgat 120
cgaggcggtg tgggcagtgt cccggctgcc aggtgcctga ggactgtggt gtttgtacta 180
attgcttaga taagcccaag tttggtggtc gcaatataaa gaagcagtgc tgcaa 235
<210> 6
<211> 65
<212> DNA
<213> Homo sapiens
<400> 6
gatgagaaaa tgtcagaatc tacaatggat gccttccaaa gcctacctgc agaagcaagc 60
taaag 65
<210> 7
<211> 378
<212> DNA
<213> Homo sapiens
<400> 7
ctgtgaaaaa gaaagagaaa aagtctaaga ccagtgaaaa gaaagacagc aaagagagca 60
gtgttgtgaa gaacgtggtg gactctagtc agaaacctac cccatcagca agagaggatc 120
ctgccccaaa gaaaagcagt agtgagcctc ctccacgaaa gcccgtcgag gaaaagagtg 180
aagaagggaa tgtctcggcc cctgggcctg aatccaaaca ggccaccact ccagcttcca 240
ggaagtcaag caagcaggtc tcccagccag cactggtcat cccgcctcag ccacctacta 300
caggaccgcc aagaaaagaa gttcccaaaa ccactcctag tgagcccaag aaaaagcagc 360
ctccaccacc agaatcag 378
<210> 8
<211> 74
<212> DNA
<213> Homo sapiens
<400> 8
gtccagagca gagcaaacag aaaaaagtgg ctccccgccc aagtatccct gtaaaacaaa 60
aaccaaaaga aaag 74
<210> 9
<211> 132
<212> DNA
<213> Homo sapiens
<400> 9
gaaaaaccac ctccggtcaa taagcaggag aatgcaggca ctttgaacat cctcagcact 60
ctctccaatg gcaatagttc taagcaaaaa attccagcag atggagtcca caggatcaga 120
gtggacttta ag 132
<210> 10
<211> 114
<212> DNA
<213> Homo sapiens
<400> 10
gaggattgtg aagcagaaaa tgtgtgggag atgggaggct taggaatctt gacttctgtt 60
cctataacac ccagggtggt ttgctttctc tgtgccagta gtgggcatgt agag 114
<210> 11
<211> 147
<212> DNA
<213> Homo sapiens
<400> 11
tttgtgtatt gccaagtctg ttgtgagccc ttccacaagt tttgtttaga ggagaacgag 60
cgccctctgg aggaccagct ggaaaattgg tgttgtcgtc gttgcaaatt ctgtcacgtt 120
tgtggaaggc aacatcaggc tacaaag 147
<210> 12
<211> 96
<212> DNA
<213> Homo sapiens
<400> 12
cagctgctgg agtgtaataa gtgccgaaac agctatcacc ctgagtgcct gggaccaaac 60
taccccacca aacccacaaa gaagaagaaa gtctgg 96
<210> 13
<211> 121
<212> DNA
<213> Homo sapiens
<400> 13
atctgtacca agtgtgttcg ctgtaagagc tgtggatcca caactccagg caaagggtgg 60
gatgcacagt ggtctcatga tttctcactg tgtcatgatt gcgccaagct ctttgctaaa 120
g 121
<210> 14
<211> 123
<212> DNA
<213> Homo sapiens
<400> 14
gaaacttctg ccctctctgt gacaaatgtt atgatgatga tgactatgag agtaagatga 60
tgcaatgtgg aaagtgtgat cgctgggtcc attccaaatg tgagaatctt tcaggtacag 120
aag 123
<210> 15
<211> 185
<212> DNA
<213> Homo sapiens
<400> 15
atgagatgta tgagattcta tctaatctgc cagaaagtgt ggcctacact tgtgtgaact 60
gtactgagcg gcaccctgca gagtggcgac tggcccttga aaaagagctg cagatttctc 120
tgaagcaagt tctgacagct ttgttgaatt ctcggactac cagccatttg ctacgctacc 180
ggcag 185
<210> 16
<211> 174
<212> DNA
<213> Homo sapiens
<400> 16
gctgccaagc ctccagactt aaatcccgag acagaggaga gtataccttc ccgcagctcc 60
cccgaaggac ctgatccacc agttcttact gaggtcagca aacaggatga tcagcagcct 120
ttagatctag aaggagtcaa gaggaagatg gaccaaggga attacacatc tgtg 174
<210> 17
<211> 111
<212> DNA
<213> Homo sapiens
<400> 17
ttggagttca gtgatgatat tgtgaagatc attcaagcag ccattaattc agatggagga 60
cagccagaaa ttaaaaaagc caacagcatg gtcaagtcct tcttcattcg g 111
<210> 18
<211> 74
<212> DNA
<213> Homo sapiens
<400> 18
caaatggaac gtgtttttcc atggttcagt gtcaaaaagt ccaggttttg ggagccaaat 60
aaagtatcaa gcaa 74
<210> 19
<211> 194
<212> DNA
<213> Homo sapiens
<400> 19
cagtgggatg ttaccaaacg cagtgcttcc accttcactt gaccataatt atgctcagtg 60
gcaggagcga gaggaaaaca gccacactga gcagcctcct ttaatgaaga aaatcattcc 120
agctcccaaa cccaaaggtc ctggagaacc agactcacca actcctctgc atcctcctac 180
accaccaatt ttga 194
<210> 20
<211> 107
<212> DNA
<213> Homo sapiens
<400> 20
gtactgatag gagtcgagaa gacagtccag agctgaaccc acccccaggc atagaagaca 60
atagacagtg tgcgttatgt ttgacttatg gtgatgacag tgctaat 107
<210> 21
<211> 138
<212> DNA
<213> Homo sapiens
<400> 21
gatgctggtc gtttactata tattggccaa aatgagtgga cacatgtaaa ttgtgctttg 60
tggtcagcgg aagtgtttga agatgatgac ggatcactaa agaatgtgca tatggctgtg 120
atcaggggca agcagctg 138
<210> 22
<211> 159
<212> DNA
<213> Homo sapiens
<400> 22
agatgtgaat tctgccaaaa gccaggagcc accgtgggtt gctgtctcac atcctgcacc 60
agcaactatc acttcatgtg ttcccgagcc aagaactgtg tctttctgga tgataaaaaa 120
gtatattgcc aacgacatcg ggatttgatc aaaggcgaa 159
<210> 23
<211> 118
<212> DNA
<213> Homo sapiens
<400> 23
gtggttcctg agaatggatt tgaagttttc agaagagtgt ttgtggactt tgaaggaatc 60
agcttgagaa ggaagtttct caatggcttg gaaccagaaa atatccacat gatgattg 118
<210> 24
<211> 79
<212> DNA
<213> Homo sapiens
<400> 24
ggtctatgac aatcgactgc ttaggaattc taaatgatct ctccgactgt gaagataagc 60
tctttcctat tggatatca 79
<210> 25
<211> 161
<212> DNA
<213> Homo sapiens
<400> 25
gtgttccagg gtatactgga gcaccacaga tgctcgcaag cgctgtgtat atacatgcaa 60
gatagtggag tgccgtcctc cagtcgtaga gccggatatc aacagcactg ttgaacatga 120
tgaaaacagg accattgccc atagtccaac atcttttaca g 161
<210> 26
<211> 186
<212> DNA
<213> Homo sapiens
<400> 26
aaagttcatc aaaagagagt caaaacacag ctgaaattat aagtcctcca tcaccagacc 60
gacctcctca ttcacaaacc tctggctcct gttattatca tgtcatctca aaggtcccca 120
ggattcgaac acccagttat tctccaacac agagatcccc tggctgtcga ccgttgcctt 180
ctgcag 186
<210> 27
<211> 4249
<212> DNA
<213> Homo sapiens
<400> 27
gaagtcctac cccaaccact catgaaatag tcacagtagg tgatccttta ctctcctctg 60
gacttcgaag cattggctcc aggcgtcaca gtacctcttc cttatcaccc cagcggtcca 120
aactccggat aatgtctcca atgagaactg ggaatactta ctctaggaat aatgtttcct 180
cagtctccac caccgggacc gctactgatc ttgaatcaag tgccaaagta gttgatcatg 240
tcttagggcc actgaattca agtactagtt tagggcaaaa cacttccacc tcttcaaatt 300
tgcaaaggac agtggttact gtaggcaata aaaacagtca cttggatgga tcttcatctt 360
cagaaatgaa gcagtccagt gcttcagact tggtgtccaa gagctcctct ttaaagggag 420
agaagaccaa agtgctgagt tccaagagct cagagggatc tgcacataat gtggcttacc 480
ctggaattcc taaactggcc ccacaggttc ataacacaac atctagagaa ctgaatgtta 540
gtaaaatcgg ctcctttgct gaaccctctt cagtgtcgtt ttcttctaaa gaggccctct 600
ccttcccaca cctccatttg agagggcaaa ggaatgatcg agaccaacac acagattcta 660
cccaatcagc aaactcctct ccagatgaag atactgaagt caaaaccttg aagctatctg 720
gaatgagcaa cagatcatcc attatcaacg aacatatggg atctagttcc agagatagga 780
gacagaaagg gaaaaaatcc tgtaaagaaa ctttcaaaga aaagcattcc agtaaatctt 840
ttttggaacc tggtcaggtg acaactggtg aggaaggaaa cttgaagcca gagtttatgg 900
atgaggtttt gactcctgag tatatgggcc aacgaccatg taacaatgtt tcttctgata 960
agattggtga taaaggcctt tctatgccag gagtccccaa agctccaccc atgcaagtag 1020
aaggatctgc caaggaatta caggcaccac ggaaacgcac agtcaaagtg acactgacac 1080
ctctaaaaat ggaaaatgag agtcaatcca aaaatgccct gaaagaaagt agtcctgctt 1140
cccctttgca aatagagtca acatctccca cagaaccaat ttcagcctct gaaaatccag 1200
gagatggtcc agtggcccaa ccaagcccca ataatacctc atgccaggat tctcaaagta 1260
acaactatca gaatcttcca gtacaggaca gaaacctaat gcttccagat ggccccaaac 1320
ctcaggagga tggctctttt aaaaggaggt atccccgtcg cagtgcccgt gcacgttcta 1380
acatgttttt tgggcttacc ccactctatg gagtaagatc ctatggtgaa gaagacattc 1440
cattctacag cagctcaact gggaagaagc gaggcaagag atcagctgaa ggacaggtgg 1500
atggggccga tgacttaagc acttcagatg aagacgactt atactattac aacttcacta 1560
gaacagtgat ttcttcaggt ggagaggaac gactggcatc ccataattta tttcgggagg 1620
aggaacagtg tgatcttcca aaaatctcac agttggatgg tgttgatgat gggacagaga 1680
gtgatactag tgtcacagcc acaacaagga aaagcagcca gattccaaaa agaaatggta 1740
aagaaaatgg aacagagaac ttaaagattg atagacctga agatgctggg gagaaagaac 1800
atgtcactaa gagttctgtt ggccacaaaa atgagccaaa gatggataac tgccattctg 1860
taagcagagt taaaacacag ggacaagatt ccttggaagc tcagctcagc tcattggagt 1920
caagccgcag agtccacaca agtaccccct ccgacaaaaa tttactggac acctataata 1980
ctgagctcct gaaatcagat tcagacaata acaacagtga tgactgtggg aatatcctgc 2040
cttcagacat tatggacttt gtactaaaga atactccatc catgcaggct ttgggtgaga 2100
gcccagagtc atcttcatca gaactcctga atcttggtga aggattgggt cttgacagta 2160
atcgtgaaaa agacatgggt ctttttgaag tattttctca gcagctgcct acaacagaac 2220
ctgtggatag tagtgtctct tcctctatct cagcagagga acagtttgag ttgcctctag 2280
agctaccatc tgatctgtct gtcttgacca cccggagtcc cactgtcccc agccagaatc 2340
ccagtagact agctgttatc tcagactcag gggagaagag agtaaccatc acagaaaaat 2400
ctgtagcctc ctctgaaagt gacccagcac tgctgagccc aggagtagat ccaactcctg 2460
aaggccacat gactcctgat cattttatcc aaggacacat ggatgcagac cacatctcta 2520
gccctccttg tggttcagta gagcaaggtc atggcaacaa tcaggattta actaggaaca 2580
gtagcacccc tggccttcag gtacctgttt ccccaactgt tcccatccag aaccagaagt 2640
atgtgcccaa ttctactgat agtcctggcc cgtctcagat ttccaatgca gctgtccaga 2700
ccactccacc ccacctgaag ccagccactg agaaactcat agttgttaac cagaacatgc 2760
agccacttta tgttctccaa actcttccaa atggagtgac ccaaaaaatc caattgacct 2820
cttctgttag ttctacaccc agtgtgatgg agacaaatac ttcagtattg ggacccatgg 2880
gaggtggtct cacccttacc acaggactaa atccaagctt gccaacttct caatctttgt 2940
tcccttctgc tagcaaagga ttgctaccca tgtctcatca ccagcactta cattccttcc 3000
ctgcagctac tcaaagtagt ttcccaccaa acatcagcaa tcctccttca ggcctgctta 3060
ttggggttca gcctcctccg gatccccaac ttttggtttc agaatccagc cagaggacag 3120
acctcagtac cacagtagcc actccatcct ctggactcaa gaaaagaccc atatctcgtc 3180
tacagacccg aaagaataaa aaacttgctc cctctagtac cccttcaaac attgcccctt 3240
ctgatgtggt ttctaatatg acattgatta acttcacacc ctcccagctt cctaatcatc 3300
caagtctgtt agatttgggg tcacttaata cttcatctca ccgaactgtc cccaacatca 3360
taaaaagatc taaatctagc atcatgtatt ttgaaccggc acccctgtta ccacagagtg 3420
tgggaggaac tgctgccaca gcggcaggca catcaacaat aagccaggat actagccacc 3480
tcacatcagg gtctgtgtct ggcttggcat ccagttcctc tgtcttgaat gttgtatcca 3540
tgcaaactac cacaacccct acaagtagtg cgtcagttcc aggacacgtc accttaacca 3600
acccaaggtt gcttggtacc ccagatattg gctcaataag caatctttta atcaaagcta 3660
gccagcagag cctggggatt caggaccagc ctgtggcttt accgccaagt tcaggaatgt 3720
ttccacaact ggggacatca cagaccccct ctactgctgc aataacagcg gcatctagca 3780
tctgtgtgct cccctccact cagactacgg gcataacagc cgcttcacct tctggggaag 3840
cagacgaaca ctatcagctt cagcatgtga accagctcct tgccagcaaa actgggattc 3900
attcttccca gcgtgatctt gattctgctt cagggcccca ggtatccaac tttacccaga 3960
cggtagacgc tcctaatagc atgggactgg agcagaacaa ggctttatcc tcagctgtgc 4020
aagccagccc cacctctcct gggggttctc catcctctcc atcttctgga cagcggtcag 4080
caagcccttc agtgccgggt cccactaaac ccaaaccaaa aaccaaacgg tttcagctgc 4140
ctctagacaa agggaatggc aagaagcaca aagtttccca tttgcggacc agttcttctg 4200
aagcacacat tccagaccaa gaaacgacat ccctgacctc aggcacagg 4249
<210> 28
<211> 81
<212> DNA
<213> Homo sapiens
<400> 28
gactccagga gcagaggctg agcagcagga tacagctagc gtggagcagt cctcccagaa 60
ggagtgtggg caacctgcag g 81
<210> 29
<211> 65
<212> DNA
<213> Homo sapiens
<400> 29
gcaagtcgct gttcttccgg aagttcaggt gacccaaaat ccagcaaatg aacaagaaag 60
tgcag 65
<210> 30
<211> 171
<212> DNA
<213> Homo sapiens
<400> 30
aacctaaaac agtggaagaa gaggaaagta atttcagctc cccactgatg ctttggcttc 60
agcaagaaca aaagcggaag gaaagcatta ctgagaaaaa acccaagaaa ggacttgttt 120
ttgaaatttc cagtgatgat ggctttcaga tctgtgcaga aagtattgaa g 171
<210> 31
<211> 75
<212> DNA
<213> Homo sapiens
<400> 31
atgcctggaa gtcattgaca gataaagtcc aggaagctcg atcaaatgcc cgcctaaagc 60
agctctcatt tgcag 75
<210> 32
<211> 175
<212> DNA
<213> Homo sapiens
<400> 32
gtgttaacgg tttgaggatg ctggggattc tccatgatgc agttgtgttc ctcattgagc 60
agctgtctgg tgccaagcac tgtcgaaatt acaaattccg tttccacaag ccagaggagg 120
ccaatgaacc ccccttgaac cctcacggct cagccagggc tgaagtccac ctcag 175
<210> 33
<211> 108
<212> DNA
<213> Homo sapiens
<400> 33
gaagtcagca tttgacatgt ttaacttcct ggcttctaaa catcgtcagc ctcctgaata 60
caaccccaat gatgaagaag aggaggaggt acagctgaag tcagctcg 108
<210> 34
<211> 84
<212> DNA
<213> Homo sapiens
<400> 34
gagggcaact agcatggatc tgccaatgcc catgcgcttc cggcacttaa aaaagacttc 60
taaggaggca gttggtgtct acag 84
<210> 35
<211> 130
<212> DNA
<213> Homo sapiens
<400> 35
gtctcccatc catggccggg gtcttttctg taagagaaac attgatgcag gtgagatggt 60
gattgagtat gccggcaacg tcatccgctc catccagact gacaagcggg aaaagtatta 120
cgacagcaag 130
<210> 36
<211> 4928
<212> DNA
<213> Homo sapiens
<400> 36
ggcattggtt gctatatgtt ccgaattgat gactcagagg tagtggatgc caccatgcat 60
ggaaatgctg cacgcttcat caatcactcg tgtgagccta actgctattc tcgggtcatc 120
aatattgatg ggcagaagca cattgtcatc tttgccatgc gtaagatcta ccgaggagag 180
gaactcactt acgactataa gttccccatt gaggatgcca gcaacaagct gccctgcaac 240
tgtggcgcca agaaatgccg gaagttccta aactaaagct gctcttctcc cccagtgttg 300
gagtgcaagg aggcggggcc atccaaagca acgctgaagg ccttttccag cagctgggag 360
ctcccggatt gcgtggcaca gctgaggggc ctctgtgatg gctgagctct cttatgtcct 420
atactcacat cagacatgtg atcatagtcc cagagacaga gttgaggtct cgaagaaaag 480
atccatgatc ggctttctcc tggggcccct ccaattgttt actgttagaa agtgggaatg 540
gggtccctag cagacttgcc tggaaggagc ctattataga gggttggtta tgttgggaga 600
ttgggcctga atttctccac agaaataagt tgccatcctc aggttggccc tttcccaagc 660
actgtaagtg agtgggtcag gcaaagcccc aaatggaggg ttggttagat tcctgacagt 720
ttgccagcca ggccccacct acagcgtctg tcgaacaaac agaggtctgg tggttttccc 780
tactatcctc ccactcgaga gttcacttct ggttgggaga caggattcct agcacctccg 840
gtgtcaaaag gctgtcatgg ggttgtgcca attaattacc aaacattgag cctgcaggct 900
ttgagtggga gtgttgcccc caggagcctt atctcagcca attacctttc ttgacagtag 960
gagcggcttc cctctcccat tccctcttca ctcccttttc ttcctttccc ctgtcttcat 1020
gccactgctt tcccatgctt ctttcgggtt gtaggggaga ctgactgcct gctcaaggac 1080
actccctgct gggcatagga tgtgcctgca aaaagttccc tgagcctgta agcactccag 1140
gtggggaagt ggacaggagc cattggtcat aaccagacag aatttggaaa cattttcata 1200
aagctccatg gagagtttta aagaaacata tgtagcatga ttttgtagga gaggaaaaag 1260
attatttaaa taggatttaa atcatgcaac aacgagagta tcacagccag gatgaccctt 1320
gggtcccatt cctaagacat ggttacttta ttttcccctt gttaagacat aggaagactt 1380
aatttttaaa cggtcagtgt ccagttgaag gcagaacact aatcagattt caaggcccac 1440
aacttgggga ctagaccacc ttatgttgag ggaactctgc cacctgcgtg caacccacag 1500
ctaaagtaaa ttcaatgaca ctactgccct gattactcct taggatgtgg tcaaaacagc 1560
atcaaatgtt tcttctcttc ctttccccaa gacagagtcc tgaacctgtt aaattaagtc 1620
attggatttt actctgttct gtttacagtt tactatttaa ggttttataa atgtaaatat 1680
attttgtata tttttctatg agaagcactt catagggaga agcacttatg acaaggctat 1740
tttttaaacc gcggtattat cctaatttaa aagaagatcg gtttttaata attttttatt 1800
ttcataggat gaagttagag aaaatattca gctgtacaca caaagtctgg tttttcctgc 1860
ccaacttccc cctggaaggt gtactttttg ttgtttaatg tgtagcttgt ttgtgccctg 1920
ttgacataaa tgtttcctgg gtttgctctt tgacaataaa tggagaagga aggtcaccca 1980
actccattgg gccactcccc tccttcccct attgaagctc ctcaaaaggc tacagtaata 2040
tcttgataca acagattctc ttctttcccg cctctctcct ttccggcgca acttccagag 2100
tggtgggaga cggcaatctt tacatttccc tcatctttct tacttcagag ttagcaaaca 2160
acaagttgaa tggcaacttg acatttttgc atcaccatct gcctcatagg ccactctttc 2220
ctttccctct gcccaccaag tcctcatatc tgcagagaac ccattgatca ccttgtgccc 2280
tcttttgggg cagcctgttg aaactgaagc acagtctgac cactcacgat aaagcagatt 2340
tttctctgcc tctgccacaa ggtttcagag tagtgtagtc caagtagagg gtggggcacc 2400
cttttctcgc cgcaagaagc ccattcctat ggaagtctag caaagcaata cgactcagcc 2460
cagcactctc tgccccagga ctcatggctc tgctgtgcct tccatcctgg gctcccttct 2520
ctcctgtgac cttaagaact ttgtctggtg gctttgctgg aacattgtca ctgttttcac 2580
tgtcatgcag ggagcccagc actgtggcca ggatggcaga gacttccttg tcatcatgga 2640
gaagtgccag caggggactg ggaaaagcac tctacccaga cctcacctcc cttcctcctt 2700
ttgcccatga acaagatgca gtggccctag gggttccact agtgtctgct ttcctttatt 2760
attgcactgt gtgaggtttt tttgtaaatc cttgtattcc tatttttttt aaagaaaaaa 2820
aaaaaacctt aagctgcatt tgttactgaa atgattaatg cactgatggg tcctgaattc 2880
accttgagaa agacccaaag gccagtcagg gggtgggggg aactcagcta aatagaccta 2940
gttactgccc tgctaggcca tgctgtactg tgagcccctc ctcactctct accaacccta 3000
aaccctgagg acaggggagg aacccacagc ttccttctcc tgccagctgc agatggtttg 3060
ccttgccttt ccacccccta attgtcaacc acaaaaatga gaaattcctc ttctagctca 3120
gccttgagtc cattgccaaa ttttcagcac acctgccagc aacttggggg aataagcgaa 3180
ggtttcccta caagagggaa agaaggcaaa aacggcacag ctatctccaa acacatctga 3240
gttcatttca aaagtgacca agggaatctc cgcacaaaag tgcagattga ggaattgtga 3300
tgggtcattc ccaagaatcc cccaaggggc atcccaaatc cctgaggagt aacagctgca 3360
aacctggtca gttctcagtg agagccagct cacttatagc tttgctgcta gaacctgttg 3420
tggctgcatt tcctggtggc cagtgacaac tgtgtaacca gaatagctgc atggcgctga 3480
ccctttggcc ggaacttggt ctcttggctc cctccttggc cacccaccac ctctcgcaca 3540
gcccctctgt ttttacacca ataacaagaa ttaaggggga agccctggca gctatacgtt 3600
ttcaaccaga ctcctttgcc gggacccagc ccgccaccct gctcgcctcc gtcaaacccc 3660
cggccaatgc agtgagcacc atgtagctcc cttgatttaa aaaaaataaa aaataaaaaa 3720
aaaaggaaaa aaaaatacaa cacacacaca aaaataaaaa aaatattcta atgaatgtat 3780
ctttctaaag gactgacgtt caatcaaata tctgaaaata ctaaaggtca aaaccttgtc 3840
agatgttaac ttctaagttc ggtttgggat tttttttttt taatagaaat caagttgttt 3900
ttgtttttaa ggaaaagcgg gtcattgcaa agggctgggt gtaattttat gtttcatttc 3960
cttcatttta aagcaataca aggttatgga gcagatggtt ttgtgccgaa tcatgaatac 4020
tagtcaagtc acacactctg gaaacttgca actttttgtt tgttttggtt ttcaaataaa 4080
tataaatatg atatatatag gaactaatat agtaatgcac catgtaacaa agcctagttc 4140
agtccatggc ttttaattct cttaacacta tagataagga ttgtgttaca gttgctagta 4200
gcggcaggaa gatgtcaggc tcactttcct ctgattcccg aaatgggggg aacctctaac 4260
cataaaggaa tggtagaaca gtccattcct cggatcagag aaaaatgcag acatggtgtc 4320
acctggattt ttttctgccc atgaatgttg ccagtcagta cctgtcctcc ttgtttctct 4380
atttttggtt atgaatgttg gggttaccac ctgcatttag gggaaaattg tgttctgtgc 4440
tttcctggta tcttgttccg aggtactcta gttctgtctt tcaaccaaga aaatagaatt 4500
gtggtgtttc ttttattgaa cttttaacag tctctttagt aaatacaggt agttgaataa 4560
ttgtttcaag agctcaacag atgacaagct tcttttctag aaataagaca ttttttgaca 4620
actttatcat gtataacaga tctgtttttt ttccttgtgt tcttccaagc ttctggttag 4680
agaaaaagag aaaaaaaaaa aaggaaaatg tgtctaaagt ccatcagtgt taactccctg 4740
tgacagggat gaaggaaaat actttaatag ttcaaaaaat aataatgctg aaagctctct 4800
acgaaagact gaatgtaaaa gtaaaaagtg tacatagttg taaaaaaaag gagtttttaa 4860
acatgtttat tttctatgca ctttttttta tttaagtgat agtttaatta ataaacatgt 4920
caagttta 4928
<210> 37
<211> 60
<212> DNA
<213> Homo sapiens
<400> 37
agaggtctct gatgagtcac tttcttgacc cttttctttt ggtttttgtt ttacagggat 60
<210> 38
<211> 60
<212> DNA
<213> Homo sapiens
<400> 38
agaggtctct gatgagtcac tttcttgacc cttttctttt ggtttttgtt ttacagggat 60
<210> 39
<211> 60
<212> DNA
<213> Homo sapiens
<400> 39
atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60
<210> 40
<211> 60
<212> DNA
<213> Homo sapiens
<400> 40
catcttctga gccagcaatt gatgacttgt cttttctttt ggtttttgtt ttacagggat 60
<210> 41
<211> 60
<212> DNA
<213> Homo sapiens
<400> 41
atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60
<210> 42
<211> 60
<212> DNA
<213> Homo sapiens
<400> 42
atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60
<210> 43
<211> 60
<212> DNA
<213> Homo sapiens
<400> 43
atctgagcca aaacctaaga attgctcatc cttaaagtcc actctgatcc tgtggactcc 60
<210> 44
<211> 60
<212> DNA
<213> Homo sapiens
<400> 44
atctgagcca aaacctaaga attgctcatc ctgattctgg tggtggaggc tgctttttct 60
<210> 45
<211> 60
<212> DNA
<213> Homo sapiens
<400> 45
atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60
<210> 46
<211> 60
<212> DNA
<213> Homo sapiens
<400> 46
atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60
<210> 47
<211> 60
<212> DNA
<213> Homo sapiens
<400> 47
catcttctga gccagcaatt gatgacttgt cttttctttt ggtttttgtt ttacagggat 60

Claims (11)

1. A method of detecting gene rearrangement, the method comprising:
obtaining sequences to be aligned from a sample to be tested;
aligning the sequences to be aligned to a reference genome to obtain abnormally aligned sequences, wherein the abnormally aligned sequences comprise sequences with an abnormal alignment position, sequences with an abnormal alignment direction, and sequences that do not align to the reference genome;
determining the position of a candidate breakpoint according to the alignment position and the alignment direction of the abnormally aligned sequences on the reference genome;
assembling the sequences, among the sequences to be aligned, that support the position of the candidate breakpoint, retaining the breakpoints in the assembly result that are consistent with the sequence information at the position of the candidate breakpoint, and recording them as the breakpoints of the gene rearrangement;
wherein determining the position of the candidate breakpoint according to the alignment position and the alignment direction of the abnormally aligned sequences on the reference genome comprises:
segmenting the abnormally aligned sequences, aligning the segmented sequences to the reference genome, and determining the position of the candidate breakpoint according to the alignment position and the alignment direction of the segmented abnormally aligned sequences on the reference genome;
wherein determining the position of the candidate breakpoint according to the alignment position and the alignment direction of the abnormally aligned sequences on the reference genome comprises:
segmenting the abnormally aligned sequences and then aligning them to the reference genome to obtain sequences that span both sides of a potential breakpoint and have a first length, recorded as first marker sequences, and sequences that span both sides of the potential breakpoint and have a length smaller than a second length, recorded as second marker sequences;
simulating a breakpoint reference sequence in which the gene rearrangement has occurred, based on the position of the potential breakpoint on the first marker sequence;
aligning the sequences to be aligned to the breakpoint reference sequence, marking the sequences that align to the breakpoint reference sequence and span the breakpoint on the breakpoint reference sequence, and recording them as breakpoint candidate sequences supporting the breakpoint;
determining the position of the breakpoint on the breakpoint candidate sequence as the position of the candidate breakpoint.
2. The method of claim 1, wherein determining the location of the breakpoint on the breakpoint candidate sequence as the location of the candidate breakpoint comprises:
correcting the breakpoint candidate sequence according to the sequencing quality and the number of supporting sequences to obtain a corrected candidate breakpoint sequence;
and determining the position of the breakpoint on the corrected candidate breakpoint sequence as the position of the candidate breakpoint.
3. The method of claim 1, wherein determining the location of the breakpoint on the breakpoint candidate sequence as the location of the candidate breakpoint comprises:
filtering out false-positive breakpoint sequences from the breakpoint candidate sequences according to the first marker sequences and the second marker sequences that support the breakpoint on the breakpoint reference sequence, and the paired sequences, among the sequences to be aligned, that span and support the breakpoint on the breakpoint reference sequence, so as to obtain filtered candidate breakpoint sequences;
and determining the position of the breakpoint on the filtered candidate breakpoint sequence as the position of the candidate breakpoint.
4. The method of claim 1, wherein assembling the sequences, among the sequences to be aligned, that support the position of the candidate breakpoint comprises:
assembling according to the first marker sequences and the second marker sequences that support the breakpoint on the breakpoint reference sequence, and the paired sequences, among the sequences to be aligned, that span and support the breakpoint on the breakpoint reference sequence, retaining the breakpoints in the assembly result that are consistent with the sequence information at the position of the candidate breakpoint, and recording them as the breakpoints of the gene rearrangement.
5. The method of any one of claims 1 to 4, wherein obtaining the sequences to be aligned from the sample to be tested comprises:
constructing a sequencing library of the sample to be tested;
performing high-throughput sequencing on the sequencing library to obtain sequencing data;
and preprocessing the sequencing data to obtain the sequences to be aligned of the sample to be tested.
6. The method of claim 5, wherein the sequencing library is a hybrid capture library.
7. The method of claim 6, wherein the hybrid capture library is obtained using capture probes of SEQ ID NO: 1 to SEQ ID NO: 36.
8. The method according to any one of claims 1 to 4, wherein after obtaining the breakpoint of the gene rearrangement, the method further comprises a step of quantifying the gene in which the rearrangement occurs, the step of quantifying comprising:
counting, according to the sequence information of the breakpoint of the gene rearrangement, the number of sequences among the sequences to be aligned that support the breakpoint of the gene rearrangement, and recording the number as the marker sequence number;
and dividing the marker sequence number by the sequence number of the internal reference gene to obtain a ratio, which is the expression abundance of the rearranged gene relative to the internal reference gene.
9. An apparatus for detecting gene rearrangement, wherein the apparatus stores or runs a module, or the module is a component of the apparatus; wherein the module comprises one or more software modules configured to execute the method for detecting gene rearrangement according to any one of claims 1 to 8.
10. A storage medium comprising a stored program, wherein the program executes the method for detecting gene rearrangement according to any one of claims 1 to 8.
11. A processor configured to run a program, wherein the program when executed performs the method for detecting gene rearrangement according to any one of claims 1 to 8.
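As an illustration only, the classification of abnormally aligned read pairs described in claim 1 and the ratio described in claim 8 can be sketched in Python as follows. The `Alignment` record, the function names `classify_alignment` and `relative_abundance`, the inward-facing forward/reverse paired-end convention, and the 1000 bp `max_insert` threshold are all assumptions made for this sketch; none of them is specified by the claims.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified record of one paired-end read alignment."""
    mapped: bool        # True if the read aligned to the reference genome
    chrom: str          # chromosome the read aligned to
    pos: int            # leftmost aligned position of the read
    reverse: bool       # True if the read aligned to the reverse strand
    mate_mapped: bool   # same four fields for the mate of the read
    mate_chrom: str
    mate_pos: int
    mate_reverse: bool


def classify_alignment(aln: Alignment, max_insert: int = 1000) -> str:
    """Label a read pair with one of the three abnormal classes or 'normal'.

    max_insert is an assumed cutoff for the expected fragment size.
    """
    # Class 3: the read (or its mate) does not align to the reference.
    if not aln.mapped or not aln.mate_mapped:
        return "unaligned"
    # Class 1: abnormal position (different chromosome or too far apart).
    if aln.chrom != aln.mate_chrom or abs(aln.pos - aln.mate_pos) > max_insert:
        return "abnormal_position"
    # Class 2: abnormal direction. Assumed normal layout: the leftmost read
    # is on the forward strand and the rightmost read on the reverse strand.
    if aln.pos <= aln.mate_pos:
        inward = (not aln.reverse) and aln.mate_reverse
    else:
        inward = aln.reverse and (not aln.mate_reverse)
    return "normal" if inward else "abnormal_direction"


def relative_abundance(marker_count: int, reference_count: int) -> float:
    """Marker sequence number divided by the internal reference gene count."""
    if reference_count <= 0:
        raise ValueError("internal reference gene count must be positive")
    return marker_count / reference_count
```

In this sketch, pairs labelled anything other than "normal" would be passed on to the segmentation and re-alignment steps, and `relative_abundance` would be called once the breakpoint-supporting sequences have been counted.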
CN201811643484.6A 2018-12-29 2018-12-29 Method, device, storage medium and processor for detecting gene rearrangement Active CN109712672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811643484.6A CN109712672B (en) 2018-12-29 2018-12-29 Method, device, storage medium and processor for detecting gene rearrangement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811643484.6A CN109712672B (en) 2018-12-29 2018-12-29 Method, device, storage medium and processor for detecting gene rearrangement

Publications (2)

Publication Number Publication Date
CN109712672A CN109712672A (en) 2019-05-03
CN109712672B true CN109712672B (en) 2021-05-25

Family

ID=66260266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811643484.6A Active CN109712672B (en) 2018-12-29 2018-12-29 Method, device, storage medium and processor for detecting gene rearrangement

Country Status (1)

Country Link
CN (1) CN109712672B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942807A (en) * 2019-11-20 2020-03-31 北京橡鑫生物科技有限公司 Method and apparatus for detecting gene rearrangement
CN111081318B (en) * 2019-12-06 2023-06-06 人和未来生物科技(长沙)有限公司 Fusion gene detection method, system and medium
CN111524548B (en) * 2020-07-03 2020-10-23 至本医疗科技(上海)有限公司 Method, computing device, and computer storage medium for detecting IGH rearrangement
CN114694753B (en) * 2022-03-18 2023-04-07 深圳华大医学检验实验室 Nucleic acid sequence comparison method, device, equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105339506A (en) * 2013-03-15 2016-02-17 基因组影像公司 Methods for the detection of breakpoints in rearranged genomic sequences
CN106951732A (en) * 2010-05-25 2017-07-14 加利福尼亚大学董事会 BAMBAM: Parallel comparative analysis of high-throughput sequencing data
CN107480472A (en) * 2017-07-21 2017-12-15 广州漫瑞生物信息技术有限公司 Method and device for detecting gene fusion
CN108256295A (en) * 2016-12-29 2018-07-06 安诺优达基因科技(北京)有限公司 Device for detecting gene fusion

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298892B (en) * 2014-09-18 2017-05-10 天津诺禾致源生物信息科技有限公司 Detection device and method for gene fusion
CN104794371B (en) * 2015-04-29 2018-02-09 深圳华大生命科学研究院 Method and apparatus for detecting retrotransposon insertion polymorphism
TWI765875B (en) * 2015-12-16 2022-06-01 美商磨石生物公司 Neoantigen identification, manufacture, and use
CN108830044B (en) * 2018-06-05 2020-06-26 序康医疗科技(苏州)有限公司 Detection method and device for detecting cancer sample gene fusion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951732A (en) * 2010-05-25 2017-07-14 加利福尼亚大学董事会 BAMBAM: Parallel comparative analysis of high-throughput sequencing data
CN105339506A (en) * 2013-03-15 2016-02-17 基因组影像公司 Methods for the detection of breakpoints in rearranged genomic sequences
CN108256295A (en) * 2016-12-29 2018-07-06 安诺优达基因科技(北京)有限公司 Device for detecting gene fusion
CN107480472A (en) * 2017-07-21 2017-12-15 广州漫瑞生物信息技术有限公司 Method and device for detecting gene fusion

Also Published As

Publication number Publication date
CN109712672A (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN109712672B (en) Method, device, storage medium and processor for detecting gene rearrangement
JP7051900B2 (en) Methods and systems for the generation and error correction of unique molecular index sets with non-uniform molecular lengths
CN111341383B (en) Method, device and storage medium for detecting copy number variation
KR102356323B1 (en) Verification method and system for sequence variant call
Fruciano et al. Genetic linkage of distinct adaptive traits in sympatrically speciating crater lake cichlid fish
CN107077537A (en) Detection of repeat expansions with short-read sequencing data
CN112218956A (en) Methods and reagents for resolving nucleic acid mixtures and mixed cell populations and related applications
CN107267613A (en) Sequencing data processing system and SMN gene detection systems
CN112349346A (en) Method for detecting structural variations in genomic regions
CN113621716A (en) Method and device for multi-line drug-resistant gene identification of mycobacterium tuberculosis
CN115083521A (en) Method and system for identifying tumor cell group in single cell transcriptome sequencing data
CN109920480B (en) Method and device for correcting high-throughput sequencing data
KR102347463B1 (en) Method and apparatus for detecting false positive variants in nucleic acid sequencing analysis
CN110942806A (en) Blood type genotyping method and device and storage medium
CN109390039B (en) Method, device and storage medium for counting DNA copy number information
KR101815529B1 (en) Human Haplotyping System And Method
CN112513292B (en) Method and device for detecting homologous sequences based on high-throughput sequencing
US20170226588A1 (en) Systems and methods for dna amplification with post-sequencing data filtering and cell isolation
CN115961054B (en) Genetic marker for identifying south China tiger individuals and/or paternity testing and application thereof
Xu et al. Analysis of population-genetic properties of copy number variations
RU2759953C2 (en) Method for detecting copy number variations and changes in brca1 and brca2 genes according to data of targeted massive parallel genome sequencing
US20230332220A1 (en) Random insertion genome reconstruction
US6963805B2 (en) Methods for identifying the evolutionarily conserved sequences
US20230332205A1 (en) Linked dual barcode insertion constructs
Kloda Gene expression analysis on a subgene level

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant