CN109712672B

CN109712672B - Method, device, storage medium and processor for detecting gene rearrangement

Info

Publication number: CN109712672B
Application number: CN201811643484.6A
Authority: CN
Inventors: 王彬安; 刘洋洋; 李富威; 王建伟; 伍启熹; 刘倩; 刘珂弟; 唐宇
Original assignee: Beijing Usci Medical Laboratory Co ltd
Current assignee: Beijing Usci Medical Laboratory Co ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2021-05-25
Anticipated expiration: 2038-12-29
Also published as: CN109712672A

Abstract

The invention provides a method, a device, a storage medium and a processor for detecting gene rearrangement. The method comprises the following steps: obtaining sequences to be aligned from a sample to be tested; aligning the sequences to be aligned against a reference genome to obtain abnormally aligned sequences, wherein the abnormally aligned sequences comprise sequences with an abnormal alignment position, sequences with an abnormal alignment direction, and sequences that do not align to the reference genome; determining the position of a candidate breakpoint according to the alignment position and alignment direction of the abnormally aligned sequences on the reference genome; and performing assembly using those sequences to be aligned that support the position of the candidate breakpoint, retaining the breakpoints in the assembly result that are consistent with the sequence information at the position of the candidate breakpoint, and recording them as the breakpoints of the gene rearrangement. The application solves the problem that the breakpoint position of a gene rearrangement is difficult to detect in the prior art.

Description

Method, device, storage medium and processor for detecting gene rearrangement

Technical Field

The invention relates to the field of gene variation detection, in particular to a method, a device, a storage medium and a processor for detecting gene rearrangement.

Background

The prior art generally uses nested RT-PCR to detect gene rearrangement, as follows: based on the known target gene sequence, a specific probe is prepared and the gene rearrangement is detected. A nested PCR reaction involves two rounds of PCR amplification, which reduces the possibility of amplifying multiple target sites (because few sequences are complementary to both sets of primers) and increases the detection sensitivity; in addition, two pairs of PCR primers must match the detection template, which improves the detection reliability. Since the second set of primers binds within the first-round PCR product, and the probability that an undesired fragment contains the binding sites of both sets of primers is minimal, the second set of primers cannot amplify the undesired fragment. Nested PCR amplification therefore ensures that the second-round PCR product has little or no contamination from the non-specific amplification caused by insufficient primer-pair specificity.

However, detecting gene rearrangement by nested RT-PCR has the following disadvantages: 1) the structure of the gene rearrangement cannot be accurately determined; 2) unknown rearrangements cannot be detected because of the limitations of the primers and probes; 3) detailed sequence information for the junction region at the rearrangement breakpoint is not available.

Therefore, there is a need for improvements to existing detection methods.

Disclosure of Invention

The invention mainly aims to provide a method, a device, a storage medium and a processor for detecting gene rearrangement, so as to solve the problem that the breakpoint position of gene rearrangement is difficult to detect in the prior art.

In order to achieve the above object, according to one aspect of the present invention, there is provided a method for detecting gene rearrangement, the method comprising: obtaining sequences to be aligned from a sample to be tested; aligning the sequences to be aligned against a reference genome to obtain abnormally aligned sequences, wherein the abnormally aligned sequences comprise sequences with an abnormal alignment position, sequences with an abnormal alignment direction, and sequences that do not align to the reference genome; determining the position of a candidate breakpoint according to the alignment position and alignment direction of the abnormally aligned sequences on the reference genome; and performing assembly using those sequences to be aligned that support the position of the candidate breakpoint, retaining the breakpoints in the assembly result that are consistent with the sequence information at the position of the candidate breakpoint, and recording them as the breakpoints of the gene rearrangement.

Further, determining the position of the candidate breakpoint according to the alignment position and alignment direction of the abnormally aligned sequences on the reference genome comprises: performing sequence segmentation on the abnormally aligned sequences, aligning the resulting segments against the reference genome, and determining the position of the candidate breakpoint according to the alignment position and alignment direction of the segmented abnormally aligned sequences on the reference genome.

Further, determining the position of the candidate breakpoint according to the alignment position and alignment direction of the abnormally aligned sequences on the reference genome comprises: performing sequence segmentation on the abnormally aligned sequences and aligning the segments against the reference genome to obtain sequences that span both sides of a potential breakpoint with a first length, recorded as first marker sequences, and sequences that span both sides of the potential breakpoint with a length smaller than a second length, recorded as second marker sequences; simulating, according to the position of the potential breakpoint on the first marker sequences, a breakpoint reference sequence in which the gene rearrangement has occurred; aligning the sequences to be aligned against the breakpoint reference sequence, marking the sequences that align to the breakpoint reference sequence and span the breakpoint on it, and recording them as breakpoint candidate sequences supporting the breakpoint; and determining the position of the breakpoint on the breakpoint candidate sequences as the position of the candidate breakpoint.

Further, determining the position of the breakpoint on the breakpoint candidate sequence as the position of the candidate breakpoint comprises: correcting the breakpoint candidate sequence according to the sequencing quality and the number of the support sequences to obtain a corrected candidate breakpoint sequence; and determining the positions of the breakpoints on the corrected candidate breakpoint sequence as the positions of the candidate breakpoints.

Further, determining the position of the breakpoint on the breakpoint candidate sequence as the position of the candidate breakpoint comprises: filtering false positive breakpoint sequences out of the breakpoint candidate sequences according to the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence, and the paired sequences, among the sequences to be aligned, that span and support the breakpoint on the breakpoint reference sequence, to obtain filtered candidate breakpoint sequences; and determining the positions of the breakpoints on the filtered candidate breakpoint sequences as the positions of the candidate breakpoints.

Further, performing assembly using those sequences to be aligned that support the position of the candidate breakpoint comprises: performing assembly from the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence, and the paired sequences, among the sequences to be aligned, that span and support the breakpoint on the breakpoint reference sequence; retaining the breakpoints in the assembly result that are consistent with the sequence information at the position of the candidate breakpoint; and recording them as the breakpoints of the gene rearrangement.

Further, obtaining the sequences to be aligned of the sample to be tested comprises: constructing a sequencing library of a sample to be detected; performing high-throughput sequencing on the sequencing library to obtain sequencing data; and preprocessing the sequencing data to obtain a sequence to be compared of the sample to be detected.

Further, the sequencing library is a hybridization capture library, preferably obtained using the capture probes of SEQ ID NO: 1 to SEQ ID NO: 36.

Further, after obtaining the breakpoint of the gene rearrangement, the method further comprises a step of quantifying the rearranged gene, the quantifying step comprising: counting the sequence number of breakpoints supporting gene rearrangement in the sequences to be compared according to the sequence information of the breakpoints of the gene rearrangement, and recording the sequence number as a marker sequence number; and dividing the marker sequence number by the sequence number of the internal reference gene to obtain a ratio which is the expression abundance of the rearranged gene relative to the internal reference gene.

In order to achieve the above object, according to a second aspect of the present invention, there is provided an apparatus for detecting gene rearrangement, the apparatus being used for storing or running modules, or the modules being components of the apparatus; wherein the modules are software modules, there are one or more software modules, and the software modules are used for executing any one of the above methods for detecting gene rearrangement.

According to a third aspect of the present invention, there is provided a storage medium comprising a stored program, wherein the program performs any one of the above-described methods of detecting gene rearrangement.

According to a fourth aspect of the present invention, there is provided a processor for running a program, wherein the program when running performs any of the above-described methods for detecting gene rearrangement.

By applying the technical scheme of the invention, the position of a gene rearrangement is detected from high-throughput sequencing data: the candidate breakpoint position of the rearrangement is determined from those sequences to be aligned that align abnormally to the reference genome, and the reliability of the candidate breakpoint position is then further verified using the sequence assembled from the sequences to be aligned. In this way the breakpoint position of the gene rearrangement can be accurately detected and, correspondingly, the sequence information at the breakpoint position can also be accurately obtained, which provides a basis for further verifying the breakpoint position by conventional PCR. Therefore, the method of the application can not only detect known or unknown rearrangements, but also accurately detect the specific position where the rearrangement occurs and the corresponding sequence information. The method uses NGS sequencing data directly, is developed on the basis of statistics and algorithms, and adds no extra experimental detection cost. In addition, the method has high detection accuracy and low cost, and is suitable for detecting structural rearrangements of low-abundance genes.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:

FIG. 1 is a schematic diagram showing a simple flow chart for detecting the breakpoint position of gene rearrangement in a preferred embodiment according to the present invention;

FIG. 2 is a schematic view showing a detailed flow chart for detecting the breakpoint position of gene rearrangement in another preferred embodiment according to the present invention; and

FIGS. 3 and 4 show the results of verifying, by first-generation sequencing of PCR products, the breakpoint positions detected by the method of example 1 according to the present invention, wherein FIG. 3 shows the sequencing result for the forward primer, and FIG. 4 shows the sequencing result for the reverse primer.

Detailed Description

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail with reference to examples.

As mentioned in the background, when detecting a rearranged gene the prior art can only determine that a rearrangement has occurred and cannot accurately determine where it occurred. To improve this situation, in an exemplary embodiment of the present application, there is provided a method for detecting gene rearrangement, the method comprising: obtaining sequences to be aligned from a sample to be tested; aligning the sequences to be aligned against a reference genome to obtain abnormally aligned sequences, wherein the abnormally aligned sequences comprise sequences with an abnormal alignment position, sequences with an abnormal alignment direction, and sequences that do not align to the reference genome; determining the position of a candidate breakpoint according to the alignment position and alignment direction of the abnormally aligned sequences on the reference genome; and performing assembly using those sequences to be aligned that support the position of the candidate breakpoint, retaining the breakpoints in the assembly result that are consistent with the sequence information at the position of the candidate breakpoint, and recording them as the breakpoints of the gene rearrangement.

In this method for detecting gene rearrangement, the position of a gene rearrangement is detected from high-throughput sequencing data: the candidate breakpoint position of the rearrangement is determined from those sequences to be aligned that align abnormally to the reference genome, and the reliability of the candidate breakpoint position is then further verified using the sequence assembled from the sequences to be aligned. In this way the breakpoint position of the gene rearrangement can be accurately detected and, correspondingly, the sequence information at the breakpoint position can also be accurately obtained, which provides a basis for further verifying the breakpoint position by conventional PCR. Therefore, the method of the application can not only detect known or unknown rearrangements, but also accurately detect the specific position where the rearrangement occurs and the corresponding sequence information. The method uses NGS sequencing data directly, is developed on the basis of statistics and algorithms, and adds no extra experimental detection cost. In addition, the method has high detection accuracy and low cost, and is suitable for detecting structural rearrangements of low-abundance genes.

It should be noted that the sequences to be aligned of the sample to be tested may be obtained by processing the raw sequencing data of the sample, or may be existing sequences that are already suitable for alignment. The method additionally verifies the position of the candidate breakpoint against the assembled sequence, so that the breakpoint position is more accurate.

Among the sequences to be aligned, some can be aligned normally to the reference genome, while others cannot be aligned directly to the reference genome because of gene rearrangement; the latter are called abnormally aligned sequences. Abnormally aligned sequences include sequences with an abnormal alignment position (such as forward tandem repeat sequences), sequences with an abnormal alignment direction (such as inverted tandem repeat sequences), and sequences that do not align to the reference genome (such as sequences containing insertions or deletions). Based on the alignment position and alignment direction of the abnormally aligned sequences on the reference genome, the potential breakpoint position can be determined by existing methods. For example, when abnormally aligned sequences align to the same chromosome but an inversion has occurred between them, the potential breakpoint position can be determined from the abnormal alignment direction.

In certain preferred embodiments, determining the position of the candidate breakpoint according to the alignment position and alignment direction of the abnormally aligned sequences on the reference genome comprises: performing sequence segmentation on the abnormally aligned sequences, aligning the resulting segments against the reference genome, and determining the position of the candidate breakpoint according to the alignment position and alignment direction of the segmented abnormally aligned sequences on the reference genome.
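The three categories of abnormally aligned sequences can be illustrated with a minimal sketch. It assumes paired-end reads summarised in a hypothetical `PairedAlignment` record; the field names, the `max_insert` threshold of 1000 bp, the chromosome names and the rule that a normal pair aligns to opposite strands are illustrative assumptions, not details taken from the method itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PairedAlignment:
    """Hypothetical summary of one read and its mate after alignment."""
    read_id: str
    chrom: Optional[str]       # None when the read did not align
    pos: int
    strand: str                # "+" or "-"
    mate_chrom: Optional[str]  # None when the mate did not align
    mate_pos: int
    mate_strand: str


def classify(aln: PairedAlignment, max_insert: int = 1000) -> str:
    """Assign a read pair to one of the categories described in the text."""
    if aln.chrom is None or aln.mate_chrom is None:
        return "unaligned"
    # Different chromosomes or an implausible distance: abnormal position.
    if aln.chrom != aln.mate_chrom or abs(aln.pos - aln.mate_pos) > max_insert:
        return "abnormal_position"
    # Assumption: a normal pair maps to opposite strands.
    if aln.strand == aln.mate_strand:
        return "abnormal_orientation"
    return "normal"


pairs = [
    PairedAlignment("r1", "chr11", 1000, "+", "chr11", 1300, "-"),
    PairedAlignment("r2", "chr11", 1000, "+", "chr4", 5000, "-"),
    PairedAlignment("r3", "chr11", 1000, "+", "chr11", 1200, "+"),
    PairedAlignment("r4", None, 0, "+", "chr11", 1200, "-"),
]
labels = {p.read_id: classify(p) for p in pairs}
```

Only the pairs labelled other than `normal` would be passed on to candidate breakpoint detection.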

Specifically, the existing alignment software for sequence segmentation includes bwa, hisat2 or STAR. These software use a more relaxed alignment method to align each segmented sequence to the possible position of the reference genome, so that the final alignment position and alignment direction can be determined.

In some more preferred embodiments, determining the position of the candidate breakpoint according to the alignment position and alignment direction of the abnormally aligned sequences on the reference genome comprises: performing sequence segmentation on the abnormally aligned sequences and aligning the segments against the reference genome to obtain sequences that span both sides of a potential breakpoint with a first length, recorded as first marker sequences, and sequences that span both sides of the potential breakpoint with a length smaller than a second length, recorded as second marker sequences; simulating, according to the position of the potential breakpoint on the first marker sequences, a breakpoint reference sequence in which the gene rearrangement has occurred; aligning the sequences to be aligned against the breakpoint reference sequence, marking the sequences that align to the breakpoint reference sequence and span the breakpoint on it, and recording them as breakpoint candidate sequences supporting the breakpoint; and determining the position of the breakpoint on the breakpoint candidate sequences as the position of the candidate breakpoint.

Paired-end sequencing data contain reads sequenced in both directions. For a read from either end, if the read is cut into two or three segments that are then aligned against the reference genome, each segment may align to a different position and direction on the reference genome, and the potential breakpoint position of the gene rearrangement can be deduced from the specific cut position. Distinguishing first and second marker sequences and constructing a simulated breakpoint reference sequence for re-alignment helps recover more sequences that potentially span the breakpoint, as well as normally aligned paired sequences that span and support the breakpoint. Because the candidate breakpoint sequences are further backed by sequences that support the breakpoint position on the breakpoint reference sequence, the accuracy of the screened candidate breakpoints is relatively high. The first length, with which a first marker sequence spans both sides of the potential breakpoint, can reasonably be set to 20-25 bp depending on the sequencing read length. Sequences whose length is smaller than the second length serve as second marker sequences, and the second length can reasonably be set to 10-20 bp depending on the sequencing read length.
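A minimal sketch of the simulated breakpoint reference and the two marker classes is given below. It rests on one possible reading of the text: the "length" is taken to be the shorter of the two parts of a read on either side of the junction (called `overhang` here), and both thresholds are set to 20 bp. The function names, the coordinates and that interpretation are assumptions made for illustration.

```python
from typing import Optional, Tuple


def simulate_breakpoint_reference(left_flank: str, right_flank: str) -> Tuple[str, int]:
    """Join the two flanks; the junction index is the length of the left flank."""
    return left_flank + right_flank, len(left_flank)


def classify_marker(read_start: int, read_len: int, bp_index: int,
                    first_len: int = 20, second_len: int = 20) -> Optional[str]:
    """Label a read aligned to the simulated reference as a first or second marker."""
    left = bp_index - read_start                 # bases left of the junction
    right = read_start + read_len - bp_index     # bases right of the junction
    overhang = min(left, right)
    if overhang <= 0:
        return None          # the read does not span the junction
    if overhang >= first_len:
        return "first"
    if overhang < second_len:
        return "second"
    return None              # falls between the two thresholds


reference, bp = simulate_breakpoint_reference("A" * 30, "C" * 30)
long_span = classify_marker(read_start=5, read_len=50, bp_index=bp)    # 25 bp each side
short_span = classify_marker(read_start=22, read_len=30, bp_index=bp)  # 8 bp on the left
no_span = classify_marker(read_start=30, read_len=20, bp_index=bp)     # starts at the junction
```

In practice the flanks would come from the reference genome at the two segment alignment positions, and the re-alignment against the simulated reference would be done with an aligner rather than with known coordinates.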

In order to further improve the accuracy of the breakpoint position, the candidate breakpoints can be further corrected and false positive filtered according to the sequencing depth and the sequencing strategy of the sequencing data of the sample to be detected, so that the breakpoint position with higher authenticity is reserved.

In some preferred embodiments, determining the position of the breakpoint on the breakpoint candidate sequence as the position of the candidate breakpoint comprises: correcting the breakpoint candidate sequence according to the sequencing quality and the number of the support sequences to obtain a corrected candidate breakpoint sequence; and determining the positions of the breakpoints on the corrected candidate breakpoint sequence as the positions of the candidate breakpoints.

For example, when the average sequencing depth reaches 1000× and the sequences spanning the breakpoint reach more than 2% of the average depth (that is, more than 20×), base correction of the breakpoint can be performed: the breakpoint is corrected using the alignment position relationship at the breakpoint on the simulated reference sequence and the quality of the aligned bases. Breakpoints supported by fewer than 20× breakpoint-spanning sequences have a high false positive rate and are usually removed.

In some preferred embodiments, determining the position of the breakpoint on the breakpoint candidate sequence as the position of the candidate breakpoint comprises: filtering false positive breakpoint sequences out of the breakpoint candidate sequences according to the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence, and the paired sequences, among the sequences to be aligned, that span and support the breakpoint on the breakpoint reference sequence, to obtain filtered candidate breakpoint sequences; and determining the positions of the breakpoints on the filtered candidate breakpoint sequences as the positions of the candidate breakpoints.

For example, a breakpoint is retained when it is supported by more than 10 first marker sequences and by more than 50 paired sequences, among the sequences to be aligned, that span the breakpoint on the breakpoint reference sequence. These values are only exemplary and can be adjusted for different sequencing samples.

In certain preferred embodiments, performing assembly using those sequences to be aligned that support the position of the candidate breakpoint comprises: performing assembly from the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence, and the paired sequences, among the sequences to be aligned, that span and support the breakpoint on the breakpoint reference sequence; retaining the breakpoints in the assembly result that are consistent with the sequence information at the position of the candidate breakpoint; and recording them as the breakpoints of the gene rearrangement.
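The depth arithmetic in this example can be written out as a short sketch. The function names are hypothetical, and treating a depth exactly at the threshold as insufficient is an assumption, since the text only covers the cases above and below 20×.

```python
def min_spanning_depth(mean_depth: float, fraction: float = 0.02) -> float:
    """Depth of breakpoint-spanning reads required, as a fraction of mean depth."""
    return mean_depth * fraction


def breakpoint_action(spanning_depth: float, mean_depth: float,
                      fraction: float = 0.02) -> str:
    """Decide whether a candidate is base-corrected or removed."""
    # Assumption: support exactly at the threshold is treated as insufficient.
    if spanning_depth > min_spanning_depth(mean_depth, fraction):
        return "correct"
    return "remove"


threshold = min_spanning_depth(1000)        # 1000x mean depth, 2% -> about 20x
well_supported = breakpoint_action(25, 1000)
poorly_supported = breakpoint_action(15, 1000)
```

With a mean depth of 1000×, a candidate spanned by 25 reads would go on to base correction, while one spanned by 15 reads would be discarded.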
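As a sketch of this filter, the snippet below keeps only candidates whose support exceeds both exemplary thresholds. The `CandidateBreakpoint` record, its field names and the sample counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateBreakpoint:
    """Hypothetical support counts for one candidate breakpoint."""
    name: str
    first_marker_reads: int   # first marker sequences supporting the breakpoint
    spanning_pairs: int       # paired sequences spanning the breakpoint


def filter_candidates(candidates: List[CandidateBreakpoint],
                      min_first_markers: int = 10,
                      min_spanning_pairs: int = 50) -> List[CandidateBreakpoint]:
    """Keep candidates with more support than both thresholds."""
    return [
        c for c in candidates
        if c.first_marker_reads > min_first_markers
        and c.spanning_pairs > min_spanning_pairs
    ]


candidates = [
    CandidateBreakpoint("bp_A", first_marker_reads=18, spanning_pairs=75),
    CandidateBreakpoint("bp_B", first_marker_reads=4, spanning_pairs=90),
    CandidateBreakpoint("bp_C", first_marker_reads=22, spanning_pairs=12),
]
kept = [c.name for c in filter_candidates(candidates)]
```

Both thresholds are parameters so that they can be tuned per sample, as the text allows.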

By using the first marker sequence, the second marker sequence and the pair sequence supporting the breakpoint, sequence assembly is performed, and the candidate breakpoint position is verified again by the assembled sequence formed by de novo assembly, so that the finally determined breakpoint position of the gene rearrangement is more accurate.
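The consistency check between the assembly result and a candidate breakpoint might look like the sketch below. The de novo assembly itself is assumed to be done by an external assembler and is not shown; the contigs, the `flank` width and the exact-match criterion are illustrative assumptions, and a real check would likely tolerate mismatches.

```python
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_sequence(reference: str, bp_index: int, flank: int = 15) -> str:
    """Extract the bases flanking the junction on the simulated reference."""
    return reference[max(0, bp_index - flank):bp_index + flank]


def confirmed_by_assembly(contigs: Iterable[str], reference: str,
                          bp_index: int, flank: int = 15) -> bool:
    """True if any contig contains the junction sequence on either strand."""
    junction = junction_sequence(reference, bp_index, flank)
    junction_rc = reverse_complement(junction)
    return any(junction in c or junction_rc in c for c in contigs)


# Toy simulated breakpoint reference: the junction lies at index 10.
reference = "ACGTACGTAC" + "TTGGCCAATT"
forward_hit = confirmed_by_assembly(["GGCGTACTTGGCAA"], reference, 10, flank=5)
reverse_hit = confirmed_by_assembly(["TTGCCAAGTACGTT"], reference, 10, flank=5)
no_hit = confirmed_by_assembly(["ACGTACGTACGTACGT"], reference, 10, flank=5)
```

A candidate for which no contig reproduces the junction would be dropped rather than reported as a rearrangement breakpoint.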

As mentioned above, the sequences to be aligned of the sample to be tested in the present application may be existing sequences that can be used directly for alignment, or may be obtained by processing the raw data produced by sequencing. In some preferred embodiments, obtaining the sequences to be aligned of the sample to be tested comprises: constructing a sequencing library of the sample to be tested; performing high-throughput sequencing on the sequencing library to obtain sequencing data; and preprocessing the sequencing data to obtain the sequences to be aligned of the sample to be tested.

In certain preferred embodiments, the sequencing library is a hybridization capture library, preferably obtained using the capture probes of SEQ ID NO: 1 to SEQ ID NO: 36. A hybridization capture library allows gene rearrangement to be detected from sequencing data targeted at a gene of interest. The capture probes of SEQ ID NO: 1 to SEQ ID NO: 36 capture the full exon sequence of the MLL gene, and can therefore be used to detect the exon rearrangement positions of this gene and the corresponding sequence information.

The method can accurately detect the position of the breakpoint of rearrangement of the target gene, and can also detect the expression quantity of the detected variant gene by using the sequence to be compared of the sample to be detected according to different research purposes. In certain preferred embodiments, after obtaining the breakpoint of the gene rearrangement, the above method further comprises a step of quantifying the rearranged gene, the step of quantifying comprising: counting the sequence number of breakpoints supporting gene rearrangement in the sequences to be compared according to the sequence information of the breakpoints of the gene rearrangement, and recording the sequence number as a marker sequence number; and dividing the marker sequence number by the sequence number of the internal reference gene to obtain a ratio which is the expression abundance of the rearranged gene relative to the internal reference gene. By detecting the expression level of a certain rearranged gene, it is possible to reflect the expression of the gene under a specific condition or in a specific treatment state, and further, by detecting the expression level of the gene under a series of different conditions or different states, it is possible to reflect the difference in expression. The above-mentioned reference gene can be appropriately selected according to actual needs, for example, when the gene to be detected is the MLL gene, the ABL1 gene can be usually selected as the reference gene.
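The quantification step reduces to a single ratio, sketched below. The function name and the read counts are hypothetical; only the division of the marker sequence number by the sequence number of the internal reference gene comes from the text.

```python
def relative_abundance(marker_reads: int, reference_gene_reads: int) -> float:
    """Expression abundance of the rearranged gene relative to the internal reference gene."""
    if reference_gene_reads <= 0:
        raise ValueError("the internal reference gene must have at least one read")
    return marker_reads / reference_gene_reads


# Illustrative counts: 30 reads support the breakpoint, 600 reads map to the
# internal reference gene (for example ABL1 when the target is MLL).
ratio = relative_abundance(30, 600)
```

Comparing this ratio across samples or conditions gives the relative expression differences described above.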

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for causing a computing device to execute the methods according to the embodiments of the present invention or a processor to execute the methods according to the embodiments of the present invention.

In a second exemplary embodiment of the present application, there is provided an apparatus for detecting gene rearrangement, the apparatus being used for storing or operating modules, or modules being part of the apparatus; the module is a software module, the number of the software modules is one or more, and the software module is used for executing any one of the methods. The device not only can more accurately detect the breakpoint position of gene rearrangement, but also can obtain the sequence information corresponding to the breakpoint position, so that the relative expression quantity of the gene can be conveniently detected according to the sequence information, the practicability and the application range are wider, and any variant gene with the gene rearrangement phenomenon can be detected by adopting the device.

Preferably, the above apparatus comprises: the device comprises an acquisition module, a comparison module, a candidate module and an assembly determination module, wherein the acquisition module is used for acquiring a sequence to be compared of a sample to be tested; the comparison module is used for comparing the sequence to be compared with the reference genome to obtain an abnormal comparison sequence, wherein the abnormal comparison sequence comprises a sequence with abnormal comparison position, a sequence with abnormal comparison direction and a sequence which is not compared with the reference genome; the candidate module is used for determining the position of a candidate breakpoint according to the comparison position and the comparison direction of the abnormal comparison sequence on the reference genome; and the assembly determining module is used for assembling by using the sequence supporting the position of the candidate breakpoint in the sequence to be compared, and keeping the breakpoint in the assembly result, which is consistent with the sequence information of the position of the candidate breakpoint, as the breakpoint of the gene rearrangement.

In a preferred embodiment, the candidate modules include: the system comprises a segmentation and comparison module and a candidate determination module, wherein the segmentation and comparison module is used for performing sequence segmentation on an abnormal comparison sequence and then comparing the abnormal comparison sequence with a reference genome, and the candidate determination module is used for determining the position of a candidate breakpoint according to the comparison position and the comparison direction of the segmented abnormal comparison sequence on the reference genome.

In a preferred embodiment, the candidate modules include: the system comprises a segmentation marking module, a simulation module, a comparison marking module and a candidate breakpoint module, wherein the segmentation marking module is used for segmenting the sequence of an abnormal comparison sequence and then comparing the sequence with a reference genome to obtain a sequence which can simultaneously span two sides of a potential breakpoint and has a first length, and the sequence is marked as a first marking sequence, can simultaneously span two sides of the potential breakpoint and has a length smaller than a second length and is used as a second marking sequence; the simulation module is used for simulating a breakpoint reference sequence in which gene rearrangement occurs according to the position of a potential breakpoint on the first marker sequence; the comparison marking module is used for comparing the sequence to be compared with the breakpoint reference sequence, marking the sequence which can be compared with the upper breakpoint reference sequence and cross the breakpoint on the breakpoint reference sequence, and recording the sequence as a breakpoint candidate sequence supporting the breakpoint; the candidate breakpoint module is configured to determine a position of a breakpoint on the breakpoint candidate sequence as a position of a candidate breakpoint.

In a preferred embodiment, the candidate breakpoint module includes: the device comprises a breakpoint correction module and a correction determination module, wherein the breakpoint correction module is used for correcting a breakpoint candidate sequence according to sequencing quality and a support sequence number to obtain a corrected candidate breakpoint sequence; and the correction determining module is used for determining the positions of the breakpoints on the corrected candidate breakpoint sequence as the positions of the candidate breakpoints.

In a preferred embodiment, the candidate breakpoint module includes a breakpoint filtering module and a filtering determination module. The breakpoint filtering module is used for filtering false positive breakpoint sequences out of the breakpoint candidate sequences according to the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence and the paired sequences, among the sequences to be compared, that span the breakpoint on the breakpoint reference sequence, to obtain a filtered candidate breakpoint sequence; the filtering determination module is used for determining the position of the breakpoint on the filtered candidate breakpoint sequence as the position of the candidate breakpoint.

In a preferred embodiment, the assembly determination module includes an assembly submodule and a retention module. The assembly submodule is used for performing assembly with the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence and the paired sequences, among the sequences to be compared, that span the breakpoint on the breakpoint reference sequence; the retention module is used for retaining the breakpoints in the assembly result that are consistent with the sequence information at the positions of the candidate breakpoints and recording them as gene rearrangement breakpoints.

In a preferred embodiment, the obtaining module includes a construction module, a sequencing module and a preprocessing module. The construction module is used for constructing a sequencing library of the sample to be detected; the sequencing module is used for performing high-throughput sequencing on the sequencing library to obtain sequencing data; the preprocessing module is used for preprocessing the sequencing data to obtain the sequences to be compared of the sample to be detected.

In a preferred embodiment, the sequencing library is a hybrid capture library, preferably obtained by using the capture probes of SEQ ID NO:1 to SEQ ID NO: 36.

In a preferred embodiment, the apparatus further comprises a quantification module for quantifying the rearranged gene. The quantification module includes a statistic module and an expression quantity calculation module. The statistic module is used for counting, according to the sequence information of the gene rearrangement breakpoint, the number of sequences among the sequences to be compared that support the breakpoint, and recording it as the marker sequence number; the expression quantity calculation module is used for dividing the marker sequence number by the sequence number of the internal reference gene, the resulting ratio being the expression abundance of the rearranged gene relative to the internal reference gene.

In a third exemplary embodiment of the present application, a storage medium is provided, which includes a stored program, wherein the program performs any one of the above-described methods of detecting gene rearrangement.

In a fourth exemplary embodiment of the present application, a processor for executing a program is provided, wherein the program is executed to perform any one of the above-mentioned methods for detecting gene rearrangement.

The storage medium, the processor and the device enable a computer to execute the method for detecting gene rearrangement and to output the corresponding detection results. These products detect gene rearrangement without any additional experimental or sequencing cost, and the device offers low detection cost and high accuracy.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computing device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

The advantageous effects of the present application will be further described with reference to specific examples.

Example 1: Method for detecting MLL-PTD gene rearrangement

1. Sample and data

1) Bone marrow or peripheral blood is collected from the patient and stored in a collection tube.

2) Nucleic acid is extracted from the sample, and the remaining sample is stored at -80 °C.

3) A sequencing library is constructed and the target region is enriched by hybrid capture (the hybrid capture probes target the MLL gene, one probe for each of its 36 exons; the specific sequences are shown in Table 1 below).

4) The captured library is sequenced on the sequencer.

Table 1:

2. Pre-processing of sequencing data

1) Data quality control

Low-quality sequences are removed: sequences containing more than 5 N bases are discarded, as are sequences whose average sequencing quality falls below Q20 over 40 consecutive nucleotides.
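The two filters above can be sketched as follows. This is a minimal illustration, not the implementation used in the application: the function name `passes_qc`, the sliding-window reading of "40 consecutive nucleotides", and the handling of reads shorter than 40 nt are all assumptions.

```python
def passes_qc(seq, quals, max_n=5, window=40, min_mean_q=20):
    """Return True if a read survives both filters.

    seq   -- read bases as a string
    quals -- per-base Phred quality scores (list of ints)
    """
    # Filter 1: discard reads containing more than max_n N bases.
    if seq.upper().count("N") > max_n:
        return False
    if not quals:
        return False
    # Assumption: reads shorter than the window are judged on their overall mean.
    if len(quals) < window:
        return sum(quals) / len(quals) >= min_mean_q
    # Filter 2: discard the read if any run of `window` consecutive bases
    # has a mean quality below min_mean_q (rolling sum over the window).
    threshold = min_mean_q * window
    window_sum = sum(quals[:window])
    if window_sum < threshold:
        return False
    for i in range(window, len(quals)):
        window_sum += quals[i] - quals[i - window]
        if window_sum < threshold:
            return False
    return True
```

A read with a 40-nt stretch of Q10 bases is rejected even if the rest of the read is of high quality, whereas a short dip that does not drag any 40-nt window below Q20 is tolerated.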

2) Alignment of MLL Gene sequences

High-quality sequences that passed quality control were aligned to the reference sequence with hisat2 for further analysis.

3. MLL-PTD recognition

1) Principle and theoretical basis:

At the molecular level, MLL-PTD alters the MLL gene (36 exons in total): the order in which exons are joined changes, with rearrangements typically occurring between exon 2 and exon 11.

2) MLL-PTD breakpoint recognition:

First, the paired sequences are aligned, and for sequence pairs whose positional relationship after alignment is abnormal, the structural variation between the two sequences of the pair is searched for. At the same time, the abnormally aligned sequences are segmented and aligned to possible positions using a looser alignment method; the final alignment position and alignment direction are determined, and the breakpoint position is calculated from the alignment positions of the segments. As shown in fig. 1, among the sequences crossing the breakpoint, a sequence that simultaneously spans both sides of the breakpoint with a first length is taken as a first marker sequence, and a sequence that simultaneously spans both sides of the breakpoint with a second length is taken as a second marker sequence. A breakpoint reference sequence in which the PTD has occurred is then simulated from the breakpoint position on the marker sequences, the sequences are re-aligned against it, and only candidate breakpoints that are well aligned and supported by breakpoint-crossing sequences are retained.
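The idea of simulating a breakpoint reference and labelling the reads that cross it can be sketched as below. Several points are assumptions made only for illustration: the two exon fragments are copied from the sequence listing on the assumption that SEQ ID NO: N corresponds to exon N; exact string matching stands in for a real aligner; and the thresholds `first_len` and `second_len` (both 10 nt here) are hypothetical, since this section does not state the actual first and second lengths.

```python
# Exon fragments copied from the sequence listing (assumed: SEQ ID NO: N is exon N).
EXON8_TAIL = "atccctgtaaaacaaaaaccaaaagaaaag"  # last 30 nt of SEQ ID NO: 8
EXON2_HEAD = "gatgagcaattcttaggttttggctcagat"  # first 30 nt of SEQ ID NO: 2


def simulate_breakpoint_reference(upstream, downstream):
    """Join the two flanks; return (reference, index of the breakpoint)."""
    return upstream + downstream, len(upstream)


def classify_read(read, reference, breakpoint, first_len=10, second_len=10):
    """Label a read by how far it extends on BOTH sides of the breakpoint.

    Returns "first_marker", "second_marker", or None if the read does not
    match the reference or does not cross the breakpoint.
    """
    start = reference.find(read)  # exact match stands in for alignment
    if start == -1:
        return None
    end = start + len(read)
    # The shorter of the two flanks decides how well the read is anchored.
    anchor = min(breakpoint - start, end - breakpoint)
    if anchor <= 0:
        return None  # the read lies entirely on one side
    if anchor >= first_len:
        return "first_marker"
    if anchor < second_len:
        return "second_marker"
    return None


reference, bp = simulate_breakpoint_reference(EXON8_TAIL, EXON2_HEAD)
```

Here `classify_read(reference[20:45], reference, bp)` yields a first marker sequence (10 nt on the shorter side), while a read covering only 4 nt on one side of the junction is labelled a second marker sequence.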

3) Breakpoint correction

Because the sequences flanking a breakpoint are similar and mutations or sequencing errors may be present, the candidate breakpoints are corrected, as shown in fig. 2, according to the alignment score, the sequencing quality and the number of supporting sequences, and the optimal predicted breakpoint sequence is given as the candidate breakpoint.
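One possible reading of this correction step is to group candidate breakpoints that lie within a few bases of each other and keep the best-supported one from each group. The sketch below follows that reading only; the `Candidate` fields, the `merge_distance` of 5 nt and the ranking order (alignment score, then supporting reads, then mean base quality) are assumptions, not values given in the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    position: int             # breakpoint coordinate on the reference
    alignment_score: float    # score of the best supporting alignment
    support_reads: int        # number of reads supporting this position
    mean_base_quality: float  # mean Phred quality around the junction


def _rank(cand):
    # Hypothetical priority: alignment score, then support, then quality.
    return (cand.alignment_score, cand.support_reads, cand.mean_base_quality)


def correct_breakpoints(candidates, merge_distance=5):
    """Collapse nearby candidates, keeping the best-ranked one per group."""
    best, group = [], []
    for cand in sorted(candidates, key=lambda c: c.position):
        # Start a new group when the gap to the previous candidate is too large.
        if group and cand.position - group[-1].position > merge_distance:
            best.append(max(group, key=_rank))
            group = []
        group.append(cand)
    if group:
        best.append(max(group, key=_rank))
    return best
```

With candidates at positions 100 and 102 that differ only through a sequencing error, the one with the higher score and more supporting reads is reported, and a distant candidate at position 250 is kept separately.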

4) Multifactorial filtration of false positives

For candidate breakpoints, false positives are further filtered according to the first marker sequences and second marker sequences that support the breakpoint and the paired sequences that span the breakpoint. Then, as shown in fig. 2, all sequences supporting the breakpoint are assembled, and only breakpoints for which the assembly result is consistent with the breakpoint sequence information are retained, thereby yielding reliable MLL-PTD structure information.
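The two checks in this step can be illustrated with the sketch below. It assumes that the supporting reads have already been counted and assembled into contigs by an external assembler; the threshold values `min_first` and `min_total` are hypothetical, and the consistency test is simplified to an exact substring search on either strand.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def passes_support_filter(first_markers, second_markers, spanning_pairs,
                          min_first=1, min_total=3):
    """Multi-factor filter on the three kinds of supporting evidence.

    Thresholds are illustrative placeholders, not values from the application.
    """
    total = first_markers + second_markers + spanning_pairs
    return first_markers >= min_first and total >= min_total


def confirmed_by_assembly(junction, contigs):
    """True if any assembled contig contains the junction on either strand."""
    fwd = junction.upper()
    rev = reverse_complement(fwd)
    return any(fwd in c.upper() or rev in c.upper() for c in contigs)
```

A candidate would be retained only when both functions return True, that is, when the read evidence is sufficient and the assembled contig reproduces the predicted junction sequence.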

4. MLL-PTD quantification

The marker sequence number of MLL-PTD is divided by the sequencing depth of the internal reference gene ABL1 to obtain the abundance of MLL-PTD relative to the ABL1 gene.
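The ratio can be written as a one-line calculation. In the sketch below the function name `ptd_abundance` and the input values in the usage note are hypothetical and serve only to show the arithmetic.

```python
def ptd_abundance(marker_reads, reference_depth):
    """Abundance of MLL-PTD relative to the internal reference gene.

    marker_reads    -- number of reads supporting the PTD breakpoint
    reference_depth -- sequencing depth of the internal reference gene (ABL1)
    """
    if reference_depth <= 0:
        raise ValueError("reference depth must be positive")
    return marker_reads / reference_depth
```

For instance, 50 breakpoint-supporting reads against a hypothetical ABL1 depth of 1000 give `ptd_abundance(50, 1000)`, which formats with `f"{value:.2%}"` as 5.00%.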

Specifically, 122 samples were tested according to the method of the present application shown in fig. 2; MLL-PTD variation was detected in 10 samples, and the results are reported in Tables 2 and 3 below.

Table 2:

Table 3:

Sample number	SEQ ID NO:	Fusion sequence
A	37	AGAGGTCTCTGATGAGTCACTTTCTTGACC@cttttcttttggtttttgttttacagggat
B	38	AGAGGTCTCTGATGAGTCACTTTCTTGACC@cttttcttttggtttttgttttacagggat
C	39	ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
D	40	CATCTTCTGAGCCAGCAATTGATGACTTGT@cttttcttttggtttttgttttacagggat
E	41	ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
F	42	ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
G	43	ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttaaagtccactctgatcctgtggactcc
H	44	ATCTGAGCCAAAACCTAAGAATTGCTCATC@ctgattctggtggtggaggctgctttttct
I	45	ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
J	46	ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
K	47	CATCTTCTGAGCCAGCAATTGATGACTTGT@cttttcttttggtttttgttttacagggat

The fusion sequences in Table 3 are reverse-complement sequences. For example, for sample A (exon8->exon4), the lower-case letters represent the sequence of exon 8 and the upper-case letters represent the sequence of exon 4.
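The reverse-complement relationship can be checked directly against the sequence listing. The sketch below assumes that SEQ ID NO: 8 and SEQ ID NO: 4 correspond to exon 8 and exon 4; it joins the last 30 nt of the former to the first 30 nt of the latter and compares the reverse complement of that junction with the fusion sequence reported for sample A. The helper names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def matches_reported_fusion(upstream_tail, downstream_head, reported):
    """Compare a forward-strand junction with a reported fusion sequence.

    reported -- fusion sequence as printed in Table 3, with '@' at the breakpoint
    """
    junction = upstream_tail + downstream_head
    expected = reported.replace("@", "").upper()
    return reverse_complement(junction).upper() == expected


# Fragments copied from the sequence listing (assumed exon numbering).
EXON8_TAIL = "atccctgtaaaacaaaaaccaaaagaaaag"  # last 30 nt of SEQ ID NO: 8
EXON4_HEAD = "ggtcaagaaagtgactcatcagagacctct"  # first 30 nt of SEQ ID NO: 4
SAMPLE_A = "AGAGGTCTCTGATGAGTCACTTTCTTGACC@cttttcttttggtttttgttttacagggat"

sample_a_ok = matches_reported_fusion(EXON8_TAIL, EXON4_HEAD, SAMPLE_A)
```

Because reverse-complementing swaps the order of the two halves, the exon 4 part (upper case) appears before the `@` and the exon 8 part (lower case) after it, even though exon 8 lies upstream of exon 4 in the rearranged transcript.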

2. Sample C was selected, and the detected MLL-PTD breakpoint structure was verified by Sanger sequencing.

The PCR verified sample information is shown in table 4 below.

Table 4:

Sample number	MLL-PTD structure	Exon A	Exon B	Marker sequence number	Ratio
C	exon8->exon2	exon8	exon2	231	17.12%

Sequence information obtained by verification is as follows:

ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat

(i.e., SEQ ID NO: 39).

3. A breakpoint template sequence was generated according to the breakpoint position, primers were designed 300 bp upstream and downstream of the breakpoint, and PCR amplification was performed.

4. The PCR product was of a reasonable size and gave a single bright band, and the Sanger sequencing chromatogram of the PCR product was clean.

5. The breakpoint structure can be identified from the Sanger sequencing results, and the bases before and after the breakpoint are completely consistent with the breakpoint sequence identified by the method of the present application (the sequencing result of the forward primer is shown in fig. 3, and the reverse-complement sequencing result is shown in fig. 4).

From the above description, it can be seen that the above-described embodiments of the present invention achieve the following technical effects: the method can detect both known and unknown rearrangements, and can also accurately determine the specific position at which the rearrangement occurs and the corresponding sequence information. The method has a wide application range and is suitable for detecting all genes in which rearrangement occurs.

This method directly utilizes NGS sequencing data, is based on statistical and algorithmic development, and does not add any additional experimental detection cost. In addition, the method has high detection accuracy and low cost, and is suitable for structural rearrangement detection of low-abundance genes.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially implemented or the portions contributing to the prior art may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method of the embodiments or some portions of the embodiments of the present application.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Sequence listing

<110> Beijing Usci Medical Laboratory Co., Ltd.

<120> method, apparatus, storage medium, and processor for detecting gene rearrangement.

<130> PN102308YXYX

<160> 47

<170> SIPOSequenceListing 1.0

<210> 1

<211> 455

<212> DNA

<213> Homo sapiens

<400> 1

ctgcttcact tcacggggcg aacatggcgc acagctgtcg gtggcgcttc cccgcccgac 60

ccgggaccac cgggggcggc ggcggcgggg ggcgccgggg cctagggggc gccccgcggc 120

aacgcgtccc ggccctgctg cttccccccg ggcccccggt cggcggtggc ggccccgggg 180

cgcccccctc ccccccggct gtggcggccg cggcggcggc ggcgggaagc agcggggctg 240

gggttccagg gggagcggcc gccgcctcag cagcctcctc gtcgtccgcc tcgtcttcgt 300

cttcgtcatc gtcctcagcc tcttcagggc cggccctgct ccgggtgggc ccgggcttcg 360

acgcggcgct gcaggtctcg gccgccatcg gcaccaacct gcgccggttc cgggccgtgt 420

ttggggagag cggcggggga ggcggcagcg gagag 455

<210> 2

<211> 70

<212> DNA

<213> Homo sapiens

<400> 2

gatgagcaat tcttaggttt tggctcagat gaagaagtca gagtgcgaag tcccacaagg 60

tctccttcag 70

<210> 3

<211> 2654

<212> DNA

<213> Homo sapiens

<400> 3

ttaaaactag tcctcgaaaa cctcgtggga gacctagaag tggctctgac cgaaattcag 60

ctatcctctc agatccatct gtgttttccc ctctaaataa atcagagacc aaatctggag 120

ataagatcaa gaagaaagat tctaaaagta tagaaaagaa gagaggaaga cctcccacct 180

tccctggagt aaaaatcaaa ataacacatg gaaaggacat ttcagagtta ccaaagggaa 240

acaaagaaga tagcctgaaa aaaattaaaa ggacaccttc tgctacgttt cagcaagcca 300

caaagattaa aaaattaaga gcaggtaaac tctctcctct caagtctaag tttaagacag 360

ggaagcttca aataggaagg aagggggtac aaattgtacg acggagagga aggcctccat 420

caacagaaag gataaagacc ccttcgggtc tcctcattaa ttctgaactg gaaaagcccc 480

agaaagtccg gaaagacaag gaaggaacac ctccacttac aaaagaagat aagacagttg 540

tcagacaaag ccctcgaagg attaagccag ttaggattat tccttcttca aaaaggacag 600

atgcaaccat tgctaagcaa ctcttacaga gggcaaaaaa gggggctcaa aagaaaattg 660

aaaaagaagc agctcagctg cagggaagaa aggtgaagac acaggtcaaa aatattcgac 720

agttcatcat gcctgttgtc agtgctatct cctcgcggat cattaagacc cctcggcggt 780

ttatagagga tgaggattat gaccctccaa ttaaaattgc ccgattagag tctacaccga 840

atagtagatt cagtgccccg tcctgtggat cttctgaaaa atcaagtgca gcttctcagc 900

actcctctca aatgtcttca gactcctctc gatctagtag ccccagtgtt gatacctcca 960

cagactctca ggcttctgag gagattcagg tacttcctga ggagcggagc gatacccctg 1020

aagttcatcc tccactgccc atttcccagt ccccagaaaa tgagagtaat gataggagaa 1080

gcagaaggta ttcagtgtcg gagagaagtt ttggatctag aacgacgaaa aaattatcaa 1140

ctctacaaag tgccccccag cagcagacct cctcgtctcc acctccacct ctgctgactc 1200

caccgccacc actgcagcca gcctccagta tctctgacca cacaccttgg cttatgcctc 1260

caacaatccc cttagcatca ccatttttgc ctgcttccac tgctcctatg caagggaagc 1320

gaaaatctat tttgcgagaa ccgacattta ggtggacttc tttaaagcat tctaggtcag 1380

agccacaata cttttcctca gcaaagtatg ccaaagaagg tcttattcgc aaaccaatat 1440

ttgataattt ccgaccccct ccactaactc ccgaggacgt tggctttgca tctggttttt 1500

ctgcatctgg taccgctgct tcagcccgat tgttttcgcc actccattct ggaacaaggt 1560

ttgatatgca caaaaggagc cctcttctga gagctccaag atttactcca agtgaggctc 1620

actctagaat atttgagtct gtaaccttgc ctagtaatcg aacttctgct ggaacatctt 1680

cttcaggagt atccaataga aaaaggaaaa gaaaagtgtt tagtcctatt cgatctgaac 1740

caagatctcc ttctcactcc atgaggacaa gaagtggaag gcttagtagt tctgagctct 1800

cacctctcac ccccccgtct tctgtctctt cctcgttaag catttctgtt agtcctcttg 1860

ccactagtgc cttaaaccca acttttactt ttccttctca ttccctgact cagtctgggg 1920

aatctgcaga gaaaaatcag agaccaagga agcagactag tgctccggca gagccatttt 1980

catcaagtag tcctactcct ctcttccctt ggtttacccc aggctctcag actgaaagag 2040

ggagaaataa agacaaggcc cccgaggagc tgtccaaaga tcgagatgct gacaagagcg 2100

tggagaagga caagagtaga gagagagacc gggagagaga aaaggagaat aagcgggagt 2160

caaggaaaga gaaaaggaaa aagggatcag aaattcagag tagttctgct ttgtatcctg 2220

tgggtagggt ttccaaagag aaggttgttg gtgaagatgt tgccacttca tcttctgcca 2280

aaaaagcaac agggcggaag aagtcttcat cacatgattc tgggactgat attacttctg 2340

tgactcttgg ggatacaaca gctgtcaaaa ccaaaatact tataaagaaa gggagaggaa 2400

atctggaaaa aaccaacttg gacctcggcc caactgcccc atccctggag aaggagaaaa 2460

ccctctgcct ttccactcct tcatctagca ctgttaaaca ttccacttcc tccataggct 2520

ccatgttggc tcaggcagac aagcttccaa tgactgacaa gagggttgcc agcctcctaa 2580

aaaaggccaa agctcagctc tgcaagattg agaagagtaa gagtcttaaa caaaccgacc 2640

agcccaaagc acag 2654

<210> 4

<211> 178

<212> DNA

<213> Homo sapiens

<400> 4

ggtcaagaaa gtgactcatc agagacctct gtgcgaggac cccggattaa acatgtctgc 60

agaagagcag ctgttgccct tggccgaaaa cgagctgtgt ttcctgatga catgcccacc 120

ctgagtgcct taccatggga agaacgagaa aagattttgt cttccatggg gaatgatg 178

<210> 5

<211> 235

<212> DNA

<213> Homo sapiens

<400> 5

acaagtcatc aattgctggc tcagaagatg ctgaacctct tgctccaccc atcaaaccaa 60

ttaaacctgt cactagaaac aaggcacccc aggaacctcc agtaaagaaa ggacgtcgat 120

cgaggcggtg tgggcagtgt cccggctgcc aggtgcctga ggactgtggt gtttgtacta 180

attgcttaga taagcccaag tttggtggtc gcaatataaa gaagcagtgc tgcaa 235

<210> 6

<211> 65

<212> DNA

<213> Homo sapiens

<400> 6

gatgagaaaa tgtcagaatc tacaatggat gccttccaaa gcctacctgc agaagcaagc 60

taaag 65

<210> 7

<211> 378

<212> DNA

<213> Homo sapiens

<400> 7

ctgtgaaaaa gaaagagaaa aagtctaaga ccagtgaaaa gaaagacagc aaagagagca 60

gtgttgtgaa gaacgtggtg gactctagtc agaaacctac cccatcagca agagaggatc 120

ctgccccaaa gaaaagcagt agtgagcctc ctccacgaaa gcccgtcgag gaaaagagtg 180

aagaagggaa tgtctcggcc cctgggcctg aatccaaaca ggccaccact ccagcttcca 240

ggaagtcaag caagcaggtc tcccagccag cactggtcat cccgcctcag ccacctacta 300

caggaccgcc aagaaaagaa gttcccaaaa ccactcctag tgagcccaag aaaaagcagc 360

ctccaccacc agaatcag 378

<210> 8

<211> 74

<212> DNA

<213> Homo sapiens

<400> 8

gtccagagca gagcaaacag aaaaaagtgg ctccccgccc aagtatccct gtaaaacaaa 60

aaccaaaaga aaag 74

<210> 9

<211> 132

<212> DNA

<213> Homo sapiens

<400> 9

gaaaaaccac ctccggtcaa taagcaggag aatgcaggca ctttgaacat cctcagcact 60

ctctccaatg gcaatagttc taagcaaaaa attccagcag atggagtcca caggatcaga 120

gtggacttta ag 132

<210> 10

<211> 114

<212> DNA

<213> Homo sapiens

<400> 10

gaggattgtg aagcagaaaa tgtgtgggag atgggaggct taggaatctt gacttctgtt 60

cctataacac ccagggtggt ttgctttctc tgtgccagta gtgggcatgt agag 114

<210> 11

<211> 147

<212> DNA

<213> Homo sapiens

<400> 11

tttgtgtatt gccaagtctg ttgtgagccc ttccacaagt tttgtttaga ggagaacgag 60

cgccctctgg aggaccagct ggaaaattgg tgttgtcgtc gttgcaaatt ctgtcacgtt 120

tgtggaaggc aacatcaggc tacaaag 147

<210> 12

<211> 96

<212> DNA

<213> Homo sapiens

<400> 12

cagctgctgg agtgtaataa gtgccgaaac agctatcacc ctgagtgcct gggaccaaac 60

taccccacca aacccacaaa gaagaagaaa gtctgg 96

<210> 13

<211> 121

<212> DNA

<213> Homo sapiens

<400> 13

atctgtacca agtgtgttcg ctgtaagagc tgtggatcca caactccagg caaagggtgg 60

gatgcacagt ggtctcatga tttctcactg tgtcatgatt gcgccaagct ctttgctaaa 120

g 121

<210> 14

<211> 123

<212> DNA

<213> Homo sapiens

<400> 14

gaaacttctg ccctctctgt gacaaatgtt atgatgatga tgactatgag agtaagatga 60

tgcaatgtgg aaagtgtgat cgctgggtcc attccaaatg tgagaatctt tcaggtacag 120

aag 123

<210> 15

<211> 185

<212> DNA

<213> Homo sapiens

<400> 15

atgagatgta tgagattcta tctaatctgc cagaaagtgt ggcctacact tgtgtgaact 60

gtactgagcg gcaccctgca gagtggcgac tggcccttga aaaagagctg cagatttctc 120

tgaagcaagt tctgacagct ttgttgaatt ctcggactac cagccatttg ctacgctacc 180

ggcag 185

<210> 16

<211> 174

<212> DNA

<213> Homo sapiens

<400> 16

gctgccaagc ctccagactt aaatcccgag acagaggaga gtataccttc ccgcagctcc 60

cccgaaggac ctgatccacc agttcttact gaggtcagca aacaggatga tcagcagcct 120

ttagatctag aaggagtcaa gaggaagatg gaccaaggga attacacatc tgtg 174

<210> 17

<211> 111

<212> DNA

<213> Homo sapiens

<400> 17

ttggagttca gtgatgatat tgtgaagatc attcaagcag ccattaattc agatggagga 60

cagccagaaa ttaaaaaagc caacagcatg gtcaagtcct tcttcattcg g 111

<210> 18

<211> 74

<212> DNA

<213> Homo sapiens

<400> 18

caaatggaac gtgtttttcc atggttcagt gtcaaaaagt ccaggttttg ggagccaaat 60

aaagtatcaa gcaa 74

<210> 19

<211> 194

<212> DNA

<213> Homo sapiens

<400> 19

cagtgggatg ttaccaaacg cagtgcttcc accttcactt gaccataatt atgctcagtg 60

gcaggagcga gaggaaaaca gccacactga gcagcctcct ttaatgaaga aaatcattcc 120

agctcccaaa cccaaaggtc ctggagaacc agactcacca actcctctgc atcctcctac 180

accaccaatt ttga 194

<210> 20

<211> 107

<212> DNA

<213> Homo sapiens

<400> 20

gtactgatag gagtcgagaa gacagtccag agctgaaccc acccccaggc atagaagaca 60

atagacagtg tgcgttatgt ttgacttatg gtgatgacag tgctaat 107

<210> 21

<211> 138

<212> DNA

<213> Homo sapiens

<400> 21

gatgctggtc gtttactata tattggccaa aatgagtgga cacatgtaaa ttgtgctttg 60

tggtcagcgg aagtgtttga agatgatgac ggatcactaa agaatgtgca tatggctgtg 120

atcaggggca agcagctg 138

<210> 22

<211> 159

<212> DNA

<213> Homo sapiens

<400> 22

agatgtgaat tctgccaaaa gccaggagcc accgtgggtt gctgtctcac atcctgcacc 60

agcaactatc acttcatgtg ttcccgagcc aagaactgtg tctttctgga tgataaaaaa 120

gtatattgcc aacgacatcg ggatttgatc aaaggcgaa 159

<210> 23

<211> 118

<212> DNA

<213> Homo sapiens

<400> 23

gtggttcctg agaatggatt tgaagttttc agaagagtgt ttgtggactt tgaaggaatc 60

agcttgagaa ggaagtttct caatggcttg gaaccagaaa atatccacat gatgattg 118

<210> 24

<211> 79

<212> DNA

<213> Homo sapiens

<400> 24

ggtctatgac aatcgactgc ttaggaattc taaatgatct ctccgactgt gaagataagc 60

tctttcctat tggatatca 79

<210> 25

<211> 161

<212> DNA

<213> Homo sapiens

<400> 25

gtgttccagg gtatactgga gcaccacaga tgctcgcaag cgctgtgtat atacatgcaa 60

gatagtggag tgccgtcctc cagtcgtaga gccggatatc aacagcactg ttgaacatga 120

tgaaaacagg accattgccc atagtccaac atcttttaca g 161

<210> 26

<211> 186

<212> DNA

<213> Homo sapiens

<400> 26

aaagttcatc aaaagagagt caaaacacag ctgaaattat aagtcctcca tcaccagacc 60

gacctcctca ttcacaaacc tctggctcct gttattatca tgtcatctca aaggtcccca 120

ggattcgaac acccagttat tctccaacac agagatcccc tggctgtcga ccgttgcctt 180

ctgcag 186

<210> 27

<211> 4249

<212> DNA

<213> Homo sapiens

<400> 27

gaagtcctac cccaaccact catgaaatag tcacagtagg tgatccttta ctctcctctg 60

gacttcgaag cattggctcc aggcgtcaca gtacctcttc cttatcaccc cagcggtcca 120

aactccggat aatgtctcca atgagaactg ggaatactta ctctaggaat aatgtttcct 180

cagtctccac caccgggacc gctactgatc ttgaatcaag tgccaaagta gttgatcatg 240

tcttagggcc actgaattca agtactagtt tagggcaaaa cacttccacc tcttcaaatt 300

tgcaaaggac agtggttact gtaggcaata aaaacagtca cttggatgga tcttcatctt 360

cagaaatgaa gcagtccagt gcttcagact tggtgtccaa gagctcctct ttaaagggag 420

agaagaccaa agtgctgagt tccaagagct cagagggatc tgcacataat gtggcttacc 480

ctggaattcc taaactggcc ccacaggttc ataacacaac atctagagaa ctgaatgtta 540

gtaaaatcgg ctcctttgct gaaccctctt cagtgtcgtt ttcttctaaa gaggccctct 600

ccttcccaca cctccatttg agagggcaaa ggaatgatcg agaccaacac acagattcta 660

cccaatcagc aaactcctct ccagatgaag atactgaagt caaaaccttg aagctatctg 720

gaatgagcaa cagatcatcc attatcaacg aacatatggg atctagttcc agagatagga 780

gacagaaagg gaaaaaatcc tgtaaagaaa ctttcaaaga aaagcattcc agtaaatctt 840

ttttggaacc tggtcaggtg acaactggtg aggaaggaaa cttgaagcca gagtttatgg 900

atgaggtttt gactcctgag tatatgggcc aacgaccatg taacaatgtt tcttctgata 960

agattggtga taaaggcctt tctatgccag gagtccccaa agctccaccc atgcaagtag 1020

aaggatctgc caaggaatta caggcaccac ggaaacgcac agtcaaagtg acactgacac 1080

ctctaaaaat ggaaaatgag agtcaatcca aaaatgccct gaaagaaagt agtcctgctt 1140

cccctttgca aatagagtca acatctccca cagaaccaat ttcagcctct gaaaatccag 1200

gagatggtcc agtggcccaa ccaagcccca ataatacctc atgccaggat tctcaaagta 1260

acaactatca gaatcttcca gtacaggaca gaaacctaat gcttccagat ggccccaaac 1320

ctcaggagga tggctctttt aaaaggaggt atccccgtcg cagtgcccgt gcacgttcta 1380

acatgttttt tgggcttacc ccactctatg gagtaagatc ctatggtgaa gaagacattc 1440

cattctacag cagctcaact gggaagaagc gaggcaagag atcagctgaa ggacaggtgg 1500

atggggccga tgacttaagc acttcagatg aagacgactt atactattac aacttcacta 1560

gaacagtgat ttcttcaggt ggagaggaac gactggcatc ccataattta tttcgggagg 1620

aggaacagtg tgatcttcca aaaatctcac agttggatgg tgttgatgat gggacagaga 1680

gtgatactag tgtcacagcc acaacaagga aaagcagcca gattccaaaa agaaatggta 1740

aagaaaatgg aacagagaac ttaaagattg atagacctga agatgctggg gagaaagaac 1800

atgtcactaa gagttctgtt ggccacaaaa atgagccaaa gatggataac tgccattctg 1860

taagcagagt taaaacacag ggacaagatt ccttggaagc tcagctcagc tcattggagt 1920

caagccgcag agtccacaca agtaccccct ccgacaaaaa tttactggac acctataata 1980

ctgagctcct gaaatcagat tcagacaata acaacagtga tgactgtggg aatatcctgc 2040

cttcagacat tatggacttt gtactaaaga atactccatc catgcaggct ttgggtgaga 2100

gcccagagtc atcttcatca gaactcctga atcttggtga aggattgggt cttgacagta 2160

atcgtgaaaa agacatgggt ctttttgaag tattttctca gcagctgcct acaacagaac 2220

ctgtggatag tagtgtctct tcctctatct cagcagagga acagtttgag ttgcctctag 2280

agctaccatc tgatctgtct gtcttgacca cccggagtcc cactgtcccc agccagaatc 2340

ccagtagact agctgttatc tcagactcag gggagaagag agtaaccatc acagaaaaat 2400

ctgtagcctc ctctgaaagt gacccagcac tgctgagccc aggagtagat ccaactcctg 2460

aaggccacat gactcctgat cattttatcc aaggacacat ggatgcagac cacatctcta 2520

gccctccttg tggttcagta gagcaaggtc atggcaacaa tcaggattta actaggaaca 2580

gtagcacccc tggccttcag gtacctgttt ccccaactgt tcccatccag aaccagaagt 2640

atgtgcccaa ttctactgat agtcctggcc cgtctcagat ttccaatgca gctgtccaga 2700

ccactccacc ccacctgaag ccagccactg agaaactcat agttgttaac cagaacatgc 2760

agccacttta tgttctccaa actcttccaa atggagtgac ccaaaaaatc caattgacct 2820

cttctgttag ttctacaccc agtgtgatgg agacaaatac ttcagtattg ggacccatgg 2880

gaggtggtct cacccttacc acaggactaa atccaagctt gccaacttct caatctttgt 2940

tcccttctgc tagcaaagga ttgctaccca tgtctcatca ccagcactta cattccttcc 3000

ctgcagctac tcaaagtagt ttcccaccaa acatcagcaa tcctccttca ggcctgctta 3060

ttggggttca gcctcctccg gatccccaac ttttggtttc agaatccagc cagaggacag 3120

acctcagtac cacagtagcc actccatcct ctggactcaa gaaaagaccc atatctcgtc 3180

tacagacccg aaagaataaa aaacttgctc cctctagtac cccttcaaac attgcccctt 3240

ctgatgtggt ttctaatatg acattgatta acttcacacc ctcccagctt cctaatcatc 3300

caagtctgtt agatttgggg tcacttaata cttcatctca ccgaactgtc cccaacatca 3360

taaaaagatc taaatctagc atcatgtatt ttgaaccggc acccctgtta ccacagagtg 3420

tgggaggaac tgctgccaca gcggcaggca catcaacaat aagccaggat actagccacc 3480

tcacatcagg gtctgtgtct ggcttggcat ccagttcctc tgtcttgaat gttgtatcca 3540

tgcaaactac cacaacccct acaagtagtg cgtcagttcc aggacacgtc accttaacca 3600

acccaaggtt gcttggtacc ccagatattg gctcaataag caatctttta atcaaagcta 3660

gccagcagag cctggggatt caggaccagc ctgtggcttt accgccaagt tcaggaatgt 3720

ttccacaact ggggacatca cagaccccct ctactgctgc aataacagcg gcatctagca 3780

tctgtgtgct cccctccact cagactacgg gcataacagc cgcttcacct tctggggaag 3840

cagacgaaca ctatcagctt cagcatgtga accagctcct tgccagcaaa actgggattc 3900

attcttccca gcgtgatctt gattctgctt cagggcccca ggtatccaac tttacccaga 3960

cggtagacgc tcctaatagc atgggactgg agcagaacaa ggctttatcc tcagctgtgc 4020

aagccagccc cacctctcct gggggttctc catcctctcc atcttctgga cagcggtcag 4080

caagcccttc agtgccgggt cccactaaac ccaaaccaaa aaccaaacgg tttcagctgc 4140

ctctagacaa agggaatggc aagaagcaca aagtttccca tttgcggacc agttcttctg 4200

aagcacacat tccagaccaa gaaacgacat ccctgacctc aggcacagg 4249

<210> 28

<211> 81

<212> DNA

<213> Homo sapiens

<400> 28

gactccagga gcagaggctg agcagcagga tacagctagc gtggagcagt cctcccagaa 60

ggagtgtggg caacctgcag g 81

<210> 29

<211> 65

<212> DNA

<213> Homo sapiens

<400> 29

gcaagtcgct gttcttccgg aagttcaggt gacccaaaat ccagcaaatg aacaagaaag 60

tgcag 65

<210> 30

<211> 171

<212> DNA

<213> Homo sapiens

<400> 30

aacctaaaac agtggaagaa gaggaaagta atttcagctc cccactgatg ctttggcttc 60

agcaagaaca aaagcggaag gaaagcatta ctgagaaaaa acccaagaaa ggacttgttt 120

ttgaaatttc cagtgatgat ggctttcaga tctgtgcaga aagtattgaa g 171

<210> 31

<211> 75

<212> DNA

<213> Homo sapiens

<400> 31

atgcctggaa gtcattgaca gataaagtcc aggaagctcg atcaaatgcc cgcctaaagc 60

agctctcatt tgcag 75

<210> 32

<211> 175

<212> DNA

<213> Homo sapiens

<400> 32

gtgttaacgg tttgaggatg ctggggattc tccatgatgc agttgtgttc ctcattgagc 60

agctgtctgg tgccaagcac tgtcgaaatt acaaattccg tttccacaag ccagaggagg 120

ccaatgaacc ccccttgaac cctcacggct cagccagggc tgaagtccac ctcag 175

<210> 33

<211> 108

<212> DNA

<213> Homo sapiens

<400> 33

gaagtcagca tttgacatgt ttaacttcct ggcttctaaa catcgtcagc ctcctgaata 60

caaccccaat gatgaagaag aggaggaggt acagctgaag tcagctcg 108

<210> 34

<211> 84

<212> DNA

<213> Homo sapiens

<400> 34

gagggcaact agcatggatc tgccaatgcc catgcgcttc cggcacttaa aaaagacttc 60

taaggaggca gttggtgtct acag 84

<210> 35

<211> 130

<212> DNA

<213> Homo sapiens

<400> 35

gtctcccatc catggccggg gtcttttctg taagagaaac attgatgcag gtgagatggt 60

gattgagtat gccggcaacg tcatccgctc catccagact gacaagcggg aaaagtatta 120

cgacagcaag 130

<210> 36

<211> 4928

<212> DNA

<213> Homo sapiens

<400> 36

ggcattggtt gctatatgtt ccgaattgat gactcagagg tagtggatgc caccatgcat 60

ggaaatgctg cacgcttcat caatcactcg tgtgagccta actgctattc tcgggtcatc 120

aatattgatg ggcagaagca cattgtcatc tttgccatgc gtaagatcta ccgaggagag 180

gaactcactt acgactataa gttccccatt gaggatgcca gcaacaagct gccctgcaac 240

tgtggcgcca agaaatgccg gaagttccta aactaaagct gctcttctcc cccagtgttg 300

gagtgcaagg aggcggggcc atccaaagca acgctgaagg ccttttccag cagctgggag 360

ctcccggatt gcgtggcaca gctgaggggc ctctgtgatg gctgagctct cttatgtcct 420

atactcacat cagacatgtg atcatagtcc cagagacaga gttgaggtct cgaagaaaag 480

atccatgatc ggctttctcc tggggcccct ccaattgttt actgttagaa agtgggaatg 540

gggtccctag cagacttgcc tggaaggagc ctattataga gggttggtta tgttgggaga 600

ttgggcctga atttctccac agaaataagt tgccatcctc aggttggccc tttcccaagc 660

actgtaagtg agtgggtcag gcaaagcccc aaatggaggg ttggttagat tcctgacagt 720

ttgccagcca ggccccacct acagcgtctg tcgaacaaac agaggtctgg tggttttccc 780

tactatcctc ccactcgaga gttcacttct ggttgggaga caggattcct agcacctccg 840

gtgtcaaaag gctgtcatgg ggttgtgcca attaattacc aaacattgag cctgcaggct 900

ttgagtggga gtgttgcccc caggagcctt atctcagcca attacctttc ttgacagtag 960

gagcggcttc cctctcccat tccctcttca ctcccttttc ttcctttccc ctgtcttcat 1020

gccactgctt tcccatgctt ctttcgggtt gtaggggaga ctgactgcct gctcaaggac 1080

actccctgct gggcatagga tgtgcctgca aaaagttccc tgagcctgta agcactccag 1140

gtggggaagt ggacaggagc cattggtcat aaccagacag aatttggaaa cattttcata 1200

aagctccatg gagagtttta aagaaacata tgtagcatga ttttgtagga gaggaaaaag 1260

attatttaaa taggatttaa atcatgcaac aacgagagta tcacagccag gatgaccctt 1320

gggtcccatt cctaagacat ggttacttta ttttcccctt gttaagacat aggaagactt 1380

aatttttaaa cggtcagtgt ccagttgaag gcagaacact aatcagattt caaggcccac 1440

aacttgggga ctagaccacc ttatgttgag ggaactctgc cacctgcgtg caacccacag 1500

ctaaagtaaa ttcaatgaca ctactgccct gattactcct taggatgtgg tcaaaacagc 1560

atcaaatgtt tcttctcttc ctttccccaa gacagagtcc tgaacctgtt aaattaagtc 1620

attggatttt actctgttct gtttacagtt tactatttaa ggttttataa atgtaaatat 1680

attttgtata tttttctatg agaagcactt catagggaga agcacttatg acaaggctat 1740

tttttaaacc gcggtattat cctaatttaa aagaagatcg gtttttaata attttttatt 1800

ttcataggat gaagttagag aaaatattca gctgtacaca caaagtctgg tttttcctgc 1860

ccaacttccc cctggaaggt gtactttttg ttgtttaatg tgtagcttgt ttgtgccctg 1920

ttgacataaa tgtttcctgg gtttgctctt tgacaataaa tggagaagga aggtcaccca 1980

actccattgg gccactcccc tccttcccct attgaagctc ctcaaaaggc tacagtaata 2040

tcttgataca acagattctc ttctttcccg cctctctcct ttccggcgca acttccagag 2100

tggtgggaga cggcaatctt tacatttccc tcatctttct tacttcagag ttagcaaaca 2160

acaagttgaa tggcaacttg acatttttgc atcaccatct gcctcatagg ccactctttc 2220

ctttccctct gcccaccaag tcctcatatc tgcagagaac ccattgatca ccttgtgccc 2280

tcttttgggg cagcctgttg aaactgaagc acagtctgac cactcacgat aaagcagatt 2340

tttctctgcc tctgccacaa ggtttcagag tagtgtagtc caagtagagg gtggggcacc 2400

cttttctcgc cgcaagaagc ccattcctat ggaagtctag caaagcaata cgactcagcc 2460

cagcactctc tgccccagga ctcatggctc tgctgtgcct tccatcctgg gctcccttct 2520

ctcctgtgac cttaagaact ttgtctggtg gctttgctgg aacattgtca ctgttttcac 2580

tgtcatgcag ggagcccagc actgtggcca ggatggcaga gacttccttg tcatcatgga 2640

gaagtgccag caggggactg ggaaaagcac tctacccaga cctcacctcc cttcctcctt 2700

ttgcccatga acaagatgca gtggccctag gggttccact agtgtctgct ttcctttatt 2760

attgcactgt gtgaggtttt tttgtaaatc cttgtattcc tatttttttt aaagaaaaaa 2820

aaaaaacctt aagctgcatt tgttactgaa atgattaatg cactgatggg tcctgaattc 2880

accttgagaa agacccaaag gccagtcagg gggtgggggg aactcagcta aatagaccta 2940

gttactgccc tgctaggcca tgctgtactg tgagcccctc ctcactctct accaacccta 3000

aaccctgagg acaggggagg aacccacagc ttccttctcc tgccagctgc agatggtttg 3060

ccttgccttt ccacccccta attgtcaacc acaaaaatga gaaattcctc ttctagctca 3120

gccttgagtc cattgccaaa ttttcagcac acctgccagc aacttggggg aataagcgaa 3180

ggtttcccta caagagggaa agaaggcaaa aacggcacag ctatctccaa acacatctga 3240

gttcatttca aaagtgacca agggaatctc cgcacaaaag tgcagattga ggaattgtga 3300

tgggtcattc ccaagaatcc cccaaggggc atcccaaatc cctgaggagt aacagctgca 3360

aacctggtca gttctcagtg agagccagct cacttatagc tttgctgcta gaacctgttg 3420

tggctgcatt tcctggtggc cagtgacaac tgtgtaacca gaatagctgc atggcgctga 3480

ccctttggcc ggaacttggt ctcttggctc cctccttggc cacccaccac ctctcgcaca 3540

gcccctctgt ttttacacca ataacaagaa ttaaggggga agccctggca gctatacgtt 3600

ttcaaccaga ctcctttgcc gggacccagc ccgccaccct gctcgcctcc gtcaaacccc 3660

cggccaatgc agtgagcacc atgtagctcc cttgatttaa aaaaaataaa aaataaaaaa 3720

aaaaggaaaa aaaaatacaa cacacacaca aaaataaaaa aaatattcta atgaatgtat 3780

ctttctaaag gactgacgtt caatcaaata tctgaaaata ctaaaggtca aaaccttgtc 3840

agatgttaac ttctaagttc ggtttgggat tttttttttt taatagaaat caagttgttt 3900

ttgtttttaa ggaaaagcgg gtcattgcaa agggctgggt gtaattttat gtttcatttc 3960

cttcatttta aagcaataca aggttatgga gcagatggtt ttgtgccgaa tcatgaatac 4020

tagtcaagtc acacactctg gaaacttgca actttttgtt tgttttggtt ttcaaataaa 4080

tataaatatg atatatatag gaactaatat agtaatgcac catgtaacaa agcctagttc 4140

agtccatggc ttttaattct cttaacacta tagataagga ttgtgttaca gttgctagta 4200

gcggcaggaa gatgtcaggc tcactttcct ctgattcccg aaatgggggg aacctctaac 4260

cataaaggaa tggtagaaca gtccattcct cggatcagag aaaaatgcag acatggtgtc 4320

acctggattt ttttctgccc atgaatgttg ccagtcagta cctgtcctcc ttgtttctct 4380

atttttggtt atgaatgttg gggttaccac ctgcatttag gggaaaattg tgttctgtgc 4440

tttcctggta tcttgttccg aggtactcta gttctgtctt tcaaccaaga aaatagaatt 4500

gtggtgtttc ttttattgaa cttttaacag tctctttagt aaatacaggt agttgaataa 4560

ttgtttcaag agctcaacag atgacaagct tcttttctag aaataagaca ttttttgaca 4620

actttatcat gtataacaga tctgtttttt ttccttgtgt tcttccaagc ttctggttag 4680

agaaaaagag aaaaaaaaaa aaggaaaatg tgtctaaagt ccatcagtgt taactccctg 4740

tgacagggat gaaggaaaat actttaatag ttcaaaaaat aataatgctg aaagctctct 4800

acgaaagact gaatgtaaaa gtaaaaagtg tacatagttg taaaaaaaag gagtttttaa 4860

acatgtttat tttctatgca ctttttttta tttaagtgat agtttaatta ataaacatgt 4920

caagttta 4928

<210> 37

<211> 60

<212> DNA

<213> Homo sapiens

<400> 37

agaggtctct gatgagtcac tttcttgacc cttttctttt ggtttttgtt ttacagggat 60

<210> 38

<211> 60

<212> DNA

<213> Homo sapiens

<400> 38

agaggtctct gatgagtcac tttcttgacc cttttctttt ggtttttgtt ttacagggat 60

<210> 39

<211> 60

<212> DNA

<213> Homo sapiens

<400> 39

atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60

<210> 40

<211> 60

<212> DNA

<213> Homo sapiens

<400> 40

catcttctga gccagcaatt gatgacttgt cttttctttt ggtttttgtt ttacagggat 60

<210> 41

<211> 60

<212> DNA

<213> Homo sapiens

<400> 41

atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60

<210> 42

<211> 60

<212> DNA

<213> Homo sapiens

<400> 42

atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60

<210> 43

<211> 60

<212> DNA

<213> Homo sapiens

<400> 43

atctgagcca aaacctaaga attgctcatc cttaaagtcc actctgatcc tgtggactcc 60

<210> 44

<211> 60

<212> DNA

<213> Homo sapiens

<400> 44

atctgagcca aaacctaaga attgctcatc ctgattctgg tggtggaggc tgctttttct 60

<210> 45

<211> 60

<212> DNA

<213> Homo sapiens

<400> 45

atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60

<210> 46

<211> 60

<212> DNA

<213> Homo sapiens

<400> 46

atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60

<210> 47

<211> 60

<212> DNA

<213> Homo sapiens

<400> 47

catcttctga gccagcaatt gatgacttgt cttttctttt ggtttttgtt ttacagggat 60

Claims

1. A method of detecting gene rearrangement, the method comprising:

obtaining sequences to be aligned from a sample to be tested;

aligning the sequences to be aligned with a reference genome to obtain an abnormal alignment sequence, wherein the abnormal alignment sequence comprises a sequence with an abnormal alignment position, a sequence with an abnormal alignment direction, and a sequence that does not align to the reference genome;

determining the position of a candidate breakpoint according to the alignment position and the alignment direction of the abnormal alignment sequence on the reference genome;

assembling those of the sequences to be aligned that support the position of the candidate breakpoint, retaining the breakpoints in the assembly result that are consistent with the sequence information at the position of the candidate breakpoint, and recording them as the breakpoints of the gene rearrangement;

determining the position of a candidate breakpoint according to the alignment position and the alignment direction of the abnormal alignment sequence on the reference genome comprises:

performing sequence segmentation on the abnormal alignment sequence, aligning the segmented sequence with the reference genome, and determining the position of the candidate breakpoint according to the alignment position and the alignment direction of the segmented abnormal alignment sequence on the reference genome;

determining the position of the candidate breakpoint according to the alignment position and the alignment direction of the abnormal alignment sequence on the reference genome comprises:

performing sequence segmentation on the abnormal alignment sequence and then aligning the segmented sequence with the reference genome to obtain a sequence that spans both sides of a potential breakpoint and has a first length, recorded as a first marker sequence, and a sequence that spans both sides of the potential breakpoint and has a length smaller than a second length, recorded as a second marker sequence;

simulating a breakpoint reference sequence in which the gene rearrangement has occurred, based on the location of the potential breakpoint on the first marker sequence;

aligning the sequences to be aligned with the breakpoint reference sequence, marking the sequences that align to the breakpoint reference sequence and span the breakpoint on it, and recording them as breakpoint candidate sequences supporting the breakpoint;

determining a position of a breakpoint on the breakpoint candidate sequence as a position of the candidate breakpoint.
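As an illustration of the last three steps of claim 1, the following minimal Python sketch builds a simulated breakpoint reference sequence and marks the reads that span its junction. It is not the claimed implementation: exact substring matching stands in for a real aligner, and the function names `build_breakpoint_reference` and `spanning_reads` and the parameters `flank` and `min_overhang` are hypothetical.

```python
def build_breakpoint_reference(left_seq, right_seq, flank=30):
    """Join the bases on each side of a potential breakpoint.

    Hypothetical helper: takes the last `flank` bases upstream of the
    breakpoint and the first `flank` bases downstream of it, and returns
    the joined sequence plus the index of the junction within it.
    """
    left = left_seq[-flank:]
    right = right_seq[:flank]
    return left + right, len(left)


def spanning_reads(reads, bp_ref, junction, min_overhang=5):
    """Return reads that match bp_ref and cross the junction.

    Assumption: exact substring matching replaces real alignment, and a
    read must cover at least `min_overhang` bases on both sides of the
    junction to count as supporting the breakpoint.
    """
    supporting = []
    for read in reads:
        start = bp_ref.find(read)  # first exact match, or -1
        if start == -1:
            continue
        end = start + len(read)
        if start <= junction - min_overhang and end >= junction + min_overhang:
            supporting.append(read)
    return supporting


if __name__ == "__main__":
    bp_ref, junction = build_breakpoint_reference(
        "AAAACCCCGGGG", "TTTTACGTACGT", flank=8
    )
    reads = ["CCGGGGTTTTAC", "CCCCGGGG", "ACGTTTTT"]
    print(spanning_reads(reads, bp_ref, junction))
```

In this toy input only the first read crosses the junction with enough bases on both sides; the second lies entirely on one side, and the third does not match the simulated reference at all.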

2. The method of claim 1, wherein determining the location of the breakpoint on the breakpoint candidate sequence as the location of the candidate breakpoint comprises:

correcting the breakpoint candidate sequence according to the sequencing quality and the number of supporting sequences to obtain a corrected candidate breakpoint sequence;

and determining the position of the breakpoint on the corrected candidate breakpoint sequence as the position of the candidate breakpoint.

3. The method of claim 1, wherein determining the location of the breakpoint on the breakpoint candidate sequence as the location of the candidate breakpoint comprises:

filtering out false-positive breakpoint sequences from the breakpoint candidate sequences according to the first marker sequence and the second marker sequence that support the breakpoint on the breakpoint reference sequence, and the paired sequences among the sequences to be aligned that span and support the breakpoint on the breakpoint reference sequence, so as to obtain filtered candidate breakpoint sequences;

and determining the position of the breakpoint on the filtered candidate breakpoint sequence as the position of the candidate breakpoint.
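The filtering in claim 3 can be pictured as a threshold on the evidence gathered for each candidate. The sketch below is only one possible reading: the claim names the three kinds of evidence but gives no cut-offs, so the thresholds `min_split` and `min_pairs`, the `BreakpointSupport` record, and the rule of summing first and second marker sequences are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BreakpointSupport:
    """Evidence counts for one candidate breakpoint (hypothetical record)."""
    name: str
    first_marker: int    # first marker sequences supporting the breakpoint
    second_marker: int   # second marker sequences supporting the breakpoint
    spanning_pairs: int  # paired sequences spanning the breakpoint


def filter_candidates(candidates, min_split=2, min_pairs=1):
    """Keep candidates with enough marker-sequence and read-pair support.

    The thresholds are illustrative assumptions, not values from the claims.
    """
    kept = []
    for c in candidates:
        split_total = c.first_marker + c.second_marker
        if split_total >= min_split and c.spanning_pairs >= min_pairs:
            kept.append(c)
    return kept


if __name__ == "__main__":
    candidates = [
        BreakpointSupport("bp1", first_marker=4, second_marker=3, spanning_pairs=5),
        BreakpointSupport("bp2", first_marker=1, second_marker=0, spanning_pairs=0),
    ]
    print([c.name for c in filter_candidates(candidates)])
```

Requiring both kinds of evidence is a conservative choice; a pipeline tuned for low-input samples might accept either kind alone.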

4. The method of claim 1, wherein assembling those of the sequences to be aligned that support the position of the candidate breakpoint comprises:

assembling according to the first marker sequence and the second marker sequence that support the breakpoint on the breakpoint reference sequence, and the paired sequences among the sequences to be aligned that span and support the breakpoint on the breakpoint reference sequence, retaining the breakpoints in the assembly result that are consistent with the sequence information at the position of the candidate breakpoint, and recording them as the breakpoints of the gene rearrangement.

5. The method of any one of claims 1 to 4, wherein obtaining the sequences to be aligned from the sample to be tested comprises:

constructing a sequencing library of the sample to be tested;

performing high-throughput sequencing on the sequencing library to obtain sequencing data;

and preprocessing the sequencing data to obtain the sequences to be aligned from the sample to be tested.

6. The method of claim 5, wherein the sequencing library is a hybrid capture library.

7. The method of claim 6, wherein the hybrid capture library is obtained using capture probes of SEQ ID NO: 1 to SEQ ID NO: 36.

8. The method according to any one of claims 1 to 4, wherein after obtaining the breakpoint of the gene rearrangement, the method further comprises a step of quantifying the gene in which rearrangement occurs, the step of quantifying comprising:

counting, according to the sequence information of the breakpoint of the gene rearrangement, the number of sequences among the sequences to be aligned that support the breakpoint of the gene rearrangement, and recording this number as the marker sequence number;

and dividing the marker sequence number by the number of sequences of the internal reference gene to obtain a ratio, which is the expression abundance of the rearranged gene relative to the internal reference gene.
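The ratio in claim 8 is a single division, shown here as a minimal Python sketch. The function name `relative_abundance` and the example counts are hypothetical; how the internal reference gene's sequences are counted is not specified here and is left to the caller.

```python
def relative_abundance(marker_count, reference_count):
    """Abundance of the rearranged gene relative to the internal reference gene.

    marker_count: number of sequences supporting the rearrangement breakpoint.
    reference_count: number of sequences of the internal reference gene.
    """
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    return marker_count / reference_count


if __name__ == "__main__":
    # Made-up counts: 25 breakpoint-supporting sequences, 500 reference sequences.
    print(relative_abundance(25, 500))
```

Normalizing by an internal reference gene makes the value comparable across samples sequenced to different depths.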

9. An apparatus for detecting gene rearrangement, wherein the apparatus stores or runs a module, or the module is a component of the apparatus; wherein the module comprises one or more software modules configured to execute the method for detecting gene rearrangement according to any one of claims 1 to 8.

10. A storage medium comprising a stored program, wherein the program executes the method for detecting gene rearrangement according to any one of claims 1 to 8.

11. A processor configured to run a program, wherein the program when executed performs the method for detecting gene rearrangement according to any one of claims 1 to 8.