CN109712672A - Method, apparatus, storage medium and processor for detecting gene rearrangement - Google Patents

Info

Publication number
CN109712672A
Authority
CN
China
Prior art keywords
sequence
breakpoint
compared
candidate point
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811643484.6A
Other languages
Chinese (zh)
Other versions
CN109712672B (en
Inventor
王彬安
刘洋洋
李富威
王建伟
伍启熹
刘倩
刘珂弟
唐宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing You Xun Medical Laboratory Laboratory Co Ltd
Original Assignee
Beijing You Xun Medical Laboratory Laboratory Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing You Xun Medical Laboratory Laboratory Co Ltd
Priority to CN201811643484.6A
Publication of CN109712672A
Application granted
Publication of CN109712672B
Legal status: Active
Anticipated expiration

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention provides a method, an apparatus, a storage medium and a processor for detecting gene rearrangement. The method comprises: obtaining sequences to be aligned from a sample to be tested; aligning the sequences to be aligned against a reference genome to obtain abnormally aligned sequences, which include sequences aligned at abnormal positions, sequences aligned in abnormal directions, and sequences that do not align to the reference genome; determining the position of a candidate breakpoint according to the alignment positions and alignment directions of the abnormally aligned sequences on the reference genome; and assembling those sequences to be aligned that support the position of the candidate breakpoint, and retaining the breakpoints in the assembly result whose sequence information is consistent with the position of the candidate breakpoint as breakpoints of the gene rearrangement. The application solves the problem that the prior art has difficulty detecting the breakpoint position at which a gene rearrangement occurs.

Description

Method, apparatus, storage medium and processor for detecting gene rearrangement
Technical field
The present invention relates to the field of genetic variation detection, and in particular to a method, an apparatus, a storage medium and a processor for detecting gene rearrangement.
Background art
The prior art generally detects gene rearrangement by RT-nested PCR, as follows: specific probes are prepared on the basis of known target gene sequences, and the gene rearrangement is detected with them. A nested PCR reaction involves two rounds of PCR amplification, which reduces the likelihood of amplifying multiple target sites (because few sequences are complementary to both sets of primers) and increases the sensitivity of detection. In addition, two pairs of PCR primers must pair with the template, which increases the reliability of detection. Since the second set of primers lies inside the first-round PCR product, and a non-target fragment is very unlikely to contain binding sites for both sets of primers, the second set of primers cannot amplify non-target fragments. Nested amplification thus ensures that the second-round PCR product is almost or entirely free of contamination from non-specific amplification caused by primer pairing of poor specificity.
However, detecting gene rearrangement by RT-nested PCR has the following disadvantages: 1) the structure of the gene rearrangement cannot be determined accurately; 2) because it is limited by the primers and probes, it cannot detect unknown rearrangement types; 3) detailed sequence information of the junction region where the rearranged gene is broken and rejoined cannot be obtained.
Therefore, the existing detection method needs to be improved.
Summary of the invention
The main purpose of the present invention is to provide a method, an apparatus, a storage medium and a processor for detecting gene rearrangement, so as to solve the problem that the prior art has difficulty detecting the breakpoint position at which a gene rearrangement occurs.
To achieve the above purpose, according to one aspect of the present invention, a method for detecting gene rearrangement is provided. The method comprises: obtaining sequences to be aligned from a sample to be tested; aligning the sequences to be aligned against a reference genome to obtain abnormally aligned sequences, which include sequences aligned at abnormal positions, sequences aligned in abnormal directions, and sequences that do not align to the reference genome; determining the position of a candidate breakpoint according to the alignment positions and alignment directions of the abnormally aligned sequences on the reference genome; and assembling those sequences to be aligned that support the position of the candidate breakpoint, and retaining the breakpoints in the assembly result whose sequence information is consistent with the position of the candidate breakpoint as breakpoints of the gene rearrangement.
Further, determining the position of the candidate breakpoint according to the alignment positions and alignment directions of the abnormally aligned sequences on the reference genome comprises: splitting the abnormally aligned sequences and realigning the resulting segments to the reference genome, and determining the position of the candidate breakpoint according to the alignment positions and alignment directions of the split abnormally aligned sequences on the reference genome.
Further, determining the position of the candidate breakpoint according to the alignment positions and alignment directions of the abnormally aligned sequences on the reference genome comprises: splitting the abnormally aligned sequences and realigning the resulting segments to the reference genome; obtaining sequences that span a first length on both sides of a potential breakpoint, denoted first marker sequences, and sequences that span both sides of the potential breakpoint but whose length is less than a second length, denoted second marker sequences; simulating, according to the position of the potential breakpoint on the first marker sequences, a breakpoint reference sequence in which the gene rearrangement has occurred; aligning the sequences to be aligned against the breakpoint reference sequence, and marking the sequences that align to the breakpoint reference sequence and span the breakpoint on it as breakpoint candidate sequences supporting the breakpoint; and determining the position of the breakpoint on the breakpoint candidate sequences as the position of the candidate breakpoint.
Further, determining the position of the breakpoint on the breakpoint candidate sequences as the position of the candidate breakpoint comprises: correcting the breakpoint candidate sequences according to sequencing quality and the number of supporting sequences to obtain corrected candidate breakpoint sequences; and determining the position of the breakpoint on the corrected candidate breakpoint sequences as the position of the candidate breakpoint.
Further, determining the position of the breakpoint on the breakpoint candidate sequences as the position of the candidate breakpoint comprises: filtering false-positive breakpoint sequences out of the breakpoint candidate sequences according to the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence, and the paired sequences among the sequences to be aligned that support spanning the breakpoint on the breakpoint reference sequence, to obtain filtered candidate breakpoint sequences; and determining the position of the breakpoint on the filtered candidate breakpoint sequences as the position of the candidate breakpoint.
Further, assembling the sequences to be aligned that support the position of the candidate breakpoint comprises: assembling the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence together with the paired sequences among the sequences to be aligned that support spanning the breakpoint on the breakpoint reference sequence, and retaining the breakpoints in the assembly result whose sequence information is consistent with the position of the candidate breakpoint as breakpoints of the gene rearrangement.
Further, obtaining the sequences to be aligned from the sample to be tested comprises: constructing a sequencing library of the sample to be tested; performing high-throughput sequencing on the sequencing library to obtain sequencing data; and preprocessing the sequencing data to obtain the sequences to be aligned of the sample to be tested.
Further, the sequencing library is a hybrid capture library, preferably obtained with the capture probes of SEQ ID NO:1 to SEQ ID NO:36.
Further, after the breakpoint of the gene rearrangement is obtained, the method further comprises a step of quantifying the rearranged gene. The quantifying step comprises: counting, according to the sequence information of the breakpoint of the gene rearrangement, the number of sequences among the sequences to be aligned that support the breakpoint, denoted the marker sequence number; and dividing the marker sequence number by the number of sequences of a reference gene, the resulting ratio being the expression abundance of the rearranged gene relative to the reference gene.
To achieve the above purpose, according to a second aspect of the present invention, an apparatus for detecting gene rearrangement is provided. The apparatus is configured to store or run modules, or the modules are components of the apparatus; the modules are software modules, of which there are one or more, and the software modules are configured to execute any of the above methods for detecting gene rearrangement.
According to a third aspect of the present invention, a storage medium is provided. The storage medium includes a stored program, wherein the program executes any of the above methods for detecting gene rearrangement.
According to a fourth aspect of the present invention, a processor is provided. The processor is configured to run a program, wherein the program, when run, executes any of the above methods for detecting gene rearrangement.
By applying the technical solution of the present invention, the position of a gene rearrangement is detected from high-throughput sequencing data: sequences among the sequences to be aligned that align abnormally to the reference genome are used to determine the candidate breakpoint positions of the rearrangement, and sequences assembled from the sequences to be aligned are then used to verify which candidate breakpoint positions are reliable, so that the breakpoint position of the gene rearrangement can be detected accurately. Correspondingly, the sequence information at the breakpoint position is also accurately known, which provides a basis for further verifying the breakpoint position by conventional PCR. Therefore, the method of the present application can not only detect known or unknown rearrangement types, but can also accurately detect the specific position at which the rearrangement occurs and the corresponding sequence information. The method uses NGS sequencing data directly and is developed on the basis of statistics and algorithms, so it adds no extra experimental detection cost. In addition, the method has high detection accuracy and low cost, and is suitable for detecting structural rearrangements of low-abundance genes.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the present invention. The illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 shows a simplified flow chart of detecting the breakpoint position of a gene rearrangement in a preferred embodiment of the present invention;
Fig. 2 shows a detailed flow chart of detecting the breakpoint position of a gene rearrangement in another preferred embodiment of the present invention; and
Fig. 3 and Fig. 4 show the sequencing results obtained when the breakpoint position detected by the method of Embodiment 1 of the present invention was verified by PCR and first-generation sequencing, where Fig. 3 shows the sequencing result for the forward primer and Fig. 4 shows the sequencing result for the reverse primer.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the embodiments.
As mentioned in the background section, when detecting rearranged genes the prior art can only determine that a rearrangement has occurred and cannot accurately determine the position at which it occurred. To improve this situation, a typical embodiment of the present application provides a method for detecting gene rearrangement. The method comprises: obtaining sequences to be aligned from a sample to be tested; aligning the sequences to be aligned against a reference genome to obtain abnormally aligned sequences, which include sequences aligned at abnormal positions, sequences aligned in abnormal directions, and sequences that do not align to the reference genome; determining the position of a candidate breakpoint according to the alignment positions and alignment directions of the abnormally aligned sequences on the reference genome; and assembling those sequences to be aligned that support the position of the candidate breakpoint, and retaining the breakpoints in the assembly result whose sequence information is consistent with the position of the candidate breakpoint as breakpoints of the gene rearrangement.
The method for detecting gene rearrangement provided by the present application detects the position of a gene rearrangement from high-throughput sequencing data: sequences among the sequences to be aligned that align abnormally to the reference genome are used to determine the candidate breakpoint positions of the rearrangement, and sequences assembled from the sequences to be aligned are then used to verify which candidate breakpoint positions are reliable, so that the breakpoint position of the gene rearrangement can be detected accurately. Correspondingly, the sequence information at the breakpoint position is also accurately known, which provides a basis for further verifying the breakpoint position by conventional PCR. Therefore, the method of the present application can not only detect known or unknown rearrangement types, but can also accurately detect the specific position at which the rearrangement occurs and the corresponding sequence information. The method uses NGS sequencing data directly and is developed on the basis of statistics and algorithms, so it adds no extra experimental detection cost. In addition, the method has high detection accuracy and low cost, and is suitable for detecting structural rearrangements of low-abundance genes.
It should be noted that the sequences to be aligned of the sample to be tested may be formed by processing the raw sequencing data of the sample, or may be existing, ready-made sequences that can be used directly for alignment. The above method adds a verification of the candidate breakpoint positions using the assembled sequences, which makes the breakpoint positions more accurate.
Among the sequences to be aligned, some can be aligned to sequences on the reference genome, while others cannot be aligned directly to the reference genome because a gene rearrangement has occurred; the latter are therefore called abnormally aligned sequences. Abnormally aligned sequences include sequences aligned at abnormal positions (such as forward tandem repeat sequences), sequences aligned in abnormal directions (such as inverted tandem repeat sequences), and sequences that do not align to the reference genome (such as inserted or deleted sequences). From the alignment positions and alignment directions of these abnormally aligned sequences on the reference genome, the potential breakpoint positions can be determined by existing methods. For example, when sequences align to abnormal positions on the same chromosome because an inversion has occurred between them, the potential breakpoint position is determined from the abnormal alignment direction; when sequences align to positions on different chromosomes because a translocation has occurred, the potential breakpoint position is likewise determined from the alignment direction.
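As a minimal sketch of the classification described above, the following Python fragment sorts a read pair into the three abnormal categories or marks it as normal. The `Alignment` record, the `classify_pair` function and the `max_insert` cutoff are hypothetical names introduced for illustration; the patent does not specify an implementation, and a real pipeline would read these fields from an aligner's output and would also treat outward-facing pairs as abnormal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alignment:
    chrom: Optional[str]  # None when the read did not align to the reference
    pos: int              # 1-based leftmost reference position
    strand: str           # "+" or "-"


def classify_pair(r1: Alignment, r2: Alignment, max_insert: int = 1000) -> str:
    """Classify one read pair (illustrative rules, not the patent's exact ones)."""
    if r1.chrom is None or r2.chrom is None:
        return "unaligned"            # does not align to the reference genome
    if r1.chrom != r2.chrom:
        return "abnormal_position"    # mates on different chromosomes
    if r1.strand == r2.strand:
        return "abnormal_direction"   # mates should lie on opposite strands
    if abs(r2.pos - r1.pos) > max_insert:
        return "abnormal_position"    # mates too far apart on one chromosome
    return "normal"
```

Pairs labelled anything other than `"normal"` would form the pool from which potential breakpoints are inferred.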
In some preferred embodiments, determining the position of the candidate breakpoint according to the alignment positions and alignment directions of the abnormally aligned sequences on the reference genome comprises: splitting the abnormally aligned sequences and realigning the resulting segments to the reference genome, and determining the position of the candidate breakpoint according to the alignment positions and alignment directions of the split abnormally aligned sequences on the reference genome.
Specifically, existing software for split alignment includes bwa, hisat2 and STAR. These programs use a relatively permissive alignment method to align each split segment to its possible positions on the reference genome, so that the final alignment position and alignment direction can be determined.
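To illustrate how a split alignment points at a breakpoint, the sketch below takes two consecutive segments of one read and returns the two reference bases that flank the junction. The `Segment` record and `junction_ends` function are hypothetical; the sketch assumes each segment aligns without gaps and does not parse the output of any particular aligner.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Segment:
    length: int     # number of read bases in this segment
    chrom: str
    ref_start: int  # 1-based leftmost reference position of the segment
    strand: str     # "+" or "-"


def junction_ends(first: Segment, second: Segment) -> Tuple[Tuple[str, int], Tuple[str, int]]:
    """Reference bases on each side of the junction between two read segments.

    `first` precedes `second` in read order. On the "+" strand the last read
    base of a segment maps to its rightmost reference position; on the "-"
    strand it maps to the leftmost one, and the reverse holds for the first
    read base.
    """
    if first.strand == "+":
        left = first.ref_start + first.length - 1
    else:
        left = first.ref_start
    if second.strand == "+":
        right = second.ref_start
    else:
        right = second.ref_start + second.length - 1
    return (first.chrom, left), (second.chrom, right)
```

For a 100 bp read whose first 60 bases align to one chromosome and whose last 40 bases align to another, the two returned coordinates are the potential breakpoint positions on the two chromosomes.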
In some preferred embodiments, determining the position of the candidate breakpoint according to the alignment positions and alignment directions of the abnormally aligned sequences on the reference genome comprises: splitting the abnormally aligned sequences and realigning the resulting segments to the reference genome; obtaining sequences that span a first length on both sides of a potential breakpoint, denoted first marker sequences, and sequences that span both sides of the potential breakpoint but whose length is less than a second length, denoted second marker sequences; simulating, according to the position of the potential breakpoint on the first marker sequences, a breakpoint reference sequence in which the gene rearrangement has occurred; aligning the sequences to be aligned against the breakpoint reference sequence, and marking the sequences that align to the breakpoint reference sequence and span the breakpoint on it as breakpoint candidate sequences supporting the breakpoint; and determining the position of the breakpoint on the breakpoint candidate sequences as the position of the candidate breakpoint.
In paired-end sequencing data there are sequencing reads in two directions. Considering the reads from one end: if a read is split into two or three segments that are realigned to the reference genome, and each segment aligns to a different position or direction on the reference genome, then the potential breakpoint position of the gene rearrangement can be inferred from the specific position of the split. Dividing the sequences into first marker sequences and second marker sequences, and using them to construct a simulated breakpoint reference sequence for realignment, helps to obtain more potential breakpoint-spanning sequences as well as normally aligned paired sequences that support spanning the breakpoint. The sequences that support the breakpoint position on the breakpoint reference sequence are then used to determine the candidate breakpoint sequences, so that the screened candidate breakpoints have relatively high accuracy. The first length that a first marker sequence spans on each side of the potential breakpoint can be set reasonably according to the sequence length, for example to 20 to 25 bp. Likewise, the second length that bounds the second marker sequences can be set reasonably according to the sequence length, for example to 10 to 20 bp.
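The two ideas in this passage, dividing breakpoint-spanning reads into first and second marker sequences and building a simulated breakpoint reference, can be sketched as follows. This is only one possible reading: the function names are hypothetical, the "length" is assumed to mean the shorter flank of the read around the junction, both thresholds default to 20 bp (a value inside both stated ranges), and only forward-strand joins are handled.

```python
from typing import Tuple


def marker_class(read_len: int, junction_offset: int,
                 first_len: int = 20, second_len: int = 20) -> str:
    """Label a read by how far it extends on each side of the junction.

    junction_offset is the number of read bases before the junction.
    """
    shorter = min(junction_offset, read_len - junction_offset)
    if shorter <= 0:
        return "none"    # the read does not span the junction at all
    if shorter >= first_len:
        return "first"   # well anchored on both sides
    if shorter < second_len:
        return "second"  # spans the junction, but only by a short flank
    return "none"


def simulate_breakpoint_reference(upstream: str, downstream: str,
                                  left_end: int, right_start: int,
                                  flank: int = 100) -> Tuple[str, int]:
    """Join the flank ending at left_end to the flank starting at right_start.

    Both indices are 0-based positions in the given sequences. Returns the
    simulated sequence and the offset of the junction within it.
    """
    left = upstream[max(0, left_end - flank + 1): left_end + 1]
    right = downstream[right_start: right_start + flank]
    return left + right, len(left)
```

The simulated sequence can then serve as an extra reference for realigning all reads, so that reads crossing the junction align end to end instead of being clipped.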
To further improve the accuracy of the breakpoint position, the candidate breakpoints can be further corrected and filtered for false positives according to the sequencing depth and sequencing strategy of the sequencing data of the sample to be tested, so as to retain the breakpoint positions that are more likely to be genuine.
In some preferred embodiments, determining the position of the breakpoint on the breakpoint candidate sequences as the position of the candidate breakpoint comprises: correcting the breakpoint candidate sequences according to sequencing quality and the number of supporting sequences to obtain corrected candidate breakpoint sequences; and determining the position of the breakpoint on the corrected candidate breakpoint sequences as the position of the candidate breakpoint.
Specifically, for example, when the mean sequencing depth reaches 1000×, breakpoint base correction can be carried out if the sequences spanning the breakpoint reach 2% or more of the mean depth, that is, 20× or more; the breakpoint is corrected using the alignment position relationships at the breakpoint of the simulated reference sequence and the base qualities of the alignments. Breakpoints whose supporting spanning sequences fall below 20× have a higher false-positive rate and are usually removed.
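The depth rule in this example can be written down directly. In the sketch below, `breakpoint_action` encodes the 2% threshold from the text, while `consensus_base` is an assumed, simplified stand-in for base correction (a quality-weighted vote at one position); the patent does not describe the correction at this level of detail, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def breakpoint_action(spanning_reads: int, mean_depth: int, min_percent: int = 2) -> str:
    """Correct a breakpoint with enough spanning reads, otherwise discard it."""
    # Integer comparison avoids floating-point rounding at the threshold.
    if spanning_reads * 100 >= mean_depth * min_percent:
        return "correct"
    return "discard"


def consensus_base(observations: Iterable[Tuple[str, int]]) -> str:
    """Pick the base with the highest summed Phred quality at one position."""
    weights = defaultdict(int)
    for base, quality in observations:
        weights[base] += quality
    # Sorting first makes ties resolve alphabetically and deterministically.
    return max(sorted(weights), key=weights.get)
```

With a mean depth of 1000, a candidate supported by 20 spanning reads is corrected and one supported by 19 is discarded.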
In some preferred embodiments, determining the position of the breakpoint on the breakpoint candidate sequences as the position of the candidate breakpoint comprises: filtering false-positive breakpoint sequences out of the breakpoint candidate sequences according to the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence, and the paired sequences among the sequences to be aligned that support spanning the breakpoint on the breakpoint reference sequence, to obtain filtered candidate breakpoint sequences; and determining the position of the breakpoint on the filtered candidate breakpoint sequences as the position of the candidate breakpoint.
Specifically, for example, breakpoints are retained for which the number of first marker sequences is greater than 10 and the number of paired sequences among the sequences to be aligned that support spanning the breakpoint on the breakpoint reference sequence is greater than 50. Of course, the specific values here may be adjusted appropriately for different sequencing samples; they are given only as examples.
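The example thresholds translate into a simple filter. The `CandidateSupport` record and `filter_candidates` function below are hypothetical names; the default cutoffs of 10 and 50 are the illustrative values from the text and would be tuned per sample.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateSupport:
    name: str                # label for the candidate breakpoint
    first_marker_reads: int  # number of first marker sequences
    spanning_pairs: int      # paired sequences that support spanning the breakpoint


def filter_candidates(candidates: List[CandidateSupport],
                      min_first: int = 10,
                      min_pairs: int = 50) -> List[CandidateSupport]:
    """Keep candidates whose support is strictly greater than both cutoffs."""
    return [
        c for c in candidates
        if c.first_marker_reads > min_first and c.spanning_pairs > min_pairs
    ]
```

Both comparisons are strict, matching the "greater than" wording of the example.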
In some preferred embodiments, assembling the sequences to be aligned that support the position of the candidate breakpoint comprises: assembling the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence together with the paired sequences among the sequences to be aligned that support spanning the breakpoint on the breakpoint reference sequence, and retaining the breakpoints in the assembly result whose sequence information is consistent with the position of the candidate breakpoint as breakpoints of the gene rearrangement.
Sequence assembly is carried out using the first marker sequences, the second marker sequences and the paired sequences that support the breakpoint, and the assembled sequences formed by de novo assembly are used to verify the candidate breakpoint positions again, so that the finally determined breakpoint position of the gene rearrangement is more accurate.
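The verification step can be illustrated with a toy greedy overlap assembler. This is a deliberately simple stand-in, since the patent does not name an assembler: the function names are hypothetical, sequencing errors and reverse complements are ignored, and a production pipeline would use a dedicated de novo assembly tool instead.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b, or 0 if below min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_len: int = 4) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def confirms_breakpoint(contigs: List[str], junction_seq: str) -> bool:
    """A candidate is kept only if some contig contains its junction sequence."""
    return any(junction_seq in contig for contig in contigs)
```

Here `junction_seq` stands for a short stretch of the simulated breakpoint reference centred on the candidate breakpoint; a candidate whose junction never appears in the assembled contigs would be dropped.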
As stated above, the sequences to be aligned of the sample to be tested in the present application may be existing sequences that can be used directly for alignment, or may be sequences to be aligned obtained by processing the raw data produced by sequencing. In some preferred embodiments, obtaining the sequences to be aligned from the sample to be tested comprises: constructing a sequencing library of the sample to be tested; performing high-throughput sequencing on the sequencing library to obtain sequencing data; and preprocessing the sequencing data to obtain the sequences to be aligned of the sample to be tested.
In some preferred embodiments, the sequencing library is a hybrid capture library, preferably obtained with the capture probes of SEQ ID NO:1 to SEQ ID NO:36. With a hybrid capture library, gene rearrangement detection can be performed on the sequencing data of the target gene. The capture probes of SEQ ID NO:1 to SEQ ID NO:36 can capture the full exon sequence of the MLL gene, and can therefore be used to detect the positions of exon rearrangements of this gene and the corresponding sequence information.
The above method of the present application can accurately detect the breakpoint position of a rearrangement of the target gene. Depending on the research purpose, the expression level of the detected mutant gene can also be measured using the sequences to be aligned of the sample to be tested. In some preferred embodiments, after the breakpoint of the gene rearrangement is obtained, the method further comprises a step of quantifying the rearranged gene. The quantifying step comprises: counting, according to the sequence information of the breakpoint of the gene rearrangement, the number of sequences among the sequences to be aligned that support the breakpoint, denoted the marker sequence number; and dividing the marker sequence number by the number of sequences of a reference gene, the resulting ratio being the expression abundance of the rearranged gene relative to the reference gene. Measuring the expression level of a rearranged gene reflects its expression under specific conditions or in a specific treatment state; in turn, measuring the expression level of the gene under a series of different conditions or states reflects the differences in its expression. The reference gene can be chosen reasonably according to actual needs; for example, when the gene being detected is the MLL gene, the ABL1 gene can usually be chosen as the reference gene.
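The quantification step reduces to a single ratio. A minimal sketch, with a hypothetical function name and a guard against an empty reference count that the text does not mention:

```python
def relative_abundance(marker_reads: int, reference_gene_reads: int) -> float:
    """Breakpoint-supporting read count divided by the reference gene read count."""
    if reference_gene_reads <= 0:
        raise ValueError("reference gene read count must be positive")
    return marker_reads / reference_gene_reads
```

For instance, 30 breakpoint-supporting reads against 600 reads for the reference gene would give a relative abundance of 0.05; these counts are made up for illustration.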
It should be noted that for the various method embodiments described above, for simple description, therefore, it is stated as a series of Combination of actions, but those skilled in the art should understand that, the present invention is not limited by the sequence of acts described because According to the present invention, some steps may be performed in other sequences or simultaneously.Secondly, those skilled in the art should also know It knows, the embodiments described in the specification are all preferred embodiments, and related movement not necessarily present invention institute is necessary 's.
Through the above description of the embodiments, those skilled in the art can be understood that according to above-mentioned implementation The method of example can be realized by means of software and necessary general hardware platform, naturally it is also possible to by hardware, but it is very much In the case of the former be more preferably embodiment.Based on this understanding, technical solution of the present invention is substantially in other words to existing The part that technology contributes can be embodied in the form of software products, which is stored in a storage In medium (such as ROM/RAM, magnetic disk, CD), including some instructions are used so that calculating equipment executes each embodiment of the present invention The method, or make processor to execute method described in each embodiment of the present invention.
In second of the application typical embodiment, a kind of device for detecting gene rearrangement is provided, device is used for Storage perhaps runs module or module is the component part of device;Wherein, module is software module, and software module is one Or it is multiple, software module is for executing any of the above-described kind of method.Gene can not only more accurately be detected using the device The breakpoint location of rearrangement, and the corresponding sequence information of breakpoint location can be obtained, and then convenient for detecting according to its sequence information Its relative expression quantity, the practicality and the scope of application are wider, and any there are the mutant genes of gene rearrangement phenomenon to use Above-mentioned apparatus is detected.
Preferably, above-mentioned apparatus includes: to obtain module, comparison module, candidate block and assembling determining module, obtains module For obtaining the sequence to be compared of sample to be tested;Comparison module is used to for sequence to be compared being compared with reference to genome, obtains To abnormal aligned sequences, abnormal aligned sequences include comparing the sequence of malposition, compare the sequence of direction exception and do not compare On with reference to genome sequence;Candidate block is used for comparison position and comparison according to abnormal aligned sequences on reference genome Direction determines the position of Candidate point;Determining module is assembled to be used to utilize the position for supporting Candidate point in sequence to be compared Sequence is assembled, and is retained the consistent breakpoint of sequence information in assembling result with the position of Candidate point, is denoted as gene rearrangement Breakpoint.
In a kind of preferred embodiment, above-mentioned candidate block includes: cutting comparison module and candidate determining module, cutting Comparison module is used for, again with reference genome alignment, candidate determining module is for root after the progress sequence cutting of abnormal aligned sequences According to comparison position of the abnormal aligned sequences after cutting on reference genome and direction is compared, determines the position of Candidate point.
In a kind of preferred embodiment, above-mentioned candidate block includes: cutting mark module, analog module, compares label Module and Candidate point module, cutting mark module be used for by abnormal aligned sequences carry out sequence cutting after again with reference genome It compares, obtains the sequence that can cross over potential the first length of breakpoint two sides simultaneously, be denoted as the first flag sequence, and can be simultaneously across latent In breakpoint two sides, but length less than the second length sequence as the second flag sequence;Analog module is used for according to the first label The breakpoint reference sequences of gene rearrangement occur for the position simulation of the potential breakpoint in sequence;Comparison mark module is used for will be to be compared Sequence is compared with breakpoint reference sequences, and to the breakpoint that can be compared on top broken-point reference sequences and across breakpoint reference sequences Sequence be marked, be denoted as support breakpoint breakpoint candidate sequence;Candidate point module is used for will be on breakpoint candidate sequence The position of breakpoint is determined as the position of Candidate point.
In a preferred embodiment, the candidate breakpoint module comprises a breakpoint correction module and a correction determination module. The breakpoint correction module is used to correct the candidate breakpoint sequences according to sequencing quality and the number of supporting sequences, obtaining corrected candidate breakpoint sequences. The correction determination module is used to determine the position of the breakpoint in the corrected candidate breakpoint sequences as the position of the candidate breakpoint.
In a preferred embodiment, the candidate breakpoint module comprises a breakpoint filtering module and a filtering determination module. The breakpoint filtering module is used to filter out false-positive breakpoint sequences from the candidate breakpoint sequences according to the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence, and the paired sequences among the sequences to be aligned that support spanning the breakpoint on the breakpoint reference sequence, obtaining filtered candidate breakpoint sequences. The filtering determination module is used to determine the position of the breakpoint in the filtered candidate breakpoint sequences as the position of the candidate breakpoint.
In a preferred embodiment, the assembly determination module comprises an assembly submodule and a retention module. The assembly submodule is used to perform assembly from the first marker sequences and second marker sequences that support the breakpoint on the breakpoint reference sequence, and the paired sequences among the sequences to be aligned that support spanning the breakpoint on the breakpoint reference sequence. The retention module is used to retain the breakpoints in the assembly result whose sequence information is consistent with the candidate breakpoint positions, recording them as the breakpoints of the gene rearrangement.
In a preferred embodiment, the acquisition module comprises a construction module, a sequencing module and a preprocessing module. The construction module is used to construct a sequencing library from the sample to be tested. The sequencing module is used to perform high-throughput sequencing on the sequencing library to obtain sequencing data. The preprocessing module is used to preprocess the sequencing data to obtain the sequences to be aligned from the sample to be tested.
In a preferred embodiment, the sequencing library is a hybrid capture library, preferably one obtained with the capture probes of SEQ ID NO:1 to SEQ ID NO:36.
In a preferred embodiment, the apparatus further comprises a quantification module for quantifying the rearranged gene. The quantification module comprises a counting module and an expression calculation module. The counting module is used to count, according to the sequence information of the breakpoint of the gene rearrangement, the number of sequences among the sequences to be aligned that support that breakpoint, recorded as the marker sequence number. The expression calculation module is used to divide the marker sequence number by the sequence number of an internal reference gene; the resulting ratio is the expression abundance of the rearranged gene relative to the internal reference gene.
In a third typical embodiment of the present application, a storage medium is provided. The storage medium includes a stored program, wherein the program executes any of the above methods for detecting gene rearrangement.
In a fourth typical embodiment of the present application, a processor is provided. The processor is used to run a program, wherein the program, when run, executes any of the above methods for detecting gene rearrangement.
The above storage medium, processor and apparatus can be used by a computer to execute the above method for detecting gene rearrangement and to output the corresponding detection result. These products detect gene rearrangement without adding any extra experimental or sequencing cost, and the apparatus offers low detection cost and high accuracy.
In the several embodiments provided herein, it should be understood that the disclosed technical content can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computing device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk or an optical disc.
The beneficial effects of the present application are further illustrated below with reference to specific embodiments.
Embodiment 1: Method for detecting MLL-PTD gene rearrangement
1. Samples and data
1) Bone marrow or peripheral blood is drawn from the patient and stored in a collection tube.
2) Nucleic acid is extracted from the sample, and the remaining sample is stored at -80 °C.
3) A sequencing library is constructed, and the target region is enriched by hybrid capture (the hybrid capture probes for the MLL gene correspond to the 36 exons of the gene; the specific sequences are given in Table 1 below).
4) The captured library is sequenced on the sequencer.
Table 1:
2. Preprocessing of sequencing data
1) Data quality control
Low-quality sequences are removed: sequences containing 5 or more N bases are removed, and sequences in which the average sequencing quality of 40 consecutive nucleotides is below Q20 are also removed.
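The two quality-control rules above can be sketched as a small read filter. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function name `passes_qc` and its parameters are hypothetical, the rule is read here as "reject the read if any window of 40 consecutive bases has a mean quality below Q20", and the handling of reads shorter than 40 bases (judged on their overall mean quality) is likewise an assumption.

```python
def passes_qc(seq, quals, max_n=4, window=40, min_mean_q=20):
    """Return True if a read survives the two QC rules described in the text.

    seq   : read bases as a string
    quals : per-base Phred quality scores (list of ints, same length as seq)
    Assumptions (not stated in the source): a read is rejected as soon as ANY
    window of `window` consecutive bases has a mean quality below `min_mean_q`;
    reads shorter than `window` are judged on their overall mean quality.
    """
    # Rule 1: remove reads containing 5 or more N bases.
    if seq.upper().count("N") > max_n:
        return False

    # Reads shorter than one window (assumed handling).
    if len(quals) < window:
        return sum(quals) / max(len(quals), 1) >= min_mean_q

    # Rule 2: sliding-window mean quality, kept as a running sum.
    threshold = min_mean_q * window
    running = sum(quals[:window])
    if running < threshold:
        return False
    for i in range(window, len(quals)):
        running += quals[i] - quals[i - window]
        if running < threshold:
            return False
    return True
```

A read of 100 bases at Q30 throughout passes, while the same read with a 40-base stretch at Q10 is rejected.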
2) Alignment to the MLL gene sequence
The high-quality sequences that pass quality control are aligned to the reference sequence with hisat2 for further analysis.
3. MLL-PTD identification
1) Principle and theoretical basis:
MLL-PTD alters the MLL gene (36 exons in total) at the molecular level, manifesting as a change in the order in which the exons are joined; the rearrangement usually occurs between exon2 and exon11.
2) MLL-PTD breakpoint identification:
The paired sequences are aligned first. Based on the positional relationship of the aligned sequences, sequence pairs with an abnormal positional relationship are used to find structural variation between the two sequences of the pair. At the same time, the abnormally aligned sequences are split and aligned to their possible positions with a more relaxed alignment method; the final alignment position and alignment orientation are determined, and the breakpoint position is calculated from the alignment positions of the split sequences. As shown in Fig. 1, among the sequences that span the breakpoint, those that span both sides of the breakpoint by the first length are taken as first marker sequences, and those that span it by the second length are taken as second marker sequences. A breakpoint reference sequence in which the PTD has occurred is simulated from the breakpoint positions of the marker sequences, the sequences are aligned again, and only candidate breakpoints that align well and are supported by sequences spanning the breakpoint are retained.
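The simulation step can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names, the `flank` parameter and the use of exact substring matching in place of a real aligner are all hypothetical simplifications, and the first and second lengths are not specified in this section, so the sketch only reports how many bases of a read fall on each side of the simulated breakpoint.

```python
def simulate_breakpoint_reference(upstream, downstream, flank=100):
    """Join the end of the upstream segment to the start of the downstream one.

    Returns (reference, breakpoint), where `breakpoint` is the 0-based index of
    the first base that comes from the downstream segment.
    """
    left = upstream[-flank:]
    right = downstream[:flank]
    return left + right, len(left)


def anchor_lengths(reference, breakpoint, read):
    """Return (left, right) anchor lengths of a read that crosses the breakpoint.

    Exact matching stands in for alignment here (an assumption); only the first
    match is considered. Returns None if the read does not match or does not
    cross the breakpoint.
    """
    start = reference.find(read)
    if start == -1:
        return None
    end = start + len(read)
    if start >= breakpoint or end <= breakpoint:
        return None  # read lies entirely on one side of the breakpoint
    return breakpoint - start, end - breakpoint


# Toy example with made-up sequences.
ref, bp = simulate_breakpoint_reference("AAAACCCC", "GGGGTTTT", flank=4)
print(ref, bp)                        # simulated junction and breakpoint index
print(anchor_lengths(ref, bp, "CCGG"))
```

The anchor lengths returned for each read could then be compared with the first and second lengths to label the read as a first or second marker sequence.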
3) Breakpoint correction
Because the sequences flanking a breakpoint are similar and may contain mutations or sequencing errors, the breakpoint is corrected, as shown in Fig. 2, according to the alignment score, the sequencing quality and the number of supporting sequences, and the best predicted breakpoint sequence is given as the candidate breakpoint.
4) Multi-factor filtering of false positives
Candidate breakpoints are further filtered for false positives according to the first marker sequences and second marker sequences that support the breakpoint and the paired sequences that support spanning the breakpoint. Then, as shown in Fig. 2, all sequences supporting the breakpoint are assembled, and the breakpoints whose assembly result is consistent with the breakpoint sequence information are retained, yielding reliable MLL-PTD structural information.
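The filtering and assembly check can be sketched as below. This is only an illustration under stated assumptions: the source gives no thresholds, so `min_first`, `min_second` and `min_pairs` are hypothetical values, the function names are invented, and "consistent with the assembly result" is approximated by requiring the junction sequence (or its reverse complement) to occur in at least one assembled contig.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]


def keep_breakpoint(first_marker, second_marker, spanning_pairs,
                    junction, contigs,
                    min_first=2, min_second=1, min_pairs=1):
    """Decide whether a candidate breakpoint is retained.

    first_marker, second_marker, spanning_pairs : supporting read counts
    junction : sequence around the candidate breakpoint
    contigs  : sequences assembled from all reads supporting the breakpoint
    The three thresholds are hypothetical; the source does not state them.
    """
    # Multi-factor support filter.
    if (first_marker < min_first or second_marker < min_second
            or spanning_pairs < min_pairs):
        return False

    # Assembly consistency: the junction must appear in some contig,
    # in either orientation.
    fwd = junction.upper()
    rev = reverse_complement(fwd)
    return any(fwd in c.upper() or rev in c.upper() for c in contigs)
```

A candidate with enough support whose junction appears in a contig only in reverse-complement orientation is still retained, while one with too few first marker sequences is dropped before the assembly check.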
4. MLL-PTD quantification
The marker sequence number of the MLL-PTD is divided by the sequencing depth of the internal reference gene ABL1 to obtain the abundance ratio relative to the ABL1 gene.
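The quantification is a single ratio, sketched below. The function name and the input values are hypothetical and chosen only for illustration; they are not data from this document.

```python
def relative_abundance(marker_reads, reference_depth):
    """Ratio of MLL-PTD marker reads to the depth of the internal reference gene.

    marker_reads    : number of reads supporting the MLL-PTD breakpoint
    reference_depth : sequencing depth of the reference gene (ABL1 in the text)
    """
    if reference_depth <= 0:
        raise ValueError("reference depth must be positive")
    return marker_reads / reference_depth


# Illustrative numbers only (not taken from the results tables).
ratio = relative_abundance(50, 1000)
print(f"{ratio:.2%}")
```

Formatting the ratio as a percentage matches the way the Ratio column is reported in Table 4.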
Specifically, 122 samples were tested according to the method of the present application shown in Fig. 2, and MLL-PTD variation was detected in 10 of them; the specific reported results are shown in Table 2 and Table 3 below.
Table 2:
Table 3:
Sample number SEQ ID NO: Fusion sequence *
A 37 AGAGGTCTCTGATGAGTCACTTTCTTGACC@cttttcttttggtttttgttttacagggat
B 38 AGAGGTCTCTGATGAGTCACTTTCTTGACC@cttttcttttggtttttgttttacagggat
C 39 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
D 40 CATCTTCTGAGCCAGCAATTGATGACTTGT@cttttcttttggtttttgttttacagggat
E 41 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
F 42 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
G 43 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttaaagtccactctgatcctgtggactcc
H 44 ATCTGAGCCAAAACCTAAGAATTGCTCATC@ctgattctggtggtggaggctgctttttct
I 45 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
J 46 ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
K 47 CATCTTCTGAGCCAGCAATTGATGACTTGT@cttttcttttggtttttgttttacagggat
* The fusion sequences in Table 3 are reverse-complement sequences. For example, for sample A (exon8->exon4), the lowercase letters represent the exon8 sequence and the uppercase letters represent the exon4 sequence.
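The notation used in Table 3 can be reproduced from the sequence listing. The sketch below assumes that SEQ ID NO:2 and SEQ ID NO:8 are the exon2 and exon8 sequences (an inference from the statement that the capture probes correspond to the 36 exons), and that each side of the `@` shows 30 bases, as in the table rows. The function names are hypothetical.

```python
# SEQ ID NO:2 (70 bp), assumed here to be MLL exon2.
EXON2 = ("gatgagcaattcttaggttttggctcagatgaagaagtcagagtgcgaag"
         "tcccacaaggtctccttcag")
# SEQ ID NO:8 (74 bp), assumed here to be MLL exon8.
EXON8 = ("gtccagagcagagcaaacagaaaaaagtggctccccgcccaagtatccct"
         "gtaaaacaaaaaccaaaagaaaag")

_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fusion_notation(upstream_exon, downstream_exon, flank=30):
    """Write an upstream->downstream exon junction in the Table 3 style.

    On the forward strand the junction is the end of the upstream exon followed
    by the start of the downstream exon. The table shows its reverse complement,
    so the downstream exon comes first (uppercase) and the upstream exon second
    (lowercase), separated by '@'.
    """
    down = reverse_complement(downstream_exon[:flank]).upper()
    up = reverse_complement(upstream_exon[-flank:]).lower()
    return down + "@" + up


# exon8->exon2 junction, the structure reported for samples such as C.
print(fusion_notation(EXON8, EXON2))
```

Under these assumptions the printed string is identical to the fusion sequence listed for sample C in Table 3.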
2. Sample C was selected, and the detected MLL-PTD breakpoint structure was verified by Sanger sequencing.
The information for the sample verified by PCR is shown in Table 4 below.
Table 4:
Sample number MLL-PTD structure Exon A Exon B Marker sequence number Ratio
C exon8->exon2 exon8 exon2 231 17.12%
The sequence information obtained by verification is:
ATCTGAGCCAAAACCTAAGAATTGCTCATC@cttttcttttggtttttgttttacagggat
(i.e. SEQ ID NO:39).
3. A breakpoint template sequence was generated according to the breakpoint position, and primers were designed within the 300 bp upstream and downstream of the breakpoint for PCR amplification.
4. The PCR product was of the expected size, with a single bright band. The PCR product was subjected to Sanger sequencing, and the sequencing chromatogram was clean.
5. The breakpoint structure could be identified from the Sanger sequencing result, and the bases flanking the breakpoint were fully consistent with the breakpoint sequence identified by the method of the present application (the forward primer sequencing result is shown in Fig. 3, and the reverse-complement sequencing result in Fig. 4).
It can be seen from the above description that the above embodiments of the present invention achieve the following technical effects: the method of the present application can not only detect known or unknown rearrangements, but can also accurately detect the specific position at which the rearrangement occurs and the corresponding sequence information. It is widely applicable and suitable for detecting rearrangements in any gene.
The method directly uses NGS sequencing data and is developed on the basis of statistics and algorithms, so it adds no extra experimental or detection cost. In addition, the method has high detection accuracy and low cost, and is suitable for detecting structural rearrangements of low-abundance genes.
From the above description of the embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Sequence listing
<110>Beijing You Xun Laboratory of medical test Co., Ltd
<120> Method, apparatus, storage medium and processor for detecting gene rearrangement
<130> PN102308YXYX
<160> 47
<170> SIPOSequenceListing 1.0
<210> 1
<211> 455
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 1
ctgcttcact tcacggggcg aacatggcgc acagctgtcg gtggcgcttc cccgcccgac 60
ccgggaccac cgggggcggc ggcggcgggg ggcgccgggg cctagggggc gccccgcggc 120
aacgcgtccc ggccctgctg cttccccccg ggcccccggt cggcggtggc ggccccgggg 180
cgcccccctc ccccccggct gtggcggccg cggcggcggc ggcgggaagc agcggggctg 240
gggttccagg gggagcggcc gccgcctcag cagcctcctc gtcgtccgcc tcgtcttcgt 300
cttcgtcatc gtcctcagcc tcttcagggc cggccctgct ccgggtgggc ccgggcttcg 360
acgcggcgct gcaggtctcg gccgccatcg gcaccaacct gcgccggttc cgggccgtgt 420
ttggggagag cggcggggga ggcggcagcg gagag 455
<210> 2
<211> 70
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 2
gatgagcaat tcttaggttt tggctcagat gaagaagtca gagtgcgaag tcccacaagg 60
tctccttcag 70
<210> 3
<211> 2654
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 3
ttaaaactag tcctcgaaaa cctcgtggga gacctagaag tggctctgac cgaaattcag 60
ctatcctctc agatccatct gtgttttccc ctctaaataa atcagagacc aaatctggag 120
ataagatcaa gaagaaagat tctaaaagta tagaaaagaa gagaggaaga cctcccacct 180
tccctggagt aaaaatcaaa ataacacatg gaaaggacat ttcagagtta ccaaagggaa 240
acaaagaaga tagcctgaaa aaaattaaaa ggacaccttc tgctacgttt cagcaagcca 300
caaagattaa aaaattaaga gcaggtaaac tctctcctct caagtctaag tttaagacag 360
ggaagcttca aataggaagg aagggggtac aaattgtacg acggagagga aggcctccat 420
caacagaaag gataaagacc ccttcgggtc tcctcattaa ttctgaactg gaaaagcccc 480
agaaagtccg gaaagacaag gaaggaacac ctccacttac aaaagaagat aagacagttg 540
tcagacaaag ccctcgaagg attaagccag ttaggattat tccttcttca aaaaggacag 600
atgcaaccat tgctaagcaa ctcttacaga gggcaaaaaa gggggctcaa aagaaaattg 660
aaaaagaagc agctcagctg cagggaagaa aggtgaagac acaggtcaaa aatattcgac 720
agttcatcat gcctgttgtc agtgctatct cctcgcggat cattaagacc cctcggcggt 780
ttatagagga tgaggattat gaccctccaa ttaaaattgc ccgattagag tctacaccga 840
atagtagatt cagtgccccg tcctgtggat cttctgaaaa atcaagtgca gcttctcagc 900
actcctctca aatgtcttca gactcctctc gatctagtag ccccagtgtt gatacctcca 960
cagactctca ggcttctgag gagattcagg tacttcctga ggagcggagc gatacccctg 1020
aagttcatcc tccactgccc atttcccagt ccccagaaaa tgagagtaat gataggagaa 1080
gcagaaggta ttcagtgtcg gagagaagtt ttggatctag aacgacgaaa aaattatcaa 1140
ctctacaaag tgccccccag cagcagacct cctcgtctcc acctccacct ctgctgactc 1200
caccgccacc actgcagcca gcctccagta tctctgacca cacaccttgg cttatgcctc 1260
caacaatccc cttagcatca ccatttttgc ctgcttccac tgctcctatg caagggaagc 1320
gaaaatctat tttgcgagaa ccgacattta ggtggacttc tttaaagcat tctaggtcag 1380
agccacaata cttttcctca gcaaagtatg ccaaagaagg tcttattcgc aaaccaatat 1440
ttgataattt ccgaccccct ccactaactc ccgaggacgt tggctttgca tctggttttt 1500
ctgcatctgg taccgctgct tcagcccgat tgttttcgcc actccattct ggaacaaggt 1560
ttgatatgca caaaaggagc cctcttctga gagctccaag atttactcca agtgaggctc 1620
actctagaat atttgagtct gtaaccttgc ctagtaatcg aacttctgct ggaacatctt 1680
cttcaggagt atccaataga aaaaggaaaa gaaaagtgtt tagtcctatt cgatctgaac 1740
caagatctcc ttctcactcc atgaggacaa gaagtggaag gcttagtagt tctgagctct 1800
cacctctcac ccccccgtct tctgtctctt cctcgttaag catttctgtt agtcctcttg 1860
ccactagtgc cttaaaccca acttttactt ttccttctca ttccctgact cagtctgggg 1920
aatctgcaga gaaaaatcag agaccaagga agcagactag tgctccggca gagccatttt 1980
catcaagtag tcctactcct ctcttccctt ggtttacccc aggctctcag actgaaagag 2040
ggagaaataa agacaaggcc cccgaggagc tgtccaaaga tcgagatgct gacaagagcg 2100
tggagaagga caagagtaga gagagagacc gggagagaga aaaggagaat aagcgggagt 2160
caaggaaaga gaaaaggaaa aagggatcag aaattcagag tagttctgct ttgtatcctg 2220
tgggtagggt ttccaaagag aaggttgttg gtgaagatgt tgccacttca tcttctgcca 2280
aaaaagcaac agggcggaag aagtcttcat cacatgattc tgggactgat attacttctg 2340
tgactcttgg ggatacaaca gctgtcaaaa ccaaaatact tataaagaaa gggagaggaa 2400
atctggaaaa aaccaacttg gacctcggcc caactgcccc atccctggag aaggagaaaa 2460
ccctctgcct ttccactcct tcatctagca ctgttaaaca ttccacttcc tccataggct 2520
ccatgttggc tcaggcagac aagcttccaa tgactgacaa gagggttgcc agcctcctaa 2580
aaaaggccaa agctcagctc tgcaagattg agaagagtaa gagtcttaaa caaaccgacc 2640
agcccaaagc acag 2654
<210> 4
<211> 178
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 4
ggtcaagaaa gtgactcatc agagacctct gtgcgaggac cccggattaa acatgtctgc 60
agaagagcag ctgttgccct tggccgaaaa cgagctgtgt ttcctgatga catgcccacc 120
ctgagtgcct taccatggga agaacgagaa aagattttgt cttccatggg gaatgatg 178
<210> 5
<211> 235
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 5
acaagtcatc aattgctggc tcagaagatg ctgaacctct tgctccaccc atcaaaccaa 60
ttaaacctgt cactagaaac aaggcacccc aggaacctcc agtaaagaaa ggacgtcgat 120
cgaggcggtg tgggcagtgt cccggctgcc aggtgcctga ggactgtggt gtttgtacta 180
attgcttaga taagcccaag tttggtggtc gcaatataaa gaagcagtgc tgcaa 235
<210> 6
<211> 65
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 6
gatgagaaaa tgtcagaatc tacaatggat gccttccaaa gcctacctgc agaagcaagc 60
taaag 65
<210> 7
<211> 378
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 7
ctgtgaaaaa gaaagagaaa aagtctaaga ccagtgaaaa gaaagacagc aaagagagca 60
gtgttgtgaa gaacgtggtg gactctagtc agaaacctac cccatcagca agagaggatc 120
ctgccccaaa gaaaagcagt agtgagcctc ctccacgaaa gcccgtcgag gaaaagagtg 180
aagaagggaa tgtctcggcc cctgggcctg aatccaaaca ggccaccact ccagcttcca 240
ggaagtcaag caagcaggtc tcccagccag cactggtcat cccgcctcag ccacctacta 300
caggaccgcc aagaaaagaa gttcccaaaa ccactcctag tgagcccaag aaaaagcagc 360
ctccaccacc agaatcag 378
<210> 8
<211> 74
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 8
gtccagagca gagcaaacag aaaaaagtgg ctccccgccc aagtatccct gtaaaacaaa 60
aaccaaaaga aaag 74
<210> 9
<211> 132
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 9
gaaaaaccac ctccggtcaa taagcaggag aatgcaggca ctttgaacat cctcagcact 60
ctctccaatg gcaatagttc taagcaaaaa attccagcag atggagtcca caggatcaga 120
gtggacttta ag 132
<210> 10
<211> 114
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 10
gaggattgtg aagcagaaaa tgtgtgggag atgggaggct taggaatctt gacttctgtt 60
cctataacac ccagggtggt ttgctttctc tgtgccagta gtgggcatgt agag 114
<210> 11
<211> 147
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 11
tttgtgtatt gccaagtctg ttgtgagccc ttccacaagt tttgtttaga ggagaacgag 60
cgccctctgg aggaccagct ggaaaattgg tgttgtcgtc gttgcaaatt ctgtcacgtt 120
tgtggaaggc aacatcaggc tacaaag 147
<210> 12
<211> 96
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 12
cagctgctgg agtgtaataa gtgccgaaac agctatcacc ctgagtgcct gggaccaaac 60
taccccacca aacccacaaa gaagaagaaa gtctgg 96
<210> 13
<211> 121
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 13
atctgtacca agtgtgttcg ctgtaagagc tgtggatcca caactccagg caaagggtgg 60
gatgcacagt ggtctcatga tttctcactg tgtcatgatt gcgccaagct ctttgctaaa 120
g 121
<210> 14
<211> 123
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 14
gaaacttctg ccctctctgt gacaaatgtt atgatgatga tgactatgag agtaagatga 60
tgcaatgtgg aaagtgtgat cgctgggtcc attccaaatg tgagaatctt tcaggtacag 120
aag 123
<210> 15
<211> 185
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 15
atgagatgta tgagattcta tctaatctgc cagaaagtgt ggcctacact tgtgtgaact 60
gtactgagcg gcaccctgca gagtggcgac tggcccttga aaaagagctg cagatttctc 120
tgaagcaagt tctgacagct ttgttgaatt ctcggactac cagccatttg ctacgctacc 180
ggcag 185
<210> 16
<211> 174
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 16
gctgccaagc ctccagactt aaatcccgag acagaggaga gtataccttc ccgcagctcc 60
cccgaaggac ctgatccacc agttcttact gaggtcagca aacaggatga tcagcagcct 120
ttagatctag aaggagtcaa gaggaagatg gaccaaggga attacacatc tgtg 174
<210> 17
<211> 111
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 17
ttggagttca gtgatgatat tgtgaagatc attcaagcag ccattaattc agatggagga 60
cagccagaaa ttaaaaaagc caacagcatg gtcaagtcct tcttcattcg g 111
<210> 18
<211> 74
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 18
caaatggaac gtgtttttcc atggttcagt gtcaaaaagt ccaggttttg ggagccaaat 60
aaagtatcaa gcaa 74
<210> 19
<211> 194
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 19
cagtgggatg ttaccaaacg cagtgcttcc accttcactt gaccataatt atgctcagtg 60
gcaggagcga gaggaaaaca gccacactga gcagcctcct ttaatgaaga aaatcattcc 120
agctcccaaa cccaaaggtc ctggagaacc agactcacca actcctctgc atcctcctac 180
accaccaatt ttga 194
<210> 20
<211> 107
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 20
gtactgatag gagtcgagaa gacagtccag agctgaaccc acccccaggc atagaagaca 60
atagacagtg tgcgttatgt ttgacttatg gtgatgacag tgctaat 107
<210> 21
<211> 138
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 21
gatgctggtc gtttactata tattggccaa aatgagtgga cacatgtaaa ttgtgctttg 60
tggtcagcgg aagtgtttga agatgatgac ggatcactaa agaatgtgca tatggctgtg 120
atcaggggca agcagctg 138
<210> 22
<211> 159
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 22
agatgtgaat tctgccaaaa gccaggagcc accgtgggtt gctgtctcac atcctgcacc 60
agcaactatc acttcatgtg ttcccgagcc aagaactgtg tctttctgga tgataaaaaa 120
gtatattgcc aacgacatcg ggatttgatc aaaggcgaa 159
<210> 23
<211> 118
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 23
gtggttcctg agaatggatt tgaagttttc agaagagtgt ttgtggactt tgaaggaatc 60
agcttgagaa ggaagtttct caatggcttg gaaccagaaa atatccacat gatgattg 118
<210> 24
<211> 79
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 24
ggtctatgac aatcgactgc ttaggaattc taaatgatct ctccgactgt gaagataagc 60
tctttcctat tggatatca 79
<210> 25
<211> 161
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 25
gtgttccagg gtatactgga gcaccacaga tgctcgcaag cgctgtgtat atacatgcaa 60
gatagtggag tgccgtcctc cagtcgtaga gccggatatc aacagcactg ttgaacatga 120
tgaaaacagg accattgccc atagtccaac atcttttaca g 161
<210> 26
<211> 186
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 26
aaagttcatc aaaagagagt caaaacacag ctgaaattat aagtcctcca tcaccagacc 60
gacctcctca ttcacaaacc tctggctcct gttattatca tgtcatctca aaggtcccca 120
ggattcgaac acccagttat tctccaacac agagatcccc tggctgtcga ccgttgcctt 180
ctgcag 186
<210> 27
<211> 4249
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 27
gaagtcctac cccaaccact catgaaatag tcacagtagg tgatccttta ctctcctctg 60
gacttcgaag cattggctcc aggcgtcaca gtacctcttc cttatcaccc cagcggtcca 120
aactccggat aatgtctcca atgagaactg ggaatactta ctctaggaat aatgtttcct 180
cagtctccac caccgggacc gctactgatc ttgaatcaag tgccaaagta gttgatcatg 240
tcttagggcc actgaattca agtactagtt tagggcaaaa cacttccacc tcttcaaatt 300
tgcaaaggac agtggttact gtaggcaata aaaacagtca cttggatgga tcttcatctt 360
cagaaatgaa gcagtccagt gcttcagact tggtgtccaa gagctcctct ttaaagggag 420
agaagaccaa agtgctgagt tccaagagct cagagggatc tgcacataat gtggcttacc 480
ctggaattcc taaactggcc ccacaggttc ataacacaac atctagagaa ctgaatgtta 540
gtaaaatcgg ctcctttgct gaaccctctt cagtgtcgtt ttcttctaaa gaggccctct 600
ccttcccaca cctccatttg agagggcaaa ggaatgatcg agaccaacac acagattcta 660
cccaatcagc aaactcctct ccagatgaag atactgaagt caaaaccttg aagctatctg 720
gaatgagcaa cagatcatcc attatcaacg aacatatggg atctagttcc agagatagga 780
gacagaaagg gaaaaaatcc tgtaaagaaa ctttcaaaga aaagcattcc agtaaatctt 840
ttttggaacc tggtcaggtg acaactggtg aggaaggaaa cttgaagcca gagtttatgg 900
atgaggtttt gactcctgag tatatgggcc aacgaccatg taacaatgtt tcttctgata 960
agattggtga taaaggcctt tctatgccag gagtccccaa agctccaccc atgcaagtag 1020
aaggatctgc caaggaatta caggcaccac ggaaacgcac agtcaaagtg acactgacac 1080
ctctaaaaat ggaaaatgag agtcaatcca aaaatgccct gaaagaaagt agtcctgctt 1140
cccctttgca aatagagtca acatctccca cagaaccaat ttcagcctct gaaaatccag 1200
gagatggtcc agtggcccaa ccaagcccca ataatacctc atgccaggat tctcaaagta 1260
acaactatca gaatcttcca gtacaggaca gaaacctaat gcttccagat ggccccaaac 1320
ctcaggagga tggctctttt aaaaggaggt atccccgtcg cagtgcccgt gcacgttcta 1380
acatgttttt tgggcttacc ccactctatg gagtaagatc ctatggtgaa gaagacattc 1440
cattctacag cagctcaact gggaagaagc gaggcaagag atcagctgaa ggacaggtgg 1500
atggggccga tgacttaagc acttcagatg aagacgactt atactattac aacttcacta 1560
gaacagtgat ttcttcaggt ggagaggaac gactggcatc ccataattta tttcgggagg 1620
aggaacagtg tgatcttcca aaaatctcac agttggatgg tgttgatgat gggacagaga 1680
gtgatactag tgtcacagcc acaacaagga aaagcagcca gattccaaaa agaaatggta 1740
aagaaaatgg aacagagaac ttaaagattg atagacctga agatgctggg gagaaagaac 1800
atgtcactaa gagttctgtt ggccacaaaa atgagccaaa gatggataac tgccattctg 1860
taagcagagt taaaacacag ggacaagatt ccttggaagc tcagctcagc tcattggagt 1920
caagccgcag agtccacaca agtaccccct ccgacaaaaa tttactggac acctataata 1980
ctgagctcct gaaatcagat tcagacaata acaacagtga tgactgtggg aatatcctgc 2040
cttcagacat tatggacttt gtactaaaga atactccatc catgcaggct ttgggtgaga 2100
gcccagagtc atcttcatca gaactcctga atcttggtga aggattgggt cttgacagta 2160
atcgtgaaaa agacatgggt ctttttgaag tattttctca gcagctgcct acaacagaac 2220
ctgtggatag tagtgtctct tcctctatct cagcagagga acagtttgag ttgcctctag 2280
agctaccatc tgatctgtct gtcttgacca cccggagtcc cactgtcccc agccagaatc 2340
ccagtagact agctgttatc tcagactcag gggagaagag agtaaccatc acagaaaaat 2400
ctgtagcctc ctctgaaagt gacccagcac tgctgagccc aggagtagat ccaactcctg 2460
aaggccacat gactcctgat cattttatcc aaggacacat ggatgcagac cacatctcta 2520
gccctccttg tggttcagta gagcaaggtc atggcaacaa tcaggattta actaggaaca 2580
gtagcacccc tggccttcag gtacctgttt ccccaactgt tcccatccag aaccagaagt 2640
atgtgcccaa ttctactgat agtcctggcc cgtctcagat ttccaatgca gctgtccaga 2700
ccactccacc ccacctgaag ccagccactg agaaactcat agttgttaac cagaacatgc 2760
agccacttta tgttctccaa actcttccaa atggagtgac ccaaaaaatc caattgacct 2820
cttctgttag ttctacaccc agtgtgatgg agacaaatac ttcagtattg ggacccatgg 2880
gaggtggtct cacccttacc acaggactaa atccaagctt gccaacttct caatctttgt 2940
tcccttctgc tagcaaagga ttgctaccca tgtctcatca ccagcactta cattccttcc 3000
ctgcagctac tcaaagtagt ttcccaccaa acatcagcaa tcctccttca ggcctgctta 3060
ttggggttca gcctcctccg gatccccaac ttttggtttc agaatccagc cagaggacag 3120
acctcagtac cacagtagcc actccatcct ctggactcaa gaaaagaccc atatctcgtc 3180
tacagacccg aaagaataaa aaacttgctc cctctagtac cccttcaaac attgcccctt 3240
ctgatgtggt ttctaatatg acattgatta acttcacacc ctcccagctt cctaatcatc 3300
caagtctgtt agatttgggg tcacttaata cttcatctca ccgaactgtc cccaacatca 3360
taaaaagatc taaatctagc atcatgtatt ttgaaccggc acccctgtta ccacagagtg 3420
tgggaggaac tgctgccaca gcggcaggca catcaacaat aagccaggat actagccacc 3480
tcacatcagg gtctgtgtct ggcttggcat ccagttcctc tgtcttgaat gttgtatcca 3540
tgcaaactac cacaacccct acaagtagtg cgtcagttcc aggacacgtc accttaacca 3600
acccaaggtt gcttggtacc ccagatattg gctcaataag caatctttta atcaaagcta 3660
gccagcagag cctggggatt caggaccagc ctgtggcttt accgccaagt tcaggaatgt 3720
ttccacaact ggggacatca cagaccccct ctactgctgc aataacagcg gcatctagca 3780
tctgtgtgct cccctccact cagactacgg gcataacagc cgcttcacct tctggggaag 3840
cagacgaaca ctatcagctt cagcatgtga accagctcct tgccagcaaa actgggattc 3900
attcttccca gcgtgatctt gattctgctt cagggcccca ggtatccaac tttacccaga 3960
cggtagacgc tcctaatagc atgggactgg agcagaacaa ggctttatcc tcagctgtgc 4020
aagccagccc cacctctcct gggggttctc catcctctcc atcttctgga cagcggtcag 4080
caagcccttc agtgccgggt cccactaaac ccaaaccaaa aaccaaacgg tttcagctgc 4140
ctctagacaa agggaatggc aagaagcaca aagtttccca tttgcggacc agttcttctg 4200
aagcacacat tccagaccaa gaaacgacat ccctgacctc aggcacagg 4249
<210> 28
<211> 81
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 28
gactccagga gcagaggctg agcagcagga tacagctagc gtggagcagt cctcccagaa 60
ggagtgtggg caacctgcag g 81
<210> 29
<211> 65
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 29
gcaagtcgct gttcttccgg aagttcaggt gacccaaaat ccagcaaatg aacaagaaag 60
tgcag 65
<210> 30
<211> 171
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 30
aacctaaaac agtggaagaa gaggaaagta atttcagctc cccactgatg ctttggcttc 60
agcaagaaca aaagcggaag gaaagcatta ctgagaaaaa acccaagaaa ggacttgttt 120
ttgaaatttc cagtgatgat ggctttcaga tctgtgcaga aagtattgaa g 171
<210> 31
<211> 75
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 31
atgcctggaa gtcattgaca gataaagtcc aggaagctcg atcaaatgcc cgcctaaagc 60
agctctcatt tgcag 75
<210> 32
<211> 175
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 32
gtgttaacgg tttgaggatg ctggggattc tccatgatgc agttgtgttc ctcattgagc 60
agctgtctgg tgccaagcac tgtcgaaatt acaaattccg tttccacaag ccagaggagg 120
ccaatgaacc ccccttgaac cctcacggct cagccagggc tgaagtccac ctcag 175
<210> 33
<211> 108
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 33
gaagtcagca tttgacatgt ttaacttcct ggcttctaaa catcgtcagc ctcctgaata 60
caaccccaat gatgaagaag aggaggaggt acagctgaag tcagctcg 108
<210> 34
<211> 84
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 34
gagggcaact agcatggatc tgccaatgcc catgcgcttc cggcacttaa aaaagacttc 60
taaggaggca gttggtgtct acag 84
<210> 35
<211> 130
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 35
gtctcccatc catggccggg gtcttttctg taagagaaac attgatgcag gtgagatggt 60
gattgagtat gccggcaacg tcatccgctc catccagact gacaagcggg aaaagtatta 120
cgacagcaag 130
<210> 36
<211> 4928
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 36
ggcattggtt gctatatgtt ccgaattgat gactcagagg tagtggatgc caccatgcat 60
ggaaatgctg cacgcttcat caatcactcg tgtgagccta actgctattc tcgggtcatc 120
aatattgatg ggcagaagca cattgtcatc tttgccatgc gtaagatcta ccgaggagag 180
gaactcactt acgactataa gttccccatt gaggatgcca gcaacaagct gccctgcaac 240
tgtggcgcca agaaatgccg gaagttccta aactaaagct gctcttctcc cccagtgttg 300
gagtgcaagg aggcggggcc atccaaagca acgctgaagg ccttttccag cagctgggag 360
ctcccggatt gcgtggcaca gctgaggggc ctctgtgatg gctgagctct cttatgtcct 420
atactcacat cagacatgtg atcatagtcc cagagacaga gttgaggtct cgaagaaaag 480
atccatgatc ggctttctcc tggggcccct ccaattgttt actgttagaa agtgggaatg 540
gggtccctag cagacttgcc tggaaggagc ctattataga gggttggtta tgttgggaga 600
ttgggcctga atttctccac agaaataagt tgccatcctc aggttggccc tttcccaagc 660
actgtaagtg agtgggtcag gcaaagcccc aaatggaggg ttggttagat tcctgacagt 720
ttgccagcca ggccccacct acagcgtctg tcgaacaaac agaggtctgg tggttttccc 780
tactatcctc ccactcgaga gttcacttct ggttgggaga caggattcct agcacctccg 840
gtgtcaaaag gctgtcatgg ggttgtgcca attaattacc aaacattgag cctgcaggct 900
ttgagtggga gtgttgcccc caggagcctt atctcagcca attacctttc ttgacagtag 960
gagcggcttc cctctcccat tccctcttca ctcccttttc ttcctttccc ctgtcttcat 1020
gccactgctt tcccatgctt ctttcgggtt gtaggggaga ctgactgcct gctcaaggac 1080
actccctgct gggcatagga tgtgcctgca aaaagttccc tgagcctgta agcactccag 1140
gtggggaagt ggacaggagc cattggtcat aaccagacag aatttggaaa cattttcata 1200
aagctccatg gagagtttta aagaaacata tgtagcatga ttttgtagga gaggaaaaag 1260
attatttaaa taggatttaa atcatgcaac aacgagagta tcacagccag gatgaccctt 1320
gggtcccatt cctaagacat ggttacttta ttttcccctt gttaagacat aggaagactt 1380
aatttttaaa cggtcagtgt ccagttgaag gcagaacact aatcagattt caaggcccac 1440
aacttgggga ctagaccacc ttatgttgag ggaactctgc cacctgcgtg caacccacag 1500
ctaaagtaaa ttcaatgaca ctactgccct gattactcct taggatgtgg tcaaaacagc 1560
atcaaatgtt tcttctcttc ctttccccaa gacagagtcc tgaacctgtt aaattaagtc 1620
attggatttt actctgttct gtttacagtt tactatttaa ggttttataa atgtaaatat 1680
attttgtata tttttctatg agaagcactt catagggaga agcacttatg acaaggctat 1740
tttttaaacc gcggtattat cctaatttaa aagaagatcg gtttttaata attttttatt 1800
ttcataggat gaagttagag aaaatattca gctgtacaca caaagtctgg tttttcctgc 1860
ccaacttccc cctggaaggt gtactttttg ttgtttaatg tgtagcttgt ttgtgccctg 1920
ttgacataaa tgtttcctgg gtttgctctt tgacaataaa tggagaagga aggtcaccca 1980
actccattgg gccactcccc tccttcccct attgaagctc ctcaaaaggc tacagtaata 2040
tcttgataca acagattctc ttctttcccg cctctctcct ttccggcgca acttccagag 2100
tggtgggaga cggcaatctt tacatttccc tcatctttct tacttcagag ttagcaaaca 2160
acaagttgaa tggcaacttg acatttttgc atcaccatct gcctcatagg ccactctttc 2220
ctttccctct gcccaccaag tcctcatatc tgcagagaac ccattgatca ccttgtgccc 2280
tcttttgggg cagcctgttg aaactgaagc acagtctgac cactcacgat aaagcagatt 2340
tttctctgcc tctgccacaa ggtttcagag tagtgtagtc caagtagagg gtggggcacc 2400
cttttctcgc cgcaagaagc ccattcctat ggaagtctag caaagcaata cgactcagcc 2460
cagcactctc tgccccagga ctcatggctc tgctgtgcct tccatcctgg gctcccttct 2520
ctcctgtgac cttaagaact ttgtctggtg gctttgctgg aacattgtca ctgttttcac 2580
tgtcatgcag ggagcccagc actgtggcca ggatggcaga gacttccttg tcatcatgga 2640
gaagtgccag caggggactg ggaaaagcac tctacccaga cctcacctcc cttcctcctt 2700
ttgcccatga acaagatgca gtggccctag gggttccact agtgtctgct ttcctttatt 2760
attgcactgt gtgaggtttt tttgtaaatc cttgtattcc tatttttttt aaagaaaaaa 2820
aaaaaacctt aagctgcatt tgttactgaa atgattaatg cactgatggg tcctgaattc 2880
accttgagaa agacccaaag gccagtcagg gggtgggggg aactcagcta aatagaccta 2940
gttactgccc tgctaggcca tgctgtactg tgagcccctc ctcactctct accaacccta 3000
aaccctgagg acaggggagg aacccacagc ttccttctcc tgccagctgc agatggtttg 3060
ccttgccttt ccacccccta attgtcaacc acaaaaatga gaaattcctc ttctagctca 3120
gccttgagtc cattgccaaa ttttcagcac acctgccagc aacttggggg aataagcgaa 3180
ggtttcccta caagagggaa agaaggcaaa aacggcacag ctatctccaa acacatctga 3240
gttcatttca aaagtgacca agggaatctc cgcacaaaag tgcagattga ggaattgtga 3300
tgggtcattc ccaagaatcc cccaaggggc atcccaaatc cctgaggagt aacagctgca 3360
aacctggtca gttctcagtg agagccagct cacttatagc tttgctgcta gaacctgttg 3420
tggctgcatt tcctggtggc cagtgacaac tgtgtaacca gaatagctgc atggcgctga 3480
ccctttggcc ggaacttggt ctcttggctc cctccttggc cacccaccac ctctcgcaca 3540
gcccctctgt ttttacacca ataacaagaa ttaaggggga agccctggca gctatacgtt 3600
ttcaaccaga ctcctttgcc gggacccagc ccgccaccct gctcgcctcc gtcaaacccc 3660
cggccaatgc agtgagcacc atgtagctcc cttgatttaa aaaaaataaa aaataaaaaa 3720
aaaaggaaaa aaaaatacaa cacacacaca aaaataaaaa aaatattcta atgaatgtat 3780
ctttctaaag gactgacgtt caatcaaata tctgaaaata ctaaaggtca aaaccttgtc 3840
agatgttaac ttctaagttc ggtttgggat tttttttttt taatagaaat caagttgttt 3900
ttgtttttaa ggaaaagcgg gtcattgcaa agggctgggt gtaattttat gtttcatttc 3960
cttcatttta aagcaataca aggttatgga gcagatggtt ttgtgccgaa tcatgaatac 4020
tagtcaagtc acacactctg gaaacttgca actttttgtt tgttttggtt ttcaaataaa 4080
tataaatatg atatatatag gaactaatat agtaatgcac catgtaacaa agcctagttc 4140
agtccatggc ttttaattct cttaacacta tagataagga ttgtgttaca gttgctagta 4200
gcggcaggaa gatgtcaggc tcactttcct ctgattcccg aaatgggggg aacctctaac 4260
cataaaggaa tggtagaaca gtccattcct cggatcagag aaaaatgcag acatggtgtc 4320
acctggattt ttttctgccc atgaatgttg ccagtcagta cctgtcctcc ttgtttctct 4380
atttttggtt atgaatgttg gggttaccac ctgcatttag gggaaaattg tgttctgtgc 4440
tttcctggta tcttgttccg aggtactcta gttctgtctt tcaaccaaga aaatagaatt 4500
gtggtgtttc ttttattgaa cttttaacag tctctttagt aaatacaggt agttgaataa 4560
ttgtttcaag agctcaacag atgacaagct tcttttctag aaataagaca ttttttgaca 4620
actttatcat gtataacaga tctgtttttt ttccttgtgt tcttccaagc ttctggttag 4680
agaaaaagag aaaaaaaaaa aaggaaaatg tgtctaaagt ccatcagtgt taactccctg 4740
tgacagggat gaaggaaaat actttaatag ttcaaaaaat aataatgctg aaagctctct 4800
acgaaagact gaatgtaaaa gtaaaaagtg tacatagttg taaaaaaaag gagtttttaa 4860
acatgtttat tttctatgca ctttttttta tttaagtgat agtttaatta ataaacatgt 4920
caagttta 4928
<210> 37
<211> 60
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 37
agaggtctct gatgagtcac tttcttgacc cttttctttt ggtttttgtt ttacagggat 60
<210> 38
<211> 60
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 38
agaggtctct gatgagtcac tttcttgacc cttttctttt ggtttttgtt ttacagggat 60
<210> 39
<211> 60
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 39
atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60
<210> 40
<211> 60
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 40
catcttctga gccagcaatt gatgacttgt cttttctttt ggtttttgtt ttacagggat 60
<210> 41
<211> 60
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 41
atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60
<210> 42
<211> 60
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 42
atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60
<210> 43
<211> 60
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 43
atctgagcca aaacctaaga attgctcatc cttaaagtcc actctgatcc tgtggactcc 60
<210> 44
<211> 60
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 44
atctgagcca aaacctaaga attgctcatc ctgattctgg tggtggaggc tgctttttct 60
<210> 45
<211> 60
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 45
atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60
<210> 46
<211> 60
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 46
atctgagcca aaacctaaga attgctcatc cttttctttt ggtttttgtt ttacagggat 60
<210> 47
<211> 60
<212> DNA
<213>homo sapiens (Homo sapiens)
<400> 47
catcttctga gccagcaatt gatgacttgt cttttctttt ggtttttgtt ttacagggat 60

Claims (12)

1. A method for detecting gene rearrangement, characterized in that the method comprises:
obtaining sequences to be aligned from a sample to be tested;
aligning the sequences to be aligned against a reference genome to obtain abnormally aligned sequences, the abnormally aligned sequences comprising sequences with an abnormal alignment position, sequences with an abnormal alignment direction, and sequences that do not align to the reference genome;
determining the position of a candidate breakpoint according to the alignment position and alignment direction of the abnormally aligned sequences on the reference genome;
assembling those of the sequences to be aligned that support the position of the candidate breakpoint, and retaining the breakpoints in the assembly result whose sequence information is consistent with the position of the candidate breakpoint, these being recorded as the breakpoints of the gene rearrangement.
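The three categories of abnormally aligned sequences named in claim 1 can be illustrated with a minimal sketch. It assumes paired-end reads, a forward/reverse orientation for a normally aligned pair, and an upper bound on the normal fragment span; the `PairedAlignment` record, the `MAX_INSERT` threshold and the `classify` function are hypothetical names introduced only for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PairedAlignment:
    """Simplified alignment record for one read of a pair (hypothetical structure)."""
    name: str
    chrom: Optional[str]        # None means the read did not align
    pos: int                    # leftmost aligned position on the reference
    reverse: bool               # True if aligned to the reverse strand
    mate_chrom: Optional[str]   # None means the mate did not align
    mate_pos: int
    mate_reverse: bool


MAX_INSERT = 1000  # assumed upper bound on a normal fragment span, in bp


def classify(aln: PairedAlignment) -> str:
    """Assign a read to one of the categories listed in claim 1, or 'normal'."""
    # Category 3: the read or its mate does not align to the reference genome.
    if aln.chrom is None or aln.mate_chrom is None:
        return "unaligned"
    # Category 1: abnormal position (different chromosomes or too far apart).
    if aln.chrom != aln.mate_chrom or abs(aln.pos - aln.mate_pos) > MAX_INSERT:
        return "abnormal_position"
    # Category 2: abnormal direction. A normal pair is assumed to have the
    # leftmost read on the forward strand and the rightmost on the reverse strand.
    if aln.pos <= aln.mate_pos:
        left_reverse, right_reverse = aln.reverse, aln.mate_reverse
    else:
        left_reverse, right_reverse = aln.mate_reverse, aln.reverse
    if left_reverse or not right_reverse:
        return "abnormal_direction"
    return "normal"
```

In a real pipeline these fields would come from an aligner's output, and the threshold would be estimated from the library's fragment-size distribution rather than fixed.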
2. The method according to claim 1, characterized in that determining the position of the candidate breakpoint according to the alignment position and alignment direction of the abnormally aligned sequences on the reference genome comprises:
cutting the abnormally aligned sequences and realigning them against the reference genome, and determining the position of the candidate breakpoint according to the alignment position and alignment direction of the cut abnormally aligned sequences on the reference genome.
3. The method according to claim 2, characterized in that determining the position of the candidate breakpoint according to the alignment position and alignment direction of the abnormally aligned sequences on the reference genome comprises:
cutting the abnormally aligned sequences and realigning them against the reference genome to obtain sequences that simultaneously span a first length on both sides of a potential breakpoint, recorded as first flag sequences, and sequences that simultaneously span both sides of the potential breakpoint but whose length is less than a second length, recorded as second flag sequences;
simulating a breakpoint reference sequence in which the gene rearrangement has occurred, according to the position of the potential breakpoint on the first flag sequences;
aligning the sequences to be aligned against the breakpoint reference sequence, and marking the sequences that align to the breakpoint reference sequence and span the breakpoint on the breakpoint reference sequence, these being recorded as breakpoint candidate sequences that support the breakpoint;
determining the position of the breakpoint on the breakpoint candidate sequences as the position of the candidate breakpoint.
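The simulation and marking steps of claim 3 can be sketched as follows. This is only an illustration under stated assumptions: the breakpoint reference is built by joining a fixed-length flank from each side of the potential breakpoint, and exact substring matching stands in for a real aligner. The function names and the `flank` and `min_anchor` parameters are hypothetical and are not specified in the patent.

```python
from typing import Tuple


def build_breakpoint_reference(left: str, left_end: int,
                               right: str, right_start: int,
                               flank: int = 30) -> Tuple[str, int]:
    """Join the bases upstream of one breakpoint side with the bases
    downstream of the other side.

    Returns the simulated sequence and the junction offset within it.
    """
    upstream = left[max(0, left_end - flank):left_end]
    downstream = right[right_start:right_start + flank]
    return upstream + downstream, len(upstream)


def spans_breakpoint(read: str, reference: str, junction: int,
                     min_anchor: int = 5) -> bool:
    """Return True if the read matches the simulated reference and covers
    at least `min_anchor` bases on each side of the junction.

    Exact matching is a simplification; a real pipeline would tolerate
    mismatches by using an aligner.
    """
    start = reference.find(read)
    if start < 0:
        return False
    end = start + len(read)
    return start <= junction - min_anchor and end >= junction + min_anchor
```

Reads for which `spans_breakpoint` returns True correspond to the breakpoint candidate sequences of the claim; requiring an anchor on both sides avoids counting reads that merely touch the junction.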
4. The method according to claim 3, characterized in that determining the position of the breakpoint on the breakpoint candidate sequences as the position of the candidate breakpoint comprises:
correcting the breakpoint candidate sequences according to sequencing quality and the number of supporting sequences to obtain corrected breakpoint candidate sequences;
determining the position of the breakpoint on the corrected breakpoint candidate sequences as the position of the candidate breakpoint.
5. The method according to claim 3, characterized in that determining the position of the breakpoint on the breakpoint candidate sequences as the position of the candidate breakpoint comprises:
filtering out false-positive breakpoint sequences from the breakpoint candidate sequences according to the first flag sequences and second flag sequences that support the breakpoint on the breakpoint reference sequence and the paired sequences, among the sequences to be aligned, that support spanning the breakpoint on the breakpoint reference sequence, to obtain filtered breakpoint candidate sequences;
determining the position of the breakpoint on the filtered breakpoint candidate sequences as the position of the candidate breakpoint.
6. The method according to claim 3, characterized in that assembling those of the sequences to be aligned that support the position of the candidate breakpoint comprises:
assembling the first flag sequences and second flag sequences that support the breakpoint on the breakpoint reference sequence together with the paired sequences, among the sequences to be aligned, that support spanning the breakpoint on the breakpoint reference sequence, and retaining the breakpoints in the assembly result whose sequence information is consistent with the position of the candidate breakpoint, these being recorded as the breakpoints of the gene rearrangement.
7. The method according to any one of claims 1 to 6, characterized in that obtaining the sequences to be aligned from the sample to be tested comprises:
constructing a sequencing library of the sample to be tested;
performing high-throughput sequencing on the sequencing library to obtain sequencing data;
preprocessing the sequencing data to obtain the sequences to be aligned of the sample to be tested.
8. The method according to claim 7, characterized in that the sequencing library is a hybrid capture library, the hybrid capture library preferably being obtained with the capture probes of SEQ ID NO: 1 to SEQ ID NO: 36.
9. The method according to any one of claims 1 to 6, characterized in that, after the breakpoints of the gene rearrangement are obtained, the method further comprises a step of quantifying the rearranged gene, the quantifying step comprising:
counting, according to the sequence information of the breakpoint of the gene rearrangement, the number of sequences among the sequences to be aligned that support the breakpoint of the gene rearrangement, recorded as the marker sequence number;
dividing the marker sequence number by the sequence number of a reference gene, the resulting ratio being the abundance of the rearranged gene relative to the reference gene.
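The quantifying step of claim 9 reduces to a count followed by a division. The sketch below assumes that a read supports the breakpoint when it contains the junction sequence exactly, which is a simplification; the function names and the way reference-gene reads are counted are hypothetical and are not taken from the patent.

```python
from typing import Iterable


def count_supporting_reads(reads: Iterable[str], junction_seq: str) -> int:
    """Count reads that contain the breakpoint junction sequence.

    The result plays the role of the marker sequence number of claim 9.
    """
    return sum(1 for read in reads if junction_seq in read)


def rearrangement_abundance(marker_count: int, reference_gene_count: int) -> float:
    """Ratio of breakpoint-supporting reads to reference-gene reads."""
    if reference_gene_count <= 0:
        raise ValueError("reference gene read count must be positive")
    return marker_count / reference_gene_count
```

For example, 25 supporting reads against 500 reference-gene reads give a relative abundance of 25 / 500 = 0.05.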
10. A device for detecting gene rearrangement, characterized in that the device is configured to store or run a module, or the module is a component of the device; wherein the module is a software module, there are one or more software modules, and the software module is configured to perform the method for detecting gene rearrangement according to any one of claims 1 to 9.
11. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program performs the method for detecting gene rearrangement according to any one of claims 1 to 9.
12. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, performs the method for detecting gene rearrangement according to any one of claims 1 to 9.
CN201811643484.6A 2018-12-29 2018-12-29 Method, device, storage medium and processor for detecting gene rearrangement Active CN109712672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811643484.6A CN109712672B (en) 2018-12-29 2018-12-29 Method, device, storage medium and processor for detecting gene rearrangement

Publications (2)

Publication Number Publication Date
CN109712672A true CN109712672A (en) 2019-05-03
CN109712672B CN109712672B (en) 2021-05-25

Family

ID=66260266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811643484.6A Active CN109712672B (en) 2018-12-29 2018-12-29 Method, device, storage medium and processor for detecting gene rearrangement

Country Status (1)

Country Link
CN (1) CN109712672B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951732A (en) * 2010-05-25 2017-07-14 加利福尼亚大学董事会 BAMBAM:The parallel comparative analysis of high-flux sequence data
CN105339506A (en) * 2013-03-15 2016-02-17 基因组影像公司 Methods for the detection of breakpoints in rearranged genomic sequences
CN104298892A (en) * 2014-09-18 2015-01-21 天津诺禾致源生物信息科技有限公司 Detection device and method for gene fusion
CN104794371A (en) * 2015-04-29 2015-07-22 深圳华大基因研究院 Method and device for detecting insertion polymorphism of retrotransposon
US20170199961A1 (en) * 2015-12-16 2017-07-13 Gritstone Oncology, Inc. Neoantigen Identification, Manufacture, and Use
CN108256295A (en) * 2016-12-29 2018-07-06 安诺优达基因科技(北京)有限公司 A kind of device for being used to detect Gene Fusion
CN107480472A (en) * 2017-07-21 2017-12-15 广州漫瑞生物信息技术有限公司 The detection method and device of a kind of Gene Fusion
CN108830044A (en) * 2018-06-05 2018-11-16 上海鲸舟基因科技有限公司 For detecting the detection method and device of cancer sample Gene Fusion

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942807A (en) * 2019-11-20 2020-03-31 北京橡鑫生物科技有限公司 Method and apparatus for detecting gene rearrangement
CN111081318A (en) * 2019-12-06 2020-04-28 人和未来生物科技(长沙)有限公司 Fusion gene detection method, system and medium
CN111524548A (en) * 2020-07-03 2020-08-11 至本医疗科技(上海)有限公司 Method, computing device, and computer storage medium for detecting IGH reordering
CN111524548B (en) * 2020-07-03 2020-10-23 至本医疗科技(上海)有限公司 Method, computing device, and computer storage medium for detecting IGH reordering
CN114694753A (en) * 2022-03-18 2022-07-01 深圳华大医学检验实验室 Nucleic acid sequence comparison method, device, equipment and readable storage medium
CN114694753B (en) * 2022-03-18 2023-04-07 深圳华大医学检验实验室 Nucleic acid sequence comparison method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN109712672B (en) 2021-05-25

Similar Documents

Publication Publication Date Title
CN102770558B (en) The analysis of Fetal genome is carried out by maternal biological sample
CN110870016B (en) Verification method and system for sequence variant exhalations
CN108491689B (en) Tumour neoantigen identification method based on transcript profile
Cosart et al. Exome-wide DNA capture and next generation sequencing in domestic and wild species
CN109712672A (en) Detect method, apparatus, storage medium and the processor of gene rearrangement
Tigano et al. Chromosome-level assembly of the Atlantic silverside genome reveals extreme levels of sequence diversity and structural genetic variation
CN111534602A (en) Method for analyzing human blood type and genotype based on high-throughput sequencing and application thereof
Meusnier et al. Polymerase chain reaction–single strand conformation polymorphism analyses of nuclear and chloroplast DNA provide evidence for recombination, multiple introductions and nascent speciation in the Caulerpa taxifolia complex
CN109584957B (en) Detection kit for capturing α thalassemia related gene copy number
CN107460254A (en) A kind of method based on pig LINE1 transposons insertion polymorphism research and development New molecular marker
CN107312861A (en) A kind of B ALL patients prognosis risk assessment label
CN108474028A (en) Differentiate and distinguish the system and method for genetic material
Smirnova et al. The use of non-functional clonotypes as a natural calibrator for quantitative bias correction in adaptive immune receptor repertoire profiling
US20030194711A1 (en) System and method for analyzing gene expression data
CN109949866A (en) Detection method, device, computer equipment and the storage medium of pathogen operational group
CN111276189B (en) Chromosome balance translocation detection and analysis system based on NGS and application thereof
CN116515955B (en) Multi-gene targeting typing method
KR101815529B1 (en) Human Haplotyping System And Method
CN112442528B (en) LOXHD1 gene mutant and application thereof
CN105779463B (en) VPS13B gene mutation body and its application
WO2003050748A2 (en) Genetic analysis of gene expression in heterosis
Sung et al. Reduced-Cost Genotyping by Resequencing in Peanut Breeding Programs Using Tecan Allegro Targeted Resequencing V2
US20190373871A1 (en) Method for assaying genetic variants
CN114875161A (en) Molecular marker related to low-temperature tolerance of chicken, primer combination and corresponding breeding method
CN117198399A (en) Microsatellite locus, system and kit for predicting MSI state

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant