CN108229100B - DNA rearrangement region and corresponding RNA product detection method, device and storage medium - Google Patents

DNA rearrangement region and corresponding RNA product detection method, device and storage medium


Publication number
CN108229100B
Authority
CN
China
Prior art keywords
gene
dna
mapping
region
predetermined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810497054.1A
Other languages
Chinese (zh)
Other versions
CN108229100A (en)
Inventor
陈惠�
王凯
秦公炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
To Medical Science And Technology (shanghai) Co Ltd
Original Assignee
To Medical Science And Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by To Medical Science And Technology (shanghai) Co Ltd
Priority to CN201810497054.1A
Publication of CN108229100A
Application granted
Publication of CN108229100B


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression

Abstract

The present invention provides a method, a device and a storage medium for detecting DNA rearrangement regions and their corresponding RNA products. The DNA rearrangement region detection method includes the following steps: receiving alignment information for a test sample; obtaining mappings and clustering them into multiple small clusters; merging the mappings into big clusters according to whether the two mappings of each pair fall into the same two small clusters, and judging whether each big cluster meets a predetermined number of pairs; judging whether each big cluster that meets the predetermined number of pairs also meets a predetermined alignment-specificity condition; within each big cluster that meets the alignment-specificity condition, dividing the mappings into groups by the small cluster they belong to and applying predetermined filtering to each group; and determining that the regions corresponding to all mappings in a big cluster whose two groups both remain after the predetermined filtering are the mapping regions, on the reference genome, of an actual DNA rearrangement region of the test sample, the corresponding big cluster being the supporting big cluster of that DNA rearrangement region.

Description

DNA rearrangement region and corresponding RNA product detection method, device and storage medium
Technical field
The invention belongs to the field of bioinformatics. It relates to a DNA rearrangement region detection method for detecting DNA rearrangement regions, together with a corresponding device and storage medium, and further to an RNA detection method applied to the RNA products corresponding to the DNA rearrangement regions so determined, together with a corresponding device and storage medium.
Background art
Detecting DNA-level variation by next-generation sequencing (whole-genome sequencing, WGS, or target-gene probe capture) is the most common technical approach to designing and discovering targeted medication in clinical cancer treatment. DNA-level variants fall into four main types: single-base mutation (single nucleotide polymorphism, SNP), insertion and deletion (Indel), rearrangement (called DNA rearrangement here because it occurs at the DNA level) and copy number variation (CNV). Compared with the other DNA variant types, rearrangement has a greater influence on gene function.
DNA rearrangement refers to a change in the position of a DNA fragment within the genome, i.e. its transfer from one position to another, which changes the activity or expression level of a gene and is thus one way of regulating gene activity. After transcription, a DNA rearrangement can give rise to a fusion gene or a gene truncation, and it is one of the important mechanisms of cancer initiation and progression. Detecting DNA rearrangements and predicting their functional effect is therefore of great significance for, and a necessary step in, designing and discovering targeted medication for tumor treatment.
At the chromosome level, DNA rearrangement is also known as chromosomal structural variation (SV). Its types (chromosomal rearrangement types for short) comprise five forms: large-fragment duplication, deletion, intrachromosomal translocation, interchromosomal translocation and inversion.
At the gene level, DNA rearrangements (referred to as gene rearrangement types) fall into roughly three classes: 1) rearrangement between genes, e.g. DNA rearrangement between a target gene such as ALK, RET, ROS1 or BRAF and another gene; 2) rearrangement within a gene, e.g. abnormal activation caused by a duplication rearrangement of the main kinase domain inside an oncogene such as EGFR or MET; 3) rearrangement between a gene and an intergenic region, e.g. abnormal inactivation through truncation caused by rearrangement between the inside of a tumor suppressor gene and an intergenic region. It follows that the gene rearrangement type is directly relevant to guiding clinical tumor-targeted medication.
At present, various software tools exist at home and abroad for detecting fusion genes from transcripts, e.g. FusionSeq, TopHat-Fusion, deFuse, FusionHunter, FusionMap and SoapFus (CN201180076185.9) (the first approach); related methods for detecting chromosomal structural variation have also been disclosed (e.g. CN201380004734.0) (the second approach); and FMI, abroad, has filed patent applications covering the detection of certain specified genes or newly discovered fusion genes (the third approach).
However, the first approach detects RNA. The second works at the chromosome level and yields only a breakpoint range, so further experiments are needed to confirm the breakpoint. The third detects the DNA of certain specified genes and predicts the fusion-gene mechanism from it. None of the three provides a comprehensive and intuitive analysis of DNA rearrangement at the gene level, so none meets the general requirement of detecting gene-rearrangement-type variation for designing and discovering targeted medication in clinical tumor treatment.
Summary of the invention
The present invention provides a DNA rearrangement region detection method for detecting DNA rearrangement regions, together with a corresponding device and storage medium, and further relates to an RNA detection method applied to the RNA products corresponding to the DNA rearrangement regions so determined, together with a corresponding device and storage medium.
To achieve the above object, the present invention adopts the following technical solutions:
The present invention provides a DNA rearrangement region detection method for detecting the DNA rearrangement regions of a test sample, characterized by comprising the following steps: receiving alignment information obtained by aligning, against a reference genome, paired-end sequencing data comprising multiple pairs of reads obtained by paired-end sequencing of multiple sequencing fragments of the test sample; obtaining from the alignment information all mappings whose mapping distance meets a predetermined mapping value and clustering them by a predetermined clustering rule into multiple small clusters; merging all mappings in all small clusters into big clusters such that pairs whose two mappings fall into the same two small clusters are merged together, and judging whether the number of mapping pairs contained in each big cluster meets a predetermined number of pairs; for each big cluster that meets the predetermined number of pairs, judging whether the alignments on the reference genome of all reads corresponding to all mappings in the big cluster meet a predetermined alignment-specificity condition; within each big cluster that meets the predetermined alignment-specificity condition, placing all mappings belonging to the same small cluster into one group and applying predetermined filtering to each group; and determining that the regions corresponding to all mappings in a big cluster whose two groups both remain after the predetermined filtering are the mapping regions, on the reference genome, of an actual DNA rearrangement region of the test sample, the corresponding big cluster being the supporting big cluster of that DNA rearrangement region.
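The merging step of this method can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each pair of mappings is assumed to be already labeled with the ids of the two small clusters its mappings fall into, the function name merge_into_big_clusters is hypothetical, and the default of 6 pairs is taken from the preferred value given further on in the text.

```python
from collections import defaultdict


def merge_into_big_clusters(pair_clusters, min_pairs=6):
    """Merge mapping pairs that share the same two small clusters.

    pair_clusters: iterable of (pair_id, cluster_a, cluster_b), where
    cluster_a and cluster_b are (hypothetical) ids of the small clusters
    that contain the two mappings of the pair.
    Returns {(cluster_x, cluster_y): [pair_id, ...]} restricted to big
    clusters that hold at least min_pairs pairs.
    """
    big = defaultdict(list)
    for pair_id, a, b in pair_clusters:
        # Sorting the two ids makes the key independent of mate order.
        big[tuple(sorted((a, b)))].append(pair_id)
    return {key: ids for key, ids in big.items() if len(ids) >= min_pairs}
```

Keying on the sorted pair of small-cluster ids is one simple way to express "the same two small clusters"; other data structures would serve equally well.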
The DNA rearrangement region detection method provided by the present invention may further comprise the following steps: for each pair of mappings in a supporting big cluster, judging which one of the following conditions is met: a first breakpoint condition in which neither mapping has a breakpoint, a second breakpoint condition in which both mappings have a breakpoint, or a third breakpoint condition in which only one mapping has a breakpoint; designating the pairs of mappings that meet the first breakpoint condition as first-class mappings; designating the pairs that meet the second breakpoint condition as second-class mappings; designating the pairs that meet the third breakpoint condition as third-class mappings; and, for a supporting big cluster that contains at least two classes of mappings, determining the overlap region between the mapping regions of the different classes to obtain the overlap region corresponding to that supporting big cluster.
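The three mapping classes and the overlap step can be illustrated with the short sketch below. The function names are hypothetical, and treating the overlap region as a plain interval intersection on one chromosome is an assumption made for illustration only.

```python
def breakpoint_class(first_has_breakpoint, second_has_breakpoint):
    """Return 'first', 'second' or 'third' for one pair of mappings.

    first:  neither mapping has a breakpoint
    second: both mappings have a breakpoint
    third:  exactly one mapping has a breakpoint
    """
    n = int(bool(first_has_breakpoint)) + int(bool(second_has_breakpoint))
    return {0: "first", 2: "second", 1: "third"}[n]


def overlap_region(region_a, region_b):
    """Intersect two (start, end) intervals on the same chromosome.

    Returns None when the intervals do not overlap.
    """
    start = max(region_a[0], region_b[0])
    end = min(region_a[1], region_b[1])
    return (start, end) if start <= end else None
```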
The DNA rearrangement region detection method provided by the present invention may further comprise the following step: determining the chromosomal rearrangement type of each DNA rearrangement region, specifically: judging, from the directions and chromosomal positions of the two groups of mappings in the supporting big cluster, whether the chromosomal rearrangement type is an intrachromosomal deletion, duplication or inversion, or an interchromosomal translocation.
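The text above names the inputs (direction and chromosomal position of the two groups) but not the decision rules. The sketch below therefore uses an orientation convention that is common in paired-end structural-variant callers; it is an assumption for illustration and is not quoted from the patent. The function name and argument layout are likewise hypothetical.

```python
def classify_chromosomal_rearrangement(chrom1, pos1, strand1,
                                       chrom2, pos2, strand2):
    """Classify a rearrangement from its two groups of mappings.

    strand1 and strand2 are '+' or '-'. Assumed convention:
      different chromosomes          -> interchromosomal translocation
      same chromosome, same strand   -> inversion
      upstream '+', downstream '-'   -> deletion
      upstream '-', downstream '+'   -> duplication
    """
    if chrom1 != chrom2:
        return "interchromosomal translocation"
    # Order the two groups by coordinate so "left" is the upstream one.
    (_, left_strand), (_, right_strand) = sorted(
        [(pos1, strand1), (pos2, strand2)]
    )
    if left_strand == right_strand:
        return "inversion"
    if left_strand == "+" and right_strand == "-":
        return "deletion"
    return "duplication"
```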
The DNA rearrangement region detection method provided by the present invention may further comprise the following step: based on reference transcripts, and using the mapping direction and mapping position of each mapping contained in the supporting big cluster, annotating each mapping region to obtain the detailed gene structure of the corresponding DNA rearrangement region.
The DNA rearrangement region detection method provided by the present invention may further comprise the following step: determining the gene rearrangement type of each DNA rearrangement region, specifically: judging, from the detailed gene structure annotated for the mapping regions, whether the gene rearrangement type of the corresponding DNA rearrangement region is a gene-gene rearrangement, an intragenic rearrangement, or a rearrangement between a gene and an intergenic region.
In the DNA rearrangement region detection method provided by the present invention, the predetermined intersize may be 0 bp or greater than or equal to 2000 bp; the predetermined clustering rule is: mappings used for clustering whose mapping positions are spaced within a predetermined clustering distance are gathered into one small cluster, the predetermined clustering distance being less than or equal to 1000 bp; the predetermined number of pairs is greater than or equal to 6; the predetermined alignment-specificity condition is: the number of merged regions obtained by merging, according to a merging rule, all mappings on the reference genome of all reads being judged meets a predetermined region count; the merging rule is: mappings used for merging whose mapping positions are spaced within a predetermined merging distance are merged into one merged region, the predetermined merging distance being less than or equal to 1000 bp; and the predetermined region count is less than or equal to 6.
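The clustering rule and the alignment-specificity check share the same distance-based grouping, which can be sketched as below. This is a minimal sketch, assuming that "spaced within the predetermined distance" means each mapping lies within the distance of its nearest sorted neighbour on the same chromosome (single-linkage grouping); the function names are hypothetical and the defaults are the preferred values quoted above.

```python
def cluster_by_gap(mappings, max_gap=1000):
    """Group mappings whose neighbouring positions are at most max_gap apart.

    mappings: iterable of (chrom, position) tuples.
    Returns a list of clusters, each a sorted list of (chrom, position).
    """
    clusters = []
    for chrom, pos in sorted(mappings):
        last = clusters[-1] if clusters else None
        if last and last[-1][0] == chrom and pos - last[-1][1] <= max_gap:
            last.append((chrom, pos))
        else:
            clusters.append([(chrom, pos)])
    return clusters


def is_alignment_specific(all_read_mappings, max_gap=1000, max_regions=6):
    """True when all mappings of the reads merge into few enough regions."""
    return len(cluster_by_gap(all_read_mappings, max_gap)) <= max_regions
```

Reads that align to many scattered places produce many merged regions and fail the check, which is how the condition screens out big clusters supported by non-specific alignments.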
In the DNA rearrangement region detection method provided by the present invention, the predetermined filtering may comprise the following steps: judging whether the mappings contained in the group include any mapping with a breakpoint; when the group contains no mapping with a breakpoint, judging whether, among all mappings on the reference genome of the reads corresponding to the group, there is a mapping with a breakpoint, and when there is, determining that the group remains; when the group contains a mapping with a breakpoint, judging whether the group contains a breakpoint that meets a predetermined consistency condition, and when it does, determining that the breakpoint meeting the predetermined consistency condition is a reliable breakpoint and that the group remains; the predetermined consistency condition being: all mappings sharing the same breakpoint have, starting from that breakpoint, mutually identical sequences that cannot be continuously aligned, and the number of such mappings is greater than or equal to 2.
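The consistency condition for reliable breakpoints can be sketched as follows. The data layout and the function name are hypothetical, and one point is an assumption rather than something the text states: unaligned sequences of different lengths are treated as identical when the shorter one is a prefix of the longer one.

```python
from collections import defaultdict


def reliable_breakpoints(mappings, min_support=2):
    """Return the breakpoints that satisfy the consistency condition.

    mappings: iterable of (breakpoint, unaligned_seq), where breakpoint is
    for example a (chrom, position) tuple and unaligned_seq is the part of
    the read that starts at the breakpoint and does not align contiguously.
    """
    by_breakpoint = defaultdict(list)
    for breakpoint, seq in mappings:
        by_breakpoint[breakpoint].append(seq)

    reliable = []
    for breakpoint, seqs in by_breakpoint.items():
        if len(seqs) < min_support:
            continue  # fewer than 2 mappings share this breakpoint
        longest = max(seqs, key=len)
        # Every sequence must agree with the longest one from the breakpoint on.
        if all(longest.startswith(s) for s in seqs):
            reliable.append(breakpoint)
    return reliable
```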
In the DNA rearrangement region detection method provided by the present invention, when the test sample is a non-tumor tissue sample, the predetermined filtering may further comprise the following steps:
judging whether a remaining group meets a first condition that, within a predetermined base range around each reliable breakpoint of its mappings, the number of consecutive identical bases is always less than a predetermined base number; judging whether the remaining group meets a second condition that the mappings having more reliable breakpoints than a first predetermined breakpoint number make up less than a predetermined proportion of all mappings in the group; judging whether the remaining group meets a third condition that the number of reliable breakpoints each supported by no more than a predetermined total number of mappings is less than a second predetermined breakpoint number; and determining that a group meeting one or more of the first condition, the second condition and the third condition remains; wherein the predetermined base range is the mapping range obtained by extending 20 bases to the left and to the right of the reliable breakpoint taken as midpoint, the predetermined base number is 20, the predetermined proportion is one third, the first predetermined breakpoint number is 2, the predetermined total number is 2, and the second predetermined breakpoint number is 10.
In the DNA rearrangement region detection method provided by the present invention, when the test sample is a tumor tissue sample that has a non-tumor tissue control sample, control alignment information is also received, which is obtained by aligning against the reference genome paired-end sequencing data comprising multiple pairs of control reads obtained by paired-end sequencing of multiple sequencing fragments of the control sample, and which includes at least each control mapping of the control reads on the reference genome; the predetermined filtering then further comprises the following steps: judging whether a remaining group meets a first condition that, within a predetermined base range around each reliable breakpoint of its mappings, the number of consecutive identical bases is always less than a predetermined base number; judging whether the remaining group meets a second condition that the mappings having more reliable breakpoints than a first predetermined breakpoint number make up less than a predetermined proportion of all mappings in the group; judging whether the remaining group meets a third condition that the number of reliable breakpoints each supported by no more than a predetermined total number of mappings is less than a second predetermined breakpoint number; judging whether the remaining group meets a fourth condition that, among all mappings in the group that have a reliable breakpoint, the number of mappings whose reliable breakpoint is identical to a control breakpoint of a control mapping in the control alignment information is less than or equal to a predetermined count; and determining that a group meeting one or more of the first condition, the second condition, the third condition and the fourth condition remains; wherein the predetermined base range is the mapping range obtained by extending 20 bases to the left and to the right of the reliable breakpoint taken as midpoint, the predetermined base number is 20, the predetermined proportion is one third, the first predetermined breakpoint number is 2, the predetermined total number is 2, the second predetermined breakpoint number is 10, and the predetermined count is 3 to 5.
The present invention also provides an RNA product prediction method, characterized by comprising: DNA rearrangement region detection, for detecting the DNA rearrangement regions of a test sample; and RNA product prediction, for predicting the RNA products of the detected DNA rearrangement regions, wherein the DNA rearrangement region detection is carried out with the above DNA rearrangement region detection method in which the gene rearrangement type of each DNA rearrangement region is determined, and the RNA product prediction predicts the RNA product of a DNA rearrangement region from its gene rearrangement type and the detailed annotated gene structure of the corresponding supporting big cluster, and comprises the following steps: when the gene rearrangement type of the DNA rearrangement region is a rearrangement between a gene and an intergenic region or an intragenic rearrangement, and the part of the participating gene involved in the rearrangement is annotated as the 5' end of the gene, predicting that the RNA product of the rearrangement region is a protein truncation; when the gene rearrangement type is a rearrangement between a gene and an intergenic region or an intragenic rearrangement, and the part of the participating gene involved in the rearrangement is annotated as the 3' end, predicting that the RNA product of the rearrangement region is a protein deletion; when the gene rearrangement type is a gene-gene rearrangement and the parts of both participating genes involved in the rearrangement are annotated as 5' ends, predicting that the RNA products of both genes are protein truncations; when the gene rearrangement type is a gene-gene rearrangement and the parts of both participating genes involved in the rearrangement are annotated as 3' ends, predicting that the RNA products of both genes are protein deletions; when the gene rearrangement type is a gene-gene rearrangement and, of the two participating genes, the part of one gene involved in the rearrangement is annotated as the 5' end and the part of the other gene is annotated as the 3' end, predicting that the RNA product of the DNA rearrangement region is a fusion protein obtained by fusion of the two genes; when the gene rearrangement type is a gene-gene rearrangement or an intragenic rearrangement: if the extreme end, in the mapping direction, of the mapping region corresponding to a group in the supporting big cluster is annotated as an intron, predicting the exon number of the group after transcription according to the general principle that introns are spliced out, and if that extreme end is annotated as an exon, predicting that the exon number of the group after transcription is unchanged; and, when the DNA rearrangement region is predicted to yield a fusion protein, if the codons at the junction of the predicted exon positions of the two groups complement each other to form a complete coding triplet, predicting that the reading frame at the fusion point of the fusion protein is not shifted.
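The prediction rules listed above amount to a small decision table, sketched below. The function names, the string labels and the argument layout are hypothetical. The reading-frame check is one possible interpretation of "the codons at the junction complement each other to form a complete triplet": it assumes that the number of coding bases kept from the 5' partner and the number of coding bases dropped from the 3' partner must be congruent modulo 3.

```python
def predict_rna_product(rearrangement_type, annotated_ends):
    """Predict the protein-level consequence of one DNA rearrangement.

    rearrangement_type: 'gene-gene', 'gene-intergenic' or 'intragenic'.
    annotated_ends: list of "5'" or "3'" labels, one per participating gene,
    describing which end of the gene takes part in the rearrangement.
    """
    if rearrangement_type in ("gene-intergenic", "intragenic"):
        (end,) = annotated_ends  # exactly one gene takes part
        return "protein truncation" if end == "5'" else "protein deletion"
    if rearrangement_type == "gene-gene":
        ends = sorted(annotated_ends)
        if ends == ["3'", "5'"]:
            return "fusion protein"
        if ends == ["5'", "5'"]:
            return "protein truncation (both genes)"
        return "protein deletion (both genes)"
    raise ValueError(f"unknown rearrangement type: {rearrangement_type}")


def fusion_in_frame(upstream_coding_bases, downstream_skipped_bases):
    """True when the fused coding sequence keeps the downstream frame.

    upstream_coding_bases: coding bases retained from the 5' partner.
    downstream_skipped_bases: coding bases of the 3' partner that lie
    before the junction and are therefore absent from the fusion.
    """
    return (upstream_coding_bases - downstream_skipped_bases) % 3 == 0
```

For example, a 5' partner that contributes 100 coding bases leaves one base of an unfinished codon; the frame is kept if the 3' partner resumes at the second base of one of its own codons, i.e. after skipping 1, 4, 7, ... coding bases.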
The present invention also provides a DNA rearrangement region detection device, characterized by comprising: a receiving part, which receives the alignment information obtained by aligning, against the reference genome, paired-end sequencing data comprising multiple pairs of reads obtained by paired-end sequencing of multiple sequencing fragments of the test sample; an obtaining and clustering part, which obtains from the alignment information all mappings whose mapping distance meets the predetermined mapping value and clusters them by the predetermined clustering rule into multiple small clusters; a merging and judging part, which merges all mappings in all small clusters into big clusters such that pairs whose two mappings fall into the same two small clusters are merged together, and judges whether the number of mapping pairs contained in each big cluster meets the predetermined number of pairs; a specificity judging part, which judges, for each big cluster that meets the predetermined number of pairs, whether the alignments on the reference genome of all reads corresponding to all mappings in the big cluster meet the predetermined alignment-specificity condition; a filtering part, which, within each big cluster that meets the predetermined alignment-specificity condition, places all mappings belonging to the same small cluster into one group and applies the predetermined filtering to each group; and a rearrangement determination part, which determines that the regions corresponding to all mappings in a big cluster whose two groups both remain after the predetermined filtering are the mapping regions, on the reference genome, of an actual DNA rearrangement region of the test sample, the corresponding big cluster being the supporting big cluster of that DNA rearrangement region.
The present invention also provides an RNA product prediction system, characterized by comprising: a DNA rearrangement region detection device, for detecting the DNA rearrangement regions of a test sample; and an RNA product prediction device, for predicting the RNA products of the detected DNA rearrangement regions, wherein the DNA rearrangement region detection device carries out the above DNA rearrangement region detection method, and the RNA product prediction device predicts the RNA product of a DNA rearrangement region from its gene rearrangement type and the detailed annotated gene structure of the corresponding supporting big cluster, and comprises: a protein variation type prediction part, which, when the gene rearrangement type of the DNA rearrangement region is a rearrangement between a gene and an intergenic region or an intragenic rearrangement and the part of the participating gene involved in the rearrangement is annotated as the 5' end of the gene, predicts that the RNA product of the rearrangement region is a protein truncation; when the gene rearrangement type is a rearrangement between a gene and an intergenic region or an intragenic rearrangement and the part of the participating gene involved in the rearrangement is annotated as the 3' end, predicts that the RNA product of the rearrangement region is a protein deletion; when the gene rearrangement type is a gene-gene rearrangement and the parts of both participating genes involved in the rearrangement are annotated as 5' ends, predicts that the RNA products of both genes are protein truncations; when the gene rearrangement type is a gene-gene rearrangement and the parts of both participating genes involved in the rearrangement are annotated as 3' ends, predicts that the RNA products of both genes are protein deletions; and when the gene rearrangement type is a gene-gene rearrangement and, of the two participating genes, the part of one gene involved in the rearrangement is annotated as the 5' end and the part of the other gene is annotated as the 3' end, predicts that the RNA product of the DNA rearrangement region is a fusion protein obtained by fusion of the two genes; a transcribed exon prediction part, which, when the gene rearrangement type is a gene-gene rearrangement or an intragenic rearrangement, predicts the exon number of a group after transcription according to the general principle that introns are spliced out if the extreme end, in the mapping direction, of the mapping region corresponding to that group in the supporting big cluster is annotated as an intron, and predicts that the exon number of the group after transcription is unchanged if that extreme end is annotated as an exon; and a fusion reading frame judging part, which, when the DNA rearrangement region is predicted to yield a fusion protein and the codons at the junction of the predicted exon positions of the two groups complement each other to form a complete coding triplet, predicts that the reading frame at the fusion point of the fusion protein is not shifted.
The present invention also provides a device for DNA rearrangement region detection, characterized by comprising: a memory for storing computer program instructions; and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is caused to perform the steps of the above DNA rearrangement region detection method.
The present invention also provides a computer-readable medium storing a computer program, wherein the computer program can be executed by a processor to carry out the steps of the above DNA rearrangement region detection method.
The present invention also provides a device for RNA product prediction, characterized by comprising: a memory for storing computer program instructions; and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is caused to perform the steps of the above RNA product prediction method.
The present invention also provides another computer-readable medium, characterized in that the computer-readable medium stores a computer program, wherein the computer program can be executed by a processor to carry out the steps of the above RNA product prediction method.
Effects of the invention
With the DNA rearrangement region and corresponding RNA product detection method, device and storage medium provided by the present invention, the DNA rearrangement region detection method clusters, on the basis of the alignment information of the test sample, the mappings whose mapping distance meets the predetermined mapping value into small clusters; merges the mappings in the small clusters, according to whether the two mappings of a pair fall into the same two small clusters, into big clusters whose number of mapping pairs reaches the predetermined number; and applies the predetermined filtering to the big clusters that meet the predetermined alignment-specificity condition. Supporting big clusters that support a DNA rearrangement region, and that carry the relevant information of every mapping, are thereby obtained accurately. The DNA rearrangement region can then be annotated accurately to obtain its detailed gene structure, and the gene rearrangement type, the chromosomal rearrangement type and so on can be determined accurately, so that the RNA product of the DNA rearrangement region can be predicted accurately. The whole prediction of the RNA product is thus simple, low-cost, efficient and accurate.
Brief description of the drawings
Fig. 1 is a structural block diagram of the RNA product prediction system according to the embodiment;
Fig. 2 is a structural block diagram of the DNA rearrangement region detection device according to the embodiment;
Fig. 3 is a schematic diagram of the breakpoint consistency condition according to the embodiment;
Fig. 4 is a schematic diagram of the mapping types according to the embodiment;
Fig. 5 is a schematic diagram of the determination of the chromosomal rearrangement type according to the embodiment;
Fig. 6 is a structural block diagram of the RNA product prediction device according to the embodiment;
Fig. 7 is a schematic diagram, according to the embodiment, of predicting the exon number after transcription from the general principle that introns are spliced out;
Fig. 8 is an overall flow chart of the steps of the RNA product prediction system according to the embodiment;
Fig. 9 is a flow chart of the steps of the DNA rearrangement region detection device according to the embodiment;
Fig. 10 is a flow chart of the predetermined filtering performed by the DNA rearrangement region detection device according to the embodiment;
Fig. 11 is a flow chart of the steps of the RNA product prediction device according to the embodiment;
Fig. 12 is a comparison of the detection results of the two methods in the verification example.
Specific implementation mode
Illustrate the specific implementation mode of the present invention below in conjunction with attached drawing.
One, comparison information source
In following embodiment, before proceeding, each sequencing segment that first sample to be tested is captured by probe into The double end sequencings of row obtain double end datas, which includes that a pair grows pairs of reading;Again by double ends of acquisition Comparison information is obtained on comparing to reference gene group.
2. Definitions or terms
Sequencing fragment: the DNA library usually built from the sample to be tested from the target individual by the library construction workflow adapted to the sequencing platform; it consists of random DNA fragments of a certain length;
Read: the sequence obtained by sequencing an end of a sequencing fragment;
Paired reads: the two sequenced sequences, obtained by paired-end sequencing, that come from the two ends of the same sequencing fragment;
Mapping: the region to which a read aligns at some position in the reference genome. For convenience of description, the invention calls the alignment of one read at one position in the reference genome a mapping. Because a read may match different positions of the reference genome partially or completely, that is, align at multiple positions, one read may have multiple mappings in the reference genome;
Paired mappings: the mappings corresponding to paired reads;
Mapping start position: the position in the reference genome of the start point of a mapping;
Mapping end position: the position in the reference genome of the end point of a mapping;
Mapping distance (intersize): the distance between the start points of paired mappings, that is, the distance between the mapping start positions of the two mappings of a pair;
Mapping direction: the direction of a mapping on the reference genome;
Mapping position: the position in the reference genome of the start point of a mapping;
Breakpoint: a breakpoint usually refers to the site on a read at the boundary between the bases that continuously match the reference genome and the bases that do not match it; this site is called the breakpoint of the read. In the present invention, for ease of description, the breakpoint of each read is represented by the breakpoint of each of its corresponding mappings, so every breakpoint referred to in the invention is the breakpoint of a mapping: within a mapping, the site of the base that is adjacent to the bases continuously matching the reference genome and that itself continuously fails to match the reference genome is taken as the breakpoint of the mapping;
Same breakpoint: the invention refers to the breakpoints of different mappings that map to the same genomic position as a same breakpoint;
Detailed gene structure: a gene structure that has been accurately annotated down to its specific composition of introns and exons;
Fusion gene: the process in which all or part of the sequences of two genes fuse with each other into a new gene; at the transcriptional level the term also refers to the fusion protein.
Embodiment
The following embodiment is described taking a tumor tissue sample as the sample to be tested from the target individual. It should be noted that, besides a tumor tissue sample, the sample to be tested may also be another, non-tissue sample that contains the tumor genome, such as ctDNA or exosomes.
The control sample involved in the embodiment is a non-tumor sample, which may be a non-tumor tissue sample or a blood sample from a population or from the above target individual. A non-tumor tissue sample is a tissue sample other than tumor tissue; a blood sample refers to the DNA of leukocytes extracted from blood.
The target individual is the individual whose RNA product is to be predicted; the population is a group made up of many individuals (including the above target individual).
Fig. 1 is a structural block diagram of the RNA product prediction system involved in the embodiment.
As shown in Fig. 1, the RNA product prediction system includes a DNA rearrangement region detection device 100 and an RNA product prediction device 200, which are communicatively connected through a communication network 300.
Fig. 2 is a structural block diagram of the DNA rearrangement region detection device involved in the embodiment.
As shown in Fig. 2, the DNA rearrangement region detection device 100 includes a receiving unit 10, a clustering unit 11, a merging judgment unit 12, a specificity judgment unit 13, a filtering unit 14, a rearrangement determination unit 15, a breakpoint judgment unit 16, a setting unit 17, an overlapping region determination unit 18, a chromosomal rearrangement type determination unit 19, an annotation unit 20, a gene rearrangement type determination unit 21, a detection-side communication unit 22, a detection-side temporary storage unit 23 and a detection-side control unit 24.
The receiving unit 10 receives the alignment information obtained by aligning the paired-end sequencing data of the sample to be tested against the reference genome. The alignment information includes the mappings of the reads in the reference genome obtained by the alignment, the mapping direction and mapping position of each mapping, the intersize between paired mappings, and the breakpoints corresponding to the different mappings, which are further found from the matching status of the mappings. In addition, because the sample to be tested in this embodiment is a tumor tissue sample, filtering against a control sample is also required. The receiving unit 10 therefore also receives control alignment information, obtained by performing the same paired-end sequencing on multiple sequencing fragments of the control sample and aligning the resulting paired-end sequencing data, which consist of multiple pairs of paired control reads, against the reference genome. The control alignment information includes at least each control mapping of the control reads in the reference genome and the control breakpoints corresponding to the different control mappings.
The clustering unit 11 obtains all mappings whose intersize satisfies the predetermined intersize and clusters them according to a predetermined clustering rule to obtain multiple small clusters, each containing several of the obtained mappings, such that the two mappings of a pair lie in two different small clusters. In other words, the paired mappings whose intersize satisfies the predetermined intersize distance are clustered according to a predetermined distance rule.
In this embodiment, the predetermined intersize distance may be 0 or greater than or equal to 2000 bp. A value of 0 indicates that the two mappings of a pair are not on the same chromosome, that is, their mapping positions are on different chromosomes. When the two mappings of a pair are on the same chromosome, that is, their mapping positions lie on the same chromosome, the predetermined mapping value is in the range of 2000 bp or more.
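The intersize rule above can be sketched as a small predicate. This is a minimal sketch, not the patented implementation: the function name, the argument layout and the use of mapping start positions as plain integers are assumptions made for illustration.

```python
def satisfies_predetermined_intersize(chrom_a, start_a, chrom_b, start_b,
                                      min_distance=2000):
    """Return True if a pair of mappings passes the intersize rule.

    Hypothetical helper: chrom_* are chromosome names and start_* are
    mapping start positions of the two mappings of one pair.
    """
    if chrom_a != chrom_b:
        # Different chromosomes: the text encodes this case as intersize 0.
        return True
    # Same chromosome: distance between the two mapping start positions.
    return abs(start_b - start_a) >= min_distance
```

A pair on chr2 and chr5 passes regardless of position, while a pair on the same chromosome passes only if its start positions are at least 2000 bp apart.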
In this embodiment, the predetermined clustering rule is that the mappings to be clustered whose corresponding mapping positions are spaced within the predetermined clustering distance are gathered into one small cluster. Specifically, adjacent mappings whose spacing satisfies the predetermined clustering distance are gathered together, so that all mappings in a small cluster satisfy the following relationship: the spacing between the end point of one mapping and the start point of the next adjacent mapping satisfies the predetermined clustering distance. The small clusters finally obtained satisfy the following positional relationship: between the mapping position of the last mapping in one small cluster and the mapping position of the first mapping in the adjacent small cluster, the spacing does not satisfy the above predetermined clustering distance. In this embodiment, each small cluster also satisfies the requirement that the two mappings of a pair lie in two different small clusters, that is, the two mappings of a pair are not in the same small cluster. If the clustering distance is too small, mappings that reflect the same DNA rearrangement region cannot be gathered together, which makes it harder to find the DNA rearrangement regions that exist. Conversely, if the clustering distance is too large, mappings that reflect different DNA rearrangement regions cannot be distinguished, which also makes the search harder and may cause mappings of different DNA rearrangement regions to be treated as reflecting the same DNA rearrangement region, leading to biased or wrong results. The inventors found through research that a predetermined clustering distance of 1000 bp or less is most suitable.
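The clustering rule can be illustrated with a short single-pass sketch. It assumes each mapping is a hypothetical `(chromosome, start, end)` tuple, compares each start with the furthest end seen so far in the current cluster (a small robustness choice not stated in the text), and leaves out the additional constraint that the two mappings of a pair must fall into different small clusters.

```python
def cluster_mappings(mappings, max_gap=1000):
    """Group (chrom, start, end) mappings into small clusters by gap.

    Two adjacent mappings share a cluster when the gap between the end of
    the cluster so far and the start of the next mapping is <= max_gap.
    """
    clusters = []
    cluster_end = None  # furthest end position seen in the current cluster
    for chrom, start, end in sorted(mappings):
        same_cluster = (
            clusters
            and clusters[-1][-1][0] == chrom
            and start - cluster_end <= max_gap
        )
        if same_cluster:
            clusters[-1].append((chrom, start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(chrom, start, end)])
            cluster_end = end
    return clusters
```

Sorting first keeps the pass linear after the sort; the `max_gap` default of 1000 follows the upper bound the embodiment names as most suitable.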
The merging judgment unit 12 first merges all mappings in all small clusters according to the relationship that the two small clusters corresponding to two paired mappings are the same, obtaining big clusters that each contain at least one pair of paired mappings, and then judges whether the number of pairs of paired mappings contained in each big cluster satisfies the predetermined pair count.
In this embodiment, merging is performed on the basis that the two small clusters corresponding to paired mappings are the same. For example, as shown in Table 1, this embodiment obtains a total of 4 pairs of mappings; the small cluster into which each mapping of each pair is clustered and the two small clusters corresponding to each pair are shown in Table 1.
It can be seen that, in these results, the two small clusters corresponding to mapping pair 1 and to mapping pair 3 are both cluster 1 and cluster 2, which satisfies the relationship that the two small clusters corresponding to paired mappings are the same. Merging gives a big cluster, here called big cluster 1, which includes mapping 11, mapping 12, mapping 31 and mapping 32. Likewise, mapping pairs 4 to 9 also satisfy this relationship, and merging gives another big cluster, here called big cluster 2, which includes mapping 41, mapping 42, mapping 51, mapping 52, mapping 61, mapping 62, mapping 71, mapping 72, mapping 81, mapping 82, mapping 91 and mapping 92.
In the judgment, it is determined whether the number of pairs of paired mappings in a big cluster satisfies the predetermined pair count, which in this embodiment is 6 or more. In the example above, big cluster 1 contains 2 mapping pairs, fewer than the predetermined pair count of 6, so big cluster 1 does not satisfy the predetermined pair count; the number of mapping pairs in big cluster 2 is greater than or equal to the predetermined pair count of 6, so big cluster 2 satisfies it.
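The merging and pair-count check can be sketched by keying each mapping pair on the unordered pair of small clusters it falls into. The data below only restate the prose example (pairs 1 and 3 in clusters 1 and 2, pairs 4 to 9 in clusters 3 and 4); pair 2 is left out because its clusters are not stated in the prose, and all names are hypothetical.

```python
from collections import defaultdict


def merge_into_big_clusters(pair_to_small_clusters):
    """Group mapping pairs whose two small clusters are the same."""
    big = defaultdict(list)
    for pair_id, (cluster_a, cluster_b) in pair_to_small_clusters.items():
        # frozenset makes (1, 2) and (2, 1) the same key.
        big[frozenset((cluster_a, cluster_b))].append(pair_id)
    return dict(big)


def meets_pair_count(pair_ids, min_pairs=6):
    """Predetermined pair count of the embodiment: 6 or more pairs."""
    return len(pair_ids) >= min_pairs


pairs = {1: (1, 2), 3: (2, 1),
         4: (3, 4), 5: (3, 4), 6: (3, 4), 7: (4, 3), 8: (3, 4), 9: (3, 4)}
big_clusters = merge_into_big_clusters(pairs)
kept = {key: ids for key, ids in big_clusters.items() if meets_pair_count(ids)}
```

With these inputs only the big cluster built from clusters 3 and 4 is kept, matching the outcome described for big cluster 2.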
The specificity judgment unit 13 judges whether the alignments in the reference genome of all reads corresponding to all mappings in a big cluster whose pair count satisfies the predetermined pair count satisfy a predetermined alignment specificity condition. In this embodiment, the predetermined alignment specificity condition is: the number of merged regions obtained by merging, according to a predetermined merging rule, all mappings in the reference genome of all the reads being judged satisfies a predetermined region quantity. Here "all the reads being judged" means "all reads corresponding to all mappings in the big cluster whose pair count satisfies the predetermined pair count".
In this embodiment, the predetermined merging rule is: the mappings to be merged whose corresponding mapping positions are spaced within the predetermined merging distance are merged into one merged region. That is, adjacent mappings whose spacing satisfies the predetermined merging distance are merged into one merged region, so that all mappings in a merged region satisfy the following relationship: the spacing between the end point of one mapping and the start point of the next adjacent mapping satisfies the predetermined merging distance. In this embodiment, the predetermined merging distance is 1000 bp or less. An example follows:
In this example, each mapping in the big clusters obtained in Table 1 was obtained by mapping a different read of the paired-end data, and all mappings in the reference genome corresponding to these reads are shown in Table 2.
As can be seen from Table 2, read 41, read 42, read 51, read 52, read 61, read 62, read 71, read 72, read 81, read 82, read 91 and read 92 in big cluster 2 correspond to the following mappings: mapping 41, mapping 42, mapping 51, mapping 53, mapping 52, mapping 54, mapping 61, mapping 63, mapping 62, mapping 64, mapping 66, mapping 71, mapping 72, mapping 81, mapping 83, mapping 85, mapping 82, mapping 84, mapping 91, mapping 92 and mapping 94. These mappings are merged according to the above predetermined merging rule and the number of merged regions obtained is counted; here, for example, 4 merged regions are obtained. As stated above, if the number of merged regions satisfies the predetermined region quantity, the alignments in the reference genome of all reads corresponding to all mappings in big cluster 2 satisfy the above predetermined alignment specificity condition, that is, big cluster 2 satisfies the predetermined alignment specificity condition. In this embodiment, the predetermined region quantity is 6 or less; since 4 < 6, big cluster 2 satisfies the above predetermined alignment specificity condition.
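The specificity check reduces to counting merged regions and comparing the count with the predetermined region quantity. The sketch below assumes hypothetical `(chromosome, start, end)` tuples for all mappings of the reads in one big cluster; the coordinates in the usage lines are invented for illustration and are not taken from Table 2.

```python
def count_merged_regions(mappings, merge_distance=1000):
    """Count regions after merging mappings closer than merge_distance."""
    regions = 0
    prev_chrom, region_end = None, None
    for chrom, start, end in sorted(mappings):
        if chrom == prev_chrom and start - region_end <= merge_distance:
            region_end = max(region_end, end)  # extend the current region
        else:
            regions += 1                       # open a new merged region
            prev_chrom, region_end = chrom, end
    return regions


def is_alignment_specific(mappings, max_regions=6, merge_distance=1000):
    """Predetermined region quantity of the embodiment: 6 or fewer."""
    return count_merged_regions(mappings, merge_distance) <= max_regions


example = [("chr2", 100, 250), ("chr2", 300, 450), ("chr2", 4000, 4150),
           ("chr5", 10, 160), ("chr5", 900, 1050), ("chr7", 50, 200)]
```

For `example`, four merged regions result, so the big cluster would pass the check with the default limit of 6.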
Theoretically, the closer the number of merged regions obtained is to the number of small clusters in the big cluster, that is, to 2, the better the alignment of the reads in the big cluster, in other words the better the specificity. However, because a rearrangement can itself cause non-unique alignment, the predetermined region quantity in this embodiment is not limited to 2. If the permitted quantity is too large, on the other hand, subsequent processing efficiency suffers. In this embodiment the predetermined region quantity is therefore 6 or less, which avoids missing rearrangements while still ensuring processing efficiency and improving processing speed.
Moreover, the predetermined merging distance is preferably twice the predetermined clustering distance, so that different rearrangement forms with the same gene structure can be merged together.
The filtering unit 14 takes, in each big cluster that satisfies the predetermined alignment specificity condition, all mappings whose corresponding small cluster is the same as one group, and performs the predetermined filtering on each group separately. For example, the small clusters corresponding to the above big cluster 2 are cluster 3 and cluster 4. Mapping 41, mapping 51, mapping 61, mapping 71, mapping 81 and mapping 91 all correspond to cluster 3, and mapping 42, mapping 52, mapping 62, mapping 72, mapping 82 and mapping 92 all correspond to cluster 4. Mappings 41, 51, 61, 71, 81 and 91 are therefore filtered as one group (here named group 1), and mappings 42, 52, 62, 72, 82 and 92 are filtered as another group (here named group 2).
The filtering unit 14 includes a first judging unit 14a, a second judging unit 14b, a third judging unit 14c, a fourth judging unit 14d, a fifth judging unit 14e, a sixth judging unit 14f, a seventh judging unit 14g, a first determination unit 14h, a second determination unit 14i and a third determination unit 14j.
The first judging unit 14a judges whether the mappings of a group include a mapping that has a breakpoint. For example, in group 1 it is judged whether any of the six mappings 41, 51, 61, 71, 81 and 91 has a breakpoint; if the judgment finds, for example, that mappings 41, 51, 61, 71 and 81 all have breakpoints, then group 1 contains breakpoints. Group 2 is judged in the same way; here, for example, group 2 is found to contain no breakpoint.
When the group is judged to contain no breakpoint, the second judging unit 14b judges whether any of the mappings in the reference genome of the reads corresponding to the group has a breakpoint. For example, group 2 above contains no breakpoint, so all mappings of the reads corresponding to the mappings in group 2 are examined, that is, the following mappings of Table 2: mapping 42, mapping 52, mapping 54, mapping 62, mapping 64, mapping 66, mapping 72, mapping 82, mapping 84, mapping 92 and mapping 94. If, for example, mapping 66 and mapping 94 are found to have breakpoints, it is judged that mappings with breakpoints exist for group 2.
When the second judging unit 14b judges that such a mapping exists, the first determination unit 14h determines that the corresponding group is retained; group 2 above is thus retained in the filtering so far.
When the first judging unit 14a judges that the group contains a mapping with a breakpoint, the third judging unit 14c judges whether the corresponding group contains a breakpoint that satisfies a predetermined consistency condition. For example, group 1 above contains mappings with breakpoints, so it is further judged whether there is a breakpoint that satisfies the predetermined consistency condition, that is, whether the breakpoints of mappings 41, 51, 61, 71 and 81 satisfy the predetermined consistency condition.
Fig. 3 is a schematic diagram of the breakpoint consistency condition involved in the embodiment.
In this embodiment, the predetermined consistency condition is: all mappings that share a same breakpoint have, starting from that breakpoint, an identical sequence that cannot be continuously aligned, and the number of these mappings is 2 or more. For example, in group 1 above, mappings 41, 51 and 61 have the same breakpoint. Taking the site of the base of that breakpoint as the starting point, it is checked whether, when these mappings are aligned to the reference genome, they share an identical sequence that cannot be continuously aligned; if so, group 1 contains a breakpoint that satisfies the predetermined consistency condition. More intuitively, in the example of Fig. 3 the mappings are arranged vertically and each is aligned to the reference genome. The grey parts are the parts of the mappings sharing the breakpoint that can be matched, and the black parts are the parts that cannot be matched. The parts of the mappings in the figure that cannot be aligned have a consecutive identical sequence, so the same breakpoint of these mappings satisfies the consistency condition.
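A minimal sketch of the consistency check follows. It assumes each mapping is reduced to a hypothetical `(breakpoint_position, unaligned_sequence)` pair, where the second element is the part of the read that cannot be aligned, read starting from the breakpoint. Because the text does not say how unaligned parts of different lengths are compared, the sketch compares them over the length of the shortest one; that choice is an assumption.

```python
from collections import defaultdict


def consistent_breakpoints(mappings, min_support=2):
    """Return breakpoint positions that satisfy the consistency condition.

    mappings: iterable of (breakpoint_position, unaligned_sequence).
    A breakpoint qualifies when at least min_support mappings share it and
    their unaligned sequences agree over the shortest one's length.
    """
    by_breakpoint = defaultdict(list)
    for position, unaligned in mappings:
        by_breakpoint[position].append(unaligned)

    reliable = []
    for position, sequences in by_breakpoint.items():
        if len(sequences) < min_support:
            continue
        shortest = min(len(s) for s in sequences)
        if shortest > 0 and len({s[:shortest] for s in sequences}) == 1:
            reliable.append(position)
    return sorted(reliable)
```

For instance, three mappings sharing position 1000 with unaligned parts `ACGTT`, `ACG` and `ACGTTA` agree on their common prefix, so 1000 would be reported, whereas two mappings at position 2000 with `GGG` and `GTA` would not.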
When the third judging unit 14c judges that the corresponding group contains a breakpoint that satisfies the predetermined consistency condition, the second determination unit 14i determines that the breakpoint satisfying the predetermined consistency condition is a reliable breakpoint and that the corresponding group is retained. For example, in group 1 above, mappings 41, 51 and 61 have the same breakpoint, which is judged to satisfy the consistency condition, so the breakpoint shared by these mappings is determined to be a reliable breakpoint.
The fourth judging unit 14d judges whether a group retained by the second determination unit 14i satisfies a first condition: for each mapping corresponding to each reliable breakpoint, the number of consecutive identical bases within a predetermined base range near the breakpoint is less than a predetermined base number. That is, for each mapping in the group that has a reliable breakpoint, if the mapping has consecutive identical bases within the predetermined base range near its reliable breakpoint, the number of these identical bases must be less than the predetermined base number; if a mapping has several reliable breakpoints, the judgment is made near each of them. Consecutive identical bases here means repeats of one and the same base, for example AAA. In this embodiment, the predetermined base range is the range of the mapping obtained by extending 20 bases to the left and to the right with the reliable breakpoint as midpoint: taking the site of the reliable breakpoint as the midpoint, the position in the reference genome is extended by 20 bases to each side to give the predetermined base range. In this embodiment, the predetermined base number is 20. A specific example follows:
For example, in group 1 above, the range of 20 bases to the left and right of the reliable breakpoint of mapping 41 is examined. Within this predetermined base range, the consecutive identical bases of mapping 41 are AAAAAA, 6 in total, which is less than the predetermined base number of 20. The other mappings with reliable breakpoints are judged in the same way. If, for every mapping corresponding to a reliable breakpoint, the number of consecutive identical bases within 20 bases to the left and right of the reliable breakpoint is less than 20, the group is considered to satisfy the first condition.
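The first condition amounts to bounding the longest run of one repeated base in a window around each reliable breakpoint. The sketch below is an illustration under assumptions: it works on the read sequence of a mapping with the breakpoint given as a 0-based index into that sequence, whereas the text defines the window by reference genome position; all function names are hypothetical.

```python
from itertools import groupby


def longest_identical_run(seq):
    """Length of the longest stretch of one repeated base in seq."""
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


def window_around(seq, breakpoint_index, flank=20):
    """Bases within `flank` positions on each side of the breakpoint."""
    return seq[max(0, breakpoint_index - flank): breakpoint_index + flank + 1]


def passes_first_condition(mappings, flank=20, max_run=20):
    """mappings: iterable of (sequence, breakpoint_index) pairs."""
    return all(
        longest_identical_run(window_around(seq, index, flank)) < max_run
        for seq, index in mappings
    )


# A mapping with a run of six A bases next to its breakpoint, as in the text.
seq_ok = "ACGT" * 5 + "AAAAAA" + "CGTA" * 5
```

With `seq_ok` and a breakpoint at index 22, the longest run in the window is 6, which is below the predetermined base number of 20.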
The fifth judging unit 14e judges whether a group retained by the second determination unit 14i satisfies a second condition: the number of mappings that have more reliable breakpoints than a first predetermined breakpoint number, as a proportion of the total number of mappings in the group, is less than a predetermined ratio. That is, among the mappings of the group that have reliable breakpoints, it is checked whether the number of reliable breakpoints exceeds the first predetermined breakpoint number; the mappings for which it does are counted, and the proportion of this count in the total number of mappings in the group is calculated. If the proportion is less than the predetermined ratio, the group satisfies the second condition. In this embodiment, the first predetermined breakpoint number is 2 and the predetermined ratio is one third. A specific example follows:
For example, group 1 above has 6 mappings in total: mapping 41, mapping 51, mapping 61, mapping 71, mapping 81 and mapping 91. Counting shows that mapping 41, mapping 51 and mapping 61 have reliable breakpoints: mapping 41 has 3, mapping 51 has 1 and mapping 61 has 2. Only mapping 41 has more reliable breakpoints than the first predetermined breakpoint number, so the proportion is 1/6, which is less than the predetermined ratio of 1/3, and group 1 satisfies the second condition.
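The arithmetic of the second condition can be checked with a few lines. In this sketch the group is represented, as an assumption, by a dictionary from a mapping label to its number of reliable breakpoints, and exact fractions are used so that the comparison with one third is not affected by floating-point rounding.

```python
from fractions import Fraction


def passes_second_condition(reliable_breakpoint_counts,
                            first_breakpoint_number=2,
                            predetermined_ratio=Fraction(1, 3)):
    """reliable_breakpoint_counts: {mapping label: reliable breakpoints}."""
    total = len(reliable_breakpoint_counts)
    if total == 0:
        raise ValueError("group contains no mappings")
    over = sum(1 for n in reliable_breakpoint_counts.values()
               if n > first_breakpoint_number)
    return Fraction(over, total) < predetermined_ratio


# Group 1 as described in the text: only mapping 41 exceeds 2 breakpoints.
group_1 = {"mapping 41": 3, "mapping 51": 1, "mapping 61": 2,
           "mapping 71": 0, "mapping 81": 0, "mapping 91": 0}
```

For `group_1` the proportion is 1/6, so the function returns `True`.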
The sixth judging unit 14f judges whether a group retained by the second determination unit 14i satisfies a third condition: the number of same reliable breakpoints that are each supported by no more than a predetermined total number of mappings is less than a second predetermined breakpoint number. That is, all same reliable breakpoints supported by a number of mappings less than or equal to the predetermined total number are counted; if their number is less than the second predetermined breakpoint number, the group satisfies the third condition. In this embodiment, the predetermined total number is 2 and the second predetermined breakpoint number is 10. A specific example follows:
For example, group 1 above has 6 mappings in total: mapping 41, mapping 51, mapping 61, mapping 71, mapping 81 and mapping 91. Suppose that mapping 41, mapping 51 and mapping 61 have, or correspond to, one same reliable breakpoint (same reliable breakpoint 1), that is, the breakpoints of these 3 mappings are at the same site, or in other words same reliable breakpoint 1 is supported by these 3 mappings; and that mapping 71 and mapping 81 have, or correspond to, another same reliable breakpoint (same reliable breakpoint 2). The same reliable breakpoints in group 1 are then same reliable breakpoint 1 and same reliable breakpoint 2, of which only same reliable breakpoint 2 is supported by a number of mappings (2) that is less than or equal to the predetermined total number of 2. Group 1 therefore has only 1 same reliable breakpoint whose number of supporting mappings is less than or equal to the predetermined total number, and since 1 is less than the second predetermined breakpoint number of 10, group 1 satisfies the third condition.
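The third condition can be sketched as a count of weakly supported breakpoints. The dictionary layout (same reliable breakpoint label to number of supporting mappings) and the function name are assumptions made for illustration.

```python
def passes_third_condition(support_by_breakpoint,
                           predetermined_total=2,
                           second_breakpoint_number=10):
    """support_by_breakpoint: {same reliable breakpoint: supporting mappings}."""
    weakly_supported = sum(1 for n in support_by_breakpoint.values()
                           if n <= predetermined_total)
    return weakly_supported < second_breakpoint_number


# Group 1 as described: breakpoint 1 has 3 supporting mappings, breakpoint 2 has 2.
group_1_support = {"same reliable breakpoint 1": 3,
                   "same reliable breakpoint 2": 2}
```

Here only one breakpoint is weakly supported, and 1 is below 10, so the group passes.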
The seventh judging unit 14g judges whether a group retained by the second determination unit 14i satisfies a fourth condition: among all mappings with reliable breakpoints, the number of mappings that are identical to a control mapping in the control alignment information and whose reliable breakpoint is identical to the control breakpoint of that control mapping is less than or equal to a predetermined number. That is, all mappings with reliable breakpoints are compared with all control mappings in the control alignment information to find the mappings that are identical to a control mapping, identical here meaning a mapping obtained by aligning the same read sequence to the same position. For each mapping found in this way, it is checked whether it has a reliable breakpoint and whether that reliable breakpoint is the same as the control breakpoint on the identical control mapping, that is, whether the two correspond to the same site. The mappings for which this holds are counted, and if their number is less than or equal to the predetermined number, the corresponding group is considered to satisfy the fourth condition. In this embodiment, the predetermined number is 3 to 5, which effectively removes rearrangement events unrelated to tumor variation.
The third determination unit 14j determines that a group satisfying one or more of the first condition, the second condition, the third condition and the fourth condition is retained. In this embodiment, a group that satisfies all four conditions at the same time is preferentially retained.
The rearrangement determination unit 15 takes a big cluster whose two groups are both retained after being filtered separately by the filtering unit 14, determines the region corresponding to the mappings in that big cluster to be the mapping region in the reference genome of an actual DNA rearrangement region of the sample to be tested, and determines the corresponding big cluster to be a supporting big cluster that supports the DNA rearrangement region.
Fig. 4 is a schematic diagram of the mapping types involved in the embodiment.
Fig. 4 shows the mapping types of the paired mappings in a supporting big cluster that supports a DNA rearrangement region in which a DNA rearrangement occurs between gene A and gene B.
As shown in Fig. 4, the breakpoint judgment unit 16 judges, one by one for each pair of paired mappings in the supporting big cluster, which of the following breakpoint conditions the pair satisfies: a first breakpoint condition that neither mapping has a breakpoint (BP), a second breakpoint condition that both mappings have a breakpoint, or a third breakpoint condition that only one mapping has a breakpoint. That is, the breakpoints of the two mappings of each pair are examined to see whether neither mapping has a breakpoint, so that both match completely (Whole Mapping) and neither passes through the breakpoint in the diagram; whether both have a breakpoint (Two BP), so that both pass through the breakpoint in the diagram; or whether one mapping has a breakpoint and the other does not (One Partner BP), so that one passes through the breakpoint in the diagram and the other does not.
As shown in Fig. 4, when the breakpoint judgment unit 16 judges that the two paired mappings satisfy the first breakpoint condition, the setting unit 17 sets the two paired mappings as first-class mappings (I); when the breakpoint judgment unit 16 judges that the two paired mappings satisfy the second breakpoint condition, the setting unit 17 sets them as second-class mappings (II); and when the breakpoint judgment unit 16 judges that the two paired mappings satisfy the third breakpoint condition, the setting unit 17 sets them as third-class mappings (III).
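The three mapping classes depend only on how many of the two mappings of a pair carry a breakpoint, which a short sketch makes explicit. The function name and the boolean inputs are assumptions for illustration.

```python
def classify_pair(first_has_breakpoint, second_has_breakpoint):
    """Return the mapping class (I, II or III) of one pair of mappings."""
    with_breakpoint = int(first_has_breakpoint) + int(second_has_breakpoint)
    return {
        0: "I",    # Whole Mapping: neither mapping has a breakpoint
        2: "II",   # Two BP: both mappings have a breakpoint
        1: "III",  # One Partner BP: exactly one mapping has a breakpoint
    }[with_breakpoint]
```

The order of the two arguments does not matter, since only the count of mappings with a breakpoint is used.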
The overlapping region determination unit 18 determines, for a supporting big cluster that includes at least two classes of mappings, the region shared by the mapping regions of the different classes, which gives the overlapping region. For example, when a supporting big cluster includes only first-class mappings and second-class mappings, the mapping regions corresponding to all paired mappings of the first class are taken, then the mapping regions corresponding to all paired mappings of the second class, and the region covered by both of these mapping regions is taken as the overlapping region. This is illustrated with Table 3, which lists the mapping pair relationships and mapping regions in the supporting big cluster obtained:
As can be seen from Table 3, looking for the overlap between 5-70 and 18-85 gives the overlapping region between these two classes, which is 18-70.
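The overlap in the example is an interval intersection, which can be sketched as follows. Representing each class's mapping region as a single hypothetical `(start, end)` tuple is an assumption; the numbers in the usage line are the ones quoted in the text.

```python
def overlapping_region(regions):
    """Intersect (start, end) mapping regions of the different classes.

    Returns the shared (start, end) region, or None if there is none.
    """
    start = max(region_start for region_start, _ in regions)
    end = min(region_end for _, region_end in regions)
    return (start, end) if start <= end else None


shared = overlapping_region([(5, 70), (18, 85)])
```

The same function accepts three regions when a supporting big cluster contains all three classes of mappings.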
When an overlapping region exists between the different classes of mappings in a supporting big cluster, the DNA rearrangement region supported by that big cluster is more reliable.
Moreover, taking out the alignment of the overlapping region alone amounts to magnifying the rearranged part; when this part is visualized, it is clearer and easier to observe.
Fig. 5 is a schematic diagram of the determination of the chromosomal rearrangement type involved in the embodiment.
The chromosomal rearrangement type determination unit 19 determines the rearrangement type of the determined DNA rearrangement region, specifically as follows. When the mapping positions of the two groups of mappings in the supporting big cluster are on different chromosomes, that is, every mapping in one group is on one chromosome and every mapping in the other group is on another chromosome, the chromosomal rearrangement type of the DNA rearrangement region is determined to be translocation. When the mapping positions of the two groups of mappings in the supporting big cluster are on the same chromosome, then, as shown in Fig. 5, the chromosomal rearrangement type of the corresponding DNA rearrangement region is determined to be inversion when the directions of the two paired mappings of every class are the same; when the directions of the two paired mappings of every class are opposite, it is determined to be deletion or duplication, according to the cases illustrated in Fig. 5.
Because the gene rearrangement type bears directly on clinical guidance for targeted cancer therapy, the detailed gene structure of the DNA rearrangement region must be determined, both to establish the specific gene rearrangement type and to predict the corresponding RNA product.
The annotation section 20 annotates the mapping areas based on the reference transcripts, in combination with the mapping direction and mapping position of each mapping contained in the supporting large cluster, to obtain the detailed gene structure of the DNA rearrangement region. The detailed gene structure is a structure resolved down to the specific introns and exons.
The gene rearrangement type determination section 21 determines the gene rearrangement type of the DNA rearrangement region based on the detailed gene structure obtained by annotation. Specifically: when the detailed gene structure annotated for the mapping areas corresponds to the inside of a single gene, the type is determined to be an intragenic rearrangement; when it corresponds to a combination of the inside of one gene and the intergenic region between that gene and another gene, the type is determined to be a gene-intergenic rearrangement; when it corresponds to a combination of the insides of two genes, the type is determined to be a gene-gene rearrangement.
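The three-way decision described above can be sketched as follows. This is a minimal illustration under stated assumptions: each of the two groups is assumed to have been annotated with either a gene name or `None` (meaning an intergenic region), and the function name and return labels are hypothetical.

```python
from typing import Optional


def gene_rearrangement_type(gene_a: Optional[str], gene_b: Optional[str]) -> str:
    """Classify a rearrangement from the gene annotated for each group.

    gene_a / gene_b: annotated gene name, or None for an intergenic region
    (an assumed encoding; the embodiment does not prescribe one).
    """
    if gene_a is None and gene_b is None:
        # Case not described in the text; reported separately here.
        return "unclassified"
    if gene_a is None or gene_b is None:
        return "gene-intergenic rearrangement"
    if gene_a == gene_b:
        return "intragenic rearrangement"
    return "gene-gene rearrangement"


print(gene_rearrangement_type("EML4", "ALK"))   # gene-gene rearrangement
print(gene_rearrangement_type("EGFR", "EGFR"))  # intragenic rearrangement
print(gene_rearrangement_type("ALK", None))     # gene-intergenic rearrangement
```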
In addition, based on the detailed gene structure, different supporting large clusters that are annotated with the same gene structure are regarded as different rearrangement forms of that gene structure. For example, if two supporting large clusters are both found, after annotation, to be EML4-ALK, then the mapping areas corresponding to these two large clusters are different rearrangement forms of the EML4-ALK gene structure, that is, two rearrangement forms produced when the EML4 gene and the ALK gene are truncated and combined in different ways during the variation.
The detection-side communication section 22 sends each item of data obtained by the detection to the RNA product prediction device 200.
The detection-side temporary storage section 23 temporarily stores the data received or generated during operation of the DNA rearrangement region detection device 100.
The detection-side control section 24 contains a computer program for controlling the operation of the receiving section 10, the cluster obtaining section 11, the merge judging section 12, the specificity judging section 13, the filtering section 14, the rearrangement determination section 15, the breakpoint judging section 16, the setting section 17, the overlapping region determining section 18, the chromosomal rearrangement type determination section 19, the annotation section 20, the gene rearrangement type determination section 21, the detection-side communication section 22 and the detection-side temporary storage section 23.
Fig. 6 is a structural block diagram of the RNA product prediction device according to the embodiment.
As shown in Fig. 6, the RNA product prediction device 200 predicts the RNA product of the DNA rearrangement region determined by the DNA rearrangement region detection device 100, and includes: a prediction-side communication section 210, a protein variation type prediction section 211, a post-transcription exon number prediction section 212, a fusion-site reading frame judging section 213, a prediction-side temporary storage section 214 and a prediction-side control section 215.
The prediction-side communication section 210 receives the data sent by the DNA rearrangement region detection device 100, including the detailed gene structure of the supporting large cluster of the DNA rearrangement region, the direction, class and mapping position of each mapping in the supporting large cluster, and the mapping areas corresponding to the DNA rearrangement region.
The protein variation type prediction section 211 predicts the protein variation type after transcription of the DNA rearrangement region, based on the gene rearrangement type and the annotated detailed gene structure, specifically as follows:
When the gene rearrangement type of the DNA rearrangement region is a gene-intergenic rearrangement or an intragenic rearrangement, and the part of the participating gene that takes part in the rearrangement is annotated as the 5' end of the gene, the RNA product of the rearrangement region is predicted to be a protein truncation;
When the gene rearrangement type of the rearrangement region is a gene-intergenic rearrangement or an intragenic rearrangement, and the part of the participating gene that takes part in the rearrangement is annotated as the 3' end, the RNA product of the rearrangement region is predicted to be a protein deletion;
When the gene rearrangement type of the DNA rearrangement region is a gene-gene rearrangement, and the parts of both participating genes that take part in the rearrangement are annotated as 5' ends, the RNA products of both genes are predicted to be protein truncations;
When the gene rearrangement type of the DNA rearrangement region is a gene-gene rearrangement, and the parts of both participating genes that take part in the rearrangement are annotated as 3' ends, the RNA products of both genes are predicted to be protein deletions;
When the gene rearrangement type of the DNA rearrangement region is a gene-gene rearrangement, and of the two participating genes the part of one gene that takes part in the rearrangement is annotated as the 5' end while the part of the other is annotated as the 3' end, the RNA product of the DNA rearrangement region is predicted to be a fusion protein obtained by fusing the two genes.
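The five rules above amount to a small decision table. The following sketch encodes them directly; the function name, the string labels for rearrangement types and the `"5'"`/`"3'"` encoding of the annotated ends are illustrative assumptions rather than anything specified by the embodiment.

```python
from typing import Tuple

VALID_ENDS = ("5'", "3'")


def predict_rna_product(rearrangement_type: str, ends: Tuple[str, ...]) -> str:
    """Predict the RNA product from the rearrangement type and annotated gene ends.

    rearrangement_type: "gene-intergenic", "intragenic" or "gene-gene" (assumed labels).
    ends: annotated end of each participating gene part, "5'" or "3'".
    """
    if any(end not in VALID_ENDS for end in ends):
        raise ValueError(f"unexpected end annotation: {ends}")

    if rearrangement_type in ("gene-intergenic", "intragenic") and len(ends) == 1:
        return "protein truncation" if ends[0] == "5'" else "protein deletion"

    if rearrangement_type == "gene-gene" and len(ends) == 2:
        if set(ends) == {"5'"}:
            return "protein truncation (both genes)"
        if set(ends) == {"3'"}:
            return "protein deletion (both genes)"
        return "fusion protein"  # one 5' part joined to one 3' part

    raise ValueError("combination not covered by the rules in the text")


print(predict_rna_product("gene-gene", ("5'", "3'")))  # fusion protein
print(predict_rna_product("intragenic", ("5'",)))      # protein truncation
```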
Fig. 7 is a schematic diagram, according to the embodiment, of predicting the post-transcription exon number based on the general principle that introns are spliced out.
When the type of the DNA rearrangement region is a gene-gene rearrangement or an intragenic rearrangement, the post-transcription exon number prediction section 212 predicts the exon numbers, after transcription, of the structures corresponding to the rearrangement positions of the two groups in the supporting large cluster of the DNA rearrangement region, so as to predict the exon numbers at the rearrangement junction after transcription. Specifically:
When the extreme end, in the mapping direction, of the mapping area of a group in the supporting large cluster is annotated as an intron, the exon number of that group after transcription is predicted according to the general principle that introns are spliced out (as shown in Fig. 7).
When the extreme end of a group in the supporting large cluster is annotated as an exon, the exon number of that group after transcription is predicted to be unchanged.
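The two cases above can be illustrated with a short sketch. It rests on assumptions that the text does not spell out: introns are numbered so that intron k lies between exon k and exon k+1, the upstream (5') partner keeps exons up to exon k, and the downstream (3') partner resumes at exon k+1. The function name and the `side` argument are hypothetical.

```python
def exon_after_transcription(feature: str, number: int, side: str) -> int:
    """Predict the exon number at the junction after transcription.

    feature: "exon" or "intron", the annotation at the extreme end of the group.
    number:  the exon or intron number of that annotation.
    side:    "upstream" (5' partner) or "downstream" (3' partner); assumed input.
    """
    if side not in ("upstream", "downstream"):
        raise ValueError("side must be 'upstream' or 'downstream'")
    if feature == "exon":
        # End annotated as an exon: the exon number is unchanged.
        return number
    if feature == "intron":
        # Introns are spliced out, so the junction moves to the flanking exon.
        # Assumed numbering: intron k sits between exon k and exon k+1.
        return number if side == "upstream" else number + 1
    raise ValueError("feature must be 'exon' or 'intron'")


print(exon_after_transcription("intron", 13, "upstream"))    # 13
print(exon_after_transcription("intron", 19, "downstream"))  # 20
print(exon_after_transcription("exon", 7, "upstream"))       # 7
```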
Many targeted drugs are directed at specific exon changes in the variant protein. From the exon numbers it can be determined whether the protein transcribed after the DNA rearrangement differs in domain function from the original protein produced when no rearrangement has occurred, and which functional parts of the domains remain, which provides better guidance for targeted medication.
The fusion-site reading frame judging section 213 operates when the RNA product of the DNA rearrangement region predicted by the protein variation type prediction section 211 is a fusion protein: when the codon state at the junction of the predicted exon positions of the two groups forms a complete triplet codon, it predicts that the reading frame at the fusion site of the fusion protein is not shifted.
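One way to picture the triplet check at the fusion junction is the following sketch. It is an assumption-laden illustration: the inputs are taken to be the number of coding bases contributed by the upstream partner before the junction and the number of coding bases that precede the junction in the downstream gene's own transcript. Neither quantity nor the function name comes from the embodiment.

```python
def fusion_in_frame(upstream_coding_bases: int, downstream_coding_offset: int) -> bool:
    """Return True if the codon at the junction is completed as a full triplet.

    upstream_coding_bases:    coding bases kept from the 5' partner (assumed input).
    downstream_coding_offset: coding bases preceding the junction in the 3' partner's
                              native transcript (assumed input).
    """
    # Bases of an incomplete codon left over at the end of the upstream part.
    leftover = upstream_coding_bases % 3
    # Bases the downstream part supplies before its next native codon boundary.
    supplied = (3 - downstream_coding_offset % 3) % 3
    # The frame is preserved when leftover and supplied bases form whole codons.
    return (leftover + supplied) % 3 == 0


print(fusion_in_frame(301, 100))  # True: 1 leftover base + 2 supplied bases
print(fusion_in_frame(300, 100))  # False: 0 leftover bases + 2 supplied bases
```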
The prediction-side temporary storage section 214 temporarily stores the data received or generated during operation of the RNA product prediction device 200.
The prediction-side control section 215 contains a computer program for controlling the operation of the prediction-side communication section 210, the protein variation type prediction section 211, the post-transcription exon number prediction section 212, the fusion-site reading frame judging section 213 and the prediction-side temporary storage section 214.
In this embodiment, the RNA product prediction method corresponds to the step flow of the RNA product prediction system, the DNA rearrangement region detection method corresponds to the step flow of the DNA rearrangement region detection device, and the RNA product prediction steps correspond to the step flow of the RNA product prediction device.
Fig. 8 is an overall step flow chart of the RNA product prediction system according to the embodiment.
As shown in Fig. 8, in this embodiment the step flow of the RNA product prediction system includes the following steps:
Step S1: based on the alignment information of the sample to be tested and the alignment information of the control sample, the DNA rearrangement regions of the sample to be tested are detected, and the flow then enters step S2;
Step S2: based on the detailed gene structure obtained in step S1 and the gene rearrangement type of the DNA rearrangement region, the RNA product of the DNA rearrangement region is predicted.
Fig. 9 is a step flow chart of the DNA rearrangement region detection device according to the embodiment.
As shown in Fig. 9, in this embodiment the step flow of the DNA rearrangement region detection device 100 comprises the following steps:
Step S1-1: the receiving section 10 receives the alignment information of the sample to be tested and the control alignment information of the control sample, and the flow then enters step S1-2;
Step S1-2: the cluster obtaining section 11 obtains all mappings whose mapping distance (insert size) satisfies the predetermined value and clusters them according to the predetermined clustering rule to obtain multiple small clusters, each containing several of the obtained mappings, with the two mappings of a pair lying in two different small clusters; the flow then enters step S1-3;
Step S1-3: the merge judging section 12 merges all mappings in all small clusters, according to the relationship that the two small clusters corresponding to two paired mappings are the same, to obtain large clusters each containing at least one pair of paired mappings, and judges for each large cluster whether the number of pairs of paired mappings it contains satisfies the predetermined number of pairs; when it does, the flow enters step S1-4, and when it does not, the flow ends;
Step S1-4: the specificity judging section 13 judges, for each large cluster whose number of pairs satisfies the predetermined number, whether the alignments in the reference genome of all reads corresponding to all mappings in that large cluster satisfy the predetermined alignment specificity condition; when they do, the flow enters step S1-5, and when they do not, the flow ends;
Step S1-5: for each large cluster that satisfies the predetermined alignment specificity condition, the filtering section 14 treats all mappings belonging to the same small cluster as one group and applies the predetermined filtering to each group; the flow then enters step S1-6;
Step S1-6: the detection-side control section 24 judges whether both groups in each large cluster are retained after the predetermined filtering; when the judgment is yes, the flow enters step S1-7, and when it is no, the flow ends;
Step S1-7: the rearrangement determination section 15 determines that the region corresponding to all mappings in the corresponding large cluster is the mapping area in the reference genome of an actual DNA rearrangement region of the sample to be tested, and that the corresponding large cluster is a supporting large cluster that supports this DNA rearrangement region; the flow then enters step S1-8 and step S1-18 respectively;
Step S1-8: the breakpoint judging section 16 judges, pair by pair, whether each pair of paired mappings in the supporting large cluster satisfies the first breakpoint condition that neither mapping has a breakpoint; when the condition is satisfied, the flow enters step S1-9, and when it is not, the flow enters step S1-10;
Step S1-9: the setting section 17 sets the two paired mappings as first-class mappings, and the flow then enters step S1-12;
Step S1-10: the breakpoint judging section 16 goes on to judge whether the two paired mappings satisfy the second breakpoint condition that both mappings have a breakpoint; when the condition is satisfied, the flow enters step S1-11, and when it is not, the flow enters step S1-12;
Step S1-11: the setting section 17 sets the two paired mappings as second-class mappings, and the flow then enters step S1-12;
Step S1-12: the breakpoint judging section 16 goes on to judge whether the two paired mappings satisfy the third breakpoint condition that only one of them has a breakpoint; when the condition is satisfied, the flow enters step S1-13, and when it is not, the flow enters step S1-14;
Step S1-13: the setting section 17 sets the two paired mappings as third-class mappings, and the flow then enters step S1-14;
Step S1-14: the detection-side control section 24 judges whether all paired mappings in each supporting large cluster have been judged; when the judgment is yes, the flow enters step S1-15, and when it is no, the flow returns to step S1-8;
Step S1-15: the detection-side control section 24 judges, for each supporting large cluster, whether it contains at least two classes of mappings; when a supporting large cluster is judged to do so, the flow enters step S1-16, and when it is judged not to, the flow enters step S1-17;
Step S1-16: for each supporting large cluster that includes at least two classes of mappings, the overlapping region determining section 18 determines the region where the mapping areas of the different classes overlap, obtaining the overlapping region corresponding to that supporting large cluster; the flow then enters step S1-17;
Step S1-17: the chromosomal rearrangement type determination section 19 determines the chromosomal rearrangement type of each DNA rearrangement region according to the mapping positions and mapping directions of the mappings in each group of each supporting large cluster, and the flow then ends;
Step S1-18: the annotation section 20 annotates each mapping area based on the reference transcripts, in combination with the mapping direction and mapping position of each mapping contained in each supporting large cluster, to obtain the detailed gene structure of the DNA rearrangement region; the flow then enters step S1-19;
Step S1-19: the gene rearrangement type determination section 21 determines the gene rearrangement type of each DNA rearrangement region based on the detailed gene structure obtained by annotation, and the flow then enters step S1-20;
Step S1-20: the data obtained by detecting the DNA rearrangement region are sent to the RNA product prediction device 200 through the detection-side communication section 22, and the flow then ends.
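Steps S1-2 and S1-3 can be pictured with the following sketch, which clusters mappings into small clusters and then merges pairs whose two mappings fall into the same two small clusters. It is a simplified illustration, not the device's implementation: the tuple layout, the function names and the use of single-linkage clustering along each chromosome are assumptions, while the default values of 1000 bp and 6 pairs follow the limits stated in claim 6.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Mapping = Tuple[str, str, int]  # (pair_id, chromosome, position); hypothetical layout


def cluster_mappings(mappings: List[Mapping], max_gap: int = 1000) -> Dict[int, int]:
    """Assign each mapping (by list index) to a small cluster.

    Assumed rule: neighbouring mappings on one chromosome join the same
    small cluster when they are at most max_gap bases apart.
    """
    cluster_of: Dict[int, int] = {}
    cluster_id = 0
    previous = None
    ordered = sorted(enumerate(mappings), key=lambda item: (item[1][1], item[1][2]))
    for index, (_, chrom, pos) in ordered:
        if previous is None or chrom != previous[0] or pos - previous[1] > max_gap:
            cluster_id += 1  # start a new small cluster
        cluster_of[index] = cluster_id
        previous = (chrom, pos)
    return cluster_of


def large_clusters(mappings: List[Mapping], max_gap: int = 1000,
                   min_pairs: int = 6) -> Dict[FrozenSet[int], List[str]]:
    """Merge pairs that link the same two small clusters; keep clusters with enough pairs."""
    cluster_of = cluster_mappings(mappings, max_gap)
    clusters_by_pair: Dict[str, List[int]] = defaultdict(list)
    for index, (pair_id, _, _) in enumerate(mappings):
        clusters_by_pair[pair_id].append(cluster_of[index])
    merged: Dict[FrozenSet[int], List[str]] = defaultdict(list)
    for pair_id, clusters in clusters_by_pair.items():
        # The two mappings of a pair must lie in two different small clusters.
        if len(clusters) == 2 and clusters[0] != clusters[1]:
            merged[frozenset(clusters)].append(pair_id)
    return {key: pairs for key, pairs in merged.items() if len(pairs) >= min_pairs}


# Toy data: six pairs linking chr2 and chr5, plus one isolated pair.
reads: List[Mapping] = []
for i in range(6):
    reads.append((f"p{i}", "chr2", 1000 + 10 * i))
    reads.append((f"p{i}", "chr5", 50000 + 10 * i))
reads += [("q", "chr2", 900000), ("q", "chr7", 100)]
print(list(large_clusters(reads).values()))  # [['p0', 'p1', 'p2', 'p3', 'p4', 'p5']]
```

The later steps (specificity check, per-group filtering, breakpoint classification) would operate on the large clusters returned here.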
Fig. 10 is a step flow chart of the predetermined filtering performed by the DNA rearrangement region detection device according to the embodiment.
As shown in Fig. 10, the flow of the step in which the DNA rearrangement region detection device of this embodiment applies the predetermined filtering to each group (S1-5) specifically includes the following steps:
Step S1-5-1: for each group in a large cluster that satisfies the predetermined alignment specificity condition, the first judging unit 14a judges, group by group, whether the mappings included in the group contain a mapping with a breakpoint; when they do not, the flow enters step S1-5-2, and when they do, the flow enters step S1-5-4;
Step S1-5-2: the second judging unit 14b judges whether, among all mappings in the reference genome of the reads corresponding to the group, there is a mapping with a breakpoint; when there is, the flow enters step S1-5-3, and when there is not, the flow ends;
Step S1-5-3: the first determination unit 14h determines that the corresponding group is retained, and the flow then enters step S1-5-12;
Step S1-5-4: the third judging unit 14c judges whether the corresponding group contains a breakpoint that satisfies the predetermined consistency condition; when it does, the flow enters step S1-5-5, and when it does not, the flow ends;
Step S1-5-5: the second determination unit 14i determines that the breakpoint satisfying the predetermined consistency condition is a reliable breakpoint and that the corresponding group is retained; the flow then enters steps S1-5-6, S1-5-7, S1-5-8 and S1-5-9 respectively;
Step S1-5-6: the fourth judging unit 14d judges whether the group retained by the second determination unit 14i satisfies the first condition; the flow enters step S1-5-10 whether the condition is judged to be satisfied or not;
Step S1-5-7: the fifth judging unit 14e judges whether the group retained by the second determination unit 14i satisfies the second condition; the flow enters step S1-5-10 whether the condition is judged to be satisfied or not;
Step S1-5-8: the sixth judging unit 14f judges whether the group retained by the second determination unit 14i satisfies the third condition; the flow enters step S1-5-10 whether the condition is judged to be satisfied or not;
Step S1-5-9: the seventh judging unit 14g judges whether the group retained by the second determination unit 14i satisfies the fourth condition; the flow enters step S1-5-10 whether the condition is judged to be satisfied or not;
Step S1-5-10: the detection-side control section 24 judges whether the group retained by the second determination unit 14i simultaneously satisfies one or more of the first condition, the second condition or the third condition; when it does, the flow enters step S1-5-11, and when it does not, the flow ends;
Step S1-5-11: the third determination unit 14j determines that a group simultaneously satisfying one or more of the first, second, third or fourth conditions is retained; in this embodiment, groups that satisfy all four conditions simultaneously are preferably retained; the flow then enters step S1-5-12;
Step S1-5-12: the detection-side control section 24 judges whether the large cluster still contains a group that has not undergone the predetermined filtering; when it does, the flow returns to step S1-5-1, and when it does not, the flow ends.
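The reliable-breakpoint test of steps S1-5-4 and S1-5-5 relies on the predetermined consistency condition given in claim 7: mappings that share a breakpoint must carry the same sequence that cannot be continuously aligned starting from that breakpoint, and there must be at least two such mappings. The sketch below illustrates that idea under simplifying assumptions: each mapping with a breakpoint is reduced to a `(breakpoint_position, unaligned_sequence)` tuple, and "the same sequence" is taken to mean exact string equality. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def reliable_breakpoints(clipped: Iterable[Tuple[int, str]],
                         min_support: int = 2) -> List[int]:
    """Return breakpoint positions that satisfy the consistency condition.

    clipped: (breakpoint_position, unaligned_sequence) per mapping with a
             breakpoint; an assumed, simplified representation.
    """
    sequences_at: Dict[int, List[str]] = defaultdict(list)
    for position, sequence in clipped:
        sequences_at[position].append(sequence)

    reliable = []
    for position, sequences in sequences_at.items():
        enough_support = len(sequences) >= min_support
        all_identical = len(set(sequences)) == 1  # simplification: exact equality
        if enough_support and all_identical:
            reliable.append(position)
    return sorted(reliable)


observed = [(1500, "ACGTT"), (1500, "ACGTT"), (1500, "ACGTT"),
            (2300, "GGCA"), (2300, "TTAC"),
            (4100, "CCG")]
print(reliable_breakpoints(observed))  # [1500]
```

In practice the unaligned parts of different reads may differ in length, so a prefix comparison might be closer to the intent; that refinement is left out here.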
Fig. 11 is a step flow chart of the RNA product prediction device according to the embodiment.
As shown in Fig. 11, in this embodiment the step flow of the RNA product prediction device 200 includes the following steps:
Step S2-1: the prediction-side communication section 210 receives, through the communication network 300, the data relating to the DNA rearrangement region sent by the DNA rearrangement region detection device 100, and the flow then enters step S2-2;
Step S2-2: the protein variation type prediction section 211 predicts the protein variation type after transcription of the DNA rearrangement region based on the gene rearrangement type and the annotated detailed gene structure in the above data, and the flow then enters step S2-3;
Step S2-3: the post-transcription exon number prediction section 212 predicts the exon numbers, after transcription, of the structures corresponding to the rearrangement positions of the two groups in the large cluster supporting the DNA rearrangement region, so as to predict the exon numbers after transcription at the point of rearrangement (the junction of the rearrangement regions); the flow then enters step S2-4;
Step S2-4: when the codon state at the junction of the predicted exon positions of the two groups (the rearrangement junction, that is, the fusion junction) forms a complete triplet codon, the reading frame at the fusion site of the fusion protein is predicted not to be shifted.
Verification example
This verification example takes EGFR-KDD in a lung adenocarcinoma FFPE sample to be tested (117J3141D1M1) as an example, and detects its RNA product by two methods:
Method one: the method of the present invention is used to detect the DNA rearrangement region and to predict the RNA product;
Method two: the RNA product is detected directly from RNA.
The results of the two methods are visualized below for direct comparison.
Fig. 12 is a comparison of the detection results of the two methods in the verification example.
As shown in Fig. 12, the upper row of the figure shows the result obtained by method one and the lower row shows the result obtained by method two. In the upper row, the left and right panels are the two groups of the supporting large cluster, and each arrow shown in the figure is a mapping. Labels at the bottom of each panel, such as ASPSCR1:NM_024083:exon7, denote gene name:transcript ID:exon or intron number.
Using the DNA rearrangement region detection method provided by the present invention, the supporting large cluster is detected, and from the mapping direction, mapping position and other properties of each mapping in the supporting large cluster the DNA rearrangement region of the sample to be tested is determined to be a rearrangement between genes, with the specific gene structure given. In combination with the reference transcripts, the RNA product of the rearrangement is predicted to be a fusion protein; further, the specific exon numbers can be predicted, and the reading frame is predicted not to be shifted.
It can also be seen from Fig. 12 that the result obtained by method one agrees fully with the result obtained by method two.
Effects of the embodiment
In the DNA rearrangement region and corresponding RNA product detection method, device and storage medium provided by this embodiment, the DNA rearrangement region detection method clusters, based on the alignment information of the sample to be tested, the mappings whose mapping distance satisfies the predetermined mapping value to obtain small clusters, merges the mappings in the small clusters according to the relationship that the two small clusters corresponding to two paired mappings are the same to obtain large clusters whose paired mappings reach the predetermined number of pairs, and applies the predetermined filtering to the large clusters that satisfy the predetermined alignment specificity condition. A supporting large cluster that supports a DNA rearrangement region and carries the information of each mapping can therefore be obtained accurately. The DNA rearrangement region can then be annotated accurately to obtain the detailed gene structure, and the gene rearrangement type, the chromosomal rearrangement type and so on can be determined accurately, so that the RNA product of the DNA rearrangement region can be predicted accurately. The prediction of the RNA product as a whole is simple, low in cost, efficient and accurate.
In addition, when the supporting large cluster obtained in this embodiment is visualized against the reference genome and the reference transcriptome, the direction, mapping position, corresponding gene structure and so on of each mapping in it are displayed intuitively, so that the number of rearrangement forms and the rearrangement type of the DNA rearrangement region can be judged intuitively and the RNA product can be predicted intuitively.
In addition, the invention correspondingly discloses a DNA rearrangement region detection device, including: a memory for storing computer program instructions; and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is caused to perform the steps of the method run by the DNA rearrangement region detection device in the embodiment. For the technical details, refer to the embodiment above; they are not repeated here.
Correspondingly, the invention also discloses a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the method run by the above DNA rearrangement region detection device are implemented. For the details, refer to the embodiment; they are not repeated here.
In addition, the invention correspondingly discloses an RNA product prediction device, including: a memory for storing computer program instructions; and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is caused to perform the steps of the method run by the RNA product prediction system in the embodiment. For the technical details, refer to the embodiment above; they are not repeated here.
Correspondingly, the invention also discloses a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the method run by the above RNA product prediction system are implemented. For the details, refer to the embodiment; they are not repeated here.

Claims (16)

1. A DNA rearrangement region detection method for detecting the DNA rearrangement regions of a sample to be tested, characterized in that it comprises the following steps:
Receiving alignment information obtained by aligning, against a reference genome, the paired-end sequencing data of multiple pairs of paired reads obtained by paired-end sequencing of multiple sequencing fragments of the sample to be tested;
Obtaining from the alignment information all mappings whose mapping distance satisfies a predetermined mapping value, and clustering them according to a predetermined clustering rule to obtain multiple small clusters;
Merging all the mappings in all the small clusters, according to the relationship that the two small clusters corresponding to two paired mappings are the same, to obtain large clusters, and judging for each large cluster whether the number of pairs of paired mappings it contains satisfies a predetermined number of pairs;
Judging, for each large cluster whose number of pairs satisfies the predetermined number of pairs, whether the alignments in the reference genome of all the reads corresponding to all the mappings in the large cluster satisfy a predetermined alignment specificity condition;
For each large cluster that satisfies the predetermined alignment specificity condition, treating all the mappings belonging to the same small cluster as one group and applying predetermined filtering to each group;
Determining that the region corresponding to all the mappings in a large cluster whose two groups are both retained after the predetermined filtering is the mapping area in the reference genome of an actual DNA rearrangement region of the sample to be tested, the corresponding large cluster being a supporting large cluster that supports the DNA rearrangement region.
2. The DNA rearrangement region detection method according to claim 1, characterized in that
it further comprises the following steps:
Judging, pair by pair, whether each pair of paired mappings in the supporting large cluster satisfies one of a first breakpoint condition that neither mapping has a breakpoint, a second breakpoint condition that both mappings have a breakpoint, or a third breakpoint condition that only one mapping has a breakpoint;
Setting the two paired mappings that satisfy the first breakpoint condition as first-class mappings;
Setting the two paired mappings that satisfy the second breakpoint condition as second-class mappings;
Setting the two paired mappings that satisfy the third breakpoint condition as third-class mappings;
Determining, for a supporting large cluster that includes at least two classes of the mappings, the region where the mapping areas corresponding to the different classes overlap, to obtain the overlapping region corresponding to the supporting large cluster.
3. The DNA rearrangement region detection method according to claim 1, characterized in that it further comprises the following steps:
Determining the chromosomal rearrangement type of each DNA rearrangement region, specifically including:
Determining, according to the directions between the two groups of mappings in the supporting large cluster and their chromosomal positions, whether the chromosomal rearrangement type is one of an intrachromosomal deletion, duplication or inversion, or an interchromosomal translocation.
4. The DNA rearrangement region detection method according to claim 1, characterized in that it further comprises the following steps:
Annotating each mapping area based on reference transcripts, in combination with the mapping direction and mapping position of each mapping contained in the supporting large cluster, to obtain the detailed gene structure of each corresponding DNA rearrangement region.
5. The DNA rearrangement region detection method according to claim 4, characterized in that it further comprises the following steps:
Determining the gene rearrangement type of each DNA rearrangement region, specifically including:
Determining, according to the detailed gene structure annotated for the mapping areas, whether the gene rearrangement type of the corresponding DNA rearrangement region is one of a gene-gene rearrangement, an intragenic rearrangement, or a gene-intergenic rearrangement.
6. The DNA rearrangement region detection method according to claim 1, characterized in that:
The predetermined mapping value is 0 bp, or greater than or equal to 2000 bp;
The predetermined clustering rule is: mappings whose mapping positions are spaced within the predetermined clustering distance are gathered into one small cluster,
The predetermined clustering distance is less than or equal to 1000 bp,
The predetermined number of pairs is greater than or equal to 6 pairs,
The predetermined alignment-specificity condition is: the number of merged regions, obtained by merging according to the merging rule all mappings on the reference genome of all reads used for judging the predetermined alignment-specificity condition, satisfies the predetermined region quantity,
The merging rule is: mappings whose mapping positions are spaced within the predetermined merging distance are merged into one merged region,
The predetermined merging distance is less than or equal to 1000 bp,
The predetermined region quantity is less than or equal to 6.
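The thresholds in claim 6 can be sketched as follows. This is a simplified illustration, not the patented implementation: it assumes all positions lie on a single chromosome and that "spacing" means the gap between adjacent sorted positions. All function names are hypothetical.

```python
from typing import List

def is_candidate_mapping(mapping_distance: int, min_distance: int = 2000) -> bool:
    """Keep mappings whose mapping distance is 0 bp or >= 2000 bp."""
    return mapping_distance == 0 or abs(mapping_distance) >= min_distance

def cluster_positions(positions: List[int], max_gap: int = 1000) -> List[List[int]]:
    """Gather positions into clusters; a gap above max_gap starts a new one."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters

def is_alignment_specific(read_positions: List[int],
                          merge_distance: int = 1000,
                          max_regions: int = 6) -> bool:
    """Merge all mapping positions of the reads and count merged regions."""
    return len(cluster_positions(read_positions, merge_distance)) <= max_regions
```

For example, `cluster_positions([100, 900, 1800, 5000, 5500])` yields two clusters, because only the gap between 1800 and 5000 exceeds 1000 bp.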
7. The DNA rearrangement region detection method according to any one of claims 1 to 6, characterized in that:
Wherein, the predetermined filtering includes the following steps:
Determining whether the mappings contained in the group include a mapping in which a breakpoint exists;
When it is determined that the group contains no mapping with a breakpoint, determining whether, among all mappings on the reference genome of the reads corresponding to the group, there is a mapping with a breakpoint,
When such a mapping is determined to exist, determining that the corresponding group is retained,
When it is determined that the group contains a mapping with a breakpoint, determining whether the corresponding group has a breakpoint that satisfies the predetermined consistency condition,
When it is determined that a breakpoint satisfying the predetermined consistency condition exists, determining that this breakpoint is a reliable breakpoint, and determining that the corresponding group is retained,
The predetermined consistency condition is: all mappings sharing the same breakpoint have, starting from that breakpoint, the same sequence that cannot be contiguously aligned, and the number of these mappings is greater than or equal to 2.
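The consistency condition can be sketched as below. The sketch assumes each mapping is reduced to a pair of breakpoint coordinate and unaligned (soft-clipped) sequence read outward from the breakpoint, and it interprets "the same sequence" as agreement over the shared length, since clipped sequences from different reads usually differ in length. Both the representation and that interpretation are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def reliable_breakpoints(mappings: Iterable[Tuple[int, str]],
                         min_support: int = 2) -> List[int]:
    """Return breakpoints supported by >= min_support consistent mappings."""
    by_breakpoint: Dict[int, List[str]] = defaultdict(list)
    for breakpoint, clipped_seq in mappings:
        by_breakpoint[breakpoint].append(clipped_seq)

    reliable = []
    for breakpoint, seqs in sorted(by_breakpoint.items()):
        if len(seqs) < min_support:
            continue
        # Consistent if every clipped sequence is a prefix of the longest one.
        longest = max(seqs, key=len)
        if all(longest.startswith(seq) for seq in seqs):
            reliable.append(breakpoint)
    return reliable
```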
8. The DNA rearrangement region detection method according to claim 7, characterized in that:
When the sample to be tested is a non-tumor tissue sample, the predetermined filtering further includes the following steps:
Determining whether the retained group satisfies a first condition, namely that within the predetermined base range near each reliable breakpoint, the number of consecutive identical bases in each corresponding mapping is less than the predetermined base number;
Determining whether the retained group satisfies a second condition, namely that the ratio of the number of mappings having more than the first predetermined breakpoint number of reliable breakpoints to the total number of mappings in the group is less than the predetermined ratio;
Determining whether the retained group satisfies a third condition, namely that the number of reliable breakpoints each shared by no more than the predetermined total number of mappings is less than the second predetermined breakpoint number;
Determining that a group simultaneously satisfying one or more of the first condition, the second condition and the third condition is retained,
The predetermined base range is the mapping range obtained by extending 20 bases to the left and to the right with the reliable breakpoint as the midpoint, and the predetermined base number is 20,
The predetermined ratio is one third,
The first predetermined breakpoint number is 2,
The predetermined total number is 2,
The second predetermined breakpoint number is 10.
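A minimal sketch of the first and second conditions follows, using the thresholds given in claim 8. It assumes a mapping is available as a read sequence plus the index of the breakpoint within that sequence, and that each mapping carries a count of its reliable breakpoints; these inputs and the function names are hypothetical.

```python
from itertools import groupby
from typing import Sequence

def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)

def passes_first_condition(read_seq: str, breakpoint_index: int,
                           flank: int = 20, max_run: int = 20) -> bool:
    """Longest run within +/- flank bases of the breakpoint is < max_run."""
    start = max(0, breakpoint_index - flank)
    window = read_seq[start:breakpoint_index + flank]
    return longest_homopolymer(window) < max_run

def passes_second_condition(breakpoints_per_mapping: Sequence[int],
                            max_breakpoints: int = 2,
                            max_ratio: float = 1 / 3) -> bool:
    """Share of mappings with more than max_breakpoints is below max_ratio."""
    if not breakpoints_per_mapping:
        return False
    many = sum(1 for n in breakpoints_per_mapping if n > max_breakpoints)
    return many / len(breakpoints_per_mapping) < max_ratio
```

The first condition effectively rejects breakpoints that sit in low-complexity sequence such as long poly-A stretches.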
9. The DNA rearrangement region detection method according to claim 7, characterized in that:
When the sample to be tested is a tumor tissue sample, a non-tumor tissue is used as the control sample,
The method also receives multiple pairs of paired control reads obtained by paired-end sequencing of multiple sequenced fragments of the control sample, and control alignment information, obtained by aligning this paired-end sequencing data with the reference genome, that includes at least each control mapping of the control reads on the reference genome,
The predetermined filtering further includes the following steps:
Determining whether the retained group satisfies a first condition, namely that within the predetermined base range near each reliable breakpoint, the number of consecutive identical bases in each corresponding mapping is less than the predetermined base number;
Determining whether the retained group satisfies a second condition, namely that the ratio of the number of mappings having more than the first predetermined breakpoint number of reliable breakpoints to the total number of mappings in the group is less than the predetermined ratio;
Determining whether the retained group satisfies a third condition, namely that the number of reliable breakpoints each shared by no more than the predetermined total number of mappings is less than the second predetermined breakpoint number;
Determining whether the retained group satisfies a fourth condition, namely that among all mappings having the reliable breakpoint, the number of mappings whose reliable breakpoint is identical to a control breakpoint of a control mapping in the control alignment information is less than or equal to the predetermined number of items,
Determining that a group simultaneously satisfying one or more of the first condition, the second condition, the third condition and the fourth condition is retained,
The predetermined base range is the mapping range obtained by extending 20 bases to the left and to the right with the reliable breakpoint as the midpoint, the predetermined base number is 20, and the predetermined ratio is one third,
The first predetermined breakpoint number is 2,
The predetermined total number is 2,
The second predetermined breakpoint number is 10,
The predetermined number of items is 3 to 5.
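One possible reading of the fourth condition is sketched below: count how many tumor-sample mappings have a reliable breakpoint that also appears as a breakpoint in the control sample, and require that count to stay at or below the threshold. The (chromosome, position) representation, the exact-match comparison and the default of 3 (the low end of the claimed 3 to 5 range) are assumptions.

```python
from typing import Iterable, Tuple

Breakpoint = Tuple[str, int]  # (chromosome, position); hypothetical layout

def passes_fourth_condition(sample_breakpoints: Iterable[Breakpoint],
                            control_breakpoints: Iterable[Breakpoint],
                            max_shared: int = 3) -> bool:
    """True if at most max_shared sample mappings share a control breakpoint.

    sample_breakpoints holds one entry per sample mapping, so a breakpoint
    supported by several mappings appears several times.
    """
    control = set(control_breakpoints)
    shared = sum(1 for bp in sample_breakpoints if bp in control)
    return shared <= max_shared
```

A breakpoint that is well supported in the matched non-tumor control is more likely germline or artifactual than somatic, which is the usual motivation for this kind of filter.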
10. An RNA product prediction method, characterized by comprising:
DNA rearrangement region detection, for detecting the DNA rearrangement regions of a sample to be tested;
RNA product prediction, for predicting the RNA products of the detected DNA rearrangement regions,
Wherein, the DNA rearrangement region detection is carried out using the DNA rearrangement region detection method according to claim 5,
The RNA product prediction predicts the RNA products of the DNA rearrangement region based on the gene rearrangement type of the DNA rearrangement region and the detailed annotated gene structure of the corresponding supporting large cluster, and includes the following steps:
When the gene rearrangement type of the DNA rearrangement region is a gene-intergenic region rearrangement or an intragenic rearrangement, and the part of the gene participating in the rearrangement is annotated as the 5' end of the gene, predicting that the RNA product of the corresponding rearrangement region is a protein truncation;
when the gene rearrangement type of the rearrangement region is a gene-intergenic region rearrangement or an intragenic rearrangement, and the part of the gene participating in the rearrangement is annotated as the 3' end, predicting that the RNA product of the corresponding rearrangement region is a protein deletion;
when the gene rearrangement type of the DNA rearrangement region is a gene-gene rearrangement, and the parts of the two genes participating in the rearrangement are both annotated as 5' ends, predicting that the RNA products of both genes are protein truncations;
when the type of the DNA rearrangement region is a gene-gene rearrangement, and the parts of the two genes participating in the rearrangement are both annotated as 3' ends, predicting that the RNA products of both genes are protein deletions;
when the type of the DNA rearrangement region is a gene-gene rearrangement, and of the two genes participating in the rearrangement the participating part of one gene is annotated as the 5' end and the participating part of the other gene is annotated as the 3' end, predicting that the RNA product of the DNA rearrangement region is a fusion protein obtained by fusing the two genes;
When the type of the DNA rearrangement region is a gene-gene rearrangement or an intragenic rearrangement: when the extreme end, in the mapping direction, of the mapping region corresponding to a group in the supporting large cluster is annotated as an intron, predicting the number of exons after transcription of the group according to the general principle that introns are spliced out; when the extreme end of the group in the supporting large cluster is annotated as an exon, predicting that the number of exons transcribed from the group is unchanged,
When the product of the DNA rearrangement region is predicted to be a fusion protein, and the codon state after complementation at the junction of the predicted exons of the two groups is a complete coding triplet, predicting that the reading frame at the fusion point of the fusion protein is not shifted.
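The decision rules of claim 10 can be summarized as a small lookup, sketched below. The string labels, the tuple of end annotations and the frame check based on leftover codon bases are all illustrative assumptions; the claim itself only states that the complemented codon at the junction must form a coding triplet.

```python
from typing import Tuple

def predict_rna_product(rearrangement_type: str, ends: Tuple[str, ...]) -> str:
    """Map rearrangement type and annotated gene ends to a predicted product."""
    if any(end not in ("5'", "3'") for end in ends):
        raise ValueError("ends must be \"5'\" or \"3'\"")
    if rearrangement_type in ("gene-intergenic", "intragenic"):
        (end,) = ends  # exactly one participating gene part
        return "protein truncation" if end == "5'" else "protein deletion"
    if rearrangement_type == "gene-gene":
        first, second = ends
        if first == second:
            kind = "truncation" if first == "5'" else "deletion"
            return f"protein {kind} (both genes)"
        return "fusion protein"
    raise ValueError(f"unknown rearrangement type: {rearrangement_type}")

def fusion_in_frame(upstream_leftover: int, downstream_leading: int) -> bool:
    """Assumed frame check: the partial codon bases left by the 5' partner
    plus the leading partial codon bases of the 3' partner must together
    make up whole triplets."""
    return (upstream_leftover + downstream_leading) % 3 == 0
```

For instance, a 5' partner that ends one base into a codon is in frame with a 3' partner that begins with the remaining two bases of a codon.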
11. A DNA rearrangement region detection device, characterized by comprising:
A receiving part, which receives multiple pairs of paired reads obtained by paired-end sequencing of multiple sequenced fragments of a sample to be tested, and alignment information obtained by aligning this paired-end sequencing data with a reference genome;
An acquiring and clustering part, which obtains from the alignment information all mappings whose mapping distance satisfies the predetermined mapping value and clusters them according to the predetermined clustering rule to obtain multiple small clusters;
A merging and judging part, which merges the mappings in all small clusters into large clusters according to the relationship that paired mappings correspond to the same two small clusters, and determines whether the number of pairs of paired mappings contained in each large cluster satisfies the predetermined number of pairs;
A specificity judging part, which determines, for each large cluster whose number of pairs satisfies the predetermined number of pairs, whether the alignments on the reference genome of all reads corresponding to all mappings in the large cluster satisfy the predetermined alignment-specificity condition;
A filtering part, which, for each large cluster satisfying the predetermined alignment-specificity condition, takes all mappings belonging to the same small cluster as one group and performs the predetermined filtering on each group;
A rearrangement determination part, which determines that the region corresponding to all mappings in a large cluster whose two groups are both retained after the predetermined filtering is the mapping region on the reference genome of an actual DNA rearrangement region of the sample to be tested, the corresponding large cluster being a supporting large cluster that supports the DNA rearrangement region.
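The merging step performed by the merging and judging part can be sketched as follows. The sketch assumes each read pair has already been reduced to the identifiers of the two small clusters that its two mappings fall into; the identifiers and the function name are hypothetical, and the default of 6 pairs is the threshold from claim 6.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def build_large_clusters(pair_clusters: Iterable[Tuple[int, int]],
                         min_pairs: int = 6) -> Dict[Tuple[int, int], int]:
    """Group read pairs that link the same two small clusters.

    Returns {(cluster_id_1, cluster_id_2): pair_count} for every large
    cluster that reaches min_pairs supporting pairs.
    """
    # Sort each pair so that (1, 2) and (2, 1) count as the same link.
    counts = Counter(tuple(sorted(pair)) for pair in pair_clusters)
    return {link: n for link, n in counts.items() if n >= min_pairs}
```

Each surviving key identifies one large cluster made of two groups, which are then passed on to the specificity check and the predetermined filtering.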
12. An RNA product prediction system, characterized by comprising:
A DNA rearrangement region detection device, for detecting the DNA rearrangement regions of a sample to be tested; and
An RNA product prediction device, for predicting the RNA products of the detected DNA rearrangement regions,
Wherein, the DNA rearrangement region detection device carries out the DNA rearrangement region detection method according to claim 5,
The RNA product prediction device predicts the RNA products of the DNA rearrangement region based on the gene rearrangement type of the DNA rearrangement region and the detailed annotated gene structure of the corresponding supporting large cluster, and includes:
A protein variation type prediction part, which: when the gene rearrangement type of the DNA rearrangement region is a gene-intergenic region rearrangement or an intragenic rearrangement, and the part of the gene participating in the rearrangement is annotated as the 5' end of the gene, predicts that the RNA product of the corresponding rearrangement region is a protein truncation; when the gene rearrangement type of the rearrangement region is a gene-intergenic region rearrangement or an intragenic rearrangement, and the part of the gene participating in the rearrangement is annotated as the 3' end, predicts that the RNA product of the corresponding rearrangement region is a protein deletion; when the gene rearrangement type of the DNA rearrangement region is a gene-gene rearrangement, and the parts of the two genes participating in the rearrangement are both annotated as 5' ends, predicts that the RNA products of both genes are protein truncations; when the type of the DNA rearrangement region is a gene-gene rearrangement, and the parts of the two genes participating in the rearrangement are both annotated as 3' ends, predicts that the RNA products of both genes are protein deletions; and when the type of the DNA rearrangement region is a gene-gene rearrangement, and of the two genes participating in the rearrangement the participating part of one gene is annotated as the 5' end and the participating part of the other gene is annotated as the 3' end, predicts that the RNA product of the DNA rearrangement region is a fusion protein obtained by fusing the two genes;
A transcribed exon prediction part, which, when the type of the DNA rearrangement region is a gene-gene rearrangement or an intragenic rearrangement: when the extreme end, in the mapping direction, of the mapping region corresponding to a group in the supporting large cluster is annotated as an intron, predicts the number of exons after transcription of the group according to the general principle that introns are spliced out; and when the extreme end of the group in the supporting large cluster is annotated as an exon, predicts that the number of exons transcribed from the group is unchanged;
A fusion-point reading frame judging part, which, when the product of the DNA rearrangement region is predicted to be a fusion protein, and the codon state after complementation at the junction of the predicted exons of the two groups is a complete coding triplet, predicts that the reading frame at the fusion point of the fusion protein is not shifted.
13. A device for DNA rearrangement region detection, characterized by comprising:
A memory for storing computer program instructions; and
A processor for executing the program instructions,
Wherein, when the computer program instructions are executed by the processor, the device is caused to perform the steps of the DNA rearrangement region detection method according to any one of claims 1 to 9.
14. A computer-readable medium, characterized in that:
The computer-readable medium stores a computer program,
Wherein, the computer program can be executed by a processor to implement the steps of the DNA rearrangement region detection method according to any one of claims 1 to 9.
15. A device for RNA product prediction, characterized by comprising:
A memory for storing computer program instructions; and
A processor for executing the program instructions,
Wherein, when the computer program instructions are executed by the processor, the device is caused to perform the steps of the RNA product prediction method according to claim 10.
16. A computer-readable medium, characterized in that:
The computer-readable medium stores a computer program,
Wherein, the computer program can be executed by a processor to implement the steps of the RNA product prediction method according to claim 10.
CN201810497054.1A 2018-05-22 2018-05-22 Method, device and storage medium for detecting DNA rearrangement regions and corresponding RNA products Active CN108229100B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810497054.1A CN108229100B (en) 2018-05-22 2018-05-22 Method, device and storage medium for detecting DNA rearrangement regions and corresponding RNA products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810497054.1A CN108229100B (en) 2018-05-22 2018-05-22 Method, device and storage medium for detecting DNA rearrangement regions and corresponding RNA products

Publications (2)

Publication Number Publication Date
CN108229100A CN108229100A (en) 2018-06-29
CN108229100B true CN108229100B (en) 2018-08-24

Family

ID=62657993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810497054.1A Active CN108229100B (en) 2018-05-22 2018-05-22 Method, device and storage medium for detecting DNA rearrangement regions and corresponding RNA products

Country Status (1)

Country Link
CN (1) CN108229100B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111292809B (en) * 2020-01-20 2021-03-16 至本医疗科技(上海)有限公司 Method, electronic device, and computer storage medium for detecting RNA level gene fusion
CN111243669B (en) * 2020-01-20 2021-02-09 至本医疗科技(上海)有限公司 Method, electronic device, and computer storage medium for determining RNA gene fusion
CN116312797B (en) * 2023-02-17 2024-02-20 至本医疗科技(上海)有限公司 Method, apparatus and medium for predicting functional fusion for gene structural rearrangement

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006113613A1 (en) * 2005-04-15 2006-10-26 Cedars-Sinai Medical Center 5′/3′ ratioing procedure for detection of gene rearrangements
WO2008026927A2 (en) * 2006-08-30 2008-03-06 Academisch Medisch Centrum Process for displaying t- and b-cell receptor repertoires
CN102994631B (en) * 2012-09-24 2014-01-01 中山大学达安基因股份有限公司 Kit for differential diagnosis of liposarcomas and preparation method thereof
WO2017109266A1 (en) * 2015-12-23 2017-06-29 Servicio Andaluz De Salud Diagnosis of soft tissue cancer

Also Published As

Publication number Publication date
CN108229100A (en) 2018-06-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant