Device and method for constructing a user-friendly chromosomal gene variation map
Technical field
The invention belongs to the field of genetic testing, and in particular relates to a method and a device for constructing a user-friendly chromosomal gene variation map.
Background art
Genetic testing is a technique that examines DNA from blood, other body fluids, or cells. It can be used to diagnose disease and to predict disease risk. Typically, exfoliated oral mucosal cells or other tissue cells are collected from the person being tested, their genetic material is amplified, and the DNA molecular information in those cells is read by dedicated equipment. The various genetic features found are analyzed to predict the risk of disease, so that people can understand their own genetic information and avoid or delay the onset of disease by improving their living environment and habits.
With the development of next-generation sequencing (NGS), NGS-based genetic testing has advanced rapidly. It can detect changes in DNA primary structure caused by internal or external factors that alter the base composition or order of a gene's DNA sequence. These mainly include: single-base changes (single nucleotide variants, SNV), insertions and deletions of large or small sequence fragments (InDel), copy number variants of sequence fragments (CNV), and sequence structural variants (SV).
Genetic testing organizations usually report results to users in the form of a gene variation map. For whole-genome (chromosome-level) testing, existing methods for constructing such maps focus on showing the overall variation of the whole genome, or on showing where a gene lies on its chromosome. No existing method shows, faithfully and intuitively within a genetic test report, which specific variations have occurred in a given gene. As a result, readers of the report cannot readily obtain the relevant information about the test result, which is unfriendly to the user.
References
1. Yang D, Khan S, et al. Association of BRCA1 and BRCA2 mutations with survival, chemotherapy sensitivity, and gene mutator phenotype in patients with ovarian cancer. JAMA. 2011;306(14):1557-1565.
2. Krzywinski M, Schein J, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639-1645.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the present invention is to provide a device and a method for constructing a user-friendly chromosomal gene variation map that can show, automatically, faithfully, intuitively, and attractively, the specific variations of any gene on an entire chromosome. Further, the genetic test result can be rendered as a color image, so that the data are easier to take in visually and the readability of the result is improved.
The present inventors studied this technical problem intensively and found that, by using optimized information labeling rules, the information to be labeled can be arranged sensibly within a limited display space, thereby solving the problem.
That is, the present invention includes:
1. A device for constructing a user-friendly chromosomal gene variation map, comprising:
A data acquisition module, for obtaining gene variation detection data, gene information, and chromosome G-band data. Here, the gene variation detection data include, for example, SNP or indel variation information obtained from raw sequencing data after processing, alignment, variant calling, and annotation. The gene information includes, for example as provided by the RefSeq database, all transcripts of each gene, the chromosome on which the gene lies, the exon numbers, and the start and end positions. The chromosome G-band data are defined as follows: after a chromosome is treated with a fluorescent dye, regular bands of differing width and brightness can be observed along its long axis under a fluorescence microscope; this band information is converted into a file that electronically records the absolute position and interval of each band on the chromosome.
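As a rough illustration of the G-band file described above, the following Python sketch reads band records from tab-separated text. The column layout (chromosome, start, end, band name, stain) follows the UCSC cytoBand convention and is an assumption, since the text only states that the file records the absolute position and interval of each band; the function name parse_bands is hypothetical.

```python
def parse_bands(text, chrom):
    """Return (name, arm, start, end) tuples for one chromosome, sorted by start.

    Assumed input: one band per line, tab-separated as
    chromosome, start, end, band name, stain.
    """
    bands = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        c, start, end, name, _stain = line.split("\t")
        if c == chrom:
            # Band names beginning with "p" lie on the short arm, others on the long arm.
            arm = "p" if name.startswith("p") else "q"
            bands.append((name, arm, int(start), int(end)))
    return sorted(bands, key=lambda b: b[2])
```

Deriving the arm from the band name keeps the later p-arm/q-arm decision independent of any centromere coordinate.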
A data preparation module, connected to the data acquisition module, for matching the transcript of the input gene, extracting all gene variation detection data lying within the exons of that transcript and within 15 to 30 bp, preferably 20 bp, around those exons, and outputting the data in a specified format. Here, a transcript is one of the one or more mature, protein-coding mRNAs formed from a gene by transcription. Each gene may have several transcripts and therefore several transcript IDs, so the input data must give the specific transcript ID of the gene to be plotted. The specified format means that the extracted information is arranged in the form in which it is to appear in the figure, for example "Exon 9, heterozygous: c.6513G>C:p.V2171V", and written to a temporary file named, for example, gene_mutpos.txt. All temporary files may be deleted once the algorithm has finished running.
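The extraction and formatting performed by the data preparation module can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the Exon and Variant record types, their field names, and the function variants_near_exons are hypothetical, coordinates are treated as inclusive, and the 20 bp flank is the preferred value given in the text.

```python
from typing import NamedTuple

class Exon(NamedTuple):          # hypothetical record type
    number: int
    start: int                   # inclusive genomic start
    end: int                     # inclusive genomic end

class Variant(NamedTuple):       # hypothetical record type
    pos: int
    zygosity: str
    hgvs_c: str
    hgvs_p: str

def variants_near_exons(exons, variants, flank=20):
    """Format every variant lying in an exon or within `flank` bp of it."""
    rows = []
    for v in sorted(variants, key=lambda v: v.pos):
        for ex in exons:
            if ex.start - flank <= v.pos <= ex.end + flank:
                rows.append(f"Exon {ex.number}, {v.zygosity}: {v.hgvs_c}:{v.hgvs_p}")
                break  # report each variant once
    return rows
```

Each returned row is already in the display form, so a later drawing step would not need to know anything about the variant records.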
A gene segment graphic drawing module, connected to the data preparation module, for converting all exon lengths into proportions according to the gene segment information, while appending one intron segment of equal length after each exon. Here, "drawing" means that a script in the device renders the graphic and, on completion, stores it in a file in, for example, PNG format; opening this file displays the graphic on, for example, a display screen. The gene segment information includes, for example, the gene transcript, exon ID, chromosome ID, start position, and end position. Conversion of lengths into proportions is explained as follows. Because the introns of a gene are several times longer than its exons, drawing them to scale would make the exons invisible to the naked eye. If only variations within the exons and the 20 bp around them are to be shown, every intron region can be replaced by a segment of the same length. Treating the entire canvas as a 1 × 1 canvas, each position in an exon region can then be converted into its ratio on the canvas: actual position / (total length of all exon regions + intron length × number of introns).
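The position-to-ratio conversion above can be sketched in Python. This is one possible reading, not a definitive implementation: "actual position" is taken to be the offset of the position once every intron has been replaced by a fixed-length placeholder, exon coordinates are assumed inclusive, and the function name compressed_ratio is hypothetical. Positions in the flanking bases outside an exon are not handled by this sketch.

```python
def compressed_ratio(exons, pos, intron_len):
    """Map a genomic position inside an exon to a ratio on a 1 x 1 canvas.

    exons: list of (start, end) pairs, inclusive, sorted by start.
    intron_len: the equal length that stands in for every intron.
    """
    # Denominator from the text: total exon length + intron length x number of introns.
    total = sum(end - start + 1 for start, end in exons) + intron_len * (len(exons) - 1)
    offset = 0  # compressed length consumed by earlier exons and placeholder introns
    for start, end in exons:
        if start <= pos <= end:
            return (offset + pos - start) / total
        offset += (end - start + 1) + intron_len
    raise ValueError(f"position {pos} is not inside any exon")
```

For two 100 bp exons separated by a 50 bp placeholder, the first base of the second exon maps to 150 / 250 = 0.6, regardless of how long the real intron is.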
A chromosome graphic drawing module, connected to the gene segment graphic drawing module, for marking the chromosome G-bands in different colors, converting the lengths of all G-band segments into proportions, determining whether each segment lies on the p arm or the q arm, drawing the graphic of the chromosome on which the gene lies, and marking the position of the gene on that graphic. Here, one figure can be made per gene: the chromosome is drawn first, and the position of the gene on the chromosome is then marked; and
A gene variation information labeling module, connected to the data preparation module, the gene segment graphic drawing module, and the chromosome graphic drawing module, comprising:
Submodule A, for determining whether the gene has any gene variation site; if so, it outputs an instruction to start submodule B below; if not, no gene variation information is labeled, and only the chromosome graphic and the gene segment graphic are output as the final result.
Submodule B, connected to submodule A, for determining whether the current plotting space is sufficient to hold the information for all gene variation sites; if so, it outputs an instruction to start submodule C below; if not, an error prompt is displayed stating that there are too many gene variation sites to plot. Note that when plotting is reported to be impossible, some sites can first be filtered out and the map then drawn. The number of sites plotted is preferably no more than 50, and the sites retained after filtering are preferably those of interest to the user or those related to disease or prognosis.
Submodule C, connected to submodule B and submodule H, for determining whether the current point lies within the gene; if so, it outputs an instruction to start submodule D below; if not, a warning is displayed stating that the current point is not within the range of the gene, and an instruction is output to start submodule H below. Here, the space currently available for presenting the graphics related to the gene variation detection result is called the current plotting space. In general, one figure is made per gene, and the content that one figure can show is limited; that is, the plotting space is limited (the device of the invention emphasizes use by the user, and the plotting space generally refers to the genetic test report). The upper limit on the number of variation entries drawn can be set according to the pixel dimensions and font size of the map drawn by the device. To keep the content clear and the layout attractive while ensuring that the graphic remains legible, at most about 50 variation entries can be drawn in one figure; if there are more than 50, no map need be drawn. On the other hand, almost all patients have no more than 15 variant sites in a single gene, so a limit of 50 essentially guarantees that a map can be drawn for every sample. During plotting, variant site entries are drawn one at a time in positional order. The entry being drawn is the current point; the entry most recently drawn before the current point is the previous point; the entry to be drawn immediately after the current point is the next point; and the remaining points are all entries that have not yet been drawn.
Submodule D, connected to submodule C and submodule H, for determining whether the current space is sufficient to hold the current point and the remaining points; if so, submodule E is started; if not, the gene variation information of the current point is labeled after moving directly upward by a specified distance, and an instruction is output to start submodule H below.
Submodule E, connected to submodule D and submodule H, for determining whether the current point is so close to the previous point that the labels of the two points would overlap; if so, the gene variation information of the current point is labeled a specified distance below the position of the previous label, and an instruction is output to start submodule H below; if not, an instruction is output to start submodule G below. Here, "especially close" and "especially far" are both determined by comparing the distance between two points with a preset value. If the distance between two points is less than a preset value (for example, 0.01), the points are "especially close"; if the distance is greater than a preset value (for example, 0.1), the points are "especially far". The preset values can be set as needed.
Submodule G, connected to submodule E and submodule H, for determining whether the current point is especially far from the previous point and especially close to the next point; if so, the gene variation information of the current point is labeled after moving upward by a specified distance, and an instruction is output to start submodule H below; if not, the gene variation information of the current point is labeled directly at the current position, and an instruction is output to start submodule H below; and
Submodule H, for determining whether the current point is the last gene variation site; if so, labeling ends and the result of the above determinations is output as the final result; if not, processing moves to the next variant site of the gene and an instruction is output to start submodule C above.
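The decision order of submodules A to H can be sketched in Python. This is an illustrative reading only, and the geometry is assumed: points and labels are ratios in [0, 1] along one axis, "up" means a smaller value and "down" a larger one, the space test in step D (current position plus one step per remaining point must not exceed 1.0) is a stand-in for a test the text does not spell out, and the first point is treated as far from a non-existent previous point. The name place_labels and all parameter names are hypothetical; the default values 0.01, 0.1, and 50 are the examples given in the text.

```python
import warnings

def place_labels(points, near=0.01, far=0.1, step=0.01, max_sites=50):
    """Return (point, label_position) pairs for variant points sorted ascending."""
    if not points:                                    # A: no variant sites
        return []
    if len(points) > max_sites:                       # B: plotting space insufficient
        raise ValueError("too many gene variation sites to plot")
    placed = []
    prev_pos = prev_label = None
    for i, pos in enumerate(points):
        if not 0.0 <= pos <= 1.0:                     # C: outside the gene range
            warnings.warn(f"point {pos} is not within the gene range")
            continue                                  # H: go on to the next site
        remaining = len(points) - i - 1
        nxt = points[i + 1] if remaining else None
        near_prev = prev_pos is not None and pos - prev_pos < near
        far_prev = prev_pos is None or pos - prev_pos > far
        near_next = nxt is not None and nxt - pos < near
        if pos + remaining * step > 1.0:              # D: assumed space test fails
            label = pos - step                        # shift the label upward
        elif near_prev:                               # E: would overlap previous label
            label = prev_label + step                 # place below the previous label
        elif far_prev and near_next:                  # G: make room for the next point
            label = pos - step
        else:
            label = pos                               # label at the point itself
        placed.append((pos, label))
        prev_pos, prev_label = pos, label
    return placed                                     # H: last site reached
```

With two well-separated points the labels sit at the points themselves; with two points 0.005 apart, the first label moves up by one step and the second is placed one step below it.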
2. A method for constructing a user-friendly chromosomal gene variation map, comprising:
Data acquisition: obtaining gene variation detection data, gene information, and chromosome G-band data. Here, the gene variation detection data include, for example, SNP or indel variation information obtained from raw sequencing data after processing, alignment, variant calling, and annotation. The gene information includes, for example as provided by the RefSeq database, all transcripts of each gene, the chromosome on which the gene lies, the exon numbers, and the start and end positions. The chromosome G-band data are defined as follows: after a chromosome is treated with a fluorescent dye, regular bands of differing width and brightness can be observed along its long axis under a fluorescence microscope; this band information is converted into a file that electronically records the absolute position and interval of each band on the chromosome.
Data preparation: matching the transcript of the input gene, extracting all gene variation detection data lying within the exons of that transcript and within 15 to 30 bp, preferably 20 bp, around those exons, and outputting the data in a specified format. Here, a transcript is one of the one or more mature, protein-coding mRNAs formed from a gene by transcription. Each gene may have several transcripts and therefore several transcript IDs, so the input data must give the specific transcript ID of the gene to be plotted. The specified format means that the extracted information is arranged in the form in which it is to appear in the figure, for example "Exon 9, heterozygous: c.6513G>C:p.V2171V", and written to a temporary file named, for example, gene_mutpos.txt. All temporary files may be deleted once the algorithm has finished running.
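The temporary-file handling mentioned above can be sketched in Python with the standard tempfile module. The file name gene_mutpos.txt comes from the text; the function plot_with_temp_file and its render callback are hypothetical stand-ins for whatever drawing step consumes the file.

```python
import tempfile
from pathlib import Path

def plot_with_temp_file(rows, render):
    """Write formatted variant rows to a temporary file, render, then clean up."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "gene_mutpos.txt"
        path.write_text("\n".join(rows) + "\n", encoding="utf-8")
        # The directory and the file inside it are removed when this block exits,
        # even if render raises an exception.
        return render(path)
```

Scoping the file to a temporary directory means cleanup does not rely on a separate deletion step at the end of the run.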
Gene segment graphic drawing: according to the gene segment information, converting all exon lengths into proportions, while appending one intron segment of equal length after each exon. For example, in Fig. 1 and Fig. 2, each gradient-colored rectangle represents one exon, and the blank between two exons represents one intron. The exons are drawn in ID order as rectangles of progressively lighter shades of the same color family, producing a visible color gradient; using a gradient color family aids visual perception and improves appearance. Here, "drawing" means that a script in the device renders the graphic and, on completion, stores it in a file in, for example, PNG format; opening this file displays the graphic on, for example, a display screen. The gene segment information includes, for example, the gene transcript, exon ID, chromosome ID, start position, and end position. Conversion of lengths into proportions is explained as follows. Because the introns of a gene are several times longer than its exons, drawing them to scale would make the exons invisible to the naked eye. If only variations within the exons and the 20 bp around them are to be shown, every intron region can be replaced by a segment of the same length. Treating the entire canvas as a 1 × 1 canvas, each position in an exon region can then be converted into its ratio on the canvas: actual position / (total length of all exon regions + intron length × number of introns).
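The progressively lighter exon colors described above can be produced, for instance, by blending a base color toward white. The Python sketch below is only an assumption about how such a gradient might be generated: the base color, the 0.8 cap that keeps the last exon from turning pure white, and the function name exon_gradient are all hypothetical.

```python
def exon_gradient(n, base=(0, 90, 160), max_blend=0.8):
    """Return n hex colors running from `base` toward white, one per exon in ID order."""
    colors = []
    for i in range(n):
        # Blend factor grows linearly from 0 to max_blend across the exons.
        t = max_blend * i / max(n - 1, 1)
        r, g, b = (round(c + (255 - c) * t) for c in base)
        colors.append(f"#{r:02x}{g:02x}{b:02x}")
    return colors
```

The resulting strings can be passed to any drawing library that accepts hexadecimal RGB colors.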
Chromosome graphic drawing: marking the chromosome G-bands in different colors, converting the lengths of all G-band segments into proportions, determining whether each segment lies on the p arm or the q arm, drawing the graphic of the chromosome on which the gene lies, and marking the position of the gene on that graphic. For example, in Fig. 1 and Fig. 2, the left part represents the chromosome, and each gradient-colored rectangle or semicircle represents one G-band of the chromosome, drawn in order of absolute position. Here, one figure can be made per gene: the chromosome is drawn first, and the position of the gene on the chromosome is then marked; and
Gene variation information labeling, comprising:
Step A: determining whether the gene has any gene variation site; if so, proceeding to step B below; if not, labeling no gene variation information and outputting only the chromosome graphic and the gene segment graphic as the final result.
Step B: determining whether the current plotting space is sufficient to hold the information for all gene variation sites; if so, proceeding to step C below; if not, displaying an error prompt stating that there are too many gene variation sites to plot. Note that when plotting is reported to be impossible, some sites can first be filtered out and the map then drawn. The number of sites plotted is preferably no more than 50, and the sites retained after filtering are preferably those of interest to the user or those related to disease or prognosis.
Step C: determining whether the current point lies within the gene; if so, proceeding to step D below; if not, displaying a warning stating that the current point is not within the range of the gene, and proceeding to step H below. Here, the space currently available for presenting the graphics related to the gene variation detection result is called the current plotting space. In general, one figure is made per gene, and the content that one figure can show is limited; that is, the plotting space is limited. To keep the content clear and the layout attractive while ensuring that the graphic remains legible, at most about 50 variation entries can be drawn in one figure; if there are more than 50, no map need be drawn. On the other hand, almost all patients have no more than 15 variant sites in a single gene, so a limit of 50 essentially guarantees that a map can be drawn for every sample. During plotting, variant site entries are drawn one at a time in positional order. The entry being drawn is the current point; the entry most recently drawn before the current point is the previous point; the entry to be drawn immediately after the current point is the next point; and the remaining points are all entries that have not yet been drawn.
Step D: determining whether the current space is sufficient to hold the current point and the remaining points; if so, proceeding to step E; if not, labeling the gene variation information of the current point after moving directly upward by a specified distance, and proceeding to step H below.
Step E: determining whether the current point is so close to the previous point that the labels of the two points would overlap; if so, labeling the gene variation information of the current point a specified distance below the position of the previous label, and proceeding to step H below; if not, proceeding to step G below. Here, "especially close" and "especially far" are both determined by comparing the distance between two points with a preset value. If the distance between two points is less than a preset value (for example, 0.01), the points are "especially close"; if the distance is greater than a preset value (for example, 0.1), the points are "especially far". The preset values can be set as needed.
Step G: determining whether the current point is especially far from the previous point and especially close to the next point; if so, labeling the gene variation information of the current point after moving upward by a specified distance, and proceeding to step H below; if not, labeling the gene variation information of the current point directly at the current position, and proceeding to step H below; and
Step H: determining whether the current point is the last gene variation site; if so, ending the labeling and outputting the result of the above determinations as the final result; if not, moving to the next variant site of the gene and returning to step C above.
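The filtering suggested in step B for genes with too many sites can be sketched in Python as follows. How a site is judged to be of interest to the user or related to disease or prognosis is not specified in the text, so the is_priority predicate is a hypothetical placeholder, as is the function name filter_sites; the limit of 50 is the preferred value from the text.

```python
def filter_sites(sites, is_priority, limit=50):
    """Keep only priority sites, preserving their original (positional) order."""
    kept = [site for site in sites if is_priority(site)]
    if len(kept) > limit:
        # Still too many to draw legibly; the caller must filter further.
        raise ValueError(f"{len(kept)} sites remain after filtering; limit is {limit}")
    return kept
```

Preserving the input order matters because the labeling steps process sites in positional order.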
Effects of the invention
The device and method of the present invention for constructing a user-friendly chromosomal gene variation map can show, automatically, faithfully, intuitively, and attractively, the specific variations of any gene on an entire chromosome. Further, the genetic test result can be rendered as a color image, so that the data are easier to take in visually and the readability of the result is improved.
Brief description of the drawings
Fig. 1 shows the user-friendly chromosomal gene variation map of the BRCA1 variation detection data of sample VB01562 obtained in Embodiment 1.
Fig. 2 shows the user-friendly chromosomal gene variation map of the BRCA2 variation detection data of sample VB01562 obtained in Embodiment 1.
Detailed description of embodiments
A BRCA1/2 user-friendly chromosomal gene variation map construction device of the present invention was used to construct chromosomal gene variation maps from the variation data of a sample (sample ID VB01562). The device comprises:
A data acquisition module, for obtaining gene variation detection data, gene information, and chromosome G-band data.
Here, the gene variation detection data include, for example, SNP or indel variation information obtained from raw sequencing data after processing, alignment, variant calling, and annotation; the gene information includes, for example as provided by the RefSeq database, all transcripts of each gene, the chromosome on which the gene lies, the exon numbers, and the start and end positions.
A data preparation module, connected to the data acquisition module, for matching the transcript of the input gene, extracting all gene variation detection data lying within the exons of that transcript and within 20 bp around those exons, and outputting the data in a specified format.
Here, the specified format means that the extracted information is arranged in the form in which it is to appear in the figure, for example "Exon 9, heterozygous: c.6513G>C:p.V2171V", and written to a temporary file named, for example, gene_mutpos.txt. All temporary files may be deleted once the algorithm has finished running.
A gene segment graphic drawing module, connected to the data preparation module, for drawing the gene segment graphic, in which all exon lengths are converted into proportions according to the gene segment information, while one intron segment of equal length is appended after each exon.
A chromosome graphic drawing module, connected to the data preparation module and to the gene segment graphic drawing module, for drawing the chromosome graphic, in which the chromosome G-bands are marked in different colors, the lengths of all G-band segments are converted into proportions, each segment is determined to lie on the p arm or the q arm, the graphic of the chromosome on which the gene lies is drawn, and the position of the gene is marked on that graphic; and
A gene variation information labeling module, connected to the data preparation module, the gene segment graphic drawing module, and the chromosome graphic drawing module, comprising:
Submodule A, for determining whether the gene has any gene variation site; if so, it outputs an instruction to start submodule B below; if not, no gene variation information is labeled, and only the chromosome graphic and the gene segment graphic are output as the final result.
Submodule B, connected to submodule A, for determining whether the current plotting space is sufficient to hold the information for all gene variation sites; if so, it outputs an instruction to start submodule C below; if not, it reports that there are too many gene variation sites to plot.
Submodule C, connected to submodule B and submodule H, for determining whether the current point lies within the gene; if so, it outputs an instruction to start submodule D below; if not, a warning is displayed stating that the current point is not within the range of the gene, and an instruction is output to start submodule H below.
Submodule D, connected to submodule C and submodule H, for determining whether the current space is sufficient to hold the current point and the remaining points; if so, it outputs an instruction to start submodule E below; if not, the gene variation information of the current point is labeled after moving directly upward by a specified distance, and an instruction is output to start submodule H below.
Submodule E, connected to submodule D and submodule H, for determining whether the current point is so close to the previous point that the labels of the two points would overlap; if so, the gene variation information of the current point is labeled a specified distance below the position of the previous label, and an instruction is output to start submodule H below; if not, an instruction is output to start submodule G below.
Here, "especially close" and "especially far" are both determined by comparing the distance between two points with a preset value: if the distance between two points is less than the preset value 0.01, the points are "especially close"; if the distance is greater than the preset value 0.1, the points are "especially far". The specified distance is 0.01.
Submodule G, connected to submodule E and submodule H, for determining whether the current point is especially far from the previous point and especially close to the next point; if so, the gene variation information of the current point is labeled after moving upward by the specified distance, and an instruction is output to start submodule H below; if not, the gene variation information of the current point is labeled directly at the current position, and an instruction is output to start submodule H below; and
Submodule H, for determining whether the current point is the last gene variation site; if so, labeling ends and the result produced by the above submodules is output as the final result; if not, processing moves to the next gene variation site and an instruction is output to start submodule C above.
After the chromosomal gene variation maps were constructed with the above BRCA1/2 user-friendly chromosomal gene variation map construction device of the invention, the BRCA1 and BRCA2 variation detection data of sample VB01562 were obtained as the visualization files VB01562_BRCA1.png (see Fig. 1) and VB01562_BRCA2.png (see Fig. 2). File VB01562_BRCA1.png shows that no variation was detected in sample VB01562 within the BRCA1 exons or the surrounding 20 bp, so only the chromosome graphic and the gene segment graphic were output as the final result. File VB01562_BRCA2.png shows that 6 variations were detected in sample VB01562 within the BRCA2 exons and the surrounding 20 bp.
Industrial applicability
The present invention provides a device and a method for constructing a user-friendly chromosomal gene variation map that can show, automatically, faithfully, intuitively, and attractively, the specific variations of any gene on an entire chromosome.