CN103810402B - Data processing method and device for genomes - Google Patents
- Publication number
- CN103810402B (CN201410064832.XA)
- Authority
- CN
- China
- Prior art keywords
- information
- fragment
- genome
- group
- sequence
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data processing method and device for genomes. The method includes: performing a first alignment of the information of a target genome against the information of a reference genome to obtain a first alignment result; obtaining, from the first alignment result, the information of the genome fragments that were not aligned; performing a second alignment of the information of the unaligned genome fragments against the information of the reference genome to obtain a second alignment result; and obtaining the information of the specific sequences of the target genome from the second alignment result. The method and device solve the problem in the related art that accurate specific sequences are difficult to obtain.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a data processing method and device for genomes.
Background art
Comparative genomics analysis follows two directions: first, finding similar gene sequences between the genomes of different species in order to study the similar gene functions and mechanisms that the species may share; second, finding similar and specific sequences across broader regions of the genomes of different species in order to study the evolutionary history of the species and the genome variation events that arose during their evolution.
At present, in the related art, the specific sequences of genomes between species are found simply by aligning the genome protein sequences of the species under study with those of an evolutionarily closely related species to obtain alignment information for the protein sequences between the species, and then clustering that alignment information to obtain the specific sequences. Because a genome contains sequences of other elements in addition to protein sequences, it is difficult to obtain accurate specific sequences in this way.
In addition, because the amount of genome information is large, aligning genome protein sequences in the above technical scheme consumes a large amount of time and memory.
No effective solution has yet been proposed for the problem in the related art that accurate specific sequences are difficult to obtain.
Summary of the invention
The main purpose of the present invention is to provide a data processing method and device for genomes, so as to solve the problem in the related art that accurate specific sequences are difficult to obtain.
To achieve this purpose, according to one aspect of the invention, a data processing method for genomes is provided. The method includes: performing a first alignment of the information of a target genome against the information of a reference genome to obtain a first alignment result; obtaining the information of the unaligned genome fragments from the first alignment result; performing a second alignment of the information of the unaligned genome fragments against the information of the reference genome to obtain a second alignment result; and obtaining the information of the specific sequences of the target genome from the second alignment result.
Further, performing the second alignment of the information of the unaligned genome fragments against the information of the reference genome to obtain the second alignment result includes: detecting whether repeated sequence information exists in the information of the unaligned genome fragments; if repeated sequence information is detected, marking the repeated sequence information to obtain marked information; filtering the marked information out of the information of the unaligned genome fragments to obtain filtered information; and aligning the filtered information against the information of the reference genome to obtain the second alignment result.
Further, the first alignment result includes multiple homologous genome fragments, where the homologous genome fragments are the genome fragments that were aligned. Obtaining the information of the unaligned genome fragments from the first alignment result includes: filtering the homologous genome fragments out of the first alignment result to obtain multiple unaligned genome sub-fragments; sorting the unaligned genome sub-fragments by their positions in the target genome to obtain an ordered sequence of unaligned genome sub-fragments; merging any two genome sub-fragments that are adjacent in the ordered sequence and overlap, to obtain an ordered sequence that includes the merged unaligned genome sub-fragments; and connecting all genome sub-fragments in that ordered sequence to obtain the information of the unaligned genome fragments.
Further, the second alignment result includes multiple homologous genome fragments. Obtaining the information of the specific sequences of the target genome from the second alignment result includes: extracting the homologous genome fragments; sorting the homologous genome fragments by their positions in the target genome to obtain an ordered sequence of homologous genome fragments; detecting whether any two homologous genome fragments that are adjacent in the ordered sequence overlap; if an overlap is detected, merging the overlapping fragments to obtain multiple merged homologous genome fragments; and filtering the information of the homologous genome fragments, including the merged ones, out of the second alignment result to obtain the information of the specific sequences of the target genome.
Further, before the homologous genome fragments are extracted, the data processing method also includes: judging whether the length of each of multiple genome fragments is greater than or equal to a preset length; if the length is greater than or equal to the preset length, judging whether the similarity of the genome fragments is greater than or equal to a preset similarity; if the similarity is greater than or equal to the preset similarity, judging whether the alignment rate of the genome fragments is greater than or equal to a preset alignment rate; and if the alignment rate is greater than or equal to the preset alignment rate, taking the information of the genome fragments as the information of the homologous genome fragments.
To achieve the above purpose, according to another aspect of the invention, a data processing device for genomes is provided. The device includes: a first alignment unit for performing a first alignment of the information of a target genome against the information of a reference genome to obtain a first alignment result; a first acquisition unit for obtaining the information of the unaligned genome fragments from the first alignment result; a second alignment unit for performing a second alignment of the information of the unaligned genome fragments against the information of the reference genome to obtain a second alignment result; and a second acquisition unit for obtaining the information of the specific sequences of the target genome from the second alignment result.
Further, the second alignment unit includes: a first detection module for detecting whether repeated sequence information exists in the information of the unaligned genome fragments; a marking module for marking the repeated sequence information, if such information is detected, to obtain marked information; a first filtering module for filtering the marked information out of the information of the unaligned genome fragments to obtain filtered information; and an alignment module for aligning the filtered information against the information of the reference genome to obtain the second alignment result.
Further, the first alignment result includes multiple homologous genome fragments, where the homologous genome fragments are the genome fragments that were aligned. The first acquisition unit includes: a second filtering module for filtering the homologous genome fragments out of the first alignment result to obtain multiple unaligned genome sub-fragments; a first sorting module for sorting the unaligned genome sub-fragments by their positions in the target genome to obtain an ordered sequence of unaligned genome sub-fragments; a first merging module for merging any two genome sub-fragments that are adjacent in the ordered sequence and overlap, to obtain an ordered sequence that includes the merged unaligned genome sub-fragments; and a connection module for connecting all genome sub-fragments in that ordered sequence to obtain the information of the unaligned genome fragments.
Further, the second alignment result includes multiple homologous genome fragments, and the second acquisition unit includes: an extraction module for extracting the homologous genome fragments; a second sorting module for sorting the homologous genome fragments by their positions in the target genome to obtain an ordered sequence of homologous genome fragments; a second detection module for detecting whether any two homologous genome fragments that are adjacent in the ordered sequence overlap; a second merging module for merging the overlapping fragments, if an overlap is detected, to obtain multiple merged homologous genome fragments; and a third filtering module for filtering the information of the homologous genome fragments, including the merged ones, out of the second alignment result to obtain the information of the specific sequences of the target genome.
Further, the data processing device also includes: a first judging module for judging, before the homologous genome fragments are extracted, whether the length of each of multiple genome fragments is greater than or equal to a preset length; a second judging module for judging, if the length is greater than or equal to the preset length, whether the similarity of the genome fragments is greater than or equal to a preset similarity; a third judging module for judging, if the similarity is greater than or equal to the preset similarity, whether the alignment rate of the genome fragments is greater than or equal to a preset alignment rate; and a determining module for confirming, if the alignment rate is greater than or equal to the preset alignment rate, the information of the genome fragments as the information of the homologous genome fragments.
With the present invention, a first alignment of the information of the target genome against the information of the reference genome is performed to obtain a first alignment result; the information of the unaligned genome fragments is obtained from the first alignment result; a second alignment of the information of the unaligned genome fragments against the information of the reference genome is performed to obtain a second alignment result; and the information of the specific sequences of the target genome is obtained from the second alignment result. This solves the problem in the related art that accurate specific sequences are difficult to obtain, and thereby improves the accuracy of the specific sequences.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for a further understanding of the present invention. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic diagram of a data processing device for genomes according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a preferred data processing device for genomes according to an embodiment of the present invention;
Fig. 3 is a flow chart of a data processing method for genomes according to an embodiment of the present invention; and
Fig. 4 is a flow chart of a preferred data processing method for genomes according to an embodiment of the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the scheme of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. The described embodiments are obviously only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and are not used to describe a specific order or precedence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments of the invention described herein can be implemented in an order other than those illustrated or described here. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion.
According to an embodiment of the invention, a data processing device for genomes is provided. The device is used to obtain the information of accurate specific sequences, which creates the conditions for accurate gene analysis.
Fig. 1 is a schematic diagram of a data processing device for genomes according to an embodiment of the present invention.
As shown in Fig. 1, the device includes: a first alignment unit 10, a first acquisition unit 20, a second alignment unit 30 and a second acquisition unit 40.
The first alignment unit 10 is used to perform a first alignment of the information of the target genome against the information of the reference genome to obtain a first alignment result.
Specifically, the nucmer tool in the MUMmer software can be used to perform the first alignment of the target genome against the reference genome and obtain the first alignment result between the two genomes. It should be noted that the nucmer tool can be replaced by another tool in this first, whole-genome alignment.
The target genome and the reference genome may come from different species. The target genome may be the genome of the species under study, and the reference genome may be the genome of a species whose gene information is known. For example, when the genome of willow is analyzed, the willow genome can serve as the target genome; to analyze the gene function relationship between willow and a second, related species, the genome of that second species can serve as the reference genome, and to analyze the gene function relationship between willow and Sophora japonica L., the genome of Sophora japonica L. can serve as the reference genome. The first alignment may be a preliminary alignment, and the corresponding first alignment result a preliminary alignment result. It should be noted that the species under study may include plants, animals, microorganisms and so on.
Preferably, in the first alignment, the target genome and the reference genome may each be divided into n genome regions, and the n regions of the target genome may be aligned with the n regions of the reference genome simultaneously. This saves alignment time and improves alignment efficiency.
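The region-wise parallel alignment described above can be sketched as follows. This is only an illustration under stated assumptions: the description does not say how regions are paired, so the sketch pairs every target region with every reference region, and `toy_align` is a hypothetical exact-match stand-in for a real aligner such as nucmer.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_regions(sequence, n):
    """Split a sequence into n contiguous regions of near-equal length.

    Returns (offset, subsequence) pairs so hits can be mapped back
    to coordinates in the full sequence.
    """
    size, extra = divmod(len(sequence), n)
    regions, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        regions.append((start, sequence[start:end]))
        start = end
    return regions


def toy_align(target_region, reference_region):
    """Hypothetical stand-in for a real aligner: exact substring search only."""
    t_offset, t_seq = target_region
    r_offset, r_seq = reference_region
    pos = r_seq.find(t_seq) if t_seq else -1
    if pos < 0:
        return None
    # (target start, target end, reference start), half-open target interval
    return (t_offset, t_offset + len(t_seq), r_offset + pos)


def align_regions_in_parallel(target, reference, n, align=toy_align):
    """Align all region pairs concurrently (all-pairs pairing is an assumption)."""
    pairs = [(t, r)
             for t in split_into_regions(target, n)
             for r in split_into_regions(reference, n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        hits = pool.map(lambda pair: align(*pair), pairs)
    return sorted(h for h in hits if h is not None)
```

With a real aligner each task would typically be a separate process or cluster job rather than a thread; the thread pool here only keeps the sketch self-contained.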
Optionally, the data processing device may also include a third acquisition unit and a fourth acquisition unit. The third acquisition unit is used to obtain the information of the target genome before the first alignment of the information of the target genome against the information of the reference genome is performed, and the fourth acquisition unit is used to obtain the information of the reference genome.
The first acquisition unit 20 is used to obtain the information of the unaligned genome fragments from the first alignment result.
The first alignment result may include the information of the aligned genome fragments and the information of the unaligned genome fragments. An aligned genome fragment is also called a homologous genome fragment.
Specifically, the first acquisition unit 20 may obtain the information of the unaligned genome fragments by either of the following two methods:
Method one: extract the information of the unaligned genome fragments from the first alignment result.
The information of the unaligned genome fragments may include: the information of genome fragments whose similarity to the reference genome is below a first preset similarity, which may for example be 98%; the information of genome fragments whose length is below a first preset length, which may for example be 40 bp, or 90 bp if the first preset length is a cluster length; and the information of genome fragments whose alignment rate is below a first preset alignment rate. The alignment rate may be the ratio of the sequence to be aligned in a genome fragment of the target genome to the sequence to be aligned in the reference genome.
Method two: filter the information of the homologous genome fragments out of the first alignment result and take the information of the remaining genome fragments as the information of the unaligned genome fragments.
The information of the homologous genome fragments can be filtered out with the bedtools tool. Filtering out the information of the homologous genome fragments reduces the consumption of computer memory.
The second alignment unit 30 is used to perform a second alignment of the information of the unaligned genome fragments against the information of the reference genome to obtain a second alignment result.
The second alignment result may include multiple homologous genome fragments and specific sequences. A homologous genome fragment is a genome fragment that was aligned; a specific sequence is a sequence that was not aligned, and it may include gene sequences and the sequences of other elements.
Specifically, the blastn software can be used to align the information of the unaligned genome fragments against the information of the reference genome to obtain the second alignment result. This alignment is a fine alignment, and the corresponding second alignment result is a fine alignment result. In this way, homologous genome fragments that the first alignment did not identify can be found and filtered out, so that accurate specific sequences are obtained. This is possible because, in the second alignment, the length of a homologous genome fragment may be set to a second preset length, which may be greater than the first preset length, for example 100 bp; the similarity of a homologous genome fragment may be set to a second preset similarity; and the alignment rate of the second alignment may be set to a second preset alignment rate, for example 90.
Preferably, in the second alignment, the unaligned genome fragments and the reference genome may each be divided into n genome regions, and the n regions of the unaligned genome fragments may be aligned with the n regions of the reference genome simultaneously. This saves alignment time and improves alignment efficiency.
The second acquisition unit 40 is used to obtain the information of the specific sequences of the target genome from the second alignment result.
The method of obtaining the information of the specific sequences of the target genome from the second alignment result is similar to the method of obtaining the information of the unaligned genome fragments from the first alignment result and is not repeated here.
In this embodiment of the present invention, the information of the target genome and the information of the reference genome are aligned twice, in a first alignment followed by a second alignment, and each alignment uses different alignment software and different alignment parameters such as the preset length, preset similarity and preset alignment rate. This improves the accuracy of the specific sequences. In addition, combining the MUMmer software with the blastn software makes it possible to analyze the differences of the specific sequences at the level of gene structure.
Fig. 2 is a schematic diagram of a preferred data processing device for genomes according to an embodiment of the present invention.
As shown in Fig. 2, this embodiment can serve as a preferred implementation of the embodiment shown in Fig. 1. The data processing device for genomes of this embodiment includes the first alignment unit 10, the first acquisition unit 20, the second alignment unit 30 and the second acquisition unit 40 of the first embodiment, where the second alignment unit 30 includes a first detection module 301, a marking module 302, a first filtering module 303 and an alignment module 304.
The functions of the first alignment unit 10, the first acquisition unit 20 and the second acquisition unit 40 are the same as in the first embodiment and are not repeated here.
The first detection module 301 is used to detect whether repeated sequence information exists in the information of the unaligned genome fragments.
Preferably, when the species under study is a plant, it is meaningful to detect whether repeated sequence information exists in the information of the unaligned genome fragments, because the genome of a plant contains a large number of repeated sequences. When the species under study is an animal, this detection may be omitted, because the genome of an animal contains only a small number of repeated sequences.
The marking module 302 is used to mark the repeated sequence information, if such information is detected in the information of the unaligned genome fragments, to obtain marked information.
Specifically, the repeated sequence information can be identified with the RepeatMasker software and marked with characters, digits or other symbols that differ from the base symbols. This prevents the marked information from being confused with base sequence information.
The first filtering module 303 is used to filter the marked information out of the information of the unaligned genome fragments to obtain filtered information.
It should be noted that the marked information need not be filtered out; instead, it can simply be skipped during alignment against the information of the reference genome.
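The marking and skipping of repeats can be illustrated with a short sketch. It assumes that repeat positions are already available as half-open intervals (for example, from a repeat-annotation tool) and uses `#` as a hypothetical marker character that cannot be mistaken for a base symbol; neither choice comes from the patent text.

```python
def mask_repeats(sequence, repeat_intervals, mask_char="#"):
    """Replace each half-open [start, end) repeat interval with mask_char."""
    chars = list(sequence)
    for start, end in repeat_intervals:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


def unmasked_segments(masked, mask_char="#"):
    """Yield (start, subsequence) for each maximal run free of mask_char.

    Aligning only these runs corresponds to filtering out, or skipping,
    the marked repeat information.
    """
    start = None
    for i, ch in enumerate(masked):
        if ch != mask_char and start is None:
            start = i
        elif ch == mask_char and start is not None:
            yield start, masked[start:i]
            start = None
    if start is not None:
        yield start, masked[start:]
```

Keeping the start offset of each unmasked run allows hits found in the second alignment to be mapped back to positions in the target genome.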
The alignment module 304 is used to align the filtered information against the information of the reference genome to obtain the second alignment result.
In this embodiment of the present invention, repeated sequence information is detected and then filtered out or skipped during alignment against the information of the reference genome. This reduces the amount of genome sequence to be aligned and therefore improves alignment efficiency, and filtering out the marked information also reduces the consumption of computer memory.
Optionally, in this embodiment of the present invention, the first alignment result may include multiple homologous genome fragments, where the homologous genome fragments are the genome fragments that were aligned, and the first acquisition unit may include: a second filtering module, a first sorting module, a first merging module and a connection module.
The second filtering module is used to filter the homologous genome fragments out of the first alignment result to obtain multiple unaligned genome sub-fragments.
It should be noted that the step of filtering the homologous genome fragments out of the first alignment result to obtain the unaligned genome sub-fragments can be replaced by a step of extracting the unaligned genome sub-fragments directly.
The first sorting module is used to sort the unaligned genome sub-fragments by their positions in the target genome to obtain an ordered sequence of unaligned genome sub-fragments.
The first merging module is used to merge any two genome sub-fragments that are adjacent in the ordered sequence and overlap, to obtain an ordered sequence that includes the merged unaligned genome sub-fragments.
Specifically, the overlapping genome sub-fragments can be merged with the bedtools tool.
Preferably, before the merging, it may first be detected whether any two genome sub-fragments that are adjacent in the ordered sequence overlap. If an overlap is detected, the two adjacent, overlapping genome sub-fragments are merged to obtain an ordered sequence that includes the merged unaligned genome sub-fragments. If no overlap is detected, the merging step is skipped. An overlap may mean that parts of the two genome sub-fragments overlap, that the two genome sub-fragments overlap entirely, or that the whole of one genome sub-fragment overlaps with part of the other.
Merging the overlapping parts of the unaligned genome sub-fragments avoids aligning the same genome fragment repeatedly in the second alignment, which reduces the time consumed by the alignment, and it also reduces the consumption of computer memory.
The connection module is used to connect all genome sub-fragments in the ordered sequence that includes the merged unaligned genome sub-fragments, to obtain the information of the unaligned genome fragments.
For example, after the homologous genome fragments in the first alignment result have been filtered out, four unaligned genome sub-fragments may be obtained: a first, a second, a third and a fourth sub-fragment. These are arranged from left to right, according to their positions in the genome, into an ordered sequence in which the tail of the third sub-fragment overlaps the head of the fourth sub-fragment. The overlapping part can then be merged, so that the third and fourth sub-fragments are combined into a new genome sub-fragment, the fifth sub-fragment. This gives a new ordered sequence consisting of the first, second and fifth sub-fragments. Connecting the first, second and fifth sub-fragments of this new sequence in order yields the information of the unaligned genome fragments.
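The sort, merge and connect steps in this example can be sketched in a few lines. The sketch assumes that sub-fragments are represented as half-open `(start, end)` coordinates on the target genome, and it treats only true overlaps (not merely touching intervals) as mergeable; both are illustrative assumptions, and the function names are hypothetical.

```python
def merge_overlapping(intervals):
    """Sort half-open (start, end) intervals by position and merge overlaps."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlaps the previous interval: extend it instead of appending.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def concatenate(genome, intervals):
    """Connect the sub-fragments, in order, into one sequence."""
    return "".join(genome[start:end] for start, end in intervals)


# Four sub-fragments; the tail of the third overlaps the head of the fourth.
genome = "ABCDEFGHIJKLMNOPQRST"  # placeholder letters, not real bases
sub_fragments = [(0, 3), (5, 8), (10, 15), (13, 18)]
merged = merge_overlapping(sub_fragments)
unaligned = concatenate(genome, merged)
```

Here the third and fourth intervals collapse into a single interval, which plays the role of the "fifth sub-fragment" in the example above.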
Optionally, the second alignment result may include multiple homologous genome fragments, and the second acquisition unit may include: an extraction module, a second sorting module, a second detection module, a second merging module and a third filtering module.
The extraction module is used to extract the homologous genome fragments. The second sorting module is used to sort the homologous genome fragments by their positions in the target genome to obtain an ordered sequence of homologous genome fragments; specifically, the homologous genome fragments can be sorted with the sort tool in bedtools. The second detection module is used to detect whether any two homologous genome fragments that are adjacent in the ordered sequence overlap. The second merging module is used to merge the overlapping fragments, if an overlap is detected, to obtain multiple merged homologous genome fragments. The third filtering module is used to filter the information of the homologous genome fragments, including the merged ones, out of the second alignment result to obtain the information of the specific sequences of the target genome. The information filtered out here includes both the information of the merged homologous genome fragments and the information of the homologous genome fragments that have no overlap. The step of filtering out the homologous genome fragments can be replaced by a step of taking the complement of the homologous genome fragments; specifically, the complement tool can be used to take the complement of the homologous genome fragments.
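Taking the complement of the homologous fragments can be illustrated as below. This is a minimal sketch, assuming half-open `(start, end)` coordinates that all lie within a single target sequence of known length; it mirrors the idea of an interval complement but is not the implementation of any particular tool.

```python
def complement(intervals, genome_length):
    """Return the half-open gaps of [0, genome_length) not covered by intervals.

    Assumes every interval lies within [0, genome_length). Overlapping
    input intervals are handled without a separate merge step.
    """
    gaps, cursor = [], 0
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))  # uncovered stretch before this interval
        cursor = max(cursor, end)
    if cursor < genome_length:
        gaps.append((cursor, genome_length))  # uncovered tail
    return gaps
```

If the intervals passed in are the homologous fragments found by the second alignment, the returned gaps are the candidate specific regions of the target sequence.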
It should be noted that the function of the second acquisition unit can be replaced by the function of the first acquisition unit, which is not repeated here.
Preferably, the data processing device may further include: a first judging module, a second judging module, a third judging module and a determining module. The first judging module is used to judge, before the multiple homologous genome fragments are extracted, whether the length of multiple genome fragments is greater than or equal to a preset length, where the preset length is the same as the second preset length. The second judging module is used to judge, if the length of the multiple genome fragments is judged to be greater than or equal to the preset length, whether the similarity of the multiple genome fragments is greater than or equal to a preset similarity, where the preset similarity is the same as the second preset similarity. The third judging module is used to judge, if the similarity of the multiple genome fragments is judged to be greater than or equal to the preset similarity, whether the alignment rate of the multiple genome fragments is greater than or equal to a preset alignment rate, where the preset alignment rate is the same as the second preset alignment rate. The determining module is used to take the information of the multiple genome fragments as the information of the multiple homologous genome fragments if the alignment rate of the multiple genome fragments is judged to be greater than or equal to the preset alignment rate.
According to an embodiment of the present invention, a data processing method for genomes is provided. The method is used to obtain the information of accurate specific sequences, creating the conditions for accurate genetic analysis, and it can run on a computer processing device. It should be noted that the data processing method for genomes provided by the embodiment of the present invention can be executed by the data processing device for genomes of the embodiment of the present invention, and that the data processing device for genomes of the embodiment of the present invention can be used to execute the data processing method for genomes of the embodiment of the present invention.
Fig. 3 is a flow chart of the data processing method for genomes according to an embodiment of the present invention.
As shown in Fig. 3, the method includes the following steps S302 to S308:
Step S302: perform a first alignment of the information of the target genome against the information of the reference genome to obtain a first alignment result.
Specifically, the nucmer tool in the MUMmer software can be used to perform the first alignment of the target genome against the reference genome, obtaining the first alignment result between the two genomes. It should be noted that, for the first alignment at whole-genome scale, the nucmer tool can be replaced by another tool.
The target genome and the reference genome may come from different species: the target genome can be the genome of the species to be studied, and the reference genome can be the genome of a species whose gene information is known. For example, when the genome of willow is analysed, the genome of willow can serve as the target genome; if the gene-function relationship between willow and another species is to be analysed, the genome of that species can serve as the reference genome, and if the gene-function relationship between willow and Sophora japonica L. is to be analysed, the genome of Sophora japonica L. can serve as the reference genome. The first alignment can be a preliminary alignment, and the corresponding first alignment result can be a preliminary alignment result. It should be noted that the species to be studied can include plants, animals, microorganisms and so on.
Preferably, in the first alignment, the target genome and the reference genome can each be divided into n genome regions, and the n genome regions of the target genome can be aligned against the n genome regions of the reference genome simultaneously. This saves alignment time and improves alignment efficiency.
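The region-wise parallel alignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: `split_into_regions`, `toy_align` and `align_regions_in_parallel` are hypothetical names, the toy aligner merely stands in for a real aligner such as nucmer, and aligning each target region against the whole reference is an assumption, since the text does not say how the regions are paired.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_regions(seq, n):
    """Split seq into n contiguous regions of near-equal length.

    Returns a list of (start_offset, region_sequence) tuples.
    """
    size, remainder = divmod(len(seq), n)
    regions = []
    start = 0
    for i in range(n):
        # The first `remainder` regions receive one extra base.
        end = start + size + (1 if i < remainder else 0)
        regions.append((start, seq[start:end]))
        start = end
    return regions


def toy_align(region, reference):
    """Stand-in aligner: reports whether the region occurs exactly in the reference."""
    return region in reference


def align_regions_in_parallel(target, reference, n):
    """Align the n regions of the target concurrently; results keep region order."""
    regions = split_into_regions(target, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        hits = list(pool.map(lambda r: toy_align(r[1], reference), regions))
    return [(start, hit) for (start, _), hit in zip(regions, hits)]


result = align_regions_in_parallel("AAAACCCCGGGG", "TTAAAATTGGGG", 3)
```

A real pipeline would also have to deal with alignments that span a region boundary, for example by letting adjacent regions overlap slightly.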
Optionally, before the first alignment of the information of the target genome against the information of the reference genome is performed to obtain the first alignment result, the data processing method may further include: obtaining the information of the target genome, and obtaining the information of the reference genome.
Step S304: obtain the information of the unaligned genome fragments from the first alignment result.
The first alignment result can include the information of aligned genome fragments and the information of unaligned genome fragments. An aligned genome fragment is also called a homologous genome fragment.
Specifically, the information of the unaligned genome fragments can be obtained by either of the following two methods:
Method one: extract the information of the unaligned genome fragments from the first alignment result.
The information of the unaligned genome fragments may include: the information of genome fragments whose similarity to the reference genome is less than a first preset similarity, which can be, for example, 98%; the information of genome fragments whose length is less than a first preset length, which can be, for example, 40 bp, or 90 bp if the first preset length is a cluster length; and the information of genome fragments whose alignment rate is less than a first preset alignment rate. The alignment rate can be the ratio of the sequence to be aligned in a genome fragment of the target genome to the sequence to be aligned in the reference genome.
Method two: filter the information of the homologous genome fragments out of the first alignment result to obtain the information of the remaining genome fragments, and take the information of the remaining genome fragments as the information of the unaligned genome fragments.
The information of the homologous genome fragments can be filtered out with the nucmer tool in the MUMmer software. Filtering out the information of the homologous genome fragments in this way saves computer memory.
Step S306: perform a second alignment of the information of the unaligned genome fragments against the information of the reference genome to obtain a second alignment result.
The second alignment result can include multiple homologous genome fragments and the specific sequence. The homologous genome fragments are the genome fragments that aligned; the specific sequence is the sequence that did not align, and it can include gene sequences and sequences of other elements.
Specifically, the blastn software can be used to align the information of the unaligned genome fragments against the information of the reference genome, obtaining the second alignment result. This alignment is a fine alignment, and the corresponding second alignment result is a fine alignment result. In this way, the homologous genome fragments that remain among the fragments left unaligned by the first alignment can be found and filtered out, so that an accurate specific sequence can be obtained. This is because, in the second alignment, the length of a homologous genome fragment can be a second preset length, which can be greater than the first preset length, for example 100 bp; the similarity of a homologous genome fragment can be a second preset similarity; and the alignment rate of the second alignment can be a second preset alignment rate, for example 90%.
Preferably, in the second alignment, the unaligned genome fragments and the reference genome can each be divided into n genome regions, and the n genome regions of the unaligned genome fragments can be aligned against the n genome regions of the reference genome simultaneously. This saves alignment time and improves alignment efficiency.
Step S308: obtain the information of the specific sequence of the target genome from the second alignment result.
The method of obtaining the information of the specific sequence of the target genome from the second alignment result is similar to the method of obtaining the information of the unaligned genome fragments from the first alignment result, and is not repeated here.
In the embodiment of the present invention, the information of the target genome and the information of the reference genome are aligned twice, first in the first alignment and then in the second alignment, and the two alignments use different alignment software and different alignment parameters such as the preset length, the preset similarity and the preset alignment rate, which improves the accuracy of the specific sequence. In addition, through the cooperation of the MUMmer software and the blastn software, the differences of the specific sequence at the level of gene structure can be analysed.
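One way the two passes might be driven from a script is sketched below. The file names and the `build_commands` helper are hypothetical, the commands are only constructed and never executed, and the options shown (`-p` for the nucmer output prefix; `-query`, `-subject`, `-outfmt 6` and `-out` for blastn) are a plausible minimal set rather than the parameters used in the embodiment.

```python
def build_commands(target_fa, reference_fa, unaligned_fa, prefix="first_pass"):
    """Return argv lists for the coarse (nucmer) and fine (blastn) passes.

    All file names are placeholders; nothing is executed here.
    """
    # First alignment: whole-genome comparison. nucmer takes the reference
    # first and the query (here, the target genome) second.
    first_pass = ["nucmer", "-p", prefix, reference_fa, target_fa]
    # Second alignment: only the fragments left unaligned by the first pass
    # are compared against the reference, with tabular output.
    second_pass = [
        "blastn",
        "-query", unaligned_fa,
        "-subject", reference_fa,
        "-outfmt", "6",
        "-out", prefix + ".second_pass.tsv",
    ]
    return first_pass, second_pass


first_cmd, second_cmd = build_commands("target.fa", "reference.fa", "unaligned.fa")
```

Each list could be handed to `subprocess.run(cmd, check=True)` once the tools are installed; extracting the unaligned fragments between the two passes remains a separate step.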
Fig. 4 is a flow chart of a preferred data processing method for genomes according to an embodiment of the present invention.
As shown in Fig. 4, the data processing method for genomes includes the following steps S402 to S414; this embodiment can serve as a preferred implementation of the embodiment shown in Fig. 3.
Steps S402 to S404 are the same as steps S302 to S304 of the embodiment shown in Fig. 3, respectively, and are not repeated here.
Step S406: detect whether repeated sequence information exists in the information of the unaligned genome fragments.
Preferably, when the species to be studied is a plant, whether repeated sequence information exists in the information of the unaligned genome fragments is detected, because the genomes of plants contain a large number of repeated sequences; when the species to be studied is an animal, this detection may be omitted, because the genomes of animals contain only a small number of repeated sequences.
Step S408: if repeated sequence information is detected in the information of the unaligned genome fragments, annotate the repeated sequence information to obtain annotated information.
Specifically, the repeated sequence information can be identified with the RepeatMasker software and annotated with characters, digits or the like that differ from the base symbols. This prevents the annotated information from being confused with base sequence information.
Step S410: filter the annotated information out of the information of the unaligned genome fragments to obtain filtered information.
It should be noted that the annotated information need not be filtered out; instead, it can be skipped during the alignment against the information of the reference genome.
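The two ways of handling annotated repeats, filtering them out or skipping them, can be illustrated with a small sketch. The repeat coordinates are assumed to come from a repeat finder such as RepeatMasker; `MASK_CHAR` and all function names are hypothetical choices made for this example.

```python
MASK_CHAR = "X"  # assumed marker; any character that is not a base symbol works


def mask_repeats(seq, repeats):
    """Replace every base inside a repeat interval [start, end) with MASK_CHAR."""
    chars = list(seq)
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = MASK_CHAR
    return "".join(chars)


def filter_masked(masked):
    """Option 1 (step S410): drop the annotated positions entirely."""
    return masked.replace(MASK_CHAR, "")


def unmasked_segments(masked):
    """Option 2: keep coordinates and skip annotated stretches during alignment.

    Returns (start_offset, segment) pairs for every run of unmasked bases.
    """
    segments = []
    start = None
    for i, ch in enumerate(masked):
        if ch != MASK_CHAR and start is None:
            start = i
        elif ch == MASK_CHAR and start is not None:
            segments.append((start, masked[start:i]))
            start = None
    if start is not None:
        segments.append((start, masked[start:]))
    return segments


masked = mask_repeats("ACGTACGT", [(2, 5)])
```

Skipping rather than filtering keeps the original coordinates of each unmasked segment, which may be convenient when results have to be mapped back to the target genome.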
Step S412: align the filtered information against the information of the reference genome to obtain the second alignment result.
Step S414 is the same as step S308 of the embodiment shown in Fig. 3 and is not repeated here.
In the embodiment of the present invention, repeated sequence information is detected and then filtered out or skipped during the alignment against the information of the reference genome, which reduces the number of genome sequences to be aligned and thereby improves alignment efficiency; filtering out the annotated information also reduces the computer memory consumed by the genome.
Optionally, in the embodiment of the present invention, the first alignment result can include multiple homologous genome fragments, where the multiple homologous genome fragments are multiple aligned genome fragments, and obtaining the information of the unaligned genome fragments from the first alignment result may include the following steps:
First, filter the multiple homologous genome fragments out of the first alignment result to obtain multiple unaligned genome sub-fragments.
It should be noted that the above step of filtering the multiple homologous genome fragments out of the first alignment result to obtain the multiple unaligned genome sub-fragments can be replaced by a step of extracting the multiple unaligned genome sub-fragments.
Then, sort the multiple unaligned genome sub-fragments by their positions in the target genome to obtain an ordered sequence of the multiple unaligned genome sub-fragments.
Then, merge any two genome sub-fragments that are positionally adjacent in the ordered sequence and overlap, obtaining an ordered sequence that includes multiple merged unaligned genome sub-fragments.
Specifically, the genome sub-fragments that overlap can be merged with the bedtools tool.
Preferably, before the merging, it can first be detected whether any two positionally adjacent genome sub-fragments in the ordered sequence overlap. If an overlap is detected, the two positionally adjacent, overlapping genome sub-fragments are merged, obtaining the ordered sequence that includes multiple merged unaligned genome sub-fragments. If no overlap is detected, the merging step is skipped. An overlap can mean that parts of two genome sub-fragments overlap, that two genome sub-fragments overlap entirely, or that the whole of one genome sub-fragment overlaps part of another genome sub-fragment.
Merging the overlapping parts of the multiple unaligned genome sub-fragments reduces repeated alignment of the same genome fragment in the second alignment, which reduces the time consumed by the alignment; merging the overlapping parts also reduces the consumption of computer memory.
Finally, connect all the genome sub-fragments in the ordered sequence that includes the multiple merged unaligned genome sub-fragments, obtaining the information of the unaligned genome fragment.
For example, after the multiple homologous genome fragments in the first alignment result are filtered out, four unaligned genome sub-fragments may be obtained: a first sub-fragment, a second sub-fragment, a third sub-fragment and a fourth sub-fragment. These are arranged from left to right into an ordered sequence according to their positions in the genome, and the tail of the third sub-fragment overlaps the head of the fourth sub-fragment. The overlapping part can therefore be merged, so that the third sub-fragment and the fourth sub-fragment are merged into one new genome sub-fragment, the fifth sub-fragment, giving a new ordered sequence consisting of the first sub-fragment, the second sub-fragment and the fifth sub-fragment. Connecting the first sub-fragment, the second sub-fragment and the fifth sub-fragment of this new sequence in order yields the information of the unaligned genome fragment.
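The four-sub-fragment example can be reproduced with a short sketch. The coordinates, the toy genome string and the helper names `merge_overlapping` and `connect` are invented for illustration; in practice the same sort-and-merge could be done with bedtools, as the text suggests.

```python
def merge_overlapping(fragments):
    """Sort (start, end) fragments by position and merge those that overlap.

    Coordinates are half-open: a fragment covers positions start .. end - 1.
    """
    merged = []
    for start, end in sorted(fragments):
        if merged and start < merged[-1][1]:
            # Overlaps the previous fragment (partially or entirely): extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def connect(genome, fragments):
    """Concatenate the sub-fragment sequences in positional order."""
    return "".join(genome[start:end] for start, end in fragments)


genome = "AAAACCGGGGTTACGTACGTACGT"
# Four unaligned sub-fragments; the tail of the third overlaps the head of the fourth.
sub_fragments = [(0, 4), (6, 10), (12, 18), (16, 22)]
merged = merge_overlapping(sub_fragments)
unaligned = connect(genome, merged)
```

Here the third and fourth sub-fragments collapse into a single "fifth" sub-fragment covering positions 12 to 21, so the overlapping bases enter the second alignment only once.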
Optionally, the second alignment result can include multiple homologous genome fragments, and obtaining the information of the specific sequence of the target genome from the second alignment result may include the following steps:
First, extract the multiple homologous genome fragments.
Second, sort the multiple homologous genome fragments by their positions in the target genome to obtain an ordered sequence of the multiple homologous genome fragments; specifically, the fragments can be sorted with the sort tool in bedtools.
Next, detect whether any two positionally adjacent homologous genome fragments in the ordered sequence overlap.
Then, if any two positionally adjacent homologous genome fragments in the ordered sequence are detected to overlap, merge the overlapping parts to obtain multiple merged homologous genome fragments.
Finally, filter the information of the multiple merged homologous genome fragments out of the second alignment result to obtain the information of the specific sequence of the target genome. The information filtered out here includes not only the information of the multiple merged homologous genome fragments but also the information of the homologous genome fragments that have no overlapping part. The step of filtering the homologous genome fragments can be replaced by a step of inverting the homologous genome fragments; specifically, the homologous genome fragments can be inverted with the complement tool.
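The inversion alternative amounts to taking the complement of the homologous intervals within the target genome: whatever is not covered by a homologous fragment is a candidate specific sequence. The sketch below illustrates the idea; `complement` is a hypothetical helper written for this example, not the bedtools implementation, and the coordinates are made up.

```python
def complement(intervals, genome_length):
    """Return the half-open intervals of [0, genome_length) not covered by `intervals`."""
    gaps = []
    cursor = 0  # first position not yet known to be covered
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < genome_length:
        gaps.append((cursor, genome_length))
    return gaps


# Homologous fragments on a 100 bp target; two of them overlap.
homologous = [(30, 60), (0, 20), (50, 70)]
specific = complement(homologous, 100)
```

Because the cursor only ever moves forward, overlapping homologous fragments are handled without a separate merging pass in this sketch.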
It should be noted that the step of obtaining the information of the specific sequence of the target genome from the second alignment result can be replaced by the step of obtaining the information of the unaligned genome fragments from the first alignment result, which is not repeated here.
Preferably, before the multiple homologous genome fragments are extracted, the data processing method may further include the following. First, judge whether the length of multiple genome fragments is greater than or equal to a preset length, where the preset length is the same as the second preset length. Next, if the length of the multiple genome fragments is judged to be greater than or equal to the preset length, judge whether the similarity of the multiple genome fragments is greater than or equal to a preset similarity, where the preset similarity is the same as the second preset similarity. Then, if the similarity of the multiple genome fragments is judged to be greater than or equal to the preset similarity, judge whether the alignment rate of the multiple genome fragments is greater than or equal to a preset alignment rate, where the preset alignment rate is the same as the second preset alignment rate. Finally, if the alignment rate of the multiple genome fragments is judged to be greater than or equal to the preset alignment rate, take the information of the multiple genome fragments as the information of the multiple homologous genome fragments.
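The three successive checks can be expressed as a small cascade. In this sketch the `Fragment` class and `is_homologous` function are hypothetical; the defaults of 100 bp and 90 follow the examples given for the second preset length and second preset alignment rate, while the similarity default of 90.0 is purely a placeholder, because the text gives no value for the second preset similarity.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    length: int            # aligned length in bp
    similarity: float      # percent identity
    alignment_rate: float  # percent; treating the rate as a percentage is an assumption


def is_homologous(frag, min_length=100, min_similarity=90.0, min_rate=90.0):
    """Apply the three checks in order: length, then similarity, then alignment rate."""
    if frag.length < min_length:
        return False
    if frag.similarity < min_similarity:
        return False
    if frag.alignment_rate < min_rate:
        return False
    return True


fragments = [
    Fragment(length=150, similarity=97.0, alignment_rate=95.0),  # passes all checks
    Fragment(length=80, similarity=99.0, alignment_rate=99.0),   # too short
    Fragment(length=200, similarity=85.0, alignment_rate=99.0),  # similarity too low
    Fragment(length=120, similarity=92.0, alignment_rate=60.0),  # alignment rate too low
]
homologous = [f for f in fragments if is_homologous(f)]
```

Checking the cheapest criterion first means a fragment that fails on length is never evaluated for similarity or alignment rate, which mirrors the conditional order of the judgements in the text.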
As can be seen from the above description, the present invention uses a long-sequence alignment program and a short-sequence alignment program together to obtain accurate specific sequences of all types between species (not limited to protein sequences), and reduces the time and memory consumed by genome alignment, which provides the conditions for subsequent analysis of species diversity.
It should be noted that the steps illustrated in the flow charts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flow charts, in some cases the steps shown or described can be executed in an order different from the one given here.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be made into an individual integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present invention is not restricted to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention; for those skilled in the art, the present invention can have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall be included within the scope of protection of the present invention.
Claims (8)
1. A data processing method for genomes, characterized by comprising:
performing a first alignment of information of a target genome against information of a reference genome to obtain a first alignment result;
obtaining information of an unaligned genome fragment from the first alignment result;
performing a second alignment of the information of the unaligned genome fragment against the information of the reference genome to obtain a second alignment result; and
obtaining information of a specific sequence of the target genome from the second alignment result;
wherein the first alignment result comprises multiple homologous genome fragments, the multiple homologous genome fragments being multiple aligned genome fragments, and obtaining the information of the unaligned genome fragment from the first alignment result comprises:
filtering the multiple homologous genome fragments out of the first alignment result to obtain multiple unaligned genome sub-fragments;
sorting the multiple unaligned genome sub-fragments by their positions in the target genome to obtain an ordered sequence of the multiple unaligned genome sub-fragments;
merging any two genome sub-fragments that are positionally adjacent in the ordered sequence and overlap, to obtain an ordered sequence comprising multiple merged unaligned genome sub-fragments; and
connecting all the genome sub-fragments in the ordered sequence comprising the multiple merged unaligned genome sub-fragments to obtain the information of the unaligned genome fragment.
2. The data processing method according to claim 1, characterized in that performing the second alignment of the information of the unaligned genome fragment against the information of the reference genome to obtain the second alignment result comprises:
detecting whether repeated sequence information exists in the information of the unaligned genome fragment;
if repeated sequence information is detected in the information of the unaligned genome fragment, annotating the repeated sequence information to obtain annotated information;
filtering the annotated information out of the information of the unaligned genome fragment to obtain filtered information; and
aligning the filtered information against the information of the reference genome to obtain the second alignment result.
3. The data processing method according to claim 1, characterized in that the second alignment result comprises multiple homologous genome fragments, and obtaining the information of the specific sequence of the target genome from the second alignment result comprises:
extracting the multiple homologous genome fragments;
sorting the multiple homologous genome fragments by their positions in the target genome to obtain an ordered sequence of the multiple homologous genome fragments;
detecting whether any two positionally adjacent homologous genome fragments in the ordered sequence overlap;
if any two positionally adjacent homologous genome fragments in the ordered sequence are detected to overlap, merging the overlapping parts to obtain multiple merged homologous genome fragments; and
filtering the information of the multiple merged homologous genome fragments out of the second alignment result to obtain the information of the specific sequence of the target genome.
4. The data processing method according to claim 3, characterized in that, before the multiple homologous genome fragments are extracted, the data processing method further comprises:
judging whether the length of multiple genome fragments is greater than or equal to a preset length;
if the length of the multiple genome fragments is judged to be greater than or equal to the preset length, judging whether the similarity of the multiple genome fragments is greater than or equal to a preset similarity;
if the similarity of the multiple genome fragments is judged to be greater than or equal to the preset similarity, judging whether the alignment rate of the multiple genome fragments is greater than or equal to a preset alignment rate; and
if the alignment rate of the multiple genome fragments is judged to be greater than or equal to the preset alignment rate, taking the information of the multiple genome fragments as the information of the multiple homologous genome fragments.
5. A data processing device for genomes, characterized by comprising:
a first alignment unit, configured to perform a first alignment of information of a target genome against information of a reference genome to obtain a first alignment result;
a first acquisition unit, configured to obtain information of an unaligned genome fragment from the first alignment result;
a second alignment unit, configured to perform a second alignment of the information of the unaligned genome fragment against the information of the reference genome to obtain a second alignment result; and
a second acquisition unit, configured to obtain information of a specific sequence of the target genome from the second alignment result;
wherein the first alignment result comprises multiple homologous genome fragments, the multiple homologous genome fragments being multiple aligned genome fragments, and the first acquisition unit comprises:
a second filtering module, configured to filter the multiple homologous genome fragments out of the first alignment result to obtain multiple unaligned genome sub-fragments;
a first sorting module, configured to sort the multiple unaligned genome sub-fragments by their positions in the target genome to obtain an ordered sequence of the multiple unaligned genome sub-fragments;
a first merging module, configured to merge any two genome sub-fragments that are positionally adjacent in the ordered sequence and overlap, to obtain an ordered sequence comprising multiple merged unaligned genome sub-fragments; and
a connecting module, configured to connect all the genome sub-fragments in the ordered sequence comprising the multiple merged unaligned genome sub-fragments to obtain the information of the unaligned genome fragment.
6. The data processing device according to claim 5, characterized in that the second alignment unit comprises:
a first detection module, configured to detect whether repeated sequence information exists in the information of the unaligned genome fragment;
an annotation module, configured to annotate the repeated sequence information to obtain annotated information if repeated sequence information is detected in the information of the unaligned genome fragment;
a first filtering module, configured to filter the annotated information out of the information of the unaligned genome fragment to obtain filtered information; and
an alignment module, configured to align the filtered information against the information of the reference genome to obtain the second alignment result.
7. The data processing device according to claim 5, characterized in that the second alignment result comprises multiple homologous genome fragments, and the second acquisition unit comprises:
an extraction module, configured to extract the multiple homologous genome fragments;
a second sorting module, configured to sort the multiple homologous genome fragments by their positions in the target genome to obtain an ordered sequence of the multiple homologous genome fragments;
a second detection module, configured to detect whether any two positionally adjacent homologous genome fragments in the ordered sequence overlap;
a second merging module, configured to merge the overlapping parts to obtain multiple merged homologous genome fragments if any two positionally adjacent homologous genome fragments in the ordered sequence are detected to overlap; and
a third filtering module, configured to filter the information of the multiple merged homologous genome fragments out of the second alignment result to obtain the information of the specific sequence of the target genome.
8. The data processing device according to claim 7, characterized by further comprising:
a first judging module, configured to judge, before the multiple homologous genome fragments are extracted, whether the length of multiple genome fragments is greater than or equal to a preset length;
a second judging module, configured to judge whether the similarity of the multiple genome fragments is greater than or equal to a preset similarity if the length of the multiple genome fragments is judged to be greater than or equal to the preset length;
a third judging module, configured to judge whether the alignment rate of the multiple genome fragments is greater than or equal to a preset alignment rate if the similarity of the multiple genome fragments is judged to be greater than or equal to the preset similarity; and
a determining module, configured to determine the information of the multiple genome fragments to be the information of the multiple homologous genome fragments if the alignment rate of the multiple genome fragments is judged to be greater than or equal to the preset alignment rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410064832.XA CN103810402B (en) | 2014-02-25 | 2014-02-25 | Data processing method and device for genomes |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103810402A CN103810402A (en) | 2014-05-21 |
CN103810402B true CN103810402B (en) | 2017-01-18 |
Family
ID=50707162
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102206704A (en) * | 2011-03-02 | 2011-10-05 | 深圳华大基因科技有限公司 | Method and device for assembling genome sequence |
Also Published As
Publication number | Publication date |
---|---|
CN103810402A (en) | 2014-05-21 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN103810402B (en) | Data processing method and device for genomes | |
CN104298892B (en) | Detection device and method for gene fusion | |
CN104302781B (en) | Method and device for detecting chromosomal structural abnormalities | |
CN111081317B (en) | Gene spectrum-based breast cancer lymph node metastasis prediction method and prediction system | |
Simillion et al. | Building genomic profiles for uncovering segmental homology in the twilight zone | |
CN111584006B (en) | Circular RNA identification method based on machine learning strategy | |
CN109448787B (en) | Protein subnuclear localization method for feature extraction and fusion based on improved PSSM | |
CN108197434A (en) | Method for removing human-derived gene sequences from metagenomic sequencing data | |
CN108256293A (en) | Statistical method and system for disease-associated gene combinations | |
CN109086772A (en) | Method and system for recognizing image verification codes with distorted, adhered characters | |
CN108004330A (en) | Molecular marker for identifying maple leaf duck and its application | |
CN109448842B (en) | Method, apparatus and electronic device for determining human intestinal dysbiosis | |
CN110020726A (en) | Method and system for sorting assembled sequences | |
CN105046105B (en) | Chromosome-spanning haplotype map and its construction method | |
CN110188592B (en) | Urine formed component cell image classification model construction method and classification method | |
CN108018359A (en) | Molecular marker for identifying cherry valley duck and its application | |
CN103348350B (en) | Nucleic acid information processing device and processing method thereof | |
CN110970091A (en) | Label quality control method and device | |
Wu et al. | DeepRetention: a deep learning approach for intron retention detection | |
CN106282352A (en) | Target region capture probe and design method thereof | |
CN116144794B (en) | Bovine 12K SV liquid phase chip and design method and application thereof | |
CN113096737A (en) | Method and system for automatically analyzing pathogen types | |
CN107400723A (en) | Identification method and use of seed plant species | |
CN111684113B (en) | Rice green gene chip and application | |
CN107885972A (en) | Fusion detection method based on single-ended sequencing and its application | |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C56 | Change in the name or address of the patentee | ||
CP01 | Change in the name or title of a patent holder | Address after: 100083 Beijing, Haidian District, Qing Qing Road, No. 38, block B, Jin code building, 712. Patentee after: Beijing Polytron Technologies Inc. Address before: 100083 Beijing, Haidian District, Qing Qing Road, No. 38, block B, Jin code building, 712. Patentee before: Nuo Hezhi source, Beijing bioinformation Science and Technology Ltd. | |