CN110379461A - A kind of gene data comparison method, device, equipment and medium - Google Patents
- Publication number
- CN110379461A (application number CN201910576190.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- gene
- gene data
- compared
- standard
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a gene data comparison method, device, equipment, and medium. The method comprises: obtaining gene data to be compared and dividing it into multiple gene fragment data; executing in parallel, by an FPGA, matching operations between each gene fragment data and the standard gene data in a standard gene library; and obtaining the result gene data in the standard gene library that has the highest comprehensive matching degree with the gene fragment data, for comparison and analysis against the gene data to be compared. Compared with prior-art approaches that implement gene data comparison on a CPU or GPU, the method helps ensure the overall computational efficiency of the gene data comparison process. The invention also provides a gene data comparison device, equipment, and medium, which have the same beneficial effects.
Description
Technical field
The present invention relates to the field of cloud computing, and in particular to a gene data comparison method, device, equipment, and medium.
Background
A gene is the complete nucleotide sequence required to produce a polypeptide chain or a functional RNA. Genes underpin the basic structure and function of life.
In biological research, researchers maintain a corresponding standard gene library for each kind of organism. When studying the evolution or variation of genes, a newly obtained gene to be compared from a species is compared against the corresponding target gene in the standard gene library of that species, in order to learn the gene variation information of the species.
At present, gene data comparison is limited to implementations on a CPU or GPU. However, the amount of computation involved in gene data comparison is enormous, and as gene comparison technology is applied more widely, the scale of gene data keeps growing. The computing capability of CPUs and GPUs alone can therefore hardly ensure the overall computational efficiency of the gene data comparison process.
It can be seen that providing a gene data comparison method that helps ensure the overall computational efficiency of the gene data comparison process is a problem to be solved by those skilled in the art.
Summary of the invention
The object of the present invention is to provide a gene data comparison method, device, equipment, and medium that help ensure the overall computational efficiency of the gene data comparison process.
To solve the above technical problem, the present invention provides a gene data comparison method, comprising:
obtaining gene data to be compared, and dividing the gene data to be compared into multiple gene fragment data;
executing in parallel, by an FPGA, matching operations between each gene fragment data and the standard gene data in a standard gene library, and obtaining the result gene data in the standard gene library that has the highest comprehensive matching degree with the gene fragment data, for comparison and analysis against the gene data to be compared.
Preferably, executing in parallel, by the FPGA, the matching operations between each gene fragment data and the standard gene data in the standard gene library, and obtaining the result gene data in the standard gene library that has the highest comprehensive matching degree with the gene fragment data, for comparison and analysis against the gene data to be compared, comprises:
executing in parallel, by the FPGA, the matching operations between each gene fragment data and the standard gene data in the standard gene library;
when adjacent target gene fragment data all match target standard gene data in the standard gene library, and the matched regions in the target standard gene data are contiguous, obtaining the current continuous matching length;
when judgment parameters that include at least the current continuous matching length satisfy a preset condition, marking the target standard gene data as the result gene data, for comparison and analysis against the gene data to be compared.
Preferably, marking the target standard gene data as the result gene data when the judgment parameters that include at least the current continuous matching length satisfy the preset condition comprises:
when the current continuous matching length is the longest compared with the other continuous matching lengths, marking the target standard gene data as the result gene data, for comparison and analysis against the gene data to be compared.
Preferably, before marking the target standard gene data as the result gene data when the judgment parameters that include at least the current continuous matching length satisfy the preset condition, the method further comprises:
joining the adjacent target gene fragment data into a target gene fragment sequence;
Correspondingly, marking the target standard gene data as the result gene data when the judgment parameters that include at least the current continuous matching length satisfy the preset condition comprises:
when the alignment scores of the gene fragment data against the target standard gene data are all at the maximum value, marking as the result gene data the standard gene data whose continuous matching length is the longer of the current continuous matching length and the other continuous matching lengths, for comparison and analysis against the gene data to be compared;
when the current continuous matching length and the other continuous matching lengths are equally the longest, and the hash value of the target gene fragment sequence is greater than the hash values of the other target gene fragment sequences, marking the target standard gene data as the result gene data, for comparison and analysis against the gene data to be compared.
Preferably, obtaining the gene data to be compared and dividing it into multiple gene fragment data comprises:
obtaining the gene data to be compared, and dividing the gene data to be compared into multiple gene fragment data by means of K-mer.
In addition, the present invention also provides a gene data comparison device, comprising:
an acquisition and segmentation module, configured to obtain gene data to be compared and divide the gene data to be compared into multiple gene fragment data;
a computing module, configured to execute in parallel, by an FPGA, matching operations between each gene fragment data and the standard gene data in a standard gene library, and to obtain the result gene data in the standard gene library that has the highest comprehensive matching degree with the gene fragment data, for comparison and analysis against the gene data to be compared.
Preferably, the computing module comprises:
a matching operation module, configured to execute in parallel, by the FPGA, the matching operations between each gene fragment data and the standard gene data in the standard gene library;
a matching length obtaining module, configured to obtain the current continuous matching length when adjacent target gene fragment data all match target standard gene data in the standard gene library and the matched regions in the target standard gene data are contiguous;
a condition analysis module, configured to mark the target standard gene data as the result gene data when judgment parameters that include at least the current continuous matching length satisfy a preset condition, for comparison and analysis against the gene data to be compared.
In addition, the present invention also provides gene data comparison equipment, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above gene data comparison method when executing the computer program.
In addition, the present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above gene data comparison method are implemented.
The gene data comparison method provided by the present invention first obtains the gene data to be compared and divides it into multiple gene fragment data. It then uses an FPGA to execute the matching operations between each gene fragment data and the standard gene data in the standard gene library, and takes the standard gene data in the library that has the highest comprehensive matching degree with the gene fragment data as the result gene data, for comparison and analysis against the gene data to be compared. By dividing the gene data to be compared into multiple gene fragment data, the method lets an FPGA, which has high concurrent computing capability, execute the matching operations between the gene fragment data and the standard gene data in the standard gene library in parallel. Because an FPGA offers higher concurrent computing capability than a CPU or GPU, the method helps ensure the overall computational efficiency of the gene data comparison process compared with prior-art implementations on a CPU or GPU. In addition, the present invention also provides a gene data comparison device, equipment, and medium, which have the same beneficial effects.
Brief description of the drawings
To describe the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a gene data comparison method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a gene data comparison method provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a gene data comparison device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
At present, gene data comparison is limited to implementations on a CPU or GPU. However, the amount of computation involved in gene data comparison is enormous, and as gene comparison technology is applied more widely, the scale of gene data keeps growing. The computing capability of CPUs and GPUs alone can therefore hardly ensure the overall computational efficiency of the gene data comparison process.
The core of the present invention is to provide a gene data comparison method that helps ensure the overall computational efficiency of the gene data comparison process. Another core of the present invention is to provide a gene data comparison device, equipment, and medium.
To help those skilled in the art better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of a gene data comparison method provided by an embodiment of the present invention. Referring to Fig. 1, the specific steps of the gene data comparison method include:
Step S10: obtain gene data to be compared, and divide the gene data to be compared into multiple gene fragment data.
It should be noted that the gene data to be compared in this step is the gene data whose specific sequence content is currently to be analyzed. Because the gene data to be compared takes the form of a sequence, comparing genes is in practice a consistency comparison of the gene elements at each position of the sequence. Therefore, so that the comparison can proceed in parallel on segments, this step divides the gene data to be compared into multiple gene fragment data, where each gene fragment data is a segment of a certain length within the gene data to be compared. The user may set the total number of gene fragment data produced by the split according to specific needs, that is, set the segment length of the gene fragment data according to specific needs.
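The segmentation in step S10 can be pictured with a short Python sketch. It assumes the simplest reading of the text: consecutive, non-overlapping fragments of a user-chosen length. The function name `split_into_fragments` and the sample sequence are illustrative and do not come from the patent.

```python
def split_into_fragments(sequence: str, fragment_length: int) -> list[str]:
    """Cut a sequence into consecutive, non-overlapping fragments.

    Assumption: fragments do not overlap and the last one may be shorter.
    The source only says the fragment length is set according to need.
    """
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    return [
        sequence[start:start + fragment_length]
        for start in range(0, len(sequence), fragment_length)
    ]


# Hypothetical gene data to be compared.
fragments = split_into_fragments("ACGTACGTAC", 4)
print(fragments)  # ['ACGT', 'ACGT', 'AC']
```

Choosing a smaller `fragment_length` yields more fragments and therefore more independent matching tasks, which is the property the later parallel step relies on.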
Step S11: execute in parallel, by an FPGA, the matching operations between each gene fragment data and the standard gene data in the standard gene library, and obtain the result gene data in the standard gene library that has the highest comprehensive matching degree with the gene fragment data, for comparison and analysis against the gene data to be compared.
It should be noted that the focus of this step is using the FPGA to carry out, in parallel, the matching operations between each gene fragment data and the standard gene data in the standard gene library. The purpose of the matching operation is to find the standard gene data in the standard gene library that is identical or similar to the gene data to be compared, namely the result gene data in this step.
During the matching operation, the FPGA matches each gene fragment data against each standard gene data in the standard gene library. Because a gene fragment data can characterize only part of the gene information of the complete gene, and different standard genes in the standard gene library may share identical partial gene information, the main purpose of the matching process in this step is to obtain the standard gene in the standard gene library that matches the content of the largest number of gene fragment data. In other words, the more of the gene fragment data split from the gene data to be compared a standard gene matches, the higher its comprehensive matching degree in the sense of this step, and the closer it is to the gene data to be compared. After the result gene data with the highest comprehensive matching degree is obtained from the standard gene library, it is further used for comparison and analysis against the gene data to be compared. The details of the comparison and analysis are not the focus of the present application and are therefore not repeated here.
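As a rough illustration of the comprehensive matching degree described above, the following sketch counts how many fragments occur in each standard gene and picks the gene with the highest count. Exact substring matching, the function names, and the sample library are assumptions made for the example; the patent does not specify the matching operation at this level of detail.

```python
def comprehensive_matching_degree(fragments: list[str], standard_gene: str) -> int:
    """Number of fragments that occur somewhere in the standard gene.

    Assumption: a fragment "matches" when it is an exact substring.
    """
    return sum(1 for fragment in fragments if fragment in standard_gene)


def best_standard_gene(fragments: list[str], library: dict[str, str]) -> str:
    """Name of the standard gene with the highest comprehensive matching degree."""
    return max(
        library,
        key=lambda name: comprehensive_matching_degree(fragments, library[name]),
    )


# Hypothetical fragments and standard gene library.
fragments = ["ACGT", "TTGA", "CCAT"]
library = {"geneA": "GGACGTTTGACC", "geneB": "ACGTAAAA"}
print(best_standard_gene(fragments, library))  # geneA
```

Here `geneA` contains two of the three fragments and `geneB` only one, so `geneA` would be treated as the result gene data under this simplified scoring.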
The gene data comparison method provided by the present invention first obtains the gene data to be compared and divides it into multiple gene fragment data. It then uses an FPGA to execute the matching operations between each gene fragment data and the standard gene data in the standard gene library, and takes the standard gene data in the library that has the highest comprehensive matching degree with the gene fragment data as the result gene data, for comparison and analysis against the gene data to be compared. By dividing the gene data to be compared into multiple gene fragment data, the method lets an FPGA, which has high concurrent computing capability, execute the matching operations between the gene fragment data and the standard gene data in the standard gene library in parallel. Because an FPGA offers higher concurrent computing capability than a CPU or GPU, the method helps ensure the overall computational efficiency of the gene data comparison process compared with prior-art implementations on a CPU or GPU.
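The reason segmentation enables parallel execution is that each fragment can be matched independently of the others. The sketch below is only a software analogy using Python's standard `concurrent.futures` thread pool; it is not FPGA code and says nothing about how the patent's hardware is organized. The function names and sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def match_fragment(fragment: str, library: dict[str, str]) -> set[str]:
    """Names of the standard genes that contain the fragment (exact match assumed)."""
    return {name for name, sequence in library.items() if fragment in sequence}


def match_all_parallel(
    fragments: list[str], library: dict[str, str], workers: int = 4
) -> list[set[str]]:
    """Match every fragment independently; results keep the order of the fragments."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda fragment: match_fragment(fragment, library), fragments))


# Hypothetical fragments and standard gene library.
library = {"geneA": "GGACGTTTGACC", "geneB": "ACGTAAAA"}
results = match_all_parallel(["ACGT", "TTGA", "CCAT"], library)
print(results)
```

Each task reads the shared library but writes nothing shared, which is the independence property that would let a hardware design run many matching units side by side.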
Fig. 2 is a flowchart of a gene data comparison method provided by an embodiment of the present invention. Referring to Fig. 2, the specific steps of the gene data comparison method include:
Step S20: obtain gene data to be compared, and divide the gene data to be compared into multiple gene fragment data.
Step S21: execute in parallel, by the FPGA, the matching operations between each gene fragment data and the standard gene data in the standard gene library.
Step S22: when adjacent target gene fragment data all match target standard gene data in the standard gene library, and the matched regions in the target standard gene data are contiguous, obtain the current continuous matching length.
It should be noted that this embodiment takes into account that, when each gene fragment data is matched against the standard gene data in the standard gene library, gene fragment data that are adjacent in the gene data to be compared may match a continuous region of target standard gene data in the standard gene library. That is, the regions of the target standard gene matched by the adjacent gene fragments are contiguous. In this case, the method obtains the current matching length of the adjacent target gene fragments that continuously match the target standard gene, and uses this length to characterize the comprehensive matching degree between the gene fragment data and each standard gene in the standard gene library.
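One way to compute a continuous matching length as described in step S22 is sketched below. It assumes exact substring matching and non-overlapping fragments given in their original order, and it measures the length in bases. These choices, along with the names `find_all` and `longest_continuous_match`, are assumptions for illustration rather than the patent's procedure.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """All start positions at which pattern occurs in text."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def longest_continuous_match(fragments: list[str], standard_gene: str) -> int:
    """Longest run, in bases, of adjacent fragments matching contiguous regions."""
    best = 0
    runs: dict[int, int] = {}  # end position in standard_gene -> length of run ending there
    for fragment in fragments:
        new_runs: dict[int, int] = {}
        for position in find_all(standard_gene, fragment):
            # Extend a run only if the previous fragment ended exactly here.
            length = runs.get(position, 0) + len(fragment)
            end = position + len(fragment)
            new_runs[end] = max(new_runs.get(end, 0), length)
            best = max(best, length)
        runs = new_runs  # a fragment with no match breaks every run
    return best


# Hypothetical adjacent fragments and two standard genes.
fragments = ["ACGT", "GGCA", "AAAA"]
print(longest_continuous_match(fragments, "TTACGTGGCATT"))  # 8
print(longest_continuous_match(fragments, "ACGTTTGGCA"))    # 4
```

In the first gene the two fragments match back to back, giving a length of 8. In the second gene both fragments match but are separated by two bases, so the longest continuous match is a single fragment of length 4.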
Step S23: when judgment parameters that include at least the current continuous matching length satisfy a preset condition, mark the target standard gene data as the result gene data, for comparison and analysis against the gene data to be compared.
This step uses the current continuous matching length as one of the judgment parameters for assessing the comprehensive matching degree between the gene data to be compared and the standard genes in the standard gene library. When the judgment parameters that include at least the current continuous matching length satisfy the preset condition, the target standard gene data is marked as the result gene data.
This embodiment obtains the current matching length of the adjacent target gene fragments that continuously match the target standard gene, and uses this length to characterize the comprehensive matching degree between the gene fragment data and each standard gene in the standard gene library. The current continuous matching length thus serves as a judgment parameter for the comprehensive matching degree between the gene data to be compared and the standard genes in the library, which improves the accuracy with which the result gene data is obtained and in turn helps ensure the overall accuracy of the gene data comparison process.
On the basis of the above embodiment, as a preferred implementation, marking the target standard gene data as the result gene data when the judgment parameters that include at least the current continuous matching length satisfy the preset condition comprises:
when the current continuous matching length is the longest compared with the other continuous matching lengths, marking the target standard gene data as the result gene data, for comparison and analysis against the gene data to be compared.
It should be noted that in this implementation the judgment parameters include only the current continuous matching length. When the current continuous matching length is the longest compared with the other continuous matching lengths, the target standard gene data corresponding to it is more consistent with the gene data to be compared. The target standard gene data is therefore marked as the result gene data and used for comparison and analysis against the gene data to be compared. Because the continuous matching length directly reflects how close the gene data to be compared is to the standard gene data, this implementation helps ensure the accuracy with which the result gene data is obtained, and in turn the overall accuracy of the gene data comparison process.
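With the continuous matching length as the only judgment parameter, the selection reduces to taking a maximum. The sketch below assumes the lengths have already been computed per standard gene; the dictionary contents and the function name `mark_result_gene` are hypothetical.

```python
def mark_result_gene(continuous_lengths: dict[str, int]) -> str:
    """Return the standard gene whose continuous matching length is the longest."""
    if not continuous_lengths:
        raise ValueError("no candidate standard genes")
    return max(continuous_lengths, key=continuous_lengths.get)


# Hypothetical continuous matching lengths, in bases, per standard gene.
lengths = {"geneA": 8, "geneB": 4, "geneC": 0}
print(mark_result_gene(lengths))  # geneA
```

When two genes share the longest length, `max` simply returns whichever appears first in the dictionary, so this sketch makes no attempt to break ties.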
On the basis of the above embodiment, as a preferred implementation, before marking the target standard gene data as the result gene data when the judgment parameters that include at least the current continuous matching length satisfy the preset condition, the method further comprises:
joining the adjacent target gene fragment data into a target gene fragment sequence;
Correspondingly, marking the target standard gene data as the result gene data when the judgment parameters that include at least the current continuous matching length satisfy the preset condition comprises:
when the alignment scores of the gene fragment data against the target standard gene data are all at the maximum value, and the current continuous matching length is longer than the other continuous matching lengths, marking the target standard gene data as the result gene data, for comparison and analysis against the gene data to be compared;
when the current continuous matching length and the other continuous matching lengths are equally the longest, and the hash value of the target gene fragment sequence is greater than the hash values of the other target gene fragment sequences, marking the target standard gene data as the result gene data, for comparison and analysis against the gene data to be compared.
It should be noted that this implementation takes into account two situations: the alignment scores of the gene fragment data against the target standard gene data may all be at the maximum value, and the current continuous matching length may be equal to the other continuous matching lengths with all of them being the longest. In the latter situation, in order to obtain more accurately the target standard gene data that best matches the gene data to be compared, this implementation joins the adjacent target gene fragment data into a target gene fragment sequence when the regions they match in the target standard gene data are contiguous. That is, the adjacent target gene fragments are spliced into a longer sequence fragment in their order of adjacency. Then, when the current continuous matching length and the other continuous matching lengths are equally the longest, the hash value of the target gene fragment sequence is further compared with the hash values of the other target gene fragment sequences. Because the hash value is the result of extracting the key content of the data, a larger hash value is taken to mean that the data has more key content, that is, the data is more important. Therefore, when the hash value of the target gene fragment sequence is greater than the hash values of the other target gene fragment sequences, the similarity between the gene data to be compared and the target standard gene data is considered higher, and the target standard gene data is marked as the result gene data, for comparison and analysis against the gene data to be compared. This implementation further ensures the accuracy with which the result gene data is obtained, and in turn further ensures the overall accuracy of the gene data comparison process.
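The tie-breaking rule above can be sketched as a comparison on the pair (continuous matching length, hash value of the joined fragment sequence). The patent does not name a hash function, so the sketch assumes SHA-256 read as an integer purely to have something deterministic to compare. The alignment-score condition is not modelled, and all names and sample candidates are hypothetical.

```python
import hashlib


def sequence_hash(sequence: str) -> int:
    """Assumed hash: SHA-256 digest read as an integer (the source names no hash)."""
    return int.from_bytes(hashlib.sha256(sequence.encode("ascii")).digest(), "big")


def pick_result(candidates: list[tuple[str, int, list[str]]]) -> str:
    """Each candidate is (gene name, continuous matching length, adjacent matched fragments)."""

    def key(candidate: tuple[str, int, list[str]]) -> tuple[int, int]:
        _, length, fragments = candidate
        joined = "".join(fragments)  # the target gene fragment sequence
        return (length, sequence_hash(joined))

    # Longest continuous match wins; equal lengths fall back to the larger hash.
    return max(candidates, key=key)[0]


# Hypothetical candidates: geneA and geneB tie on length, geneC is shorter.
candidates = [
    ("geneA", 8, ["ACGT", "GGCA"]),
    ("geneB", 8, ["TTGA", "CCAT"]),
    ("geneC", 4, ["ACGT"]),
]
print(pick_result(candidates))
```

Comparing tuples keeps the two rules in one place: the hash is consulted only when the lengths are equal, which matches the order in which the text applies the conditions.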
On the basis of the above series of implementations, as a preferred implementation, obtaining the gene data to be compared and dividing it into multiple gene fragment data comprises:
obtaining the gene data to be compared, and dividing the gene data to be compared into multiple gene fragment data by means of K-mer.
In essence, the K-mer algorithm cuts a character string according to a certain length and a certain interval. Dividing the gene data to be compared into multiple gene fragment data by means of K-mer can improve the overall utilization of the gene data to be compared, which makes the comparison and analysis of the gene data to be compared more comprehensive and further ensures the overall accuracy of the gene data comparison process.
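The description of K-mer as cutting a string by a certain length and a certain interval maps directly onto a sliding window. In the sketch below, `k` is the window length and `step` is the interval; both parameter names and the sample sequence are illustrative.

```python
def kmers(sequence: str, k: int, step: int = 1) -> list[str]:
    """Cut a sequence into substrings of length k, starting every `step` positions."""
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, step)]


print(kmers("ACGTAC", 3))          # ['ACG', 'CGT', 'GTA', 'TAC']
print(kmers("ACGTAC", 3, step=3))  # ['ACG', 'TAC']
```

With `step=1` every base takes part in up to `k` overlapping fragments, which is one way to read the claim that K-mer improves the overall utilization of the gene data to be compared.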
Fig. 3 is a structural diagram of a gene data comparison device provided by an embodiment of the present invention. The gene data comparison device provided by the embodiment of the present invention comprises:
an acquisition and segmentation module 10, configured to obtain gene data to be compared and divide the gene data to be compared into multiple gene fragment data;
a computing module 11, configured to execute in parallel, by an FPGA, matching operations between each gene fragment data and the standard gene data in a standard gene library, and to obtain the result gene data in the standard gene library that has the highest comprehensive matching degree with the gene fragment data, for comparison and analysis against the gene data to be compared.
The gene data comparison device provided by the present invention first obtains the gene data to be compared and divides it into multiple gene fragment data. It then uses an FPGA to execute the matching operations between each gene fragment data and the standard gene data in the standard gene library, and takes the standard gene data in the library that has the highest comprehensive matching degree with the gene fragment data as the result gene data, for comparison and analysis against the gene data to be compared. By dividing the gene data to be compared into multiple gene fragment data, the device lets an FPGA, which has high concurrent computing capability, execute the matching operations between the gene fragment data and the standard gene data in the standard gene library in parallel. Because an FPGA offers higher concurrent computing capability than a CPU or GPU, the device helps ensure the overall computational efficiency of the gene data comparison process compared with prior-art implementations on a CPU or GPU.
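The two-module structure of the device can be mirrored in software as an object composed of a segmentation function and a scoring function. This is a structural sketch only: the class name `GeneComparisonDevice`, its fields, and the lambda implementations are assumptions, and the scoring runs on the CPU rather than on an FPGA.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GeneComparisonDevice:
    # Stands in for the acquisition and segmentation module.
    segment: Callable[[str], list[str]]
    # Stands in for the computing module's matching operation.
    score: Callable[[list[str], str], int]

    def result_gene(self, query: str, library: dict[str, str]) -> str:
        """Name of the standard gene with the highest score for the query."""
        fragments = self.segment(query)
        return max(library, key=lambda name: self.score(fragments, library[name]))


# Hypothetical wiring: fixed-length fragments and exact-match counting.
device = GeneComparisonDevice(
    segment=lambda s: [s[i:i + 4] for i in range(0, len(s), 4)],
    score=lambda fragments, gene: sum(f in gene for f in fragments),
)
library = {"geneA": "GGACGTTTGACC", "geneB": "ACGTAAAA"}
print(device.result_gene("ACGTTTGA", library))  # geneA
```

Keeping the two modules as separate callables makes it possible to swap the segmentation strategy or the scoring rule without touching the rest of the pipeline.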
On the basis of the above gene data comparison device, as a preferred implementation, the computing module comprises:
a matching operation module, configured to execute in parallel, by the FPGA, the matching operations between each gene fragment data and the standard gene data in the standard gene library;
a matching length obtaining module, configured to obtain the current continuous matching length when adjacent target gene fragment data all match target standard gene data in the standard gene library and the matched regions in the target standard gene data are contiguous;
a condition analysis module, configured to mark the target standard gene data as the result gene data when judgment parameters that include at least the current continuous matching length satisfy a preset condition, for comparison and analysis against the gene data to be compared.
The present invention also provides gene data comparison equipment, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above gene data comparison method when executing the computer program.
The gene data comparison equipment provided by the present invention first obtains the gene data to be compared and divides it into multiple gene fragment data. It then uses an FPGA to execute the matching operations between each gene fragment data and the standard gene data in the standard gene library, and takes the standard gene data in the library that has the highest comprehensive matching degree with the gene fragment data as the result gene data, for comparison and analysis against the gene data to be compared. By dividing the gene data to be compared into multiple gene fragment data, the equipment lets an FPGA, which has high concurrent computing capability, execute the matching operations between the gene fragment data and the standard gene data in the standard gene library in parallel. Because an FPGA offers higher concurrent computing capability than a CPU or GPU, the equipment helps ensure the overall computational efficiency of the gene data comparison process compared with prior-art implementations on a CPU or GPU.
The present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above gene data comparison method are implemented.
With the computer-readable storage medium provided by the present invention, the gene data to be compared is first obtained and divided into multiple gene fragment data. An FPGA is then used to execute the matching operations between each gene fragment data and the standard gene data in the standard gene library, and the standard gene data in the library that has the highest comprehensive matching degree with the gene fragment data is taken as the result gene data, for comparison and analysis against the gene data to be compared. By dividing the gene data to be compared into multiple gene fragment data, the computer-readable storage medium lets an FPGA, which has high concurrent computing capability, execute the matching operations between the gene fragment data and the standard gene data in the standard gene library in parallel. Because an FPGA offers higher concurrent computing capability than a CPU or GPU, the computer-readable storage medium helps ensure the overall computational efficiency of the gene data comparison process compared with prior-art implementations on a CPU or GPU.
The gene data comparison method, device, equipment and medium provided by the present invention have been described in detail above. The embodiments in the specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and reference may be made to the description of the method for the relevant details. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the scope of protection of the claims of the present invention.
It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or equipment that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or equipment. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or equipment that includes the element.
Claims (9)
1. A gene data comparison method, characterized by comprising:
obtaining gene data to be compared, and dividing the gene data to be compared into multiple gene fragment data;
executing in parallel, by an FPGA, matching operations between each gene fragment data and standard gene data in a standard gene library, and obtaining the result gene data in the standard gene library that has the highest comprehensive matching degree with the gene fragment data, for comparison with the gene data to be compared.
2. The gene data comparison method according to claim 1, characterized in that the executing in parallel, by the FPGA, matching operations between each gene fragment data and the standard gene data in the standard gene library, and obtaining the result gene data in the standard gene library that has the highest comprehensive matching degree with the gene fragment data, for comparison with the gene data to be compared, comprises:
executing in parallel, by the FPGA, the matching operations between each gene fragment data and the standard gene data in the standard gene library;
when adjacent target gene fragment data match target standard gene data in the standard gene library and the matched regions in the target standard gene data are contiguous, obtaining a current continuous matching length;
when judgment parameters including at least the current continuous matching length satisfy a preset condition, marking the target standard gene data as the result gene data, for performing the comparison analysis with the gene data to be compared.
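The continuous matching length in claim 2 can be illustrated with a small sketch. It is an illustration under stated assumptions rather than the claimed implementation: fragments are assumed to be consecutive and non-overlapping, matching is assumed to be exact, and a new run is assumed to start at the first occurrence of a fragment in the reference. The function name `longest_contiguous_match` is hypothetical.

```python
def longest_contiguous_match(fragments: list[str], reference: str) -> int:
    """Length in bases of the longest run of adjacent fragments that
    match adjacent (contiguous) regions of the reference."""
    best = 0
    current = 0
    expected = None  # reference position where the next fragment must start
    for frag in fragments:
        if expected is not None and reference.startswith(frag, expected):
            # The adjacent fragment continues the matched region: extend the run.
            current += len(frag)
            expected += len(frag)
        else:
            # Simplification: restart the run at the first occurrence, if any.
            pos = reference.find(frag)
            if pos >= 0:
                current = len(frag)
                expected = pos + len(frag)
            else:
                current = 0
                expected = None
        best = max(best, current)
    return best


print(longest_contiguous_match(["ACGT", "TTGA", "CCCC"], "GGACGTTTGAGG"))  # prints 8
```

In the usage line, the first two fragments map to positions 2 to 9 of the reference without a gap, so the run covers 8 bases; the third fragment does not match and ends the run.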
3. The gene data comparison method according to claim 2, characterized in that the marking the target standard gene data as the result gene data, for performing the comparison analysis with the gene data to be compared, when judgment parameters including at least the current continuous matching length satisfy the preset condition, comprises:
when the current continuous matching length is the longest compared with other continuous matching lengths, marking the target standard gene data as the result gene data, for performing the comparison analysis with the gene data to be compared.
4. The gene data comparison method according to claim 2, characterized in that, before the marking the target standard gene data as the result gene data, for performing the comparison analysis with the gene data to be compared, when judgment parameters including at least the current continuous matching length satisfy the preset condition, the method comprises:
connecting the adjacent target gene fragment data into a target gene fragment sequence;
correspondingly, the marking the target standard gene data as the result gene data, for performing the comparison analysis with the gene data to be compared, when judgment parameters including at least the current continuous matching length satisfy the preset condition, comprises:
when the alignment score between each gene fragment data and the target standard gene data is the maximum value and the current continuous matching length is longer than the other continuous matching lengths, marking the target standard gene data as the result gene data, for performing the comparison analysis with the gene data to be compared;
when the current continuous matching length and other continuous matching lengths are equally the longest, and the hash value of the target gene fragment sequence is greater than the hash values of other target gene fragment sequences, marking the target standard gene data as the result gene data, for performing the comparison analysis with the gene data to be compared.
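One possible reading of the selection rule in claim 4 is sketched below. The claim does not name a hash function or say how the two conditions combine, so this sketch makes assumptions: candidates are first restricted to those with the maximum alignment score, the longest continuous matching length then wins, and a tie in length is broken by the larger CRC32 of the connected fragment sequence. The `Candidate` class, its field names and the use of `zlib.crc32` are all hypothetical choices for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Candidate:
    reference_id: str        # identifier of the target standard gene data
    score: int               # alignment score against this reference
    run_length: int          # current continuous matching length
    fragment_sequence: str   # adjacent fragments connected into one sequence


def select_result(candidates: list[Candidate]) -> Candidate:
    """Pick the result gene data under the assumed reading of the rule."""
    top_score = max(c.score for c in candidates)
    finalists = [c for c in candidates if c.score == top_score]
    # Longest run wins; equal runs fall back to the larger hash value.
    return max(
        finalists,
        key=lambda c: (c.run_length, zlib.crc32(c.fragment_sequence.encode("ascii"))),
    )


candidates = [
    Candidate("ref_a", score=10, run_length=8, fragment_sequence="ACGTTTGA"),
    Candidate("ref_b", score=10, run_length=4, fragment_sequence="ACGT"),
    Candidate("ref_c", score=7, run_length=12, fragment_sequence="GGGCCCAAATTT"),
]
print(select_result(candidates).reference_id)  # prints "ref_a"
```

A deterministic hash such as CRC32 is used here instead of Python's built-in `hash`, whose value for strings varies between interpreter runs and would make the tie-break unrepeatable.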
5. The gene data comparison method according to any one of claims 1 to 4, characterized in that the obtaining gene data to be compared, and dividing the gene data to be compared into multiple gene fragment data, comprises:
obtaining the gene data to be compared, and dividing the gene data to be compared into multiple gene fragment data by means of K-mers.
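K-mer splitting, as referenced in claim 5, can be sketched as follows. This assumes the common definition of a k-mer as every length-k substring taken with a stride of one base; the patent text shown here does not state the stride or the value of k, and the function name `kmers` is hypothetical.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return every overlapping substring of length k, in order of position."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmers("ACGTAC", 3))  # prints ['ACG', 'CGT', 'GTA', 'TAC']
```

A sequence of length n yields n - k + 1 fragments, and each fragment can be matched independently of the others.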
6. A gene data comparison device, characterized by comprising:
an obtaining and dividing module, configured to obtain gene data to be compared and divide the gene data to be compared into multiple gene fragment data;
a computing module, configured to execute in parallel, by an FPGA, matching operations between each gene fragment data and standard gene data in a standard gene library, and obtain the result gene data in the standard gene library that has the highest comprehensive matching degree with the gene fragment data, for comparison with the gene data to be compared.
7. The gene data comparison device according to claim 6, characterized in that the computing module comprises:
a matching operation module, configured to execute in parallel, by the FPGA, the matching operations between each gene fragment data and the standard gene data in the standard gene library;
a matching length obtaining module, configured to obtain a current continuous matching length when adjacent target gene fragment data match target standard gene data in the standard gene library and the matched regions in the target standard gene data are contiguous;
a condition analysis module, configured to mark the target standard gene data as the result gene data, for performing the comparison analysis with the gene data to be compared, when judgment parameters including at least the current continuous matching length satisfy a preset condition.
8. Gene data comparison equipment, characterized by comprising:
a memory, configured to store a computer program;
a processor, configured to implement, when executing the computer program, the steps of the gene data comparison method according to any one of claims 1 to 5.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the gene data comparison method according to any one of claims 1 to 5 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910576190.4A CN110379461A (en) | 2019-06-28 | 2019-06-28 | A kind of gene data comparison method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110379461A (en) | 2019-10-25 |
Family
ID=68251151
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910576190.4A Pending CN110379461A (en) | 2019-06-28 | 2019-06-28 | A kind of gene data comparison method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110379461A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112229989A (en) * | 2020-10-19 | 2021-01-15 | 广州吉源生物科技有限公司 | Biological sample identification equipment of GPU (graphics processing Unit) technology |
WO2021169387A1 (en) * | 2020-02-28 | 2021-09-02 | 苏州浪潮智能科技有限公司 | Sequence alignment method, apparatus and device, and medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985008A (en) * | 2018-06-29 | 2018-12-11 | 郑州云海信息技术有限公司 | A kind of method and Compare System of quick comparison gene data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191025 |