CN110379461A - A gene data comparison method, apparatus, device, and medium

Publication number
CN110379461A
Authority
CN
China
Prior art keywords
data
gene
gene data
compared
standard
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910576190.4A
Other languages
Chinese (zh)
Inventor
葛沅
赵健
尹云峰
崔星辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Wave Intelligent Technology Co Ltd
Original Assignee
Suzhou Wave Intelligent Technology Co Ltd
Application filed by Suzhou Wave Intelligent Technology Co Ltd
Priority to CN201910576190.4A
Publication of CN110379461A

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a gene data comparison method, apparatus, device, and medium. The method includes: obtaining gene data to be compared and dividing it into multiple pieces of gene fragment data; and executing in parallel, by an FPGA, the matching operations between each piece of gene fragment data and the standard gene data in a standard gene library, and obtaining from the standard gene library the result gene data that has the highest overall matching degree with the gene fragment data, for comparative analysis against the gene data to be compared. Compared with prior-art approaches that perform gene data comparison on a CPU or GPU, the method helps ensure the overall computational efficiency of the gene data comparison process. The invention also provides a gene data comparison apparatus, device, and medium, which have the same beneficial effects.

Description

A gene data comparison method, apparatus, device, and medium
Technical field
The present invention relates to the field of cloud computing, and in particular to a gene data comparison method, apparatus, device, and medium.
Background
A gene is the complete nucleotide sequence required to produce a polypeptide chain or a functional RNA. Genes underpin the basic structure and function of life.
In biological research, researchers maintain a standard gene library for each type of organism. When studying the evolution or variation of genes, a newly obtained gene to be compared from a species is compared with the corresponding target gene in that species' standard gene library, and the gene variation information of the species is derived from the result.
At present, gene data comparison is implemented only on CPUs or GPUs. However, the comparison process involves an enormous amount of computation, and as gene comparison technology is applied more widely, the scale of gene data keeps growing, so the computing power of CPUs and GPUs alone can hardly ensure the overall computational efficiency of the gene data comparison process.
Providing a gene data comparison method that helps ensure the overall computational efficiency of the gene data comparison process is therefore a problem to be solved by those skilled in the art.
Summary of the invention
The object of the present invention is to provide a gene data comparison method, apparatus, device, and medium that help ensure the overall computational efficiency of the gene data comparison process.
To solve the above technical problem, the present invention provides a gene data comparison method, comprising:
obtaining gene data to be compared, and dividing the gene data to be compared into multiple pieces of gene fragment data;
executing in parallel, by an FPGA, the matching operations between each piece of gene fragment data and the standard gene data in a standard gene library, and obtaining from the standard gene library the result gene data that has the highest overall matching degree with the gene fragment data, for comparative analysis against the gene data to be compared.
Preferably, the step of executing the matching operations in parallel by the FPGA and obtaining the result gene data comprises:
executing in parallel, by the FPGA, the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library;
when adjacent pieces of target gene fragment data all match the same target standard gene data in the standard gene library, and the matched regions in the target standard gene data are contiguous, obtaining the current continuous matching length;
when a judgment parameter that includes at least the current continuous matching length satisfies a preset condition, marking the target standard gene data as the result gene data, for comparative analysis against the gene data to be compared.
Preferably, the step of marking the target standard gene data as the result gene data when the judgment parameter satisfies the preset condition comprises:
when the current continuous matching length is the longest among all continuous matching lengths, marking the target standard gene data as the result gene data, for comparative analysis against the gene data to be compared.
Preferably, before the step of marking the target standard gene data as the result gene data when the judgment parameter satisfies the preset condition, the method further comprises:
concatenating the adjacent pieces of target gene fragment data into a target gene fragment sequence;
Correspondingly, the step of marking the target standard gene data as the result gene data when the judgment parameter satisfies the preset condition comprises:
when the alignment score between the gene fragment data and the target standard gene data is the maximum value, and the current continuous matching length is longer than the other continuous matching lengths, marking the target standard gene data as the result gene data, for comparative analysis against the gene data to be compared;
when the current continuous matching length equals another continuous matching length and both are the longest, and the hash value of the target gene fragment sequence is greater than the hash values of the other target gene fragment sequences, marking the target standard gene data as the result gene data, for comparative analysis against the gene data to be compared.
Preferably, the step of obtaining the gene data to be compared and dividing it into multiple pieces of gene fragment data comprises:
obtaining the gene data to be compared, and dividing it into multiple pieces of gene fragment data by means of K-mer splitting.
In addition, the present invention provides a gene data comparison apparatus, comprising:
an acquisition and segmentation module, configured to obtain gene data to be compared and divide it into multiple pieces of gene fragment data;
a computing module, configured to execute in parallel, by an FPGA, the matching operations between each piece of gene fragment data and the standard gene data in a standard gene library, and to obtain from the standard gene library the result gene data that has the highest overall matching degree with the gene fragment data, for comparative analysis against the gene data to be compared.
Preferably, the computing module comprises:
a matching operation module, configured to execute in parallel, by the FPGA, the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library;
a matching length acquisition module, configured to obtain the current continuous matching length when adjacent pieces of target gene fragment data all match the same target standard gene data in the standard gene library and the matched regions in the target standard gene data are contiguous;
a condition analysis module, configured to mark the target standard gene data as the result gene data, for comparative analysis against the gene data to be compared, when a judgment parameter that includes at least the current continuous matching length satisfies a preset condition.
In addition, the present invention provides a gene data comparison device, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above gene data comparison method when executing the computer program.
In addition, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above gene data comparison method.
In the gene data comparison method provided by the present invention, gene data to be compared is first obtained and divided into multiple pieces of gene fragment data. An FPGA then executes the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library, and the standard gene data with the highest overall matching degree with the gene fragment data is taken as the result gene data, for comparative analysis against the gene data to be compared. By dividing the gene data to be compared into multiple pieces of gene fragment data, the method lets an FPGA, which is characterized by high concurrent computing capability, execute the matching operations between each piece of gene fragment data and the standard gene data in parallel. Because an FPGA offers higher concurrent computing capability than a CPU or GPU, the method helps ensure the overall computational efficiency of the gene data comparison process compared with prior-art approaches that use a CPU or GPU. The invention also provides a gene data comparison apparatus, device, and medium, which have the same beneficial effects.
Brief description of the drawings
To describe the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a gene data comparison method provided in an embodiment of the present invention;
Fig. 2 is a flowchart of a gene data comparison method provided in an embodiment of the present invention;
Fig. 3 is a structural diagram of a gene data comparison apparatus provided in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
At present, gene data comparison is implemented only on CPUs or GPUs. However, the comparison process involves an enormous amount of computation, and as gene comparison technology is applied more widely, the scale of gene data keeps growing, so the computing power of CPUs and GPUs alone can hardly ensure the overall computational efficiency of the gene data comparison process.
The core of the present invention is to provide a gene data comparison method that helps ensure the overall computational efficiency of the gene data comparison process. Another core of the present invention is to provide a gene data comparison apparatus, device, and medium.
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of a gene data comparison method provided in an embodiment of the present invention. Referring to Fig. 1, the method includes the following steps:
Step S10: obtain gene data to be compared, and divide the gene data to be compared into multiple pieces of gene fragment data.
It should be noted that the gene data to be compared in this step is the gene data whose specific sequence content is currently to be analyzed. Because the gene data to be compared takes the form of a sequence, comparing genes essentially means checking, position by position, whether the elements of the sequences are consistent. To allow the comparison to proceed in parallel on segments, this step divides the gene data to be compared into multiple pieces of gene fragment data, where each piece of gene fragment data is a segment of a certain length taken from the gene data to be compared. The user can set the total number of gene fragments produced by the split according to specific needs, which is equivalent to setting the segment length of the gene fragment data.
Step S11: execute in parallel, by an FPGA, the matching operations between each piece of gene fragment data and the standard gene data in a standard gene library, and obtain from the standard gene library the result gene data that has the highest overall matching degree with the gene fragment data, for comparative analysis against the gene data to be compared.
It should be noted that the focus of this step is using the FPGA to carry out, in parallel, the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library. The purpose of the matching operations is to find the standard gene data in the standard gene library that is identical or similar to the gene data to be compared, that is, the result gene data of this step.
During the matching operations, the FPGA matches each piece of gene fragment data against each piece of standard gene data in the standard gene library. A piece of gene fragment data represents only part of the information in a complete gene, and different standard genes in the standard gene library may share identical partial gene information. The main purpose of the matching process in this step is therefore to find the standard gene in the standard gene library whose content matches the largest number of gene fragments. In other words, the more of the gene fragments divided from the gene data to be compared a standard gene matches, the higher its overall matching degree in the sense of this step, and the closer it is to the gene data to be compared. After the result gene data with the highest overall matching degree with the gene fragment data has been obtained from the standard gene library, it is used for comparative analysis against the gene data to be compared. The specifics of the comparative analysis are not the focus of the present application and are not described further here.
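The counting idea described above can be illustrated in software. The following is a minimal sketch, not the FPGA implementation, which the text does not detail. It assumes exact substring matching as the matching operation and uses a thread pool purely as a stand-in for the FPGA's parallel matching units; the names `match_fragment`, `overall_matching_degree`, and `pick_result_gene`, as well as the sample sequences, are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def match_fragment(fragment, standard_library):
    """Names of the standard genes that contain `fragment` (exact match, an assumption)."""
    return {name for name, gene in standard_library.items() if fragment in gene}


def overall_matching_degree(fragments, standard_library):
    """Count, per standard gene, how many fragments match it.

    One task per fragment is submitted to a thread pool; the pool only
    stands in for the FPGA's parallel matching units.
    """
    with ThreadPoolExecutor() as pool:
        hits = pool.map(lambda f: match_fragment(f, standard_library), fragments)
        counts = Counter()
        for names in hits:
            counts.update(names)
    return counts


def pick_result_gene(counts):
    """Standard gene matched by the most fragments, or None if nothing matched."""
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical sample data.
library = {"std_a": "TTACGTACGGTA", "std_b": "ACGTTTTTGGTC"}
fragments = ["ACGT", "GTAC", "ACGG"]
counts = overall_matching_degree(fragments, library)
print(pick_result_gene(counts), dict(counts))
```

In this sample, all three fragments occur in `std_a` and only one occurs in `std_b`, so `std_a` is selected as the result gene data.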
In the gene data comparison method provided by the present invention, gene data to be compared is first obtained and divided into multiple pieces of gene fragment data. An FPGA then executes the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library, and the standard gene data with the highest overall matching degree with the gene fragment data is taken as the result gene data, for comparative analysis against the gene data to be compared. By dividing the gene data to be compared into multiple pieces of gene fragment data, the method lets an FPGA, which is characterized by high concurrent computing capability, execute the matching operations between each piece of gene fragment data and the standard gene data in parallel. Because an FPGA offers higher concurrent computing capability than a CPU or GPU, the method helps ensure the overall computational efficiency of the gene data comparison process compared with prior-art approaches that use a CPU or GPU.
Fig. 2 is a flowchart of a gene data comparison method provided in an embodiment of the present invention. Referring to Fig. 2, the method includes the following steps:
Step S20: obtain gene data to be compared, and divide the gene data to be compared into multiple pieces of gene fragment data.
Step S21: execute in parallel, by the FPGA, the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library.
Step S22: when adjacent pieces of target gene fragment data all match the same target standard gene data in the standard gene library, and the matched regions in the target standard gene data are contiguous, obtain the current continuous matching length.
It should be noted that this embodiment takes into account that, when each piece of gene fragment data is matched against the standard gene data in the standard gene library, gene fragments that are adjacent in position within the gene data to be compared may match a contiguous region of the target standard gene data in the standard gene library; that is, the regions of the target standard gene matched by the adjacent gene fragments are contiguous. In this case, the method obtains the current matching length of the adjacent target gene fragments that contiguously match the target standard gene, and uses this length to characterize the overall matching degree between the gene fragment data and each standard gene in the standard gene library.
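One way to compute a continuous matching length is sketched below. It is an illustration under stated assumptions rather than the patented procedure: fragments are taken to be consecutive, non-overlapping pieces of the query listed in query order, matching is exact, and the length is counted in bases. The function names `find_all` and `continuous_matching_length` and the sample sequences are hypothetical.

```python
def find_all(gene, fragment):
    """All start positions where `fragment` occurs in `gene` (overlaps allowed)."""
    positions = []
    start = gene.find(fragment)
    while start != -1:
        positions.append(start)
        start = gene.find(fragment, start + 1)
    return positions


def continuous_matching_length(fragments, gene):
    """Longest run, in bases, of consecutive fragments matched back to back in `gene`.

    `fragments` must be consecutive, non-overlapping pieces of the query,
    listed in query order (an assumption of this sketch). A fragment that
    matches on its own counts as a run of its own length.
    """
    best = 0
    runs = {}  # end position in `gene` -> length of the run ending there
    for fragment in fragments:
        next_runs = {}
        for pos in find_all(gene, fragment):
            # Extend a run that ended exactly where this match starts.
            next_runs[pos + len(fragment)] = runs.get(pos, 0) + len(fragment)
        best = max(best, max(next_runs.values(), default=0))
        runs = next_runs
    return best


# Hypothetical sample data: a query split into three 4-base fragments.
fragments = ["ACGT", "ACGG", "TTCA"]
print(continuous_matching_length(fragments, "GGACGTACGGAA"))     # 8
print(continuous_matching_length(fragments, "ACGTTTACGGATTCA"))  # 4
```

In the first standard gene the first two fragments match back to back, giving a length of 8. In the second all three fragments match, but none of the matched regions are contiguous, so the length is only 4.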
Step S23: when a judgment parameter that includes at least the current continuous matching length satisfies a preset condition, mark the target standard gene data as the result gene data, for comparative analysis against the gene data to be compared.
This step uses the current continuous matching length as one of the judgment parameters for assessing the overall matching degree between the gene data to be compared and the standard genes in the standard gene library. When the judgment parameter that includes at least the current continuous matching length satisfies the preset condition, the target standard gene data is marked as the result gene data.
In this embodiment, the current matching length of the adjacent target gene fragments that contiguously match the target standard gene is obtained and used to characterize the overall matching degree between the gene fragment data and each standard gene in the standard gene library. Using the current continuous matching length as a judgment parameter in this way improves the accuracy with which the result gene data is obtained, and in turn helps ensure the overall accuracy of the gene data comparison process.
On the basis of the above embodiment, as a preferred embodiment, the step of marking the target standard gene data as the result gene data when the judgment parameter that includes at least the current continuous matching length satisfies the preset condition comprises:
when the current continuous matching length is the longest among all continuous matching lengths, marking the target standard gene data as the result gene data, for comparative analysis against the gene data to be compared.
It should be noted that in this embodiment the judgment parameter consists only of the current continuous matching length. When the current continuous matching length is the longest among all continuous matching lengths, the target standard gene data corresponding to it is the most consistent with the gene data to be compared, so the target standard gene data is marked as the result gene data and used for comparative analysis against the gene data to be compared. Because the continuous matching length directly reflects how close the gene data to be compared is to the standard gene data, this embodiment helps ensure the accuracy with which the result gene data is obtained, and in turn the overall accuracy of the gene data comparison process.
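The selection rule of this embodiment amounts to taking the maximum over the continuous matching lengths. A minimal sketch follows, assuming the lengths have already been computed per standard gene; the function name `longest_candidates` and the sample values are hypothetical. Returning every gene that shares the maximum makes it visible when the rule alone does not single out one result.

```python
def longest_candidates(lengths):
    """Names of the standard genes that share the maximum continuous matching length."""
    if not lengths:
        return []
    longest = max(lengths.values())
    return sorted(name for name, length in lengths.items() if length == longest)


# Hypothetical continuous matching lengths, in bases, per standard gene.
print(longest_candidates({"std_a": 8, "std_b": 4, "std_c": 6}))  # ['std_a']
print(longest_candidates({"std_a": 8, "std_b": 8}))              # ['std_a', 'std_b']
```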
On the basis of the above embodiment, as a preferred embodiment, before the step of marking the target standard gene data as the result gene data when the judgment parameter that includes at least the current continuous matching length satisfies the preset condition, the method further comprises:
concatenating the adjacent pieces of target gene fragment data into a target gene fragment sequence;
Correspondingly, the step of marking the target standard gene data as the result gene data when the judgment parameter satisfies the preset condition comprises:
when the alignment score between the gene fragment data and the target standard gene data is the maximum value, and the current continuous matching length is longer than the other continuous matching lengths, marking the target standard gene data as the result gene data, for comparative analysis against the gene data to be compared;
when the current continuous matching length equals another continuous matching length and both are the longest, and the hash value of the target gene fragment sequence is greater than the hash values of the other target gene fragment sequences, marking the target standard gene data as the result gene data, for comparative analysis against the gene data to be compared.
It should be noted that this embodiment takes into account the case in which the alignment score between the gene fragment data and the target standard gene data is the maximum value, and the case in which the current continuous matching length equals another continuous matching length and both are the longest. In the latter case, to identify more accurately the target standard gene data with the highest matching degree with the gene data to be compared, this embodiment concatenates the adjacent pieces of target gene fragment data into a target gene fragment sequence when the regions they match in the target standard gene data are contiguous; that is, the adjacent target gene fragments are spliced, in their order of adjacency, into a longer sequence fragment. Then, when the current continuous matching length equals another continuous matching length and both are the longest, the hash value of the target gene fragment sequence is compared with the hash values of the other target gene fragment sequences. Because a hash value is the result of extracting the key content of data, a larger hash value is taken to indicate that the data carries more key content, that is, that the data is more important. Therefore, when the hash value of the target gene fragment sequence is greater than the hash values of the other target gene fragment sequences, the similarity between the gene data to be compared and the target standard gene data is considered higher, and the target standard gene data is marked as the result gene data, for comparative analysis against the gene data to be compared. This embodiment can further ensure the accuracy with which the result gene data is obtained, and in turn the overall accuracy of the gene data comparison process.
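The tie-break on hash values can be sketched as follows. The text does not name a hash function, so SHA-256 from Python's standard `hashlib` is used here purely as an assumption, with the digest read as an integer so that two hash values can be ordered. The names `sequence_hash` and `break_tie` and the sample sequences are hypothetical.

```python
import hashlib


def sequence_hash(sequence):
    """Integer hash of a fragment sequence (SHA-256 is an assumption of this sketch)."""
    digest = hashlib.sha256(sequence.encode("ascii")).digest()
    return int.from_bytes(digest, "big")


def break_tie(tied):
    """Pick, among tied standard genes, the one whose fragment sequence hashes highest.

    `tied` maps each standard gene name to the target gene fragment
    sequence (the concatenated adjacent fragments) that matched it.
    """
    return max(tied, key=lambda name: sequence_hash(tied[name]))


# Hypothetical tie: both sequences have the same continuous matching length.
tied = {"std_a": "ACGTACGG", "std_b": "ACGGTTCA"}
winner = break_tie(tied)
print(winner)
```

A fixed hash function is chosen because Python's built-in `hash()` on strings is randomized per process by default, which would make the tie-break non-reproducible between runs.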
On the basis of the above series of embodiments, as a preferred embodiment, the step of obtaining the gene data to be compared and dividing it into multiple pieces of gene fragment data comprises:
obtaining the gene data to be compared, and dividing it into multiple pieces of gene fragment data by means of K-mer splitting.
In essence, the K-mer algorithm cuts a character string at a certain length and a certain interval. Dividing the gene data to be compared into multiple pieces of gene fragment data by means of K-mer splitting improves the overall utilization of the gene data to be compared, makes the comparative analysis of the gene data to be compared more comprehensive, and further helps ensure the overall accuracy of the gene data comparison process.
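Cutting a string at a fixed length and a fixed interval can be sketched in a few lines. This is a minimal illustration; the function name `split_into_kmers`, the parameter names `k` and `step`, and the sample sequence are hypothetical, and the start offset is returned alongside each fragment only so that adjacency between fragments can be recovered later.

```python
from __future__ import annotations


def split_into_kmers(sequence: str, k: int, step: int = 1) -> list[tuple[int, str]]:
    """Cut `sequence` into fragments of length `k`, one starting every `step` bases.

    Returns (start_offset, fragment) pairs in sequence order. Trailing
    bases that do not fill a whole fragment are not emitted.
    """
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    return [
        (start, sequence[start:start + k])
        for start in range(0, len(sequence) - k + 1, step)
    ]


print(split_into_kmers("ACGTACGGT", k=4, step=2))
# [(0, 'ACGT'), (2, 'GTAC'), (4, 'ACGG')]
```

With `step` smaller than `k` the fragments overlap, so every base can appear in several fragments; with `step` equal to `k` the fragments tile the sequence without overlap.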
Fig. 3 is a structural diagram of a gene data comparison apparatus provided in an embodiment of the present invention. The gene data comparison apparatus provided in the embodiment of the present invention comprises:
an acquisition and segmentation module 10, configured to obtain gene data to be compared and divide it into multiple pieces of gene fragment data;
a computing module 11, configured to execute in parallel, by an FPGA, the matching operations between each piece of gene fragment data and the standard gene data in a standard gene library, and to obtain from the standard gene library the result gene data that has the highest overall matching degree with the gene fragment data, for comparative analysis against the gene data to be compared.
The gene data comparison apparatus provided by the present invention first obtains gene data to be compared and divides it into multiple pieces of gene fragment data. An FPGA then executes the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library, and the standard gene data with the highest overall matching degree with the gene fragment data is taken as the result gene data, for comparative analysis against the gene data to be compared. By dividing the gene data to be compared into multiple pieces of gene fragment data, the apparatus lets an FPGA, which is characterized by high concurrent computing capability, execute the matching operations between each piece of gene fragment data and the standard gene data in parallel. Because an FPGA offers higher concurrent computing capability than a CPU or GPU, the apparatus helps ensure the overall computational efficiency of the gene data comparison process compared with prior-art approaches that use a CPU or GPU.
On the basis of the above gene data comparison apparatus, as a preferred embodiment, the computing module comprises:
a matching operation module, configured to execute in parallel, by the FPGA, the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library;
a matching length acquisition module, configured to obtain the current continuous matching length when adjacent pieces of target gene fragment data all match the same target standard gene data in the standard gene library and the matched regions in the target standard gene data are contiguous;
a condition analysis module, configured to mark the target standard gene data as the result gene data, for comparative analysis against the gene data to be compared, when a judgment parameter that includes at least the current continuous matching length satisfies a preset condition.
The present invention also provides a gene data comparison device, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above gene data comparison method when executing the computer program.
The gene data comparison device provided by the present invention first obtains gene data to be compared and divides it into multiple pieces of gene fragment data. An FPGA then executes the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library, and the standard gene data with the highest overall matching degree with the gene fragment data is taken as the result gene data, for comparative analysis against the gene data to be compared. By dividing the gene data to be compared into multiple pieces of gene fragment data, the device lets an FPGA, which is characterized by high concurrent computing capability, execute the matching operations between each piece of gene fragment data and the standard gene data in parallel. Because an FPGA offers higher concurrent computing capability than a CPU or GPU, the device helps ensure the overall computational efficiency of the gene data comparison process compared with prior-art approaches that use a CPU or GPU.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above gene data comparison method.
With the computer-readable storage medium provided by the present invention, gene data to be compared is first obtained and divided into multiple pieces of gene fragment data. An FPGA then executes the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library, and the standard gene data with the highest overall matching degree with the gene fragment data is taken as the result gene data, for comparative analysis against the gene data to be compared. By dividing the gene data to be compared into multiple pieces of gene fragment data, the stored program lets an FPGA, which is characterized by high concurrent computing capability, execute the matching operations between each piece of gene fragment data and the standard gene data in parallel. Because an FPGA offers higher concurrent computing capability than a CPU or GPU, the computer-readable storage medium helps ensure the overall computational efficiency of the gene data comparison process compared with prior-art approaches that use a CPU or GPU.
The gene data comparison method, device, equipment and medium provided by the present invention have been described in detail above. The embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method. It should be noted that those of ordinary skill in the art may make improvements and modifications to the present invention without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of the present invention.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or equipment that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to that process, method, article or equipment. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or equipment that includes the element.

Claims (9)

1. A gene data comparison method, characterized by comprising:
obtaining gene data to be compared, and dividing the gene data to be compared into multiple pieces of gene fragment data;
executing in parallel, by an FPGA, the matching operations between each piece of gene fragment data and the standard gene data in a standard gene library, and obtaining the result gene data in the standard gene library that has the highest overall match degree with the pieces of gene fragment data, for comparison with the gene data to be compared.
2. The gene data comparison method according to claim 1, characterized in that executing in parallel, by the FPGA, the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library, and obtaining the result gene data in the standard gene library that has the highest overall match degree with the pieces of gene fragment data, for comparison with the gene data to be compared, comprises:
executing in parallel, by the FPGA, the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library;
when adjacent pieces of target gene fragment data match target standard gene data in the standard gene library, and the matched regions in the target standard gene data are contiguous, obtaining a current continuous match length;
when judgment parameters that include at least the current continuous match length satisfy a preset condition, marking the target standard gene data as the result gene data, for comparison analysis with the gene data to be compared.
3. The gene data comparison method according to claim 2, characterized in that marking the target standard gene data as the result gene data, for comparison analysis with the gene data to be compared, when the judgment parameters that include at least the current continuous match length satisfy the preset condition, comprises:
when the current continuous match length is the longest of all continuous match lengths, marking the target standard gene data as the result gene data, for comparison analysis with the gene data to be compared.
4. The gene data comparison method according to claim 2, characterized in that, before marking the target standard gene data as the result gene data, for comparison analysis with the gene data to be compared, when the judgment parameters that include at least the current continuous match length satisfy the preset condition, the method further comprises:
concatenating the adjacent pieces of target gene fragment data into a target gene fragment sequence;
correspondingly, marking the target standard gene data as the result gene data, for comparison analysis with the gene data to be compared, when the judgment parameters that include at least the current continuous match length satisfy the preset condition, comprises:
when the alignment score between the pieces of gene fragment data and the target standard gene data is the maximum value, and the current continuous match length is longer than the other continuous match lengths, marking the target standard gene data as the result gene data, for comparison analysis with the gene data to be compared;
when the current continuous match length and another continuous match length are both the longest, and the hash value of the target gene fragment sequence is greater than the hash values of the other target gene fragment sequences, marking the target standard gene data as the result gene data, for comparison analysis with the gene data to be compared.
5. The gene data comparison method according to any one of claims 1 to 4, characterized in that obtaining the gene data to be compared, and dividing the gene data to be compared into multiple pieces of gene fragment data, comprises:
obtaining the gene data to be compared, and dividing the gene data to be compared into multiple pieces of gene fragment data in the form of K-mers.
6. A gene data comparison device, characterized by comprising:
an acquisition and division module, configured to obtain gene data to be compared and divide the gene data to be compared into multiple pieces of gene fragment data;
a computing module, configured to execute in parallel, by an FPGA, the matching operations between each piece of gene fragment data and the standard gene data in a standard gene library, and to obtain the result gene data in the standard gene library that has the highest overall match degree with the pieces of gene fragment data, for comparison with the gene data to be compared.
7. The gene data comparison device according to claim 6, characterized in that the computing module comprises:
a matching operation module, configured to execute in parallel, by the FPGA, the matching operations between each piece of gene fragment data and the standard gene data in the standard gene library;
a match length acquisition module, configured to obtain a current continuous match length when adjacent pieces of target gene fragment data match target standard gene data in the standard gene library and the matched regions in the target standard gene data are contiguous;
a condition analysis module, configured to mark the target standard gene data as the result gene data, for comparison analysis with the gene data to be compared, when judgment parameters that include at least the current continuous match length satisfy a preset condition.
8. Gene data comparison equipment, characterized by comprising:
a memory, configured to store a computer program;
a processor, configured to implement, when executing the computer program, the steps of the gene data comparison method according to any one of claims 1 to 5.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the gene data comparison method according to any one of claims 1 to 5.
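The selection rules in claims 2 to 4 (find adjacent fragments whose matches are contiguous in a standard sequence, measure the continuous match length, prefer the longest, and break ties with a hash of the concatenated fragment sequence) can be illustrated with the following Python sketch. It is a simplified software illustration under stated assumptions, not the claimed implementation: fragments are assumed to be overlapping K-mers, a run is restarted at the first occurrence of a fragment only, CRC32 is an assumed stand-in for the unspecified hash function, and the alignment score mentioned in claim 4 is omitted. All function names are hypothetical.

```python
import zlib


def longest_contiguous_run(query, reference, k):
    """Return (length, start) of the longest stretch of `query` whose
    adjacent k-mers match contiguous positions in `reference`.

    `length` is counted in bases; `start` is an offset into `query`.
    """
    best_len, best_start = 0, 0
    run_len, run_start, prev_pos = 0, 0, None
    for i in range(len(query) - k + 1):
        fragment = query[i:i + k]
        if prev_pos is not None and reference.startswith(fragment, prev_pos + 1):
            # The adjacent fragment matches the adjacent reference position.
            pos = prev_pos + 1
            run_len += 1
        else:
            # Simplification: restart the run at the first occurrence only.
            pos = reference.find(fragment)
            if pos == -1:
                prev_pos, run_len = None, 0
                continue
            run_start, run_len = i, 1
        prev_pos = pos
        covered = run_len + k - 1  # bases covered by the current run
        if covered > best_len:
            best_len, best_start = covered, run_start
    return best_len, best_start


def select_result(query, library, k=4):
    """Pick the standard sequence with the longest continuous match length.

    Ties are broken by the larger CRC32 of the concatenated fragment
    sequence (an assumed hash), then by name so the result is deterministic.
    """
    candidates = []
    for name, reference in library.items():
        length, start = longest_contiguous_run(query, reference, k)
        merged = query[start:start + length]  # concatenated adjacent fragments
        candidates.append((length, zlib.crc32(merged.encode()), name))
    length, _, name = max(candidates, default=(0, 0, None))
    return name if length > 0 else None
```

Comparing `(length, hash, name)` tuples keeps the ordering of the rules explicit: the hash is consulted only when two candidates share the longest continuous match length.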
CN201910576190.4A 2019-06-28 2019-06-28 A kind of gene data comparison method, device, equipment and medium Pending CN110379461A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910576190.4A CN110379461A (en) 2019-06-28 2019-06-28 A kind of gene data comparison method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN110379461A true CN110379461A (en) 2019-10-25

Family

ID=68251151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910576190.4A Pending CN110379461A (en) 2019-06-28 2019-06-28 A kind of gene data comparison method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110379461A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112229989A (en) * 2020-10-19 2021-01-15 广州吉源生物科技有限公司 Biological sample identification equipment of GPU (graphics processing Unit) technology
WO2021169387A1 (en) * 2020-02-28 2021-09-02 苏州浪潮智能科技有限公司 Sequence alignment method, apparatus and device, and medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985008A (en) * 2018-06-29 2018-12-11 郑州云海信息技术有限公司 A kind of method and Compare System of quick comparison gene data

Similar Documents

Publication Publication Date Title
Rautiainen et al. GraphAligner: rapid and versatile sequence-to-graph alignment
US8428882B2 (en) Method of processing and/or genome mapping of diTag sequences
CA2424031C (en) System and process for validating, aligning and reordering genetic sequence maps using ordered restriction map
US8832139B2 (en) Associative memory and data searching system and method
CN107103205A (en) A kind of bioinformatics method based on proteomic image data notes eukaryotic gene group
CN111445952B (en) Method and system for quickly comparing similarity of super-long gene sequences
CN110379461A (en) A kind of gene data comparison method, device, equipment and medium
CN112735528A (en) Gene sequence comparison method and system
Rasheed et al. A map-reduce framework for clustering metagenomes
KR20070083641A (en) Gene identification signature(gis) analysis for transcript mapping
CN104156635A (en) OPSM mining method of gene chip expression data based on common sub-sequences
Comin et al. Beyond fixed-resolution alignment-free measures for mammalian enhancers sequence comparison
Ng et al. Acceleration of short read alignment with runtime reconfiguration
CN109828785B (en) Approximate code clone detection method accelerated by GPU
Zytnicki et al. DARN! A weighted constraint solver for RNA motif localization
Olman et al. Identification of regulatory binding sites using minimum spanning trees
Bannai et al. A string pattern regression algorithm and its application to pattern discovery in long introns
CN112768081B (en) Common-control biological network motif discovery method and device based on subgraphs and nodes
CN109949867A (en) A kind of optimization method and system, storage medium of a plurality of sequence alignment algorithms
Gruca et al. Annotation agnostic approaches to nascent transcription analysis: fast read stitcher and transcription fit
CN115497567A (en) Nucleic acid sequence clustering method, device, computer-readable storage medium and terminal
Gao et al. miR-Island: an ultrafast and memory-efficient tool for plant miRNA annotation and expression analysis
Cole et al. WOODSTOCC: Extracting latent parallelism from a DNA sequence aligner on a GPU
CN107526942A (en) The reverse search method of life group sequence data
Hu et al. Mining low-variance biclusters to discover coregulation modules in sequencing datasets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191025