CN105095686B - Quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware - Google Patents
- Publication number
- CN105095686B (application CN201410205571.9A)
- Authority
- CN
- China
- Prior art keywords
- sequence
- transcriptome
- core cpu
- data
- quality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention provides a quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware. The method comprises: processing the high-throughput transcriptome sequencing data in parallel on a multi-core CPU to obtain data from which low-sequencing-quality sequences have been removed; using the multi-core CPU to predict and remove rRNA sequences from that data and to qualitatively identify contaminating sequences; and compiling statistics on, and evaluating, the sequence alignment results. By running on a multi-core CPU computer, the invention overcomes the computational bottleneck of single-core CPU computers and can improve the efficiency of quality control for high-throughput transcriptome data by a factor of 7 or more. Its application is expected to markedly improve the accuracy and speed of quality control for high-throughput transcriptome data and to support the rapid development of research based on transcriptome sequencing.
Description
Technical field
The invention relates to bioinformatics, and specifically to a quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware, which can rapidly perform quality control on high-throughput transcriptome sequencing data.
Background technology
High-throughput sequencing, also known as "next-generation" sequencing, is a revolutionary change from traditional sequencing: hundreds of thousands to millions of DNA/RNA molecules can be sequenced in a single run, and the technology is being applied ever more widely in biological research. Compared with traditional Sanger sequencing, the throughput of next-generation sequencing is one to two orders of magnitude higher, and the data volume is larger (100 MB to several GB). Transcriptome sequencing is an in-depth application of high-throughput sequencing that allows detailed, deep and comprehensive analysis of the transcription profile of a species. However, owing to limitations of the sequencing technology itself and to operating errors in manual experimental steps such as transcriptome extraction, the raw transcriptome data often contain some low-quality sequences, including low-quality bases, contaminating sequences and ribosomal RNA (rRNA) sequences. These low-quality sequences greatly affect the accuracy of downstream transcriptome analysis and can even lead to wrong conclusions. In addition, because downstream transcriptome analysis relies on aligning the sequences to a reference genome, the alignment quality of the transcriptome sequences is also a key measure of the overall quality of the sequencing data. In summary, quality control is a necessary and critical step in the analysis of high-throughput transcriptome sequencing data. Existing quality control methods for transcriptome data focus mainly on quality evaluation at the sequence alignment level and cannot perform comprehensive quality control covering base quality, sequence quality, contamination and alignment quality at the same time.
High-throughput transcriptome sequencing usually involves multiple samples collected under different conditions or at different time points, and each sample generally needs three or more biological and technical replicates. The number of sequenced samples is therefore large, and a single sequencing run often yields more than 20 samples and tens of GB of data. Quality control of high-throughput transcriptome data thus requires a high-performance computer with adequate computing power, together with the corresponding analysis software. With current general-purpose analysis methods, scanning and processing hundreds of millions of sequences one by one on a single-CPU computer may take several days or even months, which makes the efficiency of data analysis a major bottleneck in related research.
Summary of the invention
To solve the problem that traditional analysis methods and computing systems cannot meet the requirements of quality control for high-throughput transcriptome sequencing data comprehensively, accurately and efficiently, the invention exploits the fact that such data can be processed in parallel and proposes a quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware.
The technical solution adopted by the invention to achieve this purpose is a quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware, comprising the following steps:
Processing the high-throughput transcriptome sequencing data in parallel on a multi-core CPU to obtain data from which low-sequencing-quality sequences have been removed;
Using the multi-core CPU to predict and remove rRNA sequences from the data from which low-sequencing-quality sequences have been removed, and qualitatively identifying contaminating sequences;
Compiling statistics on and evaluating the sequence alignment results.
Removing low-sequencing-quality sequences from the high-throughput transcriptome sequencing data using the multi-core CPU comprises the following steps:
Splitting the input file into several small sub-datasets using the Parallel-QC tool;
Assigning each sub-dataset to a different CPU core;
On multiple CPU cores simultaneously, examining the base quality and adapter sequences of every sequence in each sub-dataset, then, in turn, trimming the low-quality bases at both ends of each sequence according to a user-specified length, filtering out sequences that contain a user-specified proportion of low-quality bases, and deleting the adapter sequences they contain;
Merging the processed sequences to obtain the data from which low-sequencing-quality sequences have been removed.
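The split, process and merge steps above can be illustrated with a minimal Python sketch. It is not the Parallel-QC implementation: the function names, the in-memory record format `(name, sequence, Phred scores)`, the quality threshold of 20, the 30% low-quality cutoff, exact-match adapter removal and threshold-based end trimming are all assumptions made for illustration (the source states only that trimming follows a user-specified length and filtering a user-specified proportion).

```python
from multiprocessing import Pool


def trim_ends(seq, quals, min_q=20):
    """Trim bases with Phred quality below min_q from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def strip_adapter(seq, quals, adapter):
    """Cut the read at the first exact occurrence of the adapter, if present."""
    pos = seq.find(adapter) if adapter else -1
    if pos >= 0:
        return seq[:pos], quals[:pos]
    return seq, quals


def passes_filter(quals, min_q=20, max_low_fraction=0.3):
    """Keep a read only if it is non-empty and has few low-quality bases."""
    if not quals:
        return False
    low = sum(1 for q in quals if q < min_q)
    return low / len(quals) <= max_low_fraction


def process_chunk(task):
    """Worker: clean every (name, seq, quals) record in one sub-dataset."""
    chunk, adapter = task
    kept = []
    for name, seq, quals in chunk:
        seq, quals = strip_adapter(seq, quals, adapter)
        seq, quals = trim_ends(seq, quals)
        if passes_filter(quals):
            kept.append((name, seq, quals))
    return kept


def split(records, n_chunks):
    """Split records into at most n_chunks contiguous sub-datasets."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_qc(records, adapter="", workers=4):
    """Split, process each sub-dataset in a worker process, then merge."""
    tasks = [(chunk, adapter) for chunk in split(records, workers)]
    with Pool(workers) as pool:
        parts = pool.map(process_chunk, tasks)  # results keep chunk order
    return [record for part in parts for record in part]
```

Because each read is cleaned independently of every other read, the chunks need no communication between workers, which is what makes this step straightforward to parallelize. On platforms that start worker processes by spawning, `parallel_qc` should be called from inside an `if __name__ == "__main__":` guard.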
Predicting and removing rRNA sequences from the data from which low-sequencing-quality sequences have been removed, and qualitatively identifying contaminating sequences, using the multi-core CPU comprises the following steps:
Building hidden Markov models from all rRNA sequences in the SILVA database; predicting and extracting rRNA from the transcriptome sequences by hidden Markov model search, and removing the predicted rRNA sequences from the transcriptome data;
Mapping the predicted and extracted 16S or 18S rRNA to the known rRNA sequence library SILVA to obtain the source species of all sequences, and aggregating the annotation results of the 16S and 18S rRNA signature sequences separately to generate a species composition result, thereby obtaining all species that may be present in the transcriptome sequencing data together with the contamination information;
Predicting and extracting rRNA from the transcriptome sequences by hidden Markov model search, and removing the predicted rRNA sequences from the transcriptome data, comprises the following steps:
Splitting the data file that has been processed by Parallel-QC to remove low-quality sequences into small sub-datasets;
Assigning different sub-datasets to different CPU cores;
Predicting the 16S, 18S, 23S or 28S rRNA signature sequences of the sequences in the sub-datasets simultaneously on the CPU cores;
Merging the prediction results for all types of signature sequence;
Loading the large input data from external storage into memory in multiple passes, searching for and extracting sequences according to the signature-sequence prediction results, and finally merging the search results.
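The final step, loading the large input in several passes and extracting the predicted sequences, can be sketched as follows. This is an illustrative sketch only, not the rRNA-filter code: it assumes FASTA input, assumes the merged prediction result is available as a set of read identifiers, and uses hypothetical function names and a hypothetical `batch_size` parameter. The HMM search itself is not shown.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (read_id, sequence) pairs from an iterable of FASTA lines."""
    name, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            header = line[1:].split()
            name, parts = (header[0] if header else ""), []
        else:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def split_by_prediction(records, rrna_ids, batch_size=100000):
    """Load records in batches and separate predicted rRNA from the rest.

    Only one batch of the (possibly very large) input is held in memory
    at a time; the per-batch results are merged into two output lists.
    """
    rrna, clean = [], []
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        for name, seq in batch:
            (rrna if name in rrna_ids else clean).append((name, seq))
    return rrna, clean
```

Passing an open file handle to `read_fasta` keeps the reader lazy, so the batch size, rather than the file size, bounds the memory used by the input side.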
Compiling statistics on and evaluating the results of aligning the sequences to the reference genome comprises counting sequences, calculating sequence coverage and summarizing paired-end alignment information.
Counting sequences covers all sequences, successfully aligned sequences, sequences aligned to particular genomic regions, and the proportion of all sequences that each of these represents.
Calculating sequence coverage covers the number of genes to which sequences are successfully aligned, the base coverage of each gene, and the distribution of successfully aligned sequences over genome structures.
Summarizing paired-end alignment information covers the number of sequences for which both ends align successfully, the number of sequences for which only one end aligns successfully, and the insert size of paired-end aligned sequences.
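As a rough illustration of the sequence counts and paired-end summary, the sketch below reads SAM text lines and uses the standard SAM FLAG bits (0x1 paired, 0x4 unmapped, 0x8 mate unmapped, 0x40 first in pair, 0x100 secondary, 0x800 supplementary) and the TLEN column. It is not the SAM-stats tool: the function name and the output keys are hypothetical, and counting each pair once through its first mate is a design choice of this sketch.

```python
def sam_pair_stats(lines):
    """Summarize primary alignments from an iterable of SAM text lines."""
    stats = {"total": 0, "mapped": 0, "both_ends_mapped": 0, "one_end_only": 0}
    insert_sizes = []
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM has 11 mandatory columns
            continue
        flag = int(fields[1])
        if flag & 0x900:  # skip secondary and supplementary records
            continue
        stats["total"] += 1
        mapped = not (flag & 0x4)
        if mapped:
            stats["mapped"] += 1
        # Count each pair once, via the record flagged as first in pair.
        if flag & 0x1 and flag & 0x40:
            mate_mapped = not (flag & 0x8)
            if mapped and mate_mapped:
                stats["both_ends_mapped"] += 1
                tlen = abs(int(fields[8]))  # TLEN column
                if tlen:
                    insert_sizes.append(tlen)
            elif mapped != mate_mapped:
                stats["one_end_only"] += 1
    total = stats["total"]
    stats["mapped_fraction"] = stats["mapped"] / total if total else 0.0
    stats["mean_insert_size"] = (
        sum(insert_sizes) / len(insert_sizes) if insert_sizes else None
    )
    return stats
```

Here `both_ends_mapped` and `one_end_only` are counts of read pairs, while `total` and `mapped` are counts of individual reads.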
The invention has the following advantages and beneficial effects:
1. It achieves comprehensive and efficient quality control of transcriptome data, including combined analysis and quality control of sequencing quality, rRNA sequences, contaminating sequences and alignment results;
2. Running on a multi-core CPU computer, it overcomes the computational bottleneck of single-core CPU computers and can improve the efficiency of quality control for high-throughput transcriptome data by a factor of 7 or more;
3. Its application is expected to markedly improve the accuracy and speed of quality control for high-throughput transcriptome data and to support the rapid development of research based on transcriptome sequencing.
Description of the drawings
Fig. 1 is the hardware architecture diagram of the invention, in which 1 is the DMI and PCIe 2.0 bus, 2 is the triple-channel DDR3 memory bus, and 3 is the SATA bus;
Fig. 2 is the software flow chart of the invention, in which (1) is the processing of low-sequencing-quality data, (2) is the qualitative identification of rRNA sequences and contaminating sequences, and (3) is the evaluation and quality control of sequence alignment;
Fig. 3 shows the test results for the same transcriptome sequencing dataset processed with the invention on a 16-core CPU and on a single-core CPU.
Detailed description
The invention is described in further detail below with reference to the drawings and embodiments.
The technical solution adopted by the invention is a multi-core CPU computer with an efficient, unified software platform built on it. Its features are (1) a high-performance parallel computing and storage hardware system, and (2) a full-featured, high-performance, unified, configurable and parallelized software platform.
(1) High-performance parallel computing and storage hardware
The hardware system uses multi-socket multi-core CPUs for large-scale parallel computation. Fig. 1 shows the system architecture of the compute server:
First, multi-socket multi-core CPU parallel computing: four processors are used, connected to one another by QPI buses. Each processor has 8 independent compute cores and is equipped with triple-channel DDR3 RDIMM memory; this configuration also meets the computing requirements of a cloud computing server.
Second, cache and high-speed bus: these meet the needs of scheduling concurrent sequencing data analysis tasks and of distributing large numbers of tasks in a collaborative working environment.
Finally, RAID disk array: storage on a RAID disk array not only improves the response speed and stability of the central server but also facilitates occasional updates of the central server. It can also handle the backup and upgrade needs of a cloud computing server.
(2) Full-featured, high-performance, unified and configurable software platform
The high-performance software platform covers the processing of low-sequencing-quality data, the qualitative identification of contaminating sequences, the qualitative and quantitative identification of contaminating rRNA sequences, and the assessment of sequence alignment quality (Fig. 2). The system is named the RNA-QC-Chain software system (http://www.computationalbioenergy.org/rna-qc-chain.html, independently owned intellectual property). The quality control steps are as follows:
First, processing of low-sequencing-quality data by multi-core CPU parallel computing. The Parallel-QC tool (http://www.computationalbioenergy.org/parallel-qc.html, independently owned intellectual property) splits the input file into small sub-datasets and assigns different sub-datasets to different CPU cores. On multiple CPU cores simultaneously, it then evaluates the base quality and adapter sequences of each sequence and, in turn, trims the low-quality bases at both ends of each sequence according to a user-specified length, filters out sequences containing a certain proportion of low-quality bases, and deletes the adapter sequences they contain. Finally, the filtered sequences are merged to give the data from which low-sequencing-quality sequences have been removed.
Second, qualitative identification of contaminating sequences by multi-core CPU parallel computing. The rRNA-filter tool is first used to remove the rRNA sequences from the data. Hidden Markov models (HMMs) are built from all rRNA sequences in the public rRNA database SILVA (including 16S, 18S, 23S and 28S rRNA sequences); rRNA is predicted in the transcriptome sequences by HMM search, and the predicted rRNA sequences are then removed from the transcriptome data. SILVA is currently one of the most comprehensive rRNA sequence databases in the world, covering rRNA sequences from the three domains Bacteria, Archaea and Eukaryota. The method can therefore remove as many as possible of the rRNA sequences contained in the transcriptome data. rRNA-filter splits the input file into small sub-datasets, assigns different sub-datasets to different CPU cores, predicts the 16S, 18S, 23S or 28S rRNA signature sequences of the sub-datasets simultaneously on the CPU cores, and merges the prediction results for all types of signature sequence. The large input data are then loaded from external storage into memory in multiple passes, sequences are searched for and extracted according to the signature-sequence prediction results, and the search results are finally merged.
Next, 16S and 18S rRNA sequences are relatively short biomarker signature sequences that are widely used to identify prokaryotic and eukaryotic species. Based on the annotation of the predicted and extracted 16S or 18S rRNA, rRNA-filter qualitatively determines the source species of all sequences in the high-throughput sequencing data, aggregates the search results for the 16S and 18S rRNA signature sequences separately, and generates a graphical species composition result, thereby obtaining all species that may be present in the transcriptome sequencing data together with the contamination information.
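The aggregation of per-sequence annotations into a species composition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mapping from read identifier to taxon is taken as given (the SILVA mapping is not shown), the function name is hypothetical, and treating every taxon outside a user-supplied `expected` set as a candidate contaminant is an assumption of this sketch, not a rule stated in the source.

```python
from collections import Counter


def species_composition(assignments, expected=frozenset()):
    """Aggregate read-to-taxon assignments into a composition table.

    assignments: dict mapping read id to the taxon it was annotated as.
    expected: taxa that are supposed to be in the sample.
    Returns (table, suspects): table rows are (taxon, reads, fraction),
    sorted by decreasing read count; suspects are taxa not in expected.
    """
    counts = Counter(assignments.values())
    total = sum(counts.values())
    if total == 0:
        return [], []
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    table = [(taxon, n, n / total) for taxon, n in ordered]
    suspects = [taxon for taxon, _, _ in table if taxon not in expected]
    return table, suspects
```

Running the function once on the 16S annotations and once on the 18S annotations would mirror the separate aggregation described above.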
Third, comprehensive and accurate evaluation and quality control of the sequence alignment results. The independently developed SAM-stats tool takes alignment result files in SAM format and compiles accurate, comprehensive statistics on, and evaluates, the alignment of the transcriptome sequences to the (known) genome data. Its functions include:
Counting sequences, including all sequences, successfully aligned sequences, sequences aligned to particular genomic regions, and the proportion of all sequences that each of these represents;
Calculating sequence coverage, including the number of genes to which sequences are successfully aligned, the base coverage of each gene, and the distribution of successfully aligned sequences over genome structures;
Summarizing paired-end alignment information, including the number of sequences for which both ends align successfully, the number of sequences for which only one end aligns successfully, and the insert size of paired-end aligned sequences.
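The per-gene coverage figures can be illustrated with a small sketch. It is not the SAM-stats implementation: it assumes a single reference sequence, 1-based inclusive coordinates, ungapped alignments given as `(start, end)` pairs, and hypothetical function names; the per-base depth array is the simplest possible approach and is not tuned for genome-scale input.

```python
def gene_coverage(genes, alignments):
    """Compute per-gene breadth of coverage and mean depth.

    genes: dict mapping gene name to (start, end), 1-based inclusive.
    alignments: list of (start, end) aligned intervals, same coordinates.
    """
    report = {}
    for name, (g_start, g_end) in genes.items():
        depth = [0] * (g_end - g_start + 1)
        for a_start, a_end in alignments:
            lo, hi = max(a_start, g_start), min(a_end, g_end)
            for pos in range(lo, hi + 1):  # empty when there is no overlap
                depth[pos - g_start] += 1
        covered = sum(1 for d in depth if d > 0)
        report[name] = {
            "breadth": covered / len(depth),
            "mean_depth": sum(depth) / len(depth),
        }
    return report


def genes_hit(report):
    """Number of genes with at least one covered base."""
    return sum(1 for row in report.values() if row["breadth"] > 0)
```

Breadth is the fraction of a gene's bases covered by at least one alignment, while mean depth is the average number of alignments per base; reporting both separates a gene that is lightly covered along its full length from one that is heavily covered in a single region.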
In conclusion this software platform depends on multi-core CPU hardware platform, high efficiency can be played by only cooperating
The function of transcript profile sequencing data quality control.
As shown in Fig. 1, the main hardware components used by the quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware are as follows. First, the multi-scale parallel computing capability of four multi-core CPUs: each CPU has 8 independent compute cores and triple-channel memory. Second, cache and high-speed bus. Third, a RAID disk array, which not only improves the response speed and stability of the central server but also facilitates occasional updates of the central server. The basic computing and storage configuration is: a single CPU with at least 4 independent physical compute cores, at least 2 GB of dual-channel memory, at least 50 GB of hard disk space, and a high-speed interconnect between CPU and storage.
As shown in Fig. 2, the main steps of the workflow are as follows. First, the Parallel-QC software tool uses the multi-core CPU to process the transcriptome sequences: in turn, it trims the low-quality bases at both ends of the input sequences, filters out sequences containing a certain proportion of low-quality bases, and deletes the adapter sequences they contain; the results are then merged to give the high-sequencing-quality sequence data. Next, the rRNA-filter tool performs rRNA sequence prediction and qualitative detection of contaminating sequences on the data from the previous step: a parallel multi-threaded computing tool extracts and removes the predicted rRNA sequences (16S/18S or 23S/28S), and the 16S or 18S sequences among them are mapped to the known rRNA sequence library SILVA to obtain the source species of all sequences (including possible contaminating species). Finally, for the results of aligning the sequences to the reference genome (files in SAM format), the SAM-stats software tool compiles statistics on and evaluates the quality of the transcriptome data from the perspective of sequence alignment, including the number of successfully aligned sequences, gene coverage and the outcome of paired-end alignment. These results are combined to generate graphical analysis results and an analysis report. The basic software platform configuration is: (SuSE) Linux operating system with the GCC runtime environment pre-installed, CUDA runtime environment (3.0 or later), RNA-QC-Chain software system version 1.0 or later, and Parallel-META software version 2.0 or later. The RNA-QC-Chain and Parallel-META software systems have command-line interfaces and come with electronic operating instructions. The official website (http://www.computationalbioenergy.org/software.html) provides a long-term software update service.
The method of the invention overcomes the computational bottleneck of single-core CPU computers and improves the efficiency of quality control for high-throughput transcriptome data by a factor of 7 or more. As shown in Fig. 3, a test on the same transcriptome sequencing dataset showed that the entire quality control process completed in 23 minutes on a 16-core CPU, whereas it took 180 minutes on a single-core CPU.
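The "7 times or more" figure follows directly from the two reported run times. The short calculation below reproduces it and also derives the parallel efficiency (speedup divided by core count), a standard metric that the source does not report; the function name is hypothetical.

```python
def speedup_and_efficiency(serial_minutes, parallel_minutes, cores):
    """Return (speedup, parallel efficiency) for a measured pair of run times."""
    speedup = serial_minutes / parallel_minutes
    return speedup, speedup / cores


# Run times reported for Fig. 3: 180 min on one core, 23 min on 16 cores.
s, e = speedup_and_efficiency(180, 23, 16)
print(f"speedup {s:.2f}x, efficiency {e:.0%}")  # speedup 7.83x, efficiency 49%
```

An efficiency of roughly one half suggests that part of the pipeline, such as reading input, merging results or other serial work, does not scale with the number of cores; the source does not say which part.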
Claims (6)
1. A quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware, characterized by comprising the following steps:
Processing the high-throughput transcriptome sequencing data in parallel on a multi-core CPU to obtain data from which low-sequencing-quality sequences have been removed;
Using the multi-core CPU to predict and remove rRNA sequences from the data from which low-sequencing-quality sequences have been removed, and qualitatively identifying contaminating sequences;
Compiling statistics on and evaluating the sequence alignment results;
Wherein said using the multi-core CPU to predict and remove rRNA sequences from the data from which low-sequencing-quality sequences have been removed, and qualitatively identifying contaminating sequences, comprises the following steps:
Building hidden Markov models from all rRNA sequences in the SILVA database; predicting and extracting rRNA from the transcriptome sequences by hidden Markov model search, and removing the predicted rRNA sequences from the transcriptome data;
Mapping the predicted and extracted 16S or 18S rRNA to the known rRNA sequence library SILVA to obtain the source species of all sequences, and aggregating the annotation results of the 16S and 18S rRNA signature sequences separately to generate a species composition result, thereby obtaining all species that may be present in the transcriptome sequencing data together with the contamination information;
Wherein said predicting and extracting rRNA from the transcriptome sequences by hidden Markov model search comprises the following steps:
Splitting the data file that has been processed by Parallel-QC to remove low-quality sequences into small sub-datasets;
Assigning different sub-datasets to different CPU cores;
Predicting the 16S, 18S, 23S or 28S rRNA signature sequences of the sequences in the sub-datasets simultaneously on the CPU cores;
Merging the prediction results for all types of signature sequence;
Loading the large input data from external storage into memory in multiple passes, searching for and extracting sequences according to the signature-sequence prediction results, and finally merging the search results.
2. The quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware according to claim 1, characterized in that removing low-sequencing-quality sequences from the high-throughput transcriptome sequencing data using the multi-core CPU comprises the following steps:
Splitting the input file into several small sub-datasets using the Parallel-QC tool;
Assigning each sub-dataset to a different CPU core;
On multiple CPU cores simultaneously, examining the base quality and adapter sequences of every sequence in each sub-dataset, then, in turn, trimming the low-quality bases at both ends of each sequence according to a user-specified length, filtering out sequences that contain a user-specified proportion of low-quality bases, and deleting the adapter sequences they contain;
Merging the processed sequences to obtain the data from which low-sequencing-quality sequences have been removed.
3. The quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware according to claim 1, characterized in that compiling statistics on and evaluating the results of aligning the sequences to the reference genome comprises counting sequences, calculating sequence coverage and summarizing paired-end alignment information.
4. The quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware according to claim 3, characterized in that counting sequences covers all sequences, successfully aligned sequences, sequences aligned to particular genomic regions, and the proportion of all sequences that each of these represents.
5. The quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware according to claim 3, characterized in that calculating sequence coverage covers the number of genes to which sequences are successfully aligned, the base coverage of each gene, and the distribution of successfully aligned sequences over genome structures.
6. The quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware according to claim 3, characterized in that summarizing paired-end alignment information covers the number of sequences for which both ends align successfully, the number of sequences for which only one end aligns successfully, and the insert size of paired-end aligned sequences.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410205571.9A CN105095686B (en) | 2014-05-15 | 2014-05-15 | Quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410205571.9A CN105095686B (en) | 2014-05-15 | 2014-05-15 | Quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105095686A CN105095686A (en) | 2015-11-25 |
CN105095686B true CN105095686B (en) | 2018-08-14 |
Family
ID=54576104
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410205571.9A Active CN105095686B (en) | 2014-05-15 | 2014-05-15 | Quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105095686B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740650B (en) * | 2016-03-02 | 2019-04-05 | 广西作物遗传改良生物技术重点开放实验室 | A method of quick and precisely identifying high-throughput genomic data pollution sources |
CN106407743B (en) * | 2016-08-31 | 2019-03-05 | 上海美吉生物医药科技有限公司 | A kind of high-throughput data analysing method based on cluster |
CN106777262B (en) * | 2016-12-28 | 2020-07-03 | 上海华点云生物科技有限公司 | High-throughput sequencing data quality filtering method and filtering device |
CN106701995B (en) * | 2017-02-20 | 2019-11-26 | 元码基因科技(北京)股份有限公司 | The method for carrying out cell quality control is sequenced by unicellular transcript profile |
CN107194204A (en) * | 2017-05-22 | 2017-09-22 | 人和未来生物科技(长沙)有限公司 | A kind of sequencing data of whole genome calculates deciphering method |
CN107203703A (en) * | 2017-05-22 | 2017-09-26 | 人和未来生物科技(长沙)有限公司 | A kind of transcript profile sequencing data calculates deciphering method |
CN107451424A (en) * | 2017-07-31 | 2017-12-08 | 浙江绍兴千寻生物科技有限公司 | In high volume unicellular RNA seq data quality controls and analysis method |
CN109559780A (en) * | 2018-09-27 | 2019-04-02 | 华中科技大学鄂州工业技术研究院 | A kind of RNA data processing method of high-flux sequence |
CN112927756B (en) * | 2019-12-06 | 2023-05-30 | 深圳华大基因科技服务有限公司 | Method and device for identifying rRNA pollution source of transcriptome and method for improving rRNA pollution |
CN111326216B (en) * | 2020-02-27 | 2023-07-21 | 中国科学院计算技术研究所 | Rapid partitioning method for big data gene sequencing file |
CN115495299B (en) * | 2022-11-15 | 2023-03-24 | 深圳市江元科技(集团)有限公司 | Method, system and medium for intelligent QC software detection and identification uploading |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101914619A (en) * | 2010-07-22 | 2010-12-15 | 深圳华大基因科技有限公司 | RNA (Ribonucleic Acid) sequencing quality control method and device relating to gene expression |
WO2012125848A2 (en) * | 2011-03-16 | 2012-09-20 | Baylor College Of Medicine | A method for comprehensive sequence analysis using deep sequencing technology |
Also Published As
Publication number | Publication date |
---|---|
CN105095686A (en) | 2015-11-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||