CN105095686B - Quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware - Google Patents

Quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware

Info

Publication number
CN105095686B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410205571.9A
Other languages
Chinese (zh)
Other versions
CN105095686A (en)
Inventor
周茜
宁康
苏晓泉
徐健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Institute of Bioenergy and Bioprocess Technology of CAS
Original Assignee
Qingdao Institute of Bioenergy and Bioprocess Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Institute of Bioenergy and Bioprocess Technology of CAS filed Critical Qingdao Institute of Bioenergy and Bioprocess Technology of CAS
Priority to CN201410205571.9A priority Critical patent/CN105095686B/en
Publication of CN105095686A publication Critical patent/CN105095686A/en
Application granted granted Critical
Publication of CN105095686B publication Critical patent/CN105095686B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The present invention is a quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware. The method comprises: processing the high-throughput transcriptome sequencing data in parallel on a multi-core CPU to obtain data from which sequences of low sequencing quality have been removed; using the multi-core CPU to predict and remove the rRNA sequences in those data and to qualitatively identify contaminating sequences; and compiling statistics on, and evaluating, the sequence alignment results. Because the invention runs on a multi-core CPU computer, it overcomes the computational efficiency bottleneck of single-core CPU computers and can improve the efficiency of quality control for high-throughput transcriptome data by a factor of 7 or more. Applying the invention will markedly improve the accuracy and speed of quality control for high-throughput transcriptome data and broadly support the rapid development of transcriptome sequencing research.

Description

Quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware
Technical field
The present invention relates to bioinformatics, and specifically to a quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware, which can rapidly perform quality control on high-throughput transcriptome sequencing data.
Background art
High-throughput sequencing, also known as "next-generation" sequencing, is a revolutionary change from traditional sequencing: a single run can sequence hundreds of thousands to millions of DNA/RNA molecules, and the technology is applied ever more widely in biological research. Compared with traditional Sanger sequencing, next-generation sequencing raises throughput by one to two orders of magnitude and produces far more data (100 MB to several GB). Transcriptome sequencing is an in-depth application of high-throughput sequencing that allows a detailed, deep and comprehensive analysis of the transcription profile of a species. However, owing to the limitations of high-throughput sequencing itself and to operator error in manual steps such as transcriptome extraction, raw transcriptome data often contain a share of low-quality content, including low-quality bases, contaminating sequences and ribosomal RNA (rRNA) sequences. The presence of this low-quality content greatly reduces the accuracy of downstream transcriptome analysis and can even lead to wrong conclusions. In addition, because downstream transcriptome analysis depends on aligning the sequences to a reference genome, the alignment quality of the transcriptome sequences is also one of the key measures of the overall quality of transcriptome sequencing data. In summary, quality control is a necessary and critical step in analyzing high-throughput transcriptome sequencing data. Existing quality control methods for transcriptome data focus mainly on quality assessment at the alignment level and cannot perform comprehensive quality control of bases, sequences, contamination and alignment quality at the same time.
A high-throughput transcriptome sequencing study usually has to measure multiple samples collected under different conditions or at different time points, and each sample generally needs three or more biological and technical replicates, so the number of sequenced samples is very large. A single sequencing project therefore often yields more than 20 samples and tens of GB of data, and quality control of such data requires a high-performance computer with adequate computing power together with matching analysis software. With the analysis methods in general use today, a single-CPU computer scans hundreds of millions of sequences and processes them one by one, which may take days or even a month, so the efficiency of data analysis has become a major bottleneck in this field of research.
Summary of the invention
Traditional analysis methods and computing systems cannot meet the requirements of quality control for high-throughput transcriptome sequencing data comprehensively, accurately and efficiently. To solve this problem, and because such data lend themselves to parallel processing, the present invention proposes a quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware.
The technical solution adopted by the present invention to achieve the above purpose is a quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware, comprising the following steps:
Processing the high-throughput transcriptome sequencing data in parallel on a multi-core CPU to obtain data from which low-sequencing-quality sequences have been removed;
Using the multi-core CPU to predict and remove the rRNA sequences in the data from which low-sequencing-quality sequences have been removed, and qualitatively identifying contaminating sequences;
Compiling statistics on, and evaluating, the sequence alignment results.
Removing the low-sequencing-quality sequences from the high-throughput transcriptome sequencing data using the multi-core CPU comprises the following steps:
Dividing the input file into several small sub-datasets using the Parallel-QC tool;
Assigning each sub-dataset to a different CPU core;
On the multiple CPU cores simultaneously, checking the base quality and adapter sequence of every sequence in each core's sub-dataset and, in turn, trimming the low-quality bases at both ends of each sequence according to the user-specified length, filtering out sequences that contain the user-specified proportion of low-quality bases, and deleting the adapter sequences they contain;
Merging the processed sequences to obtain the data from which low-sequencing-quality sequences have been removed.
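The split, distribute, process and merge steps above can be illustrated with a minimal Python sketch. It is not the Parallel-QC implementation: the function names, the Phred threshold of 20, the 50% low-quality cutoff and the choice to trim by quality threshold (rather than by a fixed user-specified length) are all assumptions made for illustration.

```python
from multiprocessing import Pool

# A read is (name, bases, Phred scores). All names and thresholds are illustrative.

def trim_read(read, min_q=20):
    """Cut low-quality bases off both ends of one read."""
    name, seq, quals = read
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return name, seq[start:end], quals[start:end]

def keep_read(read, min_q=20, max_low_frac=0.5, min_len=1):
    """Reject reads that are too short or carry too many low-quality bases."""
    _, seq, quals = read
    if len(seq) < min_len:
        return False
    low = sum(1 for q in quals if q < min_q)
    return low / len(seq) <= max_low_frac

def process_chunk(chunk):
    """Work done on one CPU core: trim every read, then filter."""
    trimmed = (trim_read(r) for r in chunk)
    return [r for r in trimmed if keep_read(r)]

def split_into_chunks(reads, n_chunks):
    """Divide the input into contiguous, near-equal sub-datasets."""
    size = max(1, -(-len(reads) // n_chunks))  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]

def run_qc(reads, workers=1):
    """Split, process each chunk (in parallel if workers > 1), then merge."""
    chunks = split_into_chunks(reads, workers)
    if workers == 1:
        results = [process_chunk(c) for c in chunks]
    else:
        with Pool(workers) as pool:
            results = pool.map(process_chunk, chunks)
    # pool.map returns results in chunk order, so the merge keeps input order
    return [r for part in results for r in part]
```

Because every read is handled independently, the chunks need no communication with one another, which is what makes this stage straightforward to spread over CPU cores.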
Using the multi-core CPU to predict and remove the rRNA sequences in the data from which low-sequencing-quality sequences have been removed, and qualitatively identifying contaminating sequences, comprises the following steps:
Building hidden Markov models from all rRNA sequences in the SILVA database; predicting and extracting rRNA from the transcriptome sequences by hidden Markov model search, and removing the predicted rRNA sequences from the transcriptome data;
Mapping the predicted and extracted 16S or 18S rRNA onto the known rRNA sequence library SILVA to obtain the species of origin of all sequences, aggregating the annotation results of the 16S and 18S rRNA signature sequences separately, and generating a species composition result, thereby obtaining all species that may be present in the transcriptome sequencing data together with contamination information;
Predicting and extracting rRNA from the transcriptome sequences by hidden Markov model search, and removing the predicted rRNA sequences from the transcriptome data, comprises the following steps:
Dividing the data file, from which Parallel-QC has removed the low-quality sequences, into small sub-datasets;
Assigning the sub-datasets to different CPU cores;
Predicting the 16S, 18S, 23S or 28S rRNA signature sequences of the sub-datasets simultaneously on the many CPU cores;
Merging the prediction results for all types of signature sequence;
Loading the large input data from external storage into memory in several passes, searching and extracting according to the signature sequence prediction results, and finally merging the search results.
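The two passes described above (predict per chunk, then stream the full input and extract by prediction result) can be sketched as follows. This is a simplified illustration, not rRNA-filter itself: the `is_rrna` callable is a hypothetical stand-in for the hidden Markov model search, and all function names and the batch size are assumptions.

```python
from itertools import islice

def predict_rrna_ids(chunk, is_rrna):
    """Per-core step: IDs of the reads in this chunk flagged as rRNA.

    is_rrna is a hypothetical placeholder for an HMM-based predictor.
    """
    return {name for name, seq in chunk if is_rrna(seq)}

def merge_predictions(id_sets):
    """Merge the per-chunk prediction results into one set of read IDs."""
    return set().union(*id_sets)

def batches(records, batch_size):
    """Yield successive lists of at most batch_size records from an iterator."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def partition_by_ids(records, rrna_ids, batch_size=100_000):
    """Second pass: load the input batch by batch and split it by ID."""
    rrna, clean = [], []
    for batch in batches(records, batch_size):
        for name, seq in batch:
            (rrna if name in rrna_ids else clean).append((name, seq))
    return rrna, clean
```

Loading the input in bounded batches keeps memory use independent of file size, which is presumably why the text loads the data from external storage in several passes rather than all at once.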
Compiling statistics on, and evaluating, the results of aligning the sequences to the reference genome includes counting sequences, calculating sequence coverage, and summarizing paired-end alignment information.
Counting sequences covers all sequences, successfully aligned sequences, sequences aligned to specific genomic regions, and the proportion of all sequences that each of these groups represents.
Calculating sequence coverage covers the number of genes to which sequences aligned successfully, the base coverage of each gene, and the distribution of the successfully aligned sequences across genome structures.
Summarizing paired-end alignment information covers the number of sequences for which both ends aligned successfully, the number of sequences for which only one end aligned successfully, and the insert size of the paired-end aligned sequences.
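The counting and paired-end statistics listed above can be computed from the FLAG and TLEN columns of a SAM file. The sketch below is an illustration under stated assumptions and not the SAM-stats tool: the function name and the returned keys are invented here, secondary and supplementary records are skipped so that each read is counted once, and the insert size is taken from the positive TLEN value of each pair.

```python
from statistics import median

# Bit values defined by the SAM format for the FLAG column
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
SECONDARY, SUPPLEMENTARY = 0x100, 0x800

def sam_stats(lines):
    """Summarize alignment counts and paired-end information from SAM lines."""
    total = mapped = both_ends = one_end_only = 0
    inserts = []
    for line in lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue                      # count each read only once
        total += 1
        if flag & UNMAPPED:
            continue
        mapped += 1
        if flag & PAIRED:
            if flag & MATE_UNMAPPED:
                one_end_only += 1
            else:
                both_ends += 1
                if tlen > 0:              # positive on the leftmost mate only
                    inserts.append(tlen)
    return {
        "total_reads": total,
        "mapped_reads": mapped,
        "mapped_fraction": mapped / total if total else 0.0,
        "reads_with_both_ends_mapped": both_ends,
        "reads_with_one_end_mapped": one_end_only,
        "median_insert_size": median(inserts) if inserts else None,
    }
```

In practice the lines would come from an open SAM file, for example `sam_stats(open(path))`, where `path` is whatever alignment file the user supplies.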
The present invention has the following advantages and beneficial effects:
1. It achieves comprehensive and efficient quality control of transcriptome data, covering analysis and quality control of sequencing quality, rRNA sequences, contaminating sequences and alignment results;
2. Running on a multi-core CPU computer, it overcomes the computational efficiency bottleneck of single-core CPU computers and can improve the efficiency of quality control for high-throughput transcriptome data by a factor of 7 or more;
3. Applying the invention will markedly improve the accuracy and speed of quality control for high-throughput transcriptome data and broadly support the rapid development of transcriptome sequencing research.
Description of the drawings
Fig. 1 is the hardware architecture diagram of the present invention, in which ① denotes the DMI and PCIe 2.0 buses, ② the triple-channel DDR3 memory bus, and ③ the SATA bus;
Fig. 2 is the software flow chart of the present invention, in which (1) is low-sequencing-quality data processing, (2) is the qualitative identification of rRNA sequences and contaminating sequences, and (3) is the evaluation and quality control of sequence alignment;
Fig. 3 shows test results for the same transcriptome sequencing dataset, comparing the present invention run on a 16-core CPU with a run on a single-core CPU.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments.
The technical solution adopted by the present invention is a multi-core CPU computer together with an efficient, unified software platform built on it. Its features are (1) a high-performance parallel computing and storage hardware system; and (2) a full-featured, high-performance, unified, configurable, parallelized software platform.
(1) High-performance parallel computing and storage hardware
The hardware system uses multi-socket, multi-core CPUs for large-scale parallel computation. Fig. 1 is the system structure diagram of the compute server:
First, parallel computation on multi-socket, multi-core CPUs: four processors are used, interconnected by QPI buses. Each processor has 8 independent computing cores and is equipped with triple-channel DDR3 RDIMM memory; this configuration also meets the computing requirements of a cloud computing server.
Second, cache and high-speed buses: these meet the needs of scheduling concurrent sequencing data analysis tasks and of distributing large-scale tasks in a cooperative working environment.
Finally, a RAID disk array: storage on a RAID disk array not only improves the response speed and stability of the central server but also facilitates occasional updates of the central server. It also accommodates the backup and upgrade needs of a cloud computing server.
(2) Full-featured, high-performance, unified, configurable software platform
The high-performance software platform covers low-sequencing-quality data processing, qualitative identification of contaminating sequences, qualitative and quantitative identification of rRNA contamination, and assessment of sequence alignment quality (Fig. 2). The system is named the RNA-QC-Chain software system (http://www.computationalbioenergy.org/rna-qc-chain.html, proprietary intellectual property), and its quality control steps are:
First, low-sequencing-quality data processing based on multi-core CPU parallel computation. The Parallel-QC tool (http://www.computationalbioenergy.org/parallel-qc.html, proprietary intellectual property) divides the input file into small sub-datasets and assigns them to different CPU cores. The cores then simultaneously check the base quality and adapter sequence of each sequence and, in turn, trim the low-quality bases at both ends of each sequence according to the user-specified length, filter out sequences containing a given proportion of low-quality bases, and delete the adapter sequences they contain. Finally, the filtered sequences are merged to give the data from which low-sequencing-quality sequences have been removed.
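The adapter deletion step can be illustrated with a short exact-match sketch. It is an assumption-laden simplification rather than the Parallel-QC algorithm: the function names are invented, the adapter string is supplied by the caller, and only exact matches are detected, whereas practical trimmers usually tolerate mismatches.

```python
def adapter_start(seq, adapter, min_overlap=3):
    """Index where a 3' adapter begins in seq, or len(seq) if none is found.

    Detects a full adapter anywhere in the read, or a partial adapter
    (a prefix of at least min_overlap bases) at the very end of the read.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    longest = min(len(adapter) - 1, len(seq))
    for k in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)

def clip_read(name, seq, quals, adapter):
    """Remove the adapter and the matching quality scores from one read."""
    cut = adapter_start(seq, adapter)
    return name, seq[:cut], quals[:cut]
```

Bases and quality scores are cut at the same index so that the two stay aligned for the later low-quality filtering step.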
Second, qualitative identification of contaminating sequences based on multi-core CPU parallel computation. The rRNA-filter tool is first used to remove the rRNA sequences from the data. Hidden Markov models (HMMs) are built from all rRNA sequences (including 16S, 18S, 23S and 28S rRNA sequences) in the public rRNA database SILVA, rRNA is predicted in the transcriptome sequences by HMM search, and the predicted rRNA sequences are then removed from the transcriptome data. SILVA is currently one of the most comprehensive ribosomal RNA databases in the world and covers rRNA sequences from the three domains of bacteria, archaea and eukaryotes. Our method can therefore remove as many as possible of the rRNA sequences contained in the transcriptome sequences. rRNA-filter divides the input file into small sub-datasets, assigns them to different CPU cores, predicts the 16S, 18S, 23S or 28S rRNA signature sequences of the sub-datasets simultaneously on the many CPU cores, and merges the prediction results for all types of signature sequence; it then loads the large input data from external storage into memory in several passes, searches and extracts according to the signature sequence prediction results, and finally merges the search results.
Next, 16S and 18S rRNA sequences are relatively short biomarker sequences that are widely used to identify prokaryotic and eukaryotic species. Based on the annotation of the predicted and extracted 16S or 18S rRNA, rRNA-filter qualitatively determines the species of origin of all sequences in the high-throughput sequencing data, aggregates the search results for the 16S and 18S rRNA signature sequences separately, and generates a graphical species composition result, thereby obtaining all species that may be present in the transcriptome sequencing data together with contamination information.
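Aggregating the annotation results into a species composition, and reading contamination off it, can be sketched as below. The input format (one taxon name per annotated 16S or 18S read), the function names and the 1% reporting threshold are assumptions for illustration; the taxon assignment itself, which the text obtains by mapping to SILVA, is taken as given.

```python
from collections import Counter

def composition(assignments):
    """Fraction of annotated rRNA reads assigned to each taxon.

    assignments: iterable of taxon names, one per annotated read
    (assumed to come from an upstream mapping step).
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.most_common()}

def flag_contaminants(comp, expected, min_fraction=0.01):
    """Taxa that are not expected in the sample but exceed min_fraction."""
    return sorted(t for t, f in comp.items()
                  if t not in expected and f >= min_fraction)
```

Calling `composition` once on the 16S assignments and once on the 18S assignments keeps the prokaryotic and eukaryotic results separate, in line with the separate aggregation described above.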
Third, comprehensive and accurate evaluation and quality control of the sequence alignment results. The independently developed SAM-stats tool takes sequence alignment result files in SAM format and produces accurate, comprehensive statistics on, and an evaluation of, the alignment of the transcriptome sequences to (known) genomic data. Its functions include:
Counting sequences, including all sequences, successfully aligned sequences, sequences aligned to specific genomic regions, and the proportion of all sequences that each of these groups represents;
Calculating sequence coverage, including the number of genes to which sequences aligned successfully, the base coverage of each gene, and the distribution of the successfully aligned sequences across genome structures;
Summarizing paired-end alignment information, including the number of sequences for which both ends aligned successfully, the number of sequences for which only one end aligned successfully, and the insert size of the paired-end aligned sequences.
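The per-gene base coverage mentioned in the list above can be made concrete with a small sketch. The text does not define the measure precisely, so two common readings are computed here as assumptions: breadth (the fraction of the gene's bases covered by at least one read) and mean depth. Coordinates are half-open intervals, the function name is illustrative, and spliced alignments are treated as single blocks for simplicity.

```python
def gene_coverage(gene_start, gene_end, alignments):
    """Return (breadth, mean_depth) for one gene.

    gene_start, gene_end: half-open interval [start, end) of the gene.
    alignments: iterable of (start, end) half-open intervals of aligned reads
    on the same reference sequence.
    """
    length = gene_end - gene_start
    depth = [0] * length
    for a_start, a_end in alignments:
        lo, hi = max(a_start, gene_start), min(a_end, gene_end)
        for pos in range(lo, hi):         # empty when there is no overlap
            depth[pos - gene_start] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / length, sum(depth) / length
```

For example, a 10-base gene at [100, 110) with reads at [95, 105) and [103, 108) has 8 of its 10 bases covered and 10 aligned bases in total.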
In conclusion this software platform depends on multi-core CPU hardware platform, high efficiency can be played by only cooperating The function of transcript profile sequencing data quality control.
As shown in Figure 1, the high-throughput transcript profile sequencing data method of quality control based on multi-core CPU hardware, main portion Dividing is:First, the multiple dimensioned parallelization computing capability of 4 road multi-core CPU has independent 8 calculating core per road CPU, and has There is triple channel memory.Second, cache and high-speed bus.Third, RAID disk array not only improve the sound of central server Speed and stability are answered, and is conducive to irregular central server update.It calculates and storage hardware basic configuration is:Single channel CPU at least has 4 separate physicals and calculates core, dual access memory 2GB or more, hard disk at least 50G or more, CPU and storage Between interconnect at a high speed.
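Whether a given machine meets the basic configuration above can be checked with a few standard-library calls. This is only a rough sketch: the function name and return format are assumptions, `os.cpu_count()` reports logical rather than physical cores, the memory query relies on `os.sysconf` names that are available on Linux but not on every platform, and the disk check uses the total size of the file system holding `path`.

```python
import os
import shutil

def check_minimum_config(path=".", min_cores=4, min_mem_gb=2, min_disk_gb=50):
    """Compare this machine with the stated minimum hardware configuration."""
    cores = os.cpu_count() or 0           # logical cores, not physical ones
    try:
        mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        mem_gb = mem_bytes / 1024**3
    except (AttributeError, ValueError, OSError):
        mem_gb = None                     # not available on this platform
    disk_gb = shutil.disk_usage(path).total / 1024**3
    return {
        "cores_ok": cores >= min_cores,
        "memory_ok": None if mem_gb is None else mem_gb >= min_mem_gb,
        "disk_ok": disk_gb >= min_disk_gb,
    }
```

A `memory_ok` value of `None` means the check could not be performed, not that it failed.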
As shown in Fig. 2, the main steps of the workflow are as follows. First, the Parallel-QC software tool processes the transcriptome sequences on the multi-core CPU: it trims the low-quality bases at both ends of the input sequences, filters out sequences containing a given proportion of low-quality bases, deletes the adapter sequences they contain, and then merges the results to give the high-sequencing-quality sequence data. Next, the rRNA-filter tool predicts rRNA sequences and qualitatively detects contaminating sequences in the data from the previous step: a parallel multithreaded computing tool extracts and removes the predicted rRNA sequences (16S/18S or 23S/28S) and maps the 16S or 18S sequences among them onto the known rRNA sequence library SILVA to obtain the species of origin of all sequences (including possible contaminating species). Finally, for the results of aligning the sequences to the reference genome (files in SAM format), the SAM-stats software tool counts and evaluates the quality of the transcriptome data from the alignment perspective, including the number of successfully aligned sequences, gene coverage and the outcome of paired-end alignment. The above results are combined to generate graphical analysis results and an analysis report. The basic configuration of the software platform is: a Linux operating system with a pre-installed GCC runtime environment, a CUDA runtime environment (3.0 or above), RNA-QC-Chain software system version 1.0 or above, and Parallel-META software version 2.0 or above. The RNA-QC-Chain and Parallel-META software systems both run from the command line and come with electronic user instructions. The official website (http://www.computationalbioenergy.org/software.html) also provides a long-term software update service.
The method of the present invention overcomes the computational efficiency bottleneck of single-core CPU computers and improves the efficiency of quality control for high-throughput transcriptome data by a factor of 7 or more. As shown in Fig. 3, a test on the same transcriptome sequencing dataset shows that the entire quality control process completes in 23 minutes on a 16-core CPU, whereas it takes 180 minutes on a single-core CPU.
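The factor of 7 or more follows directly from the two timings reported for Fig. 3. With $T_1 = 180$ minutes on one core and $T_{16} = 23$ minutes on $N = 16$ cores, the speedup $S$ and the parallel efficiency $E$ are:

\[
S = \frac{T_1}{T_{16}} = \frac{180}{23} \approx 7.8, \qquad E = \frac{S}{N} \approx \frac{7.8}{16} \approx 0.49
\]

As a rough interpretation only, and assuming that Amdahl's law describes this run (the patent makes no such statement), the implied serial fraction $f$ of the workload is:

\[
S = \frac{1}{f + \dfrac{1 - f}{N}} \;\Longrightarrow\; f = \frac{N/S - 1}{N - 1} = \frac{16/7.83 - 1}{15} \approx 0.07
\]

Under that assumption, roughly 7% of the single-core run time would be spent in steps that do not parallelize, such as splitting the input and merging the results.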

Claims (6)

1. A quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware, characterized by comprising the following steps:
processing the high-throughput transcriptome sequencing data in parallel on a multi-core CPU to obtain data from which low-sequencing-quality sequences have been removed;
using the multi-core CPU to predict and remove the rRNA sequences in the data from which low-sequencing-quality sequences have been removed, and qualitatively identifying contaminating sequences;
compiling statistics on, and evaluating, the sequence alignment results;
wherein using the multi-core CPU to predict and remove the rRNA sequences in the data from which low-sequencing-quality sequences have been removed, and qualitatively identifying contaminating sequences, comprises the following steps:
building hidden Markov models from all rRNA sequences in the SILVA database; predicting and extracting rRNA from the transcriptome sequences by hidden Markov model search, and removing the predicted rRNA sequences from the transcriptome data;
mapping the predicted and extracted 16S or 18S rRNA onto the known rRNA sequence library SILVA to obtain the species of origin of all sequences, aggregating the annotation results of the 16S and 18S rRNA signature sequences separately, and generating a species composition result, thereby obtaining all species that may be present in the transcriptome sequencing data together with contamination information;
wherein predicting and extracting rRNA from the transcriptome sequences by hidden Markov model search comprises the following steps:
dividing the data file, from which Parallel-QC has removed the low-quality sequences, into small sub-datasets;
assigning the sub-datasets to different CPU cores;
predicting the 16S, 18S, 23S or 28S rRNA signature sequences of the sub-datasets simultaneously on the many CPU cores;
merging the prediction results for all types of signature sequence;
loading the large input data from external storage into memory in several passes, searching and extracting according to the signature sequence prediction results, and finally merging the search results.
2. The quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware according to claim 1, characterized in that removing the low-sequencing-quality sequences from the high-throughput transcriptome sequencing data using the multi-core CPU comprises the following steps:
dividing the input file into several small sub-datasets using the Parallel-QC tool;
assigning each sub-dataset to a different CPU core;
on the multiple CPU cores simultaneously, checking the base quality and adapter sequence of every sequence in each core's sub-dataset and, in turn, trimming the low-quality bases at both ends of each sequence according to the user-specified length, filtering out sequences that contain the user-specified proportion of low-quality bases, and deleting the adapter sequences they contain;
merging the processed sequences to obtain the data from which low-sequencing-quality sequences have been removed.
3. The quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware according to claim 1, characterized in that compiling statistics on, and evaluating, the results of aligning the sequences to the reference genome includes counting sequences, calculating sequence coverage, and summarizing paired-end alignment information.
4. The quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware according to claim 3, characterized in that counting sequences covers all sequences, successfully aligned sequences, sequences aligned to specific genomic regions, and the proportion of all sequences that each of these groups represents.
5. The quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware according to claim 3, characterized in that calculating sequence coverage covers the number of genes to which sequences aligned successfully, the base coverage of each gene, and the distribution of the successfully aligned sequences across genome structures.
6. The quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware according to claim 3, characterized in that summarizing paired-end alignment information covers the number of sequences for which both ends aligned successfully, the number of sequences for which only one end aligned successfully, and the insert size of the paired-end aligned sequences.
CN201410205571.9A 2014-05-15 2014-05-15 Quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware Active CN105095686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410205571.9A CN105095686B (en) 2014-05-15 2014-05-15 Quality control method for high-throughput transcriptome sequencing data based on multi-core CPU hardware

Publications (2)

Publication Number Publication Date
CN105095686A CN105095686A (en) 2015-11-25
CN105095686B true CN105095686B (en) 2018-08-14

Family

ID=54576104

Country Status (1)

Country Link
CN (1) CN105095686B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant