WO2022082879A1 - Gene sequencing data processing method and gene sequencing data processing device - Google Patents

Gene sequencing data processing method and gene sequencing data processing device

Info

Publication number
WO2022082879A1
Authority
WO
WIPO (PCT)
Prior art keywords
algorithm
sequencing data
gene sequencing
idle state
gpu
Prior art date
Application number
PCT/CN2020/127101
Other languages
English (en)
French (fr)
Inventor
张优劲
于闯
孔令翔
何惠
贺增泉
晋向前
Original Assignee
深圳华大基因股份有限公司
华大基因健康科技(香港)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因股份有限公司, 华大基因健康科技(香港)有限公司
Priority to JP2021571845A (patent JP7393439B2, ja)
Priority to AU2020450960A (patent AU2020450960A1, en)
Priority to EP20937176.4A (patent EP4235678A1, en)
Priority to IL288594A (patent IL288594A, en)
Publication of WO2022082879A1 (patent WO2022082879A1, zh)
Priority to AU2023266239A (patent AU2023266239A1, en)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4221 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The invention relates to the technical field of data processing, in particular to a gene sequencing data processing method and a gene sequencing data processing device.
  • The traditional alignment tool BWA uses the BWT algorithm together with the Smith-Waterman algorithm for inexact alignment.
  • The algorithm is also implemented with the SSE2 instructions of the x86 architecture.
  • Although the x86-based BWT alignment algorithm runs fast on x86 CPUs, it cannot process large batches simultaneously, and the BWT algorithm does not adapt to the SIMT execution model of the GPU. As a result, BWT runs much less efficiently on the GPU, which lowers the efficiency of the entire alignment process.
  • The existing Smith-Waterman algorithm runs only on the x86 architecture; on the ARM platform it lacks SSE2 acceleration and runs slowly, and the algorithm is also not suited to computing on the GPU architecture.
  • The present invention provides a gene sequencing data processing device and a gene sequencing data processing method to solve the problem that the steps of existing gene sequencing data analysis can only run on the x86 architecture and run slowly on the GPU, which makes gene sequencing data processing inefficient.
  • An embodiment of the present invention provides a gene sequencing data processing method applied to a gene sequencing data processing device, wherein the gene sequencing data processing device is a heterogeneous multi-core architecture including an ARM architecture, a GPU architecture, and a PCI bus; the ARM architecture is connected to the GPU architecture through the PCI bus; the ARM architecture includes at least one CPU module; the GPU architecture includes at least one GPU module; and the method comprises the following steps:
  • Step S1: the CPU module in the idle state reads the gene sequencing data in batches to obtain batched gene sequencing data;
  • Step S2: the CPU module in the idle state divides the gene analysis method to obtain a first algorithm and a second algorithm;
  • Step S3: the CPU module in the idle state divides the batched gene sequencing data according to the first algorithm to obtain short sequences, and sends each of the short sequences and the second algorithm to the GPU module in the idle state;
  • Step S4: the GPU module in the idle state calculates each of the short sequences according to the second algorithm, and sends the calculation result to the CPU module in the idle state;
  • Step S5: the CPU module in the idle state obtains a batch processing result according to the calculation result and the first algorithm;
  • Steps S1 to S5 are repeated until the processing of the gene sequencing data is completed, and the CPU module in the idle state performs an integration operation on the batch processing results to obtain a final processing result.
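  • The control flow of steps S1 to S5 can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function names (`run_batches`, `split3`, `count_g`) and the toy workload are assumptions, and the GPU step is stood in for by an ordinary Python function.

```python
from typing import Callable, Iterable, List, Sequence


def run_batches(
    batches: Iterable[Sequence[str]],
    first_split: Callable[[Sequence[str]], List[str]],
    second_compute: Callable[[List[str]], List[int]],
    first_finish: Callable[[List[int]], int],
    integrate: Callable[[List[int]], int],
) -> int:
    """Run steps S1 to S5 per batch, then integrate the batch results.

    Step S2 (dividing the analysis method) is modelled by the caller passing
    the CPU-side parts (first_split, first_finish) and the GPU-side part
    (second_compute) as separate functions.
    """
    batch_results = []
    for batch in batches:                            # S1: one batch of reads
        short_seqs = first_split(batch)              # S3: first algorithm (CPU)
        partial = second_compute(short_seqs)         # S4: second algorithm (GPU stand-in)
        batch_results.append(first_finish(partial))  # S5: batch processing result
    return integrate(batch_results)                  # final integration on the CPU


# Hypothetical toy workload: cut reads into 3-base pieces, count G per piece.
def split3(batch):
    return [read[i:i + 3] for read in batch for i in range(0, len(read), 3)]


def count_g(short_seqs):
    return [s.count("G") for s in short_seqs]


total = run_batches([["ACGTAC", "GGGTTT"], ["TTGACA"]], split3, count_g, sum, sum)
print(total)  # 5
```

  • In this sketch the batches are processed one after another; the overlap between CPU, PCI transfer and GPU described later in the document is deliberately left out.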
  • The CPU module in the idle state scans each of the GPU modules, determines the number of GPU modules in the idle state and the data processing capacity of the GPU modules in the idle state, and reads the gene sequencing data in batches according to that number and that data processing capacity.
  • The gene analysis algorithms include a gene alignment algorithm, the Dotplot algorithm, the BLAST algorithm, the PAM algorithm, the HMM algorithm and an AI inference algorithm.
  • The gene alignment algorithm includes a BWT algorithm, and the first algorithm includes an anchor point cutting algorithm;
  • the CPU module in the idle state uses the anchor point cutting algorithm to locate anchor points in the batched gene sequencing data, extends N bp forward and N bp backward from each anchor point as the center, and uses NEON instructions to cut the batched gene sequencing data into lengths of 2N+1 bp to obtain each of the short sequences, where N is any positive integer.
  • step of obtaining each of the short sequences including: using the following formula to calculate and obtain each of the short sequences:
  • x represents the number of anchor points
  • N represents the number of extended bp
  • L represents the length of the batch gene sequencing data
  • The second algorithm is a Hash algorithm; the GPU module in the idle state performs a Hash operation on each of the short sequences according to the Hash algorithm, obtains a Hash calculation result, and sends the Hash calculation result to the CPU module in the idle state, wherein the Hash calculation result is the value of the BWT algorithm matrix and is used for the calculation of the BWT algorithm matrix.
  • the first algorithm also includes a BWT matrix transformation algorithm
  • the CPU module in the idle state uses the BWT matrix transformation algorithm to transform the BWT algorithm matrix to obtain a BWT transformation result of the short sequence.
  • The gene alignment algorithm includes the Smith-Waterman algorithm, and the second algorithm includes a scoring matrix algorithm;
  • the GPU module in the idle state calculates the Smith-Waterman scoring matrix according to the scoring matrix algorithm, each of the short sequences and the reference species sequence, and sends the Smith-Waterman scoring matrix to the CPU module in the idle state.
  • the Smith-Waterman scoring matrix is calculated using the following formula:
  • M represents the Smith-Waterman scoring matrix
  • R represents the length of the candidate interval sequence of the reference species
  • C represents the length of the short sequence formed by screening and splicing each short sequence received from the CPU module in the idle state
  • L represents the length of the short sequence.
  • a and b represent constants.
  • An embodiment of the present invention provides a gene sequencing data processing device; the gene sequencing data processing device is a heterogeneous multi-core framework and executes the gene sequencing data processing method.
  • The gene sequencing data processing method in the embodiments of the present invention is applied to the gene sequencing data processing device. The device is a heterogeneous multi-core framework including an ARM framework, a GPU framework and a PCI bus, wherein the ARM framework includes at least one CPU module, the GPU framework includes at least one GPU module, and the CPU module is connected to the GPU module through the PCI bus so that information can be transmitted between the two.
  • In the method, the CPU module in the idle state is mainly used to read the gene sequencing data in batches and divide the gene analysis method, so as to obtain the batched gene sequencing data, the first algorithm (the algorithm best suited to run on the CPU module) and the second algorithm (the algorithm best suited to run on the GPU module). The first algorithm is then used to segment the batched gene sequencing data into a series of short sequences, and these short sequences and the second algorithm are transmitted through the PCI bus to the GPU module in the idle state; the GPU module calculates these short sequences according to the second algorithm and returns the calculation result to the CPU module in the idle state; the CPU module in the idle state obtains a batch processing result according to the calculation result and the first algorithm.
  • The gene sequencing data processing device and the gene sequencing data processing method split the analysis method (i.e. the analysis process) for the gene sequencing data and run the parts on the CPU module and the GPU module respectively according to their characteristics, which greatly improves the efficiency of gene sequencing data analysis.
  • The gene sequencing data processing device can be provided with multiple CPU modules and GPU modules, and multiple GPU modules can simultaneously calculate short sequences of different lengths, which can solve the problem of low GPU parallel efficiency.
  • FIG. 1 is a schematic structural diagram of a gene sequencing data processing device in an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a data processing process of a gene sequencing data processing device in an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of anchor cutting performed by a CPU module on batched gene sequencing data in an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a GPU module using a Hash algorithm to perform a Hash operation on a short sequence in an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a method for processing gene sequencing data in an embodiment of the present invention.
  • Gene refers to a DNA or RNA sequence that carries genetic information (that is, a gene is a DNA or RNA segment with genetic effects), also known as a genetic factor, and is the basic genetic unit that controls traits. Genes express the genetic information they carry by directing the synthesis of proteins, thereby controlling the performance of individual organisms.
  • Gene sequencing is a new type of gene detection technology that analyzes and determines the complete gene sequence from blood or saliva, in order to predict the likelihood of various diseases as well as the behavioral characteristics and reasonable behavior of individuals.
  • Short sequence (read): a short sequence fragment, which is the sequencing data generated by a high-throughput sequencer. Sequencing an entire genome generates tens of millions of reads, and splicing these reads together yields the full sequence of the genome.
  • The short sequences (reads) produced by NGS are stored in a FASTQ file. Although they originally come from an ordered genome, the order relationship between different reads in the file is completely lost after DNA library construction and sequencing. There is therefore no positional relationship between two adjacent reads in the FASTQ file; they are just short sequences randomly derived from some position in the original genome. We therefore need to take this large pile of short sequences, compare them one by one with the reference genome of the species, find the position of each read on the reference genome, and then arrange them in order. This process is called alignment of the sequencing data.
  • Alignment algorithms: computational methods for sequence alignment generally fall into two categories, global alignment and local alignment. Computing a global alignment is a form of global optimization that forces the alignment to span the entire length of all query sequences. In contrast, local alignments identify only regions of local similarity within long sequences that are often very different overall. Local alignments are often preferable, but can be more difficult to compute because of the additional challenge of identifying the regions of similarity.
  • Various computational algorithms have been applied to sequence alignment problems, including slow but formal optimization methods like dynamic programming, efficient but incomplete heuristics, or probabilistic methods designed to search large databases.
  • ARM: the ARM architecture (Advanced RISC Machine, earlier known as Acorn RISC Machine) is a family of reduced instruction set computing (RISC) processor architectures that is widely used in embedded system designs. Because of its energy efficiency, it has also been adopted in many other fields.
  • The ARM processor is very suitable for the field of mobile communication, and its main design goals are low cost, high performance, and low power consumption. On the other hand, supercomputers consume a lot of power, and ARM is also seen as a more efficient choice there.
  • ARM Holdings developed this architecture and licenses it to other companies, which implement one of ARM's architectures and develop their own system-on-chip (SoC).
  • GPU: Graphics Processing Unit, also called a display core, visual processor, display chip, or graphics chip.
  • The graphics processor reduces the dependence of the graphics card on the central processing unit (CPU) and takes over part of the work originally performed by the CPU; the effect is most obvious when performing 3D graphics operations.
  • CUDA: Compute Unified Device Architecture, a unified computing architecture.
  • CUDA is NVIDIA's official name for its GPGPU technology.
  • With this technology, users can perform computing on NVIDIA GeForce 8 and later GPUs and on newer Quadro GPUs.
  • It also made it possible to use the GPU as a development environment for a C compiler.
  • In its marketing, NVIDIA tends to promote the compiler and the architecture together, which causes confusion.
  • CUDA is compatible with OpenCL or its own C compiler. Whether the code is written in CUDA C or OpenCL, the instructions are eventually converted into PTX code by the driver and then executed by the display core.
  • BWT (Burrows–Wheeler Transform, also called block-sorting compression) is an algorithm used in data compression technology (such as bzip2).
  • The algorithm was invented in 1994 by Michael Burrows and David Wheeler at the DEC Systems Research Center in Palo Alto, California. It is based on an unpublished transformation method previously invented by Wheeler in 1983.
  • When a string is converted with this algorithm, the algorithm changes only the order of the characters in the string, not the characters themselves. If the original string contains several substrings that appear multiple times, the converted string will contain runs of consecutive repeated characters, which is useful for compression. The transform therefore makes the text easier to compress with techniques that exploit consecutive repeated characters, such as the move-to-front (MTF) transform and run-length coding.
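  • The behaviour described above can be illustrated with a textbook, rotation-sorting version of the transform. This is a minimal sketch for short strings, not the indexed form used by alignment tools; the sentinel character `$` and the function names are assumptions made for the example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform by sorting all rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending and re-sorting columns."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana")
print(transformed)               # annb$aa
print(inverse_bwt(transformed))  # banana
```

  • The output `annb$aa` contains the same characters as `banana$` in a different order, with the repeated `n` and `a` grouped into runs.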
  • Smith-Waterman (the Smith-Waterman algorithm) is an algorithm that performs local sequence alignment (as opposed to global alignment) to find similar regions between two nucleotide sequences or protein sequences.
  • The purpose of this algorithm is not to align the entire sequences, but to find fragments with high similarity in the two sequences.
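  • A compact version of the standard Smith-Waterman recurrence shows how a local alignment is found. This is a plain dynamic-programming sketch with a linear gap penalty; the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for the example and are not the constants of the patent's own scoring formula.

```python
from typing import List, Tuple


def smith_waterman(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1
) -> Tuple[int, str, str]:
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; scores are clamped at 0 (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```

  • Only the shared fragment `ACGT` is reported; the dissimilar flanks of both sequences are ignored, which is the difference from a global alignment.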
  • HASH: also known as a hash algorithm or hash function, it is a method of creating a small digital "fingerprint" from any kind of data.
  • The hash function compresses a message or data into a digest, making the amount of data smaller and fixing its format. The function scrambles and mixes the data to create a fingerprint called a hash value (also hash code, hash sum, or hash).
  • Hash values are usually represented by a short string of random letters and numbers.
  • Good hash functions rarely produce hash collisions in the input domain. In hash tables and data processing, failing to suppress collisions makes it harder to distinguish data and to find database records.
  • SSE2 (Streaming SIMD Extensions 2), is a SIMD (Single Instruction Multiple Data) instruction set of the IA-32 architecture.
  • SSE2 is an instruction set that was launched in 2001 with Intel's release of the first-generation Pentium 4 processor. It extends the earlier SSE instruction set and can completely replace the MMX instruction set.
  • FIG. 1 is a schematic structural diagram of a gene sequencing data processing device. As shown in FIG. 1, the gene sequencing data processing device is a heterogeneous multi-core framework including an ARM framework 10, a GPU framework 20 and a PCI bus 30; the ARM framework 10 is connected to the GPU framework 20 through the PCI bus 30; the ARM framework 10 includes at least one CPU module, and the GPU framework 20 includes at least one GPU module. The CPU module in the idle state is used to read the gene sequencing data in batches to obtain the batched gene sequencing data, to divide the gene analysis method to obtain the first algorithm and the second algorithm, to divide the batched gene sequencing data according to the first algorithm to obtain each short sequence, and to send each short sequence and the second algorithm to the GPU module in the idle state. The GPU module in the idle state is used to calculate each short sequence according to the second algorithm and to send the calculation result to the CPU module in the idle state. The CPU module in the idle state is also used to obtain batch processing results according to the calculation result and the first algorithm. The CPU module in the idle state and the GPU module in the idle state repeatedly execute the above steps until the processing of the gene sequencing data is completed.
  • The gene sequencing data processing device is a heterogeneous multi-core framework, that is, an ARM+GPU framework, wherein the ARM framework 10 includes a CPU module and the GPU framework 20 includes a GPU module. The number of CPU modules and GPU modules is not fixed and can be determined according to actual needs, for example according to the amount of gene sequencing data, the CPU module performance, the GPU module performance (such as GPU memory, number of CUDA cores, CUDA core frequency), and the complexity of the algorithms used in the gene analysis.
  • The processing or computing capabilities of each CPU module may be the same or different.
  • The processing or computing capabilities of each GPU module can also be the same or different.
  • The GPU module may be a GPU computing card, where the GPU computing card usually adopts a SIMT architecture.
  • The CPU module adopts the NEON acceleration technology; using this acceleration technology can further improve the running speed of the CPU module.
  • The gene sequencing data processing device can use the Jetson Nano TX1 released by NVIDIA.
  • The device uses a Maxwell-architecture GPU with 128 CUDA cores and a computing power of 472 GFLOPS.
  • The Jetson Nano also has a 4-core A57 processor as the ARM CPU core arithmetic unit.
  • Gene analysis methods refer to the methods used in the analysis and processing of gene sequencing data, including sequence comparison, gene set enrichment analysis (including GO analysis, KEGG analysis), and gene regulatory network analysis.
  • The first algorithm and the second algorithm are obtained by dividing a gene analysis method according to its characteristics: the parts suitable for CPU module processing are split off from the gene analysis method to form the first algorithm, and the parts suitable for GPU module processing are split off to form the second algorithm. The first algorithm and the second algorithm are therefore each a part of the gene analysis method and can each consist of one or more small steps. There are no strict algorithmic rules for the segmentation; it only has to satisfy the segmentation principle.
  • The segmentation principle is mainly as follows. The first algorithm usually requires many logical judgments, and there are dependencies between its calculation results; for example, the second calculation step depends on or is based on the result of the first step, or yes/no judgments are involved.
  • The second algorithm is usually one in which many data items can be computed at the same time, with no logical judgment involved between the items and no dependency between them.
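  • The segmentation principle can be expressed as a simple rule over the steps of an analysis method. The sketch below is only an illustration: the `Step` record, its two flags and the values assigned to the example steps are assumptions, although the resulting assignment (anchor point cutting and BWT matrix transformation to the first algorithm, Hash to the second) matches the BWT embodiment described in this document.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    many_branches: bool        # involves many yes/no logical judgments
    depends_on_previous: bool  # needs the result of an earlier calculation


def split_method(steps: List[Step]) -> Tuple[List[str], List[str]]:
    """Split an analysis method into (first algorithm, second algorithm)."""
    first, second = [], []
    for step in steps:
        # Branch-heavy or dependent steps stay on the CPU; independent,
        # data-parallel steps go to the GPU.
        if step.many_branches or step.depends_on_previous:
            first.append(step.name)
        else:
            second.append(step.name)
    return first, second


# Hypothetical flag values for the steps of the BWT embodiment.
steps = [
    Step("anchor point cutting", many_branches=True, depends_on_previous=False),
    Step("Hash of short sequences", many_branches=False, depends_on_previous=False),
    Step("BWT matrix transformation", many_branches=True, depends_on_previous=True),
]
first_algorithm, second_algorithm = split_method(steps)
print(first_algorithm)   # ['anchor point cutting', 'BWT matrix transformation']
print(second_algorithm)  # ['Hash of short sequences']
```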
  • The state of each CPU module may be different; that is, some CPU modules are in a running state while others are in an idle state.
  • The GPU modules in the GPU framework 20 are in a similar situation. Therefore, in this embodiment, the CPU modules and GPU modules in the idle state are used to perform the corresponding operations, and the selected CPU modules and GPU modules may be all of the modules in the idle state or only some of them.
  • The gene sequencing data may be data obtained by gene sequencing of any species, including DNA sequencing fragments, RNA sequencing fragments, and the like. Since one sequencing run generates a large amount of data, the data can be analyzed and processed in batches, which avoids problems such as data transmission congestion. Therefore, in this embodiment, the CPU modules in the idle state read the gene sequencing data in batches, and the amount of gene sequencing data read each time need not be equal. Specifically, the number of GPU modules, the data processing capability of each GPU module, the data reading capability of the CPU module and the data transmission capability of the PCI bus are considered together to determine the most suitable amount of gene sequencing data, so as to achieve the highest possible data processing efficiency.
  • The first algorithm is used to segment the batched gene sequencing data; the resulting short sequences may differ in length, and the number of short sequences is not fixed. Factors such as the number of GPU modules in the idle state and the GPU processing capacity are considered in order to select the most appropriate values.
  • While the GPU module in the idle state calculates each short sequence according to the second algorithm, the CPU module in the idle state can read and divide the next batch of gene sequencing data. When the GPU module in the idle state finishes processing the short sequences, it transmits the calculation results to the CPU module in the idle state, and the CPU module obtains the batch processing result from the calculation results and the first algorithm.
  • These steps are repeated continuously, forming a pipeline between the CPU module and the GPU module until all the gene sequencing data are processed.
  • The gene sequencing data processing device is a heterogeneous multi-core framework including an ARM framework 10, a GPU framework 20 and a PCI bus 30, wherein the ARM framework includes at least one CPU module, the GPU framework includes at least one GPU module, and the CPU module is connected to the GPU module through the PCI bus so that information can be transmitted between them.
  • The CPU module in the idle state is mainly used to read the gene sequencing data in batches and divide the gene analysis method, so as to obtain the batched gene sequencing data, the first algorithm (the algorithm best suited to run on the CPU module) and the second algorithm (the algorithm best suited to run on the GPU module). It then uses the first algorithm to segment the batched gene sequencing data into a series of short sequences, and transmits these short sequences and the second algorithm through the PCI bus to the GPU module in the idle state; the GPU module calculates these short sequences according to the second algorithm and returns the calculation result to the CPU module in the idle state; the CPU module in the idle state obtains a batch processing result according to the calculation result and the first algorithm.
  • The CPU module in the idle state and the GPU module in the idle state repeatedly perform the above steps until the gene sequencing data is processed, and then the CPU module in the idle state integrates the batch processing results to obtain the final processing result.
  • The gene sequencing data processing device and the gene sequencing data processing method split the analysis method (i.e. the analysis process) for the gene sequencing data and run the parts on the CPU module and the GPU module respectively according to their characteristics, which greatly improves the efficiency of gene sequencing data analysis.
  • The gene sequencing data processing device can be provided with multiple CPU modules and GPU modules, and multiple GPU modules can simultaneously calculate short sequences of different lengths, which can solve the problem of low GPU parallel efficiency.
  • The CPU module in the idle state is further configured to scan each GPU module, determine the number of GPU modules in the idle state and the data processing capacity of the GPU modules in the idle state, and read the gene sequencing data in batches according to that number and that data processing capacity.
  • When the CPU module in the idle state starts the gene analysis, it can scan the GPU modules to determine the number of GPUs currently available and the data processing capacity of the available GPU modules, determine from these the amount of gene sequencing data to read in this batch, and then read the gene sequencing data according to this amount.
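  • The scan-then-size logic can be sketched as follows. The sketch is hypothetical: the `GpuModule` record, its `capacity_reads` field and the rule of summing the capacities of the idle modules are assumptions, since the document does not say how the batch amount is derived from the scan result, and no real GPU query API is called.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GpuModule:
    name: str
    idle: bool
    capacity_reads: int  # assumed: reads this module can take in one batch


def plan_batch(gpus: List[GpuModule]) -> Dict[str, object]:
    """Scan the GPU modules and size the next batch from the idle ones."""
    idle = [gpu for gpu in gpus if gpu.idle]
    return {
        "idle_gpus": [gpu.name for gpu in idle],
        # Assumed rule: the batch holds what the idle modules can process together.
        "batch_size": sum(gpu.capacity_reads for gpu in idle),
    }


gpus = [
    GpuModule("gpu0", idle=True, capacity_reads=1000),
    GpuModule("gpu1", idle=False, capacity_reads=2000),
    GpuModule("gpu2", idle=True, capacity_reads=500),
]
plan = plan_batch(gpus)
print(plan)  # {'idle_gpus': ['gpu0', 'gpu2'], 'batch_size': 1500}
```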
  • Time T2: the split short sequences are transmitted through the PCI bus to one of the idle GPU modules, and the CPU module can then process the next batch of data Di+1, forming a 2-stage pipeline.
  • Time T3: once the Di data has been transferred to the video memory of the GPU, the second algorithm can be started on the GPU. At this time, Di+1 enters the PCI transmission stage and the CPU module processes the next batch of data Di+2, forming a 3-stage pipeline.
  • Time T4: the calculation on the Di data is completed, and the calculation results are sent back to the CPU module through PCI. At this time, Di+1 enters the GPU module calculation stage, Di+2 enters the PCI input stage, and Di+3 is processed by the CPU module, forming a 4-stage pipeline.
  • Time T5: after the calculation result of the Di data is returned, it is handed to the CPU module, which uses the first algorithm to complete the subsequent stage of the alignment algorithm. At this time, a 5-stage pipeline is formed.
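  • The stage occupancy at each time step can be reproduced with a small simulation. This is an idealized sketch in which every stage takes exactly one time step; the stage names and the assumption that batch Di is in the CPU splitting stage at time T1 are inferred from the steps described above, not stated in the text.

```python
from typing import Dict

# The five stages a batch passes through, in order.
STAGES = ["CPU split", "PCI in", "GPU compute", "PCI out", "CPU finish"]


def pipeline_state(t: int, num_batches: int) -> Dict[str, int]:
    """Return {stage name: batch offset k} at time step T<t>, where k means Di+k."""
    state = {}
    for k in range(num_batches):
        stage = t - 1 - k  # batch Di+k enters the first stage at time T(k+1)
        if 0 <= stage < len(STAGES):
            state[STAGES[stage]] = k
    return state


for t in range(1, 6):
    occupied = pipeline_state(t, num_batches=8)
    print(f"T{t}: {len(occupied)} stage(s) busy -> {occupied}")
```

  • At T4 the simulation places Di in the PCI return stage, Di+1 in GPU computation, Di+2 in PCI input and Di+3 on the CPU, and at T5 all five stages are occupied.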
  • The gene analysis algorithm includes a gene alignment algorithm, the Dotplot algorithm, the BLAST algorithm, the PAM algorithm, the HMM algorithm, and an AI inference algorithm.
  • The Dotplot algorithm and the BLAST algorithm are sequence alignment algorithms.
  • The PAM algorithm is a data mining clustering algorithm that can be used in single-cell sequencing to analyze cell subsets.
  • The HMM algorithm (hidden Markov model algorithm) is a statistical model used to describe a Markov process with hidden unknown parameters; it can be used in the prediction of target genes.
  • The AI inference algorithm (for example DeepVariant), a deep learning algorithm, can be used to identify genetic mutations and the like.
  • The AI inference algorithm may be an inference algorithm related to a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network).
  • When the gene analysis algorithm is the Dotplot algorithm, the BLAST algorithm, or the PAM algorithm, the algorithm usually needs to be ported to CUDA first. Porting the algorithm to CUDA makes it better suited to run on the gene sequencing data processing device in the embodiment of the present invention.
  • The gene alignment algorithm includes the BWT algorithm, and the first algorithm includes the anchor point cutting algorithm; the CPU module in the idle state is further configured to use the anchor point cutting algorithm to locate anchor points in the batched gene sequencing data, extend N bp forward and N bp backward from each anchor point as the center, and use NEON instructions to cut the batched gene sequencing data into lengths of 2N+1 bp to obtain each short sequence, where N is any positive integer.
  • the step of obtaining each short sequence includes: using the following formula to calculate and obtain each short sequence:
  • x represents the number of anchor points
  • N represents the number of extended bp
  • L represents the length of the batch gene sequencing data.
  • the gene alignment algorithm may be the BWT algorithm
  • the first algorithm may be the anchor point cutting algorithm and the BWT matrix transformation algorithm
  • the second algorithm may be the Hash algorithm.
  • the specific process is as follows: as shown in Figure 3, the idle CPU module processes the data Di with the first algorithm (i.e., the anchor point cutting algorithm); first, anchor points are located in the batch-read gene sequencing data (i.e., a read) of length L.
  • from each anchor point, N bp are extended forward and backward to obtain a short read of length 2N+1, and NEON instructions are then used to cut and move the read in segments of length 2N+1.
  • when the number of anchor points is x, N satisfies the following formula: (2*N+1)*x<L
  • x represents the number of anchor points
  • N represents the number of bp extended
  • L represents the length of the batched gene sequencing data. Cutting in this way yields multiple short sequences that are suitable for processing on the GPU module.
  • the second algorithm is a Hash algorithm
  • the GPU module in the idle state is also used to perform a Hash operation on each short sequence according to the Hash algorithm, obtain a Hash calculation result, and send the Hash calculation result to the idle state CPU module
  • the Hash calculation result is the value of the BWT algorithm matrix, which is used for the calculation of the BWT algorithm matrix.
  • the gene alignment algorithm may be the BWT algorithm
  • the first algorithm may be the anchor point cutting algorithm and the BWT matrix transformation algorithm
  • the second algorithm may be the Hash algorithm.
  • as shown in Figure 4, the x*K short sequences produced by the first algorithm are transferred into the video memory of the idle GPU module, where K represents the amount of data in Di; the number of short sequences is positively correlated with the total video memory of the GPU modules.
  • because the Hash algorithm suits the SIMT architecture of the GPU, a GPU kernel function performs the hash calculation on many short sequences at once to obtain the Hash calculation result, which is sent to the idle CPU module; the Hash calculation result is the value of the BWT algorithm matrix and is used in calculating that matrix.
  • compared with other traditional calculations (such as the k-mer site calculation algorithm), using the Hash algorithm can greatly save memory space.
  • the first algorithm further includes a BWT matrix transformation algorithm; the CPU module in the idle state is further configured to use the BWT matrix transformation algorithm to transform the BWT algorithm matrix to obtain a BWT transformation result of a short sequence.
  • the gene alignment algorithm may be a BWT algorithm
  • the first algorithm may be an anchor point cutting algorithm and a BWT matrix transformation algorithm.
  • after the GPU module sends the Hash calculation result to the idle CPU module, the CPU module uses the Hash calculation result as the value of the BWT algorithm matrix in calculating that matrix, and then transforms the BWT algorithm matrix with the BWT matrix transformation algorithm to obtain the BWT transformation result of the short sequence.
  • optionally, the relationship between the Hash calculation result and the BWT algorithm matrix can be expressed as h=Hash(x,r), Y=BWT(h,r)
  • h represents the Hash calculation result
  • Y represents the BWT algorithm matrix
  • r represents the short sequence. This method obtains the BWT transformation result of the short sequence quickly and accurately, so compression of the gene sequencing data completes quickly and subsequent processing is more convenient.
  • the alignment algorithm includes a Smith-Waterman algorithm
  • the second algorithm includes a scoring matrix algorithm
  • the idle GPU module is further configured to calculate the Smith-Waterman scoring matrix from the scoring matrix algorithm, the short sequences, and the reference species sequence, and to send the Smith-Waterman scoring matrix to the idle CPU module.
  • the step of calculating the Smith-Waterman scoring matrix comprises using the following formulas: M=R*C, R=a*L^2+b
  • M represents the Smith-Waterman scoring matrix
  • R represents the length of the candidate interval sequence of the reference species
  • C represents the length of the short sequence formed by screening and splicing the short sequences received from the idle CPU module
  • L represents the length of the batched gene sequencing data
  • a and b represent constants.
  • the traditional Smith-Waterman algorithm runs inefficiently on a GPU and cannot be run directly on the gene sequencing data processing device of the embodiment of the present invention, so the Smith-Waterman algorithm is improved.
  • the Smith-Waterman algorithm contains a scoring matrix of size R*C; when the step of calculating the scoring matrix is placed in the GPU module, the second algorithm is the scoring matrix algorithm. The Smith-Waterman scoring matrix is calculated using the following formulas: M=R*C, R=a*L^2+b
  • M represents the Smith-Waterman scoring matrix
  • R is the length of the candidate interval sequence of the reference species
  • C represents the length of the short sequence formed by screening and splicing the short sequences received from the idle CPU module
  • L represents the length of the batched gene sequencing data, and a and b represent constants.
  • the length C is related to the Hash calculation result computed by the GPU module in the BWT algorithm. In this way the traditional Smith-Waterman algorithm is adapted to run on the GPU with high efficiency.
  • based on the gene sequencing data processing device described above, an embodiment of the present invention further provides a gene sequencing data processing method.
  • as shown in Figure 5, the gene sequencing data processing method is applied to the gene sequencing data processing device and comprises the following steps:
  • Step S1: the idle CPU module reads the gene sequencing data in batches to obtain batched gene sequencing data;
  • Step S2: the idle CPU module splits the gene analysis method to obtain the first algorithm and the second algorithm;
  • Step S3: the idle CPU module cuts the batched gene sequencing data according to the first algorithm to obtain the short sequences, and sends the short sequences and the second algorithm to the idle GPU module;
  • Step S4: the idle GPU module computes on the short sequences according to the second algorithm and sends the calculation result to the idle CPU module;
  • Step S5: the idle CPU module obtains the batch processing result from the calculation result and the first algorithm;
  • Steps S1 to S5 are repeated until the gene sequencing data has been processed, and the idle CPU module then integrates the batch processing results to obtain the final processing result.
  • the amount of gene sequencing data is relatively large, and the data can be analyzed and processed in batches, thereby avoiding data transmission congestion and the like.
  • the gene sequencing data read by the idle CPU in the i-th batch is denoted Di.
  • the idle CPU module reads the gene sequencing data Di and splits the gene analysis method to obtain the first algorithm and the second algorithm; it cuts Di according to the first algorithm to obtain the short sequences and sends them, together with the second algorithm, to the idle GPU module.
  • the idle GPU module then computes on the short sequences according to the second algorithm and sends the calculation result to the idle CPU module, which obtains the batch processing result from the calculation result and the first algorithm.
  • meanwhile, the idle CPU module reads the gene sequencing data Di+1 (the data read in batch i+1), cuts it, and sends the corresponding short sequences to the idle GPU module.
  • the idle GPU module processes the short sequences corresponding to Di+1 and sends the processing result to the idle CPU module.
  • the idle CPU module and the idle GPU module keep reading, cutting, transmitting, computing on, and returning gene sequencing data (that is, they keep repeating steps S1 to S5) until all the gene sequencing data has been processed; in this process a pipeline forms between the idle CPU module and the idle GPU module.
  • the idle CPU module scans the GPU modules, determines the number of idle GPU modules and the data processing capacity of each idle GPU module, and reads the gene sequencing data in batches according to that number and those capacities.
  • the gene analysis algorithm includes a gene alignment algorithm, a Dotplot algorithm, a BLAST algorithm, a PAM algorithm, an HMM algorithm, and an AI inference algorithm.
  • the gene alignment algorithm includes a BWT algorithm, and the first algorithm includes an anchor point cutting algorithm; the idle CPU module uses the anchor point cutting algorithm to locate anchor points in the batched gene sequencing data, extends N bp forward and backward from each anchor point as the center, and uses NEON instructions to cut the batched gene sequencing data into segments of 2N+1 bp, obtaining the short sequences, where N is any positive integer.
  • the step of obtaining the short sequences includes calculating them using the following formula: (2*N+1)*x<L
  • x represents the number of anchor points
  • N represents the number of bp extended
  • L represents the length of the batched gene sequencing data.
  • the second algorithm is a Hash algorithm
  • the idle GPU module is also used to perform a Hash operation on each short sequence according to the Hash algorithm, obtain a Hash calculation result, and send the Hash calculation result to the idle CPU module;
  • the Hash calculation result is the value of the BWT algorithm matrix and is used in calculating the BWT algorithm matrix.
  • the first algorithm further includes a BWT matrix transformation algorithm; the idle CPU module transforms the BWT algorithm matrix with the BWT matrix transformation algorithm to obtain the BWT transformation result of a short sequence.
  • the alignment algorithm includes a Smith-Waterman algorithm
  • the second algorithm includes a scoring matrix algorithm
  • the idle GPU module is further configured to calculate the Smith-Waterman scoring matrix from the scoring matrix algorithm, the short sequences, and the reference species sequence, and to send the Smith-Waterman scoring matrix to the idle CPU module.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

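A gene sequencing data processing method and device. The method is applied to the device, which comprises an ARM architecture, a PCI bus, and a GPU architecture connected in sequence; the ARM architecture includes at least one CPU module and the GPU architecture includes at least one GPU module. In the method, an idle CPU module reads gene sequencing data in batches and splits the gene analysis method into a first algorithm and a second algorithm; it cuts the batched gene sequencing data with the first algorithm and sends the resulting short sequences and the second algorithm to an idle GPU module; the idle GPU module computes on the short sequences according to the second algorithm and sends the calculation result to the idle CPU module; the idle CPU module obtains a batch processing result from the calculation result and the first algorithm. By splitting the analysis method between the CPU module and the GPU module, the method greatly improves data analysis efficiency.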

Description

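Gene Sequencing Data Processing Method and Gene Sequencing Data Processing Device
This application claims priority to Chinese patent application No. 202011139823.4, entitled "Gene sequencing data processing method and gene sequencing data processing device" and filed with the China Patent Office on October 22, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of data processing, and in particular to a gene sequencing data processing method and a gene sequencing data processing device.
Background
With the continuous development of gene sequencing technology, it is widely used in the research and analysis of new species, viruses, and diseases. At the same time, gene sequencing data is being produced in large volumes, so analyzing and processing this data efficiently is particularly important.
In current gene analysis workflows, the vast majority of steps (for example, gene alignment) can only run on the x86 architecture. For example, the traditional alignment tool bwa uses the BWT algorithm, and the Smith-Waterman algorithm used for inexact alignment is also implemented with the SSE2 instructions of the x86 architecture.
Although the x86 implementation of the BWT alignment algorithm runs fairly fast on x86 CPUs, it cannot process large batches simultaneously, and because the BWT algorithm does not fit the SIMT execution model of the GPU, its efficiency on a GPU is greatly reduced, which lowers the efficiency of the whole alignment process. Likewise, the existing Smith-Waterman algorithm runs only on the x86 architecture; on the ARM platform it lacks SSE2 acceleration and runs slowly, and it is also unsuited to computation on the GPU architecture.
Summary of the Invention
In view of this, the present invention provides a gene sequencing data processing device and a gene sequencing data processing method to solve the problem that the steps of existing gene sequencing data analysis workflows can only run on the x86 architecture and run slowly on GPUs, which makes gene sequencing data processing inefficient.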
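An embodiment of the present invention provides a gene sequencing data processing method applied to a gene sequencing data processing device, wherein the gene sequencing data processing device has a heterogeneous multi-core architecture comprising an ARM architecture, a GPU architecture, and a PCI bus; the ARM architecture is connected to the GPU architecture through the PCI bus; the ARM architecture includes at least one CPU module; the GPU architecture includes at least one GPU module; and the method includes the following steps:
Step S1: the idle CPU module reads the gene sequencing data in batches to obtain batched gene sequencing data;
Step S2: the idle CPU module splits the gene analysis method to obtain a first algorithm and a second algorithm;
Step S3: the idle CPU module cuts the batched gene sequencing data according to the first algorithm to obtain short sequences, and sends the short sequences and the second algorithm to the idle GPU module;
Step S4: the idle GPU module computes on the short sequences according to the second algorithm and sends the calculation result to the idle CPU module;
Step S5: the idle CPU module obtains a batch processing result from the calculation result and the first algorithm;
Steps S1 to S5 are repeated until the gene sequencing data has been processed, and the idle CPU module then integrates the batch processing results to obtain the final processing result.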
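Optionally,
the idle CPU module scans the GPU modules, determines the number of idle GPU modules and the data processing capacity of each idle GPU module, and reads the gene sequencing data in batches according to that number and those capacities.
Optionally,
the gene analysis algorithm includes a gene alignment algorithm, a Dotplot algorithm, a BLAST algorithm, a PAM algorithm, an HMM algorithm, and an AI inference algorithm.
Optionally,
the gene alignment algorithm includes the BWT algorithm, and the first algorithm includes the anchor point cutting algorithm;
the idle CPU module uses the anchor point cutting algorithm to locate anchor points in the batched gene sequencing data, extends N bp forward and backward from each anchor point as the center, and uses NEON instructions to cut the batched gene sequencing data into segments of 2N+1 bp, obtaining the short sequences, where N is any positive integer.
Optionally,
the step of obtaining the short sequences includes calculating them using the following formula:
(2*N+1)*x<L
where x represents the number of anchor points, N represents the number of bp extended, and L represents the length of the batched gene sequencing data.
Optionally,
the second algorithm is a Hash algorithm; the idle GPU module performs a Hash operation on each short sequence according to the Hash algorithm to obtain a Hash calculation result and sends the Hash calculation result to the idle CPU module; the Hash calculation result is the value of the BWT algorithm matrix and is used in calculating the BWT algorithm matrix.
Optionally,
the first algorithm further includes a BWT matrix transformation algorithm;
the idle CPU module transforms the BWT algorithm matrix with the BWT matrix transformation algorithm to obtain the BWT transformation result of the short sequence.
Optionally,
the alignment algorithm includes the Smith-Waterman algorithm, and the second algorithm includes a scoring matrix algorithm;
the idle GPU module calculates the Smith-Waterman scoring matrix from the scoring matrix algorithm, the short sequences, and the reference species sequence, and sends the Smith-Waterman scoring matrix to the idle CPU module.
Optionally,
the step of calculating the Smith-Waterman scoring matrix includes:
calculating the Smith-Waterman scoring matrix using the following formulas:
M=R*C
R=a*L^2+b
where M represents the Smith-Waterman scoring matrix, R is the length of the candidate interval sequence of the reference species, C represents the length of the short sequence formed by screening and splicing the short sequences received from the idle CPU module, L represents the length of the batched gene sequencing data, and a and b represent constants.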
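An embodiment of the present invention provides a gene sequencing data processing device that has a heterogeneous multi-core architecture and executes the gene sequencing data processing method described above.
In the gene sequencing data processing device and method of the embodiments of the present invention, the method is applied to the device. The device has a heterogeneous multi-core architecture comprising an ARM architecture, a GPU architecture, and a PCI bus; the ARM architecture includes at least one CPU module, the GPU architecture includes at least one GPU module, and the CPU module is connected to the GPU module through the PCI bus so that the two can exchange information. In the method, the idle CPU module reads the gene sequencing data in batches and splits the gene analysis method, obtaining the batched gene sequencing data, a first algorithm (the part best suited to run on the CPU module), and a second algorithm (the part best suited to run on the GPU module). It then uses the first algorithm to cut the batched gene sequencing data into a series of short sequences and transmits these short sequences and the second algorithm over the PCI bus to an idle GPU module. The GPU module computes on the short sequences according to the second algorithm and returns the calculation result to the idle CPU module, which obtains a batch processing result from the calculation result and the first algorithm. The idle CPU module and the idle GPU module repeat these steps until all the gene sequencing data has been processed, and the idle CPU module then integrates the batch processing results to obtain the final processing result. The device and method split the analysis method (i.e., the analysis process) for gene sequencing data so that each part runs on the CPU module or the GPU module according to its characteristics, which greatly improves the efficiency of gene sequencing data analysis. In addition, multiple CPU modules and GPU modules can be provided in the device, and multiple GPU modules can compute short sequences of different lengths at the same time, which addresses the problem of low GPU parallel efficiency.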
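Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic structural diagram of the gene sequencing data processing device in an embodiment of the present invention;
Figure 2 is a schematic diagram of the data processing flow of the gene sequencing data processing device in an embodiment of the present invention;
Figure 3 is a schematic diagram of the CPU module performing anchor point cutting on batched gene sequencing data in an embodiment of the present invention;
Figure 4 is a schematic diagram of the GPU module performing a Hash operation on short sequences with the Hash algorithm in an embodiment of the present invention;
Figure 5 is a schematic flowchart of the gene sequencing data processing method in an embodiment of the present invention.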
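Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Glossary:
Gene (Mendelian factor): a DNA or RNA sequence that carries genetic information (that is, a DNA or RNA fragment with a genetic effect), also called a genetic factor; it is the basic genetic unit that controls traits. A gene expresses the genetic information it carries by directing protein synthesis, thereby controlling the traits of an individual organism.
Gene sequencing: a new type of genetic testing technology that determines the full gene sequence from blood or saliva in order to predict the likelihood of developing various diseases, an individual's behavioral characteristics, and so on.
Short sequence (read): a short sequenced fragment, the sequencing data produced by a high-throughput sequencer. Sequencing a whole genome produces millions of reads, which are then assembled to obtain the full genome sequence.
Alignment analysis: the short sequences (reads) produced by NGS are stored in a FASTQ file. Although they all originate from an ordered genome, the order relationship between reads is completely lost after DNA library construction and sequencing. Two adjacent reads in a FASTQ file therefore have no positional relationship; each is simply a short sequence from some random position in the original genome. The reads must first be put in order by comparing each one against the reference genome of the species, finding its position on the reference, and arranging the reads by position. This process is called alignment of the sequencing data.
Alignment algorithms: computational methods for sequence alignment generally fall into two classes, global alignments and local alignments. Computing a global alignment is a form of global optimization that forces the alignment to span the full length of all query sequences. In contrast, local alignment identifies only regions of local similarity within long sequences that are often widely divergent overall. Local alignment is often preferable but can be harder to compute because of the additional challenge of identifying the regions of similarity. A variety of algorithms have been applied to the sequence alignment problem, including slow but rigorous optimization methods such as dynamic programming, efficient but non-exhaustive heuristic algorithms, and probabilistic methods designed for large-scale database search.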
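ARM: the ARM architecture, Advanced RISC Machine (earlier called Acorn RISC Machine), is a family of reduced instruction set computing (RISC) processor architectures that is widely used in embedded system designs. Because of its energy efficiency it is also used in many other fields. ARM processors are well suited to mobile communications, in line with their main design goals of low cost, high performance, and low power consumption. Since supercomputers consume large amounts of electricity, ARM is also regarded as a more efficient choice there. ARM Holdings develops the architecture and licenses it to other companies, which implement a particular ARM architecture in their own systems-on-chip (SoC) and systems-on-module (SoM).
GPU: a graphics processing unit (GPU; also called a display core, visual processor, display device, or graphics device) is a microprocessor dedicated to graphics computation in personal computers, workstations, game consoles, and some mobile devices (such as tablets and smartphones). The GPU reduces the graphics card's dependence on the central processing unit (CPU) and takes over part of the work originally done by the CPU; the benefit is especially clear in 3D graphics computation.
CUDA: Compute Unified Device Architecture is an integrated technology introduced by NVIDIA and is the company's official name for GPGPU. With it, users can compute on NVIDIA GPUs from the GeForce 8 onward and on newer Quadro GPUs. It was also the first development environment in which a C compiler could target the GPU. In its marketing NVIDIA often promotes the compiler and the architecture together, which causes confusion. In fact, the CUDA architecture is compatible with both OpenCL and NVIDIA's own C compiler. Whether CUDA C or OpenCL is used, the instructions are ultimately converted by the driver into PTX code and handed to the graphics core for computation.
BWT: the Burrows–Wheeler Transform (BWT, also called block-sorting compression) is an algorithm used in data compression techniques such as bzip2. It was invented in 1994 by Michael Burrows and David Wheeler at the DEC Systems Research Center in Palo Alto, California, building on an unpublished transformation that Wheeler devised in 1983. When a string is transformed by this algorithm, only the order of its characters changes, not the characters themselves. If the original string contains substrings that occur several times, the transformed string contains runs of repeated characters, which is very useful for compression. The transform makes the output easier to compress with techniques that exploit runs of repeated characters, such as the move-to-front (MTF) transform and run-length encoding.
Smith-Waterman: the Smith-Waterman algorithm performs local sequence alignment (as opposed to global alignment) to find similar regions between two nucleotide or protein sequences. Its purpose is not to align the full sequences but to find the segments of the two sequences with high similarity.
HASH: also called a hash algorithm or hash function, a method of creating a small numeric "fingerprint" from any kind of data. A hash function compresses a message or data into a digest, reducing the amount of data and fixing its format. The function scrambles and mixes the data to create a fingerprint called a hash value (also hash code, hash sum, or hash). A hash value is usually represented as a short string of seemingly random letters and digits. A good hash function rarely produces collisions over its input domain. In hash tables and data processing, failing to suppress collisions when distinguishing data makes database records harder to find.
SSE2: Streaming SIMD Extensions 2 is a SIMD (single instruction, multiple data) instruction set for the IA-32 architecture. Intel introduced it in 2001 with the first-generation Pentium 4 processor. It extends the earlier SSE instruction set and can fully replace the MMX instruction set.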
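To describe the present invention in more detail, the gene sequencing data processing device and the gene sequencing data processing method provided by the invention are described below with reference to the accompanying drawings.
Figure 1 is a schematic structural diagram of the gene sequencing data processing device. As shown in Figure 1, the gene sequencing data processing device has a heterogeneous multi-core architecture comprising an ARM architecture 10, a GPU architecture 20, and a PCI bus 30; the ARM architecture 10 is connected to the GPU architecture 20 through the PCI bus 30; the ARM architecture 10 includes at least one CPU module; the GPU architecture 20 includes at least one GPU module. The idle CPU module is used to read the gene sequencing data in batches to obtain batched gene sequencing data and to split the gene analysis method to obtain a first algorithm and a second algorithm; it cuts the batched gene sequencing data according to the first algorithm to obtain short sequences and sends the short sequences and the second algorithm to the idle GPU module. The idle GPU module is used to compute on the short sequences according to the second algorithm and to send the calculation result to the idle CPU module. The idle CPU module is further used to obtain a batch processing result from the calculation result and the first algorithm. The idle CPU module and the idle GPU module repeat these steps until the gene sequencing data has been processed, and the idle CPU module then integrates the batch processing results to obtain the final processing result.
Specifically, the gene sequencing data processing device has a heterogeneous multi-core architecture, namely an ARM+GPU architecture, in which the ARM architecture 10 includes CPU modules and the GPU architecture 20 includes GPU modules. The numbers of CPU modules and GPU modules are not fixed and can be set according to the actual workload, for example according to the amount of gene sequencing data, the performance of the CPU modules, the performance of the GPU modules (such as GPU video memory, number of CUDA cores, and CUDA core frequency), and the complexity of the algorithm used for gene analysis.
The processing or computing capability of the cores inside each CPU module may be the same or different. Likewise, the processing or computing capability of each GPU module may be the same or different. Optionally, a GPU module may be a GPU compute card, which usually uses the SIMT architecture.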
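In an optional embodiment, the CPU module uses NEON acceleration, which further increases the running speed of the CPU module.
In an optional embodiment, the gene sequencing data processing device may use the Jetson nano TX1 released by NVIDIA. This device uses a Maxwell-architecture GPU with 128 CUDA cores and a computing capability of 472 GFLOPS; the Jetson Nano also has a 4-core A57 processor serving as the ARM CPU.
The gene analysis method is the method used in analyzing and processing gene sequencing data, including sequence alignment, gene set enrichment analysis (including GO analysis and KEGG analysis), and gene regulatory network analysis.
When the gene analysis method is split into the first algorithm and the second algorithm, the split follows the characteristics of the particular gene analysis method: the parts suited to the CPU module are separated out to form the first algorithm, and the parts suited to the GPU module are separated out to form the second algorithm. The first algorithm and the second algorithm are therefore each a part of the gene analysis method and may each consist of one or more small steps. There is no strict rule for the split; it only needs to follow the splitting principle. Under that principle, the first algorithm typically requires extensive logical decisions or has dependencies between calculation results, for example a second step that depends on or builds on the result of the first step, or a yes/no decision. The second algorithm typically operates on many data items that can be computed simultaneously, with no logical decisions or dependencies between the data items.
It should be understood that "first" and "second" in this embodiment do not limit the algorithms; they serve only to distinguish the two.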
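In addition, because the ARM architecture 10 usually contains multiple CPU modules, their operating states may differ: some CPU modules are running while others are idle. The same applies to the GPU modules in the GPU architecture 20. In this embodiment, therefore, idle CPU modules and GPU modules perform the corresponding operations; the selected CPU and GPU modules may be all of the idle modules or only some of them.
The gene sequencing data may be data obtained by sequencing any species, including DNA sequencing fragments, RNA sequencing fragments, and so on. Because a single sequencing run produces a large amount of data, the gene sequencing data is large and can be analyzed and processed in batches, which avoids congestion in data transmission. In this embodiment, therefore, the idle CPU module reads the gene sequencing data in batches. The amount of data read each time need not be equal; it can be chosen by weighing the number of GPU modules, the data processing capability of each GPU module, the data reading capability of the CPU module, the data transmission capability of the PCI bus, and similar factors, so as to maximize data processing efficiency.
After the gene sequencing data has been read in batches, the batched data usually also needs to be cut into multiple short sequences. In this embodiment, the first algorithm is used to cut the batched gene sequencing data. The short sequences may differ in length, and their number is not fixed; suitable values can be chosen by weighing the amount of batched gene sequencing data, the number of idle GPU modules, and the GPU processing capability.
After the idle CPU module transmits the short sequences and the second algorithm to the idle GPU, the idle GPU module computes on the short sequences according to the second algorithm, and the idle CPU module can meanwhile read and cut the next batch of gene sequencing data. When the idle GPU module finishes processing the short sequences, it transmits the calculation result back to the idle CPU module, which obtains the batch result from the calculation result and the first algorithm. As this repeats, a pipeline forms between the CPU module and the GPU module until all the gene sequencing data has been processed.
The gene sequencing data processing device of the embodiment of the present invention has a heterogeneous multi-core architecture comprising an ARM architecture 10, a GPU architecture 20, and a PCI bus 30; the ARM architecture includes at least one CPU module, the GPU architecture includes at least one GPU module, and the CPU module is connected to the GPU module through the PCI bus so that the two can exchange information. The idle CPU module reads the gene sequencing data in batches and splits the gene analysis method, obtaining the batched gene sequencing data, a first algorithm (the part best suited to run on the CPU module), and a second algorithm (the part best suited to run on the GPU module). It then uses the first algorithm to cut the batched gene sequencing data into a series of short sequences and transmits these short sequences and the second algorithm over the PCI bus to an idle GPU module. The GPU module computes on the short sequences according to the second algorithm and returns the calculation result to the idle CPU module, which obtains a batch processing result from the calculation result and the first algorithm. The idle CPU module and the idle GPU module repeat these steps until the gene sequencing data has been processed, and the idle CPU module then integrates the batch processing results to obtain the final processing result. The device and method split the analysis method (i.e., the analysis process) for gene sequencing data so that each part runs on the CPU module or the GPU module according to its characteristics, which greatly improves the efficiency of gene sequencing data analysis. In addition, multiple CPU modules and GPU modules can be provided in the device, and multiple GPU modules can compute short sequences of different lengths at the same time, which addresses the problem of low GPU parallel efficiency.
In one embodiment, the idle CPU module is further used to scan the GPU modules, determine the number of idle GPU modules and the data processing capacity of each idle GPU module, and read the gene sequencing data in batches according to that number and those capacities.
Specifically, when the idle CPU module starts a gene analysis, it can scan the GPU modules to determine how many GPUs are currently available and the data processing capacity of each available GPU module, use these to determine how much gene sequencing data to read in this batch, and then read that amount.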
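The batch-sizing step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `GpuModule` record, its `idle` flag, and its `capacity` field are hypothetical names introduced here, and the sketch assumes the batch size is simply the summed capacity of the idle modules. With identical modules this reduces to the K=A*G rule used in the worked example of this embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpuModule:
    """Hypothetical description of one GPU module as seen by the CPU scan."""
    name: str
    idle: bool
    capacity: int  # reads this module can process in one batch (assumed unit)


def next_batch_size(gpus: List[GpuModule]) -> int:
    """Sum the capacities of idle modules; equals A*G for identical modules."""
    return sum(g.capacity for g in gpus if g.idle)


def plan_batches(total_reads: int, gpus: List[GpuModule]) -> List[int]:
    """Split total_reads into batch sizes, assuming idle status stays fixed."""
    k = next_batch_size(gpus)
    if k == 0:
        raise RuntimeError("no idle GPU module available")
    return [min(k, total_reads - start) for start in range(0, total_reads, k)]


gpus = [GpuModule("gpu0", True, 1000),
        GpuModule("gpu1", False, 1000),
        GpuModule("gpu2", True, 1000)]
print(plan_batches(4500, gpus))
```

In a real device the scan would be repeated before each batch, since modules become idle or busy over time; the fixed plan above is only meant to make the arithmetic visible.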
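To make the solution easier to understand, a detailed example of the workflow of the gene sequencing data processing device is given below with reference to Figures 1 and 2; in this example the gene analysis method is a gene alignment method.
1. Time T1: the idle CPU module receives gene sequencing data D, starts the alignment task, and scans for the number of currently available GPU modules, denoted G. The sequencing length of data D is denoted L1. The CPU module reads data D in batches; the amount of data Di read in each batch is denoted K, and K can be adjusted to the number of GPU modules by the formula K=A*G, where A is the amount of data one GPU module can process at a time (in this example all GPU modules have identical processing capability). The data Di is cut according to the first algorithm to form multiple short sequences.
2. Time T2: the cut short sequences are transmitted over the PCI bus to one of the idle GPU modules, while the CPU module goes on to process the next batch, Di+1, forming a 2-stage pipeline.
3. Time T3: once the Di data has been transferred into GPU video memory, the second algorithm is started on the GPU. Di+1 is now in the PCI transfer stage and the CPU module processes the next batch, Di+2, forming a 3-stage pipeline.
4. Time T4: computation on Di is complete and the result is returned to the CPU module over PCI. Di+1 is now in the GPU computation stage, Di+2 is in the PCI input stage, and Di+3 is being processed by the CPU module, forming a 4-stage pipeline.
5. Time T5: after the calculation result for Di has been returned, it is handed to the CPU module, which uses the first algorithm to complete the remaining stages of the alignment algorithm, forming a 5-stage pipeline.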
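The way the pipeline fills from T1 to T5 can be made concrete with a small scheduling sketch. It is an idealization under stated assumptions: every stage takes exactly one time step, enough CPU modules are idle for a split and a finish to run in the same step, and the stage names are labels chosen here rather than terms from the patent.

```python
from typing import Dict, List

# Stage labels are illustrative; they follow the T1..T5 walkthrough.
STAGES = ["CPU split", "PCI send", "GPU compute", "PCI return", "CPU finish"]


def pipeline_schedule(num_batches: int) -> List[Dict[str, int]]:
    """For each time step, map every busy stage to the batch index it holds."""
    schedule = []
    for t in range(num_batches + len(STAGES) - 1):
        step = {}
        for depth, stage in enumerate(STAGES):
            batch = t - depth  # batch that entered the pipeline `depth` steps ago
            if 0 <= batch < num_batches:
                step[stage] = batch
        schedule.append(step)
    return schedule


for t, step in enumerate(pipeline_schedule(6), start=1):
    print(f"T{t}: {step}")
```

From T5 onward all five stages are occupied by different batches, which is the steady state the walkthrough calls a 5-stage pipeline; the last four steps drain it.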
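In one embodiment, the gene analysis algorithm includes a gene alignment algorithm, a Dotplot algorithm, a BLAST algorithm, a PAM algorithm, an HMM algorithm, and an AI inference algorithm.
Specifically, the Dotplot algorithm and the BLAST algorithm are sequence alignment algorithms.
The PAM algorithm is a clustering algorithm from data mining that can be used in single-cell sequencing to analyze cell subpopulations and the like.
The HMM (hidden Markov model) algorithm is a statistical model that describes a Markov process with hidden, unknown parameters; it can be used to predict target genes.
The AI inference algorithm (for example, DeepVariant) is a deep learning algorithm that can be used to identify gene mutations, among other things. Optionally, the AI inference algorithm may be an inference algorithm based on a CNN (convolutional neural network) or an RNN (recurrent neural network).
Optionally, when the gene analysis algorithm is the Dotplot, BLAST, or PAM algorithm, it usually needs to be ported to CUDA first, which makes it better suited to run on the gene sequencing data processing device of the embodiment of the present invention.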
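In one embodiment, the gene alignment algorithm includes the BWT algorithm, and the first algorithm includes the anchor point cutting algorithm; the idle CPU module is further used to locate anchor points in the batched gene sequencing data with the anchor point cutting algorithm, extend N bp forward and backward from each anchor point as the center, and use NEON instructions to cut the batched gene sequencing data into segments of 2N+1 bp, obtaining the short sequences, where N is any positive integer.
In one embodiment, the step of obtaining the short sequences includes calculating them using the following formula:
(2*N+1)*x<L
where x represents the number of anchor points, N represents the number of bp extended, and L represents the length of the batched gene sequencing data.
Optionally, the gene alignment algorithm may be the BWT algorithm, the first algorithm may be the anchor point cutting algorithm and the BWT matrix transformation algorithm, and the second algorithm may be the Hash algorithm. The specific process is as follows: as shown in Figure 3, the idle CPU module processes the data Di with the first algorithm (i.e., the anchor point cutting algorithm). First, anchor points are located in the batch-read gene sequencing data (i.e., a read) of length L, and N bp are extended forward and backward from each anchor point to obtain a short read of length 2N+1; NEON instructions are then used to cut and move the read in segments of length 2N+1. When the number of anchor points is x, N satisfies the following formula:
(2*N+1)*x<L
where x represents the number of anchor points, N represents the number of bp extended, and L represents the length of the batched gene sequencing data. Cutting in this way yields multiple short sequences that are suitable for processing on the GPU module.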
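As a hedged illustration of the cutting rule, the sketch below takes a read, a list of anchor positions, and N, and returns one window of 2N+1 bases per anchor. How anchors are chosen is not specified in the text, so the positions are simply inputs here; the function names are hypothetical, and plain Python slicing stands in for the NEON copy instructions.

```python
from typing import List


def max_anchor_count(read_length: int, n: int) -> int:
    """Largest integer x that satisfies (2*N + 1) * x < L."""
    return (read_length - 1) // (2 * n + 1)


def cut_around_anchors(read: str, anchors: List[int], n: int) -> List[str]:
    """Return the window of 2*N + 1 bases centred on each anchor position."""
    window = 2 * n + 1
    if window * len(anchors) >= len(read):
        raise ValueError("anchor count violates (2*N + 1) * x < L")
    pieces = []
    for a in anchors:
        if a - n < 0 or a + n >= len(read):
            raise ValueError(f"anchor {a} is too close to the read boundary")
        pieces.append(read[a - n:a + n + 1])  # N bases before, anchor, N after
    return pieces


read = "ACGTACGTACGTACGTACGT"  # toy read, L = 20
print(max_anchor_count(len(read), 2))
print(cut_around_anchors(read, [3, 10, 16], 2))
```

Because every window has the same length, the resulting short sequences form a regular array, which is convenient for copying to GPU memory in one transfer.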
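In one embodiment, the second algorithm is a Hash algorithm; the idle GPU module is further used to perform a Hash operation on each short sequence according to the Hash algorithm to obtain a Hash calculation result, and to send the Hash calculation result to the idle CPU module; the Hash calculation result is the value of the BWT algorithm matrix and is used in calculating the BWT algorithm matrix.
Specifically, the gene alignment algorithm may be the BWT algorithm, the first algorithm may be the anchor point cutting algorithm and the BWT matrix transformation algorithm, and the second algorithm may be the Hash algorithm. As shown in Figure 4, the x*K short sequences produced by the first algorithm are transferred into the video memory of the idle GPU module, where K represents the amount of data in Di; the number of short sequences is positively correlated with the total video memory of the GPU modules. Because the Hash algorithm suits the SIMT architecture of the GPU, a GPU kernel function performs the hash calculation on many short sequences at once to obtain the Hash calculation result, which is sent to the idle CPU module; the Hash calculation result is the value of the BWT algorithm matrix and is used in calculating that matrix. Compared with other traditional calculations (such as the k-mer site calculation algorithm), using the Hash algorithm can greatly save memory space.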
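The text does not disclose the hash function itself, so the following is only a sketch under assumptions: each base is packed into 2 bits and the packed value is reduced modulo a table size (`table_size` is an arbitrary illustrative choice). The point it illustrates is the data-parallel shape of the work: every short sequence is hashed independently with identical control flow, which is why one GPU thread per sequence fits the SIMT model.

```python
from typing import List

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per base


def pack_2bit(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base (raises on 'N')."""
    value = 0
    for base in seq:
        value = value * 4 + BASE_CODE[base]
    return value


def hash_short_sequences(seqs: List[str], table_size: int = 1_000_003) -> List[int]:
    """Hash each sequence independently; on a GPU each item maps to a thread."""
    return [pack_2bit(s) % table_size for s in seqs]


print(hash_short_sequences(["ACGT", "ACGT", "TTTT"]))
```

Packing alone cannot distinguish sequences of different lengths that differ only by leading `A` bases, but the windows produced by anchor cutting all have length 2N+1, so that case does not arise here.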
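In one embodiment, the first algorithm further includes a BWT matrix transformation algorithm; the idle CPU module is further used to transform the BWT algorithm matrix with the BWT matrix transformation algorithm to obtain the BWT transformation result of the short sequence.
Specifically, the gene alignment algorithm may be the BWT algorithm, and the first algorithm may be the anchor point cutting algorithm and the BWT matrix transformation algorithm. After the GPU module sends the Hash calculation result to the idle CPU module, the CPU module uses the Hash calculation result as the value of the BWT algorithm matrix in calculating that matrix, and then transforms the BWT algorithm matrix with the BWT matrix transformation algorithm to obtain the BWT transformation result of the short sequence. Optionally, the relationship between the Hash calculation result and the BWT algorithm matrix can be expressed as h=Hash(x,r), Y=BWT(h,r), where h represents the Hash calculation result, Y represents the BWT algorithm matrix, and r represents the short sequence. This method obtains the BWT transformation result of the short sequence quickly and accurately, so compression of the gene sequencing data completes quickly and subsequent processing is more convenient.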
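For readers unfamiliar with the transform, the textbook Burrows–Wheeler Transform of a single short sequence can be sketched as below. This is the classic sorted-rotations construction, not the hash-assisted matrix calculation Y=BWT(h,r) described above, whose details the text does not give; the `$` sentinel is a conventional assumption.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Classic BWT: last column of the sorted rotation matrix of text + sentinel."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Rebuild the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("banana"))
print(inverse_bwt(bwt("GATTACA")))
```

The sorted-rotations form costs quadratic memory in the sequence length, which is acceptable for short windows of 2N+1 bases; production aligners build the transform from a suffix array instead.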
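In one embodiment, the alignment algorithm includes the Smith-Waterman algorithm, and the second algorithm includes a scoring matrix algorithm; the idle GPU module is further used to calculate the Smith-Waterman scoring matrix from the scoring matrix algorithm, the short sequences, and the reference species sequence, and to send the Smith-Waterman scoring matrix to the idle CPU module.
In one embodiment, the step of calculating the Smith-Waterman scoring matrix includes:
calculating the Smith-Waterman scoring matrix using the following formulas: M=R*C, R=a*L^2+b;
where M represents the Smith-Waterman scoring matrix, R is the length of the candidate interval sequence of the reference species, C represents the length of the short sequence formed by screening and splicing the short sequences received from the idle CPU module, L represents the length of the batched gene sequencing data, and a and b represent constants.
Specifically, the traditional Smith-Waterman algorithm runs inefficiently on a GPU and cannot be run directly on the gene sequencing data processing device of the embodiment of the present invention, so the Smith-Waterman algorithm is improved. The Smith-Waterman algorithm contains a scoring matrix of size R*C; when the step of calculating the scoring matrix is placed in the GPU module, the second algorithm is the scoring matrix algorithm. The Smith-Waterman scoring matrix is calculated using the following formulas: M=R*C, R=a*L^2+b;
where M represents the Smith-Waterman scoring matrix, R is the length of the candidate interval sequence of the reference species, C represents the length of the short sequence formed by screening and splicing the short sequences received from the idle CPU module, L represents the length of the batched gene sequencing data, and a and b represent constants. In addition, the length C is related to the Hash calculation result computed by the GPU module in the BWT algorithm. In this way the traditional Smith-Waterman algorithm is adapted to run on the GPU with high efficiency.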
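The improved kernel is not disclosed, so the sketch below shows only the textbook Smith-Waterman recurrence for an R-by-C scoring matrix, filled in anti-diagonal order. That ordering is one common way to expose parallelism (cells on the same anti-diagonal do not depend on each other) and is an assumption here, as are the match, mismatch, and linear gap scores.

```python
from typing import List


def sw_scoring_matrix(ref: str, query: str, match: int = 2,
                      mismatch: int = -1, gap: int = -2) -> List[List[int]]:
    """Fill the (R+1) x (C+1) Smith-Waterman matrix one anti-diagonal at a time."""
    rows, cols = len(ref) + 1, len(query) + 1
    h = [[0] * cols for _ in range(rows)]
    for d in range(2, rows + cols - 1):  # d = i + j
        # All cells on this anti-diagonal are independent of one another,
        # so a GPU could assign one thread to each of them.
        for i in range(max(1, d - (cols - 1)), min(rows - 1, d - 1) + 1):
            j = d - i
            score = match if ref[i - 1] == query[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + score,  # align the two bases
                          h[i - 1][j] + gap,        # gap in the query
                          h[i][j - 1] + gap)        # gap in the reference
    return h


def best_local_score(h: List[List[int]]) -> int:
    """Highest cell value, i.e. the score of the best local alignment."""
    return max(max(row) for row in h)


print(best_local_score(sw_scoring_matrix("TTACGTT", "ACG")))
```

The matrix holds R*C scores plus a border row and column, so its memory footprint grows with the product of the candidate interval length and the spliced query length, which is what the formula M=R*C expresses.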
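Based on the gene sequencing data processing device described above, an embodiment of the present invention further provides a gene sequencing data processing method.
As shown in Figure 5, the gene sequencing data processing method is applied to the gene sequencing data processing device and includes the following steps:
Step S1: the idle CPU module reads the gene sequencing data in batches to obtain batched gene sequencing data;
Step S2: the idle CPU module splits the gene analysis method to obtain the first algorithm and the second algorithm;
Step S3: the idle CPU module cuts the batched gene sequencing data according to the first algorithm to obtain the short sequences, and sends the short sequences and the second algorithm to the idle GPU module;
Step S4: the idle GPU module computes on the short sequences according to the second algorithm and sends the calculation result to the idle CPU module;
Step S5: the idle CPU module obtains the batch processing result from the calculation result and the first algorithm;
Steps S1 to S5 are repeated until the gene sequencing data has been processed, and the idle CPU module then integrates the batch processing results to obtain the final processing result.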
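The control flow of steps S1 to S5 can be sketched as a plain loop. Everything here is illustrative: the function and parameter names are hypothetical, the split of the analysis method (S2) is represented by the caller passing separate CPU-side and GPU-side callables, and the loop runs the stages one after another rather than overlapping them as the device does.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def process_in_batches(reads: Sequence[str],
                       batch_size: int,
                       split_on_cpu: Callable[[Sequence[str]], List[str]],
                       compute_on_gpu: Callable[[List[str]], List[T]],
                       finish_on_cpu: Callable[[List[T]], T],
                       merge: Callable[[List[T]], T]) -> T:
    """Run S1..S5 per batch, then integrate the batch results."""
    batch_results = []
    for start in range(0, len(reads), batch_size):
        batch = reads[start:start + batch_size]        # S1: read one batch
        short_seqs = split_on_cpu(batch)               # S3: first algorithm cuts
        gpu_out = compute_on_gpu(short_seqs)           # S4: second algorithm
        batch_results.append(finish_on_cpu(gpu_out))   # S5: batch result
    return merge(batch_results)                        # final integration


# Toy stand-ins: cut reads into 2-mers, "compute" their lengths, sum per batch.
result = process_in_batches(
    ["ACGT", "GGCC", "TTAA"], 2,
    split_on_cpu=lambda batch: [r[i:i + 2] for r in batch for i in range(0, len(r), 2)],
    compute_on_gpu=lambda seqs: [len(s) for s in seqs],
    finish_on_cpu=sum,
    merge=sum,
)
print(result)
```

Passing the two halves of the method as callables keeps the loop independent of the particular analysis, so the same skeleton would apply whether the GPU side is a hash kernel or a scoring-matrix kernel.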
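Specifically, because a single sequencing run produces a large amount of data, the gene sequencing data is large and can be analyzed and processed in batches, which avoids congestion in data transmission. The gene sequencing data read by the idle CPU in the i-th batch is denoted Di. The idle CPU module reads the gene sequencing data Di and splits the gene analysis method to obtain the first algorithm and the second algorithm; it cuts Di according to the first algorithm to obtain the short sequences and sends them, together with the second algorithm, to the idle GPU module. The idle GPU module then computes on the short sequences according to the second algorithm and sends the calculation result to the idle CPU module, which obtains the batch processing result from the calculation result and the first algorithm. Meanwhile, the idle CPU module reads the gene sequencing data Di+1 (the data read in batch i+1), cuts it, and sends the corresponding short sequences to the idle GPU module; the idle GPU module processes the short sequences corresponding to Di+1 and sends the processing result to the idle CPU module. The idle CPU module and the idle GPU module keep reading, cutting, transmitting, computing on, and returning gene sequencing data (that is, they keep repeating steps S1 to S5) until all the gene sequencing data has been processed; in this process a pipeline forms between the idle CPU module and the idle GPU module.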
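In one embodiment, the idle CPU module scans the GPU modules, determines the number of idle GPU modules and the data processing capacity of each idle GPU module, and reads the gene sequencing data in batches according to that number and those capacities.
In one embodiment, the gene analysis algorithm includes a gene alignment algorithm, a Dotplot algorithm, a BLAST algorithm, a PAM algorithm, an HMM algorithm, and an AI inference algorithm.
In one embodiment, the gene alignment algorithm includes the BWT algorithm, and the first algorithm includes the anchor point cutting algorithm; the idle CPU module uses the anchor point cutting algorithm to locate anchor points in the batched gene sequencing data, extends N bp forward and backward from each anchor point as the center, and uses NEON instructions to cut the batched gene sequencing data into segments of 2N+1 bp, obtaining the short sequences, where N is any positive integer.
In one embodiment, the step of obtaining the short sequences includes calculating them using the following formula:
(2*N+1)*x<L
where x represents the number of anchor points, N represents the number of bp extended, and L represents the length of the batched gene sequencing data.
In one embodiment, the second algorithm is a Hash algorithm; the idle GPU module is further used to perform a Hash operation on each short sequence according to the Hash algorithm to obtain a Hash calculation result, and to send the Hash calculation result to the idle CPU module; the Hash calculation result is the value of the BWT algorithm matrix and is used in calculating the BWT algorithm matrix.
In one embodiment, the first algorithm further includes a BWT matrix transformation algorithm; the idle CPU module transforms the BWT algorithm matrix with the BWT matrix transformation algorithm to obtain the BWT transformation result of the short sequence.
In one embodiment, the alignment algorithm includes the Smith-Waterman algorithm, and the second algorithm includes a scoring matrix algorithm; the idle GPU module is further used to calculate the Smith-Waterman scoring matrix from the scoring matrix algorithm, the short sequences, and the reference species sequence, and to send the Smith-Waterman scoring matrix to the idle CPU module.
In one embodiment, the step of calculating the Smith-Waterman scoring matrix includes calculating it using the following formulas: M=R*C, R=a*L^2+b; where M represents the Smith-Waterman scoring matrix, R is the length of the candidate interval sequence of the reference species, C represents the length of the short sequence formed by screening and splicing the short sequences received from the idle CPU module, L represents the length of the batched gene sequencing data, and a and b represent constants.
For the specific limitations of the gene sequencing data processing method, refer to the limitations of the gene sequencing data processing device above; they are not repeated here.
The above are only preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art can make improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the scope of protection of the present invention.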

Claims (10)

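  1. A gene sequencing data processing method applied to a gene sequencing data processing device, characterized in that the gene sequencing data processing device has a heterogeneous multi-core architecture comprising an ARM architecture, a GPU architecture, and a PCI bus; the ARM architecture is connected to the GPU architecture through the PCI bus; the ARM architecture includes at least one CPU module; the GPU architecture includes at least one GPU module; and the method comprises the following steps:
    Step S1: the idle CPU module reads the gene sequencing data in batches to obtain batched gene sequencing data;
    Step S2: the idle CPU module splits the gene analysis method to obtain a first algorithm and a second algorithm;
    Step S3: the idle CPU module cuts the batched gene sequencing data according to the first algorithm to obtain short sequences, and sends the short sequences and the second algorithm to the idle GPU module;
    Step S4: the idle GPU module computes on the short sequences according to the second algorithm and sends the calculation result to the idle CPU module;
    Step S5: the idle CPU module obtains a batch processing result from the calculation result and the first algorithm;
    Steps S1 to S5 are repeated until the gene sequencing data has been processed, and the idle CPU module then integrates the batch processing results to obtain the final processing result.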
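  2. The gene sequencing data processing method according to claim 1, characterized in that the idle CPU module scans the GPU modules, determines the number of idle GPU modules and the data processing capacity of each idle GPU module, and reads the gene sequencing data in batches according to that number and those capacities.
  3. The gene sequencing data processing method according to claim 1, characterized in that the gene analysis algorithm includes a gene alignment algorithm, a Dotplot algorithm, a BLAST algorithm, a PAM algorithm, an HMM algorithm, and an AI inference algorithm.
  4. The gene sequencing data processing method according to claim 3, characterized in that the gene alignment algorithm includes the BWT algorithm, and the first algorithm includes the anchor point cutting algorithm;
    the idle CPU module uses the anchor point cutting algorithm to locate anchor points in the batched gene sequencing data, extends N bp forward and backward from each anchor point as the center, and uses NEON instructions to cut the batched gene sequencing data into segments of 2N+1 bp, obtaining the short sequences, where N is any positive integer.
  5. The gene sequencing data processing method according to claim 4, characterized in that the step of obtaining the short sequences includes:
    calculating the short sequences using the following formula:
    (2*N+1)*x<L
    where x represents the number of anchor points, N represents the number of bp extended, and L represents the length of the batched gene sequencing data.
  6. The gene sequencing data processing method according to claim 3 or 4, characterized in that the second algorithm is a Hash algorithm;
    the idle GPU module performs a Hash operation on each short sequence according to the Hash algorithm to obtain a Hash calculation result and sends the Hash calculation result to the idle CPU module; the Hash calculation result is the value of the BWT algorithm matrix and is used in calculating the BWT algorithm matrix.
  7. The gene sequencing data processing method according to claim 6, characterized in that the first algorithm further includes a BWT matrix transformation algorithm;
    the idle CPU module transforms the BWT algorithm matrix with the BWT matrix transformation algorithm to obtain the BWT transformation result of the short sequence.
  8. The gene sequencing data processing method according to claim 3, characterized in that the alignment algorithm includes the Smith-Waterman algorithm, and the second algorithm includes a scoring matrix algorithm;
    the idle GPU module calculates the Smith-Waterman scoring matrix from the scoring matrix algorithm, the short sequences, and the reference species sequence, and sends the Smith-Waterman scoring matrix to the idle CPU module.
  9. The gene sequencing data processing method according to claim 8, characterized in that the step of calculating the Smith-Waterman scoring matrix includes:
    calculating the Smith-Waterman scoring matrix using the following formulas:
    M=R*C
    R=a*L^2+b
    where M represents the Smith-Waterman scoring matrix, R is the length of the candidate interval sequence of the reference species, C represents the length of the short sequence formed by screening and splicing the short sequences received from the idle CPU module, L represents the length of the batched gene sequencing data, and a and b represent constants.
  10. A gene sequencing data processing device, characterized in that the gene sequencing data processing device executes the gene sequencing data processing method according to any one of claims 1-9.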
PCT/CN2020/127101 2020-10-22 2020-11-06 Gene sequencing data processing method and gene sequencing data processing device WO2022082879A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP2021571845A JP7393439B2 (ja) 2020-10-22 2020-11-06 Gene sequencing data processing method and gene sequencing data processing device
AU2020450960A AU2020450960A1 (en) 2020-10-22 2020-11-06 Method for processing gene sequencing data and apparatus for processing gene sequencing data
EP20937176.4A EP4235678A1 (en) 2020-10-22 2020-11-06 Gene sequencing data processing method and gene sequencing data processing device
IL288594A IL288594A (en) 2020-10-22 2021-12-01 Method and apparatus for processing gene sequence data
AU2023266239A AU2023266239A1 (en) 2020-10-22 2023-11-13 Method for processing gene sequencing data and apparatus for processing gene sequencing data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011139823.4 2020-10-22
CN202011139823.4A CN112259168B (zh) 2020-10-22 2020-10-22 Gene sequencing data processing method and gene sequencing data processing device

Publications (1)

Publication Number Publication Date
WO2022082879A1 (zh) 2022-04-28

Family

ID=74264788

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/127101 WO2022082879A1 (zh) 2020-10-22 2020-11-06 Gene sequencing data processing method and gene sequencing data processing device

Country Status (2)

Country Link
CN (1) CN112259168B (zh)
WO (1) WO2022082879A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116932631A (zh) * 2023-07-18 2023-10-24 哈尔滨晨文科技开发有限公司 Big-data-based detection data visualization management system and method

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
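CN113299344A (zh) * 2021-06-23 2021-08-24 深圳华大医学检验实验室 Gene sequencing analysis method, device, storage medium, and computer equipment
TWI819480B (zh) 2022-01-27 2023-10-21 緯創資通股份有限公司 Acceleration system and dynamic configuration method thereof
CN114328399B (zh) * 2022-03-15 2022-05-24 四川大学华西医院 Method and system for automatic pairing of multi-sample gene sequencing data files
CN116594745A (zh) * 2023-05-11 2023-08-15 阿里巴巴达摩院(杭州)科技有限公司 Task execution method, system, chip, and electronic device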

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
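CN104239732A (zh) * 2014-09-24 2014-12-24 湖南大学 Parallel general-purpose sequence alignment method running on a multi-core computer platform
CN104504303A (zh) * 2014-09-29 2015-04-08 肇庆学院 Sequence alignment method based on a CPU+GPU heterogeneous system
EP3428798A1 (en) * 2016-04-08 2019-01-16 Huawei Technologies Co., Ltd. Resource allocation method and device for genetic analysis
CN110135584A (zh) * 2019-03-30 2019-08-16 华南理工大学 Large-scale symbolic regression method and system based on an adaptive parallel genetic algorithm
CN110473593A (zh) * 2019-07-25 2019-11-19 深圳大学 FPGA-based Smith-Waterman algorithm implementation method and device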

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279445A (zh) * 2012-09-26 2013-09-04 上海中科高等研究院 Computing method for operation tasks and supercomputing system
CN106295250B (zh) * 2016-07-28 2019-03-29 北京百迈客医学检验所有限公司 Method and device for rapid alignment analysis of second-generation sequencing short sequences
WO2020124275A1 (en) * 2018-12-21 2020-06-25 Huawei Technologies Co., Ltd. Method, system, and computing device for optimizing computing operations of gene sequencing system
CN110427262B (zh) * 2019-09-26 2020-05-15 深圳华大基因科技服务有限公司 Gene data analysis method and heterogeneous scheduling platform

Also Published As

Publication number Publication date
CN112259168A (zh) 2021-01-22
CN112259168B (zh) 2023-03-28

Similar Documents

Publication Publication Date Title
WO2022082879A1 (zh) 基因测序数据处理方法和基因测序数据处理装置
Nobile et al. Graphics processing units in bioinformatics, computational biology and systems biology
CN107563150B (zh) Method, apparatus, device and storage medium for predicting protein binding sites
Ng et al. Reconfigurable acceleration of genetic sequence alignment: A survey of two decades of efforts
Sadasivan et al. Accelerating Minimap2 for accurate long read alignment on GPUs
Wu et al. FPGA accelerated INDEL realignment in the cloud
Du et al. A tile-based parallel Viterbi algorithm for biological sequence alignment on GPU with CUDA
Chen et al. A hybrid short read mapping accelerator
Du et al. DeepAdd: protein function prediction from k-mer embedding and additional features
Houtgast et al. An efficient GPU-accelerated implementation of genomic short read mapping with BWA-MEM
Aguado-Puig et al. Accelerating edit-distance sequence alignment on GPU using the wavefront algorithm
Ng et al. Acceleration of short read alignment with runtime reconfiguration
Soto et al. JACC-FPGA: A hardware accelerator for Jaccard similarity estimation using FPGAs in the cloud
WO2021113779A1 (en) Rapid detection of gene fusions
Yin et al. Improving the prediction of DNA-protein binding by integrating multi-scale dense convolutional network with fault-tolerant coding
JP7393439B2 (ja) Gene sequencing data processing method and gene sequencing data processing device
RU2799005C2 (ru) Method for processing gene sequencing data and device for processing gene sequencing data
CN114999566A (zh) Drug repositioning method and system based on word vector representation and attention mechanism
Hazelhurst Algorithms for clustering expressed sequence tags: the wcd tool: reviewed article
Gudur et al. Hardware-algorithm codesign for fast and energy efficient approximate string matching on FPGA for computational biology
Anderson et al. An FPGA-based hardware accelerator supporting sensitive sequence homology filtering with profile hidden Markov models
Nasrin et al. PSALR: Parallel Sequence Alignment for long Sequence Read with Hash model
Kieu-Do-Nguyen et al. High-Performance FPGA-Based BWA-MEM Accelerator
Kawam et al. A GPU-CPU heterogeneous algorithm for NGS read alignment
Surendar et al. Micro Sequence Identification of DNA Data Using Pattern Mining Techniques

Legal Events

Date Code Title Description
ENP Entry into the national phase Ref document number: 2021571845; Country of ref document: JP; Kind code of ref document: A
ENP Entry into the national phase Ref document number: 2020450960; Country of ref document: AU; Date of ref document: 20201106; Kind code of ref document: A
121 EP: the EPO has been informed by WIPO that EP was designated in this application Ref document number: 20937176; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE
ENP Entry into the national phase Ref document number: 2020937176; Country of ref document: EP; Effective date: 20230522
WWE WIPO information: entry into national phase Ref document number: 521431052; Country of ref document: SA