CN110767265A - Parallel acceleration method for sorting big data genome alignment files - Google Patents

Parallel acceleration method for sorting big data genome alignment files

Info

Publication number
CN110767265A
CN110767265A
Authority
CN
China
Prior art keywords
file
sorting
buffer
reading
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911008972.4A
Other languages
Chinese (zh)
Inventor
张中海
谭光明
张春明
姚二林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201911008972.4A priority Critical patent/CN110767265A/en
Publication of CN110767265A publication Critical patent/CN110767265A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 - ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 - Sequence alignment; Homology search
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 - ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 - Data warehousing; Computing architectures
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 - ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/50 - Compression of genetic data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a parallel acceleration method for sorting big data genome alignment files, comprising the following steps: reading and decompressing a target BAM file and storing the decompressed data into a contiguous first buffer B; when the first buffer B is full, sorting its data with multiple threads and merging the results by heap sorting to form an intermediate file; reading the intermediate files in turn into their associated second buffers MB and merging the data of the second buffers MB by heap sorting; and compressing the merged data with multiple threads and writing it into a result file. The invention allocates separate threads for reading and decompression and builds dedicated thread pools for decompression and compression, which reduces the number of threads that must be opened, makes full use of multithreading resources, improves file read/write efficiency, reduces the number of intermediate files and the number of memory copy operations, and thereby shortens the processing time.

Description

Parallel acceleration method for sorting big data genome alignment files
Technical Field
The invention relates to the field of high-performance computing, and in particular to a parallel acceleration method for sorting big data genome alignment files.
Background
In recent years, advances in gene sequencing technology have driven rapid development in the field of genomic health. The rapid growth of genomic data poses ever greater challenges for genetic analysis techniques. How to process this big data from the biological gene field quickly has become a major research focus in bioinformatics and high-performance computing.
In clinical and scientific research, the mainstream analysis pipeline for human genome data includes genome alignment, sorting, duplicate removal, indel realignment, base quality score recalibration, variant detection and the like. The intermediate files to be sorted that are generated along the way range from tens of GB at the small end to hundreds of GB at the large end. Existing processing software, such as SAMtools, carries out the sorting process mainly in two stages:
the first stage, the file to be ordered is read into the memory, only one part is read each time, the data of the part is related to the size of the system memory and can be manually set; then, the part of data is evenly distributed to a plurality of threads, and each thread only sequences the distributed data; writing the ordered data into a temporary file, and continuously reading the next part for the same processing; at the end of this phase, many ordered temporary files will be generated.
In the second stage, the temporary files are merged with a heap sorting algorithm and the result is finally written to a result file.
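For illustration, the following Python sketch shows a generic two-stage external sort of the kind described above (run generation, then heap-based merge). It operates on plain text records and is only a minimal sketch of the general technique, not SAMtools' actual implementation; the portion size is an assumed parameter.

```python
import heapq
import os
import tempfile

def external_sort(in_path: str, out_path: str, records_per_run: int = 100_000) -> None:
    run_paths = []
    with open(in_path) as f:
        while True:
            # Stage 1: read one memory-sized portion, sort it, write a temporary run file.
            portion = [line for _, line in zip(range(records_per_run), f)]
            if not portion:
                break
            portion.sort()
            fd, run_path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(portion)
            run_paths.append(run_path)
    # Stage 2: heap-merge all sorted runs into the result file.
    runs = [open(p) for p in run_paths]
    with open(out_path, "w") as out:
        out.writelines(heapq.merge(*runs))
    for r in runs:
        r.close()
    for p in run_paths:
        os.remove(p)
```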
For alignment files in the BAM format, the conventional sorting method described above must decompress on every read and compress on every write. Decompression and sorting consume relatively few processor resources, whereas compression consumes a great many. This is the biggest difference between sorting big data gene alignment files and other big data sorting workloads. Because too many temporary files are generated during sorting and too many threads are opened for reading and writing files, disk reads become inefficient when the temporary files are merged, so the traditional sorting approach is very inefficient and time-consuming.
It can be seen that, for the sorting of SAM/BAM files, efficiency depends on how well computing resources such as hard disk read/write speed, memory size and processor computing power are coordinated. However, existing sorting methods and tools such as SAMtools cannot make reasonable use of computing and hardware resources, and therefore cannot achieve faster sorting.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a parallel acceleration method for sorting big data genome alignment files. The method makes full use of the read/write performance of the hard disk while coordinating the use of computing and memory resources by the decompression, sorting and compression operations in the file processing flow, thereby improving processing efficiency and reducing processing time.
In order to achieve the above object, in one aspect, the present invention provides a parallel acceleration method for sorting big data genome alignment files, which is characterized by comprising the following steps:
step 101: reading and decompressing a target BAM file, and storing the decompressed data into a contiguous first buffer B;
step 102: after the first buffer B is full, distributing the data in the first buffer B by blocks to a plurality of threads for sorting, merging the results of the sorting threads by heap sorting, and compressing the merged data to form an intermediate file;
step 103: reading the intermediate files in turn, associating each intermediate file to be read with a second buffer, reading and decompressing these intermediate files into their associated second buffers MB, and merging the data of the second buffers MB by heap sorting;
step 104: compressing the merged data with a plurality of threads, and writing the compressed data into a result file.
In a preferred implementation, the step 101 includes allocating a read thread and a decompression thread for the read operation and the decompression operation respectively, where the number of the read threads is less than the number of the decompression threads.
In another preferred implementation, the step 101 includes: when data reading is carried out, one thread is allocated for reading, and a plurality of threads are allocated for decompression.
In another preferred implementation, the step 102 includes dividing the first buffer B into a plurality of buffer blocks, and each sorting thread is associated with a buffer block for sorting data therein.
In another preferred implementation, the step 102 includes creating a first thread pool and a second thread pool each containing a plurality of threads, the first thread pool being used for decompression operations and the second thread pool being used for compression operations.
In another preferred implementation, the step 104 includes: associating each intermediate file F to be read with a read-in thread, a read queue and a second buffer region MB, reading and decompressing the intermediate file F in sequence through the read-in thread and the decompression thread pool, and storing the intermediate file F into the associated second buffer region MB.
In another preferred implementation, the method further comprises: the plurality of intermediate files are respectively read into the associated second buffer areas MB, the files in the second buffer areas MB are read for heap sorting, and the result of the heap sorting is written into the result file.
In another preferred implementation, the step 103 includes: the data of each second buffer MB is merged by means of heap sorting.
In another aspect, the invention provides a computer readable storage medium having a computer program stored thereon, wherein the program when executed by a processor implements the method.
In another aspect, the invention provides a computer device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the method when executing the program. During data reading, the amount of data read and decompressed in a single pass can be determined according to the memory size.
It should be noted that the "to-be-read" intermediate file mentioned in the present invention refers to an intermediate file for which a free second buffer MB is available, i.e., an intermediate file that is about to be associated with the corresponding second buffer MB and read into it.
The invention has the following advantages:
the parallel acceleration method for sorting files by comparing large data genomes disclosed by the invention has the advantages that threads are respectively and independently allocated for reading and decompressing (the number of the reading threads is less than that of the decompressing threads), thread pools are respectively established for decompressing and compressing, so that the number of the opened threads is greatly reduced, multithreading resources are fully utilized, and the file reading and writing efficiency is further improved.
By constructing two independent buffers, the parallel acceleration method for sorting big data genome alignment files makes full use of the read/write performance of the hard disk; by allocating reading, decompression and compression threads reasonably, it coordinates the use of computing and memory resources by the decompression, sorting and compression operations during file processing, thereby improving processing efficiency and reducing processing time.
Drawings
FIG. 1 is a schematic flow chart of a parallel acceleration method for big data genome alignment file sorting according to the present invention.
FIG. 2 is a first part of a processing flow diagram of a parallel acceleration method for big data genome alignment file sorting according to an embodiment of the present invention.
FIG. 3 is a second portion of the process flow diagram of the embodiment shown in FIG. 2.
FIG. 4 is a third portion of the process flow diagram of the embodiment shown in FIG. 2.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in fig. 1, the parallel acceleration method for big data genome alignment file sorting according to the present invention generally includes the following steps:
step 101: obtaining a target BAM file to be sorted, reading and decompressing it, and storing the decompressed data into a contiguous first buffer B;
step 102: after the first buffer B is full, distributing its data to a plurality of threads for sorting, merging the results of the sorting threads by heap sorting, and compressing the merged data to form an intermediate file;
step 103: reading the intermediate files in turn, associating a second buffer with each intermediate file to be read, reading and decompressing these intermediate files into their associated second buffers MB, and merging the data of the second buffers MB by heap sorting;
step 104: compressing the merged data with a plurality of threads and writing the compressed data into a result file;
step 105: judging whether all the intermediate files have been processed; if so, the process ends, otherwise it returns to step 103.
The whole flow of the parallel acceleration method for sorting big data genome alignment files according to the invention is described in detail as follows:
Because the input BAM file is too large to be loaded into memory at once, only a part of its content can be read at a time. FIG. 2 shows the process of reading and decompressing the BAM file and placing it into buffer B. To avoid excessive memory allocation and release and copying between memory regions, one contiguous large memory area is requested as buffer B (larger than a preset value, the preset value being determined from the total memory size), and reading, decompression and caching are then carried out block by block.
Specifically, in this embodiment, two thread pools each containing N threads are created for the decompression and compression operations involved: a decompression thread pool and a compression thread pool. The threads in the decompression thread pool are used exclusively for decompression, the threads in the compression thread pool are used exclusively for compression, and all decompression and compression during the sorting process is performed by the threads of these two pools.
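As a concrete illustration, the minimal Python sketch below sets up two fixed-size thread pools dedicated to decompression and compression respectively. The pool size N and the use of gzip in place of the BAM/BGZF codec are illustrative assumptions, not the patented implementation.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

N = 8  # assumed pool size; in practice tuned to the machine

# Threads in this pool are used exclusively for decompression.
decompress_pool = ThreadPoolExecutor(max_workers=N)
# Threads in this pool are used exclusively for compression.
compress_pool = ThreadPoolExecutor(max_workers=N)

def decompress_block(block: bytes) -> bytes:
    return gzip.decompress(block)

def compress_block(data: bytes) -> bytes:
    return gzip.compress(data)

# Every (de)compression task in the sorting pipeline is submitted to the
# corresponding pool instead of spawning a fresh thread per block, e.g.:
#   future = decompress_pool.submit(decompress_block, raw_block)
#   plain = future.result()
```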
The input BAM file is read by an IO thread, and a read-in thread RT and a read queue RQ are associated with it. The read-in thread RT reads content from the input file block by block and places each block of data into the read queue RQ, which serves as the queue of blocks awaiting decompression. The decompression thread pool takes each block from the read queue RQ and hands it to a decompression thread, which decompresses the block and places the decompressed data into buffer B in order of block number. When buffer B is full, the read-in thread RT blocks and waits before reading further. The task queue to be decompressed in FIG. 2 is the read queue RQ, and the memory in FIG. 2 is buffer B in memory.
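A sketch of this producer/consumer structure is shown below: one read-in thread RT fills the read queue RQ block by block, and pool workers place processed blocks into buffer B under their block number. For simplicity the sketch reads a file in fixed-size chunks and omits the real BGZF block decompression; the dict standing in for the contiguous buffer B, the block size and the queue depth are all illustrative assumptions.

```python
import queue
import threading

BLOCK_SIZE = 4 * 1024 * 1024          # assumed read granularity
RQ: "queue.Queue[tuple[int, bytes]]" = queue.Queue(maxsize=16)  # read queue RQ
buffer_B: dict[int, bytes] = {}       # stands in for the contiguous buffer B
buffer_lock = threading.Lock()
EOF = (-1, b"")

def read_thread(path: str) -> None:
    """RT: read the input file block by block and enqueue (block_no, data)."""
    with open(path, "rb") as f:
        block_no = 0
        while chunk := f.read(BLOCK_SIZE):
            RQ.put((block_no, chunk))  # blocks when the queue is full
            block_no += 1
    RQ.put(EOF)

def decompress_worker() -> None:
    """Pool worker: process one block at a time and store it under its block number."""
    while True:
        block_no, chunk = RQ.get()
        if block_no < 0:
            RQ.put(EOF)                # let the remaining workers terminate too
            break
        data = chunk                   # real pipeline: decompress the BGZF block here
        with buffer_lock:
            buffer_B[block_no] = data  # placed by block number, as described above

# Usage: start read_thread(path) in a threading.Thread and submit several
# decompress_worker tasks to the decompression thread pool of the previous sketch.
```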
Sorting is then carried out: the main thread creates PN sorting threads, where PN is a positive integer. As shown in FIG. 3, buffer B is divided into PN blocks; in the example of the figure PN is 4, i.e. the data read into memory (for example, SAM records) is divided into PN blocks, each block is associated with one sorting thread, and each thread sorts only the data associated with it. After all sorting threads have finished, the data of the PN threads is merged with a heap sorting algorithm.
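The block-parallel sort followed by a heap merge can be sketched as follows. Records are modeled as (position, record) tuples; the sort key, the PN value and the in-memory list standing in for buffer B are assumptions. Note that in CPython the GIL limits how much of the comparison work actually runs in parallel, a limitation a native implementation would not have.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

PN = 4  # number of sorting threads, as in the example of FIG. 3

def sort_block(block: list[tuple[int, str]]) -> list[tuple[int, str]]:
    # Each sorting thread sorts only the block it is associated with.
    return sorted(block)

def sort_buffer(records: list[tuple[int, str]]) -> list[tuple[int, str]]:
    # Divide buffer B into PN contiguous blocks, one per sorting thread.
    step = max(1, (len(records) + PN - 1) // PN)
    blocks = [records[i:i + step] for i in range(0, len(records), step)]
    with ThreadPoolExecutor(max_workers=PN) as pool:
        sorted_blocks = list(pool.map(sort_block, blocks))
    # Merge the PN sorted blocks with a heap (k-way merge).
    return list(heapq.merge(*sorted_blocks))

# Example: sort_buffer([(1002, "r3"), (17, "r1"), (530, "r2"), (9, "r0")])
# returns the records ordered by position.
```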
Each time buffer B has been filled, sorted by the multiple threads and merged by heap sorting in this way, the result is written out as one intermediate file F. Repeating the above steps yields a number of intermediate files F.
The merged data is handed to the compression thread pool for compression, the compressed blocks are placed into a write queue WQ in sequence, and an IO thread writes them out. Specifically, the intermediate file F may be associated with a write thread WT and a write queue WQ, and the write thread WT writes the data in the queue to the hard disk in order. The sorted intermediate file F (BAM format) in FIG. 3 is any one of the intermediate files F, and the SAM data read into memory refers to the data held in buffer B.
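The ordered write path can be sketched as follows: the merged records are cut into blocks, each block is compressed by the compression thread pool, and a single write thread WT writes the compressed blocks to disk strictly in block order. gzip again stands in for the BAM/BGZF codec, and the block size and the reorder buffer inside the write thread are assumptions made for illustration.

```python
import gzip
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

compress_pool = ThreadPoolExecutor(max_workers=4)      # compression thread pool
WQ: "queue.Queue[tuple[int, bytes]]" = queue.Queue()   # write queue WQ

def _write_thread(path: str, n_blocks: int) -> None:
    """WT: drain WQ and write blocks to disk in ascending block order."""
    pending: dict[int, bytes] = {}
    next_no = 0
    with open(path, "wb") as f:
        while next_no < n_blocks:
            block_no, data = WQ.get()
            pending[block_no] = data
            while next_no in pending:                   # flush any in-order prefix
                f.write(pending.pop(next_no))
                next_no += 1

def write_file(path: str, merged: list[str], records_per_block: int = 10000) -> None:
    blocks = ["".join(merged[i:i + records_per_block]).encode()
              for i in range(0, len(merged), records_per_block)]
    writer = threading.Thread(target=_write_thread, args=(path, len(blocks)))
    writer.start()
    # Submission order fixes each block's number; compression runs in the pool.
    futures = [compress_pool.submit(gzip.compress, b) for b in blocks]
    for block_no, fut in enumerate(futures):
        WQ.put((block_no, fut.result()))
    writer.join()
```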
These operations are repeated until the entire input file has been processed.
The above processing generates a number of intermediate files F, say M of them. Since M may be large and only a limited number of intermediate files can be processed at the same time, the intermediate files may need to be read and decompressed in several rounds. FIG. 4 shows two intermediate files F being decompressed, buffered, heap-sorted and compression-written simultaneously.
The following description takes the concurrent reading and heap sorting of two intermediate files as an example. For each intermediate file F to be read, a read thread, a read queue and a buffer MB are associated with it when it is read. The intermediate file F is read and decompressed by its read thread together with decompression threads from the decompression thread pool (for example, a single read thread and multiple decompression threads; since this read-and-decompress flow is similar to that of FIG. 2, it is drawn in simplified form in FIG. 4), and the associated buffer MB is filled. The data currently held in all the buffers MB is then merged with a heap sorting algorithm. When a buffer MB has been exhausted (its contents have been read out and heap-sorted), the read thread is started again, the corresponding intermediate file F is read further through the read thread and the decompression threads of the decompression thread pool, and the buffer MB is refilled with decompressed data. Once the buffer MB has been refilled, the merge operation continues.
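The multi-file merge can be sketched as below: each intermediate file F has an associated buffer MB that is refilled whenever it runs empty, and a heap always holds the current smallest record of every buffer. Plain, individually sorted text files stand in for the compressed BAM intermediates, the buffer capacity is an assumption, and the read thread and decompression pool refill path of FIG. 4 is reduced to a simple file read.

```python
import heapq

MB_RECORDS = 1000   # assumed capacity of one second buffer MB

class FileBuffer:
    """One intermediate file F plus its associated second buffer MB."""
    def __init__(self, path: str):
        self._fh = open(path, "r")
        self.buf: list[str] = []
        self.pos = 0
        self.refill()

    def refill(self) -> None:
        # In the real pipeline this re-launches the read thread and the
        # decompression pool; here it simply reads the next MB_RECORDS lines.
        self.buf = [line for _, line in zip(range(MB_RECORDS), self._fh)]
        self.pos = 0

    def next(self):
        if self.pos == len(self.buf):
            self.refill()
        if not self.buf:
            self._fh.close()
            return None
        rec = self.buf[self.pos]
        self.pos += 1
        return rec

def merge_intermediates(paths: list[str], out_path: str) -> None:
    buffers = [FileBuffer(p) for p in paths]
    heap = [(b.next(), i) for i, b in enumerate(buffers)]
    heap = [(r, i) for r, i in heap if r is not None]
    heapq.heapify(heap)
    with open(out_path, "w") as out:       # compression omitted in this sketch
        while heap:
            rec, i = heapq.heappop(heap)
            out.write(rec)
            nxt = buffers[i].next()
            if nxt is not None:
                heapq.heappush(heap, (nxt, i))

# Usage: merge_intermediates(["part0.txt", "part1.txt"], "sorted.txt"),
# where each input file is already sorted.
```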
The merged result is written to a result file FN. Similarly to the writing of an intermediate file F, a write thread and a write queue are associated with the merged output; the compression threads in the compression thread pool CP compress the merged data, and the write thread finally writes the compressed data into the result file FN.
The read queue RQ of FIG. 2 shows only 4 blocks of data; it should be understood that the number of data blocks in the read queue RQ may be 4 or more. The number M of intermediate files F shown in FIG. 4 is 2; it should be understood that M may be 2 or more. The number of data blocks in RQ and the number M of intermediate files F are determined by the data size of the input BAM file and the size of buffer B in memory, and are not described further here.
Compared with existing processing software such as SAMtools, under identical hardware conditions (for example, on the same computer or server) and sorting target files of the same size (for example, 100 GB), the sorting method of the invention increases data sorting speed by 40-50%.
By greatly reducing the number of threads that must be opened, the parallel acceleration method for sorting big data genome alignment files according to the invention makes full use of multithreading resources, thereby improving file read/write efficiency, reducing the number of intermediate files, reducing the number of memory copy operations and shortening the processing time.
The parallel acceleration method for sorting big data genome alignment files according to the invention makes full use of the read/write performance of the hard disk and, by coordinating the use of computing and memory resources by the decompression, sorting and compression operations during file processing, achieves the goals of improving processing efficiency and reducing processing time.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A parallel acceleration method for sorting big data genome alignment files, characterized by comprising the following steps:
step 101: reading and decompressing a target BAM file, and storing the decompressed data into a contiguous first buffer B;
step 102: after the first buffer B is full, distributing the data in the first buffer B by blocks to a plurality of threads for sorting, merging the results of the sorting threads by heap sorting, and compressing the merged data to form an intermediate file;
step 103: reading the intermediate files in turn, associating each intermediate file to be read with a second buffer, reading and decompressing these intermediate files into their associated second buffers MB, and merging the data of the second buffers MB by heap sorting;
step 104: compressing the merged data with a plurality of threads, and writing the compressed data into a result file.
2. The parallel acceleration method for big data genome alignment file sorting according to claim 1, wherein the step 101 comprises allocating a reading thread and a decompression thread for the reading operation and the decompression operation respectively, and the number of the reading threads is less than that of the decompression threads.
3. The parallel acceleration method for big data genome alignment file sorting according to claim 2, wherein the step 101 comprises: when data reading is carried out, one thread is allocated for reading, and a plurality of threads are allocated for decompression.
4. The method of claim 1, wherein the step 102 comprises dividing the first buffer B into a plurality of buffer blocks, and each sorting thread is associated with one buffer block for sorting data therein.
5. The method of claim 1, wherein the step 102 comprises creating a first thread pool and a second thread pool respectively comprising a plurality of threads, the first thread pool being used for decompression operations and the second thread pool being used for compression operations.
6. The parallel acceleration method for big data genome alignment file sorting according to claim 1, wherein the step 104 comprises: associating each intermediate file F to be read with a read-in thread, a read queue and a second buffer region MB, reading and decompressing the intermediate file F in sequence through the read-in thread and the decompression thread pool, and storing the intermediate file F into the associated second buffer region MB.
7. The parallel acceleration method for big data genome alignment file sorting according to claim 6, characterized in that the method further comprises: the plurality of intermediate files are respectively read into the associated second buffer areas MB, the files in the second buffer areas MB are read for heap sorting, and the result of the heap sorting is written into the result file.
8. The parallel acceleration method for big data genome alignment file sorting according to claim 1, wherein the step 103 comprises: the data of each second buffer MB is merged by means of heap sorting.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
10. A computer device comprising a memory and a processor, on which memory a computer program is stored which is executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 8 when executing the program.
CN201911008972.4A 2019-10-23 2019-10-23 Parallel acceleration method for sorting big data genome alignment files Pending CN110767265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911008972.4A CN110767265A (en) 2019-10-23 2019-10-23 Parallel acceleration method for big data genome comparison file sequencing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911008972.4A CN110767265A (en) 2019-10-23 2019-10-23 Parallel acceleration method for big data genome comparison file sequencing

Publications (1)

Publication Number Publication Date
CN110767265A (en) 2020-02-07

Family

ID=69332927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911008972.4A Pending CN110767265A (en) 2019-10-23 2019-10-23 Parallel acceleration method for big data genome comparison file sequencing

Country Status (1)

Country Link
CN (1) CN110767265A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111309269A (en) * 2020-02-28 2020-06-19 苏州浪潮智能科技有限公司 Method, system and equipment for dropping compressed data and readable storage medium
CN114242173A (en) * 2021-12-22 2022-03-25 深圳吉因加医学检验实验室 Data processing method, device and storage medium for identifying microorganisms by using mNGS
US12026371B2 (en) 2020-02-28 2024-07-02 Inspur Suzhou Intelligent Technology Co., Ltd. Method, system, and device for writing compressed data to disk, and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130198464A1 (en) * 2012-01-27 2013-08-01 Comcast Cable Communications, Llc Efficient read and write operations
CN103577559A (en) * 2013-10-23 2014-02-12 华为技术有限公司 Data ordering method and device
CN104572106A (en) * 2015-01-12 2015-04-29 浪潮电子信息产业股份有限公司 Concurrent program developing method for processing of large-scale data based on small memory
CN110299187A (en) * 2019-07-04 2019-10-01 南京邮电大学 A kind of parallelization gene data compression method based on Hadoop


Similar Documents

Publication Publication Date Title
CN103559020B (en) A kind of DNA reads ordinal number according to the compression of FASTQ file in parallel and decompression method
US7401174B2 (en) File system defragmentation and data processing method and apparatus for an information recording medium
EP3309685B1 (en) Method and apparatus for writing data to cache
WO2015145647A1 (en) Storage device, data processing method, and storage system
TW201111986A (en) Memory apparatus and data access method for memories
CN111061434B (en) Gene compression multi-stream data parallel writing and reading method, system and medium
CN107632776A (en) For compressing the data storage device of input data
US8850148B2 (en) Data copy management for faster reads
CN110767265A (en) Parallel acceleration method for big data genome comparison file sequencing
CN108134609A (en) Multithreading compression and decompressing method and the device of a kind of conventional data gz forms
US10810174B2 (en) Database management system, database server, and database management method
WO2024045556A1 (en) L2p table updating method, system and apparatus, and nonvolatile readable storage medium
US8713278B2 (en) System and method for stranded file opens during disk compression utility requests
CN104239231B (en) A kind of method and device for accelerating L2 cache preheating
US8452900B2 (en) Dynamic compression of an I/O data block
CN117369731B (en) Data reduction processing method, device, equipment and medium
EP3869343A1 (en) Storage device and operating method thereof
TW201351276A (en) Scheduling and execution of compute tasks
US9507794B2 (en) Method and apparatus for distributed processing of file
CN108334457B (en) IO processing method and device
CN115933994A (en) Data processing method and device, electronic equipment and storage medium
EP4321981A1 (en) Data processing method and apparatus
CN114816322A (en) External sorting method and device of SSD and SSD memory
US20220188316A1 (en) Storage device adapter to accelerate database temporary table processing
CN112037874B (en) Distributed data processing method based on mapping reduction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200207