CN106603591B - Processing method and system for genome detection data transmission and preprocessing - Google Patents


Info

Publication number
CN106603591B
Authority
CN
China
Prior art keywords
data
genome
blocks
detection
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510663214.1A
Other languages
Chinese (zh)
Other versions
CN106603591A (en)
Inventor
王振飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Genedock Technology Co Ltd
Original Assignee
Beijing Genedock Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Genedock Technology Co Ltd
Priority to CN201510663214.1A
Publication of CN106603591A
Application granted
Publication of CN106603591B
Active legal status
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of genome sequencing data transmission, analysis and detection, and in particular to a processing method and system for genome detection data transmission and preprocessing. The method comprises: obtaining the genome detection data and partitioning it into blocks, wherein if the genome detection data is single-stranded data, the genome detection data of M Read short sequences is partitioned into P = INT(M/N) blocks, with every N Read short sequences forming one block, INT() being a round-up function and P being the number of blocks, and if the genome detection data is double-stranded data, the strand data R1 and the strand data R2 are each partitioned according to the single-stranded data partitioning method, so as to generate R1 block data and R2 block data, each R1 block matching one of the R2 blocks and vice versa; and transmitting the block data to a server for genome analysis and detection. The invention significantly reduces the time spent on genome data preprocessing and increases the fault tolerance of the processing.

Description

Processing method and system for genome detection data transmission and preprocessing
Technical Field
The invention relates to the field of genome sequencing data transmission, analysis and detection, and in particular to a processing method and a processing system for genome detection data transmission and preprocessing.
Background
The data preprocessing for genomic variation detection in the prior art comprises two main steps: first, the data are transmitted to server storage or a cloud storage service; then the sample's genome sequencing data stored there are aligned, in a single-machine computing mode, against the standard reference genome of the species to which the sample belongs.
In the prior art, data transmission and alignment against the species standard reference genome are two separate steps that are executed serially; that is, the alignment cannot start until all of the sample's genome sequencing data have been transmitted to the server or cloud storage service.
There are many ways to transmit data to the server or the cloud storage service, for example, using a data synchronization tool based on TCP or UDP protocol, such as FTP, SCP, RSYNC, etc., directly mounting a hard disk with genome data or other storage media to the server, or using a client provided by the cloud storage service to transmit.
The process of alignment with a standard reference genome of the species to which the sample belongs is typically a computationally intensive task. The computing tasks for this process in the prior art were handled using a high performance server (e.g., a minicomputer or mainframe).
The prior art has the following problems:
the processes of the original data transmission of the genome sequencing of the sample and the comparison with the standard reference genome of the species are executed in series, and the time consumption is long;
for genome data of double-ended sequencing, it cannot be guaranteed that paired double-ended short sequences are successfully transmitted at the same time, and even the transmission of two double-ended data files is serial, so that the whole data transmission process is long;
the comparison process with the species standard reference genome to which the sample belongs is carried out on a single server, and the computing power of the server becomes the bottleneck of task processing, so that the processing process of the step is time-consuming;
the process is usually run in a single task mode, if the task fails, the whole sample genome sequencing data processing process needs to be run again, the retry cost is high, the fault tolerance capability is weak, and the processing time is further prolonged.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a processing method and a system for genome detection data transmission and preprocessing.
The invention provides a processing method for genome detection data transmission and preprocessing, which comprises the following steps:
step 1, acquiring the genome detection data and partitioning it into blocks, wherein if the genome detection data is single-stranded data, the genome detection data of M Read short sequences is partitioned into P = INT(M/N) blocks, with every N Read short sequences forming one block, INT() being a round-up function and P being the number of blocks, and if the genome detection data is double-stranded data, the strand data R1 and the strand data R2 are each partitioned according to the single-stranded data partitioning method, so as to generate R1 block data and R2 block data, each R1 block matching one of the R2 blocks and vice versa;
step 2, transmitting the block data to a server for genome analysis and detection.
In the processing method for genome detection data transmission and preprocessing, step 1 further comprises: selectively compressing the blocks to reduce the size of the transmitted data.
In the processing method for genome detection data transmission and preprocessing, the mutually matching blocks of the R1 block data and the R2 block data are placed into the same data packet, or it is otherwise ensured that the mutually matching blocks are uploaded successfully together and are used together as the input of a genome data preprocessing task.
In the processing method for genome detection data transmission and preprocessing, in step 1, the number of short sequences contained in each block is any integer from 1 to M inclusive, where M is the number of Read short sequences contained in the sample.
In the processing method for genome detection data transmission and preprocessing, the block data in step 2 are transmitted to the server in parallel and are analyzed and detected in parallel, and an error in any one block does not affect the analysis and detection of the remaining blocks.
The invention also provides a processing system for genome detection data transmission and preprocessing, which comprises:
a partitioning module, configured to obtain the genome detection data and partition it into blocks, where if the genome detection data is single-stranded data, the genome detection data of M Read short sequences is partitioned into P = INT(M/N) blocks, with every N Read short sequences forming one block, INT() being a round-up function and P being the number of blocks, and if the genome detection data is double-stranded data, the strand data R1 and the strand data R2 are each partitioned according to the single-stranded data partitioning method, so as to generate R1 block data and R2 block data, each R1 block matching one of the R2 blocks and vice versa;
and a transmission module, configured to transmit the block data to a server for genome analysis and detection.
In the processing system for genome detection data transmission and preprocessing, the partitioning module further selectively compresses the blocks to reduce the size of the transmitted data.
In the processing system for genome detection data transmission and preprocessing, the mutually matching blocks of the R1 block data and the R2 block data are placed into the same data packet, or it is otherwise ensured that the mutually matching blocks are uploaded successfully together and are used together as the input of a genome data preprocessing task.
In the processing system for genome detection data transmission and preprocessing, the number of short sequences contained in each block is any integer between 1 and M.
In the processing system for genome detection data transmission and preprocessing, the block data in the transmission module are transmitted to the server in parallel and are analyzed and detected in parallel, and an error in any one block does not affect the analysis and detection of the remaining blocks.
As can be seen from the above, the advantages of the invention are as follows.
By blocking the data and combining the two steps of data transmission and alignment against the species reference genome into one complete workflow, the invention processes the complete sample genome sequencing data block by block and parallelizes transmission and preprocessing across different data blocks. Alignment no longer has to wait for the complete data to be transmitted, so the time from the start of data transmission to the generation of the preprocessed result data file is greatly reduced, which is of practical importance in production use. See FIG. 3 for a comparison with the prior art.
In the prior art, transmitting the complete sample genome sequencing data of M Read short sequences takes UT seconds. In the same network environment, transmitting the same data as INT(M/N) blocks (1 <= N <= M, where N is the number of Read short sequences contained in a single block) takes UTB seconds. Generally UT >= UTB, because block transmission makes better use of the multi-core processing capability of the client computer and uses the network bandwidth more effectively. In the original scheme, alignment and sorting of the sample genome sequencing data can begin only after these UT seconds, and this step takes PT seconds before the alignment result file is output. In the present invention, alignment and sorting start immediately after a block has been transmitted. While one block is being aligned and sorted, other blocks are still being transmitted, so block alignment and sorting run in parallel with block transmission. If the processing of one block, from alignment and sorting to the final output of the result file segment, takes about PBT seconds on average, then under the same computing power, for genome sequencing data divided into N blocks, PBT < (PT/N).
In the prior art, the process from data transmission to final output of the alignment result data file takes UT + PT seconds.
In the invention, most of the block alignment, sorting and block transmission run in parallel, so the time spent is about UTB + PBT, and UTB + PBT is far less than UT + PT.
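The timing comparison above can be illustrated with a rough model. This is only a sketch under assumptions that are not stated in the source: blocks are taken to be equal in size and to arrive one after another over a total of UTB seconds, and enough workers are assumed to be available that each block starts processing the moment it arrives. The function names and all numeric values are hypothetical.

```python
def serial_total(ut, pt):
    """Prior-art estimate: transmit everything (UT), then align (PT)."""
    return ut + pt


def pipelined_total(utb, pbt, blocks):
    """Block-wise estimate under the assumptions named in the lead-in.

    Block i (0-based) is assumed to arrive at (i + 1) * utb / blocks seconds
    and to finish pbt seconds later; the job ends when the last block ends.
    """
    per_block_transfer = utb / blocks
    finish_times = [(i + 1) * per_block_transfer + pbt for i in range(blocks)]
    return max(finish_times)
```

With the hypothetical values UT = UTB = 100 s, PT = 400 s, 10 blocks and PBT = 30 s, this model gives 500 s for the serial scheme and 130 s for the block-wise scheme, which matches the UTB + PBT estimate in the text. With fewer workers than blocks, the real figure would fall somewhere between the two.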
Drawings
FIG. 1 is a data flow diagram of the overall data preprocessing for genomic variation detection of paired-end Reads;
FIG. 2 is a flow chart of data preprocessing for genome variation detection for single-ended detection;
FIG. 3 is a diagram comparing the present invention with the prior art.
Detailed Description
The invention addresses data transmission and preprocessing in variation detection on massive genome data. The sample's raw genome sequencing data are split by number of short sequences (Reads) into blocks, and the blocks are then, in parallel, optionally compressed, transmitted, verified, and aligned against the standard reference genome of the species to which the sample belongs. Compared with other schemes, the processing flows of different blocks can run simultaneously and in parallel on a computer cluster or a server. Because alignment against the species standard reference genome does not need to wait until all of the sample's raw sequencing data have been transmitted, and because the per-block workflows run in parallel and make full use of the computing power of the cluster or high-performance server, the method can greatly shorten the preprocessing time from the start of transmission of the raw sequencing data to the completion of alignment against the species standard reference genome. In addition, because both the transmission of the raw sequencing data and the mapping to the standard reference genome of the species to which the sample belongs are carried out block by block, a failure of transmission or mapping for a single block does not affect the overall data processing; only the transmission or mapping of the failed block needs to be re-run. Fault tolerance is therefore stronger, which also helps shorten the processing time. It should further be noted that the invention supports not only genome sequencing data from single-ended sequencing but also paired transmission of the short sequences of genome sequencing data from double-ended sequencing.
The processing steps of the invention are as follows:
Step 1, data blocking. The blocking strategy for the raw genome sequencing data splits the raw sequencing data file by lines. For a file of M Read short sequences, if every N (1 <= N <= M) short sequences form one block, the raw genome sequencing data file is divided into P = INT(M/N) blocks, where INT() is a round-up function and P is the number of blocks. For example, genome sequencing sample data containing 1,000,000 rows of Read short sequences can be divided into 10 (1,000,000/100,000) blocks at one block per 100,000 rows of Reads. Blocks are numbered from 0, incrementing by 1 from block to block, and the name of a block is the file name plus the block number, for example, the name of the first block of the test. Genome variation detection can be divided into single-ended sequencing and double-ended sequencing according to the type of sample data. If the sample data come from single-ended sequencing, they are blocked as described above. Double-ended sequencing is also supported: the two short-sequence files are blocked according to the same rule with the same M and N values, so the resulting numbers of blocks are also the same.
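The blocking rule in step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: reads are modelled as items of any iterable, the helper names are hypothetical, and the "." separator between file name and block number is an assumption (the source only says the name is the file name plus the block number).

```python
import math
from itertools import islice


def block_count(m, n):
    """P = INT(M/N), with INT() rounding up."""
    return math.ceil(m / n)


def split_reads(reads, n):
    """Yield (block_number, block) pairs; each block holds at most n reads.

    Block numbers start at 0 and increase by 1, as in step 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    iterator = iter(reads)
    block_number = 0
    while True:
        block = list(islice(iterator, n))
        if not block:
            return
        yield block_number, block
        block_number += 1


def block_name(file_name, block_number):
    """File name plus block number; the '.' separator is an assumption."""
    return f"{file_name}.{block_number}"
```

For double-ended data, applying `split_reads` with the same `n` to the R1 and R2 read lists gives the same number of blocks, and blocks with the same block number form a matching pair.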
Step 2, block compression. The blocks can be selectively compressed according to actual conditions. For double-ended Reads, the invention can either wait until the blocks with the same number have both been split off and then compress them together, or compress the block of each Read file first and then put the two compressed blocks together. The compressed blocks with the same number from the two Read files can be put into a tar packet, a new compressed packet, or a directory, or combined by any other method that ensures the two compressed blocks are transmitted successfully together and can be used as the input of a genome data preprocessing task.
Step 3, alignment of double-ended detection blocks. Single-ended detection skips this step. For double-ended detection, the blocks of the R1 and R2 files with the same block number suffix (R1 and R2 represent the two strands of a gene sequence) are packed into the same file packet, called a block, which achieves synchronized transmission and processing of the double-ended short sequences. This step may also use any other mechanism that ensures the matching blocks of R1 and R2 are both uploaded successfully and are used together as the input of one genome data preprocessing task. For convenience of description, a block of single-ended Reads is also referred to as a block.
The order of step 2 and step 3 is interchangeable: the same-numbered blocks of the double-ended Reads can be paired first and then compressed together, or each block can be compressed first and the two compressed blocks then put together, using a tar packet, a new compressed packet, a directory, or any other method that ensures the two compressed blocks are transmitted successfully together and can be used as the input of a genome data preprocessing task. See Compressing and Merging in FIG. 1.
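One of the options named in steps 2 and 3, compressing each block and placing the matching R1 and R2 blocks in one tar packet, might look like the sketch below. The choice of gzip, the in-memory packet, the ".gz" member suffix and the function names are all assumptions made for illustration; the source allows any compression method and any packaging mechanism.

```python
import gzip
import io
import tarfile


def pack_pair(r1_name, r1_bytes, r2_name, r2_bytes):
    """Gzip each block, then bundle both into one in-memory tar packet."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in ((r1_name, r1_bytes), (r2_name, r2_bytes)):
            compressed = gzip.compress(payload)
            info = tarfile.TarInfo(name=name + ".gz")
            info.size = len(compressed)
            tar.addfile(info, io.BytesIO(compressed))
    return buffer.getvalue()


def unpack_pair(packet):
    """Return {block_name: original_bytes} for a packet made by pack_pair."""
    blocks = {}
    with tarfile.open(fileobj=io.BytesIO(packet), mode="r") as tar:
        for member in tar.getmembers():
            compressed = tar.extractfile(member).read()
            original_name = member.name[: -len(".gz")]
            blocks[original_name] = gzip.decompress(compressed)
    return blocks
```

Because both blocks travel in a single packet, a transfer either delivers the pair or delivers neither, which is the property the text asks of whatever mechanism is chosen.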
Step 4, block transmission. The client or API of the corresponding storage is called to transmit the blocks from step 3 to a remote server.
Step 5, block decompression. If no compression was performed in step 2, this step is skipped. On the server, for each block successfully transmitted in step 4: for double-ended detection, the file packet is opened to obtain the compressed blocks of the two paired short-sequence files; for single-ended detection, the block is itself the compressed block of the short sequences and needs no unpacking. The compressed blocks are decompressed according to the compression method used, yielding the original short-sequence block data, and the integrity of the data is checked using a hash value of the file. Any hash algorithm, such as MD5, may be used to calculate the hash value.
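The integrity check in step 5 can be sketched with MD5, the algorithm the text gives as an example. How the expected hash reaches the server is not described in the source, so passing it as an argument is an assumption, as are the function names.

```python
import hashlib


def md5_of_chunks(chunks):
    """Hash a block supplied as an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_block(data, expected_hex):
    """True if the decompressed block matches the hash sent with it."""
    return md5_of_chunks([data]) == expected_hex.lower()
```

Feeding the hash in chunks keeps memory use flat for large blocks; any other hash available in `hashlib` could be substituted without changing the structure.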
Step 6, alignment of the block with the species standard reference genome. The block data of the original Reads obtained in step 5 are aligned, using alignment software (such as BWA), against the standard reference genome corresponding to the sample species, and a result data file is output.
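As an illustration of step 6, the sketch below builds the argument list for aligning one block with BWA, the example tool named in the text. The use of the `bwa mem` subcommand is an assumption (the source does not say which BWA algorithm is used), the file names are hypothetical, and the reference is assumed to have been indexed beforehand with `bwa index`. The command is only constructed here, not executed.

```python
def bwa_mem_command(reference, r1_block, r2_block=None):
    """Build the argv for aligning one block with `bwa mem`.

    Pass only r1_block for single-ended data; pass both blocks of a
    matching pair for double-ended data.
    """
    command = ["bwa", "mem", reference, r1_block]
    if r2_block is not None:
        command.append(r2_block)
    return command
```

The list can be handed to `subprocess.run`, with standard output redirected to the per-block result file, since `bwa mem` writes its alignments to standard output.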
Step 7, storing the originally transmitted short sequence data. The block data of the original Reads obtained in step 5 are saved to the corresponding storage location so that subsequent genome variation detection can read them directly from storage, as shown in the box "… …" in FIG. 1 and FIG. 2.
Steps 1 to 7 are processed in a streaming manner. When the processing flows of all blocks have finished, the result data of mapping the whole sample's genome sequencing data to the standard reference genome of the species to which the sample belongs are obtained, and can be used by subsequent processing flows as required.
In steps 1 to 7, each step depends on the block data produced by the previous step, and together the steps form a per-block data preprocessing stream for genomic variation detection. The data streams of different blocks are independent of each other and can run in parallel in a computing cluster or on a multi-core server.
The whole flow from step 1 to step 7 is in block (block) units, and any failure in any step only causes the failure of the corresponding block processing flow, and does not affect the processing of other blocks. And the fault tolerance of the whole data processing flow can be realized by re-running the flow corresponding to the failed block.
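The per-block fault tolerance described above can be sketched as a runner that processes blocks in parallel and re-runs only the ones that failed. The thread pool, the retry limit and the names `run_blocks` and `process` are assumptions for illustration; the source does not prescribe a scheduler or a retry policy.

```python
from concurrent.futures import ThreadPoolExecutor


def run_blocks(block_ids, process, max_attempts=3, workers=4):
    """Run process(block_id) for every block, retrying only failed blocks.

    Returns (results, still_failed): results maps block id to the return
    value of process, still_failed lists blocks that never succeeded.
    """
    results = {}
    pending = list(block_ids)
    for _ in range(max_attempts):
        if not pending:
            break
        # Leaving the with-block waits for every submitted task to finish.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {block: pool.submit(process, block) for block in pending}
        failed = []
        for block, future in futures.items():
            if future.exception() is None:
                results[block] = future.result()
            else:
                failed.append(block)
        pending = failed
    return results, pending
```

A failure in one block leaves the results of the other blocks untouched, so the cost of a retry is one block's flow rather than the whole sample.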
The overall data flow diagram of data preprocessing for genomic variation detection of paired-end Reads is shown in FIG. 1, and the data preprocessing flow chart for genomic variation detection with single-ended detection is shown in FIG. 2.
The invention also provides a genome detection data transmission and preprocessing oriented processing system, which comprises:
a blocking module, configured to obtain the genome detection data and partition it into blocks, where if the genome detection data is single-stranded data, the genome detection data of M Read short sequences is divided into P = INT(M/N) blocks, with every N (1 <= N <= M) Read short sequences forming one block, INT() being a round-up function and P being the number of blocks; for example, genome sequencing sample data containing 1,000,000 Read short sequences can be divided into 10 (1,000,000/100,000) blocks at one block per 100,000 Reads. If the genome detection data is double-stranded data, the strand data R1 and the strand data R2 are each partitioned according to the single-stranded data blocking method to generate R1 block data and R2 block data, each R1 block matching one of the R2 blocks and vice versa;
and a transmission module, configured to transmit the block data to a server for genome analysis and detection.
The blocking module further selectively compresses the blocks to reduce the size of the transmitted data.
The matching blocks among the R1 blocks and the R2 blocks are compressed into the same packet, or another mechanism is used that ensures the matching blocks of R1 and R2 are uploaded successfully together and are used together as the input of a genome data preprocessing task.
The block data in the transmission module are transmitted to the server in parallel and are analyzed and detected in parallel; an error in any one block has no effect on the analysis and detection of the remaining blocks.
The present invention also includes the following preferred embodiments, as follows:
For the blocking of the sample genome sequencing data in step 1, the blocking strategy is to block by number of Read short sequences, so the size of one block can range from one Read short sequence to many. Blocks can be named as described above, or by any naming scheme that identifies the block's file and expresses the continuity and order of the blocks.
For step 2 and step 3, the order of the two steps is interchangeable: the same-numbered blocks of the double-ended Reads can be paired first and then compressed together, or each block can be compressed first and the two compressed blocks then put together, using a tar packet, a new compressed packet, a directory, or any other method that ensures the two compressed blocks are transmitted successfully together and can be used as the input of a genome data preprocessing task.
The transmission in step 4 may be any method that can copy or move a data file from one storage location to another, such as SCP, FTP, or local copy. The storage location may be storage attached directly to a local server, a SAN or NAS, a distributed file system, or a cloud storage service.
In step 6, the decompressed Reads blocks are aligned against the standard reference genome of the species to which the sample belongs using any software capable of such alignment, and the alignment result is output.
The alignment in step 6 against the standard reference genome of the species to which the sample belongs can be run either on a computer cluster or on a server with multi-core processing capability.

Claims (6)

1. A processing method for genome detection data transmission and preprocessing, characterized by comprising the following steps:
step 1, acquiring the genome detection data and partitioning it into blocks, wherein if the genome detection data is single-stranded data, the genome detection data of M Read short sequences is partitioned into P = INT(M/N) blocks, with every N Read short sequences forming one block, INT() being a round-up function and P being the number of blocks, and if the genome detection data is double-stranded data, the strand data R1 and the strand data R2 are each partitioned according to the single-stranded data partitioning method, so as to generate R1 block data and R2 block data, each R1 block matching one of the R2 blocks and vice versa;
the number of short sequences contained in each block is any integer from 1 to M inclusive, M being the number of Read short sequences contained in the sample;
step 2, transmitting the block data to a server for genome analysis and detection; processing proceeds in a streaming manner, and when the processing flows of all blocks have finished, the result data of mapping the whole sample's genome sequencing data to the standard reference genome of the species to which the sample belongs are obtained, which can be used by subsequent processing flows as required;
and placing the matching blocks of the R1 blocks and the R2 blocks into the same data packet, or otherwise ensuring that the matching blocks of the R1 blocks and the R2 blocks are uploaded successfully together and are used together as the input of a genome data preprocessing task.
2. The processing method for genome detection data transmission and preprocessing as claimed in claim 1, wherein step 1 further comprises: selectively compressing the blocks to reduce the size of the transmitted data.
3. The processing method for genome detection data transmission and preprocessing as claimed in claim 1, wherein the block data in step 2 are transmitted to the server in parallel and are analyzed and detected in parallel, and an error in any one block has no influence on the analysis and detection of the remaining blocks.
4. A processing system for genome detection data transmission and preprocessing, comprising:
a partitioning module, configured to obtain the genome detection data and partition it into blocks, where if the genome detection data is single-stranded data, the genome detection data of M Read short sequences is partitioned into P = INT(M/N) blocks, with every N Read short sequences forming one block, INT() being a round-up function and P being the number of blocks, and if the genome detection data is double-stranded data, the strand data R1 and the strand data R2 are each partitioned according to the single-stranded data partitioning method, so as to generate R1 block data and R2 block data, each R1 block matching one of the R2 blocks and vice versa;
the number of short sequences contained in each block is any integer from 1 to M inclusive, M being the number of Read short sequences contained in the sample;
a transmission module, configured to transmit the block data to a server for genome analysis and detection; processing proceeds in a streaming manner, and when the processing flows of all blocks have finished, the result data of mapping the whole sample's genome sequencing data to the standard reference genome of the species to which the sample belongs are obtained, which can be used by subsequent processing flows as required;
and placing the matching blocks of the R1 blocks and the R2 blocks into the same data packet, or otherwise ensuring that the matching blocks of the R1 blocks and the R2 blocks are uploaded successfully together and are used together as the input of a genome data preprocessing task.
5. The processing system for genome detection data transmission and preprocessing as claimed in claim 4, wherein the partitioning module further comprises: selectively compressing the blocks to reduce the size of the transmitted data.
6. The processing system for genome detection data transmission and preprocessing as claimed in claim 4, wherein the block data in the transmission module are transmitted to the server in parallel and are analyzed and detected in parallel, and an error in any one block has no influence on the analysis and detection of the remaining blocks.
CN201510663214.1A 2015-10-14 2015-10-14 Processing method and system for genome detection data transmission and preprocessing Active CN106603591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510663214.1A CN106603591B (en) 2015-10-14 2015-10-14 Processing method and system for genome detection data transmission and preprocessing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510663214.1A CN106603591B (en) 2015-10-14 2015-10-14 Processing method and system for genome detection data transmission and preprocessing

Publications (2)

Publication Number Publication Date
CN106603591A CN106603591A (en) 2017-04-26
CN106603591B true CN106603591B (en) 2020-02-07

Family

ID=58552019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510663214.1A Active CN106603591B (en) 2015-10-14 2015-10-14 Processing method and system for genome detection data transmission and preprocessing

Country Status (1)

Country Link
CN (1) CN106603591B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109698702B (en) * 2017-10-20 2020-10-23 人和未来生物科技(长沙)有限公司 Gene sequencing data compression preprocessing method, system and computer readable medium
WO2019076177A1 (en) * 2017-10-20 2019-04-25 人和未来生物科技(长沙)有限公司 Gene sequencing data compression preprocessing, compression and decompression method, system, and computer-readable medium
CN109698703B (en) * 2017-10-20 2020-10-20 人和未来生物科技(长沙)有限公司 Gene sequencing data decompression method, system and computer readable medium
CN110797081B (en) * 2019-10-17 2020-11-10 南京医基云医疗数据研究院有限公司 Activation area identification method and device, storage medium and electronic equipment
CN111199777B (en) * 2019-12-24 2023-09-29 西安交通大学 Biological big data-oriented streaming and mutation real-time mining system and method
CN111599408B (en) * 2020-04-15 2022-05-06 至本医疗科技(上海)有限公司 Gene variation cis-trans position relation detection method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609631A (en) * 2010-10-28 2012-07-25 三星Sds株式会社 Cooperation-based method of managing, displaying, and updating DNA sequence data
CN102867134A (en) * 2012-08-16 2013-01-09 盛司潼 System and method for splicing gene sequence fragments
CN103049680A (en) * 2012-12-29 2013-04-17 深圳先进技术研究院 Gene sequencing data reading method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8271206B2 (en) * 2008-04-21 2012-09-18 Softgenetics Llc DNA sequence assembly methods of short reads

Also Published As

Publication number Publication date
CN106603591A (en) 2017-04-26

Similar Documents

Publication Publication Date Title
CN106603591B (en) Processing method and system for genome detection data transmission and preprocessing
US11157512B2 (en) Method and system for replicating data to heterogeneous database and detecting synchronization error of heterogeneous database through SQL packet analysis
US10778246B2 (en) Managing compression and storage of genomic data
US20120310917A1 (en) Accelerated Join Process in Relational Database Management System
US9639444B2 (en) Architecture for end-to-end testing of long-running, multi-stage asynchronous data processing services
US20160357807A1 (en) Horizontal Decision Tree Learning from Very High Rate Data Streams
CN111221807B (en) Cloud service-oriented industrial equipment big data quality testing method and framework
US20120323871A1 (en) Method for Indexed-Field Based Difference Detection and Correction
US20230216520A1 (en) System and method for data compression with encryption
CN115203004A (en) Code coverage rate testing method and device, storage medium and electronic equipment
CN109426438B (en) Real-time big data mirror image storage method and device
CN111104390A (en) Method and system for merging and checking multiple CSV files
CN112084102A (en) Interface pressure testing method and device
US11709959B2 (en) Information processing apparatus and information processing method
US9197243B2 (en) Compression ratio for a compression engine
CN117221354A (en) Multi-source heterogeneous data real-time acquisition, storage and analysis method and system
US10891216B2 (en) Parallel data flow analysis processing to stage automated vulnerability research
US11700013B2 (en) System and method for data compaction and security with extended functionality
CN115421965A (en) Consistency checking method and device, electronic equipment and storage medium
CN112612767A (en) Log file rapid analysis method and device
US11038911B2 (en) Method and system for determining risk in automotive ECU components
CN110797082A (en) Method and system for storing and reading gene sequencing data
CN115543227B (en) Cross-system data migration method, system, electronic device and storage medium
CN111125255B (en) Block data processing method and device, terminal and readable storage medium
CN117149846B (en) Power data analysis method and system based on data fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant