CN106603591A - Processing method and system facing transmission and preprocessing of genome detection data - Google Patents

Publication number
CN106603591A
Authority
CN
China
Prior art keywords
data
block
genome
transmission
piecemeal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510663214.1A
Other languages
Chinese (zh)
Other versions
CN106603591B (en)
Inventor
王振飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Genedock Technology Co Ltd
Original Assignee
Beijing Genedock Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Genedock Technology Co Ltd
Priority to CN201510663214.1A
Publication of CN106603591A
Application granted
Publication of CN106603591B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Abstract

The invention relates to a processing method and system for the transmission and preprocessing of genome detection data, and belongs to the field of genome sequencing data transmission, analysis and detection. The method comprises obtaining the genome detection data and dividing it into blocks. If the genome detection data is single-strand data, the data, consisting of M short read sequences, is divided into P = INT(M/N) blocks with N short read sequences per block, where INT() is the round-up (ceiling) function and P is the number of blocks. If the genome detection data is double-strand data, strand data R1 and strand data R2 are each divided into blocks by the single-strand method, producing R1 block data and R2 block data, such that each R1 block matches one of the R2 blocks and each R2 block matches one of the R1 blocks. The block data is then transmitted to a server for genome analysis and detection. The invention greatly reduces the time required for genome data preprocessing and improves fault tolerance during processing.

Description

A processing method and system for the transmission and preprocessing of genome detection data
Technical field
The present invention relates to the field of genome sequencing data transmission, analysis and detection, and in particular to a processing method and system for the transmission and preprocessing of genome detection data.
Background art
In the prior art, data preprocessing for genome variation detection mainly comprises two key steps: first, the data is transferred to server storage or a cloud storage service; then the genome sequencing data of the sample held in server or cloud storage is aligned, on a single machine, against the standard reference genome of the species to which the sample belongs.
Conventional schemes treat data transfer and alignment against the species' standard reference genome as two separate steps that are executed serially. That is, alignment cannot start until all of the sample's genome sequencing data has been transferred to server storage or the cloud storage service.
Data can be transferred to server or cloud storage in many ways: with data synchronization tools based on TCP or UDP, such as FTP, SCP and RSYNC; by mounting the hard disk or other storage medium holding the genome data directly on the server; or with the client provided by the cloud storage service.
Alignment against the standard reference genome of the sample's species is typically a compute-intensive task. The conventional approach is to run it on a high-performance server (such as a minicomputer or mainframe).
The prior art has the following problems:
Transmission of the sample's raw genome sequencing data and alignment against the species' standard reference genome are executed serially, which is time-consuming;
For paired-end sequencing data, there is no guarantee that paired short sequences are transmitted successfully together, and the two paired-end data files may even be transmitted serially, which makes the whole transfer very long;
Alignment against the standard reference genome of the sample's species runs on a single server, so the computing power of that server becomes the bottleneck and the step is time-consuming;
The process usually runs as a single task. If the task fails, processing of the entire sample's genome sequencing data must be rerun, so retries are very costly and fault tolerance is very weak, which further lengthens processing time.
Summary of the invention
To address the deficiencies of the prior art, the present invention proposes a processing method and system for the transmission and preprocessing of genome detection data.
The present invention proposes a processing method for the transmission and preprocessing of genome detection data, comprising:
Step 1: obtain the genome detection data and divide it into blocks. If the genome detection data is single-strand data, the data, consisting of M short read sequences, is divided into P = INT(M/N) blocks with N short read sequences per block, where INT() is the round-up (ceiling) function and P is the number of blocks. If the genome detection data is double-strand data, strand data R1 and strand data R2 are each divided into blocks by the single-strand method, producing R1 block data and R2 block data, such that each R1 block matches one of the R2 blocks, and vice versa;
Step 2: transmit the block data to a server for genome analysis and detection.
In the described processing method, step 1 further comprises: optionally compressing the blocks, in order to reduce the volume of data transmitted.
In the described processing method, matching blocks from the R1 block data and the R2 block data are placed in the same data package, or it is otherwise ensured that mutually matching R1 and R2 blocks are uploaded successfully together and serve together as the input of one genome data preprocessing task.
In the described processing method, in step 1, the number of short sequences contained in a block is any integer from 1 to M, where M is the number of short read sequences contained in the sample.
In the described processing method, in step 2 the block data is transmitted to the server in parallel and is analyzed and detected in parallel; an error in any one block does not affect the analysis and detection of the remaining blocks.
The present invention also proposes a processing system for the transmission and preprocessing of genome detection data, comprising:
A blocking module, for obtaining the genome detection data and dividing it into blocks. If the genome detection data is single-strand data, the data, consisting of M short read sequences, is divided into P = INT(M/N) blocks with N short read sequences per block, where INT() is the round-up (ceiling) function and P is the number of blocks. If the genome detection data is double-strand data, strand data R1 and strand data R2 are each divided into blocks by the single-strand method, producing R1 block data and R2 block data, such that each R1 block matches one of the R2 blocks, and vice versa;
A transmission module, for transmitting the block data to a server for genome analysis and detection.
In the described processing system, the blocking module further optionally compresses the blocks, in order to reduce the volume of data transmitted.
In the described processing system, matching blocks from the R1 block data and the R2 block data are placed in the same data package, or it is otherwise ensured that mutually matching R1 and R2 blocks are uploaded successfully together and serve together as the input of one genome data preprocessing task.
In the described processing system, the number of short sequences contained in a block is any integer from 1 to M, where M is the number of short read sequences contained in the sample.
In the described processing system, the transmission module transmits the block data to the server in parallel, and the blocks are analyzed and detected in parallel; an error in any one block does not affect the analysis and detection of the remaining blocks.
As the above scheme shows, the advantages of the present invention are as follows:
By dividing the data into blocks and combining data transfer and alignment against the species reference genome into one complete workflow, the invention processes the whole sample's genome sequencing data block by block and runs transmission and preprocessing of different blocks in parallel. Alignment no longer has to wait for the complete data set to be transmitted, which greatly reduces the time from the start of transfer to generation of the preprocessing result data files. This is significant in production use; Fig. 3 compares the invention with the prior art.
Consider the same sample genome sequencing data, of size M short read sequences. In a given network environment, transmitting the M short read sequences as a whole takes UT seconds, while transmitting the same data as INT(M/N) blocks (1 <= N <= M, where N is the number of short read sequences in a single block) takes UTB seconds. In general UT >= UTB, because block transmission makes better use of the multi-core processing capability of the client computer and uses the network bandwidth more effectively. In the original scheme, alignment and sorting of the sample's genome sequencing data, and output of the alignment result file, can start only after waiting UT seconds; this step takes PT seconds. In the present invention, alignment and sorting run as soon as a block has been transmitted, and while one block is being aligned and sorted, other blocks are still in transit, so block alignment and sorting run in parallel with block transmission. Suppose that aligning and sorting one block into a fragment of the final output result file takes about PBT seconds on average. Then, with the same computing power, for genome sequencing data in N blocks, PBT <= PT/N. Therefore:
In the prior art, the process from data transfer to the final alignment result data file takes UT + PT seconds in total.
In the present invention, because most block alignment and sorting runs in parallel with block transmission, the time taken is about UTB + PBT, which is much smaller than UT + PT. In practice, the sample genome data preprocessing time of the invention is substantially shorter than that of the prior art.
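The timing comparison above can be illustrated with a small model. The sketch below is not part of the patent: the function names and all numbers are hypothetical, and it assumes there are enough workers for every block to start processing the moment its upload finishes.

```python
def serial_total(ut, pt):
    """Prior-art style: transfer everything (UT), then align and sort (PT)."""
    return ut + pt


def pipelined_total(upload_finish_times, pbt):
    """Block-wise style: each block is processed for PBT seconds as soon as
    its upload finishes. The last upload finishes at UTB, so the total is
    about UTB + PBT (assumes no block ever waits for a free worker)."""
    return max(finish + pbt for finish in upload_finish_times)


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
UT, PT = 100, 200                              # whole-file transfer / alignment
finish_times = [8 * (i + 1) for i in range(10)]  # 10 blocks, UTB = 80
PBT = PT / 10                                  # upper bound PBT <= PT / blocks

print(serial_total(UT, PT))                    # UT + PT
print(pipelined_total(finish_times, PBT))      # UTB + PBT
```

With these made-up inputs the serial path costs UT + PT = 300 seconds and the pipelined path costs UTB + PBT = 100 seconds; real gains depend on bandwidth, block size and available cores.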
Description of the drawings
Fig. 1 is the overall data flowchart of data preprocessing for genome variation detection with paired-end reads;
Fig. 2 is the data preprocessing flowchart for genome variation detection with single-end reads;
Fig. 3 compares the present invention with the prior art.
Detailed description of embodiments
The present invention solves the data transfer and preprocessing problems in variation detection on massive genome data. The raw genome sequencing data of a sample is cut by number of short sequences (reads) into separate blocks; the blocks are then, in parallel, optionally compressed, transmitted, verified and aligned against the standard reference genome of the sample's species. Processing of a single block is a streaming pipeline, and the pipelines of different blocks can run in parallel on a computer cluster or a server. Compared with other schemes, alignment against the species' standard reference genome does not have to wait until all of the sample's raw sequencing data has been transmitted, and running the per-block workflows in parallel makes full use of the computing power of the cluster or high-performance server. The method therefore greatly shortens the preprocessing time for genome variation detection, from transmission of the sample's raw sequencing data to completion of alignment against the species' standard reference genome. In addition, because both transmission of the raw sequencing data and mapping against the standard reference genome are carried out per block, a failed transmission or mapping of a single block does not affect the overall data processing; only the transmission or mapping of the failed block needs to be rerun. Fault tolerance is therefore higher, which also helps shorten processing time. It should be added that, besides single-end sequencing data, the invention also supports paired transmission of the short sequences of paired-end sequencing data.
The processing steps of the present invention are as follows:
Step 1, data blocking. The invention splits the raw genome sequencing data by line. For a raw sequencing data file of M short read sequences, with N (1 <= N <= M) short sequences per block, the file is divided into P = INT(M/N) blocks, where INT() is the round-up (ceiling) function and P is the number of blocks. For example, genome sequencing sample data containing 1,000,000 lines of short reads, at 100,000 lines per block, is divided into 10 (1,000,000/100,000) blocks. Each block is named by appending a block number to the file name as a suffix; the number starts at 0 and increases by 1 per block. For example, the first block of the file test.fastq is named test.fastq_0. Depending on the type of sample data, genome variation detection is either single-end or paired-end sequencing. Single-end sequencing data is split as described above. The invention also supports paired-end sequencing: both short-sequence files are split by the same rule with the same values of M and N, so the two files yield the same number of blocks.
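The blocking step can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: `split_fastq` and `block_count` are hypothetical names, and the sketch assumes standard FASTQ input in which every read occupies exactly four lines (the text itself only says the data is split by line).

```python
import math
from itertools import islice
from pathlib import Path

LINES_PER_READ = 4  # assumption: standard 4-line FASTQ records


def block_count(total_reads, reads_per_block):
    """P = INT(M / N), with INT() rounding up."""
    return math.ceil(total_reads / reads_per_block)


def split_fastq(path, reads_per_block):
    """Split a FASTQ file into blocks of at most `reads_per_block` reads.

    Blocks are written next to the source file and named <filename>_0,
    <filename>_1, ... following the naming rule in the text.
    Returns the list of block paths in order.
    """
    if reads_per_block < 1:
        raise ValueError("reads_per_block must be >= 1")
    src = Path(path)
    blocks = []
    with src.open("r") as handle:
        while True:
            # Read the next N reads (N * 4 lines) without loading the whole file.
            chunk = list(islice(handle, reads_per_block * LINES_PER_READ))
            if not chunk:
                break
            block_path = src.with_name(f"{src.name}_{len(blocks)}")
            block_path.write_text("".join(chunk))
            blocks.append(block_path)
    return blocks
```

For paired-end data, calling `split_fastq` on the R1 file and the R2 file with the same `reads_per_block` produces the same number of blocks, so blocks with equal suffixes hold matching reads.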
Step 2, block compression. Data may optionally be compressed, depending on the situation. Any compression method, such as gzip, may be used to reduce the volume of data transmitted, and the user can choose a compression method as needed. The invention may either wait until the same-numbered blocks of both paired-end read files have been produced and compress them together, or compress the blocks of each read file first and then put the two compressed blocks together. The two compressed blocks with the same sequence number may be put into a single tar package or a new compressed archive, or into a directory, or handled by any other method that ensures the two compressed blocks are transmitted successfully together and serve together as the input of the genome preprocessing task.
Step 3, paired-end block alignment. Single-end data skips this step. For paired-end data, the blocks of the R1 and R2 files (R1 and R2 denote the two strands of the gene sequence) that carry the same block-number suffix are packed into the same file package, referred to as one block. This achieves synchronized transmission and processing of paired-end short sequences. This step may also use any other mechanism that ensures matching, aligned R1 and R2 blocks are uploaded successfully together and serve together as the input of one genome data preprocessing task. For convenience in what follows, a block of single-end reads is also referred to as a block.
Steps 2 and 3 may be performed in either order: the same-numbered blocks of both read files can be packed first and compressed together, or each block can be compressed first and the two compressed blocks packed afterwards, using any of the packaging options described in step 2. See Compressing and Merging in Fig. 1.
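One of the packaging options described above (gzip each block, then place the R1 and R2 blocks with the same number in one tar package) might look like the following sketch. The helper names `bundle_pair` and `unbundle_pair` are hypothetical, and the blocks are passed as in-memory bytes purely to keep the example short.

```python
import gzip
import io
import tarfile


def bundle_pair(r1_block, r2_block, bundle_path):
    """Gzip the R1 and R2 blocks that share a block number and pack both
    into one tar file, so they travel and arrive as a single unit.

    Each block is a (name, raw_bytes) tuple.
    """
    with tarfile.open(bundle_path, "w") as tar:
        for name, data in (r1_block, r2_block):
            payload = gzip.compress(data)
            info = tarfile.TarInfo(name=name + ".gz")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return bundle_path


def unbundle_pair(bundle_path):
    """Server side: open the package and decompress both blocks.

    Returns {original_block_name: raw_bytes}.
    """
    blocks = {}
    with tarfile.open(bundle_path, "r") as tar:
        for member in tar.getmembers():
            payload = tar.extractfile(member).read()
            blocks[member.name[: -len(".gz")]] = gzip.decompress(payload)
    return blocks
```

Because both compressed blocks sit in one file, a transfer either delivers the matched pair or delivers nothing, which is the property the text asks for.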
Step 4, block transmission. The client or API of the relevant storage is called to transfer the blocks from step 3 to the remote server.
Step 5, block decompression. If no compression was applied in step 2, this step is skipped. On the server side, for each block transmitted successfully in step 4: for paired-end data, the file package is opened to obtain the two compressed blocks of matching short sequences; for single-end data, the block is itself the compressed short-sequence block and needs no unpacking. The compressed blocks are then decompressed according to the compression method used, yielding the original short-sequence block data, and the integrity of the data is verified using the hash value of the file. Any hash algorithm, such as MD5, may be used to compute the hash value.
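The integrity check in step 5 can be sketched with Python's standard `hashlib`. The helper names are hypothetical, and how the expected hash reaches the server (for example, sent alongside the block) is an assumption that the text does not specify.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large blocks are never fully
    loaded into memory. `algorithm` is any name accepted by hashlib.new."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_block(path, expected_hex, algorithm="md5"):
    """Return True if the received block matches the sender's hash."""
    return file_digest(path, algorithm) == expected_hex.lower()
```

A block that fails `verify_block` would simply be retransmitted on its own, without touching the other blocks.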
Step 6, alignment of the block against the species' standard reference genome. The original reads block data obtained in step 5 is aligned, using alignment software (such as BWA), against the standard reference genome of the sample's species, and a result data file is output.
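As an illustration of step 6 with BWA, the sketch below builds a `bwa mem` command line for one block and, optionally, runs it. The function names are hypothetical; the sketch assumes `bwa` is installed and the reference has already been indexed with `bwa index`, and the patent does not say which BWA sub-command or options are used.

```python
import subprocess


def bwa_mem_command(reference, r1_block, r2_block=None, threads=1):
    """Build the argv for `bwa mem`; pass r2_block for paired-end blocks."""
    cmd = ["bwa", "mem", "-t", str(threads), str(reference), str(r1_block)]
    if r2_block is not None:
        cmd.append(str(r2_block))
    return cmd


def align_block(reference, r1_block, sam_path, r2_block=None, threads=1):
    """Run the alignment for one block; bwa mem writes SAM to stdout,
    which is redirected into `sam_path`. Raises if bwa exits non-zero."""
    cmd = bwa_mem_command(reference, r1_block, r2_block, threads)
    with open(sam_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
    return sam_path
```

Each block gets its own output file, so a failed alignment affects only that block's fragment of the final result.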
Step 7, saving the originally transmitted short-sequence data. The original reads block data obtained in step 5 may be saved to a suitable storage location, so that later genome variation detection can read it directly from storage; see the "..." outlined box in Fig. 1 and Fig. 2.
Steps 1 to 7 are processed as a stream. Once the pipelines of all blocks have completed, the result of mapping the whole sample's genome sequencing data against the standard reference genome of the sample's species has been obtained, and this result data can be used as needed by subsequent processing workflows.
In steps 1 to 7, each step depends on the block data output by the previous step, and together they form the preprocessing data stream for genome variation detection on one block. The data streams of different blocks are independent of one another and can run in parallel on computing clusters and multi-core servers.
The whole flow from step 1 to step 7 operates per block. A failure at any step causes only the pipeline of the corresponding block to fail and does not affect the processing of other blocks. Fault tolerance for the whole data processing flow is achieved by rerunning the pipeline of the failed block.
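The per-block fault tolerance described above can be sketched as a parallel runner that retries only the blocks that failed. `run_blocks` and its parameters are hypothetical, and a thread pool on one machine stands in for the computing cluster or multi-core server mentioned in the text.

```python
from concurrent.futures import ThreadPoolExecutor


def run_blocks(blocks, process_block, max_attempts=3, workers=4):
    """Run `process_block` on every block in parallel.

    A failure affects only its own block; on each further round only the
    blocks that failed are rerun. Blocks must be hashable (e.g. file names).
    Returns (results, failed): per-block results, and the last exception
    for any block that still failed after `max_attempts` rounds.
    """
    results, failed = {}, {}
    pending = list(blocks)
    for _ in range(max_attempts):
        if not pending:
            break
        # Leaving the `with` block waits for all submitted blocks to finish.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {block: pool.submit(process_block, block) for block in pending}
        pending = []
        for block, future in futures.items():
            error = future.exception()
            if error is None:
                results[block] = future.result()
                failed.pop(block, None)
            else:
                failed[block] = error
                pending.append(block)
    return results, failed
```

Here `process_block` would wrap the whole pipeline for one block (transfer, verification, alignment), so rerunning it repeats only that block's work.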
For the overall data flowchart of data preprocessing for genome variation detection with paired-end reads, see Fig. 1; for the data preprocessing flowchart for genome variation detection with single-end reads, see Fig. 2.
The present invention also proposes a processing system for the transmission and preprocessing of genome detection data, comprising:
A blocking module, for obtaining the genome detection data and dividing it into blocks. If the genome detection data is single-strand data, the data, consisting of M short read sequences, is divided into P = INT(M/N) blocks with N (1 <= N <= M) short read sequences per block, where INT() is the round-up (ceiling) function and P is the number of blocks. For example, genome sequencing sample data containing 1,000,000 lines of short reads, at 100,000 lines per block, is divided into 10 (1,000,000/100,000) blocks. If the genome detection data is double-strand data, strand data R1 and strand data R2 are each divided into blocks by the single-strand method, producing R1 block data and R2 block data, such that each R1 block matches one of the R2 blocks, and vice versa;
A transmission module, for transmitting the block data to a server for genome analysis and detection.
The blocking module further optionally compresses the blocks, in order to reduce the volume of data transmitted.
Matching blocks from the R1 block data and the R2 block data are compressed into the same data package, or another mechanism is used that ensures matching, aligned R1 and R2 blocks are uploaded successfully together and serve together as the input of one genome data preprocessing task.
The transmission module transmits the block data to the server in parallel, and the blocks are analyzed and detected in parallel; an error in any one block does not affect the analysis and detection of the remaining blocks.
The present invention further includes the following preferred schemes:
Regarding the blocking of sample genome sequencing data in step 1: blocks are formed by number of short read sequences, so a block can range in size from one short read sequence per block to many short read sequences per block. Blocks may be named as described above, or by any other naming scheme that distinguishes the block files and expresses the consecutive order of the blocks.
Steps 2 and 3 may be performed in either order: the same-numbered blocks of both paired-end read files can be packed first and compressed together, or the blocks of each read file can be compressed first and the two compressed blocks then put together. The two compressed blocks with the same sequence number may be put into a single tar package or a new compressed archive, or into a directory, or handled by any other method that ensures the two compressed blocks are transmitted successfully together and serve together as the input of the genome preprocessing task.
The transmission method in step 4 can be any method that copies or moves a data file from one storage location to another, such as SCP, FTP or a local copy. The storage location can be storage attached directly to the local server, a SAN or NAS, a distributed file system, or a cloud storage service.
In step 6, the reads blocks obtained by decompression are aligned against the standard reference genome of the species to which the sample belongs, using any software capable of such alignment, and the alignment result is output.
The alignment against the standard reference genome of the sample's species in step 6 can run either on a computer cluster or on a server with multi-core processing capability.

Claims (10)

1. A processing method for the transmission and preprocessing of genome detection data, characterized by comprising:
Step 1: obtaining the genome detection data and dividing it into blocks, wherein if the genome detection data is single-strand data, the data, consisting of M short read sequences, is divided into P = INT(M/N) blocks with N short read sequences per block, INT() being the round-up (ceiling) function and P being the number of blocks; and if the genome detection data is double-strand data, strand data R1 and strand data R2 are each divided into blocks by the single-strand method, producing R1 block data and R2 block data, such that each R1 block matches one of the R2 blocks, and vice versa;
Step 2: transmitting the block data to a server for genome analysis and detection.
2. The processing method for the transmission and preprocessing of genome detection data as claimed in claim 1, characterized in that step 1 further comprises: optionally compressing the blocks, in order to reduce the volume of data transmitted.
3. The processing method for the transmission and preprocessing of genome detection data as claimed in claim 1, characterized in that matching blocks from the R1 block data and the R2 block data are placed in the same data package, or it is ensured that mutually matching R1 and R2 blocks are uploaded successfully together and serve together as the input of one genome data preprocessing task.
4. The processing method for the transmission and preprocessing of genome detection data as claimed in claim 1, characterized in that, in step 1, the number of short sequences contained in a block is any integer from 1 to M, M being the number of short read sequences contained in the sample.
5. The processing method for the transmission and preprocessing of genome detection data as claimed in claim 1, characterized in that, in step 2, the block data is transmitted to the server in parallel and is analyzed and detected in parallel, wherein an error in any one block does not affect the analysis and detection of the remaining blocks.
6. A processing system for the transmission and preprocessing of genome detection data, characterized by comprising:
A blocking module, for obtaining the genome detection data and dividing it into blocks, wherein if the genome detection data is single-strand data, the data, consisting of M short read sequences, is divided into P = INT(M/N) blocks with N short read sequences per block, INT() being the round-up (ceiling) function and P being the number of blocks; and if the genome detection data is double-strand data, strand data R1 and strand data R2 are each divided into blocks by the single-strand method, producing R1 block data and R2 block data, such that each R1 block matches one of the R2 blocks, and vice versa;
A transmission module, for transmitting the block data to a server for genome analysis and detection.
7. The processing system for the transmission and preprocessing of genome detection data as claimed in claim 6, characterized in that the blocking module further optionally compresses the blocks, in order to reduce the volume of data transmitted.
8. The processing system for the transmission and preprocessing of genome detection data as claimed in claim 7, characterized in that matching blocks from the R1 block data and the R2 block data are placed in the same data package, or it is ensured that mutually matching R1 and R2 blocks are uploaded successfully together and serve together as the input of one genome data preprocessing task.
9. The processing system for the transmission and preprocessing of genome detection data as claimed in claim 6, characterized in that, in said step 6, the number of short sequences contained in a block is any integer from 1 to M, M being the number of short read sequences contained in the sample.
10. The processing system for the transmission and preprocessing of genome detection data as claimed in claim 6, characterized in that the transmission module transmits the block data to the server in parallel, and the blocks are analyzed and detected in parallel, wherein an error in any one block does not affect the analysis and detection of the remaining blocks.
CN201510663214.1A 2015-10-14 2015-10-14 Processing method and system for genome detection data transmission and preprocessing Active CN106603591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510663214.1A CN106603591B (en) 2015-10-14 2015-10-14 Processing method and system for genome detection data transmission and preprocessing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510663214.1A CN106603591B (en) 2015-10-14 2015-10-14 Processing method and system for genome detection data transmission and preprocessing

Publications (2)

Publication Number Publication Date
CN106603591A true CN106603591A (en) 2017-04-26
CN106603591B CN106603591B (en) 2020-02-07

Family

ID=58552019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510663214.1A Active CN106603591B (en) 2015-10-14 2015-10-14 Processing method and system for genome detection data transmission and preprocessing

Country Status (1)

Country Link
CN (1) CN106603591B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090318310A1 (en) * 2008-04-21 2009-12-24 Softgenetics Llc DNA Sequence Assembly Methods of Short Reads
CN102609631A (en) * 2010-10-28 2012-07-25 三星Sds株式会社 Cooperation-based method of managing, displaying, and updating DNA sequence data
CN102867134A (en) * 2012-08-16 2013-01-09 盛司潼 System and method for splicing gene sequence fragments
CN103049680A (en) * 2012-12-29 2013-04-17 深圳先进技术研究院 gene sequencing data reading method and system

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019076177A1 (en) * 2017-10-20 2019-04-25 人和未来生物科技(长沙)有限公司 Gene sequencing data compression preprocessing, compression and decompression method, system, and computer-readable medium
CN109698702A (en) * 2017-10-20 2019-04-30 人和未来生物科技(长沙)有限公司 Gene sequencing data compression preprocess method, system and computer-readable medium
CN109698703A (en) * 2017-10-20 2019-04-30 人和未来生物科技(长沙)有限公司 Gene sequencing data decompression method, system and computer-readable medium
CN109698703B (en) * 2017-10-20 2020-10-20 人和未来生物科技(长沙)有限公司 Gene sequencing data decompression method, system and computer readable medium
CN109698702B (en) * 2017-10-20 2020-10-23 人和未来生物科技(长沙)有限公司 Gene sequencing data compression preprocessing method, system and computer readable medium
US11551785B2 (en) 2017-10-20 2023-01-10 Genetalks Bio-Tech (Changsha) Co., Ltd. Gene sequencing data compression preprocessing, compression and decompression method, system, and computer-readable medium
CN110797081A (en) * 2019-10-17 2020-02-14 南京医基云医疗数据研究院有限公司 Activation area identification method and device, storage medium and electronic equipment
CN111199777A (en) * 2019-12-24 2020-05-26 西安交通大学 Biological big data oriented streaming transmission and variation real-time mining system and method
CN111199777B (en) * 2019-12-24 2023-09-29 西安交通大学 Biological big data-oriented streaming and mutation real-time mining system and method
CN111599408A (en) * 2020-04-15 2020-08-28 至本医疗科技(上海)有限公司 Gene variation cis-trans position relation detection method, device, equipment and storage medium
CN111599408B (en) * 2020-04-15 2022-05-06 至本医疗科技(上海)有限公司 Gene variation cis-trans position relation detection method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN106603591B (en) 2020-02-07

Similar Documents

Publication Publication Date Title
CN106603591A (en) Processing method and system facing transmission and preprocessing of genome detection data
US10778246B2 (en) Managing compression and storage of genomic data
US8819288B2 (en) Optimized data stream compression using data-dependent chunking
US9811550B2 (en) Security for multi-tenant deduplication datastore against other tenants
US11627207B2 (en) Systems and methods for data deduplication by generating similarity metrics using sketch computation
CN109447641B (en) Method and apparatus for transmitting blockchain data to blockchain browser
US7882084B1 (en) Compression of data transmitted over a network
KR20130095194A (en) Optimization of storage and transmission of data
Akter et al. Performance analysis of personal cloud storage services for mobile multimedia health record management
Upadhyay et al. Deduplication and compression techniques in cloud design
CN113111043B (en) Method, device, system and storage medium for processing medium source data file
Xiao et al. Towards web-based delta synchronization for cloud storage services
CN104378234A (en) Cross-data-center data transmission processing method and system
Pamies-Juarez et al. Decentralized erasure coding for efficient data archival in distributed storage systems
CN105721526B (en) The synchronous method and device of a kind of terminal, server file
US11762557B2 (en) System and method for data compaction and encryption of anonymized datasets
US11831343B2 (en) System and method for data compression with encryption
CN104408178B (en) WEB controls loading device and method
Vaucher et al. ZipLine: in-network compression at line speed
US20220156233A1 (en) Systems and methods for sketch computation
US20240020006A1 (en) System and method for compaction of floating-point numbers within a dataset
CN103605768A (en) Massive file synchronization speed increasing method in storage systems
JP5961471B2 (en) Output comparison method in multiple information systems
US20210191640A1 (en) Systems and methods for data segment processing
Xia et al. NetSync: A network adaptive and deduplication-inspired delta synchronization approach for cloud storage services

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant