CN115798591B - Genome sequence compression method based on Hilbert fractal - Google Patents
Genome sequence compression method based on Hilbert fractal
- Publication number
- CN115798591B
- Authority
- CN
- China
- Prior art keywords
- sequence
- gene
- compressed
- reference sequence
- mode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a genome sequence compression method based on Hilbert fractal. The gene sequences to be compressed are digitally mapped and a gene reference sequence is determined by Euclidean distance, so that the reference sequence is selected more accurately. A redundancy elimination operation is performed on the sequences to be compressed and the gene reference sequence, and after the redundancy-free reference sequence is matched with the sequences to be compressed, the result is stored in the form of binary groups (two-tuples). Multi-mode extraction is performed on all binary group data using the Hilbert fractal transformation. The mean of each extracted mode is reduced in dimension, the linear correlation among the means is eliminated, and the dimension-reduced mean of each mode is compressed independently, which improves compression efficiency.
Description
Technical Field
The invention relates to the technical field of bioinformatics, and in particular to a genome sequence compression method based on Hilbert fractal.
Background
In recent years, with the continuous progress of next-generation sequencing technology, gene sequencing has become faster and cheaper, and it is increasingly applied in fields such as biology, medicine, health, criminal investigation, and agriculture. As a result, the amount of raw data generated by gene sequencing is growing explosively, by 3 to 5 times per year or even faster. Moreover, the data for each sequenced sample is large, so the storage, management, retrieval, and transmission of massive gene sequencing data face technical and cost challenges.
Data compression is one of the techniques that alleviate this challenge. Data compression converts data into a more compact form than the original format in order to reduce storage space. The original input data is a sequence of symbols to be compressed. The symbols are encoded by a compressor, and the output is the encoded data. Typically, at some later time the encoded data is input to a decompressor, where it is decoded and reconstructed, and the original data is output as a symbol sequence. If the output data and the input data are always identical, the compression scheme is called lossless (the encoder is also called a lossless encoder). Otherwise, it is a lossy compression scheme.
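The lossless property described above can be checked directly with a round trip. The following is a minimal sketch using Python's standard `zlib` module as a stand-in general-purpose compressor; it is not the method of the invention, and the helper name `is_lossless_round_trip` is hypothetical.

```python
import zlib


def is_lossless_round_trip(data: bytes) -> bool:
    """Compress, then decompress, and report whether the bytes are unchanged."""
    encoded = zlib.compress(data, level=9)   # compressor: symbols -> encoded data
    decoded = zlib.decompress(encoded)       # decompressor: encoded data -> symbols
    return decoded == data


# A highly repetitive toy "sequence"; real sequencing data is far less regular.
sample = b"ACGT" * 1000
encoded_sample = zlib.compress(sample, level=9)
print(len(sample), len(encoded_sample), is_lossless_round_trip(sample))
```

A lossy scheme would fail this equality check for at least some inputs.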
According to comparative studies of existing gene sequencing data compression methods, general-purpose compression algorithms, reference-free compression algorithms, and reference-based compression algorithms share the following problems: 1. there is room for further improvement in compression ratio; 2. when a relatively good compression ratio is obtained, the compression/decompression time is relatively long, so time cost becomes a new problem. Furthermore, reference-based compression algorithms generally achieve better compression ratios than general-purpose and reference-free algorithms. However, for a reference-based algorithm, the choice of reference genome may make performance unstable: when the same target sample data is processed, performance may differ significantly depending on which reference genome is selected; and with the same reference selection strategy, performance may also differ significantly across different sequencing samples of the same type.
Disclosure of Invention
To solve the above technical problems, the invention provides a genome sequence compression method based on Hilbert fractal, which comprises the following steps:
S1, digitally mapping the gene sequences to be compressed, and determining a gene reference sequence by Euclidean distance;
S2, performing a redundancy elimination operation on the sequences to be compressed and the gene reference sequence;
S3, after the redundancy-free reference sequence is matched with the sequences to be compressed, storing the result in the form of binary groups;
S4, performing multi-mode extraction on all binary group data using the Hilbert fractal transformation;
S5, reducing the dimension of the mean of each extracted mode, eliminating the linear correlation among the means, and independently compressing the dimension-reduced mean of each mode.
Further, step S1 includes: given n gene sequences in total, digitally mapping the n gene sequences into high-dimensional numeric vectors in Euclidean space, calculating, for each gene sequence, the sum of the Euclidean distances between its high-dimensional numeric vector and those of the other n-1 gene sequences, and taking the gene sequence represented by the high-dimensional numeric vector with the smallest sum of Euclidean distances as the gene reference sequence.
Further, step S2 includes:
S2.1, calculating hash values of the gene reference sequence and the sequences to be compressed, taking the reference hash value generated from the gene reference sequence as an index, matching the reference hash value against each hash value in the hash value sequence generated from the sequences to be compressed, and removing from the sequences to be compressed the gene sequences whose hash values do not match.
S2.2, traversing the gene reference sequence with step length s to obtain continuous sub-reference sequences, taking the continuous sub-reference sequences as an index, and sorting the gene sequences in the matched sequences to be compressed according to the index.
S2.3, calculating hash values of the continuous sub-reference sequences and of the gene sequences in the matched sequences to be compressed to form a hash table data block.
S2.4, inserting the offsets, within the whole of the n gene reference sequences, of the continuous sub-reference sequences and the matched sequences to be compressed into the hash table data block, recording the data blocks in which collisions occur, performing redundancy deletion on the sub-reference sequences and matched sequences to be compressed in the colliding data blocks, and retaining the non-redundant sub-reference sequences and matched sequences to be compressed.
Further, step S4 includes:
Step 4.1: establishing a data input system, and sampling the binary group data to obtain a binary group data set;
Step 4.2: performing modal decomposition on the obtained binary group data set using the Hilbert fractal transformation method, decomposing the binary group data set into a plurality of intrinsic modes.
Further, step 4.2 includes:
Step 4.21: adding a filling sequence ω(t) to the obtained binary group data set x(t) to obtain the filled data set X(t):
$$X(t) = x(t) + \omega(t);$$
Step 4.22: decomposing the filled data set X(t) into a plurality of modes using modal decomposition:
$$X(t) = \sum_{j=1}^{n} h_j + r_n;$$
where $h_j$ is the j-th mode obtained by decomposing X(t), $r_n$ is the residual remaining after X(t) is decomposed, and n is the number of decomposed modes;
Step 4.23: adding a different filling sequence $\omega_i(t)$ (i=1, 2, …, n) to the data set x(t) each time and repeating steps 4.21 and 4.22, the data set for the i-th decomposition being $X_i(t)$:
$$X_i(t) = x(t) + \omega_i(t);$$
which is decomposed as:
$$X_i(t) = \sum_{j=1}^{n} h_{ij} + r_{in};$$
where $h_{ij}$ is the j-th mode obtained by decomposing $X_i(t)$, and $r_{in}$ is the residual remaining after $X_i(t)$ is decomposed.
Compared with the prior art, the invention has the following beneficial technical effects:
The gene sequences to be compressed are digitally mapped and a gene reference sequence is determined by Euclidean distance, so that the reference sequence can be determined more accurately. A redundancy elimination operation is performed on the sequences to be compressed and the gene reference sequence, and after the redundancy-free reference sequence is matched with the sequences to be compressed, the result is stored in the form of binary groups. Multi-mode extraction is performed on all binary group data using the Hilbert fractal transformation. The mean of each extracted mode is reduced in dimension, the linear correlation among the means is eliminated, and the dimension-reduced mean of each mode is compressed independently, which improves compression efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic flow chart of a genome sequence compression method based on Hilbert fractal.
FIG. 2 is a schematic flow chart of the redundancy elimination operation of the sequence to be compressed and the gene reference sequence.
FIG. 3 is a schematic diagram of a genome sequence compression system based on Hilbert fractal of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, embodiments of the present application. All other embodiments that a person of ordinary skill in the art can obtain from the present disclosure without inventive effort fall within the scope of the present disclosure.
In the drawings of the specific embodiments of the present invention, the connection relationships between the parts of the device are shown in order to describe the working principle of each element in the system more clearly. The drawings only distinguish the relative positional relationships between elements and are not to be construed as limiting the signal transmission direction, the connection sequence, or the size, dimension, and shape of any element or structure.
FIG. 1 is a schematic flow chart of a genome sequence compression method based on Hilbert fractal, which comprises the following steps:
S1, digitally mapping the gene sequences to be compressed, and determining a gene reference sequence by Euclidean distance.
Given n gene sequences in total, the gene sequences to be compressed are digitally mapped into high-dimensional numeric vectors in Euclidean space. For each gene sequence, the sum of the Euclidean distances between its high-dimensional numeric vector and those of the other n-1 sequences to be compressed is calculated, and the gene sequence represented by the vector with the smallest sum of Euclidean distances is taken as the gene reference sequence.
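The selection rule in step S1 can be sketched as follows. The source does not specify how sequences are digitally mapped, so this sketch assumes a k-mer frequency vector as the mapping; `kmer_vector`, `select_reference`, and the default `k=2` are hypothetical choices for illustration only.

```python
from itertools import product
from math import dist


def kmer_vector(seq: str, k: int = 2) -> list[float]:
    """Assumed mapping: normalized counts of every length-k word over A, C, G, T."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:          # skip windows containing other symbols, e.g. N
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for very short input
    return [counts[m] / total for m in kmers]


def select_reference(seqs: list[str], k: int = 2) -> int:
    """Return the index of the sequence with the smallest sum of Euclidean distances."""
    vectors = [kmer_vector(s, k) for s in seqs]
    # The distance from a vector to itself is 0, so summing over all n vectors
    # equals summing over the other n-1.
    sums = [sum(dist(v, w) for w in vectors) for v in vectors]
    return min(range(len(seqs)), key=sums.__getitem__)


sequences = ["ACGTACGTACGT", "ACGTACGTACGA", "TTTTTTTTTTTT"]
print(select_reference(sequences))
```

The chosen sequence is the one closest, in aggregate, to all the others, which is why an outlier such as the all-T sequence is not selected.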
S2, performing redundancy elimination operation on the sequence to be compressed and the gene reference sequence, as shown in FIG. 2, specifically comprising the following steps:
S2.1, calculating hash values of the gene reference sequence and the sequences to be compressed: a reference hash value is generated from the gene reference sequence and used as an index, and a hash value sequence is generated from the sequences to be compressed. The reference hash value is matched against each hash value in the hash value sequence to determine a matching result for each, and the gene sequences whose hash values do not match are removed from the sequences to be compressed.
S2.2, traversing the gene reference sequence with step length s to obtain continuous sub-reference sequences of a specified length, taking the sub-reference sequences as an index, and sorting the gene sequences in the matched sequences to be compressed according to the index.
The gene reference sequence consists of a series of the bases A, C, T, and G. To facilitate analysis and processing of the data, the invention introduces the continuous sub-reference sequence, which is the name given to a short continuous run of A/C/T/G bases in the reference. Once the step length s is determined, a plurality of continuous sub-reference sequences of length s are obtained from the gene reference sequence.
A fixed-length sub-reference sequence is taken at every step of s; the length of each continuous sub-reference sequence equals the step length s, which can be defined by the user. Assuming the total length of the gene reference sequence is N, the whole gene reference sequence yields N/s continuous sub-reference sequences in total. The redundancy elimination optimization method of this embodiment aims to reduce the number of sequences to be compressed as far as possible, while at the same time preserving the quality of the continuous sub-reference sequences.
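The traversal described above, in which a window of length s is taken every s positions, can be sketched as below. The function name `sub_reference_sequences` is hypothetical, and dropping a trailing fragment shorter than s is an assumption, since the source only states that there are N/s sub-reference sequences.

```python
def sub_reference_sequences(reference: str, s: int) -> list[tuple[int, str]]:
    """Split the reference into consecutive windows of length s.

    Returns (offset, window) pairs. A trailing fragment shorter than s is
    dropped (an assumption), so the result has len(reference) // s entries.
    """
    if s <= 0:
        raise ValueError("step length s must be positive")
    return [
        (offset, reference[offset:offset + s])
        for offset in range(0, len(reference) - s + 1, s)
    ]


windows = sub_reference_sequences("ACGTACGTAC", 4)
print(windows)
```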
S2.3, calculating hash values of each continuous sub-reference sequence to form a hash table data block.
The hash table data block refers to a plurality of data blocks containing hash values, each hash value occupying one data block. The data of each block may also include a flag indicating whether the block is free, a flag indicating whether its hash value has collided, and the index of the next block in the same collision chain. This information is used when a gene sequence is inserted into, deleted from, or looked up in the data blocks.
The hash table data block capacity records the upper limit on the number of data blocks in the hash table; the number of used data blocks represents the number of hash values currently inserted; and the free data block index indicates the position of the current free data block, so that a data block can be allocated quickly to a newly inserted gene sequence.
S2.4, inserting the offsets, within the whole of the n gene reference sequences, of the continuous sub-reference sequences and the matched sequences to be compressed into the hash table data block, recording the data blocks in which collisions occur, performing redundancy deletion on the sub-reference sequences and matched sequences to be compressed in the colliding data blocks, and retaining the non-redundant sub-reference sequences and matched sequences to be compressed.
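One possible reading of steps S2.3 and S2.4 is sketched below: each block stores a hash value and an offset, colliding blocks are chained through a next-block index, and a segment whose content is already stored is treated as redundant and skipped. The class names, the use of CRC32 as the hash function, and the append-only allocation of free blocks are all assumptions made for illustration; the source does not fix these details.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    hash_value: int
    offset: int              # offset of the stored segment in the reference
    collided: bool = False   # set once another insert hashes to the same value
    next_index: int = -1     # next block in the same collision chain, -1 if none


class HashTableBlocks:
    def __init__(self, reference: str, capacity: int) -> None:
        self.reference = reference
        self.capacity = capacity            # upper limit on the number of blocks
        self.blocks: list[Block] = []       # used blocks; next free index is len(blocks)
        self.first: dict[int, int] = {}     # hash value -> index of first block in chain

    @property
    def used(self) -> int:
        return len(self.blocks)

    def insert(self, offset: int, length: int) -> bool:
        """Insert a segment by offset; return False if it is redundant."""
        segment = self.reference[offset:offset + length]
        h = zlib.crc32(segment.encode("ascii"))
        index, last = self.first.get(h, -1), -1
        while index != -1:                  # walk the collision chain
            block = self.blocks[index]
            block.collided = True           # record that this block saw a collision
            if self.reference[block.offset:block.offset + length] == segment:
                return False                # same content already stored: redundant
            last, index = index, block.next_index
        if self.used >= self.capacity:
            raise OverflowError("hash table data block is full")
        self.blocks.append(Block(h, offset, collided=last != -1))
        if last == -1:
            self.first[h] = self.used - 1
        else:
            self.blocks[last].next_index = self.used - 1
        return True


reference = "ACGTACGTTTTT"
table = HashTableBlocks(reference, capacity=8)
kept = [off for off in range(0, len(reference), 4) if table.insert(off, 4)]
print(kept, table.used)
```

In this toy reference the window at offset 4 repeats the window at offset 0, so only the offsets of the two distinct windows are retained.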
S3, after the redundancy-free reference sequence is matched with the sequences to be compressed, storing the result in the form of <offset position, length> binary groups.
The offsets and lengths of the gene reference sequence segments within the non-redundant sequences to be compressed are stored as <offset position, length> binary groups; the offsets and lengths of the non-redundant sub-reference sequences are likewise stored as <offset position, length> binary groups.
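The <offset position, length> representation can be illustrated with a simple greedy matcher. This is a sketch under stated assumptions: the source does not say how matches are found or how unmatched bases are handled, so the hypothetical `encode_pairs` below uses longest-prefix search and raises an error if some base of the target does not occur in the reference at all.

```python
def longest_match(target: str, pos: int, reference: str) -> tuple[int, int]:
    """Longest prefix of target[pos:] found in reference, as (offset, length)."""
    best = (-1, 0)
    length = 1
    while pos + length <= len(target):
        offset = reference.find(target[pos:pos + length])
        if offset == -1:        # a longer prefix cannot match either
            break
        best = (offset, length)
        length += 1
    return best


def encode_pairs(target: str, reference: str) -> list[tuple[int, int]]:
    """Encode target as <offset, length> pairs pointing into reference."""
    pairs = []
    pos = 0
    while pos < len(target):
        offset, length = longest_match(target, pos, reference)
        if length == 0:
            raise ValueError(f"base {target[pos]!r} does not occur in the reference")
        pairs.append((offset, length))
        pos += length
    return pairs


def decode_pairs(pairs: list[tuple[int, int]], reference: str) -> str:
    """Rebuild the target by copying each referenced segment."""
    return "".join(reference[o:o + n] for o, n in pairs)


ref = "ACGTTGCA"
pairs = encode_pairs("TTGCACGT", ref)
print(pairs, decode_pairs(pairs, ref))
```

Because decoding only copies segments of the reference, the pair list is sufficient to reconstruct the target exactly whenever the reference is available.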
S4, performing multi-mode extraction on all data in binary group form, using the Hilbert fractal transformation as the multi-mode extraction method, specifically comprising the following steps:
Step 4.1: establishing a data input system, and sampling the binary group data to obtain a binary group data set.
Step 4.2: performing modal decomposition on the obtained binary group data set using the Hilbert fractal transformation method, decomposing the binary group data set into a plurality of intrinsic modes.
Step 4.21: adding a filling sequence ω(t) to the obtained binary group data set x(t) to obtain the filled data set X(t):
$$X(t) = x(t) + \omega(t).$$
Step 4.22: decomposing the filled data set X(t) into a plurality of modes using modal decomposition:
$$X(t) = \sum_{j=1}^{n} h_j + r_n,$$
where $h_j$ is the j-th mode obtained by decomposing X(t), $r_n$ is the residual remaining after X(t) is decomposed, and n is the number of decomposed modes.
Step 4.23: adding a different filling sequence $\omega_i(t)$ (i=1, 2, …, n) to the data set x(t) each time and repeating steps 4.21 and 4.22, the data set for the i-th decomposition being $X_i(t)$:
$$X_i(t) = x(t) + \omega_i(t),$$
which is decomposed as:
$$X_i(t) = \sum_{j=1}^{n} h_{ij} + r_{in},$$
where $h_{ij}$ is the j-th mode obtained by decomposing $X_i(t)$, and $r_{in}$ is the residual remaining after $X_i(t)$ is decomposed.
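The repeat-and-decompose structure of steps 4.21 to 4.23 can be sketched as follows. Two points are assumptions rather than statements from the source: the filling sequences are taken to be seeded Gaussian noise, and the decomposition itself is replaced by a simple stand-in (successive moving-average splitting), not a true intrinsic-mode sifting procedure. Averaging each mode over the trials is also an assumed reading of "the mean of each mode". All function names are hypothetical.

```python
import random


def moving_average(signal: list[float], width: int) -> list[float]:
    """Centered moving average; the window is truncated at the edges."""
    half = width // 2
    out = []
    for t in range(len(signal)):
        window = signal[max(0, t - half):t + half + 1]
        out.append(sum(window) / len(window))
    return out


def decompose(signal: list[float], n_modes: int) -> tuple[list[list[float]], list[float]]:
    """Stand-in decomposition: X = h_1 + ... + h_n + r_n by repeated smoothing."""
    modes = []
    residual = list(signal)
    for j in range(n_modes):
        smooth = moving_average(residual, 2 ** (j + 1) + 1)
        modes.append([a - b for a, b in zip(residual, smooth)])  # detail removed at level j
        residual = smooth
    return modes, residual


def ensemble_modes(x: list[float], n_trials: int, n_modes: int,
                   noise_scale: float = 0.1, seed: int = 0) -> list[list[float]]:
    """Average the j-th mode over trials, each with a different filling sequence."""
    rng = random.Random(seed)
    mean_modes = [[0.0] * len(x) for _ in range(n_modes)]
    for _ in range(n_trials):
        omega = [rng.gauss(0.0, noise_scale) for _ in x]   # filling sequence omega_i(t)
        x_i = [a + w for a, w in zip(x, omega)]            # X_i(t) = x(t) + omega_i(t)
        modes, _residual = decompose(x_i, n_modes)         # modes h_ij, residual r_in
        for j, h_ij in enumerate(modes):
            for t, value in enumerate(h_ij):
                mean_modes[j][t] += value / n_trials
    return mean_modes


x = [float(t % 8) for t in range(64)]
mean_modes = ensemble_modes(x, n_trials=5, n_modes=3)
print(len(mean_modes), len(mean_modes[0]))
```

By construction the stand-in satisfies the reconstruction identity of step 4.22: summing the modes and the residual returns the input.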
S5, reducing the dimension of the extracted mean value of each mode, eliminating the linear correlation among the mean values, and independently compressing the mean value of each mode after dimension reduction.
In a preferred embodiment, principal mode analysis can be used for multivariate statistics and processing of the modes: the modes in the original mode space, which are correlated to some degree, are converted into principal modes in a new space that are uncorrelated with each other. This compresses and reduces the dimension of the original modes while keeping the information loss small.
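The decorrelation step can be sketched with a small principal-component style projection. This assumes that "principal mode analysis" behaves like principal component analysis applied across modes, with each mode treated as a variable and each sample index as an observation; the source does not give the procedure. The function name `principal_modes` is hypothetical, and power iteration with deflation is used only to keep the sketch dependency-free.

```python
def principal_modes(modes: list[list[float]], n_components: int,
                    iterations: int = 500) -> list[list[float]]:
    """Project correlated modes onto their leading uncorrelated directions."""
    m, n = len(modes), len(modes[0])
    means = [sum(row) / n for row in modes]
    centered = [[v - mu for v in row] for row, mu in zip(modes, means)]
    # Covariance between modes (m x m matrix).
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
            for j in range(m)] for i in range(m)]
    components = []
    for _ in range(n_components):
        vec = [1.0 / (i + 1) for i in range(m)]          # deterministic start vector
        start_norm = sum(v * v for v in vec) ** 0.5
        vec = [v / start_norm for v in vec]
        for _step in range(iterations):                  # power iteration
            nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
            norm = sum(v * v for v in nxt) ** 0.5
            if norm < 1e-12:                             # no variance left
                break
            vec = [v / norm for v in nxt]
        value = sum(vec[i] * cov[i][j] * vec[j] for i in range(m) for j in range(m))
        components.append([sum(vec[i] * centered[i][t] for i in range(m))
                           for t in range(n)])
        # Deflation: remove the direction just found before searching for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(m)]
               for i in range(m)]
    return components


example_modes = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],   # nearly a multiple of the first mode
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
]
reduced = principal_modes(example_modes, n_components=2)
print(len(reduced), len(reduced[0]))
```

Three partly correlated modes are reduced to two series whose sample covariance is close to zero, so each can then be compressed independently.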
FIG. 3 is a schematic structural diagram of a genome sequence compression system based on Hilbert fractal according to the present invention. The genome sequence compression system comprises: a gene reference sequence determining unit, a redundancy removing unit, a data storage unit, a Hilbert fractal transformation unit, and a compression unit.
The gene reference sequence determining unit is used for digitally mapping the gene sequences to be compressed and determining the gene reference sequence by Euclidean distance.
The redundancy removing unit is used for performing the redundancy elimination operation on the sequences to be compressed and the gene reference sequence.
The data storage unit is used for storing, in the form of binary groups, the result of matching the redundancy-free reference sequence with the sequences to be compressed.
The Hilbert fractal transformation unit is used for performing multi-mode extraction on all binary group data using the Hilbert fractal transformation.
The compression unit is used for reducing the dimension of the mean of each extracted mode, eliminating the linear correlation among the means, and independently compressing the dimension-reduced mean of each mode.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (3)
1. A genome sequence compression method based on Hilbert fractal, characterized by comprising the following steps:
S1, digitally mapping the gene sequences to be compressed, and determining a gene reference sequence by Euclidean distance;
S2, performing a redundancy elimination operation on the sequences to be compressed and the gene reference sequence;
S3, after the redundancy-free reference sequence is matched with the sequences to be compressed, storing the result in the form of <offset position, length> binary groups;
S4, performing multi-mode extraction on all binary group data using the Hilbert fractal transformation, comprising:
Step 4.1: establishing a data input system, and sampling the binary group data to obtain a binary group data set;
Step 4.2: performing modal decomposition on the obtained binary group data set using the Hilbert fractal transformation method, decomposing the binary group data set into a plurality of intrinsic modes, comprising:
Step 4.21: adding a filling sequence ω(t) to the obtained binary group data set x(t) to obtain the filled data set X(t):
$$X(t) = x(t) + \omega(t);$$
Step 4.22: decomposing the filled data set X(t) into a plurality of modes using modal decomposition:
$$X(t) = \sum_{j=1}^{n} h_j + r_n;$$
where $h_j$ is the j-th mode obtained by decomposing X(t), $r_n$ is the residual remaining after X(t) is decomposed, and n is the number of decomposed modes;
Step 4.23: adding a different filling sequence $\omega_i(t)$ (i=1, 2, …, n) to the data set x(t) each time and repeating steps 4.21 and 4.22, the data set for the i-th decomposition being $X_i(t)$:
$$X_i(t) = x(t) + \omega_i(t);$$
which is decomposed as:
$$X_i(t) = \sum_{j=1}^{n} h_{ij} + r_{in};$$
where $h_{ij}$ is the j-th mode obtained by decomposing $X_i(t)$, and $r_{in}$ is the residual remaining after $X_i(t)$ is decomposed;
S5, reducing the dimension of the mean of each mode, performing multivariate statistics and processing of the modes using principal mode analysis, converting the correlated modes in the original mode space into uncorrelated principal modes in a new space, eliminating the linear correlation among the means of the modes, and independently compressing the dimension-reduced mean of each mode.
2. The genome sequence compression method according to claim 1, wherein step S1 comprises: given n gene sequences in total, digitally mapping the n gene sequences into high-dimensional numeric vectors in Euclidean space, calculating, for each gene sequence, the sum of the Euclidean distances between its high-dimensional numeric vector and those of the other n-1 gene sequences, and taking the gene sequence represented by the high-dimensional numeric vector with the smallest sum of Euclidean distances as the gene reference sequence.
3. The genome sequence compression method according to claim 2, wherein step S2 comprises:
S2.1, calculating hash values of the gene reference sequence and the sequences to be compressed, taking the reference hash value generated from the gene reference sequence as an index, matching the reference hash value against each hash value in the hash value sequence generated from the sequences to be compressed, and removing from the sequences to be compressed the gene sequences whose hash values do not match;
S2.2, traversing the gene reference sequence with step length s to obtain continuous sub-reference sequences, taking the continuous sub-reference sequences as an index, and sorting the gene sequences in the matched sequences to be compressed according to the index;
S2.3, calculating hash values of the continuous sub-reference sequences and of the gene sequences in the matched sequences to be compressed to form a hash table data block;
S2.4, inserting the offsets, within the whole of the n gene reference sequences, of the continuous sub-reference sequences and the matched sequences to be compressed into the hash table data block, recording the data blocks in which collisions occur, performing redundancy deletion on the sub-reference sequences and matched sequences to be compressed in the colliding data blocks, and retaining the non-redundant sub-reference sequences and matched sequences to be compressed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211680607.XA CN115798591B (en) | 2022-12-23 | 2022-12-23 | Genome sequence compression method based on Hilbert fractal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115798591A CN115798591A (en) | 2023-03-14 |
CN115798591B CN115798591B (en) | 2023-05-23 |
Family
ID=85426870
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211680607.XA Active CN115798591B (en) | 2022-12-23 | 2022-12-23 | Genome sequence compression method based on Hilbert fractal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115798591B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354877A (en) * | 2015-11-09 | 2016-02-24 | 北京航空航天大学 | Three-dimensional grid processing method based on empirical mode decomposition and Hilbert spectrum calculation of space filling curve |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3491561A1 (en) * | 2016-07-27 | 2019-06-05 | Sequenom, Inc. | Methods for non-invasive assessment of genomic instability |
CN109658985B (en) * | 2018-12-25 | 2020-07-17 | 人和未来生物科技(长沙)有限公司 | Redundancy removal optimization method and system for gene reference sequence |
CN109979537B (en) * | 2019-03-15 | 2020-12-18 | 南京邮电大学 | Multi-sequence-oriented gene sequence data compression method |
CN114884516A (en) * | 2022-05-13 | 2022-08-09 | 黑龙江八一农垦大学 | Supervised data compression method based on statistical method and Hilbert envelope spectrum |
Also Published As
Publication number | Publication date |
---|---|
CN115798591A (en) | 2023-03-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210050074A1 (en) | Systems and methods for sequence encoding, storage, and compression | |
EP2608096B1 (en) | Compression of genomic data file | |
US9929746B2 (en) | Methods and systems for data analysis and compression | |
KR101638594B1 (en) | Method and apparatus for searching DNA sequence | |
EP2724278B1 (en) | Methods and systems for data analysis | |
KR101969848B1 (en) | Method and apparatus for compressing genetic data | |
KR20190069469A (en) | Method and system for indexing bioinformatics data | |
EP2595076B1 (en) | Compression of genomic data | |
CN103546160A (en) | Multi-reference-sequence based gene sequence stage compression method | |
WO2019080670A1 (en) | Gene sequencing data compression method and decompression method, system, and computer readable medium | |
CN112544038A (en) | Method, device and equipment for compressing data of storage system and readable storage medium | |
CN115798591B (en) | Genome sequence compression method based on Hilbert fractal | |
CN115867668A (en) | Downsampling genomic sequence data | |
CN113873094A (en) | Chaotic compressed sensing image encryption method | |
CN116861271B (en) | Data analysis processing method based on big data | |
US20160357812A1 (en) | System and method for transforming and compressing genomics data | |
US11244742B2 (en) | System for generating genomics data, with adjusted quality scores, and device, method, and software product for use therein | |
CN113922823B (en) | Social media information propagation graph data compression method based on constraint sparse representation | |
Wu et al. | HD-code: End-to-end high density code for DNA storage | |
WO2021156110A1 (en) | Improved quality value compression framework in aligned sequencing data based on novel contexts | |
CN116683916B (en) | Disaster recovery system of data center | |
CN113672575A (en) | Data compression method and device and storage medium | |
Zhang et al. | DNA Image Storage Using a Scheme Based on Fuzzy Matching on Natural Genome | |
Dufort y Álvarez Zorrilla de San Martín | Compression algorithms for biomedical signals and nanopore sequencing data | |
US8311994B2 (en) | Run total encoded data processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | Address after: 150090 east side of 5th floor, A10 building, China Cloud valley software park, No.9, Songhua Road, concentrated area, haping Road, economic development zone, Harbin City, Heilongjiang Province. Patentee after: Xingyun Gene Technology Co.,Ltd. Address before: 150090 east side of 5th floor, A10 building, China Cloud valley software park, No.9, Songhua Road, concentrated area, haping Road, economic development zone, Harbin City, Heilongjiang Province. Patentee before: Harbin Xingyun medical laboratory Co.,Ltd. ||