CN115798591B - Genome sequence compression method based on Hilbert fractal - Google Patents


Info

Publication number
CN115798591B
Authority
CN
China
Prior art keywords
sequence
gene
compressed
reference sequence
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211680607.XA
Other languages
Chinese (zh)
Other versions
CN115798591A (en)
Inventor
刘志岩
郑青松
郭方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xingyun Gene Technology Co ltd
Original Assignee
Harbin Xingyun Medical Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Xingyun Medical Laboratory Co ltd
Priority to CN202211680607.XA
Publication of CN115798591A
Application granted
Publication of CN115798591B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a genome sequence compression method based on Hilbert fractal. The gene sequences to be compressed are digitally mapped and a gene reference sequence is determined through Euclidean distance, so that the reference sequence is determined more accurately. A redundancy elimination operation is performed on the sequence to be compressed and the gene reference sequence, and after the redundancy-eliminated reference sequence is matched with the sequence to be compressed, the result is stored in the form of binary groups (two-element tuples). Multi-mode extraction is performed on all binary group data by Hilbert fractal transformation. The extracted mean value of each mode is reduced in dimension, the linear correlation among the mean values is eliminated, and the dimension-reduced mean value of each mode is compressed independently, which improves compression efficiency.

Description

Genome sequence compression method based on Hilbert fractal
Technical Field
The invention relates to the technical field of biological information, in particular to a genome sequence compression method based on Hilbert fractal.
Background
In recent years, with the continuous progress of next-generation sequencing technology, gene sequencing has become faster and cheaper and has been widely applied in fields such as biology, medicine, health, criminal investigation, and agriculture. As a result, the amount of raw data generated by gene sequencing is growing explosively, by 3 to 5 times per year or even faster. Moreover, each sequencing sample is large, so the storage, management, retrieval, and transmission of massive gene testing data face technical and cost challenges.
Data compression is one of the techniques that alleviate this challenge. Data compression converts data into a form more compact than the original format in order to reduce storage space. The original input data is a sequence of symbols to be compressed. The symbols are encoded by a compressor, and the output is encoded data. Typically, at some later time, the encoded data is input to a decompressor, where it is decoded and reconstructed, and the original data is output as a symbol sequence. If the output data and the input data are always identical, the compression scheme is called lossless (a lossless encoder). Otherwise, it is a lossy compression scheme.
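By way of non-limiting illustration, the lossless property described above can be checked with a round trip through a general-purpose compressor. The sketch below uses Python's standard `zlib` module purely as a stand-in; it is not the compression method of the invention, and the sample data is made up for the example.

```python
import zlib

def roundtrip_is_lossless(data: bytes) -> bool:
    """Compress, decompress, and check that the output equals the input."""
    encoded = zlib.compress(data, 9)      # compressor: symbols -> encoded data
    decoded = zlib.decompress(encoded)    # decompressor: encoded data -> symbols
    return decoded == data

# Illustrative, highly repetitive base sequence (not real sequencing data).
sample = b"ACGT" * 1000
encoded = zlib.compress(sample, 9)
print(len(sample), len(encoded), roundtrip_is_lossless(sample))
```

A lossy scheme would fail the equality check for at least some inputs.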
According to comparative studies of existing gene sequencing data compression methods, general-purpose compression algorithms, reference-free compression algorithms, and reference-based compression algorithms share the following problems: 1. there is room for further improvement in compression ratio; 2. when a relatively good compression ratio is obtained, the compression/decompression time is relatively long, and the time cost becomes a new problem. Furthermore, reference-based compression algorithms generally achieve better compression ratios than general-purpose and reference-free algorithms. However, for a reference-based algorithm, the choice of reference genome may make the performance unstable: when the same target sample data is processed, performance may differ significantly depending on which reference genome is selected; and with the same reference genome selection strategy, performance may also differ significantly when different gene sequencing sample data of the same kind is processed.
Disclosure of Invention
In order to solve the technical problems, the invention provides a genome sequence compression method based on Hilbert fractal, which comprises the following steps:
S1, digitally mapping the gene sequences to be compressed, and determining a gene reference sequence through Euclidean distance;
S2, performing a redundancy elimination operation on the sequence to be compressed and the gene reference sequence;
S3, after the redundancy-eliminated reference sequence is matched with the sequence to be compressed, storing the result in the form of a binary group;
S4, performing multi-mode extraction on all binary group data by Hilbert fractal transformation;
S5, reducing the dimension of the extracted mean value of each mode, eliminating the linear correlation among the mean values, and independently compressing the dimension-reduced mean value of each mode.
Further, step S1 includes: given n gene sequences in total, digitally mapping the n gene sequences into high-dimensional numeric vectors in Euclidean space; for each gene sequence, calculating the sum of the Euclidean distances between its vector and the vectors of the other n-1 gene sequences; and taking the gene sequence whose vector has the smallest sum of Euclidean distances as the gene reference sequence.
Further, step S2 includes:
S2.1, calculating hash values of the gene reference sequence and the sequence to be compressed, taking the reference hash value generated from the gene reference sequence as an index, matching the reference hash value against each hash value in the hash value sequence generated from the sequence to be compressed, and removing from the sequence to be compressed the gene sequences whose hash values are unmatched.
S2.2, traversing the gene reference sequence with step length s to obtain continuous sub-reference sequences, taking the continuous sub-reference sequences as an index, and sorting the gene sequences in the matched sequence to be compressed according to the index.
S2.3, calculating hash values of the continuous sub-reference sequences and of the gene sequences in the matched sequence to be compressed to form a hash table data block.
S2.4, inserting into the hash table data block the offsets, within the whole of the n gene reference sequences, of the continuous sub-reference sequences and of the matched sequences to be compressed; recording the data blocks in which conflicts occur; performing redundancy deletion on the sub-reference sequences and matched sequences to be compressed of each conflicting data block; and retaining the non-redundant sub-reference sequences and matched sequences to be compressed.
Further, step S4 includes:
step 4.1: establishing a data input system, and sampling the binary group data to obtain a binary group data set;
step 4.2: performing modal decomposition on the obtained binary group data set by the Hilbert fractal transformation method, decomposing the binary group data set into a plurality of intrinsic modes.
Further, step 4.2 includes:
step 4.21: adding a filling sequence $\omega(t)$ to the obtained binary group data set $x(t)$ to obtain the filled data set $X(t)$:
$$X(t) = x(t) + \omega(t);$$
step 4.22: decomposing the data set $X(t)$ with the added filling sequence into a plurality of modes by modal decomposition:
$$X(t) = \sum_{j=1}^{n} h_j(t) + r_n(t)$$
where $h_j$ is the j-th mode obtained by decomposing $X(t)$, $r_n$ is the residual after decomposing $X(t)$, and n is the number of decomposed modes;
step 4.23: adding a different filling sequence $\omega_i(t)$ (i = 1, 2, …, n) to the data set $x(t)$ each time and repeating steps 4.21 and 4.22, the data set for the i-th decomposition being $X_i(t)$:
$$X_i(t) = x(t) + \omega_i(t);$$
which is decomposed as:
$$X_i(t) = \sum_{j=1}^{n} h_{ij}(t) + r_{in}(t)$$
where $h_{ij}$ is the j-th mode obtained by decomposing $X_i(t)$, and $r_{in}$ is the residual after decomposing $X_i(t)$;
step 4.24: calculating the mean value $\bar{h}_j$ of each mode obtained by decomposition:
$$\bar{h}_j(t) = \frac{1}{n} \sum_{i=1}^{n} h_{ij}(t)$$
Compared with the prior art, the invention has the following beneficial technical effects:
The gene sequences to be compressed are digitally mapped and a gene reference sequence is determined through Euclidean distance, so that the reference sequence is determined more accurately. A redundancy elimination operation is performed on the sequence to be compressed and the gene reference sequence, and after the redundancy-eliminated reference sequence is matched with the sequence to be compressed, the result is stored in the form of binary groups. Multi-mode extraction is performed on all binary group data by Hilbert fractal transformation. The extracted mean value of each mode is reduced in dimension, the linear correlation among the mean values is eliminated, and the dimension-reduced mean value of each mode is compressed independently, which improves compression efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic flow chart of a genome sequence compression method based on Hilbert fractal.
FIG. 2 is a schematic flow chart of the redundancy elimination operation of the sequence to be compressed and the gene reference sequence.
FIG. 3 is a schematic diagram of a genome sequence compression system based on Hilbert fractal of the present invention.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In the drawings of the specific embodiments of the present invention, the connection relationships between the parts of the device are shown in order to describe the working principle of each element of the system more clearly. The drawings only distinguish the relative positional relationships between the elements and shall not be construed as limiting the signal transmission direction, the connection order, or the structure, size, dimension, or shape of any part of an element or structure.
Fig. 1 is a schematic flow chart of a genome sequence compression method based on Hilbert fractal, which comprises the following steps:
S1, digitally mapping the gene sequences to be compressed, and determining a gene reference sequence through Euclidean distance.
Given n gene sequences in total, the gene sequences to be compressed are digitally mapped into high-dimensional numeric vectors in Euclidean space. For each gene sequence, the sum of the Euclidean distances between its vector and the vectors of the other n-1 sequences to be compressed is calculated, and the gene sequence to be compressed whose vector has the smallest sum of Euclidean distances is taken as the gene reference sequence.
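By way of non-limiting illustration, the selection in step S1 can be sketched as follows. The description does not specify the digital mapping, so the sketch assumes a k-mer frequency vector; the names `kmer_vector` and `select_reference` and the parameter `k` are hypothetical and chosen only for this example.

```python
from itertools import product
from math import dist
from typing import List

def kmer_vector(seq: str, k: int = 2) -> List[float]:
    """Assumed mapping: normalized counts of every k-mer over the alphabet ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:              # skip k-mers containing other symbols
            counts[kmer] += 1
    total = max(1, sum(counts.values()))
    return [counts[m] / total for m in kmers]

def select_reference(seqs: List[str], k: int = 2) -> int:
    """Index of the sequence whose vector has the smallest sum of
    Euclidean distances to the vectors of all other sequences."""
    vectors = [kmer_vector(s, k) for s in seqs]
    # The distance of a vector to itself is 0, so including it changes nothing.
    sums = [sum(dist(v, w) for w in vectors) for v in vectors]
    return min(range(len(seqs)), key=sums.__getitem__)

seqs = ["AAAAAAAA", "AAAACCCC", "CCCCCCCC"]
ref_index = select_reference(seqs)
```

In this toy input the middle sequence lies between the other two in the vector space, so it is the one selected as reference. The pairwise computation is quadratic in n, which may matter for large collections.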
S2, performing redundancy elimination operation on the sequence to be compressed and the gene reference sequence, as shown in FIG. 2, specifically comprising the following steps:
S2.1, calculating hash values of the gene reference sequence and the sequence to be compressed: a reference hash value is generated from the gene reference sequence and taken as an index, and a hash value sequence is generated from the sequence to be compressed. The reference hash value is matched against each hash value in the hash value sequence, the matching result of the reference hash value relative to each hash value is determined, and the gene sequences in the sequence to be compressed whose hash values are unmatched are removed.
S2.2, traversing the gene reference sequence with step length s to obtain continuous sub-reference sequences of a specified length, taking the sub-reference sequences as an index, and sorting the gene sequences in the matched sequence to be compressed according to the index.
The gene reference sequence consists of a series of the bases A, C, T, and G. To facilitate analysis and processing of the data, the invention introduces the continuous sub-reference sequence, which is the name given to a short continuous stretch of the ACTG reference sequence. Once the step length s is determined, a plurality of continuous sub-reference sequences of length s are obtained from the gene reference sequence.
A fixed-length ACTG sub-reference sequence is taken at every step of length s; the length of each continuous sub-reference sequence equals the step length s, which can be defined by the user. Assuming the total length of the gene reference sequence is N, the whole gene reference sequence yields N/s continuous sub-reference sequences. The redundancy elimination optimization of this embodiment aims to reduce the number of sequences to be compressed as much as possible through the algorithm while ensuring the quality of the continuous sub-reference sequences.
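By way of non-limiting illustration, the traversal with step length s can be sketched as below. The function name `sub_references` is hypothetical, and dropping a trailing fragment shorter than s is an assumption of this sketch, since the description does not say how a remainder is handled.

```python
from typing import Iterator, Tuple

def sub_references(reference: str, s: int) -> Iterator[Tuple[int, str]]:
    """Yield (offset, sub-reference) for consecutive chunks of length s,
    taken every s bases; a trailing fragment shorter than s is dropped."""
    if s <= 0:
        raise ValueError("step length s must be positive")
    for offset in range(0, len(reference) - s + 1, s):
        yield offset, reference[offset:offset + s]

reference = "ACGTACGTAC"                   # N = 10
chunks = list(sub_references(reference, 4))
# N // s = 2 chunks: (0, "ACGT") and (4, "ACGT"); the final "AC" is dropped.
```

With this convention the number of sub-reference sequences is the integer quotient of N by s.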
S2.3, calculating hash values of each continuous sub-reference sequence to form a hash table data block.
The hash table data block refers to a plurality of data blocks containing hash values, each hash value occupying one data block. The data of each data block may also include information on whether the current data block is idle, information on whether the hash value has collided, and the index of the next data block in the collision chain of the current data block. This information is used to carry out the processing when a gene sequence is inserted into a data block, deleted, or queried.
The hash table data block capacity records the upper limit on the number of data blocks in the hash table. The number of used data blocks represents the number of hash values currently inserted. The idle data block index indicates the position of the current idle data block, so that a data block can be rapidly allocated to a newly inserted gene sequence.
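By way of non-limiting illustration, the data block layout described above can be sketched as a fixed-capacity table with chained collisions. The class and field names (`Block`, `BlockTable`, `free_index`, and so on) are hypothetical, the slot choice `hash_value % capacity` is an assumption, and deletion is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Block:
    free: bool = True        # whether this data block is idle
    hash_value: int = 0      # hash of the stored sub-sequence
    offset: int = -1         # offset of the sub-sequence in the reference
    collided: bool = False   # set once another entry chains onto this block
    next: int = -1           # index of the next block in the collision chain

class BlockTable:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity           # upper limit of data blocks
        self.used = 0                      # number of inserted hash values
        self.free_index = capacity - 1     # cursor used to find idle blocks
        self.blocks = [Block() for _ in range(capacity)]

    def _allocate(self) -> int:
        """Move the idle-block cursor down to the next idle block."""
        while self.free_index >= 0 and not self.blocks[self.free_index].free:
            self.free_index -= 1
        if self.free_index < 0:
            raise MemoryError("hash table data blocks exhausted")
        return self.free_index

    def insert(self, hash_value: int, offset: int) -> bool:
        """Store an entry; return True if its home block was already occupied."""
        home = hash_value % self.capacity
        collided = not self.blocks[home].free
        index = home
        if collided:
            self.blocks[home].collided = True
            tail = home
            while self.blocks[tail].next != -1:   # walk to the end of the chain
                tail = self.blocks[tail].next
            index = self._allocate()
            self.blocks[tail].next = index
        self.blocks[index] = Block(False, hash_value, offset)
        self.used += 1
        return collided

    def find(self, hash_value: int) -> List[int]:
        """Offsets of all entries stored under hash_value."""
        index, found = hash_value % self.capacity, []
        while index != -1 and not self.blocks[index].free:
            if self.blocks[index].hash_value == hash_value:
                found.append(self.blocks[index].offset)
            index = self.blocks[index].next
        return found

table = BlockTable(4)
first = table.insert(1, 0)     # home block 1 is idle
second = table.insert(5, 8)    # 5 % 4 == 1, so this entry collides
```

The collision flag returned by `insert` is the kind of record that a later redundancy check could inspect; how the invention resolves such conflicts is described in step S2.4, not in this sketch.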
S2.4, inserting into the hash table data block the offsets, within the whole of the n gene reference sequences, of the continuous sub-reference sequences and of the matched sequences to be compressed; recording the data blocks in which conflicts occur; performing redundancy deletion on the sub-reference sequences and matched sequences to be compressed of each conflicting data block; and retaining the non-redundant sub-reference sequences and matched sequences to be compressed.
S3, after the redundancy-eliminated reference sequence is matched with the sequence to be compressed, storing the result in the form of a binary group <offset position, length>.
The offsets and lengths of the gene reference sequences within the non-redundant sequence to be compressed are stored in the form of <offset position, length> binary groups; the offsets and lengths of the non-redundant sub-reference sequences are likewise stored in the form of <offset position, length> binary groups.
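By way of non-limiting illustration, the <offset position, length> representation can be sketched with a greedy matcher against the reference. The description does not say how bases without a match are handled, so emitting them as single-character literals is an assumption of this sketch, as are the names `encode` and `decode` and the seed length `k`.

```python
from typing import Dict, List, Tuple, Union

# A token is either an (offset, length) binary group or a literal base.
Token = Union[Tuple[int, int], str]

def encode(reference: str, target: str, k: int = 4) -> List[Token]:
    """Greedily replace stretches of target by (offset, length) into reference."""
    index: Dict[str, int] = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], pos)   # keep first occurrence
    tokens: List[Token] = []
    i = 0
    while i < len(target):
        offset = index.get(target[i:i + k])
        if offset is None:
            tokens.append(target[i])     # assumed fallback: literal base
            i += 1
            continue
        length = k
        while (i + length < len(target)
               and offset + length < len(reference)
               and target[i + length] == reference[offset + length]):
            length += 1                  # extend the match as far as possible
        tokens.append((offset, length))
        i += length
    return tokens

def decode(reference: str, tokens: List[Token]) -> str:
    """Rebuild the target from the reference and the token list."""
    parts = []
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            parts.append(reference[offset:offset + length])
        else:
            parts.append(token)
    return "".join(parts)

reference = "ACGTACGGTTAA"
target = "ACGGTTCACGT"
tokens = encode(reference, target)
```

For this input the target reduces to two binary groups and one literal, and `decode(reference, tokens)` restores it exactly.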
S4, performing multi-mode extraction on all data in binary group form, with Hilbert fractal transformation as the multi-mode extraction method, which specifically comprises the following steps:
Step 4.1: establishing a data input system, and sampling the binary group data to obtain a binary group data set.
Step 4.2: performing modal decomposition on the obtained binary group data set by the Hilbert fractal transformation method, decomposing the binary group data set into a plurality of intrinsic modes.
Step 4.21: adding a filling sequence $\omega(t)$ to the obtained binary group data set $x(t)$ to obtain the filled data set $X(t)$:
$$X(t) = x(t) + \omega(t).$$
Step 4.22: decomposing the data set $X(t)$ with the added filling sequence into a plurality of modes by modal decomposition:
$$X(t) = \sum_{j=1}^{n} h_j(t) + r_n(t)$$
where $h_j$ is the j-th mode obtained by decomposing $X(t)$, $r_n$ is the residual after decomposing $X(t)$, and n is the number of decomposed modes.
Step 4.23: adding a different filling sequence $\omega_i(t)$ (i = 1, 2, …, n) to the data set $x(t)$ each time and repeating steps 4.21 and 4.22, the data set for the i-th decomposition being $X_i(t)$:
$$X_i(t) = x(t) + \omega_i(t);$$
which is decomposed as:
$$X_i(t) = \sum_{j=1}^{n} h_{ij}(t) + r_{in}(t)$$
where $h_{ij}$ is the j-th mode obtained by decomposing $X_i(t)$, and $r_{in}$ is the residual after decomposing $X_i(t)$.
Step 4.24: calculating the mean value $\bar{h}_j$ of each mode obtained by decomposition:
$$\bar{h}_j(t) = \frac{1}{n} \sum_{i=1}^{n} h_{ij}(t)$$
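By way of non-limiting illustration, the loop of steps 4.21 to 4.24 can be sketched as follows. The modal decomposition itself is not detailed in the description, so `decompose` here is a toy stand-in built from moving averages, not the decomposition of the invention. Gaussian noise as the filling sequence, the parameter `sigma`, and all function names are likewise assumptions of this sketch.

```python
import random
from typing import List, Tuple

Series = List[float]

def smooth(x: Series, width: int) -> Series:
    """Centered moving average; the window is clipped at both ends."""
    half = width // 2
    out = []
    for t in range(len(x)):
        window = x[max(0, t - half): t + half + 1]
        out.append(sum(window) / len(window))
    return out

def decompose(x: Series, n_modes: int) -> Tuple[List[Series], Series]:
    """Toy stand-in: peel off detail layers so that x = sum(modes) + residual."""
    modes, current = [], list(x)
    for j in range(n_modes):
        trend = smooth(current, 2 * j + 3)
        modes.append([c - s for c, s in zip(current, trend)])
        current = trend
    return modes, current

def ensemble_mean_modes(x: Series, n_trials: int, n_modes: int,
                        sigma: float = 0.1, seed: int = 0
                        ) -> Tuple[List[Series], Series]:
    """Steps 4.21 to 4.24: add a different filling sequence per trial,
    decompose, then average each mode (and the residual) over the trials."""
    rng = random.Random(seed)
    mode_sum = [[0.0] * len(x) for _ in range(n_modes)]
    resid_sum = [0.0] * len(x)
    for _ in range(n_trials):
        omega = [rng.gauss(0.0, sigma) for _ in x]       # filling sequence
        filled = [a + b for a, b in zip(x, omega)]       # X_i = x + omega_i
        modes, resid = decompose(filled, n_modes)
        for j, mode in enumerate(modes):
            mode_sum[j] = [acc + v for acc, v in zip(mode_sum[j], mode)]
        resid_sum = [acc + v for acc, v in zip(resid_sum, resid)]
    mean_modes = [[v / n_trials for v in m] for m in mode_sum]
    mean_resid = [v / n_trials for v in resid_sum]
    return mean_modes, mean_resid

# Illustrative series, e.g. offsets and lengths flattened in order.
x = [0.0, 12.0, 12.0, 8.0, 20.0, 16.0, 36.0, 4.0, 40.0, 10.0]
mean_modes, mean_resid = ensemble_mean_modes(x, n_trials=5, n_modes=3)
```

Because each trial satisfies its own decomposition identity, the averaged modes plus the averaged residual equal x plus the average of the filling sequences; with `sigma=0.0` they reproduce x up to rounding.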
S5, reducing the dimension of the extracted mean value of each mode, eliminating the linear correlation among the mean values, and independently compressing the mean value of each mode after dimension reduction.
In a preferred embodiment, main mode analysis can be adopted to perform multivariate statistics and processing on the modes: a plurality of modes with a certain correlation in the original mode space are converted into main modes that are uncorrelated with each other in a new space, so that the original modes are compressed and reduced in dimension while the information loss is kept small.
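By way of non-limiting illustration, step S5 can be sketched under the assumption that the main mode analysis corresponds to a principal component analysis of the modal mean values. The sketch finds the leading directions by power iteration with deflation, projects the centered modes onto them, and compresses each resulting component separately. The function names, the 32-bit float packing, and the use of `zlib` as the final coder are assumptions made for this example only.

```python
import random
import struct
import zlib
from typing import List

Matrix = List[List[float]]

def center(rows: Matrix) -> Matrix:
    """Subtract the mean of each row (one row per modal mean series)."""
    return [[v - sum(r) / len(r) for v in r] for r in rows]

def covariance(centered: Matrix) -> Matrix:
    """Sample covariance between the centered rows."""
    n = len(centered[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / (n - 1) for rj in centered]
            for ri in centered]

def dominant_eigenvector(cov: Matrix, iters: int = 500, seed: int = 0) -> List[float]:
    """Power iteration on a symmetric positive semi-definite matrix."""
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in cov]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sum(x * x for x in w) ** 0.5
        if norm < 1e-12:          # nothing left to extract
            break
        v = [x / norm for x in w]
    return v

def principal_modes(rows: Matrix, n_keep: int) -> Matrix:
    """Project the centered rows onto the n_keep leading directions."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(n_keep):
        v = dominant_eigenvector(cov)
        lam = sum(vi * sum(c * x for c, x in zip(row, v)) for vi, row in zip(v, cov))
        components.append([sum(vi * r[t] for vi, r in zip(v, centered))
                           for t in range(len(centered[0]))])
        # Deflation: remove the direction just found from the covariance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return components

def compress_each(components: Matrix) -> List[bytes]:
    """Pack each component as 32-bit floats and compress it on its own."""
    return [zlib.compress(struct.pack("<%df" % len(c), *c)) for c in components]

base = [float(t) for t in range(16)]
alt = [(-1.0) ** t for t in range(16)]
rows = [base, [2 * b + a for b, a in zip(base, alt)], alt]   # correlated modes
components = principal_modes(rows, 2)
blobs = compress_each(components)
```

The three input rows are linearly dependent, so two components suffice here; in general, keeping fewer components than modes discards some information, and packing to 32-bit floats adds rounding, so this sketch is not lossless as written.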
FIG. 3 is a schematic structural diagram of a genome sequence compression system based on Hilbert fractal of the present invention, the genome sequence compression system comprising: the device comprises a gene reference sequence determining unit, a redundancy removing unit, a data storage unit, a Hilbert fractal transformation unit and a compression unit.
The gene reference sequence determining unit is used for digitally mapping the gene sequences to be compressed and determining the gene reference sequence through Euclidean distance.
The redundancy removing unit is used for performing the redundancy elimination operation on the sequence to be compressed and the gene reference sequence.
The data storage unit is used for storing the result in the form of a binary group after the redundancy-eliminated reference sequence is matched with the sequence to be compressed.
The Hilbert fractal transformation unit is used for carrying out multi-mode extraction on all binary group data by adopting Hilbert fractal transformation.
The compression unit is used for reducing the dimension of the extracted average value of each mode, eliminating the linear correlation among the average values, and independently compressing the average value of each mode after dimension reduction.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (3)

1. The genome sequence compression method based on Hilbert fractal is characterized by comprising the following steps of:
S1, digitally mapping a gene sequence to be compressed, and determining a gene reference sequence through Euclidean distance;
S2, performing redundancy elimination operation on the sequence to be compressed and the gene reference sequence;
S3, after the reference sequence subjected to redundancy removal is matched with the sequence to be compressed, the reference sequence is stored in a form of a binary group with an offset position and a length;
S4, carrying out multi-mode extraction on all binary group data by adopting Hilbert fractal transformation, wherein the method comprises the following steps:
step 4.1: establishing a data input system, and sampling the binary group data to obtain a binary group data set;
step 4.2: performing modal decomposition on the obtained binary group data set by using a Hilbert fractal transformation method, and decomposing the binary group data set into a plurality of intrinsic modes, wherein the method comprises the following steps of:
step 4.21: adding a filling sequence $\omega(t)$ to the obtained binary group data set $x(t)$ to obtain the filled data set $X(t)$:
$$X(t) = x(t) + \omega(t);$$
step 4.22: decomposing the data set $X(t)$ with the added filling sequence into a plurality of modes by modal decomposition:
$$X(t) = \sum_{j=1}^{n} h_j(t) + r_n(t)$$
where $h_j$ is the j-th mode obtained by decomposing $X(t)$, $r_n$ is the residual after decomposing $X(t)$, and n is the number of decomposed modes;
step 4.23: adding a different filling sequence $\omega_i(t)$ (i = 1, 2, …, n) to the data set $x(t)$ each time and repeating steps 4.21 and 4.22, the data set for the i-th decomposition being $X_i(t)$:
$$X_i(t) = x(t) + \omega_i(t);$$
which is decomposed as:
$$X_i(t) = \sum_{j=1}^{n} h_{ij}(t) + r_{in}(t)$$
where $h_{ij}$ is the j-th mode obtained by decomposing $X_i(t)$, and $r_{in}$ is the residual after decomposing $X_i(t)$;
step 4.24: calculating the mean value $\bar{h}_j$ of each mode obtained by decomposition:
$$\bar{h}_j(t) = \frac{1}{n} \sum_{i=1}^{n} h_{ij}(t)$$
S5, reducing the dimension of the average value of each mode, carrying out mode multivariate statistics and processing by adopting main mode analysis, converting the correlated multi-mode in the original mode space into the uncorrelated main mode in the new space, eliminating the linear correlation among the average values of each mode, and independently compressing the average value of each mode after dimension reduction.
2. The method of genomic sequence compression according to claim 1, wherein step S1 comprises: given n gene sequences in total, digitally mapping the n gene sequences into high-dimensional numeric vectors in Euclidean space; for each gene sequence, calculating the sum of the Euclidean distances between its vector and the vectors of the other n-1 gene sequences; and taking the gene sequence whose vector has the smallest sum of Euclidean distances as the gene reference sequence.
3. The method of genomic sequence compression according to claim 2, wherein step S2 comprises:
S2.1, calculating hash values of the gene reference sequence and the sequence to be compressed, taking the reference hash value generated from the gene reference sequence as an index, matching the reference hash value against each hash value in the hash value sequence generated from the sequence to be compressed, and removing from the sequence to be compressed the gene sequences whose hash values are unmatched;
S2.2, traversing the gene reference sequence with step length s to obtain continuous sub-reference sequences, taking the continuous sub-reference sequences as an index, and sorting the gene sequences in the matched sequence to be compressed according to the index;
S2.3, calculating hash values of the continuous sub-reference sequences and of the gene sequences in the matched sequence to be compressed to form a hash table data block;
S2.4, inserting into the hash table data block the offsets, within the whole of the n gene reference sequences, of the continuous sub-reference sequences and of the matched sequences to be compressed; recording the data blocks in which conflicts occur; performing redundancy deletion on the sub-reference sequences and matched sequences to be compressed of each conflicting data block; and retaining the non-redundant sub-reference sequences and matched sequences to be compressed.
CN202211680607.XA 2022-12-23 2022-12-23 Genome sequence compression method based on Hilbert fractal Active CN115798591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211680607.XA CN115798591B (en) 2022-12-23 2022-12-23 Genome sequence compression method based on Hilbert fractal

Publications (2)

Publication Number Publication Date
CN115798591A 2023-03-14
CN115798591B 2023-05-23

Family

ID=85426870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211680607.XA Active CN115798591B (en) 2022-12-23 2022-12-23 Genome sequence compression method based on Hilbert fractal

Country Status (1)

Country Link
CN (1) CN115798591B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 150090 east side of 5th floor, A10 building, China Cloud valley software park, No.9, Songhua Road, concentrated area, haping Road, economic development zone, Harbin City, Heilongjiang Province

Patentee after: Xingyun Gene Technology Co.,Ltd.

Address before: 150090 east side of 5th floor, A10 building, China Cloud valley software park, No.9, Songhua Road, concentrated area, haping Road, economic development zone, Harbin City, Heilongjiang Province

Patentee before: Harbin Xingyun medical laboratory Co.,Ltd.
