EA201991907A1 - Method and systems for the efficient compression of genomic sequence reads - Google Patents
Method and systems for the efficient compression of genomic sequence reads
- Publication number
- EA201991907A1
- Authority
- EA
- Eurasian Patent Office
- Prior art keywords
- genomic sequence
- readings
- systems
- genomic
- effective compression
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3086—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/12—Protecting executable software
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6809—Methods for determination or identification of nucleic acids involving differential detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/20—Heterogeneous data integration
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/50—Compression of genetic data
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3088—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3091—Data deduplication
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3091—Data deduplication
- H03M7/3095—Data deduplication using variable length segments
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/70—Type of the data to be coded, other than image and sound
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioethics (AREA)
- Chemical & Material Sciences (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Organic Chemistry (AREA)
- Technology Law (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Multimedia (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Signal Processing (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Epidemiology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Biochemistry (AREA)
Abstract
A method and apparatus for compressing genomic sequence data produced by genome sequencing machines. Sequence reads are encoded by aligning them to pre-existing or constructed reference sequences; the encoding process consists of classifying the reads into data classes and then encoding each class by means of a multiplicity of genomic descriptors. Genomic descriptors of the same type are organized in blocks, which are compressed by applying successive transformation, binarization, and entropy-coding stages. Specific source models and entropy coders are used for each data class and for each associated descriptor.
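The stages named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method claimed in the patent: the two-class rule in `classify`, the choice of the mapping position as the only descriptor, delta coding as the transformation, order-0 Exp-Golomb as the binarization, and zlib's DEFLATE standing in for a dedicated entropy coder with a per-descriptor source model. All function names are hypothetical.

```python
import zlib
from collections import defaultdict


def classify(read, ref, pos):
    """Hypothetical class rule: 'P' = perfect match at pos, 'M' = has mismatches."""
    return "P" if ref[pos:pos + len(read)] == read else "M"


def delta_transform(values):
    """Transformation: replace each value by its difference from the previous one."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def inverse_delta(deltas):
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out


def exp_golomb(n):
    """Binarization: order-0 Exp-Golomb code of a non-negative integer."""
    b = bin(n + 1)[2:]
    return "0" * (len(b) - 1) + b


def decode_exp_golomb(bits):
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2) - 1)
        i += zeros + 1
    return out


def pack_bits(bits):
    """Pad the bit string to a whole number of bytes and pack it."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


def unpack_bits(data, n_bits):
    return "".join(f"{byte:08b}" for byte in data)[:n_bits]


def encode_block(positions):
    """Encode one block of position descriptors (sorted, non-negative)."""
    bits = "".join(exp_golomb(d) for d in delta_transform(positions))
    # zlib (DEFLATE) is a stand-in for the entropy-coding stage.
    return len(bits), zlib.compress(pack_bits(bits))


def decode_block(n_bits, payload):
    bits = unpack_bits(zlib.decompress(payload), n_bits)
    return inverse_delta(decode_exp_golomb(bits))


def encode_reads(aligned_reads, ref):
    """aligned_reads: (read, pos) pairs sorted by pos; one block per class."""
    blocks = defaultdict(list)
    for read, pos in aligned_reads:
        blocks[classify(read, ref, pos)].append(pos)
    return {cls: encode_block(p) for cls, p in blocks.items()}


ref = "ACGTACGTACGT"
reads = [("ACGT", 0), ("CGTA", 1), ("ACGA", 4), ("GTAC", 6)]
encoded = encode_reads(reads, ref)
decoded = {cls: decode_block(*blk) for cls, blk in encoded.items()}
```

Here `decoded` recovers the mapping positions of each class. A fuller encoder would also need descriptors for the mismatch details, read lengths, and so on, each with its own block; the sketch covers positions only.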
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2017/017842 WO2018071055A1 (en) | 2016-10-11 | 2017-02-14 | Method and apparatus for the compact representation of bioinformatics data |
PCT/US2017/041579 WO2018071078A1 (en) | 2016-10-11 | 2017-07-11 | Method and apparatus for the access to bioinformatics data structured in access units |
PCT/US2017/066863 WO2018151788A1 (en) | 2017-02-14 | 2017-12-15 | Method and systems for the efficient compression of genomic sequence reads |
Publications (1)
Publication Number | Publication Date |
---|---|
EA201991907A1 (ru) | 2020-01-20 |
Family
ID=69374527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EA201991907A EA201991907A1 (ru) | Method and systems for the efficient compression of genomic sequence reads | 2017-02-14 | 2017-12-15 |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP3583250B1 (ru) |
JP (1) | JP7324145B2 (ru) |
EA (1) | EA201991907A1 (ru) |
MX (1) | MX2019009681A (ru) |
WO (1) | WO2018151788A1 (ru) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022008311A1 (en) * | 2020-07-10 | 2022-01-13 | Koninklijke Philips N.V. | Genomic information compression by configurable machine learning-based arithmetic coding |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070192139A1 (en) | 2003-04-22 | 2007-08-16 | Ammon Cookson | Systems and methods for patient re-identification |
US10902937B2 (en) | 2014-02-12 | 2021-01-26 | International Business Machines Corporation | Lossless compression of DNA sequences |
US20160100177A1 (en) * | 2014-10-06 | 2016-04-07 | Qualcomm Incorporated | Non-uniform exponential-golomb codes for palette mode coding |
-
2017
- 2017-12-15 JP JP2019542691A patent/JP7324145B2/ja active Active
- 2017-12-15 WO PCT/US2017/066863 patent/WO2018151788A1/en active Search and Examination
- 2017-12-15 EA EA201991907A patent/EA201991907A1/ru unknown
- 2017-12-15 MX MX2019009681A patent/MX2019009681A/es unknown
- 2017-12-15 EP EP17896462.3A patent/EP3583250B1/en active Active
Also Published As
Publication number | Publication date |
---|---|
EP3583250A4 (en) | 2020-12-16 |
EP3583250A1 (en) | 2019-12-25 |
JP7324145B2 (ja) | 2023-08-09 |
MX2019009681A (es) | 2019-10-09 |
WO2018151788A1 (en) | 2018-08-23 |
JP2020510907A (ja) | 2020-04-09 |
EP3583250B1 (en) | 2023-07-12 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
PH12019501881A1 (en) | Method and apparatus for the efficient compression of genomic sequence reads | |
PH12017501183A1 (en) | Palette index grouping for high throughput cabac coding | |
CO2019009919A2 (es) | Method and systems for the efficient compression of genomic sequence reads | |
MX354002B (es) | Apparatus and method for decoding and encoding an audio signal using adaptive spectral tile selection | |
EA201791429A1 (ru) | Contexts for large coding tree units | |
WO2018111116A3 (en) | Method for handling multidimensional data | |
TN2017000327A1 (en) | Restriction on palette block size in video coding | |
EP3754484A3 (en) | Generating encoding software and decoding means | |
MY190014A (en) | Data compression | |
EA201991908A1 (ru) | Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors | |
PH12019500294A1 (en) | Method and apparatus for coding and decoding polar codes | |
MY178527A (en) | Encoder, decoder, system and methods for encoding and decoding | |
MX2019004125A (es) | Efficient data structures for bioinformatics information representation | |
TW201615016A (en) | Transport stream for carriage of video coding extensions | |
EA201991906A1 (ru) | Method and systems for the reconstruction of genomic reference sequences from compressed genomic sequence reads | |
PH12019500793A1 (en) | Method and apparatus for compact representation of bioinformatics data | |
EA201991907A1 (ru) | Method and systems for the efficient compression of genomic sequence reads | |
MX2021011102A (es) | Probability initialization for video coding | |
PH12017500790A1 (en) | Image coding device, image coding method, image coding program, transmission device, transmission method, transmission program, image decoding device, image decoding method, image decoding program, reception device, reception method, and reception program | |
MX2020002143A (es) | Methods and apparatuses for encoding and decoding mode information, and electronic device | |
FI4029023T3 (fi) | Method for compressing genomic sequence data | |
PL412844A1 (pl) | System and method for coding an exposed area in a multi-view sequence data stream | |
MX2022005226A (es) | Encoder, decoder, encoding method, decoding method, and program for compression of visual representations | |