EA201991907A1 - Способ и системы для эффективного сжатия прочтений геномной последовательности - Google Patents

Способ и системы для эффективного сжатия прочтений геномной последовательности

Info

Publication number
EA201991907A1
EA201991907A1 EA201991907A EA201991907A EA201991907A1 EA 201991907 A1 EA201991907 A1 EA 201991907A1 EA 201991907 A EA201991907 A EA 201991907A EA 201991907 A EA201991907 A EA 201991907A EA 201991907 A1 EA201991907 A1 EA 201991907A1
Authority
EA
Eurasian Patent Office
Prior art keywords
genomic sequence
readings
systems
genomic
effective compression
Prior art date
Application number
EA201991907A
Other languages
English (en)
Inventor
Клаудио Алберти
Мохамед Хосо Балуч
Original Assignee
Геномсыс Са
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/US2017/017842 external-priority patent/WO2018071055A1/en
Priority claimed from PCT/US2017/041579 external-priority patent/WO2018071078A1/en
Application filed by Геномсыс Са filed Critical Геномсыс Са
Publication of EA201991907A1 publication Critical patent/EA201991907A1/ru

Links

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20Sequence assembly
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3086Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
    • G06F21/12Protecting executable software
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809Methods for determination or identification of nucleic acids involving differential detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10Sequence alignment; Homology search
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/20Heterogeneous data integration
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30Data warehousing; Computing architectures
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/50Compression of genetic data
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3088Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3091Data deduplication
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3091Data deduplication
    • H03M7/3095Data deduplication using variable length segments
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/70Type of the data to be coded, other than image and sound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioethics (AREA)
  • Chemical & Material Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Organic Chemistry (AREA)
  • Technology Law (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Biochemistry (AREA)

Abstract

Способ и устройство для сжатия данных геномной последовательности, созданных секвенаторами генома. Прочтения последовательности кодируют путем выравнивания их относительно ранее существующих или построенных референсных последовательностей, причем процесс кодирования состоит из классифицирования прочтений в классы данных с последующим кодированием каждого класса посредством множества геномных дескрипторов. Геномные дескрипторы одного типа организуют в блоки, которые сжимают путем применения последовательных этапов преобразования, бинаризации и энтропийного кодирования. Для каждого класса данных и для каждого соответствующего дескриптора используют специальные модели источника и энтропийные кодеры.
EA201991907A 2017-02-14 2017-12-15 Способ и системы для эффективного сжатия прочтений геномной последовательности EA201991907A1 (ru)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
PCT/US2017/017842 WO2018071055A1 (en) 2016-10-11 2017-02-14 Method and apparatus for the compact representation of bioinformatics data
PCT/US2017/041579 WO2018071078A1 (en) 2016-10-11 2017-07-11 Method and apparatus for the access to bioinformatics data structured in access units
PCT/US2017/066863 WO2018151788A1 (en) 2017-02-14 2017-12-15 Method and systems for the efficient compression of genomic sequence reads

Publications (1)

Publication Number Publication Date
EA201991907A1 true EA201991907A1 (ru) 2020-01-20

Family

ID=69374527

Family Applications (1)

Application Number Title Priority Date Filing Date
EA201991907A EA201991907A1 (ru) 2017-02-14 2017-12-15 Способ и системы для эффективного сжатия прочтений геномной последовательности

Country Status (5)

Country Link
EP (1) EP3583250B1 (ru)
JP (1) JP7324145B2 (ru)
EA (1) EA201991907A1 (ru)
MX (1) MX2019009681A (ru)
WO (1) WO2018151788A1 (ru)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022008311A1 (en) * 2020-07-10 2022-01-13 Koninklijke Philips N.V. Genomic information compression by configurable machine learning-based arithmetic coding

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070192139A1 (en) 2003-04-22 2007-08-16 Ammon Cookson Systems and methods for patient re-identification
US10902937B2 (en) 2014-02-12 2021-01-26 International Business Machines Corporation Lossless compression of DNA sequences
US20160100177A1 (en) * 2014-10-06 2016-04-07 Qualcomm Incorporated Non-uniform exponential-golomb codes for palette mode coding

Also Published As

Publication number Publication date
EP3583250A4 (en) 2020-12-16
EP3583250A1 (en) 2019-12-25
JP7324145B2 (ja) 2023-08-09
MX2019009681A (es) 2019-10-09
WO2018151788A1 (en) 2018-08-23
JP2020510907A (ja) 2020-04-09
EP3583250B1 (en) 2023-07-12

Similar Documents

Publication Publication Date Title
PH12019501881A1 (en) Method and apparatus for the efficient compression of genomic sequence reads
PH12017501183A1 (en) Palette index grouping for high throughput cabac coding
CO2019009919A2 (es) Método y sistemas para la compresión eficiente de lecturas de secuencias genómicas
MX354002B (es) Aparato y método para decodificar y codificar una señal de audio utilizando selección de mosaicos espectrales adaptativos.
EA201791429A1 (ru) Контексты для больших элементов кодового дерева
WO2018111116A3 (en) Method for handling multidimensional data
TN2017000327A1 (en) Restriction on palette block size in video coding
EP3754484A3 (en) Generating encoding software and decoding means
MY190014A (en) Data compression
EA201991908A1 (ru) Способ и устройство для компактного представления биоинформационных данных с помощью нескольких геномных дескрипторов
PH12019500294A1 (en) Method and apparatuse for coding and decoding polar codes
MY178527A (en) Encoder, decoder, system and methods for encoding and decoding
MX2019004125A (es) Estructuras eficientes de datos para la representacion de informacion bioinformatica.
TW201615016A (en) Transport stream for carriage of video coding extensions
EA201991906A1 (ru) Способ и системы для восстановления геномных референсных последовательностей из сжатых прочтений геномной последовательности
PH12019500793A1 (en) Method and apparatus for compact representation of bioinformatics data
EA201991907A1 (ru) Способ и системы для эффективного сжатия прочтений геномной последовательности
MX2021011102A (es) Inicializacion de probabilidad para codificacion de video.
PH12017500790A1 (en) Image coding device, image coding method, image coding program, transmission device, transmission method, transmission program, image decoding device, image decoding method, image decoding program, reception device, reception method, and reception program
MX2020002143A (es) Metodos y aparatos para codificar y decodificar informacion de modo y dispositivo electronico.
FI4029023T3 (fi) Menetelmä genomin sekvenssitietojen pakkaamiseksi
PL412844A1 (pl) System oraz sposób kodowania obszaru odsłoniętego w strumieniu danych sekwencji wielowidokowych
MX2022005226A (es) Codificador, decodificador, metodo de codificacion, metodo de decodificacion, y programa de compresion de representaciones visuales.