EA201991908A1 - METHOD AND DEVICE FOR COMPACT REPRESENTATION OF BIOINFORMATION DATA BY USING MULTIPLE GENOMIC DESCRIPTORS - Google Patents

METHOD AND DEVICE FOR COMPACT REPRESENTATION OF BIOINFORMATION DATA BY USING MULTIPLE GENOMIC DESCRIPTORS

Info

Publication number
EA201991908A1
Authority
EA
Eurasian Patent Office
Prior art keywords
data
compact representation
multiple genomic
descriptors
reads
Application number
EA201991908A
Other languages
Russian (ru)
Inventor
Claudio Alberti (Клаудио Алберти)
Giorgio Zoia (Гиоргио Зоиа)
Daniele Renzi (Даниэле Рензи)
Mohamed Khoso Baluch (Мохамед Хосо Балуч)
Original Assignee
Genomsys SA (Геномсыс Са)
Priority date (as listed by Google Patents; an assumption, not a legal conclusion)
2017-02-14
Filing date
2018-02-14
Publication date
2020-01-21
Priority claimed from PCT/US2017/017842 (WO2018071055A1)
Application filed by Genomsys SA
Publication of EA201991908A1


Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
                    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
                        • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
                • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
                    • G06F7/58 Random or pseudo-random number generators
        • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
            • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
                • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
                    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
                • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
                    • G16B30/10 Sequence alignment; Homology search
                • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
                • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
                • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
                    • G16B50/10 Ontologies; Annotations
                    • G16B50/30 Data warehousing; Computing architectures
                    • G16B50/50 Compression of genetic data
                • G16B99/00 Subject matter not provided for in other groups of this subclass
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
                    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
                        • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords
                            • H04L9/0866 Generation of secret information including derivation or calculation of cryptographic keys or passwords involving user or device identifiers, e.g. serial number, physical or biometrical information, DNA, hand-signature or measurable physical characteristics
                    • H04L9/30 Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy
                        • H04L9/3066 Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy involving algebraic varieties, e.g. elliptic or hyper-elliptic curves
                            • H04L9/3073 Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy involving algebraic varieties, e.g. elliptic or hyper-elliptic curves involving pairings, e.g. identity based encryption [IBE], bilinear mappings or bilinear pairings, e.g. Weil or Tate pairing
                • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
                    • H04L2209/30 Compression, e.g. Merkle-Damgard construction
                    • H04L2209/34 Encoding or coding, e.g. Huffman coding or error correction
                    • H04L2209/88 Medical equipments

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Signal Processing (AREA)
  • Genetics & Genomics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)

Abstract

Method and apparatus for compressing genomic sequence data generated by genome sequencers. Sequence reads are encoded by aligning them to pre-existing or constructed reference sequences. The encoding process first classifies the reads into data classes and then encodes each class by means of a plurality of descriptor blocks. Specific source models and entropy coders are used for each data class into which the data is partitioned and for each corresponding descriptor block.
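The pipeline described in the abstract (classify aligned reads into data classes, split each class into descriptor blocks, then code every block with its own model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the class labels `P`/`N`/`M`/`U`, the descriptor names `pos`, `mmcount`, `mmpos`, `mmtype` and `seq`, and all function names are hypothetical. Indels are ignored, reads are assumed sorted by position, and `zlib` (DEFLATE) plus an order-0 entropy estimate merely stand in for the specific source models and entropy coders the abstract refers to.

```python
import math
import zlib
from collections import Counter, defaultdict

# Hypothetical, simplified data classes (assumption; the abstract does not name them):
#   "P" perfect match, "N" mismatches only at unknown 'N' bases,
#   "M" substitutions, "U" unmapped. Indels are omitted for brevity.


def classify(read, pos, reference):
    """Assign an aligned read to a data class by comparing it with the reference."""
    if pos is None:
        return "U", []
    ref = reference[pos:pos + len(read)]
    if len(ref) != len(read):
        return "U", []  # reads running off the reference are treated as unmapped here
    mismatches = [(i, b) for i, (b, r) in enumerate(zip(read, ref)) if b != r]
    if not mismatches:
        return "P", []
    if all(b == "N" for _, b in mismatches):
        return "N", mismatches
    return "M", mismatches


def build_descriptor_blocks(reads, reference):
    """Group descriptor values into one block per (data class, descriptor) pair."""
    blocks = defaultdict(list)
    last_pos = defaultdict(int)  # positions are delta-coded separately within each class
    for read, pos in reads:
        cls, mismatches = classify(read, pos, reference)
        if cls == "U":
            blocks[("U", "seq")].extend(read)  # unmapped reads keep their bases verbatim
            continue
        blocks[(cls, "pos")].append(pos - last_pos[cls])
        last_pos[cls] = pos
        if cls in ("N", "M"):
            blocks[(cls, "mmcount")].append(len(mismatches))
            blocks[(cls, "mmpos")].extend(i for i, _ in mismatches)
        if cls == "M":
            # class N needs no substitution types: the substituted base is always 'N'
            blocks[(cls, "mmtype")].extend(b for _, b in mismatches)
    return dict(blocks)


def order0_entropy_bits(symbols):
    """Empirical order-0 entropy in bits of one block (stand-in for a per-block source model)."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(c * math.log2(n / c) for c in counts.values())


def encode_blocks(blocks):
    """Compress every block independently; zlib stands in for a dedicated entropy coder."""
    return {
        key: zlib.compress(",".join(map(str, values)).encode(), 9)
        for key, values in blocks.items()
    }


if __name__ == "__main__":
    reference = "ACGTACGTACGTACGTACGT"
    reads = [("ACGTA", 0), ("CGTAC", 1), ("ACGNA", 4), ("ACTTA", 8), ("GGGGG", None)]
    blocks = build_descriptor_blocks(reads, reference)
    for (cls, desc), data in sorted(encode_blocks(blocks).items()):
        values = blocks[(cls, desc)]
        print(cls, desc, len(values), "values ->", len(data), "bytes,",
              round(order0_entropy_bits(values), 2), "bits (order-0)")
```

Keeping each descriptor of each class in its own block groups values with similar statistics (for example, small position deltas, or mismatch offsets within a read) so that a model fitted to one block is not diluted by unrelated data. On a toy input like the one above, DEFLATE headers dominate the output size; the per-block separation only pays off on realistic volumes of reads.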

EA201991908A 2017-02-14 2018-02-14 METHOD AND DEVICE FOR COMPACT REPRESENTATION OF BIOINFORMATION DATA BY USING MULTIPLE GENOMIC DESCRIPTORS EA201991908A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
PCT/US2017/017842 WO2018071055A1 (en) 2016-10-11 2017-02-14 Method and apparatus for the compact representation of bioinformatics data
PCT/US2017/041591 WO2018071080A2 (en) 2016-10-11 2017-07-11 Method and systems for the representation and processing of bioinformatics data using reference sequences
PCT/US2018/018092 WO2018152143A1 (en) 2017-02-14 2018-02-14 Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors

Publications (1)

Publication Number Publication Date
EA201991908A1 true EA201991908A1 (en) 2020-01-21

Family

ID=68609803

Family Applications (1)

Application Number Priority Date Filing Date Title
EA201991908A EA201991908A1 (en) 2017-02-14 2018-02-14 METHOD AND DEVICE FOR COMPACT REPRESENTATION OF BIOINFORMATION DATA BY USING MULTIPLE GENOMIC DESCRIPTORS

Country Status (10)

Country Link
EP (1) EP3583500A4 (en)
KR (1) KR20190113971A (en)
AU (1) AU2018221458B2 (en)
CA (1) CA3052824A1 (en)
EA (1) EA201991908A1 (en)
IL (1) IL268651A (en)
MX (1) MX2019009680A (en)
SG (1) SG11201907418YA (en)
WO (1) WO2018152143A1 (en)
ZA (1) ZA201905921B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110189830B (en) * 2019-05-24 2021-06-08 杭州火树科技有限公司 Electronic medical record word stock training method based on machine learning
EP3896698A1 (en) 2020-04-15 2021-10-20 Genomsys SA Method and system for the efficient data compression in mpeg-g
KR102497634B1 (en) * 2020-12-21 2023-02-08 부산대학교 산학협력단 Method and apparatus for compressing fastq data through character frequency-based sequence reordering

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
AU2002303234A1 (en) * 2001-04-02 2002-10-15 Cytoprint, Inc. Methods and apparatus for discovering, identifying and comparing biological activity mechanisms
US7698067B2 (en) * 2002-02-12 2010-04-13 International Business Machines Corporation Sequence pattern descriptors for transmembrane structural details
US7809765B2 (en) * 2007-08-24 2010-10-05 General Electric Company Sequence identification and analysis
KR101922129B1 (en) * 2011-12-05 2018-11-26 삼성전자주식회사 Method and apparatus for compressing and decompressing genetic information using next generation sequencing(NGS)
US9679104B2 (en) * 2013-01-17 2017-06-13 Edico Genome, Corp. Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform
CN103336916B (en) * 2013-07-05 2016-04-06 中国科学院数学与系统科学研究院 A kind of sequencing sequence mapping method and system
US10902937B2 (en) * 2014-02-12 2021-01-26 International Business Machines Corporation Lossless compression of DNA sequences

Also Published As

Publication number Publication date
SG11201907418YA (en) 2019-09-27
IL268651A (en) 2019-10-31
NZ757185A (en) 2021-05-28
CA3052824A1 (en) 2018-08-23
KR20190113971A (en) 2019-10-08
ZA201905921B (en) 2021-05-26
AU2018221458B2 (en) 2022-12-08
WO2018152143A1 (en) 2018-08-23
EP3583500A1 (en) 2019-12-25
MX2019009680A (en) 2019-10-09
AU2018221458A1 (en) 2019-10-03
EP3583500A4 (en) 2020-12-16

Similar Documents

Publication Publication Date Title
PH12019501879A1 (en) Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors
SA517382335B1 (en) Deriving Motion Information for Sub-Blocks in Video Coding
EA201991908A1 (en) METHOD AND DEVICE FOR COMPACT REPRESENTATION OF BIOINFORMATION DATA BY USING MULTIPLE GENOMIC DESCRIPTORS
MY189223A (en) Apparatus and method for encoding or decoding a multi-channel signal using a broadband alignment parameter and a plurality of narrowband alignment parameters
MX2013002086A (en) Image encoding method, image decoding method, image encoding device, and image decoding device.
GB2545070A (en) Generating molecular encoding information for data storage
MY167857A (en) Image coding method, image decoding method, image coding apparatus, image decoding apparatus, and image coding and decoding apparatus
MX2015017219A (en) Coding and modulation apparatus using non-uniform constellation.
CO2019009919A2 (en) Method and systems for efficient compression of genomic sequence reads
DE602006009495D1 (en) QUANTIZING PARAMETERS FOR LANGUAGE AND AUDIO CODING BY PARTICULAR INFORMATION ON ATPATIC SUB-SEQUENCES
EP3754484A3 (en) Generating encoding software and decoding means
MX2021006632A (en) Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device.
PH12019500791A1 (en) Efficient data structures for bioinformatics information presentation
MY190014A (en) Data compression
MX2022013106A (en) Image coding device, image coding method, image coding program, image decoding device, image decoding method and image decoding program.
MX2021006569A (en) Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device.
SA519401514B1 (en) Method and apparatus for compact representation of bioinformatics data
EA201991907A1 (en) METHOD AND SYSTEMS FOR EFFECTIVE COMPRESSION OF READINGS OF A GENOMIC SEQUENCE
SG11201906107QA (en) Data processing method, and terminal device, and network device
AR107411A1 (en) APPARATUS AND METHOD FOR CODING OR DECODING A MULTI-CHANNEL SIGNAL USING SPECTRAL DOMAIN SAMPLING REPETITION
TH1901007951A (en) Methods and kits for polar encoders, wireless devices and computer-readable media.
FI4029023T3 (en) Method for the compression of genome sequence data
UA117004U (en) FACTORIAL DATA CODE DATA RECOVERY METHOD
PL412844A1 (en) System and method of coding of the exposed area in the multi-video sequence data stream
BR112019001415A8 (en) METHODS AND DEVICES OF REFERENCE QUANTIZATION PARAMETER DERIVATION IN VIDEO PROCESSING SYSTEM