CA3052824A1 - Procede et appareil pour la representation compacte de donnees bioinformatiques au moyen de plusieurs descripteurs genomiques - Google Patents

Procede et appareil pour la representation compacte de donnees bioinformatiques au moyen de plusieurs descripteurs genomiques Download PDF

Info

Publication number
CA3052824A1
CA3052824A1 CA3052824A CA3052824A CA3052824A1 CA 3052824 A1 CA3052824 A1 CA 3052824A1 CA 3052824 A CA3052824 A CA 3052824A CA 3052824 A CA3052824 A CA 3052824A CA 3052824 A1 CA3052824 A1 CA 3052824A1
Authority
CA
Canada
Prior art keywords
reads
class
descriptors
data
genomic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3052824A
Other languages
English (en)
Inventor
Mohamed Khoso BALUCH
Claudio ALBERTI
Giorgio Zoia
Daniele RENZI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Genomsys SA
Original Assignee
Genomsys SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/US2017/017842 external-priority patent/WO2018071055A1/fr
Application filed by Genomsys SA filed Critical Genomsys SA
Publication of CA3052824A1 publication Critical patent/CA3052824A1/fr
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/50Compression of genetic data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/58Random or pseudo-random number generators
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10Sequence alignment; Homology search
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10Ontologies; Annotations
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30Data warehousing; Computing architectures
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B99/00Subject matter not provided for in other groups of this subclass
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L9/0866Generation of secret information including derivation or calculation of cryptographic keys or passwords involving user or device identifiers, e.g. serial number, physical or biometrical information, DNA, hand-signature or measurable physical characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/30Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy
    • H04L9/3066Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy involving algebraic varieties, e.g. elliptic or hyper-elliptic curves
    • H04L9/3073Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy involving algebraic varieties, e.g. elliptic or hyper-elliptic curves involving pairings, e.g. identity based encryption [IBE], bilinear mappings or bilinear pairings, e.g. Weil or Tate pairing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/30Compression, e.g. Merkle-Damgard construction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/34Encoding or coding, e.g. Huffman coding or error correction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/88Medical equipments

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Signal Processing (AREA)
  • Genetics & Genomics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)

Abstract

La présente invention concerne un procédé et un appareil destinés à la compression de données de séquences génomiques produites par des machines de séquençage du génome. Des lectures de séquences sont codées en les alignant par rapport à des séquences de référence préexistantes ou construites, le processus de codage étant composé d'une classification des lectures en classes de données suivie par le codage de chaque classe en termes d'une multiplicité de blocs de descripteurs. Des modèles de sources spécifiques et des codeurs d'entropie sont utilisés pour chaque classe de données dans laquelle les données sont divisées, et chaque bloc de descripteurs associé.
CA3052824A 2017-02-14 2018-02-14 Procede et appareil pour la representation compacte de donnees bioinformatiques au moyen de plusieurs descripteurs genomiques Pending CA3052824A1 (fr)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
PCT/US2017/017842 WO2018071055A1 (fr) 2016-10-11 2017-02-14 Procédé et appareil pour la représentation compacte de données bioinformatiques
USPCT/US2017/017842 2017-02-14
PCT/US2017/041591 WO2018071080A2 (fr) 2016-10-11 2017-07-11 Procédé et systèmes pour la représentation et le traitement de données bio-informatiques à l'aide de séquences de référence
USPCT/US2017/041591 2017-07-11
PCT/US2018/018092 WO2018152143A1 (fr) 2017-02-14 2018-02-14 Procédé et appareil pour la représentation compacte de données bioinformatiques au moyen de plusieurs descripteurs génomiques

Publications (1)

Publication Number Publication Date
CA3052824A1 true CA3052824A1 (fr) 2018-08-23

Family

ID=68609803

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3052824A Pending CA3052824A1 (fr) 2017-02-14 2018-02-14 Procede et appareil pour la representation compacte de donnees bioinformatiques au moyen de plusieurs descripteurs genomiques

Country Status (10)

Country Link
EP (1) EP3583500A4 (fr)
KR (1) KR20190113971A (fr)
AU (1) AU2018221458B2 (fr)
CA (1) CA3052824A1 (fr)
EA (1) EA201991908A1 (fr)
IL (1) IL268651A (fr)
MX (1) MX2019009680A (fr)
SG (1) SG11201907418YA (fr)
WO (1) WO2018152143A1 (fr)
ZA (1) ZA201905921B (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110189830B (zh) * 2019-05-24 2021-06-08 杭州火树科技有限公司 基于机器学习的电子病历词库训练方法
EP3896698A1 (fr) 2020-04-15 2021-10-20 Genomsys SA Procédé et système pour la compression efficace des données en mpeg-g
KR102497634B1 (ko) * 2020-12-21 2023-02-08 부산대학교 산학협력단 문자 빈도 기반 서열 재정렬을 통한 fastq 데이터 압축 방법 및 장치

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002079391A2 (fr) * 2001-04-02 2002-10-10 Cytoprint, Inc. Methode et appareil permettant de decouvrir, d'identifier et de comparer des mecanismes d'activite biologique
US7698067B2 (en) * 2002-02-12 2010-04-13 International Business Machines Corporation Sequence pattern descriptors for transmembrane structural details
US7809765B2 (en) * 2007-08-24 2010-10-05 General Electric Company Sequence identification and analysis
KR101922129B1 (ko) * 2011-12-05 2018-11-26 삼성전자주식회사 차세대 시퀀싱을 이용하여 획득된 유전 정보를 압축 및 압축해제하는 방법 및 장치
US9679104B2 (en) * 2013-01-17 2017-06-13 Edico Genome, Corp. Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform
CN103336916B (zh) * 2013-07-05 2016-04-06 中国科学院数学与系统科学研究院 一种测序序列映射方法及系统
US10902937B2 (en) * 2014-02-12 2021-01-26 International Business Machines Corporation Lossless compression of DNA sequences

Also Published As

Publication number Publication date
SG11201907418YA (en) 2019-09-27
AU2018221458A1 (en) 2019-10-03
KR20190113971A (ko) 2019-10-08
AU2018221458B2 (en) 2022-12-08
EP3583500A4 (fr) 2020-12-16
MX2019009680A (es) 2019-10-09
ZA201905921B (en) 2021-05-26
EA201991908A1 (ru) 2020-01-21
IL268651A (en) 2019-10-31
EP3583500A1 (fr) 2019-12-25
NZ757185A (en) 2021-05-28
WO2018152143A1 (fr) 2018-08-23

Similar Documents

Publication Publication Date Title
US20200051665A1 (en) Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors
EP4075438B1 (fr) Structures de données efficaces pour la représentation d'informations bioinformatiques
AU2018221458B2 (en) Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors
US11386979B2 (en) Method and system for storing and accessing bioinformatics data
JP7362481B2 (ja) ゲノムシーケンスデータをコード化する方法、コード化されたゲノムデータをデコード化する方法、ゲノムシーケンスデータをコード化するためのゲノムエンコーダ、ゲノムデータをデコードするためのゲノムデコーダ、及びコンピュータ読み取り可能な記録媒体
CA3052773A1 (fr) Procede et systemes pour la compression efficace de lectures de sequence genomique
EP3526711B1 (fr) Procédé et appareil destinés à une représentation compacte de données bioinformatiques
CN110178183B (zh) 用于传输生物信息学数据的方法和系统
CN110663022B (zh) 使用基因组描述符紧凑表示生物信息学数据的方法和设备
NZ757185B2 (en) Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors
EA043338B1 (ru) Способ и устройство для компактного представления биоинформационных данных с помощью нескольких геномных дескрипторов
EA040022B1 (ru) Способ и устройство для компактного представления данных биоинформатики

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20230214

EEER Examination request

Effective date: 20230214

EEER Examination request

Effective date: 20230214