EA201991908A1 - METHOD AND DEVICE FOR COMPACT REPRESENTATION OF BIOINFORMATION DATA BY USING MULTIPLE GENOMIC DESCRIPTORS - Google Patents
- Publication number
- EA201991908A1
- Authority
- EA
- Eurasian Patent Office
- Prior art keywords
- data
- compact representation
- multiple genomic
- descriptors
- reads
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/50—Compression of genetic data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/58—Random or pseudo-random number generators
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/10—Ontologies; Annotations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B99/00—Subject matter not provided for in other groups of this subclass
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
- H04L9/0866—Generation of secret information including derivation or calculation of cryptographic keys or passwords involving user or device identifiers, e.g. serial number, physical or biometrical information, DNA, hand-signature or measurable physical characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/30—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy
- H04L9/3066—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy involving algebraic varieties, e.g. elliptic or hyper-elliptic curves
- H04L9/3073—Public key, i.e. encryption algorithm being computationally infeasible to invert or user's encryption keys not requiring secrecy involving algebraic varieties, e.g. elliptic or hyper-elliptic curves involving pairings, e.g. identity based encryption [IBE], bilinear mappings or bilinear pairings, e.g. Weil or Tate pairing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/30—Compression, e.g. Merkle-Damgard construction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/34—Encoding or coding, e.g. Huffman coding or error correction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/88—Medical equipments
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Bioethics (AREA)
- General Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Data Mining & Analysis (AREA)
- Epidemiology (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Computer Security & Cryptography (AREA)
- Mathematical Analysis (AREA)
- Computer Networks & Wireless Communication (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Signal Processing (AREA)
- Genetics & Genomics (AREA)
- Computational Mathematics (AREA)
- Algebra (AREA)
- Human Computer Interaction (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
Abstract
A method and apparatus for compressing genomic sequence data produced by genome sequencers. Sequence reads are encoded by aligning them against pre-existing or constructed reference sequences. Encoding consists of classifying the reads into data classes and then encoding each class as a plurality of descriptor blocks. Dedicated source models and entropy coders are used for each data class into which the data is split and for each corresponding descriptor block.
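The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea under loudly stated simplifying assumptions; it is not the patented method. Reads are assumed to be already aligned without gaps; the class labels `P`/`N`/`M` and the descriptor names `pos`, `len`, `mmpos` and `mmtype` are hypothetical (loosely inspired by MPEG-G terminology); and, instead of a real entropy coder, `order0_bits` computes the ideal order-0 code length of each descriptor block.

```python
import math
from collections import Counter, defaultdict

# Hypothetical class labels (an assumption, loosely inspired by MPEG-G naming):
#   "P" = read matches the reference exactly
#   "N" = every mismatch is an 'N' (unknown base) call
#   "M" = at least one substitution
# Only ungapped alignments are handled; indels and unmapped reads are out of scope.


def classify(read: str, ref: str, pos: int) -> tuple[str, list[tuple[int, str]]]:
    """Return (class label, list of (offset, read base) mismatches)."""
    window = ref[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    mismatches = [(i, b) for i, (b, r) in enumerate(zip(read, window)) if b != r]
    if not mismatches:
        return "P", mismatches
    if all(b == "N" for _, b in mismatches):
        return "N", mismatches
    return "M", mismatches


def build_descriptor_blocks(reads, ref):
    """Split (read, position) pairs into classes; each class gets one list
    ("descriptor block") per hypothetical descriptor: pos, len, mmpos, mmtype."""
    blocks = defaultdict(lambda: defaultdict(list))
    prev_pos = defaultdict(int)  # last position seen in each class
    for read, pos in sorted(reads, key=lambda rp: rp[1]):
        cls, mm = classify(read, ref, pos)
        blocks[cls]["pos"].append(pos - prev_pos[cls])  # delta-coded position
        prev_pos[cls] = pos
        blocks[cls]["len"].append(len(read))
        if cls in ("N", "M"):
            blocks[cls]["mmpos"].extend(i for i, _ in mm)
        if cls == "M":
            blocks[cls]["mmtype"].extend(b for _, b in mm)
    return blocks


def order0_bits(symbols) -> float:
    """Ideal code length (bits) of `symbols` under an order-0 model fitted to
    them: a stand-in for a real entropy coder, ignoring model-header cost."""
    n = len(symbols)
    if n == 0:
        return 0.0
    return -sum(c * math.log2(c / n) for c in Counter(symbols).values())


if __name__ == "__main__":
    ref = "ACGTACGTACGTACGTACGT"
    reads = [("ACGTA", 0), ("CGTAC", 1), ("ACNTA", 4), ("ACTTA", 8)]
    blocks = build_descriptor_blocks(reads, ref)
    print({cls: dict(b) for cls, b in blocks.items()})
    all_blocks = [b for per_class in blocks.values() for b in per_class.values()]
    separate = sum(order0_bits(b) for b in all_blocks)
    pooled = order0_bits([s for b in all_blocks for s in b])
    print(f"per-block models: {separate:.2f} bits, one pooled model: {pooled:.2f} bits")
```

Because each block gets its own fitted model, the summed ideal code length can never exceed that of a single model pooled over all blocks (ignoring the cost of transmitting the models), which is one way to see why per-class, per-descriptor source models can pay off.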
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2017/017842 WO2018071055A1 (en) | 2016-10-11 | 2017-02-14 | Method and apparatus for the compact representation of bioinformatics data |
PCT/US2017/041591 WO2018071080A2 (en) | 2016-10-11 | 2017-07-11 | Method and systems for the representation and processing of bioinformatics data using reference sequences |
PCT/US2018/018092 WO2018152143A1 (en) | 2017-02-14 | 2018-02-14 | Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors |
Publications (1)
Publication Number | Publication Date |
---|---|
EA201991908A1 (en) | 2020-01-21 |
Family ID: 68609803
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EA201991908A EA201991908A1 (en) | 2017-02-14 | 2018-02-14 | METHOD AND DEVICE FOR COMPACT REPRESENTATION OF BIOINFORMATION DATA BY USING MULTIPLE GENOMIC DESCRIPTORS |
Country Status (10)
Country | Link |
---|---|
EP (1) | EP3583500A4 (en) |
KR (1) | KR20190113971A (en) |
AU (1) | AU2018221458B2 (en) |
CA (1) | CA3052824A1 (en) |
EA (1) | EA201991908A1 (en) |
IL (1) | IL268651A (en) |
MX (1) | MX2019009680A (en) |
SG (1) | SG11201907418YA (en) |
WO (1) | WO2018152143A1 (en) |
ZA (1) | ZA201905921B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110189830B (en) * | 2019-05-24 | 2021-06-08 | 杭州火树科技有限公司 | Electronic medical record word stock training method based on machine learning |
EP3896698A1 (en) | 2020-04-15 | 2021-10-20 | Genomsys SA | Method and system for the efficient data compression in mpeg-g |
KR102497634B1 (en) * | 2020-12-21 | 2023-02-08 | 부산대학교 산학협력단 | Method and apparatus for compressing fastq data through character frequency-based sequence reordering |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2002303234A1 (en) * | 2001-04-02 | 2002-10-15 | Cytoprint, Inc. | Methods and apparatus for discovering, identifying and comparing biological activity mechanisms |
US7698067B2 (en) * | 2002-02-12 | 2010-04-13 | International Business Machines Corporation | Sequence pattern descriptors for transmembrane structural details |
US7809765B2 (en) * | 2007-08-24 | 2010-10-05 | General Electric Company | Sequence identification and analysis |
KR101922129B1 (en) * | 2011-12-05 | 2018-11-26 | 삼성전자주식회사 | Method and apparatus for compressing and decompressing genetic information using next generation sequencing(NGS) |
US9679104B2 (en) * | 2013-01-17 | 2017-06-13 | Edico Genome, Corp. | Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform |
CN103336916B (en) * | 2013-07-05 | 2016-04-06 | 中国科学院数学与系统科学研究院 | A kind of sequencing sequence mapping method and system |
US10902937B2 (en) * | 2014-02-12 | 2021-01-26 | International Business Machines Corporation | Lossless compression of DNA sequences |
2018
- 2018-02-14 AU AU2018221458A: AU2018221458B2 (Active)
- 2018-02-14 EA EA201991908A: EA201991908A1 (Unknown)
- 2018-02-14 KR KR1020197026877A: KR20190113971A (Not active, Application Discontinuation)
- 2018-02-14 WO PCT/US2018/018092: WO2018152143A1 (Unknown)
- 2018-02-14 EP EP18753700.6A: EP3583500A4 (Active, Pending)
- 2018-02-14 CA CA3052824A: CA3052824A1 (Active, Pending)
- 2018-02-14 MX MX2019009680A: MX2019009680A (Unknown)
- 2018-02-14 SG SG11201907418YA: SG11201907418YA (Unknown)

2019
- 2019-08-12 IL IL26865119A: IL268651A (Unknown)
- 2019-09-09 ZA ZA2019/05921A: ZA201905921B (Unknown)
Also Published As
Publication number | Publication date |
---|---|
SG11201907418YA (en) | 2019-09-27 |
IL268651A (en) | 2019-10-31 |
NZ757185A (en) | 2021-05-28 |
CA3052824A1 (en) | 2018-08-23 |
KR20190113971A (en) | 2019-10-08 |
ZA201905921B (en) | 2021-05-26 |
AU2018221458B2 (en) | 2022-12-08 |
WO2018152143A1 (en) | 2018-08-23 |
EP3583500A1 (en) | 2019-12-25 |
MX2019009680A (en) | 2019-10-09 |
AU2018221458A1 (en) | 2019-10-03 |
EP3583500A4 (en) | 2020-12-16 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
PH12019501879A1 (en) | Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors | |
SA517382335B1 (en) | Deriving Motion Information for Sub-Blocks in Video Coding | |
EA201991908A1 (en) | METHOD AND DEVICE FOR COMPACT REPRESENTATION OF BIOINFORMATION DATA BY USING MULTIPLE GENOMIC DESCRIPTORS | |
MY189223A (en) | Apparatus and method for encoding or decoding a multi-channel signal using a broadband alignment parameter and a plurality of narrowband alignment parameters | |
MX2013002086A (en) | Image encoding method, image decoding method, image encoding device, and image decoding device. | |
GB2545070A (en) | Generating molecular encoding information for data storage | |
MY167857A (en) | Image coding method, image decoding method, image coding apparatus, image decoding apparatus, and image coding and decoding apparatus | |
MX2015017219A (en) | Coding and modulation apparatus using non-uniform constellation. | |
CO2019009919A2 (en) | Method and systems for efficient compression of genomic sequence reads | |
DE602006009495D1 (en) | QUANTIZING PARAMETERS FOR LANGUAGE AND AUDIO CODING BY PARTICULAR INFORMATION ON ATPATIC SUB-SEQUENCES | |
EP3754484A3 (en) | Generating encoding software and decoding means | |
MX2021006632A (en) | Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device. | |
PH12019500791A1 (en) | Efficient data structures for bioinformatics information presentation | |
MY190014A (en) | Data compression | |
MX2022013106A (en) | Image coding device, image coding method, image coding program, image decoding device, image decoding method and image decoding program. | |
MX2021006569A (en) | Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device. | |
SA519401514B1 (en) | Method and apparatus for compact representation of bioinformatics data | |
EA201991907A1 (en) | METHOD AND SYSTEMS FOR EFFECTIVE COMPRESSION OF READINGS OF A GENOMIC SEQUENCE | |
SG11201906107QA (en) | Data processing method, and terminal device, and network device | |
AR107411A1 (en) | APPARATUS AND METHOD FOR CODING OR DECODING A MULTI-CHANNEL SIGNAL USING SPECTRAL DOMAIN SAMPLING REPETITION | |
TH1901007951A (en) | Methods and kits for polar encoders, wireless devices and computer-readable media. | |
FI4029023T3 (en) | Method for the compression of genome sequence data | |
UA117004U (en) | FACTORIAL DATA CODE DATA RECOVERY METHOD | |
PL412844A1 (en) | System and method of coding of the exposed area in the multi-video sequence data stream | |
BR112019001415A8 (en) | METHODS AND DEVICES OF REFERENCE QUANTIZATION PARAMETER DERIVATION IN VIDEO PROCESSING SYSTEM |