WO2005010642A3 - Method and apparatus for learning, recognizing and generalizing sequences - Google Patents

Method and apparatus for learning, recognizing and generalizing sequences

Info

Publication number
WO2005010642A3
Authority
WO
WIPO (PCT)
Prior art keywords
dataset
tokens
sequences
generalizing
significant
Prior art date
Application number
PCT/IL2004/000704
Other languages
French (fr)
Other versions
WO2005010642A2 (en)
Inventor
Shimon Edelman
David Horn
Eytan Ruppin
Tsach Solan
Original Assignee
Univ Ramot
Cornell Res Foundation Inc
Shimon Edelman
David Horn
Eytan Ruppin
Tsach Solan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Ramot, Cornell Res Foundation Inc, Shimon Edelman, David Horn, Eytan Ruppin, Tsach Solan
Priority to US10/566,480 (US20070055662A1)
Publication of WO2005010642A2
Priority to IL173384A (IL173384A0)
Publication of WO2005010642A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A method of generalizing a dataset having a plurality of sequences defined over a lexicon of tokens is provided. The method comprises: searching over the dataset for similarity sets, where each similarity set comprises a plurality of segments of size L having L-S common tokens and S uncommon tokens; and defining a plurality of equivalence classes corresponding to uncommon tokens of at least one similarity set. The method may further comprise a step in which a plurality of significant patterns are extracted, where each significant pattern corresponds to a most significant partial overlap between one sequence of the dataset and other sequences of the dataset. In one embodiment, a generalized dataset represented by a graph or a forest is constructed, and can be realized as a context-free grammar. The graph or forest can be used for generating sequences and/or testing grammatical structures.
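The similarity-set search described in the abstract can be illustrated with a minimal sketch. It is not the patented procedure: it assumes the simplest case S = 1 (one uncommon token per segment), treats tokens as whitespace-separated words, and omits the significance test for patterns. The function name `find_equivalence_classes`, the parameter `min_fillers`, and the toy corpus are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_equivalence_classes(sequences, L=3, min_fillers=2):
    """Group tokens that fill the same slot in otherwise identical
    length-L segments (the S = 1 case of a similarity set).

    Hypothetical helper: assumes no token is ever None, because None
    is used to mark the position of the uncommon token.
    """
    slots = defaultdict(set)
    for seq in sequences:
        # Slide a window of size L over every sequence in the dataset.
        for start in range(len(seq) - L + 1):
            window = tuple(seq[start:start + L])
            # Mask each position in turn; the remaining L - 1 tokens
            # are the common context, the masked token is the filler.
            for pos in range(L):
                context = window[:pos] + (None,) + window[pos + 1:]
                slots[context].add(window[pos])
    # A context with several distinct fillers yields an equivalence class.
    return {ctx: fillers for ctx, fillers in slots.items()
            if len(fillers) >= min_fillers}


corpus = [
    "the cat sat down".split(),
    "the dog sat down".split(),
    "the cat ran away".split(),
]
classes = find_equivalence_classes(corpus, L=3)
for context, fillers in sorted(classes.items(), key=str):
    print(context, sorted(fillers))
```

On this toy corpus the context `("the", None, "sat")` collects the fillers `cat` and `dog`, which corresponds to a rewrite rule of the form E -> cat | dog in a context-free grammar. Larger values of S would require masking several positions at once, which this sketch does not attempt.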
PCT/IL2004/000704 2003-07-31 2004-08-01 Method and apparatus for learning, recognizing and generalizing sequences WO2005010642A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US10/566,480 US20070055662A1 (en) 2004-08-01 2004-08-01 Method and apparatus for learning, recognizing and generalizing sequences
IL173384A IL173384A0 (en) 2003-07-31 2006-01-26 Method and apparatus for learning, recognizing and generalizing sequences

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US49123503P 2003-07-31 2003-07-31
US60/491,235 2003-07-31
US51055303P 2003-10-14 2003-10-14
US60/510,553 2003-10-14

Publications (2)

Publication Number Publication Date
WO2005010642A2 (en) 2005-02-03
WO2005010642A3 (en) 2007-06-28

Family

ID=34107980

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2004/000704 WO2005010642A2 (en) 2003-07-31 2004-08-01 Method and apparatus for learning, recognizing and generalizing sequences

Country Status (1)

Country Link
WO (1) WO2005010642A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11615772B2 (en) * 2020-01-31 2023-03-28 Obeebo Labs Ltd. Systems, devices, and methods for musical catalog amplification services

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ALTSCHUL S.F. ET AL.: "Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs", NUCLEIC ACIDS RESEARCH, vol. 25, 1997, pages 3389 - 3402, XP002905950 *
GOUET P. ET AL.: "ESPript: Analysis of Multiple Sequence Alignment in PostScript", BIOINFORMATICS, vol. 15, 1999, pages 305 - 308, XP003014382 *
REBHAN M. ET AL.: "GeneCards: a Novel Functional Genomics Compendium with Automated Data Mining and Query Reformulation Support", BIOINFORMATICS, vol. 14, 1998, pages 656 - 664, XP000890038 *
SRINIVASARAO G.Y. ET AL.: "Database of Protein Sequence Alignments: PIR-ALN", NUCLEIC ACIDS RESEARCH, vol. 27, 1999, pages 284 - 285, XP003014383 *
THOMPSON J.D. ET AL.: "CLUSTAL W: Improving the Sensitivity of Progressive Multiple Sequence Alignment Through Sequence Weighting, Position-specific Gap Penalties and Weight Matrix Choice", NUCLEIC ACIDS RESEARCH, vol. 22, 1994, pages 4673 - 4680, XP002956304 *

Also Published As

Publication number Publication date
WO2005010642A2 (en) 2005-02-03

Similar Documents

Publication Publication Date Title
CA2198306A1 (en) Method and apparatus for an improved language recognition system
WO2002077873A3 (en) System, method and apparatus for conducting a phrase search
WO2004042641A3 (en) Post-processing system and method for correcting machine recognized text
DE60005326D1 (en) DETECTION UNITS WITH COMPLEMENTARY LANGUAGE MODELS
WO2004075078A3 (en) Method and apparatus for fundamental operations on token sequences: computing similarity, extracting term values, and searching efficiently
DE60225170D1 (en) METHOD AND DEVICE FOR DECODING HANDWRITTEN CHARACTERS
WO2009066501A1 (en) Information search method, device, and program, and computer-readable recording medium
DE60225317D1 (en) STRING IDENTIFICATION
DE602004010069D1 (en) DEVICE AND METHOD FOR TINTING LANGUAGES, AS WELL AS A KEYBOARD FOR OPERATING SUCH A DEVICE
WO2004042493A3 (en) Method and system for discovering knowledge from text documents
EP1548998A4 (en) Bit string check method and device
ATE527652T1 (en) MULTI-LEVEL LANGUAGE RECOGNITION
CN1412741A (en) Chinese speech identification method with dialect background
WO2004030261A3 (en) Method for solving waveform sequence-matching problems using multidimensional attractor tokens
CN100483310C (en) Topological phonetic alphabet input method and keyboard
AU2003232839A1 (en) Automatic segmentation of texts comprising chunks without separators
DE68914032D1 (en) Speech recognition system.
WO2010018453A3 (en) System and method for processing electronically generated text
WO2005010642A3 (en) Method and apparatus for learning, recognizing and generalizing sequences
CN101359259B (en) Digital initial and final double-spelling input method
WO2004008433A3 (en) System and method for mandarin chinese speech recognition using an optimized phone set
Chalamandaris et al. Rule-based grapheme-to-phoneme method for the Greek
CN110688840B (en) Text conversion method and device
KR20120042381A (en) Apparatus and method for classifying sentence pattern of speech recognized sentence
RU2008101650A (en) CODED COMBINATIONS DICTIONARIES

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2007055662

Country of ref document: US

Ref document number: 10566480

Country of ref document: US

122 Ep: pct application non-entry in european phase
WWP Wipo information: published in national office

Ref document number: 10566480

Country of ref document: US