CA2077604A1 - Method and apparatus for determining word frequencies in a document without image decoding

Info

Publication number
CA2077604A1
Authority
CA
Canada
Prior art keywords
document
determining
frequency
words
word
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA2077604A
Other languages
English (en)
Other versions
CA2077604C (fr)
Inventor
Todd A. Cass
Per-Kristian Halvorsen
Daniel P. Huttenlocher
Ronald M. Kaplan
M. Margaret Withgott
Ramana B. Rao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xerox Corp
Original Assignee
Xerox Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1991-11-19
Filing date
1992-09-04
Publication date
1993-05-20
Application filed by Xerox Corp
Publication of CA2077604A1
Application granted
Publication of CA2077604C
Anticipated expiration
Expired - Fee Related

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/26: Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262: Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Character Discrimination (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
  • Character Input (AREA)
  • Image Input (AREA)
  • Processing Or Creating Images (AREA)
CA002077604A 1991-11-19 1992-09-04 Method and apparatus for determining word frequencies in a document without image decoding Expired - Fee Related CA2077604C (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US79517391A 1991-11-19 1991-11-19
US795,173 1991-11-19

Publications (2)

Publication Number Publication Date
CA2077604A1 (fr) 1993-05-20
CA2077604C (fr) 1999-07-06

Family

ID=25164902

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002077604A Expired - Fee Related CA2077604C (fr) 1991-11-19 1992-09-04 Method and apparatus for determining word frequencies in a document without image decoding

Country Status (5)

Country Link
US (1) US5325444A (fr)
EP (1) EP0544430B1 (fr)
JP (1) JP3282860B2 (fr)
CA (1) CA2077604C (fr)
DE (1) DE69229468T2 (fr)

Families Citing this family (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2077274C (fr) * 1991-11-19 1997-07-15 M. Margaret Withgott Method and apparatus for summarizing an image document without decoding it
EP0574937B1 (fr) * 1992-06-19 2000-08-16 United Parcel Service Of America, Inc. Method and apparatus for input classification using a neural network
US6212299B1 (en) 1992-12-11 2001-04-03 Matsushita Electric Industrial Co., Ltd. Method and apparatus for recognizing a character
JP3272842B2 (ja) * 1992-12-17 2002-04-08 Xerox Corporation Processor-based determination method
JP3422541B2 (ja) * 1992-12-17 2003-06-30 Xerox Corporation Keyword modeling method and method for providing a non-keyword HMM
US5438630A (en) * 1992-12-17 1995-08-01 Xerox Corporation Word spotting in bitmap images using word bounding boxes and hidden Markov models
US5485566A (en) * 1993-10-29 1996-01-16 Xerox Corporation Method of finding columns in tabular documents
US6463176B1 (en) * 1994-02-02 2002-10-08 Canon Kabushiki Kaisha Image recognition/reproduction method and apparatus
EP0702322B1 (fr) * 1994-09-12 2002-02-13 Adobe Systems Inc. Method and apparatus for identifying words described in a portable electronic document
CA2154952A1 (fr) * 1994-09-12 1996-03-13 Robert M. Ayers Method and apparatus for recognizing words described in a page description file
EP0723247B1 (fr) 1995-01-17 1998-07-29 Eastman Kodak Company System and method for evaluating the image of a form
US5774588A (en) * 1995-06-07 1998-06-30 United Parcel Service Of America, Inc. Method and system for comparing strings with entries of a lexicon
US5764799A (en) * 1995-06-26 1998-06-09 Research Foundation Of State University Of New York OCR method and apparatus using image equivalents
US5778397A (en) * 1995-06-28 1998-07-07 Xerox Corporation Automatic method of generating feature probabilities for automatic extracting summarization
US6041137A (en) 1995-08-25 2000-03-21 Microsoft Corporation Radical definition and dictionary creation for a handwriting recognition system
US6078915A (en) * 1995-11-22 2000-06-20 Fujitsu Limited Information processing system
US5892842A (en) * 1995-12-14 1999-04-06 Xerox Corporation Automatic method of identifying sentence boundaries in a document image
US5848191A (en) * 1995-12-14 1998-12-08 Xerox Corporation Automatic method of generating thematic summaries from a document image without performing character recognition
US5850476A (en) * 1995-12-14 1998-12-15 Xerox Corporation Automatic method of identifying drop words in a document image without performing character recognition
JP2973944B2 (ja) * 1996-06-26 1999-11-08 Fuji Xerox Co., Ltd. Document processing apparatus and document processing method
US5956468A (en) * 1996-07-12 1999-09-21 Seiko Epson Corporation Document segmentation system
JP3427692B2 (ja) * 1996-11-20 2003-07-22 Matsushita Electric Industrial Co., Ltd. Character recognition method and character recognition apparatus
US6562077B2 (en) 1997-11-14 2003-05-13 Xerox Corporation Sorting image segments into clusters based on a distance measurement
US6665841B1 (en) 1997-11-14 2003-12-16 Xerox Corporation Transmission of subsets of layout objects at different resolutions
US5999664A (en) * 1997-11-14 1999-12-07 Xerox Corporation System for searching a corpus of document images by user specified document layout components
US7152031B1 (en) * 2000-02-25 2006-12-19 Novell, Inc. Construction, manipulation, and comparison of a multi-dimensional semantic space
US6337924B1 (en) * 1999-02-26 2002-01-08 Hewlett-Packard Company System and method for accurately recognizing text font in a document processing system
US6459809B1 (en) * 1999-07-12 2002-10-01 Novell, Inc. Searching and filtering content streams using contour transformations
US7672952B2 (en) * 2000-07-13 2010-03-02 Novell, Inc. System and method of semantic correlation of rich content
US7653530B2 (en) * 2000-07-13 2010-01-26 Novell, Inc. Method and mechanism for the creation, maintenance, and comparison of semantic abstracts
US7286977B1 (en) 2000-09-05 2007-10-23 Novell, Inc. Intentional-stance characterization of a general content stream or repository
US20090234718A1 (en) * 2000-09-05 2009-09-17 Novell, Inc. Predictive service systems using emotion detection
US20100122312A1 (en) * 2008-11-07 2010-05-13 Novell, Inc. Predictive service systems
AU2000278962A1 (en) * 2000-10-19 2002-04-29 Copernic.Com Text extraction method for html pages
US8682077B1 (en) 2000-11-28 2014-03-25 Hand Held Products, Inc. Method for omnidirectional processing of 2D images including recognizable characters
US6985908B2 (en) * 2001-11-01 2006-01-10 Matsushita Electric Industrial Co., Ltd. Text classification apparatus
US7106905B2 (en) * 2002-08-23 2006-09-12 Hewlett-Packard Development Company, L.P. Systems and methods for processing text-based electronic documents
US7734627B1 (en) 2003-06-17 2010-06-08 Google Inc. Document similarity detection
GB2403558A (en) * 2003-07-02 2005-01-05 Sony Uk Ltd Document searching and method for presenting the results
US7207004B1 (en) * 2004-07-23 2007-04-17 Harrity Paul A Correction of misspelled words
US7809215B2 (en) 2006-10-11 2010-10-05 The Invention Science Fund I, Llc Contextual information encoded in a formed expression
US20060212430A1 (en) 2005-03-18 2006-09-21 Searete Llc, A Limited Liability Corporation Of The State Of Delaware Outputting a saved hand-formed expression
US8340476B2 (en) * 2005-03-18 2012-12-25 The Invention Science Fund I, Llc Electronic acquisition of a hand formed expression and a context of the expression
US7791593B2 (en) 2005-03-18 2010-09-07 The Invention Science Fund I, Llc Machine-differentiatable identifiers having a commonly accepted meaning
US8787706B2 (en) 2005-03-18 2014-07-22 The Invention Science Fund I, Llc Acquisition of a user expression and an environment of the expression
US7873243B2 (en) 2005-03-18 2011-01-18 The Invention Science Fund I, Llc Decoding digital information included in a hand-formed expression
US8229252B2 (en) 2005-03-18 2012-07-24 The Invention Science Fund I, Llc Electronic association of a user expression and a context of the expression
US8823636B2 (en) 2005-03-18 2014-09-02 The Invention Science Fund I, Llc Including environmental information in a manual expression
US8175394B2 (en) 2006-09-08 2012-05-08 Google Inc. Shape clustering in post optical character recognition processing
US20100321708A1 (en) * 2006-10-20 2010-12-23 Stefan Lynggaard Printing of coding patterns
US8296297B2 (en) * 2008-12-30 2012-10-23 Novell, Inc. Content analysis and correlation
US8301622B2 (en) * 2008-12-30 2012-10-30 Novell, Inc. Identity analysis and correlation
US8386475B2 (en) * 2008-12-30 2013-02-26 Novell, Inc. Attribution analysis and correlation
US20100250479A1 (en) * 2009-03-31 2010-09-30 Novell, Inc. Intellectual property discovery and mapping systems and methods
US20130024459A1 (en) * 2011-07-20 2013-01-24 Microsoft Corporation Combining Full-Text Search and Queryable Fields in the Same Data Structure
RU2571545C1 (ru) * 2014-09-30 2015-12-20 Общество с ограниченной ответственностью "Аби Девелопмент" Content-based classification of document images

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0120334B1 (fr) * 1983-03-01 1989-12-06 Nec Corporation System for detecting character spacing
JPS607582A (ja) * 1983-06-27 1985-01-16 Fujitsu Ltd Character reading device
US4610025A (en) * 1984-06-22 1986-09-02 Champollion Incorporated Cryptographic analysis system
US4791675A (en) * 1985-12-31 1988-12-13 Schlumberger Systems And Services, Inc. VSP Connectivity pattern recognition system
US5050218A (en) * 1986-08-26 1991-09-17 Nec Corporation Apparatus for recognizing address appearing on mail article
JPS63158678A (ja) * 1986-12-23 1988-07-01 Sharp Corp Inter-word space detection method
ATE75552T1 (de) * 1987-10-16 1992-05-15 Computer Ges Konstanz Method for automatic character recognition.
JP2783558B2 (ja) * 1988-09-30 1998-08-06 Toshiba Corporation Summary generation method and summary generation apparatus
CA1318403C (fr) * 1988-10-11 1993-05-25 Michael J. Hawley Method and apparatus for extracting keywords contained in a text
CA1318404C (fr) * 1988-10-11 1993-05-25 Michael J. Hawley Method and apparatus for indexing files in a computer system
JP2597006B2 (ja) * 1989-04-18 1997-04-02 Sharp Corporation Rectangular coordinate extraction method
JPH036659A (ja) * 1989-06-03 1991-01-14 Brother Ind Ltd Document processing apparatus
US5065437A (en) * 1989-12-08 1991-11-12 Xerox Corporation Identification and segmentation of finely textured and solid regions of binary images

Also Published As

Publication number Publication date
DE69229468T2 (de) 1999-10-28
JPH05282423A (ja) 1993-10-29
US5325444A (en) 1994-06-28
JP3282860B2 (ja) 2002-05-20
EP0544430A3 (en) 1993-12-22
CA2077604C (fr) 1999-07-06
DE69229468D1 (de) 1999-07-29
EP0544430B1 (fr) 1999-06-23
EP0544430A2 (fr) 1993-06-02

Similar Documents

Publication Publication Date Title
CA2077604A1 (fr) Method and apparatus for determining word frequencies in a document without image decoding
EP0643358A3 (fr) Image search method and apparatus.
EP0952533A3 (fr) Text summarization using parts of speech
CA2033359A1 (fr) Pattern matching method and related apparatus
EP0887760B8 (fr) Method and apparatus for decoding bar codes
EP0472026A3 (en) Information processing system and method for processing document by using structured keywords
EP0763925A3 (fr) Image processing apparatus
EP0613303A3 (fr) Image signal compression apparatus and method.
CA2077274A1 (fr) Method and apparatus for summarizing an image document without decoding it
EP0643529A3 (fr) Image processing method and apparatus therefor.
EP0654749A3 (fr) Image analysis method and apparatus.
FR2701186B1 (fr) System for compressing and decompressing digital image data.
CA2078423A1 (fr) Method and apparatus for extracting portions of the information contained in an image without decoding the image
GB8907063D0 (en) Method of and apparatus for converting attribute of display data into code
EP0635808A3 (fr) Method and apparatus for processing the data structure model of an image to produce human-perceptible output in the context of the image.
EP0652538A3 (fr) Image processing method and apparatus.
EP0659009A3 (fr) Original reading device and information processing device comprising an original reading device.
EP0650287A3 (fr) Image processing method and apparatus.
DE69026320D1 (de) Halftone image data compression method and apparatus
EP0633550A3 (fr) Image processing method and apparatus.
EP0605208A3 (fr) Image processing apparatus and method, and image reading apparatus.
EP0627845A3 (fr) Image processing method and apparatus.
DE69013378D1 (de) Arrangement for converting image outline data into dot data representing pixels.
DE3174105D1 (en) Method of recognizing characters in an optical document reader
AU619130B2 (en) Document reader and reading processing method therefor

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed