AU2001285432A1 - Automatic categorization of documents based on textual content - Google Patents

Automatic categorization of documents based on textual content

Info

Publication number
AU2001285432A1
Authority
AU
Australia
Prior art keywords
textual content
documents based
automatic categorization
categorization
automatic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2001285432A
Inventor
Frank Smadja
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Elron Software Inc
Original Assignee
Elron Software Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2000-08-09
Filing date
2001-08-09
Publication date
2002-02-18
Application filed by Elron Software Inc
Publication of AU2001285432A1
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99936 Pattern matching access

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Data Mining & Analysis
  • Databases & Information Systems
  • Physics & Mathematics
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Information Retrieval, Db Structures And Fs Structures Therefor
AU2001285432A 2000-08-09 2001-08-09 Automatic categorization of documents based on textual content Abandoned AU2001285432A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09/635,714 2000-08-09
US09/635,714 US6621930B1 (en) 2000-08-09 2000-08-09 Automatic categorization of documents based on textual content
PCT/US2001/041669 WO2002013055A2 (en) 2000-08-09 2001-08-09 Automatic categorization of documents based on textual content

Publications (1)

Publication Number Publication Date
AU2001285432A1 (en) 2002-02-18

Family

ID=24548813

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2001285432A Abandoned AU2001285432A1 (en) 2000-08-09 2001-08-09 Automatic categorization of documents based on textual content

Country Status (3)

Country Link
US (1) US6621930B1 (en)
AU (1) AU2001285432A1 (en)
WO (1) WO2002013055A2 (en)

Families Citing this family (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004501421A (en) * 2000-03-27 2004-01-15 Documentum, Inc. Method and apparatus for generating metadata for documents
US20070027672A1 (en) * 2000-07-31 2007-02-01 Michel Decary Computer method and apparatus for extracting data from web pages
US6778986B1 (en) * 2000-07-31 2004-08-17 Eliyon Technologies Corporation Computer method and apparatus for determining site type of a web site
US20020091671A1 (en) * 2000-11-23 2002-07-11 Andreas Prokoph Method and system for data retrieval in large collections of data
CN1240011C (en) * 2001-03-29 2006-02-01 International Business Machines Corporation File classifying management system and method for operation system
WO2003014975A1 (en) * 2001-08-08 2003-02-20 Quiver, Inc. Document categorization engine
JP3997774B2 (en) * 2001-12-11 2007-10-24 Sony Corporation Data processing system, data processing method, information processing apparatus, and computer program
US7024624B2 (en) * 2002-01-07 2006-04-04 Kenneth James Hintz Lexicon-based new idea detector
US7409404B2 (en) * 2002-07-25 2008-08-05 International Business Machines Corporation Creating taxonomies and training data for document categorization
US7743061B2 (en) * 2002-11-12 2010-06-22 Proximate Technologies, Llc Document search method with interactively employed distance graphics display
US20040122660A1 (en) * 2002-12-20 2004-06-24 International Business Machines Corporation Creating taxonomies and training data in multiple languages
US20040162824A1 (en) * 2003-02-13 2004-08-19 Burns Roland John Method and apparatus for classifying a document with respect to reference corpus
US7299261B1 (en) * 2003-02-20 2007-11-20 Mailfrontier, Inc. A Wholly Owned Subsidiary Of Sonicwall, Inc. Message classification using a summary
US8266215B2 (en) 2003-02-20 2012-09-11 Sonicwall, Inc. Using distinguishing properties to classify messages
US20040243554A1 (en) * 2003-05-30 2004-12-02 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis
US7146361B2 (en) * 2003-05-30 2006-12-05 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND)
US20040243556A1 (en) * 2003-05-30 2004-12-02 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, and including a document common analysis system (CAS)
US20040243560A1 (en) * 2003-05-30 2004-12-02 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, including an annotation inverted file system facilitating indexing and searching
US7139752B2 (en) * 2003-05-30 2006-11-21 International Business Machines Corporation System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations
CA2527281C (en) * 2003-06-13 2013-09-17 Equifax, Inc. Systems and processes for automated criteria and attribute generation, searching, auditing and reporting of data
US7734627B1 (en) 2003-06-17 2010-06-08 Google Inc. Document similarity detection
US20090100138A1 (en) * 2003-07-18 2009-04-16 Harris Scott C Spam filter
US20060242180A1 (en) * 2003-07-23 2006-10-26 Graf James A Extracting data from semi-structured text documents
US11132183B2 (en) 2003-08-27 2021-09-28 Equifax Inc. Software development platform for testing and modifying decision algorithms
CA2536097A1 (en) * 2003-08-27 2005-03-10 Equifax, Inc. Application processing and decision systems and processes
US7245765B2 (en) * 2003-11-11 2007-07-17 Sri International Method and apparatus for capturing paper-based information on a mobile computing device
US8693043B2 (en) * 2003-12-19 2014-04-08 Kofax, Inc. Automatic document separation
US7975240B2 (en) * 2004-01-16 2011-07-05 Microsoft Corporation Systems and methods for controlling a visible results set
US7725475B1 (en) 2004-02-11 2010-05-25 Aol Inc. Simplifying lexicon creation in hybrid duplicate detection and inductive classifier systems
US7392262B1 (en) * 2004-02-11 2008-06-24 Aol Llc Reliability of duplicate document detection algorithms
US7624274B1 (en) * 2004-02-11 2009-11-24 AOL LLC, a Delaware Limited Company Decreasing the fragility of duplicate document detecting algorithms
US7444380B1 (en) 2004-07-13 2008-10-28 Marc Diamond Method and system for dispensing and verification of permissions for delivery of electronic messages
US7496567B1 (en) 2004-10-01 2009-02-24 Terril John Steichen System and method for document categorization
US10803126B1 (en) * 2005-01-13 2020-10-13 Robert T. and Virginia T. Jenkins Method and/or system for sorting digital signal information
US7266562B2 (en) * 2005-02-14 2007-09-04 Levine Joel H System and method for automatically categorizing objects using an empirically based goodness of fit technique
US7593904B1 (en) * 2005-06-30 2009-09-22 Hewlett-Packard Development Company, L.P. Effecting action to address an issue associated with a category based on information that enables ranking of categories
US8719073B1 (en) 2005-08-25 2014-05-06 Hewlett-Packard Development Company, L.P. Producing a measure regarding cases associated with an issue after one or more events have occurred
US8423908B2 (en) * 2006-09-08 2013-04-16 Research In Motion Limited Method for identifying language of text in a handheld electronic device and a handheld electronic device incorporating the same
US7885466B2 (en) * 2006-09-19 2011-02-08 Xerox Corporation Bags of visual context-dependent words for generic visual categorization
CA2921562C (en) * 2007-08-07 2017-11-21 Equifax, Inc. Systems and methods for managing statistical expressions
US9082080B2 (en) * 2008-03-05 2015-07-14 Kofax, Inc. Systems and methods for organizing data sets
US20100121842A1 (en) * 2008-11-13 2010-05-13 Dennis Klinkott Method, apparatus and computer program product for presenting categorized search results
US20100121790A1 (en) * 2008-11-13 2010-05-13 Dennis Klinkott Method, apparatus and computer program product for categorizing web content
US8392175B2 (en) * 2010-02-01 2013-03-05 Stratify, Inc. Phrase-based document clustering with automatic phrase extraction
US8996350B1 (en) 2011-11-02 2015-03-31 Dub Software Group, Inc. System and method for automatic document management
US11928606B2 (en) 2013-03-15 2024-03-12 TSG Technologies, LLC Systems and methods for classifying electronic documents
US9298814B2 (en) 2013-03-15 2016-03-29 Maritz Holdings Inc. Systems and methods for classifying electronic documents
US9053392B2 (en) * 2013-08-28 2015-06-09 Adobe Systems Incorporated Generating a hierarchy of visual pattern classes
US9881079B2 (en) * 2014-12-24 2018-01-30 International Business Machines Corporation Quantification based classifier
JP2018517968A (en) * 2015-04-21 2018-07-05 LexisNexis, a division of Reed Elsevier Inc. System and method for generating concepts from a document corpus
CN106294506B (en) * 2015-06-10 2020-04-24 Central China Normal University Domain-adaptive viewpoint data classification method and device
US11783005B2 (en) 2019-04-26 2023-10-10 Bank Of America Corporation Classifying and mapping sentences using machine learning
US11328025B1 (en) 2019-04-26 2022-05-10 Bank Of America Corporation Validating mappings between documents using machine learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0786914B2 (en) * 1986-11-07 1995-09-20 Hitachi, Ltd. Change detection method using images
US5479533A (en) * 1992-02-28 1995-12-26 Yamatake-Honeywell Co., Ltd. Pattern recognition apparatus and method using fuzzy logic
US5581630A (en) * 1992-12-21 1996-12-03 Texas Instruments Incorporated Personal identification
EP0612035B1 (en) * 1993-02-19 2002-01-30 International Business Machines Corporation Neural net for the comparison of image pattern features
US5978620A (en) * 1998-01-08 1999-11-02 Xerox Corporation Recognizing job separator pages in a document scanning device
AU1122100A (en) 1998-10-30 2000-05-22 Justsystem Pittsburgh Research Center, Inc. Method for content-based filtering of messages by analyzing term characteristics within a message

Also Published As

Publication number Publication date
WO2002013055A2 (en) 2002-02-14
US6621930B1 (en) 2003-09-16
WO2002013055A3 (en) 2003-09-18

Similar Documents

Publication Publication Date Title
AU2001285432A1 (en) Automatic categorization of documents based on textual content
AU2003274613A1 (en) Content retrieval based on semantic association
AU2434101A (en) Voice interface for electronic documents
AU2001258125A1 (en) Portable multimedia tourist guide
AU2001275195A1 (en) Automatic pipette identification and detipping
GB0011543D0 (en) Automatic text classification system
AU2001263119A1 (en) Authoring arbitrary xml documents using dhtml and xslt
AU2094801A (en) Multimedia photo albums
AU2001261787A1 (en) Document with embedded information
WO2002052477A8 (en) Advertising enabled digital content
AU2002212613A1 (en) Method for providing multimedia files and terminal therefor
AU2001270964A1 (en) Transferring electronic content
AU2001286920A1 (en) Folder with retaining tab
AU3289999A (en) Multiple-layered leak-resistant tube
AU2070401A (en) Document formatting based on optimized formatting values
AU2001226941A1 (en) Document sorter and method
AU2002210759A1 (en) Extending hypermedia documents by adding tagged attributes
AU2002222020A1 (en) Automatic immunoassay apparatus
AUPR126500A0 (en) Documents envelope
AUPQ921600A0 (en) Automatic person meta-data labeller
AU2001291857A1 (en) Method for classifying documents
WO2002014706A8 (en) Hydrodynamic converter
AU4716101A (en) Content collection
AU7341900A (en) Automatic conversion between sets of text urls and cohesive scenes of visual urls
AU2001284654A1 (en) Automated claims filing and tracking