AU1907300A - Term-length term-frequency method for measuring document similarity and classifying text

Info

Publication number
AU1907300A
Authority
AU
Australia
Prior art keywords
term
frequency method
length
document similarity
classifying text
Prior art date
Legal status
Abandoned
Application number
AU19073/00A
Inventor
Mark Kantrowitz
Current Assignee
JustSystems Corp
Original Assignee
Justsystem Pittsburgh Research Center Inc
Priority date
1998-11-30
Filing date
1999-11-01
Publication date
2000-06-19
Application filed by Justsystem Pittsburgh Research Center Inc
Publication of AU1907300A
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
Application number: AU19073/00A; priority date: 1998-11-30; filing date: 1999-11-01; title: Term-length term-frequency method for measuring document similarity and classifying text; status: Abandoned; publication: AU1907300A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US20156998A 1998-11-30 1998-11-30
US09201569 1998-11-30
PCT/US1999/025686 WO2000033215A1 (en) 1998-11-30 1999-11-01 Term-length term-frequency method for measuring document similarity and classifying text

Publications (1)

Publication Number Publication Date
AU1907300A (en) 2000-06-19

Family

ID=22746357

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
AU19073/00A Abandoned AU1907300A (en) 1998-11-30 1999-11-01 Term-length term-frequency method for measuring document similarity and classifying text

Country Status (2)

Country Link
AU (1) AU1907300A (en)
WO (1) WO2000033215A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3573688B2 (en) * 2000-06-28 2004-10-06 松下電器産業株式会社 Similar document search device and related keyword extraction device
AUPR208000A0 (en) * 2000-12-15 2001-01-11 80-20 Software Pty Limited Method of document searching
US7412453B2 (en) 2002-12-30 2008-08-12 International Business Machines Corporation Document analysis and retrieval
DE60315947T2 (en) * 2003-03-27 2008-05-21 Sony Deutschland Gmbh Method for language modeling
US7321880B2 (en) 2003-07-02 2008-01-22 International Business Machines Corporation Web services access to classification engines
JP2009516252A (en) * 2005-11-15 2009-04-16 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ How to get a representation of text
JP5027483B2 (en) * 2006-11-10 2012-09-19 富士通株式会社 Information search apparatus and information search method
US8244767B2 (en) 2009-10-09 2012-08-14 Stratify, Inc. Composite locality sensitive hash based processing of documents
US9355171B2 (en) 2009-10-09 2016-05-31 Hewlett Packard Enterprise Development Lp Clustering of near-duplicate documents
CN103218435B (en) * 2013-04-15 2017-01-25 上海嘉之道企业管理咨询有限公司 Method and system for clustering Chinese text data
US8837835B1 (en) * 2014-01-20 2014-09-16 Array Technology, LLC Document grouping system
CN114492446B (en) * 2022-02-16 2023-06-16 平安科技(深圳)有限公司 Legal document processing method and device, electronic equipment and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US5748953A (en) * 1989-06-14 1998-05-05 Hitachi, Ltd. Document search method wherein stored documents and search queries comprise segmented text data of spaced, nonconsecutive text elements and words segmented by predetermined symbols
JP3270783B2 (en) * 1992-09-29 2002-04-02 ゼロックス・コーポレーション Multiple document search methods
US5642502A (en) * 1994-12-06 1997-06-24 University Of Central Florida Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN105956010A (en) * 2016-04-20 2016-09-21 浙江大学 Distributed information retrieval set selection method based on distributed representation and local ordering
CN105956010B (en) * 2016-04-20 2019-03-26 浙江大学 Distributed information retrieval set option method based on distributed characterization and partial ordering

Also Published As

Publication number Publication date
WO2000033215A1 (en) 2000-06-08

Similar Documents

Publication Publication Date Title
AU2001264928A1 (en) System and method for automatically classifying text
AU4905997A (en) Management and analysis of document information text
EP0996927A4 (en) Text classification system and method
AU1357099A (en) Method and device for classifying overhead objects
AU4698899A (en) Computer audio reading device providing highlighting of either character or bitmapped based text images
AU8887298A (en) Information processing device and information processing method
AU5251196A (en) Method and system for two-dimensional visualization of an information taxonomy and of text documents based on topical content of the documents
AU1133200A (en) Prescription-controlled data collection system and method
GB2345771B (en) Apparatus for classifying or disambiguating data
AUPP764398A0 (en) Method and apparatus for computing the similarity between images
AU4320299A (en) Methods and apparatuses for processing security documents
GB2318439B (en) Device and method for representing handwriting, and an alphabet therefor
AU6420699A (en) Document facing method and apparatus
AU2001275422A1 (en) Method and system for text analysis
AUPP603798A0 (en) Automated image interpretation and retrieval system
AU1095200A (en) Data exploration system and method
AU4043797A (en) Method and apparatus for processing and determining the orientation of documents
AU6401599A (en) Environmental material ticket reader (emtr) and environmental material ticket (emt) system
AU6265999A (en) Computer curve construction system and method
AU4620899A (en) Electronic file retrieval method and system
HK1038087A1 (en) System and method for searching electronic documents created with optical character recognition.
AU2198300A (en) Improved techniques for spatial representation of data and browsing based on similarity
AU2277900A (en) Method and device for object recognition
AU7540996A (en) Fingerprint characteristic extraction apparatus as well as fingerprint classification apparatus and fingerprint verification apparatus for use with fingerprint characteristic extraction apparatus
AU1907300A (en) Term-length term-frequency method for measuring document similarity and classifying text

Legal Events

Date Code Title Description
MK6 Application lapsed section 142(2)(f)/reg. 8.3(3) - PCT application not entering national phase