WO2000067162A1 - Document-classification system, method and software - Google Patents

Info

Publication number
WO2000067162A1
Authority
WO
WIPO (PCT)
Prior art keywords
classes
document
particular document
list
classified
Prior art date
Application number
PCT/US2000/012386
Other languages
French (fr)
Other versions
WO2000067162A9 (en
Inventor
Bokyung Yang-Stephens
M. Charles Swope
Jeffrey Locke
Isabelle Moulinier
Original Assignee
West Publishing Company
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by West Publishing Company
Priority to EP00932127A (EP1212699A4)
Priority to AU49898/00A (AU781157B2)
Priority to CA002371688A (CA2371688C)
Priority to JP2000615932A (JP4732593B2)
Priority to NZ515293A (NZ515293A)
Publication of WO2000067162A1
Priority to US10/013,190 (US7065514B2)
Publication of WO2000067162A9
Priority to US11/388,753 (US7567961B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99932 Access augmentation or optimizing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99935 Query augmenting and refining, e.g. inexact access
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99943 Generating database or data structure, e.g. via user interface

Definitions

  • the present invention concerns document classification systems and methods for legal documents, such as judicial decisions.
  • Companies such as West Group (formerly West Publishing Company) of St. Paul, Minnesota, not only collect and publish the judicial opinions of almost every federal and state jurisdiction in the United States, but also classify the opinions based on the principles or points of law they contain.
  • West Group, for example, classifies judicial opinions using its proprietary Key Number™ System. (Key Number is a trademark of West Group.)
  • the Key Number System is a hierarchical system of over 400 major legal topics, with the topics divided into subtopics, the subtopics into sub-subtopics, and so on. Each topic or sub-topic has a unique alpha-numeric code, known as its Key Number classification. Table 1 shows an example of a portion of the Key Number System for classifying points of divorce law.
  • West Group classifiers manually classify over 350,000 headnotes across the approximately 82,000 separate classes of the Key Number classification system. Over time, many of the classifiers memorize significant portions of the Key Number System, enabling them to quickly assign Key Number classes to most headnotes they encounter. However, many headnotes are difficult to classify. For these, the classifier often invokes the WestLaw™ online legal search service, which allows the user to manually define queries against a database of classified headnotes. (WestLaw is a trademark of West Group.)
  • an editor might define and run a query including the terms “abuse,” “discretion,” “maintenance,” and “divorce.”
  • the search service would return a set of annotated judicial opinions compliant with the query and the classifier would in turn sift through the headnotes in each judicial opinion, looking for those most similar to the headnote targeted for classification. If one or more of the headnotes satisfies the editor's threshold for similarity, the classifier manually assigns the Key Number classes associated with these headnotes to the target headnote.
  • the classifier, through invocation of a separate application, may also view an electronic document listing a portion of the Key Number System to help identify related classes that may not be included in the search results.
  • This process of classification suffers from at least two problems.
  • this conventional process of classification lacks an efficient method of correcting misclassified headnotes.
  • a classifier makes a written request to a database administrator with rights to a master headnote database.
  • One exemplary system includes a personal computer or work station coupled to a memory storing classified judicial headnotes or abstracts and a memory containing one or more headnotes requiring classification.
  • the personal computer includes a graphical user interface that concurrently displays one of the headnotes requiring classification, a list of one or more candidate classes for the one headnote, at least one classification description associated with one of the listed candidate classes, and at least one classified headnote that is associated with one of the listed candidate classes.
  • the graphical user interface also facilitates user assignment of the one headnote requiring classification to one or more of the listed candidate classes.
  • the list of candidate classes results from automatically defining and executing a query against the classified headnotes, with the query derived from the one headnote requiring classification.
  • the exemplary system also displays the candidate classes in a ranked order based on measured similarity of corresponding classified headnotes to the headnote requiring classification, further assisting the user in assigning the headnote to an appropriate class.
  • Other features of the interface allow the user to reclassify a classified headnote and to define and execute an arbitrary query against the classified headnotes to further assist classification.
  • Figure 1 is a diagram of an exemplary classification system 100 embodying several aspects of the invention, including a unique graphical user interface 114;
  • Figure 2 is a flowchart illustrating an exemplary method embodied in classification system 100 of Figure 1;
  • Figure 3 is a diagram illustrating an unclassified document or headnote
  • Figure 4A is a facsimile of an exemplary graphical user interface 400 that forms a portion of classification system 100.
  • Figure 4B is a facsimile of exemplary graphical user interface 400 after responding to a user input.
  • Figure 4C is a facsimile of exemplary graphical user interface 400 after responding to another user input.
  • Figure 5 is a facsimile of an exemplary graphical user interface 500.
  • document refers to any logical collection or arrangement of machine-readable data having a filename.
  • Figure 1 shows a diagram of an exemplary document classification system 100 for assisting editors in manually classifying electronic documents according to a document classification scheme.
  • the exemplary embodiment assists in the classification of judicial abstracts, or headnotes, according to West Group's Key Number System.
  • For further details on the Key Number System, see West's Analysis of American Law: Guide to the American Digest System, 2000 Edition, West Group, 1999. This text is incorporated herein by reference.
  • the present invention is not limited to any particular type of documents or type of classification system.
  • System 100 includes an exemplary personal computer or classification work station 110, an exemplary classified documents database 120, an exemplary classification system database 130, and an unclassified documents database 140.
  • Although work station 110 and databases 120-140 are presented as separate components, some embodiments combine the functionality of these components into a greater or lesser number of components. For example, one embodiment combines databases 120-140 within work station 110, and another embodiment combines database 130 with work station 110 and databases 120 and 140 into a single database.
  • work station 110 includes a processing unit
  • processor unit 111 includes one or more processors and an operating system which supports graphical-user interfaces.
  • Storage device 112 includes one or more electronic, magnetic, and/or optical memory devices.
  • Other embodiments use other types and numbers of processors and data-storage devices.
  • some embodiments implement one or more portions of system 100 using one or more mainframe computers or servers, such as the Sun Ultra 4000 server.
  • Exemplary display devices include a color monitor and virtual-reality goggles
  • exemplary user-interface devices include a keyboard, mouse, joystick, microphone, video camera, body-field sensors, and virtual-reality apparel, such as gloves, headbands, bodysuits, etc.
  • the invention is not limited to any genus or species of computerized platforms.
  • Classified documents database 120 includes documents classified according to a classification system.
  • database 120 includes an indexed collection of approximately twenty million headnotes spanning the entirety of the West Group's Key Number System.
  • some embodiments include an indexed subset of the total collection of classified headnotes. For example, one embodiment indexes headnotes from decisions made within the last 25 years. This reduces the number of headnotes by about half and thus reduces the time necessary to run queries against the headnotes.
  • Other embodiments further reduce the size of the training collection to include only headnotes specific to the jurisdiction of the query. This is expected not only to result in retrieval of headnotes with greater similarity, but also to further reduce processing time.
  • Each headnote in the training collection has one or more logically associated Key Number classification codes.
  • An exemplary indexing procedure entails tokenizing the headnotes, generating transactions, and creating an inverted file. Tokenization entails reading in documents, removing predetermined stop-words and single digits, and stemming the remaining words.
  • the exemplary embodiment uses the Porter stemming algorithm to strip suffixes. See M.F. Porter, An Algorithm for Suffix Stripping, Program, 14(3):130-137, July 1980. Single digits are removed since they tend to appear as item markers in enumerations and thus contribute very little to the substance of headnotes.
  • After tokenization, the procedure generates a transaction for each headnote.
  • a transaction is a tuple grouping a term t, a document identifier n, the frequency of the term t in the document n, and the positions of the term t in document n.
  • the procedure creates an inverted file containing records.
  • the records store the term, the number of documents in the collection that contain the term, and the generated transactions.
  • the inverted file allows efficient access to term information at search time. For further details, see G. Salton, Automatic Text Processing: the Transformation, Analysis and Retrieval of Information by Computer, Addison Wesley, 1989.
  • In addition to an indexed collection of headnotes, database 120 also includes a search engine 121.
  • search engine 121 comprises a natural-language search engine, such as the natural language version of WestLaw® legal search tools.
  • Other embodiments use search engines based on the work by H. Turtle, Inference Networks for Document Retrieval, PhD thesis, Computer and Information Science Department, University of Massachusetts, October 1990.
  • Still other embodiments use an Inquery Retrieval System as described in J.P. Callan, W.B. Croft, and S.M. Harding, The Inquery Retrieval System. In Proceedings of the Third International Conference on Database and Expert Systems Applications, pages 78-83, Valencia, Spain, 1992. Springer-Verlag.
  • Classification system database 130 includes searchable data describing the logical and hierarchical structure of the classification system used in system 100. In the exemplary embodiment, this data describes the approximately 82,000 classes of West Group's Key Number System. Each class description includes its Key Number code, a topic description, and data linking the class to adjacent classes.
  • Unclassified documents database 140 includes a set of one or more unclassified documents.
  • each document is an unclassified headnote or more generally a headnote requiring initial classification or reclassification.
  • each headnote has a corresponding judicial opinion.
  • the headnotes are determined manually by professional editors.
  • other embodiments may determine headnotes automatically using a computerized document summarizer. See, for example, U.S. Patent 5,708,825 to Bernardo Rafael Sotomayer, which is incorporated herein by reference.
  • System 100 also includes, within data-storage device 112, classification-aiding software 112a.
  • software 112a comprises one or more software modules and operates as a separate application program or as part of the kernel or shell of an operating system.
  • Software 112a can be installed on work station 110 through a network-download or through a computer-readable medium, such as an optical or magnetic disc, or through other software transfer methods.
  • software 112a enables system 100 to generate graphical-user interface 114 which integrates unclassified headnotes from database 140 with classified headnotes and ranked candidate classes from database 120 and classification system data from database 130 to assist users in manually classifying or reclassifying headnotes.
  • FIG. 2 shows a flow chart 200 of an exemplary classification method at least partly embodied within and facilitated by software 112a.
  • Flow chart 200 includes a number of process blocks 202-214, which are arranged serially in the exemplary embodiment.
  • other embodiments of the invention may reorder the blocks, omit one or more blocks, and/or execute two or more blocks in parallel using multiple processors or a single processor organized as two or more virtual machines or subprocessors.
  • still other embodiments implement the blocks as one or more specific interconnected hardware or integrated-circuit modules with related control and data signals communicated between and through the modules.
  • the exemplary process flow is applicable to software, firmware, and hardware implementations.
  • the exemplary method begins at process block 202 with automatic or user-directed retrieval of a set of one or more unclassified headnotes from unclassified document database 140.
  • a number of sets of unclassified headnotes can be scheduled for classification at particular stations or a set of unclassified headnotes can be queued for sequential distribution to the next available work station.
  • Some embodiments allow the user to define and run a query against the unclassified headnotes and in effect define the set of headnotes he or she will classify or alternatively transfer the set of headnotes to another work station for classification.
  • execution of the exemplary method then proceeds to block 204.
  • Block 204 entails defining a query based on one of the headnotes in the set of unclassified headnotes. In the exemplary embodiment, this entails forwarding the one headnote to the natural-language search engine 121 which automatically defines the query using the indexing procedure already applied to index the classified headnotes of database 120.
  • Figure 3 shows the text of a sample headnote 300 and a structured query 300' that search engine 121 derives from it.
  • some embodiments include a query structuring or definition module within software 112a.
  • After defining the query, the exemplary method runs, or executes, the query against the classified document database 120, as indicated in block 206.
  • search engine 121, which has already defined the query from the unclassified headnote, executes a search based on the query.
  • search engine 121 implements memory-based reasoning, a variant of a k-nearest neighbor method. This generally entails retrieving the classified headnotes that are closest to the unclassified headnote, or more precisely the query form of the unclassified headnote, based on some distance function.
  • the exemplary embodiment compares the query to each classified headnote in the database, scores all the terms, or concepts, that each classified headnote has in common with the query, sums the scores of all the common terms, and divides by the total number of query terms in the classified headnote to determine an average score for the classified headnote.
  • the inverse-document-frequency factor (idf) favors (that is, gives greater weight to) terms that are rare in the collection, while the term frequency factor (tf) gives a higher importance to terms that are frequent in the document being scored.
  • Block 208 entails determining the classes associated with a predetermined number k of the top classified headnotes from the ranked list of search results.
  • the k classified headnotes are the k nearest neighbors of the unclassified headnote according to the distance function used in search engine 121. Exemplary values for k include 5, 10, 25, 50, and 100. In the exemplary embodiment, some of the classified headnotes have two or more associated Key Number classes.
  • the method executes block 210 which entails transferring the k classified headnotes and their associated class identifiers from classified document database 120 to work station 110.
  • the station 110 next determines a ranking for the class identifiers (Key Number classes) associated with the top k classified headnotes.
  • the exemplary embodiment ranks the class identifiers based on their frequencies of occurrence within the set of candidate classes. In other words, each class identifier is ranked based on how many times it appears in the set of candidate classes.
  • the total similarity score is the sum of the similarity scores for all the headnotes associated with the class.
  • Some embodiments rank the similarity scores for all the headnotes associated with a class, weight the ranks according to a function, and then sum the weighted ranks to determine where to rank the class.
  • the system executes block 214 which entails displaying on display device 113 (shown in Figure 1) the exemplary graphical user interface 400 which is shown in Figure 4A.
  • Graphical user interface 400 includes concurrently displayed windows or regions 410, 420, 430, 440, and 450.
  • Window 410 displays the one unclassified headnote, headnote 300 of Figure 3, which was selected or retrieved for classification in block 202 of the exemplary flow chart in Figure 2.
  • Window 420 displays a sorted list or table 422 of candidate classes and their corresponding frequencies. A class 422a in list 422 is highlighted in subregion 420a of window 420.
  • Window 430 displays a portion 432a of the classification system hierarchy which includes class 422a.
  • Window 440 displays one or more of the classified headnotes that is similar to the one unclassified headnote and which has class 422a as one of its assigned classes.
  • Window 450 is an input window for assigning one or more classes to unclassified headnote 412 displayed in window 410.
  • interface devices 114-116 of system 100 enable a user to highlight or select one or more of the candidate classes in list 422. For example, a user may point and double click on candidate class 422a (232Ak179) to select the class, or a user may single click on the class to highlight it for further consideration. Selecting, or double-clicking, a class in the list results in automatic insertion of the class into window 450.
  • the interface not only allows the user to select as many of the classes as desired, but also to manually insert one or more classes, including classes not listed, into window 450.
  • When interface 400 is closed, it prompts the user to save, or in effect, actually assign the one or more classes in window 450 to the headnote in window 410.
  • In response to highlighting class 422a, interface 400 displays subregion 420a of window 420 in reverse-video, that is, by reversing the background and foreground colors of subregion 420a. (Other embodiments use other techniques not only to indicate selection of one of the classes, but also to select one or more of the classes.)
  • In further response to highlighting a class in list 422 of window 420, classification station 110 (in Figure 1) defines a query based on all or a portion of the highlighted class and runs it against classification system database 130. Database 130 returns one or more classes in the neighborhood of the selected class to station 110, and window 430 displays one or more of these neighborhood classes, as portion 432a, allowing the user to view the highlighted class in context of the classification system, complete with class identifiers and class descriptors.
  • In addition to responding to highlighting of class 422a by displaying it in context of the classification system in window 430, the interface also displays in window 440 one or more of the classified headnotes that is similar to the headnote being classified.
  • window 440 displays one of the headnotes, such as headnote 442a, which resulted in the highlighted class 422a being included in list 422. If there is more than one of these headnotes, window 440 allows the user to view each of them in order from most similar to least similar to the headnote being classified.
  • Figure 4B shows that the user may also highlight another class, such as class 422b in the list 422, to view this class in context of the classification system in window 430 and to view the classified headnotes associated with the class in window 440. More specifically, window 430 shows a portion 432b of the classification system stored in database 130, and window 440 shows a headnote 442b associated with highlighted class 422b. The interface allows the user to repeat this process with each of the classes in the list. Window 430 also includes an enter-query button 434 which the user may invoke to convert window 430 into a query-entry window 430' as shown in Figure 4C.
  • This figure shows an exemplary query 436, which the user has defined to include several terms and/or phrases from or related to unclassified headnote 412 in window 410.
  • enter-query button 434 has been converted to a run-query button 434', which the user may actuate after entering query 436.
  • Actuating the run-query button runs the query against classified documents database 120, and results in re-presentation of interface 400, with an updated list 422' of candidate classes for possible assignment to the unclassified headnote. (Once the user highlights one of the classes in the updated list 422', window 430 will display this class in context of the classification system hierarchy.)
  • window 440 includes a reclassification button 444, which the user can invoke to initiate reclassification of the particular headnote, such as headnote 442b to another class. Invocation of button 444 results in display of window 500 as shown in Figure 5.
  • Window 500 includes a region 510 that displays a headnote 512 that is being reclassified, a region 520 that displays the highlighted class from list 422 that is associated with the headnote, and a region 530 that displays a ranked list 532 of candidate classes and an input field 534 for entry of a new class.
  • Ranked list 532 is developed using the same process used for developing list 422.
  • One exemplary system includes a single graphical user interface that concurrently displays one of the headnotes requiring classification, a list of one or more candidate classes for the one headnote, at least one classification description associated with one of the listed candidate classes, and at least one classified headnote that is associated with one of the listed candidate classes.
  • the exemplary interface integrates two or more tools necessary for a user to accurately and efficiently classify judicial headnotes or other documents.
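
The retrieval and ranking steps described above (indexing, query scoring, selection of the top k classified headnotes, and frequency-based class ranking) can be sketched roughly as follows. This is a minimal illustration in Python, not the patented implementation: the stop-word list, the sample headnote texts, the exact tf-idf weighting, the tie-breaking rule, and all function names are assumptions introduced here, and Porter stemming is omitted for brevity. Only the class codes 134k230 and 134k235 and the query headnote are taken from the examples in this document.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; the source does not enumerate its stop-words.
STOP_WORDS = {"a", "an", "and", "be", "by", "for", "in", "is", "may", "no",
              "of", "on", "only", "the", "to", "where", "would"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stop-words and single digits."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens
            if t not in STOP_WORDS and not (len(t) == 1 and t.isdigit())]


def build_index(headnotes):
    """Inverted file: term -> list of transactions (doc_id, frequency, positions)."""
    index = defaultdict(list)
    for doc_id, (text, _classes) in headnotes.items():
        positions = defaultdict(list)
        for pos, term in enumerate(tokenize(text)):
            positions[term].append(pos)
        for term, pos_list in positions.items():
            index[term].append((doc_id, len(pos_list), pos_list))
    return index


def search(query_text, index, n_docs):
    """Average an assumed tf-idf score over the query terms each headnote contains."""
    totals = defaultdict(float)
    matched = Counter()
    for term in set(tokenize(query_text)):
        postings = index.get(term, [])
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # rarer terms weigh more
        for doc_id, freq, _positions in postings:
            totals[doc_id] += (1 + math.log(freq)) * idf  # frequent terms weigh more
            matched[doc_id] += 1
    scores = {doc_id: totals[doc_id] / matched[doc_id] for doc_id in totals}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def rank_classes(results, headnotes, k=5):
    """Rank classes by frequency among the top-k headnotes.

    Ties are broken by summed similarity, which is an assumption of this sketch.
    """
    freq = Counter()
    similarity = defaultdict(float)
    for doc_id, score in results[:k]:
        for cls in headnotes[doc_id][1]:
            freq[cls] += 1
            similarity[cls] += score
    return sorted(freq, key=lambda c: (-freq[c], -similarity[c], c))


# Made-up classified headnotes, for illustration only.
headnotes = {
    "h1": ("Trial court has broad discretion in awarding maintenance in divorce.",
           ["134k235"]),
    "h2": ("Abuse of discretion standard governs review of maintenance award.",
           ["134k235"]),
    "h3": ("Permanent alimony may be awarded on dissolution of marriage.",
           ["134k230"]),
    "h4": ("Award of permanent maintenance rests in discretion of trial court.",
           ["134k230", "134k235"]),
}
query = ("Abuse of discretion in award of maintenance occurs only where no "
         "reasonable person would take view adopted by trial court.")

index = build_index(headnotes)
results = search(query, index, len(headnotes))
candidates = rank_classes(results, headnotes, k=5)
print(candidates)  # ['134k235', '134k230']
```

In this toy collection, class 134k235 is attached to three of the retrieved headnotes and 134k230 to one, so 134k235 heads the candidate list. Headnote h3 is never retrieved because, without stemming, "awarded" does not match the query term "award", which illustrates why the exemplary indexing procedure stems terms.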

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Every year, professional classifiers at West Group manually classify over 350,000 headnotes, or abstracts of judicial opinions, across approximately 82,000 separate classes of the Key Number System (130). Although most headnotes are classified from the memory of classifiers, a significant number are difficult and thus costly to classify manually. Accordingly, the inventors devised systems, methods, and software that facilitate manual classification of documents generally and hard-to-classify headnotes particularly. One exemplary system provides a graphical user interface (114) that concurrently displays an unclassified headnote (140), a ranked list of one or more candidate classes, a candidate class in combination with adjacent classes of the classification system, and at least one classified headnote associated with one of the candidate classes.

Description

DOCUMENT-CLASSIFICATION SYSTEM, METHOD AND SOFTWARE
Cross-Reference to Related Applications
This application is a continuation of U.S. provisional patent application 60/132673 which was filed May 5, 1999 and which is incorporated herein by reference.
Copyright Notice and Permission
A portion of this patent document contains material subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright whatsoever. The following notice applies to this document: Copyright © 1999, West Group
Technical Field
The present invention concerns document classification systems and methods for legal documents, such as judicial decisions.
Background
The American legal system, as well as some other legal systems around the world, relies heavily on written judicial opinions, the written pronouncements of judges, to articulate or interpret the laws governing resolution of disputes. Each judicial opinion is not only important to resolving a particular dispute, but also to resolving all similar disputes in the future. This importance reflects the principle of American law that the judges within a given jurisdiction should decide disputes with similar factual circumstances in similar ways. Because of this principle, judges and lawyers within the American legal system are continually searching an ever-expanding body of past decisions, or case law, for the decisions that are most relevant to resolution of particular disputes.
To facilitate this effort, companies, such as West Group (formerly West Publishing Company) of St. Paul, Minnesota, not only collect and publish the judicial opinions of almost every federal and state jurisdiction in the United States, but also classify the opinions based on the principles or points of law they contain. West Group, for example, classifies judicial opinions using its proprietary Key Number™ System. (Key Number is a trademark of West Group.) This system has been a seminal tool for finding relevant judicial opinions since the turn of the century. The Key Number System is a hierarchical system of over 400 major legal topics, with the topics divided into subtopics, the subtopics into sub-subtopics, and so on. Each topic or sub-topic has a unique alpha-numeric code, known as its Key Number classification. Table 1 shows an example of a portion of the Key Number System for classifying points of divorce law:

Key Number Classification    Topic Description
134                          Divorce
134V                         Alimony, Allowances, and Property Disposition
134k230                      Permanent Alimony
134k235                      Discretion of Court

Table 1. Key Number hierarchy and corresponding Topic Descriptions
At present, there are approximately 82,000 Key Number classes or categories, each one delineating a particular legal concept. Maintaining the Key Number System is an enormous on-going effort, requiring hundreds of professional editors to keep up with the thousands of judicial decisions issued throughout the United States every year. Professional attorney-editors read each opinion and annotate it with individual abstracts, or headnotes, for each point of law it includes. The resulting annotated opinions are then passed in electronic form to classification editors, or classifiers, who read each headnote and manually assign it to one or more classes in the Key Number System. For example, a classifier facing the headnote: "Abuse of discretion in award of maintenance occurs only where no reasonable person would take view adopted by trial court assigned." would most likely assign it to Key Number class 134k235, which, as indicated in Table 1, corresponds to the Divorce subtopic "discretion of court".
Every year, West Group classifiers manually classify over 350,000 headnotes across the approximately 82,000 separate classes of the Key Number classification system. Over time, many of the classifiers memorize significant portions of the Key Number System, enabling them to quickly assign Key Number classes to most headnotes they encounter. However, many headnotes are difficult to classify. For these, the classifier often invokes the WestLaw™ online legal search service, which allows the user to manually define queries against a database of classified headnotes. (WestLaw is a trademark of West Group.)
For instance, if presented with the exemplary "abuse of discretion" headnote, an editor might define and run a query including the terms "abuse," "discretion," "maintenance," and "divorce." The search service would return a set of annotated judicial opinions compliant with the query and the classifier would in turn sift through the headnotes in each judicial opinion, looking for those most similar to the headnote targeted for classification. If one or more of the headnotes satisfies the editor's threshold for similarity, the classifier manually assigns the Key Number classes associated with these headnotes to the target headnote. The classifier, through invocation of a separate application, may also view an electronic document listing a portion of the Key Number System to help identify related classes that may not be included in the search results.
The present inventors recognized that this process of classification suffers from at least two problems. First, even with use of online searching, the process is quite cumbersome and inefficient. For example, editors are forced to switch from viewing a headnote in one application, to a separate online search application to manually enter queries and view search results, to yet another application to consult a classification system list before finally finishing classification of some hard-to-classify headnotes. Secondly, this conventional process of classification lacks an efficient method of correcting misclassified headnotes. To correct misclassified headnotes, a classifier makes a written request to a database administrator with rights to a master headnote database.
Accordingly, there is a need for systems, methods, and software that not only streamline manual classification processes, but also promote consistency and accuracy of resulting classifications.
Summary
To address this and other needs, the inventors devised systems, methods, and software that facilitate the manual classification of documents, particularly judicial opinions according to a legal classification system, such as West Group's Key Number System. One exemplary system includes a personal computer or work station coupled to a memory storing classified judicial headnotes or abstracts and a memory containing one or more headnotes requiring classification. The personal computer includes a graphical user interface that concurrently displays one of the headnotes requiring classification, a list of one or more candidate classes for the one headnote, at least one classification description associated with one of the listed candidate classes, and at least one classified headnote that is associated with one of the listed candidate classes. The graphical user interface also facilitates user assignment of the one headnote requiring classification to one or more of the listed candidate classes. In the exemplary system, the list of candidate classes results from automatically defining and executing a query against the classified headnotes, with the query derived from the one headnote requiring classification. The exemplary system also displays the candidate classes in a ranked order based on measured similarity of corresponding classified headnotes to the headnote requiring classification, further assisting the user in assigning the headnote to an appropriate class. Other features of the interface allow the user to reclassify a classified headnote and to define and execute an arbitrary query against the classified headnotes to further assist classification.
Brief Description of Drawings
Figure 1 is a diagram of an exemplary classification system 100 embodying several aspects of the invention, including a unique graphical user interface 114;
Figure 2 is a flowchart illustrating an exemplary method embodied in classification system 100 of Figure 1;
Figure 3 is a diagram illustrating an unclassified document or headnote 300 and a structured query 300' derived from headnote 300 during operation of classification system 100;
Figure 4A is a facsimile of an exemplary graphical user interface 400 that forms a portion of classification system 100;
Figure 4B is a facsimile of exemplary graphical user interface 400 after responding to a user input;
Figure 4C is a facsimile of exemplary graphical user interface 400 after responding to another user input; and
Figure 5 is a facsimile of an exemplary graphical user interface 500.
Detailed Description of Preferred Embodiments
This description, which references and incorporates the Figures, describes one or more specific embodiments of one or more inventions. These embodiments, offered not to limit but only to exemplify and teach the one or more inventions, are shown and described in sufficient detail to enable those skilled in the art to implement or practice the invention. Thus, where appropriate to avoid obscuring the invention, the description may omit certain information known to those of skill in the art.
The description includes many terms with meanings derived from their usage in the art or from their use within the context of the description. However, as a further aid, the following term definitions are presented.
The term "document" refers to any logical collection or arrangement of machine-readable data having a filename.
The term "database" includes any logical collection or arrangement of machine-readable documents.
Figure 1 shows a diagram of an exemplary document classification system 100 for assisting editors in manually classifying electronic documents according to a document classification scheme. The exemplary embodiment assists in the classification of judicial abstracts, or headnotes, according to West Group's Key Number System. For further details on the Key Number System, see West's Analysis of American Law: Guide to the American Digest System, 2000 Edition, West Group, 1999. This text is incorporated herein by reference. However, the present invention is not limited to any particular type of documents or type of classification system.
System 100 includes an exemplary personal computer or classification work station 110, an exemplary classified documents database 120, an exemplary classification system database 130, and an unclassified documents database 140. Though the exemplary embodiment presents work station 110 and databases 120-140 as separate components, some embodiments combine the functionality of these components into a greater or lesser number of components. For example, one embodiment combines databases 120-140 within work station 110, and another embodiment combines database 130 with work station 110 and databases 120 and 140 into a single database.
The most pertinent features of work station 110 include a processing unit 111, a data-storage device 112, a display device 113, a graphical-user interface 114, and user-interface devices 115 and 116. In the exemplary embodiment, processor unit 111 includes one or more processors and an operating system which supports graphical-user interfaces. Storage device 112 includes one or more electronic, magnetic, and/or optical memory devices. However, other embodiments of the invention use other types and numbers of processors and data-storage devices. For example, some embodiments implement one or more portions of system 100 using one or more mainframe computers or servers, such as the Sun Ultra 4000 server. Exemplary display devices include a color monitor and virtual-reality goggles, and exemplary user-interface devices include a keyboard, mouse, joystick, microphone, video camera, body-field sensors, and virtual-reality apparel, such as gloves, headbands, bodysuits, etc. Thus, the invention is not limited to any genus or species of computerized platforms.
Classified documents database 120 includes documents classified according to a classification system. In the exemplary embodiment, database 120 includes an indexed collection of approximately twenty million headnotes spanning the entirety of the West Group's Key Number System. However, some embodiments include an indexed subset of the total collection of classified headnotes. For example, one embodiment indexes headnotes from decisions made within the last 25 years. This reduces the number of headnotes by about half and thus reduces the time necessary to run queries against the headnotes. Other embodiments further reduce the size of the training collection to include only headnotes specific to the jurisdiction of the query. This is expected not only to result in retrieval of headnotes with greater similarity, but also to further reduce processing time. Each headnote in the training collection has one or more logically associated Key Number classification codes.
An exemplary indexing procedure entails tokenizing the headnotes, generating transactions, and creating an inverted file. Tokenization entails reading in documents and removing predetermined stop-words, single digits, and stems. The exemplary embodiment uses the Porter stemming algorithm to remove stems. See, M.F. Porter, An Algorithm for Suffix Stripping, Program, 14(3):130-137, July 1980. Single digits are removed since they tend to appear as item markers in enumerations and thus contribute very little to the substance of headnotes.
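The tokenization step described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the exemplary embodiment's code: `STOP_WORDS` is a tiny hypothetical sample list, and `crude_stem` is a deliberately simplified stand-in for the Porter algorithm that strips only a handful of common suffixes.

```python
import re

# Hypothetical, abbreviated stop-word list; a production list would be much longer.
STOP_WORDS = {"a", "an", "and", "by", "in", "of", "only", "the", "where", "would"}

# Simplified stand-in for the Porter stemmer: strips a few common suffixes only.
SUFFIXES = ("ation", "ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lower-case the text, drop stop-words and single digits, and stem the rest."""
    tokens = []
    for raw in re.findall(r"[a-z0-9]+", text.lower()):
        if raw in STOP_WORDS:
            continue
        if len(raw) == 1 and raw.isdigit():
            continue  # single digits usually mark enumerated items
        tokens.append(crude_stem(raw))
    return tokens


print(tokenize("1 Abuse of discretion in award of maintenance"))
```

A real indexer would substitute a full Porter implementation for `crude_stem`; the filtering order (stop-words, then single digits, then stemming) is one plausible reading of the description.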
After tokenization, the procedure generates transactions for each headnote. A transaction is a tuple grouping a term t, a document identifier n, the frequency of the term t in the document n, and the positions of the term t in document n. Next, the procedure creates an inverted file containing records. The records store the term, the number of documents in the collection that contain the term, and the generated transactions. The inverted file allows efficient access to term information at search time. For further details, see G. Salton, Automatic Text Processing: the Transformation, Analysis and Retrieval of Information by Computer, Addison Wesley, 1989.
In addition to an indexed collection of headnotes, database 120 also includes a search engine 121. In the exemplary embodiment, search engine 121 comprises a natural-language search engine, such as the natural language version of WestLaw® legal search tools. However, other embodiments include other search engines based on the work by H. Turtle, Inference Networks for Document Retrieval, PhD thesis, Computer and Information Science Department, University of Massachusetts, October 1990. Still other embodiments use an Inquery Retrieval System as described in J.P. Callan, W.B. Croft, and S.M. Harding, The Inquery Retrieval System, in Proceedings of the Third International Conference on Database and Expert Systems Applications, pages 78-83, Valencia, Spain, 1992, Springer-Verlag.
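Returning to the indexing procedure, the transactions and inverted file described above might be represented as in the following Python sketch. The function names, the dictionary layout of each record, and the sample documents are all illustrative assumptions; the description fixes only what a transaction and a record contain.

```python
from collections import defaultdict


def make_transactions(doc_id, tokens):
    """Return one (term, doc_id, frequency, positions) tuple per distinct term."""
    positions = defaultdict(list)
    for pos, term in enumerate(tokens):
        positions[term].append(pos)
    return [(term, doc_id, len(pos_list), pos_list) for term, pos_list in positions.items()]


def build_inverted_file(tokenized_docs):
    """Map each term to a record holding its document frequency and transactions."""
    inverted = {}
    for doc_id, tokens in tokenized_docs.items():
        for transaction in make_transactions(doc_id, tokens):
            term = transaction[0]
            record = inverted.setdefault(term, {"df": 0, "transactions": []})
            record["df"] += 1  # one transaction per (term, document) pair
            record["transactions"].append(transaction)
    return inverted


# Hypothetical, already-tokenized headnotes keyed by document identifier.
docs = {
    1: ["abuse", "discretion", "award", "maintenance"],
    2: ["award", "alimony", "discretion", "court", "discretion"],
}
index = build_inverted_file(docs)
print(index["discretion"])
```

Looking up a term then yields both its document frequency and every document and position where it occurs, which is the information a search engine needs at query time.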
Classification system database 130 includes searchable data describing the logical and hierarchical structure of the classification system used in system 100. In the exemplary embodiment, this data describes the approximately 82,000 classes of West Group's Key Number System. Each class description includes its Key Number code, a topic description, and data linking the class to adjacent classes.
Unclassified documents database 140 includes a set of one or more unclassified documents. In the exemplary embodiment, each document is an unclassified headnote or more generally a headnote requiring initial classification or reclassification. Moreover, each headnote has a corresponding judicial opinion. In the exemplary embodiment, the headnotes are determined manually by professional editors. However, other embodiments may determine headnotes automatically using a computerized document summarizer. See for example U.S. Patent 5,708,825 to Bernardo Rafael Sotomayer, which is incorporated herein by reference.
System 100 also includes, within data-storage device 112, classification-aiding software 112a. In the exemplary embodiment, software 112a comprises one or more software modules and operates as a separate application program or as part of the kernel or shell of an operating system. (Software 112a can be installed on work station 110 through a network-download or through a computer-readable medium, such as an optical or magnetic disc, or through other software transfer methods.) In the exemplary embodiment, software 112a enables system 100 to generate graphical-user interface 114 which integrates unclassified headnotes from database 140 with classified headnotes and ranked candidate classes from database 120 and classification system data from database 130 to assist users in manually classifying or reclassifying headnotes.
Figure 2 shows a flow chart 200 of an exemplary classification method at least partly embodied within and facilitated by software 112a. Flow chart 200 includes a number of process blocks 202-214, which are arranged serially in the exemplary embodiment. However, other embodiments of the invention may reorder the blocks, omit one or more blocks, and/or execute two or more blocks in parallel using multiple processors or a single processor organized as two or more virtual machines or subprocessors. Moreover, still other embodiments implement the blocks as one or more specific interconnected hardware or integrated-circuit modules with related control and data signals communicated between and through the modules. Thus, the exemplary process flow is applicable to software, firmware, and hardware implementations.
The exemplary method begins at process block 202 with automatic or user-directed retrieval of a set of one or more unclassified headnotes from unclassified document database 140. For system embodiments that include two or more classification work stations, a number of sets of unclassified headnotes can be scheduled for classification at particular stations or a set of unclassified headnotes can be queued for sequential distribution to the next available work station. Some embodiments allow the user to define and run a query against the unclassified headnotes and in effect define the set of headnotes he or she will classify or alternatively transfer the set of headnotes to another work station for classification. After retrieval of the unclassified headnotes, execution of the exemplary method then proceeds to block 204.
Block 204 entails defining a query based on one of the headnotes in the set of unclassified headnotes. In the exemplary embodiment, this entails forwarding the one headnote to the natural-language search engine 121, which automatically defines the query using the indexing procedure already applied to index the classified headnotes of database 120. Figure 3 shows the text of a sample headnote 300 and a structured query 300' that search engine 121 derives from it. Although the exemplary embodiment relies on the inherent functionality of its search engine 121 for this query definition, some embodiments include a query structuring or definition module within software 112a.
After defining the query, the exemplary method runs, or executes, the query against the classified document database 120, as indicated in block 206. In the exemplary embodiment, search engine 121, which has already defined the query from the unclassified headnote, executes a search based on the query. In executing the search, search engine 121 implements memory-based reasoning, a variant of a k-nearest neighbor method. This generally entails retrieving the classified headnotes that are closest to the unclassified headnote, or more precisely the query form of the unclassified headnote, based on some distance function. More particularly, the exemplary embodiment compares the query to each classified headnote in the database, scores all the terms, or concepts, that each classified headnote has in common with the query, sums the scores of all the common terms, and divides by the total number of query terms in the classified headnote to determine an average score for the classified headnote.
In the exemplary embodiment, search engine 121 scores individual terms using the following formula:
w(t,d) = 0.4 + 0.6 * tf(t,d) * idf(t),
where w(t,d) denotes the weight, or score, for term t in document (or headnote) d; idf(t) denotes an inverse-document-frequency factor for the term t; and tf(t,d) denotes the term-frequency factor for term t in document d. The inverse-document-frequency factor idf(t) is defined as
idf(t) = (log(N) - log[df(t)]) / log(N),
and the term-frequency factor tf(t,d) for term t in document d is defined as
tf(t,d) = 0.5 + 0.5 * log[f(t,d)] / log(maxtf),
where N is the total number of documents (headnotes) in the collection, df(t) is the number of documents where term t appears, f(t,d) is the number of occurrences of term t in document d, and maxtf is the maximum frequency of any term in document d. The inverse-document-frequency factor (idf) favors (that is, gives greater weight to) terms that are rare in the collection, while the term frequency factor (tf) gives a higher importance to terms that are frequent in the document being scored.
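The term-weighting formulas and the averaging step can be illustrated with a short Python sketch. Several points here are assumptions rather than statements from the description: the function names and toy statistics are hypothetical; the case maxtf = 1 (where log(maxtf) is zero) is handled by treating the ratio as 1; and "the total number of query terms in the classified headnote" is read as the count of query terms that actually occur in the headnote.

```python
import math
from collections import Counter


def idf(term, doc_freq, n_docs):
    """idf(t) = (log N - log df(t)) / log N; assumes n_docs > 1 and df(t) >= 1."""
    return (math.log(n_docs) - math.log(doc_freq[term])) / math.log(n_docs)


def tf(term, doc_counts):
    """tf(t,d) = 0.5 + 0.5 * log f(t,d) / log maxtf."""
    maxtf = max(doc_counts.values())
    if maxtf == 1:
        return 1.0  # assumption: treat log f / log maxtf as 1 when every term occurs once
    return 0.5 + 0.5 * math.log(doc_counts[term]) / math.log(maxtf)


def weight(term, doc_counts, doc_freq, n_docs):
    """w(t,d) = 0.4 + 0.6 * tf(t,d) * idf(t)."""
    return 0.4 + 0.6 * tf(term, doc_counts) * idf(term, doc_freq, n_docs)


def headnote_score(query_terms, doc_tokens, doc_freq, n_docs):
    """Average the weights of the query terms that also occur in the headnote."""
    doc_counts = Counter(doc_tokens)
    common = [t for t in set(query_terms) if t in doc_counts]
    if not common:
        return 0.0
    return sum(weight(t, doc_counts, doc_freq, n_docs) for t in common) / len(common)


# Hypothetical toy collection statistics: 4 headnotes in total.
doc_freq = {"discretion": 2, "maintenance": 1, "award": 4}
headnote = ["award", "maintenance", "discretion", "discretion"]
print(headnote_score(["abuse", "discretion", "maintenance"], headnote, doc_freq, 4))
```

Note that a term appearing in every headnote has idf(t) = 0 and therefore receives only the baseline weight of 0.4, regardless of how often it occurs in the headnote being scored.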
The result of the search is a ranked list of document-score pairs, with each score indicating the similarity between a retrieved classified document and the query. The score is the metric for finding the nearest neighbors. Execution of the method then continues to block 208. Block 208 entails determining the classes associated with a predetermined number k of the top classified headnotes from the ranked list of search results. The k classified headnotes are the k nearest neighbors of the unclassified headnote according to the distance function used in search engine 121. Exemplary values for k include 5, 10, 25, 50, and 100. In the exemplary embodiment, some of the classified headnotes have two or more associated Key Number classes.
After determining all the classes associated with the k classified headnotes most similar to the unclassified headnote, the method executes block 210 which entails transferring the k classified headnotes and their associated class identifiers from classified document database 120 to work station 110.
As block 212 shows, the station 110, or more particularly processor unit 111, next determines a ranking for the class identifiers (Key Number classes) associated with the top k classified headnotes. The exemplary embodiment ranks the class identifiers based on their frequencies of occurrence within the set of candidate classes. In other words, each class identifier is ranked based on how many times it appears in the set of candidate classes.
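Blocks 208 through 212 can be sketched in Python as follows: take the k top-scoring classified headnotes, collect their classes, and rank the classes by how often they occur. The function name, the sample scores, the document identifiers, and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter


def rank_classes_by_frequency(ranked_results, classes_of, k):
    """Count how often each class appears among the k top-scoring headnotes."""
    counts = Counter()
    for doc_id, _score in ranked_results[:k]:
        counts.update(classes_of[doc_id])
    # Sort by descending frequency; ties are broken by class code (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical search results, already sorted from most to least similar.
ranked_results = [(17, 0.71), (4, 0.69), (23, 0.66), (8, 0.52)]
# Hypothetical class assignments; a headnote may carry more than one class.
classes_of = {
    17: ["134k235"],
    4: ["134k230", "134k235"],
    23: ["134k230"],
    8: ["134k235"],
}
print(rank_classes_by_frequency(ranked_results, classes_of, k=4))
```

With k = 4, class 134k235 appears three times and 134k230 twice, so 134k235 heads the candidate list.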
Other embodiments rank the classes based on respective total similarity scores. For a given candidate class, the total similarity score is the sum of the similarity scores for all the headnotes associated with the class. Some embodiments rank the similarity scores for all the headnotes associated with a class, weight the ranks according to a function, and then sum the weighted ranks to determine where to rank the class. Two exemplary rank-weighting functions are: w(r) = 1/r and w(r) = 1 - ε*r, where w denotes the weight function, r denotes rank, and ε = 1/(k+1), k being the number of nearest neighbors. Functions such as these give a higher weight to a Key Number class assigned to a document at the top of the retrieved set, and a lower weight when the document is at a lower position.
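The two rank-weighting functions can be compared with a small Python sketch. The function names and the sample class lists are hypothetical, and the sketch assumes that rank 1 denotes the most similar retrieved headnote.

```python
from collections import defaultdict


def reciprocal_weight(rank, k):
    """w(r) = 1/r."""
    return 1.0 / rank


def linear_weight(rank, k):
    """w(r) = 1 - eps*r with eps = 1/(k+1)."""
    return 1.0 - rank / (k + 1)


def rank_classes_by_weighted_rank(top_k_classes, weight_fn):
    """Sum weight_fn(rank) over the retrieved headnotes carrying each class."""
    k = len(top_k_classes)
    totals = defaultdict(float)
    for rank, classes in enumerate(top_k_classes, start=1):
        for cls in classes:
            totals[cls] += weight_fn(rank, k)
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical class lists of the k = 5 nearest headnotes, most similar first.
top_k_classes = [["134k230"], ["134k235"], ["134k235"], ["134k230"], ["134k235"]]
by_reciprocal = rank_classes_by_weighted_rank(top_k_classes, reciprocal_weight)
by_linear = rank_classes_by_weighted_rank(top_k_classes, linear_weight)
print(by_reciprocal[0][0], by_linear[0][0])
```

In this example the reciprocal function places 134k230 first because it is attached to the single most similar headnote, whereas the linear function places 134k235 first because it decays more gently and so rewards the class that appears more often.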
After ranking the candidate classes, the system executes block 214 which entails displaying on display device 113 (shown in Figure 1) the exemplary graphical user interface 400 which is shown in Figure 4A. Graphical user interface 400 includes concurrently displayed windows or regions 410, 420, 430, 440, and 450.
Window 410 displays the one unclassified headnote, headnote 300 of Figure 3, which was selected or retrieved for classification in block 202 of the exemplary flow chart in Figure 2. Window 420 displays a sorted list or table 422 of candidate classes and their corresponding frequencies. A class 422a in list 422 is highlighted in subregion 420a of window 420. Window 430 displays a portion 432a of the classification system hierarchy which includes class 422a. Window 440 displays one or more of the classified headnotes that are similar to the one unclassified headnote and that have class 422a as one of their assigned classes. Window 450 is an input window for assigning one or more classes to unclassified headnote 412 displayed in window 410.
In operation, interface devices 114-116 of system 100 enable a user to highlight or select one or more of the candidate classes in list 422. For example, a user may point and double click on candidate class 422a (232Ak179) to select the class, or a user may single click on the class to highlight it for further consideration. Selecting, or double-clicking, a class in the list results in automatic insertion of the class into window 450. The interface not only allows the user to select as many of the classes as desired, but also to manually insert one or more classes, including classes not listed, into window 450. When interface 400 is closed, it prompts the user to save, or in effect, actually assign the one or more classes in window 450 to the headnote in window 410. In response to highlighting class 422a, interface 400 displays subregion 420a of window 420 in reverse-video, that is, by reversing the background and foreground colors of subregion 420a. (Other embodiments use other techniques not only to indicate selection of one of the classes, but also to select one or more of the classes.)
In further response to highlighting a class in list 422 of window 420, classification station 110 (in Figure 1) defines a query based on all or a portion of the highlighted class and runs it against classification system database 130. Database 130 returns one or more classes in the neighborhood of the selected class to station 110, and window 430 displays one or more of these neighborhood classes, as portion 432a, allowing the user to view the highlighted class in context of the classification system, complete with class identifiers and class descriptors.
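One simple way to picture the neighborhood lookup against database 130 is sketched below in Python. It assumes, purely for illustration, that the classification system is stored as a flat list in outline order and that the "neighborhood" is a fixed window around the highlighted class; the description does not specify either choice. The sample entries are the Table 1 rows, using code 134k235 as cited in the text.

```python
# Table 1 rows in outline order; the flat ordered list is an assumed storage layout.
CLASS_OUTLINE = [
    ("134", "Divorce"),
    ("134V", "Alimony, Allowances, and Property Disposition"),
    ("134k230", "Permanent Alimony"),
    ("134k235", "Discretion of Court"),
]


def neighborhood(code, outline, radius=1):
    """Return the entries within `radius` outline positions of the given class."""
    codes = [entry[0] for entry in outline]
    i = codes.index(code)  # raises ValueError for an unknown class code
    return outline[max(0, i - radius): i + radius + 1]


for key_number, description in neighborhood("134k230", CLASS_OUTLINE):
    print(key_number, description)
```

A production system would more likely follow the stored links between adjacent classes than slice a list, but the effect for the user is the same: the highlighted class appears alongside its surrounding identifiers and descriptors.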
In addition to responding to highlighting of class 422a by displaying it in context of the classification system in window 430, the interface also displays in window 440 one or more of the classified headnotes that are similar to the headnote being classified. In other words, window 440 displays one of the headnotes, such as headnote 442a, which resulted in the highlighted class 422a being included in list 422. If there is more than one of these headnotes, window 440 allows the user to view each of them in order from most similar to least similar to the headnote being classified.
Figure 4B shows that the user may also highlight another class, such as class 422b in the list 422, to view this class in context of the classification system in window 430 and to view the classified headnotes associated with the class in window 440. More specifically, window 430 shows a portion 432b of the classification system stored in database 130, and window 440 shows a headnote 442b associated with highlighted class 422b. The interface allows the user to repeat this process with each of the classes in the list.
Window 430 also includes an enter-query button 434 which the user may invoke to convert window 430 into a query-entry window 430' as shown in Figure 4C. This figure shows an exemplary query 436, which the user has defined to include several terms and/or phrases from or related to unclassified headnote 412 in window 410. The figure also shows that enter-query button 434 has been converted to a run-query button 434', which the user may actuate after entering query 436. Actuating the run-query button runs the query against classified documents database 120, and results in interface 400 being presented again, with an updated list 422' of candidate classes for possible assignment to the unclassified headnote. (Once the user highlights one of the classes in the updated list 422', window 430 will display this class in context of the classification system hierarchy.) This user-invocable option of defining and running queries further facilitates classification of headnotes when the candidate classes stemming from the automatically defined queries are unsatisfactory.
When viewing the classified headnotes in window 440, the user may recognize that a particular headnote has been misclassified and thus requires reclassification. Thus, window 440 includes a reclassification button 444, which the user can invoke to initiate reclassification of the particular headnote, such as headnote 442b, to another class.
Invocation of button 444 results in display of window 500 as shown in Figure 5. Window 500 includes a region 510 that displays a headnote 512 that is being reclassified, a region 520 that displays the highlighted class from list 422 that is associated with the headnote, and a region 530 that displays a ranked list 532 of candidate classes and an input field 534 for entry of a new class. Ranked list 532 is developed using the same process used for developing list 422.
Conclusion
In furtherance of the art, the inventors have presented exemplary systems, methods, and software that facilitate the manual classification of documents, particularly judicial headnotes according to a legal classification system, such as West Group's Key Number System. One exemplary system includes a single graphical user interface that concurrently displays one of the headnotes requiring classification, a list of one or more candidate classes for the one headnote, at least one classification description associated with one of the listed candidate classes, and at least one classified headnote that is associated with one of the listed candidate classes. The exemplary interface integrates two or more tools necessary for a user to accurately and efficiently classify judicial headnotes or other documents.
The embodiments described above are intended only to illustrate and teach one or more ways of practicing or implementing the present invention, not to restrict its breadth or scope. The actual scope of the invention, which embraces all ways of practicing or implementing the concepts of the invention, is defined only by the following claims and their equivalents.

Claims

1. A method of classifying one or more documents in a classification scheme including two or more classes, with each class having one or more classified document headnotes, the method comprising: summarizing a particular document to define one or more particular document headnotes; automatically generating a list of one or more of the classes, with each listed class having one or more classified document headnotes which are similar to the particular document headnote; and classifying the particular document or document summary based on the list of classes.
2. A method of classifying one or more documents in a classification scheme including two or more classes, with each class having one or more classified documents, the method comprising: summarizing a particular document to define a particular document summary; automatically generating a list of one or more of the classes, with each listed class having one or more classified documents which are similar to the particular document summary; and classifying the particular document or document summary based on the list of classes.
3. A method of classifying one or more documents in a classification scheme including two or more classes, with each class having one or more classified document summaries, the method comprising: summarizing a particular document to define a particular document summary; automatically generating a list of one or more of the classes, with each listed class having one or more classified document summaries which are similar to the particular document summary; and classifying the particular document based on the list of classes.
4. The method of claim 3, wherein summarizing a particular document comprises manually summarizing the particular document or electronically summarizing the particular document using a computerized text summarizer.
5. The method of claim 3, wherein generating a list of one or more of the classes comprises: defining one or more natural-language or boolean queries based on the particular document summary; performing one or more searches of the classified document summaries based on one or more of the queries, with one or more of the searches yielding one or more found document summaries; ranking the one or more found document summaries based on relative similarity to the particular document summary to define one or more ranked document summaries; generating the list based on one or more of the ranked document summaries.
6. The method of claim 3, wherein classifying the particular document based on the list of classes comprises manually selecting one or more of the classes using a graphical user interface or automatically selecting one or more of the classes using a predetermined selection procedure.
7. A method of classifying one or more documents in a classification scheme including two or more classes, with each class having one or more classified document summaries, the method comprising: a step for summarizing a particular document to define a particular document summary; a step for automatically generating a list of one or more of the classes, with each listed class having one or more classified document summaries which are similar to the particular document summary; and a step for classifying the particular document based on the list of classes.
8. A method of classifying one or more documents, comprising providing a classification scheme including two or more classes, with each class having one or more classified document summaries logically associated with it; summarizing a particular document to define a particular document summary; automatically generating a list of one or more of the classes, with each listed class having one or more classified document summaries which are similar to the particular document summary; and classifying the particular document based on the list of classes.
9. The method of claim 8, wherein summarizing a particular document comprises manually summarizing the particular document or electronically summarizing the particular document using a computerized text summarizer.
10. The method of claim 8, wherein generating a list of one or more of the classes comprises: defining one or more natural-language or boolean queries based on the particular document summary; performing one or more searches of the classified document summaries based on one or more of the queries, with one or more of the searches yielding one or more found document summaries; ranking the one or more found document summaries based on relative similarity to the particular document summary to define one or more ranked document summaries; generating the list based on one or more of the ranked document summaries.
11. The method of claim 8, wherein classifying the particular document based on the list of classes comprises manually selecting one or more of the classes using a graphical user interface or automatically selecting one or more of the classes using a predetermined selection procedure.
12. The method of claim 8, further comprising adding one or more classes to the classification scheme, with each added class having one or more classified document summaries logically associated with it.
13. The method of claim 8, wherein each class has an associated legal concept and the particular document is a judicial opinion or secondary legal source.
14. The method of claim 8, wherein the classification scheme conforms at least in part with a version of the West Key Numbering System.
15. A computer-readable magnetic, electronic, or optical medium comprising computer-executable instructions for: causing a computer to read at least part of a classification scheme into memory, the classification scheme including two or more classes, with each class having one or more classified document summaries logically associated with it; causing the computer to summarize in memory a particular document to define a particular document summary; causing the computer to generate a list in memory of one or more of the classes, with each listed class having associated with it one or more classified document summaries which are similar to the particular document summary; and causing the computer to classify the particular document based on the list of classes.
16. The medium of claim 15, wherein the instructions for summarizing a particular document comprise instructions for causing the computer to weigh the lexical content of the document.
17. The medium of claim 15, wherein the instructions for generating a list of one or more of the classes comprise instructions for: causing the computer to define one or more natural-language or boolean queries based on the particular document summary; causing the computer to perform one or more searches of the classified document summaries based on one or more of the queries, with one or more of the searches yielding one or more found document summaries; causing the computer to rank the one or more found document summaries based on relative similarity to the particular document summary to define one or more ranked document summaries; and causing the computer to generate the list based on one or more of the ranked document summaries.
18. The medium of claim 15, wherein the instructions for classifying the particular document based on the list of classes comprise instructions for causing the computer to facilitate manual selection of one or more of the classes using a graphical user interface or instructions for causing the computer to automatically select one or more of the classes using a predetermined selection procedure.
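Claim 16 does not say how the lexical content is weighed. As a hedged illustration only, the sketch below scores each sentence by the average document-wide frequency of its words and keeps the highest-scoring sentences as the summary. The scoring rule and the names `summarize` and `max_sentences` are assumptions, not the summarizer referenced elsewhere in the claims.

```python
import re
from collections import Counter


def summarize(document, max_sentences=1):
    """Return the highest-weighted sentences, in their original order."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()
    ]
    # Word frequencies over the whole document serve as lexical weights.
    freq = Counter(re.findall(r"[a-z]+", document.lower()))

    def weight(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=weight, reverse=True)[:max_sentences]
    return [s for s in sentences if s in top]
```

Averaging rather than summing keeps long sentences from winning on length alone; very common words still dominate the weights unless stop words are filtered out first.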
19. The medium of claim 15, further comprising instructions for manually or automatically adding one or more classes to the classification scheme, with each added class having one or more classified document summaries logically associated with it.
20. The medium of claim 15, wherein each class has an associated legal concept and the particular document is a judicial opinion.
21. The medium of claim 15, wherein the classification scheme conforms at least in part with a version of the West Key Numbering System.
22. A system for classifying one or more documents in a classification scheme including two or more classes, with each class having one or more classified document summaries, the system comprising: means for summarizing a particular document to define a particular document summary; means for automatically generating a list of one or more of the classes, with each listed class having one or more classified document summaries which are similar to the particular document summary; and means for classifying the particular document based on the list of classes.
23. The system of claim 22, wherein the means for summarizing, the means for automatically generating a list, and the means for classifying exist as software modules in a memory coupled to one or more computer processors or within various parts of a mainframe computer or within a SUN Ultra 4000 Server.
24. The system of claim 22, wherein the means for summarizing comprises the summarizer described in United States Patent 5,708,825 to Bernardo Rafael Sotomayer, which is incorporated herein by reference.
25. A system for classifying one or more documents, comprising means for providing a classification scheme including two or more classes, with each class having one or more classified document summaries logically associated with it; means for summarizing a particular document to define a particular document summary; means for automatically generating a list of one or more of the classes, with each listed class having one or more classified document summaries which are similar to the particular document summary; and means for classifying the particular document based on the list of classes.
26. A graphical user interface for aiding manual classification of one or more documents in a document classification system having two or more classes, the interface comprising: means for displaying at least a portion of one of the documents; and means for displaying information identifying one or more of the classes as candidate classes.
27. The graphical user interface of claim 26, wherein each document is a headnote, the headnote associated with a judicial opinion.
28. A graphical user interface for aiding manual classification of one or more documents in a document classification system having two or more classes, the interface comprising: means for displaying at least a portion of one of the documents; means for displaying information identifying one or more of the classes as candidate classes; and means for displaying a logical relationship between at least one of the candidate classes and another class in the document classification system.
29. A graphical user interface for aiding manual classification of documents according to a document classification system having two or more classes, the interface comprising: means for displaying at least a portion of one of the documents; means for displaying information identifying one or more of the classes as candidate classes for the one of the documents; means for displaying a logical relationship between at least one of the candidate classes and another class in the document classification system; and means for displaying at least one classified document associated with one of the candidate classes.
30. A method for aiding manual classification of documents according to a document classification system having two or more classes, the method comprising: displaying at least a portion of one of the documents; displaying information identifying one or more of the classes as candidate classes for the one of the documents, the information displayed concurrently with the portion of the one or more documents; displaying a logical relationship between at least one of the candidate classes and another class in the document classification system, the logical relationship displayed concurrent with the information; and displaying at least a portion of one classified document associated with one of the candidate classes, the portion of the one classified document displayed concurrent with the logical relationship.
31. The method of claim 30, wherein the logical relationship is a hierarchical relationship of at least one of the candidate classes to one or more adjacent classes in the document classification system.
PCT/US2000/012386 1999-05-05 2000-05-05 Document-classification system, method and software WO2000067162A1 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
EP00932127A EP1212699A4 (en) 1999-05-05 2000-05-05 Document-classification system, method and software
AU49898/00A AU781157B2 (en) 1999-05-05 2000-05-05 Document-classification system, method and software
CA002371688A CA2371688C (en) 1999-05-05 2000-05-05 Document-classification system, method and software
JP2000615932A JP4732593B2 (en) 1999-05-05 2000-05-05 Document classification system, document classification method, and document classification software
NZ515293A NZ515293A (en) 1999-05-05 2000-05-05 Document-classification system, method and software
US10/013,190 US7065514B2 (en) 1999-05-05 2001-11-05 Document-classification system, method and software
US11/388,753 US7567961B2 (en) 1999-05-05 2006-03-24 Document-classification system, method and software

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13267399P 1999-05-05 1999-05-05
US60/132,673 1999-05-05

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US10/013,190 Continuation US7065514B2 (en) 1999-05-05 2001-11-05 Document-classification system, method and software

Publications (2)

Publication Number Publication Date
WO2000067162A1 (en) 2000-11-09
WO2000067162A9 (en) 2002-06-06

Family

ID=22455084

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/012386 WO2000067162A1 (en) 1999-05-05 2000-05-05 Document-classification system, method and software

Country Status (7)

Country Link
US (2) US7065514B2 (en)
EP (1) EP1212699A4 (en)
JP (1) JP4732593B2 (en)
AU (1) AU781157B2 (en)
CA (1) CA2371688C (en)
NZ (1) NZ515293A (en)
WO (1) WO2000067162A1 (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1213665A2 (en) * 2000-12-07 2002-06-12 Patentmall Limited Patent classification displaying method and apparatus
WO2003040875A2 (en) * 2001-11-02 2003-05-15 West Publishing Company Doing Business As West Group Systems, methods, and software for classifying documents
EP1324219A1 (en) * 2001-12-11 2003-07-02 Abb Research Ltd. Method of searching based on categories for information objects in information pools and system to find such information objects
FR2840088A1 (en) * 2002-05-24 2003-11-28 Overture Services Inc Search engine and database for distributed database in computer, has computing apparatus with transactional score generator and category assigner in communication with Internet cache of memory device
EP1449111A1 (en) * 2001-10-30 2004-08-25 Goldman, Sachs & Co. Systems and method for facilitating access to documents via associated tags
EP1483691A2 (en) * 2002-03-11 2004-12-08 The Boeing Company Knowledge management using text classification
US7412463B2 (en) 2002-01-11 2008-08-12 Bloomberg Finance L.P. Dynamic legal database providing historical and current versions of bodies of law
US7529756B1 (en) 1998-07-21 2009-05-05 West Services, Inc. System and method for processing formatted text documents in a database
US7778954B2 (en) 1998-07-21 2010-08-17 West Publishing Corporation Systems, methods, and software for presenting legal case histories
WO2011017098A1 (en) * 2009-07-28 2011-02-10 Fti Technology Llc Displaying relationships between electronically stored information to provide classification suggestions via nearest neighbor
US8028001B2 (en) 2001-10-30 2011-09-27 Goldman Sachs & Co. Systems and methods for facilitating access to documents via a set of content selection tags
US8126818B2 (en) 2002-12-30 2012-02-28 West Publishing Company Knowledge-management systems for law firms
US8260786B2 (en) 2002-05-24 2012-09-04 Yahoo! Inc. Method and apparatus for categorizing and presenting documents of a distributed database
US8942488B2 (en) 2004-02-13 2015-01-27 FTI Technology, LLC System and method for placing spine groups within a display
US9053179B2 (en) 2006-04-05 2015-06-09 Lexisnexis, A Division Of Reed Elsevier Inc. Citation network viewer and method
US9176642B2 (en) 2005-01-26 2015-11-03 FTI Technology, LLC Computer-implemented system and method for displaying clusters via a dynamic user interface
US9195399B2 (en) 2001-08-31 2015-11-24 FTI Technology, LLC Computer-implemented system and method for identifying relevant documents for display
US9208592B2 (en) 2005-01-26 2015-12-08 FTI Technology, LLC Computer-implemented system and method for providing a display of clusters
US9208221B2 (en) 2001-08-31 2015-12-08 FTI Technology, LLC Computer-implemented system and method for populating clusters of documents
US9275344B2 (en) 2009-08-24 2016-03-01 Fti Consulting, Inc. Computer-implemented system and method for generating a reference set via seed documents
US11068546B2 (en) 2016-06-02 2021-07-20 Nuix North America Inc. Computer-implemented system and method for analyzing clusters of coded documents

Families Citing this family (82)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002082224A2 (en) * 2001-04-04 2002-10-17 West Publishing Company System, method, and software for identifying historically related legal opinions
US7333966B2 (en) 2001-12-21 2008-02-19 Thomson Global Resources Systems, methods, and software for hyperlinking names
US7356461B1 (en) * 2002-01-14 2008-04-08 Nstein Technologies Inc. Text categorization method and apparatus
US8370761B2 (en) * 2002-02-21 2013-02-05 Xerox Corporation Methods and systems for interactive classification of objects
US20040193596A1 (en) * 2003-02-21 2004-09-30 Rudy Defelice Multiparameter indexing and searching for documents
GB0304782D0 (en) * 2003-03-03 2003-04-09 Percy Richard System and method using alphanumeric codes for the identification, description, classification and encoding of information
EP1563416A1 (en) * 2003-09-02 2005-08-17 Infoglide Software Corporation System and method for classification of documents
US7536368B2 (en) * 2003-11-26 2009-05-19 Invention Machine Corporation Method for problem formulation and for obtaining solutions from a database
BRPI0506675A (en) * 2003-12-31 2007-05-15 Thomson Global Resources system, methods, interfaces, and software to extend search results beyond the limits set by the initial query
US20050219263A1 (en) * 2004-04-01 2005-10-06 Thompson Robert L System and method for associating documents with multi-media data
US7392474B2 (en) * 2004-04-30 2008-06-24 Microsoft Corporation Method and system for classifying display pages using summaries
US7275052B2 (en) * 2004-08-20 2007-09-25 Sap Ag Combined classification based on examples, queries, and keywords
WO2006023542A2 (en) * 2004-08-23 2006-03-02 Lexisnexis, A Division Of Reed Elsevier Inc. Point of law search system and method
US20060074883A1 (en) * 2004-10-05 2006-04-06 Microsoft Corporation Systems, methods, and interfaces for providing personalized search and information access
US20060218110A1 (en) * 2005-03-28 2006-09-28 Simske Steven J Method for deploying additional classifiers
US20060282884A1 (en) * 2005-06-09 2006-12-14 Ori Pomerantz Method and apparatus for using a proxy to manage confidential information
US20070005588A1 (en) * 2005-07-01 2007-01-04 Microsoft Corporation Determining relevance using queries as surrogate content
US8019637B2 (en) * 2005-07-07 2011-09-13 Sermo, Inc. Method and apparatus for conducting an information brokering service
US9177050B2 (en) * 2005-10-04 2015-11-03 Thomson Reuters Global Resources Systems, methods, and interfaces for extending legal search results
CA2624865C (en) * 2005-10-04 2016-09-20 Thomson Global Resources Systems, methods, and software for identifying relevant legal documents
US7917519B2 (en) * 2005-10-26 2011-03-29 Sizatola, Llc Categorized document bases
US20070112833A1 (en) * 2005-11-17 2007-05-17 International Business Machines Corporation System and method for annotating patents with MeSH data
US9495349B2 (en) * 2005-11-17 2016-11-15 International Business Machines Corporation System and method for using text analytics to identify a set of related documents from a source document
US7814102B2 (en) * 2005-12-07 2010-10-12 Lexisnexis, A Division Of Reed Elsevier Inc. Method and system for linking documents with multiple topics to related documents
US20070247394A1 (en) * 2006-04-20 2007-10-25 Boyan Corydon J Display menu allowing better accessibility in a limited space
JP2007293769A (en) * 2006-04-27 2007-11-08 Sony Corp Program, information processing method and information processor
EP2033084A4 (en) * 2006-05-23 2012-04-11 David P Gold System and method for organizing, processing and presenting information
US10380231B2 (en) * 2006-05-24 2019-08-13 International Business Machines Corporation System and method for dynamic organization of information sets
JP4910582B2 (en) * 2006-09-12 2012-04-04 ソニー株式会社 Information processing apparatus and method, and program
JP2008070958A (en) * 2006-09-12 2008-03-27 Sony Corp Information processing device and method, and program
JP5240457B2 (en) * 2007-01-16 2013-07-17 日本電気株式会社 Extended recognition dictionary learning device and speech recognition system
US9460164B2 (en) 2007-01-26 2016-10-04 Recommind, Inc. Apparatus and method for single action approval of legally categorized documents
US9031947B2 (en) * 2007-03-27 2015-05-12 Invention Machine Corporation System and method for model element identification
US20080270119A1 (en) * 2007-04-30 2008-10-30 Microsoft Corporation Generating sentence variations for automatic summarization
US10083420B2 (en) 2007-11-21 2018-09-25 Sermo, Inc Community moderated information
US8788523B2 (en) * 2008-01-15 2014-07-22 Thomson Reuters Global Resources Systems, methods and software for processing phrases and clauses in legal documents
US8417694B2 (en) * 2008-03-31 2013-04-09 International Business Machines Corporation System and method for constructing targeted ranking from multiple information sources
US8713007B1 (en) 2009-03-13 2014-04-29 Google Inc. Classifying documents using multiple classifiers
US20100287177A1 (en) * 2009-05-06 2010-11-11 Foundationip, Llc Method, System, and Apparatus for Searching an Electronic Document Collection
US20100287148A1 (en) * 2009-05-08 2010-11-11 Cpa Global Patent Research Limited Method, System, and Apparatus for Targeted Searching of Multi-Sectional Documents within an Electronic Document Collection
EP2438542A2 (en) * 2009-06-05 2012-04-11 West Services, Inc. Feature engineering and user behavior analysis
US8364679B2 (en) * 2009-09-17 2013-01-29 Cpa Global Patent Research Limited Method, system, and apparatus for delivering query results from an electronic document collection
US20110082839A1 (en) * 2009-10-02 2011-04-07 Foundationip, Llc Generating intellectual property intelligence using a patent search engine
JP2011095905A (en) * 2009-10-28 2011-05-12 Sony Corp Information processing apparatus and method, and program
US20110119250A1 (en) * 2009-11-16 2011-05-19 Cpa Global Patent Research Limited Forward Progress Search Platform
US8868402B2 (en) 2009-12-30 2014-10-21 Google Inc. Construction of text classifiers
US20110295861A1 (en) * 2010-05-26 2011-12-01 Cpa Global Patent Research Limited Searching using taxonomy
US8595220B2 (en) * 2010-06-16 2013-11-26 Microsoft Corporation Community authoring content generation and navigation
US9582575B2 (en) 2010-07-09 2017-02-28 Lexisnexis, A Division Of Reed Elsevier Inc. Systems and methods for linking items to a matter
US9639602B2 (en) * 2011-02-02 2017-05-02 Nanoprep Technologies Ltd. Method for matching queries with answer items in a knowledge base
US8407208B2 (en) * 2011-02-02 2013-03-26 Nanorep Technologies Ltd Method for matching queries with answer items in a knowledge base
US8650136B2 (en) * 2011-02-24 2014-02-11 Ketera Technologies, Inc. Text classification with confidence grading
US20120278659A1 (en) * 2011-04-27 2012-11-01 Microsoft Corporation Analyzing Program Execution
US9348852B2 (en) 2011-04-27 2016-05-24 Microsoft Technology Licensing, Llc Frequent pattern mining
US9519883B2 (en) 2011-06-28 2016-12-13 Microsoft Technology Licensing, Llc Automatic project content suggestion
US20130006986A1 (en) * 2011-06-28 2013-01-03 Microsoft Corporation Automatic Classification of Electronic Content Into Projects
WO2013123182A1 (en) * 2012-02-17 2013-08-22 The Trustees Of Columbia University In The City Of New York Computer-implemented systems and methods of performing contract review
US9336302B1 (en) 2012-07-20 2016-05-10 Zuci Realty Llc Insight and algorithmic clustering for automated synthesis
US9256836B2 (en) 2012-10-31 2016-02-09 Open Text Corporation Reconfigurable model for auto-classification system and method
US20150262105A1 (en) * 2013-03-12 2015-09-17 Thomson Reuters Global Resources Workflow software structured around taxonomic themes of regulatory activity
EP2992457A4 (en) * 2013-05-01 2016-11-09 Hewlett Packard Development Co Content classification
US10599753B1 (en) 2013-11-11 2020-03-24 Amazon Technologies, Inc. Document version control in collaborative environment
US9542391B1 (en) 2013-11-11 2017-01-10 Amazon Technologies, Inc. Processing service requests for non-transactional databases
US10540404B1 (en) * 2014-02-07 2020-01-21 Amazon Technologies, Inc. Forming a document collection in a document management and collaboration system
US11336648B2 (en) 2013-11-11 2022-05-17 Amazon Technologies, Inc. Document management and collaboration system
US10691877B1 (en) 2014-02-07 2020-06-23 Amazon Technologies, Inc. Homogenous insertion of interactions into documents
US9740748B2 (en) * 2014-03-19 2017-08-22 International Business Machines Corporation Similarity and ranking of databases based on database metadata
WO2015187129A1 (en) * 2014-06-03 2015-12-10 Hewlett-Packard Development Company, L.P. Document classification based on multiple meta-algorithmic patterns
US9807073B1 (en) 2014-09-29 2017-10-31 Amazon Technologies, Inc. Access to documents in a document management and collaboration system
US20160103823A1 (en) * 2014-10-10 2016-04-14 The Trustees Of Columbia University In The City Of New York Machine Learning Extraction of Free-Form Textual Rules and Provisions From Legal Documents
WO2016093836A1 (en) 2014-12-11 2016-06-16 Hewlett Packard Enterprise Development Lp Interactive detection of system anomalies
US20160350765A1 (en) 2015-05-27 2016-12-01 Ascent Technologies Inc. System and interface for viewing modularized and taxonomy-based classification of regulatory obligations qualitative data
US10803074B2 (en) 2015-08-10 2020-10-13 Hewlett Packard Enterprise Development LP Evaluating system behaviour
WO2017216627A1 (en) * 2016-06-16 2017-12-21 Thomson Reuters Global Resources Unlimited Company Scenario analytics system
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis
US10419269B2 (en) 2017-02-21 2019-09-17 Entit Software Llc Anomaly detection
US11361569B2 (en) * 2017-08-03 2022-06-14 Koninklijke Philips N.V. Hierarchical neural networks with granularized attention
CN108415959B (en) * 2018-02-06 2021-06-25 北京捷通华声科技股份有限公司 Text classification method and device
US11106664B2 (en) * 2018-05-03 2021-08-31 Thomson Reuters Enterprise Centre Gmbh Systems and methods for generating a contextually and conversationally correct response to a query
US11640504B2 (en) 2019-05-17 2023-05-02 Samsung Electronics Co., Ltd. Electronic apparatus and controlling method thereof
WO2024059593A1 (en) * 2022-09-12 2024-03-21 Thomson Reuters Enterprise Centre Gmbh Interactive tool for determining a headnote report
CN116758560B (en) * 2023-08-16 2023-11-17 湖北微模式科技发展有限公司 Document image classification method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5708825A (en) * 1995-05-26 1998-01-13 Iconovex Corporation Automatic summary page creation and hyperlink generation
US5794236A (en) * 1996-05-29 1998-08-11 Lexis-Nexis Computer-based system for classifying documents into a hierarchy and linking the classifications to the hierarchy
US5815392A (en) * 1993-03-24 1998-09-29 Engate Incorporated Attorney terminal having outline preparation capabilities for managing trial proceedings

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5157783A (en) 1988-02-26 1992-10-20 Wang Laboratories, Inc. Data base system which maintains project query list, desktop list and status of multiple ongoing research projects
US5321833A (en) * 1990-08-29 1994-06-14 Gte Laboratories Incorporated Adaptive ranking system for information retrieval
US5488725A (en) 1991-10-08 1996-01-30 West Publishing Company System of document representation retrieval by successive iterated probability sampling
US5265065A (en) 1991-10-08 1993-11-23 West Publishing Company Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query
US7249026B1 (en) * 1993-03-24 2007-07-24 Engate Llc Attorney terminal having outline preparation capabilities for managing trial proceedings
US5497317A (en) 1993-12-28 1996-03-05 Thomson Trading Services, Inc. Device and method for improving the speed and reliability of security trade settlements
US5434932A (en) 1994-07-28 1995-07-18 West Publishing Company Line alignment apparatus and process
US5642502A (en) * 1994-12-06 1997-06-24 University Of Central Florida Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text
JP3603392B2 (en) * 1995-07-06 2004-12-22 株式会社日立製作所 Document classification support method and apparatus
US5644720A (en) 1995-07-31 1997-07-01 West Publishing Company Interprocess communications interface for managing transaction requests
JPH09153049A (en) * 1995-11-29 1997-06-10 Hitachi Ltd Method and device for supporting document classification
US7051024B2 (en) * 1999-04-08 2006-05-23 Microsoft Corporation Document summarizer for word processors
US6038560A (en) * 1997-05-21 2000-03-14 Oracle Corporation Concept knowledge base search and retrieval system
JP3001460B2 (en) * 1997-05-21 2000-01-24 株式会社エヌイーシー情報システムズ Document classification device
US5940821A (en) * 1997-05-21 1999-08-17 Oracle Corporation Information presentation in a knowledge base search and retrieval system
US6137911A (en) * 1997-06-16 2000-10-24 The Dialog Corporation Plc Test classification system and method
JPH1115848A (en) * 1997-06-26 1999-01-22 Sharp Corp Information sorting device, document information sorting method and recording medium to be used for execution of the method
JPH11110409A (en) * 1997-10-07 1999-04-23 Ntt Data Corp Method for classifying information and device therefor
US6289342B1 (en) * 1998-01-05 2001-09-11 Nec Research Institute, Inc. Autonomous citation indexing and literature browsing using citation context
US6533822B2 (en) * 1998-01-30 2003-03-18 Xerox Corporation Creating summaries along with indicators, and automatically positioned tabs
US6584479B2 (en) * 1998-06-17 2003-06-24 Xerox Corporation Overlay presentation of textual and graphical annotations
US6772149B1 (en) * 1999-09-23 2004-08-03 Lexis-Nexis Group System and method for identifying facts and legal discussion in court case law documents

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1212699A4 *

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8250118B2 (en) 1998-07-21 2012-08-21 West Services, Inc. Systems, methods, and software for presenting legal case histories
US8661066B2 (en) 1998-07-21 2014-02-25 West Service, Inc. Systems, methods, and software for presenting legal case histories
US8600974B2 (en) 1998-07-21 2013-12-03 West Services Inc. System and method for processing formatted text documents in a database
US7529756B1 (en) 1998-07-21 2009-05-05 West Services, Inc. System and method for processing formatted text documents in a database
US7778954B2 (en) 1998-07-21 2010-08-17 West Publishing Corporation Systems, methods, and software for presenting legal case histories
EP1213665A3 (en) * 2000-12-07 2003-12-10 Patentmall Limited Patent classification displaying method and apparatus
EP1213665A2 (en) * 2000-12-07 2002-06-12 Patentmall Limited Patent classification displaying method and apparatus
US9619551B2 (en) 2001-08-31 2017-04-11 Fti Technology Llc Computer-implemented system and method for generating document groupings for display
US9558259B2 (en) 2001-08-31 2017-01-31 Fti Technology Llc Computer-implemented system and method for generating clusters for placement into a display
US9208221B2 (en) 2001-08-31 2015-12-08 FTI Technology, LLC Computer-implemented system and method for populating clusters of documents
US9195399B2 (en) 2001-08-31 2015-11-24 FTI Technology, LLC Computer-implemented system and method for identifying relevant documents for display
EP1449111A1 (en) * 2001-10-30 2004-08-25 Goldman, Sachs & Co. Systems and method for facilitating access to documents via associated tags
US8028001B2 (en) 2001-10-30 2011-09-27 Goldman Sachs & Co. Systems and methods for facilitating access to documents via a set of content selection tags
EP1449111A4 (en) * 2001-10-30 2006-08-23 Goldman Sachs & Co Systems and method for facilitating access to documents via associated tags
US8554803B2 (en) 2001-10-30 2013-10-08 Goldman, Sachs & Co. Systems and methods for facilitating access to documents via a set of content selection tags
US7580939B2 (en) 2001-11-02 2009-08-25 Thomson Reuters Global Resources Systems, methods, and software for classifying text from judicial opinions and other documents
US7062498B2 (en) 2001-11-02 2006-06-13 Thomson Legal Regulatory Global Ag Systems, methods, and software for classifying text from judicial opinions and other documents
WO2003040875A2 (en) * 2001-11-02 2003-05-15 West Publishing Company Doing Business As West Group Systems, methods, and software for classifying documents
WO2003040875A3 (en) * 2001-11-02 2003-08-07 West Publishing Company Doing Systems, methods, and software for classifying documents
EP1324219A1 (en) * 2001-12-11 2003-07-02 Abb Research Ltd. Method of searching based on categories for information objects in information pools and system to find such information objects
US7412463B2 (en) 2002-01-11 2008-08-12 Bloomberg Finance L.P. Dynamic legal database providing historical and current versions of bodies of law
EP1483691A2 (en) * 2002-03-11 2004-12-08 The Boeing Company Knowledge management using text classification
FR2840088A1 (en) * 2002-05-24 2003-11-28 Overture Services Inc Search engine and database for distributed database in computer, has computing apparatus with transactional score generator and category assigner in communication with Internet cache of memory device
EP1367509A3 (en) * 2002-05-24 2005-08-31 Overture Services, Inc. Method and apparatus for categorizing and presenting documents of a distributed database
US7792818B2 (en) 2002-05-24 2010-09-07 Overture Services, Inc. Method and apparatus for categorizing and presenting documents of a distributed database
US7231395B2 (en) 2002-05-24 2007-06-12 Overture Services, Inc. Method and apparatus for categorizing and presenting documents of a distributed database
AU2003204327B2 (en) * 2002-05-24 2006-12-21 Excalibur Ip, Llc Method and Apparatus for Categorizing and Presenting Documents of a Distributed Database
US8260786B2 (en) 2002-05-24 2012-09-04 Yahoo! Inc. Method and apparatus for categorizing and presenting documents of a distributed database
US10832212B2 (en) 2002-12-30 2020-11-10 Thomson Reuters Enterprise Centre Gmbh Systems and methods for managing documents for law firms
US8126818B2 (en) 2002-12-30 2012-02-28 West Publishing Company Knowledge-management systems for law firms
US9710786B2 (en) 2002-12-30 2017-07-18 Thomson Reuters Global Resources Systems and methods for managing documents for law firms
US9619909B2 (en) 2004-02-13 2017-04-11 Fti Technology Llc Computer-implemented system and method for generating and placing cluster groups
US9858693B2 (en) 2004-02-13 2018-01-02 Fti Technology Llc System and method for placing candidate spines into a display with the aid of a digital computer
US9245367B2 (en) 2004-02-13 2016-01-26 FTI Technology, LLC Computer-implemented system and method for building cluster spine groups
US8942488B2 (en) 2004-02-13 2015-01-27 FTI Technology, LLC System and method for placing spine groups within a display
US9984484B2 (en) 2004-02-13 2018-05-29 Fti Consulting Technology Llc Computer-implemented system and method for cluster spine group arrangement
US9082232B2 (en) 2004-02-13 2015-07-14 FTI Technology, LLC System and method for displaying cluster spine groups
US9495779B1 (en) 2004-02-13 2016-11-15 Fti Technology Llc Computer-implemented system and method for placing groups of cluster spines into a display
US9384573B2 (en) 2004-02-13 2016-07-05 Fti Technology Llc Computer-implemented system and method for placing groups of document clusters into a display
US9342909B2 (en) 2004-02-13 2016-05-17 FTI Technology, LLC Computer-implemented system and method for grafting cluster spines
US9176642B2 (en) 2005-01-26 2015-11-03 FTI Technology, LLC Computer-implemented system and method for displaying clusters via a dynamic user interface
US9208592B2 (en) 2005-01-26 2015-12-08 FTI Technology, LLC Computer-implemented system and method for providing a display of clusters
US9053179B2 (en) 2006-04-05 2015-06-09 Lexisnexis, A Division Of Reed Elsevier Inc. Citation network viewer and method
US8909647B2 (en) 2009-07-28 2014-12-09 Fti Consulting, Inc. System and method for providing classification suggestions using document injection
US9064008B2 (en) 2009-07-28 2015-06-23 Fti Consulting, Inc. Computer-implemented system and method for displaying visual classification suggestions for concepts
WO2011017133A3 (en) * 2009-07-28 2011-03-31 Fti Technology Llc Providing a classification suggestion for concepts
US9336303B2 (en) 2009-07-28 2016-05-10 Fti Consulting, Inc. Computer-implemented system and method for providing visual suggestions for cluster classification
WO2011017134A1 (en) * 2009-07-28 2011-02-10 Fti Technology Llc Displaying relationships between concepts to provide classification suggestions via injection
WO2011017065A1 (en) * 2009-07-28 2011-02-10 Fti Technology Llc Displaying relationships between electronically stored information to provide classification suggestions via inclusion
US9477751B2 (en) 2009-07-28 2016-10-25 Fti Consulting, Inc. System and method for displaying relationships between concepts to provide classification suggestions via injection
US10083396B2 (en) 2009-07-28 2018-09-25 Fti Consulting, Inc. Computer-implemented system and method for assigning concept classification suggestions
US9165062B2 (en) 2009-07-28 2015-10-20 Fti Consulting, Inc. Computer-implemented system and method for visual document classification
US9542483B2 (en) 2009-07-28 2017-01-10 Fti Consulting, Inc. Computer-implemented system and method for visually suggesting classification for inclusion-based cluster spines
WO2011017098A1 (en) * 2009-07-28 2011-02-10 Fti Technology Llc Displaying relationships between electronically stored information to provide classification suggestions via nearest neighbor
WO2011017152A3 (en) * 2009-07-28 2011-04-07 Fti Technology Llc Displaying relationships between concepts to provide classification suggestions via nearest neighbor
WO2011017064A3 (en) * 2009-07-28 2011-03-31 Fti Technology Llc Providing a classification suggestion for electronically stored information
WO2011017155A1 (en) * 2009-07-28 2011-02-10 Fti Technology Llc Displaying relationships between concepts to provide classification suggestions via inclusion
WO2011017080A1 (en) * 2009-07-28 2011-02-10 Fti Technology Llc Displaying relationships between electronically stored information to provide classification suggestions via injection
US9898526B2 (en) 2009-07-28 2018-02-20 Fti Consulting, Inc. Computer-implemented system and method for inclusion-based electronically stored information item cluster visual representation
US9275344B2 (en) 2009-08-24 2016-03-01 Fti Consulting, Inc. Computer-implemented system and method for generating a reference set via seed documents
US9489446B2 (en) 2009-08-24 2016-11-08 Fti Consulting, Inc. Computer-implemented system and method for generating a training set for use during document review
US10332007B2 (en) 2009-08-24 2019-06-25 Nuix North America Inc. Computer-implemented system and method for generating document training sets
US9336496B2 (en) 2009-08-24 2016-05-10 Fti Consulting, Inc. Computer-implemented system and method for generating a reference set via clustering
US11068546B2 (en) 2016-06-02 2021-07-20 Nuix North America Inc. Computer-implemented system and method for analyzing clusters of coded documents

Also Published As

Publication number Publication date
US20020138529A1 (en) 2002-09-26
EP1212699A1 (en) 2002-06-12
JP4732593B2 (en) 2011-07-27
US20070038625A1 (en) 2007-02-15
CA2371688C (en) 2008-09-09
EP1212699A4 (en) 2006-01-11
AU4989800A (en) 2000-11-17
US7065514B2 (en) 2006-06-20
CA2371688A1 (en) 2000-11-09
US7567961B2 (en) 2009-07-28
WO2000067162A9 (en) 2002-06-06
AU781157B2 (en) 2005-05-12
NZ515293A (en) 2004-04-30
JP2002543528A (en) 2002-12-17

Similar Documents

Publication Publication Date Title
CA2371688C (en) Document-classification system, method and software
US6363379B1 (en) Method of clustering electronic documents in response to a search query
US8341159B2 (en) Creating taxonomies and training data for document categorization
US7617176B2 (en) Query-based snippet clustering for search result grouping
US6691108B2 (en) Focused search engine and method
Salton et al. Parallel text search methods
US7778954B2 (en) Systems, methods, and software for presenting legal case histories
US7496567B1 (en) System and method for document categorization
Kim et al. Automatic MeSH term assignment and quality assessment.
US20060200461A1 (en) Process for identifying weighted contextural relationships between unrelated documents
US20040049499A1 (en) Document retrieval system and question answering system
Raghavan et al. Experiments on the determination of the relationships between terms
JP2001184358A (en) Device and method for retrieving information with category factor and program recording medium therefor
Attardi et al. Theseus: categorization by context
Moschitti Answer filtering via text categorization in question answering systems
Wei et al. A mining-based category evolution approach to managing online document categories
Panda et al. A domain classification-based information retrieval system
Weiss et al. Lightweight document matching for help-desk applications
WO2002037328A2 (en) Integrating search, classification, scoring and ranking
Chakrabarti et al. Topic distillation and spectral filtering
WO2001039008A1 (en) Method and system for collecting topically related resources
Chi et al. Context query in information retrieval
Yang-Stephens et al. Computer-assisted classification of legal abstracts
Lancaster Mechanized document control: A review of some recent research
Atlam A new approach for text similarity using articles

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 2004-01-01)
ENP Entry into the national phase

Ref document number: 2371688

Country of ref document: CA

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: 49898/00

Country of ref document: AU

REEP Request for entry into the european phase

Ref document number: 2000932127

Country of ref document: EP

WWE WIPO information: entry into national phase

Ref document number: 2000932127

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2000 615932

Country of ref document: JP

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: 10013190

Country of ref document: US

WWE WIPO information: entry into national phase

Ref document number: 515293

Country of ref document: NZ

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

AK Designated states

Kind code of ref document: C2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 1/7-7/7, DRAWINGS, REPLACED BY NEW PAGES 1/7-7/7; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

WWP WIPO information: published in national office

Ref document number: 2000932127

Country of ref document: EP

WWG WIPO information: grant in national office

Ref document number: 49898/00

Country of ref document: AU