WO2000043911A1 - Method and apparatus for improved document searching - Google Patents

Method and apparatus for improved document searching

Info

Publication number
WO2000043911A1
Authority
WO
WIPO (PCT)
Prior art keywords
nominal
item
connector
items
words
Prior art date
Application number
PCT/US1999/001299
Other languages
French (fr)
Inventor
Sam Christy
Original Assignee
Wordstream, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wordstream, Inc. filed Critical Wordstream, Inc.
Priority to PCT/US1999/001299 priority Critical patent/WO2000043911A1/en
Priority to AU24636/99A priority patent/AU2463699A/en
Publication of WO2000043911A1 publication Critical patent/WO2000043911A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/313Selection or weighting of terms for indexing

Definitions

  • search engines To allow Internet users to focus their searching efforts, several firms have created free-of-charge sites called “search engines.” These systems maintain huge and constantly growing databases duplicating the text (or portions thereof) of thousands or even millions of documents accessible over the Internet, and permit "visitors" to the site to formulate queries that the search engine applies to its database.
  • the search engine retrieves documents matching the query, often ranked in order of relevance (e.g., in terms of the frequency and location of word matches or some other statistical measure).
  • a document may, for example, repeat a key word over and over in its invisible header, thereby ensuring that matches to queries containing the key word will receive a high relevance rank (since each repetition in the header counts as a separate match).
  • electronically accessible documents are provided with abstracts written in a highly constrained artificial grammar.
  • sentences are bracketed in the manner of an algebraic equation.
  • the artificial grammar is capable of expressing the thoughts and information ordinarily conveyed in a natural grammar, but in a structured format that restricts the number of possible alternative meanings. Accordingly, while the grammar is clear in the sense of being easily understood by native speakers of the vocabulary and complex in its ability to express sophisticated concepts, sentences are derived from an organized vocabulary according to fixed rules.
  • a query preferably formulated in accordance with these rules, is employed by a search engine in the usual fashion.
  • the query is readily used to identify the most relevant documents merely by examination of document headers. Furthermore, because the abstracts are contained within the invisible header portion of a document, their presence disturbs neither the appearance of the document nor the operation of ordinary search routines.
  • the document header can contain an abstract in accordance with the invention as well as the usual key words, so that standard searches as well as the searches as described herein can coexist without mutual interference.
  • the vocabulary may be represented in a series of physically or logically distinct databases, each containing entries representing a form class as defined in the grammar.
  • the databases are constructed to minimize the occurrence of synonymous terms, thereby reducing the chances of false-negative search results.
  • sentences are composed of "linguistic units," each of which may be one or a few words, from the allowed form classes.
  • These classes are "things” or nominal terms that connote, for example, people, places, items, activities or ideas; "connectors” that specify relationships between two (or more) nominal terms; “descriptors” modifying the state of one or more nominal terms; and “logical connectors” establishing sets of the nominal terms.
  • the list of all allowed entries in all four classes represents the global lexicon of the invention.
  • entries from the form classes are combined according to four expansion rules detailed below. These rules can be followed explicitly in a stepwise fashion to produce sentences, but more typically, once the user is accustomed to the grammar, sentences are constructed by "feel" and, if necessary, subsequently tested for conformity with the expansion rules.
  • FIG. 1 schematically illustrates application of the expansion rules of the present invention
  • FIG. 2 is a schematic representation of a hardware system embodying the invention.
  • FIG. 3 schematically illustrates operation of the invention.
  • the grammar of the present invention makes use of a lexicon and a constrained set of rules.
  • the rules divide the allowed vocabulary— i.e., the entire English language treated as linguistic units or a subset thereof, either of which represents a global lexicon of linguistic units— into four classes.
  • Each linguistic unit is (1) a single word, such as "dog" or "government"; or (2) a hyphenated combination of words, such as "parking-space" or "prime-minister"; or (3) a proper name; or (4) a word with a definition unique to the invention; or (5) one form of a word with multiple meanings.
  • each definition of the word represents a different linguistic unit, the various definitions may appear as entries in different form classes.
  • each definition may be distinguished, for example, by the number of periods appearing at the end of the word.
  • the entry for the first (arbitrarily designated) definition is listed with no period, the entry representing the second definition is listed with one period at its end, and so on.
  • different word senses can be identified numerically, e.g., using subscripts.
  • Words unique to the invention may make up a very small proportion of the total lexicon, and none of these words is specific to the invention or alien to the natural language upon which it is based. Instead, invention-specific words are broadened in connotation to limit the overall number of terms in the lexicon.
  • the word “use” is broadened to connote employment of any object for its primary intended purpose, so that in the sentence "Jake use book,” the term connotes reading.
  • the word “on” may be used to connote time (e.g., (i go-to ballgame) on yesterday). If desired for ease of use, however, the invention-specific words can be eliminated altogether and the lexicon expanded accordingly.
  • the invention divides the global lexicon of allowed terms into four classes: "things" or nominal terms that connote, for example, people, places, items, activities or ideas, identified herein by the code T; "connectors" that specify relationships between two (or more) nominal terms (including words typically described as prepositions and conjunctions, and terms describing relationships in terms of action, being, or states of being), identified herein by C; "descriptors" modifying the state of one or more nominal terms (including words typically described as adjectives, adverbs and intransitive verbs), identified herein by D; and "logical connectors" establishing sets of the nominal terms, identified herein by C.
  • Exemplary constrained lists of nominal terms, connectors and descriptors are set forth in Appendices 1-3, respectively. The preferred logical connectors are "and” and "or.”
  • verb tenses are not employed, since these tend to create more ambiguity than they resolve; connectors are phrased in the present tense, since tense is easily understood from context. Tense may nonetheless be indicated, however, by specifying a time, day and/or date. Alternatively, if tense is considered important, it may be indicated by symbolic signals such as "/" for past, "
  • Sentences in accordance with the invention are constructed from terms in the lexicon according to four expansion rules.
  • the most basic sentences proceed from one of the following three constructions (any of which can be created from a T term in accordance with the expansion rules set forth hereinbelow).
  • These structures which represent the smallest possible sets of words considered to carry information, are the building blocks of more complex sentences. Their structural simplicity facilitates ready translation into conversational, natural-language sentences; thus, even complex sentences in accordance with the invention are easily transformed into natural-language equivalents through modular analysis of the more basic sentence components (a process facilitated by the preferred representations described later).
  • Basic Structure 1 (BS1) is formed by placing a descriptor after a nominal term to form the structure TD.
  • BS1 sentences such as "dog brown” and “Bill swim” readily translate into the English sentence “the dog is brown” (or the phrase “the brown dog") and "Bill swims.”
  • BS2 is formed by placing a connector between two nominal terms to form the structure TCT.
  • BS2 sentences such as "dog eat food” readily translate into English equivalents.
  • a sentence comprising one or more of the basic structures set forth above may be expanded using the following rules:
  • any linguistic unit from the nominal class can be expanded into the original item followed by a new item from the descriptor class, which modifies the original item. For example, "dog” becomes “dog big.”
  • Rule I is not limited in its application to an isolated nominal term (although this is how BS1 sentences are formed); instead, it can be applied to any nominal term regardless of location within a larger sentence.
  • $TD_1 \rightarrow (TD_2)D_1$.
  • "dog big” becomes “(dog brown) big” (corresponding to English sentence, "the brown dog is big”).
  • the order of addition may or may not be important in the case of consecutive adjectives, since these independently modify T; for example, in “(dog big) brown,” the adjective “big” distinguishes this dog from other dogs, and “brown” may describe a feature thought to be otherwise unknown to the listener.
  • the order of addition is almost always important where a D term is an intransitive verb. For example, expanding the TD sentence "dog run” (corresponding to “the dog runs” or "the running dog") by addition of the descriptor "fast” forms, in accordance with Rule I, "(dog fast) run” (corresponding to "the fast dog runs”).
  • any linguistic unit from the nominal class can be replaced with a connector surrounded by two nominal entries, one of which is the original linguistic unit.
  • "house” becomes “house on hill.”
  • Applying expansion Rule IIa to BS1 produces $TD \rightarrow (TCT)D$; for example, "gloomy house" becomes "(house on hill) gloomy," or "the house on the hill is gloomy."
  • Rule IIa can be used to add a transitive verb and its object.
  • the compound term “mother and father” can be expanded to "(mother and father) drive car.”
  • any linguistic unit from the nominal class can be replaced with a connector surrounded by two nominal entries, one of which is the original linguistic unit.
  • "dog” becomes “dog and cat.”
  • a nominal term can be a composite consisting of two or more nominal terms joined by a connector. For example, the expansion "(John and bill) go-to market" satisfies Rule IIa. Subsequently applying Rule I, this sentence can be further expanded to "((John and bill) go-to market) together."
  • a descriptor can be replaced with a logical connector surrounded by two descriptors, one of which is the original. For example, “big” becomes “big and brown.”
  • Applying expansion Rule III to BS1 produces $TD \rightarrow T(DCD)$; for example "dog big" (equivalent to "the dog is big," or "the big dog") becomes "dog (big and brown)" (equivalent to "the dog is big and brown" or "the big brown dog").
  • any of the three basic structures can be formed by following expansion Rules I, IIa and IIb as shown at 112, 114, 116, respectively, to produce "cat striped" (BS1), "cat on couch" (BS2) or "cat and Sue." Iterative application of expansion rule IIa at 118 and 119 produces structures of the forms $TC_1T_1 \rightarrow (TC_1T_1)C_2T_2$ or "((cat on couch) eat mouse)" and $(TC_1T_1)C_2T_2 \rightarrow ((TC_1T_1)C_2T_2)C_3T_3$ or "(((cat on couch) eat mouse) with tail)." Expansion rule I can be applied at any point to a T linguistic unit as shown at 122 (to modify the original T, cat, to produce "(happy cat) on couch") and 124 (to modify "eat mouse"). Rule III can also be applied as shown at 126 (to further modify cat
  • Expansion Rule I can be applied iteratively as shown at 112, 130 to further modify the original T (although, as emphasized at 130, a descriptor need not be an adjective).
  • Expansion Rule IIa is available to show action of the modified T (as shown at 132), and Rule I can be used to modify the newly introduced T (as shown at 134).
  • Rule I can also be used to modify (in the broad sense of the invention) a compound subject formed by Rule IIb, as shown at 136.
  • the order in which linguistic units are assembled can strongly affect meaning.
  • the expansion $TC_1T_1 \rightarrow (TC_1T_1)C_2T_2$ can take multiple forms.
  • the construct "cat hit (ball on couch)" conveys a meaning different from "cat hit ball (on couch)." In the former the ball is definitely on the couch, and in the latter the action is taking place on the couch.
  • the sentence “(John want car) fast” indicates that the action should be accomplished quickly, while "(John want (car fast))” means that the car should move quickly.
  • the query "troops in China," which is an acceptable formulation in accordance with the grammar of the invention, would retrieve the last entry (10) as the most relevant, since only sentence 10 contains the information unit "troops in China" or a one-to-one underlying grammatical relationship between the words in the query and the words in the sentence.
  • Queries are processed according to a routine that extracts "information units" in sentences constructed according to the invention. For example, in the sentence,
  • the information units represent the most basic elements of information content in the sentence, as well as their combinations.
  • the sentence would be meaningful for a searcher looking not only for information specifically concerning President Clinton's visit to an aircraft carrier in the Persian Gulf in January 1997.
  • a searcher might, for example, be interested generally in the president's itinerary for January 1997, or events in the Persian Gulf at this time.
  • step 2 e.g., (% near beach)
  • step 2 e.g., (I like %); identify the following information units: "I like house,” “I like house on hill,” “I like house,” “I like house near beach,” “I like house,” “I like house on hill near beach”
  • step 2 produces empty brackets then remove all duplicate sentences from identified information units
  • "Things" in the first place of a set generally act as subjects, while "things" in the end place of a set generally act as objects; e.g., in the sentence (cat hit dog), "cat" is the primary "Thing" or subject, and "dog" is the secondary "Thing." Accordingly, in the sentence ((cat with hat) see dog) the routine does not produce the information unit "hat see dog," but does produce the information unit "cat see dog."
  • a server (typically a powerful computer or cluster of computers that behaves as a single computer) services the requests of a large number of smaller computers, or clients, which connect to it.
  • the client computers usually communicate with a single server at any one time, although they can communicate with one another via the server or can use the server to reach other servers.
  • a server is typically a large mainframe or minicomputer cluster, while the clients may be simple personal computers. Servers providing Internet access to multiple subscriber clients are referred to as "gateways"; more generally, a gateway is a computer system that connects two computer networks.
  • the Internet supports a large variety of information-transfer protocols.
  • World Wide Web hereafter, simply, the "web”
  • Web-accessible information is identified by a uniform resource locator or "URL,” which specifies the location of the file in terms of a specific computer and a location on that computer. Any Internet “node” can access the file by invoking the proper communication protocol and specifying the URL.
  • a URL has the format http:// ⁇ host>/ ⁇ path> , where "http” refers to the HyperText Transfer Protocol, "host” is the server's Internet identifier, and the "path” specifies the location of the file within the server.
  • Each "web site" can make available one or more web "pages" or documents, which are formatted, tree-structured repositories of information, such as text, images, sounds and animations.
  • a link appears unobtrusively as an underlined portion of text in a document; when the viewer of this document moves his cursor over the underlined text and clicks, the link— which is otherwise invisible to the user— is executed and the linked document retrieved. That document need not be located on the same server as the original document.
  • Hypertext and document-retrieval functionality is typically implemented on the client machine, using a computer program called a "web browser."
  • With the client connected as an Internet node, the browser, operating as a process on the client machine, utilizes URLs (provided either by the user or a link) to locate, fetch and display the specified documents.
  • the browser passes the URL to a protocol handler on the associated server, which then retrieves the information and sends it to the browser for display; the browser causes the information to be cached (usually on a hard disk) on the client machine.
  • A representative client machine implementing the present invention is shown in FIG. 2.
  • the system includes a main bidirectional bus 200, over which all system components communicate.
  • a network interface 208 connects, generally via telephone dial-up, to a gateway or other Internet access provider. As a result the client machine becomes a node on the Internet, capable of exchanging data with other Internet computers.
  • the user interacts with the system using a keyboard 210 and a position-sensing device (e.g., a mouse) 212.
  • the output of either device can be used to designate information or select particular areas of a screen display 214 to direct functions to be performed by the system.
  • the main memory 204 contains a group of modules that control the operation of CPU 206 and its interaction with the other hardware components.
  • An operating system 220 directs the execution of low-level, basic system functions such as memory allocation, file management, and operation of mass storage devices 202.
  • an analysis module 225, implemented as a series of stored instructions, may be included to assist the user in developing queries, or to detect queries that do not accord with the above-described rules (or which fall outside the global lexicon). Instructions defining a user interface 230 allow straightforward interaction over screen display 214.
  • User interface 230 provides functionality for generating words or graphical images on display 214 to prompt action by the user, and for accepting user commands from keyboard 210 and/or position-sensing device 212.
  • a web browser 232 facilitates interaction with the web via network interface 208. Browser 232 may be integrated with user interface 230, deriving therefrom the functionality necessary for interaction with the user. Suitable browsers are well known and readily available; these include the EXPLORER browser marketed by Microsoft Corp., and the COMMUNICATOR and NAVIGATOR products supplied by Netscape Communications Corp.
  • main memory 204 may also include a partition defining a series of databases capable of storing the linguistic units of the invention; these are representatively denoted by reference numerals 235₁, 235₂, 235₃, 235₄.
  • Databases 235 which may be physically distinct (i.e., stored in different memory partitions and as separate files on storage device 202) or logically distinct (i.e., stored in a single memory partition as a structured list that may be addressed as a plurality of databases), each contain all of the linguistic units corresponding to a particular class. In other words, each database is organized as a table each of whose columns lists all of the linguistic units of the particular class.
  • Nominal terms may be contained in database 235₁, and a representative example of the contents of that database appears in Appendix 1 hereto; connectors may be contained in database 235₂, a representative example of which appears in Appendix 2 hereto; descriptors may be contained in database 235₃, a representative example of which appears in Appendix 3 hereto; and logical connectors (most simply, "and" and "or") are contained in database 235₄.
  • the appendices may simply contain lists of linguistic units, but are preferably formatted in three columns: the first containing the linguistic unit, the second containing a definition (if the linguistic unit has more than one meaning and is therefore replicated in the database), and the third containing synonyms.
  • An input buffer 240 receives from the user, via keyboard 210, an input sentence.
  • Analysis module 225 examines the input sentence for conformance to the structure, and makes corrections as necessary.
  • Analysis module 225 enters a proposed sentence revision (or the unmodified sentence, if no changes were necessary) into an output buffer 245, the contents of which are presented to the user over screen display 214 (e.g., as a pop-up window in the browser display). The user is free to accept the revision or revise it; in the latter case, analysis module 225 once again reviews the sentence for conformance to the above-described rules, and enters the approved sentence or a proposed revision into output buffer 245.
  • analysis module 225 first determines whether each linguistic unit has more than one meaning (i.e., definition). If so, the user is prompted (via screen display 214) to choose the entry with the intended meaning. If a linguistic unit has one or more associated synonyms, these are offered to the user as alternatives. Furthermore, if a synonym is linked to more than one linguistic unit, all of these are offered as alternatives.
  • although the modules of main memory 204 have been described separately, this is for clarity of presentation only; so long as the system performs all necessary functions, it is immaterial how they are distributed within the system and the programming architecture thereof.
  • the browser 232 is capable of establishing connection, via network interface 208, to one or more remote sources 300.
  • These sources are servers containing one or more web pages that include text and rendering instructions.
  • browser 232 executes the rendering instructions to create on screen 214 a display that includes the text, as well as graphical and/or image portions, of the web page.
  • Each web page may be stored on remote source 300 as a document containing a body portion 302b and a header portion 302h. Only the body portion 302b is actually visible when the web page is "visited," that is, downloaded onto the client computer (usually accompanied by further interaction with the server).
  • Web pages are stored as a database 310 on a search engine 315, i.e., a specialized server computer equipped to apply to database 310 queries received from connected client computers.
  • Typically, the entire textual portion of each stored web page appears in database 310.
  • Search engine 315 applies a client-originated query to database 310 and generates a report listing the web pages matching the search criteria.
  • the various search engines differ in their operating characteristics, but generally the results of the search appear as a list of hypertext links to the identified web pages, each link being accompanied by a portion of the text.
  • browser 232 performs a sequence of steps that is initiated by the user's acceptance of the query in output buffer 245, shown as a step 320.
  • Browser 232 then transmits the query (step 322), via network interface 208, to a search engine 315 with which the client computer has established an Internet connection.
  • the search engine 315 applies the query to its database 310 (preferably in accordance with the query-processing routine described above), identifying relevant web pages, and returning a list of hypertext links thereto.
  • the list is ranked hierarchically to reflect both the absolute number of word or information-unit matches between the query and the listed documents as well as other factors suggesting relevance; for example, a document in which word order is preserved or the query terms are found in close proximity to one another may be ranked higher than another document with the same number of word matches but where the words are separated or scattered.
  • the invention is capable of extending its search to a desired level of estimated relevance, ordering the retrieved documents according to relevance criteria.
  • the list of documents is received by browser 232 in step 324.
  • the client user may operate browser 232 to execute selected ones of the returned links in step 326, resulting in download and display of the linked web pages in step 328.
  • the headers 302h of documents 302 each contain both key words descriptive of the contents of the web page and an abstract, composed in accordance with the grammar hereinabove described, which also describes the subject matter.
  • search engine 315 may prompt the user to designate whether the query is structured or unstructured, or may simply infer this from the query itself, or may instead simply search for matches regardless of the query format. If the query is identified as structured, search engine 315 may apply the search only to the structured portions of web-page headers 302h. Indeed, due to the utility of the invention's grammar in making meaning explicit, the user may elect to apply even an unstructured search only to the structured portions of the web-page headers.
  • search engine 315 when performing a search in accordance with the invention, is configured for sensitivity to word order and proximity. Word order is always preserved in all information units extracted from a sentence.
  • Ranking can be achieved by emphasizing units extracted from the sentence without word separation.
  • the distance between matched words can also be used as a ranking factor, as can differences in the hierarchical (bracketing) level at which a match occurs. For example, absolute literal matches are weighted more highly than matches where the word order differs from that of the query, or where the identified query words are scattered within the document. Accordingly, in the example discussed above, entry 10 would be selected over the other entries even if these contained a larger absolute number of word matches.
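The expansion rules listed above can be sketched as a small conformance check of the kind the analysis module might perform. This is only an illustrative sketch: the nested-tuple representation, the tiny lexicon, and the names `form_class` and `render` are assumptions made for the example and do not come from the patent, whose actual form-class lists are in its appendices.

```python
# Hypothetical miniature lexicon; the patent's real lists are in Appendices 1-3.
NOMINALS = {"dog", "cat", "couch", "mouse", "house", "hill"}
CONNECTORS = {"on", "eat"}
DESCRIPTORS = {"big", "brown", "gloomy"}
LOGICAL = {"and", "or"}


def form_class(node):
    """Return 'T' or 'D' if the node follows the expansion rules, else raise."""
    if isinstance(node, str):
        if node in NOMINALS:
            return "T"
        if node in DESCRIPTORS:
            return "D"
        raise ValueError(f"{node!r} is not a nominal term or descriptor")
    if len(node) == 2:
        # Rule I: a nominal term followed by a descriptor is itself nominal.
        if form_class(node[0]) == "T" and form_class(node[1]) == "D":
            return "T"
    elif len(node) == 3:
        left, mid, right = form_class(node[0]), node[1], form_class(node[2])
        # Rules IIa and IIb: two nominal terms joined by a connector
        # or by a logical connector form a nominal term.
        if left == right == "T" and mid in CONNECTORS | LOGICAL:
            return "T"
        # Rule III: two descriptors joined by a logical connector.
        if left == right == "D" and mid in LOGICAL:
            return "D"
    raise ValueError(f"{node!r} does not follow the expansion rules")


def render(node, top=True):
    """Write a nested tuple as a bracketed sentence."""
    if isinstance(node, str):
        return node
    text = " ".join(render(child, top=False) for child in node)
    return text if top else f"({text})"
```

Under these assumptions, `render((("dog", "brown"), "big"))` gives `(dog brown) big` and `render(("dog", ("big", "and", "brown")))` gives `dog (big and brown)`, while `form_class(("big", "dog"))` raises `ValueError` because a descriptor cannot come first in a TD structure.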

Abstract

To facilitate accurate document searching, electronically accessible documents are provided with abstracts written in a highly constrained artificial grammar. The artificial grammar is capable of expressing the thoughts and information ordinarily conveyed in a natural grammar, but in a structured format that restricts the number of possible alternative meanings. Accordingly, while the grammar is clear in the sense of being easily understood by native speakers of the vocabulary and complex in its ability to express sophisticated concepts, sentences are derived from an organized vocabulary according to fixed rules. A query, preferably formulated in accordance with these rules, is employed by a search engine in the usual fashion. Due to the highly constrained meaning of the search query, and the likelihood that relevant documents have similar or matching abstracts in their headers, key-word searches are likely to identify the most relevant documents.

Description

METHOD AND APPARATUS FOR IMPROVED DOCUMENT SEARCHING
BACKGROUND OF THE INVENTION
Prior to the proliferation of electronically available information over the Internet, computerized retrieval of information could be approached in a relatively organized fashion. Documents having widespread interest were typically maintained only by commercial database providers, which categorized them (by subject, date, etc.), and perhaps abstracted them, thereby facilitating multiple modes of searching. Consequently, a database user effectively narrowed the search space at the outset merely by choosing the appropriate database, which would limit the searchable documents to the topic of interest. Then, the user could retrieve documents from the selected database based on any of a variety of search criteria other than simple "key words": date of publication, contents of a category-specific document field, title or author, to name but a few.
While commercial database providers still exist, increasing amounts of information are stored on servers accessible over the Internet, which frequently make them available free of charge. Information on the Internet, of course, is both vast and utterly disorganized in the sense of lacking any hierarchical or category-based indexing scheme. Particular kinds of documents may be found on large numbers of servers, so that arbitrarily confining one's search to a single such server is likely to miss numerous relevant documents located elsewhere.
To allow Internet users to focus their searching efforts, several firms have created free-of-charge sites called "search engines." These systems maintain huge and constantly growing databases duplicating the text (or portions thereof) of thousands or even millions of documents accessible over the Internet, and permit "visitors" to the site to formulate queries that the search engine applies to its database. The search engine retrieves documents matching the query, often ranked in order of relevance (e.g., in terms of the frequency and location of word matches or some other statistical measure).
Unfortunately, the sheer volume of documents and their lack of organization, combined with the limited searching capabilities of most search engines, make it very likely that relevant documents will be missed or elude notice amidst a plethora of irrelevant retrievals. In order to guide these simple types of searches, the proprietors of documents available over the Internet frequently provide them with "headers" which, while invisible to someone retrieving the document, are nonetheless acquired by search engines and form part of the searchable text of the document. A document may, for example, repeat a key word over and over in its invisible header, thereby ensuring that matches to queries containing the key word will receive a high relevance rank (since each repetition in the header counts as a separate match).
Nonetheless, key-word searching remains limited, frequently resulting in missed entries (due to synonymous ways of expressing the relevant concept) or, even more frequently, a flood of irrelevant entries (due to the multiple unrelated meanings that may be associated with words and phrases). For example, someone interested in military activities in China might attempt to search using the query "troops in China." But because of the numerous and varied topics that may implicate virtually any chosen set of words, the search engine might retrieve documents containing the following sentences:
1. Bill Clinton plans meeting with leaders of China to talk about US troops in Taiwan.
2. Troops in Russia improve border security with China.
3. Leader of NATO troops in Bosnia to visit China.
4. Farmer finds crashed WWII troop carrier in southern China.
5. CIA papers reveal US troops in Cambodia near border of China during Vietnam War.
6. Asia expert, Johnson, talks to leaders of US troops about new weapons factories in China.
7. British troops in Hong Kong have mixed reaction to handover of Hong Kong to China.
8. Troops in controversy over design for new china.
9. Troops wear boots made in China.
10. Troops of General Chun put down protest in China.
Of course, only the last item is relevant to the user's intent.
SUMMARY OF THE INVENTION
In accordance with the present invention, electronically accessible documents are provided with abstracts written in a highly constrained artificial grammar. In addition, sentences are bracketed in the manner of an algebraic equation. The artificial grammar is capable of expressing the thoughts and information ordinarily conveyed in a natural grammar, but in a structured format that restricts the number of possible alternative meanings. Accordingly, while the grammar is clear in the sense of being easily understood by native speakers of the vocabulary and complex in its ability to express sophisticated concepts, sentences are derived from an organized vocabulary according to fixed rules. A query, preferably formulated in accordance with these rules, is employed by a search engine in the usual fashion. Due to the highly constrained meaning of such a search query and the existence of brackets, it is possible for a machine to determine an exact relationship between all of the words in the sentence. It is then possible to match the relationship of the words in a search query to the relationship of the words in a target document, instead of simply relying on a general word match.
If relevant documents have in their headers abstracts containing similar word relationships, the query is readily used to identify the most relevant documents merely by examination of document headers. Furthermore, because the abstracts are contained within the invisible header portion of a document, their presence disturbs neither the appearance of the document nor the operation of ordinary search routines. In other words, the document header can contain an abstract in accordance with the invention as well as the usual key words, so that standard searches as well as the searches as described herein can coexist without mutual interference.
In order to constrain meaning, the vocabulary may be represented in a series of physically or logically distinct databases, each containing entries representing a form class as defined in the grammar. In this way, the user formulating a search query, or a document proprietor creating an abstract, is required to select from the allowed vocabulary. The databases are constructed to minimize the occurrence of synonymous terms, thereby reducing the chances of false-negative search results.
While desirable, however, vocabulary constraint is not critical to practice of the invention, since appreciable benefits are attained merely by use of the structured grammar and brackets (which themselves reduce query ambiguity significantly). Starting with a term from one of four form classes, sentences are constructed by iterative application of four expansion rules that govern the manner in which terms from the various classes can be combined. The invention exploits the relative ease of learning a new grammar, particularly one that is highly constrained to a few precise rules, as compared with learning a new vocabulary. As a result, after becoming familiar with this grammar, the user can easily compose sentences in the manner prescribed by the present invention.
To compose an abstract or query, a sentence is formulated ab initio in accordance with the form classes or expansion rules, or a natural-language sentence is translated or decomposed into the (typically) simpler grammar of the invention but preserving the original vocabulary.
In accordance with the invention, sentences are composed of "linguistic units," each of which may be one or a few words, from the allowed form classes. These classes are "things" or nominal terms that connote, for example, people, places, items, activities or ideas; "connectors" that specify relationships between two (or more) nominal terms; "descriptors" modifying the state of one or more nominal terms; and "logical connectors" establishing sets of the nominal terms. If the invention is to be used with a constrained vocabulary, the list of all allowed entries in all four classes represents the global lexicon of the invention. To construct a sentence in accordance with the invention, entries from the form classes are combined according to four expansion rules detailed below. These rules can be followed explicitly in a stepwise fashion to produce sentences, but more typically, once the user is accustomed to the grammar, sentences are constructed by "feel" and, if necessary, subsequently tested for conformity with the expansion rules.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention description below refers to the accompanying drawings, of which:
FIG. 1 schematically illustrates application of the expansion rules of the present invention;
FIG. 2 is a schematic representation of a hardware system embodying the invention; and
FIG. 3 schematically illustrates operation of the invention.
DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT
The grammar of the present invention makes use of a lexicon and a constrained set of rules. The rules divide the allowed vocabulary (i.e., the entire English language treated as linguistic units or a subset thereof, either of which represents a global lexicon of linguistic units) into four classes. Each linguistic unit is (1) a single word, such as "dog" or "government"; or (2) a hyphenated combination of words, such as "parking-space" or "prime-minister"; or (3) a proper name; or (4) a word with a definition unique to the invention; or (5) one form of a word with multiple meanings. In the latter case, each definition of the word represents a different linguistic unit, and the various definitions may appear as entries in different form classes. For purposes of automation, each definition may be distinguished, for example, by the number of periods appearing at the end of the word. The entry for the first (arbitrarily designated) definition is listed with no period, the entry representing the second definition is listed with one period at its end, and so on. Alternatively, different word senses can be identified numerically, e.g., using subscripts.
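The period-suffix convention for distinguishing word senses can be illustrated with a short Python sketch. The helper names `sense_key` and `split_sense` and the sample word are assumptions made here for illustration; they do not appear in the original disclosure.

```python
from typing import Tuple


def sense_key(word: str, sense_index: int) -> str:
    """Return the lexicon entry for a sense (0 = first definition, no period)."""
    if sense_index < 0:
        raise ValueError("sense_index must be non-negative")
    return word + "." * sense_index


def split_sense(entry: str) -> Tuple[str, int]:
    """Split a lexicon entry into the bare word and its sense index."""
    word = entry.rstrip(".")
    return word, len(entry) - len(word)


# Hypothetical senses: "china" (first definition) and "china." (second definition).
print(sense_key("china", 0))
print(sense_key("china", 1))
print(split_sense("china.."))
```

Counting trailing periods keeps every sense a distinct string key, so each sense can be stored in whichever form-class database it belongs to.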
Words unique to the invention may make up a very small proportion of the total lexicon, and none of these words is specific to the invention or alien to the natural language upon which it is based. Instead, invention-specific words are broadened in connotation to limit the overall number of terms in the lexicon. For example, in a preferred implementation, the word "use" is broadened to connote employment of any object for its primary intended purpose, so that in the sentence "Jake use book," the term connotes reading. The word "on" may be used to connote time (e.g., (i go-to ballgame) on yesterday). If desired for ease of use, however, the invention-specific words can be eliminated altogether and the lexicon expanded accordingly.
The invention divides the global lexicon of allowed terms into four classes: "things" or nominal terms that connote, for example, people, places, items, activities or ideas, identified herein by the code T; "connectors" that specify relationships between two (or more) nominal terms (including words typically described as prepositions and conjunctions, and terms describing relationships in terms of action, being, or states of being), identified herein by C; "descriptors" modifying the state of one or more nominal terms (including words typically described as adjectives, adverbs and intransitive verbs), identified herein by D; and "logical connectors" establishing sets of the nominal terms, identified herein by Q. Exemplary constrained lists of nominal terms, connectors and descriptors are set forth in Appendices 1-3, respectively. The preferred logical connectors are "and" and "or."
Preferably, verb tenses are not employed, since these tend to create more ambiguity than they resolve; connectors are phrased in the present tense, since tense is easily understood from context. Tense may nonetheless be indicated, however, by specifying a time, day and/or date. Alternatively, if tense is considered important, it may be indicated by symbolic signals such as "/" for past, "|" for present, and "\" for future. It should be noted, however, that some natural languages do not utilize tense indicators.
Sentences in accordance with the invention are constructed from terms in the lexicon according to four expansion rules. The most basic sentences proceed from one of the following three constructions (any of which can be created from a T term in accordance with the expansion rules set forth hereinbelow). These structures, which represent the smallest possible sets of words considered to carry information, are the building blocks of more complex sentences. Their structural simplicity facilitates ready translation into conversational, natural-language sentences; thus, even complex sentences in accordance with the invention are easily transformed into natural-language equivalents through modular analysis of the more basic sentence components (a process facilitated by the preferred representations described later).
Basic Structure 1 (BS1) is formed by placing a descriptor after a nominal term to form the structure TD. BS1 sentences such as "dog brown" and "Bill swim" readily translate into the English sentence "the dog is brown" (or the phrase "the brown dog") and "Bill swims."
BS2 is formed by placing a connector between two nominal terms to form the structure TCT. BS2 sentences such as "dog eat food" readily translate into English equivalents.
A sentence comprising one or more of the basic structures set forth above may be expanded using the following rules:
Rule I: To a nominal term, add a descriptor (T→TD).
In accordance with Rule I, any linguistic unit from the nominal class can be expanded into the original item followed by a new item from the descriptor class, which modifies the original item. For example, "dog" becomes "dog big." Like all rules of the invention, Rule I is not limited in its application to an isolated nominal term (although this is how BS1 sentences are formed); instead, it can be applied to any nominal term regardless of location within a larger sentence. Thus, in accordance with Rule I, TD1→(TD2)D1. For example, "dog big" becomes "(dog brown) big" (corresponding to the English sentence, "the brown dog is big").
The order of addition may or may not be important in the case of consecutive adjectives, since these independently modify T; for example, in "(dog big) brown," the adjective "big" distinguishes this dog from other dogs, and "brown" may describe a feature thought to be otherwise unknown to the listener. The order of addition is almost always important where a D term is an intransitive verb. For example, expanding the TD sentence "dog run" (corresponding to "the dog runs" or "the running dog") by addition of the descriptor "fast" forms, in accordance with Rule I, "(dog fast) run" (corresponding to "the fast dog runs"). To express "the dog runs fast," it is necessary to expand the TD sentence "dog fast" with the descriptor "run" in the form "(dog run) fast." Applying expansion Rule I to the structure BS2 produces TCT→(TD)CT. For example, "dog eat food" becomes "(dog big) eat food." Rule I can also be applied to compound nominal terms of the form TQT, so that a structure of form TQT becomes TQT→(TQT)D. For example, "mother and father" becomes "(mother and father) drive." In this way, multiple nominal terms can be combined, either conjunctively or alternatively, for purposes of modification. It should also be noted that verbs having transitive senses, such as "drive," are included in the database as connectors as well as descriptors. Another example is the verb "capsize," which can be intransitive ("boat capsize") as well as transitive ("captain capsize boat").
Rule IIa: To a nominal term, add a connector and another nominal term (T→TCT).
In accordance with Rule IIa, any linguistic unit from the nominal class can be replaced with a connector surrounded by two nominal entries, one of which is the original linguistic unit. For example, "house" becomes "house on hill." Applying expansion Rule IIa to BS1 produces TD→(TCT)D; for example, "gloomy house" becomes "(house on hill) gloomy," or "the house on the hill is gloomy."
Rule IIa can be used to add a transitive verb and its object. For example, the compound term "mother and father" can be expanded to "(mother and father) drive car."
Rule IIb: To a nominal term, add a logical connector and another nominal term (T→TQT).
In accordance with Rule IIb, any linguistic unit from the nominal class can be replaced with a logical connector surrounded by two nominal entries, one of which is the original linguistic unit. For example, "dog" becomes "dog and cat." Again, for purposes of Rule IIa and Rule IIb, a nominal term can be a composite consisting of two or more nominal terms joined by a connector. For example, the expansion "(John and bill) go-to market" satisfies Rule IIa. Subsequently applying Rule I, this sentence can be further expanded to "((John and bill) go-to market) together."
Rule III: To a descriptor, add a logical connector and another descriptor (D→DQD).
In accordance with Rule III, a descriptor can be replaced with a logical connector surrounded by two descriptors, one of which is the original. For example, "big" becomes "big and brown." Applying expansion Rule III to BS1 produces TD→T(DQD); for example "dog big" (equivalent to "the dog is big," or "the big dog") becomes "dog (big and brown)" (equivalent to "the dog is big and brown" or "the big brown dog").
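The expansion rules can be pictured as small rewriting functions. The following Python sketch is only an illustration under stated assumptions: the tiny lexicon, the `Term` type, and the function names (`nominal`, `descriptor`, `rule_i`, `rule_iia`, `rule_iib`, `rule_iii`) are hypothetical and are not taken from the original disclosure. Composite terms are bracketed whenever they are placed into a larger structure, which reproduces the bracketed examples given above.

```python
from typing import NamedTuple

# Tiny hypothetical lexicon, one set per form class.
NOMINALS = {"dog", "food", "mother", "father"}
DESCRIPTORS = {"big", "brown"}
CONNECTORS = {"eat"}
LOGICAL_CONNECTORS = {"and", "or"}


class Term(NamedTuple):
    kind: str  # "T" = nominal term, "D" = descriptor
    text: str


def nominal(word: str) -> Term:
    if word not in NOMINALS:
        raise ValueError(f"{word!r} is not in the nominal class")
    return Term("T", word)


def descriptor(word: str) -> Term:
    if word not in DESCRIPTORS:
        raise ValueError(f"{word!r} is not in the descriptor class")
    return Term("D", word)


def _slot(term: Term, kind: str) -> str:
    """Check the form class and bracket composite terms."""
    if term.kind != kind:
        raise TypeError(f"expected a {kind} term, got {term.kind}")
    return f"({term.text})" if " " in term.text else term.text


def rule_i(t: Term, d: Term) -> Term:
    """Rule I: T -> TD."""
    return Term("T", f"{_slot(t, 'T')} {_slot(d, 'D')}")


def rule_iia(t1: Term, connector: str, t2: Term) -> Term:
    """Rule IIa: T -> TCT."""
    if connector not in CONNECTORS:
        raise ValueError(f"{connector!r} is not a connector")
    return Term("T", f"{_slot(t1, 'T')} {connector} {_slot(t2, 'T')}")


def rule_iib(t1: Term, logical: str, t2: Term) -> Term:
    """Rule IIb: T -> TQT."""
    if logical not in LOGICAL_CONNECTORS:
        raise ValueError(f"{logical!r} is not a logical connector")
    return Term("T", f"{_slot(t1, 'T')} {logical} {_slot(t2, 'T')}")


def rule_iii(d1: Term, logical: str, d2: Term) -> Term:
    """Rule III: D -> DQD."""
    if logical not in LOGICAL_CONNECTORS:
        raise ValueError(f"{logical!r} is not a logical connector")
    return Term("D", f"{_slot(d1, 'D')} {logical} {_slot(d2, 'D')}")


brown_dog = rule_i(nominal("dog"), descriptor("brown"))
print(rule_i(brown_dog, descriptor("big")).text)
print(rule_iia(rule_i(nominal("dog"), descriptor("big")), "eat", nominal("food")).text)
print(rule_i(nominal("dog"), rule_iii(descriptor("big"), "and", descriptor("brown"))).text)
```

Because every rule returns a term of the class it expanded, the functions can be nested in any order, mirroring the iterative application of the rules.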
The manner in which these rules are applied to form acceptable sentences in accordance with the invention is shown in FIG. 1. Beginning with a nominal term such as cat, shown at 110, any of the three basic structures can be formed by following expansion Rules I, IIa and IIb as shown at 112, 114, 116, respectively, to produce "cat striped" (BS1), "cat on couch" (BS2) or "cat and Sue." Iterative application of expansion rule IIa at 118 and 119 produces structures of the forms TC1T1→(TC1T1)C2T2 or "((cat on couch) eat mouse)" and (TC1T1)C2T2→((TC1T1)C2T2)C3T3 or "(((cat on couch) eat mouse) with tail)." Expansion rule I can be applied at any point to a T linguistic unit as shown at 122 (to modify the original T, cat, to produce "(happy cat) on couch") and 124 (to modify "eat mouse"). Rule III can also be applied as shown at 126 (to further modify cat to produce "(((happy and striped) cat) on couch)") and 128 (to further modify "eat mouse").
Expansion Rule I can be applied iteratively as shown at 112, 130 to further modify the original T (although, as emphasized at 130, a descriptor need not be an adjective). Expansion Rule IIa is available to show action of the modified T (as shown at 132), and Rule I can be used to modify the newly introduced T (as shown at 134). Rule I can also be used to modify (in the broad sense of the invention) a compound subject formed by Rule IIb, as shown at 136.
The order in which linguistic units are assembled can strongly affect meaning. For example, the expansion TC1T1→(TC1T1)C2T2 can take multiple forms. The construct "cat hit (ball on couch)" conveys a meaning different from "cat hit ball (on couch)." In the former the ball is definitely on the couch, and in the latter the action is taking place on the couch. The sentence "(John want car) fast" indicates that the action should be accomplished quickly, while "(John want (car fast))" means that the car should move quickly.
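The effect of bracketing can be made concrete by parsing a bracketed sentence into a nested structure. The following Python sketch is illustrative only; the function name `parse` and the nested-list representation are assumptions made here, not part of the original disclosure.

```python
def parse(sentence: str):
    """Parse a bracketed sentence into nested lists of linguistic units."""
    tokens = sentence.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])           # open a new group
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()        # close the innermost group
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    top = stack[0]
    # Unwrap a single outer bracket pair so "(a b)" and "a b" parse alike.
    return top[0] if len(top) == 1 and isinstance(top[0], list) else top


print(parse("(John want car) fast"))
print(parse("(John want (car fast))"))
```

The two sentences contain the same four words in the same order, yet they parse to different trees: in the first, "fast" attaches to the whole unit "John want car"; in the second, it attaches to "car" alone.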
This approach substantially reduces ambiguity. Returning to an earlier example, the 10 retrieved sentences containing the words "troops in China" are shown in English (E) and in accordance with their representation as prescribed herein (I):
1E. Bill Clinton plans meeting with leaders of China to talk about US troops in Taiwan.
1I. Bill Clinton plan ((he meet-with (leader/s of China)) in-order-to (he talk-about (troops of United-States) in Taiwan)).
2E. Troops in Russia improve border security with China.
2I. (Troops in Russia) improve ((security of border) with China).
3E. Leader of NATO troops in Bosnia to visit China.
3I. (Leader of ((troops of NATO) in Bosnia)) visit China.
4E. Farmer finds crashed WWII troop carrier in southern China.
4I. (Farmer find (troop-carrier from WWII)) in (China southern). (Troop-carrier crash) during WWII.
5E. CIA papers reveal US troops in Cambodia near border of China during Vietnam War.
5I. (Paper/s of CIA) reveal (((troop/s of united-states) in Cambodia) near (border of China)) during Vietnam-War.
6E. Asia expert, Johnson, talks to leaders of US troops about new weapons factories in China.
6I. Johnson be (expert about Asia). He talk-to (leader/s of (troop/s of united-states)) about (((factory/s for weapon/s) new) in China).
7E. British troops in Hong Kong have mixed reaction to handover of Hong Kong to China.
7I. (Reaction of ((troops of Britain) in Hong-Kong)) about ((handover of Hong-Kong) to China) mixed.
8E. Troops in controversy over design for new china.
8I. (Troop/s have controversy) about (design of (china new)).
9E. Troops wear boots made in China.
9I. Troops wear (boot/s made-in China).
10E. Troops of General Chun put down protest in China.
10I. ((Troops of General-Chun) put-down protest) in China.
The query "troops in China," which is an acceptable formulation in accordance with the grammar of the invention, would retrieve the last entry (10I) as the most relevant, since only sentence 10I contains the information unit "troops in China" or a one-to-one underlying grammatical relationship between the words in the query and the words in the sentence.
Queries are processed according to a routine that extracts "information units" in sentences constructed according to the invention. For example, in the sentence,
((Clinton visit (aircraft-carrier in persian-gulf)) on jan-97)
the routine would identify the following information units:
1. aircraft-carrier in persian-gulf
2. Clinton visit aircraft-carrier
3. clinton visit aircraft-carrier in persian-gulf
4. aircraft-carrier on jan-97
5. aircraft-carrier in persian-gulf on jan-97
6. clinton on jan-97
7. clinton visit aircraft-carrier on jan-97
8. clinton visit aircraft-carrier in persian-gulf on jan-97
The information units represent the most basic elements of information content in the sentence, as well as their combinations. Thus, the sentence would be meaningful not only to a searcher looking for information specifically concerning President Clinton's visit to an aircraft carrier in the Persian Gulf in January 1997, but also to a searcher interested more generally in, for example, the president's itinerary for January 1997 or events in the Persian Gulf at that time.
Information units are extracted according to the following method:
1. Start with sentence S, e.g., (I like ((house on hill) near beach)).
2. Locate the first occurring complete information unit, (TCT) or (TD); e.g., in sentence S, this is (house on hill).
3. If this sentence does not contain a variable, then identify "house on hill" as an information unit.
4. Replace the information unit with a variable, e.g., (I like (% near beach)).
5. Repeat step 2, e.g., (% near beach).
6. If this sentence does contain a variable, then take the first word of the first printed sentence, substitute it for the variable, and identify "house near beach" as an information unit; replace the variable with the entire first sentence and identify "house on hill near beach" as an information unit; repeat this process with all sentences that existed prior to the beginning of step 6.
7. Repeat from step 2, e.g., (I like %); identify the following information units: "I like house," "I like house on hill," "I like house," "I like house near beach," "I like house," "I like house on hill near beach."
8. If step 2 produces empty brackets, then remove all duplicate sentences from the identified information units.
9. End processing when sentence S is empty. If sentence S is not empty, then repeat the method until step 2 finds a unit with two variables (% C %). Repeat steps 4-7 on both variables. Repeat the method until sentence S is empty.
The results of this processing are the following information units:
1. house on hill
2. house near beach
3. house on hill near beach
4. I like house
5. I like house on hill
6. I like house
7. I like house near beach
8. I like house
9. I like house on hill near beach
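The extraction method can also be restated as a short recursive sketch in Python. This is a simplified illustration, not the routine of the disclosure: it takes the sentence as nested lists rather than a bracketed string, the names `extract_units`, `first_word` and `dedupe` are hypothetical, and a unit with two bracketed operands is handled by a plain cross product of alternatives, which may not match the bookkeeping of the original procedure in every case.

```python
from itertools import product


def first_word(unit: str) -> str:
    return unit.split()[0]


def extract_units(node):
    """Return information units for a nested-list sentence, innermost first.

    Duplicates are kept, as in the worked example; dedupe() removes them.
    """
    units = []
    alternatives = []
    for child in node:
        if isinstance(child, str):
            alternatives.append([child])
        else:
            sub = extract_units(child)
            units.extend(sub)
            # A bracketed child may stand in as the first word of any unit
            # found inside it, or as that whole unit.
            alts = []
            for unit in sub:
                alts.extend([first_word(unit), unit])
            alternatives.append(alts)
    units.extend(" ".join(combo) for combo in product(*alternatives))
    return units


def dedupe(units):
    """Drop duplicate units while preserving their order."""
    return list(dict.fromkeys(units))


sentence = ["I", "like", [["house", "on", "hill"], "near", "beach"]]
for unit in extract_units(sentence):
    print(unit)
```

A query can then be tested against a document abstract by checking whether the query string is among the extracted units.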
A suitable computer program for implementing the foregoing procedure is as follows:
Public Sub subPullUnit()
    ' varInput is not declared in this routine; it is presumably a
    ' module-level variable holding the bracketed sentence.
    Dim varLeft As Variant
    Dim varRight As Variant
    Dim varTarget As Variant
    Dim intCount As Integer
    Dim intCount2 As Integer
    Dim intHolder1 As Integer
    Dim intHolder2 As Integer

    ' Split the input at the first ")" and isolate the innermost unit,
    ' i.e., the text following the last "(" before that ")".
    varLeft = Left(varInput, InStr(varInput, ")"))
    varRight = Trim(Right(varInput, Len(varInput) - InStr(varInput, ")")))
    varTarget = varLeft
    For intCount = 1 To 10
        If InStr(varTarget, "(") = 0 Then Exit For
        varTarget = Right(varTarget, Len(varTarget) - InStr(varTarget, "("))
    Next intCount
    varLeft = Trim(Left(varLeft, Len(varLeft) - Len(varTarget) - 1))
    varTarget = Trim(Left(varTarget, Len(varTarget) - 1))

    ' Replace the extracted unit in the input with the placeholder ":1:".
    varInput = varLeft & " :1: " & varRight
    intHolder1 = frmPage.lstHolder1.ListCount
    intHolder2 = frmPage.lstHolder2.ListCount

    Select Case InStr(varTarget, ":1:")
    Case Is = 0
        frmPage.lstHolder1.AddItem varTarget
    Case Is = 1
        If InStr(Right(varTarget, Len(varTarget) - 3), ":1:") <> 0 Then
            varTarget = Right(varTarget, Len(varTarget) - 3)
            varTarget = Left(varTarget, Len(varTarget) - 3)
            With frmPage.lstHolder1
                For intCount = 0 To intHolder2 - 1
                    For intCount2 = 0 To intHolder1 - 1
                        frmPage.lstHolder2.AddItem _
                            Left(frmPage.lstHolder2.List(intCount), InStr(frmPage.lstHolder2.List(intCount), " ")) _
                            & varTarget & _
                            Left(.List(intCount2), InStr(.List(intCount2), " "))
                        'RIGHT WORD AND LEFT WORD
                        frmPage.lstHolder2.AddItem _
                            Left(frmPage.lstHolder2.List(intCount), InStr(frmPage.lstHolder2.List(intCount), " ")) _
                            & varTarget & _
                            .List(intCount2)
                        'RIGHT WORD AND LEFT PHRASE
                        frmPage.lstHolder2.AddItem _
                            frmPage.lstHolder2.List(intCount) _
                            & varTarget & _
                            Left(.List(intCount2), InStr(.List(intCount2), " "))
                        'RIGHT PHRASE AND LEFT WORD
                        frmPage.lstHolder2.AddItem _
                            frmPage.lstHolder2.List(intCount) _
                            & varTarget & _
                            .List(intCount2)
                        'RIGHT PHRASE AND LEFT PHRASE
                    Next intCount2
                Next intCount
            End With
            'MsgBox varInput
            'For intCount = 0 To frmPage.lstHolder1.ListCount - 1
            intHolder1 = frmPage.lstHolder1.ListCount
            For intCount = 0 To intHolder1 - 1
                frmPage.lstHolder3.AddItem frmPage.lstHolder1.List(intCount)
            Next intCount
            frmPage.lstHolder1.Clear
            For intCount = 0 To frmPage.lstHolder2.ListCount - 1
                frmPage.lstHolder1.AddItem frmPage.lstHolder2.List(intCount)
            Next intCount
            frmPage.lstHolder2.Clear
        Else
            varTarget = Right(varTarget, Len(varTarget) - 3)
            With frmPage.lstHolder1
                For intCount = 0 To intHolder1 - 1
                    .AddItem Left(.List(intCount), InStr(.List(intCount), " ")) _
                        & varTarget
                    .AddItem .List(intCount) & varTarget
                Next intCount
            End With
        End If
    Case Else
        varTarget = Left(varTarget, Len(varTarget) - 3)
        With frmPage.lstHolder1
            For intCount = 0 To intHolder1 - 1
                .AddItem varTarget & _
                    Left(.List(intCount), InStr(.List(intCount), " "))
                .AddItem varTarget & .List(intCount)
            Next intCount
        End With
    End Select

    'If InStr(varInput, ":1:") = 3 And
    If InStr(varInput, ":1:") <> 0 And _
        (InStr(Right(varInput, Len(varInput) - 5), ")") > _
        InStr(Right(varInput, Len(varInput) - 5), "(")) And _
        InStr(Right(varInput, Len(varInput) - 5), ":1:") = 0 And _
        InStr(Right(varInput, Len(varInput) - 5), "(") <> 0 Then
        For intCount = 0 To frmPage.lstHolder1.ListCount - 1
            frmPage.lstHolder2.AddItem frmPage.lstHolder1.List(intCount)
        Next intCount
        frmPage.lstHolder1.Clear
    End If

    ' Recurse while bracketed units remain in the input.
    If InStr(varInput, ")") <> 0 Then Call subPullUnit

    For intCount = 0 To frmPage.lstHolder3.ListCount - 1
        frmPage.lstShow.AddItem frmPage.lstHolder3.List(intCount)
    Next intCount
    For intCount = 0 To frmPage.lstHolder1.ListCount - 1
        frmPage.lstShow.AddItem frmPage.lstHolder1.List(intCount)
    Next intCount

    ' Remove entries that duplicate the entry immediately following them.
    With frmPage.lstShow
        For intCount = 0 To .ListCount
            For intCount2 = intCount + 1 To .ListCount
                If .List(intCount) = .List(intCount2) Then
                    intCount2 = intCount2 - 1
                    .RemoveItem (intCount2)
                Else
                    Exit For
                End If
            Next intCount2
        Next intCount
    End With
End Sub
"Things" in the first place of a set generally act as subjects, while "things" in the end place of a set generally act as objects; e.g., in the sentence (cat hit dog), "cat" is the primary "Thing" or subject, and "dog" is the secondary "Thing." Accordingly, in the sentence ((cat with hat) see dog) the routine does not produce the information unit "hat see dog," but does produce the information unit "cat see dog."
Similarly, consider the two sentences
((((Ship American) with (radar new)) shoot-down airplane) in persian-gulf)
and
(Ship see (helicopter shoot-down airplane))
In this case, while both sentences contain the same three words "ship," "shoot-down," and "airplane" in the same order, only the first sentence is actually about a ship that shoots down an airplane.
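The distinction between the two ship sentences can be sketched by reducing every bracketed operand to its first-place "thing" and recording the resulting relations. The Python below is a deliberately simplified illustration and not the routine of the disclosure; the names `head` and `relations`, the nested-list input, and the lower-casing of words are assumptions made here.

```python
def head(node):
    """First-place 'thing' of a unit: its leftmost word."""
    while not isinstance(node, str):
        node = node[0]
    return node


def relations(node, found=None):
    """Collect one tuple per bracketed unit, with operands reduced to heads."""
    found = [] if found is None else found
    if isinstance(node, str):
        return found
    for child in node:
        relations(child, found)
    found.append(tuple(head(child).lower() for child in node))
    return found


first = [[[["Ship", "American"], "with", ["radar", "new"]],
          "shoot-down", "airplane"], "in", "persian-gulf"]
second = ["Ship", "see", ["helicopter", "shoot-down", "airplane"]]
query = ("ship", "shoot-down", "airplane")

print(query in relations(first))
print(query in relations(second))
```

Under this reduction the first sentence yields the relation (ship, shoot-down, airplane), whereas the second yields (helicopter, shoot-down, airplane) and (ship, see, helicopter), so only the first matches the query.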
Although the invention is suitably practiced in any system calling for electronic retrieval of documents from a large database (or multiple databases), it is especially useful in conjunction with the Internet, which affords even a personal computer access to tremendous numbers of other and potentially far larger computers. Much of the Internet is based on the client-server model of information exchange. This computer architecture, developed specifically to accommodate the "distributed computing" environment that characterizes the Internet and its component networks, contemplates a server (sometimes called the host)— typically a powerful computer or cluster of computers that behaves as a single computer— that services the requests of a large number of smaller computers, or clients, which connect to it. The client computers usually communicate with a single server at any one time, although they can communicate with one another via the server or can use the server to reach other servers. A server is typically a large mainframe or minicomputer cluster, while the clients may be simple personal computers. Servers providing Internet access to multiple subscriber clients are referred to as "gateways"; more generally, a gateway is a computer system that connects two computer networks.
The Internet supports a large variety of information-transfer protocols. One of these, the World Wide Web (hereafter, simply, the "web"), has recently skyrocketed in importance and popularity; indeed, to many, the Internet is synonymous with the web. Web-accessible information is identified by a uniform resource locator or "URL," which specifies the location of the file in terms of a specific computer and a location on that computer. Any Internet "node" can access the file by invoking the proper communication protocol and specifying the URL. Typically, a URL has the format http://<host>/<path>, where "http" refers to the HyperText Transfer Protocol, "host" is the server's Internet identifier, and the "path" specifies the location of the file within the server. Each "web site" can make available one or more web "pages" or documents, which are formatted, tree-structured repositories of information, such as text, images, sounds and animations.
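As a brief illustration of the URL format described above, the following Python sketch splits a URL into its protocol, host and path using the standard library. The URL itself is a made-up example, not one given in the original.

```python
from urllib.parse import urlsplit

# Hypothetical URL following the http://<host>/<path> pattern.
parts = urlsplit("http://www.example.com/docs/abstract.html")

print(parts.scheme)  # protocol
print(parts.netloc)  # host
print(parts.path)    # path of the file on the server
```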
An important feature of the web is the ability to connect one document to many other documents using "hypertext" links. A link appears unobtrusively as an underlined portion of text in a document; when the viewer of this document moves his cursor over the underlined text and clicks, the link— which is otherwise invisible to the user— is executed and the linked document retrieved. That document need not be located on the same server as the original document.
Hypertext and document-retrieval functionality is typically implemented on the client machine, using a computer program called a "web browser." With the client connected as an Internet node, the browser, operating as a process on the client machine, utilizes URLs, provided either by the user or a link, to locate, fetch and display the specified documents. The browser passes the URL to a protocol handler on the associated server, which then retrieves the information and sends it to the browser for display; the browser causes the information to be cached (usually on a hard disk) on the client machine.
A representative client machine implementing the present invention is shown in FIG. 2. As indicated therein, the system includes a main bidirectional bus 200, over which all system components communicate. The main sequence of instructions effectuating the functions of the invention and facilitating interaction among the user, the system, and the Internet, resides on a mass storage device (such as a hard disk or optical storage unit) 202 as well as in a main system memory 204 during operation. Execution of these instructions and effectuation of the functions of the invention is accomplished by a central-processing unit ("CPU") 206. A network interface 208 connects, generally via telephone dial-up, to a gateway or other Internet access provider. As a result the client machine becomes a node on the Internet, capable of exchanging data with other Internet computers.
The user interacts with the system using a keyboard 210 and a position-sensing device (e.g., a mouse) 212. The output of either device can be used to designate information or select particular areas of a screen display 214 to direct functions to be performed by the system.
The main memory 204 contains a group of modules that control the operation of CPU 206 and its interaction with the other hardware components. An operating system 220 directs the execution of low-level, basic system functions such as memory allocation, file management, and operation of mass storage devices 202. At a higher level, an analysis module 225, implemented as a series of stored instructions, may be included to assist the user in developing queries, or to detect queries that do not accord with the above-described rules (or which fall outside the global lexicon). Instructions defining a user interface 230 allow straightforward interaction over screen display 214. User interface 230 provides functionality for generating words or graphical images on display 214 to prompt action by the user, and for accepting user commands from keyboard 210 and/or position-sensing device 212. A web browser 232 facilitates interaction with the web via network interface 208. Browser 232 may be integrated with user interface 230, deriving therefrom the functionality necessary for interaction with the user. Suitable browsers are well known and readily available; these include the EXPLORER browser marketed by Microsoft Corp., and the COMMUNICATOR and NAVIGATOR products supplied by Netscape Communications Corp.
To support analysis module 225 (if included), main memory 204 may also include a partition defining a series of databases capable of storing the linguistic units of the invention; these are representatively denoted by reference numerals 235₁, 235₂, 235₃, 235₄. Databases 235, which may be physically distinct (i.e., stored in different memory partitions and as separate files on storage device 202) or logically distinct (i.e., stored in a single memory partition as a structured list that may be addressed as a plurality of databases), each contain all of the linguistic units corresponding to a particular class. In other words, each database is organized as a table each of whose columns lists all of the linguistic units of the particular class. Nominal terms may be contained in database 235₁, and a representative example of the contents of that database appears in Appendix 1 hereto; connectors may be contained in database 235₂, a representative example of which appears in Appendix 2 hereto; descriptors may be contained in database 235₃, a representative example of which appears in Appendix 3 hereto; and logical connectors (most simply, "and" and "or") are contained in database 235₄. The appendices may simply contain lists of linguistic units, but are preferably formatted in three columns: the first containing the linguistic unit, the second containing a definition (if the linguistic unit has more than one meaning and is therefore replicated in the database), and the third containing synonyms.
An input buffer 240 receives from the user, via keyboard 210, an input sentence. Analysis module 225 examines the input sentence for conformance to the structure, and makes corrections as necessary. Analysis module 225 enters a proposed sentence revision (or the unmodified sentence, if no changes were necessary) into an output buffer 245, the contents of which are presented to the user over screen display 214 (e.g., as a pop-up window in the browser display). The user is free to accept the revision or revise it; in the latter case, analysis module 225 once again reviews the sentence for conformance to the above-described rules, and enters the approved sentence or a proposed revision into output buffer 245. If the appendices include definitions and synonyms, analysis module 225 first determines whether each linguistic unit has more than one meaning (i.e., definition). If so, the user is prompted (via screen display 214) to choose the entry with the intended meaning. If a linguistic unit has one or more associated synonyms, these are offered to the user as alternatives. Furthermore, if a synonym is linked to more than one linguistic unit, all of these are offered as alternatives.
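The prompting behavior just described might be sketched as follows. This is only an illustration under stated assumptions: the lexicon is a list of `(unit, definition, synonyms)` tuples, and `prompts_for` and its dictionary keys are hypothetical names, not part of the disclosed system.

```python
def prompts_for(word, table):
    """Collect the choices that could be offered to the user for one word.

    table: list of (unit, definition, synonyms) tuples (assumed layout).
    """
    prompts = {}
    senses = [row for row in table if row[0] == word]

    # More than one entry for the same unit: ask which meaning is intended.
    if len(senses) > 1:
        prompts["choose_meaning"] = [definition for _, definition, _ in senses]

    # Synonyms attached to the unit are offered as alternatives.
    synonyms = sorted({s for _, _, syns in senses for s in syns})
    if synonyms:
        prompts["synonyms"] = synonyms

    # The word may itself be a synonym linked to one or more units.
    linked = [unit for unit, _, syns in table if word in syns]
    if linked:
        prompts["linked_units"] = linked
    return prompts


table = [
    ("barrel", "for storage", ()),
    ("barrel", "of gun", ()),
    ("bath-towel", "", ("towel", "hand-towel")),
    ("dish-towel", "", ("towel",)),
    ("advice", "", ("recommendation", "suggestion", "opinion")),
]
```

With this sample table, "barrel" triggers a choice between two meanings, while "towel" is linked to both "bath-towel" and "dish-towel", so both units are offered.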
It must be understood that although the modules of main memory 204 have been described separately, this is for clarity of presentation only; so long as the system performs all necessary functions, it is immaterial how they are distributed within the system and the programming architecture thereof.
Operation of the invention may be understood with reference to FIG. 3. The browser 232 is capable of establishing connection, via network interface 208, to one or more remote sources 300. These sources are servers containing one or more web pages that include text and rendering instructions. When a web page is downloaded by browser 232 (via network interface 208), it is cached, and browser 232 executes the rendering instructions to create on screen 214 a display that includes the text, as well as graphical and/or image portions, of the web page. Each web page may be stored on remote source 300 as a document containing a body portion 302b and a header portion 302h. Only the body portion 302b is actually visible when the web page is "visited"— that is, downloaded onto the client computer (usually accompanied by further interaction with the server).
Web pages are stored as a database 310 on a search engine 315, i.e., a specialized server computer equipped to apply to database 310 queries received from connected client computers. Typically, the entire textual portion of each stored web page appears in database 310. Although only the body portion 302b of a document will actually appear on the display of web browser 232, both header and body portions are searchable by key word. Search engine 315 applies a client-originated query to database 310 and generates a report listing the web pages matching the search criteria. The various search engines differ in their operating characteristics, but generally the results of the search appear as a list of hypertext links to the identified web pages, each link being accompanied by a portion of the text.
In general operation, browser 232 performs a sequence of steps that is initiated by the user's acceptance of the query in output buffer 245, shown as a step 320. Browser 232 then transmits the query (step 322), via network interface 208, to a search engine 315 with which the client computer has established an Internet connection. The search engine 315 applies the query to its database 310 (preferably in accordance with the query-processing routine described above), identifying relevant web pages, and returning a list of hypertext links thereto. Generally, the list is ranked hierarchically to reflect both the absolute number of word or information-unit matches between the query and the listed documents as well as other factors suggesting relevance; for example, a document in which word order is preserved or the query terms are found in close proximity to one another may be ranked higher than another document with the same number of word matches but where the words are separated or scattered. The invention is capable of extending its search to a desired level of estimated relevance, ordering the retrieved documents according to relevance criteria.
The list of documents is received by browser 232 in step 324. The client user may operate browser 232 to execute selected ones of the returned links in step 326, resulting in download and display of the linked web pages in step 328.
In accordance with the invention, the headers 302h of documents 302 each contain both key words descriptive of the contents of the web page and an abstract, composed in accordance with the grammar hereinabove described, which also describes the subject matter. In formulating a query, the user is free to enter a conventional series of key words or a sentence formulated according to the grammar rules hereof. Search engine 315 may prompt the user to designate whether the query is structured or unstructured, or may simply infer this from the query itself, or may instead simply search for matches regardless of the query format. If the query is identified as structured, search engine 315 may apply the search only to the structured portions of web-page headers 302h. Indeed, due to the utility of the invention's grammar in making meaning explicit, the user may elect to apply even an unstructured search only to the structured portions of the web-page headers.
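The choice of which portion of a document to search can be pictured with a minimal sketch. The text does not say how a structured query would be inferred; the heuristic below (a query is treated as structured when it contains a known connector) is purely an assumption, as are the names `looks_structured`, `fields_to_search`, and the field labels.

```python
# A few connectors sampled from Appendix 2; a real lexicon would be far larger.
SAMPLE_CONNECTORS = {"above", "bite", "chase", "contain"}


def looks_structured(query, connectors=SAMPLE_CONNECTORS):
    """Assumed heuristic: a structured sentence contains at least one connector."""
    return any(token in connectors for token in query.lower().split())


def fields_to_search(query, abstract_only=False):
    """Pick the document portions a query is applied to.

    abstract_only models the user electing to restrict even an
    unstructured query to the structured header abstract.
    """
    if abstract_only or looks_structured(query):
        return ["header_abstract"]
    return ["header_keywords", "header_abstract", "body"]
```

Under this assumption, `fields_to_search("cat chase mouse")` restricts the search to the structured abstract, while a bare key-word query is applied to every searchable portion unless the user opts otherwise.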
Because of the importance of the order and organization of words in a structured query, search engine 315, when performing a search in accordance with the invention, is configured for sensitivity to word order and proximity. Word order is always preserved in all information units extracted from a sentence.
Ranking can be achieved by emphasizing units extracted from the sentence without word separation. The distance between matched words can also be used as a ranking factor, as can differences in the hierarchical (bracketing) level at which a match occurs. For example, absolute literal matches are weighted more highly than matches where the word order differs from that of the query, or where the identified query words are scattered within the document. Accordingly, in the example discussed above, entry 10 would be selected over the other entries even if these contained a larger absolute number of word matches.
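One way to picture these ranking factors is the sketch below. The weights (3.0 for a literal contiguous match, a 0.5 penalty for changed word order, a proximity term between 0 and 1) are illustrative assumptions, not values from the specification, and `relevance` is a hypothetical name. Bracketing level is not modeled.

```python
def relevance(query, document):
    """Score a document against a query using word order and proximity."""
    q, d = query.lower().split(), document.lower().split()
    if not q:
        return 0.0
    n = len(q)

    # Absolute literal match: all query words adjacent and in order.
    if any(d[i:i + n] == q for i in range(len(d) - n + 1)):
        return 3.0

    # First position of each query word that occurs in the document.
    positions = [d.index(w) for w in q if w in d]
    if not positions:
        return 0.0

    in_order = positions == sorted(positions)
    span = max(positions) - min(positions) + 1
    proximity = min(1.0, len(positions) / span)   # 1.0 when matches are adjacent
    return (1.0 if in_order else 0.5) + proximity


docs = [
    "man that the dog will bite",
    "dog will often bite a man",
    "a dog bite man story",
]
ranked = sorted(docs, key=lambda doc: relevance("dog bite man", doc), reverse=True)
```

Here the literal match ranks first, the in-order but scattered match second, and the reordered match last, even though all three documents contain the same number of matching words.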
It will therefore be seen that the foregoing represents a readily implemented and exploited approach to improving the reliability of text-based searches. The terms and expressions employed herein are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. For example, the various modules of the invention can be implemented on a general-purpose computer using appropriate software instructions, or as hardware circuits, or as mixed hardware-software combinations.
APPENDIX 1
NOMINATIVE TERMS
"actor" H "
"actress"
"address" "ofbuilding"
"advertisement" " " ad, commercial"
"advice" "recommendation, suggestion, opinion"
"afiica" in velt"
"afternoon"
"age" n II
"air" n II
"airplane" n "plane, aircraft"
"airplane-pilot" " " "pilot, aviator"
"airport" "airfield"
"alarm-clock "clock"
"algeria" "altitude" "aluminum" " " "ambassador" " " "animal" "ankle"" "
"answer" "on test" "solution" Hant" " " "apartment"
"apple"" " appointment >"" " "ajrrangement to meet" "apricot t"" "April" ' It n "architect" "Argentina"
"argument" "heated discussion" "fight, dispute, quarrel" "arm" "of body"
"armchair" "chair"
"army
"arrival"
"art-exhibition" "art-show, exhibition"
"artist"
"artwork" "art"
"Asia" " "
"attic" " "
"August"
"aunt" " "
"Australia"
"Austria"
"automotive-insurance "automobile-insurance, insurance, car-insurance" "avalanche" "landslide" "baby" " " M n "baby-crib" "crib" "back" "of body" "backpack" knapsack"
"baker" "balcony" n it
"ball" "for game" "banana" "band" "musical" group" "bandage" "band-aid" "bank" "financial"
"banquet" "dinner"
"barber" "hairdresser"
"barn" " " "stable"
"barrel" "for storage" tt n
"barrel" "of gun"
"basement" "cellar'
"bathrobe"
"bath-towel" "towel, hand-towel"
"bathtub" "tub"
"battery" "electrical" " "
"beach"
"bean" " "
"bear" " "
"beard"
"bed" " " "cot"
"bedroom"
"bee
"beer" " " "ale"
"beet" " "
"behavior"
"Belgium"
"bell" " "
"bell-pepper" "pepper"
"belt" "for pants s" II It
"beverage "drink" "bicycle"
"bird
"birth "childbirth"
"birthday" "bladder" "blanket" "blood" "boat" " " "ship" "body" "of animal "bodyguard" " " "guard" "bolivia" "bomb"
"bone" " "
"book" " "
"border" "of state"
"bottle"
"bottom"
"bow-tie" n n
"boy" " " n n
"boyfriend"
"bracelet" H n
"brain" " " "mind"
"brake" "for stopping machine" "
"brand" "brandname, kind, type"
"brass" " "
"brazil"
"bread"
"breakfast"
"breath" n II
"brick" " "
"bridge" "for crossing" " "
"broom"
"brother"
"Bulgaria" H n
"bullet" "forj gun" " "
"bus-driver"
"business-trir. "trip"
"butcher"
"butter"
"butterfly"
"button" "for clothing" " "
"button" "informational" "button, p
"cabbage"
"cab-driver" "taxi-driver"
"cabin" " " "chalet, cottage"
"cafe" " " "coffee-shop"
"cake" "for celebration"
"camel"
"Canada"
"canal" " " n H
"candle"
"capital" "of state"
"captain" "of boat"
"captain" "rank in military" " "
"cardboard" tt n
"cardboard-box" "box"
"cargo" "freight"
"carpenter"
"carpet" "wall-to-wall" "rug" "carrot"
"cash" " " money
"cast-iron" "iron"
"cat" "domestic"
"cauliflower" " "
"chain" "of links"
"chain" "of stores"
"chair" " " " "
"chart" " " "table"
"chauffeur" "driver"
"cheek"
"cheese" n H
"chemistry" tt tt
"cherry"
"chess"" "
"chess-piece"
"chest" "of body
"chicken"
"chile" " "
"chili-pepper' pepper"
"chin" " "
"china" "country
"chocolate"
"Christmas"
"church"
"cigar" " "
"cigarette"
"circle"" "
"citizen"
"city" " "
"clock"" "
"clothes-iron" " " "iron"
"clothing" "chothes"
"cloud"
"club" "weapon"
"coal" " " " "
"coat" " " "jacket"
"cockroach"
"coffee" "the beverage"
"coffee-bean" " "coffee"
"coffee-pot" " "pot"
"collar" "of animal" " "
"collar" "of shirt"
"Colombia"
"color" " "
"comb"
"competition" "between people" "rivalry"
"computer" " " "concert"
"condition" "of object" "state, shape"
"construction-crane" "crane"
"conversation "discussion"
"cook" " " "chef'
"copper"
"corn" " "
"cost" " " "price" 1
"cotton"
"couch" "sofa"
"country" "nation"
"country-club' "club"
"countryside" "country"
"courage" "bravery"
"cousin"
"cracker" "biscuit"
"cream" "dairy"
"crime"
"crop" " "
"Cuba" " "
"cucumber" " " " " "cup" "for drinking" "glass" "Czechoslovakia" "damage" " " "harm" "dance" "event" "ball"
"danger" " " "hazard, risk, peril, jeopardy"
"date" "romantic" "date" "in time" "daughter"
"day" "length of time" "day-trip" " " "trip, field-trip" "death"" " "debt" " " " " "December"
"decision" " " "choice, selection" "degree" "academic" " "
"demonstration" " " "demo"
"Denmark" "dentist" "departure" It It tt tt desert" "dessert i"ll H II M M "diagram" H II It It "diarrhea" II II II It
« dji:c~ti.i:o~n—a~r—y."ιι H II II H "digestion" H II II II "dining-room ,"11 II II It It "direction" "to be pointed in" "directions" " " "instructions, direction"
"disease" "sickness, illness"
"dishes" "china, dish"
"dish-towel" " " "towel"
"distance"
"dock" " " "pier, wharf
"document"
"dog" " "
"donkey"
"door" "
"doormat" "mat"
"drawing" "illustration"
"dream" "in sleep"
"dress" " "
"driver" "of car" "motorist"
"driveiVlicense" "license"
"drug" "illegal"
"drum" "for music"
"duck" " " " "
"dust" " " n n
"eagle"
"ear" earring "earthquake "Ecuador" tt tt "edge" " " "border" "education" "training" "eel" " " "egg" " " tt n "Egypt" tt tt "elbow" "electricity" "elevator" "lift" "emergency-exit' "end" "ofevent" "finish, conclusion" "end" "of object "enemy" " " "rival, adversary"
"energy-source" "engine" " " "motor" engineer "of science" "England" "entrance" "to building" "entry" "envelope" "essay"" " paper" "Ethiopia" "Europe" "exhibition" "show, exhibit" "exit" " " "expedition" "adventure, trip, journey"
"exportation "export"
"eye" " "
"face" "ofperson"
"fact" " " "truth"
"factory" "plant"
"fall" "season" "autumn"
"family"
"feπn" " " "ranch"
"father"
"features" "propety, characteristic, atl
"February"
"ferry" " "
"fig" " "
"finger"
"fingernail" "nail"
"Finland"
"fire" " "
"fish" " "
"fist" " "
"flashlight" "light"
"flavor" tt n "taste"
"flea" " "
"flood"" "
"floor" "of room"
"floor" "story1 ' "story ,lt
"flour" " "
"flower"
"flower-gradei n" " " "garden, flower-bed'
"flower-pot" "pot"
"flute" " "
"fly" "insect M
"fly" "in pants"
"food" " "
"foot" " "
"football" "the game" " "
"football" "the ball"
"forest" "woods, woodland"
"fork" "inroad"
"fork" "untensil"
"fox" " "
"France" tt tt
"Friday"
"friend"
"front" "of object"
"fruit" " "
"fundraiser" "event" "benefit" "funeral" "burial"
"game" "to play"
"garbage" "trash, waste"
"garlic"
"gasoline" " " "gas, petrol, fuel"
"gem" " " "jewel"
"geographic-map" "map"
"Germany" " "
"crirl" " " " "
"girlfriend" " "
"glass" "material"
"glasses" " "
"glove"
"glue" " " "paste, cement"
"goat" " " " "
"god" " " " "
"gold" " "
"goose"
"government" " "
"grape"
"grapefruit" " "
"grass
"graveyard" " " "cemetary"
"Greece"
"grocery-store" " " "supermarket, market"
"ground-pepper" "pepper"
"group" " " "set, collection, bunch"
"guest" "in home" "guest, visitor"
"guest" "on program' ' "guest"
"guest" "of hotel" "guest"
"guidebook" " " "guide"
"gun" " " " "
"gymnastics" " " n n
"hail" " " " "
"hair" " " " "
"hairbrush "brush"
"half' " " " "
"hammer" " " "mallet"
"hand" "of person"
"handkerchief"" "
"happiness" " " "pleasure"
"harvest" "crop"
"hat" " " " "
"head" "of body"
"headlight" " " "ϋght"
"health"
"health-insurance" "insurance"
"heart" "in body" "heart" "shape"
"heel" "offoot"
"heel" "of shoe" tt tt
"here" " " " "
"highway" " " "turnpike"
"hole" "through object t""
"hole" "in ground" It N
"holland"
"honey" " "
"horse"" " M H
"hot-cocoa" N tl "hot-chocolate, cocoa"
"Hungary"
"husband" tt H
"hypdermic-needle" It II "needle"
"ice-cream" " " It II
"Iceland" II II
"idea" " " "thought II
"importation" import
"income-tax" It II N H
"India" " " M n
"Indonesia" It H tt «
"information" N It "data"
"insect" M It "bug"
"insurance" tt It interpreter II It "invention" tt tl "invoice" tt tl "bill, receipt"
"Iran" " " H N
"Iraq" " " " "
"Ireland"
"iron" "material"
"island" II II "isle"
"Israel" N II
"Italy" " " II N
"January" II II
"japan"" " II n
"job-title" It II "job"
"Jordan" n it II N
"juice" " " II II
"July" " " M n
"June" " "
"Kenya" II M
"kidney" II N
"kind "type, sort, variety, make, model, style"
"king" " " "kitchen" "kitchen-knife' "knife" "knee" " " "Kuwait" " "
"lace" "decorative" « tt
"ladder"
"lake
"lamp" "table" "light H
"lamp" "standing" "Ught"
"language" " "
"lead" "material"
"leaf" " " " "
"leather"
"Lebanon"
"leer" " " " "
"lemon"
"length"
"letter" "written"
"letter" "of alphabet"
"Liberia"
"Libya"
"license"
"life" " " " "
"life-insurance" " " "insurance"
"light" "illumination" "illumination"
"light-bulb" " " "bulb"
"light-fixture" " " "Ught"
"lightning"
"lime" "substance"
"lime" "fruit" " "
"lion" " " " "
"lin" " " " "
"liquid" "fluid"
"liver" " " " "
"living-room" " "
"lobster"
"lock" "for key"
"look" "on face" "expression"
"look" "appearance" "appearance"
"love" " " " "
"luck fortune"
"luggage" " " "baggage"
"lunch"" "
"lung" " " " "
"machine" "machinery, device, apparatus'
"magazine" " "
"magic"
"mail" " " " "
"Malaysia" " "
"Malta"
"man" " " " " "march" "month-of
"marriage"
"match" "of game" "game"
"match" "for fire"
"mattress"
"may" " "
"meat" " " "flesh"
"medication" " " "drug, medicine"
"meeting"
"member" "of club" II n
"memorial-service commeration, memorial"
"metal"" " " "
"meter" "unit of measurement" " "
"meter" "device for measurement" gauge"
"Mexico"
"middle"
"miUtary-camp" 'camp"
"milk" " " " "
"minute" " "
"mistake" "error"
"mixing-bowl" " " "bowl"
"Monday"
"monkey"
"month"
"monument" "memorial"
"moon"
"morning"
"morocco"
"mosquito"
"mother"
"mountain"
"mouse" "animal" n n
"mouth" "of animal"
"mouth" of person" N n
"movie" "film"
"movie-star" "star"
"movie-theater1 "theater"
"mushroom" "toadstool"
"mustard"
"nail" "for wo >oodd"'
"nail-file"
"name" "persona 1l"
"necklace"
"necktie" "tie"
"neighbor" " "
"Nepal"
"Netherlands" " "
"newspaper" " " paper "new-Zealand""
"Nicaragua
"Nigeria"
"night" " "
"noon"
"north-america"
"north-pole"
"Norway" M " " "
"nose" " " " "
"November"
"numeral" " " "number"
"nurse"
"nut" "hardware" " " "nut" "for eating" " " "oar" " " "paddle"
"object" " " "thing"
"objective" "goal" "aim, goal, intention"
"occupation" " " "job, work"
"October" " " " "
"office" " " "study"
"oil" "for cooking" " "
"oil" "for eating" " "
"olive" " " " "
"onion"
"opponent" " " "adversary, opposition"
"option" " " "choice, alternative"
"orange"
"outer-space" space"
"package" parcel"
"pain" "emotional" "suffering, grief '
"pain" "physical" "suffering, discomfort, hurt"
"painting" " " " "
"Pakistan" " " " "
"pancake"
"pants"" " "slacks, trousers"
"paper-bag" " " "bag"
"parachute" n II
"parent" tt w
"parking-lot" " " " "
"parking-space" " " " "
"parking-ticket" " " "ticket"
"part" "in play" "role, part'
"part" "of device/machine" "c IΛoJmllψpoUnlleCnlUt
"partridge"
"passport"
"pasta" " " "noodle"
"pavilion tent »« "pea" " " "peace" "pear" " "
pen" "for writing" pen" "for animals" "coop"
"pencil"
"people"
"Peru" " "
"pharmacy" "drugstore"
"PhiUppines"
"photo-copy "copy"
"photograph "photo, picture"
"physician" "doctor"
"piano"
"picture" "Ulustration"
"piece" " " "section, portion, share, bit, segment, slice, chunk, fraction, part"
"pig" " "
"pigeon"
"pfflow"
"pine-tree" " "
"pipe" "for smoking
"pipe" "for plumbing""
"plant "
"plastic-bag" " " "bag"
"plate" "for eating" "dish"
"play" "dramatic"
"playing-card" " " "card"
"plum" " "
"pocket"
"pocket-knife "knife"
"poison" "toxin, venom"
"Poland"
"pohce-officer" "officer, policeman, poUce"
"port" "for shipping" "harbor"
"porter" "for door" "doorman"
"porter" "for bags"
"Portugal" " " " " possessions "property" "possibiUty" "likelihood, risk, chance" "postage-stamp" " " "stamp" "postcard" "card" "post-office" " " "pot" "for cooking" "potato" "powder" "practical-joke" "joke" "present" 'gift" "prison" "jail, penitentary" "prison-guard" " " "guard"
"problem" "difficulty, trouble, issue"
"problem" "to be solved" "question"
"property" "of land" "land, real-estate, characteristic"
"protest-march" "march"
"pubUc-bus" " " "bus"
"pubUc-garden" "garden"
"public-holiday" "hoUday"
"pubUc-Ubraiy" "Ubrary"
"purse"" " " "
"push-button" " " "button"
"quantity" "number, total, sum, amount"
"queen" " " II n
"question" "inquiry"
"rabbit" "hare"
"radio"" "
"raincoat" " "
"razor-blade" " " "razor"
"reason" " " "excuse, explanation, justification, arguer
"refrigerator" " "
"region" " " "area, part"
"reUgion" " "
"rent" " " " "
"response" " " "reply, answer"
"restaurant" " "
"result "effect, outcome, consequence"
"ridding-crop" " " "crop"
"riddle" "joke"
"ring" "for finger"
"ring" "shape of
"river" " " " "
"river-bank" " " "bank"
"road" " " " "
"road-map" " " "map"
"rocket" " " "missile"
"roof " " " "
"room" "of building"
"root" "ofplant"
"rubber" "material"
"rug" " " "carpet, mat"
"Rumania" " "
"Russia" " "
"rust" " "
"sadness" " "
"safety" "security" "safety-belt" " " "seat-belt" "sailor"" "
"sales-tax"
"sand" " " " "
"Saturday" " " "sauce" " " "gravy" "Saudi-Arabia" U N N N
"sausage" " " N N
"scale" "device to measure weight" "balance"
"scale" "offish"
"scarf
"school" n it
"schoolbus" "bus"
"science"
"scissors"
"Scotland"
"screw"
"sea" " "
"searchlight" " "
"section" "of town" "part, buπow' "security-guard" " " "guard" "self " " " "
"sentry" "guard"
"September" " " "sewing-needle" "needle" "shape" "contour" "shaver" " " "razor"
"sheep" "shirt" " "
"shoe" " "
"shoelace" " " "lace" "shoulder" "ofperson" "side" "of object" " "
"signature" M II "autograph" "silk" " " H N "sUver"" " N N "sister"" " N H "size" " " II H "ski" " " N II "sky" " " tt tt "sled" " " toboggan" "smell" " " aroma, odor" "smoke" "snake" "snow"" " "soap" " " "sock" " " "soft-drink" "soda, soda-pop" "soldier" "troop"
"son" " "
"song" " " "baUad"
"sound" "noise"
"soup" " "
"south-Africa" " "
"south-America"
"south-pole"
"space" "room"
"Spain" tt n
"species" "kind, type"
"speech" "address"
"spoon"
"spring" "season" "springtime"
"spring" "coil"
"spring" "of water
"staircase" "stairs, stairweU, stairway"
"stamp" "tool for stamping" " "
"star" "in sky"
"steel" " "
"stick" " " "twig' n
"still-camera" "camera"
"stock-market"
"stomach"
"stone"" " "rock'
"store" " " "boutique, shop, market"
"storm"
"story "tale"
"stove" "for heat"
"stove" "for cooking"
"straight-pin" n H "pin"
"street" "lane, avenue"
"street-map" "map"
"student" "pupil"
"subway"
"sugar"" "
"summer"
"summer-camp" " " "camp"
"sun" " "
"Sunday"
"sunhght" "sunshine, Ught, dayUght, sun'
"supper" "dinner"
"swamp" "marsh, bog"
"Sweden"
"Switzerland"
"Syria
"table" "physical"
"table-salt" "salt" "tail" " " H II "tailor" " " H H "task" " " II 'job, chore" "tax" " " II duty, tarif "tea" " " H H "teacher" N tt "instructor" "telephone" N II
"phone" "television" H II "tv" "television-set" "television" "tent" "for camping "Thailand" "theater" "for plays" "playhouse" "thief " " "robber" "thigh" " " "thirst" " " "throat" It H "thumb" H H "thunder" H It Thursday" It II H N "ticket" "for event" "tiger" " " It II "time" "of something" time" "of day" "timetable" " " "schedule" "tin" " " " M
"title" "of objects "name" "toast" "cheer"" " "toast" "of bread" It N
M II "tobacco" tt It "today" N II II II "toe" " " It H "toenaU" II It "naU" "toϋet" " " II II "tomato" II tt "tomorrow" II tl "tongue" H II "tool" " " "device" "tooth"" " N II "toothbrush" « II H N "top" " " "tour" " " H II "tour-guide" guide" "town" " " "vilUage" "toy" " " H H "traffic-Ught" n II "Ught" "traffic-ticket" " "ticket, speeding-ticket" "traU-guide" " "guide" "train" " " "tree" " "
"truck-driver" "
"trunk" "for storage" "box, chest" "trunk" "of elephant" " " "trunk" "of car" "Tuesday" " " " "
"Tunisia"
"turkey" "animal" " "
"turkey" "the country" " "
"tv-set" "television, tv"
"tv-show" "tv-program, television-show"
"umbreUa"
"uncle"" "
"united-states ' "america, us, usa"
"Uruguay"
"vacation"
"vaccination" "inoculation"
"vegetable"
"vegetable-garden" "garden"
"velvet"
"Venezuela"
"victim"
"view" " "
"vinegar"
"violin"
"visitor" "of exhibit/musuem" " "
"waiter" "waitress"
"walking-stick" " " "cane"
"waU" "outside"
"wall" "ofroom"
"water"
"weather" "climate"
"wedding"
"Wednesday"
"week"" "
"weight" "for Ufting" " "
"weight" "of something"" "
"wheat"
"where"
"who
"wife" " "
"wild-game" "game"
"wind" " " "breeze, draft"
"window"
"window-curtain" curtain"
"winter" it "woman"
"wood" "lumbar, firewood"
"wool
"word"" " " "
"woven-basket" "basket"
"wristwatch" " " "watch"
"writer" " " "author, noveUst"
"writing-paper" " " "notepaper, paper1 "vear" " " " " "yesterday" " " "Yugoslavia" " "
APPENDIX 2
CONNECTORS
"able-to" " " "can, capable-of, able, could"
"aboard" " "on"
"about" " " "concern, regard, on"
"above" " " "over"
"absorb" " " "soak-up"
"accept" "invitation/award" " "
"accept" "club/coUege" "get-into"
"accustomed-to" " " "used-to, famiUar, accustomed"
"across"
"add" " "
"address" "deal-with issue" "deal-with"
"admire" "respect"
"admit-that" " " "acknowledge, confess, admit"
"adopt" "methods"
"adopt" "child" " "
"advertise"
"afraid-of "fear, scared"
"after" "in time"
"age" " " "years-old"
"agree-to" "agree"
"agree-with" " " "agree"
"aUege" "something" "assert, claim"
"aUergic-to" "aUergic"
"allowed-to" " " "can, may, aUowed"
"amaze" " " "astonish, surprize"
"among" "amid"
"amuse" "interest, like"
"and" " "
"angry-that" " " "angry, upset, mad"
"apply-for" "apply"
"aquit" " " "exonerate"
"around" "encircle" "encircle, enclose, surround"
"around" "in general" " "
"arrest"
"as-favor-to" "for"
"as-if " " "as-though, as, like"
"assemble" " " "put-together"
"assist"" " "aid, help"
"as-work-for" "for"
M-f" II II
"attach" " " "put-together, attach, pin, connect, Unk"
"attack"
"attend" "school" "at, enroUed"
"authorize" certify" "avoid" "avert"
"aware-that" "know, realize, understand"
"away-from" "from"
"bad-for" "detrimental-to"
"bake" " "
"bargin-with"
"bathe"" "
"be" " " "is, am, was, are"
"beat-up"
"because" "since"
"become" "turn-into, change"
"become-infected-with" " " "get"
"before" "in time" "by, precede, prior-to, sooner-than"
"begin"" " "start"
"behave-like' "act-like"
"behind" "fall-behind"
"bend" " " "fold"
"better-than"
"better-than" M H
"between" "in-between"
"beware-of H II "lookout-for"
"beyond" It It "farther-than, past"
"bite N N
"blame" II H "accuse" "blow-up" tt H "detonate, explode, bomb" "boil" " " M II "born-in" II II "from" "bother" tt II "annoy, irritate, disturb, bug, pester" "brag-that" tl It "boast, brag" "break" "does not function "bribe" " " "blackmail, extort" "brush" "hair" " " "brush" "teeth" " " "buUd" " " "make, construct" "burn" "w/ fire" "scorch, singe, char" "bury" " " " " N
"but" " " " "aalthough, except, however, nevertheless' "caUed" " II "H "named, known-as"
"caU-to" " "wwiitthh vvooiiccee"" "caU" caU-up" "with phone" "phone, caU"
"can-afford-to" " " "can, afford"
"care-about" " " "care, concerned" "care-for" "people, plants, animals" "tend, take-care-of, look-after" "carry" " " " " "catch" "person/animal" "apprehend, capture"
"catch" "ball/object
"cause" " " "make, result-in" "celebrate" M n
"censor" "suppress"
"certain-that" "sure, know, certain"
"change" "alter, adjust, modify"
"chase" "pursue"
"cheat-at" "swindle, cheat"
"chew"" "
"circle" " "
"climb"" " "mount t, scale"
"close" "business, school" " " "close" "Ud or door" "shut" "coUect" "for hobby" " "
"comb
"come-to" come, get-to, reach, arrive-at, arrive" "compare" "complement" " " "congratulate" " " "contact"
"contain" "include"
"cook" " " " "
"cool" " " " "
"cost" " " " "
"cover" " " "cover-up"
"crack fracture"
"criticize"
"crush"" "
"cure" " " "heal' 1
"cut-down" " " "cut, chop-down"
"cut-off" "cut"
"damage"
"date" " " "go-out-with"
"decide-that" " " "determine"
"decorate" n N
"decrease" "reduce, diminish, lessen'
"deduce-that" " " "realize, infer, calculate"
"defeat" "beat, overcome"
"deflate"
"demonstrate" " " "show"
"depend-on" " " "rely-on"
"describe" "explain"
"design" "devise"
"despite" "in-spite-of
"destroy" "ruin, demoUsh, wreak"
"detect" "sense"
"dial" " " " "
"different-from"
"disagree-with" "disagree"
"disappoint" " " "disconnect" " " "detach"
"discuss" " " "talk-about"
"disinfect" "sterilize"
"dislike" "dont-like"
"disobey"
"display" " " "show, exhibit"
"distribute" " "
"distrust" " " "mistrust"
"divorce" " "
"dont-care-about" "dont-care"
"dont-know-how-to " " " "cant"
"doubt-that "doubt"
"down"
"drag
"draw" " " " "
"dread"
"drink" " "
"drive" "any vehicle" "fly, sail, pilot"
"drop" " " "let-go-of
"during" "in"
"edit" " " " "
"elect" " " " "
"embarrassed" "personal inadequacy" "ashamed"
"emit" " " "give- off, exhaust"
"empty"
"engage-to" " "
"enter" " " "go-into, into"
"envy" " " "jealoi us"
"even-if "weather-or-not"
"excite "thriU"
"exempt-from"" " "free"
"exit" " " "go-out-of, out"
"expel"" " " "
"explain-why" " " "explain"
"explore"
"extract" "remove"
"face" " " "point-toward"
"fail" " " " "
"far-from" "far-away-from"
"fascinate" "interest"
"festen"
"fed-up-with" " " "tired-of
"feed" " " " "
"feel-sorry-for" pity, sorry"
"fiU-out" "complete"
"fiU-up"
"find" ' come-across, come-upon, locate, discover' "finish"" " "complete, conclude, finalize" "fire" "dismiss" "first-meet" " " "meet"
"fit-in" fit"
"fix" " " "repair, mend, darn" "foUow" " " " "
"for" "for-time-of " " "for" "by-comparison-of " " "for" "for-doing" " " "for-cost-of " " "for" "forget" " " " "
"for-purpose-of " " "for" "frighten" "in general" "scare" "frightent" "suddenly" "scare" "from" "location" "from" "point in time" " " A-. II II II H
"gather" " " "pick-up, coUect"
"give-away" " give-birth-to" " " "conceive, have, bear" "go-from" " " "leave, depart" "go-get" " " "get, retrieve, fetch, pick-up"
"good-at" " " "excel-at" "good-for" " " "benefit" "go-to" " "
"go-to" " " "go"
"grab" " " "take" "graduate-from"
"greet" welcome"
"griU" " " "barbeque"
"grow-up-in" " " "raised-in"
"guess-that" " " "guess"
"hangout-with"
"happy-that" " " "glad-that, happy, glad"
"hate" " " "despise, detest, loathe, abhor, cant-stand"
"have take"
"have-here" have"
"have-to" " " "must" "hear" " " " "
"heard-of know-of
"hide" " " "conceal" "hire" " " " " "hit" "target" "hold" "in hand" "hope" " " "wish" "hug" " " "embrace" "hunt"
« i.f "question" "ignore"
"imply" " "
"impressed" " " "impressed-by"
"imprison" "lock-up"
"improve" " "
"in" "in-area-of
"in" "in-time-of N n
"in-addition" " " "further-more, moreover,
"in-case" " "
"in-center-of " "
"in-charge-of " " "run, direct, manage"
"increase" " "
"indicate-that" " " "signify, mean, indicate"
"in-exchange-for" "for"
"in-favor-of " " "for, approve-of
"infect" "give"
"inflate"
"in-front-of " " "ahead-of, front"
"inherit" "legal
"inherit" "biologicaUy" " "
"injure" "wound, hurt"
"in-order-of " " "by"
"in-order-to" " " "so-that, to"
"insert"
"inside-of "within, in, inside"
"instead-of " "
"insult" " " "offend, hurt, tuant"
"intend-to" " " "mean, determined, intend'
"inteπogate" " " "question"
"into" " " "in"
"invade" "with i tniUtary" " "
"invent" " "
"invert"
"invite"
"jealous-of "romantic" " "
"join" " " " "
"kidnap"
"kiU" " " "slay"
"ldss" " " " "
"knock-over" " " "tip-over"
"know" "friend
"know" "acquaintance"" "
"know-how-to" " " "can, know, how"
"last-for" "continue-for"
"laugh-with' "laugh-at"
"lean-against" " " "on, lean-on, against"
"learn-about" " " "leave-behind" " " "leave, forget"
"legalize" " " II n
"less-than" " "
"Uberate" "free, let-go"
"lick" " " " "
"Ue-to" " " "Ue, decieve"
"lift-up" " " "raise, hoist, pick-up"
"like" "romantic"
"like" "general" "enjoy, fond-of, interested-ii
"likely-to" "bound-to, probably"
"limit" " " "restrain"
"Usten-to" "hear"
"Uve-in" "reside, inhabit, Uve"
"load-up" "load"
"lock" " " " "
"look-at "watch, observe"
"look-forward-to" "anticipate"
"look-up"
"loosen"
"lose" "object" "misplace"
"lose" "game"" "
"love" "romantic"
"love" "general"
"lower
"mad-at" "hate"
"made-of "consist-of, of
"make-fun-of " " "laugh-at, ridicule"
"make-sure" " " "ensure"
"manufacture "produce, generate"
"mark" " " " "
"marry"
"measure" " "
"meet"
"member-of "group" "belong-to"
"memorize" " "
"might" "could, maybe"
"mispronounce"
"miss" "regret loss"
"miss" "target" "overlook"
"mix-together"" " "mix, combine, put-together"
"more-than" " "
"move" "relocate" "relocate"
"move" "object
"nationalize" " "
"near close-to"
"nearer-than" " " "closer-than"
"need " "next-to" "beside, alongside"
"nominate" " "
"not-aUowed-to" "cant"
"obey oUow"
"of "possessive"
"off-of "off*
"omit" " " "leave-out"
"on-left-side-of
"on-media-of " " "on"
"on-other-side-of "across"
"on-right-side-of
"on-side-of " " "side"
"on-surface-of "on"
"on-to"" " "on"
"on-top-of " " "on"
"open" "business, school"
"open" "Ud or door"
"operate"
"opposed-to" " " "against, disapprove
"opposite-of " "
"out-of "out"
"outside-of " " "outside"
"overestimate" " " It It
"overwhelm" " " It II
" "Aowiimn1"1 " " "nncιn~ lι~-v~ll
"paint" "building/object"
"paint" "picture"
"parallel-to"
"park" " " " "
"pass" "to someone" " "
"pass" "test" " "
"pass-by" "pass, go-by, bypass"
"pass-by" "pass, go-by, bypass"
"pay-attention-to " " " "concentrate-on, focus-on"
"peel" " " " "
"permit-that" " " "aUow, let"
"perpendicular-to"
"photocopy" " " "copy"
"photograph" " " tt n
"pierce" "puncture, penetrate"
"plant" " " " "
"plan-to" "determined, plot-to, conspire, plan"
"play" "musical device" " "
"play" "game"" "
"plough"
"plug-in"
"poison" "poUsh"
"poUute"
"pour" " "
"predict-thai "forecast, predict"
"prepare" n It
"prescribe"
"press" " " "push"
"prevent" "prohibit, bar, keep, block"
"print" "w/ printer"
"privatize"
"promote" "in workplace"" "
"propose-that" " " "suggest, offer, propose"
"protect" "defend, guard"
"protest" "contest, dispute"
"prove" "show, demonstrate"
"pubUsh" "print"
"puU" " "
"puU-apart"
"punch" "hit, strike"
"push" " "
"put" " " "place"
"qualify-for"
"quit" " " "resign-from, resign"
"raise" "children"
"read" " "
"reassure" "comfort, reUeve"
"recommend-that" " " "suggest, advise, recommend"
"record"
"reflect"
"refuse-to"
"remember" "recaU, recoUect"
"remove" "take-off, delete, eUminate, erase"
"replace" "change, switch"
"rescue" "save, help"
"respond-to" "reply, answer, respond"
"ride" "animal"
"ride" "vehicle" "fly, take"
"roast" " "
"roU" " "
"rotate" "spin"
"sad-that" "regret, sorry, sad"
"salute"
"saw" " "
"say" " " "remark, declare, assert, aUege, state, notify, ir
"scald" "w/ water" "burn"
"scatter"
"scold" " " "reprimand" "score" "points" "make"
"search" It It It II
"search-for" H II "look-for, seek"
"see" " " "notice"
"select" N M "choose, pick, decide-on"
"sew" " " M N
"shake" tt It "rock, vibrate"
"share" " " H N
"shave" tt M H «
"shoot-down H It It N II
"shop-for" tt II "shop"
"should" H It "ought-to"
"similar-to" It II "like"
"since" " " "as-of
"sing" " " It II
"slow-down" II II "delay, retard, decelerate"
"smeU" " " II II
"sort" " " "organize, arrange"
"speak" "language" " "
"speed-up" It II "hasten, rush, accelerate"
"speU" " " n II
"spend" "time1 I It tt
"spend" "money" " "
"spϋl" " " It N
"spray"" " It N
"spread" H It n II
"squeeze" II II H M
"stay-at" tt II "remain-at, stay-in, remain-in, st
"steam" H II H H
"sting" " " "bite"
"stir" " " "mix"
"stop" " " "cease , end, halt"
"store" " " "stock, keep, save"
"straighten-up II " " "clean, tidy-up, organize'
"strangle" K II "choke"
"strike" "run into" "hit, run-into"
"study"" " It II
"subtract" tt tt "minus, deduct"
"supposed-to" II II "supposed"
"surprise" n it "shock"
"swaUow" tt M II M
"swing" n It "rock"
"take-into-account" "consider"
"taste" " " "try"
"tax" " " II It
"teach" "people ;" "instruct, train, educate"
"tear II π p_«
"terrify" It II It II "that" "statement" " "
"that-have" "with"
"think-about " " " "consider, think-over"
"think-that" " " "suppose, suspect, assume, beUeve"
"through"
"throughout1
"throw" " " "toss, hurl, heave"
"throw-away '" " " "throw-out"
"throw-up" "vomit"
"tighten" N It U N
"to" "send to"
"to" "say to"
"to" "give to" " "
"together-with" " " "accompany, with"
"to-give-to" "for"
"to-honor" "for"
"to-put-in" "for"
"torture"
"touch" "any form of contact" "contact"
"touch" "with hand" "feel"
"toward" it II "tn"
"translate"
"trick" " " "fool, decieve"
"trust" " "
"try-to" "attempt-to, try, attempt"
"turnaround"
"turn-of "switch-off
"turn-on" "switch-on, start"
"turnover" "flip"
"unable-to" "cant, incapable-of
"under" " " "beneath, underneath, below"
"underestimate" " " " "
"underline" H H U N
"understand" " " "comprehend"
"understand" "language" " "
"unfasten" "undo"
"unless"
"unlikely-to"
"unload"
"unlock"
"unplug"
"until" " " "till, to" up _ιι it
"use" " " "utilize, apply"
"use-up" "finish"
"using"" " "with"
"via" " " "by-means-of
"wait-for" føwpiwwai " "want" " " "want, desire, long-for, dream, wish'
"warm" "heat"
"warn "caution"
"wash" " " "clean1 It
"watch-for" "lookout-for"
"wear" " "
"wear-out"
"weigh"
"while"" " tt n
"win" " "
"wipe" " "
"without"
"work-for" "company" "work"
"work-on" "address, take-care-of
"worry" "worried"
"worse-than"
"worse-than" " "
"worship" "religious" " "
"worth"
"write" "by hand" "write-down, print"
"write" "compose" "author, compose"
"yeU-at yeU, shout, scream"
"appologize-to" " " "appologize"
"borrow"
"buy" " " "purchase"
"confiscate" " " "take, appropriate, seize, impound"
"receive" " " "get, acquire"
"steal" " " "take, rob"
"ask" "question" "ask"
"teU" "someone about something" "say"
"convince" "of something"" "
"guarantee" " " "assure, insure"
"promise" " " "vow, pledge"
"threaten"
"hijack"
"ask" "someone to do something" "request"
"beg" " " "plead, urge"
"convince" "someone to do something" "pursuade, compel"
"dare" " " " "
"designate" " " "assign"
"discourage" " " "dissuade"
"encourage" " " "urge, stimulate, incite, inspire"
"force" " " "make"
"order" "someone to do something" "insist, demand"
"require" " " "stipulate"
"tell" "someone to do something" " "
"banish" exile!'
"bring "take" "broadcast" "transmit"
"deUver"
"donate" "contribute"
"email"" "
"evacuate"
"export"
"fix" " "
"give" " "
"import" n M
"lease
"lend" " "
"mail" " "
"offer" " "
"order" "from company" " "
"owe" " "
"provide" "supply"
"rent" " "
"return" "give-back, send-back"
"seU" " "
"send" " "
"serve" " "
"ship" " "
"smuggle"
"exchange" "swap, trade"
"pay" "money"
"bet" " " "wagei .H
"abnormal" "irregular"
APPENDIX 3
DESCRIPTORS
"abnormal" It It "irregular"
"abroad" N N "overseas, away"
"absent" « N
"ache" " " painful, hurt, sore"
"adjacent" "next, neighboring"
"afraid" "scared, frightened"
"again"" "
"agree"" " "like-minded"
"ahead-of-time" " " "early, premature, in-advance"
"aUve" " " "deceased, die"
"aU "every, each"
"aU-N" " " "both" "almost" ti II "almost, nearly, about, practicaUy"
"alone II ti
"also" " " too, in-addition"
"always" H II "aU-the-time"
"amazing" II II "astonishing, fantastic, impressive, miraculous, incredible, remarkable"
"angry" "irate, cross, mad, upset, spiteful"
"annoyed" It II
"annoying" tt II "irritating, aggravating, troublesome, bothersome"
"argue" II H "quarrel, fight"
"asleep" II It It
"at-home" II It tt N
"at-office" It tt II II
"at-school" II II H tt
"automatic" It II H II
"average" N It H II
"awake" II tt "up"
"backward" M II "go-back, reverse, back"
"bad" " " "badly"
"barely" "hardly"
"bathe"" " "take-bath"
"be-careful" " " "watch-out"
"bent" " " "folded"
"best" " " "ultimate"
"black" " "
"blind" " "
"blonde" "hair"
"blue" " " H tt
"boil" " " "boiling, simmer"
"bold" " " II M
"boo" " " "hiss"
"bored" "boring" n H "monotonous, duU"
"born" " "
"brand-new"
"brave" n II "bold, courageous, daring"
"broke" n It "pennyless, bankrupt"
"brown"
"buddist"
"buip" " " "belch"
"by-fer" "easUy"
"calm" " " "easy-going, meUow, tranquU, sedate"
"cautious" "careful, wary"
"change" "different, changed"
"change-clothes" change"
"Chinese"
"Christian"
"clean" " " "immaculate, spotless"
"closed"
"cloudy" tt tt
"cold" " " "chiUy r"
"commit-suicide"
"common" "typical, usual, abundant, plentiful"
"complain" n It "whine, object"
"complex" "complicated, difficult, elaborate, detaUed"
"confident"
"confusing" "puzzling"
"continue"
"continuous" "continuously, constantly, continually, continual"
"cooked"
"correct" "right, accurate"
"cough"
"crushed" "smashed"
"cry" " " "weep. , sob"
"daUy" " " "annual"
"damaged" n II "broken"
"dangerous" "hazardous, perilous, unsafe, risky, precarious"
"Danish"
"dark" "in color" "dim"
"dead" " "
"deaf " "
"decrease" "diminish, lessen, decline, subside, reduce, deprec
"deep" " "
"deep-fried"
"defective" "faulty, imperfect"
"definitely" "doubtless, definite, certain, undoubtably, absolute without-fail, without-doubt" "different" " " "dissiπύlar, differ, unequal" "difficult" " " "compUcated, tough, hard" "dirty 'filthy" "disabled" "handicapped, crippled"
"disagree" "differ"
"dishonest" "Uar, disreputable"
"disorganized" "messy, chaotic, sloppy, disorderly, untidy, slovenly"
"divorced" " " "divorce"
"downstairs" " "
"drunk" "intoxicated, inebriated"
"dry" "thing" "drying f'
"dry" "weather"
"Dutch"
"early" "in time"
"east" " " "eastern"
"easy' "simple, effortless, easUy"
"eerie" " " "weird, mysterious, bizaπe"
"embarrassed" " " "ashamed"
"embarrassing"
"employed"
"empty" " " "vacant, bare, barren
"end" " " " "
"engaged" "to-be-married" " "
"enjoyable" " " "fiin, amusing, delightful, entertaining"
"enormous" "amount/size" "enormous, immense, huge, gigantic, massive, vast"
"enough" " " "plenty, sufficient, adequate"
"envious"
"exactly"
"exceUent" outstanding, superb, wonderful, exceptional, tremendous, terrific, sensational, marvelous, fantastic, fabulous, great, glorious" "exhausting" "tiring" "expensive" "costly, valuable, precious" "expert" "experienced, skiUed, talented, skillful" "extreme" "severe, harsh, radical" "face" " " "faU-down" "faU" "famiUar" "far-away" distant, far"
"fast" "abstain from food" " "
"fast" " " "quick, speedy, rapid, quickly, swift, hasty"
"favorite
"few" " " "several" "first" " "
"fish" " "
"flat" "soft-drink"
"flat" "surface" "even, level"
"forbidden" " " "foreign" "for-rent "for-sale"
"forward" "straight"
"fractured" " "cracked, break, broken, smashed"
"fragUe" "deUcate, breakable"
"free-of-charge" "free, gratis"
"fresh" "food" " "
"fried" " " " "
"frozen" "by temperature" " "
"fuU" "with food" "stuffed"
"functioning" " " "work, working, function, operate1
"good" " " " "
"good" " " "pleasing"
"goodbye" must"
"grateful" " " "thankful, appreciative"
"green"
"grey" " " " "
"grow" "person, plant"" "
"grow" " " " "
"guUty" " " " "
"handsome' person "good-looking, attractive"
"hang" " " It N
"happen" II M "occur, take-place, transpire, result"
"happy" II H "joyful, glad, joyful, joyous, deUghted, pleased"
"hard-of-hearing H It
I'"
"healthy" "fit"
"heavy" tt H
"heUo" " " "hi"
"hereditary" tt H "inherited"
"hiccup" " "
"hidden" "concealed"
"high" "amount/level"
"high-y" "y-direction" "taU"
"hike" " "
"hitchhike"
"hot" " "
"hourly"
"how" " "
"humorous" H II "funny, amusing, hUarious, hysterical, witty"
"hungry" N II N N
"identical" II tt ιι i;ndentical, match, same, look-alike"
"Ulegal" II M "unlawful, illicit, against-the-law, criminal"
"important" tt H "significant, crucial, critical, great, momentous, profound, vital, matter, essential, noble"
"impossible" " " "hopeless"
"in-bad-condition" " " "in-bad-shape"
"incorrect" "wrong, erroneous, untrue, false, inaccurate, mistaken"
"increase" "advance, rise, grow, explode, multiply"
"indoors" "inside" "inexpensive" cheap, affordable"
"inferior" N H
"in-future" H II
"in-good-condition" "in-good-shape"
"injured" N It "hurt"
"in-love" N H
"in-past" II N H N
"insane" II II N crazy, nuts, mad"
"inside-out" II H
"instead" M II
"in-stock" M H "avaiUable"
"inteUigent" "smart, clever, bright, sensible"
"intentionally f» It H "on-purpose, deUberate, deliberately"
"interesting ,11 tt II "stimulating, provocative, entertaining"
"intermitent tt tl It H
"international" It II "global, worldwide"
"itaUc" " " M M
"Japanese" II II
"jealous" tt tl "spiteful"
"Jewish" II It H H
"keep-quiet" M II "quiet"
"kiss" " " It II
"large" "amount/size" "big"
"late" "in time"
"late" "tardy" "tardy, belated, overdue'
"laugh chuckle"
"lazy" II H H
"leading" "top, foremost, principle"
"least-favorite"
"left" "direction" II H
"left-handed" " " II tt
"left-over" " " "remaining, left, remain"
"legal" lawful, legitimate, just"
"Ught" "in color" N N
"Ught" "weight" H N
"likely"" " "good-chance, probable, probably, presumable"
"local " "
"lonely" " " "lonesome"
"long" "in time" H N
"long-ago" " " II N
"long-z" "z-direction" " "
"look" " " " "
"look-out" " " "watch-out"
"lost" "w/o directions"
"low" "amount/level
"lower-case" " "
"loyal" " " "honorable, faithful, obedient"
"lucky "fortunate" "main" " " "primary, principal, fundamental, basic, central" "mandatory" " " "compulsory, requisite, required, obUgatory" "many"" " "numerous, a-lot"
"married" many"
"mass-produced" " " "factory-built, factory-made" "maybe" " " "might, perhaps, could"
"mean"" " "maUcious, unkind, cruel, hurtful, bitter"
"medical
"medium-size" "amount/size" "medium" "melt thaw"
"metal"" " It N
"Mexican" H tl M N
"miUtary" tt II II It
"misplaced" II II "lost"
"missing" M tl "lost"
"moldy" It II H H
"monthly" •1 II II II
"more "another, extra"
"more-than-enough" "plenty'
"most" " " It II
"mountainous' li lt II It N
"Muslim"
"must" " " "have-to"
"mute dumb"
"my" " " "mine"
"narrow-x" "x-direction" "thin, slender, slim"
"N-ary" " " "primary, secondary, tertiary"
"national"
"nauseous"
"near" " " "close, around, near-by"
"nervous"
"never" " " " "
"new" " " " "
"next" " " "following, subsequent" nice" " " "pleasant, considerate, friendly, kind, likeable"
"N-more" " " "once-more, one-more, twice-more, another"
"N-of " " " "
"none" " "
"normal" " " "standard, ordinary, regular, typical, customary, usual"
"north"" " "northern"
"nnt" " " " "
"not-busy" " " "unoccupied, free" "not-complex" " " "simple" not-enough" " " "inadequate, insufficient"
"novice" " " "beginner"
"now" " " "asap, immediately, immediate, right-away, right-now"
"N-th" " " "first, second, third, initial"
"N-times" " " "once, twice, three-times" "nutritious" "nourishing, healthy, good-for-you"
"obvious" "undeniable, self-evident, clear, apparent, evident, obviously, apparently"
"of-course"
"off " "
"often" " " "aU-the-time, frequently, frequent"
"okay" " " "alright" old" "thing"
"on" " " "running"
"one-of
"on-sale"
"on-time" "punctual, prompt"
"on-vacation" " "
"open" "store" " "
"open" "box
"open-air"
"optional"
"orange"
"organized" "tidy, systematic, neat, in-order, clean"
"outdoors" "outside"
"out-of-order" " "out-of-order, break-down, broken"
"overweight" "obese, corpulent, fat"
"panhandle" "beg"
"paranoid"
"pause" "hesitate, stop"
"permitted" "aUowed"
"pink" " " rosy"
"plastic"
"play" " " playing"
"please"
"poor" " " impoverished, destitute, indigent"
"portable"
"possible" " " "attainable, possible, could, conceivable, realizable"
"powerful" "miUtary" "strong, mighty"
"powerful" "poUtical" "strong, mighty, influential"
"presently" " " "nowadays, current, present, right-now"
"pretty" "person" "beautiful, good-looking, attractive"
"previous" "prior, former, preceding, last, formerly"
"punctured"
"purple" it n
"quiet "sUent"
"raw" " " "uncooked"
"ready" "prepared"
"recently" "recent, lately"
"red" " "
"regional"
"relaxed"
"religious" "remarried
"rest" " " "relaxing, relax, resting"
"retired"
"right" "direction" M It
"right-handed" " M H H
"right-side-up ,M M M It H
"round" II H "circular, rounded"
"run" "jog" "jog'
"run-away" " "
"run-out-of-gas"
"rural" " " "rustic'
"Russian"
"sad" " " "unhappy, upset, discontented, melancholy, lamentable?"
"safe" " " "secure"
"scary" " " "terrifying, frightening"
"seasick"
"seldom" It II "rarely, infrequently, scarcely, infrequent"
"selfish" tt II "conceited, arrogant, egotistic"
"senile" II It M II
"separated" M H "separate"
"set-table" H II
"shake-hands" "
"shaUow" H H
"short" "in time" "brief
"short-sided" "
"short-y" "y-direction" "low, Uttle"
"short-z" "z-direction" "Uttle"
"should" "ought"
"shower" "take-shower"
"shut" "box" "closed"
"shy" " " "timid, bashful"
"sick" " " "Ul, ailing"
"simUar"
"single" "bachelor"
"sit" " " sit-down, perch"
"skinny" "person" "slim"
"skydive" H n It N
"sleep-walk"
"slow" " " II n
"smaU" "amount/size" "Uttle, nominal, low, minor
"srmle" " " II gri '-nII "smoked" M H H H "sneeze" H II II II "snore"" " II H "snowy" W It "snow"
"sober It tt
"some" " " "several, any" sometimes " " "occasionaUy, occasional, now-and-then" soon" shortly, immediately, immediate, asap"
"sour"
"south" " " "southern"
"South-American" " " "Spanish"
"special" ' " "distinct, unique, exceptional, extraordinary"
"spoiled" food" "rancid, rotten, bad"
"squint" "clap"
"stand"" " standing, get-up, stand-up"
"start" " " begin, commence, kick-off
"starving" seriously" " "
"stingy" "cheap"
"stop" " " tl II
"stormy" tt II
"strange" " " "curious, funny, odd, pecuUar, weird, bizarre"
"strong" "a person, physicaUy" " "
"stupid" " " "absurd, crazy, nuts, insane, foolish, idiotic, ridiculous, mad, ignorant, unwise"
"sturdy" It tl "hardy, robust, strong, tough"
"suburban" N tt N H
"sunbathe" H II H II
"sunny" H H H II
"suspicious" N II H N
"sweat" M It II perspire
"sweet" N H II N
"swim"" " tt N
"take-walk" H II "stroll, walk"
"temporary" II II "provisional"
"terrible" tt II "awful, horrible, dreadful, disastrous"
"thanks" II II "thank-you"
"that" " " It It
"there" " " N N
"these" " " tt n
"thick" " " II N
"thick" "fluid" "viscus"
"thin" " " " "
"thin" "fluid" " "
"think" " " "ponder"
"thirsty" " " " "
"this" " " " "
"those"" "
"thrifty" " " "frugal, cheap"
"tiny" "amount/size" "minute, Uttle, miniature, puny, sUght"
"tired" " " "weary, fatigued, sleepy"
"together" "do something" "coUectively, aU-together, jointly"
"too-much" " " "excessive"
"transparent clear" "travel" "take-trip" "turn" " "
"unanimous" It tt
"unattractive" "person It "gross, ugly" "unavoidable" " " "i impending, inevitable" "unchanged" H
"uncommon" n it
"rare, unusual, out-of-the-ordinary, obscure, scarce, rarely"
"underlined" II H "unemployed" tt tt "unfamiliar" II It "unfortunately" "unfriendly" " "cold, anti-social" "unhealthy" "food I"II H M
"unimportant" " " "insignificant, superficial" "unintentionally" " " "accidental, by-chance, by-mistake, accidentally, by- accident"
"unique" "only, one-of-a-kind"
"unlikely" "doubtful, improbable, hopeless"
"unlucky" "unfortunate"
"unpleasant"
"unprepared"
"unripe" II II
"upside-down -It It II
"upstairs"
"urban"
"urgent" "critical"
"voluntary" H It
"vomit" "throw-up"
"walk" "as opposed to run" " "
"warm" " " "hot"
"waterproof " "
"weak" "a person, physically" " "
"weak" "miUtary" "powerless, feeble
"weak" "poUtical" "powerless, feeble'
"wealthy" " " " "rir ch, affluent"
"weekly" N M
"well-known" " person
"west' "western"
"wet" " " "damp"
"white"
"why" " "
"wide-x" "x-direction" "broad, thick"
"widowed"
"windsurf
"worn "threadbare"
"worn-out" tt It M M
"worried" "concerned"
"worst" It H "worthless"
"yawn
"yearly" " " "e ev\ eryday"
"yeUow"
"young" "person"
"your" " " "yours"

Claims

1. A method of facilitating searches of electronically stored documents in a text-searchable database, the method comprising the steps of:
a. electronically storing, with respect to each document, at least one text-searchable abstract descriptive of a document subject, the abstract comprising a series of words generated by selecting a nominal item and expanding the abstract by iteratively applying at least one of a set of rules comprising:
i. to a nominal item, add a descriptor describing the nominal item;
ii. to a nominal item, add a connector item and another nominal item, connector items specifying relationships between at least two nominal items;
iii. to a nominal item, add a logical connector and another nominal item, logical connectors establishing sets of nominal items; and
iv. to a descriptor item, add a logical connector and another descriptor item;
b. receiving a user query comprising a series of words;
c. applying the query to the document abstracts to identify word matches therebetween;
d. identifying stored documents having abstracts with words matching at least some of the words of the query; and
e. based on word matches, ranking the identified documents in an order of relevance, the order favoring documents having abstracts with terms literally matching the query.
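Steps b through e of claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function names, the document identifiers, the scoring by a simple count of literally matching words, and the nominal items "dog", "cat", "house" and "car" (Appendix 1 is not reproduced here) are all assumptions made for illustration.

```python
def tokenize(text):
    """Split an abstract or query into lowercase words, ignoring parentheses."""
    return text.lower().replace("(", " ").replace(")", " ").split()


def rank_documents(abstracts, query):
    """Return (doc_id, score) pairs for documents whose abstract shares
    at least one word with the query, best match first.

    The score is an assumed relevance measure: the number of query words
    that literally appear in the abstract.
    """
    query_words = tokenize(query)
    ranked = []
    for doc_id, abstract in abstracts.items():
        abstract_words = set(tokenize(abstract))
        matches = sum(1 for word in query_words if word in abstract_words)
        if matches > 0:
            ranked.append((doc_id, matches))
    # Highest score first; ties are broken by document id for stable output.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical stored abstracts (step a), keyed by document id.
abstracts = {
    "doc1": "((dog black) live-in (house large))",
    "doc2": "(cat look-at dog)",
    "doc3": "(car red)",
}
results = rank_documents(abstracts, "dog black")
```

With these assumed abstracts, "doc1" matches both query words and is ranked ahead of "doc2", which matches only one; "doc3" has no match and is not returned.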
2. The method of claim 1 wherein each iteration of the rules is identified by enclosure within parentheses.
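The four expansion rules of claim 1, with each iteration enclosed in parentheses as in claim 2, can be sketched as follows. This is an illustrative sketch only: the helper names, the placement of the descriptor after the nominal item, and the nominal items "dog", "cat" and "house" are assumptions and are not taken from the claims.

```python
LOGICAL_CONNECTORS = ("and", "or")  # the entries recited in claim 14


def describe(nominal, descriptor):
    """Rule i: to a nominal item, add a descriptor."""
    return f"({nominal} {descriptor})"


def connect(nominal, connector, other_nominal):
    """Rule ii: to a nominal item, add a connector item and another nominal item."""
    return f"({nominal} {connector} {other_nominal})"


def join(item, logical_connector, other_item):
    """Rules iii and iv: join two nominal items, or two descriptor items,
    with a logical connector."""
    if logical_connector not in LOGICAL_CONNECTORS:
        raise ValueError(f"not a logical connector: {logical_connector}")
    return f"({item} {logical_connector} {other_item})"


# Build an abstract iteratively, one rule application at a time.
dog = describe("dog", "black")                 # rule i
pets = join(dog, "and", "cat")                 # rule iii
home = describe("house", "large")              # rule i
abstract = connect(pets, "live-in", home)      # rule ii
colors = join("black", "or", "brown")          # rule iv
```

Because every rule application adds one pair of parentheses, the nesting of the resulting string records the order in which the rules were applied.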
3. The method of claim 1 further comprising the step of structuring the query according to the set of rules.
4. The method of claim 1 further comprising the step of providing databases of nominal, connector, descriptor and logical-connector items, the words of the abstract being selected from the databases.
5. The method of claim 4 wherein the words of the user query are also selected from the databases.
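One way to picture the databases of claims 4 and 5 is as four small word sets against which abstract and query words are checked. The sketch below is an assumption-laden illustration, not the disclosed system: the dictionary layout and function names are hypothetical, the nominal items stand in for Appendix 1 (not reproduced here), and only a handful of connector and descriptor entries from Appendices 2 and 3 are included.

```python
# Hypothetical miniature databases; real entries would come from Appendices 1-3.
DATABASES = {
    "nominal": {"dog", "cat", "house"},               # assumed stand-ins
    "connector": {"live-in", "look-at", "next-to"},   # sample Appendix 2 entries
    "descriptor": {"black", "large", "afraid"},       # sample Appendix 3 entries
    "logical-connector": {"and", "or"},               # claim 14
}


def classify(word):
    """Return the name of the database containing the word, or None."""
    for kind, items in DATABASES.items():
        if word in items:
            return kind
    return None


def unknown_words(text):
    """List the words of an abstract or query that are in none of the databases."""
    words = text.lower().replace("(", " ").replace(")", " ").split()
    return [word for word in words if classify(word) is None]
```

For example, `unknown_words("((dog black) live-in house)")` returns an empty list, while `unknown_words("dog chase cat")` reports "chase", which a query interface could then ask the user to replace with a listed connector.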
6. The method of claim 1 wherein the documents comprise viewable and non-viewable portions, the abstracts being contained in the non-viewable portions.
7. The method of claim 1 further comprising the step of facilitating user retrieval of the identified documents.
8. The method of claim 1 wherein the nominal items name a person, place, thing, activity or idea.
9. The method of claim 8 wherein the nominal items include the terms set forth in Appendix 1.
10. The method of claim 1 wherein the connector items show action, being or state of being.
11. The method of claim 10 wherein the connector items include the terms set forth in Appendix 2.
12. The method of claim 1 wherein the descriptor items describe a quality, quantity, state or type of a nominal entry.
13. The method of claim 12 wherein the descriptor items include the entries set forth in Appendix 3.
14. The method of claim 4 wherein the logical-connector database comprises the entries and, or.
15. Apparatus for facilitating searches of electronically stored documents, the apparatus comprising:
a. a database of electronically stored documents, the database comprising, with respect to each document, at least one text-searchable abstract descriptive of a document subject, the abstract comprising a series of words generated by selecting a nominal item and expanding the abstract by iteratively applying at least one of a set of rules comprising:
i. to a nominal item, add a descriptor describing the nominal item;
ii. to a nominal item, add a connector item and another nominal item, connector items specifying relationships between at least two nominal items;
iii. to a nominal item, add a logical connector and another nominal item, logical connectors establishing sets of nominal items; and
iv. to a descriptor item, add a logical connector and another descriptor item;
b. means for receiving a user query comprising a series of words;
c. means for applying the query to the document abstracts to identify word matches therebetween;
d. means for (i) identifying stored documents having abstracts with words matching at least some of the words of the query, and (ii) based on the word matches, ranking the identified documents in an order of relevance, the order favoring documents having abstracts with terms literally matching the query.
16. The apparatus of claim 15 further comprising databases of nominal, connector, descriptor and logical-connector items, the words of the abstract being selected from the databases.
17. The apparatus of claim 16 wherein the words of the user query are also selected from the databases.
18. The apparatus of claim 15 wherein the documents comprise viewable and non-viewable portions, the abstracts being contained in the non-viewable portions.
19. The apparatus of claim 15 further comprising means for facilitating user retrieval of the identified documents.
20. The apparatus of claim 15 wherein the nominal items name a person, place, thing, activity or idea.
21. The apparatus of claim 20 wherein the nominal items include the terms set forth in Appendix 1.
22. The apparatus of claim 15 wherein the connector items show action, being or state of being.
23. The apparatus of claim 22 wherein the connector items include the terms set forth in Appendix 2.
24. The apparatus of claim 15 wherein the descriptor items describe a quality, quantity, state or type of a nominal entry.
25. The apparatus of claim 24 wherein the descriptor items include the entries set forth in Appendix 3.
26. The apparatus of claim 16 wherein the logical-connector database comprises the entries and, or.
PCT/US1999/001299 1999-01-22 1999-01-22 Method and apparatus for improved document searching WO2000043911A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/US1999/001299 WO2000043911A1 (en) 1999-01-22 1999-01-22 Method and apparatus for improved document searching
AU24636/99A AU2463699A (en) 1999-01-22 1999-01-22 Method and apparatus for improved document searching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US1999/001299 WO2000043911A1 (en) 1999-01-22 1999-01-22 Method and apparatus for improved document searching

Publications (1)

Publication Number Publication Date
WO2000043911A1 true WO2000043911A1 (en) 2000-07-27

Family

ID=22272029

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/001299 WO2000043911A1 (en) 1999-01-22 1999-01-22 Method and apparatus for improved document searching

Country Status (2)

Country Link
AU (1) AU2463699A (en)
WO (1) WO2000043911A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10445384B2 (en) 2013-10-16 2019-10-15 Yandex Europe Ag System and method for determining a search response to a research query

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0361464A2 (en) * 1988-09-30 1990-04-04 Kabushiki Kaisha Toshiba Method and apparatus for producing an abstract of a document
EP0855660A2 (en) * 1997-01-17 1998-07-29 Fujitsu Limited Summarization apparatus and method
US5842206A (en) * 1996-08-20 1998-11-24 Iconovex Corporation Computerized method and system for qualified searching of electronically stored documents

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0361464A2 (en) * 1988-09-30 1990-04-04 Kabushiki Kaisha Toshiba Method and apparatus for producing an abstract of a document
US5842206A (en) * 1996-08-20 1998-11-24 Iconovex Corporation Computerized method and system for qualified searching of electronically stored documents
EP0855660A2 (en) * 1997-01-17 1998-07-29 Fujitsu Limited Summarization apparatus and method

Also Published As

Publication number Publication date
AU2463699A (en) 2000-08-07

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AU BA BB BG BR CA CN CU CZ EE GD GE HR HU ID IL IN IS JP KP KR LC LK LR LT LV MG MK MN MX NO NZ PL RO SG SI SK SL TR TT UA UZ VN YU

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase