WO2008014469A2 - Data product search using related concepts - Google Patents

Data product search using related concepts

Info

Publication number
WO2008014469A2
Authority
WO
WIPO (PCT)
Prior art keywords
term
search
terms
user
data
Prior art date
Application number
PCT/US2007/074621
Other languages
English (en)
Other versions
WO2008014469A3 (fr
Inventor
Robert M. Brinson
Bryan Glenn Donaldson
Nicholas Levi Middleton
Harry H. Blakeslee
Original Assignee
Intelliscience Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/733,478 (US20070175674A1)
Application filed by Intelliscience Corporation
Publication of WO2008014469A2
Publication of WO2008014469A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions

Definitions

  • This invention relates generally to computer software and, more specifically, to conducting a search using related concepts.
  • Systems and methods for searching data are disclosed herein.
  • the systems and methods include searching a plurality of data products stored at one or more locations over a computer-based network. At least one data product is identified containing a topic of interest. A list of significant terms is ranked in the identified data product. The ranking is based on a weight value for each of the significant terms found in the data store. A search string is created including at least one significant term. At least one search application is searched using the search string. If at least one data store was found during the search, the found data products are displayed.
  • FIGURE 1 shows an example system for executing a search based on related concepts
  • FIGURE 2 shows an example method formed in accordance with an embodiment of the present invention
  • FIGURE 3 shows an example method for parsing data products and retrieving words in accordance with a first embodiment
  • FIGURE 4 shows a method for weighting terms in data products in accordance with an embodiment of the present invention
  • FIGURE 5 shows an example method for searching based upon complex terms in accordance with an embodiment of the present invention
  • FIGURE 6A shows an embodiment of optionally executing any of three search functions in accordance with another embodiment
  • FIGURE 6B shows an embodiment of producing a list of related terms for a query in accordance with an embodiment of the present invention
  • FIGURE 6C shows an embodiment of producing a list of data products for a query in accordance with an embodiment of the present invention
  • FIGURE 6D shows an example method for determining which data products satisfy a query and how closely a given data product matches a query, in accordance with another embodiment
  • FIGURE 7 shows an example method for determining additional search terms by offering synonyms and spelling suggestions during a search in a preferred embodiment
  • FIGURE 8 shows an example method for altering the significance of a search term in accordance with an embodiment of the present invention
  • FIGURE 9 shows an example method for selecting data products in accordance with an embodiment of the present invention.
  • FIGURE 10 shows an example method for selecting data products in accordance with an embodiment of the present invention
  • FIGURE 11 shows major database relationship tables formed in accordance with an embodiment of the present invention
  • FIGURE 12 shows a relationship between search terms and data products in accordance with an embodiment of the present invention
  • FIGURE 13 shows a relationship between multiple search terms and data products in accordance with an embodiment of the present invention
  • FIGURE 14 demonstrates the relationship between terms in a chosen subject
  • FIGURE 15 demonstrates the relationship between terms in a chosen subject and further suggests related terms
  • FIGURES 16-22 show graphical user interfaces formed in accordance with an embodiment of the present invention.
  • FIGURE 23 shows a similarity matrix used to find similar queries formed in accordance with an embodiment of the present invention
  • FIGURE 24 shows a method 600 of searching using related concepts in an alternate embodiment
  • FIGURE 25 shows a screenshot of the identification of at least one data store
  • FIGURE 26 shows a screenshot of a search application and search string selection screen
  • FIGURES 27-31 show screenshots of the results of the search in a plurality of search applications.
  • FIGURE 1 shows an example system 100 for executing a search based on related concepts.
  • the system 100 includes a computer 101 in communication with a plurality of other computers 103.
  • the computer 101 can be connected with a plurality of computers 103, a server 104, a data storage center 106, and/or a network 108, such as an intranet or the Internet.
  • a bank of servers, a wireless device, a cellular phone and/or another data entry device can be used in the place of the computer 101.
  • a database stores significant terms and/or similar queries. The database is stored at the center 106 or locally at the computer 101.
  • an application program run by the server 104 or computer 101 creates initial database tables.
  • the tables store significant terms found in each of a plurality of the data products, as well as the relationships between each table, and data product locations.
  • Example database tables are described in FIGURE 11.
  • the computer 101 or server 104 includes an application program that parses and ranks terms in each of the plurality of data products. This is described in more detail in FIGURE 3.
  • the computer 101 or server 104 includes an application program that displays results of a search. This process is described in more detail in FIGURE 6.
  • the application program monitors the data products for changes and updates the database tables when a change has occurred or a new data product has been made available.
  • a data product search using related concepts is executed on a stand-alone computer 101.
  • a data product search using related concepts is executed on a computer 101 connected to a plurality of computers 103, a server 104, a data storage center 106, and/or a network 108, such as an intranet or the Internet.
  • a data product search using related concepts is executed on the Internet allowing a user to search a plurality of Internet pages.
  • the data products could be of any format containing text, including but not limited to a word processing document, a spreadsheet, a database, a web page, and/or a text file.
  • FIGURE 2 shows a method formed in accordance with an embodiment of the present invention.
  • a database is set up through a data product parsing function, which will be described in more detail below in FIGURES 3-5.
  • a search of the database is performed by searching the results of the data product parsing function stored in the database. The search is described in more detail below with respect to FIGURES 6-10.
  • FIGURE 3 shows an example method (block 105) for parsing data products and retrieving significant terms from each data product in accordance with a first embodiment.
  • the method (block 105) begins at a block 124 by determining the type of a data product to be parsed.
  • a parsing routine, which is based on the identified data product type, parses each word, and the parsed words are entered into a parsed list of terms for each data product.
  • a term includes one or more words.
  • the terms are analyzed and weighted. This step is described in FIGURE 4.
  • the terms that remain after each term has been analyzed and manipulated are stored in a list of significant terms in the database. The list of terms is stored in the database with each term linked to its corresponding data product.
  • FIGURE 4 further describes the method described at block 128 of FIGURE 3.
  • a term is selected from the generated parsed list of terms.
  • If the selected term occurs more than once in the list, a weight value is incremented and the additional occurrence of the term is deleted from the list.
  • a term's weight value is defined as a number assigned to a word, such that in a computation the word's effect on the computation reflects its importance.
  • the term is tested to determine whether it is a sentence construction word. If the term is a sentence construction word, then the term is removed and excluded from the parsed list (see block 146).
  • Sentence construction words are those used commonly in written text to build sentences, but that carry very little content information. They include words such as "and", "the", "this", and "of". Because they are common, the algorithm for determining the significance of a term might incorrectly assign a high significance to these words that carry very little meaning. A configurable list of sentence construction words is maintained, and no term found in this list is added to the term storage or weighted for a data product. Any query terms that match a sentence construction word are ignored, and if all the terms in a query are sentence construction words, the query is rejected.
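  • As a minimal sketch of the query-side rule described above (an illustration, not the patented implementation), the following Python function ignores sentence construction words in a query and rejects a query made up only of such words. The function name `filter_query_terms` and the four-word list are assumptions for illustration; the text only states that the list is configurable.

```python
# Hypothetical, configurable list; the text names only these four examples.
SENTENCE_CONSTRUCTION_WORDS = {"and", "the", "this", "of"}


def filter_query_terms(query_terms, stop_words=SENTENCE_CONSTRUCTION_WORDS):
    """Drop sentence construction words; reject the query if nothing is left."""
    kept = [term for term in query_terms if term.lower() not in stop_words]
    if not kept:
        # Every term was a sentence construction word, so the query is rejected.
        raise ValueError("query rejected: only sentence construction words")
    return kept
```

  • Raising an exception is only one way to signal a rejected query; a real interface might instead prompt the user to edit the query.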
  • a term's weight value is incremented if the term is in all caps (see block 148).
  • a term's weight value is incremented if the term is in sentence case (see block 150).
  • Sentence case is defined as a term that is all lower case, or is just capitalized because it follows a period, i.e. is the start of a new sentence.
  • a term's weight value is incremented if the term is in the name of the data product containing the term (see block 152).
  • a term's weight value is incremented if the term is in the file location of the data product (see block 154).
  • a term's weight value is incremented if the term has any special formatting (see block 156).
  • special formatting includes italics, underline, a larger font than most of the other text in the data product, quotation marks and/or strikethrough. Additional factors can be used to generate or adjust weights of terms, depending upon the data product format and application needs.
  • a term's weight value is incremented based on the term's proximity to a query term found in the data product (see FIGURE 6).
  • a term's weight value is increased or decreased if the term is found within specified sections of the data product.
  • One embodiment adjusts the term's weight based on a dictionary of terms suitable to the data product and application system. After a term has been analyzed, the final weight is assigned to the term (block 158).
  • the parsed list is checked to determine if there are any additional terms to be analyzed. If so, the method returns to block 140 to enable the next term to be analyzed. If there are not any additional terms to be analyzed, then the weighted parsed list is returned to block 130 in FIGURE 3.
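  • The weighting pass of FIGURE 4 can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the increment size `step`, the function name `weight_terms`, and the choice to apply the all-caps, name, and location bonuses once per distinct term are not specified in the text, and the sentence-case, formatting, proximity, and section rules are omitted.

```python
STOP_WORDS = {"and", "the", "this", "of"}  # examples named in the text


def weight_terms(parsed_terms, product_name, product_path, step=1):
    """Return {term: weight} for one data product.

    `step` (the size of each increment) is an assumption; the text gives no value.
    """
    weights = {}
    all_caps = set()
    for raw in parsed_terms:
        key = raw.lower()
        if key in STOP_WORDS:
            continue  # sentence construction words are excluded (block 146)
        # Each occurrence increments the weight; duplicates collapse into one entry.
        weights[key] = weights.get(key, 0) + step
        if raw.isupper():
            all_caps.add(key)
    name = product_name.lower()
    path = product_path.lower()
    for key in weights:
        if key in all_caps:
            weights[key] += step  # term appeared in all caps (block 148)
        if key in name:
            weights[key] += step  # term is in the data product name (block 152)
        if key in path:
            weights[key] += step  # term is in the file location (block 154)
    return weights
```

  • With the terms `["The", "SEARCH", "search", "index", "of", "index", "index"]`, a product named "Search notes" and a location of "/docs/reports", this sketch gives "search" a weight of 4 (two occurrences, all caps, in the name) and "index" a weight of 3.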
  • FIGURE 5 further shows a method described at block 126 of FIGURE 3 for parsing a word and entering that word into a list based upon multiple words or terms in accordance with an embodiment of the present invention.
  • the primary function of the method described in FIGURE 5 is to allow the database to learn and assign a weight value to phrases or combinations of terms.
  • a complex term is defined as a term containing phrases or combinations of terms.
  • If the string is a complex term that has been used before, at block 176, the string is stored in the parsed list, and the method then returns to block 174. If it is not a known complex term, then the string is checked to see if it is the beginning of a known complex term (see block 180). If the string is the beginning of a complex term, the method returns to block 174. If the string is not the beginning of a known complex term, then the string is cleared at block 182 and the method returns to block 174.
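  • A rough Python sketch of this loop is shown below. It assumes that known complex terms are available as a collection of space-separated phrases and that each new word is appended to the current string; the name `find_complex_terms` is hypothetical. One deliberate deviation is labeled in the code: when the string is cleared, the sketch restarts from the current word so that a phrase beginning at that word is not missed.

```python
def find_complex_terms(words, known_complex_terms):
    """Return the known complex terms (phrases) found in a sequence of words."""
    known = {tuple(term.lower().split()) for term in known_complex_terms}
    # Every proper prefix of a known phrase counts as "the beginning of a complex term".
    prefixes = {term[:i] for term in known for i in range(1, len(term))}
    found = []
    current = ()
    for word in words:
        current = current + (word.lower(),)
        if current in known:
            found.append(" ".join(current))  # store in the parsed list
        elif current not in prefixes:
            # Assumption beyond the text: instead of only clearing the string,
            # restart from the current word if it can begin a known phrase.
            single = (word.lower(),)
            current = single if single in prefixes else ()
    return found
```

  • For example, with the known phrases "data product", "data product search" and "similarity matrix", the words of "the data product search uses a similarity matrix" yield all three phrases in that order.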
  • FIGURE 6A shows an example method from the block 110 of FIGURE 2 for initiating a search using one or more query terms.
  • a search is initiated when a user selects a query term or string of query terms (block 184).
  • the query terms are formatted into a proper syntax to conduct a search.
  • a query term is defined as a term or set of terms (search string) used in a search. Each term will be appended to a search string with an appropriate modifier.
  • a term is entered through a user interface as shown in FIGURES 16-22. Once a query has been started by the user (block 190), then the desired type of query is identified.
  • If a Related Terms search is requested, the query is evaluated and output is produced at block 186. If a Similar Queries search is requested at block 187, the query is evaluated and output is produced at block 188. If a Data Products search is requested at block 189, the query is evaluated and output is produced at block 191.
  • the user has the choice of further refining their query (block 204), executing a different search (block 190) or viewing the Data Product from a Data Product or Similar Query search.
  • FIGURE 6B shows an example method from block 186 of FIGURE 6A for executing a related terms search.
  • the query term or string is used to identify at least one data product, and rank all the data products that are found at block 192. If a search is executed and no data products are found, then the user will be given the opportunity to edit the query term(s).
  • the weight values of all of the significant terms in each of the found data products are adjusted by the data product's query score and combined with those from the other data products to create a weighted list of significant terms.
  • the list of synonymous terms and potentially corrected spellings is generated in block 197.
  • the created weighted list of related terms is displayed to a user in ranked order on a visual display.
  • FIGURE 6C shows a method 205 for determining possible additional search terms by offering synonyms and spelling suggestions during a search.
  • a query term is selected.
  • the selected term is analyzed to determine if the term has any alternate spelling suggestions at block 208. If the term does have an alternate spelling, then the alternate spelling is added to a list of related words (see block 210). In an alternate embodiment, the user can alter the weight of different spelling suggestions.
  • the term is analyzed to determine if the term has any synonymous terms at block 212. If the term has one or more synonymous terms, then the synonymous term(s) is added to the list of related words at block 214.
  • the method 205 returns to block 206 if there are significant query terms in the query string that have not been analyzed. Once all of the search terms in the query string have been analyzed, the list of related words is displayed at block 218. The words in the list of related words can be then selected by the user to alter the original search terms. In an alternate embodiment, the user can alter the significance of different spelling suggestions.
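  • The loop of method 205 might be sketched as follows. The text does not say how spelling suggestions or synonyms are obtained, so this illustration substitutes Python's standard `difflib.get_close_matches` for spelling and a small in-memory dictionary for synonyms; `VOCABULARY`, `SYNONYMS`, the cutoff of 0.8 and the name `related_words` are all assumptions.

```python
import difflib

VOCABULARY = ["search", "query", "term", "ranking", "product"]  # hypothetical
SYNONYMS = {"search": ["lookup", "query"], "term": ["word"]}     # hypothetical


def related_words(query_terms, vocabulary=VOCABULARY, synonyms=SYNONYMS):
    """Collect spelling suggestions and synonyms for each query term."""
    related = []
    for term in query_terms:
        key = term.lower()
        if key not in vocabulary:
            # Blocks 208/210: offer close matches as alternate spellings.
            for suggestion in difflib.get_close_matches(key, vocabulary, n=3, cutoff=0.8):
                if suggestion not in related:
                    related.append(suggestion)
        # Blocks 212/214: add any synonymous terms.
        for synonym in synonyms.get(key, []):
            if synonym not in related:
                related.append(synonym)
    return related
```

  • The returned list corresponds to the list of related words displayed at block 218, from which the user can pick replacements or additions for the original search terms.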
  • FIGURE 6D shows a method of block 191 of FIGURE 6A for executing a Data Products Search.
  • the query term or string is used to generate a list of data products and rank them at block 191a. If a search is executed and no data products are found, then the user will be given the opportunity to edit the query term(s).
  • the weight values of the significant terms in the found data products that are not query terms are used to rank the terms within each data product.
  • the created weighted list of data products and their significant terms is displayed to a user in ranked order on a visual display.
  • FIGURE 7 shows the method of block 192 of FIGURE 6B or 191a of FIGURE 6D to determine which data products match the query, and rank them by how relevant they are to the query.
  • the query terms are used to identify at least one data product that satisfies the query.
  • the ranks for all the query terms and data product significant terms are loaded for each data product.
  • a score is calculated for each data product from the term rank for each query term that was found in the list of terms for the data product. The list of data products, their query scores and their significant terms are returned to FIGURE 6B or FIGURE 6D.
  • FIGURE 8 shows the method of block 204 shown in FIGURE 6 for altering the significance of a search term in accordance with an embodiment of the present invention.
  • the required modifier is a symbol that identifies the weight value of the term as required. If the user does not choose to add the term to the required word list, then the user may choose to add the term to an increase value term list at block 246. If the term is selected as an increase value term, then the term is added to the search query with an increase modifier at block 242.
  • the increase modifier is a symbol that identifies the weight value of the term as increase. If the user does not choose to add the term to the increase value word list, then the user may choose to add the term to the decrease value term list at block 248. If the term is selected as a decrease value term, then the term is added to the search query with a decrease modifier at block 242.
  • the decrease modifier is a symbol that identifies the weight value of the term as decrease. The user may choose not to add or modify a query term at all.
  • the weight value term "required" means that any data product included in the results must include this term. Additionally, the term's rank in the data product is added to the data product rank when calculating the data product's query rank.
  • the weight value term "increase" means that any data product containing this term will have the term's rank in the data product added to the data product rank when calculating the data product's query rank.
  • An "increase" term is a term that is desirable to the user.
  • the weight value term "decrease" means that any data product containing this term will have the term's rank subtracted from the data product rank when calculating the data product's query rank.
  • a "decrease" term is a term that is undesirable to the user.
  • the weight value term "exclude" means that any data product included in the results must not include this term. Accordingly, no change to the query rank is made for these terms.
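  • The four definitions above can be illustrated with a short Python sketch. It assumes a data product is represented as a mapping from term to the term's rank in that data product and a query as a mapping from term to attribute; the function name `query_score` and the use of `None` for a filtered-out data product are hypothetical choices.

```python
def query_score(product_terms, query):
    """Score one data product against a query.

    product_terms: {term: rank of the term in this data product}
    query: {term: "required" | "increase" | "decrease" | "exclude"}
    Returns the query rank, or None if the data product is filtered out.
    """
    score = 0
    for term, attribute in query.items():
        present = term in product_terms
        if attribute == "required":
            if not present:
                return None  # a required term is missing
            score += product_terms[term]
        elif attribute == "exclude":
            if present:
                return None  # an excluded term is present
        elif attribute == "increase" and present:
            score += product_terms[term]
        elif attribute == "decrease" and present:
            score -= product_terms[term]
    return score
```

  • For a data product with ranks `{"scout": 5, "search": 3, "themes": 2}`, requiring "scout", increasing "search" and decreasing "themes" gives 5 + 3 - 2 = 6, while excluding "themes" removes the data product from the results.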
  • an algorithm is used to manipulate the assigned weights of the found terms. Once a search is started, each of the query terms is assigned to a variable name. Each of the data products that contain the term is found, and all the terms in the data products are identified.
  • a data product's ranking is based on the following formula. The total rank of a data product is determined by the weight of the query terms found in the data product. In one embodiment, the data product's total ranking is further adjusted by an analysis of all of the data products, such as references from one data product to another, or the location of the data products in the system.
  • the data product's ranking is increased when it includes any terms that have been used recently in other queries, by the weight of those terms in the data product.
  • the weight of Data product A equals the weight of Term 1 plus the weight of Term 2 plus the weight of Term 3.
  • the total value of each data product is stored temporarily in memory and the data products are ranked from highest score to lowest score.
  • the significant terms in the data product are ranked and set up on a graphical user interface.
  • the terms that do not match the query terms are ranked.
  • the Rank of Term 4 in Data product A is equal to the Rank of Data product A multiplied by the weight of Term 4 in Data product A.
  • all instances of Term 4 are added up across all data products. For example, if Term 4 is found in Data products A and B, the rank of Term 4 in A is added to the rank of Term 4 in B to determine the final rank of Term 4.
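  • The arithmetic described above can be worked through with a small Python sketch. The weights below are invented for illustration, and the function names are hypothetical; only the two rules come from the text: a data product's rank is the sum of the weights of the query terms it contains, and a non-query term's rank is the data product rank multiplied by the term's weight, summed over all data products.

```python
def rank_products(products, query_terms):
    """products: {name: {term: weight}}. Rank = sum of weights of query terms found."""
    scores = {}
    for name, terms in products.items():
        score = sum(terms[t] for t in query_terms if t in terms)
        if score > 0:
            scores[name] = score
    # Highest score first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


def rank_related_terms(products, scores, query_terms):
    """Rank the terms that are not query terms across all found data products."""
    totals = {}
    for name, score in scores.items():
        for term, weight in products[name].items():
            if term in query_terms:
                continue
            # Rank of the term in this product = product rank * term weight.
            totals[term] = totals.get(term, 0) + score * weight
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Invented weights, for illustration only.
PRODUCTS = {
    "A": {"term1": 2, "term2": 1, "term3": 3, "term4": 4},
    "B": {"term1": 1, "term4": 2, "term5": 5},
}
QUERY = ["term1", "term2", "term3"]
SCORES = rank_products(PRODUCTS, QUERY)
RELATED = rank_related_terms(PRODUCTS, SCORES, QUERY)
```

  • With these numbers, Data product A ranks 2 + 1 + 3 = 6 and Data product B ranks 1. Term 4 then ranks 6 × 4 + 1 × 2 = 26, and Term 5 ranks 1 × 5 = 5.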
  • FIGURE 9 shows a method 202 for selecting data products in accordance with an embodiment of the present invention.
  • the similar queries option allows the user to review queries that have been executed in the past that have some relation to their current query.
  • When the similar queries tab is selected, a set of results that past users found helpful is displayed (see FIGURE 22).
  • the similar queries tab is implemented by loading a set of queries that contain any terms that match any of the terms used by the user. Similarity between a past query and the user's current query is calculated by selecting each term in a past query that matches the current query, and then adding the value from a similarity matrix (see FIGURE 23) to determine a similarity score. Finally, the similar queries list is sorted from highest score to lowest score. Typically, for queries with the same similarity score, the query with the fewest additional terms will rank higher than one with more additional terms.
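  • A sketch of this scoring follows. The contents of the similarity matrix in FIGURE 23 are not reproduced in this text, so the matrix here is purely hypothetical: it is assumed to be indexed by the attribute of a shared term in the current query and in the past query, with a value of 4 when the attributes agree and 1 otherwise. The name `rank_similar_queries` is also an assumption.

```python
ATTRIBUTES = ("required", "increase", "decrease", "exclude")

# Hypothetical stand-in for the similarity matrix of FIGURE 23.
SIMILARITY = {(a, b): (4 if a == b else 1) for a in ATTRIBUTES for b in ATTRIBUTES}


def rank_similar_queries(current, past_queries):
    """current and each past query: {term: attribute}. Returns [(score, past_query)]."""
    ranked = []
    for past in past_queries:
        shared = [term for term in past if term in current]
        if not shared:
            continue  # only queries sharing at least one term are considered
        score = sum(SIMILARITY[(current[t], past[t])] for t in shared)
        extra = len(past) - len(shared)  # additional terms in the past query
        ranked.append((score, extra, past))
    # Highest score first; ties go to the query with the fewest additional terms.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [(score, past) for score, _extra, past in ranked]
```

  • With a current query of `{"scout": "required", "search": "increase"}`, a past query `{"scout": "required"}` and a past query `{"scout": "required", "themes": "exclude", "index": "increase"}` both score 4 under this hypothetical matrix, and the first is listed higher because it has no additional terms.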
  • FIGURE 10 shows a method of displaying a list of similar queries in an embodiment of the present invention.
  • a similar queries search is initiated by the user selecting a similar queries tab (see FIGURE 22).
  • the current query is compared with all past queries. In order to make the comparison a similarity matrix is used (see FIGURE 23). If similar queries are found, the data products that were selected during the past similar queries are displayed to the user at block 259.
  • the similar queries option allows a user to see results that past users have found, the number of times that a particular result has been selected, and/or the similarity between the current query and the past query.
  • FIGURE 11 shows major database relationship tables 260-270. There are several primary tables that include a unique key.
  • the tables include a table 262 that defines a term to the system.
  • the entries in the table 262 can be created from words found in the data products on the system and from terms used in queries by a user.
  • the tables ISFile 266, ISTerm 262 and ISQuery 270 are the primary elements.
  • the table ISFileTermRel 260 records relations between ISFile 266 and ISTerm 262 (where terms exist in data products).
  • the table ISQueryFileRel 268 records relations between ISQuery 270 and ISFile 266 (which files were accessed from search queries).
  • ISQueryTermRel 264 records relations between ISQuery 270 and ISTerm 262 (which terms are present in each query).
  • Also defined are ISFile 266, which defines a data product to the system, and ISQuery 270, which defines a query when a user has viewed a data product.
  • the ISQuery 270 provides the basis for a similar queries search.
  • ISFileTermRel 260 defines the relationship between data products (266) and terms (262).
  • ISQueryTermRel 264 defines relationships between queries (270) and terms (262).
  • ISQueryFileRel 268 defines relationships between queries (270) and data products (266).
  • ISFile 266 may also include the following: a unique data product identifier that is assigned by a database; a stored location or path of the data product; a Boolean rank flag to determine whether the data product has been ranked. Typically, priority is given to data products that have not been ranked.
  • ISFileTermRel 260 includes a key for a term, a key for a data product, and a calculated value for the term in the data product, and/or a Boolean flag, which indicates that this term is a signal term in this data product.
  • ISTerm 262 includes a unique identifier for the term assigned by a database, the text of the term, and/or a Boolean flag indicating whether the term has embedded spaces, and needs special processing when looking for the term in a data product.
  • ISQueryTermRel 264 includes a key for a term, key for a query, and/or a string indicating how the term is used in the query, such as is the term required, increased in value, decreased in value, or excluded.
  • ISQueryFileRel 268 includes a key for the query table, a key for the data product table, and how many times a data product has been viewed from the results of a query.
  • ISQuery 270, which defines a query when a user has viewed a data product, includes a unique identifier for a term assigned by a database and/or a numeric value of the query terms and attributes used to quickly identify potentially equal queries for lookup.
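  • To make the table relationships concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table names come from the text; every column name, type, and key constraint is an assumption chosen for illustration, as are the helper functions `create_database`, `files_for_term` and `record_view`.

```python
import sqlite3

# Column names and types are hypothetical; only the table names are from the text.
SCHEMA = """
CREATE TABLE ISFile (file_id INTEGER PRIMARY KEY, path TEXT NOT NULL,
                     is_ranked INTEGER NOT NULL DEFAULT 0);
CREATE TABLE ISTerm (term_id INTEGER PRIMARY KEY, term_text TEXT NOT NULL UNIQUE,
                     has_spaces INTEGER NOT NULL DEFAULT 0);
CREATE TABLE ISQuery (query_id INTEGER PRIMARY KEY, lookup_value INTEGER);
CREATE TABLE ISFileTermRel (term_id INTEGER REFERENCES ISTerm(term_id),
                            file_id INTEGER REFERENCES ISFile(file_id),
                            weight REAL NOT NULL,
                            is_signal INTEGER NOT NULL DEFAULT 0,
                            PRIMARY KEY (term_id, file_id));
CREATE TABLE ISQueryTermRel (term_id INTEGER REFERENCES ISTerm(term_id),
                             query_id INTEGER REFERENCES ISQuery(query_id),
                             attribute TEXT NOT NULL,
                             PRIMARY KEY (term_id, query_id));
CREATE TABLE ISQueryFileRel (query_id INTEGER REFERENCES ISQuery(query_id),
                             file_id INTEGER REFERENCES ISFile(file_id),
                             view_count INTEGER NOT NULL DEFAULT 0,
                             PRIMARY KEY (query_id, file_id));
"""


def create_database():
    """Create the six tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def files_for_term(conn, term_text):
    """Return (path, weight) for data products containing the term, heaviest first."""
    rows = conn.execute(
        "SELECT f.path, r.weight FROM ISFileTermRel r "
        "JOIN ISFile f ON f.file_id = r.file_id "
        "JOIN ISTerm t ON t.term_id = r.term_id "
        "WHERE t.term_text = ? ORDER BY r.weight DESC",
        (term_text,),
    )
    return rows.fetchall()


def record_view(conn, query_id, file_id):
    """Increment how many times a data product was viewed from a query's results."""
    cur = conn.execute(
        "UPDATE ISQueryFileRel SET view_count = view_count + 1 "
        "WHERE query_id = ? AND file_id = ?",
        (query_id, file_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO ISQueryFileRel (query_id, file_id, view_count) VALUES (?, ?, 1)",
            (query_id, file_id),
        )
```

  • The three relation tables use composite keys so that each term and data product pair, term and query pair, or query and data product pair appears once, which is one plausible reading of the relationships the text describes.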
  • FIGURE 12 shows an example relationship network of search terms and data products when the query search term is Term A 272.
  • each oval represents a query search term and each rectangle represents a data product.
  • This relationship network is based on each data product's relationship to Term A 272.
  • Term A 272 can be found on Page 1 274 and Page 2 276.
  • the Terms unique to Page 1 274 signify one theme of data products and the Terms unique to Page 2 276 signify a different theme of data products.
  • Page 1 274 also includes Terms B 278 and C 280.
  • Page 2 276 also includes Terms D 282 and E 284. From the significant terms on Page 1 274, there are two additional pages found.
  • Page 3 286 contains both Term A 272 and Term B 278.
  • Page 3 286 also includes Term F 290 and Term G 292.
  • Page 4 288 includes Term A 272 and Term C 280 (see FIGURE 13). Page 4 288 further includes Term H 294 and Term I 296.
  • a results set can be more clearly defined by selecting an additional term from Page 1 or Page 2. Pages 1-4 refer to distinct data products.
  • FIGURE 13 shows a relationship network when the search terms are Terms A and C 300.
  • Term A represents Term A 272 found in FIGURE 12, and Term C represents Term C 280 found in FIGURE 12.
  • the combination of Terms A and C 300 reduces the total number of pages shown by the relationship network in FIGURE 12.
  • the combination of Term A and Term C results in only two pages, Page 4 302 and Page 1 304.
  • the remaining significant terms are Term H 306, Term I 308, and Term B 310.
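  • The narrowing from FIGURE 12 to FIGURE 13 can be reproduced with a few lines of Python. The page contents below are taken from the description of FIGURE 12; the dictionary layout and the function names `pages_matching` and `remaining_terms` are assumptions for illustration.

```python
# Terms on each page, as described for FIGURE 12.
PAGES = {
    "Page 1": {"A", "B", "C"},
    "Page 2": {"A", "D", "E"},
    "Page 3": {"A", "B", "F", "G"},
    "Page 4": {"A", "C", "H", "I"},
}


def pages_matching(pages, query_terms):
    """Return the names of pages that contain every query term."""
    query = set(query_terms)
    return sorted(name for name, terms in pages.items() if query <= terms)


def remaining_terms(pages, query_terms):
    """Return the non-query terms found on the matching pages."""
    query = set(query_terms)
    remaining = set()
    for name in pages_matching(pages, query):
        remaining |= pages[name] - query
    return sorted(remaining)
```

  • Searching for Term A alone matches all four pages, while searching for Terms A and C matches only Page 1 and Page 4 and leaves Terms B, H and I as the remaining significant terms, in agreement with FIGURE 13.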
  • FIGURE 14 demonstrates the relationship between terms in a chosen subject from a query. The most significant terms from the useful pages are displayed. This allows the users to select appropriate terms that can narrow a search. The relationship is shown by showing a term as an oval and linking the terms using arrows. A search for Term A would likely find data products containing at least one of Terms B-E. Therefore, by using significant terms a user is more likely to find the result they are looking for.
  • FIGURE 15 demonstrates the relationship shown in FIGURE 14 and also the relationship between terms in a chosen subject and further suggests related terms. In one embodiment there are not only terms that are related, but there are additional terms that the user did not think of such as synonyms and different spellings. These additional terms are shown as Terms 1-4.
  • FIGURE 16 shows a screen shot of a graphical user interface (GUI).
  • the GUI includes a menu bar 350.
  • This menu bar includes drop-down menus that are generally known in the art.
  • Below the menu bar is a query text box 352.
  • the query text box 352 includes a field where a user enters terms for a query. Text can also be added to this block using other means included in the GUI.
  • the GUI includes a text box 356 that allows a user to enter additional query terms. The entered terms will be appended to the end of a string in the query text box 352.
  • a user can choose a scout tab 354 to show a listing of terms in data products that were found using the terms in the query text box 352. The listing of terms is ranked by the weight values of the terms that appear in the found data products.
  • the text box 356 allows a user to enter a term and then further select, as an example, "require term.”
  • the term shown in box 356 will then be appended to the string in the text box 352 with the character "+" preceding the entered term. This signifies to the system that the term directly following the "+" is a required term.
  • Directly below the text box 356 is a list box 360.
  • the list box 360 includes a list of terms currently used in the query.
  • the list box 360 includes the attribute of the searched term. In one embodiment an attribute is the designation given to the term by a user, such as require, exclude, increase value, or decrease value.
  • a results display area 366 includes a require section 358, an exclude section 354, an increase section 362 and/or a decrease section 364.
  • a data product search using related concepts is implemented on or in conjunction with a preexisting search application.
  • FIGURE 17 shows a screen shot of a set of results from a Related Terms or Scout query in one embodiment.
  • the results display area 366 is populated with a result statistics field 370, a search statistics field 372, and/or a graphical display 376 of significant terms found in the search.
  • the result statistics field 370 shows the number of significant terms found, and the search string used.
  • the search statistics field 372 displays the amount of time it took to conduct the search.
  • the terms found in the search are displayed.
  • the terms are shown in a circular and/or clockwise manner. The most heavily weighted term is displayed at 12 o'clock and the weights of the terms decrease with a progression of displayed terms in the clockwise direction.
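  • One way to compute such a layout is sketched below. The text specifies only that the heaviest term sits at 12 o'clock and that weights decrease clockwise; the even angular spacing, the radius, the screen-style coordinates (y increasing downward) and the name `clock_layout` are assumptions.

```python
import math


def clock_layout(weighted_terms, radius=100.0, center=(0.0, 0.0)):
    """Place terms on a circle: heaviest at 12 o'clock, then clockwise by weight.

    weighted_terms: {term: weight}. Returns [(term, x, y)] in screen coordinates,
    where y grows downward, so 12 o'clock is at y = center_y - radius.
    """
    ordered = sorted(weighted_terms.items(), key=lambda kv: kv[1], reverse=True)
    count = len(ordered)
    cx, cy = center
    positions = []
    for index, (term, _weight) in enumerate(ordered):
        theta = 2 * math.pi * index / count  # angle measured clockwise from 12 o'clock
        x = cx + radius * math.sin(theta)
        y = cy - radius * math.cos(theta)
        positions.append((term, x, y))
    return positions
```

  • With four terms, the heaviest lands at the top of the circle, the second at 3 o'clock, the third at 6 o'clock and the lightest at 9 o'clock.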
  • Each term in the display 376 is highlighted when a cursor control device, such as a mouse, places a cursor over or near a term.
  • the cursor can be activated by a user using the cursor control device to select a term and to drag it to any of the sections 354, 358, 362, 364.
  • When a significant term is dragged onto one of the sections 354, 358, 362, 364 and dropped, the term with its corresponding modifying characteristic is added to the text block 352 and to the list box 360 (see also FIGURE 19).
  • FIGURE 18 is a screen shot of one embodiment showing a list of data products found after a Data Products query.
  • a list of data products 380 is shown after a user chooses to have their results displayed by selecting a search tab 382 and pressing the "GO" button 383.
  • the list 380 shows the title, the data product file path, and/or an abstract (not shown). Further, under each data product is a list of the most heavily weighted significant terms found in that data product.
  • the terms shown under a data product in the list 380 can be selected by the user to refine the present search, adding them to the query as Required, Increase, Decrease or Excluded.
  • When the user selects a data product from the list 380, the data product is presented to the user.
  • FIGURE 19 is a screen shot of one embodiment showing a significant term being moved from the display area 366 to the section 354.
  • the term "themes" 400 is selected using the cursor control device and moved to the "exclude" section 354. Once the term is dropped in the section 354, the search query is appended with the term "themes" with the "-" modifier appearing next to the displayed term.
  • FIGURE 20 is a screen shot of one embodiment showing the term "scout" being added to the search query.
  • the term "scout" is added to text box 356. Then the user selects the require term function by activating the cursor over the "+," "Require Term," or by a selection from a pull-down menu. The term is appended to text box 352 and added to the list box 360.
  • FIGURE 21 is a screen shot showing terms after they have been added to the query.
  • The terms 410 and 412 have already been added to the text box 352 and the list box 360.
  • A new search is ready to be run with the additional query terms.
  • When the user activates a Go button 402, a new search is performed and a new graphical display of significant terms is presented.
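  • The way required and excluded terms are appended to the query text can be sketched as follows. This is a hedged illustration: the text above mentions only the "+" (require) and "-" (exclude) modifiers, so the sketch covers only those two, and placing the modifier directly in front of the term is an assumption. The names `MODIFIER_PREFIX` and `append_term` are hypothetical.

```python
# Modifier symbols named in the description; their placement as a prefix
# on the term is an assumption made for this sketch.
MODIFIER_PREFIX = {"require": "+", "exclude": "-"}


def append_term(query, term, modifier):
    """Return the query text with a modified term appended to the end."""
    token = f"{MODIFIER_PREFIX[modifier]}{term}"
    return f"{query} {token}".strip()


query = "merit badge"
query = append_term(query, "scout", "require")
query = append_term(query, "themes", "exclude")
print(query)
```

  • Each call mirrors one drag-and-drop or button action; the resulting string is what a new search would be run with.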
  • FIGURE 22 is a screen shot showing a similar queries screen.
  • A user selects a similar queries tab 420. Shown in the display area 366 are the terms of similar queries and the paths of data products that were selected by previous users. Also shown is an access count that identifies the number of times each data product was selected when that particular query was performed.
  • The query 422 is a hyperlink allowing the user to re-run the similar search.
  • The data product paths 424 are also hyperlinked, allowing a user to go directly to the data product. In one embodiment, when the user accesses a data product, an access count for that data product is incremented in the database.
  • The similar queries used to access each data product are reported to the application handling each data product. For example, if data products are web pages, the similar queries used to access each web page can be used to notify the organization hosting the pages. The hosting organization is then able to target its pages to the largest set of users looking for them.
  • each entry in the matrix shown in FIGURE 23 by N.
  • The values in the figure are preferably positive rather than negative because the system considers only queries with some element of similarity to the user's query. The most similar query has all the same terms with all the same attributes, and the most dissimilar queries have no terms in common with the user's query.
  • Other means for determining a query's similarity to a given query include modifying the values in the table in FIGURE 23 to provide different weighting to attribute similarity and dissimilarity.
  • One embodiment expands the term comparison to allow for similar terms (not exact matches) such as synonyms, alternate spellings, root words and plurals. For example, if the query from the user had 4 terms, the matrix could be:
  • A term similarity score is calculated for each term in the user's query whose literal value matches one of the terms used in a similar query. Those term similarity scores are summed to give the query similarity score. The number of terms in the potentially similar query that are not found in the user's query is stored temporarily.
  • The queries would be sorted in descending score order as B, A, C.
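  • The scoring and sorting just described can be sketched as follows. This is an illustration under stated assumptions: the actual values of the FIGURE 23 table are not reproduced in this text, so `attribute_score` uses hypothetical positive values (2 for matching attributes, 1 otherwise). Using the count of unmatched terms as a tie-breaker is also an assumption, and the sample queries are invented.

```python
def attribute_score(user_attr, other_attr):
    """Hypothetical stand-in for the FIGURE 23 table; values are positive."""
    return 2 if user_attr == other_attr else 1


def score_query(user_query, candidate):
    """Score a candidate query against the user's query.

    Both queries are dicts mapping term -> attribute (e.g. "require").
    Returns (query similarity score, number of candidate terms not in the
    user's query).
    """
    score = sum(
        attribute_score(attr, candidate[term])
        for term, attr in user_query.items()
        if term in candidate  # literal term match only
    )
    unmatched = sum(1 for term in candidate if term not in user_query)
    return score, unmatched


def rank_similar(user_query, candidates):
    """Sort candidate names by descending score, then by fewest unmatched terms."""
    scored = {name: score_query(user_query, q) for name, q in candidates.items()}
    return sorted(scored, key=lambda name: (-scored[name][0], scored[name][1]))


user = {"scout": "require", "camp": "require", "themes": "exclude"}
candidates = {
    "A": {"scout": "require", "knots": "require"},
    "B": {"scout": "require", "camp": "require", "themes": "exclude"},
    "C": {"camp": "exclude"},
}
print(rank_similar(user, candidates))
```

  • Changing the values returned by `attribute_score` corresponds to the alternative weighting of attribute similarity and dissimilarity mentioned above.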
  • The server 104 or similar device includes a watch service.
  • When a new data product is made available for searching, an entry is created in a data product table containing the path for the new data product and an initial rank value of 0, and/or a ranking Boolean variable is set to true.
  • When a data product has been updated, as determined by the watch service, the entry in the table for the data product is found and the Boolean variable is set to true. The Boolean value is set to true because a new ranking needs to be done based on the updated content of the data product. Finally, if a data product is deleted, the corresponding entry in the data product table is deleted, as well as any relationships with other system tables.
  • A watch service includes a general document repository or an indexing system.
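  • The bookkeeping performed by the watch service can be sketched as follows. This is a minimal in-memory illustration, not the disclosed implementation: the class and method names (`DataProductTable`, `on_added`, `on_updated`, `on_deleted`) are hypothetical, and the `relations` dictionary is only a stand-in for "relationships with other system tables".

```python
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    rank: float = 0             # initial rank value of 0
    needs_ranking: bool = True  # the ranking Boolean variable


class DataProductTable:
    """In-memory stand-in for the data product table kept by the watch service."""

    def __init__(self):
        self.entries = {}    # path -> Entry
        self.relations = {}  # path -> related rows (hypothetical placeholder)

    def on_added(self, path):
        # New data product: create an entry with rank 0 and flag it for ranking.
        self.entries[path] = Entry(path)

    def on_updated(self, path):
        # Updated content means the previous ranking is stale.
        self.entries[path].needs_ranking = True

    def on_deleted(self, path):
        # Remove the entry and anything that refers to it.
        self.entries.pop(path, None)
        self.relations.pop(path, None)


table = DataProductTable()
table.on_added("/docs/scouting.txt")
table.entries["/docs/scouting.txt"].needs_ranking = False  # pretend ranking ran
table.on_updated("/docs/scouting.txt")
print(table.entries["/docs/scouting.txt"])
table.on_deleted("/docs/scouting.txt")
print(table.entries)
```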
  • FIGURE 24 shows a method 600 of searching using related concepts in an alternate embodiment.
  • This embodiment includes a pointed and directed search, which may be conducted locally or over a network, such as the Internet or an intranet, using significance-ranked terms determined via user-specified, preset, automatically-determined, and/or industry- or system-acceptable criteria used during file parsing and/or filtering.
  • The user identifies at least one data product containing a topic of interest.
  • The data source, or plurality of data sources, includes, but is not limited to, a word processing document, a spreadsheet, a file, a database, a text file, a web page/site, and/or a selection of manually-entered text.
  • A list of significant terms is ranked in the at least one data store based on a weight value for each of the significant terms found in the data store.
  • The ranking process is fully described above.
  • A search string is created including at least one significant term.
  • The user modifies the rank/weight of the terms presented in the ranked list and is also given the opportunity to require, exclude, include, remove, add, etc., the individual terms on the list.
  • The processor selects at least one significant term for the search string.
  • At least one search application is searched using the search string.
  • The search string is manually or automatically entered into a search query box as a properly concatenated search query string in accordance with the syntax and/or protocol requirements of the search engine(s) or search application(s) being utilized to conduct the local or network search.
  • Multiple data stores, such as a plurality of search engine or search application indexes, are automatically queried simultaneously.
  • The found data products are displayed to a user.
  • The results of each query search are presented by a graphical user interface in an aggregated assembly, while in an alternate embodiment the results are presented in the individual or separate graphical user interfaces of each applicable search engine or search application.
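  • The steps of method 600 (select significant terms, build a search string, query several search applications at once, and aggregate the results) can be sketched as follows. This is an illustration under assumptions: the search applications are represented by plain callables rather than real search engines, the space-separated string format and the `top_n` cutoff are invented for the example, and the names `build_search_string` and `search_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def build_search_string(ranked_terms, top_n=3):
    """Join the top_n most heavily weighted terms into one search string.

    ranked_terms maps term -> weight. Space-separated output is an assumption;
    a real search application may require different syntax.
    """
    top = sorted(ranked_terms, key=ranked_terms.get, reverse=True)[:top_n]
    return " ".join(top)


def search_all(search_string, applications):
    """Query every search application concurrently and aggregate the results.

    applications maps a name to a callable taking the search string and
    returning a list of found data products.
    """
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(search, search_string)
            for name, search in applications.items()
        }
        return {name: future.result() for name, future in futures.items()}


ranked = {"scout": 0.9, "camp": 0.7, "themes": 0.4, "knots": 0.2}
search_string = build_search_string(ranked)

# Stub search applications standing in for real engines or indexes.
applications = {
    "local": lambda q: [f"local result for '{q}'"],
    "web": lambda q: [f"web result for '{q}'"],
}
for name, results in search_all(search_string, applications).items():
    print(name, results)
```

  • Keeping the results keyed by application name supports either presentation described above: the dictionary can be merged into one aggregated list or shown per application.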
  • FIGURE 25 shows a screenshot 700 of the identification of at least one data store.
  • The page 700 allows a user to identify a data store, located on a computer or at a location on the Internet, that contains a topic of interest.
  • The identification screen also provides a browse feature that allows a user to search for the document without having to enter a data store path.
  • When the user activates a search (search button 710), the file is parsed and a ranked list of significant terms is created and stored in the database.
  • FIGURE 26 shows a screenshot 800 of a search application and search string selection screen.
  • A user selects which search application (area 810) he/she is interested in. Once selected, a list of significant terms from the identified data store is displayed (area 820). The user then determines which terms to require, exclude, and/or remove. This process as shown is manual; however, in an alternate embodiment it is automated.
  • FIGURES 27-31 show screenshots of the results of the search in a plurality of search applications.
  • The search string is generally the same in all of them but is optimized for each of the search applications.
  • A user has the ability to switch among the plurality of search applications by clicking on tabs on which the search application titles are prominently displayed.
  • A data product could be a text file, a web page, or any form of searchable medium. Accordingly, the scope of the invention is not limited by the disclosure of the preferred embodiment. Instead, the invention should be determined entirely by reference to the claims that follow.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Systems and methods for configuring a search string. The systems and methods include searching a plurality of data products stored at one or more locations over a computer-based network. At least one data product is identified containing a topic of interest. A list of significant terms is ranked in the identified data product. The ranking is based on a weight value for each of the significant terms found in the data store. A search string is created including at least one significant term. At least one search application is run using the search string. If at least one data record was found during the search, the found data products are displayed.
PCT/US2007/074621 2006-07-27 2007-07-27 Recherche de produit de données utilisant des concepts associés WO2008014469A2 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US82054006P 2006-07-27 2006-07-27
US60/820,540 2006-07-27
US88327407P 2007-01-03 2007-01-03
US60/883,274 2007-01-03
US11/733,478 2007-04-10
US11/733,478 US20070175674A1 (en) 2006-01-19 2007-04-10 Systems and methods for ranking terms found in a data product

Publications (2)

Publication Number Publication Date
WO2008014469A2 true WO2008014469A2 (fr) 2008-01-31
WO2008014469A3 WO2008014469A3 (fr) 2008-11-06

Family

ID=38982396

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/074621 WO2008014469A2 (fr) 2006-07-27 2007-07-27 Recherche de produit de données utilisant des concepts associés

Country Status (1)

Country Link
WO (1) WO2008014469A2 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6473752B1 (en) * 1997-12-04 2002-10-29 Micron Technology, Inc. Method and system for locating documents based on previously accessed documents
US20030055810A1 (en) * 2001-09-18 2003-03-20 International Business Machines Corporation Front-end weight factor search criteria
US6873982B1 (en) * 1999-07-16 2005-03-29 International Business Machines Corporation Ordering of database search results based on user feedback

Also Published As

Publication number Publication date
WO2008014469A3 (fr) 2008-11-06

Similar Documents

Publication Publication Date Title
US20080021887A1 (en) Data product search using related concepts
US20070168344A1 (en) Data product search using related concepts
US11080315B2 (en) Systems and methods for providing a visualizable results list
US20060248078A1 (en) Search engine with suggestion tool and method of using same
US7840589B1 (en) Systems and methods for using lexically-related query elements within a dynamic object for semantic search refinement and navigation
US7039635B1 (en) Dynamically updated quick searches and strategies
US20110029563A1 (en) System and method for searching data sources
US20020073079A1 (en) Method and apparatus for searching a database and providing relevance feedback
US20060129541A1 (en) Dynamically updated quick searches and strategies
US20100293162A1 (en) Automated Keyword Generation Method for Searching a Database
US20070214154A1 (en) Data Storage And Retrieval
US11036747B2 (en) Systems and methods for providing a visualizable results list
WO2008014469A2 (fr) Recherche de produit de données utilisant des concepts associés
EA002016B1 (ru) Способ поиска хранимых на устройствах хранения данных электронных документов и их фрагментов
US20080228725A1 (en) Problem/function-oriented searching method for a patent database system
EP2083361A1 (fr) Procédé de recherche basé sur une interface définie par un problème/une fonction pour système de base de données de brevets
AU2002339257A1 (en) A system and method for searching data sources
AU2008202352A1 (en) A system and method for searching data sources

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07840562

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

NENP Non-entry into the national phase in:

Ref country code: RU

122 Ep: pct application non-entry in european phase

Ref document number: 07840562

Country of ref document: EP

Kind code of ref document: A2