WO2013009613A1 - Systems and methods for natural language searching of structured data

Systems and methods for natural language searching of structured data

Publication number
WO2013009613A1
Authority
WO
WIPO (PCT)
Prior art keywords
natural language
search
information
structured
text
Prior art date
Application number
PCT/US2012/045742
Other languages
French (fr)
Inventor
Jochen Lothar Leidner
Frank Schilder
Thomas Robert ZEILUND
Isabelle Alice Yvonne MOULINIER
Original Assignee
Thomson Reuters Global Resources
Application filed by Thomson Reuters Global Resources
Priority to EP12811026.9A (EP2729886A4)
Publication of WO2013009613A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation

Definitions

  • an inverted file index 190 is shown.
  • This particular inverted file index 190 relates to document 300 (repeated in Figure 4 for convenience), document 410, and document 412.
  • Documents 300, 410, and 412 relate to the share prices of Thomson Reuters,
  • the inverted file index 190 is comprised of a first column 414, a second column 416, a third column 420, a fourth column 422, and a fifth column 426.
  • the first column 414 comprises a list, preferably alphabetically, of all terms within documents 300, 410, and 412.
  • the second column comprises the document numbers relating to text collection 180.
  • Price appears in each of documents 180-1-7, 180-1-8, and 180-1-9.
  • the third column 420 comprises the number of "hits" for each term in the first column 414. For example, assuming that documents 180-1-7, 180-1-8, and 180-1-9 were the only documents in text collection 180, performing two separate natural language searches using the present invention for the terms "price" and "Pfizer" would return three documents (i.e., documents 180-1-7, 180-1-8, and 180-1-9) and one document (i.e., document 180-1-9), respectively.
  • the fourth column 422 denotes the number of occurrences of each term. For example, "Microsoft" appears one time in document 180-1-8 whereas "was" appears one time in each of documents 180-1-7, 180-1-8, and 180-1-9.
  • the fifth column 426 represents the position, in words, of each term in each document. For example, "Reuters" is the sixth word in document 180-1-7 whereas
  • the exemplary inverted file index 190 is both a record level inverted index and a word level inverted index because it comprises the second column 416 and the fifth column 426, respectively.
  • an inverted file index 190 functions to map content, such as words, numbers, and other things searchable using natural language searches, to structured data (e.g., XML databases).
  • modifications to inverted file index 190 which would result in another inverted file index 190 include but are not limited to the removal of and/or addition of columns.
  • database 110 will typically be comprised of many different sets of structured information comprising various records and fields. For example, some records may relate to restaurants in a particular zip code along with hours of operation whereas other records may relate to sales prices of television sets (arranged by, e.g., size, model number, manufacturer, technology type, etc.) at
  • each record in database 110 will have a
  • a flowchart 500 detailing the operation of the portion of the system 100 to the right of line 197 is shown.
  • a user enters a natural language search. These searches may utilize ranked retrieval based on keywords or Boolean logic.
  • Google, Bing, and Yahoo are examples of search engines wherein a user may conduct a natural language search.
  • the natural language search is received by a search engine.
  • a set of search results is gathered, formulated, and/or otherwise collected.
  • the inverted file index 190 is used to perform this step.
  • the set of search results gathered comprises various files within text collection 180.
  • a signal associated with the set of search results is sent to the user. This signal, as will be appreciated by those skilled in the art, may be
  • step 510 the user may analyze and/or display the set of search results (or reasonable facsimile thereof).
  • search results may also contain information, such as unstructured data, that was always searchable using a natural language search.
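The inverted file index and search steps listed above can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the patent: the function names, the tokenization, and the in-memory dictionary layout are assumptions made for illustration. The text of document 180-1-7 follows the example sentence given for document 300; the Microsoft and Pfizer sentences use invented placeholder values, because the source does not give their text.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {document id: [1-based word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.split(), start=1):
            term = word.strip(".,").lower()  # naive normalization (assumption)
            index[term].setdefault(doc_id, []).append(position)
    return index

def search(index, query):
    """Return the ids of documents containing at least one query term."""
    hits = set()
    for word in query.split():
        hits.update(index.get(word.strip(".,?").lower(), {}))
    return sorted(hits)

docs = {
    "180-1-7": "The share price of Thomson Reuters was $40.10 on November 2, 2010.",
    # Placeholder prices and dates below are invented for illustration only.
    "180-1-8": "The share price of Microsoft was $1.00 on January 1, 2000.",
    "180-1-9": "The share price of Pfizer was $1.00 on January 1, 2000.",
}
index = build_inverted_index(docs)

print(search(index, "price"))        # all three documents
print(search(index, "Pfizer"))       # only document 180-1-9
print(len(index["was"]))             # number of documents containing "was"
print(index["reuters"]["180-1-7"])   # word position(s) of "Reuters"
```

In this sketch, the keys of each inner dictionary play the role of the document-number column, their count corresponds to the number of "hits", and the position lists carry both the occurrence counts and the word positions.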

Abstract

The invention relates to searching structured data using natural language searches. More specifically and preferably, the invention relates to the use of an inverted file index built from generated documents to make data, typically unsearchable using a natural language search, searchable.

Description

Searching of Structured Data
Field of the Invention:
The invention relates to searching structured data using natural language searches. More specifically, the invention relates to using data that is typically not searchable using a natural language search and making it searchable with a natural language search.
Background of the Invention:
Often, when people have a topic to research, they turn to the internet. Through the internet, people may access search engines from many companies including Google, Microsoft, and others.
In order to research a given topic, people will typically perform a natural language or keyword search. A natural language search is a search wherein the searcher uses a regular spoken language, such as English, to enter a search. For example, the searcher may access www.google.com and enter "what is the best time to plant grass seed?" in the search box. This particular search returned over 1,000,000 results. Similarly, a keyword search is a search, not necessarily using regular spoken language (i.e., sentences), wherein at least one word is entered. Such a search may be used to attempt to find documents with at least one of the entered words. For example, the searcher may access www.google.com and enter "grass seed plant best time" in the search box. This particular search returned over 800,000 results. As used herein, the term "natural language search" includes keyword searches. Searchers use search engines from Google, Microsoft, and various other companies to conduct natural language searches. It is noted that, as used herein, neither natural language searches nor keyword searches include searches performed on entered words in a form wherein the searcher is limited to a particular set of words. For example, the website http://apartments.cazoodle.com permits one to search for apartments but only if the search is limited to, e.g., a city or state. A search for the term "three bedrooms" will not identify any results (instead, this type of search may be performed by using a pull-down menu on the website).
The results of natural language searches are from unstructured data. As used herein, "data" includes any type of information and includes but is not limited to both numbers and text. Differentiating between unstructured data and structured data is based upon whether the data is associated with a logical schema.
Unstructured data is data unassociated with a logical schema. Structured data is data that is associated with a logical schema. Thus, unlike unstructured data, structured data is associated with a specification as to how the data may be found or located in an unambiguous manner. For example, a specification for a relational database table of ordered names, street addresses, towns, states, and zip codes would state that zip codes are found in column five (whereas names, street addresses, towns, and states are found in columns one, two, three, and four, respectively).
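As a minimal illustration of how a logical schema makes a field unambiguously locatable, the following Python sketch models the address table described above. The field names, the helper functions, and the sample record are hypothetical and are not taken from the specification.

```python
# Hypothetical schema for the address table; the field names are
# illustrative assumptions. The order fixes the column of each field.
SCHEMA = ("name", "street_address", "town", "state", "zip_code")

def column_number(field):
    """Return the 1-based column in which the schema places `field`."""
    return SCHEMA.index(field) + 1

def lookup(record, field):
    """Locate a field value in a record by consulting the schema."""
    return record[column_number(field) - 1]

# Invented sample record, for illustration only.
record = ("Jane Doe", "1 Main Street", "Springfield", "IL", "62701")

print(column_number("zip_code"))   # 5
print(lookup(record, "zip_code"))  # 62701
```

Free-form text has no counterpart to `SCHEMA`, which is why a program cannot locate the zip code in an email message with the same certainty.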
Examples of structured data include, but are not limited to, relational databases (which use the Data Definition Language [DDL] for writing logical schema), XML databases (which use an XML schema to describe the structure of XML files and the types of the data contained therein) and spreadsheets (which provide a manner in which to accurately identify data stored within fixed fields within a record or file). Examples of unstructured data include, but are not limited to, email messages, word processing documents, documents in .pdf format, web pages, and other types of data comprising free-form text. Thus, as mentioned above, the difference between structured data and unstructured data is that structured data is associated with a specification as to how data may be found or located in an unambiguous manner. This is why, for example, although data in XML databases is not stored in fixed locations (as is the case with spreadsheets), XML data is still considered structured: it may be unambiguously identified (via, e.g., tags associated with the data).
Unfortunately, natural language search engines are ineffective at providing search results from structured data. This is problematic from a number of perspectives. For example, Google, provider of one of the most commonly used search engines, has admitted that it has "not been doing a good job" of presenting structured data found on the web to users. See www.readwriteweb.com/archives/google_were_not_doing_a_good_job_with_structured_data.php. In this context, Google has difficulty providing search results which include content from the "deep web" (those internet resources that sit behind forms and site-specific search boxes and are unable to be indexed by passive means). Other search engines may face similar challenges. Google estimates the "deep web" to be about 500 times the size of the "shallow web" which is estimated to contain about 5 million web pages. Another example relates to information solutions providers, such as Thomson Reuters, which provides information solutions to workers in the healthcare, tax and accounting, legal, scientific, news/media and financial areas. This problem is made more acute by the fact that people are becoming more and more accustomed to searching for information using natural language searches.
Summary of the Invention:
We have realized that the use of text generation technology enhances the effectiveness of being able to search structured data using natural language searches. More specifically, our invention relates to computer implemented methods to respond to receiving a natural language search. This is done by searching a set of information searchable using the natural language search wherein the set of information was generated from a set of structured information which is unsearchable using the natural language search. Next, a set of search results is formulated and a signal associated with the set of search results is transmitted. Corresponding systems are also disclosed as are methods and systems for creating such information searchable via natural language searches. Advantageously, the present invention permits the use of natural language searching on a set of information associated with structured data. Also advantageously, the present invention permits the use of natural language searching using an inverted file index. Other advantages of the present invention will be apparent to those skilled in the art from the remainder of this specification.
Brief Description of the Drawings:
Figure 1 shows a system in accordance with the present invention that may be used to generate a text collection and an inverted file index and also shows the resultant text collection and inverted file index;
Figure 2 shows a flowchart detailing the operation of the system of Figure 1 which may be done offline;
Figure 3 shows an example of a document which is a portion of a text collection and was generated from a set of structured information;
Figure 4 shows an example of an inverted file index; and
Figure 5 shows a flowchart detailing the operation of the system of Figure 1.
Detailed Description:
The system 100 of Figure 1 comprises a database 110, an exporter 120, a text generator 130, and a rules engine 140, all of which may be implemented as combinations of hardware and software as will be appreciated by those skilled in the art. Text generators are known in the art. See, e.g., Dale, Robert and Reiter, Ehud, Building Natural Language Generation Systems (Cambridge University Press, Cambridge, U.K. 2000). The database 110 comprises structured data and is functionally connected to the exporter 120 via communications link 150. The exporter 120 is functionally connected to the rules engine 140 and the text generator 130 via communication links 160 and 170, respectively. Communication links 150, 160, and 170 may be a hardwired bus, a wireless link, or any other type of communications link, including optical links, software function calls, and the like, known to those skilled in the art.
Referring again to Figure 1, system 100 is used to generate a text collection 180 and an inverted file index 190, both of which may be stored in memory 195. The system 100 may be used in an offline manner when generating the text collection 180 and the inverted file index 190. Once generated and stored in memory 195, the manner in which memory 195 may be accessed is through the use of an online search using, e.g., natural language via communications link 198.
Referring yet again to Figure 1, the portion of the system to the right of line 197, exclusive of any user equipment, is a system 199 that responds to natural language searches. More specifically, system 199 comprises memory 195 and search engine 198, along with other associated hardware and software to respond to natural language searches. Through the use of hardware and software, system 199 has a means for receiving a natural language search and a means for searching.
An example of hardware and software that may be used to receive a natural language search and to conduct a search is a personal computer based on an Intel central processing unit ("CPU"). Other examples include a mobile computing device such as an Apple iPhone® or a Hadoop parallel computation cluster. System 199 also has, through the use of hardware and software, means for formulating a set of search results and means for transmitting a signal associated with the set of search results. An example of hardware and software that may be used for formulating a set of search results is the Apache Lucene full-text search engine. Other examples include both an inverted index managed by an Objective C indexing and retrieval library and the Glasgow Terrier system. Additionally, an example of hardware and software that may be used to transmit a signal associated with the set of search results is a machine implementing any of the Hyper Text Transfer Protocol/Hyper Text Markup Language ("HTTP/HTML"), Extensible Markup Language over HTTP ("XML-over-HTTP") or Simple Object Access Protocol ("SOAP").
Still referring to Figure 1, the text collection 180 is comprised of multiple documents. The first set of documents, 180-1-1 through 180-1-N, relates to, e.g., a first spreadsheet having N records wherein each document (e.g., 180-1-3) has a corresponding record (e.g., record #3) within the first spreadsheet. Likewise, the last set of documents, 180-M-1 through 180-M-J, relates to, e.g., the Mth spreadsheet having J records wherein each document (e.g., 180-M-4) has a corresponding record (e.g., record #4) within the Mth spreadsheet. Those skilled in the art will appreciate that each record within the database 110 will have a corresponding document (e.g., 180-17-23, corresponding to the 23rd record of the 17th spreadsheet [not shown]) in the text collection 180. Those skilled in the art will also appreciate that those elements to the right of line 197 are used in an online manner while those elements to the left of line 197 are used to generate the contents of memory 195 (e.g., the text collection 180 and the inverted file 190) in an offline fashion.
Referring to Figure 2, a flowchart 200 is described detailing the operation of the components of system 100 and how they generate a text collection 180 and an inverted file 190. Assume, as in Figure 1, that database 110 comprises various sets of structured information such as spreadsheets 1 through M. Further assume that spreadsheet 1 contains N records and spreadsheet M contains J records. To create a first document in text collection 180, a spreadsheet counter, SSC, is initialized to zero in step 202. Next, SSC is incremented in step 204. Next, a spreadsheet record counter, SSRC, is initialized to zero in step 206 and then incremented in step 208. Next, in step 210, the exporter 120 reads record 1 of spreadsheet 1 and creates file 180-1-1 of text collection 180. Next, a portion of system 100 determines whether spreadsheet 1 contains additional records (see step 212). If so, the process goes to step 208 and SSRC is incremented. Otherwise, in step 214, the portion of the system determines whether there is an additional spreadsheet. If so, the process goes to step 204 and counter SSC is incremented. Otherwise, the text collection 180 is complete as shown in box 216. Those skilled in the art will realize that the above description of Figure 2 may be done in an offline fashion. They will also realize that the example has been described with respect to sets of structured information that happen to be spreadsheets but the same could be done with any set or sets of structured information including but not limited to SQL databases, XML files, tab-separated text files, and graph stores.
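The counter-driven loop of flowchart 200 can be sketched in Python as follows. This is an illustrative reading of the steps described above, not the disclosed implementation: the function name, the list-of-lists representation of the spreadsheets, and the `make_document` callback are assumptions. The "more records?" and "more spreadsheets?" checks are placed at the loop heads here, so that an empty spreadsheet is simply skipped.

```python
def export_text_collection(spreadsheets, make_document):
    """Create one document per record, mirroring flowchart 200."""
    collection = {}
    ssc = 0                                # step 202: spreadsheet counter
    while ssc < len(spreadsheets):         # step 214: another spreadsheet?
        ssc += 1                           # step 204
        records = spreadsheets[ssc - 1]
        ssrc = 0                           # step 206: record counter
        while ssrc < len(records):         # step 212: another record?
            ssrc += 1                      # step 208
            doc_id = f"180-{ssc}-{ssrc}"   # step 210: create the document
            collection[doc_id] = make_document(records[ssrc - 1])
    return collection                      # box 216: collection complete

# Two toy "spreadsheets" with two records and one record, respectively.
sheets = [["record a", "record b"], ["record c"]]
documents = export_text_collection(sheets, str.upper)
print(list(documents))  # ['180-1-1', '180-1-2', '180-2-1']
```

The document identifiers follow the 180-M-J numbering used for the text collection, so each record maps to exactly one document.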
Again referring to Figure 2, the relationship with Figure 1 is described. There is one document for each row of the database 110. The exporter may be realized as a batch exporter, generating all documents offline and at once. It may also be realized as an incremental process, generating documents only as required (e.g., triggered by changes in the database 110). The exporter 120 communicates with the rules engine 140. Rules engine 140 has two sets of rules. The first set of rules specifies textual transformations. An example of a textual transformation is the replacement of a stock ticker symbol with the company name with which it is associated (e.g., substituting Thomson Reuters for TRI). The second set of rules represents language templates with placeholders. A completed example of this is shown as document 300 of Figure 3. Instantiation of document 300 is discussed further below with reference to Figure 3. The text generator 130 selects an appropriate template and instantiates the placeholders with the values from the current database row.
Referring to Figure 3, an example of a document 300, which is a portion of a text collection 180 and was generated from a set of structured information, is shown. The document 300 is comprised of a template portion 302 and placeholders 304, 306, 307, and 308. In this particular example, the document 300 relates to stock prices on a particular day. The set of structured information used to generate the document 300 is shown in row 310 of the set of structured information 312. This set of structured information 312 has various records, each denoted by a row number (see column 314). Each record contains a company ticker symbol identified in column 316, a share price identified in column 318, a date identified in column 320, and a currency identified in column 321. A set of rules 322 is used to take entries in columns 316, 318, 320, and 321 and translate them into characters which will ultimately populate placeholders 304, 306, 308, and 307, respectively. Once complete, document 300 is referred to as being instantiated. More specifically, the set of rules 322 may be generated through human review of the set of structured information 312. After this review, the reviewer drafts particular rules (322a, 322b, 322c, and 322d) relating to the particular set of structured information. In this example, row 310 reflects that a stock with a ticker symbol "TRI" was sold for $40.10 on November 2, 2010. Rules engine 140 is applied to row 310 to generate a document 300 stating "[t]he share price of Thomson Reuters was $40.10 on November 2, 2010." This is accomplished by identifying where to insert, within a template stating "[t]he share price of (insert company name) was (insert currency) (insert amount) on (insert date)," particular fields of each record within database 110. This completes generation of document 300 which is part of text collection 180. It is noted that document 300 is searchable using a natural language search whereas the set of structured information 312 is not searchable using a natural language search.
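Instantiation of document 300 from row 310 may be illustrated with the following sketch. The field names, the ISO date string, the `USD` currency code, and the lookup tables are assumptions made for illustration; the description does not specify how the entries in columns 316 through 321 are stored.

```python
from datetime import date

# Language template with four placeholders (company, currency, amount, date).
TEMPLATE = "The share price of {company} was {currency}{amount} on {date}."

# Hypothetical rule data, not taken from the set of rules 322.
COMPANY_NAMES = {"TRI": "Thomson Reuters"}
CURRENCY_SYMBOLS = {"USD": "$"}
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def instantiate(row: dict) -> str:
    """Fill the template placeholders with values from one record."""
    d = date.fromisoformat(row["date"])
    return TEMPLATE.format(
        company=COMPANY_NAMES[row["ticker"]],
        currency=CURRENCY_SYMBOLS[row["currency"]],
        amount=f"{row['price']:.2f}",
        date=f"{MONTHS[d.month - 1]} {d.day}, {d.year}",
    )


# Assumed storage format for row 310.
row_310 = {"ticker": "TRI", "price": 40.10, "date": "2010-11-02", "currency": "USD"}
print(instantiate(row_310))
# prints: The share price of Thomson Reuters was $40.10 on November 2, 2010.
```

Month names are spelled out in a table rather than taken from the system locale so that the generated text does not vary from one machine to another.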
Referring to Figure 4, an inverted file index 190 is shown. This particular inverted file index 190 relates to document 300 (repeated in Figure 4 for convenience), document 410, and document 412. Documents 300, 410, and 412 relate to the share prices of Thomson Reuters, Microsoft, and Pfizer stocks as of November 2, 2010. These documents are among many documents that may be part of, e.g., text collection 180. Assume documents 300, 410, and 412 correspond to documents bearing numbers 180-1-7, 180-1-8, and 180-1-9, respectively, of text collection 180. In other words, they are associated with, respectively, the 7th through 9th records of the first spreadsheet. In this example, the inverted file index 190 is comprised of a first column 414, a second column 416, a third column 420, a fourth column 422, and a fifth column 426. The first column 414 comprises a list, preferably alphabetical, of all terms within documents 300, 410, and 412. The second column 416 comprises the document numbers relating to text collection 180. It should be noted that, for ease of reading, the word "All" has been substituted for the collection of documents 180-1-7, 180-1-8, and 180-1-9. Thus, by way of example, because the term "price" bears the entry "All" in the second column 416, it means that the term "price" appears in each of documents 180-1-7, 180-1-8, and 180-1-9. Similarly, because the term "Microsoft" bears the entry 180-1-8 in the second column 416, it means that the term "Microsoft" appears in document 180-1-8. The third column 420 comprises the number of "hits" for each term in the first column 414, that is, the number of documents in which the term appears. For example, assuming that documents 180-1-7, 180-1-8, and 180-1-9 were the only documents in text collection 180, performing two separate natural language searches using the present invention for the terms "price" and "Pfizer" would return three documents (i.e., documents 180-1-7, 180-1-8, and 180-1-9) and one document (i.e., document 180-1-9), respectively. The fourth column 422 denotes the number of occurrences of each term. For example, "Microsoft" appears one time in document 180-1-8 whereas "was" appears one time in each of documents 180-1-7, 180-1-8, and 180-1-9. The fifth column 426 represents the position, in words, of each term in each document. For example, "Reuters" is the sixth word in document 180-1-7 whereas "November" is the tenth, ninth, and ninth word, respectively, in documents 180-1-7, 180-1-8, and 180-1-9.
Again referring to Figure 4, it will be apparent that the exemplary inverted file index 190 is both a record level inverted index and a word level inverted index because it comprises the second column 416 and the fifth column 426, respectively. It is apparent to those skilled in the art that, in general, an inverted file index 190 functions to map content, such as words, numbers, and other things searchable using natural language searches, to structured data (e.g., XML databases). Thus, modifications to inverted file index 190 which would result in another inverted file index 190 include but are not limited to the removal of and/or addition of columns. As will be appreciated by those skilled in the art, database 110 will typically be comprised of many different sets of structured information comprising various records and fields. For example, some records may relate to restaurants in a particular zip code along with hours of operation whereas other records may relate to sales prices of television sets (arranged by, e.g., size, model number, manufacturer, technology type, etc.) at particular stores. Thus, each record in database 110 will have a corresponding file within text collection 180 designated by one reference numeral ranging from 180-1-1 through 180-M-J.
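An inverted file index of the kind described with reference to Figure 4 may be sketched as follows. The whitespace tokenizer, the function name `build_index`, and the share prices used for the Microsoft and Pfizer documents are assumptions made for illustration only; the description does not state those prices.

```python
from collections import defaultdict
from typing import Dict, List

# Three documents of text collection 180; the last two prices are placeholders.
DOCS = {
    "180-1-7": "The share price of Thomson Reuters was $40.10 on November 2, 2010.",
    "180-1-8": "The share price of Microsoft was $25.00 on November 2, 2010.",
    "180-1-9": "The share price of Pfizer was $17.00 on November 2, 2010.",
}


def build_index(docs: Dict[str, str]) -> Dict[str, Dict[str, List[int]]]:
    """Map each term to the documents containing it and its word positions."""
    index: Dict[str, Dict[str, List[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, raw in enumerate(text.split(), start=1):
            term = raw.strip(".,")  # drop sentence punctuation around a word
            index[term].setdefault(doc_id, []).append(position)
    return index


index = build_index(DOCS)
# Record level information: which documents contain "price".
print(sorted(index["price"]))
# Word level information: where "November" occurs in each document.
print(index["November"])
```

In this sketch the keys of each posting correspond to the second column 416, their count to the third column 420, the length of each position list to the fourth column 422, and the positions themselves to the fifth column 426.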
Those skilled in the art will appreciate that the portion of the detailed description above, relating to the creation of an inverted file index and a system for the same, may be done in an offline fashion. However, in order to conduct a natural language search on a set of information associated with structured data, work must be done online.
Referring to Figure 5, a flowchart 500 detailing the operation of the portion of the system 100 to the right of line 197 is shown. First, in step 502 a user enters a natural language search. These searches may utilize ranked retrieval based on keywords or Boolean logic. Google, Bing, and Yahoo are examples of search engines wherein a user may conduct a natural language search. Second, in step 504 the natural language search is received by a search engine. Third, in step 506, a set of search results is gathered, formulated, and/or otherwise collected. The inverted file index 190 is used to perform this step. The set of search results gathered comprises various files within text collection 180. Fourth, in step 508, a signal associated with the set of search results is sent to the user. This signal, as will be appreciated by those skilled in the art, may be compressed or take on any format as long as a reasonable facsimile of a particular document within the text collection may be reproduced for the user. Fifth and finally, in step 510, the user may analyze and/or display the set of search results (or reasonable facsimile thereof).
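Step 506 may be illustrated with the following sketch, which shows both a Boolean search and a simple keyword-based ranked retrieval over a small index. The index contents, the function names `boolean_and` and `ranked`, and the scoring by number of matched query terms are assumptions for this sketch; the description does not prescribe a particular ranking function.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical excerpt of inverted file index 190: term -> document numbers.
INDEX: Dict[str, Set[str]] = {
    "price": {"180-1-7", "180-1-8", "180-1-9"},
    "Thomson": {"180-1-7"},
    "Reuters": {"180-1-7"},
    "Microsoft": {"180-1-8"},
    "Pfizer": {"180-1-9"},
}


def boolean_and(query: str) -> Set[str]:
    """Return the documents that contain every term of the query."""
    postings = [INDEX.get(term, set()) for term in query.split()]
    return set.intersection(*postings) if postings else set()


def ranked(query: str) -> List[str]:
    """Order documents by how many query terms they contain (assumed score)."""
    scores: Counter = Counter()
    for term in query.split():
        for doc_id in INDEX.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document number.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


print(boolean_and("price Pfizer"))  # prints: {'180-1-9'}
print(ranked("price Pfizer"))  # prints: ['180-1-9', '180-1-7', '180-1-8']
```

The document numbers returned here would then be used to retrieve the corresponding files of text collection 180 for transmission in step 508.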
Those skilled in the art will realize that the detailed description above is provided for illustrative purposes and to enable those skilled in the art to make and use the claimed invention. For example, although the text collection 180 and inverted file index 190 are described in English, the invention may be used in any language. Additionally, although the present invention has been described with respect to financial information (e.g., stock prices), it may be used to make any structured data searchable using a natural language search. Further, there may be a set of templates used wherein each template, once completed, corresponds to a different instantiation of document 300 in a different language. Still further, although the present invention has been described as retrieving only search results that at one point were unsearchable using a natural language search, those skilled in the art will appreciate that the search results may also contain information, such as unstructured data, that was always searchable using a natural language search. Thus, the invention is defined by the appended claims.

Claims

1. A computer implemented method comprising: a. receiving a natural language search; b. in response to the natural language search, searching a set of information searchable using the natural language search, the set of information having been generated from a set of structured information which is unsearchable using the natural language search; c. based upon the step of searching, formulating a set of search results; and d. transmitting a signal associated with the set of search results.
2. The method of claim 1 wherein a language associated with the natural language search is English.
3. The method of claim 1 wherein a language associated with the natural language search is a language other than English.
4. The method of claim 1 wherein the set of information was generated by: a. accessing the set of structured information; and b. applying a text generator to the set of structured information.
5. The method of claim 4 wherein the text generator generates the set of information in multiple languages.
6. The method of claim 4 wherein the text generator generates the set of information in English.
7. A computer implemented method comprising: a. identifying a set of structured information wherein the set of structured information is unsearchable using a natural language search; b. based upon the set of structured information, generating an additional set of information wherein the additional set of information is searchable using the natural language search.
8. The method of claim 7 wherein the step of generating comprises using a text generator and a rules engine.
9. The method of claim 8 wherein the additional set of information comprises a text collection.
10. The method of claim 9 wherein the additional set of information further comprises an inverted file index.
11. A system comprising: a. means for receiving a natural language search; b. means, responsive to the means for receiving, for searching a set of information searchable using the natural language search, the set of information having been generated from a set of structured information which is unsearchable using the natural language search; c. means for formulating a set of search results; and d. means for transmitting a signal associated with the set of search results.
12. The system of claim 11 wherein the means for formulating a set of search results comprises a text collection and an inverted file index.
PCT/US2012/045742 2011-07-08 2012-07-06 Systems and methods for natural language searching of structured data WO2013009613A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP12811026.9A EP2729886A4 (en) 2011-07-08 2012-07-06 Systems and methods for natural language searching of structured data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/178,924 2011-07-08
US13/178,924 US20130013616A1 (en) 2011-07-08 2011-07-08 Systems and Methods for Natural Language Searching of Structured Data

Publications (1)

Publication Number Publication Date
WO2013009613A1 (en) 2013-01-17

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/045742 WO2013009613A1 (en) 2011-07-08 2012-07-06 Systems and methods for natural language searching of structured data

Country Status (3)

Country Link
US (1) US20130013616A1 (en)
EP (1) EP2729886A4 (en)
WO (1) WO2013009613A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130024459A1 (en) * 2011-07-20 2013-01-24 Microsoft Corporation Combining Full-Text Search and Queryable Fields in the Same Data Structure
RU2616598C1 (en) * 2012-01-19 2017-04-18 Мицубиси Электрик Корпорейшн Image decoding device, image encoding device, image decoding method and image encoding method
US9282328B2 (en) * 2012-02-10 2016-03-08 Broadcom Corporation Sample adaptive offset (SAO) in accordance with video coding
US9305050B2 (en) * 2012-03-06 2016-04-05 Sergey F. Tolkachev Aggregator, filter and delivery system for online context dependent interaction, systems and methods
US9330090B2 (en) * 2013-01-29 2016-05-03 Microsoft Technology Licensing, Llc. Translating natural language descriptions to programs in a domain-specific language for spreadsheets
US9870422B2 (en) 2013-04-19 2018-01-16 Dropbox, Inc. Natural language search
US9710558B2 (en) 2014-07-22 2017-07-18 Bank Of America Corporation Method and apparatus for navigational searching of a website
US10754881B2 (en) 2016-02-10 2020-08-25 Refinitiv Us Organization Llc System for natural language interaction with financial data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040205044A1 (en) * 2003-04-11 2004-10-14 International Business Machines Corporation Method for storing inverted index, method for on-line updating the same and inverted index mechanism
US20090024620A1 (en) * 2005-04-08 2009-01-22 Dong Arm Kim Method and Apparatus for Providing Search Result Using Language Chain

Family Cites Families (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5309359A (en) * 1990-08-16 1994-05-03 Boris Katz Method and apparatus for generating and utlizing annotations to facilitate computer text retrieval
US5953723A (en) * 1993-04-02 1999-09-14 T.M. Patents, L.P. System and method for compressing inverted index files in document search/retrieval system
EP0834139A4 (en) * 1995-06-07 1998-08-05 Int Language Engineering Corp Machine assisted translation tools
US5963940A (en) * 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
US5802495A (en) * 1996-03-01 1998-09-01 Goltra; Peter Phrasing structure for the narrative display of findings
US5778373A (en) * 1996-07-15 1998-07-07 At&T Corp Integration of an information server database schema by generating a translation map from exemplary files
US6081774A (en) * 1997-08-22 2000-06-27 Novell, Inc. Natural language information retrieval system and method
US6094649A (en) * 1997-12-22 2000-07-25 Partnet, Inc. Keyword searches of structured databases
KR100285265B1 (en) * 1998-02-25 2001-04-02 윤덕용 Db management system and inverted index storage structure using sub-index and large-capacity object
US6393428B1 (en) * 1998-07-13 2002-05-21 Microsoft Corporation Natural language information retrieval system
US7818232B1 (en) * 1999-02-23 2010-10-19 Microsoft Corporation System and method for providing automated investment alerts from multiple data sources
US6601026B2 (en) * 1999-09-17 2003-07-29 Discern Communications, Inc. Information retrieval by natural language querying
US20020010574A1 (en) * 2000-04-20 2002-01-24 Valery Tsourikov Natural language processing and query driven information retrieval
US6778951B1 (en) * 2000-08-09 2004-08-17 Concerto Software, Inc. Information retrieval method with natural language interface
AU2001293596A1 (en) * 2000-09-29 2002-04-08 Gavagai Technology Incorporated A method and system for adapting synonym resources to specific domains
US9009590B2 (en) * 2001-07-31 2015-04-14 Invention Machines Corporation Semantic processor for recognition of cause-effect relations in natural language documents
US8799776B2 (en) * 2001-07-31 2014-08-05 Invention Machine Corporation Semantic processor for recognition of whole-part relations in natural language documents
US7283951B2 (en) * 2001-08-14 2007-10-16 Insightful Corporation Method and system for enhanced data searching
US7398201B2 (en) * 2001-08-14 2008-07-08 Evri Inc. Method and system for enhanced data searching
US7403938B2 (en) * 2001-09-24 2008-07-22 Iac Search & Media, Inc. Natural language query processing
US7324990B2 (en) * 2002-02-07 2008-01-29 The Relegence Corporation Real time relevancy determination system and a method for calculating relevancy of real time information
AU2003245506A1 (en) * 2002-06-13 2003-12-31 Mark Logic Corporation Parent-child query indexing for xml databases
US7039625B2 (en) * 2002-11-22 2006-05-02 International Business Machines Corporation International information search and delivery system providing search results personalized to a particular natural language
GB0228941D0 (en) * 2002-12-12 2003-01-15 Ibm Methods, apparatus and computer programs for processing alerts and auditing in a publish/subscribe system
US7143026B2 (en) * 2002-12-12 2006-11-28 International Business Machines Corporation Generating rules to convert HTML tables to prose
US20040158799A1 (en) * 2003-02-07 2004-08-12 Breuel Thomas M. Information extraction from html documents by structural matching
US10332416B2 (en) * 2003-04-10 2019-06-25 Educational Testing Service Automated test item generation system and method
US7257585B2 (en) * 2003-07-02 2007-08-14 Vibrant Media Limited Method and system for augmenting web content
US7376552B2 (en) * 2003-08-12 2008-05-20 Wall Street On Demand Text generator with an automated decision tree for creating text based on changing input data
JP2005182280A (en) * 2003-12-17 2005-07-07 Ibm Japan Ltd Information retrieval system, retrieval result processing system, information retrieval method, and program
JP3790825B2 (en) * 2004-01-30 2006-06-28 独立行政法人情報通信研究機構 Text generator for other languages
US20060010172A1 (en) * 2004-07-07 2006-01-12 Irene Grigoriadis System and method for generating text
US7930169B2 (en) * 2005-01-14 2011-04-19 Classified Ventures, Llc Methods and systems for generating natural language descriptions from data
US7792829B2 (en) * 2005-01-28 2010-09-07 Microsoft Corporation Table querying
JP4581962B2 (en) * 2005-10-27 2010-11-17 株式会社日立製作所 Information retrieval system, index management method and program
US20070169021A1 (en) * 2005-11-01 2007-07-19 Siemens Medical Solutions Health Services Corporation Report Generation System
US8024653B2 (en) * 2005-11-14 2011-09-20 Make Sence, Inc. Techniques for creating computer generated notes
US20070150520A1 (en) * 2005-12-08 2007-06-28 Microsoft Corporation User defined event rules for aggregate fields
JP4956757B2 (en) * 2006-03-15 2012-06-20 国立大学法人大阪大学 Formula description structured language object search system and search method
US8024329B1 (en) * 2006-06-01 2011-09-20 Monster Worldwide, Inc. Using inverted indexes for contextual personalized information retrieval
US20100153213A1 (en) * 2006-08-24 2010-06-17 Kevin Pomplun Systems and Methods for Dynamic Content Selection and Distribution
US8095538B2 (en) * 2006-11-20 2012-01-10 Funnelback Pty Ltd Annotation index system and method
US7765216B2 (en) * 2007-06-15 2010-07-27 Microsoft Corporation Multidimensional analysis tool for high dimensional data
US8707166B2 (en) * 2008-02-29 2014-04-22 Sap Ag Plain text formatting of data item tables
KR100905434B1 (en) * 2008-08-08 2009-07-02 (주)이스트소프트 File uploading method with function of abstracting index-information in real-time and web-storage system using the same
JP5135272B2 (en) * 2009-03-24 2013-02-06 株式会社東芝 Structured document management apparatus and method
US8229952B2 (en) * 2009-05-11 2012-07-24 Business Objects Software Limited Generation of logical database schema representation based on symbolic business intelligence query
KR101667232B1 (en) * 2010-04-12 2016-10-19 삼성전자주식회사 Semantic based searching apparatus and semantic based searching method and server for providing semantic based metadata and method for operating thereof
US8527518B2 (en) * 2010-12-16 2013-09-03 Sap Ag Inverted indexes with multiple language support

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2729886A4 *

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10282878B2 (en) 2012-08-30 2019-05-07 Arria Data2Text Limited Method and apparatus for annotating a graphical output
US9640045B2 (en) 2012-08-30 2017-05-02 Arria Data2Text Limited Method and apparatus for alert validation
US10504338B2 (en) 2012-08-30 2019-12-10 Arria Data2Text Limited Method and apparatus for alert validation
US10769380B2 (en) 2012-08-30 2020-09-08 Arria Data2Text Limited Method and apparatus for situational analysis text generation
US10467333B2 (en) 2012-08-30 2019-11-05 Arria Data2Text Limited Method and apparatus for updating a previously generated text
US10026274B2 (en) 2012-08-30 2018-07-17 Arria Data2Text Limited Method and apparatus for alert validation
US10963628B2 (en) 2012-08-30 2021-03-30 Arria Data2Text Limited Method and apparatus for updating a previously generated text
US10839580B2 (en) 2012-08-30 2020-11-17 Arria Data2Text Limited Method and apparatus for annotating a graphical output
US10565308B2 (en) 2012-08-30 2020-02-18 Arria Data2Text Limited Method and apparatus for configurable microplanning
US9600471B2 (en) 2012-11-02 2017-03-21 Arria Data2Text Limited Method and apparatus for aggregating with information generalization
US10216728B2 (en) 2012-11-02 2019-02-26 Arria Data2Text Limited Method and apparatus for aggregating with information generalization
US10853584B2 (en) 2012-11-16 2020-12-01 Arria Data2Text Limited Method and apparatus for expressing time in an output text
US10311145B2 (en) 2012-11-16 2019-06-04 Arria Data2Text Limited Method and apparatus for expressing time in an output text
US11176214B2 (en) 2012-11-16 2021-11-16 Arria Data2Text Limited Method and apparatus for spatial descriptions in an output text
US11580308B2 (en) 2012-11-16 2023-02-14 Arria Data2Text Limited Method and apparatus for expressing time in an output text
US9904676B2 (en) 2012-11-16 2018-02-27 Arria Data2Text Limited Method and apparatus for expressing time in an output text
US10860810B2 (en) 2012-12-27 2020-12-08 Arria Data2Text Limited Method and apparatus for motion description
US10803599B2 (en) 2012-12-27 2020-10-13 Arria Data2Text Limited Method and apparatus for motion detection
US9990360B2 (en) 2012-12-27 2018-06-05 Arria Data2Text Limited Method and apparatus for motion description
US10115202B2 (en) 2012-12-27 2018-10-30 Arria Data2Text Limited Method and apparatus for motion detection
US10776561B2 (en) 2013-01-15 2020-09-15 Arria Data2Text Limited Method and apparatus for generating a linguistic representation of raw input data
US9946711B2 (en) 2013-08-29 2018-04-17 Arria Data2Text Limited Text generation from correlated alerts
US10255252B2 (en) 2013-09-16 2019-04-09 Arria Data2Text Limited Method and apparatus for interactive reports
US11144709B2 (en) 2013-09-16 2021-10-12 Arria Data2Text Limited Method and apparatus for interactive reports
US10282422B2 (en) 2013-09-16 2019-05-07 Arria Data2Text Limited Method, apparatus, and computer program product for user-directed reporting
US10860812B2 (en) 2013-09-16 2020-12-08 Arria Data2Text Limited Method, apparatus, and computer program product for user-directed reporting
US10664558B2 (en) 2014-04-18 2020-05-26 Arria Data2Text Limited Method and apparatus for document planning
US10853586B2 (en) 2016-08-31 2020-12-01 Arria Data2Text Limited Method and apparatus for lightweight multilingual natural language realizer
US10445432B1 (en) 2016-08-31 2019-10-15 Arria Data2Text Limited Method and apparatus for lightweight multilingual natural language realizer
US10963650B2 (en) 2016-10-31 2021-03-30 Arria Data2Text Limited Method and apparatus for natural language document orchestrator
US10467347B1 (en) 2016-10-31 2019-11-05 Arria Data2Text Limited Method and apparatus for natural language document orchestrator
US11727222B2 (en) 2016-10-31 2023-08-15 Arria Data2Text Limited Method and apparatus for natural language document orchestrator
US11061979B2 (en) 2017-01-05 2021-07-13 International Business Machines Corporation Website domain specific search
US11144606B2 (en) 2017-01-23 2021-10-12 International Business Machines Corporation Utilizing online content to suggest item attribute importance
US10528633B2 (en) 2017-01-23 2020-01-07 International Business Machines Corporation Utilizing online content to suggest item attribute importance
US10747795B2 (en) 2018-01-11 2020-08-18 International Business Machines Corporation Cognitive retrieve and rank search improvements using natural language for product attributes

Also Published As

Publication number Publication date
US20130013616A1 (en) 2013-01-10
EP2729886A1 (en) 2014-05-14
EP2729886A4 (en) 2015-07-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12811026

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2012811026

Country of ref document: EP