US20230385321A1 - Systems and methods for processing a natural language query in data tables - Google Patents

Info

Publication number
US20230385321A1
Authority
US
United States
Prior art keywords
formula
query
natural language
user
language query
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/227,450
Inventor
Nikunj Agrawal
Mukund Sundararajan
Shrikant Ravindra Shanbhag
Kedar Dhamdhere
Garima
Kevin Snow McCurley
Rohit Ananthakrishna
Daniel Adam Gundrum
Juyun June Song
Rifat Ralfi Nahmias
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Application filed by Google LLC
Priority to US18/227,450
Publication of US20230385321A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24522 Translation of natural language queries to structured queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query
    • G06F16/3326 Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3337 Translation of the query language, e.g. Chinese to English
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/20 Drawing from basic elements, e.g. lines or circles
    • G06T11/206 Drawing of charts or graphs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Definitions

  • a spreadsheet is a data document that includes one or more data tables storing data under different categories.
  • the spreadsheet can perform calculation functions.
  • the user can construct a database search query to look for the desired data.
  • the user may use available data from the spreadsheet to derive the desired data.
  • the user may review the spreadsheet and identify relevant data entries in the spreadsheet, and then compile a formula using the calculation function associated with the spreadsheet to calculate the result. For example, when the spreadsheet records a test score for each student in a class, a user may want to know the average score of the class.
  • the user may need to compile a formula by summing up the test scores and then dividing by the number of students to obtain the average score of the class.
  • the data table may then calculate the average score of the class based on the compiled formula.
  • the user may need to manually compile a formula and input it into the data table for calculation, which may be inefficient when processing a large amount of data, and also requires a high level of knowledge of database operations from the user.
  • a query term is obtained from a natural language query.
  • a table summary is prepared based on the query term and a formula of the natural language query is generated based on the table summary.
  • a result is generated based on the formula. Responsive to receiving negative feedback to the result, one or more of the table summary or the formula is disassociated from the query term.
  • a natural language query may be originated by a user via a user interface.
  • the natural language query may be parsed to obtain a query term, and a grid range may be identified in a data table as relevant to the query term.
  • a table summary may be prepared including a plurality of data entities based on the grid range.
  • a logic operation may then be determined to apply on the plurality of data entities to derive the query term.
  • the logic operation may then be translated into a formula executable on the data table, and the formula is applied on the data table to generate a result in response to the natural language query.
  • the natural language query may be parsed to obtain a query term, and a grid range may be identified in a data table as relevant to the query term.
  • a table summary may be prepared by extracting a plurality of characteristics from the grid range in the data table.
  • a logic operation may be determined to apply on the plurality of characteristics to derive a result corresponding to the query term.
  • the logic operation may be translated into a formula executable on the data in the data table, and the formula may be applied to the data in the data table to generate the result in response to the natural language query. Responsive to receiving negative feedback to the result, the negative feedback may be automatically processed and a new table summary may be prepared or a new logic operation may be determined based on the query term.
  • An alternative interpretation of the natural language query may be determined based on at least one of the new table summary or the new logic operation, and an alternative result may be generated based on the alternative interpretation.
  • the natural language query is submitted by the user via a user interface at a client device, and is manually or vocally entered by the user.
  • the natural language query may be received at a server from a client device via a hypertext transfer protocol (HTTP) post request.
  • the natural language query may be originated in a first language (e.g., non-English, etc.) and may then be translated into a second language, e.g., English, for processing.
  • a grid range is identified at a client device when the data table is stored at the client device, or at a server after receiving the natural language query at the server when the data table is stored at the server.
  • the data table includes any data table stored at a client device, a remote server, or a cloud.
  • the plurality of data entities include any of dimensions, dimension filters and metrics.
  • the result is presented to the user via a visualization format including any of an answer statement, a chart, or a data plot.
  • the formula may be stored in association with the natural language query or the query term when the user feedback is positive.
  • translating the logic operation into the formula includes selecting a formula build operation from a plurality of formula build operations based on the logic operation and executing the formula build operation to generate the formula based on the table summary and the logic operation.
  • the formula build operation may correspond to a type of formula.
  • the result may be translated from a second language (e.g., English) into a first language (e.g., non-English) when the natural language query is received in the first respective language from the user.
  • FIG. 1 is a block diagram of a computerized system 100 for natural language query processing, according to an illustrative embodiment;
  • FIGS. 2A-2B provide an example logic flow diagram illustrating aspects of processing a natural language query in a data spreadsheet, according to some embodiments described herein;
  • FIG. 3 provides a block diagram illustrating example aspects of data flows between various components at the client side and the server side to process a natural language query, according to some embodiments described herein;
  • FIG. 4 provides an example user interface diagram illustrating aspects of the answer panel (e.g., 302 in FIG. 3 ), according to some embodiments described herein;
  • FIG. 5 is a block diagram of a computing device, such as any of the components of the system of FIG. 1 , for performing any of the processes described herein.
  • the computerized systems described herein may comprise one or more engines, which include a processing device or devices, such as a computer, microprocessor, logic device or other device or processor that is configured with hardware, firmware, and software to carry out one or more of the computerized methods described herein.
  • Systems and methods for processing a natural language query allow a user to enter a query for data in natural language.
  • the natural language query may be translated into a structured database query.
  • when the structured database query indicates that the data is not readily available in the data table, existing data entries that may be relevant to generating the desired data may be identified in the data table, and a formula may be automatically compiled to derive the desired data based on the available data entries.
  • for example, when the data source includes a spreadsheet that records a test score for each student in a class, a user may input a natural language query “what is the average score of the class?”
  • the natural language query may be interpreted and parsed by extracting terms from the query, such as “what,” “is,” “the,” “average,” “score,” “of,” “the,” and “class.”
  • the term “average score” may be identified as a key term of the query based on previously stored key terms that are commonly used. It may then be determined that no data entry is available in the spreadsheet corresponding to the data category “average score,” e.g., no column header corresponds to “average score.” Logic may then be identified to derive an “average score” from the existing data entries.
  • an “average score” may be calculated by summing up all the test scores in the class and dividing the sum by the total number of students.
  • a formula may then be automatically generated to calculate the “average score” and output the calculation result to the user in response to the natural language query.
  • the generated formula may be stored in association with a tag “average score” such that even when the spreadsheet is updated with more data entries, e.g., with new test scores associated with additional students, the formula may still be applicable to automatically calculate an average score of the class, in response to the natural language query.
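The average-score example above can be sketched as follows. This is a minimal illustration, not the method of the application: the function name `build_average_formula`, the `formula_by_tag` store, the column letter "B", and the open-ended range notation (as accepted by some spreadsheet applications, e.g. `B2:B`) are all assumptions made for the sketch.

```python
from typing import Optional


def build_average_formula(column: str, first_row: int,
                          last_row: Optional[int] = None) -> str:
    """Build a formula that sums the scores and divides by their count.

    When last_row is omitted, an open-ended range such as "B2:B" is used
    (an assumption about the target spreadsheet's range syntax) so that
    rows appended later are still covered by the stored formula.
    """
    end = f"{column}{last_row}" if last_row is not None else column
    cell_range = f"{column}{first_row}:{end}"
    return f"=SUM({cell_range})/COUNT({cell_range})"


# Hypothetical store associating a tag with its generated formula.
formula_by_tag = {"average score": build_average_formula("B", 2)}

fixed_range_formula = build_average_formula("B", 2, 31)
```

Storing the open-ended variant under the tag is one way to keep the formula applicable after new test scores are added, as the passage describes.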
  • the platform may help the users to generate structured queries or even formulas.
  • FIG. 1 is a block diagram of a computerized system 100 for natural language query processing, according to an illustrative implementation.
  • the system 100 includes a server 104 , two remote databases 114 a - 114 b (generally referred to as remote database 114 ), user devices 108 a - b (generally referred to as user device 108 ), and/or other related entities that communicate with one another over a network 101 .
  • the user devices 108 a and 108 b contain user interfaces 110 a and 110 b (generally referred to as user interface 110 ) respectively.
  • Each user device 108 includes a device such as a personal computer, a laptop computer, a tablet, a smartphone, a personal digital assistant, or any other suitable type of computer or communication device. Users at the user device 108 access and receive information from the server 104 and remote databases 114 over the network 101.
  • the user device 108 may include components, such as an input device and an output device.
  • a user may operate the user device 108 to input a natural language query via the user interface 110 , and the processor 112 a - b (generally processor 112 ) may process the natural language query.
  • the user device 108 may process the natural language query locally and search within a local database.
  • the user device 108 may send the natural language query to a remote server 104 , which may store data tables 106 and use a processor 102 to analyze the natural language query.
  • the server 104 may provide updates and may access remote databases 114 a - b for a data query.
  • when a natural language query is received at the user device 108, the database query may be performed locally at the user device 108, at the data tables 106 stored at the server 104, or at the remote databases 114 (e.g., cloud, etc.).
  • the user device 108 may have a locally installed spreadsheet application for a user to review data and enter a natural language query.
  • alternatively, such a spreadsheet application may not be installed at the user device 108, and a user may access a spreadsheet or a data table stored at the server 104 via a remote access component within a browser application or a mobile application.
  • FIGS. 2 A- 2 B provide an example logic flow diagram illustrating aspects of processing a natural language query in a data spreadsheet, according to some implementations described herein.
  • a natural language query may be received via a user interface, e.g., at the user interface 110 of a user device 108 in FIG. 1 .
  • the natural language query may be a question entered by a user such as “how's the growth of monthly total sales,” “what is the average score of MATH 301 ,” and/or the like.
  • the natural language query may be manually typed in by a user via an input device, or articulated by the user via a microphone and captured by the user device.
  • the natural language query may also be automatically generated by an analytics application and passed through to the server via an application programming interface (API) from another program.
  • a business analytics software may automatically generate a list of business analytics questions such as “how's the growth of monthly total sales” in a natural language, and the question may be automatically sent to the server.
  • the natural language query may originate in any of a variety of natural languages, and may be translated into a language compatible with the platform (e.g., the operating system, the natural language query processing tool, and/or the like), such as English.
  • the natural language query may optionally be parsed to extract key terms and a query string may be generated.
  • the parsing may be performed at the user device.
  • the server may receive a parse request over Hypertext Transfer Protocol (HTTP) from the user device.
  • the server may send a request to an analytics module (e.g., see 305 in FIG. 3 ).
  • Words such as “monthly growth” and “sales” may be identified as query terms based on previously stored query term rules and/or heuristics from previously processed queries.
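Matching a query against previously stored key terms can be sketched as below. The stored term list, the function name `extract_query_terms`, and the simple whole-word phrase matching are assumptions for illustration; the application does not specify the rules or heuristics actually used.

```python
import re

# Hypothetical set of previously stored key terms.
STORED_KEY_TERMS = ("average score", "monthly growth", "sales")


def extract_query_terms(query, stored_terms=STORED_KEY_TERMS):
    """Return the stored key terms that occur as whole phrases in the query."""
    # Lowercase and tokenize, then rejoin so phrases can be matched on word
    # boundaries regardless of punctuation in the original question.
    normalized = " ".join(re.findall(r"[a-z0-9']+", query.lower()))
    padded = f" {normalized} "
    return [term for term in stored_terms if f" {term} " in padded]


class_terms = extract_query_terms("What is the average score of the class?")
sales_terms = extract_query_terms("how's the monthly growth of sales")
```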
  • the query string may optionally be sent to the server.
  • the natural language query may be processed within one or more spreadsheets that are locally stored on the user device.
  • one or more data tables or spreadsheets, or a grid range of a spreadsheet, may be identified as relevant to the query string, e.g., by a table detection module (see 307 in FIG. 3).
  • natural language key terms from the query string may be used to identify relevant data tables/spreadsheets.
  • for key terms such as “growth,” “monthly,” and “sales,” data tables/spreadsheets that have a column or a row recording monthly sales may be identified.
  • data tables/spreadsheet can also be identified based on previously used data tables for similar query strings, e.g., when a natural language query “what is the monthly distribution of sales” identified a certain data table, the same data table may be identified for the query “how's the growth of monthly total sales” as well.
  • the selected range of cells from the data table may be flipped in orientation if necessary.
  • the user may manually select the cells by selecting a single cell or a range of cells that may belong to a table.
  • the cells surrounding the selection are analyzed for possible table structures.
  • a table schema may be generated based on the selected range of cells.
  • several table schemas may be sent in a batch request to the server.
  • the user device may only send the grid range of the detected table (for chart recommendations), and the server may determine a table structure from the sent grid range.
  • the server may prepare a table summary by extracting the dimensions, columns, rows, metrics, dimension filters, and/or other characteristics of the detected data table, and map the extracted table characteristics to cell ranges or addresses in a spreadsheet.
  • the table summary may include the number and index of rows and columns, the corresponding value in a cell identified by the row and column number, the metric of the value, and/or the like.
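A table summary that maps table characteristics to cell ranges might look like the following sketch. The dictionary layout, the function names, and the A1-style addressing are assumptions for illustration; the sample grid is made-up placeholder data.

```python
def column_letter(index):
    """Convert a 0-based column index to A1-style letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def summarize_table(grid, top_row=1, left_col=0):
    """Summarize a detected table whose first row holds the column headers.

    top_row is the 1-based sheet row of the header row; left_col is the
    0-based sheet column of the first table column.
    """
    headers, data_rows = grid[0], grid[1:]
    first, last = top_row + 1, top_row + len(data_rows)
    ranges = {}
    for offset, header in enumerate(headers):
        col = column_letter(left_col + offset)
        ranges[header] = f"{col}{first}:{col}{last}"
    return {
        "row_count": len(data_rows),
        "column_count": len(headers),
        "headers": list(headers),
        "ranges": ranges,
    }


# Placeholder data for illustration only.
summary = summarize_table([["Student", "Score"], ["Ann", 90], ["Bob", 80], ["Cy", 70]])
```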
  • the server may extract operations to be applied to the data table, and translate the operations into one or more formulas executable on the data table. Further details of the formula building embodiments may be found in connection with FIG. 3 .
  • the server may send the formula(s) back to the user device, and the formula(s) may be applied on the detected data table to generate a result in response to the natural language query.
  • the generated result may be presented via different visualizations, such as, but not limited to, a pie chart, a data plot, and/or the like.
  • the user may provide feedback on the result. For example, the user may provide a positive rating if the result is accurate. Or, the user may submit a negative rating when the result is unsatisfactory, e.g., misinterpreting the question, insufficient data, etc.
  • the server may save the formula building objects such as the table summary, formula(s) associated with the query string, for machine learning purposes at 210 , so that the formula may be reused, or used as a reference when similar questions are received.
  • the server may disassociate the formula building objects with the question at 211 , so that when similar questions are received, such questions are not to be interpreted in the same way.
  • the server may optionally obtain further information from the user feedback on the result. For example, if the user asks “how's the monthly growth of sales,” and a result of the monthly increase from last month to the current month is provided but the user submits negative feedback, the user interface may prompt the user to provide further information. The user may be prompted to re-enter the question with a time period “how's the monthly growth of sales from ______ to ______” Or the user interface may prompt the user to confirm whether the identified data entities “monthly growth” and “sales” are accurate. As another example, the user interface may provide suggested queries to the user, if the server fails to parse and identify what the natural language query is. Other additional methods may be employed for the user to provide further detailed feedback to refine the question.
  • the server may provide an alternative interpretation of the query string based on information obtained at 212 , and may generate an alternative formula using the alternative table summary at 214 . Then the server may proceed at 207 to provide the updated result to the user.
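The feedback handling described above (saving the formula building objects on positive feedback, disassociating them on negative feedback) can be sketched as a small in-memory store. The class name `FormulaMemory`, its methods, and the normalization of the query string are hypothetical; the application does not describe the storage structure.

```python
class FormulaMemory:
    """Associates query strings with (table summary, formula) pairs."""

    def __init__(self):
        self._by_query = {}

    @staticmethod
    def _key(query):
        # Assumption: queries are matched case-insensitively after trimming.
        return query.strip().lower()

    def record_feedback(self, query, table_summary, formula, positive):
        if positive:
            # Positive rating: keep the objects so the formula can be reused.
            self._by_query[self._key(query)] = (table_summary, formula)
        else:
            # Negative rating: disassociate, so a similar question is not
            # interpreted the same way again.
            self._by_query.pop(self._key(query), None)

    def lookup(self, query):
        return self._by_query.get(self._key(query))


memory = FormulaMemory()
memory.record_feedback("Average score?", {"range": "B2:B31"}, "=AVERAGE(B2:B31)", True)
reused = memory.lookup("  average score? ")
memory.record_feedback("Average score?", {"range": "B2:B31"}, "=AVERAGE(B2:B31)", False)
after_negative = memory.lookup("average score?")
```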
  • FIG. 3 provides a block diagram illustrating example aspects of data flows between various components at the client side and the server side to process a natural language query, according to some embodiments described herein.
  • a user interface may present an answer panel 302 (e.g., see 401 - 402 in FIG. 4 ), which may post a query request 321 to a backend server 300 (e.g., the server 104 in FIG. 1 ).
  • the query request 321 may include a query string (e.g., the question asked by a user, or key terms extracted from the original natural language question asked by the user, etc.), a list of data entities (e.g., table schema generated based on key terms from the query string, etc.), a grid range from an existing data table, and/or the like.
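As a rough sketch, a query request carrying these fields could be assembled as an HTTP POST as shown below. The endpoint URL, the JSON encoding, and the field names `query`, `entities`, and `gridRange` are assumptions; the application does not define a wire format. The request object is only constructed here, not sent.

```python
import json
import urllib.request


def build_query_request(url, query_string, data_entities, grid_range):
    """Build (but do not send) a POST request carrying the query parameters."""
    payload = {
        "query": query_string,       # question or extracted key terms
        "entities": data_entities,   # list of data entities / table schema
        "gridRange": grid_range,     # grid range of the detected table
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint; "example.invalid" never resolves.
request = build_query_request(
    "https://example.invalid/get-answer",
    "how's the monthly growth of sales",
    ["month", "sales"],
    "A1:B13",
)
```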
  • the backend server 300 may be operated under a Java environment, and may pass the query request 321 to be processed at a series of modules such as but not limited to a get-answer action module 303 , an entity list extractor 304 , an analytics module 305 , a query interpreter 306 , a table detector 307 , and/or the like.
  • the get-answer action module 303 may act as a communication interface that receives the client request 321 , which may include query parameters such as a query string (e.g., question asked by user, etc.), a grid range of the data table detected in and around cell selection, and/or the like. If the request 321 has reached the server, the grid range may contain a constructed table. On the other hand, if no data table is detected or the selected grid range does not contain any data, the answer panel interface 302 may not be presented to a user at the beginning.
  • the get-answer action module 303 may send the grid range information 322 to the entity list extractor 304 to get a table view of the data entity list based on the grid range information, e.g., a sub-table having columns and rows defining relevant data entities.
  • the entity list extractor 304 may construct a table schema, e.g., a data entity list including data entities relevant to the query.
  • the entity list extractor 304 may obtain a table summary 324 (e.g., including column headers, data types, labels column, cell metrics, and/or other properties) from the table detector 307 .
  • the entity list extractor 304 may also build a typed table 323 from the grid range and pass it on to the table detector 307 for summarization.
  • the entity list extractor 304 may provide a table view that is an object representation of the data entity list.
  • the entity list may be represented in a data structure as a per-table knowledge graph, represented by graph nodes such as but not limited to dimensions, dimension filters, metrics, and/or the like.
  • Dimensions may include the header of a column whose values act as row keys (or labels) into the table. For example, “Country” will be a dimension in a table with country names as labels (or row keys).
  • Dimension filters may include values in the dimension column (row keys/label column). For example, “India”, “USA” are the dimension filters for the dimension “Country.”
  • Metrics may include all number columns taken as metrics or column values.
  • a user may look for a metric for a particular dimension filter (or label). For example, in the string “Population of India,” “Population” is identified as a metric and “India” is identified as a dimension filter for the dimension “Country.”
  • the entity list extractor 304 may provide an entity list table view 325 to the get-answer action module 303 .
  • the entity list table view 325 may be generated by extracting metrics, dimensions and dimension filters from the table summary 324 .
  • for example, numeric column headers may be treated as metrics (e.g., a column header “population” is a metric, as in the above example), all string and date/time column headers may be treated as dimensions (e.g., a column header “country,” a text string, is a dimension), and the values in these dimension columns may be treated as dimension filters (e.g., values under the column header “country” such as “U.S.A.,” “India,” etc., are dimension filters).
  • Other determination of the metrics, dimensions and dimension filters can be applied.
  • the entity list table view 325 may serve to reverse lookup row and column indices given a dimension, metric or dimension filter string, which may be used to map parameters such as dimensions, metrics, dimension filters back to the grid row and column indices during formula construction.
  • the entity list table view 325 may provide a metrics-to-column number map, a dimensions-to-column number map, and a dimension-filters-to-row-and-column pair map.
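The three reverse-lookup maps can be sketched as below. This assumes a grid whose first row holds headers and treats fully numeric columns as metrics and all other columns as dimensions, following the rule stated above; the function names and the sample values are placeholders for illustration, not real figures.

```python
def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def build_entity_view(grid):
    """Return (metrics, dimensions, dimension_filters) lookup maps.

    metrics:           header -> column index
    dimensions:        header -> column index
    dimension_filters: label  -> (row index, column index)
    Row index 0 is the header row, so data rows start at 1.
    """
    headers, rows = grid[0], grid[1:]
    metrics, dimensions, dimension_filters = {}, {}, {}
    for col, header in enumerate(headers):
        values = [row[col] for row in rows]
        if all(is_number(v) for v in values):
            metrics[header.lower()] = col
        else:
            dimensions[header.lower()] = col
            for row_index, value in enumerate(values, start=1):
                dimension_filters[str(value).lower()] = (row_index, col)
    return metrics, dimensions, dimension_filters


def resolve_cell(metric, dimension_filter, metrics, dimension_filters):
    """Map e.g. ("population", "india") back to a (row, column) grid position."""
    row, _ = dimension_filters[dimension_filter.lower()]
    return row, metrics[metric.lower()]


# Placeholder numbers, not real population data.
grid = [["Country", "Population"], ["India", 1400], ["USA", 330]]
metrics, dimensions, dimension_filters = build_entity_view(grid)
cell = resolve_cell("Population", "India", metrics, dimension_filters)
```

Here `cell` identifies the grid position a formula builder would reference for the string “Population of India.”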
  • the table detector 307 may extract information from a data table and generate a table summary 324 , which may be used to determine what entities in the table can be used to generate a formula to derive the query term.
  • Tables can be generally represented as a common object, which stores the data in a table, the column headers and types of data in the columns, and derived facts about the data.
  • the table detector 307 may extract information from a data table in several steps. First, light parsing of cells and inference of column headers and data types may be performed. For cells having numeric values between 1900-2100, the cells may be interpreted as years, instead of pure numeric values. The table detector 307 may then filter out spurious rows and columns, including but not limited to empty rows/columns, numeric columns with ID numbers, columns for taking notes, and/or the like.
  • the table detector 307 may then add column-based statistics. For example, for all column types, the number of missing or distinct values may be recorded. For numeric columns, the number of negative/positive/floats/zero values, as well as the sum, standard deviation, monotonicity and uniformity of the array may be recorded. For string columns, the ratio of numeric to non-numeric characters, an average string length, and a maximum string length may be recorded.
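The light parsing and column statistics steps might be sketched as follows. The type labels, the dictionary keys, the use of population standard deviation, and the decision to count non-numeric cells as missing are assumptions made for this illustration.

```python
import statistics


def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def infer_cell_type(value):
    """Light parsing: whole numbers from 1900 to 2100 are read as years."""
    if value is None or value == "":
        return "empty"
    if is_number(value):
        if float(value).is_integer() and 1900 <= value <= 2100:
            return "year"
        return "number"
    return "string"


def numeric_column_stats(values):
    """Collect per-column statistics for a numeric column."""
    present = [v for v in values if is_number(v)]
    pairs = list(zip(present, present[1:]))
    nondecreasing = all(a <= b for a, b in pairs)
    nonincreasing = all(a >= b for a, b in pairs)
    return {
        "missing": len(values) - len(present),  # non-numeric cells count as missing
        "distinct": len(set(present)),
        "negative": sum(v < 0 for v in present),
        "positive": sum(v > 0 for v in present),
        "zero": sum(v == 0 for v in present),
        "floats": sum(isinstance(v, float) for v in present),
        "sum": sum(present),
        "std_dev": statistics.pstdev(present) if present else 0.0,
        "monotonic": nondecreasing or nonincreasing,
    }


stats = numeric_column_stats([1, 2.5, 0, -3, None])
```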
  • the table object created from the input table cell values from the data table 323 may then be used to create an aggregate table.
  • Each column in the aggregate table may be inspected to determine a number of unique values as compared to the number of total values (e.g., the range of data values). If the column is categorical (e.g., when the unique values in the column are a subset of the entire spectrum of data values), then the column may be used to create an aggregated table.
  • for each categorical column, two aggregated objects may be created in association with the column.
  • a new “count” aggregated object may be created to record information relating to the “count” of a unique value.
  • each row of the object may represent a unique value, and in each row, the first cell stores the unique value and the second cell records the number of times that the respective unique value appears in the original categorical column.
  • a new “sum” aggregated object may be created to record the total sum of each unique value in the original table.
  • each row of the object represents a unique value
  • each column of the object represents a categorical numeric column in the original table 323 .
  • the value in each cell of the object represents a sum of unique values of all cells in the respective categorical column that contain the respective unique value (based on the respective row of the object).
  • Table 1 (Example Data Grid):

        Yes   1
        Yes   3
        No    2
        Yes   5
        No    3
        No    2

    Instead of charting or responding with the raw data grid above, the first column may be pivoted or grouped and the second column may be summed per distinct entry in the first column, so that Table 1 can be recorded as “Yes, 9; No, 7.” Alternatively, the count of each repeated entry “Yes” or “No” may be recorded, such that Table 1 can be recorded as “Yes, 3; No, 3.”
  • the “count” and “sum” object may be example objects for aggregation.
  • average aggregation objects may be created, e.g., using an average value of the “count” or “sum.”
  • the objects recording the count and sum of each unique value may be used to carry information of the original data table 323 .
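The “count” and “sum” aggregation of the example data grid can be reproduced with the short sketch below. The function name `aggregate` and the dictionary representation of the aggregated objects are assumptions for illustration; the averages are derived here as sum divided by count.

```python
def aggregate(rows):
    """Group rows by the categorical first column; count and sum the second."""
    counts, sums = {}, {}
    for label, value in rows:
        counts[label] = counts.get(label, 0) + 1
        sums[label] = sums.get(label, 0) + value
    return counts, sums


# The example data grid from the text.
example_grid = [("Yes", 1), ("Yes", 3), ("No", 2), ("Yes", 5), ("No", 3), ("No", 2)]

counts, sums = aggregate(example_grid)
averages = {label: sums[label] / counts[label] for label in counts}
```

With the example grid, `sums` holds 9 for “Yes” and 7 for “No,” and `counts` holds 3 for each, matching the values given in the text.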
  • the get-answer action module 303 may also send a parse request 326 including data entity list information and query information to the analytics module 305 , which may generate a parse response 327 .
  • the parse response 327 may include a structured data table/spreadsheet query represented as the query in the protocol buffer.
  • the query interpreter 306 may interpret returned query response 328 to an executable formula 329 string using the entity list table view passed on from the get-answer action module 303 .
  • the query interpreter 306 may include various comparable classes for formula builder, e.g., a particular formula builder may correspond to one type of formula. Here a given set and count of fields in the query 328 may correspond to only one formula, e.g., a query with exactly two metrics corresponds to a correlation formula.
  • the query interpreter 306 may invoke a variety of operations.
  • An example operation includes a query scoring operation, e.g., scoreQuery (the query in the protocol buffer), which returns a score, built simply by counting the number of fields of the input query in the protocol buffer it can consume, or returns a negative/zero score if the fields in the query in the protocol buffer are not sufficient to build a formula.
  • the scoreQuery( ) operator may return a score of two (e.g., one point for satisfying the at least one dimension requirement and one point for satisfying the at least one dimension filter requirement).
  • the score of two indicates that the parameters included in the query in the protocol buffer are sufficient for formula building.
  • a given query may have more than one formula builder that may return the same score, e.g., if another formula builder that requires just two dimension filters, the input query in the protocol buffer in the above example would also be given a score of two with this formula builder.
  • the query interpreter 306 may then run a getFormula(query in the protocol buffer, EntityListTableView) operation, based on the input of the query and the entity list table view at 328 . After determining that the query score is a positive number, the query interpreter 306 may return a formula built by joining data in the input values query in the protocol buffer and EntityListTableView.
  • the query interpreter 306 may take in a list of available (injected) formula builders, and may interpret the input query in the protocol buffer by first scoring each formula builder by the number of fields of the input query in the protocol buffer that it may consume. This may filter out the formula builders that cannot understand the input query in the protocol buffer. If there is at least one formula builder with a positive score in response to the input query in the protocol buffer, the formula builder with the highest score may be used to map a formula 329 . In this way, the formula builder that consumes the maximum number of fields from the input query in the protocol buffer can be used to construct the possible formula parses.
  • the query interpreter 306 may be structured as a class with multiple smaller formula builders plugged into it. In this way, the query interpreter structure can be expandable with additional formula builders. For example, when a different type of query is received, new formula type may be added to the formula builders without the need to change the existing formula builder.
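  • The scoring and selection of formula builders can be sketched as follows. This is an illustrative sketch only: the `Query` dataclass stands in for the query in the protocol buffer, the `view` dictionary stands in for the entity list table view, and the `CorrelationBuilder`, `SumBuilder`, and `QueryInterpreter` classes and their method names are hypothetical. `CORREL` and `SUM` are used as typical spreadsheet functions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Query:
    """Hypothetical stand-in for the structured query in the protocol buffer."""
    dimensions: List[str] = field(default_factory=list)
    dimension_filters: List[str] = field(default_factory=list)
    metrics: List[str] = field(default_factory=list)


class CorrelationBuilder:
    """Consumes exactly two metrics and builds a correlation formula."""

    def score_query(self, query: Query) -> int:
        return 2 if len(query.metrics) == 2 else 0

    def get_formula(self, query: Query, view: Dict[str, str]) -> str:
        first, second = (view[m] for m in query.metrics)
        return f"=CORREL({first}, {second})"


class SumBuilder:
    """Consumes one metric and builds a sum formula (illustrative only)."""

    def score_query(self, query: Query) -> int:
        return 1 if query.metrics else 0

    def get_formula(self, query: Query, view: Dict[str, str]) -> str:
        return f"=SUM({view[query.metrics[0]]})"


class QueryInterpreter:
    def __init__(self, builders):
        self.builders = list(builders)  # injected; new builders can be added

    def interpret(self, query: Query, view: Dict[str, str]) -> Optional[str]:
        # Score every builder, drop those that cannot consume the query,
        # then use the builder that consumes the most fields.
        scored = [(b.score_query(query), b) for b in self.builders]
        usable = [(s, b) for s, b in scored if s > 0]
        if not usable:
            return None
        _, best = max(usable, key=lambda pair: pair[0])
        return best.get_formula(query, view)


view = {"sales": "B2:B13", "visits": "C2:C13"}  # entity name -> cell range
interpreter = QueryInterpreter([SumBuilder(), CorrelationBuilder()])
print(interpreter.interpret(Query(metrics=["sales", "visits"]), view))
```

  • Because each builder is a separate class, supporting a new type of query in this sketch only requires passing one more builder to the interpreter; the existing builders are not modified.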
  • a JSON response 330 including the formula may be returned to the answer panel 302 at the frontend 301 (e.g., at the client side).
  • the answer panel 302 may then provide the formula 331 to a formula preview calculator 308 , which may in turn generate a result 332 based on the formula.
  • the answer panel 302 may then provide the result to the user at 333 .
  • FIG. 4 provides an example user interface diagram illustrating aspects of the answer panel (e.g., 302 in FIG. 3 ), according to some embodiments described herein.
  • Example mobile interfaces 401 and 402 show example mobile screens of the answer panel 302 .
  • the answer panel 302 may have an interface on a desktop computer, e.g., similar to a browser-based application.
  • a user can type a natural language question in the query box 403 , e.g., “how's the growth of monthly totals?”
  • the query box 403 may provide a suggested query in response to the user entered question, to help users better understand how to structure their own questions using the precise terms.
  • the question intake at the query box 403 may also automatically complete, or correct typographical mistakes from, the user-entered question, so that the data entities for the query can be auto-completed.
  • the query may be annotated with the same colors as the relevant sections in a spreadsheet to show how key terms in the query relate back to sections in the spreadsheet.
  • An answer may be provided at 404 , e.g., a statement containing a calculated result of the “monthly total”
  • the answer may include a human-friendly interpretation of the answer in natural language, e.g., “for every week, monthly total increases by,” and the calculated result, “$1,500.”
  • a user asks the question in a certain language (e.g., non-English)
  • the answer may correspondingly be provided in the same language.
  • the answer to the query “how's the growth of monthly totals” may take a variety of visualization formats.
  • a chart may be generated showing different data plots 407 over a period of time, such as the monthly totals, commission income, sales of product and service income, etc., as related to the query question “growth of monthly total.”
  • the answer panel may further provide analytics of the data plots at 408 .
  • the answer screen 401 or 402 may include a rating button, a “like” or “dislike” button, or a “thumbs up” or “thumbs down” button for the user to provide feedback to the answer to the original question asked.
  • FIG. 5 is a block diagram of a computing device, which could be any of the components of the system of FIG. 1 , for performing any of the processes described in FIGS. 2A-3 or providing the user interface described in FIG. 4 .
  • Each of the components of these systems may be implemented on one or more computing devices 500 .
  • a plurality of the components of these systems may be included within one computing device 500 .
  • a component and a storage device may be implemented across several computing devices 500 .
  • the computing device 500 comprises at least one communications interface unit, an input/output controller 510 , system memory, and one or more data storage devices.
  • the system memory includes at least one random access memory (RAM 502 ) and at least one read-only memory (ROM 504 ). All of these elements are in communication with a central processing unit (CPU 506 ) to facilitate the operation of the computing device 500 .
  • the computing device 500 may be configured in many different ways. For example, the computing device 500 may be a conventional standalone computer or alternatively, the functions of computing device 500 may be distributed across multiple computer systems and architectures. In FIG. 5 , the computing device 500 is linked, via network or local network, to other servers or systems.
  • the computing device 500 may be configured in a distributed architecture, wherein databases and processors are housed in separate units or locations. Some units perform primary processing functions and contain at a minimum a general controller or a processor and a system memory. In distributed architecture implementations, each of these units may be attached via the communications interface unit 508 to a communications hub or port (not shown) that serves as a primary communication link with other servers, client or user computers and other related devices.
  • the communications hub or port may have minimal processing capability itself, serving primarily as a communications router. A variety of communications protocols may be part of the system.
  • the CPU 506 comprises a processor, such as one or more conventional microprocessors and one or more supplementary co-processors such as math co-processors for offloading workload from the CPU 506 .
  • the CPU 506 is in communication with the communications interface unit 508 and the input/output controller 510 , through which the CPU 506 communicates with other devices such as other servers, user terminals, or devices.
  • the communications interface unit 508 and the input/output controller 510 may include multiple communication channels for simultaneous communication with, for example, other processors, servers or client terminals.
  • the CPU 506 is also in communication with the data storage device.
  • the data storage device may comprise an appropriate combination of magnetic, optical or semiconductor memory, and may include, for example, RAM 502 , ROM 504 , flash drive, an optical disc such as a compact disc or a hard disk or drive.
  • the CPU 506 and the data storage device each may be, for example, located entirely within a single computer or other computing device; or connected to each other by a communication medium, such as a USB port, serial port cable, a coaxial cable, an Ethernet cable, a telephone line, a radio frequency transceiver or other similar wireless or wired medium or combination of the foregoing.
  • the CPU 506 may be connected to the data storage device via the communications interface unit 508 .
  • the CPU 506 may be configured to perform one or more particular processing functions.
  • the data storage device may store, for example, (i) an operating system 512 for the computing device 500 ; (ii) one or more applications 514 (e.g., computer program code or a computer program product) adapted to direct the CPU 506 in accordance with the systems and methods described here, and particularly in accordance with the processes described in detail with regard to the CPU 506 ; or (iii) database(s) 516 adapted to store storage management information that may be utilized to manage storage information required by the program.
  • the operating system 512 and applications 514 may be stored, for example, in a compressed, an uncompiled or an encrypted format, and may include computer program code.
  • the instructions of the program may be read into a main memory of the processor from a computer-readable medium other than the data storage device, such as from the ROM 504 or from the RAM 502 . While execution of sequences of instructions in the program causes the CPU 506 to perform the process steps described herein, hard-wired circuitry may be used in place of, or in combination with, software instructions for implementation of the processes of the present disclosure. Thus, the systems and methods described are not limited to any specific combination of hardware and software.
  • Suitable computer program code may be provided for performing one or more functions in relation to any of the processes as described herein.
  • the program also may include program elements such as an operating system 512 , a database management system and “device drivers” that allow the processor to interface with computer peripheral devices (e.g., a video display, a keyboard, a computer mouse, etc.) via the input/output controller 510 .
  • Non-volatile media include, for example, optical, magnetic, or opto-magnetic disks, or integrated circuit memory, such as flash memory.
  • Volatile media include dynamic random access memory (DRAM), which typically constitutes the main memory.
  • Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM or EEPROM (electronically erasable programmable read-only memory), a FLASH-EEPROM, any other memory chip or cartridge, or any other non-transitory medium from which a computer can read.
  • Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to the CPU 506 (or any other processor of a device described herein) for execution.
  • the instructions may initially be borne on a magnetic disk of a remote computer (not shown).
  • the remote computer can load the instructions into its dynamic memory and send the instructions over an Ethernet connection, a cable line, or even a telephone line using a modem.
  • a communications device local to a computing device 100 e.g., a server
  • the system bus carries the data to main memory, from which the processor retrieves and executes the instructions.
  • the instructions received by main memory may optionally be stored in memory either before or after execution by the processor.
  • instructions may be received via a communication port as electrical, electromagnetic or optical signals, which are exemplary forms of wireless communications or data streams that carry various types of information.

Abstract

Systems and methods are disclosed herein for processing a natural language query on data tables. According to some embodiments, a query term is obtained from a natural language query. A table summary is prepared based on the query term and a formula of the natural language query is generated based on the table summary. A result is generated based on the formula. Responsive to receiving negative feedback to the result, one or more of the table summary or the formula is disassociated from the query term.

Description

    RELATED APPLICATION
  • This application is a continuation of U.S. patent application Ser. No. 17/306,666 filed May 3, 2021, which is a continuation of U.S. patent application Ser. No. 15/408,664 filed Jan. 18, 2017, now U.S. Pat. No. 10,997,227 issued on May 4, 2021, each of which is incorporated by reference herein in its entirety.
  • BACKGROUND
  • A spreadsheet is a data document that includes one or more data tables storing data under different categories. Sometimes the spreadsheet can perform calculation functions. When a user wants to obtain certain data from the spreadsheet, the user can construct a database search query to look for the desired data. Sometimes, if the user-desired data is not previously stored by the spreadsheet, the user may use available data from the spreadsheet to derive the desired data. The user may review the spreadsheet and identify relevant data entries in the spreadsheet, and then compile a formula using the calculation function associated with the spreadsheet to calculate the result. For example, when the spreadsheet records a test score for each student in a class, a user may want to know the average score of the class. Then the user may need to compile a formula by summing up the test scores and then dividing by the number of students to obtain the average score of the class. The data table may then calculate the average score of the class based on the compiled formula. Thus the user may need to manually compile a formula and input it into the data table for calculation, which may be inefficient when processing a large amount of data, and also requires a high level of knowledge of database operations from the user.
  • SUMMARY
  • Systems and methods are disclosed herein for processing a natural language query on data tables, e.g., a spreadsheet, etc. According to some embodiments, a query term is obtained from a natural language query. A table summary is prepared based on the query term and a formula of the natural language query is generated based on the table summary. A result is generated based on the formula. Responsive to receiving negative feedback to the result, one or more of the table summary or the formula is disassociated from the query term.
  • According to some embodiments, a natural language query may be originated by a user via a user interface. The natural language query may be parsed to obtain a query term, and a grid range may be identified in a data table as relevant to the query term. A table summary may be prepared including a plurality of data entities based on the grid range. A logic operation may then be determined to apply on the plurality of data entities to derive the query term. The logic operation may then be translated into a formula executable on the data table, and the formula is applied on the data table to generate a result in response to the natural language query.
  • The natural language query may be parsed to obtain a query term, and a grid range may be identified in a data table as relevant to the query term. A table summary may be prepared by extracting a plurality of characteristics from the grid range in the data table. A logic operation may be determined to apply on the plurality of characteristics to derive a result corresponding to the query term. The logic operation may be translated into a formula executable on the data in the data table, and the formula may be applied to the data in the data table to generate the result in response to the natural language query. Responsive to receiving negative feedback to the result, the negative feedback may be automatically processed and a new table summary may be prepared or a new logic operation may be determined based on the query term. An alternative interpretation of the natural language query may be determined based on at least one of the new table summary or the new logic operation, and an alternative result may be generated based on the alternative interpretation.
  • In some implementations, the natural language query is submitted by the user via a user interface at a client device, and is manually or vocally entered by the user. The natural language query may be received at a server from a client device via a hypertext transfer protocol (HTTP) post request. The natural language query may be originated in a first language (e.g., non-English, etc.) and may then be translated into a second language, e.g., English, for processing.
  • In some implementations, a grid range is identified at a client device when the data table is stored at the client device, or at a server after receiving the natural language query at the server when the data table is stored at the server.
  • In some implementations, the data table includes any data table stored at a client device, a remote server, or a cloud.
  • In some implementations, the plurality of data entities include any of dimensions, dimension filters and metrics.
  • In some implementations, the result is presented to the user via a visualization format including any of an answer statement, a chart, or a data plot.
  • In some implementations, the formula may be stored in association with the natural language query or the query term when the user feedback is positive.
  • In some implementations, translating the logic operation into the formula includes selecting a formula build operation from a plurality of formula build operations based on the logic operation and executing the formula build operation to generate the formula based on the table summary and the logic operation. The formula build operation may correspond to a type of formula.
  • In some implementations, the result may be translated from a second language (e.g., English) into a first language (e.g., non-English) when the natural language query is received in the first respective language from the user.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Further features of the disclosure, its nature and various advantages will become apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:
  • FIG. 1 is a block diagram of a computerized system 100 for natural language query processing, according to an illustrative embodiment;
  • FIGS. 2A-2B provide an example logic flow diagram illustrating aspects of processing a natural language query in a data spreadsheet, according to some embodiments described herein;
  • FIG. 3 provides a block diagram illustrating example aspects of data flows between various components at the client side and the server side to process a natural language query, according to some embodiments described herein;
  • FIG. 4 provides an example user interface diagram illustrating aspects of the answer panel (e.g., 302 in FIG. 3 ), according to some embodiments described herein; and
  • FIG. 5 is a block diagram of a computing device, such as any of the components of the system of FIG. 1 , for performing any of the processes described herein.
  • DETAILED DESCRIPTION
  • To provide an overall understanding of the disclosure, certain illustrative embodiments will now be described, including systems and methods for connecting with remote databases. In particular, a connection between an application and a remote database is described. The application modifies the format of the data imported from the remote database before displaying the modified data to the user. However, it will be understood by one of ordinary skill in the art that the systems and methods described herein may be adapted and modified as is appropriate for the application being addressed and that the systems and methods described herein may be employed in other suitable applications, and that such other additions and modifications will not depart from the scope thereof. Generally, the computerized systems described herein may comprise one or more engines, which include a processing device or devices, such as a computer, microprocessor, logic device or other device or processor that is configured with hardware, firmware, and software to carry out one or more of the computerized methods described herein.
  • Systems and methods for processing a natural language query allow a user to enter a query for data in natural language. The natural language query may be translated into a structured database query. When the structured database query indicates the data is not readily available in the data table, existing data entries may be identified in the data table that may be relevant to generate the desired data, and a formula may be automatically compiled to derive the desired data based on the available data entries.
  • For example, when the data source includes a spreadsheet that records a test score for each student in a class, a user may input a natural language query “what is the average score of the class?” The natural language query may be interpreted and parsed by extracting terms from the query, such as “what,” “is,” “the,” “average,” “score,” “of,” “the,” and “class.” Among the extracted terms, the term “average score” may be identified as a key term of the query based on previously stored key terms that are commonly used. It may then be determined that no data entry is available in the spreadsheet corresponding to the data category “average score,” e.g., no column header corresponds to “average score.” Logic may then be identified to derive an “average score” from the existing data entries. For example, it may be determined that an “average score” may be calculated by summing up all the test scores in the class and dividing the sum by the total number of students. A formula may then be automatically generated to calculate the “average score” and output the calculation result to the user in response to the natural language query. The generated formula may be stored in association with a tag “average score” such that even when the spreadsheet is updated with more data entries, e.g., with new test scores associated with additional students, the formula may still be applicable to automatically calculate an average score of the class, in response to the natural language query.
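  • The “average score” example can be sketched as follows. This is a minimal sketch under stated assumptions: the table has a header in row 1 and data starting in row 2, and the helper names `column_letter` and `build_average_formula`, as well as the header-matching rule, are hypothetical and not taken from the disclosure.

```python
def column_letter(index):
    """Convert a zero-based column index to a spreadsheet column letter."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def build_average_formula(headers, row_count, key_term="average score"):
    """Return a formula deriving the key term, or None if no formula is built."""
    lowered = [h.lower() for h in headers]
    if key_term in lowered:
        return None  # the data is already stored; nothing to derive
    # Assumed rule: "average X" is derived from a column whose header contains X.
    target = key_term.replace("average", "").strip()
    for i, header in enumerate(lowered):
        if target in header:
            col = column_letter(i)
            cell_range = f"{col}2:{col}{row_count + 1}"
            # Sum all scores, then divide by the number of students.
            return f"=SUM({cell_range})/COUNT({cell_range})"
    return None


print(build_average_formula(["Student", "Test Score"], row_count=3))
# =SUM(B2:B4)/COUNT(B2:B4)
```

  • Calling the function again with a larger `row_count` regenerates the range, which is one simple way the stored logic could remain applicable after more students are added.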
  • In this way, a user may get an answer about their data in a faster and more efficient way than by manually entering formulas or doing other forms of analysis by hand. For users who may not have the knowledge of all the features of the spreadsheet, the platform may help the users to generate structured queries or even formulas.
  • FIG. 1 is a block diagram of a computerized system 100 for natural language query processing, according to an illustrative implementation. The system 100 includes a server 104, two remote databases 114 a-114 b (generally referred to as remote database 114), user devices 108 a-b (generally referred to as user device 108), and/or other related entities that communicate with one another over a network 101. The user devices 108 a and 108 b contain user interfaces 110 a and 110 b (generally referred to as user interface 110) respectively.
  • Each user device 108 includes a device such as a personal computer, a laptop computer, a tablet, a smartphone, a personal digital assistant, or any other suitable type of computer or communication device. Users at the user device 108 access and receive information from the server 104 and remote databases 114 over the network 101. The user device 108 may include components, such as an input device and an output device. In some implementations, a user may operate the user device 108 to input a natural language query via the user interface 110, and the processor 112 a-b (generally processor 112) may process the natural language query. In some implementations, the user device 108 may process the natural language query locally and search within a local database. In some implementations, the user device 108 may send the natural language query to a remote server 104, which may store data tables 106 and use a processor 102 to analyze the natural language query.
  • The server 104 may provide updates and may access remote databases 114 a-b for a data query. Thus, when a natural language query is received at the user device 108, upon translation of the query into a database query, the database query may be performed locally at the user device 108, at the data tables 106 stored at the server 104, or at the remote databases 114 (e.g., cloud, etc.).
  • In some implementations, the user device 108 may have a locally installed spreadsheet application for a user to review data and enter a natural language query. In some implementations, such spreadsheet application may not be installed at the user device 108, and a user may access a spreadsheet or a data table stored at the server 104 via a remote access component within a browser application, or a mobile application.
  • FIGS. 2A-2B provide an example logic flow diagram illustrating aspects of processing a natural language query in a data spreadsheet, according to some implementations described herein. At 201, a natural language query may be received via a user interface, e.g., at the user interface 110 of a user device 108 in FIG. 1 . The natural language query may be a question entered by a user such as “how's the growth of monthly total sales,” “what is the average score of MATH 301,” and/or the like. The natural language query may be manually typed in by a user via an input device, or articulated by the user via a microphone and captured by the user device. The natural language query may also be automatically generated by an analytics application and passed through to the server via an application programming interface (API) from another program. For example, business analytics software may automatically generate a list of business analytics questions such as “how's the growth of monthly total sales” in a natural language, and the question may be automatically sent to the server. In some implementations, the natural language query may be originated in any of a variety of different natural languages, and may be translated into a language compatible with the platform (e.g., the operating system, or the natural language query processing tool, and/or the like), such as English, etc.
  • At 202, the natural language query may optionally be parsed to extract key terms and a query string may be generated. In some implementations, the parsing may be performed at the user device. Or alternatively, the server may receive a parse request over Hypertext Transfer Protocol (HTTP) from the user device. The server may send a request to an analytics module (e.g., see 305 in FIG. 3 ). For example, for a natural language question “what is the monthly growth of sales,” words in the question may be extracted and assessed to rule out words such as “what,” “is,” “the,” “of,” etc. as meaningful query terms. Words such as “monthly growth” and “sales” may be identified as query terms based on previously stored query term rules and/or heuristics from previously processed queries.
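  • The parsing at 202 can be sketched as follows. This is an illustrative sketch only: the stop-word set, the list of previously stored query terms, and the function name `extract_query_terms` are assumptions, and the matching rule (longest known phrase first) is one simple heuristic among many.

```python
# Hypothetical words to rule out as query terms.
STOP_WORDS = {"what", "is", "the", "of", "how", "how's", "a", "an"}
# Hypothetical previously stored query terms.
KNOWN_TERMS = ["monthly growth", "sales", "average score"]


def extract_query_terms(question, known_terms=KNOWN_TERMS):
    """Drop stop words, then greedily match the longest known term."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    content = [w for w in words if w and w not in STOP_WORDS]
    max_len = max(len(term.split()) for term in known_terms)
    terms, i = [], 0
    while i < len(content):
        for n in range(min(max_len, len(content) - i), 0, -1):
            candidate = " ".join(content[i:i + n])
            if candidate in known_terms:
                terms.append(candidate)
                i += n
                break
        else:
            i += 1  # not a known query term; skip this word
    return terms


print(extract_query_terms("what is the monthly growth of sales"))
# ['monthly growth', 'sales']
```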
  • At 203, the query string may optionally be sent to the server. Alternatively, the natural language query may be processed within one or more spreadsheets that are locally stored on the user device.
  • At 204, one or more data tables or spreadsheets, or a grid range of a spreadsheet, may be identified as relevant to the query string. A table detection module (e.g., see 307 in FIG. 3 ) may be used to output tables detected from originally stored data tables or spreadsheets, e.g., based on heuristics or machine learning. For example, natural language key terms from the query string may be used to identify relevant data tables/spreadsheets. When the query string includes key terms such as “growth,” “monthly,” “sales,” data tables/spreadsheets that have a column or a row recording monthly sales may be identified. As another example, data tables/spreadsheets can also be identified based on previously used data tables for similar query strings, e.g., when a natural language query “what is the monthly distribution of sales” identified a certain data table, the same data table may be identified for the query “how's the growth of monthly total sales” as well.
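  • One simple heuristic for the identification at 204 is to count how many key terms appear in a table's headers. The sketch below assumes that approach; the function name `find_relevant_tables`, the table names, and the header data are hypothetical, and the table detection module may use different heuristics or machine learning instead.

```python
def find_relevant_tables(key_terms, tables):
    """Rank tables by how many key terms appear among their header words.

    `tables` maps a table name to its list of header strings.
    """
    matches = []
    for name, headers in tables.items():
        header_words = {w.lower() for h in headers for w in h.split()}
        hits = sum(1 for term in key_terms if term.lower() in header_words)
        if hits:
            matches.append((hits, name))
    # Highest number of matching key terms first.
    return [name for hits, name in sorted(matches, key=lambda m: -m[0])]


tables = {
    "Sales2016": ["Month", "Monthly Sales"],
    "Roster": ["Student", "Test Score"],
}
print(find_relevant_tables(["growth", "monthly", "sales"], tables))
# ['Sales2016']
```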
  • The selected range of cells from the data table may be flipped in orientation if necessary. In some implementations, the user may manually select the cells by selecting a single cell or a range of cells that may belong to a table. The cells surrounding the selection are analyzed for possible table structures.
  • In some implementations, a table schema may be generated based on the selected range of cells. Sometimes when the whole table schema is too small, to avoid communication of a large number of small messages from the client device to the server and improve communication efficiency, several table schemas may be sent in a batch request to the server. When the identified table is too large to include in an XMLHttpRequest (XHR) request, the user device may only send the grid range of the detected table (for chart recommendations), and the server may determine a table structure from the sent grid range.
  • At 205, the server may prepare a table summary by extracting the dimensions, columns, rows, metrics, dimension filters, and/or other characteristics of the detected data table, and map the extracted table characteristics to cell ranges or addresses in a spreadsheet. For example, for a data table recording monthly sales data of the year, the table summary may include the number and index of rows and columns, the corresponding value in a cell identified by the row and column number, the metric of the value, and/or the like.
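  • A table summary of the kind prepared at 205 can be sketched as follows. This is a minimal sketch: the `TableSummary` fields, the `summarize` function, and the rule that all-numeric columns are metrics while other columns are dimensions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TableSummary:
    """Hypothetical summary mapping table characteristics to cell ranges."""
    row_count: int
    column_count: int
    dimensions: Dict[str, str]  # header -> cell range of a non-numeric column
    metrics: Dict[str, str]     # header -> cell range of a numeric column


def summarize(grid: List[List[object]], header_row: int = 1) -> TableSummary:
    headers, body = grid[0], grid[1:]
    first, last = header_row + 1, header_row + len(body)
    dimensions, metrics = {}, {}
    for i, header in enumerate(headers):
        col = chr(ord("A") + i)  # assumes at most 26 columns
        cell_range = f"{col}{first}:{col}{last}"
        values = [row[i] for row in body]
        if all(isinstance(v, (int, float)) for v in values):
            metrics[header] = cell_range
        else:
            dimensions[header] = cell_range
    return TableSummary(len(body), len(headers), dimensions, metrics)


grid = [["Month", "Sales"], ["Jan", 100], ["Feb", 150], ["Mar", 130]]
summary = summarize(grid)
print(summary.dimensions, summary.metrics)
# {'Month': 'A2:A4'} {'Sales': 'B2:B4'}
```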
  • At 206, the server may extract operations to be applied to the data table, and translate the operations into one or more formulas executable on the data table. Further details of the formula building embodiments may be found in connection with FIG. 3 .
  • At 207, the server may send the formula(s) back to the user device, and the formula(s) may be applied on the detected data table to generate a result in response to the natural language query. In some implementations, the generated result may be presented via different visualization, such as, but not limited to, a pie chart, a data plot, and/or the like.
  • At 208, when the user receives the result in response to the original question via a user interface (e.g., see FIG. 4 ), the user may provide feedback on the result. For example, the user may provide a positive rating if the result is accurate. Or, the user may submit a negative rating when the result is unsatisfactory, e.g., misinterpreting the question, insufficient data, etc. When the user feedback is positive at 209, the server may save the formula building objects such as the table summary, formula(s) associated with the query string, for machine learning purposes at 210, so that the formula may be reused, or used as a reference when similar questions are received. When the user feedback is negative at 209, the server may disassociate the formula building objects with the question at 211, so that when similar questions are received, such questions are not to be interpreted in the same way.
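  • As an illustrative sketch only, the feedback handling at 209-211 may be modeled as a store that keeps the formula building objects for a query on positive feedback and drops the association on negative feedback. The class name `FormulaStore`, its methods, and the query normalization (lowercasing and collapsing whitespace) are assumptions made for this example and are not taken from the described system.

```python
class FormulaStore:
    """Hypothetical in-memory store of formula building objects keyed by query."""

    def __init__(self):
        self._by_query = {}

    @staticmethod
    def _key(query_string):
        # Assumed normalization: lowercase and collapse runs of whitespace.
        return " ".join(query_string.lower().split())

    def record_feedback(self, query_string, table_summary, formula, positive):
        key = self._key(query_string)
        if positive:
            # Positive feedback: keep the objects so the formula can be reused.
            self._by_query[key] = {
                "table_summary": table_summary,
                "formula": formula,
            }
        else:
            # Negative feedback: disassociate the objects from the query.
            self._by_query.pop(key, None)

    def lookup(self, query_string):
        # Returns the stored objects, or None if nothing is associated.
        return self._by_query.get(self._key(query_string))
```

  • In this sketch, a later query that normalizes to the same key may reuse the stored formula, while a query whose earlier result was rated negatively finds no stored association and is interpreted afresh.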
  • At 212, the server may optionally obtain further information from the user feedback on the result. For example, if the user asks “how's the monthly growth of sales,” and a result of the monthly increase from last month to the current month is provided but the user submits negative feedback, the user interface may prompt the user to provide further information. The user may be prompted to re-enter the question with a time period, e.g., “how's the monthly growth of sales from ______ to ______.” Or the user interface may prompt the user to confirm whether the identified data entities “monthly growth” and “sales” are accurate. As another example, the user interface may provide suggested queries to the user if the server fails to parse the natural language query. Other additional methods may be employed for the user to provide further detailed feedback to refine the question.
  • At 213, the server may provide an alternative interpretation of the query string based on information obtained at 212, and may generate an alternative formula using the alternative table summary at 214. Then the server may proceed at 207 to provide the updated result to the user.
  • FIG. 3 provides a block diagram illustrating example aspects of data flows between various components at the client side and the server side to process a natural language query, according to some embodiments described herein. At the front end 301 (e.g., a client/user device 108 in FIG. 1), a user interface may present an answer panel 302 (e.g., see 401-402 in FIG. 4), which may post a query request 321 to a backend server 300 (e.g., the server 104 in FIG. 1). The query request 321 may include a query string (e.g., the question asked by a user, or key terms extracted from the original natural language question asked by the user, etc.), a list of data entities (e.g., table schema generated based on key terms from the query string, etc.), a grid range from an existing data table, and/or the like. The backend server 300 may be operated under a Java environment, and may pass the query request 321 to be processed at a series of modules such as but not limited to a get-answer action module 303, an entity list extractor 304, an analytics module 305, a query interpreter 306, a table detector 307, and/or the like.
  • The get-answer action module 303 may act as a communication interface that receives the client request 321, which may include query parameters such as a query string (e.g., question asked by user, etc.), a grid range of the data table detected in and around cell selection, and/or the like. If the request 321 has reached the server, the grid range may contain a constructed table. On the other hand, if no data table is detected or the selected grid range does not contain any data, the answer panel interface 302 may not be presented to a user at the beginning. The get-answer action module 303 may send the grid range information 322 to the entity list extractor 304 to get a table view of the data entity list based on the grid range information, e.g., a sub-table having columns and rows defining relevant data entities.
  • The entity list extractor 304 may construct a table schema, e.g., a data entity list including data entities relevant to the query. The entity list extractor 304 may obtain a table summary 324 (e.g., including column headers, data types, labels column, cell metrics, and/or other properties) from the table detector 307. The entity list extractor 304 may also build a typed table 323 from the grid range and pass it on to the table detector 307 for summarization.
  • The entity list extractor 304 may provide a table view that is an object representation of the data entity list. The entity list may be represented in a data structure as a per-table knowledge graph, represented by graph nodes such as but not limited to dimensions, dimension filters, metrics, and/or the like. Dimensions may include the header of a column whose values act as row keys (or labels) into the table. For example, “Country” will be a dimension in a table with country names as labels or row keys. Dimension filters may include values in the dimension column (row keys/label column). For example, “India” and “USA” are the dimension filters for the dimension “Country.” Metrics may include all number columns taken as metrics or column values. Generally, a user may look for a metric for a particular dimension filter (or label). For example, in the string “Population of India,” “Population” is identified as a metric and the dimension filter is identified as “India” for the dimension “Country.”
  • The entity list extractor 304 may provide an entity list table view 325 to the get-answer action module 303. The entity list table view 325 may be generated by extracting metrics, dimensions and dimension filters from the table summary 324. For example, it may be assumed that all column headers that correspond to cells with numeric values are metrics (e.g., a column header “population” is a metric as in the above example), all string and date/time column headers are dimensions (e.g., a column header “country,” a text string, is a dimension) and the values in these dimension columns are dimension filters (e.g., values under the column header “country” such as “U.S.A.,” “India,” etc., are dimension filters). Other determinations of the metrics, dimensions and dimension filters can be applied. In addition, the entity list table view 325 may serve to reverse lookup row and column indices given a dimension, metric or dimension filter string, which may be used to map parameters such as dimensions, metrics, and dimension filters back to the grid row and column indices during formula construction. To allow this, the entity list table view 325 may provide a metrics-to-column number map, a dimensions-to-column number map, and a dimension-filters-to-row-and-column pair map.
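  • As an illustrative sketch only, the column classification and the three reverse-lookup maps described above may be built as follows. The function name `build_entity_list_view`, the dictionary keys, and the sample table values are assumptions made for this example; the described system may represent the entity list table view 325 differently.

```python
from datetime import date, datetime
from numbers import Number


def build_entity_list_view(headers, rows):
    """Classify columns and build reverse-lookup maps (hypothetical helper).

    Columns whose cells are all numeric are treated as metrics. Columns whose
    cells are all strings or dates are treated as dimensions, and their cell
    values become dimension filters. Indices are zero-based grid offsets.
    """
    metrics = {}            # metric header -> column index
    dimensions = {}         # dimension header -> column index
    dimension_filters = {}  # filter value -> (row index, column index)

    for col, header in enumerate(headers):
        values = [row[col] for row in rows if row[col] is not None]
        if not values:
            continue  # nothing to classify in an empty column
        if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
            metrics[header] = col
        elif all(isinstance(v, (str, date, datetime)) for v in values):
            dimensions[header] = col
            for row_idx, row in enumerate(rows):
                if row[col] is not None:
                    # Keep the first occurrence of a repeated filter value.
                    dimension_filters.setdefault(row[col], (row_idx, col))

    return {
        "metrics": metrics,
        "dimensions": dimensions,
        "dimension_filters": dimension_filters,
    }


# Sample table with made-up values, for illustration only.
headers = ["Country", "Population"]
rows = [["India", 10], ["USA", 20]]
view = build_entity_list_view(headers, rows)

# "Population of India": look up the metric column and the filter row.
col = view["metrics"]["Population"]
row, _ = view["dimension_filters"]["India"]
answer_cell = rows[row][col]
```

  • The final lookup shows how a metric and a dimension filter identified in a query string may be mapped back to a single grid cell during formula construction.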
  • The table detector 307 may extract information from a data table and generate a table summary 324, which may be used to determine what entities in the table can be used to generate a formula to derive the query term. Tables can be generally represented as a common object, which stores the data in a table, the column headers and types of data in the columns, and derived facts about the data.
  • The table detector 307 may extract information from a data table in several steps. First, light parsing of cells and inference of column headers and data types may be performed. For cells having numeric values between 1900-2100, the cells may be interpreted as years, instead of pure numeric values. The table detector 307 may then filter out spurious rows and columns, including but not limited to empty rows/columns, numeric columns with ID numbers, columns for taking notes, and/or the like.
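  • As an illustrative sketch only, the light parsing and filtering steps above may look like the following. The function names, the type labels, and the choice to treat the 1900-2100 bounds as inclusive and to require whole numbers are assumptions made for this example. The sketch removes only empty rows and columns; detecting ID-number or note-taking columns would need further heuristics that are not specified here.

```python
def infer_cell_type(value):
    """Return 'empty', 'year', 'number', or 'string' for a single cell."""
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return "empty"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Assumed rule: whole numbers from 1900 through 2100 are years.
        if float(value).is_integer() and 1900 <= value <= 2100:
            return "year"
        return "number"
    return "string"


def drop_empty_rows_and_columns(grid):
    """Remove rows and columns that contain no data (rectangular grid assumed)."""
    rows = [r for r in grid if any(infer_cell_type(c) != "empty" for c in r)]
    if not rows:
        return []
    keep = [
        c for c in range(len(rows[0]))
        if any(infer_cell_type(r[c]) != "empty" for r in rows)
    ]
    return [[r[c] for c in keep] for r in rows]
```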
  • The table detector 307 may then add column-based statistics. For example, for all column types, the number of missing or distinct values may be recorded. For numeric columns, the number of negative/positive/floats/zero values, as well as the sum, standard deviation, monotonicity and uniformity of the array may be recorded. For string columns, the ratio of numeric to non-numeric characters, an average string length, and a maximum string length may be recorded.
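  • As an illustrative sketch only, the column-based statistics above may be computed as follows. The function names and dictionary keys are assumptions made for this example. Monotonicity is simplified here to a non-decreasing check, the population standard deviation is used, and uniformity is omitted because the description does not define it.

```python
import statistics


def numeric_column_stats(values):
    """Statistics for a numeric column; None marks a missing cell."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "negative": sum(1 for v in present if v < 0),
        "positive": sum(1 for v in present if v > 0),
        "zero": sum(1 for v in present if v == 0),
        # Counts values with a fractional part.
        "floats": sum(
            1 for v in present if isinstance(v, float) and not v.is_integer()
        ),
        "sum": sum(present),
        "stdev": statistics.pstdev(present) if present else 0.0,
        # Simplified monotonicity: True when values never decrease.
        "non_decreasing": all(a <= b for a, b in zip(present, present[1:])),
    }


def string_column_stats(values):
    """Statistics for a string column; None or '' marks a missing cell."""
    present = [v for v in values if v]
    digits = sum(ch.isdigit() for v in present for ch in v)
    others = sum(len(v) for v in present) - digits
    return {
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        # Ratio of numeric to non-numeric characters; None if undefined.
        "numeric_ratio": digits / others if others else None,
        "avg_length": sum(len(v) for v in present) / len(present) if present else 0.0,
        "max_length": max((len(v) for v in present), default=0),
    }
```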
  • The table object created from the input table cell values from the data table 323 may then be used to create an aggregate table. Each column in the aggregate table may be inspected to determine a number of unique values as compared to the number of total values (e.g., the range of data values). If the column is categorical (e.g., when the unique values in the column are a subset of the entire spectrum of data values), then the column may be used to create an aggregated table.
  • For each categorical column, two aggregated objects may be created in association with the column. A new “count” aggregated object may be created to record information relating to the “count” of a unique value. For example, each row of the object may represent a unique value, and in each row, the first cell stores the unique value and the second cell records the number of times that the respective unique value appears in the original categorical column.
  • A new “sum” aggregated object may be created to record the total sum of each unique value in the original table. For example, each row of the object represents a unique value, and each column of the object represents a categorical numeric column in the original table 323. The value in each cell of the object represents a sum of unique values of all cells in the respective categorical column that contain the respective unique value (based on the respective row of the object).
  • For example, if the original data has two columns like.
  • TABLE 1
    Example Data Grid
    Yes 1
    Yes 3
    No 2
    Yes 5
    No 3
    No 2

    instead of charting or responding with the raw data grid above, the first column may be pivoted or grouped and the second column is to be summed per distinct entries in the first column so that Table 1 can be recorded as “Yes, 9; No, 7.” Or alternatively, the count of each repeated entry “Yes” or “No” may be recorded such that Table 1 can be recorded as “Yes, 3; No, 3.”
  • The “count” and “sum” object may be example objects for aggregation. Alternatively, average aggregation objects may be created, e.g., using an average value of the “count” or “sum.” The objects recording the count and sum of each unique value may be used to carry information of the original data table 323.
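  • As an illustrative sketch only, the “count,” “sum,” and average aggregations may be computed over the two-column data of Table 1 as follows. The function name `aggregate` and the representation of the grid as a list of pairs are assumptions made for this example.

```python
from collections import defaultdict


def aggregate(rows):
    """Group (label, value) pairs by label and return counts, sums, averages."""
    counts = defaultdict(int)
    sums = defaultdict(int)
    for label, value in rows:
        counts[label] += 1    # times the label appears in the first column
        sums[label] += value  # total of the second column for that label
    averages = {label: sums[label] / counts[label] for label in counts}
    return dict(counts), dict(sums), averages


# The data grid of Table 1.
table_1 = [("Yes", 1), ("Yes", 3), ("No", 2), ("Yes", 5), ("No", 3), ("No", 2)]
counts, sums, averages = aggregate(table_1)
# counts is {"Yes": 3, "No": 3} and sums is {"Yes": 9, "No": 7}
```

  • The resulting counts and sums match the “Yes, 3; No, 3” and “Yes, 9; No, 7” records given for Table 1.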
  • The get-answer action module 303 may also send a parse request 326 including data entity list information and query information to the analytics module 305, which may generate a parse response 327. The parse response 327 may include a structured data table/spreadsheet query represented as the query in the protocol buffer.
  • The query interpreter 306 may interpret the returned query response 328 into an executable formula string 329 using the entity list table view passed on from the get-answer action module 303. The query interpreter 306 may include various comparable classes for formula builder, e.g., a particular formula builder may correspond to one type of formula. Here a given set and count of fields in the query 328 may correspond to only one formula, e.g., a query with exactly two metrics corresponds to a correlation formula.
  • For example, the query interpreter 306 may invoke a variety of operations. An example operation includes a query scoring operation, e.g., scoreQuery (the query in the protocol buffer), which returns a score, built simply by counting the number of fields of the input query in the protocol buffer it can consume, or returns a negative/zero score if the fields in the query in the protocol buffer are not sufficient to build a formula. For example, if an input query in the protocol buffer having two dimension filters and a dimension is passed to a formula builder that requires at least one dimension filter and at least one dimension, the scoreQuery( ) operator may return a score of two (e.g., one point for satisfying the at least one dimension requirement and one point for satisfying the at least one dimension filter requirement). The score of two (non-zero) indicates that the parameters included in the query in the protocol buffer are sufficient for formula building. In some situations, a given query may have more than one formula builder that may return the same score, e.g., if another formula builder requires just two dimension filters, the input query in the protocol buffer in the above example would also be given a score of two with this formula builder.
  • The query interpreter 306 may then run a getFormula(query in the protocol buffer, EntityListTableView) operation, based on the input of the query and the entity list table view at 328. After determining that the query score is a positive number, the query interpreter 306 may return a formula built by joining data in the input values query in the protocol buffer and EntityListTableView.
  • The query interpreter 306 may take in a list of formula builders available (injected), and may interpret the input query in the protocol buffer by first scoring each formula builder by the number of fields of the input query in the protocol buffer that it may consume. This may filter out a set of formula builders that cannot understand the input query in the protocol buffer. If there is at least one formula builder with a positive score in response to the input query in the protocol buffer, the formula builder with the highest score may be used to map a formula 329. In this way, the formula builder that consumes the maximum number of fields from the input query in the protocol buffer can be used to construct the possible formula parses.
  • The query interpreter 306 may be structured as a class with multiple smaller formula builders plugged into it. In this way, the query interpreter structure can be expandable with additional formula builders. For example, when a different type of query is received, new formula type may be added to the formula builders without the need to change the existing formula builder.
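  • As an illustrative sketch only, the scoring and selection of pluggable formula builders may be modeled as follows. All class, method, and key names are assumptions made for this example, as are the two sample builders, the column-letter conversion (valid only for the first 26 columns), the assumption that the header occupies sheet row 1, the `row_count` key, and the tie-breaking rule (the first registered builder wins). The sketch is written in Python for brevity even though the backend server is described as operating under a Java environment. CORREL is used as the spreadsheet correlation function.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Stand-in for the structured query; field names are assumptions."""
    metrics: list = field(default_factory=list)
    dimensions: list = field(default_factory=list)
    dimension_filters: list = field(default_factory=list)


class FormulaBuilder:
    """Base class: `consumes` maps a query field name to the count it uses."""
    consumes = {}

    def score_query(self, query):
        # Zero if any required field is under-supplied; otherwise the number
        # of query fields this builder can consume.
        for name, needed in self.consumes.items():
            if len(getattr(query, name)) < needed:
                return 0
        return sum(self.consumes.values())

    def get_formula(self, query, view):
        raise NotImplementedError


def column_letter(col):
    # Zero-based column index to a letter; first 26 columns only.
    return chr(ord("A") + col)


class CellLookupBuilder(FormulaBuilder):
    """One metric plus one dimension filter yields a single cell reference."""
    consumes = {"metrics": 1, "dimension_filters": 1}

    def get_formula(self, query, view):
        col = view["metrics"][query.metrics[0]]
        row, _ = view["dimension_filters"][query.dimension_filters[0]]
        return f"={column_letter(col)}{row + 2}"  # data starts at sheet row 2


class CorrelationBuilder(FormulaBuilder):
    """Two metrics yield a correlation formula over both columns."""
    consumes = {"metrics": 2}

    def get_formula(self, query, view):
        a, b = (column_letter(view["metrics"][m]) for m in query.metrics[:2])
        last = view["row_count"] + 1
        return f"=CORREL({a}2:{a}{last}, {b}2:{b}{last})"


def interpret(query, view, builders):
    """Return the formula from the highest-scoring builder, or None."""
    scored = [(b.score_query(query), b) for b in builders]
    best_score, best = max(scored, key=lambda pair: pair[0], default=(0, None))
    if best_score <= 0:
        return None  # no builder understands the query
    return best.get_formula(query, view)


view = {
    "metrics": {"Population": 1, "Area": 2},
    "dimension_filters": {"India": (0, 0), "USA": (1, 0)},
    "row_count": 2,
}
builders = [CellLookupBuilder(), CorrelationBuilder()]
lookup = interpret(
    Query(metrics=["Population"], dimension_filters=["India"]), view, builders
)
correlation = interpret(Query(metrics=["Population", "Area"]), view, builders)
```

  • Because each builder only declares the fields it consumes and how to emit its formula, a new formula type may be supported by appending another builder to the list without changing the existing builders or the selection logic.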
  • When the get-answer action module 303 receives a formula 329 from the query interpreter 306, a JSON response 330 including the formula may be returned to the answer panel 302 at the frontend 301 (e.g., at the client side). The answer panel 302 may then provide the formula 331 to a formula preview calculator 308, which may in turn generate a result 332 based on the formula. The answer panel 302 may then provide the result to the user at 333.
  • FIG. 4 provides an example user interface diagram illustrating aspects of the answer panel (e.g., 302 in FIG. 3), according to some embodiments described herein. Example mobile interfaces 401 and 402 show example mobile screens of the answer panel 302. In other implementations, the answer panel 302 may have an interface on a desktop computer, e.g., similar to a browser-based application.
  • At screen 401, a user can type a natural language question in the query box 403, e.g., “how's the growth of monthly totals?” As another example, the query box 403 may provide a suggested query in response to the user entered question, to help users better understand how to structure their own questions using the precise terms. The question intake at the query box 403 may also automatically complete, or correct typographical mistakes from, the user-entered question, so that the data entities for the query can be auto-completed. In some implementations, key terms in the query may be annotated with the same colors as the relevant sections in a spreadsheet to show how the key terms relate back to those sections.
  • An answer may be provided at 404, e.g., a statement containing a calculated result of the “monthly total.” The answer may include a human-friendly interpretation of the answer in natural language, e.g., “for every week, monthly total increases by,” and the calculated result, “$1,500.” When a user asks the question in a certain language (e.g., non-English), the answer may correspondingly be provided in the same language.
  • In another implementation, at screen 402, the answer to the query “how's the growth of monthly totals” may take a variety of visualization formats. For example, at 405, a chart may be generated showing different data plots 407 over a period of time, such as the monthly totals, commission income, sales of product and service income, etc., as related to the query question “growth of monthly total.” The answer panel may further provide analytics of the data plots at 408.
  • In a further implementation, the answer screen 401 or 402 may include a rating button, a “like” or “dislike” button, or a “thumbs up” or “thumbs down” button for the user to provide feedback to the answer to the original question asked.
  • FIG. 5 is a block diagram of a computing device, which could be any of the components of the system of FIG. 1, for performing any of the processes described in FIGS. 2A-3 or for providing the user interface described in FIG. 4. Each of the components of these systems may be implemented on one or more computing devices 500. In certain aspects, a plurality of the components of these systems may be included within one computing device 500. In certain implementations, a component and a storage device may be implemented across several computing devices 500.
  • The computing device 500 comprises at least one communications interface unit, an input/output controller 510, system memory, and one or more data storage devices. The system memory includes at least one random access memory (RAM 502) and at least one read-only memory (ROM 504). All of these elements are in communication with a central processing unit (CPU 506) to facilitate the operation of the computing device 500. The computing device 500 may be configured in many different ways. For example, the computing device 500 may be a conventional standalone computer or alternatively, the functions of computing device 500 may be distributed across multiple computer systems and architectures. In FIG. 5 , the computing device 500 is linked, via network or local network, to other servers or systems.
  • The computing device 500 may be configured in a distributed architecture, wherein databases and processors are housed in separate units or locations. Some units perform primary processing functions and contain at a minimum a general controller or a processor and a system memory. In distributed architecture implementations, each of these units may be attached via the communications interface unit 508 to a communications hub or port (not shown) that serves as a primary communication link with other servers, client or user computers and other related devices. The communications hub or port may have minimal processing capability itself, serving primarily as a communications router. A variety of communications protocols may be part of the system.
  • The CPU 506 comprises a processor, such as one or more conventional microprocessors and one or more supplementary co-processors such as math co-processors for offloading workload from the CPU 506. The CPU 506 is in communication with the communications interface unit 508 and the input/output controller 510, through which the CPU 506 communicates with other devices such as other servers, user terminals, or devices. The communications interface unit 508 and the input/output controller 510 may include multiple communication channels for simultaneous communication with, for example, other processors, servers or client terminals.
  • The CPU 506 is also in communication with the data storage device. The data storage device may comprise an appropriate combination of magnetic, optical or semiconductor memory, and may include, for example, RAM 502, ROM 504, flash drive, an optical disc such as a compact disc or a hard disk or drive. The CPU 506 and the data storage device each may be, for example, located entirely within a single computer or other computing device; or connected to each other by a communication medium, such as a USB port, serial port cable, a coaxial cable, an Ethernet cable, a telephone line, a radio frequency transceiver or other similar wireless or wired medium or combination of the foregoing. For example, the CPU 506 may be connected to the data storage device via the communications interface unit 508. The CPU 506 may be configured to perform one or more particular processing functions.
  • The data storage device may store, for example, (i) an operating system 512 for the computing device 500; (ii) one or more applications 514 (e.g., computer program code or a computer program product) adapted to direct the CPU 506 in accordance with the systems and methods described here, and particularly in accordance with the processes described in detail with regard to the CPU 506; or (iii) database(s) 516 adapted to store storage management information that may be utilized to manage storage information required by the program.
  • The operating system 512 and applications 514 may be stored, for example, in a compressed, an uncompiled or an encrypted format, and may include computer program code. The instructions of the program may be read into a main memory of the processor from a computer-readable medium other than the data storage device, such as from the ROM 504 or from the RAM 502. While execution of sequences of instructions in the program causes the CPU 506 to perform the process steps described herein, hard-wired circuitry may be used in place of, or in combination with, software instructions for implementation of the processes of the present disclosure. Thus, the systems and methods described are not limited to any specific combination of hardware and software.
  • Suitable computer program code may be provided for performing one or more functions in relation to any of the processes as described herein. The program also may include program elements such as an operating system 512, a database management system and “device drivers” that allow the processor to interface with computer peripheral devices (e.g., a video display, a keyboard, a computer mouse, etc.) via the input/output controller 510.
  • The term “computer-readable medium” as used herein refers to any non-transitory medium that provides or participates in providing instructions to the processor of the computing device 500 (or any other processor of a device described herein) for execution. Such a medium may take many forms, including but not limited to, non-volatile media and volatile media. Non-volatile media include, for example, optical, magnetic, or opto-magnetic disks, or integrated circuit memory, such as flash memory. Volatile media include dynamic random access memory (DRAM), which typically constitutes the main memory. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM or EEPROM (electronically erasable programmable read-only memory), a FLASH-EEPROM, any other memory chip or cartridge, or any other non-transitory medium from which a computer can read.
  • Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to the CPU 506 (or any other processor of a device described herein) for execution. For example, the instructions may initially be borne on a magnetic disk of a remote computer (not shown). The remote computer can load the instructions into its dynamic memory and send the instructions over an Ethernet connection, a cable line, or even a telephone line using a modem. A communications device local to a computing device 100 (e.g., a server) can receive the data on the respective communications line and place the data on a system bus for the processor. The system bus carries the data to main memory, from which the processor retrieves and executes the instructions. The instructions received by main memory may optionally be stored in memory either before or after execution by the processor. In addition, instructions may be received via a communication port as electrical, electromagnetic or optical signals, which are exemplary forms of wireless communications or data streams that carry various types of information. In general, one of ordinary skill in the art will appreciate that the source features, destination features and content of the document are not limited in any way by the examples provided above.

Claims (20)

What is claimed is:
1. A method for processing a natural language query, the method comprising:
obtaining, by a processor, a query term from the natural language query;
preparing a table summary based on the query term;
generating a formula of the natural language query based on the table summary;
generating a result based on the formula; and
responsive to receiving negative feedback to the result, disassociating one or more of the table summary or the formula with the query term.
2. The method of claim 1, wherein the natural language query is provided by a user via a user interface at a client device.
3. The method of claim 2, wherein the natural language query is manually or vocally entered by the user.
4. The method of claim 1, wherein the natural language query is received at a server from a client device via a hypertext transfer protocol (HTTP) post request.
5. The method of claim 1, wherein the natural language query is originated in a first language and is then translated into a second language.
6. The method of claim 1, wherein the result is presented to a user via a visualization format including any of an answer statement, a chart, or a data plot.
7. The method of claim 1, further comprising:
storing the formula associated with the natural language query or the query term when the user feedback is positive.
8. The method of claim 1, further comprising translating a logic operation into the formula by:
selecting a formula build operation from a plurality of formula build operations based on a logic operation, wherein the formula build operation corresponds to a type of formula; and
executing the formula build operation to generate the formula based on the table summary and the logic operation.
9. The method of claim 1, wherein the result is translated into a respective language when the natural language query is received in the respective language from the user.
10. A system for processing a natural language query, the system comprising:
a memory; and
a processor, coupled to the memory, to:
obtain a query term from the natural language query;
prepare a table summary based on the query term;
generate a formula of the natural language query based on the table summary;
generate a result based on the formula; and
responsive to receiving negative feedback to the result, disassociate one or more of the table summary or the formula with the query term.
11. The system of claim 10, wherein the natural language query is provided by a user via a user interface at a client device.
12. The system of claim 11, wherein the natural language query is manually or vocally entered by the user.
13. The system of claim 10, wherein the natural language query is received at a server from a client device via a hypertext transfer protocol (HTTP) post request.
14. The system of claim 10, wherein the natural language query is originated in a first language and is then translated into a second language.
15. The system of claim 10, wherein the result is presented to a user via a visualization format including any of an answer statement, a chart, or a data plot.
16. The system of claim 10, wherein the processor is further to:
store the formula associated with the natural language query or the query term when the user feedback is positive.
17. The system of claim 10, wherein the processor is further to translate a logic operation into the formula by:
select a formula build operation from a plurality of formula build operations based on a logic operation, wherein the formula build operation corresponds to a type of formula; and
execute the formula build operation to generate the formula based on the table summary and the logic operation.
18. The system of claim 10, wherein the result is translated into a respective language when the natural language query is received in the respective language from the user.
19. A computer-readable non-transitory storage medium storing processor-executable instructions for a processor, the instructions to cause the processor to:
obtain a query term from a natural language query;
prepare a table summary based on the query term;
generate a formula of the natural language query based on the table summary;
generate a result based on the formula; and
responsive to receiving negative feedback to the result, disassociate one or more of the table summary or the formula with the query term.
20. The computer-readable non-transitory storage medium of claim 19, wherein the processor is further to translate a logic operation into the formula by:
select a formula build operation from a plurality of formula build operations based on a logic operation, wherein the formula build operation corresponds to a type of formula; and
execute the formula build operation to generate the formula based on the table summary and the logic operation.
US18/227,450 2017-01-18 2023-07-28 Systems and methods for processing a natural language query in data tables Pending US20230385321A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/227,450 US20230385321A1 (en) 2017-01-18 2023-07-28 Systems and methods for processing a natural language query in data tables

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15/408,664 US10997227B2 (en) 2017-01-18 2017-01-18 Systems and methods for processing a natural language query in data tables
US17/306,666 US11714841B2 (en) 2017-01-18 2021-05-03 Systems and methods for processing a natural language query in data tables
US18/227,450 US20230385321A1 (en) 2017-01-18 2023-07-28 Systems and methods for processing a natural language query in data tables

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US17/306,666 Continuation US11714841B2 (en) 2017-01-18 2021-05-03 Systems and methods for processing a natural language query in data tables

Publications (1)

Publication Number Publication Date
US20230385321A1 true US20230385321A1 (en) 2023-11-30

Family

ID=62841492

Family Applications (3)

Application Number Title Priority Date Filing Date
US15/408,664 Active 2037-07-29 US10997227B2 (en) 2017-01-18 2017-01-18 Systems and methods for processing a natural language query in data tables
US17/306,666 Active 2037-01-23 US11714841B2 (en) 2017-01-18 2021-05-03 Systems and methods for processing a natural language query in data tables
US18/227,450 Pending US20230385321A1 (en) 2017-01-18 2023-07-28 Systems and methods for processing a natural language query in data tables

Country Status (1)

Country Link
US (3) US10997227B2 (en)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11537644B2 (en) * 2017-06-06 2022-12-27 Mastercard International Incorporated Method and system for conversational input device with intelligent crowd-sourced options
US10482180B2 (en) * 2017-11-17 2019-11-19 International Business Machines Corporation Generating ground truth for questions based on data found in structured resources
US11016986B2 (en) * 2017-12-04 2021-05-25 Palantir Technologies Inc. Query-based time-series data display and processing system
US10896297B1 (en) 2017-12-13 2021-01-19 Tableau Software, Inc. Identifying intent in visual analytical conversations
US11030226B2 (en) * 2018-01-19 2021-06-08 International Business Machines Corporation Facilitating answering questions involving reasoning over quantitative information
CN109446217A (en) * 2018-09-17 2019-03-08 平安科技(深圳)有限公司 Data method, electronic device and computer readable storage medium
US11055489B2 (en) * 2018-10-08 2021-07-06 Tableau Software, Inc. Determining levels of detail for data visualizations using natural language constructs
US11537276B2 (en) 2018-10-22 2022-12-27 Tableau Software, Inc. Generating data visualizations according to an object model of selected data sources
US11314817B1 (en) 2019-04-01 2022-04-26 Tableau Software, LLC Methods and systems for inferring intent and utilizing context for natural language expressions to modify data visualizations in a data visualization interface
US20200372019A1 (en) * 2019-05-21 2020-11-26 Sisense Ltd. System and method for automatic completion of queries using natural language processing and an organizational memory
US11042558B1 (en) 2019-09-06 2021-06-22 Tableau Software, Inc. Determining ranges for vague modifiers in natural language commands
US11132492B2 (en) * 2019-10-07 2021-09-28 Vyasa Analytics, LLC Methods for automated filling of columns in spreadsheets
US10997217B1 (en) 2019-11-10 2021-05-04 Tableau Software, Inc. Systems and methods for visualizing object models of database tables
US11714807B2 (en) * 2019-12-24 2023-08-01 Sap Se Platform for conversation-based insight search in analytics systems
CN111061830B (en) * 2019-12-27 2023-12-05 深圳市元征科技股份有限公司 Method and device for processing automobile repair data
US11663199B1 (en) 2020-06-23 2023-05-30 Amazon Technologies, Inc. Application development based on stored data
US11500843B2 (en) * 2020-09-02 2022-11-15 Coupa Software Incorporated Text-based machine learning extraction of table data from a read-only document
CN112131257B (en) * 2020-09-14 2023-10-27 泰康保险集团股份有限公司 Data query method and device
US11768818B1 (en) * 2020-09-30 2023-09-26 Amazon Technologies, Inc. Usage driven indexing in a spreadsheet based data store
US11500839B1 (en) 2020-09-30 2022-11-15 Amazon Technologies, Inc. Multi-table indexing in a spreadsheet based data store
US11429629B1 (en) * 2020-09-30 2022-08-30 Amazon Technologies, Inc. Data driven indexing in a spreadsheet based data store
US11514236B1 (en) 2020-09-30 2022-11-29 Amazon Technologies, Inc. Indexing in a spreadsheet based data store using hybrid datatypes
US11714796B1 (en) 2020-11-05 2023-08-01 Amazon Technologies, Inc. Data recalculation and liveliness in applications
US11734522B2 (en) * 2021-01-04 2023-08-22 Sap Se Machine learning enabled text analysis with support for unstructured data
CN113688213B (en) * 2021-02-09 2023-09-29 鼎捷软件股份有限公司 Application program interface service searching system and searching method thereof
WO2022220921A1 (en) * 2021-04-16 2022-10-20 Ohio State Innovation Foundation Automatic cross document consolidation and visualization of data tables
JP2022165879A (en) * 2021-04-20 2022-11-01 富士通株式会社 Information generation program, information generation method, and information generation apparatus
US20230094042A1 (en) * 2021-09-24 2023-03-30 Google Llc Personalized autonomous spreadsheets
US20230106058A1 (en) * 2021-09-24 2023-04-06 Google Llc Autonomous spreadsheet creation
CN117216245B (en) * 2023-11-09 2024-01-26 华南理工大学 Table abstract generation method based on deep learning

Family Cites Families (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SE466029B (en) * 1989-03-06 1991-12-02 Ibm Svenska Ab DEVICE AND PROCEDURE FOR ANALYSIS OF NATURAL LANGUAGES IN A COMPUTER-BASED INFORMATION PROCESSING SYSTEM
JPH0744638A (en) 1993-07-29 1995-02-14 Nec Corp Table data retrieving device
US6026388A (en) * 1995-08-16 2000-02-15 Textwise, Llc User interface and other enhancements for natural language information retrieval system and method
US5963940A (en) * 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
US5960384A (en) * 1997-09-03 1999-09-28 Brash; Douglas E. Method and device for parsing natural language sentences and other sequential symbolic expressions
US6629097B1 (en) * 1999-04-28 2003-09-30 Douglas K. Keith Displaying implicit associations among items in loosely-structured data sets
US6615209B1 (en) * 2000-02-22 2003-09-02 Google, Inc. Detecting query-specific duplicate documents
US6763359B2 (en) * 2001-06-06 2004-07-13 International Business Machines Corporation Learning from empirical results in query optimization
US7283951B2 (en) * 2001-08-14 2007-10-16 Insightful Corporation Method and system for enhanced data searching
US7398201B2 (en) * 2001-08-14 2008-07-08 Evri Inc. Method and system for enhanced data searching
US7181465B2 (en) * 2001-10-29 2007-02-20 Gary Robin Maze System and method for the management of distributed personalized information
US6901411B2 (en) * 2002-02-11 2005-05-31 Microsoft Corporation Statistical bigram correlation model for image retrieval
US6946715B2 (en) * 2003-02-19 2005-09-20 Micron Technology, Inc. CMOS image sensor and method of fabrication
US7853551B1 (en) * 2002-06-25 2010-12-14 Gill Susan P Zann Natural language knowledge processor using trace or other cognitive process models
US20040139107A1 (en) * 2002-12-31 2004-07-15 International Business Machines Corp. Dynamically updating a search engine's knowledge and process database by tracking and saving user interactions
US7065536B2 (en) * 2002-12-31 2006-06-20 International Business Machines Corporation Automated maintenance of an electronic database via a point system implementation
US7216121B2 (en) * 2002-12-31 2007-05-08 International Business Machines Corporation Search engine facility with automated knowledge retrieval, generation and maintenance
US20050187913A1 (en) * 2003-05-06 2005-08-25 Yoram Nelken Web-based customer service interface
US7409336B2 (en) * 2003-06-19 2008-08-05 Siebel Systems, Inc. Method and system for searching data based on identified subset of categories and relevance-scored text representation-category combinations
US20050177358A1 (en) * 2004-02-10 2005-08-11 Edward Melomed Multilingual database interaction system and method
US20070100650A1 (en) * 2005-09-14 2007-05-03 Jorey Ramer Action functionality for mobile content search results
US9329753B2 (en) * 2006-11-10 2016-05-03 Blackberry Limited Handheld electronic device having selectable language indicator and menus for language selection and method therefor
US8996433B2 (en) 2007-10-11 2015-03-31 Steven Ginzberg Automated natural language formula translator and data evaluator
US8521732B2 (en) * 2008-05-23 2013-08-27 Solera Networks, Inc. Presentation of an extracted artifact based on an indexing technique
US8364462B2 (en) * 2008-06-25 2013-01-29 Microsoft Corporation Cross lingual location search
US20170060856A1 (en) * 2008-12-10 2017-03-02 Chiliad Publishing Incorporated Efficient search and analysis based on a range index
US8577910B1 (en) * 2009-05-15 2013-11-05 Google Inc. Selecting relevant languages for query translation
US8620890B2 (en) * 2010-06-18 2013-12-31 Accelerated Vision Group Llc System and method of semantic based searching
US9623119B1 (en) * 2010-06-29 2017-04-18 Google Inc. Accentuating search results
US20160307000A1 (en) * 2010-11-09 2016-10-20 Phuong B. Nguyen Index-side diacritical canonicalization
US9060001B2 (en) * 2011-10-25 2015-06-16 Cisco Technology, Inc. Prefix and predictive search in a distributed hash table
US9047278B1 (en) * 2012-11-09 2015-06-02 Google Inc. Identifying and ranking attributes of entities
US9330090B2 (en) 2013-01-29 2016-05-03 Microsoft Technology Licensing, Llc. Translating natural language descriptions to programs in a domain-specific language for spreadsheets
US10956433B2 (en) * 2013-07-15 2021-03-23 Microsoft Technology Licensing, Llc Performing an operation relative to tabular data based upon voice input
US9514230B2 (en) * 2013-07-30 2016-12-06 Facebook, Inc. Rewriting search queries on online social networks
US10083205B2 (en) * 2014-02-12 2018-09-25 Samsung Electronics Co., Ltd. Query cards
US20160171050A1 (en) * 2014-11-20 2016-06-16 Subrata Das Distributed Analytical Search Utilizing Semantic Analysis of Natural Language
US10296507B2 (en) * 2015-02-12 2019-05-21 Interana, Inc. Methods for enhancing rapid data analysis
US10373171B2 (en) * 2015-02-23 2019-08-06 Genesys Telecommunications Laboratories, Inc. System and method for making engagement offers based on observed navigation path
US10127321B2 (en) * 2015-02-23 2018-11-13 Genesys Telecommunications Laboratories, Inc. Proactive knowledge offering system and method
US9984116B2 (en) * 2015-08-28 2018-05-29 International Business Machines Corporation Automated management of natural language queries in enterprise business intelligence analytics
US10303686B2 (en) * 2015-11-19 2019-05-28 Sap Se Query plan optimization by persisting a hint table
US20170185920A1 (en) * 2015-12-29 2017-06-29 Cognitive Scale, Inc. Method for Monitoring Interactions to Generate a Cognitive Persona
US20170185919A1 (en) * 2015-12-29 2017-06-29 Cognitive Scale, Inc. Cognitive Persona Selection
US11893512B2 (en) * 2015-12-29 2024-02-06 Tecnotree Technologies, Inc. Method for generating an anonymous cognitive profile
US10719506B2 (en) * 2016-12-22 2020-07-21 Sap Se Natural language query generation

Also Published As

Publication number Publication date
US20210271697A1 (en) 2021-09-02
US20180203924A1 (en) 2018-07-19
US11714841B2 (en) 2023-08-01
US10997227B2 (en) 2021-05-04

Similar Documents

Publication Publication Date Title
US11714841B2 (en) Systems and methods for processing a natural language query in data tables
CN110543517B (en) Method, device and medium for realizing complex query of mass data based on elastic search
RU2380747C2 (en) Table representation using natural language commands
US8768976B2 (en) Operational-related data computation engine
US10719524B1 (en) Query template based architecture for processing natural language queries for data analysis
US10324947B2 (en) Learning from historical logs and recommending database operations on a data-asset in an ETL tool
US20180210883A1 (en) System for converting natural language questions into sql-semantic queries based on a dimensional model
US20120036463A1 (en) Metric navigator
US7912867B2 (en) Systems and methods of profiling data for integration
US11416509B2 (en) Data processing systems and methods for efficiently transforming entity descriptors in textual data
US20220164363A1 (en) Data extraction system
CN110795524B (en) Main data mapping processing method and device, computer equipment and storage medium
EP3176706B1 (en) Automated analysis of data reports to determine data structure and to perform automated data processing
US20180365626A1 (en) Systems and methods for creating and managing dynamic user teams
CN111767334A (en) Information extraction method and device, electronic equipment and storage medium
CN113902009A (en) Resume analysis method and device, electronic equipment, medium and product
CN114860737B (en) Processing method, device, equipment and medium of teaching and research data
CN112507098B (en) Question processing method, question processing device, electronic equipment, storage medium and program product
US9058345B2 (en) System and method of generating reusable distance measures for data processing
WO2014168961A1 (en) Generating data analytics using a domain model
WO2022032679A1 (en) Method and device for intelligently providing recommendation information
CN114564954B (en) Index management method and system for maintaining index uniqueness
JP2023184034A (en) System and method for supporting use of data
US20220374450A1 (en) Composite relationship discovery framework
CN114218226A (en) Report information system

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION