EP1282870A2 - Natural language interface for database queries - Google Patents
Natural language interface for database queries
- Publication number
- EP1282870A2
- Authority
- EP
- European Patent Office
- Prior art keywords
- query
- natural language
- database
- keywords
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2452—Query translation
- G06F16/24522—Translation of natural language queries to structured queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
Definitions
- the field of the invention generally relates to user interfaces, and more specifically, to user interfaces that recognize natural language.
- Natural language (NL) processing has achieved considerable progress in areas such as speech recognition and generation. Natural language systems have become commonplace, especially in server-based Internet applications, speech recognition products, database search tools, and other environments where human interaction is required. However, decades of hard work by some of the brightest minds in the Artificial Intelligence field have shown the understanding of speech to be one of the most elusive goals in information technology. Researchers, therefore, have lowered their expectations for practical NL systems. For instance, existing NL prototypes focus on specific topics of conversation, because conventional NL techniques are more difficult to apply to broader applications. Examples of these focused prototypes include MIT's Jupiter, which provides weather forecasts, and Carnegie Mellon's Movieline, which provides local movie schedules. These prototypes serve as a first step towards NL solutions with a broader range of understanding. Alternatively, some leading industrial efforts concentrate on building a logical structure (conceptual networks - not linguistics) for a general dictionary to support understanding and translation (e.g., Microsoft's MindNet).
- NLIs natural language interfaces
- Ask.com is an example of an Internet search engine that allows a user to perform natural queries in the form of a free format input.
- the search engine's results rely on (1) its recognized keywords, (2) predefined keywords related to recognized keywords (text classification), and (3) predefined possible questions (templates) associated with each group of keywords from (1) and (2).
- Its processing steps are (1) capture all recognized keywords from the inputs, (2) determine all keywords that relate to recognized keywords, (3) retrieve and display predefined questions (each question with one keyword group).
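- the three processing steps above can be sketched as a small keyword-to-template lookup. This is only an illustration of a template-based classifier of the kind described; the keyword groups, related words, and template questions are hypothetical and do not come from Ask.com or from the source.

```python
# Hypothetical keyword groups and templates, invented for illustration only.
RELATED = {
    "weather": {"forecast", "temperature"},
    "movie": {"film", "showtime"},
}
TEMPLATES = {
    "weather": ["What is the weather forecast for <city>?"],
    "movie": ["Which movies are playing near <city>?"],
}


def suggest_questions(user_input):
    """Return predefined questions for every keyword group hit by the input."""
    words = {w.strip("?.,!").lower() for w in user_input.split()}
    groups = []
    for group, related in RELATED.items():
        # Steps 1 and 2: a group matches on its own name or on a related keyword.
        if group in words or words & related:
            groups.append(group)
    # Step 3: retrieve the predefined questions of each matched group.
    return [question for g in groups for question in TEMPLATES[g]]
```

- note that nothing in this lookup interprets the sentence itself; any input that contains no listed keyword simply yields no suggestions.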
- although Ask.com looks natural, it functions structurally as a text classification system and does not actually interpret natural language queries. Therefore, it does not have the capability of processing natural language queries.
- a system and method that utilizes a definitive model of enterprise metadata, a design of keywords with simplified complexity, a graphical model of logical structure, a branch and bound search algorithm, and a case-based interaction method to process natural language inputs.
- a method for processing a natural language input provided by a user.
- the method comprises providing a natural language query input to the user, performing, based on the input, a search of one or more language-based databases including at least one metadata database comprising at least one of a group of information types comprising case information, keywords, information models, and database values, and providing, through a user interface, a result of the search to the user.
- the method further comprises a step of identifying, for the one or more language-based databases, a finite number of database objects, and determining a plurality of combinations of the finite number of database objects.
- the method further comprises a step of mapping the natural language query to the plurality of combinations.
- elements of the metadata database are graphically represented.
- the step of mapping comprises steps of identifying keywords in the natural language query, and relating the keywords to the plurality of combinations.
- the method further comprises a step of determining a reference dictionary comprising case information, keywords, information models, and database values.
- the step of mapping further comprises resolving ambiguity between the keywords and the plurality of combinations.
- the step of resolving includes determining an optimal interpretation of the natural language query using at least one of a group comprising rules and heuristics.
- a method for processing a natural language input comprising providing a plurality of database objects, identifying a finite number of permutations of the plurality of database objects, the database objects being stored in a metadata database comprising at least one of a group of information comprising case information, keywords, information models, and database values, and interpreting at least one of the permutations to determine a result of the natural language input.
- the step of providing a plurality of database objects includes providing at least one of the group comprising data types, data instances, and database computational operators.
- the step of interpreting includes mapping the at least one of the permutations to a database query.
- the database query is formulated in a structured query language (SQL) query.
- the method further comprises providing a reference dictionary comprising cases, keywords, information models, and database values, identifying, in the natural language input, a plurality of elements that belong to the reference dictionary, and determining complete paths that are implied by the plurality of elements, that span elements of the natural language input, and that belong to the graphics of the reference dictionary.
- the method further comprises determining a path that includes elements of at least the information models and database values.
- the method further comprises providing rules and heuristics for searching, and determining an optimum permutation based on the rules and heuristics.
- the method further comprises adding, as a result of user input, new cases and keywords to the reference dictionary.
- elements of the metadata database are graphically represented.
- a method for processing a natural language input comprising determining, from the natural language input, a plurality of recognized terms, the recognized terms existing in a data dictionary having a logical graph structure, determining a minimum number of the plurality of recognized terms, determining vertices of the logical graph structure associated with the minimum number, determining at least one minimum cost query graph that contains a minimum number of vertices, if there is more than one minimum cost query graph, removing at least one redundant cost query graph and producing a solution set of cost query graphs, determining, within the solution set, at least one cost query graph that is a complete solution, and translating the at least one cost query graph to a query language statement.
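- one ingredient of the method above, finding a connection between recognized vertices that uses as few vertices as possible, can be illustrated with a breadth-first search. This is a minimal sketch under stated assumptions, not the claimed procedure: the graph, its vertex names, and the function `fewest_vertex_path` are hypothetical, and every edge is assumed to have the same cost.

```python
from collections import deque


def fewest_vertex_path(graph, start, goal):
    """Breadth-first search: return a path from start to goal with the fewest
    vertices, or None when the two vertices are not connected."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical logical graph of an order database (adjacency lists).
EXAMPLE_GRAPH = {
    "CUSTOMER": ["ORDER"],
    "ORDER": ["CUSTOMER", "ORDER_LINE"],
    "ORDER_LINE": ["ORDER", "PART"],
    "PART": ["ORDER_LINE"],
}
```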
- Figure 1 shows a general purpose computer in which one embodiment of the invention may be implemented
- Figure 2 shows a natural language query processor in accordance with one embodiment of the invention
- Figure 3 shows a reference dictionary in accordance with one embodiment of the invention
- Figure 4 shows a metadatabase system in accordance with one embodiment of the invention
- Figure 5 shows an example complete search graph in accordance with one embodiment of the invention
- Figure 6 shows an example search graph in accordance with one embodiment of the invention.
- Figure 7 shows an example query image in accordance with one embodiment of the invention
- Figure 8 shows an example query image in accordance with one embodiment of the invention
- Figure 9 shows an example query image in accordance with one embodiment of the invention.
- Figure 10 shows an example feasible graph in accordance with one embodiment of the invention.
- Figure 11 shows an example feasible graph in accordance with one embodiment of the invention
- Figure 12 shows an example feasible graph in accordance with one embodiment of the invention
- Figure 13 shows an example feasible graph in accordance with one embodiment of the invention
- Figure 14 shows an example feasible graph in accordance with one embodiment of the invention
- Figure 15 shows an example query graph in accordance with one embodiment of the invention.
- Figure 16 shows an example query graph in accordance with one embodiment of the invention.
- Figure 17 shows an example query graph in accordance with one embodiment of the invention.
- a typical goal of a natural language interface (NLI) to databases is to allow users to access information stored in databases by articulating questions directly in their natural language such as English.
- a large number of natural language interfaces have been developed by specialists in computational linguistics and artificial intelligence to achieve this goal. However, they do not seem to be truly "natural".
- "truly natural" means any style of linguistic articulation that a native speaker could understand and use sensibly. Available results tend to require users to use only some well-structured sentence templates and a tightly controlled vocabulary set to articulate queries. These artificial constructs are hardly natural to those who prefer their own choice of vocabulary, phrases, and expressions. Natural articulation goes beyond any fixed design of templates and significant-words dictionary. For example, a user of a Computer-Integrated Manufacturing database might query the system in any number of ways in a truly natural environment, such as:
- NLI natural language interface
- the NLI could also retain valuable cases (usage patterns) so that the NLI could improve its performance.
- the first two tasks of learning could help "close the loop" so that a query is always executed properly (completeness and correctness of query processing), while the third reduces the complexity of interpretation (e.g., the number of possible interpretations).
- AI Artificial Intelligence
- Each of the first four approaches requires users to articulate only in natural language forms that the system provides - or at least they assume that the user's articulation is consistent with these underlying forms. When this basic requirement or assumption does not hold in practice, the system fails to function properly (e.g., with poor performance and low accuracy), or even fails altogether. These forms typically feature some generic, linguistic prototype consisting of only one single sentence per query. Their disadvantage is in their restriction on naturalness.
- the fifth and last approach seeks naturalness, allowing free-format text as input, but it couples a particular NLI design with a particular domain of application. If the first four approaches are "top-down" in relying on predefined natural language forms, the last one in contrast exhausts all possible interpretations from the "bottom up". Its basic method is to provide a semantic model or a dictionary as the roadmap to generate possible interpretations. Its control is implicit: it assumes that users always query the databases known to the system, and can therefore always tune the NLI according to this known kernel of meaning.
- an NLI is provided that offers truly natural query capability to end users of enterprise databases, in that the NLI interprets any style of articulation and learns from the users in a way that improves both effectiveness and efficiency.
- the strategy uses a concept referred to herein as search and learn. This approach recognizes implicit enumeration-evaluation as a basic solution paradigm to the problem of natural language queries.
- a reference dictionary is used that integrates enterprise metadata (information models and contextual knowledge) with case-based reasoning.
- the new design affects two vital functions: (1) the generation of all possible interpretations of a natural query suitable for evaluation, and (2) the reduction of the complexity of keywords and the reduction of growth of keywords.
- a reference dictionary is used to search for an optimal solution and the dictionary "learns" from experience, achieving maximum naturalness with minimum enumeration.
- this new approach promises realistic performance and completeness of a solution because the new reference dictionary and learning capability allows for the determination of complete solutions.
- the new approach identifies that the NLI problem is primarily a search problem and relates the problem to the vast tradition of constrained optimization (e.g., scheduling and traveling salesman).
- models are the conceptual networks of enterprise databases, which are the known domain of enterprise users' queries. Therefore, it develops an innovative approach to use enterprise metadata to search and interpret the meaning of natural queries.
- the objective is to support queries expressed in any combinations of multiple sentences (essays), any forms of sentence (complete or not), and any vocabulary (personal or standard) against databases that have well-defined information models. Furthermore, this objective may be accomplished with practical performance free of the above problems.
- a simplified model of the general logical structure paradigm is provided which is more generic than the current NLI prototypes, when applied to enterprise database queries.
- an NLI system includes one or more of a new class of reference dictionary (of enterprise metadata), a branch-and-bound search method, and a case-based reasoning design, to implicitly evaluate a number of possible interpretations of the user's natural articulation and determine the optimal solution.
- a new class of reference dictionary of enterprise metadata
- branch-and-bound search method to implicitly evaluate a number of possible interpretations of the user's natural articulation and determine the optimal solution.
- One task is to determine a minimally sufficient set of metadata and develop a feasible logical structure for natural queries, such that the system could interpret the queries using the structure alone.
- Users are bound to refer, either directly or indirectly, to database objects (including data semantics, structures, instances or values, and operators) in their natural queries. If they do not use these database objects directly, they articulate their queries in terms of other significant words and phrases (i.e., keywords) that correspond sufficiently to these objects. Therefore, a natural query is reducible to a particular combination of these database objects and keywords. Conversely, a particular combination of database objects and keywords could represent a particular natural query. These keywords are overwhelming in number when all possible permutations of natural words (phrases) in queries are considered (i.e., a linguistic dictionary). However, keywords could be manageable if they are based on database objects and other known enterprise metadata.
- a metadata search may be used to efficiently manage database objects.
- the above argument presents a way to systematically resolve ambiguity in natural queries. That is, according to one embodiment of the invention, it is realized that ambiguity is the deviation from an exact and sufficient match on the logical structure, including no matching of metadata as well as all forms of multiple matching. In the worst case, the system might not be able to identify any database object and keyword at all.
- the system may search the entire space of existing permutations and evaluate each of them for the user to determine which one is correct. Note that this approach still has a better lower bound than generic natural language processing, which fails without user feedback.
- Each additional database object the system recognizes as the result of user feedback serves to eliminate certain space from the search and narrow down the possible interpretations. In the best case, the system identifies a single complete permutation in the input and determines a unique and correct interpretation for the user.
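- the narrowing effect described above can be sketched by enumerating candidate combinations of database objects and keeping only those that contain every recognized object. This is a minimal sketch, assuming that an interpretation is an unordered combination of a fixed size; the object names and the function `candidate_interpretations` are hypothetical.

```python
from itertools import combinations


def candidate_interpretations(objects, recognized, size):
    """Return every size-element combination of objects that contains all
    of the recognized objects."""
    recognized = set(recognized)
    return [
        combo
        for combo in combinations(sorted(objects), size)
        if recognized <= set(combo)
    ]


# Hypothetical database objects.
OBJECTS = {"ORDER", "CUSTOMER", "PART", "DATE"}
```

- with these four objects and combinations of size two, no recognized object leaves six candidates, one recognized object leaves three, and two recognized objects leave a single candidate.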
- a minimally sufficient set of metadata can include all proven database objects, of which the information models are largely constant (known at design time) but the database values are extensible at run time. Therefore, according to various embodiments of the invention, a "base" set of proven database objects may be provided with an NLI system, and modifications and/or new database objects may be added at runtime. Next, the metadata set includes keywords to assist matching with database objects.
- Keywords may be identified, organized, and controlled based on an enterprise information model connected to database values.
- Each database object corresponds to a finite number of keywords, and keywords would generally not be derived from any enumeration of possible permutations of words in natural queries. Therefore, the information model-based subset of keywords would generally be a constant (or a linear growth with a small slope), leaving the database value-based subset to grow with new applications and attain sufficiency without infinite growth in the number of keywords as in conventional systems.
- keyword growth would be a polynomial-type growth at most.
- Keywords in accordance with various embodiments of the invention are a significant improvement over those of a linguistic dictionary, whose growth is exponential by nature due to the number of keyword permutations.
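- a rough numerical illustration of this contrast follows. It assumes, hypothetically, that a linguistic dictionary would have to cover every ordered phrase of up to `max_len` distinct words, while the metadata-based design keeps a fixed number of keywords per database object; both counting models and both function names are assumptions made for this sketch, not figures from the source.

```python
import math


def phrase_count(vocab_size, max_len):
    """Number of ordered phrases of 1..max_len distinct words (assumed model
    of a linguistic dictionary)."""
    return sum(math.perm(vocab_size, k) for k in range(1, max_len + 1))


def keyword_count(num_objects, keywords_per_object):
    """Number of keywords when each database object has a fixed number of
    keywords (assumed model of the metadata-based design)."""
    return num_objects * keywords_per_object
```

- for ten words and phrases of up to three words the first count is 10 + 90 + 720 = 820, whereas ten database objects with five keywords each need 50 keywords.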
- cases of query processing may be created and integrated with other metadata.
- enterprise metadata resources of search
- they are integrated in an extensible metadata representation method so that every resource item references all other related resources for query interpretation.
- a repository of metadata may be implemented as, for example, a reference dictionary. More particularly, a semantically based graphical abstraction of the reference dictionary may be used to define the logical structure of the enterprise database query for the application domain concerned.
- the core of the reference dictionary may be, for example, a design-time product, developed by analysts, designers, and users. Cases and additional keywords and other metadata (e.g., changes to the information models) and database values can be added during runtime as the enterprise database system ages and evolves. Further, case-based reasoning may be used for producing richer keywords and cases.
- the computer system 101 may include a processor 108 connected to one or more storage devices 103, such as a disk drive through a communication device such as bus 107.
- the computer system also includes one or more output devices 104, such as a monitor or graphic display, or printing device.
- the computer system 101 typically includes a memory 105 for storing programs and data during operation of the computer system 101.
- the computer system may contain one or more communication devices that connect the computer system to a communication network 106.
- Computer system 101 may be a general purpose computer system that is programmable using a high level computer programming language. The computer system may also be implemented using specially programmed, special purpose hardware.
- the processor 108 is typically a commercially available processor, such as the PENTIUM, PENTIUM II, PENTIUM III, PENTIUM IV, or StrongARM microprocessor from the Intel Corporation, Athlon or Duron processor available from AMD, PowerPC microprocessor, SPARC processor available from Sun Microsystems, or 68000 series microprocessor available from Motorola. Many other processors are available.
- Such a processor usually executes an operating system which may be, for example, DOS, WINDOWS 95, WINDOWS NT, WINDOWS 2000, or WinCE available from the Microsoft Corporation, MAC OS SYSTEM 7 available from Apple Computer, SOLARIS available from Sun Microsystems, NetWare available from Novell Incorporated, PalmOS available from the 3COM corporation, or UNIX-based operating systems (such as Linux) available from various sources. Many other operating systems may be used.
- the communication network 102 may be an ETHERNET network or other type of local or wide area network (LAN or WAN), a point-to-point network provided by telephone services, or other type of communication network.
- Information consumers and providers, also referred to in the art as client and server systems, respectively, communicate through the network 102 to exchange information.
- aspects of the present invention may be performed by one or more computer systems, processors, or other computing entities.
- Various aspects of the present invention may be centralized or distributed among more than one system, and the invention is not limited to any particular implementation. It should be understood that the invention is not limited to a particular computer system platform, processor, operating system, or network. Also, it should be apparent to those skilled in the art that the present invention is not limited to a specific programming language or computer system and that other appropriate programming languages and other appropriate computer systems could also be used.
- a general-purpose computer system 101 may include a natural language user interface (NLUI or simply NLI), through which a user requests information and performs other transactions. For instance, the user may provide input to and receive output from graphical user interfaces. In one case, the interface may prompt a user with a series of questions, to which the user may respond. The questions may be in multiple-choice format, for which a single selection among the choices is an appropriate response. However, the system 101 may present a general query interface on a graphical user interface, through which the user may pose natural language queries or responses to questions.
- NLUI natural language user interface
- NLI natural language user interface
- system 101 may prompt the user to "Please enter a search (natural language or keyword)."
- the user may provide a natural language response, asking system 101 "Where is the Houston Field House at RPI located?"
- the natural language interface may have associated with it a natural language analyzer which determines the meaning of a provided input.
- a natural language analysis system such as the system shown in Figure 2 discussed in more detail below.
- the natural language analysis system finds the meaning of the request and determines the correct source of the information requested.
- system 101 may format and send the request to a server-based system, and the server-based system may return the result to system 101.
- a natural language analyzer that analyzes queries may be part of computer system 101.
- This query processor may perform one or more analyzing steps on a received query, which is generally a string of characters, numbers, or other items.
- a long-standing goal in the field of information technology is to allow humans to communicate with computer systems in the natural languages of humans.
- queries are difficult for a computer system to interpret precisely.
- the first four approaches (1) - (4) require users to articulate only in the natural language forms that the system provides — or at least they assume that the user's articulation is consistent with these underlying forms. When this basic requirement or assumption does not hold in practice, the system would fail to function properly (e.g., with poor performance and low accuracy), or even fail altogether.
- These forms typically feature some generic, linguistic prototype consisting of only one single sentence per query. Thus, their advantage is that the resultant NLI is easily portable from one database system to another.
- the disadvantage is the restriction on naturalness of the input from the user.
- the last approach (5) essentially embraces a different priority, placing naturalness ahead of portability (i.e., coupling a particular NLI design with a particular domain of application, but allowing free-format text as input). If the first four approaches are top-down in their relying on the computer's direct understanding of the user's articulation, the last one could be considered as the computer's exhausting of all possible interpretations from the bottom up.
- the basic strategy of system (1) - (5) is to provide a semantic model or a dictionary as the roadmap for generating possible interpretations.
- These systems assume that the users always query databases known to the system, thus the NLI could be tuned according to this known information.
- users are bound to refer, either directly or indirectly, to these known database objects (types or semantic models, instances or values, and operators) in their natural queries. If they do not use these database objects directly, they have to articulate their query in terms of other significant words and phrases (hereinafter referred to as "keywords") that correspond to these objects.
- keywords significant words and phrases
- A semantic model is a form of keywords, and a dictionary is a more extensive collection of keywords beyond a usual semantic model.
- the critical success factor of this approach is clearly the semantic model-dictionary employed, which must be powerful enough to span efficiently the space of possible usage of natural language in the domain. Because database objects can only be a grossly simplistic portion of the natural vocabulary, keywords must shoulder the burden of representing naturalness. Their number could increase exponentially as the number of users and usage patterns increase.
- a new approach according to one embodiment of the invention recognizes implicit enumeration-evaluation as a basic solution paradigm to the problem of natural language queries.
- the new approach includes designing a reference dictionary of concepts that integrates enterprise information models and contextual knowledge with user-oriented keywords and past cases of usage, to provide the logical structure of natural queries for the enterprise databases concerned.
- the new design affects two vital functions: (1) generate all possible interpretations of a natural query suitable for evaluation, and (2) stem the complexity and growth of keywords.
- a new NLI method according to one embodiment of the invention uses a branch-and-bound algorithm to search for an optimal interpretation of the natural query based on the logical structure. Case-based reasoning adds to the search to achieve maximum naturalness with minimum enumeration.
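- the general idea of a branch-and-bound search over interpretations can be sketched as follows. This is a generic illustration, not the search procedure of the invention: it assumes that each ambiguous term has a few candidate interpretations with non-negative costs and that linking two consecutive choices adds a non-negative cost. The function `branch_and_bound`, the cost values, and the object names are all hypothetical.

```python
def branch_and_bound(candidates, link_cost):
    """Pick one interpretation per term so that total cost is minimal.

    candidates: list with one {interpretation: cost} dict per ambiguous term.
    link_cost:  function (previous, current) -> non-negative linking cost.
    Returns (best_cost, best_choice).
    """
    n = len(candidates)
    # Optimistic bound: cheapest option of every remaining term, no link cost.
    min_rest = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_rest[i] = min_rest[i + 1] + min(candidates[i].values())

    best_cost, best_choice = float("inf"), None

    def search(i, cost, chosen):
        nonlocal best_cost, best_choice
        if cost + min_rest[i] >= best_cost:
            return  # bound: this branch cannot beat the best solution so far
        if i == n:
            best_cost, best_choice = cost, list(chosen)
            return
        # Branch: try the cheaper interpretations of term i first.
        for interp, c in sorted(candidates[i].items(), key=lambda kv: kv[1]):
            extra = link_cost(chosen[-1], interp) if chosen else 0
            chosen.append(interp)
            search(i + 1, cost + c + extra, chosen)
            chosen.pop()

    search(0, 0, [])
    return best_cost, best_choice


# Hypothetical example: two ambiguous terms and a table of cheap links.
EXAMPLE_CANDIDATES = [
    {"ORDER.date": 1, "SHIPMENT.date": 2},
    {"CUSTOMER.name": 1, "SUPPLIER.name": 1},
]
CHEAP_LINKS = {
    ("ORDER.date", "CUSTOMER.name"): 0,
    ("SHIPMENT.date", "SUPPLIER.name"): 0,
}


def example_link_cost(previous, current):
    # Pairs not listed as cheap links are assumed to cost 3 to connect.
    return CHEAP_LINKS.get((previous, current), 3)
```

- in this example the whole `SHIPMENT.date` branch is pruned without being expanded, because its optimistic bound of 3 already exceeds the cost 2 of the first complete solution found.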
- the proposed project formulates for the first time the NLI research as a search problem and relates it to the vast tradition of constrained optimization (e.g., scheduling and traveling salesman). Because, according to one embodiment of the invention, constrained optimization is used, the solution to the natural language query problem is bounded. Drawing ideas and strengths from this tradition as well as from the literature of NLI, the new metadata search model according to various aspects of the invention is able to add to the paradigm of logical structures for natural language processing.
- Figure 2 shows a natural language query processor 201 according to one embodiment of the invention.
- Processor 201 receives a natural language query 202 and a plurality of database objects 204, and produces a query result.
- the natural language query may be, for example, a paragraph, a sentence, sentence fragment, or a plurality of keywords.
- the query result may be any information that is relevant to the combination of database objects 204A and query 202.
- the natural language query 202 is mapped to the plurality of database objects 204A using a reference dictionary 208 comprising keywords 209, case information 210, information models 211, and database object values 204B.
- This advantage enables, for example, use of such an NLI on a portable device such as PalmPilot, or other portable device that has limited storage capability. Also, because the portable device may be allocated to a single user, and processor 201 is capable of learning using case-based learning, processor 201 may become more accurate for the particular user.
- query processor 201 includes a reference dictionary object identifier 205 that parses query 202 and generates one or more objects recognized in the reference dictionary 208.
- Reference dictionary object identifier 205 also identifies words that are meaningful in the reference dictionary 208 and eliminates useless or meaningless words.
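- the behavior described for identifier 205 can be sketched as a longest-match scan over the query that keeps phrases found in the dictionary and drops every other word. This is a minimal sketch, not the identifier of the invention; the dictionary entries, the function `identify_objects`, and the three-word phrase limit are assumptions.

```python
import re

# Hypothetical reference dictionary: phrase -> (kind, database object).
REFERENCE_DICTIONARY = {
    "order": ("entity", "ORDER"),
    "customer": ("entity", "CUSTOMER"),
    "due date": ("item", "ORDER.due_date"),
}


def identify_objects(query, dictionary=REFERENCE_DICTIONARY, max_len=3):
    """Return (phrase, object) pairs for dictionary phrases found in the query,
    preferring the longest phrase at each position; other words are dropped."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                found.append((phrase, dictionary[phrase]))
                i += n
                break
        else:
            i += 1  # word is not meaningful in the dictionary: eliminate it
    return found
```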
- Processor 201 also accepts and processes a number of database objects 204A.
- processor 201 may have an associated reference dictionary 208 that includes keywords 209, case information 210, information models 211 and one or more database objects 204B.
- Keywords 209 may be, for example, a set of keywords and their combinations generated from the plurality of database objects 204A, which includes one or more objects 214A-214ZZ. Keywords 209 may also be "learned" from a user through performing queries, or may be provided through a separate keyword administrator interface associated with query processor 201.
- Query processor 201 also includes an interpreter and dictionary processor 207 that receives objects identified by the reference dictionary object identifier 205 and determines an optimal interpretation of the received objects. More specifically, processor 207 determines optimal interpretations of the received objects, resolves ambiguities, updates information models 211, and interacts with users to facilitate learning. Processor 207 utilizes rules 212 and heuristics 213 to resolve ambiguities in determining the optimal interpretation of query 202. Rules 212 and heuristics 213 may relate to information models 211, which are in turn related to keywords 209, cases 210, and database objects 204B in a semantic manner. When there are ambiguities in the interpretation of objects, e.g. multiple possible interpretations, multiple permissible combinations of meaningful objects, etc., rules 212 and heuristics 213 related to these objects are used to reduce or resolve these ambiguities.
- Mapping processor 206 performs a mapping between incoming objects and database objects 204A.
- processor 206 may generate database queries from the objects and the interpretations provided by identifier 205 and processor 207, respectively.
- Processor 206 may, for example, generate SQL queries used to locate database objects 204A. These queries may be executed by an SQL search engine, and processor 201 may provide query result 203 to the user through, for example, a graphical user interface.
- Literature on NLI clearly indicates that establishing a complete set of keywords is a key factor in handling ambiguity.
- additional information beyond keywords is used to determine the meaning of an input query. This additional information makes it possible to use a collection of keywords far smaller than those required by conventional NL processing methods.
- resources comprising a data dictionary that are used to relate an incoming query 202 to database objects 204A; i.e., cases 210, keywords 209, information models 211, and database object values 204B.
- These resources may be integrated through an extensible metadata representation method so that every resource references all other related resources in a semantically-based graphic.
- a keyword 209 points to the semantic subject(s) it refers to, which points in turn to entities, relationships, and items pertaining to the subject(s), and ultimately to database object values 204B.
- the keywords 209 also connect to cases 210 involving them.
- the core of the reference dictionary (information model, initial keywords, and database structure) may be, for example, a design-time product, developed by the analysts, designers, and users. Cases and additional keywords, metadata (e.g., changes to the information model) and database values may be added during operation of the system, and thus the system ages and evolves. A learning mechanism allows richer keywords and cases to provide more accurate performance.
- the reference dictionary enables a computer system to recognize a feasible region of interpretations of the input query 202 and evaluate them.
- the reference dictionary 208 also serves as the basis for interaction with the user (identifying needs and generating meaningful reference points) and acquisition of lessons (determining additional keywords and cases) - i.e., the reference dictionary may be used to assist the user in learning.
- a reference dictionary according to one embodiment of the invention has four fundamental attributes, as compared to conventional systems: it generates a search-ready, graphics-based representation of all four layers of resources; supports learning; simplifies keywords; and assures complete interpretations of natural queries. Regarding the last two points, the inclusion of information models 211 and case information 210 reduces the volume of keywords 209 needed to reduce the first two sources of ambiguity. For example, consider a natural articulation in the form of a short essay. If the essay consists of n words of which m are database objects or other recognized dictionary entries, there could be n/m words associated with each known term. These n/m words become the candidate keywords for the term. When including phrases (groupings of words), there could be, in theory, up to m*(n/m)! new keywords implied from the short essay. It is desired to increase the number m (hits) because the bigger m becomes, the fewer
- Properly-developed information models 211 having rich semantics provide a large m for the initial design of keywords, and increase the chance of subsequent "hits" (their use in queries) in practice, resulting in less ambiguity, fewer possible interpretations to search, and fewer new keywords needed.
- Case information 210 does not directly change m, but does help resolve some ambiguity and hence still helps reduce the need for new keywords.
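The effect of m on the theoretical bound m*(n/m)! can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes m divides n evenly so that n/m is a whole number.

```python
from math import factorial


def candidate_keyword_bound(n: int, m: int) -> int:
    """Theoretical upper bound m * (n/m)! on new keywords implied by an
    n-word essay that contains m recognized terms.

    Assumption of this sketch: m > 0 and m divides n evenly.
    """
    if m <= 0 or n % m != 0:
        raise ValueError("this sketch assumes m > 0 and m divides n")
    return m * factorial(n // m)


# For a 12-word essay, a larger number of hits m shrinks the bound quickly.
bounds = {m: candidate_keyword_bound(12, m) for m in (2, 3, 4, 6)}
print(bounds)
```

With n fixed, each additional recognized term reduces the number of words attached to any one term, which is why a richer information model lowers the number of candidate keywords.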
- Information models 211 and cases 210 represent a tightly structured, efficient kernel of meaning with which the users are familiar and tend to use more frequently in their articulation with respect to the particular databases.
- information models 211 and case information 210 also contribute to resolving another type of ambiguity. In particular, they identify the possible missing information for incomplete input, by examining the graphics of the reference dictionary. Therefore, a reference dictionary determines more accurately and quickly than conventional systems a complete set of possible interpretations for queries articulated in a natural language format.
- Step 1 Identify all words and phrases in the input natural language query 202 that also belong to R. Denote this set of elements I (including possibly elements from K, M or D).
- Step 2 Determine all possible, complete paths implied by I that span all input elements and query 202 and belong to the overall graphics of R. These paths might include additional elements inferred from the reference dictionary in order to complete the paths. A complete path includes elements (original or inferred) in M and D. Each path corresponds to a particular interpretation of the original query.
- Step 3 Search for the best interpretation by using branch and bound methods when multiple possible solutions exist. If multiple possible solutions exist, the elements in C that are associated with elements of I are used to resolve the ambiguity.
- Step 4 Map the result to the database query language. Obtain the results of the query and confirm them with the user.
- a learning mechanism may be engaged to interact with the user whenever the result provided at each step is insufficient.
- the outcome of the learning is stored in the system 201 as new cases and keywords added to C and K, respectively.
- each step allows for a wide range of possible strategies and algorithms to implement it.
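As one possible strategy for Step 1, the words and phrases of the query that belong to R can be found with a longest-phrase-first scan. This is a minimal sketch, not the method of the invention: the function name, the dictionary content, and the single-letter layer labels (K, M, D) attached to each entry are assumptions made for illustration.

```python
def identify_elements(query, reference, max_len=3):
    """Return (phrase, layer) pairs for the words and phrases of the query
    that appear in the reference dictionary. Longer phrases are tried first;
    words that are not in the dictionary are dropped as not meaningful."""
    words = query.lower().replace(",", " ").replace("?", " ").split()
    found, i = [], 0
    while i < len(words):
        for size in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + size])
            if phrase in reference:
                found.append((phrase, reference[phrase]))
                i += size
                break
        else:
            i += 1  # no phrase starting here is in R
    return found


# Hypothetical dictionary content: K = keyword, M = information model, D = database value.
R = {"orders": "K", "customer": "M", "john smith": "D", "order id": "M"}
elements = identify_elements("Get the order id of customer John Smith", R)
print(elements)
```

The resulting list plays the role of the set I in Step 1; Steps 2 to 4 would then operate on these elements.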
- Reference dictionary 208 may also be based on a metadatabase model described in more detail below with respect to Figure 4.
- a reference dictionary having a model that integrates four different types of enterprise metadata may be used. These metadata types include: database structure, semantic model, application, and software resource.
- the model may be used to form a core of the reference dictionary, and this core may be extended to include other three layers: keywords, cases and database values, and hence form the integrative (connected) structure of the reference dictionary.
- other benefits of using this model include its capability to incorporate rules and to support global query processing across multiple databases.
- a modeling system helps the development and creation of the metadatabase.
- A structure of an example reference dictionary 301 is shown in Figure 3.
- Each object in the figure represents either a table of metadata (in the case of square icon and diamond icon), or a particular type of integrity control rules (in the case of double diamond and broken diamond).
- These metadata include subjects and views, entity-relationship models, contextual knowledge in the form of rules, application and user definitions, database definitions and values, keywords, and cases.
- Keywords are the natural words and phrases users use to refer to database objects and information model elements in natural articulation such as a natural language query. They could represent instances, operators, items (attributes), entities, relationships, subjects, and applications.
- a keyword according to one embodiment of the invention is defined as an ordered pair of (class, object). Classes include Application, Subject, EntRel (entity-relationship), Item, Value, and Operator; all of which are metadata tables shown in Figure 3. Objects are instances (contents) of these classes. Because a hierarchy of objects in the core structure of the reference dictionary is Item-EntRel-Subject-Application, an object can be identified by an ordered quadruple (Item name, EntRel name, Subject name, Application name). In the model, however, each object has a unique identifier, thus the ordered quadruple is not needed to uniquely identify each object. It should be understood that any method for identifying objects may be used.
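The (class, object) pair and the two ways of identifying an object can be pictured as simple record types. The sketch below is an assumption-laden illustration: the type names, field names, and sample values are hypothetical and are not taken from the metadata tables of Figure 3.

```python
from dataclasses import dataclass
from enum import Enum


class KeywordClass(Enum):
    APPLICATION = "Application"
    SUBJECT = "Subject"
    ENTREL = "EntRel"
    ITEM = "Item"
    VALUE = "Value"
    OPERATOR = "Operator"


@dataclass(frozen=True)
class DictionaryObject:
    object_id: str       # unique identifier, sufficient on its own
    item: str
    entrel: str
    subject: str
    application: str

    def quadruple(self):
        """Alternative identification along Item-EntRel-Subject-Application."""
        return (self.item, self.entrel, self.subject, self.application)


@dataclass(frozen=True)
class Keyword:
    text: str            # the natural word or phrase the user types
    cls: KeywordClass
    obj: DictionaryObject

    def as_pair(self):
        """The ordered pair (class, object), using the unique identifier."""
        return (self.cls.value, self.obj.object_id)


# Hypothetical sample entry.
order_id = DictionaryObject("item_42", "order_id", "ORDER_HEADER", "Order Processing", "Sales")
kw = Keyword("order number", KeywordClass.ITEM, order_id)
print(kw.as_pair(), order_id.quadruple())
```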
- system 101 may use a database to store keywords and other information.
- the database is a metadatabase, which is well-known in the art of data and knowledge management tools. Metadatabase theory is described in more detail in a number of books and publications, including the book entitled Enterprise Integration and Modeling: The Metadatabase Approach, by Cheng Hsu, Kluwer Academic Publishers, Amsterdam, Holland and Boston, Massachusetts, 1996. Also, metadatabase theory is described in the journal article by Hsu, C., et al. entitled The Metadatabase Approach to Integrating and Managing Manufacturing Information Systems, Journal of Intelligent Manufacturing, 1994, pp. 333-349. In conventional systems, metadatabase theory has traditionally been applied to manufacturing problems. A metadatabase contains information about enterprise data combined with knowledge of how the data is used. The metadatabase uses this knowledge to integrate data and support applications.
- the metadatabase model as shown in Figure 4 uses a structure that shows how a metadatabase system 402 provides an enterprise information model describing data resources of globally-distributed provider systems applications and their control strategy in the form of rules. These globally-distributed systems applications may be executed, for example, at one or more provider systems discussed above.
- the information model also includes knowledge regarding dynamics of information transfer such as "what and how" information is shared among local systems and under what circumstances it is used.
- the information model may be in the form of a metadatabase 401 having data items 404, models 405, rules 406, software resources 407 and application and user information 408.
- a case in a case-based reasoning paradigm typically includes three components: problem definition, solution, and its outcome. New problems would use the problem definition to find the (best) matching cases and apply the associated solutions to them.
- the third component is useful when the domain knowledge is incomplete or unpredictable.
- the reference dictionary contains the complete domain knowledge needed; thus, the problem definition is expanded and the outcome component is dropped.
- the system uses cases to resolve ambiguity in the recognition of meaningful terms (i.e., user's natural terms that are included in the reference dictionary) in the input and to help determine the solution among multiple possible interpretations.
- the case structure includes case-id, case-type, choices, context, and solution.
- a set of known terms describes the context (for problem definition). User's selection among possible choices of the meaningful term defines the solution.
- a set of known elements of the information model describes the context, possible paths in the information model define the choices, and the user's selection defines the solution.
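The five-part case structure (case-id, case-type, choices, context, solution) maps naturally onto a record type. The following is a minimal sketch under stated assumptions: the type labels "term" and "path", the lookup rule, and all sample values are hypothetical, chosen only to show how a stored selection could be reused.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Case:
    case_id: str
    case_type: str        # "term" or "path"; these two labels are assumptions
    context: frozenset    # known terms or information-model elements
    choices: tuple        # candidate meanings or candidate paths
    solution: str         # the choice the user selected


def recall_solution(cases, context, choices) -> Optional[str]:
    """Return a stored solution whose context is covered by the current
    context and which is still among the current choices, else None."""
    current = frozenset(context)
    for case in cases:
        if case.context <= current and case.solution in choices:
            return case.solution
    return None


cases = [Case("c1", "term", frozenset({"order", "customer"}),
              ("ORDER_HEADER", "WORK_ORDER"), "ORDER_HEADER")]
picked = recall_solution(cases, {"order", "customer", "model"},
                         ("ORDER_HEADER", "WORK_ORDER"))
print(picked)
```

A subset test is the simplest possible matching rule; a graded measure of fit between a query and a case could replace it without changing the record layout.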
- the resources (entries) of the reference model are connected in two ways.
- the structure shown in Figure 3 may be a meta-schema representing the types and organization of all enterprise metadata.
- the elements of information models are metadata instances stored in some of the meta-entities (squares) and meta-relationships (diamonds) of the structure.
- These model elements are themselves connected internally in terms of their entity-relationship semantics. They are also connected externally to other types of resources including database values, keywords, and cases through the meta-schema. Keywords and cases are connected to information models and database values through particular meta-relationships.
- elements of information models (subjects, entities, relationships, and items) and keywords are linked to the database objects they represent. Therefore, the reference dictionary contains sufficient knowledge to determine the database objects involved and required for all queries defined sufficiently in information model elements or keywords.
- Each sufficient statement corresponds to a complete and unique path (connection) of these elements and their corresponding database objects.
- An SQL-like style database called MSQL may determine the shortest path when alternative paths exist. MSQL is discussed further in the journal article entitled The Model-Assisted Global Query System for Multiple Databases in Distributed Enterprises, ACM Trans. Information Systems, 14:4, October 1996, pp. 421-470.
- These complete paths represent the system's interpretations of users' queries.
- Ambiguity exists when a statement is insufficient such that there are conflicting interpretations - multiple paths leading to different database objects - for the query.
- These multiple paths could result from providing incomplete elements, from providing conflicting elements implied in the input, or both. Such cases arise easily with truly natural articulation of database queries.
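The notions of a complete path, a shortest path, and ambiguity as multiple paths leading to different database objects can be illustrated with a plain breadth-first search. This sketch does not reproduce MSQL; the graph content, the vertex naming scheme (kw:, ent:, db:), and the function names are assumptions.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of vertices from
    start to goal, or None when goal cannot be reached."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbor in graph.get(vertex, ()):
            if neighbor not in previous:
                previous[neighbor] = vertex
                queue.append(neighbor)
    return None


def interpretations(graph, term, database_objects):
    """Map each reachable database object to its shortest path from term."""
    paths = {obj: shortest_path(graph, term, obj) for obj in database_objects}
    return {obj: path for obj, path in paths.items() if path is not None}


# Hypothetical fragment of a reference dictionary graph.
graph = {
    "kw:order": ["ent:ORDER_HEADER", "ent:WORK_ORDER"],
    "ent:ORDER_HEADER": ["db:ORDER_HEADER"],
    "ent:WORK_ORDER": ["db:WORK_ORDER"],
}
found = interpretations(graph, "kw:order",
                        ["db:ORDER_HEADER", "db:WORK_ORDER", "db:PART"])
ambiguous = len(found) > 1  # more than one database object is reachable
print(found, ambiguous)
```

Here the keyword reaches two different database objects, so the statement is insufficient and further resources (information model, cases, or the user) are needed to choose between them.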
- the system employs a rich information model to maximize the chance with which the users would naturally choose its elements in their articulation.
- the system uses keywords to capture the words in the natural articulation that the information model misses.
- the information model is the roadmap (together with database values) for developing keywords at design time. These keywords represent multiple natural equivalents of terms used in the information model (and database values).
- a rich information model not only lessens the burden of "scoring hits" on the keywords, it also greatly reduces the complexity of adding new keywords at run time.
- it accumulates cases of usage from actual operation and applies them to resolve remaining ambiguity when both information model and keywords are insufficient for a query.
- the reference dictionary contains different interpretations related to the query; interpretations being represented by multiple paths leading to different database objects for the query. Thus, the reference dictionary allows the system to definitively measure and identify ambiguities, based on the following graphical model of the logical structure of the metadata.
- Search includes the identification of all possible paths-interpretations (when ambiguity exists) and the evaluation of them.
- a search algorithm could follow a branch-and-bound strategy to minimize the space of search (limiting the number of possible paths to search).
- bounds and branching rules would require a way to evaluate a given path with respect to the original natural query.
- a method for eliminating paths may also be used; that is, the system could infer contradiction based on the information model and perhaps operational rules (contextual knowledge) the reference dictionary contains.
- a method of optimization - inferring goodness of fit for the user - could be performed. Information about user's profile, concerned applications, and past cases are among the metadata that could be used to form a basis to identify the most probable interpretations. Elimination is more conservative, but more robust, than optimization because elimination places safety (correctness) first.
- the logical structure of the reference dictionary may be represented graphically according to one embodiment of the invention.
- Interpretations of a natural language query are defined on a graph G (Definition 1 below), abstracted from the (content of) reference dictionary.
- Given a natural language query Q (Definition 2), the natural language interface performs interpretation in several steps. It first determines all recognized terms (Definition 3) contained in Q and their corresponding recognized vertex sets (Definition 4) in G. It then identifies all query images (Definition 5) of Q on G. Since Q may be ambiguous (e.g., incomplete and multi-valued mapping) to G, each of its recognized terms could correspond to multiple recognized vertices, resulting in multiple query images.
- a recognized vertex may not always connect to other recognized vertices in a way covering a complete range of data semantics (database value, attribute, entity, and relationship) with a unique path. Therefore, it could have multiple semantic paths (Definition 6) covering multiple semantic domains (Definition 7). Taking these data semantics into account results in all possible query graphs for a query image, called feasible graphs (Definition 8). The refinement of feasible graphs leads to connected feasible graphs (Definition 9) and complete query graphs (Definition 10).
- a complete query graph represents an interpretation of the natural language query Q according to the logical structure G.
- the branch and bound algorithm searches implicitly all possible interpretations to determine the final query graph for execution.
- the logical structure G is a graph <V, E>, where sets V and E are defined on the reference dictionary.
- V is a set of vertices of five types: subjects, entities, relationships, attributes, and values; and E is a set of their connection constraints (owner-member and peer-peer associations).
- Owner-member constraints belong to two types: subject-(sub)subject-entity/relationship-attribute-value and subject-attribute.
- Peer-peer constraints belong to three types: entity-entity, entity-relationship, and relationship-relationship.
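The vertex types and connection constraints of the logical structure G can be encoded as a small validity check on edges. This is a sketch only: reading the owner-member chain as a set of adjacent (owner, member) type pairs is an interpretation, and the names `edge_allowed`, `build_graph`, and the sample vertices are hypothetical.

```python
# Owner-member pairs, read off the chain
# subject-(sub)subject-entity/relationship-attribute-value, plus subject-attribute.
OWNER_MEMBER = {
    ("subject", "subject"), ("subject", "entity"), ("subject", "relationship"),
    ("entity", "attribute"), ("relationship", "attribute"),
    ("attribute", "value"), ("subject", "attribute"),
}
# Peer-peer pairs are unordered.
PEER_PEER = {
    frozenset({"entity"}),                    # entity-entity
    frozenset({"entity", "relationship"}),    # entity-relationship
    frozenset({"relationship"}),              # relationship-relationship
}


def edge_allowed(owner_type, member_type):
    """True when an edge between the two vertex types satisfies a constraint."""
    return ((owner_type, member_type) in OWNER_MEMBER
            or frozenset({owner_type, member_type}) in PEER_PEER)


def build_graph(vertices, edges):
    """vertices maps a vertex name to its type; edges is a list of name pairs."""
    for a, b in edges:
        if not edge_allowed(vertices[a], vertices[b]):
            raise ValueError(f"edge {a}-{b} violates the connection constraints")
    return vertices, set(edges)


# Hypothetical sample content.
V = {"Sales": "subject", "CUSTOMER": "entity", "PLACES": "relationship",
     "cust_name": "attribute", "John Smith": "value"}
G = build_graph(V, [("Sales", "CUSTOMER"), ("CUSTOMER", "PLACES"),
                    ("CUSTOMER", "cust_name"), ("cust_name", "John Smith")])
```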
- Definition 2 A natural language query Q is a string of characters segmented by spaces.
- Definition 3 A recognized term ti of Q is a segment of Q matching some keywords in the reference dictionary or some vertices in graph G.
- a recognized vertex set of a recognized term ti, denoted Vti, is a set of vertices of G that match ti.
- a member of Vti is a recognized vertex.
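Recognized terms, their recognized vertex sets, and the multiplication of query images under ambiguity can be sketched with a Cartesian product. The definition of a query image is not reproduced in this section, so the sketch assumes that a query image selects one recognized vertex for each recognized term; the index content and function names are likewise hypothetical.

```python
from itertools import product


def recognized_vertex_sets(query, index):
    """index maps a keyword or vertex name to the set of vertices of G it
    matches. Returns the recognized vertex set of each recognized term."""
    terms = [word for word in query.lower().split() if word in index]
    return {term: sorted(index[term]) for term in terms}


def query_images(vertex_sets):
    """Assumed reading of a query image: one recognized vertex per term."""
    terms = list(vertex_sets)
    combos = product(*(vertex_sets[term] for term in terms))
    return [dict(zip(terms, combo)) for combo in combos]


# Hypothetical index: "order" is ambiguous, "model" is not.
index = {"order": {"WORK_ORDER", "ORDER_HEADER"}, "model": {"PART.model"}}
sets_ = recognized_vertex_sets("get model of order", index)
images = query_images(sets_)
print(len(images), images)
```

A term with k matching vertices multiplies the number of query images by k, which is why the later search steps bound and prune the space instead of evaluating every image in full.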
- a semantic path of a recognized vertex vi is a minimum set of vertices in G containing vi that satisfies the following conditions: it contains a subject vertex and its vertices are connected. The vertices it contains, other than vi, are all implied vertices by vi to provide an interpretation (semantics) of vi.
- Definition 7 A semantic domain of a semantic path in graph G is a sub-graph of G that includes the semantic path, its member vertices, and edges connecting them.
- Definition 8 A feasible graph of an n-recognized-vertex query image, where n
- sdi, i = 1, ..., n, and sdi is a semantic domain implied by a semantic path of recognized vertex vi of the query image.
- a connected feasible graph is a connected sub-graph of G containing the feasible graph and a collection of entity/relationship vertices and edges in this graph. This collection is minimally sufficient to connect entity/relationship vertices of the feasible graph.
- a query graph is a connected feasible graph. It represents an interpretation of a query that is definitive for a database query language such as QBE or SQL to process the query.
- the system used the well-known TSER - Two Stage Entity-Relationship method (whose constructs include Application-Subject-Context-Entity-Relationship-Item, described in more detail in the journal article entitled Paradigm Translations in Integrating Manufacturing Engineering Using a Meta-Model: the TSER Approach, by Cheng Hsu et al., J. Information Systems Engineering, 1:1, September 1993, pp. 325-352) to develop the information models and create a reference dictionary.
- the information model would be sufficient to sort out the ambiguity and suggest a unique, optimal interpretation for these terms, and hence for the natural query. Still, cases could also be used either to confirm or to assist the resolution of ambiguity.
- there may be another kind of ambiguity in the input: the user indicated "around" the 20th of last December in the original natural query. Because of this ambiguity, the user may find the final answer less than satisfactory.
- the system generally would have no method for correctly interpreting this piece of input since the user herself was ambivalent about it. There may be, in this instance, no proper solution other than leaving the interpretation to the user.
- the final answer (based on 12/20/1999) may represent the best point estimation for the user's fuzzy interval of possibilities.
- cases - i.e., matching a query with a case - is based on the vector space model as is known in the art.
- Two binary vectors represent a case (C) and a query (Q); and their COSINE measure indicates the goodness of fit.
- Form a term space as an ordered n-tuple of terms /* n is the number of terms in the base set */
- Form a binary vector for query (Q) corresponding to the meaning space
- $\mathrm{COSINE}(Q, C) = \dfrac{Q \cdot C}{\lVert Q \rVert \, \lVert C \rVert}$
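The cosine measure of the vector space model, the dot product of the two vectors divided by the product of their lengths, is straightforward to compute for binary vectors. The sketch below uses a hypothetical term space and hypothetical case and query terms; only the measure itself is standard.

```python
from math import sqrt


def binary_vector(terms, term_space):
    """1 where a term of the ordered term space is present, else 0."""
    present = set(terms)
    return [1 if term in present else 0 for term in term_space]


def cosine(q, c):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(q, c))
    norm = sqrt(sum(a * a for a in q)) * sqrt(sum(b * b for b in c))
    return dot / norm if norm else 0.0


# Hypothetical term space, query terms, and case terms.
term_space = ["order", "customer", "model", "date"]
Q = binary_vector(["order", "customer"], term_space)
C = binary_vector(["order", "customer", "date"], term_space)
fit = cosine(Q, C)
print(Q, C, round(fit, 4))
```

A value of 1 means the case and the query use exactly the same terms; the case with the highest value is the best candidate for resolving the ambiguity.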
- mapping performs processing in order to determine the GET lists and some conditions (such as AND/OR). However, at this point, the reference model would have all information needed to perform the query.
- a branch and bound algorithm including the optimal search strategies and evaluation rules.
- a reliable idea is to use a method of elimination; that is, the system could infer contradiction based on the information model and operational rules (contextual knowledge) to remove certain interpretations.
- the method of optimization - inferring goodness of fit for the user - could be another possibility.
- Information such as user profile, concerned applications, and past cases is among the metadata that could help identify the most probable interpretations.
- a method is provided to enumerate all possible interpretations for a natural query, the optimal evaluation methods to improve the performance of search, and case-based reasoning and other heuristics to enhance user-feedback and assure closure.
- An interpretation problem is formulated as an optimization problem with objective function z(t) where t is a terminal vertex.
- a goal is to minimize z(t) with respect to graph G.
- An evaluation function LB(v) finds the lower bound for an intermediate vertex v, so that the search could either fathom all paths starting with v, or pick the most promising v to explore in more detail.
- the evaluation embodies operating rules and heuristics. Note that the search is complete; i.e. there always exists an optimal solution in the search space and the search will find it.
- Below is an example branch and bound algorithm that may be used according to one embodiment of the invention:
- successor_set = branch(current_vertex) // branch() returns with each successor's LB or z()
  For each successor in successor_set {
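A minimal best-first branch and bound in the terms used here (a branching function, a lower bound LB(v) for intermediate vertices, and an objective z(t) for terminal vertices) might look as follows. This is a sketch under stated assumptions, not the algorithm of the embodiment: it assumes LB never overestimates the best z reachable below a vertex, and the search tree, bounds, and costs are hypothetical.

```python
import heapq
from itertools import count


def branch_and_bound(root, branch, lower_bound, is_terminal, z):
    """Minimize z(t) over terminal vertices reachable from root."""
    best, best_cost = None, float("inf")
    tie = count()  # tie-breaker so the heap never compares vertices
    frontier = [(lower_bound(root), next(tie), root)]
    while frontier:
        bound, _, vertex = heapq.heappop(frontier)
        if bound >= best_cost:
            continue  # fathomed: nothing below can beat the incumbent
        if is_terminal(vertex):
            cost = z(vertex)
            if cost < best_cost:
                best, best_cost = vertex, cost
            continue
        for successor in branch(vertex):
            successor_bound = lower_bound(successor)
            if successor_bound < best_cost:
                heapq.heappush(frontier, (successor_bound, next(tie), successor))
    return best, best_cost


# Hypothetical search tree: query -> query images -> query graphs.
tree = {"Q": ["image1", "image2"], "image1": ["qg_a", "qg_b"], "image2": ["qg_c"]}
costs = {"qg_a": 10, "qg_b": 12, "qg_c": 11}        # z(t) for terminal vertices
bounds = {"Q": 0, "image1": 10, "image2": 11}       # LB(v) for intermediate vertices

result = branch_and_bound(
    "Q",
    branch=lambda v: tree.get(v, []),
    lower_bound=lambda v: costs.get(v, bounds.get(v, 0)),
    is_terminal=lambda v: v in costs,
    z=lambda v: costs[v],
)
print(result)
```

In this run the subtree under image2 is never expanded, because its lower bound already exceeds the cost of the incumbent solution.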
- an NL system processes an input according to a series of steps.
- the NL system may accept an input (such as a query string), determine recognized terms and their vertices, and determine minimal recognized terms and their minimal recognized vertices.
- the NL system may then search for minimum cost query graphs, eliminate redundant solutions and determine complete solutions, and translate these complete solutions to query language statements.
- the following are a series of steps and algorithms that may be used to interpret an input database query.
- Step 1 Determine recognized terms and their recognized vertices (Algorithm 1)
- Step 2 Determine minimal recognized terms and their minimal recognized vertices (Algorithm 2)
- Step 3 Search for minimum cost query graphs (Algorithm 3)
- Step 4 Remove redundant solutions (Algorithm 4)
- Step 5 Determine complete solutions (Algorithm 5)
- Step 6 Translate to SQL statement
- Algorithm 2 Determine minimal recognized terms and their minimal recognized vertices
  For each recognized term {
- Figure 5 shows an example complete search graph having a number of query images, query graphs, and feasible graphs arranged in a hierarchical structure.
- Algorithm 3 Search for minimum cost query graphs (query graph containing minimum vertices)
- Algorithm 5 Determine complete solutions
  for each solution {
    determine target attribute list
    determine er set
    determine join condition
    determine selection condition
  }
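Once the four parts named in Algorithm 5 are known, the translation to an SQL statement is largely mechanical. The sketch below shows one way to assemble the statement; the function name, the join columns, and the customer-name column are hypothetical and do not come from the example database.

```python
def to_sql(target_attributes, er_set, join_conditions, selection_conditions):
    """Assemble a SELECT statement from the four parts of a complete solution."""
    sql = f"SELECT {', '.join(target_attributes)} FROM {', '.join(er_set)}"
    conditions = list(join_conditions) + list(selection_conditions)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


statement = to_sql(
    target_attributes=["ORDER_HEADER.order_id", "PART.model"],
    er_set=["ORDER_HEADER", "ORDER_ITEM", "PART", "CUSTOMER"],
    join_conditions=[
        # Hypothetical join columns, for illustration only.
        "ORDER_HEADER.order_id = ORDER_ITEM.order_id",
        "ORDER_ITEM.part_id = PART.part_id",
        "ORDER_HEADER.cust_id = CUSTOMER.cust_id",
    ],
    selection_conditions=["CUSTOMER.cust_name = 'John Smith'"],
)
print(statement)
```

In a real system the selection values would be passed as bound parameters instead of being embedded in the string; the literal is kept here only to make the resulting statement easy to read.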
- Example Query Get order_id, model, wo_quan, and num_completed of John Smith's orders.
- the NL system determines recognized terms and their vertices. Below are example recognized terms and vertices corresponding to the example query above.
- _6, WORK_ORDER), (I opsMOO, ORDER_ITEM), (I sfcl_17, WORK_ORDER), (I sfcl_20, WORK_ORDER) ⁇ ValueJ ⁇ r_set ⁇ ( V opsl_90
- John Smith, CUSTOMER) } LB = 10. A corresponding feasible graph is shown in Figure 10.
- ErSet = { WORK_ORDER, PART, ORDER_ITEM, ORDER_HEADER,
- John Smith, CUSTOMER) } Z = 10
- John Smith, CUSTOMER) } Z = 10
- Solution 1 Query graph 1 of FG33
- Solution 2 Query graph 1 of FG12
- Solution 3 Query graph 1 of FG11
- Solution Set after removing redundancy Solution 1: Query graph 1 of FG33
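Removing redundant solutions amounts to keeping one representative of each distinct query graph. The sketch below treats two solutions as redundant when they have the same vertices and the same undirected edges; that equivalence rule, the function name, and the vertex and edge content of the three solutions are assumptions made for illustration.

```python
def remove_redundant(solutions):
    """solutions is a list of (label, vertices, edges). Keeps the first
    label of every distinct query graph, comparing edges as unordered pairs."""
    seen, unique = set(), []
    for label, vertices, edges in solutions:
        key = (frozenset(vertices), frozenset(frozenset(edge) for edge in edges))
        if key not in seen:
            seen.add(key)
            unique.append(label)
    return unique


# Hypothetical content: three feasible graphs that yield the same query graph.
vertices = {"WORK_ORDER", "ORDER_ITEM", "CUSTOMER"}
solutions = [
    ("Query graph 1 of FG33", vertices,
     [("WORK_ORDER", "ORDER_ITEM"), ("ORDER_ITEM", "CUSTOMER")]),
    ("Query graph 1 of FG12", vertices,
     [("ORDER_ITEM", "WORK_ORDER"), ("ORDER_ITEM", "CUSTOMER")]),
    ("Query graph 1 of FG11", vertices,
     [("ORDER_ITEM", "CUSTOMER"), ("WORK_ORDER", "ORDER_ITEM")]),
]
remaining = remove_redundant(solutions)
print(remaining)
```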
- a particular enterprise database or distribution of databases necessarily has only a finite number of database objects, including data types, data instances, and database computational operators (also referred to as database values). They give rise to only a finite number of possible permutations. All meaningful queries must ultimately refer to these database objects and the articulation of any queries must correspond to some of these permutations. When the system identifies the permutations implied, it has interpreted the query.
- Section 1 introduces a basic model of the reference dictionary as a graph of complete interpretations of natural language queries for use in enterprise databases and as a resource for resolving ambiguity.
- the graph incorporates four resource layers (database values, information models, keywords, and cases).
- Section 2 describes a search and learn process that identifies a single and complete interpretation (path) for a query by identifying all possible complete paths. This spans all recognized elements of the query as well as intermediate elements in this graph to complete the path, and finally evaluates them.
- users are engaged in dialogue in determining the correct path.
- users are engaged in dialogue in evaluating the result of the query and acquiring new keywords.
- a basic graph model of the reference dictionary is designed to serve three purposes: (1) to include all possible interpretations for a natural language query suitable for evaluation, (2) to provide a basis for evaluation (resolution of ambiguity), and (3) to support learning. It integrates four resource layers (database values, information models, keywords, and cases) into a single graph model. In this graph, therefore, every element can refer to all related elements.
- the design adopts the entity-relationship data model as a conceptual model because it is sufficient for modeling the semantics of database objects and because other database models can be translated into it. Therefore, in this graph, each vertex represents an entity or a relationship (an associative integrity constraint) and edges are connections based on their relationship constraints.
- the design follows three steps.
- the first step models database objects as a database graph.
- the second step models indirect references to these objects.
- the final step models additional resources for evaluation.
- S, R, A, and D respectively are the finite sets of entities, relationships, attributes, and values; and $S \cup R \cup A \cup D$ is a set of enterprise database objects (O).
- H is an entity set of high-level hierarchical objects
- $V = O \cup H$
- $E \subseteq (X \times Y) \cup (A \times D) \cup (H \times (H \cup Y))$.
- Edges $(H \times (H \cup Y))$ satisfy the following properties:
- T is an associative relationship set of contextual knowledge among elements of H
- $V = M \cup K \cup T$
- Edges $((H \cup P) \times (U \cup P))$ satisfy the property: for all $p \in P$, if p represents an association between $h \in H$ and $u \in U$, then edges $\{h, p\}$ and $\{u, p\} \in E$.
- $V = M \cup K \cup T \cup U \cup P \cup C \cup Z$ and $E \subseteq (X \times Y) \cup (A \times D) \cup ((H \cup T) \times (H \cup Y \cup T)) \cup (A \times F) \cup (K \times M) \cup ((H \cup P) \times (U \cup P)) \cup ((M \cup Z) \times (C \cup Z))$.
- An ambiguous (incomplete) input is an input that corresponds to more than one interpretation (represented by more than one path in the graph defined in Section 1.1 and Section 1.2(1), which this graph denotes as G).
- the ambiguity stems from a meaningful term (a word or phrase) in the input which refers to more than one element in the graph G. Therefore, there exists more than one target graph for each ambiguous input.
- a target graph is a subset of the set of vertices in the graph G, and its vertices correspond to the recognized terms of the input. Each of these target graphs may determine more than one query graph.
- a query graph is the connected graph of a target graph. It represents the complete path for each interpretation.
- a complete path is the path that spans all vertices of its target graph and may include intermediate vertices in the graph G to complete it.
- An unambiguous (complete) input is the input that corresponds to a unique complete path in the graph G.
- the basic idea of search is to enumerate all possible paths (query graphs) and to evaluate (rank) them to determine the best path (interpretation) for a query.
- in search, there are two sources of ambiguity: multiple meanings of a term in the input and multiple paths determined from a target graph. These sources can lead to a huge number of complete paths in the graph G. So, it is inefficient to generate all of these possible paths.
- the primary goal is to place correctness first while generating the minimal set of possible complete paths (i.e., correctness and efficiency are our goals).
- the goal is to determine the minimal set of candidate interpretations (minimizing the number of possible complete paths) without excluding any meanings of terms from the input, and to suggest the best interpretation for the user.
- the next task is to obtain a minimal set of query graphs corresponding to the obtained set of target graphs.
- the query graph of a target graph is the connected graph that spans all vertices of the target graph and intermediate vertices of G to connect them.
- the number of connections is minimal. This is based on the observation that semantically related terms tend to be near each other in G. Note that this problem can be modeled as a discrete optimization problem to determine the shortest path connecting all elements of a target graph. If there exists more than one target graph corresponding to a query, the next query graph will be generated based on the accumulated knowledge from the generation of the past query graphs. However, the new set of query graphs may contain identical elements. These can be reduced to obtain a minimal set of query graphs.
- the last task of the search process is to evaluate candidate query graphs (interpretations) to suggest the best interpretation for the user.
- This task follows three steps.
- the first step uses heuristics to measure the semantic relatedness of term meanings based on the same observations used in generating query graphs. Thus, the query graphs with the shortest path length (minimum number of edges) have the greatest semantic relatedness.
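The first evaluation step, selecting the query graphs with the fewest edges as the most semantically related, can be sketched in a few lines. The query graph names and edges below are hypothetical; a tie among the shortest graphs is the situation in which the later steps (usage history and past cases) would be needed.

```python
def minimum_length_query_graphs(query_graphs):
    """query_graphs maps a name to a collection of edges. Returns the names
    of the query graphs with the minimum number of edges, sorted."""
    shortest = min(len(edges) for edges in query_graphs.values())
    return sorted(name for name, edges in query_graphs.items()
                  if len(edges) == shortest)


# Hypothetical candidates for one query.
candidates = {
    "QG1": [("CUSTOMER", "ORDER_HEADER"), ("ORDER_HEADER", "ORDER_ITEM")],
    "QG2": [("CUSTOMER", "ORDER_HEADER"), ("ORDER_HEADER", "ORDER_ITEM"),
            ("ORDER_ITEM", "PART")],
    "QG3": [("CUSTOMER", "WORK_ORDER"), ("WORK_ORDER", "PART")],
}
best = minimum_length_query_graphs(candidates)
still_ambiguous = len(best) > 1
print(best, still_ambiguous)
```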
- the next two steps are employed if the result of the previous step is ambiguous.
- the second step identifies the solution based on high-level object-usage history of users (the extension of graph G defined in 1.3(2)).
- the last step identifies the solution based on the past cases (the extension of graph G defined in 1.3(3)).
- the above process may still lead to three results: no interpretation, one interpretation, and multiple interpretations.
- the system will resolve this ambiguity prior to presenting candidate interpretations back to the user to choose the correct one.
- the result of the best inte ⁇ retation is that which is evaluated by the user. If the user does not accept the result or there is no inte ⁇ retation of the query, the system will engage the user in a dialogue relying on the learning mechanism.
- Step 1 Identify all meaningful words or phrases in the input and their meanings (elements in G).
- Step 2 Determine all possible and complete paths (interpretations) implied by I 2 .
- CCG MI consistent target graphs
- Step 3 Evaluate candidate interpretations (query graphs).
- LQGMI minimum-length query graphs
- Step 3.2 Determine the set of most likely interpretations (when ambiguity remains after Step 3.1) such that, for every QG in this set, the measure of similarity between the set of high-level objects (H QG ) of QG in LQG MI and the set of high-level object usage of the user (H ) is maximal. If exactly one best interpretation exists based on this measure, go to Step 4.
- Step 3.3 Determine the set of most likely interpretations (when ambiguity remains after Step 3.2) such that, for every QG in this set, the measure of similarity between I and the set of terms describing the cases is maximal and the case solution is the most similar to the query graph QG. If multiple interpretations remain, ask the user to choose the correct interpretation.
- Step 4 Map the query graph to the database query language, resolve any semantic ambiguity resulting from query processing, obtain the query results, and confirm them with the user.
- Step 4.1 Determine the database query language (DBL) such that the DBL represents the final interpretation (QG, ⁇ SC> QG ).
- Step 4.2 Determine the best semantics of the DBL (if semantic ambiguity arises from query processing) such that the measure of similarity between its corresponding database graph and the case is maximal and the best semantics of the DBL is the solution of the case. If cases cannot resolve the ambiguity, ask the user to choose the correct one.
- Step 4.3 Obtain the query results and confirm them with the user. If the user accepts them, the lessons learned are recorded as new keywords (the extension of graph G defined in Section 3.1.2(3)) and cases (the extension of graph G defined in Section 1.3(3)). Otherwise, the system engages the user in a dialogue through the learning mechanism to acquire new keywords and then returns to Step 1.
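The query-graph construction and the first two evaluation steps described above can be illustrated with a small Python sketch. It is not the patent's implementation: the adjacency-dictionary representation of G, the function names `shortest_path`, `query_graph` and `rank`, the greedy nearest-target heuristic (an approximation, not an exact minimal connection), and the use of a simple vertex overlap count as the similarity measure for Step 3.2 are all assumptions made for illustration.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search in an unweighted graph; returns a vertex list or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parents[vertex]
            return path[::-1]
        for neighbor in graph.get(vertex, ()):
            if neighbor not in parents:
                parents[neighbor] = vertex
                queue.append(neighbor)
    return None


def query_graph(graph, targets):
    """Connect all target vertices through G with few edges (greedy heuristic).

    Returns a set of undirected edges (frozensets), or None if the
    targets cannot all be connected.
    """
    targets = list(targets)
    if not targets:
        return set()
    vertices = {targets[0]}
    edges = set()
    remaining = set(targets[1:]) - vertices
    while remaining:
        best = None
        # Find the shortest path from the partial graph to any remaining target.
        for target in sorted(remaining):
            for vertex in sorted(vertices):
                path = shortest_path(graph, vertex, target)
                if path is not None and (best is None or len(path) < len(best)):
                    best = path
        if best is None:
            return None
        edges.update(frozenset(pair) for pair in zip(best, best[1:]))
        vertices.update(best)
        remaining -= vertices
    return edges


def rank(candidates, user_history):
    """Return the names of the best candidates (non-empty dict: name -> edge set).

    Step 3.1 (sketch): keep the query graphs with the fewest edges.
    Step 3.2 (sketch): break ties by overlap with the user's usage history.
    """
    min_edges = min(len(edges) for edges in candidates.values())
    shortest = {n: e for n, e in candidates.items() if len(e) == min_edges}
    if len(shortest) == 1:
        return list(shortest)

    def overlap(edges):
        return len(set().union(*edges) & user_history)

    best = max(overlap(e) for e in shortest.values())
    return sorted(n for n, e in shortest.items() if overlap(e) == best)


# Hypothetical toy graph G; vertex names are illustrative only.
G = {
    "customer": ["order"],
    "order": ["customer", "part", "shipment"],
    "part": ["order", "supplier"],
    "supplier": ["part", "shipment"],
    "shipment": ["order", "supplier"],
}
candidates = {
    "customer-part": query_graph(G, ["customer", "part"]),
    "customer-supplier": query_graph(G, ["customer", "supplier"]),
}
print(rank(candidates, user_history=set()))
```

If `rank` returns more than one name, the ambiguity would carry over to the case-based comparison of Step 3.3 or, failing that, to the user, as the steps above describe.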
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
- Cash Registers Or Receiving Machines (AREA)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US20572500P | 2000-05-19 | 2000-05-19 | |
US205725P | 2000-05-19 | ||
PCT/US2001/016459 WO2001090953A2 (en) | 2000-05-19 | 2001-05-21 | Natural language interface for database queries |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1282870A2 (en) | 2003-02-12 |
Family
ID=22763376
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01937641A Withdrawn EP1282870A2 (en) | 2000-05-19 | 2001-05-21 | Natural language interface for database queries |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1282870A2 (en) |
AU (1) | AU2001263354A1 (en) |
CA (1) | CA2409734A1 (en) |
WO (1) | WO2001090953A2 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050283473A1 (en) * | 2004-06-17 | 2005-12-22 | Armand Rousso | Apparatus, method and system of artificial intelligence for data searching applications |
EP1914639A1 (en) * | 2006-10-16 | 2008-04-23 | Tietoenator Oyj | System and method allowing a user of a messaging client to interact with an information system |
US20130124194A1 (en) * | 2011-11-10 | 2013-05-16 | Inventive, Inc. | Systems and methods for manipulating data using natural language commands |
WO2013115985A2 (en) * | 2012-02-01 | 2013-08-08 | Siemens Corporation | Architecture for natural language querying in service analytics domains |
CN108804580B (en) * | 2018-05-24 | 2021-05-25 | 湖南大学 | Method for querying keywords in federal RDF database |
2001
- 2001-05-21 AU AU2001263354A patent/AU2001263354A1/en not_active Abandoned
- 2001-05-21 WO PCT/US2001/016459 patent/WO2001090953A2/en not_active Application Discontinuation
- 2001-05-21 CA CA002409734A patent/CA2409734A1/en not_active Abandoned
- 2001-05-21 EP EP01937641A patent/EP1282870A2/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See references of WO0190953A2 * |
Also Published As
Publication number | Publication date |
---|---|
CA2409734A1 (en) | 2001-11-29 |
WO2001090953A2 (en) | 2001-11-29 |
AU2001263354A1 (en) | 2001-12-03 |
WO2001090953A3 (en) | 2002-08-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20021127 |
| AK | Designated contracting states | Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
| AX | Request for extension of the european patent | Extension state: AL LT LV MK RO SI |
| RIN1 | Information on inventor provided before grant (corrected) | Inventor name: BOONJING, VEERA Inventor name: HSU, CHENG |
| 17Q | First examination report despatched | Effective date: 20031009 |
| RBV | Designated contracting states (corrected) | Designated state(s): AT BE CH DE FR GB LI |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 20050209 |
| RIN1 | Information on inventor provided before grant (corrected) | Inventor name: BOONJING, VEERA Inventor name: HSU, CHENG |