WO2011051970A2 - Method and system for obtaining semantically valid chunks for natural language applications

Info

Publication number: WO2011051970A2
Application number: PCT/IN2010/000693
Authority: WO
Inventors: Shailly Goyal; Shefali Bhat; Shailja Gulati; Chandrasekhar Anantaram
Original assignee: Tata Consultancy Services Ltd.
Priority date: 2009-10-28
Filing date: 2010-10-27
Publication date: 2011-05-05
Also published as: WO2011051970A3

Abstract

A method and system for obtaining semantically valid chunks for natural language applications are disclosed in the present invention. The method includes the following steps: identifying predicates, objects and comparison operators in a natural language query; binding the identified predicates and objects using the identified comparison operator; replacing all occurrences of string comparator operators in said natural language query with corresponding mathematical operators; binding predicates and objects of same data type; checking compatibility of bound predicates and objects using domain ontology; binding string objects to their compatible predicates using domain ontology; forming constraint predicate sets from the remaining predicates of the query in order to find semantically valid chunk sets from said natural language query; syntactically parsing the natural language query for resolving ambiguities; determining the depth between any two predicates using domain ontology, thereby providing a syntactically and semantically valid chunk set adapted to be used as a query.

Description

METHOD AND SYSTEM FOR OBTAINING SEMANTICALLY VALID CHUNKS FOR NATURAL LANGUAGE APPLICATIONS

FIELD OF THE INVENTION

The present invention relates to the field of natural language question answering systems.

Particularly, the present invention relates to the application of ontology in natural language question answering systems.

BACKGROUND OF THE INVENTION

Natural language (NL) enabled question answering systems for business applications aim at providing appropriate answers to the user queries. In such systems, query interpretation is a fundamental task. However, due to the innately ambiguous nature of the natural language, interpretation of a user's query is usually not straightforward. The ambiguity can be either syntactic, (for example, prepositional phrase (PP) attachment), or it can be semantic. In order to resolve such ambiguities, NL enabled question answering systems mostly use general purpose NL parsers. Although these parsers give syntactically correct chunks for a sentence, these chunks might not be semantically meaningful in a domain. This can be illustrated with the following queries:

• "Give the employees working in loss making projects". From this query, a human being can easily disambiguate that "loss making" is a modifier of "projects", and "working in loss making projects" is a modifier of "the employees". That is, the correct chunks can be represented as: "[Give [[the employees] [working [in [loss making [projects]]]]]]". However, the chunks obtained from a general purpose NL parser are "[Give [[[the employees] [working [in [loss]]]] [making [projects]]]]", which will be interpreted as "Give the employees who are working in loss and who make projects".

• "Give the projects having costing and billing >$25000 and <$35000, respectively". The general purpose NL chunker may interpret and chunk this query as "[Give [[the projects] [having [[costing] and [billing]] [>$25000] and [<$35000]]], respectively". From these chunks it is not possible to identify that "costing >$25,000" and "billing <$35,000" are the two constraints which modify "the projects".

Thus the chunks obtained from such general purpose NL parsers may not be helpful in extracting the answer to the user's query. The problem becomes even more severe in case of complex queries involving multiple constraints and nested sub questions. Thus, the problem is the requirement of a method to automatically enrich the output of a general purpose NL parser with the domain knowledge in order to obtain syntactically as well as semantically valid chunks for the queries in the domain.

Several attempts have been made to process natural language questions as disclosed in the documents given below.

United States Patent No. US6829603 (Androutsopoulos et al.) discloses the processing of natural language questions to obtain an equivalent structured query. However, the method disclosed in US6829603 cannot interpret real-life natural language questions properly. Some methods which can interpret real-life natural language questions properly depend so much on domain specific rules that porting to other domains becomes an issue. Popescu et al. (Paper: Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability, 2004) adapts the Charniak parser (Charniak et al., Paper: A maximum-entropy-inspired parser) for domain-specific question answering by extending the training corpus of the parser with a set of 150 hand-tagged domain specific questions. Further, semantic rules inferred from domain knowledge are used to check and correct preposition attachment and preposition ellipsis errors. Katz et al. (Paper: Syntactic and semantic decomposition strategies for question answering from multiple resources, (START), 2005) decomposes complex questions syntactically or semantically to obtain sub questions that can be answered from available resources. If these answers are not sufficient to solve the question, semantic information (in the form of rules that map 'key' domain questions to the answers) is used. The main drawback of these approaches is that the creation of domain specific rules is very resource intensive, and hence restricts portability.

Lopez et al. (Paper: AquaLog: An ontology-driven question answering system for organizational semantic intranets, 2006) tries to transform the NL question to ontology specific triples using syntactic annotations, semantic terms and relations, and question words to interpret the natural language question. If these cannot resolve the ambiguity in the question, domain ontology and/or WordNet are used to make sense of the input query. There have also been some attempts at adapting general purpose natural language POS (Parts of Speech) taggers or parsers for a given domain. Coden et al. (Paper: Domain-specific language models and lexicons for tagging, 2006) adds a small domain specific POS tagged corpus to a large general English training set to build a POS tagger for the specific domain. Miller et al. (Paper: Rapid Adaptation of POS Tagging for Domain Specific Uses, 2006) trains a generic domain POS tagger for biomedical texts by extending it with a lexicon that is updated to include domain-specific information based on the morphological rules specific to the domain. Pyysalo et al. (Paper: Lexical adaptation of link grammar to the biomedical sub language: a comparative evaluation of three approaches, 2006) adapts a general purpose English parser to suit domain specific sentences by adding domain specific terminology to the lexicon of a parser, and by providing the parser with domain specific morphological rules to predict the morpho-syntactic class of unknown words.

None of the abovementioned work and documents disclose methods to enrich a general purpose NL parser with domain knowledge to obtain semantically valid chunks for an input query. Therefore, it is felt that there is a need for a method and system for obtaining semantically valid chunks for natural language applications which:

• can interpret queries easily, efficiently and effectively;

• can parse the query correctly with regard to both syntax and semantics;

• has required domain knowledge; and

• can identify the predicate - object pairs of a query correctly.

OBJECTS OF THE INVENTION

It is an object of the present invention to provide a method and system for obtaining semantically valid chunks for natural language applications which can interpret queries easily, efficiently and effectively.

It is another object of the present invention to provide a method and system for obtaining semantically valid chunks for natural language applications which can parse the query correctly with regard to both syntax and semantics.

It is yet another object of the present invention to provide a method and system for enriching a general purpose natural language parser with the domain knowledge (in the form of domain ontology) so that the semantically valid chunks for natural language query can be obtained.

It is still another object of the present invention to provide a method and system for obtaining the correct predicate - object pairs of a natural language query so that the constraints in the query can be identified.

One more object of the present invention is to provide a method and system for identifying those syntactic parses of the natural language question which are semantically valid in the domain.

SUMMARY OF THE INVENTION

According to this invention, there is provided a system for obtaining semantically valid chunks for natural language application queries, said system comprises:

- predicate identifying means for identifying predicates in a natural language query;

- object identifying means for identifying objects in said natural language query;

- identification means for identifying comparison operators in said natural language query;

- first binding means adapted to bind said identified predicate with said identified objects using said identified comparison operator;

- mathematical operator dictionary means adapted to replace all occurrences of string comparator operators in said natural language query with corresponding mathematical operators;

- second binding means for binding predicates and objects of same data type;

- compatibility checking means for checking compatibility of bound predicates and objects using domain ontology;

- third binding means for binding string objects (not previously bound to any predicate) to their compatible predicates using domain ontology;

- constraint predicate set forming means for forming constraint predicate sets from the remaining predicates of the query in order to find semantically valid chunk sets from said natural language query;

- syntactic parsing means adapted to syntactically parse natural language query for resolving ambiguities; and

- depth determination means, using domain ontology, for determining depth between any two predicates, thereby providing a syntactically and semantically valid chunk set adapted to be used as a query.

Typically, said system includes a pre-defined default operator pre-fixing means adapted to pre-fix said default operator to said bound predicate and objects not having any numerical value in said natural language application.

Typically, said system includes grouping means for grouping said identified predicates and said identified objects that immediately follow/precede the POS tags of assignment words.

Typically, said third binding means includes compatibility checking means for checking compatibility of bound string objects with predicates using domain ontology.

According to this invention, there is provided a method for obtaining semantically valid chunks for natural language application queries, said method comprises the steps of:

- identifying predicates in a natural language query;

- identifying objects in said natural language query;

- identifying comparison operators in said natural language query;

- binding said identified predicate with said identified objects using said identified comparison operator;

- replacing all occurrences of string comparator operators in said natural language query with corresponding mathematical operators;

- grouping predicates and objects;

- binding predicates and objects of same data type;

- checking compatibility of bound predicates and objects using domain ontology;

- binding string objects (not previously bound to any predicate) to their compatible predicates using domain ontology;

- forming constraint predicate sets from the remaining predicates of the query in order to find semantically valid chunk sets from said natural language query;

- syntactically parsing natural language query for resolving ambiguities; and

- determining depth between any two predicates, using domain ontology, thereby providing a syntactically and semantically valid chunk set adapted to be used as a query.

Typically, said method includes the step of pre-fixing a default operator to said bound predicate and objects not having any numerical value in said natural language application.

Typically, said method includes the step of grouping said identified predicates and said identified objects that immediately follow/precede the POS tags of assignment words.

Typically, said step of binding string objects (not previously bound to any predicate) to their compatible predicates using domain ontology includes the step of checking compatibility of bound string objects with predicates using domain ontology.

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS

The method and system for obtaining semantically valid chunks (SVC) for natural language applications will now be described with reference to the accompanying drawings, in which:

Figure 1 illustrates the overview of the method in accordance with the present invention;

Figure 2 illustrates the flow diagram of the method of constraint identification in accordance with the present invention; and

Figure 3 illustrates the flow diagram of the method of semantically valid chunk set formation in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The drawings and the description thereof are merely illustrative of a method and system for obtaining semantically valid chunks for natural language applications and only exemplify the system of the invention and in no way limit the scope thereof.

The system in accordance with the present invention is robust enough to analyze, understand and comprehend the question posed to it and to come up with the appropriate answer. This requires correct parsing, chunking, constraints formulation and sub query generation. Although most general purpose parsers parse the query correctly, due to lack of domain knowledge, domain relevant chunks are not obtained. Therefore, the method in accordance with the present invention focuses mainly on enriching general purpose parsers with domain knowledge using domain ontology in the form of RDF. Constraints formulation and sub query generation, which form the backbone of any robust NL system, are also handled. Tackling all these issues makes any natural language enabled business application system more robust, and enables it to handle even complex queries easily, efficiently and effectively.

An NL based question answering system requires the queries to be analyzed and chunked in an appropriate manner to carry out the correct query generation and answer extraction. An NL query typically has a set of unknown predicates whose values need to be determined based on the constraints imposed by the remaining part of the query other than the predicates. Domain ontology along with a POS tagger is used to identify the constraints in the query. These constraints along with the domain knowledge and the parsed structure of the query are used to find the semantically valid chunk set. These chunks are then converted to a formal query language and the answer is retrieved from the ontology. Figure 1 illustrates the overview of the method in accordance with the present invention. In Figure 1, solid arrows represent the process flow for a query, and dashed arrows represent the information flow.

For the appropriate interpretation and analysis of the query, the important areas to be analyzed and addressed can be summarized as:

• Constraint identification: This involves identifying the correct predicate - object pairs.

• Semantically valid chunk set identification: This involves identification of the valid constraints for the unknown predicates so that correct interpretation of the given query can be ensured.

• Query generation: In this step, semantically valid chunks are converted to appropriate formal language query using the domain ontology.

Semantic web technologies (Antoniou and van Harmelen, 2004) are used to create the domain ontology in RDF (Resource Description Framework) format using the relational data of the business application along with its meta information stored in the seed ontology (Bhat et al., 2007). The ontology $D_O$ of a domain $D$ describes the domain terms and their relationships in the {subject - predicate - object} format. For illustration, {Ritesh - project name - Bechtel} indicates that the predicate 'project name' of the subject 'Ritesh' has object 'Bechtel'. A synonym dictionary having information about the synonyms of the domain terms is also maintained. The domain ontology and synonym dictionary are used to identify the concepts in the user query $Q$ posed in the domain $D$. The domain ontology $D_O$ is used to further classify the concepts as predicates and objects. For a query $Q$, we denote the set of predicates as $P_Q = \{p_1, \ldots, p_n \mid \exists (s - p_i - o) \in D_O \text{ and } p_i \text{ is present in the query } Q\}$. The set of objects present in the query $Q$ is $O_Q = \{o_1, \ldots, o_m \mid (\exists (s - p - o_i) \in D_O \text{ or } o_i \text{ is a numerical/date value}) \text{ and } o_i \text{ is present in the query } Q\}$.
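The classification of query terms into predicates and objects can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the triples, the synonym entry and the function name `identify_concepts` are hypothetical, and plain word matching stands in for whatever concept matching the actual system performs.

```python
import re

# Hypothetical toy ontology D_O as (subject, predicate, object) triples.
TRIPLES = {
    ("Ritesh", "project name", "Bechtel"),
    ("Ritesh", "role", "project leader"),
    ("Ritesh", "employee name", "Ritesh"),
    ("Puneet", "role", "developer"),
    ("Puneet", "employee name", "Puneet"),
}

# Hypothetical synonym dictionary: query word -> domain predicate.
SYNONYMS = {"project": "project name"}


def identify_concepts(query, triples, synonyms):
    """Return (P_Q, O_Q): predicates and objects of the ontology found in the query."""
    text = query.lower()
    known_predicates = {p for _, p, _ in triples}
    known_objects = {o for _, _, o in triples}

    def present(term):
        # Whole-word, case-insensitive match of a (possibly multi-word) term.
        return re.search(r"\b" + re.escape(term.lower()) + r"\b", text) is not None

    predicates = {p for p in known_predicates if present(p)}
    predicates |= {
        canonical
        for word, canonical in synonyms.items()
        if present(word) and canonical in known_predicates
    }
    objects = {o for o in known_objects if present(o)}
    # Numerical values count as objects even if they are not in the ontology.
    objects |= set(re.findall(r"\b\d+(?:\.\d+)?\b", text))
    return predicates, objects


query = "Give me the role of Puneet in the project having Ritesh as project leader"
P_Q, O_Q = identify_concepts(query, TRIPLES, SYNONYMS)
```

With this toy data the sketch yields the predicate set {role, project name} and the object set {Puneet, Ritesh, project leader} for the example query.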

For a successful query creation and execution, the process of identification and formulation of correct constraints is of utmost importance. Constraint identification involves binding each 'object' in the query with its corresponding 'predicate'. This predicate-object pair is referred to as a 'constraint'. Figure 2 illustrates the flow diagram of the method of constraint identification in accordance with the present invention.

A constraint is defined as $c_{ij} = (p_i, o_j)$, where $p_i \in P_Q$, $o_j \in O_Q$, and $o_j$ is the value for the predicate $p_i$ in $Q$. All the constraints in the query $Q$ are identified, and $C_Q = \{c_{ij}\}$ denotes the constraints set. Predicate used in any constraint is referred to as a constraint predicate. The set of constraint predicates is $P_Q^C = \{p_i \mid p_i \in P_Q \text{ such that } \exists o_j \in O_Q \text{ with } (p_i, o_j) \in C_Q\}$.

Predicates that do not form part of the constraint set are referred to as unknown predicates. The set of unknown predicates is $P_Q^U = P_Q \setminus P_Q^C$.

In a natural language query, constraint identification (or predicate - object binding) needs special attention due to the following reasons:

Unspecified predicates: For some (or all) of the objects present in the query, the corresponding predicates might not be explicitly specified. Yet, these predicates are to be identified and bound to their respective objects. For example, in the query 'Give me the role of Puneet in the project having Ritesh as project leader', the predicate set is given by P_Q = {role, project name}, and the object set is given by O_Q = {Puneet, Ritesh, project leader}. Here the objects 'Puneet' and 'Ritesh' are to be attached to the corresponding predicate 'employee name', which is not specified in the query. Hence, the system has to drill down and extract the required predicate.

Constraint vs. unknown predicate: The issue of unspecified predicates becomes even more severe when a predicate p for an object o is present in the query, but the same predicate p also happens to be an unknown predicate. For example, in the query mentioned above, the value 'project leader' in O_Q is compatible with the predicate 'role' in P_Q. But this predicate and its value are not to be bound as the predicate 'role' is an unknown predicate, whose value is to be determined.

Numerical object and mathematical operator binding: Many times the query posed might entail numerical value comparison. Hence, such questions involve the usage of comparative operators. These operators can be specified in many ways: as symbols like '<' and '>', in words like 'less than' and 'below', or as assignment words like 'is', 'as' etc. Sometimes there might not be any word or operator specified between the predicate and its object. Thus these operators are to be identified and bound with the correct object.

Predicates followed by the respective objects: In questions with multiple constraints, sometimes a predicate and its object may not be given consecutively. Instead, the query may have a predicate list followed by the corresponding object list (or vice versa). There is a need to identify and bind the appropriate predicate - operator - object group from the predicate and the object lists.

The main steps of the process for predicate - object discovery and binding in accordance with the present invention are as follows:

Step 1 - Binding operator object pairs for numerical/date objects: The first step towards operator object pair binding is the identification of the comparison operators in the query. For operator identification, the system in accordance with the present invention maintains a mathematical operator dictionary. All the occurrences of the string comparators in the question are replaced by the corresponding mathematical comparator. Also, if there is any numerical value in the question that is not preceded by any operator, '=' operator is prefixed by default. Thus, the corresponding operator object pairs are formed.

Step 2 - Grouping the predicates and objects that immediately follow/precede the POS tags of the assignment words, such as 'VBZ', 'VBP', 'IN', 'SYM' and the like. The predicates that are immediately followed (or preceded) by any object are grouped. In case there is a list of predicates and a list of objects satisfying the above, then these lists are also grouped. These groups are the possible pairs for predicate object binding.

Step 3 - From the groups obtained in Step 2, binding the predicates and objects of the same data type: The compatibility for the predicate and the object is also checked using domain ontology. While using a predicate list and its object list, one-on-one binding is done. These compatible predicate object pairs form the constraints of the query.

Step 4 - The string objects that are not bound to any predicate in Step 3 are bound to their compatible predicates. The compatible predicate for an object is determined using the domain ontology.

Step 5 - The predicates bound to any object in Step 3 form the constraint predicate set, and the remaining predicates constitute the unknown predicate set. The constraint sets thus obtained are used to find the semantically valid chunk set as discussed below.
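The steps above can be sketched in Python as follows. This is a simplified illustration, not the implementation of the invention: the operator dictionary entries, the data type table, the value table and all function names are hypothetical, and the grouping of Step 2 (which needs a POS tagger) is assumed to have been done already and is passed in as `groups`.

```python
import re

# Hypothetical operator dictionary; longer phrases are listed first.
OPERATOR_WORDS = [
    ("greater than or equal to", ">="),
    ("less than or equal to", "<="),
    ("greater than", ">"),
    ("more than", ">"),
    ("above", ">"),
    ("less than", "<"),
    ("below", "<"),
]
OPERATORS = {"<", ">", "<=", ">=", "="}
NUMBER = re.compile(r"^\d+(?:\.\d+)?$")

# Hypothetical ontology facts: data type per predicate, known string values.
PREDICATE_TYPES = {"age": "number", "role": "string",
                   "employee name": "string", "project name": "string"}
STRING_VALUES = {"role": {"project leader", "developer"},
                 "employee name": {"Ritesh", "Puneet"},
                 "project name": {"Bechtel"}}


def operator_object_pairs(query):
    """Step 1: replace string comparators and prefix '=' to bare numbers."""
    for phrase, symbol in OPERATOR_WORDS:
        query = re.sub(r"\b" + re.escape(phrase) + r"\b", symbol, query,
                       flags=re.IGNORECASE)
    # Separate operator symbols from adjacent text, so '>30' becomes '> 30'.
    tokens = re.sub(r"(<=|>=|<|>|=)", r" \1 ", query).split()
    pairs = []
    for i, token in enumerate(tokens):
        if NUMBER.match(token):
            has_operator = i > 0 and tokens[i - 1] in OPERATORS
            pairs.append((tokens[i - 1] if has_operator else "=", float(token)))
    return pairs


def compatible(predicate, obj):
    """Check predicate/object compatibility against the toy ontology."""
    if isinstance(obj, tuple):  # an (operator, number) pair from Step 1
        return PREDICATE_TYPES.get(predicate) == "number"
    return obj in STRING_VALUES.get(predicate, set())


def identify_constraints(predicates, groups, string_objects):
    """Steps 3 to 5: bind grouped pairs, bind leftover strings, split predicates."""
    step3 = [(p, o) for p, o in groups if compatible(p, o)]
    bound_objects = {o for _, o in step3}
    step4 = []
    for obj in string_objects:
        if obj in bound_objects:
            continue
        match = next((p for p in STRING_VALUES if compatible(p, obj)), None)
        if match is not None:
            step4.append((match, obj))
    # Step 5: only predicates bound in Step 3 are constraint predicates.
    unknown_predicates = set(predicates) - {p for p, _ in step3}
    return step3 + step4, unknown_predicates
```

For 'associates with age above 30', Step 1 produces the pair ('>', 30.0); binding it to the grouped predicate 'age' leaves 'role' and 'employee name' as unknown predicates when those are the other predicates of the query.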

Figure 3 illustrates the flow diagram of the method of semantically valid chunk set formation in accordance with the present invention. A semantically valid chunk set identifies the conditions on each unknown predicate in the query, and is constituted from the constraints and unknown predicates. Due to the syntactic ambiguity, more than one syntactic parse might be obtained for an NL query. Such cases may eventually result in more than one semantically viable chunk set. A semantically viable chunk set (SVC set) of a query $Q$ corresponding to the $k$-th parse is a set $SVC_{Q_k} = \{SC^{p}_{Q_k} \mid p \in P_Q^U\}$, where $P_Q^U$ is the set of unknown predicates of $Q$ and $SC^{p}_{Q_k}$ is a semantic chunk. Semantic chunk of a predicate $p$ is defined as:

$SC^{p}_{Q_k} = (p, c_1, c_2, \ldots, c_r)$, where each $c_i$ is a constraint on $p$ (a member of the constraints set $C_Q$, or the semantic chunk of another predicate),

such that $SVC_{Q_k}$ satisfies the following:

a. $\forall p \in P_Q^U$, $\exists SC^{p}_{Q_k} \in SVC_{Q_k}$.

b. $\forall c \in C_Q$, $\exists SC^{p}_{Q_k} \in SVC_{Q_k}$ such that $SC^{p}_{Q_k} = (p, c_1, c_2, \ldots, c, \ldots, c_r)$.

The condition 'a' states that there is a semantic chunk for each unknown predicate in the query. The condition 'b' states that each constraint in the query is used in at least one semantic chunk. For a query $Q$, the semantically viable chunk set which is semantically valid as per the domain ontology is the semantically valid chunk set, $SVaC_Q$. These sets are referred to as SVaC sets. The syntactic information of the question is used to obtain semantically viable chunk sets as described below.
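Conditions 'a' and 'b' can be checked mechanically. The following Python sketch shows one possible check; the dictionary representation of a chunk set and the function name `is_viable` are assumptions made for illustration only.

```python
def is_viable(chunk_set, unknown_predicates, constraints):
    """Check conditions (a) and (b) for a candidate chunk set.

    chunk_set maps each predicate to the members of its semantic chunk.
    A member is either a constraint (a tuple) or the name of another
    predicate whose chunk is nested inside this one (a string).
    """
    # Condition (a): there is a semantic chunk for each unknown predicate.
    covers_unknowns = all(p in chunk_set for p in unknown_predicates)
    # Condition (b): each constraint is used in at least one semantic chunk.
    used = {member for members in chunk_set.values() for member in members}
    uses_constraints = all(c in used for c in constraints)
    return covers_unknowns and uses_constraints


# Hypothetical chunk set for 'What is the role of the associates with age > 30 years?'
constraints = [("age", ">", 30)]
unknown = {"role", "employee name"}
chunks = {
    "employee name": (("age", ">", 30),),  # chunk with one constraint
    "role": ("employee name",),            # chunk nesting the chunk above
}
viable = is_viable(chunks, unknown, constraints)
```

A chunk set that omits the chunk for 'employee name' fails condition 'a', and one whose chunks never mention the constraint on 'age' fails condition 'b'.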

For a query, the main task for identification of semantically viable chunk sets is to identify the conditions for all the unknown predicates. The syntactic information of the query is exploited for this purpose. A dependency based parser (for example, Stanford Parser - Klein and Manning, Paper: Fast Exact Inference with a Factored Model for Natural Language Parsing, 2003; Link Parser - Grinberg et. al. Paper: A robust parsing algorithm for link grammars, 1995) is used to obtain the syntactic structure of the question. The process of identifying the appropriate semantic chunks for different categories of queries is explained below.

If an unknown predicate in the query plays the role of a noun, its syntactic modifiers identify the constraints on the predicate. A noun can have either pre-nominal (e.g. adjective) or post-nominal (preposition phrase, relative clause etc.) modifiers. Dependency based parsers provide dependencies between noun and its modifiers. This information along with the phrase structure of the query is used to determine the phrase modifying the unknown predicate. These phrases give the constraints for the unknown predicate. The unknown predicate with its constraint is a candidate semantic chunk. For example, for the question 'What is the role of the associates with age > 30 years?', the preposition phrase 'with age > 30 years' is a post-nominal modifier of the noun 'associates'. The constraint corresponding to this preposition phrase is 'age > 30', and hence the corresponding semantic chunk can be obtained as $SC^{\text{employee name}}_{Q} = \{\text{employee name}; \text{age} > 30\}$.

Further, the preposition phrase 'of the associates with age > 30 years' is modifying the noun 'role'. Since the semantic chunk, $SC^{\text{employee name}}_{Q}$, for the phrase 'of the associates with age > 30 years' has already been identified, the semantic chunk for the predicate 'role' is

$SC^{\text{role}}_{Q} = \{\text{role}; SC^{\text{employee name}}_{Q}\}$

In a domain 'who' usually refers to a person, such as 'employee name', 'student name'; 'when' refers to date/time attributes like 'joining date', 'completion time'; and 'where' refers to locations like 'address', 'city'. For the given business application, this information about the wh-words is identified, and stored in the seed ontology. In questions involving any of these wh-words, the predicate corresponding to the wh-word is found using the domain ontology, which might be a possible candidate for being an unknown predicate. If the wh-word in the question is compatible to more than one predicate in the domain, then more semantic chunks - corresponding to each compatible predicate - are obtained. Semantic information is used in such cases to resolve the ambiguity regarding the most appropriate predicate. The constraints of the wh-word are determined on the basis of the role of the wh-word in the question as described below.

• If the wh-word is the subject in the question, the corresponding verb phrase determines the constraint on the wh-word.

• In other cases, the words in the phrase enclosing the wh-word determine the constraints on the wh-word. The constraints are determined in the same way when the wh-word is the determiner of an unknown predicate. For example, in the question 'In which project is Ritesh allocated?', the constraint for the unknown predicate 'project name' can be identified as 'employee name = Ritesh'. Thus the semantic chunk is {project name; employee name = Ritesh}.

Using the syntactic information, all possible semantic chunks for a parse structure of the question are determined. The set of these chunks is a semantically viable chunk set only if the chunk set satisfies the conditions (a) and (b) specified in the definition of SVC sets.

If for a query Q, only one semantically viable chunk set is found then this chunk set is the semantically valid chunk set. In other cases, the semantically valid chunk set is found by using the domain specific semantic information as described below.

If more than one semantically viable chunk set is obtained for a question, semantic information obtained from the domain ontology is used to determine the semantically valid chunk set. Let $SVC_{Q_1} = \{SC^{p}_{Q_1} \mid p \in P_Q^U\}$ and $SVC_{Q_2} = \{SC^{p}_{Q_2} \mid p \in P_Q^U\}$ be any two SVC sets for a query $Q$, where $P_Q^U$ is the set of unknown predicates. Since there is more than one SVC set for $Q$, $\exists p_i, p_j \in P_Q^U$ and $c' = (p', o') \in C_Q$ such that $c'$ is a constituent of $SC^{p_i}_{Q_1} \in SVC_{Q_1}$ and of $SC^{p_j}_{Q_2} \in SVC_{Q_2}$. But, in the valid interpretation of $Q$, $c'$ can specify either the unknown predicate $p_i$ or the unknown predicate $p_j$. Hence, it can be concluded that, in this case, the syntactic information is not sufficient to resolve the ambiguity whether $c'$ is a constraint of $p_i$ or $p_j$. To resolve such ambiguities, the depth between concerned predicates is used. The number of tables required to be traversed in order to find the relationship between any two predicates is determined through the domain ontology. This is referred to as the depth between two predicates. If for a pair of predicates, there exists more than one path, then the one with the minimum depth is chosen. It is observed that the semantic chunk in which the unknown predicate and the constraint predicate pair has lesser depth is the one which is more likely to be the correct pair. Domain ontology is used to find the depth between two predicates as described below.

Step 1 - Breadth first search (BFS): The system in accordance with the present invention does a BFS on the tables in the ontology to determine if $p_i$ or $p_j$ belongs to the same table as that of $p'$. Without loss of generality, assume that $p_i$ and $p'$ belong to the same table, and $p_j$ does not belong to the table of $p'$. In this case, $SC^{p_i}_{Q_1}$, and consequently $SVC_{Q_1}$, is assumed to be correct, and $SVC_{Q_2}$ is rejected. Thus in this case, $SVaC_Q = SVC_{Q_1}$.

Step 2 - Depth first search (DFS): The DFS method is invoked to resolve the ambiguity regarding the constraint $c'$ if BFS is not able to do so. The depths of the paths from $p'$ to $p_i$ and $p_j$ are found using domain ontology. The constraint $c'$ is attached to the predicate with which the distance of $p'$ is minimum; and the corresponding SVC set is the semantically valid chunk set.
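The depth-based disambiguation can be sketched in Python as shown below. Note the assumptions: the table layout and the names `PREDICATE_TABLE`, `TABLE_LINKS`, `depth` and `attach_constraint` are hypothetical, and the sketch replaces the separate BFS and DFS steps with a single breadth-first shortest-path search, which returns depth 0 for predicates of the same table and the minimum number of table hops otherwise.

```python
from collections import deque

# Hypothetical schema derived from the ontology: the table holding each
# predicate, and the links (relationships) between tables.
PREDICATE_TABLE = {
    "employee name": "employee",
    "age": "employee",
    "role": "allocation",
    "project name": "project",
    "billing": "project",
}
TABLE_LINKS = {
    "employee": {"allocation"},
    "allocation": {"employee", "project"},
    "project": {"allocation"},
}


def depth(p1, p2):
    """Minimum number of table hops between the tables of p1 and p2."""
    start, goal = PREDICATE_TABLE[p1], PREDICATE_TABLE[p2]
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        table, hops = queue.popleft()
        if table == goal:
            return hops
        for neighbour in TABLE_LINKS.get(table, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None  # the two tables are not connected


def attach_constraint(constraint_predicate, candidates):
    """Pick the unknown predicate at minimum depth from the constraint predicate.

    Ties are broken alphabetically, which is an arbitrary choice of this sketch.
    """
    scored = [(depth(constraint_predicate, p), p) for p in candidates]
    scored = [(d, p) for d, p in scored if d is not None]
    return min(scored)[1] if scored else None


# The constraint on 'age' could modify 'employee name' or 'project name'.
chosen = attach_constraint("age", ["project name", "employee name"])
```

Here 'age' and 'employee name' share a table (depth 0), whereas 'project name' is two hops away, so the constraint is attached to 'employee name'.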

An advantage of this approach is that depending upon question complexity, the system in accordance with the present invention does a deeper analysis. Domain ontology is used only if a question cannot be resolved using just the syntactic information. If domain information also is not sufficient for question interpretation, then answers for all possible interpretations are found, and the user is left with the option of identifying the correct answer.

The semantic chunks of the SVaC set are processed by a query manager module. In this module, a formal query is generated from the semantic chunks to extract the answer to the user's question. Since the domain ontology is in RDF format, the queries are typically generated in SPARQL which is a query language for RDF.

For a semantically valid chunk set, the query generation starts with formulating SPARQL queries for the semantic chunks which do not contain any sub chunk. The unknown predicate of the semantic chunk forms the 'SELECT' clause, and the constraints form a part of the 'WHERE' clause.

The answers obtained from these independent semantic chunks are substituted in the semantic chunks involving nested sub chunks. Finally, the SPARQL query is generated for these chunks and the answer is returned to the user.
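As an illustration of this mapping, the following Python sketch builds a SPARQL query string from a semantic chunk without sub chunks. It rests on several assumptions that are not part of the disclosure: the namespace `http://example.org/onto#`, the `ex:` prefix, the convention of turning predicate names into local names by replacing spaces with underscores, and the simplification that the unknown predicate and all constraint predicates share one subject `?s`. String values are not escaped.

```python
def local_name(predicate):
    """Turn a predicate such as 'project name' into 'project_name'."""
    return predicate.replace(" ", "_")


def chunk_to_sparql(unknown, constraints, namespace="http://example.org/onto#"):
    """Build a SPARQL query for one semantic chunk.

    unknown: the unknown predicate, which forms the SELECT clause.
    constraints: (predicate, operator, value) triples for the WHERE clause.
    """
    target = "?" + local_name(unknown)
    lines = [
        f"PREFIX ex: <{namespace}>",
        f"SELECT {target}",
        "WHERE {",
        f"  ?s ex:{local_name(unknown)} {target} .",
    ]
    filters = []
    for predicate, operator, value in constraints:
        name = local_name(predicate)
        if operator == "=" and isinstance(value, str):
            # String equality becomes a plain triple pattern.
            lines.append(f'  ?s ex:{name} "{value}" .')
        else:
            # Numerical comparisons become a FILTER on a bound variable.
            lines.append(f"  ?s ex:{name} ?{name} .")
            filters.append(f"  FILTER (?{name} {operator} {value})")
    lines.extend(filters)
    lines.append("}")
    return "\n".join(lines)


# Chunk {project name; employee name = Ritesh}
query_text = chunk_to_sparql("project name", [("employee name", "=", "Ritesh")])
```

For a chunk with a nested sub chunk, the inner chunk would be queried first and its answers substituted as equality constraints into the outer chunk before calling the same function.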

TECHNICAL ADVANCEMENTS

The technical advancements of the present invention include realization of a method and system for obtaining semantically valid chunks for natural language applications which:

• can interpret queries easily, efficiently and effectively;

• can parse the query correctly with regard to both syntax and semantics;

• has required domain knowledge; and

• can identify the predicate - object pairs of a query correctly.

While considerable emphasis has been placed herein on the particular features of this invention, it will be appreciated that various modifications can be made, and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other modifications in the nature of the invention or the preferred embodiments will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.

Claims

1. A system for obtaining semantically valid chunks for natural language application queries, said system comprising:

- predicate identifying means for identifying predicates in a natural language query;

- second binding means for binding predicates and objects of same data type;

- syntactic parsing means adapted to syntactically parse natural language query for resolving ambiguities; and

- depth determination means, using domain ontology, for determining depth between any two predicates, thereby providing a syntactically and semantically valid chunk set adapted to be used as a query.

2. A system as claimed in claim 1 wherein, said system includes a predefined default operator pre-fixing means adapted to pre-fix said default operator to said bound predicate and objects not having any numerical value in said natural language application.

3. A system as claimed in claim 1 wherein, said system includes grouping means for grouping said identified predicates and said identified objects that immediately follow/precede the POS tags of assignment words.

4. A system as claimed in claim 1 wherein, said third binding means includes compatibility checking means for checking compatibility of bound string objects with predicates using domain ontology.

5. A method for obtaining semantically valid chunks for natural language application queries, said method comprising the steps of:

- identifying predicates in a natural language query;

- identifying objects in said natural language query;

- identifying comparison operators in said natural language query;

- binding said identified predicate with said identified objects using said identified comparison operator;

- replacing all occurrences of string comparator operators in said natural language query with corresponding mathematical operators;

- grouping predicates and objects;

- binding predicates and objects of same data type;

- checking compatibility of bound predicates and objects using domain ontology;

- forming constraint predicate sets from the remaining predicates of the query in order to find semantically valid chunk sets from said natural language query;

- syntactically parsing natural language query for resolving ambiguities; and

6. A method as claimed in claim 5 wherein, said method includes the step of pre-fixing a default operator to said bound predicate and objects not having any numerical value in said natural language application.

7. A method as claimed in claim 5 wherein, said method includes the step of grouping said identified predicates and said identified objects that immediately follow/precede the POS tags of assignment words.

8. A method as claimed in claim 5 wherein, said step of binding string objects (not previously bound to any predicate) to their compatible predicates using domain ontology includes the step of checking compatibility of bound string objects with predicates using domain ontology.