WO2016068690A1 - Method and system for automated semantic parsing from natural language text - Google Patents

Method and system for automated semantic parsing from natural language text Download PDF

Info

Publication number
WO2016068690A1
WO2016068690A1 (PCT/MY2015/050120)
Authority
WO
WIPO (PCT)
Prior art keywords
verb
semantic
subgraph
linguistic
identified
Prior art date
Application number
PCT/MY2015/050120
Other languages
French (fr)
Inventor
Benjamin Chu
Qiang Simon LIU
Dickson Lukose
Original Assignee
Mimos Berhad
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mimos Berhad filed Critical Mimos Berhad
Publication of WO2016068690A1 publication Critical patent/WO2016068690A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/268Morphological analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis


Abstract

The present invention discloses a semantic parsing method for use in natural language processing of an input, the method comprising: performing an entity recognition for extraction of at least one entity (102); performing a coreference resolution to resolve referents (103); and performing a semantic analysis (104) to generate semantic structures. In one embodiment, the semantic analysis (104) comprises: performing a semantic pre-processing (104A) for deriving at least one main root verb for retrieval of at least one corresponding linguistic structure; and performing semantic filtering (104B) for selecting the best linguistic structure and merging of semantic structures to represent the input.

Description

METHOD AND SYSTEM FOR AUTOMATED SEMANTIC PARSING FROM
NATURAL LANGUAGE TEXT
FIELD OF INVENTION
[0001] The present invention generally relates to natural language processing, and more particularly to a method for generating semantic structures independent of any syntax structures.
BACKGROUND OF INVENTION
[0002] Semantics, commonly defined as the implied meaning of a particular subject, is a crucial component in understanding and interpreting subject matter expressed in natural language texts. To understand the meaning of natural language texts, semantic parsing or syntactic analysis is typically performed. In general, semantic parsing involves linguistic-based processing of text and transforming it into a conceptual representation of its meaning. [0003] One of the primary setbacks of semantic parsing and processing is the presence of semantic variations and semantic ambiguities within natural language texts, which, if not interpreted accurately, can lead to the creation of multiple ambiguous meaning representations. At present, the existing methods and techniques have partially evolved but essentially include the use of Syntactic Analysis, and thus rely heavily on the presence of syntax structures. In addition, the syntactic approach entails manipulations which
[0004] Hence, it would be highly desirable to have a method and system that can provide accurate representations of the semantic meaning of a text, independent of any syntax structures. SUMMARY [0005] In one aspect, there is disclosed a semantic parsing method for use in natural language processing of an input; the method comprising: performing an entity recognition for extraction of at least one salient entity; performing a coreference resolution to resolve referents; and performing a semantic analysis to generate semantic structures; wherein the semantic analysis comprises: performing a semantic pre-processing for deriving at least one main root verb for retrieval of at least one corresponding linguistic structure; and performing semantic filtering for selecting the best linguistic structure and merging of semantic structures to represent the input.
[0006] In one embodiment, performing a semantic pre-processing further comprises: extracting at least one token of lexical baseforms from the input and generating a vector list; identifying at least one verb type from the vector list; if the verb is an auxiliary verb type, discarding all auxiliary words, extracting the verb as it is and identifying a least important weight for the verb; if the verb is a lexical verb, transforming the verb into its lexical form, extracting the verb and identifying the least important weight; if the verb is a dynamic or stative verb, transforming the verb into its lexical form, extracting the verb and identifying the more important weight; searching all possible definitions from a linguistic resource and identifying a polysemy count for each verb; and, from all the identified verbs, identifying a maximum weight verb. [0007] In another embodiment, for finite, non-finite, regular, irregular, transitive and intransitive verbs, the method transforms the verb into its lexical form by performing inflection and extracts the verb, whereby the least important weight is identified.
[0008] In a further embodiment, the verb with a maximum weight is selected as the main root verb.
[0009] In yet a further embodiment, in the event that there is a plurality of main verbs identified, the method proceeds with selecting a main verb based on the highest polysemy count. [0010] In yet a further embodiment, in the event that the main root verb is identified, the method further comprises: retrieving all possible candidate linguistic structures from at least one linguistic structure repository based on the main root verb; and performing a semantic graph matching for each of the linguistic structures with the input semantic structure.
[0011] In another embodiment, the semantic filtering further comprises: identifying at least one subgraph attached to each verb identified and selected; checking whether all identified subgraph(s) are processed; if at least one subgraph is not processed, selecting said subgraph and iterating through all concepts from the input; checking whether each concept conforms to a predefined semantic constraint for each of the concepts in the subgraph; if all concepts conform, adding a subgraph count; and merging the concepts and producing at least one new subgraph. [0012] In yet a further embodiment, in the event that the predefined semantic constraints are not met, the method reverts to checking whether all subgraphs have been processed and repeating the preceding steps.
[0013] In another embodiment, the method further comprises consolidating and merging all subgraph counts upon completion of the iteration.
[0014] In yet a further embodiment, the method further comprises: if all subgraphs are processed, selecting a linguistic structure with the highest subgraph match count; and returning a merged semantic structure to represent the input based on the highest match count.
BRIEF DESCRIPTION OF DRAWINGS
[0015] The invention will be better understood by reference to the description below taken in conjunction with the accompanying drawings herein:
[0016] FIG. 1 shows the overall process flow of the method for use in natural language processing in accordance with an embodiment of the present invention; [0017] FIG. 2 shows the process flow for the semantic pre-processing in accordance with an embodiment of the present invention; [0018] FIG. 3 shows the process flow for semantic filtering in accordance with an embodiment of the present invention; [0019] FIG. 4 shows the process of matching and merging of the concepts from the input to the linguistic structure in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION
[0020] In line with the above summary, the following description of a number of specific and alternative embodiments is provided to aid understanding of the inventive features of the present invention. It shall be apparent to one skilled in the art, however, that this invention may be practiced without such specific details. Some of the details may not be described at length so as not to obscure the invention. For ease of reference, common reference numerals will be used throughout the figures when referring to the same or similar features common to the figures.
[0021] The present invention provides a method for generating semantic structures to represent the meaning of natural language texts without relying on any form of syntax structures. In one embodiment, the present invention resolves issues associated with complex syntactic structures, whereby the present invention entirely eliminates the use of syntactic analysis. In a further embodiment, the present invention utilizes at least one set of linguistic resources, a knowledge base and a series of semantic parsing processes to automatically generate semantic structures.
[0022] FIG. 1 depicts the overall process of the method for generating semantic structures in accordance with an embodiment of the present invention. The process starts and proceeds to 101, during which an input, which can be in the form of a sentence containing text, is subjected to a named entity recognition (NER) process at step 102 to extract salient entities. During the entity recognition process, information and input for extraction can be obtained from a knowledge base 60. Similarly, with the aid of the knowledge base 60, and upon completion of the entity recognition step with at least one entity successfully extracted, the process proceeds to 103 for coreference resolution to resolve referents based on corresponding noun antecedents and precedents. The semantic analysis process follows at 104, whereby the semantic analysis process comprises a semantic pre-processing 104A and then a semantic filtering 104B. Upon completion of the semantic analysis, the output from the semantic analysis, which includes semantic structures, is generated at 105, and the overall process ends at 106. Accordingly, the linked data in relation to the entity recognition and coreference resolution processes are stored at 80.
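For illustration only, the overall flow of FIG. 1 can be summarised as a short Python sketch. The function names, the toy heuristics and the knowledge_base parameter are assumptions introduced here for readability; the patent does not disclose component implementations at this level of detail.

    def recognize_entities(sentence, knowledge_base=None):
        # Step 102: extract salient entities, optionally aided by knowledge base 60.
        return [tok for tok in sentence.split() if tok.istitle()]  # toy heuristic

    def resolve_coreferences(sentence, entities):
        # Step 103: resolve referents against noun antecedents; this toy version
        # links any pronoun found in the sentence to the first extracted entity.
        pronouns = {"he", "she", "him", "her", "they"}
        return {tok: entities[0]
                for tok in sentence.lower().split()
                if tok in pronouns and entities}

    def semantic_analysis(sentence, entities, referents):
        # Step 104: semantic pre-processing (104A) and semantic filtering (104B);
        # sketches of these two stages are given later in this description.
        return {"input": sentence, "entities": entities, "referents": referents}

    def parse(sentence, knowledge_base=None):
        entities = recognize_entities(sentence, knowledge_base)   # step 102
        referents = resolve_coreferences(sentence, entities)      # step 103
        return semantic_analysis(sentence, entities, referents)   # steps 104-105

    print(parse("John bought Mary a Ferrari"))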
[0023] In accordance with an embodiment of the present invention, the semantic preprocessing 104A is performed primarily for generating and deriving the main root verb from the sentence, in order to retrieve all corresponding linguistic structures. The semantic filtering 104B then aids in selecting the best linguistic structure that can be applied to return the merged semantic structure and thus to represent the sentence.
[0024] The semantic pre-processing 104A of the semantic analysis 104 will now be described with reference to FIG. 2 in accordance with an embodiment of the present invention. Upon initiation, the process proceeds to extract all tokens of lexical baseforms from the text, which are incorporated into a vector list at 200. From the vector list, at least one verb type is identified at 201. During the identification of at least one verb, there can be three types of verbs to be identified; these are, but are not limited to: auxiliary verbs, lexical verbs and dynamic or stative verbs. In the event that the verb is an auxiliary verb at 202, all auxiliary words are discarded and the verb is extracted as it is, whereby the least important weight within the extracted verbs is identified and may be assigned as We at 203. In the event that the identified verb is a lexical verb at 204, the verb is transformed into its lexical form by performing inflection and extracting the verb, whereby the least important weight is identified and may be assigned as Wb at 205. In the event that a dynamic or stative verb is identified at 206, the verb is transformed into its lexical form by performing inflection and extracting the verb, whereby a more important weight is identified and may be assigned as Wa at 207. In the event that a finite, non-finite, regular, irregular, transitive or intransitive verb is identified, the verb is transformed into its lexical form by performing inflection and extracting the verb, whereby the least important weight Wc is identified at 208. [0025] In one embodiment, for each of the verbs identified, all possible definitions are searched and retrieved from the linguistic resource or repository 70 and a polysemy count Pi is assigned at 209. From the assigned weights We, Wb, Wa and Wc, the verb with the higher or maximum weight is selected at 210, whereby the selected verb is considered as the main root verb. In the event that there is more than one main verb within the sentence at 211, the verb with the higher polysemy count Pi is selected at 212. With the selected main root verb, all possible candidate linguistic structures from the repository are retrieved at 213.
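A minimal Python sketch of the verb-weighting and main-root-verb selection described above (steps 201 to 212) could look as follows. The verb-type lexicon, the numeric weight values for We, Wb, Wa and Wc and the polysemy counts are assumptions made for the example sentence; the description ranks weights only as "least" and "more" important and does not fix concrete values.

    VERB_TYPES = {        # hypothetical lexicon keyed by lexical base form
        "be": "auxiliary",
        "have": "auxiliary",
        "seem": "stative",
        "buy": "dynamic",
    }
    WEIGHTS = {           # toy values: We (203), Wb (205), Wa (207), Wc (208)
        "auxiliary": 1, "lexical": 1, "other": 1,   # least important weights
        "dynamic": 3, "stative": 3,                 # more important weight Wa
    }
    POLYSEMY = {"be": 13, "have": 19, "buy": 5}     # stand-in for linguistic resource 70

    def main_root_verb(base_forms):
        candidates = []
        for verb in base_forms:
            vtype = VERB_TYPES.get(verb)
            if vtype is None:
                continue                                        # not a verb in this toy lexicon
            weight = WEIGHTS.get(vtype, WEIGHTS["other"])       # steps 202-208
            polysemy = POLYSEMY.get(verb, 0)                    # step 209: Pi
            candidates.append((weight, polysemy, verb))
        if not candidates:
            return None
        # Step 210: take the maximum-weight verb; step 212: break ties between
        # several main verbs using the higher polysemy count.
        return max(candidates)[2]

    print(main_root_verb(["john", "buy", "mary", "ferrari"]))   # -> "buy"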
Upon retrieving all linguistic structures from the repository, semantic graph matching is performed for each of the linguistic structures with the sentence or input semantic structure at 214. [0026] The overall process then continues to step 104B, the semantic filtering step.
The semantic filtering step 104B will now be described in accordance with an embodiment of the present invention with reference to FIG. 3. Following from the semantic pre-processing step, all subgraphs that are attached to the verb in each of the candidate linguistic structures are accordingly identified at 300. Next, all subgraphs are checked to determine whether they have all been processed at 301; if a NO response is received, the next subgraph is selected at 302. Then, for each identified and selected subgraph, iteration is performed through all the concepts from the sentence at 303. Each concept is checked to determine whether it conforms to a predefined semantic constraint for each of the concepts in the subgraph at 304. If a NO response is received at 304 when determining whether the constraints are met, the process reverts to checking whether all the subgraphs have been processed at 301. Steps 302 onwards may then be repeated. In the event that a YES response is received when checking whether all semantic constraints are satisfied at 304, the subgraph count is incremented, for instance SGi = 1, at 305. The concepts are then merged and at least one new subgraph is produced at 306. Steps 302 to 306 may then be repeated subject to the number of subgraphs identified. In one embodiment, all subgraph counts are consolidated and merged thereafter upon completion of the iteration at 306A.
[0027] In one embodiment, in the event that a YES response is received when checking whether all subgraphs are processed at 301, the linguistic structure with the highest subgraph match count is selected at 301A from the consolidated and merged subgraphs, and the merged graph, being the finalized and merged semantic structure 90, is returned to represent the input at 301B, thus ending the process at 307.
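As an illustrative sketch of the filtering loop of FIG. 3, the following Python fragment treats each candidate linguistic structure as a list of subgraphs written as (verb, relation, concept constraint) triples, and assumes the input supplies, for each instance, the set of concepts it conforms to. These data structures and the conforms() test are assumptions made for the worked example only, not a prescribed implementation.

    def conforms(instance_concepts, constraint):
        # Step 304: semantic constraint check, e.g. "John" conforms to "animate".
        return constraint in instance_concepts

    def match_and_merge(structure, instances):
        count, merged, used = 0, [], set()
        for verb, relation, constraint in structure:              # steps 301/302
            for instance, concepts in instances.items():          # step 303
                if instance not in used and conforms(concepts, constraint):
                    count += 1                                     # step 305: SGi
                    merged.append((verb, relation, instance))      # step 306
                    used.add(instance)
                    break
        return count, merged

    def best_structure(structures, instances):
        # Steps 306A/301A/301B: consolidate counts and keep the structure with
        # the highest subgraph match count, returning its merged structure.
        return max((match_and_merge(s, instances) for s in structures),
                   key=lambda result: result[0])

    structures = [
        [("buy", "agnt", "animate"), ("buy", "thme", "entity")],              # #1
        [("buy", "agnt", "animate"), ("buy", "thme", "entity"),
         ("buy", "benf", "animate")],                                         # #2
    ]
    instances = {"John": {"male-person", "person", "animate", "entity"},
                 "Ferrari": {"car", "entity"},
                 "Mary": {"female-person", "person", "animate", "entity"}}

    count, merged = best_structure(structures, instances)
    print(count, merged)   # 3 subgraphs matched, i.e. structure #2 is selected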
[0028] Referring back to FIG. 1, the finalized semantic structure 90 that suitably and accurately represents the input is generated and the overall method ends.
[0029] An example of a sentence subjected to the semantic pre-processing and filtering steps in accordance with an embodiment of the present invention is shown as EXAMPLE 1 below: EXAMPLE 1 [0030] Sentence: John bought Mary a Ferrari
[0031] During the pre-processing step, tokenized lexical baseforms can be extracted; these include "John", "buy", "Mary" and "Ferrari", where prepositions or stopwords like the word "a" will be excluded from the semantic analysis.
[0032] The method then identifies the main root verb, which for this example is "buy"; this verb is a dynamic verb and hence the more important weight, Wa, is assigned. [0033] All possible linguistic structures for the identified main root verb are extracted. A linguistic structure defines how a verb is used in a certain way, or it is a structure pre-defined with fixed attachments (relations). The matching and merging process of the concepts from the sentence or input to the linguistic structure is simplified in a linear form as shown in FIG. 4.
[0034] Linguistic Structure #1: [animate]<-(agnt)<-[buy]->(thme)->[entity]
[0035] For linguistic structure #1, the structure can be explained as follows: the agent of the action "buy" is an animate being (e.g. a person), and the theme of the action refers to an entity.
[0036] Linguistic Structure #2: [animate]<-(agnt)<-[buy]-{(thme)->[entity];
(benf)->[animate];} [0037] For linguistic structure #2, the structure can be explained in a similar way to linguistic structure #1, whereby the agent of the action "buy" is an animate being, the theme of the action refers to an entity and, in addition, the beneficiary of the action refers to an animate being (e.g. a person).
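One possible encoding of these example structures is a repository keyed by the main root verb, with each structure stored as a list of (verb, relation, concept) triples. This encoding is an assumption for illustration; the patent does not prescribe a storage format for the linguistic structure repository consulted at step 213.

    LINGUISTIC_STRUCTURES = {
        "buy": [
            # Linguistic Structure #1: [animate]<-(agnt)<-[buy]->(thme)->[entity]
            [("buy", "agnt", "animate"), ("buy", "thme", "entity")],
            # Linguistic Structure #2 adds the beneficiary relation
            [("buy", "agnt", "animate"), ("buy", "thme", "entity"),
             ("buy", "benf", "animate")],
        ],
    }

    def candidate_structures(main_root_verb):
        # Step 213: retrieve all candidate structures for the selected main root verb.
        return LINGUISTIC_STRUCTURES.get(main_root_verb, [])

    for i, structure in enumerate(candidate_structures("buy"), start=1):
        print(f"Structure #{i} has {len(structure)} subgraph(s)")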
[0038] In this example, there can be many possible linguistic structures which can be extracted from the linguistic resources for a particular verb. For instance, in structure #2, there are 3 subgraphs to be identified from the linguistic structure. [0039] In the iteration step, the method then iterates through all the concepts/instances from the sentence and performs matching against the concept nodes for each of the subgraphs (the subgraphs are determined from the linguistic structure before the matching process). For instance, in this example, the iteration can be in the following form: [0040] Subgraph #1: [animate]<-(agnt)<-[buy]
Subgraph #2: [buy] -> (thme) -> [entity]
Subgraph #3: [buy] -> (benf) -> [animate]
[0041] Next, the method proceeds to iterate through all concepts/instances and performs a semantic constraint check during the matching process. For instance, for "John", the method checks whether this conforms to the first concept "animate" in the linguistic structure. From the knowledge base hierarchy, it can be known that John is an instance of the "male person" concept, whereby the concept "person" is of a lower order than (i.e. is subsumed by) the "animate" concept. Upon completion of the conformity check, the instance "John" can be matched to the first node of Subgraph #1.
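A small sketch of this conformity check against the knowledge base hierarchy follows; the hierarchy fragment and instance assignments are hypothetical stand-ins for knowledge base 60.

    HIERARCHY = {            # child -> parent, hypothetical fragment of knowledge base 60
        "male-person": "person",
        "female-person": "person",
        "person": "animate",
        "animate": "entity",
        "car": "entity",
    }
    INSTANCE_OF = {"John": "male-person", "Mary": "female-person", "Ferrari": "car"}

    def subsumed_by(concept, constraint):
        # Walk up the hierarchy until the constraint (or the root) is reached.
        while concept is not None:
            if concept == constraint:
                return True
            concept = HIERARCHY.get(concept)
        return False

    def conforms(instance, constraint):
        return subsumed_by(INSTANCE_OF.get(instance), constraint)

    print(conforms("John", "animate"))     # True  -> fills [animate]<-(agnt)<-[buy]
    print(conforms("Ferrari", "animate"))  # False -> cannot fill the agent slot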
[0042] Proceeding from the above, a new semantic structure (a subgraph) is produced: [male-person: "John"]<-(agnt)<-[buy]. The overall method continues until all concepts are iterated and the semantic constraints are checked. Upon completion of the second iteration, another semantic structure can be produced, such as: [buy]->(thme)->[car: "Ferrari"].
[0043] Upon completion of the overall method, there can be an instance where the total number of subgraphs matched for one linguistic structure is higher than for the other linguistic structures. For instance, in the event that Linguistic Structure #1 as discussed in the preceding paragraph is selected, the total number of subgraphs matched is two, whereas for Linguistic Structure #2 the total number of matched subgraphs is three. Accordingly, checking against all possible linguistic structures is required so as to determine which structure has the highest number of matched subgraphs. Next, based on the best linguistic structure selected, a final and merged semantic structure can be produced, as per below: Subgraph #1:
[male-person: "John"] -> (agnt)<-[buy]
Subgraph#2:
[buy]->(thme)->[car:"Ferrari"]
Subgraph#3:
[buy]->(benf)->[female-person:"Mary"]
[0044] From the above, the three subgraphs can be merged to produce a final semantic structure as shown below: Merged Semantic Structure:
[male-person: "John"]<-(agnt)<-[buy]-{->(thme)->[car. "Ferrari"];
->(benf)->[female-person: "Mary"]}.
[0045] Accordingly, in EXAMPLE 1, the final merged semantic structure produced represents the meaning of the text that "John bought a Ferrari for Mary" without having to deal with the complexities of syntactic analysis or analysing the syntax structure. Perceptibly, the different possible variations of meanings and representations of a sentence, which can eventually cause ambiguities, can be avoided with the use of the method of the present invention.
[0046] As would be apparent to a person having ordinary skill in the art, the afore-described methods may be provided in many variations, modifications or alternatives to existing methods and systems. The principles and concepts disclosed herein may also be implemented in various manners which may not have been specifically described herein but which are to be understood as encompassed within the scope of the following claims.

Claims

1. A semantic parsing method for use in natural language processing of an input; the method comprising: performing an entity recognition for extraction of at least one salient entity (102);
performing a coreference resolution to resolve referents (103); and performing a semantic analysis (104) to generate semantic structures; wherein the semantic analysis (104) comprises: performing a semantic pre-processing (104A) for deriving at least one main root verb for retrieval of at least one corresponding linguistic structure; and performing semantic filtering (104B) for selecting the best linguistic structure and merging of semantic structure to represent the input.
2. The semantic parsing method as claimed in Claim 1 wherein performing a semantic pre-processing (104A) further comprises: extracting at least one token of lexical baseforms from the input and generating a vector list (200);
identifying at least one verb type from the vector list (201);
if the verb is an auxiliary verb type (202), discarding all auxiliary words, extracting the verb as it is and identifying a least important weight verb (203); if the verb is a lexical verb (204), transforming the verb into a lexical form, extracting the verb and identifying the least important weight (205); if the verb is a dynamic or stative verb (206), transforming the verb into its lexical form, extracting the verb and identifying the more important weight (207); searching all possible definitions from a linguistic resource and identifying a polysemy count for each verb (209); and
identifying a maximum weight verb (210) from all the identified verbs.
3. The method as claimed in Claim 2, wherein in the event that the verb is identified as one of the following: finite, non-finite, regular, irregular, transitive and intransitive verbs; the method transforms the verb into its lexical form by performing inflection and extracts the verb; whereby the least important weight is identified (208).
4. The method as claimed in Claim 2 wherein the verb with a maximum weight is selected as the main root verb.
5. The method as claimed in Claim 2 wherein in the event that there is a plurality of main verbs identified (211); the method proceeds with selecting a main verb based on the highest polysemy count (212).
6. The method as claimed in Claim 2 wherein in the event that the main root verb is identified, the method further comprises: retrieving all possible candidate linguistic structures from at least one linguistic structure repository (213) based on the main root verb; and performing a semantic graph matching for each of the linguistic structures with the input semantic structure (214).
7. The method as claimed in Claim 1 wherein the semantic filtering (104B) further comprises: identifying at least one subgraph attached to each verb identified and selected (300);
checking whether all identified subgraph(s) are processed (301);
if at least one subgraph is not processed, selecting said subgraph (302) and iterating through all concepts from the input (303);
checking if each concept is conformed to a predefined semantic constraint to each of the concepts in the subgraph (304); if all concepts are conformed, adding a subgraph count (305); and merging the concepts and producing at least one new subgraph (306).
8. The method as claimed in Claim 7, wherein in the event that the predefined semantic constraints are not met, the method reverts to checking whether all subgraphs have been processed and repeating steps (302) to (306).
9. The method as claimed in Claim 7, the method further comprising consolidating and merging all subgraph counts upon completion of iteration (306A).
10. The method as claimed in Claim 7, wherein the method further comprises: if all subgraphs are processed, selecting a linguistic structure with the highest subgraph match count (301A); and
returning a merged semantic structure to represent the input based on the highest match count (301B).
PCT/MY2015/050120 2014-10-27 2015-10-12 Method and system for automated semantic parsing from natural language text WO2016068690A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MY2014003046 2014-10-27
MY2014003046 2014-10-27

Publications (1)

Publication Number Publication Date
WO2016068690A1 (en) 2016-05-06

Family

ID=55857891

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2015/050120 WO2016068690A1 (en) 2014-10-27 2015-10-12 Method and system for automated semantic parsing from natural language text

Country Status (1)

Country Link
WO (1) WO2016068690A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599305A (en) * 2016-12-29 2017-04-26 中南大学 Crowdsourcing-based heterogeneous media semantic meaning fusion method
CN110717017A (en) * 2019-10-17 2020-01-21 腾讯科技(深圳)有限公司 Method for processing corpus
CN112836499A (en) * 2019-11-23 2021-05-25 中国科学院长春光学精密机械与物理研究所 Method for constructing PCB fault diagnosis rule base, electronic equipment and storage medium
US11657229B2 (en) 2020-05-19 2023-05-23 International Business Machines Corporation Using a joint distributional semantic system to correct redundant semantic verb frames

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01113869A (en) * 1987-10-28 1989-05-02 Hitachi Ltd Japanese sentence analyzing system
JPH0869466A (en) * 1994-08-30 1996-03-12 Sumitomo Electric Ind Ltd Natural language analyzing device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01113869A (en) * 1987-10-28 1989-05-02 Hitachi Ltd Japanese sentence analyzing system
JPH0869466A (en) * 1994-08-30 1996-03-12 Sumitomo Electric Ind Ltd Natural language analyzing device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599305A (en) * 2016-12-29 2017-04-26 中南大学 Crowdsourcing-based heterogeneous media semantic meaning fusion method
CN106599305B (en) * 2016-12-29 2020-03-31 中南大学 Crowdsourcing-based heterogeneous media semantic fusion method
CN110717017A (en) * 2019-10-17 2020-01-21 腾讯科技(深圳)有限公司 Method for processing corpus
CN110717017B (en) * 2019-10-17 2022-04-19 腾讯科技(深圳)有限公司 Method for processing corpus
CN112836499A (en) * 2019-11-23 2021-05-25 中国科学院长春光学精密机械与物理研究所 Method for constructing PCB fault diagnosis rule base, electronic equipment and storage medium
CN112836499B (en) * 2019-11-23 2022-11-22 中国科学院长春光学精密机械与物理研究所 Method for constructing PCB fault diagnosis rule base, electronic equipment and storage medium
US11657229B2 (en) 2020-05-19 2023-05-23 International Business Machines Corporation Using a joint distributional semantic system to correct redundant semantic verb frames

Similar Documents

Publication Publication Date Title
EP2664997B1 (en) System and method for resolving named entity coreference
CN106528532B (en) Text error correction method, device and terminal
US10810372B2 (en) Antecedent determining method and apparatus
US11544459B2 (en) Method and apparatus for determining feature words and server
US20040148154A1 (en) System for using statistical classifiers for spoken language understanding
RU2610241C2 (en) Method and system for text synthesis based on information extracted as rdf-graph using templates
US20040148170A1 (en) Statistical classifiers for spoken language understanding and command/control scenarios
de Araújo et al. Re-bert: automatic extraction of software requirements from app reviews using bert language model
CN111444330A (en) Method, device and equipment for extracting short text keywords and storage medium
CN106570180A (en) Artificial intelligence based voice searching method and device
US10223349B2 (en) Inducing and applying a subject-targeted context free grammar
WO2016068690A1 (en) Method and system for automated semantic parsing from natural language text
Jayan et al. A hybrid statistical approach for named entity recognition for malayalam language
CN107526721A (en) A kind of disambiguation method and device to electric business product review vocabulary
CN111723192B (en) Code recommendation method and device
Dunn Frequency vs. association for constraint selection in usage-based construction grammar
CN107480197B (en) Entity word recognition method and device
KR102026967B1 (en) Language Correction Apparatus and Method based on n-gram data and linguistic analysis
Yuwana et al. On part of speech tagger for Indonesian language
CN111858894A (en) Semantic missing recognition method and device, electronic equipment and storage medium
KR102567896B1 (en) Apparatus and method for religious sentiment analysis using deep learning
Fahrni et al. HITS'Monolingual and Cross-lingual Entity Linking System at TAC 2012: A Joint Approach.
CN111814025A (en) Viewpoint extraction method and device
KR100574887B1 (en) Apparatus And Method For Word Sense Disambiguation In Machine Translation System
Cattle et al. Srhr at semeval-2017 task 6: Word associations for humour recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15854036

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15854036

Country of ref document: EP

Kind code of ref document: A1