CN111950290A - Semantic analysis method and device based on concept graph - Google Patents

Publication number
CN111950290A
Authority
CN
China
Prior art keywords
concept
semantic
sentence
semantics
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910364368.9A
Other languages
Chinese (zh)
Inventor
崔颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Genius Technology Co Ltd
Original Assignee
Guangdong Genius Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Genius Technology Co Ltd
Priority to CN201910364368.9A
Publication of CN111950290A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology

Abstract

The invention belongs to the field of semantic analysis and discloses a semantic analysis method and device based on a concept graph. The method comprises the following steps: acquiring a sentence to be analyzed input by a user; extracting keywords from the sentence to be analyzed; matching the keywords with a pre-generated concept graph and determining the semantics of the keywords in the sentence to be analyzed, wherein the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts, and semantics of the concepts; and determining the semantics of the sentence to be analyzed according to the semantics of the keywords in the sentence to be analyzed and the sentence pattern structure of the sentence to be analyzed. The method first obtains the semantics of each word in the sentence to be analyzed from the pre-constructed concept graph, and then accurately obtains the semantics of the sentence from the semantics of each word and the sentence pattern structure, so that the intelligent terminal can give correct feedback.

Description

Semantic analysis method and device based on concept graph
Technical Field
The invention belongs to the technical field of semantic analysis, and particularly relates to a semantic analysis method and device based on a concept graph.
Background
With the rapid development of intelligent terminals and network technologies, intelligent products play an increasingly important role in people's lives, and people are increasingly accustomed to using intelligent terminals to meet various needs. As artificial intelligence technologies mature, terminals are also becoming more intelligent. Natural language, as the most convenient and natural way for humans to express their thoughts, has gradually become the mainstream mode of human-computer interaction in the field of intelligent services.
In human-computer interaction, semantic parsing of natural language is an indispensable step. Semantic parsing analyzes the natural-language sentence input by a user, parses its semantics, converts the semantics into a structured data format that a machine can understand, and then produces the corresponding feedback. Therefore, in a human-computer interaction scenario, accurately parsing the semantics of a sentence is the basis for making a correct response.
Disclosure of Invention
The invention aims to provide a semantic analysis method and device based on a concept graph that accurately acquire the semantics of a sentence input by a user.
The technical scheme provided by the invention is as follows:
In one aspect, a semantic parsing method based on a concept graph is provided, which includes:
acquiring a sentence to be analyzed input by a user;
extracting keywords from the sentence to be analyzed;
matching the keywords with a pre-generated concept graph, and determining the semantics of the keywords in the sentence to be analyzed; the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts and semantics of the concepts;
and determining the semantics of the sentence to be analyzed according to the semantics of the keywords in the sentence to be analyzed and the sentence pattern structure of the sentence to be analyzed.
Further preferably, the method for generating the concept graph includes:
constructing a word library according to the dictionary;
acquiring a large amount of user corpora;
identifying concepts in the corpora using the word library and determining relationships between the concepts;
acquiring all names corresponding to the concepts and the semantics of the concepts by utilizing a word library;
and generating a concept graph according to the relationship among the concepts, all names corresponding to the concepts and the semantics of the concepts.
Further preferably, the matching the keyword with a pre-generated concept graph and determining the semantics of the keyword in the sentence to be parsed specifically include:
matching the keyword with a pre-generated concept graph, and finding out the concept corresponding to the keyword in the concept graph;
and determining the semantics of the keywords in the sentence to be analyzed according to the semantics of the concept.
Further preferably, the determining, according to the semantics of the concept, the semantics of the keyword in the sentence to be parsed specifically includes:
when the concept corresponds to a semantic, the semantic of the concept is the semantic of the keyword in the sentence to be analyzed;
and when the concept corresponds to a plurality of semantics, performing semantic disambiguation on the plurality of semantics of the concept to determine at least one semantic of the concept, wherein the at least one determined semantic of the concept is the semantic of the keyword in the sentence to be analyzed.
Further preferably, performing semantic disambiguation on the plurality of semantics of the concept when the concept corresponds to a plurality of semantics, and determining at least one semantic of the concept as the semantic of the keyword in the sentence to be analyzed, specifically includes:
determining at least one semantic of the concept according to the context of the keyword corresponding to the concept in the sentence to be analyzed;
and taking the at least one determined semantic of the concept as the semantic of the keyword in the sentence to be analyzed.
In another aspect, a semantic parsing apparatus based on a concept graph is further provided, including:
the sentence acquisition module is used for acquiring a sentence to be analyzed input by a user;
the keyword extraction module is used for extracting keywords in the sentence to be analyzed;
the keyword semantic determining module is used for matching the keywords with a pre-generated concept graph and determining the semantics of the keywords in the sentence to be analyzed; the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts and semantics of the concepts;
and the sentence semantic determining module is used for determining the semantics of the sentence to be analyzed according to the semantics of the keyword in the sentence to be analyzed and the sentence pattern structure of the sentence to be analyzed.
Further preferably, the apparatus further comprises a concept graph generation module;
the concept graph generation module comprises:
the word library construction unit is used for constructing a word library according to the dictionary;
the corpus acquiring unit is used for acquiring a large amount of user corpora;
a concept relationship determination unit, configured to identify concepts in the corpora using the word library, and determine relationships between the concepts;
the name and semantic acquiring unit is used for acquiring all names corresponding to the concepts and the semantics of the concepts by utilizing the word library;
and the concept graph generating unit is used for generating a concept graph according to the relationship among the concepts, all names corresponding to the concepts and the semantics of the concepts.
Further preferably, the keyword semantics determining module comprises:
the matching sub-module is used for matching the keyword with a pre-generated concept graph and finding out the concept corresponding to the keyword in the concept graph;
and the keyword semantic determining submodule is used for determining the semantics of the keyword in the sentence to be analyzed according to the semantics of the concept.
Further preferably, the keyword semantic determination submodule includes:
a keyword semantic determining unit, configured to, when the concept corresponds to a semantic, determine that the semantic of the concept is the semantic of the keyword in the sentence to be parsed;
the keyword semantic determining unit is further configured to, when the concept corresponds to a plurality of semantics, perform semantic disambiguation on the plurality of semantics of the concept to determine at least one semantic of the concept, where the at least one determined semantic of the concept is the semantic of the keyword in the sentence to be parsed.
Further preferably, the keyword semantic determining unit is further configured to determine at least one semantic of the concept according to the context of the keyword corresponding to the concept in the sentence to be parsed, and to take the at least one determined semantic of the concept as the semantic of the keyword in the sentence to be parsed.
Compared with the prior art, the semantic analysis method and device based on the concept graph have the beneficial effects that: the method comprises the steps of firstly obtaining the semantics of each word in the sentence to be analyzed according to a pre-constructed concept graph, and then accurately obtaining the semantics of the sentence to be analyzed according to the semantics of each word and the sentence pattern structure of the sentence to be analyzed so that the intelligent terminal can make correct feedback.
Drawings
The above characteristics, technical features, advantages and implementations of the semantic analysis method and device based on a concept graph are further described below, in a clear and understandable manner, through preferred embodiments with reference to the accompanying drawings.
FIG. 1 is a schematic flow chart diagram of a semantic parsing method based on a concept graph according to a first embodiment of the present invention;
FIG. 2 is a schematic flow chart of generating a concept graph in a second embodiment of the semantic parsing method based on the concept graph according to the invention;
FIG. 3 is a schematic flow chart of a semantic parsing method based on concept graph according to a third embodiment of the present invention;
FIG. 4 is a schematic flow chart of a semantic parsing method based on concept graph according to a fourth embodiment of the present invention;
FIG. 5 is a flow chart of a fifth embodiment of the semantic parsing method based on concept graph according to the invention;
FIG. 6 is a block diagram illustrating the structure of an embodiment of a semantic parsing apparatus based on concept graph according to the present invention;
FIG. 7 is a block diagram illustrating the structure of another embodiment of a semantic parsing apparatus based on concept graph according to the present invention.
Description of the reference numerals
100. a sentence acquisition module; 200. a keyword extraction module;
300. a keyword semantic determination module; 310. a matching sub-module;
320. a keyword semantic determination submodule; 321. a keyword semantic determination unit;
400. a sentence semantic determination module; 500. a concept graph generation module;
510. a word library construction unit; 520. a corpus acquiring unit;
530. a concept relationship determination unit; 540. a name and semantic acquisition unit;
550. a concept graph generation unit.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will be made with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
For the sake of simplicity, the drawings only schematically show the parts relevant to the present invention and do not represent the actual structure of a product. In addition, to keep the drawings concise and easy to understand, where several components in a drawing have the same structure or function, only one of them is schematically illustrated or labeled. In this document, "one" means not only "only one" but also covers the case of "more than one".
According to a first embodiment provided by the present invention, as shown in fig. 1, a semantic parsing method based on a concept graph includes:
S1000, obtaining a sentence to be analyzed input by a user;
Specifically, the sentence to be analyzed may be an unstructured text sentence input by the user, or voice information captured by a microphone or another voice acquisition device. The voice information may be voice input by the user in real time.
S2000, extracting keywords from the sentence to be analyzed;
Specifically, after the sentence to be analyzed is obtained, word segmentation and part-of-speech tagging are performed on it, and keywords are then extracted based on the word segmentation result. The keywords are the words that remain after meaningless words such as "in" and "with" are removed from the sentence to be analyzed.
If the obtained sentence to be analyzed is an unstructured text sentence, word segmentation and part-of-speech tagging are performed on it directly. Existing methods can be used for both: for example, word segmentation can use the longest-match word segmentation method, a word segmentation method based on character string matching, and the like, and part-of-speech tagging can use a method based on an HMM (Hidden Markov Model), and the like.
If the obtained sentence to be analyzed is voice information, the voice information is recognized as text information, and then word segmentation and part-of-speech tagging are carried out on the text information.
For example, the result of word segmentation and part-of-speech tagging for "Hangzhou West Lake landscape is good, is a tourist resort" is:
Hangzhou/n West Lake/n landscape/n is good/a is/v tourist resort/n. Here, the letter after each word in the segmentation result denotes its part of speech: /n denotes a noun, /v a verb, and /a an adjective.
According to the word segmentation result, the keywords extracted from "Hangzhou West Lake landscape is good, is a tourist resort" are "Hangzhou", "West Lake", "good", "Ye" and "tourist resort".
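The segmentation and keyword extraction of step S2000 can be sketched as follows. This is a minimal illustration, assuming a forward longest-match segmenter over a toy vocabulary and a hand-picked stop-word list; part-of-speech tagging is omitted. The function names `forward_max_match` and `extract_keywords`, the vocabulary, and the unspaced English test string are hypothetical and are not taken from the patent.

```python
def forward_max_match(text, vocab):
    """Segment text by repeatedly taking the longest vocabulary word at the cursor."""
    max_len = max(len(word) for word in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches;
        # a single character is emitted as a fallback when nothing matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def extract_keywords(tokens, stop_words):
    """Keep the tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical toy data (assumption, not from the patent).
VOCAB = {"hangzhou", "west", "lake", "westlake", "is", "a",
         "tourist", "resort", "touristresort"}
STOP_WORDS = {"is", "a"}

tokens = forward_max_match("hangzhouwestlakeisatouristresort", VOCAB)
keywords = extract_keywords(tokens, STOP_WORDS)
print(tokens)
print(keywords)
```

Longest-match segmentation prefers "westlake" over "west" because the longer dictionary entry is tried first; a production system would more likely rely on a trained segmenter and tagger.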
S3000, matching the keywords with a pre-generated concept graph, and determining the semantics of the keywords in the sentence to be analyzed; the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts and semantics of the concepts;
Specifically, a concept graph generally consists of "nodes", "links" and "related word labels". Each node represents one concept by means of a geometric shape, pattern, text, etc. Links represent meaningful relationships between different nodes, and lines of various forms are commonly used to connect the nodes. A word label may describe the relationship between the concepts on different nodes, or may be a detailed description of the concept on a node.
To facilitate parsing sentences with the concept graph, the pre-generated concept graph in this embodiment includes the relationships between concepts, the names corresponding to the concepts, and the semantics of the concepts.
After extracting the keywords in the sentence to be analyzed, matching the extracted keywords with the nodes in the concept graph, and then determining the semantics of each keyword in the sentence to be analyzed according to the matching result of each keyword.
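One possible in-memory representation of such a concept graph, with keyword lookup over it, is sketched below. The class names `Concept` and `ConceptGraph`, the method names, the relation label `located_in`, and the sample entries are assumptions made for illustration; the patent does not prescribe a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept node with its alternative names and its possible semantics."""
    label: str
    names: set = field(default_factory=set)
    semantics: list = field(default_factory=list)


@dataclass
class ConceptGraph:
    concepts: dict = field(default_factory=dict)   # label -> Concept
    relations: list = field(default_factory=list)  # (source, relation, target)

    def add_concept(self, label, names=(), semantics=()):
        # The concept label itself always counts as one of its names.
        self.concepts[label] = Concept(label, set(names) | {label}, list(semantics))

    def relate(self, source, relation, target):
        self.relations.append((source, relation, target))

    def semantics_of(self, keyword):
        """Return the semantics of every concept whose names include the keyword."""
        found = []
        for concept in self.concepts.values():
            if keyword in concept.names:
                found.extend(concept.semantics)
        return found


# Hypothetical sample content (assumption, not from the patent).
graph = ConceptGraph()
graph.add_concept("West Lake", names=["Xihu"], semantics=["a freshwater lake in Hangzhou"])
graph.add_concept("Hangzhou", semantics=["a city in Zhejiang, China"])
graph.relate("West Lake", "located_in", "Hangzhou")

print(graph.semantics_of("Xihu"))
```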
S4000, determining the semantics of the sentence to be analyzed according to the semantics of the keyword in the sentence to be analyzed and the sentence structure of the sentence to be analyzed.
Specifically, after the semantics of each keyword in the sentence to be parsed is obtained, the semantics of the sentence to be parsed can be determined by combining the semantics of each keyword, the position of the keyword in the sentence to be parsed, and the sentence structure of the sentence to be parsed.
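The patent does not specify how keyword semantics, keyword positions and sentence structure are combined. As one hedged illustration, the sketch below assumes a simple noun-verb-noun sentence pattern and fills a subject/predicate/object frame in sentence order; the function name `sentence_semantics`, the frame layout, and the sample glosses are all hypothetical.

```python
def sentence_semantics(tagged_keywords, keyword_semantics):
    """Fill a subject/predicate/object frame from (word, tag) pairs in sentence order."""
    frame = {"subject": None, "predicate": None, "object": None}
    for word, tag in tagged_keywords:
        # Fall back to the word itself when no semantics were found for it.
        meaning = keyword_semantics.get(word, word)
        if tag == "v":
            if frame["predicate"] is None:
                frame["predicate"] = meaning
        elif tag == "n":
            # A noun before the verb fills the subject slot, one after it the object slot.
            slot = "subject" if frame["predicate"] is None else "object"
            if frame[slot] is None:
                frame[slot] = meaning
    return frame


# Hypothetical input (assumption): keywords with part-of-speech tags and their semantics.
tagged = [("West Lake", "n"), ("is", "v"), ("tourist resort", "n")]
meanings = {
    "West Lake": "a freshwater lake in Hangzhou",
    "tourist resort": "a place people visit for leisure",
}
frame = sentence_semantics(tagged, meanings)
print(frame)
```

The resulting dictionary is one example of the structured, machine-readable form that the background section says semantic parsing should produce.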
In this embodiment, the semantics of each word in the sentence to be parsed is first acquired according to the pre-constructed concept graph, and then the semantics of the sentence to be parsed is accurately acquired according to the semantics of each word and the sentence pattern structure of the sentence to be parsed, so that the intelligent terminal can make correct feedback.
According to a second embodiment provided by the invention, a semantic parsing method based on a concept graph comprises the following steps:
S1000, obtaining a sentence to be analyzed input by a user;
S2000, extracting keywords from the sentence to be analyzed;
S3000, matching the keywords with a pre-generated concept graph, and determining the semantics of the keywords in the sentence to be analyzed; the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts and semantics of the concepts;
S4000, determining the semantics of the sentence to be analyzed according to the semantics of the keyword in the sentence to be analyzed and the sentence structure of the sentence to be analyzed.
As shown in fig. 2, the method for generating the concept graph includes:
S0100, constructing a word library according to the dictionary;
Specifically, the dictionary contains all words, the semantics of the words, synonyms of the words, and all possible names of the words. A word library is constructed from the information contained in the dictionary. The constructed word library includes the association relations among the words, the semantics of the words, all possible names of the words, and the like, and a mapping between words and names is established.
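Step S0100 can be sketched as building per-word entries plus a name-to-word mapping. The dictionary layout (keys `semantics`, `names`, `related`), the function name `build_word_library`, and the sample glosses are assumptions for illustration only.

```python
def build_word_library(dictionary):
    """Build a word library: per-word entries plus a name -> words mapping."""
    entries, name_to_words = {}, {}
    for word, info in dictionary.items():
        # Every word is also one of its own names.
        names = {word, *info.get("names", [])}
        entries[word] = {
            "semantics": list(info.get("semantics", [])),
            "names": names,
            "related": dict(info.get("related", {})),
        }
        for name in names:
            name_to_words.setdefault(name, set()).add(word)
    return {"entries": entries, "name_to_words": name_to_words}


# Hypothetical dictionary content (assumption, not from the patent).
DICTIONARY = {
    "name": {"semantics": ["what a person is called"], "names": ["full name", "nickname"]},
    "sports": {
        "semantics": ["physical activity"],
        "names": ["training"],
        "related": {"hypernym": "activity"},
    },
}
library = build_word_library(DICTIONARY)
print(library["name_to_words"]["nickname"])
```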
S0200, acquiring a large amount of user corpora;
Specifically, corpora can be obtained in various ways, for example by collecting corpora while users use the intelligent terminal, or by crawling a large amount of corpora with technologies such as web crawlers. Other collection methods are also possible, and the methods can be combined so that the corpora are richer and cover a wider range.
S0300, identifying concepts in the corpora by using the word library and determining the relationships between the concepts;
Specifically, the word library includes the words as well as all possible names of the words. Using the word library, all names that appear in the word library are identified in the corpora, and at least one word corresponding to each identified name is determined according to the mapping between words and names; that word is the concept. The relationships between concepts are then determined according to the association relations between the words in the word library.
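A minimal sketch of step S0300 follows, assuming the word library exposes a name-to-words mapping and a list of (word, relation, word) triples. Names are found by plain substring search, which is a simplification; the function names and sample data are hypothetical.

```python
def identify_concepts(corpus, name_to_words):
    """Return the concepts whose names occur in any corpus sentence."""
    found = set()
    for sentence in corpus:
        for name, words in name_to_words.items():
            if name in sentence:  # naive substring match (assumption)
                found |= words
    return found


def concept_relations(concepts, word_relations):
    """Keep only the word-library relations whose two ends are both identified concepts."""
    return [(a, rel, b) for (a, rel, b) in word_relations if a in concepts and b in concepts]


# Hypothetical word-library excerpts and corpus (assumption, not from the patent).
NAME_TO_WORDS = {
    "name": {"name"}, "full name": {"name"}, "nickname": {"name"},
    "sports": {"sports"}, "training": {"sports"},
    "activity": {"activity"},
}
WORD_RELATIONS = [("sports", "is_a", "activity"), ("name", "attribute_of", "person")]
CORPUS = ["what is your nickname", "training is a healthy activity"]

concepts = identify_concepts(CORPUS, NAME_TO_WORDS)
relations = concept_relations(concepts, WORD_RELATIONS)
print(sorted(concepts))
print(relations)
```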
S0400, acquiring all names corresponding to the concepts and the semantics of the concepts by using the word library;
Specifically, after the concepts in the corpora are extracted, all names corresponding to the concepts are expanded according to the word library, which enriches the generated concept graph and improves the success rate of semantic analysis. The semantics of all the concepts are then obtained from the word library.
For example, for the concept "name", all possible corresponding names are "name", "full name", "nickname", etc. As another example, for the concept "sports", all possible corresponding names are "training", "sports", and the like.
S0500, generating a concept graph according to the relationships between the concepts, all names corresponding to the concepts, and the semantics of the concepts.
Specifically, once the relationships between concepts, all names corresponding to the concepts, and the semantics of the concepts have been acquired, a concept graph for semantic analysis of natural language can be generated. In the generated concept graph, each node representing a concept has under it a plurality of nodes representing its names and nodes representing its semantics, and the nodes representing concepts may have association relations with one another, such as a superordinate-subordinate relation or a parallel relation.
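Step S0500 can be illustrated by assembling typed nodes and labelled edges, with name nodes and semantics nodes hanging under each concept node. The edge labels `has_name` and `has_semantics`, the tuple-based node encoding, and the sample data are assumptions, not the patent's format.

```python
def generate_concept_graph(relations, names, semantics):
    """Build a concept graph as a list of labelled edges between typed nodes."""
    edges = []
    # Name nodes under each concept node.
    for concept, concept_names in names.items():
        for name in concept_names:
            edges.append((("concept", concept), "has_name", ("name", name)))
    # Semantics nodes under each concept node.
    for concept, meanings in semantics.items():
        for meaning in meanings:
            edges.append((("concept", concept), "has_semantics", ("semantics", meaning)))
    # Association relations between concept nodes.
    for source, relation, target in relations:
        edges.append((("concept", source), relation, ("concept", target)))
    return edges


# Hypothetical inputs (assumption, not from the patent).
edges = generate_concept_graph(
    relations=[("sports", "is_a", "activity")],
    names={"sports": ["sports", "training"]},
    semantics={"sports": ["physical activity"]},
)
for edge in edges:
    print(edge)
```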
In the embodiment, the concept graph is generated according to the corpus, so that concepts, names and the like in the generated concept graph better conform to language use habits in a language interaction process, and the success rate of semantic analysis in a human-computer interaction process is further improved.
According to a third embodiment provided by the present invention, as shown in fig. 3, a semantic parsing method based on a concept graph includes:
S1000, obtaining a sentence to be analyzed input by a user;
S2000, extracting keywords from the sentence to be analyzed;
S3100, matching the keywords with a pre-generated concept graph, and finding out concepts corresponding to the keywords in the concept graph; the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts and semantics of the concepts;
Specifically, when the keyword is matched against the nodes of the pre-generated concept graph, if the keyword directly matches a node representing a concept, that node is the concept corresponding to the keyword. If the keyword matches a node representing a name, the concept connected to that name, i.e. the concept corresponding to the name, is obtained, and it is the concept corresponding to the keyword.
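The two matching cases of step S3100 can be sketched as a direct lookup on concept nodes followed by a lookup through name nodes. The function name `find_concepts` and the sample nodes are hypothetical.

```python
def find_concepts(keyword, concept_nodes, name_edges):
    """Return the concepts for a keyword: direct hit first, otherwise via name nodes."""
    if keyword in concept_nodes:
        # Case 1: the keyword directly matches a node representing a concept.
        return [keyword]
    # Case 2: the keyword matches a name node; follow the edge to its concept.
    return sorted(concept for name, concept in name_edges if name == keyword)


# Hypothetical graph content (assumption, not from the patent).
CONCEPT_NODES = {"name", "sports"}
NAME_EDGES = [("nickname", "name"), ("full name", "name"), ("training", "sports")]

print(find_concepts("sports", CONCEPT_NODES, NAME_EDGES))
print(find_concepts("nickname", CONCEPT_NODES, NAME_EDGES))
```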
S3200, determining the semantics of the keywords in the sentence to be analyzed according to the semantics of the concept;
Specifically, after the concept corresponding to the keyword is obtained, the semantics of the keyword in the sentence to be analyzed is determined according to the semantics corresponding to the concept in the concept graph.
S4000, determining the semantics of the sentence to be analyzed according to the semantics of the keyword in the sentence to be analyzed and the sentence structure of the sentence to be analyzed.
In this embodiment, the nodes corresponding to the keywords are looked up in the concept graph, and the semantics of the keywords are then determined according to the association relations between the nodes. Because the hierarchical relations and association relations of the concepts in the concept graph are clear, the semantics of the keywords can be obtained quickly and conveniently, which further improves the speed of semantic analysis.
According to a fourth embodiment provided by the present invention, as shown in fig. 4, a semantic parsing method based on a concept graph includes:
S1000, obtaining a sentence to be analyzed input by a user;
S2000, extracting keywords from the sentence to be analyzed;
S3100, matching the keywords with a pre-generated concept graph, and finding out concepts corresponding to the keywords in the concept graph; the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts and semantics of the concepts;
S3210, when the concept corresponds to one semantic, the semantic of the concept is the semantic of the keyword in the sentence to be analyzed;
S3220, when the concept corresponds to a plurality of semantics, performing semantic disambiguation on the plurality of semantics of the concept, and determining at least one semantic of the concept, wherein the determined semantic of the concept is the semantic of the keyword in the sentence to be analyzed;
S4000, determining the semantics of the sentence to be analyzed according to the semantics of the keyword in the sentence to be analyzed and the sentence structure of the sentence to be analyzed.
Specifically, when the semantics of the keyword in the sentence to be parsed is determined according to the semantics of the concept, several situations can arise: one concept may correspond to one semantic, or one concept may correspond to multiple semantics. When the concept corresponds to only one semantic, the semantic of the concept is the semantic of the keyword in the sentence to be analyzed.
However, when a concept corresponds to a plurality of semantics, i.e. the word is ambiguous, it is necessary to determine which of those semantics is the semantic of the keyword in the sentence to be analyzed, so the plurality of semantics corresponding to the concept need to be disambiguated. Semantic disambiguation eliminates interfering semantics and improves the accuracy of semantic analysis.
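The branching between S3210 and S3220 can be sketched as follows. The `disambiguate` callback stands in for whatever disambiguation method is used, and keeping all candidates when the callback rejects every one of them is an assumption made here so that at least one semantic is always returned; the function names and sample semantics are hypothetical.

```python
def keyword_semantics(concept_semantics, disambiguate):
    """Return the semantics of a keyword given the semantics of its concept."""
    if len(concept_semantics) == 1:
        # S3210: the concept has a single semantic, so use it directly.
        return list(concept_semantics)
    # S3220: several semantics, so let the disambiguator narrow them down.
    chosen = disambiguate(concept_semantics)
    # Assumption: if nothing survives, keep every candidate rather than none.
    return list(chosen) if chosen else list(concept_semantics)


def keep_drive_away(candidates):
    """Hypothetical disambiguator that keeps only the 'drive away' reading."""
    return [c for c in candidates if "away" in c]


single = keyword_semantics(["a city in Zhejiang, China"], keep_drive_away)
multiple = keyword_semantics(["to catch up with", "to drive away"], keep_drive_away)
undecided = keyword_semantics(["to catch up with", "to overtake"], keep_drive_away)
print(single, multiple, undecided)
```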
According to a fifth embodiment provided by the present invention, as shown in fig. 5, a semantic parsing method based on a concept graph includes:
S1000, obtaining a sentence to be analyzed input by a user;
S2000, extracting keywords from the sentence to be analyzed;
S3100, matching the keywords with a pre-generated concept graph, and finding out concepts corresponding to the keywords in the concept graph; the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts and semantics of the concepts;
S3210, when the concept corresponds to one semantic, the semantic of the concept is the semantic of the keyword in the sentence to be analyzed;
S3221, determining at least one semantic of the concept according to the context of the keyword corresponding to the concept in the sentence to be analyzed;
S3222, taking the at least one determined semantic of the concept as the semantic of the keyword in the sentence to be analyzed;
S4000, determining the semantics of the sentence to be analyzed according to the semantics of the keyword in the sentence to be analyzed and the sentence structure of the sentence to be analyzed.
Specifically, during semantic disambiguation, at least one semantic of a concept may be determined according to the context of the keyword in the sentence to be parsed.
For example, one and the same source-language word may mean either "catch up with" or "drive away". In sentence 1, "we should learn from the advanced and catch up with the advanced", the word collocates with "the advanced (people)", so it means "catch up with". In sentence 2, "he is driving away flies", the word collocates with "flies (insects)", so it means "drive away". By analyzing the context of the keyword in the sentence to be analyzed, the semantics of the keyword can be disambiguated, the interfering semantics removed, and the correct semantics obtained.
If more than one semantic remains after the semantics of the keyword have been disambiguated, all of the remaining semantics may hold, and they can be output together.
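Context-based disambiguation can be illustrated with a simple overlap heuristic: each candidate semantic carries a set of cue words, and the candidates whose cues overlap most with the sentence context are kept. This heuristic, the function name `disambiguate_by_context`, and the cue sets are assumptions for illustration; the patent does not name a specific algorithm.

```python
def disambiguate_by_context(candidates, context_words):
    """Keep the candidate semantics whose cue words overlap most with the context."""
    if not candidates:
        return []
    context = set(context_words)
    scores = {meaning: len(cues & context) for meaning, cues in candidates.items()}
    best = max(scores.values())
    # Ties are all kept, so several semantics may be output at the same time.
    return [meaning for meaning, score in scores.items() if score == best]


# Hypothetical cue words for the two readings discussed above (assumption).
CANDIDATES = {
    "catch up with": {"advanced", "leader", "progress"},
    "drive away": {"flies", "insects", "birds"},
}

print(disambiguate_by_context(CANDIDATES, ["we", "learn", "advanced"]))
print(disambiguate_by_context(CANDIDATES, ["he", "flies"]))
print(disambiguate_by_context(CANDIDATES, ["hello"]))
```

When the context gives no evidence either way, both readings tie at zero and both are returned, which mirrors the case where more than one semantic remains after disambiguation.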
According to a sixth embodiment of the present invention, as shown in fig. 6, a semantic analysis device based on a concept graph includes:
a sentence acquisition module 100, configured to acquire a sentence to be analyzed, which is input by a user;
Specifically, the sentence to be analyzed may be an unstructured text sentence input by the user, or voice information captured by a microphone or another voice acquisition device. The voice information may be voice input by the user in real time.
A keyword extraction module 200, configured to extract a keyword in the sentence to be parsed;
Specifically, after the sentence to be analyzed is obtained, word segmentation and part-of-speech tagging are performed on it, and keywords are then extracted based on the word segmentation result. The keywords are the words that remain after meaningless words such as "in" and "with" are removed from the sentence to be analyzed.
If the obtained sentence to be analyzed is an unstructured text sentence, word segmentation and part-of-speech tagging are performed on it directly. Existing methods can be used for both: for example, word segmentation can use the longest-match word segmentation method, a word segmentation method based on character string matching, and the like, and part-of-speech tagging can use a method based on an HMM (Hidden Markov Model), and the like.
If the obtained sentence to be analyzed is voice information, the voice information is recognized as text information, and then word segmentation and part-of-speech tagging are carried out on the text information.
For example, the result of word segmentation and part-of-speech tagging for "Hangzhou West Lake landscape is good, is a tourist resort" is:
Hangzhou/n West Lake/n landscape/n is good/a is/v tourist resort/n. Here, the letter after each word in the segmentation result denotes its part of speech: /n denotes a noun, /v a verb, and /a an adjective.
According to the word segmentation result, the keywords extracted from "Hangzhou West Lake landscape is good, is a tourist resort" are "Hangzhou", "West Lake", "good", "Ye" and "tourist resort".
A keyword semantic determining module 300, configured to match the keyword with a pre-generated concept graph, and determine the semantic of the keyword in the to-be-analyzed sentence; the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts and semantics of the concepts;
Specifically, a concept graph generally consists of "nodes", "links" and "related word labels". Each node represents one concept by means of a geometric shape, pattern, text, etc. Links represent meaningful relationships between different nodes, and lines of various forms are commonly used to connect the nodes. A word label may describe the relationship between the concepts on different nodes, or may be a detailed description of the concept on a node.
To facilitate parsing sentences with the concept graph, the pre-generated concept graph in this embodiment includes the relationships between concepts, the names corresponding to the concepts, and the semantics of the concepts.
After extracting the keywords in the sentence to be analyzed, matching the extracted keywords with the nodes in the concept graph, and then determining the semantics of each keyword in the sentence to be analyzed according to the matching result of each keyword.
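One possible in-memory representation of such a concept graph, together with the keyword lookup, is sketched below. The class names `Concept` and `ConceptGraph`, the name index, and the sample semantic descriptions are assumptions made for illustration; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    label: str                                      # the concept node itself
    names: set = field(default_factory=set)         # name nodes under the concept
    semantics: list = field(default_factory=list)   # semantic nodes under the concept
    relations: dict = field(default_factory=dict)   # relation label -> set of concept labels


class ConceptGraph:
    def __init__(self):
        self.concepts = {}     # concept label -> Concept
        self.name_index = {}   # name (or label) -> set of concept labels

    def add_concept(self, label, names=(), semantics=()):
        # The label is indexed too, so a keyword can match the concept directly.
        concept = Concept(label, set(names) | {label}, list(semantics))
        self.concepts[label] = concept
        for name in concept.names:
            self.name_index.setdefault(name, set()).add(label)
        return concept

    def relate(self, source, relation, target):
        """Add a labelled link from one concept to another."""
        self.concepts[source].relations.setdefault(relation, set()).add(target)

    def match(self, keyword):
        """Return the concepts whose label or one of whose names equals the keyword."""
        return [self.concepts[label] for label in sorted(self.name_index.get(keyword, ()))]

    def keyword_semantics(self, keyword):
        """Collect the candidate semantics of a keyword from its matched concepts."""
        result = []
        for concept in self.match(keyword):
            result.extend(concept.semantics)
        return result


graph = ConceptGraph()
# The semantic descriptions below are illustrative placeholders.
graph.add_concept("name", names={"full name", "nickname"},
                  semantics=["what a person or thing is called"])
graph.add_concept("sports", names={"training"}, semantics=["physical exercise"])
graph.relate("sports", "parallel", "name")
```

Indexing names separately keeps the lookup a single dictionary access regardless of whether the keyword hits a concept node or a name node.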
A sentence semantics determining module 400, configured to determine semantics of the sentence to be parsed according to the semantics of the keyword in the sentence to be parsed and a sentence structure of the sentence to be parsed.
Specifically, after the semantics of each keyword in the sentence to be parsed is obtained, the semantics of the sentence to be parsed can be determined by combining the semantics of each keyword, the position of the keyword in the sentence to be parsed, and the sentence structure of the sentence to be parsed.
In this embodiment, the semantics of each keyword in the sentence to be parsed is first obtained from the pre-constructed concept graph, and the semantics of the sentence is then accurately determined from the semantics of each keyword and the sentence structure, so that the intelligent terminal can respond correctly.
According to a seventh embodiment provided by the present invention, as shown in fig. 7, a semantic analysis device based on a concept graph includes:
a sentence acquisition module 100, configured to acquire a sentence to be analyzed, which is input by a user;
a keyword extraction module 200, configured to extract a keyword in the sentence to be parsed;
a keyword semantic determining module 300, configured to match the keyword with a pre-generated concept graph and determine the semantics of the keyword in the sentence to be analyzed; the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts, and semantics of the concepts;
a sentence semantics determining module 400, configured to determine semantics of the sentence to be parsed according to the semantics of the keyword in the sentence to be parsed and a sentence structure of the sentence to be parsed.
A concept graph generation module 500 is also included;
the concept graph generation module 500 includes:
a word library constructing unit 510, configured to construct a word library according to a dictionary;
Specifically, the dictionary includes all words, the semantics of the words, synonyms of the words, and all possible names of the words. A word library is constructed from this information; the constructed word library includes the association relations among the words, the semantics of the words, all possible names of the words, and so on, and a mapping relation between words and names is established.
A corpus acquiring unit 520, configured to acquire a large amount of user corpora;
Specifically, corpora can be obtained in various ways, for example by collecting corpora while users use the intelligent terminal, or by crawling a large amount of corpora with a web crawler or similar technology. Other collection methods are also possible, and the methods can be combined so that the corpus is richer and covers a wider range.
A concept relationship determining unit 530 for identifying concepts in the corpus using the word library and determining relationships between the concepts;
Specifically, the word library includes each word and all possible names of that word. Using the word library, all names that appear in the word library are identified in the corpus, and at least one word corresponding to each identified name is determined according to the mapping relation between words and names; that word is a concept. The relations between concepts are then determined according to the association relations between the words in the word library.
A name and semantic acquiring unit 540, configured to acquire all names corresponding to the concepts and semantics of the concepts by using a word library;
Specifically, after the concepts in the corpus are extracted, all names corresponding to each concept are added from the word library, which enriches the generated concept graph and improves the success rate of semantic analysis. The semantics of all the concepts are then obtained from the word library.
For example, for the concept "name", all possible corresponding names include "name", "full name", "nickname", and so on. As another example, for the concept "sports", all possible corresponding names include "training", "sports", and so on.
A concept graph generating unit 550, configured to generate a concept graph according to the relationship between the concepts, all names corresponding to the concepts, and semantics of the concepts.
Specifically, once the relations between the concepts, all names corresponding to the concepts, and the semantics of the concepts have been acquired, a concept graph for semantic analysis of natural language can be generated. In the generated concept graph, each node representing a concept has under it a plurality of nodes representing its names and nodes representing its semantics, and the nodes representing concepts may be linked by association relations such as a superordinate-subordinate relation or a parallel relation.
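The generation flow carried out by units 510 to 550 can be sketched end to end as follows. This is a hedged illustration only: the word library is assumed to be a plain dictionary keyed by concept, the entry fields `names`, `semantics`, and `related` are hypothetical, and names are located by naive lowercase substring search, which a real system would replace with proper segmentation.

```python
def build_concept_graph(word_library, corpus):
    """Build a concept graph (as nested dicts) from a word library and a corpus."""
    # Mapping relation between names and words (concepts).
    name_to_concepts = {}
    for concept, entry in word_library.items():
        for name in {concept, *entry.get("names", [])}:
            name_to_concepts.setdefault(name, set()).add(concept)

    # Identify the concepts whose names occur in the corpus.
    found = set()
    for sentence in corpus:
        text = sentence.lower()
        for name, concepts in name_to_concepts.items():
            if name in text:  # naive substring match (assumption)
                found |= concepts

    # Attach all names, semantics, and relations to each identified concept.
    graph = {}
    for concept in sorted(found):
        entry = word_library[concept]
        relations = {
            relation: sorted(t for t in targets if t in found)
            for relation, targets in entry.get("related", {}).items()
        }
        graph[concept] = {
            "names": sorted({concept, *entry.get("names", [])}),
            "semantics": list(entry.get("semantics", [])),
            # Keep only relations that point at concepts present in the graph.
            "relations": {r: t for r, t in relations.items() if t},
        }
    return graph


# Hypothetical word library; the semantic descriptions are placeholders.
WORD_LIBRARY = {
    "name": {"names": ["full name", "nickname"],
             "semantics": ["what a person or thing is called"], "related": {}},
    "sports": {"names": ["training"], "semantics": ["physical exercise"],
               "related": {"parallel": ["name"]}},
    "weather": {"names": ["climate"], "semantics": ["state of the atmosphere"],
                "related": {}},
}
CORPUS = ["What is your nickname?", "I did some training today"]
concept_graph = build_concept_graph(WORD_LIBRARY, CORPUS)
```

Only concepts that actually occur in the corpus enter the graph, so "weather" is omitted here; every retained concept still receives all of its names from the word library, including names that never appeared in the corpus.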
In this embodiment, the concept graph is generated from the corpus, so the concepts, names, and other elements in the generated concept graph better match the language habits of actual language interaction, which further improves the success rate of semantic analysis during human-computer interaction.
Preferably, the keyword semantics determining module 300 includes:
the matching sub-module 310 is configured to match the keyword with a pre-generated concept graph, and find a concept corresponding to the keyword in the concept graph;
Specifically, when the keyword is matched against the nodes in the pre-generated concept graph, if the keyword directly matches a node representing a concept, that node is the concept corresponding to the keyword. If the keyword matches a node representing a name, the concept connected to that name, that is, the concept the name belongs to, is taken as the concept corresponding to the keyword.
And the keyword semantic determining submodule 320 is configured to determine, according to the semantic of the concept, the semantic of the keyword in the sentence to be parsed.
Specifically, after the concept corresponding to the keyword is obtained, the semantics of the keyword in the sentence to be analyzed is determined according to the semantics corresponding to the concept in the concept graph.
In this embodiment, the node corresponding to a keyword is looked up in the concept graph, and the semantics of the keyword is then determined according to the association relations between the nodes. Because the hierarchical and association relations of the concepts in the concept graph are clear, the semantics of the keyword can be obtained quickly and conveniently, which further increases the speed of semantic analysis.
Preferably, the keyword semantics determining sub-module 320 includes:
a keyword semantic determining unit 321, configured to, when the concept corresponds to one semantic, take the semantic of the concept as the semantic of the keyword in the sentence to be analyzed;
the keyword semantic determining unit 321 is further configured to, when the concept corresponds to a plurality of semantics, perform semantic disambiguation on the plurality of semantics of the concept to determine at least one semantic of the concept, where the at least one determined semantic of the concept is the semantic of the keyword in the sentence to be parsed.
Specifically, when the semantics of the keyword in the sentence to be parsed is determined according to the semantics of the concept, several situations may arise: a concept may correspond to one semantic or to a plurality of semantics. When the concept corresponds to only one semantic, that semantic is the semantic of the keyword in the sentence to be analyzed.
However, when a concept corresponds to a plurality of semantics, that is, when the word is polysemous, it must be determined which of the semantics is the semantic of the keyword in the sentence to be analyzed, so the plurality of semantics corresponding to the concept must be disambiguated. Disambiguation eliminates interfering semantics and improves the accuracy of semantic analysis.
Preferably, the keyword semantic determining unit 321 is further configured to determine at least one semantic of the concept according to a context of a keyword corresponding to the concept in the sentence to be parsed; and taking at least one semantic meaning determined by the concept as the semantic meaning of the keyword in the sentence to be analyzed.
Specifically, during semantic disambiguation, at least one semantic of a concept may be determined according to the context of the keyword in the sentence to be parsed.
For example, the source-language word translated here as "catch up" may mean either "pursue" or "drive away". In sentence 1, "we should learn from the advanced and catch up with the advanced", the word is collocated with "the advanced (people)", so in sentence 1 it means "pursue". In sentence 2, "he is driving away flies", the word is collocated with "flies (insects)", so it means "drive away". By analyzing the context of the keyword in the sentence to be analyzed, the semantics of the keyword can be disambiguated, the interfering semantics can be removed, and the correct semantics can be obtained.
If more than one semantic remains after the semantics of the keyword is disambiguated, all of the remaining semantics may be valid, and they may be output together.
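Context-based disambiguation of this kind can be sketched by scoring each candidate semantic against the other words in the sentence. The approach below, counting overlaps with per-sense collocation cue words, is an assumption chosen for illustration; the patent states only that context is used. The sense labels and cue sets are hypothetical.

```python
def disambiguate(senses, context_words):
    """Keep the candidate semantics that best fit the context.

    `senses` maps each candidate semantic to a set of lowercase cue words
    that typically co-occur with it. Every semantic that ties for the best
    overlap score is returned, so several semantics may remain.
    """
    context = {word.lower() for word in context_words}
    scores = {sense: len(cues & context) for sense, cues in senses.items()}
    best = max(scores.values(), default=0)
    return sorted(sense for sense, score in scores.items() if score == best)


# Hypothetical cue words for the two semantics discussed above.
SENSES = {
    "pursue": {"advanced", "leader"},
    "drive away": {"flies", "mosquitoes"},
}
sentence_1 = disambiguate(SENSES, ["we", "learn", "advanced"])
sentence_2 = disambiguate(SENSES, ["he", "flies"])
no_context = disambiguate(SENSES, ["hello"])
```

When the context contains no cue word, every candidate ties at zero and all semantics are returned, which mirrors the case in which several semantics remain and are output together.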
It should be noted that the above embodiments can be freely combined as needed. The foregoing describes only preferred embodiments of the present invention. Those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A semantic parsing method based on a concept graph is characterized by comprising the following steps:
acquiring a sentence to be analyzed input by a user;
extracting keywords in the sentence to be analyzed;
matching the keywords with a pre-generated concept graph, and determining the semantics of the keywords in the sentence to be analyzed; the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts, and semantics of the concepts;
and determining the semantics of the sentence to be analyzed according to the semantics of the keyword in the sentence to be analyzed and the sentence pattern structure of the sentence to be analyzed.
2. The semantic parsing method based on the concept graph according to claim 1, wherein the generation method of the concept graph is as follows:
constructing a word library according to the dictionary;
acquiring a large amount of user corpora;
identifying concepts in the corpus using the word library and determining relationships between the concepts;
acquiring all names corresponding to the concepts and the semantics of the concepts by utilizing a word library;
and generating a concept graph according to the relationship among the concepts, all names corresponding to the concepts and the semantics of the concepts.
3. The semantic parsing method according to claim 1, wherein the matching of the keyword with a pre-generated concept graph and the determining of the semantic meaning of the keyword in the to-be-parsed sentence specifically comprise:
matching the keyword with a pre-generated concept graph, and finding out the concept corresponding to the keyword in the concept graph;
and determining the semantics of the keywords in the sentence to be analyzed according to the semantics of the concept.
4. The semantic parsing method according to claim 3, wherein the determining the semantics of the keyword in the sentence to be parsed according to the semantics of the concept specifically comprises:
when the concept corresponds to one semantic, the semantic of the concept is the semantic of the keyword in the sentence to be analyzed;
and when the concept corresponds to a plurality of semantics, performing semantic disambiguation on the plurality of semantics of the concept and determining at least one semantic of the concept, wherein the at least one determined semantic of the concept is the semantic of the keyword in the sentence to be analyzed.
5. The semantic parsing method according to claim 4, wherein, when the concept corresponds to a plurality of semantics, the performing semantic disambiguation on the plurality of semantics of the concept and determining at least one semantic of the concept, the at least one determined semantic of the concept being the semantic of the keyword in the sentence to be analyzed, specifically comprises:
determining at least one semantic of the concept according to the context of the keyword corresponding to the concept in the sentence to be analyzed;
and taking the at least one determined semantic of the concept as the semantic of the keyword in the sentence to be analyzed.
6. A semantic parsing apparatus based on a concept graph, comprising:
the sentence acquisition module is used for acquiring a sentence to be analyzed input by a user;
the keyword extraction module is used for extracting keywords in the sentence to be analyzed;
the keyword semantic determining module is used for matching the keywords with a pre-generated concept graph and determining the semantics of the keywords in the sentence to be analyzed; the concept graph comprises a plurality of concepts, association relations among the plurality of concepts, names corresponding to the concepts, and semantics of the concepts;
and the sentence semantic determining module is used for determining the semantics of the sentence to be analyzed according to the semantics of the keyword in the sentence to be analyzed and the sentence pattern structure of the sentence to be analyzed.
7. The semantic parsing device based on concept graph as claimed in claim 6, further comprising a concept graph generation module;
the concept graph generation module comprises:
a word library constructing unit, configured to construct a word library according to a dictionary;
a corpus acquiring unit, configured to acquire a large amount of user corpora;
a concept relationship determining unit, configured to identify concepts in the corpus using the word library and determine relationships between the concepts;
a name and semantic acquiring unit, configured to acquire all names corresponding to the concepts and the semantics of the concepts using the word library;
and the concept graph generating unit is used for generating a concept graph according to the relationship among the concepts, all names corresponding to the concepts and the semantics of the concepts.
8. The semantic parsing apparatus based on concept graph according to claim 6, wherein the keyword semantic determination module comprises:
the matching sub-module is used for matching the keyword with a pre-generated concept graph and finding out the concept corresponding to the keyword in the concept graph;
and the keyword semantic determining submodule is used for determining the semantics of the keyword in the sentence to be analyzed according to the semantics of the concept.
9. The concept graph-based semantic parsing apparatus according to claim 8, wherein the keyword semantic determination sub-module comprises:
a keyword semantic determining unit, configured to, when the concept corresponds to one semantic, take the semantic of the concept as the semantic of the keyword in the sentence to be parsed;
the keyword semantic determining unit is further configured to, when the concept corresponds to a plurality of semantics, perform semantic disambiguation on the plurality of semantics of the concept to determine at least one semantic of the concept, where the at least one determined semantic of the concept is the semantic of the keyword in the sentence to be parsed.
10. The concept graph-based semantic parsing device according to claim 9,
the keyword semantic determining unit is further configured to determine at least one semantic of the concept according to a context of a keyword corresponding to the concept in the sentence to be analyzed; and taking at least one semantic meaning determined by the concept as the semantic meaning of the keyword in the sentence to be analyzed.
CN201910364368.9A 2019-04-30 2019-04-30 Semantic analysis method and device based on concept graph Pending CN111950290A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910364368.9A CN111950290A (en) 2019-04-30 2019-04-30 Semantic analysis method and device based on concept graph

Publications (1)

Publication Number Publication Date
CN111950290A true CN111950290A (en) 2020-11-17

Family

ID=73335432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910364368.9A Pending CN111950290A (en) 2019-04-30 2019-04-30 Semantic analysis method and device based on concept graph

Country Status (1)

Country Link
CN (1) CN111950290A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510221A (en) * 2009-02-17 2009-08-19 北京大学 Enquiry statement analytical method and system for information retrieval
CN102306144A (en) * 2011-07-18 2012-01-04 南京邮电大学 Terms disambiguation method based on semantic dictionary
CN103678418A (en) * 2012-09-25 2014-03-26 富士通株式会社 Information processing method and equipment
CN106155999A (en) * 2015-04-09 2016-11-23 科大讯飞股份有限公司 Semantics comprehension on natural language method and system
CN108228820A (en) * 2017-12-30 2018-06-29 厦门太迪智能科技有限公司 User's query intention understanding method, system and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination