CN112632259A - Automatic dialog intention recognition system based on linguistic rule generation - Google Patents
Automatic dialog intention recognition system based on linguistic rule generation
- Publication number
- CN112632259A
- Authority
- CN
- China
- Prior art keywords
- model
- intention
- words
- semantic
- generation module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention discloses an automatic dialog intention recognition system based on linguistic rule generation. The system comprises a word segmentation module, a linguistic rule generation module, an intention model group generation module, and a model similarity calculation module. The word segmentation module splits a sentence into words; the linguistic rule generation module labels each word with its syntactic and semantic information; the intention model group generation module deduplicates and merges the resulting models intention by intention, so that each intention yields one semantic model group; and the model similarity calculation module calculates the similarity between two models. The invention addresses the inaccurate dialog intention recognition and heavy manual workload of the prior art.
Description
Technical Field
The invention relates to the technical field of semantic recognition, and in particular to an automatic dialog intention recognition system based on linguistic rule generation.
Background
There are two general approaches to recognizing dialog intentions: regular-expression matching based on dictionary templates, and intention recognition based on a deep learning classification model. In the first approach, rule templates built from regular expressions are summarized by hand, and each dialog text is matched against them so that different texts are recognized as different intentions. The drawbacks are that summarizing the rules manually is labor-intensive, that simple keyword matching cannot fully capture semantics, and that when one sentence matches the rules of two intentions at the same time, the system cannot decide between them.
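The ambiguity described above can be illustrated with a small sketch. The two rule templates, the intention names, and the function `match_intents` are hypothetical and are not taken from the patent; they only show how a pure regular-expression matcher can return two intentions for one sentence with no way to rank them.

```python
import re
from typing import List

# Hypothetical hand-written rule templates, one regular expression per intention.
RULES = {
    "refund_ticket": re.compile(r"\b(refund|return)\b.*\bticket\b"),
    "buy_ticket": re.compile(r"\b(buy|book|bought)\b.*\bticket\b"),
}


def match_intents(text: str) -> List[str]:
    """Return every intention whose regular expression matches the text."""
    lowered = text.lower()
    return [name for name, pattern in RULES.items() if pattern.search(lowered)]


# One sentence hits both templates, and the matcher offers no score to choose by.
hits = match_intents("I bought the wrong one, can I refund the ticket and buy a new ticket?")
print(hits)
```

Because the matcher only reports hits, any tie-breaking would have to be added as yet another hand-written rule, which is the workload problem the background section describes.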
The second approach, intention recognition based on a deep learning classification model, generally uses supervised learning: a model is trained on a large manually labeled corpus and is then used to judge the dialog intention. Its problems are that it requires a very large labeled corpus, that the recognition process is a black box, and that the recognition result cannot be corrected by direct manual intervention.
Disclosure of Invention
The invention therefore provides an automatic dialog intention recognition system based on linguistic rule generation, to solve the problems of inaccurate dialog intention recognition and heavy manual workload in the prior art.
To achieve the above purpose, the invention provides the following technical solution:
The invention discloses an automatic dialog intention recognition system based on linguistic rule generation. The system comprises a word segmentation module, a linguistic rule generation module, an intention model group generation module, and a model similarity calculation module. The word segmentation module splits a sentence into words; the linguistic rule generation module labels each word with its syntactic and semantic information; the intention model group generation module deduplicates and merges the resulting models intention by intention, so that each intention yields one semantic model group; and the model similarity calculation module calculates the similarity between two models.
Further, the word segmentation module segments a complete sentence into a plurality of words according to commonly used words and phrases.
Further, the linguistic rule generation module labels each word with its syntactic and semantic information according to the common meanings of the words and the sentence, which aids understanding of the sentence.
Further, the intention model group generation module deduplicates and merges, intention by intention, the one or more models with syntactic and semantic information produced by the linguistic rule generation module, so that each intention yields one semantic model group.
Further, the system first performs model training so that the sentences of each determined intention produce a corresponding unique semantic model group. After training, a text is input and converted into a model with syntactic and semantic information; this model is compared with the semantic model groups, the best match is found by similarity, and the recognition result is output.
Further, during model training, training sentences with different intentions are input into the system and split into words by the word segmentation module. The linguistic rule generation module then labels the words in each sentence with the corresponding syntactic and semantic information, so that each training sentence yields one model with syntactic and semantic information.
Further, among the training sentences of different intentions, the models generated from training data of the same intention are deduplicated and merged, so that each intention yields one semantic model group.
Further, the input text is segmented into a plurality of words by the word segmentation module; the segmentation standard is determined by a general-domain vocabulary preset in the system together with a freely configurable user-defined domain vocabulary.
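A minimal sketch of vocabulary-driven segmentation follows. The patent does not name a segmentation algorithm, so forward maximum matching is assumed here purely for illustration; the function `segment`, the two vocabularies, and the sample Chinese sentence (roughly "why can't the train ticket be refunded") are all hypothetical.

```python
from typing import Iterable, List

# Hypothetical vocabularies: a preset general-domain list and a user-defined domain list.
GENERAL_VOCAB = {"为什么", "不能", "退", "火车"}
CUSTOM_VOCAB = {"火车票"}


def segment(text: str, general_vocab: Iterable[str], custom_vocab: Iterable[str] = ()) -> List[str]:
    """Forward maximum matching over the union of both vocabularies (an assumed algorithm)."""
    vocab = set(general_vocab) | set(custom_vocab)
    max_len = max((len(word) for word in vocab), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; an unknown single character is emitted as-is.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


with_custom = segment("为什么不能退火车票", GENERAL_VOCAB, CUSTOM_VOCAB)
without_custom = segment("为什么不能退火车票", GENERAL_VOCAB)
print(with_custom)
print(without_custom)
```

In this sketch the user-defined entry keeps the domain term for "train ticket" as one word, whereas the general vocabulary alone splits it in two, which is the kind of control a configurable domain vocabulary is meant to give.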
Further, after the input text is segmented into words, the linguistic rule generation module automatically generates a model with syntactic and semantic information from the segmented words.
Further, the similarity between the model with syntactic and semantic information and each of the semantic model groups of determined intentions is calculated until all semantic model groups have been compared; the intention with the highest similarity is then selected and output as the recognition result.
The invention has the following advantages:
The automatic dialog intention recognition system based on linguistic rule generation does not need the large amount of manually labeled training data that deep learning requires. The linguistic rule generation module converts the training corpus into highly abstract models rich in linguistic information, so a model can be generated from only a small amount of data. The model content is visible to system maintainers, who can modify and adjust it directly by hand. In addition, when a user input could hit several intentions at once, the system ranks them by similarity, so the situation in which keywords of several intentions are hit simultaneously but cannot be told apart does not arise.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, proportions, sizes, and the like shown in this specification are provided only to accompany the disclosed content so that those skilled in the art can understand and read it. They do not limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any structural modification, change in proportion, or adjustment of size that does not affect the effects and purposes achievable by the invention still falls within the scope covered by the disclosed technical content.
FIG. 1 is a flowchart of a training process of an automatic dialog intention recognition system based on linguistic rule generation according to an embodiment of the present invention;
FIG. 2 is a flowchart of a recognition process of an automatic dialog intention recognition system based on linguistic rule generation according to an embodiment of the present invention.
Detailed Description
The invention is described below by way of particular embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from this disclosure. The described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given here without creative effort fall within the protection scope of the invention.
Examples
This embodiment discloses an automatic dialog intention recognition system based on linguistic rule generation. The system comprises a word segmentation module, a linguistic rule generation module, an intention model group generation module, and a model similarity calculation module. The word segmentation module splits a sentence into words; the linguistic rule generation module labels each word with its syntactic and semantic information; the intention model group generation module deduplicates and merges the resulting models intention by intention, so that each intention yields one semantic model group; and the model similarity calculation module calculates the similarity between two models.
The word segmentation module segments a complete sentence into a plurality of words according to commonly used words and phrases, for use in subsequent processing. The linguistic rule generation module labels each word with its syntactic and semantic information according to the common meanings of the words and the sentence, which aids understanding of the sentence. The intention model group generation module deduplicates and merges, intention by intention, the one or more models with syntactic and semantic information produced by the linguistic rule generation module, so that each intention yields one semantic model group.
The system first performs model training so that the sentences of each determined intention produce a corresponding unique semantic model group. After training, a text is input and converted into a model with syntactic and semantic information; this model is compared with the semantic model groups, the best match is found by similarity, and the recognition result is output.
During model training, training sentences with different intentions are input into the system and split into words by the word segmentation module. The linguistic rule generation module then labels the words in each sentence with the corresponding syntactic and semantic information, so that each training sentence yields one model with syntactic and semantic information. For example, the sentence "Why can't the train ticket be refunded?" is converted into: question phrase, negation semantics, refund, train ticket.
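The labeling step can be sketched as a lexicon lookup over already segmented words. This is only an illustration: the lexicon, the label names `QUESTION_PHRASE` and `NEGATION`, and the function `build_model` are hypothetical, because the patent does not publish its rule set or model format. The sample sentences ask why a train ticket cannot be refunded.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon mapping surface words to abstract syntactic or semantic labels.
LEXICON: Dict[str, str] = {
    "why": "QUESTION_PHRASE",
    "how come": "QUESTION_PHRASE",
    "cannot": "NEGATION",
    "not": "NEGATION",
}


def build_model(words: List[str]) -> Tuple[str, ...]:
    """Replace each word by its label when one is known; keep content words unchanged."""
    return tuple(LEXICON.get(word.lower(), word.lower()) for word in words)


model_a = build_model(["Why", "cannot", "refund", "train ticket"])
model_b = build_model(["How come", "not", "refund", "train ticket"])
print(model_a)
print(model_b)
```

Two differently worded sentences collapse to the same abstract model here, which suggests why a model built this way could be learned from few training sentences.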
Among the training sentences of different intentions, the models generated from training data of the same intention are deduplicated and merged, so that each intention yields one semantic model group.
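A minimal sketch of the grouping step, assuming that each model is a tuple of labels and that "merging" simply means collecting the distinct models of one intention into one list. The type alias `Model`, the function `build_model_groups`, and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Model = Tuple[str, ...]  # assumed representation: an ordered tuple of labels


def build_model_groups(labelled: Iterable[Tuple[str, Model]]) -> Dict[str, List[Model]]:
    """Collect models per intention, dropping exact duplicates and keeping first-seen order."""
    groups: Dict[str, List[Model]] = {}
    for intention, model in labelled:
        group = groups.setdefault(intention, [])
        if model not in group:
            group.append(model)
    return groups


training = [
    ("refund_ticket", ("QUESTION_PHRASE", "NEGATION", "refund", "train ticket")),
    ("refund_ticket", ("QUESTION_PHRASE", "NEGATION", "refund", "train ticket")),
    ("refund_ticket", ("how", "refund", "train ticket")),
    ("buy_ticket", ("how", "buy", "train ticket")),
]
groups = build_model_groups(training)
print(groups)
```

Because the groups are plain data, a maintainer could inspect or edit them by hand, in line with the visibility advantage the patent claims.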
After training is finished, a text whose intention is to be recognized is input and segmented into a plurality of words by the word segmentation module; the segmentation standard is determined by a general-domain vocabulary preset in the system together with a freely configurable user-defined domain vocabulary. After the input text is segmented into words, the linguistic rule generation module automatically generates a model with syntactic and semantic information from them. For example, the sentence "Why can't the train ticket be refunded?" is converted into: question phrase, negation semantics, refund, train ticket.
The similarity between the model with syntactic and semantic information and each of the semantic model groups of determined intentions is calculated until all semantic model groups have been compared; the intention with the highest similarity is then selected and output as the recognition result.
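The selection step can be sketched as scoring the input model against every group and taking the maximum. The patent does not state which similarity measure is used; Jaccard overlap of label sets is substituted here purely for illustration, and the names `jaccard`, `recognise`, and the sample groups are hypothetical.

```python
from typing import Dict, List, Tuple

Model = Tuple[str, ...]  # assumed representation: an ordered tuple of labels


def jaccard(a: Model, b: Model) -> float:
    """Jaccard overlap of two label sets (an assumed stand-in for the patent's measure)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def recognise(model: Model, groups: Dict[str, List[Model]]) -> Tuple[str, float]:
    """Score every non-empty group by its best-matching model and return the top intention."""
    scores = {
        intention: max(jaccard(model, candidate) for candidate in group)
        for intention, group in groups.items()
        if group
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


groups = {
    "refund_ticket": [("QUESTION_PHRASE", "NEGATION", "refund", "train ticket")],
    "buy_ticket": [("QUESTION_PHRASE", "buy", "train ticket")],
}
intention, score = recognise(("NEGATION", "refund", "train ticket"), groups)
print(intention, score)
```

Every group receives a numeric score, so an input that partially overlaps several intentions is still resolved to a single best one rather than left undecided.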
The automatic dialog intention recognition system based on linguistic rule generation does not need the large amount of manually labeled training data that deep learning requires. The linguistic rule generation module converts the training corpus into highly abstract models rich in linguistic information, so a model can be generated from only a small amount of data. The model content is visible to system maintainers, who can modify and adjust it directly by hand. In addition, when a user input could hit several intentions at once, the system ranks them by similarity, so the situation in which keywords of several intentions are hit simultaneously but cannot be told apart does not arise.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.
Claims (10)
1. An automatic dialog intention recognition system based on linguistic rule generation, characterized in that the system comprises a word segmentation module, a linguistic rule generation module, an intention model group generation module and a model similarity calculation module, wherein the word segmentation module splits a sentence into words, the linguistic rule generation module labels the words with their respective syntactic and semantic information, the intention model group generation module performs deduplication and merging with the intention as the unit so that each intention generates a semantic model group, and the model similarity calculation module calculates the similarity between two models.
2. The system of claim 1, wherein the word segmentation module segments a complete sentence into a plurality of words according to commonly used words and phrases.
3. The system of claim 1, wherein the linguistic rule generation module labels each word with its respective syntactic and semantic information according to the common meanings of the words and the sentence, to facilitate understanding of the sentence.
4. The system of claim 1, wherein the intention model group generation module deduplicates and merges, with the intention as the unit, one or more models with syntactic and semantic information generated by the linguistic rule generation module, each intention generating a semantic model group.
5. The system of claim 1, wherein the system first performs model training so that sentences of a determined intention generate a corresponding unique semantic model group; after the training, a text is input, a corresponding model with syntactic and semantic information is determined, the model with syntactic and semantic information is compared with the semantic model groups, the best similarity is matched, and a recognition result is output.
6. The system of claim 5, wherein in the model training process, training sentences with different intentions are input into the system, the training sentences are split into words by the word segmentation module, the words in the sentences are labeled with corresponding syntactic and semantic information by the linguistic rule generation module, and each training sentence is generated into a model with syntactic and semantic information.
7. The system of claim 6, wherein among the training sentences with different intentions, the models generated from training data of the same intention are deduplicated and merged, and each intention generates a semantic model group.
8. The system of claim 5, wherein the input text is segmented into a plurality of words by the word segmentation module, and the segmentation standard is determined by a general-domain vocabulary preset in the system and a freely configurable user-defined domain vocabulary.
9. The system of claim 8, wherein after the input text is segmented into words, the linguistic rule generation module automatically generates a model with syntactic and semantic information from the segmented words.
10. The system of claim 9, wherein the similarity between the model with syntactic and semantic information and each of a plurality of semantic model groups of determined intentions is calculated until all semantic model groups have been compared, and the intention with the highest similarity is selected and output as the recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011625429.1A CN112632259A (en) | 2020-12-30 | 2020-12-30 | Automatic dialog intention recognition system based on linguistic rule generation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112632259A (en) | 2021-04-09 |
Family
ID=75289811
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011625429.1A Pending CN112632259A (en) | 2020-12-30 | 2020-12-30 | Automatic dialog intention recognition system based on linguistic rule generation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112632259A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100204982A1 (en) * | 2009-02-06 | 2010-08-12 | Robert Bosch Gmbh | System and Method for Generating Data for Complex Statistical Modeling for use in Dialog Systems |
CN110334347A (en) * | 2019-06-27 | 2019-10-15 | 腾讯科技(深圳)有限公司 | Information processing method, relevant device and storage medium based on natural language recognition |
CN111858888A (en) * | 2020-07-13 | 2020-10-30 | 北京航空航天大学 | Multi-round dialogue system of check-in scene |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115329883A (en) * | 2022-08-22 | 2022-11-11 | 桂林电子科技大学 | Semantic similarity processing method, device and system and storage medium |
CN115329883B (en) * | 2022-08-22 | 2023-05-09 | 桂林电子科技大学 | Semantic similarity processing method, device and system and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107741928B (en) | Method for correcting error of text after voice recognition based on domain recognition | |
CN107066455B (en) | Multi-language intelligent preprocessing real-time statistics machine translation system | |
CN105404621B (en) | A kind of method and system that Chinese character is read for blind person | |
WO2008107305A2 (en) | Search-based word segmentation method and device for language without word boundary tag | |
CN109033064B (en) | Primary school Chinese composition corpus label automatic extraction method based on text abstract | |
Vinnarasu et al. | Speech to text conversion and summarization for effective understanding and documentation | |
CN110321434A (en) | A kind of file classification method based on word sense disambiguation convolutional neural networks | |
Kübler et al. | Part of speech tagging for Arabic | |
CN108363692A (en) | A kind of computational methods of sentence similarity and the public sentiment measure of supervision based on this method | |
Al-Anzi et al. | The impact of phonological rules on Arabic speech recognition | |
Christodoulides et al. | Automatic detection and annotation of disfluencies in spoken French corpora | |
CN112632259A (en) | Automatic dialog intention recognition system based on linguistic rule generation | |
CN112307756A (en) | Bi-LSTM and word fusion-based Chinese word segmentation method | |
CN107797986A (en) | A kind of mixing language material segmenting method based on LSTM CNN | |
CN111104515A (en) | Emotional word text information classification method | |
Yoo et al. | Speech-act classification using a convolutional neural network based on pos tag and dependency-relation bigram embedding | |
CN114969312A (en) | Marketing case theme extraction method and system based on variational self-encoder | |
Mahafdah et al. | Arabic Part of speech Tagging using k-Nearest Neighbour and Naive Bayes Classifiers Combination. | |
Wang et al. | Mongolian named entity recognition using suffixes segmentation | |
Tukur et al. | Parts-of-speech tagging of Hausa-based texts using hidden Markov model | |
Mansikkaniemi et al. | Adaptation of morph-based speech recognition for foreign names and acronyms | |
Hong et al. | A hybrid approach to english-korean name transliteration | |
Outahajala et al. | The development of a fine grained class set for Amazigh POS tagging | |
CN113836943B (en) | Relation extraction method and device based on semantic level | |
Hasegawa-Johnson et al. | Arabic speech and language technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||