CN112632259A - Automatic dialog intention recognition system based on linguistic rule generation

Info

Publication number: CN112632259A
Application number: CN202011625429.1A
Authority: CN
Inventors: 冷月
Original assignee: Icsoc Beijing Communication Technology Co ltd
Current assignee: Icsoc Beijing Communication Technology Co ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-04-09

Abstract

The invention discloses an automatic dialog intention recognition system based on linguistic rule generation. The system comprises a word segmentation module, a linguistic rule generation module, an intention model group generation module and a model similarity calculation module. The word segmentation module splits a sentence into words; the linguistic rule generation module labels the words with their respective syntactic and semantic information; the intention model group generation module performs deduplication and merging with the intention as the unit, so that each intention generates one semantic model group; and the model similarity calculation module calculates the similarity between two models. The invention addresses the inaccurate dialog intention recognition and heavy manual workload of the prior art.

Description

Automatic dialog intention recognition system based on linguistic rule generation

Technical Field

The invention relates to the technical field of semantic recognition, and in particular to an automatic dialog intention recognition system based on linguistic rule generation.

Background

There are two common approaches to recognizing dialog intentions: regular-expression matching based on dictionary templates, and intention recognition based on a deep learning classification model. Dictionary-template matching relies on manually summarized rule templates written as regular expressions; each utterance is matched against the templates, and different dialog texts are thereby recognized as different intentions. This approach has three drawbacks: summarizing the rules by hand is labor-intensive; simple keyword matching cannot fully capture semantics; and when a sentence hits the matching rules of two intentions at the same time, the system cannot decide between them.

Intention recognition based on a deep learning classification model generally uses supervised learning: a model is trained on a large amount of manually labeled corpus data and is then used to judge the dialog intention. The drawbacks are that the demand for labeled corpora is very large, the recognition process is a black box, and the recognition result cannot be directly overridden by manual intervention.
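The ambiguity of plain keyword matching can be illustrated with a minimal sketch. The two intention names and their patterns are hypothetical and are not taken from any prior-art system; they only show how one sentence can hit two rules at once.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical keyword rules; no concrete prior-art patterns are given in the text.
INTENT_PATTERNS: Dict[str, Pattern[str]] = {
    "refund_ticket": re.compile(r"refund"),
    "buy_ticket": re.compile(r"ticket"),
}

def match_intents(text: str) -> List[str]:
    """Return every intention whose pattern matches the text."""
    return [name for name, pattern in INTENT_PATTERNS.items() if pattern.search(text)]

hits = match_intents("Why can't the ticket be refunded?")
print(hits)  # ['refund_ticket', 'buy_ticket']: both rules fire, so no single intention is chosen
```

With matching alone there is no score by which to rank the two hits, which is the situation described above.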

Disclosure of Invention

Therefore, the invention provides an automatic dialog intention recognition system based on linguistic rule generation, which aims to solve the problems of inaccurate dialog intention recognition and heavy manual workload in the prior art.

To achieve the above purpose, the invention provides the following technical solution:

The invention discloses an automatic dialog intention recognition system based on linguistic rule generation. The system comprises a word segmentation module, a linguistic rule generation module, an intention model group generation module and a model similarity calculation module. The word segmentation module splits a sentence into words; the linguistic rule generation module labels the words with their respective syntactic and semantic information; the intention model group generation module performs deduplication and merging with the intention as the unit, so that each intention generates one semantic model group; and the model similarity calculation module calculates the similarity between two models.

Further, the word segmentation module segments a complete sentence into a plurality of words according to commonly used words and phrases.

Further, the linguistic rule generation module labels each word with its syntactic and semantic information according to the common meanings of the words and the sentence, which facilitates understanding of the sentence.

Further, the intention model group generation module deduplicates and merges, with the intention as the unit, the one or more models with syntactic and semantic information produced by the linguistic rule generation module, so that each intention generates one semantic model group.

Further, the system first performs model training so that the sentences of each determined intention generate a corresponding unique semantic model group. After training, a text is input and its corresponding model with syntactic and semantic information is determined; that model is compared with the semantic model groups, the best similarity is matched, and a recognition and understanding result is output.

Further, during model training, training sentences with different intentions are input into the system and split into words by the word segmentation module. The linguistic rule generation module labels the words in each sentence with the corresponding syntactic and semantic information, so that the training data of each sentence is turned into a model with syntactic and semantic information.

Further, among the training sentences of different intentions, the models generated from the training data of the same intention are deduplicated and merged, so that each intention generates one semantic model group.

Further, the input text is segmented into a plurality of words by the word segmentation module; the segmentation standard is determined by a general-domain vocabulary preset in the system and a freely configurable user-defined domain vocabulary.

Further, after the input text is segmented into words, the linguistic rule generation module automatically turns the segmented words into a model with syntactic and semantic information.

Further, the similarity between the model with syntactic and semantic information and each of the semantic model groups of the determined intentions is calculated. Once the similarity to all semantic model groups has been calculated, the intention with the highest similarity is selected and the recognition result is output.

The invention has the following advantages:

The automatic dialog intention recognition system based on linguistic rule generation disclosed by the invention does not need the large amount of manually labeled training data that deep learning requires. The linguistic rule generation module converts the training corpus into highly abstract models rich in linguistic information, so the models can be generated from only a small amount of data. The model content is visible to the system maintainer, who can modify and adjust it directly by hand. In addition, when a user input could hit several intentions at once, the system resolves it through the accompanying similarity calculation, so the situation in which keywords of several intentions are hit simultaneously but cannot be distinguished does not arise.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.

The structures, proportions, sizes and the like shown in this specification are provided only to accompany the disclosed content so that those skilled in the art can read and understand it. They are not intended to limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any structural modification, change in proportion or adjustment of size that does not affect the effects and purposes achievable by the invention shall still fall within the scope covered by the technical content disclosed herein.

FIG. 1 is a flowchart of a training process of an automatic dialog intention recognition system based on linguistic rule generation according to an embodiment of the present invention;

FIG. 2 is a flowchart of a recognition process of an automatic dialog intention recognition system based on linguistic rule generation according to an embodiment of the present invention.

Detailed Description

The present invention is described below by way of particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. The described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.

Examples

This embodiment discloses an automatic dialog intention recognition system based on linguistic rule generation. The system comprises a word segmentation module, a linguistic rule generation module, an intention model group generation module and a model similarity calculation module. The word segmentation module splits a sentence into words; the linguistic rule generation module labels the words with their respective syntactic and semantic information; the intention model group generation module performs deduplication and merging with the intention as the unit, so that each intention generates one semantic model group; and the model similarity calculation module calculates the similarity between two models.

The word segmentation module segments a complete sentence into a plurality of words according to commonly used words and phrases, for processing by the subsequent steps. The linguistic rule generation module labels each word with its syntactic and semantic information according to the common meanings of the words and the sentence, which facilitates understanding of the sentence. The intention model group generation module deduplicates and merges, with the intention as the unit, the one or more models with syntactic and semantic information produced by the linguistic rule generation module, so that each intention generates one semantic model group.

The system first performs model training so that the sentences of each determined intention generate a corresponding unique semantic model group. After training, a text is input and its corresponding model with syntactic and semantic information is determined; that model is compared with the semantic model groups, the best similarity is matched, and a recognition result is output.

During model training, training sentences with different intentions are input into the system and split into words by the word segmentation module. The linguistic rule generation module then labels the words in each sentence with the corresponding syntactic and semantic information, so that the training data of each sentence is turned into a model with syntactic and semantic information. For example, the sentence "Why can't the ticket be refunded?" is converted into: question phrase, negation semantics, refund, train ticket.
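The labeling step can be sketched as a lookup from words to abstract labels. The lexicon below, the label identifiers and the choice to drop unlabeled words are all assumptions made for illustration; the embodiment does not specify how the linguistic rule generation module stores or applies its rules.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon mapping words to abstract labels. The label names echo
# the example in the text; the mapping itself is an assumption.
LABELS: Dict[str, str] = {
    "why": "QUESTION_PHRASE",
    "can't": "NEGATION",
    "cannot": "NEGATION",
    "refund": "REFUND",
    "refunded": "REFUND",
    "ticket": "TRAIN_TICKET",
}

def build_model(words: List[str]) -> Tuple[str, ...]:
    """Replace each known word with its label; unlabeled words are dropped (assumed)."""
    return tuple(LABELS[w] for w in words if w in LABELS)

model = build_model(["why", "can't", "the", "ticket", "be", "refunded"])
print(model)  # ('QUESTION_PHRASE', 'NEGATION', 'TRAIN_TICKET', 'REFUND')
```

Because the model holds labels rather than surface words, differently worded sentences with the same meaning can map to the same model, which is one way a small corpus could suffice.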

Among the training sentences of different intentions, the models generated from the training data of the same intention are deduplicated and merged, so that each intention generates one semantic model group.
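A minimal sketch of the grouping step follows. It interprets "deduplication and merging" as collecting the models of one intention into a set, which removes exact duplicates; the embodiment does not define the merge operation further, so this reading, the function name `build_model_groups` and the sample intentions are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Model = Tuple[str, ...]

def build_model_groups(labeled: Iterable[Tuple[str, Model]]) -> Dict[str, Set[Model]]:
    """Group models by intention; storing them in a set drops exact duplicates."""
    groups: Dict[str, Set[Model]] = defaultdict(set)
    for intent, model in labeled:
        groups[intent].add(model)
    return dict(groups)

# Hypothetical training data: (intention, model) pairs.
training = [
    ("refund_query", ("QUESTION_PHRASE", "NEGATION", "REFUND", "TRAIN_TICKET")),
    ("refund_query", ("QUESTION_PHRASE", "NEGATION", "REFUND", "TRAIN_TICKET")),  # duplicate
    ("refund_query", ("QUESTION_PHRASE", "REFUND", "TRAIN_TICKET")),
    ("purchase", ("BUY", "TRAIN_TICKET")),
]
groups = build_model_groups(training)
print({intent: len(models) for intent, models in groups.items()})
# {'refund_query': 2, 'purchase': 1}
```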

After training, a text whose intention is to be recognized is input and segmented into a plurality of words by the word segmentation module; the segmentation standard is determined by a general-domain vocabulary preset in the system and a freely configurable user-defined domain vocabulary. After the input text is split into words, the linguistic rule generation module automatically turns the words into a model with syntactic and semantic information. For example, the sentence "Why can't the ticket be refunded?" is converted into: question phrase, negation semantics, refund, train ticket.
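How the two vocabularies might jointly determine the segmentation can be sketched with forward maximum matching. The matching strategy, the vocabulary contents and the `max_len` parameter are assumptions; the embodiment states only that a preset general-domain vocabulary and a configurable custom-domain vocabulary set the segmentation standard. English whitespace tokens stand in for the original text.

```python
from typing import List, Set

# Hypothetical vocabularies, for illustration only.
GENERAL_VOCAB: Set[str] = {"why", "can't", "refund", "ticket"}
CUSTOM_VOCAB: Set[str] = {"train ticket"}  # a domain phrase added by the user

def segment(text: str, max_len: int = 3) -> List[str]:
    """Forward maximum matching over whitespace tokens (an assumed strategy)."""
    vocab = GENERAL_VOCAB | CUSTOM_VOCAB
    tokens = text.lower().replace("?", "").split()
    words: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shrink to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in vocab:
                words.append(candidate)
                i += n
                break
    return words

print(segment("Why can't I refund the train ticket?"))
# ['why', "can't", 'i', 'refund', 'the', 'train ticket']
```

Adding "train ticket" to the custom vocabulary keeps the phrase together as one word; without that entry it would be split into "train" and "ticket".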

The similarity between this model and each of the semantic model groups of the determined intentions is then calculated. Once the similarity to all semantic model groups has been calculated, the intention with the highest similarity is selected and the recognition result is output.

The automatic dialog intention recognition system based on linguistic rule generation does not need the large amount of manually labeled training data that deep learning requires. The linguistic rule generation module converts the training corpus into highly abstract models rich in linguistic information, so the models can be generated from only a small amount of data. The model content is visible to the system maintainer, who can modify and adjust it directly by hand. In addition, when a user input could hit several intentions at once, the system resolves it through the accompanying similarity calculation, so the situation in which keywords of several intentions are hit simultaneously but cannot be distinguished does not arise.
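The selection step can be sketched as scoring the input model against every group and taking the maximum. The embodiment does not name a similarity measure; Jaccard similarity over label sets, and scoring a group by its best-matching member, are assumptions chosen here for simplicity, as are the sample groups.

```python
from typing import Dict, Set, Tuple

Model = Tuple[str, ...]

def jaccard(a: Model, b: Model) -> float:
    """Jaccard similarity of two label sets (an assumed metric)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def recognize(model: Model, groups: Dict[str, Set[Model]]) -> Tuple[str, float]:
    """Score the model against every non-empty group; return the best intention."""
    scores = {
        intent: max(jaccard(model, m) for m in models)
        for intent, models in groups.items()
    }
    best = max(scores, key=lambda intent: scores[intent])
    return best, scores[best]

# Hypothetical semantic model groups, one per intention.
groups = {
    "refund_query": {("QUESTION_PHRASE", "NEGATION", "REFUND", "TRAIN_TICKET")},
    "purchase": {("BUY", "TRAIN_TICKET")},
}
query = ("QUESTION_PHRASE", "NEGATION", "REFUND", "TRAIN_TICKET")
print(recognize(query, groups))  # ('refund_query', 1.0)
```

Both groups share the TRAIN_TICKET label with the query, yet the scores differ (1.0 against 0.2), so an input that touches several intentions is still resolved to one.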

Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims

1. An automatic dialog intention recognition system based on linguistic rule generation, characterized in that the system comprises a word segmentation module, a linguistic rule generation module, an intention model group generation module and a model similarity calculation module, wherein the word segmentation module splits a sentence into words; the linguistic rule generation module labels the words with their respective syntactic and semantic information; the intention model group generation module performs deduplication and merging with the intention as the unit, so that each intention generates one semantic model group; and the model similarity calculation module calculates the similarity between two models.

2. The system of claim 1, wherein the word segmentation module segments a complete sentence into a plurality of words according to commonly used words and phrases.

3. The system of claim 1, wherein the linguistic rule generation module is configured to mark each word with respective syntactic and semantic information based on common paraphrases of the word and the sentence to facilitate comprehension of the sentence.

4. The system of claim 1, wherein the intent model group generation module de-duplicates and merges one or more models with syntactic and semantic information generated by the linguistic rule generation module in units of intent, each intent generating a semantic model group.

5. The system of claim 1, wherein the system first performs model training so that the sentences of each determined intention generate a corresponding unique semantic model group, and after training a text is input, a corresponding model with syntactic and semantic information is determined, the model with syntactic and semantic information is compared with the semantic model groups, the best similarity is matched, and a recognition result is output.

6. The system of claim 5, wherein in the model training process, training sentences with different intentions are input into the system, the training sentences are split into words by the word segmentation module, the words in the sentences are labeled with corresponding syntactic and semantic information by the linguistic rule generation module in the system, and the training data of each sentence is generated into a model with syntactic and semantic information.

7. The system of claim 6, wherein, among the training sentences of different intentions, the models generated from the training data of the same intention are deduplicated and merged, and each intention generates one semantic model group.

8. The system of claim 5, wherein the input text is segmented into a plurality of words by using a segmentation module, and the segmentation criteria is determined by a general domain vocabulary preset by the system and a freely configurable custom domain vocabulary.

9. The system of claim 8, wherein after the input text is segmented into words, the linguistic rule generation module is used to automatically generate the segmented words into models with syntactic and semantic information.

10. The system of claim 9, wherein the similarity between the model with syntactic and semantic information and each of a plurality of semantic model groups of determined intentions is calculated until the similarity between the model and all semantic model groups has been calculated, the intention with the highest similarity is selected, and the recognition result is output.