CN112632259A - Automatic dialog intention recognition system based on linguistic rule generation

Publication number
CN112632259A
CN112632259A CN202011625429.1A CN202011625429A CN112632259A CN 112632259 A CN112632259 A CN 112632259A CN 202011625429 A CN202011625429 A CN 202011625429A CN 112632259 A CN112632259 A CN 112632259A
Authority
CN
China
Prior art keywords
model
intention
words
semantic
generation module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011625429.1A
Other languages
Chinese (zh)
Inventor
冷月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Icsoc Beijing Communication Technology Co ltd
Original Assignee
Icsoc Beijing Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-30
Filing date
2020-12-30
Publication date
2021-04-09
Application filed by Icsoc Beijing Communication Technology Co ltd
Priority to CN202011625429.1A
Publication of CN112632259A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses an automatic dialog intention recognition system based on linguistic rule generation, comprising a word segmentation module, a linguistic rule generation module, an intention model group generation module, and a model similarity calculation module. The word segmentation module splits a sentence into words; the linguistic rule generation module labels each word with its syntactic and semantic information; the intention model group generation module deduplicates and merges the resulting models with the intention as the unit, so that each intention generates one semantic model group; and the model similarity calculation module calculates the similarity between two models. The invention addresses the inaccurate dialog intention recognition and heavy manual workload of the prior art.

Description

Automatic dialog intention recognition system based on linguistic rule generation
Technical Field
The invention relates to the technical field of semantic recognition, and in particular to an automatic dialog intention recognition system based on linguistic rule generation.
Background
There are two general methods for recognizing dialog intentions: first, regular-expression matching based on dictionary templates; and second, intention recognition based on a deep learning classification model. The template-based method relies on manually summarized rule templates written as regular expressions; each dialog text is matched against the templates, and different texts are recognized as different intentions according to the template they match. The disadvantages of this technique are that summarizing the rules by hand is labor-intensive, that simple keyword matching cannot fully capture semantics, and that when a sentence matches the rules of two intentions at the same time, the system cannot decide between them.
Intention recognition based on a deep learning classification model generally uses supervised learning: a model is trained on a large amount of manually labeled corpus data, and the trained model then judges the dialog intention. The problems with this method are that it requires a very large labeled corpus, that the recognition process is a black box, and that the recognition result cannot be forcibly corrected by manual intervention.
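The ambiguity problem of template matching can be illustrated with a minimal sketch. The two intent names and their regular expressions below are hypothetical and are not taken from the patent; they only show how one sentence can satisfy the templates of two intentions, leaving a pure matcher with no basis for choosing.

```python
import re

# Hypothetical hand-written templates, one regular expression per intention.
TEMPLATES = {
    "refund_ticket": re.compile(r"\brefund"),
    "buy_ticket": re.compile(r"\bticket"),
}


def match_intents(text):
    """Return the names of all intentions whose template matches the text."""
    lowered = text.lower()
    return sorted(
        name for name, pattern in TEMPLATES.items() if pattern.search(lowered)
    )


# Both templates match, so template matching alone cannot pick one intention.
hits = match_intents("Why can't the ticket be refunded?")
print(hits)
```

Here `hits` contains both intention names, which is the undecidable case described above.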
Disclosure of Invention
To this end, the invention provides an automatic dialog intention recognition system based on linguistic rule generation, which aims to solve the problems of inaccurate dialog intention recognition and heavy manual workload in the prior art.
To achieve this purpose, the invention provides the following technical solution:
The invention discloses an automatic dialog intention recognition system based on linguistic rule generation, comprising a word segmentation module, a linguistic rule generation module, an intention model group generation module, and a model similarity calculation module, wherein the word segmentation module splits a sentence into words, the linguistic rule generation module labels the words with their respective syntactic and semantic information, the intention model group generation module performs deduplication and merging with the intention as the unit so that each intention generates one semantic model group, and the model similarity calculation module calculates the similarity between two models.
Further, the word segmentation module segments a complete sentence into a plurality of words according to commonly used words and phrases.
Further, the linguistic rule generation module labels each word with its syntactic and semantic information according to the common meanings of the words and the sentence, which facilitates understanding of the sentence.
Further, the intention model group generation module deduplicates and merges, with the intention as the unit, the one or more models carrying syntactic and semantic information that the linguistic rule generation module generates, so that each intention generates one semantic model group.
Further, the system first performs model training, so that the sentences of each determined intention generate a corresponding unique semantic model group. After training, a text is input and its model carrying syntactic and semantic information is determined; this model is compared with the semantic model groups, the best similarity is matched, and the recognition result is output.
Further, during model training, training sentences for different intentions are input into the system, the word segmentation module splits the training sentences into words, the linguistic rule generation module labels the words in each sentence with the corresponding syntactic and semantic information, and each training sentence is thereby turned into a model carrying syntactic and semantic information.
Further, among the training sentences for different intentions, the models generated from the training data of the same intention are deduplicated and merged, so that each intention generates one semantic model group.
Further, the input text is segmented into a plurality of words by the word segmentation module, and the segmentation standard is determined by a general-domain vocabulary preset in the system together with a freely configurable user-defined domain vocabulary.
Further, after the input text has been segmented into words, the linguistic rule generation module automatically generates a model carrying syntactic and semantic information from the segmented words.
Further, similarity is calculated between the model carrying syntactic and semantic information and each of the semantic model groups of determined intentions; once the similarity to all semantic model groups has been calculated, the intention with the highest similarity is selected and output as the recognition result.
The invention has the following advantages:
The automatic dialog intention recognition system based on linguistic rule generation disclosed by the invention does not need the large amount of manually labeled training data required by deep learning. The linguistic rule generation module converts the training corpus into models that are highly abstract and rich in linguistic information, so that the models can be generated from only a small amount of data. The model content is visible to the system maintainer, who can modify and adjust it directly by hand. In addition, when a user input could hit several intentions at once, the system ranks them by the matching similarity calculation, so the situation in which keywords are hit simultaneously but cannot be distinguished does not occur.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are merely exemplary, and those of ordinary skill in the art can derive other drawings from them without inventive effort.
The structures, proportions, sizes, and the like shown in this specification serve only to accompany the content disclosed in the specification so that those skilled in the art can read and understand it. They are not intended to limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any structural modification, change in proportion, or adjustment of size that does not affect the effects and purposes achievable by the invention should still fall within the scope covered by the technical content disclosed herein.
FIG. 1 is a flowchart of the training process of an automatic dialog intention recognition system based on linguistic rule generation according to an embodiment of the invention;
FIG. 2 is a flowchart of the recognition process of an automatic dialog intention recognition system based on linguistic rule generation according to an embodiment of the invention.
Detailed Description
The invention is described below by way of specific embodiments, and other advantages and effects of the invention will be apparent to those skilled in the art from the content disclosed in this specification. The described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the invention.
Examples
This embodiment discloses an automatic dialog intention recognition system based on linguistic rule generation, comprising a word segmentation module, a linguistic rule generation module, an intention model group generation module, and a model similarity calculation module, wherein the word segmentation module splits a sentence into words, the linguistic rule generation module labels the words with their respective syntactic and semantic information, the intention model group generation module performs deduplication and merging with the intention as the unit so that each intention generates one semantic model group, and the model similarity calculation module calculates the similarity between two models.
The word segmentation module segments a complete sentence into a plurality of words according to commonly used words and phrases for subsequent processing. The linguistic rule generation module labels each word with its syntactic and semantic information according to the common meanings of the words and the sentence, which facilitates understanding of the sentence. The intention model group generation module deduplicates and merges, with the intention as the unit, the one or more models carrying syntactic and semantic information that the linguistic rule generation module generates, so that each intention generates one semantic model group.
The system first performs model training, so that the sentences of each determined intention generate a corresponding unique semantic model group. After training, a text is input and its model carrying syntactic and semantic information is determined; this model is compared with the semantic model groups, the best similarity is matched, and the recognition result is output.
During model training, training sentences for different intentions are input into the system. The word segmentation module splits each training sentence into words, the linguistic rule generation module labels the words in the sentence with the corresponding syntactic and semantic information, and each training sentence is thereby turned into a model carrying syntactic and semantic information. For example, the sentence "Why can't the train ticket be refunded?" is replaced with the model: question phrase, negation semantics, refund, train ticket.
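The labeling step can be sketched as follows. This is only an illustration under stated assumptions: the patent does not disclose how its linguistic rules are stored, so the `LEXICON` dictionary, the label names, the tuple representation of a model, and the choice to drop words without a label are all hypothetical.

```python
# Hypothetical lexicon mapping a segmented word to a syntactic or semantic
# label. The patent does not specify this data structure.
LEXICON = {
    "why": "QUESTION_PHRASE",
    "cannot": "NEGATION",
    "refund": "REFUND",
    "train ticket": "TRAIN_TICKET",
}


def build_model(words):
    """Turn a list of segmented words into a model, here a tuple of labels.

    Assumption: words without a lexicon entry carry no information for the
    model and are skipped.
    """
    return tuple(LEXICON[word] for word in words if word in LEXICON)


# Words as they might come out of the word segmentation module.
words = ["why", "cannot", "i", "refund", "the", "train ticket"]
model = build_model(words)
print(model)
```

Under these assumptions the sentence is reduced to four labels, in the same spirit as the question phrase, negation, refund, and train ticket example in the text.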
Among the training sentences for different intentions, the models generated from the training data of the same intention are deduplicated and merged, so that each intention generates one semantic model group.
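A minimal sketch of this grouping step is given below. It assumes that a model is a tuple of labels and that "merging" means collecting the distinct models of one intention into a single list; the patent does not state the exact merge rule, and the function name `build_model_groups` is hypothetical.

```python
from collections import defaultdict


def build_model_groups(labeled_models):
    """Group (intention, model) pairs by intention and drop duplicate models.

    Assumption: merging is modeled as keeping each distinct model once, in
    first-seen order, under its intention.
    """
    groups = defaultdict(list)
    for intent, model in labeled_models:
        if model not in groups[intent]:
            groups[intent].append(model)
    return dict(groups)


# Hypothetical training output: the first two sentences yield the same model.
training_models = [
    ("refund_ticket", ("QUESTION_PHRASE", "NEGATION", "REFUND", "TRAIN_TICKET")),
    ("refund_ticket", ("QUESTION_PHRASE", "NEGATION", "REFUND", "TRAIN_TICKET")),
    ("refund_ticket", ("REFUND", "TRAIN_TICKET")),
    ("buy_ticket", ("BUY", "TRAIN_TICKET")),
]
groups = build_model_groups(training_models)
print(groups)
```

Because the groups are plain lists of label tuples, a maintainer could inspect or edit them by hand, which matches the visibility property the patent claims for its models.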
After training is finished, a text whose intention is to be recognized is input, and the word segmentation module segments the input text into a plurality of words. The segmentation standard is determined by a general-domain vocabulary preset in the system together with a freely configurable user-defined domain vocabulary. After the input text has been segmented into words, the linguistic rule generation module automatically generates a model carrying syntactic and semantic information from the words. For example, the input "Why can't the train ticket be refunded?" is replaced with the model: question phrase, negation semantics, refund, train ticket.
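The interplay of the general-domain vocabulary and the user-defined vocabulary can be sketched with a forward maximum matching segmenter. The patent does not name its segmentation algorithm, so forward maximum matching is a stand-in chosen for illustration, and the vocabularies, the `max_len` parameter, and the whitespace-based tokenization of English text are all assumptions.

```python
import re

# Hypothetical vocabularies: a preset general-domain list and a
# user-configurable domain list that adds a multi-word term.
GENERAL_VOCAB = {"why", "cannot", "i", "refund", "the", "train", "ticket"}
CUSTOM_VOCAB = {"train ticket"}


def segment(text, vocab, max_len=3):
    """Forward maximum matching over whitespace tokens.

    At each position the longest phrase found in the vocabulary is taken;
    a single token is always accepted so that the loop makes progress.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    words = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in vocab:
                words.append(candidate)
                i += n
                break
    return words


words = segment("Why cannot I refund the train ticket?", GENERAL_VOCAB | CUSTOM_VOCAB)
print(words)
```

With the custom vocabulary, "train ticket" is kept as one word; with only the general vocabulary, the same text would be split into "train" and "ticket".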
Similarity is then calculated between the model carrying syntactic and semantic information and each of the semantic model groups of determined intentions. Once the similarity to all semantic model groups has been calculated, the intention with the highest similarity is selected and output as the recognition result.
The automatic dialog intention recognition system based on linguistic rule generation of this embodiment does not need the large amount of manually labeled training data required by deep learning. The linguistic rule generation module converts the training corpus into models that are highly abstract and rich in linguistic information, so that the models can be generated from only a small amount of data. The model content is visible to the system maintainer, who can modify and adjust it directly by hand. In addition, when a user input could hit several intentions at once, the system ranks them by the matching similarity calculation, so the situation in which keywords are hit simultaneously but cannot be distinguished does not occur.
Although the invention has been described in detail above with a general description and specific embodiments, it will be apparent to those skilled in the art that modifications or improvements may be made on the basis of the invention. Accordingly, such modifications and improvements made without departing from the spirit of the invention fall within the scope of the invention as claimed.

Claims (10)

1. An automatic dialog intention recognition system based on linguistic rule generation, characterized in that the system comprises a word segmentation module, a linguistic rule generation module, an intention model group generation module, and a model similarity calculation module, wherein the word segmentation module splits a sentence into words, the linguistic rule generation module labels the words with their respective syntactic and semantic information, the intention model group generation module performs deduplication and merging with the intention as the unit so that each intention generates one semantic model group, and the model similarity calculation module calculates the similarity between two models.
2. The system of claim 1, wherein the word segmentation module segments a complete sentence into a plurality of words according to commonly used words and phrases.
3. The system of claim 1, wherein the linguistic rule generation module labels each word with its syntactic and semantic information according to the common meanings of the words and the sentence, to facilitate understanding of the sentence.
4. The system of claim 1, wherein the intention model group generation module deduplicates and merges, with the intention as the unit, the one or more models carrying syntactic and semantic information generated by the linguistic rule generation module, each intention generating one semantic model group.
5. The system of claim 1, wherein the system first performs model training so that the sentences of each determined intention generate a corresponding unique semantic model group, and after training a text is input, its model carrying syntactic and semantic information is determined, this model is compared with the semantic model groups, the best similarity is matched, and the recognition result is output.
6. The system of claim 5, wherein during model training, training sentences for different intentions are input into the system, the word segmentation module splits the training sentences into words, the linguistic rule generation module labels the words in each sentence with the corresponding syntactic and semantic information, and each training sentence is turned into a model carrying syntactic and semantic information.
7. The system of claim 6, wherein among the training sentences for different intentions, the models generated from the training data of the same intention are deduplicated and merged, each intention generating one semantic model group.
8. The system of claim 5, wherein the input text is segmented into a plurality of words by the word segmentation module, and the segmentation standard is determined by a general-domain vocabulary preset in the system together with a freely configurable user-defined domain vocabulary.
9. The system of claim 8, wherein after the input text has been segmented into words, the linguistic rule generation module automatically generates a model carrying syntactic and semantic information from the segmented words.
10. The system of claim 9, wherein similarity is calculated between the model carrying syntactic and semantic information and each of the semantic model groups of determined intentions, and once the similarity to all semantic model groups has been calculated, the intention with the highest similarity is selected and output as the recognition result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011625429.1A CN112632259A (en) 2020-12-30 2020-12-30 Automatic dialog intention recognition system based on linguistic rule generation

Publications (1)

Publication Number Publication Date
CN112632259A (en) 2021-04-09

Family

ID=75289811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011625429.1A Pending CN112632259A (en) 2020-12-30 2020-12-30 Automatic dialog intention recognition system based on linguistic rule generation

Country Status (1)

Country Link
CN (1) CN112632259A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100204982A1 (en) * 2009-02-06 2010-08-12 Robert Bosch Gmbh System and Method for Generating Data for Complex Statistical Modeling for use in Dialog Systems
CN110334347A (en) * 2019-06-27 2019-10-15 腾讯科技(深圳)有限公司 Information processing method, relevant device and storage medium based on natural language recognition
CN111858888A (en) * 2020-07-13 2020-10-30 北京航空航天大学 Multi-round dialogue system of check-in scene

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115329883A (en) * 2022-08-22 2022-11-11 桂林电子科技大学 Semantic similarity processing method, device and system and storage medium
CN115329883B (en) * 2022-08-22 2023-05-09 桂林电子科技大学 Semantic similarity processing method, device and system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination